辅导案例-SEEM3550-Assignment 3
The Chinese University of Hong Kong Department of Systems Engineering and Engineering Management SEEM3550 Fundamentals of Information Systems 2019-2020 Assignment 3: Query Processing and Query Optimization This is an individual assignment.1 This assignment is about query processing and query optimization, and is worth of 14% of the total assessment. The due date for the assignment 3 is 11:59pm, May 4, 2020. Submissions need to be submitted to the online Blackboard system. The late penalty will be 10% per day. A submission will not be accepted five days after the deadline. • As requested by CUHK, every assignment handed in should be accompa- nied by a signed declaration. The best is to do as follows. 1. submit your assignment to Veriguide and then sign the declaration form, and 2. submit both your assignment and declaration form together to the blackboard system. • The declaration form is also available at https://www.cuhk.edu.hk/policy/academichonesty/Eng htm files (2013-14)/p10.htm • You must submit your assignment together with the declaration form. Without your declaration form, we cannot give you a mark for your as- signment. In the website for the textbook http://db-book.com, the solutions to practice exercises in most chapters are given. It is strongly recommended for you to try the practice exercises and see whether your answers are correct by checking the answers provided at the website. 1Departmental Guideline for Plagiarism (Department of Systems Engineering and Engi- neering Management): If a student is found plagiarizing, his/her case will be reported to the Department Examination Panel. If the case is proven after deliberation, the student will auto- matically fail the course in which he/she committed plagiarism. The definition of plagiarism includes copying of the whole or parts of written assignments, programming exercises, reports, quiz papers, mid-term examinations and final examinations. The penalty will apply to both the one who copies the work and the one whose work is being copied, unless the latter can prove his/her work has been copied unwittingly. Furthermore, inclusion of others’ works or results without citation in assignments and reports is also regarded as plagiarism with similar penalty to the offender. A student caught plagiarizing during tests or examinations will be reported to the Faculty office and appropriate disciplinary authorities for further action, in addition to failing the course. 1 2019-2020 2 Question 1: Two relational algebra expressions are said to be equiva- lent if the two expressions generate the same set of tuples. To obtain a re- lational algebra expression from another, we use equivalence rules, as discussed in ch13.pptx. Consider the following SQL for a user query: “Find the names of all instructors in the Music department who have taught a course in 2009, along with the titles of the courses that they taught.” select I.name, C.title from instructor I, teaches T, course C where I.dept_name = "Music" and T.year = 2009 and I.ID = T.ID and T.course_id = C.course_id (1) For the SQL given above, show its initial relational algebra expression. Hint: follow the slide 6.38 in ch6.pptx. (2) Given the relational algebra expression you have in (1), show how to get the relational algebra expression as shown in Figure (a) on the slide 1.14 in ch13.pptx, without the projection of Πcourse id,title. (3) Consider the relational algebra expression you get in (2). Show how you add projections based on Rule 8(a) and Rule 8(b). You need to add a projection for each of the three input relations. (4) Let the initial relational algebra expression you have for (1) be E. Show how you add 4 different equivalent relational algebra expressions beside E into EQ following the algorithm given in the slide 1.19 in ch13.pptx in details. You need to show which one you pick up as Ei, which ei is selected, and which rule you use for each new relational algebra expression E′ to be added. Question 2: We discussed the merge-join algorithm for an equi-join r ✶r.A=s.B s, when both relations are sorted in ch12.pptx. The pseudo code is shown in the slide 12.32 in ch12.pptx and discussed in Section 12.5.4 in the textbook. (1) Suppose that relation r is sorted on the attribute A but relation s is not sorted on the attribute B. Here, we assume that there is B+-tree, as a secondary index, built for the join attribute B in s. Recall that there are pointer and search key pairs, (pi, ki), in a leaf node in the secondary B+-tree. For a secondary B+-tree, such a pointer pi points to a list of pointers, Di = {d1, d2, · · · }, where each dj = (bl, sk) points to a data record (or a tuple) that has the search key ki located at the slot sk in the block bi in the data file using the slotted page structure (refer to the slide 2019-2020 3 10.14 in ch11.ppt for the slotted page structure and refer to the slide 12.10 in ch12.pptx for Di). Give a pseudo code to do this “hybrid merge join” using the secondary B+-tree built on the attribute s.B. • A naive approach is to do something using the similar idea presented in indexed nested-loop. Note that it does not make use of sorting for both the sorted attribute A and the secondary B+-tree built on B. • You need to give an algorithm that utilizes the sorting on both sides (e.g., the sorted attribute A and the secondary B+-tree built on the attribute B), and minimizes the block transfers. Hint: for minimiz- ing the block transfer, you need to know the drawback of secondary B+-tree (refer to the slide 12.9 when we discuss the selection using indices). (2) Suppose that relation r is not sorted on the attribute A and the relation s is not sorted on the attribute B. However, there is a secondary B+-tree built for relation r on the attribute A and there is a secondary B+-tree built for relation s on the attribute B. Show a pseudo code to do this “hybrid merge join” making use of both secondary B+-trees to minimize the block transfer.