Problem Set 3: Query Evaluation and Optimization COMP SCI 339 Due: February 21, 2022 Name: 1 Buffer Manager and Simple Query Execution 1. (3 points) Consider a buffer pool with a LRU page replacement policy. It has 10 pages. The buffer pool is initially empty. Recall the relation S – from Problem Set 2 – with the following schema: S(a int, b char(10), c char(20)) and the following SQL query: SELECT S.a FROM S A simple physical query plan for this query is the following: π a Sequential Scan Query Processor Storage Manager Physical Query Plan Heap File Access Method Buffer Pool Manager S
Disk We show the query plan in the context of the relevant DBMS architecture components. To execute the query, the system will call open() and then next() on the project operator. We ignore hasNext() in this exercise. Consider that relation S is stored in a heap file on disk. 1 COMP SCI 339 Problem Set 3 Rogers (a) (1 point) Explain how the execution of this query will proceed as the system calls open() and then next() on the topmost, project operator. You only need to describe what happens on the call to open() and then on the first call to next(). You do not need to describe subsequent calls to next(). Your explanation should describe the control flow (who calls whom and when) between (1) the project operator, (2) the sequential scan operator, (3) the heap file access method, and (4) the buffer pool manager. Keep in mind that the buffer pool manager will call the heap file to actually read a page from disk. Page 2 COMP SCI 339 Problem Set 3 Rogers (b) (0.5 point) What will be the content of the buffer pool after the first next() call on the project operator returns a tuple. (c) (0.5 point) What will be the content of the buffer pool after the second next() call on the project operator returns a tuple. Page 3 COMP SCI 339 Problem Set 3 Rogers (d) (0.5 point) Assuming S contains 10,000 tuples, what will be the content of the buffer pool after 1000 next() calls on the project operator. (e) (0.5 point) Assuming S contains 10,000 tuples, what will be the content of the buffer pool after the query completes? Page 4 COMP SCI 339 Problem Set 3 Rogers 2 Query Plan Cost Estimation 2. (4 pts) Consider the following relations and physical query plan: R(a,b,c) S(d,e,f,g) T(h,i,j) TCARD(R) = 1000 TCARD(S) = 1000 TCARD(T) = 1000 NCARD(R) = 10,000 NCARD(S) = 10,000 NCARD(T) = 10,000 Min(R,a) = 0 ICARD(S,e) = 10 ICARD(T,h) = 100 Max(R,a) = 9000 ICARD(S,f) = 100 ICARD(R,b) = 100 ICARD(S,d) = 50 ICARD(R,c) = 20 ICARD(S,g) = 40 R S c = d (Clustered index on a) (Unclustered index on (e,f) ) (Hash join) (Index Nested Loop) (d) σ b=100 ∨ b=101 (c) (e) g=h T (Clustered index on h ) (a) σ a > 100 ∧ a <= 1000 (Use B+ tree index ) (b) σ e = 10 ∧ f = 1000 (Use B+ tree index ) π a,b,c,g,I,j R1 R2 R3 R4 R5 (f) Page 5 COMP SCI 339 Problem Set 3 Rogers (a) (1 point) Compute the selectivity of the following predicates: 1. a > 100 ∧ a ≤ 1000 2. e = 10 ∧ f = 1000 3. c = d 4. b = 100 ∨ b = 101 5. g = h (b) (1.5 points) Compute the cardinality of all intermediate relations labeled R1 through R5 and the final result, call it R6. Page 6 COMP SCI 339 Problem Set 3 Rogers (c) (1.5 points) Compute the cost of this query plan in terms of number of pages read from disk or written to disk. Assume that all of the operators after the hash join are pipelined - i.e., you do not need to store intermediate results on disk. Only count data pages, not index pages, in your lookups. Page 7 COMP SCI 339 Problem Set 3 Rogers 3 Query Optimization 3. (4 points) Consider the following three relations: R(a,b,c) S(d,e) W(f,g,h) TCARD(R) = 100 TCARD(S) = 1,000 TCARD(W) = 10 NCARD(R) = 1,000 NCARD(S) = 10,000 NCARD(W) = 100 Consider the following SQL Query: SELECT * FROM R, S, W WHERE R.a = S.d AND R.c = W.h (a) (2 points) Assume that all relations are stored in heap files, there are no indexes, only page-at-a-time nested-loop joins can be used, and the selectivity of each join predicate is 0.1%. Show the query plan selected by a Selinger-style, bottom-up, dynamic programming optimizer. Use the number of disk I/O operations as the cost function. Hints: • Remember that a Selinger-style optimizer will try to avoid Cartesian products in the plan. So do NOT consider such plans. • The Selinger optimizer will only consider left-deep plans. Draw the selected plan and show how it is derived. You can use the following table to help you but do NOT worry about computing the exact cost and size values if you don’t need exact values to prune plans. In the table, P/K indicates the choice to either prune or keep the subplan. Hint: When joining tuples, keep in mind that the tuples get bigger. Subquery Cost Size of output Plan P/K R 100 page I/Os 1K records on 100 pages Sequential scan of R K S 1K page I/Os 10K records on 1K pages Sequential scan of S K W 10 page I/Os 100 records on 10 pages Sequential scan of W K RS . . . 10K records on 2K pages . . . SR . . . . . . . . . Page 8 COMP SCI 339 Problem Set 3 Rogers Space for your answer: Page 9 COMP SCI 339 Problem Set 3 Rogers (b) (1 point) Consider the following small modification to the above query and consider that we add an unclustered B+ tree index on S.d as well as a clustered B+ tree index on R.b. Without re-computing the new best plan, explain how these changes affect the optimization process for this query. SELECT * FROM R, S, W WHERE R.a = S.d AND R.c = W.h AND R.b > 100 Page 10 COMP SCI 339 Problem Set 3 Rogers (c) (1 point) What is an interesting order? Give one or two examples and explain how they can be useful in getting a better physical query plan. Page 11
欢迎咨询51作业君