School of Information Technology & Computer Science Session: Autumn 2021 University of Wollongong Lecturer: Janusz R. Getta CSCI317 Database Performance Tuning Assignment 2 Published on 29 March 2021 Scope This assignment includes the tasks in improving performance of database applications through indexing of relational tables, clustering of relational tables, and creating materialized views. This assignment is due by Saturday, 17 April, 2021, 7.00 pm (sharp). This assignment is worth 20% of the total evaluation in the subject. A submission procedure is explained at the end of assignment specification. This assignment consists of 6 tasks and specification of each task starts from a new page. It is recommended to solve the problems before attending a laboratory class in order to efficiently use supervised laboratory time. A submission marked by Moodle as "late" is treated as a late submission no matter how many seconds it is late. A policy regarding late submissions is included in the subject outline. A submission of compressed files (zipped, gzipped, rared, tared, 7-zipped, lhzed, … etc) is not allowed. The compressed files will not be evaluated. All files left on Moodle in a state "Draft(not submitted)" will not be evaluated. It is expected that all tasks included within Assignment 2 will be solved individually without any cooperation with the other students. If you have any doubts, questions, etc. please consult your lecturer or tutor during lab classes or office hours. Plagiarism will result in a FAIL grade being recorded for the assessment task. Please read very carefully information included in Prologue section below about the software environment to be used in the subject. Prologue In this subject we use Oracle 19c database server running under Oracle Linux 7.4 operating system on a virtual machine hosted by VirtualBox. To start Oracle database server you have to start VirtualBox first. To start VirtualBox navigate through the following menus: Start->All Programs->Oracle VM VirtualBox->Oracle VM VirtualBox. It is explained in Cookbook for CSIT115 Recipe 1.1, Step 1 "How to start VirtualBox ?" (https://www.uow.edu.au/~jrg/115/COOKBOOK/e1-1-frame.html) how to start VirtualBox. When VirtualBox is started, import an appliance included in a file OracleLinux7.4- 64bits-Oracle19c-22-JAN-2020.ova and located on a drive VMs(E:) in a folder Virtual Machines\CSCI317-Janusz. When ready, power on a virtual machine OracleLinux7.4-64bits-Oracle19c- 22-JAN-2020. A password to a Linux user ORACLE is oracle and a password to Oracle users SYSTEM and SYS (database administrators) is also oracle. Generally, whenever you are asked about a password then it is always oracle, unless you change it. When logged as a Linux user, you can access Oracle database server either through a command line interface (CLI) SQLcl or through Graphical User Interface (GUI) SQL Developer. You can find in Cookbook, Recipe 1, How to access Oracle 19c database server, how to use SQL Developer, how to use basic SQL and SQLcl, and how to create a sample database ? more information on how to use SQLcl and SQL Developer. It is strongly recommended to drop the relational tables of TPC-HR benchmark database and to drop all tablespaces created in the previous Assignment. Then, recreate TPC-HR benchmark database in a way explained in Assignment 1, task 1. The other option is to import a new copy of virtual machine with Oracle 19c and again repeat Assignment 1, task 1. Tasks Task 1 (3 marks) An objective of this task is to denormalize the relational schemas to speed up processing of a given class of applications. In this task you must operate on the original state of a sample benchmark TPC-HR database. It is explained at the end of Prologue section how to return to the original state of the database. Implement the following query as SELECT statement over TPC-HR benchmark database. Find an order key, (O_ORDERKEY), order status (O_ORDERSTATUS), order value (O_TOTALPRICE) and region name (R_NAME) of all customers who submitted an order in a given year (O_ORDERDATE). Use a denormalization of relational table to speed up processing of SELECT statement implemented above. A year when the orders have been submitted is up to you. To denormalize a relational table and to test the improvements in performance create a script file solution1.sql that performs the following actions. (1) First, the script processes the original SELECT statement create above. Use TIMING option of SQLcl client to measure the total processing time. Setting and using TIMING option of SQLcl client is described in "How to …" Cookbook, Recipe 2.1, Step 2. (2) Next, the script performs denormalization of a relational table to speed up the processing of a given SELECT statement in the best possible way. Please remember to re-load data into the denormalized relational table. In this case, there is NO need for indexing, there is no need for creation of derived attributes and there is no need for creation of materialized views or any additional relational tables. It is recommended to first denormalize a conceptual schema given in a file tpchr.pdf before performing any changes to the relational tables of TPC-HR database. There is no need to provide the outcomes of denormalization of a conceptual schema. (3) Next, the script processes a new SELECT statement, that accesses a denormalized relational table and retrieves the same results as the original SELECT statement. Use TIMING option to measure processing time. Note, that processing time should be shorter than processing time of the statement is processed before denormalization. (4) Next, the script creates an index to speed up processing of the new SELECT statement and again, it uses TIMING option to measure processing time. Note, that processing time should be much shorter than processing time in the previous step. When ready start SQLcl client, connect to Oracle database server, and process SQL script solution1.sql. Save a report from processing of the script in a file solution1.lst. It is explained in Cookbook, Recipe 1.5, Step 9, "How to create and to save a report" how to save a report from processing of SQL script in a text file. The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such that all SQL statements processed are included in the report ! A good habit is to put SQLcl statements SPOOL solution1 SET ECHO ON SET FEEDBACK ON SET LINESIZE 300 SET PAGESIZE 300 at the beginning of each SQL script implemented and the following statement at the end of the script SPOOL OFF A report from processing of the script must have NO syntax errors ! Deliverables A file solution1.lst that contains a report from the processing of a script solution1.sql. Task 2 (5 marks) An objective of this task is to estimate the efficiency of indexing. Consider a relational table EMPLOYEE(e#, name, salary, position) where an attribute e# is a primary key. Assume that: (i) a relational table EMPLOYEE occupies 102 data blocks, (ii) a relational table EMPLOYEE contains 103 rows, (iii) an attribute name has 800 distinct values, (iv) an attribute salary has 20 distinct values, (v) an attribute position has 50 distinct values, (vi) a primary key is automatically indexed, (vii) the attributes salary and position are indexed, (viii) all indexes are implemented as B*-trees with a fanout equal to 10, (ix) a leaf level of an index on attribute salary consists of 5 data blocks, (x) a leaf level of an index on attribute position consists of 20 data blocks. Find each of the following queries describe how a database system plans to compute the queries, (i.e. provide detailed query execution plans) and determine the total number of read block operations needed to compute the query. Show ALL computations performed to get the final answer. (1) SELECT DISTINCT salary FROM EMPLOYEE; (2) SELECT position, COUNT(*) FROM EMPLOYEE GROUP BY position; (3) SELECT * FROM EMPLOYEE WHERE e# = 007 AND position = 'boss'; (4) SELECT * FROM EMPLOYEE WHERE position = 'boss'; (5) SELECT * FROM EMPLOYEE WHERE position = 'boss ' AND salary = 1000; (6) SELECT MAX(SALARY) FROM EMPLOYEE; (7) SELECT * FROM EMPLOYEE WHERE position = 'boss ' OR salary = 1000; (8) SELECT salary FROM EMPLOYEE; (9) SELECT salary, position FROM EMPLOYEE; (10) SELECT * FROM EMPLOYEE ORDER BY salary DESC; Deliverables A file solution2.pdf with the comprehensive descriptions of query processing plans for each query and the estimations of the total number of read block operations needed to process each query. Task 3 (3 marks) An objective of this task is to improve performance of query processing through indexing. In this task you must operate on the original state of a sample benchmark TPC-HR database. It is explained at the end of Prologue section how to return to the original state of the database. Consider the relational tables included in a sample benchmark TPC-HR database and owned by a user tpchr. A conceptual schema of the database is included in a file tpchr.png. We would like to improve performance of the following group of queries: (1) Find the names, retail prices, and brands of all parts ordered by retail prices in ascending order (2) Find the total number of values in a column P_NAME in a relational table PART. (3) Find the total number of parts that belong to each brand. List the names of brands together with the total number of parts in each brand. (4) Find the total number of distinct types of parts. (5) Find the types of parts and retail prices of the parts that belong to a given brand and have a given name. A brand name and a part name are up to you. Implement the queries listed above as SELECT statements and then find the smallest collection of indexes, that speeds up the processing of the queries listed above. Write CREATE INDEX statements that create all indexes found. For each SELECT statement explain in the comments how it will benefit from the existence of the indexes. Implement SQL script solution3.sql, that performs the following actions. (1) First the script creates the smallest number of indexes that that speeds up the processing of all queries listed above. (2) Next, the script finds and lists the query processing plans for all SELECT statement implementing the queries listed above. Include into a script solution3.sql the comments with the explanations what method of index traversal (vertical, horizontal, vertical + horizontal, with and without access to a relational table) is planned by a query optimizer for each SELECT statement. Do not use the names of the index traversal methods listed in the query processing plans. When ready, process SQL script file solution3.sql and save a report from processing in a file solution3.lst. The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such that all SQL statements processed are included in the report ! You must put the following SQLcl statements SPOOL solution3 SET ECHO ON SET FEEDBACK ON SET LINESIZE 300 SET PAGESIZE 300 at the beginning of each SQL script implemented and the following statement at the end of the script SPOOL OFF A report from processing of the script must have NO syntax errors ! Deliverables A file solution3.lst that contains a report from the processing of a script solution3.sql. Task 4 (3 marks) An objective of this task is to improve performance of query processing through indexing. In this task you must operate on the original state of a sample benchmark TPC-HR database. It is explained at the end of Prologue section how to return to the original state of the database. Consider the following SELECT statement. SELECT L_ORDERKEY, COUNT(*) TOT FROM LINEITEM WHERE L_QUANTITY =
AND L_DISCOUNT = GROUP BY L_ORDERKEY HAVING COUNT(*) > 1 ORDER BY L_ORDERKEY The values of placeholders and are up to you. Implement SQL script solution4.sql that performs the following actions. (1) First, the script finds and lists a query processing plan for SELECT statement given above. It is recommended to record the estimated values of query processing costs listed at the top of Cost column of query processing plan. (2) Next, the script creates an index based on the single column. The index must improve performance of query processing in the best way for any single column key index. (3) Next, the script finds a query processing plan for SELECT statement give above. You can use a value listed in a column Cost of query processing plan as a measure of improvement in query processing with an index. (4) Next, the script drops an index created in a step (2). (5) Next, the script creates an index based on two columns (composite index key), that improves performance of query processing in the best way for any index created over two columns. (6) Next, the script finds a query processing plan for SELECT statement give above. You can use a value listed in a column Cost of query processing plan as a measure of improvement in query processing with an index. (7) Next, the script drops an index created in a step (5). (8) Next, the script creates two single column indexes, that improve performance of query processing in the best way for any two single column indexes. (9) Next, the script finds a query processing plan for SELECT statement give above. You can use a value listed in a column Cost of query processing plan as a measure of improvement in query processing with the indexes. (10) Next, the script drops the indexes created in a step (8). (11) Now, assume that we have enough persistent storage to create a single index on any number of columns as long as such index improves performance of query processing in the best way. Create an index on any number of columns, that improves performance of query processing in the best way. (12) Next, the script finds a query processing plan for SELECT statement give above. You can use a value listed in a column Cost of query processing plan as a measure of improvement in query processing with an index. (13) Next, the script drops the indexes created in a step (11). When ready, process SQL script file solution4.sql and save a report from processing in a file solution4.lst. The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such that all SQL statements processed are included in the report ! You must put the following SQLcl statements SPOOL solution4 SET ECHO ON SET FEEDBACK ON SET LINESIZE 300 SET PAGESIZE 300 at the beginning of each SQL script implemented and the following statement at the end of the script SPOOL OFF A report from processing of the script must have NO syntax errors ! Deliverables A file solution4.lst that contains a report from the processing of a script solution4.sql. Task 5 (3 marks) An objective of this task is to improve performance of query processing through indexing. In this task you must operate on the original state of a sample benchmark TPC-HR database. It is explained at the end of Prologue section how to return to the original state of the database. Consider the relational tables included in a sample benchmark TPC-HR database and owned by a user tpchr. A conceptual schema of the database is included in a file tpchr.png. We would like to use indexing to improve performance of SELECT statements listed below. Assume, the we are not restricted by the amounts of persistent storage used for indexing and the relational tables used by the queries are not frequently modified by the data manipulation operations. It means, that for each one of the queries listed below we are allowed to create any number and any size of indexes. An objective is to improve performance of query processing through indexing as much as it is possible. (1) SELECT O_CUSTKEY, SUM(L_EXTENDEDPRICE) FROM LINEITEM JOIN ORDERS ON L_ORDERKEY = O_ORDERKEY WHERE O_ORDERSTATUS = AND L_TAX = GROUP BY O_CUSTKEY; (2) SELECT C_NAME, C_PHONE FROM CUSTOMER WHERE C_ACCTBAL > AND C_CUSTKEY NOT IN ( SELECT O_CUSTKEY FROM ORDERS WHERE O_TOTALPRICE > AND O_CLERK = ); (3) SELECT O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE FROM ORDERS WHERE NOT EXISTS ( SELECT 'whatever' FROM LINEITEM WHERE L_ORDERKEY = O_ORDERKEY AND L_PARTKEY IN ( SELECT P_PARTKEY FROM PART WHERE P_NAME = ) ); The values of placeholders , , , , , and are up to you. Implement SQL script solution4.sql that performs the following actions. (1) First, the script creates the indexes to speed up query processing of SELECT statement (1). (2) Next, the script finds and lists a query processing plan for SELECT statement (1). (3) Next, the script drops the indexes created in a step (1). (4) Next, the script creates the indexes to speed up query processing of SELECT statement (2). (5) Next, the script finds and lists a query processing plan for SELECT statement (2). (6) Next, the script drops the indexes created in a step (4). (7) Next, the script creates the indexes to speed up query processing of SELECT statement (3). (8) Next, the script finds and lists a query processing plan for SELECT statement (3). (9) Next, the script drops the indexes created in a step (7). When ready process a script solution5.sql and save a report from processing in a file solution5.lst. The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such that all SQL statements processed are included in the report ! You must put the following SQLcl statements SPOOL solution5 SET ECHO ON SET FEEDBACK ON SET LINESIZE 300 SET PAGESIZE 300 at the beginning of each SQL script implemented and the following statement at the end of the script SPOOL OFF A report from processing of the script must have NO syntax errors ! Deliverables A file solution5.lst that contains a report from the processing of a script solution5.sql. Task 6 (3 marks) An objective of this task is to improve performance of query processing through clustering. Consider the following parameterized SELECT statement. SELECT * FROM ORDERS WHERE O_TOTALPRICE = ; A value of placeholders is up to you. Implement SQL script solution6.sql, that performs the following actions. (1) First, the script creates an ordinary index to speed up processing of SELECT statement given above. (2) Next, the script finds and lists a query processing plan for SELECT statement given above. (3) Next, the script drops an index created in a step (2). (4) Next, the script creates a new tablespace. All parameters of the tablespace are up to you. (5) Next, the script creates a cluster and a clustered version of a relational table ORDERS in the tablespace created in the previous step. A cluster should be created such that an index over the cluster speeds up processing of SELECT statement given above. A name of a new relational table is up to you. There is no need to enforce the referential integrity constraints in a clustered version of the new relational table. It is explained in the lecture slides and in the Cookbook how to create a cluster and how to create a relational table in a cluster. (6) Next, the script copies the contents of a relational table ORDERS to a clustered version of a relational table ORDERS. (7) Next, the script finds and lists a query processing plan for SELECT statement given above and such that it uses a clustered version of a relational table ORDERS. The query processing costs reported in the current query processing plan must be lower than in a query processing plan obtained in a step (2). (8) Finally, the script drops the tablespace created in a step (4). When ready process a script solution5.sql and save a report from processing in a file solution5.lst. The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such that all SQL statements processed are included in the report ! You must put the following SQLcl statements SPOOL solution5 SET ECHO ON SET FEEDBACK ON SET LINESIZE 300 SET PAGESIZE 300 at the beginning of each SQL script implemented and the following statement at the end of the script SPOOL OFF A report from processing of the script must have NO syntax errors ! Deliverables A file solution5.lst that contains a report from the processing of a script solution5.sql. Submission Note, that you have only one submission. So, make it absolutely sure that you submit the correct files with the correct contents. No other submission is possible! Submit the files solution1.lst, solution2.pdf, solution3.lst, solution4.lst, solution5.lst, and solution6.lst through Moodle in the following way: (1) Access Moodle at http://moodle.uowplatform.edu.au/ (2) To login use a Login link located in the right upper corner the Web page or in the middle of the bottom of the Web page (3) When logged select a site CSCI317 (S121) Database Performance Tuning (4) Scroll down to a section Submissions (5) Click at a link In this place you can submit the outcomes of Assignment 2 (6) Click at a button Add Submission (7) Move a file solution1.lst into an area You can drag and drop files here to add them. You can also use a link Add… (8) Repeat step (7) for the files solution2.pdf, solution3.lst, solution4.lst, solution5.pdf, and solution6.lst. (9) Click at a button Save changes (10) Click at a button Submit assignment (11) Click at the checkbox with a text attached: By checking this box, I confirm that this submission is my own work, … in order to confirm the authorship of your submission. (12) Click at a button Continue A policy regarding late submissions is included in the subject outline. End of specification 欢迎咨询51作业君