程序代写案例-CSCI317

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

School of Information Technology & Computer Science Session: Autumn 2021
University of Wollongong Lecturer: Janusz R. Getta

CSCI317 Database Performance Tuning
Assignment 2
Published on 29 March 2021

Scope
This assignment includes the tasks in improving performance of database applications
through indexing of relational tables, clustering of relational tables, and creating
materialized views.
This assignment is due by Saturday, 17 April, 2021, 7.00 pm (sharp).

This assignment is worth 20% of the total evaluation in the subject.

A submission procedure is explained at the end of assignment specification.

This assignment consists of 6 tasks and specification of each task starts from a new page.

It is recommended to solve the problems before attending a laboratory class in order to
efficiently use supervised laboratory time.

A submission marked by Moodle as "late" is treated as a late submission no matter how
many seconds it is late.

A policy regarding late submissions is included in the subject outline.

A submission of compressed files (zipped, gzipped, rared, tared, 7-zipped, lhzed, … etc) is
not allowed. The compressed files will not be evaluated.

All files left on Moodle in a state "Draft(not submitted)" will not be evaluated.

It is expected that all tasks included within Assignment 2 will be solved individually
without any cooperation with the other students. If you have any doubts, questions, etc.
please consult your lecturer or tutor during lab classes or office hours. Plagiarism will result
in a FAIL grade being recorded for the assessment task.

Please read very carefully information included in Prologue section below about the
software environment to be used in the subject.

Prologue
In this subject we use Oracle 19c database server running under Oracle Linux 7.4 operating
system on a virtual machine hosted by VirtualBox. To start Oracle database server you
have to start VirtualBox first. To start VirtualBox navigate through the following menus:
Start->All Programs->Oracle VM VirtualBox->Oracle VM VirtualBox.
It is explained in Cookbook for CSIT115 Recipe 1.1, Step 1 "How to start VirtualBox ?"
(https://www.uow.edu.au/~jrg/115/COOKBOOK/e1-1-frame.html)
how to start VirtualBox.

When VirtualBox is started, import an appliance included in a file OracleLinux7.4-
64bits-Oracle19c-22-JAN-2020.ova and located on a drive VMs(E:) in a
folder Virtual Machines\CSCI317-Janusz.

When ready, power on a virtual machine OracleLinux7.4-64bits-Oracle19c-
22-JAN-2020.

A password to a Linux user ORACLE is oracle and a password to Oracle users SYSTEM
and SYS (database administrators) is also oracle. Generally, whenever you are asked
about a password then it is always oracle, unless you change it.

When logged as a Linux user, you can access Oracle database server either through a
command line interface (CLI) SQLcl or through Graphical User Interface (GUI) SQL
Developer.

You can find in Cookbook, Recipe 1, How to access Oracle 19c database server, how to
use SQL Developer, how to use basic SQL and SQLcl, and how to create a sample
database ? more information on how to use SQLcl and SQL Developer.

It is strongly recommended to drop the relational tables of TPC-HR benchmark database
and to drop all tablespaces created in the previous Assignment. Then, recreate TPC-HR
benchmark database in a way explained in Assignment 1, task 1. The other option is to
import a new copy of virtual machine with Oracle 19c and again repeat Assignment 1, task
1.

Tasks
Task 1 (3 marks)
An objective of this task is to denormalize the relational schemas to speed up
processing of a given class of applications.

In this task you must operate on the original state of a sample benchmark TPC-HR
database. It is explained at the end of Prologue section how to return to the original state
of the database.

Implement the following query as SELECT statement over TPC-HR benchmark database.

Find an order key, (O_ORDERKEY), order status (O_ORDERSTATUS), order value
(O_TOTALPRICE) and region name (R_NAME) of all customers who submitted an order
in a given year (O_ORDERDATE).

Use a denormalization of relational table to speed up processing of SELECT statement
implemented above. A year when the orders have been submitted is up to you. To
denormalize a relational table and to test the improvements in performance create a script
file solution1.sql that performs the following actions.

(1) First, the script processes the original SELECT statement create above. Use TIMING
option of SQLcl client to measure the total processing time. Setting and using
TIMING option of SQLcl client is described in "How to …" Cookbook, Recipe 2.1,
Step 2.

(2) Next, the script performs denormalization of a relational table to speed up the
processing of a given SELECT statement in the best possible way. Please remember
to re-load data into the denormalized relational table.

In this case, there is NO need for indexing, there is no need for creation of derived
attributes and there is no need for creation of materialized views or any additional
relational tables.

It is recommended to first denormalize a conceptual schema given in a file
tpchr.pdf before performing any changes to the relational tables of TPC-HR
database. There is no need to provide the outcomes of denormalization of a conceptual
schema.

(3) Next, the script processes a new SELECT statement, that accesses a denormalized
relational table and retrieves the same results as the original SELECT statement. Use
TIMING option to measure processing time. Note, that processing time should be
shorter than processing time of the statement is processed before denormalization.

(4) Next, the script creates an index to speed up processing of the new SELECT statement
and again, it uses TIMING option to measure processing time. Note, that processing
time should be much shorter than processing time in the previous step.

When ready start SQLcl client, connect to Oracle database server, and process SQL script
solution1.sql. Save a report from processing of the script in a file
solution1.lst. It is explained in Cookbook, Recipe 1.5, Step 9, "How to create and
to save a report" how to save a report from processing of SQL script in a text file.

The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such
that all SQL statements processed are included in the report !

A good habit is to put SQLcl statements

SPOOL solution1
SET ECHO ON
SET FEEDBACK ON
SET LINESIZE 300
SET PAGESIZE 300

at the beginning of each SQL script implemented and the following statement at the end of
the script

SPOOL OFF

A report from processing of the script must have NO syntax errors !

Deliverables
A file solution1.lst that contains a report from the processing of a script
solution1.sql.

Task 2 (5 marks)
An objective of this task is to estimate the efficiency of indexing.

Consider a relational table EMPLOYEE(e#, name, salary, position) where an
attribute e# is a primary key.
Assume that:
(i) a relational table EMPLOYEE occupies 102 data blocks,
(ii) a relational table EMPLOYEE contains 103 rows,
(iii) an attribute name has 800 distinct values,
(iv) an attribute salary has 20 distinct values,
(v) an attribute position has 50 distinct values,
(vi) a primary key is automatically indexed,
(vii) the attributes salary and position are indexed,
(viii) all indexes are implemented as B*-trees with a fanout equal to 10,
(ix) a leaf level of an index on attribute salary consists of 5 data blocks,
(x) a leaf level of an index on attribute position consists of 20 data blocks.

Find each of the following queries describe how a database system plans to compute the
queries, (i.e. provide detailed query execution plans) and determine the total number of
read block operations needed to compute the query. Show ALL computations performed
to get the final answer.

(1)
SELECT DISTINCT salary
FROM EMPLOYEE;

(2)
SELECT position, COUNT(*)
FROM EMPLOYEE
GROUP BY position;

(3)
SELECT *
FROM EMPLOYEE
WHERE e# = 007 AND position = 'boss';

(4)
SELECT *
FROM EMPLOYEE
WHERE position = 'boss';

(5)
SELECT *
FROM EMPLOYEE
WHERE position = 'boss ' AND salary = 1000;

(6)
SELECT MAX(SALARY)
FROM EMPLOYEE;

(7)
SELECT *
FROM EMPLOYEE
WHERE position = 'boss ' OR salary = 1000;

(8)
SELECT salary
FROM EMPLOYEE;

(9)
SELECT salary, position
FROM EMPLOYEE;

(10)
SELECT *
FROM EMPLOYEE
ORDER BY salary DESC;

Deliverables
A file solution2.pdf with the comprehensive descriptions of query processing plans
for each query and the estimations of the total number of read block operations needed to
process each query.

Task 3 (3 marks)
An objective of this task is to improve performance of query processing through
indexing.

In this task you must operate on the original state of a sample benchmark TPC-HR
database. It is explained at the end of Prologue section how to return to the original state
of the database.

Consider the relational tables included in a sample benchmark TPC-HR database and
owned by a user tpchr. A conceptual schema of the database is included in a file
tpchr.png. We would like to improve performance of the following group of queries:

(1) Find the names, retail prices, and brands of all parts ordered by retail prices in
ascending order

(2) Find the total number of values in a column P_NAME in a relational table PART.

(3) Find the total number of parts that belong to each brand. List the names of brands
together with the total number of parts in each brand.

(4) Find the total number of distinct types of parts.

(5) Find the types of parts and retail prices of the parts that belong to a given brand and
have a given name. A brand name and a part name are up to you.

Implement the queries listed above as SELECT statements and then find the smallest
collection of indexes, that speeds up the processing of the queries listed above. Write
CREATE INDEX statements that create all indexes found. For each SELECT statement
explain in the comments how it will benefit from the existence of the indexes.

Implement SQL script solution3.sql, that performs the following actions.

(1) First the script creates the smallest number of indexes that that speeds up the
processing of all queries listed above.

(2) Next, the script finds and lists the query processing plans for all SELECT statement
implementing the queries listed above.

Include into a script solution3.sql the comments with the explanations what
method of index traversal (vertical, horizontal, vertical + horizontal, with and without
access to a relational table) is planned by a query optimizer for each SELECT
statement. Do not use the names of the index traversal methods listed in the query
processing plans.

When ready, process SQL script file solution3.sql and save a report from processing
in a file solution3.lst.
The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such
that all SQL statements processed are included in the report !

You must put the following SQLcl statements

SPOOL solution3
SET ECHO ON
SET FEEDBACK ON
SET LINESIZE 300
SET PAGESIZE 300

at the beginning of each SQL script implemented and the following statement at the end of
the script

SPOOL OFF

A report from processing of the script must have NO syntax errors !

Deliverables
A file solution3.lst that contains a report from the processing of a script
solution3.sql.

Task 4 (3 marks)
An objective of this task is to improve performance of query processing through
indexing.

In this task you must operate on the original state of a sample benchmark TPC-HR
database. It is explained at the end of Prologue section how to return to the original state
of the database.

Consider the following SELECT statement.

SELECT L_ORDERKEY, COUNT(*) TOT
FROM LINEITEM
WHERE L_QUANTITY = AND L_DISCOUNT =
GROUP BY L_ORDERKEY
HAVING COUNT(*) > 1
ORDER BY L_ORDERKEY

The values of placeholders and are up to you.

Implement SQL script solution4.sql that performs the following actions.

(1) First, the script finds and lists a query processing plan for SELECT statement given
above. It is recommended to record the estimated values of query processing costs
listed at the top of Cost column of query processing plan.

(2) Next, the script creates an index based on the single column. The index must improve
performance of query processing in the best way for any single column key index.

(3) Next, the script finds a query processing plan for SELECT statement give above. You
can use a value listed in a column Cost of query processing plan as a measure of
improvement in query processing with an index.

(4) Next, the script drops an index created in a step (2).

(5) Next, the script creates an index based on two columns (composite index key), that
improves performance of query processing in the best way for any index created over
two columns.

(6) Next, the script finds a query processing plan for SELECT statement give above. You
can use a value listed in a column Cost of query processing plan as a measure of
improvement in query processing with an index.

(7) Next, the script drops an index created in a step (5).

(8) Next, the script creates two single column indexes, that improve performance of query
processing in the best way for any two single column indexes.

(9) Next, the script finds a query processing plan for SELECT statement give above. You
can use a value listed in a column Cost of query processing plan as a measure of
improvement in query processing with the indexes.

(10) Next, the script drops the indexes created in a step (8).

(11) Now, assume that we have enough persistent storage to create a single index on any
number of columns as long as such index improves performance of query processing
in the best way. Create an index on any number of columns, that improves performance
of query processing in the best way.

(12) Next, the script finds a query processing plan for SELECT statement give above. You
can use a value listed in a column Cost of query processing plan as a measure of
improvement in query processing with an index.

(13) Next, the script drops the indexes created in a step (11).

When ready, process SQL script file solution4.sql and save a report from processing
in a file solution4.lst.
The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such
that all SQL statements processed are included in the report !

You must put the following SQLcl statements

SPOOL solution4
SET ECHO ON
SET FEEDBACK ON
SET LINESIZE 300
SET PAGESIZE 300

at the beginning of each SQL script implemented and the following statement at the end of
the script

SPOOL OFF

A report from processing of the script must have NO syntax errors !

Deliverables
A file solution4.lst that contains a report from the processing of a script
solution4.sql.

Task 5 (3 marks)
An objective of this task is to improve performance of query processing through
indexing.

In this task you must operate on the original state of a sample benchmark TPC-HR
database. It is explained at the end of Prologue section how to return to the original state
of the database.

Consider the relational tables included in a sample benchmark TPC-HR database and
owned by a user tpchr. A conceptual schema of the database is included in a file
tpchr.png.

We would like to use indexing to improve performance of SELECT statements listed
below. Assume, the we are not restricted by the amounts of persistent storage used for
indexing and the relational tables used by the queries are not frequently modified by the
data manipulation operations. It means, that for each one of the queries listed below we are
allowed to create any number and any size of indexes. An objective is to improve
performance of query processing through indexing as much as it is possible.

(1)
SELECT O_CUSTKEY, SUM(L_EXTENDEDPRICE)
FROM LINEITEM JOIN ORDERS
ON L_ORDERKEY = O_ORDERKEY
WHERE O_ORDERSTATUS = AND
L_TAX =
GROUP BY O_CUSTKEY;

(2)
SELECT C_NAME, C_PHONE
FROM CUSTOMER
WHERE C_ACCTBAL > AND
C_CUSTKEY NOT IN ( SELECT O_CUSTKEY
FROM ORDERS
WHERE O_TOTALPRICE > AND
O_CLERK = );
(3)
SELECT O_ORDERKEY, O_ORDERDATE, O_TOTALPRICE
FROM ORDERS
WHERE NOT EXISTS ( SELECT 'whatever'
FROM LINEITEM
WHERE L_ORDERKEY = O_ORDERKEY AND
L_PARTKEY IN
( SELECT P_PARTKEY
FROM PART
WHERE P_NAME = ) );

The values of placeholders , , , ,

, and are up to you.

Implement SQL script solution4.sql that performs the following actions.

(1) First, the script creates the indexes to speed up query processing of SELECT statement
(1).

(2) Next, the script finds and lists a query processing plan for SELECT statement (1).

(3) Next, the script drops the indexes created in a step (1).

(4) Next, the script creates the indexes to speed up query processing of SELECT statement
(2).

(5) Next, the script finds and lists a query processing plan for SELECT statement (2).

(6) Next, the script drops the indexes created in a step (4).

(7) Next, the script creates the indexes to speed up query processing of SELECT statement
(3).

(8) Next, the script finds and lists a query processing plan for SELECT statement (3).

(9) Next, the script drops the indexes created in a step (7).

When ready process a script solution5.sql and save a report from processing in a file
solution5.lst.

The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such
that all SQL statements processed are included in the report !

You must put the following SQLcl statements

SPOOL solution5
SET ECHO ON
SET FEEDBACK ON
SET LINESIZE 300
SET PAGESIZE 300

at the beginning of each SQL script implemented and the following statement at the end of
the script

SPOOL OFF

A report from processing of the script must have NO syntax errors !

Deliverables
A file solution5.lst that contains a report from the processing of a script
solution5.sql.

Task 6 (3 marks)
An objective of this task is to improve performance of query processing through
clustering.

Consider the following parameterized SELECT statement.

SELECT *
FROM ORDERS
WHERE O_TOTALPRICE = ;

A value of placeholders is up to you.

Implement SQL script solution6.sql, that performs the following actions.

(1) First, the script creates an ordinary index to speed up processing of SELECT statement
given above.

(2) Next, the script finds and lists a query processing plan for SELECT statement given
above.

(3) Next, the script drops an index created in a step (2).

(4) Next, the script creates a new tablespace. All parameters of the tablespace are up to
you.

(5) Next, the script creates a cluster and a clustered version of a relational table ORDERS
in the tablespace created in the previous step. A cluster should be created such that an
index over the cluster speeds up processing of SELECT statement given above. A
name of a new relational table is up to you. There is no need to enforce the referential
integrity constraints in a clustered version of the new relational table. It is explained
in the lecture slides and in the Cookbook how to create a cluster and how to create a
relational table in a cluster.

(6) Next, the script copies the contents of a relational table ORDERS to a clustered version
of a relational table ORDERS.

(7) Next, the script finds and lists a query processing plan for SELECT statement given
above and such that it uses a clustered version of a relational table ORDERS. The query
processing costs reported in the current query processing plan must be lower than in a
query processing plan obtained in a step (2).

(8) Finally, the script drops the tablespace created in a step (4).

When ready process a script solution5.sql and save a report from processing in a file
solution5.lst.

The script must be processed with SQLcl options ECHO and FEEDBACK set to ON such
that all SQL statements processed are included in the report !

You must put the following SQLcl statements

SPOOL solution5
SET ECHO ON
SET FEEDBACK ON
SET LINESIZE 300
SET PAGESIZE 300

at the beginning of each SQL script implemented and the following statement at the end of
the script

SPOOL OFF

A report from processing of the script must have NO syntax errors !

Deliverables
A file solution5.lst that contains a report from the processing of a script
solution5.sql.

Submission
Note, that you have only one submission. So, make it absolutely sure that you submit
the correct files with the correct contents. No other submission is possible!

Submit the files solution1.lst, solution2.pdf, solution3.lst,
solution4.lst, solution5.lst, and solution6.lst through Moodle in the
following way:
(1) Access Moodle at http://moodle.uowplatform.edu.au/
(2) To login use a Login link located in the right upper corner the Web page or in the
middle of the bottom of the Web page
(3) When logged select a site CSCI317 (S121) Database Performance
Tuning
(4) Scroll down to a section Submissions
(5) Click at a link In this place you can submit the outcomes of
Assignment 2
(6) Click at a button Add Submission
(7) Move a file solution1.lst into an area You can drag and drop files
here to add them. You can also use a link Add…
(8) Repeat step (7) for the files solution2.pdf, solution3.lst,
solution4.lst, solution5.pdf, and solution6.lst.
(9) Click at a button Save changes
(10) Click at a button Submit assignment
(11) Click at the checkbox with a text attached: By checking this box, I
confirm that this submission is my own work, … in order to
confirm the authorship of your submission.
(12) Click at a button Continue

A policy regarding late submissions is included in the subject outline.

End of specification

欢迎咨询51作业君