辅导案例-ISYS2120

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top
ISYS2120: Data &
Information Management
Week 4: Relational Algebra
– SQL Nested Subqueries
Presented by
Dr. Matloob Khushi
School of Computer Science
Cf. Kifer/Bernstein/Lewis – Chapter 5.1;
Ramakrishnan/Gehrke – Chapter 4.2;
Ullman/Widom – Chapter 2.4
Administration
 Assignment 1 is due by 18:00:00 on 6th September 2019
 Late Penalty
 A lateness penalty of 10% (1 mark of your total assessment) per day
on your maximum mark will be imposed. E.g. If you submit 2 days
late, your maximum mark will be 10 - 2*1 = 8.
 Group formation
 Everyone should be in a group by now if not please let your tutor
know.
04a-2
04a-3
Outline
 Foundations of Declarative Querying
 Relational Algebra
 Six basic operations
 Join operation
 Composition and equivalence rules
 More SQL: Nested Subqueries
Based on slides from Kifer/Bernstein/Lewis (2006) “Database Systems”
and from Ramakrishnan/Gehrke (2003) “Database Management Systems”,
and also including material from Fekete/Röhm.
How Can We Talk to a Database?
04a-4
Database Querying with SQL
04a-5
SELECT studID
FROM Enrolled E, UnitOfStudy U
WHERE E.uosCode = U.uosCode
AND U.uosName = 'Database Systems I'
“List all students (by SID) who are enrolled in Database Systems I”
Many Ways to Write this Query…
04a-6
SELECT studID
FROM Enrolled E, UnitOfStudy U
WHERE E.uosCode = U.uosCode
AND uosName = 'Database Systems I'
SELECT studID
FROM Enrolled
WHERE uosCode IN (SELECT uosCode
FROM UnitOfStudy
WHERE uosName = 'Database Systems I')
SELECT studID
FROM (SELECT studID, uosName
FROM Enrolled NATURAL JOIN UnitOfStudy)R
WHERE uosName= 'Database Systems I'
SELECT sid AS studID
FROM (SELECT u.uosCode AS u1, e.uosCode AS u2,
e.studId AS sid, u.uosName AS title
FROM Enrolled e, UnitOfStudy u) SubQ
WHERE u1=u2 AND title = 'Database Systems I'
“List all students (by SID) who are enrolled in Database Systems I”
How do we know what a DBMS is doing?
 All the SQL queries on the previous slide are equivalent
 On the same data set, they produce the same result
 Even when run on different database systems by various vendors,
their results will be the same
 Why?
 Why do we know that this is the case?
 How can database system vendors even guarantee this?
 And why are some not equivalent?
Wrong:
04a-7
SELECT studID
FROM Enrolled E, UnitOfStudy U
WHERE uosName = ’Database Systems I'
04a-8
The Big Idea
 Users request information from a database using a query
language
 A query that extracts information can be seen as calculating
a relation from combining one or more relations in the
current state of the database
 Relational algebra (RA) defines some basic operators that
can be used to express a calculation like that
 Each operator takes one or more relation instances, and produces a
relation instance (the operation part of the relational data model)
 Why is it important?
 Helps us understanding the precise meaning of declarative SQL
queries.
 Intermediate language used within DBMS (cf. chapter on db tuning)
04a-9
The Role of Relational Algebra in a DBMS
[cf. Kifer/Bernstein/Lewis, Figure 5.1]
What is an Algebra?
 A language based on operators and a domain of values:
 basic operators
 atomic operands (constants and variables)
 rules to form complex expressions from operators and operands,
typically using nesting with parentheses
 Example: Algebra of Arithmetic
 Operators for usual arithmetic operations: + - * /
 Operands: variables (x, y, z, …) and numerical constants
 Example arithmetic expression: 100 – ((x + y) / 2)
 In databases, operators and operands are finite relations,
which leads to the Relational Algebra
 We refer to expression as a query and the value produced as query result
04a-10
04a-11
Relational Algebra (RA)
 1. Set Operations
 Union ( ∪ ) tuples in relation 1 or in relation 2.
 Intersection ( ∩ ) tuples in relation 1, as well as in relation 2.
 Difference ( - ) tuples in relation 1, but not in relation 2.
 2. Operations that remove parts of a relation
 Selection ( σ ) selects a subset of rows from relation.
 Projection ( π ) deletes unwanted columns from relation.
 3. Operations that combine tuples from two relations
 Cross-product ( X ) allows us to fully combine two relations.
 Join ( ) to combine matching tuples from two relations.
 4. A schema-level ‘rename’ operation
 Rename ( ρ ) allows us to rename one field or even whole relation
 Domain: set of relations
 operators take one/more relations as inputs and gives a new relation as result
=> RA queries by nesting of multiple operators (cf. composition rules at end)
Visualisation of Relational Algebra
04a-12
Projection ( π ) Selection ( σ )
X =
Cross-product ( X )
- =
Set Minus ( - )
∩ =
Set Intersection ( ∩ )
∪ =
Set Union ( ∪ )
=
Join ( )
04a-13
Running Example
Student
sid name gender country
1001 Ian M AUS
1002 Ha Tschi F ROK
1003 Grant M AUS
1004 Simon M GBR
1005 Jesse F CHN
1006 Franzisca F GER
Student UnitOfStudy
gender
name
sid
title
uos_code
creditPoints
UnitOfStudy
uos_code title credit
Points
COMP5138 Relational DBMS 6
COMP5318 Data Mining 6
INFO6007 IT Project Management 6
SOFT1002 Algorithms 12
ISYS3207 IS Project 4
COMP5702 MIT Research Project 18
Enrolled
sid uos_code semester
1001 COMP5138 2005-S2
1002 COMP5702 2005-S2
1003 COMP5138 2005-S2
1006 COMP5318 2005-S2
1001 INFS6014 2004-S1
1003 ISYS3207 2005-S2
country semester
enrolled
04a-14
Set Operations
 These operations take two input relations R and S
 Set Union R ∪ S
 Definition: R U S = { t | t ∈ R ∨ t ∈ S }
 Set Intersection R ∩ S
 Definition: R ∩ S = { t | t ∈ R ∧ t ∈ S }
 Set Difference R - S
 Definition: R - S = { t | t ∈ R ∧ t ∉ S }
 Important constraint: R and S have the same schema
 R, S have the same arity (same number of fields)
 `Corresponding’ fields must have the same names and domains
04a-15
Projection
 ‘Deletes’ attributes that are not in projection list.
 Schema of result contains exactly the fields in the projection list with
distinct values, with the same names that they had in the input
relation.
 Examples:
Student
name country
Ian AUS
Ha Tschi ROK
Grant AUS
Simon GBR
Jesse CHN
Franzisca GER
∏ name, country (Student) ∏ title (UnitOfStudy)
UnitOfStudy
title
Relational DBMS
Data Mining
IT Project Management
Algorithms
IS Project
MIT Research Project
04a-16
Selection
 Selects rows that satisfy a selection condition.
 Example:
 Result relation can be the input for another relational
algebra operation! (Operator composition.)
 Example:
Student
sid name gender country
1001 Ian M AUS
1003 Grant M AUS
Student
name
Ian
Grant
σ country=‘AUS’ (Student)
∏ name ( σ country=‘AUS’ (Student) )
04a-17
Cross-Product
 Defined as: R x S = {t s | t ∈ R ∧ s ∈ S}
 each tuple of R is paired with each tuple of S.
 Resulting schema has one field per field of R and S, with field names
`inherited’ if possible.
 It might end in a conflict with two fields of the same name -> rename needed
 Sometimes also called Cartesian product
 Example:
α
β
1
2
α
β
β
γ
10
10
20
10
a
a
b
b
A B
R
C D E
S
A B C D
result
α
α
α
α
β
β
β
β
1
1
1
1
2
2
2
2
α
β
β
γ
α
β
β
γ
10
10
20
10
10
10
20
10
a
a
b
b
a
a
b
b
Ex =
04a-18
Joins
 Conditional Join:
 Example:
 Result schema same as the cross-product’s result schema.
 Sometimes called theta-join.
 Equi-Join: Special case where the condition c contains only equalities.
sid given family_name gender country empid first_name last_name room
1001 Cho Chung M AUS 47112344 Vera Chung 321
1004 Ciao Poon M CHN 12345678 Simon Poon 431
1004 Ciao Poon M CHN 99004400 Josiah Poon 482
1111 Alice Poon F AUS 12345678 Simon Poon 431
1111 Alice Poon F AUS 99004400 Josiah Poon 482
… … … … … … …
04a-19
Natural Join
 Natural Join:
 Equijoin on all common fields.
 Result schema similar to cross-product, but only one copy of fields
for which equality is specified.
UnitOfStudy
uos_code title points
COMP5138 Relational DBMS 6
COMP5318 Data Mining 6
INFO6007 IT Project Mgmt. 6
SOFT1002 Algorithms 12
ISYS3207 IS Project 4
COMP5702 MIT Research Project 18
Enrolled
sid uos_code
1001 COMP5138
1002 COMP5702
1003 COMP5138
1006 COMP5318
1001 INFO6007
1003 ISYS3207
result
sid uos_code title points
1001 COMP5138 Relational DBMS 6
1002 COMP5702 MIT Research Project 18
1003 COMP5138 Relational DBMS 6
1006 COMP5318 Data Mining 6
1001 INFO6007 IT Project Mgmt. 6
1003 ISYS3207 IS Project 4
=
04a-20
Rename Operation
 Allows to name, and therefore to refer to, the results of
relational-algebra expressions.
 Allows to refer to a relation by more than one name.
 Notation 1: ρ x (E)
 returns the expression E under the name X (rename whole relation)
 Notation 2: ρ X (a1  b1) (E)
 returns the result of expression E under the name X, and with the
attributes a1 …. renamed to b1 … (rename individual attributes)
 (assumes that the relational-algebra expression E has arity n)
 Example:
ρ Classlist( sid  student ) ( σ uos_code=‘ISYS2120' (Enrolled) )
Basic versus Derived Operators
 We can distinguish between basic and derived RA operators
 Only 6 basic operators are required to express everything
else:
 Union ( ∪ ) tuples in relation 1 or in relation 2.
 Set Difference ( - ) tuples in relation 1, but not in relation 2.
 Selection ( σ ) selects a subset of rows from relation.
 Projection ( π ) deletes unwanted columns from relation.
 Cross-product ( X ) allows us to fully combine two relations.
 Rename ( ρ ) allows us to rename one field to another name.
 Additional (derived) operations:
 intersection, join, division:
 Not essential, but (very!) useful.
 Cf. Join:
04a-21
04a-22
Relational Expressions 
 A basic expression in the relational algebra consists of
either one of the following:
 Variable that refers to a relation of the same name in the database
 A constant relation
 Let E1 and E2 be relational-algebra expressions; then
the following are all relational-algebra expressions:
 E1 ∪ E2
 E1 - E2
 E1 x E2
 E1 E2
 σP (E1), P is a (complex) predicate on attributes in E1
 πS (E1), S is a list consisting of some of the attributes in E1
 ρX (E1), x is the new name for the result of E1
04a-23
Equivalence Rules 
The following equivalence rules hold:
 Commutation rules
1. πA ( σp ( R ) ) = σp ( πA ( R ) )
2. R  S = S  R
 Association rule
1. R  (S  T) = (R  S)  T
 Idempotence rules
1. πA ( πB ( R ) ) = πA ( R ) if A ⊆ B
2. σp1 ( σp2 ( R )) = σp1 ∧ p2 ( R )
 Distribution rules
1. πA ( R ∪ S ) = πA ( R ) ∪ πA ( S )
2. σP ( R ∪ S ) = σP ( R ) ∪ σP ( S )
3. σP ( R  S ) = σP (R)  S if P only references R
4. πA,B(R  S ) = πA (R)  πB ( S ) if join-attr. in (A ∩ B)
5. R  ( S ∪ T ) = ( R  S ) ∪ ( R  T )
 These are the basis for the automatic optimisation of relational queries
04a-24
You should now be able …
 to understand the Foundations of SQL
 basic operations of Relational Algebra:
Projection, Selection, Rename, Set Operations, Cross Product
 the meaning of a relational join
 differences between an equi-join, a theta-join and a natural join
 to answer a (simple) English question on a given schema
with a relational algebra query
 to translate a relational algebra expression into a SQL query
 and vice versa… (with the help of the next lecture)
Time for a Break…
04a-25
04a-26
Outline
 Foundations of Declarative Querying
 Relational Algebra
 Six basic operations
 Join operation
 Composition and equivalence rules
 More SQL: Nested Subqueries
Based on slides from Kifer/Bernstein/Lewis (2006) “Database Systems”
and from Ramakrishnan/Gehrke (2003) “Database Management Systems”,
and also including material from Röhm.
Many Ways to Write this Query…
04a-27
SELECT actor_id
FROM Film_Actor a, Film f
WHERE a.film_id = f.film_id
AND f.title = 'VELVET TERMINATOR'
SELECT actor_id
FROM Film_Actor
WHERE film_id IN (SELECT film_id
FROM Film
WHERE title = 'VELVET TERMINATOR')
SELECT actor_id
FROM (SELECT actor_id, title
FROM Film_Actor NATURAL JOIN Film)R
WHERE title = 'VELVET TERMINATOR'
SELECT aid AS actor_id
FROM (SELECT f.film_id AS f1, a.film_id AS f2,
actor_id AS aid, f.title AS title
FROM Film f, Film_Actor a) R
WHERE f1=f2 AND title = 'VELVET TERMINATOR'
“Finds the actors (by ID) who played in the film VELVET TERMINATOR.”
05-28
Nested Subqueries
 SQL provides a mechanism for the nesting of subqueries
helping in the formulation of complex queries
 A subquery is a select-from-where expression that is
nested within another query.
 In a condition of the WHERE clause
 As a “table” of the FROM clause
Within the HAVING clause
 A common use of subqueries is to perform tests for
set membership, set comparisons, and set cardinality.
05-29
Example: Nested Queries
 Find the names of students who have enrolled in
‘ISYS2120’?
SELECT name
FROM Student
WHERE sid IN ( SELECT sid
FROM Enrolled
WHERE uos_code=‘ISYS2120’ )
 Which students have the same name as a lecturer?
SELECT sid, name
FROM Student
WHERE name IN ( SELECT name
FROM Lecturer )
Subquery is embedded in parentheses. In
this case it returns a list that will be used in
the WHERE clause of the outer query
The IN operator will test to see if the
SID value of a row is included in the
list returned from the subquery
Correlated vs. Noncorrelated Subqueries
 Noncorrelated subqueries:
 Do not depend on data from the outer query
 Execute once for the entire outer query
 Correlated subqueries:
Make use of data from the outer query
 Execute once for each row of the outer query
 Can use the EXISTS operator
05-30
05-31
SELECT name
FROM Student
WHERE sid IN ( SELECT DISTINCT sid
FROM Enrolled );
These are the only
students that have IDs
in the Enrolled table
2. The outer query
executes on the results
of the subquery and
returns the searched
student names
No reference to data in outer
query, so subquery executes
once only
1. The subquery
executes first and
returns as
intermediate result
all student IDs from
the Enrolled table
SID
Processing a Noncorrelated Subquery
05-32
Correlated Nested Queries
 With correlated nested queries, the inner subquery depends
on the outer query
 Example:
Find all students who have enrolled in lectures given by ‘Einstein’.
SELECT DISTINCT name
FROM Student, Enrolled e
WHERE Student.sid = e.sid AND
EXISTS ( SELECT 1
FROM Lecturers, UnitofStudy u
WHERE name = ‘Einstein’ AND
lecturer = empid AND
u.uos_code=e.uos_code )
Subquery
refers to
Enrolled
05-33
Processing a Correlated Subquery
1. First join the Student and
Enrolled tables;
2. get the uos_code of the 1. tuple
3. Evaluate the subquery for the
current uos_code to check
whether it is taught by Einstein
4. If yes, include in result.
5. Loop to step (2) until whole
outer query is checked.
Student |><| enrolled
Note: only the students that enrolled
in a course taught by Albert Einstein
will be included in the final results
Subquery refers to outer-
query data, so executes
once for each row of outer
query
05-34
In vs. Exists Function
 The comparison operator IN compares a value v with a set
(or multi-set) of values V, and evaluates to true if v is one of
the elements in V
 A query written with nested SELECT... FROM... WHERE... blocks
and using the = or IN comparison operators can always be
expressed as a single block query.
 EXISTS is used to check whether the result of a correlated
nested query is empty (contains no tuples) or not
05-35
In vs. Exists Function
 Find all students who have enrolled in lectures given by ‘Einstein’.
SELECT distinct name
FROM Student JOIN Enrolled E USING (sid)
WHERE EXISTS ( SELECT *
FROM Lecturer JOIN UnitOfStudy U
ON (lecturer=empid)
WHERE name = ‘Einstein’ AND
U.uos_code = E.uos_code )
Query using IN without a subquery
SELECT distinct name
FROM Student
WHERE Student.sid IN
(SELECT e.sid
FROM Enrolled e, Lecturer, UOS u
WHERE name = ‘Einstein’
AND lecturer = empid
AND u.uos_code = e.uos_code)
SELECT distinct students.name
FROM Student,Enrolled e,Lecturer,UOS u
WHERE Student.sid = e.sid
AND lecturer.name = ‘Einstein’
AND lecturer = empid
AND u.uos_code = e.uos_code
05-36
Set Comparison Operators in SQL
 (not) exists clause
 tests whether a set is (not) empty (true ⇔ R ≠ Ø) (true ⇔ R = Ø)
 unique clause (note: not supported by Oracle or PostgreSQL)
 tests whether a subquery has any duplicate tuples in its result
 all clause
 tests whether a predicate is true for the whole set
F comp ALL R ⇔ ∀ t ∈ R : (F comp t)
 some clause (any)
 tests whether some comparison holds for at least one set element
F comp SOME R ⇔ ∃ t ∈ R : (F comp t)
where
 comp can be: <, ≤, >, ≥, =, ≠
 F is a fixed value or an attribute
 R is a relation
05-37
Examples: Set Comparison
 Find the students with highest marks.
SELECT S.sid
FROM Student NATURAL JOIN Assessment
WHERE mark >= ALL ( SELECT mark
FROM Assessment
WHERE uos_code=‘ISYS2120’ )
 Find students which never repeated any subjects.
SELECT sid, name
FROM Student
WHERE unique ( SELECT uos_code
FROM Enrolled
WHERE Enrolled.sid = Student.sid )
05-38
Examples: Set Comparison (cont’d)
 SQL does not directly support universal quantification (for all)
 SQL Work-around:
Search predicates of the form “for all” or “ for every” can be
formulated using the not exists clause
 Example:
Find courses where all enrolled student already have a grade.
SELECT uos_code
FROM UnitOfStudy U
WHERE NOT EXISTS
( SELECT *
FROM Enrolled E
WHERE E.uos_code=U.uos_code
and grade is null )
Query execution
 Query execution time could greatly differ on different DBs.
 Cf Khushi, M; 2015 https://doi.org/10.1002/jcb.25049
04a-39
04a-40
References
 Kifer/Bernstein/Lewis (2nd edition – 2006)
 Chapter 5.1
one section on RA that covers everything as discussed here in the lecture
 Ramakrishnan/Gehrke (3rd edition - the ‘Cow’ book (2003))
 Chapter 4.2
one compact section on RA, including a discussion of relational division
 Ullman/Widom (3rd edition – 2008)
 Chapter 2.4
a nice and gentle introduction to the basic RA operations,
leaves out relational division though
 Chapters 5.1 and 5.2
goes beyond what we cover here in the lecture by extending RA to bags and
also introduces grouping, aggregation and sorting operators
51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: abby12468