Exam DIT032 / DAT335 – Data Management Wednesday, August 22nd, 2018 14:00 - 18:00 Examiner: Philipp Leitner Contact Persons During Exam: Philipp Leitner (+46 733 05 69 14) Joel Scheuner (+46 733 42 41 26) (visitations around 15:00 and 16:30) Allowed Aides: None except English dictionary (non-electronic), pen/pencil, ruler, and eraser. Check the back of your exam papers for additional help in the appendix. Results: Exam results will be made available no later than in 15 working days through Ladok. Grade Limits: For GU students: 0 – 49 pts: U, 50 – 89 pts: G, 90+ pts: VG For Chalmers students: 0 – 49 pts: U, 50 – 69 pts: 3, 70 – 89 pts: 4, 90+ pts: 5 Review: The exam review will take place Friday, September 17th (11:00 - 12:00) in room 473 (Jupiter building, 4th floor). Please refer to GUL for potential updates. 1 Task 1 – Theory and Understanding (30 pts) (every question can and should be answered in a few sentences, plus an example if asked for) Q1.1: What is an index in SQL? What is it generally used for? Q1.2: What is a “weak entity” in the Entity Relationship model? What is special about them (e.g., in terms of key requirements)? Q1.3: What are the four “CRUD” operations? Describe each in one sentence. Q1.4: How does replication work, and what is it used for? What basic replication models are there? Q1.5: Name and briefly describe the 5 “V”s of Big Data. Q1.6: Briefly describe the advantages and disadvantages often associated with NoSQL databases (in comparison to relational databases). 2 Task 2 – EER Diagrams (24 pts) Consider the following excerpt of the domain description for a conference database. Model the described domain using an EER diagram with the notation we used in the course (find a cheat sheet in the appendix of your exam papers). Use the 1,N,M notation for describing cardinalities rather than the min-max notation. Conference Management: The database needs to keep track of conferences. Conferences have an unique name, a location, a duration, and an edition (e.g., 2nd edition). The duration is specified through a begin and end date. We also want to track how many conferences there are in total in the system. Conferences are attended by delegates (i.e., people attending the conference). Delegates can attend multiple conferences, and clearly a conference can have many delegates attending. For each delegate, we need to store when they registered for each conference (“reg date”). For each delegate, we further need to store their registration number (which is unique for each delegate), their name, country, and date of birth. Conferences further consist of tracks. Tracks have a name which is unique for a conference (but not necessarily unique overall), a date, start time, and end time. Every track has exactly one track chair, who is a special delegate responsible for coordinating a track. For delegates that are also track chairs, we need to save the organisation that they represent in the conference (this is not needed for regular delegates). A track chair can chair multiple tracks, but they need to chair at least one to be considered a track chair. Finally, tracks consist of presentations. For each presentation, we need to save the title, which is unique in a track (but, again, not necessarily unique overall), and the start time. Each presentation is given by exactly one delegate (the presenter), but there may be delegates who are not giving any presentations. Each delegate may give at most one presentation. 3 Task 3 – Relational Model and Algebra (20 pts) Relational Model: DURATION(id, begin, end) CONFERENCE(name, location, duration) CONFERENCE.duration is a foreign key pointing at DURATION.id DELEGATE(reg nr, name, date of birth) REGISTRATION(delegate, conference, registration date) REGISTRATION.delegate is a foreign key pointing at DELEGATE.reg nr REGISTRATION.conference is a foreign key pointing at CONFERENCE.name Given this relational model, write relational algebra statements that exactly represent the following queries. Use the mathematical notation from the course (for the correct notation you can again refer to the appendix). Queries: Q3.1: Find all delegate names registered to a conference at or after January 1st, 2018. Q3.2: List all conference names along with their duration (calculated as end date minus begin date). Q3.3: Find all pairs of conference names of conferences in the same location. It is ok if the same pairs appear multiple times in the list. Q3.4: Find the oldest delegate (i.e., the delegate with the smallest date of birth) attending the conference with the name “ICSE2018”. 4 Task 4 – SQL (20 pts) Given the same relational model as for Task 3, write the following SQL statements or queries. Statements: Q4.1: Delete all registrations to the conference with the name “ICSE2018” from before January 1st, 2018. Q4.2: Return the name and location of conferences in one of the following locations: Gothenburg, Stockholm, Copenhagen, Oslo, or Helsinki. Do not use more than a single conditional statement in the “where” clause of the query. Q4.3: Return a list of all delegate names, their dates of birth,the name of the conferences that they are registered for, and the location of the conference (one return row per registration). Delegates not registered to any conference should not be contained in the result. Q4.4: Create a new index on for the “location” field of the CONFERENCE table. Task 5 – XML (6 pts)
This minimal XML snippet is not well-formed (i.e., it violates the basic syntactical rules of XML). Name and briefly explain at least 3 reasons for this document not being well-formed. 5 Appendix: Notation Guidelines for EER and RA 6 MeaningSymbol Entity Weak Entity Indentifying Relationship Relationship Composite Attribute . . . Key Attribute / Dashed Underline for Partial Key Attribute Derived Attribute Multivalued Attribute Total Participation of E2 in RRE1 E2 Cardinality Ratio 1: N for E1 : E2 in RRE1 E2 N1 d SUBCLASS A SUBCLASS B SUPERCLASS SUBCLASS C Total Disjoint Specialization ------- SUBCLASS A SUBCLASS B SUPERCLASS SUBCLASS C Partial Overlapping Specialization o Table 8.1 Operations of Relational Algebra OPERATION PURPOSE NOTATION SELECT Selects all tuples that satisfy the selection condition from a relation R. σ(R) PROJECT Produces a new relation with only some of the attributes of R, and removes duplicate tuples. π(R) THETA JOIN Produces all combinations of tuples from R1 and R2 that satisfy the join condition. R1 R2 EQUIJOIN Produces all the combinations of tuples from R1 and R2 that satisfy a join condition with only equality comparisons. R1 R2, OR R1 (), () R2 NATURAL JOIN Same as EQUIJOIN except that the join attributes of R2 are not included in the resulting relation; if the join attributes have the same names, they do not have to be specified at all. R1* R2, OR R1* (), () R2 OR R1 * R2 UNION Produces a relation that includes all the tuples in R1 or R2 or both R1 and R2; R1 and R2 must be union compatible. R1 ∪ R2 INTERSECTION Produces a relation that includes all the tuples in both R1 and R2; R1 and R2 must be union compatible. R1 ∩ R2 DIFFERENCE Produces a relation that includes all the tuples in R1 that are not in R2; R1 and R2 must be union compatible. R1 – R2 CARTESIAN PRODUCT Produces a relation that has the attributes of R1 and R2 and includes as tuples all possible combinations of tuples from R1 and R2. R1 × R2 DIVISION Produces a relation R(X) that includes all tuples t[X] in R1(Z) that appear in R1 in combination with every tuple from R2(Y), where Z = X ∪ Y. R1(Z) ÷ R2(Y) ℱ(R) whereas is a list of [MIN|MAX|AVERAGE|SUM|COUNT]