FIT3003 Major Assignment - Semester 2, 2022 FIT3003 Major Assignment - Sem 2 2022 (Weight = 20%) Due date: Week 11, Monday 10-October-2022, 11:55pm Version: 1.0 – 26/08/2022 Learning Outcomes: LO1. Design multidimensional databases and data warehouses. LO2. Use fact and dimensional modelling. LO3. Implement online analytical processing (OLAP) queries. LO4. Explain the roles of data warehousing architecture and the concepts of granularity in data warehousing. LO5. Create business intelligence reports using data warehouses and OLAP. A. General Information and Submission o This is a group assignment. One group consists of two students from the same lab. You need to register your group composition through the FIT3003 Group Selection Form as soon as possible. o Submission method: Submission is online through Moodle. o Penalty for late submission: 10% deduction for each day (applies for the whole group). o Assignment coversheet: You will need to sign the assignment coversheet. o Contribution form: The contribution form needs to be completed by all members and signed (e-signature is acceptable) as an agreement between members. The contribution declaration template is shown on the Major Assignment FAQ page on the EdStem forum. o Assignment FAQ: There is a Major Assignment FAQ page set up on the EdStem forum. B. Problem Description MonCity is one of the largest smart cities in the world, and was developed by Monash University for education and research purposes. In MonCity, there are four different zone areas (ZoneA, ZoneB, ZoneC, ZoneD). A wide range of self-driving cars is provided for Monash students and staff members to travel between different zone areas or within the same zone area. MonCity has an existing operational database that maintains and stores all of the self-driving car-related information, such as the booking, maintenance, and accident records required for management's daily operation. However, in order to improve work efficiency, management at MonCity has decided to hire your team of data warehouse engineers to design and develop a data warehouse that can quickly generate reports based on their needs. Management at MonCity wants to generate reports to keep track of the bookings, accidents and maintenance information, such as calculating statistics of booking records and accident records, which can be used for 1 FIT3003 Major Assignment - Semester 2, 2022 self-driving car analyses later. For the accident details, they are particularly interested in the number of accidents per self-driving car (e.g., the number of accidents caused by Car01). MonCity’s operational database tables can be found in the MonCity account. You can, for example, execute the following query: Select * From MonCity.
; The data definition of each table in MonCity is as follows: Table Name Attributes and Data Types Notes FACULTY FACULTYID VARCHAR This table stores faculty information. One faculty can have multiple passengers.FACULTYNAME VARCHAR ZONE VARCHAR RESEARCHCENTER CENTERID VARCHAR This table stores information about research centres. One research centre may have several maintenance teams. CENTERNAME VARCHAR PHONE NUMBER OPENINGHOUR VARCHAR PASSENGER PASSENGERID VARCHAR This table stores passenger information. Each passenger is from a certain faculty and can have one or multiple booking records. PASSENGERNAME VARCHAR PASSENGERROLE VARCHAR PASSENGERGENDER VARCHAR PASSENGERAGE NUMBER FACULTYID VARCHAR ERROR ERRORCODE VARCHAR This table stores accident error information. One error code may involve more than one accident. ERRORMESSAGE VARCHAR 2 FIT3003 Major Assignment - Semester 2, 2022 MAINTENANCETYPE MAINTENANCETYPE VARCHAR This table stores maintenance type information. One maintenance type may involve more than one maintenance record. MAINTENANCEDESC RIPTION VARCHAR BOOKING BOOKINGID VARCHAR This table stores the information of booking records. One booking record belongs to a certain passenger and car. REGISTRATIONNO VARCHAR BOOKINGDATE DATE DEPARTUREZONE VARCHAR DESTINATIONZONE VARCHAR PASSENGERID VARCHAR CAR REGISTRATIONNO VARCHAR This table stores the information of self-driving cars. One car may have more than one booking, one accident, and one maintenance record. CARMODEL VARCHAR MANUFACTURINGYE AR NUMBER CARBODYTYPE VARCHAR NUMSEATS NUMBER MAINTENANCE MAINTENANCEID VARCHAR This table stores each car’s maintenance history. Each maintenance record belongs to a certain car, maintenance type and maintenance team. REGISTRATIONNO VARCHAR MAINTENANCEDATE DATE MAINTENANCETYPE VARCHAR MAINTENANCECOST NUMBER TEAMID VARCHAR ACCIDENTINFO ACCIDENTID VARCHAR This table stores the information of each accident, including accident ID, accident zone, etc. Each accident can be categorised into a certain error code. ACCIDENTZONE VARCHAR CAR_DAMAGE_SEVE RITY VARCHAR 3 FIT3003 Major Assignment - Semester 2, 2022 ERRORCODE VARCHAR CARACCIDENT REGISTRATIONNO VARCHAR This table stores information about accidents and all related cars. One accident can involve more than one car. ACCIDENTID VARCHAR MAINTENANCETEAM TEAMID VARCHAR This table stores information about maintenance teams. One maintenance team may have more than one maintenance record. Each maintenance team may be from more than one research center. TEAMLEADER VARCHAR PHONE NUMBER BELONGTO TEAMID VARCHAR This table stores the relationship between research centres and maintenance teams. CENTERID VARCHAR C. Tasks The assignment is divided into FIVE main tasks: 1. Design a data warehouse for the above MonCity database. You are required to create a data warehouse for the MonCity database. The management is interested in the following fact measures: ● Number of booking records ● Total maintenance cost ● Number of maintenance records ● Number of accident records The following are some possible dimension attributes that you may need in your data warehouse: ● Booking time period: by month ● Faculty ● Passenger age group: Young adults (18-35 years old); middle-aged adults (36-59 years old); old-aged adults (over 60 years old) 4 FIT3003 Major Assignment - Semester 2, 2022 ● Car body type (including number of seats) ● Research centre ● Maintenance type ● Accident Zone (based on accident zone in the ACCIDENTINFO table) ● Error code ● Car damage severity For each attribute, you may apply your own design decisions on specifying a range or group where this is not already detailed above, but make sure to specify them in your submission. - Preparation stage. Before you start designing the data warehouse, you have to ensure that you have explored the operational database and have done sufficient data cleaning. Once you have done the data cleaning process, you are required to explain what strategies you have taken to explore and clean the data The outputs of this task are: a) The E/R diagram of the operational database. b) If you have done the data cleaning process, explain the strategies you used in this process (you need to show the SQL to explore the operational database and SQL of the data cleaning, as well as the screenshot of data before and after data cleaning). - Designing the data warehouse by drawing star/snowflake schema. The star schema for this data warehouse may contain multi-fact(s). You need to identify the fact measures, dimensions, and attributes of the star/snowflake schema. The following queries might help you to identify the fact measures and dimensions: ● How many booking records were made by passengers from the IT faculty in July by using bus? ● How many booking records were made by each passenger age group? ● How many maintenance records were listed that include at least one maintenance team from research centre 4 (CE04)? ● List the total maintenance cost of each maintenance type for mini bus. ● How many accidents were recorded in the ZoneA for Car01? ● List the number of accidents for each error code at ZoneB for Car06. ● List the number of accidents with severe damage inflicted on Car05. 5 FIT3003 Major Assignment - Semester 2, 2022 You should pay attention to the granularity of your fact tables. You are required to create two versions of star/snowflake schemas based on different levels of aggregation. Version-1 should be in the highest level of aggregation. Version-2 should be in level 0, which means it contains no aggregation. Version Name Level Version-1 High aggregation Version-2 No aggregation (Level 0) The star/snowflake schema of both versions you created might contain bridge tables, determinant dimensions, and/or temporal dimensions. If you are using bridge tables, make sure to include the weight factor and list aggregate attributes. If you are using determinant dimensions, make sure the correct notation has been used (broken lines for the table boundary) and you must provide the reasons for your choice(s). You can use different temporal data warehousing techniques for the temporal dimensions. If there are any, you must provide the reasons for your choice(s). The outputs of this task are: c) Two versions of star/snowflake schema diagrams. d) 1. The reasons for the choice of determinant dimension(s) in your star schema, or the reason for its absence. 2. The reasons for the choice of SCD type(s) for any temporal dimensions in your star schema, or the reason for its absence. e) An explanation of the differences between the two versions of star/snowflake schemas. Note: The above explanation must be consistent with your star schema diagram and based on the assignment scenario. Please have a maximum of 300 words for each explanation. 2. Implement the two versions of the star/snowflake schemas using SQL. You are required to implement the star/snowflake schemas for the two versions that you have drawn in Task 1. That is, you may need to create the different fact and dimension tables for the two versions in SQL, and populate these tables accordingly. When naming the fact tables and dimension tables, you are required to give identical names for the two versions and suffix the version number to differentiate them. For example, you can use “MonCity_fact_v1” for version-1 and “MonCity_fact_v2” for 6 FIT3003 Major Assignment - Semester 2, 2022 version-2. If the dimension is the same between the two versions, you do not need to create them twice. The output is a series of SQL statements to perform this task. You will also need to show that this task has been carried out successfully. Note: ● If your account is full, you will need to drop all of the tables that you have previously created during labs. ● If you have dopped all tables in you account and you still encounter the ORA-01536: space quota exceeded for tablesace ‘TABLE_NAME’, please check your SQL code whether you have properly joined all tables. This issue was mainly caused when you did not do the table join properly as the number of records multiplied during the process. The outputs of this task are: a) SQL statements (e.g., create table, insert into, etc.) to create the star/snowflake schema Version-1. b) SQL statements (e.g., create table, insert into, etc.) to create the star/snowflake schema Version-2. Note: The SQL statements for both levels of star schema must be presented in the PDF file. 3. Create the following reports using OLAP queries. You are required to generate several reports using the data warehouse version-1 (high aggregation) that you have implemented in Task 2. For each report, you should produce the SQL command and sample report output. Note: the table snapshots of the following questions are only for reference purposes. 7 FIT3003 Major Assignment - Semester 2, 2022 3.1. OLAP Queries REPORT 1: MonCity’s cumulative number of booking records of each month for Faculty of IT Create an OLAP SQL command that will produce the following report (including the same output order): FacultyID Month Total bookings Cumulative number of booking records FIT January 260 260 FIT February 230 490 FIT March 234 724 FIT April 228 952 FIT May 245 1,197 FIT June 252 1,449 FIT July 249 1,698 FIT August 245 1,943 FIT September 274 2,217 FIT October 256 2,473 FIT November 251 2,724 FIT December 251 2,975 The outputs of this task are: (a) The SQL command, and (b) The screenshot of the query results (or part of the query results), including all attribute names. REPORT 2: MonCity’s maintenance report Create an OLAP SQL command that will produce the following report (including the same output order): Team ID Car body type Total number of maintenance Total maintenance cost All Teams All Car Body Types 399 125,300 All Teams Bus 136 44,900 All Teams Mini Bus 113 34,000 8 FIT3003 Major Assignment - Semester 2, 2022 All Teams People Mover 150 46,400 T002 All Car Body Types 197 62,700 T002 Bus 58 18,400 T002 Mini Bus 62 19,300 T002 People Mover 77 25,000 T003 All Car Body Types 202 62,600 T003 Bus 78 26,500 T003 Mini Bus 51 14,700 T003 People Mover 73 21,400 The outputs of this task are: (a) The SQL command, and (b) The screenshot of the query results (or part of the query results), including all attribute names. REPORT 3: MonCity’s rank analysis for the number of accidents Create an OLAP SQL command that will produce the following report (including the same output order): Error Code Registration No. Car body type Total number of accidents Rank Error001 Car01 Bus 13 1 Error001 Car12 Mini Bus 12 2 Error001 Car19 Mini Bus 12 2 Error001 Car04 Bus 12 2 Error001 Car08 Bus 11 3 Error001 Car20 Mini Bus 11 3 Error002 Car22 People Mover 45 1 Error002 Car27 People Mover 42 2 Error002 Car23 People Mover 39 3 Error002 Car30 People Mover 39 3 Error003 Car06 Bus 12 1 Error003 Car14 Mini Bus 12 1 Error003 Car10 Bus 11 2 Error003 Car01 Bus 11 2 Error003 Car12 Mini Bus 10 3 Error003 Car09 Bus 10 3 9 FIT3003 Major Assignment - Semester 2, 2022 Error004 Car12 Mini Bus 13 1 Error004 Car15 Mini Bus 10 2 Error004 Car20 Mini Bus 9 3 Error004 Car18 Mini Bus 9 3 Error004 Car04 Bus 9 3 Error005 Car16 Mini Bus 11 1 Error005 Car08 Bus 10 2 Error005 Car20 Mini Bus 10 2 Error005 Car19 Mini Bus 9 3 Error005 Car05 Bus 9 3 Error005 Car01 Bus 9 3 Error005 Car12 Mini Bus 9 3 The outputs of this task are: (a) The SQL command, and (b) The screenshot of the query results (or part of the query results), including all attribute names. REPORT 4: MonCity’s booking report Create an OLAP SQL command that will produce the following report (including the same output order): Car body type Age group Faculty ID Total number of bookings People Mover All Age Groups All Faculties 3,396 People Mover All Age Groups ART 453 People Mover All Age Groups BUS 314 People Mover All Age Groups ENG 841 People Mover All Age Groups FIT 1,009 People Mover All Age Groups SCI 779 People Mover Group1 All Faculties 1,380 People Mover Group1 ART 169 People Mover Group1 BUS 121 People Mover Group1 ENG 382 People Mover Group1 FIT 390 People Mover Group1 SCI 318 People Mover Group2 All Faculties 1,722 People Mover Group2 ART 284 10 FIT3003 Major Assignment - Semester 2, 2022 People Mover Group2 BUS 193 People Mover Group2 ENG 429 People Mover Group2 FIT 497 People Mover Group2 SCI 319 People Mover Group3 All Faculties 294 People Mover Group3 ENG 30 People Mover Group3 FIT 122 The outputs of this task are: (a) The SQL command, and (b) The screenshot of the query results (or part of the query results), including all attribute names. 3.2. Reports with rollup and partial rollup Produce two reports that contain subtotals and one fact measure, using rollup and partial rollup. REPORT 5: Produce one booking-related report that is useful for management that uses rollup. REPORT 6: Produce one booking-related report that is useful for management that uses partial rollup. The outputs of this task are: (a) The query questions written in English, (b) An explanation of the differences between rollup and partial rollup, (c) The SQL commands that contain rollup and partial rollup, and (d) The screenshots of the query results (or part of the query results). 3.3. Report with moving and cumulative aggregates Produce two reports containing moving and cumulative aggregates and one fact measure. REPORT 7: Produce one moving aggregate report that relates to the Booking information. The report must contain or use the month information and number of Bookings in the output. REPORT 8: Produce one cumulative aggregate report that relates to the maintenance information. 11 FIT3003 Major Assignment - Semester 2, 2022 The report must contain or use the number of maintenance records or the total maintenance cost in the output. The outputs of this task are: (a) The query questions written in English, (b) Your explanation of why such a query is necessary or valuable for management, (c) The SQL commands that contain moving and cumulative aggregates, and (d) The screenshots of the query results (or part of the query results). 4. Business Intelligence (BI) Reports Choose any four reports from Task 3 and change the presentation of these reports in a BI report format. Create one dashboard based on your chosen reports. This new representation should be appealing to management. Additionally, in these new reports, you might want to include some selection buttons, which may give users options on what criteria to include so that the graph report will be more dynamic. 5. Final Recommendations/Suggestions Due to the successful operation of the MonCity project, Monash University decided to invest further in the self-driving car project. As a data warehouse engineer: ● You are required to provide a suggestion with valid supporting data to help management decide on how to improve current self-driving car projects, such as which car body type should be further invested in based on the Booking, maintenance, and accident records, or which error code has contributed to the most number of accidents, etc.; or ● If you have a different opinion on the self-driving car project, provide a reason why the project is not worth further investment. Note: The above suggestion must contain valid supporting data from the MonCity database and data warehouse. This could be tables, graphs, or charts. Please use a maximum of 500 words for your suggestions. 12 FIT3003 Major Assignment - Semester 2, 2022 D. Checkpoints Checkpoint Weight Assessment Due date Checkpoint 1 1% ER Diagram Group contribution check Week 7 (during lab) Checkpoint 2 1% Star Schema Version-1 Group contribution check Week 9 (during lab) The Checkpoints will only be assessed during your allocated lab. Your group is required to complete the assessment for each checkpoint in order to obtain the allocated mark. There are associated mark penalties for not meeting the checkpoint assessment on time to a satisfactory state. The member contribution will be checked by your allocated tutor regularly. Your contribution will be based on team member reviews if you have been absent from one of the checkpoint assessments, and the Major Assignment grade will be adjusted accordingly. Note that the Final Report and Implementation are worth 18%. E. Submission Checklist 1. One combined .pdf file containing all tasks mentioned above: Cover page A signed coversheet A contribution declaration form: Each student must state the parts of the assignment that they completed. An example is as follows: Percentage of contribution: 1. Name: Adam, ID: 210008, Contribution: 60% 2. Name: Ben, ID: 230933, Contribution: 40% List of parts that each student completed: 1. Adam: list the parts that Adam did 2. Ben: list the parts that Ben did Task C.1 (outputs a, b, c, d, e) Task C.2 (outputs a, b) Task C.3 Reports 1-4 (outputs a, b) Task C.3 Reports 5-8 (outputs a, b, c, d) Task C.4 (screenshot of a dashboard with four reports) 13 FIT3003 Major Assignment - Semester 2, 2022 2. .sql files for the following task: Task C.1 (SQL command as required by output b) Task C.2 Implementation of the Star Schemas (SQL command as required by output a and b) Task C.3 Reports 1-4 (SQL command as required by output a) Task C.3 Reports 5-8 (SQL command as required by output c) All of the above SQL files must be runnable in Oracle. Zero marks will be given for the implementation if: ● SQL files are not runnable, or ● SQL files are missing, or ● Wrong SQL files are submitted, including showing inconsistent SQL statements between pdf and SQL files. 3. .pbix file for the following task: Task C.4 (a dashboard with four reports) 4. Zip all the SQL files from #2 and the file from #3, and name the ZIP folder as MA_SQL_BI.zip. Submission Method: 1. Upload the PDF file from Checklist #1 and the ZIP file from Checklist #4 to Moodle by the due date: Monday, 10 October 2022, 11:55pm. ● The submission of this assignment must be in the form of a single PDF file AND a single ZIP file. No other forms will be accepted. ● One member of your group can upload the submission. However, please note that all group members must click the submit button and accept the submission statement (failure to do so will cause your assignment submission to be in draft mode and will incur late penalties). ● You must ensure that you have all the files listed in this checklist before submitting your assignment to Moodle. Failure to submit a complete list of files will lead to mark penalties. 2. Penalty for late submission: 10% deduction for each day, including weekends 14 FIT3003 Major Assignment - Semester 2, 2022 3. Submission cut-off time: Monday, 17 October 2022, 11:55 pm. The submission link will be unavailable after this time. Getting help and support: What can you get help for? ● Consultations with the Teaching Team Talk to the Teaching Team: https://lms.monash.edu/course/view.php?id=140811§ion=2 ● English language skills Talk to English Connect: https://www.monash.edu/english-connect ● Study skills Talk to a learning skills advisor: https://www.monash.edu/library/skills/contacts ● Counselling Talk to a counsellor: https://www.monash.edu/health/counselling/appointments Extensions: If you are experiencing difficulties that you think will impact your ability to meet this deadline, you may apply for an assignment extension. You must apply no later than two university working days after the due date of this assignment (before Wednesday, 19 October 2022, 11:55 PM). The extension application can be found on Moodle > Assessments > How to Apply for an Extension. Please allow two business days for your application to be processed. Please ensure your application is supported by appropriate documentation. You can find more information about assignment extensions at the Special Consideration website. Special Consideration: Students should carefully read the Special Consideration website, especially the details about the formal documentation required. All special consideration requests should be made using the Special Consideration Application. Please do not assume that submission of a Special Consideration application guarantees that it will be granted – you must receive an official confirmation that it has been granted. 15 FIT3003 Major Assignment - Semester 2, 2022 Late Penalty: Late assignments submitted without an approved extension may be accepted up to a maximum of seven days with the approval of the Chief Examiner and/or Lecturer but will be penalised at the rate of 10% per day (including weekends and public holidays). Assignments submitted more than seven days after the due date will receive a zero mark for that assignment and may not receive any feedback. Please note (late penalties and extensions): 1. An inability to manage your time or computing resources will not be accepted as a valid excuse. (Several assignments being due at the same time are a fact of university life.) 2. Group issues, hardware failures, whether of personal or university equipment, are not normally recognised as valid excuses. Failure to back up assignment files is also not recognised as a valid excuse. Plagiarism and Collusion: Monash University is committed to upholding standards and academic integrity and honesty. Please take the time to view these links. Academic Integrity Module Student Academic Integrity Policy Test your knowledge, collusion (FIT No Collusion Module) END OF MAJOR ASSIGNMENT 16 欢迎咨询51作业君