Tutoring Case - AVIA2601
Data Analytics Project for AVIA2601

You are given a data set of flight delay records for July 2018 by the Head of Data Analytics of American Airlines (AA) to analyse. July 2018 was a very busy summer month for AA, and many flights were delayed for a variety of reasons. For reporting purposes, the FAA asks US carriers to report delay causes in the following five groups:

Delay Cause Group (variable name*)    Notes
CarrierDelay                          Carrier Delay, in Minutes
WeatherDelay                          Weather Delay, in Minutes
NASDelay                              National Air System Delay, in Minutes
SecurityDelay                         Security Delay, in Minutes
LateAircraftDelay                     Late Aircraft Delay, in Minutes

(*For the full data dictionary, please check the readme.html file that comes with the data.)

You, as a data analyst at AA, are to harvest as many insights as possible from the available data and advise the Head of Data Analytics, Dr. Wu, on how to improve AA's flight scheduling and operations in 2019. Your insights are therefore critical in shaping the new schedule and future operations. Passenger satisfaction is a top priority for AA, and flight on-time performance (OTP) is one of the key factors that affect it. So, any insights on flight delays, flight scheduling, and aircraft ground operations at airports are essential for future schedule improvement, operational improvement and passenger satisfaction.

Data source & data dictionary: (OTP_July2018.csv)

Download from https://transtats.bts.gov/Tables.asp?DB_ID=120&DB_Name=Airline%20On-Time%20Performance%20Data&DB_Short_Name=On-Time. Please download the Reporting Carrier On-Time Performance Data (the one at the bottom) and choose the full 'July 2018' dataset for this project. The file I downloaded was about 291.7MB and came with a 'readme.html' data dictionary. Please read your data and the dictionary carefully before embarking on your data project.

Milestone #1 - Data Exploration and Visualisation

There are two milestones in your data project.
Your job as a data analyst in Milestone #1 is to explore this dataset and provide meaningful insights to Dr. Wu. You are free to explore the data with Python (NO Excel and NO PySpark SQL!), but the following tasks must be conducted:

• On-time performance (OTP) statistics for AA flights, grouped by airport, departure or arrival, delays, aircraft tail number, delay causes, taxi-in and taxi-out delays, etc.
• Comparison with other airlines in the same dataset in meaningful ways, such as by the same departure airport or the same period of departure/arrival time.
• How did taxi delays (taxi-out and taxi-in) contribute to overall flight delays? You can group the insights by port, by time slot, or by airline.
• Visualisation of the above statistics for this dataset.

Milestone #2 - Data Modelling and Insight Analysis

Your job in Milestone #2 is to develop models based on this dataset. You are free to explore and model this data using your knowledge of data modelling and aviation. Of course, you need a pinch of creativity in this milestone. You are asked to model (but are not limited to) the following issues in this project:

• What factors cause departure delays, and how are delays affected by these factors?
• Could you build a model to predict departure delays for a particular flight, a particular period of time or a particular port? You can use any model you know; you are not limited to those introduced in lectures.
• What other models could you build from this dataset?

Your modelling is not limited to these questions, so go on and explore the data and produce insights. You are more than welcome to expand the data to other months of 2018 or to July in previous years. This will enrich your understanding and modelling of OTP. If you use 10 years of data, you will be handling a dataset of about 2GB! Any insight that can help AA is valuable, and insights leading to successful flight scheduling strategies are preferred.
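As a starting point for the grouped OTP statistics asked for in Milestone #1, the sketch below summarises AA departures per origin airport with pandas. It is only an illustration under stated assumptions, not a prescribed method: the five delay-cause column names come from the brief, while Reporting_Airline, Origin, DepDel15 and TaxiOut are assumed names that should be confirmed against readme.html. The function names load_flights and otp_by_airport and the small demo table are hypothetical and made up for this example.

```python
import pandas as pd

# The five delay-cause columns named in the brief (all in minutes).
CAUSE_COLS = ["CarrierDelay", "WeatherDelay", "NASDelay",
              "SecurityDelay", "LateAircraftDelay"]

# ASSUMED column names: confirm each one in readme.html before use.
CARRIER, ORIGIN, DEP_DEL15, TAXI_OUT = (
    "Reporting_Airline", "Origin", "DepDel15", "TaxiOut")


def load_flights(path: str) -> pd.DataFrame:
    """Read only the columns used below to keep memory use down."""
    cols = [CARRIER, ORIGIN, DEP_DEL15, TAXI_OUT] + CAUSE_COLS
    return pd.read_csv(path, usecols=cols)


def otp_by_airport(flights: pd.DataFrame, carrier: str = "AA") -> pd.DataFrame:
    """Departure OTP, taxi-out time and delay-cause minutes per origin."""
    grouped = flights[flights[CARRIER] == carrier].groupby(ORIGIN)
    summary = grouped.agg(
        n_flights=(DEP_DEL15, "size"),
        # DepDel15 is assumed to be 1 for a departure delay of 15+ minutes.
        pct_on_time=(DEP_DEL15, lambda s: 100 * (1 - s.mean())),
        mean_taxi_out=(TAXI_OUT, "mean"),
    )
    # Total minutes attributed to each cause; missing values count as zero.
    return summary.join(grouped[CAUSE_COLS].sum())


# Tiny made-up table standing in for OTP_July2018.csv (not real data).
NAN = float("nan")
demo = pd.DataFrame({
    CARRIER: ["AA", "AA", "AA", "DL"],
    ORIGIN: ["DFW", "DFW", "CLT", "DFW"],
    DEP_DEL15: [0, 1, 0, 1],
    TAXI_OUT: [15, 25, 12, 30],
    "CarrierDelay": [NAN, 20, NAN, 5],
    "WeatherDelay": [NAN, 0, NAN, 0],
    "NASDelay": [NAN, 10, NAN, 0],
    "SecurityDelay": [NAN, 0, NAN, 0],
    "LateAircraftDelay": [NAN, 0, NAN, 25],
})
print(otp_by_airport(demo))
```

On the real file, load_flights("OTP_July2018.csv") would replace the demo table. Passing a different carrier code gives the same summary for another airline, which is one way to approach the comparison task, and the per-airport rates could also serve as a naive baseline when judging the Milestone #2 models.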
Assessment criteria

Compulsory tasks listed above for each milestone must be done. Finishing these will give you a Pass mark. To gain higher marks, you will need to explore the data further and carry out meaningful analysis or modelling based on the available data. Your CEO is looking for meaningful discussions of your results/models, so pay attention to how you discuss your results. Go further and trouble yourself in this project because that's where the gold is!

Submission guide

All submissions must be made on Moodle; please check Moodle for exact deadlines. Please also follow the submission guide:

1. Codes: You are required to submit the original Jupyter Notebook file and other associated files, including output files such as graphs. You can choose not to submit the data file (due to its size), though. The assessor uses the Jupyter Notebook file to verify your code, so make sure you provide sufficient 'comments' in your Notebook.
2. Summary report: Data insights and modelling discussions should be provided in the summary report (not in the working Jupyter file) for ease of reading and report writing. The size of the report doesn't matter, but quality discussions and insights do because they will give you higher marks! Simply reporting results will give you a Pass mark only. The soft copy of your report MUST be in PDF format and contained in ONE single PDF file for submission (a 20% penalty applies if you don't follow this document preparation rule).
3. File naming convention:
   a. Name your report file in the following format: zID_reportMilestoneX.pdf;
   b. Name your Jupyter working file in the following format: zID_JupyterNotebookMilestoneX.ipynb.

Submission check list: