Data Analytics Project for AVIA2601

You have been given a data set of flight delay records for July 2018 by the Head of Data
Analytics at American Airlines (AA). July 2018 was a very busy summer month for AA, and
many flights were delayed for a variety of reasons. For reporting purposes, the FAA asks
US carriers to report delay causes in the following five groups:

Delay Cause Group (variable name*)    Notes
CarrierDelay                          Carrier Delay, in Minutes
WeatherDelay                          Weather Delay, in Minutes
NASDelay                              National Air System Delay, in Minutes
SecurityDelay                         Security Delay, in Minutes
LateAircraftDelay                     Late Aircraft Delay, in Minutes
(*For the full data dictionary, please check the readme.html file that comes with data)
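
As a rough illustration of how the five delay-cause columns could be summarised, the sketch below computes each cause's share of total delay minutes for one carrier. The data frame is a tiny made-up sample, the carrier column name Reporting_Airline is an assumption to be checked against readme.html, and the sketch assumes the cause columns are blank for flights with no reported delay cause.

```python
import pandas as pd

# The five delay-cause variable names listed in the table above.
CAUSE_COLS = [
    "CarrierDelay",
    "WeatherDelay",
    "NASDelay",
    "SecurityDelay",
    "LateAircraftDelay",
]

# Made-up sample rows for illustration only; these are not real BTS records.
# "Reporting_Airline" is an assumed column name; verify it in readme.html.
sample = pd.DataFrame({
    "Reporting_Airline": ["AA", "AA", "DL"],
    "CarrierDelay": [10.0, None, 5.0],
    "WeatherDelay": [0.0, None, 0.0],
    "NASDelay": [20.0, None, 15.0],
    "SecurityDelay": [0.0, None, 0.0],
    "LateAircraftDelay": [30.0, None, 0.0],
})


def cause_shares(df, carrier="AA"):
    """Return each cause's share of the carrier's total reported delay minutes."""
    rows = df.loc[df["Reporting_Airline"] == carrier, CAUSE_COLS]
    totals = rows.fillna(0).sum()  # blank cause minutes are treated as zero
    return totals / totals.sum()


shares = cause_shares(sample)
print(shares.round(3))
```

Shares of minutes are only one possible view; counting how many flights report each cause may tell a different story.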

As a data analyst at AA, you are to harvest as many insights as possible from the available
data and advise the Head of Data Analytics, Dr. Wu, on how to improve AA’s flight scheduling
and operations in 2019. Your insights are therefore critical in shaping the new schedule and
future operations.

Passenger satisfaction is a top priority for AA, and flight on-time performance (OTP) is one
of the key factors affecting it. Any insights on flight delays, flight scheduling, and
aircraft ground operations at airports are therefore essential for future schedule
improvement, operational improvement and passenger satisfaction.

Data source & data dictionary: (OTP_July2018.csv) Download from
https://transtats.bts.gov/Tables.asp?DB_ID=120&DB_Name=Airline%20On-Time%20Performance%20Data&DB_Short_Name=On-Time.
Please download the Reporting Carrier On-Time Performance Data (the one at the bottom) and
choose the ‘July 2018’ full dataset for this project. The file I downloaded was about
291.7MB and came with a ‘readme.html’ data dictionary. Please read your data and the
dictionary carefully before embarking on your data project.
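
Since the file is fairly large, one option is to read only the columns you need. The sketch below shows the idea with pandas on a small in-memory stand-in for the CSV. The column names in WANTED are assumptions and should be checked against readme.html; load_otp is a hypothetical helper, not part of any library.

```python
import io

import pandas as pd

# Assumed column names; confirm each one in readme.html before relying on it.
WANTED = [
    "FlightDate", "Reporting_Airline", "Tail_Number", "Origin", "Dest",
    "DepDelay", "ArrDelay", "TaxiOut", "TaxiIn",
]


def load_otp(source):
    """Read only the WANTED columns; `source` is a file path or file-like object."""
    return pd.read_csv(
        source,
        usecols=WANTED,  # raises ValueError if a listed column is missing
        dtype={"Reporting_Airline": "category", "Origin": "category", "Dest": "category"},
        parse_dates=["FlightDate"],
    )


# Made-up stand-in for OTP_July2018.csv, including one column we do not need.
fake_csv = io.StringIO(
    "FlightDate,Reporting_Airline,Tail_Number,Origin,Dest,DepDelay,ArrDelay,TaxiOut,TaxiIn,Extra\n"
    "2018-07-01,AA,N123AA,DFW,LAX,12,5,18,7,x\n"
    "2018-07-01,DL,N456DL,ATL,JFK,-3,-10,15,6,y\n"
)

df = load_otp(fake_csv)
print(df.dtypes)
```

With the real file you would pass its path instead, for example load_otp("OTP_July2018.csv").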
Milestone #1 - Data Exploration and Visualisation
There are two milestones in your data project. Your job as a data analyst in Milestone #1 is
to explore this dataset and provide meaningful insights to Dr. Wu. You are free to explore
the data with Python (NO Excel and NO PySpark SQL!) but the following tasks must be
conducted:
• On-time performance (OTP) statistics for AA flights, grouped by airport, departure
or arrival, delays, aircraft tail number, delay causes, taxi-in and taxi-out delays, etc.
• Comparison with other airlines in the same dataset in meaningful ways, such as by the
same departure airport or the same period of departure/arrival time.
• How did taxi delays (taxi-out and taxi-in) contribute to overall flight delays? You can
group the insights by port, by time slot, or by airline.
• Visualisation of the above statistics for this dataset.
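
To give a feel for the kind of grouping these tasks involve, here is a minimal sketch on made-up data. It treats a flight as on time when it departs less than 15 minutes late (an assumed threshold; check the dictionary for the indicator columns the dataset provides), and it uses taxi-out time above the airport median as a purely illustrative proxy for a "taxi delay". All column names are assumptions to be verified against readme.html.

```python
import pandas as pd

# Made-up sample for illustration only; not real BTS records.
flights = pd.DataFrame({
    "Reporting_Airline": ["AA", "AA", "AA", "DL", "DL", "DL"],
    "Origin": ["DFW", "DFW", "ORD", "DFW", "DFW", "ORD"],
    "DepDelay": [5.0, 40.0, 20.0, -2.0, 10.0, 30.0],
    "TaxiOut": [15.0, 35.0, 25.0, 14.0, 16.0, 22.0],
})

ON_TIME_THRESHOLD_MIN = 15  # assumed definition of "on time"

# Rows without a departure delay (for example cancelled flights) are dropped here.
flights = flights.dropna(subset=["DepDelay"]).copy()
flights["OnTime"] = flights["DepDelay"] < ON_TIME_THRESHOLD_MIN

# OTP rate and mean taxi-out time per departure airport and airline.
otp = (
    flights.groupby(["Origin", "Reporting_Airline"])
    .agg(
        n_flights=("OnTime", "size"),
        otp_rate=("OnTime", "mean"),
        mean_taxi_out=("TaxiOut", "mean"),
    )
    .reset_index()
)

# Side-by-side airline comparison at the same departure airport.
comparison = otp.pivot(index="Origin", columns="Reporting_Airline", values="otp_rate")

# Illustrative taxi proxy: minutes above the airport's median taxi-out time.
flights["TaxiOutExcess"] = (
    flights["TaxiOut"] - flights.groupby("Origin")["TaxiOut"].transform("median")
)

print(comparison)
```

A table like comparison can be passed directly to a plotting call, such as comparison.plot(kind="bar") with matplotlib installed, for the visualisation task.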

Milestone #2 - Data Modelling and Insight Analysis
Your job in Milestone #2 is to develop models based on this dataset. You are free to explore
and model this data by using your knowledge of data modelling and aviation. Of course, you
need a pinch of creativity in this milestone. You are asked to model (but not limited to) the
following issues in this project:
• What factors cause departure delays, and how are delays affected by these
factors?
• Could you build a model to predict departure delays for a particular flight, a
particular period of time or a particular port? You can use any model you know;
you are not limited to those introduced in lectures.
• What other models could you build from this dataset?
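
As one possible starting point for the prediction question, the sketch below fits an ordinary least-squares baseline with one-hot encoded airport and departure hour, using only NumPy and pandas. This is just a simple baseline chosen for illustration, not a recommended final model. The data are made up, DepHour is a hypothetical feature you would derive yourself from the scheduled departure time, and design_matrix is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Made-up training sample; "DepHour" is a hypothetical derived feature.
train = pd.DataFrame({
    "Origin": ["DFW", "DFW", "ORD", "ORD", "DFW", "ORD"],
    "DepHour": [8, 18, 8, 18, 18, 8],
    "DepDelay": [2.0, 25.0, 6.0, 35.0, 21.0, 10.0],
})

FEATURES = ["Origin", "DepHour"]


def design_matrix(df, columns=None):
    """One-hot encode the features and add an intercept column."""
    X = pd.get_dummies(df[FEATURES].astype(str), columns=FEATURES, dtype=float)
    X.insert(0, "Intercept", 1.0)
    if columns is not None:
        # Align new data to the training columns; unseen levels become zeros.
        X = X.reindex(columns=columns, fill_value=0.0)
    return X


X_train = design_matrix(train)
y_train = train["DepDelay"].to_numpy()

# Least-squares fit; lstsq copes with the redundant dummy columns.
coef, *_ = np.linalg.lstsq(X_train.to_numpy(), y_train, rcond=None)

# Predict the departure delay for a new flight.
new_flights = pd.DataFrame({"Origin": ["DFW"], "DepHour": [18]})
X_new = design_matrix(new_flights, columns=X_train.columns)
predicted = X_new.to_numpy() @ coef
print(predicted)
```

On real data you would hold out part of the month for testing and compare this baseline against richer models before drawing conclusions.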

Your modelling is not limited to these questions, so go on and explore the data and produce
insights. You are more than welcome to expand the data to other months of 2018 or to July
in previous years. This will enrich your understanding and modelling of OTP analysis. If you
use 10 years of data, then you will be handling a dataset of about 2GB! Any insights that
can help AA are valuable, and insights leading to successful flight scheduling strategies
are preferred.

Assessment criteria
The compulsory tasks listed above for each milestone must be completed. Finishing them will
give you a Pass mark. To gain higher marks, you will need to explore the data further and
produce meaningful analysis or modelling based on the available data. Your CEO is looking for
meaningful discussion of your results and models, so pay attention to how you discuss your
results. Go further and push yourself in this project because that’s where the gold is!


Submission guide
All submissions must be done on Moodle; please check Moodle for exact deadlines. Please
also follow the submission guide:
1. Code: You are required to submit the original Jupyter Notebook file and other
associated files, including output files such as graphs. You may choose not to submit
the data file (due to its size). The assessor will use the Jupyter Notebook file to
verify your code, so make sure you provide sufficient ‘comments’ in your
Notebook.
2. Summary report: Data insights and modelling discussions should be provided in the
summary report (not in the working Jupyter file) for ease of reading and report
writing. The size of the report doesn’t matter, but quality discussions and insights do,
because they will earn you higher marks! Simply reporting results will earn a pass
mark only. The soft copy of your report MUST be in PDF format and contained in
ONE single PDF file for submission (a 20% penalty applies if you do not follow this
document preparation rule).
3. File naming convention:
a. Name your report file in the following format: zID_reportMilestoneX.pdf;
b. Name your Jupyter working file in the following format:
zID_JupyterNotebookMilestoneX.ipynb.


Submission check list:
• Create a new folder and name it ‘ZID_MilestoneX’.
• Create a sub-folder and name it ‘Codes’. Copy over all the contents of your working
Jupyter Notebook folder except the OTP data file.
• Create another sub-folder and name it ‘Reports’. Copy your
ZID_reportMilestoneX.pdf report file over.
• Zip the ‘ZID_MilestoneX’ folder. Mac users can right click and choose
“Compress ZID_MilestoneX” to create a zip file for submission. Windows users
may try WinZip, 7-Zip or other similar tools.
• Submit ZID_MilestoneX.zip on Moodle before the deadline.
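
If you prefer to script the packaging rather than do it by hand, the checklist steps could be sketched with the Python standard library as follows. build_submission is a hypothetical helper, the zID z1234567 and all file contents are placeholders, and the demo runs in a temporary directory so nothing on your machine is touched.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def build_submission(notebook_dir, report_pdf, zid, milestone, out_dir,
                     data_file="OTP_July2018.csv"):
    """Create <zid>_Milestone<N>/Codes and /Reports, then zip the folder."""
    root = Path(out_dir) / f"{zid}_Milestone{milestone}"
    # Copy the working folder into Codes, leaving out the OTP data file.
    shutil.copytree(notebook_dir, root / "Codes",
                    ignore=shutil.ignore_patterns(data_file))
    reports = root / "Reports"
    reports.mkdir(parents=True)
    shutil.copy2(report_pdf, reports / f"{zid}_reportMilestone{milestone}.pdf")
    # Writes <out_dir>/<zid>_Milestone<N>.zip containing the top-level folder.
    archive = shutil.make_archive(str(root), "zip",
                                  root_dir=out_dir, base_dir=root.name)
    return Path(archive)


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    work = tmp / "work"
    work.mkdir()
    (work / "z1234567_JupyterNotebookMilestone1.ipynb").write_text("{}")
    (work / "OTP_July2018.csv").write_text("placeholder data")
    report = tmp / "draft.pdf"
    report.write_bytes(b"%PDF-1.4")

    archive = build_submission(work, report, "z1234567", 1, tmp / "out")
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)
```

Always open the resulting zip and check its contents before uploading it to Moodle.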


Penalties for late submissions are heavy: a 10% reduction per day, so be on time! Don’t
submit at the last minute, because things are usually a lot ‘bumpier’ just before a deadline
(and everything could go wrong)!

Have fun. ^_<

Dr. C. Wu
CAO, AA.com