FIT5137 S2 2024 Assignment 3: PTV Assignment Scenario
(Weight = 35%)
Due date: Friday, 25 October 2024, 4:30 PM
Version:2.0–21/08/2024
General Information and Submission
● This isanindividualassignment.
● Submission method:Submissionis online throughMoodle.
● Penalty forlatesubmission:5% deductionfor each day.
● Assignment FAQ: Thereis anAssignment3FAQpage set upontheEdStemforum.
Assignment Background
YouhavebeenhiredasadataanalystatPublicTransportVictoria(PTV),theVictoriaGovernment
authority responsible for public transport in the state. Some of your duties are data extraction,
integration and analysis to provide good understanding regarding the public transportation condition
inVictoriatothestakeholders.
After the COVID-19 restrictions were lifted, most companies are switching the workstyle from
work-from-home to face-to-face. Therefore, transportation infrastructure and network is one of the
most important aspects. While some people prefer to drive to work, some other people prefer to use
the public transportation network as their main transportation mode. PTVasthesoleproviderforthe
public transportation network reduced their services during the lockdown period. Now, PTV has
restored theservicestocoverasmanyareasaspossibleinthewholeregion.However,somequestions
remained mysteries. How good is the current PTV coverage? Are there anyuncoveredspots?Which
areahasthebestpublictransportationoptions?
Therefore, as a data analyst, your task is to evaluate the data and provide the spatial dataanalysisto
the stakeholders of PTV. The data should be presented inanarealevel,suchasmunicipality,suburbs
or postcode. For example, you may present “The number of bus services in Bundoora” or "The
numberofTrainsorTramsnetworkinBundoora”.
Data
There are two datasets that you have to obtain in this assignment, which are the PTV/GTFS
datasetandAustralianBoundarydata.
The General Transit Feed Specification (GTFS) is a data specification that allows public
transit agencies to publishtheirtransitdatainaformatthatcanbeconsumedbyawidevarietyof
software applications. Today, the GTFS data format is used by thousands of public transport
providers.
GTFS is split into a schedule component that contains schedule, fare, and geographic transit
information and a real-time component that contains arrival predictions, vehicle positions and
service advisories. A GTFS feed iscomposedofaseriesoftextfilescollectedinaZIPfile.Each
filemodelsaparticularaspectoftransitinformation:stops,routes,trips,andotherscheduledata.
For more detailed information about GTFS, you canrefertotheofficialdocumentationprovided
by Google at https://developers.google.com/transit/gtfs. Additionally, You can read further
explanation about the PTV-GTFS data from https://transitfeeds.com/p/ptv/497. For this
assignment,wewillbeusingthe17thMarch2023versionofthedataset.
TheGTFSdatastructureisshownbelow:
The Australian digital boundary is defined by the Australian Bureau of Statistics using the
Australian Statistical Geography Standard (ASGS).TheASGSisaclassificationofAustralia
into a hierarchy of statistical areas. It is a social geography, developed to reflect the location of
people and communities. It is used for the publicationandanalysisofofficialstatisticsandother
data. The ASGS is updated every 5 years to account for growth and change in Australia’s
population, economy and infrastructure. For the 2021 release, the ASGSwillbere-namedtothe
AustralianStatisticalGeographyStandard(ASGS)Edition3.
The ASGS is split into two parts, the ABS and Non ABS Structures. The ABS Structures are
geographies that the ABS designs specifically for the release and analysis of statistics. This
means that the statistical areas are designed to meet the requirements of statisticalcollectionsas
well as geographic concepts relevant to those statistics. This helps to ensure the confidentiality,
accuracyandrelevanceofABSdata.TheNonABSStructuresgenerallyrepresentadministrative
regions which are not defined or maintained bytheABS,butforwhichtheABSiscommittedto
directlyprovidingarangeofstatistics.
The Main Structure is developed by the ABS and isusedtoreleaseandanalyseabroadrangeof
social, demographic and economic statistics. It is a nested hierarchy of geographies, and each
level directly aggregates to the next level. Mesh Blocks (MBs) are the smallest geographic
areas defined by the ABS and form the building blocks forthelargerregionsoftheASGS.
MostMeshBlockscontain30to60dwellings.
Belowisthesimplified ABSandNonABSStructure.Youcanreadfurtherexplanationaboutthe
structurehere
https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition
-3/jul2021-jun2026#overview
TheDigitalboundaryfilesthatyouhavetogetistheMeshBlocksdataset.TheMeshBlocks
datasetisavailableasShapefile.YoucanreadfurtherexplanationabouttheMeshBlocksdataset
here
https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition
-3/jul2021-jun2026/access-and-downloads/digital-boundary-files
Allocationfilesarenon-spatialrepresentationsofhoweachgeographyisaggregatedfromtheir
buildingblockgeography.YoucanalsoreadfurtherexplanationabouttheAllocationfilesdataset
here
https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition
-3/jul2021-jun2026/access-and-downloads/allocation-files
AssignmentTasklist
Your assignment consists of several parts. Always read the instruction one byone.Donotmove
tothestepwithoutcompletingthepreviousstep:
● Task 1: Data Restoration - Restore the data to the database. Monitor the success
indicatortoensuresuccessfulrestorationofthedata.
● Task 2: Data Preprocessing - Perform necessary structure maintenance and create
resulttablesforfurtherprocessing.
● Task 3: Data Analytics and Visualization - Develop SQL queriestoanalyzethedata
and evaluate performance & Create visualizations to present the results of the data
analytics.
● Nodatacleaningrequiredforthisassignment.
● Formoreinformation,seetheFAQforAssignment3.
For simplicity, all the data required for this assignment is readily available in thePostGIS
Docker container. You can access these datasets within the container by navigating to the
/data/adatafolder.Ifyoudon’tknowhowtodoit,refertothelabs10activities.
Verifyyourdatabeforetherestorationprocess.
As a data analyst, it is your responsibility to understand and
explore these publicly available data.
Assignment Task
Task 1: Data Restoration
Before you can start the data analyticprocesses,thefirstthingyouhavetodoistorestorethe
external data to your database. Make sure you prepare a destination schema to restore your
data.Thedestinationschemaforyourassignmentis“ptv”.
Note:
● Before initiating the data restoration process, it is essential to thoroughly explore
the dataset.Thisexplorationinvolvesidentifyingappropriatedatatypes,determining
field lengths, and making other relevant considerations that will inform the creation
ofthetablestructure.
● Ensure that you restore the data into the PTV schema using regular (local)
tables. Donotutiliseforeigntables,asthedatamustbestoreddirectlywithinthe
PostgreSQLdatabase.
● Ensurethatalltablesaresuccessfullyrestored,including8tablesfromGTFS
and3tablesforMB_2021,LGA_2021andSAL_2021respectively.
Theoutputs of thistask for Reportare:
a) AttachascreenshotoftheresultstoincludeallthetablesyourestoredinTask1,
includingthenumberofrowsforeachtableyourestoredbyusingfollowingcode:
withtblas
(selecttable_schema,TABLE_NAME
frominformation_schema.tables
wheretable_schemain('ptv'))
selecttable_schema,TABLE_NAME,
(xpath('/row/c/text()',query_to_xml(format('selectcount(*)ascfrom%I.%I',table_schema,
TABLE_NAME),FALSE,TRUE,'')))[1]::text::intASrows_n
fromtbl
orderbytable_name;
Task 2: Data Preprocessing for Melbourne Metropolitan area
The purpose of this section is to manipulate the data into a suitable format for the following
taskanalysis.Thistaskhastwoparts:MandatoryrequirementandOptionalrequirements.
Mandatoryrequirement
[Youmustmeetthemandatoryrequirementsdescribedinthissection.]
In this assignment, we aim to explore the transportation accessibility [Topic of report] of
the Melbourne Metropolitan area exclusively [Scope of report]. The mb_2021 table
contains mesh blocks for the entirecountryofAustralia.Tominimisequerycosts,ensurethat
you only use the mesh blocks within the Melbourne Metropolitan area for this assignment.
The Melbourne Metropolitan’s mesh blocks can be identified from the gcc_name21. If the
column contains “Greater Melbourne”,thismeshblockislocatedinMelbourneMetropolitan.
As a result, you need to create a table called "mb2021_mel" that contains ONLY the mesh
blocksinMelbourneMetropolitan.
Optionalrequirements:
[You are free to explore and manipulate the data creatively within the mandatory
requirements, which are limited to Melbourne Metropolitan for the topic of transport
accessibility.]
For optional requirements can be selected based on your specific data analysis needs.Make
sure to include a detailed explanationofyourrationaleinthereportforoptionalrequirements
youchoose.
Question:
DoIhavetoansweratleastoneoftheoptionalrequirements?
Answer:
No, you are free to explore and manipulate the data creatively as long as the data is
analysedinMelbourneMetropolitanforthetopicoftransportaccessibility.
The following suggestion may useful for data exploration and analysis transportation
accessibilityofMelbourneMetropolitanarea:
1.Since the working area will be Melbourne Metropolitan, it is important to have a polygon
for the boundary of our working area. Hint: aggregate all mesh blocks polygon tocreateone
largepolygonforMelbourneMetropolitanboundary.
2.Stops table does not have any geometry column. It might be useful to add a geometry
column, using the latitude and longitude values available in the table. Make sure you use
GDA2020(SRID:7844)forthiscolumn.
3.The Stops table does not show direct information regarding the vehicle types,
routes_short_nameandroutes_long_name.Theseinformationsarestoredintheroutestable.
4.If you want to explore the transportation situation for different vehicle types, such astram,
train, or bus, the vehicle type is determined by the corresponding route type in the routes
table,where:
● 0correspondstotram
● 2correspondstotrain
● 3correspondstobus
● Anyotherroutetypeislabelledas'Unknown'.
Theoutputs of thistask for Reportare:
b) Attach a screenshot of SQL script for creating a table named “mb2021_mel” that
containsONLYthemeshblocksinMelbourneMetropolitan.
c) Provide a detailed explanation of the remaining data processing steps you have
conducted, includingscreenshotsoftheSQLscriptsandtherationalebehindyourchoices
inthereport.
Task 3: Data Analytics and Visualisation
Inthissectionyouwillneedtoperformdataanalysisonthetablesyouhaverestored,focusing
on transport accessibility in metropolitan Melbourne. Use the techniques youhavelearnedin
the spatial database part to carry out your analysis. You are free to choose any specific
perspectives or aspects of data analysis relevant to your dataset,butensurethatyouranalysis
relatestothemaintopic:transportaccessibilityinmetropolitanMelbourne.
This could include exploring different statistical measures or carrying out other relevant
analyses.Presentyourfindingsclearlyandconcisely,demonstratingyourunderstandingofthe
datasetandhighlightinganynotableobservationsorpatterns.
As part of this data visualisation, you will also need to create at least one map-based
headmap using QGIS to present your findings related to the main topics. These
visualisations will be used in the next section of the assignment, the summary report. To
supportyouranalysis,youcanincludescreenshotsofthevisualisationsdirectlyinthereport.
Be sure to include the script or code used for data analysis and data visualisation in the
appendix of your report. The script should provide clearinstructionsonhowtheanalysiswas
performed and any necessary calculations or transformations applied to the data. This will
ensure that your analysis can be reproduced and verified. Remember to include appropriate
labels, titles, and legends in your visualisations to make them easy to understand. The
visualisations should be of sufficient quality and clarity to effectively convey your analysis
findings.
Note:
● UseSQLqueriestoinvestigatetherestoredtables.
● Conductathoroughdescriptiveanalysistouncoverinsightswithinthedata.
● SummariseandVisualiseyourfindingsclearlyandconcisely.
● Highlightkeyobservationsandpatternsdiscoveredduringtheanalysis.
● Ensureyourfindingsreflectadeepunderstandingofthedata.
Theoutputs of thistask for Reportare:
d) Data analysis and visualisation, including the screenshot of SQL scriptandvisualisation.
Forthevisualisation,itmustcontainatleastonemapbasefigure.
Submission Checklist
Summary Report for Task 1 to 3
As aprofessionaldataanalyst,yourtaskistoconsolidatealltheprevioustasks,includingdata
restoration, processing, analysis, and visualisations, into a comprehensive writtenreport.The
report should adhere to a word limitof2000wordsandfollowastructuredformat,consisting
of an introduction, methodology, results, conclusion, and appendix. Please note that a
question-and-answer format is not acceptable for this assignment, and marks will be
deductedforusingsuchaformat.
Please ensure that the report adheres to the given word limit and is well-organised, concise,
andcoherent.Thesamplereportshouldbeformattedasfollows:
Title:Writeyourtitlehereonaseparatepage,(Note:Abstractisnotrequired)
1.Introduction,suchas
● Brieflyexplainthepurposeofthereportandwhatyouaimtoachievewithyour
analysis.
● Highlightthekeyquestions youwanttoinvestigatethroughyouranalysis.
2.Methodology
Thissectionshouldprovideaclearexplanationofthedifferentstagesofyourwork.
● DatasetOverview,suchas
Provideanoverviewofthedataanditssource.
● DataRestorationandPreprocessing,suchas
Explainhowyouimportedandinitiallyexploredthedata.Includethesoftwareand
librariesused.
Provide a detailed explanation of the remaining data processing steps you have
conducted,andtherationalebehindyourchoicesinthereport.
● DataAnalysisandVisualization
Describe which area of transport accessibility in metropolitan Melbourne you are
primarilyexploring.
Describe the analysis you conducted and the types of visualisations you chose to
use, and why you felt they would effectively represent your data and findings.
Whatsoftwareorlibrarieswereusedtocreatethesevisualisations?
3.Results:
Presenttheresultsofyourin-depthinvestigations.Explainwhattheseresultsmean
andhowtheyansweryourinitialquestions.
Forthevisualisation,itmustcontainatleastonemapbasefigure.
4.Discussion
Discussyourfindingsandtheirimplications.
● Restatethemainfindingsofyourdescriptiveandadvancedanalyses.
● Discusshowthesefindingsansweryourinitialquestionsorhypotheses.
● Reflectontheprocessandanylimitationsorchallengesyoufacedduringyour
analysis.
5.References[Excludedfromthe2000-wordlimit]
If you have used external resources, don't forget to cite them properly according to the
chosenstyleguide(APA7thedition).
6.Appendix[Excludedfromthe2000-wordlimit]
Thescreenshotsofthefollowingtasks:
● AttachascreenshotoftheresultstoincludeallthetablesyourestoredinTask1,
includingthenumberofrowsforeachtableyourestored.
● Attach screenshots of theSQLscriptsusedinTask2,includingtheSQLscriptsfor
creating a table called "mb2021_mel" and screenshotsoftheSQLscriptsyouused
fortheremainingdataprocessingsteps.
● TheSQLscriptforDataanalysisandvisualisation
Video presentation
A five minute video presentation in mp4 format save as:
YourstudentID_A3_video.mp4
Based on the report you have created, present your design and findings in a
five-minute video presentation. Ensure you thoroughly understand both the dataset
and thereport toeffectivelyextract andcommunicatethe keypoints.
Assignment Submission
1. A combined pdf file save as: YourstudentID_A3_report.pdf, containing all of the
abovetasks1to 3.
2. A five minute video presentation in mp4 format save as:
YourstudentID_A3_video.mp4
Zip all abovefilesfrom step 1to3,andname theZIP folder asA3_YourstudentID.zip.
● The submission of this assignment must be in the form of a single ZIP file.
Only PDF and .mp4fileswill beaccepted withinthe zipfile.No otherformats
will be accepted.
● You must ensure that you have all the files listed in this checklist before
submitting your assignment to Moodle. Failure to submit a complete list of
fileswill lead tomarkpenalties.
● It's important to note that our support hours arelimited,and wedon'thave the
capacity to address submission issues outside of working hours. You must
ensure that you have allthe fileslisted inthis checklistbefore submittingyour
assignment to Moodle. Failuretosubmita completelistof fileswill resultina
markpenalty.
● Penalty for latesubmission: 5%deductionfor each day,including weekends
● Submission cut-off time:Friday, 1November2024,4:30PM.Submissionswill
not beacceptedafter this timeunlesstherearespecialconsiderations.
Authorship
This assignment is an individual assignment and the final submission must be identifiably
your own work. Breaches of this requirement willresult inanassignment notbeing accepted
for assessment andmayresult indisciplinaryaction.
Late Penalty
Late assignments submitted without an approved extension may be accepted up to a
maximum of seven days with theapprovalof theChief Examinerand/orLecturer butwill be
penalised at the rate of 5% per day (including weekends and public holidays).
Assignments submitted more than seven days after the due date will receive a zeromarkfor
that assignmentand maynot receiveany feedback.
Please note(late penaltyand extension):
1. An inability to manage your time or computing resources will not be accepted as a
valid excuse. (Several assignments being due at thesametimearea factofuniversity
life.)
2. Hardware failures, whether of personal or university equipment, are not normally
recognised as valid excuses.Failure tobackupassignment files isalso notrecognised
asa validexcuse.
Special Consideration
Students no longer seek extensions from chief examiner/teaching team. All extensions /
special considerations will now be handled by the central Spec Con team. Please do not
email teachingstaff to requestan extensionorspecialconsideration.
Extensions and other individual alterations to the assessment regime will only be
considered using the University Special Consideration Policy. Students should carefully
read the Special Consideration website, especially the details about what formal
documentation isrequired.
All special consideration requests should be made using the Special Consideration
Application.
Please do not assume that submission ofaSpecial Considerationapplication guarantees
that it will be granted – you must receive an official confirmation that it has been
granted.
Getting help and support
What canyougethelpfor?
● Consultationswith theTeaching Team
Talkto theTeachingTeam:
https://learning.monash.edu/course/view.php?id=19675§ion=5
● English language skills
Talkto English Connect:https://www.monash.edu/english-connect
● Study skills
Talkto alearningskills advisor:https://www.monash.edu/library/skills/contacts
● Counselling
Talkto acounsellor:https://www.monash.edu/health/counselling/appointments
Plagiarism and Collusion:
Monash University is committed to upholding standards and academic integrity andhonesty.
Pleasetake thetimetoview theselinks.
Academic Integrity Module
Student Academic Integrity Policy
Test your knowledge,collusion(FIT NoCollusion Module)
All the best for your Assignment!