DATA2001 "Data Science, Big Data and Data Diversity" - 2021 (Roehm/Fekete) 1
DATA2001 – Data Science, Big Data, and Data Diversity
Assignment Announcement
Presented by
A/Prof Uwe Roehm
School of Computer Science
Practical Assignment: Bushfire Risk Analysis
– Assignment specification available in Canvas (Canvas: Modules –> Assignment)
– Worth 20% of the final grade in DATA2001/DATA2901
– Due on Friday of Week 12
• Python/SQL notebook; brief report; team demo in tutorials of Week 12/13
– Main idea:
– Calculate a 'risk' or 'impact' score
per suburb in Greater Sydney with respect to bushfires
• Based on ABS data about population
and bushfire prone areas in NSW
– Visualise and correlate with income data
Practical Assignment: Bushfire Risk Analysis
– Goal: Practical experience with data variety, data analysis, and presentation
– Technologies as covered in this course: Python, Jupyter notebooks, Web APIs, and SQL
– Three tasks:
– Data import, integration and database generation
• We provide census data and spatial data from NSW Rural Fire Services
• Needs to be loaded into a database and combined, e.g. via a spatial join
• Feel free to extend with your own datasets
• Milestone 1: Integration of the provided datasets to be ready in Week 11 tutorials
– Bushfire Risk Analysis (Jupyter Notebook)
• Computation of a risk score per neighbourhood; an example formula is provided
• When adding other datasets, feel free to adjust the formula
• Correlation analysis against the affluence of neighbourhoods
– Documentation and (brief) Report
– Additional tasks/options on web access and ML for teams in advanced stream
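A minimal Python sketch of the analysis step, assuming a made-up scoring formula and invented sample rows. The example formula from the assignment handout is not reproduced here; the names `risk_score` and `pearson`, the weights, and all numbers below are hypothetical and only illustrate the shape of the computation.

```python
import math

def risk_score(pop_density, bfpl_fraction):
    """Hypothetical risk score in (0, 1); NOT the formula from the handout."""
    # Assumed weights, chosen only so the example produces varied values.
    z = 0.001 * pop_density + 5.0 * bfpl_fraction - 3.0
    return 1.0 / (1.0 + math.exp(-z))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented rows: (area name, people per km^2, bushfire-prone share of area, median income)
rows = [
    ("Area A", 3000.0, 0.05, 52000.0),
    ("Area B", 800.0, 0.40, 61000.0),
    ("Area C", 150.0, 0.75, 48000.0),
    ("Area D", 5200.0, 0.01, 70000.0),
]

scores = [risk_score(density, fraction) for (_, density, fraction, _) in rows]
incomes = [income for (_, _, _, income) in rows]
r = pearson(scores, incomes)
print(f"correlation between risk score and income: {r:.3f}")
```

In the actual notebook the rows would come from the integrated database rather than a literal list, and a library routine could replace the hand-written `pearson`.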
Provided Datasets (cf. Canvas)
– ABS Data
– Census data on neighbourhoods (SA2-level areas) in Greater Sydney and surrounds
such as population, land area, number of dwellings
– Business statistics per SA2-area
– Income and rent statistics to check for correlation with the risk score
– NSW Rural Fire Services – Bush Fire Prone Land (BFPL)
– Locations and areas of bushfire prone land in NSW with 3 categories
– Note that SA2-level data from the ABS does not always match suburbs, and that
the BFPL data is heavily simplified with just a GPS location and an area size;
neither the ABS neighbourhoods nor the BFPL data contain actual shapes
– cf. tutorial this week on how to retrieve boundary data for neighbourhoods
– Adding more datasets from your side is explicitly encouraged.
– Try different types and forms, not just CSV…
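Because the BFPL records are only points and the neighbourhood boundaries have to be retrieved separately, combining the two comes down to a point-in-polygon test. The sketch below shows that idea in plain Python with a ray-casting check; the function names, the rectangular "boundaries" and the coordinates are invented for illustration, and in the database the same join would normally be expressed with a spatial predicate such as ST_Contains.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test. `ring` is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def spatial_join(points, areas):
    """Map each point id to the first area whose ring contains it (or None)."""
    result = {}
    for pid, lon, lat in points:
        result[pid] = next(
            (name for name, ring in areas.items()
             if point_in_polygon(lon, lat, ring)),
            None,
        )
    return result

# Invented rectangular boundaries and point locations (lon, lat)
areas = {
    "Area A": [(151.0, -34.0), (151.1, -34.0), (151.1, -33.9), (151.0, -33.9)],
    "Area B": [(151.1, -34.0), (151.2, -34.0), (151.2, -33.9), (151.1, -33.9)],
}
points = [("p1", 151.05, -33.95), ("p2", 151.15, -33.95), ("p3", 150.0, -33.0)]

joined = spatial_join(points, areas)
print(joined)
```

Real boundaries are multi-polygons with holes, so this is only a teaching sketch; a spatial database or geometry library handles those cases and can use an index instead of scanning every area.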
Assignment Rules
– Groupwork
– teams of 2 (unless odd-size class or other good reasons)
– All team members should be in the same tutorial
– Deliverables: Jupyter notebook with source code and a short report (PDF)
– See page 4 of the assignment handout
– Due on Friday of Week 12
– Submission page and marking rubric will be published in Canvas
– Only one member per team needs to submit for the whole group;
they should submit both a ZIP archive under "Bushfire Risk Analysis Assignment" and
the PDF of the report in the separate "TurnItIn Dropbox – Bushfire Risk Analysis"
– Late submissions: -20% of achieved mark per day late
– Demo in Weeks 12 and 13
– There will be a short demo to the tutors during the tutorials of the last two weeks
– Individual grades can be scaled based on participation in project or demo
Tip: PostGIS
– Spatial database extension for PostgreSQL supporting geographic objects (OGC)
– Geometry types for Points, LineStrings, Polygons, MultiPoints, etc.
• including import/export from standard formats such as GeoJSON or KML
– Support for spatial reference systems and transformations between them
– Spatial predicates on geometries using the 3x3 nine-intersection model
– Spatial operators for determining geospatial measurements like area, distance, length
and perimeter, and geospatial set operations, like union, difference etc.
– R-Tree indexing (over GiST)
– Example:
-- ST_MakePoint takes x (longitude) first, then y (latitude)
INSERT INTO superhero VALUES ('Catwoman', ST_SetSRID(ST_MakePoint(-87.634, 41.87), 4326));
SELECT superhero.name
FROM city, superhero
WHERE ST_Contains(city.geom, superhero.location)
AND city.name = 'Gotham';
[http://postgis.net/documentation/]
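PostGIS point constructors take x (longitude) before y (latitude), which is easy to get backwards when the source data lists latitude first. The small Python helper below is a sketch of one way to guard against that when preparing points for loading: it builds an EWKT string, the `SRID=...;POINT(x y)` text form that PostGIS can parse. The function name `point_ewkt` and the sample coordinates are assumptions for illustration.

```python
def point_ewkt(lon, lat, srid=4326):
    """Return an EWKT point string, e.g. 'SRID=4326;POINT(151.21 -33.87)'."""
    # Range checks catch many swapped lat/lon pairs (though not all of them).
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return f"SRID={srid};POINT({lon} {lat})"

# A point roughly in central Sydney: longitude first, then latitude
print(point_ewkt(151.21, -33.87))
```

Passing the arguments in the wrong order for a Sydney location raises a `ValueError`, because 151.21 is not a valid latitude. The check is a heuristic only: for places where both values fall within ±90 a swap goes undetected.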
WGS84 versus Australian GDA94
– WGS84 is used by the GPS system
– The official geodetic datum (coordinate system) for Australia is GDA94
("Geocentric Datum of Australia")
– Based on IERS Terrestrial Reference Frame (ITRF), but fixed to a number of
reference points in Australia.
– ABS data will use GDA94
– Difference between WGS84 and GDA94:
– "The spheroids used for WGS84 and GDA94 are also almost identical, and
both systems are geocentric. Thus for most mapping, exploration and GIS uses,
WGS84 and GDA94 coordinates will be the same. […] For precise surveys,
however, the difference between WGS84 and GDA94 may be significant, and
changes slowly over time. […] The difference between GDA94 and WGS84 is
approximately 45cms in 2000."
[http://www.geoproject.com.au/gda.faq.html]
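To put the quoted offset in perspective, the following back-of-the-envelope sketch converts 45 cm into degrees of latitude. It assumes a spherical Earth with a mean radius of 6,371 km, which is a simplification of both datums; the constant and variable names are illustrative. The SRID values are the EPSG codes commonly used for the two systems (4326 for WGS84, 4283 for GDA94).

```python
import math

SRID_WGS84 = 4326   # EPSG code for WGS84 geographic coordinates
SRID_GDA94 = 4283   # EPSG code for GDA94 geographic coordinates

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius (spherical approximation)

# Length of one degree of latitude on the assumed sphere
metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360.0

# The offset quoted above, expressed in degrees of latitude
offset_m = 0.45
offset_deg = offset_m / metres_per_degree

print(f"one degree of latitude is about {metres_per_degree / 1000:.1f} km")
print(f"0.45 m is about {offset_deg:.2e} degrees")
```

An offset of a few millionths of a degree is far below the size of a neighbourhood, which is consistent with the quote's point that the two systems agree for most mapping purposes. The SRIDs stored with the geometries still need to match for spatial predicates, so one dataset usually has to be relabelled or transformed.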
Open Geospatial Consortium (OGC) Data Model
[Source: OGC Simple Features, 2016]
SRID
– part of every geometry
– needs to match for spatial predicates
