2030ICT/7030ICT
Introduction to Big Data Analysis


Assignment Specifications
Part 1

Trimester 2 - 2020

Instructions
● Due: Monday, 3 August 2020 at 12:00 midday (Brisbane time)
● Marks: 20% of your overall grade
● Late Submissions: Late submission is allowed, but a penalty applies. The penalty is a
reduction of the mark allocated to the assessment item by 5% of the total weighted mark
for the assessment item for each working day that the item is late. A working day is
defined as Monday to Friday. Assessment items submitted more than five working days after
the due date will be awarded zero marks.
● Extensions: You can request an extension of time on one of two grounds:
○ medical
○ other (e.g., family or personal circumstances, employment-related circumstances,
unavoidable commitments).
Please note that supporting documents (e.g., a medical certificate) are needed for approval.
● Group Work: You must complete this assignment in a group of at most 3 students.

Overview
In this assignment, you will apply data analytics using the tools introduced during the labs.
You are required to study and analyze the SEEK job market data, for which a dataset is provided. The
assignment consists of 2 parts.
In the first part (Assessment 2), you will need to understand the characteristics of the data using data
preparation and preprocessing techniques. Then, you will apply various data analysis techniques to gain
a better understanding of the dataset.
In the second part (Assessment 3), you will need to apply more advanced techniques to explore the dataset
in greater depth. You will also need to propose solutions to some challenges that are based on real-life
situations.
• The primary dataset is the job market dataset, which is provided in
CSV format (data.csv).
• Perform data preparation and preprocessing for your analysis.
• All the charts in the assignment are for your reference only. You are free to choose your
own style.
Part 1 – Data Preparation and Preprocessing. [15 points]
1. Describe the dataset. (8 points)
✓ Describe the dataset (e.g., column types, value ranges). (1 point)
✓ How many records are there in the dataset? (1 point)
✓ Which period does it cover? How many different dates have job postings? (1 point)
✓ How many locations does the dataset have? Which location has the most job postings? (1 point)
✓ How many job sectors (job classifications) are there in the dataset? List the name of each
sector and its total number of job postings. (1 point)
✓ Choose your favorite job sector (e.g., Information & Communication Technology). How many
sub-sectors are there in that sector? List the name of each sub-sector and its number of
job postings. (1 point)
✓ List the salary ranges and the total number of job postings in each. (1 point)
✓ List the job types. For each job type, what are the lowest and highest salaries? (1 point)
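
The descriptive questions above mostly reduce to counting distinct values per column. The following is a minimal sketch of that idea, not a model answer. It uses only the Python standard library, which may differ from the tools used in the labs. The column names Location, Classification and JobType, the date format, and the sample rows are all assumptions made for illustration; check them against the real header of data.csv.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for data.csv. Location, Classification and
# JobType are assumed column names; check the real header first.
SAMPLE = """Id,Date,Location,Classification,JobType,LowestSalary,HighestSalary
1,2018-10-07,Sydney,ICT,Full Time,0,30
2,2018-10-07,Brisbane,ICT,Part Time,30,40
3,2018-10-08,Sydney,Sales,Full Time,40,50
"""

def describe(rows):
    """Summarise record count, date coverage and per-column frequencies."""
    dates = sorted({r["Date"] for r in rows})  # ISO-style dates sort as text
    salary_by_type = {}
    for r in rows:
        low, high = float(r["LowestSalary"]), float(r["HighestSalary"])
        lo, hi = salary_by_type.get(r["JobType"], (low, high))
        salary_by_type[r["JobType"]] = (min(lo, low), max(hi, high))
    return {
        "records": len(rows),
        "first_date": dates[0],
        "last_date": dates[-1],
        "distinct_dates": len(dates),
        "locations": Counter(r["Location"] for r in rows),
        "sectors": Counter(r["Classification"] for r in rows),
        "salary_ranges": Counter(
            (r["LowestSalary"], r["HighestSalary"]) for r in rows
        ),
        "salary_by_type": salary_by_type,  # job type -> (lowest, highest)
    }

# For the real file, replace io.StringIO(SAMPLE) with open("data.csv", newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = describe(rows)
print(summary["records"], summary["locations"].most_common(1))
```

Filtering the rows to one sector before counting a sub-sector column would answer the sub-sector question in the same way.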

2. Normalize and clean data. (7 points)
✓ The salaries are kept in the dataset as “HighestSalary” and “LowestSalary”. You should
calculate the “AverageSalary” for each job. (1 point)
✓ The raw values of the “Id” column are represented inconsistently. The Id
values should be 8-digit integers only. Write code to remove unnecessary
characters. (1 point)
✓ The “Date” column is in a format that contains both date and time information.
However, the time is not correct and should be removed. (1 point)
✓ Change the type of the “Id” column to numeric and the type of the “Date” column to DateTime.
(1 point)
✓ Is there any duplicate data in the dataset? Describe how you found it and how you
fixed it. ** (1.5 points)
✓ Check for missing data and visualize it in a suitable chart. ** (1.5 points)
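
The cleaning steps above can be sketched as small helper functions. This is a hedged illustration using only the Python standard library, not a prescribed solution. It rests on several assumptions that the specification does not confirm: that the valid Id is the first run of 8 digits in the raw value, that the raw Date starts with a YYYY-MM-DD prefix, that rows sharing an Id count as duplicates, and that empty strings mark missing values. All function names are hypothetical.

```python
import re
from datetime import datetime

def clean_id(raw):
    """Return the first 8-digit run in raw as an int, or None if absent.

    Assumption: the valid Id is the first run of 8 digits in the value.
    """
    match = re.search(r"\d{8}", raw)
    return int(match.group()) if match else None

def clean_date(raw):
    """Parse the assumed 'YYYY-MM-DD' prefix and drop the time part."""
    return datetime.strptime(raw[:10], "%Y-%m-%d").date()

def average_salary(low, high):
    """Midpoint of LowestSalary and HighestSalary."""
    return (float(low) + float(high)) / 2

def drop_duplicates(rows, key="Id"):
    """Keep the first row for each key value (assumed duplicate rule)."""
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

def missing_counts(rows):
    """Count empty or None cells per column; columns with none are omitted."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            if val is None or str(val).strip() == "":
                counts[col] = counts.get(col, 0) + 1
    return counts

# Made-up raw values for illustration only.
print(clean_id("37404348&searchrequesttoken=abc"))
print(clean_date("2018-10-07T00:00:00.000Z"))
print(average_salary("30", "40"))
```

The per-column totals returned by missing_counts are the values a missing-data bar chart would plot. Comparing whole rows instead of Id alone is an equally reasonable duplicate rule.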

Part 2 – Data Analysis and Interpretation. [5 points]
✓ Group the jobs into salary ranges using “AverageSalary”, count the jobs in each range, and display
the counts in a bar chart. (1 point)
✓ Display the list of job types and the number of jobs of each type using a pie chart. (1 point)
✓ Display the list of job sectors and the number of jobs in each sector using a horizontal bar chart.
(1 point)
✓ Choose your favorite location. Visualize the market share of that location in a pie chart. (1 point)
✓ Can you find the salary distribution for the top 30 cities by number of job postings?
Visualize it in a boxplot chart. (1 point)
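
Each chart above needs an aggregation step before anything is plotted. The sketch below covers only that step and leaves plotting to whichever charting tool the labs used. It assumes a Location column, an AverageSalary value that is already numeric, and a fixed bin width of 20; the function names, the bin width, and the salary unit are illustrative choices, not part of the specification.

```python
from collections import Counter, defaultdict

def salary_range_counts(avg_salaries, width=20):
    """Count jobs per fixed-width AverageSalary bin, keyed by bin start."""
    counts = Counter(int(s // width) * width for s in avg_salaries)
    return dict(sorted(counts.items()))

def top_locations(rows, n=30):
    """Return the n locations with the most job postings, busiest first."""
    counts = Counter(r["Location"] for r in rows)
    return [loc for loc, _ in counts.most_common(n)]

def salaries_by_location(rows, locations):
    """Group AverageSalary values by location, ready for one box per city."""
    wanted = set(locations)
    grouped = defaultdict(list)
    for r in rows:
        if r["Location"] in wanted:
            grouped[r["Location"]].append(r["AverageSalary"])
    return dict(grouped)

# Made-up rows for illustration only.
rows = [
    {"Location": "Sydney", "AverageSalary": 15.0},
    {"Location": "Sydney", "AverageSalary": 35.0},
    {"Location": "Brisbane", "AverageSalary": 45.0},
]
bins = salary_range_counts(r["AverageSalary"] for r in rows)
top = top_locations(rows, n=1)
groups = salaries_by_location(rows, top)
print(bins, top, groups)
```

The bin counts feed the bar chart, and the per-location lists feed the boxplot. Job type and job sector counts follow the same Counter pattern as top_locations.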
