辅导案例-FIT5202-Assignment 2

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

FIT5202: Data Processing for Big Data - Assignment 2 Marking Rubric
Part A Excellent Very Good Good Satisfactory
Step 01: Import pyspark and initialize Spark
Importing and initializing has been done
successfully
according to the specification. RDDs have been
created with 4 cores and the question has been
answered correctly.
Importing and initializing has been done
with some minor mistakes. RDDs have been
created with single core and the question has been
answered correctly.
Importing and initializing has been done
with some major mistakes. RDDs have been
created
and the question has been answered
correctly.
Importing and initializing has been done
with some major mistakes. RDDs have been
created
and but the question has not been answered.
Step 02:Load the dataset and print the schema and total
number of entries
The dataset is loaded properly and the total
number of entries is calculated and displayed.
The dataset is loaded properly and the total
number of entries is calculated and displayed but
output is wrong.
The dataset is loaded properly and total
number of entries is calculated but not
displayed.
Only data is loaded but total entries is not
calculated.
Part B
Step 03: Delete columns from the dataset
All the columns specified in the question are
deleted or delete some of them with proper
justificaition..
Some of the columns are deleted without proper
justification.
Only 2 or 3 columns are deleted. Only 1 column is deleted.
Step 04: Print the number of missing data in each column.
Define the function to calculated the missing
data in each column and displays the correct
number.
Calculate all the columns but display the wrong
total.
Calculate only some of the columns and
display
Calculate only some of the columns and
display wrong total.
Step 05: Fill the missing data with average value and
maximum occurrence value.
Fill all the missing data both for numeric value
and non-numeric value with correct information Partially fill the missing data both for numeric value
and non-numeric value
Only the numeric values are filled with proper
average but the non-numeric values are not
properly calculated.
Only numeric or non-numeric values are
properly done.
Step 06: Data transformation
The type casting is done properly for all the
specified columns and StringIndexer method is
also applied properly. The columns whose
Indexing is done, their original columns are
droped.
The type casting is done properly for all the
specified columns and StringIndexer method is
also applied properly. The columns whose Indexing
is done, their original columns are not droped.
The type casting is not done properly for all
the specified columns but StringIndexer
method is applied properly. The columns
whose Indexing is done, their original columns
are not droped.
The type casting is not done properly for all
the specified columns and StringIndexer
method is also applied pertially. The columns
whose Indexing is done, their original columns
are not droped.
Step 07: Create the feature vector and divide the dataset
The feature vector is properly created and the
dataset is split properly (randomsplit).
The feature vector is properly created and the
dataset is split but not randomly.
The feature vector is properly created and the
dataset is not splited in described ration.
The feature vector or split is not done
properly.
Part C
Step 08: Apply machine learning classification algorithms on
the dataset and compare their accuracy. Plot the accuracy as
bar graph.
All the machine learning algorithms are properly
impletement and their accuracy is calculated.
The plot is also properly created.
Only subset of machine learning algorithms are
properly impletement and their accuracy is
calculated. The plot is properly created.
Only subset of machine learning algorithms
are properly impletement but their accuracy is
not calculated properly. The plot is not
properly created.
Only one of machine learning algorithm is
properly impletement and its accuracy is
calculated properly. The plot is not properly
created.
Step 09: Calculate the confusion matrix and find the precision,
recall, and F1 score of each classification algorithm. Explain
how the accuracy of the predication can be improved?
The precision, recall and F1 score are calculated
properly for all the algorithms. Properly explain
the improvement staregies.
The precision, recall and F1 score are calculated
properly for all the algorithms. Partially explain the
improvement staregies.
The precision, recall and F1 score are
calculated properly for some of the algorithms.
Partially explain the improvement staregies.
The precision, recall and F1 score are
calculated properly for some of the
algorithms. No explanation for the
improvement.