COS60008 Introduction to Data Science Semester 1 2022 – Final Project

Swinburne University of Technology Department of Computing Technologies

COS60008 Introduction to Data Science

Semester 1 2022 – Final Project

Due: 23:59 Friday 10 June 2022

Introduction

This is an individual assignment worth 40% of your final grade. This project uses the data set that you selected in Assignment 2 and is intended to evaluate your understanding of, and practical skills in, the entire data science process, with a particular focus on data modelling.

Academic Integrity

The submitted assignment must be your own work, and any parts that are not created by yourself must be properly referenced. Plagiarism is treated very seriously at Swinburne. It includes submitting the code and/or text copied from other students, the Internet or other resources without proper reference. Allowing others to copy your work is also plagiarism. Please note that you should always create your own assignment even if you have very similar ideas with other students.

Plagiarism detection software will be used to check your submissions. Severe penalties (e.g., zero mark) will be applied in cases of plagiarism. For further information, please refer to the relevant section in the Unit Outline under the menu “Syllabus” in Canvas and the Academic Integrity information at: https://www.swinburne.edu.au/current-students/manage-course/exams-results-assessment/plagiarismacademic-integrity/

General Requirements

This section contains the general requirements which must be met by your submitted assignment. Marks will be deducted if you fail to meet any of the following general requirements.

• You must complete Task 1 in the Jupyter Notebook under the Python 3 kernel.

• All code must be written in one single .ipynb file, where each task and the sub-tasks therein (if any) must be clearly separated via Markdown cells to ensure good readability.

• You must include code-level comments in the .ipynb file to explain the key parts of your code.

• You must follow the instructions given in each task to complete the corresponding task.

• You must follow the rules specified in the “Submission Requirements” section to make your final submission.

• Your code in the submitted .ipynb file must be executable during marking, where all necessary files needed for executing the code must also be submitted, as detailed in the “Submission Requirements” section.

• All graphs must be properly sized and formatted to include a meaningful title, appropriate axis labels, and a legend. The fonts contained in the graph must be properly sized for good readability. The components of the graph should be appropriately coloured, if applicable.
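
As one possible illustration of the graph requirement, the sketch below sets an explicit figure size, a title, axis labels, font sizes, a colour and a legend with matplotlib. The plotted numbers are made-up placeholders, not results, and the wording of the labels is only an assumption about what a K-NN comparison plot might show.

```python
import matplotlib.pyplot as plt

# Placeholder values purely for illustration (NOT real results).
k_values = [6, 7, 8, 9, 10]
mean_accuracy = [0.81, 0.83, 0.84, 0.82, 0.80]

# Explicit figure size so the graph is properly sized in the notebook.
fig, ax = plt.subplots(figsize=(8, 5))

# One coloured series with a label, so the legend has something to show.
ax.plot(k_values, mean_accuracy, marker="o", color="tab:blue",
        label="K-NN validation accuracy")

# Meaningful title and axis labels with readable font sizes.
ax.set_title("Validation accuracy by number of neighbours", fontsize=14)
ax.set_xlabel("Number of neighbours (N)", fontsize=12)
ax.set_ylabel("Mean accuracy", fontsize=12)
ax.tick_params(labelsize=11)
ax.legend(fontsize=11)
fig.tight_layout()
```

In a Jupyter Notebook with the default inline backend, the figure is rendered when the cell finishes.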

Task 1 – Data Exploration (65%)

You will need to accomplish the following. You should apply the suitable techniques covered in the lectures and tutorials.

1. Continue the final project from Assignment 2

o Utilise the data set that you prepared in Assignment 2. If required, perform data pre-processing. This includes but is not limited to checking typos, dealing with missing values and creating dummy variables.

o Formulate the problem as a machine learning task that you started in Assignment 2.

o Add a third learning algorithm to the two that were selected in Assignment 2 and identify the corresponding hyperparameters, if any. There must be at least one hyperparameter.

2. Select one suite of data partitioning from Assignment 2. This will split the data into the training data and the test data. The training data will be used for model development, and the test data for performance evaluation.

3. Perform model development

o List all your learning algorithms by expanding on the hyperparameters. For example, you might select RandomForest, K-Nearest Neighbours (K-NN) and Artificial Neural Networks (ANN) as the three learning algorithms. You nominate the number of neighbours N as the hyperparameter and propose 5 possible values (e.g. 6, 7, 8, 9, 10). Hence, effectively, you will have the following algorithms:

▪ RandomForest (0 hyperparameters, 1 model)

▪ ANN (0 hyperparameters, 1 model)

▪ K-NN (1 hyperparameter with 5 possible values, 5 models): K-NN(N=6), K-NN(N=7), K-NN(N=8), K-NN(N=9), K-NN(N=10)

o Assess each learning algorithm on the training data. For a given learning algorithm L, you will assess its validation performance as follows:

▪ Define an n-fold cross validation within the training data, where n is from 3 to 5.

▪ In each fold, identify the actual training data trData and the validation data vlData. Train L on the trData and test on the vlData to get the validation performance P.

▪ Obtain the average of P over all folds, which is the final performance of L.

o Select the model M with the highest validation performance.

4. Perform performance assessment

o Apply M on the test data to get the prediction.

o Calculate the accuracy and the confusion matrix.
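
The model development and assessment steps above could be sketched roughly as follows with scikit-learn. This is a minimal sketch under stated assumptions, not a reference solution: the iris data set stands in for your own Assignment 2 data, the 70/30 split, the accuracy metric used for P, the 5 folds, and the refit of M on the full training data before testing are all illustrative choices, and names such as `candidates` and `validation_performance` are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): replace with the data set from Assignment 2.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Expand the algorithms over their hyperparameter values: 1 + 1 + 5 = 7 models.
candidates = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "ANN": MLPClassifier(max_iter=2000, random_state=0),
}
for n in (6, 7, 8, 9, 10):
    candidates[f"K-NN(N={n})"] = KNeighborsClassifier(n_neighbors=n)


def validation_performance(model, X_tr, y_tr, n_folds=5):
    """Return the mean accuracy P of `model` over n folds of the training data."""
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    scores = []
    for tr_idx, vl_idx in folds.split(X_tr):
        fold_model = clone(model)                    # fresh, unfitted copy
        fold_model.fit(X_tr[tr_idx], y_tr[tr_idx])   # train on trData
        pred = fold_model.predict(X_tr[vl_idx])      # test on vlData
        scores.append(accuracy_score(y_tr[vl_idx], pred))
    return float(np.mean(scores))


# Mean validation performance P for every candidate, then pick the best.
performance = {name: validation_performance(model, X_train, y_train)
               for name, model in candidates.items()}
best_name = max(performance, key=performance.get)

# Assumption: refit the selected model M on all training data before testing.
M = clone(candidates[best_name]).fit(X_train, y_train)
y_pred = M.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_confusion = confusion_matrix(y_test, y_pred)

print(best_name, performance[best_name], test_accuracy)
print(test_confusion)
```

Cloning the estimator inside each fold keeps the folds independent, so no fitted state leaks from one fold into the next.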

Task 2 – Video (35%)

Prepare a video presentation of at most 3 minutes to showcase the methods you use and report the main findings. You should highlight

o the rationale for selecting the

▪ data partition

▪ 3 models

▪ validation method

▪ best model

o key performance assessment

o recommendation of model suitability, e.g. given a domain example, recommend a model based on your best judgement

The video must be recorded at standard definition (SD) resolution (640 x 480 or 720 x 480), saved in the MP4 format and named “report.mp4” for submission.

Submission Requirements

The final project is due:

23:59, Friday 10 June 2022

Assignments submitted after this time are subject to late submission penalties. For detailed information, please refer to the relevant section in the Unit Outline under the menu “Syllabus” in Canvas.

You need to prepare the following four files:

1. A notebook file named development.ipynb which contains all your code and code-level comments for Task 1.

Note: Please make sure to clean the code before making submission to remove all unnecessary code. You should execute the steps: “Main menu → Kernel → Restart & Run All” in the Jupyter Notebook to ensure you see all the data printed and all the graphs displayed as expected.

2. An HTML version of the notebook, with output, named development.html, in addition to the notebook file.

3. A binary file named model.pickle which contains the sklearn model representing the final model M.

4. A video file named report.mp4.
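
One way to produce model.pickle is Python's standard `pickle` module, as in the hedged sketch below. The K-NN model fitted on the iris data is only a stand-in for your own final model M; reloading the file and comparing predictions is an optional sanity check, not something the brief requires.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the final model M (assumption): a K-NN fitted on iris.
X, y = load_iris(return_X_y=True)
M = KNeighborsClassifier(n_neighbors=7).fit(X, y)

# Save the fitted model as a binary file with the required name.
with open("model.pickle", "wb") as fh:
    pickle.dump(M, fh)

# Optional check: reload the file and confirm the predictions are unchanged.
with open("model.pickle", "rb") as fh:
    restored = pickle.load(fh)

same_predictions = bool((restored.predict(X) == M.predict(X)).all())
print(same_predictions)
```

A pickled model generally needs a compatible scikit-learn version to load, so it is worth noting the version used in the notebook.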

To submit, you must archive all of the above files into ONE single .zip file, name it as per your student ID (e.g., 1234567.zip if your student ID is 1234567), and then submit it in Canvas under:

Assignments/Final Project

Please do NOT submit other unnecessary files.

The estimated time needed to complete this project is 5 working days. Hence extension is already incorporated into the due date.
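
If you prefer to build the archive from Python, a small helper along these lines could be used. It is a sketch only: `build_submission` is a hypothetical name, the four file names are those listed in this section, and 1234567 is the example student ID from the brief.

```python
import zipfile
from pathlib import Path

# The files named in this section.
required = ["development.ipynb", "development.html", "model.pickle", "report.mp4"]


def build_submission(folder, student_id, names=required):
    """Zip exactly the required files from `folder` into <student_id>.zip."""
    folder = Path(folder)
    # Fail early if any required file is missing.
    missing = [name for name in names if not (folder / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")
    archive = folder / f"{student_id}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            # arcname keeps the files at the top level of the archive.
            zf.write(folder / name, arcname=name)
    return archive
```

For example, `build_submission(".", "1234567")` would create 1234567.zip in the current folder, containing only the listed files.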

Final Project: Assessment Criteria

Your work will be assessed based on the submitted code (65 points) and a video presentation (35 points).

Code (65 points)

1. Code for data pre-processing. You have comments on what you have done and print the final data frame. (10 pts)

o 10.0 to > 7.0 pts, Very Good: Working code, well commented at both high and detailed level
o 7.0 to > 4.0 pts, Good: Working code with comments at either high or detailed level
o 4.0 to > 2.0 pts, Good Attempt: Comments are just the English form of the code, e.g. x = 4 is commented as “x is assigned 4”
o 2.0 to 0.0 pt, Needs Improvement: Code only with no comment

2. Specify the learning type of the problem. (5 pts)

o 5.0 to > 3.5 pts, Very Good: Well written and understandable in layman’s terms
o 3.5 to > 2.0 pts, Good: General discussion
o 2.0 to > 1.0 pts, Good Attempt: Confusing at some point
o 1.0 to 0.0 pt, Needs Improvement: No attempt or literal English version of the code

3. Specify three learning algorithms. Nominate at least one hyperparameter and propose the possible values. (5 pts)

o 5.0 to > 3.5 pts, Very Good: Appropriate selection of 3 models and hyperparameter
o 3.5 to > 2.0 pts, Good: Appropriate selection of 2 models and hyperparameter
o 2.0 to > 1.0 pts, Good Attempt: Appropriate selection of 1 model and hyperparameter
o 1.0 to 0.0 pt, Needs Improvement: No attempt or missing hyperparameter

4. Code for data partitioning. Print the training data and the test data. (10 pts)

o 10.0 to > 7.0 pts, Very Good: Working code, well commented at both high and detailed level
o 7.0 to > 4.0 pts, Good: Working code with comments at either high or detailed level
o 4.0 to > 2.0 pts, Good Attempt: Comments are just the English form of the code, e.g. x = 4 is commented as “x is assigned 4”
o 2.0 to 0.0 pt, Needs Improvement: Code only with no comment

5. Code for model development

5.1 List all your learning algorithms by expanding on the hyperparameters. (10 pts)

o 10.0 to > 7.0 pts, Very Good: 3 working learning algorithms and hyperparameters
o 7.0 to > 4.0 pts, Good: 2 working learning algorithms and hyperparameters
o 4.0 to > 2.0 pts, Good Attempt: 1 working learning algorithm and hyperparameters
o 2.0 to 0.0 pt, Needs Improvement: No attempt or code only with no hyperparameters

5.2 Code for assessing the validation performance for each learning algorithm. (10 pts)

o 10.0 to > 7.0 pts, Very Good: 3 working validations
o 7.0 to > 4.0 pts, Good: 2 working validations
o 4.0 to > 2.0 pts, Good Attempt: 1 working validation
o 2.0 to 0.0 pt, Needs Improvement: No attempt or partial validation

5.3 Present the mean P over folds, and specify the model selected. (5 pts)

o 5.0 to > 3.5 pts, Very Good: Specified P and good explanation of the selection
o 3.5 to > 2.0 pts, Good: Specified P with partial explanation
o 2.0 to > 1.0 pts, Good Attempt: Specified P with no explanation
o 1.0 to 0.0 pt, Needs Improvement: No attempt or specified P but unsure of correctness

6. Perform performance assessment

6.1 Code for applying M on the test data. Add a column in the test data named predicted which holds the predictions for each row. (5 pts)

o 5.0 to > 3.5 pts, Very Good: Working code, well commented at both high and detailed level
o 3.5 to > 2.0 pts, Good: Working code with comments at either high or detailed level
o 2.0 to > 1.0 pts, Good Attempt: Comments are just the English form of the code, e.g. x = 4 is commented as “x is assigned 4”
o 1.0 to 0.0 pt, Needs Improvement: Code only with no comment

6.2 Code for the accuracy and the confusion matrix. (5 pts)

o 5.0 to > 3.5 pts, Very Good: Working code, well commented at both high and detailed level
o 3.5 to > 2.0 pts, Good: Working code with comments at either high or detailed level
o 2.0 to > 1.0 pts, Good Attempt: Comments are just the English form of the code, e.g. x = 4 is commented as “x is assigned 4”
o 1.0 to 0.0 pt, Needs Improvement: Code only with no comment

Video Presentation (35 points)

Rationale for selecting the data partition (5 pts)

o 5.0 to > 3.5 pts, Very Good: Demonstrate knowledge
o 3.5 to > 2.0 pts, Good: Can improve by providing relevant example
o 2.0 to > 1.0 pts, Good Attempt: Reading
o 1.0 to 0.0 pt, Needs Improvement: No attempt or vague statements

Rationale for selecting the 3 models (5 pts)

o 5.0 to > 3.5 pts, Very Good: Demonstrate knowledge
o 3.5 to > 2.0 pts, Good: Can improve by providing relevant example
o 2.0 to > 1.0 pts, Good Attempt: Reading
o 1.0 to 0.0 pt, Needs Improvement: No attempt or vague statements

Rationale for selecting the validation method (5 pts)

o 5.0 to > 3.5 pts, Very Good: Demonstrate knowledge
o 3.5 to > 2.0 pts, Good: Can improve by providing relevant example
o 2.0 to > 1.0 pts, Good Attempt: Reading
o 1.0 to 0.0 pt, Needs Improvement: No attempt or vague statements

Rationale for selecting the best model (5 pts)

o 5.0 to > 3.5 pts, Very Good: Demonstrate knowledge
o 3.5 to > 2.0 pts, Good: Can improve by providing relevant example
o 2.0 to > 1.0 pts, Good Attempt: Reading
o 1.0 to 0.0 pt, Needs Improvement: No attempt or vague statements

Key performance assessment (5 pts)

o 5.0 to > 3.5 pts, Very Good: Demonstrate knowledge
o 3.5 to > 2.0 pts, Good: Can improve by providing relevant example
o 2.0 to > 1.0 pts, Good Attempt: Reading
o 1.0 to 0.0 pt, Needs Improvement: No attempt or vague statements

Recommendation of model suitability, e.g. given a domain example, recommend a model based on your best judgement (10 pts)

o 10.0 to > 7.0 pts, Very Good: Demonstrate knowledge
o 7.0 to > 4.0 pts, Good: Can improve by providing relevant example
o 4.0 to > 2.0 pts, Good Attempt: Reading
o 2.0 to 0.0 pt, Needs Improvement: No attempt or vague statements