University of Waterloo

ECE 657A: Data and Knowledge Modeling and Analysis

Winter 2024

Assignment 1: Data Cleaning and Dimensionality Reduction

Due: Mar 25th, 2024 11:59pm

Overview

Assignment Type: Done in groups of up to three students.

Hand in: One report (PDF) or Python notebook per group, via the LEARN dropbox. Also submit the code/scripts needed to reproduce your work. (If you are submitting a PDF and do not know LaTeX, you should try to use it; it is good practice and will make the project report easier.)

Objective: To gain experience with classification.

Datasets

Available on LEARN

Dataset A: This data contains splice junctions in DNA sequences. The dataset includes 2200 samples with 57 features, in the matrix 'fea'. It is a binary classification problem: the class labels, either +1 or -1, are given in the vector 'gnd'. Parameter selection and classification tasks are conducted on this dataset. (File: DataA.csv)

Dataset B: This data consists of the sepal and petal measurements of three types of iris (Setosa, Versicolour, and Virginica). The rows are the samples and the columns are Sepal Length, Sepal Width, Petal Length and Petal Width. (File: DataB.csv)

Dataset C: Handwritten digits 0, 1, 2, 3, and 4 (5 classes). This dataset contains 2066 samples with 784 features corresponding to a 28 x 28 gray-scale (0-255) image of the digit, arranged column-wise. This data is used to illustrate the difference between feature extraction methods. (File: DataC.csv)
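Because the pixels of Dataset C are stored column-wise, one sample has to be reshaped in column-major (Fortran) order to recover the upright digit. A minimal NumPy sketch, assuming the 784 pixel values of a sample have already been separated from any label column (the exact CSV layout is not specified here); `to_image` is a hypothetical helper name:

```python
import numpy as np


def to_image(row):
    """Turn the 784 pixel values of one DataC.csv sample into a 28 x 28 image.

    The pixels are stored column by column, so reshape in column-major
    (Fortran, order="F") order rather than NumPy's default row-major order.
    """
    return np.asarray(row, dtype=float).reshape(28, 28, order="F")
```

A plain `row.reshape(28, 28)` would return the transpose of this image, so every digit would appear mirrored across its main diagonal.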

Guidelines

• No late submissions will be accepted.

• The answer sheets are checked for plagiarism.

• The code will be checked for plagiarism against online websites.

• For all random states, use seed=42.
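One way to follow the seed=42 rule is to define the seed once and pass it to every function or estimator that accepts a `random_state`. A short sketch with scikit-learn and NumPy; `SEED` and `seeded_split` are illustrative names, not part of the handout:

```python
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 42  # the seed required for every random state


def seeded_split(X, y, test_size=0.3):
    """Train/test split that comes out identical every time the code is rerun."""
    return train_test_split(X, y, test_size=test_size, random_state=SEED)


# NumPy-level randomness (e.g. choosing which samples to plot) uses its own seeded generator.
rng = np.random.default_rng(SEED)
```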

Nonlinear Dimensionality Reduction

Refer to DataC.csv

Apply the nonlinear dimensionality reduction methods Locally Linear Embedding (LLE) and ISOMAP to Dataset C. Set the number of nearest neighbors to 5 and the projected low dimension to 4.

1. Apply LLE to the images of digit '3' only. Visualize the original images by plotting each image at the position of its instance in the 2-D representation given by the first and second LLE components; see Figure for an example of what this looks like, with images of the digits 1-3 placed at random locations. Describe qualitatively what kinds of variation are captured.

2. Repeat step 1 using the ISOMAP method. Comment on the result. Does ISOMAP do better in some way? Are the patterns it finds global or local?

3. Use the Naive Bayes classifier to classify the dataset based on the projected 4-dimensional representations from LLE and ISOMAP. Train your classifier on a randomly selected 70% of the data and test it on the remaining 30%. Retrain for multiple iterations (using different random partitions of the data) and use the average accuracy over the runs for your analysis. Justify why your number of iterations was sufficient. Based on the average accuracies, compare their performance with PCA and LDA. Discuss the results.
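For step 1 above, one simple way to place the original digit images at their 2-D LLE coordinates is to paste small thumbnails onto a blank canvas and display the canvas as a single image (matplotlib's `AnnotationBbox` with `OffsetImage` is a common alternative). A NumPy-only sketch, assuming the digit-'3' images are already reshaped to 28 x 28 and drawn bright on a dark background; `thumbnail_canvas` is a hypothetical helper:

```python
import numpy as np


def thumbnail_canvas(coords, images, canvas_px=600):
    """Paste each image onto a blank canvas at its 2-D embedding position.

    coords : (n, 2) array, e.g. the first two LLE components of the digit-'3' samples.
    images : (n, h, w) array of the corresponding images, already reshaped.
    Returns a (canvas_px, canvas_px) array; show it with plt.imshow(canvas, cmap="gray").
    """
    coords = np.asarray(coords, dtype=float)
    images = np.asarray(images, dtype=float)
    h, w = images.shape[1:]
    canvas = np.zeros((canvas_px, canvas_px))
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against a constant component
    # Component 1 -> columns, component 2 -> rows, scaled so every thumbnail fits.
    cols = ((coords[:, 0] - lo[0]) / span[0] * (canvas_px - w)).astype(int)
    rows = ((coords[:, 1] - lo[1]) / span[1] * (canvas_px - h)).astype(int)
    rows = (canvas_px - h) - rows  # flip so larger component-2 values appear higher
    for r, c, img in zip(rows, cols, images):
        patch = canvas[r:r + h, c:c + w]  # a view into the canvas
        # Overlapping thumbnails are merged with a pixel-wise maximum.
        np.maximum(patch, img, out=patch)
    return canvas
```

Plotting only a random subset of the digit-'3' samples (drawn with seed 42) usually keeps the canvas readable, and the same function works unchanged for the ISOMAP coordinates in step 2.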
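For the global-versus-local question in step 2, it helps to recall what each method preserves: ISOMAP keeps approximate geodesic distances between all pairs of points (a global criterion), whereas LLE only preserves how each point is reconstructed from its own neighbors (a local one). Geodesic distances can be approximated by shortest paths through the k-nearest-neighbor graph. The sketch below illustrates the idea with scikit-learn and SciPy; it is not the internal Isomap implementation, and `geodesic_distances` is a hypothetical helper:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph


def geodesic_distances(X, n_neighbors=5):
    """Approximate geodesic distances as shortest paths on the k-NN graph."""
    # Sparse graph whose edges link each point to its k nearest neighbors,
    # weighted by Euclidean distance.
    graph = kneighbors_graph(np.asarray(X, dtype=float), n_neighbors=n_neighbors,
                             mode="distance")
    # directed=False treats every k-NN edge as usable in both directions.
    return shortest_path(graph, method="D", directed=False)
```

On points along a half circle of radius 1, for example, the two endpoints are 2 apart in a straight line, while the shortest-path estimate comes out close to π, the distance along the curve.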
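The evaluation protocol in step 3 can be sketched with scikit-learn as below. Several choices are assumptions not fixed by the handout: Gaussian Naive Bayes (`GaussianNB`) as the Naive Bayes variant, run `i` using `random_state=42 + i` for its split, each projection fitted on the training portion only (through a pipeline, so test points are mapped with `transform`), and LDA also set to 4 components, the maximum for 5 classes. `make_reducers` and `repeated_holdout` are hypothetical helper names.

```python
import numpy as np
from sklearn.base import clone
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

SEED = 42


def make_reducers():
    """The four 4-D projections to compare (5 neighbors for LLE and ISOMAP)."""
    return {
        "LLE": LocallyLinearEmbedding(n_neighbors=5, n_components=4, random_state=SEED),
        "ISOMAP": Isomap(n_neighbors=5, n_components=4),
        "PCA": PCA(n_components=4, random_state=SEED),
        "LDA": LinearDiscriminantAnalysis(n_components=4),
    }


def repeated_holdout(X, y, reducer, n_runs=20):
    """Mean accuracy and its standard error over n_runs random 70/30 splits (n_runs >= 2)."""
    scores = []
    for run in range(n_runs):
        # A different, but reproducible, partition for every run.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=SEED + run
        )
        # The projection is fitted on the training split only, then applied to the test split.
        model = make_pipeline(clone(reducer), GaussianNB())
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))
    scores = np.asarray(scores)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(n_runs)
```

Calling `repeated_holdout(X, y, reducer)` for each entry of `make_reducers()` gives one mean accuracy per method. Reporting the standard error alongside the mean is one way to justify the number of runs: once it is small compared with the gaps between methods, additional runs are unlikely to change the comparison.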

Binary Classification
