University of Victoria

Examinations April 2020 [On-line Exam]

ECE 535:

Data Analysis and Pattern Recognition A01 & A02

(CRN 20946 & 20947)

Name: Student No. V0

Instructor: Stephen W. Neville Section: A01 & A02

Duration: 3.0 Hours

TO BE ANSWERED ON THE EXAM PAPER

STUDENTS MUST COUNT THE NUMBER OF PAGES IN THIS EXAMINATION PAPER BEFORE

BEGINNING TO WRITE, AND REPORT ANY DISCREPANCY IMMEDIATELY TO THE INVIGILATOR.

THIS QUESTION PAPER HAS 8 PAGES.

TOTAL MARKS: 100

ATTEMPT ALL QUESTIONS.

ALL FIVE (5) QUESTIONS WILL BE COUNTED FOR YOUR GRADE.

EACH QUESTION IS WORTH 20 MARKS.

ECE 535 (A01 & A02), Page 2 of 8

On-line Exam Academic Integrity Statement:

All students writing this on-line exam must abide by UVic academic regulations and observe standards of ‘scholarly

integrity,’ (no plagiarism or cheating).

As such, this online exam must be taken individually, not with a friend, classmate, or group, and while accessing

only the allowed information sources, as listed below. You are also prohibited from sharing any information

about the exam with others.

PLEASE ENTER YOUR NAME AND V0 NUMBER INTO THE PLEDGE TEXT BELOW AND SIGN AND

DATE THE STATEMENT.

I, ____________________, V0__________, affirm and confirm that I will

(and have) completed this exam independently, all provided answers will be (and are) solely my own work, and that

I will not make (and have not made) use of any unauthorized materials in my completion of this exam.

Signature: Date:

Note: All exams submitted without a signed academic integrity statement will not be marked and will receive a

zero grade.

Exam Notes:

• On-line mathematical resources such as Matlab, Wolfram Alpha, MS Excel, etc. can be used to answer exam

questions, but the resulting code, Excel sheet, etc. must be submitted along with your answer. Answers

submitted without this supporting material, if used, will receive a ZERO grade.

• No other aids permitted. Written answers may be handwritten or word processed.

• All work must be shown. Answers without worked solutions will receive a ZERO grade.

• All submitted exam materials must have your student name and number clearly denoted on them, i.e., on all

submitted code, on each and all scanned exam pages, etc.

• All uploaded exam materials must be either in .pdf or .zip formats, with the pdfs readable by standard pdf

readers, e.g., Acrobat Reader. Any materials uploaded in any other formats will NOT be graded. Matlab

LiveScripts must be converted into .zip files before uploading.

• The academic integrity statement must be signed for the exam to be eligible to be marked. Exams with

unsigned statements will not be marked and will receive a ZERO grade.

• The exam includes the exam paper as well as the associated data files.

• Students are solely responsible for ensuring the legibility and accessibility of any and all uploaded materials.

Illegible answers or answers inaccessible using standard technologies will not be marked and will receive a

ZERO grade.

• Security permissions and passwords must not be set on uploaded .zip or .pdf files. Any uploaded materials

with set security features and/or passwords will not be marked and will receive a ZERO grade.

• Do not change or rename the provided data files as the original files will be the files that your Matlab

LiveScripts will be executed and marked against.

• Note: Basic Matlab functions, such as mean(.), cov(.), eig(.), goodness-of-fit tests, plot(.), contours, etc. can

be used. However, functions that effectively solve the bulk of what is being asked cannot be used, e.g.,

using a pca(.) function to do principal component analysis. To receive marks you need to show that you are

able to do what is being asked, not that Matlab has a function that someone else has already created to do

what is being asked.

• Submitted code may be checked, including via automated tools, for similarity to other student code or Internet

available code, with any code determined to be highly (overly) similar receiving a ZERO grade.

• All submitted LiveScript solutions must begin with the command "clear all" as the first executable statement

in the LiveScript.

1. Develop a Matlab LiveScript to:

(a) Load the data file Data_Ques_1, which contains a 25-dimensional, 1000-sample data set as the variable

Data, where each data sample is a row, and a 1000 × 1 vector Classes, denoting the class labels for each

data point with Class 1 denoted by a 1 and Class 2 by a 2 within this vector.

(b) Apply Fisher Discriminant analysis to reduce this data to a 2-dimensional data set that best linearly

separates the two classes.

(c) Produce a scatter plot of this generated 2-dimensional data, with Class 1 plotted in blue and Class 2

plotted in red.

(d) Formally determine whether the reduced data represents Gaussian distributed classes and, if so, what

are the p(x|ω1) and p(x|ω2) distributions?

(e) Determine the decision boundary between Class 1 and Class 2 in the reduced feature space.

(f) Discuss whether (or not) the application of Fisher Discriminant analysis will always improve classifier

performance.

Note: Your LiveScript must be appropriately commented using Text sections such that it is clear what each

subsequent Code section is intended to do (and does). You will submit your LiveScript code as your answer

to this question. This LiveScript should be named "firstname_lastname_ques1_ans.mlx", using your first and

last names.
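
Illustration for Question 1(b), not a model answer: for two classes, the Fisher criterion selects the direction w proportional to S_W^{-1}(m_1 - m_2), where S_W is the pooled within-class scatter. The sketch below is written in Python rather than the required Matlab LiveScript. The 2-dimensional toy data (`class1`, `class2`) is made up, and all function names are hypothetical. It computes only this single direction and does not address how a second dimension should be chosen.

```python
def mean_vec(rows):
    """Component-wise mean of a list of equal-length samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter(rows, m):
    """Scatter matrix: sum over samples of (x - m)(x - m)^T."""
    d = len(m)
    S = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[j] - m[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                S[a][b] += diff[a] * diff[b]
    return S


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fisher_direction(class1, class2):
    """Two-class Fisher direction w = S_W^{-1} (m1 - m2)."""
    m1, m2 = mean_vec(class1), mean_vec(class2)
    S1, S2 = scatter(class1, m1), scatter(class2, m2)
    d = len(m1)
    Sw = [[S1[a][b] + S2[a][b] for b in range(d)] for a in range(d)]
    return solve(Sw, [m1[j] - m2[j] for j in range(d)])


def project(rows, w):
    """Project each sample onto the direction w."""
    return [sum(r[j] * w[j] for j in range(len(w))) for r in rows]


# Made-up toy data, one sample per row (not the exam data).
class1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
class2 = [[6.0, 5.0], [7.0, 8.0], [8.0, 6.0], [7.0, 7.0]]

w = fisher_direction(class1, class2)
y1, y2 = project(class1, w), project(class2, w)
```

On this toy data the two sets of projected values `y1` and `y2` do not overlap, which is the separation the criterion aims for.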

2. Discuss the implications of the No Free Lunch and Ugly Duckling theorems on modern data analysis and

pattern recognition techniques such as Deep learning, neural networks, etc.

Additionally, discuss these issues with respect to the particular problem of seeking to develop a machine

learning image recognition solution to be used at full production-scales for self-driving cars, i.e., where the

image space would be on the order of $10^{15M}$, the available data would be on the order of $10^{16}$, and full

production-scale would mean tens to hundreds of millions of self-driving vehicles on the road.

3. Two approaches commonly arising within current machine learning-based (ML-based) data analysis are:

• Training a ML classifier on a random selection of 90% of the available data and then testing (assessing)

the classifier’s performance on the remaining 10% of the data, i.e., assessing the ML performance based

on the data not seen during training.

• Reporting the classifier’s performance in terms of a Receiver Operating Characteristic (ROC) curve, which denotes the

hit probability versus the false alarm probability, that the trained ML classifier achieved over its test

data.

(a) Discuss the issues with a 90/10 training regime and the conditions where it will and will not lead to a

trained ML classifier that correctly and properly generalizes.

(b) Discuss the issues with such uses of ROC curves and whether and when such an approach would and

would not be appropriate and informative.
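
Illustration of the two practices named in Question 3, not an answer to (a) or (b): a minimal Python sketch of a random 90/10 hold-out split and of how ROC points (false alarm probability, hit probability) can be traced by sweeping a threshold over classifier scores. The function names, scores, and labels are hypothetical. Label 1 is assumed to mark the target class and label 0 the non-target class.

```python
import random


def holdout_split(n, test_fraction=0.1, seed=0):
    """Return (train_indices, test_indices) for a random hold-out split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def roc_points(scores, labels):
    """Return (false_alarm_rate, hit_rate) pairs, one per distinct threshold.

    A sample is declared a target when its score is >= the threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing declared
    for t in sorted(set(scores), reverse=True):
        hits = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        false_alarms = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((false_alarms / neg, hits / pos))
    return points


train_idx, test_idx = holdout_split(100)

# Made-up test-set scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
```

The curve is built only from the held-out scores, so it summarizes performance on that particular test sample and nothing beyond it.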

4. Develop a Matlab LiveScript to:

(a) Load the data file Data_Ques_4, which contains 10,000 samples taken from a time series stochastic process

as the variable Data where the first column is the time stamp of each data sample and the second column

is the measured data.

(b) Apply statistical hypothesis testing to determine whether this time series data has a stationary mean

and standard deviation.

(c) Determine whether this time series data has a quasi-stationary mean and/or standard deviation.

(d) Determine whether the data is wide sense stationary.

(e) Determine whether the data can be modeled using a Gaussian distribution.

Note: Your LiveScript must be appropriately commented using Text sections such that it is clear what each

subsequent Code section is intended to do (and does). You will submit your LiveScript code as your answer

to this question. This LiveScript should be named "firstname_lastname_ques4_ans.mlx", using your first and

last names.
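
Illustration for Question 4(b), not a model answer: one simple way to test for a stationary mean is to split the series into segments and test whether the segment means are equal. The Python sketch below is not the required Matlab LiveScript. It uses a large-sample two-sided z-test on two halves and assumes the samples within each half are independent, which a real time series may violate. The synthetic series (`steady`, `drifting`) and the function name are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def mean_shift_p_value(a, b):
    """Two-sided p-value for H0: equal means (large-sample z-test)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


rng = random.Random(0)

# Synthetic series: one with a constant mean, one with a linear drift in the mean.
steady = [rng.gauss(0.0, 1.0) for _ in range(2000)]
drifting = [rng.gauss(0.0, 1.0) + 0.002 * t for t in range(2000)]

p_steady = mean_shift_p_value(steady[:1000], steady[1000:])
p_drift = mean_shift_p_value(drifting[:1000], drifting[1000:])
```

A small p-value, as for the drifting series, is evidence against a stationary mean. The same segment-wise idea can be applied to the standard deviation with a variance test.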

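5. The three classes ω1, ω2, and ω3 are given by the sufficient statistics:

$$
\mu_1 = \begin{bmatrix} -10 \\ 10 \\ 10 \end{bmatrix}, \quad
\mu_2 = \begin{bmatrix} 6 \\ 14 \\ -4 \end{bmatrix}, \quad
\mu_3 = \begin{bmatrix} 0 \\ -2 \\ 12 \end{bmatrix},
$$

$$
\Sigma_1 = \begin{bmatrix} 24.1321 & 1.3489 & -3.5186 \\ 1.3489 & 47.8791 & 7.9578 \\ -3.5186 & 7.9578 & 6.0794 \end{bmatrix}, \quad
\Sigma_2 = \begin{bmatrix} 6.2852 & -1.1446 & -4.5746 \\ -1.1446 & 9.5775 & 1.5270 \\ -4.5746 & 1.5270 & 13.2511 \end{bmatrix}, \quad \text{and}
$$

$$
\Sigma_3 = \begin{bmatrix} 19.3188 & -5.6360 & 0.7915 \\ -5.6360 & 27.9177 & -6.2955 \\ 0.7915 & -6.2955 & 30.0063 \end{bmatrix}
$$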

Develop a Matlab LiveScript to:

(a) Load the data file Data_Ques_5, which contains three training data sets of 100 samples each, one from each

of ω1, ω2, and ω3, denoted by the 3 × 100 Matlab variables Class1, Class2, and Class3. The Matlab

variables m1, m2, m3, S1, S2, and S3 are the above per-class means and covariances, i.e., you do not

need to hand enter these per-class sufficient statistics.

(b) Apply a Bayes classifier with equal a priori probabilities to classify the following ground-truthed points:

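Class 1 Points: $D_{\omega_1} = \begin{bmatrix} -5.6783 \\ 0.3313 \\ -4.6218 \end{bmatrix}, \begin{bmatrix} 5.4838 \\ 20.6387 \\ -0.6436 \end{bmatrix}, \begin{bmatrix} 10.0027 \\ 3.9438 \\ 6.5321 \end{bmatrix}$

Class 2 Points: $D_{\omega_2} = \begin{bmatrix} 3.1000 \\ -0.4120 \\ -1.9054 \end{bmatrix}, \begin{bmatrix} 19.7638 \\ 13.1648 \\ 12.0513 \end{bmatrix}, \begin{bmatrix} -6.6313 \\ 5.8846 \\ 3.2152 \end{bmatrix}$

Class 3 Points: $D_{\omega_3} = \begin{bmatrix} -2.5014 \\ -7.3041 \\ 2.1365 \end{bmatrix}, \begin{bmatrix} -6.3075 \\ -7.1580 \\ 9.9063 \end{bmatrix}, \begin{bmatrix} 18.9199 \\ 7.4855 \\ 4.3541 \end{bmatrix}$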

Note: These points are in the Data_Ques_5 file as the row-wise variables D_w1, D_w2, and D_w3, i.e.,

the points above are the rows within the respective variables.

(c) Apply a 1-nearest neighbor classifier to classify the points of (b) based on the training data sets contained

within the variables Class1, Class2, and Class3 in the Data_Ques_5 file where each row-wise variable

contains 100 data samples from the given class.

(d) Explain the classification differences that occur between the (b) and (c) classifiers’ performance from the

perspective of the known ground-truths for these points and the minimization of Bayes risk. For this

discussion also provide a clear table, for all of the points above, delineating each point’s actual class, its

class as assigned by the Bayes classifier, and its class as assigned by the 1-nearest neighbor classifier.

Note: Your LiveScript must be appropriately commented using Text sections such that it is clear what each

subsequent Code section is intended to do (and does). You will submit your LiveScript code as your answer

to this question. This LiveScript should be named "firstname_lastname_ques5_ans.mlx", using your first and

last names.
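
Illustration for Question 5(b) and (c), not a model answer: with equal a priori probabilities, a Gaussian Bayes classifier assigns a point to the class with the largest class-conditional log-density, whereas a 1-nearest neighbor classifier copies the label of the closest training sample. The Python sketch below is not the required Matlab LiveScript. It uses 2-dimensional made-up parameters and training points rather than the 3-dimensional exam data, and all names are hypothetical.

```python
import math


def gaussian_log_density(x, mean, cov):
    """Log of a 2-D Gaussian density with covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return -0.5 * maha - 0.5 * math.log(det) - math.log(2.0 * math.pi)


def bayes_classify(x, params):
    """Equal priors: pick the class with the largest log-likelihood."""
    return max(params, key=lambda k: gaussian_log_density(x, *params[k]))


def nearest_neighbor_classify(x, training):
    """1-NN with Euclidean distance over (point, label) pairs."""
    return min(training, key=lambda pl: math.dist(x, pl[0]))[1]


# Made-up class statistics: class label -> (mean, covariance).
params = {
    1: ((0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]),  # tight class
    2: ((4.0, 0.0), [[9.0, 0.0], [0.0, 9.0]]),  # broad class
}

# Made-up training samples as (point, label) pairs.
training = [
    ((-1.0, 0.0), 1), ((0.0, 1.0), 1), ((1.0, 0.0), 1),
    ((4.0, 0.0), 2), ((7.0, 2.0), 2), ((1.0, -5.0), 2),
]

x = (-4.0, 0.0)
bayes_label = bayes_classify(x, params)
nn_label = nearest_neighbor_classify(x, training)
```

For this toy point the two rules disagree: the broad class 2 density is larger at x, while the closest training sample belongs to class 1. Disagreements of this kind are what the table in (d) is meant to expose.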

END
