
ELEC9741 – Electrical Engineering Data Science
Part II
Assignment, 2021
Instructions
1. Due Wednesday (28 July), 4pm, via Moodle
2. Signed school cover sheet attached
3. Typed only – no handwritten submissions
4. Computer output: No discussion = no marks
5. Analytical results: No working = no marks
6. If using third-party code/toolboxes/libraries – state this explicitly
7. When explaining anything, always do so with appropriate equations (wherever possible)
8. Follow Prof. Solo’s homework guide and adhere to principles you learnt about visualisation
9. You can discuss the problems with your classmates but NO COPYING
1. (15) Data Distributions
a. Given a multivariate Gaussian distribution defined on a $D$-dimensional space, estimate
the probability (up to a constant scaling factor) of finding a point as a function of
distance from the mean. You may assume the Gaussian distribution is spherical (i.e.,
its covariance matrix is given by $\sigma^2 I$). Plot this for different values of $D$ (the
dimensionality of the space).
b. Simulate the distribution of points in MATLAB and empirically estimate the probability
of finding a point as a function of distance. Compare with your answer to part (a).
c. In speaker verification systems, it is common to represent speech utterances as
i-vectors, which are typically 200-400 dimensional vectors that nominally follow a
standard normal distribution (mean is $\mathbf{0}$ and covariance is $I$). It has been argued that
‘cosine similarity’ is a good measure of similarity/dissimilarity between these vectors.
Do you agree? Justify your answer based on your previous plots. Cosine similarity is
defined as below:
$s_{\mathrm{cosine}}(x, x') = \dfrac{\langle x, x' \rangle}{\|x\| \, \|x'\|}$
Here, 〈∙,∙〉 denotes vector dot product.
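The definition above can be sketched in a few lines. The example below is an illustration in Python using only the standard library (the assignment itself asks for MATLAB); the function name `cosine_similarity` is a hypothetical choice, not something specified in the assignment.

```python
import math


def cosine_similarity(x, y):
    """Return <x, y> / (||x|| ||y||) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # The ratio is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

Note that the value depends only on the directions of the two vectors: rescaling either vector by a positive constant leaves it unchanged.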
2. (15) Modelling
a. Implement the EM algorithm to obtain the maximum likelihood estimate of a
Gaussian mixture model given some data (multivariate) in MATLAB without using any
built-in functions. (Hint: The ‘log-sum-exp’ trick may be useful in avoiding underflow).
b. Demonstrate that your implementation works by creating suitable artificial data
(where you know the true distribution) and testing your implementation on it. You
are free to create whatever tests you deem are suitable to convince someone (like
your future boss) that your implementation works.
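The 'log-sum-exp' trick mentioned in the hint for part (a) can be sketched as follows. This is a minimal illustration in Python using only the standard library (the assignment itself asks for MATLAB), and the function name `log_sum_exp` is a hypothetical choice. The idea is to subtract the largest term before exponentiating, so that at least one exponential equals 1 and the sum cannot underflow to zero.

```python
import math


def log_sum_exp(values):
    """Return log(sum(exp(v) for v in values)) in a numerically stable way."""
    m = max(values)
    if m == -math.inf:
        # Every term is exp(-inf) = 0, so the log of the sum is -inf.
        return -math.inf
    # Shift by the maximum so the largest exponent is exactly 0.
    return m + math.log(sum(math.exp(v - m) for v in values))
```

In a Gaussian mixture model, the log-likelihood of one data point is the log-sum-exp, over components, of the log mixing weight plus the log component density. With log terms near -1000, the naive `math.log(math.exp(-1000) + math.exp(-1000))` fails because each exponential underflows to zero, whereas `log_sum_exp([-1000.0, -1000.0])` stays finite.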
3. (20 + 5 Bonus) Machine Learning
a. Build a classifier to distinguish between the sub-species of iris using data from the
Fisher Iris Dataset (available in MATLAB). Use 30% of the available data as your test
set and report your classification accuracy and confusion matrix. You are free to use
any classification method and feature transformations that you think are suitable to
the problem but explain the reasons for your choice (this can include but is not
restricted to performance comparisons with alternative choices). For BONUS marks,
implement everything yourself without using any built-in functions or external
toolboxes.
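The evaluation protocol described above (a 30% held-out test set, classification accuracy, and a confusion matrix) can be sketched without any toolboxes. The example below is an illustration in Python using only the standard library (the assignment itself asks for MATLAB). The function names, the fixed seed, and the convention that rows are true classes and columns are predicted classes are all assumptions made for this sketch.

```python
import random


def train_test_split(n_samples, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices) from a shuffled index list."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def confusion_matrix(true_labels, predicted_labels, classes):
    """Count outcomes: rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total
```

For a dataset of 150 samples, `train_test_split(150)` yields 105 training indices and 45 test indices. A plain random split like this does not guarantee equal class proportions in the two sets; a stratified split (shuffling within each class separately) is one alternative worth considering.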
