
King’s College London

This paper is part of an examination of the College counting towards the award of a degree. Examinations are governed by the College Regulations under the authority of the Academic Board.

BENG/MENG BIOMEDICAL ENGINEERING EXAMINATION

6CCYB064 MACHINE LEARNING FOR BIOMEDICAL APPLICATIONS

Examination Period 1 (January 20xx)

TIME ALLOWED: TWO HOURS

PART A: COMPULSORY. ANSWER ALL QUESTIONS. (50% OF TOTAL)

PART B: ANSWER 2 OUT OF 4 QUESTIONS. (50% OF TOTAL)

ANSWER EACH QUESTION ON A NEW PAGE OF YOUR ANSWER BOOK AND WRITE ITS NUMBER IN THE SPACE PROVIDED.

CALCULATORS MAY BE USED. ONLY THE FOLLOWING MODELS ARE PERMITTED:

Casio fx83

Casio fx85.

DO NOT REMOVE THIS EXAM PAPER FROM THE EXAMINATION ROOM

TURN OVER WHEN INSTRUCTED

20xx © King's College London

 

Part A

Compulsory part. Answer all questions in Part A.

Question A.1

(a) Among the following machine learning methods (Logistic regression, PCA, Ridge regression, SVM, K-means), choose the ones that are:

1. unsupervised
2. regression
3. classification
4. clustering

[5 marks]

(b) The results of multivariate regression were evaluated by calculating the root mean squared error (RMSE) on the whole dataset, the training set, the test set, and using cross-validation. The table below shows the results:

set              RMSE
whole            10.5
training         11.2
test             75.5
cross-validation 50.5

Identify the problem with the fitted model and explain how you arrived at your conclusion.

[5 marks]

[10 Marks]

Question A.2

The linear regression model is fitted by minimising the loss

L(w) = \sum_i \Big( y_i - \sum_k x_{ik} w_k \Big)^2 + \lambda R(w)

(a) Explain the role of the parameter \lambda.

[2 marks]

 

(b) Formulate the penalty R(w) for Ridge and for Lasso. Explain what is similar and what is different about these two penalties.

[4 marks]

(c) Explain the role of non-linear feature transformation in regression. Relate it to the example of univariate polynomial regression.

[4 marks]

[10 Marks]
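The regularised loss above, with R(w) as either the Ridge or the Lasso penalty, can be sketched as follows (the toy data, weights, and value of lambda are illustrative assumptions, not part of the question):

```python
import numpy as np

def loss(w, X, y, lam, penalty="ridge"):
    """Regularised linear-regression loss: sum of squared residuals + lam * R(w).
    penalty='ridge' uses R(w) = sum(w_k^2); penalty='lasso' uses R(w) = sum(|w_k|)."""
    residuals = y - X @ w
    R = np.sum(w**2) if penalty == "ridge" else np.sum(np.abs(w))
    return np.sum(residuals**2) + lam * R

# Toy data chosen so the residuals are exactly zero: only the penalty remains.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 2.0])
print(loss(w, X, y, lam=0.1, penalty="ridge"))  # 0.1 * (1^2 + 2^2) = 0.5
print(loss(w, X, y, lam=0.1, penalty="lasso"))  # 0.1 * (|1| + |2|) = 0.3
```

Increasing lam trades data fit for smaller weights; the two penalties differ only in how they measure the size of w.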

Question A.3

(a) The training sample i has label y_i = 0. Logistic regression predicted p_i close to 1. Approximately evaluate the cross-entropy for this sample and interpret the result.

[4 marks]

(b) The probabilistic prediction for sample i to belong to class 1 is p_i = 0.6. Interpret this result.

[2 marks]

(c) Provide an intuitive interpretation of the mathematical formulation of the decision function through the dual representation in kernel support vector classification. Describe how this decision function is used for label prediction.

[4 marks]

[10 Marks]
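For part (a), the behaviour of the binary cross-entropy can be checked numerically; the probability values in this sketch are illustrative assumptions:

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy for one sample: -[y*log(p) + (1-y)*log(1-p)].
    Clipping p avoids log(0) for probabilities at the boundary."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# y_i = 0 but the model predicts p_i close to 1: the loss is very large,
# signalling a confidently wrong prediction.
print(cross_entropy(0, 0.999))   # -log(0.001), roughly 6.9
print(cross_entropy(0, 0.001))   # near-correct prediction, loss close to 0
```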

Question A.4

(a) Among the following methods, choose the ones that are parallel: Bagging, Boosting, Voting, Random forest.

[2 marks]

(b) What is bias error and how can it be decreased using sequential ensemble learning?

[2 marks]

(c) What is variance error and how can it be decreased using parallel ensemble learning?

[2 marks]

(d) Describe the concept of boosting. How does it differ from bagging?

[4 marks]

[10 Marks]

Question A.5

(a) Describe the concept of a clustering algorithm and illustrate it on the example of k-means.

[2 marks]

(b) For a training data point x_i, the likelihood that this point was generated from a mixture of Gaussian distributions with parameters \theta = (\mu_k, \sigma_k, c_k)_{k=1,\dots,K} can be expressed as

p(x_i \mid \theta) = \sum_{k=1}^{K} c_k \, G(x_i, \mu_k, \sigma_k)

where G(x, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}. Explain the meaning of

i. p(x_i \mid \theta)
ii. \theta = (\mu_k, \sigma_k, c_k)_{k=1,\dots,K}
iii. G(x_i, \mu_k, \sigma_k)

[4 marks]
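The mixture likelihood above can be evaluated directly; in the sketch below, the two components' parameters and mixing weights are assumptions for illustration:

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Univariate Gaussian density G(x, mu, sigma^2)."""
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def mixture_likelihood(x, mus, sigmas, cs):
    """p(x | theta) = sum_k c_k * G(x, mu_k, sigma_k); the weights cs sum to 1."""
    return sum(c * gaussian(x, m, s) for c, m, s in zip(cs, mus, sigmas))

# Hypothetical two-component mixture (assumed parameters, not from the paper)
mus, sigmas, cs = [0.0, 5.0], [1.0, 2.0], [0.3, 0.7]
print(mixture_likelihood(0.0, mus, sigmas, cs))
```

Each term c_k G(x_i, mu_k, sigma_k) is the probability of drawing component k and then generating x_i from it; the sum marginalises over the unknown component.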

(c) Given the image matrix below, calculate the grey-level co-occurrence matrix.

[4 marks]

[10 Marks]
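The image matrix for part (c) did not survive text extraction, so the 3x3 matrix below is an assumed example; the sketch shows how a grey-level co-occurrence matrix is computed for a horizontal offset of one pixel:

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=None):
    """Grey-level co-occurrence matrix for a non-negative offset (dx, dy):
    C[a, b] counts pixel pairs where img[r, c] == a and img[r+dy, c+dx] == b."""
    img = np.asarray(img)
    if levels is None:
        levels = img.max() + 1
    C = np.zeros((levels, levels), dtype=int)
    rows, cols = img.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            C[img[r, c], img[r + dy, c + dx]] += 1
    return C

# Assumed 3x3 image with grey levels 0..2 (the exam's matrix is missing)
img = [[0, 0, 1],
       [1, 2, 2],
       [0, 1, 2]]
print(glcm(img))
```

With this offset there are 6 horizontal pixel pairs, so the entries of the matrix sum to 6.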

 

Part B

Answer any 2 of the following 4 questions.

Question B.1

You are given two samples with feature matrix X = (0, 1)T and target vector y = (1, 0)T . Your task is to fit Kernel Ridge regression with linear kernel to these samples and evaluate it for a new sample x = 0.5. Answer the following questions:

(a) Which feature transformation ?(x) results in 1D linear regression?

(b) Calculate kernel (x, x0) for 1D linear regression

(c) Calculate Gram matrix K

(d) Given a new instance x = 0.5 calculate vector k(x)T

[2 marks]

[4 marks]

[6 marks]

[4 marks]

(e) Calculate the value of kernel regression estimate for x = 0.5 given ? = 0

[4 marks]

(f) Discuss advantages and disadvanages of kernel trick compared to non- linear feature transformation.
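The computation in parts (b) to (e) can be checked numerically. One caveat: with \lambda = 0 the Gram matrix for these two samples is singular, so the sketch below falls back on the Moore-Penrose pseudoinverse, which is an assumption about the intended treatment:

```python
import numpy as np

# Samples and targets from the question
X = np.array([0.0, 1.0])
y = np.array([1.0, 0.0])

def k(a, b):
    """Linear kernel for 1D inputs: k(x, x') = x * x' (i.e. phi(x) = x)."""
    return a * b

# Gram matrix K[i, j] = k(x_i, x_j)
K = np.array([[k(a, b) for b in X] for a in X])   # [[0, 0], [0, 1]]

x_new = 0.5
k_vec = np.array([k(x_new, b) for b in X])        # [0.0, 0.5]

# Kernel ridge prediction: yhat(x) = k(x)^T (K + lam*I)^{-1} y.
# K is singular at lam = 0, hence the pseudoinverse.
lam = 0.0
alpha = np.linalg.pinv(K + lam * np.eye(2)) @ y
print(k_vec @ alpha)   # kernel regression estimate at x = 0.5
```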

Question B.2

A dataset with 2D features has covariance matrix

S=✓41 ?12◆ ?12 34

Find the projection on the first principal component and calculate the projected coordinate of the sample x = (1,1)T. How much variance this projection preserves?

5 SEE NEXT PAGE

6CCYB064

[5 marks]

[25 Marks]

 

(a) Find the eigenvalues of the covariance matrix S

(b) Find the first principal component u1 of the covariance matrix S. Give the equation for the projection y = p1(x) of a feature vector x = (x1, x2)T on the first principal component u1

(c) Project sample x = (1, 1)T on the sub-space defined by the first principal component

(d) Calculate how much variance this projection preserves.

(e) List three dimensionality reduction methods and discuss in which situa- tions you would use them.

[25 Marks]
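The eigen-decomposition in parts (a) to (d) can be verified numerically; note that the sign of an eigenvector is arbitrary, so the projected coordinate is determined only up to sign:

```python
import numpy as np

S = np.array([[41.0, -12.0],
              [-12.0, 34.0]])

# Eigen-decomposition of the symmetric covariance matrix
# (eigh returns eigenvalues in ascending order)
vals, vecs = np.linalg.eigh(S)
lam1 = vals[-1]                 # largest eigenvalue
u1 = vecs[:, -1]                # first principal component (unit length)

x = np.array([1.0, 1.0])
proj = u1 @ x                   # projected coordinate p1(x) = u1^T x

explained = lam1 / vals.sum()   # fraction of total variance preserved
print(lam1, proj, explained)
```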

Question B.3

(a) Explain, with the aid of a diagram, what a decision stump is, clearly labelling its different parts.

[5 marks]

(b) Explain what is meant by the information gain I used in decision trees, by explaining each part of the following formula:

I(S_j, A_j) = H(S_j) - \sum_i \frac{|S_j^i|}{|S_j|} \, H(S_j^i)

where the entropy of a state is defined by:

H(S) = - \sum_{y_k \in Y} p(y_k) \log_2 p(y_k)

[5 marks]
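The two formulas above can be traced with a small sketch; the parent set and its split into children are assumed examples:

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_k p(y_k) * log2 p(y_k) over the class labels in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """I(S, A) = H(S) - sum_i (|S_i| / |S|) * H(S_i) for a split of S."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# Hypothetical split of 4 'o' and 4 'x' samples into two pure children
parent = ["o"] * 4 + ["x"] * 4
children = [["o"] * 4, ["x"] * 4]
print(entropy(parent))                     # 1.0: maximally mixed two-class set
print(information_gain(parent, children))  # 1.0: a pure split removes all entropy
```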

(c) The figure below shows a binary decision tree during training, with two classes marked by o and x. Calculate the following:

i. The entropy at each of the 5 nodes
ii. The information gain at Node 3
iii. The overall information gain

[15 marks]

[25 Marks]

Question B.4

(a) Describe the main steps of training a neural network.

[5 marks]

(b) Given the feature vector (x_1, \dots, x_n), weights w_{1j}, \dots, w_{nj} and the sigmoid activation function f(z) = \sigma(z) = \frac{1}{1 + e^{-z}}, give the expression that evaluates the prediction (output o_j) of the single-layer perceptron for this feature vector.

[5 marks]

(c) Given the true label y_{true} and the MSE loss, give the expression to evaluate the error E for this sample.

[5 marks]

(d) Calculate the derivative of the error with respect to the weight w_{ij}. Note that the derivative of the sigmoid function is \frac{\partial \sigma(z)}{\partial z} = \sigma(z)(1 - \sigma(z)).

[5 marks]

(e) The weights can now be updated using the delta rule according to gradient descent. Give the expression for \Delta w_{ij} that gives the update for the weight w_{ij}. Explain this expression.

[5 marks]

[25 Marks]
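Parts (b) to (e) can be traced end to end in a short sketch: forward pass, error, gradient, and one delta-rule update. The feature values, weights, and learning rate are assumptions, and the MSE here uses the common 1/2 factor convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.5])        # feature vector (x_1, ..., x_n) -- assumed values
w = np.array([0.2, -0.4])       # weights w_1j, ..., w_nj -- assumed values
y_true = 1.0
eta = 0.1                       # learning rate -- assumed value

z = w @ x                                 # weighted input sum_i w_ij * x_i
o = sigmoid(z)                            # (b) output o_j = sigma(z)
E = 0.5 * (y_true - o) ** 2               # (c) MSE error for this sample
dE_dw = (o - y_true) * o * (1 - o) * x    # (d) chain rule, sigma' = sigma(1 - sigma)
w_new = w - eta * dE_dw                   # (e) delta rule: dw_ij = -eta * dE/dw_ij
print(E, w_new)
```

The update moves each weight against its error gradient, scaled by the learning rate eta.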

FINAL PAGE