ACS6427

Ancillary Material: Open-book examination

DEPARTMENT OF AUTOMATIC CONTROL & SYSTEMS ENGINEERING
Autumn Semester 2020‒21

ACS6427 DATA MODELLING AND MACHINE INTELLIGENCE

2.5 hours

The 2.5-hour duration of this examination comprises 2 hours of working time plus 30 minutes for upload and submission.

Answer ALL questions, and submit A SEPARATE document per question. For full submission instructions, see https://www.sheffield.ac.uk/apse/digital/crowdmark/student

Solutions will be considered in the order that they are presented in your submitted document. Trial answers will be ignored if they are clearly crossed out.

All questions are marked out of 20. The breakdown on the right-hand side of the paper is meant as a guide to the marks that can be obtained from each part.

This is an open-book examination. You may refer to any of the course materials and wider resources relevant to the module. All work must be entirely your own, and you may not engage help from other persons or systems (including Wolfram Alpha, Mathematica, and any homework/essay banks) to complete the examination.

1. a) Show that the least-squares estimate of $a_0$ in the simple linear regression problem $y = a_0 + \varepsilon$, from $n$ observations of the response $y$, i.e. $y_1, \ldots, y_n$, is the average of these observations. Here $\varepsilon$ is a zero-mean modelling error. [5 marks]

b) Consider a linear regression problem in which only one predictor $x_1$ is involved and the relationship between the predictor and the response $y$ is $y = c x_1 + \xi$, in which $\xi$ is a NONZERO-mean modelling error. A set of 5 observations of $x_1$ and $y$ is given in the table below.

i       1      2      3      4      5
y_i     3.08   4.09   5.01   6.09   7.06
x_1i    2      3      4      5      6

i) Find the least-squares estimate of the parameter $c$ from the observed predictor and response data.
[8 marks]

ii) Determine the Total Sum of Squares (TSS), the Residual Sum of Squares (RSS) and the Explained Sum of Squares (ESS) of the estimated linear regression model. [3 marks]

iii) Find the $R^2$ statistic of the estimated linear regression model and use it to assess the accuracy of the model. [4 marks]

2. a) A logistic-function-based two-class classifier has been determined as

$$\hat{y} = \frac{1}{1 + e^{-(0.1 + 0.2x)}}$$

i) Find the probability of the classification result $y = 1$ estimated by this logistic classifier when $x = -3, -2, 0, 7$ and $10$, respectively. [2 marks]

ii) Assume that the true response $y$ is as shown in the following table for $x = -3, -2, 0, 7$ and $10$, and that the threshold $T$ for the logistic classifier is $T = 0.5$.

x    −3   −2    0    7   10
y     0    0    1    0    1

Find the sensitivity, specificity, false negative rate and false positive rate of the classifier. [4 marks]

iii) Sketch the ROC curve of the classifier using the sensitivity and specificity obtained when $T = 1$, $T = 0.5$ and $T = 0$, and explain why the area under the ROC curve (AUC) is often needed to assess the performance of a classifier. [4 marks]

b) A set of 10 observations of the predictors $x_i = (x_{i1}, x_{i2})$, $i = 1, \ldots, 10$, has been collected and is shown in the following table. The same observations are plotted in Figure 2.1.

x_i     x1    x2    x3    x4    x5    x6    x7    x8    x9    x10
x_i1    1     1.5   2     2.5   2.5   2.7   3     4     5     6
x_i2    2     0.8   2.5   1     4     2     3.5   2.4   3.6   2

[Figure 2.1: plot of the 10 observations]

To apply K-means clustering with K = 2 to these observations, two centroids are first selected at random as (2, 1.5) and (4, 3.5).

(i) Apply the principle of the second step of K-means clustering to split the 10 observations into two subgroups, C1 and C2, and show which of the 10 observations are in C1 and which are in C2. [4 marks]

(ii) Find the centroids of C1 and C2 determined in part (i).
[2 marks]

(iii) Find the updated subgroups C1* and C2* using the centroids determined in part (ii), and show which of the 10 observations are in C1* and which are in C2*. [4 marks]

3. a) As part of a project, you are performing a text-mining experiment, looking for the term "Karhunen–Loève". This has involved investigating 5 different data sources, {,,,,}, of which the term appears only in document {}. Based on all documents, the TF-IDF is 2.796. What is the term frequency for document {}? [2 marks]

b) In black-box modelling we rely on the data to help produce the models we will build. What elements of this data must we consider as we begin to build our model? In this context, explain the requirement for cross-validation. Provide pseudo-code to outline the process of k-fold cross-validation, explaining each step and showing how the process would change for different values of k. [6 marks]

c) Data modelling and machine learning algorithms are often deployed in complex and challenging scenarios. The trained algorithms will often perform poorly, even though this may not have been intended at the design stage. As part of a newly formed team working for a small technology-focussed company, you have been asked to develop an algorithm to identify and predict good candidates for interview from their submitted Curriculum Vitae (CV). Discuss how you might tackle this problem to ensure that the system you develop is robust to the presentation of a variety of candidates. [12 marks]

4. a) A dataset has been provided for you to analyse. You are concerned that the dataset may not have been presented optimally and wish to investigate this further.
The dataset has the covariance matrix

$$\begin{bmatrix} 11 & -5 & 2 \\ -5 & 9 & -3 \\ 2 & -3 & 8 \end{bmatrix}$$

which produces the eigenvector matrix

$$\begin{bmatrix} -0.5384 & 0.6934 & 0.4789 \\ -0.0912 & -0.6129 & 0.7849 \\ 0.8378 & 0.3789 & 0.3932 \end{bmatrix}$$

and the set of eigenvalues

$$\begin{bmatrix} 7.0412 \\ 16.5121 \\ 4.4468 \end{bmatrix}$$

(i) Discuss the application of Principal Component Analysis (PCA) to this dataset, and explain what applying PCA would achieve. Explain the geometrical relationship between the principal components. [3 marks]

(ii) Which direction vector listed for this dataset gives the first principal component of the data? Discuss why this is the case. How much variance is contained within each principal component? [5 marks]

b) As part of training a two-input ({1, 2}) logistic classifier, you believe that the performance is not fit for purpose.

(i) What steps would you take to incorporate non-linearity into the decision boundary of the model? Show how a cubic function might be implemented. [3 marks]

(ii) Describe the issues that may arise from implementing an arbitrary-shaped decision boundary. [2 marks]

c) A set of data relating pressure and flow rate in a mains water system is provided in Table 4-1. From basic theoretical considerations you have determined that the two variables are linked by a non-linear model of the form = 12.

Table 4-1: Data for Q4(c)

Flow rate, F (m³/s)   0.436   0.586    0.614    0.659    0.764    0.9467   0.9854   1.07
Pressure, P (Pa)      75842   117211   137895   172369   275790   298705   356210   379212

(i) The model being considered for the problem is non-linear. Show that a linearisation can take place, and provide the new variables for the linear model. [2 marks]

(ii) Solve for the optimal values of the weights within this linearised model, and thus provide values for the model parameters that are to be estimated: 1 and 2. [5 marks]

END OF QUESTION PAPER
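For Question 1(a), the standard least-squares argument can be written as a short derivation. It minimises the sum of squared errors with respect to $a_0$. The symbol $J(a_0)$ is a name introduced here for illustration only:

```latex
J(a_0) = \sum_{i=1}^{n} (y_i - a_0)^2, \qquad
\frac{dJ}{da_0} = -2 \sum_{i=1}^{n} (y_i - a_0) = 0
\;\Rightarrow\; n\,\hat{a}_0 = \sum_{i=1}^{n} y_i
\;\Rightarrow\; \hat{a}_0 = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}
```

Because $d^2 J / da_0^2 = 2n > 0$, this stationary point is a minimum.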
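The following is a minimal NumPy sketch for Question 1(b). It assumes the nonzero mean of $\xi$ is absorbed into an intercept $c_0$, a hypothetical name introduced here, so the fitted model is $y = c_0 + c x_1$ with a zero-mean residual. Under that assumption, $c$ is the ordinary least-squares slope, and TSS, RSS, ESS and $R^2$ follow from their usual definitions:

```python
import numpy as np

# Observations from the table in Question 1(b)
x = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.08, 4.09, 5.01, 6.09, 7.06])

# Assumption: the nonzero mean of xi is absorbed into a (hypothetical) intercept c0,
# so the fitted model is y = c0 + c * x1 with a zero-mean residual.
x_mean, y_mean = x.mean(), y.mean()
c = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c0 = y_mean - c * x_mean

y_hat = c0 + c * x
tss = np.sum((y - y_mean) ** 2)      # Total Sum of Squares
rss = np.sum((y - y_hat) ** 2)       # Residual Sum of Squares
ess = np.sum((y_hat - y_mean) ** 2)  # Explained Sum of Squares
r2 = 1.0 - rss / tss                 # equals ess / tss for OLS with an intercept

print(f"c = {c:.4f}, c0 = {c0:.4f}")
print(f"TSS = {tss:.5f}, RSS = {rss:.5f}, ESS = {ess:.5f}, R^2 = {r2:.5f}")
```

Fitting without an intercept would force the line through the origin. That gives a different value of $c$, and the identity TSS = ESS + RSS would no longer hold in general.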
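The probabilities and error rates in Question 2(a)(i) and (ii) can be checked with a few lines of plain Python. This sketch assumes the classifier predicts class 1 when the estimated probability is at least $T$. That tie rule is an assumption, although none of the probabilities here equals 0.5 exactly:

```python
import math

def logistic_prob(x: float) -> float:
    """Estimated P(y = 1 | x) for the classifier 1 / (1 + exp(-(0.1 + 0.2 x)))."""
    return 1.0 / (1.0 + math.exp(-(0.1 + 0.2 * x)))

xs = [-3, -2, 0, 7, 10]
y_true = [0, 0, 1, 0, 1]
T = 0.5

probs = [logistic_prob(x) for x in xs]
# Assumed convention: predict class 1 when the probability is >= T.
y_pred = [1 if p >= T else 0 for p in probs]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate
fnr = fn / (tp + fn)          # false negative rate = 1 - sensitivity
fpr = fp / (tn + fp)          # false positive rate = 1 - specificity

for x, p in zip(xs, probs):
    print(f"x = {x:3d}: P(y=1) = {p:.4f}")
print(sensitivity, specificity, fnr, fpr)
```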
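One way to sketch the ROC curve in Question 2(a)(iii) is to compute one (false positive rate, sensitivity) point per threshold and join the points with straight lines. The false positive rate equals 1 − specificity. The sketch assumes class 1 is predicted when the probability is at least $T$. The trapezoidal AUC it reports is only a coarse estimate, because it uses just the three thresholds named in the question:

```python
import math

def logistic_prob(x):
    """Estimated P(y = 1 | x) for the classifier in Question 2(a)."""
    return 1.0 / (1.0 + math.exp(-(0.1 + 0.2 * x)))

xs = [-3, -2, 0, 7, 10]
y_true = [0, 0, 1, 0, 1]
probs = [logistic_prob(x) for x in xs]

def roc_point(threshold):
    """Return (false positive rate, sensitivity) for a threshold T (predict 1 if p >= T)."""
    y_pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    pos = sum(y_true)
    neg = len(y_true) - pos
    return fp / neg, tp / pos

# One ROC point per threshold, sorted along the (1 - specificity) axis.
points = sorted(roc_point(T) for T in (1.0, 0.5, 0.0))

# Trapezoidal area under the piecewise-linear curve through these points.
auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
print(points, auc)
```

A single threshold yields only one operating point. The AUC summarises performance across all thresholds, which is why it is often used to compare classifiers independently of the particular choice of $T$.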
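The following NumPy sketch covers the two K-means steps used in Question 2(b). The assignment step puts each point in the cluster of its nearest centroid by Euclidean distance. The update step replaces each centroid with the mean of its cluster. The sketch starts from the two given centroids, and cluster index 0 corresponds to C1, index 1 to C2:

```python
import numpy as np

# The 10 observations (x_i1, x_i2) from the table in Question 2(b)
X = np.array([
    [1.0, 2.0], [1.5, 0.8], [2.0, 2.5], [2.5, 1.0], [2.5, 4.0],
    [2.7, 2.0], [3.0, 3.5], [4.0, 2.4], [5.0, 3.6], [6.0, 2.0],
])
centroids = np.array([[2.0, 1.5], [4.0, 3.5]])  # initial centroids given in the question

def assign(X, centroids):
    """Assignment step: index of the nearest centroid (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def update(X, labels, k):
    """Update step: each new centroid is the mean of the points assigned to it."""
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

labels = assign(X, centroids)          # part (i)
new_centroids = update(X, labels, 2)   # part (ii)
new_labels = assign(X, new_centroids)  # part (iii)

for j in range(2):
    members = [f"x{i + 1}" for i in np.flatnonzero(labels == j)]
    print(f"C{j + 1}: {members}, new centroid = {new_centroids[j]}")
print("C1*:", [f"x{i + 1}" for i in np.flatnonzero(new_labels == 0)])
print("C2*:", [f"x{i + 1}" for i in np.flatnonzero(new_labels == 1)])
```

Alternating `assign` and `update` until the labels stop changing gives the full K-means iteration. For these observations, the labels are already unchanged after the first update.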
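For Question 3(a), the following sketch inverts $\text{TF-IDF} = \text{TF} \times \log(N/\text{df})$. The function name is hypothetical, and both the IDF definition and the log base are assumptions. With base-10 logarithms the result is very close to a whole number, whereas the natural logarithm would give about 1.74:

```python
import math

def term_frequency_from_tfidf(tfidf: float, n_docs: int, n_docs_with_term: int,
                              log=math.log10) -> float:
    """Hypothetical helper: invert TF-IDF = TF * log(N / df) to recover TF.

    Assumes IDF = log(N / df); the log base is an assumption (base 10 by default).
    """
    idf = log(n_docs / n_docs_with_term)
    return tfidf / idf

# 5 documents in total, the term appears in exactly 1 of them.
tf = term_frequency_from_tfidf(2.796, n_docs=5, n_docs_with_term=1)
print(round(tf, 2))
```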
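The pseudo-code requested in Question 3(b) might be sketched in Python as follows. The helper names, the least-squares model and the synthetic data are all hypothetical stand-ins. The point is the loop structure: each of the k folds is held out once for testing, while the remaining k − 1 folds are used for training:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and split them into k (nearly) equal folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    return np.array_split(indices, k)

def k_fold_cv(X, y, k, fit, predict, seed=0):
    """Return the mean squared error averaged over the k held-out folds."""
    folds = k_fold_indices(len(y), k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th forms the training set for this round.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        residuals = y[test_idx] - predict(model, X[test_idx])
        errors.append(np.mean(residuals ** 2))
    return float(np.mean(errors))

# Hypothetical example model: ordinary least squares with an intercept.
def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(30, 1))
y = 2.0 + 0.5 * X[:, 0] + rng.normal(0, 0.3, size=30)

for k in (2, 5, 10, 30):  # k = 30 = n is leave-one-out cross-validation
    print(k, k_fold_cv(X, y, k, fit_ols, predict_ols))
```

A larger k leaves more data for training in each round but requires more model fits. The extreme case k = n is leave-one-out cross-validation.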
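The following NumPy sketch for Question 4(a) recomputes the eigen-decomposition of the given covariance matrix and orders the components by variance. It then reports the fraction of the total variance (the trace, 28) held by each component. Eigenvectors are only defined up to sign, so the computed directions may be the negatives of the listed columns:

```python
import numpy as np

# Covariance matrix from Question 4(a)
S = np.array([
    [11.0, -5.0,  2.0],
    [-5.0,  9.0, -3.0],
    [ 2.0, -3.0,  8.0],
])

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]  # descending: PC1 has the largest variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()  # share of total variance per component
print("eigenvalues:", np.round(eigvals, 4))
print("PC1 direction:", np.round(eigvecs[:, 0], 4))  # sign is arbitrary
print("explained variance ratio:", np.round(explained, 4))
# The principal directions are mutually orthogonal unit vectors: V^T V = I.
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))
```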
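For Question 4(b)(i), one common way to make a logistic classifier's decision boundary non-linear is to expand the inputs into polynomial features. The two inputs are called x1 and x2 here for illustration, and the function names are hypothetical. The sketch builds all monomials of x1 and x2 up to degree 3, giving ten terms including the constant, so the boundary $w^\top \phi(x) = 0$ is a cubic curve:

```python
import math
from itertools import combinations_with_replacement

def polynomial_features(x1, x2, degree=3):
    """All monomials x1^a * x2^b with a + b <= degree, starting with the constant 1."""
    features = [1.0]
    for d in range(1, degree + 1):
        # e.g. d = 2 gives x1*x1, x1*x2, x2*x2
        for combo in combinations_with_replacement((x1, x2), d):
            features.append(math.prod(combo))
    return features

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x1, x2):
    """Logistic model whose decision boundary w^T phi(x) = 0 is a cubic curve in (x1, x2)."""
    phi = polynomial_features(x1, x2)
    z = sum(w * f for w, f in zip(weights, phi))
    return logistic(z)

# Illustrative weights: z = x1**3 - x2, so the decision boundary is x2 = x1**3.
# Feature order: 1, x1, x2, x1^2, x1*x2, x2^2, x1^3, x1^2*x2, x1*x2^2, x2^3
w = [0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(predict_proba(w, 2.0, 1.0), predict_proba(w, 0.0, 1.0))
```

In practice the weights would be learned by the usual logistic-regression training on the expanded features. Higher-degree expansions can trace very irregular boundaries, which raises the risk of overfitting. Regularisation is a common counter-measure.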
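For Question 4(c), this sketch assumes, purely for illustration, a power-law model $P = a F^{b}$. The names a and b are hypothetical stand-ins for the two parameters, and a different non-linear form would need a different transformation. Taking logarithms gives the linear model $\ln P = \ln a + b \ln F$ in the new variables $u = \ln F$ and $z = \ln P$. Its two weights are then found by ordinary least squares:

```python
import numpy as np

# Data from Table 4-1
F = np.array([0.436, 0.586, 0.614, 0.659, 0.764, 0.9467, 0.9854, 1.07])  # m^3/s
P = np.array([75842, 117211, 137895, 172369, 275790, 298705, 356210, 379212],
             dtype=float)  # Pa

# Assumed (illustrative) model: P = a * F**b.
# Linearised form: ln P = ln a + b * ln F, i.e. z = w0 + w1 * u with u = ln F, z = ln P.
u, z = np.log(F), np.log(P)
A = np.column_stack([np.ones_like(u), u])
(w0, w1), *_ = np.linalg.lstsq(A, z, rcond=None)  # least-squares weights

a, b = np.exp(w0), w1  # map the linear weights back to the model parameters
print(f"w0 = {w0:.4f}, w1 = {w1:.4f}  ->  a = {a:.1f}, b = {b:.4f}")
```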