ECOM135 (2021)

Question 1

The classical linear model is given by $y = X\beta + \varepsilon$, where $y$ is an $n \times 1$ vector of responses, $X$ is an $n \times p$ matrix of explanatory variables, $\beta$ is a $p \times 1$ vector of parameters and $\varepsilon$ is an $n \times 1$ vector of errors.

a) Explain why such a model is useful as a starting point in a machine learning context. [5 marks]

b) Explain how ridge regression modifies the above model and in what contexts such a model might be required. [5 marks]

c) What is a LASSO regression and under what circumstances might this model prove to be better than a ridge regression? [5 marks]

d) Explain how ridge and LASSO regressions can both be combined in the form of an elastic net. [5 marks]

e) Why is it often necessary to consider the scaling of variables when fitting the models you have described above? [5 marks]

Question 2

a) What is Bayes' rule or formula and why is it useful in machine learning? [5 marks]

b) Explain in detail what is meant by a naive Bayes classifier. In particular, why is it naive and does this description mean it is unreliable? [8 marks]

c) You are given the following set of low (L), moderate (M) and high (H) volatility market state transition probabilities for trading a particular market. The current time period is denoted by subscript 0 and the next time period by subscript 1. For example, if the current state of the market is moderate, there is a 5% chance that the market will move to a low volatility state in the next period, a 70% chance that it will remain in a moderate volatility state and a 25% chance that it will transition to a high volatility state in the next time period.

           L1     M1     H1
    L0    .45    .48    .07
    M0    .05    .70    .25
    H0    .01    .50    .49

   i. This particular market on average spends 50% of the time in state L, 40% in state M and 10% in state H. What is the probability that the market transitions to state H in the next period?

   ii. If the market is actually in a highly volatile state H, what is the probability that it was in this high volatility state in the previous period?

   [12 marks]

Question 3

The Poisson density is given by

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

for $\lambda > 0$ and $x = 0, 1, \ldots, n$.

a) In what situations might such a probability model be useful? [5 marks]

b) What is the maximum likelihood estimator for $\lambda$? [7 marks]

c) Show that a local maximum is achieved for the estimate given by your answer to part b). [5 marks]

d) A trading algorithm devised by you identified the following opportunities to enter a trade in a particular stock over a three year period (assumes 252 trading days in a year). For example, there were three opportunities in a day to enter into a trade in the stock on 31 days and four opportunities on 5 days over the three year period.

    Number of daily trade opportunities      0     1     2     3     4     5
    Frequency of occurrence (no. days)     497   158    62    31     5     3

   You suspect there has been artificial stock price manipulation. Do these observations support your suspicion? [8 marks]

Question 4

a) What is the bootstrap and why is this technique useful in machine learning? [8 marks]

b) The following ordinal credit score data were obtained on 10 individuals.

    640  589  845  710  701  842  599  913  749  845

   A bootstrap analysis of the data was thought to be sensible and the following five bootstrap samples were obtained from the above sample.

    Bootstrap Sample
      1     2     3     4     5
    701   589   589   701   599
    589   710   589   599   749
    640   701   842   640   701
    845   710   710   845   710
    842   845   749   845   845
    842   845   845   842   913
    599   640   599   710   599
    749   640   845   701   845
    710   845   842   845   842
    845   842   913   845   913

   i. Why is each bootstrap sample of the same size (i.e. 10 observations) as the original data?

   From the data in the table above:

   ii. What is the bootstrap estimate of the median?

   iii. What is the bootstrap estimate of the bias of the sample median? Comment on the result you have obtained.

   iv. What is the bootstrap estimate of the standard error of the median?

   [14 marks]

c) What are the main limitations of this form of analysis? [3 marks]

Question 5

Assume we have two classes or groups $g_1$ and $g_2$ from two multivariate normal distributions each having population means $\mu_1$ and $\mu_2$ and population covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively.

a) Describe the linear discriminant function for these groups. [8 marks]

b) What is the allocation rule between the groups? [8 marks]

c) When does one obtain a quadratic rather than a linear discriminant function? [4 marks]

d) What advantages does a support vector machine have over a discriminant function? [5 marks]

Question 6

a) What is gradient boosting? [4 marks]

b) How does gradient boosting differ from a random forest? [7 marks]

c) Outline a general gradient boosting procedure mathematically or in pseudo-code. [5 marks]

d) Describe a suitable gradient boosting error function for regression and another for classification. [9 marks]

Question 7

a) What is an artificial neural network (ANN)? [3 marks]

b) What potential uses does an ANN serve? [3 marks]

c) What is a feed-forward network? [4 marks]

d) Describe generally what is meant by a deep neural network. [6 marks]

e) Describe the McCulloch-Pitts model of a neuron. Why is this model important in an ANN context? What are the model's main limitations? [9 marks]

End of Paper
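The arithmetic behind Question 2 c) can be checked with a short script. This is only a sketch, not a model answer: it assumes the long-run state shares (50% L, 40% M, 10% H) act as the prior for the current state, applies the law of total probability for part i and Bayes' rule for part ii, and uses hypothetical names (`transition`, `prior`, `prob_next`, `prob_previous_given_next`) that do not come from the paper.

```python
# Transition probabilities P(next state | current state) from the Question 2 c) table.
transition = {
    "L": {"L": 0.45, "M": 0.48, "H": 0.07},
    "M": {"L": 0.05, "M": 0.70, "H": 0.25},
    "H": {"L": 0.01, "M": 0.50, "H": 0.49},
}

# Assumption: the long-run share of time in each state is the prior P(current state).
prior = {"L": 0.50, "M": 0.40, "H": 0.10}


def prob_next(state):
    """Law of total probability: P(next = state), summed over current states."""
    return sum(prior[s] * transition[s][state] for s in prior)


def prob_previous_given_next(prev, nxt):
    """Bayes' rule: P(current = prev | next = nxt)."""
    return prior[prev] * transition[prev][nxt] / prob_next(nxt)


p_h1 = prob_next("H")
p_h0_given_h1 = prob_previous_given_next("H", "H")
print(round(p_h1, 4), round(p_h0_given_h1, 4))
```

Under these assumptions the two quantities are 0.035 + 0.100 + 0.049 = 0.184 and 0.049 / 0.184, roughly 0.266.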

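For Question 3 d), one possible starting point is to fit a Poisson model to the observed counts and compare expected with observed frequencies. The sketch below is an illustration only. It assumes the frequencies for 0 to 5 daily opportunities are 497, 158, 62, 31, 5 and 3 days (which sum to 3 × 252 = 756 and agree with the two examples quoted in the question), and it uses the sample mean as the estimate of $\lambda$. The names `counts`, `lam_hat`, `poisson_pmf`, `expected` and `sample_var` are hypothetical.

```python
import math

# Assumed reading of the frequency table: {daily opportunities: number of days}.
counts = {0: 497, 1: 158, 2: 62, 3: 31, 4: 5, 5: 3}

n_days = sum(counts.values())  # should equal 3 * 252 trading days

# Sample mean of the daily counts, used here as the estimate of lambda.
lam_hat = sum(k * f for k, f in counts.items()) / n_days


def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for rate lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


# Expected number of days with k opportunities under the fitted Poisson model.
expected = {k: n_days * poisson_pmf(k, lam_hat) for k in counts}

# Sample variance; a Poisson model implies variance equal to the mean.
sample_var = sum(f * (k - lam_hat) ** 2 for k, f in counts.items()) / (n_days - 1)

for k in counts:
    print(k, counts[k], round(expected[k], 1))
print("mean:", round(lam_hat, 4), "variance:", round(sample_var, 4))
```

Comparing the observed and expected columns, and the sample variance with the mean, gives one basis for judging how well a constant-rate Poisson model describes the data. Whether any departure indicates manipulation is left to the answer.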