MA50263: Assessed Coursework
Setter: Luca Zanetti
November 2

For this coursework you are asked to write Python code implementing algorithms for regression problems. You can use either standard Python or Jupyter notebooks (or a combination of the two). If you use Jupyter notebooks, please submit the notebooks themselves, not just an HTML copy. In any case, your code should be commented, readable, and executable. You can also submit a PDF file containing derivations, observations, and documentation for your code.

The code and supplemental material should be submitted by Tuesday, 17th of November, 6pm (UK time) on Moodle. The expected time to complete this coursework is about 10 hours in total (including writing up), but this can vary significantly depending on your programming experience. This coursework accounts for 50% of the final mark for this unit.

1. Implement Stochastic Gradient Descent (SGD) to solve Linear Regression problems. In particular, implement SGD to solve the following learning problems.

   (a) Linear Regression without regularisation.

   (b) Linear Regression with ℓ2-regularisation, i.e., Ridge Regression.

   (c) Linear Regression with ℓ1-regularisation. Here, the regularisation penalty for a hypothesis w ∈ ℝ^d is given by λ‖w‖1, where λ ∈ ℝ+ and ‖w‖1 = ∑_{i=1}^d |w_i| is the ℓ1-norm of w.

   You should test your implementation on at least three different suitable datasets, highlighting the effects (or lack thereof) of regularisation. Explicitly write down your derivations of the SGD updates for all three problems. [45]

2. Construct a dataset for Linear Regression with dimension d = 100 and number of training samples m = 1000 for which non-regularised Linear Regression performs poorly, but Ridge Regression performs well. Explain why that happens. To answer this question you should use the implementation of Linear and Ridge Regression provided in the file lslr.py seen in class and available on Moodle. [10]

3. A friend of mine told me that one of the benefits of ℓ1-regularised Linear Regression compared to Ridge Regression is the higher sparsity of its optimal solution, i.e., a smaller number of nonzero entries in the parameter vector corresponding to the optimal solution. Explain intuitively why that should be true and why obtaining sparser solutions might be better in certain scenarios. Can you find evidence of sparser solutions computed by your implementation of ℓ1-regularised Linear Regression? [10]

4. Add early stopping to your implementation of SGD for Linear Regression without regularisation. What is the effect of early stopping? Compare the solutions obtained with early stopping to solutions obtained without it (both for regularised and non-regularised Linear Regression). [15]

5. Implement the kernel trick for Ridge Regression (based on Stochastic Gradient Descent). Write down the pseudocode of your algorithm and how you derived it. Test your implementation on a suitable dataset. [20]

The work you hand in should be done by yourself, without collaboration with others. Cheating is a serious offence.

If exceptional circumstances (e.g., disability or illness) prevent you from completing this assignment by the deadline, make sure you follow the procedures at https://moodle.bath.ac.uk/course/view.php?id=1757&section=2#support2 and contact me by email at
[email protected].
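As a starting point for the kind of SGD update that question 1 asks you to derive, the sketch below shows one possible per-sample update for the squared loss, with the ridge gradient λw and a lasso subgradient λ·sign(w) added on top. This is only an illustrative sketch, not a model solution: the function name `sgd_fit`, all hyperparameter choices, and the tiny synthetic dataset are our own assumptions, and you are still expected to derive and justify the updates yourself.

```python
import numpy as np

def sgd_fit(X, y, reg=None, lam=0.1, lr=0.01, epochs=50, seed=0):
    """Illustrative SGD for least-squares linear regression.

    reg: None (plain), "l2" (ridge), or "l1" (lasso via a subgradient).
    Per-sample loss: (w @ x - y)^2 / 2, with gradient (w @ x - y) * x.
    """
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(m):          # one pass over shuffled samples
            grad = (X[i] @ w - y[i]) * X[i]   # gradient of the data term
            if reg == "l2":
                grad = grad + lam * w              # ridge adds lam * w
            elif reg == "l1":
                grad = grad + lam * np.sign(w)     # lasso subgradient lam * sign(w)
            w = w - lr * grad
    return w

# tiny synthetic sanity check: noiseless target y = 2*x0 - 1*x1
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0])
w = sgd_fit(X, y)  # without regularisation, w should approach [2, -1]
```

On a noiseless, realisable problem like this toy one, unregularised SGD recovers the true weights; with `reg="l2"` or `reg="l1"` and a nonzero `lam`, the returned weights are shrunk towards zero, which is one simple way to start exhibiting the effects of regularisation asked for in question 1.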
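For question 5, one way to kernelise SGD for Ridge Regression is to keep the weight vector implicitly as w = Σ_i α_i φ(x_i) and translate the primal update w ← (1 − lr·λ)w − lr·r·φ(x_i) into an update on the dual coefficients α. The sketch below follows that idea; it is an illustrative sketch only, with our own names (`kernel_ridge_sgd`, `rbf_kernel`), an assumed Gaussian kernel, and an assumed toy dataset, not the required pseudocode or derivation.

```python
import numpy as np

def kernel_ridge_sgd(K, y, lam=0.01, lr=0.01, epochs=300, seed=0):
    """Illustrative kernelised SGD for ridge regression.

    Maintains dual coefficients alpha, representing w = sum_i alpha_i * phi(x_i).
    For a sampled index i with residual r = K[i] @ alpha - y[i], the primal step
        w <- (1 - lr*lam) * w - lr * r * phi(x_i)
    becomes, entirely in terms of kernel evaluations:
        alpha <- (1 - lr*lam) * alpha;  alpha[i] -= lr * r
    """
    rng = np.random.default_rng(seed)
    m = K.shape[0]
    alpha = np.zeros(m)
    for _ in range(epochs):
        for i in rng.permutation(m):
            r = K[i] @ alpha - y[i]       # prediction on sample i, minus target
            alpha *= (1.0 - lr * lam)     # shrinkage from the ridge penalty
            alpha[i] -= lr * r            # data-term step on the sampled point
    return alpha

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# fit a nonlinear target y = x^2 on a small 1-D dataset
X = np.linspace(-1, 1, 40).reshape(-1, 1)
y = X[:, 0] ** 2
K = rbf_kernel(X, X)
alpha = kernel_ridge_sgd(K, y, lam=0.001)
pred = K @ alpha  # fitted values on the training inputs
```

Note that the algorithm only ever touches the Gram matrix K, never the feature map itself, which is the point of the kernel trick; predictions at a new point x reduce to Σ_i α_i k(x_i, x).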