TRINITY COLLEGE DUBLIN
School of Computer Science and Statistics
Final Assignment 2020-21 STU33009: Statistical Methods for Computer Science
Submitting Your Report
• Reports must be typed (no handwritten answers please) and submitted
on Blackboard.
• As a guideline, reports should be about 5 pages in length including all
plots (please don’t go a lot over this).
• You will need to use MATLAB to calculate values from the assignment
dataset, or alternatively write a short program in Python to do this. In
either case give the code used as an appendix to the report (it doesn’t
count towards the page limit), but please keep the code short.
• In order to obtain full credit it is essential that you explain/justify how you
obtained your results and, where appropriate, that you critically reflect
upon them. Simply giving raw numbers as answers will receive few marks
as will saying “see code for details” and the like, even if the code contains
explanatory comments.
• It is mandatory to complete the declaration that the work is entirely your
own and you have not collaborated with anyone - the declaration form is
available on Blackboard.
Downloading Dataset
• Download the assignment dataset from
https://www.scss.tcd.ie/doug.leith/ST3009/final2021.php. Important: You must fetch
your own copy of the dataset, do not use the dataset downloaded by someone else.
• The data file consists of three columns of COVID testing data. The first column is the
number of people tested for COVID, the second column the number testing positive
and the third column is the number of people presenting with significant symptoms.
Each row corresponds to one week and the numbers reported are cumulative (so the
value in the first column of row i is the total number of people tested up to and including
week i). Please cut and paste the first line of the data file (which begins with a #)
into your report as it identifies your dataset.
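The file layout described above can be read with a few lines of Python. This is a minimal sketch, not a required approach: it assumes the columns are separated by whitespace (pass delimiter="," to np.loadtxt if your file uses commas), and the in-memory example_file with its made-up numbers is a hypothetical stand-in for the file you downloaded.

```python
import io
import numpy as np

# Hypothetical stand-in for the downloaded file: a '#' header line followed by
# three cumulative columns (tested, positive, symptomatic). Numbers are made up.
example_file = io.StringIO(
    "# made-up header line identifying the dataset\n"
    "100 5 2\n"
    "250 14 6\n"
    "480 30 11\n"
)

# np.loadtxt treats lines starting with '#' as comments and skips them.
data = np.loadtxt(example_file)

# Last row: cumulative totals up to and including the final week.
N, P, S = data[-1]

# Weekly increments: first week's count, then successive row differences.
weekly = np.diff(data, axis=0, prepend=0)

print(N, P, S)
print(weekly)
```

To use it on real data, replace example_file with the path of your own download. The header line is skipped by the loader, so it still has to be copied into the report by hand.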
Assignment
1. In the first part of the assignment you’ll work with just the last row of data in the file
you downloaded. That has three values: the number N of people tested for COVID, the
number P testing positive and the number S with significant symptoms.
(a) A key concern with COVID is that people may be infected but show no significant
symptoms. Assuming that the people tested are drawn uniformly at random from
the population, use your data to estimate the fraction of the population expected to
(i) test positive for COVID but have no significant symptoms and (ii) test positive
and have significant symptoms. Explain/discuss your calculations. [5 marks]
(b) Estimate a confidence interval for each of your two estimates in part (a). Explain/discuss
your calculation. [5 marks]
(c) Is it important to assume that the people tested are drawn uniformly at random?
How might it affect your estimates if this isn’t the case? [5 marks]
(d) For people without significant symptoms the COVID test used has a false positive
rate of 0.01 and a false negative rate of 0.1. That is, if you don’t have COVID there
is a 0.01 probability that the test will incorrectly give a positive result, while if you
do have COVID (but have no symptoms) there’s a 0.1 probability that the test will
incorrectly give a negative result. Use this information, combined with your estimate
from part (a)(i), to estimate the fraction of the population that have COVID (rather than
just testing positive) but have no significant symptoms. Hint: Use marginalisation.
[5 marks]
(e) Estimate a confidence interval for your estimate in part (d). Explain/discuss your
calculation. [5 marks]
(f) Given that you test positive for COVID but have no significant symptoms, estimate
the probability that you actually have the disease. Hint: Use Bayes Rule. [5 marks]
(g) Using MATLAB (or Python), write a short stochastic simulation of the setup you have
just analysed. Namely, for each person in the population there is a probability of
catching COVID but showing no symptoms, and a probability of catching COVID and
showing significant symptoms. If a person shows symptoms then when tested this will
come up positive but if they have no symptoms there is a high probability that they
will test positive but also a small probability that they test negative. Explain/discuss
your code. [10 marks]
(h) Using this simulation, estimate the probability that a person (i) tests positive for
COVID but has no significant symptoms and (ii) tests positive and has significant
symptoms. Compare with your estimates in part (a) and discuss. [10 marks]
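One possible reading of Question 1 can be sketched in Python as follows. It is an illustration under stated assumptions, not a model answer: it assumes every symptomatic person tests positive (as in part (g)), so that S/N estimates "positive with symptoms" and (P - S)/N estimates "positive without symptoms"; it assumes only people with COVID show symptoms; it uses a normal-approximation interval, which is only one of several valid choices; and it lets people without COVID test positive with probability 0.01, as in part (d). The function names analyse and simulate and the numbers in the example row are made up.

```python
import math
import numpy as np

FP, FN = 0.01, 0.1  # false positive / false negative rates quoted in part (d)

def analyse(N, P, S, z=1.96):
    """Point estimates from one row (N tested, P positive, S symptomatic)."""
    s = S / N        # positive AND symptoms (assumes symptomatic always test positive)
    q = (P - S) / N  # positive AND no symptoms
    # Normal-approximation 95% half-widths for the two proportions.
    half_q = z * math.sqrt(q * (1 - q) / N)
    half_s = z * math.sqrt(s * (1 - s) / N)
    # Marginalise over COVID status among people without symptoms:
    #   q = (1 - FN) * c + FP * ((1 - s) - c),  c = P(COVID and no symptoms)
    c = (q - FP * (1 - s)) / (1 - FN - FP)
    # Bayes' rule: P(COVID | positive, no symptoms).
    posterior = (1 - FN) * c / q
    return dict(s=s, q=q, half_q=half_q, half_s=half_s, c=c, posterior=posterior)

def simulate(n, c, s, seed=0):
    """Simulate n people; return fractions (positive & no symptoms, positive & symptoms)."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)                       # decides each person's health state
    covid_no_sym = u < c
    covid_sym = (u >= c) & (u < c + s)
    healthy = u >= c + s
    v = rng.random(n)                       # decides each person's test outcome
    positive = covid_sym | (covid_no_sym & (v < 1 - FN)) | (healthy & (v < FP))
    return np.mean(positive & ~covid_sym), np.mean(positive & covid_sym)

# Made-up last row, for illustration only.
est = analyse(N=10_000, P=600, S=200)
sim_q, sim_s = simulate(200_000, est["c"], est["s"])
print(est)
print(sim_q, sim_s)
```

The simulated fractions should land close to q and s, which is the comparison part (h) asks for. For part (e), one simple option is to push the endpoints of the interval for q through the same formula used for c, keeping in mind that this ignores the uncertainty in s.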
2. In this part of the assignment you’ll use the full dataset that you downloaded. Let xk
be the number of infected people during week k. Assuming growth is exponential then
xk = e^{ak} x0
where a is a growth parameter and x0 is the initial number of people infected. When a < 0
the infection decays over time; when a > 0 it grows. Taking logs,
log xk = ak + log x0
and so if a is a constant then we expect a plot of log xk to be a straight line with slope a.
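The straight-line claim can be checked numerically on synthetic data. In this small sketch the values a_true and x0_true are made up, and the series is noise-free, so the week-to-week differences of log xk all equal a exactly; real data will only approximately follow this.

```python
import numpy as np

# Made-up parameters for a noise-free exponential series.
a_true, x0_true = 0.3, 50.0
k = np.arange(10)
x = x0_true * np.exp(a_true * k)

log_x = np.log(x)

# Consecutive differences of log xk recover the slope a.
slopes = np.diff(log_x)

# Rough slope estimate from the two endpoints of the log plot.
rough_a = (log_x[-1] - log_x[0]) / (k[-1] - k[0])

print(slopes)
print(rough_a)
```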
(a) Plot the number of people testing positive vs time and also plot the logarithm of
the number of people testing positive vs time. Discuss and, if appropriate, roughly
estimate growth parameter a. [5 marks]
(b) Write a short piece of MATLAB (or Python) code that trains a linear regression model
using gradient descent. You should implement this from scratch (so you’ll need to
calculate the cost function and its gradient, update these etc). Do not use any built
in functions/libraries for linear regression. [10 marks]
(c) Using your code from (b) train a linear regression model on the log xk data and so
estimate growth parameter a and the level of initial infections log x0.
(i) How did you choose the gradient descent step size? Justify your choice (and
present data to back it up). [5 marks]
(ii) A linear regression model makes some statistical assumptions regarding the data.
What are these assumptions? [5 marks]
(iii) Discuss whether the infection data is likely to satisfy or violate any of these
assumptions. [5 marks]
(d) Explain how to use bootstrapping to estimate confidence intervals for the linear
regression estimates of a and log x0. [5 marks]
(e) Now write a short piece of code to implement bootstrapping, and report the confidence
intervals that you obtain. Discuss. [10 marks]
(f) By plugging in a range of values for a that lie within the confidence interval into the
formula xk = e^{ak} x0, estimate a confidence interval for xk when k = 10 weeks. In this
formula just use the value of log x0 (recall x0 = exp(log x0)) that you estimated in
(c), there’s no need to consider a range of x0 values. Explain/discuss. [5 marks]
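The pipeline in parts (b) to (f) can be sketched in Python along the following lines. This is a sketch under assumptions rather than a reference solution: the data are synthetic (made-up slope 0.2 and intercept 3.0), the week index is standardised before gradient descent so that a single step size suits both parameters, and the bootstrap resamples (k, log xk) pairs with replacement and takes percentile intervals. The names fit_line_gd and bootstrap_ci are hypothetical.

```python
import numpy as np

def fit_line_gd(k, y, step=0.1, iters=200):
    """Fit y ~ a*k + b by gradient descent on the mean squared error."""
    mu, sd = k.mean(), k.std()
    z = (k - mu) / sd                  # standardised week index
    a_z, b_z = 0.0, 0.0
    for _ in range(iters):
        resid = a_z * z + b_z - y      # residuals at the current parameters
        a_z -= step * 2 * np.mean(resid * z)   # gradient of MSE w.r.t. a_z
        b_z -= step * 2 * np.mean(resid)       # gradient of MSE w.r.t. b_z
    a = a_z / sd                       # map back to the original k scale
    return a, b_z - a * mu

def bootstrap_ci(k, y, n_boot=300, alpha=0.05, seed=0):
    """Percentile intervals for (a, b) from resampling (k, y) pairs."""
    rng = np.random.default_rng(seed)
    n = len(k)
    est = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)       # sample rows with replacement
        est[i] = fit_line_gd(k[idx], y[idx])
    lo, hi = np.percentile(est, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return lo, hi

# Synthetic log-infection data with made-up parameters and noise.
rng = np.random.default_rng(1)
k = np.arange(30, dtype=float)
log_x = 0.2 * k + 3.0 + rng.normal(0, 0.1, size=k.size)

a_hat, b_hat = fit_line_gd(k, log_x)
(lo_a, lo_b), (hi_a, hi_b) = bootstrap_ci(k, log_x)

# Part (f): plug the interval endpoints for a into xk = e^{ak} x0 at k = 10,
# holding log x0 fixed at its point estimate b_hat.
x10_lo, x10_hi = np.exp(lo_a * 10 + b_hat), np.exp(hi_a * 10 + b_hat)

print(a_hat, b_hat)
print((lo_a, hi_a), (lo_b, hi_b))
print(x10_lo, x10_hi)
```

With a standardised input the mean-squared-error cost has the same curvature in both parameters, so a step of 0.1 shrinks the error by a factor of 0.8 per iteration. On unscaled weeks the usable step size is much smaller, which is worth showing with a plot of cost against iteration for a few step sizes when answering (c)(i).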