STAT2203/7203 (S2-2022): Assignment 03
Due: 21-October-2022 @16:59

1. [5 marks each] We have seen how to simulate from a distribution using the inverse-transform method; see §5.8 of the course notes as well as slide 8/14 of Lecture4-3. Another method for simulating random variables from a given distribution is rejection sampling. This question concerns a particular application of rejection sampling.

Benford’s law is a distribution on the integers $\{1, 2, \ldots, 9\}$ with probability mass function

$f_D(d) = \log_{10}\left(\frac{d+1}{d}\right), \quad d \in \{1, 2, \ldots, 9\}.$

We would like to be able to simulate the random variable $D$ from this distribution. Suppose $X$ has probability mass function

$f_X(x) = \frac{1}{9}, \quad x \in \{1, 2, \ldots, 9\}.$

Conditional on $X = x$, the random variable $Y$ has a Bernoulli($f_D(x)/\log_{10}(2)$) distribution.

(a) Verify that $f_D(x)/\log_{10}(2) \le 1$ for all $x \in \{1, 2, \ldots, 9\}$.
(b) What is the joint probability mass function of $(X, Y)$?
(c) Determine $P(Y = 1)$.
(d) Determine the conditional probability mass function of $X$ given $Y = 1$.
(e) This suggests we can simulate a random variable with probability mass function $f_D$ using the following algorithm:

    Y = 0
    While (Y = 0) {
        Simulate X from a uniform distribution on {1,2,...,9}
        Simulate Y from a Bernoulli distribution with success probability fD(X)/log10(2)
    }
    Return X

In each loop a new pair of random variables $(X, Y)$ is simulated, independent of all previously simulated random variables. Implement this algorithm in R (or any programming language of your choice). You will need to use a while loop. In R, the general form is

    while (cond) {
        expressions
    }

where cond is a length one logical vector.
(f) What is the distribution of the number of pairs of random variables $(X, Y)$ that need to be simulated to simulate a single random variable from Benford’s law?

2. [5 marks each] For the following questions, work out your answers ‘by hand’.
You may still use R (or any other programming language) to obtain probabilities and quantiles from the appropriate distributions and calculate your final answers.

A study investigated whether psychotherapy combined with limited administration of Methylenedioxymethamphetamine (MDMA) can reduce symptoms of post-traumatic stress disorder. Severity of symptoms was measured via the CAPS-IV score, with higher scores indicating more severe symptoms. Forty-eight patients were recruited to the study, with twenty-four patients randomly allocated to each of the two dosage levels (Low – 40 mg, High – 125 mg). The primary outcome was the reduction in CAPS-IV score one month after the end of treatment.

The forty-eight patients at the commencement of the study had an average CAPS-IV score of 81.35 with a sample standard deviation of 17.54. At the end of treatment, the High dose group experienced an average drop in CAPS-IV score of 24.2 with a sample standard deviation of 23.1. The Low dose group experienced an average drop in CAPS-IV score of 12.7 with a sample standard deviation of 19.4.

(a) Determine a 95% confidence interval for the population mean CAPS-IV score of patients at commencement of the study.
(b) Does the data provide evidence that the high dose MDMA treatment is associated with a decrease in mean CAPS-IV score? State the null and alternative hypotheses, and determine the appropriate test statistic and p-value. What do you conclude?
(c) Researchers would like to determine if patients experience a greater decrease in CAPS-IV score with the high dose MDMA treatment than with the low dose MDMA treatment. State the null and alternative hypotheses, and determine the appropriate test statistic and p-value. What do you conclude?
(d) A secondary outcome was whether the patient experienced a drop of 20% or more in CAPS-IV score. In the high dose treatment group 11 patients experienced such a drop in CAPS-IV score.
Construct a 95% confidence interval for the population proportion of patients that would experience a 20% or more drop in CAPS-IV score with the high dose treatment.
(e) In addition to the 11 patients in the high dose treatment group who experienced a 20% or more drop in CAPS-IV score, 6 patients in the low dose treatment group also experienced a 20% or more drop in CAPS-IV score. The researchers would like to test if the population proportion of patients that would experience a 20% or more drop in CAPS-IV score is greater with the high dose treatment than with the low dose treatment. State the null and alternative hypotheses, and determine the appropriate test statistic and p-value. What do you conclude?
(f) Are the assumptions/approximations you used for the analysis in part (e) valid? Justify your answer.

3. [7 marks each] This question concerns the analysis of simple random samples from two populations; the first population having a $N(\mu_1, \sigma_1^2)$ distribution and the second population having a $N(\mu_2, \sigma_2^2)$ distribution. All parameters are unknown.

(a) A researcher wishes to compare the means of the two populations. Based on two simple random samples of equal size from these populations, the researcher constructs the 95% exact confidence intervals for each of the population means. The researcher then makes the following claim: “... as the 95% confidence intervals for the means do not overlap, we can conclude there is moderate evidence suggesting that the true means are different (p < 0.05)”. Justify the researcher’s claim.
(b) Even if the 95% confidence intervals of the population means overlap, it is still possible that the p-value from testing $H_0 : \mu_1 = \mu_2$ against $H_1 : \mu_1 \neq \mu_2$ is less than 0.05. Provide example summary statistics (sample means, sample standard deviations and sample sizes) for which the confidence intervals of the population means overlap but the p-value from the test of the above hypotheses is less than 0.05.
The overlap of the intervals must be more than just the end points of the intervals matching. You must show that your summary statistics have the required property by constructing the confidence intervals of the means and carrying out the hypothesis test.

4. [2 marks each] Exposure to ground level ozone (O3) is believed to impair airway function in healthy individuals. To investigate this, researchers recruited 60 individuals (34 males and 26 females) and had them exercise for one hour on a cycle ergometer while breathing 0.30 parts per million of ozone. The Forced Expiratory Volume (FEV) and Forced Vital Capacity (FVC) of each subject were measured before and after the test and the change recorded as a percentage. The file ozone.csv contains the following variables:

• FVC – Percentage change in Forced Vital Capacity
• FEV – Percentage change in Forced Expiratory Volume

(a) Run a linear regression of Change in FEV% against Change in FVC% using R (or any programming language of your choice) and give the summary output. Produce diagnostic plots for the linear regression fit, namely a scatterplot of residuals against fitted values and a normal quantile plot of residuals. Give these captions and figure numbers and refer to them as needed in later questions.
(b) List the assumptions of the linear regression model. For each, explain whether or not there is evidence that this assumption is violated, based on the diagnostic plots.
(c) A researcher suggests that the linear regression model is not appropriate because the Change in FVC% does not have a normal distribution. Are they correct? Justify your answer.

For the following parts you may assume that the model assumptions hold.

(d) Report a 99% confidence interval for the slope of the linear regression model.
(e) Provide both a 90% prediction interval for the change in FEV% for a healthy individual with a change in FVC of 10% and a 90% confidence interval for the mean change in FEV% for a healthy individual with a change in FVC of 10%. Briefly explain the difference between the two intervals.
(f) Researchers believe that the changes in FEV% and FVC% are both due to a decline in inspiratory capacity, so that the intercept of the regression line should be zero. Is the result of the regression analysis consistent with this belief? State the null and alternative hypotheses, and report the appropriate test statistic and p-value. What do you conclude?
(g) Explain the meaning of the R-squared value in the regression output. [1 mark]

5. [6 marks each] Consider the simple linear regression model as discussed in class, where the observations are modeled as

$Y_i \overset{\text{iid}}{\sim} N(\beta_0^\star + \beta_1^\star x_i, \sigma^2), \quad i = 1, \ldots, n.$

Consider the least squares estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$, for the respective coefficients $\beta_0^\star$ and $\beta_1^\star$.

(a) Show that
$\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$
(b) Show that
$\mathrm{Var}(\hat{\beta}_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right).$

100 marks in total

Note:
• This assignment counts for 20% of the total mark for the course.
• Although not mandatory, it would be greatly appreciated if you could type up your work, e.g., in LaTeX.
• Show all your work and attach your code and all the plots (if there is a programming question).
• Combine your solutions and all additional files, such as your code and numerical results, into one single PDF file.
• Please submit your single PDF file on Blackboard.