Page 1 of 7

Exam information

Course code and name: STAT2004 Statistical Modelling and Analysis
Semester: Semester 2, 2020
Exam type: Online, non-invigilated
Exam date and time: Please refer to your personalised timetable.
Exam duration: You have a 12-hour window in which you must complete your exam. You can access and submit your exam at any time within the 12 hours. Even though you have the entire 12 hours to complete and submit your exam, the expectation is that it will take most students between 2 and 2.5 hours.
Reading time: Reading time has not been formally allocated for online exams; however, students are encouraged to review and plan their approach for the exam before they start. The total exam time should be sufficient to do this.
Exam window: You must commence your exam during the time listed in your personalised timetable. The exam will remain open only for the duration of the exam.
Weighting: This exam is weighted at 70% of your total mark for this course.
Permitted materials: This is an open book exam – you may use any course resource, including lecture notes, reference notes, statistical software and online resources.
Instructions: Answer all the questions. Note that not all questions are worth the same number of marks, so allocate your time appropriately for each question. Write or type your answers on blank paper, and clearly label each solution so that it is clear which question it answers. You must submit your answers as a single PDF file through Blackboard before the end of the allowed time. For questions requiring an audio response, upload these as separate audio files via Blackboard before the end of the allowed time. Include your name and student number on the first page of the file that you submit.
Who to contact: If you have any concerns or queries about a particular question, or need to make any assumptions to answer the question, state these at the start of your solution to that question.
You may also include any queries you would have raised about a particular question had you been able to ‘raise your hand’ in an examination room.

If you experience any technical difficulties during the exam, contact the Library AskUs service for advice (open 7am–10pm, 7 days a week, Brisbane time):
Chat: https://support.my.uq.edu.au/app/chat/chat_launch_lib/p/45
Phone: +61 7 3506 2615
Email: [email protected]

You should also ask for an email documenting the advice provided so that you can produce it on request.

In the event of a late submission, you will be required to submit evidence that you completed the exam in the time allowed. We recommend that you use a phone camera to take photos (or a video) of every page of your exam. Ensure that the photos are time-stamped. If you submit your exam after the due time, send details (including any evidence) to SMP Exams ([email protected]) as soon as possible after the end of the exam.

Important exam condition information

The normal academic integrity rules apply. You cannot cut-and-paste material other than your own work as answers. You are not permitted to consult any other person – whether directly, online, or through any other means – about any aspect of this assessment during the period that this assessment is available. If it is found that you have given or sought outside assistance with this assessment, that will be deemed to be cheating and will result in disciplinary action.

By undertaking this online assessment, you will be deemed to have acknowledged UQ’s academic integrity pledge and to have made the following declaration: “I certify that my submitted answers are entirely my own work and that I have neither given nor received any unauthorised assistance on this assessment item”.

STAT2004 - Statistical Modelling and Analysis
Semester Two Final Examinations, 2020

Question 1 (9 marks)

In an exciting scene¹ from the 1980 space-opera film “Flash Gordon”, two protagonists take turns putting their arms into different holes of a large tree stump, where the wood beast lives, to see who is subjected to its fatal sting. In a twist on this competition, the sting may take two days to kill, so the game continues even if a sting is made. Suppose that there are nine holes in this stump, all of which look identical except that three of them lead to a fatal sting. You are to take on the role of one of the protagonists.

(a) (1 mark) Suppose you go first. What is the probability that you do not get stung on your first turn?

(b) (1 mark) Suppose you went first and did not get stung. What is the probability that your competitor also does not get stung on their first turn, given that you did not get stung on your first turn?

(c) (1 mark) Suppose you went first and got stung.
What is the probability that your competitor does not get stung on their first turn, given that you got stung on your first turn?

(d) (2 marks) Suppose that we stop the game after each player has taken exactly one turn. Combining your results from parts (a)–(c), or otherwise, show that the probability that you do not get stung is the same regardless of whether you go first or second.

(e) (4 marks) Now suppose we keep playing until every hole has been tried exactly once. Would it be better to go first or second? Justify your answer using probability calculations.

¹ https://www.youtube.com/watch?v=80sCD2p0W1Q

Question 2 (10 marks)

Suppose we tossed a fair coin $n$ times, obtaining $x = 16$ heads. However, we later forgot the number of times we tossed the coin! So $n$ is an unknown value here, and we would like to estimate it based on our observed outcome of $x = 16$ heads.

(a) (2 marks) Write down the likelihood function for the parameter $n$ given the observed data $x = 16$.

(b) (2 marks) Show that this likelihood function is maximised at both $\hat{n} = 31$ and $\hat{n} = 32$.

(c) (2 marks) [Audio question:] Give an intuitive explanation as to why the likelihood is maximised at two values here.

(d) (2 marks) Let $X \sim \text{Binom}(n, 0.5)$. Find the largest $n$ value such that $P(X \ge 16) \le 0.05$ and the smallest $n$ value such that $P(X \le 16) \le 0.05$.

(e) (2 marks) Using your results from part (d), or otherwise, construct and interpret a 90% confidence interval for $n$ based on the observed data $x = 16$.

Question 3 (17 marks)

Let $Z_1, Z_2, \ldots$ be independent and identically distributed copies of $Z$, where $Z$ is a Gaussian random variable with mean 0 but some unknown variance $\sigma^2$. For each $n$, define a random variable $S_n$ by
$$S_n = \sum_{i=1}^{n} Z_i^2.$$

(a) (3 marks) Evaluate $E(Z^2)$ and $E(Z^4)$. (Hint: moment generating functions may be helpful here.)

(b) (4 marks) Let $Y = Z^2$.
Find the probability density function for $Y$, and calculate its mean and variance.

(c) (4 marks) Using the Central Limit Theorem, or otherwise, show that for every constant $k$ the probability
$$P\left(n - k\sqrt{2n} \le S_n/\sigma^2 \le n + k\sqrt{2n}\right)$$
tends to a limit as $n \to \infty$.

(d) (4 marks) Using the result from part (c), or otherwise, construct an approximate 95% confidence interval for the population variance $\sigma^2$ based on the following sample of 24 draws from a $N(0, \sigma^2)$ distribution:

    z=c(-1.0772249, -3.8338972, -3.9409255, 0.7811483, -0.0734284, 4.2074467,
        0.1049149, -2.8546731, 4.3560279, -0.8312923, 2.5316146, 1.1064087,
        -0.9579516, -1.7817389, -2.7417048, -1.2759260, 0.4189448, -2.4035361,
        -1.5782865, 4.3048260, 0.6618659, -3.7423525, 1.5266717, 3.7929504)

Feel free to copy and paste this dataset into R (or any other language of your choice).

(e) (2 marks) [Audio question:] Give an interpretation of the confidence interval you computed in part (d) in a way that is understandable for another STAT2004 student.

Question 4 (17 marks)

Suppose that the waiting time $X$ between buses on a certain route can be modelled by an exponential distribution with pdf given by
$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0,$$
where $\lambda > 0$ is an unknown frequency parameter. The bus company, LinksTran, claims that buses on this route arrive once every 12 minutes on average, but a customer suspects that the true average waiting time is longer than this. To investigate this dispute, the customer turns up to her local bus stop at 14 randomly selected times during a week, and records the waiting times $X_1, X_2, \ldots, X_{14}$ for buses on this route to arrive.

(a) (1 mark) Briefly explain why the null and alternative hypotheses here are $H_0: \lambda = 1/12$ and $H_1: \lambda = \lambda_1 < 1/12$, respectively.

(b) (1 mark) Write down the likelihood function for $\lambda$ given the data $\mathbf{X} = (X_1, X_2, \ldots, X_{14})^\top$.
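For a numerical check on part (b), the sketch below evaluates the exponential log-likelihood for independent waiting times. Taking logs of $\prod_i \lambda e^{-\lambda x_i}$ gives $\ell(\lambda) = n \log\lambda - \lambda \sum_i x_i$, and the closed-form maximiser is $\hat\lambda = n / \sum_i x_i$. This is a minimal Python sketch that uses only the standard library. The function names and the sample values are hypothetical and are not part of the exam.

```python
import math
from typing import Sequence


def exp_log_likelihood(lam: float, x: Sequence[float]) -> float:
    """Log-likelihood of i.i.d. exponential data with rate lam.

    Taking logs of prod_i lam * exp(-lam * x_i) gives n*log(lam) - lam*sum(x).
    """
    if lam <= 0:
        raise ValueError("the rate lam must be positive")
    if any(xi < 0 for xi in x):
        raise ValueError("waiting times must be non-negative")
    return len(x) * math.log(lam) - lam * sum(x)


def exp_rate_mle(x: Sequence[float]) -> float:
    """Rate maximising exp_log_likelihood: n / sum(x), i.e. 1 / mean(x)."""
    total = sum(x)
    if total <= 0:
        raise ValueError("need a positive total waiting time")
    return len(x) / total


if __name__ == "__main__":
    # Hypothetical waiting times in minutes, for illustration only.
    sample = [10.0, 14.0, 12.0]
    lam_hat = exp_rate_mle(sample)  # 3 / 36, i.e. 1/12
    print(lam_hat)
    # The MLE gives a log-likelihood at least as large as any other rate.
    print(exp_log_likelihood(lam_hat, sample) >= exp_log_likelihood(1 / 24, sample))
```

Evaluating `exp_log_likelihood` at $\lambda = 1/12$ and at an alternative $\lambda_1$ gives two terms. Their difference is the log of the likelihood ratio considered in part (c).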
(c) (5 marks) Construct a likelihood ratio test for testing $H_0: \lambda = 1/12$ against $H_1: \lambda = \lambda_1 < 1/12$, and show that it reduces to a test with rejection region
$$\{X_1 + X_2 + \cdots + X_{14} \ge d\}$$
for some critical value $d$.

(d) (2 marks) [Audio question:] Explain to a LinksTran operations officer what the test in part (c) is doing and why this test makes intuitive sense.

It is given to you that the sum $Y = X_1 + X_2 + \cdots + X_{14}$ of fourteen independent exponential($\lambda$) random variables has a gamma distribution with shape = 14, rate = $\lambda$, and pdf given by
$$f(y) = \frac{\lambda^{14}}{\Gamma(14)}\, y^{13} e^{-\lambda y}, \quad y \ge 0.$$
You do not have to show this result – it is given to you as a fact.

(e) (2 marks) Using the pgamma function in R, or otherwise, show that setting the critical value $d$ in the test to be 248.023 minutes controls the Type I error at 5%.

(f) (2 marks) Suppose that the true rate of buses on this route is in fact once every 24 minutes. Using the pgamma function in R, or otherwise, compute the power of this test.

(g) (1 mark) Is this test the most powerful at the 5% level for testing $H_0: \lambda = 1/12$ against $H_1: \lambda < 1/12$? Briefly explain why, or why not.

The observed waiting times of buses for these 14 randomly selected times were:

    x=c(35.441696, 7.690213, 2.786401, 32.883485, 1.700997, 12.671055, 20.412970,
        28.720957, 24.971242, 2.974075, 14.157260, 23.623562, 12.801789, 30.215812)

Feel free to copy and paste this dataset into R (or any other language of your choice).

(h) (1 mark) Decide between the two hypotheses from part (a) based on the observed data.

(i) (2 marks) Supplement your decision from part (h) by computing a p-value. You may use the pgamma function in R, or otherwise, to help you answer this question.

Question 5 (17 marks)

A paper manufacturer is interested in improving their product’s tensile strength.
The engineering team at this manufacturer believes that tensile strength is a function of the hardwood concentration in the pulp, and that the range of hardwood concentrations of practical interest is between 5% and 20%. This team of engineers decides to investigate four levels of hardwood concentration: 5%, 10%, 15%, and 20%. They decide to make up six test specimens at each concentration level by using a pilot plant. All 24 specimens were then tested on a laboratory tensile tester in random order. A side-by-side boxplot of the tensile strengths for each hardwood concentration is given below:

[Figure: side-by-side boxplots of tensile strength for each hardwood concentration]

A one-way analysis of variance (ANOVA) is used to test whether there are systematic differences between the tensile strengths across the four different hardwood concentrations.

(a) (3 marks) Write down the underlying statistical model for a one-way ANOVA, including a full description of all the parameters and any distributional assumptions required by the model.

(b) (2 marks) State the appropriate null and alternative hypotheses for this scenario.

(c) (3 marks) Show that the ANOVA model from part (a) can be written in the linear model form $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ by clearly defining each of the terms $\mathbf{y}$, $X$, $\boldsymbol{\beta}$ and $\boldsymbol{\varepsilon}$ here.

A partially-complete one-way ANOVA table for these data is given below:

    Source        df    SS        MS    F    Pr(F)
    Hardwood %          382.79
    Error
    Total               512.96

(d) (3 marks) By completing the above ANOVA table, or otherwise, make a decision between the null and alternative hypotheses. Support your decision by computing and interpreting a p-value, and write your conclusions in a way that is understandable to the engineering team.

Using their knowledge of physics, the engineering team proposes a square-root law for the tensile strength of paper as a function of hardwood concentration:
$$\text{Tensile strength} = \beta \times \sqrt{\text{Hardwood concentration}} + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2).$$
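The square-root law is a regression through the origin with the single regressor $u_i = \sqrt{c_i}$, where $c_i$ is the hardwood concentration. Its least-squares estimate therefore has the closed form $\hat\beta = \sum_i u_i y_i / \sum_i u_i^2$. Because only one coefficient is estimated, the residual standard error uses $n - 1$ degrees of freedom, which is 23 for 24 specimens, consistent with the "23 degrees of freedom" in the R output below. The following is a minimal Python sketch of that fit using only the standard library. The exam gives only summaries, not the raw tensile strengths, so the function takes the data as arguments and no study values are assumed. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_sqrt_law(conc: Sequence[float], strength: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of strength = beta * sqrt(conc) + error, with no intercept.

    Returns (beta_hat, residual_standard_error). The residual standard error
    is sqrt(RSS / (n - 1)) because only one coefficient is estimated.
    """
    if len(conc) != len(strength) or len(conc) < 2:
        raise ValueError("need at least two paired observations")
    root_c = [math.sqrt(c) for c in conc]
    # Normal equation for a single regressor through the origin:
    # beta_hat = sum(u_i * y_i) / sum(u_i ** 2), with u_i = sqrt(c_i).
    beta_hat = sum(u * y for u, y in zip(root_c, strength)) / sum(u * u for u in root_c)
    rss = sum((y - beta_hat * u) ** 2 for u, y in zip(root_c, strength))
    rse = math.sqrt(rss / (len(conc) - 1))
    return beta_hat, rse
```

Suppose the 24 specimens were supplied as `conc` (5, 10, 15 or 20) and `strength`, and the R output came from the same no-intercept model. The two returned values should then correspond to the `Estimate` and `Residual standard error` lines of that output.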
(e) (2 marks) Show that the square-root model for this dataset can be written as a constrained version of the ANOVA model from part (a). Clearly describe how the ANOVA model parameter(s) are being constrained, and identify the number of free parameters in the two models.

The fitted square-root model is given in the output below:

    Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
    sqrt(hardwood.conc)   4.6481     0.1443   32.22   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.499 on 23 degrees of freedom
    Multiple R-squared:  0.9783,  Adjusted R-squared:  0.9774
    F-statistic:  1038 on 1 and 23 DF,  p-value: < 2.2e-16

(f) (3 marks) Using the given output and the completed ANOVA table from part (d), or otherwise, test whether the square-root law provides an adequate fit to the data. Do this by stating the null and alternative hypotheses, computing a test statistic, and finding a p-value for your test.

(g) (1 mark) [Audio question:] Explain the conclusions from your test in part (f) in a way that is understandable to the engineering team.

END OF EXAMINATION