DEPARTMENT OF STATISTICS
SCHOOL OF MATHEMATICS AND STATISTICS
MATH3851 Experimental Design and Categorical Data
Assignment 2 (Due time: Friday COB Week 10)

The questions are to be done by hand rather than by computer except where otherwise indicated. (You are welcome to check them on the computer.)

Q1. Following an outbreak of food poisoning that occurred after an outing held for personnel of an insurance company, the following data on food eaten were reported:

Crabmeat   Potato salad   Ill   Not ill
Yes        Yes            120   81
Yes        No             4     31
No         Yes            23    24
No         No             2     23

We adopt the following notation: $C$ for crabmeat and $\bar{C}$ for no crabmeat, and similarly $P$, $\bar{P}$ for potato salad and no potato salad, and $I$, $\bar{I}$ for ill and not ill.

a) Estimate the odds ratio $\widehat{OR}_C$ for illness for potato salad eaters versus non-eaters of potato salad among crabmeat eaters.
b) Test at the 5% level of significance $H_0: OR_C = 1$, i.e., no association between eating potato salad and becoming ill among crabmeat eaters.
c) Carry out a test of the hypothesis $H_0: OR_C = OR_{\bar{C}}$ using Woolf's test. Report your findings. Is crabmeat an effect modifier? Give reasons.
d) Use the Mantel-Haenszel method to calculate a summary estimate of the odds ratio for illness among potato salad eaters versus non-eaters of potato salad. In view of your answer in c), is this a valid estimate of a common odds ratio?

Q2. In the 1930s, the eminent British statistician Ronald Fisher had a colleague at Rothamsted Experimentation Station near London who claimed that, when drinking tea, she could distinguish whether milk or tea was added to the cup first. To test the claim, Fisher designed an experiment in which she tasted eight cups of tea. Four cups had milk added first, and the other four had tea added first. She was told that there were four cups of each type, so that she should try to select the four that had milk added first. The cups were presented to her in random order. The table below shows a potential result of the experiment.
                 Guess poured first
Poured first     Milk   Tea   Total
Milk             3      1     4
Tea              1      3     4
Total            4      4     8

a) What are the appropriate null and alternative hypotheses in this experiment?
b) What is the null distribution of the number of correct guesses?
c) Calculate the P-value of the test based on the table above and state the conclusion of the experiment. [Recall that the P-value is defined as the probability of the outcome of the experiment being as unlikely as or more unlikely than the one actually observed, assuming that the null hypothesis is true.]

Q3. The table below summarises a two-year prospective study of the association between religious belief and mortality among an elderly nursing home population. The four strata are: (1) healthy males, (2) ill males, (3) healthy females, and (4) ill females. (Note that the strata simultaneously adjust for gender and state of health.) The data are presented by Lachin (2000) in the following format:

           Religious                         Non-religious
Stratum    Number of deaths   Total number   Number of deaths   Total number
1          4                  35             5                  42
2          4                  21             13                 31
3          2                  89             2                  62
4          8                  73             9                  45

a) Present the data in the form of four 2×2 contingency tables, one for each stratum, with rows representing religious belief and columns representing deaths/survivors.
b) Test for homogeneity of the odds ratios across the four strata, using a 5% significance level.
c) Use the Mantel-Haenszel method to calculate a summary estimate of the odds ratio for mortality among religious versus non-religious residents. Is this a valid estimate of a common odds ratio for the four strata?
d) Calculate the Mantel-Haenszel chi-square statistic and use it to test at the 5% level for an overall association between religious belief and mortality.

Q4. The progeny of a certain mating were classified by a physical attribute into three groups, with counts $n_1$, $n_2$, $n_3$.
According to the genetic model, the probabilities for the three groups are proportional to

$p_1(\theta) : p_2(\theta) : p_3(\theta) = (1 - \theta) : (1 + 2\theta) : (1 - \theta)$,

where $\theta > 0$ is an unknown parameter. A multinomial sample for the three categories is available with $n_1 = 31$, $n_2 = 47$, $n_3 = 22$ (and $n = n_1 + n_2 + n_3 = 100$).

a) Derive the log-likelihood for these data and find the MLE of $\theta$.
b) Find the maximum likelihood estimates of $p_1$, $p_2$ and $p_3$ by substituting the given observed frequencies.
c) Maximum likelihood theory tells us that, under certain regularity conditions, the asymptotic variance of the MLE $\hat{\theta}$ is

$\left\{ E\left[ -\dfrac{\partial^2 l}{\partial \theta^2} \right] \right\}^{-1}$.

Use this result to find an asymptotic estimate of the standard error of $\hat{\theta}$. Hence construct an asymptotic 95% confidence interval for the parameter $\theta$. [Recall that for the multinomial distribution, $E(n_j) = n p_j$, where $p_j$ is the probability of belonging to category $j$. Here $j = 1, 2, 3$.]
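Since the hand calculations in Q1 to Q3 may be checked on the computer, here is a minimal Python sketch for that purpose (standard library only). All function names are hypothetical. It assumes each stratum is entered as a 2×2 table (a, b, c, d) with rows = exposure (potato salad eaten, or religious) and columns = outcome (ill, or died), so that ad/(bc) is the odds ratio of interest. For Q2 it enumerates the hypergeometric null distribution of X, the number of milk-first cups among the four she selects, and sums the probabilities of all outcomes no more likely than the observed one, following the P-value definition recalled in Q2 c). Homogeneity is checked with Woolf's inverse-variance-weighted chi-square (the Breslow-Day test is a common alternative, not shown), and the Mantel-Haenszel chi-square is computed without a continuity correction unless one is requested, since texts differ on this choice. Zero cells would cause division by zero and are not handled.

```python
import math
from fractions import Fraction


# ---- Q2: tea-tasting null distribution (hypergeometric) ----
def tea_null_pmf(cups=8, milk_first=4, chosen=4):
    """Exact P(X = k) under H0 (pure guessing), where X is the number of
    milk-first cups among the `chosen` cups she labels milk-first."""
    total = math.comb(cups, chosen)
    return {
        k: Fraction(math.comb(milk_first, k) * math.comb(cups - milk_first, chosen - k), total)
        for k in range(min(milk_first, chosen) + 1)
    }


def p_value_as_unlikely(pmf, observed):
    """Sum P(X = k) over every outcome k that is no more likely than the observed one."""
    p_obs = pmf[observed]
    return sum(p for p in pmf.values() if p <= p_obs)


# ---- Q1 and Q3: stratified 2x2 tables (a, b, c, d) ----
# rows = exposed / unexposed, columns = outcome yes / outcome no
def odds_ratio(a, b, c, d):
    """Sample odds ratio ad/(bc) and Woolf's variance of its logarithm."""
    return (a * d) / (b * c), 1 / a + 1 / b + 1 / c + 1 / d


def woolf_homogeneity(tables):
    """Woolf's chi-square for H0: all stratum odds ratios are equal (df = K - 1)."""
    logs, weights = [], []
    for table in tables:
        or_hat, var_log = odds_ratio(*table)
        logs.append(math.log(or_hat))
        weights.append(1 / var_log)
    pooled = sum(w * x for w, x in zip(weights, logs)) / sum(weights)
    stat = sum(w * (x - pooled) ** 2 for w, x in zip(weights, logs))
    return stat, len(tables) - 1


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio: sum(ad/n) / sum(bc/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def mantel_haenszel_chisq(tables, continuity=False):
    """Mantel-Haenszel chi-square (df = 1) for H0: no association in any stratum."""
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    diff = max(abs(observed - expected) - (0.5 if continuity else 0.0), 0.0)
    return diff ** 2 / variance


pmf = tea_null_pmf()
print("Q2 null pmf:", {k: str(p) for k, p in pmf.items()})
print("Q2 P-value:", p_value_as_unlikely(pmf, observed=3))

q1 = [(120, 81, 4, 31), (23, 24, 2, 23)]  # crabmeat eaters C, then non-eaters C-bar
# Q3 rows as printed: (religious deaths, religious total, non-religious deaths, non-religious total)
q3_raw = [(4, 35, 5, 42), (4, 21, 13, 31), (2, 89, 2, 62), (8, 73, 9, 45)]
q3 = [(dr, tr - dr, dn, tn - dn) for dr, tr, dn, tn in q3_raw]  # deaths, survivors

for name, tables in (("Q1", q1), ("Q3", q3)):
    stat, df = woolf_homogeneity(tables)
    print(name, "stratum ORs:", [round(odds_ratio(*t)[0], 3) for t in tables])
    print(name, "Woolf chi-square:", round(stat, 3), "on", df, "df")
    print(name, "MH odds ratio:", round(mantel_haenszel_or(tables), 3))
    print(name, "MH chi-square:", round(mantel_haenszel_chisq(tables), 3))
```

The chi-square statistics can be compared with the 5% critical values 3.841 (1 df) and 7.815 (3 df). For a single stratum, mantel_haenszel_chisq([table]) equals (n − 1)/n times the Pearson chi-square, which gives one way to check Q1 b).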
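Q4 asks for an analytic derivation; as a numerical cross-check only, the following sketch maximises the multinomial log-likelihood l(θ) = Σ_j n_j log p_j(θ) directly. It substitutes golden-section search (a generic one-dimensional optimiser, not a method named in the assignment) for the calculus, and computes the expected information as n Σ_j p_j'(θ)² / p_j(θ), which is what E[−∂²l/∂θ²] reduces to once E(n_j) = n p_j is substituted and Σ_j p_j''(θ) = 0 is used. The function names are hypothetical, the search interval (0, 1) is an assumption combining θ > 0 from the question with 1 − θ > 0 so that p_1 and p_3 stay positive, and z = 1.96 is used for the 95% interval.

```python
import math


def cell_probs(theta):
    """Hypothetical helper: p_j(theta) proportional to (1 - theta) : (1 + 2 theta) : (1 - theta)."""
    weights = [1 - theta, 1 + 2 * theta, 1 - theta]
    total = sum(weights)
    return [w / total for w in weights]


def log_lik(theta, counts):
    """Multinomial log-likelihood, dropping the constant log(n! / prod n_j!)."""
    return sum(n_j * math.log(p_j) for n_j, p_j in zip(counts, cell_probs(theta)))


def golden_max(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximiser of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximiser lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:  # maximiser lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def expected_info(theta, n, h=1e-6):
    """n * sum_j p_j'(theta)^2 / p_j(theta), with p_j' from central differences."""
    p = cell_probs(theta)
    up, down = cell_probs(theta + h), cell_probs(theta - h)
    return n * sum(((u - v) / (2 * h)) ** 2 / p_j for u, v, p_j in zip(up, down, p))


counts = [31, 47, 22]
n = sum(counts)
# Search interval (0, 1): theta > 0 as stated, and 1 - theta > 0 keeps p_1, p_3 positive.
theta_hat = golden_max(lambda t: log_lik(t, counts), 1e-9, 1 - 1e-9)
p_hat = cell_probs(theta_hat)
se = 1 / math.sqrt(expected_info(theta_hat, n))
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print("theta_hat =", round(theta_hat, 6))
print("p_hat =", [round(p, 4) for p in p_hat])
print("SE =", round(se, 5), "95% CI =", tuple(round(x, 4) for x in ci))
```

If the closed-form MLE from a) is correct, it should agree with theta_hat to several decimal places, and 1 / expected_info(theta_hat, n) should match the asymptotic variance derived by hand in c). Because each p_j(θ) is linear in θ here, the central-difference derivative is exact up to rounding.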