QUESTION/ANSWER SHEET ID:....................... STATS 240

Qu. 1. [12 marks]

An experimenter was interested in the effect of adding sodium fluoride (NaF) to blood samples that had been submitted for blood alcohol determination. There was speculation that this process of “salting” caused higher blood alcohol readings. Six volunteer subjects were given alcoholic drinks over a one-hour period to raise their blood alcohol concentrations to between .08% and .10% W/V. Three tubes of blood were taken from each subject, and 0, 5 or 10 mg/ml of sodium fluoride was added to each tube. The blood alcohol concentration of each tube was then measured.

(a) Identify the following elements of this experiment:
(i) The response.
(ii) A treatment factor.
(iii) A blocking factor.
(iv) An experimental unit.

(b) Randomisation is one of Fisher’s three principles of experimental design. Explain how randomisation should have been applied in this experiment.

(c) Suppose that (prior to running the experiment) the experimenter decided that they wanted to increase the amount of replication. They propose that after the sodium fluoride has been added to each tube, the contents of the tube be divided into two parts and the blood alcohol concentration measured on each part. Will this actually increase the number of times each treatment is replicated? Explain your answer.

Qu. 2. [13 marks]

A study was conducted to compare four methods of treating physical discomfort in patients suffering from chronic asthma. The four methods were:
AL: drug A at a low dose,
BL: drug B at a low dose,
AH: drug A at a high dose,
BH: drug B at a high dose.
Notice that these 4 methods can be considered as a factorial arrangement of the treatment factors drug and dose. The experimenter decided to use “time to relief” (measured in minutes) as the response.
Since response to a drug often varies considerably from patient to patient, it was decided to use patients as blocks. It was also thought that the order in which a patient receives the drugs may affect the response. Therefore, a 4 × 4 Latin square design was used, and the following results were obtained:

                       Patient
    Order      1          2          3          4
    1       14 (AH)    30 (AL)    15 (BH)    28 (BL)
    2       25 (AL)    15 (BH)    22 (BL)    21 (AH)
    3       17 (BL)    19 (AH)    26 (AL)    22 (BH)
    4        7 (BH)    22 (BL)    19 (AH)    34 (AL)

(a) The following ANOVA table for these data was produced using R.

    Error: patient
              Df Sum Sq Mean Sq F value    Pr(>F)
    Residuals [A] ******  74.167

    Error: order
              Df Sum Sq Mean Sq F value    Pr(>F)
    Residuals  3    [B]  1.1667

    Error: patient:order
              Df Sum Sq Mean Sq F value    Pr(>F)
    drug       1 ******  100.00   24.00 0.0027137 **
    dose       1 ******  324.00     [C] 0.0001181 ***
    drug:dose [D] ******    9.00    2.16 0.1920392
    Residuals [E] ******    4.17

Calculate the values of A–E that are missing from the ANOVA table (show any working).
A =
B =
C =
D =
E =

(b) Tables of treatment means are:

    > model.tables(comfort.aov, "means")
    Tables of means
    Grand mean

    21

     drug
    drug
       A    B
    23.5 18.5

     dose
    dose
       H    L
    16.5 25.5

     drug:dose
        dose
    drug H     L
       A 18.25 28.75
       B 14.75 22.25

Based on the ANOVA table in (a) and the output from model.tables, what conclusions do you draw concerning each of the following? (Briefly justify your answers.)
(i) The impact of the treatment factors on the response.
(ii) The usefulness of blocking on patient and of blocking on order.

(c) Suppose that the experiment had used twelve patients instead of four. The patients were divided into three groups of four, and for each group a 4 × 4 Latin square was used to block on both patient and order.
(i) Would the blocking structure for this design be squares*patient*order, (squares/patient)*order, (squares/order)*patient, or squares/(patient*order)? Explain your answer.
(ii) For the blocking structure you chose in (i), write down the set of error strata that would occur in the ANOVA table (assume that the blocking factors do not interact) and give the Error degrees of freedom for each stratum.
Note: you can get full marks for this part even if you select the wrong structure in part (i).

Qu. 3. [13 marks]

“Green manure” is a cover crop sown on an agricultural plot in order to fertilize the soil for the following crop. The following data come from an experiment that compared the effects of four green manure crops (Fallow, Barley, Vetch, Barley plus Vetch) on the yields of subsequently planted sugar beets at two levels of nitrogen fertiliser (none, 120 lb/acre).

                               Green manure crop
    Block  Nitrogen         F       B       V      BV
    1      none          13.8    15.5    21.0    18.9
           120 lb/acre   19.3    22.2    25.3    25.9
    2      none          13.5    15.0    22.7    18.3
           120 lb/acre   18.0    24.2    24.8    26.7
    3      none          13.2    15.2    22.3    19.6
           120 lb/acre   20.5    25.4    28.4    27.6

The following ANOVA table was produced using iNZight lite:

    Error: Block
                     Df Sum Sq Mean Sq F value   Pr(>F)
    Residuals         2  7.866   3.933

    Error: Block:Plot
                     Df Sum Sq Mean Sq F value   Pr(>F)
    Nitrogen          1 262.02  262.02   104.1  0.00947 **
    Residuals         2   5.04    2.52

    Error: Within
                     Df Sum Sq Mean Sq F value   Pr(>F)
    Gmanure           3 215.26   71.75  118.96 3.43e-09 ***
    Nitrogen:Gmanure  3  18.70    6.23   10.33  0.00121 **
    Residuals        12   7.24    0.60

(a) It is clear from the ANOVA table that the experimenters did not use a completely randomised design.
(i) What type of design was used for this experiment?
(ii) Describe the blocking structure for this design.
(iii) Describe the treatment structure for this design.
(iv) Explain how treatments were assigned to experimental units for this design.
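As a check on the iNZight output, the Sum Sq column can be recomputed by hand from the data table above. The following is a minimal sketch using only the Python standard library. It assumes the stratum structure shown in the output (Nitrogen tested against the Block:Plot residual, Gmanure and Nitrogen:Gmanure against the Within residual), and treats each Block × Nitrogen combination as one whole plot; the names `DATA`, `CROPS` and `split_plot_ss` are our own illustrative choices, not part of iNZight.

```python
# Yields from the Qu. 3 data table: DATA[block][nitrogen] lists F, B, V, BV in order.
DATA = {
    1: {"none": [13.8, 15.5, 21.0, 18.9], "120": [19.3, 22.2, 25.3, 25.9]},
    2: {"none": [13.5, 15.0, 22.7, 18.3], "120": [18.0, 24.2, 24.8, 26.7]},
    3: {"none": [13.2, 15.2, 22.3, 19.6], "120": [20.5, 25.4, 28.4, 27.6]},
}
CROPS = ["F", "B", "V", "BV"]


def split_plot_ss(data):
    """Return the sum of squares for each line of the ANOVA table.

    Assumption: each Block x Nitrogen combination is one whole plot, and
    the green manure crops are applied to sub-plots within it.
    """
    # Flatten to (block, nitrogen, crop, yield) records.
    rows = [
        (block, nitrogen, crop, y)
        for block, by_nitrogen in data.items()
        for nitrogen, yields in by_nitrogen.items()
        for crop, y in zip(CROPS, yields)
    ]
    grand_total = sum(r[3] for r in rows)
    cf = grand_total ** 2 / len(rows)  # correction factor

    def ss_between(key):
        """Sum over groups of (group total)^2 / (group size), minus CF."""
        totals, counts = {}, {}
        for r in rows:
            k = key(r)
            totals[k] = totals.get(k, 0.0) + r[3]
            counts[k] = counts.get(k, 0) + 1
        return sum(totals[k] ** 2 / counts[k] for k in totals) - cf

    ss_total = sum(r[3] ** 2 for r in rows) - cf
    ss_block = ss_between(lambda r: r[0])
    ss_nitrogen = ss_between(lambda r: r[1])
    ss_plots = ss_between(lambda r: (r[0], r[1]))  # variation between whole plots
    ss_gmanure = ss_between(lambda r: r[2])
    ss_interaction = ss_between(lambda r: (r[1], r[2])) - ss_nitrogen - ss_gmanure
    return {
        "Block": ss_block,
        "Nitrogen": ss_nitrogen,
        "Block:Plot Residuals": ss_plots - ss_block - ss_nitrogen,
        "Gmanure": ss_gmanure,
        "Nitrogen:Gmanure": ss_interaction,
        "Within Residuals": ss_total - ss_plots - ss_gmanure - ss_interaction,
    }


if __name__ == "__main__":
    ss = split_plot_ss(DATA)
    for term, value in ss.items():
        print(f"{term:22s} {value:8.3f}")
    # Each F ratio divides by the residual mean square of that term's own stratum.
    f_nitrogen = (ss["Nitrogen"] / 1) / (ss["Block:Plot Residuals"] / 2)
    f_gmanure = (ss["Gmanure"] / 3) / (ss["Within Residuals"] / 12)
    print(f"F(Nitrogen) = {f_nitrogen:.1f}, F(Gmanure) = {f_gmanure:.2f}")
```

Up to rounding, the printed sums of squares agree with the Sum Sq column of the iNZight output. Splitting the total into a whole-plot part and a sub-plot part is what produces two separate Residuals lines, each with its own degrees of freedom.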
(b) (5 marks) The “tables of means” are:

    Grand mean
    20.72083

     Nitrogen
         0    120
    17.417 24.025

     Gmanure
         B     BV      F      V
    19.583 22.833 16.383 24.083

     Nitrogen:Gmanure
            Gmanure
    Nitrogen B      BV     F      V
         0   15.233 18.933 13.500 22.000
         120 23.933 26.733 19.267 26.167

Values of the least significant difference (LSD) and Tukey’s studentised range (TSR) for comparing means from these tables (α = .05) are:

                                      Nitrogen:Gmanure
          Nitrogen  Gmanure  same Nitrogen level  different Nitrogen levels
    LSD   2.79      0.97     1.38                 3.68
    TSR   2.79      1.33     2.29                 7.88

The standard formula for the TSR is TSR = q × √(ResMS / reps). In iNZight lite you need to supply values of df, ResMS, rep and means, in addition to setting alpha to .05. Fill in the values that would be used for each table of means.

For Nitrogen: df =        ResMS =        reps =        means =
For Gmanure: df =        ResMS =        reps =        means =
For Nitrogen:Gmanure, same level of Nitrogen: df =        ResMS =        reps =        means =
For Nitrogen:Gmanure, different levels of Nitrogen: df =        ResMS =        reps =        means =

[Interaction plot: mean of yield against Nitrogen (0, 120), with one line for each Gmanure level (BV, V, B, F).]

(c) Write a short paragraph summarising how the two treatment factors affect the yield of sugar beets. Support your conclusions by referring to the ANOVA table, the tables of means and the interaction plot above as you see fit.

Qu. 4. [10 marks]

(a) If we divide the population into non-overlapping groups and collect data from each group, what sort of design have we used?

(b) List two reasons why a researcher might consider stratified sampling.

(c) What is being traded off in choosing cluster sampling compared with simple random sampling?

(d) (i) If we sample n = 100 people at random from a population that has N = 2000 people, what weight should we give each person?
(ii) If we were to use this sample to estimate the mean number of hours worked per week for people in the population, should we use a finite population correction?

(e) What is being corrected when we make “finite population corrections”, and is it being made bigger or smaller?

(f) Suppose we take a census of the entire population of families and estimate that the mean family size is 2.97 people. What is the standard error of this estimate?

(g) In a simple random sample of size n taken from a population of size N, what is the sampling fraction?

(h) What is a hexbin plot, and how are such plots used for plotting survey data?

(i) What do smoothers on plots help reveal?

(j) Fill in the missing word. When responses from units in the same cluster are highly correlated, standard errors for a cluster sample are __________ than standard errors for a simple random sample of the same size.

(k) Write an equation to represent the relationship between the sample size (n), effective sample size (ess) and design effect (deff) of a survey.

(l) If there are big differences between stratum variances, and also big differences in sampling costs between strata, what type of strata does optimally allocated stratified sampling oversample?

(m) In a sample where the weight for each unit is the inverse of the sampling fraction, what estimate is represented by:
(i) the sum of weights across the sample?
(ii) the sum of the products of weights and values across the sample (e.g., values may be number of cars owned, total income, etc.)?
(iii) the sum of the products of weights and values across the sample DIVIDED by the sum of weights across the sample?

Qu. 5.
[7 marks]

Question 5 relates to Andrew Sporle’s lectures.

This last sheet will be detached from the rest of your answers during marking, so it is essential that you supply your name and ID information again here:
ID:............................
Surname:............................
First name:............................

(a) What is the benefit of pretesting survey questions?

(b) A forestry company wants to do a sample survey of trees in a very remote forest. The survey involves cutting down the sampled trees and removing them by helicopter. List three (3) reasons why the survey statistician who is designing the survey would be interested in designing the most efficient survey possible.

(c) What type of missing data cannot be resolved by using imputation?

(d) Name a measure of socio-economic position that would be suitable for a study of health care use by the elderly.

(e) Name a type of numerical rating scale commonly used in survey questions.

(f) A standardized interview schedule helps reduce what source or type of information bias?