School of Mathematics and Statistics
FIRST SEMESTER EXAMINATIONS
STAT2401
ANALYSIS OF EXPERIMENTS
FAMILY NAME: STUDENT ID:
GIVEN NAMES: SIGNATURE:
This Paper contains: 6 pages (including title page)
Time allowed: 2 hours and 45 minutes
INSTRUCTIONS:
• This is version 0. This is an open-book exam.
• The marks for each question are indicated in the questions for a total of 75 marks available.
• This examination requires you to use the statistics package R or RStudio.
• You should answer the questions in the Electronic Answer Sheet (Available Online). You
will not gain any marks if the answers are written anywhere else. The submitted Answer
Sheet should be in PDF format. Photos or any other format will NOT be accepted. Make
sure you are NOT submitting a blank Answer Sheet. Use SAVE or PRINT to create a new
PDF file that contains your answers.
• Submit your answer sheet via LMS at the Final Exam Upload Point under the exam
folder. Please name your PDF file “your student
number [your name].pdf”.
• Late submissions will not be marked.
• There are 10 versions numbered 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Please take the version that is
identical to the last digit of your student number.
• The data is available in LMS, please download the corresponding version.
• All non-integer numerical answers should be given to 4 decimal places. Failure to follow
this instruction will result in a mark of zero.
• When using R or RStudio it is recommended that you write down answers as soon as you
have obtained the necessary output. In this way you should lose little of importance in the
unlikely event of a computer failure.
• You must show your working in order to obtain full marks.
Semester 1 Examinations
June 2020
1. Does pollution kill people? Data in one early study designed to explore this
issue came from five Standard Metropolitan Statistical Areas (SMSA) in the
United States, obtained for the years 1959–1961. Total age-adjusted mortality
(Mortality) from all causes, in deaths per 100,000 population, is the response
variable. The 15 explanatory variables for each of 60 cities are
(1) Precip: mean annual precipitation (in inches);
(2) Humidity: percent relative humidity (annual average at 1 P.M.);
(3) JanTemp: mean January temperature (in degrees Fahrenheit);
(4) JulyTemp: mean July temperature (in degrees Fahrenheit);
(5) Over65: percentage of the population aged 65 years or over;
(6) House: population per household;
(7) Educ: median number of school years completed by persons of age 25 years or more;
(8) Sound: percentage of the housing that is sound with all facilities;
(9) Density: population density (in persons per square mile of urbanized area);
(10) NonWhite: percentage of 1960 population that is nonwhite;
(11) WhiteCol: percentage of employment in white-collar occupations;
(12) Poor: percentage of households with annual income under $3,000 in 1960;
(13) HC: relative pollution potential of hydrocarbons (HC);
(14) NOX: relative pollution potential of oxides of nitrogen (NOX); and
(15) SO2: relative pollution potential of sulphur dioxide (SO2).
It is desired to determine whether the pollution variables (13, 14, and 15) are
associated with mortality, after the other climate and socioeconomic variables
are accounted for.
Save the data in “your working directory”, and read in the data by
setwd("your working directory")
Pollution = read.csv(file="Pollution-Version-0.csv",header=TRUE)
(a) Describe the process of backward variable selection, implemented using the
F-test and p-value approach, for a multiple linear regression model.
[5 marks]
(b) Fit a linear model with response Mortality, including only the first 12 ex-
planatory variables, to the data. Report your R code (NOT the R-output)
and the fitted model. [4 marks]
(c) Starting with the fitted model in part (b), perform backward variable selection
using the F-test/p-value approach to select a model. Report your R code (NOT
the R-output) and the fitted model that is finally selected. [4 marks]
(d) Starting with the final model in part (c), perform forward variable selection
using the F-test/p-value approach to select the last 3 explanatory “pollution”
variables (13, 14, and 15). Report your R code (NOT the R-output) and the
fitted model that is finally selected. [4 marks]
(e) Starting with the NULL model, perform forward variable selection using the
F-test/p-value approach to select a model including only the first 12 explanatory
variables. Report your R code (NOT the R-output) and the fitted model that
is finally selected. [4 marks]
(f) Starting with the final model in part (e), perform forward variable selection
using the F-test/p-value approach to select the last 3 explanatory “pollution”
variables (13, 14, and 15). Report your R code (NOT the R-output) and the
fitted model that is finally selected. [4 marks]
(g) State the common explanatory variables of the final models found in parts
(d) and (f). [4 marks]
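The F-test/p-value selection procedures named in Question 1 can be sketched in R with drop1() and add1(). The following is a minimal illustration on simulated data, not the exam data: the data frame sim, its columns x1 to x4 and y, the helper functions backward_F and forward_F, and the cut-off alpha = 0.05 are all assumptions made for this sketch.

```r
# Simulated stand-in for the exam data; all names and values are hypothetical.
set.seed(1)
n <- 60
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 2 + 3 * sim$x1 - 2 * sim$x2 + rnorm(n)

alpha <- 0.05  # assumed cut-off for the partial F-test p-values

# Backward elimination: drop the term with the largest p-value until
# every remaining term is significant at level alpha.
backward_F <- function(fit, alpha) {
  repeat {
    if (length(attr(terms(fit), "term.labels")) == 0) break
    tab <- drop1(fit, test = "F")
    p <- tab[["Pr(>F)"]][-1]            # row 1 is "<none>"
    if (max(p) < alpha) break
    worst <- rownames(tab)[-1][which.max(p)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

# Forward selection: add the candidate with the smallest p-value until
# no remaining candidate is significant at level alpha.
forward_F <- function(fit, candidates, alpha) {
  repeat {
    left <- setdiff(candidates, attr(terms(fit), "term.labels"))
    if (length(left) == 0) break
    tab <- add1(fit, scope = reformulate(left), test = "F")
    p <- tab[["Pr(>F)"]][-1]            # row 1 is "<none>"
    if (min(p) >= alpha) break
    best <- rownames(tab)[-1][which.min(p)]
    fit <- update(fit, as.formula(paste(". ~ . +", best)))
  }
  fit
}

full <- lm(y ~ x1 + x2 + x3 + x4, data = sim)
null <- lm(y ~ 1, data = sim)

back_fit <- backward_F(full, alpha)
fwd_fit  <- forward_F(null, c("x1", "x2", "x3", "x4"), alpha)

formula(back_fit)
formula(fwd_fit)
```

Restricting the candidates argument to a subset of variables is one way to run forward selection over a chosen group of terms while keeping the terms already in the starting model.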
2. This question concerns data from an observational study on the selective mech-
anisms of evolution. An interesting variable in this respect is brain size.
One might expect that bigger brains are better, but certain penalties seem to be
associated with large brains, such as the need for longer pregnancies and fewer
offspring. Although the individual members of the large brained species may
have more chance of surviving, the benefits for the species must be good enough
to compensate for these penalties. To shed some light on this issue, it is helpful
to determine exactly which characteristics are associated with large brains, after
getting the effect of body size out of the way.
The data Brain contains the variables: natural logarithm of the average values
of brain weight (logBrain, response), body weight (logBody), and 4 different
levels of natural logarithm of gestation lengths (logGestation) for 96 species of
mammals.
Save the data in “your working directory”, and read in the data by
setwd("your working directory")
load(file="Brain-Version-0.RData")
(a) Fit the following models in order to explain the response variable logBrain
(natural logarithm of brain weight) based on the information of logBody
(natural logarithm of body weight):
• M1, a simple linear regression for all observations (i.e. intercept and
slope not dependent on different levels of natural logarithm of gestation
lengths (logGestation)).
• M2, parallel regressions for observations from each level of natural
logarithm of gestation lengths (i.e. regressions have the same slope
but the intercept varies for the different levels of natural logarithm of
gestation lengths (logGestation)).
• M3, separate regressions for observations from each level of natural
logarithm of gestation lengths (i.e. regressions have intercepts and slopes that
vary for the different levels of natural logarithm of gestation lengths
(logGestation)).
Report the R code (NOT the R-output) that you used to fit these models.
[3 marks]
(b) Use F-tests to select the most appropriate model from M1, M2, and M3,
working at a 5% significance level. Explain your reasoning clearly, and include
the p-values that you obtain for your tests. Also report your R code (NOT the
R-output). [6 marks]
(c) For your preferred model, report the fitted models for all levels of natural
logarithm of gestation lengths. [4 marks]
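The three model structures described in Question 2 (common line, parallel lines, separate lines) can be written as nested lm() formulas and compared with anova(). The sketch below uses simulated data: the data frame sim, the factor column group standing in for the four gestation levels, and all numeric values are assumptions for illustration. It also assumes the grouping variable is stored as a factor; a numeric column would need as.factor() first.

```r
# Simulated stand-in for the Brain data; names and values are hypothetical.
set.seed(2)
n <- 96
sim <- data.frame(
  logBody = rnorm(n, mean = 3, sd = 2),
  group   = factor(rep(1:4, each = n / 4))   # four levels, 24 species each
)
sim$logBrain <- 1 + 0.6 * sim$logBody + 0.3 * as.integer(sim$group) +
  rnorm(n, sd = 0.4)

M1 <- lm(logBrain ~ logBody, data = sim)          # one intercept, one slope
M2 <- lm(logBrain ~ logBody + group, data = sim)  # parallel lines
M3 <- lm(logBrain ~ logBody * group, data = sim)  # separate lines

# Sequential F-tests for the nested models: M1 vs M2, then M2 vs M3.
tab <- anova(M1, M2, M3)
tab

# Level-specific intercepts and slopes can be read off coef(M3):
# the baseline level uses the first two coefficients, and each other
# level adds its "group" and "logBody:group" terms.
coef(M3)
```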
3. This question comes from Ramsey and Schafer, Statistical Sleuth, Second Edition,
Chapter 7. Immediately after slaughter, the pH in postmortem muscle of a steer
carcass is around 7.0–7.2. For a certain kind of meat processing to take place, it
is necessary for the pH to decrease to 6.0, so an estimate is needed of the time after
slaughter at which the pH reaches 6.0. To do so, a number of steer carcasses were
selected to have their postmortem pH level measured immediately after slaughter, and
then again at one of 5 times after slaughter. Time is measured in hours.
Save the data in “your working directory”, and read in the data by
setwd("your working directory")
meat = read.csv(file="meat-Version-0.csv",header=TRUE)
(a) Run a simple linear regression model of log(pH) on log(hour). Report
your R code (NOT the R-output) and the fitted model. [3 marks]
(b) Test whether the time log(hour) after slaughter is a statistically significant
predictor of postmortem log(pH) levels. Report your answer and the p-value
for this test. [5 marks]
(c) Using the model fitted in (a), find the estimated mean pH at 4 hours and its
confidence interval. Report also your R code (NOT the R-output).
[4 marks]
(d) Using the model fitted in (a), find the predicted pH at 4 hours and its prediction
interval. Report also your R code (NOT the R-output). [4 marks]
(e) Using the model fitted in (a), determine how long after slaughter you would
expect the mean pH level to reach 6.0. Report also your R code (NOT the
R-output). [3 marks]
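The steps in Question 3 (log-log fit, confidence and prediction intervals at a given time, and inverting the fitted line for a target pH) can be sketched with lm() and predict(). The data below are simulated, and the data frame sim, the chosen times, and the coefficient values are assumptions for illustration only. Note that exponentiating a fitted value from the log scale strictly estimates the median on the original scale; whether that is reported as the "mean" depends on the course convention.

```r
# Simulated stand-in for the meat data; names and values are hypothetical.
set.seed(3)
hour <- rep(c(1, 2, 4, 6, 8), each = 2)
pH   <- exp(log(7) - 0.1 * log(hour) + rnorm(length(hour), sd = 0.01))
sim  <- data.frame(hour = hour, pH = pH)

fit <- lm(log(pH) ~ log(hour), data = sim)
summary(fit)$coefficients   # slope row gives the t-test for log(hour)

new <- data.frame(hour = 4)

# Intervals are computed on the log scale, then back-transformed with exp().
conf_int <- exp(predict(fit, newdata = new, interval = "confidence", level = 0.95))
pred_int <- exp(predict(fit, newdata = new, interval = "prediction", level = 0.95))

# Invert the fitted line: solve b0 + b1 * log(t) = log(6) for t.
b  <- unname(coef(fit))
t6 <- exp((log(6) - b[1]) / b[2])

conf_int
pred_int
t6
```

The prediction interval is wider than the confidence interval at the same time point because it also accounts for the variability of a single new observation.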
4. The human brain is protected from bacteria and toxins, which course through the
bloodstream, by a single layer of cells called the blood–brain barrier. This barrier
normally allows only a few substances, including some medications, to reach the
brain. Because chemicals used to treat brain cancer have such large molecular
size, they cannot pass through the barrier to attack tumor cells. At the Oregon
Health Sciences University, Dr. Neuwelt developed a method of disrupting the
barrier by infusing a solution of concentrated sugars.
As a test of the disruption mechanism, researchers conducted a study on rats,
which possess a similar barrier. The rats were inoculated with human lung cancer
cells to induce brain tumors. After 9 to 11 days they were infused with either
the barrier disruption (BD) solution or, as a control, a normal saline (NS) solu-
tion. Fifteen minutes later, the rats received a standard dose of the therapeutic
antibody L6-F(ab’)2. After a set time they were sacrificed, and the amounts of
antibody in the brain tumor and in normal tissue were measured. The time line
for the experiment is as follows (figure not reproduced).
Measurements for the 34 rats are:
Brain: Brain tumor count (per gm);
Liver: Liver tumor count (per gm);
Time: Sacrifice time (hours);
Treatment: Treatment;
Days: Days post inoculation;
Sex: Sex;
Weight: Initial weight (grams);
Loss: Weight loss (grams);
Tumor: Tumor weight (10^-4 grams)
The response of interest is taken to be the natural logarithm of the antibody
concentration ratio (Brain tumor-to-Liver tumor), that is
logBLRatio = log(Brain/Liver).
Save the data in “your working directory”, and read in the data by
setwd("your working directory")
BBB = read.csv(file="BBB-Version-0.csv",header=TRUE)
(a) Fit a multiple linear regression model for logBLRatio based on all other
variables, Time, Treatment, Days, Sex, Weight, Loss, and Tumor. Do you
see evidence for significance of regression? Report also your R code.
[4 marks]
(b) Are there outliers and high leverage points? Use your answer to determine
whether there are influential points (known as ‘bad’ leverage points). If so,
state them. Report also your R code. [6 marks]
(c) Calculate the Cook’s distances and determine the influential points (Cook’s
distance greater than 1). Are these influential points different from those in
part (b)? Report also your R code. [4 marks]
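The diagnostics referred to in Question 4 (outliers, leverage, Cook's distance) are available in base R through rstandard(), hatvalues(), and cooks.distance(). The sketch below uses simulated data. The data frame sim, its columns, and the cut-offs |standardized residual| > 2 and leverage > 2p/n are assumptions based on common rules of thumb, and the course may use different thresholds. Only the Cook's distance cut-off of 1 comes from the question itself.

```r
# Simulated stand-in for the BBB data; names and values are hypothetical.
set.seed(4)
n <- 34
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = sim)
summary(fit)$fstatistic      # overall F-test for significance of regression

p <- length(coef(fit))       # number of coefficients, including the intercept
r <- rstandard(fit)          # standardized residuals
h <- hatvalues(fit)          # leverages
d <- cooks.distance(fit)     # Cook's distances

outliers    <- which(abs(r) > 2)           # assumed rule of thumb
high_lev    <- which(h > 2 * p / n)        # assumed rule of thumb
bad_lev     <- intersect(outliers, high_lev)
influential <- which(d > 1)                # cut-off stated in part (c)

outliers
high_lev
bad_lev
influential
```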
