STAT331: Assignment 1
Due: Friday, June 4, 2021 at 5pm EST on Crowdmark
General instructions:
• Your work may be written up using R Markdown, LaTeX, or Word. If you hand-write your
solutions, make sure they are legible. No points will be given if the grader cannot read your
handwriting.
• You may discuss problems with your peers, but you must write up your own answers, and
include names of anyone you worked with on your assignment.
• For data analysis problems: You must clearly present your final answers in addition to the
steps or commands for obtaining your answers. You must include well-commented R code
(and only necessary code) to reproduce your work.
1. [Theory] Suppose we observe a sample of n outcomes $y_i$ and covariates $x_i$, and assume the
usual simple linear regression model:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$, for $i = 1, 2, \ldots, n$,
and we want to compute the usual least squares (LS) estimators $(\hat\beta_0, \hat\beta_1)$ along with the
corresponding 95% confidence intervals as we did in class.
(a) If the equal variance (i.e. homoskedasticity) assumption does not hold: are our LS
estimators still unbiased? Explain.
(b) If the equal variance (i.e. homoskedasticity) assumption does not hold: are our confidence
intervals still valid? Explain.
(c) If the independence assumption does not hold: are our LS estimators still unbiased?
Explain.
(d) If the independence assumption does not hold: are our confidence intervals still valid?
Explain.
(e) If the normality assumption does not hold: are our LS estimators still unbiased? Explain.
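For reference, the estimators and interval that parts (a)-(e) refer to can be written in the standard textbook form below. The shorthand $S_{xx}$, $\hat\sigma$, and the residuals $e_i$ are assumed notation here and may differ from what was used in class.

```latex
\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{S_{xx}},
\qquad
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x},
\qquad
S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2

% 95% confidence interval for the slope under the model assumptions
\hat\beta_1 \pm t_{n-2,\,0.975} \, \frac{\hat\sigma}{\sqrt{S_{xx}}},
\qquad
\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^n e_i^2,
\quad
e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i
```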

2. [Theory] Consider fitting Model A:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
with the usual assumptions.
(a) Suppose we conduct a hypothesis test of $H_0: \beta_1 = 0$ against the two-sided alternative
$H_1: \beta_1 \neq 0$ as we did in class. If we reject the null hypothesis at the 0.05 level (meaning the
p-value is less than 0.05), what can be said about the corresponding 95% confidence interval
for $\beta_1$? Explain.
Suppose now we fit a second Model B:
$y_i = \beta_0^B + \beta_1^B x_i^* + \epsilon_i^B$
using a new (standardized) variable $x_i^* = (x_i - \bar{x})/s_x$, where $s_x$ is the sample standard
deviation of the $x_i$. (Note that B is a label, not an exponent.)
(b) How do we interpret $\beta_1^B$?
(c) How are the estimates $\hat\beta_0$ and $\hat\beta_1$ related to $\hat\beta_0^B$ and $\hat\beta_1^B$?
(d) Consider a test of the null hypothesis $H_0: \beta_1 = 0$ against the alternative $H_1: \beta_1 \neq 0$.
How is the corresponding p-value related to the p-value for a test of the null hypothesis
$H_0: \beta_1^B = 0$ (against $H_1: \beta_1^B \neq 0$)?
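The standardization used to define Model B can be sketched in R as follows. The vector `x` is a made-up toy covariate, not data from the assignment, and the sketch only shows how $x_i^*$ is constructed.

```r
# Hypothetical toy covariate; these values are not from the assignment
x <- c(2.1, 3.4, 5.0, 6.2, 8.3)

# Standardize by hand: subtract the sample mean, divide by the sample SD
x_star <- (x - mean(x)) / sd(x)

# scale() does the same centering and scaling; it returns a one-column
# matrix, so convert it back to a plain numeric vector
x_star_alt <- as.numeric(scale(x))

mean(x_star)                   # numerically zero
sd(x_star)                     # equal to 1 up to rounding
all.equal(x_star, x_star_alt)  # the two constructions agree
```

After standardizing, a one-unit change in `x_star` corresponds to a change of one sample standard deviation in `x`.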
3. [Data analysis] The dataset berries.csv for this problem is posted on Learn, and comes
from a study (Journal of Texture Studies, 44, 95-103) on the properties of fruits. The variables
are:
• sugar: Sugar content (g/L)
• chewiness: Chewiness (mJ)
and there are 90 berries in the dataset. Suppose we are interested in predicting a berry’s
chewiness (Y) from its sugar content (X).
(a) Show a scatterplot of the data. Include labels for the axes and the plot (e.g., use the
xlab,ylab,main arguments in R’s plot function).
(b) Assume we use simple linear regression to model the relationship between X and Y.
Compute the least squares estimates $\hat\beta_0$ and $\hat\beta_1$ (e.g., using R’s lm function), and give
interpretations of those estimates (as appropriate). Add the fitted line to your scatterplot (e.g.,
using R’s abline function, which can take your fitted regression model as input).
(c) Formally test the hypothesis $H_0: \beta_1 = 0$ vs $H_A: \beta_1 \neq 0$, e.g., by using the output of R’s
lm function. Include the t-statistic and also the p-value. Write a sentence summarizing your
conclusion at the $\alpha = 0.01$ level; i.e., do chewiness and sugar have a statistically significant
linear relationship?
(d) Predict the chewiness of a berry with 110 g/L sugar. Provide an estimate and a 95%
prediction interval.
(e) Compute a 95% confidence interval for the mean chewiness for berries with 110 g/L sugar.
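The R functions named in parts (a)-(e) fit together roughly as in the sketch below. Because berries.csv is not reproduced here, the sketch runs on simulated stand-in data with invented values, and the numbers it prints say nothing about the real dataset. The object names (`berries`, `fit`, `new_berry`, `pred_int`, `conf_int`) are illustrative choices.

```r
# Stand-in data so the sketch runs on its own (invented values, NOT the
# real study data). With the real file, use instead:
#   berries <- read.csv("berries.csv")
set.seed(1)
berries <- data.frame(sugar = runif(90, min = 100, max = 300))
berries$chewiness <- 5 - 0.01 * berries$sugar + rnorm(90, sd = 0.5)

# (a) Scatterplot with axis labels and a title
plot(berries$sugar, berries$chewiness,
     xlab = "Sugar content (g/L)", ylab = "Chewiness (mJ)",
     main = "Chewiness vs. sugar content")

# (b) Least squares fit, then overlay the fitted line on the scatterplot
fit <- lm(chewiness ~ sugar, data = berries)
coef(fit)    # intercept and slope estimates
abline(fit)

# (c) t-statistic and two-sided p-value for the slope
summary(fit)$coefficients["sugar", c("t value", "Pr(>|t|)")]

# (d) 95% prediction interval for a single berry with 110 g/L sugar
new_berry <- data.frame(sugar = 110)
pred_int <- predict(fit, newdata = new_berry,
                    interval = "prediction", level = 0.95)

# (e) 95% confidence interval for the mean chewiness at 110 g/L sugar
conf_int <- predict(fit, newdata = new_berry,
                    interval = "confidence", level = 0.95)
pred_int
conf_int
```

Both calls to `predict` return the same point estimate in the `fit` column; only the `lwr` and `upr` bounds differ, with the prediction interval being the wider of the two.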
4. [Simulation] In this problem you will get some practice coding with R, and numerically
investigate the probability distributions we derived for $\hat\beta_0$ and $\hat\beta_1$ in simple linear regression.
The ‘true’ model we will assume for this problem is
$Y_i \overset{indep}{\sim} N(1 + b x_i, \sigma^2)$, for $i = 1, \ldots, 10$,
where the 10 known values of $x_i$ are 1, 1, 1, 1, 3, 4, 5, 5, 6, 7.
Important: Since this question involves simulation, your first line of R code for this problem
must set the random seed with this command: set.seed(123) where you replace 123 with
your student number.
(a) Write an R function that takes two arguments (corresponding to $b$ and $\sigma$) and conducts
the following simulation:
(i) Simulate a set of data values $y_1, \ldots, y_{10}$ according to the model. [To generate a random
value for $Y_1$, use R’s rnorm function.]
(ii) Fit the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ (i.e. regress y on x) using the simulated outcomes from
(i).
(iii) Compute a p-value for a hypothesis test of $H_0: \beta_1 = 0$ against the two-sided alternative
$H_A: \beta_1 \neq 0$.
(iv) Repeat (i)–(iii) 2000 times, saving your results (p-value) each time.
(b) Using your R function, conduct a simulation under $b = 0.25$ and $\sigma = 1$ as described in
part (a). Plot a histogram of the p-values across the 2000 datasets. In what proportion of
samples would you reject the null hypothesis at the 5% level?
(c) Repeat part (b) with b = 0.
(d) Repeat (b)–(c) with $\sigma = 0.8$. Explain your findings.
(e) What would happen if we increased the sample size?
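Steps (i)-(iv) of part (a) could be organized along the lines of the sketch below. The function name `sim_pvalues` and its `n_rep` argument are hypothetical, and this is only one of several reasonable ways to structure the loop. The seed 123 is the placeholder from the problem statement, to be replaced with a student number.

```r
set.seed(123)  # placeholder seed; the assignment asks for your student number

# The 10 known covariate values given in the problem
x <- c(1, 1, 1, 1, 3, 4, 5, 5, 6, 7)

# sim_pvalues is a hypothetical name; b and sigma are the two required
# arguments, and n_rep defaults to the 2000 repetitions asked for in (iv)
sim_pvalues <- function(b, sigma, n_rep = 2000) {
  pvals <- numeric(n_rep)
  for (r in seq_len(n_rep)) {
    # (i) simulate outcomes from Y_i ~ N(1 + b * x_i, sigma^2)
    y <- rnorm(length(x), mean = 1 + b * x, sd = sigma)
    # (ii) regress y on x
    fit <- lm(y ~ x)
    # (iii) two-sided p-value for H0: beta_1 = 0, read off the lm summary
    pvals[r] <- summary(fit)$coefficients["x", "Pr(>|t|)"]
  }
  # (iv) return all saved p-values
  pvals
}

# Usage as in part (b): b = 0.25, sigma = 1
p <- sim_pvalues(b = 0.25, sigma = 1)
hist(p, xlab = "p-value", main = "p-values across 2000 simulated datasets")
mean(p < 0.05)  # proportion of datasets rejecting H0 at the 5% level
```

Preallocating `pvals` with `numeric(n_rep)` avoids growing a vector inside the loop; `replicate` would be an equally reasonable alternative.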
Remember to include your R code in your submissions for Questions 3 and 4!