STAT5002 Introduction to Statistics – Semester 2, 2019
Assignment
Notes and Instructions:
• This assignment is structure assessed. It carries a weight of 8% towards your final mark for STAT5002.
• You may discuss the questions with others but you must submit your own individual reports with your
own working and words.
• Presentation of your assignment is marked.
• Show all R code or calculation used to answer the questions in your report.
• Do NOT include your name in the assignment.
• The report has to be submitted via Turnitin (Canvas) by Sunday 3rd November at 23:59.
• If you have issues submitting your report, send your report to [email protected].
Written Report
The written report is based on the Ames Housing data set (AmesHousing.txt, uploaded to Canvas in the
folder AssignmentData along with the description file, DataDocumentation.txt) and should answer the 2
questions below. Show all R code or calculation used to answer the questions in your report. Your report
should be no longer than 5 pages. Presentation of the report is marked. PLEASE SUBMIT A PDF VERSION
OF YOUR REPORT ONLINE.
Problem
Suppose that the Ames Housing data is a representative sample of the houses in Ames.
1. If I select a random household from Ames, estimate the probability that
(a) the selected household has a basement?
(b) the selected household has a pool?
(c) the selected household has a pool and a basement?
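One way to read question 1, offered as a sketch rather than the intended solution: if the data set is treated as a representative sample of n houses, the probability of an event A can be estimated by the sample proportion of houses for which A holds. The symbols n, B_i and P_i (indicators that house i has a basement or a pool) are notation introduced here for illustration; how "has a basement" and "has a pool" are read off the variables in DataDocumentation.txt is left to the report.

```latex
\hat{P}(A) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\text{house } i \text{ satisfies } A\}

\hat{P}(\text{pool and basement}) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{B_i = 1 \text{ and } P_i = 1\}
```

The joint estimate in (c) is counted directly from the houses that have both features. It equals the product of the estimates in (a) and (b) only if the two features happen to be independent in the sample.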
2. In this question consider the four variables SalePrice (Y), Lot.Area (x1), Overall.Qual (x2) and
MS.SubClass (x3).
(a) Consider the four simple linear regression models:
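$Y_{ij} = \beta_0 + \beta_1 x_{1i} + \epsilon_{ij}$ (1)
$\log(Y_{ij}) = \beta_0 + \beta_1 x_{1i} + \epsilon_{ij}$ (2)
$Y_{ij} = \beta_0 + \beta_1 \log(x_{1i}) + \epsilon_{ij}$ (3)
$\log(Y_{ij}) = \beta_0 + \beta_1 \log(x_{1i}) + \epsilon_{ij}$ (4)
where $\epsilon_{ij} \sim N(0, \sigma^2)$.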
By considering some diagnostic plots and the coefficient of determination, r^2, explain which of
the four models is the best.
(b) Using only Y, x1, x2 and x3, what is the best (parsimonious) regression model that fits the data?
Explain your conclusion.
(c) Regardless of your answer in (b), consider the following model
$\log(Y_{ij}) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_{ij}$ (5)
assuming $\epsilon_{ij} \sim N(0, \sigma^2)$.
i. Write the fitted model for (5).
ii. Are there any outliers under model (5)?
iii. You inspect a property with a lot area of 10,000 square feet and an overall quality rated as
“Excellent” using the same standard of rating in the Ames Housing data. What is your
expected sales price under model (5)?
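Because model (5) is fitted on the log scale, a prediction for part iii has to be transformed back to dollars. The following is a sketch under stated assumptions, not the required answer: b_0, b_1 and b_2 stand for the fitted coefficients from part i, q stands for the numeric code that DataDocumentation.txt assigns to "Excellent", and \hat{\sigma}^2 is the estimated residual variance. All of these symbols are introduced here for illustration, and log is assumed to be the natural logarithm.

```latex
\widehat{\log Y} = b_0 + b_1 \cdot 10000 + b_2 \, q

% Back-transformed point prediction (estimates the median sale price under model (5)):
\hat{Y}_{\text{median}} = \exp\!\left(\widehat{\log Y}\right)

% If the errors are N(0, \sigma^2), the mean of a log-normal response is larger:
\hat{Y}_{\text{mean}} = \exp\!\left(\widehat{\log Y} + \hat{\sigma}^2 / 2\right)
```

Which of the two back-transformed values is reported as the "expected" sales price is a modelling choice that the report should state explicitly.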