ACTL90023 Data Analytics in Insurance 1 - 2021 Assignment 1
1 Data
This assignment is based on a data set named “Assignt1 data.csv”,
which can be downloaded from Canvas. It is a US health insurance
costs data set with 1,338 observations. The data set contains 8 columns:
• id: the policy id
• age: the age of the policyholder
• sex: the gender of the policyholder
• bmi: the body mass index (BMI) of the policyholder
• children: the number of children the policyholder gave birth to
• smoker: smoking status of the policyholder
• region: the policy region in the US
• charges: the health insurance cost amount
2 Tasks
2.1 Descriptive analysis of the data set
Here you will need to perform some descriptive analysis of the data set.
Consider the variable charges as the response variable.
1. Load the data set and display the data frame.
2. Show a numerical summary of the variables in the data.
3. Draw plots that can display relationships among the variables that you
choose.
4. Discuss, with reasons, which relationships could be material in
studying the response variable.
5. Divide the full data set into training (80%) and validation (20%) sets.
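As an illustration only of the split in step 5 (the submission itself is to be written in R Markdown), the following Python sketch partitions the row indices at random. The function name, the seed, and the rounding rule are assumptions, not part of the assignment.

```python
import random


def train_validation_split(n_rows, train_fraction=0.8, seed=1):
    """Return (train_idx, valid_idx): disjoint row indices covering 0..n_rows-1."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)           # random order, so the split is not by file position
    n_train = int(round(train_fraction * n_rows))
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 1,338 observations, as stated in the Data section.
train_idx, valid_idx = train_validation_split(1338)
print(len(train_idx), len(valid_idx))  # 1070 268
```

Fixing the seed means the same rows land in the validation set every time the document is knitted, so the reported test MSE values do not change between runs.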
2.2 Multiple linear regression
In this part you will need to finish the following tasks using the training set:
1. Build a multiple linear regression model using all predictors.
2. Discuss the significance of the relationship between the response
variable and the predictors by performing relevant hypothesis tests.
3. Study any interaction effects among the predictors given in the data set.
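For reference, the standard tests for a multiple linear regression with p predictors and n observations can be written as below. The symbols are generic textbook notation, not taken from the assignment, and the two-predictor interaction model is only an example of the form such a term takes.

```latex
% t-test for a single coefficient, H_0: \beta_j = 0
t = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}

% F-test for overall significance, H_0: \beta_1 = \dots = \beta_p = 0
F = \frac{(\mathrm{TSS} - \mathrm{RSS})/p}{\mathrm{RSS}/(n - p - 1)}

% Example model with an interaction between predictors x_1 and x_2
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon
```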
2.3 Linear model selection
In this part you will need to finish the following tasks using the training set:
1. Select the best model using different subset selection methods.
2. Fit ridge regression and lasso models for several values of the
tuning parameter.
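As a reminder of what the tuning parameter controls, the ridge and lasso estimates minimise the penalised residual sums of squares below, written in generic notation with tuning parameter λ ≥ 0. Setting λ = 0 recovers ordinary least squares, and larger values shrink the coefficients more strongly.

```latex
% Ridge regression: squared (L2) penalty
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
  + \lambda \sum_{j=1}^{p} \beta_j^2

% Lasso: absolute-value (L1) penalty
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```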
2.4 Assessing model performance
In this part you will need to calculate the test MSE on the validation
set for the best models that you obtained previously by the different
approaches. Answer the following questions:
1. Which model gives you the lowest test MSE?
2. Justify your finding using what you have learned so far in this
subject.
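The comparison in this part can be sketched as follows, in Python for illustration only. The function name and the numbers are made up for the example and are not taken from the data set or from any fitted model.

```python
def validation_mse(observed, predicted):
    """Mean squared error between observed responses and model predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return sum(squared_errors) / len(observed)


# Made-up validation responses and predictions from two hypothetical models.
observed = [10.0, 12.0, 14.0]
predictions = {
    "model_a": [11.0, 12.0, 13.0],
    "model_b": [10.0, 15.0, 14.0],
}

# Score every candidate on the same validation set, then pick the smallest MSE.
scores = {name: validation_mse(observed, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)
print(best)  # model_a
```

Every candidate model is scored on the same held-out observations, so differences in MSE reflect the models rather than the data they were evaluated on.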
3 Instructions
• This assignment is due at 5pm on Sunday 18th April. Submit your
solution file in Canvas under “Assessments”.
• You should write your solution as an R Markdown document and then
produce a PDF version for submission. A template file is provided in
the Assignment 1 materials for your reference.
• You should name your submission file with your student ID number.
• This assignment counts for 15% of the total assessment of this subject.