


STATS 4M03/6M03: Multivariate Analysis ASSIGNMENT 2

Due at 9:30 am on Wednesday, November 1, 2023.

Instructions:

Failure to follow the instructions will result in a deduction of one mark for each instruction not followed.

1. Submit your assignment to Avenue to Learn as a PDF file ONLY. Assignments in other formats will not be accepted.

2. Name your file Lastname Firstname studentID.pdf.

3. You must use LaTeX when you write the solution. You must use the LaTeX template that I posted in Avenue to Learn.

4. Do not exceed the number of allowable lines for each part. Tables and plots are not included in the line count.

5. If you include tables and/or figures, add captions.

6. Do not include R code, screenshots, or all R outputs. Instead, report only the important results and use tables and/or plots to summarize results.

7. Questions do not necessarily carry the same amount of marks.

8. Grace period: You can submit your assignment as late as 3:30 p.m. on Wednesday, November 1, 2023, without penalty. I set the grace period to cover technical issues, for example if your computer crashes 2 minutes before the deadline.

9. Assignments submitted up to 24 hours after the grace period will incur a 20% penalty; assignments submitted more than 24 hours late will receive a grade of zero.

10. Students registered with Student Accessibility Services who require accommodations must contact me at least 24 hours before the assignment is due.

Q.1 In the R package DAAG, the ais data set contains 202 observations and 13 variables. Use the following variables to answer the question:

• sex: the sex of the athlete: F means female, and M means male. (use this variable as the label for this data)

• bmi: body mass index, in kg per metre-squared.

• pcBfat: percent body fat.

• hg: hemoglobin concentration, in g per deciliter.

Important Note: scale the data!!

(a) Perform cluster analysis:

i. Distance Based Clustering:

Fit k-means with G = 2, 3, 4, 5 and set.seed(202311) before running the function.

What is the best value for G? How did you determine this? (Hint: use the silhouette method) (You must include four graphs or one table to summarize results for each G) (maximum 2 lines)
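As a rough illustration of the silhouette hint, the sketch below computes the average silhouette width of a k-means fit for each candidate G. It is only one way to set this up: the helper name avg_silhouette is hypothetical, reseeding before every kmeans() call is an assumption about what "before running the function" means, and the cluster package is assumed to be installed. The DAAG part runs only if that package is available.

```r
# Average silhouette width of a k-means fit for each candidate G.
# avg_silhouette is a hypothetical helper name, not part of any package.
avg_silhouette <- function(x, Gs = 2:5, seed = 202311) {
  d <- dist(x)  # Euclidean distances between the (scaled) observations
  sapply(Gs, function(G) {
    set.seed(seed)                       # assumption: reseed before each kmeans() call
    km <- kmeans(x, centers = G)
    sil <- cluster::silhouette(km$cluster, d)
    mean(sil[, "sil_width"])             # average width over all observations
  })
}

if (requireNamespace("DAAG", quietly = TRUE)) {
  ais <- DAAG::ais
  x <- scale(ais[, c("bmi", "pcBfat", "hg")])  # sex is kept aside as the label
  widths <- avg_silhouette(x)
  print(data.frame(G = 2:5, avg_sil_width = round(widths, 3)))
}
```

A larger average silhouette width indicates more compact, better separated clusters, so the resulting table can be read row by row to compare the four values of G.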

ii. Model Based Clustering:

Fit parsimonious t mixture with G=1:5, k-means initialization and set.seed(202311) before running the function.

What is the name of the best model? What is the best value for G? How did you determine this? (maximum 2 lines)
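A minimal sketch of such a fit, assuming the teigen package is the intended tool (the assignment does not name one) and that models = "all" is an acceptable way to request the whole parsimonious family. Both are assumptions, as is passing scale = FALSE because the data are scaled beforehand. The block does nothing if DAAG or teigen is not installed.

```r
if (requireNamespace("DAAG", quietly = TRUE) &&
    requireNamespace("teigen", quietly = TRUE)) {
  ais <- DAAG::ais
  x <- scale(ais[, c("bmi", "pcBfat", "hg")])

  set.seed(202311)  # seed set immediately before the fitting call
  fit <- teigen::teigen(x, Gs = 1:5, models = "all", init = "kmeans",
                        scale = FALSE,   # assumption: x is already scaled above
                        verbose = FALSE)

  print(fit$bestmodel)                        # model selected by BIC
  print(fit$G)                                # number of components in that model
  print(table(ais$sex, fit$classification))   # clusters against the sex label
}
```

By default teigen selects among the fitted models using the BIC, which is why the selected model and G are read from the fitted object rather than chosen by hand.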

iii. Model Based Clustering:

Fit multivariate t mixture with G=1:5, k-means initialization and set.seed(202311) before running the function.

What is the name of the best model? What is the best value for G? How did you determine this? (maximum 2 lines)

(b) Compare the methods fitted in (a). Which method performs the best? How did you determine this? (maximum 8 lines)
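Since sex is available as a label, one common way to compare clusterings is to measure their agreement with that label, for example with the adjusted Rand index (ARI). The assignment does not prescribe a criterion, so the ARI is an assumption here, and adjusted_rand is a hypothetical helper written in base R. The toy vectors are made up purely to show the call.

```r
# Adjusted Rand index between a label vector and a cluster assignment.
# adjusted_rand is a hypothetical helper; it uses only base R.
adjusted_rand <- function(labels, clusters) {
  tab <- table(labels, clusters)
  n <- sum(tab)
  sum_ij <- sum(choose(tab, 2))            # pairs placed together in both partitions
  sum_a  <- sum(choose(rowSums(tab), 2))   # pairs together under the labels
  sum_b  <- sum(choose(colSums(tab), 2))   # pairs together under the clustering
  expected <- sum_a * sum_b / choose(n, 2) # agreement expected by chance
  (sum_ij - expected) / (0.5 * (sum_a + sum_b) - expected)
}

# Toy illustration with made-up data (not the ais results).
toy_labels   <- c("F", "F", "F", "M", "M", "M")
toy_clusters <- c(1, 1, 2, 2, 2, 2)
print(table(toy_labels, toy_clusters))
print(adjusted_rand(toy_labels, toy_clusters))
```

An ARI of 1 means the clustering reproduces the labels exactly, and values near 0 mean agreement no better than chance. The same function can be applied to the classification vector of each fitted method.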

Q.2 Consider the following summary output from a binary logistic regression on lending data. The response variable is loan, which is coded as 1 (applicant gets credit), 0 (applicant is denied credit). The predictor variables are:

• saccount: has a savings account, coded as 1 (yes), 0 (no);

• cell: has a cell phone, coded as 1 (yes), 0 (no);

• chistory: credit history, coded as 1 (has “good” credit history), 0 (does not have “good” credit history).

Term        Estimate  Std. Err.      z       p  Est. Odds Ratio  95% CI
intercept      -1.01       0.11  -8.92   <0.01
saccount        0.84       0.21   3.96   <0.01             2.31  (1.52, 3.49)
cell            0.06       0.15   0.45    0.66             1.07  (0.80, 1.42)
chistory        0.78       0.15   5.06   <0.01             2.17  (1.61, 2.94)
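The odds-ratio columns of the summary output are tied to the estimates and standard errors by exponentiation. The sketch below back-transforms the coefficients and their Wald 95% limits; it assumes the reported intervals are Wald intervals, which the output does not state. Because the inputs are rounded to two decimals, the results should match the reported values only approximately.

```r
# Coefficient estimates and standard errors copied from the summary output.
est <- c(saccount = 0.84, cell = 0.06, chistory = 0.78)
se  <- c(saccount = 0.21, cell = 0.15, chistory = 0.15)

odds_ratio <- exp(est)                       # point estimates on the odds scale
ci_lower   <- exp(est - qnorm(0.975) * se)   # Wald 95% limits, back-transformed
ci_upper   <- exp(est + qnorm(0.975) * se)

print(round(cbind(odds_ratio, ci_lower, ci_upper), 2))
```

An interval on the odds scale contains 1 exactly when the corresponding interval for the coefficient contains 0.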

(a) In the context of these data, what is the aim of logistic regression? (maximum 2 lines)

(b) Interpret the three (estimated) odds ratios and the associated intervals. (maximum 6 lines for odds ratios (i.e. 1 or 2 lines for each odds ratio) and maximum 6 lines for intervals (i.e. 1 or 2 lines for each interval))

(c) Explain the relationship between the 95% odds ratio intervals and the p values associated with the corresponding predictor variables. (maximum 3 lines)

(d) Suppose a person who has no savings account, no cell phone, and good credit history applies for a loan. Under this fitted model, what is the estimated probability that this person will get credit?
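The mechanics of turning a fitted logistic model into a probability can be sketched as follows: form the linear predictor from the coefficients and the applicant's covariate values, then apply the inverse logit. The vector names beta and applicant are illustrative, and "no telephone" is read here as cell = 0, which is an assumption.

```r
# Coefficients copied from the summary output.
beta <- c(intercept = -1.01, saccount = 0.84, cell = 0.06, chistory = 0.78)

# Covariate pattern: no savings account, no phone (assumed cell = 0), good credit history.
applicant <- c(intercept = 1, saccount = 0, cell = 0, chistory = 1)

eta   <- sum(beta * applicant)  # linear predictor, i.e. the estimated log-odds
p_hat <- plogis(eta)            # inverse logit: 1 / (1 + exp(-eta))

print(round(p_hat, 3))
```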

(e) For this fitted model, suppose that G = Null deviance − Residual deviance = 31.8 with 3 degrees of freedom. How can this number be interpreted (your answer must include the (null) hypothesis, p value, and conclusion)? (maximum 5 lines)
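A deviance difference of this kind is usually referred to a chi-squared distribution whose degrees of freedom equal the number of predictors dropped under the null model. A short sketch of that calculation, assuming the usual large-sample chi-squared approximation applies (the variable names are illustrative):

```r
G_stat <- 31.8   # null deviance minus residual deviance, as given
df     <- 3      # one degree of freedom per predictor in the fitted model

p_value  <- pchisq(G_stat, df = df, lower.tail = FALSE)  # upper-tail probability
critical <- qchisq(0.95, df = df)                        # 5% critical value

print(signif(p_value, 3))
print(round(critical, 2))
```

Comparing G_stat with the critical value, or p_value with the chosen significance level, gives the test decision.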

(f) Do you think a simpler model would be preferable? Explain your answer. (maximum 3 lines)