MATH5916: Survival Analysis
Term 1, 2022
Assignment 2

Submission deadline: Friday 22 April, 5:00pm

Deliverables: One R Markdown file for the entire assignment with file name of the form “LastName FirstName - z1234567 - Ass#.Rmd”. Your Rmd file should produce a PDF file (use the option output: pdf_document), make no external references to the file structure on your computer, and contain no commands that save output externally. A template can be found on Moodle and more detailed instructions can be found in Lecture 1.

Assignment length: There is an 8-page limit and a 12pt font size requirement for your Rmd output file. Any pages exceeding this limit, or submissions with smaller font sizes, will not be marked. If you are over the page limit, be judicious about what R code/output is printed and consider reducing figure sizes (they do not need to be large but should be legible).

Submission: Upload your R Markdown file to Moodle and include the Plagiarism Statement given below (copy-and-paste it).

Penalties: Failure to adhere to instructions will result in a minimum 5% mark reduction.

Name:
Student Number:

I declare that this assessment item is my own work, except where acknowledged, and has not been submitted for academic credit elsewhere, and acknowledge that the assessor of this item may, for the purpose of assessing this item:
- Reproduce this assessment item and provide a copy to another member of the University; and/or,
- Communicate a copy of this assessment item to a plagiarism checking service (which may then retain a copy of the assessment item on its database for the purpose of future plagiarism checking).

I certify that I have read and understood the University Rules in respect of Student Academic Misconduct.

Signed:
Date:

1. Consider the log-linear model with fixed covariates

$$\log T_i = \mu + \alpha_1 x_{1i} + \dots + \alpha_p x_{pi} + \sigma\epsilon_i = \mu + x_i^T\alpha + \sigma\epsilon_i$$

(a) Show that the survival function of $T_i$ is

$$S_i(t) = S_0\left(t e^{-x_i^T\alpha}\right)$$

where $S_0(t) = P(e^{\mu + \sigma\epsilon_i} > t)$ is the baseline survival function (the survival function for an individual with zero covariates).

(b) The log-linear model above is an accelerated failure time model, because the effect of the explanatory variables $x$ is to speed up or slow down the time scale for the failure process. The acceleration factor is $e^{-x_i^T\alpha}$.
Consider an accelerated failure time model with a single binary variable for treatment group: $x = 0$ for the standard treatment group and $x = 1$ for the new treatment group.
i. Describe the effect of treatment on survival if $\alpha$ is (1) positive or (2) negative.
ii. Use the definition of expectation to derive a relationship between the expected lifetimes for the two treatment groups.

(c) Show that the hazard function corresponding to the survival function in (a) is

$$h_i(t) = e^{-x_i^T\alpha}\, h_0\left(t e^{-x_i^T\alpha}\right)$$

where $h_0(t)$ is the baseline hazard function.

(d) Suppose that the survival time for an individual with zero covariates has a Weibull$(\lambda, \gamma)$ distribution. Show that the hazard function for individual $i$ with covariate vector $x_i$ is

$$h_i(t) = e^{-\gamma x_i^T\alpha}\, \lambda\gamma t^{\gamma-1}.$$

Deduce that the survival time for individual $i$ also has a Weibull distribution and state the parameter values.

(e) The Cox proportional hazards model $h_i(t) = e^{x_i^T\beta} h_0(t)$ leaves the baseline hazard function $h_0(t)$ non-parametric. The Weibull proportional hazards model assumes a Weibull distribution for the baseline hazard function, so that

$$h_i(t) = e^{x_i^T\beta}\, \lambda\gamma t^{\gamma-1}.$$

Comparing this with (d), show that the accelerated failure time model for the Weibull distribution also has a proportional hazards interpretation. (In fact, the Weibull is the only distribution with both the proportional hazards and accelerated failure time properties.)

2. This question uses the PBC dataset used in the lectures and tutorials. This data is available on Moodle and is also part of the survival package.
The status variable takes values in {0, 1, 2} and the event of interest is status == 2. Several observations have missing data, which creates nesting problems when making comparisons across models. You can use either na.omit or drop_na() (when piping) to retain only complete cases when reading in the data. Be judicious with your output; summary table(s) will often suffice as long as your underlying code is correct.

(a) Modify the R code from lecture to fit all possible main effects models including the null model. Then, use those results to answer the remaining questions.

(b) Using this data, follow the model selection strategy proposed by Collett (discussed in the Lecture 7 notes), ignoring the last step for interactions. The only variables you should consider are age in years, sex, edema, platelet, stage and the log-transformations of the variables bili, albumin and protime.

(c) Create an index plot for AIC and BIC similar to Lecture 7 for all models.

(d) Do the final models chosen by Collett’s strategy, AIC and BIC agree with each other?

(e) In consideration of your response to part (d), which covariates appear to be important in explaining survival for these patients?

3. A follow-up study on the 312 PBC trial participants was also undertaken (pbcseq.csv), and a brief description is contained in the tutorial notes. For this study, multiple measurements over time were obtained for some of the prognostic variables, so this data can be analysed using Cox regression models with time-dependent covariates.

(a) Fit a Cox regression model including the variables you deemed important from question 2, treating the longitudinally measured variables as time-dependent covariates. Do any of the variables become non-significant in this model? Re-fit if required, excluding non-significant variables to arrive at a final model. Write down the fitted model for the hazard function.

(b) Give an interpretation of the estimated regression coefficients for the model in (a). For each of the prognostic variables included in the model, indicate whether an increase in these variables has a beneficial or detrimental effect on survival.

(c) What was the maximum number of observations, m, taken on a patient? Identify the three patients with m measurements. Plot values of the longitudinally measured variables over time for these three patients. Based on these plots and the values of any fixed covariates, can you suggest why these patients survived a relatively long time?
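
Note on data preparation for Question 2: the complete-case restriction and the event definition described there could be set up along the following lines. This is only a minimal sketch. It assumes the copy of pbc bundled with the survival package (the Moodle file may name or scale columns differently), and the object names pbc_cc, event, fit_null and fit_example are illustrative choices, not names from the lecture code.

```r
library(survival)

# Assumption: use the pbc data frame shipped with the survival package.
# Keep complete cases only, so that all fitted models use the same rows.
pbc_cc <- na.omit(pbc)

# Event of interest is status == 2; all other status values are treated as censored.
pbc_cc$event <- as.integer(pbc_cc$status == 2)

# Null model (no covariates).
fit_null <- coxph(Surv(time, event) ~ 1, data = pbc_cc)

# One illustrative main-effects model using a log-transformed covariate.
fit_example <- coxph(Surv(time, event) ~ age + log(bili), data = pbc_cc)
```

Creating the 0/1 event indicator once, before any model is fitted, keeps every model formula identical in its response and avoids repeating status == 2 in each call.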