MAST30027: Modern Applied Statistics
Assignment 2, 2022.
Due: 11:59pm Sunday September 11th

• This assignment is worth 7% of your total mark.
• To get full marks, show your working, including 1) the R commands you use and their outputs, 2) mathematical derivations, and 3) a rigorous explanation of how you reach your conclusions or answers. If you just provide final answers, you will get zero marks.
• The assignment you hand in must be typed (except for math formulas) and be submitted via the LMS as a single PDF document only (no other formats allowed). For math formulas, you can take a picture of them. Your answers must be clearly numbered and in the same order as the assignment questions.
• The LMS will not accept late submissions. It is your responsibility to ensure that your assignment is submitted correctly and on time; problems with online submissions are not a valid excuse for submitting a late or incorrect version of an assignment.
• We will mark a selected set of problems. We will select problems worth ≥ 50% of the full marks listed.
• If you need an extension, please contact the tutor coordinator before the due date with appropriate justification and supporting documents. Late assignments will only be accepted if you have obtained an extension from the tutor coordinator before the due date. Under no circumstances will an assignment be marked if solutions for it have been released. Please DO NOT email the lecturer with extension requests.
• Also, please read the “Assessments” section of the “Subject Overview” page on the LMS.

Note: There is no unique answer for this problem. The report for this problem should be typed. Hand-written reports, or reports including screen-captured R code or figures, won’t be marked. An example report written by a student in a previous year has been posted on the LMS.
Data: The dataset comes from the Fiji Fertility Survey and gives the number of children ever born to married women of the Indian race, classified by duration since their first marriage (grouped in six categories), type of place of residence (Suva, urban, and rural), and educational level (classified in four categories: none, lower primary, upper primary, and secondary or higher). The data can be found in the file assignment2_prob1.txt. The dataset has 70 rows representing 70 groups of families. Each row has entries for:
• duration: marriage duration of mothers in each group (years),
• residence: residence of families in each group (Suva, urban, rural),
• education: education of mothers in each group (none, lower primary, upper primary, secondary+),
• nChildren: number of children ever born in each group (e.g. 4), and
• nMother: number of mothers in each group (e.g. 8).

We can summarise the data as a table as follows.

> data <- read.table(file = "assignment2_prob1.txt", header = TRUE)
> data$duration <- factor(data$duration, levels = c("0-4","5-9","10-14","15-19","20-24","25-29"), ordered = TRUE)
> data$residence <- factor(data$residence, levels = c("Suva", "urban", "rural"))
> data$education <- factor(data$education, levels = c("none", "lower", "upper", "sec+"))
> ftable(xtabs(cbind(nChildren, nMother) ~ duration + residence + education, data))

duration residence education  nChildren  nMother
0-4      Suva      none               4        8
                   lower             24       21
                   upper             38       42
                   sec+              37       51
         urban     none              14       12
                   lower             23       27
                   upper             41       39
                   sec+              35       51
         rural     none              60       62
                   lower             98      102
                   upper            104      107
                   sec+              35       47
5-9      Suva      none              31       10
                   lower             80       30
                   upper             49       24
                   sec+              38       22
         urban     none              59       13
                   lower             98       37
                   upper            118       44
                   sec+              48       21
         rural     none             171       70
                   lower            317      117
                   upper            200       81
                   sec+              47       21
10-14    Suva      none              49       12
                   lower             99       27
                   upper             58       20
                   sec+              24       12
         urban     none              75       18
                   lower            143       43
                   upper            105       29
                   sec+              50       15
         rural     none             364       88
                   lower            546      132
                   upper            197       50
                   sec+              30        9
15-19    Suva      none              59       14
                   lower            153       31
                   upper             41       13
                   sec+              11        4
         urban     none             108       23
                   lower            225       42
                   upper             92       20
                   sec+              19        5
         rural     none             577      114
                   lower            481       86
                   upper            135       30
                   sec+               2        1
20-24    Suva      none             118       21
                   lower             91       18
                   upper             47       12
                   sec+              13        5
         urban     none             118       22
                   lower            147       25
                   upper             65       13
                   sec+              16        3
         rural     none             756      117
                   lower            431       68
                   upper            132       23
                   sec+               5        2
25-29    Suva      none             310       47
                   lower            182       27
                   upper             43        8
                   sec+               2        1
         urban     none             300       46
                   lower            338       45
                   upper             98       13
                   sec+               0        0
         rural     none            1459      195
                   lower            461       59
                   upper             58       10
                   sec+               0        0

Problem: We want to determine which factors (duration, residence, education) and two-way interactions are related to the number of children per woman (fertility rate). The observed number of children ever born in each group (nChildren) depends on the number of mothers (nMother) in each group, so we must take account of the differences in the number of mothers (hint: one of the lab problems shows how to handle this issue). Write a report on the analysis that summarises the substantive conclusions and includes the highlights of your analysis: for example, data visualisation, choice of model (e.g., Poisson, binomial, gamma, etc.), model fitting and model selection (e.g., using AIC), diagnostics, a check for overdispersion if necessary, and a summary/interpretation of your final model. At each step of your analysis, you should write why you do it and give your interpretation/conclusion. For example, write “I make an interaction plot to see whether there are interactions between X and Y”, show the plot, and then write “It seems that there is some interaction between X and Y”.
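One way to account for the different numbers of mothers is a Poisson log-linear model with log(nMother) as an offset. The model then describes the expected number of children per mother, and the coefficients act on the log fertility rate. The sketch below is only one reasonable starting point, not the required analysis, and it rests on stated assumptions. The helper name fit_fertility_models is hypothetical. The simulated data frame uses an arbitrary rate that depends on duration only. It merely stands in for assignment2_prob1.txt so that the sketch runs on its own; it is not the survey data.

```r
# Minimal sketch (assumption): Poisson log-linear model with log(nMother) as
# an offset, so that effects act on the rate of children per mother.
# fit_fertility_models() is a hypothetical helper, not part of any package.
fit_fertility_models <- function(data) {
  # Guard against empty cells: nMother = 0 would give an offset of log(0) = -Inf
  data <- subset(data, nMother > 0)

  # Main-effects model
  main <- glm(nChildren ~ duration + residence + education + offset(log(nMother)),
              family = poisson, data = data)

  # Model with all two-way interactions
  full <- glm(nChildren ~ (duration + residence + education)^2 + offset(log(nMother)),
              family = poisson, data = data)

  # Backward selection by AIC, starting from the two-way interaction model;
  # step() only drops a main effect once no retained interaction contains it
  selected <- step(full, direction = "backward", trace = 0)

  # Rough overdispersion check: Pearson chi-square / residual degrees of freedom
  dispersion <- sum(residuals(selected, type = "pearson")^2) / df.residual(selected)

  list(main = main, full = full, selected = selected, dispersion = dispersion)
}

# Hypothetical stand-in data (NOT the survey data): one row per cell of the
# 6 x 3 x 4 design, with a made-up rate that depends on duration only.
set.seed(1)
sim <- expand.grid(
  duration  = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29"),
  residence = c("Suva", "urban", "rural"),
  education = c("none", "lower", "upper", "sec+")
)
sim$nMother   <- rpois(nrow(sim), lambda = 30) + 1
true_rate     <- exp(0.3 + 0.2 * as.integer(sim$duration))
sim$nChildren <- rpois(nrow(sim), lambda = sim$nMother * true_rate)

fits <- fit_fertility_models(sim)
AIC(fits$main, fits$full)   # compare the two starting models
formula(fits$selected)      # terms retained by backward AIC selection
fits$dispersion             # values well above 1 suggest overdispersion
```

With the real data, the same helper could be applied as fits <- fit_fertility_models(data) after the commands above. Because duration is read as an ordered factor, R uses polynomial contrasts for it by default. A dispersion estimate well above 1 would suggest overdispersion. In that case, a quasi-Poisson model (family = quasipoisson) or a negative binomial model (MASS::glm.nb) could be compared.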