STAT4051, Fall 2020, Midterm II ()
Name (Print):
Student ID:
Time Limit: 24 hours

Instructions:
• The exam has five problems. You should try to solve all five problems before deciding which four problems to submit. Make it clear on the next page what four problems you want graded. Submitting solutions to all five problems will not gain more points. If you submit solutions to all five problems, then we will grade just the first four regardless.
• In your analysis, remember to check for assumptions and think about interactions. Your analysis should go beyond just the linear model and what is significant. It should try to explain what is going on in the data and provide graphics when appropriate.
• Do your own work! Discuss any non-statistical questions only with the instructor. Once the exam is distributed, your TA and I cannot answer statistical questions.
• Upload one .pdf file to Canvas of your final solutions by Monday, November 23 at 11:00 am. You may use R Markdown to generate your results or cut and paste your output into your file. If you cut and paste R output into your document, change the font of R output to courier to maintain formatting. I will not accept hand written work or screen shots of your analysis. Also, I will not accept .html files.
• This exam contains 7 pages (including this cover page).
• This exam is worth 100 points.
• Show all your work on each problem for full credit. The following rules apply:
  – Organize your work, in a reasonably neat and coherent way, in the space provided. Work without a clear ordering will receive very little credit.
  – Mysterious or unsupported answers will not receive full credit.

STAT4051, Fall 2020 Midterm II () - Page 2 of 7 Initials:

List the problems you want graded here:
•
•
•
•

Problem I. Short Answer (25 points total)
Show all work for full credit unless noted otherwise.
Data were collected on six variables for 20 individuals: height, weight, BMI, chest, waist and hip measurements. The first ten individuals in the dataset are male and the remaining ten are female. Download the measurement.csv dataset and provide a thorough analysis which addresses the questions listed below.

Here are the variables and their units of measurement:
• height (inches)
• weight (pounds)
• BMI (kg/m²)
• chest (inches)
• waist (inches)
• hips (inches)

For this problem, show a logical flow and justify the decisions you made along the way to answer these questions:
i. Describe the relationship(s) among these six variables revealed by the analysis.
ii. Is it possible to reduce the number of variables in the dataset? If so, how much will you retain?
iii. Is it possible to classify the sex of future individuals based on your analysis? If so, generate a graphic in support and discuss what you see.

Problem II. Short Answer (25 points total)
Show all work for full credit unless noted otherwise.

A physiologist studied the effect of three treatments on muscle tissue in cats. Ten litters of three cats each were randomly selected and the three treatments were randomly assigned to the three cats in each litter. Each cat received only one treatment. The physiologist recorded the litter, treatment and muscle activity 1 hour after application. Download the catactivity.csv dataset and provide a thorough analysis which determines whether there are any differences among treatments with regard to muscle activity. If so, identify the treatment(s) that produce(s) the largest activity. Use α = 0.05.

Problem III. Short Answer (25 points total)
Show all work for full credit unless noted otherwise.

Kidney failure patients are commonly treated on dialysis machines that filter toxic substances from the blood.
The appropriate "dose" for effective treatment depends, among other things, on duration of treatment and weight gain between treatments as a result of fluid buildup. To study the effect of these two factors on the number of days hospitalized (attributed to the disease) during a year, a random sample of 56 patients who had undergone treatment at a large dialysis facility was obtained. Treatment duration was categorized into two groups: short duration (average dialyzing time for the year under four hours) and long duration (average dialyzing time for the year equal to or greater than four hours). Average weight gain between treatments during the year was categorized into three groups: slight, moderate and severe. The response is the number of days hospitalized. Download the kidneyfailure.csv dataset.

Hint: Use the transformation log_e(y + 1) to stabilize the variances.

Provide a thorough analysis that answers the following questions:
i. What factor(s) statistically affect the length of hospitalization? Use α = 0.05.
ii. What group of patients are hospitalized the longest?
iii. How may you improve this experiment? Be specific.

Problem IV. Short Answer (25 points total)
Show all work for full credit unless noted otherwise.

Doctors wish to assess the influence of three anti-viral drugs (two actual drugs and a control) on the length of SARS in patients. As new patients are identified, they are randomly assigned to receive one of the three drugs. The response is the length of time it takes a patient to recover after receiving treatment, that is, until the patient no longer tests positive for SARS. It is suspected that the duration for which the patient experienced SARS symptoms before being randomized to treatment affects the response; therefore, duration was recorded for each patient. Download the antiviral.csv dataset and use it to answer the following questions.
Here are the details about the data:
• treatment (one of three treatments)
• recovery.length (days) - length of recovery time post treatment
• duration (days) - how long a patient had SARS symptoms before they were treated

For this problem:
i. Provide a thorough analysis that addresses whether there are any statistical differences among the three treatments. Show the logical flow and the decisions you made along the way to arrive at your final model. Use α = 0.05.
ii. Estimate the mean length of recovery for each treatment.

Problem V. Short Answer (25 points total)
Show all work for full credit unless noted otherwise.

A service center employs a large number of technicians who specialize in repairing laptops. Four technicians were randomly selected from all technicians working at the service center to be included in a study about service time. Five laptop brands were randomly selected from all the brands currently serviced by the center. It was desired to study the effects of technician and laptop make on the variability of service time. Download the laptop.csv dataset and provide a thorough analysis which answers the following questions:
i. Estimate the various sources of variation in this study.
ii. What percentage of the total variability does the technician component contribute?
iii. Test whether all variances are statistically different from zero. Use α = 0.05.
iv. The study was criticized by one of the center's managers who said that the results could only be applied to the four technicians in the study. Is this statement accurate or not? Discuss.
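
For reference, the quantities asked for in Problem V can be written out under one possible setup. This is only a sketch, assuming a balanced crossed design with n replicate service times for each technician-brand combination; the layout of laptop.csv is not described here, so the replication, the interaction term, and all symbol names below are assumptions and not part of the problem statement.

```latex
% Assumed two-factor random effects model (a = 4 technicians, b = 5 brands,
% n replicates per cell; n and the interaction term are assumptions).
\[
  y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk},
  \qquad i = 1,\dots,a,\quad j = 1,\dots,b,\quad k = 1,\dots,n,
\]
\[
  \tau_i \sim N(0,\sigma^2_{\tau}),\quad
  \beta_j \sim N(0,\sigma^2_{\beta}),\quad
  (\tau\beta)_{ij} \sim N(0,\sigma^2_{\tau\beta}),\quad
  \varepsilon_{ijk} \sim N(0,\sigma^2),
\]
% all random terms mutually independent.

% ANOVA (method-of-moments) estimators from the mean squares:
\[
  \hat\sigma^2 = MS_E,\qquad
  \hat\sigma^2_{\tau\beta} = \frac{MS_{TB} - MS_E}{n},\qquad
  \hat\sigma^2_{\tau} = \frac{MS_T - MS_{TB}}{bn},\qquad
  \hat\sigma^2_{\beta} = \frac{MS_B - MS_{TB}}{an}.
\]

% Share of total variability attributed to technicians:
\[
  100 \times
  \frac{\hat\sigma^2_{\tau}}
       {\hat\sigma^2_{\tau} + \hat\sigma^2_{\beta}
        + \hat\sigma^2_{\tau\beta} + \hat\sigma^2}\ \%.
\]
```

If the data turn out to have a single observation per technician-brand combination, the interaction and error components cannot be separated and the model above would need to be reduced accordingly.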