Stat 481 Project
Due Time: 2PM on Friday, April 3.
Instructions:
• Project must be typed for credit. Write your final answers as COMPLETE SENTENCES. Projects
submitted using R-markdown will not receive full credit.
• Show all work. Attach your code at the end of the project. You may use R, SAS, or other
statistical software.
• Do NOT just turn in a set of code plus a sheet with your answers. Write this as a report that you
would give to a person who knows nothing about statistics.
• No late projects are accepted.
Dataset Location:
Use the dataset provided to you on Blackboard. It will be a .csv or .xlsx file with a header
row.
Problem:
This data file contains nutritional information and grocery shelf location for 77 breakfast cereals. Current
research states that adults should consume no more than 30% of their calories in the form of fat, that they
need about 50 grams (women) or 63 grams (men) of protein daily, and that they should provide for the
remainder of their caloric intake with complex carbohydrates. Fat contains 9 calories per gram;
carbohydrates and proteins each contain 4 calories per gram. A “good” diet should also contain 20-35 grams
of dietary fiber. A variable named “rating” was calculated by Consumer Reports.
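The calorie figures above can be turned into a quick check of the 30% fat guideline. The following is a minimal Python sketch, not part of the required analysis (the project itself suggests R or SAS). The function name `fat_calorie_share` and the serving values are made up for illustration, and counting sugars together with complex carbohydrates at 4 calories per gram is an assumption.

```python
# Calories per gram, as stated in the problem text.
CAL_PER_G_FAT = 9
CAL_PER_G_CARB_OR_PROTEIN = 4


def fat_calorie_share(fat_g, protein_g, carb_g):
    """Return the fraction of macronutrient calories that come from fat."""
    fat_cal = CAL_PER_G_FAT * fat_g
    other_cal = CAL_PER_G_CARB_OR_PROTEIN * (protein_g + carb_g)
    total = fat_cal + other_cal
    if total == 0:
        return 0.0
    return fat_cal / total


# Hypothetical serving: 1 g fat, 4 g protein, 20 g carbohydrate
# (assumption: complex carbohydrates and sugars added together).
share = fat_calorie_share(1, 4, 20)
within_guideline = share <= 0.30
print(f"Share of calories from fat: {share:.1%}")
```

Note that this share is computed from the gram counts, so it may differ slightly from a share based on the calories column of the dataset.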
Our data consists of the following information:
Variable Name   Description
Name            Name of cereal
type            cold or hot
calories        calories per serving
protein         grams of protein
fat             grams of fat
sodium          milligrams of sodium
fiber           grams of dietary fiber
carbo           grams of complex carbohydrates
sugars          grams of sugars
potass          milligrams of potassium
vitamins        vitamins and minerals - 0, 25, or 100, indicating
                the typical percentage of FDA recommended
shelf           display shelf (1, 2, or 3, counting from the floor)
weight          weight in ounces of one serving
cups            number of cups in one serving
rating          a rating of the cereals
Note: A value of −1 for nutrients indicates a missing observation. Total number of cases: 77.
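One way to handle the −1 missing-value code is to convert it to an explicit missing marker on import and then keep only complete cases. The sketch below uses Python's standard library; the three rows are invented and are NOT the Blackboard data, and the helper names `load_rows` and `complete_cases` are hypothetical. Treating the columns listed in `NUTRIENTS` as the ones where −1 means "missing" is also an assumption.

```python
import csv
import io

# Made-up rows in the documented column layout (not the real dataset).
SAMPLE_CSV = """Name,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass
Cereal A,cold,110,2,1,180,1.5,14,8,60
Cereal B,hot,100,3,1,80,1,-1,-1,-1
Cereal C,cold,90,3,0,0,3,20,0,-1
"""

# Assumption: these are the nutrient columns in which -1 codes a missing value.
NUTRIENTS = ["calories", "protein", "fat", "sodium",
             "fiber", "carbo", "sugars", "potass"]


def load_rows(handle):
    """Read the CSV and replace the -1 code with None in nutrient columns."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for col in NUTRIENTS:
            value = float(raw[col])
            row[col] = None if value == -1 else value
        rows.append(row)
    return rows


def complete_cases(rows):
    """Keep only rows with no missing nutrient value."""
    return [r for r in rows if all(r[c] is not None for c in NUTRIENTS)]


rows = load_rows(io.StringIO(SAMPLE_CSV))
clean = complete_cases(rows)
print(f"{len(rows)} rows read, {len(clean)} complete")
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle. Reporting how many rows were dropped keeps the sample size in the data summary honest.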
Question:
The task is to develop your own rating system and find which cereal is the healthiest for you.
Try to predict the cereal rating based on the nutrition facts and other observable characteristics. Which
variables best predict the rating? Can you quantify how good your predictive model is?
Important Items:
1. Goal: Construct a regression model.
2. Provide descriptive statistics such as sample size, minimum value, median, mean, variance /
standard deviation, and maximum value for quantitative variables. For indicator variables, report
the number of 0’s, the number of 1’s, and the sample size.
3. Ignore the missing values. Do not use the Name column in your analysis.
4. Check for multicollinearity. Exclude any variables with VIF > 10. Be sure to report if any
variable(s) needed to be removed or if there were no issues present. This test only needs to be done
once at the beginning of the analysis.
5. Check the model assumptions (linearity, independence, normality of residuals, equal variance of
residuals). If you do not need to check a model assumption, explain why.
• Provide any applicable plots or tests and interpret them.
• For normality testing, use a 0.05 significance level.
• If any of the model assumptions are not met, suggest ways to “fix” your data and then
proceed to adjust it. You may round λ to one of the following values: −2, −1, −0.5, 0, 0.5, 1, 2,
as suggested by the Box-Cox transformation, even if software does not specifically suggest a
convenient lambda value.
• Be sure to re-check all the model assumptions after any transformations and address each of
the assumptions in your report.
• Note: To simplify things, if you need to do a transformation, do one transformation, and then
even if the model assumptions are not met, proceed with analysis. Make some comments as to
the fit of the model, but then continue with the process.
6. Build the “best” model possible by using either backward selection or forward selection (pick one)
with the criteria for inclusion as having a significance of 0.10 or lower.
7. Draw conclusions/interpret your regression model. Include a statement about R² before and after
creating the “best” model possible. Include statements about each variable kept in the final model.
Make sure these conclusions can be understood by a customer.
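Two of the items above, the descriptive statistics in item 2 and the rounding of λ in item 5, can be illustrated with small helpers. This is a minimal sketch in Python using only the standard library; the helper names `describe`, `count_indicator`, and `snap_lambda` and all of the numbers are made up for illustration. Reading "round λ" as "pick the nearest allowed value" is an assumption, since the instructions do not define the rounding rule.

```python
import statistics

# Lambda values the instructions allow after a Box-Cox analysis.
ALLOWED_LAMBDAS = (-2, -1, -0.5, 0, 0.5, 1, 2)


def describe(values):
    """Summarize a quantitative variable (item 2)."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "sd": statistics.stdev(values),           # sample standard deviation
        "max": max(values),
    }


def count_indicator(values):
    """Count the 0s and 1s of an indicator variable (item 2)."""
    return {"zeros": values.count(0), "ones": values.count(1), "n": len(values)}


def snap_lambda(estimate):
    """Pick the allowed lambda nearest to a Box-Cox estimate (item 5)."""
    return min(ALLOWED_LAMBDAS, key=lambda lam: abs(lam - estimate))


# Made-up numbers for illustration only.
calories = [70, 90, 100, 110, 110, 120]
is_hot = [0, 0, 1, 0, 0, 0]

summary = describe(calories)
hot_counts = count_indicator(is_hot)
rounded = snap_lambda(0.37)  # hypothetical software estimate of lambda
print(summary, hot_counts, rounded)
```

In R, `summary()`, `sd()`, and `table()` cover the same ground, and the λ estimate would come from `boxcox()` in the MASS package.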
Grading:
You will be graded on the following items:
• Code Provided
• Written in Complete Sentences
• Data Summary
• Initial Regression Analysis and Assumptions Check
• Transformation of Variables (if needed, justification required)
• Second Regression Analysis and Assumptions Check (if needed)
• Building the “best” model possible. Includes summarizing which variables are kept and which are
removed. It is good to mention whether the assumptions are met (specific plots / assumptions checks
not required here).
• Drawing Conclusions based on the “best” model possible. Includes final model statement,
interpretation of parameter values, and how R² changes before and after creating the “best” model.
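For the R² comparison mentioned above, it can help to recall what the statistic measures: one minus the ratio of the residual sum of squares to the total sum of squares, which for a least-squares model with an intercept is the share of variation in the response that the model explains. The sketch below is a hedged Python illustration; `r_squared` is a hypothetical helper, and the ratings and fitted values are invented rather than taken from any fitted model.

```python
def r_squared(observed, fitted):
    """Return 1 - SSE/SST for observed responses and model-fitted values."""
    mean_y = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # residual SS
    sst = sum((y - mean_y) ** 2 for y in observed)             # total SS
    return 1 - sse / sst


# Made-up ratings and fitted values standing in for two models.
rating = [30.0, 40.0, 50.0, 60.0]
fitted_full = [31.0, 39.0, 51.0, 59.0]  # hypothetical model with all predictors
fitted_best = [32.0, 38.0, 52.0, 58.0]  # hypothetical model after selection

r2_full = r_squared(rating, fitted_full)
r2_best = r_squared(rating, fitted_best)
print(f"R^2 before selection: {r2_full:.3f}, after selection: {r2_best:.3f}")
```

Removing predictors from a least-squares model cannot raise R² on the same data, so a small drop after selection is expected and is worth stating in the report.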
Some useful SAS procedures:
• DATA or PROC IMPORT
• PROC REG
SAS PROC REG - model options:
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect013.htm
• PROC CORR
• PROC UNIVARIATE
• PROC TRANSREG
• SAS Procedure Help
https://documentation.sas.com/?docsetId=proc&docsetTarget=titlepage.htm&docsetVersion=9.4&locale=en
Some useful R functions:
• read.table() or read.csv() to import data
• lm() and plot(lm()) to fit linear regression model
• cor() for correlation coefficient
• boxcox() in MASS package
• vif() in car package
• Chapter 11 (R Introduction)
https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf