STAT 101 A Winter 2021
Kaggle Competition
The final Kaggle submission is due Monday, March 15th, at 11:59 PM.
The final project paper is due Monday, March 22nd, at 11:59 PM.
Predicting Car Prices
Problem Statement
A Chinese automobile company, Geely Auto, aspires to enter the US market by
setting up a manufacturing unit there and producing cars locally to compete
with its US and European counterparts. The company has contracted an
automobile consulting firm to understand the factors on which the pricing of
cars depends. Specifically, it wants to understand the factors affecting the pricing
of cars in the American market, since those may be very different from the Chinese
market. The company wants to know:
• Which variables are significant in predicting the price of a car.
• How well those variables describe the price of a car.
Based on various market surveys, the consulting firm has gathered a large
data set of different types of cars across the American market.
Business Goal
We are required to model the price of cars with the available “independent”
variables. The model will be used by management to understand how exactly prices
vary with the independent variables. They can then adjust the design
of the cars, the business strategy, etc. to meet certain price levels. Further, the
model will be a good way for management to understand the pricing dynamics of a
new market.

Data Description:
The data at hand is divided into two data sets: Training and Testing.
The training data set contains 1500 observations and has 23 predictors and the
response variable PriceNew.
The testing data set contains 500 observations and has 23 predictors and the
response variable PriceNew.
I have already taken care of all missing values in the data sets.
The variables are listed below:
• "Manufacturer"
• "Model"
• "Type"
• "MPG.highway"
• "AirBags"
• "DriveTrain"
• "Cylinders"
• "EngineSize"
• "Horsepower"
• "RPM"
• "Rev.per.mile"
• "Man.trans.avail"
• "Fuel.tank.capacity"
• "Passengers"
• "Length"
• "Wheelbase"
• "Width"
• "Turn.circle"
• "Rear.seat.room"
• "Luggage.room"
• "Weight"
• "Origin"
• "Make"
• "PriceNew"


Project’s Main Goals
• Use the training data to build a valid multiple linear regression (MLR) model.
• Check the model diagnostics.
• Compete to make your MLR model the “best” it can be (create new
variables out of existing ones, apply transformations, check leverage points and
outliers, etc.).
• Your task is to predict the prices of the cars in the testing data, create a
solution file, and submit it on Kaggle to check the accuracy of your predictions.
• The competition ranks students’ submissions based on their testing R².
• The best models are accurate, valid, and simple.
• The submission file must have two columns and 500 rows: the first column
named “Ob” and the second named “PriceNew”, in CSV format only.
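The ranking metric and the submission layout can be sketched as follows. This is a minimal illustration in Python, used only because it is easy to run standalone; the modeling itself is done in R. It assumes the usual definition R² = 1 − SSE/SST (the handout does not give Kaggle's exact scoring formula) and assumes that Ob simply numbers the test rows 1 to 500. The function names r_squared and write_submission are hypothetical, and all numbers are toy values.

```python
import csv
import io


def r_squared(actual, predicted):
    """R^2 = 1 - SSE/SST (assumed here to match the competition metric)."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst


def write_submission(predictions, out):
    """Write a two-column CSV: Ob (assumed to be 1..n) and PriceNew."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Ob", "PriceNew"])
    for ob, price in enumerate(predictions, start=1):
        writer.writerow([ob, price])


# Toy numbers, not real car prices.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
toy_r2 = r_squared(actual, predicted)

# An in-memory buffer stands in for open("submission.csv", "w", newline="").
buffer = io.StringIO()
write_submission([0.0] * 500, buffer)  # placeholder predictions
submission_lines = buffer.getvalue().splitlines()
```

Before uploading, it is worth confirming that the file has exactly one header line plus 500 data rows.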
Key assumptions of Multiple Regression:
To perform multiple linear regression, the following assumptions must be met:
--- Before model construction: ---
• Linear relationship: The dependent variable Y (i.e., price) has a linear relationship
with the independent variables X. To verify this, check that the scatter plot of Y
against each X is approximately linear.
• No multicollinearity: Multiple regression assumes that the independent variables X
are not strongly correlated with each other. This assumption is checked
using the variance inflation factor (VIF) or a correlation matrix.
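To show how the two multicollinearity checks relate, here is a small sketch in plain Python (illustration only; the course work itself is in R). It relies on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix, which for two predictors reduces to 1 / (1 − r²). The helper names and the toy columns are hypothetical and are not taken from the car data.

```python
import math


def correlation(x, y):
    """Pearson correlation between two equal-length numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    return [[correlation(ci, cj) for cj in columns] for ci in columns]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Toy predictor columns (made-up values); their correlation is 0.8.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
toy_r = correlation(x1, x2)
toy_vifs = vif([x1, x2])
```

A VIF near 1 means a predictor is nearly uncorrelated with the others. Larger values indicate that it is largely explained by the other predictors, and commonly quoted rules of thumb flag values above about 5 or 10.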
--- After model construction: residual analysis ---
• Normality of the error distribution
• Independence of errors
• Homoscedasticity (constant error variance)
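Residual checks are usually done graphically; in R, calling plot() on a fitted lm object produces the standard residual plots. As a rough numeric companion, the sketch below (plain Python, illustration only) fits a one-predictor least-squares line, computes the residuals, and compares the residual spread between the upper and lower halves of the predictor. This spread ratio is a crude stand-in chosen for this example, not a formal test and not a method prescribed by the handout. The function names and toy data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def residuals(x, y, intercept, slope):
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def spread_ratio(x, resid):
    """Sum of squared residuals in the upper half of x over the lower half.

    Values far from 1 hint at non-constant error variance (crude check only).
    """
    ordered = [r for _, r in sorted(zip(x, resid))]
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return sum(r * r for r in upper) / sum(r * r for r in lower)


# Toy data: y = 2x plus noise that grows with x (made-up values).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
y = [2.0 * a + e for a, e in zip(x, noise)]

intercept, slope = fit_simple_ols(x, y)
resid = residuals(x, y, intercept, slope)
ratio = spread_ratio(x, resid)
```

In this toy data the ratio is well above 1, reflecting the growing noise. When such a pattern shows up in a real model, a transformation of the response (for example, a log) is one common remedy to consider.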
Grading Scheme:
1. The first one-third of the project’s grade is based on the Kaggle rankings.
2. The second one-third of the project’s grade is based on the validity and the
simplicity of the final MLR model. You are not allowed to use machine
learning functions and tools (such as random forests, ridge regression,
etc.) to create your predictive model. The lm, glm, step, and regsubsets
functions are allowed.
3. The last one-third of the project’s grade is based on the final paper
write-up. (No R code in your paper unless it is in the appendix.)
4. Make sure you use your full name and your lecture number on your Kaggle
account. If you fail to do so, you may lose up to 10% of your final project
grade. A sample name: “First Name Last Name Lecture 1”

https://www.kaggle.com/t/c2919bedc5fd4a1e913e998a4782a891
Have Fun and Good Luck
