IB9JV0 Individual Project (100%)
Scenario
You work for the data science group of a large US-based supermarket chain. Today, you are
assigned to develop a predictive model that can help improve future sales of domestic
wine. Your colleague managed to obtain a dataset of 54503 different wines from a market
research firm. The dataset is stored in a .csv file. It contains 8 attributes describing each
wine:
1. Id: a unique identifier for the wine.
2. Name: the name of the wine.
3. Score: the rating score given by professional reviewers, on a scale of 1-100.
4. Price: the manufacturer's suggested retail price in US dollars.
5. State: the US state where the wine is made.
6. Region_1: the region where the wine is made.
7. Region_2: a more specific region where the wine is made.
8. Variety: the grapes used to make the wine.
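As a first sanity check on the file, it can help to confirm that the eight columns listed above are present and to count empty cells per column. The sketch below is only an illustration: it uses the standard library rather than any particular data-analysis package, the three sample rows are made up and stand in for the real file, and the helper name summarise is hypothetical.

```python
import csv
import io

# Column names taken from the attribute list above.
EXPECTED_COLUMNS = ["Id", "Name", "Score", "Price", "State",
                    "Region_1", "Region_2", "Variety"]

# Tiny made-up sample standing in for the real file (values are illustrative only).
SAMPLE_CSV = """Id,Name,Score,Price,State,Region_1,Region_2,Variety
1,Example Wine A,90,25.0,California,Napa Valley,Napa,Cabernet Sauvignon
2,Example Wine B,87,,Oregon,Willamette Valley,,Pinot Noir
3,Example Wine C,,18.5,Washington,Columbia Valley,Columbia Valley,Merlot
"""


def summarise(handle):
    """Return (row_count, missing_columns, empty_counts) for an open CSV handle."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    # Expected columns that do not appear in the file's header row.
    missing_columns = [c for c in EXPECTED_COLUMNS if c not in header]
    # Number of blank cells seen in each column.
    empty_counts = {c: 0 for c in header}
    row_count = 0
    for row in reader:
        row_count += 1
        for column in header:
            if not (row[column] or "").strip():
                empty_counts[column] += 1
    return row_count, missing_columns, empty_counts


row_count, missing_columns, empty_counts = summarise(io.StringIO(SAMPLE_CSV))
print(row_count, missing_columns, empty_counts)
```

For the real dataset, the same function could be called on a handle from open(path, newline="", encoding="utf-8"), where path is wherever the .csv file is stored; the encoding is an assumption and may need adjusting.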
Requirements:
1. All code must be written in Python.
2. You should use Jupyter Notebook to work on this project and submit the .ipynb file.
3. You are required to write an executive summary (Word or PDF file) to present your work.
The summary should be no more than two pages (double spaced, excluding any figures,
tables, and references).
4. Code must be well documented with comments.
5. You should also include narrative alongside your code, using Markdown, to explain and justify
your steps and to describe any insights gained from each step.
6. You may search online or discuss with other students, but each student must work
independently.
Notes:
1. You are welcome to use additional Python packages (not covered in class), but they should
be well documented in Markdown.
2. Comments are different from explanations using Markdown.
3. Here is the importance of each component of your work. The percentage is only indicative.
- Explanation/Description using Markdown (20% - 25%)
- Code, including comments (60% - 65%)
- Executive summary (15% - 20%)
4. This dataset is adapted from an open dataset but has been modified to fit our project. You
might find similar datasets online, but don't rely on existing solutions as they may not
work properly.
5. The accuracy (or other metric) of your final prediction model is less important than the
process used to achieve and improve that value. Because the dataset has been modified, you
may end up with a low accuracy.
6. You may try different algorithms (including those not covered in the lectures) and include
them in your submission. However, a purposeful selection of a smaller number of algorithms
with good justification is better than a random selection of a larger number of algorithms
without good justification.
7. Given the size of the dataset, training your model may take time.
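One possible way to keep training time manageable while iterating is to prototype on a reproducible random subsample of rows and time each step before scaling up to all 54503 rows. The sketch below is an assumption about workflow, not a requirement of the project: the helper names subsample_indices and timed are hypothetical, and the 10% fraction is arbitrary.

```python
import random
import time

N_ROWS = 54503  # dataset size stated in the Scenario


def subsample_indices(n_rows, fraction, seed=0):
    """Pick a reproducible random subset of row indices, without replacement."""
    k = max(1, int(n_rows * fraction))
    # A dedicated Random instance keeps the draw independent of global state.
    return sorted(random.Random(seed).sample(range(n_rows), k))


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Draw a 10% subsample and measure how long the draw itself takes.
indices, seconds = timed(subsample_indices, N_ROWS, 0.10)
print(len(indices), f"{seconds:.4f}s")
```

The same timed wrapper could be put around a model-fitting call to see how training time grows as the fraction increases, which also gives something concrete to report in the Markdown narrative.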