MA 705: Fall 2020
Prof. Cherveny

Individual MA705 Project
Due: Dec 14th

Goals

1. Demonstrate mastery of data science tools (Python, working with data frames, visualizations, data collection and cleaning, dashboards)
2. Communicate data to a general audience

Description

Choose a real-world question and address it as a data scientist using tools learned in MA705. Major components of the project are:

• Articulating a clear question
• Collecting data
• Cleaning the data
• Building a dashboard

Instructions

1. Topic: Choose a motivating question to anchor your project. This is the hardest part, and doing it well makes the rest of the project easier and more fun. The topic may be anything you like, but answering it should involve a variety of tools learned in MA705.
   • Finance/Real estate/Health
   • Sports
   • Weather
   • Pets/Animals/Plants/Nature
   • Food/Dining/Cooking
   • Music/Movies/TV/Books
   • Product of interest (Coke, McDonald's, Pizza, Cereal)
   • Cars/Trains/Flights
   • Breakdown of a topic by state, country, county
2. Data Collection: Use web scraping, APIs, and publicly available data sets to collect data that may be useful.
3. Data Wrangling: Appropriately clean the raw data, deal with missing values, merge data sets, etc. Prepare at least one nice data frame for downstream use.
4. Data Presentation: Create a dashboard that presents your data to the user in a way that answers the question according to the user's interests. At a minimum, there should be a table and a visualization that update according to the user's input. Feel very free to include more elements. The challenge involved will be noted, but don't make a complex dashboard for the sake of complexity: dashboards that are not user-friendly or not designed with a clear central question in mind will not be viewed favorably.

Most projects will not use every tool discussed in MA705, but they should have the above elements to some extent.
The lack of challenge in one step may be offset by more involved work in another step.

You should feel free to incorporate models learned in other courses (if done right!), but you may not hand in this project (or part of it) as your semester project in another course without speaking to both instructors. Please speak to me if you want to incorporate tools that go significantly beyond those covered in MA705.

You may consult any references (books, media, web). However, you shouldn't consult professionals about this project. Document all data sources and references.

Deliverables

• Primary: A well-designed dashboard. It should be visually appealing, address a clear motivating question, and be self-contained (meaning good labels on everything, possibly with brief text or markdown explanations as needed). It should allow for client customization in some ways.
• Other: Supporting .py files or data sets.

Project Examples

Here are a few detailed examples of possible projects.

• (Based on #3 from Week 11 Exercises) How can we find a great video game? Build a dashboard that presents the user with recommendations based on customized filters. This would involve:
  – Scraping MetaCritic¹ to build a data set containing the 1000+ best video games. Variables might include name, release date, platform, genre, rating, critic score, user score, number of critic reviews, and summary.
  – Building a dashboard that displays a list of the top 20 video games based on the user's search parameters and relevant information for each one. The dashboard would have controls for release date, platform, genre, minimum number of critic reviews, and whether results are based on user or critic score.
  – Ideally a visualization would be involved. The Week 11 exercise suggested a scatterplot of user score vs critic score with a trend line for practice, but this doesn't seem like it would help the user with their choice.
    Perhaps a bar plot containing the top 20 games for the search parameters, sorted by the overall score?
  – Challenge: Maybe the user could enter a keyword, and only results that also contain that keyword in the summary would be displayed?
  – Extra challenge: Add more information for each video game recommendation, such as an icon of the video game, the best price on Amazon, or a link to Amazon? This might even involve actually getting the icon and real-time price from Amazon. Again, this is extra challenge and would go beyond the project requirements.
• (Current MA705 project) How does NBA team performance depend on player age?
  – Prepare a data set containing the game logs for all players in the NBA during the 2018-2019 season by either scraping or using NBA.com's API. For each player and each game the player was in, variables collected would include number of minutes played, points scored, rebounds, team they played for (players get traded mid-season), and date of the game.
  – Based on the date of the game and the (separately scraped) birthdate of a player, add a variable to the data frame for the age of each player in that game.
  – Now prepare a dashboard that builds a histogram of, say, minutes played in the season by players of each age. The user would select the histogram variable to plot against age (minutes played, points scored, etc.) as well as the NBA team to display the data for.
  – When the user selects an NBA team, the team's season stats appear, such as win-loss record and whether they made the playoffs. It would be cool to have a team logo display as well.

¹ https://www.metacritic.com/browse/games/score/metascore/all
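
For the video game example, the filtering behind the top 20 list might look roughly like the sketch below. It is only an illustration in pandas: the column names, the `recommend` helper, and the four placeholder rows are all assumptions made up for this sketch, not real MetaCritic data or a required design.

```python
import pandas as pd

# Made-up placeholder rows; a real data frame would come from the scraping step.
games = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C", "Game D"],
    "platform": ["PC", "Switch", "PC", "PC"],
    "genre": ["RPG", "Platformer", "RPG", "Strategy"],
    "release_date": pd.to_datetime(
        ["2015-05-01", "2017-10-27", "2019-03-22", "2012-09-18"]
    ),
    "critic_score": [92, 97, 88, 90],
    "user_score": [8.9, 9.1, 7.5, 8.2],
    "n_critic_reviews": [80, 110, 45, 12],
    "summary": [
        "An open world adventure",
        "A joyful platformer",
        "A dark fantasy adventure",
        "A turn-based classic",
    ],
})


def recommend(df, platform=None, genre=None, min_year=None, min_reviews=0,
              score_col="critic_score", keyword=None, top_n=20):
    """Hypothetical helper: apply the user's filters, then rank by one score."""
    mask = df["n_critic_reviews"] >= min_reviews
    if platform is not None:
        mask &= df["platform"] == platform
    if genre is not None:
        mask &= df["genre"] == genre
    if min_year is not None:
        mask &= df["release_date"].dt.year >= min_year
    if keyword:
        # Plain (non-regex), case-insensitive substring match on the summary.
        mask &= df["summary"].str.contains(
            keyword, case=False, regex=False, na=False
        )
    return df[mask].sort_values(score_col, ascending=False).head(top_n)


# PC games with at least 40 critic reviews whose summary mentions "adventure".
result = recommend(games, platform="PC", min_reviews=40, keyword="adventure")
print(result[["name", "critic_score", "user_score"]])
```

In a dashboard, each control (platform, genre, minimum reviews, score type, keyword) would feed one argument of a function like this, and both the table and the bar plot could be drawn from the returned data frame.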
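
For the NBA example, the age variable and the per-age totals could be computed along the lines of the sketch below. This is one possible approach in pandas under stated assumptions: the column names, the `total_by_age` helper, and the placeholder players and teams are invented for illustration and are not real NBA data.

```python
import pandas as pd

# Made-up placeholder game logs; real logs would come from scraping or an API.
logs = pd.DataFrame({
    "player": ["Player X", "Player X", "Player Y"],
    "team": ["AAA", "BBB", "AAA"],  # Player X is traded mid-season
    "game_date": pd.to_datetime(["2018-10-20", "2019-03-01", "2018-10-20"]),
    "minutes": [30, 25, 12],
    "points": [18, 10, 4],
})

# Made-up placeholder birthdates, collected separately from the game logs.
births = pd.DataFrame({
    "player": ["Player X", "Player Y"],
    "birthdate": pd.to_datetime(["1990-01-15", "1998-11-02"]),
})

# Attach each player's birthdate to every one of their game rows.
merged = logs.merge(births, on="player", how="left")

# Age in completed years on game day: subtract the years, then take off one
# if the player's birthday had not yet occurred in the game's calendar year.
game, born = merged["game_date"], merged["birthdate"]
had_birthday = (game.dt.month > born.dt.month) | (
    (game.dt.month == born.dt.month) & (game.dt.day >= born.dt.day)
)
merged["age"] = game.dt.year - born.dt.year - (~had_birthday).astype(int)


def total_by_age(df, team, variable):
    """Hypothetical helper: season total of `variable` per age for one team."""
    return df[df["team"] == team].groupby("age")[variable].sum()


print(merged[["player", "team", "game_date", "age"]])
print(total_by_age(merged, "AAA", "minutes"))
```

Computing age per game rather than per season keeps players who have a birthday mid-season in the correct age group, and the team and variable arguments correspond to the two user selections described above.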