STAT 1361/2360: Statistical Learning and Data Science
University of Pittsburgh
Data Science Project Details – 2020 Update

The information below supersedes that provided in the original project description (“Data Science Project Details”). Due to the COVID-19 outbreak in the U.S. and the subsequent transition to online-only coursework, a number of changes to the original project guidelines are necessary. The changes below are designed to ensure that students can complete the entire project independently under the new guidelines.

1. Project Proposals (5%): No change. At the time of this writing, project proposals have already been completed.

2. Oral Presentation (5%): Canceled

3. Written Report (15%): The written report is now the only remaining component of the final project. The new guidelines given below consist of only 3 sections rather than the 5 originally given. These reports should now be completed independently. You may, of course, discuss various aspects of your data and models with your other group members via Skype (or some other online meeting software). However, everyone in the class needs to write all parts of their report independently and in their own words, expressing their own views, thoughts, and opinions.

   Each written report must include the following sections:

   1. Introduction: An overview of the problem of interest, details of the specific dataset, and a clear description of the specific questions being addressed. (∼ 0.5 - 1 page)

   2. Methods/Results Overview: Provide a brief summary of all models constructed and how they performed relative to each other.
      This will consist primarily of two parts: (i) a summary of the findings from the models/experiments done in the homework and (ii) an explanation of any and all follow-up analyses you performed after the models were first constructed in order to compare models, evaluate variable importance, etc. Note that you should not generally need to include code here. If code and/or plots are necessary (in your view) to fully explain your findings, you may include them in an appendix at the end of the file. (∼ 1.5 - 2 pages)

   3. Thoughts and Takeaways: After constructing the models and attempting to determine which perform best and which variables are most important, there are a number of things you ought to consider. Please think critically about each of the following issues/prompts and reply to them directly in this section of your report. Note: it is likely easiest to simply copy and paste the prompts below into your report, put them in bold, and then include your thoughts/responses below them. No writing other than your responses to the following prompts is required in this section.

      (a) How many models seemed to perform “best” in terms of predictive accuracy? How did you measure this? Relative to what the models are doing, does it make sense why they would perform similarly well or are they quite different? Do you have a sense of whether such models are actually “significantly” better than others?

      (b) Among the top-performing models, which variables seemed most important? Are they mostly the same between models or are they quite different? Do you have any intuition as to why certain variables might appear more important in some models but not in others? Think about what those variables actually measure, what their general relationship to the response might be, and what kinds of models might do better or worse at picking up different kinds of effects.

      (c) What were the most challenging aspects of working with your particular dataset?
          Were you able to mitigate these issues or do you feel that your final results are less certain as a result of them? Perhaps most importantly, do you really trust your “best” results at the end of the day? If you were in a position where you were personally held liable for any negative outcomes associated with implementing your model, how worried would you be?

      (d) Imagine that you had to present a summary of your findings to a non-technical decision-maker. In other words, this person is going to take some kind of action based on what you report to them but they lack the technical expertise to really “check your work” or even understand how you arrived at any of your conclusions. What would you report to them (i.e. what kind of recommendations would you give)? How would you present it to them? Imagine you had the ability to request more and/or different data. What might you want to request? More observations? More (or different) variables included?

      It’s difficult to set any kind of length requirements here because some of you may have a lot to say for some of these prompts and relatively little to say for others. In general, I’m picturing roughly a page or so for each prompt: the prompt at the top in bold, followed by a few paragraphs with your thoughts. If you have more ideas you want to share, that’s great, but you probably shouldn’t need more than 2 pages maximum for each of these (try to be concise). I’m really just interested in seeing how well you can put together the different pieces of your analysis and recognize potential drawbacks and important issues. Please don’t feel the need to drone on about one simple point for two pages just for appearances; one or two interesting points that are well-summarized in a couple of paragraphs are much better than one point drawn out over a page and a half.
      Really focus on trying to draw connections between the models you’ve built and the results you’ve seen, piece that together with the numerous issues we’ve discussed in class, and give a concise summary of that.

4. Peer Evaluations / Manager Reports (5%): Canceled