DNSC 6279 Projection Final Group Project (35% of Course Grade)

Description: The goal of this project is to apply the techniques learned in class to analyze a real dataset of your interest. You should use the different statistical learning techniques covered in this course to identify the best way to analyze the data and extract useful information. In addition to the methods covered in class, you are welcome to try techniques from outside the class. In that case, you need to explain those techniques clearly in the presentation and report (bonus of 1-5 points). You will also receive bonus points if you work on a dataset related to COVID-19. The project must be done in groups, and each group should consist of 2-3 people.

Requirements:

1. Proposal (5%): A proposal for the project (no more than 1 page). The proposal should include (1) the list of members; (2) a description of the problem of interest; (3) a description of the dataset, including the data source, data size, variable information, etc.; (4) an explanation of the type of machine learning involved and the techniques that will be used; (5) a discussion of the potential challenges and a brief plan for handling them. Due date: Feb 28th.

2. Midway Report (5%): This should resemble a draft of your final report. It should have all the sections your final report will have (although some of them will still be incomplete) and show as much progress as you can. The expectation is that by this point you have completed one-third to one-half of the required work for the project. Due date: March 22nd.

3. Presentation (10%): Group presentations will be scheduled for the last 2 weeks of classes. Each presentation should introduce the problem of interest and the dataset used to answer the questions. It should review the machine learning techniques used and the main idea behind each method, as well as why they were chosen as candidate methods.
   A final summary of the study and its conclusions should be presented, with take-away messages.

4. Final Report (15%): The first page of the report should be a summary of the project. You should also describe what portion of the project each partner completed. The rest of the report should clearly explain the project, covering all elements mentioned in the presentation. R Markdown should be used to produce the final report. The total report should be no more than 10 pages, including the summary page and references. Due date: April 27th.

References on Data Repositories

1. Open Government Data: www.data.gov, www.data.gov.uk, http://opengovernmentdata.org/data/catalogues/
2. UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
3. StatLib: http://lib.stat.cmu.edu/datasets/
4. Kaggle: www.kaggle.com
5. KDnuggets: http://www.kdnuggets.com/datasets/