CS544 Final Project

Picking the Data Set
Look into the following sites as examples and select a data set that interests you.
1. https://www.kaggle.com/datasets
2. https://github.com/fivethirtyeight/data
3. http://www.kdnuggets.com/datasets/index.html
4. Any other source of your choice

Preparing the Data
• Import the data set into R.
• Document the steps of the import process and any preprocessing that had to be done before or after the import. Include any R code used in the process.

Analyzing the Data
• Do the analysis as in Module 3 for at least one categorical variable and at least one numerical variable. Show appropriate plots for your data.
• Do the analysis as in Module 3 for at least one set of two or more variables. Show appropriate plots for your data.
• Pick one variable with numerical data and examine the distribution of the data.
• Draw various random samples of the data and show the applicability of the Central Limit Theorem for this variable.
• Show how various sampling methods can be used on your data. What are your conclusions if these samples are used instead of the whole data set?
• Implement any feature(s) not mentioned in the above specification.

Presenting the Project
• You will present your project to the Professor using Zoom.
• Each presentation is at most 10 minutes long. A signup sheet will be provided later.

Submitting the Project
Upload a zip file (CS544Final_lastName.zip) containing all the code as RMarkdown (an Rmd file), the presentation document (PDF or PPT), and all the results in an RMarkdown HTML file.
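The Central Limit Theorem and sampling-method steps could be approached along the lines of the sketch below. It is only an illustration under stated assumptions, not the required solution: it uses R's built-in `quakes` data set as a stand-in for your own data, picks `depth` as the numerical variable, uses arbitrary sample sizes and an arbitrary seed, and forms strata from hypothetical magnitude bands. It relies on base R only; a dedicated sampling package could be used instead.

```r
# Sketch only: `quakes` (built into R, 1000 rows) stands in for your data set.
set.seed(544)                    # arbitrary seed, for reproducibility
x <- quakes$depth                # numerical variable under study (assumed choice)
N <- length(x)

# --- Central Limit Theorem: distribution of sample means ---
n_draws <- 1000                  # number of samples drawn per sample size
sample_sizes <- c(10, 30, 50)    # assumed sample sizes
clt_means <- lapply(sample_sizes, function(n) {
  replicate(n_draws, mean(sample(x, size = n)))
})
names(clt_means) <- paste0("n", sample_sizes)

# One histogram of sample means per sample size
par(mfrow = c(1, length(sample_sizes)))
for (i in seq_along(sample_sizes)) {
  hist(clt_means[[i]],
       main = paste("Sample size", sample_sizes[i]),
       xlab = "Sample mean of depth")
}
par(mfrow = c(1, 1))

# The mean of the sample means should stay near mean(x),
# while their standard deviation shrinks as the sample size grows.
clt_summary <- data.frame(
  n = sample_sizes,
  mean_of_means = sapply(clt_means, mean),
  sd_of_means = sapply(clt_means, sd)
)
print(clt_summary)

# --- Sampling methods, each targeting about n rows ---
n <- 100

# Simple random sampling without replacement
srs_idx <- sample(N, n)

# Systematic sampling: every k-th row after a random start
k <- floor(N / n)
start <- sample(k, 1)
sys_idx <- seq(start, by = k, length.out = n)

# Stratified sampling, proportional to stratum size.
# The magnitude bands are hypothetical strata chosen for illustration.
band <- cut(quakes$mag, breaks = c(4, 4.5, 5, 6.5),
            right = FALSE, include.lowest = TRUE)
strata_idx <- unlist(lapply(split(seq_len(N), band), function(idx) {
  size <- round(n * length(idx) / N)   # rounding may shift the total slightly
  idx[sample.int(length(idx), size)]
}))

# Compare the sample means with the mean of the whole data set
comparison <- c(
  population = mean(x),
  srs = mean(x[srs_idx]),
  systematic = mean(x[sys_idx]),
  stratified = mean(x[strata_idx])
)
print(round(comparison, 1))
```

To adapt the sketch, replace `quakes`, `depth`, and the magnitude bands with your own data frame, numerical variable, and a categorical variable suitable for stratification, then discuss how close each sample mean comes to the mean of the whole data set.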