PROGRAMMING FOR ANALYTICS

DNSC-4211 Programming for Analytics
Final Examination (Fall 2019)
Date: xx/xx/2019
Time: 2 hours

THIS IS A SAMPLE FINAL EXAM FILE FROM A PREVIOUS SEMESTER
This was an in-classroom examination.

INSTRUCTIONS FOR STUDENTS:
• Duration: 2 hours
• There are five questions (check that all pages are printed; 8 pages in total).
• Answer all questions.
• Save individual files (5 working files in total; read the instructions given with each question).
• Name the folder with your GWid (), zip the folder, and upload it on Blackboard.
• Students may use/access the resources/materials from Blackboard only.
• Assume any missing information and state the assumption explicitly in the code as comments.
• Cell phones, chat, use of Google, and sharing code are not allowed; two exactly identical submissions will each receive a grade of ZERO on the final exam.
• Upload your final files folder using the link available on Blackboard: Upload Finals files folder here:

$ALL THE BEST$

QUESTION 1
1. Creating functions, loops, for-loop and while-loops, nested loops. [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’

Task1: Write a program that accepts 3 integers separated by commas (e.g. 1,3,5) from a user and then finds the average of those numbers without using any built-in function.

Task2: Write a program that accepts one string as input from the user and prints the same string in reverse order. (Note: Don’t use a built-in function. If the input is “apple”, the output should be “elppa”.)

Task3: Create three text files and name them file1.txt, file2.txt and file3.txt. Populate each file with 10 random numbers separated by commas (e.g. file1.txt could contain 4, 7, 10, 30, 2, 1, 20, 15, 8, 3). Keep these files in the same folder as your Python script. Write a program which will perform the following tasks:
a) Accept the name of a text file from the user.
b) Read the numbers from the first text file into a list.
c) Sort the list and print the original as well as the sorted list.
d) Experiment with the other two files as well.

QUESTION 2
2. Data Visualization using Python (matplotlib) [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’

Task1: Read the Salaries data set (Salaries.csv) and create vectors for the variables rank, discipline, phd, service, sex, and salary.
• Part A: Create a bar plot based on service that summarizes the salaries per service category. What information can we extract from the plot?
• Part B: Create a box plot comparing salary and phd; comment on the output plot based on the median and quantiles.
• Part C: Create a pie chart comparing the salary packages of ten professionals.
• Part D: Create a scatter plot that shows the relationship between two factors of an experiment. (You can choose any two factors from the dataset; comment on your output and selection.)

Task2: Create a pie chart of a person’s weekly time spent per activity, based on the following toy dataset:
• days = [1,2,3,4,5]
• sleeping = [7,8,6,11,7]
• eating = [2,3,4,3,2]
• working = [7,8,7,2,2]
• playing = [8,5,7,8,13]
• slices = [39,14,26,41]
• activities = ['sleeping', 'eating', 'working', 'playing']
• cols = ['c','m','r', 'b','g']

QUESTION 3
3. Building Prediction Models [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’

Task1: Read the ‘Marketing_MSA.xlsx’ file.
The Master’s in Business Administration (MBA) was once a flagship program in schools of business in the United States. While elite schools like GWSB, Georgetown, and Chicago are still attracting applicants, other schools are finding it much harder to entice students. As a result, business schools are focusing on specialized master’s programs to give graduates the extra skills necessary to be career ready and successful in more technically challenging fields.
An educational researcher is trying to analyze the determinants of the applicant pool for the specialized Master of Science in Accounting (MSA) program at medium-sized universities in the United States. Two important determinants are the marketing expense of the business school and the percentage of the MSA alumni who were employed within three months after graduation. Consider the data collected on the number of applications received (Applicants), marketing expense (Marketing, in $1,000s), and the percentage employed within three months (Employed).
• Part A: Estimate and interpret the effect of Marketing and Employed on the number of applications received. For a given marketing expense of $80,000, predict the number of applications received if 50% of the graduates were employed within three months. Repeat the analysis with 80% employed within three months.

QUESTION 4
4. Machine Learning using Python [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’

Task1: Perform K-means cluster analysis on ‘Country clusters.csv’
• Part A: Cluster the countries based on “Latitude:”, “Longitude” and “Language:”
• Part B: Add a scatter plot and a dendrogram.
• Part C: Use both the elbow method and hierarchical clustering.

Task2: A national phone carrier conducted a socio-demographic study of their current mobile phone subscribers. Subscribers were asked to fill out survey questions about their current annual salaries (Salary), whether or not they live in a city (City equals 1 if living in a city, 0 otherwise), and socio-demographic information such as marital status (Married equals 1 if married, 0 otherwise), sex (Sex equals 1 if male, 0 otherwise), and whether or not they have completed a college degree (College equals 1 if college degree, 0 otherwise). Read the survey data file “Mobile Phone Sub.xlsx” collected from 196 subscribers.
• Part A: Perform agglomerative hierarchical clustering and interpret the results.
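The agglomerative procedure asked for in Question 4, Task2, Part A can be illustrated with a minimal sketch in plain Python. It is only one possible approach under stated assumptions: the six rows are invented toy values (not taken from the survey file), the columns are min-max scaled so that Salary does not dominate the distance, and average linkage with Euclidean distance is an arbitrary choice. The helper names `min_max_scale`, `average_linkage`, and `agglomerate` are hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical toy rows, NOT taken from the survey file:
# (Salary in $1,000s, City, Married, Sex, College)
subscribers = [
    (30, 0, 0, 1, 0),
    (34, 0, 1, 1, 0),
    (40, 0, 0, 0, 0),
    (80, 1, 1, 1, 1),
    (86, 1, 1, 0, 1),
    (95, 1, 0, 0, 1),
]


def min_max_scale(rows):
    """Rescale every column to [0, 1] so Salary does not dominate the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1 for c in columns]  # avoid division by zero
    return [
        tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of row indices."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Start with one cluster per row and merge the closest pair repeatedly."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (cluster a, cluster b, linkage distance) in merge order
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        merges.append((a, b, average_linkage(a, b, points)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters), merges


clusters, merges = agglomerate(min_max_scale(subscribers), n_clusters=2)
print(clusters)  # row indices grouped into two clusters
for a, b, d in merges:
    print(f"merged {a} and {b} at distance {d:.3f}")
```

The merge history plays the role of a dendrogram read bottom-up: early merges at small distances join similar subscribers, and a large jump in merge distance suggests a natural number of clusters. On the real data, library routines such as `scipy.cluster.hierarchy.linkage` or `sklearn.cluster.AgglomerativeClustering` would normally replace the hand-written loop.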
QUESTION 5
5. Data wrangling using Python / Regular expression [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’

Task1: This set of questions is based on the Summer Olympic Games. These games are held every four years, and the files that you will be using to answer the questions are ‘summer.csv’, ‘countrydata.csv’, and ‘G20.csv’. The variables in each dataset are described below.

The first dataset is ‘summer.csv’:
• Year: Year of the Olympics
• City: City which hosted the Olympics
• Sport: The sport, as in aquatics, athletics, etc.
• Discipline: The discipline in the sport, e.g. Freestyle in the sport of swimming
• Athlete: Name of the athlete who won a medal
• Country: Name of the country represented by the athlete
• Gender: Gender of the athlete (Men or Women)
• Event: Name of the event, e.g. 50M Freestyle in swimming
• Medal: Name of the medal, as in Gold, Silver or Bronze

The next dataset is ‘countrydata.csv’:
• Country: Name of the country
• Code: Three-letter code for the country
• Population: Population of the country
• GDPpc: GDP per capita

The next dataset is ‘G20.csv’:
• Member: Name of the member country
• HDI: Human Development Index. The Human Development Index (HDI) is a composite statistical index of life expectancy, education, and per capita income indicators, which are used to rank countries into four tiers of human development. A country scores a higher HDI when the lifespan is higher, the education level is higher, and the GDP per capita is higher.
• IMFClassification: Classification of the country provided by the International Monetary Fund

Answer the following question based on the above datasets:
• Part A: Plot the total number of gold medals won by the countries placed in the top five based on the Human Development Index.
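The data-wrangling step behind Question 5, Part A can be sketched as follows. This is a rough illustration, not a model answer: all rows are invented placeholders standing in for the three CSV files, the function name `gold_medals_for_top_hdi` is hypothetical, and the sketch assumes that the Country column of ‘summer.csv’ stores three-letter codes that must be mapped to names through ‘countrydata.csv’. If that column already holds country names, the lookup step can be dropped.

```python
# Hypothetical placeholder rows; real rows would be read from 'G20.csv',
# 'countrydata.csv' and 'summer.csv'. Only the columns needed here are shown.
g20 = [
    {"Member": "Alpha", "HDI": 0.95},
    {"Member": "Beta", "HDI": 0.91},
    {"Member": "Gamma", "HDI": 0.93},
    {"Member": "Delta", "HDI": 0.70},
    {"Member": "Epsilon", "HDI": 0.88},
    {"Member": "Zeta", "HDI": 0.90},
]
countrydata = [
    {"Country": "Alpha", "Code": "ALP"},
    {"Country": "Beta", "Code": "BET"},
    {"Country": "Gamma", "Code": "GAM"},
    {"Country": "Delta", "Code": "DEL"},
    {"Country": "Epsilon", "Code": "EPS"},
    {"Country": "Zeta", "Code": "ZET"},
]
summer = [
    {"Country": "ALP", "Medal": "Gold"},
    {"Country": "ALP", "Medal": "Gold"},
    {"Country": "ALP", "Medal": "Silver"},
    {"Country": "GAM", "Medal": "Gold"},
    {"Country": "BET", "Medal": "Bronze"},
    {"Country": "DEL", "Medal": "Gold"},
    {"Country": "DEL", "Medal": "Gold"},
    {"Country": "ZET", "Medal": "Gold"},
    {"Country": "EPS", "Medal": "Gold"},
    {"Country": "EPS", "Medal": "Gold"},
    {"Country": "EPS", "Medal": "Gold"},
]


def gold_medals_for_top_hdi(g20_rows, country_rows, medal_rows, top_n=5):
    """Count gold medals for the top_n members ranked by HDI (highest first)."""
    ranked = sorted(g20_rows, key=lambda row: row["HDI"], reverse=True)
    top_members = [row["Member"] for row in ranked[:top_n]]

    # Assumption: medal rows identify countries by three-letter code.
    code_to_name = {row["Code"]: row["Country"] for row in country_rows}

    golds = {name: 0 for name in top_members}  # keeps HDI order, zeros included
    for row in medal_rows:
        name = code_to_name.get(row["Country"])
        if row["Medal"] == "Gold" and name in golds:
            golds[name] += 1
    return golds


gold_counts = gold_medals_for_top_hdi(g20, countrydata, summer)
print(gold_counts)
```

The resulting dictionary maps each top-five member to its gold-medal total, so its keys and values can be passed directly to a bar chart, for example `plt.bar(list(gold_counts), list(gold_counts.values()))` with matplotlib. Members with no gold medals are kept with a count of zero so that they still appear on the plot.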