PROGRAMMING FOR ANALYTICS
DNSC-4211 Programming for Analytics







Final Examination (Fall 2019)
Date: xx/xx/2019 Time: 2 hours
THIS IS A SAMPLE FINAL EXAM FILE FROM A PREVIOUS SEMESTER
This was an in-classroom examination.

INSTRUCTIONS FOR STUDENTS:

• Duration: 2 hours
• There are five questions (check that all pages are printed; 8 pages in total).
• Answer all questions.
• Save individual files (5 working files in total; read the instructions given with each
question).
• Name the folder with your GWid (), zip the folder, and upload it to Blackboard.
• Students may use/access only the resources/materials available on Blackboard.
• Assume any missing information and state it explicitly in your code as comments.
• Cell phones, chat, use of Google, and sharing code are not allowed; two identical
submissions will receive a grade of ZERO on the final exam.
• Upload your final files folder using the link available on Blackboard: Upload Finals
files folder here:
$ALL THE BEST$
QUESTION 1






1. Creating functions, loops, for-loop and while-loops, nested loops. [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’

Task1: Write a program to accept 3 integers separated by commas (e.g. 1,3,5) from a user and
then find the average of those numbers without using any built-in function.
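One possible reading of Task1 is sketched below. It assumes "without using any built-in" rules out helpers such as sum() and len(), while text parsing with split() and int() is still allowed. The function name average_of_three is illustrative only.

```python
def average_of_three(text):
    """Return the average of comma-separated integers such as "1,3,5"."""
    total = 0
    count = 0
    for part in text.split(","):
        total += int(part.strip())  # int() is used only to parse the text
        count += 1
    return total / count


# Interactive use in a notebook cell (commented out so the cell runs unattended):
# print(average_of_three(input("Enter 3 integers separated by commas: ")))
```

The running total and counter replace sum() and len(), which is the part of the task the restriction most plausibly targets.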

Task2: Write a program to accept one string as input from the user and print the same string in
reverse order. (Note: Don’t use a built-in function. If the input is “apple”, the output should be “elppa”.)
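A minimal sketch for Task2, assuming the restriction excludes reversed() and slice tricks such as text[::-1]. The name reverse_text is hypothetical.

```python
def reverse_text(text):
    """Build the reversed string one character at a time."""
    reversed_text = ""
    for ch in text:
        # Prepending each character puts the last one read at the front
        reversed_text = ch + reversed_text
    return reversed_text


# Interactive use (commented out so the cell runs unattended):
# print(reverse_text(input("Enter a string: ")))
```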
Task3: Create three text files and name them file1.txt, file2.txt and file3.txt. Populate each file
with 10 random numbers separated by commas. Keep these files in the same folder as your
Python script (e.g. file1.txt could contain 4, 7, 10, 30, 2, 1, 20, 15, 8, 3). Write a program which
will perform the following tasks:
a) Accept the name of a text file from the user.
b) Read the numbers from the first text file into a list.
c) Sort the list and print the original as well as the sorted list.
d) Experiment with the other two files as well.
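Steps a) to c) could be sketched as follows, assuming each file holds integers separated by commas as in the example. The helper names read_numbers and report_sorted are illustrative, not part of the question.

```python
def read_numbers(filename):
    """Read comma-separated integers from a text file into a list."""
    with open(filename) as handle:
        content = handle.read()
    # int() tolerates the surrounding spaces and the trailing newline
    return [int(piece) for piece in content.split(",") if piece.strip()]


def report_sorted(filename):
    """Print and return the original list and a sorted copy."""
    original = read_numbers(filename)
    ordered = sorted(original)  # sorted() leaves the original list untouched
    print("Original:", original)
    print("Sorted:  ", ordered)
    return original, ordered


# Step a), commented out so the cell runs unattended:
# report_sorted(input("File name: "))
```

Calling report_sorted with file2.txt and file3.txt covers step d).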

QUESTION 2
2. Data Visualization using Python (matplotlib) [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’
Task1: Read the Salaries data set (Salaries.csv) and create vectors for the variables
rank, discipline, phd, service, sex, and salary.
• Part A: Create a bar plot based on service that summarizes the salaries per service
category. What information can we extract from the plot?
• Part B: Create a box plot comparing salary and phd; comment on the output plot based on
the median and quantiles.
• Part C: Create a pie chart comparing the salary packages of ten professionals.
• Part D: Create a scatter plot that shows the relationship between two factors of an
experiment (you can assume any two factors from the dataset; comment on your output and
selection).
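A rough sketch of Part A and Part D, assuming Salaries.csv has columns named exactly service, phd and salary (the real file may use different names) and that the mean is an acceptable summary of salary per service value. The function names are hypothetical.

```python
import pandas as pd


def mean_salary_by_service(salaries):
    """Average salary for each distinct value of the service column."""
    return salaries.groupby("service")["salary"].mean().sort_index()


def plot_service_and_scatter(salaries):
    """Draw a Part A bar plot and a Part D scatter plot side by side."""
    import matplotlib.pyplot as plt  # imported lazily; only plotting needs it

    summary = mean_salary_by_service(salaries)
    fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(12, 4))
    ax_bar.bar(summary.index.astype(str), summary.values)
    ax_bar.set_xlabel("service")
    ax_bar.set_ylabel("mean salary")
    # phd vs salary is one arbitrary choice of "two factors" for Part D
    ax_scatter.scatter(salaries["phd"], salaries["salary"], alpha=0.6)
    ax_scatter.set_xlabel("phd")
    ax_scatter.set_ylabel("salary")
    fig.tight_layout()
    return fig


# salaries = pd.read_csv("Salaries.csv")
# plot_service_and_scatter(salaries)
```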
Task2: Create a pie chart of a person’s weekly time spent per activity, based on the
following toy dataset:
• days = [1,2,3,4,5]
• sleeping = [7,8,6,11,7]
• eating = [2,3,4,3,2]
• working = [7,8,7,2,2]
• playing = [8,5,7,8,13]
• slices = [39,14,26,41]
• activities = ['sleeping', 'eating', 'working', 'playing']
• cols = ['c','m','r', 'b','g']
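The given slices are the weekly totals of each activity list, which the sketch below recomputes before plotting. The function name plot_weekly_time and the chart title are assumptions. Only the first four colours of cols are used, since there are four slices.

```python
sleeping = [7, 8, 6, 11, 7]
eating = [2, 3, 4, 3, 2]
working = [7, 8, 7, 2, 2]
playing = [8, 5, 7, 8, 13]
activities = ['sleeping', 'eating', 'working', 'playing']
cols = ['c', 'm', 'r', 'b', 'g']

# Total hours per activity over the five days
slices = [sum(hours) for hours in (sleeping, eating, working, playing)]


def plot_weekly_time(slices, activities, colors):
    """Pie chart with one wedge per activity."""
    import matplotlib.pyplot as plt  # imported lazily; only plotting needs it

    fig, ax = plt.subplots()
    ax.pie(slices, labels=activities, colors=colors[:len(slices)],
           autopct="%1.1f%%", startangle=90)
    ax.set_title("Weekly time spent per activity")
    return fig


# plot_weekly_time(slices, activities, cols)
```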
QUESTION 3
3. Building Prediction Models [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’
Task1: Read the ‘Marketing_MSA.xlsx’ file. The Master’s in Business Administration (MBA)
was once a flagship program of business schools in the United States. While elite schools like
GWSB, Georgetown, and Chicago are still attracting applicants, other schools are finding it
much harder to entice students. As a result, business schools are focusing on specialized
master’s programs to give graduates the extra skills necessary to be career ready and successful
in more technically challenging fields. An educational researcher is trying to analyze the
determinants of the applicant pool for the specialized Master of Science in Accounting (MSA)
program at medium-sized universities in the United States. Two important determinants are the
marketing expense of the business school and the percentage of the MSA alumni who were
employed within three months after graduation. Consider the data collected on the number of
applications received (Applicants), marketing expense (Marketing, in $1,000s), and the
percentage employed within three months (Employed).
• Part A: Estimate and interpret the effect of Marketing and Employed on the number of
applications received. For a given marketing expense of $80,000, predict the number of
applications received if 50% of the graduates were employed within three months.
Repeat the analysis with 80% employed within three months.
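Part A could be approached with an ordinary least squares fit, sketched here with NumPy. The sketch assumes the spreadsheet columns are named Applicants, Marketing and Employed as in the text, that Marketing is recorded in $1,000s (so $80,000 is entered as 80), and that Employed is stored as a percentage (50, not 0.5). The function names are hypothetical.

```python
import numpy as np


def fit_linear_model(marketing, employed, applicants):
    """Least squares fit of Applicants on Marketing and Employed.

    Returns (intercept, b_marketing, b_employed).
    """
    marketing = np.asarray(marketing, dtype=float)
    employed = np.asarray(employed, dtype=float)
    design = np.column_stack([np.ones_like(marketing), marketing, employed])
    target = np.asarray(applicants, dtype=float)
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return tuple(coefs)


def predict_applicants(coefs, marketing, employed):
    """Predicted number of applications for one (marketing, employed) pair."""
    intercept, b_marketing, b_employed = coefs
    return intercept + b_marketing * marketing + b_employed * employed


# import pandas as pd
# data = pd.read_excel("Marketing_MSA.xlsx")
# coefs = fit_linear_model(data["Marketing"], data["Employed"], data["Applicants"])
# for pct in (50, 80):
#     print(pct, predict_applicants(coefs, 80, pct))  # Marketing is in $1,000s
```

Each slope is read as the expected change in applications for a one-unit increase in that predictor with the other held fixed. The fit returns no standard errors or p-values, so a statistics package would be needed to judge significance.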







QUESTION 4




4. Machine Learning using Python [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’
Task1: Perform K-means cluster analysis on ‘Country clusters.csv’
• Part A: Cluster the countries based on “Latitude”, “Longitude” and “Language”
• Part B: Add a scatter plot and a dendrogram
• Part C: Use both the elbow method and hierarchical clustering
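The K-means and elbow parts of Task1 might look like the sketch below, which uses scikit-learn's KMeans. It assumes the CSV columns are named Latitude, Longitude and Language, and that Language is categorical and has to be mapped to integer codes first. The LanguageCode column and both function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(features, max_k=6, seed=0):
    """Within-cluster sum of squares (inertia) for k = 1 .. max_k."""
    inertias = []
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features)
        inertias.append(model.inertia_)
    return inertias


def cluster_labels(features, k, seed=0):
    """Cluster index assigned to each row for a chosen k."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)


# import pandas as pd
# data = pd.read_csv("Country clusters.csv")
# data["LanguageCode"] = data["Language"].astype("category").cat.codes
# features = data[["Latitude", "Longitude", "LanguageCode"]].to_numpy(dtype=float)
# print(elbow_inertias(features, max_k=min(6, len(features))))
```

Plotting the returned inertias against k and looking for the bend gives the elbow choice. The labels from cluster_labels can then colour a Longitude vs Latitude scatter plot.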

Task2: A national phone carrier conducted a socio-demographic study of their current mobile
phone subscribers. Subscribers were asked to fill out survey questions about their current
annual salaries (Salary), whether or not they live in a city (City equals 1 if living in a city, 0
otherwise), and socio-demographic information such as marital status (Married equals 1 if
married, 0 otherwise), sex (Sex equals 1 if male, 0 otherwise), and whether or not they have
completed a college degree (College equals 1 if college degree, 0 otherwise). Read the survey
data file “Mobile Phone Sub. xlsx” collected from 196 subscribers.

• Part A: Perform agglomerative hierarchical clustering and interpret the results.
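A possible starting point for Part A, using SciPy's hierarchical clustering. Ward linkage, standardizing the columns, and cutting the tree into three clusters are all assumptions made for illustration, and the function names are hypothetical. The column names follow the variable names given in the question.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def standardize(features):
    """Scale each column to zero mean and unit standard deviation."""
    features = np.asarray(features, dtype=float)
    spread = features.std(axis=0)
    spread[spread == 0] = 1.0  # leave constant columns unscaled
    return (features - features.mean(axis=0)) / spread


def agglomerative_labels(features, n_clusters):
    """Ward-linkage agglomerative clustering cut into n_clusters groups."""
    tree = linkage(standardize(features), method="ward")
    return tree, fcluster(tree, t=n_clusters, criterion="maxclust")


# import pandas as pd
# import matplotlib.pyplot as plt
# from scipy.cluster.hierarchy import dendrogram
# survey = pd.read_excel("Mobile Phone Sub. xlsx")  # file name copied from the question
# columns = ["Salary", "City", "Married", "Sex", "College"]
# tree, labels = agglomerative_labels(survey[columns], n_clusters=3)
# dendrogram(tree); plt.show()
# print(survey[columns].groupby(labels).mean())  # cluster profiles for interpretation
```

Standardizing keeps Salary from dominating the 0/1 indicators. The per-cluster means in the last commented line are one way to interpret the resulting groups.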
QUESTION 5







5. Data wrangling using Python / Regular expressions [20 points]
In a single Jupyter notebook file, name the file as ‘answer1.ipynb’
Task1: This set of questions is based on the Summer Olympic Games, which are held every
four years. The files that you will be using to answer the questions are ‘summer.csv’,
‘countrydata.csv’, and ‘G20.csv’. The variables in each dataset are described below.
The first set of data is based on ‘summer.csv’
• Year: Year of the Olympics
• City: City which hosted the Olympics
• Sport: The sport as in aquatics, athletics etc.
• Discipline: The discipline in the sport, e.g. Freestyle in the sport of swimming
• Athlete: Name of athlete who won a medal
• Country: Name of country represented by the athlete
• Gender: Gender of the athlete (Men or Women)
• Event: Name of event, e.g. 50M Freestyle in swimming
• Medal: Name of medal as in Gold, Silver or Bronze
The next set of data is based on ‘countrydata.csv’
• Country: Name of country
• Code: Three letter code for the country
• Population: Population of the country
• GDPpc: GDP per Capita

The next set of data is based on ‘G20.csv’
• Member: Name of member country
• HDI: Human Development Index: The Human Development Index (HDI) is a statistical
composite index of life expectancy, education, and per capita income indicators, which
are used to rank countries into four tiers of human development. A country scores a
higher HDI when the lifespan is higher, the education level is higher, and the GDP per
capita is higher.
• IMFClassification: Classification of the country provided by the International
Monetary Fund

Answer the following question based on the above datasets:
• Part A: Plot the total number of gold medals won by countries placed in the top five
based on the human development index.
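One way to approach Part A with pandas is sketched below. It assumes the Country values in ‘summer.csv’ match the Member names in ‘G20.csv’. If ‘summer.csv’ stores three-letter codes instead, the Code and Country columns of ‘countrydata.csv’ would be needed to translate them first. Every medal row is counted, so a team event contributes one gold per athlete. The function name is hypothetical.

```python
import pandas as pd


def gold_medals_for_top_hdi(summer, g20, top_n=5):
    """Gold-medal counts for the top_n G20 members ranked by HDI."""
    top_members = g20.nlargest(top_n, "HDI")["Member"].tolist()
    golds = summer[summer["Medal"] == "Gold"]
    counts = golds["Country"].value_counts()
    # Members with no gold medal in the data get a count of 0
    return counts.reindex(top_members, fill_value=0)


# summer = pd.read_csv("summer.csv")
# g20 = pd.read_csv("G20.csv")
# gold_medals_for_top_hdi(summer, g20).plot(kind="bar", title="Gold medals, top five by HDI")
```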
