MGT458 Final Examination 2018 Fall Term

For Instructor Use:

    Question   Points   Score
    1          34
    2          53
    3          8
    Total:     95

    Page       Bonus Points   Score
    13         4
    Total:     4

1. Fall, Winter, Spring, Summary?! [45 minutes] (34 points)

When you give answers to the following questions, be very precise.

(a) (4 points) A data analyst claims to have used some standard methods with default values provided by packages such as pandas to create the following summary table. Does this seem believable?

    Ages     Percent Frequency
    9-13     10
    14-19    26
    20-24    22
    25-28    15
    29-32    17

(b) (3 points) The following data represent the net worth (in millions of dollars) of 45 national corporations.

    Class limits   Frequency
    10-20          2
    21-31          8
    32-42          15
    43-53          7
    54-64          10
    65-75          3

What Python command could be used to create a histogram that presents these data? Assume that you have access to the original data set with 45 observations.

(c) Are the following examples classification or regression/estimation problems? Circle the correct answer.

    i. (1 point) Predict the ages of your customers.
       A. Classification   B. Regression/Estimation
    ii. (1 point) Predict the marital status of your customers.
       A. Classification   B. Regression/Estimation
    iii. (1 point) Predict the time a customer spends browsing your website.
       A. Classification   B. Regression/Estimation

(d) Are the following examples supervised or unsupervised machine learning problems? Circle the correct answer.

    i. (1 point) Grouping similar customers of a retail company for targeted advertisements.
       A. Supervised   B. Unsupervised
    ii. (1 point) A Wall Street analyst has been asked to find out the expected change in stock price for a set of companies with similar price/earnings ratios.
       A. Supervised   B. Unsupervised
    iii. (1 point) Predict whether or not a customer will leave the company. You have access to previous customers’ labeled data.
       A. Supervised   B. Unsupervised

(e) (4 points) How can you find outliers for a variable in a given data set? Is it always necessary to remove outliers from the data before performing any further analysis? Explain why (or why not).

(f) (4 points) An e-commerce company wants you to calculate the average number of sales per day. Data on one year of sales are given. There are a few festive days on which sales are enormously high compared to all other days in the year. Which measure (mean, median, or mode) would you use in this case to get a good estimate of the average sales on a regular day? Explain why you would use that measure.

(g) (4 points) Indicate the mean, median, and mode in the chart below. What can you say about the skewness of the distribution?

(h) (3 points) Explain why a birthdate variable would be preferred to an age variable in a database.

(i) (3 points) Can you think of any reasons why, as a strategy for dealing with missing data, it might not be advisable to simply omit the records or fields with missing values from the analysis?

(j) (3 points) Data visualization is a very important tool for exploring the data before performing any further analyses. Individual variables can be explored using histograms or bar charts, for example. When exploring relationships between two quantitative variables, scatter plots are the most commonly used tool. Draw a scatter plot that uncovers an outlier that would be invisible from one-dimensional data exploration of the two individual variables.

2. Credit Cards [110 minutes] (53 points)

Suppose you work as a digital marketing analyst for a large bank. Your project for today is to perform a marketing analysis of UTM staff members who use special purchasing cards (P-cards) on campus.
P-cards are business credit cards that some employees are permitted to use to purchase necessary goods and services. If employees agree to certain rules, they can use a P-card to make appropriate business purchases rather than using their own credit card. This allows employees to avoid spending personal funds and then seeking reimbursement. You have been assigned the task of studying all of UTM’s P-card transactions for 2017. To perform this task, you received a CSV file of all P-card transactions for the entire province of Ontario (the province collects all transactions for provincial and higher education institutions) containing the last five years of transactions.

(a) (1 point) After loading the CSV file, you want to inspect the types of the variables. State a suitable Python command.

(b) (3 points) The resulting output is shown below. Comment on this output.

    AgencyNumber                 int64
    AgencyName                   object
    CardholderLastName           object
    CardholderFirstInitial       object
    Description                  object
    Amount                       object
    Vendor                       object
    TransactionDate              object
    PostedDate                   object
    MerchantCategoryCode(MCC)    object

Here are two sample lines of the CSV file. Note: each line is printed across three lines here to fit it onto the page.

    "1000","UNIV OF TORONTO MISSISSAUGA","Edwards","M","GENERAL PURCHASE",
    "$1,710.00","STYLEBOOK INC","23/7/2014 0:00","24/7/2014 0:00",
    "WOMEN’S READY-TO-WEAR STORES"

    "98000","RIVER DAM AUTH.","McGuire","D","GENERAL PURCHASE",
    "($9.73)","WALKER’S HARDWARE","11/12/2017 0:00","12/12/2017 0:00",
    "HARDWARE STORES"

For all the following parts of this question, write Python code. You can use suitable packages such as numpy, pandas, matplotlib, and so forth, and you may assume that they have been imported accordingly.

(c) (4 points) Ensure that the dates are recognized correctly. Transform the respective column(s).

(d) (4 points) Ensure that dollar amounts are recognized correctly.
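The date and amount formats in the two sample lines can be handled in several ways. The sketch below shows one possible approach, not a definitive solution. It assumes the following:

- The real file has a header row with the column names from the dtypes listing.
- Dates are always day/month/year, as in 23/7/2014 0:00.
- Parentheses follow the accounting convention for a negative amount, such as a refund.

The helper names parse_amount and load_pcards are hypothetical and chosen only for illustration.

```python
import io

import pandas as pd

# The two sample rows shown above, preceded by a header row built from the
# column names in the dtypes listing (assumed to match the real file).
SAMPLE_CSV = '''AgencyNumber,AgencyName,CardholderLastName,CardholderFirstInitial,Description,Amount,Vendor,TransactionDate,PostedDate,MerchantCategoryCode(MCC)
"1000","UNIV OF TORONTO MISSISSAUGA","Edwards","M","GENERAL PURCHASE","$1,710.00","STYLEBOOK INC","23/7/2014 0:00","24/7/2014 0:00","WOMEN'S READY-TO-WEAR STORES"
"98000","RIVER DAM AUTH.","McGuire","D","GENERAL PURCHASE","($9.73)","WALKER'S HARDWARE","11/12/2017 0:00","12/12/2017 0:00","HARDWARE STORES"
'''

DATE_COLUMNS = ["TransactionDate", "PostedDate"]
DATE_FORMAT = "%d/%m/%Y %H:%M"  # assumed day-first, e.g. 23/7/2014 0:00


def parse_amount(text: str) -> float:
    """Convert '$1,710.00' -> 1710.0 and '($9.73)' -> -9.73.

    Assumption: parentheses mark a negative amount (accounting convention).
    """
    text = text.strip()
    negative = text.startswith("(") and text.endswith(")")
    digits = text.strip("()").replace("$", "").replace(",", "")
    value = float(digits)
    return -value if negative else value


def load_pcards(source) -> pd.DataFrame:
    """Hypothetical helper: read the CSV and convert date and amount columns."""
    df = pd.read_csv(source)
    for col in DATE_COLUMNS:
        # An explicit format avoids reading 11/12/2017 as November 12.
        df[col] = pd.to_datetime(df[col], format=DATE_FORMAT)
    df["Amount"] = df["Amount"].map(parse_amount)
    return df


df = load_pcards(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
print(df[["CardholderLastName", "Amount", "TransactionDate"]])
```

Using Series.map with a small named function keeps the parsing rule readable. A vectorized alternative would chain Series.str.replace calls on the Amount column.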
(e) (2 points) Ensure that we are only considering transactions from 2017.

(f) (2 points) After that, ensure that we are only considering transactions from UTM.

(g) (5 points) Find all the employees who spent more than $3,000 in a single transaction. Display all transaction details sorted by the transaction amount in decreasing order.

(h) (5 points) Visualize the findings of the previous analysis in a suitable graph.

(i) (6 points) Display the name and total amount spent during the year for all employees who spent more than $30,000 in 2017. Sort by the total amount spent, with the larger amounts listed first.

(j) (6 points) Display the name, the month, and the total amount spent during that month for all employees who spent more than $10,000 per month in 2014. Sort by month (January listed first) and then by the total amount spent, with the larger amounts listed first.

(k) (10 points) Did any of the employees split an amount of more than $3,000 between two or more swipes of the card by the same person? Display all transaction details where the vendor and purchaser are the same on a specific day, there is more than one transaction for that day, and the combined total of the transactions was more than $3,000. Sort them in ascending order by TransactionDate.

(l) (5 points) Continuing from the previous part, count how often each individual purchaser paid amounts over $3,000 in more than one transaction. Sort the result in descending order by the count.

3. Churn Clusters [25 minutes] (8 points)

Churn, also called attrition, is a term used to indicate a customer leaving the service of one company in favor of another company. We are investigating a data set with nine variables that contains 3333 customers. The variables of interest are:

    1. Account length: integer-valued; how long the account has been active.
    2. International plan: binary categorical, yes or no.
    3. Voice mail plan: binary categorical, yes or no.
    4. Total day minutes: continuous; minutes the customer used the service during the day.
    5. Total eve minutes: continuous; minutes the customer used the service during the evening.
    6. Total night minutes: continuous; minutes the customer used the service during the night.
    7. Total international minutes: continuous; minutes the customer used the service to make international calls.
    8. Number of calls to customer service: integer-valued.
    9. Churn: target; indicator of whether the customer has left the company (true or false).

Assume we used a clustering algorithm to cluster the data into three clusters. Each of the following figures shows a statistic for all 3333 records, followed by the same statistic for each of the three clusters.

(a) (2 points) Briefly describe the members of the three clusters with respect to International Plan adoption.

(b) (2 points) Briefly describe the members of the three clusters with respect to Voice Mail Plan adoption.

In class we discussed that standardizing the data might help improve the performance of certain machine learning algorithms. Instead of being standardized, the attributes can also be normalized, after which their values are restricted to the range from 0 to 1. Let MaxX be the maximum value of a particular attribute X and let MinX be the minimum value of that attribute X. Then the normalized value n(x) of a particular value x is calculated as

$$n(x) = \frac{x - \mathit{MinX}}{\mathit{MaxX} - \mathit{MinX}}$$

For example, if MaxX = 50 and MinX = 10, then for x = 20 we get

$$n(20) = \frac{20 - 10}{50 - 10} = \frac{10}{40} = 0.25$$

(c) (2 points) If MaxX = 100 and MinX = 40, then for x = 60 we get n(x) = ?

The following table presents the normalized scores of the numerical variables described at the beginning of the question.
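The min-max normalization rule n(x) defined above can be sketched in Python as follows. It reproduces the worked example n(20) = 0.25.

- The function names are hypothetical.
- The column version assumes pandas and uses the column's own observed minimum and maximum as MinX and MaxX.

```python
import pandas as pd


def normalize_value(x: float, min_x: float, max_x: float) -> float:
    """Min-max normalization: n(x) = (x - MinX) / (MaxX - MinX)."""
    if max_x == min_x:
        raise ValueError("MaxX and MinX must differ to avoid division by zero")
    return (x - min_x) / (max_x - min_x)


def normalize_column(values: pd.Series) -> pd.Series:
    """Apply the same rule to a whole attribute.

    Assumption: MinX and MaxX are the observed min and max of this column.
    """
    min_x, max_x = values.min(), values.max()
    if max_x == min_x:
        raise ValueError("a constant column cannot be min-max normalized")
    return (values - min_x) / (max_x - min_x)


# Worked example from the text: MaxX = 50, MinX = 10, x = 20.
print(normalize_value(20, min_x=10, max_x=50))  # 0.25
print(normalize_column(pd.Series([10, 20, 50])).tolist())  # [0.0, 0.25, 1.0]
```

Standardization centres on the mean and scales by the standard deviation. This rule instead maps the observed minimum to 0 and the maximum to 1. As a result, a single extreme value can compress all other values into a narrow part of the range.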
The values below are the normalized means of the three clusters.

    Cluster   Count   AcctLength   DayMins   EveMins   NightMins   IntlMins   CustServCalls
    1            92        0.434     0.536    0.5669      0.4764     0.5468          0.1630
    2          2411        0.413     0.513    0.5507      0.4774     0.5120          0.1753
    3           830        0.412     0.509    0.5564      0.4795     0.5077          0.1701

(d) (2 points) Briefly comment on this information for the three means.

The following part is a bonus/challenge question. You are not required to solve it, but you will receive a bonus towards the examination mark for solving it.

(e) (4 points (bonus)) Give a detailed description of the cluster members in the three different clusters.

SCRAP SHEET

This page will NOT be marked, but you must submit this page with your exam paper.

End of exam.