AD654: Marketing Analytics
Boston University
Assignment IV: Time Series and AB Testing

Once you have completed this assignment, you will upload two files into Blackboard: The .ipynb file that you create in Jupyter Notebook, and an .html file that was generated from your .ipynb file. If you run into any trouble with submitting the .html file to Blackboard, you can submit it as a PDF instead.

For any question that asks you to perform some particular task, you just need to show your input and output in Jupyter Notebook. Tasks will always be written in regular, non-italicized font.

For any question that asks you to include interpretation, write your answer in a Markdown cell in Jupyter Notebook. Any homework question that needs interpretation will be written in italicized font. Do not simply write your answer in a code cell as a comment, but use a Markdown cell instead.

Remember to be resourceful! There are many helpful resources available to you, including the video library, the class slides, the recitation sessions, the Zoom office hours sessions, and the web.

Part I: Working with Time Series Data

A. Pick any publicly-traded company that trades on the Nasdaq or the NYSE.
   a. What company did you select, and what is its ticker symbol?
B. Go to Yahoo! Finance: finance.yahoo.com. Enter your company’s ticker symbol in the search bar near the top of your screen. Next, click on “Historical Data” and then “Download.” This will automatically download a .csv with one year’s worth of the company’s data onto your computer.
C. Bring the dataset into your environment. For this step, bring the dataset into your environment in the same way we have done throughout the semester -- just use read_csv() from pandas, passing the name of the file into the function.
   a. Use the head() function to explore the variables.
   b. Next, call the info() function on your dataset.
D. Is this dataframe indexed by time values? How do you know this?
E. If you answered no to the previous question, you will need to tell Python that this data is actually a time series. Convert it to a time series now -- do this without reading the entire file back into your environment.
F. In your Jupyter Notebook, view the index attribute of your time series.
   a. Now, view the max and min value of your index attribute.
   b. Now, view the argmax and argmin values of your index attribute.
   c. What do the results of max, min, argmax, and argmin represent?
G. Let’s visualize the entire time series.
   a. Create a line plot that depicts all of the movement of your ‘Close’ variable for your stock.
   b. Now, add a horizontal, dashed line that spans the entire length of your graph. The height of this line should represent the mean ‘Close’ value from your dataset. Color this line with any color that you like (you might even want to try a hexadecimal value!)
   c. As we all know, 2020 has been a pretty crazy year -- and for stock market investors, it has sometimes felt like a ride on a roller coaster at Lobster Land. Use shading to show the contrast between February, March, April, May, and June of 2020. Shade each of these months in a slightly different way on the graph.
      i. In a few sentences, how did your stock perform across this five-month span? You don’t need to do any outside research or analysis to answer this -- just describe what your graph is showing.
H. Let’s visualize some Simple Moving Averages. Show 5, 10, and 20-day Simple Moving averages of the ‘Close’ variable for your company’s data in three separate line graphs. Each time, include the daily closing price for your company overlaid on your graph.
   a. How did the three simple moving average plots compare to one another? How are they similar, and how are they different?
   b. What are some pros and cons of using simple moving averages? What about the pros and cons of using shorter or longer k-values in a moving average?
I. Next, we will try something called resampling.
   a. Resample your time series so that its values are based on some different unit of time (larger than daily).
      i. Plot this newly-resampled time series.
      ii. Provide an example that explains why someone might care about resampling. To answer this, you may use ANY example that you can think of, or discover, from any field that uses time series data (health, weather, market forecasting, etc.) You don’t need to perform any outside research or go too deeply into domain knowledge here -- 3-4 thoughtful sentences are all you need.

Part II: Using a Statistical Test to Analyze Data

This summer, Lobster Land introduced a new game of chance called “Giant Dice.” Any visitor to the park can play this game for $5. After paying the money, the person is allowed to choose any number from 1 through 6. The visitor can then either roll, or throw, one gigantic wooden dice (as shown above). If the dice comes up with the same number that the player chose before throwing it, the player will receive a Lobster Land t-shirt.

An angry park visitor has decided to sue Lobster Land, because he spent $80 playing this game but did not win a t-shirt. After 16 rolls of the dice, for which he chose the number “4” each time, he failed to win on any occasion. He is claiming that Lobster Land must have manipulated the dice so that the “4” result would not come up. Even though his legal fees will greatly exceed $80, he has announced that he will sue Lobster Land in a court of law to recover his losses.

Lobster Land is hoping that you can save the day here! They hope that you can use some of your analytics skills to help them out and stop this lawsuit before it goes any further.

A. Lobster Land built its dice so that they will perform in a similar manner as the randint() function from the random module in Python. Using Python, simulate 120 rolls of a dice, being sure to use random.randint() to generate the values. If this were a completely fair dice, how many instances of 1, 2, 3, 4, 5, and 6 would you expect to result from this simulation? How many of each outcome did you actually get?
B. Using an appropriate statistical test, determine whether your results support the claim made by the angry park visitor. What is the null hypothesis? Based on the evidence here, will you reject or fail to reject the null hypothesis?
C. Run the simulated dice roll again, but this time, use 1200 rolls, rather than 120. Using these newly-obtained values, run your statistical test again. How did your test statistic change? How will you interpret these results?
D. Run the simulated dice roll another time, but this time, use 12,000 rolls, rather than 1200. Run the statistical test yet again. How did your test statistic change? How will you interpret these results?
E. What general trend did you notice as you increased the number of dice rolls in the simulation? Why do you think this is the case? To answer this, you don’t need to cite any formal statistics rules or formulas -- you can answer this in your own words, in a couple of sentences.
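
One possible way to set up the Part II simulation is sketched below. It is only an illustration, not the required solution. It assumes a chi-square goodness-of-fit test is the "appropriate statistical test" (the assignment does not name one), computes the statistic by hand with the standard library, and compares it against 11.070, the chi-square critical value for 5 degrees of freedom at a 0.05 significance level. The function names, the constant name, and the seed value 654 are hypothetical choices made for this sketch.

```python
import random
from collections import Counter

# Chi-square critical value for df = 5 (six faces minus one) at alpha = 0.05.
# The 0.05 significance level is an assumption; the assignment does not fix one.
CHI2_CRITICAL_DF5_ALPHA05 = 11.070


def simulate_rolls(n_rolls, seed=None):
    """Roll a six-sided dice n_rolls times with random.randint; return counts for faces 1-6."""
    if seed is not None:
        random.seed(seed)  # optional, only to make a run repeatable
    tally = Counter(random.randint(1, 6) for _ in range(n_rolls))
    return [tally.get(face, 0) for face in range(1, 7)]


def chi_square_statistic(observed):
    """Goodness-of-fit statistic against equal expected counts for every face."""
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


if __name__ == "__main__":
    for n_rolls in (120, 1200, 12000):
        counts = simulate_rolls(n_rolls, seed=654)  # hypothetical seed
        statistic = chi_square_statistic(counts)
        if statistic > CHI2_CRITICAL_DF5_ALPHA05:
            decision = "reject"
        else:
            decision = "fail to reject"
        print(f"{n_rolls} rolls: counts={counts}, expected per face={n_rolls / 6:.0f}, "
              f"chi-square={statistic:.3f} -> {decision} the null hypothesis of a fair dice")
```

If SciPy is available, scipy.stats.chisquare(counts) should report the same statistic under its default of equal expected frequencies, along with a p-value, which may be more convenient than a fixed critical value.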