AD654: Marketing Analytics
Boston University

Assignment IV: Time Series and AB Testing

Once you have completed this assignment, you will upload two files to Blackboard: the
.ipynb file that you create in Jupyter Notebook, and an .html file generated from your
.ipynb file. If you run into any trouble submitting the .html file to Blackboard, you can
submit it as a PDF instead.

For any question that asks you to perform some particular task, you just need to show your
input and output in Jupyter Notebook. Tasks will always be written in regular, non-italicized
font.

For any question that asks you to include interpretation, write your answer in a Markdown cell
in Jupyter Notebook. Any homework question that needs interpretation will be written in
italicized font. Do not simply write your answer in a code cell as a comment, but use a
Markdown cell instead.

Remember to be resourceful! There are many helpful resources available to you, including the
video library, the class slides, the recitation sessions, the Zoom office hours sessions, and the
web.

Part I: Working with Time Series Data


A. Pick any publicly-traded company that trades on the Nasdaq or the NYSE.
a. What company did you select, and what is its ticker symbol?

B. Go to Yahoo! Finance (finance.yahoo.com) and enter your company’s ticker symbol
in the search bar near the top of the screen. Next, click on “Historical Data”
and then “Download.” This will automatically download a .csv file with one year’s
worth of the company’s data onto your computer.

C. Bring the dataset into your environment in the same way we have done
throughout the semester -- just use read_csv() from pandas, passing the name of
the file into the function.
a. Use the head() function to explore the variables.
b. Next, call the info() function on your dataset.
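
A minimal sketch of step C, assuming the Yahoo! Finance export has the usual Date, Open, High, Low, Close, Adj Close, and Volume columns. The rows below are made-up placeholder values read from an in-memory string so the example runs without a download; the file name `XYZ.csv` in the comment is hypothetical.

```python
import io
import pandas as pd

# Placeholder rows (NOT real prices) standing in for the downloaded .csv.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,100.0,102.0,99.5,101.5,101.5,1200000
2020-01-03,101.5,103.0,100.5,102.0,102.0,1100000
2020-01-06,102.0,102.5,100.0,100.5,100.5,1300000
"""

# With a real download this would be: df = pd.read_csv("XYZ.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # first rows, to see which variables are present
df.info()         # column types and non-null counts; Date is read as text
```

Note that read_csv() gives the frame a default integer index unless told otherwise, which is worth keeping in mind for the next question.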

D. Is this dataframe indexed by time values? How do you know this?

E. If you answered no to the previous question, you will need to tell Python that
this data is actually a time series. Convert it to a time series now -- do this
without reading the entire file back into your environment.
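
One possible way to do this conversion is sketched below, assuming the date column is named `Date` as in a typical Yahoo! Finance export. The small frame is a hypothetical stand-in for the data you already loaded.

```python
import pandas as pd

# Hypothetical frame mimicking what read_csv() returns: Date is plain text.
df = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "Close": [101.5, 102.0, 100.5],  # made-up values
})

# Parse the text dates, then make them the index; the file is not re-read.
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")

print(type(df.index))  # a DatetimeIndex once the conversion has worked
```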

F. In your Jupyter Notebook, view the index attribute of your time series.
a. Now, view the max and min values of your index attribute.
b. Now, view the argmax and argmin values of your index attribute.
c. What do the results of max, min, argmax, and argmin represent?
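
The calls in step F can be sketched on a tiny, made-up time series (the name `ts` and the values are placeholders, not real prices):

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
ts = pd.DataFrame({"Close": [101.5, 102.0, 100.5]}, index=dates)

print(ts.index)           # the DatetimeIndex itself
print(ts.index.max())     # latest date in the index
print(ts.index.min())     # earliest date in the index
print(ts.index.argmax())  # integer position of the latest date
print(ts.index.argmin())  # integer position of the earliest date
```

On this three-row example, argmax() is 2 and argmin() is 0, because the rows are already in chronological order.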

G. Let’s visualize the entire time series.
a. Create a line plot that depicts all of the movement of your ‘Close’
variable for your stock.
b. Now, add a horizontal, dashed line that spans the entire length of your
graph. The height of this line should represent the mean ‘Close’ value
from your dataset. Color this line with any color that you like (you might
even want to try a hexadecimal value!)
c. As we all know, 2020 has been a pretty crazy year -- and for stock
market investors, it has sometimes felt like a ride on a roller coaster at
Lobster Land. Use shading to show the contrast between February,
March, April, May, and June of 2020. Shade each of these months in a
slightly different way on the graph.
i. In a few sentences, how did your stock perform across this
five-month span? You don’t need to do any outside research or
analysis to answer this -- just describe what your graph is
showing.
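
A rough sketch of the three plotting steps in G, using matplotlib on a synthetic random walk rather than real stock data. The color, the shading levels, and the variable names are arbitrary choices, and the `Agg` backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for a year of closes (random walk, NOT real prices).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", "2020-12-31")
close = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(),
                  index=dates, name="Close")

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(close.index, close.values, label="Close")

# Horizontal dashed line at the mean close; the hex color is arbitrary.
ax.axhline(close.mean(), linestyle="--", color="#e07a5f", label="Mean close")

# Shade February through June 2020, each month slightly more opaque.
month_starts = pd.date_range("2020-02-01", "2020-07-01", freq="MS")
for i in range(len(month_starts) - 1):
    ax.axvspan(month_starts[i], month_starts[i + 1],
               color="gray", alpha=0.10 + 0.08 * i)

ax.set_ylabel("Close")
ax.legend()
```

Varying only the alpha value keeps the months distinguishable without adding five legend entries; separate colors per month would work just as well.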

H. Let’s visualize some simple moving averages. Show 5-, 10-, and 20-day simple
moving averages of the ‘Close’ variable for your company’s data in three
separate line graphs. Each time, include the daily closing price for your company
overlaid on your graph.
a. How did the three simple moving average plots compare to one another?
How are they similar, and how are they different?
b. What are some pros and cons of using simple moving averages? What
about the pros and cons of using shorter or longer k-values in a moving
average?
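
As an illustration of how a simple moving average can be computed, the sketch below uses pandas' rolling() on a made-up, steadily rising series. The values are placeholders chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Made-up closes rising by 1.0 per business day (NOT real prices).
close = pd.Series(
    [float(v) for v in range(1, 26)],
    index=pd.bdate_range("2020-01-01", periods=25),
    name="Close",
)

# One moving-average series per window length k.
sma = {k: close.rolling(window=k).mean() for k in (5, 10, 20)}

for k, series in sma.items():
    # The first k-1 entries are NaN because the window is not yet full.
    print(k, series.isna().sum(), series.iloc[-1])
```

On this rising series the last close is 25.0, while the final 5-, 10-, and 20-day averages are 23.0, 20.5, and 15.5. Each series can then be drawn on its own axes together with `close` to produce the three graphs.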

I. Next, we will try something called resampling.
a. Resample your time series so that its values are based on some different
unit of time (larger than daily).
i. Plot this newly-resampled time series.
ii. Provide an example that explains why someone might care about
resampling. To answer this, you may use ANY example that you
can think of, or discover, from any field that uses time series data
(health, weather, market forecasting, etc.). You don’t need to
perform any outside research or go too deeply into domain
knowledge here -- 3-4 thoughtful sentences are all you need.
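
A small sketch of resampling, assuming a daily series with a DatetimeIndex. Weekly means are used here only as one example of a unit larger than daily, and the values are placeholders.

```python
import pandas as pd

# Made-up daily values for the first quarter of 2020 (NOT real prices).
dates = pd.date_range("2020-01-01", "2020-03-31", freq="D")
close = pd.Series(range(len(dates)), index=dates, dtype="float64", name="Close")

# Group the daily values into weeks ending on Sunday and average each week.
weekly = close.resample("W").mean()

print(len(close), len(weekly))
print(weekly.head())
# weekly.plot() would draw the resampled series in a notebook.
```

The aggregation does not have to be a mean; depending on the question being asked, the last value, the maximum, or the sum per period may be more appropriate.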




Part II: Using a Statistical Test to Analyze Data

This summer, Lobster Land introduced a new game of chance called “Giant Dice.” Any visitor
to the park can play this game for $5. After paying the money, the person is allowed to choose
any number from 1 through 6. The visitor can then either roll, or throw, one gigantic wooden
die (as shown above). If the die comes up with the same number that the player chose
before throwing it, the player will receive a Lobster Land t-shirt.

An angry park visitor has decided to sue Lobster Land, because he spent $80 playing this game
but did not win a t-shirt. After 16 rolls of the dice, for which he chose the number “4” each
time, he failed to win on any occasion. He is claiming that Lobster Land must have manipulated
the dice so that the “4” result would not come up. Even though his legal fees will greatly
exceed $80, he has announced that he will sue Lobster Land in a court of law to recover his
losses.

Lobster Land is hoping that you can save the day here! They hope that you can use some of
your analytics skills to help them out and stop this lawsuit before it goes any further.

A. Lobster Land built its dice so that they will perform in a similar manner as the
randint() function from the random module in Python. Using Python, simulate
120 rolls of a die, being sure to use random.randint() to generate the values.

If this were a completely fair die, how many instances of 1, 2, 3, 4, 5, and 6
would you expect to result from this simulation? How many of each outcome
did you actually get?
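
The simulation could be sketched as follows. The seed is an arbitrary choice made only so that reruns give the same rolls; your own counts will differ if you use another seed or none at all.

```python
import random
from collections import Counter

random.seed(654)  # arbitrary seed, only for repeatability

n_rolls = 120
rolls = [random.randint(1, 6) for _ in range(n_rolls)]  # both ends inclusive

observed = Counter(rolls)
expected = n_rolls / 6  # a fair die makes every face equally likely

for face in range(1, 7):
    print(face, observed[face], expected)
```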

B. Using an appropriate statistical test, determine whether your results support the
claim made by the angry park visitor. What is the null hypothesis? Based on the
evidence here, will you reject or fail to reject the null hypothesis?
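
One test that fits this kind of count data is the chi-square goodness-of-fit test, sketched here by hand under the assumption that the null hypothesis is a fair die. The counts are hypothetical, not simulation output, and the helper name `chi_square_stat` is made up for this example. If SciPy is installed, scipy.stats.chisquare() computes the same statistic along with a p-value.

```python
def chi_square_stat(observed_counts):
    """Sum of (observed - expected)^2 / expected, with equal expected counts."""
    total = sum(observed_counts)
    expected = total / len(observed_counts)
    return sum((o - expected) ** 2 / expected for o in observed_counts)

# Critical value of the chi-square distribution, 5 degrees of freedom, alpha = 0.05.
CRITICAL_VALUE = 11.070

# Hypothetical counts for faces 1..6 that sum to 120.
example_counts = [18, 22, 19, 21, 24, 16]

stat = chi_square_stat(example_counts)
reject_null = stat > CRITICAL_VALUE

print(round(stat, 3), reject_null)
```

For these hypothetical counts the statistic is 2.1, well below 11.070, so the null hypothesis of a fair die would not be rejected.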

C. Run the simulated dice roll again, but this time, use 1200 rolls, rather than 120.
Using these newly-obtained values, run your statistical test again. How did your
test statistic change? How will you interpret these results?

D. Run the simulated dice roll another time, but this time, use 12,000 rolls, rather
than 1200. Run the statistical test yet again. How did your test statistic change?
How will you interpret these results?

E. What general trend did you notice as you increased the number of dice rolls in
the simulation? Why do you think this is the case? To answer this, you don’t
need to cite any formal statistics rules or formulas -- you can answer this in your
own words, in a couple of sentences.
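
To look at the trend across sample sizes, one option is to compare each face's observed share with the 1/6 expected from a fair die. This is only a sketch: the seed and the `worst_gap` summary are arbitrary choices, and a single run shows what tends to happen rather than proving it.

```python
import random
from collections import Counter

random.seed(2020)  # arbitrary seed, only for repeatability

results = {}
for n_rolls in (120, 1200, 12000):
    counts = Counter(random.randint(1, 6) for _ in range(n_rolls))
    shares = [counts[face] / n_rolls for face in range(1, 7)]
    # Largest distance between any face's observed share and 1/6.
    worst_gap = max(abs(share - 1 / 6) for share in shares)
    results[n_rolls] = worst_gap
    print(n_rolls, round(worst_gap, 4))
```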
