Statistical Methods for Data Science

DATA7202

Semester 1, 2020

Assignment 4 (Weight: 30%)

Assignment 4 is due on June 27 2020, 2:00pm.

There are two questions below. For questions 1 and 2, you should present your analysis

of data using Python, Matlab, or R, as a short report, clearly answering the objectives

and justifying the modeling (and hence statistical analysis) choices you make, as well as

discussing your conclusions.

1. (20%) Consider the following example from Efron and Tibshirani (1993). When a

drug company introduces a new medication, it is sometimes required to show

“bioequivalence”, that is, to demonstrate that the new drug is not

substantially different from the current treatment.

The table below shows eight subjects who used medical patches to infuse a certain

hormone into the blood. Each subject received three treatments: placebo, old-

patch, new-patch.

subject  placebo    old    new  old - placebo  new - old
      1     9243  17649  16449           8406      -1200
      2     9671  12013  14614           2342       2601
      3    11792  19979  17274           8187      -2705
      4    13357  21816  23798           8459       1982
      5     9055  13850  12560           4795      -1290
      6     6290   9806  10157           3516        351
      7    12412  17208  16570           4796       -638
      8    18806  29044  26325          10238      -2719

Let

• Z = old − placebo, and

• Y = new − old.

The Food and Drug Administration (FDA) requirement for bioequivalence is that

|θ| ≤ 0.20, where θ = E[Y] / E[Z].

Write a program that performs the following calculations; set the generator seed to

be 12345.

(a) Calculate the plug-in estimate of θ, which is equal to θ̂ = Ȳ / Z̄, the ratio of the sample means of Y and Z.

(b) Using the bootstrap method with B = 1000 replications, calculate the 95%

confidence interval for θ. Compare the obtained interval with the requirement

|θ| ≤ 0.20. What is your conclusion?
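The calculations in (a) and (b) can be sketched as follows. This is only one possible approach, under stated assumptions: subjects are resampled with replacement so that each (Z, Y) pair stays together, the interval is a percentile interval (other bootstrap intervals exist), and Python's standard random module is seeded with 12345. The variable names are illustrative and not prescribed by the assignment.

```python
import random
from statistics import quantiles

# Differences taken from the table above, one entry per subject.
z = [8406, 2342, 8187, 8459, 4795, 3516, 4796, 10238]    # old - placebo
y = [-1200, 2601, -2705, 1982, -1290, 351, -638, -2719]  # new - old
n = len(z)

# (a) Plug-in estimate: ratio of the sample means (the 1/n factors cancel).
theta_hat = sum(y) / sum(z)

# (b) Nonparametric bootstrap. Assumption: resample subject indices so that
# the Z and Y values of one subject are always drawn together.
rng = random.Random(12345)
B = 1000
boot = []
for _ in range(B):
    idx = [rng.randrange(n) for _ in range(n)]
    boot.append(sum(y[i] for i in idx) / sum(z[i] for i in idx))

# Percentile interval: the 2.5% and 97.5% quantiles of the bootstrap values.
# quantiles(..., n=40) returns 39 cut points at 2.5%, 5%, ..., 97.5%.
cuts = quantiles(boot, n=40, method="inclusive")
ci_low, ci_high = cuts[0], cuts[-1]

print(f"theta_hat = {theta_hat:.4f}")
print(f"95% percentile interval = ({ci_low:.4f}, {ci_high:.4f})")
```

The bioequivalence conclusion then depends on whether the whole interval lies inside [−0.20, 0.20], not only on the point estimate.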


2. (80%) Air Secure wishes to open a number of new service desks, guaranteeing that

in the long run 90% of their customers do not have to wait longer than 8 minutes in

a waiting queue before they are served. Preliminary research by Air Secure showed

that on arrival customers always choose the smallest queue and remain there until

served. This research also investigated the passengers' inter-arrival time (in minutes)

and the service time. The results are summarized in data.csv. The data for the

first four passengers are provided below.

inter_arrival_time service_time

2.1230325064814 3.83455057136373

0.304277254841897 3.07898542818172

0.162593146778897 3.87336623034977

0.183088166798198 8.55428148088529

Perform a Discrete-Event Simulation study in Matlab, Python, or R, to answer the

following question.

What is the minimum number of service desks needed to meet the service

requirement? Namely, how many service desks should be available so that,

with probability 0.9, a customer does not have to wait longer than 8 minutes in

a waiting queue before they are served. Run the simulation for T = 3000 units

of time.
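The queueing dynamics described above can be sketched as an event-scheduling simulation. The sketch below is a starting point under several assumptions that are not fixed by the assignment: “smallest queue” is read as the desk with the fewest customers waiting or in service, ties go to the lowest desk index, customers still waiting at time T are not counted, and the inter-arrival and service times are passed in as plain lists (here the four sample rows shown above; a full study would draw them from data.csv or from distributions fitted to it). The function names simulate and fraction_within are hypothetical.

```python
import heapq
from collections import deque

DEPARTURE, ARRIVAL = 0, 1  # on equal times, departures are processed first

def simulate(inter_arrivals, services, n_desks, horizon=3000.0):
    """Return the waiting times of customers who start service by `horizon`."""
    events = []  # min-heap of (time, kind, customer index, desk)
    t = 0.0
    for i, gap in enumerate(inter_arrivals):
        t += gap
        if t > horizon:
            break
        heapq.heappush(events, (t, ARRIVAL, i, -1))

    queues = [deque() for _ in range(n_desks)]  # waiting customers per desk
    busy = [False] * n_desks                    # is the desk serving someone?
    waits = []

    while events:
        now, kind, i, desk = heapq.heappop(events)
        if now > horizon:
            break
        if kind == ARRIVAL:
            # Join the desk with the fewest customers (waiting + in service).
            desk = min(range(n_desks), key=lambda d: len(queues[d]) + busy[d])
            if busy[desk]:
                queues[desk].append((i, now))
            else:
                busy[desk] = True
                waits.append(0.0)
                heapq.heappush(events, (now + services[i], DEPARTURE, i, desk))
        elif queues[desk]:
            # Desk finished a customer: start serving the next one in its queue.
            j, arrived = queues[desk].popleft()
            waits.append(now - arrived)
            heapq.heappush(events, (now + services[j], DEPARTURE, j, desk))
        else:
            busy[desk] = False
    return waits

def fraction_within(waits, limit=8.0):
    """Share of recorded customers whose wait did not exceed `limit` minutes."""
    return sum(w <= limit for w in waits) / len(waits)

# The four sample rows from the handout, used only to exercise the code.
ia = [2.1230325064814, 0.304277254841897, 0.162593146778897, 0.183088166798198]
sv = [3.83455057136373, 3.07898542818172, 3.87336623034977, 8.55428148088529]
for c in (1, 2):
    print(c, "desk(s):", fraction_within(simulate(ia, sv, c)))
```

Four customers say nothing about long-run behaviour. The study itself would run this for T = 3000 with the full data, repeat it over independent replications, and increase the number of desks until the estimated fraction reaches 0.9.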

Your report should address the following points.

(a) Give the problem summary and describe the project objective.

(b) Give a specification of variables used in the simulation study. In addition,

show a diagram that describes the project dynamics.

(c) Results and Analysis. Using tables and figures, present a clear outcome of

your study. Present the corresponding confidence intervals.

(d) Formulate your conclusions.

(e) Appendix. Include all code files used. Explain their interaction and provide

clear, well-commented code.
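For the confidence intervals requested in (c), one common option is the method of independent replications: run the simulation several times with different random streams, record one performance estimate per run (for example, the fraction of customers who waited at most 8 minutes), and build an interval from those per-run values. The helper below is a minimal sketch of that idea. The name replication_ci is hypothetical, and it uses a normal quantile for simplicity; with only a few replications a Student t quantile would be the more careful choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def replication_ci(values, level=0.95):
    """Normal-approximation confidence interval for the mean of `values`,
    where each value is the estimate from one independent replication."""
    n = len(values)
    centre = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * stdev(values) / sqrt(n)
    return centre - half_width, centre + half_width
```

A given number of desks can then be judged adequate when the whole interval, not just its centre, lies at or above 0.9.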
