STAT5002 Introduction to Statistics
Final Examination
Semester 1, 2020
Date: Monday 22 June 2020
Time: 11:00 am to 2:00 pm (writing + submission)
Duration: 180 minutes

Exam Instructions:
1. This exam has 6 questions, including one student declaration statement. Please review the
Student Charter. Write down “yes” or “no” at the beginning of your solutions, and sign your
name next to it.
NB: Responding “yes” to the declaration means that you undertake the exam in the prescribed
exam conditions with no assistance from a third party or the use of prohibited resources.
2. Attempt all questions. Questions are not of equal mark value.
3. All electronic devices and reference material besides those permitted must be removed from the
exam environment.
4. Clearly number your solutions and upload them as a single file.
5. The content of this exam is not to be shared or distributed in any form.

































Question 1 (23 marks) P
A bank branch located in a commercial district of a city has the business objective of developing an
improved process for serving customers during the noon-to-1:00 p.m. lunch period. The waiting time, in
minutes, is defined as the time the customer enters the line to when he or she reaches the teller window.
Data are collected from a sample of 15 customers during this hour and are listed below:

0.38 2.34 3.02 3.20 3.54 3.79 4.21 4.50 4.77 5.12 5.13 5.55 6.10 6.19 6.46

Another bank branch, located in a residential area, is also concerned with the noon-to-1:00 p.m. lunch
hour. The waiting times, in minutes, collected from a sample of 15 customers during this hour, are listed as
follows:

3.82 4.08 5.47 5.64 5.79 5.90 6.17 6.68 8.01 8.02 8.35 8.73 9.66 9.91 10.49

Below are comparative boxplots and five-number summaries of the waiting times at the two bank branches:





a. Compare and contrast the distributions of the waiting times at the two bank branches as exhibited in the
above comparative boxplots. Include mention of the centre, shape, spread, and outliers (if any).

b. Which would be the best measure of spread for the waiting times at the bank branch in the commercial
area: the range, the standard deviation, or the interquartile range?












c. In order to analyze the waiting times as a whole, the two bank branches’ waiting times are combined
and arranged in ascending order below:

0.38 2.34 3.02 3.20 3.54 3.79 3.82 4.08 4.21 4.50 4.77 5.12 5.13 5.47 5.55
5.64 5.79 5.90 6.10 6.17 6.19 6.46 6.68 8.01 8.02 8.35 8.73 9.66 9.91 10.49

We have the following summary statistics computed from the combined data:



Find a point estimate for the average waiting time.
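
As an illustration only, the usual point estimate of a population mean is the sample mean. The Python sketch below computes it, along with the sample standard deviation, from the 30 combined waiting times listed in part (c). The variable names are illustrative and are not part of the exam.

```python
from statistics import mean, stdev

# Combined waiting times (minutes), copied from part (c).
waits = [
    0.38, 2.34, 3.02, 3.20, 3.54, 3.79, 3.82, 4.08, 4.21, 4.50,
    4.77, 5.12, 5.13, 5.47, 5.55, 5.64, 5.79, 5.90, 6.10, 6.17,
    6.19, 6.46, 6.68, 8.01, 8.02, 8.35, 8.73, 9.66, 9.91, 10.49,
]

n = len(waits)
x_bar = mean(waits)  # sample mean: point estimate of the population mean
s = stdev(waits)     # sample standard deviation (n - 1 denominator)

print(f"n = {n}, sample mean = {x_bar:.4f}, sample sd = {s:.4f}")
```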


d. With the aid of the following normal quantile plot and box plot, is it appropriate to set up an interval
estimate for the population mean waiting times with the given conditions? Justify your answer with an
explanation.



e. Assuming that the data met all the relevant assumptions, set up a 95% confidence interval for the
average waiting times. You may find the following R output useful:



f. Is it appropriate to set up a 99% confidence interval for the proportion of the waiting times over 6
minutes? Justify your answer with an explanation.




g. Assuming that the data met all the relevant assumptions, set up a 99% confidence interval for the
proportion of the waiting times over 6 minutes. You may find the following R output useful:
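
As an illustration only (this is a Python sketch, not the R output the question refers to), the sample proportion and a normal-approximation (Wald) interval could be computed as follows. The choice of the Wald interval is an assumption, since the exam does not say which interval method is intended. The counts of successes and failures are printed so they can be checked against whatever rule of thumb the course uses.

```python
from math import sqrt
from statistics import NormalDist

# Combined waiting times (minutes), copied from part (c).
waits = [
    0.38, 2.34, 3.02, 3.20, 3.54, 3.79, 3.82, 4.08, 4.21, 4.50,
    4.77, 5.12, 5.13, 5.47, 5.55, 5.64, 5.79, 5.90, 6.10, 6.17,
    6.19, 6.46, 6.68, 8.01, 8.02, 8.35, 8.73, 9.66, 9.91, 10.49,
]

n = len(waits)
over_6 = sum(1 for w in waits if w > 6)  # number of waits longer than 6 minutes
p_hat = over_6 / n

# Successes and failures, for checking the normal-approximation condition.
print(f"successes = {over_6}, failures = {n - over_6}")

# Assumption: Wald interval, p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
confidence = 0.99
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, 99% CI = ({lower:.4f}, {upper:.4f})")
```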



h. If you wanted the total width of an 80% confidence interval estimate for the population proportion of the
waiting times over 6 minutes to be 0.05, how many customers should you randomly select? You may
find the following R output useful:
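
As an illustration only (a Python sketch, not the R output the question refers to), the sample size for a given interval width follows from solving z * sqrt(p(1 - p)/n) = E for n, where E is half the total width. Two planning values for p are shown, and both are assumptions: the sample proportion 12/30 = 0.4 from the combined data, and the conservative value 0.5. The exam does not state which one is intended.

```python
from math import ceil
from statistics import NormalDist

confidence = 0.80
total_width = 0.05
margin = total_width / 2  # half-width E of the interval

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def required_n(p):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin."""
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_planning = required_n(0.4)      # assumption: use the sample proportion 12/30
n_conservative = required_n(0.5)  # assumption: worst case p = 0.5

print(n_planning, n_conservative)
```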








































Question 2 (23 marks) C/D

The Red Book is used in the USA for determining trade in prices for used cars. One possible determiner of
price is using the odometer (how far the car has travelled). The following is a sample of 10 cars.

Odometer (1000 miles)    59  92  61  72  52  67  88  62  95  83
Trade in price ($100s)   37  35  43  39  41  39  35  40  29  33

A scatterplot of the data is as follows:



a. Summarize the key features of the above scatterplot.

b. What method would you use to obtain the line of best fit?
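
As an illustration only, ordinary least squares is one standard way to obtain a line of best fit. The Python sketch below applies the textbook formulas slope = Sxy / Sxx and intercept = y_bar - slope * x_bar to the ten cars in the table. It is not the R printout referred to in the exam, and the variable names are illustrative.

```python
# Data copied from the table: odometer in 1000 miles, trade in price in $100s.
odometer = [59, 92, 61, 72, 52, 67, 88, 62, 95, 83]
price = [37, 35, 43, 39, 41, 39, 35, 40, 29, 33]

n = len(odometer)
x_bar = sum(odometer) / n
y_bar = sum(price) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in odometer)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(odometer, price))

slope = s_xy / s_xx                # least-squares slope
intercept = y_bar - slope * x_bar  # least-squares intercept

print(f"price = {intercept:.3f} + ({slope:.4f}) * odometer")
```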

The R printout is shown below:





c. Write down the fitted regression line.

d. Is there enough evidence to show that there is a linear relationship between odometer and trade in
price at α = 0.01?
e. Find the total sum of squares and the sample variance of Trade in Price.
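
As an illustration only, the total sum of squares and the sample variance are linked by SST = (n - 1) * s^2. The exam presumably expects these to be read off the R printout; the Python sketch below instead computes them directly from the ten trade in prices as a cross-check.

```python
from statistics import variance

# Trade in prices ($100s), copied from the table.
price = [37, 35, 43, 39, 41, 39, 35, 40, 29, 33]

n = len(price)
y_bar = sum(price) / n

sst = sum((y - y_bar) ** 2 for y in price)  # total sum of squares
s2 = variance(price)                        # sample variance (n - 1 denominator)

print(f"SST = {sst:.1f}, sample variance = {s2:.4f}")
```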

f. State and interpret the adjusted R².

g. Find the residual if the odometer reading is 88,000 miles. Has the model overestimated or
underestimated the trade in price?
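
As an illustration only, a residual is the observed value minus the fitted value, so a positive residual means the fitted line sits below the observed point. The Python sketch below refits the least-squares line from the table, rather than using the coefficients in the R printout, and evaluates the residual for the car with 88 (thousand miles) on the odometer.

```python
# Data copied from the table: odometer in 1000 miles, trade in price in $100s.
odometer = [59, 92, 61, 72, 52, 67, 88, 62, 95, 83]
price = [37, 35, 43, 39, 41, 39, 35, 40, 29, 33]

n = len(odometer)
x_bar = sum(odometer) / n
y_bar = sum(price) / n
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(odometer, price))
    / sum((x - x_bar) ** 2 for x in odometer)
)
intercept = y_bar - slope * x_bar

x0 = 88
observed = price[odometer.index(x0)]  # observed price for that car
fitted = intercept + slope * x0       # price predicted by the line
residual = observed - fitted          # observed minus fitted

print(f"observed = {observed}, fitted = {fitted:.3f}, residual = {residual:.3f}")
```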

h. From the scatterplot, comment on the shape of the residuals. Is the linear model valid?



i. Comment on whether this model would be suitable for predicting the trade in price for a car with
100,000 miles on the odometer.

























Question 3 (6 marks) D

The height of children in a particular school is a normally distributed random variable with a mean of 140
cm and a standard deviation 16 cm.

a. What proportion of children have heights over 150 cm? You may find the following R output useful:




b. Given a random sample of 10 children from the population, what is the probability that exactly 2 children will
have a height exceeding 150 cm?



c. If ten children are selected randomly from this population, what is the probability that their mean height
exceeds 150 cm? You may find the following R output useful:
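
As an illustration only (a Python sketch, not the R output the question refers to), the three probabilities in parts (a) to (c) could be computed as follows. The sketch assumes the ten children in part (b) are selected independently, so the count exceeding 150 cm is binomial, and uses the fact that the mean of n normal observations has standard deviation sigma / sqrt(n).

```python
from math import comb, sqrt
from statistics import NormalDist

mu, sigma, n = 140, 16, 10

# (a) Proportion of children taller than 150 cm.
height = NormalDist(mu, sigma)
p_over = 1 - height.cdf(150)

# (b) Exactly 2 of 10 exceed 150 cm (assumption: independent selections).
p_exactly_two = comb(n, 2) * p_over ** 2 * (1 - p_over) ** (n - 2)

# (c) The mean of 10 heights is normal with sd sigma / sqrt(n).
mean_height = NormalDist(mu, sigma / sqrt(n))
p_mean_over = 1 - mean_height.cdf(150)

print(f"(a) {p_over:.4f}  (b) {p_exactly_two:.4f}  (c) {p_mean_over:.4f}")
```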





























Question 4 (12 marks) HD

Ten people try a diet for a month, resulting in the following weights (in kg).

Person             1   2   3    4   5    6   7   8   9  10
Weight (before)   95  80  67  100  95  112  86  80  78  90
Weight (after)    90  82  64   95  92  102  83  81  75  88

a. For a test for the difference in means, what type of t-test would you use? Justify your answer with an
explanation.

b. It is claimed by the Diet Company that a person loses 3 kg on average. Test this claim using a
hypothesis test at the 5% level. Your answer should include the H0 and H1, the test statistic and its
sampling distribution, a decision rule based on critical value, calculated test statistic, decision, and
conclusion. You may find the following R output useful:
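
As an illustration only (a Python sketch, not the R output the question refers to), the statistic for a t-test on the before-minus-after differences could be computed as follows. Treating the claim as H0: mean loss = 3 kg is an assumption about how the hypotheses are set up. The critical value is deliberately left to the t table or the R output.

```python
from math import sqrt
from statistics import mean, stdev

# Weights (kg), copied from the table.
before = [95, 80, 67, 100, 95, 112, 86, 80, 78, 90]
after = [90, 82, 64, 95, 92, 102, 83, 81, 75, 88]

# Weight loss per person: positive means weight was lost.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
d_bar = mean(diffs)  # mean weight loss
s_d = stdev(diffs)   # sample sd of the losses

claimed_loss = 3     # assumption: H0 states a mean loss of 3 kg
t_stat = (d_bar - claimed_loss) / (s_d / sqrt(n))
df = n - 1

print(f"mean loss = {d_bar:.2f}, sd = {s_d:.4f}, t = {t_stat:.4f}, df = {df}")
```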





c. Consider the same scenario mentioned in parts (a) and (b). Now, suppose we would like to use the
sign test to test the claim by the Diet Company instead of using a t-test. What are the assumptions of
the sign test?


d. State the H0 and H1 of the sign test.


e. Calculate the p-value of the sign test in part (d).
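
As an illustration only, one way to set up the sign test is sketched below in Python. It rests on three assumptions that the exam does not confirm: H0 is that the median weight loss equals 3 kg, the alternative is two-sided, and differences exactly equal to 3 kg are treated as ties and dropped. A different choice of hypotheses in part (d) would give a different p-value.

```python
from math import comb

# Weights (kg), copied from the table.
before = [95, 80, 67, 100, 95, 112, 86, 80, 78, 90]
after = [90, 82, 64, 95, 92, 102, 83, 81, 75, 88]
diffs = [b - a for b, a in zip(before, after)]

hypothesised = 3  # assumption: H0 median loss of 3 kg
shifted = [d - hypothesised for d in diffs]

n_pos = sum(1 for s in shifted if s > 0)
n_neg = sum(1 for s in shifted if s < 0)
m = n_pos + n_neg  # ties (shifted == 0) are dropped


def binom_cdf_half(k, size):
    """P(X <= k) for X ~ Binomial(size, 0.5)."""
    return sum(comb(size, i) for i in range(k + 1)) / 2 ** size


# Assumption: two-sided alternative, capped at 1.
p_two_sided = min(1.0, 2 * binom_cdf_half(min(n_pos, n_neg), m))

print(f"positives = {n_pos}, negatives = {n_neg}, p-value = {p_two_sided}")
```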













Question 5 (28 marks) P/C

The effect of height (H) and weight (W) on catheter length (L) in n = 12 children with congenital heart
disease was analyzed with a multiple regression model called M1.

Use the following R output to answer parts (a) to (c):




a. Write down the fitted multiple regression equation.

b. Obtain a point estimate for the error variance.

c. What is the multiple correlation coefficient between L and (H, W)?

d. What characteristics does a high leverage point have in general?






















Use the following R output to answer parts (e) to (h):







e. Are there any high leverage points in this data set? If so, identify them.

f. What is the sum of all leverage values for the data used to fit M1?

g. Are there any outliers in the data?

h. Comment on the model diagnostic plots.

i. Is there evidence of multicollinearity in M1? Explain.









j. A simple linear regression of L on W called M2 is run. The R output is shown below:



Calculate and interpret the sample correlation coefficient between the L and W values.


k. Suppose you want to test H0: βH = 0 against H1: βH ≠ 0 in M1 using the partial F test. Calculate the F
test statistic. What is the p-value for this test (Hint: You do not need to calculate; simply use the R
output).


l. Compare the R² and adjusted R² for the 2 regression models (M1 vs M2). Which model would you
prefer, M1 or M2? Why?






















Question 6 (8 marks) P

A business analyst in the Department of Fair Trade is investigating the collusive bidding amongst the
state’s road construction contractors who sometimes set bid prices higher than the fair market price.
Suppose an investigator has obtained information on the bid status for a random sample of 31 contracts.
In addition, 2 variables thought to be related to bid status are also recorded for each contract: number of
bidders (X1) and the difference between the winning (lowest) bid and the estimated competitive bid (X2)
measured as a percentage of the estimate.

The response variable, Y takes value 1 if it is a fixed bid and 0 if it is a competitive bid.

Let p = P(Y = 1) and let r be the odds, that is, the ratio of P(Y = 1) to P(Y = 0). The data are fitted, and the estimated
regression coefficients are β̂0 = 1.52, β̂1 = -0.75, and β̂2 = 0.12.


a. What is the underlying model behind the estimation result?

b. Write down the estimated regression equation that the business analyst can use to predict the probability
of a fixed bid.

c. Interpret the regression coefficient of X2.

d. In the bidding for a road construction contract, 8 bidders submit bids and the winning bid is 10% higher
than the estimated competitive bid. Estimate the probability that this is a competitive bid.
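
As an illustration only, the Python sketch below evaluates a logistic regression with the three coefficients quoted above. It assumes that the model is log(p / (1 - p)) = b0 + b1 * X1 + b2 * X2, that the coefficients are the intercept, the X1 slope, and the X2 slope in that order, and that X2 is entered in percentage points (so 10% is X2 = 10). None of these is stated explicitly in the question.

```python
from math import exp

# Estimated coefficients quoted in the question (order is an assumption).
b0, b1, b2 = 1.52, -0.75, 0.12


def prob_fixed(x1, x2):
    """Estimated P(Y = 1), the probability of a fixed bid, under a logistic model."""
    logit = b0 + b1 * x1 + b2 * x2  # estimated log-odds
    return 1 / (1 + exp(-logit))


p_fixed = prob_fixed(8, 10)  # 8 bidders, winning bid 10% above the estimate
p_competitive = 1 - p_fixed  # Y = 0 is a competitive bid

print(f"P(fixed) = {p_fixed:.4f}, P(competitive) = {p_competitive:.4f}")
```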


























