S1.21 STATS101/101G/108
1 BLOCK 1
These questions are worth one mark each.
1. Pick the option that correctly completes the statement.
Data from a categorical variable are:
2. Pick the option that correctly completes the statement.
A study which observes the same group of individuals or units over a long period of time is called
a:
3. Pick the option that correctly completes the statement.
Consider a well-designed experiment involving a group of volunteers.
A tail proportion of less than 5% in the randomisation test allows us to make:
4. Pick the option that correctly completes the statement.
Using random sampling:
5. Pick the option that correctly completes the statement.
A bootstrap confidence interval may be interpreted as an interval:
group or category names for each entity.
measurements or counts taken on each entity.
cross-sectional study.
longitudinal study.
sample-to-population inference.
experiment-to-causation inference.
allows for the calculation of the likely size of sampling errors.
will guarantee representative samples.
of plausible values for the parameter.
within which the parameter is certain to lie.
STATS101/101G/108 - 1213
6. Pick the option that correctly completes the statement.

All other things being equal, bigger sample sizes give:
7. Pick the option that correctly completes the statement.

The null hypothesis, H0, is the:
8. Pick the option that correctly completes the statement.

When conducting a t-test a plot of the sample data is used to check for evidence of:
9. Pick the option that correctly completes the statement.

For a Chi-square test for independence, there will be evidence against the null hypothesis if there
are relatively:
10. Pick the option that correctly completes the statement.

The sign (+ or -) of the sample correlation coefficient, r, is:

Note: For Questions 11 to 20 be careful which option you choose because the order of the
True/False options may change from question to question.

11. Decide whether this statement is True or False.

wider confidence intervals.
narrower confidence intervals.
hypothesis we test.
research hypothesis.
non-Normal features.
independence.
small differences between the observed and expected counts in one or more cells.
large differences between the observed and expected counts in one or more cells.
not necessarily the same as the sign of the slope of the least squares regression line.
always the same as the sign of the slope of the least squares regression line.
For highly skewed data the sample median is a more sensible measure of the centre than the
sample mean.
12. Decide whether this statement is True or False.

An observational study can be used to reliably establish the cause of an effect.
13. Decide whether this statement is True or False.

Under chance alone, when comparing two groups, the difference we observe would purely and
simply be due to which units just happened to have ended up in which group and nothing else.

14. Decide whether this statement is True or False.

Taking larger samples will not reduce the effects of selection bias and other nonsampling errors.
15. Decide whether this statement is True or False.

We can be certain that the true value of a population parameter is somewhere in a bootstrap
confidence interval for that parameter.
16. Decide whether this statement is True or False.

The level of confidence is the long-run success rate for a method which aims at producing
confidence intervals which contain the unknown value of the parameter.
False
True
True
False
True
False
True
False
False
True
17. Decide whether this statement is True or False.

Statistical significance implies practical significance.
18. Decide whether this statement is True or False.

If the P-value for an F-test for one-way analysis of variance is large then the differences we see
between the sample means could be due to chance alone.
19. Decide whether this statement is True or False.

The greater the value of the Chi-square test statistic, the weaker the evidence against the null
hypothesis.
20. Decide whether this statement is True or False.

The correlation coefficient measures the strength and the direction of a linear relationship between
two numeric variables.

False
True
False
True
True
False
False
True
False
True
Maximum marks: 20
2 BLOCK 2: Questions 21 to 24
These questions are worth two marks each.

Questions 21 to 24 refer to the information in Appendix A.

21. Which one of the following statements about the study is false?


22. Refer to Figure 2.

Which one of the following statements could be false?


23. Refer to Figure 3.

Which one of the following statements is false?

This study is an experiment because the participants were randomly allocated to either the
TimeRestriction group or the NoTimeRestriction group.
The response variable was ReportedNumber.
The researchers were blinded because they did not know what number the participants
actually rolled.
The NoTimeRestriction group was the control group.
This study had a completely randomised design.
There were more participants in the NoTimeRestriction group who actually rolled a 1 than
participants in the TimeRestriction group who actually rolled a 1.
The standard deviation of the ReportedNumber for the NoTimeRestriction group is higher
than that of the TimeRestriction group.
The median ReportedNumber for the NoTimeRestriction group is less than that of the
TimeRestriction group.
Numbers less than 3 were reported less often by participants in the TimeRestriction group
than by those in the NoTimeRestriction group.
Participants in the TimeRestriction group tended to report a higher number than participants
in the NoTimeRestriction group reported.

24. Suppose that the researchers were also interested in seeing if the underlying mean time taken
to report their number by those in the NoTimeRestriction group was different to the underlying
mean time taken to report their number by those in the TimeRestriction group.

Let μNTR − μTR be the difference between the underlying mean time taken to report their number
by those in the NoTimeRestriction group and the underlying mean time taken to report their
number by those in the TimeRestriction group.

Which one of the following is a correct pair of hypotheses for this test?


We have evidence that the time restriction caused the participants in the
TimeRestriction group to roll higher numbers.
The P-value for this randomisation test is less than 5%.
We have evidence that chance was not acting alone in the actual study.
We may claim that the time restriction had an effect on the mean ReportedNumber.
We have evidence that Group together with chance produced the observed result.
H0 : x̄NTR − x̄TR ≠ 8
H1 : x̄NTR − x̄TR = 8
H0 : μNTR − μTR = 0
H1 : μNTR − μTR ≠ 0
H0 : x̄NTR − x̄TR = 0
H1 : x̄NTR − x̄TR ≠ 0
H0 : μNTR − μTR ≠ 0
H1 : μNTR − μTR = 0
H0 : μNTR − μTR = 8
H1 : μNTR − μTR ≠ 8
Maximum marks: 8
3 BLOCK 3: Questions 25 to 30
These questions are worth two marks each.

Questions 25 to 30 refer to the information in Appendix B.

25. Which one of the following could not be present in the data collected?


Questions 26 and 27 refer to Figure 4 and the accompanying information.

26. Which one of the following statements is false?


27. Suppose that it was decided to use t-procedures to calculate a 95% confidence interval for the
difference between the proportion of those interested in politics who said that they had voted and
the proportion of those not interested in politics who said that they had voted.

The sampling situation for calculating the standard error of the estimate is:

Nonresponse bias
Interviewer effects
Question effects
Behavioural considerations
Sampling error
The bootstrap confidence interval includes the difference in the sample proportions.
The smallest sample proportion is the proportion of those not interested in politics who said
that they did not vote.
It's a fairly safe bet that the proportion of those interested in politics who said that they had
voted is somewhere between 11 and 19 percentage points higher than the proportion of
those not interested in politics who said that they had voted.
The majority of respondents said that they had voted.
In every resample the difference in percentage points was more than five.

Questions 28 to 30 refer to Figure 5 and the accompanying information.

28. The test statistic, t0, for this t-test is approximately:


29. Which one of the following statements is not a correct interpretation of the P-value for this t-test?


30. A 95% confidence interval for pN − pL is (0.06, 0.12). Suppose that we wish to calculate a
90% confidence interval using the same data.

Which one of the following statements is false?
one sample of size 3207, several response categories.
one sample of size 3412, several response categories.
two independent samples of sizes 3207 and 205.
one sample of size 3412, many yes/no items.
two independent samples of sizes 385 and 3027.
0.09
0.03
4.32
5.59
1.96
At the 5% level of significance we can reject the null hypothesis.
At the 10% level of significance we can reject the alternative hypothesis.
At the 5% level of significance we can claim that pN is greater than pL.
At the 1% level of significance we can claim that pN is greater than pL.
The observed difference is a statistically significant result (at the 5% level).

The 90% confidence interval will:


be calculated using the same t-multiplier.
be narrower than the 95% confidence interval.
not include zero.
be calculated using the same standard error.
have a smaller margin of error.
Maximum marks: 12
4 BLOCK 4
These questions are worth two marks each.

Questions 31 to 42 refer to the information in Appendix C.

31. Refer to Figure 6.
Which one of the following statements is false?


32. Which one of the following is a correct pair of hypotheses for this t-test?



33. Refer to the test output in Table 1.
Which one of the following statements is false?

If the mean were shown on the plot it would be below the median.
Approximately half of the movies in this dataset grossed more in the US than in the rest of
the world.
This data is positively (right) skewed.
There are no gross outliers in this data.
We could use this plot to check the assumption of Normality for a paired data t-test on this
data.
H0 : μDiff ≠ 0
H1 : μDiff = 0
H0 : μDiff = 0
H1 : μDiff ≠ 0
H0 : x̄Diff = 0
H1 : x̄Diff ≠ 0
H0 : x̄Diff ≠ 0
H1 : x̄Diff = 0
H0 : μDiff = 0
H1 : μDiff > 0

34. Based on the results of this t-test, which one of the following is a correct statement?

Questions 35 to 40 refer to Tables 2 & 3, Figures 7 & 8 and the information that goes with them.

35. Refer to Figure 7.
Which one of the following statements is false?

With 95% confidence, we estimate that the underlying mean
difference in gross income is somewhere between US$5.9 million
and US$14.3 million.
The confidence interval is narrow compared to the range of the
sample data because it is calculated using a relatively large dataset.
We cannot be certain that μDiff is somewhere between
US$5.9 million and US$14.3 million.
x̄Diff is in the middle of the 95% confidence interval for μDiff.
The margin of error for the 95% confidence interval for μDiff
is US$2.131 million.
We may claim that, on average, movies' gross income from the rest of the world is higher
than it is from the US.
We have very strong evidence that a movie's gross income from the rest of the world is
higher than it is from the US.
We may claim that a movie makes more of its gross income in the rest of the world than in
the US.
We have very strong evidence that, on average, movies' gross income is more in the US
than in the rest of the world.
It is not plausible that, on average, movies make US$10 million more in the rest of the world
than in the US.

Questions 36 to 39 assume that a simple linear regression is appropriate.

36. The equation for the least squares regression line for this analysis is:


37. One of the movies that had a budget of US$40 million had a total gross income of US$315
million. Under this regression analysis, the residual for this movie is approximately:


38. For movies like those in this dataset, which one of the following statements is true?

With 95% confidence, we estimate (to 1 decimal place) that, on average, an increase of US$10
million in the budget is associated with:
The movie with the highest budget had the highest total gross income.
Only two movies had a total gross income of more than US$1250 million.
As Budget increases the variability in Total tends to increase.
The maximum budget for any of these movies is about US$300 million.
It looks like a lot of movies have a gross income less than US$250 million.
Predicted Total = 8.996 + 3.035 x Budget
Predicted Total = -8.996 + 3.035 x Budget
Predicted Total = 3.035 + 0.117 x Budget
Predicted Total = 3.035 - 8.996 x Budget
Predicted Total = 0.117 + 3.035 x Budget
48
203
−275
−203
−183


39. Refer to Table 3.
Which one of the following statements is false?

For movies like those in the dataset, with 95% confidence we estimate that:


40. Which one of the following statements is false?

We should be wary of using the results of this regression analysis to predict the total gross income
of a movie released in 2020, based on its budget of US$600 million because:

a decrease in the total gross income of somewhere between US$7.34 million and US$25.5
million.
a decrease in the total gross income of US$90.0 million.
an increase in the total gross income of somewhere between US$28.1 million and US$32.7
million.
an increase in the total gross income of US$30.4 million.
an increase in the total gross income of somewhere between US$7.34 million and US$25.5
million.
movies with a budget of US$82.5 million will have an underlying mean total income of
between US$231.0 and US$251.8 million.
movies with a budget of US$190.0 million will have an underlying mean total income of
between US$266.3 and US$869.1 million.
a movie with a budget of US$175.0 million will have a total income of between US$221.1
and US$823.2 million.
movies with a budget of US$40.0 million will have an underlying mean total income of
between US$102.4 and US$122.4 million.
a movie with a budget of US$225.0 million will have a total income of between US$371.6
and US$976.2 million.

41. Suppose we wish to investigate whether, on average, some distributors have a higher total
gross income from their movies than others.

Given that the underlying assumptions are satisfied, which form of analysis, using the variables
Total and Distributor, is most appropriate?


42. We wish to use a one-way analysis of variance F-test for no difference between means to see
if Paramount have changed their average budget over the three decades we have data for.
The plot below shows Budget for Paramount movies by Decade.

the highest budget in the data set is US$300 million and we do not know if the relationship
holds for higher budget movies.
there will be variability in the total gross income of movies released in 2020, with a budget of
US$600 million.
both the scatter plot and the residual plot indicate that at least one of the assumptions of
linear regression has not been met.
the dataset only includes movies released between 1982 and 2011 and we do not know if
the relationship holds for newer movies.
both the scatter plot and the residual plot indicate there is non-constant scatter.
One-way analysis of variance F-test for no difference between means
t-test for no difference between two proportions
Simple linear regression
Chi-square test for independence
t-test for no difference between two means

With regard to the assumptions of the F-test, which one of the following statements is true?

We have concerns about using the F-test because:


the decades are not independent of each other.
the sample sizes for each decade are so different.
the movies were all distributed by Paramount so the assumption of independence within
groups cannot be met.
the plots suggest that the requirement for the assumption of equal standard deviations has
not been met.
one of the medians is very different to the other two.
Maximum marks: 24
5 BLOCK 5: Questions 43 to 50
These questions are worth two marks each.

Questions 43 to 50 refer to the information in Appendix D.

43. Using the plots in Figure 9 and Figure 10, which one of the following statements is false?


44. Which one of the following is not an appropriate null hypothesis for this Chi-square test?


45. Which one of the following statements is the best justification about the appropriateness of
using the Chi-square test for this data?

The proportions of students who felt neutral about Covid 19 were fairly similar for each of the
three semesters.
Roughly the same number of students felt worried in S2.20 as did in S1.21.
The plots suggest that Response depends on Semester.
The semester with the largest proportion of students who felt very pessimistic was S2.20.
For all three semesters combined, less than half of the students felt very pessimistic or
worried.
H0: Response is independent of Semester.
H0: There is no association between Response and Semester.
H0: The variables Response and Semester are not related.
H0: The underlying distribution of Semester is the same for all levels of Response.
H0: The underlying distribution of Response is the same for each of the levels of Semester.
Questions 46 to 50 assume that the use of the Chi-square test is appropriate.
(Note that this assumption may not be correct.)

46. Under the null hypothesis, which one of the following would be the best estimate of the
distribution of Response for Semester 2 2020 (S2.20).

There is a concern about using a Chi-square test because two cell contributions are less
than one.
There is no concern about using a Chi-square test because none of the expected counts
are less than five.
There is no concern about using a Chi-square test because all the expected counts are
greater than one.
There is a concern about using a Chi-square test because fewer than 80% of the cell
contributions are five or more.
There is no concern about using a Chi-square test because none of the observed counts
are less than five.
47. Consider the cell in Table 4 for the students in Semester 1 2020 (S1.20) who were not worried
at all. Under the null hypothesis, the estimated expected count (to 1 decimal place) is:


48. The P-value for this Chi-square test is calculated by:


49. For these students, which one of the following statements is the best conclusion based on the
P-value for this Chi-square test?


50. Which one of the following statements gives the best reason for the P-value of this Chi-square
test?

18.4
13.3
30.0
52.4
2.3
2 × pr(χ2 ≥ 86.409) where χ2 ~ Chi-square(df = 8)
pr(χ2 ≤ 86.409) where χ2 ~ Chi-square(df = 8)
pr(χ2 ≥ 86.409) where χ2 ~ Chi-square(df = 570)
2 × pr(χ2 ≥ 86.409) where χ2 ~ Chi-square(df = 570)
pr(χ2 ≥ 86.409) where χ2 ~ Chi-square(df = 8)
It is not possible for Response and Semester to be independent.
There is very strong evidence that Response and Semester are related.
There is very strong evidence that the differences in distribution of Response were caused
by the timing of the semester that the question about Covid 19 was asked.
There is no evidence that Response and Semester are associated.
There is very strong evidence that the differences in distribution of Response were not due
to the timing of the semester that the question about Covid 19 was asked.

For the students in S1.21, the number who were neutral was exactly what would have been
expected if the null hypothesis was true.
For the students in S2.20, there were far fewer who felt very pessimistic and far more who
were mostly fine than would be expected if the null hypothesis was true.
For the students in S2.20, there were far more who felt very pessimistic and far fewer who
were mostly fine than would be expected if the null hypothesis was true.
For the students who felt neutral, the observed number for each semester was very similar
to what would be expected if the null hypothesis was true.
For the students who were neutral, the cell contributions for each semester were very small.
Maximum marks: 16
INCLUSIONS:
• Appendix A: Ethics Data for use in Questions 21 to 24
• Appendix B: NZ Election Study Data for use in Questions 25 to 30
• Appendix C: Movie Data for use in Questions 31 to 41
• Appendix D: Covid 19 Data for use in Questions 43 to 50
• Formulae Appendix
References
Albright, S. C. and Winston, W. L. (2017). Business Analytics: Data Analysis and Decision Making. Cengage Learning, 6th
edition.
Shalvi, S., Eldar, O., and Bereby-Meyer, Y. (2012). Honesty Requires Time (and Lack of Justifications). Psychological Science,
23(10), 1264–1270. PMID: 22972904.
Vowles, J., McMillan, K., Barker, F., Curtin, J., Hayward, J., Greaves, L., and Crothers, C. (2018). New Zealand Election Study.
Online, nzes.org.
Appendix A: Ethics data
Questions 21 to 24 refer to the information in this appendix.
Shalvi et al. (2012) were interested in determining whether having no time to reconsider
one’s impulses affects behaviour. In a study to investigate this, participants rolled a fair
6-sided die and then reported what number they rolled. Participants were told that they
would receive payment based on the number they rolled, where the higher the number
the higher the payment received. Each participant could be confident that they were the
only person who knew what number they actually rolled.
The 72 participants were first-year students who were randomly assigned to two groups.
The 35 participants randomly assigned to the TimeRestriction group were given 8
seconds after rolling the die to report the number rolled. The remaining 37 participants
were randomly assigned to the NoTimeRestriction group and had no time restriction.
Figure 2 shows ReportedNumber (the number the participant reported rolling) and
its mean for each Group. The observed difference between the two means is also shown.
Note: ReportedNumber may not be the same as the number the participant actually
rolled.
A randomisation test was conducted to see if having a time restriction affected the mean
ReportedNumber. The Re-randomisation distribution and the tail proportion are
shown in Figure 3.
[Figure 2 image not reproduced. VIT output (Module: Randomisation test; Variable: ReportedNumber; Quantity: mean; Statistic: difference; File: time.csv) with panels Data, Re-randomised data and Re-randomisation distribution. ReportedNumber axis: 1 to 6, for the NoTimeRestriction and TimeRestriction groups. Difference axis: −2 to 2. Labelled values: 0.94; tail proportion 14 / 1000 = 0.014.]
Figure 2: ReportedNumber by Group
Recall that ReportedNumber may not be the same as the number the participant
actually rolled.
[Figure 3 image not reproduced. VIT output (Module: Randomisation test; Variable: ReportedNumber; Quantity: mean; Statistic: difference; File: time.csv) with panels Data, Re-randomised data and Re-randomisation distribution. ReportedNumber axis: 1 to 6, for the NoTimeRestriction and TimeRestriction groups. Difference axis: −2 to 2. Labelled values: 0.94; tail proportion 14 / 1000 = 0.014.]
Figure 3: Re-randomisation distribution
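The re-randomisation procedure described in this appendix can be sketched in Python. This is a minimal illustration and not the VIT implementation: the function name, the seed and both data lists are hypothetical (the full study data are not reproduced here), and the tail proportion is assumed to count re-randomised differences at least as large as the observed one.

```python
import random
from statistics import mean


def randomisation_test(group_a, group_b, n_reps=1000, seed=1):
    """Return (observed difference, tail proportion) for mean(b) - mean(a)."""
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_reps):
        # Under chance alone, group labels are arbitrary: reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_reps


# Hypothetical reported numbers, for illustration only.
no_time_restriction = [1, 2, 2, 3, 3, 4, 5, 6]
time_restriction = [3, 4, 5, 5, 6, 6, 6, 6]

observed, tail = randomisation_test(no_time_restriction, time_restriction)
print(f"observed difference = {observed:.3f}, tail proportion = {tail:.3f}")
```

A small tail proportion means that differences as large as the observed one rarely arise from the random allocation alone.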
Appendix B: NZ Election Study Data
Questions 25 to 30 refer to the information in this appendix.
Data from the 2017 Election study (Vowles et al., 2018) is available online. The data was
collected, immediately after the 2017 General Election, by sending an online questionnaire
to a random sample of those eligible to vote in the 2017 General Election. The overall
response rate was approximately 38%.
Questions 26 and 27 refer to the following additional information.
Two of the questions asked were:
How interested in politics are you?
and
Did you vote? (Yes or No)
A bootstrap confidence interval was constructed to estimate the difference between the
proportion of those not interested in politics who said that they had voted and the
proportion of those interested in politics who said that they had voted.
The VIT output is shown in Figure 4. The blue bars represent those who said that they
had voted, the pink bars represent those who said that they had not voted.
[Figure 4 image not reproduced. VIT bootstrap output with panels Sample and Bootstrap distribution; surviving labels: 74 and 0.108.]
Figure 4: Bootstrap confidence interval output
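A bootstrap confidence interval for a difference between two proportions can be sketched as follows. This is an illustration under stated assumptions, not the VIT algorithm: the counts are hypothetical, the function name is made up, the difference is taken as interested minus not interested, and the interval is assumed to be the central 95% of the bootstrap differences.

```python
import random


def bootstrap_diff_ci(rows, n_reps=1000, seed=1, level=0.95):
    """rows: list of (interested, voted) booleans, one pair per respondent."""
    rng = random.Random(seed)

    def diff(sample):
        yes = [voted for interested, voted in sample if interested]
        no = [voted for interested, voted in sample if not interested]
        if not yes or not no:
            return None  # a resample with an empty group is skipped
        return sum(yes) / len(yes) - sum(no) / len(no)

    diffs = []
    while len(diffs) < n_reps:
        # Resample respondents with replacement, keeping the sample size.
        d = diff(rng.choices(rows, k=len(rows)))
        if d is not None:
            diffs.append(d)
    diffs.sort()
    tail = (1 - level) / 2
    lower = diffs[round(tail * n_reps)]
    upper = diffs[round((1 - tail) * n_reps) - 1]
    return diff(rows), lower, upper


# Hypothetical counts: 150 interested (135 voted), 50 not interested (35 voted).
rows = ([(True, True)] * 135 + [(True, False)] * 15
        + [(False, True)] * 35 + [(False, False)] * 15)

observed, lower, upper = bootstrap_diff_ci(rows)
print(f"observed = {observed:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```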
Questions 28 to 30 refer to the following additional information.
Another two questions were asked only of those who said that they had voted. They were:
Which party did you give your party vote to?
and
For which party’s candidate did you give your electorate vote?
One reason for asking these two questions is to be able to see if people give both votes
to the same party.
Let:
pN be the underlying proportion of those who gave their party vote to National
who gave their electorate vote to the National candidate,
and
pL be the underlying proportion of those who gave their party vote to Labour
who gave their electorate vote to the Labour candidate.
A two-tailed t-test for no difference between pN and pL was conducted.
Figure 5 shows the t-procedures tool with sample proportions and sample sizes. p̂1 is the
sample proportion for pN and p̂2 is the sample proportion for pL.
Figure 5: t-procedures tool screenshot
Appendix C: Movie Data
Questions 31 to 41 refer to the information in this appendix.
Albright and Winston (2017) collected data on movies released between 1982 and 2011.
For the purposes of this exam we will only consider the 1119 movies for which there was
complete data. Distributor information has been simplified into seven broad categories.
Variables that will be used in this exam are defined as follows:
Distributor The company that distributed the movie
– Buena Vista
– Fox
– Paramount
– Sony
– Warner
– Universal
– Other
Decade The decade in which the movie was first released
– 1982 to 1991
– 1992 to 2001
– 2002 to 2011
Budget The budget for the movie’s production in US$million
US The movie’s gross income from the United States release in US$million
Total The movie’s total (worldwide) gross income in US$million
Rest The movie’s gross income from the rest of the world (Total - US) in US$million
Questions 31 to 34 refer to the following additional information.
A two-sided paired data t-test was conducted to investigate the difference between Rest
and US.
Let:
Diff = Rest − US
and
µDiff be the underlying mean difference between Rest and US
The plot in Figure 6 shows Diff for each movie.
[Figure 6 image not reproduced. Plot of Diff (US$million) for each movie; axis from −200 to 800.]
Figure 6: Difference in gross income (Rest−US)
Output for the paired data t-test on the differences is shown in Table 1.
Paired Samples Test
Pair 1: Rest − US (Paired Differences)
                                           95% Confidence Interval
                                           of the Difference
Mean     Std. Deviation   Std. Error Mean   Lower    Upper     t       df     Sig. (2-tailed)
10.088   71.288           2.131             5.906    14.269    4.733   1118   .000
Table 1: Paired data t-test output
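The figures in Table 1 are linked by the t-test formulae in the Formulae appendix. The short check below recomputes the standard error and test statistic from the reported mean, standard deviation and sample size; the variable names are illustrative, and the t-multiplier is backed out from the reported interval rather than looked up.

```python
from math import sqrt

# Values reported in Table 1 and Appendix C.
n = 1119
mean_diff = 10.088
sd_diff = 71.288
ci_lower, ci_upper = 5.906, 14.269

se = sd_diff / sqrt(n)                 # standard error of the mean difference
t0 = (mean_diff - 0) / se              # hypothesised value is 0
df = n - 1
margin = (ci_upper - ci_lower) / 2     # half-width of the 95% interval
t_multiplier = margin / se             # implied by the reported interval

print(f"se = {se:.3f}, t0 = {t0:.3f}, df = {df}")
print(f"margin of error = {margin:.3f}, implied t-multiplier = {t_multiplier:.3f}")
```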
Questions 35 to 40 refer to the following additional information.
A simple linear regression was carried out to investigate if there is a relationship between
Budget and Total. The results of this analysis are shown in Tables 2 and 3. Plots are
given in Figures 7 and 8.
Coefficients (a)
               Unstandardized            Standardized                      95.0% Confidence
               Coefficients              Coefficients                      Interval for B
Model          B         Std. Error      Beta           t        Sig.      Lower Bound   Upper Bound
1 (constant)   −8.996    8.327                          −1.080   .280      −25.335       7.343
  Budget       3.035     .117            .612           25.889   .000      2.805         3.265
a. Dependent variable: Total
Table 2: Simple linear regression output
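The coefficients in Table 2 define the fitted line, from which predictions and residuals follow directly. The sketch below uses hypothetical helper names and assumes a simple linear regression is appropriate; the example movie (budget US$40 million, total US$315 million) is the one described in Question 37.

```python
# Coefficients from Table 2 (Total and Budget in US$million).
INTERCEPT = -8.996
SLOPE = 3.035
SLOPE_CI = (2.805, 3.265)  # 95% confidence interval for the slope


def predicted_total(budget):
    """Fitted value from the least squares regression line."""
    return INTERCEPT + SLOPE * budget


def residual(observed_total, budget):
    """Residual = observed value minus fitted value."""
    return observed_total - predicted_total(budget)


def slope_interval_for(budget_increase):
    """Scale the slope interval to a given increase in Budget."""
    return tuple(bound * budget_increase for bound in SLOPE_CI)


print(f"predicted Total at Budget 40: {predicted_total(40):.1f}")
print(f"residual for observed Total 315: {residual(315, 40):.1f}")
print("interval for a US$10 million increase:", slope_interval_for(10))
```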
Table 3: Prediction output
[Figure 7 image not reproduced. Scatter plot of Total (US$million), axis 0 to 2000, against Budget (US$million), axis 50 to 300.]
Figure 7: Total gross income against budget
[Figure 8 image not reproduced. Plot of Residuals, with ticks shown at 0, 500 and 1000, against Budget (US$million), axis 50 to 300.]
Figure 8: Residual plot
Appendix D: Covid 19 Data
Questions 43 to 50 refer to the information in this appendix.
One of the Stats 10x lecturers was interested in how her students felt about Covid 19.
She surveyed her Stats 10x class in three different semesters, asking them:
“How do you feel about Covid 19?”
She was interested in whether there was a relationship between the response to the
question (Response) and the semester that the students were surveyed in (Semester).
The two variables used are defined as:
Response How the students felt about Covid 19
- Very pessimistic
- Worried
- Neutral
- Mostly fine
- Not worried at all
Semester The semester and year that the survey was taken in
- Semester 1 2020 (S1.20)
- Semester 2 2020 (S2.20)
- Semester 1 2021 (S1.21)
The distribution of Response for the three semesters combined is shown in Figure 9.
Figure 9: Distribution of Response for all three semesters
Figure 10 shows two versions of the distribution of Response by Semester.
Figure 10: Distribution of Response by Semester
A Chi-square test for independence was conducted to see if there was an association
between Response and Semester. The results of this test are shown in Table 4.
Date * Response Crosstabulation
                          Response
Date   Statistic          Very pessimistic  Worried  Neutral  Mostly fine  Not worried at all  Total
S1.20  Count                            13       73       69           77                  30    262
       Expected Count                 26.6     89.5     67.9           ++                  ++  262.0
       Cell contribution              6.95     3.04     0.02         5.08                7.31
S2.20  Count                            34       49       31            8                   2    124
       Expected Count                 12.6     42.3     32.1         28.2                 8.7  124.0
       Cell contribution             36.35     1.06     0.04        14.47                5.16
S1.21  Count                            11       73       48           45                   8    185
       Expected Count                 18.8     63.2     48.0         42.1                  ++  185.0
       Cell contribution              3.24     1.52        0         0.20                1.92
Total  Count                            58      195      148          130                  40    571
       Expected Count                 58.0    195.0    148.0        130.0                40.0  571.0
Chi-Square Tests
                     Value        df   Significance
Pearson Chi-Square   86.409 (a)   ++   .000
Likelihood Ratio     83.848       ++   .000
N of Valid Cases     571
a. 0 cells (0.0%) have expected count less than 5.
Note: Some values have been replaced with ++
Table 4: Chi-square test output
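The expected counts and the Chi-square statistic in Table 4 follow from the observed counts alone, using the formulae in the Formulae appendix. The sketch below uses hypothetical function names; it works from the observed counts in Table 4 and assumes the Chi-square test is appropriate.

```python
# Observed counts from Table 4. Rows: S1.20, S2.20, S1.21.
# Columns: Very pessimistic, Worried, Neutral, Mostly fine, Not worried at all.
observed = [
    [13, 73, 69, 77, 30],
    [34, 49, 31, 8, 2],
    [11, 73, 48, 45, 8],
]


def expected_counts(table):
    """Expected count in cell (i, j) = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Return (test statistic, degrees of freedom)."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for row_o, row_e in zip(table, expected)
        for o, e in zip(row_o, row_e)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


expected = expected_counts(observed)
stat, df = chi_square(observed)
print(f"chi-square = {stat:.3f}, df = {df}")
print(f"expected count, S1.20 / Not worried at all = {expected[0][4]:.1f}")
```

The printed values can be compared with the entries of Table 4, including those replaced with ++.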
VERSION 1 STATS 101/101G/108
FORMULAE
Confidence intervals and t-tests
Confidence interval: $\text{estimate} \pm t \times \text{se}(\text{estimate})$
t-test statistic: $t_0 = \dfrac{\text{estimate} - \text{hypothesised value}}{\text{standard error}}$
Applications:
1. Single mean $\mu$: estimate $= \bar{x}$; $df = n - 1$
2. Single proportion $p$: estimate $= \hat{p}$; $df = \infty$
3. Difference between two means $\mu_1 - \mu_2$ (independent samples):
estimate $= \bar{x}_1 - \bar{x}_2$; $df = \min(n_1 - 1, n_2 - 1)$
4. Difference between two proportions $p_1 - p_2$:
estimate $= \hat{p}_1 - \hat{p}_2$; $df = \infty$
Situation (a): Proportions from two independent samples
Situation (b): One sample of size n, several response categories
Situation (c): One sample of size n, many yes/no items
The F-test (ANOVA)
F-test statistic: $f_0 = \dfrac{s_B^2}{s_W^2}$; $df_1 = k - 1$, $df_2 = n_{tot} - k$
The Chi-square test
Chi-square test statistic: $\chi_0^2 = \sum_{\text{all cells in the table}} \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
Expected count in cell $(i, j) = \dfrac{R_i C_j}{n}$
$df = (I - 1)(J - 1)$
Regression
Fitted least-squares regression line: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Inference about the intercept, $\beta_0$, and the slope, $\beta_1$: $df = n - 2$
ANSWERS:
1 (1) 11 (2) 21 (3) 31 (1) 41 (1)
2 (2) 12 (2) 22 (1) 32 (2) 42 (4)
3 (2) 13 (1) 23 (1) 33 (5) 43 (2)
4 (1) 14 (1) 24 (2) 34 (1) 44 (4)
5 (1) 15 (1) 25 (2) 35 (1) 45 (2)
6 (2) 16 (2) 26 (2) 36 (2) 46 (5)
7 (1) 17 (1) 27 (5) 37 (2) 47 (1)
8 (1) 18 (1) 28 (4) 38 (3) 48 (5)
9 (2) 19 (1) 29 (2) 39 (2) 49 (2)
10 (2) 20 (2) 30 (1) 40 (2) 50 (3)
