STAT 341 Assignment 1
Student Name and ID
Due Friday September 27 at 9:00am
Note
• Replace “Student Name and ID” with your name and Waterloo ID.
• RMarkdown or LaTeX is required; hand-written solutions and imported screenshots will not be
accepted in the assignments. A mark of 0% will be assigned to any question that was not compiled
in RMarkdown or LaTeX, or that includes hand-written solutions or screenshots.
• Organization is part of a full solution. Full marks will be awarded to organized complete solutions and
marks will be deducted for unorganized solutions.
Wayne Gretzky Goals
• Wayne Gretzky “The Great One” is a Canadian former professional ice hockey player. He played 20
seasons in the National Hockey League (NHL) and he is considered to be the greatest hockey player
ever. The dataset “GretzkyGoals.csv” contains all of Gretzky’s goals during his time in the NHL. Here,
we will examine the times at which the goals occurred during a sixty-minute game.
• Note:
– For each part below, any plots should be side-by-side in the same figure and they all should be
properly labelled.
a) [5 Marks] Read in the data and convert the times into seconds. Remove the overtime goals, which are
any goals that occur beyond the sixty-minute mark of regular play. Then calculate the average, median and
range of the times at which Wayne scored goals during a game.
b) [5 Marks] Plot three histograms using the Sturges, Scott and Freedman-Diaconis rules for the number of
bins, along with a boxplot. All four plots should be side-by-side in the same figure and they all should be
properly labelled. Based on these three histograms and the boxplot, does Wayne tend to score at any particular
time during the game?
c) [5 Marks] Construct two histograms, one using unequal bins (with the same number of bins used
in part b)) and another that breaks the 60-minute game into two-minute intervals. Based on these two
histograms, does Wayne tend to score at any particular time during the game?
d) [3 Marks] Construct a quantile plot of the goal times. What feature does the quantile plot exhibit?
e) [5 Marks] Partition the goal times into empty-net goals and against-goalie goals. For each group, construct
a histogram using the same number of bins used in part b) and varying bin widths. Comment
on the differences between the groups.
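The steps in parts a) to d) can be sketched in R as follows. This is only an illustration under stated assumptions: the column names and time format of “GretzkyGoals.csv” are not given here, so `to_seconds` is a hypothetical helper that assumes a period number plus an elapsed "mm:ss" clock, and the goal times are simulated rather than read from the file. Taking “unequal bins” to mean breaks at equally spaced quantiles is also an assumption.

```r
# Hypothetical helper: assumes each goal is recorded as a period number
# (1-3 regulation, 4+ overtime) and an elapsed "mm:ss" clock within the period.
# The real column names and time format in GretzkyGoals.csv may differ.
to_seconds <- function(period, clock) {
  parts <- strsplit(clock, ":", fixed = TRUE)
  mins <- as.numeric(sapply(parts, `[`, 1))
  secs <- as.numeric(sapply(parts, `[`, 2))
  (period - 1) * 20 * 60 + mins * 60 + secs
}

# Toy records, not taken from the dataset
toy_sec <- to_seconds(c(1, 2, 3, 4), c("05:30", "19:59", "00:10", "02:15"))
toy_reg <- toy_sec[toy_sec <= 3600]  # drop goals beyond the sixty-minute mark

# Simulated stand-in for the regulation goal times (seconds)
set.seed(341)
x <- runif(200, min = 0, max = 3600)

# Number of bins suggested by each rule
bins <- c(Sturges = nclass.Sturges(x),
          Scott   = nclass.scott(x),
          FD      = nclass.FD(x))

# Varying-width bins: breaks at equally spaced quantiles, so each bin
# holds roughly the same number of goals
k <- bins[["Sturges"]]
breaks_q <- quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE)

# Equal-width bins: one bin per two minutes of play
breaks_2min <- seq(0, 3600, by = 120)

op <- par(mfrow = c(1, 4))  # all panels side-by-side in one figure
hist(x, breaks = "FD", main = "Freedman-Diaconis", xlab = "Time (seconds)")
h_q <- hist(x, breaks = breaks_q, main = "Quantile breaks", xlab = "Time (seconds)")
h_2 <- hist(x, breaks = breaks_2min, main = "Two-minute bins", xlab = "Time (seconds)")
# Quantile plot: sorted values against their cumulative proportions
plot(ppoints(length(x)), sort(x), main = "Quantile plot",
     xlab = "Proportion p", ylab = "Time (seconds)")
par(op)
```

Note that `hist()` treats a rule name such as `"FD"` as a suggestion and rounds to “pretty” breakpoints, so the number of bars drawn can differ slightly from the values in `bins`. When the breaks are unequal, `hist()` plots densities rather than counts by default.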
World Health Organization (WHO) on life expectancy
In this question you will analyze WHO data on life expectancy. The data is in the file “WHO_life.csv”
posted on LEARN. Below is the powerfun function from the course notes for your convenience.
powerfun <- function(x, alpha) {
  if(sum(x <= 0, na.rm=TRUE) > 0) stop("x must be positive")
  if (alpha == 0)
    log(x)
  else if (alpha > 0) {
    x^alpha
  } else -x^alpha
}
• The variables are Country, Year, and
– LB.XXXX: life expectancy at birth (years) for Males, Females & Both, and
– L60.XXXX: life expectancy at age 60 (years) for Males, Females & Both.
a) [3 Marks] What range of powers (the values of α) make the distribution of the life expectancy at birth
(years) for males symmetric?
b) [3 Marks] What range of powers (the values of α) make the distribution of the life expectancy at age
60 (years) for males symmetric?
c) [3 Marks] Using α = 4 as the power for x = LB.Male and α = 0 as the power for y = L60.Male, plot
the transformed variables. Between the transformed and original data, which one is better-suited for
linear modeling?
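To illustrate how powerfun is applied before plotting, here is a minimal sketch that compares an original and a transformed scatterplot side-by-side. The data are simulated stand-ins, not the values in “WHO_life.csv”, and the variable names `x` and `y` are placeholders for LB.Male and L60.Male. The definition of powerfun is repeated from the course notes only so that the block runs on its own.

```r
# powerfun as given in the course notes (repeated so the block is self-contained)
powerfun <- function(x, alpha) {
  if (sum(x <= 0, na.rm = TRUE) > 0) stop("x must be positive")
  if (alpha == 0)
    log(x)
  else if (alpha > 0) {
    x^alpha
  } else -x^alpha
}

# Simulated, strictly positive stand-ins (assumption: not the WHO data)
set.seed(341)
x <- 40 + 45 * rbeta(150, 5, 2)                # left-skewed, like a life expectancy
y <- 6 + 0.2 * (x - 40) + runif(150, -1, 1)    # related positive variable

# Transform each variable with its own power
tx <- powerfun(x, 4)   # alpha = 4: raises x to the fourth power
ty <- powerfun(y, 0)   # alpha = 0: natural log

op <- par(mfrow = c(1, 2))
plot(x, y, main = "Original scale", xlab = "x", ylab = "y")
plot(tx, ty, main = "Transformed scale",
     xlab = "powerfun(x, 4)", ylab = "powerfun(y, 0)")
par(op)
```

The negative sign in the last branch of powerfun keeps the transformation increasing when the power is negative, so the ordering of the observations is preserved for every value of the power.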
Investigating influence and sensitivity of the geometric mean
a) The geometric mean for the population $P = \{y_1, \ldots, y_N\}$ is
$$a(P) = a(y_1, \ldots, y_N) = \left( \prod_{i=1}^{N} y_i \right)^{1/N}$$
i) [3 Marks] Derive the sensitivity curve for the geometric mean and write it as a function of $y$ and $a(P)$.
ii) [3 Marks] Write the influence of the geometric mean as a function of $y_u$ and $a(P)$.
b) The measure of sensitivity and influence does not have to depend on the difference between the attribute
values. Instead we might define the sensitivity-ratio (SR) for non-negative attributes as
$$\mathrm{SR}(y; a(P)) = \left[ \frac{a(y_1, \ldots, y_{N-1}, y)}{a(y_1, \ldots, y_{N-1})} \right]^{N}$$
i) [3 Marks] Derive the SR for the geometric mean as a function of $y$ and $a(P)$.
ii) [3 Marks] A measure of influence can be constructed with the ratio as well. Here we define the
influence-ratio (IR) for non-negative attributes as
$$\mathrm{IR}(a, u) = \left[ \frac{a(P)}{a(y_1, \ldots, y_{u-1}, y_{u+1}, \ldots, y_N)} \right]^{N}$$
• Derive the influence-ratio for the geometric mean as a function of $y_u$ and $a(P)$.
c) The population provided in returns2.txt consists of the monthly returns of an investment over a period of 20
years.
i) [2 Marks] Plot the sensitivity curve (SC) of the geometric mean for this population over the
ranges [0.01, 2] & [0.0001, 100]. No comments required.
ii) [2 Marks] Write a function similar to sc, called sr, such that the
• inputs are a population y.pop, a sequence or vector of y values y, and an attribute function
attr, and the
• output is the sensitivity-ratio for each y value.
iii) [3 Marks] Plot the sensitivity-ratio (SR) of the geometric mean for this population over the
ranges [0.01, 2] & [0.0001, 100]. Comment on the plots.
d) [4 Marks] Using the same population, plot the influence values from a) ii) and b) ii) for the geometric
mean and a histogram of the data, and comment on the plots. Use the Freedman–Diaconis rule for the
number of bins. Comment on the influential observations based on each measure.
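For the computational parts above, the following sketch shows one way to code the geometric mean and a sensitivity-curve function in R. The form of `sc` is an assumption about what the course notes provide (N times the change in the attribute when one value y is added to a population of size N - 1), `geomean` is a hypothetical name, and the population is simulated rather than read from returns2.txt.

```r
# Geometric mean, computed on the log scale so that the product of many
# values does not overflow or underflow (requires strictly positive y)
geomean <- function(y) exp(mean(log(y)))

# Assumed form of the course's sc(): N times the change in the attribute
# when one extra value is added to the population y.pop
sc <- function(y.pop, y, attr, ...) {
  N <- length(y.pop) + 1
  sapply(y, function(y.new) {
    N * (attr(c(y.new, y.pop), ...) - attr(y.pop, ...))
  })
}

# Simulated positive population (assumption: not the data in returns2.txt)
set.seed(341)
pop <- exp(rnorm(240, mean = 0, sd = 0.05))

# Evaluate the sensitivity curve over one of the requested ranges
y <- seq(0.01, 2, length.out = 200)
sc_vals <- sc(pop, y, geomean)

plot(y, sc_vals, type = "l", main = "SC of the geometric mean",
     xlab = "y", ylab = "Sensitivity")
abline(h = 0, lty = 2)  # reference line: no change in the attribute
```

Passing the attribute as a function argument means the same `sc` can be reused for other attributes such as `mean` or `median`.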