Problem 3.6
a)
Regressing Yi on Xi gives the result shown in Figure 1-1 below.
Figure 1-1 Regression
The residuals are stored in column C3, as shown in Figure 1-2 below.
Figure 1-2 Residuals
The boxplot of the residuals is then plotted, as shown in Figure 1-3 below.
Figure 1-3 Boxplot of Residuals
As shown in the boxplot, the median is close to zero and the upper and lower quartiles are nearly symmetric about it. Besides, no outliers appear, which suggests that the model fits well. The residuals all fall within [-5, 5], meaning that their distribution is relatively concentrated.
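The boxplot reading above can be reproduced numerically. The following is a minimal sketch in Python, not the Minitab procedure itself: the data are hypothetical placeholders (the problem's data set is not reproduced here), and the quartile rule and the 1.5 × IQR outlier fence are assumptions that may differ slightly from Minitab's conventions.

```python
from statistics import mean, quantiles

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def boxplot_summary(values):
    """Quartiles plus points beyond the 1.5*IQR fences (assumed rule)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": med, "q3": q3, "outliers": outliers}

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

e = residuals(x, y)
summary = boxplot_summary(e)
print(summary)
```

A median near zero, quartiles of similar size on either side, and an empty outlier list correspond to the three observations made from Figure 1-3.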
b)
Figure 1-4 Scatterplot of RESI vs FITS
From the scatterplot, the points are randomly scattered with no noticeable trend, which supports constant error variance (the homoscedasticity assumption). Besides, no systematic pattern exists between the fits and the residuals, meaning that a linear regression function is suitable. Additionally, no outliers appear, so the model fits well.
c)
Figure 1-5 Probability Plot of RESI
Figure 1-6 The Ryan-Joiner Test
The hypothesis:
H0: The RESI is normally distributed.
H1: The RESI is not normally distributed.
RJ=0.992
Since the P-value > 0.100, we cannot reject H0. Therefore, the residuals are consistent with a normal distribution.
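For reference, the Ryan-Joiner statistic is the correlation between the ordered residuals and their normal scores. The sketch below shows that calculation under stated assumptions: the plotting position (i - 3/8)/(n + 1/4) is the convention assumed here, the residual values are hypothetical, and no P-value is computed because that requires tabulated critical values.

```python
from math import sqrt
from statistics import NormalDist, mean

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)

def normal_scores(n):
    """Expected standard-normal quantiles (assumed plotting positions)."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def ryan_joiner(values):
    """Correlation between sorted values and their normal scores."""
    return pearson(sorted(values), normal_scores(len(values)))

# Hypothetical residuals, for illustration only.
resid = [-4.1, -2.3, -1.7, -0.6, 0.2, 0.9, 1.8, 2.4, 3.4]
rj = ryan_joiner(resid)
print(round(rj, 3))
```

Values of RJ close to 1, such as the 0.992 reported above, indicate that the probability plot is close to a straight line.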
Problem 3.10
a)
As shown in Figure 2-1 below, there is no obvious pattern between the residuals and the fitted values, meaning that a linear model could be appropriate. However, one clear outlier exists at approximately (15.6, -3.8): its standardized residual is below -3. This observation may be erroneous and could strongly influence the regression result, so it should be investigated. The outlier might also imply that the error variance is not equal across fitted values. Overall, the model seems reasonable, but dealing with the outlier and checking for possible heteroscedasticity could improve the fit.
Figure 2-1 Scatterplot of RESI and FITS
b)
Figure 2-2 Data Set
The standardized residuals outside the range [-1, 1] are labelled in blue above: -1.12, 1.62, 1.79 and -3.78.
Therefore, four standardized residuals lie more than one standard deviation from zero.
According to the 68-95-99.7 rule, approximately 68% of the data fall within one standard deviation of the mean. Thus, about 32% (100% - 68%) would fall outside.
There are 12 observations. Thus, the expected number of data points outside [-1, 1] is 12 * 32% = 3.84, which rounds to 4.
The observed count of four matches this expectation, so the model appears appropriate.
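The arithmetic above can be checked with a few lines of Python. This is a small sketch using only the four standardized residuals reported above and the sample size of 12. The exact normal tail shares are included for comparison with the rounded 68-95-99.7 figures, and the three-standard-deviation line is an extra check that is not part of the original calculation.

```python
from statistics import NormalDist

n = 12
# The four standardized residuals reported above as lying outside [-1, 1].
flagged = [-1.12, 1.62, 1.79, -3.78]

# Expected count outside one SD using the rounded 68% rule.
expected_rule = n * (1 - 0.68)

# Same quantity using the exact standard normal distribution.
z = NormalDist()
expected_exact = n * 2 * (1 - z.cdf(1.0))

# Expected count beyond three SDs, for judging the -3.78 residual.
expected_beyond_3 = n * 2 * (1 - z.cdf(3.0))

observed = sum(1 for r in flagged if abs(r) > 1)

print(round(expected_rule, 2), round(expected_exact, 2), observed)
print(round(expected_beyond_3, 3))
```

The count beyond one standard deviation agrees with expectation, while the expected count beyond three standard deviations is far below one, which is why the residual of -3.78 stands out.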
Problem 3.11
a)
Figure 3-1 Scatterplot of ei and Xi
Clearly, both positive and negative values of ei appear in the diagram. The spread of ei seems to change as Xi changes, but no clear trend in their level exists.
This implies that the model captures the relationship well and the linearity assumption is plausible. However, the apparent relationship between the spread of ei and Xi could be a sign of heteroskedasticity, so further verification is needed.
b)
As shown in figure 3-2 below, the regression is:
The positive coefficient of 0.934 shows that the spread of the errors tends to increase as Xi increases, which supports the existence of heteroskedasticity.
Hypothesis:
H0: Homoscedasticity.
H1: Heteroskedasticity.
The p-values of the estimates are all smaller than the significance level of 0.05, meaning that the model is statistically significant. Therefore, we reject H0 and accept H1. Heteroskedasticity exists, which supports the findings in part (a).
Figure 3-2 Regression
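One common formal check of H0 against H1 is the Breusch-Pagan test, which regresses the squared residuals on Xi. The sketch below illustrates that test. It is not necessarily the exact procedure behind Figure 3-2, the data are hypothetical, and the chi-square reference distribution with one degree of freedom assumes a single predictor and a large-sample approximation.

```python
from math import erfc, sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def breusch_pagan(x, y):
    """Return (slope of e^2 on x, test statistic, approximate p-value)."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    e2 = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    sse = sum(e2)                      # error sum of squares of the original fit
    g0, g1 = fit_line(x, e2)           # auxiliary regression of e^2 on x
    e2_bar = mean(e2)
    ssr_star = sum((g0 + g1 * xi - e2_bar) ** 2 for xi in x)
    stat = (ssr_star / 2) / (sse / n) ** 2
    p_value = erfc(sqrt(stat / 2))     # upper tail of chi-square with 1 df
    return g1, stat, p_value

# Hypothetical data whose scatter widens as x grows.
x = list(range(1, 11))
y = [2 * xi + (0.3 * xi if xi % 2 == 0 else -0.3 * xi) for xi in x]

slope, stat, p_value = breusch_pagan(x, y)
print(slope, stat, p_value)
```

A positive auxiliary slope indicates that the error variance rises with Xi, and a p-value below the chosen significance level leads to rejecting H0 in favour of H1.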
Problem 3.16
a)
Figure 4-1 Scatterplot of Yi and Xi
Figure 4-2 Scatterplot of logYi and Xi
b)
Figure 4-3 Box-Cox Plot of Yi
The ‘optimal’ value of λ is 0.14.
Figure 4-4 Regression Output of Y
Therefore, the value of SSE using the optimal λ is 0.003851.
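The idea behind the Box-Cox search can be sketched as follows: for each candidate λ, transform Y, regress the transformed values on X, and compare the SSE. This is an illustration only. The standardization constants K1 and K2 follow the usual textbook form and are an assumption here, Minitab's internal scaling may differ, and the data are hypothetical (an exact exponential decay, so the log transformation, λ = 0, should win).

```python
from math import exp, log
from statistics import mean

def sse_of_line(x, w):
    """Error sum of squares from the least-squares line of w on x."""
    x_bar, w_bar = mean(x), mean(w)
    sxw = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxw / sxx
    b0 = w_bar - b1 * x_bar
    return sum((wi - b0 - b1 * xi) ** 2 for xi, wi in zip(x, w))

def box_cox_sse(x, y, lam):
    """SSE after the standardized Box-Cox transformation with power lam."""
    k2 = exp(mean([log(yi) for yi in y]))      # geometric mean of Y
    if abs(lam) < 1e-12:
        w = [k2 * log(yi) for yi in y]         # limiting case lam = 0
    else:
        k1 = 1 / (lam * k2 ** (lam - 1))
        w = [k1 * (yi ** lam - 1) for yi in y]
    return sse_of_line(x, w)

# Hypothetical data: Y decays exponentially in X.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [exp(1.5 - 0.45 * xi) for xi in x]

grid = [k / 10 for k in range(-10, 11)]        # lam from -1.0 to 1.0
best = min(grid, key=lambda lam: box_cox_sse(x, y, lam))
print(best)
```

Standardizing with K1 and K2 keeps the SSE values comparable across different λ, which is what makes a direct minimum over the grid meaningful.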
c)
Figure 4-4 Regression Output of logY
Here, the estimated linear regression function for the transformed data is:
logY= 0.6549-0.19549Xi
d)
Figure 4-5 The Fitted Regression Plot
Apparently, the regression line is a good fit: the R-square is high at 99.30% and the points show a strong linear relationship.
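For reference, R-square is one minus the ratio of the error sum of squares to the total sum of squares. A minimal sketch with hypothetical data follows. Note that a high R-square alone does not confirm linearity, so the residual plots in part (e) remain necessary.

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination for the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / ssto

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [0.48, 0.25, 0.09, -0.12, -0.33, -0.50]

r2 = r_squared(x, y)
print(round(100 * r2, 2))
```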
e)
Figure 4-6 The Scatterplot
Figure 4-7 The P-P plot
According to Figure 4-6, the model fits well. The p-value reported on the P-P plot is 0.825 > 0.05, so the hypothesis that the residuals are normally distributed cannot be rejected.
Therefore, the model fits very well, with residuals that satisfy the assumptions of linear regression.
Problem 3.18
a)
Figure 5-1 Scatterplot of Yi and Xi
Yes, a linear relation appears appropriate here, based on the plot in Figure 5-1. As Xi becomes larger, Yi becomes larger as well. However, some clusters exist in the plot, especially when Xi is low, and the data might suffer from heteroscedasticity. Therefore, it might be more suitable to use a transformation to reduce possible heteroscedasticity.
b)
Figure 5-2 Regression Output
The estimated linear regression function is:
c)
Figure 5-3 Fitted Line Plot
The transformed model is a good fit because the R-square of 95.32% is very high, and its p-value is nearly 0.000, which is much smaller than 0.05.
d)
Figure 5-4 Scatterplot of RESI and FITS
Figure 5-5 The P-P Plot
Figure 5-4 shows a non-random, concave pattern, illustrating that the model might not be suitable for describing the relationship between X and Y. The P-P plot also shows a deviation from normality.
Therefore, the transformation does not perform well.