MTH 542 Chapter 8 – Transformations on y – The Box-Cox Method

Clathrate formation data (from Montgomery, Vining, Peck)

Data: y = clathrate formation (mass %), x1 = amount of surfactant, x2 = time (minutes)

clathrate compounds: A type of inclusion compound in which small molecules are trapped in the cagelike lattice of macromolecules.

surfactant: A surface-active agent, including substances commonly referred to as wetting agents, surface tension depressants, detergents, dispersing agents, emulsifiers, and quaternary ammonium antiseptics.

> library(MPV)
> attach(table.b8)
> table.b8
     x1  x2    y
1  0.00  10  7.5
2  0.00  50 15.0
3  0.00  85 22.0
-----
34 0.05  90 46.5
35 0.05 120 50.0
36 0.05 150 51.9

> pairs(y~x1+x2,gap=0.4,cex.labels=1.5)

[Figure: scatterplot matrix of y, x1, and x2]

Residuals against fitted values (fitted values on the horizontal axis) and a normal Q-Q plot of the standardized residuals for the untransformed model:

> plot(fitted(lm(y~x1+x2)),resid(lm(y~x1+x2)))
> qqnorm(rstandard(lm(y~x1+x2)))

Next we look for the most appropriate transformation of the response y to correct the non-normality. We use the Box-Cox method.

Step I. For each value of λ, obtain the transformed response

$$y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda \, gm(y)^{\lambda - 1}} \quad \text{if } \lambda \neq 0, \qquad y^{(\lambda)} = gm(y)\,\ln y \quad \text{if } \lambda = 0,$$

where gm(y) is the geometric mean of the observed responses.

Step II. For each λ, calculate the residual sum of squares RSS(λ) from fitting the model

$$y^{(\lambda)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e$$

   lambda  RSS.lambda
1   -2.00  24794.9525
2   -1.75  15571.3368
3   -1.50   9931.1368
4   -1.25   6446.7541
5   -1.00   4271.8840
6   -0.75   2901.3226
7   -0.50   2031.3151
8   -0.25   1477.9404
9    0.00   1129.1560
10   0.25    916.4661
11   0.50    798.0994
12   0.75    748.9798
13   1.00    754.7438
14   1.25    808.1919
15   1.50    907.2317
16   1.75   1053.7580
17   2.00   1253.1491

Among the values of λ on this grid, RSS(λ) is smallest at λ = 0.75 (RSS = 748.9798), so the optimal value of λ is approximately 0.75.
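Steps I and II can be sketched in R as follows. This is only a minimal sketch, not the code used to produce the table above: the helper names bc_transform and bc_rss are hypothetical (they are not part of MPV or MASS), and the data frame sim is simulated for illustration only (it is not the clathrate data).

```r
# Step I: geometric-mean-scaled Box-Cox transform.
# bc_transform is a hypothetical helper name, not a function from MPV or MASS.
bc_transform <- function(y, lambda) {
  stopifnot(all(y > 0))                 # the transform needs positive responses
  gm <- exp(mean(log(y)))               # geometric mean of y
  if (abs(lambda) < 1e-8) {
    gm * log(y)                         # lambda = 0 case
  } else {
    (y^lambda - 1) / (lambda * gm^(lambda - 1))
  }
}

# Step II: RSS(lambda) from regressing the transformed response on the predictors.
# The response must be the left-hand side of `formula` and a column of `data`.
bc_rss <- function(formula, data, lambdas) {
  resp <- all.vars(formula)[1]          # name of the response column
  sapply(lambdas, function(l) {
    d <- data
    d[[resp]] <- bc_transform(data[[resp]], l)
    sum(resid(lm(formula, data = d))^2)
  })
}

# Illustration on SIMULATED data (an assumption; not the clathrate data).
set.seed(1)
sim <- data.frame(x1 = runif(40, 0, 0.05), x2 = runif(40, 10, 250))
sim$y <- (2 + 40 * sim$x1 + 0.02 * sim$x2 + rnorm(40, sd = 0.2))^2

lambdas <- seq(-2, 2, by = 0.25)
rss_tab <- data.frame(lambda = lambdas,
                      RSS.lambda = bc_rss(y ~ x1 + x2, sim, lambdas))
print(rss_tab)
rss_tab$lambda[which.min(rss_tab$RSS.lambda)]  # grid value with the smallest RSS
```

With table.b8 loaded, the analogous call on the clathrate data would be bc_rss(y ~ x1 + x2, table.b8, lambdas), and its output can be compared against the table above.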
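From theory we know that the value of λ that minimizes RSS(λ) is the same as the value of λ that maximizes the log-likelihood function

$$L(\lambda) = -\frac{n}{2}\log\!\left(\frac{\widetilde{RSS}(\lambda)}{n}\right) + (\lambda - 1)\sum_{i=1}^{n}\log y_i ,$$

where $\widetilde{RSS}(\lambda)$ is the residual sum of squares obtained with the unscaled response $(y^{\lambda} - 1)/\lambda$ (or $\ln y$ when λ = 0). Since $\sum_{i=1}^{n}\log y_i = n \log gm(y)$, the second term is exactly what the geometric-mean scaling in Step I absorbs: in terms of the RSS(λ) of Step II, $L(\lambda) = -\frac{n}{2}\log\left(RSS(\lambda)/n\right)$, so minimizing RSS(λ) maximizes L(λ).

Using the boxcox function from the MASS package (MASS ships with standard R distributions as a recommended package; install it first if it is missing), we obtain a graph of the above log-likelihood function:

> library(MASS)
> boxcox(lm(y~x1+x2), plotit=TRUE)

[Figure: log-likelihood L(λ) plotted against λ]

Indeed, the maximum is attained at some point around 0.75. A Q-Q plot for the residuals corresponding to the model with the optimal transformation of the response is:

[Figure: Q-Q plot of the residuals from the model with the transformed response]

Observe that the boxcox function in R also gives a 95% confidence interval for λ. When a confidence interval for λ is given, we can select a convenient value for λ from this interval. In particular, if the interval contains the value 1, we may just take λ = 1, which corresponds to no transformation.

Some comments on the Box-Cox method:

- It is influenced by outliers.
- It requires positive responses. If some y_i are zero or negative, we can add a constant to all the y's to make them positive.
- It is debated whether estimating the parameter λ should cost one degree of freedom (one fewer residual df for the model). There is no clear answer.
- There are other ways to transform the response.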
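The log-likelihood and the 95% confidence interval for λ described above can also be computed by hand. The following is a minimal sketch under stated assumptions: it uses the usual likelihood-ratio cutoff, max L(λ) minus half the 0.95 quantile of a chi-square distribution with 1 df (assumed here to match the line drawn by boxcox); bc_loglik is a hypothetical helper name; and sim is simulated data, not the clathrate data.

```r
# Profile log-likelihood
#   L(lambda) = -(n/2) * log(RSS(lambda)/n) + (lambda - 1) * sum(log(y)),
# with RSS taken from the UNSCALED response (y^lambda - 1)/lambda
# (log(y) when lambda = 0). bc_loglik is a hypothetical helper name.
bc_loglik <- function(formula, data, lambdas) {
  resp <- all.vars(formula)[1]          # name of the response column
  y <- data[[resp]]
  stopifnot(all(y > 0))                 # the method needs positive responses
  n <- length(y)
  sapply(lambdas, function(l) {
    d <- data
    d[[resp]] <- if (abs(l) < 1e-8) log(y) else (y^l - 1) / l
    rss <- sum(resid(lm(formula, data = d))^2)
    -n / 2 * log(rss / n) + (l - 1) * sum(log(y))
  })
}

# Illustration on SIMULATED data (an assumption; not the clathrate data).
set.seed(1)
sim <- data.frame(x1 = runif(40, 0, 0.05), x2 = runif(40, 10, 250))
sim$y <- (2 + 40 * sim$x1 + 0.02 * sim$x2 + rnorm(40, sd = 0.2))^2

lambdas <- seq(-2, 2, by = 0.05)
ll <- bc_loglik(y ~ x1 + x2, sim, lambdas)

lambda_hat <- lambdas[which.max(ll)]            # grid maximizer of L(lambda)
cutoff <- max(ll) - qchisq(0.95, df = 1) / 2    # likelihood-ratio cutoff
ci <- range(lambdas[ll >= cutoff])              # approximate 95% interval on the grid
contains_one <- ci[1] <= 1 && 1 <= ci[2]        # TRUE suggests keeping lambda = 1

print(c(lambda_hat = lambda_hat, lower = ci[1], upper = ci[2]))
print(contains_one)
```

Taking the range of the grid points above the cutoff assumes the log-likelihood has a single peak, and the interval endpoints are only as precise as the grid spacing.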