
STA 302/1001-Methods of Data Analysis I
Sections L0101/2001, L0201 & L0301
Shivon Sue-Chee
Module 2
Shivon Sue-Chee Simple Linear Regression 1
Module 2 - Simple Linear Regression
- 2.1. The SLR Model
- 2.2. Estimating regression parameters
- 2.3. Properties of LS estimators
- 2.4. Statistical Assumptions of SLR
- 2.5. SLR In R: Data Example
2.1. The SLR Model
- What is a linear model?
- Examples of linear and non-linear models
- What is an SLR model?
General form of models
General form of a model for Y in terms of three predictors:
  Y = f(X1, X2, X3) + e
- f is some unknown function
- e is the error not accounted for in f
- Issue: if f is a smooth, continuous function, then there are many possibilities for f. Also, we would need infinite data to estimate f directly.
- A fix: restrict f to a linear form.
What is a Linear Model?
Definition (Linear Model)
In a linear model for Y, the parameters enter linearly, i.e., Y is linear in terms of the parameters.
Examples of linear models:
- Y = β0 + β1 x + e
- Y = β0 + β1 log x + e
- Y = β0 + β1 x + β2 x² + e
- Y = β0 + β1 log x1 + β2 x2 + β3 x1 x2 + e
- Y = β0 x^{β1} e
- Y = exp(β0 + β1 x + e)
- Tip: apply a suitable transform to Y to see that the model is linear.
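The tip in the last bullet can be made concrete for the last two models in the list above; taking logs exposes the linear-in-parameters structure:

```latex
% Power model with multiplicative error e > 0:
%   Y = \beta_0 x^{\beta_1} e
\log Y = \log\beta_0 + \beta_1 \log x + \log e
% Writing Y' = \log Y,\ \beta_0' = \log\beta_0,\ x' = \log x,\ e' = \log e
% recovers the SLR form  Y' = \beta_0' + \beta_1 x' + e'.
% Similarly, Y = \exp(\beta_0 + \beta_1 x + e) gives
%   \log Y = \beta_0 + \beta_1 x + e.
```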
Examples of Non-Linear Models
- Y = β0 + exp(β1 x) + e
- Y = exp(β0 + exp(β1 x)) + e
- Y = β0 + β1 x^{β2} + e
- Y = β0 + β1 x − exp(β2 + β3 x) + e
Linear and Non-linear Models
- True non-linear models are rare.
- Linear models can handle complex datasets.
- Because predictors can be transformed and combined in many ways, linear models are very flexible.
- All straight lines are linear models, but not all linear models are straight lines.
Simple Linear Regression (SLR) Models
  Y = β0 + β1 X + e
- Y: dependent, response, or output variable
- X: independent, explanatory, predictor, or input variable
- β0: intercept parameter
- β1: slope parameter
- e: random error/noise; variation in the measurements that we cannot account for
Q: Why is this model 'simple'? How useful is this simple model?
Estimating β in SLR
2.2. Estimating the regression parameters
- The Least Squares (LS) Method
- The Maximum Likelihood Estimator (MLE)
- Bayesian Approach
Fitting an SLR model
MODEL:
  Y = β0 + β1 X + e
AIM: Given a specific value of X, that is, X = x, find the expected value of Y, that is,
  E(Y |X = x)
- need estimates of the regression parameters β0, β1
- need to assess the fit
Estimating β in SLR
- Get data (observational or experimental):
  - n pairs
  - bivariate data: (x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_n, y_n)
- Notation:
  - Estimators: β̂0, β̂1
  - Estimates: b0, b1
Geometrical representation of estimating β
(Figure 2.1, Faraway, 2005)
- The response Y is in an n-dimensional space, Y ∈ Rⁿ
- The regression parameters are in a (p + 1)-dimensional space, β ∈ R^{p+1}
- where p is the number of predictors, so p + 1 is the number of regression parameters; p < n
(i) The Least Squares (LS) Method
- Consider the 'least squares criterion'
  Σ_{i=1}^n [y_i − (b0 + b1 x_i)]²
- Method: LEAST SQUARES METHOD - find the estimators b0, b1 that minimize the criterion.
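A minimal numerical sketch of the criterion (in Python rather than the course's R; the data values are invented for illustration): evaluate the sum of squared vertical deviations on a grid and check that the grid minimizer agrees with a closed-form least squares fit.

```python
import numpy as np

# Made-up bivariate data; any (x_i, y_i) pairs would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def criterion(b0, b1):
    """Sum of squared vertical deviations from the line b0 + b1*x."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Brute-force grid search for the minimizing pair (b0, b1)
b0_grid = np.linspace(-2.0, 2.0, 201)
b1_grid = np.linspace(0.0, 4.0, 201)
S = np.array([[criterion(b0, b1) for b1 in b1_grid] for b0 in b0_grid])
i, j = np.unravel_index(np.argmin(S), S.shape)

# Closed-form least squares fit for comparison
b1_hat, b0_hat = np.polyfit(x, y, 1)   # returns slope, then intercept
print(b0_grid[i], b1_grid[j])          # grid minimizer
print(b0_hat, b1_hat)                  # closed-form LS estimates
```

The two answers coincide up to the grid spacing, which is the whole point of the closed-form solution derived next.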
The LS Method: Fitted Line and Residuals
- Predicted or fitted value for each x_i:
  ŷ_i = b0 + b1 x_i
- Residuals:
  ê_i = y_i − ŷ_i
The LS Method: Why vertical distances?
- We want to predict Y from X, so we want ŷ_i to be as close as possible to y_i.
- If we minimize the horizontal distances instead, we get different values for b0 and b1.
- Regression is not symmetric!
- It matters which variable is dependent and which is independent.
The LS Method: Why squared deviations?
- makes no statistical assumptions
- mean squared error is the most common way to measure error in statistics
- LS estimators have 'good' properties
The LS Method: Analytical Derivations
Minimize Σ_{i=1}^n [y_i − (b0 + b1 x_i)]²
- using calculus to minimize the criterion: set the partial derivatives with respect to b0 and b1 to zero.
- get the NORMAL EQUATIONS:
  wrt b0: Σ_{i=1}^n y_i = n b0 + b1 Σ_{i=1}^n x_i
  wrt b1: Σ_{i=1}^n x_i y_i = b0 Σ_{i=1}^n x_i + b1 Σ_{i=1}^n x_i²
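The two normal equations form a 2×2 linear system in (b0, b1). A short Python sketch (same invented data as before) solves it directly:

```python
import numpy as np

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Normal equations as a 2x2 linear system in (b0, b1):
#   sum(y)   = n*b0      + b1*sum(x)
#   sum(x*y) = b0*sum(x) + b1*sum(x^2)
A = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b0, b1 = np.linalg.solve(A, rhs)

# Agrees with b1 = SXY/SXX and b0 = ybar - b1*xbar
print(b0, b1)
```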
(ii) Maximum Likelihood Estimation (MLE)
- Parameter θ; estimator θ̂_MLE
- MLE steps:
  1. Define the likelihood function as a function of the parameter(s) θ,
     L(θ) = Distribution(Y |θ),
     considered a working model of the parameter given the specific data.
  2. Find the value of the parameter that maximizes the likelihood function, that is, the estimator that gives the highest probability density to the observed data:
     θ̂_MLE = argmax_θ L(θ)
MLE Properties
- Regularity conditions are needed to derive the asymptotic distribution of the MLE.
- Inference follows the frequentist paradigm.
- MLEs have nice properties:
  - asymptotically unbiased,
  - consistent,
  - sufficient,
  - minimum variance,
  - the invariance principle holds.
MLE Example
1. Consider a normal likelihood for Y in terms of the parameters β, σ²:
   L(β, σ²) ∼ N_n(xβ, σ²I_n)
2. Using calculus we get:
   β̂0,MLE = b0 = ȳ − b1 x̄
   β̂1,MLE = b1 = SXY/SXX
   σ̂²_MLE = Σ_{i=1}^n ê_i² / n
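A quick numerical check (Python sketch with synthetic data; the true values β0 = 1, β1 = 2, σ = 1.5 are made up): the closed-form LS estimates maximize the normal log-likelihood, and the MLE of σ² divides the residual sum of squares by n, not n − 2.

```python
import numpy as np

# Synthetic data under assumed truth beta0=1, beta1=2, sigma=1.5
rng = np.random.default_rng(302)
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.5, n)

# Closed-form LS estimates (= MLEs of beta0, beta1 under normal errors)
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
rss = np.sum((y - (b0 + b1 * x)) ** 2)

sigma2_mle = rss / n      # MLE of sigma^2 divides by n (biased)
s2 = rss / (n - 2)        # the unbiased estimate divides by n - 2

def loglik(beta0, beta1, sigma2):
    r = y - (beta0 + beta1 * x)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(r ** 2) / sigma2

# Nudging any parameter away from the closed-form values can only
# lower the log-likelihood, confirming they are the maximizers.
for d in (0.01, -0.01):
    assert loglik(b0 + d, b1, sigma2_mle) <= loglik(b0, b1, sigma2_mle)
    assert loglik(b0, b1 + d, sigma2_mle) <= loglik(b0, b1, sigma2_mle)
    assert loglik(b0, b1, sigma2_mle + d) <= loglik(b0, b1, sigma2_mle)
print(sigma2_mle, s2)
```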
(iii) Bayesian Approach to estimating β
- The parameters are considered random, not fixed constants:
  p(β), p(σ²)
1. Hence, the parameters have a prior (i.e., before observing the data) distribution. Priors can be proper or improper:
   π(β, σ²)
2. Assume a likelihood for Y as a function of the parameters:
   L(β, σ²) = Distribution(Y |β, σ²)
3. Derive the posterior distribution of the parameters given the data:
   p(β, σ²|y) ∝ L(β, σ²) × π(β, σ²)
Bayesian Approach to estimating β
- Obtain credible (rather than confidence) intervals for β, where the interpretation differs!
- With a credible interval, we speak about the probability that the unknown parameter falls into the interval.
- Often more computationally challenging than the LS/ML approaches.
Bayesian Approach: Example
1. Choose the standard improper prior
   π(β, σ²) = p(β) × p(σ²) ∝ σ⁻²
2. Assume the likelihood of Y is
   L(β, σ²) ∼ N_n(xβ, σ²I_n)
3. The posterior distribution of β given the data is the kernel of a (p + 1)-dimensional t distribution.
Results:
- Posterior means are identical to the LS estimates and, under normality, the ML estimates.
- 100(1 − α)% credible intervals yield the same numerical results as the 100(1 − α)% confidence intervals, but the interpretations differ.
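An illustrative Python sketch of the "flat prior reproduces LS" result, not the full derivation: treating σ as known and fixing the intercept at its LS value (both simplifying assumptions, with invented data), the posterior for the slope is proportional to the likelihood, and its mean, approximated on a grid, coincides with the LS slope.

```python
import numpy as np

# Synthetic data under assumed truth beta0=0.5, beta1=1.5, sigma=1
rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0.0, 5.0, n)
sigma = 1.0                      # sigma treated as known for this illustration
y = 0.5 + 1.5 * x + rng.normal(0.0, sigma, n)

# LS estimates for reference
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

# Grid posterior for beta1 with beta0 fixed at b0 and a flat prior:
# posterior ∝ likelihood, normalized numerically over the grid.
grid = np.linspace(b1 - 1.0, b1 + 1.0, 2001)
loglik = np.array([-0.5 * np.sum((y - (b0 + g * x)) ** 2) / sigma ** 2
                   for g in grid])
post = np.exp(loglik - loglik.max())
post /= post.sum()
post_mean = np.sum(grid * post)
print(post_mean, b1)   # essentially identical
```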
Properties of LS Estimators
2.3. Properties of LS estimators
- Properties of the fitted line
- Properties of regression parameter estimators
- Gauss-Markov Theorem
Least Squares Regression Parameter Estimates
- Intercept parameter estimate
  b0 = ȳ − b1 x̄   (2.3)
- Slope parameter estimate
  b1 = (Σ_{i=1}^n x_i y_i − n x̄ ȳ)/(Σ_{i=1}^n x_i² − n x̄²) = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)² = SXY/SXX   (2.4)
Exercise: Show 2.4.
Showing Equation 2.4 (SJS)
Interpreting Regression Parameter Estimates
- Slope, b1: when x changes by 1 unit, the corresponding average change in y is the slope.
- Intercept, b0: the average value of y when x = 0. (No practical interpretation unless 0 is within the range of the predictor (x) values.)
Properties of Fitted LS Regression Line
- Fitted Line:
  ŷ = b0 + b1 x
Show.
1. The Average of the Residuals is always 0:
   Σ_{i=1}^n ê_i ≡ 0
Properties of Fitted Regression Line
2. The Sum of Squares of Residuals is NOT 0, unless the fit to the data is perfect!
   Σ_{i=1}^n ê_i² ≠ 0
Properties of Fitted LS Regression Line
3. Σ_{i=1}^n ê_i x_i = 0
4. Σ_{i=1}^n ê_i ŷ_i = 0
5. Σ_{i=1}^n ŷ_i = Σ_{i=1}^n y_i
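These identities are easy to confirm numerically; a Python sketch with arbitrary invented data:

```python
import numpy as np

# Arbitrary illustrative data
rng = np.random.default_rng(7)
x = rng.uniform(0.0, 10.0, 30)
y = 3.0 - 0.5 * x + rng.normal(0.0, 2.0, 30)

b1, b0 = np.polyfit(x, y, 1)   # LS fit: slope, then intercept
yhat = b0 + b1 * x             # fitted values
ehat = y - yhat                # residuals

print(ehat.sum())              # property 1: 0 (up to rounding)
print((ehat ** 2).sum())       # property 2: positive unless the fit is perfect
print((ehat * x).sum())        # property 3: 0
print((ehat * yhat).sum())     # property 4: 0
print(yhat.sum() - y.sum())    # property 5: 0
```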
Gauss-Markov Theorem
Theorem (Gauss-Markov Theorem)
Under the conditions of the simple linear regression model, the least-squares parameter estimators are BLUE ("Best Linear Unbiased Estimators").
- parameter, θ; estimator, θ̂
- Unbiased: E(θ̂) = θ, i.e., does not systematically overestimate or underestimate
- Linear: linear in the response values y_i
- "Best": minimum variance among all linear unbiased estimators
Rules of expectation
- E(a) = a, a ∈ R
- E(aY) = aE(Y)
- E(X ± Y) = E(X) ± E(Y)
- E(XY) = E(X)E(Y), if X and Y are independent
- Tower rule: E(Y) = E[E(Y |X)]
Properties of Slope Estimator: Expectation
Recall:
  b1 = (Σ_{i=1}^n x_i y_i − n x̄ ȳ)/(Σ_{i=1}^n x_i² − n x̄²) = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)² = SXY/SXX   (2.4)
Since Σ_{i=1}^n (x_i − x̄) = 0,
  Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = Σ_{i=1}^n (x_i − x̄)y_i − ȳ Σ_{i=1}^n (x_i − x̄) = Σ_{i=1}^n (x_i − x̄)y_i
- Let c_i = (x_i − x̄)/SXX.
- Then rewrite b1 as
  b1 = Σ_{i=1}^n c_i y_i
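A quick numeric check of the rewrite (Python, toy data invented for illustration): the weights c_i sum to 0, satisfy Σ c_i x_i = 1, and Σ c_i y_i reproduces b1.

```python
import numpy as np

# Toy data to check the c_i rewrite of b1
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.2, 5.9, 8.1])

xbar = x.mean()
SXX = np.sum((x - xbar) ** 2)
c = (x - xbar) / SXX                  # the weights c_i

b1_direct = np.sum((x - xbar) * (y - y.mean())) / SXX
b1_weights = np.sum(c * y)            # b1 as a linear combination of the y_i

print(c.sum())                        # 0: the weights sum to zero
print(np.sum(c * x))                  # 1: needed for E(b1|X) = beta1
print(b1_direct, b1_weights)          # identical
```

The two facts printed above are exactly what make the expectation calculation on the next slide collapse to β1.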
Properties of Slope Estimator: Expectation
- Treat the X's as fixed.
- Mean of the slope estimator, b1:
  E(b1|X) = E[Σ_{i=1}^n c_i y_i | X] = Σ_{i=1}^n c_i (β0 + β1 x_i) = β0 Σ_{i=1}^n c_i + β1 Σ_{i=1}^n c_i x_i = β1,
  since Σ_{i=1}^n c_i = 0 and Σ_{i=1}^n c_i x_i = 1.
Properties of Intercept Estimator: Expectation
- Recall: b0 = ȳ − b1 x̄
- Mean of the intercept estimator, b0:
  E(b0|X) = E[(ȳ − b1 x̄)|X] = (β0 + β1 x̄) − β1 x̄ = β0
Variance and Covariance
- V(a) = 0, a ∈ R
- V(aY) = a²V(Y)
- Cov(X, Y) = E{(X − E(X))(Y − E(Y))} = E(XY) − E(X)E(Y)
- Cov(Y, Y) = V(Y)
- V(Y) = V[E(Y |X)] + E[V(Y |X)]
- V(X ± Y) = V(X) + V(Y) ± 2Cov(X, Y)
- Cov(X, Y) = 0, if X and Y are independent
- Cov(aX + bY, cU + dW) = ac Cov(X, U) + ad Cov(X, W) + bc Cov(Y, U) + bd Cov(Y, W)
- Correlation: ρ_XY = Cov(X, Y)/√(V(X)V(Y))
Properties of Slope Estimator: Variance
- Variance of the slope estimator, b1:
  Var(b1|X) = Var[Σ_{i=1}^n c_i y_i | X] = Σ_{i=1}^n c_i² Var(y_i|X) = σ² Σ_{i=1}^n c_i² = σ²/SXX,
  using the uncorrelatedness of the y_i's and Σ_{i=1}^n c_i² = 1/SXX.
Properties of Intercept Estimator: Variance
- Variance of the intercept estimator, b0:
  Var(b0|X) = Var[(ȳ − b1 x̄)|X] = σ²(1/n + x̄²/SXX)
Statistical Assumptions of SLR
2.4. Statistical Assumptions of SLR
- SLR Assumptions
- Estimating σ²
- Sampling distributions of slope and intercept estimators
SLR Assumptions
1. We assume that Y is related to x by the SLR model
   Y_i = β0 + β1 x_i + e_i, i = 1, ..., n
   or
   E(Y |X = x_i) = β0 + β1 x_i.
   In other words, the linear model is appropriate.
And the following three Gauss-Markov conditions:
2. The errors e_1, e_2, ..., e_n have mean 0, i.e., E(e_i) = 0.
3. The errors e_1, e_2, ..., e_n have a common variance σ², i.e., Var(e_i) = σ². The variation is the same for all observations, i.e., homoscedastic.
4. The errors e_1, e_2, ..., e_n are uncorrelated, i.e., Cov(e_i, e_j) = 0, i ≠ j.
Estimating σ²: variance of the random error term
- The random error e_i has mean 0 and variance σ².
- The variance σ² is another parameter of the SLR model.
- Aim: estimate σ².
- Why: to measure the variability of our estimates of Y and to carry out inference on our model.
Estimating σ²: variance of the random error term
- Notice that:
  e_i = Y_i − (β0 + β1 x_i) = Y_i − (unknown regression line at x_i)
- Replacing β0 and β1 by their respective least squares estimates, we estimate the errors by
  ê_i = Y_i − (b0 + b1 x_i) = Y_i − (estimated regression line at x_i)
- Using the estimated errors, we can show that an unbiased estimate of σ² is
  S² = Σ_{i=1}^n ê_i² / (n − 2) = RSS/(n − 2)
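A simulation sketch of the n − 2 divisor (Python; the design and the true values β0 = 1, β1 = 2, σ² = 4 are made up): averaging S² over many simulated samples lands near the true σ², while dividing the RSS by n underestimates it.

```python
import numpy as np

# Simulation with made-up truth: beta0=1, beta1=2, sigma^2=4
rng = np.random.default_rng(42)
n, sigma2 = 10, 4.0
x = np.linspace(0.0, 9.0, n)              # fixed design
reps = 5000

s2_vals, mle_vals = [], []
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b1, b0 = np.polyfit(x, y, 1)          # slope, then intercept
    rss = np.sum((y - (b0 + b1 * x)) ** 2)
    s2_vals.append(rss / (n - 2))         # divides by n - 2: unbiased
    mle_vals.append(rss / n)              # divides by n: biased downward

print(np.mean(s2_vals))   # close to sigma2 = 4
print(np.mean(mle_vals))  # close to 4 * (n - 2)/n = 3.2
```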
Statistical assumption for inference
- In order to make inferences, we need one more assumption about the errors, the e_i's.
- Assume: the errors are Normally distributed, i.e.,
  e_i ∼ N(0, σ²)
  or
  e ∼ N_n(0, σ²I_n)
Statistical assumption for inference
Implications:
1. The normality assumption implies that the errors are independent (since they are uncorrelated).
2. Since Y_i = β0 + β1 x_i + e_i, i = 1, ..., n, Y_i|x_i is normally distributed.
3. The LS estimates of β0 and β1 are equivalent to their MLEs.
Normal Error Regression Model
Sampling distributions of Slope and Intercept Estimators
- Slope: Since b1 = Σ_{i=1}^n c_i y_i is a linear combination of the y_i's, b1|X is also normally distributed, i.e.,
  β̂1 ∼ N(β1, σ²/SXX)
- Intercept: Since b1|X is normally distributed, ȳ is normally distributed, and b0|X is a linear combination of b1|X and ȳ, we have that
  β̂0 ∼ N[β0, σ²(1/n + x̄²/SXX)]
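A simulation check of the slope's sampling distribution (Python; the fixed design and true values β0 = 1, β1 = 2, σ = 3 are made up): across repeated samples, b1 centers on β1 with variance close to σ²/SXX.

```python
import numpy as np

# Simulation with made-up truth beta0=1, beta1=2, sigma=3
rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 25, 1.0, 2.0, 3.0
x = np.linspace(0.0, 10.0, n)             # fixed design
SXX = np.sum((x - x.mean()) ** 2)

b1_draws = np.empty(4000)
for r in range(4000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)
    b1_draws[r], _ = np.polyfit(x, y, 1)  # keep the slope of each refit

print(b1_draws.mean())                    # centers on beta1 = 2
print(b1_draws.var(), sigma ** 2 / SXX)   # empirical vs theoretical variance
```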
SLR in R:
Old Faithful data models