ECMT1020 Introduction to Econometrics Week 5, 2021S1
Lecture 5: Multiple Regression Analysis
Instructor: Ye Lu
Please read Chapter 3 of the textbook.
Contents
1 Linear regression model with two explanatory variables
1.1 Introduction
1.2 Coefficient estimates
1.3 Interpretation of coefficients in multiple regression
1.3.1 Population parameter
1.3.2 Coefficient estimates
1.4 Properties of coefficient estimates
1.4.1 Unbiasedness
1.4.2 Efficiency
1.4.3 Precision/variance
1.5 t tests and confidence intervals
2 Multicollinearity
2.1 The issue caused by linear dependence
2.2 Solutions
2.2.1 Direct method
2.2.2 Indirect method
3 Goodness of fit: R2 and adjusted R2
3.1 The R2 and its property
3.2 Adjusted R2
4 Prediction
In the last two lectures, we discussed the simple regression model where there is only one
explanatory variable. In general, there will be several or even many explanatory variables
and we wish to quantify the impact of each, controlling for the effects of others. However,
controlled experiments are usually not possible in economics. Understanding what we
mean by ‘controlling for’ is the first learning goal of this chapter.
Technically, much of the discussion of multiple regression analysis is a straightforward
extension of the simple regression model that we learned in the last two lectures.
What’s new here? → The foremost answer is ‘multicollinearity’, which occurs when
the variances of the estimators, and hence their precision, are adversely affected by linear
relationships among the explanatory variables.
We will leave the discussion of the F test for model goodness of fit, for both simple
regression and multiple regression, to the next lecture in Week 6. The mid-semester test in
Week 7 will cover the material up to this week’s lecture; the F test in regression analysis
will not be covered in the mid-semester test in Week 7.
1 Linear regression model with two explanatory variables
In this section, we focus on the model with two explanatory variables. The discussion readily
extends to cases with more than two explanatory variables.
1.1 Introduction
We consider the regression
Y = β1 + β2X2 + β3X3 + u (1)
where
• Y = EARNINGS, the hourly earnings measured in dollars;
• X2 = S, years of schooling (highest grade completed);
• X3 = EXP , years spent working after leaving full-time education (experience);
• u is the disturbance term.
Why do we start the labeling of the explanatory variables with X2? Where is X1?
• You may consider X1 as a variable which takes value 1 for any observation in any
sample. In other words, X1 = 1 always and is used to construct the intercept term.
• If we let X1 = 1, then the regression model (1) can be written as
Y = β1X1 + β2X2 + β3X3 + u.
It looks nicer, doesn’t it?
Interpretations of the parameters:
• β1: how much a respondent would earn, on average, if they have zero years of schooling
and no work experience.
• β2: how much will the hourly earnings change, on average, for every additional year
of schooling, controlling for the effect of work experience (X3). In more mathematical
terms: how will Y change, on average, if X2 increases by one unit, holding X3 constant.
• β3: how much will the hourly earnings change, on average, for every additional year
of work experience, controlling for the effect of schooling (X2). In more mathematical
terms: how will Y change, on average, if X3 increases by one unit, holding X2 constant.
One of the main learning goals in this chapter is to understand, from different perspectives,
what ‘controlling for’ means.
If we plot the data in a 3-dimensional (3D) space, then estimating β1, β2 and β3 in the
model is equivalent to finding a best-fit plane.[1]
[1] Recall that for the OLS regression with a single explanatory variable, we were looking for a best-fit line
through the data plot in a 2-dimensional space.
1.2 Coefficient estimates
Given a sample with n observations, we have
Yi = β1 · 1 + β2X2i + β3X3i + ui, (2)
for i = 1, . . . , n. Derivation of the multiple regression coefficient estimates is better done
(and should always be done) using matrix algebra, also called linear algebra. This is also
repeatedly emphasized in the textbook throughout this chapter.
Just to give you a taste of what using matrix algebra means (of course not required
in this course): we may stack the observations into vectors and matrices instead of
dealing with them individually:
$$
Y := \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}_{n \times 1}, \qquad
X := \begin{pmatrix} 1 & X_{21} & X_{31} \\ 1 & X_{22} & X_{32} \\ \vdots & \vdots & \vdots \\ 1 & X_{2n} & X_{3n} \end{pmatrix}_{n \times 3}, \qquad
u := \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}_{n \times 1}.
$$
Moreover, we also look at the model parameters as a whole by defining a parameter vector
$$
\beta := \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}.
$$
Then we can write the regression in (2) in a clean form
Y = Xβ + u
using the rules for multiplying matrices. The OLS formula for β1, β2 and β3 can be obtained
all at once:
$$
\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{pmatrix} = (X^{T}X)^{-1}X^{T}Y, \qquad (3)
$$
where XT means the transpose of matrix X. Isn’t this formula clean and nice!
I would recommend that anyone who wants to learn more econometrics, or wants to become
more proficient at performing data analysis, take a linear algebra course, or learn linear algebra
from MIT Professor Gilbert Strang! You will also learn to use matrix algebra to con-
duct econometric analysis in more advanced econometrics courses such as ECMT2160 and
ECMT3110 taught by Professor Brendan Beare at Usyd.
Without matrix algebra, the formulas for the OLS estimators are even more clumsy than
those in the simple regression analysis, not surprisingly.
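To see formula (3) in action, here is a minimal numpy sketch (not required for this course) that builds the n × 3 matrix X and computes β̂ = (XᵀX)⁻¹XᵀY. The data are simulated, so the values of S, EXP and EARNINGS and the “true” coefficients are purely illustrative, not the textbook sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated stand-ins for the textbook variables (purely illustrative values)
S = rng.integers(8, 21, size=n).astype(float)     # years of schooling
EXP = rng.integers(0, 31, size=n).astype(float)   # years of work experience
u = rng.normal(0, 5, size=n)                      # disturbance term
EARNINGS = -10 + 2.5 * S + 0.5 * EXP + u          # assumed "true" coefficients

# n x 3 design matrix: a column of ones, then X2 = S and X3 = EXP
X = np.column_stack([np.ones(n), S, EXP])
Y = EARNINGS

# OLS formula (3): beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)
print("beta_hat (intercept, schooling, experience):", beta_hat)
```

In practice one would use `np.linalg.lstsq` or a regression package rather than inverting XᵀX explicitly; the explicit inverse is shown here only to mirror formula (3).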
1.3 Interpretation of coefficients in multiple regression
The question here is: how do we interpret β2 and βˆ2 in the multiple regression (1)?[2]
[2] It is clear that the same question can be asked for β3 and βˆ3 in an analogous way.
1.3.1 Population parameter
We look at β2 first.
• Note that β2 is the parameter in front of the explanatory variable X2 = S for years of
schooling.
• The question is what is the difference between running the simple regression (discussed
in the previous chapter):
Y = β1 + β2X2 + u, (4)
and running the multiple regression with an extra explanatory variable X3 = EXP for
years of work experience:
Y = β1 + β2X2 + β3X3 + u. (5)
Recall:
• Interpretation of β2 in the simple regression (4): how much Y (hourly earnings) changes,
on average, for every one-unit increase in X2 (an additional year of schooling). This
interpretation makes sense only if there is no omitted variable in the disturbance term
that is correlated with X2.
• Interpretation of β2 in the multiple regression (5), as we stated before: how much Y
(hourly earnings) changes, on average, for every one-unit increase in X2 (an additional
year of schooling), controlling for the effect of X3 (work experience).
The difference is this extra “controlling for”: if X3 does have an effect on Y , and X3 is correlated
with X2, then β2 in the simple regression (4) and β2 in the multiple regression (5) can be
very different! ⇒ Omitted variable bias
• If X3 = EXP , which positively affects Y = EARNINGS, is omitted from the simple
regression (4), it is left in the disturbance term. Note that for a person of a given age,
work experience is often negatively correlated with years of schooling. We have
X2 = S ↑ ⇒ X3 = EXP ↓ ⇒ Y = EARNINGS ↓
• Therefore, omitting the work experience variable X3 from the regression (4) explaining
hourly earnings can lead us to underestimate the effect of schooling X2.
1.3.2 Coefficient estimates
Corresponding to the understanding of the population parameters, we may
• denote β˜2 as the OLS estimator for β2 in the simple regression (4);
• denote βˆ2 as the OLS estimator for β2 in the multiple regression (5).
We want to compare β˜2 and βˆ2. By checking the formulas of β˜2 and βˆ2, you can see that
their values certainly differ. In fact, in our earnings regression example, both βˆ2 and β˜2
are positive and βˆ2 > β˜2. This can be seen from the comparison of the two slopes estimated
from the simple regression without X3 and from the multiple regression with X3, shown in
Figure 3.2 in the textbook, or from the Stata outputs. Again, omitting work experience
as an explanatory variable for hourly earnings leads us to underestimate the effect of
schooling on hourly earnings.
Another interesting way to understand βˆ2 is via the so-called Frisch-Waugh-Lovell theorem
about the “purged regression”: we can purge the effect of work experience to eliminate the
distortion caused by omitting this variable. Below are the details.
Run regression of Y on X3. The fitted regression is
Yˆ = c1 + c2X3 (6)
where c1 and c2 denote, temporarily, the OLS estimators in the regression of Y on X3. We
denote the fitted residual from the above fitted regression as
EEARN := Y − Yˆ .
We understand EEARN as the “purged earnings” in the sense that the effect of X3 (work
experience) is “purged” from Y (earnings) by fitting a regression like (6).
Run regression of X2 on X3. The fitted regression is
Xˆ2 = d1 + d2X3 (7)
where d1 and d2 denote, temporarily, the OLS estimators in the regression of X2 on X3. We
denote the fitted residual from the above fitted regression as
ES := X2 − Xˆ2.
We understand ES as the “purged schooling” in the sense that the effect of X3 (work
experience) is “purged” from X2 (schooling) by fitting a regression like (7).
Now, using the “purged earnings” EEARN and the “purged schooling” ES obtained as
residuals from the fitted regression (6) and (7), we can run a linear regression of EEARN
on ES:
“purged regression”: EEARN = γ1 + γ2ES + u. (8)
The Frisch-Waugh-Lovell theorem says: the OLS estimator of γ2 in the “purged regres-
sion” (8) is the same as the OLS estimator of β2 in the multiple regression (5)! You can
check this from the Stata outputs.
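As a quick numerical check of the Frisch-Waugh-Lovell result, the sketch below runs the multiple regression and the “purged regression” on simulated data and confirms that the two schooling coefficients coincide. The data-generating process is made up for illustration; only the purging steps mirror regressions (6)-(8).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
EXP = rng.integers(0, 31, size=n).astype(float)
S = np.clip(20 - 0.3 * EXP + rng.normal(0, 2, size=n), 8, 20)   # schooling correlated with experience
EARNINGS = -10 + 2.5 * S + 0.5 * EXP + rng.normal(0, 5, size=n)

def ols(y, X):
    """Return OLS coefficients for a regression of y on X (X already includes a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Multiple regression of EARNINGS on S and EXP: element [1] is the schooling coefficient
b_multiple = ols(EARNINGS, np.column_stack([ones, S, EXP]))

# Purge EXP from both EARNINGS and S, then regress one residual on the other
EEARN = EARNINGS - np.column_stack([ones, EXP]) @ ols(EARNINGS, np.column_stack([ones, EXP]))
ES = S - np.column_stack([ones, EXP]) @ ols(S, np.column_stack([ones, EXP]))
g = ols(EEARN, np.column_stack([ones, ES]))

print(b_multiple[1], g[1])   # the two estimates coincide (up to rounding)
```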
1.4 Properties of coefficient estimates
Again, to establish the properties of the coefficient estimates, we need to be clear about
the assumptions made in the classical linear regression model. Extending from the simple
regression model to the multiple regression model, we maintain all the assumptions made in the
last lecture and just need to add one more, which is key for multiple regression:
Key additional assumption: There does not exist an exact linear relationship among the
explanatory variables in the sample.
Then we have the unbiasedness and efficiency of the OLS coefficient estimates: the Gauss-Markov
theorem holds for multiple regression in general. We also present the formulas for the
variances and the standard errors of the OLS estimators. What we want to emphasize is that all
the subsequent analysis in this section is not substantially different from that in the simple
regression analysis!
1.4.1 Unbiasedness
The proof of the unbiasedness of the OLS estimators for the multiple regression coefficients is
not substantially different from that in the simple regression analysis, under the assumptions
of classical regressions.
Essentially, we just want to decompose the OLS estimator into the sum of the true
parameter and a random term. The random term is a linear combination of the disturbance
terms with the linear coefficients only dependent on the X’s. For example, for βˆ2 being the
OLS estimator of β2 in the multiple regression (1), we can obtain
$$
\hat\beta_2 = \beta_2 + \sum_{i=1}^{n} a_{i2}^{*} u_i,
$$
where a∗i2 only depends on the data of the explanatory variables, which is treated as fixed in
the classical regression models. Therefore,
$$
E(\hat\beta_2) = \beta_2 + \sum_{i=1}^{n} a_{i2}^{*} E(u_i) = \beta_2 + 0 = \beta_2.
$$
The logic is the same as that in the last lecture.
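A small Monte Carlo sketch can make the unbiasedness argument concrete: with the regressors held fixed and fresh disturbances drawn in each replication, the average of the OLS estimates is close to the assumed true β. The setup below is illustrative, not from the textbook.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 2000
beta = np.array([1.0, 2.0, -0.5])                   # assumed true (beta1, beta2, beta3)

# Fixed regressors across replications, as in the classical model
X = np.column_stack([np.ones(n), rng.normal(10, 2, n), rng.normal(5, 3, n)])

estimates = np.empty((reps, 3))
for r in range(reps):
    u = rng.normal(0, 4, n)                          # fresh disturbances each replication
    Y = X @ beta + u
    estimates[r] = np.linalg.lstsq(X, Y, rcond=None)[0]

print("average of beta_hat over replications:", estimates.mean(axis=0))  # close to the true beta
```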
1.4.2 Efficiency
The efficiency result is, again, due to the Gauss-Markov theorem which proves that for
multiple regression analysis, OLS yields the most efficient linear unbiased estimators of the
parameters: Among all the linear unbiased estimators, OLS is the best in the sense that it
has the smallest variance.
1.4.3 Precision/variance
Recall the variance formula of the slope coefficient estimator in the simple regression analysis:
$$
\sigma^2_{\hat\beta_2} = \frac{\sigma_u^2}{\sum_{i=1}^{n}(X_i - \bar X)^2},
$$
where X is the only explanatory variable. You should understand how the variance of
the disturbance term σ2u, the sample size n, and the so-called mean square deviation,
defined as
$$
\mathrm{MSD}(X) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2,
$$
affect the variance of βˆ2, and hence the precision of βˆ2. This can be seen from the relationship
$$
\sigma^2_{\hat\beta_2} = \frac{\sigma_u^2}{n \cdot \mathrm{MSD}(X)}.
$$
Now, we move to the multiple regression analysis. Again, if you want a clean derivation
of the variance formula of the OLS estimators of the parameters, you may want to use linear
algebra. Without matrix algebra, we will not attempt to derive the variance formula using
ordinary algebra; we just present the formula here.
In particular, for βˆ2 now being the OLS estimator for β2 in front of regressor X2 in the
multiple regression (1) with another regressor X3, its variance is given by (using the MSD
notation)
$$
\sigma^2_{\hat\beta_2} = \frac{\sigma_u^2}{n \cdot \mathrm{MSD}(X_2)} \times \frac{1}{1 - r^2_{X_2,X_3}}, \qquad (9)
$$
where the extra factor depends on r_{X2,X3}, the sample correlation between X2 and X3.
Looking at the variance formula for βˆ2 in (9), you should ask yourself questions such
as: how does the precision of βˆ2 change (holding everything else unchanged)
• if the sample correlation between the explanatory variables becomes higher (lower),
positive or negative?
• if the MSD of X2 becomes higher (lower)?
• if the sample size becomes larger (smaller)?
• if the variance of the disturbance term becomes larger (smaller)?
You should also be able to intuitively understand these results.
What’s next is, again, deriving the standard error of the OLS estimator βˆ2, given the
variance formula (9) and an unbiased estimator for σ2u, which is
$$
\hat\sigma_u^2 = \frac{1}{n-k}\sum_{i=1}^{n}\hat u_i^2,
$$
where uˆi is the fitted residual from the OLS multiple regression (1), and k is the number
of right-hand-side variables including the intercept (k = 3 in this case). With σˆ2u in hand,
we obtain the standard error of βˆ2 as
$$
\mathrm{s.e.}(\hat\beta_2) = \sqrt{\frac{\hat\sigma_u^2}{n \cdot \mathrm{MSD}(X_2)} \times \frac{1}{1 - r^2_{X_2,X_3}}}. \qquad (10)
$$
Note that everything on the right-hand side of (10) can be constructed using the observations
of your sample. So we are good!
A break-down of the standard error formula (10) into its different factors (σˆu, n, MSD(X2),
and rX2,X3) is shown vividly in the earnings regression with two subsamples (union and
non-union), in the textbook and the companion slides on the precision of multiple regression
coefficients.
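The sketch below computes s.e.(βˆ2) from the components of formula (10) (σˆu, n, MSD(X2) and r_{X2,X3}) on simulated data and checks that it matches the matrix-based standard error from σˆ2u(XᵀX)⁻¹. The data and coefficient values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
S = rng.normal(13, 2.5, n)
EXP = 20 - 0.4 * S + rng.normal(0, 4, n)              # correlated with schooling
Y = -10 + 2.5 * S + 0.5 * EXP + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), S, EXP])
k = X.shape[1]                                        # k = 3 here
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)                  # unbiased estimator of sigma_u^2

# Formula (10): components MSD(X2) and the sample correlation r_{X2,X3}
msd_S = np.mean((S - S.mean()) ** 2)
r = np.corrcoef(S, EXP)[0, 1]
se_formula = np.sqrt(sigma2_hat / (n * msd_S) / (1 - r ** 2))

# Matrix-based standard errors: square roots of the diagonal of sigma2_hat * (X'X)^{-1}
se_matrix = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

print(se_formula, se_matrix[1])                       # the two agree
```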
1.5 t tests and confidence intervals
The t tests on the regression coefficients are performed in the same way as for simple regres-
sion analysis. That is, if you want to test the general null hypothesis
$$
H_0: \beta_2 = \beta_2^0,
$$
where β2 is the parameter in front of X2 in the multiple regression (1), and β02 is the
hypothetical true value (often β02 = 0 for testing the significance of the parameter), then
the t test statistic is, again, given by
$$
t = \frac{\hat\beta_2 - \beta_2^0}{\mathrm{s.e.}(\hat\beta_2)},
$$
where βˆ2 is the OLS estimator for β2, and s.e.(βˆ2) is given by (10). Under the null hypothesis
(and the normality assumption in the classical regression model), this t statistic follows a t
distribution with n − k degrees of freedom, that is,
$$
t \sim t_{n-k}.
$$
Again, here k = 3 for multiple regression (1).
When conducting the test, you need to look up the critical value of this t distribution
tn−k depending on the significance level of the test, and decide whether you reject the null
or not by comparing the test statistic with the critical value. Of course, you can also make
the testing decision by comparing the p-value produced in the regression output with the
significance level of the test. The third way is to construct the confidence interval for the
parameter using the OLS estimator and its standard error, and see whether the confidence
interval covers the hypothesized value β02. More details are given in the last lecture's notes.
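A minimal sketch of the t test, assuming simulated data (not the textbook sample): it computes the t statistic for H0: β2 = 0, the two-sided 5% critical value of t_{n−k}, the p-value, and a 95% confidence interval, using scipy for the t distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 500
S = rng.normal(13, 2.5, n)
EXP = 20 - 0.4 * S + rng.normal(0, 4, n)
Y = -10 + 2.5 * S + 0.5 * EXP + rng.normal(0, 5, n)   # simulated data, not the textbook sample

X = np.column_stack([np.ones(n), S, EXP])
k = X.shape[1]
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

# t test of H0: beta_2 = 0 for the schooling coefficient
t_stat = (beta_hat[1] - 0) / se[1]
crit = stats.t.ppf(0.975, n - k)               # two-sided 5% critical value of t_{n-k}
p_value = 2 * stats.t.sf(abs(t_stat), n - k)   # two-sided p-value
ci = (beta_hat[1] - crit * se[1], beta_hat[1] + crit * se[1])   # 95% confidence interval
print(t_stat, crit, p_value, ci)
```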
2 Multicollinearity
The concept of “multicollinearity” is specific to multiple regression analysis and is not
discussed in simple regression analysis. This is because it concerns the potential
linear relationship between two, or among several, explanatory variables.
2.1 The issue caused by linear dependence
We have already seen from the variance formula of βˆ2 in (9) that the higher the
sample correlation between X2 and X3 (in absolute value), the larger the population variance
of βˆ2.[3] Therefore, correlation between explanatory variables can reduce the precision of the
coefficient estimates. In such a case, we say the regression model suffers from “multicollinearity”.[4]
[3] The same holds for the variance of βˆ3, because analogously
$$
\sigma^2_{\hat\beta_3} = \frac{\sigma_u^2}{n \cdot \mathrm{MSD}(X_3)} \times \frac{1}{1 - r^2_{X_3,X_2}}.
$$
[4] Note that a high correlation does not necessarily cause imprecise estimators, because other
factors such as the MSD, the sample size, and the variance of the disturbance term also affect the
precision of the estimators.
The presence of multicollinearity is generally not an issue in itself, and almost all regressions
will suffer from it to some extent (it is a matter of degree, not of kind), unless all
the explanatory variables are uncorrelated (which is unlikely in practice). In the presence of
multicollinearity, the OLS estimators are still unbiased, although their variances may increase.
The real troublemaker is the case of exact multicollinearity (also called perfect
multicollinearity). This is the case when there is perfect correlation between explanatory
variables, that is, a linear relationship[5] between them.
• Example: X3 = 2X2 − 1. What is the population correlation between X2 and X3 in
this case?
$$
\rho_{X_2,X_3} = \frac{\mathrm{Cov}(X_2, X_3)}{\sqrt{\mathrm{Var}(X_2)\,\mathrm{Var}(X_3)}}, \qquad (11)
$$
where
$$
\mathrm{Cov}(X_2, X_3) = \mathrm{Cov}(X_2, 2X_2 - 1) = \mathrm{Cov}(X_2, 2X_2) = 2\,\mathrm{Var}(X_2), \qquad (12)
$$
by the covariance rules, and
$$
\mathrm{Var}(X_3) = \mathrm{Var}(2X_2 - 1) = 4\,\mathrm{Var}(X_2), \qquad (13)
$$
by the variance rules. Plugging (12) and (13) into (11) yields
$$
\rho_{X_2,X_3} = \frac{\mathrm{Cov}(X_2, X_3)}{\sqrt{\mathrm{Var}(X_2)\,\mathrm{Var}(X_3)}} = \frac{2\,\mathrm{Var}(X_2)}{\sqrt{4\,\mathrm{Var}(X_2)\,\mathrm{Var}(X_2)}} = 1!
$$
Note this is not a coincidence. The correlation between two random variables with
a linear relationship is always 1 or −1 (perfectly positively correlated or perfectly
negatively correlated). This was discussed in the Review chapter and proved in Exercise
R.13 (Workshop Week 2).
• See the textbook example (also in the companion slides) for the trouble caused by this
linear relationship between X2 and X3:
– You will not be able to identify the true model!
– You will not be able to compute your OLS estimators!
• The machinery of matrix algebra provides another clear insight into why a linear
relationship among explanatory variables causes problems for model identification or
OLS computation. In fact, if there is a perfect linear relationship among your explanatory
variables, then the XTX term in equation (3) will not be invertible (its inverse does
not exist), so (XTX)−1 in the OLS formula is in trouble! (A small numerical sketch
appears after this list.)
• Note that such an exact relationship between the explanatory variables in a regression is
unusual. It typically comes from a logical error in the model specification. Example:
Exercise 3.16.
[5] Jargon: when such a linear relationship exists, we also say there is an exact relationship between the
explanatory variables. This is how ‘exact multicollinearity’ got its name.
• But it often happens that there is an approximate linear relationship: the two ex-
planatory variables are highly correlated although not perfectly correlated.[6] This is
the case of practical concern.
– An approximate linear relationship among more than two explanatory variables does
not necessarily show up as high pairwise correlations.
– Example: a regression explaining educational attainment (highest degree at-
tained by the respondent) with three explanatory variables: cognitive ability,
mother’s highest degree, and father’s highest degree (p.174–175 in the textbook,
also in Exercises 3.1 and 3.2). In Table 3.7, the coefficient estimate for the partial
effect of the mother’s education is shown to be not very significant. This might be
the result of multicollinearity, as assortative mating leads to high correlation
between mother’s education and father’s education.
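Here is the small numerical sketch promised above for the exact-multicollinearity example X3 = 2X2 − 1 (with simulated X2): the correlation is 1, and XᵀX has rank 2 rather than 3, so the inverse required by the OLS formula (3) does not exist in exact arithmetic.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
X2 = rng.normal(10, 2, n)
X3 = 2 * X2 - 1                                   # the exact linear relationship from the example

X = np.column_stack([np.ones(n), X2, X3])
XtX = X.T @ X

print(np.corrcoef(X2, X3)[0, 1])                  # correlation of 1 (up to rounding)
print(np.linalg.matrix_rank(XtX))                 # numerically rank 2, not 3
print(np.linalg.cond(XtX))                        # an enormous condition number

# In exact arithmetic X'X is singular, so (X'X)^{-1} in the OLS formula (3) does not exist;
# any numerical "inverse" of such a matrix is meaningless even if the code happens to run.
```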
2.2 Solutions
The various ways of alleviating multicollinearity in the multiple regression model fall into
two categories: direct method and indirect method.
2.2.1 Direct method
Recall the four factors that are responsible for the precision of the OLS estimators (and
remember the directions of their effects):
1. variance of disturbance term σ2u
2. sample size n
3. mean square deviation (MSD) of the explanatory variable
4. correlation among explanatory variables
To alleviate the lack of precision caused by multicollinearity, we can
1. reduce the variance of the disturbance term σ2u by discovering potential omitted variables
and adding them as regressors. → compare Table 3.7 and Table 3.10 in the textbook
• But this approach can sometimes backfire: if the additional regressor
is highly correlated with the existing regressors, then the standard errors of the
estimators can even increase.
2. increase the sample size n by measures such as:
• do not use a sub-sample without a good reason (try to use the full sample) →
compare Table 3.7 and Table 3.9 in the textbook;
• negotiate a bigger budget to conduct the survey on a larger scale;
• make a fixed budget go further by using a ‘clustering’ technique;
• for time series data, shorten your sampling interval (but be very careful about the
potential autocorrelation caused by this procedure, which can be another trouble-
maker! More in the later lectures on time series models.)
[6] Again, this is a matter of degree.
It is usually hard to alleviate the multicollinearity problem by manipulating the MSD or
the existing correlation among the explanatory variables.
2.2.2 Indirect method
There are a couple of potential approaches in the indirect method category:
• If the correlated variables are similar conceptually, then we may combine them into
some overall index rather than using them individually in the regression.
– Used already in the educational attainment example where the ‘cognitive ability’
is an index measure.
• Drop some of the correlated variables if they have insignificant coefficient estimates. But there
is a risk of model misspecification (discussed in Chapter 6).
• Use extraneous information concerning the coefficient of one of the variables (empirical
restriction).
– See the example of using household level information in a time series regression
with aggregate macro variables (consumer expenditure regressed on disposable
income and price level, where the time series data of disposable income and price
level are often highly correlated).
– Also note the potential issues associated with this solution.
• Use theoretical restrictions.
– There might be some hypothetical relationship among the population parameters
in the regression model.
– For example: we may assume that mother’s education and father’s education have
the same effect on the respondent’s educational attainment.
– As always, when we impose theoretical restrictions (assumptions based on
economic or other social science theory), we sacrifice something else at the same
time. It’s a trade-off! For example, by assuming that mother’s education
and father’s education have the same effect, we are no longer able to
identify their effects separately. In practice, you have to decide how
to moderate the joint risk caused by the different factors by analyzing them carefully.
(A sketch of imposing such a restriction appears after this list.)
– Testing theoretical restrictions is a kind of model misspecification test. This is a
topic of Chapter 6.
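As an illustration of imposing the restriction that mother’s and father’s education have the same effect, the sketch below regresses on the combined variable SM + SF instead of SM and SF separately. The variable names loosely follow the textbook’s educational attainment example (ASVABC, SM, SF), but the data here are simulated, so the numbers are not the textbook’s.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
ASVABC = rng.normal(0, 1, n)                       # cognitive ability score (simulated)
SM = rng.integers(8, 18, n).astype(float)          # mother's years of schooling (simulated)
SF = np.clip(SM + rng.normal(0, 1.5, n), 8, 20)    # father's schooling, highly correlated with SM
S = 6 + 2 * ASVABC + 0.2 * SM + 0.2 * SF + rng.normal(0, 2, n)   # educational attainment

ones = np.ones(n)

# Unrestricted regression: separate coefficients for SM and SF
X_u = np.column_stack([ones, ASVABC, SM, SF])
b_u = np.linalg.lstsq(X_u, S, rcond=None)[0]

# Restricted regression imposing beta_SM = beta_SF: use the combined variable SM + SF
X_r = np.column_stack([ones, ASVABC, SM + SF])
b_r = np.linalg.lstsq(X_r, S, rcond=None)[0]

print("unrestricted SM, SF coefficients:", b_u[2], b_u[3])
print("restricted common coefficient:  ", b_r[2])
```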
3 Goodness of fit: R2 and adjusted R2
The R2 and adjusted R2 are two closely related basic measures of model goodness of fit,
and hence are considered diagnostic statistics for studying the model specification. A side
note: there is a large set of model specification diagnostics beyond just R2 and adjusted R2.
When studying them, we need to understand their usefulness as well as their limitations.
3.1 The R2 and its property
As discussed in the simple regression analysis in Week 3, here for the multiple regression
the variation in the dependent variable, measured by TSS, can be decomposed into ESS and
RSS.
Therefore, we can define the measure of goodness of fit in exactly the same way. That
is, we define the R2 as
$$
R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n}(\hat Y_i - \bar Y)^2}{\sum_{i=1}^{n}(Y_i - \bar Y)^2}.
$$
It is clear that since TSS = ESS + RSS, we have
$$
R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{n}\hat u_i^2}{\sum_{i=1}^{n}(Y_i - \bar Y)^2}. \qquad (14)
$$
It is also equal to the square of the sample correlation coefficient between Y and Yˆ.
A very important fact about R2 in the multiple regression framework is that: it can never
decrease, and generally will increase, if you add more variables to your regression.
The formal proof of this claim is much easier if matrix algebra is used. But
intuitively, it can be understood by looking at the following two fitted regressions
Yˆ = βˆ1 + βˆ2X2 + βˆ3X3 (15)
and
Yˆ = β˜1 + β˜2X2, (16)
where βˆ’s are OLS estimators of the regression of Y on both X2 and X3, while β˜’s are OLS
estimators of the regression of Y on only X2.
Clearly, when we fit the regression in (16), we implicitly make the restriction that the
coefficient of the variable X3 is zero. Remember that OLS estimators are obtained by
searching through all the possible values of the coefficients to try to minimize the sum of
squared residuals (RSS). If getting the smallest RSS is the goal, will the OLS procedure do
a better job when some parameter(s) are restricted, or when they are not?
More freedom (less restriction) → better result (smaller RSS)!
Therefore, if we let
• RSSu denote the RSS of fitted regression (15) with unrestricted coefficient in front of
X3, and
• RSSr denote the RSS of the fitted regression (16) with restricted (set to zero) coefficient
in front of X3,
then it follows from the above logic that we always have
RSSu ≤ RSSr.
Since the total sum of squares, TSS, is the same for both regressions (it depends only on Y
and its sample mean), it follows immediately from formula (14) that
$$
R^2_u \ge R^2_r,
$$
where R2u and R2r denote the R2 from the fitted regressions (15) and (16), respectively. In
other words,
The R2 is always non-decreasing whenever one more variable is added to the regression.
The cautionary tale here is that you may be tempted to blindly throw as many explanatory
variables as possible into your regression if your goal is simply to improve the goodness-of-fit
measure R2. And this can be very dangerous! (Why? Think about multicollinearity.)
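The following sketch illustrates the non-decreasing property of R2 on simulated data: adding a regressor that is pure noise still (weakly) raises R2. The helper function and the data-generating process are illustrative assumptions.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS fit of y on X (X includes a constant column)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(7)
n = 200
X2 = rng.normal(0, 1, n)
Y = 1 + 2 * X2 + rng.normal(0, 1, n)
noise = rng.normal(0, 1, n)                       # a regressor unrelated to Y

ones = np.ones(n)
r2_restricted = r_squared(Y, np.column_stack([ones, X2]))
r2_unrestricted = r_squared(Y, np.column_stack([ones, X2, noise]))

print(r2_restricted, r2_unrestricted)             # the second is never smaller than the first
```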
3.2 Adjusted R2
To impose a penalty on adding extra regressors, the R2 is often ‘adjusted’. The adjusted R2,
usually denoted $\bar R^2$, is defined as
$$
\bar R^2 = 1 - \frac{RSS}{TSS} \times \frac{n-1}{n-k},
$$
where k is the number of parameters in the regression model, and the factor $\frac{n-1}{n-k}$ acts as a
penalty for extra parameters. We can write the formula as
$$
1 - \bar R^2 = \frac{RSS}{TSS} \times \frac{n-1}{n-k},
$$
which implies that
$$
k \uparrow \;\Rightarrow\; n-k \downarrow \;\Rightarrow\; \frac{1}{n-k} \uparrow \;\Rightarrow\; 1 - \bar R^2 \uparrow \;\Rightarrow\; \bar R^2 \downarrow \qquad (17)
$$
which offsets the effect that
$$
k \uparrow \;\Rightarrow\; RSS \downarrow \;\Rightarrow\; 1 - \bar R^2 \downarrow \;\Rightarrow\; \bar R^2 \uparrow.
$$
Recall the definition of R2 given in (14):
$$
1 - R^2 = \frac{RSS}{TSS}.
$$
Clearly, without the penalty factor, the offsetting mechanism in (17) is not present for R2:
an increase in k can only push R2 up (more precisely, it can never decrease it).
It can be shown that the addition of a new variable to a regression will cause $\bar R^2$ to
rise if and only if the absolute value of its t statistic is greater than one. Since a t statistic
greater than one does not necessarily mean that the coefficient of the newly
added variable is significant, a rise in $\bar R^2$ does not necessarily imply an
improvement of the model specification.
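The sketch below computes R2, adjusted R2 and the t statistics on simulated data and checks the stated rule: adjusted R2 rises when the extra regressor X3 is added exactly when |t| for X3 exceeds one. The data-generating process is an assumption for illustration.

```python
import numpy as np

def fit_stats(y, X):
    """Return (R2, adjusted R2, t statistics) for OLS of y on X (X includes a constant)."""
    n, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    rss, tss = resid @ resid, np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    adj_r2 = 1 - (rss / tss) * (n - 1) / (n - k)
    se = np.sqrt(np.diag((rss / (n - k)) * np.linalg.inv(X.T @ X)))
    return r2, adj_r2, b / se

rng = np.random.default_rng(8)
n = 200
X2 = rng.normal(0, 1, n)
X3 = rng.normal(0, 1, n)                               # candidate extra regressor
Y = 1 + 2 * X2 + 0.05 * X3 + rng.normal(0, 1, n)

ones = np.ones(n)
r2_a, adj_a, _ = fit_stats(Y, np.column_stack([ones, X2]))
r2_b, adj_b, t_b = fit_stats(Y, np.column_stack([ones, X2, X3]))

# Adjusted R2 rises when X3 is added if and only if |t| of X3 exceeds 1: both booleans coincide
print(adj_b > adj_a, abs(t_b[2]) > 1)
```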
4 Prediction
Note that ‘prediction’ is most often heard in the context of time-series regression, where it
relates to forecasting the variables of interest.
In the cross-sectional context, prediction also has practical relevance. One exam-
ple is the ‘hedonic pricing’ model for a good or service that has a number of characteristics
which individually give it value to the buyer. The market price of the good is then assumed
to be a linear function of the (often implicit) prices (the β’s) of these characteristics.
See the textbook section and also the companion slides.
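As a hypothetical cross-sectional prediction example in the hedonic-pricing spirit (house characteristics and prices are simulated, not from the textbook), the sketch below fits a multiple regression and predicts the price of a new observation.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 300

# Hypothetical hedonic pricing setup: house price as a linear function of characteristics
size = rng.normal(150, 40, n)                     # floor area in square metres
bedrooms = rng.integers(1, 6, n).astype(float)
age = rng.uniform(0, 50, n)
price = 50 + 2.0 * size + 15 * bedrooms - 0.8 * age + rng.normal(0, 30, n)   # in $1,000s

X = np.column_stack([np.ones(n), size, bedrooms, age])
beta_hat = np.linalg.lstsq(X, price, rcond=None)[0]

# Predict the price of a new house with given characteristics
x_new = np.array([1.0, 120.0, 3.0, 10.0])         # constant, size, bedrooms, age
print("predicted price ($1,000s):", x_new @ beta_hat)
```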