ECMT1020 Introduction to Econometrics Week 8, 2021S1
Lecture 7: Dummy Variables
Instructor: Ye Lu
Please read Chapter 5 of the textbook.
Contents
1 Motivation: two groups in the data
  1.1 Chow test: pool or separate
  1.2 Limitation of separate regressions
2 Regression with dummy variable
  2.1 Separating two groups in one regression: dummy variable
    2.1.1 Intercept dummy
    2.1.2 Slope dummy
  2.2 Dummy variable trap: perfect multicollinearity
3 More than two groups: more than one dummy variable
  3.1 More than two groups from one grouping criterion
    3.1.1 M categories: M − 1 dummies
    3.1.2 Change of reference category
  3.2 Multiple grouping criteria
Figure 1: Cost against number of students for 74 secondary schools in Shanghai, among
which 34 are occupational schools and 40 are regular academic schools.
1 Motivation: two groups in the data
Running example:
Yi = β1 + β2Xi + ui, i = 1, . . . , n, (1)
where
• Y = COST is the annual recurrent cost for running a school;
• X = N is the number of students in a school.
The question is how the number of students affects the cost of running a school.
The scatter plot of the annual cost against number of students for 74 secondary schools
in Shanghai is given in Figure 1. In the plot we can see that data points are divided into
two categories/types:
• occupational school: schools that aim to provide skills for specific occupations,
• regular school: regular general academic schools.
There are in total 34 occupational schools and 40 regular schools in this sample.
It is clear from Figure 1 that the red dots (occupational schools) are generally above the
grey dots (regular schools). This means that, overall, it is more expensive to run an occupational school than a regular school1; that is, the overhead cost of occupational schools is higher than that of regular schools. Therefore, if we were to fit a linear regression like (1) separately for the two types of schools, we would expect the intercept estimate from the occupational school regression to be higher than that from the regular school regression.
Moreover, we can also see from Figure 1 that the marginal cost of each additional student seems to be higher for occupational schools than for regular schools. In other words, if we were to fit a linear regression like (1) separately for the two types of schools, we would expect the slope estimate from the occupational school regression to be higher than that from the regular school regression.
Given the above observations, we have good reason to think about fitting separate regressions
for these two distinct subsamples:
• subsample A: observations for occupational schools (sample size nA),
• subsample B: observations for regular schools (sample size nB),
where the subsample sizes must satisfy nA + nB = n.
The separate regressions for subsamples A and B are, respectively,

Regression A: Y_i^A = α_1 + α_2 X_i^A + u_i^A, i = 1, . . . , n_A, (2)

and

Regression B: Y_i^B = γ_1 + γ_2 X_i^B + u_i^B, i = 1, . . . , n_B. (3)
Accordingly, we call regression (1), fitted with all the observations, the ‘pooled regression’.
1This is reasonable because occupational schools tend to be expensive to run due to the need to maintain specialized workshops.
First of all, we should understand that if we fit separate regressions A and B rather than a pooled regression, the model fit cannot get worse: the total sum of the squared residuals can only stay the same or decrease. In other words, we should expect2

RSSA + RSSB ≤ RSSP (4)

where RSSA, RSSB and RSSP denote, respectively, the residual sums of squares for regression A, regression B and the pooled regression (1), which we call regression P.
The question is: is the improvement specified in (4) statistically significant or not? Put differently, is the quantity

RSSP − (RSSA + RSSB)

significantly different from zero? The Chow test (Chow, 1960) provides a formal statistical procedure to answer this question.
1.1 Chow test: pool or separate
Recall how we perform the F test in the multiple regression model to test whether adding extra regressor(s) significantly improves the model fit. The logic of the Chow test is the same, as it is essentially an F test.
When we fit the pooled regression (1), we have k = 2 parameters to estimate, whereas the total number of parameters doubles to 2k = 4 if we fit the separate regressions (2) and (3). So the scenario here is again that we improve the model fit by adding more parameters, or in other words, by sacrificing degrees of freedom (DF). Remember that extra parameters cost/use up extra degrees of freedom – no free lunch!
The general formula for the F test statistic is

F(extra DF, DF remaining) = (improvement in fit / extra DF) / (RSS remaining / DF remaining), (5)

where the improvement in fit comes from using a model with more parameters (fewer DF) rather than a model with fewer parameters (more DF); and the RSS remaining and DF remaining are the RSS and DF of the model with more parameters. The null hypothesis is

H0 : no improvement in fit from the model with more parameters

and we reject the null hypothesis if the F test statistic is greater than the critical value at a certain significance level.
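As a small illustration of formula (5), the computation can be sketched as a helper function; the function name and the RSS numbers below are made up for illustration, not taken from the lecture's example.

```python
def f_statistic(rss_restricted, rss_unrestricted, extra_df, df_remaining):
    """F statistic of equation (5): improvement in fit per extra DF,
    relative to the remaining RSS per remaining DF of the larger model."""
    improvement = rss_restricted - rss_unrestricted
    return (improvement / extra_df) / (rss_unrestricted / df_remaining)

# Hypothetical example: adding 2 parameters lowers the RSS from 120 to 100,
# leaving 50 degrees of freedom in the larger model.
print(round(f_statistic(120.0, 100.0, 2, 50), 1))  # 5.0
```

The statistic would then be compared with the critical value of the F distribution with (extra DF, DF remaining) degrees of freedom.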
The Chow test aims to decide whether the separate Regressions A and B provide a significantly better fit than the pooled Regression P. Note that we have
• Regression A
– sample size is nA;
– k parameters which use up k degrees of freedom in fitting Regression A;
– remaining DF is nA − k;
2Think about why this is. See the textbook companion slides on the Chow test for a good illustration, and equations (5.41)–(5.43) in the textbook.
– residual sum of squares is RSSA.
• Regression B
– sample size is nB;
– k parameters which use up k degrees of freedom in fitting Regression B;
– remaining DF is nB − k;
– residual sum of squares is RSSB.
• Regression P
– sample size is n = nA + nB;
– k parameters which use up k degrees of freedom in fitting Regression P;
– remaining DF is n− k;
– residual sum of squares is RSSP .
To put the Chow test into the F test context, we may consider Regressions A and B together as one model with 2k parameters in total, and Regression P itself as another model with only k parameters. We then use the F test given by (5) to test whether the former model with 2k parameters provides a significantly better fit than the latter with k parameters.
Let’s fit these quantities into the formula in (5):
• improvement in fit = RSSP − (RSSA +RSSB);
• extra DF used up = 2k − k = k;
• RSS remaining = RSSA +RSSB;
• DF remaining = (nA − k) + (nB − k) = (nA + nB)− 2k = n− 2k.
Therefore, we have the F test statistic for the Chow test:

F(k, n − 2k) = [(RSSP − (RSSA + RSSB))/k] / [(RSSA + RSSB)/(n − 2k)],

and this test statistic follows an F distribution with k and n − 2k degrees of freedom under the null hypothesis that there is no improvement in fit.
An example of the Chow test for the cost regression of the 74 secondary schools is given in the textbook and companion slides.
• Regression A (occupational schools)
– nA = 34;
– k = 2 parameters which use up k = 2 degrees of freedom in fitting Regression A;
– remaining DF is nA − k = 34− 2 = 32;
– residual sum of squares is RSSA = 3.49 (×10^11).
• Regression B (regular schools)
– nB = 40;
– k = 2 parameters which use up k = 2 degrees of freedom in fitting Regression B;
– remaining DF is nB − k = 40− 2 = 38;
– residual sum of squares is RSSB = 1.22 (×10^11).
• Regression P
– n = 34 + 40 = 74;
– k = 2 parameters which use up k = 2 degrees of freedom in fitting Regression P;
– remaining DF is n− k = 74− 2 = 72;
– residual sum of squares is RSSP = 8.92 (×10^11).
Let’s, again, fit these quantities into the formula in (5):
• improvement in fit = RSSP − (RSSA + RSSB) = 8.92 − (3.49 + 1.22) = 8.92 − 4.71 = 4.21 (×10^11);
• extra DF used up = 2k − k = k = 2;
• RSS remaining = RSSA + RSSB = 3.49 + 1.22 = 4.71 (×10^11);
• DF remaining = n− 2k = 74− 4 = 70(= 32 + 38).
Therefore, the F test statistic for the Chow test is

F(2, 70) = [(RSSP − (RSSA + RSSB))/k] / [(RSSA + RSSB)/(n − 2k)] = (4.21/2) / (4.71/70) = 31.3.

The critical value of the F(2, 70) distribution at the 0.1% level is 7.64 < 31.3. So we reject the null hypothesis at the 0.1% significance level, and conclude that we should run separate regressions for the two types of schools.
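The arithmetic above can be replayed in a few lines of Python, using the RSS values (in units of 10^11) reported in this section:

```python
# Chow test for the 74 Shanghai schools; RSS values are in units of 10^11.
rss_p, rss_a, rss_b = 8.92, 3.49, 1.22   # pooled, occupational, regular
n, k = 74, 2                             # sample size, parameters per regression

improvement = rss_p - (rss_a + rss_b)    # 4.21: the drop in RSS from splitting
f_stat = (improvement / k) / ((rss_a + rss_b) / (n - 2 * k))
print(round(f_stat, 1))                  # 31.3, far above the 0.1% critical value 7.64
```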
1.2 Limitation of separate regressions
When we are motivated to run the separate regressions (2) and (3), we often want to see how different αˆ1 and γˆ1 are, and how different αˆ2 and γˆ2 are. The estimates will certainly differ in value, but separate regressions cannot tell us whether the differences are statistically significant. This is the major drawback of running separate regressions. In particular, we draw your attention to the two problems below.
1. (major problem) How can we tell how different the coefficients are? How statistically
significant are these differences?
2. When you run regressions with two small samples (nA < n, nB < n) instead of running
one large pooled regression, there is an adverse effect on the precision of the estimates
of the coefficients.
2 Regression with dummy variable
The usual solution to the above problems is to fit a single regression with an extra dummy variable. A dummy variable is a binary variable which takes only the values 0 and 1, and it indicates the category that an observation belongs to. Figure 2 gives an illustration of a dummy variable ‘OCC’ which takes the value 1 if the observation is an occupational school, and 0 if not. We can see that a dummy variable is essentially an indicator we use to mark each observation as belonging to a certain category in the data set – like how we mark the data points using different colors (red and grey) in Figure 1. We often refer to the text entries listed in the second column of the table in Figure 2 as ‘categorical data’. So, a dummy variable is a numerical variable which represents categorical data.

Figure 2: Illustration of a dummy variable: OCC is a 0-1 variable indicating the type of school in the data set.
2.1 Separating two groups in one regression: dummy variable
Below we consider the intercept dummy and slope dummy, respectively, to address the two
issues we raised above:
1. Overhead costs for running occupational schools and regular schools can be different;
2. Marginal costs of each additional student can also be different for running occupational
schools and regular schools.
2.1.1 Intercept dummy
Let’s first see what happens if we introduce the above dummy variable OCC as an extra
explanatory variable into the simple regression (1). Now we have a multiple regression:
Yi = β1 + β2Xi + β3Di + ui, i = 1, . . . , n, (6)
where I use the generic notations Y,X,D for the dependent variable, the first regressor and
the dummy variable. In particular,
• Yi = COSTi is the annual recurrent cost for running the ith school;
• Xi = Ni is the number of students in the ith school.
• Di = OCCi = 1 if the ith school is an occupational school, and Di = OCCi = 0 if the
ith school is a regular school.
Despite being binary, D can be treated the same as any other explanatory variable in the regression. We can fit the regression to all n = 74 observations by OLS,
and obtain the OLS estimates βˆ1, βˆ2 and βˆ3. The fitted regression is written as
Yi = βˆ1 + βˆ2Xi + βˆ3Di, i = 1, . . . , n.
How do we interpret the parameter estimates? It becomes clear once we look at the fitted regressions (interpreted here as estimated cost functions) separately for the two types of schools:
• Di = 1 : occupational school
Yi = βˆ1 + βˆ2Xi + βˆ3 · 1
= (βˆ1 + βˆ3) + βˆ2Xi, (7)
where the index i here runs through the indices of the occupational schools in the
sample.
• Di = 0 : regular school
Yi = βˆ1 + βˆ2Xi + βˆ3 · 0
= βˆ1 + βˆ2Xi, (8)
where the index i here runs through the indices of the regular schools in the sample.
Now, by comparing (7) and (8), we can interpret the parameter estimates as follows:
1. βˆ1 is the estimated annual overhead cost for regular schools,
2. βˆ2 is the estimated annual marginal cost of each additional student for both regular
schools and occupational schools.3
3. βˆ3 is the estimated extra annual overhead cost for occupational schools over regular
schools. Note that the intercept in the fitted regression (7) for occupational schools is
βˆ1 + βˆ3 which means the overhead cost for occupational schools is estimated as βˆ1 + βˆ3.
The interpretation of βˆ3 is the key to understanding how the dummy variable D works in regression (6). Implicitly, we have set the regular schools as the ‘reference’. The overhead cost estimate for the reference type of school is βˆ1, the estimate of the intercept in regression (6); and βˆ3 tells us how much extra overhead cost is needed for the other type of school.
Since the slope coefficient βˆ3 on the dummy variable D is the difference between the two estimated intercepts in the fitted regressions (7) and (8), capturing the parallel shift from one fitted regression line to the other, we call the dummy variable D in regression (6) an ‘intercept dummy’.
Obtaining standard errors and conducting hypothesis tests in a regression with a dummy variable are no different than usual. It can be very useful to perform a t test on the coefficient of the dummy variable to see whether there is a significant difference between the overhead costs of the two types of school. See the textbook and companion slides for more discussion.
3Note that it is a restriction of this model that the marginal costs for the two types of school have to be the same. Since this restriction is unrealistic, we will relax it in the next section.
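To make the parallel-shift interpretation concrete, here is a minimal sketch of the fitted regression (6); the coefficient values are hypothetical illustrations, not the textbook estimates.

```python
def fitted_cost(n_students, occ, b1=50_000.0, b2=300.0, b3=120_000.0):
    """Fitted cost Y-hat = b1 + b2*X + b3*D, with D = occ (0 or 1).
    The coefficient values are hypothetical, for illustration only."""
    return b1 + b2 * n_students + b3 * occ

# The two fitted lines differ only by the parallel shift b3, whatever X is:
gap = fitted_cost(500, occ=1) - fitted_cost(500, occ=0)
print(gap)  # 120000.0: the extra overhead cost of an occupational school
```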
2.1.2 Slope dummy
The intercept dummy can only shift the fitted regression line in a parallel manner; it cannot allow the two fitted regression lines to have different slopes. As mentioned above, this is the limitation of the intercept dummy model, in which the marginal costs for both types of school have to be the same. The latter is clearly not a plausible assumption given the visual inspection of Figure 1: the fitted regression (cost function) for the occupational schools should be steeper, and that for the regular schools flatter.
To allow the slopes to be different, we introduce another ‘slope dummy variable’ X ·D
into the regression (6), and get
Yi = β1 + β2Xi + β3Di + β4XiDi + ui, i = 1, . . . , n, (9)
where the variables Y,X and D are the same as before.
Again, we can fit the regression using all n = 74 observations and obtain the OLS estimates βˆ1, βˆ2, βˆ3 and βˆ4. The fitted regression is written as
Yi = βˆ1 + βˆ2Xi + βˆ3Di + βˆ4XiDi, i = 1, . . . , n.
We look at the fitted regressions separately for two types of school:
• Di = 1 : occupational school
Yi = βˆ1 + βˆ2Xi + βˆ3 · 1 + βˆ4Xi · 1
= (βˆ1 + βˆ3) + (βˆ2 + βˆ4)Xi, (10)
where the index i here runs through the indices of the occupational schools in the
sample.
• Di = 0 : regular school
Yi = βˆ1 + βˆ2Xi + βˆ3 · 0 + βˆ4Xi · 0
= βˆ1 + βˆ2Xi, (11)
where the index i here runs through the indices of the regular schools in the sample.
The fitted regressions (10) and (11) clearly show that now both intercepts and slopes can
be different, and the differences are captured by βˆ3 and βˆ4, respectively. Specifically, the
parameter estimates are now interpreted as follows:
1. βˆ1 is the estimated annual overhead cost for regular schools (the reference),
2. βˆ2 is the estimated annual marginal cost of each additional student for regular schools
(the reference).
3. βˆ3 is the estimated extra annual overhead cost for occupational schools.
4. βˆ4 is the estimated extra annual marginal cost for occupational schools.
Again, we can perform t tests as usual. The t test for the significance of the coefficient βˆ4 is useful for telling whether the marginal cost per student in an occupational school is significantly higher than that in a regular school.
We can also perform an F test of the joint explanatory power of the intercept dummy
and slope dummy in regression (9) by testing
H0 : β3 = β4 = 0. (12)
To do this, we compare the RSS from regression (9), where both dummies are included, with the RSS from regression (1), where they are not, and use the usual F test statistic and critical values to make the testing decision. If we reject the null, it means that at least one of β3 and β4 is different from zero.
In fact, the Chow test we introduced at the beginning of this lecture is equivalent to this F test. We can verify this in the same data example. Recall that we had RSSA + RSSB = 4.71 × 10^11 in Section 1.1. For regression (9) on the whole sample, with both intercept and slope dummy variables, the residual sum of squares is also 4.71 × 10^11. Note that the number of parameters in regression (9) is 4, which is the same as the total number of parameters in the subsample Regressions A and B. Therefore, the two model-fit comparisons below are equivalent:
• compare running (the pooled) regression (1) and running separate regressions A and
B in (2) and (3).
– Both regressions have no dummy variables.
– Regression (1) uses the whole sample, while regressions A and B are separately
fit using two subsamples.
• compare regression (1) and regression (9)
– Both regressions are based on whole sample.
– Regression (1) has no dummies, while regression (9) has both intercept and slope
dummies.
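The equivalence can be verified numerically: the F test of (12) plugs the RSS of regression (1) and the RSS of regression (9) into formula (5), and reproduces the Chow statistic (RSS values in units of 10^11, as reported in this lecture).

```python
# F test of H0: b3 = b4 = 0 in regression (9); RSS in units of 10^11.
rss_restricted = 8.92        # regression (1): no dummies
rss_unrestricted = 4.71      # regression (9): intercept and slope dummies
extra_df, df_remaining = 2, 74 - 4

f_stat = ((rss_restricted - rss_unrestricted) / extra_df) / (rss_unrestricted / df_remaining)
print(round(f_stat, 1))      # 31.3, identical to the Chow test statistic
```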
2.2 Dummy variable trap: perfect multicollinearity
Wait. There are two types of school in the data, but why do we only use one dummy
variable to separate the two groups? Can we use two dummies one for occupational schools
and another for regular schools? In other words, can we set
D^o = 1 if the school is occupational, and D^o = 0 if it is regular;
D^r = 0 if the school is occupational, and D^r = 1 if it is regular;

and include both of them in a regression

Y = β1 + β2X + β3D^o + β4D^r + u? (13)
The answer is NO. We fall into the classic ‘dummy variable trap’ if we do this, and the reason is that we run into a special case of the ‘perfect multicollinearity’ discussed in the lecture on multiple regressions.
Note what is special about the two dummies D^o and D^r defined above: they always add up to one, simply because every school in the sample is either an occupational school or a regular school.
So we have4

D^o + D^r = 1. (14)
Look at the 1 on the right-hand side of the equation: can you see that it also hides in regression (13)5 as one of the regressors? Yes, the constant 1 is the first regressor in the regression. Note that we can always write (13) as

Y = β1X1 + β2X2 + β3X3 + β4X4 + u

where

X1 = 1, X2 = X, X3 = D^o, X4 = D^r. (15)
Then from (15) and (14), it is clear that we have

X1 = X3 + X4,

which is a perfect linear relationship among the regressors! With such perfect multicollinearity, we will not be able to perform the OLS estimation.
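The exact linear relationship can be seen directly in a toy sample; the five school types below are invented for illustration.

```python
# The dummy variable trap in miniature: the constant regressor equals
# D_o + D_r for every observation, so the columns are perfectly collinear.
d_occ = [1, 0, 1, 0, 0]                 # hypothetical D_o for five schools
d_reg = [1 - d for d in d_occ]          # D_r is forced to equal 1 - D_o
constant = [1] * len(d_occ)

assert all(c == do + dr for c, do, dr in zip(constant, d_occ, d_reg))
print("X1 = X3 + X4 holds for every observation")
```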
As you might have imagined, we can actually escape from the dummy variable trap by excluding the intercept term from the regression. For example, instead of (13), we may run the following regression

Y = β2X + β3D^o + β4D^r + u, (16)
which is perfectly fine. Without perfect multicollinearity, we can obtain the OLS estimates βˆ2, βˆ3 and βˆ4. Following a similar analysis to before (but noting that, logically, D^o and D^r can never both be one, nor both be zero), we have
• D^o = 1 and D^r = 0 : occupational school fitted regression

Y = βˆ2X + βˆ3. (17)

• D^r = 1 and D^o = 0 : regular school fitted regression

Y = βˆ2X + βˆ4. (18)
Now the interpretations of the parameters are, in general, different:
1. βˆ2 is the estimated annual marginal cost of each additional student for both regular
schools and occupational schools. −→ This is the same as in the case before with only
intercept dummy.
2. βˆ3 is the estimated annual overhead cost for occupational schools.
3. βˆ4 is the estimated annual overhead cost for regular schools.
4To be more explicit, we actually have D^o_i + D^r_i = 1 for all i = 1, . . . , n in the sample.
5Indeed, it hides in any regression with an intercept term as one of the regressors.
The difference here is that there is no reference type of school any more! Neither of the coefficients on the dummy variables is interpreted as the ‘extra’ overhead cost of one type over the other; they simply estimate the overhead costs of the two types of school separately.
Summary: If we would like to keep the intercept term in the regression, then to separate two groups we need only one dummy variable. The general rule is that we need M − 1 dummy variables to separate M categories from one grouping criterion, if there is an intercept in the regression.
3 More than two groups: more than one dummy variable
In practice, there are often more than two distinct groups in the data, either because one grouping criterion divides the observations into more than two categories, or because we use multiple criteria to group the observations.
• One grouping criterion: for example, when we group the 74 secondary schools in Shanghai based on the type of curriculum, we can do a finer job than just classifying them as occupational or regular. In fact, there are two types of occupational school: technical schools, which train technicians, and skilled workers’ schools, which train craftsmen. There are also two types of regular secondary school in Shanghai: general schools, which provide the usual academic education, and vocational schools. So in total there are 4 types of school among the 74 schools. Figure 3 marks these 4 types with 4 different colors.
• Multiple grouping criteria: suppose we also want to take into account the fact that some schools are residential and some are not. Then we can use two grouping criteria: residential or not, and occupational or not. Since each of these two criteria has two categories, in total we again divide the observations into 4 groups. This is illustrated in Figure 4.
In the following we consider these two cases.
3.1 More than two groups from one grouping criterion
This is the case as illustrated in Figure 3, and the example is a straightforward extension of
the example we discussed before with two groups.
3.1.1 M categories: M − 1 dummies
To separate the four groups of schools shown in Figure 3, we need 4−1 = 3 dummy variables.
They are illustrated in the last three columns of the table in Figure 5.
Where is the general school? It is chosen as the reference type/category. Therefore, we only see dummy variables for the other three types: technical schools, workers’ schools, and vocational schools. The reference category is hence usually described as the ‘omitted’ category.
The regression we run is

Yi = β1 + β2Xi + β3D^T_i + β4D^W_i + β5D^V_i + ui, i = 1, . . . , n, (19)
Figure 3: Cost against number of students for 74 secondary schools in Shanghai classified
into four categories.
Figure 4: Cost against number of students for 74 secondary schools in Shanghai: classified
into two sets of categories: residential/nonresidential and regular/occupational.
where
• Yi = COSTi is the annual recurrent cost for running the ith school;
• Xi = Ni is the number of students in the ith school.
• D^T_i = TECHi = 1 if and only if the ith school is a technical school.
• D^W_i = WORKERi = 1 if and only if the ith school is a workers’ school.
• D^V_i = VOCi = 1 if and only if the ith school is a vocational school.
Keeping in mind that the reference category is the general school, we can easily obtain the
below interpretations of the parameter estimates:
1. βˆ1 is the estimated annual overhead cost for general school (the reference),
2. βˆ2 is the estimated annual marginal cost of each additional student for all schools
(because there is no slope dummy yet).
3. βˆ3 is the estimated extra annual overhead cost for technical schools over general schools.
4. βˆ4 is the estimated extra annual overhead cost for workers’ schools over general schools.
5. βˆ5 is the estimated extra annual overhead cost for vocational schools over general schools.
The standard errors and hypothesis tests are no different than usual. The analysis done for two groups with one dummy variable generalizes straightforwardly to this case with more than two groups.
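Constructing the M − 1 = 3 dummies from the categorical school type is mechanical. The sketch below uses a hypothetical five-school sample, with ‘general’ as the omitted reference category.

```python
school_type = ["technical", "general", "vocational", "workers", "general"]

# One dummy per non-reference category; 'general' is the omitted reference.
non_reference = ["technical", "workers", "vocational"]
dummies = {c: [1 if t == c else 0 for t in school_type] for c in non_reference}

print(dummies["technical"])   # [1, 0, 0, 0, 0]
print(dummies["workers"])     # [0, 0, 0, 1, 0]
# A 'general' school has 0 in all three dummy columns.
```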
3.1.2 Change of reference category
In the above regression we chose the general school as the reference category, so we can compare the overhead costs of the other types of school with those of general schools, and test whether the differences are significant using a t test on each individual parameter β3, β4 and β5.
What if we were interested in testing whether the overhead costs of workers’ schools are different from those of the other types of school? −→ The easiest way to do this is to re-run the regression making workers’ schools the reference category. This is simple: we just need to drop the dummy for workers’ schools and add a dummy for general schools in regression (19)!
What do we expect to see from the estimation of the regression with the new reference?
• The parameter estimates will certainly change, except for βˆ2 (the estimated marginal cost, which is common to all school types).
• The fitted regression (cost function) for each category should remain the same!
See more detailed discussion in the textbook and the companion slides.
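The invariance claim above can be checked with a small sketch: starting from fitted intercepts under the original reference (coefficient values are hypothetical), we re-express them with workers’ schools as the new reference and recover the same fitted intercepts.

```python
# Fitted intercepts under the original reference ('general'); the
# coefficient values are hypothetical, not the textbook estimates.
b1, b3, b4, b5 = 60_000.0, 40_000.0, 15_000.0, -5_000.0
intercepts = {"general": b1, "technical": b1 + b3,
              "workers": b1 + b4, "vocational": b1 + b5}

# Re-parameterize with workers' schools as the reference category:
new_b1 = intercepts["workers"]                          # new intercept
new_coefs = {t: v - new_b1 for t, v in intercepts.items()}  # workers' own coefficient is 0
rebuilt = {t: new_b1 + c for t, c in new_coefs.items()}

assert rebuilt == intercepts  # each type's fitted intercept is unchanged
print("fitted cost functions are invariant to the reference choice")
```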
3.2 Multiple grouping criteria
To separate the four groups of schools shown in Figure 4, formed by two grouping criteria, we need two sets of dummy variables. They are illustrated in the last two columns of the table in Figure 6. Since there are only two categories under each grouping criterion, we need one (two minus one) dummy variable for each criterion; in total we need two dummy variables.
The regression we run is
Yi = β1 + β2Xi + β3OCCi + β4RESi + ui, i = 1, . . . , n, (20)
where
• Yi = COSTi is the annual recurrent cost for running the ith school;
• Xi = Ni is the number of students in the ith school.
• OCCi = 1 if the ith school is an occupational school.
Figure 5: Three dummy variables for separating four types of secondary schools in Shanghai.
Figure 6: Two sets of dummy variables: OCC and RES.
• RESi = 1 if the ith school is a residential school.
Obviously, the ith school can be both occupational and residential, or neither. This means OCCi and RESi can both be one, or both be zero.
The fitted regression is written as (omitting the subscript i for observations)
Y = βˆ1 + βˆ2X + βˆ3OCC + βˆ4RES.
To interpret our parameter estimates, we consider:
• OCC = 0, RES = 0 : regular, nonresidential school cost function
Y = βˆ1 + βˆ2X. (21)
• OCC = 0, RES = 1 : regular, residential school cost function
Y = (βˆ1 + βˆ4) + βˆ2X. (22)
• OCC = 1, RES = 0 : occupational, nonresidential school cost function
Y = (βˆ1 + βˆ3) + βˆ2X. (23)
• OCC = 1, RES = 1 : occupational, residential school cost function
Y = (βˆ1 + βˆ3 + βˆ4) + βˆ2X. (24)
Interpretations:
1. βˆ1 is the overhead cost for regular, nonresidential school −→ see (21).
2. βˆ2 is the marginal cost of each additional student for all schools (because there is no
slope dummy yet) −→ see (21)–(24).
3. βˆ3 is the extra overhead cost for occupational, nonresidential school over the regular,
nonresidential school (compare (21) and (23)) and also the extra overhead cost for
occupational, residential school over the regular, residential school (compare (22) and
(24)).
4. βˆ4 is the extra overhead cost for the regular, residential school over the regular, nonresidential school (compare (21) and (22)) and also the extra overhead cost for the occupational, residential school over the occupational, nonresidential school (compare (23) and (24)).
Clearly, βˆ3 estimates the extra overhead cost for an occupational school over a regular school, irrespective of the school being residential or not. Likewise, βˆ4 estimates the extra overhead cost for a residential school over a nonresidential school, irrespective of the school being occupational or not. This is part of the restrictions in the model.
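This restriction is easy to see in a sketch of the fitted intercepts from (21)–(24); the coefficient values below are hypothetical illustrations.

```python
b1, b3, b4 = 50_000.0, 110_000.0, 58_000.0   # hypothetical estimates from (20)

def overhead(occ, res):
    """Fitted overhead cost b1 + b3*OCC + b4*RES for the four groups."""
    return b1 + b3 * occ + b4 * res

# The OCC gap is b3 whether or not the school is residential, and the RES
# gap is b4 whether or not the school is occupational: the model's
# built-in restriction.
print(overhead(1, 0) - overhead(0, 0))  # 110000.0
print(overhead(1, 1) - overhead(0, 1))  # 110000.0
```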