ECON 203C: System Models TA Note 1: Version 1
Instrumental Variable (IV) and Two Stage Least Squares (2SLS) Estimators
Hisayuki Yoshimoto
Last Modified: April 08, 2008

Abstract: In Section 1, we discuss the inconsistency of the ordinary least squares estimator when the underlying OLS assumptions are violated. Throughout this TA note, I use the classical labor economics regression example, the return to education. In Sections 2 and 3, we review the instrumental variable (IV) and two stage least squares (2SLS) estimators with their interpretations. In Section 4, we discuss the asymptotic properties of the 2SLS estimator. In Section 5, we review partial (residual) regression. Finally, in Section 6, we solve Comp 2006S Part III Question 1, an application of partial regression with instruments.
1 Inconsistency of OLS Estimator
1.1 Bias and Inconsistency with Endogenous Regressors
Roughly speaking¹, a regressor is called exogenous if it is not correlated with the error term, and a regressor is called endogenous if it is correlated with the error term. Here, we consider the OLS model in which the regressors are endogenous:

    y_i = x_i'β + u_i,

where y_i and u_i are 1×1, x_i' is 1×K, and β is K×1. The matrix notation is

    Y = Xβ + U,

where Y and U are N×1, X is N×K, and β is K×1.
However, unlike the usual OLS assumptions, here we assume

    E[u_i | x_i] ≠ 0,   E[x_i u_i] ≠ 0_{K×1},   E[U | X] ≠ 0_{N×1},

and

    (1/N) Σ_{i=1}^N x_i u_i →p E[x_i u_i] ≠ 0_{K×1},   (1/N) Σ_{i=1}^N x_i x_i' →p E[x_i x_i'].
Intuitively, the regressor x_i and the error term u_i are correlated. The OLS estimator is

    β̂_OLS = (X'X)^{-1} X'Y = (X'X)^{-1} X'(Xβ + U) = β + (X'X)^{-1} X'U.

¹These definitions are absolutely not formal. Here, I just present the intuition of exogenous and endogenous regressors.
Consider the expectation of β̂_OLS:

    E_{X,u}[β̂_OLS] = β + E_{X,u}[(X'X)^{-1} X'U]
                   = β + E_X[ E_{u|X}[(X'X)^{-1} X'U | X] ]        (Law of Iterated Expectation)
                   = β + E_X[ (X'X)^{-1} X' E_{u|X}[U | X] ]
                   ≠ β,   since E_{u|X}[U | X] ≠ 0_{N×1}.

Thus, the OLS estimator is biased (finite sample property). Furthermore,

    β̂_OLS = (X'X)^{-1} X'Y
          = ((1/N) Σ_{i=1}^N x_i x_i')^{-1} (1/N) Σ_{i=1}^N x_i y_i
          = ((1/N) Σ_{i=1}^N x_i x_i')^{-1} (1/N) Σ_{i=1}^N x_i (x_i'β + u_i)
          = β + ((1/N) Σ_{i=1}^N x_i x_i')^{-1} (1/N) Σ_{i=1}^N x_i u_i,

and

    plim β̂_OLS = β + plim((1/N) Σ_{i=1}^N x_i x_i')^{-1} · plim (1/N) Σ_{i=1}^N x_i u_i
               = β + (E[x_i x_i'])^{-1} E[x_i u_i]        (by WLLN, Slutsky, and continuity theorems)
               ≠ β,   since E[x_i u_i] ≠ 0_{K×1}.
Therefore, the OLS estimator is also inconsistent (large sample property). In summary, the OLS estimator with endogenous regressor(s) is not only biased in finite samples, but also inconsistent in large samples. This is a serious problem.
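The inconsistency is easy to see in a quick simulation. The sketch below uses made-up numbers (the covariance 0.5 and the true coefficient 2.0 are illustrative assumptions, not taken from the note): with E[x_i u_i] = 0.5 and E[x_i²] = 1, the OLS estimate converges to β + 0.5/1 = 2.5 rather than the true β = 2.

```python
import numpy as np

# Endogenous-regressor simulation (illustrative numbers only).
rng = np.random.default_rng(0)
n, beta = 1_000_000, 2.0

# Draw (x_i, u_i) jointly normal with Cov(x_i, u_i) = 0.5, so E[x_i u_i] != 0.
cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])
x, u = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
y = beta * x + u

# OLS for a single mean-zero regressor: (X'X)^{-1} X'Y
beta_ols = (x @ y) / (x @ x)

# plim of OLS is beta + E[x u]/E[x^2] = 2.0 + 0.5, not 2.0
print(beta_ols)
```

Increasing n does not help: the estimate concentrates around 2.5, which is exactly the large-sample statement above.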
1.2 Example
Consider the classic example in labor economics: estimating the return to education. We want to regress

    ln(hwage_i) = β1 + β2 edu_i + β3 ex_i + β4 ab_i + ε_i,

where

    hwage_i : hourly wage
    edu_i   : education length
    ex_i    : experience on current job
    ab_i    : ability (unobserved),

and we want to estimate the return-to-education parameter β2. The problem is that the explanatory variable ability² (ab_i) is usually unobserved, and researchers inevitably omit the ability term (ab_i). Therefore, the regression equation becomes

    ln(hwage_i) = β1 + β2 edu_i + β3 ex_i + u_i,   where u_i = β4 ab_i + ε_i.        (1)

²If you do not like the abstract terminology "ability", replace it by IQ.
Denote
    y_i = ln(hwage_i)   and   x_i = [1, edu_i, ex_i]'.

The OLS estimator is

    β̂_OLS = (X'X)^{-1} X'Y = β + ((1/N) Σ_{i=1}^N x_i x_i')^{-1} (1/N) Σ_{i=1}^N x_i u_i
          = β + ((1/N) Σ_{i=1}^N x_i x_i')^{-1} (1/N) Σ_{i=1}^N x_i (β4 ab_i + ε_i).
The omission of ability (ab_i) causes the so-called omitted variable bias. The labor economics literature states that there is a strong correlation between education (edu_i) and unobserved ability (ab_i): an individual with a long education length is expected to have high ability, i.e., education is an endogenous variable (education is correlated with the error term). Therefore, running the above regression causes a serious problem, a biased and inconsistent estimator. As a consequence, we cannot correctly estimate the return-to-education parameter β2.³
2 Instrumental Variable (IV) Estimator
2.1 IV Estimator
Keep considering the model

    y_i = x_i'β + u_i,   Y = Xβ + U.

Now assume that we have a K×1 instrumental vector z_i (containing the instrumental variables z_{i1}, ..., z_{iK}) with the following properties:

(1) z_i is uncorrelated with u_i;
(2) z_i is correlated with x_i.

Mathematically, we need the condition

    (1/N) Σ_{i=1}^N z_i u_i →p 0_{K×1}.

Then, the instrumental variable (IV) estimator is defined as

    β̂_IV = (Z'X)^{-1} Z'Y = ((1/N) Σ_{i=1}^N z_i x_i')^{-1} (1/N) Σ_{i=1}^N z_i y_i.

Consider the consistency of β̂_IV. We can rewrite the IV estimator and check its asymptotic behavior:

    β̂_IV = ((1/N) Σ_{i=1}^N z_i x_i')^{-1} (1/N) Σ_{i=1}^N z_i (x_i'β + u_i)
         = β + ((1/N) Σ_{i=1}^N z_i x_i')^{-1} (1/N) Σ_{i=1}^N z_i u_i
         →p β + (E[z_i x_i'])^{-1} E[z_i u_i] = β,   since E[z_i u_i] = 0_{K×1}.
Thus, the IV estimator is consistent (large sample property).

³Here, we assume experience (ex_i) is an exogenous variable (experience on the current job is uncorrelated with the error term or ability (ab_i)).
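The IV formula above can be sketched numerically. All data-generating numbers below (the coefficients 0.8, 0.6, 1.5 and the true β = (1, 2)') are assumptions for illustration: `ability` plays the role of ab_i, correlated with the regressor and the error but not with the instrument.

```python
import numpy as np

# IV vs. OLS with one endogenous regressor (illustrative simulated data).
rng = np.random.default_rng(1)
n = 500_000
beta = np.array([1.0, 2.0])                          # true intercept and slope

z1 = rng.normal(size=n)                              # instrument
ability = rng.normal(size=n)                         # unobserved confounder
x1 = 0.8 * z1 + 0.6 * ability + rng.normal(size=n)   # endogenous regressor
u = 1.5 * ability + rng.normal(size=n)               # error: correlated with x1, not z1

X = np.column_stack([np.ones(n), x1])                # N x K
Z = np.column_stack([np.ones(n), z1])                # N x K (just-identified case)
y = X @ beta + u

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)          # (Z'X)^{-1} Z'Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)         # biased, for comparison
print(beta_iv, beta_ols)
```

The IV estimate recovers (1, 2)', while the OLS slope is pushed up by Cov(x_i, u_i)/Var(x_i) = 0.9/2.0 = 0.45 in this design.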
2.2 Example of IV Estimator
Continuing the example of the return to education: the true model is

    ln(hwage_i) = β1 + β2 edu_i + β3 ex_i + β4 ab_i + ε_i.

However, due to the unavailability of ability (ab_i), we regress the equation

    ln(hwage_i) = β1 + β2 edu_i + β3 ex_i + u_i,   where u_i = β4 ab_i + ε_i.

As we discussed before, education is an endogenous variable (education is correlated with ability). Therefore, we need to employ the IV estimation method to consistently estimate the return-to-education parameter β2. So, what instrumental variable is available for this regression? We need to find a variable that is correlated with education and uncorrelated with ability (equivalently, uncorrelated with the error term). Using the last digit of the social security number as an instrument is a bad idea: it is not only uncorrelated with an individual's ability, but also uncorrelated with education.

Angrist and Krueger (1991, Quarterly Journal of Economics) suggest birth month as an instrument for education. In the U.S. school system, students are categorized into school years. As a consequence of this education system, there are first and last school-entry months. It is reported that students who were born in earlier months have higher school grades and SAT scores compared to students who were born in later months. As a consequence, students born earlier are more likely to go to college, so birth month is correlated with the length of education. Define the birth month variable as birthm_i (assigning the first month to 1 and the last month to 12). Then the IV vector is

    z_i = [1, birthm_i, ex_i]'.

Here, the constant is by definition uncorrelated with the error term (no matter what value the error takes, the constant stays the same). As we discussed above, birthm_i is uncorrelated with the error term (i.e., ability). Also, experience on the current job is uncorrelated with ability. Recall that we denote the dependent variable and the vector of regressors as

    y_i = ln(hwage_i),   x_i = [1, edu_i, ex_i]',   u_i = β4 ab_i + ε_i.

Then, the IV estimator is

    β̂_IV = (Z'X)^{-1} Z'Y = β + ((1/N) Σ_{i=1}^N z_i x_i')^{-1} (1/N) Σ_{i=1}^N z_i u_i
         = β + ((1/N) Σ_{i=1}^N z_i x_i')^{-1} (1/N) Σ_{i=1}^N z_i [β4 ab_i + ε_i],

and the second term of the above equation is expected to converge to 0_{3×1}.
2.3 Asymptotic Distribution of IV Estimator
Now we derive the asymptotic distribution of the IV estimator. As we discussed,

    β̂_IV = β + ((1/N) Σ_{i=1}^N z_i x_i')^{-1} (1/N) Σ_{i=1}^N z_i u_i.

Transforming the above equation,

    √N (β̂_IV − β) = ((1/N) Σ_{i=1}^N z_i x_i')^{-1} (1/√N) Σ_{i=1}^N z_i u_i.

We have

    (1/N) Σ_{i=1}^N z_i x_i' →p E[z_i x_i']   and   (1/√N) Σ_{i=1}^N z_i u_i →d N(0, E[u_i² z_i z_i']).

Then, by WLLN, Slutsky, and continuity theorems, the limiting distribution of β̂_IV is

    √N (β̂_IV − β) →d N( 0_{K×1}, (E[z_i x_i'])^{-1} E[u_i² z_i z_i'] ((E[z_i x_i'])^{-1})' ).

A consistent estimator of the asymptotic variance is obtained by

    ((1/N) Σ_{i=1}^N z_i x_i')^{-1} ((1/N) Σ_{i=1}^N û_i² z_i z_i') (((1/N) Σ_{i=1}^N z_i x_i')^{-1})',

where û_i = y_i − x_i'β̂_IV.
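The sandwich formula above takes only a few lines to compute. The simulated design below is an assumption for illustration (not from the note); the point is only the mechanics of building the variance estimate from the z_i x_i' and û_i² z_i z_i' averages.

```python
import numpy as np

# IV point estimate plus sandwich standard errors (illustrative simulated data).
rng = np.random.default_rng(2)
n = 200_000
ability = rng.normal(size=n)
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
X = np.column_stack([np.ones(n), 0.8 * Z[:, 1] + 0.5 * ability + rng.normal(size=n)])
u = ability + rng.normal(size=n)
y = X @ np.array([1.0, 2.0]) + u

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)          # (Z'X)^{-1} Z'Y
u_hat = y - X @ beta_iv                              # IV residuals

Szx = Z.T @ X / n                                    # (1/N) sum z_i x_i'
S_meat = (Z * (u_hat ** 2)[:, None]).T @ Z / n       # (1/N) sum u_hat_i^2 z_i z_i'
A_hat = np.linalg.solve(Szx, S_meat) @ np.linalg.inv(Szx.T)
se = np.sqrt(np.diag(A_hat) / n)                     # asymptotic standard errors
print(beta_iv, se)
```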
3 Two Stage Least Squares (2SLS) Estimator
3.1 2SLS Estimator
Assume that there are more instrumental variables than regressors. We might not want to discard any of the available instruments. What estimation methodology enables us to use all instruments? Rewriting the model,

    y_i = x_i'β + u_i,   Y = Xβ + U,

the vector of available instruments is now z_i, of dimension L×1 with L ≥ K, satisfying the condition

    (1/N) Σ_{i=1}^N z_i u_i →p E[z_i u_i] = 0_{L×1}.

We stack the instrument vectors and denote the N×L matrix of instruments by

    Z = [z_1, ..., z_N]'.

The two stage least squares (2SLS) estimator is defined as (you should check the dimensions of this estimator: X'Z is K×L, (Z'Z)^{-1} is L×L, Z'X is L×K, and Z'Y is L×1)

    β̂_2SLS = (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'Y = (X'P_Z X)^{-1} X'P_Z Y,

where P_Z is the N×N projection matrix of Z,

    P_Z = Z (Z'Z)^{-1} Z'.
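A minimal sketch of the over-identified case (L = 3 instruments, K = 2 regressors), with made-up coefficients; note that the K×K matrix X'Z(Z'Z)^{-1}Z'X can be built without ever forming the N×N matrix P_Z.

```python
import numpy as np

# 2SLS with more instruments than regressors (illustrative simulated data).
rng = np.random.default_rng(3)
n = 300_000
ability = rng.normal(size=n)
z1, z2 = rng.normal(size=(2, n))                        # two excluded instruments
x1 = 0.5 * z1 + 0.5 * z2 + 0.7 * ability + rng.normal(size=n)
u = ability + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1])                   # N x K
Z = np.column_stack([np.ones(n), z1, z2])               # N x L, L > K
y = X @ np.array([1.0, 2.0]) + u

# beta_2sls = (X' P_Z X)^{-1} X' P_Z Y, computed via (Z'Z)^{-1} Z'X and (Z'Z)^{-1} Z'Y
ZtZinv_ZtX = np.linalg.solve(Z.T @ Z, Z.T @ X)
ZtZinv_Zty = np.linalg.solve(Z.T @ Z, Z.T @ y)
beta_2sls = np.linalg.solve(X.T @ Z @ ZtZinv_ZtX, X.T @ Z @ ZtZinv_Zty)
print(beta_2sls)
```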
3.2 Example of 2SLS Estimator
Revisiting the example of the return to education, rewrite the model:

    ln(hwage_i) = β1 + β2 edu_i + β3 ex_i + β4 ab_i + ε_i   (where ab_i is an unavailable variable),
    ln(hwage_i) = β1 + β2 edu_i + β3 ex_i + u_i,   where u_i = β4 ab_i + ε_i.

We have discussed that edu_i is an endogenous variable, i.e., correlated with the error term (or ability ab_i). We have also argued that we can utilize birth month as an instrument for education. In addition, Card (1995, Aspects of Labour Market Behavior) suggests the vicinity of a four-year college as an instrument for education. Prof. Card argues that high school students who live close to a four-year college expect lower expenditures for college life because they can commute from their homes. As a result, high school students who live near four-year colleges are more likely to obtain opportunities for college education. On the other hand, the vicinity of a college has no relation to individual ability. (Does an individual who was born close to a college have high ability? I don't think so.) Therefore, we can use the vicinity of a four-year college as an instrument.

Denote the distance between individual i and the nearest four-year college as dist_i. Then the 4×1 instrument vector and the stacked N×4 instrument matrix are

    z_i = [1, birthm_i, dist_i, ex_i]'   and   Z = [z_1, ..., z_N]',

and the 2SLS estimate is obtained by

    β̂_2SLS = (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'Y,

where X'Z is 3×4, (Z'Z)^{-1} is 4×4, Z'X is 4×3, and Z'Y is 4×1.

3.3 Two Stage Interpretation of 2SLS Estimator
There are two interpretations of 2SLS. The first interpretation is straightforward: implementing least squares twice, in a first and a second stage.

First Stage: In the first stage, we project x_i on z_i (or equivalently, project X on Z):

    x_i = Γ z_i + v_i,

where x_i is K×1, Γ is K×L, z_i is L×1, and v_i is a K×1 error vector. We take the transpose of the above equation:

    x_i' = z_i' Γ' + v_i'.

By stacking the transposed vectors, we obtain the matrix notation

    X = Z Π + V,

where X is N×K, Z is N×L, Π = Γ' is L×K, and V is N×K. Regressing this equation by least squares (least squares in the matrix sense), we obtain the OLS estimator of Π:

    Π̂ = (Z'Z)^{-1} Z'X.

Thus we obtain the projection of X on Z (the projection of a matrix on another matrix):

    X̂ = Z Π̂ = Z (Z'Z)^{-1} Z'X = P_Z X   (where P_Z = Z (Z'Z)^{-1} Z'),

where P_Z is the projection matrix of Z. Note that we denote by X̂ the projection of X on Z.

Second Stage: In the second stage, we regress Y on the projected matrix X̂:

    Y = X̂ β + error.

Then the least squares estimator of β is equal to the 2SLS estimator:

    β̂ = (X̂'X̂)^{-1} X̂'Y
       = ((P_Z X)' P_Z X)^{-1} (P_Z X)' Y                          (substituting X̂ = P_Z X)
       = (X'P_Z P_Z X)^{-1} X'P_Z Y                               (since P_Z is symmetric, P_Z' = P_Z)
       = (X'P_Z X)^{-1} X'P_Z Y                                   (since P_Z is idempotent, P_Z P_Z = P_Z)
       = (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'Y             (substituting P_Z = Z (Z'Z)^{-1} Z')
       = β̂_2SLS.

The name "two stage least squares" comes from this two-stage procedure: projection in the first stage and regression on the projected matrix in the second stage. Formally, the above two-stage procedure can be described as follows. To the model equation

    y_i = x_i'β + u_i,   Y = Xβ + U,

we multiply the projection matrix P_Z from the left:

    P_Z Y = P_Z X β + P_Z U.                                      (2)

Let us call this operation "exogenizing", since we project the endogenous variable matrix X on the exogenous variable matrix Z. Applying OLS to equation (2),

    β̂_OLS = ((P_Z X)' P_Z X)^{-1} (P_Z X)' P_Z Y
           = (X'P_Z P_Z X)^{-1} X'P_Z P_Z Y
           = (X'P_Z X)^{-1} X'P_Z Y                               (since P_Z is idempotent, P_Z P_Z = P_Z)
           = β̂_2SLS.

Thus, the two stage least squares procedure is nothing more than regressing the "exogenized" model equation.
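The two-stage interpretation can be verified numerically: running the explicit first stage X̂ = P_Z X and then OLS of Y on X̂ reproduces the one-shot 2SLS formula. The dataset below is an assumed toy example.

```python
import numpy as np

# Two-stage procedure vs. one-shot 2SLS formula on simulated data.
rng = np.random.default_rng(4)
n = 2_000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])          # N x 3
X = np.column_stack([np.ones(n), Z[:, 1] + Z[:, 2] + rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# First stage: X_hat = P_Z X; second stage: OLS of y on X_hat
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_two_stage = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# One-shot formula (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'Y
A = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
beta_one_shot = np.linalg.solve(A, b)

print(np.allclose(beta_two_stage, beta_one_shot))
```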
3.4 GLS Interpretation of 2SLS
Another interpretation is generalized least squares (GLS). Note that we have the model

    y_i = x_i'β + u_i,   Y = Xβ + U.

Multiplying Z' into the above equation from the left,

    Z'Y = Z'Xβ + Z'U,                                             (3)

and we regress this equation with the GLS method. The error vector Z'U has conditional expectation and variance

    E_{u|Z}[Z'U | Z] = Z' E_{u|Z}[U | Z] = 0_{L×1},
    Var_{u|Z}[Z'U | Z] = E_{u|Z}[ (Z'U − E[Z'U | Z]) (Z'U − E[Z'U | Z])' | Z ]
                       = E_{u|Z}[Z'U U'Z | Z] = Z' E_{u|Z}[U U' | Z] Z = Z' (σ_u² I_N) Z = σ_u² Z'Z.

Here we assume a homoskedastic error (you can extend to the heteroskedastic case easily). Therefore, the variance matrix is Ω = σ_u² Z'Z, and the GLS estimator of equation (3) is

    β̂_GLS = ((Z'X)' Ω^{-1} (Z'X))^{-1} (Z'X)' Ω^{-1} (Z'Y)
          = (X'Z (σ_u² Z'Z)^{-1} Z'X)^{-1} X'Z (σ_u² Z'Z)^{-1} Z'Y   (substituting Ω^{-1} = (σ_u² Z'Z)^{-1})
          = (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'Y             (the σ_u²'s cancel out)
          = β̂_2SLS.

Therefore, 2SLS is the GLS estimator of equation (3).
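The GLS interpretation can be checked the same way: weighting equation (3) by (Z'Z)^{-1} (the σ_u² cancels) gives exactly the 2SLS numbers. Again, the data below are simulated assumptions.

```python
import numpy as np

# GLS on Z'Y = Z'X b + Z'U with weight (Z'Z)^{-1} equals 2SLS (toy data).
rng = np.random.default_rng(5)
n = 1_500
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X = np.column_stack([np.ones(n), Z[:, 1] - Z[:, 2] + rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

ZtX, Zty, ZtZ = Z.T @ X, Z.T @ y, Z.T @ Z
# GLS: (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'Y
beta_gls = np.linalg.solve(ZtX.T @ np.linalg.solve(ZtZ, ZtX),
                           ZtX.T @ np.linalg.solve(ZtZ, Zty))
# 2SLS via projected regressors X_hat = P_Z X
X_hat = Z @ np.linalg.solve(ZtZ, ZtX)
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
print(np.allclose(beta_gls, beta_2sls))
```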
3.5 Asymptotic Distribution of 2SLS Estimator

We will discuss the asymptotic distribution of the 2SLS estimator in the following question (Final Review: Question 8), so let me omit this subsection here.
4 Final Review: Question 8 - 2SLS Estimator and Its Asymptotic Distribution
Consider the linear model

    y_i = x_i'β + u_i,   with E[u_i | x_i] ≠ 0,

for i = 1, ..., n, where x_i is a K×1 vector of regressors. Suppose that there exists a vector of random variables z_i such that

    E[u_i | z_i] = 0,

where z_i is an M×1 vector with M > K.

(1) Show that a least squares regression of y_i on x_i will yield an inconsistent estimator for β.

Answer: We have

    β̂_OLS = (X'X)^{-1} X'Y
          = ((1/n) Σ_{i=1}^n x_i x_i')^{-1} (1/n) Σ_{i=1}^n x_i y_i
          = ((1/n) Σ_{i=1}^n x_i x_i')^{-1} (1/n) Σ_{i=1}^n x_i (x_i'β + u_i)
          = β + ((1/n) Σ_{i=1}^n x_i x_i')^{-1} (1/n) Σ_{i=1}^n x_i u_i
          →p β + (E[x_i x_i'])^{-1} E[x_i u_i]         (by WLLN, Slutsky, and continuity theorems)
          ≠ β,   since E[x_i u_i] = E_{x_i}[ x_i E[u_i | x_i] ] ≠ 0_{K×1}.
Therefore, the OLS estimator is inconsistent.

(2) Suggest an instrumental variable estimator for β using the entire vector of instruments z_i.

Answer: Since we have the condition M > K, we suggest the 2SLS estimator:

    β̂_2SLS = (X'P_Z X)^{-1} X'P_Z Y = (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'Y.
(3) Show that the estimator suggested in (2) can be viewed as a GMM estimator.

Answer: We will discuss this question when we study GMM.

(4) Using the fact established in (3), provide the asymptotic distribution of the estimator for β.

Answer: Actually, we do not need to answer (3) to solve this question. Arranging the 2SLS estimator,

    β̂_2SLS = β + (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'U
           = β + ( (1/n X'Z) (1/n Z'Z)^{-1} (1/n Z'X) )^{-1} (1/n X'Z) (1/n Z'Z)^{-1} (1/n Z'U)

(the 1/n's are created). Moving β to the LHS and multiplying by √n,

    √n (β̂_2SLS − β) = ( (1/n X'Z) (1/n Z'Z)^{-1} (1/n Z'X) )^{-1} (1/n X'Z) (1/n Z'Z)^{-1} (1/√n Z'U).

Now, by WLLN we have

    (1/n) X'Z = (1/n) Σ_{i=1}^n x_i z_i' →p E[x_i z_i'],
    (1/n) Z'Z = (1/n) Σ_{i=1}^n z_i z_i' →p E[z_i z_i'],

and by CLT we have

    (1/√n) Z'U = (1/√n) Σ_{i=1}^n z_i u_i →d N(0_{M×1}, Var[z_i u_i]) = N(0_{M×1}, E[u_i² z_i z_i']).
Therefore, the limiting distribution is

    √n (β̂_2SLS − β) →d N(0_{K×1}, A),

where A is defined as

    A = ( E[x_i z_i'] (E[z_i z_i'])^{-1} E[x_i z_i']' )^{-1}
        E[x_i z_i'] (E[z_i z_i'])^{-1} E[u_i² z_i z_i'] (E[z_i z_i'])^{-1} E[x_i z_i']'
        ( E[x_i z_i'] (E[z_i z_i'])^{-1} E[x_i z_i']' )^{-1}.

In the special case of a homoskedastic error, E[u_i² | z_i] = σ_u², we have

    E[u_i² z_i z_i'] = E_{z_i}[ E[u_i² | z_i] z_i z_i' ] = σ_u² E[z_i z_i'].

Then the variance collapses to

    σ_u² ( E[x_i z_i'] (E[z_i z_i'])^{-1} E[x_i z_i']' )^{-1} = B.

Therefore, the asymptotic distribution of β̂_2SLS is

    √n (β̂_2SLS − β) →d N(0_{K×1}, A)   in the case of heteroskedastic error,
    √n (β̂_2SLS − β) →d N(0_{K×1}, B)   in the case of homoskedastic error,

or equivalently,

    β̂_2SLS ~a N(β, (1/n) A)   in the case of heteroskedastic error,
    β̂_2SLS ~a N(β, (1/n) B)   in the case of homoskedastic error.
(5) Provide a consistent estimator for the asymptotic covariance matrix established in (4). Justify your answer.

Answer: By WLLN we have

    S_xz = (1/n) Σ_{i=1}^n x_i z_i' →p E[x_i z_i'],
    S_zz = (1/n) Σ_{i=1}^n z_i z_i' →p E[z_i z_i'],
    (1/n) Σ_{i=1}^n û_i² z_i z_i' →p E[u_i² z_i z_i'],

where û_i = y_i − x_i'β̂_2SLS. Therefore, a consistent estimator of the asymptotic variance is

    Â = ( S_xz S_zz^{-1} S_xz' )^{-1} S_xz S_zz^{-1} ( (1/n) Σ_{i=1}^n û_i² z_i z_i' ) S_zz^{-1} S_xz' ( S_xz S_zz^{-1} S_xz' )^{-1}.

In the special case of a homoskedastic error, the asymptotic variance estimator collapses to

    B̂ = σ̂_u² ( S_xz S_zz^{-1} S_xz' )^{-1},   where σ̂_u² = (1/n) Σ_{i=1}^n û_i² = (1/n) Σ_{i=1}^n (y_i − x_i'β̂_2SLS)².

Consistency is established by LLN, Slutsky, and continuity theorems.
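The estimators Â and B̂ from (5) can be sketched as follows. The simulated design (homoskedastic error, M = 3 instruments, K = 2 regressors) is an assumption for illustration, chosen so that the robust Â and the homoskedastic B̂ should roughly agree.

```python
import numpy as np

# A_hat (robust) and B_hat (homoskedastic) variance estimators for 2SLS.
rng = np.random.default_rng(6)
n = 100_000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])      # M = 3
x1 = Z[:, 1] + 0.5 * Z[:, 2] + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])                           # K = 2
u = rng.normal(size=n)                                          # homoskedastic here
y = X @ np.array([1.0, 2.0]) + u

Szz = Z.T @ Z / n                                               # (1/n) sum z_i z_i'
Sxz = X.T @ Z / n                                               # (1/n) sum x_i z_i'
Q = Sxz @ np.linalg.solve(Szz, Sxz.T)                           # Sxz Szz^{-1} Sxz'
beta = np.linalg.solve(Q, Sxz @ np.linalg.solve(Szz, Z.T @ y / n))
u_hat = y - X @ beta

S_uzz = (Z * (u_hat ** 2)[:, None]).T @ Z / n                   # (1/n) sum u_hat^2 z z'
meat = Sxz @ np.linalg.solve(Szz, S_uzz) @ np.linalg.solve(Szz, Sxz.T)
A_hat = np.linalg.solve(Q, meat) @ np.linalg.inv(Q)             # Q^{-1} meat Q^{-1}
B_hat = (u_hat @ u_hat / n) * np.linalg.inv(Q)                  # sigma_hat^2 Q^{-1}

print(np.sqrt(np.diag(A_hat / n)), np.sqrt(np.diag(B_hat / n)))
```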
5 Review of Partial (Residual) Regression
In this section, we review partial regression, which you learned in Prof. Kyriazidou's Note 2. We need partial regression for solving the Comp question (Comp 2003S Part III) in the next section. For a formal derivation of partial regression, please refer to the appendix of Prof. Kyriazidou's Note #2. Here, we just review the formulas and discuss their interpretations. If the regressor X is partitioned into two groups X_1 and X_2,

    Y = X_1 β_1 + X_2 β_2 + u,

the OLS estimators of β_1 and β_2 are given by

    β̂_1 = (X_1' M_{X_2} X_1)^{-1} X_1' M_{X_2} Y,
    β̂_2 = (X_2' M_{X_1} X_2)^{-1} X_2' M_{X_1} Y,

where M_{X_1} and M_{X_2} are the N×N residual operators

    M_{X_1} = I_N − X_1 (X_1'X_1)^{-1} X_1',
    M_{X_2} = I_N − X_2 (X_2'X_2)^{-1} X_2'.

Notice that M_{X_1} and M_{X_2} are idempotent matrices:

    M_{X_1} M_{X_1} = M_{X_1},   M_{X_2} M_{X_2} = M_{X_2}.

Intuitively, the residual operator M_{X_1} extracts the components that X_1 cannot explain. Similarly, the residual operator M_{X_2} extracts the components that X_2 cannot explain. Denote the residuals by

    X̃_1 = M_{X_2} X_1,   X̃_2 = M_{X_1} X_2.

The OLS estimators can be written as (by using the idempotent property of M_{X_1} and M_{X_2})

    β̂_1 = (X̃_1' X̃_1)^{-1} X̃_1' Y,   β̂_2 = (X̃_2' X̃_2)^{-1} X̃_2' Y.

Here, for β̂_1, we first regress X_1 on X_2 and obtain the residual X̃_1 = M_{X_2} X_1 (extracting the components that X_2 cannot explain); then we regress Y on X̃_1. Similarly, for β̂_2, we first regress X_2 on X_1 and obtain the residual X̃_2 = M_{X_1} X_2 (extracting the components that X_1 cannot explain); then we regress Y on X̃_2. Alternatively, the OLS estimators can be written as (again, by the idempotent property of M_{X_1} and M_{X_2})

    β̂_1 = (X̃_1' X̃_1)^{-1} X̃_1' Ỹ   with Ỹ = M_{X_2} Y,
    β̂_2 = (X̃_2' X̃_2)^{-1} X̃_2' Ỹ   with Ỹ = M_{X_1} Y.
6 Comp 2003S Part III (Buchinsky): Question 1

Consider the neoclassical regression model

    y_i = β'x_i + γ'w_i + u_i   (i = 1, ..., n),

where β is a k×1 vector of parameters and γ is a p×1 vector of parameters. Also, for x_i we have

    E[x_i u_i] = 0,

and for w_i we have

    E[w_i u_i] ≠ 0.
(a) Can the coefficient vector β be consistently estimated by a least-squares regression? Demonstrate your answer as precisely as possible.

Answer: The matrix notation of the model is

    Y = Xβ + Wγ + U,

where Y is n×1, X = [x_1, ..., x_n]' is n×k, W = [w_1, ..., w_n]' is n×p, and U is n×1. Consider the partial OLS estimator

    β̂_OLS = (X'M_W X)^{-1} X'M_W Y,   where M_W = I_n − W (W'W)^{-1} W'.

Checking the consistency of β̂_OLS:

    β̂_OLS = (X'M_W X)^{-1} X'M_W (Xβ + Wγ + U)       (substituting Y = Xβ + Wγ + U)
          = β + (X'M_W X)^{-1} X'M_W W γ + (X'M_W X)^{-1} X'M_W U
          = β + ((1/n) X'M_W X)^{-1} (1/n) X'M_W U    (since M_W W = W − W (W'W)^{-1} W'W = W − W = O_{n×p}).

Notice that

    (1/n) X'M_W U = (1/n) X'(I_n − W (W'W)^{-1} W') U
                  = (1/n) X'U − ((1/n) X'W) ((1/n) W'W)^{-1} ((1/n) W'U)      (the 1/n's are created)
                  = (1/n) Σ_{i=1}^n x_i u_i − ((1/n) Σ_{i=1}^n x_i w_i') ((1/n) Σ_{i=1}^n w_i w_i')^{-1} ((1/n) Σ_{i=1}^n w_i u_i)
                  →p E[x_i u_i] − E[x_i w_i'] (E[w_i w_i'])^{-1} E[w_i u_i]
                  ≠ 0_{k×1}   in general, since E[x_i u_i] = 0_{k×1} but E[w_i u_i] ≠ 0_{p×1}.
Therefore, β̂_OLS does not converge in probability to β, i.e., the OLS estimator is inconsistent.

(b) Suppose that Cov[x_i, w_i'] = 0 and X'W = 0, where X = (x_1, ..., x_n)' and W = (w_1, ..., w_n)'. Suppose also that the vector z_i' = (z_{1i}, ..., z_{li}) (with l > p) is a proper instrument for w_i' = (w_{1i}, ..., w_{pi}), and let Z = (z_1, ..., z_n)'. None of the elements of z_i is equal to any of the elements of x_i. Compute the instrumental variable estimator for γ in the regression that includes both x and w.

Answer: Transform the model by multiplying P_Z from the left, exogenizing the model:

    P_Z Y = P_Z X β + P_Z W γ + P_Z U,   i.e.,   Ŷ = X̂ β + Ŵ γ + Û,                (4)

where

    P_Z = Z (Z'Z)^{-1} Z'   (n×n),   Ŷ = P_Z Y,   X̂ = P_Z X,   Ŵ = P_Z W,   Û = P_Z U.
Note that in equation (4), W was endogenous, i.e., correlated with the error term; this is why we needed to exogenize the model. Then we can apply partial regression to equation (4). The OLS partial regression estimator of γ is (call it the exogenized OLS)

    γ̂_ExogenizedOLS = (Ŵ'M_X̂ Ŵ)^{-1} Ŵ'M_X̂ Ŷ.

Checking the consistency of this estimator:

    γ̂_ExogenizedOLS = (Ŵ'M_X̂ Ŵ)^{-1} Ŵ'M_X̂ Ŷ
                    = (Ŵ'M_X̂ Ŵ)^{-1} Ŵ'M_X̂ (X̂β + Ŵγ + Û)
                    = (Ŵ'M_X̂ Ŵ)^{-1} Ŵ'M_X̂ X̂ β + γ + (Ŵ'M_X̂ Ŵ)^{-1} Ŵ'M_X̂ Û
                    = γ + (Ŵ'M_X̂ Ŵ)^{-1} Ŵ'M_X̂ Û      (since M_X̂ X̂ = O_{n×k}).

Discussing the second term of the above equation: with Û = P_Z U and M_X̂ = I_n − X̂ (X̂'X̂)^{-1} X̂',

    (1/n) Ŵ'M_X̂ Û = (1/n) Ŵ'Û − ((1/n) Ŵ'X̂) ((1/n) X̂'X̂)^{-1} ((1/n) X̂'Û),

and each term involving Û converges to zero; for example,

    (1/n) X̂'Û = (1/n) X'P_Z U = ((1/n) X'Z) ((1/n) Z'Z)^{-1} ((1/n) Z'U)
              →p E[x_i z_i'] (E[z_i z_i'])^{-1} E[z_i u_i] = 0_{k×1}      (by WLLN, since E[z_i u_i] = 0),

and similarly (1/n) Ŵ'Û →p 0_{p×1}, so (1/n) Ŵ'M_X̂ Û →p 0_{p×1}. Therefore, by WLLN, Slutsky, and continuity theorems,

    γ̂_ExogenizedOLS →p γ,
i.e., γ̂_ExogenizedOLS is consistent.

(c) Under the conditions in (b), consider the following estimation procedure: (i) estimate β from a regression of Y on X, and compute Ỹ = M_X Y (where M_X = I_n − X (X'X)^{-1} X'); (ii) estimate γ by computing the instrumental variable estimator from a regression of Ỹ on w, using z as the instrumental variable for w.

Answer: Following the suggestion given in the question, we regress Y on X and obtain the residual

    Ỹ = M_X Y.

Next, we project W on the instrument matrix Z and obtain the projected matrix

    Ŵ = P_Z W.

Then, regressing Ỹ on Ŵ, we derive the alternative estimator

    γ̂_Alternative = (Ŵ'Ŵ)^{-1} Ŵ'Ỹ.

(d) Compare the estimators for γ from (b) and (c). Explain the difference and/or the similarity.

Answer: In (b) and (c), we have derived the two estimators

    γ̂_ExogenizedOLS = (Ŵ'M_X̂ Ŵ)^{-1} Ŵ'M_X̂ Ŷ,
    γ̂_Alternative  = (Ŵ'Ŵ)^{-1} Ŵ'Ỹ.

Are these estimators equivalent? The answer is yes, but we need a very strong assumption: the orthogonality condition between x_i and z_i, i.e., E[x_i z_i'] = O_{k×l}.⁴ Intuitively, we multiply M_X into the exogenized equation:

    P_Z Y = P_Z X β + P_Z W γ + P_Z U,
    M_X P_Z Y = M_X P_Z X β + M_X P_Z W γ + M_X P_Z U.

Since X and Z are orthogonal, we have

    M_X P_Z = (I_n − X (X'X)^{-1} X') Z (Z'Z)^{-1} Z'
            = Z (Z'Z)^{-1} Z' − X (X'X)^{-1} (X'Z) (Z'Z)^{-1} Z'
            = Z (Z'Z)^{-1} Z'        (since X'Z = O_{k×l})
            = P_Z,

and since M_X and P_Z are symmetric, P_Z M_X = (M_X P_Z)' = P_Z as well. The equation therefore becomes

    P_Z M_X Y = P_Z M_X X β + P_Z W γ + P_Z U,

where P_Z M_X X = O_{n×k} (since M_X X = O_{n×k}), so

    P_Z Ỹ = P_Z W γ + P_Z U.

The OLS estimator of this equation is

    γ̂_OLS = ((P_Z W)' P_Z W)^{-1} (P_Z W)' P_Z Ỹ = (Ŵ'Ŵ)^{-1} Ŵ'Ỹ = γ̂_Alternative.

Moreover, under the same orthogonality condition, Ŵ'X̂ = W'P_Z X = W'Z (Z'Z)^{-1} (Z'X) = O_{p×k}, so M_X̂ Ŵ = Ŵ and Ŵ'Ỹ = W'P_Z M_X Y = W'P_Z Y = Ŵ'Ŷ; hence γ̂_ExogenizedOLS = (Ŵ'Ŵ)^{-1} Ŵ'Ŷ = γ̂_Alternative.

⁴Here, my discussion is very sloppy. Formally, we need (1/n) Σ_{i=1}^n x_i z_i' →p E[x_i z_i'] = O_{k×l}.