3A.1
DERIVATION OF LEAST-SQUARES ESTIMATES

Differentiating (3.1.2) partially with respect to $\hat\beta_1$ and $\hat\beta_2$, we obtain

$$\frac{\partial \sum e_i^2}{\partial \hat\beta_1} = -2\sum\bigl(Y_i - \hat\beta_1 - \hat\beta_2 X_i\bigr) = -2\sum e_i \qquad (1)$$

$$\frac{\partial \sum e_i^2}{\partial \hat\beta_2} = -2\sum\bigl(Y_i - \hat\beta_1 - \hat\beta_2 X_i\bigr)X_i = -2\sum e_i X_i \qquad (2)$$

where $e_i = Y_i - \hat\beta_1 - \hat\beta_2 X_i$ and $\text{RSS} = \sum e_i^2$.

Setting equations (1) and (2) equal to zero gives the estimators. Note that $\hat\beta_1 = \bar{Y} - \hat\beta_2\bar{X}$.
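As a quick numerical check of (1) and (2), the sketch below (Python with NumPy; the simulated data, seed, and variable names are illustrative assumptions, not from the text) computes the closed-form estimates and confirms that the resulting residuals satisfy both first-order conditions:

```python
# Verify the normal equations: sum(e_i) = 0 and sum(e_i * X_i) = 0.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)

# Closed-form least-squares estimates implied by the normal equations
x_dev = x - x.mean()
b2_hat = (x_dev * (y - y.mean())).sum() / (x_dev ** 2).sum()
b1_hat = y.mean() - b2_hat * x.mean()    # beta1_hat = Ybar - beta2_hat * Xbar

e = y - b1_hat - b2_hat * x              # residuals
print(e.sum())                           # ~0, condition (1)
print((e * x).sum())                     # ~0, condition (2)
```

Both printed sums are zero up to floating-point error, as the normal equations require.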
3A.2
LINEARITY AND UNBIASEDNESS PROPERTIES
OF LEAST-SQUARES ESTIMATORS
From earlier calculations, we have

$$\hat\beta_2 = \frac{\sum x_i Y_i}{\sum x_i^2} = \sum k_i Y_i \qquad (3)$$

where

$$k_i = \frac{x_i}{\sum x_i^2}$$

which shows that $\hat\beta_2$ is a linear estimator because it is a linear function of $Y$; actually it is a weighted average of the $Y_i$, with the $k_i$ serving as the weights. It can similarly be shown that $\hat\beta_1$ too is a linear estimator.

Incidentally, note these properties of the weights $k_i$:

1. Since the $X_i$ are assumed to be nonstochastic, the $k_i$ are nonstochastic too.
2. $\sum k_i = 0$.
3. $\sum k_i^2 = 1/\sum x_i^2$.
4. $\sum k_i x_i = \sum k_i X_i = 1$.

Note that for any two variables $Y$ and $X$, $\sum x_i Y_i = \sum X_i y_i = \sum x_i y_i$. If $\bar{X}$ and $\bar{Y}$ are both 0, then all three will be equal to $\sum X_i Y_i$.
To prove property 2 (similarly you can prove 3 and 4),

$$\sum k_i = \sum\frac{x_i}{\sum x_i^2} = \frac{1}{\sum x_i^2}\sum x_i = 0$$

since for a given sample $\sum x_i^2$ is known, and since $\sum x_i$, the sum of deviations from the mean value, is always zero.
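These properties, and the deviation-form identity noted above, are easy to confirm numerically. A minimal sketch (the sample values are arbitrary illustrative numbers):

```python
# Check properties 2-4 of the weights k_i and the identity
# sum(x_i*Y_i) = sum(X_i*y_i) = sum(x_i*y_i).
import numpy as np

X = np.array([1.0, 3.0, 4.0, 6.0, 8.0])
Y = np.array([2.0, 5.0, 4.5, 7.0, 9.5])
x, y = X - X.mean(), Y - Y.mean()           # deviations from the means
k = x / (x ** 2).sum()                      # k_i = x_i / sum(x_i^2)

print(k.sum())                              # ~0        (property 2)
print((k ** 2).sum(), 1 / (x ** 2).sum())   # equal     (property 3)
print((k * x).sum(), (k * X).sum())         # both 1    (property 4)
print((x * Y).sum(), (X * y).sum(), (x * y).sum())   # all three equal
```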
Now substitute the PRF $Y_i = \beta_1 + \beta_2 X_i + u_i$ into (3) to obtain

$$\begin{aligned}
\hat\beta_2 &= \sum k_i(\beta_1 + \beta_2 X_i + u_i)\\
&= \beta_1\sum k_i + \beta_2\sum k_i X_i + \sum k_i u_i\\
&= \beta_2 + \sum k_i u_i
\end{aligned} \qquad (4)$$

where use is made of the properties of $k_i$ noted earlier.

Now taking expectation of (4) on both sides and noting that the $k_i$, being nonstochastic, can be treated as constants, we obtain

$$E(\hat\beta_2) = \beta_2 + \sum k_i E(u_i) = \beta_2 \qquad (5)$$

since $E(u_i) = 0$ by assumption. Therefore, $\hat\beta_2$ is an unbiased estimator of $\beta_2$. Likewise, it can be proved that $\hat\beta_1$ is also an unbiased estimator of $\beta_1$.
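Unbiasedness can also be illustrated by simulation. The sketch below (parameter values, sample size, and number of replications are illustrative assumptions) draws many samples from the PRF and averages $\hat\beta_2 = \sum k_i Y_i$:

```python
# Monte Carlo illustration of Eq. (5): E(beta2_hat) = beta2.
import numpy as np

rng = np.random.default_rng(42)
beta1, beta2, n, reps = 1.0, 2.0, 30, 20_000
X = np.linspace(0, 10, n)                 # fixed (nonstochastic) regressors
k = (X - X.mean()) / ((X - X.mean()) ** 2).sum()

estimates = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, 1, n)               # E(u_i) = 0
    Y = beta1 + beta2 * X + u
    estimates[r] = (k * Y).sum()          # beta2_hat = sum(k_i * Y_i), Eq. (3)

print(estimates.mean())                   # ~2.0 = beta2
```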
3A.3
VARIANCES AND STANDARD ERRORS
OF LEAST-SQUARES ESTIMATORS
Now by the definition of variance, we can write

$$\begin{aligned}
\operatorname{var}(\hat\beta_2) &= E\bigl[\hat\beta_2 - E(\hat\beta_2)\bigr]^2\\
&= E(\hat\beta_2 - \beta_2)^2 \qquad \text{since } E(\hat\beta_2) = \beta_2\\
&= E\Bigl(\sum k_i u_i\Bigr)^2 \qquad \text{using Eq. (4) above}\\
&= E\bigl(k_1^2 u_1^2 + k_2^2 u_2^2 + \cdots + k_n^2 u_n^2 + 2k_1 k_2 u_1 u_2 + \cdots + 2k_{n-1}k_n u_{n-1}u_n\bigr)
\end{aligned} \qquad (6)$$

Since by assumption $E(u_i^2) = \sigma^2$ for each $i$ and $E(u_i u_j) = 0$, $i \neq j$, it follows that

$$\operatorname{var}(\hat\beta_2) = \sigma^2\sum k_i^2 = \frac{\sigma^2}{\sum x_i^2} \qquad \text{(using the definition of } k_i^2\text{)} \qquad (7)$$

The variance of $\hat\beta_1$ can be obtained following the same line of reasoning:

$$\operatorname{var}(\hat\beta_1) = \frac{\sum X_i^2}{n\sum x_i^2}\,\sigma^2 \qquad (8)$$

Once the variances are obtained, their square roots give the corresponding standard errors.
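The analytic variances can be checked against Monte Carlo variances of the estimators. In the sketch below the values of $\beta_1$, $\beta_2$, $\sigma$, and $n$ are illustrative assumptions:

```python
# Compare Eqs. (7) and (8) with simulated sampling variances.
import numpy as np

rng = np.random.default_rng(7)
beta1, beta2, sigma, n, reps = 1.0, 2.0, 1.5, 25, 20_000
X = np.linspace(1, 10, n)
x = X - X.mean()

var_b2 = sigma ** 2 / (x ** 2).sum()                         # Eq. (7)
var_b1 = sigma ** 2 * (X ** 2).sum() / (n * (x ** 2).sum())  # Eq. (8)

b1s, b2s = np.empty(reps), np.empty(reps)
for r in range(reps):
    Y = beta1 + beta2 * X + rng.normal(0, sigma, n)
    b2s[r] = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
    b1s[r] = Y.mean() - b2s[r] * X.mean()

print(var_b2, b2s.var())        # close agreement
print(var_b1, b1s.var())
print(np.sqrt(var_b2))          # standard error of beta2_hat
```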
3A.4
THE SRF IN DEVIATION FORM AND THE DECOMPOSITION TSS = ESS + RSS

Our PRF is given by

$$Y = \beta_0 + \beta_1 X + u \qquad (1)$$

We can also write our PRF above as

$$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (2)$$

for $i = 1, 2, \ldots, n$ in a sample from this population. We can sum over all the $i$'s in (2) and divide by $n$, and we get

$$\bar{Y} = \beta_0 + \beta_1\bar{X} + \bar{u} \qquad (3)$$

Note that $\bar{u}$ need not be 0. Now we can subtract (3) from (2) and we obtain

$$y_i = \beta_1 x_i + (u_i - \bar{u}) \qquad (4)$$

where $y_i = Y_i - \bar{Y}$ and $x_i = X_i - \bar{X}$.

Now let us find expressions in deviation form for our SRF. Note that the SRF for the sample observations $i = 1, 2, \ldots, n$ is given by

$$Y_i = \hat\beta_0 + \hat\beta_1 X_i + e_i \qquad (5)$$

where the residuals $e_i$ satisfy $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$. We could write (5) as

$$Y_i = \hat{Y}_i + e_i \qquad (6)$$

where

$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i \qquad (7)$$

is the estimated value of $E(Y \mid X_i)$. Now aggregate (5) as well as (7) over the $i$'s and divide by $n$; we get

$$\bar{Y} = \hat\beta_0 + \hat\beta_1\bar{X} = \bar{\hat{Y}} \qquad (8)$$

When we now subtract (8) from (5), we obtain

$$y_i = \hat\beta_1 x_i + e_i \qquad (9) \quad [\text{SRF in deviation form}]$$

Subtracting (8) from (7), we get

$$\hat{y}_i = \hat\beta_1 x_i \qquad (10)$$

or, equivalently, subtracting $\bar{Y}$ ($= \bar{\hat{Y}}$) from (6), we get

$$y_i = \hat{y}_i + e_i \qquad (11)$$

Now, because $\sum_{i=1}^{n} x_i e_i = 0$, it follows from (10) that $\sum_{i=1}^{n} \hat{y}_i e_i = 0$.

Notice that if we now square (11) on both sides and then sum over $i$, we will have TSS on the left-hand side and the sum of ESS and RSS on the right-hand side. All we need to show is that $\sum \hat{y}_i e_i = 0$ (see just above).
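The decomposition is easy to verify numerically. A short sketch (the data-generating values are illustrative assumptions):

```python
# Verify sum(yhat_i * e_i) = 0 and TSS = ESS + RSS from Eqs. (9)-(11).
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0, 5, 40)
Y = 1.0 + 0.8 * X + rng.normal(0, 1, 40)

x, y = X - X.mean(), Y - Y.mean()
b1_hat = (x * y).sum() / (x ** 2).sum()     # slope, as in Eq. (9)
y_hat = b1_hat * x                          # fitted values in deviation form, Eq. (10)
e = y - y_hat                               # residuals, Eq. (11)

print((y_hat * e).sum())                    # ~0
print((y ** 2).sum())                       # TSS
print((y_hat ** 2).sum() + (e ** 2).sum())  # ESS + RSS: equals TSS
```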
3A.5
THE LEAST-SQUARES ESTIMATOR OF σ²
Recall that

$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (9)$$

Therefore,

$$\bar{Y} = \beta_1 + \beta_2\bar{X} + \bar{u} \qquad (10)$$

Subtracting (10) from (9) gives

$$y_i = \beta_2 x_i + (u_i - \bar{u}) \qquad (11)$$

Also recall that

$$e_i = y_i - \hat\beta_2 x_i \qquad (12)$$

Therefore, substituting (11) into (12) yields

$$e_i = \beta_2 x_i + (u_i - \bar{u}) - \hat\beta_2 x_i \qquad (13)$$

Collecting terms, squaring, and summing on both sides, we obtain

$$\sum e_i^2 = (\hat\beta_2 - \beta_2)^2\sum x_i^2 + \sum(u_i - \bar{u})^2 - 2(\hat\beta_2 - \beta_2)\sum x_i(u_i - \bar{u}) \qquad (14)$$

Taking expectations on both sides gives

$$\begin{aligned}
E\Bigl(\sum e_i^2\Bigr) &= \sum x_i^2\,E(\hat\beta_2 - \beta_2)^2 + E\Bigl[\sum(u_i - \bar{u})^2\Bigr] - 2E\Bigl[(\hat\beta_2 - \beta_2)\sum x_i(u_i - \bar{u})\Bigr]\\
&= \sum x_i^2\operatorname{var}(\hat\beta_2) + (n-1)\operatorname{var}(u_i) - 2E\Bigl[\sum k_i u_i\Bigl(\sum x_i u_i\Bigr)\Bigr]\\
&= \sigma^2 + (n-1)\sigma^2 - 2E\Bigl[\sum k_i x_i u_i^2\Bigr]\\
&= \sigma^2 + (n-1)\sigma^2 - 2\sigma^2\\
&= (n-2)\sigma^2
\end{aligned} \qquad (15)$$
where, in the last but one step, use is made of the definition of $k_i$ given in Eq. (3) and the relation given in Eq. (4). Also note that

$$\begin{aligned}
E\Bigl[\sum(u_i - \bar{u})^2\Bigr] &= E\Bigl[\sum u_i^2 - n\bar{u}^2\Bigr]\\
&= E\Bigl[\sum u_i^2 - \frac{1}{n}\Bigl(\sum u_i\Bigr)^2\Bigr]\\
&= n\sigma^2 - \frac{n}{n}\sigma^2 = (n-1)\sigma^2
\end{aligned}$$
where use is made of the fact that the $u_i$ are uncorrelated and the variance of each $u_i$ is $\sigma^2$. Thus, we obtain

$$E\Bigl(\sum e_i^2\Bigr) = (n-2)\sigma^2 \qquad (16)$$

Therefore, if we define

$$\hat\sigma^2 = \frac{\sum e_i^2}{n-2} \qquad (17)$$

its expected value is

$$E(\hat\sigma^2) = \frac{1}{n-2}\,E\Bigl(\sum e_i^2\Bigr) = \sigma^2 \qquad \text{using (16)} \qquad (18)$$

which shows that $\hat\sigma^2$ is an unbiased estimator of the true $\sigma^2$.
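The division by $n-2$ (rather than $n$) is exactly what makes $\hat\sigma^2$ unbiased, and this is easy to see by simulation. In the sketch below the parameter values are illustrative assumptions:

```python
# Simulation of (16)-(18): E[RSS/(n-2)] = sigma^2, while RSS/n is biased.
import numpy as np

rng = np.random.default_rng(11)
beta1, beta2, sigma, n, reps = 1.0, 2.0, 2.0, 12, 30_000
X = np.linspace(0, 10, n)
x = X - X.mean()

rss = np.empty(reps)
for r in range(reps):
    Y = beta1 + beta2 * X + rng.normal(0, sigma, n)
    b2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
    b1 = Y.mean() - b2 * X.mean()
    e = Y - b1 - b2 * X
    rss[r] = (e ** 2).sum()

print(rss.mean() / (n - 2), sigma ** 2)   # ~4.0 vs 4.0: unbiased
print(rss.mean() / n)                     # < 4.0: dividing by n is biased downward
```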
3A.6
MINIMUM-VARIANCE PROPERTY
OF LEAST-SQUARES ESTIMATORS
It was shown in Appendix 3A, Section 3A.2, that the least-squares estimator $\hat\beta_2$ is linear as well as unbiased (this holds true of $\hat\beta_1$ too). To show that these estimators are also minimum variance in the class of all linear unbiased estimators, consider the least-squares estimator $\hat\beta_2$:
$$\hat\beta_2 = \sum k_i Y_i \qquad \text{where} \quad k_i = \frac{X_i - \bar{X}}{\sum(X_i - \bar{X})^2} = \frac{x_i}{\sum x_i^2} \qquad \text{(see Appendix 3A.2)} \qquad (19)$$
which shows that $\hat\beta_2$ is a weighted average of the $Y$'s, with $k_i$ serving as the weights.

Let us define an alternative linear estimator of $\beta_2$ as follows:

$$\beta_2^* = \sum w_i Y_i \qquad (20)$$

where the $w_i$ are also weights, not necessarily equal to the $k_i$. Now

$$\begin{aligned}
E(\beta_2^*) &= \sum w_i E(Y_i)\\
&= \sum w_i(\beta_1 + \beta_2 X_i)\\
&= \beta_1\sum w_i + \beta_2\sum w_i X_i
\end{aligned} \qquad (21)$$

Therefore, for $\beta_2^*$ to be unbiased, we must have

$$\sum w_i = 0 \qquad (22)$$

and

$$\sum w_i X_i = \sum w_i x_i = 1 \qquad (23)$$
Also, we may write

$$\begin{aligned}
\operatorname{var}(\beta_2^*) &= \operatorname{var}\Bigl(\sum w_i Y_i\Bigr)\\
&= \sum w_i^2\operatorname{var}(Y_i) \qquad [\text{Note: } \operatorname{var}(Y_i) = \operatorname{var}(u_i) = \sigma^2 \text{ and } \operatorname{cov}(Y_i, Y_j) = 0 \ (i \neq j)]\\
&= \sigma^2\sum w_i^2\\
&= \sigma^2\sum\Bigl(w_i - \frac{x_i}{\sum x_i^2} + \frac{x_i}{\sum x_i^2}\Bigr)^2 \qquad \text{(note the mathematical trick)}\\
&= \sigma^2\sum\Bigl(w_i - \frac{x_i}{\sum x_i^2}\Bigr)^2 + \sigma^2\sum\Bigl(\frac{x_i}{\sum x_i^2}\Bigr)^2 + 2\sigma^2\sum\Bigl(w_i - \frac{x_i}{\sum x_i^2}\Bigr)\Bigl(\frac{x_i}{\sum x_i^2}\Bigr)\\
&= \sigma^2\sum\Bigl(w_i - \frac{x_i}{\sum x_i^2}\Bigr)^2 + \sigma^2\sum\Bigl(\frac{x_i}{\sum x_i^2}\Bigr)^2
\end{aligned} \qquad (24)$$

because the last term in the next-to-last step drops out. (Why? By (23), $\sum w_i x_i = 1$, so $\sum(w_i - x_i/\sum x_i^2)(x_i/\sum x_i^2) = 1/\sum x_i^2 - 1/\sum x_i^2 = 0$.)

Since the last term in (24) is constant, the variance of $\beta_2^*$ can be minimized only by manipulating the first term. If we let $w_i = k_i = x_i/\sum x_i^2$, Eq. (24) reduces to

$$\operatorname{var}(\beta_2^*) = \frac{\sigma^2}{\sum x_i^2} = \operatorname{var}(\hat\beta_2) \qquad (25)$$
To put it differently, if there is a minimum-variance linear unbiased estimator of $\beta_2$, it must be the least-squares estimator. Similarly, it can be shown that $\hat\beta_1$ is a minimum-variance linear unbiased estimator of $\beta_1$.
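The logic of (24) can be demonstrated numerically: any alternative weights $w_i$ that still satisfy the unbiasedness conditions (22) and (23) inflate the variance by exactly $\sigma^2\sum(w_i - k_i)^2$. In the sketch below, the sample $X$-values and the perturbation are arbitrary illustrative choices:

```python
# Check Eq. (24): var(beta2_star) = var(beta2_hat) + sigma^2 * sum((w - k)^2).
import numpy as np

X = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
x = X - X.mean()
k = x / (x ** 2).sum()                    # least-squares weights

# Build a perturbation orthogonal to both (1,...,1) and X, so that
# w = k + c still satisfies sum(w_i) = 0 and sum(w_i * X_i) = 1.
c = np.array([1.0, -1.0, 0.5, 0.25, -0.75])
basis = np.column_stack([np.ones_like(X), X])
c -= basis @ np.linalg.lstsq(basis, c, rcond=None)[0]   # project out 1 and X
w = k + c

sigma2 = 1.0
print(w.sum(), (w * X).sum())             # ~0 and ~1: conditions (22), (23)
print(sigma2 * (k ** 2).sum())            # var(beta2_hat), Eq. (25)
print(sigma2 * (w ** 2).sum())            # var(beta2_star): strictly larger
print(sigma2 * ((w - k) ** 2).sum())      # the exact variance inflation
```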