CHAPTER 1
INTRODUCTION
1.1 WHAT IS ECONOMETRICS?
The term “econometrics” is believed to have been coined by Ragnar Frisch (1895-1973) of Norway, one of the three principal founders of the Econometric Society, first editor of the journal Econometrica, and co-winner of the first Nobel Memorial Prize in Economic Sciences in 1969. It is therefore fitting that we turn to Frisch's own words in the introduction to the first issue of Econometrica to describe the discipline.

A word of explanation regarding the term econometrics may be in order. Its definition is implied in the statement of the scope of the [Econometric] Society, in Section I of the Constitution, which reads: “The Econometric Society is an international society for the advancement of economic theory in its relation to statistics and mathematics.... Its main object shall be to promote studies that aim at a unification of the theoretical-quantitative and the empirical-quantitative approach to economic problems....”

But there are several aspects of the quantitative approach to economics, and no single one of these aspects, taken by itself, should be confounded with econometrics. Thus, econometrics is by no means the same as economic statistics. Nor is it identical with what we call general economic theory, although a considerable portion of this theory has a definitely quantitative character. Nor should econometrics be taken as synonymous with the application of mathematics to economics. Experience has shown that each of these three view-points, that of statistics, economic theory, and mathematics, is a necessary, but not by itself a sufficient, condition for a real understanding of the quantitative relations in modern economic life. It is the unification of all three that is powerful. And it is this unification that constitutes econometrics.

Ragnar Frisch, Econometrica, (1933), 1, pp. 1-2.

This definition remains valid today, although some terms have evolved somewhat in their usage. Today, we would say that econometrics is the unified study of economic models, mathematical statistics, and economic data.

Within the field of econometrics there are sub-divisions and specializations. Econometric theory concerns the development of tools and methods, and the study of the properties of econometric methods. Applied econometrics is a term describing the development of quantitative economic models and the application of econometric methods to these models using economic data.

1.2 THE PROBABILITY APPROACH TO ECONOMETRICS
The unifying methodology of modern econometrics was articulated by Trygve Haavelmo (1911-1999) of Norway, winner of the 1989 Nobel Memorial Prize, in his seminal paper “The probability approach in econometrics” (1944). Haavelmo argued that quantitative economic models must necessarily be probability models. Deterministic models are blatantly inconsistent with observed economic quantities, and it is incoherent to apply deterministic models to non-deterministic data. Economic models should be explicitly designed to incorporate randomness; stochastic errors should not be simply added to deterministic models to make them random. Once we acknowledge that an economic model is a probability model, it follows naturally that an appropriate way to quantify, estimate, and conduct inferences about the economy is through the powerful theory of mathematical statistics. The appropriate method for a quantitative economic analysis follows from the probabilistic construction of the economic model.
literature rejects mathematical statistics (deeming classical theory as inappropriate for approximate models) and instead selects parameters by matching model and data moments using non-statistical ad hoc methods.

1.3 ECONOMETRIC TERMS AND NOTATION
In a typical application, an econometrician has a set of repeated measurements on a set of variables. For example, in a labor application the variables could include weekly earnings, educational attainment, age, and other descriptive characteristics. We call this information the data, dataset, or sample.

We use the term observations to refer to the distinct repeated measurements on the variables. An individual observation often corresponds to a specific economic unit, such as a person, household, corporation, firm, organization, country, state, city or other geographical region. An individual observation could also be a measurement at a point in time, such as quarterly GDP or a daily interest rate.

Economists typically denote variables by the italicized roman characters $y$, $x$, and/or $z$. The convention in econometrics is to use the character $y$ to denote the variable to be explained, while the characters $x$ and $z$ are used to denote the conditioning (explaining) variables.

Following mathematical convention, real numbers (elements of the real line $\mathbb{R}$) are written using lower case italics such as $y$, and vectors (elements of $\mathbb{R}^k$) by lower case bold italics such as $\boldsymbol{x}$, e.g.

$$\boldsymbol{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix}.$$
Upper case bold italics such as $\boldsymbol{A}$ are used for matrices.

We typically denote the number of observations by the natural number $n$, and subscript the variables by the index $i$ to denote the individual observation, e.g. $y_i$, $\boldsymbol{x}_i$ and $\boldsymbol{z}_i$. In some contexts we use indices other than $i$, such as in time-series applications where the index $t$ is common, and in panel studies we typically use the double index $it$ to refer to individual $i$ at a time period $t$.
The $i$'th observation is the set $(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i)$. The sample is the set $\{(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i) : i = 1, \ldots, n\}$.
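The set notation for a sample maps directly onto arrays. The following is a minimal Python/NumPy sketch that assumes simulated placeholder values (not real data) with $n = 5$ observations, a $k = 3$ dimensional $\boldsymbol{x}_i$, and a 2-dimensional $\boldsymbol{z}_i$; storing each vector as one row of a matrix is a storage convention chosen here for illustration.

```python
import numpy as np

# Hypothetical sample: n observations, scalar y_i, k-vector x_i, ell-vector z_i.
# The values are simulated placeholders, not real data.
n, k, ell = 5, 3, 2
rng = np.random.default_rng(0)
y = rng.normal(size=n)           # y_i for i = 1, ..., n
X = rng.normal(size=(n, k))      # row i stores the vector x_i
Z = rng.normal(size=(n, ell))    # row i stores the vector z_i

# The sample {(y_i, x_i, z_i) : i = 1, ..., n} as a list of observation tuples.
sample = [(y[i], X[i], Z[i]) for i in range(n)]

# The i'th observation; Python counts from 0, so observation i = 1 is sample[0].
y_1, x_1, z_1 = sample[0]
print(len(sample), x_1.shape, z_1.shape)  # 5 (3,) (2,)
```

Each element of `sample` is one observation; the index `i` runs over observations, never over variables.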
It is proper mathematical practice to use upper case $X$ for random variables and lower case $x$ for realizations or specific values. Since we use upper case to denote matrices, the distinction between random variables and their realizations is not rigorously followed in econometric notation. Thus the notation $y_i$ will in some places refer to a random variable, and in other places a specific realization. This is undesirable, but there is little to be done about it without terrifically complicating the notation.
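The covariance matrix of an econometric estimator will typically be written using the capital boldface $\boldsymbol{V}$, often with a subscript to denote the estimator, e.g. $\boldsymbol{V}_{\boldsymbol{\beta}} = \operatorname{var}(\widehat{\boldsymbol{\beta}})$ as the covariance matrix for $\widehat{\boldsymbol{\beta}}$. Hopefully without causing confusion, we will use the notation $\boldsymbol{V}_{\boldsymbol{\beta}} = \operatorname{avar}(\widehat{\boldsymbol{\beta}})$ to denote the asymptotic covariance matrix of $\sqrt{n}\,(\widehat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ (the variance of the asymptotic distribution). Estimates will be denoted by appending hats or tildes, e.g. $\widehat{\boldsymbol{V}}_{\boldsymbol{\beta}}$ is an estimate of $\boldsymbol{V}_{\boldsymbol{\beta}}$.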
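The distinction between $\operatorname{avar}(\widehat{\beta})$ and its estimate $\widehat{V}_{\beta}$ can be made concrete with a small Monte Carlo sketch. It assumes, purely for illustration, that the estimator is the sample mean $\widehat{\beta}$ of iid exponential draws with mean $\beta = 2$, so that $\operatorname{var}(\sqrt{n}(\widehat{\beta} - \beta)) = \sigma^2 = 4$ and hence $V_{\beta} = 4$ in this example. The sample size, replication count, and seeds are arbitrary choices.

```python
import numpy as np

BETA = 2.0  # true mean of the exponential population; its variance is BETA**2

def scaled_errors(n, reps, seed=0):
    """Return sqrt(n) * (beta_hat - BETA) across `reps` simulated samples,
    where beta_hat is the sample mean of n iid exponential draws."""
    rng = np.random.default_rng(seed)
    draws = rng.exponential(scale=BETA, size=(reps, n))
    beta_hat = draws.mean(axis=1)
    return np.sqrt(n) * (beta_hat - BETA)

# Monte Carlo approximation of V_beta = avar(beta_hat); theory gives BETA**2 = 4.
V_mc = scaled_errors(n=200, reps=10_000).var()

# A single-sample estimate V_hat_beta: for the sample mean, a plug-in
# estimator of avar(beta_hat) is the sample variance of the observations.
y = np.random.default_rng(1).exponential(scale=BETA, size=200)
V_hat = y.var(ddof=1)
```

For the sample mean the variance of $\sqrt{n}(\widehat{\beta} - \beta)$ equals $\sigma^2$ at every $n$, so `V_mc` stays near 4 regardless of the sample size; for most estimators the finite-sample and asymptotic variances differ, which is why the separate `avar` notation is useful.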
1.4 OBSERVATIONAL DATA
A common econometric question is to quantify the impact of one set of variables on another variable. For example, a concern in labor economics is the returns to schooling: the change in earnings induced by increasing a worker's education, holding other variables constant. Another issue of interest is the earnings gap between men and women.

Ideally, we would use experimental data to answer these questions. To measure the returns to schooling, an experiment might randomly divide children into groups, mandate different levels of education to the different groups, and then follow the children's wage path after they mature and enter the labor force. The differences between the groups would be direct measurements of the effects of different levels of education.
Instead, most economic data is observational. To continue the above example, through data collection we can record the level of a person's education and their wage. With such data we can measure the joint distribution of these variables, and assess the joint dependence. But from observational data it is difficult to infer causality, as we are not able to manipulate one variable to see the direct effect on the other. For example, a person's level of education is (at least partially) determined by that person's choices. These choices are likely to be affected by their personal abilities and attitudes towards work. The fact that a person is highly educated suggests a high level of ability, which suggests a high relative wage. This is an alternative explanation for an observed positive correlation between educational levels and wages.
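The ability story can be illustrated with simulated data. The sketch below assumes a hypothetical data-generating process (all coefficients are invented for illustration) in which unobserved ability raises both schooling and log wages, while schooling has no causal effect on wages at all. The observational correlation between education and wages is nonetheless clearly positive.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical DGP: ability is unobserved by the econometrician.
ability = rng.normal(size=n)
education = 12 + 2 * ability + rng.normal(size=n)               # ability raises schooling
log_wage = 1 + 0.5 * ability + rng.normal(scale=0.5, size=n)    # no education term at all

# Observational summaries of the joint distribution.
# Theory: cov = 1, var(education) = 5, var(log_wage) = 0.5,
# so corr = 1 / sqrt(2.5) (about 0.632) and the slope = 1 / 5 = 0.2.
corr = np.corrcoef(education, log_wage)[0, 1]
slope = np.cov(education, log_wage)[0, 1] / np.var(education, ddof=1)
```

A naive reading of `slope` would suggest that each additional year of schooling raises log wages by about 0.2, even though the causal effect built into this simulation is exactly zero.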
There are three major types of economic data sets: cross-sectional, time-series, and panel. They are distinguished by the dependence structure across observations.

Cross-sectional data sets have one observation per individual. Surveys are a typical source for cross-sectional data. In typical applications, the individuals surveyed are persons, households, firms or other economic agents. In many contemporary econometric cross-section studies the sample size $n$ is quite large. It is conventional to assume that cross-sectional observations are mutually independent. Most of this text is devoted to the study of cross-section data.

Time-series data are indexed by time. Typical examples include macroeconomic aggregates, prices and interest rates. This type of data is characterized by serial dependence, so the random sampling assumption is inappropriate. Most aggregate economic data is only available at a low frequency (annual, quarterly or perhaps monthly), so the sample size is typically much smaller than in cross-section studies. The exception is financial data, where data are available at a high frequency (weekly, daily, hourly, or by transaction), so sample sizes can be quite large.

Panel data combines elements of cross-section and time-series. These data sets consist of a set of individuals (typically persons, households, or corporations) surveyed repeatedly over time. The common modeling assumption is that the individuals are mutually independent of one another, but a given individual's observations are mutually dependent. This is a modified random sampling environment.

Data Structures
• Cross-section
• Time-series
• Panel
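The panel dependence structure (independent across individuals, dependent within an individual) can be sketched by simulation. This assumes a hypothetical one-way individual-effect model $y_{it} = \alpha_i + \varepsilon_{it}$ with independent standard normal $\alpha_i$ and $\varepsilon_{it}$; this model is an illustration chosen here, not one introduced in the text.

```python
import numpy as np

# Hypothetical one-way individual-effect model: y_it = alpha_i + eps_it,
# with alpha_i ~ N(0, 1) and eps_it ~ N(0, 1), all mutually independent.
N, T = 2_000, 5
rng = np.random.default_rng(7)
alpha = rng.normal(size=(N, 1))   # individual effect, shared by all periods of i
eps = rng.normal(size=(N, T))     # idiosyncratic shocks
y = alpha + eps                   # y[i, t] is individual i in period t

# Same individual, periods 1 and 2: theory gives var(alpha) / var(y) = 1/2.
within = np.corrcoef(y[:, 0], y[:, 1])[0, 1]

# Different individuals (i and i + 1), period 1: theory gives 0.
across = np.corrcoef(y[:-1, 0], y[1:, 0])[0, 1]
```

The shared term `alpha` is what makes an individual's observations dependent over time, while distinct rows of `y` remain independent.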
Some contemporary econometric applications combine elements of cross-section, time-series, and panel data modeling. These include models of spatial correlation and clustering.

As we mentioned above, most of this text will be devoted to cross-sectional data under the assumption of mutually independent observations. By mutual independence we mean that the $i$'th observation $(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i)$ is independent of the $j$'th observation $(y_j, \boldsymbol{x}_j, \boldsymbol{z}_j)$ for $i \neq j$. (Sometimes the label “independent” is misconstrued. It is a statement about the relationship between observations $i$ and $j$, not a statement about the relationship between $y_i$ and $\boldsymbol{x}_i$ and/or $\boldsymbol{z}_i$.) Furthermore, if the data is randomly gathered, it is reasonable to model each observation as a random draw from the same probability distribution. In this case we say that the data are independent and identically distributed, or iid. We call this a random sample. For most of this text we will assume that our observations come from a random sample.
Definition 1.5.1 The observations $(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i)$ are a random sample if they are mutually independent and identically distributed (iid) across $i = 1, \ldots, n$.
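A random sample in the sense of Definition 1.5.1 can be mimicked by simulation, where, unlike in practice, the population distribution is known. The sketch assumes a hypothetical population $F$ in which $x \sim \mathrm{N}(0, 1)$ and $y = 1 + x + e$ with $e \sim \mathrm{N}(0, 1)$ independent of $x$; the variable $z$ is omitted for brevity. Each observation is an independent draw from the same $F$, yet $y_i$ and $x_i$ are strongly dependent within an observation, which is exactly the distinction between independence across observations and dependence between variables.

```python
import numpy as np

def draw_random_sample(n, seed=0):
    """Draw n iid observations (y_i, x_i) from a hypothetical population F:
    x ~ N(0, 1) and y = 1 + x + e, where e ~ N(0, 1) is independent of x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + x + rng.normal(size=n)
    return y, x

y, x = draw_random_sample(n=50_000)

# Features of F estimated from the sample (true values in the comments).
mean_y = y.mean()              # E(y) = 1
cov_xy = np.cov(x, y)[0, 1]    # cov(x, y) = 1: y_i and x_i are dependent
prob_y = np.mean(y <= 1.0)     # Pr(y <= 1) = 0.5, since y ~ N(1, 2)
```

In applied work $F$ is unknown, and quantities such as `mean_y` or `prob_y` are the sample-based estimates from which we learn about features of $F$.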
In the random sampling framework, we think of an individual observation $(y_i, \boldsymbol{x}_i, \boldsymbol{z}_i)$ as a realization from a joint probability distribution $F(y, \boldsymbol{x}, \boldsymbol{z})$, which we can call the population. This “population” is infinitely large. This abstraction can be a source of confusion, as it does not correspond to a physical population in the real world. It is an abstraction since the distribution $F$ is unknown, and the goal of statistical inference is to learn about features of $F$ from the sample. The assumption of random sampling provides the mathematical foundation for treating economic statistics with the tools of mathematical statistics.

The random sampling framework was a major intellectual breakthrough of the late 19th century, allowing the application of mathematical statistics to the social sciences. Before this conceptual development, methods from mathematical statistics had not been applied to economic data, as they were viewed as inappropriate. The random sampling framework enabled economic samples to be viewed as homogeneous and random, a necessary precondition for the application of statistical methods.

1.6 SOURCES FOR ECONOMIC DATA
Fortunately for economists, the internet provides a convenient forum for dissemination of economic data. Many large-scale economic datasets are available without charge from governmental agencies. An excellent starting point is the Resources for Economists Data Links, available at rfe.org. From this site you can find almost every publicly available economic data set. Some specific data sources of interest include
• Bureau of Labor Statistics
• US Census
• Current Population Survey
• Survey of Income and Program Participation
• Panel Study of Income Dynamics
• Federal Reserve System (Board of Governors and regional banks)
• National Bureau of Economic Research
• U.S. Bureau of Economic Analysis
• Compustat
• International Financial Statistics
Another good source of data is from authors of published empirical studies. Most journals in economics require authors of published papers to make their datasets generally available. For example, in its instructions for submission, Econometrica states:

Econometrica has the policy that all empirical, experimental and simulation results must be replicable. Therefore, authors of accepted papers must submit data sets, programs, and information on empirical analysis, experiments and simulations that are needed for replication and some limited sensitivity analysis.

The American Economic Review states:

All data used in analysis must be made available to any researcher for purposes of replication.

The Journal of Political Economy states:

It is the policy of the Journal of Political Economy to publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication.

If you are interested in using the data from a published paper, first check the journal's website, as many journals archive data and replication programs online. Second, check the website(s) of the paper's author(s). Most academic economists maintain webpages, and some make available replication files complete with data and programs. If these investigations fail, email the author(s), politely requesting the data. You may need to be persistent.

As a matter of professional etiquette, all authors absolutely have the obligation to make their data and programs available. Unfortunately, many fail to do so, and typically for poor reasons. The irony of the situation is that it is typically in the best interests of a scholar to make as much of their work (including all data and programs) freely available, as this only increases the likelihood of their work being cited and having an impact.

Keep this in mind as you start your own empirical project. Remember that as part of your end product, you will need (and want) to provide all data and programs to the community of scholars. The greatest form of flattery is to learn that another scholar has read your paper, wants to extend your work, or wants to use your empirical methods. In addition, public openness provides a healthy incentive for transparency and integrity in empirical analysis.

1.7 ECONOMETRIC SOFTWARE
Economists use a variety of econometric, statistical, and programming software.

STATA (www.stata.com) is a powerful statistical program with a broad set of pre-programmed econometric and statistical tools. It is quite popular among economists, and is continuously being updated with new methods. It is an excellent package for most econometric analysis, but is limited when you want to use new or less-common econometric methods which have not yet been programmed.

R (www.r-project.org), GAUSS (www.aptech.com), MATLAB (www.mathworks.com), and Ox (www.oxmetrics.net) are high-level matrix programming languages with a wide variety of built-in statistical functions. Many econometric methods have been programmed in these languages and are available on the web. The advantage of these packages is that you are in complete control of your analysis, and it is easier to program new methods than in STATA. Some disadvantages are that you have to do much of the programming yourself, programming complicated procedures takes significant time, and programming errors are hard to prevent and difficult to detect and eliminate. Of these languages, GAUSS used to be quite popular among econometricians, but now MATLAB is more popular. A smaller but growing group of econometricians are enthusiastic fans of R, which, of these languages, is uniquely open-source, user-contributed, and, best of all, completely free!

For highly-intensive computational tasks, some economists write their programs in a standard programming language such as Fortran or C. This can lead to major gains in computational speed, at the cost of increased time in programming and debugging.
As these different packages have distinct advantages, many empirical economists end up using more than one package. As a student of econometrics, you will learn at least one of these packages, and probably more than one.

1.8 READING THE MANUSCRIPT
I have endeavored to use a unified notation and nomenclature. The development of the material is cumulative, with later chapters building on the earlier ones. Nevertheless, every attempt has been made to make each chapter self-contained, so readers can pick and choose topics according to their interests.

To fully understand econometric methods, it is necessary to have a mathematical understanding of their mechanics, and this includes the mathematical proofs of the main results. Consequently, this text is self-contained, with nearly all results proved with full mathematical rigor. The mathematical development and proofs aim at brevity and conciseness (sometimes described as mathematical elegance), but also at pedagogy. To understand a mathematical proof, it is not sufficient to simply read the proof; you need to follow it, and re-create it for yourself.

Nevertheless, many readers will not be interested in each mathematical detail, explanation, or proof. This is okay. To use a method it may not be necessary to understand the mathematical details. Accordingly I have placed the more technical mathematical proofs and details in chapter appendices. These appendices and other technical sections are marked with an asterisk (*). These sections can be skipped without any loss in exposition.

1.9 COMMON SYMBOLS
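$y$                                              scalar
$\boldsymbol{x}$                                 vector
$\boldsymbol{X}$                                 matrix
$\mathbb{R}$                                     real line
$\mathbb{R}^k$                                   Euclidean $k$ space
$\mathbb{E}(y)$                                  mathematical expectation
$\operatorname{var}(y)$                          variance
$\operatorname{cov}(x, y)$                       covariance
$\operatorname{var}(\boldsymbol{x})$             covariance matrix
$\operatorname{corr}(x, y)$                      correlation
$\Pr$                                            probability
$\longrightarrow$                                limit
$\xrightarrow{p}$                                convergence in probability
$\xrightarrow{d}$                                convergence in distribution
$\operatorname{plim}_{n \to \infty}$             probability limit
$\mathrm{N}(\mu, \sigma^2)$                      normal distribution
$\mathrm{N}(0, 1)$                               standard normal distribution
$\chi^2_k$                                       chi-square distribution with $k$ degrees of freedom
$\boldsymbol{I}_n$                               identity matrix
$\operatorname{tr} \boldsymbol{A}$               trace
$\boldsymbol{A}'$                                matrix transpose
$\boldsymbol{A}^{-1}$                            matrix inverse
$\boldsymbol{A} > 0$, $\boldsymbol{A} \geq 0$    positive definite, positive semi-definite
$\|\boldsymbol{a}\|$                             Euclidean norm
$\|\boldsymbol{A}\|$                             matrix (Frobenius) norm
$\approx$                                        approximate equality
$\stackrel{\text{def}}{=}$                       definitional equality
$\sim$                                           is distributed as
$\log$                                           natural logarithm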