Analysis of Time Series Data Using R

ZONGWU CAI
E-mail address: [email protected]

a. Department of Mathematics & Statistics and Department of Economics, University of North Carolina, Charlotte, NC 28223, U.S.A.
b. Wang Yanan Institute for Studies in Economics, Xiamen University, China
c. College of Economics and Management, Shanghai Jiaotong University, China

July 30, 2006

© 2006, ALL RIGHTS RESERVED by ZONGWU CAI

This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.
Preface

These lecture notes are designed to provide an overview of methods that are useful for analyzing univariate and multivariate phenomena measured over time. Since this is a course emphasizing applications as well as theory, the reader is guided through examples involving real time series in the lectures. Each chapter is followed by a collection of simple theoretical and applied exercises assuming a background that includes a beginning level course in mathematical statistics and some computing skills. More importantly, the computer code in R and the datasets are provided for most of the examples analyzed in these notes.

Some materials are based on the lecture notes given by Professor Robert H. Shumway, Department of Statistics, University of California at Davis, and my colleague, Professor Stanislav Radchenko, Department of Economics, University of North Carolina at Charlotte. Some datasets are provided by Professor Robert H. Shumway, Department of Statistics, University of California at Davis, and Professor Phillips Hans Franses, University of Rotterdam, the Netherlands. I am very grateful to them for providing their lecture notes and datasets.
Contents

1 Package R and Simple Applications
  1.1 Computational Toolkits
  1.2 How to Install R?
  1.3 Data Analysis and Graphics Using R – An Introduction (109 pages)
  1.4 CRAN Task View: Empirical Finance
  1.5 CRAN Task View: Computational Econometrics

2 Characteristics of Time Series
  2.1 Introduction
  2.2 Stationary Time Series
    2.2.1 Detrending
    2.2.2 Differencing
    2.2.3 Transformations
    2.2.4 Linear Filters
  2.3 Other Key Features of Time Series
    2.3.1 Seasonality
    2.3.2 Aberrant Observations
    2.3.3 Conditional Heteroskedasticity
    2.3.4 Nonlinearity
  2.4 Time Series Relationships
    2.4.1 Autocorrelation Function
    2.4.2 Cross Correlation Function
    2.4.3 Partial Autocorrelation Function
  2.5 Problems
  2.6 Computer Code
  2.7 References

3 Univariate Time Series Models
  3.1 Introduction
  3.2 Least Squares Regression
  3.3 Model Selection Methods
    3.3.1 Subset Approaches
    3.3.2 Sequential Methods
    3.3.3 Likelihood Based-Criteria
    3.3.4 Cross-Validation and Generalized Cross-Validation
    3.3.5 Penalized Methods
  3.4 Integrated Models – I(1)
  3.5 Autoregressive Models – AR(p)
    3.5.1 Model
    3.5.2 Forecasting
  3.6 Moving Average Models – MA(q)
  3.7 Autoregressive Integrated Moving Average Model – ARIMA(p, d, q)
  3.8 Seasonal ARIMA Models
  3.9 Regression Models With Correlated Errors
  3.10 Estimation of Covariance Matrix
  3.11 Long Memory Models
  3.12 Periodicity and Business Cycles
  3.13 Impulse Response Function
    3.13.1 First Order Difference Equations
    3.13.2 Higher Order Difference Equations
  3.14 Problems
  3.15 Computer Code
  3.16 References

4 Non-stationary Processes and Structural Breaks
  4.1 Introduction
  4.2 Random Walks
    4.2.1 Inappropriate Detrending
    4.2.2 Spurious (Nonsense) Regressions
  4.3 Unit Root and Stationary Processes
    4.3.1 Comparison of Forecasts of TS and DS Processes
    4.3.2 Random Walk Components and Stochastic Trends
  4.4 Trend Estimation and Forecasting
    4.4.1 Forecasting a Deterministic Trend
    4.4.2 Forecasting a Stochastic Trend
    4.4.3 Forecasting ARMA Models with Deterministic Trends
    4.4.4 Forecasting of ARIMA Models
  4.5 Unit Root Tests
    4.5.1 The Dickey-Fuller and Augmented Dickey-Fuller Tests
    4.5.2 Cautions
  4.6 Structural Breaks
    4.6.1 Testing for Breaks
    4.6.2 Zivot and Andrews's Testing Procedure
    4.6.3 Cautions
  4.7 Problems
  4.8 Computer Code
  4.9 References

5 Vector Autoregressive Models
  5.1 Introduction
    5.1.1 Properties of VAR Models
    5.1.2 Statistical Inferences
  5.2 Impulse-Response Function
  5.3 Variance Decompositions
  5.4 Granger Causality
  5.5 Forecasting
  5.6 Problems
  5.7 References

6 Cointegration
  6.1 Introduction
  6.2 Cointegrating Regression
  6.3 Testing for Cointegration
  6.4 Cointegrated VAR Models
  6.5 Problems
  6.6 References

7 Nonparametric Density, Distribution & Quantile Estimation
  7.1 Mixing Conditions
  7.2 Density Estimate
    7.2.1 Asymptotic Properties
    7.2.2 Optimality
    7.2.3 Boundary Correction
  7.3 Distribution Estimation
    7.3.1 Smoothed Distribution Estimation
    7.3.2 Relative Efficiency and Deficiency
  7.4 Quantile Estimation
    7.4.1 Value at Risk
    7.4.2 Nonparametric Quantile Estimation
  7.5 Computer Code
  7.6 References

8 Nonparametric Regression Estimation
  8.1 Bandwidth Selection
    8.1.1 Simple Bandwidth Selectors
    8.1.2 Cross-Validation Method
  8.2 Multivariate Density Estimation
  8.3 Regression Function
  8.4 Kernel Estimation
    8.4.1 Asymptotic Properties
    8.4.2 Boundary Behavior
  8.5 Local Polynomial Estimate
    8.5.1 Formulation
    8.5.2 Implementation in R
    8.5.3 Complexity of Local Polynomial Estimator
    8.5.4 Properties of Local Polynomial Estimator
    8.5.5 Bandwidth Selection
  8.6 Functional Coefficient Model
    8.6.1 Model
    8.6.2 Local Linear Estimation
    8.6.3 Bandwidth Selection
    8.6.4 Smoothing Variable Selection
    8.6.5 Goodness-of-Fit Test
    8.6.6 Asymptotic Results
    8.6.7 Conditions and Proofs
    8.6.8 Monte Carlo Simulations and Applications
  8.7 Additive Model
    8.7.1 Model
    8.7.2 Backfitting Algorithm
    8.7.3 Projection Method
    8.7.4 Two-Stage Procedure
    8.7.5 Monte Carlo Simulations and Applications
  8.8 Computer Code
  8.9 References
List of Tables

3.1 AICC values for ten models for the recruits series
4.1 Large-sample critical values for the ADF statistic
4.2 Summary of DF test for unit roots in the absence of serial correlation
4.3 Critical Values of the QLR statistic with 15% Trimming
5.1 Sims variance decomposition in three variable VAR model
5.2 Sims variance decomposition including interest rates
6.1 Critical values for the Engle-Granger ADF statistic
8.1 Sample sizes required for p-dimensional nonparametric regression to have comparable performance with that of 1-dimensional nonparametric regression using size 100
List of Figures

2.1 Monthly SOI (left) and simulated recruitment (right) from a model (n = 453 months, 1950-1987).
2.2 Simulated MA(1) with θ1 = 0.9.
2.3 Log of annual indices of real national output in China, 1952-1988.
2.4 Monthly average temperature in degrees centigrade, January 1856 - February 2005, n = 1790 months. The straight line (wide and green) is the linear trend y = −9.037 + 0.0046 t and the curve (wide and red) is the nonparametric estimated trend.
2.5 Detrended monthly global temperatures: left panel (linear) and right panel (nonlinear).
2.6 Differenced monthly global temperatures.
2.7 Annual stock of motor cycles in the Netherlands, 1946-1993.
2.8 Quarterly earnings for Johnson & Johnson (4th quarter 1970 to 1st quarter 1980, left panel) with log transformed earnings (right panel).
2.9 The SOI series (black solid line) compared with a 12-point moving average (red thicker solid line). Top panel: original data; bottom panel: filtered series.
2.10 US retail sales data from 1967-2000.
2.11 Four-weekly advertising expenditures on radio and television in The Netherlands, 1978.01-1994.13.
2.12 First difference in log prices versus the inflation rate: the case of Argentina, 1970.1-1989.4.
2.13 Japanese - U.S. dollar exchange rate return series {y_t}, from January 1, 1974 to December 31, 2003.
2.14 Quarterly unemployment rate in Germany, 1962.1-1991.4 (seasonally adjusted and not seasonally adjusted) in the left panel. The scatterplot of unemployment rate (seasonally adjusted) versus unemployment rate (seasonally adjusted) one period lagged in the right panel.
2.15 Multiple lagged scatterplots showing the relationship between SOI and the present (x_t) versus the lagged values (x_{t+h}) at lags 1 ≤ h ≤ 16.
2.16 Autocorrelation functions of SOI and recruitment and cross correlation function between SOI and recruitment.
2.17 Multiple lagged scatterplots showing the relationship between the SOI at time t + h, say x_{t+h} (x-axis), versus recruits at time t, say y_t (y-axis), 0 ≤ h ≤ 15.
2.18 Multiple lagged scatterplots showing the relationship between the SOI at time t, say x_t (x-axis), versus recruits at time t + h, say y_{t+h} (y-axis), 0 ≤ h ≤ 15.
2.19 Partial autocorrelation functions for the SOI (left panel) and the recruits (right panel) series.
2.20 Varve data for Problem 5.
2.21 Gas and oil series for Problem 6.
2.22 Handgun sales (per 10,000,000) in California and monthly gun death rate (per 100,000) in California (February 2, 1980 - December 31, 1998).
3.1 Autocorrelation functions (ACF) for simple (left) and log (right) returns for IBM (top panels) and for the value-weighted index of US market (bottom panels), January 1926 to December 1997.
3.2 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended (top panel) and differenced (bottom panel) global temperature series.
3.3 A typical realization of the random walk series (left panel) and the first difference of the series (right panel).
3.4 Autocorrelation functions (ACF) (left) and partial autocorrelation functions (PACF) (right) for the random walk (top panel) and the first difference (bottom panel) series.
3.5 Autocorrelation (ACF) of residuals of AR(1) for SOI (left panel) and the plot of AIC and AICC values (right panel).
3.6 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log varve series (top panel) and the first difference (bottom panel), showing a peak in the ACF at lag h = 1.
3.7 Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA(0,1,1)×(0,1,1)12 model.
3.8 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the birth series (top two panels), the first difference (second two panels), an ARIMA(0,1,0)×(0,1,1)12 model (third two panels) and an ARIMA(0,1,1)×(0,1,1)12 model (last two panels).
3.9 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), ARIMA(0,1,0)×(1,0,0)4 model (third two panels), and ARIMA(0,1,1)×(1,0,0)4 model (last two panels).
3.10 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for ARIMA(0,1,1)×(0,1,1)4 model (top two panels) and the residual plots of ARIMA(0,1,1)×(1,0,0)4 (left bottom panel) and ARIMA(0,1,1)×(0,1,1)4 model (right bottom panel).
3.11 Monthly simple return of CRSP Decile 1 index from January 1960 to December 2003: time series plot of the simple return (left top panel), time series plot of the simple return after adjusting for January effect (right top panel), the ACF of the simple return (left bottom panel), and the ACF of the adjusted simple return.
3.12 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and the fitted ARIMA(0,0,0)×(1,0,0)4 residuals.
3.13 Time plots of U.S. weekly interest rates (in percentages) from January 5, 1962 to September 10, 1999. The solid line (black) is the Treasury 1-year constant maturity rate and the dashed line the Treasury 3-year constant maturity rate (red).
3.14 Scatterplots of U.S. weekly interest rates from January 5, 1962 to September 10, 1999: the left panel is 3-year rate versus 1-year rate, and the right panel is changes in 3-year rate versus changes in 1-year rate.
3.15 Residual series of linear regression Model I for two U.S. weekly interest rates: the left panel is the time plot and the right panel is the ACF.
3.16 Time plots of the change series of U.S. weekly interest rates from January 12, 1962 to September 10, 1999: changes in the Treasury 1-year constant maturity rate are denoted by the black solid line, and changes in the Treasury 3-year constant maturity rate are indicated by the red dashed line.
3.17 Residual series of the linear regression models: Model II (top) and Model III (bottom) for two change series of U.S. weekly interest rates: time plot (left) and ACF (right).
3.18 Sample autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes. The log spectral density of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes.
3.19 The autocorrelation function of an AR(2) model: (a) φ1 = 1.2 and φ2 = −0.35, (b) φ1 = 1.0 and φ2 = −0.7, (c) φ1 = 0.2 and φ2 = 0.35, (d) φ1 = −0.2 and φ2 = 0.35.
3.20 The growth rate of US quarterly real GNP from 1947.II to 1991.I (seasonally adjusted and in percentage): the left panel is the time series plot and the right panel is the ACF.
3.21 The time series y_t is generated with w_t ∼ N(0,1), y_0 = 5. At period t = 50 there is an additional impulse to the error term, i.e. w̃_50 = w_50 + 1. The impulse response function is computed as the difference between the series y_t without impulse and the series ỹ_t with the impulse.
3.22 The time series y_t is generated with w_t ∼ N(0,1), y_0 = 3. At period t = 50 there is an additional impulse to the error term, i.e. w̃_50 = w_50 + 1. The impulse response function is computed as the difference between the series y_t without impulse and the series ỹ_t with the impulse.
3.23 Example of impulse response functions for first order difference equations.
3.24 The time series y_t is generated with w_t ∼ N(0,1), y_0 = 3. For the transitory impulse, there is an additional impulse to the error term at period t = 50, i.e. w̃_50 = w_50 + 1. For the permanent impulse, there is an additional impulse for periods t = 50, ..., 100, i.e. w̃_t = w_t + 1, t = 50, 51, ..., 100. The impulse response function (IRF) is computed as the difference between the series y_t without impulse and the series ỹ_t with the impulse.
3.25 Example of impulse response functions for second order difference equation.
Chapter 1

Package R and Simple Applications

1.1 Computational Toolkits
When you work with large datasets, messy data handling, models, etc., you need to choose computational tools that are suitable for dealing with these kinds of problems. There are "menu driven systems" where you click some buttons and get some work done, but these are useless for anything nontrivial. To do serious economics and finance in the modern day, you have to write computer programs. And this is true of any field, for example, empirical macroeconomics, and not just of "computational finance", which has become a hot buzzword recently.

The question is how to choose the computational tools. According to Ajay Shah (December 2005), you should pay attention to four elements: price, freedom, elegant and powerful computer science, and network effects. Low price is better than high price. Price = 0 is obviously best of all.

Freedom here has many aspects. A good software system is one that doesn't tie you down in terms of hardware/OS, so that you are able to keep moving. Another aspect of freedom is in working with colleagues, collaborators and students. With commercial software, this becomes a problem, because your colleagues may not have the same software that you are using.
Here free software really wins spectacularly. Good practice in research involves a great accent on reproducibility. Reproducibility is important both to avoid mistakes and because the next person working in your field should be standing on your shoulders. This requires an ability to release code. This is only possible with free software.

Systems like SAS and Gauss use archaic computer science. The code is inelegant. The language is not powerful. In this day and age, writing C or Fortran by hand is "too low level". Hell, with Gauss, even a minimal thing like online help is tawdry. One prefers a system to be built by people who know their computer science; it should be an elegant, powerful language. All standard CS knowledge should be nicely in play to give you a gorgeous system. Good computer science gives you more productive humans.

Lots of economists use Gauss and give out Gauss source code, so there is a network effect in favor of Gauss. A similar thing is right now happening with statisticians and R. Here I cite comparisons among the most commonly used packages (see Ajay Shah (December 2005)); see the web site at http://www.mayin.org/ajayshah/COMPUTING/mytools.html.

R is a very convenient programming language for doing statistical analysis and Monte Carlo simulations as well as various applications in quantitative economics and finance. Indeed, we prefer to think of it as an environment within which statistical techniques are implemented. I will teach it at the introductory level, but NOTICE that you will have to learn R on your own. Note that about 97% of commands in S-PLUS and R are the same. In particular, for analyzing time series data, R has a lot of bundles and packages which can be downloaded for free, for example, at http://www.r-project.org/.
R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.
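As a toy illustration (not taken from the original notes), a new function can be defined in a single assignment; the function name rmse and the numbers below are made up purely for the example:

    # A user-defined function computing the root mean squared error
    # between two numeric vectors of equal length.
    rmse <- function(x, y) {
      sqrt(mean((x - y)^2))
    }

    rmse(c(1, 2, 3), c(1.1, 1.9, 3.2))   # returns about 0.1414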
1.2 How to Install R?
(1) Go to the web site http://www.r-project.org/;
(2) click CRAN;
(3) choose a site for downloading, say http://cran.cnr.Berkeley.edu;
(4) click Windows (95 and later);
(5) click base;
(6) click R-2.3.1-win32.exe (version of 06-01-2006) to save this file first and then run it to install.

The basic R is now installed on your computer. If you need to install other packages, do the following:

(7) After R is installed, there is an icon on the screen. Click the icon to get into R;
(8) go to the top menu, find Packages and click it;
(9) go down to Install package(s)... and click it;
(10) a new window appears. Choose a location to download the packages from, say USA(CA1), move the mouse there and click OK;
(11) a new window appears listing all packages. You can select any one of the packages and click OK, or you can select all of them and then click OK.
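Packages can also be installed and loaded directly from the R command line. The following is a minimal sketch using the standard functions install.packages() and library(); the package name tseries is just an example of a package used later in these notes:

    # Install a package from CRAN (you will be asked to choose a mirror)
    install.packages("tseries")

    # Load the installed package into the current R session
    library(tseries)

    # List the packages currently available on your system
    library()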
1.3 Data Analysis and Graphics Using R – An Introduction (109 pages)
See the file r-notes.pdf (109 pages), which can be downloaded from http://www.math.uncc.edu/~zcai/r-notes.pdf. I encourage you to download this file and learn it by yourself.
1.4 CRAN Task View: Empirical Finance
This CRAN Task View contains a list of packages useful for empirical work in Finance, grouped by topic. Besides these packages, a very wide variety of functions suitable for empirical work in Finance is provided by both the basic R system (and its set of recommended core packages), and a number of other packages on the Comprehensive R Archive Network (CRAN). Consequently, several of the other CRAN Task Views may contain suitable packages, in particular the Econometrics Task View. The web site is http://cran.r-project.org/src/contrib/Views/Finance.html
1. Standard regression models: Linear models such as ordinary least squares (OLS) can be estimated by lm() (from the stats package contained in the basic R distribution). Maximum Likelihood (ML) estimation can be undertaken with the optim() function. Non-linear least squares can be estimated with the nls() function, as well as with nlme() from the nlme package. For the linear model, a variety of regression diagnostic tests are provided by the car, lmtest, strucchange, urca, uroot, and sandwich packages. The Rcmdr and Zelig packages provide user interfaces that may be of interest as well.
2. Time series: Classical time series functionality is provided by the arima() and KalmanLike() commands in the basic R distribution. The dse package provides a variety of more advanced estimation methods; fracdiff can estimate fractionally integrated series; longmemo covers related material. For volatility modeling, the standard GARCH(1,1) model can be estimated with the garch() function in the tseries package (see the short sketch at the end of this section). Unit root and cointegration tests are provided by tseries, urca and uroot. The Rmetrics packages fSeries and fMultivar contain a number of estimation functions for ARMA, GARCH, long memory models, unit roots and more. The ArDec package implements autoregressive time series decomposition in a Bayesian framework. The dyn and dynlm packages are suitable for dynamic (linear) regression models. Several packages provide wavelet analysis functionality: rwt, wavelets, waveslim, wavethresh. Some methods from chaos theory are provided by the package tseriesChaos.

3. Finance: The Rmetrics bundle, comprised of the fBasics, fCalendar, fSeries, fMultivar, fPortfolio, fOptions and fExtremes packages, contains a very large number of relevant functions for different aspects of empirical and computational finance. The RQuantLib package provides several option-pricing functions as well as some fixed-income functionality from the QuantLib project to R. The portfolio package contains classes for equity portfolio management.

4. Risk Management: The VaR package estimates Value-at-Risk, and several packages provide functionality for Extreme Value Theory models: evd, evdbayes, evir, extRemes, ismev, POT. The mvtnorm package provides code for multivariate Normal and t-distributions. The Rmetrics packages fPortfolio and fExtremes also contain a number of relevant functions. The copula and fgac packages cover multivariate dependency structures using copula methods.

5. Data and Date Management: The its, zoo and fCalendar (part of Rmetrics) packages provide support for irregularly-spaced time series. fCalendar also addresses calendar issues such as recurring holidays for a large number of financial centers, and provides code for high-frequency data sets.

CRAN packages: ArDec, car, copula, dse, dyn, dynlm, evd, evdbayes, evir, extRemes, fBasics (core), fCalendar (core), fExtremes (core), fgac, fMultivar (core), fOptions (core), fPortfolio (core), fracdiff, fSeries (core), ismev, its (core), lmtest, longmemo, mvtnorm, portfolio, POT, Rcmdr, RQuantLib (core), rwt, sandwich, strucchange, tseries (core), tseriesChaos, urca (core), uroot, VaR, wavelets, waveslim, wavethresh, Zelig, zoo (core).

Related links:
* CRAN Task View: Econometrics. The web site is http://cran.cnr.berkeley.edu/src/contrib/Views/Econometrics.html, or see the next section.
* Rmetrics by Diethelm Wuertz contains a wealth of R code for Finance. The web site is http://www.itp.phys.ethz.ch/econophysics/R/.
* QuantLib is a C++ library for quantitative finance. The web site is http://quantlib.org/.
* Mailing list: R Special Interest Group Finance.
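As a small, illustrative sketch of the time series functionality mentioned in item 2 above (this is not taken from the task view itself), one can fit an ARMA model with arima() from the basic stats package and a GARCH(1,1) model with garch() from the tseries package; the return series r below is simulated placeholder data used only for illustration:

    library(tseries)                  # provides garch()

    # r: a hypothetical return series; replace with real (de-meaned) returns
    set.seed(1)
    r <- rnorm(500)

    fit.arma  <- arima(r, order = c(1, 0, 1))   # ARMA(1,1) via stats::arima()
    fit.garch <- garch(r, order = c(1, 1))      # GARCH(1,1) via tseries::garch()

    summary(fit.garch)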
1.5 CRAN Task View: Computational Econometrics
Base R ships with a lot of functionality useful for computational econometrics, in particular in the stats package. This functionality is complemented by many packages on CRAN; a brief overview is given below. There is also considerable overlap between the tools for econometrics in this view and finance in the Finance view. Furthermore, the finance SIG is a suitable mailing list for obtaining help and discussing questions about both computational finance and econometrics. The packages in this view can be roughly structured into the following topics. The web site is http://cran.r-project.org/src/contrib/Views/Econometrics.html
1. Linear regression models: Linear models can be fitted (via OLS) with lm() (from stats), and standard tests for model comparisons are available in various methods such as summary() and anova(). Analogous functions that also support asymptotic tests (z instead of t tests, and Chi-squared instead of F tests) and plug-in of other covariance matrices are coeftest() and waldtest() in lmtest. Tests of more general linear hypotheses are implemented in linear.hypothesis() in car. HC and HAC covariance matrices that can be plugged into these functions are available in sandwich. The packages car and lmtest also provide a large collection of further methods for diagnostic checking in linear regression models.

2. Microeconometrics: Many standard microeconometric models belong to the family of generalized linear models (GLM) and can be fitted by glm() from package stats. This includes in particular logit and probit models for modelling choice data and Poisson models for count data. Negative binomial GLMs are available via glm.nb() in package MASS from the VR bundle. Zero-inflated count models are provided in zicounts. Further over-dispersed and inflated models, including hurdle models, are available in package pscl. Bivariate Poisson regression models are implemented in bivpois. Basic censored regression models (e.g., tobit models) can be fitted by survreg() in survival. Further, more refined tools for microeconometrics are provided in micEcon. The package bayesm implements a Bayesian approach to microeconometrics and marketing. Inference for relative distributions is contained in package reldist.
3. Further regression models: Various extensions of the linear regression model and other model fitting techniques are available in base R and several CRAN packages. Nonlinear least squares modelling is available in nls() in package stats. Relevant packages include quantreg (quantile regression), sem (linear structural equation models, including two-stage least squares), systemfit (simultaneous equation estimation), betareg (beta regression), nlme (nonlinear mixed-effect models), VR (multinomial logit models in package nnet) and MNP (Bayesian multinomial probit models). The packages Design and Hmisc provide several tools for extended handling of (generalized) linear regression models.

4. Basic time series infrastructure: The class ts in package stats is R's standard class for regularly spaced time series, which can be coerced back and forth without loss of information to zooreg from package zoo. zoo provides infrastructure for both regularly and irregularly spaced time series (the latter via the class "zoo"), where the time information can be of arbitrary class. Several other implementations of irregular time series building on the "POSIXt" time-date classes are available in its, tseries and fCalendar, which are all aimed particularly at finance applications (see the Finance view).

5. Time series modelling: Classical time series modelling tools are contained in the stats package and include arima() for ARIMA modelling and Box-Jenkins-type analysis. Furthermore, stats provides StructTS() for fitting structural time series and decompose() and HoltWinters() for time series filtering and decomposition. For estimating VAR models, several methods are available: simple models can be fitted by ar() in stats, more elaborate models are provided by estVARXls() in dse, and a Bayesian approach is available in MSBVAR. A convenient interface for fitting dynamic regression models via OLS is available in dynlm; a different approach that also works with other regression functions is implemented in dyn. More advanced dynamic system equations can be fitted using dse. Unit root and cointegration techniques are available in urca, uroot and tseries. Time series factor analysis is available in tsfa.

6. Matrix manipulations: As a vector- and matrix-based language, base R ships with many powerful tools for doing matrix manipulations, which are complemented by the packages Matrix and SparseM.

7. Inequality: For measuring inequality, concentration and poverty, the package ineq provides some basic tools such as Lorenz curves, Pen's parade, the Gini coefficient and many more.

8. Structural change: R is particularly strong when dealing with structural changes and changepoints in parametric models; see strucchange and segmented.

9. Data sets: Many of the packages in this view contain collections of data sets from the econometric literature, and the package Ecdat contains a complete collection of data sets from various standard econometric textbooks. micEcdat provides several data sets from the Journal of Applied Econometrics and the Journal of Business & Economic Statistics data archives. Package CDNmoney provides Canadian monetary aggregates and pwt provides the Penn World Table.

CRAN packages: bayesm, betareg, bivpois, car (core), CDNmoney, Design, dse, dyn, dynlm, Ecdat, fCalendar, Hmisc, ineq, its, lmtest (core), Matrix, micEcdat, micEcon, MNP, MSBVAR, nlme, pscl, pwt, quantreg, reldist, sandwich (core), segmented, sem, SparseM, strucchange, systemfit, tseries (core), tsfa, urca (core), uroot, VR, zicounts, zoo (core).

Related links:
* CRAN Task View: Finance. The web site is http://cran.cnr.berkeley.edu/src/contrib/Views/Finance.html, or see the above section.
* Mailing list: R Special Interest Group Finance.
* A Brief Guide to R for Beginners in Econometrics. The web site is http://people.su.se/~ma/R-intro/.
* R for Economists. The web site is http://www.mayin.org/ajayshah/KB/R/R-for-economists.html.
Chapter 2

Characteristics of Time Series

2.1 Introduction
The very nature of data collected in fields as diverse as economics, finance, biology, medicine, and engineering leads one naturally to a consideration of time series models. Samples taken from all of these disciplines are typically observed over a sequence of time periods. Often, for example, one observes hourly, daily, monthly or yearly data, or even tick-by-tick trade data, and it is clear from examining the histories of such series over a number of time periods that adjacent observations are by no means independent. Hence, the usual techniques from classical statistics, developed primarily for independent and identically distributed (iid) observations, are not applicable.

Clearly, we cannot hope to give a complete accounting of the theory and applications of time series in the limited time devoted to this course. Therefore, what we will try to accomplish in this presentation is a considerably more modest set of objectives, with more detailed references quoted for discussions in depth. First, we will attempt to illustrate the kinds of time series analyses that can arise in scientific contexts, particularly in economics and finance, and give examples of applications using real data.
This necessarily will include exploratory data analysis using graphical displays and numerical summaries such as the autocorrelation and cross correlation functions. The use of scatter diagrams and various linear and nonlinear transformations also will be illustrated. We will define classical time series statistics for measuring the patterns described by time series data. For example, the characterization of consistent trend profiles by dynamic linear or quadratic regression models, as well as the representation of periodic patterns using spectral analysis, will be illustrated. We will show how one might go about examining plausible patterns of cause and effect, both within and among time series.

Finally, some time series models that are particularly useful, such as regression with correlated errors as well as multivariate autoregressive and state-space models, will be developed, together with unit root, co-integration, and nonlinear time series models, and some other models. Forms of these models that appear to offer hope for applications will be emphasized. It is recognized that a discussion of the models and techniques involved is not enough if one does not have available the requisite resources for carrying out time series computations; these can be formidable. Hence, we include a computing package, called R.

In this chapter, we will try to minimize the use of mathematical notation throughout the discussions and will not spend time developing the theoretical properties of any of the models or procedures. What is important for this presentation is that you, the reader, can gain a modest understanding as well as having access to some of the principal techniques of time series analysis. Of course, we will refer to Hamilton (1994) for additional references or more complete discussions relating to an application or principle and will discuss them in detail.
2.2 Stationary Time Series
We begin by introducing several environmental, economic and financial time series to serve as illustrative data for time series methodology. Figure 2.1 shows monthly values of an environmental series called the Southern Oscillation Index (SOI) and the associated recruitment (number of new fish) computed from a model by Pierre Kleiber, Southwest Fisheries Center, La Jolla, California. Both series are for a period of 453 months ranging over the years 1950−1987. The SOI measures changes in air pressure that are related to sea surface temperatures in the central Pacific. The central Pacific Ocean warms up every three to seven years due to the El Niño effect, which has been blamed, in particular, for floods in the midwestern portions of the U.S.

Both series in Figure 2.1 tend to exhibit repetitive behavior, with regularly repeating (stochastic) cycles that are easily visible. This periodic behavior is of interest because underlying processes of interest may be regular, and the rate or frequency of oscillation characterizing the behavior of the underlying series would help to identify them. One can also remark that the cycles of the SOI are repeating at a faster rate than those of the recruitment series. The recruitment series also shows several kinds of oscillations, a faster frequency that seems to repeat about every 12 months and a slower frequency that seems to repeat about every 50 months. The study of the kinds of cycles and their strengths will be discussed later. The two series also tend to be somewhat related; it is easy to imagine that somehow the fish population is dependent on the SOI. Perhaps there is even a lagged relation, with the SOI signalling changes in the fish population.

The study of the variation in the different kinds of cyclical behavior in a time series can be aided by computing the power spectrum, which shows the variance as a function of the frequency of oscillation. Comparing the power spectra of the two series would then give valuable information relating to the relative cycles driving each one. One might also want to know whether or not the cyclical variations of a particular frequency in one of the series, say the SOI, are associated with the frequencies in the recruitment series. This would be measured by computing the correlation as a function of frequency, called the coherence. The study of systematic periodic variations in time series is called spectral analysis; see Shumway (1988) and Shumway and Stoffer (2001) for details.
Figure 2.1: Monthly SOI (left) and simulated recruitment (right) from a model (n=453 months, 1950-1987).
We will need a characterization for the kind of stability that is exhibited by the environmental and fish series. One can note that the two series seem to oscillate fairly regularly around central values (0 for SOI and 64 for recruitment). Also, the lengths of the cycles and their orientations relative to each other do not seem to be changing drastically over the time histories.

In order to describe this in a simple mathematical way, it is convenient to introduce the concept of a stationary time series. Suppose that we let the value of the time series at some time point t be denoted by x_t. Then the observed values can be represented as x_1, the initial time point, x_2, the second time point, and so forth out to x_n, the last observed point. A stationary time series is one for which the statistical behavior of x_{t_1}, x_{t_2}, ..., x_{t_k} is identical to that of the shifted set x_{t_1+h}, x_{t_2+h}, ..., x_{t_k+h} for any collection of time points t_1, t_2, ..., t_k and for any shift h. This means that all of the multivariate probability density functions for subsets of variables must agree with their counterparts in the shifted set for all values of the shift parameter h. This is called strict (or strong) stationarity, which can be regarded as a mathematical assumption.

The above version of stationarity is too strong for most applications and is difficult or impossible to verify statistically in applications. Therefore, to relax this mathematical assumption, we will use a weaker version, called weak stationarity or covariance stationarity, which requires only that the first and second moments satisfy the constraints

E(x_t) = µ   and   E[(x_{t+h} − µ)(x_t − µ)] = γ_x(h),      (2.1)
where E denotes expectation or averaging over the population densities and h is the shift or lag. This implies, first, that the mean value function does not change over time and, second, that γ_x(h), the population covariance function, is the same as long as the points are separated by a constant shift h. Estimators for the population covariance are important diagnostic tools for time correlation, as we shall see later. When we use the term stationary time series in the sequel, we mean weakly stationary as defined by (2.1). The autocorrelation function (ACF) is defined as a scaled version of (2.1) and is written as

ρ_x(h) = γ_x(h)/γ_x(0),      (2.2)
which is always between −1 and 1. The denominator of (2.2) is the mean square error or variance of the series, since γ_x(0) = E[(x_t − µ)²].

Exercise: For a given time series {x_t}, t = 1, ..., n, how do you check whether the time series {x_t} is weakly or strongly stationary? Think about this problem.

Example 1.1: We introduce in this example a simple time domain model to be considered in detail later. A simple moving average model assumes that the series x_t is generated from linear combinations of independent or uncorrelated "shocks" w_t, sometimes called white noise (WN), to the system. (White noise is defined as a sequence of uncorrelated random variables with mean zero and the same variance.) For example, the simple first order moving average series

x_t = w_t − 0.9 w_{t−1}

is stationary when the inputs {w_t} are assumed independent with E(w_t) = 0 and E(w_t²) = 1. It can be easily verified that E(x_t) = 0 and that γ_x(h) = 1 + 0.9² for h = 0, γ_x(h) = −0.9 for h = ±1, and γ_x(h) = 0 for |h| > 1 (please verify this). We can see what such a series might look like by drawing random numbers w_t from a standard normal distribution and then computing the values of x_t. One such simulated series is shown in Figure 2.2 for n = 200 values; the series vaguely resembles the real data in the bottom panel of Figure 2.1.
Figure 2.2: Simulated MA(1) with θ1 = 0.9.
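A simulation of this kind can be reproduced (up to the random draws) with a few lines of base R; this is a minimal sketch, not the code used to produce the figure:

    # Simulate x_t = w_t - 0.9 w_{t-1} with standard normal white noise
    set.seed(123)                    # for reproducibility
    n <- 200
    w <- rnorm(n + 1)                # w_0, w_1, ..., w_n
    x <- w[-1] - 0.9 * w[-(n + 1)]   # x_t = w_t - 0.9 * w_{t-1}, t = 1, ..., n

    plot.ts(x, main = "Simulated MA(1)")
    acf(x, lag.max = 20)             # sample ACF: negative spike at lag 1, near zero beyond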
Many of our techniques are based on the idea that a suitably modified time series can be regarded as (weakly) stationary. This requires first that the mean value function be constant, as in (2.1). Several simple, commonly occurring nonstationary time series can be illustrated by letting this assumption be violated. For example, the series y_t = t + x_t, where x_t is the moving average series of Example 1.1, is nonstationary because E(y_t) = t, so the constant mean assumption of (2.1) is clearly violated. Four techniques for modifying the given series to improve the approximation to stationarity are detrending, differencing, transformations, and linear filtering, as discussed below. A simple example of a nonstationary series is also given later.
2.2.1 Detrending
One of the dominant features of many economic and business time series is the trend. Such a trend can be upward or downward, it can be steep or not, and it can be exponential or approximately linear. Since a trend should definitely somehow be incorporated in a time series model, simply because it can be exploited for out-of-sample forecasting, an analysis of trend behavior typically requires quite some research input. The discussion later will show that the type of trend has an important impact on forecasting.

The general version of the nonstationary time series given above is to assume a general trend of the form y_t = T_t + x_t, in particular the linear trend T_t = β_1 + β_2 t. If one looks for a method of modifying the above series to achieve stationarity, it is natural to consider the residual

x̂_t = y_t − T̂_t = y_t − β̂_1 − β̂_2 t

as a plausible stationary series, where β̂_1 and β̂_2 are the estimated intercept and slope of the least squares line for y_t as a function of t. The use of the residual or detrended series is common, and the process of constructing the residual is known as detrending.
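In R, detrending by least squares amounts to regressing the series on a time index and keeping the residuals. A minimal sketch, assuming a hypothetical observed series y, is:

    # y: a hypothetical numeric vector or ts object observed at times 1, ..., n
    tt  <- seq_along(y)          # time index t = 1, ..., n
    fit <- lm(y ~ tt)            # least squares fit of the linear trend T_t = beta1 + beta2 * t
    x.detrended <- resid(fit)    # detrended series y_t - hat(beta1) - hat(beta2) * t

    plot.ts(x.detrended, main = "Detrended series")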
Example 1.2: To illustrate the presence of trends in economic data, consider the five graphs in Figure 2.3, which are the annual indices of real national output (in logs) in China in five different sectors for the sample period 1952−1988.
Figure 2.3: Log of annual indices of real national output in China, 1952-1988.
These sectors are agriculture, industry, construction, transportation, and commerce. From this figure, it can be observed that the five sectors have grown over the years at different rates, and also that the five sectors seem to have been affected by likely exogenous shocks to the Chinese economy around 1958 and 1968. These shocks roughly correspond to the two major political movements in China: the Great Leap Forward, from around 1958 until 1962, and the Cultural Revolution, from 1966 to 1976. It also appears from the graphs that these political movements may not have affected each of the five sectors in a similar fashion. For example, the decline of the output in the construction sector in 1961 seems much larger than that in the industry sector in the same year. It also seems that the Great Leap Forward shock already had an impact on the output in the agriculture sector as early as 1959. To quantify the trends in the five Chinese output series, one might consider a simple regression model with a linear trend as mentioned earlier, or some more complex models.

Example 1.3: As another, more interesting example, consider the global temperature series given in Figure 2.4. There appears to be an increasing trend in global temperature, which may signal global warming, or it may be just a normal fluctuation.
Figure 2.4: Monthly average temperature in degrees centigrade, January, 1856 - February 2005, n = 1790 months. The straight line (wide and green) is the linear trend y = −9.037 + 0.0046 t and the curve (wide and red) is the nonparametric estimated trend.
warming or may be just a normal fluctuation. Fitting a straight line relating time t to temperature in degrees centigrade by simple least squares leads to β̂1 = −9.037, β̂2 = 0.0046 and a detrended series shown in the left panel of Figure 2.5. Note that the detrended series
Figure 2.5: Detrended monthly global temperatures: left panel (linear) and right panel (nonlinear).
still contains a trend-like bulge that is highest at about t = 60 years. In this case the slope of the line is often used to argue that there is a global warming trend and that the average increase is approximately 0.83 degrees F per 100 years. It is clear that the residuals in Figure 2.5 still contain substantial correlation, so the ordinary least squares model may not be appropriate. There may also be other functional forms that do a better job of detrending; for example, quadratic or logarithmic representations are common, or a nonparametric approach can be used (we will discuss this approach in detail later); see the detrended series shown in the right panel of Figure 2.5. Detrending is particularly essential when one is estimating the covariance function and power spectrum.
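To make the detrending step concrete, here is a minimal R sketch (not the lecture's actual code); the object gtemp is a hypothetical placeholder for the monthly global temperature series, and the lowess span is an arbitrary choice.

t <- seq_along(gtemp)                  # time index for the monthly series
fit <- lm(gtemp ~ t)                   # straight-line trend by ordinary least squares
coef(fit)                              # intercept and slope of the fitted linear trend
detrended <- resid(fit)                # linearly detrended series, as in the left panel of Figure 2.5
trend.np <- lowess(t, gtemp, f = 0.1)  # a nonparametric (locally weighted) trend estimate
plot(t, gtemp, type = "l"); lines(trend.np, col = "red", lwd = 2)
plot(t, detrended, type = "l")         # residuals still show a trend-like bulge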
2.2.2 Differencing
A common method for achieving stationarity in nonstationary cases is with the first difference ∆yt = yt − yt−1,
where ∆ is called the differencing operator. The use of differencing as a method for transforming to stationarity is also common for series with trend. For example, for the linear trend model of Example 1.3, yt = a + b t + xt, the differenced series would be ∆yt = b + xt − xt−1, which is stationary because the difference xt − xt−1 can be shown to be stationary. Example 1.4: The first difference of the global temperature series is shown in Figure 2.6 and we see that the upward linear trend has disappeared, as has the trend-like bulge that remained in the
Figure 2.6: Differenced monthly global temperatures.
detrended series. Higher order differences are defined as successive applications of the operator ∆. For example, the second difference is ∆²yt = ∆(∆yt), so that ∆²yt = yt − 2 yt−1 + yt−2. If the model also contains a quadratic trend term c t², it is easy to show that taking the second difference reduces the model to a stationary form. The trends in Figures 2.3 and 2.4 are all of the familiar type, that is, many economic time series display an upward moving trend. It is, however, not necessary for a trend to move upwards to be called a trend. It may also be that a trend is less smooth and displays slowly changing tendencies which once in a while change direction.
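As a quick illustration of first and second differencing in R (again a sketch, with gtemp standing in for the monthly global temperature series):

d1 <- diff(gtemp)                     # first difference, removes a linear trend
d2 <- diff(gtemp, differences = 2)    # second difference, removes a quadratic trend
par(mfrow = c(2, 1))
plot.ts(d1); plot.ts(d2)              # compare with Figure 2.6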
Example 1.5: An example of such a trending pattern is given in the top left panel of Figure 2.7 and the first difference in the top right
Figure 2.7: Annual stock of motor cycles in the Netherlands, 1946-1993.
panel of Figure 2.7 and the second order difference in the bottom left panel of Figure 2.7, where the annual stock of motor cycles in the Netherlands is displayed for 1946 − 1993, together with the first order and second order differenced series. From the top right and bottom left panels, we can see that differencing might not work well for this example. One way to describe such changing trends is to allow the parameters to change over time, driven by some exogenous shocks (macroeconomic variables; see Cai (2006)), for example, the oil shock in 1974.
2.2.3 Transformations
A transformation that cuts down the values of larger peaks of a time series and emphasizes the lower values may be effective in reducing
nonstationary behavior due to changing variance. An example is the logarithmic transformation yt = log(xt), where log denotes the natural (base e) logarithm. Example 1.6: For example, the data shown in Figure 2.8 represent quarterly earnings per share for the American company Johnson &
Figure 2.8: Quarterly earnings for Johnson & Johnson (4th quarter, 1970 to 1st quarter, 1980, left panel) with log transformed earnings (right panel).
Johnson from the fourth quarter of 1970 to the first quarter of 1980. It is easy to note some very nonstationary behavior in this series that cannot be eliminated completely by differencing or detrending because of the larger fluctuations that occur near the end of the record when the earnings are higher. The right panel of Figure 2.8 shows the log-transformed series, and we note that the later peaks have been attenuated so that the variance of the transformed series seems more stable. One would still have to eliminate the trend remaining in the series to obtain stationarity. For more details on the analysis of this series, see the later analyses and the papers by Burman and Shumway (1998) and Cai and Chen (2006). A general transformation is the well-known Box-Cox transformation; see Hamilton (1994, p.126), Shumway (1988), and Shumway
and Stoffer (2000), defined in terms of the arbitrary power xt^α for some α in a certain range, which can be chosen based on some optimality criterion such as the smallest mean squared error.
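As a rough R sketch of these variance-stabilizing transformations (assuming jj is a numeric vector or ts object holding the quarterly Johnson & Johnson earnings; the name is a placeholder):

par(mfrow = c(1, 2))
plot.ts(jj)            # variance grows with the level of the series
plot.ts(log(jj))       # the log transform attenuates the later peaks
alpha <- 0.5           # a Box-Cox style power transform; alpha = 0 corresponds to the log
plot.ts(jj^alpha)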
2.2.4 Linear Filters
The first difference is a linear combination of the values of the series at two lags, say 0 and 1, and has the effect of retaining the faster oscillations and attenuating or reducing the slower oscillations. We may define more general linear filters to do other kinds of smoothing or roughening of a time series to enhance signals and attenuate noise. Consider the general linear combination of past and future values of a time series given as
yt = ∑_{j=−∞}^{∞} aj xt−j,
where aj, j = 0, ±1, ±2, . . ., define a set of fixed filter coefficients to be applied to the series of interest. An example is the first difference, where a0 = 1, a1 = −1, and aj = 0 otherwise. Note that the above {yt} is also called a linear process in the probability literature. Example 1.7: To give a simple illustration, consider the twelve-month moving average aj = 1/12, j = 0, ±1, ±2, ±3, ±4, ±5, ±6, and zero otherwise. The result of applying this filter to the SOI index is shown in Figure 2.9. It is clear that this filter removes some higher oscillations and produces a smoother series. In fact, the yearly oscillations have been filtered out (see the bottom panel in Figure 2.9) and a lower frequency oscillation appears with a cycling rate of about 42 months. This is the so-called El Niño effect that accounts for all kinds of phenomena. This filtering effect will be examined further later, when we discuss spectral analysis, since it is extremely important to know exactly how one is influencing the periodic oscillations by filtering.
Figure 2.9: The SOI series (black solid line) compared with a 12-point moving average (red, thicker solid line). Top panel: original data; bottom panel: filtered series.
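A minimal R sketch of such a moving-average filter (assuming soi holds the monthly SOI series; the centered 12-term weights below are for illustration and need not match the exact weights quoted above):

sm <- stats::filter(soi, filter = rep(1/12, 12), sides = 2)  # symmetric moving average
plot.ts(soi, col = "grey")
lines(sm, col = "red", lwd = 2)                              # smoother, lower-frequency component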
To summarize, the graphical examination of time histories can point the way to further analyses by noting periodicities and trends that may be present. Furthermore, looking at time histories of transformed or filtered series often gives an intuitive idea as to whether one series could also be associated with another. Figure 2.1 indicates that the SOI series tends to precede or lead the recruit series. Naturally, one can ask for a more detailed specification of this leading/lagging relation. In the following sections, we will try to show how classical time series methods can be used to provide partial answers to these kinds of questions. Before doing so, we spend some space introducing some other features, such as seasonality, outliers, nonlinearity, and conditional heteroskedasticity, commonly seen in economic and financial as well as environmental data.
2.3 Other Key Features of Time Series
2.3.1 Seasonality
When time series (particularly, economic and financial time series) are observed each day or month or quarter, it is often the case that such a series displays a seasonal pattern (deterministic cyclical behavior). Similar to the feature of trend, there is no precise definition of seasonality. Usually we refer to seasonality when observations in certain seasons display strikingly different features from those in other seasons. For example, retail sales are always large in the fourth quarter (because of Christmas spending) and small in the first quarter, as can be observed from Figure 2.10. It may also be possible that seasonality is reflected in the variance of a time series. For example, for daily observed stock market returns the volatility often seems highest on Mondays, basically because investors have to digest three days of news instead of only one day. For more details, see the books by Taylor (2005, §4.5) and Tsay (2005). Example 1.8: In this example we consider the monthly US retail sales series (not seasonally adjusted) from January of 1967 to December of 2000 (in billions of US dollars). The data can be downloaded from the web site at http://marketvector.com. The U.S. retail sales index is one of the most important indicators of the US economy. There is a vast literature on seasonal series like this one; see, e.g., Franses (1996, 1998), Ghysels and Osborn (2001), and Cai and Chen (2006). From Figure 2.10, we can observe that the peaks occur in December, so we can say that retail sales display seasonality. Also, it can be observed that the trend is basically increasing, but nonlinearly. The same phenomenon can be observed from Figure 2.8 for the quarterly earnings for Johnson & Johnson.
Figure 2.10: US Retail Sales Data from 1967-2000.
If simple graphs are not informative enough to highlight possible seasonal variation, a formal regression model can be used. For example, one might consider the following regression model with seasonal dummy variables:
∆yt = yt − yt−1 = ∑_{j=1}^{s} βj Dj,t + εt,
where Dj,t is a seasonal dummy variable and s is the number of seasons. Of course, one can also use a seasonal ARIMA model, denoted by ARIMA(p, d, q) × (P, D, Q)s, which will be discussed later. Example 1.9: In this example, we consider time series with pronounced seasonality, displayed in Figure 2.11: the logs of four-weekly advertising expenditures on radio and television in the Netherlands for 1978.01 − 1994.13. For these two marketing time series one can clearly observe that television advertising displays quite some seasonal fluctuation throughout the entire sample, while radio advertising has seasonality only for the last five years. Also, there seems to be a structural break in the radio series around observation 53. This break is related to an increase in radio broadcasting minutes in January 1982. Furthermore, there is visual evidence that the trend
Figure 2.11: Four-weekly advertising expenditures on radio and television in The Netherlands, 1978.01 − 1994.13.
changes over time. Generally, it appears that many seasonally observed time series from business and economics as well as other applied fields display seasonality in the sense that the observations in certain seasons have properties that differ from those of data points in other seasons. A second feature of many seasonal time series is that the seasonality changes over time, as studied by Cai and Chen (2006). Sometimes these changes appear abrupt, as is the case for radio advertising in Figure 2.11, and sometimes such changes occur only slowly. To capture these phenomena, Cai and Chen (2006) proposed a more general flexible seasonal effect model of the following form:
yij = α(ti) + βj(ti) + eij,   i = 1, . . . , n,   j = 1, . . . , s,
where yij = y(i−1)s+j , ti = i/n, α(·) is a (smooth) common trend function in [0, 1], {βj (·)} are (smooth) seasonal effect functions in [0, 1], either fixed or random, subject to a set of constraints, and the error term eij is assumed to be stationary. For more details, see Cai and Chen (2006).
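To illustrate the seasonal dummy regression introduced above, here is a minimal R sketch (assuming sales is a monthly ts object such as the retail sales series; the object name is a placeholder):

dy <- diff(sales)                   # first differences y_t - y_{t-1}
month <- factor(cycle(sales))[-1]   # season (month) label for each differenced observation
fit <- lm(dy ~ 0 + month)           # one dummy coefficient per season, no intercept
summary(fit)                        # a large December effect indicates seasonality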
2.3.2 Aberrant Observations
Possibly distorting observations do not necessarily come in a sequence, as in the radio advertising example (which might have so-called regime shifts). It may also be that only a few observations have a major impact on time series modeling and forecasting. Such data points are called aberrant observations (outliers in statistics).
Example 1.10: As an illustrative example, we consider the differenced series ∆yt = yt − yt−1, where yt = log(wt) with wt the price level, and the inflation rate ft = (wt − wt−1)/wt−1 in Argentina, for the sample 1970.1 − 1989.4, shown in Figure 2.12. From the figure, it
Figure 2.12: First difference in log prices versus the inflation rate: the case of Argentina, 1970.1 − 1989.4.
is obvious that when the quarterly inflation rate is high (as is the case in 1989.3, where it is about 500 percent), the ∆yt series is not a good approximation to the inflation rate (the 1989.3 value of ∆yt corresponds to only about 200 percent). Also, we can observe that the data in 1989 seem to be quite different from the observations of the year before. In fact, if there is any correlation between ∆yt and ∆yt−1, such a correlation may be affected by these
observations. In other words, if a simple regression is used to model the correlation between ∆yt and ∆yt−1, we would expect an estimate of the coefficient ρ to be influenced by the data points in the last year. The ordinary least squares estimates of ρ are ρ̂ = 0.561 (0.094) for the entire sample and ρ̂ = 0.704 (0.082) for the sample without the data points of the last year, where the estimated standard errors are in parentheses. We now turn to the question of how to handle these aberrant observations. See Chapter 6 of Franses (1998) for a discussion of methods to delete several types of aberrant observations, and also methods to take account of such data for forecasting.
2.3.3 Conditional Heteroskedasticity
An important feature of economic time series, and in particular of financial time series, is that aberrant observations tend to emerge in clusters (persistence). The intuitive interpretation is that when news arrives at a stock market on a certain day, the reaction to this news is to buy or sell many stocks, while the day after, once the news has been digested and valued properly, the stock market returns to the level from before the arrival of the news. This pattern would be reflected by a (possibly large) increase or decrease in the returns on one day (usually, the negative impact is larger than the positive impact, which is called asymmetry), followed by an opposite change on the next day. As a result, we see aberrant observations in a row, and the two sudden changes in returns are correlated since the second sharp change is caused by the first. This is called conditional heteroskedasticity. To characterize this phenomenon, one might use the so-called autoregressive conditional heteroscedasticity (ARCH) model of Engle (1982) and the generalized autoregressive
conditional heteroscedasticity (GARCH) model of Bollerslev (1986) or other GARCH type models; see Taylor (2005) and Tsay (2005). Example 1.11: This example concerns the closing bid prices of the Japanese yen (JPY) in terms of the U.S. dollar. There is a vast amount of literature devoted to the study of exchange rate time series; see Sercu and Uppal (2000) and the references therein for details. Here we explore the possible nonlinearity feature (see the next section), heteroscedasticity, and predictability of the exchange rate series (we will discuss this later). The data are a weekly series from January 1, 1974 to December 31, 2003. The daily noon buying rates in New York City certified by the Federal Reserve Bank of New York for customs and cable transfers purposes were obtained from the Chicago Federal Reserve Board (www.frbchi.org). The weekly series is generated by selecting the Wednesday observations (if a Wednesday is a holiday, then the following Thursday is used), which gives 1566 observations. The use of weekly data avoids the so-called weekend effect as well as other biases associated with nontrading, bid-ask spread, asynchronous rates and so on, which are often present in higher frequency data. We consider the log return series yt = 100 log(ξt/ξt−1), plotted in Figure 2.13, where ξt is the exchange rate level in the t-th week. Around the 44th week of 1998 (the period of the Asian financial crisis), the return on the Japanese yen/U.S. dollar exchange rate decreased by 9.7%. Immediately after that observation, we can find several data points that are large in absolute value. Additionally, in other parts of the sample, we can observe “bubbles”, i.e., clusters of observations with large variances. This phenomenon is called volatility clustering (persistence) or conditional heteroskedasticity. In other words, the variance changes over time. To allow for the possibility that high volatility is followed by high
Figure 2.13: Japanese - U.S. dollar exchange rate return series {yt }, from January 1, 1974 to December 31, 2003.
volatility, and that low volatility will be followed by low volatility, where volatility is defined in terms of the returns themselves, one can consider the presence of so-called conditional heteroskedasticity or its variants, such as ARCH or GARCH type models, by using (∆yt)² as a measure of the variance of the returns. Also, one can replace (∆yt)² by |∆yt|, the absolute value of the return. The main purpose of exploiting volatility clustering is to forecast future volatility. Since this variable is a measure of risk, such forecasts can be useful for evaluating investment strategies, portfolio selection, or risk management. Furthermore, they can be useful for decisions on buying or selling options or derivatives. See Hamilton (1994, Chapter 21), Taylor (2005), and Tsay (2005) for details.
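A minimal R sketch for checking volatility clustering (assuming y is the vector of weekly log returns; the name is a placeholder). Correlation in the squared or absolute returns, with little correlation in the returns themselves, is the typical ARCH/GARCH signature:

par(mfrow = c(3, 1))
acf(y)        # returns: usually little linear correlation
acf(y^2)      # squared returns: persistence suggests conditional heteroskedasticity
acf(abs(y))   # absolute returns: an alternative volatility proxy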
2.3.4 Nonlinearity
Nonlinear features of time series are often seen in economic and financial data as well as in other applied fields; see the popular books by Tong (1990), Granger and Teräsvirta (1993), Franses and van Dijk (2000), and Fan and Yao (2003). Beyond the linear domain, there are
infinitely many nonlinear forms to be explored. Early development of nonlinear time series analysis focused on various nonlinear (sometimes non-Gaussian) parametric forms. Successful examples include, among others, ARCH modeling of the fluctuating structure of financial time series, threshold modeling for biological and economic data, as well as regime-switching or structural-change modeling for economic and financial time series.
Example 1.12: Consider the example of the unemployment rate in Germany for 1962.1 to 1991.4 in Figure 2.14. From the graph in the
Figure 2.14: Quarterly unemployment rate in Germany, 1962.1 − 1991.4 (seasonally adjusted and not seasonally adjusted) in the left panel. The scatterplot of unemployment rate (seasonally adjusted) versus unemployment rate (seasonally adjusted) one period lagged in the right panel.
left panel, it is clear that the unemployment rate sometimes rises quite rapidly, usually in the recession years 1967, 1974 − 1975, and 1980 − 1982, while it decreases very slowly, usually in times of expansion. This asymmetry can be formalized by estimating the parameters in the following simple regression:
∆yt = yt − yt−1 = β1 It(E) + β2 It(R) + εt,
where It(·) is an indicator variable, which allows the absolute value of the rate of change to vary across the two states, say, “decreasing
yt” and “increasing yt”, from β1 to β2, where β1 may differ from −β2. For the German seasonally adjusted unemployment rate, we find that β̂1 = −0.040 and β̂2 = 0.388, indicating that when the unemployment rate increases (in recessions), it rises faster than it falls when it goes down (in expansions). Furthermore, from the graph in the right panel, the scatterplot of yt versus yt−1, where yt is the seasonally adjusted unemployment rate, we can observe that this series displays cyclical behavior around points that shift over time. When these shifts are endogenous, i.e., caused by past observations on yt themselves, this can be viewed as a typical feature of a nonlinear time series. For a detailed analysis of this dataset using nonlinear methods, the reader is referred to the book by Franses (1998) for a nonlinear parametric model and the paper by Cai (2002) for a nonparametric model.
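A minimal R sketch of this asymmetry regression (assuming u is a vector of quarterly seasonally adjusted unemployment rates; the object name, and defining the two states by the sign of the change, are illustrative assumptions):

du <- diff(u)                  # quarterly changes in the unemployment rate
inc <- as.numeric(du > 0)      # indicator for increasing unemployment ("recessions")
dec <- 1 - inc                 # indicator for decreasing unemployment ("expansions")
fit <- lm(du ~ 0 + dec + inc)  # beta1 for decreases, beta2 for increases, no intercept
coef(fit)                      # asymmetry shows up as |beta2| exceeding |beta1|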
2.4 Time Series Relationships
One can identify two basic kinds of association or correlation that are important in time series considerations. The first is the notion of self correlation or autocorrelation introduced in (2.2). The second is that series are somehow related to each other so that one might hypothesize some causal relationship existing between the phenomena generating the series. For example, one might hypothesize that the simultaneous oscillations of the SOI and recruitment series suggest that they are related. We introduce below three statistics for identifying the sources of time correlation. Assume that we have two series xt and yt that are observed over some set of time points, say t = 1, . . . , n.
2.4.1 Autocorrelation Function
Correlation at adjacent points of the same series is measured by the autocorrelation function defined in (2.2). For example, the SOI series in Figure 2.1 contains regular fluctuations at intervals of approximately 12 months. An indication of possible linear as well as nonlinear relations can be inferred by examining the lagged scatterplots of Figure 2.15, defined as plots that put xt on the horizontal axis and xt+h on the vertical axis for various values of the lag h = 1, 2, 3, . . . , 12.
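Such lagged scatterplots can be produced directly in R with the lag.plot function (a sketch, assuming soi holds the monthly SOI series):

lag.plot(soi, lags = 12, layout = c(3, 4), diag = TRUE)  # x_t against its first 12 lags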
Example 1.13: In Figure 2.15, we have made a lagged scatterplot of the SOI series at time t + h against the SOI series at time t and obtained a high correlation, 0.412, between the series xt+12 and the
Figure 2.15: Multiple lagged scatterplots showing the relationship between SOI and the present (xt ) versus the lagged values (xt+h ) at lags 1 ≤ h ≤ 16.
series xt shifted by 12 months. Lower order lags at t − 1, t − 2 also
show correlation. The scatterplots show the direction of the relation, which tends to be positive for lags 1, 2, 11, 12, 13 and negative for lags 6, 7, 8. The scatterplots also show that no significant nonlinearities appear to be present. In order to develop a measure of this self correlation or autocorrelation, we utilize a sample version of the scaled autocovariance function (2.2), say
ρ̂x(h) = γ̂x(h)/γ̂x(0),
where
γ̂x(h) = (1/n) ∑_{t=1}^{n−h} (xt+h − x̄)(xt − x̄),
which is the sample counterpart of (2.2) with x̄ = ∑_{t=1}^{n} xt/n. Under the assumption that the underlying process xt is white noise, the approximate standard error of the sample ACF is
σρ = 1/√n.   (2.3)
That is, ρ̂x(h) is approximately normal with mean 0 and variance 1/n.
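In R, the sample ACF and the white-noise bounds from (2.3) can be obtained directly (a sketch, with soi again standing in for the SOI series):

acf(soi, lag.max = 50)      # plots the sample ACF with approximate 95% bounds
n <- length(soi)
c(-1.96, 1.96) / sqrt(n)    # the bounds drawn by acf(), i.e. plus/minus 1.96 sigma_rho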
Example 1.14: As an illustration, consider the autocorrelation functions computed for the environmental and recruitment series shown in the top two panels of Figure 2.16. Both autocorrelation functions show some evidence of periodic repetition. The ACF of SOI seems to repeat with a period of 12, while the recruitment series has a dominant period that repeats at about 12 to 16 time points. Again, the maximum values are well above the two standard error bounds shown as dotted lines above and below the horizontal axis.
2.4.2 Cross Correlation Function
The fact that correlations may occur at some time delay when trying to relate two series to one another at some lag h for purposes of
Figure 2.16: Autocorrelation functions of SOI and recruitment and cross correlation function between SOI and recruitment.
prediction suggests that it would also be useful to plot xt+h against yt. Example 1.15: In order to examine this possibility, consider the lagged scatterplot matrices shown in Figures 2.17 and 2.18. Figure 2.17 plots the SOI at time t + h, xt+h, versus the recruitment series yt for lags 0 ≤ h ≤ 15. There are no particularly strong linear relations apparent in these plots, i.e., future values of SOI are not related to current recruitment. This means that the temperatures are not responding to past recruitment. In Figure 2.18, the current SOI values, xt, are plotted against the future recruitment values, yt+h, for 0 ≤ h ≤ 15. It is clear from Figure 2.18 that the series are negatively correlated for lags h = 5, . . . , 9. The correlation at lag 6, for example, is −0.60, implying that increases in the SOI lead decreases in the number of recruits by about 6 months. On the other hand, the series are hardly correlated (0.025) at all in the conventional sense,
Figure 2.17: Multiple lagged scatterplots showing the relationship between the SOI at time $t+h$, say $x_{t+h}$ (x-axis), versus recruits at time $t$, say $y_t$ (y-axis), $0 \le h \le 15$.
Figure 2.18: Multiple lagged scatterplots showing the relationship between the SOI at time $t$, say $x_t$ (x-axis), versus recruits at time $t+h$, say $y_{t+h}$ (y-axis), $0 \le h \le 15$.
measured at lag $h = 0$. The general pattern suggests that predicting recruits might be possible using the El Niño at lags of 5, 6, 7, . . .
months.

A measure of the correlation between two series, $x_t$ and $y_t$, is the cross covariance function (CCF), defined in terms of the counterpart to the covariance (2.2),
$$\gamma_{xy}(h) = E\{(x_{t+h} - \mu_x)(y_t - \mu_y)\}.$$
The cross correlation is the version of the above scaled to lie between $-1$ and $1$, namely
$$\rho_{xy}(h) = \gamma_{xy}(h)\big/\sqrt{\gamma_x(0)\,\gamma_y(0)}.$$
The above quantities are expressed in terms of population averages and must generally be estimated from sample data. The estimated sample cross correlation function can be used to investigate the possibility of the series being related at different lags. In order to investigate cross correlations between two series $x_t, y_t$, $t = 1, \ldots, n$, we note that a reasonable sample version of $\rho_{xy}(h)$ might be computed as
$$\widehat\gamma_{xy}(h) = \frac{1}{n}\sum_{t=1}^{n-h}(x_{t+h} - \bar{x})(y_t - \bar{y}),$$
where $\bar{y}$ is the sample mean of $\{y_t\}_{t=1}^{n}$. The estimated sample cross correlation function then becomes
$$\widehat\rho_{xy}(h) = \widehat\gamma_{xy}(h)\big/\sqrt{\widehat\gamma_x(0)\,\widehat\gamma_y(0)},$$
where $h$ denotes the amount that one series is lagged relative to the other. Generally, one computes $\widehat\rho_{xy}(h)$ for a number of positive and negative values, $h = 0, \pm 1, \pm 2, \ldots$, up to about $0.3\,n$, and displays the results as a function of the lag $h$. The function takes values between $-1$ and $1$, which makes it easy to compare cross correlations at different lags with each other. Furthermore, under the hypothesis that there is no relation at lag $h$ and that at least one of the two series is independent and identically distributed, the distribution of $\widehat\rho_{xy}(h)$ is approximately normal with mean 0 and standard
deviation given again by (2.3). Hence, one can compare values of the sample cross correlation with some appropriate number of sample standard deviations based on normal theory. Generally, values within $\pm 1.96\,\sigma_{\rho}$ might be reasonable if one is willing to live with each test at a significance level of 0.05; otherwise, broader limits would be appropriate. In general, if $m$ tests are made, each at level $\alpha$, the overall level of significance is bounded by $m\,\alpha$.

Example 1.16: To give an example, consider the cross correlations between the environmental series and recruitment shown in the bottom panel of Figure 2.16. The cross correlation between the SOI, $x_{t+h}$, and recruitment, $y_t$, shows peaks at $h = -6, -7$, which implies that lagged products involving $x_{t-6}$ and $y_t$, as well as those involving $x_{t-7}$ and $y_t$, match up closely. The value shown on the graph is $\widehat\rho_{xy}(-6) = -0.6$. This means, in practice, that the values of the SOI series tend to lead the recruitment series by 6 or 7 units. Also, one may note that, since the value is negative, lower values of SOI are associated with higher recruitment. The standard error in this case is approximately $\sigma_{\rho} = 0.047$, and the $-0.6$ easily exceeds two standard deviations, shown as lines above and below the axis in Figure 2.16. Hence, we can reject the hypothesis that the correlation is zero at that lag. It is also clear that there are some periodic fluctuations apparent in the cross correlations. For example, in the SOI and recruitment example, there seems to be systematic fluctuation with a period (1 full cycle) of about 12 months. This produces a number of secondary peaks in the cross correlation function. The analysis of this periodic behavior and the accounting for periodicities in the series is considered in later sections or in the books by Franses (1996) and Ghysels and Osborn (2001).
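For readers who want to reproduce this kind of analysis in R, a minimal sketch follows. It assumes that the SOI and recruitment series have already been read into the numeric vectors y and x, as in the code of Section 2.6, and it uses the built-in ccf() function, whose dashed horizontal lines are the approximate 95% limits $\pm 1.96/\sqrt{n}$.

# Sample cross correlation between the SOI (y) and recruits (x); both series
# are assumed to be loaded already as numeric vectors (see Section 2.6).
# ccf(y, x) estimates the correlation between y at time t+h and x at time t.
ccf(y, x, lag.max=50, ylab="", main="CCF of SOI and Recruits")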
2.4.3 Partial Autocorrelation Function
A third kind of time series relationship expresses what is essentially the self-predictability of a series through the partial autocorrelation function (PACF). There are several ways of thinking about this measure. One may regard the PACF as the simple correlation between two points separated by a lag $h$, say $x_t$ and $x_{t-h}$, with the effect of the intervening points $x_{t-1}, x_{t-2}, \ldots, x_{t-h+1}$ conditioned out, i.e., the pure correlation between the two points. This interpretation is often given in more practical statistical settings. For example, one may draw silly causal inferences by quoting correlations between two variables, e.g., teachers' income and wine consumption, that may occur simply because both are correlated with some common driving factor, in this case the gross domestic product (GDP) or some other force influencing disposable income. In time series analysis we are really more interested in the prediction or forecasting problem. In this case we might consider the problem of predicting $x_t$ from the observations $h$ units back in the past, say $x_{t-1}, x_{t-2}, \ldots, x_{t-h}$. Suppose we want to predict $x_t$ from $x_{t-1}, \ldots, x_{t-h}$ using some linear function of these past values. Consider minimizing the mean square prediction error
$$MSE = E[(x_t - \widehat{x}_t)^2]$$
using the predictor $\widehat{x}_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_h x_{t-h}$ over the possible values of the weighting coefficients $a_1, \ldots, a_h$, where we assume, for convenience, that $x_t$ has been adjusted to have zero mean. Consider the result of minimizing the above mean square prediction error for a particular lag $h$. Then the partial autocorrelation function is defined as the value of the last coefficient $a_h$, i.e., $\Phi_{hh} = a_h$. As a practical matter, we minimize the sample error sum of squares
$$SSE = \sum_{t=h+1}^{n}\left[(x_t - \bar{x}) - \sum_{k=1}^{h} a_k (x_{t-k} - \bar{x})\right]^2,$$
with the estimated partial autocorrelation defined as $\widehat{\Phi}_{hh} = \widehat{a}_h$.
The coefficients, as defined above, are also between $-1$ and $1$ and have the usual properties of correlation coefficients. In particular, their standard error under the hypothesis of no partial autocorrelation is still given by (2.3). The intuition behind the above argument is that the last coefficient will become very small once the forecast horizon, or lag $h$, is large enough to give good prediction. In particular, the order of the autoregressive model introduced in the next chapter will be exactly the lag $h$ beyond which $\Phi_{hh} = 0$.
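To make the definition concrete, the following small R sketch computes the lag-$h$ partial autocorrelation as the last coefficient of the regression of $x_t$ on its $h$ lagged values and compares it with the value returned by the built-in pacf(); a simulated AR(1) series is used purely for illustration, and the two numbers should be close, although they are not computed by exactly the same algorithm.

# PACF at lag h as the last coefficient of a regression on h lagged values.
set.seed(1)
x <- arima.sim(n=500, list(ar=0.7))              # illustrative zero-mean AR(1) series
h <- 3
n <- length(x)
X <- sapply(1:h, function(k) x[(h+1-k):(n-k)])   # columns are lags 1,...,h
fit <- lm(x[(h+1):n] ~ X - 1)                    # no intercept: series has mean zero
fit$coefficients[h]                              # estimate of Phi_hh
pacf(x, lag.max=h, plot=FALSE)$acf[h]            # built-in estimate, for comparison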
Example 1.17: As an example, we show in Figure 2.19 the partial autocorrelation functions of the SOI series (left panel) and the recruits series (right panel).
Figure 2.19: Partial autocorrelation functions for the SOI (left panel) and the recruits (right panel) series.
Note that the PACF
of the SOI has a single peak at lag $h = 1$ and then relatively small values. This means, in effect, that fairly good prediction can be achieved by using the immediately preceding point and that adding further values does not really improve the situation. Hence we might try an autoregressive model with $p = 1$. The recruits series has two peaks and then small values, implying that the pure correlation between points is summarized by the first two lags. A major application of the PACF is in diagnosing the appropriate order of an autoregressive model for the series under consideration. Autoregressive models will be studied extensively later, but we note here that they are linear models expressing the present value of a series as a linear combination of a number of previous values, with an additive error. Hence, an autoregressive model using two previous values for prediction might be written in the form
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t,$$
where $\phi_1$ and $\phi_2$ are fixed unknown coefficients and $w_t$ are values from an independent series with zero mean and common variance $\sigma_w^2$. For example, if the PACF at lag $h$ is roughly zero beyond a fixed value, say $h = 2$ as observed for the recruits series in the right panel of Figure 2.19, then one might assume a model of the above form for that series. To finish the introductory discussion, we note that the extension of what has been said above to multivariate time series is fairly straightforward if one restricts the discussion to relations between two series at a time. The two series $x_t$ and $y_t$ can be laid out in the vector $(x_t, y_t)$, which becomes a multivariate series of dimension two (bivariate). To generalize, consider the $p$ series $(x_{t1}, x_{t2}, \ldots, x_{tp}) = \mathbf{x}_t$ as the row vector defining the multivariate series $\mathbf{x}_t$. The autocovariance matrix of the vector $\mathbf{x}_t$ can then be defined as the $p \times p$ matrix
containing as elements $\gamma_{ij}(h) = E\{(x_{t+h,i} - \mu_i)(x_{t,j} - \mu_j)\}$, and the analysis of the cross covariance structure can proceed on two elements at a time. The discussion of possible multiple relations is deferred until later. To recapitulate, the primary objective of this chapter has been to define and illustrate with real examples three statistics of interest in describing relationships within and among time series. The autocorrelation function measures the correlation over time in a single series; this correlation can be exploited for prediction purposes or for suggesting periodicities. The cross correlation function measures the correlation between series over time; it may be, for example, that a series is better related to the past of another series than it is to its own past. The partial autocorrelation function gives a direct measure of the lag length necessary to predict a series from itself, i.e., to forecast future values. It is also critical in determining the order of an autoregressive model satisfied by some real data set. It should be noted that all three of the measures given in this section can be distorted if there are significant trends in the data. It is obvious that lagged products of the form $(x_{t+h} - \bar{x})(x_t - \bar{x})$ will be artificially large if there is a trend present. Since the correlations of interest are usually associated with the stationary part of the series, i.e., the part that can be thought of as being superimposed on the trend, it is usual to evaluate the correlations of the detrended series. This means, in effect, that we replace $\bar{x}$ in the equations by $\widehat{a} + \widehat{b}\,t$ if the trend can be considered to be linear. If the trend is quadratic or logarithmic, the appropriate alternate nonlinear predicted value is subtracted before computing the lagged products. Note also that
differencing the series, as discussed in Section 2.2.2, can accomplish the same result.

2.5 Problems
1. Consider a generalization of the model given in Example 1.1, namely, $x_t = \mu + w_t - \theta w_{t-1}$, where $\{w_t\}$ are independent zero-mean random variables with variance $\sigma_w^2$. Prove that $E(x_t) = \mu$, $\gamma_x(0) = (1 + \theta^2)\sigma_w^2$, $\gamma_x(1) = -\theta\,\sigma_w^2$, $\gamma_x(h) = 0$ if $|h| > 1$, and finally show that $x_t$ is weakly stationary.

2. Consider the time series generated by $x_1 = \mu + w_1$ and $x_t = \mu + x_{t-1} + w_t$ for $t \ge 2$. Show that $x_t$ is not stationary, whether $\mu = 0$ or not, and find $\gamma_x(h)$.

3. Suppose that $x_t$ is stationary with mean $\mu_x$ and covariance function given by $\gamma_x(h)$. Find the mean and covariance function of (a) $y_t = a + b\,x_t$, where $a$ and $b$ are constants, and (b) $z_t = x_t - x_{t-1}$.

4. Consider the linear process
$$y_t = \sum_{j=-\infty}^{\infty} a_j\, w_{t-j},$$
where $w_t$ is a white noise process with variance $\sigma_w^2$ and $a_j$, $j = 0, \pm 1, \pm 2, \ldots$, are constants. The process $y_t$ will exist (as a limit in mean square) if $\sum_j |a_j| < \infty$; you do not need to prove this. Show that the series $y_t$ is stationary, with autocovariance function
$$\gamma_y(h) = \sigma_w^2 \sum_{j=-\infty}^{\infty} a_{j+h}\, a_j.$$
Apply the result to calculating the autocovariance function of the 3-point moving average $(x_{t-1} + x_t + x_{t+1})/3$.

For the following problems, you need to use a computer package.

5. Melting glaciers deposit yearly layers of sand and silt during the spring melting seasons, which can be reconstructed yearly over a period ranging from the time deglaciation began in New England (about 12,600 years ago) to the time it ended (about 6,000 years ago). Such sedimentary deposits, called varves, can be used as a proxy for paleoclimatic parameters such as temperature. The file mass2.dat contains yearly records for 634 years beginning 11,834 years ago, collected from one location in Massachusetts.
Figure 2.20: Varve data for Problem 5.
For further information, see Shumway and Verosub (1992).
(a) Plot the varve records and examine the autocorrelation and partial autocorrelation functions for evidence of nonstationarity.
(b) Argue that the transformation $y_t = \log x_t$ might be useful for stabilizing the variance. Compute $\gamma_x(0)$ and $\gamma_y(0)$ over two time
intervals for each series to determine whether this is reasonable. Plot the histograms of the raw and transformed series.
(c) Plot the autocorrelation of the series $y_t$ and argue that a first difference produces a reasonably stationary series. Can you think of a practical interpretation for $u_t = y_t - y_{t-1} = \log x_t - \log x_{t-1}$?
(d) Compute the autocorrelation function of the differenced transformed series and argue that a generalization of the model given by Example 1.1 might be reasonable. Assume that $u_t = w_t - \theta w_{t-1}$ is stationary when the inputs $w_t$ are assumed independent with $E(w_t) = 0$ and $E(w_t^2) = \sigma_w^2$. Using the sample ACF and the printed autocovariance $\widehat\gamma_u(0)$, derive estimators for $\theta$ and $\sigma_w^2$.

6. Two time series representing average wholesale U.S. gasoline and oil prices over 180 months, beginning in July 1973 and ending in December 1987, are given in the file oil-gas.dat. Analyze the data using some of the techniques in this chapter with the idea that one should be looking at how changes in oil prices influence
Figure 2.21: Gas and oil series for Problem 6.
changes in gas prices. For further reading, see Liu (1991). In
particular, consider the following options:
(a) Plot the raw data and look at the autocorrelation functions to argue that the untransformed data series are nonstationary.
(b) It is often argued in economics that price changes are important, in particular the percentage change in prices from one month to the next. On this basis, argue that a transformation of the form $y_t = \log x_t - \log x_{t-1}$ might be applied to the data, where $x_t$ is the oil or gas price series.
(c) Use lagged multiple scatterplots and the auto and cross correlation functions of the transformed oil and gas price series to investigate the properties of these series. Is it possible to guess whether gas prices are raised more quickly in response to increasing oil prices than they are decreased when oil prices are decreased? Do you think that it might be possible to predict log percentage changes in gas prices from log percentage changes in oil prices? Plot the two series on the same scale.

7. Monthly Handgun Sales and Firearms Related Deaths in California. Legal handgun purchase information for 227 months spanning the time period February 1, 1980 through December 31, 1998 was obtained from the Department of Justice's automated firearms systems database. California resident firearms death data were obtained from the California Department of Health Services. The data are plotted in the figure, with both rates given in numbers per 100,000 residents. Suppose that the main question of interest for these data pertains to the possible relations between handgun sales and death rates over this time period. Include the possibility of lagging relations in your analysis. In particular, answer the questions below:
Figure 2.22: Handgun sales (per 10,000,000) in California and monthly gun death rate (per 100,000) in California (February 1, 1980 - December 31, 1998).
(a) Use scatterplots to argue that there is a potential nonlinear relation between death rates and handgun sales and indicate whether you think that there might be a lag.
(b) Bolster your argument for a lagging relationship by examining the cross correlation function. What do the autocorrelation functions indicate for these data?
(c) Examine the first difference for the two processes and indicate what the ACFs and CCFs show for the differenced data.
(d) Smooth the two series with a 12-point moving average and plot the two series on the same graph. Subtract the moving average from the original unsmoothed series. What do the residual series show in the ACF and CCF for this case?

2.6 Computer Code
The following R commands are used for making the graphs in this chapter.
# 3-28-2006
graphics.off()             # clean the previous graph on the screen
############################################################ # This is Southern Oscillation Index data and Recruits data ############################################################
y<-read.table("c:\\teaching\\time series\\data\\soi.dat",hea # read data file x<-read.table("c:\\teaching\\time series\\data\\recruit.dat" y=y[,1] x=x[,1]
postscript(file="c:\\teaching\\time series\\figs\\fig-1.1.ep horizontal=F,width=6,height=6) par(mfrow=c(1,2),mex=0.4) # save the graph as a postscript file ts.plot(y,type="l",lty=1,ylab="",xlab="") # make a time series plot title(main="Southern Oscillation Index",cex=0.5) # set up the title of the plot abline(0,0) # make a straight line ts.plot(x,type="l",lty=1,ylab="",xlab="") abline(mean(x),0) title(main="Recruit",cex=0.5) dev.off() z=arima.sim(n=200,list(ma=c(0.9)))
# simulate an MA(1) model
postscript(file="c:\\teaching\\time series\\figs\\fig-1.2.ep horizontal=F,width=6,height=6) ts.plot(z,type="l",lty=1,ylab="",xlab="") title(main="Simulated MA(1)",cex=0.5) abline(0,0) dev.off() n=length(y) n2=n-12 yma=rep(0,n2) for(i in 1:n2){yma[i]=mean(y[i:(i+12)])} yy=y[7:(n2+6)] yy0=yy-yma
# compute th
postscript(file="c:\\teaching\\time series\\figs\\fig-1.9.ep horizontal=F,width=6,height=6) par(mfrow=c(1,2),mex=0.4) ts.plot(yy,type="l",lty=1,ylab="",xlab="") points(1:n2,yma,type="l",lty=1,lwd=3,col=2) ts.plot(yy0,type="l",lty=1,ylab="",xlab="") points(1:n2,yma,lty=1,lwd=3,col=2) # make a point plot abline(0,0) dev.off() m=17 n1=n-m y.soi=rep(0,n1*m) dim(y.soi)=c(n1,m) y.rec=y.soi for(i in 1:m){
y.soi[,i]=y[i:(n1+i-1)]
y.rec[,i]=x[i:(n1+i-1)]}
text_soi=c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16")
postscript(file="c:\\teaching\\time series\\figs\\fig-1.15.e horizontal=F,width=6,height=6) par(mfrow=c(4,4),mex=0.4) for(i in 2:17){ plot(y.soi[,1],y.soi[,i],type="p",pch="o",ylab="",xlab="", ylim=c(-1,1),xlim=c(-1,1)) text(0.8,-0.8,text_soi[i-1],cex=2)} dev.off()
text1=c("ACF of SOI Index") text2=c("ACF of Recruits") text3=c("CCF of SOI and Recruits") SOI=y Recruits=x postscript(file="c:\\teaching\\time series\\figs\\fig-1.16.e horizontal=F,width=6,height=6) par(mfrow=c(2,2),mex=0.4) acf(y,ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="") # make an ACF plot legend(10,0.8, text1) # set up the lege acf(x,ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="") legend(10,0.8,text2) ccf(y,x, ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="") legend(-40,0.8,text3) dev.off()
postscript(file="c:\\teaching\\time series\\figs\\fig-1.17.e horizontal=F,width=6,height=6) par(mfrow=c(4,4),mex=0.4) for(i in 1:16){ plot(y.soi[,i],y.rec[,1],type="p",pch="o",ylab="",xlab="", ylim=c(0,100),xlim=c(-1,1)) text(-0.8,10,text_soi[i],cex=2)} dev.off()
postscript(file="c:\\teaching\\time series\\figs\\fig-1.18.e horizontal=F,width=6,height=6) par(mfrow=c(4,4),mex=0.4) for(i in 1:16){ plot(y.soi[,1],y.rec[,i],type="p",pch="o",ylab="",xlab="", ylim=c(0,100),xlim=c(-1,1)) text(-0.8,10,text_soi[i],cex=2)} dev.off()
postscript(file="c:\\teaching\\time series\\figs\\fig-1.19.e horizontal=F,width=6,height=6) par(mfrow=c(1,2),mex=0.4) pacf(y,ylab="",xlab="",lag=30,ylim=c(-0.5,1),main="") text(10,0.9,"PACF of SOI") pacf(x,ylab="",xlab="",lag=30,ylim=c(-0.5,1),main="") text(10,0.9,"PACF of Recruits") dev.off() ############################################################
############################################################
# This is global temperature data #################################
y1<-matrix(scan("c:\\teaching\\time series\\data\\ngtemp.dat byrow=T,ncol=1) a<-1:12 a=a/12 y=y1[,1] n=length(y) x<-rep(0,n) for(i in 1:149){ x[((i-1)*12+1):(12*i)]<-1856+i-1+a } x[n-1]<-2005+1/12 x[n]=2005+2/13 ######################### # Nonparametric Fitting # ######################### ######################################################### # Define the Epanechnikov kernel function local estimator
kernel<-function(x){0.75*(1-x^2)*(abs(x)<=1)} ############################################################
# Define the function for computing the local linear estima local<-function(y,x,z,h){ # parameters: y=response, x=design matrix; h=bandwidth; z=
nz<-length(z) ny<-length(y) beta<-rep(0,nz*2) dim(beta)<-c(nz,2) for(k in 1:nz){ x0=x-z[k] w0<-sqrt(kernel(x0/h)) beta[k,]<-glm(y~x0,weight=w0)$coeff } return(beta) } ############################################################ z=x h=12 # take a bandwidth fit=local(y,x,z,h) # fit model y=m(x) + e mhat=fit[,1] # obtain the nonparametric estimate resid1=y-(-9.037+0.0046*x) resid2=y-mhat
postscript(file="c:\\teaching\\time series\\figs\\fig-1.4.ep horizontal=F,width=6,height=6) matplot(x,y,type="p",pch="o",ylab="",xlab="",cex=0.5) # make multiple plots points(z,mhat,type="l",lty=1,lwd=3,col=2) abline(-9.037,0.0046,lty=1,lwd=5,col=3) # make a straight line with an intercept and slope title(main="Original Data with Linear and Nonlinear Trend",c dev.off()
postscript(file="c:\\teaching\\time series\\figs\\fig-1.5.ep horizontal=F,width=6,height=6) par(mfrow=c(1,2),mex=0.4) matplot(x,resid1,type="l",lty=1,ylab="",xlab="",cex=0.5) abline(0,0) title(main="Detrended: Linear",cex=0.5) matplot(x,resid2,type="l",lty=1,ylab="",xlab="",cex=0.5) abline(0,0) title(main="Detrended: Nonlinear",cex=0.5) dev.off()
y_diff=diff(y) postscript(file="c:\\teaching\\time series\\figs\\fig-1.6.ep horizontal=F,width=6,height=6) plot(x[-1],y_diff,type="l",lty=1,ylab="",xlab="",cex=0.5) abline(0,0) title(main="Differenced Time Series",cex=0.5) dev.off() ############################################################ # This is China data ###################################
data<-read.table("c:/teaching/stat3150/data/data1.txt",heade # read data from a file containing 6 columns of data y<-data[,1:5] # put the first 5 columns of data into y x<-data[,6] text1<-c("agriculture","commerce","consumption","industry"," # set the text for legend in a graph
postscript(file="c:\\teaching\\time series\\figs\\fig-1.3.ep horizontal=F,width=6,height=6) matplot(x,log(y),type="l",lty=1:5,ylab="",xlab="") legend(1960,8,text1,lty=1:5,col=1:5) dev.off()
############################################################ # This is motor cycles data ###################################
data<-read.table("c:/teaching/stat3150/data/data7.txt",heade # read data from a file containing 6 columns of data y<-data[,1] x<-data[,2]-1900 y_diff1=diff(y) y_diff2=diff(y_diff1)
postscript(file="c:\\teaching\\time series\\figs\\fig-1.7.ep horizontal=F,width=6,height=6) par(mfrow=c(2,2),mex=0.4) matplot(x,y,type="l",lty=1,ylab="",xlab="") text(60,250,"Data") ts.plot(y_diff1,type="l",lty=1,ylab="",xlab="") text(20,40,"First difference") abline(0,0) ts.plot(y_diff2,type="l",lty=1,ylab="",xlab="") text(20,25,"Second order difference") abline(0,0) dev.off()
############################################################ # This is Johnson and Johnson data ###################################
y<-matrix(scan("c:\\teaching\\time series\\data\\jj.dat"),by n=length(y) y_log=log(y) # log of data postscript(file="c:\\teaching\\time series\\figs\\fig-1.8.ep horizontal=F,width=6,height=6) par(mfrow=c(1,2)) ts.plot(y,type="l",lty=1,ylab="",xlab="") title(main="J&J Earnings",cex=0.5) ts.plot(y_log,type="l",lty=1,ylab="",xlab="") title(main="transformed log(earnings)",cex=0.5) dev.off()
############################################################ # This is retail sales data ################################### y=matrix(scan("c:\\res\\0published\\cai-chen\\retail\\retail byrow=T,ncol=1) postscript(file="c:\\teaching\\time series\\figs\\fig-1.10.e horizontal=F,width=6,height=6) ts.plot(y,type="l",lty=1,ylab="",xlab="") dev.off()
############################################################ # This is marketing data ################################### text_tv=c("television")
text_radio=c("radio") data<-read.table("c:/teaching/stat3150/data/data4.txt",heade TV=log(data[,1]) RADIO=log(data[,2]) postscript(file="c:\\teaching\\time series\\figs\\fig-1.11.e horizontal=F,width=6,height=6) ts.plot(cbind(TV,RADIO),type="l",lty=c(1,2),col=c(1,2),ylab= text(20,10.5,text_tv) text(165,8,text_radio) dev.off()
############################################################ # This is Argentina data ################################### text_ar=c("difference", "inflation") y<-read.table("c:/teaching/stat3150/data/data8.txt",header=T y=y[,1] n=length(y) y_t=diff(log(y)) f_t=diff(y)/y[1:(n-1)] x=seq(70.25,by=0.25,89.75) postscript(file="c:\\teaching\\time series\\figs\\fig-1.12.e horizontal=F,width=6,height=6) matplot(x,cbind(y_t,f_t),type="l",lty=c(1,2),col=c(1,2),ylab legend(72,5,text_ar,lty=c(1,2),col=c(1,2)) dev.off()
############################################################ # This is exchange rate data ###################################
x<-matrix(scan(file="c:\\res\\cai-xu\\jpy\\jpy.dat"),byrow=T n<-length(x) nweek<-(n-7)/5 week1<-rep(0,n) # Dates for week week1[1:4]<-2:5 for(j in 1:nweek){ i1<-4+(j-1)*5+1 i2<-4+j*5 week1[i1:i2]<-c(1,2,3,4,5) } i2<-(nweek+1)*5 week1[i2:n]<-1:3 y<-x[week1==3] # Wednesday x1<-x[week1==4] # Thursday x1<-append(x1,0) x1<-(1-(y>0))*x1 # Take value from Thursday if ND on Wed x1<-y+x1 # Wednesday + Thursday n<-length(x1)
x<-100*(log(x1[2:n])-log(x1[1:(n-1)])) # log return postscript(file="c:\\teaching\\time series\\figs\\fig-1.13.e horizontal=F,width=6,height=6) ts.plot(x,type="l",ylab="",xlab="") abline(0,0) dev.off()
############################################################ # This is unemployment data ################################### text_unemploy=c("unadjusted", "seasonally adjusted")
data<-read.table("c:/teaching/stat3150/data/data10.txt",head y1=data[,1] y2=data[,2] n=length(y1) x=seq(62.25,by=0.25,92) postscript(file="c:\\teaching\\time series\\figs\\fig-1.14.e horizontal=F,width=6,height=6) par(mfrow=c(1,2),mex=0.4) matplot(x,cbind(y1,y2),type="l",lty=c(1,2),col=c(1,2),ylab=" legend(66,10,text_unemploy,lty=c(1,2),col=c(1,2)) plot(y2[1:(n-1)],y2[2:n],type="l",lty=c(1),col=c(1),ylab="", dev.off()
############################################################ # This is varve data ##################### x<-matrix(scan("c:\\teaching\\time series\\data\\mass2.dat") postscript(file="c:\\teaching\\time series\\figs\\fig-1.20.e horizontal=F,width=6,height=6) ts.plot(x,type="l",lty=1,ylab="varve thickness",xlab="year") title(main="Varve thickness from Massachusetts (n=634)",cex= dev.off()
############################################################ # This is oil-gas data ####################### data<-matrix(scan("c:\\teaching\\time series\\data\\gas-oil. byrow=T,ncol=2) text4=c("GAS","OIL") postscript(file="c:\\teaching\\time series\\figs\\fig-1.21.e
horizontal=F,width=7,height=7) ts.plot(data,type="l",lty=c(1,2),col=c(1,2),ylab="price",xla title(main="Gas and oil prices (n=180 months)",cex=0.5) legend(20,700,text4,lty=c(1,2),col=c(1,2)) dev.off()
############################################################ # This is handgun data ##################### y<-matrix(scan("c:\\teaching\\time series\\data\\guns.dat"), sales=y[,1] y=cbind(y[,1]/100,y[,2]) text5=c("Handgun sales/100 per 100,000") text6=c("Gun death rate per 100,000") postscript(file="c:\\teaching\\time series\\figs\\fig-1.22.e horizontal=F,width=7,height=7) par(mex=0.4) ts.plot(y,type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="month title(main="Gun sales and gun death rate",cex=0.5) legend(20,2,lty=1,col=1,text5) legend(20,0.8,lty=2,col=2,text6) dev.off() ############################################################
2.7 References
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307-327.
Burman, P. and R.H. Shumway (1998). Semiparametric modeling of seasonal time series. Journal of Time Series Analysis, 19, 127-145.
Cai, Z. (2002). A two-stage approach to additive time series models. Statistica Neerlandica, 56, 415-433.
Cai, Z. (2006). Trending time varying coefficient time series models with serially correlated errors. Forthcoming in Journal of Econometrics.
Cai, Z. and R. Chen (2006). Flexible seasonal time series models. Advances in Econometrics, 20B, 63-87.
Engle, R.F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50, 987-1007.
Fan, J. and Q. Yao (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.
Franses, P.H. (1996). Periodicity and Stochastic Trends in Economic Time Series. New York: Cambridge University Press.
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. New York: Cambridge University Press.
Franses, P.H. and D. van Dijk (2000). Nonlinear Time Series Models for Empirical Finance. New York: Cambridge University Press.
Ghysels, E. and D.R. Osborn (2001). The Econometric Analysis of Seasonal Time Series. New York: Cambridge University Press.
Granger, C.W.J. and T. Teräsvirta (1993). Modeling Nonlinear Economic Relationships. Oxford, U.K.: Oxford University Press.
Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press, NJ.
Liu, L.M. (1991). Dynamic relationship analysis of U.S. gasoline and crude oil prices. Journal of Forecasting, 10, 521-547.
Sercu, P. and R. Uppal (2000). Exchange Rate Volatility, Trade, and Capital Flows under Alternative Rate Regimes. Cambridge: Cambridge University Press.
Shumway, R.H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs, NJ: Prentice-Hall.
Shumway, R.H. and D.S. Stoffer (2000). Time Series Analysis & Its Applications. New York: Springer-Verlag.
Shumway, R.H. and K.L. Verosub (1992). State space modeling of paleoclimatic time series. Proceedings of the 5th International Meeting on Statistical Climatology, Toronto, 22-26 June, 1992.
Taylor, S. (2005). Asset Price Dynamics, Volatility, and Prediction. Princeton University Press, Princeton, NJ.
Tong, H. (1990). Nonlinear Time Series: A Dynamical System Approach. Oxford University Press, Oxford.
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons, New York.
Chapter 3 Univariate Time Series Models

3.1 Introduction
The organization of this chapter is patterned after the landmark approach to developing models for time series data pioneered by Box and Jenkins (see Box et al., 1994). This assumes that there will be a representation of time series data in terms of a difference equation that relates the current value to its past. Such models should be flexible enough to include non-stationary realizations like the random walk given above and seasonal behavior, where the current value is related to past values at multiples of an underlying season; a common one might be multiples of 12 months (1 year) for monthly data. The models are constructed from difference equations driven by random input shocks and are labelled in the most general formulation as ARMA (autoregressive moving average) models or, more generally, as ARIMA (autoregressive integrated moving average) processes. The analogies with differential equations, which model many physical processes, are obvious. For clarity, we develop the separate components of the model sequentially, considering the integrated, autoregressive, and moving average parts in order, followed by the seasonal modification. The Box-Jenkins approach suggests three steps in a procedure that they sum-
marize as identification, estimation, and forecasting. Identification uses model selection techniques, combining the ACF and PACF as diagnostics with versions of the Akaike Information Criterion (AIC) type model selection criteria given below to find a parsimonious (simple) model for the data. Estimation of parameters in the model will be the next step. Statistical techniques based on maximum likelihood and least squares are paramount for this stage and will only be sketched in this course; hopefully, we can discuss them at greater length if time permits. Finally, forecasting of time series based on the estimated parameters, with sensible estimates of uncertainty, is the bottom line for any assumed model.

Correlation and Autocorrelation
The correlation coefficient between two random variables $x_t$ and $y_t$ is defined as $\rho_{xy}(0)$, which is the special case of the cross correlation coefficient $\rho_{xy}(h)$ defined in Chapter 2. The correlation coefficient between $x_t$ and $x_{t+h}$ is called the lag-$h$ autocorrelation of $x_t$ and is commonly denoted by $\rho_x(h)$, which is defined under the weak stationarity assumption; the definition of $\rho_x(h)$ is given in Chapter 2. The sample version of $\rho_x(h)$ is given by $\widehat\rho_x(h) = \widehat\gamma_x(h)/\widehat\gamma_x(0)$, where, for given data $\{x_t\}_{t=1}^{n}$,
$$\widehat\gamma_x(h) = \frac{1}{n-h}\sum_{t=1}^{n-h}(x_{t+h} - \bar{x})(x_t - \bar{x}) \quad\text{with}\quad \bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t.$$
Under some general conditions, $\widehat\rho_x(1)$ is a consistent estimate of $\rho_x(1)$. For example, if $\{x_t\}$ is an independent and identically distributed (iid) sequence and $E(x_t^2) < \infty$, then $\widehat\rho_x(1)$ is asymptotically normal with mean zero and variance $1/n$; see Brockwell and Davis (1991, Theorem 7.2.2). This result can be used in practice to test the null hypothesis $H_0: \rho_x(1) = 0$ versus the alternative hypothesis
$H_a: \rho_x(1) \neq 0$. The test statistic is the usual $t$-ratio, which is $\sqrt{n}\,\widehat\rho_x(1)$ and follows asymptotically the standard normal distribution. In general, for the lag-$h$ sample autocorrelation of $x_t$, if $\{x_t\}$ is an iid sequence satisfying $E(x_t^2) < \infty$, then $\widehat\rho_x(h)$ is asymptotically normal with mean zero and variance $1/n$ for any fixed positive integer $h$. For more information about the asymptotic distribution of sample autocorrelations, see Brockwell and Davis (1991, Chapter 7). In finite samples, $\widehat\rho_x(h)$ is a biased estimator of $\rho_x(h)$. The bias is of order $1/n$, which can be substantial when the sample size $n$ is small. In most economic and financial applications, $n$ is relatively large so that the bias is not serious.
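As a quick illustration, the following R sketch computes the lag-1 sample autocorrelation and the corresponding $t$-ratio; a simulated iid series is used here purely as a stand-in for real data.

# Lag-1 sample autocorrelation and the t-ratio test of H0: rho_x(1) = 0.
set.seed(123)
x <- rnorm(300)                               # stand-in iid series
n <- length(x)
rho1 <- acf(x, lag.max=1, plot=FALSE)$acf[2]  # element 1 is lag 0, element 2 is lag 1
tratio <- sqrt(n) * rho1                      # approximately N(0,1) under H0
c(rho1=rho1, tratio=tratio, pvalue=2*pnorm(-abs(tratio)))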
Portmanteau Test

Economic and financial applications often require testing jointly that several autocorrelations of $x_t$ are zero. Box and Pierce (1970) proposed the Portmanteau statistic
$$Q^*(m) = n \sum_{h=1}^{m} \widehat\rho_x^2(h)$$
as a test statistic for the null hypothesis $H_0: \rho_x(1) = \cdots = \rho_x(m) = 0$ against the alternative hypothesis $H_a: \rho_x(i) \neq 0$ for some $i \in \{1, \ldots, m\}$. Under the assumption that $\{x_t\}$ is an iid sequence with certain moment conditions, $Q^*(m)$ is asymptotically a chi-squared random variable with $m$ degrees of freedom. Ljung and Box (1978) modified the $Q^*(m)$ statistic as below to increase the power of the test in finite samples:
$$Q(m) = n(n+2) \sum_{h=1}^{m} \widehat\rho_x^2(h)/(n-h).$$
In practice, the selection of $m$ may affect the performance of the $Q(m)$ statistic. Several values of $m$ are often used. Simulation stud-
ies suggest that the choice of $m \approx \log(n)$ provides better power performance. The function $\widehat\rho_x(h)$ is called the sample autocorrelation function (ACF) of $x_t$. It plays an important role in linear time series analysis. As a matter of fact, a linear time series model can be characterized
Figure 3.1: Autocorrelation functions (ACF) for simple (left) and log (right) returns for IBM (top panels) and for the value-weighted index of US market (bottom panels), January 1926 to December 1997.
by its ACF, and linear time series modeling makes use of the sample ACF to capture the linear dynamic of the data. The top panels of Figure 3.1 show the sample autocorrelation functions of monthly simple (left top panel) and log (right top panel) returns of IBM stock from January 1926 to December 1997. The two sample ACFs are very close to each other, and they suggest that the serial correlations of monthly IBM stock returns are very small, if any. The sample ACFs are all within their two standard-error limits, indicating that they are not significant at the 5% level. In addition, for the simple returns,
the Ljung-Box statistics give $Q(5) = 5.4$ and $Q(10) = 14.1$, which correspond to p-values of 0.37 and 0.17, respectively, based on chi-squared distributions with 5 and 10 degrees of freedom. For the log returns, we have $Q(5) = 5.8$ and $Q(10) = 13.7$ with p-values 0.33 and 0.19, respectively. The joint tests confirm that monthly IBM stock returns have no significant serial correlations. The bottom panels of Figure 3.1 show the same for the monthly returns (simple in the left panel and log in the right panel) of the value-weighted index from the Center for Research in Security Prices (CRSP), University of Chicago. There are some significant serial correlations at the 5% level for both return series. The Ljung-Box statistics give $Q(5) = 27.8$ and $Q(10) = 36.0$ for the simple returns and $Q(5) = 26.9$ and $Q(10) = 32.7$ for the log returns. The p-values of these four test statistics are all less than 0.0003, suggesting that monthly returns of the value-weighted index are serially correlated. Thus, the monthly market index return seems to have stronger serial dependence than individual stock returns. In the finance literature, a version of the Capital Asset Pricing Model (CAPM) theory is that the return $\{x_t\}$ of an asset is not predictable and should have no autocorrelations. Testing for zero autocorrelations has been used as a tool to check the efficient market assumption. However, the way by which stock prices are determined and index returns are calculated might introduce autocorrelations in the observed return series. This is particularly so in the analysis of high-frequency financial data. Before we discuss univariate and multivariate time series methods, we first review multiple regression models and model selection methods for both iid and time series data.
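In R, these joint tests can be reproduced with the built-in Box.test() function. The sketch below assumes the monthly return series has already been loaded into a numeric vector ret (the IBM and CRSP series themselves are not distributed with these notes).

# Ljung-Box statistic Q(m) for several choices of m; ret is the return series.
for (m in c(5, 10)) {
  print(Box.test(ret, lag=m, type="Ljung-Box"))
}
# Box.test(ret, lag=m, type="Box-Pierce") gives the original Q*(m) statistic.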
3.2 Least Squares Regression
We begin our discussion of univariate and multivariate time series methods by considering the idea of a simple regression model, which we have met before in other contexts such as a statistics or econometrics course. All of the multivariate methods follow, in some sense, from the ideas involved in simple univariate linear regression. In this case, we assume that there is some collection of fixed known functions of time, say $z_{t1}, z_{t2}, \ldots, z_{tq}$, that are influencing our output $y_t$, which we know to be random. We express this relation between the inputs and outputs as
$$y_t = \beta_1 z_{t1} + \beta_2 z_{t2} + \cdots + \beta_q z_{tq} + e_t \tag{3.1}$$
at the time points $t = 1, 2, \ldots, n$, where $\beta_1, \ldots, \beta_q$ are unknown fixed regression coefficients and $e_t$ is a random error or noise, assumed to be white noise; this means that the observations have zero means, equal variances $\sigma^2$, and are independent. We traditionally assume also that the white noise series, $e_t$, is Gaussian or normally distributed.

Example 2.1: We have assumed implicitly that the model $y_t = \beta_1 + \beta_2 t + e_t$ is reasonable in our discussion of detrending in Example 1.2 of Chapter 2. Figure 2.4 shows the monthly average global temperature series, and it is plausible that a straight line is a reasonable model. This is in the form of the regression model (3.1) when one makes the identification $z_{t1} = 1$ and $z_{t2} = t$. The problem in detrending is to estimate the coefficients $\beta_1$ and $\beta_2$ in the above equation and detrend by constructing the estimated residual series $e_t$, which is shown in the top panel of Figure 2.4. As indicated in the example, estimates for $\beta_1$ and $\beta_2$ can be taken as $\widehat\beta_1 = -9.037$ and $\widehat\beta_2 = 0.0046$, respectively.
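A minimal R sketch of this detrending regression is given below; it assumes the global temperature series and the corresponding time points (in years) have been read into the vectors y and x, as in the code of Section 2.6.

# Fit the straight-line trend y_t = beta1 + beta2*t + e_t and detrend.
fit <- lm(y ~ x)              # x = time in years, y = temperature series
summary(fit)$coefficients     # estimates, standard errors, t values
resid.lin <- residuals(fit)   # detrended (residual) series
plot(x, resid.lin, type="l"); abline(0, 0)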
The linear regression model described by Equation (3.1) can be conveniently written in slightly more general matrix notation by defining the column vectors $\mathbf{z}_t = (z_{t1}, \ldots, z_{tq})'$ and $\boldsymbol\beta = (\beta_1, \ldots, \beta_q)'$, so that we write (3.1) in the alternate form
$$y_t = \boldsymbol\beta'\mathbf{z}_t + e_t. \tag{3.2}$$
To find estimators for $\boldsymbol\beta$ and $\sigma^2$, it is natural to determine the coefficient vector $\boldsymbol\beta$ minimizing $\sum e_t^2$ with respect to $\boldsymbol\beta$. This yields the least squares or maximum likelihood estimator $\widehat{\boldsymbol\beta}$ and the maximum likelihood estimator for $\sigma^2$, which is proportional to the unbiased estimator
$$\widehat\sigma^2 = \frac{1}{n-q}\sum_{t=1}^{n}\left(y_t - \widehat{\boldsymbol\beta}'\mathbf{z}_t\right)^2. \tag{3.3}$$
An alternate way of writing the model (3.2) is as
$$\mathbf{y} = Z\boldsymbol\beta + \mathbf{e}, \tag{3.4}$$
where $Z' = (\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n)$ is a $q \times n$ matrix composed of the values of the input variables at the observed time points and $\mathbf{y} = (y_1, y_2, \ldots, y_n)'$ is the vector of observed outputs, with the errors stacked in the vector $\mathbf{e} = (e_1, e_2, \ldots, e_n)'$. The ordinary least squares estimator $\widehat{\boldsymbol\beta}$ is the solution to the normal equations $Z'Z\boldsymbol\beta = Z'\mathbf{y}$. You need not be concerned as to how the above equation is solved in practice, as all computer packages have efficient software for inverting the $q \times q$ matrix $Z'Z$ to obtain
$$\widehat{\boldsymbol\beta} = (Z'Z)^{-1}Z'\mathbf{y}. \tag{3.5}$$
An important quantity that all software produces is a measure of uncertainty for the estimated regression coefficients, say
$$\mathrm{Cov}(\widehat{\boldsymbol\beta}) = \sigma^2 (Z'Z)^{-1} \equiv \sigma^2 C \equiv \sigma^2 (c_{ij}). \tag{3.6}$$
Then $\mathrm{Cov}(\widehat\beta_i, \widehat\beta_j) = \sigma^2 c_{ij}$, and a $100(1-\alpha)\%$ confidence interval for $\beta_i$ is
$$\widehat\beta_i \pm t_{n-q}(\alpha/2)\,\widehat\sigma\sqrt{c_{ii}}, \tag{3.7}$$
where $t_{df}(\alpha/2)$ denotes the upper $100(\alpha/2)\%$ point on a $t$ distribution with $df$ degrees of freedom.
Example 2.1: Consider estimating the possible global warming trend alluded to in Section 2.2.1. The global temperature series, shown previously in Figure 2.4, suggests the possibility of a gradually increasing average temperature over the 149-year period covered by the land-based series. If we fit the model in Example 2.1, replacing $t$ by $t/100$ to convert to a 100-year base so that the increase will be in degrees per 100 years, we obtain $\widehat\beta_1 = -9.037$ and $\widehat\beta_2 = 0.4607$ using (3.5). The error variance, from (3.3), is 0.0337, with $q = 2$ and $n = 1790$. Then (3.6) yields
$$\mathrm{Cov}(\widehat\beta_1, \widehat\beta_2) = \begin{pmatrix} 0.0379 & -0.0020 \\ -0.0020 & 0.0001 \end{pmatrix},$$
leading to an estimated standard error of about 0.01008 for the slope. The value of $t$ with $n - q = 1790 - 2 = 1788$ degrees of freedom for $\alpha = 0.025$ is about 1.96, leading to a narrow confidence interval of $0.4607 \pm 0.0198$ for the slope, and hence to a confidence interval on the one-hundred-year increase of about 0.4409 to 0.4805 degrees. We would conclude from this analysis that there is a substantial increase in global temperature amounting to an increase of roughly one degree F per 100 years.
If the model is reasonable, the residuals $\widehat e_t = y_t - \widehat\beta_1 - \widehat\beta_2 t$ should be essentially independent and identically distributed with no correlation evident. The plot that we have made in Figure 2.5 (the top panel) of the detrended global temperature series shows that this is
Figure 3.2: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended (top panel) and differenced (bottom panel) global temperature series.
probably not the case, because of the long low-frequency swings in the observed residuals. However, the differenced series, also shown in Figure 2.6, appears to be more independent, suggesting that perhaps the apparent global warming is more consistent with a long-term swing in an underlying random walk than it is with a fixed 100-year trend. If we check the autocorrelation function of the regression residuals, shown here in Figure 3.2, it is clear that the significant values at higher lags imply that there is significant correlation in the residuals. Such correlation can be important, since the estimated standard errors of the coefficients under the assumption that the least squares residuals are uncorrelated are often too small. We can partially repair the damage caused by the correlated residuals by looking at a model with correlated errors. The procedure and techniques for dealing with correlated errors are based on the autoregressive moving average models to be considered in the next sections. Another method of reducing correlation is to apply a first difference $\Delta x_t = x_t - x_{t-1}$ to the global
trend data. The ACF of the differenced series, also shown in Figure 3.2, seems to have lower correlations at the higher lags. Figure 2.6 shows qualitatively that this transformation also eliminates the trend in the original series. Since we have again made some rather arbitrary-looking specifications for the configuration of dependent variables in the above regression examples, the reader may wonder how to select among various plausible models. We mention that two criteria which reward reducing the squared error and penalize additional parameters are the Akaike Information Criterion (AIC) and the Schwarz Information Criterion (SIC) (Schwarz, 1978), with the common form
$$\log(\widehat\sigma^2) + C(K)/n, \tag{3.8}$$
where $K$ is the number of parameters fitted (exclusive of variance parameters), $\widehat\sigma^2$ is the maximum likelihood estimator for the variance, and $C(K) = 2K$ for AIC and $C(K) = K\log(n)$ for SIC. SIC is sometimes termed the Bayesian Information Criterion (BIC) and will often yield models with fewer parameters than the other selection methods. A modification to AIC that is particularly well suited for small samples was suggested by Hurvich and Tsai (1989). This is the corrected AIC, called AICC, given by $\log(\widehat\sigma^2) + (n+K)/(n-K-2)$. The rule for all three measures above is to choose the value of $K$ leading to the smallest value of AIC or SIC or AICC. We will give an example later comparing the above simple least squares model with a model where the errors have a time series correlation structure. A summary of model selection methods is given in the next section. Note that all of these methods are general purpose.
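For linear models fitted in R, these criteria can be obtained directly with AIC() and BIC(); the sketch below compares the straight-line trend model with a quadratic trend, again assuming y and x are the temperature series and time points from Section 2.6. R's AIC() and BIC() are computed from the full Gaussian log-likelihood, so their values are on a different scale from (3.8), but they lead to the same model rankings.

# Compare a linear and a quadratic trend by AIC and BIC (SIC).
fit1 <- lm(y ~ x)             # linear trend
fit2 <- lm(y ~ x + I(x^2))    # quadratic trend
AIC(fit1); AIC(fit2)          # smaller is better
BIC(fit1); BIC(fit2)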
3.3 Model Selection Methods
Given a possibly large set of potential predictors, which ones do we include in our model? Suppose $X_1, X_2, \ldots$ is a pool of potential predictors. The model with all predictors, $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \varepsilon$, is the most general model. It holds even if some of the individual $\beta_j$'s are zero. But if some $\beta_j$'s are zero or close to zero, it is better to omit those $X_j$'s from the model. There are two main reasons why you should omit variables whose coefficients are close to zero:
(a) Parsimony principle: Given two models that perform equally well in terms of prediction, one should choose the model that is more parsimonious (simple).
(b) Prediction principle: The model should give predictions that are as accurate as possible, not just for current observations, but for future observations as well. Including unnecessary predictors can apparently improve prediction for the current data, but can harm prediction for future data.
Note that the sum of squared errors (SSE) never increases as we add more predictors. Therefore, when we build a statistical model, we should follow these principles.

3.3.1 Subset Approaches
The all-possible-regressions procedure calls for considering all possible subsets of the pool of potential predictors and identifying, for detailed examination, a few good subsets according to some criterion. The purpose of the all-possible-regressions approach is to identify
a small group of regression models that are good according to a specified criterion (summary statistic), so that a detailed examination can be made of these models, leading to the selection of the final regression model to be employed. The main problem with this approach is that it is computationally expensive. For example, with $k = 10$ predictors, we need to investigate $2^{10} = 1024$ potential regression models. With the aid of modern computing power, this computation is possible, but examining 1024 possible models carefully would still be an overwhelming task for a data analyst. Different criteria for comparing the regression models may be used with the all-possible-regressions selection procedure. We discuss several summary statistics: (i) $R_p^2$ (or $SSE_p$), (ii) $R_{adj;p}^2$ (or $MSE_p$), (iii) $C_p$, (iv) $PRESS_p$, (v) sequential methods, and (vi) AIC-type criteria.
We shall denote the number of all potential predictors in the pool by $P - 1$. Hence, including an intercept parameter $\beta_0$, we have $P$ potential parameters. The number of predictors in a subset will be denoted by $p - 1$, as always, so that there are $p$ parameters in the regression function for this subset of predictors. Thus we have $1 \le p \le P$.

1. $R_p^2$ (or $SSE_p$): $R_p^2$ indicates that there are $p$ parameters (or $p-1$ predictors) in the regression model. The coefficient of multiple determination $R_p^2$ is defined as $R_p^2 = 1 - SSE_p/SSTO$, where $SSE_p$ is the sum of squared errors of the model including all $p-1$ predictors and $SSTO$ is the sum of squared total variations.
It is well known that $R_p^2$ measures the proportion of the variance of $Y$ explained by the $p-1$ predictors; it always goes up as we add a predictor, and it varies inversely with $SSE_p$ because $SSTO$ is constant for all possible regression models. That is, choosing the model with the largest $R_p^2$ is equivalent to choosing the model with the smallest $SSE_p$.

2. $R_{adj;p}^2$ (or $MSE_p$): One often considers models with a large $R_p^2$ value. However, $R_p^2$ always increases with the number of predictors, hence it cannot be used to compare models of different sizes. The adjusted coefficient of multiple determination $R_{adj;p}^2$ has been suggested as an alternative criterion:
$$R_{adj;p}^2 = 1 - \frac{SSE_p/(n-p)}{SSTO/(n-1)} = 1 - \frac{n-1}{n-p}\,\frac{SSE_p}{SSTO} = 1 - \frac{MSE_p}{SSTO/(n-1)}.$$
It is like $R_p^2$ but with a penalty for adding unnecessary variables. $R_{adj;p}^2$ can go down when a useless predictor is added, and it can even be negative. $R_{adj;p}^2$ varies inversely with $MSE_p$ because $SSTO/(n-1)$ is constant for all possible regression models. That is, choosing the model with the largest $R_{adj;p}^2$ is equivalent to choosing the model with the smallest $MSE_p$. Note that $R_p^2$ is useful when comparing models of the same size, while $R_{adj;p}^2$ (or $C_p$) is used to compare models with different sizes.

3. Mallows $C_p$: The Mallows $C_p$ is concerned with the total mean squared error of the $n$ fitted values for each subset regression model. The mean squared error concept involves the total error in each fitted value:
$$\widehat{Y}_i - \mu_i = \underbrace{\widehat{Y}_i - E(\widehat{Y}_i)}_{\text{random error}} + \underbrace{E(\widehat{Y}_i) - \mu_i}_{\text{bias}},$$
where $\mu_i$ is the true mean response at the $i$th observation. The mean squared error for $\widehat{Y}_i$ is defined as the expected value of the square of the total error in the above. It can be shown that
$$\mathrm{mse}(\widehat{Y}_i) = E\big[(\widehat{Y}_i - \mu_i)^2\big] = \mathrm{Var}(\widehat{Y}_i) + \big[\mathrm{Bias}(\widehat{Y}_i)\big]^2,$$
where $\mathrm{Bias}(\widehat{Y}_i) = E(\widehat{Y}_i) - \mu_i$. The total mean squared error for all $n$ fitted values $\widehat{Y}_i$ is the sum over the observations $i$:
$$\sum_{i=1}^{n} \mathrm{mse}(\widehat{Y}_i) = \sum_{i=1}^{n} \mathrm{Var}(\widehat{Y}_i) + \sum_{i=1}^{n} \big[\mathrm{Bias}(\widehat{Y}_i)\big]^2.$$
It can be shown that
$$\sum_{i=1}^{n} \mathrm{Var}(\widehat{Y}_i) = p\,\sigma^2 \quad\text{and}\quad \sum_{i=1}^{n} \big[\mathrm{Bias}(\widehat{Y}_i)\big]^2 = (n-p)\big[E(S_p^2) - \sigma^2\big],$$
where $S_p^2$ is the MSE from the current model. Using this, we have
$$\sum_{i=1}^{n} \mathrm{mse}(\widehat{Y}_i) = p\,\sigma^2 + (n-p)\big[E(S_p^2) - \sigma^2\big]. \tag{3.9}$$
Dividing (3.9) by $\sigma^2$ makes it scale-free:
$$\frac{1}{\sigma^2}\sum_{i=1}^{n} \mathrm{mse}(\widehat{Y}_i) = p + (n-p)\,\frac{E(S_p^2) - \sigma^2}{\sigma^2}.$$
If the model does not fit well, then $S_p^2$ is a biased estimate of $\sigma^2$. We can estimate $E(S_p^2)$ by $MSE_p$ and estimate $\sigma^2$ by the MSE from the maximal model (the largest model we can consider), i.e., $\widehat\sigma^2 = MSE_{P-1} = MSE(X_1, \ldots, X_{P-1})$. Using these estimators for $E(S_p^2)$ and $\sigma^2$ gives
$$C_p = p + (n-p)\,\frac{MSE_p - MSE(X_1, \ldots, X_{P-1})}{MSE(X_1, \ldots, X_{P-1})} = \frac{SSE_p}{MSE(X_1, \ldots, X_{P-1})} - (n - 2p).$$
Small Cp is a good thing. A small value of Cp indicates that the model is relatively precise (has small variance) in estimating
the true regression coefficients and predicting future responses. This precision will not improve much by adding more predictors. Look for models with small $C_p$. If we have enough predictors in the regression model so that all the significant predictors are included, then $MSE_p \approx MSE(X_1, \ldots, X_{P-1})$ and it follows that $C_p \approx p$. Thus, $C_p$ close to $p$ is evidence that the predictors in the pool of potential predictors $(X_1, \ldots, X_{P-1})$ that are not in the current model are not important. Models with considerable lack of fit have values of $C_p$ larger than $p$. The $C_p$ can be used to compare models with different sizes. If we use all the potential predictors, then $C_p = P$.

4. $PRESS_p$: The PRESS (prediction sum of squares) criterion is defined as
$$PRESS = \sum_{i=1}^{n} \widehat\varepsilon_{(i)}^2,$$
where $\widehat\varepsilon_{(i)}$ is called the PRESS residual for the $i$th observation. The PRESS residual is defined as $\widehat\varepsilon_{(i)} = Y_i - \widehat{Y}_{(i)}$, where $\widehat{Y}_{(i)}$ is the fitted value obtained by leaving out the $i$th observation. Models with small $PRESS_p$ fit well in the sense of having small prediction errors. $PRESS_p$ can be calculated without fitting the model $n$ times, each time deleting one of the $n$ cases. One can show that
$$\widehat\varepsilon_{(i)} = \widehat\varepsilon_i/(1 - h_{ii}),$$
where $h_{ii}$ is the $i$th diagonal element of $H = X(X'X)^{-1}X'$.
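A small R sketch of this shortcut, for a previously fitted linear model object (here called fit, a hypothetical lm() fit), is:

# PRESS computed from the ordinary residuals and the hat (leverage) values,
# without refitting the model n times.
h <- hatvalues(fit)                      # diagonal elements h_ii of H
press.resid <- residuals(fit) / (1 - h)  # epsilon_(i) = epsilon_i / (1 - h_ii)
PRESS <- sum(press.resid^2)
PRESS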
3.3.2 Sequential Methods
1. Forward selection
(a) Start with the null model.
(b) Add the most significant variable if its p-value is less than $p_{enter}$ (equivalently, $F$ is larger than $F_{enter}$).
(c) Continue until no more variables enter the model.

2. Backward elimination
(a) Start with the full model.
(b) Eliminate the least significant variable whose p-value is larger than $p_{remove}$ (equivalently, $F$ is smaller than $F_{remove}$).
(c) Continue until no more variables can be discarded from the model.

3. Stepwise selection
(a) Start with any model.
(b) Check each predictor that is currently in the model. Suppose the current model contains $X_1, \ldots, X_k$. Then the $F$ statistic for $X_i$ is
$$F = \frac{SSE(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_k) - SSE(X_1, \ldots, X_k)}{MSE(X_1, \ldots, X_k)} \sim F(1,\, n-k-1).$$
Eliminate the least significant variable whose p-value is larger than $p_{remove}$ (equivalently, $F$ is smaller than $F_{remove}$).
(c) Continue until no more variables can be discarded from the model.
(d) Add the most significant variable if its p-value is less than $p_{enter}$ (equivalently, $F$ is larger than $F_{enter}$).
(e) Go to step (b).
(f) Repeat until no more predictors can be entered and no more can be discarded.
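R's step() function automates a version of these searches, using AIC rather than F tests as the add/drop criterion. A hedged sketch, assuming a data frame dat with response Y and the candidate predictors as its remaining columns, is:

# Stepwise selection by AIC using step(); direction can be "forward",
# "backward", or "both" (the last is analogous to stepwise selection).
null.fit <- lm(Y ~ 1, data=dat)          # null model
full.fit <- lm(Y ~ ., data=dat)          # full model with all predictors
step(null.fit, scope=formula(full.fit), direction="forward")
step(full.fit, direction="backward")
step(full.fit, direction="both")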
3.3.3 Likelihood-Based Criteria
The basic idea of Akaike's and related approaches can be found in Akaike (1973) and subsequent papers; see the recent book by Burnham and Anderson (2003). Suppose that $f(y)$ is the true model (unknown) giving rise to the data $y$ (a vector of data) and $g(y, \theta)$ is a candidate model with parameter vector $\theta$. We want to find a model $g(y, \theta)$ "close to" $f(y)$. The Kullback-Leibler discrepancy is
$$K(f, g) = E_f\left[\log\frac{f(Y)}{g(Y, \theta)}\right].$$
This is a measure of how "far" model $g$ is from model $f$ (with reference to model $f$). Its properties are $K(f, g) \ge 0$ and $K(f, g) = 0 \iff f = g$.
Of course, we can never know how far our model $g$ is from $f$. But Akaike (1973) showed that we might be able to estimate something almost as good. Suppose we have two models under consideration: $g(y, \theta)$ and $h(y, \phi)$. Akaike (1973) showed that we can estimate $K(f, g) - K(f, h)$. It turns out that the difference of maximized log-likelihoods, corrected for a bias, estimates the difference of K-L distances. The maximized likelihoods are $\widehat{L}_g = g(y, \widehat\theta)$ and $\widehat{L}_h = h(y, \widehat\phi)$, where $\widehat\theta$ and $\widehat\phi$ are the ML estimates of the parameters. Akaike's result: $[\log(\widehat{L}_g) - q] - [\log(\widehat{L}_h) - r]$ is an asymptotically unbiased estimate (i.e., the bias approaches zero as the sample size increases) of $K(f, g) - K(f, h)$. Here $q$ is the number of parameters estimated in $\theta$ (model $g$) and $r$ is
the number of parameters estimated in φ (model h). The price of parameters: the likelihoods in the above expression are penalized by the number of parameters. The AIC for model g is given by
AIC = −2 log(L̂g) + 2 q.
The AIC might not perform well in small samples. To overcome this shortcoming, a bias-corrected version of AIC was proposed by Hurvich and Tsai (1989), defined by
AICC = −2 log(L̂g) + 2 (q + 1) n/(n − q − 2) = AIC + 2 (q + 1)(q + 2)/(n − q − 2).
The AICC lies between the AIC (less penalty) and the BIC (heavier penalty).
Another approach is given by the much older notion of Bayesian statistics. In the Bayesian approach, we assume that a priori uncertainty about the value of model parameters is represented by a prior distribution. Upon observing the data, this prior is updated, yielding a posterior distribution. In order to make inferences about the model (rather than its parameters), we integrate across the posterior distribution. Under the assumption that all models are a priori equally likely (the Bayesian approach requires model priors as well as parameter priors), Bayesian model selection chooses the model with the highest marginal likelihood. The ratio of two marginal likelihoods is called a Bayes factor (BF), which is a widely used method of model selection in Bayesian inference. The two integrals in the Bayes factor are nontrivial to compute unless they form a conjugate family. Monte Carlo methods are usually required to compute the BF, especially for highly parameterized models. A large sample approximation of the BF yields the easily computable BIC
BIC = −2 log(L̂g) + q log n.
In sum, both AIC and BIC, as well as their generalizations, have the common form
LC = −2 log(L̂g) + λ q,
where λ is a fixed constant. Recent developments suggest the use of a data-adaptive penalty to replace the fixed penalty; see Bai, Rao and Wu (1999) and Shen and Ye (2002). That is, λ is estimated from the data in a complexity form based on the concept of generalized degrees of freedom.
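As an illustration, these criteria can be computed in R from the maximized log-likelihood of a fitted model; for an AR(p) model fitted with arima(), AIC() returns the first criterion and the BIC follows from the same log-likelihood. A minimal sketch, assuming the series is stored in a numeric vector x:

n <- length(x)
for (p in 1:5) {                                    # candidate AR orders
  fit <- arima(x, order = c(p, 0, 0))
  q   <- length(coef(fit)) + 1                      # parameters, counting sigma^2
  bic <- -2 * as.numeric(logLik(fit)) + q * log(n)  # BIC = -2 log L + q log n
  cat("p =", p, " AIC =", round(AIC(fit), 2), " BIC =", round(bic, 2), "\n")
}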
3.3.4 Cross-Validation and Generalized Cross-Validation
Cross-validation (CV) is the most commonly used method for model assessment and selection. The main idea is a direct estimate of the extra-sample prediction error. The general K-fold version of CV splits the data into K roughly equal-sized parts, fits the model to the other K − 1 parts, and calculates the prediction error on the remaining part. For leave-one-out CV,
CV = Σ_{i=1}^n (Yi − Ŷ_{−i})²,
where Ŷ_{−i} is the fitted value for the ith observation computed with the ith observation removed. A convenient approximation to CV for linear fitting with squared error loss is generalized cross-validation (GCV). A linear fitting method has the property Ŷ = S Y, where Ŷi is the fitted value based on the whole data set. For many linear fitting methods with leave-one-out (one observation removed at a time), it can be shown easily that
CV = Σ_{i=1}^n (Yi − Ŷ_{−i})² = Σ_{i=1}^n [(Yi − Ŷi)/(1 − Sii)]².
Due to the intensive computation, the CV can be approximated by the GCV, defined by
GCV = Σ_{i=1}^n [(Yi − Ŷi)/(1 − trace(S)/n)]² = Σ_{i=1}^n (Yi − Ŷi)² / (1 − trace(S)/n)².
It has been shown that both the CV and GCV methods are very appealing for nonparametric modeling. Recently, the leave-one-out cross-validation method was challenged by Shao (1993). Shao (1993) claimed that the popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the AIC, the Cp, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations n → ∞. He showed that this inconsistency can be rectified by using leave-nν-out cross-validation, with nν, the number of observations reserved for validation, satisfying nν/n → 1 as n → ∞.
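For ordinary least squares, both quantities above can be obtained from a single fit, since the hat values give the diagonal of S. A minimal R sketch with a hypothetical lm() fit:

fit <- lm(y ~ x1 + x2, data = mydata)   # hypothetical linear fit
e   <- residuals(fit)
h   <- hatvalues(fit)                   # S_ii, the diagonal of the smoother matrix
cv  <- sum((e/(1 - h))^2)               # leave-one-out cross-validation
gcv <- sum((e/(1 - mean(h)))^2)         # GCV: replace S_ii by trace(S)/n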
3.3.5 Penalized Methods
1. Bridge and Ridge: Frank and Friedman (1993) proposed the Lq (q > 0) penalized least squares criterion
Σ_{i=1}^n (Yi − Σ_j βj Xij)² + λ Σ_j |βj|^q,
whose minimizer is called the bridge estimator. If q = 2, the resulting estimator is called the ridge estimator, given in closed form by β̂ = (XᵀX + λ I)⁻¹ XᵀY.
2. LASSO: Tibshirani (1996) proposed the so-called LASSO, which is the minimizer of the constrained least squares criterion
Σ_{i=1}^n (Yi − Σ_j βj Xij)² + λ Σ_j |βj|,
which results in the soft thresholding rule β̂_j = sign(β̂_j^0)(|β̂_j^0| − λ)_+.
3. Non-concave Penalized LS: Fan and Li (2001) proposed the non-concave penalized least squares criterion
Σ_{i=1}^n (Yi − Σ_j βj Xij)² + Σ_j pλ(|βj|).
With the hard thresholding penalty function pλ(|θ|) = λ² − (|θ| − λ)² I(|θ| < λ), this results in the hard thresholding rule β̂_j = β̂_j^0 I(|β̂_j^0| > λ). Finally, Fan and Li (2001) proposed the so-called smoothly clipped absolute deviation (SCAD) penalty, defined through its derivative
p′_λ(θ) = λ { I(θ ≤ λ) + [(a λ − θ)_+ / ((a − 1) λ)] I(θ > λ) }
for some a > 2, which results in the estimator
β̂_j = sign(β̂_j^0)(|β̂_j^0| − λ)_+ when |β̂_j^0| ≤ 2λ,
β̂_j = {(a − 1) β̂_j^0 − sign(β̂_j^0) a λ}/(a − 2) when 2λ < |β̂_j^0| ≤ a λ,
β̂_j = β̂_j^0 when |β̂_j^0| > a λ.
Also, Fan and Li (2001) showed that the SCAD estimator satisfies three properties: (1) unbiasedness, (2) sparsity, and (3) continuity. Fan and Peng (2004) considered the case where the number of regressors depends on the sample size and goes to infinity at a certain rate.
Remark: Note that the theory for these penalized methods is still open for time series data; this would be a very interesting research topic.
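The ridge estimator can be computed directly from its closed form, while the LASSO requires an iterative algorithm; the contributed glmnet package is one common choice. A minimal sketch, assuming a predictor matrix X, a response vector Y, and an arbitrary illustrative value of the penalty λ:

lambda <- 1                                                           # illustrative penalty
beta.ridge <- solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% Y)  # ridge closed form

# LASSO via the contributed glmnet package (assumed installed)
library(glmnet)
lasso <- glmnet(X, Y, alpha = 1)        # whole solution path over a grid of lambda
cvfit <- cv.glmnet(X, Y, alpha = 1)     # choose lambda by cross-validation
coef(cvfit, s = "lambda.min")           # coefficients at the selected lambda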
3.4 Integrated Models - I(1)
We begin our study of time correlation by mentioning a simple model that introduces strong correlations over time. This is the random walk or unit root model, which defines the current value of the time series as the immediately preceding value plus additive noise. The model forms the basis, for example, of the random walk theory of stock price behavior. In this model we define
xt = xt−1 + wt,   (3.10)
where wt is a white noise series with mean zero and variance σ². The left panel of Figure 3.3 shows a typical realization of such a series (wt ∼ N(0, 1)), and we observe that it bears a passing resemblance to the global temperature series. Appealing to (3.10), the best prediction of the current value would be expected to be given by its immediately preceding value. The model is, in a sense, unsatisfactory, because one would think that better results would be possible by a more efficient use of the past. The ACF of the original series, shown in Figure 3.4, exhibits a slow decay as lags increase. In order to model such a series without knowing that it is necessarily generated by (3.10), one might try looking at a first difference, shown in the right panel of Figure 3.3, and comparing the result to a white noise or completely independent process. It is clear from (3.10) that the first difference would be ∆ xt = xt − xt−1 = wt, which is just white noise. The ACF of the differenced process, in this case, would be expected to be zero at all lags h ≠ 0, and the sample ACF should reflect this behavior. The first difference of the random walk in the right panel of Figure 3.3 is also shown in the bottom panels of Figure 3.4, and we note that it appears to be much more random.
Figure 3.3: A typical realization of the random walk series (left panel) and the first difference of the series (right panel).
The ACF, shown in the left bottom panel of Figure 3.4, reflects this predicted behavior, with no significant values for lags other than zero. It is clear that (3.10) is a reasonable model for these data. The original series is nonstationary, with an autocorrelation function that depends on time, of the form
ρ(xt+h, xt) = √(t/(t + h)) if h ≥ 0, and √((t + h)/t) if h < 0.
The above example, using a difference transformation to make a random walk stationary, shows a very particular case of the model identification procedure advocated by Box et al. (1994). Namely, we seek a linearly filtered transformation of the original series, based strictly on the past values, that will reduce it to completely random white noise. This gives a model that enables prediction to be done with a residual noise that satisfies the usual statistical assumptions about model error. We will introduce, in the following discussion, more general versions of this simple model that are useful for modeling and forecasting series with observations that are correlated in time.
Figure 3.4: Autocorrelation functions (ACF) (left) and partial autocorrelation functions (PACF) (right) for the random walk (top panel) and the first difference (bottom panel) series.
The notation and terminology were introduced in the landmark work by Box and Jenkins (1970). A requirement for the ARMA model of Box and Jenkins is that the underlying process be stationary. Clearly the first difference of the random walk is stationary, but the ACF of the first difference shows relatively little dependence on the past, meaning that the differenced process is not predictable in terms of its own past behavior. To introduce a notation that has advantages for treating more general models, define the back-shift operator L as the result of shifting the series back by one time unit, i.e.,
L xt = xt−1,   (3.11)
and applying successively higher powers, L^k xt = xt−k. The operator has many of the usual algebraic properties and allows, for example, writing the random walk model (3.10) as (1 − L) xt = wt. Note that
the difference operator discussed previously in 2.2.2 is just ∆ = 1 − L. Identifying nonstationarity is an important first step in the Box-Jenkins procedure. From the above discussion, we note that the ACF of a nonstationary process will tend to decay rather slowly as a function of lag h. For example, a straight line would be perfectly correlated, regardless of lag. Based on this observation, we mention the following properties that aid in identifying non-stationarity.
Property 2.1: The ACF of a non-stationary time series decays very slowly as a function of lag h. The PACF of a non-stationary time series tends to have a peak very near unity at lag 1, with other values less than the significance level.
Note that since the I(1) model is very important in modeling economic and financial data, we will discuss the model and the related statistical inference further in a later chapter.
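A realization like the one in Figures 3.3 and 3.4 can be reproduced with a few lines of R; the seed and the sample size of 200 are arbitrary choices:

set.seed(123)
w  <- rnorm(200)          # white noise w_t ~ N(0,1)
x  <- cumsum(w)           # random walk x_t = x_{t-1} + w_t
dx <- diff(x)             # first difference (1 - L) x_t = w_t
par(mfrow = c(2, 2))
acf(x);  pacf(x)          # slow ACF decay, PACF spike near 1 at lag 1
acf(dx); pacf(dx)         # the difference behaves like white noise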
3.5 Autoregressive Models - AR(p)
3.5.1 Model
Now, extending the notions above to more general linear combinations of past values might suggest writing xt = φ1 xt−1 + φ2 xt−2 + · · · + φp xt−p + wt
(3.12)
as a function of p past values and an additive noise component wt. The model given by (3.12) is called an autoregressive model of order p, since it is assumed that one needs p past values to predict xt. The coefficients φ1, φ2, · · ·, φp are autoregressive coefficients, chosen to produce a good fit between the observed xt and its prediction based
on xt−1, xt−2, · · ·, xt−p. It is convenient to rewrite (3.12), using the back-shift operator, as
φ(L) xt = wt,   (3.13)
where
φ(L) = 1 − φ1 L − φ2 L² − · · · − φp L^p
is a polynomial with roots (solutions of φ(L) = 0) outside the unit circle (|Lj| > 1)¹.
¹ This restriction is a sufficient and necessary condition for an ARMA time series to be invertible; see Section 3.7 in Hamilton (1994) or Theorem 3.1.2 in Brockwell and Davis (1991, p.86) and the related discussions.
The restrictions are necessary for expressing the solution xt of (3.13) in terms of present and past values of wt, which is called invertibility of an ARMA series. That solution has the form
xt = ψ(L) wt,   where ψ(L) = Σ_{k=0}^∞ ψk L^k   (3.14)
is an infinite polynomial (ψ0 = 1), with coefficients determined by equating coefficients of L in
ψ(L) φ(L) = 1.   (3.15)
Equation (3.14) can be obtained formally by choosing ψ(L) satisfying (3.15) and multiplying both sides of (3.13) by ψ(L). It is clear that the random walk has φ1 = 1 and φk = 0 for all k ≥ 2, which does not satisfy the restriction, and the process is nonstationary. xt is stationary if Σ_k |ψk| < ∞; see Proposition 3.1.2 in Brockwell and Davis (1991, p.84), which can be weakened to Σ_k ψk² < ∞; see Hamilton (1994, p.52).
Example 2.2: Suppose that we have an autoregressive model (3.12) with p = 1, i.e., xt − φ1 xt−1 = (1 − φ1 L) xt = wt. Then (3.15) becomes (1 + ψ1 L + ψ2 L² + · · ·)(1 − φ1 L) = 1. Equating coefficients of L implies that ψ1 − φ1 = 0, or ψ1 = φ1. For L², we get ψ2 − ψ1 φ1 = 0, or ψ2 = φ1². Continuing, we obtain ψk = φ1^k, and the representation is
ψ(L) = 1 + Σ_{k=1}^∞ φ1^k L^k,
so that xt = Σ_{k=0}^∞ φ1^k wt−k.
The representation (3.14) is fundamental for developing approximate forecasts and also exhibits the series as a linear process of the form considered in Chapter 2.
For data involving such autoregressive (AR) models as defined above, the main selection problems are deciding that the autoregressive structure is appropriate and then determining the value of p for the model. The ACF of the process is a potential aid for determining the order of the process, as are the model selection measures described in Section 3.3. To determine the ACF of the pth order AR in (3.12), write the equation as
xt − Σ_{k=1}^p φk xt−k = wt
and multiply both sides by xt−h for any h ≥ 1. Assuming that the mean E(xt) = 0, and using the definition of the autocovariance function, this leads to the equation
E[(xt − Σ_{k=1}^p φk xt−k) xt−h] = E[wt xt−h].
The left-hand side immediately becomes γx(h) − Σ_{k=1}^p φk γx(h − k). The representation (3.14) implies that
E[wt xt−h] = E[wt (wt−h + φ1 wt−h−1 + φ2 wt−h−2 + · · ·)] = σw² if h = 0, and 0 otherwise.
Hence, we may write the equations for determining γx(h) as
γx(0) − Σ_{k=1}^p φk γx(−k) = σw²   (3.16)
and
γx(h) − Σ_{k=1}^p φk γx(h − k) = 0 for h ≥ 1.   (3.17)
Note that one will need the property γx(h) = γx(−h) in solving these equations. Equations (3.16) and (3.17) are called the Yule-Walker equations (see Yule, 1927; Walker, 1931).
Example 2.3: Consider finding the ACF of the first-order autoregressive model. First, (3.16) implies that γx(0) − φ1 γx(1) = σw². For h ≥ 1, we obtain γx(h) − φ1 γx(h − 1) = 0. Solving these successively gives γx(h) = γx(0) φ1^h. Combining with (3.16) yields γx(0) = σw²/(1 − φ1²). It follows that the autocovariance function is γx(h) = σw² φ1^h/(1 − φ1²). Taking into account that γx(h) = γx(−h), we obtain ρx(h) = φ1^{|h|} for all h.
The exponential decay is typical of autoregressive behavior, and there may also be some periodic structure. However, the most effective diagnostic of AR structure is the PACF, and it is summarized by the following identification property:
Property 2.2: The partial autocorrelation function as a function of lag h is zero for h > p, the order of the autoregressive process. This enables one to make a preliminary identification of the order p of the process using the partial autocorrelation function (PACF). Simply choose the order beyond which most of the sample values of the PACF are approximately zero.
To verify the above, note that the PACF (see Section 2.4.3) is basically the last coefficient obtained when minimizing the squared error
MSE = E[(x_{t+h} − Σ_{k=1}^h ak x_{t+h−k})²].
Setting the derivatives with respect to aj equal to zero leads to the equations
E[(x_{t+h} − Σ_{k=1}^h ak x_{t+h−k}) x_{t+h−j}] = 0,
which can be written as
γx(j) − Σ_{k=1}^h ak γx(j − k) = 0
for 1 ≤ j ≤ h. Now, from Equation (3.17), it is clear that, for an AR(p), we may take ak = φk for k ≤ p and ak = 0 for k > p to get a solution for the above equations. This implies Property 2.2 above.
Having decided on the order p of the model, it is clear that, for the estimation step, one may write the model (3.12) in the regression form
xt = φ′ zt + wt,   (3.18)
where φ = (φ1, φ2, · · · , φp)′ corresponds to β and zt = (xt−1, xt−2, · · · , xt−p)′ is the vector of regressors in (3.2). Taking into account the fact that xt is not observed for t ≤ 0, we may apply the regression approach of Section 3.2 for t = p + 1, · · · , n to get estimators of φ and of σ², the variance of the white noise process. These so-called conditional maximum likelihood estimators are commonly used because the exact maximum likelihood estimators involve solving nonlinear equations; see Chapter 5 in Hamilton (1994) for details, and we will discuss this issue later.
Example 2.4: We consider the simple problem of modeling the recruits series shown in the right panel of Figure 2.1 using an autoregressive model. The top right panel of Figure 2.16 and the top right panel of Figure 2.19 show the autocorrelation and partial autocorrelation functions of the recruits series. The PACF has large values for
Table 3.1: AICC values for ten models for the recruits series
p:     1     2     3     4     5     6     7     8     9     10
AICC:  5.75  5.52  5.53  5.54  5.54  5.55  5.55  5.56  5.57  5.58
h = 1 and 2 and then is essentially zero for higher-order lags. By Property 2.2 above, this implies that a second-order (p = 2) AR model might provide a good fit. Running the regression program for an AR(2) model with intercept,
xt = φ0 + φ1 xt−1 + φ2 xt−2 + wt,
leads to the estimators φ̂0 = 61.8439 (4.0121), φ̂1 = 1.3512 (0.0417), φ̂2 = −0.4612 (0.0416) and σ̂² = 89.53, where the estimated standard deviations are in parentheses. To determine whether the above order is the best choice, we fitted models for 1 ≤ p ≤ 10, obtaining the corrected AICC values summarized in Table 3.1, using (3.8) with K = 2. This shows that the minimum AICC obtains for p = 2, and we choose the second-order model.
Example 2.5: The previous example used various autoregressive models for the recruits series, fitting a second-order regression model. We may also use this regression idea to fit the model to other series, such as a detrended version of the SOI given in previous discussions. We have noted in our discussion of Figure 2.19, from the partial autocorrelation function, that a plausible model for this series might be a first-order autoregression of the form given above with p = 1. Again, putting the model above into the regression framework (3.2) for a single coefficient leads to the estimators φ̂1 = 0.59 with standard error 0.04, σ̂² = 0.09218 and AICC(1) = −1.375. The ACF of these residuals, shown in the left panel of Figure 3.5, however, still shows cyclical variation, and it is clear that they still have a number
of values exceeding the 1.96/√n threshold.
Figure 3.5: Autocorrelation (ACF) of residuals of AR(1) for SOI (left panel) and the plot of AIC and AICC values (right panel).
A suggested procedure is to try higher-order autoregressive models: successive models for 1 ≤ p ≤ 30 were fitted, and the AICC(K) values are plotted in the right panel of Figure 3.5. There is a clear minimum for a p = 16 order model. The coefficient vector is φ, with components and their standard errors in parentheses: 0.4050(0.0469), 0.0740(0.0505), 0.1527(0.0499), 0.0915(0.0505), −0.0377(0.0500), −0.0803(0.0493), −0.0743(0.0493), −0.0679(0.0492), 0.0096(0.0492), 0.1108(0.0491), 0.1707(0.0492), 0.1606(0.0499), 0.0281(0.0504), −0.1902(0.0501), −0.1283(0.051), −0.0413(0.0476), and σ̂² = 0.07166.
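Fits like those in Examples 2.4 and 2.5 can be obtained with arima(); a minimal sketch, assuming the recruits and detrended SOI series are available as numeric vectors rec and soi (hypothetical object names):

fit.rec <- arima(rec, order = c(2, 0, 0))   # AR(2); the "intercept" reported is the series mean
fit.rec                                     # coefficients, standard errors, sigma^2

# order search for the SOI series, comparing AIC over p = 1, ..., 30
aic <- sapply(1:30, function(p) AIC(arima(soi, order = c(p, 0, 0))))
which.min(aic)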
3.5.2 Forecasting
Time series analysis has proved to be a fairly good way of producing forecasts. Its drawback is that it is typically not conducive to structural or economic analysis of the forecast. The model has forecasting power only if the future value of the variable being forecast is related to the current values of the variables included in the model.
The goal is to forecast the variable ys based on a set of variables Xt (Xt may consist of lags of the variable yt). Let y^t_s denote a forecast of ys based on Xt. Under quadratic loss, as in OLS regression, we choose y^t_s to minimize E(y^t_s − ys)², and the mean squared error (MSE) is defined as MSE(y^t_s) = E[(y^t_s − ys)² | Xt]. It can be shown that the forecast with the smallest MSE is the expectation of ys conditional on Xt, that is, y^t_s = E(ys | Xt). The MSE of this optimal forecast is then the conditional variance of ys given Xt, that is, Var(ys | Xt).
We now consider the class of forecasts that are linear projections. These forecasts are used very often in empirical analysis of time series data. There are two conditions for the forecast y^t_s to be a linear projection: (1) the forecast y^t_s must be a linear function of Xt, that is, y^t_s = β′Xt, and (2) the coefficients β must be chosen in such a way that E[(ys − β′Xt) X′t] = 0. The forecast β′Xt satisfying (1) and (2) is called the linear projection of ys on Xt. One of the reasons linear projections are popular is that the linear projection produces the smallest MSE among the class of linear forecasting rules.
Finally, we give a general approach to forecasting for any process that can be written in the form (3.14), a linear process. This includes the AR, MA and ARMA processes. We begin by defining an h-step forecast of the process xt as
x^t_{t+h} = E[x_{t+h} | xt, xt−1, · · ·].
Note that this is not exactly right because we only have x1, x2, · · ·, xt available, so that conditioning on the infinite past is only an approximation. From this definition, it is reasonable to intuit that
x^t_s = xs for s ≤ t and
E[ws | xt, xt−1, · · ·] = E[ws | wt, wt−1, · · ·] = w^t_s = ws   (3.19)
for s ≤ t. For s > t, use x^t_s and
E[ws | xt, xt−1, · · ·] = E[ws | wt, wt−1, · · ·] = w^t_s = E(ws) = 0,   (3.20)
since ws will be independent of past values of wt. We define the h-step forecast variance as
P^t_{t+h} = E[(x_{t+h} − x^t_{t+h})² | xt, xt−1, · · ·].   (3.21)
To develop an expression for this mean square error, note that, with ψ0 = 1, we can write
x_{t+h} = Σ_{k=0}^∞ ψk w_{t+h−k}.
Then, since w^t_{t+h−k} = 0 for t + h − k > t, i.e. for k < h, we have
x^t_{t+h} = Σ_{k=0}^∞ ψk w^t_{t+h−k} = Σ_{k=h}^∞ ψk w_{t+h−k},
so that the residual is
x_{t+h} − x^t_{t+h} = Σ_{k=0}^{h−1} ψk w_{t+h−k}.
Hence, the mean square error (3.21) is just the variance of a linear combination of independent zero-mean errors with common variance σw²:
P^t_{t+h} = σw² Σ_{k=0}^{h−1} ψk².   (3.22)
For more discussions, see Hamilton (1994, Chapter 4). As an example, we consider forecasting the second order model developed for the recruits series in Example 2.4.
Example 2.6: Consider the one-step forecast x^t_{t+1} first. Writing the defining equation for t + 1 gives x_{t+1} = φ1 xt + φ2 xt−1 + w_{t+1}, so that x^t_{t+1} = φ1 x^t_t + φ2 x^t_{t−1} + w^t_{t+1} = φ1 xt + φ2 xt−1 + 0. Continuing in this vein, we obtain x^t_{t+2} = φ1 x^t_{t+1} + φ2 x^t_t + w^t_{t+2} = φ1 x^t_{t+1} + φ2 xt + 0. Then, x^t_{t+h} = φ1 x^t_{t+h−1} + φ2 x^t_{t+h−2} + w^t_{t+h} = φ1 x^t_{t+h−1} + φ2 x^t_{t+h−2} + 0 for h > 2. Forecasts out to lag h = 4 and beyond, if necessary, can be found by solving (3.15) for ψ1, ψ2 and ψ3, and substituting into (3.22). By equating coefficients of L, L² and L³ in (1 − φ1 L − φ2 L²)(1 + ψ1 L + ψ2 L² + ψ3 L³ + · · ·) = 1, we obtain ψ1 − φ1 = 0, ψ2 − φ2 + φ1 ψ1 = 0 and ψ3 − φ1 ψ2 − φ2 ψ1 = 0. This gives the coefficients ψ1 = φ1, ψ2 = φ2 − φ1², ψ3 = 2 φ1 φ2 − φ1³. From Example 2.4, we have φ̂1 = 1.35, φ̂2 = −0.46, σ̂²_w = 90.31 and β̂0 = 6.74. The forecasts are of the form
x^t_{t+h} = 6.74 + 1.35 x^t_{t+h−1} − 0.46 x^t_{t+h−2}.
For the forecast variance, we evaluate ψ̂1 = 1.35, ψ̂2 = 2.282, ψ̂3 = −3.065, leading to 90.31, 90.31(2.288), 90.31(7.495) and 90.31(16.890) for the forecast variances at h = 1, 2, 3, 4, so that the standard errors of the forecasts are 9.50, 14.37, 26.02 and 39.06. The recruits series values range from 20 to 100, so the forecast uncertainty will be rather large.
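In R these forecasts and their standard errors come from predict() applied to a fitted arima object; a minimal sketch, reusing the hypothetical AR(2) fit fit.rec from the earlier sketch:

fc <- predict(fit.rec, n.ahead = 4)
fc$pred                                   # point forecasts for h = 1, ..., 4
fc$se                                     # forecast standard errors, cf. (3.22)
cbind(lower = fc$pred - 1.96 * fc$se,     # rough 95% prediction intervals
      upper = fc$pred + 1.96 * fc$se)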
3.6 Moving Average Models – MA(q)
We may also consider processes that contain linear combinations of underlying unobserved shocks, say, represented by a white noise series wt. These moving average components generate a series of the form
xt = wt − Σ_{k=1}^q θk wt−k,   (3.23)
where q denotes the order of the moving average component and θk (1 ≤ k ≤ q) are parameters to be estimated. Using the back-shift notation, the above equation can be written in the form
xt = θ(L) wt   with   θ(L) = 1 − Σ_{k=1}^q θk L^k,   (3.24)
where θ(L) is another polynomial in the shift operator L. It should be noted that the MA process of order q is a linear process of the form considered earlier in Problem 4 in Chapter 2, with ψ0 = 1, ψ1 = −θ1, · · ·, ψq = −θq. This implies that the ACF will be zero for lags larger than q, because terms in the form of the covariance function given in Problem 4 of Chapter 2 will all be zero. Specifically, the exact forms are
γx(0) = σw² (1 + Σ_{k=1}^q θk²)   and   γx(h) = σw² (−θh + Σ_{k=1}^{q−h} θ_{k+h} θk)   (3.25)
for 1 ≤ h ≤ q − 1, with γx(q) = −σw² θq, and γx(h) = 0 for h > q. Hence, we have the following property of the ACF for MA series.
Property 2.3: For a moving average series of order q, the autocorrelation function (ACF) is zero for lags h > q, i.e. ρx(h) = 0 for h > q.
Such a result enables us to diagnose the order of a moving average component by examining ρx(h) and choosing q as the value beyond which the coefficients are essentially zero.
Example 2.7: Consider the varve thicknesses in Figure 2.19, which are described in Problem 7 of Chapter 2. Figure 3.6 shows the ACF and PACF of the original log-transformed varve series {xt} and of the first differences. The ACF of the original series {xt} indicates possibly non-stationary behavior, and suggests taking a first difference ∆ xt, interpreted here as the percentage yearly change in deposition. The ACF of the first difference ∆ xt shows a clear peak at h = 1 and
Figure 3.6: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log varve series (top panel) and the first difference (bottom panel), showing a peak in the ACF at lag h = 1.
no other significant peaks, suggesting a first-order moving average. Fitting the first-order moving average model ∆ xt = wt − θ1 wt−1 to these data using the Gauss-Newton procedure described next leads to θ̂1 = 0.77 and σ̂²_w = 0.2358.
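The same fit can be obtained with arima(), which carries out the nonlinear estimation described next internally. A minimal sketch, assuming the log varve series is stored in a vector lvarve (a hypothetical name); note that arima() writes the MA part with a plus sign, so its ma1 estimate is the negative of θ1 in (3.23):

dlv <- diff(lvarve)                                    # first difference of the log varves
fit.ma <- arima(dlv, order = c(0, 0, 1), include.mean = FALSE)
fit.ma                                                 # ma1 estimate and sigma^2_w
arima(lvarve, order = c(0, 1, 1))                      # equivalently, difference inside arima()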
Fitting the pure moving average term turns into a nonlinear problem, as we can see by noting that either maximum likelihood or regression involves solving (3.23) or (3.24) for wt and minimizing the sum of squared errors. Suppose that the roots of θ(L) = 0 are all outside the unit circle; then it is possible to solve π(L) θ(L) = 1, so that, for the vector parameter θ = (θ1, · · · , θq)′, we may write
wt(θ) = π(L) xt   (3.26)
and minimize SSE(θ) = Σ_{t=q+1}^n wt²(θ) as a function of the vector parameter θ. We do not really need to find the operator π(L); we can simply solve (3.26) recursively for wt, with w1, w2, · · · , wq = 0 and
wt(θ) = xt + Σ_{k=1}^q θk wt−k(θ) for q + 1 ≤ t ≤ n.
It is easy to verify that SSE(θ) will be a nonlinear function of θ1, θ2, · · · , θq. However, note that by a Taylor expansion
wt(θ) ≈ wt(θ0) + [∂wt(θ)/∂θ]′_{θ0} (θ − θ0),
where the derivative is evaluated at the previous guess θ0. Rearranging the above equation leads to
−wt(θ0) ≈ [∂wt(θ)/∂θ]′_{θ0} (θ − θ0) + wt(θ),
which is just the regression model (3.2). Hence, we can begin with an initial guess θ0 = (0.1, 0.1, · · · , 0.1)′, say, and successively minimize SSE(θ) until convergence. See Chapter 5 in Hamilton (1994) for details; we will discuss this issue later.
Forecasting: In order to forecast a moving average series, note that x_{t+h} = w_{t+h} − Σ_{k=1}^q θk w_{t+h−k}. The results (3.19) and (3.20) imply that x^t_{t+h} = 0 if h > q, and if h ≤ q,
x^t_{t+h} = − Σ_{k=h}^q θk w_{t+h−k},
where the wt values needed above are computed recursively as before. Because of (3.14), it is clear that ψ0 = 1 and ψk = −θk for 1 ≤ k ≤ q, and these values can be substituted directly into the variance formula (3.22). That is,
P^t_{t+h} = σw² (1 + Σ_{k=1}^{h−1} θk²).
3.7 Autoregressive Integrated Moving Average Models - ARIMA(p, d, q)
Now, combining the autoregressive and moving average components leads to the autoregressive moving average ARMA(p, q) model, written as
φ(L) xt = θ(L) wt,
where the polynomials in L are as defined earlier in (3.13) and (3.24), with p autoregressive coefficients and q moving average coefficients. In difference equation form this becomes
xt − Σ_{k=1}^p φk xt−k = wt − Σ_{k=1}^q θk wt−k.
The mixed processes do not satisfy Properties 2.1 - 2.3 any more, but they tend to behave in approximately the same way, even in the mixed cases. Estimation and forecasting for such problems are treated in essentially the same manner as for the AR and MA processes. We note that we can formally divide both sides of the ARMA model above by φ(L) and note that the usual representation (3.14) holds when
ψ(L) φ(L) = θ(L).   (3.27)
For forecasting, we determine the {ψk} by equating coefficients of {L^k} in (3.27), as before, assuming that all the roots of φ(L) = 0 are greater than one in absolute value. Similarly, we can always solve for the residuals, say
wt = xt − Σ_{k=1}^p φk xt−k + Σ_{k=1}^q θk wt−k,
to get the terms needed for forecasting and estimation.
Example 2.8: Consider the above mixed process with p = q = 1, i.e. ARMA(1, 1). From the difference equation form above, we may write xt = φ1 xt−1 + wt − θ1 wt−1.
Now, x_{t+1} = φ1 xt + w_{t+1} − θ1 wt, so that x^t_{t+1} = φ1 xt + 0 − θ1 wt = φ1 xt − θ1 wt and x^t_{t+h} = φ1 x^t_{t+h−1} for h > 1, leading to very simple forecasts in this case. Equating coefficients of L^k in (1 − φ1 L)(1 + ψ1 L + ψ2 L² + · · ·) = (1 − θ1 L) leads to ψk = (φ1 − θ1) φ1^{k−1} for k ≥ 1. Using (3.22) leads to the expression
P^t_{t+h} = σw² [1 + (φ1 − θ1)² Σ_{k=1}^{h−1} φ1^{2(k−1)}] = σw² [1 + (φ1 − θ1)² (1 − φ1^{2(h−1)})/(1 − φ1²)]
for the forecast variance.
In the first example of this chapter, it was noted that nonstationary processes are characterized by a slow decay in the ACF, as in Figure 3.4. In many of the cases where slow decay is present, the use of a first-order difference ∆ xt = xt − xt−1 = (1 − L) xt will reduce the nonstationary process xt to a stationary series ∆ xt. One can check whether the slow decay has been eliminated in the ACF of the transformed series. Higher-order differences, ∆^d xt = ∆ ∆^{d−1} xt, are possible, and we call the process obtained when the dth difference is an ARMA series an ARIMA(p, d, q) series, where p is the order of the autoregressive component, d is the order of differencing needed and q is the order of the moving average component. Symbolically, the form is
φ(L) ∆^d xt = θ(L) wt.
The principles of model selection for ARIMA(p, d, q) series are obtained using likelihood-based methods such as AIC, BIC or AICC, which replace K by K = p + q, the total number of ARMA parameters, or other methods such as the penalized methods described in Section 3.3.
3.8 Seasonal ARIMA Models
Some economic and financial as well as environmental time series, such as the quarterly earnings per share of a company, exhibit certain cyclical or periodic behavior; see the later chapters for more discussion of cycles and periodicity. Such a time series is called a seasonal (deterministic cycle) time series. Figure 2.8 shows the time plot of quarterly earnings per share of Johnson and Johnson from the first quarter of 1960 to the last quarter of 1980. The data possess some special characteristics. In particular, the earnings grew exponentially during the sample period and had a strong seasonality. Furthermore, the variability of earnings increased over time. The cyclical pattern repeats itself every year, so the periodicity of the series is 4. If monthly data are considered (e.g., monthly sales of Wal-Mart Stores), then the periodicity is 12. Seasonal time series models are also useful in pricing weather-related derivatives and energy futures. See Example 1.8 and Example 1.9 in Chapter 2 for more examples with seasonality.
Analysis of seasonal time series has a long history. In some applications, seasonality is of secondary importance and is removed from the data, resulting in a seasonally adjusted time series that is then used to make inference. The procedure for removing seasonality from a time series is referred to as seasonal adjustment. Most economic data published by the U.S. government are seasonally adjusted (e.g., the growth rate of gross domestic product and the unemployment rate). In other applications, such as forecasting, seasonality is as important as other characteristics of the data and must be handled accordingly. Because forecasting is a major objective of economic and financial time series analysis, we focus on the latter approach
and discuss some econometric models that are useful in modeling seasonal time series.
When the autoregressive, differencing, or seasonal moving average behavior seems to occur at multiples of some underlying period s, a seasonal ARIMA series may result. The seasonal nonstationarity is characterized by slow decay at multiples of s and can often be eliminated by a seasonal differencing operator of the form
∆_s^D xt = (1 − L^s)^D xt.
For example, when we have monthly data, it is reasonable that a yearly phenomenon will induce s = 12, and the ACF will be characterized by slowly decaying spikes at 12, 24, 36, 48, · · ·; we can obtain a stationary series by transforming with the operator (1 − L^{12}) xt = xt − xt−12, which is the difference between the current month and the value one year, or 12 months, ago. If the autoregressive or moving average behavior is seasonal at period s, we define formally the operators
Φ(L^s) = 1 − Φ1 L^s − Φ2 L^{2s} − · · · − ΦP L^{Ps}   (3.28)
and
Θ(L^s) = 1 − Θ1 L^s − Θ2 L^{2s} − · · · − ΘQ L^{Qs}.   (3.29)
The final form of the seasonal ARIMA(p, d, q) × (P, D, Q)s model is
Φ(L^s) φ(L) ∆_s^D ∆^d xt = Θ(L^s) θ(L) wt.   (3.30)
Note that one special model of (3.30) is ARIMA(0, 1, 1) × (0, 1, 1)s, that is,
(1 − L^s)(1 − L) xt = (1 − θ1 L)(1 − Θ1 L^s) wt.
This model is referred to as the airline model or multiplicative seasonal model in the literature; see Box, Jenkins, and Reinsel
(1994, Chapter 9). It has been found to be widely applicable in modeling seasonal time series. The AR part of the model simply consists of the regular and seasonal differences, whereas the MA part involves two parameters. We may also note the properties below, corresponding to Properties 2.1 - 2.3.
Property 2.1': The ACF of a seasonally non-stationary time series decays very slowly at lag multiples s, 2s, 3s, · · ·, with zeros in between, where s denotes a seasonal period, usually 4 for quarterly data or 12 for monthly data. The PACF of a non-stationary time series tends to have a peak very near unity at lag s.
Property 2.2': For a seasonal autoregressive series of order P, the partial autocorrelation function Φhh as a function of lag h has nonzero values at s, 2s, 3s, · · ·, Ps, with zeros in between, and is zero for h > Ps, the order of the seasonal autoregressive process. There should be some exponential decay.
Property 2.3': For a seasonal moving average series of order Q, the autocorrelation function (ACF) has nonzero values at s, 2s, 3s, · · ·, Qs and is zero for h > Qs.
Remark: Note that there is a built-in command in R called arima() which is a powerful tool for estimating and making inference for an ARIMA model. Its usage is
arima(x, order = c(0, 0, 0),
      seasonal = list(order = c(0, 0, 0), period = NA),
      xreg = NULL, include.mean = TRUE,
      transform.pars = TRUE, fixed = NULL, init = NULL,
      method = c("CSS-ML", "ML", "CSS"), n.cond,
      optim.control = list(), kappa = 1e6)
See the R manuals for details about this command.
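For instance, the airline model above can be specified through the seasonal argument. A minimal sketch, assuming a monthly series stored in a numeric vector x (hypothetical name):

x <- ts(x, frequency = 12)                       # declare the seasonal period
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit                                              # estimates of theta_1 and Theta_1
tsdiag(fit)                                      # residual diagnostics
predict(fit, n.ahead = 12)                       # forecasts for the next year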
Example 2.9: We illustrate by fitting the monthly birth series from 1948-1979 shown in Figure 3.7. The period encompasses the
Figure 3.7: Number of live births 1948(1) − 1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12 and a fitted ARIMA(0, 1, 1)× (0, 1, 1)12 model.
boom that followed the Second World War, and there is the expected rise, which persists for about 13 years, followed by a decline to around 1974. The series appears to have long-term swings, with seasonal effects superimposed. The long-term swings indicate possible nonstationarity, and we verify that this is the case by checking the ACF and PACF shown in the top panel of Figure 3.8. Note that by Property 2.1, slow decay of the ACF indicates non-stationarity and we respond by taking a first difference. The results shown in the second panel of Figure 3.7 indicate that the first difference has eliminated the strong low-frequency swing. The ACF, shown in the second
Figure 3.8: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the birth series (top two panels), the first difference (second two panels) an ARIMA(0, 1, 0) × (0, 1, 1)12 model (third two panels) and an ARIMA(0, 1, 1) × (0, 1, 1)12 model (last two panels).
panel from the top in Figure 3.8, shows peaks at 12, 24, 36, 48, · · ·, with little decay. This behavior implies seasonal non-stationarity, by Property 2.1' above, with s = 12. A seasonal difference of the first difference generates an ACF and PACF in Figure 3.8 of the kind we expect for stationary series. Taking the seasonal difference of the first difference gives a series that looks stationary and has an ACF with peaks at 1 and 12 and a PACF with a substantial peak at 12 and lesser peaks at 24, 36, · · ·. This suggests trying either a first-order moving average term, by Property 2.3, or a first-order seasonal moving average term with s = 12, by Property 2.3' above. We choose to eliminate the largest peak first by applying a first-order seasonal moving average term with s = 12. The ACF and PACF of the residual series from this model, i.e. from ARIMA(0, 1, 0) × (0, 1, 1)12, written as
(1 − L)(1 − L^{12}) xt = (1 − Θ1 L^{12}) wt,
are shown in the fourth panel from the top in Figure 3.8. We note that the peak at lag one is still there, with attendant exponential decay in the PACF. This can be eliminated by fitting a first-order moving average term, and we consider the model ARIMA(0, 1, 1) × (0, 1, 1)12, written as
(1 − L)(1 − L^{12}) xt = (1 − θ1 L)(1 − Θ1 L^{12}) wt.
The ACF of the residuals from this model is relatively well behaved, with a number of peaks either near or exceeding the 95% bounds for no correlation. Fitting this final ARIMA(0, 1, 1) × (0, 1, 1)12 model leads to
(1 − L)(1 − L^{12}) xt = (1 − 0.4896 L)(1 − 0.6844 L^{12}) wt,
with AICC = 4.95, R² = 0.9804² = 0.961, and p-values (0.000, 0.000). The ARIMA search leads to the model
(1 − L)(1 − L^{12}) xt = (1 − 0.4088 L − 0.1645 L²)(1 − 0.6990 L^{12}) wt,
yielding AICC = 4.92 and R² = 0.981² = 0.962, slightly better than the ARIMA(0, 1, 1) × (0, 1, 1)12 model. Evaluating these latter models leads to the conclusion that the extra parameters do not add a practically substantial amount to the predictability. The model can be expanded as
xt = xt−1 + xt−12 − xt−13 + wt − θ1 wt−1 − Θ1 wt−12 + θ1 Θ1 wt−13.
The forecasts are
x^t_{t+1} = xt + xt−11 − xt−12 − θ1 wt − Θ1 wt−11 + θ1 Θ1 wt−12,
x^t_{t+2} = x^t_{t+1} + xt−10 − xt−11 − Θ1 wt−10 + θ1 Θ1 wt−11.
Continuing in the same manner, we obtain
x^t_{t+12} = x^t_{t+11} + xt − xt−1 − Θ1 wt + θ1 Θ1 wt−1
for the 12-month forecast.
Example 2.10: Figure 3.9 shows the autocorrelation function of the log-transformed J&J earnings series that is plotted in Figure 2.8, and we note the slow decay indicating the nonstationarity which was already obvious in the Chapter 2 discussion. We may also compare the ACF with that of a random walk, shown in Figure 3.2, and note the close similarity. The partial autocorrelation function is very high at lag one which, under ordinary circumstances, would indicate a first-order autoregressive AR(1) model, except that, in this case, the value is close to unity, indicating a root close to 1 on the unit circle. The only question would be whether differencing or detrending is the better transformation to stationarity. Following the Box-Jenkins tradition, differencing leads to the ACF and PACF shown in the second panel and no simple structure is apparent. To force a next step, we interpret the peaks at 4, 8, 12, 16, · · ·, as
Figure 3.9: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), ARIMA(0, 1, 0) × (1, 0, 0)4 model (third two panels), and ARIMA(0, 1, 1) × (1, 0, 0)4 model (last two panels).
contributing to a possible seasonal autoregressive term, leading to a possible ARIMA(0, 1, 0) × (1, 0, 0)4 and we simply fit this model and look at the ACF and PACF of the residuals, shown in the third two panels. The fit improves somewhat, with significant peaks still remaining at lag 1 in both the ACF and PACF. The peak in the ACF seems more isolated and there remains some exponentially decaying
behavior in the PACF, so we try a model with a first-order moving average. The bottom two panels show the ACF and PACF of the resulting ARIMA(0, 1, 1) × (1, 0, 0)4 fit, and we note only relatively minor excursions above and below the 95% intervals under the assumption that the theoretical ACF is white noise. The final model suggested is (with yt = log xt)
(1 − Φ1 L⁴)(1 − L) yt = (1 − θ1 L) wt,   (3.31)
where Φ̂1 = 0.820 (0.058), θ̂1 = 0.508 (0.098), and σ̂²_w = 0.0086. The model can be written in forecast form as
yt = yt−1 + Φ1 (yt−4 − yt−5) + wt − θ1 wt−1.
The residual plot of the above model is shown in the left bottom panel of Figure 3.10. To forecast the original series for, say, 4 quarters, we
Figure 3.10: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for ARIMA(0, 1, 1) × (0, 1, 1)4 model (top two panels) and the residual plots of ARIMA(0, 1, 1) × (1, 0, 0)4 (left bottom panel) and ARIMA(0, 1, 1) × (0, 1, 1)4 model (right bottom panel).
compute the forecast limits for yt = log xt and then exponentiate, i.e. x^t_{t+h} = exp(y^t_{t+h}). Based on the exact likelihood method, Tsay (2005) considered the following seasonal ARIMA(0, 1, 1) × (0, 1, 1)4 model
(1 − L)(1 − L⁴) yt = (1 − 0.678 L)(1 − 0.314 L⁴) wt,   (3.32)
with σ̂²_w = 0.089, where the standard errors of the two MA parameters are 0.080 and 0.101, respectively. The Ljung-Box statistics of the residuals show Q(12) = 10.0 with p-value 0.44. The model appears to be adequate. The ACF and PACF of the ARIMA(0, 1, 1) × (0, 1, 1)4 model are given in the top two panels of Figure 3.10 and the residual plot is displayed in the right bottom panel of Figure 3.10. Based on the comparison of the ACF and PACF of the two models (3.31) and (3.32) [the last two panels of Figure 3.9 and the top two panels in Figure 3.10], it seems that the ARIMA(0, 1, 1) × (0, 1, 1)4 model in (3.32) might perform better than the ARIMA(0, 1, 1) × (1, 0, 0)4 model in (3.31).
To illustrate the forecasting performance of the seasonal model in (3.32), we re-estimate the model using the first 76 observations and reserve the last eight data points for forecasting evaluation. We compute 1-step to 8-step ahead forecasts and their standard errors of the fitted model at the forecast origin t = 76. An anti-log transformation is taken to obtain forecasts of earning per share using the relationship between normal and log-normal distributions. Figure 2.15 in Tsay (2005, p.77) shows the forecast performance of the model, where the observed data are in solid line, point forecasts are shown by dots, and the dashed lines show 95% interval forecasts. The forecasts show a strong seasonal pattern and are close to the observed data. For more comparisons for forecasts using different models including semiparametric and nonparametric models, the reader is referred to the book
by Shumway (1988), Shumway and Stoffer (2000) and the papers by Burman and Shumway (1998) and Cai and Chen (2006).
When the seasonal pattern of a time series is stable over time (e.g., close to a deterministic function), dummy variables may be used to handle the seasonality. This approach is taken by some analysts. However, deterministic seasonality is a special case of the multiplicative seasonal model discussed before. Specifically, if Θ1 = 1, then the model contains a deterministic seasonal component. Consequently, the same forecasts are obtained by using either dummy variables or a multiplicative seasonal model when the seasonal pattern is deterministic. Yet the use of dummy variables can lead to inferior forecasts if the seasonal pattern is not deterministic. In practice, we recommend that the exact likelihood method be used to estimate a multiplicative seasonal model, especially when the sample size is small or when there is the possibility of a deterministic seasonal component.
Example 2.11: To examine possibly deterministic seasonal behavior, consider the monthly simple return of the CRSP Decile 1 index from January 1960 to December 2003, 528 observations in all. The series is shown in the left top panel of Figure 3.11, and the time series does not show any clear pattern of seasonality. However, the sample ACF of the return series, shown in the left bottom panel of Figure 3.11, contains significant lags at 12, 24, and 36 as well as lag 1. If seasonal ARIMA models are entertained, a model of the form
(1 − φ1 L)(1 − Φ1 L^{12}) xt = α + (1 − Θ1 L^{12}) wt
is identified, where xt is the monthly simple return. Using the conditional likelihood, the fitted model is
(1 − 0.25 L)(1 − 0.99 L^{12}) xt = 0.0004 + (1 − 0.92 L^{12}) wt
Figure 3.11: Monthly simple return of CRSP Decile 1 index from January 1960 to December 2003: Time series plot of the simple return (left top panel), time series plot of the simple return after adjusting for January effect (right top panel), the ACF of the simple return (left bottom panel), and the ACF of the adjusted simple return.
with σ̂w = 0.071. The MA coefficient is close to unity, indicating that the fitted model is close to being non-invertible. If the exact likelihood method is used, we have
(1 − 0.264 L)(1 − 0.996 L^{12}) xt = 0.0002 + (1 − 0.999 L^{12}) wt
with σ̂w = 0.067. Cancellation between the seasonal AR and MA factors is clear. This highlights the usefulness of the exact likelihood method, and the estimation result suggests that the seasonal behavior might be deterministic. To further confirm this assertion, we define the dummy variable for January, that is,
Jt = 1 if t is January, and Jt = 0 otherwise,
and employ the simple linear regression xt = β0 + β1 Jt + et.
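A minimal R sketch of this check, assuming the monthly simple returns are stored in a vector x that starts in January 1960 (hypothetical name):

Jan <- rep(c(1, rep(0, 11)), length.out = length(x))  # January dummy J_t
fit <- lm(x ~ Jan)                                    # x_t = beta0 + beta1 J_t + e_t
adj <- residuals(fit)                                 # January-adjusted returns
acf(adj, lag.max = 40)            # check for remaining spikes at lags 12, 24, 36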
The right panels of Figure 3.11 show the time series plot and the ACF of the residual series from the prior simple linear regression. From the ACF, there is no significant serial correlation at any multiple of 12, suggesting that the seasonal pattern has been successfully removed by the January dummy variable. Consequently, the seasonal behavior in the monthly simple return of Decile 1 is due to the January effect.
3.9 Regression Models With Correlated Errors
In many applications, the relationship between two time series is of major interest. The market model in finance is an example that relates the return of an individual stock to the return of a market index. The term structure of interest rates is another example, in which the time evolution of the relationship between interest rates with different maturities is investigated. These examples lead to the consideration of a linear regression of the form yt = β1 + β2 xt + et, where yt and xt are two time series and et denotes the error term. The least squares (LS) method is often used to estimate this model. If {et} is a white noise series, then the LS method produces consistent estimates. In practice, however, it is common to see that the error term et is serially correlated. In this case we have a regression model with time series errors, and the LS estimates of β1 and β2 may not be consistent and efficient.
The regression model with time series errors is widely applicable in economics and finance, but it is one of the most commonly misused econometric models because the serial dependence in et is often overlooked. It pays to study the model carefully. The standard method
for dealing with correlated errors et in the regression model yt = β′zt + et is to try to transform the errors et into uncorrelated ones and then apply the standard least squares approach to the transformed observations. For example, let P be an n × n matrix that transforms the vector e = (e1, · · · , en)′ into a set of independent, identically distributed variables with variance σ². Then, transform the matrix version (3.4) to
P y = P Z β + P e
and proceed as before. Of course, the major problem is deciding what to choose for P, but in the time series case, happily, there is a reasonable solution, based again on time series ARMA models. Suppose that we can find a reasonable ARMA model for the residuals, for example the ARMA(p, 0, 0) model
et = Σ_{k=1}^p φk e_{t−k} + wt,
which defines a linear transformation of the correlated et to a sequence of uncorrelated wt. We can ignore the problems near the beginning of the series by starting at t = p. In the ARMA notation, using the back-shift operator L, we may write
φ(L) et = wt,   (3.33)
where
φ(L) = 1 − Σ_{k=1}^p φk L^k,   (3.34)
and applying the operator to both sides of (3.2) leads to the model
φ(L) yt = β′ φ(L) zt + wt,   (3.35)
where the {wt}'s now satisfy the independence assumption. Doing ordinary least squares on the transformed model is the same as doing weighted least squares on the untransformed model. The only problem is that we do not know the values of the coefficients φk (1 ≤ k ≤ p) in the transformation (3.34). However, if we knew the residuals et, it would be easy to estimate the coefficients, since (3.34) can be written in the form
et = φ′ e_{t−1} + wt,   (3.36)
which is exactly the usual regression model (3.2), with φ = (φ1, · · · , φp)′ replacing β and e_{t−1} = (et−1, et−2, · · · , et−p)′ replacing zt. The above comments suggest a general approach, known as the Cochrane-Orcutt procedure (Cochrane and Orcutt, 1949), for dealing with the problem of correlated errors in the time series context.
1. Begin by fitting the original regression model (3.2) by least squares, obtaining β̂ and the residuals êt = yt − β̂′zt.
2. Fit an ARMA model to the estimated residuals, say φ(L) êt = θ(L) wt.
3. Apply the ARMA transformation found to both sides of the regression equation (3.2) to obtain
[φ(L)/θ(L)] yt = β′ [φ(L)/θ(L)] zt + wt.
4. Run ordinary least squares on the transformed values to obtain the new β̂.
5. Return to step 2 if desired.
Often, one iteration is enough to develop the estimators under a reasonable correlation structure. In general, the Cochrane-Orcutt procedure converges to the maximum likelihood or weighted least squares estimators.
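A minimal R sketch of one pass of this procedure, using a pure AR model for the residuals and assuming a response vector y and a single regressor vector z (hypothetical names); the last lines show a one-step alternative via generalized least squares with an AR(1) error structure, which assumes the contributed nlme package is installed:

fit0 <- lm(y ~ z)                                  # step 1: ordinary least squares
e    <- residuals(fit0)
ar.e <- ar(e, order.max = 5)                       # step 2: AR model for the residuals
phi  <- ar.e$ar
yf <- filter(y, filter = c(1, -phi), sides = 1)    # step 3: apply phi(L) to both sides
zf <- filter(z, filter = c(1, -phi), sides = 1)
fit1 <- lm(yf ~ zf)                                # step 4: OLS on the transformed data

library(nlme)                                      # one-step alternative
gls(y ~ z, correlation = corAR1())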
Example 2.12: We might consider an alternative approach to treating the Johnson and Johnson earnings series, assuming that yt = log(xt) = β1 + β2 t + et. Fitting this model by least squares gives β̂1 = −0.6678 (0.0349) and β̂2 = 0.0417 (0.0071). The residuals êt = yt − β̂1 − β̂2 t are computed easily, and their ACF and PACF are shown in the top two panels of Figure 3.12. Note that the ACF and PACF suggest
Figure 3.12: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels)and the fitted ARIMA(0, 0, 0) × (1, 0, 0)4 residuals.
that a seasonal AR series will fit well, and we show the ACF and PACF of the residuals from this fit in the bottom panels of Figure 3.12. The seasonal AR model is of the form et = Φ1 et−4 + wt, and we obtain Φ̂1 = 0.7614 (0.0639), with σ̂²_w = 0.00779. Using these values, we transform yt to
yt − Φ̂1 yt−4 = β1 (1 − Φ̂1) + β2 [t − Φ̂1 (t − 4)] + wt,
using the estimated value Φ̂1 = 0.7614. With this transformed
regression, we obtain the new estimators β̂1 = −0.7488 (0.1105) and β̂2 = 0.0424 (0.0018). The new estimator has the advantage of being unbiased and having a smaller generalized variance.
To forecast, we consider the original model with the newly estimated β̂1 and β̂2. We obtain the approximate forecast y^t_{t+h} = β̂1 + β̂2 (t + h) + e^t_{t+h} for the log-transformed series, along with upper and lower limits depending on the estimated variance that only incorporates the prediction variance of e^t_{t+h}, treating the trend and seasonal autoregressive parameters as fixed. The narrower upper and lower limits (the figure is not presented here) are mainly a reflection of a slightly better fit to the residuals and the ability of the trend model to take care of the nonstationarity.
Example 2.13: We consider the relationship between two U.S. weekly interest rate series: xt, the 1-year Treasury constant maturity rate, and yt, the 3-year Treasury constant maturity rate. Both series have 1967 observations from January 5, 1962 to September 10, 1999 and are measured in percentages. The series are obtained from the Federal Reserve Bank of St. Louis. Figure 3.13 shows the time plots of the two interest rates, with the solid line denoting the 1-year rate and the dashed line the 3-year rate. The left panel of Figure 3.14 plots yt versus xt, indicating that, as expected, the two interest rates are highly correlated. A naive way to describe the relationship between the two interest rates is to use the simple model, Model I: yt = β1 + β2 xt + et. This results in a fitted model yt = 0.911 + 0.924 xt + êt, with σ̂²_e = 0.538 and R² = 95.8%, where the standard errors of the two coefficients are 0.032 and 0.004, respectively. This simple model (Model I) confirms the high correlation between the two interest rates. However, the
Figure 3.13: Time plots of U.S. weekly interest rates (in percentages) from January 5, 1962 to September 10, 1999. The solid line (black) is the Treasury 1-year constant maturity rate and the dashed line the Treasury 3-year constant maturity rate (red).
Figure 3.14: Scatterplots of U.S. weekly interest rates from January 5, 1962 to September 10, 1999: the left panel is 3-year rate versus 1-year rate, and the right panel is changes in 3-year rate versus changes in 1-year rate.
model is seriously inadequate as shown by Figure 3.15, which gives the time plot and ACF of its residuals. In particular, the sample ACF of the residuals is highly significant and decays slowly, showing the pattern of a unit root nonstationary time series.² The behavior of the residuals suggests that marked differences exist between the two interest rates. Using the modern econometric terminology, if one
² We will discuss in detail how to perform unit root tests later.
Figure 3.15: Residual series of linear regression Model I for two U.S. weekly interest rates: the left panel is time plot and the right panel is ACF.
assumes that the two interest rate series are unit root nonstationary, then the behavior of the residuals indicates that the two interest rates are not co-integrated; see later chapters for discussion of unit roots and co-integration. In other words, the data fail to support the hypothesis that there exists a long-term equilibrium between the two interest rates. In some sense, this is not surprising because the pattern of "inverted yield curve" did occur during the data span. By the inverted yield curve, we mean the situation under which interest rates are inversely related to their times to maturity. The unit root behavior of both interest rates and the residuals leads to the consideration of the change series of interest rates. Let ∆xt = xt − xt−1 = (1 − L) xt be the changes in the 1-year interest rate and ∆yt = yt − yt−1 = (1 − L) yt denote the changes in the 3-year interest rate. Consider the linear regression, Model II: ∆yt = β1 + β2 ∆xt + et. Figure 3.16 shows time plots of the two change series, whereas the right panel of Figure 3.14 provides a scatterplot between them. The change series remain highly correlated, with a fitted linear regression model given by ∆yt = 0.0002 + 0.7811 ∆xt + êt, with σ̂e² = 0.0682 and R² = 84.8%. The standard errors of the two
Figure 3.16: Time plots of the change series of U.S. weekly interest rates from January 12, 1962 to September 10, 1999: changes in the Treasury 1-year constant maturity rate are denoted by the black solid line, and changes in the Treasury 3-year constant maturity rate are indicated by the red dashed line.
coefficients are 0.0015 and 0.0075, respectively. This model further confirms the strong linear dependence between interest rates. The two top panels of Figure 3.17 show the time plot (left) and sample ACF (right) of the residuals (Model II). Once again, the ACF shows
Figure 3.17: Residual series of the linear regression models: Model II (top) and Model III (bottom) for two change series of U.S. weekly interest rates: time plot (left) and ACF (right).
some significant serial correlation in the residuals, but the magnitude of the correlation is much smaller. This weak serial dependence in the residuals can be modeled by using the simple time series models discussed in the previous sections, and we have a linear regression with time series errors. The main objective of this section is to discuss a simple approach for building a linear regression model with time series errors. The approach is straightforward. We employ a simple time series model discussed in this chapter for the residual series and estimate the whole model jointly. For illustration, consider the simple linear regression in Model II. Because the residuals of the model are serially correlated, we identify a simple ARMA model for them. From the sample ACF of the residuals shown in the right top panel of Figure 3.17, we specify an MA(1) model for the residuals and modify the linear regression model to (Model III): ∆yt = β1 + β2 ∆xt + et and et = wt − θ1 wt−1, where {wt} is assumed to be a white noise series. In other words, we simply use an MA(1) model, without the constant term, to capture the serial dependence in the error term of Model II. The two bottom panels of Figure 3.17 show the time plot (left) and sample ACF (right) of the residuals of Model III. The resulting model is a simple example of linear regression with time series errors. In practice, more elaborate time series models can be added to a linear regression equation to form a general regression model with time series errors. Estimating a regression model with time series errors was not easy before the advent of modern computers. Special methods such as the Cochrane-Orcutt estimator have been proposed to handle the serial dependence in the residuals. Nowadays, the estimation is as easy as that of other time series models. If the time series model used is
stationary and invertible, then one can estimate the model jointly via the maximum likelihood method or the conditional maximum likelihood method. This is the approach we take by using the package R with the command arima(). For the U.S. weekly interest rate data, the fitted version of Model III is ∆yt = 0.0002 + 0.7824 ∆xt + êt and êt = wt + 0.2115 wt−1, with σ̂w² = 0.0668 and R² = 85.4%. The standard errors of the parameters are 0.0018, 0.0077, and 0.0221, respectively. The model no longer has a significant lag-1 residual ACF, even though some minor residual serial correlations remain at lags 4 and 6. The incremental improvement of adding additional MA parameters at lags 4 and 6 to the residual equation is small and the result is not reported here. Comparing the above three models, we make the following observations. First, the high R² and coefficient 0.924 of Model I are misleading because the residuals of the model show strong serial correlations. Second, for the change series, the R² and the coefficient of ∆xt of Model II and Model III are close. In this particular instance, adding the MA(1) model to the change series only provides a marginal improvement. This is not surprising because the estimated MA coefficient is small numerically, even though it is statistically highly significant. Third, the analysis demonstrates that it is important to check residual serial dependence in linear regression analysis. Because the constant term of Model III is insignificant, the model shows that the two weekly interest rate series are related as yt = yt−1 + 0.782 (xt − xt−1) + wt + 0.212 wt−1. The interest rates are concurrently and serially correlated. Finally, we outline a general procedure for analyzing linear regression models with time series errors: First, fit the linear regression model and check serial correlations of the residuals. Second,
if the residual series is unit-root nonstationary, take the first difference of both the dependent and explanatory variables and go back to the first step. If the residual series appears to be stationary, identify an ARMA model for the residuals and modify the linear regression model accordingly. Third, perform a joint estimation via the maximum likelihood method and check the fitted model for further improvement. To check the serial correlations of residuals, we recommend that the Ljung-Box statistics be used instead of the Durbin-Watson (DW) statistic because the latter only considers the lag-1 serial correlation. There are cases in which residual serial dependence appears at higher order lags. This is particularly so when the time series involved exhibits some seasonal behavior.
Remark: For a residual series et with T observations, the Durbin-Watson statistic is
DW = Σ_{t=2}^T (et − et−1)² / Σ_{t=1}^T et².
Straightforward calculation shows that DW ≈ 2(1 − ρ̂e(1)), where ρ̂e(1) is the lag-1 ACF of {et}.
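As a rough sketch of the joint estimation in R, suppose the change series ∆yt and ∆xt of Example 2.13 are stored in dy and dx (assumed names). Model III can then be estimated with arima() and its xreg argument, and the residual checks above carried out as follows:

# dy, dx: the change series, assumed already computed, e.g. dy <- diff(y3); dx <- diff(x1)
fit3 <- arima(dy, order = c(0, 0, 1), xreg = dx)          # regression with MA(1) errors, fitted jointly
fit3                                                      # xreg coefficient plus the MA(1) coefficient
Box.test(residuals(fit3), lag = 12, type = "Ljung-Box")   # Ljung-Box check of residual serial correlation
acf(residuals(fit3), plot = FALSE)$acf[2]                 # lag-1 residual ACF; DW is roughly 2*(1 - this value)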
3.10 Estimation of Covariance Matrix
Consider again the regression model in (3.2). There may exist situations in which the error et has serial correlations and/or conditional heteroscedasticity, but the main objective of the analysis is to make inference concerning the regression coefficients β. When et has serial correlations, we discussed methods in Example 2.12 and Example 2.13 above to overcome this difficulty. However, there we assumed
that et follows an ARIMA type model, and this assumption might not always be satisfied in some applications. Here, we consider a general situation without making this assumption. In situations under which the ordinary least squares estimates of the coefficients remain consistent, methods are available to provide consistent estimates of the covariance matrix of the coefficients. Two such methods are widely used in economics and finance. The first is the heteroscedasticity consistent (HC) estimator; see Eicker (1967) and White (1980). The second is the heteroscedasticity and autocorrelation consistent (HAC) estimator; see Newey and West (1987). To ease the discussion, we re-write the regression model as yt = β′ xt + et, where yt is the dependent variable, xt = (x1t, · · ·, xpt)′ is a p-dimensional vector of explanatory variables including constant and lagged variables, and β = (β1, · · ·, βp)′ is the parameter vector. The LS estimate of β is given by
β̂ = [Σ_{t=1}^n xt xt′]^{−1} Σ_{t=1}^n xt yt,
and the associated covariance matrix has the so-called "sandwich" form
Σβ = Cov(β̂) = [Σ_{t=1}^n xt xt′]^{−1} C [Σ_{t=1}^n xt xt′]^{−1}, which reduces to σe² [Σ_{t=1}^n xt xt′]^{−1} if et is iid,
where C is called the "meat", given by
C = Var(Σ_{t=1}^n et xt),
and σe² is the variance of et, estimated by the variance of the residuals of the regression. In the presence of serial correlations or conditional heteroscedasticity, the prior covariance matrix estimator is inconsistent, often resulting in inflated t-ratios of β̂. The estimator of White (1980) is based on the following:
Σ̂_{β,hc} = [Σ_{t=1}^n xt xt′]^{−1} Ĉhc [Σ_{t=1}^n xt xt′]^{−1},
where, with êt = yt − β̂′ xt being the residual at time t,
Ĉhc = [n/(n − p)] Σ_{t=1}^n êt² xt xt′.
The estimator of Newey and West (1987) is
Σ̂_{β,hac} = [Σ_{t=1}^n xt xt′]^{−1} Ĉhac [Σ_{t=1}^n xt xt′]^{−1},
where Ĉhac is given by
Ĉhac = Σ_{t=1}^n êt² xt xt′ + Σ_{j=1}^l wj Σ_{t=j+1}^n (xt êt êt−j x′t−j + xt−j êt−j êt xt′),
with l a truncation parameter and wj a weight function, such as the Bartlett weight function defined by wj = 1 − j/(l + 1). Other weight functions can also be used. Newey and West (1987) suggested choosing l to be the integer part of 4(n/100)^{2/9}. This estimator essentially uses a nonparametric method to estimate the covariance matrix of Σ_{t=1}^n et xt; a class of kernel-based heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators was introduced by Andrews (1991).
Example 2.14: (Continuation of Example 2.13) For illustration, we consider the first differenced interest rate series in Model II of Example 2.13. The t-ratio of the coefficient of ∆xt is 104.63 if both serial correlation and conditional heteroscedasticity in the residuals
are ignored; it becomes 46.73 when the HC estimator is used, and it reduces to 40.08 when the HAC estimator is employed. To use the HC or HAC estimator, we can use the package sandwich in R, with the commands vcovHC() and vcovHAC().
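For instance, assuming the change series are stored in dy and dx (hypothetical names) and that the sandwich and lmtest packages are installed, a sketch of the computation for Model II is:

library(sandwich)
library(lmtest)
fit2 <- lm(dy ~ dx)                    # Model II for the change series
coeftest(fit2)                         # naive OLS t-ratios
coeftest(fit2, vcov = vcovHC(fit2))    # heteroscedasticity consistent (Eicker-White) t-ratios
coeftest(fit2, vcov = vcovHAC(fit2))   # heteroscedasticity and autocorrelation consistent t-ratios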
3.11 Long Memory Models
We have discussed that for a stationary time series the ACF decays exponentially to zero as the lag increases. Yet for a unit root nonstationary time series, it can be shown that the sample ACF converges to 1 for all fixed lags as the sample size increases; see Chan and Wei (1988) and Tiao and Tsay (1983). There exist some time series whose ACF decays slowly to zero at a polynomial rate as the lag increases. These processes are referred to as long memory or long range dependent time series. One such example is the fractionally differenced process defined by
(1 − L)^d xt = wt,  |d| < 0.5,  (3.37)
where {wt} is a white noise series and d is called the long memory parameter. Properties of model (3.37) have been widely studied in the literature (e.g., Beran, 1994). We summarize some of these properties below.
1. If d < 0.5, then xt is a weakly stationary process and has the infinite MA representation
xt = wt + Σ_{k=1}^∞ ψk wt−k, with ψk = d(d + 1) · · · (d + k − 1)/k! = C(k + d − 1, k),
where C(·, ·) denotes a (generalized) binomial coefficient.
2. If d > −0.5, then xt is invertible and has the infinite AR representation
Σ_{k=0}^∞ πk xt−k = wt, with π0 = 1 and πk = (0 − d)(1 − d) · · · (k − 1 − d)/k! = C(k − d − 1, k) for k ≥ 1.
3. For |d| < 0.5, the ACF of xt is
ρx(h) = [d(1 + d) · · · (h − 1 + d)] / [(1 − d)(2 − d) · · · (h − d)],  h ≥ 1.
In particular, ρx(1) = d/(1 − d) and, as h → ∞,
ρx(h) ≈ [(−d)!/(d − 1)!] h^{2d−1}.
4. For |d| < 0.5, the PACF of xt is φh,h = d/(h − d) for h ≥ 1.
5. For |d| < 0.5, the spectral density function fx(·) of xt, which is the Fourier transform of the ACF γx(h) of xt, that is,
fx(ν) = (1/2π) Σ_{h=−∞}^∞ γx(h) exp(−i h ν)
for ν ∈ [−π, π], where i = √−1, satisfies
fx(ν) ∼ ν^{−2d} as ν → 0,  (3.38)
where ν ∈ [0, π] denotes the frequency. See Chapter 6 of Hamilton (1994) for details about spectral analysis. Of particular interest here is the behavior of the ACF of xt when d < 0.5. The property says that ρx(h) ∼ h^{2d−1}, which decays at a polynomial, instead of an exponential, rate. For this reason, such an xt process is called a long-memory time series. A special characteristic of the spectral density function in (3.38) is that the spectrum diverges to infinity as ν → 0. However, the spectral density function of a stationary ARMA process is bounded for all ν ∈ [−π, π]. Earlier we used the binomial theorem for non-integer powers:
(1 − L)^d = Σ_{k=0}^∞ (−1)^k C(d, k) L^k.
CHAPTER 3. UNIVARIATE TIME SERIES MODELS
135
If the fractionally differenced series (1 − L)^d xt follows an ARMA(p, q) model, then xt is called a fractionally differenced autoregressive moving average (ARFIMA(p, d, q)) process, which generalizes the ARIMA model by allowing for non-integer d. In practice, if the sample ACF of a time series is not large in magnitude but decays slowly, then the series may have long memory. For more discussion, we refer to the book by Beran (1994). For the pure fractionally differenced model in (3.37), one can estimate d using a maximum likelihood method in the time domain, the Whittle likelihood, or a regression method with the logged periodogram at the lower frequencies in the frequency domain. Finally, long-memory models have attracted some attention in the finance literature, in part because of the work on fractional Brownian motion in continuous time models.
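As a quick illustration of the slow polynomial decay, one can simulate a pure fractionally differenced series with the fracdiff package (a sketch; the value of d and the sample size are arbitrary):

library(fracdiff)
set.seed(42)
x <- fracdiff.sim(n = 2000, d = 0.3)$series               # (1 - L)^0.3 x_t = w_t
acf(x, lag.max = 200)                                     # autocorrelations die out only slowly
acf(arima.sim(list(ar = 0.3), n = 2000), lag.max = 200)   # compare: AR(1) ACF decays exponentially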
Figure 3.18: Sample autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes. The log spectral density of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes.
Example 2.15: As an illustration, Figure 3.18 shows the sample ACFs of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes from July 3, 1962 to December 31, 1997. The ACFs are relatively small in magnitude, but decay very slowly; they appear to be significant at the 5% level even after 300 lags. For more information about the behavior of the sample ACF of absolute return series, see Ding, Granger, and Engle (1993). To estimate the long memory parameter d, we can use the package fracdiff in R; the results are d̂ = 0.1867 for the absolute returns of the value-weighted index and d̂ = 0.2732 for the absolute returns of the equal-weighted index. To support our conclusion above, we plot the log spectral density of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes. They show clearly that both log spectral densities behave like a log function of the frequency, which supports spectral densities of the form (3.38).
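A sketch of the estimation step in R with the fracdiff package, where absvw denotes the absolute value-weighted return series (an assumed object name):

library(fracdiff)
# absvw: absolute daily simple returns of the value-weighted index, assumed available
fit.d <- fracdiff(absvw, nar = 0, nma = 0)   # pure ARFIMA(0, d, 0)
fit.d$d                                      # estimate of the long memory parameter d (about 0.19 here)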
3.12 Periodicity and Business Cycles
Let us first recall what we have observed from Figure 2.1 in Chapter 2. From Figure 2.1, we can conclude that both series tend to exhibit repetitive behavior, with regularly repeating (stochastic) cycles, or periodicity, that are easily visible. This periodic behavior is of interest because the underlying processes of interest may be regular, and the rate or frequency of oscillation characterizing the behavior of the underlying series would help to identify them. One can also remark that the cycles of the SOI are repeating at a faster rate than those of the recruitment series. The recruits series also shows several kinds of oscillations, a faster frequency that seems to repeat about every 12 months and a slower frequency that seems to repeat
about every 50 months. The study of the kinds of cycles and their strengths is also very important, particularly in macroeconomics for determining business cycles. For more discussions, we refer to the books by Franses (1996, 1998) and Ghysels and Osborn (2001). As we mentioned in Chapter 2, one way to identify the cycles is to compute the power spectrum, which shows the variance as a function of the frequency of oscillation. For other modeling methods, such as the periodic autoregressive, PAR(p), modeling techniques, we refer to the books by Franses (1996, 1998) and Ghysels and Osborn (2001) for details. Next, we introduce one method to describe the cyclical behavior using the ACF. As indicated above, there exists cyclical behavior in the recruits series. From Example 2.4, an AR(2) fits this series quite well. Therefore, we consider the ACF ρx(h) of a stationary AR(2) series, which satisfies the second order difference equation
(1 − φ1 L − φ2 L²) ρx(h) = φ(L) ρx(h) = 0 for h ≥ 2,
with ρx(0) = 1 and ρx(1) = φ1/(1 − φ2). This difference equation determines the properties of the ACF of a stationary AR(2) time series. It also determines the behavior of the forecasts of xt. Corresponding to the prior difference equation, there is a second order polynomial equation 1 − φ1 x − φ2 x² = 0. The solutions of this equation are
x = [φ1 ± √(φ1² + 4 φ2)] / (−2 φ2).
In the time series literature, the inverses of the two solutions are referred to as the characteristic roots of the AR(2) model. Denote the
two solutions by ω1 and ω2. If both ωi are real valued, then the second order difference equation of the model can be factored as (1 − ω1 L)(1 − ω2 L), and the AR(2) model can be regarded as an AR(1) model operating on top of another AR(1) model. The ACF of xt is then a mixture of two exponential decays. Yet if φ1² + 4 φ2 < 0, then ω1 and ω2 are complex numbers (called a complex conjugate pair), and the plot of the ACF of xt would show a picture of damping sine and cosine waves; see the top two panels of Figure 2.14 for the ACF of the SOI and recruits. In business and economic applications, complex characteristic roots are important. They give rise to the behavior of business cycles. It is then common for economic time series models to have complex valued characteristic roots. For an AR(2) model with a pair of complex characteristic roots, the average length of the stochastic cycles is
T0 = 2π / cos⁻¹[φ1/(2 √(−φ2))],
where the cosine inverse is stated in radians. If one writes the complex solutions as a ± b i, then we have φ1 = 2 a, φ2 = −(a² + b²), and
T0 = 2π / cos⁻¹(a/√(a² + b²)),
where √(a² + b²) is the absolute value of a ± b i. To illustrate the above idea, Figure 3.19 shows the ACF of four stationary AR(2) models. The right top panel is the ACF of the AR(2) model (1 − 1.0 L + 0.7 L²) xt = wt. Because φ1² + 4 φ2 = 1.0 + 4 × (−0.7) = −1.8 < 0, this particular AR(2) model contains two complex characteristic roots, and hence its ACF exhibits damping sine and cosine waves. The other three AR(2) models have real-valued characteristic roots. Their ACFs decay exponentially.
Figure 3.19: The autocorrelation function of an AR(2) model: (a) φ1 = 1.2 and φ2 = −0.35, (b) φ1 = 1.0 and φ2 = −0.7, (c) φ1 = 0.2 and φ2 = 0.35, (d) φ1 = −0.2 and φ2 = 0.35.
Example 2.16: As an illustration, consider the quarterly growth rate of U.S. real gross national product (GNP), seasonally adjusted, from the second quarter of 1947 to the first quarter of 1991, which is shown in the left panel of Figure 3.20. The right panel of Figure
Figure 3.20: The growth rate of US quarterly real GNP from 1947.II to 1991.I (seasonally adjusted and in percentage): the left panel is the time series plot and the right panel is the ACF.
3.20 displays the ACF of this series, showing a picture of damping sine and cosine waves, and we can conclude that cycles exist. This series can be used as an example of nonlinear economic time series; see Tsay (2005, Chapter 4) for detailed analyses using the Markov switching model. Here we simply employ an AR(3) model for the data. Denoting the growth rate by xt, we can use the model building procedure to estimate the model. The fitted model is
xt = 0.0047 + 0.35 xt−1 + 0.18 xt−2 − 0.14 xt−3 + wt,  with σ̂w² = 0.0098.
Rewriting the model as xt − 0.35 xt−1 − 0.18 xt−2 + 0.14 xt−3 = 0.0047 + wt, we obtain the corresponding third-order difference equation (1 − 0.35 L − 0.18 L² + 0.14 L³) = 0, which can be factored as (1 + 0.52 L)(1 − 0.87 L + 0.27 L²) = 0. The first factor (1 + 0.52 L) shows an exponentially decaying feature of the GNP growth rate. Focusing on the second order factor 1 − 0.87 L − (−0.27) L² = 0, we have φ1² + 4 φ2 = 0.87² + 4 × (−0.27) = −0.3231 < 0. Therefore, the second factor of the AR(3) model confirms the existence of stochastic business cycles in the quarterly growth rate of U.S. real GNP. This is reasonable as the U.S. economy went through expansion and contraction periods. The average length of the stochastic cycles is approximately
T0 = 2π / cos⁻¹[φ1/(2 √(−φ2))] = 10.83 quarters,
which is about 3 years. If one uses a nonlinear model to separate the U.S. economy into "expansion" and "contraction" periods, the data show that the average duration of contraction periods is about three quarters and that of expansion periods is about 3 years. The average duration of 10.83 quarters is a compromise between the two separate durations. The periodic feature obtained here is common among
growth rates of national economies. For example, similar features can be found for other countries. For a stationary AR(p) series, the ACF satisfies the difference equation
(1 − φ1 L − φ2 L² − · · · − φp L^p) ρx(h) = 0, for h > 0.
The plot of the ACF of a stationary AR(p) model would then show a mixture of damping sine and cosine patterns and exponential decays, depending on the nature of its characteristic roots. Finally, we continue our analysis of the recruits series as entertained in Example 2.4, for which an AR(2) model is fitted as
xt − 1.3512 xt−1 + 0.4612 xt−2 = 61.8439 + wt.
Clearly, φ1² + 4 φ2 = 1.3512² − 4 × 0.4612 = −0.0191 < 0, which implies the existence of stochastic business cycles in the recruits series. The average length of the stochastic cycles based on the above fitted AR(2) model is approximately
T0 = 2π / cos⁻¹[φ1/(2 √(−φ2))] = 61.71 months,
which is about 5 years. Note that this average length of cycles is not close to what we have observed (about 50 months). Please figure out the reason why there is such a big difference.
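Both cycle lengths can be reproduced directly from the formula; a small sketch in R:

# average stochastic cycle length T0 = 2*pi / acos(phi1 / (2*sqrt(-phi2)))
cycle.length <- function(phi1, phi2) 2 * pi / acos(phi1 / (2 * sqrt(-phi2)))
cycle.length(0.87, -0.27)        # GNP growth:  roughly 10.8 quarters
cycle.length(1.3512, -0.4612)    # recruits:    roughly 61.7 months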
3.13 Impulse Response Function
The task facing the modern time-series econometrician is to develop reasonably simple and intuitive models capable of forecasting, interpreting and hypothesis testing regarding economic and financial data. In this respect, the time series econometrician is often concerned with
the estimation of difference equations containing stochastic components.
3.13.1 First Order Difference Equations
Suppose we are given the dynamic equation yt = φ1 yt−1 + wt, 1 ≤ t ≤ n, where yt is the value of the variable of interest at period t and wt is the value of the input (shock) at period t. Indeed, this is an AR(1) model. The equation relates a variable yt to its previous (lagged) values, with only the first lag appearing on the right hand side (RHS) of the equation. For now, the input variable, {w1, w2, w3, . . .}, will simply be regarded as a sequence of deterministic numbers which can be generated from a distribution. Later on, we will assume that they are stochastic. We solve the difference equation by recursive substitution, assuming that we know the starting value y−1, called the initial condition. Then we have
y0 = φ1 y−1 + w0,
y1 = φ1 y0 + w1 = φ1(φ1 y−1 + w0) + w1 = φ1² y−1 + φ1 w0 + w1,
y2 = φ1 y1 + w2 = φ1(φ1² y−1 + φ1 w0 + w1) + w2 = φ1³ y−1 + φ1² w0 + φ1 w1 + w2,
...
yt = φ1 yt−1 + wt = φ1^{t+1} y−1 + φ1^t w0 + φ1^{t−1} w1 + · · · + φ1 wt−1 + wt = φ1^{t+1} y−1 + Σ_{k=0}^t φ1^k wt−k.  (3.39)
The procedure to express yt in terms of the past values of wt and the starting value y−1 is known as recursive substitution. The last term on the RHS of (3.39) is called a linear process (with finitely many terms) generated by {wt} if {wt} is random. Next we consider the computation of dynamic multipliers. To do so, we consider one simple experiment by assuming that y−1 and
{w0, w1, · · ·, wn} are given and fixed at this moment, and the value of φ1 is known. Then we can compute the time series yt by yt = φ1 yt−1 + wt for 1 ≤ t ≤ n. What happens to the series yt if, at period t = 50, we change the value of the deterministic component w50 and set the new value to be w̃50 = w50 + 1? Of course, the change in w50 leads to changes in yt:
ỹ50 = φ1 y49 + w̃50 = φ1 y49 + w50 + 1 = y50 + 1,
ỹ51 = φ1 ỹ50 + w51 = φ1(y50 + 1) + w51 = y51 + φ1,
ỹ52 = φ1 ỹ51 + w52 = φ1(y51 + φ1) + w52 = y52 + φ1²,
and so on. If we look at the difference between ỹt and yt, then we obtain ỹ50 − y50 = 1, ỹ51 − y51 = φ1, ỹ52 − y52 = φ1², and so on. This experiment is illustrated in Figures 3.21 and 3.22. Therefore,
Figure 3.21: The time series yt is generated with wt ∼ N(0, 1), y0 = 5. At period t = 50, there is an additional impulse to the error term, i.e., w̃50 = w50 + 1. The impulse response function is computed as the difference between the series yt without the impulse and the series ỹt with the impulse.
one can say that a one-unit increase in w50 leads to y50 increasing by 1,
Figure 3.22: The time series yt is generated with wt ∼ N(0, 1), y0 = 3. At period t = 50, there is an additional impulse to the error term, i.e., w̃50 = w50 + 1. The impulse response function is computed as the difference between the series yt without the impulse and the series ỹt with the impulse.
y51 increasing by φ1, y52 increasing by φ1², and so on. The question is how yt+j changes if we change wt by one unit. This is exactly the question that dynamic multipliers answer. Assume that we start with yt−1 instead of y−1, i.e., we observe the value of yt−1 at period t − 1. Can we say something about yt+j? Well, let us answer this question:
yt = φ1 yt−1 + wt,
yt+1 = φ1 yt + wt+1 = φ1(φ1 yt−1 + wt) + wt+1 = φ1² yt−1 + φ1 wt + wt+1,
...
yt+j = φ1^{j+1} yt−1 + φ1^j wt + φ1^{j−1} wt+1 + · · · + φ1 wt+j−1 + wt+j.
The dynamic multiplier is defined by ∂yt+j/∂wt = φ1^j for an AR(1) model.
Note that the multiplier does not depend on t. The dynamic multiplier ∂yt+j/∂wt is sometimes called the impact multiplier.
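The experiment behind Figures 3.21 and 3.22 is easy to reproduce; a small sketch in R (the value of φ1, the seed, and the sample size are arbitrary):

set.seed(1)
n    <- 100
phi1 <- 0.8
w    <- rnorm(n)                         # the input sequence, fixed once drawn
wimp <- w; wimp[50] <- w[50] + 1         # add one unit to w_50
y  <- stats::filter(w,    phi1, method = "recursive")   # y_t = phi1*y_{t-1} + w_t, starting from 0
yi <- stats::filter(wimp, phi1, method = "recursive")
irf <- yi - y                            # zero before t = 50, then phi1^(t-50) afterwards
plot(irf, type = "h", ylab = "impulse response")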
Impulse Response Function
We can plot the dynamic multipliers as a function of the lag j, i.e., plot {∂yt+j/∂wt}_{j=1}^J. Because dynamic multipliers calculate the response of yt+j to a single impulse in wt, this plot is also referred to as the impulse response function (IRF). This function has many important applications in time series analysis because it shows how the entire path of a variable is affected by a stochastic shock. Obviously, the dynamics of the impulse response function depend on the value of φ1 for an AR(1) model. Let us look at the relationship between the IRF and the system:
(a) 0 < φ1 < 1 implies that the impulse response converges to zero and the system is stable.
(b) −1 < φ1 < 0 implies that the impulse response oscillates but converges to zero and the system is stable.
(c) φ1 > 1 implies that the impulse response is explosive and the system is unstable.
(d) φ1 < −1 implies that the impulse response is explosive (with oscillation) and the system is unstable.
Impulse response functions for all possible cases are presented in Figure 3.23.
Permanent Change in wt
In calculating the dynamic multipliers in Figure 3.23, we were asking what would happen if wt were to increase by one unit with wt+1, wt+2, · · ·, wt+j
Figure 3.23: Example of impulse response functions for first order difference equations.
unaffected. We were finding the effect of a purely transitory change in wt. A permanent change in wt means that wt, wt+1, · · ·, wt+j would all increase by one unit. The effect on yt+j of a permanent change in wt beginning in period t is then given by
∂yt+j/∂wt + ∂yt+j/∂wt+1 + · · · + ∂yt+j/∂wt+j = φ1^j + φ1^{j−1} + · · · + φ1 + 1 = (1 − φ1^{j+1})/(1 − φ1)
for the AR(1) model.
The difference between a transitory and a permanent change in wt is illustrated in Figure 3.24.
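The closed form can be checked numerically; a small sketch in R with arbitrary φ1 and j:

phi1 <- 0.8
j    <- 10
sum(phi1^(0:j))                  # phi1^j + ... + phi1 + 1, the summed multipliers
(1 - phi1^(j + 1)) / (1 - phi1)  # closed form; the two values agree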
3.13.2 Higher Order Difference Equations
A linear pth order difference equation has the following form:
yt = φ1 yt−1 + φ2 yt−2 + · · · + φp yt−p + wt,
which is an AR(p) model if wt is white noise. It is easy to derive the properties of the pth order difference equation. To illustrate these properties, we consider the second
Figure 3.24: The time series yt is generated with wt ∼ N(0, 1), y0 = 3. For the transitory impulse, there is an additional impulse to the error term at period t = 50, i.e., w̃50 = w50 + 1. For the permanent impulse, there is an additional impulse for periods t = 50, · · ·, 100, i.e., w̃t = wt + 1, t = 50, 51, · · ·, 100. The impulse response function (IRF) is computed as the difference between the series yt without the impulse and the series ỹt with the impulse.
order difference equation yt = φ1 yt−1 + φ2 yt−2 + wt, so that p = 2. We write the equation as a first order vector difference equation,
ξt = F ξt−1 + vt,
where ξt = (yt, yt−1)′, vt = (wt, 0)′, and F is the 2 × 2 companion matrix with first row (φ1, φ2) and second row (1, 0).
If the starting value ξ−1 is known, we use recursive substitution to obtain
ξt = F^{t+1} ξ−1 + F^t v0 + F^{t−1} v1 + · · · + F vt−1 + vt.
Or, if the starting value ξt−1 is known, we use recursive substitution to obtain
ξt+j = F^{j+1} ξt−1 + F^j vt + F^{j−1} vt+1 + · · · + F vt+j−1 + vt+j
for any j ≥ 0. To compute the dynamic multipliers, we recall the rules of vector and matrix differentiation. If x(β) is an m × 1 vector that depends on the n × 1 vector β, then ∂x/∂β′ is the m × n matrix whose (i, j) element is ∂xi(β)/∂βj. It is easy to verify that the dynamic multipliers are given by
∂ξt+j/∂vt′ = ∂(F^j vt)/∂vt′ = F^j.
Since the first element of ξt+j is yt+j and the first element of vt is wt, the (1, 1) element of the matrix F^j is ∂yt+j/∂wt, i.e., the dynamic multiplier. For larger values of j, an easy way to obtain numerical values of the dynamic multiplier ∂yt+j/∂wt is to simulate the system. This is done as follows. Set y−1 = y−2 = · · · = y−p = 0 and w0 = 1, and set the values of w for all other dates to 0. Then use the AR(p) recursion to calculate the value of yt for 0 ≤ t ≤ n. To illustrate dynamic multipliers for the pth order difference equation, we consider the second order difference equation yt = φ1 yt−1 + φ2 yt−2 + wt, so that F is the companion matrix with first row (φ1, φ2) and second row (1, 0). The impulse response functions for this example with four different settings are presented in Figure 3.25 (a sketch of the simulation is given after the list below):
(a) φ1 = 0.6 and φ2 = 0.2,
(b) φ1 = 0.8 and φ2 = 0.4,
(c) φ1 = −0.9 and φ2 = −0.5, and
Figure 3.25: Example of impulse response functions for second order difference equation.
(d) φ1 = −0.5 and φ2 = −1.5.
Similar to the first order difference equation, the impulse response function for the pth order difference equation can be explosive or converge to zero. What determines the dynamics of the impulse response function? The eigenvalues of the matrix F determine whether the impulse response oscillates, converges, or is explosive. The eigenvalues of a p × p matrix A are those numbers λ for which |A − λ Ip| = 0. The eigenvalues of the general matrix F defined above are the values of λ that satisfy
λ^p − φ1 λ^{p−1} − φ2 λ^{p−2} − · · · − φ_{p−1} λ − φp = λ^p φ(λ^{−1}) = 0.
Note that the eigenvalues are also called the characteristic roots of the AR(p) model. If the eigenvalues are real but at least one eigenvalue is greater than unity in absolute value, the system is explosive.
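A sketch in R of the simulation recipe described before the list above, applied to the second order equation (setting (a) is used for the plot; the other settings can be passed in the same way):

irf2 <- function(phi1, phi2, J = 20) {
  y <- numeric(J + 1)                    # y[1] corresponds to time 0, when w0 = 1 hits
  y[1] <- 1                              # y_0 = phi1*y_{-1} + phi2*y_{-2} + w_0 = 1
  y[2] <- phi1 * y[1]                    # y_1 = phi1*y_0 + phi2*y_{-1}
  for (t in 3:(J + 1)) y[t] <- phi1 * y[t - 1] + phi2 * y[t - 2]
  y                                      # y_j equals the dynamic multiplier d y_{t+j} / d w_t
}
plot(0:20, irf2(0.6, 0.2), type = "h", xlab = "j", ylab = "impulse response")  # setting (a): converges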
Why do the eigenvalues determine the dynamics of the dynamic multipliers? Recall that if the eigenvalues of a p × p matrix A are distinct, there exists a nonsingular p × p matrix T such that A = T Λ T⁻¹, where Λ is a p × p matrix with the eigenvalues of A on the principal diagonal and zeros elsewhere. Then F^j = T Λ^j T⁻¹, and the values of the eigenvalues in Λ determine whether the elements of F^j explode or not. Recall that the dynamic multiplier is equal to ∂ξt+j/∂vt′ = F^j. Therefore, the size of the eigenvalues determines whether the system is stable or not. Now we compute the eigenvalues for each case in the above example:
(a) λ1 = 0.838 and λ2 = −0.238, so that |λk| < 1 and the system is stable.
(b) λ1 = 1.148 and λ2 = −0.348, so that |λ1| > 1 and the system is unstable.
(c) λ = −0.45 ± 0.545 i, so that |λ| = √((−0.45)² + 0.545²) = 0.706 < 1 and the system is stable. Since the eigenvalues are complex, the impulse response function oscillates.
(d) λ = −0.25 ± 1.198 i, so that |λ| = √((−0.25)² + 1.198²) = 1.223 > 1 and the system is unstable. Since the eigenvalues are complex, the impulse response function oscillates.
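The reported eigenvalues can be verified by applying eigen() to the companion matrix F; a small sketch:

eig.F <- function(phi1, phi2) eigen(matrix(c(phi1, phi2, 1, 0), 2, 2, byrow = TRUE))$values
eig.F(0.6, 0.2)          # (a)  0.838 and -0.238
eig.F(0.8, 0.4)          # (b)  1.148 and -0.348
Mod(eig.F(-0.9, -0.5))   # (c)  modulus about 0.71
Mod(eig.F(-0.5, -1.5))   # (d)  modulus about 1.22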
3.14 Problems
1. Consider the regression model yt = β1 yt−1 + wt, where wt is white noise with zero mean and variance σw². Assume that we observe {yt}_{t=2}^n. Show that the least squares estimator of β1 is β̂1 = Σ_{t=2}^n yt yt−1 / Σ_{t=2}^n yt−1². If we pretend that yt−1 is fixed, show that Var(β̂1) = σw² / Σ_{t=2}^n yt−1². Relate your answer to a method for fitting a first-order AR model to the data yt.
2. Consider the autoregressive model AR(1), i.e., xt − φ1 xt−1 = wt. (a) Find the necessary condition for {xt} to be invertible. (b) Show that xt can be expressed as a linear process. (c) Show that E[wt xt] = σw² and E[wt xt−1] = 0, so that future errors are uncorrelated with past data.
3. The auto-covariance and autocorrelation functions for AR processes are often derived from the Yule-Walker equations, obtained by multiplying both sides of the defining equation, successively, by xt, xt−1, · · ·. Use the Yule-Walker equations to derive ρx(h) for the first-order AR.
4. For an ARMA series we define the optimal forecast based on xt, xt−1, · · · as the conditional expectation x^t_{t+h} = E[xt+h | xt, xt−1, · · ·] for h ≥ 1. (a) Show, for the general ARMA model, that E[wt+h | xt, xt−1, · · ·] = 0 if h > 0 and wt+h if h ≤ 0. (b) For the AR(1) and AR(2) models, derive the optimal forecast x^t_{t+h} and the prediction error variance of the one-step forecast.
5. Suppose we have the simple linear trend model yt = β1 t + xt, 1 ≤ t ≤ n, where xt = φ1 xt−1 + wt. Give the exact form of the equations that you would use for estimating β1, φ1 and σw² using the Cochrane-Orcutt procedure.
6. Suppose that the simple return of a monthly bond index follows the MA(1) model xt = wt + 0.2 wt−1,
σw = 0.025.
Assume that w100 = 0.01. Compute the 1-step (x^t_{t+1}) and 2-step (x^t_{t+2}) ahead forecasts of the return at the forecast origin t = 100. What are the standard deviations of the associated forecast errors? Also compute the lag-1 (ρ(1)) and lag-2 (ρ(2)) autocorrelations of the return series.
7. Suppose that the daily log return of a security follows the model xt = 0.01 + 0.2 xt−2 + wt, where {wt} is a Gaussian white noise series with mean zero and variance 0.02. What are the mean and variance of the return series xt? Compute the lag-1 (ρ(1)) and lag-2 (ρ(2)) autocorrelations of xt. Assume that x100 = −0.01 and x99 = 0.02. Compute the 1-step (x^t_{t+1}) and 2-step (x^t_{t+2}) ahead forecasts of the return series at the forecast origin t = 100. What are the associated standard deviations of the forecast errors?
8. Consider the file "la-regr.dat", in the syllabus, which contains cardiovascular mortality, temperature values and particulate levels over 6-day periods from Los Angeles County (1970-1979). The file also contains two dummy variables for regression purposes, a column of ones for the constant term and a time index. The order is as follows: Column 1: 508 cardiovascular mortality values (6-day averages), Column 2: 508 ones, Column 3: the integers 1, 2, · · ·, 508, Column 4: temperature in degrees F, and Column 5: particulate levels. A reference is Shumway et al. (1988). The point here is to examine possible relations between the temperature and mortality in the presence of a time trend in cardiovascular mortality. (a) Use scatter diagrams to argue that particulate level may be linearly related to mortality and that temperature has either a
linear or quadratic relation. Check for lagged relations using the cross correlation function. (b) Adjust temperature for its mean value and fit the model Mt = β0 + β1(Tt − T̄) + β2(Tt − T̄)² + β3 Pt + et, where Mt, Tt and Pt denote the mortality, temperature and particulate pollution series. You can use as inputs Columns 2 and 3 for the trend terms and run the regression analysis without the constant option. (c) Plot the residuals and compute the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Do the residuals appear to be white? Suggest an ARIMA model for the residuals and fit the residuals. The simple ARIMA(2, 0, 0) model is a good compromise. (d) Apply the ARIMA model obtained in part (c) to all of the input variables and to cardiovascular mortality using a transformation. Retain the forecast values for the transformed mortality, say mt = Mt − φ1 Mt−1 − φ2 Mt−2.
9. Generate 10 realizations (n = 200 points each) of a series from an ARIMA(1, 0, 1) model with φ1 = 0.90, θ1 = 0.20 and σw² = 0.25. Fit the ARIMA model to each of the series and compare the estimators to the true values by computing the average of the estimators and their standard deviations.
10. Consider the bivariate time series record containing monthly U.S. production, as measured by the Federal Reserve Board Production Index, and unemployment, as given in the file "frb.asd". The file contains n = 372 monthly values for each series. Before you begin, be sure to plot the series. Fit a seasonal
ARIMA model of your choice to the Federal Reserve Production Index. Develop a 12 month forecast using the model.
11. The file labelled "clim-hyd.asd" has 454 months of measured values for the climatic variables Air Temperature, Dew Point, Cloud Cover, Wind Speed, Precipitation, and Inflow at Shasta Lake. We would like to look at possible relations between the weather factors and between the weather factors and the inflow to Shasta Lake. (a) Fit the ARIMA(0, 0, 0) × (0, 1, 1)_{12} model to the transformed precipitation Pt = √pt and the transformed flow it = log(it). Save the residuals for transformed precipitation for use in part (b). (b) Apply the ARIMA model fitted in part (a) for transformed precipitation to the flow series. Compute the cross correlation between the flow residuals using the precipitation ARIMA model and the precipitation residuals using the precipitation model and interpret.
12. Consider the daily simple return of the CRSP equal-weighted index, including distributions, from January 1980 to December 1999 in the file "d-ew8099.txt" (day, ew). The indicator variables for Mondays, Tuesdays, Wednesdays, and Thursdays are in the first four columns of the file "wkdays8099.dat". Use a regression model, possibly with time series errors, to study the effects of trading days on the index return. What is the fitted model? Are the weekday effects significant in the returns at the 5% level? Use the HAC estimator of the covariance matrix to obtain the t-ratios of regression estimates. Does it change the conclusion of weekday effect? Are there serial correlations in the residuals? Use the Ljung-Box test to perform the test. Draw your conclusion. If
yes, build a regression model with time series error to study weekday effects. 13. This problem is concerned with the dynamic relationship between the spot and futures prices of the S&P500 index. The data file “sp5may.dat” has three columns: log(futures price), log(spot price), and cost-of-carry (×100). The data were obtained from the Chicago Mercantile Exchange for the S&P 500 stock index in May 1993 and its June futures contract. The time interval is 1 minute (intraday). Several authors used the data to study index futures arbitrage. Here we focus on the first two columns. Let ft and st be the log prices of futures and spot, respectively. Build a regression model with time series errors between {ft} and {st}, with ft being the dependent variable. You need to provide all details (reasons and analysis results) at each step. 14. The quarterly gross domestic product implicit price deflator is often used as a measure of inflation. The file “q-gdpdef.dat” contains the data for U.S. from the first quarter of 1947 to the first quarter of 2004. The data are seasonally adjusted and equal to 100 for year 2000. The data are obtained from the Federal Reserve Bank of St Louis. Build a (seasonal) ARIMA model for the series and check the validity of the fitted model. Use the model to forecast the gross domestic product implicit price deflator for the rest of 2004. 15. Consider the monthly simple returns of the Decile 1, Decile 5, and Decile 10 of NYSE/AMEX/NASDAQ based on market capitalization. The data span is from January 1960 to December 2003, and the data are obtained from CRSP with the file name:
“m-decile1510.txt”. For each series, test the null hypothesis that the first 12 lags of autocorrelations are zero at the 5% level. Draw your conclusion. Build an AR and MA model for the series Decile 5. Use the AR and MA model built to produce 1-step and 3-step ahead forecasts of the series. Compare the fitted AR and MA models. 16. The data file “q-unemrate.txt” contains the U.S. quarterly unemployment rate, seasonally adjusted, from 1948 to the second quarter of 1991. Consider the change series ∆ xt = xt − xt−1, where xt is the quarterly unemployment rate. Build an AR model for the ∆ xt series. Does the fitted model suggest the existence of business cycles? 17. In this exercise, please construct the impulse response function for a a third order difference equation: yt = φ1 yt−1 + φ2 yt−2 + φ3 yt−3 + wt for 1 ≤ t ≤ n, where it is assumed that {wt}nt=1 is a sequence of deterministic numbers, say generated from N (0, 1). (a) Set φ1 = 1.1, φ2 = −0.8, φ3 = 0.1, y0 = y−1 = y−2 = 0, and n = 150. Generate yt using a third order difference equation for 1 ≤ t ≤ n. (b) Check eigenvalues of this model to determine whether IRF are converging or explosive. (c) Construct the impulse response function for the generated yt. Set the number of periods in the impulse response function to J = 25. Comment your results. (d) Set φ1 = 1.71 and repeat steps (a) - (c). Comment your results.
3.15 Computer Code
The following R commands are used for making the graphs in this chapter. # 3-28-2006 graphics.off()
############################################################ ibm<-matrix(scan("c:\\teaching\\time series\\data\\m-ibm2697 byrow=T,ncol=1) vw<-matrix(scan("c:\\teaching\\time series\\data\\m-vw2697.t byrow=T,ncol=1) n=length(ibm) ibm1=ibm ibm2=log(ibm1+1) vw1=vw vw2=log(1+vw1)
postscript(file="c:\\teaching\\time series\\figs\\fig-2.1.ep horizontal=F,width=6,height=6) par(mfrow=c(2,2),mex=0.4) acf(ibm1, ylab="", xlab="",ylim=c(-0.2,0.2),lag=100, main="Simple Returns",cex=0.5) text(50,0.2,"IBM") acf(ibm2, ylab="", xlab="",ylim=c(-0.2,0.2),lag=100, main="Log Returns",cex=0.5) text(50,0.2,"IBM") acf(vw1, ylab="", xlab="",ylim=c(-0.2,0.2),lag=100, main="Simple Returns",cex=0.5)
text(50,0.2,"value-weighted index") acf(vw2, ylab="", xlab="",ylim=c(-0.2,0.2),lag=100, main="Log Returns",cex=0.5) text(50,0.2,"value-weighted index") dev.off() ############################################################
############################################################ y1<-matrix(scan("c:\\teaching\\time series\\data\\ngtemp.dat y=y1[,1] n=length(y) a<-1:12 a=a/12 y=y1[,1] n=length(y) x<-rep(0,n) for(i in 1:149){ x[((i-1)*12+1):(12*i)]<-1856+i-1+a } x[n-1]<-2005+1/12 x[n]=2005+2/13 x=x/100 x1=cbind(rep(1,n),x) z=t(x1)%*%x1 fit1=lm(y~x) # fit a regression model resid1=fit1$resid # obatin residuls sigma2=mean(resid1^2) y.diff=diff(y) # compute difference
var_beta=sigma2*solve(z)
postscript(file="c:\\teaching\\time series\\figs\\fig-2.2.ep horizontal=F,width=6,height=6) par(mfrow=c(2,2),mex=0.4) acf(resid1,lag.max=20,ylab="",ylim=c(-0.5,1), main="Detrended Temperature",cex=0.5) text(5,0.8,"ACF") pacf(resid1,lag.max=20,ylab="",ylim=c(-0.5,1),main="") text(5,0.8,"PACF") acf(y.diff,lag.max=20,ylab="",ylim=c(-0.5,1), main="Differenced Temperature",cex=0.5) text(5,0.8,"ACF") pacf(y.diff,lag.max=20,ylab="",ylim=c(-0.5,1),main="") text(5,0.8,"PACF") dev.off()
############################################################ # simulate an I(1) series n=200 #y=arima.sim(list(order=c(0,1,0)),n=200) x2=rnorm(n) y=diffinv(x)
# simulate the in # a white noise s # simulate I(1) w
postscript(file="c:\\teaching\\time series\\figs\\fig-2.3.ep horizontal=F,width=6,height=6) par(mfrow=c(1,2),mex=0.4) ts.plot(y,type="l",lty=1,ylab="",xlab="")
text(100,0.8*max(y),"Random Walk") ts.plot(x2,type="l",ylab="",xlab="") text(100,0.8*max(x2),"First Difference") abline(0,0) dev.off()
postscript(file="c:\\teaching\\time series\\figs\\fig-2.4.ep horizontal=F,width=6,height=6) par(mfrow=c(2,2),mex=0.4) acf(y,ylab="",xlab="",main="Random Walk",cex=0.5,ylim=c(-0.5 text(15,0.8,"ACF") pacf(y,ylab="",xlab="lag",main="",ylim=c(-0.5,1.0)) text(15,0.8,"PACF") acf(x2,ylab="",xlab="",main="First Difference",cex=0.5,ylim= text(15,0.8,"ACF") pacf(x2,ylab="",xlab="lag",main="",ylim=c(-0.5,1.0)) text(15,0.8,"PACF") dev.off()
############################################################ # This is Example 2.5 in Chapter 2 ###################################
x<-read.table("c:\\teaching\\time series\\data\\soi.dat",hea x.soi=x[,1] n=length(x.soi) aicc=0 if(aicc==1){ aic.value=rep(0,30)
# max.lag=10
aicc.value=aic.value sigma.value=rep(0,30) for(i in 1:30){ fit3=arima(x.soi,order=c(i,0,0)) # fit an AR(i) aic.value[i]=fit3$aic/n-2 # compute AIC sigma.value[i]=fit3$sigma2 # obtain the estimated sigma^2 aicc.value[i]=log(sigma.value[i])+(n+i)/(n-i-2) # compute A print(c(i,aic.value[i],aicc.value[i]))} data=cbind(aic.value,aicc.value) write(t(data),"c:\\teaching\\time series\\soi_aic.dat",ncol= }else{ data<-matrix(scan("c:\\teaching\\time series\\soi_aic.dat"), } text4=c("AIC", "AICC")
postscript(file="c:\\teaching\\time series\\figs\\fig-2.5.ep horizontal=F,width=6,height=6) par(mfrow=c(1,2),mex=0.4) acf(resid1,ylab="",xlab="",lag.max=20,ylim=c(-0.5,1),main="" text(10,0.8,"ACF of residuls of AR(1) for SOI") matplot(1:30,data,type="b",pch="o",col=c(1,2),ylab="",xlab=" legend(16,-1.40,text4,lty=1,col=c(1,2)) dev.off() #fit2=arima(x.soi,order=c(16,0,0)) #print(fit2)
############################################################ # This is Example 2.7 in Chapter 2
#################################### varve<-read.table("c:\\teaching\\time series\\data\\mass2.da varve=varve[,1] n_varve=length(varve) varve_log=log(varve) varve_log_diff=diff(varve_log)
postscript(file="c:\\teaching\\time series\\figs\\fig-2.6.ep horizontal=F,width=6,height=6) par(mfrow=c(2,2),mex=0.4) acf(varve_log,ylab="",xlab="",lag.max=30,ylim=c(-0.5,1),main text(10,0.7,"log varves",cex=0.7) pacf(varve_log,ylab="",xlab="",lag.max=30,ylim=c(-0.5,1),mai acf(varve_log_diff,ylab="",xlab="",lag.max=30,ylim=c(-0.5,1) text(10,0.7,"First difference",cex=0.7) pacf(varve_log_diff,ylab="",xlab="",lag.max=30,ylim=c(-0.5,1 dev.off()
############################################################ # This is Example 2.9 in Chapter 2 #################################### x<-matrix(scan("c:\\teaching\\time series\\data\\birth.dat") n=length(x) x_diff=diff(x) x_diff_12=diff(x_diff,lag=12)
fit1=arima(x,order=c(0,0,0),seasonal=list(order=c(0,0,0)),in resid_1=fit1$resid fit2=arima(x,order=c(0,1,0),seasonal=list(order=c(0,0,0)),in resid_2=fit2$resid
fit3=arima(x,order=c(0,1,0),seasonal=list(order=c(0,1,0),period=12),include.mean=F)
resid_3=fit3$resid
postscript(file="c:\\teaching\\time series\\figs\\fig-2.8.eps",horizontal=F,width=6,height=6)
par(mfrow=c(5,2),mex=0.4)
acf(resid_1, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="ACF")
pacf(resid_1,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="PACF")
text(20,0.7,"data",cex=1.2)
acf(resid_2, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")   # differenced data
pacf(resid_2,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,0)")
acf(resid_3, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")   # seasonal difference of differenced data
pacf(resid_3,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,0)X(0,1,0)_{12}",cex=0.8)
fit4=arima(x,order=c(0,1,0),seasonal=list(order=c(0,1,1),period=12),include.mean=F)
resid_4=fit4$resid
fit5=arima(x,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=12),include.mean=F)
resid_5=fit5$resid
acf(resid_4, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="") # ARIMA(0,1,0)*(0,1,1)_12 pacf(resid_4,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="") text(30,0.7,"ARIMA(0,1,0)X(0,1,1)_{12}",cex=0.8)
acf(resid_5, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="") # ARIMA(0,1,1)*(0,1,1)_12 pacf(resid_5,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="") text(30,0.7,"ARIMA(0,1,1)X(0,1,1)_{12}",cex=0.8) dev.off()
postscript(file="c:\\teaching\\time series\\figs\\fig-2.7.eps",horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4)
ts.plot(x,type="l",lty=1,ylab="",xlab="")
text(250,375, "Births")
ts.plot(x_diff,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))
text(255,45, "First difference")
abline(0,0)
ts.plot(x_diff_12,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))
# time series plot of the seasonal difference (s=12) of differenced data
text(225,40,"ARIMA(0,1,0)X(0,1,0)_{12}")
abline(0,0)
ts.plot(resid_5,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))
text(225,40, "ARIMA(0,1,1)X(0,1,1)_{12}")
abline(0,0)
dev.off()
############################################################
############################################################ # This is Example 2.10 in Chapter 2 #####################################
y<-matrix(scan("c:\\teaching\\time series\\data\\jj.dat"),byrow=T,ncol=1)
n=length(y)
y_log=log(y)                  # log of data
y_diff=diff(y_log)            # first-order difference
y_diff_4=diff(y_diff,lag=4)   # first-order seasonal difference
fit1=ar(y_log,order=1)      # fit AR(1) model
#print(fit1)
library(tseries)            # call library(tseries)
library(zoo)
fit1_test=adf.test(y_log)   # do Augmented Dickey-Fuller test for testing unit root
#print(fit1_test)
fit1=arima(y_log,order=c(0,0,0),seasonal=list(order=c(0,0,0)),include.mean=F)
resid_21=fit1$resid
fit2=arima(y_log,order=c(0,1,0),seasonal=list(order=c(0,0,0)),include.mean=F)
resid_22=fit2$resid         # residual for ARIMA(0,1,0)*(0,0,0)
fit3=arima(y_log,order=c(0,1,0),seasonal=list(order=c(1,0,0),period=4),include.mean=F,method=c("CSS"))
resid_23=fit3$resid         # residual for ARIMA(0,1,0)*(1,0,0)_4
# note that this model is non-stationary so that "CSS" is used
postscript(file="c:\\teaching\\time series\\figs\\fig-2.9.eps",horizontal=F,width=6,height=6)
par(mfrow=c(4,2),mex=0.4)
acf(resid_21, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF")
text(16,0.8,"log(J&J)")
pacf(resid_21,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF")
acf(resid_22, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(16,0.8,"First Difference")
pacf(resid_22,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
acf(resid_23, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(16,0.8,"ARIMA(0,1,0)X(1,0,0,)_4",cex=0.8)
pacf(resid_23,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
fit4=arima(y_log,order=c(0,1,1),seasonal=list(order=c(1,0,0),period=4),include.mean=F,method=c("CSS"))
resid_24=fit4$resid         # residual for ARIMA(0,1,1)*(1,0,0)_4
# note that this model is non-stationary
#print(fit4)
fit4_test=Box.test(resid_24,lag=12, type=c("Ljung-Box"))
#print(fit4_test)
acf(resid_24, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(16,0.8,"ARIMA(0,1,1)X(1,0,0,)_4",cex=0.8)   # ARIMA(0,1,1)*(1,0,0)_4
pacf(resid_24,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
fit5=arima(y_log,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=4),include.mean=F,method=c("ML"))
resid_25=fit5$resid         # residual for ARIMA(0,1,1)*(0,1,1)_4
#print(fit5)
fit5_test=Box.test(resid_25,lag=12, type=c("Ljung-Box"))
#print(fit5_test)
postscript(file="c:\\teaching\\time series\\figs\\fig-2.10.eps",horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4)
acf(resid_25, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF")
text(16,0.8,"ARIMA(0,1,1)X(0,1,1,)_4",cex=0.8)   # ARIMA(0,1,1)*(0,1,1)_4
pacf(resid_25,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF")
ts.plot(resid_24,type="l",lty=1,ylab="",xlab="")
title(main="Residual Plot",cex=0.5)
text(40,0.2,"ARIMA(0,1,1)X(1,0,0,)_4",cex=0.8)
abline(0,0)
ts.plot(resid_25,type="l",lty=1,ylab="",xlab="")
title(main="Residual Plot",cex=0.5)
text(40,0.18,"ARIMA(0,1,1)X(0,1,1,)_4",cex=0.8)
abline(0,0)
dev.off()
############################################################
############################################################ # This is Example 2.11 in Chapter 2
z<-matrix(scan("c:\\teaching\\time series\\data\\m-decile151 decile1=z[,2]
# Model 1: an ARIMA(1,0,0)*(1,0,1)_12
fit1=arima(decile1,order=c(1,0,0),seasonal=list(order=c(1,0,1),period=12),include.mean=T)
#print(fit1)
e1=fit1$resid
n=length(decile1)
m=n/12
jan=rep(c(1,0,0,0,0,0,0,0,0,0,0,0),m)
feb=rep(c(0,1,0,0,0,0,0,0,0,0,0,0),m)
mar=rep(c(0,0,1,0,0,0,0,0,0,0,0,0),m)
apr=rep(c(0,0,0,1,0,0,0,0,0,0,0,0),m)
may=rep(c(0,0,0,0,1,0,0,0,0,0,0,0),m)
jun=rep(c(0,0,0,0,0,1,0,0,0,0,0,0),m)
jul=rep(c(0,0,0,0,0,0,1,0,0,0,0,0),m)
aug=rep(c(0,0,0,0,0,0,0,1,0,0,0,0),m)
sep=rep(c(0,0,0,0,0,0,0,0,1,0,0,0),m)
oct=rep(c(0,0,0,0,0,0,0,0,0,1,0,0),m)
nov=rep(c(0,0,0,0,0,0,0,0,0,0,1,0),m)
dec=rep(c(0,0,0,0,0,0,0,0,0,0,0,1),m)
de=cbind(decile1[jan==1],decile1[feb==1],decile1[mar==1],decile1[apr==1],
decile1[may==1],decile1[jun==1],decile1[jul==1],decile1[aug==1],
decile1[sep==1],decile1[oct==1],decile1[nov==1],decile1[dec==1])
# Model 2: a simple regression model without correlated errors
# to see the effect from January
fit2=lm(decile1~jan)
e2=fit2$resid
#print(summary(fit2))
# Model 3: a regression model with correlated errors
fit3=arima(decile1,xreg=jan,order=c(0,0,1),include.mean=T)
e3=fit3$resid
#print(fit3)
postscript(file="c:\\teaching\\time series\\figs\\fig-2.11.eps",horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4)
ts.plot(decile1,type="l",lty=1,col=1,ylab="",xlab="")
title(main="Simple Returns",cex=0.5)
abline(0,0)
ts.plot(e3,type="l",lty=1,col=1,ylab="",xlab="")
title(main="January-adjusted returns",cex=0.5)
abline(0,0)
acf(decile1, ylab="", xlab="",ylim=c(-0.5,1),lag=40,main="ACF")
acf(e3,ylab="",xlab="",ylim=c(-0.5,1),lag=40,main="ACF")
dev.off()
############################################################
# This is Example 2.12 in Chapter 2
#####################################
z<-matrix(scan("c:\\teaching\\time series\\data\\jj.dat"),byrow=T,ncol=1)
n=length(z)
z_log=log(z)          # log of data
# MODEL 1: y_t=beta_0+beta_1 t+ e_t
z1=1:n
fit1=lm(z_log~z1)     # fit log(z) versus time
e1=fit1$resid
# Now, we need to re-fit the model using the transformed data
x1=5:n
y_1=z_log[5:n]
y_2=z_log[1:(n-4)]
y_fit=y_1-0.7614*y_2
x2=x1-0.7614*(x1-4)
x1=(1-0.7614)*rep(1,n-4)
fit2=lm(y_fit~-1+x1+x2)
e2=fit2$resid
postscript(file="c:\\teaching\\time series\\figs\\fig-2.12.eps",horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4)
acf(e1, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF")
text(10,0.8,"detrended")
pacf(e1,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF") acf(e2, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="") text(15,0.8,"ARIMA(1,0,0,)_4") pacf(e2,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="") dev.off() ############################################################
############################################################ # This is Example 2.13 in Chapter 2 #####################################
z<-matrix(scan("c:\\teaching\\time series\\data\\w-gs1n36299 byrow=T,ncol=3) # first column=one year Treasury constant maturity rate; # second column=three year Treasury constant maturity rate; # third column=date x=z[,1] y=z[,2] n=length(x) u=seq(1962+1/52,by=1/52,length=n) x_diff=diff(x) y_diff=diff(y) # Fit a simple regression model and examine the residuals fit1=lm(y~x) # Model 1 e1=fit1$resid
postscript(file="c:\\teaching\\time series\\figs\\fig-2.13.eps",horizontal=F,width=6,height=6)
matplot(u,cbind(x,y),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")
dev.off()
postscript(file="c:\\teaching\\time series\\figs\\fig-2.14.eps",horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4)
plot(x,y,type="p",pch="o",ylab="",xlab="",cex=0.5)
plot(x_diff,y_diff,type="p",pch="o",ylab="",xlab="",cex=0.5)
dev.off()
postscript(file="c:\\teaching\\time series\\figs\\fig-2.15.eps",horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4)
plot(u,e1,type="l",lty=1,ylab="",xlab="")
abline(0,0)
acf(e1,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
# Take differences and fit a simple regression again
fit2=lm(y_diff~x_diff)    # Model 2
e2=fit2$resid
postscript(file="c:\\teaching\\time series\\figs\\fig-2.16.eps",horizontal=F,width=6,height=6)
matplot(u[-1],cbind(x_diff,y_diff),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")
abline(0,0)
dev.off()
postscript(file="c:\\teaching\\time series\\figs\\fig-2.17.eps",horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4)
ts.plot(e2,type="l",lty=1,ylab="",xlab="")
abline(0,0)
acf(e2, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
# fit a model to the differenced data with an MA(1) error
fit3=arima(y_diff,xreg=x_diff, order=c(0,0,1))   # Model 3
e3=fit3$resid
ts.plot(e3,type="l",lty=1,ylab="",xlab="")
abline(0,0)
acf(e3, ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
############################################################
############################################################ # This is Example 14 of using HC and HAC ##########################################
library(sandwich) # HC and HAC are in the package "sandwich library(zoo) z<-matrix(scan("c:\\teaching\\time series\\data\\w-gs1n36299 byrow=T,ncol=3) x=z[,1] y=z[,2] x_diff=diff(x) y_diff=diff(y) # Fit a simple regression model and examine the residuals fit1=lm(y_diff~x_diff) print(summary(fit1)) e1=fit1$resid # Heteroskedasticity-Consistent Covariance Matrix Estimation
#hc0=vcovHC(fit1,type="const")
#print(sqrt(diag(hc0)))
# type=c("const","HC","HC0","HC1","HC2","HC3","HC4")
# HC0 is the White estimator
hc1=vcovHC(fit1,type="HC0")
print(sqrt(diag(hc1)))
# Heteroskedasticity and autocorrelation consistent (HAC) estimation
# of the covariance matrix of the coefficient estimates in a
# (generalized) linear regression model.
hac1=vcovHAC(fit1,sandwich=T)
print(sqrt(diag(hac1)))
############################################################ # This is the Example 2.15 in Chapter 2 ####################################### z1<-matrix(scan("c:\\teaching\\time series\\data\\d-ibmvwew byrow=T,ncol=5) vw=abs(z1[,3]) n_vw=length(vw) ew=abs(z1[,4]) postscript(file="c:\\teaching\\time series\\figs\\fig-2.18.e horizontal=F,width=6,height=6) par(mfrow=c(2,2),mex=0.4,bg="light green") acf(vw, ylab="",xlab="",ylim=c(-0.1,0.4),lag=400,main="") text(200,0.38,"ACF for value-weighted index") acf(ew, ylab="",xlab="",ylim=c(-0.1,0.4),lag=400,main="") text(200,0.38,"ACF for equal-weighted index") library(fracdiff) d1=fracdiff(vw,ar=0,ma=0)
d2=fracdiff(ew,ar=0,ma=0) print(c(d1$d,d2$d))
m1=round(log(n_vw)/log(2)+0.5) pad1=1-n_vw/2^m1 vw_spec=spec.pgram(vw,spans=c(3,3,3),demean=T,detrend=T,pad= ew_spec=spec.pgram(ew,spans=c(3,3,3),demean=T,detrend=T,pad= vw_x=vw_spec$freq[1:1000] vw_y=vw_spec$spec[1:1000] ew_x=ew_spec$freq[1:1000] ew_y=ew_spec$spec[1:1000] scatter.smooth(vw_x,log(vw_y),span=1/15,ylab="",xlab="",col= text(0.04,-7,"Log Spectral Density of VW",cex=0.8) scatter.smooth(ew_x,log(ew_y),span=1/15,ylab="",xlab="",col= text(0.04,-7,"Log Spectral Density of EW",cex=0.8) dev.off() ############################################################ # This is the Example 2.16 in Chapter 2 ####################################### phi=c(1.2,-0.35,1.0,-0.70,0.2, 0.35,-0.2,0.35) dim(phi)=c(2,4) phi=t(phi)
postscript(file="c:\\teaching\\time series\\figs\\fig-2.19.eps",horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="dark grey")
for(j in 1:4){
rho=rep(0,20)
rho[1]=1
rho[2]=phi[j,1]/(1-phi[j,2])
for(i in 3:20){rho[i]=phi[j,1]*rho[i-1]+phi[j,2]*rho[i-2]} plot(1:20,rho,type="h",ylab="",ylim=c(-1,1),xlab="") if(j==1){title(main="(a)",cex=0.8)} if(j==2){title(main="(b)",cex=0.8)} if(j==3){title(main="(c)",cex=0.8)} if(j==4){title(main="(d)",cex=0.8)} abline(0,0) } dev.off()
z1<-matrix(scan("c:\\teaching\\time series\\data\\q-gnp4791. byrow=T,ncol=1) n=length(z1) x=1:n x=x/4+1946.25
postscript(file="c:\\teaching\\time series\\figs\\fig-2.20.eps",horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light pink")
plot(x,z1,type="o",ylab="",xlab="")
abline(0,0)
acf(z1,main="",ylab="",xlab="",lag=30)
dev.off()
############################################################
# This is for making graphs for IRF
################################################
n=100
w_t1=rnorm(n,0,1)
w_t2=w_t1
w_t2[50]=w_t1[50]+1
y1=rep(0,2*(n+1)) dim(y1)=c(n+1,2) y1[1,]=c(5,5) y2=y1 phi1=c(0.8,-0.8) for(i in 2:(n+1)){ y1[i,1]=phi1[1]*y1[(i-1),1]+w_t1[i-1] y1[i,2]=phi1[1]*y1[(i-1),2]+w_t2[i-1] y2[i,1]=phi1[2]*y2[(i-1),1]+w_t1[i-1] y2[i,2]=phi1[2]*y2[(i-1),2]+w_t2[i-1] } y1=y1[2:101,] y2=y2[2:101,] irf1=y1[,2]-y1[,1] irf2=y2[,2]-y2[,1] text1=c("No Impulse","With Impulse") postscript(file="c:\\teaching\\time series\\figs\\fig-2.21.e horizontal=F,width=6,height=6) par(mfrow=c(2,2),mex=0.4,bg="dark grey") ts.plot(y1,type="l",lty=1,col=c(1,2),ylab="",xlab="") abline(0,0) legend(40,0.9*max(y1[,1]),text1,lty=c(1,2),col=c(1,2),cex=0. text(40,0.8*min(y1[,1]),"phi=0.8") ts.plot(y2,type="l",lty=1,col=c(1,2),ylab="",xlab="") abline(0,0) legend(40,0.9*max(y2[,1]),text1,lty=c(1,2),col=c(1,2),cex=0. text(60,0.8*min(y2[,1]),"phi= - 0.8") plot(1:n,irf1,type="l",ylab="",ylim=c(-1,1),xlab="") abline(0,0) text(40,-0.6,"Impulse Response Function",cex=0.8)
text(20,0.8,"phi=0.8") plot(1:n,irf2,type="l",ylab="",ylim=c(-1,1),xlab="") abline(0,0) text(40,-0.6,"Impulse Response Function",cex=0.8) text(20,0.8,"phi= - 0.8") dev.off()
n=100 w_t1=rnorm(n,0,1) w_t2=w_t1 w_t2[50]=w_t1[50]+1 y1=rep(0,2*(n+1)) dim(y1)=c(n+1,2) y1[1,]=c(3,3) y2=y1 phi1=c(1.01,-1.01) for(i in 2:(n+1)){ y1[i,1]=phi1[1]*y1[(i-1),1]+w_t1[i-1] y1[i,2]=phi1[1]*y1[(i-1),2]+w_t2[i-1] y2[i,1]=phi1[2]*y2[(i-1),1]+w_t1[i-1] y2[i,2]=phi1[2]*y2[(i-1),2]+w_t2[i-1] } y1=y1[2:101,] y2=y2[2:101,] irf1=y1[,2]-y1[,1] irf2=y2[,2]-y2[,1] text1=c("No Impulse","With Impulse") postscript(file="c:\\teaching\\time series\\figs\\fig-2.22.e horizontal=F,width=6,height=6) par(mfrow=c(2,2),mex=0.4,bg="light pink")
ts.plot(y1,type="l",lty=1,col=c(1,2),ylab="",xlab="") abline(0,0) legend(40,0.9*max(y1[,1]),text1,lty=c(1,2),col=c(1,2),cex=0. text(40,0.8*min(y1[,1]),"phi=1.01") ts.plot(y2,type="l",lty=1,col=c(1,2),ylab="",xlab="") abline(0,0) legend(40,0.9*max(y2[,1]),text1,lty=c(1,2),col=c(1,2),cex=0. text(60,0.8*min(y2[,1]),"phi= - 1.01") plot(1:n,irf1,type="l",ylab="",ylim=c(-2,2),xlab="") abline(0,0) text(40,-0.6,"Impulse Response Function",cex=0.8) text(20,0.8,"phi=1.01") plot(1:n,irf2,type="l",ylab="",ylim=c(-2,2),xlab="") abline(0,0) text(40,-0.6,"Impulse Response Function",cex=0.8) text(20,0.8,"phi= - 1.01") dev.off()
x=1:20 phi1=cbind(0.5^x,0.9^x,0.99^x) phi2=cbind((-0.5)^x,(-0.9)^x,(-0.99)^x) postscript(file="c:\\teaching\\time series\\figs\\fig-2.23.e horizontal=F,width=6,height=6) #win.graph() par(mfrow=c(2,2),mex=0.4,bg="light green") matplot(x,phi1,type="o",lty=1,pch="o",ylab="",xlab="") text(4,0.3,"phi1=0.5",cex=0.8) text(15,0.3,"phi1=0.9",cex=0.8) text(15,0.9,"phi1=0.99",cex=0.8) matplot(x,phi2,type="o",lty=1,pch="o",ylab="",xlab="")
abline(0,0) text(4,0.3,"phi1= - 0.5",cex=0.8) text(15,0.3,"phi1= - 0.9",cex=0.8) text(15,0.9,"phi1= - 0.99",cex=0.8) matplot(x,1.2^x,type="o",lty=1,pch="o",ylab="",xlab="") text(14,22,"phi1=1.2",cex=0.8) matplot(x,(-1.2)^x,type="o",lty=1,pch="o",ylab="",xlab="") abline(0,0) text(13,18,"phi1= - 1.2",cex=0.8) dev.off() n=100 w_t1=rnorm(n,0,1) w_t2=w_t1 w_t3=w_t1 w_t2[50]=w_t1[50]+1 w_t3[50:n]=w_t1[50:n]+1 y=rep(0,3*(n+1)) dim(y)=c(n+1,3) y[1,]=c(3,3,3) phi1=0.8 for(i in 2:(n+1)){ y[i,1]=phi1*y[(i-1),1]+w_t1[i-1] y[i,2]=phi1*y[(i-1),2]+w_t2[i-1] y[i,3]=phi1*y[(i-1),3]+w_t3[i-1]} y=y[2:101,1:3] irf1=y[,2]-y[,1] irf2=y[,3]-y[,1] text1=c("No Impulse","With Impulse")
postscript(file="c:\\teaching\\time series\\figs\\fig-2.24.e horizontal=F,width=6,height=6) #win.graph() par(mfrow=c(2,2),mex=0.4,bg="light blue") ts.plot(y[,1:2],type="l",lty=1,col=c(1,2),ylab="",xlab="") abline(0,0) legend(40,0.9*max(y[,1]),text1,lty=c(1,2),col=c(1,2),cex=0.8 text(40,0.8*min(y[,1]),"phi=0.8") ts.plot(cbind(y[,1],y[,3]),type="l",lty=1,col=c(1,2),ylab="" abline(0,0) legend(40,0.9*max(y[,3]),text1,lty=c(1,2),col=c(1,2),cex=0.8 text(10,0.8*min(y[,3]),"phi=0.8") plot(1:n,irf1,type="l",ylab="",xlab="") abline(0,0) text(40,0.6,"Impulse Response Function",cex=0.8) text(20,0.8,"phi=0.8") plot(1:n,irf2,type="l",ylab="",xlab="") abline(0,0) text(40,3,"Impulse Response Function",cex=0.8) text(20,0.8,"phi=0.8") dev.off() ff=c(0.6,0.2,1,0,0.8,0.4,1,0,-0.9,-0.5,1,0,-0.5,-1.5,1,0) dim(ff)=c(4,4) mj=20 x=0:mj irf=rep(0,(mj+1)*4) dim(irf)=c(mj+1,4) irf[1,]=1
for(j in 1:4){ aa=c(1,0,0,1) dim(aa)=c(2,2) ff1=ff[,j] dim(ff1)=c(2,2) ff1=t(ff1) for(i in 1:mj){ aa=aa%*%ff1 irf[i+1,j]=aa[1,1]}}
postscript(file="c:\\teaching\\time series\\figs\\fig-2.25.e horizontal=F,width=6,height=6) #win.graph() par(mfrow=c(2,2),mex=0.4,bg="light yellow") plot(x,irf[,1],type="o",pch="o",ylab="",ylim=c(0,1),xlab="", main="(a)",cex=0.8) text(10,0.8,"Impulse Response Function",cex=0.8) text(12,0.3,"phi1=0.6 and phi2=0.2",cex=0.8) plot(x,irf[,2],type="o",pch="o",ylab="",ylim=c(0,12),xlab="" main="(b)",cex=0.8) text(10,10,"Impulse Response Function",cex=0.8) text(9,6,"phi1=0.8 and phi2=0.4",cex=0.8) plot(x,irf[,3],type="o",pch="o",ylab="",ylim=c(-1,1),xlab="" main="(c)",cex=0.8) abline(0,0) text(10,0.8,"Impulse Response Function",cex=0.8) text(11,-0.3,"phi1= - 0.9 and phi2= - 0.5",cex=0.8) plot(x,irf[,4],type="o",pch="o",ylab="",ylim=c(-40,30),xlab= main="(d)",cex=0.8) abline(0,0)
text(10,25,"Impulse Response Function",cex=0.8) text(9,-16,"phi1= - 0.5 and phi2= - 1.5",cex=0.8) dev.off() ############################################################
3.16 References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings of the 2nd International Symposium on Information Theory (V. Petrov and F. Csáki, eds.), 267-281. Akadémiai Kiadó, Budapest.
Andrews, D.W.K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59, 817-858.
Bai, Z., C.R. Rao and Y. Wu (1999). Model selection with data-oriented penalty. Journal of Statistical Planning and Inference, 77, 103-117.
Beran, J. (1994). Statistics for Long-Memory Processes. Chapman and Hall, London.
Box, G.E.P. and G.M. Jenkins (1970). Time Series Analysis, Forecasting, and Control. Holden Day, San Francisco.
Box, G.E.P., G.M. Jenkins and G.C. Reinsel (1994). Time Series Analysis, Forecasting and Control, 3rd Edn. Englewood Cliffs, NJ: Prentice-Hall.
Brockwell, P.J. and R.A. Davis (1991). Time Series: Theory and Methods. New York: Springer.
Burman, P. and R.H. Shumway (1998). Semiparametric modeling of seasonal time series. Journal of Time Series Analysis, 19, 127-145.
Burnham, K.P. and D. Anderson (2003). Model Selection and Multi-Model Inference: A Practical Information Theoretic Approach, 2nd edition. New York: Springer-Verlag.
Cai, Z. and R. Chen (2006). Flexible seasonal time series models. Advances in Econometrics, 20B, 63-87.
Chan, N.H. and C.Z. Wei (1988). Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics, 16, 367-401.
Cochrane, D. and G.H. Orcutt (1949). Applications of least squares regression to relationships containing autocorrelated errors. Journal of the American Statistical Association, 44, 32-61.
Ding, Z., C.W.J. Granger and R.F. Engle (1993). A long memory property of stock returns and a new model. Journal of Empirical Finance, 1, 83-106. Eicker, F. (1967). Limit theorems for regression with unequal and dependent errors. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (L. LeCam and J. Neyman, eds.), University of California Press, Berkeley. Fan and Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360. Fan and Peng (2004). Nonconcave penalized likelihood with a diverging number of parameters. Annals of Statistics, 32, 928-961. Frank, I.E. and J.H. Friedman (1993). A statistical view of some chemometric regression tools (with discussion). Technometrics, 35, 109-148. Franses, P.H. (1996). Periodicity and Stochastic Trends in Economic Time Series. New York: Cambridge University Press. Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. New York: Cambridge University Press. Franses, P.H. and D. van Dijk (2000). Nonlinear Time Series Models for Empirical Finance. New York: Cambridge University Press. Ghysels, E. and D.R. Osborn (2001). The Econometric Analysis of Seasonal Time Series. New York: Cambridge University Press. Hurvich, C.M. and C.-L.Tsai (1989). Regression and time series model selection in small samples. Biometrika, 76, 297-307. Newey, W.K. and K.D. West (1987). A simple, positive-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703-708. Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464. Shao (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88, 486-494. Shen, X.T. and J.M. Ye (2002). Adaptive model selection. Journal of the American Statistical Association, 97, 210-221. Shumway, R.H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs, NJ: Prentice-Hall. Shumway, R.H., A.S. Azari and Y. Pawitan (1988). Modeling mortality fluctuations in Los Angeles as functions of pollution and weather effects. Environmental Research, 45, 224-241.
Shumway, R.H. and D.S. Stoffer (2000). Time Series Analysis & Its Applications. New York: Springer-Verlag.
Tiao, G.C. and R.S. Tsay (1983). Consistency properties of least squares estimates of autoregressive parameters in ARMA models. Annals of Statistics, 11, 856-871.
Tibshirani, R.J. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons, New York.
Walker, G. (1931). On the periodicity in series of related terms. Proceedings of the Royal Society of London, Series A, 131, 518-532.
Yule, G.U. (1927). On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philosophical Transactions of the Royal Society of London, Series A, 226, 267-298.
Chapter 4 Non-stationary Processes and Structural Breaks
4.1 Introduction
In our analysis so far, we have assumed that the variables in the models we have analyzed, univariate ARMA models, are stationary: yt = µ + ψ(B) et, where yt may be a vector of n variables at time period t. A glance at graphs of most economic time series suffices to reveal the invalidity of that assumption, because economies evolve, grow, and change over time in both real and nominal terms, and economic forecasts are often very wrong, even though large forecast errors should occur relatively infrequently in a stationary process. The practical problem that an econometrician faces is to find relationships that survive for relatively long periods of time so that they can be used for forecasting and policy analysis. Hendry and Juselius (2000) pointed out that four issues immediately arise when dealing with nonstationarity: 1. How important is the assumption of stationarity for modeling and inference? It is very important. When data
means and variances are non-constant, observations come from different distributions over time, creating difficult problems for empirical modeling. 2. What is the effect of incorrectly assuming it? It is potentially hazardous. Assuming constant means and variances when that is false can induce serious statistical mistakes. If the variables in yt are not stationary, then conventional hypothesis tests, confidence intervals and forecasts can be unreliable. Standard asymptotic distribution theory often does not apply to regressions involving variables with unit roots, and inference may be misleading if this is ignored. 3. What are the sources of non-stationarity? There are many and they are varied. Non-stationarity may be due to evolution of the economy, legislative changes, technological changes, political events, etc. 4. Can empirical analysis be transformed so stationarity becomes a valid assumption? It is sometimes possible, depending on the source of non-stationarity. Some forms of non-stationarity can be eliminated by transformations such as de-trending and differencing, or some other type of transformation; see Park and Phillips (1999) and Chang, Park and Phillips (2001) for more discussions.
Trending Time Series
A non-stationary process is one which violates the stationarity requirement, so its means and variances are non-constant over time. A trend is a persistent long-term movement of a variable over time
and a time series fluctuates around its trend. The common features of a trending time series can be summarized as the following situations:
1. Stochastic trends. A stochastic trend is random and varies over time: (1 − L)yt = δ + ψ(L)et. That is, after taking the difference, the time series is modelled as a linear process.
2. Deterministic trends. A deterministic trend is a nonrandom function of time. For example, a deterministic trend might be linear or quadratic or of higher order in time: yt = a + δ t + ψ(L)et. The difference between a linear stochastic trend and a deterministic trend is that the changes of a stochastic trend are random, whereas those of a deterministic trend are constant over time.
3. Permanence of shocks. Macroeconomists used to detrend data and regarded business cycles as the stationary deviations from that trend. Economists investigated whether GNP is better described as a random walk or a trend stationary process.
4. Statistical issues. We could mistake a time series with unit roots for a trend stationary time series, since a time series with unit roots might display a trending phenomenon.
The trending time series models have gained a lot of attention during the last two decades due to many applications in economics
and finance. See Cai (2006) for more references. The following are some examples. The market model in finance is an example that relates the return of an individual stock to the return of a market index or another individual stock, and the coefficient is usually called a beta-coefficient in the capital asset pricing model (CAPM); see the books by Cochrane (2001) and Tsay (2005) for more details. However, some recent studies show that the beta-coefficients might vary over time. The term structure of interest rates is another example, in which the time evolution of the relationship between interest rates with different maturities is investigated; see Tsay (2005). The last example is the relationship between electricity demand and other variables such as income or production, the real price of electricity, and the temperature; Chang and Martinez-Chombo (2003) found that this relationship may change over time. Although the literature is already vast and continues to grow swiftly, as pointed out by Phillips (2001), the research in this area is just beginning.
4.2 Random Walks
The basic random walk is yt = yt−1 + et with Et−1(et) = 0, where Et−1(·) is the conditional expectation given the past information up to time t − 1, which implies that Et(yt+1) = yt. Random walks have a number of interesting properties: 1. The impulse-response function (IRF) of a random walk is one at all horizons. The IRF of a stationary process dies out eventually.
2. The forecast variance of the random walk grows linearly with the forecast horizon: Var(y_{t+k} | y_t) = Var(y_{t+k} − y_t) = k σ_e².
3. The autocovariances of a random walk are defined in Section 3.4.
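The second property can be checked by simulation. The following small R sketch (mine, not part of the original notes; the sample sizes and horizons are arbitrary) simulates the increments of a random walk over several horizons and verifies that the variance of y_{t+k} − y_t grows roughly linearly in k:
# check that Var(y_{t+k}-y_t) is roughly k*sigma_e^2 for a random walk
set.seed(123)
nsim=2000
sigma2=1
for(k in c(1,5,10,20)){
  incr=replicate(nsim, sum(rnorm(k,0,sqrt(sigma2))))  # y_{t+k}-y_t is the sum of k shocks
  print(c(k, var(incr)))                              # second entry should be close to k*sigma2
}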
Statistical Issues
Suppose a series is generated by a random walk: yt = yt−1 + et. You might test for a random walk by running the regression yt = µ + φ yt−1 + et by ordinary least squares (OLS) and testing whether φ = 1. However, this is not correct, since the assumptions underlying the usual asymptotic theory for OLS estimates and test statistics are violated. Exercise: Why does the usual asymptotic theory for OLS estimates not apply here?
4.2.1 Inappropriate Detrending
Things get even more complicated with a trend in the model. Suppose the true model is yt = µ + yt−1 + et. Suppose you detrend the model and then fit an AR(1) model, i.e. the fitted model is: (1 − φ L)(yt − b t) = et.
But the above model can be written as follows: yt = α + γ t + φ yt−1 + et, so you could also run y directly on a time trend and lagged y. In this case, φ̂ is biased downward (under-estimated) and the standard OLS errors are misleading.
4.2.2 Spurious (Nonsense) Regressions
Suppose two series are generated by random walks:
y_t = y_{t−1} + e_t,   x_t = x_{t−1} + v_t,   E(e_t v_s) = 0 for all t, s.
Now, suppose you run yt on xt by OLS: yt = α + β xt + ut. The assumptions for classical regression are violated and we tend to see "significant" β more often than OLS formulas say we should. Exercise: Please conduct a Monte Carlo simulation to verify the above conclusion.
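For the exercise above, a minimal Monte Carlo sketch (my own illustration; the sample size and number of replications are arbitrary) might look as follows:
# spurious regression: two independent random walks, nominal 5% t-test on beta
set.seed(123)
nsim=500; T=200; reject=0
for(i in 1:nsim){
  y=cumsum(rnorm(T))                        # random walk y_t
  x=cumsum(rnorm(T))                        # independent random walk x_t
  tstat=summary(lm(y~x))$coefficients[2,3]  # t-statistic for beta
  if(abs(tstat)>1.96) reject=reject+1
}
print(reject/nsim)                          # rejection rate far above 0.05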
4.3 Unit Root and Stationary Processes
A more general process than pure random walk may have the following form: (1 − L) yt = µ + ψ1(L) et.
These are called unit root or difference stationary (DS) processes. In the simplest case ψ1(L) = 1, the DS process becomes a random walk with drift: yt = µ + yt−1 + et.
Alternatively, we may consider a process to be stationary around a linear trend: yt = µ t + ψ2(L) et. This process is called a trend stationary (TS) process. The TS process can be considered as a special case of the DS model. Indeed, we can write the TS model as: (1 − L)yt = (1 − L) µ t + (1 − L)ψ2(L)et = µ + ψ1(L)et. Therefore, if the TS model is correct, the DS model is still valid and stationary. For more studies on the TS models, we refer the reader to the papers by Phillips (2001) and Cai (2006). One can think about unit roots as the study of the implications for levels of a process that is stationary in differences. Therefore, it is very important to keep track of whether you are thinking about the level of the process yt or its first difference. Let us examine the impulse response function (IRF) for the above two models. For the TS model, the IRF is determined by the MA polynomial ψ2(L), i.e. b_j is the j-th period ahead response. For the DS model, a_j gives the response of the difference (1 − L)y_{t+j} to a shock at time t. The response of the level of the series y_{t+j} is the sum of the responses of the differences. The response of y_{t+j} to a shock at period t is: IRF_j = (y_t − y_{t−1}) + (y_{t+1} − y_t) + ⋯ + (y_{t+j} − y_{t+j−1}) = a_0 + a_1 + ⋯ + a_j.
4.3.1 Comparison of Forecasts of TS and DS Processes
The forecast of a trend-stationary process is as follows:
ŷ^t_{t+s} = µ(t + s) + b_s e_t + b_{s+1} e_{t−1} + b_{s+2} e_{t−2} + ⋯.
The forecast of a difference stationary process can be written as follows:
ŷ^t_{t+s} = Δŷ^t_{t+s} + Δŷ^t_{t+s−1} + ⋯ + Δŷ^t_{t+1} + y_t
         = (µ + b_s e_t + b_{s+1} e_{t−1} + b_{s+2} e_{t−2} + ⋯)
         + (µ + b_{s−1} e_t + b_s e_{t−1} + b_{s+1} e_{t−2} + ⋯) + ⋯
         + (µ + b_1 e_t + b_2 e_{t−1} + b_3 e_{t−2} + ⋯) + y_t
or
ŷ^t_{t+s} = µ s + y_t + (b_s + b_{s−1} + ⋯ + b_1) e_t + (b_{s+1} + b_s + ⋯ + b_2) e_{t−1} + ⋯.
To see the difference between forecasts for TS and DS processes, we consider a case in which b_1 = b_2 = ⋯ = 0. Then,
TS:  ŷ^t_{t+s} = µ(t + s),
DS:  ŷ^t_{t+s} = µ s + y_t.
Next, compare the forecast errors for the TS and DS processes. For the TS process:
y_{t+s} − ŷ^t_{t+s} = (µ(t + s) + e_{t+s} + b_1 e_{t+s−1} + ⋯ + b_{s−1} e_{t+1} + b_s e_t + ⋯) − (µ(t + s) + b_s e_t + b_{s+1} e_{t−1} + ⋯) = e_{t+s} + b_1 e_{t+s−1} + ⋯ + b_{s−1} e_{t+1}.
The MSE of this forecast is
E[(y_{t+s} − ŷ^t_{t+s})²] = (1 + b_1² + b_2² + ⋯ + b_{s−1}²) σ².
For the DS process:
y_{t+s} − ŷ^t_{t+s} = (Δy_{t+s} + ⋯ + Δy_{t+1} + y_t) − (Δŷ^t_{t+s} + ⋯ + Δŷ^t_{t+1} + y_t) = e_{t+s} + (1 + b_1) e_{t+s−1} + (1 + b_1 + b_2) e_{t+s−2} + ⋯ + (1 + b_1 + b_2 + ⋯ + b_{s−1}) e_{t+1}.
The MSE of this forecast is
E[(y_{t+s} − ŷ^t_{t+s})²] = [1 + (1 + b_1)² + (1 + b_1 + b_2)² + ⋯ + (1 + b_1 + ⋯ + b_{s−1})²] σ².
Note that the MSE for a TS process grows as the forecasting horizon increases, but as s becomes large the added uncertainty from forecasting further into the future becomes negligible:
lim_{s→∞} E[(y_{t+s} − ŷ^t_{t+s})²] = (1 + b_1² + b_2² + ⋯) σ²
and the limiting MSE is just the unconditional variance of the stationary component ψ2(L)et. This is not true for the DS process. The MSE for a DS process does not converge to any fixed value as s goes to infinity. To summarize, for a TS process the MSE reaches a finite bound as the forecast horizon becomes large, whereas for a unit root process the MSE eventually grows linearly with the forecast horizon.
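A small simulation sketch (my own, taking b_1 = b_2 = ⋯ = 0 so the forecast errors can be generated directly) illustrates the contrast: the TS forecast error variance stays near σ², while the DS forecast error variance grows with the horizon s:
set.seed(123)
nsim=2000
for(s in c(1,5,20,50)){
  err.ts=rnorm(nsim)                        # TS forecast error at t+s is just e_{t+s}
  err.ds=replicate(nsim, sum(rnorm(s)))     # DS forecast error is the sum of s future shocks
  print(c(s, var(err.ts), var(err.ds)))     # roughly (s, 1, s)
}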
4.3.2 Random Walk Components and Stochastic Trends
It is well known that any DS process can be written as the sum of a random walk and a stationary component. A decomposition with a particularly nice property is due to Beveridge and Nelson (1981, BN). If (1 − L)yt = µ + ψ1(L)et, then we can write yt = ct + zt, where zt = µ + z_{t−1} + ψ1(1)et is the random walk component and ct = ψ1*(L)et is the stationary component, with a*_j = −Σ_{k>j} a_k. To see why this decomposition is true, we need to notice that any lag polynomial ψ1(L) can be written as ψ1(L) = ψ1(1) + (1 − L)ψ1*(L), where a*_j = −Σ_{k>j} a_k. To see this, just write it out:
ψ1(1):           a_0 + a_1 + a_2 + a_3 + ⋯
(1 − L)ψ1*(L):       − a_1 − a_2 − a_3 − ⋯
                     + a_1 L + a_2 L + a_3 L + ⋯
                             − a_2 L − a_3 L − ⋯
                             + a_2 L² + a_3 L² + ⋯
                                     ⋯
and exactly the terms of ψ1(L) = a_0 + a_1 L + a_2 L² + ⋯ remain when you cancel all the terms. There are many ways to decompose a unit root process into stationary and random walk components. The BN decomposition is a popular choice because
it has a special property: the random walk component is a sensible definition of the "trend" in yt. The component zt is the limiting forecast of future y, i.e. today's y plus all future expected changes in y. In the BN decomposition the innovations to the stationary and random walk components are perfectly correlated. Consider an arbitrary combination of stationary and random walk components: yt = zt + ct, where zt = µ + z_{t−1} + vt and ct = ψ2(L)et. It can be shown that in every decomposition of yt into stationary and random walk components, the variance of changes to the random walk component is the same, ψ1(1)²σ_e². Since the unit root process is composed of a stationary plus a random walk component, the unit root process has the same forecast-variance behavior as the random walk when the horizon is long enough.
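As a concrete illustration (my own sketch, not from the notes), consider the BN decomposition when (1 − L)y_t = (1 + θL)e_t, i.e. an ARIMA(0,1,1) without drift. Then ψ1(1) = 1 + θ, the stationary component is c_t = −θ e_t, and the BN trend is z_t = y_t + θ e_t:
set.seed(1)
y=cumsum(arima.sim(list(ma=0.4), n=300))   # simulated ARIMA(0,1,1) in levels
fit=arima(y, order=c(0,1,1))               # estimate theta
theta=coef(fit)["ma1"]
e=residuals(fit)
bn.trend=y + theta*e                       # random walk (trend) component z_t
bn.cycle=y - bn.trend                      # stationary component, equals -theta*e
ts.plot(cbind(y, bn.trend), lty=c(1,2))    # level and BN trend move together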
4.4 Trend Estimation and Forecasting
4.4.1 Forecasting a Deterministic Trend
Consider the linear deterministic trend model:
y_t = α + β t + e_t,   t = 1, 2, . . . , T.
The h-step-ahead forecast is given by ŷ^t_{t+h} = α̂ + β̂(t + h), where α̂ and β̂ are the OLS estimates of the parameters α and β. The forecast variance may be computed using the following formula:
E[(y_{t+h} − ŷ^t_{t+h})²] = σ² [ 1 + 1/t + (t + h − (t + 1)/2)² / Σ_{m=1}^{t} (m − (t + 1)/2)² ] ≈ σ²,
where the last approximation is valid if t (the period at which forecast is constructed) is large relative to the forecast horizon h.
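In R, this forecast and a prediction interval can be obtained directly from a linear regression on time; a minimal sketch with simulated data (mine, for illustration only) is:
set.seed(1)
T=120
y=2 + 0.3*(1:T) + rnorm(T)        # linear trend plus white noise
tt=1:T
fit=lm(y ~ tt)                    # OLS estimates of alpha and beta
h=8
fc=predict(fit, newdata=data.frame(tt=T+(1:h)), interval="prediction")
print(fc)                         # point forecasts with prediction intervals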
4.4.2 Forecasting a Stochastic Trend
Consider the random walk with drift
y_t = α + y_{t−1} + e_t,   t = 2, 3, . . . , T.
Let α̂ be an estimate of α obtained from the following regression model: Δy_t = α + e_t. The h-step-ahead forecast is given by ŷ^t_{t+h} = y_t + α̂ h. The forecast variance may be computed using the following formula:
E[(y_{t+h} − ŷ^t_{t+h})²] = σ² [ h + h²/(t − 1) ] ≈ h σ²,
where the last approximation is valid if t is large relative to h.
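A minimal sketch of this forecast with simulated data (mine; the drift and horizons are arbitrary) is:
set.seed(1)
T=200
y=cumsum(0.2 + rnorm(T))           # random walk with drift alpha=0.2
alpha.hat=mean(diff(y))            # OLS estimate of alpha in  dy_t = alpha + e_t
sigma2.hat=var(diff(y))
h=1:10
fc=y[T] + alpha.hat*h              # h-step-ahead forecasts
se=sqrt(h*sigma2.hat)              # approximate forecast standard errors, grow with sqrt(h)
print(cbind(h, fc, fc-1.96*se, fc+1.96*se))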
4.4.3 Forecasting ARMA Models with Deterministic Trends
The basic models for deterministic and stochastic trends ignore possible short-run fluctuations in the series. Consider the following ARMA model with a deterministic trend: ψ(L)y_t = α + β t + θ(L)e_t, where the polynomial ψ(L) satisfies the stationarity condition and θ(L) satisfies the invertibility condition. The forecast is constructed as follows:
1. Linear detrending. Estimate the regression model y_t = δ_1 + δ_2 t + z_t, and compute ẑ_t = y_t − δ̂_1 − δ̂_2 t.
2. Estimate an appropriate ARMA(p, q) model for the covariance stationary variable z_t: ψ(L)z_t = θ(L)e_t. The estimated ARMA(p, q) model may be used to construct h-period-ahead forecasts of z_t, denoted ẑ^t_{t+h}.
3. Construct the h-period-ahead forecast of y_t as ŷ^t_{t+h} = ẑ^t_{t+h} + δ̂_1 + δ̂_2(t + h).
The MSE of ŷ^t_{t+h} may be approximated by the MSE of ẑ^t_{t+h}.
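The three steps can be carried out in R as in the following sketch (my own, with simulated data; the AR(1) choice for z_t is just for illustration):
set.seed(1)
T=200; tt=1:T
y=1 + 0.05*tt + arima.sim(list(ar=0.6), n=T)        # AR(1) fluctuations around a linear trend
fit.trend=lm(y ~ tt)                                # step 1: linear detrending
z=residuals(fit.trend)
fit.z=arima(z, order=c(1,0,0), include.mean=FALSE)  # step 2: ARMA model for z_t
h=8
z.fc=predict(fit.z, n.ahead=h)$pred                 # forecasts of z_{t+h}
y.fc=coef(fit.trend)[1] + coef(fit.trend)[2]*(T+(1:h)) + z.fc   # step 3: add back the trend
print(y.fc)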
4.4.4 Forecasting ARIMA Models
Consider forecasting a time series that is integrated of order 1 and described by the ARIMA(p, 1, q) model: φ(L)(1 − L)y_t = α + θ(L)e_t, where the polynomial φ(L) satisfies the stationarity condition and θ(L) satisfies the invertibility condition. The forecast is constructed as follows:
1. Compute the first difference of y_t, i.e. z_t = Δy_t.
2. Estimate an appropriate ARMA(p, q) model for the covariance stationary variable z_t: φ(L)z_t = θ(L)e_t. The estimated ARMA(p, q) model may be used to construct h-period-ahead forecasts of z_t, denoted ẑ^t_{t+h}.
3. Construct the h-period-ahead forecast of y_t by cumulating the forecasts of the differences: ŷ^t_{t+h} = y_t + ẑ^t_{t+1} + ⋯ + ẑ^t_{t+h}.
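In R the same construction is done automatically when an ARIMA(p,1,q) is fitted to the levels; a minimal sketch (mine, with simulated data) showing both routes is:
set.seed(1)
y=cumsum(arima.sim(list(ar=0.5), n=200))    # an I(1) series
fit=arima(y, order=c(1,1,0))                # ARIMA fitted to the levels
fc=predict(fit, n.ahead=8)                  # forecasts of y; differences are cumulated internally
print(cbind(fc$pred, fc$se))
# equivalent by hand: fit an ARMA to z_t = diff(y) and cumulate its forecasts
z=diff(y)
fit.z=arima(z, order=c(1,0,0))
y.fc=y[length(y)] + cumsum(predict(fit.z, n.ahead=8)$pred)
print(y.fc)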
4.5 Unit Root Tests
Although it might be interesting to know whether a time series has a unit root, several papers have argued that the question cannot be answered on the basis of a finite sample of observations. Nevertheless, you will have to conduct tests for unit roots in doing empirical projects. This can be done using informal or formal methods. The informal methods involve inspecting a time series plot of the data and computing the autocorrelation coefficients, as we did in Chapters 1 and 2. If a series has a stochastic trend, the first autocorrelation coefficient will be near one. A small first autocorrelation coefficient combined with a time series plot that has no apparent trend suggests that the series does not have a trend. The Dickey-Fuller (1979, DF) test is the most popular formal statistical procedure for unit root testing.
4.5.1 The Dickey-Fuller and Augmented Dickey-Fuller Tests
The starting point for the DF test is the autoregressive model of order one, AR(1):
y_t = α + ρ y_{t−1} + e_t.   (4.1)
If ρ = 1, y_t is nonstationary and contains a stochastic trend. Therefore, within the AR(1) model, the hypothesis that y_t has a trend can be tested by testing
H0: ρ = 1 vs. H1: ρ < 1.
This test is most easily implemented by estimating a modified version of (4.1). Subtract y_{t−1} from both sides and let δ = ρ − 1. Then, model (4.1) becomes
Δy_t = α + δ y_{t−1} + e_t   (4.2)
Table 4.1: Large-sample critical values for the ADF statistic

Deterministic regressors     10%      5%      1%
Intercept only              -2.57    -2.86   -3.43
Intercept and time trend    -3.12    -3.41   -3.96
and the testing hypothesis is H0: δ = 0 vs. H1: δ < 0. The OLS t-statistic in (4.2) testing δ = 0 is known as the Dickey-Fuller statistic. The extension of the DF test to the AR(p) model is a test of the null hypothesis H0: δ = 0 against the one-sided alternative H1: δ < 0 in the following regression:
Δy_t = α + δ y_{t−1} + γ_1 Δy_{t−1} + ⋯ + γ_p Δy_{t−p} + e_t.   (4.3)
Under the null hypothesis, y_t has a stochastic trend and under the alternative hypothesis, y_t is stationary. If instead the alternative hypothesis is that y_t is stationary around a deterministic linear time trend, then this trend must be added as an additional regressor in model (4.3) and the DF regression becomes
Δy_t = α + β t + δ y_{t−1} + γ_1 Δy_{t−1} + ⋯ + γ_p Δy_{t−p} + e_t.   (4.4)
This is called the augmented Dickey-Fuller (ADF) test and the test statistic is the OLS t-statistic testing that δ = 0 in equation (4.4). The ADF statistic does not have a normal distribution, even in large samples. Critical values for the one-sided ADF test depend on whether the test is based on equation (4.3) or (4.4) and are given in Table 4.1. Table 17.1 of Hamilton (1994, p.502) presents a summary of DF tests for unit roots in the absence of serial correlation for testing
Table 4.2: Summary of DF tests for unit roots in the absence of serial correlation

Case 1: True process: y_t = y_{t−1} + u_t, u_t ~ iid N(0, σ²). Estimated regression: y_t = ρ y_{t−1} + u_t.
  T(ρ̂ − 1) has the distribution described under Case 1 in Table B.5.
  (ρ̂ − 1)/σ̂_ρ̂ has the distribution described under Case 1 in Table B.6.
Case 2: True process: y_t = y_{t−1} + u_t, u_t ~ iid N(0, σ²). Estimated regression: y_t = α + ρ y_{t−1} + u_t.
  T(ρ̂ − 1) has the distribution described under Case 2 in Table B.5.
  (ρ̂ − 1)/σ̂_ρ̂ has the distribution described under Case 2 in Table B.6.
  The OLS F-test of the joint hypothesis that α = 0 and ρ = 1 has the distribution described under Case 2 in Table B.7.
Case 3: True process: y_t = α + y_{t−1} + u_t, α ≠ 0, u_t ~ iid N(0, σ²). Estimated regression: y_t = α + ρ y_{t−1} + u_t.
  (ρ̂ − 1)/σ̂_ρ̂ → N(0, 1).
Case 4: True process: y_t = α + y_{t−1} + u_t, α ≠ 0, u_t ~ iid N(0, σ²). Estimated regression: y_t = α + ρ y_{t−1} + δ t + u_t.
  T(ρ̂ − 1) has the distribution described under Case 4 in Table B.5.
  (ρ̂ − 1)/σ̂_ρ̂ has the distribution described under Case 4 in Table B.6.
  The OLS F-test of the joint hypothesis that ρ = 1 and δ = 0 has the distribution described under Case 4 in Table B.7.
the null hypothesis of unit root against some different alternative hypothesis. It is very important for you to understand what your alternative hypothesis is in conducting unit root tests. I reproduce this table here, but you need to check Hamilton’s (1994) book for the critical values of DF statistic for different cases. The critical values are presented in the Appendix of the book. In the above models (4 cases), the basic assumption is that ut is iid. But this assumption is violated if ut is serially correlated and potentially heteroskedastic. To take account of serial correlation and potential heteroskedasticity, one way is to use the PP test proposed by Phillips and Perron (1988). For other tests for unit roots, please read the book by Hamilton (1994, p.532). Some recent testing
methods have been proposed. For example, Juhl (2005) used the functional coefficient type model of Cai, Fan and Yao (2000) to test for a unit root, and Phillips and Park (2005) employed nonparametric regression. Finally, notice that in R there are at least three packages that provide unit root tests: tseries, urca and uroot.
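For instance, the ADF regressions (4.3) and (4.4) can be run with ur.df in urca or with adf.test in tseries; a minimal sketch (mine, on a simulated unit root series) is:
library(urca)
library(tseries)
set.seed(1)
y=cumsum(rnorm(200))                      # series with a unit root
summary(ur.df(y, type="drift", lags=4))   # intercept only, as in (4.3)
summary(ur.df(y, type="trend", lags=4))   # intercept and time trend, as in (4.4)
adf.test(y)                               # ADF test with trend, automatic lag choice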
4.5.2 Cautions
The most reliable way to handle a trend in a series is to transform the series so that it does not have the trend. If the series has a stochastic trend, i.e. a unit root, then the first difference of the series does not have a trend. In practice, you can rarely be sure whether a series has a stochastic trend or not. Recall that a failure to reject the null hypothesis does not necessarily mean that the null hypothesis is true; it simply means that there is not enough evidence to conclude that it is false. Therefore, failure to reject the null hypothesis of a unit root using the ADF test does not mean that the series actually has a unit root. Having said that, even though failure to reject the null hypothesis of a unit root does not mean the series has a unit root, it still can be reasonable to approximate the true autoregressive root as equaling one and use the first difference of the series rather than its levels.
4.6 Structural Breaks
Another type of nonstationarity arises when the population regression function changes over the sample period. This may occur because of changes in economic policy, changes in the structure of the economy or industry, events that change the dynamics of specific industries or firm related quantities (inventories, sales, production),
etc. If such changes, called breaks, occur, then regression models that neglect those changes lead to misleading inference or forecasting. Breaks may result from a discrete change (or changes) in the population regression coefficients at distinct dates or from a gradual evolution of the coefficients over a longer period of time. Discrete breaks may be a result of some major changes in economic policy or in the economy (oil shocks), while "gradual" breaks, in which the population parameters evolve slowly over time, may be a result of a slow evolution of economic policy. If a break occurs in the population parameters during the sample, then the OLS regression estimates over the full sample will estimate a relationship that holds on "average".
4.6.1 Testing for Breaks
Tests for breaks in the regression parameters depend on whether the break date is known or not. If the date of the hypothesized break in the coefficients is known, then the null hypothesis of no break can be tested using a dummy variable. Consider the following model:
y_t = β_0 + β_1 y_{t−1} + δ_1 x_{t−1} + γ_0 D_t(τ) + γ_1 D_t(τ) y_{t−1} + γ_2 D_t(τ) x_{t−1} + u_t
    = β_0 + β_1 y_{t−1} + δ_1 x_{t−1} + u_t,                            if t ≤ τ,
    = (β_0 + γ_0) + (β_1 + γ_1) y_{t−1} + (δ_1 + γ_2) x_{t−1} + u_t,    if t > τ,
where τ denotes the hypothesized break date, Dt(τ ) is a binary variable that equals zero before the break date and one after, i.e. Dt(τ ) = 0 if t ≤ τ and Dt(τ ) = 1 if t > τ . Under the null hypothesis of no break, γ0 = γ1 = γ2 = 0, and the hypothesis of a break
can be tested using the F-statistic. This is called a Chow test for a break at a known break date. Indeed, the above structural break model can be regarded as a special case of the following trending time series model: y_t = β_0(t) + β_1(t) y_{t−1} + δ_1(t) x_{t−1} + u_t. For more discussions, see Cai (2006). If there are more variables or more lags, this test can be extended by constructing binary variable interactions for all the regressors. This approach can be modified to check for a break in a subset of the coefficients. The break date is unknown in most applications, but you may suspect that a break occurred sometime between two dates, τ_0 and τ_1. The Chow test can be modified to handle this by testing for a break at all possible dates t between τ_0 and τ_1 and then using the largest of the resulting F-statistics to test for a break at an unknown date. This modified test is often called the Quandt likelihood ratio (QLR) statistic or the sup-Wald statistic: QLR = max{F(τ_0), F(τ_0 + 1), ⋯, F(τ_1)}. Since the QLR statistic is the largest of many F-statistics, its distribution is not the same as that of an individual F-statistic. The critical values for the QLR statistic must be obtained from a special distribution. This distribution depends on the number of restrictions being tested, m, on τ_0 and τ_1, and on the subsample over which the F-statistics are computed, expressed as a fraction of the total sample size. For the large-sample approximation to the distribution of the QLR statistic to be a good one, the subsample endpoints, τ_0 and τ_1, cannot be too close to the ends of the sample. That is why the QLR statistic is computed over a "trimmed" subset of the sample.
Table 4.3: Critical Values of the QLR statistic with 15% Trimming

Number of restrictions (m)    10%     5%      1%
 1                            7.12    8.68   12.16
 2                            5.00    5.86    7.78
 3                            4.09    4.71    6.02
 4                            3.59    4.09    5.12
 5                            3.26    3.66    4.53
 6                            3.02    3.37    4.12
 7                            2.84    3.15    3.82
 8                            2.69    2.98    3.57
 9                            2.58    2.84    3.38
10                            2.48    2.71    3.23
A popular choice is to use 15% trimming, that is, to set τ_0 = 0.15T and τ_1 = 0.85T. With 15% trimming, the F-statistic is computed for break dates in the central 70% of the sample. Table 4.3 presents the critical values for the QLR statistic computed with 15% trimming. This table is from Stock and Watson (2003) and you should check the book for a complete table. The QLR test can detect a single break, multiple discrete breaks, and a slow evolution of the regression parameters. If there is a distinct break in the regression function, the date at which the largest Chow statistic occurs is an estimator of the break date. In R, the packages strucchange and segmented provide several methods for testing for breaks, or you can use the function StructTS.
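For a break at a known date, the Chow test can be computed directly from the dummy-variable regression above; the following sketch (my own, with simulated data and a simplified static regression rather than the lagged model in the text) illustrates the idea:
library(strucchange)
set.seed(1)
T=200; tau=100
x=rnorm(T)
y=c(1 + 0.5*x[1:tau], 3 + 0.5*x[(tau+1):T]) + rnorm(T)   # intercept shifts at tau
D=as.numeric(seq_len(T) > tau)                            # break dummy D_t(tau)
unrestricted=lm(y ~ x + D + I(D*x))
restricted=lm(y ~ x)
print(anova(restricted, unrestricted))                    # Chow F test at the known date
print(sctest(y ~ x, type="Chow", point=tau))              # the same test via strucchange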
4.6.2 Zivot and Andrews's Testing Procedure
Sometimes, you would suspect that a series may either have a unit root or be a trend stationary process that has a structural break at some unknown period of time and you would want to test the null
hypothesis of a unit root against the alternative of a trend stationary process with a structural break. This is exactly the hypothesis tested by Zivot and Andrews's (1992) test. In this testing procedure, the null hypothesis is a unit root process without any structural breaks and the alternative hypothesis is a trend stationary process with a possible structural change occurring at an unknown point in time. Zivot and Andrews (1992) suggested estimating the following regression:
x_t = µ + θ DU_t(τ) + β t + γ DT_t(τ) + α x_{t−1} + Σ_{i=1}^{k} c_i Δx_{t−i} + e_t,   (4.5)
where τ = T_B/T is the break fraction; DU_t(τ) = 1 if t > τT and 0 otherwise; DT_t(τ) = t − τT if t > τT and 0 otherwise; and x_t is the time series of interest. This regression allows both the slope and the intercept to change at date T_B. Note that for t ≤ τT (t ≤ T_B) model (4.5) becomes
x_t = µ + β t + α x_{t−1} + Σ_{i=1}^{k} c_i Δx_{t−i} + e_t,
while for t > τT (t > T_B) model (4.5) becomes
x_t = [µ + θ] + [β t + γ(t − T_B)] + α x_{t−1} + Σ_{i=1}^{k} c_i Δx_{t−i} + e_t.
Model (4.5) is estimated by OLS with the break point ranging over the sample, and the t-statistic for testing α = 1 is computed. The minimum t-statistic is reported. The 1%, 5% and 10% critical values are −5.34, −4.80 and −4.58, respectively. The appropriate number of lags in differences, k, is estimated for each value of τ. Please read the paper by Sadorsky (1999) for more details about this method and empirical applications.
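In R, the Zivot-Andrews test is available as ur.za in the urca package; a minimal sketch (mine, on a simulated series; the lag length is arbitrary) is:
library(urca)
set.seed(1)
x=cumsum(rnorm(300))              # a unit root series, i.e. the null model
za=ur.za(x, model="both", lag=4)  # break allowed in both intercept and trend
summary(za)                       # reports the minimum t-statistic and the estimated break point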
4.6.3 Cautions
The appropriate way to adjust for a break in the population parameters depends on the source of the break. If a distinct break occurs at a specific date, this break will be detected with high probability by the QLR statistic, and the break date can be estimated. The regression model can then be estimated using a dummy variable. If there is a distinct break, then inference on the regression coefficients can proceed as usual using t-statistics for hypothesis testing. Forecasts can be produced using the estimated regression model that applies to the end of the sample. The problem is more difficult if the break is not distinct and the parameters slowly evolve over time. In this case, state-space modelling is required.
4.7 Problems
1. You will build a time-series model for real oil price listed in the tenth column in file “MacroData.xls”. (a) Construct graphs of time series data: time plots and scatter plots of the levels of real oil prices (OP) and the log-difference of oil prices. The log-difference of oil price is defined as follows ∆ log(OPt) = log(OPt) − log(OPt−1). Comment your results. (b) Try to identify lag structure for levels of real oil prices and the log-difference of oil prices by using ACF and PACF. Comment your results. (c) We will not estimate ARMA models yet. So, estimate AR(p) model, 1 ≤ p ≤ 8. Compute Akaike Information Criteria (AIC) or AICC and Schwarz Information Criteria (SIC). Present your results. Choose the AR lag length based on the AIC or AICC.
Which lag length would you choose based on the AIC or AICC or SIC? Comment. (d) Estimate the AR(p) model with the optimal lag length. Present your results nicely. (e) Conduct the diagnostic checking of the estimated AR model. You need do the followings and comment your results: (i) Construct graphs of residuals (time series plots, scatterplots, squared residuals). (ii) Check for serial correlation using sample ACF. (iii) Check for serial correlation using Ljung-Box test statistic. (iv) Conduct Jarque-Bera test of normality and make a Q-Q plot. (v) Estimate the following AR(1) model for the estimated squared residuals and test the null hypothesis that the slope is insignificant. How would interpret your results? What does it say about the constancy of variance? (vi) Based on diagnostic checking in the above, can you use the model or you should go back to identification of lag structure in (b)? (f) Is there any structure change for oil price? Note: The Jarque-Bera (1980, 1987) test evaluates the hypothesis that X has a normal distribution with unspecified mean and variance, against the alternative that X does not have a normal distribution. The test is based on the sample skewness and kurtosis of X. For a true normal distribution, the sample skewness should be near 0 and the sample kurtosis should be near 3. A test has the following general form:
JB = (T/6) [ Sk² + (K − 3)²/4 ] → χ²_2,
where Sk and K are the measures of skewness and kurtosis respectively. To use the built-in function for the Jarque-Bera test
in the package tseries in R, the commands for the Jarque-Bera test are
library(tseries)           # call the package "tseries"
jb=jarque.bera.test(x)     # x is the series for the test
print(jb)                  # print the testing result
2. Based on economic theory, the description of inflation suggests two fundamental causes: excess monetary growth (faster than real output) and the dissipation of external shocks. The precise mechanisms at work and the appropriate lag structures are not perfectly defined. In this exercise, you will estimate the following simple model: Δlog(P_t) = β_1 + β_2 [Δlog(M1_{t−1}) − Δlog(Q_{t−1})] + β_3 Δlog(P_{t−1}) + u_t, where P_t is the quarterly price level (CPI) listed in the eighth column in "MacroData.xls", Q_t is the quarterly real output (listed in the third column), and M1_t is the quarterly money stock (listed in the thirteenth column). (a) Nicely present the results of the OLS estimation. Comment on your results. (b) Explain what may be an advantage of the above model compared to a simple autoregressive model of prices. To answer this question, you might need to do some statistical analysis. (c) Is there any structural change for the CPI? (d) Any suggestions to build a better model?
(a) Build informally and formally an AR(p) model with the optimal lag length based on some criterion and conduct the diagnostic checking of the estimated AR model. (b) Based on the data for 1959.1-1999.3, construct forecasts for the quarters 1999.4-2002.3. Plot the constructed forecasts and the realized values. Comment on your results. 4. You need to replicate some of the steps in the analysis of oil price shocks and stock market activity conducted by Sadorsky (1999). You should write your report in such a way that an outside reader may understand what the report is about and what you are doing. Write a referee report for the paper of Sadorsky (1999). One possible structure of the referee report is: (a) Summary of the paper (this assures that you really read the paper carefully): (i) Is the economic/financial question of relevance? (ii) What have you learned from reading this paper? (iii) What contribution does this paper make to the literature? (b) Can you think of interesting extensions for the paper? (c) Expository quality of the paper: (i) Is the paper well structured? If not, suggest an alternative structure. (ii) Is the paper easy to read? 5. Analyze the stochastic properties of the following interest rates: (i) federal funds rate, (ii) 90-day T-bill rate, (iii) 1-year T-bond interest rate, (iv) 5-year T-bond interest rate; (v) 10-year T-bond interest rate. The interest rates may be found in the Excel file "IntRates.xls". (a) Use the ADF or PP approach to test the null hypothesis that the five interest rates are difference stationary against the alternative
that they are stationary. Explain carefully how you conduct the test. (b) Use the ADF or PP approach to test the null hypothesis that the five interest rates are difference stationary against the alternative that they are stationary around a deterministic trend. Explain carefully how you conduct the test. (c) Use the QLR testing procedure to test whether there was at least one structural break in the interest rate series.
4.8 Computer Code
The following R commands are used for making the graphs in this chapter.
# 5-20-2006
graphics.off()
############################################################
y = read.csv("c:\\teaching\\time series\\data\\MacroData.csv", header = TRUE)  # header argument assumed; original line truncated
cpi = y[,8]
qt  = y[,3]
m0  = y[,12]
m1  = y[,13]
m2  = y[,14]
m3  = y[,15]
op  = y[,10]
v0  = cpi*qt/m0
v1  = cpi*qt/m1
v2  = cpi*qt/m2
v3 = cpi*qt/m3
vt = cbind(v0,v1,v2,v3)
win.graph()
par(mfrow=c(2,2), mex=0.4, bg="light blue")
ts.plot(cpi, type="l", lty=1, ylab="", xlab="")
title(main="CPI", col.main="red")
ts.plot(qt, type="l", lty=1, ylab="", xlab="")
title(main="Industry Output", col.main="red")
ts.plot(op, type="l", lty=1, ylab="", xlab="")   # plot the oil price series (op), not qt
title(main="Oil Price", col.main="red")
win.graph()
par(mfrow=c(2,2), mex=0.4, bg="light grey")
ts.plot(m0, type="l", lty=1, ylab="", xlab="")
title(main="Money Aggregate", col.main="red")
ts.plot(m1, type="l", lty=1, ylab="", xlab="")
title(main="Money Aggregate", col.main="red")
ts.plot(m2, type="l", lty=1, ylab="", xlab="")
title(main="Money Aggregate", col.main="red")
ts.plot(m3, type="l", lty=1, ylab="", xlab="")
title(main="Money Aggregate", col.main="red")
win.graph()
par(mfrow=c(2,2), mex=0.4, bg="yellow")
ts.plot(v0, type="l", lty=1, ylab="", xlab="")
title(main="Velocity", col.main="red")
ts.plot(v1, type="l", lty=1, ylab="", xlab="")
title(main="Velocity", col.main="red")
ts.plot(v2, type="l", lty=1, ylab="", xlab="")
title(main="Velocity", col.main="red")
ts.plot(v3, type="l", lty=1, ylab="", xlab="")
title(main="Velocity", col.main="red")
library(tseries)     # call library(tseries)
library(urca)        # call library(urca)
library(quadprog)
library(zoo)
adf_test = adf.test(cpi)    # Augmented Dickey-Fuller test
print(adf_test)
adf_test = pp.test(cpi)     # Phillips-Perron test
print(adf_test)
#adf_test2 = ur.df(y=cpi, lag=5, type=c("drift"))
#print(adf_test2)
adf_test = adf.test(op)     # Augmented Dickey-Fuller test
print(adf_test)
adf_test = pp.test(op)      # Phillips-Perron test
print(adf_test)
for(i in 1:4){
   adf_test = pp.test(vt[,i])
   print(adf_test)
   adf_test = adf.test(vt[,i])
   print(adf_test)
}
############################################################
y = read.csv("c:\\teaching\\time series\\data\\MacroData.csv", header = TRUE)  # header argument assumed; original line truncated
op = y[,10]
library(strucchange)
win.graph()
par(mfrow=c(2,2), mex=0.4, bg="green")
op = ts(op)
fs.op <- Fstats(op ~ 1)     # no lags and no covariates
plot(op, type="l")
plot(fs.op)
sctest(fs.op)
## visualize the breakpoint implied by the argmax of the F statistics
plot(op, type="l")
lines(breakpoints(fs.op))
#####################################
# The following is the example from R
######################################
win.graph()
par(mfrow=c(2,2), mex=0.4, bg="red")
if(! "package:stats" %in% search()) library(ts)
## Nile data with one breakpoint: the annual flows drop
## because the first Ashwan dam was built
data(Nile)
plot(Nile)
## test whether the annual flow remains constant over the years
fs.nile <- Fstats(Nile ~ 1)
plot(fs.nile)
sctest(fs.nile)
plot(Nile)
lines(breakpoints(fs.nile))
############################################################
4.9 References
Beveridge, S. and C.R. Nelson (1981). A new approach to decomposition of economic time series into permanent and transitory components with particular attention to measurement of the business cycle. Journal of Monetary Economics, 7, 151-174.
Cai, Z. (2006). Trending time varying coefficient time series models with serially correlated errors. Forthcoming in Journal of Econometrics.
Cai, Z., J. Fan, and Q. Yao (2000). Functional-coefficient regression models for nonlinear time series. Journal of the American Statistical Association, 95, 941-956.
Chang, Y., J.Y. Park and P.C.B. Phillips (2001). Nonlinear econometric models with cointegrated and deterministically trending regressors. Econometrics Journal, 4, 1-36.
Chang, Y. and E. Martinez-Chombo (2003). Electricity demand analysis using cointegration and error-correction models with time varying parameters: The Mexican case. Working paper, Department of Economics, Rice University.
Cochrane, J.H. (1997). Time series for macroeconomics and finance. Lecture Notes. http://gsb.uchicago.edu/fac/john.cochrane/research/Papers/timeser1.pdf
Cochrane, J.H. (2001). Asset Pricing. New Jersey: Princeton University Press.
Dickey, D.A. and W.A. Fuller (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74, 427-431.
Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.
Heij, C., P. de Boer, P.H. Franses and H.K. van Dijk (2004). Econometric Methods with Applications in Business and Economics. Oxford University Press.
Hendry, D.F. and K. Juselius (2000). Explaining cointegration analysis: Part I. The Energy Journal, 21, 1-42.
Jarque, C.M. and A.K. Bera (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters, 6, 255-259.
Jarque, C.M. and A.K. Bera (1987). A test for normality of observations and regression residuals. International Statistical Review, 55, 163-172.
Juhl, T. (2005). Functional coefficient models under unit root behavior. Econometrics Journal, 8, 197-213.
Park, J.Y. and P.C.B. Phillips (1999). Asymptotics for nonlinear transformations of integrated time series. Econometric Theory, 15, 269-298.
Park, J.Y. and P.C.B. Phillips (2002). Nonlinear regressions with integrated time series. Econometrica, 69, 117-161.
Phillips, P.C.B. (2001). Trending time series and macroeconomic activity: Some present and future challenges. Journal of Econometrics, 100, 21-27.
Phillips, P.C.B. and J. Park (2005). Non-stationary density and kernel autoregression. Under revision for Econometric Theory.
Phillips, P.C.B. and P. Perron (1988). Testing for a unit root in time series regression. Biometrika, 75, 335-346.
Sadorsky, P. (1999). Oil price shocks and stock market activity. Energy Economics, 21, 449-469.
Stock, J.H. and M.W. Watson (2003). Introduction to Econometrics. Addison-Wesley.
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons, New York.
Zivot, E. and D.W.K. Andrews (1992). Further evidence on the great crash, the oil price shock and the unit root hypothesis. Journal of Business and Economic Statistics, 10, 251-270.
Chapter 5
Vector Autoregressive Models
5.1 Introduction
A univariate autoregression is a single-equation, single-variable linear model in which the current value of a variable is explained by its own lagged values. Multivariate models look like the univariate models with the letters re-interpreted as vectors and matrices. Consider a multivariate time series:
\[ x_t = \begin{pmatrix} y_t \\ z_t \end{pmatrix}. \]
Recall that by multivariate white noise e_t ∼ N(0, Σ), we mean that
\[ e_t = \begin{pmatrix} v_t \\ u_t \end{pmatrix}, \quad E(e_t) = 0, \quad E(e_t e_t') = \Sigma = \begin{pmatrix} \sigma_v^2 & \sigma_{vu} \\ \sigma_{uv} & \sigma_u^2 \end{pmatrix}, \quad E(e_t e_{t-j}') = 0 \ (j \neq 0). \]
The AR(1) model for the random vector x_t is x_t = φ x_{t−1} + e_t, which in the multivariate framework means that
\[ \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \begin{pmatrix} \phi_{yy} & \phi_{yz} \\ \phi_{zy} & \phi_{zz} \end{pmatrix} \begin{pmatrix} y_{t-1} \\ z_{t-1} \end{pmatrix} + \begin{pmatrix} v_t \\ u_t \end{pmatrix}. \]
Notice that both lagged y and z appear in each equation, which means that the multivariate AR(1) process (VAR) captures cross-variable dynamics (co-movements).
A general VAR is an n-equation, n-variable linear model in which each variable is explained by its own lagged values, plus current and past values of the remaining n − 1 variables:
\[ y_t = \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + e_t, \tag{5.1} \]
where
\[ y_t = \begin{pmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{nt} \end{pmatrix}, \quad e_t = \begin{pmatrix} e_{1t} \\ e_{2t} \\ \vdots \\ e_{nt} \end{pmatrix}, \quad \Phi_i = \begin{pmatrix} \phi_{11}^{(i)} & \phi_{12}^{(i)} & \cdots & \phi_{1n}^{(i)} \\ \phi_{21}^{(i)} & \phi_{22}^{(i)} & \cdots & \phi_{2n}^{(i)} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{n1}^{(i)} & \phi_{n2}^{(i)} & \cdots & \phi_{nn}^{(i)} \end{pmatrix}, \]
and the error terms e_t have a variance-covariance matrix Ω. The name "vector autoregression" is usually used in place of "vector ARMA" because it is very uncommon to estimate moving average terms. Autoregressions are easy to estimate because the OLS assumptions still apply and each equation may be estimated by ordinary least squares regression; MA terms have to be estimated by maximum likelihood. However, since every MA process has an AR(∞) representation, a pure autoregression can approximate an MA process if enough lags are included in the AR representation. The usefulness of VAR models is that macroeconomists can do four things with them: (1) describe and summarize macroeconomic data; (2) make macroeconomic forecasts; (3) quantify what we do or do not know about the true structure of the macroeconomy; (4) advise macroeconomic policymakers. In data description and forecasting, VARs have proved to be powerful and reliable tools that are now in everyday use. Policy analysis is more difficult in the VAR framework because it requires differentiating between correlation and causation, the so-called "identification problem". Economic theory is required to solve the identification problem. Standard practice in VAR analysis is
to report results from Granger-causality tests, impulse responses and forecast error variance decompositions, which will be discussed in the next sections. For more about the history and recent developments as well as applications, see the paper by Stock and Watson (2001). VAR models come in three varieties: reduced form, recursive and structural forms. Here, we only focus on the first one; the details for the other two can be found in Hamilton (1994, Chapter 11). A reduced form VAR process of order p has the same form as equation (5.1):
\[ y_t = c + \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + e_t, \]
(5.2)
where y_t is an n × 1 vector of the variables, Φ_i is an n × n matrix of parameters, c is an n × 1 vector of constants, and e_t ∼ N(0, Ω). The error terms in these regressions are the "surprise" movements in the variables after taking their past values into account. The model (5.2) can be presented in several different ways: (1) using the lag operator notation, Φ(L) y_t = c + e_t, where Φ(L) = I_n − Φ_1 L − · · · − Φ_p L^p; (2) in matrix notation, Y = XΠ + E, where E is a T × n matrix of the disturbances and Y is a T × n matrix of the observations; (3) in terms of deviations from the mean (centered). For estimating VAR models, several methods are available: simple models can be fitted by the function ar() in the package stats (built-in), more elaborate models are provided by estVARXls() in the
package dse1, and a Bayesian approach is available in the package MSBVAR.
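A minimal sketch (not the notes' own code) of fitting a reduced-form VAR by equation-by-equation OLS using the built-in function ar.ols() in the package stats; the simulated bivariate series and the chosen maximum order are assumptions for illustration only.
set.seed(1)
n <- 200
y <- cumsum(rnorm(n)); z <- 0.5 * y + rnorm(n)        # hypothetical bivariate series
Y <- cbind(dy = diff(y), dz = diff(z))                 # work with the stationary differences
fit <- ar.ols(Y, order.max = 4, aic = TRUE, demean = TRUE, intercept = TRUE)
fit$order       # selected lag length p
fit$ar          # array of estimated coefficient matrices Phi_1, ..., Phi_p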
5.1.1 Properties of VAR Models
A vector process is said to be covariance-stationary (weakly stationary) if its first and second moments do not depend on t. The VAR(p) model in equation (5.1) can be written in the form of a VAR(1) process, called the companion form:
\[ \xi_t = F \xi_{t-1} + v_t, \tag{5.3} \]
where
\[ \xi_t = \begin{pmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{pmatrix}, \quad v_t = \begin{pmatrix} e_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad F = \begin{pmatrix} \Phi_1 & \Phi_2 & \Phi_3 & \cdots & \Phi_{p-1} & \Phi_p \\ I_n & 0 & 0 & \cdots & 0 & 0 \\ 0 & I_n & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & I_n & 0 \end{pmatrix}. \]
To understand the conditions for stationarity of a vector process, note from the above equation that
\[ \xi_{t+s} = v_{t+s} + F v_{t+s-1} + F^2 v_{t+s-2} + \cdots + F^{s-1} v_{t+1} + F^s \xi_t. \]
Proposition 4.1: The VAR process is covariance-stationary if the eigenvalues of the matrix F are less than unity in absolute value.
For a covariance-stationary n-dimensional vector process, the j-th auto-covariance is defined to be the n × n matrix
\[ \Gamma_j = E[(y_t - \mu)(y_{t-j} - \mu)']. \]
Note that \(\Gamma_j \neq \Gamma_{-j}\) but \(\Gamma_j = \Gamma_{-j}'\).
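A small sketch (an assumption, not from the notes) of checking Proposition 4.1 in R: form the companion matrix F from a list of coefficient matrices and inspect the modulus of its eigenvalues; the example coefficients are hypothetical.
companion <- function(Phi) {               # Phi: list of n x n coefficient matrices Phi_1, ..., Phi_p
  p <- length(Phi); n <- nrow(Phi[[1]])
  F <- matrix(0, n * p, n * p)
  F[1:n, ] <- do.call(cbind, Phi)          # first block row: Phi_1, ..., Phi_p
  if (p > 1) F[(n + 1):(n * p), 1:(n * (p - 1))] <- diag(n * (p - 1))
  F
}
Phi <- list(matrix(c(0.5, 0.1, 0.2, 0.3), 2, 2))      # hypothetical VAR(1) coefficients
max(Mod(eigen(companion(Phi))$values))                 # < 1 implies covariance-stationarity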
A vector moving average process of order q takes the form
\[ MA(q): \quad y_t = e_t + \Theta_1 e_{t-1} + \cdots + \Theta_q e_{t-q}, \]
where e_t is a vector white noise with e_t ∼ N(0, Ω). The VAR(p) model may be represented as an MA(∞) model,
\[ y_t = \sum_{j=1}^{p} \Phi_j y_{t-j} + e_t = \sum_{k=0}^{\infty} \Psi_k e_{t-k}, \]
where the sequence {Ψ_k} is assumed to be absolutely summable. To compute the variance of a VAR process, let us rewrite the VAR(p) process in the form of a VAR(1) process as in (5.3). Assume that the vectors ξ and y are covariance-stationary, and let Σ denote the variance of ξ. Then Σ satisfies
\[ \Sigma = \begin{pmatrix} \Gamma_0 & \Gamma_1 & \cdots & \Gamma_{p-1} \\ \Gamma_1 & \Gamma_0 & \cdots & \Gamma_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma_{p-1} & \Gamma_{p-2} & \cdots & \Gamma_0 \end{pmatrix} = F \Sigma F' + Q, \]
where Q = Var(v_t). We can apply the Vec operator (if A_{d×d} is symmetric, Vec(A) denotes the d(d+1)/2 column vector representing the stacked-up columns of A which are on and below the diagonal of A) to both sides of the above equation,
\[ \mathrm{Vec}(\Sigma) = \mathrm{Vec}(F \Sigma F') + \mathrm{Vec}(Q) = (F \otimes F)\, \mathrm{Vec}(\Sigma) + \mathrm{Vec}(Q), \]
and with A = F ⊗ F,
\[ \mathrm{Vec}(\Sigma) = (I_{r^2} - A)^{-1} \mathrm{Vec}(Q), \]
provided that the matrix I_{r²} − A is nonsingular, where r = np. If the process ξ_t is covariance-stationary, then I_{r²} − A is nonsingular.
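A minimal sketch (an assumption, not the notes' code) of solving for the stationary variance of the companion VAR(1) process from Vec(Σ) = (I − F ⊗ F)^{-1} Vec(Q); the F and Q below are hypothetical.
F <- matrix(c(0.5, 0.1, 0.2, 0.3), 2, 2)      # hypothetical companion matrix (VAR(1), n = 2)
Q <- diag(2)                                   # hypothetical innovation variance Var(v_t)
r <- nrow(F)
vecSigma <- solve(diag(r^2) - F %x% F, as.vector(Q))   # %x% is the Kronecker product
Sigma <- matrix(vecSigma, r, r)                # stationary variance-covariance matrix of xi_t
Sigma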
5.1.2 Statistical Inferences
Suppose we have a sample of size T, {y_t}_{t=1}^T, drawn from an n-dimensional covariance-stationary process with E(y_t) = µ and E[(y_t − µ)(y_{t−j} − µ)'] = Γ_j. As usual, the sample mean is defined as
\[ \bar{y}_T = \frac{1}{T} \sum_{t=1}^{T} y_t. \]
It is easy to show that for a covariance-stationary process,
\[ E(\bar{y}_T) = \mu \quad \text{and} \quad \mathrm{Var}(\bar{y}_T) = \frac{1}{T^2}\left[ T\, \Gamma_0 + \sum_{j=1}^{T-1} (T - j)\{\Gamma_j + \Gamma_{-j}\} \right]. \]
Proposition 4.2: Let y_t be a covariance-stationary process with mean µ and auto-covariances Γ_j that are absolutely summable. Then the sample mean \(\bar{y}_T\) satisfies: \(\bar{y}_T\) converges to µ in probability and \(T\, \mathrm{Var}(\bar{y}_T)\) converges to \(\sum_{j=-\infty}^{\infty} \Gamma_j \equiv S\).
A consistent estimate of S can be constructed based on Newey and West's (1987) HAC estimator as follows (see Section 3.10):
\[ \hat{S} = \hat{\Gamma}_0 + \sum_{j=1}^{q} \left(1 - \frac{j}{q+1}\right)\left[\hat{\Gamma}_{-j} + \hat{\Gamma}_j\right], \]
where
\[ \hat{\Gamma}_j = \frac{1}{T-j} \sum_{t=j+1}^{T} (y_t - \bar{y}_T)(y_{t-j} - \bar{y}_T)', \]
and one can set the truncation lag q as q = 0.75\, T^{1/3}.
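A minimal sketch (an assumption, not the notes' own code) of the Newey-West HAC estimate of S for a T × n matrix of observations Y, using the truncation lag q = 0.75 T^{1/3} suggested above.
hacS <- function(Y) {
  T <- nrow(Y); Yc <- sweep(Y, 2, colMeans(Y))          # demeaned observations
  q <- floor(0.75 * T^(1/3))                            # truncation lag
  Gamma <- function(j) crossprod(Yc[(j + 1):T, , drop = FALSE],
                                 Yc[1:(T - j), , drop = FALSE]) / (T - j)
  S <- Gamma(0)
  for (j in 1:q) S <- S + (1 - j / (q + 1)) * (Gamma(j) + t(Gamma(j)))   # Gamma_{-j} = Gamma_j'
  S
}
set.seed(1)
hacS(matrix(rnorm(400), 200, 2))      # hypothetical data, for illustration only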
To estimate the parameters of a VAR(p) model, we consider the VAR(p) model given in (5.2),
\[ y_t = c + \Phi_1 y_{t-1} + \cdots + \Phi_p y_{t-p} + e_t, \]
where y_t is an n × 1 vector containing the values that the n variables assume at date t and e_t ∼ N(0, Ω). We assume that we have observed each of these n variables for (T + p) time periods, i.e., we observe the sample {y_{−p+1}, ..., y_0, y_1, y_2, ..., y_T}. The simplest approach to estimation is to condition on the first p observations {y_{−p+1}, y_{−p+2}, ..., y_0} and to estimate using the last T observations {y_1, y_2, ..., y_T}. We use the following notation:
\[ x_t = \begin{pmatrix} 1 \\ y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix}, \qquad \Pi' = (c, \Phi_1, \ldots, \Phi_p), \]
where x_t is an (np + 1) × 1 vector and Π' is an n × (np + 1) matrix. Then equation (5.2) can be written as
\[ y_t = \Pi' x_t + e_t, \]
where y_t is an n × 1 vector of the variables at period t and e_t is an n × 1 vector of the disturbances with e_t ∼ N(0, Ω). The likelihood function for the model (5.2) can be calculated in the same way as for a univariate autoregression. The log-likelihood for the entire sample is
\[ L(\Pi, \Omega) = C - \frac{T}{2} \ln|\Omega| - \frac{1}{2} \sum_{t=1}^{T} (y_t - \Pi' x_t)' \Omega^{-1} (y_t - \Pi' x_t), \]
where C is a constant. To find the MLEs of Π and Ω, we set the derivatives of the log-likelihood with respect to Π and Ω equal to zero. It can be shown that the MLEs are
\[ \hat{\Pi} = (X'X)^{-1} X'Y \quad \text{and} \quad \hat{\Omega} = \hat{E}'\hat{E}/T, \quad \text{with } \hat{E} = Y - X\hat{\Pi}. \]
The OLS estimator of Π is the same as the unrestricted MLE estimator.
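A minimal sketch (an assumption, not the notes' code) of the conditional MLE/OLS of a VAR(p) in matrix form, following the formulas above; the function name and the simulated data are hypothetical.
varEstimate <- function(Y, p) {                       # Y: (T+p) x n matrix of observations
  Tp <- nrow(Y); n <- ncol(Y); T <- Tp - p
  X <- cbind(1, do.call(cbind, lapply(1:p, function(j) Y[(p - j + 1):(Tp - j), ])))
  Yt <- Y[(p + 1):Tp, ]
  Pi <- solve(crossprod(X), crossprod(X, Yt))         # (np+1) x n matrix: (c, Phi_1, ..., Phi_p)'
  E <- Yt - X %*% Pi
  list(Pi = Pi, Omega = crossprod(E) / T)             # Omega-hat = E'E/T
}
set.seed(1)
fit <- varEstimate(matrix(rnorm(400), 200, 2), p = 2)
fit$Pi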
To test the hypothesis H_0: θ = θ_0, we can use the popular likelihood ratio test. To do so, we need to calculate the estimates \(\hat{\theta}\) and \(\hat{\theta}_0\) that maximize the log-likelihood under H_1 and H_0, respectively. The likelihood ratio test statistic is
\[ \lambda_T = 2\,[L(\hat{\theta}) - L(\hat{\theta}_0)]. \]
Under the null hypothesis, λ_T asymptotically has a χ² distribution with degrees of freedom equal to the number of restrictions imposed under H_0.
To derive the asymptotic properties of the MLE, let us define \(\hat{\pi} = \mathrm{Vec}(\hat{\Pi})\), where \(\hat{\Pi}\) is the MLE of Π. Note that since \(\hat{\Pi}\) is an np × n matrix, \(\hat{\pi}\) is an n²p × 1 vector. It can be shown (Proposition 11.1 of Hamilton (1994)) that: (1) \((1/T) \sum_{t=1}^{T} x_t x_t' \to Q\) in probability, where \(Q = E(x_t x_t')\) is an np × np matrix; (2) \(\hat{\pi} \to \pi\) in probability; (3) \(\hat{\Omega} \to \Omega\) in probability; (4) \(\sqrt{T}(\hat{\pi} - \pi) \to N(0, \Omega \otimes Q^{-1})\). Therefore, \(\hat{\pi}\) can be treated as approximately \(\hat{\pi} \approx N(\pi, \Omega \otimes (X'X)^{-1})\). To test a hypothesis of the form H_0: Rπ = r, we can use the following form of the Wald test:
\[ \chi^2(m) = (R\hat{\pi} - r)' \left[ R \left( \hat{\Omega} \otimes (X'X)^{-1} \right) R' \right]^{-1} (R\hat{\pi} - r). \]
5.2 Impulse-Response Function
Recall that for an AR(1) process the model is \(x_t = \phi x_{t-1} + e_t\), or \(x_t = \sum_{j=0}^{\infty} \phi^j e_{t-j}\). Based on the MA(∞) representation, we see from Section 3.13 that the impulse-response function is
\[ \frac{\partial x_{t+j}}{\partial e_t} = \phi^j. \]
A vector process works the same way. The covariance-stationary VAR model can be written in MA(∞) form as
\[ y_t = \mu + \sum_{j=0}^{\infty} \Psi_j e_{t-j}. \]
Then,
\[ \frac{\partial y_{t+j}}{\partial e_t'} = \Psi_j. \]
The element \(\psi_{ij}^{(s)}\) of \(\Psi_s\) identifies the consequences of a one-unit increase in the j-th variable's innovation at date t (e_{jt}) for the value of the i-th variable at time t + s (y_{i,t+s}), holding all other innovations at all dates constant. One may also find the response of a specific variable to shocks in all other variables, or the response of all variables to a specific shock:
\[ \frac{\partial y_{i,t+s}}{\partial e_t'} = \Psi_{i\cdot}^{(s)}, \qquad \frac{\partial y_{t+s}}{\partial e_{jt}} = \Psi_{\cdot j}^{(s)}. \]
If one is interested in how the variables of the vector y_{t+s} are affected if the first element of e_t changed by δ_1 at the same time that the second element changed by δ_2, ..., and the n-th element by δ_n, then the combined effect of these changes on the value of y_{t+s} is given by
\[ \Delta y_{t+s} = \sum_{j=1}^{n} \frac{\partial y_{t+s}}{\partial e_{jt}}\, \delta_j = \Psi_s\, \delta. \]
A plot of the row i, column j element of \(\Psi_s\),
\[ \left\{ \frac{\partial y_{i,t+s}}{\partial e_{jt}} \right\}_{s = 0, 1, 2, \ldots}, \]
as a function of s is called the orthogonal impulse-response function. It describes the response of yi,t+s to a one-time impulse in yjt with all other variables dated t or earlier held constant.
Suppose that the date t value of the first variable in the autoregression, y_{1t}, was higher than expected. How does this cause us to revise the forecast of y_{i,t+s}? To answer this question, define \(x_{t-1}' = (y_{t-1}', y_{t-2}', \ldots, y_{t-p}')\), where y_{t−i} is an n × 1 vector and x_{t−1} is an np × 1 vector. The question becomes: what is
\[ \frac{\partial E(y_{i,t+s} \mid y_{1t}, x_{t-1})}{\partial y_{1t}}\,? \]
Note that
\[ \frac{\partial E(y_{i,t+s} \mid y_{1t}, x_{t-1})}{\partial y_{1t}} = \frac{\partial E(y_{i,t+s} \mid y_{1t}, x_{t-1})}{\partial E(e_t' \mid y_{1t}, x_{t-1})} \times \frac{\partial E(e_t \mid y_{1t}, x_{t-1})}{\partial y_{1t}} = \psi_{\cdot 1}^{(s)}. \]
Let us examine the forecast revision resulting from new information about the second variable, y_{2t}, beyond that contained in the first variable, y_{1t}:
\[ \frac{\partial E(y_{i,t+s} \mid y_{1t}, y_{2t}, x_{t-1})}{\partial y_{2t}} = \frac{\partial E(y_{i,t+s} \mid y_{1t}, y_{2t}, x_{t-1})}{\partial E(e_t' \mid y_{1t}, y_{2t}, x_{t-1})} \times \frac{\partial E(e_t \mid y_{1t}, y_{2t}, x_{t-1})}{\partial y_{2t}} = \psi_{\cdot 2}^{(s)}. \]
Similarly, we might find the forecast revision for the third variable, and so on. For the variable y_{nt},
\[ \frac{\partial E(y_{i,t+s} \mid y_{1t}, \cdots, y_{nt}, x_{t-1})}{\partial y_{nt}} = \frac{\partial E(y_{i,t+s} \mid y_{1t}, \cdots, y_{nt}, x_{t-1})}{\partial E(e_t' \mid y_{1t}, \cdots, y_{nt}, x_{t-1})} \times \frac{\partial E(e_t \mid y_{1t}, \cdots, y_{nt}, x_{t-1})}{\partial y_{nt}}. \]
The following are three important properties of impulse-responses: first, the MA(∞) representation is the same thing as the impulse-response function; second, the easiest way to calculate the MA(∞) representation is to simulate the impulse-response function; finally, the impulse-response function is the same as E_t(y_{t+j}) − E_{t−1}(y_{t+j}).
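A small sketch (an assumption, not the notes' code) of computing the impulse-response matrices Ψ_0, Ψ_1, ... recursively from the VAR coefficient matrices via Ψ_k = Σ_{j=1}^{min(k,p)} Φ_j Ψ_{k−j}; the function name and the example Φ are hypothetical.
irf_psi <- function(Phi, horizon) {          # Phi: list of n x n matrices Phi_1, ..., Phi_p
  n <- nrow(Phi[[1]]); p <- length(Phi)
  Psi <- vector("list", horizon + 1)
  Psi[[1]] <- diag(n)                        # Psi_0 = I_n
  for (k in 1:horizon) {
    Psi[[k + 1]] <- matrix(0, n, n)
    for (j in 1:min(k, p))
      Psi[[k + 1]] <- Psi[[k + 1]] + Phi[[j]] %*% Psi[[k - j + 1]]
  }
  Psi
}
Phi <- list(matrix(c(0.5, 0.1, 0.2, 0.3), 2, 2))   # hypothetical VAR(1) coefficients
irf_psi(Phi, 4)[[5]]                               # Psi_4 (= Phi^4 for a VAR(1))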
5.3 Variance Decompositions
In the orthogonalized system, we can compute an accounting of forecast error variance: what percent of the k-step-ahead forecast error variance is due to which variable. To do this, we start with the MA representation
\[ y_t = \Psi(L)\, e_t, \]
where y_t = (x_t, z_t)', e_t = (e_{xt}, e_{zt})', E(e_t e_t') = I, and \(\Psi(L) = \sum_{j=0}^{\infty} \Psi_j L^j\). The one-step forecast error is
\[ y_{t+1} - E_t(y_{t+1}) = \Psi_0\, e_{t+1}, \]
and its variance is
\[ \mathrm{Var}_t(x_{t+1}) = \psi_{xx,0}^2 + \psi_{xz,0}^2. \]
\(\psi_{xx,0}^2\) gives the amount of the one-step-ahead forecast error variance of x due to the e_x shock, and \(\psi_{xz,0}^2\) gives the amount due to the e_z shock. In practice, one usually reports the fractions \(\psi_{xx,0}^2/(\psi_{xx,0}^2 + \psi_{xz,0}^2)\). More formally, we can write
\[ \mathrm{Var}_t(y_{t+1}) = \Psi_0 \Psi_0'. \]
Define
\[ I_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad I_2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \]
Then the part of the one-step-ahead forecast error variance due to the first shock is \(\Psi_0 I_1 \Psi_0'\), and the part due to the second shock is \(\Psi_0 I_2 \Psi_0'\). Generalizing to k steps,
\[ \mathrm{Var}_t(y_{t+k}) = \sum_{j=0}^{k-1} \Psi_j \Psi_j'. \]
Then
\[ w_{k,\tau} = \sum_{j=0}^{k-1} \Psi_j I_\tau \Psi_j' \]
is the variance of the k-step-ahead forecast errors due to the τ-th shock, and the total variance is the sum of these components, i.e., \(\mathrm{Var}_t(y_{t+k}) = \sum_{\tau} w_{k,\tau}\).
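A minimal sketch (an assumption, not the notes' code) of the forecast error variance decomposition at horizon k, given a list of orthogonalized MA coefficient matrices Psi_0, ..., Psi_{k-1}; the function name and example matrices are hypothetical.
fevd <- function(Psi, k) {
  n <- nrow(Psi[[1]])
  total <- Reduce(`+`, lapply(Psi[1:k], function(P) P %*% t(P)))      # Var_t(y_{t+k})
  shares <- sapply(1:n, function(tau) {
    Itau <- matrix(0, n, n); Itau[tau, tau] <- 1                       # selector matrix I_tau
    diag(Reduce(`+`, lapply(Psi[1:k], function(P) P %*% Itau %*% t(P))))
  })
  shares / diag(total)     # row i, column tau: share of variable i's variance due to shock tau
}
Psi <- list(diag(2), matrix(c(0.5, 0.1, 0.2, 0.3), 2, 2))   # hypothetical Psi_0, Psi_1
fevd(Psi, k = 2)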
5.4 Granger Causality
The first thing that you learn in econometrics is a caution that putting x on the right-hand side of y = x'β + e does not mean that x "causes" y. Then you learn that causality is not something you can test for statistically, but must be known a priori. It turns out that there is a limited sense in which we can test whether one variable "causes" another and vice versa. Granger-causality statistics examine whether lagged values of one variable help to predict another variable. The variable y fails to Granger-cause the variable x if, for all s > 0, the MSE of a forecast of x_{t+s} based on (x_t, x_{t−1}, ...) is the same as the MSE of a forecast of x_{t+s} that uses both (x_t, x_{t−1}, ...) and (y_t, y_{t−1}, ...). If one considers only linear functions, y fails to Granger-cause x if
\[ MSE[\hat{E}(x_{t+s} \mid x_t, x_{t-1}, \ldots)] = MSE[\hat{E}(x_{t+s} \mid x_t, x_{t-1}, \ldots, y_t, y_{t-1}, \ldots)]. \]
Equivalently, we say that x is exogenous in the time series sense with respect to y if the above holds, or that y is not linearly informative about future x.
In a bivariate VAR model describing x and y, x does not Granger-cause y if the coefficient matrices Φ_j are lower triangular for all j, i.e.,
\[ \begin{pmatrix} x_t \\ y_t \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} \phi_{11}^{(1)} & 0 \\ \phi_{21}^{(1)} & \phi_{22}^{(1)} \end{pmatrix} \begin{pmatrix} x_{t-1} \\ y_{t-1} \end{pmatrix} + \cdots + \begin{pmatrix} \phi_{11}^{(p)} & 0 \\ \phi_{21}^{(p)} & \phi_{22}^{(p)} \end{pmatrix} \begin{pmatrix} x_{t-p} \\ y_{t-p} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}. \]
Granger causality can be tested by conducting an F-test of the null hypothesis
\[ H_0: \phi_{21}^{(1)} = \phi_{21}^{(2)} = \cdots = \phi_{21}^{(p)} = 0. \]
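A minimal sketch (an assumption, not the notes' code) of a Granger-causality F-test with p = 2 lags, comparing restricted and unrestricted OLS regressions; the simulated data-generating process is hypothetical.
set.seed(1)
x <- arima.sim(list(ar = 0.5), 200)
y <- filter(0.3 * x, 0.4, method = "recursive") + rnorm(200)   # x helps predict y by construction
d <- as.data.frame(ts.intersect(y = y, y1 = lag(y, -1), y2 = lag(y, -2),
                                x1 = lag(x, -1), x2 = lag(x, -2)))
unrestricted <- lm(y ~ y1 + y2 + x1 + x2, data = d)
restricted   <- lm(y ~ y1 + y2, data = d)
anova(restricted, unrestricted)    # F-test of H0: the lags of x have zero coefficients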
The first and most famous application of Granger causality was the question of whether "money growth causes changes in GNP". Friedman and Schwartz (1963) documented a correlation between money growth and GNP. But Tobin (1970) argued that a phase lead and a correlation may not indicate causality. Sims (1980) answered this criticism and showed that money Granger-causes GNP and not vice versa (he found different results later). Sims (1980) analyzed the following regression to study the effect of money on GNP:
\[ y_t = \sum_{j=0}^{\infty} b_j m_{t-j} + u_t. \]
This regression is known as a “St. Louis Fed” equation. The coefficients were interpreted as the response of y to changes in m, i.e. if the Fed sets m, {bj } gives the response of y. Since the coefficients were relatively big, it implied that constant money growth rules were desirable. The obvious objection to this statement is that coefficients may reflect reverse causality: the Fed sets money in anticipation of subsequent economic growth, or the Fed sets money in response to past y. This means that the error term u is correlated with current and lagged m so OLS estimates of the parameters b are inconsistent due to the endogeneity problem. Why is “Granger causality” not “causality”? Granger causality is not causality because of the possible effect of other variables. If x leads to y with one lag but to z with two lags, then y will Granger cause z in bivariate system. The reason is that y will help forecast z because it reveals information about the “true
Table 5.1: Sims variance decomposition in three variable VAR model
            Explained by shocks to
Var. of      M1    IP    WPI
M1           97     2      1
IP           37    44     18
WPI          14     7     80

Table 5.2: Sims variance decomposition including interest rates
            Explained by shocks to
Var. of       R    M1     IP    WPI
R            50    19      4     28
M1           56    42      1      1
IP            2    32     60      6
WPI          30     4     14     52
cause" x. But it does not follow that if you change y then a change in z will follow. This would not be a problem if the estimated pattern of causality in macroeconomic time series were stable over the inclusion of several variables. An example by Sims (1980) illustrates that this is often not the case. Sims (1980) estimated a three-variable VAR with money, industrial production and the wholesale price index, and a four-variable VAR with the interest rate, money, industrial production and the wholesale price index. The results are in Tables 5.1 and 5.2. The first row in Table 5.1 verifies that M1 is exogenous because it does not respond to other variables' shocks. The second row shows that M1 "causes" changes in IP, since 37% of the 48-month-ahead variance of IP is due to M1 shocks. The third row is puzzling because it shows that WPI is exogenous. Table 5.2 shows what happens when one more variable, the interest rate, is added to the model. The second row shows a substantial response of M1 to interest rate shocks. In this model M1
is not exogenous. In the third row one can see that M1 does influence IP; the fourth row shows that M1 does not influence WPI, but the interest rate does.
5.5 Forecasting
To do forecasting, we need to do the following. First, choose the lag length of the VAR using either one of the information criteria (AIC, SIC, AICC) or one of the forecasting criteria; second, estimate the VAR model by OLS and obtain the parameter estimates \(\hat{\Phi}_j\); and finally, construct the h-period-ahead forecasts recursively:
\[ \hat{y}_{t+1} = \hat{\Phi}_1 y_t + \hat{\Phi}_2 y_{t-1} + \cdots + \hat{\Phi}_p y_{t-p+1}, \]
\[ \hat{y}_{t+2} = \hat{\Phi}_1 \hat{y}_{t+1} + \hat{\Phi}_2 y_t + \cdots + \hat{\Phi}_p y_{t-p+2}, \]
\[ \hat{y}_{t+3} = \hat{\Phi}_1 \hat{y}_{t+2} + \hat{\Phi}_2 \hat{y}_{t+1} + \cdots + \hat{\Phi}_p y_{t-p+3}, \]
and so on.
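A minimal sketch (an assumption, not the notes' code) of this recursion, given estimated coefficient matrices Phi (a list), a constant vector cons, and the last p observations; all names and the example values are hypothetical.
var_forecast <- function(Phi, cons, ylast, h) {     # ylast: p x n matrix, row 1 = most recent value
  p <- length(Phi); n <- ncol(ylast)
  hist <- ylast
  out <- matrix(NA, h, n)
  for (s in 1:h) {
    f <- cons
    for (j in 1:p) f <- f + Phi[[j]] %*% hist[j, ]
    out[s, ] <- f
    hist <- rbind(t(f), hist)[1:p, , drop = FALSE]  # the forecast becomes the newest "observation"
  }
  out
}
Phi <- list(matrix(c(0.5, 0.1, 0.2, 0.3), 2, 2))    # hypothetical VAR(1) coefficients
var_forecast(Phi, cons = c(0, 0), ylast = matrix(c(1, 2), 1, 2), h = 3)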
How well do VARs perform these tasks? Because VARs involve current and lagged values of multiple time series, they capture co-movements that cannot be detected in univariate models. VAR summary statistics like Granger-causality tests, impulse response functions and variance decompositions are well-accepted methods for portraying co-movements. Small VARs have become a benchmark against which new forecasting systems are judged. The problem is that small VARs are often unstable and thus poor predictors of the future.
5.6 Problems
1. Write a referee report for the paper by Sims (1992).
2. Use quarterly data for real GDP, GDP deflator, CPI, the Federal Funds rate, Money base measure, M 1, and the index of commodity prices for the period 1959.I-2002.III (file “TEps8data.xls”). Also, use the monthly data for the Total Reserves (file “TotResAdjRR.xls”) and Non-Borrowed Reserves (file “BOGNONBR.xls”) for the period 1959.1-2002.9. Transform monthly data into quarterly data. We examine the VAR model, investigated by Christiano, Eichenbaum and Evans (2000). The model has seven variables: the log of real GDP (Y ), log of consumer price index (CP I), change in the index of sensitive commodity prices (P com), Federal funds rate(F F ), log of total reserves (T R), log of non-borrowed reserves (N BR), and log of M 1 monetary aggregate (M 1). A monetary policy shock in the model is represented by a shock to the Federal funds rate: the information set consists of current and lagged values of Y , CP I and P com, and only lagged values of F F , T R, N BR and M 0. It implies the following ordering of the variables in the model: x-t = (Y, CP I, P com, F F, T R, N BR, M 1). The reference for this paper is Christiano, Eichenbaum and Evans (2000). (a) Construct graphs of all time series data. Comment your results. (b) Estimate a VAR model for xt. (i) Nicely present the impulse response functions representing the response of all variables in the model to a monetary policy shock (F F rate). Carefully explain your results. (ii) Nicely present the variance-decomposition results for all variables for the forecast horizons k = 2, 4, 12, 36. Carefully explain your results. (iii) Conduct Granger Causality tests for all variables in the model. Carefully explain your
results. (c) Most macro economists agree that there was a shift of monetary policy toward inflation during the late 1970s from accommodating to aggressive. Estimate model for two periods 1959.I-1979.II and 1979.III -2002.III. Nicely present the impulse response functions representing the response of all variables in the model to a monetary policy shock (F F rate) for both periods. Carefully explain your results. How do impulse response function reveal the change in the monetary policy? (d) Use the VAR model to construct 8-period-ahead forecasts for all the variables in the model. 3. Some macro economists looked at the N BR and N BR/T R specifications of monetary policy shocks. In the case of a N BR monetary policy shock, the information set is identical to a F F shock, while in the case of a N BR/T R shock the information set includes also the current value of total reserves. Use a recursive scheme for identification and examine the effect of a monetary policy shock for a N BR specification (think how you should reorder the variables in model). (a) Nicely present the impulse response functions representing the response of all variables in the model to a monetary policy shock (the level of N BR). Explain your results. (b) Nicely present the variance-decomposition results for all variables showing the contribution of N BR shock only for the forecast horizons k = 2, 4, 12, 36. Explain your results. 4. Write a program for implementing FAVAR approach of Bernanke et al. (2005).
(a) Run the program and explain the main steps in the estimation. (b) Using the estimation results, carefully explain the effect of monetary policy on 90-day T-bill rate, 1-year T-bond interest rate, 5-year T-bond interest rate and 10-year T-bond interest rate. (c) Carefully explain your finding on the effect of monetary policy on employment. You need to use the impulse response functions for employment series, unemployment series, average hours worked, and new claims for unemployment. (d) Explain the effect of a shock to the Federal Funds rate on different aggregate measures of money supply. (e) Explain the effect of a monetary policy shock on exchange rate and real stock prices. (f) Explain the effect of a monetary policy shock on different measures of GDP (real GDP, different measures of Industrial Production, etc.). (g) What happens to the results if the number of Diffusion Indexes (see Stock and Watson (2002)) is increased from three to five? 5. Assume that you are faced with a task of forecasting different interest rates. Explain how you may apply the Diffusion Indexes approach of Stock and Watson (2002) to use as many variables as possible in the forecasting. 6. Write a referee report for the paper by Bachmeier, Leelahanon and Li (2005). Please think about any possible future projects.
5.7 References
Bachmeier, L., S. Leelahanon and Q. Li (2005). Money growth and inflation in the United States. Working Paper, Department of Economics, Texas A&M University.
Bernanke, B.S., J. Boivin and P. Eliasz (2005). Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. The Quarterly Journal of Economics, 120, 387-422.
Christiano, L.J., M. Eichenbaum and C.L. Evans (2000). Monetary policy shocks: what have we learned and to what end? Handbook of Macroeconomics, Vol. 1A.
Cochrane, J.H. (1994). Shocks. NBER working paper #46984.
Cochrane, J.H. (1997). Time series for macroeconomics and finance. Lecture Notes. http://www-gsb.uchicago.edu/fac/john.cochrane/research/Papers/timeser1.pdf
Friedman, M. and A.J. Schwartz (1963). A Monetary History of the United States, 1867-1960. Princeton University Press.
Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.
Hendry, D.F. and K. Juselius (2000). Explaining cointegration analysis: Part I. The Energy Journal, 21, 1-42.
Newey, W.K. and K.D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703-708.
Sims, C. (1980). Macroeconomics and reality. Econometrica, 48, 1-48.
Sims, C.A. (1992). Interpreting the macroeconomic time series facts: the effects of monetary policy. European Economic Review, 36, 975-1000.
Stock, J.H. and M.W. Watson (2001). Vector autoregressions. Journal of Economic Perspectives, 15, 101-115.
Stock, J.H. and M.W. Watson (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics, 20, 147-162.
Stock, J.H. and M.W. Watson (2003). Introduction to Econometrics. Addison-Wesley.
Tobin, J. (1970). Money and income. Quarterly Journal of Economics.
Chapter 6
Cointegration
6.1 Introduction
Cointegration is a generalization of unit roots to vector processes, as a single series cannot be cointegrated. Cointegration analysis is designed to find linear combinations of variables that remove unit roots. Suppose that two series are each integrated with the following MA representations:
(1 − L) y_t = a(L) u_t,
and
(1 − L) xt = b(L) vt.
In general, linear combinations of y_t and x_t will also have unit roots. But if there is some linear combination, y_t − θ x_t, that is stationary, y_t and x_t are said to be cointegrated, and α = (1, −θ)' is their cointegrating vector. Cointegrating vectors are of considerable interest when they exist, since they determine I(0) relations that hold between variables that are individually non-stationary. As an example, we may look at real GNP and consumption. Each of these series probably has a unit root, but the ratio of consumption to real GNP is stable over long periods of time. Therefore, log consumption minus log GNP is stationary, and log GNP and log consumption are cointegrated. Other possible examples include the dividend/price ratio or money and prices. However, cointegration
does not say anything about the direction of causality.
6.2 Cointegrating Regression
Cointegrating vectors are "super-consistent", which means that you can estimate them by OLS even when the right-hand-side variables are correlated with the error term, and the estimates converge at a faster rate than usual OLS estimates. Suppose y_t and x_t are cointegrated, so that y_t − θ x_t is stationary. Estimate the following model using OLS regression:
\[ y_t = \beta x_t + e_t. \tag{6.1} \]
OLS estimates of β converge to θ, even if the errors are correlated with x_t. Note that if y_t and x_t are each individually I(1) but are not cointegrated, then the regression in equation (6.1) results in a spurious regression. Therefore, you have to check whether the estimated residuals \(\hat{e}_t\) are I(1) or I(0). We will discuss this later in the notes.
Representation of Cointegrating System
Let y_t be a first-difference-stationary vector time series. The elements of y_t are cointegrated if there is at least one vector α, the cointegrating vector, such that α'y_t is stationary in levels. Since the difference of y_t is stationary, it has a moving average representation
(1 − L) y_t = A(L) e_t.
Since the stationarity of α'y_t is an extra restriction, it must imply a restriction on A(L). Similar to the univariate Beveridge-Nelson (1981) decomposition, the multivariate Beveridge-Nelson decomposition can be done in the
same way:
\[ y_t = z_t + c_t, \]
where (1 − L) z_t = A(1) e_t and c_t = A^*(L) e_t with \(A_j^* = -\sum_{k=j+1}^{\infty} A_k\). The restriction on A(1) implied by cointegration is: the elements of y_t are cointegrated with cointegrating vectors α if and only if (iff) α'A(1) = 0. This implies that the rank of A(1) is the number of elements of y_t minus the number of cointegrating vectors α. There are three cases for A(1): First, A(1) = 0 iff y_t is stationary in levels and all linear combinations of y_t are stationary in levels. Second, A(1) is not full rank iff (1 − L) y_t is stationary and some linear combinations α'y_t are stationary. Finally, A(1) has full rank iff (1 − L) y_t is stationary and no linear combinations of y_t are stationary.
Impulse response function
A(1) is the limiting impulse-response of the levels of the vector y_t = (x_t, z_t)'. To see how cointegration affects A(1), consider a simple case, α = (1, −1)'. The reduced rank of A(1) means α'A(1) = 0, or
\[ (1\ \ {-1}) \begin{pmatrix} A(1)_{xx} & A(1)_{xz} \\ A(1)_{zx} & A(1)_{zz} \end{pmatrix} = 0. \]
Therefore, A(1)_{xx} = A(1)_{zx} and A(1)_{xz} = A(1)_{zz}, so that each variable's long-run response to a shock must be the same.
6.3 Testing for Cointegration
There are several ways to decide whether variables can be modeled as cointegrated: First, use expert knowledge and economic theory. Second, graph the series and see whether they appear to have a common stochastic trend. Finally, perform statistical tests for cointegration.
All three methods should be used in practice. We will consider a residual-based statistical test for cointegration.
Testing for cointegration when the cointegrating vector is known
Sometimes, a researcher may know the cointegrating vector based on the economic theory. For example, the hypothesis of purchasing power parity implies that: Pt = St × Pt∗,
where P_t is an index of the price level in the U.S., S_t is the exchange rate ($/Chinese Yuan), and P_t^* is a price index for China. Taking logs, this equation can be written as
\[ p_t = s_t + p_t^*. \]
A weaker version of the hypothesis is that the variable v_t defined by
\[ v_t = p_t - s_t - p_t^* = (1,\ -1,\ -1) \begin{pmatrix} p_t \\ s_t \\ p_t^* \end{pmatrix} \]
is stationary, even though the individual elements (p_t, s_t, p_t^*) are all I(1). In this case the cointegrating vector α is known to be (1, −1, −1). Testing for cointegration in this case consists of several steps:
1. Verify that p_t, s_t, p_t^* are each individually I(1). This will be true if (a) you test for a unit root in the levels of these series and cannot reject the null hypothesis of a unit root (ADF or other unit root tests), and (b) you test for a unit root in the first differences of these series and reject the null hypothesis of a unit root (ADF or other unit root tests).
2. Test whether the series v_t is stationary. A small illustration follows.
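A minimal sketch (an assumption, with simulated series standing in for the price indices and exchange rate) of testing cointegration with the known cointegrating vector (1, −1, −1) by applying the ADF test to v_t.
library(tseries)
set.seed(1)
pstar <- cumsum(rnorm(300))            # simulated I(1) foreign (log) price index
s     <- cumsum(rnorm(300))            # simulated I(1) (log) exchange rate
p     <- s + pstar + rnorm(300)        # PPP holds up to a stationary error
v     <- p - s - pstar                 # known cointegrating vector (1, -1, -1)
adf.test(v)                            # a small p-value supports stationarity of v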
Table 6.1: Critical values for the Engle-Granger ADF statistic
Number of X's in equation (6.1)    10%      5%      1%
1                                 -3.12   -3.41   -3.96
2                                 -3.52   -3.80   -4.36
3                                 -3.84   -4.16   -4.73
4                                 -4.20   -4.49   -5.07
Testing for cointegration when the cointegrating vector is unknown
Consider an example in which two series y_t and x_t are cointegrated with cointegrating vector α = (1, −θ)', so that v_t = y_t − θ x_t is stationary. However, the cointegrating coefficient θ is not known. To estimate θ, we can use the Engle-Granger Augmented Dickey-Fuller (EG-ADF) test for cointegration, which consists of the following steps:
1. Verify that y_t and x_t are each individually I(1).
2. Estimate the cointegrating coefficient θ using OLS estimation of the regression y_t = µ + θ x_t + v_t.
3. Use a Dickey-Fuller t-test (with intercept µ but no time trend) to test for a unit root in the residuals from this regression, \(\hat{v}_t\).
Since we estimate the residuals in the first step, we need to use different critical values for the unit root test. Critical values for the EG-ADF statistic are given in Table 6.1, which is taken from Stock and Watson (2003). If x_t and y_t are cointegrated, then the OLS estimator of the coefficient in the regression is super-consistent. However, the OLS estimator has a non-normal distribution, and inferences based on its t-statistic can be misleading. To avoid this problem, Stock and Watson (1993) developed the dynamic OLS (DOLS) estimator of θ from the
following regression:
\[ y_t = \mu + \theta x_t + \sum_{j=-p}^{p} \delta_j\, \Delta x_{t-j} + u_t. \]
If x_t and y_t are cointegrated, statistical inferences about θ and the δ's based on HAC standard errors are valid. If x_t were strictly exogenous, then the coefficient on x_t, θ, would be the long-run cumulative multiplier, that is, the long-run effect on y of a change in x. See the long-run multiplier between oil and gasoline prices in the paper by Borenstein et al. (1997).
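A minimal sketch (an assumption, using simulated data) of the Engle-Granger two-step procedure: estimate θ by OLS, then apply a Dickey-Fuller-type test to the residuals and compare the statistic with the EG-ADF critical values in Table 6.1.
library(tseries)
set.seed(1)
x <- cumsum(rnorm(250))                              # I(1) regressor
y <- 1 + 2 * x + arima.sim(list(ar = 0.5), 250)      # cointegrated with theta = 2
step1 <- lm(y ~ x)                                   # step 1: estimate the cointegrating coefficient
vhat  <- residuals(step1)
adf.test(vhat)     # step 2: unit root test on residuals; use the EG-ADF critical values of Table 6.1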
6.4 Cointegrated VAR Models
We start with the autoregressive representation of the levels of y_t, B(L) y_t = e_t:
\[ y_t = -B_1 y_{t-1} - B_2 y_{t-2} - \cdots + e_t. \]
Applying the BN decomposition B(L) = B(1) + (1 − L)B^*(L), we obtain
\[ [B(1) + (1 - L)B^*(L)]\, y_t = B(1)\, y_t + B^*(L)\, \Delta y_t = e_t, \]
so that
\[ y_t = -[B_1 + B_2 + \cdots]\, y_{t-1} - \sum_{j=1}^{\infty} B_j^* \Delta y_{t-j} + e_t. \]
Subtracting y_{t−1} from both sides, we get
\[ \Delta y_t = -B(1)\, y_{t-1} - \sum_{j=1}^{\infty} B_j^* \Delta y_{t-j} + e_t. \]
The matrix B(1) controls the cointegration properties: 1. If B(1) is full rank, any linear combination of yt is stationary and yt is stationary. In this case, we run a normal VAR.
2. If B(1) has rank between 0 and full rank, there are some linear combinations of y_t that are stationary, so y_t is cointegrated. In this case the VAR in levels is consistent but inefficient (if you know the cointegrating vector) and the VAR in differences is misspecified.
3. If B(1) has rank zero, then no linear combination of y_t is stationary, ∆y_t is stationary, and there is no cointegration. In this case we run a normal VAR in differences.
Error Correction Representation
If B(1) has less than full rank, we can express it as B(1) = γα'. If there are K cointegrating vectors, then the rank of B(1) is K, and γ and α each have K columns. Then the system can be rewritten as
\[ \Delta y_t = -\gamma \alpha' y_{t-1} - \sum_{j=1}^{\infty} B_j^* \Delta y_{t-j} + e_t, \]
where α' is a K × N matrix of cointegrating vectors. The above expression is the well-known error-correction model (ECM) representation of the integrated system. It is not easy to estimate this model when all the cointegrating vectors in α are unknown. Consider a multivariate model consisting of two variables x_t and z_t which are individually I(1). One may model these two variables using one of the following models:
1. A VAR model in levels
2. A VAR in first differences
3. An ECM representation.
With cointegration, a pure VAR in differences is misspecified:
\[ \Delta x_t = a(L) \Delta x_{t-1} + b(L) \Delta z_{t-1} + e_t, \]
\[ \Delta z_t = c(L) \Delta x_{t-1} + d(L) \Delta z_{t-1} + v_t. \]
Looking at the error-correction form, there is a missing regressor, α_x x_{t−1} + α_z z_{t−1}. This is a problem. A pure VAR in levels is a little unconventional since the variables in the model are nonstationary. The VAR in levels is not misspecified and the estimates are consistent, but the coefficients may have non-standard distributions and they are not efficient. If there is cointegration, it imposes restrictions on B(1) that are not imposed in a pure VAR in levels. Cochrane (1994) suggested that one way to impose cointegration is to run an error-correction VAR:
\[ \Delta x_t = \gamma_x (\alpha_x x_{t-1} + \alpha_z z_{t-1}) + a(L) \Delta x_{t-1} + b(L) \Delta z_{t-1} + e_t, \]
\[ \Delta z_t = \gamma_z (\alpha_x x_{t-1} + \alpha_z z_{t-1}) + c(L) \Delta x_{t-1} + d(L) \Delta z_{t-1} + v_t. \]
This specification imposes that x and z are cointegrated with cointegrating vector α. This is very useful if you know that the variables are cointegrated and you know the cointegrating vector. Otherwise, you have to pre-test for cointegration and estimate the cointegrating vector in a separate step. Another difficulty with the error-correction form is that it does not fit nicely into standard VAR packages. A way to use standard packages is to estimate the companion form:
\[ \Delta x_t = a(L) \Delta x_{t-1} + b(L)(\alpha_x x_{t-1} + \alpha_z z_{t-1}) + e_t, \]
\[ \alpha_x x_t + \alpha_z z_t = c(L) \Delta x_{t-1} + d(L)(\alpha_x x_{t-1} + \alpha_z z_{t-1}) + v_t. \]
We need to know the cointegrating vector to use this procedure. There is much debate as to which approach is best. When you do not
really know whether there is cointegration or what the cointegrating vector is, the VAR in levels seems to be better. When you know that there is cointegration and what the cointegrating vector is, the error-correction model or the VAR in companion form is better. Some unit root and cointegration tests are provided by the packages tseries, urca and uroot in R. For example, in tseries, the function po.test performs the Phillips-Ouliaris (1990) test of the null hypothesis that x is not cointegrated. There are many other test methods available in the packages urca and uroot; for details, see their manuals.
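A short sketch (an assumption, using a simulated cointegrated pair) of calling po.test; the first column of the supplied matrix is regressed on the remaining columns.
library(tseries)
set.seed(2)
x2 <- cumsum(rnorm(250))                 # I(1) series
y2 <- 0.5 * x2 + rnorm(250)              # cointegrated with x2
po.test(cbind(y2, x2))                   # null: no cointegration; expect rejection here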
6.5 Problems
1. The interest rates may be found in the Excel file “IntRates.xls”. Check whether 90-day T-bill rate and 10-year T-bond interest rate are cointegrated. Carefully explain how you conduct the test and how you interpret your findings. 2. Download data (by yourself) from the web site to have data on consumer price index (CPI), producer price index (PPI), threemonth T-bill rate, the index of industrial production, S&P 500 common stock price index. The data for industrial production is seasonally adjusted while all other variables are not seasonally adjusted. Conduct a test of cointegration between the variables. Explain your results. 3. Collect the following macroeconomic annual time series of China from China Statistical Yearbook: GNP, GDP, GDP1 (GDP of primary industry), GDP2 (GDP of secondary industry), GDP3 (GDP of tertiary industry), and per capita GDP. From both the nominal and real terms of the definitions of national products, derive the corresponding price deflators.
(a) Define and test the unit roots for each of the 18 economic variables (nominal, real, and deflator of GDPs) assuming no structural break in the series. (b) Define and test the unit roots for each of the 18 economic variables (nominal, real, and deflator of GDPs) assuming a one-time structural break in the series. (c) Conduct cointegration tests for some of the 18 economic variables. Explain your results.
6.6 References
Bernanke, B.S., J. Boivin and P. Eliasz (2005). Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. The Quarterly Journal of Economics, 120, 387-422.
Borenstein, S., A.C. Cameron and R. Gilbert (1997). Do gasoline prices respond asymmetrically to crude oil price changes? The Quarterly Journal of Economics, 112, 305-339.
Christiano, L.J., M. Eichenbaum and C.L. Evans (2000). Monetary policy shocks: what have we learned and to what end? Handbook of Macroeconomics, Vol. 1A.
Cochrane, J.H. (1994). Shocks. NBER working paper #46984.
Cochrane, J.H. (1997). Time series for macroeconomics and finance. Lecture Notes.
Engle, R.F. and C.W.J. Granger (1987). Cointegration and error correction: Representation, estimation and testing. Econometrica, 55, 251-276.
Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.
Hendry, D.F. and K. Juselius (2000). Explaining cointegration analysis: Part I. The Energy Journal, 21, 1-42.
Phillips, P.C.B. and S. Ouliaris (1990). Asymptotic properties of residual based tests for cointegration. Econometrica, 58, 165-193.
Stock, J.H. and M.W. Watson (1993). A simple estimator of cointegrating vectors in higher order integrated systems. Econometrica, 61, 1097-1107.
Stock, J.H. and M.W. Watson (2003). Introduction to Econometrics. Addison-Wesley.
Chapter 7
Nonparametric Density, Distribution & Quantile Estimation
7.1 Mixing Conditions
It is well known that α-mixing includes many time series models as a special case. In fact, under very mild assumptions linear autoregressive and more generally bilinear time series models are α-mixing with mixing coefficients decaying exponentially. Many nonlinear time series models, such as functional coefficient autoregressive processes with/without exogenous variables, ARCH and GARCH type processes, stochastic volatility models, and nonlinear additive autoregressive models with/without exogenous variables, are strong mixing under some mild conditions. See Cai (2002) and Chen and Tang (2005) for more details. To simplify the notation, we only introduce mixing conditions for strictly stationary processes (in spite of the fact that a mixing process is not necessarily stationary). The idea is to define mixing coefficients to measure the strength (in different ways) of dependence for the two segments of a time series which are apart from each other in time.
Let {X_t} be a strictly stationary time series. For n ≥ 1, define
\[ \alpha(n) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\ B \in \mathcal{F}_{n}^{\infty}} |P(A)P(B) - P(AB)|, \]
where \(\mathcal{F}_i^j\) denotes the σ-algebra generated by {X_t; i ≤ t ≤ j}. Note that \(\mathcal{F}_n^{\infty} \downarrow\). If α(n) → 0 as n → ∞, {X_t} is called α-mixing or strong mixing. There are several other mixing conditions such as ρ-mixing, β-mixing, φ-mixing, and ψ-mixing; see the books by Hall and Heyde (1980) and Fan and Yao (2003, page 68). It is well known that the relationships among the mixing conditions are ψ-mixing =⇒ φ-mixing =⇒ ρ-mixing and β-mixing =⇒ α-mixing.
Lemma 1 (Davydov's inequality): (i) If E|X_i|^p + E|X_j|^q < ∞ for some p ≥ 1 and q ≥ 1 with 1/p + 1/q < 1, it holds that
\[ |\mathrm{Cov}(X_i, X_j)| \le 8\, \alpha^{1/r}(|j - i|)\, \{E|X_i|^p\}^{1/p} \{E|X_j|^q\}^{1/q}, \]
where r = (1 − 1/p − 1/q)−1. (ii) If P (|Xi| ≤ C1) = 1 and P (|Xj | ≤ C2) = 1 for some constants C1 and C2, it holds that |Cov(Xi, Xj )| ≤ 4 α(|j − i|) C1 C2.
Note that if we allow X_i and X_j to be complex-valued random variables, (ii) still holds with the coefficient "4" on the RHS of the inequality replaced by "16".
7.2 Density Estimate
Let {X_i} be a random sample with an (unknown) marginal distribution F(·) (CDF) and probability density function (PDF) f(·). The question is how to estimate f(·) and F(·). Since
\[ F(x) = P(X_i \le x) = E[I(X_i \le x)] = \int_{-\infty}^{x} f(u)\, du \]
and
\[ f(x) = \lim_{h \downarrow 0} \frac{F(x+h) - F(x-h)}{2h} \approx \frac{F(x+h) - F(x-h)}{2h} \]
if h is very small, by the method of moment estimation (MME), F(x) can be estimated by
\[ F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x), \]
which is called the empirical cumulative distribution function (ecdf), so that f(x) can be estimated by
\[ f_n(x) = \frac{F_n(x+h) - F_n(x-h)}{2h} = \frac{1}{n} \sum_{i=1}^{n} K_h(X_i - x), \]
where K(u) = I(|u| ≤ 1)/2 and K_h(u) = K(u/h)/h. Indeed, the kernel function K(u) can be taken to be any symmetric density function.
Exercise: Please show that F_n(x) is an unbiased estimate of F(x) but f_n(x) is a biased estimate of f(x). Think intuitively about (1) why f_n(x) is biased, (2) where the bias comes from, and (3) why K(·) should be symmetric.
7.2.1 Asymptotic Properties
Let us look at the variance of these estimators. If {X_i} is stationary, then
\[ \mathrm{Var}(F_n(x)) = \frac{1}{n} \mathrm{Var}(I(X_i \le x)) + \frac{2}{n} \sum_{i=2}^{n} \left(1 - \frac{i-1}{n}\right) \mathrm{Cov}(I(X_1 \le x), I(X_i \le x)), \]
so that
\[ n\, \mathrm{Var}(F_n(x)) = F(x)[1 - F(x)] + 2 \sum_{i=2}^{n} \mathrm{Cov}(I(X_1 \le x), I(X_i \le x)) - 2 \sum_{i=2}^{n} \frac{i-1}{n}\, \mathrm{Cov}(I(X_1 \le x), I(X_i \le x)). \]
The second term converges (by assuming that σ²(x) < ∞) and the third term converges to 0 by the Kronecker Lemma, so that
\[ n\, \mathrm{Var}(F_n(x)) \to \sigma_F^2(x) \equiv F(x)[1 - F(x)] + 2 \sum_{i=2}^{\infty} \mathrm{Cov}(I(X_1 \le x), I(X_i \le x)), \]
where the last (infinite-sum) term is denoted A_d. Therefore,
\[ n\, \mathrm{Var}(F_n(x)) \to \sigma_F^2(x). \tag{7.1} \]
It is clear that A_d = 0 if {X_i} are independent. If A_d ≠ 0, the question is how to estimate it. We can use the HC estimator of White (1980) or the HAC estimator of Newey and West (1987); see Section 3.10. Next, we derive the asymptotic variance of f_n(x). First, define Z_i = K_h(X_i − x). Then,
\[ E[Z_1 Z_i] = \int\!\!\int K_h(u - x) K_h(v - x) f_{1,i}(u, v)\, du\, dv = \int\!\!\int K(u) K(v) f_{1,i}(x + uh, x + vh)\, du\, dv \to f_{1,i}(x, x), \]
where f_{1,i}(u, v) is the joint density of (X_1, X_i), so that Cov(Z_1, Z_i) → f_{1,i}(x, x) − f²(x). It is easy to show that
\[ h\, \mathrm{Var}(Z_1) \to \nu_0(K) f(x), \]
where \(\nu_j(K) = \int u^j K^2(u)\, du\). Therefore,
\[ n h\, \mathrm{Var}(f_n(x)) = h\, \mathrm{Var}(Z_1) + 2h \sum_{i=2}^{n} \left(1 - \frac{i-1}{n}\right) \mathrm{Cov}(Z_1, Z_i) \to \nu_0(K) f(x), \]
where the second term on the right (denoted A_f) converges to 0 under some assumptions.
To show that A_f → 0, let d_n → ∞ and d_n h → 0. Then,
\[ |A_f| \le h \sum_{i=2}^{d_n} |\mathrm{Cov}(Z_1, Z_i)| + h \sum_{i=d_n+1}^{n} |\mathrm{Cov}(Z_1, Z_i)|. \]
For the first term, if f_{1,i}(u, v) ≤ M_1, then it is bounded by h d_n = o(1). For the second term, we apply Davydov's inequality to obtain
\[ h \sum_{i=d_n+1}^{n} |\mathrm{Cov}(Z_1, Z_i)| \le M_2 \sum_{i=d_n+1}^{n} \alpha(i)/h = O(d_n^{-\beta+1} h^{-1}) \]
if α(n) = O(n^{−β}) for some β > 2. If d_n = O(h^{−2/β}), then the second term is dominated by O(h^{1−2/β}), which goes to 0 as n → ∞. Hence,
\[ n h\, \mathrm{Var}(f_n(x)) \to \nu_0(K) f(x). \tag{7.2} \]
We can establish the following asymptotic normality for f_n(x), but the proof will be discussed later.
Theorem 1: Under regularity conditions, we have
\[ \sqrt{nh} \left[ f_n(x) - f(x) - \frac{h^2}{2} \mu_2(K) f''(x) + o_p(h^2) \right] \to N(0, \nu_0(K) f(x)). \]
Exercise: By comparing (7.1) and (7.2), what can you observe?
Example 1: Let us examine how important the choice of bandwidth is. The data {X_i}_{i=1}^n are generated from N(0, 1) (iid) and n = 300. The grid points are taken to be [−4, 4] with an increment ∆ = 0.1. The bandwidth is taken to be 0.25, 0.5 and 1.0, respectively, and the kernel can be the Epanechnikov or Gaussian kernel.
Example 2: Next, we apply kernel density estimation to the density of the weekly 3-month Treasury bill rate from January 2, 1970 to December 26, 1997.
Note that the computer code in R for the above two examples can be found in Section 7.5. R has a built-in function density() for computing nonparametric density estimates. Also, you can use the command plot(density()) to plot the estimated density. Further, R has a built-in function ecdf() for computing the empirical cumulative distribution function and plot(ecdf()) for plotting the step function.
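A minimal sketch along the lines of Example 1 (an assumption; the notes' own code is in Section 7.5), using the built-in functions density() and ecdf() with simulated N(0,1) data and several bandwidths.
set.seed(1)
x <- rnorm(300)
plot(density(x, bw = 0.25, kernel = "epanechnikov"), main = "Kernel density estimates")
lines(density(x, bw = 0.5,  kernel = "epanechnikov"), lty = 2)   # h = 0.5
lines(density(x, bw = 1.0,  kernel = "epanechnikov"), lty = 3)   # h = 1.0 (oversmoothed)
plot(ecdf(x), main = "Empirical CDF")                            # step-function estimate of F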
7.2.2 Optimality
As we have already shown,
\[ E(f_n(x)) = f(x) + \frac{h^2}{2} \mu_2(K) f''(x) + o(h^2) \]
and
\[ \mathrm{Var}(f_n(x)) = \frac{\nu_0(K) f(x)}{n h} + o((nh)^{-1}), \]
so that the asymptotic mean integrated squared error (AMISE) is
\[ \mathrm{AMISE} = \frac{h^4}{4}\, \mu_2^2(K) \int [f''(x)]^2\, dx + \frac{\nu_0(K)}{n h}. \]
Minimizing the AMISE gives the optimal bandwidth
\[ h_{opt} = C_1(K)\, \|f''\|_2^{-2/5}\, n^{-1/5}, \tag{7.3} \]
where \(C_1(K) = \left[ \nu_0(K)/\mu_2^2(K) \right]^{1/5}\). With this asymptotically optimal bandwidth, the optimal AMISE is given by
\[ \mathrm{AMISE}_{opt} = \frac{5}{4}\, C_2(K)\, \|f''\|_2^{2/5}\, n^{-4/5}, \]
where \(C_2(K) = \left[ \nu_0^2(K)\, \mu_2(K) \right]^{2/5}\).
To choose the best kernel, it suffices to choose one to minimize C_2(K).
Proposition 1: The nonnegative probability density function K minimizing C_2(K) is a re-scaling of the Epanechnikov kernel:
\[ K_{opt}(u) = \frac{3}{4a}\left(1 - \frac{u^2}{a^2}\right)_+ \]
for any a > 0.
Proof: First of all, we note that C_2(K_h) = C_2(K) for any h > 0. Let K_0 be the Epanechnikov kernel. For any other nonnegative K, by re-scaling if necessary, we may assume that µ_2(K) = µ_2(K_0). Thus, we need only show that ν_0(K_0) ≤ ν_0(K). Let G = K − K_0. Then
\[ \int G(u)\, du = 0 \quad \text{and} \quad \int u^2 G(u)\, du = 0, \]
which implies that
\[ \int (1 - u^2)\, G(u)\, du = 0. \]
Using this and the fact that K_0 has support [−1, 1], we have
\[ \int G(u) K_0(u)\, du = \frac{3}{4} \int_{|u| \le 1} G(u)(1 - u^2)\, du = -\frac{3}{4} \int_{|u| > 1} G(u)(1 - u^2)\, du = \frac{3}{4} \int_{|u| > 1} K(u)(u^2 - 1)\, du. \]
Since K is nonnegative, so is the last term. Therefore,
\[ \int K^2(u)\, du = \int K_0^2(u)\, du + 2 \int K_0(u) G(u)\, du + \int G^2(u)\, du \ge \int K_0^2(u)\, du, \]
which proves that K_0 is the optimal kernel.
Remark: This proposition implies that the Epanechnikov kernel should be used in practice.
7.2.3 Boundary Correction
In many applications, the density f(·) has a bounded support. For example, the interest rate cannot be less than zero and income is always nonnegative. It is reasonable to assume that the interest rate has support [0, 1). However, because a kernel density estimator smoothly spreads point masses around the observed data points, some of the mass for points near the boundary of the support is distributed outside the support of the density. Therefore, the kernel density estimator underestimates the density in the boundary regions. The problem is more severe for large bandwidths and for the left boundary, where the density is high. Therefore, some adjustments are needed. To gain some further insight, let us assume without loss of generality that the density function f(·) has bounded support [0, 1] and we deal with the density estimate at the left boundary. For simplicity, suppose that K(·) has support [−1, 1]. For the left boundary point x = ch (0 ≤ c < 1), it can easily be seen that as h → 0,
\[ E(f_n(ch)) = \int_{-c}^{1/h - c} f(ch + hu) K(u)\, du = f(0^+)\, \mu_{0,c}(K) + h f'(0^+)[c\, \mu_{0,c}(K) + \mu_{1,c}(K)] + o(h), \tag{7.4} \]
where \(f(0^+) = \lim_{x \downarrow 0} f(x)\),
\[ \mu_{j,c}(K) = \int_{-c}^{\infty} u^j K(u)\, du, \quad \text{and} \quad \nu_{j,c}(K) = \int_{-c}^{\infty} u^j K^2(u)\, du. \]
Also, we can show that Var(f_n(ch)) = O(1/nh). Therefore,
\[ f_n(ch) = f(0^+)\, \mu_{0,c}(K) + h f'(0^+)[c\, \mu_{0,c}(K) + \mu_{1,c}(K)] + o_p(h). \]
In particular, if c = 0 and K(·) is symmetric, then E(f_n(0)) = f(0)/2 + o(1). There are several methods to deal with density estimation at boundary points. Possible approaches include the boundary kernel (see Gasser and Müller (1979) and Müller (1993)), reflection
(see Schuster (1985) and Hall and Wehrly (1991)), transformation (see Wand, Marron and Ruppert (1991) and Marron and Ruppert (1994)), local polynomial fitting (see Hjort and Jones (1996) and Loader (1996)), and others.

Boundary Kernel
One way of choosing a boundary kernel is
$$K_{(c)}(u) = \frac{12}{(1+c)^4}\,(1+u)\left[(1-2c)\,u + \frac{3c^2 - 2c + 1}{2}\right] I_{[-1,\,c]}(u).$$
Note that K_{(1)}(t) = K(t), the Epanechnikov kernel as defined above. Moreover, Zhang and Karunamuni (1998) have shown that this kernel is optimal in the sense of minimizing the MSE in the class of all kernels of order (0, 2) with exactly one change of sign in their support. The downside to the boundary kernel is that it is not necessarily nonnegative, as will be seen for densities with f(0) = 0.

Reflection
The reflection method constructs the kernel density estimate from the synthetic data {±X_t; 1 ≤ t ≤ n}, where the "reflected" data are {−X_t; 1 ≤ t ≤ n} and the original data are {X_t; 1 ≤ t ≤ n}. This results in the estimate
$$f_n(x) = \frac{1}{n}\left[\sum_{t=1}^{n} K_h(X_t - x) + \sum_{t=1}^{n} K_h(-X_t - x)\right], \qquad x \ge 0.$$
Note that when x is away from the boundary, the second term above is practically negligible. Hence, it only corrects the estimate in the boundary region. This estimator is twice the kernel density estimate based on the synthetic data {±X_t; 1 ≤ t ≤ n}.
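A minimal sketch of the reflection estimator, obtained by applying density() to the augmented sample {X_t, -X_t} and doubling the result on [0, ∞), is given below; the exponential data and the bandwidth are arbitrary illustration choices.

# Reflection method for a density supported on [0, infinity)
set.seed(1)
x <- rexp(300)                                   # nonnegative data
h <- 0.3                                         # bandwidth (arbitrary choice)
xx <- seq(0, 4, by = 0.05)                       # grid on the support
# kernel estimate based on the synthetic data {x, -x}; doubling restores total mass 1
fit <- density(c(x, -x), bw = h, kernel = "epanechnikov", from = 0, to = 4, n = length(xx))
f_reflect <- 2 * fit$y
# compare with the uncorrected estimate, which underestimates f near 0
fit0 <- density(x, bw = h, kernel = "epanechnikov", from = 0, to = 4, n = length(xx))
matplot(xx, cbind(f_reflect, fit0$y, dexp(xx)), type = "l", lty = 1:3, xlab = "x", ylab = "")
legend("topright", c("reflection", "uncorrected", "true"), lty = 1:3)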
Transformation
The transformation method first transforms the data by Y_i = g(X_i), where g(·) is a given monotone increasing function mapping the support onto (-\infty, \infty). One then applies the kernel density estimator to the transformed data to obtain an estimate f_n(y) for Y and applies the inverse transform to obtain the density of X. Therefore,
$$f_n(x) = g'(x)\,\frac{1}{n}\sum_{t=1}^{n} K_h(g(X_t) - g(x)).$$
The density at x = 0 corresponds to the tail density of the transformed data since log(0) = -\infty, and it usually cannot be estimated well due to the lack of data in the tails. Except at this point, the transformation method does a fairly good job. If g(·) is unknown, as in many situations, Karunamuni and Alberts (2003) suggested a parametric form and then estimated the parameter. Karunamuni and Alberts (2003) also considered other types of transformations.

Local Likelihood Fitting
The main idea is to consider the approximation \log(f(X_t)) \approx P(X_t - x), where P(u - x) = \sum_{j=0}^{p} a_j\,(u - x)^j, together with the localized version of the log-likelihood
$$\sum_{t=1}^{n} \log(f(X_t))\, K_h(X_t - x) - n\int K_h(u - x)\, f(u)\,du.$$
With this approximation, the local likelihood becomes
$$L(a_0, \cdots, a_p) = \sum_{t=1}^{n} P(X_t - x)\, K_h(X_t - x) - n\int K_h(u - x)\,\exp(P(u - x))\,du.$$
Let \{\widehat{a}_j\} be the maximizer of the above local likelihood L(a_0, \cdots, a_p). Then, the local likelihood density estimate is f_n(x) = \exp(\widehat{a}_0).
If the maximizer does not exist, then f_n(x) = 0. See Loader (1996) and Hjort and Jones (1996) for more details. If R is used for the local likelihood fit for density estimation, please use the function density.lf() in the package locfit.

Exercise: Please conduct a Monte Carlo simulation to see what the boundary effects are and how the correction methods work. For example, you can consider some densities with finite support, such as the beta distribution.
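As a starting point for this exercise, a minimal sketch of such a simulation is given below; the Beta(2,2) density, sample size, bandwidth, number of replications and grid are all arbitrary illustration choices.

# Boundary effect of the uncorrected kernel estimator for a Beta(2,2) density on [0,1]
set.seed(2006)
nsim <- 200; n <- 300; h <- 0.1
grid <- seq(0, 1, by = 0.01)
est <- matrix(0, nsim, length(grid))
for (s in 1:nsim) {
  x <- rbeta(n, 2, 2)
  est[s, ] <- density(x, bw = h, kernel = "epanechnikov", from = 0, to = 1, n = length(grid))$y
}
bias <- colMeans(est) - dbeta(grid, 2, 2)    # Monte Carlo bias at each grid point
plot(grid, bias, type = "l", xlab = "x", ylab = "bias")
abline(h = 0, lty = 2)                       # the bias is most pronounced near x = 0 and x = 1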
7.3 Distribution Estimation
7.3.1 Smoothed Distribution Estimation
The question is how to obtain a smoothed estimate of the CDF F(x). One way of doing so is to integrate the estimated PDF f_n(x):
$$\widehat{F}_n(x) = \int_{-\infty}^{x} f_n(u)\,du = \frac{1}{n}\sum_{i=1}^{n} \mathcal{K}\!\left(\frac{x - X_i}{h}\right),$$
where \mathcal{K}(x) = \int_{-\infty}^{x} K(u)\,du is the distribution function of K(·). Why do we need this smoothed estimate of the CDF? To answer this question, we need to consider the mean squared error (MSE). First, we derive the asymptotic bias. By integration by parts, we have
$$E\left[\widehat{F}_n(x)\right] = E\left[\mathcal{K}\!\left(\frac{x - X_i}{h}\right)\right] = \int F(x - hu)\, K(u)\,du = F(x) + \frac{h^2}{2}\,\mu_2(K)\, f'(x) + o(h^2).$$
Next, we derive the asymptotic variance. We have
$$E\left[\mathcal{K}^2\!\left(\frac{x - X_i}{h}\right)\right] = \int F(x - hu)\, b(u)\,du = F(x) - h\, f(x)\,\theta + o(h),$$
where b(u) = 2\,\mathcal{K}(u)\, K(u) and \theta = \int u\, b(u)\,du. Then,
$$\mathrm{Var}\left[\mathcal{K}\!\left(\frac{x - X_i}{h}\right)\right] = F(x)[1 - F(x)] - h\, f(x)\,\theta + o(h).$$
Define I_j(x) = \mathrm{Cov}\left(I(X_1 \le x),\, I(X_{j+1} \le x)\right) = F_j(x, x) - F^2(x) and
$$I_{nj}(x) = \mathrm{Cov}\left(\mathcal{K}\!\left(\frac{x - X_1}{h}\right),\, \mathcal{K}\!\left(\frac{x - X_{j+1}}{h}\right)\right).$$
By means of Lemma 2 in Lehmann (1966), the covariance I_{nj}(x) may be written as
$$I_{nj}(x) = \int\!\!\int \left[P\!\left(\mathcal{K}\!\left(\tfrac{x - X_1}{h}\right) > u,\, \mathcal{K}\!\left(\tfrac{x - X_{j+1}}{h}\right) > v\right) - P\!\left(\mathcal{K}\!\left(\tfrac{x - X_1}{h}\right) > u\right) P\!\left(\mathcal{K}\!\left(\tfrac{x - X_{j+1}}{h}\right) > v\right)\right] du\,dv.$$
Inverting the CDF \mathcal{K}(·) and making two changes of variables, the above relation becomes
$$I_{nj}(x) = \int\!\!\int \left[F_j(x - hu,\, x - hv) - F(x - hu)\, F(x - hv)\right] K(u)\, K(v)\,du\,dv.$$
Expanding the right-hand side according to Taylor's formula, we obtain |I_{nj}(x) - I_j(x)| \le C\, h^2. By Davydov's inequality (see Lemma 1), we have |I_{nj}(x) - I_j(x)| \le C\,\alpha(j), so that for any 1/2 < \tau < 1,
$$|I_{nj}(x) - I_j(x)| \le C\, h^{2\tau}\,\alpha^{1-\tau}(j).$$
Therefore,
$$\frac{1}{n}\sum_{j=1}^{n-1}(n - j)\,|I_{nj}(x) - I_j(x)| \le \sum_{j=1}^{n-1}|I_{nj}(x) - I_j(x)| \le C\, h^{2\tau}\sum_{j=1}^{\infty}\alpha^{1-\tau}(j) = O(h^{2\tau}),$$
provided that \sum_{j=1}^{\infty}\alpha^{1-\tau}(j) < \infty for some 1/2 < \tau < 1. Indeed, this assumption is satisfied if \alpha(n) = O(n^{-\beta}) for some \beta > 2. By stationarity, it is clear that
$$n\,\mathrm{Var}\left[\widehat{F}_n(x)\right] = \mathrm{Var}\left[\mathcal{K}\!\left(\frac{x - X_1}{h}\right)\right] + \frac{2}{n}\sum_{j=1}^{n-1}(n - j)\, I_{nj}(x).$$
Therefore,
$$n\,\mathrm{Var}\left[\widehat{F}_n(x)\right] = F(x)[1 - F(x)] - h\, f(x)\,\theta + o(h) + 2\sum_{j=1}^{\infty} I_j(x) + O(h^{2\tau}) = \sigma_F^2(x) - h\, f(x)\,\theta + o(h).$$
We can establish the following asymptotic normality for \widehat{F}_n(x); the proof will be discussed later.

Theorem 2: Under regularity conditions, we have
$$\sqrt{n}\left[\widehat{F}_n(x) - F(x) - \frac{h^2}{2}\,\mu_2(K)\, f'(x) + o_p(h^2)\right] \to N\!\left(0,\,\sigma_F^2(x)\right).$$
Similarly, we have
$$n\,\mathrm{AMSE}(\widehat{F}_n(x)) = \frac{n\,h^4}{4}\,\mu_2^2(K)\,[f'(x)]^2 + \sigma_F^2(x) - h\, f(x)\,\theta.$$
If \theta > 0, minimizing the AMSE gives the optimal bandwidth
$$h_{opt} = \left\{\frac{\theta\, f(x)}{\mu_2^2(K)\,[f'(x)]^2}\right\}^{1/3} n^{-1/3},$$
and with this asymptotically optimal bandwidth, the optimal AMSE is given by
$$n\,\mathrm{AMSE}_{opt}(\widehat{F}_n(x)) = \sigma_F^2(x) - \frac{3}{4}\left\{\frac{\theta^2\, f^2(x)}{\mu_2(K)\, f'(x)}\right\}^{2/3} n^{-1/3}.$$
Remark: From the above expressions, we can see that if \theta > 0, the AMSE of \widehat{F}_n(x) can be smaller than that of F_n(x) in the second order. Also, it is easy to see that if K(·) is the Epanechnikov kernel, then \theta > 0.

7.3.2 Relative Efficiency and Deficiency
To measure the relative efficiency and deficiency of \widehat{F}_n(x) over F_n(x), we define
$$i(n) = \min\left\{k \in \{1, 2, \ldots\};\; \mathrm{MSE}(\widehat{F}_k(x)) \le \mathrm{MSE}(F_n(x))\right\}.$$
We have the following results, whose detailed proof can be found in Cai and Roussas (1998).

Proposition 2: (i) Under regularity conditions,
$$\frac{i(n)}{n} \to 1 \quad\text{if and only if}\quad n\,h_n^4 \to 0.$$
(ii) Under regularity conditions,
$$\frac{i(n) - n}{n\,h_n} \to \theta(x) \quad\text{if and only if}\quad n\,h_n^3 \to 0,$$
where \theta(x) = f(x)\,\theta/\sigma_F^2(x).
Remark: It is clear that the quantity \theta(x) may be looked upon as a way of measuring the performance of the estimate \widehat{F}_n(x). Suppose that the kernel K(·) is chosen so that \theta > 0, which is equivalent to \theta(x) > 0. Then, for sufficiently large n, i(n) > n + n\,h_n(\theta(x) - \varepsilon). Thus, i(n) is substantially larger than n and, indeed, i(n) - n tends to \infty. Actually, Reiss (1981) and Falk (1983) posed the question of determining the exact value of the superiority of \theta over a certain
class of kernels. More specifically, let K_m denote the class of kernels K: [-1, 1] \to \mathbb{R} which are absolutely continuous and satisfy the requirements K(-1) = 0, K(1) = 1, and \int_{-1}^{1} u^{\mu} K(u)\,du = 0 for \mu = 1, \cdots, m, for some m = 0, 1, \cdots (where the moment condition is vacuous for m = 0). Set \Psi_m = \sup\{\theta;\; K \in K_m\}. Mammitzsch (1984) then answered the question posed in an elegant manner. See Cai and Roussas (1998) for more details and simulation results.

7.4 Quantile Estimation
Let X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)} denote the order statistics of \{X_t\}_{t=1}^{n}. Define the inverse of F(x) as F^{-1}(p) = \inf\{x \in \mathbb{R};\; F(x) \ge p\}, where \mathbb{R} is the real line. The traditional estimate of F(x) has been the empirical distribution function F_n(x) based on X_1, \ldots, X_n, while the estimate of the p-th quantile \xi_p = F^{-1}(p), 0 < p < 1, is the sample quantile function \xi_{pn} = F_n^{-1}(p) = X_{([np])}, where [x] denotes the integer part of x. It is a consistent estimator of \xi_p for \alpha-mixing data (Yoshihara, 1995). However, as stated in Falk (1983), F_n(x) does not take into account the smoothness of F(x), i.e., the existence of a probability density function f(x). In order to incorporate this characteristic, investigators have proposed several smoothed quantile estimates, one of which is based on \widehat{F}_n(x), obtained as a convolution between F_n(x) and a properly scaled kernel function; see the previous section. Finally, note that R has a command quantile() which can be used for computing \xi_{pn}, the nonparametric estimate of the quantile.
7.4.1 Value at Risk
Value at Risk (VaR) is a popular measure of market risk associated with an asset or a portfolio of assets. It has been chosen by the Basel
Committee on Banking Supervision as a benchmark risk measure and has been used by financial institutions for asset management and minimization of risk. Let \{X_t\}_{t=1}^{n} be the market value of an asset over n periods of a time unit, and let Y_t = \log(X_t/X_{t-1}) be the log-returns. Suppose \{Y_t\} is a strictly stationary dependent process with marginal distribution function F(y). Given a positive value p close to zero, the 1 - p level VaR is
$$\nu_p = \inf\{u:\; F(u) \ge p\},$$
which specifies the smallest amount of loss such that the probability of the loss in market value being larger than \nu_p is less than p. Comprehensive discussions on VaR are available in Duffie and Pan (1997) and Jorion (2001), and the references therein. Therefore, VaR can be regarded as a special case of a quantile. R has a package called VaR with a set of methods for the calculation of VaR, particularly for parametric models.

Another popular risk measure is the expected shortfall (ES), which is the expected loss given that the loss is at least as large as some given quantile of the loss distribution (e.g., VaR). It is well known from Artzner, Delbaen, Eber and Heath (1999) that ES is a coherent risk measure in that it satisfies the four axioms: homogeneity (increasing the size of a portfolio by a factor should scale its risk measure by the same factor), monotonicity (a portfolio must have greater risk if it has systematically lower values than another), the risk-free condition or translation invariance (adding some amount of cash to a portfolio should reduce its risk by the same amount), and subadditivity (the risk of a portfolio must be less than the sum of the separate risks, or merging portfolios cannot increase risk). VaR satisfies homogeneity, monotonicity, and the risk-free condition but is not sub-additive. See Artzner et al. (1999) for details.
7.4.2 Nonparametric Quantile Estimation
The smoothed sample quantile estimate of \xi_p, denoted by \widehat{\xi}_p, based on \widehat{F}_n(x), is defined by
$$\widehat{\xi}_p = \widehat{F}_n^{-1}(p) = \inf\left\{x \in \mathbb{R};\; \widehat{F}_n(x) \ge p\right\}.$$
\widehat{\xi}_p is referred to in the literature as the perturbed (smoothed) sample quantile. Asymptotic properties of \widehat{\xi}_p, both under independence and under certain modes of dependence, have been investigated extensively in the literature; see Cai and Roussas (1997) and Chen and Tang (2005).

By the differentiability of \widehat{F}_n(x), we use a Taylor expansion and ignore the higher-order terms to obtain
$$\widehat{F}_n(\widehat{\xi}_p) = p \approx \widehat{F}_n(\xi_p) - f_n(\xi_p)\,(\widehat{\xi}_p - \xi_p), \qquad (7.5)$$
so that
$$\widehat{\xi}_p - \xi_p \approx [\widehat{F}_n(\xi_p) - p]/f_n(\xi_p) \approx [\widehat{F}_n(\xi_p) - p]/f(\xi_p),$$
since f_n(x) is a consistent estimator of f(x). As an application of Theorem 2, we can establish the following theorem for the asymptotic normality of \widehat{\xi}_p; the proof is omitted since it is similar to that of Theorem 2.

Theorem 3: Under regularity conditions, we have
$$\sqrt{n}\left[\widehat{\xi}_p - \xi_p - \frac{h^2}{2}\,\mu_2(K)\, f'(\xi_p)/f(\xi_p) + o_p(h^2)\right] \to N\!\left(0,\;\sigma_F^2(\xi_p)/f^2(\xi_p)\right).$$
Next, let us examine the AMSE. To this end, we derive the asymptotic bias and variance. From the previous section, we have
$$E\left[\widehat{\xi}_p\right] = \xi_p + \frac{h^2}{2}\,\mu_2(K)\, f'(\xi_p)/f(\xi_p) + o_p(h^2),$$
and
$$n\,\mathrm{Var}\left[\widehat{\xi}_p\right] = \sigma_F^2(\xi_p)/f^2(\xi_p) - h\,\theta/f(\xi_p) + o(h).$$
Therefore, the AMSE is
$$n\,\mathrm{AMSE}(\widehat{\xi}_p) = \frac{n\,h^4}{4}\,\mu_2^2(K)\,[f'(\xi_p)/f(\xi_p)]^2 + \sigma_F^2(\xi_p)/f^2(\xi_p) - h\,\theta/f(\xi_p).$$
If \theta > 0, minimizing the AMSE gives the optimal bandwidth
$$h_{opt} = \left\{\frac{\theta\, f(\xi_p)}{\mu_2^2(K)\,[f'(\xi_p)]^2}\right\}^{1/3} n^{-1/3},$$
and with this asymptotically optimal bandwidth, the optimal AMSE is given by
$$n\,\mathrm{AMSE}_{opt}(\widehat{\xi}_p) = \sigma_F^2(\xi_p)/f^2(\xi_p) - \frac{3}{4}\left\{\frac{\theta^2}{\mu_2(K)\, f'(\xi_p)\, f(\xi_p)}\right\}^{2/3} n^{-1/3},$$
which indicates a reduction of the AMSE in the second order. Chen and Tang (2005) conducted an intensive simulation study to demonstrate the advantages of the nonparametric estimate \widehat{\xi}_p over the sample quantile \xi_{pn} in the VaR setting. We refer to the paper by Chen and Tang (2005) for simulation results and empirical examples.

Exercise: Please use the above procedures to estimate the ES nonparametrically, discuss its properties, and conduct simulation studies and empirical applications.
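A minimal sketch of the smoothed quantile (VaR) estimate obtained by numerically inverting the smoothed CDF is given below. The Gaussian kernel is used so that the integrated kernel is the standard normal CDF; the simulated returns, bandwidth and level p are arbitrary illustration choices.

# Smoothed CDF via the Gaussian kernel, and the smoothed p-th quantile (VaR)
set.seed(10)
y <- rt(500, df = 5) / 100                       # stand-in for a series of log-returns
h <- 0.005                                       # bandwidth (arbitrary choice)
p <- 0.05                                        # VaR level
F_hat <- function(x) mean(pnorm((x - y) / h))    # smoothed CDF at a point
xi_hat <- uniroot(function(x) F_hat(x) - p,      # invert the smoothed CDF numerically
                  interval = range(y))$root
xi_n <- quantile(y, p, type = 1)                 # sample quantile X_([np]) for comparison
c(smoothed = xi_hat, sample = as.numeric(xi_n))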
7.5 Computer Code
# July 20, 2006
graphics.off()    # clean the previous graphs on the screen
#########################################################
# Define the Epanechnikov kernel function
kernel<-function(x){0.75*(1-x^2)*(abs(x)<=1)}
############################################################
# Define the kernel density estimator
kernden=function(x,z,h,ker){
# parameters: x=variable; h=bandwidth; z=grid point; ker=kernel
  nz<-length(z)
  nx<-length(x)
  x0=rep(1,nx*nz)
  dim(x0)=c(nx,nz)
  x1=t(x0)
  x0=x*x0
  x1=z*x1
  x0=x0-t(x1)
  if(ker==1){x1=kernel(x0/h)}   # Epanechnikov kernel
  if(ker==0){x1=dnorm(x0/h)}    # normal kernel
  f1=apply(x1,2,mean)/h
  return(f1)
}
############################################################
############################################################
# Simulation for different bandwidths and different kernels
n=300                        # sample size
ker=1                        # ker=1 => Epanechnikov; ker=0 => Gaussian
h0=c(0.25,0.5,1)             # set initial bandwidths
z=seq(-4,4,by=0.1)           # grid points
nz=length(z)                 # number of grid points
x=rnorm(n)                   # simulate x ~ N(0, 1)
if(ker==1){h_o=2.34*n^{-0.2}}   # optimal bandwidth for the Epanechnikov kernel
if(ker==0){h_o=1.06*n^{-0.2}}   # optimal bandwidth for the normal kernel
f1=kernden(x,z,h0[1],ker)
f2=kernden(x,z,h0[2],ker)
f3=kernden(x,z,h0[3],ker)
f4=kernden(x,z,h_o,ker)
text1=c("True","h=0.25","h=0.5","h=1","h=h_o")
data=cbind(dnorm(z),f1,f2,f3,f4)   # combine the true and estimated densities
win.graph()
matplot(z,data,type="l",lty=1:5,col=1:5,xlab="",ylab="")
legend(-1,0.2,text1,lty=1:5,col=1:5)
###########################################################
###########################################################
# A Real Example
##################
z1=matrix(scan(file="c:\\teaching\\time series\\data\\w-3mt"),   # file name truncated in the original
   byrow=T,ncol=4)
# data: weekly 3-month Treasury bill from 1970 to 1997
x=z1[,4]/100                        # convert to decimal
n=length(x)
y=diff(x)                           # Delta x_t = x_t - x_{t-1} = change
x=x[1:(n-1)]
n=n-1
x_star=(x-mean(x))/sqrt(var(x))     # standardized
den_3mtb=density(x_star,bw=0.30,kernel=c("epanechnikov"))
den_est=den_3mtb$y                  # estimated density values
z_star=seq(-3,3,by=0.1)
text1=c("Estimated Density","Standard Norm")
win.graph()
par(bg="light green")
plot(den_3mtb,main="Density of 3mtb (Built-in)",ylab="",xlab="")
points(z_star,dnorm(z_star),type="l",lty=2,col=2,ylab="",xlab="")
legend(0,0.45,text1,lty=c(1,2),col=c(1,2),cex=0.7)
h_den=0.5
f_hat=kernden(x_star,z_star,h_den,1)
ff=cbind(f_hat,dnorm(z_star))
win.graph()
par(bg="light blue")
matplot(z_star,ff,type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")
title(main="Density of 3mtb",col.main="red")
legend(0,0.55,text1,lty=c(1,2),col=c(1,2),cex=0.7)
###########################################################
7.6 References
Artzner, P., F. Delbaen, J.M. Eber, and D. Heath (1999). Coherent measures of risk. Mathematical Finance, 9, 203-228.
Cai, Z. (2002). Regression quantile for time series. Econometric Theory, 18, 169-192.
Cai, Z. and G.G. Roussas (1997). Smooth estimate of quantiles under association. Statistics and Probability Letters, 36, 275-287.
Cai, Z. and G.G. Roussas (1998). Efficient estimation of a distribution function under quadrant dependence. Scandinavian Journal of Statistics, 25, 211-224.
Chen, S.X. and C.Y. Tang (2005). Nonparametric inference of value at risk for dependent financial returns. Journal of Financial Econometrics, 3, 227-255.
Duffie, D. and J. Pan (1997). An overview of value at risk. Journal of Derivatives, 4, 7-49.
Fan, J. and Q. Yao (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.
Gasser, T. and H.-G. Müller (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics, 757, 23-68. Springer-Verlag, New York.
Falk, M. (1983). Relative efficiency and deficiency of kernel type estimators of smooth distribution functions. Statistica Neerlandica, 37, 73-83.
Hall, P. and C.C. Heyde (1980). Martingale Limit Theory and its Applications. Academic Press, New York.
Hall, P. and T.E. Wehrly (1991). A geometrical method for removing edge effects from kernel-type nonparametric regression estimators. Journal of the American Statistical Association, 86, 665-672.
Hjort, N.L. and M.C. Jones (1996). Locally parametric nonparametric density estimation. The Annals of Statistics, 24, 1619-1647.
Jorion, P. (2001). Value at Risk, 2nd Edition. McGraw-Hill, New York.
Karunamuni, R.J. and T. Alberts (2003). On boundary correction in kernel density estimation. Working paper, Department of Mathematical and Statistical Sciences, University of Alberta, Canada.
Lehmann, E. (1966). Some concepts of dependence. Annals of Mathematical Statistics, 37, 1137-1153.
Loader, C.R. (1996). Local likelihood density estimation. The Annals of Statistics, 24, 1602-1618.
Mammitzsch, V. (1984). On the asymptotically optimal solution within a certain class of kernel type estimators. Statistics & Decisions, 2, 247-255.
Marron, J.S. and D. Ruppert (1994). Transformations to reduce boundary bias in kernel density estimation. Journal of the Royal Statistical Society, Series B, 56, 653-671.
Müller, H.-G. (1993). On the boundary kernel method for nonparametric curve estimation near endpoints. Scandinavian Journal of Statistics, 20, 313-328.
Reiss, R.D. (1981). Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics, 8, 116-119.
Schuster, E.F. (1985). Incorporating support constraints into nonparametric estimates of densities. Communications in Statistics - Theory and Methods, 14, 1123-1126.
Wand, M.P., J.S. Marron and D. Ruppert (1991). Transformations in density estimation (with discussion). Journal of the American Statistical Association, 86, 343-361.
Yoshihara, K. (1995). The Bahadur representation of sample quantiles for sequences of strongly mixing random variables. Statistics and Probability Letters, 24, 299-304.
Zhang, S. and R.J. Karunamuni (1998). On kernel density estimation near endpoints. Journal of Statistical Planning and Inference, 70, 301-316.
Chapter 8
Nonparametric Regression Estimation
8.1 Bandwidth Selection
8.1.1 Simple Bandwidth Selectors
The optimal bandwidth (7.3) is not directly usable since it depends on the unknown quantity \|f''\|_2. When f(x) is a Gaussian density with standard deviation \sigma, it is easy to see from (7.3) that
$$h_{opt} = (8\sqrt{\pi}/3)^{1/5}\, C_1(K)\,\sigma\, n^{-1/5},$$
which is called the normal reference bandwidth selector in the literature; it is obtained by replacing the unknown \sigma in the above equation by the sample standard deviation s. In particular, after calculating the constant C_1(K) numerically, we have the following normal reference bandwidth selector:
$$\widehat{h}_{opt} = \begin{cases} 1.06\, s\, n^{-1/5} & \text{for the Gaussian kernel,} \\ 2.34\, s\, n^{-1/5} & \text{for the Epanechnikov kernel.} \end{cases}$$
Hjort and Jones (1996) proposed an improved rule obtained by using an Edgeworth expansion for f(x) around the Gaussian density. Such a rule is given by
$$\widehat{h}^{*}_{opt} = \widehat{h}_{opt}\left\{1 + \frac{35}{48}\,\widehat{\gamma}_4 + \frac{35}{32}\,\widehat{\gamma}_3^2 + \frac{385}{1024}\,\widehat{\gamma}_4^2\right\}^{-1/5},$$
where \widehat{\gamma}_3 and \widehat{\gamma}_4 are respectively the sample skewness and kurtosis.
Note that the normal reference bandwidth selector is only a simple rule of thumb. It is a good selector when the data are nearly Gaussian, and it is often reasonable in many applications. However, it can lead to over-smoothing when the underlying distribution is asymmetric or multi-modal. In that case, one can either tune the bandwidth subjectively or select it by a more sophisticated bandwidth selector. One can also transform the data first to make their distribution closer to normal, estimate the density using the normal reference bandwidth selector, and apply the inverse transform to obtain an estimated density for the original data. Such a method is called the transformation method. There are quite a few important techniques for selecting the bandwidth, such as cross-validation (CV) and plug-in bandwidth selectors. A conceptually simple technique, with theoretical justification and good empirical performance, is the plug-in technique. This technique relies on finding an estimate of the functional \|f''\|_2, which can be obtained by using a pilot bandwidth. An implementation of this approach was proposed by Sheather and Jones (1991), and an overview of the progress on bandwidth selection can be found in Jones, Marron and Sheather (1996). The function dpik() in the package KernSmooth in R selects a bandwidth for kernel density estimation using the plug-in method.
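For example (a small sketch with simulated data; the sample size is arbitrary), the normal reference rule and the plug-in bandwidth can be compared as follows:

# Normal reference rule versus the plug-in bandwidth from KernSmooth
library(KernSmooth)
set.seed(7)
x <- rnorm(400)
h_nr <- 1.06 * sd(x) * length(x)^(-1/5)   # normal reference rule (Gaussian kernel)
h_pi <- dpik(x)                           # plug-in bandwidth selector
c(normal_reference = h_nr, plug_in = h_pi)
plot(bkde(x, bandwidth = h_pi), type = "l", xlab = "x", ylab = "density")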
8.1.2 Cross-Validation Method
The integrated squared error (ISE) of f_n(x) is defined by
$$\mathrm{ISE}(h) = \int [f_n(x) - f(x)]^2\,dx.$$
CHAPTER 8. NONPARAMETRIC REGRESSION ESTIMATION
269
A commonly used measure of discrepancy between f_n(x) and f(x) is the mean integrated squared error (MISE), MISE(h) = E[ISE(h)]. It can be shown easily (or see Chiu, 1991) that MISE(h) \approx AMISE(h). The optimal bandwidth minimizing the AMISE is given in (7.3). The least squares cross-validation (LSCV) method proposed by Rudemo (1982) and Bowman (1984) is a popular method for estimating the optimal bandwidth h_{opt}. Cross-validation is very useful for assessing the performance of an estimator via its estimated prediction error. The basic idea is to set one data point aside for validation of the model and use the remaining data to build the model. The main idea is to choose h to minimize ISE(h). Since
$$\mathrm{ISE}(h) = \int f_n^2(x)\,dx - 2\int f(x)\, f_n(x)\,dx + \int f^2(x)\,dx,$$
the question is how to estimate the second term on the right-hand side. Let us consider the simplest case, when \{X_t\} are iid. Re-express f_n(x) as
$$f_n(x) = \frac{n-1}{n}\, f_n^{(-s)}(x) + \frac{1}{n}\, K_h(X_s - x)$$
for any 1 \le s \le n, where
$$f_n^{(-s)}(x) = \frac{1}{n-1}\sum_{t \ne s} K_h(X_t - x)$$
is the kernel density estimate without the s-th observation, commonly called the jackknife or leave-one-out estimate. It is easy to see that f_n(x) \approx f_n^{(-s)}(x) for any 1 \le s \le n. Let D_s = \{X_1, \cdots, X_{s-1}, X_{s+1}, \cdots, X_n\}. Then,
$$E\left[f_n^{(-s)}(X_s)\,\big|\, D_s\right] = \int f_n^{(-s)}(x)\, f(x)\,dx \approx \int f_n(x)\, f(x)\,dx,$$
which, by the method of moments, can be estimated by \frac{1}{n}\sum_{s=1}^{n} f_n^{(-s)}(X_s). Therefore, the cross-validation criterion is
$$\mathrm{CV}(h) = \int f_n^2(x)\,dx - \frac{2}{n}\sum_{s=1}^{n} f_n^{(-s)}(X_s) = \frac{1}{n^2}\sum_{s,t} K_h^{*}(X_s - X_t) - \frac{2}{n(n-1)}\sum_{s=1}^{n}\sum_{t \ne s} K_h(X_s - X_t),$$
where K_h^{*}(\cdot) is the convolution of K_h(\cdot) with K_h(\cdot),
$$K_h^{*}(u) = \int K_h(v)\, K_h(u - v)\,dv.$$
Let \widehat{h}_{cv} be the minimizer of CV(h). Then it is called the optimal bandwidth based on cross-validation. Stone (1984) showed that \widehat{h}_{cv} is a consistent estimate of the optimal bandwidth h_{opt}. The function lscv() in the package locfit in R selects a bandwidth for kernel density estimation by least squares cross-validation.
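A direct, if slow, implementation of CV(h) over a grid of bandwidths is sketched below for the Gaussian kernel, for which the convolution K_h^* is a normal density with standard deviation h\sqrt{2}; base R also provides bw.ucv(), which implements the same least squares cross-validation idea. The data and bandwidth grid are arbitrary illustration choices.

# Least squares cross-validation for the Gaussian kernel
set.seed(5)
x <- rnorm(200)
n <- length(x)
cv <- function(h) {
  d <- outer(x, x, "-")                                    # all pairwise differences X_s - X_t
  term1 <- sum(dnorm(d, sd = h * sqrt(2))) / n^2           # (1/n^2) sum of K_h * K_h
  term2 <- 2 * (sum(dnorm(d, sd = h)) - n * dnorm(0, sd = h)) / (n * (n - 1))
  term1 - term2
}
hs <- seq(0.1, 1, by = 0.01)
h_cv <- hs[which.min(sapply(hs, cv))]
c(lscv = h_cv, bw_ucv = bw.ucv(x))                         # compare with the built-in selector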
8.2 Multivariate Density Estimation
As we discussed in Chapter 7, the kernel density or distribution estimation considered so far is basically one-dimensional. For the multivariate case, the kernel density estimate is given by
$$f_n(x) = \frac{1}{n}\sum_{t=1}^{n} K_H(X_t - x), \qquad (8.1)$$
where K_H(u) = K(H^{-1}u)/\det(H), K(u) is a multivariate kernel function, and H is the bandwidth matrix, such that for all 1 \le i, j \le p, n\,h_{ij} \to \infty and h_{ij} \to 0, where h_{ij} is the (i, j)-th element of H. The bandwidth matrix is introduced to capture the dependence structure among the independent variables. In particular, if H is a diagonal matrix
and K(u) = \prod_{j=1}^{p} K_j(u_j), where K_j(\cdot) is a univariate kernel function, then f_n(x) becomes
$$f_n(x) = \frac{1}{n}\sum_{t=1}^{n}\prod_{j=1}^{p} K_{h_j}(X_{jt} - x_j),$$
which is called the product kernel density estimator. This case is commonly used in practice. Similar to the univariate case, it is easy to derive the theoretical results for the multivariate case, which is left as an exercise. See Wand and Jones (1995) for details.

Exercise: Please derive the asymptotic results for the estimator given in (8.1) for the general multivariate case.

In R, the built-in function density() is only for the univariate case. For multivariate situations, there are two packages, ks and KernSmooth. The function kde() in ks can compute the multivariate density estimate for 2- to 6-dimensional data, and the function bkde2D() in KernSmooth computes the 2D kernel density estimate. Also, ks provides some functions for bandwidth matrix selection, such as Hbcv() and Hscv() for the 2D case, as well as Hlscv() and Hpi(). A small example is sketched below.
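The following is a minimal sketch of a two-dimensional estimate with a diagonal bandwidth matrix using bkde2D() from KernSmooth; the simulated data and the two bandwidths are arbitrary illustration choices.

# Bivariate kernel density estimate with a diagonal bandwidth matrix
library(KernSmooth)
set.seed(3)
z <- cbind(rnorm(500), rnorm(500, sd = 2))     # 500 bivariate observations
fit2d <- bkde2D(z, bandwidth = c(0.4, 0.8))    # one bandwidth per coordinate
contour(fit2d$x1, fit2d$x2, fit2d$fhat, xlab = "x1", ylab = "x2")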
8.3 Regression Function
Suppose that we have the information set I_t at time t and we want to forecast the future value, say Y_{t+1} (one-step-ahead forecast, or Y_{t+s} for the s-step-ahead forecast). There are several forecasting criteria. The general form is
$$m(I_t) = \arg\min_{a}\, E[\rho(Y_{t+1} - a) \mid I_t],$$
where \rho(\cdot) is an objective (loss) function. There are three major criteria:

(1) If \rho(\cdot) is the quadratic function, then m(I_t) = E(Y_{t+1} \mid I_t), called the mean regression function.

(2) If \rho_{\tau}(y) = y\,(\tau - I_{\{y < 0\}}), called the "check" function, where \tau \in (0, 1) and I_A is the indicator function of a set A, then m(I_t) satisfies
$$\int_{-\infty}^{m(I_t)} f(y \mid I_t)\,dy = F(m(I_t) \mid I_t) = \tau,$$
where f(y \mid I_t) and F(y \mid I_t) are the conditional PDF and CDF of Y_{t+1}, respectively, given I_t. This m(I_t) becomes the conditional quantile, or quantile regression, denoted by q_{\tau}(I_t). In particular, if \tau = 1/2, then m(I_t) is the well known least absolute deviation (LAD) regression, which is robust.

(3) If \rho(x) = \frac{1}{2}\, x^2\, I_{\{|x| \le M\}} + M\,(|x| - M/2)\, I_{\{|x| > M\}}, the so-called Huber function in the literature, then m(I_t) is the Huber robust regression. We will not discuss this topic. If you have an interest, please read the book by Rousseeuw and Leroy (1987). In R, the library MASS has the function rlm() for robust linear models. Also, the library lqs contains functions for bounded-influence regression.

Since the information set I_t contains too many variables (high dimension), it is common to approximate I_t by a finite number of variables, say X_t = (X_{t1}, \ldots, X_{tp})^T (p \ge 1), including lagged and exogenous variables. First, our focus is on the mean regression m(X_t). Of course, by the same token, we can consider the nonparametric estimation of the conditional variance \sigma^2(x) = \mathrm{Var}(Y_t \mid X_t = x). Why do we need to consider nonlinear (nonparametric) models in economic practice? You can find the answer in the book by Granger and Teräsvirta (1993).
8.4 Kernel Estimation
Let us look at the Nadaraya-Watson estimate of the mean regression function m(X_t). The main idea is as follows:
$$m(x) = \int y\, f(y \mid x)\,dy = \frac{\int y\, f(x, y)\,dy}{\int f(x, y)\,dy},$$
where f(x, y) is the joint PDF of X_t and Y_t. To estimate m(x), we can apply the plug-in method. That is, plug the nonparametric kernel density estimate f_n(x, y) (double kernel method) into the right-hand side of the above equation to obtain
$$\widehat{m}_{nw}(x) = \frac{\int y\, f_n(x, y)\,dy}{\int f_n(x, y)\,dy} = \frac{1}{n}\sum_{t=1}^{n} Y_t\, K_h(X_t - x)\Big/ f_n(x) = \sum_{t=1}^{n} W_t\, Y_t,$$
where f_n(x) is the kernel density estimate of f(x) defined in Chapter 7, and
$$W_t = K_h(X_t - x)\Big/\sum_{t=1}^{n} K_h(X_t - x).$$
\widehat{m}_{nw}(x) is the well known Nadaraya-Watson (NW) estimator. Note that the weights \{W_t\} do not depend on \{Y_t\}. Therefore, \widehat{m}_{nw}(x) is called a linear estimator, similar to the least squares estimate (LSE).

Let us look at the NW estimator from a different angle. \widehat{m}_{nw}(x) can be re-expressed as the minimizer of a locally weighted least squares criterion; that is,
$$\widehat{m}_{nw}(x) = \arg\min_{a}\sum_{t=1}^{n} (Y_t - a)^2\, K_h(X_t - x).$$
This means that when X_t is in a neighborhood of x, m(X_t) is approximated by a constant a (local approximation). Indeed, we consider the following working model
$$Y_t = m(X_t) + \varepsilon_t \approx a + \varepsilon_t$$
with the weights \{K_h(X_t - x)\}, where \varepsilon_t = Y_t - E(Y_t \mid X_t). In the implementation, for each x, we can fit the following transformed linear model
$$Y_t^{*} = \beta_1\, X_t^{*} + \varepsilon_t,$$
where Y_t^{*} = \sqrt{K_h(X_t - x)}\, Y_t and X_t^{*} = \sqrt{K_h(X_t - x)}. Therefore, the Nadaraya-Watson estimator is also called the local constant estimator. In R, we can use the functions lm() or glm() with weights \{K_h(X_t - x)\} to fit a weighted least squares or generalized linear model, or use the weighted least squares theory directly (matrix multiplication), as sketched below.
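A minimal sketch of this weighted lm() implementation of the local constant (Nadaraya-Watson) estimator follows; the simulated data, regression function and bandwidth are arbitrary illustration choices.

# Nadaraya-Watson estimator as a locally weighted least squares fit of a constant
set.seed(8)
n <- 300
x <- runif(n); y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
h <- 0.08
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)    # Epanechnikov kernel
grid <- seq(0, 1, by = 0.01)
m_nw <- sapply(grid, function(x0) {
  w <- epan((x - x0) / h) / h                 # local weights K_h(X_t - x0)
  coef(lm(y ~ 1, weights = w))[1]             # weighted average of y = NW estimate at x0
})
plot(x, y, col = "grey"); lines(grid, m_nw, lwd = 2)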
8.4.1 Asymptotic Properties
We derive the asymptotic properties of the nonparametric estimator for the time series situation. Also, we consider the simple case p = 1. Write
$$\widehat{m}_{nw}(x) = \underbrace{\frac{1}{n}\sum_{t=1}^{n} m(X_t)\, K_h(X_t - x)\Big/ f_n(x)}_{I_1} + \underbrace{\sum_{t=1}^{n} W_t\,\varepsilon_t}_{I_2}.$$
We will show that I_1 contributes only the bias and I_2 gives the asymptotic normality. First, we derive the asymptotic bias for the interior points. By a Taylor expansion, when X_t is in (x - h, x + h), we have
$$m(X_t) = m(x) + m'(x)(X_t - x) + \frac{1}{2}\, m''(x_t)(X_t - x)^2,$$
where x_t = x + \theta(X_t - x) with -1 < \theta < 1. Then,
$$I_{11} = \frac{1}{n}\sum_{t=1}^{n} m(X_t)\, K_h(X_t - x) = m(x)\, f_n(x) + m'(x)\,\underbrace{\frac{1}{n}\sum_{t=1}^{n}(X_t - x)\, K_h(X_t - x)}_{J_1(x)} + \frac{1}{2}\,\underbrace{\frac{1}{n}\sum_{t=1}^{n} m''(x_t)(X_t - x)^2\, K_h(X_t - x)}_{J_2(x)}.$$
Then,
$$E[J_1(x)] = E[(X_t - x)\, K_h(X_t - x)] = \int (u - x)\, K_h(u - x)\, f(u)\,du = h\int u\, K(u)\, f(x + hu)\,du = h^2\, f'(x)\,\mu_2(K) + o(h^2).$$
Similar to the derivation of the variance of f_n(x) in (7.2), we can show that n\,h\,\mathrm{Var}(J_1(x)) = O(1). Therefore, J_1(x) = h^2\, f'(x)\,\mu_2(K) + o_p(h^2). By the same token, we have
$$E[J_2(x)] = E\left[m''(x_t)(X_t - x)^2\, K_h(X_t - x)\right] = h^2\int m''(x + \theta\, hu)\, u^2\, K(u)\, f(x + hu)\,du = h^2\, m''(x)\,\mu_2(K)\, f(x) + o(h^2)$$
and \mathrm{Var}(J_2(x)) = O(1/nh). Therefore, J_2(x) = h^2\, m''(x)\,\mu_2(K)\, f(x) + o_p(h^2). Hence,
$$I_1 = m(x) + m'(x)\, J_1(x)/f_n(x) + \frac{1}{2}\, J_2(x)/f_n(x) = m(x) + \frac{h^2}{2}\,\mu_2(K)\left[m''(x) + 2\, m'(x)\, f'(x)/f(x)\right] + o_p(h^2)$$
by the fact that f_n(x) = f(x) + o_p(1). The term
$$B_{nw}(x) = \frac{h^2}{2}\,\mu_2(K)\left[m''(x) + 2\, m'(x)\, f'(x)/f(x)\right]$$
is regarded as the asymptotic bias. If p > 1 (the multivariate case), B_{nw}(x) becomes
$$B_{nw}(x) = \frac{h^2}{2}\,\mathrm{tr}\left\{\mu_2(K)\left[m''(x) + 2\, f'(x)\, m'(x)^T/f(x)\right]\right\}, \qquad (8.2)$$
where \mu_2(K) = \int u\, u^T\, K(u)\,du. The bias term involves not only the curvature of m(x) but also the unknown density function f(x) and its derivative f'(x), so that the design cannot be adaptive.
Under some regularity conditions, similar to (7.2), we can show that for an interior grid point x,
$$n\, h^p\,\mathrm{Var}(I_2) \to \nu_0(K)\,\sigma_{\varepsilon}^2(x)/f(x) = \sigma_m^2(x),$$
where \sigma_{\varepsilon}^2(x) = \mathrm{Var}(\varepsilon_t \mid X_t = x). Further, we can establish the asymptotic normality (the proof is provided later)
$$\sqrt{n\, h^p}\left[\widehat{m}_{nw}(x) - m(x) - B_{nw}(x) + o_p(h^2)\right] \to N\!\left(0,\;\sigma_m^2(x)\right),$$
where B_{nw}(x) is given in (8.2).
When p is large, there exists the so-called "curse of dimensionality". To understand this problem quantitatively, we can look at the rate of convergence. The bias is of order O(h^2) and the variance is of order O(1/(n\,h^p)). Trading off the bias and the variance leads to the optimal rate of convergence O(n^{-2/(4+p)}) for the estimation error (n^{-4/(4+p)} for the MSE). To have performance comparable to one-dimensional nonparametric regression with n_1 data points, p-dimensional nonparametric regression requires
$$n^{-2/(4+p)} = O\!\left(n_1^{-2/5}\right),$$
Table 8.1: Sample sizes required for p-dimensional nonparametric regression to have performance comparable to that of 1-dimensional nonparametric regression using size 100

dimension p:  2    3    4      5      6       7       8       9        10
sample size:  252  631  1,585  3,982  10,000  25,119  63,096  158,490  398,108

or n = O(n_1^{(p+4)/5}). Table 8.1 shows the result with n_1 = 100. The increase in the required sample size is exponentially fast.
8.4.2 Boundary Behavior
As for the boundary behavior of the NW estimator, we can follow Fan and Gijbels (1996). Without loss of generality, we consider the left boundary point x = c\,h, 0 < c < 1. Following Fan and Gijbels (1996), we take K(·) to have support [-1, 1] and m(·) to have support [0, 1]. Similar to (7.4), it is easy to see that if x = c\,h,
$$E[J_1(ch)] = E[(X_t - ch)\, K_h(X_t - ch)] = \int_{0}^{1}(u - ch)\, K_h(u - ch)\, f(u)\,du = h\int_{-c}^{1/h - c} u\, K(u)\, f(h(u + c))\,du = h\, f(0+)\,\mu_{1,c}(K) + h^2\, f'(0+)[\mu_{2,c}(K) + c\,\mu_{1,c}(K)] + o(h),$$
and
$$E[J_2(ch)] = E\left[m''(x_t)(X_t - ch)^2\, K_h(X_t - ch)\right] = h^2\int_{-c}^{1/h - c} m''(h(c + \theta\, u))\, u^2\, K(u)\, f(h(u + c))\,du = h^2\, m''(0+)\,\mu_{2,c}(K)\, f(0+) + o(h^2).$$
Also, we can see that
$$\mathrm{Var}(J_1(ch)) = O(1/nh) \quad\text{and}\quad \mathrm{Var}(J_2(ch)) = O(1/nh),$$
which imply that J_1(ch) = h\, f(0+)\,\mu_{1,c}(K) + o_p(h) and J_2(ch) = h^2\, m''(0+)\,\mu_{2,c}(K)\, f(0+) + o_p(h^2). This, in conjunction with (7.4), gives
$$I_1 - m(ch) = m'(ch)\, J_1(ch)/f_n(ch) + \frac{1}{2}\, J_2(ch)/f_n(ch) = a(c, K)\, h + b(c, K)\, h^2 + o_p(h^2),$$
where
$$a(c, K) = \frac{m'(0+)\,\mu_{1,c}(K)}{\mu_{0,c}(K)}$$
and
$$b(c, K) = \frac{\mu_{2,c}(K)\, m''(0+)}{2\,\mu_{0,c}(K)} + \frac{f'(0+)\, m'(0+)\left[\mu_{2,c}(K)\,\mu_{0,c}(K) - \mu_{1,c}^2(K)\right]}{f(0+)\,\mu_{0,c}^2(K)}.$$
Here, a(c, K)\, h + b(c, K)\, h^2 serves as the asymptotic bias term, which is of order O(h). We can show that at the boundary point, the asymptotic variance has the form
$$n\, h^p\,\mathrm{Var}(\widehat{m}_{nw}(x)) \to \nu_{0,c}(K)\,\sigma_m^2(0+)/[\mu_{0,c}(K)\, f(0+)],$$
which is of the same order as that at an interior point, although the scaling constant is different.
8.5 Local Polynomial Estimate
To overcome the above shortcomings of the local constant estimate, we can use the local polynomial fitting scheme; see Fan and Gijbels (1996). The main idea is described as follows.

8.5.1 Formulation
Assume that the regression function m(x) has a continuous (q + 1)-th order derivative. For ease of notation, assume that p = 1. When X_t \in (x - h, x + h),
$$m(X_t) \approx \sum_{j=0}^{q}\frac{m^{(j)}(x)}{j!}\,(X_t - x)^j = \sum_{j=0}^{q}\beta_j\,(X_t - x)^j,$$
where \beta_j = m^{(j)}(x)/j!. Therefore, when X_t \in (x - h, x + h), the model becomes
$$Y_t \approx \sum_{j=0}^{q}\beta_j\,(X_t - x)^j + \varepsilon_t.$$
Hence, we can apply the weighted least squares method. The weighted locally least squares criterion becomes
$$\sum_{t=1}^{n}\left[Y_t - \sum_{j=0}^{q}\beta_j\,(X_t - x)^j\right]^2 K_h(X_t - x). \qquad (8.3)$$
Minimizing the above with respect to \beta = (\beta_0, \ldots, \beta_q)^T gives the local polynomial estimate \widehat{\beta}:
$$\widehat{\beta} = \left(\mathbf{X}^T\mathbf{W}\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{W}\mathbf{Y}, \qquad (8.4)$$
where \mathbf{W} = \mathrm{diag}\{K_h(X_1 - x), \cdots, K_h(X_n - x)\},
$$\mathbf{X} = \begin{pmatrix} 1 & (X_1 - x) & \cdots & (X_1 - x)^q \\ 1 & (X_2 - x) & \cdots & (X_2 - x)^q \\ \vdots & \vdots & & \vdots \\ 1 & (X_n - x) & \cdots & (X_n - x)^q \end{pmatrix} \quad\text{and}\quad \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}.$$
Therefore, for 0 \le j \le q,
$$\widehat{m}^{(j)}(x) = j!\,\widehat{\beta}_j.$$
This means that the local polynomial method estimates not only the regression function itself but also its derivatives.
8.5.2 Implementation in R
There are several ways of implementing the local polynomial estimator. One way is to write your own code using matrix multiplication as in (8.4), or to employ the functions lm() or glm() with weights \{K_h(X_t - x)\}. In R, there are also some built-in packages for implementing the local polynomial estimate. For example, the package KernSmooth contains several relevant functions: bkde() computes the kernel density estimate, bkde2D() computes the 2D kernel density estimate, and bkfe() computes a kernel functional (derivative) density estimate. The function dpik() selects a bandwidth for kernel density estimation using the plug-in method, and dpill() chooses a bandwidth for local linear (q = 1) regression estimation using the plug-in approach. Finally, the function locpoly() performs local polynomial fitting, including local polynomial estimation of the density of X (or its derivatives).

Example: We apply the kernel regression method and the local polynomial fitting method to the drift and diffusion of the weekly 3-month Treasury bill rate from January 2, 1970 to December 26, 1997. Let x_t denote the weekly 3-month Treasury bill rate. It is common to model x_t by assuming that it satisfies the continuous-time stochastic differential equation (Black-Scholes type model)
$$d\,x_t = \mu(x_t)\,dt + \sigma(x_t)\,dW_t,$$
where W_t is a Wiener process, \mu(x_t) is called the drift function, and \sigma(x_t) is called the diffusion function. Our interest is to identify \mu(x_t) and \sigma(x_t). Assume that a time series sequence \{X_{t\Delta},\; 1 \le t \le n\} is observed at equally spaced time points. Using the infinitesimal generator (Øksendal, 1985), the first-order approximations of the moments of x_t, a discretized version of the Itô process, are given by Stanton (1997):
$$\Delta x_t = \mu(x_t)\,\Delta + \sigma(x_t)\,\varepsilon\,\sqrt{\Delta},$$
where \Delta x_t = x_{t+\Delta} - x_t and \varepsilon \sim N(0, 1). Therefore,
$$\mu(x_t) = \lim_{\Delta \to 0} E[\Delta x_t \mid x_t]/\Delta \quad\text{and}\quad \sigma^2(x_t) = \lim_{\Delta \to 0} E\left[(\Delta x_t)^2 \mid x_t\right]/\Delta.$$
See Fan and Zhang (2003) for the higher orders.
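The following is a minimal sketch of this estimation idea under the stated first-order approximation, using locpoly() and dpill() from KernSmooth with local linear fits. A simulated mean-reverting series stands in for the Treasury bill data, and the simulation parameters (0.2, 0.06, 0.02) are arbitrary illustration values.

# Local linear estimates of the drift and (squared) diffusion from discretely sampled data
library(KernSmooth)
set.seed(2006)
n <- 1000; Delta <- 1 / 52                    # weekly sampling interval
x <- numeric(n); x[1] <- 0.06
for (t in 2:n)                                # simulate a simple mean-reverting diffusion
  x[t] <- x[t - 1] + 0.2 * (0.06 - x[t - 1]) * Delta + 0.02 * sqrt(Delta) * rnorm(1)
dx <- diff(x); xx <- x[-n]
mu_hat   <- locpoly(xx, dx / Delta,   degree = 1, bandwidth = dpill(xx, dx / Delta))
sig2_hat <- locpoly(xx, dx^2 / Delta, degree = 1, bandwidth = dpill(xx, dx^2 / Delta))
par(mfrow = c(1, 2))
plot(mu_hat,   type = "l", xlab = "x", ylab = "drift")
plot(sig2_hat, type = "l", xlab = "x", ylab = "squared diffusion")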
8.5.3 Complexity of Local Polynomial Estimator
To implement the local polynomial estimator, we have to choose the order of the polynomial q, the bandwidth h and the kernel function K(·). These parameters are of course confounded with each other. Clearly, when h = ∞, the local polynomial fitting becomes a global polynomial fitting and the order q determines the model complexity. Unlike in parametric models, the complexity of local polynomial fitting is primarily controlled by the bandwidth, as shown in Fan and Gijbels (1996) and Fan and Yao (2003). Hence q is usually small and the issue of choosing q becomes less critical. We discuss these issues in detail as follows.

(1) If the objective is to estimate m^{(j)}(·) (j ≥ 0), the local polynomial fitting automatically corrects the boundary bias when q − j
is odd. Further, when q − j is odd, compared with the order q − 1 fit (so that q − j − 1 is even), the order q fit contains one extra parameter without increasing the variance for estimating m^{(j)}(·). But this extra parameter creates opportunities for bias reduction, particularly in the boundary regions; see the next section and the books by Fan and Gijbels (1996) and Ruppert and Wand (1994). For these reasons, the odd order fits (the order q chosen so that q − j is odd) outperform the even order fits [the order (q − 1) fit, so that q − 1 − j is even]. Based on theoretical and practical considerations, the order q = j + 1 is recommended in Fan and Gijbels (1996). If the primary objective is to estimate the regression function, one uses the local linear fit; if the target function is the first order derivative, one uses the local quadratic fit, and so on.

(2) It is well known that the choice of the bandwidth h plays an important role in local polynomial fitting. Too large a bandwidth causes over-smoothing, creating excessive modeling bias, while too small a bandwidth results in under-smoothing, yielding wiggly estimates. The bandwidth can be chosen subjectively by users via visual inspection of the resulting estimates, or automatically by data-driven methods that minimize an estimated theoretical risk (discussed later).

(3) Since the estimate is based on the local regression (8.3), it is reasonable to require a non-negative weight function K(·). It can be shown (see Fan and Gijbels (1996)) that for all choices of q and j, the optimal weight function is K(z) = (3/4)(1 − z²)₊, the Epanechnikov kernel, based on minimizing the asymptotic variance of the local polynomial estimator. Thus, it is a universal weighting scheme and provides a useful benchmark for other kernels to compare with. As shown in Fan and Gijbels (1996) and Fan and Yao (2003), other kernels have nearly the same efficiency for practical choices of q and j. Hence the choice of the kernel function is not critical.
The local polynomial estimator compares favorably with other estimators, including the Nadaraya-Watson (local constant) estimator and other linear estimators such as the Gasser-Müller estimator of Gasser and Müller (1979) and the Priestley-Chao estimator of Priestley and Chao (1972). Indeed, it was shown by Fan (1993) that the local linear fit is asymptotically minimax based on the quadratic loss function among all linear estimators and is nearly minimax among all possible estimators. This minimax property was extended by Fan, Gasser, Gijbels, Brockmann and Engel (1995) to more general local polynomial fitting. For detailed comparisons of the above four estimators, see Fan and Gijbels (1996).

Note that the Gasser-Müller estimator and the Priestley-Chao estimator are designed for the fixed design case, that is, X_t = t. Let s_t = (2t + 1)/2 (t = 1, \cdots, n - 1) with s_0 = -\infty and s_n = \infty. The Gasser-Müller estimator is
$$\widehat{f}_{gm}(t_0) = \sum_{t=1}^{n}\left[\int_{s_{t-1}}^{s_t} K_h(u - t_0)\,du\right] Y_t.$$
Unlike the local constant estimator, no denominator is needed since the total weight
$$\sum_{t=1}^{n}\int_{s_{t-1}}^{s_t} K_h(u - t_0)\,du = 1.$$
Indeed, the Gasser-Müller estimator is an improved version of the Priestley-Chao estimator, which is defined as
$$\widehat{f}_{pc}(t_0) = \sum_{t=1}^{n} K_h(t - t_0)\, Y_t.$$
Note that the Priestley-Chao estimator is only applicable in the equally spaced setting.
8.5.4 Properties of Local Polynomial Estimator
Define, for 0 \le j \le q,
$$s_{n,j}(x) = \sum_{t=1}^{n}(X_t - x)^j\, K_h(X_t - x)$$
and S_n(x) = \mathbf{X}^T\mathbf{W}\mathbf{X}. Then, the (i + 1, j + 1)-th element of S_n(x) is s_{n,i+j}(x). Similar to the evaluation of I_{11}, we can easily show that
$$s_{n,j}(x) = n\, h^j\,\mu_j(K)\, f(x)\,\{1 + o_p(1)\}.$$
Define H = \mathrm{diag}\{1, h, \cdots, h^q\} and S = \left(\mu_{i+j}(K)\right)_{0 \le i, j \le q}. Then, it is not difficult to show that
$$S_n(x) = n\, f(x)\, H\, S\, H\,\{1 + o_p(1)\}.$$
First of all, for 0 \le j \le q, let e_j be the (q + 1) \times 1 vector with (j + 1)-th element one and zero otherwise. Then, \widehat{\beta}_j can be re-expressed as
$$\widehat{\beta}_j = e_j^T\,\widehat{\beta} = \sum_{t=1}^{n} W_{j,n,h}(X_t - x)\, Y_t,$$
where W_{j,n,h}(X_t - x) is called the effective kernel in Fan and Gijbels (1996) and Fan and Yao (2003), given by
$$W_{j,n,h}(X_t - x) = e_j^T\, S_n(x)^{-1}\left(1, (X_t - x), \cdots, (X_t - x)^q\right)^T K_h(X_t - x).$$
It is not difficult to show that W_{j,n,h}(X_t - x) satisfies the following so-called discrete moment conditions
$$\sum_{t=1}^{n}(X_t - x)^l\, W_{j,n,h}(X_t - x) = \begin{cases} 1 & \text{if } l = j, \\ 0 & \text{otherwise.} \end{cases} \qquad (8.5)$$
Note that the local constant estimator does not have this property; see J_1(x) in Section 8.4.1. This property implies that the local polynomial estimator is unbiased for estimating \beta_j when the true regression function m(x) is a polynomial of order q.
To gain more insight about the local polynomial estimator, define the equivalent kernel (see Fan and Gijbels (1996))
$$W_j(u) = e_j^T\, S^{-1}\left(1, u, \cdots, u^q\right)^T K(u).$$
Then, it can be shown (see Fan and Gijbels (1996)) that
$$W_{j,n,h}(X_t - x) = \frac{1}{n\, h^{j+1}\, f(x)}\, W_j\!\left(\frac{X_t - x}{h}\right)\{1 + o_p(1)\}$$
and
$$\int u^l\, W_j(u)\,du = \begin{cases} 1 & \text{if } l = j, \\ 0 & \text{otherwise.} \end{cases}$$
The implications of these results are as follows.

As pointed out by Fan and Yao (2003), the local polynomial estimator works like a kernel regression estimator with a known design density f(x). This explains why the local polynomial fit adapts to various design densities. In contrast, the kernel regression estimator has a large bias in regions where the derivative of f(x) is large, namely it cannot adapt to highly-skewed designs. To see this, imagine that the true regression function has a large slope in such a region. Since the derivative of the design density is large, for a given x there are more points on one side of x than on the other. When the local average is taken, the Nadaraya-Watson estimate is biased towards the side with more local data points because the local data are asymmetrically distributed. This issue is more pronounced in the boundary regions, since the local data there are even more asymmetric. On the other hand, the local polynomial fit creates asymmetric weights, if needed, to compensate for this kind of design bias. Hence, it is adaptive to various design densities and to the boundary regions.

We next derive the asymptotic bias and variance expressions for local polynomial estimators. For independent data, we can obtain
the bias and variance expressions by conditioning on the design matrix \mathbf{X}. However, for time series data, conditioning on \mathbf{X} would mean conditioning on nearly the entire series. Hence, we derive the asymptotic bias and variance using the asymptotic normality approach rather than conditional expectations. As explained in Chapter 7, localizing in the state domain weakens the dependence structure of the local data. Hence, one would expect that the results for independent data continue to hold for stationary processes under certain mixing conditions. The mixing condition and the bandwidth should be related, as can be seen later. Set B_n(x) = (b_1(x), \cdots, b_{q+1}(x))^T, where, for 0 \le j \le q,
$$b_{j+1}(x) = \sum_{t=1}^{n}\left[m(X_t) - \sum_{j=0}^{q}\frac{m^{(j)}(x)}{j!}\,(X_t - x)^j\right](X_t - x)^j\, K_h(X_t - x).$$
Then,
$$\widehat{\beta} - \beta = \left(\mathbf{X}^T\mathbf{W}\mathbf{X}\right)^{-1} B_n(x) + \left(\mathbf{X}^T\mathbf{W}\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{W}\,\varepsilon,$$
where \varepsilon = (\varepsilon_1, \cdots, \varepsilon_n)^T. It is easy to show that if q is odd,
$$B_n(x) = n\, h^{q+1}\,\frac{m^{(q+1)}(x)}{(q + 1)!}\, H\, f(x)\, c_{1,q}\,\{1 + o_p(1)\},$$
where, for 1 \le k \le 3, c_{k,q} = \left(\mu_{q+k}(K), \cdots, \mu_{2q+k}(K)\right)^T. If q is even,
$$B_n(x) = n\, h^{q+2}\, H\, f(x)\left[c_{2,q}\,\frac{m^{(q+1)}(x)\, f'(x)}{f(x)\,(q + 1)!} + c_{3,q}\,\frac{m^{(q+2)}(x)}{(q + 2)!}\right]\{1 + o_p(1)\}.$$
Note that f'(x)/f(x) does not appear on the right-hand side of B_n(x) when q is odd. In either case, we can show that
$$\mathrm{Var}\left[H(\widehat{\beta} - \beta)\right] \to \sigma^2(x)\, S^{-1} S^{*} S^{-1}/f(x) = \Sigma(x),$$
where S^{*} is a (q + 1) \times (q + 1) matrix with (i, j)-th element \nu_{i+j-2}(K).
This shows that the leading conditional bias term depends on whether q is odd or even. By a Taylor series expansion argument, we know that when |X_t - x| < h, the remainder term from a q-th order polynomial expansion should be of order O(h^{q+1}), so the result for odd q is easy to understand. When q is even, q + 1 is odd, hence the h^{q+1} term is associated with \int u^l K(u)\,du for l odd, and this term is zero because K(u) is an even function. Therefore, the h^{q+1} term disappears, while the remainder term becomes O(h^{q+2}). In either case, we see that the bias term is an even power of h. This is similar to the situation where one uses higher order kernel functions based upon a symmetric kernel function (an even function), where the bias is always an even power of h.

Finally, we can show that when q is odd,
$$\sqrt{n\, h}\left[H(\widehat{\beta} - \beta) - B(x)\right] \to N(0, \Sigma(x)),$$
where the asymptotic bias term for the local polynomial estimator is
$$B(x) = \frac{h^{q+1}\, m^{(q+1)}(x)}{(q + 1)!}\, S^{-1} c_{1,q}\,\{1 + o_p(1)\}.$$
Equivalently,
$$\sqrt{n\, h^{2j+1}}\left[\widehat{m}^{(j)}(x) - m^{(j)}(x) - B_j(x)\right] \to N(0, \sigma_{jj}(x)),$$
where the asymptotic bias and variance for the local polynomial estimator of m^{(j)}(x) are
$$B_j(x) = \frac{j!\, h^{q+1-j}}{(q + 1)!}\, m^{(q+1)}(x)\int u^{q+1}\, W_j(u)\,du\,\{1 + o_p(1)\}$$
and
$$\sigma_{jj}(x) = \frac{(j!)^2\,\sigma^2(x)}{f(x)}\int W_j^2(u)\,du.$$
Similarly, we can derive the asymptotic bias and variance at boundary points if the regression function has finite support. For details, see Fan and Gijbels (1996), Fan and Yao (2003), and Ruppert and Wand (1994). Indeed, define S_c, S_c^{*}, and c_{k,q,c} similarly to S, S^{*} and c_{k,q}, with \mu_j(K) and \nu_j(K) replaced by \mu_{j,c}(K) and \nu_{j,c}(K), respectively. We can show that
$$\sqrt{n\, h}\left[H(\widehat{\beta}(ch) - \beta(ch)) - B_c(0)\right] \to N(0, \Sigma_c(0)), \qquad (8.6)$$
where the asymptotic bias term for the local polynomial estimator at the left boundary point is
$$B_c(0) = \frac{h^{q+1}}{(q + 1)!}\, m^{(q+1)}(0)\, S_c^{-1} c_{1,q,c}\,\{1 + o_p(1)\},$$
and the asymptotic variance is \Sigma_c(0) = \sigma^2(0)\, S_c^{-1} S_c^{*} S_c^{-1}/f(0). Equivalently,
$$\sqrt{n\, h^{2j+1}}\left[\widehat{m}^{(j)}(ch) - m^{(j)}(ch) - B_{j,c}(0)\right] \to N(0, \sigma_{jj,c}(0)),$$
where, with W_{j,c}(u) = e_j^T\, S_c^{-1}\left(1, u, \cdots, u^q\right)^T K(u),
$$B_{j,c}(0) = \frac{j!\, h^{q+1-j}}{(q + 1)!}\, m^{(q+1)}(0)\int_{-c}^{\infty} u^{q+1}\, W_{j,c}(u)\,du\,\{1 + o_p(1)\}$$
and
$$\sigma_{jj,c}(0) = \frac{(j!)^2\,\sigma^2(0)}{f(0)}\int_{-c}^{\infty} W_{j,c}^2(u)\,du.$$
Exercise: Please derive the asymptotic properties of the local polynomial estimator; that is, prove (8.6). The above conclusions show that when q − j is odd, the bias at the boundary is of the same order as that at interior points. Hence, the local polynomial fit does not create excessive boundary bias when q − j is odd. Thus, the appealing boundary behavior of
local polynomial mean estimation extends to derivative estimation. However, when q − j is even, the bias at the boundary is larger than in the interior, and the bias can also be large at points where f(x) is discontinuous. This is referred to as the boundary effect. For these reasons (and the minimax efficiency arguments), it is recommended that one always set q − j to be odd when estimating m^{(j)}(x). It is indeed an odd world!
8.5.5 Bandwidth Selection
As seen in previous sections, for stationary sequences of data under certain mixing conditions, the local polynomial estimator behaves very much as it does for independent data, because windowing reduces the dependence among the local data. Partially because of this, there are not many studies on bandwidth selection for these problems. However, it is reasonable to expect that bandwidth selectors for independent data continue to work for dependent data under certain mixing conditions. Below, we summarize a few useful approaches. When the data do not have strong enough mixing, the general strategy is to increase the bandwidth in order to reduce the variance.

As we have already seen for nonparametric density estimation, the cross-validation method is very useful for assessing the performance of an estimator via its estimated prediction error. The basic idea is to set one data point aside for validation of the model and use the remaining data to build the model. The criterion is defined as
$$\mathrm{CV}(h) = \sum_{s=1}^{n}\left[Y_s - \widehat{m}_{-s}(X_s)\right]^2,$$
where \widehat{m}_{-s}(X_s) is the local polynomial estimator with j = 0 and bandwidth h, computed without using the s-th observation.
The above summand is indeed the squared prediction error of the s-th data point using the training set \{(X_t, Y_t): t \ne s\}. This idea of the cross-validation method is simple but computationally intensive. An improved version, in terms of computation, is the generalized cross-validation (GCV), proposed by Wahba (1977) and Craven and Wahba (1979). This criterion can be described as follows. The fitted values \widehat{Y} = (\widehat{m}(X_1), \cdots, \widehat{m}(X_n))^T can be expressed as \widehat{Y} = H(h)\, Y, where H(h) is an n \times n hat matrix, depending on the X-variate and the bandwidth h; it is also called a smoothing matrix. Then the GCV approach selects the bandwidth h that minimizes
$$\mathrm{GCV}(h) = \left[n^{-1}\,\mathrm{tr}(I - H(h))\right]^{-2}\,\mathrm{MASE}(h),$$
where \mathrm{MASE}(h) = \sum_{t=1}^{n}\left(Y_t - \widehat{m}(X_t)\right)^2/n is the average of the squared residuals.
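A minimal sketch of this GCV criterion for the local constant (Nadaraya-Watson) smoother, whose hat matrix rows are the weights W_t of Section 8.4, is given below; the simulated data and the bandwidth grid are arbitrary illustration choices.

# GCV bandwidth selection for a kernel (local constant) regression smoother
set.seed(11)
n <- 200
x <- runif(n); y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)
gcv <- function(h) {
  K <- epan(outer(x, x, "-") / h)       # kernel weights K((X_s - X_t)/h)
  H <- K / rowSums(K)                   # smoothing (hat) matrix of the NW fit
  mase <- mean((y - H %*% y)^2)         # average squared residuals
  mase / (1 - sum(diag(H)) / n)^2       # GCV(h)
}
hs <- seq(0.03, 0.3, by = 0.01)
hs[which.min(sapply(hs, gcv))]          # GCV-selected bandwidth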
A drawback of cross-validation type methods is their inherent variability (see Hall and Johnstone, 1992). Further, they cannot be applied directly to select bandwidths for estimating derivative curves. As pointed out by Fan, Heckman, and Wand (1995), cross-validation type methods perform poorly due to their large sample variation, and even worse for dependent data. Plug-in methods avoid these problems. The basic idea is to find a bandwidth h minimizing the estimated mean integrated squared error (MISE). See Ruppert, Sheather and Wand (1995) and Fan and Gijbels (1995) for details.

Nonparametric AIC Selector
Inspired by the nonparametric version of the Akaike final prediction error criterion proposed by Tjøstheim and Auestad (1994b) for lag selection in the nonparametric setting, Cai (2002) proposed a simple and quick method to select the bandwidth for the foregoing estimation
procedures, which can be regarded as a nonparametric version of the Akaike information criterion (AIC), attentive to the structure of time series data and to the over-fitting or under-fitting tendency. Note that the idea is also motivated by its analogue in Cai and Tiwari (2000). The basic idea is described as follows. Recalling the classical AIC for linear models under the likelihood setting,
$$-2\,(\text{maximized log likelihood}) + 2\,(\text{number of estimated parameters}),$$
Cai (2002) proposed the following nonparametric AIC: select h to minimize
$$\mathrm{AIC}(h) = \log\{\mathrm{MASE}\} + \psi(\mathrm{tr}(H(h)), n), \qquad (8.7)$$
where \psi(\mathrm{tr}(H(h)), n) is chosen particularly to be of the form of the bias-corrected version of the AIC, due to Hurvich and Tsai (1989),
$$\psi(\mathrm{tr}(H(h)), n) = \frac{2\,\{\mathrm{tr}(H(h)) + 1\}}{n - \{\mathrm{tr}(H(h)) + 2\}}, \qquad (8.8)$$
and \mathrm{tr}(H(h)) is the trace of the smoothing matrix H(h), regarded as the nonparametric version of the degrees of freedom, called the effective number of parameters. See the book by Hastie and Tibshirani (1990, Section 3.5) for a detailed discussion of this aspect for nonparametric models. Note that (8.7) is actually a generalization of the AIC for the parametric regression and autoregressive time series contexts, in which \mathrm{tr}(H(h)) is the number of regression (autoregressive) parameters in the fitted model. In view of (8.8), when \psi(\mathrm{tr}(H(h)), n) = -2\log(1 - \mathrm{tr}(H(h))/n), (8.7) becomes the generalized cross-validation (GCV) criterion, commonly used to select the bandwidth in the time series literature even in the iid setting; when \psi(\mathrm{tr}(H(h)), n) = 2\,\mathrm{tr}(H(h))/n, (8.7) is the classical AIC
discussed in Engle, Granger, Rice, and Weiss (1986) for time series data; and when \psi(\mathrm{tr}(H(h)), n) = -\log(1 - 2\,\mathrm{tr}(H(h))/n), (8.7) is the T-criterion, proposed and studied by Rice (1984) for iid samples. It is clear that when \mathrm{tr}(H(h))/n \to 0, the nonparametric AIC, the GCV and the T-criterion are asymptotically equivalent. However, the T-criterion requires \mathrm{tr}(H(h))/n < 1/2, and, when \mathrm{tr}(H(h))/n is large, the GCV has a relatively weak penalty. This is especially true in the nonparametric setting. Therefore, the criterion proposed here counteracts the over-fitting tendency of the GCV. Note that Hurvich, Simonoff, and Tsai (1998) gave a detailed derivation of the nonparametric AIC for nonparametric regression problems under the iid Gaussian error setting, and they argued that the nonparametric AIC performs reasonably well and better than some existing methods in the literature.
8.6 Functional Coefficient Model
8.6.1 Model
As mentioned earlier, when p is large, there exists the so-called curse of dimensionality. One way to overcome this shortcoming is to consider the functional coefficient model studied in Cai, Fan and Yao (2000) or the additive model discussed in Section 8.7. First, we study the functional coefficient model. To use the notation of Cai, Fan and Yao (2000), we change the notation from the previous sections. Let \{U_i, X_i, Y_i\}_{i=-\infty}^{\infty} be jointly strictly stationary processes with U_i taking values in \mathbb{R}^k and X_i taking values in \mathbb{R}^p. Typically, k is small. Let E(Y_1^2) < \infty. We define the multivariate regression
function
$$m(u, x) = E(Y \mid U = u,\; X = x), \qquad (8.9)$$
where (U, X, Y) has the same distribution as (U_i, X_i, Y_i). In a pure time series context, both U_i and X_i consist of some lagged values of Y_i. The functional-coefficient regression model has the form
$$m(u, x) = \sum_{j=1}^{p} a_j(u)\, x_j, \qquad (8.10)$$
where the functions \{a_j(\cdot)\} are measurable from \mathbb{R}^k to \mathbb{R} and x = (x_1, \ldots, x_p)^T. This model has been studied extensively in the literature; see Cai, Fan and Yao (2000) for detailed discussions. For simplicity, in what follows, we consider only the case k = 1 in (8.10). Extension to the case k > 1 involves no fundamentally new ideas. Note that models with large k are often not practically useful due to the "curse of dimensionality". If k is large, one way to overcome the problem is to consider an index functional coefficient model, proposed by Fan, Yao and Cai (2003):
$$m(u, x) = \sum_{j=1}^{p} a_j(\beta^T u)\, x_j, \qquad (8.11)$$
where \beta_1 = 1. Fan, Yao and Cai (2003) studied the estimation procedures, bandwidth selection and applications. Hong and Lee (2003) considered applications of model (8.11) to exchange rates.
8.6.2 Local Linear Estimation
As recommended by Fan and Gijbels (1996), we estimate the coefficient functions {aj (·)} using the local linear regression method
from the observations {U_i, X_i, Y_i}_{i=1}^{n}, where X_i = (X_{i1}, . . . , X_{ip})^T. We assume throughout that a_j(·) has a continuous second derivative. Note that we may approximate a_j(·) locally at u_0 by a linear function a_j(u) ≈ a_j + b_j (u − u_0). The local linear estimator is defined as â_j(u_0) = â_j, where {(â_j, b̂_j)} minimize the sum of weighted squares
Σ_{i=1}^{n} [ Y_i − Σ_{j=1}^{p} {a_j + b_j (U_i − u_0)} X_{ij} ]² K_h(U_i − u_0),    (8.12)
where K_h(·) = h^{−1} K(·/h), K(·) is a kernel function on ℜ^1 and h > 0 is a bandwidth. It follows from least squares theory that
â_j(u_0) = Σ_{k=1}^{n} K_{n,j}(U_k − u_0, X_k) Y_k,    (8.13)
where
K_{n,j}(u, x) = e_{j,2p}^T (X̃^T W X̃)^{−1} (x^T, u x^T)^T K_h(u),    (8.14)
e_{j,2p} is the 2p × 1 unit vector with 1 at the jth position, X̃ denotes the n × 2p matrix with (X_i^T, X_i^T (U_i − u_0)) as its ith row, and W = diag{K_h(U_1 − u_0), . . . , K_h(U_n − u_0)}.
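A minimal R sketch of the weighted least squares fit in (8.12) is given below; the Epanechnikov kernel, the simulated data-generating model, the evaluation grid and the bandwidth are illustrative choices rather than those of Cai, Fan and Yao (2000).
# Sketch: local linear estimation of the functional-coefficient model, per (8.12)
epan<-function(u){0.75*(1-u^2)*(abs(u)<=1)}           # Epanechnikov kernel
fc.loclin<-function(Y,U,X,u.grid,h){
p<-ncol(X)
a.hat<-matrix(NA,length(u.grid),p)
for(k in seq_along(u.grid)){
u0<-u.grid[k]
w<-epan((U-u0)/h)/h                                   # weights K_h(U_i-u0)
D<-cbind(X,X*(U-u0))                                  # design with rows (X_i^T, X_i^T(U_i-u0))
fit<-lm.wfit(D,Y,w)                                   # weighted least squares as in (8.12)
a.hat[k,]<-fit$coefficients[1:p]                      # local intercepts estimate a_j(u0)
}
a.hat
}
# illustrative simulated example (hypothetical coefficient functions)
set.seed(1); n<-400
U<-runif(n); X<-cbind(1,rnorm(n))
Y<-sin(2*pi*U)*X[,1]+cos(2*pi*U)*X[,2]+0.2*rnorm(n)
a.fit<-fc.loclin(Y,U,X,u.grid=seq(0.05,0.95,length=50),h=0.1)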
8.6.3 Bandwidth Selection
Various existing bandwidth selection techniques for nonparametric regression can be adapted for the foregoing estimation; see, e.g., Fan, Yao, and Cai (2003) and the nonparametric AIC as discussed in Section 8.5.5. Also, Fan and Gijbels (1996) and Ruppert, Sheather, and Wand (1995) developed data-driven bandwidth selection schemes based on asymptotic formulas for the optimal bandwidths, which are less variable and more effective than the conventional data-driven bandwidth selectors such as the cross-validation bandwidth rule.
Similar algorithms can be developed for the estimation of functional-coefficient models based on (8.24); however, this is left as a future research topic. Cai, Fan and Yao (2000) proposed a simple and quick method for selecting the bandwidth h. It can be regarded as a modified multifold cross-validation criterion that is attentive to the structure of stationary time series data. Let m and Q be two given positive integers with n > mQ. The basic idea is first to use Q subseries of lengths n − qm (q = 1, · · · , Q) to estimate the unknown coefficient functions and then to compute the one-step forecasting errors of the next section of the time series of length m based on the estimated models. More precisely, we choose the h that minimizes the average mean squared (AMS) error
AMS(h) = Σ_{q=1}^{Q} AMS_q(h),    (8.15)
where, for q = 1, · · · , Q,
AMS_q(h) = (1/m) Σ_{i=n−qm+1}^{n−qm+m} [ Y_i − Σ_{j=1}^{p} â_{j,q}(U_i) X_{i,j} ]²,
and {â_{j,q}(·)} are computed from the sample {(U_i, X_i, Y_i), 1 ≤ i ≤ n − qm} with bandwidth equal to h [n/(n − qm)]^{1/5}. Note that we rescale the bandwidth h for different sample sizes according to its optimal rate, i.e. h ∝ n^{−1/5}. In practical implementations, we may use m = [0.1n] and Q = 4. The selected bandwidth does not depend critically on the choice of m and Q, as long as mQ is reasonably large so that the evaluation of the prediction errors is stable. A weighted version of AMS(h) can be used if one wishes to down-weight the prediction errors at earlier times. We believe that this bandwidth should be good for both modeling and forecasting of time series.
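A sketch of the criterion (8.15) in R, reusing the fc.loclin routine sketched in Section 8.6.2, might look as follows; evaluating the fitted coefficients at the observed U_i of the forecast segment is an implementation assumption, and m, Q and the bandwidth grid simply follow the practical suggestions above.
# Sketch: multifold cross-validation criterion AMS(h) of (8.15)
ams<-function(h,Y,U,X,m,Q){
n<-length(Y); err<-0
for(q in 1:Q){
n.fit<-n-q*m                                          # fit on the first n-qm observations
idx<-(n.fit+1):(n.fit+m)                              # forecast the next m observations
h.q<-h*(n/n.fit)^(1/5)                                # rescale h at its optimal rate
a.hat<-fc.loclin(Y[1:n.fit],U[1:n.fit],X[1:n.fit,,drop=FALSE],U[idx],h.q)
pred<-rowSums(a.hat*X[idx,,drop=FALSE])               # one-step forecasts
err<-err+mean((Y[idx]-pred)^2)                        # AMS_q(h)
}
err
}
# m<-floor(0.1*length(Y)); Q<-4
# h.grid<-seq(0.05,0.3,by=0.05)
# h.ams<-h.grid[which.min(sapply(h.grid,ams,Y=Y,U=U,X=X,m=m,Q=Q))]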
8.6.4 Smoothing Variable Selection
An important issue is the choice of an appropriate smoothing variable U in applying functional-coefficient regression models when U is a lagged variable. Knowledge of the physical background of the data may be very helpful, as Cai, Fan and Yao (2000) discussed in modeling the lynx data. Without any prior information, it is pertinent to choose U by some data-driven method, such as the Akaike information criterion (AIC) and its variants, cross-validation, or other criteria. Ideally, one would choose U as a linear function of the given explanatory variables according to some optimal criterion; this is fully explored in Fan, Yao and Cai (2003). Nevertheless, we propose here a simple and practical approach: let U be the one of the given explanatory variables for which AMS defined in (8.15) attains its minimum value. Obviously, this idea can also be extended to select p (the number of lags) as well.
8.6.5 Goodness-of-Fit Test
To test whether model (8.10) holds with a specified parametric form that is popular in economic and financial applications, such as the threshold autoregressive (TAR) models,
a_j(u) = a_{j1} if u ≤ η,   and   a_j(u) = a_{j2} if u > η,
or generalized exponential autoregressive (EXPAR) models,
a_j(u) = α_j + (β_j + γ_j u) exp(−θ_j u²),
or smooth transition autoregressive (STAR) models,
a_j(u) = [1 + exp(−θ_j u)]^{−1} (logistic), or
a_j(u) = 1 − exp(−θ_j u²) (exponential), or
a_j(u) = [1 + exp(−θ_j |u|)]^{−1} (absolute)
[for more discussion of these models, please see the survey paper by van Dijk, Teräsvirta and Franses (2002)], we propose a goodness-of-fit test based on the comparison of the residual sum of squares (RSS) from the parametric and nonparametric fittings. This method is closely related to the sieve likelihood method proposed by Fan, Zhang and Zhang (2001), who demonstrated the optimality of this kind of procedure for independent samples. Consider the null hypothesis
H_0: a_j(u) = α_j(u, θ),   1 ≤ j ≤ p,    (8.16)
where α_j(·, θ) is a given family of functions indexed by the unknown parameter vector θ. Let θ̂ be an estimator of θ. The RSS under the null hypothesis is
RSS_0 = n^{−1} Σ_{i=1}^{n} [ Y_i − α_1(U_i, θ̂) X_{i1} − · · · − α_p(U_i, θ̂) X_{ip} ]².
Analogously, the RSS corresponding to model (8.10) is
RSS_1 = n^{−1} Σ_{i=1}^{n} [ Y_i − â_1(U_i) X_{i1} − · · · − â_p(U_i) X_{ip} ]².
The test statistic is defined as
T_n = (RSS_0 − RSS_1)/RSS_1 = RSS_0/RSS_1 − 1,
and we reject the null hypothesis (8.16) for large values of T_n. We use the following nonparametric bootstrap approach to evaluate the p-value of the test:
1. Generate the bootstrap residuals {ε*_i}_{i=1}^{n} from the empirical distribution of the centered residuals {ε̂_i − ε̄̂}_{i=1}^{n}, where
ε̂_i = Y_i − â_1(U_i) X_{i1} − · · · − â_p(U_i) X_{ip}   and   ε̄̂ = (1/n) Σ_{i=1}^{n} ε̂_i,
and define
Y*_i = α_1(U_i, θ̂) X_{i1} + · · · + α_p(U_i, θ̂) X_{ip} + ε*_i.
2. Calculate the bootstrap test statistic T*_n based on the sample {U_i, X_i, Y*_i}_{i=1}^{n}.
3. Reject the null hypothesis H_0 when T_n is greater than the upper-α point of the conditional distribution of T*_n given {U_i, X_i, Y_i}_{i=1}^{n}.
The p-value of the test is simply the relative frequency of the event {T*_n ≥ T_n} in the replications of the bootstrap sampling. For the sake of simplicity, we use the same bandwidth in calculating T*_n as in T_n. Note that we bootstrap the centered residuals from the nonparametric fit instead of the parametric fit because the nonparametric estimate of the residuals is consistent no matter whether the null or the alternative hypothesis is correct; the method should therefore provide a consistent estimator of the null distribution even when the null hypothesis does not hold. Kreiss, Neumann, and Yao (1998) considered nonparametric bootstrap tests in a general nonparametric regression setting. They proved that, asymptotically, the conditional distribution of the bootstrap test statistic is indeed the distribution of the test statistic under the null hypothesis. It may be proven that a similar result holds here as long as θ̂ converges to θ at the rate n^{−1/2}. It is a great challenge to derive the asymptotic properties of the test statistic T_n in the time series context under general assumptions.
That is, one would need to show that b_n [T_n − λ_n] → N(0, σ²) for some b_n and λ_n, which remains a project for future research. Note that Fan, Zhang and Zhang (2001) derived such a result for iid samples.
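The bootstrap procedure above is straightforward to code; the following R sketch keeps only the resampling logic, with fit.null and fit.fc as hypothetical placeholders for routines returning fitted values under the parametric null and under model (8.10), respectively.
# Sketch: bootstrap goodness-of-fit test of Section 8.6.5
gof.boot<-function(Y,U,X,fit.null,fit.fc,B=500){
m0<-fit.null(Y,U,X); m1<-fit.fc(Y,U,X)                # fitted values under H0 and under (8.10)
RSS0<-mean((Y-m0)^2); RSS1<-mean((Y-m1)^2)
Tn<-RSS0/RSS1-1                                       # test statistic
eps<-(Y-m1)-mean(Y-m1)                                # centered nonparametric residuals
Tn.star<-numeric(B)
for(b in 1:B){
Y.star<-m0+sample(eps,replace=TRUE)                   # generate data under H0
RSS0.b<-mean((Y.star-fit.null(Y.star,U,X))^2)
RSS1.b<-mean((Y.star-fit.fc(Y.star,U,X))^2)
Tn.star[b]<-RSS0.b/RSS1.b-1
}
list(Tn=Tn,p.value=mean(Tn.star>=Tn))                 # bootstrap p-value
}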
8.6.6 Asymptotic Results
We first present a result on mean squared convergence that serves as a building block for our main result and is also of independent interest. We now introduce some notation. Let
S_n = S_n(u_0) = ( S_{n,0}  S_{n,1} ; S_{n,1}  S_{n,2} )   and   T_n = T_n(u_0) = ( T_{n,0}(u_0) ; T_{n,1}(u_0) ),
with
S_{n,j} = S_{n,j}(u_0) = (1/n) Σ_{i=1}^{n} X_i X_i^T ((U_i − u_0)/h)^j K_h(U_i − u_0)
and
T_{n,j}(u_0) = (1/n) Σ_{i=1}^{n} X_i ((U_i − u_0)/h)^j K_h(U_i − u_0) Y_i.    (8.17)
Then the solution to (8.12) can be expressed as
β̂ = H^{−1} S_n^{−1} T_n,    (8.18)
where H = diag(1, . . . , 1, h, . . . , h) with p diagonal elements equal to 1 and p diagonal elements equal to h. To facilitate the notation, we denote
Ω = (ω_{l,m})_{p×p} = E(X X^T | U = u_0).    (8.19)
Also, let f(u, x) denote the joint density of (U, X) and let f_u(u) be the marginal density of U. We use the following convention: if U = X_{j_0} for some 1 ≤ j_0 ≤ p, then f(u, x) becomes f(x), the joint density of X.
Theorem 1. Let Condition A.1 hold, and let f(u, x) be continuous at the point u_0. Let h_n → 0 and n h_n → ∞ as n → ∞. Then
E(S_{n,j}(u_0)) → f_u(u_0) Ω(u_0) µ_j   and   n h_n Var(S_{n,j}(u_0)_{l,m}) → f_u(u_0) ν_{2j} ω_{l,m}
for each 0 ≤ j ≤ 3 and 1 ≤ l, m ≤ p.
As a consequence of Theorem 1, we have
S_n →_P f_u(u_0) S   and   S_{n,3} →_P µ_3 f_u(u_0) Ω,
in the sense that each element converges in probability, where
S = ( Ω  µ_1 Ω ; µ_1 Ω  µ_2 Ω ).
Put
σ²(u, x) = Var(Y | U = u, X = x)    (8.20)
and
Ω*(u_0) = E[ X X^T σ²(U, X) | U = u_0 ].    (8.21)
Let c_0 = µ_2/(µ_2 − µ_1²) and c_1 = −µ_1/(µ_2 − µ_1²).
Theorem 2. Let σ²(u, x) and f(u, x) be continuous at the point u_0. Then, under Conditions A.1 and A.2,
√(n h_n) [ â(u_0) − a(u_0) − (h²/2) {(µ_2² − µ_1 µ_3)/(µ_2 − µ_1²)} a''(u_0) ] →_D N(0, Θ²(u_0)),    (8.22)
provided that f_u(u_0) ≠ 0, where
Θ²(u_0) = {(c_0² ν_0 + 2 c_0 c_1 ν_1 + c_1² ν_2)/f_u(u_0)} Ω^{−1}(u_0) Ω*(u_0) Ω^{−1}(u_0).    (8.23)
Theorem 2 indicates that the asymptotic bias of â_j(u_0) is
(h²/2) {(µ_2² − µ_1 µ_3)/(µ_2 − µ_1²)} a_j''(u_0)
and the asymptotic variance is (n h_n)^{−1} θ_j²(u_0), where
θ_j²(u_0) = {(c_0² ν_0 + 2 c_0 c_1 ν_1 + c_1² ν_2)/f_u(u_0)} e_{j,p}^T Ω^{−1}(u_0) Ω*(u_0) Ω^{−1}(u_0) e_{j,p}.
When µ_1 = 0, the bias and variance expressions simplify to h² µ_2 a_j''(u_0)/2 and
θ_j²(u_0) = {ν_0/f_u(u_0)} e_{j,p}^T Ω^{−1}(u_0) Ω*(u_0) Ω^{−1}(u_0) e_{j,p}.
The optimal bandwidth for estimating a_j(·) can be defined as the one that minimizes the squared bias plus the variance; it is given by
h_{j,opt} = [ (µ_2² ν_0 − 2 µ_1 µ_2 ν_1 + µ_1² ν_2) e_{j,p}^T Ω^{−1}(u_0) Ω*(u_0) Ω^{−1}(u_0) e_{j,p} / { f_u(u_0) (µ_2² − µ_1 µ_3)² {a_j''(u_0)}² } ]^{1/5} n^{−1/5}.    (8.24)
8.6.7 Conditions and Proofs
We first impose some conditions on the regression model; they might not be the weakest possible.
Condition A.1
a. The kernel function K(·) is a bounded density with a bounded support [−1, 1].
b. |f(u, v | x_0, x_1; l)| ≤ M < ∞ for all l ≥ 1, where f(u, v | x_0, x_1; l) is the conditional density of (U_0, U_l) given (X_0, X_l), and f(u | x) ≤ M < ∞, where f(u | x) is the conditional density of U given X = x.
c. The process {U_i, X_i, Y_i} is α-mixing with Σ_k k^c [α(k)]^{1−2/δ} < ∞ for some δ > 2 and c > 1 − 2/δ.
d. E|X|^{2δ} < ∞, where δ is given in Condition A.1c.
Condition A.2
a. Assume that
E[ Y_0² + Y_l² | U_0 = u, X_0 = x_0; U_l = v, X_l = x_1 ] ≤ M < ∞,    (8.25)
for all l ≥ 1, x_0, x_1 ∈ ℜ^p, and u and v in a neighborhood of u_0.
b. Assume that h_n → 0 and n h_n → ∞. Further, assume that there exists a sequence of positive integers s_n such that s_n → ∞, s_n = o((n h_n)^{1/2}), and (n/h_n)^{1/2} α(s_n) → 0, as n → ∞.
c. There exists δ* > δ, where δ is given in Condition A.1c, such that
E[ |Y|^{δ*} | U = u, X = x ] ≤ M_4 < ∞    (8.26)
for all x ∈ ℜ^p and u in a neighborhood of u_0, and
α(n) = O(n^{−θ*}),    (8.27)
where θ* ≥ δ δ*/{2(δ* − δ)}.
d. E|X|^{2δ*} < ∞, and n^{1/2−δ/4} h^{δ/δ* − 1/2 − δ/4} = O(1).
Remark A.1. We provide a sufficient condition for the mixing coefficient α(n) to satisfy Conditions A.1c and A.2b. Suppose that h_n = A n^{−ρ} (0 < ρ < 1, A > 0), s_n = (n h_n/log n)^{1/2}, and α(n) = O(n^{−d}) for some d > 0. Then Condition A.1c is satisfied for d > 2(1 − 1/δ)/(1 − 2/δ) and Condition A.2b is satisfied if d > (1 + ρ)/(1 − ρ). Hence both conditions are satisfied if
α(n) = O(n^{−d}),   d > max{ (1 + ρ)/(1 − ρ), 2(1 − 1/δ)/(1 − 2/δ) }.
Note that there is a trade-off between the order δ of the moment of Y and the rate of decay of the mixing coefficient; the larger the order δ, the weaker the required decay rate of α(n).
To study the joint asymptotic normality of â(u_0), we need to center the vector T_n(u_0) by replacing Y_i with Y_i − m(U_i, X_i) in the expression (8.17) for T_{n,j}(u_0). Let
T*_{n,j}(u_0) = (1/n) Σ_{i=1}^{n} X_i ((U_i − u_0)/h)^j K_h(U_i − u_0) [Y_i − m(U_i, X_i)]
and
T*_n = ( T*_{n,0} ; T*_{n,1} ).
Because only the local data with |U_i − u_0| < h enter the estimation of the coefficient functions a_j(u), by Taylor's expansion,
m(U_i, X_i) = X_i^T a(u_0) + (U_i − u_0) X_i^T a'(u_0) + (h²/2) ((U_i − u_0)/h)² X_i^T a''(u_0) + o_p(h²),
where a'(u_0) and a''(u_0) are the vectors consisting of the first and second derivatives of the functions a_j(·). Then
T_{n,0} − T*_{n,0} = S_{n,0} a(u_0) + h S_{n,1} a'(u_0) + (h²/2) S_{n,2} a''(u_0) + o_p(h²)
and
T_{n,1} − T*_{n,1} = S_{n,1} a(u_0) + h S_{n,2} a'(u_0) + (h²/2) S_{n,3} a''(u_0) + o_p(h²),
so that
T_n − T*_n = S_n H β + (h²/2) ( S_{n,2} ; S_{n,3} ) a''(u_0) + o_p(h²),    (8.28)
where β = (a(u_0)^T, a'(u_0)^T)^T. Thus it follows from (8.18), (8.28), and Theorem 1 that
H (β̂ − β) = f_u^{−1}(u_0) S^{−1} T*_n + (h²/2) S^{−1} ( µ_2 Ω ; µ_3 Ω ) a''(u_0) + o_p(h²),    (8.29)
from which the bias term of β̂(u_0) is evident. Clearly,
â(u_0) − a(u_0) = {Ω^{−1}/(f_u(u_0)(µ_2 − µ_1²))} [ µ_2 T*_{n,0} − µ_1 T*_{n,1} ] + (h²/2) {(µ_2² − µ_1 µ_3)/(µ_2 − µ_1²)} a''(u_0) + o_p(h²).    (8.30)
Thus (8.30) indicates that the asymptotic bias of â(u_0) is
(h²/2) {(µ_2² − µ_1 µ_3)/(µ_2 − µ_1²)} a''(u_0).
Let
Q_n = (1/n) Σ_{i=1}^{n} Z_i,    (8.31)
where
Z_i = X_i { c_0 + c_1 (U_i − u_0)/h } K_h(U_i − u_0) [Y_i − m(U_i, X_i)]    (8.32)
with c_0 = µ_2/(µ_2 − µ_1²) and c_1 = −µ_1/(µ_2 − µ_1²). It follows from (8.30) and (8.31) that
√(n h_n) [ â(u_0) − a(u_0) − (h²/2) {(µ_2² − µ_1 µ_3)/(µ_2 − µ_1²)} a''(u_0) ] = {Ω^{−1}/f_u(u_0)} √(n h_n) Q_n + o_p(1).    (8.33)
We need the following lemma, whose proof is more involved than that of Theorem 1; therefore, we prove only this lemma. Throughout this Appendix, we let C denote a generic constant, which may take different values at different places.
Lemma A.1. Under Conditions A.1 and A.2 and the assumption that h_n → 0 and n h_n → ∞ as n → ∞, if σ²(u, x) and f(u, x) are continuous at the point u_0, then we have
(a) h_n Var(Z_1) → f_u(u_0) Ω*(u_0) [c_0² ν_0 + 2 c_0 c_1 ν_1 + c_1² ν_2];
(b) h_n Σ_{l=1}^{n−1} |Cov(Z_1, Z_{l+1})| = o(1); and
(c) n h_n Var(Q_n) → f_u(u_0) Ω*(u_0) [c_0² ν_0 + 2 c_0 c_1 ν_1 + c_1² ν_2].
Proof: First, by conditioning on (U_1, X_1) and using Theorem 1 of Sun (1984), we have
Var(Z_1) = E[ X_1 X_1^T σ²(U_1, X_1) { c_0 + c_1 (U_1 − u_0)/h }² K_h²(U_1 − u_0) ]
         = (1/h) f_u(u_0) Ω*(u_0) [ c_0² ν_0 + 2 c_0 c_1 ν_1 + c_1² ν_2 + o(1) ].    (8.34)
The result (c) follows in an obvious manner from (a) and (b) along with
Var(Q_n) = (1/n) Var(Z_1) + (2/n) Σ_{l=1}^{n−1} (1 − l/n) Cov(Z_1, Z_{l+1}).    (8.35)
It thus remains to prove part (b). To this end, let d_n → ∞ be a sequence of positive integers such that d_n h_n → 0. Define
J_1 = Σ_{l=1}^{d_n−1} |Cov(Z_1, Z_{l+1})|
and
J_2 = Σ_{l=d_n}^{n−1} |Cov(Z_1, Z_{l+1})|.
It remains to show that J_1 = o(h^{−1}) and J_2 = o(h^{−1}). We remark that because K(·) has a bounded support [−1, 1], a_j(u) is bounded in the neighborhood u ∈ [u_0 − h, u_0 + h]. Let B = max_{1≤j≤p} sup_{|u−u_0|≤h} |a_j(u)| and g(x) = Σ_{j=1}^{p} |x_j|. Then
|Cov(Z_1, Z_{l+1})| ≤ C E[ |X_1 X_{l+1}^T| {|Y_1| + B g(X_1)} {|Y_{l+1}| + B g(X_{l+1})} K_h(U_1 − u_0) K_h(U_{l+1} − u_0) ]
                   ≤ C E[ |X_1 X_{l+1}^T| {M_2 + B² g²(X_1)}^{1/2} {M_2 + B² g²(X_{l+1})}^{1/2} ]
                   ≤ C E[ |X_1 X_{l+1}^T| {1 + g(X_1)} {1 + g(X_{l+1})} ]
                   ≤ C.
It follows that J_1 ≤ C d_n = o(h^{−1})
by the choice of d_n. We next consider the upper bound of J_2. To this end, using Davydov's inequality (see Hall and Heyde 1980, Corollary A.2), we obtain, for all 1 ≤ j, m ≤ p and l ≥ 1,
|Cov(Z_{1j}, Z_{l+1,m})| ≤ C [α(l)]^{1−2/δ} [ E|Z_j|^δ ]^{1/δ} [ E|Z_m|^δ ]^{1/δ}.    (8.37)
By conditioning on (U, X) and using Conditions A.1b and A.2c, one has
E|Z_j|^δ ≤ C E[ |X_j|^δ K_h^δ(U − u_0) {|Y|^δ + B^δ g^δ(X)} ]
        ≤ C E[ |X_j|^δ K_h^δ(U − u_0) {M_3 + B^δ g^δ(X)} ]
        ≤ C h^{1−δ} E[ |X_j|^δ {M_3 + B^δ g^δ(X)} ]
        ≤ C h^{1−δ}.    (8.38)
J2 ≤ C h
∞ $
[α(l)]
1−2/δ
l=dn
2/δ−2
≤Ch
d−c n
∞ $
c
l [α(l)]
+
1−2/δ
=o h
l=dn
(8.39) by choosing dn such that h1−2/δ dcn = C, so the requirement that dn hn → 0 is satisfied. Proof of Theorem 2
We use the small-block and large-block technique; namely, we partition {1, . . . , n} into 2q_n + 1 subsets with large blocks of size r = r_n and small blocks of size s = s_n. Set
q = q_n = ⌊ n/(r_n + s_n) ⌋.    (8.40)
We now use the Cramér-Wold device to derive the asymptotic normality of Q_n. For any unit vector d ∈ ℜ^p, let Z_{n,i} = √h d^T Z_{i+1}, i = 0, . . . , n − 1. Then
√(n h) d^T Q_n = (1/√n) Σ_{i=0}^{n−1} Z_{n,i},
and, by Lemma A.1,
Var(Z_{n,0}) ≈ f_u(u_0) d^T Ω*(u_0) d [ c_0² ν_0 + 2 c_0 c_1 ν_1 + c_1² ν_2 ] ≡ θ²(u_0)    (8.41)
and
Σ_{l=1}^{n−1} |Cov(Z_{n,0}, Z_{n,l})| = o(1).    (8.42)
Define the random variables, for 0 ≤ j ≤ q − 1,
η_j = Σ_{i=j(r+s)}^{j(r+s)+r−1} Z_{n,i},   ξ_j = Σ_{i=j(r+s)+r}^{(j+1)(r+s)} Z_{n,i},   and   ζ_q = Σ_{i=q(r+s)}^{n−1} Z_{n,i}.
Then,
√(n h) d^T Q_n = (1/√n) { Σ_{j=0}^{q−1} η_j + Σ_{j=0}^{q−1} ξ_j + ζ_q } ≡ (1/√n) { Q_{n,1} + Q_{n,2} + Q_{n,3} }.    (8.43)
We show that, as n → ∞,
(1/n) E[Q_{n,2}]² → 0   and   (1/n) E[Q_{n,3}]² → 0,    (8.44)
| E[exp(i t Q_{n,1})] − Π_{j=0}^{q−1} E[exp(i t η_j)] | → 0,    (8.45)
(1/n) Σ_{j=0}^{q−1} E(η_j²) → θ²(u_0),    (8.46)
and
(1/n) Σ_{j=0}^{q−1} E[ η_j² I{ |η_j| ≥ ε θ(u_0) √n } ] → 0    (8.47)
for every ε > 0. (8.44) implies that Q_{n,2} and Q_{n,3} are asymptotically negligible in probability; (8.45) shows that the summands η_j in Q_{n,1} are asymptotically independent; and (8.46) and (8.47) are the standard Lindeberg-Feller conditions for the asymptotic normality of Q_{n,1} in the independent setup.
We first establish (8.44). For this purpose, we choose the large block size. Condition A.2b implies that there is a sequence of positive constants γ_n → ∞ such that
γ_n s_n = o(√(n h_n))   and   γ_n (n/h_n)^{1/2} α(s_n) → 0.    (8.48)
Define the large block size r_n by r_n = ⌊(n h_n)^{1/2}/γ_n⌋ and the small block size s_n. Then it can easily be shown from (8.48) that, as n → ∞,
s_n/r_n → 0,   r_n/n → 0,   r_n (n h_n)^{−1/2} → 0,    (8.49)
and
(n/r_n) α(s_n) → 0.    (8.50)
Observe that
E[Q_{n,2}]² = Σ_{j=0}^{q−1} Var(ξ_j) + 2 Σ_{0≤i<j≤q−1} Cov(ξ_i, ξ_j) ≡ I_1 + I_2.    (8.51)
It follows from stationarity and Lemma A.1 that
I_1 = q_n Var(ξ_1) = q_n Var( Σ_{j=1}^{s_n} Z_{n,j} ) = q_n s_n [ θ²(u_0) + o(1) ].    (8.52)
Next consider the second term I_2 on the right-hand side of (8.51). Let r*_j = j(r_n + s_n); then r*_j − r*_i ≥ r_n for all j > i, and we thus have
|I_2| ≤ 2 Σ_{0≤i<j≤q−1} Σ_{j_1=1}^{s_n} Σ_{j_2=1}^{s_n} |Cov(Z_{n, r*_i + r_n + j_1}, Z_{n, r*_j + r_n + j_2})|.
By stationarity and Lemma A.1, one obtains
|I_2| ≤ 2 n Σ_{j=r_n+1}^{n} |Cov(Z_{n,1}, Z_{n,j})| = o(n).    (8.53)
Hence, by (8.49)-(8.53), we have
(1/n) E[Q_{n,2}]² = O( q_n s_n n^{−1} ) + o(1) = o(1).    (8.54)
It follows from stationarity, (8.49), and Lemma A.1 that
Var[Q_{n,3}] = Var( Σ_{j=1}^{n − q_n(r_n+s_n)} Z_{n,j} ) = O( n − q_n(r_n + s_n) ) = o(n).    (8.55)
Combining (8.49), (8.54), and (8.55), we establish (8.44). As for (8.46), by stationarity, (8.49), (8.50), and Lemma A.1, it is easily seen that
(1/n) Σ_{j=0}^{q_n−1} E(η_j²) = (q_n/n) E(η_1²) = (q_n r_n/n) · (1/r_n) Var( Σ_{j=1}^{r_n} Z_{n,j} ) → θ²(u_0).
To establish (8.45), we use Lemma 1.1 of Volkonskii and Rozanov (1959) (see also Ibragimov and Linnik 1971, p. 338) to obtain
| E[exp(i t Q_{n,1})] − Π_{j=0}^{q_n−1} E[exp(i t η_j)] | ≤ 16 (n/r_n) α(s_n),
which tends to 0 by (8.50).
It remains to establish (8.47). For this purpose, we use Theorem 4.1 of Shao and Yu (1996) and Condition A.2 to obtain
E[ η_1² I{ |η_1| ≥ ε θ(u_0) √n } ] ≤ C n^{1−δ/2} E|η_1|^δ ≤ C n^{1−δ/2} r_n^{δ/2} [ E|Z_{n,0}|^{δ*} ]^{δ/δ*}.    (8.56)
As in (8.38),
E|Z_{n,0}|^{δ*} ≤ C h^{1−δ*/2}.    (8.57)
Therefore, by (8.56) and (8.57),
E[ η_1² I{ |η_1| ≥ ε θ(u_0) √n } ] ≤ C n^{1−δ/2} r_n^{δ/2} h^{(2−δ*)δ/(2δ*)}.    (8.58)
Thus, by (8.40) and the definition of r_n, and using Conditions A.2c and A.2d, we obtain
(1/n) Σ_{j=0}^{q−1} E[ η_j² I{ |η_j| ≥ ε θ(u_0) √n } ] ≤ C γ_n^{1−δ/2} n^{1/2−δ/4} h_n^{δ/δ* − 1/2 − δ/4} → 0    (8.59)
because γ_n → ∞. This completes the proof of the theorem.
8.6.8 Monte Carlo Simulations and Applications
See Cai, Fan and Yao (2000) for detailed Monte Carlo simulation results and applications.
8.7 Additive Model
8.7.1 Model
In this section, we use the notation from Cai (2002). Let {X_t, Y_t, Z_t}_{t=−∞}^{∞} be jointly stationary processes, where X_t and Y_t take values in ℜ^p and ℜ^q with p, q ≥ 0, respectively. The regression surface is defined by
m(x, y) = E{ Z_t | X_t = x, Y_t = y }.    (8.60)
Here, it is assumed that E|Z_t| < ∞. Note that the regression function m(·, ·) defined in (8.60) can identify only the sum
m(x, y) = µ + g_1(x) + g_2(y).    (8.61)
Such a decomposition holds, for example, for the following nonlinear additive autoregressive model with exogenous variables (ARX):
Y_t = µ + g_1(X_{t−j_1}, . . . , X_{t−j_p}) + g_2(Y_{t−i_1}, . . . , Y_{t−i_q}) + η_t,
X_{t−j_1} = g_3(X_{t−j_2}, . . . , X_{t−j_p}) + ε_t.
For detailed discussions on the ARX model, the reader is referred to the papers by Masry and Tjøstheim (1997) and Cai and Masry (2000). For identifiability, it is assumed that E{g_1(X_t)} = 0 and E{g_2(Y_t)} = 0. Then, the projection of m(x, y) in the g_1(x) direction is defined by
E{m(x, Y_t)} = µ + g_1(x) + E{g_2(Y_t)} = µ + g_1(x).    (8.62)
Clearly, g_1(·) can be identified up to an additive constant, and g_2(·) can be retrieved likewise. A thorough discussion of the additive time series models defined in (8.61) can be found in Chen and Tsay (1993). Additive components can be estimated at a one-dimensional nonparametric rate. Several methods have been proposed in the literature to estimate the additive components. For example, Chen and Tsay (1993) used iterative backfitting procedures, such as the ACE algorithm and the BRUTO approach; see Hastie and Tibshirani (1990) for details. However, the asymptotic properties of these estimators are not well understood, owing to their implicit definition. To attenuate the drawbacks of iterative procedures, Auestad and Tjøstheim (1991) and Tjøstheim and Auestad (1994a) proposed a direct method based on an average regression surface idea, referred to as the projection method in Tjøstheim and Auestad (1994a), for time series data. As pointed out by Cai and Fan (2000), a direct method has some advantages: it does not rely on iterations, it is computationally fast, and
more importantly, it allows an asymptotic analysis. Finally, the projection method was extended to nonlinear ARX models by Masry and Tjøstheim (1997) using the kernel method and by Cai and Masry (2000) using the local polynomial approach. It should be remarked that the projection method, under the name of marginal integration, was proposed independently by Newey (1994) and Linton and Nielsen (1995) for iid samples, and since then important progress has been made by several authors. For example, by combining the marginal integration with one-step backfitting, Linton (1997, 2000) presents an efficient estimator; Mammen, Linton, and Nielsen (1999) establish rigorously the asymptotic theory of the backfitting; Cai and Fan (2000) consider estimating each component efficiently using the weighted projection method coupled with local linear fitting; and Sperlich, Tjøstheim, and Yang (2000) extend the efficient method to models with simple interactions. Although the projection method has the aforementioned merits, it also has some disadvantages. It may not be efficient if the covariates (endogenous or exogenous variables) are strongly correlated, which is particularly relevant for autoregressive models. The intuitive interpretation is that the additive components are not orthogonal. To overcome this shortcoming, two efficient estimation methods have been proposed in the literature. The first one is the weight function procedure, proposed by Fan, Härdle, and Mammen (1998) for iid samples and extended to time series situations by Cai and Fan (2000). With an appropriate choice of the weight function, additive components can be efficiently estimated in the sense that an additive component can be estimated with the same asymptotic bias and variance as if the rest of the components were known. The second one is to combine the marginal integration with one-step backfitting,
introduced by Linton (1997, 2000) for iid samples and extended by Sperlich, Tjøstheim, and Yang (2000) to additive models with simple interactions, although this method has not been advocated for time series situations. There has not been any attempt in the literature to discuss bandwidth selection for the projection method and its variations, owing to their complexity. In practice, a single bandwidth is usually used for all components, although Cai and Fan (2000) argue that, in theory, different bandwidths might be used to deal with the situation in which the additive components possess different degrees of smoothness. Therefore, the projection method may not be optimal in practice when only one bandwidth is used. To estimate the unknown additive components in (8.61) efficiently, following the spirit of the marginal integration with one-step backfitting proposed by Linton (1997) for iid samples, I use a two-stage method, due to Linton (2000), coupled with the local linear (polynomial) method, which has some attractive properties, such as mathematical efficiency, bias reduction and adaptation to edge effects (see Fan and Gijbels, 1996). The basic idea of the two-stage approach is described as follows. At the first stage, one obtains initial estimated values for all components. More precisely, the idea for estimating any additive component is first to estimate the high-dimensional regression surface directly by the local linear method and then to average the regression surface over the remaining variables to stabilize the variance. Such an initial estimate is, in general, under-smoothed, so that its bias is asymptotically negligible. At the second stage, the local linear (polynomial) technique is used again to estimate any additive component by using the initial estimated values of the rest of the components. In this way, it is shown that the estimate at the second stage is not only efficient in the sense of being
equivalent to a procedure based on knowing the other components, but it also makes the bandwidth selection much easier. Note that this technique is not novel to this paper, since the two-stage method was first used by Linton (1997, 2000) for iid samples, but many of the details and insights are. The rest of the paper is organized as follows. Section 2 gives a brief review of the projection method and discusses its advantages and shortcomings. Section 3 presents the two-stage approach coupled with a new bandwidth selector based on the nonparametric version of the Akaike information criterion; the asymptotic normality of the resulting estimator is also established. In Section 4, a small simulation study is carried out to illustrate the methodology, and the two-stage approach is also applied to a real example. Finally, together with some regularity conditions, the technical proofs are relegated to the Appendix.
8.7.2 Backfitting Algorithm
The building block of the generalized additive model algorithm is the scatterplot smoother. We will first describe scatterplot smoothing in a simple setting and then indicate how it is used in generalized additive modelling. Here y is a response or outcome variable, and x is a prognostic factor. We wish to fit a smooth curve f(x) that summarizes the dependence of y on x. If we were to find the curve that simply minimizes Σ_{i=1}^{n} [y_i − f(x_i)]², the result would be an interpolating curve that would not be smooth at all. The cubic spline smoother imposes smoothness on f(x). We seek the function f(x) that minimizes
Σ_{i=1}^{n} [y_i − f(x_i)]² + λ ∫ [f''(x)]² dx.    (8.63)
Notice that ∫ [f''(x)]² dx measures the "wiggliness" of the function f(x): linear functions f(x) have ∫ [f''(x)]² dx = 0, while non-linear functions produce values bigger than zero. λ is a non-negative smoothing parameter that must be chosen by the data analyst. It governs the trade-off between the goodness of fit to the data and the wiggliness of the function; larger values of λ force f(x) to be smoother. For any value of λ, the solution to (8.63) is a cubic spline, i.e., a piecewise cubic polynomial with pieces joined at the unique observed values of x in the dataset. Fast and stable numerical procedures are available for computing the fitted curve. What value of λ do we use in practice? In fact it is not convenient to express the desired smoothness of f(x) in terms of λ, as the meaning of λ depends on the units of the prognostic factor x. Instead, it is possible to define an "effective number of parameters" or "degrees of freedom" of a cubic spline smoother, and then use a numerical search to determine the value of λ that yields this number. In practice, if we choose the effective number of parameters to be 5, roughly speaking, this means that the complexity of the curve is about the same as that of a polynomial regression of degree 4. However, the cubic spline smoother "spreads out" its parameters in a more even manner and hence is much more flexible than a polynomial regression. Note that the degrees of freedom of a smoother need not be an integer. The above discussion tells how to fit a curve to a single prognostic factor. With multiple prognostic factors, if x_{ij} denotes the value of the jth prognostic factor for the ith observation, we fit the additive model
y_i = Σ_{j=1}^{d} f_j(x_{ij}) + ε_i.
A criterion like (8.63) can be specified for this problem, and a simple iterative procedure exists for estimating the f_j's. We apply a cubic spline smoother to the outcome y_i − Σ_{j≠k} f̂_j(x_{ij}) as a function of x_{ik}, for each prognostic factor in turn. The process is continued until the estimates f̂_j(x) stabilize. This procedure is known as "backfitting", and the resulting fit is analogous to a multiple regression for linear models; a small R sketch is given below.
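The following minimal R sketch of the backfitting loop uses smooth.spline with a fixed number of degrees of freedom as the scatterplot smoother; the choices df = 5, the tolerance and the centering step are illustrative rather than prescribed by the text.
# Sketch: backfitting algorithm with cubic spline smoothers
backfit<-function(y,X,df=5,tol=1e-6,maxit=50){
n<-length(y); d<-ncol(X)
f<-matrix(0,n,d); mu<-mean(y)
for(it in 1:maxit){
f.old<-f
for(k in 1:d){
partial<-y-mu-rowSums(f[,-k,drop=FALSE])              # remove the other fitted components
fit<-smooth.spline(X[,k],partial,df=df)               # cubic spline smoother with fixed df
f[,k]<-predict(fit,X[,k])$y
f[,k]<-f[,k]-mean(f[,k])                              # center each component for identifiability
}
if(max(abs(f-f.old))<tol) break                       # stop when the estimates stabilize
}
list(mu=mu,f=f)
}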
8.7.3 Projection Method
This section is devoted to a brief review of the projection method and a discussion of its merits and disadvantages. It is assumed that all additive components have continuous second partial derivatives, so that m(u, v) can be locally approximated by a linear term in a neighborhood of (x, y), namely, m(u, v) ≈ β_0 + β_1^T (u − x) + β_2^T (v − y), with {β_j} depending on x and y, where β_1^T denotes the transpose of β_1. Let K(·) and L(·) be symmetric kernel functions on ℜ^p and ℜ^q, respectively, and let h_{11} = h_{11}(n) > 0 and h_{12} = h_{12}(n) > 0 be the bandwidths in the step of estimating the regression surface. Here, to handle various degrees of smoothness, Cai and Fan (2000) propose using h_{11} and h_{12} differently, although the implementation may not be easy in practice; the reader is referred to Cai and Fan (2000) for details. Given observations {X_t, Y_t, Z_t}_{t=1}^{n}, let {β̂_j} be the minimizer of the locally weighted least squares
Σ_{t=1}^{n} [ Z_t − β_0 − β_1^T (X_t − x) − β_2^T (Y_t − y) ]² K_{h_{11}}(X_t − x) L_{h_{12}}(Y_t − y),
where K_h(·) = K(·/h)/h^p and L_h(·) = L(·/h)/h^q. Then the local linear estimator of the regression surface m(x, y) is m̂(x, y) = β̂_0.
By computing the sample average of m̂(·, ·) based on (8.62), the projection estimators of g_1(·) and g_2(·) are defined, respectively, as
ĝ_1(x) = (1/n) Σ_{t=1}^{n} m̂(x, Y_t) − μ̂   and   ĝ_2(y) = (1/n) Σ_{t=1}^{n} m̂(X_t, y) − μ̂,
where μ̂ = n^{−1} Σ_{t=1}^{n} Z_t. Under some regularity conditions, by using the same arguments as those employed in the proof of Theorem 3 in Cai and Masry (2000), it can be shown (although the derivation is long and tedious) that the asymptotic bias and asymptotic variance of ĝ_1(x) are, respectively, h_{11}² tr{µ_2(K) g_1''(x)}/2 and v_1(x) = ν_0(K) A(x), where
A(x) = ∫ p_2²(y) σ²(x, y) p^{−1}(x, y) dy
and σ²(x, y) = Var(Z_t | X_t = x, Y_t = y).
Here, p(x, y) stands for the joint density of X_t and Y_t, p_1(x) denotes the marginal density of X_t, p_2(y) is the marginal density of Y_t, ν_0(K) = ∫ K²(u) du, and µ_2(K) = ∫ u u^T K(u) du. The foregoing method has some advantages: it is easy to understand, it is computationally fast, and it allows an asymptotic analysis. However, it can be quite inefficient in an asymptotic sense. To demonstrate this point, let us consider the ideal situation in which g_2(·) and µ are known. In such a case, one can estimate g_1(·) directly by regressing the partial error Z̃_t = Z_t − µ − g_2(Y_t) on X_t, and such an ideal estimator is optimal in an asymptotic minimax sense (see, e.g., Fan and Gijbels, 1996). The asymptotic bias for the ideal
estimator is h_{11}² tr{µ_2(K) g_1''(x)}/2 and the asymptotic variance is v_0(x) = ν_0(K) B(x), with
B(x) = p_1^{−1}(x) E[ σ²(X_t, Y_t) | X_t = x ]    (8.64)
(see, e.g., Masry and Fan, 1997). It is clear that v_1(x) = v_0(x) if X_t and Y_t are independent. If X_t and Y_t are correlated and σ²(x, y) is a constant, it follows from the Cauchy-Schwarz inequality that
B(x) = (σ²/p_1(x)) [ ∫ p^{1/2}(y|x) · {p_2(y)/p^{1/2}(y|x)} dy ]² ≤ (σ²/p_1(x)) ∫ {p_2²(y)/p(y|x)} dy = A(x),
which implies that the ideal estimator always has a smaller asymptotic variance than the projection estimator, although both have the same bias. This suggests that the projection method can lead to inefficient estimation of g_1(·) and g_2(·) when X_t and Y_t are serially correlated, which is particularly relevant for autoregressive models. To alleviate this shortcoming, I propose the two-stage approach described next.
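Before turning to the two-stage approach, a compact R sketch of the projection (marginal integration) estimator of g_1 may be helpful; loess with degree 1 stands in for the local linear surface smoother with product kernels, and the span value is illustrative.
# Sketch: projection estimator of g1 - smooth the surface m(x,y), then
# average the fitted surface over the observed Y_t (marginal integration).
proj.g1<-function(Z,X,Y,x.grid,span=0.3){
surf<-loess(Z~X+Y,span=span,degree=1)                 # stand-in for the local linear surface fit
mu.hat<-mean(Z)
sapply(x.grid,function(x0){
mean(predict(surf,newdata=data.frame(X=x0,Y=Y)),na.rm=TRUE)-mu.hat
})
}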
8.7.4 Two-Stage Procedure
The two-stage method due to Linton (1997, 2000) is now introduced. The basic idea is to get an initial estimate of g_2(·) using a small bandwidth h_{12}. The initial estimate can be obtained by the projection method, and h_{12} can be chosen so small that the bias of estimating g_2(·) is asymptotically negligible. Then, using the partial residuals Z*_t = Z_t − μ̂ − ĝ_2(Y_t), we apply the local linear regression technique to the pseudo regression model Z*_t = g_1(X_t) + ε*_t
to estimate g_1(·). This leads naturally to the weighted least-squares problem
Σ_{t=1}^{n} [ Z*_t − β_1 − β_2^T (X_t − x) ]² J_{h_2}(X_t − x),    (8.65)
where J(·) is a kernel function on ℜ^p and h_2 = h_2(n) > 0 is the bandwidth at the second stage. The advantage of this approach is twofold: the bandwidth h_2 can now be selected purposely for estimating g_1(·) only, and any bandwidth selection technique for nonparametric regression can be applied here. Minimizing (8.65) with respect to β_1 and β_2 gives the two-stage estimate of g_1(x), denoted by g̃_1(x) = β̂_1, where β̂_1 and β̂_2 are the minimizers of (8.65).
It is shown in Theorem 1, which follows, that under some regularity conditions, the asymptotic bias and variance of the two-stage estimate g̃_1(x) are the same as those of the ideal estimator, provided that the initial bandwidth h_{12} satisfies h_{12} = o(h_2).
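In R, the second stage is simply a one-dimensional smooth of the partial residuals; the sketch below reuses the proj.g1 routine above (with the roles of X_t and Y_t swapped) as the pilot estimator of g_2, and loess again stands in for the local linear smoother, with illustrative span values playing the roles of the small initial bandwidth h_{12} and the second-stage bandwidth h_2.
# Sketch: two-stage estimator of g1 - pilot (undersmoothed) estimate of g2,
# then a local linear regression of the partial residuals on X.
twostage.g1<-function(Z,X,Y,x.grid,span.pilot=0.1,span2=0.4){
mu.hat<-mean(Z)
g2.hat<-proj.g1(Z,Y,X,Y,span=span.pilot)              # pilot g2 evaluated at the observed Y_t
Zstar<-Z-mu.hat-g2.hat                                # partial residuals Z*_t
fit2<-loess(Zstar~X,span=span2,degree=1)              # second-stage local linear fit
predict(fit2,newdata=data.frame(X=x.grid))
}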
Sampling Properties
To establish the asymptotic normality of the two-stage estimator, it is assumed that the initial estimator satisfies the linear approximation
ĝ_2(Y_t) − g_2(Y_t) ≈ (1/n) Σ_{i=1}^{n} L_{h_{12}}(Y_i − Y_t) Γ(X_i, Y_t) δ_i + (h_{12}²/2) tr{µ_2(L) g_2''(Y_t)},    (8.66)
where δ_t = Z_t − m(X_t, Y_t) and Γ(x, y) = p_1(x)/p(x, y). Note that under some regularity conditions, by following the same arguments as in Masry (1996), one can show (although the proof is not easy and is quite lengthy and tedious) that (8.66) holds. Note that this assumption is also imposed in Linton (2000) for iid samples to simplify the
proof of the asymptotic results of the two-stage estimator. Now, the asymptotic normality of the two-stage estimator is stated here, and its proof is relegated to the Appendix.
THEOREM 1. Under (8.66) and Assumptions A1 - A9 stated in the Appendix, if the bandwidths h_{12} and h_2 are chosen such that h_{12} → 0, n h_{12}^q → ∞, h_2 → 0, and n h_2^p → ∞ as n → ∞, then
√(n h_2^p) [ g̃_1(x) − g_1(x) − bias(x) + o_p(h_{12}² + h_2²) ] →_D N{0, v_0(x)},
where the asymptotic bias is
bias(x) = (h_2²/2) tr{µ_2(J) g_1''(x)} − (h_{12}²/2) tr{µ_2(L) E(g_2''(Y_t) | X_t = x)}
and the asymptotic variance is v_0(x) = ν_0(J) B(x).
We remark that, by Theorem 1, the asymptotic variance of the two-stage estimator is independent of the initial bandwidths. Thus, the initial bandwidths should be chosen as small as possible. This is another benefit of using the two-stage procedure: the bandwidth selection problem becomes relatively easy. In particular, when h_{12} = o(h_2), the bias from the initial estimation is asymptotically negligible. For the ideal situation in which g_2(·) is known, Masry and Fan (1997) show that under some regularity conditions, the optimal estimate of g_1(x), denoted by ĝ*_1(x) and obtained by using (8.65) with the partial residual Z*_t replaced by the partial error Z̃_t = Z_t − µ − g_2(Y_t), is asymptotically normally distributed:
√(n h_2^p) [ ĝ*_1(x) − g_1(x) − (h_2²/2) tr{µ_2(J) g_1''(x)} + o_p(h_2²) ] →_D N{0, v_0(x)}.
Finally, it is worth pointing out that, under some regularity conditions, the nonlinear additive ARX processes are stationary and α-mixing with geometrically decaying mixing coefficients (see Masry and Tjøstheim, 1997), so that Assumptions A6, A7, and A8 in the Appendix, which are imposed on the mixing coefficient, are automatically satisfied. Therefore, assuming that the other technical assumptions of this paper are satisfied, the result in Theorem 1 can be applied to nonlinear additive ARX models.
8.7.5 Monte Carlo Simulations and Applications
See Cai (2002) for detailed Monte Carlo simulation results and applications.
8.8 Computer Code
# 07-31-2006
graphics.off()  # clean the previous graphs on the screen
############################################################
# data: weekly 3-month Treasury bill rates from 1970 to 1997
# (the data file path is truncated in the source; supply the full file name)
z1=matrix(scan(file="c:\\teaching\\time series\\data\\w-3mtb"),byrow=T,ncol=4)
x=z1[,4]/100
n=length(x)
y=diff(x)       # Delta x_t = x_t - x_{t-1}
x=x[1:(n-1)]
n=n-1
x_star=(x-mean(x))/sqrt(var(x))
z=seq(min(x),max(x),length=50)
#win.graph()
postscript(file="c:\\teaching\\time series\\figs\\fig-8.1.eps",
  horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light blue")
scatter.smooth(x,y,span=1/10,ylab="",xlab="x(t-1)")
title(main="(a) y(t) vs x(t-1)",col.main="red")
scatter.smooth(x,abs(y),span=1/10,ylab="",xlab="x(t-1)")
title(main="(b) |y(t)| vs x(t-1)",col.main="red")
scatter.smooth(x,y^2,span=1/10,ylab="",xlab="x(t-1)")
title(main="(c) y(t)^2 vs x(t-1)",col.main="red")
dev.off()
############################################################
#########################
# Nonparametric Fitting #
#########################
#########################################################
# Define the Epanechnikov kernel function
kernel<-function(x){0.75*(1-x^2)*(abs(x)<=1)}
############################################################
# Define the kernel density estimator
kernden=function(x,z,h,ker){
# parameters: x=variable; h=bandwidth; z=grid points; ker=kernel (1=Epanechnikov, 0=normal)
nz<-length(z)
nx<-length(x)
x0=rep(1,nx*nz)
dim(x0)=c(nx,nz)
x1=t(x0)
x0=x*x0
x1=z*x1
x0=x0-t(x1)
if(ker==1){x1=kernel(x0/h)}  # Epanechnikov kernel
if(ker==0){x1=dnorm(x0/h)}   # normal kernel
f1=apply(x1,2,mean)/h
return(f1)
}
############################################################
# Define the local constant (Nadaraya-Watson) estimator
local.constant=function(y,x,z,h,ker){
# parameters: y=response; x=variable; h=bandwidth; z=grid points; ker=kernel (1=Epanechnikov, 0=normal)
nz<-length(z)
nx<-length(x)
x0=rep(1,nx*nz)
dim(x0)=c(nx,nz)
x1=t(x0)
x0=x*x0
x1=z*x1
x0=x0-t(x1)
if(ker==1){x1=kernel(x0/h)}  # Epanechnikov kernel
if(ker==0){x1=dnorm(x0/h)}   # normal kernel
x2=y*x1
f1=apply(x1,2,mean)
f2=apply(x2,2,mean)
f3=f2/f1
return(f3)
}
############################################################
# Define the local linear estimator
local.linear<-function(y,x,z,h){
# parameters: y=response; x=covariate; h=bandwidth; z=grid points
nz<-length(z)
ny<-length(y)
beta<-rep(0,nz*2)
dim(beta)<-c(nz,2)
for(k in 1:nz){
x0=x-z[k]
w0<-kernel(x0/h)
beta[k,]<-glm(y~x0,weights=w0)$coeff  # weighted fit at the grid point z[k]
}
return(beta)
}
############################################################
h=0.02
# Local constant estimates
mu_hat=local.constant(y,x,z,h,1)
sigma_hat=local.constant(abs(y),x,z,h,1)
sigma2_hat=local.constant(y^2,x,z,h,1)
win.graph()
par(mfrow=c(2,2),mex=0.4,bg="light yellow")
scatter.smooth(x,y,span=1/10,ylab="",xlab="x(t-1)")
points(z,mu_hat,type="l",lty=1,lwd=3,col=2)
title(main="(a) y(t) vs x(t-1)",col.main="red")
legend(0.04,0.0175,"Local Constant Estimate")
scatter.smooth(x,abs(y),span=1/10,ylab="",xlab="x(t-1)")
points(z,sigma_hat,type="l",lty=1,lwd=3,col=2)
title(main="(b) |y(t)| vs x(t-1)",col.main="red")
scatter.smooth(x,y^2,span=1/10,ylab="",xlab="x(t-1)")
title(main="(c) y(t)^2 vs x(t-1)",col.main="red")
points(z,sigma2_hat,type="l",lty=1,lwd=3,col=2)
# Local linear estimates
fit2=local.linear(y,x,z,h)
mu_hat=fit2[,1]
fit2=local.linear(abs(y),x,z,h)
sigma_hat=fit2[,1]
fit2=local.linear(y^2,x,z,h)
sigma2_hat=fit2[,1]
win.graph()
par(mfrow=c(2,2),mex=0.4,bg="light green")
scatter.smooth(x,y,span=1/10,ylab="",xlab="x(t-1)")
points(z,mu_hat,type="l",lty=1,lwd=3,col=2)
title(main="(a) y(t) vs x(t-1)",col.main="red")
legend(0.04,0.0175,"Local Linear Estimate")
scatter.smooth(x,abs(y),span=1/10,ylab="",xlab="x(t-1)")
points(z,sigma_hat,type="l",lty=1,lwd=3,col=2)
title(main="(b) |y(t)| vs x(t-1)",col.main="red")
scatter.smooth(x,y^2,span=1/10,ylab="",xlab="x(t-1)")
title(main="(c) y(t)^2 vs x(t-1)",col.main="red")
points(z,sigma2_hat,type="l",lty=1,lwd=3,col=2)
############################################################
8.9 References
Bowman, A. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71, 353-360.
Cai, Z. (2002). A two-stage approach to additive time series models. Statistica Neerlandica, 56, 415-433.
Cai, Z. and J. Fan (2000). Average regression surface for dependent data. Journal of Multivariate Analysis, 75, 112-142.
Cai, Z., J. Fan and Q. Yao (2000). Functional-coefficient regression models for nonlinear time series. Journal of the American Statistical Association, 95, 941-956.
Cai, Z. and E. Masry (2000). Nonparametric estimation of additive nonlinear ARX time series: Local linear fitting and projection. Econometric Theory, 16, 465-501.
Cai, Z. and R.C. Tiwari (2000). Application of a local linear autoregressive model to BOD time series. Environmetrics, 11, 341-350.
Chen, R. and R. Tsay (1993). Nonlinear additive ARX models. Journal of the American Statistical Association, 88, 310-320.
Chiu, S.T. (1991). Bandwidth selection for kernel density estimation. The Annals of Statistics, 19, 1883-1905.
Engle, R.F., C.W.J. Granger, J. Rice, and A. Weiss (1986). Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association, 81, 310-320.
Fan, J. (1993). Local linear regression smoothers and their minimax efficiency. The Annals of Statistics, 21, 196-216.
Fan, J., N.E. Heckman, and M.P. Wand (1995). Local polynomial kernel regression for generalized linear models and quasi-likelihood functions. Journal of the American Statistical Association, 90, 141-150.
Fan, J., T. Gasser, I. Gijbels, M. Brockmann and J. Engel (1996). Local polynomial fitting: optimal kernel and asymptotic minimax efficiency. Annals of the Institute of Statistical Mathematics, 49, 79-99.
Fan, J. and I. Gijbels (1996). Local Polynomial Modeling and Its Applications. London: Chapman and Hall.
Fan, J. and Q. Yao (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. New York: Springer-Verlag.
Fan, J., Q. Yao and Z. Cai (2003). Adaptive varying-coefficient linear models. Journal of the Royal Statistical Society, Series B, 65, 57-80.
Fan, J. and C. Zhang (2003). A re-examination of diffusion estimators with applications to financial model validation. Journal of the American Statistical Association, 98, 118-134.
Fan, J., C. Zhang and J. Zhang (2001). Generalized likelihood ratio statistics and Wilks phenomenon. The Annals of Statistics, 29, 153-193.
Gasser, T. and H.-G. Müller (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics, 757, 23-68. New York: Springer-Verlag.
Granger, C.W.J. and T. Teräsvirta (1993). Modeling Nonlinear Economic Relationships. Oxford, U.K.: Oxford University Press.
Hall, P. and C.C. Heyde (1980). Martingale Limit Theory and Its Applications. New York: Academic Press.
Hall, P. and I. Johnstone (1992). Empirical functionals and efficient smoothing parameter selection (with discussion). Journal of the Royal Statistical Society, Series B, 54, 475-530.
Hastie, T.J. and R.J. Tibshirani (1990). Generalized Additive Models. London: Chapman and Hall.
Hjort, N.L. and M.C. Jones (1996). Better rules of thumb for choosing bandwidth in density estimation. Working paper, Department of Mathematics, University of Oslo, Norway.
Hong, Y. and T.-H. Lee (2003). Inference on predictability of foreign exchange rates via generalized spectrum and nonlinear time series models. The Review of Economics and Statistics, 85, 1048-1062.
Hurvich, C.M., J.S. Simonoff and C.-L. Tsai (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Series B, 60, 271-293.
Jones, M.C., J.S. Marron and S.J. Sheather (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91, 401-407.
Kreiss, J.P., M. Neumann and Q. Yao (1998). Bootstrap tests for simple structures in nonparametric time series regression. Unpublished manuscript.
Linton, O.B. (1997). Efficient estimation of additive nonparametric regression models. Biometrika, 84, 469-473.
Linton, O.B. (2000). Efficient estimation of generalized additive nonparametric regression models. Econometric Theory, 16, 502-523.
Linton, O.B. and J.P. Nielsen (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika, 82, 93-100.
Mammen, E., O.B. Linton, and J.P. Nielsen (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. The Annals of Statistics, 27, 1443-1490.
Masry, E. and J. Fan (1997). Local polynomial estimation of regression functions for mixing processes. Scandinavian Journal of Statistics, 24, 165-179.
Masry, E. and D. Tjøstheim (1997). Additive nonlinear ARX time series and projection estimates. Econometric Theory, 13, 214-252.
Øksendal, B. (1985). Stochastic Differential Equations: An Introduction with Applications, 3rd edition. New York: Springer-Verlag.
Priestley, M.B. and M.T. Chao (1972). Nonparametric function fitting. Journal of the Royal Statistical Society, Series B, 34, 384-392.
Rice, J. (1984). Bandwidth selection for nonparametric regression. The Annals of Statistics, 12, 1215-1230.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65-78.
Ruppert, D., S.J. Sheather and M.P. Wand (1995). An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association, 90, 1257-1270.
Ruppert, D. and M.P. Wand (1994). Multivariate locally weighted least squares regression. The Annals of Statistics, 22, 1346-1370.
Rousseeuw, P.J. and A.M. Leroy (1987). Robust Regression and Outlier Detection. New York: Wiley.
Shao, Q. and H. Yu (1996). Weak convergence for weighted empirical processes of dependent sequences. The Annals of Probability, 24, 2098-2127.
Sheather, S.J. and M.C. Jones (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53, 683-690.
Sperlich, S., D. Tjøstheim, and L. Yang (2000). Nonparametric estimation and testing of interaction in additive models. Econometric Theory.
Stanton, R. (1997). A nonparametric model of term structure dynamics and the market price of interest rate risk. Journal of Finance, 52, 1973-2002.
Sun, Z. (1984). Asymptotic unbiased and strong consistency for density function estimator. Acta Mathematica Sinica, 27, 769-782.
Tjøstheim, D. and B. Auestad (1994a). Nonparametric identification of nonlinear time series: Projections. Journal of the American Statistical Association, 89, 1398-1409.
Tjøstheim, D. and B. Auestad (1994b). Nonparametric identification of nonlinear time series: Selecting significant lags. Journal of the American Statistical Association, 89, 1410-1419.
van Dijk, D., T. Teräsvirta, and P.H. Franses (2002). Smooth transition autoregressive models - a survey of recent developments. Econometric Reviews, 21, 1-47.
Wand, M.P. and M.C. Jones (1995). Kernel Smoothing. London: Chapman and Hall.