Non parametrical Estimation of the Regression used in Economic Analyses

Prof. Constantin ANGHELACHE PhD, „Artifex” University of Bucharest; Academy of Economic Studies, Bucharest
Prof. Gabriela Victoria ANGHELACHE PhD, Academy of Economic Studies, Bucharest
Prof. Liviu BEGU PhD, Academy of Economic Studies, Bucharest
Georgeta BARDAŞU, PhD Student, Academy of Economic Studies, Bucharest

Abstract

Non-parametric methods are useful, but they raise some problems. In practice, they require a large number of observations and can be applied only to a relatively small number of explanatory variables. Moreover, the result is sensitive to the choice of the smoothing parameter and, to a lesser extent, to the choice of the kernel. They also pose a presentation problem: the results cannot be summarized in a compact formula and can only be described by graphs. A non-parametric analysis does not allow extrapolation outside the range of the observations; from an econometric point of view, this is an advantage.

Key words: non-parametric methods, variables, regression function, appraisal

JEL Classification: C01, C51

General aspects

Unlike other fields, economic theory rarely specifies functional forms; usually it specifies only the list of variables relevant for explaining a phenomenon. The form of the relation results, to a great extent, from an empirical search for a "good" model that "works well". A first level of analysis consists of writing down a model (linear, log-linear, non-linear, etc.) and estimating it without taking its approximate nature into account. A second approach consists of specifying a parametric model whose possible misspecification is taken into account explicitly. This leads, for instance, to correcting the expressions of the variances or to model selection procedures that allow for misspecification.


Revista Română de Statistică Trim. I/2013 - Supliment

In practice, all these requirements can be met by adopting a non-parametric approach to estimating the regression, in which the data themselves select the form of the function. Various methods for estimating a non-parametric regression have been developed and are now in common use. Here we consider the kernel method, which is simple, although in certain situations it is dominated by other approaches.

Non-parametric methods are useful, but they raise certain problems. In practice, they require a large number of observations and can be applied only to a relatively small number of explanatory variables. Moreover, the result is sensitive to the choice of the smoothing parameter and, to a lesser extent, to the kernel. The results are also hard to present, since they cannot be summarized by a compact formula and can only be described by graphs. A non-parametric analysis does not allow extrapolation outside the observation domain but, from the econometric point of view, this is an advantage. To overcome some of these difficulties, semi-parametric methods have been developed, whose purpose is to estimate only certain characteristics of the regression or to constrain the regression function to satisfy certain conditions. The dimension of the problem is thus reduced and the results are easier to obtain. At the same time, structural constraints can also be introduced into the model. We first consider the standard kernel estimation of the regression and then discuss the estimation of specific characteristics of the regression and estimation under constraints.
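The kernel regression estimator itself is not written out in this section. A minimal sketch, assuming the usual Nadaraya-Watson (local-constant) form $g_n(z) = \sum_i K((z_i - z)/h_n)\, y_i \,/\, \sum_i K((z_i - z)/h_n)$ with the Epanechnikov kernel and a single explanatory variable, is given below in Python with NumPy. The function names and the simulated design are hypothetical and serve only to illustrate the technique; note that the estimate is left undefined where no observation lies within one bandwidth, which is the "no extrapolation" property mentioned above.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u**2) for |u| <= 1, and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def nadaraya_watson(z_eval, z, y, h):
    """Kernel estimate g_n(z0) = sum_i K((z_i - z0)/h) y_i / sum_i K((z_i - z0)/h).

    Returns NaN at points where no observation lies within one bandwidth,
    i.e. the estimator does not extrapolate outside the data.
    """
    z_eval = np.atleast_1d(np.asarray(z_eval, dtype=float))
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = epanechnikov((z[None, :] - z_eval[:, None]) / h)  # shape (m, n)
    num = weights @ y
    den = weights.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)


# Hypothetical simulated design: g(z) = sin(2*pi*z), z uniform on [0, 1].
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * z) + rng.normal(0.0, 0.2, z.size)
g_hat = nadaraya_watson([0.25, 0.5, 3.0], z, y, h=0.05)
print(g_hat)  # first two values near 1 and 0; the last is nan (3.0 lies outside the data)
```

The bandwidth `h` plays the role of $h_n$; changing it changes the estimate far more than replacing the Epanechnikov kernel by another second-order kernel would.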

The bandwidths for the variables

When a separate bandwidth $h_{jn}$ is used for each explanatory variable $j$, the previous expression is modified as follows: in the variance terms, $h_n^q$ becomes $\prod_{j=1}^{q} h_{jn}$.
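With one bandwidth per explanatory variable, the kernel weight of an observation becomes a product of univariate kernels. The sketch below assumes the Epanechnikov kernel and a local-constant estimator (both illustrative choices, with hypothetical names and data); it shows why separate bandwidths matter when the regressors are measured on very different scales, where a common $h_n$ would over-smooth one variable or under-smooth the other.

```python
import numpy as np


def epanechnikov(u):
    """Univariate Epanechnikov kernel, applied elementwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def product_kernel_regression(z0, z, y, h):
    """Local-constant regression at one point z0 of R^q, with one bandwidth per variable.

    z: (n, q) array of explanatory variables, y: (n,) responses,
    h: (q,) bandwidths h_jn. Observation i gets weight prod_j K((z_ij - z0_j) / h_j).
    In the matching density estimate sum_i weight_i / (n * prod_j h_j), the product
    prod_j h_j takes the place of h_n**q; it cancels in the regression ratio below.
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    u = (z - np.asarray(z0, dtype=float)) / np.asarray(h, dtype=float)  # (n, q)
    weights = np.prod(epanechnikov(u), axis=1)
    if weights.sum() == 0.0:
        return float("nan")  # no observation near z0: no extrapolation
    return float(weights @ y / weights.sum())


# Hypothetical example: two regressors on very different scales.
rng = np.random.default_rng(1)
n = 5000
z = np.column_stack([rng.uniform(0, 1, n), rng.uniform(0, 100, n)])
y = z[:, 0] + 0.01 * z[:, 1] + rng.normal(0, 0.1, n)
estimate = product_kernel_regression([0.5, 50.0], z, y, h=[0.1, 10.0])
print(estimate)  # should be close to the true value 0.5 + 0.5 = 1.0
```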

In addition, the same argument as the one applied to the density can be used to choose the bandwidth and the kernel. We can use the expression of the asymptotic mean integrated squared error (AMISE) to derive the optimal bandwidth at a fixed z. This calculation requires g and f to be known. We can proceed by first estimating f and g with initial values of the bandwidth and then using these estimates to improve the bandwidth. This procedure is rather delicate, because it requires estimating derivatives, whose estimators converge slowly (and therefore need a large sample), as well as the conditional variance. This plug-in method has also been extended to the selection of a specific bandwidth for each explanatory variable. After replacing the bandwidth by its optimal value, we can look for an optimal kernel, which is the Epanechnikov kernel, as in the case of density estimation.
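To make the trade-off behind the plug-in rule concrete, here is the standard pointwise expression for the local-constant kernel estimator in the simplest case of one explanatory variable (q = 1) and a second-order kernel. It is a textbook result added for illustration, not reproduced from the paper; the notations $\mu_2(K) = \int u^2 K(u)\,du$, $R(K) = \int K(u)^2\,du$ and $\sigma^2(z) = V(\tilde y \mid \tilde z = z)$ are introduced here.

$$\mathrm{AMSE}(h_n; z) = h_n^4\,\mu_2(K)^2\,B(z)^2 + \frac{R(K)\,\sigma^2(z)}{n\,h_n\,f(z)}, \qquad B(z) = \tfrac{1}{2}\,g''(z) + \frac{g'(z)\,f'(z)}{f(z)},$$

and setting the derivative with respect to $h_n$ to zero gives

$$h_n^*(z) = \left[\frac{R(K)\,\sigma^2(z)}{4\,\mu_2(K)^2\,B(z)^2\,f(z)}\right]^{1/5} n^{-1/5}.$$

For the Epanechnikov kernel, $\mu_2(K) = 1/5$ and $R(K) = 3/5$. The optimal value involves $g'$, $g''$, $f$, $f'$ and $\sigma^2$, which is why the plug-in procedure must estimate derivatives and the conditional variance. With q explanatory variables, the same bias-variance balance gives $h_n \propto n^{-1/(q+4)}$.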


An alternative approach to selecting the optimal bandwidth is the so-called cross-validation method. The cross-validation criterion does not depend on the unknown functions g and f and can be minimized numerically by letting $h_n$ vary over a given interval. The AMISE calculation relies on two conditions: the existence of second derivatives of g and of the density of the observations. The distance between $g$ and $g_n$, measured by the AMISE, can be reduced by assuming differentiability of a higher order s and by selecting $K$ so that:

$$\int K(u)\,du = 1, \qquad \int u^j K(u)\,du = 0 \ \text{ for } 1 \le j < r, \qquad \int u^r K(u)\,du \neq 0.$$

The r defined in this way (the order of the first non-zero moment) is called the order of the kernel K. Note that when K is a symmetric probability density (K non-negative), r equals 2. The squared-bias term is then proportional to $h_n^{2\min(s,r)}$, where s is the order of differentiability and r is the order of the kernel. The disadvantage of kernels of order higher than 2 is that they are no longer densities, so the estimated densities can be negative, at least in small samples. When $h_n$ is set to its optimal value, proportional to $n^{-1/(2\min(s,r)+q)}$, the convergence rate $\sqrt{n h_n^q}$ becomes:

$$\sqrt{n h_n^q} \propto n^{\min(s,r)/(2\min(s,r)+q)}.$$

This is the optimal non-parametric convergence rate in dimension q, which can be compared with the usual parametric rate $\sqrt{n}$. We can check that the gap between the two rates indeed widens as q increases. To use this result in practice, we must estimate the density and the conditional variance: the density is estimated with a kernel, and so is the conditional variance.
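The cross-validation criterion is not written out in the text. A common form, assumed here, is the leave-one-out criterion $CV(h_n) = \frac{1}{n}\sum_i \big(y_i - g_{n,-i}(z_i)\big)^2$, where $g_{n,-i}$ is the kernel estimate computed without observation i. It uses only the data, so it can be minimized by scanning $h_n$ over an interval. The Python sketch below (NumPy, Epanechnikov kernel, hypothetical function names and simulated data) does exactly that over a grid of bandwidths.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u**2) for |u| <= 1, and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def loo_cv_score(h, z, y):
    """Leave-one-out criterion CV(h) = (1/n) * sum_i (y_i - g_{n,-i}(z_i))**2.

    g_{n,-i} is the kernel regression estimate computed without observation i.
    Returns +inf if some observation has no neighbour within h (estimate undefined).
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = epanechnikov((z[:, None] - z[None, :]) / h)  # (n, n) kernel weights
    np.fill_diagonal(weights, 0.0)                          # leave observation i out
    den = weights.sum(axis=1)
    if np.any(den == 0.0):
        return np.inf
    g_loo = weights @ y / den
    return float(np.mean((y - g_loo) ** 2))


def select_bandwidth(z, y, grid):
    """Scan a grid of bandwidths and return the one minimizing the CV criterion."""
    scores = np.array([loo_cv_score(h, z, y) for h in grid])
    return float(grid[np.argmin(scores)]), scores


# Hypothetical simulated data: g(z) = sin(2*pi*z), z uniform on [0, 1].
rng = np.random.default_rng(2)
z = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * z) + rng.normal(0.0, 0.3, z.size)
grid = np.linspace(0.01, 0.5, 50)
h_cv, scores = select_bandwidth(z, y, grid)
print(h_cv)
```

The full n-by-n weight matrix is adequate for a few hundred observations; larger samples would call for binning or a neighbour search.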

The estimation of a transformation of the regression function

Instead of estimating the regression function itself, we can analyze a transformation of this function. The choice of this transformation is grounded in the economic analysis, which defines the parameters of interest. Obviously, many transformations can be considered, but we shall focus on a specific class characterized by the relation:

$$\lambda = \int w(z)\, g(z)\, dz.$$

In this formula, $g(z) = E(\tilde y \mid \tilde z = z)$ and $w(z)$ is a weight function, either scalar or vector-valued, which satisfies $w(z) = 0$ if $f_{marg}(z) = 0$. This is natural, since $g(z)$ is defined only if $f_{marg}(z) > 0$. The parameter of interest λ is a scalar or a vector.

This class of transformations is justified by the properties of the resulting estimator of λ and by its relevance to many problems of applied econometrics, which are special cases of this analysis. Before going into details, note that this transformation does not impose any over-identifying restrictions on the distribution of the variables.

We first consider the estimation of the mean of the regression derivatives. We have seen that the parametric estimation of a misspecified regression does not allow us to consistently estimate the derivatives of this function at a given point. In many econometric problems, these derivatives are the parameters of interest. Their non-parametric estimation is possible, but its rate of convergence is very slow, so it requires a large sample. Nevertheless, in many applications it is enough to estimate the mean of the regression derivatives, namely:

$$\lambda = \int \partial^{\alpha} g(z)\, v(z)\, dz,$$

where α is a multi-index of differentiation and $\partial^{\alpha}$ is the derivative defined by this multi-index. The function v(z) is a density on the explanatory variables, which can be equal to $f_{marg}(z)$, the density of the explanatory variables actually observed. If v and its derivatives vanish at the boundary of the support, integration by parts gives $\lambda = (-1)^{|\alpha|} \int g(z)\, \partial^{\alpha} v(z)\, dz$, so this parameter belongs to the class defined above, with $w = (-1)^{|\alpha|}\, \partial^{\alpha} v$.

We now analyze the sub-additivity test. To illustrate this situation, let us assume that C is a cost function, which associates an expected cost with the quantities z of the different products. Economic theory is interested in the sub-additivity of C, namely:

$$C\Big(\sum_{j=1}^{p} z_j\Big) \le \sum_{j=1}^{p} C(z_j),$$

which means that the cost of a single company producing $\sum_{j=1}^{p} z_j$ does not exceed the total cost of several companies each producing $z_j$. This property must hold for every p and every sequence $(z_1, \dots, z_p)$. It is easy to show that it is equivalent to the following property. Let $\varphi$ be the density of $(z_1, \dots, z_p)$, $\tilde\varphi$ the density of the sum $z_1 + \dots + z_p$ and $\varphi_j$ the density of $z_j$. Then sub-additivity is equivalent to the fact that, for every $\varphi$, we have:

$$\int C(z)\,\tilde\varphi(z)\,dz \le \sum_{j=1}^{p} \int C(z)\,\varphi_j(z)\,dz.$$

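The converse is obtained by taking a distribution of $(z_1, \dots, z_p)$ concentrated at a single point. We now turn to the test itself. The previous relation suggests defining the parameter:

$$\lambda = \int C(z)\Big[\sum_{j=1}^{p} \varphi_j(z) - \tilde\varphi(z)\Big]dz,$$

whose sign has to be tested: sub-additivity implies $\lambda \ge 0$ for every $\varphi$. This parameter belongs to the class above, with $g = C$ and $w = \sum_{j} \varphi_j - \tilde\varphi$. The parameter λ can be estimated in two ways. The first consists of estimating g and then computing the integral. The second avoids estimating g and relies on the form of the weight function: since $w(z) = 0$ wherever $f_{marg}(z) = 0$,

$$\lambda = \int w(z)\,g(z)\,dz = E\left[\tilde y\,\frac{w(\tilde z)}{f_{marg}(\tilde z)}\right],$$

so that λ can be estimated by the empirical mean $\frac{1}{n}\sum_{i=1}^{n} y_i\, w(z_i)/f_{marg}(z_i)$, provided $f_{marg}$ is known.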

This condition is seldom satisfied. We can then replace $f_{marg}$ with a parametric or non-parametric estimate. So far we have implicitly assumed that w is given. In practice, w can be partially or totally unknown (for instance, when it is a function of $f_{marg}$) and must then be replaced by an estimate. A trimming procedure is sometimes introduced, which eliminates the observations located near the boundary of the support of the distribution of the explanatory variables. The trimming can be incorporated into the function w as a multiplicative indicator function.
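The two routes for estimating $\lambda = \int w(z)\,g(z)\,dz$ can be compared on simulated data. The sketch below is illustrative only: it assumes one explanatory variable, the Epanechnikov kernel, and a hypothetical weight $w(z) = \mathbf{1}\{0.1 \le z \le 0.9\}$ that is itself a trimming indicator. The first function estimates g by kernel regression and integrates numerically (trapezoidal rule); the second forms the empirical mean of $y_i\,w(z_i)/f_n(z_i)$, with $f_{marg}$ replaced by a kernel density estimate. All names and data are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u**2) for |u| <= 1, and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kernel_weights(z_eval, z, h):
    """Matrix of K((z_i - z0) / h): evaluation points z0 in rows, observations z_i in columns."""
    z_eval = np.atleast_1d(np.asarray(z_eval, dtype=float))
    return epanechnikov((np.asarray(z, dtype=float)[None, :] - z_eval[:, None]) / h)


def lambda_plug_in(z, y, w, h, grid):
    """First approach: estimate g by kernel regression, then integrate w * g_n over a grid."""
    k = kernel_weights(grid, z, h)
    with np.errstate(invalid="ignore", divide="ignore"):
        g_hat = k @ np.asarray(y, dtype=float) / k.sum(axis=1)
    wg = w(grid)
    integrand = np.where(wg != 0.0, wg * g_hat, 0.0)  # w = 0 outside the trimmed region
    # trapezoidal rule on the grid
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))


def lambda_weighted_mean(z, y, w, h):
    """Second approach: lambda = E[y w(z) / f_marg(z)], with f_marg replaced by a kernel estimate."""
    z = np.asarray(z, dtype=float)
    f_hat = kernel_weights(z, z, h).sum(axis=1) / (z.size * h)  # > 0: each z_i counts itself
    return float(np.mean(np.asarray(y, dtype=float) * w(z) / f_hat))


def w_trimmed(z, lower=0.1, upper=0.9):
    """Hypothetical weight: w(z) = 1 on [lower, upper], 0 elsewhere (a trimming indicator)."""
    z = np.asarray(z, dtype=float)
    return ((z >= lower) & (z <= upper)).astype(float)


# Hypothetical data: g(z) = 2z, z uniform on [0, 1], so lambda = int_{0.1}^{0.9} 2z dz = 0.8.
rng = np.random.default_rng(3)
z = rng.uniform(0.0, 1.0, 1500)
y = 2.0 * z + rng.normal(0.0, 0.5, z.size)
grid = np.linspace(0.0, 1.0, 401)
lam_1 = lambda_plug_in(z, y, w_trimmed, h=0.05, grid=grid)
lam_2 = lambda_weighted_mean(z, y, w_trimmed, h=0.05)
print(lam_1, lam_2)  # both should be close to 0.8
```

The trimming matters for the second estimator: near the ends of the support the kernel density estimate is biased downward (roughly half the true density at the boundary itself), so untrimmed boundary observations would be overweighted. With h = 0.05 and w restricted to [0.1, 0.9], every kernel window used in the weighted mean stays inside the support.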

The main asymptotic result is that $\hat\lambda_n$ converges to λ at the parametric rate $\sqrt{n}$. Indeed, $\sqrt{n}(\hat\lambda_n - \lambda)$ is asymptotically normal with zero mean, under regularity conditions and provided that the bandwidths have an adequate asymptotic behavior. To limit dimensionality problems, or to impose certain restrictions originating in economic theory, we often assume that the conditional expectation g(z), which is a function of q variables, in fact depends only on functions of a reduced number of variables and, possibly, on certain parameters. Two points of view can then be taken: either we assume that g actually has this restricted form, or we look for the best approximation of g by an element satisfying the restrictions.


