Copyright © Ali Jenzarli, Ph.D. All Rights Reserved
Linear Regression Models

Simple Linear Regression
Simple linear regression is the study of the linear relationship between two random variables X and Y. We call X the independent or explanatory variable, and we call Y the dependent, predicted, or forecast variable. We assume that we can represent the relationship between population values of X and Y using the equation of the linear model Yi = β0 + β1Xi + εi. We call β0 the Y-intercept; it represents the expected value of Y that is independent of X, or the expected value of Y when X equals zero, if appropriate. We call β1 the slope; it represents the expected change in Y per unit change in X, i.e., the expected marginal change in Y with respect to X. Finally, we call εi the random error in Y for each observation i. Given a sample of X and Y values, we use the method of least squares to estimate sample values for β0 and β1, which we call b0 and b1, respectively. We represent the predicted value of Y using the prediction line equation, or simple linear regression equation, Ŷ = b0 + b1X.
We call b0 the sample Y-intercept; it represents the expected value of Y that is independent of X (or the expected value of Y when X = 0, if appropriate). We call b1 the sample slope; it represents the expected change in Y per unit change in X, i.e., the expected marginal change in Y with respect to X. Finally, we call Ŷ the predicted value of Y.

The Coefficient of Determination and the Correlation Coefficient
The coefficient of determination is the statistic r², and it measures the proportion of the linear variation in Y that is explained by X using the regression model. The correlation coefficient is the statistic r, and it measures the strength of the linear association between X and Y.
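As a sketch of these definitions, the least-squares estimates b0 and b1, together with r² and r, can be computed directly from their textbook formulas. The data below are invented purely for illustration:

```python
# Least-squares estimates for the prediction line Y-hat = b0 + b1*X,
# plus r^2 and r, computed from their textbook formulas.
# The x and y values below are made up for illustration only.

def simple_ols(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = SSxy / SSxx;  b0 = y_bar - b1 * x_bar
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # r^2 = SSR / SST: proportion of the linear variation in Y explained by X
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    r_sq = ssr / sst
    # r carries the sign of the slope
    r = (1 if b1 >= 0 else -1) * r_sq ** 0.5
    return b0, b1, r_sq, r

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r_sq, r = simple_ols(x, y)
```

For these values the fit is Ŷ ≈ 0.05 + 1.99X, with r² close to 1, i.e., nearly all of the linear variation in Y is explained by X.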
t Test for the Slope β1
H0: β1 = 0 (there is no linear relationship)
H1: β1 ≠ 0 (there is a linear relationship)

For α = .05, the p-value for the slope (the coefficient of the explanatory variable) should be less than .05 for the sample slope b1 to be statistically significantly different from zero, indicating the presence of a statistically significant linear relationship between X and Y.
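The test statistic behind that p-value is t = b1 / s_b1, where s_b1 = sqrt(MSE / SSxx) and MSE = SSE / (n − 2). A minimal sketch, using invented data and a two-sided critical value read from a standard t table:

```python
import math

# t test for H0: beta1 = 0 in simple linear regression.
# Data are illustrative; the critical value t_{.025, df=3} = 3.182
# comes from a standard t table (n = 5, so df = n - 2 = 3).

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)               # error variance estimate, df = n - 2
s_b1 = math.sqrt(mse / ss_xx)     # standard error of the slope
t_stat = b1 / s_b1

t_crit = 3.182                    # t_{.025} with 3 degrees of freedom
reject_h0 = abs(t_stat) > t_crit  # True -> slope significant at alpha = .05
```

Rejecting H0 here (|t| far exceeds the critical value) is equivalent to the slope's p-value falling below α = .05.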
Important Notice: The p-value for the slope and the Significance F are the same in simple linear regression, leading to the same conclusion. Acceptable values of α are less than 0.10, with preferred values less than or equal to 0.05.

Multiple Linear Regression
Multiple linear regression, or multiple regression, is the study of the linear relationship between more than two random variables X1, X2, …, Xk, and Y. We call the Xi, i = 1, 2, …, k, the independent or explanatory variables, and we call Y the dependent, predicted, or forecast variable. We assume that we can represent the relationship between population values of the Xi and Y using the equation of the linear model Yi = β0 + β1X1i + β2X2i + … + βkXki + εi. We call β0 the Y-intercept; it represents the expected value of Y that is independent of the Xi, or the expected value of Y when each Xi equals zero, if appropriate. We call βi the slope of Y with variable Xi, holding each Xj, j ≠ i, constant; it represents the expected change in Y per unit change in Xi, i.e., the expected marginal change in Y with respect to Xi, holding each Xj, j ≠ i, constant. Finally, we call εi the random error in Y for each observation i. Given a sample of X and Y values, we use the method of least squares to estimate sample values for β0 and the βi, which we call b0 and bi for all i = 1, …, k, respectively. We represent the predicted value of Y using the prediction equation, or multiple regression equation, Ŷ = b0 + b1X1 + … + bkXk.
We call b0 the sample Y-intercept; it represents the expected value of Y that is independent of all the Xi (i = 1, 2, …, k) in the model, or the expected value of Y when each Xi equals zero, if appropriate. We call bi the sample slope of Y with variable Xi, holding each Xj, j ≠ i, constant; it represents the expected change in Y per unit change in Xi, i.e., the expected marginal change in Y with respect to Xi, holding each Xj, j ≠ i, constant. Finally, we call Ŷ the predicted value of Y.

The Adjusted r²
The adjusted r² measures the proportion of the linear variation in Y that is explained by all the Xi (i = 1, 2, …, k) in the multiple-regression model, adjusted for the number of independent variables (the Xi) and the sample size.

The Coefficient of Partial Determination
The coefficient of partial determination measures the proportion of the linear variation in Y that is explained by a particular Xi, holding each Xj, j ≠ i, constant.
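A minimal sketch of these ideas for a two-predictor model, with invented data: the least-squares coefficients come from solving the normal equations (XᵀX)b = Xᵀy, the adjusted r² applies the (n − 1)/(n − k − 1) correction, and the coefficient of partial determination for X1 is computed as (SSE(X2) − SSE(X1, X2)) / SSE(X2):

```python
# Two-predictor multiple regression fitted by solving the normal equations,
# then R^2, adjusted R^2, and the coefficient of partial determination
# for X1 (holding X2 constant). All data are illustrative only.

def solve(a, v):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(v)
    m = [row[:] + [v[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [m[r][j] - f * m[col][j] for j in range(n + 1)]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """rows: list of predictor tuples; returns ([b0, b1, ...], SSE)."""
    X = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(X[i][u] * X[i][v] for i in range(len(X))) for v in range(p)]
           for u in range(p)]
    xty = [sum(X[i][u] * y[i] for i in range(len(X))) for u in range(p)]
    b = solve(xtx, xty)
    sse = sum((y[i] - sum(b[j] * X[i][j] for j in range(p))) ** 2
              for i in range(len(X)))
    return b, sse

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y  = [3.1, 2.9, 7.2, 6.8, 11.1, 10.8]

b, sse_full = ols(list(zip(x1, x2)), y)
n, k = len(y), 2
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)
r_sq = 1 - sse_full / sst
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

# Partial determination for X1: the share of the SSE left by X2 alone
# that is removed when X1 enters the model.
_, sse_x2_only = ols([(v,) for v in x2], y)
r_sq_y1_2 = (sse_x2_only - sse_full) / sse_x2_only
```

Note that adding a variable can never increase SSE, so the partial-determination ratio always falls between 0 and 1, and the adjusted r² is never larger than r² itself.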
Significance Testing
In a multiple-regression model with two or more independent variables, we recommend that the independent variables that fail the t test of significance for their slope coefficients be removed from the model, one independent variable at a time, and that the model be run again without them. We also recommend that the independent variable with the highest insignificant p-value be removed first and the model be run again without this variable. This process should be repeated until we are left with only those independent variables that pass the significance test. Recall that this t test of significance is the same as the slope test in a simple linear regression model (outlined above). Caution: Removing all independent variables that fail the t test after the first run is not advisable, because two or more of these variables might be collinear, or highly correlated, i.e., they explain the same variability in the dependent variable Y.

Interactions
Interaction terms, or cross-product terms, are introduced into a multiple-regression model when the effect of an independent variable Xi on the dependent variable Y changes according to the values of the other independent variables Xj, j ≠ i. In such cases we recommend running the model with all the relevant interaction terms, then removing the interaction terms that are not statistically significant, one term at a time, starting with the one that has the highest insignificant p-value, while keeping those that are statistically significant. The model should then be run again, followed by significance testing, to confirm the effects of all remaining interaction terms.
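A cross-product term is simply the element-wise product of the two predictor columns. The sketch below uses assumed (not fitted) coefficients to show why the interaction matters: once b3·X1X2 is in the model, the marginal effect of X1 on Ŷ is b1 + b3·X2, which changes with X2:

```python
# Building a cross-product (interaction) column X1*X2. The coefficients
# below are assumed for illustration, not estimated from data.

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [10.0, 20.0, 10.0, 20.0]
x1_x2 = [a * b for a, b in zip(x1, x2)]   # the interaction column

# Suppose a fitted model gave Y-hat = b0 + b1*X1 + b2*X2 + b3*X1*X2:
b0, b1, b2, b3 = 1.0, 2.0, 0.5, 0.3

def predict(v1, v2):
    return b0 + b1 * v1 + b2 * v2 + b3 * v1 * v2

# Marginal effect of X1 is b1 + b3*X2 -- it depends on the value of X2:
slope_at_x2_10 = predict(2, 10) - predict(1, 10)   # b1 + b3*10 = 5.0
slope_at_x2_20 = predict(2, 20) - predict(1, 20)   # b1 + b3*20 = 8.0
```

Without the interaction term (b3 = 0), the marginal effect of X1 would be the constant b1 regardless of X2.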