Analysis of Variance Approach to Regression Analysis
Rationale
The Analysis of Variance (ANOVA) is a statistical approach based on partitioning total observed variation into several components, with the aim of explaining the sources of that variation.
Total observed variation is often measured by the sum of the squared deviations of each observation from the mean.
In regression analysis, we presume that the observations on the response variable can be expressed as a (linear) function of the independent variable in the form
$$y_i = \beta_0 + \beta_1 x_i + e_i$$
Based on sample data, and assuming that such a relation is true, the line that best fits the observed values is obtained as
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
After fitting this regression line, we gather evidence on whether such a model really holds in describing the relationship.
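As a quick illustration (not part of the original notes; the data values below are hypothetical), the least squares estimates and fitted values can be computed directly:

```python
# Minimal sketch of fitting the least squares line y_hat = b0 + b1*x.
# The x and y arrays are made-up example data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares estimates of slope and intercept
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x      # fitted values
residuals = y - y_hat    # e_i = y_i - y_hat_i
print(b0, b1)
```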
For a given observation, the deviation of $y_i$ from the mean $\bar{y}$ can be decomposed around the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$:
$$(y_i - \bar{y}) = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$$
[Figure: the observed value $y_i$, the fitted value $\hat{y}_i$, and the mean $\bar{y}$ plotted against $x$, showing this decomposition of the total deviation.]
Total Deviation
$$(y_i - \bar{y}) = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$$
$(y_i - \bar{y})$ : total deviation
$(y_i - \hat{y}_i)$ : deviation around the fitted regression line
$(\hat{y}_i - \bar{y})$ : deviation of the fitted regression value around the mean
Sum of Squares
Squaring both sides and summing over all observations gives
$$\sum (y_i - \bar{y})^2 = \sum [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2 + 2 \sum (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$$
The cross-product term equals 0, so
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
$\sum (y_i - \bar{y})^2$ : Total Sum of Squares (TSS)
$\sum (\hat{y}_i - \bar{y})^2$ : Sum of Squares due to the Regression of y on x (SSR)
$\sum (y_i - \hat{y}_i)^2$ : Sum of Squares Error (SSE)
TSS = SSR + SSE
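As a minimal numerical check (again with hypothetical data, not from the original notes), the decomposition can be verified directly:

```python
# Verify numerically that TSS = SSR + SSE for a fitted simple linear regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

TSS = np.sum((y - y.mean()) ** 2)      # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to regression
SSE = np.sum((y - y_hat) ** 2)         # sum of squares error

print(np.isclose(TSS, SSR + SSE))      # True: the cross-product term vanishes
```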
Degrees of Freedom
The total degrees of freedom (associated with TSS) is n-1. One degree of freedom is lost because:
the deviations $(y_i - \bar{y})$ are subject to one constraint (they sum to zero); or, equivalently, the sample mean is used to estimate the population mean.
Degrees of freedom due to Error: n-2. Two degrees of freedom are lost because we are estimating two parameters, $\beta_0$ and $\beta_1$, in obtaining the fitted values $\hat{y}_i$.
Degrees of freedom due to Regression: 1.
Although there are n deviations $(\hat{y}_i - \bar{y})$, all fitted values $\hat{y}_i$ are calculated from the same regression line. Two degrees of freedom are associated with the regression line (one for each estimated parameter), but one is lost because the deviations $(\hat{y}_i - \bar{y})$ are subject to one constraint: they sum to zero.
Thus,
df Total = df Regression + df Error
Mean Squares
In a general ANOVA, the mean squares are obtained by dividing each SS by its corresponding df. That is,
MSTot = SSTot/(n-1)
MSR = SSR/1; MSE = SSE/(n-2)
Note: MSTot ≠ MSR + MSE; mean squares are not additive.
ANOVA Table
Results of the Analysis of Variance are summarized in an ANOVA table:
Source of Variation    df      SS      MS
Regression              1      SSR     MSR
Error                  n-2     SSE     MSE
Total                  n-1     TSS
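A sketch of how the same quantities fill out the ANOVA table numerically, using the same hypothetical data as above:

```python
# Assemble the ANOVA table (source, df, SS, MS) for simple linear regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(y)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
TSS = SSR + SSE

MSR = SSR / 1          # df Regression = 1
MSE = SSE / (n - 2)    # df Error = n - 2

print(f"{'Source':<12}{'df':>6}{'SS':>12}{'MS':>12}")
print(f"{'Regression':<12}{1:>6}{SSR:>12.4f}{MSR:>12.4f}")
print(f"{'Error':<12}{n - 2:>6}{SSE:>12.4f}{MSE:>12.4f}")
print(f"{'Total':<12}{n - 1:>6}{TSS:>12.4f}")
```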
Expected Mean Squares (EMS)
The EMS are useful quantities that:
Tell us what parametric function is being estimated by each MS [Method of Moments Estimator]
In some instances, this suggests how a test statistic can be defined to test specific hypotheses.
For the error mean square,
$$E[MSE] = \sigma^2$$
The mean of the sampling distribution of MSE is $\sigma^2$ whether or not X and Y are linearly related (i.e., whether or not $\beta_1 = 0$). This follows since
$$SSE/\sigma^2 \sim \chi^2(n-2), \quad \text{so} \quad E[SSE/\sigma^2] = n-2 \quad \text{and} \quad E[MSE] = E\left[\frac{SSE}{n-2}\right] = \sigma^2.$$
For the regression mean square,
$$E[MSR] = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2$$
The mean of the sampling distribution of MSR is also $\sigma^2$ when $\beta_1 = 0$; in this case, MSR and MSE will tend to be of the same magnitude. When $\beta_1 \neq 0$, MSR will tend to exceed MSE. Thus, a comparison of MSR and MSE may be used to determine whether or not $\beta_1 = 0$.
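A small Monte Carlo sketch (the simulation setup below is assumed, not taken from the notes) illustrating this behavior of MSR and MSE:

```python
# When beta1 = 0, MSR and MSE both average about sigma^2; when beta1 != 0,
# MSR exceeds MSE on average, by roughly beta1^2 * sum((x - xbar)^2).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
sigma = 2.0

def mean_squares(beta1, reps=5000):
    msr, mse = [], []
    for _ in range(reps):
        y = 1.0 + beta1 * x + rng.normal(0, sigma, size=x.size)
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b0 = y.mean() - b1 * x.mean()
        y_hat = b0 + b1 * x
        msr.append(np.sum((y_hat - y.mean()) ** 2) / 1)
        mse.append(np.sum((y - y_hat) ** 2) / (x.size - 2))
    return np.mean(msr), np.mean(mse)

print(mean_squares(0.0))   # both averages close to sigma^2 = 4
print(mean_squares(0.5))   # average MSR well above average MSE
```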
Test of Hypothesis: H0: $\beta_1 = 0$ vs H1: $\beta_1 \neq 0$
From the EMS, it appears logical that this hypothesis can be tested by comparing MSR and MSE.
From statistical theory, and assuming normality of the error terms (Cochran’s Theorem):
Under H0 and the normality assumption, MSR and MSE are independent, with
$$SSR/\sigma^2 \sim \chi^2(1) \quad \text{and} \quad SSE/\sigma^2 \sim \chi^2(n-2).$$
Thus, a logical test statistic (the generalized likelihood ratio test, GLRT) is
$$F_c = \frac{MSR}{MSE} \sim F(1,\ n-2) \quad \text{under } H_0.$$
Reject H0 for large values of $F_c$, i.e., if $F_c > F_{\alpha}(1,\ n-2)$.
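A sketch of carrying out this F test on the hypothetical data used above, with scipy.stats providing the reference F(1, n-2) distribution:

```python
# F test of H0: beta1 = 0 in simple linear regression.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(y)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

MSR = np.sum((y_hat - y.mean()) ** 2) / 1
MSE = np.sum((y - y_hat) ** 2) / (n - 2)

Fc = MSR / MSE
p_value = stats.f.sf(Fc, 1, n - 2)       # P(F(1, n-2) > Fc)
critical = stats.f.ppf(0.95, 1, n - 2)   # F_{0.05}(1, n-2)

print(Fc, p_value, critical)             # reject H0 if Fc > critical
```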
General Linear Test Approach
This is another approach to testing hypotheses concerning the regression parameters (or functions of such parameters).
First, fit the Full Model. In the SLR case, $y_i = \beta_0 + \beta_1 x_i + e_i$.
Compute
$$SSE(F) = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - [\hat{\beta}_0 + \hat{\beta}_1 x_i])^2 = SSE$$
Next, fit the Reduced Model under H0 (i.e., assuming H0 is true). In the SLR case, H0: $\beta_1 = 0$, so we fit the (reduced) model $y_i = \beta_0 + e_i$.
In this case, the value of $\beta_0$ that minimizes $\sum e_i^2 = \sum (y_i - \beta_0)^2$ is $\hat{\beta}_0 = \bar{y}$.
Compute the SSE for the reduced model:
$$SSE(R) = \sum (y_i - \hat{\beta}_0)^2 = \sum (y_i - \bar{y})^2 = SST$$
Note that, in general, $SSE(F) \leq SSE(R)$: the more parameters employed in fitting the model, the better the fit.
Test statistic:
$$F^* = \frac{[SSE(R) - SSE(F)]\,/\,(df_R - df_F)}{SSE(F)\,/\,df_F} \sim F(df_R - df_F,\ df_F)$$
Note that in the case of the SLR, testing H0: $\beta_1 = 0$,
$$F^* = \frac{(SST - SSE)\,/\,[(n-1) - (n-2)]}{SSE\,/\,(n-2)} = \frac{SSR/1}{MSE} = \frac{MSR}{MSE}$$
Thus, the two tests are equivalent.
This approach can be extended to more complex tests.
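A sketch of the general linear test for H0: $\beta_1 = 0$ on the same hypothetical data, confirming that $F^*$ coincides with MSR/MSE:

```python
# General linear test: compare SSE of the full model with SSE of the
# intercept-only reduced model; F* equals MSR/MSE from the ANOVA approach.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(y)

# Full model: y = beta0 + beta1*x + e
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
SSE_F = np.sum((y - (b0 + b1 * x)) ** 2)
df_F = n - 2

# Reduced model under H0: y = beta0 + e, so beta0_hat = y_bar
SSE_R = np.sum((y - y.mean()) ** 2)   # equals SST
df_R = n - 1

F_star = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
p_value = stats.f.sf(F_star, df_R - df_F, df_F)

print(F_star, p_value)   # identical to MSR/MSE and its p-value
```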