Regression and Correlation Analysis
Violeta Bartolome
Senior Associate Scientist, PBGB-CRIL
[email protected]
Correlation Analysis
• A measure of association between two numerical variables.
• Example (positive correlation)
  o As soil fertility increases, rice grain yield also increases.
IRRI-PBGB-CRIL
Example
For seven random samples, the nitrogen content of the soil and the grain yield were recorded.

Nitrogen Content (%)   Grain Yield (kg/ha)
0.12                   1652
0.14                   2056
0.15                   2598
0.16                   2734
0.19                   3238
0.22                   4824
0.23                   4858
How would you describe the graph?
[Figure: Grain Yield of Rice at Different Levels of Soil Nitrogen Content; x-axis: Nitrogen Content (%), y-axis: Grain Yield (kg/ha)]
How “strong” is the linear relationship?
Measuring the Relationship
Pearson’s sample correlation coefficient, r, measures the direction and the strength of the linear association between two paired numerical variables.
Direction of Association
[Figure panels: Positive Correlation; Negative Correlation]
Strength of Linear Association
r value   Interpretation
 1        perfect positive linear relationship
 0        no linear relationship
-1        perfect negative linear relationship
Strength of Linear Association
[Figure panels: Perfect Linear Positive Correlation; No Linear Correlation]
Other Strengths of Association
r value   Interpretation
0.75      strong association
0.5       moderate association
0.25      weak association
Other Strengths of Association
[Figure panels: Strong Positive Linear Correlation; Moderate Negative Linear Correlation]
Formula
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
where
n = number of paired items
x_i = input variable; x̄ (x-bar) = mean of the x’s; s_x = standard deviation of the x’s
y_i = output variable; ȳ (y-bar) = mean of the y’s; s_y = standard deviation of the y’s
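As a numerical check, the formula can be coded directly and applied to the soil nitrogen data from the example. This is a minimal Python sketch; the function name `pearson_r` is our own, and it uses sample standard deviations (divisor n − 1), which is the convention that pairs with the 1/(n − 1) factor in the formula.

```python
import math

def pearson_r(x, y):
    """Pearson's sample correlation coefficient:
    r = 1/(n-1) * sum(((x_i - x_bar)/s_x) * ((y_i - y_bar)/s_y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 items")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (divisor n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Sum of products of the standardized values
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return total / (n - 1)

# Soil nitrogen example
nitrogen = [0.12, 0.14, 0.15, 0.16, 0.19, 0.22, 0.23]     # %
grain_yield = [1652, 2056, 2598, 2734, 3238, 4824, 4858]  # kg/ha

r = pearson_r(nitrogen, grain_yield)
print(round(r, 2))  # 0.99
```

The unrounded value is about 0.985, which rounds to the 0.99 reported in the correlation output for these data.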
Correlation Coefficient (r)
r = 0 does not necessarily mean no relationship; the relationship may be nonlinear.
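A quick illustration of this point, using hypothetical data in which y = x² exactly on x values symmetric about zero: y is completely determined by x, yet the linear correlation is zero. The helper below computes r in the equivalent sums-of-products form.

```python
import math

def pearson_r(x, y):
    """Pearson's r as S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect, but nonlinear (parabolic), relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)  # 0.0
```

Plotting x against y would immediately show the parabola that r = 0 hides.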
Correlation Coefficient (r)
A significant r does not necessarily mean a strong linear relationship.
Correlation Coefficient
[Figure: scatter plot of Yield/plot against Tiller/plant; r = .25**, n = 234]
When the number of observations is large, a low r-value may still be significant.
Correlation Coefficient (r)
To conclude that two variables have a strong linear relationship, r should be both high and significant.
Correlation Coefficient
[Figure: scatter plot of Yield (t/ha) against No. of spikelets/panicle; r = .90**, n = 60]
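The contrast between the two scatter plots above (r = .25 with n = 234, and r = .90 with n = 60) can be made concrete with the usual t statistic for H0: ρ = 0, t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. This test statistic is standard but is not given in the slides. Python's standard library has no t distribution, so this sketch assumes a normal approximation for the p-value, which is only reasonable for large n such as these.

```python
import math
from statistics import NormalDist

def t_statistic_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def approx_p_value(t):
    """Two-sided p-value using the standard normal in place of the t distribution.
    Large-sample approximation only (assumption); use a t table for small n."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

results = {}
for label, r, n in [("tillers vs yield", 0.25, 234), ("spikelets vs yield", 0.90, 60)]:
    t = t_statistic_for_r(r, n)
    p = approx_p_value(t)
    # r^2 is the share of the variation in y explained by the linear relationship
    results[label] = (t, p, r ** 2)
    print(f"{label}: r = {r}, n = {n}, t = {t:.2f}, approx. p = {p:.1e}, r^2 = {r ** 2:.2f}")
```

Both correlations come out highly significant, but only the second is also high: r² is about 0.06 for tillers versus 0.81 for spikelets, so tillers explain only around 6% of the variation in yield.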
Test of Significance for r
Critical values of r by degrees of freedom (df = n - 2) and probability, p

df     p = 0.05   p = 0.01   p = 0.001
1      0.997      1.000      1.000
2      0.950      0.990      0.999
3      0.878      0.959      0.991
4      0.811      0.917      0.974
5      0.755      0.875      0.951
6      0.707      0.834      0.925
...
8      0.632      0.765      0.872
9      0.602      0.735      0.847
10     0.576      0.708      0.823
11     0.553      0.684      0.801
12     0.532      0.661      0.780
13     0.514      0.641      0.760
14     0.497      0.623      0.742
15     0.482      0.606      0.725
16     0.468      0.590      0.708
17     0.456      0.575      0.693
18     0.444      0.561      0.679
19     0.433      0.549      0.665
20     0.423      0.537      0.652

r is significant if its absolute value is greater than the tabular value.
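The table lookup can be written as a small function. This sketch copies only three rows of the table (df = 5, 10 and 15); the function name and the choice to return the smallest tabulated p at which r is significant are illustrative, not part of the slides.

```python
# Three rows of the critical-value table: df -> (r at p=0.05, r at p=0.01, r at p=0.001)
CRITICAL_R = {
    5: (0.755, 0.875, 0.951),
    10: (0.576, 0.708, 0.823),
    15: (0.482, 0.606, 0.725),
}
LEVELS = (0.05, 0.01, 0.001)

def smallest_significant_level(r, n):
    """Return the smallest tabulated p at which |r| exceeds the critical value,
    or None if r is not significant at p = 0.05. Uses df = n - 2."""
    df = n - 2
    if df not in CRITICAL_R:
        raise KeyError(f"df = {df} is not in this abbreviated table")
    best = None
    for level, critical in zip(LEVELS, CRITICAL_R[df]):
        # Critical values increase as p decreases, so keep the last level passed
        if abs(r) > critical:
            best = level
    return best

# Nitrogen example: r = 0.99 from n = 7 pairs, so df = 5
print(smallest_significant_level(0.99, 7))  # 0.001
```

With df = 5, r = 0.99 exceeds even the 0.001 critical value of 0.951, in line with the very small P-value in the correlation output.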
Correlation Analysis
PEARSON CORRELATION ANALYSIS
Nitrogen.Content vs. Grain.Yield: Coef = 0.99, P-value = 1e-04
Regression Analysis
Scientific Question
What is the growth rate of a rice plant? Growth rate can be defined as the change in height per unit of time.
Data Collection
DAS (days after seeding)   Height (cm)
0                          0
10                         12
30                         55
60                         80
90                         110
Statistical Questions
[Figure: Plant Height (cm) against Days after Seeding]
• What is the relationship between age and height? Linear
• How do I describe or quantify the relationship? Regression
• Is the association significant? Statistical test
Linear Regression
• A general method for estimating or describing the association between a continuous outcome (dependent) variable and one or more predictors in a single equation.
  o One predictor: simple linear regression
  o Multiple predictors: multiple linear regression
Statistical Model
Data = Model Fit + Residual
$$Y_i = \hat{Y}_i + \varepsilon_i$$
$$\hat{Y}_i = \beta_0 + \beta_1 X_i \qquad (\beta_0 = \text{intercept},\ \beta_1 = \text{slope})$$
$$Y_i = \mu + \alpha_i + \varepsilon_i$$
Least Squares Estimates
$$Y_i = \hat{Y}_i + \varepsilon_i, \qquad \hat{Y}_i = \beta_0 + \beta_1 X_i$$
To estimate the intercept and slope, minimize the residual sum of squares (RSS):
$$RSS = \sum \varepsilon_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \beta_0 - \beta_1 X_i)^2$$
$$\frac{\partial RSS}{\partial \beta_0} = -2 \sum (Y_i - \beta_0 - \beta_1 X_i) = 0 \implies \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
Substituting $\beta_0 = \bar{Y} - \beta_1 \bar{X}$:
$$RSS = \sum (Y_i - \bar{Y} + \beta_1 \bar{X} - \beta_1 X_i)^2$$
$$\frac{\partial RSS}{\partial \beta_1} = -2 \sum (X_i - \bar{X})(Y_i - \bar{Y} + \beta_1 \bar{X} - \beta_1 X_i) = 0 \implies \hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
We don’t have to do the estimation by hand. R/CropStat or other statistical packages can do the work for us.
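For readers who want to see the two estimating equations at work, here they are translated line for line into Python and applied to the growth data (DAS and height). The function name `least_squares` is our own; a statistical package would normally be used instead.

```python
def least_squares(x, y):
    """Simple linear regression by least squares:
    b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

das = [0, 10, 30, 60, 90]       # days after seeding
height = [0, 12, 55, 80, 110]   # plant height, cm
b0, b1 = least_squares(das, height)
print(f"intercept = {b0:.6f}, slope = {b1:.6f}")  # intercept = 4.912409, slope = 1.223358
```

These match the intercept and DAS estimates in the linear regression output.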
Linear Regression Analysis
Dependent Variable: Height

Analysis of Variance
SV          Df   Sum Square    Mean Square   F value     Pr(>F)
DAS          1   8201.389781   8201.389781   95.435198   0.002279
Residuals    3    257.810219     85.93674

Parameter Estimates
Parameter     Estimate   Std. Error   t value    Pr(>|t|)
(Intercept)   4.912409   6.311259     0.778356   0.493109
DAS           1.223358   0.125227     9.769094   0.002279

Model Summary
R Squared   Adj. R Squared
0.969523    0.959364
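Every number in the output above can be reproduced from the least squares fit. This sketch uses the textbook standard-error formulas SE(b1) = √(MSE/Sxx) and SE(b0) = √(MSE(1/n + x̄²/Sxx)), which are standard but not shown in the slides, and it only computes F and t values, not their p-values.

```python
import math

das = [0, 10, 30, 60, 90]        # days after seeding (X)
height = [0, 12, 55, 80, 110]    # plant height in cm (Y)
n = len(das)

# Least squares fit
x_bar, y_bar = sum(das) / n, sum(height) / n
sxx = sum((x - x_bar) ** 2 for x in das)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(das, height))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in das]

# Analysis of variance
ss_model = sum((f - y_bar) ** 2 for f in fitted)               # "DAS" row, df = 1
ss_resid = sum((y - f) ** 2 for y, f in zip(height, fitted))   # "Residuals" row, df = n - 2
ms_resid = ss_resid / (n - 2)
f_value = (ss_model / 1) / ms_resid

# Standard errors and t values of the estimates
se_b1 = math.sqrt(ms_resid / sxx)
se_b0 = math.sqrt(ms_resid * (1 / n + x_bar ** 2 / sxx))
t_b0, t_b1 = b0 / se_b0, b1 / se_b1

print(f"SS(DAS) = {ss_model:.6f}, SS(Residuals) = {ss_resid:.6f}, F = {f_value:.6f}")
print(f"SE = {se_b0:.6f}, {se_b1:.6f}; t = {t_b0:.6f}, {t_b1:.6f}")
```

With a single predictor, the t value of the slope squared equals the F value (9.769094² ≈ 95.435), which is why both rows share the p-value 0.002279.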
Example: Growth Rate Data
[Figure: Plant Height (cm) against Days after Seeding, with fitted line Height = 4.9 + 1.223 DAS; r = 0.98]
Intercept: the height at age 0 is 4.9 cm.
Slope: the height increases by 1.223 cm per day after seeding.
Prediction
[Figure: Plant Height (cm) against Days after Seeding, with fitted line Height = 4.9 + 1.223 DAS; r = 0.98]
Given the regression line, the predicted height at 40 days after seeding is 4.912 + 1.223 × 40 ≈ 53.8 cm.
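The prediction is just the fitted equation evaluated at DAS = 40. A minimal sketch, with the estimates hard-coded from the regression output; `predict_height` is a hypothetical helper name.

```python
B0, B1 = 4.912409, 1.223358  # intercept and slope from the regression output

def predict_height(das):
    """Predicted plant height (cm) at a given number of days after seeding."""
    return B0 + B1 * das

print(round(predict_height(40), 1))  # 53.8
```

Day 40 lies inside the observed range (0 to 90 DAS), so this is interpolation rather than extrapolation.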
Example: Growth Rate Data
In the Analysis of Variance table, the DAS row is SSM (df = 1) and the Residuals row is SSE (df = n - 2 = 3), so SST = 8201.389781 + 257.810219 = 8459.2.

Sums of Squares
$$\underbrace{\sum (Y_i - \bar{Y})^2}_{SST} = \sum (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \underbrace{\sum (\hat{Y}_i - \bar{Y})^2}_{SSM} + \underbrace{\sum (Y_i - \hat{Y}_i)^2}_{SSE}$$
(the cross-product term vanishes for the least squares fit)
Degrees of freedom: n - 1 = 1 + (n - 2)
$$R^2 = \frac{SSM}{SST} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$$
R² is the fraction of the variation in Y explained by X.
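The decomposition can be checked numerically on the growth data. The sketch also confirms a standard fact not stated on the slide: for simple linear regression, R² equals the square of Pearson's r between X and Y.

```python
import math

das = [0, 10, 30, 60, 90]
height = [0, 12, 55, 80, 110]
n = len(das)

x_bar, y_bar = sum(das) / n, sum(height) / n
sxx = sum((x - x_bar) ** 2 for x in das)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(das, height))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in das]

sst = sum((y - y_bar) ** 2 for y in height)               # total, df = n - 1
ssm = sum((f - y_bar) ** 2 for f in fitted)               # model, df = 1
sse = sum((y - f) ** 2 for y, f in zip(height, fitted))   # error, df = n - 2
r_squared = ssm / sst

# Pearson r between X and Y (S_yy is the same quantity as SST)
r = sxy / math.sqrt(sxx * sst)

print(f"SST = {sst:.1f}, SSM + SSE = {ssm + sse:.1f}")
print(f"R^2 = {r_squared:.4f}, r = {r:.2f}")
```

R² comes out at about 0.9695 and r at about 0.98, the value shown on the fitted-line plot.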
Linear Regression vs. ANOVA
ANOVA: dependent variable continuous; independent variable categorical
Linear regression: dependent variable continuous; independent variable continuous
Both are linear models: ANOVA and regression are the same thing!
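One way to see the equivalence is to code a categorical factor as a 0/1 indicator and fit an ordinary regression. The two-variety yields below are hypothetical and for illustration only. The regression intercept is the mean of the first group, the slope is the difference between group means, and the regression F equals the one-way ANOVA F.

```python
# Hypothetical yields (t/ha) of two varieties, for illustration only
variety_a = [4.1, 4.5, 4.3]
variety_b = [5.0, 5.4, 5.2]

def mean(values):
    return sum(values) / len(values)

# One-way ANOVA with two groups
groups = (variety_a, variety_b)
grand_mean = mean(variety_a + variety_b)
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
df_within = len(variety_a) + len(variety_b) - len(groups)
f_anova = (ss_between / (len(groups) - 1)) / (ss_within / df_within)

# The same data as a simple linear regression on a 0/1 indicator (0 = A, 1 = B)
x = [0] * len(variety_a) + [1] * len(variety_b)
y = variety_a + variety_b
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]
ss_model = sum((f - y_bar) ** 2 for f in fitted)
ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
f_regression = (ss_model / 1) / (ss_resid / (len(y) - 2))

print(f"b0 = {b0:.2f} (mean of A), b1 = {b1:.2f} (mean of B minus mean of A)")
print(f"F from ANOVA = {f_anova:.3f}, F from regression = {f_regression:.3f}")
```

With more than two groups the same idea extends to several indicator variables in a multiple linear regression.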
Misuse of Regression and Correlation Analysis
• Performing regression and correlation on spurious data could give significant results, but this is not a valid indication of a linear relationship.
Misuse of Regression and Correlation Analysis
• Extrapolation of results
  o The scope of the data is extended. Examples:
    - If the relationship of yield IR8 and stemborer
    - If the relationship between grain yield and protein content from varietal trials is assumed to be applicable to other types of experiments, such as fertilizer trials
  o The functional relationship is assumed to hold beyond the range of X values tested
Misuse of Regression and Correlation Analysis
[Figure: Grain Yield (kg/ha) against N-rate (kg/ha), axis 0 to 240 kg/ha; fitted line y = 23.751x + 4307.2, r = 0.987**]
There is no evidence that a linear relationship holds above N = 180 kg/ha.
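A simple safeguard against this misuse is to refuse predictions outside the range of X actually tested. In this sketch, `predict_yield` is a hypothetical helper, and the tested range of 0 to 180 kg N/ha is an assumption taken from the note that there is no evidence above N = 180 kg/ha.

```python
def predict_yield(n_rate, tested_min=0.0, tested_max=180.0):
    """Predicted grain yield (kg/ha) from the fitted line y = 23.751x + 4307.2.
    Raises ValueError outside the tested N-rate range (assumed 0-180 kg/ha)."""
    if not tested_min <= n_rate <= tested_max:
        raise ValueError(
            f"N-rate {n_rate} kg/ha is outside the tested range "
            f"{tested_min}-{tested_max} kg/ha; the linear fit may not hold there"
        )
    return 23.751 * n_rate + 4307.2

print(predict_yield(120))  # inside the tested range
try:
    predict_yield(240)     # extrapolation: rejected
except ValueError as err:
    print(err)
```

The check does not make the line correct inside the range; it only stops the fitted equation from being used where the data give no support.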
Coefficient of Determination (R²)
• Percentage of the total variation that is explained by the linear function.
For example, an R² value of 0.64 implies that 64% [(0.64)(100) = 64] of the variation in the variable Y can be explained by the linear function of the variable X.
Problems with R²
• R² tends to increase as additional variables are included in a regression equation, regardless of their true importance in determining the values of the dependent variable. The adjusted R² accounts for the number of independent variables:
$$R_a^2 = 1 - \frac{n - 1}{n - (p + 1)} \left(1 - R^2\right)$$
where n = number of observations and p = number of independent variables.
• R² gives no information on the appropriateness of the model.
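The adjusted R² formula in Python, checked against the growth-rate output (n = 5 observations, p = 1 independent variable). The function name is our own, and R² is taken as SSM/SST from the Analysis of Variance table.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (n - 1) / (n - (p + 1)) * (1 - R^2)."""
    if n - (p + 1) <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (n - 1) / (n - (p + 1)) * (1 - r2)

# Growth-rate example: R^2 = SSM / SST from the ANOVA table
r2 = 8201.389781 / (8201.389781 + 257.810219)
print(round(adjusted_r2(r2, n=5, p=1), 6))  # 0.959364
```

Because the factor (n − 1)/(n − (p + 1)) grows with p, adding a predictor raises the adjusted R² only if it reduces 1 − R² by enough to offset the penalty.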
Problems with R²
[Figure panels: Curvilinear data fitted by a straight line with high R²; Segregated data fitted by a straight line with high R²]
For detecting these kinds of departures from the regression model, there is no substitute for plotting the data.
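A small numerical version of the curvilinear panel, using hypothetical data y = x² for x = 1 to 10. A straight line still gets R² of about 0.95, but the residuals form a clear U-shaped pattern of signs, which is the kind of departure a residual plot reveals and R² does not.

```python
# Hypothetical curvilinear data: y = x^2 for x = 1..10
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

# Straight-line least squares fit
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
sst = sum((b - y_bar) ** 2 for b in y)
r2 = 1 - sse / sst

# Residual signs in x order: positive at both ends, negative in the middle
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(f"R^2 = {r2:.3f}, residual signs: {signs}")
```

A random scatter of residual signs would be consistent with a straight line; long runs like these point to curvature.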
Thank you!