CHAPTER # 4 SIMPLE REGRESSION AND CORRELATION EXERCISE # 4 BY SHAHID MEHMOOD
sm_078@hotmail.com
LINEAR REGRESSION
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory (independent) variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.
Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other (for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables. A scatter plot can be a helpful tool in determining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables (i.e., the scatter plot does not indicate any increasing or decreasing trends), then fitting a linear regression model to the data probably will not provide a useful model.
Linear Regression lines
(i) Regression line when Y depends upon X.
A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
(ii) Regression line when X depends upon Y.
A linear regression line has an equation of the form X = a0 + b0Y, where Y is the explanatory variable and X is the dependent variable. The slope of the line is b0, and a0 is the intercept (the value of X when Y = 0).
FITTING OF REGRESSION LINES BY LEAST-SQUARES METHOD
The most common method for fitting a regression line is the method of least squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies exactly on the fitted line, its vertical deviation is 0). Because the deviations are first squared and then summed, there are no cancellations between positive and negative values. We may work out the values of "a" and "b" by using the following formulae.
$a = \bar{y} - b\,\bar{x}$
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}$
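The two formulae for "a" and "b" follow from the least-squares condition described above. The derivation below is a short sketch; the symbol S for the sum of squared vertical deviations is introduced here only for illustration and does not appear in the original notes.

```latex
% S is the sum of squared vertical deviations from the line Y = a + bX
S(a, b) = \sum (y - a - bx)^{2}

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = 0 \;\Rightarrow\; \sum y = na + b\sum x
\frac{\partial S}{\partial b} = 0 \;\Rightarrow\; \sum xy = a\sum x + b\sum x^{2}

% Solving the two normal equations for b and a
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
a = \bar{y} - b\,\bar{x}
```

The first normal equation, divided by n, is the formula for "a", which is why the fitted line always passes through the point of means.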
Example # 1
Given the data:
x    2    3    6    8
y    5    7    10   12
i) Fit a regression line of y on x by the method of least squares.
ii) Estimate y when x = 12.
Solution:
x        y        x^2      xy
2        5        4        10
3        7        9        21
6        10       36       60
8        12       64       96
Total:
19       34       113      187
(i)
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^{2} - (\sum x)^{2}} = \dfrac{4(187) - (19)(34)}{4(113) - (19)^{2}} = \dfrac{102}{91} = 1.1209$
$\bar{x} = 19/4 = 4.75$, $\bar{y} = 34/4 = 8.5$
$a = \bar{y} - b\,\bar{x} = 8.5 - (102/91)(4.75) = 3.1758$
Y = 3.1758 + 1.1209X
(ii)
X = 12
Y = 3.1758 + 1.1209(12) = 16.63
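The arithmetic of Example # 1 can be checked with a few lines of Python. This is a minimal sketch that applies the two least-squares formulae directly to the sums; the function name `fit_line` is a hypothetical helper chosen for illustration, not part of the original notes.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(y) - b * mean(x)
    a = sum_y / n - b * sum_x / n
    return a, b


# Data of Example # 1
xs = [2, 3, 6, 8]
ys = [5, 7, 10, 12]

a, b = fit_line(xs, ys)
estimate = a + b * 12  # estimated Y when X = 12
print(f"Y = {a:.4f} + {b:.4f}X; estimate at X = 12: {estimate:.2f}")
```

Working from the raw sums, as the formulae do, avoids rounding b before it is used to compute a and the estimate.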
CORRELATION
Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of the variation in people's weights is related to their heights. Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect there are correlations, but not know which are the strongest. An intelligent correlation analysis can lead to a greater understanding of your data.
CORRELATION COEFFICIENT
For a set of variable pairs, the correlation coefficient gives the strength of the linear association. The square of the correlation coefficient is the fraction of the variance of one variable that can be explained by the other variable. It is denoted by "r".
The value of r is such that -1 ≤ r ≤ +1. The + and - signs are used for positive linear correlations and negative linear correlations, respectively.
Positive correlation: If x and y have a strong positive linear correlation, r is close to +1. An r value of exactly +1 indicates a perfect positive fit. Positive values indicate a relationship between x and y such that as values of x increase, values of y also increase.
Negative correlation: If x and y have a strong negative linear correlation, r is close to -1. An r value of exactly -1 indicates a perfect negative fit. Negative values indicate a relationship between x and y such that as values of x increase, values of y decrease.
No correlation: If there is no linear correlation or a weak linear correlation, r is close to 0. A value near zero means that there is little or no linear relationship between the two variables; they may be unrelated, or related in a nonlinear way.
Note that r is a dimensionless quantity; that is, it does not depend on the units employed.
A perfect correlation of ±1 occurs only when the data points all lie exactly on a straight line. If r = +1, the slope of this line is positive. If r = -1, the slope of this line is negative.
A correlation greater than 0.8 in absolute value is generally described as strong, whereas a correlation less than 0.5 in absolute value is generally described as weak. These values can vary based upon the "type" of data being examined. A study utilizing scientific data may require a stronger correlation than a study using social science data.
Calculation of Product Moment correlation coefficient
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - (\sum x)^{2}\right]\left[n\sum y^{2} - (\sum y)^{2}\right]}}$
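The product moment formula can be evaluated directly from the five sums it contains. The sketch below is an illustration only: the function name `pearson_r` is a hypothetical helper, and the data are the four pairs of Example # 1, reused here as sample input. The second call rescales x by 100 (as if changing units) to illustrate the statement that r is dimensionless.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Sample input: the data of Example # 1
xs = [2, 3, 6, 8]
ys = [5, 7, 10, 12]

r = pearson_r(xs, ys)
# Same data with x expressed in different units (multiplied by 100)
r_rescaled = pearson_r([100 * x for x in xs], ys)
print(round(r, 4), round(r_rescaled, 4))
```

Both calls give the same value, since the scale factor appears in both the numerator and the denominator and cancels.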
Question # 1
A study was made to know the relation between advertising expenditure (x) and the increase in sales (y). The following data were obtained.
X:   140   120   125   120   130   150   140   160   180   195   125   150
Y:   80    75    78    76    82    90    87    100   120   126   130   125
i) Plot a scatter diagram.
ii) Find the regression line to predict increase in sales from advertisement expenditure.
iii) Estimate increase in sales when advertising expenditure is 250.
Question # 1
Given the following data, fit the regression line of y on x.
X:   2.80   2.10   3.75   2.80   2.55   3.00   2.50   2.20   2.90   3.80   2.70   3.45
Y:   3.94   2.63   3.20   3.57   2.25   2.80   3.40   2.00   3.70   3.97   3.20   3.90
Ans: Y = 0.75X + 1.05
Question # 2
Given the data:
X:   3   7   5   4   3   3   9   6
Y:   8   3   2   2   3   4   3   7
Fit a regression line of X on Y and hence estimate X if Y = 4.5.
Ans: X = -0.194Y + 5.776; 4.903
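For a regression line of X on Y, the roles of the two variables in the least-squares formulae are swapped: Y supplies the squared sum in the denominator and X is the variable being predicted. The sketch below illustrates this with the eight (X, Y) pairs of Question # 2 as read from the table; the function name `fit_x_on_y` is a hypothetical helper. It should agree with the stated answer up to rounding.

```python
def fit_x_on_y(xs, ys):
    """Return (a0, b0) for the least-squares line X = a0 + b0*Y."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_y2 = sum(y * y for y in ys)
    # Y is the explanatory variable, so its sums appear in the denominator
    b0 = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)
    a0 = sum_x / n - b0 * sum_y / n
    return a0, b0


# Data of Question # 2
xs = [3, 7, 5, 4, 3, 3, 9, 6]
ys = [8, 3, 2, 2, 3, 4, 3, 7]

a0, b0 = fit_x_on_y(xs, ys)
x_hat = a0 + b0 * 4.5  # estimated X when Y = 4.5
print(f"X = {a0:.3f} + ({b0:.3f})Y; estimate at Y = 4.5: {x_hat:.3f}")
```

Note that this line is not obtained by rearranging the line of Y on X; the two regression lines coincide only when the correlation is perfect.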
Question # 3
Calculate the equation of the least squares regression line of y on x from the following data.
X:   1   3   3   4   5   5
Y:   5   3   2   2   0   1
Ans: Y = 5.88 - 1.06X
Question # 4
An organization has collected the following data showing the relationship between price charged and quantities sold:
Price:   5     6     7     8     9     10    12    13
Qty:     590   560   555   540   525   500   480   475
i) Determine the regression line equation.
ii) Compute the quantity that the company may produce if it wishes to sell at the price of 18.
Question # 5
The manager of an educational computer facility would like to develop a model to predict the number of service calls per annum for interactive terminals based upon the age of the terminal. A sample of 10 terminals was selected. The data follow:
Terminal:               1    2    3    4    5    6    7    8    9    10
No. of service calls:   3    4    3    5    5    7    8    10   10   12
Age (years):            1    1    2    2    3    3    4    4    5    5
Fit a regression line to predict the number of service calls.
Ans: Y = 0.7 + 2X, where X is the age in years and Y is the number of service calls
Question # 6
Compute the coefficient of correlation between height (cm) and weight (kg) of six adults.
Height (cm):   170   175   176   178   183   184
Weight (kg):   57    64    70    76    71    82
Ans: 0.864
Question # 7
A personnel officer is studying performances of job applicants on two tests given when the applicant contacts the firm. The first test measures mental ability; the second measures potential for success in the job. The test-score results of a sample of eleven applicants are shown below:
Applicant:            A    B    C    D    E    F    G    H    I    J    K
Mental ability (X):   37   40   36   49   36   40   39   47   32   65   27
Potential (Y):        63   42   41   39   38   49   25   29   15   52   25
Calculate the sample correlation coefficient.
Ans: 0.41
Question # 8
The following data were obtained in a study of the relationship between the weight and chest size of infants at birth:
Weight (kg):   3.71   2.31   4.30   3.21   4.32   2.75   5.52   2.15   4.41
Chest size:    28.7   28.3   30.3   27.2   27.7   29.5   36.5   26.3   32.2
Compute and interpret the sample correlation coefficient.
Ans: 0.784
Question # 9
Calculate the coefficient of correlation between the values of X and Y from the following table.
X:   76    87    95    67    57    77    59    59
Y:   123   135   154   110   105   134   121   106
Ans: 0.917
Question # 10
State in each case whether you would expect a positive correlation, a negative correlation, or no correlation:
i) The ages of husbands and wives;
ii) The amount of rubber on tires and the number of miles they have been driven;
iii) Shoe size and IQ;
iv) The weight of the load of trucks and their petrol consumption.
Question # 11
For the data of heights and weights of 5 men:
Height:   64    68    70    72    74
Weight:   160   170   180   190   195
i) Establish a least squares equation of regression between height and weight.
ii) Calculate the coefficient of correlation.
Question # 12
A computer, while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, obtained the following sums:
∑X = 125, ∑X² = 650, ∑Y = 100, ∑Y² = 460, ∑XY = 508
The following mistakes were discovered at the time of checking:
Wrong values recorded:                 X    Y
                                       6    14
                                       8    6
Correct values that should have been recorded:
                                       X    Y
                                       8    12
                                       6    8
Find out the correct value of the coefficient of correlation.
Question # 13
For the following two sets of bivariate data, the regression lines for each set are, respectively:
i) y = 1.94x + 10.83 (y on x) and x = 0.15y + 6.18 (x on y)
ii) y = -1.96x + 15 (y on x) and x = -0.45y + 7.16 (x on y)
Required: Find the coefficient of correlation in each case.
Question # 14
A firm trains employees to use a statistical software package. A random sample of trainees turned in the following performance:
Trainee:                 A   B   C   D   E   F   G
Hours of training (x):   1   4   6   8   2   3   1
Number of errors (y):    6   3   2   1   5   4   7
i) Determine the least squares regression line of y on x.
ii) Interpret the coefficient of regression.
iii) Predict the number of errors for a person with 5 hours of training.
Question # 15
A researcher compiled the following information to investigate the relationship between smoking and lung cancer:
Country        Per Capita Cigarette Consumption    Deaths per 100,000 from Lung Cancer
USA            1300                                20
UK             1100                                46
Finland        1100                                65
Switzerland    510                                 25
Canada         500                                 15
Holland        490                                 24
Australia      480                                 18
Denmark        380                                 17
Sweden         300                                 11
Norway         250                                 9
Iceland        230                                 6
Compute r and r² and describe what they mean.
Ans: 0.5476
Question # 16
i) Calculate the coefficient of correlation for the following data:
Annual percentage increase in advertising expenditure (X):   1   3   4   6   8   9   11   14
Annual percentage increase in sales revenue (Y):             1   2   4   4   5   7   8    9
ii) What is the purpose of finding the correlation coefficient, and what does its value indicate in respect of the above data on advertising expenditure and sales revenue?
Ans: r = 0.9770
Question # 17
i) Calculate the coefficient of correlation for the following data:
Annual percentage increase in advertising expenditure (X):   1   3   4   6   8   9   11   14
Annual percentage increase in sales revenue (Y):             1   2   4   4   5   7   8    9
ii) What is the purpose of finding the correlation coefficient, and what does its value indicate in respect of the above data on advertising expenditure and sales revenue?
Ans: r = 0.9770
Question # 18
An analyst is studying the relationship between shopping center traffic and a department store's daily sales. The analyst develops an index to measure the daily volume of traffic entering the shopping center and an index of daily sales. The following table shows the index values for 9 randomly selected days.
Traffic Index (x):   71    82    111   85    89    110   75    105   120
Sale Index (y):      135   170   184   160   175   190   140   152   210
Forecast the sale index for a traffic index of 132 by the method of least squares.
Ans: 198.29
Question # 19
A company wants to assess the impact of advertising expenditures on its annual profit. The following table presents the information for eight years:
Year:                                     2001   2002   2003   2004   2005   2006   2007   2008
Advertising expenditure (in millions):    90     100    95     110    130    145    150    140
Annual profit (in millions):              45     42     44     60     30     34     35     30
(a) Construct the least squares regression equation and predict the annual profit for the year 2009 if the advertising expenditure is budgeted at Rs. 160 million.
(b) Determine the coefficient of correlation and interpret your result.
Question # 20
Student Name:             Ali   Adil   Asif   Ahmed   Ayub
Marks given by Judge A:   70    92     80     65      70
Marks given by Judge B:   54    43     43     67      64
Required:
Calculate Spearman's Rank Correlation Coefficient.
Question # 21
A firm trains employees to use a statistical software package. A random sample of trainees turned in the following performance:
Trainee:                 A   B   C   D   E   F   G
Hours of training (x):   1   4   6   8   2   3   1
Number of errors (y):    6   3   2   1   5   4   7
i) Determine the least squares regression line of y on x.
ii) Interpret the coefficient of regression.
iii) Predict the number of errors for a person with 5 hours of training.