CHAPTER 8  THE COMPARISON OF TWO POPULATIONS

8-1.  n = 25, D̄ = 19.08, sD = 30.67
H0: μD = 0    H1: μD ≠ 0
t(24) = (D̄ − D0)/(sD/√n) = 19.08/(30.67/√25) = 3.11
Reject H0 at α = 0.01.
Template output (Paired Difference Test): n = 25, D̄ = 19.08, sD = 30.67; test statistic t = 3.1105, df = 24; p-value for H0: μ1 − μ2 = 0 is 0.0048. Assumption: populations normal.
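The hand computation above is easy to check numerically. The sketch below (Python, stdlib only; not part of the manual's templates) recomputes the 8-1 statistic from the summary figures:

```python
from math import sqrt

def paired_t(d_bar: float, s_d: float, n: int, d0: float = 0.0):
    """Paired-difference t statistic, (D-bar - D0)/(s_D/sqrt(n)), and its df."""
    return (d_bar - d0) / (s_d / sqrt(n)), n - 1

t, df = paired_t(19.08, 30.67, 25)   # summary figures from Problem 8-1
print(round(t, 4), df)               # 3.1105 24
```

The same helper reproduces 8-2's statistic: `paired_t(5, 2.3, 40)` gives about 13.75 with df = 39.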
8-2.  n = 40, D̄ = 5, sD = 2.3
H0: μD = 0    H1: μD ≠ 0
t(39) = (5 − 0)/(2.3/√40) = 13.75
At an α of 5%, strongly reject H0.
95% C.I. for μD: 5 ± 2.023(2.3/√40) = [4.26, 5.74].
8-3.
n = 9, D̄ = 3.67, sD = 2.45   (D = Movie − Commercial)
H0: μD = 0    H1: μD ≠ 0
(template: Testing Paired Difference.xls, sheet: Sample Data; sample data not reproduced here)
Template output: n = 9, D̄ = 3.66667, sD = 2.44949; test statistic t = 4.4907, df = 8. p-values at an α of 5%: H0: μ1 − μ2 = 0, 0.0020 (Reject); H0: μ1 − μ2 ≥ 0, 0.9990; H0: μ1 − μ2 ≤ 0, 0.0010 (Reject). Assumption: populations normal.
At α = 0.05, we reject H0. There are more viewers for movies than for commercials.
8-4.
n = 60, D̄ = 0.2, sD = 1
H0: μD ≤ 0    H1: μD > 0
t(59) = (0.2 − 0)/(1/√60) = 1.549
At α = 0.05, we cannot reject H0.
Template output: n = 60, D̄ = 0.2, sD = 1; test statistic t = 1.5492, df = 59; p-values: H0: μ1 − μ2 = 0, 0.1267; H0: μ1 − μ2 ≥ 0, 0.9367; H0: μ1 − μ2 ≤ 0, 0.0633. Assumption: populations normal.
8-5.  n = 15, D̄ = 3.2, sD = 8.436   (D = After − Before)
H0: μD ≤ 0    H1: μD > 0
t(14) = (3.2 − 0)/(8.436/√15) = 1.469
At an α of 5%, there is no evidence that the shelf facings are effective.
8-6.
n = 12, D̄ = 37.08, sD = 43.99   (D = France − Spain)
H0: μD = 0    H1: μD ≠ 0
(template: Testing Paired Difference.xls, sheet: Sample Data; sample data not reproduced here)
Template output: n = 12, D̄ = 37.0833, sD = 43.9927; test statistic t = 2.9200, df = 11. p-values at an α of 5%: H0: μ1 − μ2 = 0, 0.0139 (Reject); H0: μ1 − μ2 ≥ 0, 0.9930; H0: μ1 − μ2 ≤ 0, 0.0070 (Reject). Assumption: populations normal.
Reject H0. There is strong evidence that hotels in Spain are cheaper than those in France, based on this small sample. p-value = 0.0139.
8-7.
Power at μD = 0.1:  n = 60, σD = 1.0, α = 0.01
H0: μD ≤ 0    H1: μD > 0
C = 0 + 2.326(σ/√n) = 0.30029
We need:
P(D̄ > C | μD = 0.1) = P(D̄ > 0.30029 | μD = 0.1) = P(Z > (0.30029 − 0.1)/(1/√60))
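A quick numerical check of this probability (a sketch using Python's `math.erfc` for the standard-normal tail; not part of the manual's templates):

```python
from math import sqrt, erfc

def norm_sf(z: float) -> float:
    """Right tail P(Z > z) of the standard normal via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2.0))

n, sigma = 60, 1.0
c = 0 + 2.326 * sigma / sqrt(n)                   # rejection cutoff for alpha = 0.01, right-tailed
power = norm_sf((c - 0.1) / (sigma / sqrt(n)))    # P(reject H0 | mu_D = 0.1)
print(round(c, 5), round(power, 4))               # 0.30029 0.0604
```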
This probability is P(Z > 1.551) = 0.0604.
8-8.
n = 20, D̄ = 1.25, sD = 42.896
H0: μD = 0    H1: μD ≠ 0
t(19) = (1.25 − 0)/(42.896/√20) = 0.13
Do not reject H0; no evidence of a difference.
Template output: n = 20, D̄ = 1.25, sD = 42.89; test statistic t = 0.1303, df = 19; p-value for H0: μ1 − μ2 = 0 is 0.8977. Assumption: populations normal.
8-9.  n1 = 100, n2 = 100, x̄1 = 76.5, x̄2 = 88.1, s1 = 38, s2 = 40
H0: μ2 − μ1 ≤ 0    H1: μ2 − μ1 > 0
(Template: Testing Population Means.xls, sheet: Z-test from Stats; need to use the t-test since the population std. dev. is unknown)
Template output: F ratio for H0: population variances equal = 1.10803 (p-value 0.6108), so the pooled test is used. Pooled variance s²p = 1522; test statistic t = −2.1025, df = 198. p-values at an α of 5%: H0: μ1 − μ2 = 0, 0.0368 (Reject); H0: μ1 − μ2 ≥ 0, 0.0184 (Reject); H0: μ1 − μ2 ≤ 0, 0.9816. 95% C.I. for μ1 − μ2: −11.6 ± 10.8801 = [−22.48, −0.7199].
Reject H0. There is evidence that gasoline outperforms ethanol.
8-10.
n1 = n2 = 30
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
Nikon (1): x̄1 = 8.5, s1 = 2.1    Minolta (2): x̄2 = 7.8, s2 = 1.8
z = (8.5 − 7.8)/√(2.1²/30 + 1.8²/30) = 1.386
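This large-sample z statistic comes straight from the summary figures; a minimal Python sketch (not part of the manual's templates):

```python
from math import sqrt, erfc

def two_mean_z(x1, s1, n1, x2, s2, n2):
    """z = (x1bar - x2bar) / sqrt(s1^2/n1 + s2^2/n2), for large independent samples."""
    return (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)

z = two_mean_z(8.5, 2.1, 30, 7.8, 1.8, 30)   # Nikon vs. Minolta, Problem 8-10
p_two_tailed = erfc(abs(z) / sqrt(2.0))      # = 2 * P(Z > |z|)
print(round(z, 3))                           # 1.386
```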
Do not reject H0. There is no evidence of a difference in the average ratings of the two cameras.
8-11.
Bel Air (1): n1 = 32, x̄1 = 2.5M, s1 = 0.41M    Marin (2): n2 = 35, x̄2 = 4.32M, s2 = 0.87M
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
(Template: Testing Population Means.xls, sheet: t-test from Stats; need to use the t-test since the population std. dev. is unknown)
The equal-variance assumption is questionable: the template's F ratio for H0: population variances equal is 4.50268 (p-value 0.0001).
Assuming equal variances: pooled variance s²p = 0.47609; t = −10.7845, df = 65; p-value for H0: μ1 − μ2 = 0 is 0.0000 (Reject at α = 5%); 95% C.I.: −1.82 ± 0.33704 = [−2.157, −1.48296].
Assuming unequal variances: t = −11.101, df = 49; p-value 0.0000 (Reject at α = 5%); 95% C.I.: −1.82 ± 0.32946 = [−2.1495, −1.49054].
Reject H0. There is evidence that the average Bel Air price is lower.
8-12.
(Template: Testing Population Means.xls, sheet: t-test from Stats; need to use the t-test since the population std. dev. is unknown)
H0: μJ − μSP = 0    H1: μJ − μSP ≠ 0
Sample 1: n = 40, x̄ = 15, s = 3    Sample 2: n = 40, x̄ = 6.2, s = 3.5
F ratio for H0: population variances equal = 1.36111 (p-value 0.3398), so the pooled test is used. Pooled variance s²p = 10.625; t = 12.0735, df = 78; p-value for H0: μ1 − μ2 = 0 is 0.0000 (Reject at α = 5%). 95% C.I.: 8.8 ± 1.45107 = [7.34893, 10.2511].
Reject the null hypothesis. The global equities outperform the U.S. market.
8-13.
Music: n1 = 128, x̄1 = 23.5, s1 = 12.2    Verbal: n2 = 212, x̄2 = 18.0, s2 = 10.5
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
z = (23.5 − 18.0)/√(12.2²/128 + 10.5²/212) = 4.24
Reject H0. Music is probably more effective.
Template output: z = 4.2397; p-value for H0: μ1 − μ2 = 0 is 0.0000 (Reject at α = 5%).
8-14.  n1 = 13, n2 = 13, x̄1 = 20.385, x̄2 = 10.385, s1 = 7.622, s2 = 4.292, α = .05
H0: μ1 = μ2    H1: μ1 ≠ μ2
s²p = [(13 − 1)(7.622)² + (13 − 1)(4.292)²]/(13 + 13 − 2) = 38.2581
t(24) = (20.385 − 10.385)/√[38.2581(1/13 + 1/13)] = 4.1219
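The pooled-variance computation follows the same two-step pattern each time (pool the variances, then studentize the difference); a Python sketch of 8-14's numbers (not part of the manual's templates):

```python
from math import sqrt

def pooled_t(x1, s1, n1, x2, s2, n2, d0=0.0):
    """Equal-variance t statistic: pool the two sample variances, then studentize."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1 - x2 - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = pooled_t(20.385, 7.622, 13, 10.385, 4.292, 13)   # Problem 8-14
print(round(t, 4), df)                                   # 4.1219 24
```

The `d0` argument handles tests against a nonzero hypothesized difference, such as the 4 (thousand dollars) used in Problem 8-19.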
df = 24. Use a critical value of 2.064 for a two-tailed test. Reject H0. The two methods do differ.
8-15.
Liz (1): n1 = 32, x̄1 = 4,238, s1 = 1,002.5    Calvin (2): n2 = 37, x̄2 = 3,888.72, s2 = 876.05
a. One-tailed:  H0: μ1 − μ2 ≤ 0    H1: μ1 − μ2 > 0
b. z = (4,238 − 3,888.72 − 0)/√(1,002.5²/32 + 876.05²/37) = 1.53
c. At α = .05, the critical point is 1.645. Do not reject H0: we cannot conclude that Liz Claiborne models get more money, on the average.
d. p-value = .5 − .437 = .063 (the probability of committing a Type I error if we choose to reject and H0 happens to be true).
e. With n1 = 10 and n2 = 11:
s²p = [(10 − 1)(1,002.5)² + (11 − 1)(876.05)²]/(10 + 11 − 2) = 879,983.804
t(19) = (4,238 − 3,888.72)/√[879,983.804(1/10 + 1/11)] = 0.8522,  df = 19
8-16.
(Template: Testing Population Means.xls, sheet: t-test from Stats; need to use the t-test since the population std. dev. is unknown)
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
Sample 1: n = 28, x̄ = 0.19, s = 5.72    Sample 2: n = 28, x̄ = 0.72, s = 5.1
F ratio for H0: population variances equal = 1.25792 (p-value 0.5552), so the pooled test is used. Pooled variance s²p = 29.3642; t = −0.3660, df = 54; p-value for H0: μ1 − μ2 = 0 is 0.7158. At an α of 1%, do not reject. 99% C.I.: −0.53 ± 3.86682 = [−4.3968, 3.33682].
Do not reject the null hypothesis. Pre-earnings announcements have no impact on returns from stock investments.
8-17.
Non-research (1): n1 = 255, s1 = 0.64    Research (2): n2 = 300, s2 = 0.85    x̄2 − x̄1 = 2.54
95% C.I. for μ2 − μ1:  (x̄2 − x̄1) ± z.025 √(s1²/n1 + s2²/n2)
  = 2.54 ± 1.96 √(.64²/255 + .85²/300) = [2.416, 2.664] percent.
8-18.  Audio (1): n1 = 25, x̄1 = 87, s1 = 12    Video (2): n2 = 20, x̄2 = 64, s2 = 23
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
t(43) = (x̄1 − x̄2 − 0)/√{[(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) × (1/n1 + 1/n2)} = 4.326
Reject H0. Audio is probably better (higher average purchase intent). Waldenbooks should concentrate on audio.
Template output: pooled variance s²p = 314.116; t = 4.3257, df = 43; p-value for H0: μ1 − μ2 = 0 is 0.0001 (Reject at α = 5%).

8-19.  With training (1): n1 = 13, x̄1 = 55, s1 = 8    Without training (2): n2 = 15, x̄2 = 48, s2 = 6
H0: μ1 − μ2 ≤ 4,000    H1: μ1 − μ2 > 4,000
t(26) = [(55 − 48) − 4]/√{[(12)(8²) + (14)(6²)]/26 × (1/13 + 1/15)} = 1.132
The critical value at α = .05 for t(26) in a right-hand-tailed test is 1.706. Since 1.132 < 1.706, there is no evidence at α = .05 that the program executives get an average of $4,000 per year more than other executives of comparable levels.
8-20.  (Use template: "testing difference in means.xls"; need to use the t-test since the population std. dev. is unknown)
H0: μP − μL = 0    H1: μP − μL ≠ 0
Sample 1: n = 20, x̄ = 1, s = 1.1    Sample 2: n = 20, x̄ = 6, s = 2.5
F ratio for H0: population variances equal = 5.16529 (p-value 0.0008): the variances are not equal. Assuming unequal variances: t = −8.1868, df = 26; p-value for H0: μ1 − μ2 = 0 is 0.0000 (Reject at α = 5%). 95% C.I.: −5 ± 1.25539 = [−6.2554, −3.74461].
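The template's unequal-variances line can be reproduced directly: the Welch statistic with Satterthwaite degrees of freedom, sketched here in Python for the 8-20 figures (not part of the manual's templates):

```python
from math import sqrt, floor

def welch_t(x1, s1, n1, x2, s2, n2):
    """Unequal-variance (Welch) t statistic and Satterthwaite df, rounded down."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, floor(df)

t, df = welch_t(1, 1.1, 20, 6, 2.5, 20)   # Prague vs. London, Problem 8-20
print(round(t, 4), df)                    # -8.1868 26
```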
Reject the null hypothesis: beer is cheaper in Prague, on average. Londoners save between $3.74 and $6.26.
8-21.
(Use template: "testing difference in means.xls"; need to use the t-test since the population std. dev. is unknown)
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
US: n = 15, x̄ = 3.8, s = 2.2    China: n = 18, x̄ = 6.1, s = 5.3
F ratio for H0: population variances equal = 5.80372 (p-value 0.0018): the equal-variance assumption is violated. Assuming unequal variances: t = −1.676, df = 23; p-value for H0: μ1 − μ2 = 0 is 0.1073. At an α of 1%, do not reject. 99% C.I.: −2.3 ± 3.85252 = [−6.1525, 1.55252].
Do not reject the null hypothesis (p-value = 0.1073): investment returns are the same in China and the US.
8-22.
Old (1): n1 = 19, x̄1 = 8.26, s1 = 1.43    New (2): n2 = 23, x̄2 = 9.11, s2 = 1.56
H0: μ2 − μ1 ≤ 0    H1: μ2 − μ1 > 0
t(40) = (9.11 − 8.26 − 0)/√{[18(1.43)² + 22(1.56)²]/40 × (1/19 + 1/23)} = 1.82
Some evidence to reject H0 (p-value = 0.038) for the t distribution with df = 40, in a one-tailed test.

8-23.  Take the proposed route as population 1 and the alternate route as population 2, and assume equal variances for both populations.
H0: μ1 − μ2 ≤ 0    H1: μ1 − μ2 > 0
The p-value from the template = 0.8674: cannot reject H0.
8-24.  (Use template: "testing difference in means.xls"; need to use the t-test since the population std. dev. is unknown)
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
Sample 1: n = 20, x̄ = 3.56, s = 2.8    Sample 2: n = 20, x̄ = 4.84, s = 3.2
F ratio for H0: population variances equal = 1.30612 (p-value 0.5662), so the pooled test is used. Pooled variance s²p = 9.04; t = −1.3463, df = 38; p-value for H0: μ1 − μ2 = 0 is 0.1862. At an α of 5%, do not reject.
Do not reject the null hypothesis. Neither investment outperforms the other.
8-25.
“Yes” (1): n1 = 25, x̄1 = 12, s1 = 2.5    “No” (2): n2 = 25, x̄2 = 13.5, s2 = 1
Assume independent random sampling from normal populations with equal population variances.
H0: μ2 − μ1 ≤ 0    H1: μ2 − μ1 > 0
t(48) = (13.5 − 12)/√{[24(2.5)² + 24(1)²]/48 × (1/25 + 1/25)} = 2.785
At α = 0.05, reject H0. Also reject at α = 0.01. p-value = 0.0038.
Template output: pooled variance s²p = 3.625; t = −2.7854, df = 48; p-values at an α of 5%: H0: μ1 − μ2 = 0, 0.0076 (Reject); H0: μ1 − μ2 ≥ 0, 0.0038 (Reject).
8-26.  n1 = 21, x̄1 = .1331, s1 = .09    n2 = 28, x̄2 = .105, s2 = .122
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
t(47) = (.1331 − .105 − 0)/√{[20(.09)² + 27(.122)²]/47 × (1/21 + 1/28)} = 0.8887
Do not reject H0. There is no evidence of a difference in average stock returns for the two periods.
8-27.
(Use template: "testing difference in means.xls"; need to use the t-test since the population std. dev. is unknown)
H0: μN − μO ≤ 0    H1: μN − μO > 0
Sample 1: n = 8, x̄ = 3, s = 2    Sample 2: n = 10, x̄ = 2.3, s = 2.1
F ratio for H0: population variances equal = 1.1025 (p-value 0.9186), so the pooled test is used. Pooled variance s²p = 4.23063; t = 0.7175, df = 16; p-value for H0: μ1 − μ2 ≤ 0 is 0.2417. At an α of 5%, do not reject.
Do not reject the null hypothesis (p-value = 0.2417). The new advertising firm has not resulted in significantly higher sales.
8-28.
From Problem 8-25: n1 = n2 = 25, x̄1 = 12, x̄2 = 13.5, s1 = 2.5, s2 = 1
We want a 95% C.I. for μ2 − μ1:
(x̄2 − x̄1) ± t.025(48) √{[(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) × (1/n1 + 1/n2)}
  = (13.5 − 12) ± 2.011 √{[24(2.5)² + 24(1)²]/48 × (1/25 + 1/25)}
  = [0.4170, 2.5830] percent.
8-29.  Before (1): n1 = 100, x1 = 85    After (2): n2 = 100, x2 = 68
H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
z = (p̂1 − p̂2)/√[p̂(1 − p̂)(1/n1 + 1/n2)] = (.85 − .68)/√[(.765)(.235)(1/100 + 1/100)] = 2.835
Reject H0. The on-time departure percentage has probably declined after NW's merger with Republic. p-value = 0.0023.
Template output: pooled p̂ = 0.7650; z = 2.8351; p-values at an α of 5%: H0: p1 − p2 = 0, 0.0046 (Reject); H0: p1 − p2 ≥ 0, 0.9977; H0: p1 − p2 ≤ 0, 0.0023 (Reject).
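The pooled two-proportion z statistic used here (and in the problems that follow) can be sketched in Python from the raw counts (not part of the manual's templates):

```python
from math import sqrt, erfc

def two_prop_z(x1, n1, x2, n2):
    """z for H0: p1 - p2 = 0, using the pooled proportion estimate."""
    p1, p2, pp = x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(pp * (1 - pp) * (1 / n1 + 1 / n2))

z = two_prop_z(85, 100, 68, 100)          # Problem 8-29
p_right = 0.5 * erfc(z / sqrt(2.0))       # one-tailed p-value, P(Z > z)
print(round(z, 4))                        # 2.8351
```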
8-30.
Small towns (1): n1 = 1,000, x1 = 850    Big cities (2): n2 = 2,500, x2 = 1,950
H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
z = (850/1,000 − 1,950/2,500)/√[(2,800/3,500)(1 − 2,800/3,500)(1/1,000 + 1/2,500)] = 4.677
Reject H0. There is strong evidence that the percentage of word-of-mouth recommendations in small towns is greater than it is in large metropolitan areas.
8-31.
n1 = 31, x1 = 11    n2 = 50, x2 = 19
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
z = (p̂1 − p̂2)/√[p̂(1 − p̂)(1/n1 + 1/n2)],  |z| = 0.228
Do not reject H0. There is no evidence that one corporate raider is more successful than the other.
8-32.
Before campaign (1): n1 = 2,060, p̂1 = 0.13    After campaign (2): n2 = 5,000, p̂2 = 0.19
H0: p2 − p1 ≤ .05    H1: p2 − p1 > .05
z = (p̂2 − p̂1 − D0)/√[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2] = (0.19 − 0.13 − .05)/√[(.13)(.87)/2,060 + (.19)(.81)/5,000] = 1.08
No evidence to reject H0; we cannot conclude that the campaign has increased the proportion of people who prefer California wines by over 0.05.
8-33.  95% C.I. for p2 − p1:
(p̂2 − p̂1) ± 1.96 √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2] = .06 ± 1.96 √[(.13)(.87)/2,060 + (.19)(.81)/5,000] = [0.0419, 0.0781]
We are 95% confident that the increase in the proportion of the population preferring California wines is anywhere from 4.19% to 7.81%.
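A sketch of this interval computation in Python (unpooled standard error, z.025 = 1.96; not part of the manual's templates):

```python
from math import sqrt

def prop_diff_ci(p1, n1, p2, n2, z=1.96):
    """C.I. for p2 - p1 using the unpooled standard error (pooling is only for tests)."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p2 - p1
    return d - z * se, d + z * se

lo, hi = prop_diff_ci(0.13, 2060, 0.19, 5000)   # Problem 8-33
print(round(lo, 4), round(hi, 4))               # 0.0419 0.0781
```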
8-34.
The statement to be tested must be hypothesized before looking at the data:
Chase Man. (1): n1 = 650, x1 = 48    Manuf. Han. (2): n2 = 480, x2 = 20
H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
z = (p̂1 − p̂2)/√[p̂(1 − p̂)(1/n1 + 1/n2)] = 2.248
Reject H0. p-value = 0.0122.
8-35.
American execs (1): n1 = 120, x1 = 34    European execs (2): n2 = 200, x2 = 41
H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
z = (.283 − .205)/√[(.234)(1 − .234)(1/120 + 1/200)] = 1.601
At α = 0.05, there is no evidence to conclude that the proportion of American executives who prefer the A380 is greater than that of European executives. (p-value = 0.0547.)
Template output: p̂1 = 0.2833, p̂2 = 0.2050, pooled p̂ = 0.2344; z = 1.6015; p-values at an α of 5%: H0: p1 − p2 = 0, 0.1093; H0: p1 − p2 ≥ 0, 0.9454; H0: p1 − p2 ≤ 0, 0.0546.
8-36.
Cleveland (1): n1 = 1,000, x1 = 75, p̂1 = .075    Chicago (2): n2 = 1,000, x2 = 72, p̂2 = .072
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
p̂ = (75 + 72)/2,000 = .0735
z = (p̂1 − p̂2)/√[p̂(1 − p̂)(1/n1 + 1/n2)] = 0.257
We cannot reject H0. p-value = 0.7971.
8-37.
(Use template: "testing difference in proportions.xls")
H0: pQ − pN = 0    H1: pQ − pN ≠ 0
Sample 1: n = 100, x = 18, p̂ = 0.1800    Sample 2: n = 100, x = 6, p̂ = 0.0600
Pooled p̂ = 0.1200; z = 2.6112; p-value for H0: p1 − p2 = 0 is 0.0090 (Reject at α = 5%).
Reject the null hypothesis: the new accounting method is more effective.
8-38.
(Use template: "testing difference in proportions.xls")
H0: pC − pD = 0    H1: pC − pD ≠ 0
Sample 1: n = 100, x = 32, p̂ = 0.3200    Sample 2: n = 100, x = 19, p̂ = 0.1900
Pooled p̂ = 0.2550; z = 2.1090; p-value for H0: p1 − p2 = 0 is 0.0349. At an α of 1%, do not reject.
Do not reject the null hypothesis: the proportions are not significantly different.
8-39.
Motorola (1): n1 = 120, x1 = 101, p̂1 = .842    Blaupunkt (2): n2 = 200, x2 = 110, p̂2 = .550
H0: p1 ≤ p2    H1: p1 > p2
p̂ = (101 + 110)/320 = .659
z = (.842 − .550)/√[(.659)(1 − .659)(1/120 + 1/200)] = 5.33
Strongly reject H0; Motorola's system is superior (the p-value is very small).
8-40.
Old method (1): n1 = 40, s1² = 1,288    New method (2): n2 = 15, s2² = 1,112
H0: σ1² ≤ σ2²    H1: σ1² > σ2²    (use α = .05)
F(39,14) = s1²/s2² = 1,288/1,112 = 1.158
The critical point at α = .05 is F(39,14) = 2.27 (using approximate df in the table). Do not reject H0. There is no evidence that the variance of the new production method is smaller.
Template output (F-Test for Equality of Variances): F = 1.158273, df1 = 39, df2 = 14; p-values at an α of 5%: H0: σ1² − σ2² = 0, 0.7977; H0: σ1² − σ2² ≥ 0, 0.6012; H0: σ1² − σ2² ≤ 0, 0.3988.
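The variance-ratio statistic itself is a one-liner; a Python sketch for the 8-40 figures (the p-value still comes from F tables or software; not part of the manual's templates):

```python
# Variance-ratio (F) statistic for Problem 8-40, with (n1 - 1, n2 - 1) df.
s1_sq, n1 = 1288, 40   # old production method
s2_sq, n2 = 1112, 15   # new production method
F, df1, df2 = s1_sq / s2_sq, n1 - 1, n2 - 1
print(round(F, 3), df1, df2)   # 1.158 39 14
```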
8-41.
Test the equal-variance assumption of Problem 8-27:
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
F = 1.1025 (template p-value 0.9186)
Do not reject H0. The variances are equal.
8-42.
“Yes” (1): n1 = 25, s1 = 2.5    “No” (2): n2 = 25, s2 = 1
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
Put the larger s² in the numerator and use 2α:
F(24,24) = s1²/s2² = (2.5)²/(1)² = 6.25
From the F table using α = .01, the critical point is F(24,24) = 2.66. Therefore, reject H0. The population variances are not equal at α = 2(.01) = 0.02.
Template output: F = 6.25, df1 = 24, df2 = 24; p-values at an α of 5%: H0: σ1² − σ2² = 0, 0.0000 (Reject); H0: σ1² − σ2² ≥ 0, 1.0000; H0: σ1² − σ2² ≤ 0, 0.0000 (Reject).
8-43.  n1 = 21, s1 = .09    n2 = 28, s2 = .122
F(27,20) = (.122)²/(.09)² = 1.838
At α = .10, we cannot reject H0, because the critical point for α = .05 from the table with df = (30, 20) is 2.04 and for df = (24, 20) it is 2.08. Since we did not reject H0 at α = .10, we would also not reject it at α = .02. Hence this particular C.I. contains the value 1.00.
8-44.
Before (1): n1 = 12, s1² = 16,390.545    After (2): n2 = 11, s2² = 86,845.764
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
F(10,11) = 86,845.764/16,390.545 = 5.298
The critical point from the table, using α = .01, is F(10,11) = 4.54. Therefore, reject H0. The population variances are probably not equal. p-value < .02 (double the α).
Template output: F = 5.298528, df1 = 10, df2 = 11; p-values at an α of 1%: H0: σ1² − σ2² = 0, 0.0109; H0: σ1² − σ2² ≥ 0, 0.9945; H0: σ1² − σ2² ≤ 0, 0.0055 (Reject).

8-45.  n1 = 25, s1 = 2.5    n2 = 25, s2 = 3.1
H0: σ1² = σ2²    H1: σ1² ≠ σ2²    (α = .02)
F(24,24) = (3.1)²/(2.5)² = 1.538
From the table: F.01(24,24) = 2.66. Do not reject H0. There is no evidence that the variances in the two waiting lines are unequal.
8-46.
nA = 25, sA² = 6.52    nB = 22, sB² = 3.47
H0: σA² = σB²    H1: σA² > σB²    (α = .01)
F(24,21) = 6.52/3.47 = 1.879
The critical point for α = .01 is F(24,21) = 2.80. Do not reject H0. There is no evidence that stock A is riskier than stock B.
Template output: F = 1.878963, df1 = 24, df2 = 21; p-values at an α of 1%: H0: σ1² − σ2² = 0, 0.1485; H0: σ1² − σ2² ≥ 0, 0.9258; H0: σ1² − σ2² ≤ 0, 0.0742.
8-47.
The assumptions we need are: independent random sampling from the populations in question, and normal population distributions. The normality assumption is not terribly crucial as long as no serious violations of this assumption exist. In time series data, the assumption of random sampling is often violated when the observations are dependent on each other through time. We must be careful.
8-48.  (Use template: "testing difference in means.xls"; need to use the t-test since the population std. dev. is unknown)
H0: μLeg − μKnee = 0    H1: μLeg − μKnee ≠ 0
Leg: n = 200, x̄ = 10,402, s = 8,500    Knee: n = 200, x̄ = 11,359, s = 9,100
F ratio for H0: population variances equal = 1.14616 (p-value 0.3367), so the pooled test is used. Pooled variance s²p = 7.8E+07; t = −1.0869, df = 398; p-value for H0: μ1 − μ2 = 0 is 0.2778. At an α of 5%, do not reject.
Do not reject the null hypothesis. The average costs of the two procedures are similar.
8-49.
99% C.I. for μLeg − μKnee: −957 ± 2,278.97 = [−3,235.97, 1,321.97]
The C.I. contains zero, as expected from the results of Problem 8-48.
8-50.
Σd = 51, d̄ = 4.636, sd = 7.593, n = 11
H0: μD ≤ 0    H1: μD > 0
t(10) = 4.636/(7.593/√11) = 2.025
Reject H0. Performance did improve after the sessions.
8-51.  For Problem 8-50, 95% C.I.: D̄ ± t.025(10) sd/√n = 4.636 ± 2.228(7.593/√11) = 4.636 ± 5.101 = [−0.465, 9.737]
Template output (95% C.I.): 4.636 ± 5.10105 = [−0.465, 9.73705]
8-52.  (Use template: "testing difference in proportions.xls")
H0: pNFL − pSCI = 0    H1: pNFL − pSCI ≠ 0
Sample 1: n = 200, x = 96, p̂ = 0.4800    Sample 2: n = 200, x = 52, p̂ = 0.2600
Pooled p̂ = 0.3700; z = 4.5567; p-values at an α of 5%: H0: p1 − p2 = 0, 0.0000 (Reject); H0: p1 − p2 ≥ 0, 1.0000; H0: p1 − p2 ≤ 0, 0.0000 (Reject).
Reject H0. There is evidence that NFL viewers watch more commercials than those viewing Survivor.

8-53.  99% C.I. for pNFL − pSCI (the difference between the proportions viewing commercials for NFL viewers vs. Survivor viewers):
0.2200 ± 0.1211 = [0.0989, 0.3411]
The C.I. does not contain zero, as expected.
8-54.  (Use template: "testing difference in means.xls"; need to use the t-test since the population std. dev. is unknown)
H0: μCR − μGuat = 0    H1: μCR − μGuat ≠ 0
Sample 1: n = 15, x̄ = 1,242, s = 50    Sample 2: n = 15, x̄ = 1,240, s = 50
F ratio for H0: population variances equal = 1 (p-value 1.0000), so the pooled test is used. Pooled variance s²p = 2,500; t = 0.1095, df = 28; p-value for H0: μ1 − μ2 = 0 is 0.9136. At an α of 5%, do not reject.
Do not reject the null hypothesis. The number of roses imported from both countries is about the same.
8-55.  n1 = 80, x1 = 60    n2 = 100, x2 = 65
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
p̂ = 125/180 = .6944
z = (p̂1 − p̂2 − 0)/√[p̂(1 − p̂)(1/n1 + 1/n2)] = (.75 − .65)/√[(.6944)(1 − .6944)(1/80 + 1/100)] = 1.447
Do not reject H0. There is no evidence that one movie will be more successful than the other (p-value = 0.1478).
Template output: p̂1 = 0.7500, p̂2 = 0.6500, pooled p̂ = 0.6944; z = 1.4473; p-value for H0: p1 − p2 = 0 is 0.1478.
8-56.  95% C.I. for the difference between the two population proportions:
(p̂1 − p̂2) ± 1.96 √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2] = 0.10 ± 1.96 √[(.75)(.25)/80 + (.65)(.35)/100] = [−0.0332, 0.2332]
Yes, 0 is in the C.I., as expected from the results of Problem 8-55.
8-57.
K: nK = 12, x̄K = 12.55, sK = .7342281    L: nL = 12, x̄L = 11.925, sL = .3078517
H0: μK − μL = 0    H1: μK − μL ≠ 0
t(22) = (12.55 − 11.925)/√{[11(.7342281)² + 11(.3078517)²]/22 × (1/12 + 1/12)} = 2.719
Reject H0. The critical points for t(22) at α = .02 are ±2.508; at α = .01 they are ±2.819. So .01 < p-value < .02. The L-boat is probably faster.
Template output: pooled variance s²p = 0.31693; t = 2.7194, df = 22; p-value for H0: μ1 − μ2 = 0 is 0.0125 (Reject at α = 5%).
8-58.
Do Problem 8-57 with the data treated as paired. The differences K − L (n = 12) have D̄ = .625 and sD = .7723929.
t(11) = (.625 − 0)/(.7723929/√12) = 2.803
2.718 < 2.803 < 3.106 (between the critical points of t(11) for α = .02 and α = .01). Hence .01 < p-value < .02, as before in Problem 8-57 (the pairing did not help much here; we reach the same conclusion).
Template output: n = 12, D̄ = 0.625, sD = 0.77239; t = 2.8031, df = 11; p-values at an α of 5%: H0: μ1 − μ2 = 0, 0.0172 (Reject); H0: μ1 − μ2 ≥ 0, 0.9914; H0: μ1 − μ2 ≤ 0, 0.0086 (Reject).
8-59.
(Use template: "testing difference in proportions.xls")
H0: pWest − pSouth = 0    H1: pWest − pSouth ≠ 0
Sample 1: n = 1,000, x = 49.5, p̂ = 0.0495    Sample 2: n = 1,000, x = 67.9, p̂ = 0.0679
Pooled p̂ = 0.0587; z = −1.7503; p-value for H0: p1 − p2 = 0 is 0.0801. At an α of 5%, do not reject.
Do not reject the null hypothesis: the delinquency rates are the same.
8-60.
IIT (1): n1 = 100, p̂1 = 0.94    Competitor (2): n2 = 125, p̂2 = 0.92
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
p̂ = (94 + 115)/225 = .9288
z = .02/√[(.9288)(1 − .9288)(1/100 + 1/125)] = 0.58
There is no evidence that one program is more successful than the other.
8-61.
Design 1: n1 = 15, x̄1 = 2.17333, s1 = .3750555    Design 2: n2 = 13, x̄2 = 2.5153846, s2 = .3508232
H0: μ2 − μ1 = 0    H1: μ2 − μ1 ≠ 0
t(26) = (2.5153846 − 2.1733333)/√{[14(.3750555)² + 12(.3508232)²]/26 × (1/15 + 1/13)} = 2.479
p-value = .02. Reject H0. Design 1 is probably faster.
8-62.
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
F(14,12) = s1²/s2² = (.3750555)²/(.3508232)² = 1.143
Do not reject H0 at α = 0.10 (1.143 < 2.62; it is also less than 2.10, so the p-value > 0.20). The solution of Problem 8-61 is valid with respect to the equal-variance requirement.
8-63.  A = After: nA = 16, x̄A = 91.75, sA = 5.0265959    B = Before: nB = 15, x̄B = 84.7333, sB = 5.3514573
H0: μA − μB ≤ 5    H1: μA − μB > 5
t(29) = (91.75 − 84.733 − 5)/√{[15(5.0265959)² + 14(5.3514573)²]/29 × (1/16 + 1/15)} = 1.08
Do not reject H0. There is no evidence that the advertising is effective.
8-64.
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
F(14,15) = (5.3514573)²/(5.0265959)² = 1.133
Do not reject H0 at α = 0.10. There is no evidence that the population variances are not equal.
Template output: F = 1.133434, df1 = 14, df2 = 15; p-values at an α of 10%: H0: σ1² − σ2² = 0, 0.8100; H0: σ1² − σ2² ≥ 0, 0.5950; H0: σ1² − σ2² ≤ 0, 0.4050.
8-65.
From Problem 8-48: sL = 8,500, sK = 9,100
H0: σL² = σK²    H1: σL² ≠ σK²
F = 1.146, p-value = 0.34 (template F ratio 1.14616, p-value 0.3367). Do not reject the null hypothesis of equal variances.
8-66.
H0: K 2 = L 2
H1: K 2 L 2
F (11,11) = (.7342281)2/(.3078517)2 = 5.688 Critical point for = 0.02 is about 4.5. Therefore, reject H0. Thus the analysis in Problem 8-57 is not valid. We need to use the other test. The other test also gives t = 2.719 but the df are obtained using Equation (8-6): ( s1 / n1 s2 / n2 ) 2 2
df =
2
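Numerically, for the K/L figures (a Python sketch; not part of the manual's templates):

```python
# Equation (8-6) Satterthwaite df for Problem 8-66, from the sample figures.
s1, n1 = 0.7342281, 12   # K-boat
s2, n2 = 0.3078517, 12   # L-boat
v1, v2 = s1**2 / n1, s2**2 / n2
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(int(df))   # 14 (rounded downward)
```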
df ≈ 14 (rounded downward).
t.02(14) = 2.624 < 2.719 < 2.977 = t.01(14), hence 0.01 < p-value < 0.02. Reject H0.
8-67.
Differences A − B: n = 16, D̄ = −2.375, sD = 9.7425185
H0: μD = 0    H1: μD ≠ 0
t(15) = (−2.375 − 0)/(9.7425185/√16) = −0.9751
Do not reject H0. There is no evidence that one package is better liked than the other.
Template output: n = 16, D̄ = −2.375, sD = 9.74252; t = −0.9751, df = 15; p-values at an α of 5%: H0: μ1 − μ2 = 0, 0.3450; H0: μ1 − μ2 ≥ 0, 0.1725; H0: μ1 − μ2 ≤ 0, 0.8275.
8-68.
Supplier A: nA = 200, xA = 12    Supplier B: nB = 250, xB = 38
H0: pA − pB = 0    H1: pA − pB ≠ 0
p̂ = (12 + 38)/450 = .1111
z = (p̂A − p̂B − 0)/√[p̂(1 − p̂)(1/nA + 1/nB)] = (.06 − .152)/√[(.1111)(.8889)(1/200 + 1/250)] = −3.086
Reject H0. p-value = .002. Supplier A is probably more reliable, as its proportion of defective components is lower.
8-69.
95% C.I. for the difference in the proportion of defective items for the two suppliers:
(p̂B − p̂A) ± 1.96 √[p̂A(1 − p̂A)/nA + p̂B(1 − p̂B)/nB] = .092 ± 1.96(.0282415) = [0.0366, 0.1474]
Template output (95% C.I.): 0.0920 ± 0.0554 = [0.0366, 0.1474]
8-70.
90% C.I. for the difference in average occupancy rate at the Westin Plaza Hotel before and after the advertising:
(x̄A − x̄B) ± 1.699 √{[15(5.0265959)² + 14(5.3514573)²]/29 × (1/16 + 1/15)} = 7.016667 ± 3.1666375 = [3.85, 10.18] percent occupancy.
8-71.
(Use template: "testing difference in means.xls"; need to use the t-test since the population std. dev. is unknown)
H0: μB − μO = 0    H1: μB − μO ≠ 0
Sample 1: n = 25, x̄ = 60, s = 14    Sample 2: n = 20, x̄ = 65, s = 8
F ratio for H0: population variances equal = 3.0625 (p-value 0.0155): the equal-variance assumption is violated. Assuming unequal variances: t = −1.5048, df = 39; p-value for H0: μ1 − μ2 = 0 is 0.1404. At an α of 5%, do not reject.
Do not reject the null hypothesis. The prices of the two virtual dolls are about the same.
8-72.
(Use template: “testing difference in means.xls”. The t-test is needed since the population std. devs. are unknown.)
H0: μA – μB = 0    H1: μA – μB ≠ 0
Evidence:               Sample 1   Sample 2
  Size (n)                 74         65
  Mean (x̄)                 28         22
  Std. Deviation (s)        6          6
Test of H0: population variances equal: F ratio = 1, p-value = 1.0000, so equal variances may be assumed.
Pooled variance s²p = 36; test statistic t = 5.8825, df = 137.
p-values: H0: μ1 – μ2 = 0: 0.0000 (Reject at α = 5%);  H0: μ1 – μ2 ≥ 0: 1.0000;  H0: μ1 – μ2 ≤ 0: 0.0000 (Reject)
Reject the null hypothesis: the average returns are not the same.
8-73.
(Use template: “testing difference in means.xls”; sheet: “t-test from stats”)
H0: μ2 – μ1 = 0    H1: μ2 – μ1 ≠ 0
Evidence:               Sample 1   Sample 2
  Size (n)                 74         65
  Mean (x̄)                 50         14
  Std. Deviation (s)       20          8
Test of H0: population variances equal: F ratio = 6.25, p-value = 0.0000, so the assumption of equal variances is violated.
Assuming unequal population variances: test statistic t = 14.2414, df = 98. p-value = 0.0000; at α = 5%, reject H0.
95% confidence interval for the difference in population means: 36 ± 5.01643 = [30.9836, 41.0164]
The 95% CI: [$30.98M, $41.02M]
8-74.
a.  n1 = 2500    x̄1 = 39    n2 = 2500    x̄2 = 35    s1 = s2 = 2    α = .05
    H0: μ1 = μ2    H1: μ1 ≠ μ2
    z = (39 – 35)/√(2²/2500 + 2²/2500) = 70.711
    Reject H0. The average workweek has shortened.
b.  95% C.I.: (39 – 35) ± 1.96 √(2²/2500 + 2²/2500) = 4 ± .1109 = [3.8891, 4.1109]
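A quick numerical check of parts (a) and (b):

```python
import math

# Problem 8-74: large samples, known (equal) standard deviations
n1 = n2 = 2500
x1, x2 = 39, 35
s = 2

se = math.sqrt(s**2 / n1 + s**2 / n2)
z = (x1 - x2) / se                      # 70.711: reject H0
lo, hi = (x1 - x2) - 1.96 * se, (x1 - x2) + 1.96 * se
print(round(z, 3), round(lo, 4), round(hi, 4))   # 70.711 3.8891 4.1109
```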
8-75.
(Use template: “testing difference in means.xls”; sheet: “t-test from stats”)
H0: μ2 – μ1 = 0    H1: μ2 – μ1 ≠ 0
Evidence:               Sample 1   Sample 2
  Size (n)                 25         25
  Mean (x̄)                1.7        1.5
  Std. Deviation (s)       0.4        0.7
Test of H0: population variances equal: F ratio = 3.0625, p-value = 0.0081, so the assumption of equal variances is violated.
Assuming unequal population variances: test statistic t = 1.24035, df = 38. p-value = 0.2225.
At α = 5%, do not reject the null hypothesis. The mean catches are about the same.
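The template's unequal-variances statistic and its degrees of freedom can be reproduced from the summary statistics; a minimal sketch:

```python
import math

# Problem 8-75 summary statistics (population variances unequal)
n1, x1, s1 = 25, 1.7, 0.4
n2, x2, s2 = 25, 1.5, 0.7

v1, v2 = s1**2 / n1, s2**2 / n2
t = (x1 - x2) / math.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom, rounded downward as in the text
df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(round(t, 5), int(df))   # 1.24035 with df = 38
```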
8-76.
Yes. Lower-income households are less likely to have internet access (p-value = 0.0038).
Comparing two population proportions:
                  Sample 1   Sample 2
  Size (n)           500        500
  Successes (x)      350        310
  Proportion (p̂)    0.7000     0.6200
Hypothesized difference zero; pooled p̂ = 0.6600; test statistic z = 2.6702.
p-values: H0: p1 – p2 = 0: 0.0076 (Reject at α = 5%);  H0: p1 – p2 ≥ 0: 0.9962;  H0: p1 – p2 ≤ 0: 0.0038 (Reject)
8-77.	The 95% C.I. contains 0, which supports the results from 8-75. Confidence interval for the difference in population means: 0.2 ± 0.32642 = [–0.1264, 0.5264].
8-78
The ratio of the variances is 3.18. The degrees of freedom for both samples are 10 – 1 = 9. Using the F table with 9 degrees of freedom in both the numerator and the denominator, we find a value of 3.18 at α = 0.05. Therefore, there is a 5% chance of observing a ratio this large when the variances are equal.
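The variance-ratio test can be sketched in a few lines; the sample variances below are hypothetical, chosen only so that their ratio matches the 3.18 quoted in the text:

```python
# Problem 8-78: the statistic for comparing two sample variances is
# F = s1^2 / s2^2 with (n1 - 1, n2 - 1) degrees of freedom.
s1_sq, s2_sq = 79.5, 25.0     # hypothetical sample variances: 79.5/25 = 3.18
n1 = n2 = 10

F = s1_sq / s2_sq
df1, df2 = n1 - 1, n2 - 1
F_crit_05 = 3.18              # F.05(9, 9), value quoted from the text's F table
reject_at_05 = F >= F_crit_05
print(round(F, 2), df1, df2, reject_at_05)   # 3.18 9 9 True
```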
8-79
(Use template: “testing difference in means.xls”; sheet:”t-test from data”) 1. Assuming equal variances: H0: μ2 – μ1 = 0 H1: μ2 – μ1 ≠ 0
t-Test for Difference in Population Means (template data, 20 observations: 2570, 2480, 2870, 2975, 2055, 2940, 2850, 2475, 2660, 1940, 2380, 2590, 2550, 2485, 2585, 2710, 2100, 2655, 1950, 2115; column assignment as in the template)
Evidence:               Sample 1 (Co.1)   Sample 2 (Co.2)
  Size (n)                  11                 9
  Mean (x̄)              2623.18           2342.22
  Std. Deviation (s)     174.087            393.55
Assuming equal population variances: pooled variance s²p = 85673.3; test statistic t = 2.1356, df = 18.
p-values: H0: μ1 – μ2 = 0: 0.0467 (Reject at α = 5%);  H0: μ1 – μ2 ≥ 0: 0.9766;  H0: μ1 – μ2 ≤ 0: 0.0234 (Reject)
At the 0.05 level of significance, reject the null hypothesis that the charges are the same.
2. Test the assumption of equal variances:
H0: σ²1 = σ²2    H1: σ²1 ≠ σ²2
F ratio = 5.11054, p-value = 0.0193. Reject the null hypothesis: the variances are not equal. (Assumption: populations normal.)
3. Assuming unequal variances: H0: μ2 – μ1 = 0    H1: μ2 – μ1 ≠ 0
Test statistic t = 1.98846, df = 10.
p-values: H0: μ1 – μ2 = 0: 0.0748;  H0: μ1 – μ2 ≥ 0: 0.9626;  H0: μ1 – μ2 ≤ 0: 0.0374 (Reject)
At α = 5%, do not reject the null hypothesis: the charges are not different.

Case 10: Tiresome Tires II
1) Do not reject the null hypothesis at 5%:
Assumptions: populations normal.
Evidence:               Sample 1   Sample 2
  Size (n)                 40         40
  Mean (x̄)              2742.5     2729.35
  Std. Deviation (s)     32.8883    38.3189
Test of H0: population variances equal: F ratio = 1.16512, p-value = 0.6356, so equal variances may be assumed.
Pooled variance s²p = 1274.99; test statistic t = 1.6470, df = 78.
p-value for H0: μ1 – μ2 ≤ 0 is 0.0518, so do not reject at α = 5%.
95% confidence interval for the difference in population means: 13.15 ± 15.8956
2) Increasing α would decrease β. Increasing α to any value above 5.18% will cause the null hypothesis to be rejected. 3) Paired difference test: reject the null hypothesis (p-value = 0.0471).
Paired Difference Test (template data: Old Meth = Sample 1, New Meth = Sample 2; difference defined as Sample 1 – Sample 2)
Size n = 40    Average Difference D̄ = 13.15    Stdev. of Difference sD = 48.4877
Assumption: populations normal.    Test statistic t = 1.7152, df = 39.
p-value for H0: μ1 – μ2 ≤ 0 is 0.0471: Reject at α = 5%.
4) Reducing the variance of the new process will decrease the chances of a Type I error.
Chapter 09 - Analysis of Variance
CHAPTER 9 ANALYSIS OF VARIANCE
9-1.	H0: μ1 = μ2 = μ3 = μ4    H1: not all four μi are equal
If H0 is false, the possibilities are: all 4 means different; 2 equal, 2 different; 3 equal, 1 different; 2 equal, the other 2 equal but different from the first 2.
9-2.
ANOVA assumptions: normal populations with equal variance. Independent random sampling from the r populations.
9-3.
Series of paired t-test are dependent on each other. There is no control over the probability of a Type I error for the joint series of tests.
9-4.
r = 5    n1 = n2 = . . . = n5 = 21    n = 105    df’s of F are 4 and 100.
Computed F = 3.6. The p-value is close to 0.01. Reject H0. There is evidence that not all 5 plants have equal average output.
F-critical values (1-tail): 10%: 2.0019;  5%: 2.4626;  1%: 3.5127;  0.50%: 3.9634
9-5.
r = 4    n1 = 52    n2 = 38    n3 = 43    n4 = 47
Computed F = 12.53. Reject H0. The average price per lot is not equal in all 4 cities. Feel very strongly about rejecting the null hypothesis, as the critical point of F(3,176) for α = .01 is approximately 3.8.
F-critical values (1-tail): 10%: 2.1152;  5%: 2.6559;  1%: 3.8948;  0.50%: 4.4264
9-6.
Originally, “treatments” referred to the different types of agricultural experiments being performed on a crop; today the term is used interchangeably to refer to the different populations in the study. Errors are the differences between the data points and their sample means.
9-7.
Because the sum of all the deviations from a mean is equal to 0.
9-8.	Total deviation = xij – x̄ = (x̄i – x̄) + (xij – x̄i) = treatment deviation + error deviation (x̄ is the grand mean).
9-9.	The sum of squares principle says that the sum of the squared total deviations of all the data points is equal to the sum of the squared treatment deviations plus the sum of all squared error deviations in the data.
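The identity in 9-9 can be verified numerically; a minimal sketch on a small hypothetical data set (the three groups below are made up for illustration):

```python
# Numerical check of the sum-of-squares principle SST = SSTR + SSE
groups = [[4, 6, 5, 7], [10, 12, 11], [1, 3, 2]]   # hypothetical r = 3 groups

all_obs = [x for g in groups for x in g]
grand = sum(all_obs) / len(all_obs)                 # grand mean

sst = sum((x - grand)**2 for x in all_obs)          # total squared deviations
sstr = sum(len(g) * (sum(g)/len(g) - grand)**2 for g in groups)   # treatment
sse = sum((x - sum(g)/len(g))**2 for g in groups for x in g)      # error

print(round(sst, 4), round(sstr, 4), round(sse, 4))  # 132.9 = 123.9 + 9.0
```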
9-10.
An error is any deviation from a sample mean that is not explained by differences among populations. An error may be due to a host of factors not studied in the experiment.
9-11.
Both MSTR and MSE are sample statistics subject to natural variation about their own means. (If x̄ differs from the hypothesized mean, we cannot immediately reject H0 in a single-sample case either.)
9-12.
The main principle of ANOVA is that if the r population means are not all equal then it is likely that the variation of the data points about their sample means will be small compared to the variation of the sample means about the grand mean.
9-13.
Distances among populations means manifest themselves in treatment deviations that are large relative to error deviations. When these deviations are squared, added, and then divided by df’s, they give two variances. When the treatment variance is (significantly) greater than the error variance, population mean differences are likely to exist.
9-14.
a) degrees of freedom for Factor: 4 – 1 = 3 b) degrees of freedom for Error: 80 – 4 = 76 c) degrees of freedom for Total: 80 – 1 = 79
9-15
SST = SSTR + SSE, but the corresponding mean squares do not add up this way. A counterexample: let n = 21, r = 6, SST = 100, SSTR = 85, SSE = 15.
Then SST = SSTR + SSE = 85 + 15 = 100. But
MST = SST/(n – 1) = 100/20 = 5, while MSTR + MSE = SSTR/(r – 1) + SSE/(n – r) = 85/5 + 15/15 = 17 + 1 = 18 ≠ 5.
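The counterexample can be checked directly:

```python
# Counterexample from 9-15: the sums of squares add, the mean squares do not.
n, r = 21, 6
sst, sstr, sse = 100, 85, 15

mst = sst / (n - 1)                  # 100/20 = 5
mstr = sstr / (r - 1)                # 85/5  = 17
mse = sse / (n - r)                  # 15/15 = 1

assert sst == sstr + sse             # the sums of squares do add
print(mst, mstr + mse)               # 5.0 vs 18.0: the mean squares do not
```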
9-16.
When the null hypothesis of ANOVA is false, the ratio MSTR/MSE is not the ratio of two independent, unbiased estimators of the common population variance 2 , hence this ratio does not follow an F distribution.
9-17.
For each observation xij, we know that (total) = (treatment) + (error):
    xij – x̄ = (x̄i – x̄) + (xij – x̄i)
Squaring both sides of the equation:
    (xij – x̄)² = (x̄i – x̄)² + 2(x̄i – x̄)(xij – x̄i) + (xij – x̄i)²
Now sum this over all observations (all treatments i = 1, . . . , r; and within treatment i, all observations j = 1, . . . , ni):
    Σi Σj (xij – x̄)² = Σi Σj (x̄i – x̄)² + 2 Σi Σj (x̄i – x̄)(xij – x̄i) + Σi Σj (xij – x̄i)²
Notice that the first sum on the R.H.S. equals Σi ni(x̄i – x̄)², since for each i the summand does not vary over the ni values of j. Similarly, the second sum is 2 Σi [(x̄i – x̄) Σj (xij – x̄i)]. But for each fixed i, Σj (xij – x̄i) = 0, since this is just the sum of all deviations from the mean within treatment i. Thus the whole second sum on the R.H.S. above is 0, and the equation becomes
    Σi Σj (xij – x̄)² = Σi ni(x̄i – x̄)² + Σi Σj (xij – x̄i)²
which is precisely Equation (9-12).
9-18.
(From Minitab):
Source      df   SS       MS       F
Treatment    2   381127   190563   20.71
Error       27   248460     9202
Total       29   629587
The critical point for F(2,27) at α = 0.01 is 5.49. Therefore, reject H0. The average ranges of the 3 prototype planes are probably not equal.
9-19.	(Template: Anova.xls, sheet: 1-Way)
ANOVA Table (α = 5%):
Source      SS        df   MS       F        Fcritical   p-value
Between     187.696    3   62.565   11.494   2.9467      0.0000  Reject
Within      152.413   28    5.4433
Total       340.108   31
MINITAB output, One-way ANOVA: UK, Mex, UAE, Oman
Source   DF   SS       MS      F       P
Factor    3   187.70   62.57   11.49   0.000
Error    28   152.41    5.44
Total    31   340.11
S = 2.333    R-Sq = 55.19%    R-Sq(adj) = 50.39%
Level   N   Mean     StDev
UK      8   60.160   2.535
Mex     8   58.390   2.405
UAE     8   55.190   2.224
Oman    8   54.124   2.149
Pooled StDev = 2.333  (individual 95% CIs for the means, based on pooled StDev, omitted)
Critical point F(3,28) for α = 0.05 is 2.9467. Therefore we reject H0. There is evidence of differences in the average price per barrel of oil from the four sources. The Rotterdam oil market may not be efficient. The conclusion is valid only for Rotterdam, and only for Arabian Light. We need to assume independent random samples from these populations, and normal populations with equal population variance. Observations are time-dependent (days during February), so the assumptions could be violated. This is a limitation of the study. Another limitation is that February may be different from other months.
9-20.
An F(.05,2,101) = 3.61 result, relative to a critical value of 3.08637, indicates a significant difference in their perceptions on the roles played by African American models in commercials.
9-21.
(From Minitab):
Source      df   SS        MS        F
Treatment    2    91.0426   45.5213  12.31
Error       38   140.529     3.69812
Total       40   231.571
p-value = .0001. The critical point for F(2,38) at α = .05 is 3.245. Therefore, reject H0. There is a difference in the length of time it takes to make a decision.
ANOVA Table (α = 5%):
Source      SS        df   MS        F         Fcritical   p-value
Between      91.0426   2   45.5213   12.3093   3.2448      0.0001  Reject
Within      140.529   38    3.6981
Total       231.571   40
9-22.
An F(.05,2,55) = 52.787 result, relative to a critical value of 3.165, indicates a significant difference in the monetary-economic reaction to the three inflation fighting policies.
9-23.
The test results exceed the critical value of F(.01,3,236) = 3.866. The results indicate that the performances of the four different portfolios are significantly different.
9-24.
95% C.I.s for the mean responses:
Martinique:       x̄2 ± 1.96 √(MSE/n2) = 75 ± 1.96 √(504.4/40) = [68.04, 81.96]
Eleuthera:        73 ± 1.96 √(MSE/n3) = [66.04, 79.96]
Paradise Island:  91 ± 1.96 √(MSE/n4) = [84.04, 97.96]
St. Lucia:        85 ± 1.96 √(MSE/n5) = [78.04, 91.96]
9-25.
Where do differences exist in the circle-square-triangle populations from Table 9-1, using Tukey? From the text: MSE = 2.125.
triangles:  n1 = 4    x̄1 = 6
squares:    n2 = 4    x̄2 = 11.5
circles:    n3 = 3    x̄3 = 2
For α = .01, qα(r, n – r) = q.01(3,8) = 5.63. The smallest ni is 3:
T = q √(MSE/3) = 5.63 √(2.125/3) = 4.738
|x̄1 – x̄2| = 5.5 > 4.738    sig.
|x̄2 – x̄3| = 9.5 > 4.738    sig.
|x̄1 – x̄3| = 4.0 < 4.738    n.s.
Thus: “1 = 3”; “2 > 1”; “2 > 3”
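The Tukey comparison above can be scripted; this sketch uses the q value and MSE quoted from the text:

```python
import math

# Tukey criterion for Problem 9-25: q.01(3, 8) = 5.63 (studentized-range
# table), MSE = 2.125, smallest group size 3.
q, mse, n_min = 5.63, 2.125, 3
T = q * math.sqrt(mse / n_min)       # 4.738

means = {'triangles': 6, 'squares': 11.5, 'circles': 2}
pairs = [('triangles', 'squares'), ('squares', 'circles'),
         ('triangles', 'circles')]
for a, b in pairs:
    diff = abs(means[a] - means[b])
    print(a, b, diff, 'sig.' if diff > T else 'n.s.')
```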
9-26.
Find which prototype planes are different in Problem 9-18: MSE = 9,202, ni = 10 for all i.
x̄A = 4,407    x̄B = 4,230    x̄C = 4,135
For α = .05, q(3,27) ≈ 3.51.    T = 3.51 √(9,202/10) = 106.475
|x̄A – x̄B| = 177 > 106.475    sig.
|x̄B – x̄C| =  95 < 106.475    n.s.
|x̄A – x̄C| = 272 > 106.475    sig.
Prototype A is shown to have higher average range than both B and C. Prototypes B and C show no significant difference in average range (all conclusions at α = 0.05).
9-27.
Since H0 was rejected in Problem 9-19, there are significant differences. T = q.05(4,28) √(5.4433/8) = 4.04 √(5.4433/8) = 3.332
|UK – MEX|   = |60.16 – 58.39|   = 1.77      n.s.
|UK – UAE|   = |60.16 – 55.19|   = 4.97      sig.
|UK – OMAN|  = |60.16 – 54.1238| = 6.0362    sig.
|MEX – UAE|  = |58.39 – 55.19|   = 3.2       n.s.
|MEX – OMAN| = |58.39 – 54.1238| = 4.2662    sig.
|UAE – OMAN| = |55.19 – 54.1238| = 1.0662    n.s.
The differences exceeding T = 3.332 (UK–UAE, UK–Oman, Mex–Oman) are significant, in agreement with the template’s Tukey output.
9-28.
(Question has no relevance to 9-20)
9-29.
Degrees of freedom for Factor: 3-1 = 2 Degrees of freedom for Error: 157 – 3 = 154 Degrees of freedom for Total: 157 – 1 = 156 The overall F test indicates that there is a difference in the groups’ reaction to pricing tactics. The subsequent information also indicates that there is a significant difference between each of the groups’ reactions.
9-30.
a) Total sample size = 275 b) The critical value for F(.05, 2, 272) is 3.029; therefore the overall ANOVA test is very significant. c) Monopoly prices are significantly different than limited competition and strong competition.
9-31.
We cannot extend the results to planes built after the analysis. We used fixed effects here, not random effects. The 3 prototypes were not randomly chosen from a population of levels as would be required for the random effects model.
9-32.
A randomized complete block design is a design with restricted randomization. Each block of experimental units is assigned to treatments with randomization of treatments within the block.
9-33.
Fly all 3 planes on the same route every time. The route (flown by the 3 planes) is the block.
9-34.
Look at the residuals. If the spread of the residuals is not equal, we probably have unequal σ², i.e., the assumption of equal variances is violated. A histogram of the residuals will reveal normality violations.
9-35.
Otherwise you are not randomly sampling from a population of treatments, and inference is not valid for the entire “population.”
9-36.
No. Rotterdam (and Arabian Light) was not randomly chosen.
9-37.
If the locations and the artists are chosen randomly, we have a random effects model.
9-38.
1. Testing for possible interactions among factor levels. 2. Efficiency.
9-39.
Limitations and problems: (1) We don’t know the overall significance level of the 3 tests; (2) If we have 1 observation per cell then there are 0 degrees of freedom for error. Also, for a fixed sample size there is a reduction of the df for error.
9-40.
1. As more factors are included, the df for error decreases. 2. As more factors are included, we lose control of α, and the probability of at least one Type I error increases.
9-41.
Since there are interactions, there are differences in emotions averaged over all levels of advertisements.
9-42.
At α = 0.05: Location: F = 50.6, significant. Job type: F = 50.212, significant. Interaction: F = 2.14, n.s.
ANOVA Table (α = 5%):
Source        SS         df   MS        F         Fcritical   p-value
Location      2520.988    2   1260.49   50.645    3.1239      0.0000  Reject
Job Type      2499.432    2   1249.72   50.212    3.1239      0.0000  Reject
Interaction    212.716    4     53.179   2.1367   2.4989      0.0850
Error         1792       72     24.8889
Total         7025.136   80
9-43.	Design (50 observations per cell):
            ABC   CBS   NBC
Morning      50    50    50
Evening      50    50    50
Late Night   50    50    50
Source        SS     df    MS     F
Network        145     2   72.5   5.16
Newstime       160     2   80     5.69
Interaction    240     4   60     4.27
Error         6200   441   14.06
Total         6745   449
From the table: F.01(4,400) = 3.36 and F.01(2,400) = 4.66. Therefore, all are significant at α = 0.01. There are interactions. There are Network main effects averaged over Newstime levels. There are Newstime main effects averaged over Network levels.
9-44.	a. Levels of task difficulty: a – 1 = 1; therefore a = 2
	b. Levels of effort: b – 1 = 1; therefore b = 2
	c. There are no task-difficulty main effects because the p-value = 0.5357
	d. There are effort main effects because the p-value < 0.0001
	e. There are no significant interactions, as the p-value = 0.1649.
9-45.
a. Explained is “Treatment”: Treat = Factor A + Factor B + (AB)
b. Levels of exercise price: a – 1 = 2; therefore a = 3
c. Levels of time of expiration: b – 1 = 1; therefore b = 2
d. ab(n – 1) = 144, a = 3, b = 2; therefore n – 1 = 24, n = 25, N = 25 · 6 = 150
e. n = 25
f. There are no exercise-price main effects (F = 0.42 < 1).
g. There are time-of-expiration main effects at α = 0.05 but not at α = 0.01, because F(1,144) = 4.845; from the F table, for df’s = 1, 150, the critical point for α = 0.05 is 3.91 and for α = 0.01 it is 6.81.
h. There are no interactions: F = .193 < 1
i. There is some evidence for time-of-expiration main effects. There is no evidence for exercise-price main effects or interaction effects.
j. For time-of-expiration main effects, .01 < p-value < .05. For the other two tests, the p-values are very high.
k. We could use a t-test for time-of-expiration effects, since t²(144) = F(1,144).
9-46.
Since there are interactions but neither of the main factors have significant F-tests, a likely conclusion is that the two factors work in opposite directions, i.e., inverse to each other.
9-47.
Advantages: reduced experimental errors (the effects of extraneous factors) and greater economy of sample sizes.
9-48.
Use blocking by firm, to reduce the error contributions arising from differences between firms.
9-49.
Could use a randomized blocking design: 4 observations, UK, Mexico, UAE, Oman at 4 locations and 4 different dates.
9-50.
A good blocking variable would be size of firm in terms of total assets or total sales, etc.
9-51.
Yes. Have people of the same occupation/age/demographics use sweaters of the 3 kinds under study. Each group of 3 people are a block.
9-52.
As stated in 9-23, a good blocking variable would be some measure of diversity in the portfolio.
9-53.
We could group the executives into blocks according to some choice of common characteristics such as age, sex, years employed at current firm, etc. The different blocks for the chosen attribute would then form a third variable beyond Location and Type to use in a 3-way ANOVA.
9-54.
We must assume no block-factor interactions.
9-55.
SSTR = 3,233    SSE = 12,386    n = 100 blocks
df error = (n – 1)(r – 1) = 99(2) = 198    df treatment = r – 1 = 2
F = MSTR/MSE = (3,233/2)/(12,386/198) = 25.84
Reject H0. The p-value is very small. There are differences among the 3 sweeteners. Should be very confident of the results. Blocking reduces experimental error here, as people of the same weight/age/sex will tend to behave homogeneously with respect to losing weight.
9-56.
n = 70    r = 4    SSTR = 9,875    SSBL = 1,445    SST = 22,364
SSE = 22,364 – 1,445 – 9,875 = 11,044
MSE = 11,044/[(69)(3)] = 53.35    MSTR = 9,875/3 = 3,291.67
F(3,207) = MSTR/MSE = 61.7. Reject H0. The p-value is very small. Not all of the four methods are equally effective.
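The randomized-block bookkeeping in 9-56 can be verified directly:

```python
# Randomized block ANOVA arithmetic for Problem 9-56
n, r = 70, 4
sstr, ssbl, sst = 9875, 1445, 22364

sse = sst - ssbl - sstr              # 11,044
mse = sse / ((n - 1) * (r - 1))      # 11,044/207 = 53.35
mstr = sstr / (r - 1)                # 3,291.67
F = mstr / mse                       # about 61.7: reject H0
print(sse, round(mse, 2), round(mstr, 2), round(F, 1))
```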
9-57.
SSTR = 7,102    SSE = 10,511    r = 8    ni = 20 for all i
MSTR = SSTR/(r – 1) = 7,102/7 = 1,014.57    MSE = SSE/(n – r) = 10,511/(160 – 8) = 69.15
F(7,152) = 14.67 > 2.76 (the critical point for α = 0.01). Therefore, reject H0. Not all tapes are equally appealing. The p-value is very small.
9-58.
n1 = 32    n2 = 30    n3 = 38    n4 = 41    n = 141
MSTR = SSTR/(r – 1) = 4,537/3 = 1,512.33
F(3,137) = MSTR/MSE = 1,512.33/412 = 3.67, which lies between the critical points 2.67 (at α = 0.05) and 3.92 (at α = 0.01).
We can reject H0 at α = 0.05. There is some evidence that the four names are not all equally well liked.
9-59.
Software packages: 3    Computers: 4
SS software = 77,645    SS computer = 54,521    SS interaction = 88,699    SSE = 434,557    n = 60 per cell
Source        SS        df    MS           F
Software       77,645    2    38,822.5     63.25
Computer       54,521    3    18,173.667   29.60
Interaction    88,699    6    14,783.167   24.09
Error         434,557   708      613.78
Total         655,422   719
Both main effects and the interactions are highly significant.
9-60.	Treatment df = r – 1 = 2    Block df = 74    Total df = 224
Total sample size was 225. Error df = (n – 1)(r – 1) = (74)(2) = 148.
The critical value of F(.05, 2, 148) = 3.0572, which is less than F = 13.65. The results are significant.
9-61.
Source        SS        df    MS          F
Pet            22,245    3     7,415      1.93
Location       34,551    3    11,517      2.99
Interaction    31,778    9     3,530.89   0.92
Error         554,398  144     3,849.99
Total         642,972  159
There are no interactions. There are no pet main effects. Since 2.68 (α = 0.05) < 2.99 < 3.92 (α = 0.01), there are location main effects at α = 0.05 but not at α = 0.01.
9-62.	F-ratio = 4.5471, p-value = .0138 (using a computer). At α = 0.05, only groups 1 and 3 are significantly different from each other: the Drug group is significantly different from the No-Treatment group.
ANOVA Table (α = 5%):
Source      SS        df   MS        F        Fcritical   p-value
Between      3203.12   2   1601.56   4.5471   3.1239      0.0138  Reject
Within      25359.6   72    352.217
Total       28562.7   74
95% confidence intervals of group means: Drug: 24.16 ± 7.4824; Placebo: 27.8 ± 7.4824; No-Treatment: 39.48 ± 7.4824
Tukey test (q0 = 3.41, T = 12.7994): only the Drug vs. No-Treatment difference is significant.
9-63.
a. Blocking (repeated measures) is more efficient as every person is his/her own control. Reductions in errors. Limitations? Maybe carryover effects from trial to trial.
b. SSTR = 44,572    SSE = 112,672    r = 3    n = 30
   MSTR = 44,572/2 = 22,286    MSE = 112,672/[(29)(2)] = 1,942.62
   F(2,58) = 11.47. Reject H0.
9-64.	n1 = n2 = n3 = 15    r = 3
A one-way ANOVA gives an F-value of 22.21, which is significant even at α < 0.001; hence we reject the hypothesis of no differences among the three models. MSE = 48.1, so at α = 0.01 we use the critical point q = 4.37 (closest to the required value for df’s = 3, 42), giving the Tukey criterion T = q √(MSE/ni) = 7.83.
Observed means: x̄GI = 124.73    x̄P = 121.40    x̄Z = 108.73
|x̄GI – x̄P| = 3.33    |x̄GI – x̄Z| = 16.00*    |x̄P – x̄Z| = 12.67*
Using T = 7.83, we reject the hypotheses μGI = μZ and μP = μZ (at the 0.01 level of significance), but not the μGI = μP hypothesis.
ANOVA Table (α = 5%):
Source      SS        df   MS         F         Fcritical   p-value
Between     2137.78    2   1068.889   22.2083   3.2199      0.0000  Reject
Within      2021.47   42     48.130
Total       4159.24   44
95% confidence intervals of group means: GI: 124.733 ± 3.6149; Phillips: 121.4 ± 3.6149; Zenith: 108.733 ± 3.6149
Tukey test (q0 = 4.37, T = 7.82789): GI vs. Zenith and Phillips vs. Zenith are significant.
9-65.
n = 50    r = 3    SSTR = 128,899    SSE = 42,223,987
F(2,98) = (128,899/2)/(42,223,987/98) = 0.14958
Do not reject the null hypothesis.
9-66.	t²(df) = F(1,df)
9-67.
Rents are equal on average. There is no evidence of differences among the four cities.
9-68.
Answers will vary depending upon which report is selected.
9-69.
A one-way ANOVA strongly rejecting H0. For the three levels of Store, 95% confidence intervals are calculated for means, as shown, which do not overlap at all. Case 11: Rating Wines (Template: ANOVA.xls, sheet: 1-Way) data: n
11
1 2 3 4 5 6 7 8 9 10 11
Chard 89 88 89 78 80 86 87 88 88 89 88
10
13
11
Merlot C.Blanc C.Sauv 91 81 92 88 81 89 99 81 89 90 82 9 91 81 92 88 78 90 88 79 91 89 80 93 90 83 91 87 81 97 88 88 85 86
1) Do not reject the null hypothesis, there is no difference in the average ratings due to the type of grape. ANOVA Table Source SS Between 411.617 Within 6545.63 Total 6957.24
5% df 3 41 44
MS Fcritical p-value F 137.21 0.8594 2.8327 0.4698 159.65
Case 12: Checking out Checkout
1. One-way ANOVA (three scanners, n = 10 each):
Scan1: 16 15 12 15 16 15 15 14 12 14
Scan2: 13 18 13 15 18 14 15 15 14 16
Scan3: 18 19 15 14 19 16 17 14 15 17
ANOVA Table (α = 5%):
Source      SS      df   MS       F        Fcritical   p-value
Between      20.6    2   10.3     3.4893   3.3541      0.0449  Reject
Within       79.7   27    2.9519
Total       100.3   29
Reject the null hypothesis of an equal number of scans per minute.
2. Two-way ANOVA (rows = clerks, columns = scanners):
ANOVA Table (α = 5%):
Source        SS         df   MS        F        Fcritical   p-value
Row           20.76667    4   5.19167   2.1239   2.5787      0.0934
Column        90.7        2   45.35     18.552   3.2043      0.0000  Reject
Interaction   14.13333    8   1.76667   0.7227   2.1521      0.6705
Error        110         45   2.44444
Total        235.6       59
Reject the null hypothesis of an equal number of scans per minute (columns). Do not reject the null hypothesis that the clerks are equally efficient. There are no interaction effects present.
Chapter 10 - Simple Linear Regression and Correlation
CHAPTER 10 SIMPLE LINEAR REGRESSION AND CORRELATION (The template for this chapter is: Simple Regression.xls.) 10-1.
A statistical model is a set of mathematical formulas and assumptions that describe some real-world situation.
10-2.
Steps in statistical model building: 1) Hypothesize a statistical model; 2) Estimate the model parameters; 3) Test the validity of the model; and 4) Use the model.
10-3.
Assumptions of the simple linear regression model: 1) A straight-line relationship between X and Y; 2) The values of X are fixed; 3) The regression errors, , are identically normally distributed random variables, uncorrelated with each other through time.
10-4.
β0 is the Y-intercept of the regression line, and β1 is the slope of the line.
10-5.
The conditional mean of Y, E(Y | X), is the population regression line.
10-6.
The regression model is used for understanding the relationship between the two variables, X and Y; for prediction of Y for given values of X; and for possible control of the variable Y, using the variable X.
10-7.
The error term captures the randomness in the process. Since X is assumed nonrandom, the addition of makes the result (Y) a random variable. The error term captures the effects on Y of a host of unknown random components not accounted for by the simple linear regression model.
10-8.
The equation represents a simple linear regression model without an intercept (constant) term.
10-9.
The least-squares procedure produces the best estimated regression line in the sense that the line lies “inside” the data set. The line is the best unbiased linear estimator of the true regression line, as the estimators b0 and b1 have the smallest variance of all linear unbiased estimators of the line parameters. The least-squares line is obtained by minimizing the sum of the squared deviations of the data points from the line.
10-10. Least squares is less useful when outliers exist. Outliers tend to have a greater influence on the determination of the estimators of the line parameters because the procedure is based on minimizing the squared distances from the line. Since outliers have large squared distances they exert undue influence on the line. A more robust procedure may be appropriate when outliers exist.
10-11.	(Template: Simple Regression.xls, sheet: Regression)
Data (X = Income Quantile, Y = Wealth) with residuals:
X   Y      Error
1   17.3    0.80
2   23.6   –3.02
3   40.2    3.46
4   45.8   –1.06
5   56.8   –0.18
95% C.I. for the slope: 10.12 ± 2.77974    95% C.I. for the intercept: 6.38 ± 9.21937
Regression equation: Wealth = 6.38 + 10.12 (Income Quantile)
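The slope and intercept above can be recomputed directly from the five data points; a minimal sketch:

```python
# Least-squares fit for Problem 10-11
x = [1, 2, 3, 4, 5]
y = [17.3, 23.6, 40.2, 45.8, 56.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_x = sum((xi - x_bar)**2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_x                    # slope = SSXY/SSX
b0 = y_bar - b1 * x_bar              # intercept
print(round(b1, 2), round(b0, 2))    # 10.12 and 6.38, as in the template
```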
10-12.	b1 = SSXY/SSX = 934.49/765.98 = 1.22
10-13.	(Template: Simple Regression.xls, sheet: Regression)
b0 = –3.057    b1 = 0.187
r² = 0.9217 (coefficient of determination)    r = 0.9601 (coefficient of correlation)
95% C.I. for the slope: 0.18663 ± 0.03609    s(b1) = 0.0164
95% C.I. for the intercept: –3.05658 ± 2.1372    s(b0) = 0.97102
95% prediction interval for Y given X = 10: –1.19025 ± 2.8317    s = 0.99538 (standard error of prediction)
ANOVA Table:
Source      SS        df   MS         F         Fcritical   p-value
Regn.       128.332    1   128.332    129.525   4.84434     0.0000
Error        10.8987  11     0.99079
Total       139.231   12
10-14.	b1 = SSXY/SSX = 2.11    b0 = ȳ – b1 x̄ = 165.3 – (2.11)(88.9) = –22.279
10-15.	Inflation & return on stocks:
X (Inflation)   Y (Return)   Error
   1               –3        –20.0642
   2               36         17.9677
  12.6             12        –16.294
 –10.3             –8        –14.1247
   0.51            53         36.4102
   2.03            –2        –20.0613
  –1.8             18          3.64648
   5.79            32         10.2987
   5.87            24          2.22121
r² = 0.0873 (coefficient of determination)    r = 0.2955 (coefficient of correlation)
95% C.I. for the slope: 0.96809 ± 2.7972    s(b1) = 1.18294
95% C.I. for the intercept: 16.0961 ± 17.3299    s(b0) = 7.32883
s = 20.8493 (standard error of prediction)
ANOVA Table:
Source      SS        df   MS        F         Fcritical   p-value
Regn.        291.134   1   291.134   0.66974   5.59146     0.4401
Error       3042.87    7   434.695
Total       3334       8
(Scatterplot of return Y vs. inflation X omitted; fitted line: y = 0.9681x + 16.096.)
There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
10-16.	Average value of Aston Martin:
X (Year)   Y (Value)   Error
1960       180000       84000
1970        40000      –72000
1980        60000      –68000
1990       160000       16000
2000       200000       40000
r² = 0.1203 (coefficient of determination)    r = 0.3468 (coefficient of correlation)
95% C.I. for the slope: 1600 ± 7949.76    s(b1) = 2498
95% C.I. for the intercept: –3040000 ± 1.6E+07    s(b0) = 4946165
s = 78993.7 (standard error of prediction)
ANOVA Table:
Source      SS        df   MS        F         Fcritical   p-value
Regn.       2.6E+09    1   2.6E+09   0.41026   10.128      0.5674
Error       1.9E+10    3   6.2E+09
Total       2.1E+10    4
[Scatter plot of Value vs. Year with fitted line y = 1600x − 3E+06]
There is a weak linear relationship (r) and the regression is not significant (r², F, p-value). Limitations: the sample size is very small. Hidden variables: the 70s and 80s models have a different valuation than other decades, possibly due to a different model or style.

10-17. Regression equation: Credit Card Transactions = 177.641 + 0.6202 Debit Card Transactions

r² = 0.9624 (coefficient of determination)
r = 0.9810 (coefficient of correlation)

95% C.I. for β1: 0.6202 ± 0.17018      s(b1) = 0.06129
95% C.I. for β0: 177.641 ± 110.147     s(b0) = 39.6717
s = 56.9747 (standard error of prediction)

ANOVA Table:
Source   SS        df   MS        F         Fcritical   p-value
Regn.    332366     1   332366    102.389   7.70865     0.0005
Error    12984.5    4   3246.12
Total    345351     5
There is no implication of causality. A third-variable influence could be "increases in per capita income" or "GDP growth".
10-18. SSE = Σ(y − b0 − b1x)². Take partial derivatives with respect to b0 and b1:

∂/∂b0 [Σ(y − b0 − b1x)²] = −2Σ(y − b0 − b1x)
∂/∂b1 [Σ(y − b0 − b1x)²] = −2Σx(y − b0 − b1x)

Setting the two partial derivatives to zero and simplifying, we get:

Σ(y − b0 − b1x) = 0   and   Σx(y − b0 − b1x) = 0

Expanding, we get:

Σy − nb0 − b1Σx = 0   and   Σxy − b0Σx − b1Σx² = 0

Solving the above two equations simultaneously for b0 and b1 gives the required results.

10-19. 99% C.I. for β1: 1.25533 ± 2.807(0.04972) = [1.1158, 1.3949].
The confidence interval does not contain zero.

10-20. MSE = 7.629. From the ANOVA table for Problem 10-11:

ANOVA Table:
Source   SS        df   MS
Regn.    1024.14    1   1024.14
Error    22.888     3   7.62933
Total    1047.03    4
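The closed-form estimators derived in Problem 10-18 are easy to check numerically; a sketch (Python with NumPy, data invented purely for illustration), cross-checked against NumPy's own least-squares fit:

```python
import numpy as np

# Arbitrary illustrative data (not from any problem in this chapter)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form solution of the two normal equations
n = len(x)
ss_xy = np.sum(x * y) - n * x.mean() * y.mean()   # SSXY
ss_x = np.sum(x**2) - n * x.mean()**2             # SSX
b1 = ss_xy / ss_x                 # slope: b1 = SSXY/SSX
b0 = y.mean() - b1 * x.mean()     # intercept: b0 = y-bar - b1*x-bar

# Cross-check against NumPy's least-squares fit
b1_np, b0_np = np.polyfit(x, y, 1)
print(b0, b1)
```

Both routes give identical estimates, since `polyfit` with degree 1 solves the same normal equations.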
10-21. From the regression results for Problem 10-11:
s(b1) = 0.87346 (standard error of slope)
s(b0) = 2.89694 (standard error of intercept)

10-22. From the regression results for Problem 10-11:
95% C.I. for the slope: 10.12 ± 2.77974 = [7.34026, 12.89974]
95% C.I. for the intercept: 6.38 ± 9.21937 = [−2.83937, 15.59937]
10-23. s(b0) = 0.971, s(b1) = 0.016; the estimate of the error variance is MSE = 0.991.
95% C.I. for β1: 0.187 ± 2.201(0.016) = [0.1518, 0.2222]. Zero is not a plausible value at α = 0.05.

95% C.I. for β1: 0.18663 ± 0.03609      s(b1) = 0.0164
95% C.I. for β0: −3.05658 ± 2.1372      s(b0) = 0.97102
10-24. s(b0) = 85.44, s(b1) = 0.1534. The estimate of the regression variance is MSE = 8122.
95% C.I. for β1: 1.5518 ± 2.776(0.1534) = [1.126, 1.978]. Zero is not in the range.

95% C.I. for β1: 1.55176 ± 0.42578      s(b1) = 0.15336
95% C.I. for β0: −255.943 ± 237.219     s(b0) = 85.4395
10-25. s² gives us information about the variation of the data points about the computed regression line.

10-26. In correlation analysis, the two variables X and Y are viewed symmetrically: neither one is "dependent" and the other "independent," as is the case in regression analysis. In correlation analysis we are interested in the relation between two random variables, both assumed normally distributed.

10-27. From the regression results for Problem 10-11: r = 0.9890 (coefficient of correlation)
10-28. r = 0.9601 (coefficient of correlation)
10-29. t(3) = 0.3468/√[(1 − 0.1203)/3] = 0.640
Accept H0. The two variables are not linearly correlated.

10-30. Yes. For example, suppose n = 5 and r = 0.51; then:
t = r/√[(1 − r²)/(n − 2)] = 1.02
and we do not reject H0. But if we take n = 10,000 and r = 0.04, the same statistic gives t = 4.00, which leads to strong rejection of H0.

10-31. We have r = 0.875 and n = 10. Conducting the test:
t(8) = r/√[(1 − r²)/(n − 2)] = 0.875/√[(1 − 0.875²)/8] = 5.11
There is statistical evidence of a correlation between the prices of gold and of copper. Limitations: the data are time-series data, hence not independent random samples. Also, the data set contains only 10 points.
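The correlation t-test used in Problems 10-29 through 10-31 can be sketched in a few lines (Python with SciPy, shown for the Problem 10-31 numbers):

```python
import math
from scipy import stats

r, n = 0.875, 10                               # Problem 10-31: gold vs. copper prices
t = r / math.sqrt((1 - r**2) / (n - 2))        # t = r / sqrt((1 - r^2)/(n - 2))
p = 2 * stats.t.sf(abs(t), df=n - 2)           # two-tailed p-value
print(f"t({n-2}) = {t:.2f}, p = {p:.4f}")
```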
10-34. n = 65, r = 0.37
t(63) = 0.37/√[(1 − 0.37²)/63] = 3.16
Yes, significant. There is a correlation between the two variables.

10-35. z′ = ½ ln[(1 + r)/(1 − r)] = ½ ln(1.37/0.63) = 0.3884
ζ = ½ ln[(1 + ρ0)/(1 − ρ0)] = ½ ln(1.22/0.78) = 0.2237
σz′ = 1/√(n − 3) = 1/√62 = 0.127
z = (z′ − ζ)/σz′ = (0.3884 − 0.2237)/0.127 = 1.297
Cannot reject H0.
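The Fisher transformation test of Problem 10-35 follows directly from the formulas above; a Python sketch with the same numbers (r = 0.37, ρ0 = 0.22, n = 65):

```python
import math

r, rho0, n = 0.37, 0.22, 65
z_prime = 0.5 * math.log((1 + r) / (1 - r))      # Fisher z' of the sample r
zeta = 0.5 * math.log((1 + rho0) / (1 - rho0))   # Fisher z of the hypothesized rho
sigma = 1 / math.sqrt(n - 3)                     # standard error of z'
z = (z_prime - zeta) / sigma
print(f"z = {z:.3f}")   # about 1.297: cannot reject H0 at alpha = 0.05
```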
10-36. Using the "TINV(α, df)" function in Excel, where df = n − 2 = 52: TINV(0.05, 52) = 2.006645 and TINV(0.01, 52) = 2.6737. Reject H0 at 0.05 but not at 0.01. There is evidence of a linear relationship at α = 0.05 only.

10-37. t(16) = b1/s(b1) = 3.1/2.89 = 1.0727. Do not reject H0. There is no evidence of a linear relationship using any α.

10-38. Using the regression results for Problem 10-11: the critical value of t is t(0.05, 3) = 3.182; the computed value is t = b1/s(b1) = 10.12/0.87346 = 11.586. Reject H0. There is strong evidence of a linear relationship.
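The Excel functions used throughout these solutions have direct SciPy equivalents; a sketch (Python), shown for the Problem 10-36 and 10-42 values:

```python
from scipy import stats

# TINV(alpha, df) in Excel is the two-tailed critical value:
t_crit_05 = stats.t.ppf(1 - 0.05 / 2, df=52)   # TINV(0.05, 52)
t_crit_01 = stats.t.ppf(1 - 0.01 / 2, df=52)   # TINV(0.01, 52)

# TDIST(x, df, 2) in Excel is the two-tailed p-value:
p = 2 * stats.t.sf(1.51, df=585690)            # TDIST(1.51, 585690, 2)

print(round(t_crit_05, 6), round(t_crit_01, 4), round(p, 3))
```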
10-39. t(11) = b1/s(b1) = 0.187/0.016 = 11.69. Reject H0. There is strong evidence of a linear relationship between the two variables.

10-40. t = b1/s(b1) = 1600/2498 = 0.641. Do not reject H0. There is no evidence of a linear relationship.

10-41. t(58) = b1/s(b1) = 1.24/0.21 = 5.90. Yes, there is evidence of a linear relationship.

10-42. Using the Excel function TDIST(x, df, #tails) to estimate the p-value for the t-test results, where x = 1.51, df = 585692 − 2 = 585690, and #tails = 2 for a two-tail test: TDIST(1.51, 585690, 2) = 0.131. The corresponding p-value for the results is 0.131. The regression is not significant even at the 0.10 level of significance.

10-43. t(211) = z = b1/s(b1) = 0.68/12.03 = 0.0565. Do not reject H0. There is no evidence of a linear relationship using any α. (Why report such results?)

10-44. b1 = 5.49, s(b1) = 1.21, t(26) = 4.537. Yes, there is evidence of a linear relationship.

10-45. The coefficient of determination indicates that 9% of the variation in customer satisfaction can be explained by the changes in a customer's materialism measurement.

10-46. a. The model should not be used for prediction purposes because only 2.0% of the variation in pension funding is explained by its relationship with firm profitability.
b. The model explains virtually nothing.
c. Probably not. The model explains too little.

10-47. In the Problem 10-11 regression results, r² = 0.9781. Thus, 97.8% of the variation in wealth growth is explained by the income quantile.

10-48. In Problem 10-13, r² = 0.922. Thus, 92.2% of the variation in the dependent variable is explained by the regression relationship.

10-49. r² in Problem 10-16: r² = 0.1203

10-50. Reading directly from the MINITAB output: r² = 0.962
r² = 0.9624 (coefficient of determination)
10-51. Based on the coefficient of determination values for the five countries, the UK model explains 31.7% of the variation in long-term bond yields relative to the yield spread. This is the best predictive model of the five. The next best model is the one for Germany, which explains 13.3% of the variation. The regression models for Canada, Japan, and the US do not predict long-term yields very well.

10-52. From the information provided, the slope coefficient of the equation is equal to −14.6. Since its value is not close to zero (which would indicate that a change in bond ratings has no impact on yields), it would indicate that a linear relationship exists between bond ratings and bond yields. This is in line with the reported coefficient of determination of 61.56%.

10-53. r² in Problem 10-15: r² = 0.873
r² = 0.8348 (coefficient of determination)
10-54.
Σ(y − ȳ)² = Σ[(ŷ − ȳ) + (y − ŷ)]² = Σ[(ŷ − ȳ)² + 2(ŷ − ȳ)(y − ŷ) + (y − ŷ)²]
= Σ(ŷ − ȳ)² + 2Σ(ŷ − ȳ)(y − ŷ) + Σ(y − ŷ)²
But: 2Σ(ŷ − ȳ)(y − ŷ) = 2Σŷ(y − ŷ) − 2ȳΣ(y − ŷ) = 0
because the first term on the right is the sum of the weighted regression residuals, which sum to zero, and the second term is the sum of the residuals, which is also zero. This establishes the result:
Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σ(y − ŷ)²
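The decomposition in Problem 10-54 is easy to confirm numerically on any fitted line; a Python sketch with illustrative data (the data points are invented, not from a chapter problem):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.3, 9.6, 12.2])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean())**2)       # total sum of squares
ssr = np.sum((y_hat - y.mean())**2)   # regression sum of squares
sse = np.sum((y - y_hat)**2)          # error sum of squares
print(sst, ssr + sse)                 # the two agree: SST = SSR + SSE
```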
10-55. From Equation (10-10): b1 = SSXY/SSX. From Equation (10-31): SSR = b1·SSXY. Hence, SSR = (SSXY/SSX)·SSXY = (SSXY)²/SSX.

10-56. Using the results for Problem 10-11: F = 134.238, Fcritical(1,3) = 10.128, p-value = 0.0014. Reject H0.

10-57. F(1,11) = 129.525 (Fcritical = 4.84434, p-value = 0.0000); t(11) = 11.381, and t² = 11.381² = the F-statistic value already calculated.

10-58. F(1,4) = 102.389 (Fcritical = 7.70865, p-value = 0.0005); t(4) = 10.119, and t² = F: (10.119)² = 102.39.
10-59. F (1,7) = 0.66974 Do not reject H0.
10-60. F(1,102) = MSR/MSE = (87,691/1)/(12,745/102) = 701.8
There is extremely strong evidence of a linear relationship between the two variables.

10-61. t²(k) = F(1,k). Thus, F(1,20) = [b1/s(b1)]² = (2.556/4.122)² = 0.3845. Do not reject H0. There is no evidence of a linear relationship.
10-62. t²(k) = [b1/s(b1)]² = [(SSXY/SSX)/(s/√SSX)]²
[using Equations (10-10) and (10-15) for b1 and s(b1), respectively]
= (SSXY/SSX)²·(SSX/MSE) = SS²XY/(SSX·MSE) = SSR/MSE = (SSR/1)/MSE = MSR/MSE = F(1,k)
[because SS²XY/SSX = SSR by Equations (10-31) and (10-10)]

10-63. a. Heteroscedasticity. b. No apparent inadequacy. c. Data display curvature, not a straight-line relationship.

10-64. a. No apparent inadequacy. b. A pattern of increase with time.

10-65. a. No serious inadequacy. b. Yes. A deviation from the normal-distribution assumption is apparent.
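The identity t² = F proved in Problem 10-62 can be checked on the Problem 10-15 data; a sketch (Python with SciPy):

```python
from scipy import stats

# Inflation (X) and stock return (Y) data from Problem 10-15
x = [1, 2, 12.6, -10.3, 0.51, 2.03, -1.8, 5.79, 5.87]
y = [-3, 36, 12, -8, 53, -2, 18, 32, 24]

res = stats.linregress(x, y)
t = res.slope / res.stderr   # t-statistic for the slope
print(round(t**2, 5))        # equals the ANOVA F statistic (template: F = 0.66974)
```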
10-66. Using the results for Problem 10-11:
Durbin-Watson statistic: d = 3.39862

[Residual plot of Error vs. X] Residual variance fluctuates; with only 5 data points the residuals appear to be normally distributed.

[Normal probability plot of residuals]
10-67. Residuals plotted against the independent variable (Quality) of Problem 10-14:

[MINITAB character plot: resids vs. Quality, 30 to 80]

No apparent inadequacy. Durbin-Watson statistic: d = 2.0846

10-68. Durbin-Watson statistic: d = 1.70855. Plot shows some curvature.
10-69. In the American Express example, give a 95% prediction interval for x = 5,000:
ŷ = 274.85 + 1.2553(5,000) = 6,551.35
P.I. = 6,551.35 ± (2.069)(318.16)·√[1 + 1/25 + (5,000 − 3,177.92)²/40,947,557.84] = [5,854.4, 7,248.3]

10-70. Given that the slope of the equation for 10-52 is −14.6, if the rating falls by 3 the yield should increase by 43.8 basis points.

10-71. For a 99% P.I.: t.005(23) = 2.807
6,551.35 ± (2.807)(318.16)·√[1 + 1/25 + (5,000 − 3,177.92)²/40,947,557.84] = [5,605.75, 7,496.95]

10-72. Point prediction: ŷ = 6.38 + 10.12(4) = 46.86
The 99% P.I.: 46.86 ± 18.3946 = [28.465, 65.255]
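The prediction-interval formula used in Problems 10-69 and 10-71 can be sketched as follows (Python; n = 25, x̄ = 3,177.92, SSX = 40,947,557.84 and s = 318.16 are the American Express values quoted in the solution above):

```python
import math

b0, b1 = 274.85, 1.2553
n, x_bar, ss_x, s = 25, 3177.92, 40947557.84, 318.16
x0, t_crit = 5000, 2.069     # t(0.025, 23) for the 95% interval

y_hat = b0 + b1 * x0
half = t_crit * s * math.sqrt(1 + 1/n + (x0 - x_bar)**2 / ss_x)
print(f"95% P.I. = [{y_hat - half:.1f}, {y_hat + half:.1f}]")
```

Swapping t_crit for 2.807 (the 0.005 tail value at 23 df) reproduces the 99% interval of Problem 10-71.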
10-73. The 99% P.I. (at X = 5): 56.98 ± 20.407 = [36.573, 77.387]

10-74. The 95% P.I. (at X = 1990): 144000 ± 286633 = [−142633, 430633]

10-75. The 95% P.I. (at X = 2000): 160000 ± 317990 = [−157990, 477990]

10-76. Point prediction: ŷ = 16.0961 + 0.96809(5) = 20.9365

10-77. a) Simple regression equation: Y = 2.779337X − 0.284157 (b0 = −0.284157, b1 = 2.779337). When X = 10, Y = 27.5092.

b) Forcing through the origin: regression equation Y = 2.741537X (b0 = 0, b1 = 2.741537). When X = 10, Y = 27.41537.

c) Forcing through (5, 13): regression equation Y = 2.825566X − 1.12783 (b0 = −1.12783, b1 = 2.825566). When X = 5, Y = 13; when X = 10, Y = 27.12783.

d) Slope fixed at 2: regression equation Y = 2X + 4.236 (b0 = 4.236, b1 = 2).
When X = 10, Y = 24.236.

10-78. Using the Excel function TINV(x, df), where x = the p-value of 0.034 and df = 2058 − 2: TINV(0.034, 2056) = 2.121487. Since the slope coefficient = −0.051, the t-value is negative: t = −2.121487.
a) Standard error of the slope: s(b1) = |b1/t| = 0.051/2.121487 = 0.02404
b) Using α = 0.05, we would reject the null hypothesis of no relationship between the response variable and the predictor, based on the reported p-value of 0.034.

10-79. Given the reported p-value, we would reject the null hypothesis of no relationship between neuroticism and job performance. Given the reported coefficient of determination, 19% of the variation in job performance can be explained by neuroticism.

10-80. The t-statistic for the reported information is:
t = b1/s(b1) = 0.233/0.055 = 4.236
Using the Excel function TDIST(t, df, #tails), we get a p-value of 0.000068: TDIST(4.236, 70, 2) = 6.8112E-05. There is a linear relationship between frequency of online shopping and the level of perceived risk.
10-81. (From MINITAB) The regression equation is
Stock Close = 67.6 + 0.407 Oper Income

Predictor   Coef      Stdev     t-ratio   p
Constant    67.62     12.32      5.49     0.000
Oper Inc    0.40725   0.03579   11.38     0.000

s = 9.633   R-sq = 89.0%   R-sq(adj) = 88.3%

Analysis of Variance
SOURCE       DF   SS      MS      F        p
Regression    1   12016   12016   129.49   0.000
Error        16    1485      93
Total        17   13500

The stock close based on an operating income of $305M is ŷ = $56.24.

(MINITAB results for log Y) The regression equation is
Log_Stock Close = 2.32 + 0.00552 Oper Inc

Predictor   Coef        Stdev       t-ratio   p
Constant    2.3153      0.1077      21.50     0.000
Oper Inc    0.0055201   0.0003129   17.64     0.000

s = 0.08422   R-sq = 95.1%   R-sq(adj) = 94.8%

Analysis of Variance
SOURCE       DF   SS       MS       F        p
Regression    1   2.2077   2.2077   311.25   0.000
Error        16   0.1135   0.0071
Total        17   2.3212

Unusual Observations
Obs.   x     y        Fit      Stdev.Fit   Residual   St.Resid
1      240   3.8067   3.6401   0.0366      0.1666     2.20R

R denotes an obs. with a large st. resid. The stock close based on an operating income of $305M is ŷ = $54.80.

The regression using the log of monthly stock closings is a better fit: Operating Income explains over 95% of the variation in the log of monthly stock closings, versus 89% for the non-transformed Y.

10-82. a) The calculated t-value for the slope coefficient is:
t = b1/s(b1) = 0.92/0.01 = 92.00
Using the Excel function TDIST(t, df, #tails), we get a p-value of approximately 0: TDIST(92.0, 598, 2) = 0. There is a linear relationship.
b) The excess return would be 0.9592: FER = 0.95 + 0.92(0.01) = 0.9592

10-83. a) Adding 2 to all X values: new regression: Y = 5X + 17. Since the intercept is b0 = ȳ − b1·x̄, the only thing that has changed is that the value of x̄ has increased by 2. Therefore, take the change in x̄ times the slope and add it to the original regression intercept.
b) Adding 2 to all Y values: new regression: Y = 5X + 9. Using the formula for the intercept, only the value of ȳ changes, by 2. Therefore, the intercept changes by 2.
c) Multiplying all X values by 2: new regression: Y = 2.5X + 7
d) Multiplying all Y values by 2: new regression: Y = 10X + 7

10-84. You are minimizing the squared deviations from the former x-values instead of the former y-values.

10-85. a)
Y = 3.820133X + 52.273036 (b0 = 52.273036, b1 = 3.820133)

b) 90% C.I. for the slope: 3.82013 ± 0.4531 = [3.36703, 4.27323]

c) r² = 0.9449, very high; F = 222.931 (p-value = 0.000): both indicate that X affects Y.

d) Since the 99% C.I. for the slope, 3.82013 ± 0.77071, does not contain the value 0, the slope is not 0.

e) Y = 90.47436 when X = 10

f) X = 12.49354

g) Residuals appear to be random. Durbin-Watson statistic: d = 2.56884

h) The distribution of the residuals appears to be a little flatter than normal.
Case 13: Level of Leverage
a) Leverage = −0.118 − 0.040 (Rights)
b) Using the Excel function TDIST(t, df, #tails), we get a p-value of 0.0089: TDIST(2.62, 1307, 2) = 0.0089. There is a linear relationship.
c) The reported coefficient of determination indicates that shareholders' rights explain 16.5% of the variation in a firm's leverage.

Case 14: Risk and Return
1) Y = 1.166957X − 1.090724 (b0 = −1.090724, b1 = 1.166957)

2) The stock has above-average risk: b1 > 1.10

3) 95% C.I. for the slope: 1.16696 ± 0.37405

4) When X = 10, Y = 10.57884. 95% P.I. for the prediction: 10.5788 ± 5.35692

5) Residuals appear random. Durbin-Watson statistic: d = 0.83996

6) [Normal probability plot of residuals] The distribution is a little flatter than normal.

7) Y = 1.157559X − 0.945353 (b0 = −0.945353, b1 = 1.157559). When X = 6, Y = 6. The risk has dropped a little, but it is still above average since b1 > 1.10.
Chapter 11 - Multiple Regression
CHAPTER 11 MULTIPLE REGRESSION (The template for this chapter is: Multiple Regression.xls.) 11-1.
The assumptions of the multiple regression model are that the errors are normally and independently distributed with mean zero and common variance σ². We also assume that the Xi are fixed quantities rather than random variables; at any rate, they are independent of the error terms. The assumption of normality of the errors is needed for conducting tests about the regression model.
11-2.
Holding advertising expenditures constant, sales volume increases by 1.34 units, on average, per increase of 1 unit in promotional experiences.
11-3.
In a correlational analysis, we are interested in the relationships among the variables. On the other hand, in a regression analysis with k independent variables, we are interested in the effects of the k variables (considered fixed quantities) on the dependent variable only (and not on one another).
11-4.
A response surface is a generalization to higher dimensions of the regression line of simple linear regression. For example, when 2 independent variables are used, each in the first order only, the response surface is a plane in 3-dimensional Euclidean space. When 7 independent variables are used, each in the first order, the response surface is a 7-dimensional hyperplane in 8-dimensional Euclidean space.
11-5.
8 equations.
11-6.
The least-squares estimators of the parameters of the multiple regression model, obtained as solutions of the normal equations.
11-7.
ΣY = nb0 + b1ΣX1 + b2ΣX2
ΣX1Y = b0ΣX1 + b1ΣX1² + b2ΣX1X2
ΣX2Y = b0ΣX2 + b1ΣX1X2 + b2ΣX2²

852 = 100b0 + 155b1 + 88b2
11,423 = 155b0 + 2,125b1 + 1,055b2
8,320 = 88b0 + 1,055b1 + 768b2

b0 = (852 − 155b1 − 88b2)/100
11,423 = 155(852 − 155b1 − 88b2)/100 + 2,125b1 + 1,055b2
8,320 = 88(852 − 155b1 − 88b2)/100 + 1,055b1 + 768b2
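As a numerical check, the 3×3 system of normal equations above can be solved directly (Python with NumPy):

```python
import numpy as np

# Coefficient matrix and right-hand side of the three normal equations
A = np.array([[100.0,  155.0,   88.0],
              [155.0, 2125.0, 1055.0],
              [ 88.0, 1055.0,  768.0]])
rhs = np.array([852.0, 11423.0, 8320.0])

b0, b1, b2 = np.linalg.solve(A, rhs)
print(b0, b1, b2)   # b0 ≈ -1.1454, b1 ≈ 0.0487, b2 ≈ 10.8977
```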
Continue solving the equations to obtain the solutions:
b0 = −1.1454469    b1 = 0.0487011    b2 = 10.897682

11-8.
Using SYSTAT:

DEP VAR: VALUE   N: 9   MULTIPLE R: 0.909   SQUARED MULTIPLE R: 0.826
ADJUSTED SQUARED MULTIPLE R: 0.769   STANDARD ERROR OF ESTIMATE: 59.477

VARIABLE   COEFFICIENT   STD ERROR   STD COEF   TOLERANCE   T        P(2 TAIL)
CONSTANT   -9.800        80.763                             -0.121   0.907
SIZE        0.173         0.040      0.753      0.9614430    4.343   0.005
DISTANCE   31.094        14.132      0.382      0.9614430    2.200   0.070

ANALYSIS OF VARIANCE
SOURCE       SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO   P
REGRESSION   101032.867        2   50516.433     14.280    0.005
RESIDUAL      21225.133        6    3537.522

Multiple Regression Results (template):
           Intercept   Size      Distance
b          -9.7997     0.17331   31.094
s(b)       80.7627     0.0399    14.132
t          -0.1213     4.34343   2.2002
p-value     0.9074     0.0049    0.0701
VIF                    1.0401    1.0401

ANOVA Table:
Source   SS        df   MS      F       FCritical   p-value
Regn.    101033     2   50516   14.28   5.1432      0.0052
Error    21225.1    6   3537.5
Total    122258     8

R² = 0.8264   Adjusted R² = 0.7685   s = 59.477
11-9. With no advertising and no spending on in-store displays, sales are b0 = 47.165 (thousand) on average. For each unit (thousand) increase in advertising expenditure, keeping in-store display expenditure constant, there is an average increase in sales of b1 = 1.599 (thousand). Similarly, for each unit (thousand) increase in in-store display expenditure, keeping advertising constant, there is an average increase in sales of b2 = 1.149 (thousand).
11-10. We test whether there is a linear relationship between Y and any of the Xi variables (that is, with at least one of the Xi). If the null hypothesis is not rejected, there is nothing more to do, since there is no evidence of a regression relationship. If H0 is rejected, we need to conduct further analyses to determine which of the variables have a linear relationship with Y and which do not, and we need to develop the regression model.

11-11. Degrees of freedom for error = n − 13.

11-12. k = 2, n = 82, SSE = 8,650, SSR = 988
MSR = SSR/k = 988/2 = 494
SST = SSR + SSE = 988 + 8,650 = 9,638
MSE = SSE/[n − (k + 1)] = 8,650/79 = 109.4937
F = MSR/MSE = 494/109.4937 = 4.5116
Using the Excel function FDIST(F, dfN, dfD) to return the p-value, where F is the F-test result and the df's refer to the degrees of freedom in the numerator and denominator, respectively: FDIST(4.5116, 2, 79) = 0.013953. Yes, there is evidence of a linear regression relationship at α = 0.05, but not at 0.01.
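The F computation of Problem 11-12 maps directly onto SciPy (a sketch; Excel's FDIST is the survival function of the F distribution):

```python
from scipy import stats

k, n, sse, ssr = 2, 82, 8650, 988   # Problem 11-12 inputs
msr = ssr / k
mse = sse / (n - (k + 1))
F = msr / mse
p = stats.f.sf(F, k, n - (k + 1))   # Excel: FDIST(F, 2, 79)
print(f"F = {F:.4f}, p = {p:.6f}")
```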
11-13. F(4,40) = MSR/MSE = (7,768/4)/[(15,673 − 7,768)/40] = 1,942/197.625 = 9.827
Yes, there is evidence of a linear regression relationship between Y and at least one of the independent variables.

11-14.
Source       SS        df   MS         F
Regression   7,474.0    3   2,491.33   48.16
Error          672.5   13      51.73
Total        8,146.5   16

Since the F-ratio is highly significant, there is evidence of a linear regression relationship between overall appeal score and at least one of the three variables: prestige, comfort, and economy.

11-15. When the sample size is small, and the degrees of freedom for error are relatively small, so that adding a variable and thus losing a degree of freedom for error is substantial.
11-16. R² = SSR/SST. As we add a variable, SSR cannot decrease. Since SST is constant, R² cannot decrease.

11-17. No. The adjusted coefficient is used in evaluating the importance of new variables in the presence of old ones. It does not apply in the case where all we consider is a single independent variable.

11-18. By the definition of the adjusted coefficient of determination, Equation (11-13):
R̄² = 1 − [SSE/(n − (k + 1))]/[SST/(n − 1)] = 1 − (SSE/SST)·(n − 1)/[n − (k + 1)]
But SSE/SST = 1 − R², so the above is equal to:
1 − (1 − R²)·(n − 1)/[n − (k + 1)]
which is Equation (11-14).
11-19. The mean square error gives a good indication of the variation of the errors in regression. However, other measures, such as the coefficient of multiple determination and the adjusted coefficient of multiple determination, are useful in evaluating the proportion of the variation in the dependent variable explained by the regression, thus giving us a more meaningful measure of the regression fit.

11-20. Given an adjusted R² = 0.021, only 2.1% of the variation in the stock return is explained by the four independent variables. Using the Excel function FDIST(F, dfN, dfD) to return the p-value: FDIST(2.27, 4, 433) = 0.06093. There is evidence of a linear regression relationship at α = 0.10 only.
11-21. R² = 7,474.0/8,146.5 = 0.9174. A good regression.
R̄² = 1 − (1 − 0.9174)(16/13) = 0.8983
s = √MSE = √51.73 = 7.192

11-22. Given R² = 0.94, k = 2 and n = 383, the adjusted R² is:
R̄² = 1 − (1 − R²)·(n − 1)/[n − (k + 1)] = 1 − (1 − 0.94)(382/380) = 0.9397
Therefore, security and time effects characterize 93.97% of the variation in market price. Given the value of the adjusted R², the model is a reliable predictor of market price.

11-23. R̄² = 1 − (1 − R²)·(n − 1)/[n − (k + 1)] = 1 − (1 − 0.918)(16/12) = 0.8907
Since R̄² has decreased, do not include the new variable.
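The adjusted-R² computations in Problems 11-21 through 11-24 all apply Equation (11-14); a small helper (Python sketch, with the n and k values implied by the ratios used above):

```python
def adjusted_r2(r2, n, k):
    """Equation (11-14): adjusted R-squared for n points and k regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

print(round(adjusted_r2(0.9174, 17, 3), 4))  # Problem 11-21 (ratio 16/13)
print(round(adjusted_r2(0.918, 17, 4), 4))   # Problem 11-23 (ratio 16/12)
```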
11-24. Given R² = 0.769, k = 6 and n = 242:
R̄² = 1 − (1 − R²)·(n − 1)/[n − (k + 1)] = 1 − (1 − 0.769)(241/235) = 0.7631
Since R̄² = 76.31%, approximately 76% of the variation in the information price is characterized by the 6 independent marketing variables. Using the Excel function FDIST(F, dfN, dfD): FDIST(44.8, 6, 235) = 2.48855E-36. There is evidence of a linear regression relationship at all α's.

11-25. a. The regression expresses stock returns as a plane in space, with firm size ranking and stock price ranking as the two horizontal axes:
RETURN = 0.484 − 0.030(SIZRNK) − 0.017(PRCRNK)
The t-test for a linear relationship between returns and firm size ranking is highly significant, but not for returns against stock price ranking.
b. We know that R̄² = 0.093 and n = 50, k = 2. Solving Equation (11-14) for R²:
R² = 1 − (1 − R̄²)·[n − (k + 1)]/(n − 1) = 1 − (1 − 0.093)(47/49) = 0.130
Thus, 13% of the variation is due to the two independent variables.
c. The adjusted R² is quite low, indicating that the regression on both variables is not a good model. They should try regressing on size alone.

11-26. R̄² = 1 − (1 − R²)·(n − 1)/[n − (k + 1)] = 1 − (1 − 0.72)(712/710) = 0.719
Based solely on this information, this is not a bad regression model.

11-27. k = 8, n = 500, SSE = 6,179, SST = 23,108

Source   SS      df    MS         F
Regn.    16929     8   2116.125   168.153
Error     6179   491     12.5845
Total    23108   499

Using the Excel function FDIST(F, dfN, dfD): FDIST(168.153, 8, 491) ≈ 0.00. There is evidence of a linear regression relationship at all α's.
R² = SSR/SST = 0.7326
R̄² = 1 − [SSE/(n − (k + 1))]/[SST/(n − 1)] = 0.7282
MSE = 12.5845
11-28. A joint confidence region for both parameters is a set of pairs of likely values of β1 and β2 at 95%. This region accounts for the mutual dependency of the estimators and hence is elliptical rather than rectangular. This is why the region may not contain a bivariate point included in the separate univariate confidence intervals for the two parameters.

11-29. Assuming a very large sample size, we use the formula z = bi/s(bi) for testing the significance of each of the slope parameters, with α = 0.05 and critical value |z| = 1.96.
For firm size: z = 0.06/0.005 = 12.00 (significant)
For firm profitability: z = −5.533 (significant)
For fixed-asset ratio: z = −0.08
For growth opportunities: z = −0.72
For nondebt tax shield: z = 4.29 (significant)
The slope estimates with respect to "firm size", "firm profitability" and "nondebt tax shield" are not zero. The adjusted R-square indicates that 16.5% of the variation in governance level is explained by the five independent variables. Next step: exclude "fixed-asset ratio" and "growth opportunities" from the regression and see what happens to the adjusted R-square.

11-30. 1. The usual caution about the possibility of a Type I error. 2. Multicollinearity may make the tests unreliable. 3. Autocorrelation in the errors may make the tests unreliable.

11-31. 95% C.I.'s for β2 through β5:
β2: 5.6 ± 1.96(1.3) = [3.052, 8.148]
β3: 10.35 ± 1.96(6.88) = [−3.135, 23.835]
β4: 3.45 ± 1.96(2.7) = [−1.842, 8.742]
β5: −4.25 ± 1.96(0.38) = [−4.995, −3.505]
The joint region for β3 and β4 contains the point (0,0).
11-32. Use the formula z = bi/s(bi) for testing the significance of each of the slope parameters, with α = 0.05 and critical value |z| = 1.96.
For unexpected accruals: z = −2.0775/0.4111 = −5.054 (significant)
For auditor quality: z = 0.5176
For return on investment: z = 1.7785
For expenditure on R&D: z = 2.1161 (significant)
The R-square indicates that 36.5% of the variation in a firm's reputation can be explained by the four independent variables listed.

11-33. Yes. Considering the joint confidence region for both slope parameters is equivalent to conducting an F test for the existence of a linear regression relationship. Since (0,0) is not in the joint 95% region, this is equivalent to rejecting the null hypothesis of the F test at α = 0.05.

11-34. Prestige is not significant (or at least appears so, pending further analysis). Comfort and Economy are significant (Comfort only at the 0.05 level). The regression should be rerun with variables deleted.

11-35. Variable Lend seems insignificant because of collinearity with M1 or Price.

11-36. a. As Price is dropped, Lend becomes significant: there is, apparently, a collinearity between Lend and Price.
b., c. The best model so far is the one in Table 11-9, with M1 and Price only. The adjusted R² for that model is higher than for the other regressions.
d. For the model in this problem, MINITAB reports F = 114.09. Highly significant. For the model in Table 11-9: F = 150.67. Highly significant.
e. s = 0.3697. For Problem 11-35: s = 0.3332. As a variable is deleted, s (and its square, MSE) increases.
f. In Problem 11-35: MSE = s² = (0.3332)² = 0.111.

11-37. Autocorrelation of the regression error may cause this.

11-38. Use the formula z = bi/s(bi), with α = 0.05 and critical value |z| = 1.96.
For new technological process: z = −0.014/0.004 = −3.50 (significant)
For organizational innovation: z = 0.25
For commercial innovation: z = 3.2 (significant)
For R&D: z = 4.50 (significant)
All but "organizational innovation" are important independent variables in explaining employment growth. The R-square indicates that 74.3% of the variation in employment growth is explained by the four independent variables in the equation.
11-39. Regress Profits on Employees and Revenues.

Data (Profits Y, Employees X1, Revenues X2):
 1:  -1221   96400   17440
 2:  -2808   63000   13724
 3:   -773   70600   13303
 4:    248   39100    9510
 5:     38   37680    8870
 6:   1461   31700    6846
 7:    442   32847    5937
 8:     14   12867    2445
 9:     57   11475    2254
10:    108    6000    1311

ANOVA Table:
Source   SS            df   MS            F       FCritical   p-value
Regn.    4507008.861    2   2253504.43    2.166   4.737       0.1852
Error    7281731.539    7   1040247.363
Total    11788740.4     9

R² = 0.3823   Adjusted R² = 0.2058   s = 1019.925

Multiple Regression Results:
           Intercept     Employees     Revenues
b          834.9510193   0.0085493     -0.174148688
s(b)       621.1993315   0.064416986    0.340929503
t          1.344095167   0.132718098   -0.510805567
p-value    0.2208        0.8982         0.6252
VIF                      29.8304        29.8304

Correlation matrix:
            Employees   Revenues
Employees   1.0000
Revenues    0.9831      1.0000
Profits     -0.5994     -0.6171

Regression equation: Profits = 834.95 + 0.009 Employees − 0.174 Revenues

The regression equation is not significant (F value), and there is a large amount of multicollinearity present between the two independent variables (0.9831). There is so much multicollinearity present that the negative partial correlations between the independent variables and profits are not maintained in the regression results (both of the parameters of the independent variables should be negative). None of the values of the parameters are significant.
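The VIF of 29.8304 reported for Problem 11-39 follows directly from the correlation between the two predictors; a sketch (Python with NumPy, using the data as tabulated above):

```python
import numpy as np

# Predictor data from Problem 11-39
employees = np.array([96400, 63000, 70600, 39100, 37680,
                      31700, 32847, 12867, 11475, 6000])
revenues = np.array([17440, 13724, 13303, 9510, 8870,
                     6846, 5937, 2445, 2254, 1311])

r = np.corrcoef(employees, revenues)[0, 1]   # correlation between the predictors
vif = 1 / (1 - r**2)                         # with two predictors, VIF = 1/(1 - r^2)
print(f"r = {r:.4f}, VIF = {vif:.2f}")
```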
11-40. The residual plot exhibits both heteroscedasticity and a curvature apparently not accounted for in the model.
11-41. a) The residuals appear to be normally distributed. b) The residuals are not normally distributed.

11-42. An outlier is an observation far from the others.

11-43. A plot of the data or a plot of the residuals will reveal outliers. Also, most computer packages (e.g., MINITAB) will automatically report all outliers and suspected outliers.

11-44. Outliers, unless they are due to errors in recording the data, may contain important information about the process under study and should not be blindly discarded. The relationship of the true data may well be nonlinear.

11-45. An outlier tends to "tilt" the regression surface toward it, because of the high influence of a large squared deviation in the least-squares formula, thus creating a possible bias in the results.

11-46. An influential observation is one that exerts relatively strong influence on the regression surface. For example, if all the data lie in one region in X-space and one observation lies far away in X, it may exert strong influence on the estimates of the regression parameters.

11-47. This creates a bias. In any case, there is no reason to force the regression surface to go through the origin.

11-48. The residual plot in Figure 11-16 exhibits strong heteroscedasticity.

11-49. The regression relationship may be quite different in a region where we have no observations from what it is in the estimation-data region. Thus predicting outside the range of available data may create large errors.

11-50. ŷ = 47.165 + 1.599(8) + 1.149(12) = 73.745 (thousands), i.e., $73,745.

11-51. In Problem 11-8, X2 (distance) is not a significant variable, but we use the complete original regression relationship given in that problem anyway (since this problem calls for it): ŷ = 9.800 + 0.173X1 + 31.094X2; ŷ(1800, 2.0) = 9.800 + (0.173)1800 + (31.094)2.0 = 363.78
11-52. Using the regression coefficients reported in Problem 11-25: Ŷ = 0.484 - 0.030 Sizrnk - 0.017 Prcrnk = 0.484 - 0.030(5.0) - 0.017(6.0) = 0.232

11-53. Estimated SE(Ŷ) is obtained as: (3.939 0.6846)/4 = 0.341. Estimated SE(E(Y | x)) is obtained as: (3.939 0.1799)/4 = 0.085.
11-54. From MINITAB: Fit: 73.742; St Dev Fit: 2.765; 95% C.I. [67.203, 80.281]; 95% P.I. [65.793, 81.692] (all numbers are in thousands).

11-55. The estimators are the same although their standard errors are different.

11-56. A prediction interval reflects more variation than a confidence interval for the conditional mean of Y. The additional variation is the variation of the actual predicted value about the conditional mean of Y (the estimator of which is itself a random variable).

11-57. This is a regression with one continuous variable and one dummy variable. Both variables are significant. Thus there are two distinct regression lines. The coefficient of determination is respectably high. During times of restricted trade with the Orient, the company sells 26,540 more units per month, on average.

11-58. Use the following formula for testing the significance of each of the slope parameters:
z = b_i / s(b_i), and use α = 0.05. Critical value of |z| = 1.96.
For the dummy variable: z = -0.003 / 0.29 = -0.0103, which is not significant. A firm's being regulated or not does not affect its leverage level.

11-59. Two-way ANOVA.

11-60. Use analysis of covariance. Run it as a regression; Length of Stay is the concomitant variable.

11-61. Early investment is not statistically significant (or may be collinear with another variable). Rerun the regression without it. The dummy variables are both significant: there is a distinct line (or plane, if you do include the insignificant variable) for each type of firm.

11-62. This is a second-order regression model in three independent variables with cross-terms.

11-63. The STEPWISE routine chooses Price and M1 × Price as the best set of explanatory variables. This gives the estimated regression relationship: Exports = 1.39 + 0.0229 Price + 0.00248 M1 × Price. The t-statistics are 2.36, 4.57, and 9.08, respectively. R² = 0.822.

11-64. The STEPWISE routine chooses the three original variables: Prod, Prom, and Book, with no squares. Thus the original regression model of Example 11-3 is better than a model with squared terms.
Example 11-3 with production costs squared: higher s than the original model.

Multiple Regression Results:

            b         s(b)      t         p-value   VIF
Intercept   7.04103   5.82083   1.20963   0.2451
prod        3.10543   1.76478   1.75967   0.0988    34.5783
promo       2.2761    0.262     8.6887    0.0000    1.7050
book        7.1125    1.9099    3.7241    0.0020    1.2454
prod^2      -0.017    0.1135    -0.15     0.8827    32.3282

ANOVA Table:

Source   SS        df   MS       F        FCritical   p-value
Regn.    6325.48   4    1581.4   109.07   3.0556      0.0000
Error    217.472   15   14.498
Total    6542.95   19   344.37

R² = 0.9668   Adjusted R² = 0.9579   s = 3.8076
Example 11-3 with production and promotion costs squared: higher s and slightly higher R².

Multiple Regression Results:

            b         s(b)      t         p-value   VIF
Intercept   5.30825   5.84748   0.90778   0.3794
prod        4.29943   1.95614   2.19792   0.0453    44.4155
promo       1.2803    0.8094    1.5817    0.1360    17.0182
book        6.7046    1.8942    3.5396    0.0033    1.2807
prod^2      -0.0948   0.1262    -0.7511   0.4651    41.7465
promo^2     0.0731    0.0564    1.297     0.2156    16.2580

ANOVA Table:

Source   SS        df   MS       F        FCritical   p-value
Regn.    6348.81   5    1269.8   91.564   2.9582      0.0000
Error    194.145   14   13.867
Total    6542.95   19   344.37

R² = 0.9703   Adjusted R² = 0.9597   s = 3.7239
Example 11-3 with promotion costs squared: slightly lower s, slightly higher R².

Multiple Regression Results:

            b         s(b)      t         p-value   VIF
Intercept   9.21031   2.64412   3.48332   0.0033
prod        2.86071   0.39039   7.3279    0.0000    1.8219
promo       1.5635    0.7057    2.2157    0.0426    13.3224
book        7.0476    1.8114    3.8908    0.0014    1.2062
promo^2     0.053     0.0489    1.0844    0.2953    12.5901

ANOVA Table:

Source   SS        df   MS       F        FCritical   p-value
Regn.    6340.98   4    1585.2   117.74   3.0556      0.0000
Error    201.967   15   13.464
Total    6542.95   19   344.37

R² = 0.9691   Adjusted R² = 0.9609   s = 3.6694
11-65. Use the following formula for testing the significance of each of the slope parameters: z = b_i / s(b_i), and use α = 0.05. Critical value of |z| = 1.96.
For After × Bankdep: z = -0.398 / 0.035 = -11.3714 (significant interaction)
For After × Bankdep × ROA: z = 2.7193 (significant interaction)
For After × ROA: z = -3.00 (significant interaction)
For Bankdep × ROA: z = -3.9178 (significant interaction)
An adjusted R-square of 0.53 indicates that 53% of the variation in bank equity is explained by the model, including the interactions among the independent variables.

11-66. The squared X1 variable and the cross-product term appear not significant. Drop the least significant term first, i.e., the squared X1, and rerun the regression. See what happens to the cross-product term then.

11-67. Try a quadratic regression (you should get a negative estimated x² coefficient).

11-68. Try a quadratic regression (you should get a positive estimated x² coefficient). Also try a cubic polynomial.

11-69. Linearizing a model; finding a more parsimonious model than is possible without a transformation; stabilizing the variance.
11-70. A transformed model may be more parsimonious, when the model describes the process well.

11-71. Try the transformation log Y.

11-72. A good model is log(Exports) versus log(M1) and log(Price). This model has R² = 0.8652. This implies a multiplicative relation.

11-73. A logarithmic model.

11-74. This dataset fits an exponential model, so use a logarithmic transformation to linearize it.

11-75. A multiplicative relation (Equation (11-26)) with multiplicative errors. The reported error term, ε, is the logarithm of the multiplicative error term. The transformed error term is assumed to satisfy the usual model assumptions.

11-76. An exponential model: Y = e^(β0 + β1x1 + β2x2) = e^(3.79 + 1.66x1 + 2.91x2)
11-77. No. We cannot find a transformation that will linearize this model.

11-78. Take logs of both sides of the equation, giving: log Q = log β0 + β1 log C + β2 log K + β3 log L + log ε

11-79. Take reciprocals of both sides of the equation.

11-80. The square-root transformation Y′ = √Y.

11-81. No. They minimize the sum of the squared deviations relevant to the estimated, transformed model.

11-82. It is possible that the relation between a firm's total assets and bank equity is not linear. Including the logarithm of a firm's total assets is an attempt to linearize that relationship.

11-83. Correlations:

        Prod   Prom   Book
Earn    .867   .882   .547
Prod           .638   .402
Prom                  .319
As evidenced by the relatively low correlations between the independent variables, multicollinearity does not seem to be serious here.
11-84. The VIFs are: 1.82, 1.70, 1.20. No severe multicollinearity is present.

11-85. The sample correlation is 0.740. VIF = 2.2, a minor multicollinearity problem.

11-86. a) Y = 11.031 + 0.41869 X1 - 7.2579 X2 + 37.181 X3

Multiple Regression Results:

            b         s(b)      t         p-value   VIF
Intercept   11.031    20.9905   0.52552   0.6107
X1          0.41869   0.28418   1.47334   0.1714    1.0561
X2          -7.2579   5.3287    -1.362    0.2031    557.7
X3          37.181    26.545    1.4007    0.1916    557.9

ANOVA Table:

Source   SS        df   MS       F        FCritical   p-value
Regn.    2459.78   3    819.93   1.3709   3.7083      0.3074
Error    5981.02   10   598.1
Total    8440.8    13   649.29

R² = 0.2914   Adjusted R² = 0.0788   s = 24.456

b) Y = 20.8808 + 0.29454 X1 + 16.583 X2 - 81.717 X3

Multiple Regression Results:

            b         s(b)      t         p-value   VIF
Intercept   20.8808   23.5983   0.88484   0.3970
X1          0.29454   0.29945   0.98361   0.3485    1.0262
X2          16.583    23.96     0.6921    0.5046    9867.0
X3          -81.717   119.5     -0.6838   0.5096    9867.4

ANOVA Table:

Source   SS        df   MS       F        FCritical   p-value
Regn.    1605.98   3    535.33   0.7832   3.7083      0.5300
Error    6834.82   10   683.48
Total    8440.8    13   649.29

R² = 0.1903   Adjusted R² = -0.0527   s = 26.143
c) All parameters of the equation change values and some change signs. X2 and X3 are correlated (0.9991). Solution: use either X2 or X3, but not both.

d) Yes, the correlation matrix indicated that X2 and X3 were correlated:

      X1        X2       X3
X1    1.0000
X2    -0.0137   1.0000
X3    -0.0237   0.9991   1.0000
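With a single pair of collinear predictors, the VIF can be read straight off their correlation, VIF = 1/(1 - r²); a small illustrative sketch using the values from 11-85 and 11-86:

```python
# VIF from a single pairwise correlation: VIF = 1 / (1 - r^2).

def vif_from_correlation(r):
    return 1.0 / (1.0 - r ** 2)

vif_mild = vif_from_correlation(0.740)     # 11-85: about 2.2, a minor problem
vif_severe = vif_from_correlation(0.9991)  # 11-86: X2 vs X3, an enormous VIF
```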
11-87. Artificially high variances of regression coefficient estimators; unexpected magnitudes of some coefficient estimates; sometimes wrong signs of these coefficients. Large changes in coefficient estimates and standard errors as a variable or a data point is added or deleted.

11-88. Perfect collinearity exists when at least one variable is a linear combination of other variables. This causes the determinant of the X′X matrix to be zero and thus the matrix to be non-invertible. The estimation procedure breaks down in such cases. (Other, less technical, explanations based on the text will suffice.)

11-89. Not true. Predictions may be good when carried out within the same region of the multicollinearity as was used in the estimation procedure.

11-90. No. There are probably no relationships between Y and either of the two independent variables.

11-91. X2 and X3 are probably collinear.

11-92. Delete one of the variables X2, X3, X4 to check for multicollinearity among a subset of these three variables, or whether they are all insignificant.

11-93. Drop some of the other variables one at a time and see what happens to the suspected sign of the estimate.

11-94. The purpose of the test is to check for a possible violation of the assumption that the regression errors are uncorrelated with each other.

11-95. Autocorrelation is correlation of a variable with itself, lagged back in time. Third-order autocorrelation is a correlation of a variable with itself lagged 3 periods back in time.

11-96. First-order autocorrelation is a correlation of a variable with itself lagged one period back in time. Not necessarily: a partial fifth-order autocorrelation may exist without a first-order autocorrelation.

11-97. 1) The test checks only for first-order autocorrelation. 2) The test may not be conclusive. 3) The usual limitations of a statistical test, owing to the two possible types of errors.
11-98. DW = 0.93, n = 21, k = 2
d_L = 1.13, d_U = 1.54; 4 - d_L = 2.87, 4 - d_U = 2.46
At the 0.10 level, there is some evidence of a positive first-order autocorrelation.

11-99. DW = 2.13, n = 20, k = 3
d_L = 1.00, d_U = 1.68; 4 - d_L = 3.00, 4 - d_U = 2.32
At the 0.10 level, there is no evidence of a first-order autocorrelation.
(Durbin-Watson d = 2.125388)

11-100. DW = 1.79, n = 10, k = 2
Since the table does not list values for n = 10, we will use the closest table values, those for n = 15 and k = 2: d_L = 0.95, d_U = 1.54; 4 - d_L = 3.05, 4 - d_U = 2.46. At the 0.10 level, there is no evidence of a first-order autocorrelation. Note that the table values decrease as n decreases, and thus our conclusion would probably also hold if we knew the actual critical points for n = 10 and used them.

11-101. Suppose that we have time-series data and that it is known that, if the data are autocorrelated, by the nature of the variables the correlation can only be positive. In such cases, where the hypothesis is made before looking at the actual data, a one-sided DW test may be appropriate. (And similarly for a negative autocorrelation.)

11-102. DW analysis of the results from Problem 11-39: Durbin-Watson d = 1.552891, with k = 2 independent variables and n = 10 for the sample size. Table 7 for the critical values of the DW statistic begins with sample sizes of 15, which is a little larger than our sample. Using the values for size 15 as an approximation, we have, for α = 0.05: d_L = 0.95 and d_U = 1.54. The value of d is slightly larger than d_U, indicating no autocorrelation.
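The DW values quoted above come from the software; as a hedged sketch, the statistic itself is simple to compute from the residuals (the residuals below are illustrative, not data from the text):

```python
# Durbin-Watson statistic: d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2.
# d near 2: no first-order autocorrelation; near 0: positive; near 4: negative.

def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

d = durbin_watson([1.0, 2.0, 3.0, 2.0, 1.0])  # hypothetical residuals
```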
Residual plot with employees on x-axis:
(Residual plot: residuals, roughly -2000 to 2000, plotted against Employees, 0 to 120,000.)
11-103. F(r, n - (k + 1)) = ((SSE_R - SSE_F) / r) / MSE_F = ((6.996 - 6.9898)/2) / 0.1127 = 0.0275
Cannot reject H0. The two variables should definitely be dropped; they add nothing to the model.

11-104. Y = 47.16 + 1.599X1 + 1.149X2. The STEPWISE regression routine selects both variables for the equation. R² = 0.961.

11-105. The STEPWISE procedure selects all three variables. R² = 0.9667.

11-106. All-possible-regressions is the best procedure because it evaluates every possibility. It is expensive in computer time; however, as computing power and speed increase, this becomes a very viable option. Forward selection is limited by the fact that once a variable is in, there is no way it can come out once it becomes insignificant in the presence of new variables. Backward elimination is similarly limited. Stepwise regression is an excellent method that enjoys very wide use and that has stood the test of time. It has the advantages of both the forward and the backward methods, without their limitations.

11-107. Because a variable may lose explanatory power and become insignificant once other variables are added to the model.

11-108. Highest adjusted R²; lowest MSE; highest R² for a given number of variables and the assessment of the increase in R² as we increase the number of variables; Mallows's Cp.

11-109. No. There may be several different "best" models. A model may be best using one criterion, and not the best using another criterion.
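Looking back at 11-103, the partial F computation there can be sketched as follows (values taken from that solution):

```python
# Partial F test for dropping r variables from a full model:
# F(r, n-(k+1)) = ((SSE_R - SSE_F) / r) / MSE_F

def partial_f(sse_reduced, sse_full, r, mse_full):
    return ((sse_reduced - sse_full) / r) / mse_full

f_stat = partial_f(6.996, 6.9898, 2, 0.1127)  # about 0.0275: cannot reject H0
```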
11-110. Results will vary. Sample regression for Australia. (Data source: Foreign Statistics / Handbook of International Economic Statistics / Tables)

Australia:

Year   Real GDP   Defense Exp % GDP   Population   Grain Yields
1970   171        2.3                 14.6         1,219
1980   238        2.7                 17.0         1,052
1990   328        2.2                 17.3         1,670
1992   330        2.3                 17.5         1,800
1993   342        2.6                 17.7         2,000
1994   359        2.5                 17.9         1,230
1995   369        2.7                 18.1         1,800
1996   382        2.6                 18.3         2,090
1997   394        2.5                 18.4         1,790

Multiple Regression Results:

                    b         s(b)      t         p-value   VIF
Intercept           -583.38   123.753   -4.714    0.0053
Defense Exp % GDP   -64.709   45.0667   -1.4358   0.2105    1.3387
Population          58.04     8.8057    6.5912    0.0012    2.0253
Grain Yields        0.035     0.0246    1.4181    0.2154    1.6331

ANOVA Table:

Source   SS        df   MS       F        FCritical   p-value
Regn.    40654.3   3    13551    33.218   5.4094      0.0010
Error    2039.75   5    407.95
Total    42694     8    5336.8

R² = 0.9522   Adjusted R² = 0.9236   s = 20.198

Correlation matrix:

               Defense   Population   Grain Yields
Defense        1.0000
Population     0.4444    1.0000
Grain Yields   0.0689    0.5850       1.0000
Real GDP       0.2573    0.9484       0.7023
Partial F Calculations (Australia):

Number of independent variables in the full model: k = 3
Number of independent variables dropped from the model: r = 2
SSE_F = 2039.748
SSE_R = 39867.85
Partial F = 46.36369, p-value = 0.0010
The model is significant, with a high R², a high F-value, and low multicollinearity.

11-111. Substituting a variable with its logarithm transforms a non-linear model to a linear model. In this case, the logarithm of the size of the fund has a linear relationship with the dependent variable.

11-112. Since the t-statistic for each variable alone is significant, and given the R-square, we can conclude that a good linear relation exists between the dependent and independent variables. Since the t-statistics of the cross products are not significant, the cross-product terms do not contribute. In conclusion, there is only a linear relationship between the dependent and independent variables.

11-113. Using MINITAB: Regression Analysis: Com. Eff. versus Sincerity, Excitement, ...

The regression equation is
Com. Eff. = -36.5 + 0.098 Sincerity + 1.99 Excitement + 0.507 Ruggedness - 0.366 Sophistication
Predictor        Coef      SE Coef   T       P
Constant         -36.49    24.27     -1.50   0.171
Sincerity        0.0983    0.3021    0.33    0.753
Excitement       1.9859    0.2063    9.63    0.000
Ruggedness       0.5071    0.7540    0.67    0.520
Sophistication   -0.3664   0.3643    -1.01   0.344

S = 3.68895   R-Sq = 94.6%   R-Sq(adj) = 91.8%
Based on the p-values for the estimated coefficients, only the assessed excitement variable is significant. The adjusted R-square indicates that 91.8% of the variation in commercial effectiveness is explained by the model. The ANOVA test indicates that a linear relation exists between the dependent and independent variables.
Analysis of Variance

Source           DF   SS        MS       F       P
Regression       4    1890.36   472.59   34.73   0.000
Residual Error   8    108.87    13.61
Total            12   1999.23
11-114. STEPWISE chooses only Number of Rooms and Assessed Value. b0 = 91018, b1 = 7844, b2 = 0.2338, R² = 0.591.

11-115. Answers to this web exercise will vary with selected countries and date of access.

Case 15: Return on Capital for Four Different Sectors

Indicator variables used:

Sector         I1   I2   I3
Banking        0    0    0
Computers      1    0    0
Construction   0    1    0
Energy         0    0    1
1. Multiple Regression Results (Chapter 11 Case - ROC):

            b          s(b)       t         p-value   VIF
Intercept   14.6209    2.51538    5.81259   0.0000
Sales       2.30E-05   2.60E-05   0.88781   0.3770    1.2472
Oper M      0.0824     0.0553     1.4905    0.1396    1.2212
Debt/C      -0.0919    0.0444     -2.0692   0.0414    1.6224
I1          10.051     2.0249     4.9636    0.0000    1.8560
I2          2.8059     2.2756     1.2331    0.2208    1.8219
I3          -1.6419    1.8725     -0.8769   0.3829    1.9096
Based on the regression coefficients of I1, I2, I3, the ranking of the sectors from highest return to lowest will be: Computers, Construction, Banking, Energy.

2. From the "Partial F" sheet, the p-value is almost zero. Hence the type of industry is significant.
3. 95% Prediction Intervals:

Sector         95% Prediction Interval
Banking        12.9576 ± 12.977
Computers      23.0082 ± 13.295
Construction   15.7635 ± 13.139
Energy         11.3157 ± 12.864
Chapter 12 - Time Series, Forecasting, and Index Numbers
CHAPTER 12 TIME SERIES, FORECASTING, AND INDEX NUMBERS

12-1. Trend analysis is a quick method of determining in which general direction the data are moving through time. The method lacks, however, the theoretical justification of regression analysis because of the inherent autocorrelations and the intended use of the method in extrapolation beyond the estimation data set.
12-2. The trend regression is: b0 = 28.7273, b1 = -0.6947, r² = 0.511
ŷ = 28.7273 - 0.6947 t
ŷ(Jul-2007) = 12.055% for t = 24

(Using the template: "Trend Forecast.xls")

Forecasting with Trend:

t    Forecast Z-hat
24   12.0553
25   11.3607
26   10.666
27   9.9713
28   9.27668

Regression Statistics: r² = 0.5111, MSE = 22.24426, Slope = -0.69466, Intercept = 28.72727

Forecast for July, 2007 (t = 24) = 12.0553%

12-3.
The trend regression is: b0 = 34.818, b1 = 12.566, r² = 0.9858
ŷ = 34.818 + 12.566 t
ŷ(2008) = 198.182, ŷ(2009) = 210.748

(Using the template: "Trend Forecast.xls")

Forecasting with Trend:

Data:
Period   t    Zt
1996     1    53
1997     2    65
1998     3    74
1999     4    85
2000     5    92
2001     6    105
2002     7    120
2003     8    128
2004     9    144
2005     10   158
2006     11   179
2007     12   195

Forecasts:
t    Z-hat
13   198.182
14   210.748
15   223.315
16   235.881
17   248.448
18   261.014
19   273.58
20   286.147
21   298.713
22   311.28
23   323.846
24   336.413

Regression Statistics: r² = 0.9858, MSE = 32.51189, Slope = 12.56643, Intercept = 34.81818

Forecast for 2008 (t = 13) = 198.182 and for 2009 (t = 14) = 210.748
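A hedged check of the 12-3 trend regression (numpy in place of the text's Excel template):

```python
import numpy as np

# Annual data for 1996-2007, coded t = 1..12.
z = np.array([53, 65, 74, 85, 92, 105, 120, 128, 144, 158, 179, 195], dtype=float)
t = np.arange(1, 13)

slope, intercept = np.polyfit(t, z, 1)   # reported: 12.566 and 34.818
forecast_2008 = intercept + slope * 13   # reported: 198.182
forecast_2009 = intercept + slope * 14   # reported: 210.748
```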
12-4. The trend regression is: b0 = -0.873, b1 = 3.327, r² = 0.8961
ŷ = -0.873 + 3.327 t
ŷ = 39.05% for t = 12

(Using the template: "Trend Forecast.xls")

Forecasting with Trend:

t    Forecast Z-hat
12   39.0545
13   42.3818
14   45.7091
15   49.0364
16   52.3636

Regression Statistics: r² = 0.8961, MSE = 15.68081, Slope = 3.327273, Intercept = -0.87273

Forecast for next year (t = 12) = 39.05%

12-5.
No, because of the seasonality.

12-6. No. Cycles are not well modeled by trend analysis.

12-7. The term 'seasonal variation' is reserved for variation with a cycle of one year.

12-8. There will be too few degrees of freedom for error.

12-9. The weather, for one thing, changes from year to year. Thus sales of winter clothing, as an example, would have a variable seasonal component.
12-10. Using MINITAB to conduct a multiple regression with a time variable and 11 dummy variables:

Regression Analysis: profit versus t, jan, ...

The regression equation is
profit = 0.163 + 0.0521 t + 0.123 jan + 0.121 feb + 0.319 mar + 0.567 apr + 0.615 may + 0.413 jun + 0.510 jul + 0.758 aug + 0.856 sep + 0.904 oct + 0.602 nov

Predictor   Coef      SE Coef   T      P
Constant    0.1625    0.3104    0.52   0.611
t           0.05208   0.01129   4.61   0.001
jan         0.1229    0.3543    0.35   0.735
feb         0.1208    0.3505    0.34   0.737
mar         0.3188    0.3470    0.92   0.378
apr         0.5667    0.3439    1.65   0.128
may         0.6146    0.3411    1.80   0.099
jun         0.4125    0.3387    1.22   0.249
jul         0.5104    0.3366    1.52   0.158
aug         0.7583    0.3349    2.26   0.045
sep         0.8563    0.3336    2.57   0.026
oct         0.9042    0.3326    2.72   0.020
nov         0.6021    0.3320    1.81   0.097

S = 0.331834   R-Sq = 83.2%   R-Sq(adj) = 64.8%

Analysis of Variance

Source           DF   SS       MS       F      P
Regression       12   5.9783   0.4982   4.52   0.009
Residual Error   11   1.2112   0.1101
Total            23   7.1896

The adjusted R-square is reasonable. Setting t = 25, jan = 1, and the rest of the month dummies = 0, we get a forecasted value for Jan 2007:

Predicted Values for New Observations

New Obs   Fit      SE Fit   95% CI             95% PI
1         1.5875   0.3104   (0.9043, 2.2707)   (0.5874, 2.5876)

Values of Predictors for New Observations: t = 25, jan = 1, all other month dummies = 0.

12-11. Using trend analysis, the trend regression is: b0 = 8165707, b1 = 40169.72, r² = 0.9715
ŷ = 8165707 + 40169.72 t
ŷ = 8728083 for t = 14

(Using the template: "Trend Forecast.xls")

Forecasting with Trend:

t    Forecast Z-hat
14   8728083
15   8768252
16   8808422
17   8848592
18   8888761

Regression Statistics: r² = 0.9715, MSE = 7.82E+08, Slope = 40169.72, Intercept = 8165707

Forecast for next year (t = 14) = 8728083
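The January 2007 forecast in 12-10 is just the fitted value of the dummy-variable regression at t = 25; a quick sketch using the reported coefficients:

```python
# Fitted value with t = 25, jan = 1 and all other month dummies 0:
b_const, b_t, b_jan = 0.1625, 0.05208, 0.1229  # coefficients from the MINITAB output

fit_jan_2007 = b_const + b_t * 25 + b_jan      # reported fit: 1.5875
```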
12-12. Using a computer, the linear regression trend line is Zhat(t) = 7.2043 - 0.0194 t.

t    Mon.   Z(t)   Trend     CMA    C(t) =         Ratio to     Seasonal   Deseasoned
            Zhat(t)          CMA/Zhat(t)   Moving Avg.   Index S    Z(t)/S%
1    Jul    7.40   7.18                                   95.68      7.73
2    Aug    6.80   7.17                                   92.25      7.37
3    Sep    6.40   7.15                                   90.57      7.07
4    Oct    6.60   7.13                                   97.57      6.76
5    Nov    6.50   7.11                                   95.96      6.77
6    Dec    6.00   7.09                                   92.22      6.51
7    Jan    7.00   7.07      7.02   0.993   99.76         102.47     6.83
8    Feb    6.70   7.05      7.01   0.995   95.54         98.21      6.82
9    Mar    8.20   7.03      7.05   1.002   116.38        114.41     7.17
10   Apr    7.80   7.01      7.10   1.012   109.92        110.59     7.05
11   May    7.70   6.99      7.15   1.022   107.76        109.60     7.03
12   Jun    7.30   6.97      7.20   1.032   101.45        100.45     7.27
13   Jul    7.00   6.95      7.25   1.043   96.55         95.68      7.32
14   Aug    7.10   6.93      7.30   1.052   97.32         92.25      7.70
15   Sep    6.90   6.91      7.30   1.057   94.47         90.57      7.62
16   Oct    7.30   6.89      7.29   1.057   100.17        97.57      7.48
17   Nov    7.00   6.87      7.28   1.059   96.16         95.96      7.29
18   Dec    6.70   6.86      7.25   1.058   92.41         92.22      7.27
19   Jan    7.60   6.84      7.20   1.053   105.62        102.47     7.42
20   Feb    7.20   6.82      7.11   1.043   101.29        98.21      7.33
21   Mar    7.90   6.80      7.00   1.029   112.92        114.41     6.90
22   Apr    7.70   6.78      6.89   1.017   111.73        110.59     6.96
23   May    7.60   6.76      6.79   1.005   111.90        109.60     6.93
24   Jun    6.70   6.74      6.71   0.996   99.88         100.45     6.67
25   Jul    6.30   6.72      6.62   0.985   95.21         95.68      6.58
26   Aug    5.70   6.70      6.51   0.971   87.58         92.25      6.18
27   Sep    5.60   6.68      6.43   0.963   87.05         90.57      6.18
28   Oct    6.10   6.66      6.40   0.960   95.37         97.57      6.25
29   Nov    5.80   6.64                                   95.96      6.04
30   Dec    5.90   6.62                                   92.22      6.40
31   Jan    6.20   6.60                                   102.47     6.05
32   Feb    6.00   6.58                                   98.21      6.11
33   Mar    7.30   6.56                                   114.41     6.38
34   Apr    7.40   6.54                                   110.59     6.69

FORECAST: t = 35 (May): (Zhat = 6.525)(S = 109.60)/100 = 7.15

Template Forecast is 7.045
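The final forecasting step of 12-12 (extend the trend to t = 35 and reseasonalize with the May index) can be sketched as:

```python
# Trend line and May seasonal index from the 12-12 computation.
def trend(t):
    return 7.2043 - 0.0194 * t

may_index = 109.60
forecast_may = trend(35) * may_index / 100   # about 7.15
```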
12-13. (Using the template: "Trend+Season Forecasting.xls", sheet: monthly)

Forecasting with Trend and Seasonality. Forecast for Sep, 2006 = 0.33587.

Data:
t    Year   Month    Y      Deseasonalized
1    2004   11 Nov   0.38   0.40913
2    2004   12 Dec   0.38   0.41684
3    2005   1 Jan    0.44   0.45224
4    2005   2 Feb    0.42   0.42406
5    2005   3 Mar    0.44   0.48048
6    2005   4 Apr    0.46   0.49272
7    2005   5 May    0.48   0.45687
8    2005   6 Jun    0.49   0.45687
9    2005   7 Jul    0.51   0.4539
10   2005   8 Aug    0.52   0.44922
11   2005   9 Sep    0.45   0.44242
12   2005   10 Oct   0.4    0.43222
13   2005   11 Nov   0.39   0.4199
14   2005   12 Dec   0.37   0.40587
15   2006   1 Jan    0.38   0.39057
16   2006   2 Feb    0.37   0.37357
17   2006   3 Mar    0.33   0.36036
18   2006   4 Apr    0.33   0.35347
19   2006   5 May    0.32   0.30458
20   2006   6 Jun    0.32   0.29837
21   2006   7 Jul    0.32   0.2848
22   2006   8 Aug    0.31   0.26781

Trend Equation: Intercept = 0.518283, Slope = -0.00818

Forecasts:
t    Year   Month    Y
23   2006   9 Sep    0.33587
24   2006   10 Oct   0.29803
25   2006   11 Nov   0.29152
26   2006   12 Dec   0.27867

12-14. (Using the template: "Trend+Season Forecasting.xls", sheet: monthly)
Forecasting with Trend and Seasonality. Forecast for Oct, 2006 = 28.73718.

Data:
t    Year   Month    Y    Deseasonalized
1    2005   1 Jan    14   16.8856
2    2005   2 Feb    10   22.2728
3    2005   3 Mar    50   54.0922
4    2005   4 Apr    24   24.6668
5    2005   5 May    16   15.3033
6    2005   6 Jun    15   15.8805
7    2005   7 Jul    20   22.3533
8    2005   8 Aug    42   22.5141
9    2005   9 Sep    18   21.3884
10   2005   10 Oct   26   20.2627
11   2005   11 Nov   21   20.6647
12   2005   12 Dec   20   21.4286
13   2006   1 Jan    18   21.71
14   2006   2 Feb    10   22.2728
15   2006   3 Mar    22   23.8006
16   2006   4 Apr    24   24.6668
17   2006   5 May    26   24.8678
18   2006   6 Jun    24   25.4087
19   2006   7 Jul    18   20.1179
20   2006   8 Aug    58   31.0909
21   2006   9 Sep    40   47.5297

Trend Equation: Intercept = 22.54861, Slope = -0.00694

Forecasts:
t    Year   Month    Y
22   2006   10 Oct   28.73718
23   2006   11 Nov   22.75217
24   2006   12 Dec   20.88982
25   2007   1 Jan    18.55136
The forecast for October is considerably less than the actual percentages recorded for August and September. The forecast reflects the historical percentage of negative stories instead of the recent past history.

12-15. (Using the template: "Trend+Season Forecasting.xls")

Forecasting with Trend and Seasonality (quarterly):

t   Year   Q   Y     Deseasonalized
1   2005   1   3.4   3.869621
2   2005   2   4.5   4.150717
3   2005   3   4     4.258289
4   2005   4   5     4.554288
5   2006   1   4.2   4.78012
6   2006   2   5.4   4.98086
7   2006   3   4.9   5.216404
8   2006   4   5.7   5.191888
9   2007   1   4.6   5.23537

Forecasts:
t    Year   Q   Y
10   2007   2   6.20676
11   2007   3   5.56327
12   2007   4   6.71894

Seasonal Indices:
Q   Index
1   87.86
2   108.42
3   93.93
4   109.79
    400 (total)
Forecast for Q2, 2007 = 6.20676

12-16. Assuming a weight of 0.4. (Using the template: "Exponential Smoothing.xls")

Exponential Smoothing: MAE = 3.3688, MAPE = 7.91%, MSE = 18.2177

Period   Actual   Forecast
45       27       27.6959
46       26       27.4175
47       27       26.8505
48       28       26.9103
49                27.3462

Forecast for next quarter = 27.3462
12-17. Using a computer:

w = 0.3: Zhat(1) = Z(1) = 57
Zhat( 2): 0.3(57.00) + 0.7(57.00) = 57.00
Zhat( 3): 0.3(58.00) + 0.7(57.00) = 57.30
Zhat( 4): 0.3(60.00) + 0.7(57.30) = 58.11
Zhat( 5): 0.3(54.00) + 0.7(58.11) = 56.88
Zhat( 6): 0.3(56.00) + 0.7(56.88) = 56.61
Zhat( 7): 0.3(53.00) + 0.7(56.61) = 55.53
Zhat( 8): 0.3(55.00) + 0.7(55.53) = 55.37
Zhat( 9): 0.3(59.00) + 0.7(55.37) = 56.46
Zhat(10): 0.3(62.00) + 0.7(56.46) = 58.12
Zhat(11): 0.3(57.00) + 0.7(58.12) = 57.79
Zhat(12): 0.3(50.00) + 0.7(57.79) = 55.45
Zhat(13): 0.3(48.00) + 0.7(55.45) = 53.21
Zhat(14): 0.3(52.00) + 0.7(53.21) = 52.85
Zhat(15): 0.3(55.00) + 0.7(52.85) = 53.50
Zhat(16): 0.3(58.00) + 0.7(53.50) = 54.85
Zhat(17): 0.3(61.00) + 0.7(54.85) = 56.69

w = 0.8: Zhat(1) = Z(1) = 57
Zhat( 2): 0.8(57.00) + 0.2(57.00) = 57.00
Zhat( 3): 0.8(58.00) + 0.2(57.00) = 57.80
Zhat( 4): 0.8(60.00) + 0.2(57.80) = 59.56
Zhat( 5): 0.8(54.00) + 0.2(59.56) = 55.11
Zhat( 6): 0.8(56.00) + 0.2(55.11) = 55.82
Zhat( 7): 0.8(53.00) + 0.2(55.82) = 53.56
Zhat( 8): 0.8(55.00) + 0.2(53.56) = 54.71
Zhat( 9): 0.8(59.00) + 0.2(54.71) = 58.14
Zhat(10): 0.8(62.00) + 0.2(58.14) = 61.23
Zhat(11): 0.8(57.00) + 0.2(61.23) = 57.85
Zhat(12): 0.8(50.00) + 0.2(57.85) = 51.57
Zhat(13): 0.8(48.00) + 0.2(51.57) = 48.71
Zhat(14): 0.8(52.00) + 0.2(48.71) = 51.34
Zhat(15): 0.8(55.00) + 0.2(51.34) = 54.27
Zhat(16): 0.8(58.00) + 0.2(54.27) = 57.25
Zhat(17): 0.8(61.00) + 0.2(57.25) = 60.25
The w = .8 forecasts follow the raw data much more closely. This makes sense because the raw data jump back and forth fairly abruptly, so we need a high w for the forecasts to respond to those oscillations sooner.
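The recursive computations above can be reproduced with a short function; a hedged sketch (unrounded arithmetic, so late values may differ from a hand-rounded table by a cent or two):

```python
# Simple exponential smoothing: Zhat(t+1) = w*Z(t) + (1-w)*Zhat(t), Zhat(1) = Z(1).

def smooth(z, w):
    fcast = [z[0]]                 # Zhat(1) = Z(1)
    for obs in z:
        fcast.append(w * obs + (1 - w) * fcast[-1])
    return fcast[1:]               # Zhat(2), Zhat(3), ...

z = [57, 58, 60, 54, 56, 53, 55, 59, 62, 57, 50, 48, 52, 55, 58, 61]
low = smooth(z, 0.3)    # smoother forecasts
high = smooth(z, 0.8)   # forecasts that track the jumpy data more closely
```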
12-18. Using a computer: w = 0.7, Zhat(1) = Z(1) = 195
Zhat( 2): 0.7(195.00) + 0.3(195.00) = 195.00
Zhat( 3): 0.7(193.00) + 0.3(195.00) = 193.60
Zhat( 4): 0.7(190.00) + 0.3(193.60) = 191.08
Zhat( 5): 0.7(185.00) + 0.3(191.08) = 186.82
Zhat( 6): 0.7(180.00) + 0.3(186.82) = 182.05
Zhat( 7): 0.7(190.00) + 0.3(182.05) = 187.61
Zhat( 8): 0.7(185.00) + 0.3(187.61) = 185.78
Zhat( 9): 0.7(186.00) + 0.3(185.78) = 185.94
Zhat(10): 0.7(184.00) + 0.3(185.94) = 184.58
Zhat(11): 0.7(185.00) + 0.3(184.58) = 184.87
Zhat(12): 0.7(198.00) + 0.3(184.87) = 194.06
Zhat(13): 0.7(199.00) + 0.3(194.06) = 197.52
Zhat(14): 0.7(200.00) + 0.3(197.52) = 199.26
Zhat(15): 0.7(201.00) + 0.3(199.26) = 200.48
Zhat(16): 0.7(199.00) + 0.3(200.48) = 199.44
Zhat(17): 0.7(187.00) + 0.3(199.44) = 190.73
Zhat(18): 0.7(186.00) + 0.3(190.73) = 187.42
Zhat(19): 0.7(191.00) + 0.3(187.42) = 189.93
Zhat(20): 0.7(195.00) + 0.3(189.93) = 193.48
Zhat(21): 0.7(200.00) + 0.3(193.48) = 198.04
Zhat(22): 0.7(200.00) + 0.3(198.04) = 199.41
Zhat(23): 0.7(190.00) + 0.3(199.41) = 192.82
Zhat(24): 0.7(186.00) + 0.3(192.82) = 188.05
Zhat(25): 0.7(196.00) + 0.3(188.05) = 193.61
Zhat(26): 0.7(198.00) + 0.3(193.61) = 196.68
Zhat(27): 0.7(200.00) + 0.3(196.68) = 199.01
----- FORECAST -----
Zhat(28): 0.7(200.00) + 0.3(199.01) = 199.70
Exponential Smoothing (w = 0.7)    MAE = 4.8241   MAPE = 2.52%   MSE = 34.8155

  t    Zt    Forecast    |Error|     %Error    Error^2
  1   195    195
  2   193    195
  3   190    193.600      3.6        1.89%      12.96
  4   185    191.080      6.08       3.29%      36.9664
  5   180    186.824      6.824      3.79%      46.567
  6   190    182.047      7.9528     4.19%      63.247
  7   185    187.614      2.61416    1.41%       6.83383
  8   186    185.784      0.21575    0.12%       0.04655
  9   184    185.935      1.93527    1.05%       3.74529
 10   185    184.581      0.41942    0.23%       0.17591
 11   198    184.874     13.1258     6.63%     172.287
 12   199    194.062      4.93775    2.48%      24.3814
 13   200    197.519      2.48132    1.24%       6.15697
 14   201    199.256      1.7444     0.87%       3.04292
 15   199    200.477      1.47668    0.74%       2.18059
 16   187    199.443     12.443      6.65%     154.828
 17   186    190.733      4.7329     2.54%      22.4004
 18   191    187.420      3.58013    1.87%      12.8173
 19   195    189.926      5.07404    2.60%      25.7459
 20   200    193.478      6.52221    3.26%      42.5392
 21   200    198.043      1.95666    0.98%       3.82853
 22   190    199.413      9.413      4.95%      88.6046
 23   186    192.824      6.8239     3.67%      46.5656
 24   196    188.047      7.95283    4.06%      63.2475
 25   198    193.614      4.38585    2.22%      19.2357
 26   200    196.684      3.31575    1.66%      10.9942
 27   200    199.005      0.99473    0.50%       0.98948
 28          199.702
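The smoothing recursion and the error measures above are easy to verify in code. A minimal Python sketch (variable names are ours, not from the template), with the series hard-coded from the problem:

```python
# Simple exponential smoothing: F(t+1) = w*Z(t) + (1-w)*F(t), with F(1) = Z(1).
# Error measures are computed over t = 3..27, matching the template's
# convention of skipping the start-up forecasts.

def exp_smooth(z, w):
    """Return one-step-ahead forecasts F(1)..F(n+1) for the series z."""
    f = [z[0]]                               # F(1) = Z(1)
    for obs in z:                            # each observation yields the next forecast
        f.append(w * obs + (1 - w) * f[-1])
    return f

z = [195, 193, 190, 185, 180, 190, 185, 186, 184, 185, 198, 199, 200,
     201, 199, 187, 186, 191, 195, 200, 200, 190, 186, 196, 198, 200, 200]
f = exp_smooth(z, 0.7)

errs = [abs(z[t] - f[t]) for t in range(2, len(z))]        # t = 3..27
mae  = sum(errs) / len(errs)
mape = sum(abs(z[t] - f[t]) / z[t] for t in range(2, len(z))) / len(errs)
mse  = sum(e * e for e in errs) / len(errs)

print(round(f[-1], 2), round(mae, 4), round(mse, 4))
```

Running this reproduces the template's numbers: forecast for period 28 of about 199.70, MAE of about 4.8241, and MSE of about 34.8155.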
12-19. Assuming a weight of 0.9 (using the template: "Exponential Smoothing.xls")

Exponential Smoothing (w = 0.9)

 t    Zt        Forecast
 1   2565942    2565942
 2   2724292    2565942
 3   3235231    2708457
 4   3863508    3182554
 5   4819747    3795413
 6   5371689    4717314
 7   6119114    5306251
 8              6037828
Forecast for 2007 = 6037828

12-20. Answers will vary.

12-21. Equation (12-11):

Ẑ(t+1) = wZ(t) + w(1−w)Z(t−1) + w(1−w)^2 Z(t−2) + w(1−w)^3 Z(t−3) + …

The same equation for Ẑ(t) (shifting all subscripts back by 1):

Ẑ(t) = wZ(t−1) + w(1−w)Z(t−2) + w(1−w)^2 Z(t−3) + w(1−w)^3 Z(t−4) + …

Multiplying this second equation throughout by (1−w) gives:

(1−w)Ẑ(t) = w(1−w)Z(t−1) + w(1−w)^2 Z(t−2) + w(1−w)^3 Z(t−3) + w(1−w)^4 Z(t−4) + …

Now note that all the terms on the right-hand side of this equation are identical to the terms in Equation (12-11) after the first term, wZ(t). Hence we can substitute the left-hand side of our last equation, (1−w)Ẑ(t), for all the terms past the first in Equation (12-11). This gives us:

Ẑ(t+1) = wZ(t) + (1−w)Ẑ(t)
which is Equation (12-12).

12-22. Equation (12-13) is:

Ẑ(t+1) = Z(t) + (1−w)(Ẑ(t) − Z(t))

Multiplying out, we get:

Ẑ(t+1) = Z(t) + (1−w)Ẑ(t) − (1−w)Z(t) = Z(t) − (1−w)Z(t) + (1−w)Ẑ(t) = wZ(t) + (1−w)Ẑ(t),

which is Equation (12-12).
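The equivalence derived in 12-21 and 12-22 can also be checked numerically. The sketch below (Python; the series is made up for illustration) confirms that the weighted-sum form of Equation (12-11), together with the boundary term that arises from starting the recursion at Ẑ(1) = Z(1), matches the recursive form of Equation (12-12):

```python
# Verify numerically that the weighted-sum form (12-11) equals the
# recursive form (12-12) when the recursion starts from F(1) = Z(1).

def recursive(z, w):
    """Ẑ(n+1) by the recursion Ẑ(t+1) = w*Z(t) + (1-w)*Ẑ(t)."""
    f = z[0]
    for obs in z:
        f = w * obs + (1 - w) * f
    return f

def weighted_sum(z, w):
    """Ẑ(n+1) as w*Z(n) + w(1-w)*Z(n-1) + ... plus the start-up term."""
    n = len(z)
    s = sum(w * (1 - w) ** j * z[n - 1 - j] for j in range(n))
    return s + (1 - w) ** n * z[0]   # boundary term from Ẑ(1) = Z(1)

z = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]   # made-up demonstration series
print(recursive(z, 0.4) - weighted_sum(z, 0.4))   # essentially zero
```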
12-23. Simply divide each CPI by 289.1/100 = 2.891; thus:

Year    Old CPI    New CPI
1950     72.1       24.9
1951     77.8       26.9
1952     79.5       27.5
1953     80.1       27.7
 ...      ...        ...
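Rebasing an index series like this is a one-line computation. A minimal Python sketch using the first few years from the table:

```python
# Rebase a CPI series: dividing every value by 289.1/100 = 2.891 makes the
# year whose old CPI is 289.1 the new base (new index = 100 there).
old_cpi = {1950: 72.1, 1951: 77.8, 1952: 79.5, 1953: 80.1}
factor = 289.1 / 100
new_cpi = {year: round(v / factor, 1) for year, v in old_cpi.items()}
print(new_cpi)   # {1950: 24.9, 1951: 26.9, 1952: 27.5, 1953: 27.7}
```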
12-24. 168.77 in July 2000 and 173.48 in June 2001.

12-25. A simple price index reflects changes in a single price variable over time, relative to a single base time.

12-26. Index numbers are used as deflators for comparing values and prices over time in a way that prevents a given inflationary factor from affecting comparisons. They are also used to provide an aggregate measure of changes over time in several related variables.

12-27.
a) Price index with base year 1988:

Year    Price    Index (base 1988)
1984     175      175
1985     190      190
1986     132      132
1987      96       96
1988     100      100
1989      78       78
1990     131      131
1991     135      135
1992     154      154
1993     163      163
1994     178      178
1995     170      170
1996     145      145
1997     133      133

b) Just divide each index number by (1993 index)/(1988 index) = 163/100 = 1.63.

c) It fell, from 145% of the 1988 output down to 133% of that output.

d) Big increase in the mid-'80s, then a sharp drop in 1986, tumbling for three more years, then slowly climbing back up until 1995, then a drop-off.
b) Price index rebased to base year 1993 (each index divided by 1.63):

Year    Price    Index (base 1993)
1984     175     107.36
1985     190     116.56
1986     132      80.982
1987      96      58.896
1988     100      61.35
1989      78      47.853
1990     131      80.368
1991     135      82.822
1992     154      94.479
1993     163     100
1994     178     109.2
1995     170     104.29
1996     145      88.957
1997     133      81.595
12-28. Divide each data point by (Jan. 2004 value)/100 = 1.44:
Jun. '03: 98.6   Jul. '03: 95.14   …
12-29. Since a yearly cycle has 12 months and there are only 18 data points, a seasonal/cyclical decomposition isn't feasible. Simple linear regression, with the successive months numbered 1, 2, ..., gives SALES = 4.23987 − 0.03870 MONTH; thus for July 2004 (month #19), the forecast is 3.5046. (Using the template: "Trend Forecast.xls")
Forecasting with Trend

Period   t    Zt
jan      1    4.4
feb      2    4.2
mar      3    3.8
apr      4    4.1
may      5    4.1
jun      6    4.0
jul      7    4.0
aug      8    3.9
sep      9    3.9
oct     10    3.8
nov     11    3.7
dec     12    3.7
jan     13    3.8
feb     14    3.9
mar     15    3.8
apr     16    3.7
may     17    3.5
jun     18    3.4

Forecast:
 t    Z-hat
19    3.50458
20    3.46588
21    3.42718

The forecast of sales for July 2004 is 3.5 million units.

Regression Statistics:
r² = 0.7285   MSE = 0.016906   Slope = −0.0387   Intercept = 4.239869
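The slope, intercept, and month-19 forecast can be reproduced with ordinary least squares. A minimal sketch in plain Python (variable names are ours, not from the template):

```python
# Fit Z = b0 + b1*t by least squares and forecast month 19.
z = [4.4, 4.2, 3.8, 4.1, 4.1, 4.0, 4.0, 3.9, 3.9,
     3.8, 3.7, 3.7, 3.8, 3.9, 3.8, 3.7, 3.5, 3.4]
t = list(range(1, len(z) + 1))

n = len(z)
t_bar = sum(t) / n
z_bar = sum(z) / n
slope = (sum((ti - t_bar) * (zi - z_bar) for ti, zi in zip(t, z))
         / sum((ti - t_bar) ** 2 for ti in t))
intercept = z_bar - slope * t_bar

forecast_19 = intercept + slope * 19
print(round(slope, 4), round(intercept, 4), round(forecast_19, 4))
```

This matches the template's output: slope −0.0387, intercept 4.2399, forecast 3.5046.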
12-30. Trend analysis is a quick, if sometimes inaccurate, method that can give good results. The additive and multiplicative TSCI models are sometimes useful, although they lack a firm theoretical framework. Exponential smoothing methods are good models. The ones described in this book do not handle seasonality, but extensions are possible. This author believes that Box-Jenkins ARIMA models are the way to go. One limitation of these models is the need for large data sets.

12-31. Exponential smoothing models smooth out sharp variations in the data and produce forecasts that follow a kind of "average" movement in the data. The greater the weighting factor w, the more closely the exponentially smoothed series follows the data, and the forecasts tend to follow the variations in the data more closely.
12-32. Using MINITAB: Stat > Time Series > Moving Average

Moving Average for Data
Moving Average Length: 4

Accuracy Measures:
MAPE   1.69534
MAD    1.75000
MSD    3.66964

Forecasts:
Period    Forecast    Lower     Upper
13        103.375     99.6204   107.130
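The data series for this problem is not reproduced in the solution, but the forecast mechanics are simple: a length-4 moving-average forecast for the next period is the mean of the last four observations. A sketch with made-up data (NOT the problem's series):

```python
# One-step-ahead forecast from a simple moving average of length 4.
def ma_forecast(series, length=4):
    window = series[-length:]          # last `length` observations
    return sum(window) / len(window)

demo = [100, 102, 101, 104, 103, 105, 102]   # made-up illustration data
print(ma_forecast(demo))   # (104 + 103 + 105 + 102) / 4 = 103.5
```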
Forecast for next period = 103.375

12-33. Assuming a weight of 0.4 (use the template: "Exponential Smoothing.xls")

Exponential Smoothing (w = 0.4)

 t    Zt    Forecast
 1    18    18
 2    17    18
 3    15    17.6
 4    14    16.56
 5    15    15.536
 6    11    15.3216
 7     8    13.593
 8     5    11.3558
 9     4     8.81347
10     3     6.88808
11     5     5.33285
12     4     5.19971
13     6     4.71983
14     5     5.2319
15     7     5.13914
16     8     5.88348
17           6.73009

y(2007) = 6.73009
12-34.
a) Raised the seasonal index for April to 99.38 from 99.29. We would expect the April index to change by a significant amount; the reason it did not is the averaging involved in the moving-average calculations.
b) Raised the seasonal index for April to 122.27 from 99.29.
c) Raised the seasonal index for December to 100.16 from 100.09. We would expect the December index to change by a significant amount; again it did not, because of the moving-average calculations.
d) Very high or low values for data points at the beginning or end of a series have little impact on the seasonal index, due to their limited influence in the moving-average computations.

12-35. (Using the template: "Trend Forecast.xls")

Forecasting with Trend

Period   t    Zt
1998     1    6.3
1999     2    6.6
2000     3    7.3
2001     4    7.4
2002     5    7.8
2003     6    6.9
2004     7    7.8

Forecast:
 t    Z-hat
 8    7.95714
 9    8.15714
10    8.35714

Forecast for 2005 = 7.957

Regression Statistics:
r² = 0.5552   MSE = 0.179429   Slope = 0.2   Intercept = 6.357143
12-36. Answers will vary.

Case 16: Auto Parts Sales Forecast

1) Forecasts:

Year   t    Q    Y
2002   17   1    $85,455,550.30
2002   18   2    $108,706,616.14
2002   19   3    $97,706,824.92
2002   20   4    $105,724,455.54
Using Excel's regression tool with the Centered Moving Average (col. G of the template) as our Y and the values under t (col. B of the template) as our X, we get the following supporting detail for the Trend+Seasonal model:

Regression Statistics:
Multiple R           0.89727
R Square             0.805093
Adjusted R Square    0.785602
Standard Error       1.558112
Observations         12

             Coefficients   Standard Error   t Stat      P-value
Intercept    152.2638       1.195366         127.3785    2.18E-17
time         -0.83741       0.130296         -6.42701    7.57E-05

(Note: the coefficient values are identical to those generated by the template.)

ANOVA:
Source       df    SS         MS         F          Significance F
Regression    1    100.2802   100.2802   41.30642   7.57E-05
Residual     10    24.27713   2.427713
Total        11    124.5573
2) Multiple Regression Equation:
Y = -2693200091 - 8445234.547 M2 + 82447357.24 NF - 3768891 Oil Price

Multiple Regression Results:

           0 Intercept      1 M2 Index      2 Non Farm Activity Index   3 Oil Price
b          -2693200091      -8445234.547    82447357.24                 -3768891
s(b)       1096606287       101021547.4     38350031.1                  1263314.066
t          -2.455940771     -0.083598349    2.149864156                 -2.983336528
p-value    0.0303           0.9348          0.0527                      0.0114

ANOVA Table:
Source    SS            df    MS            F           FCritical   p-value
Regn.     3.77493E+15    3    1.25831E+15   18.243631   3.4902996   0.0001
Error     8.2767E+14    12    6.89725E+13
Total     4.6026E+15    15    3.0684E+14

R² = 0.8202   s = 8304970.102   Adjusted R² = 0.77521656

3) Forecasted values using the regression model:

Quarter    Forecast
2002/Q1    $81,337,085.11
2002/Q2    $55,574,874.53
2002/Q3    $60,903,732.58
2002/Q4    $59,868,829.41
4) Add the new data:

Y (Sales)   X1 (Ones)   X2 (M2 Index)   X3 (Non Farm Activity Index)   X4 (Oil Price)   X5 (Q2)   X6 (Q3)   X7 (Q4)
35452300    1           2.356464        34.2                           19.15            0         0         0
41469361    1           2.357643        34.27                          16.46            1         0         0
40981634    1           2.364126        34.3                           18.83            0         1         0
42777164    1           2.379493        34.33                          19.75            0         0         1
43491652    1           2.373544        34.4                           18.53            0         0         0
57669446    1           2.387192        34.33                          17.61            1         0         0
59476149    1           2.403903        34.37                          17.95            0         1         0
76908559    1           2.42073         34.43                          15.84            0         0         1
63103070    1           2.431623        34.37                          14.28            0         0         0
84457560    1           2.441958        34.5                           13.02            1         0         0
67990330    1           2.447452        34.5                           15.89            0         1         0
68542620    1           2.445616        34.53                          16.91            0         0         1
73457391    1           2.45601         34.6                           16.29            0         0         0
89124339    1           2.48364         34.7                           17               1         0         0
85891854    1           2.532692        34.67                          18.2             0         1         0
69574971    1           2.564984        34.73                          17               0         0         1
Multiple Regression Results:

           0 Intercept     1 M2 Index      2 Non Farm Activity Index   3 Oil Price     4 Q2        5 Q3        6 Q4
b          -2655354679     -12780153.29    81566233.8                  -3827527.175    5802059     7127252.8   3211850.1
s(b)       1219227600      118142020       43101535.65                 1534592.501     6575281.3   6653402.9   6716387.2
t          -2.177899088    -0.108176187    1.892420596                 -2.494165177    0.8824047   1.0712192   0.478211
p-value    0.0574          0.9162          0.0910                      0.0342          0.4005      0.3120      0.6439
VIF                        9.9367          9.0506                      1.4058          1.6411      1.6803      1.7123

ANOVA Table:
Source    SS            df    MS            F           FCritical   p-value
Regn.     3.89129E+15    6    6.48548E+14   8.2058616   3.3737564   0.0031
Error     7.11312E+14    9    7.90347E+13
Total     4.6026E+15    15    3.0684E+14

R² = 0.8455   s = 8890145.586   Adjusted R² = 0.742423692

Regression Equation:
Sales = -2655354679 - 12780153.29 M2 + 81566233.8 NFAI - 3827527.175 Oil Price + 5802059 Q2 + 7127252.8 Q3 + 3211850.1 Q4

5) Forecast for the next four quarters:

Quarter    Sales
02 Q1      76344324
02 Q2      56495768
02 Q3      62878143
02 Q4      57771809
6) Partial F-test:
H0: β4 = β5 = β6 = 0
H1: not all are zero

(Remember, to drop the three indicator variables, they must be the last three independent variables in the data sheet of the template.)

Partial F Calculations:
# Independent variables in full model              6
# Independent variables dropped from the model     3
SSE_F        7.11E+14
SSE_R        8.28E+14
Partial F    0.490747
p-value      0.6973
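The partial-F statistic in the table can be reproduced directly from the two error sums of squares. A minimal Python sketch, with degrees of freedom as in the problem (r = 3 regressors dropped, 9 error df in the full model):

```python
# Partial F test for dropping r regressors from the full model:
#   F = ((SSE_R - SSE_F) / r) / (SSE_F / df_error_full)
sse_full    = 7.11312e14   # SSE of the full model (6 regressors)
sse_reduced = 8.2767e14    # SSE after dropping Q2, Q3, Q4
r = 3                      # number of regressors dropped
df_error_full = 9          # n - k - 1 = 16 - 6 - 1

partial_f = ((sse_reduced - sse_full) / r) / (sse_full / df_error_full)
print(round(partial_f, 4))   # about 0.4907, matching the template
```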
p-value = 0.6973, very high. Do not reject the null hypothesis: the indicator variables are not significant.

7) Comparing the three model forecasts:
It would be ideal to have the values for 2004 to compare the forecasts to the actual values. However, these values are not available. The next step is to compare the three models on R², F, and the standard error of the model:

Model              R²      F        Std. error
Trend + Seasonal   0.805   41.306   1.558
MR (part 2)        0.820   18.244   8,304,970.1
MR (part 4)        0.846   8.206    8,890,145.6

Clearly, the best model is the Trend + Seasonal model, with the smallest standard error and the highest F-value. The only significant independent variable in the multiple regression models is oil price, and a regression of sales on oil price alone yields an R² of only 0.33 and a very high standard error.