
INSTRUCTOR'S SOLUTIONS MANUAL to Accompany James T. McClave, P. George Benson, and Terry Sincich's

STATISTICS FOR BUSINESS AND ECONOMICS Tenth Edition

Nancy S. Boudreau

Bowling Green State University

Upper Saddle River, New Jersey Columbus, Ohio



Contents

Preface

Chapter 1    Statistics, Data, and Statistical Thinking

Chapter 2    Methods for Describing Sets of Data
             The Kentucky Milk Case

Chapter 3    Probability

Chapter 4    Random Variables and Probability Distributions
             The Furniture Fire Case

Chapter 5    Inferences Based on a Single Sample: Estimation with Confidence Intervals

Chapter 6    Inferences Based on a Single Sample: Tests of Hypothesis

Chapter 7    Inferences Based on Two Samples: Confidence Intervals and Tests of Hypotheses
             The Kentucky Milk Case – Part II

Chapter 8    Design of Experiments and Analysis of Variance

Chapter 9    Categorical Data Analysis
             Discrimination in the Work Place

Chapter 10   Simple Linear Regression

Chapter 11   Multiple Regression and Model Building
             The Condo Sales Case

Chapter 12   Methods for Quality Improvement

Chapter 13   Time Series: Descriptive Analyses, Models, and Forecasting
             The Gasket Manufacturing Case

Chapter 14   Nonparametric Statistics


Preface

This solutions manual is designed to accompany the text, Statistics for Business and Economics, Tenth Edition, by James T. McClave, P. George Benson, and Terry Sincich. It provides answers to most even-numbered exercises for each chapter in the text. Other methods of solution may also be appropriate; however, the author has presented one that she believes to be most instructive to the beginning Statistics student. This manual is provided to help instructors save time in preparing presentations of the solutions and to possibly provide another point of view regarding their meaning.

Some of the exercises are subjective in nature. Subjective decisions regarding these exercises have been made and are explained by the author. Solutions based on these decisions are presented; the solution to this type of exercise is often most instructive. When an alternative interpretation of an exercise may occur, the author has often addressed it and given justification for the approach taken.

I would like to thank Kelly Barber for creating the art work and for typing this work.

Nancy S. Boudreau Bowling Green State University Bowling Green, Ohio





Chapter 1

1.2

Descriptive statistics utilizes numerical and graphical methods to look for patterns, to summarize, and to present the information in a set of data. Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.

1.4

The first element of inferential statistics is the population of interest. The population is a set of existing units. The second element is one or more variables that are to be investigated. A variable is a characteristic or property of an individual population unit. The third element is the sample. A sample is a subset of the units of a population. The fourth element is the inference about the population based on information contained in the sample. A statistical inference is an estimate, prediction, or generalization about a population based on information contained in a sample. The fifth and final element of inferential statistics is the measure of reliability for the inference. The reliability of an inference is how confident one is that the inference is correct.

1.6

Quantitative data are measurements that are recorded on a meaningful numerical scale. Qualitative data are measurements that are not numerical in nature; they can only be classified into one of a group of categories.

1.8

A population is a set of existing units such as people, objects, transactions, or events. A sample is a subset of the units of a population.

1.10

An inference without a measure of reliability is nothing more than a guess. A measure of reliability separates statistical inference from fortune telling or guessing. Reliability gives a measure of how confident one is that the inference is correct.

1.12

Statistical thinking involves applying rational thought processes to critically assess data and inferences made from the data. It involves not taking all data and inferences presented at face value, but rather making sure the inferences and data are valid.

1.14

a.

The two variables measured are ‘type of credit card used’ and ‘amount of purchase.’ ‘Type of credit card used’ is qualitative. It has no meaningful number associated with it, only the name of the card used. ‘Amount of purchase’ is quantitative. It has a meaningful number associated with it.

b.

In Study 1, it says that all purchases were tracked. Thus, the data represent a population.

1.16

a.

High school GPA is a number usually between 0.0 and 4.0. Therefore, it is quantitative.

b.

Honors/awards would have responses that name things. Therefore, it would be qualitative.

c.

The scores on the SATs are numbers between 200 and 800. Therefore, they are quantitative.

d.

Gender is either male or female. Therefore, it is qualitative.

e.

Parent's income is a number: $25,000, $45,000, etc. Therefore, it is quantitative.

f.

Age is a number: 17, 18, etc. Therefore, it is quantitative.

1.18

a.

1.

The variable of interest is the status of a company’s e-commerce strategy. Since a company either has an e-commerce strategy or not, the variable is qualitative.

2.

The variable of interest is when the company will implement an e-commerce plan. Since the time of implementation will be a date, this variable will be qualitative.

3.

The variable of interest is whether the company is delivering products over the internet or not. Since the company is either delivering products or not, the variable is qualitative.

4.

The variable of interest is the company’s total revenue in the last fiscal year. Since this is a meaningful number, this variable is quantitative.

b.

Since there are many more than 154 companies in the U.S., this represents a sample rather than a population.

1.20

a.

The population of interest is the collection of computer security personnel at all U.S. corporations and government agencies.

b.

Surveys were sent to computer security personnel at all U.S. corporations and government agencies. However, in 2006, only 616 organizations responded to the survey. There could be nonresponse bias. Often, only those subjects with strong opinions will respond to a survey. Thus, the responses may not reflect what the population as a whole thinks.

c.

The variable measured in the survey is whether or not there was unauthorized use of computer systems at the firms during the year. Since the responses will be either ‘Yes’ or ‘No’, the variable is qualitative.

d.

If we assume that the responses were a random sample from the population, we could infer that about 52% of all computer security personnel will admit to unauthorized use of computer systems at their firms during the year.

1.22

a.

The data collection method used is a designed experiment.

b.

The experimental units in the study are the 50,000 smokers.

c.

The variable of interest is the age at which the scanning method first detects a tumor. Since this is a meaningful number, this variable is quantitative.

d.

The population of interest is the set of all smokers in the U.S. The sample of interest is the set of 50,000 smokers surveyed.

e.

The researchers want to compare the age at first detection for the 2 methods to see if one is more sensitive than the other.

1.24

a.

The variable of interest to the researchers is the rating of highway bridges.

b.

Since the rating of a bridge can be categorized as one of three possible values, it is qualitative.

c.

The data set analyzed is a population since all highway bridges in the U.S. were categorized.

d.

The data were collected observationally. Each bridge was observed in its natural setting.

1.26

a.

The population of interest is the set of all New York accounting firms employing two or more professionals. There are two variables of interest: whether or not the firm uses audit sampling methods, and if so, whether or not it uses random sampling. The sample is the set of 163 firms whose responses were usable. The inference of interest to the New York Society of CPAs is the proportion of all New York accounting firms employing two or more professionals that use sampling methods in auditing their clients.

b.

The four responses that were unusable could have been returned blank or could have been filled out incorrectly.

c.

Any time a survey is mailed it is questionable whether the returned questionnaires represent a random sample. Often, only those with very strong opinions return the surveys. In such a case, the returned surveys would not be representative of the entire population.

1.28

a.

The experimental units in this study are the 24 projects.

b.

The population from which the sample was selected is the set of all new software development projects.

c.

The variable of interest in this project is the outcome of reusing previously developed software for the new software development projects.

d.

In the sample, 9 of the 24 projects were judged failures. This is (9 / 24)*100% = 37.5%. We could infer that approximately 37.5% of all projects would be judged failures.

1.30

a.

The process being studied is the process of filling beverage cans with soft drink at CCSB's Wakefield plant.

b.

The variable of interest is the amount of carbon dioxide added to each can of beverage.

c.

The sampling plan was to monitor five filled cans every 15 minutes. The sample is the total number of cans selected.



d.

The company's immediate interest is learning about the process of filling beverage cans with soft drink at CCSB's Wakefield plant. To do this, they are measuring the amount of carbon dioxide added to a can of beverage to make an inference about the process of filling beverage cans. In particular, they might use the mean amount of carbon dioxide added to the sampled cans of beverage to estimate the mean amount of carbon dioxide added to all the cans on the process line.

e.

The technician would then be dealing with a population. The cans of beverage have already been processed. He/she is now interested in the outputs.



Chapter 2

Methods for Describing Sets of Data

2.2

a.

To find the frequency for each class, count the number of times each letter occurs. The frequencies for the three classes are:

Class    Frequency
X        8
Y        9
Z        3
Total    20

b.

The relative frequency for each class is found by dividing the frequency by the total sample size. The relative frequency for the class X is 8/20 = .40. The relative frequency for the class Y is 9/20 = .45. The relative frequency for the class Z is 3/20 = .15.

Class    Frequency    Relative Frequency
X        8            .40
Y        9            .45
Z        3            .15
Total    20           1.00
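As a quick check, the relative frequencies can be reproduced with a short Python sketch. The class counts are the ones given in part a; the variable names are illustrative only and are not part of the exercise.

```python
# Class frequencies from part a.
frequencies = {"X": 8, "Y": 9, "Z": 3}

# The total sample size is the sum of the class frequencies.
n = sum(frequencies.values())

# Relative frequency = class frequency / total sample size.
relative = {cls: count / n for cls, count in frequencies.items()}

for cls, rel in relative.items():
    print(f"{cls}: {rel:.2f}")
```

The relative frequencies should always sum to 1, which is a useful sanity check on any frequency table.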

c.

The frequency bar chart is:

d.

The pie chart for the frequency distribution is:



2.4

a.

The variable summarized in the table is ‘Reason for requesting the installation of the passenger-side on-off switch.’ The values this variable could assume are: Infant, Child, Medical, Infant & Medical, Child & Medical, Infant & Child, and Infant & Child & Medical. Since the responses name something, the variable is qualitative.

b.

The relative frequencies are found by dividing the number of requests for each category by the total number of requests. For the category ‘Infant’, the relative frequency is 1,852/30,337 = .061. The rest of the relative frequencies are found in the table below:

Reason                      Number of Requests    Relative Frequency
Infant                      1,852                 1,852/30,337 = .061
Child                       17,148                17,148/30,337 = .565
Medical                     8,377                 8,377/30,337 = .276
Infant & Medical            44                    44/30,337 = .0014
Child & Medical             903                   903/30,337 = .030
Infant & Child              1,878                 1,878/30,337 = .062
Infant & Child & Medical    135                   135/30,337 = .0045
TOTAL                       30,337                .9999

c.

Using MINITAB, a pie chart of the data is:

[Pie chart of Reason: Child (17,148, 56.5%); Medical (8,377, 27.6%); Infant & Child (1,878, 6.2%); Infant (1,852, 6.1%); Child & Medical (903, 3.0%); Infant & Child & Medical (135, 0.4%); Infant & Medical (44, 0.1%)]

d.

There are 4 categories where Medical is mentioned as a reason: Medical, Infant & Medical, Child & Medical, and Infant & Child & Medical. The sum of the frequencies for these 4 categories is 8,377 + 44 + 903 + 135 = 9,459. The proportion listing Medical as one of the reasons is 9,459/30,337 = .312.
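The sum in part d can be checked with a small Python sketch that selects every category whose label mentions Medical. The counts are those from the table in part b; the dictionary and variable names are illustrative assumptions.

```python
# Number of requests by reason, from the table in part b.
requests = {
    "Infant": 1852,
    "Child": 17148,
    "Medical": 8377,
    "Infant & Medical": 44,
    "Child & Medical": 903,
    "Infant & Child": 1878,
    "Infant & Child & Medical": 135,
}

total = sum(requests.values())

# Add up the counts for every category that lists Medical as a reason.
medical_count = sum(count for reason, count in requests.items() if "Medical" in reason)
medical_proportion = medical_count / total

print(medical_count, round(medical_proportion, 3))
```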



2.6

a.

To find relative frequencies, we divide the frequencies of each category by the total number of incidents. The relative frequencies of the number of incidents for each of the cause categories are:

Management System Cause Category    Number of Incidents    Relative Frequencies
Engineering & Design                27                     27/83 = .325
Procedures & Practices              24                     24/83 = .289
Management & Oversight              22                     22/83 = .265
Training & Communication            10                     10/83 = .120
TOTAL                               83                     1

b.

The Pareto diagram is:

[Pareto diagram of Management System Cause Category: percent of incidents (vertical axis) by category, in the order Eng&Des, Proc&Pract, Mgmt&Over, Trn&Comm]

c.

The category with the highest relative frequency of incidents is Engineering and Design. The category with the lowest relative frequency of incidents is Training and Communication.

2.8

a.

The data collection method was a survey.

b.

Since the data were numbers (percentage of US labor and materials), the variable is quantitative. Once the data were collected, they were grouped into 4 categories.




c.

Using MINITAB, a pie chart of the data is:

[Pie chart of Made in USA: 100% (64, 60.4%); 75-99% (20, 18.9%); 50-74% (18, 17.0%); <50% (4, 3.8%)]

About 60% of those surveyed believe that “Made in USA” means 100% US labor and materials.

2.10

Using MINITAB, a bar chart of the frequency of occurrence of the industry types is:

[Bar chart of INDUSTRY: count (vertical axis) by industry type. Categories shown: Aerospace/Defense, Banking, Capital Goods, Chemicals, Conglomerates, Construction, Consumer Durables, Diversified Financials, Drugs/Biotechnology, Food Markets, Food/Drink/Tobacco, Health Care, Hotels/Restaurants/Leisure, Household/Personal Products, Insurance, Materials, Media, Oil & Gas, Retailing, Semiconductors, Services/Supplies, Software & Services, Technology Equipment, Telecommunications, Transportation, Utilities]


2.12

Using MINITAB, the side-by-side bar charts are:

[Side-by-side bar charts of 1999 and 2006 vs. Use: relative frequency (vertical axis) of Yes, No, and Don't know responses for Unauthorized Use of Computer Systems]

The relative frequency of unauthorized use of computer systems has decreased from 1999 to 2006.

2.14

a.

Using MINITAB, the side-by-side graphs are:

[Side-by-side bar charts of Exposure, Opportunity, Content, and Faculty vs. Stars: frequency (vertical axis) by number of stars (5, 4, 3, 2)]

From these graphs, one can see that very few of the top 30 MBA programs got 5-stars in any criteria. In addition, about the same number of programs got 4 stars in each of the 4 criteria. The biggest difference in ratings among the 4 criteria was in the number of programs receiving 3-stars. More programs received 3-stars in Course Content than in any of the other criteria. Consequently, fewer programs received 2-stars in Course Content than in any of the other criteria.

b.

Since this chart lists the rankings of only the top 30 MBA programs in the world, it is reasonable that none of these best programs would be rated as 1-star on any criteria.




2.16

2.18

a.

The original data set has 1 + 3 + 5 + 7 + 4 + 3 = 23 observations.

b.

For the bottom row of the stem-and-leaf display: The stem is 0. The leaves are 0, 1, 2. The numbers in the original data set are 0, 1, and 2.

2.20.


c.

The dot plot corresponding to all the data points is:

a.

The measurement class that contains the highest proportion of respondents is “none”. Sixty-one percent of the respondents said that their companies did not outsource any computer security functions.

b.

From the graph, 6% of the respondents indicated that they outsourced between 20% and 40% of their computer security functions.

c.

The proportion of the 609 respondents who outsourced at least 40% of computer security functions is .04 + .01 + .01 = .06.

d.

The number of the 609 respondents who outsourced less than 20% of computer security functions is (.27 + .61)*609 = .88(609) = 536.



2.22

a.

Using MINITAB, the stem-and-leaf display of the data is:

Stem-and-Leaf Display: SCORE

Stem-and-leaf of SCORE   N = 169
Leaf Unit = 1.0

    1    6  2
    1    6
    2    7  2
    3    7  8
    4    8  4
   15    8  66677888899
   56    9  00001111111222222222233333333344444444444
 (100)   9  55555555555555555555556666666666666666666777777777777777777888888+
   13   10  0000000000000

b.

From the stem-and-leaf display, we see that there are only 4 observations with sanitation scores less than the acceptable score of 86. The proportion of ships that have an accepted sanitation standard would be (169 – 4) / 169 = .976.

c.

The sanitation score of 84 is in bold in the stem-and-leaf display in part a.

2.24

a.

Using MINITAB, the frequency histogram is:

[Frequency histogram of Length: frequency (vertical axis) by length (horizontal axis)]


b.

Using MINITAB, the frequency histogram is:

[Frequency histogram of Weight: frequency (vertical axis) by weight (horizontal axis)]

c.

Using MINITAB, the frequency histogram is:

[Frequency histogram of DDT: frequency (vertical axis) by DDT level (horizontal axis)]

2.26

Using MINITAB, the two dot plots are:

[Dotplot for Arrive-Depart]

Yes. Most of the numbers of items arriving at the work center per hour are in the 135 to 165 area. Most of the numbers of items departing the work center per hour are in the 110 to 140 area. Because the number of items arriving is larger than the number of items departing, there will probably be some sort of bottleneck.



2.28

a.

Using MINITAB, the three frequency histograms are as follows (the same starting point and class interval were used for each):

Histogram of C1   N = 25
Tenth Performance

Midpoint   Count
  4.00        0
  8.00        0
 12.00        1   *
 16.00        5   *****
 20.00       10   **********
 24.00        6   ******
 28.00        0
 32.00        2   **
 36.00        0
 40.00        1   *

Histogram of C2   N = 25
Thirtieth Performance

Midpoint   Count
  4.00        1   *
  8.00        9   *********
 12.00       12   ************
 16.00        2   **
 20.00        1   *

Histogram of C3   N = 25
Fiftieth Performance

Midpoint   Count
  4.00        3   ***
  8.00       15   ***************
 12.00        4   ****
 16.00        2   **
 20.00        1   *

b.

The histogram for the tenth performance shows a much greater spread of the observations than the other two histograms. The thirtieth performance histogram shows a shift to the left—implying shorter completion times than for the tenth performance. In addition, the fiftieth performance histogram shows an additional shift to the left compared to that for the thirtieth performance. However, the last shift is not as great as the first shift. This agrees with statements made in the problem.




2.30

a.

A stem-and-leaf display is as follows, where the stems are the units place and the leaves are the decimal places: Stem Leaves 1 0 0 0 0 1 1 2 2 222 3 4 4 4 4444 5 5 55 6 79 2 1 144 6 7 9 9 3 0 028 9 9 4 1112 5 5 24 6 7 8 8 9 10 1

b.

A little more than half (26/49 = .53) of all companies spent less than 2 months in bankruptcy. Only two of the 49 companies spent more than 6 months in bankruptcy. It appears then, in general, the length of time in bankruptcy for firms using "prepacks" is less than that of firms not using "prepacks."

c.

A dot diagram will be used to compare the time in bankruptcy for the three types of "prepack" firms:

d.

The circled times in part a correspond to companies that were reorganized through a leveraged buyout. There does not appear to be any pattern to these points. They appear to be scattered about evenly throughout the distribution of all times.

2.32

Using MINITAB, the stem-and-leaf display for the data is:

Stem-and-leaf of Time   N = 25
Leaf Unit = 1.0

   3    3  239
   7    4  3499
  (7)   5  0011469
  11    6  34458
   6    7  13
   4    8  26
   2    9  5
   1   10  2

The numbers in bold represent delivery times associated with customers who subsequently did not place additional orders with the firm. Since there were only 2 customers with delivery times of 68 days or longer that placed additional orders, I would say the maximum tolerable delivery time is about 65 to 67 days. Everyone with delivery times less than 67 days placed additional orders.



2.34

a.

$\sum x = 3 + 8 + 4 + 5 + 3 + 4 + 6 = 33$

b.

$\sum x^2 = 3^2 + 8^2 + 4^2 + 5^2 + 3^2 + 4^2 + 6^2 = 175$

c.

$\sum (x - 5)^2 = (3 - 5)^2 + (8 - 5)^2 + (4 - 5)^2 + (5 - 5)^2 + (3 - 5)^2 + (4 - 5)^2 + (6 - 5)^2 = 20$

d.

$\sum (x - 2)^2 = (3 - 2)^2 + (8 - 2)^2 + (4 - 2)^2 + (5 - 2)^2 + (3 - 2)^2 + (4 - 2)^2 + (6 - 2)^2 = 71$

e.

$\left(\sum x\right)^2 = (3 + 8 + 4 + 5 + 3 + 4 + 6)^2 = 33^2 = 1089$

2.36

a.

$\sum x = 6 + 0 + (-2) + (-1) + 3 = 6$

b.

$\sum x^2 = 6^2 + 0^2 + (-2)^2 + (-1)^2 + 3^2 = 50$

c.

$\sum x^2 - \frac{\left(\sum x\right)^2}{5} = 50 - \frac{6^2}{5} = 50 - 7.2 = 42.8$

2.38

a.

$\bar{x} = \frac{\sum x}{n} = \frac{85}{10} = 8.5$

b.

$\bar{x} = \frac{400}{16} = 25$

c.

$\bar{x} = \frac{35}{45} = .78$

d.

$\bar{x} = \frac{242}{18} = 13.44$

2.40

The median is the middle number once the data have been arranged in order. If n is even, there is not a single middle number. Thus, to compute the median, we take the average of the middle two numbers. If n is odd, there is a single middle number. The median is this middle number.

A data set with five measurements arranged in order is 1, 3, 5, 6, 8. The median is the middle number, which is 5.

A data set with six measurements arranged in order is 1, 3, 5, 5, 6, 8. The median is the average of the middle two numbers, which is $\frac{5 + 5}{2} = \frac{10}{2} = 5$.
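The odd/even rule for the median can be sketched in a few lines of Python. This is only an illustration of the rule as stated; the function name `sample_median` is an assumption, and the two lists are the five- and six-measurement data sets used in the solution.

```python
def sample_median(values):
    """Median by the odd/even rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n is odd: there is a single middle number.
        return ordered[mid]
    # n is even: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

median_odd = sample_median([1, 3, 5, 6, 8])
median_even = sample_median([1, 3, 5, 5, 6, 8])
print(median_odd, median_even)
```

Python's standard library offers `statistics.median`, which applies the same rule.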




2.42

a.

$\bar{x} = \frac{\sum x}{n} = \frac{7 + \cdots + 4}{6} = \frac{15}{6} = 2.5$

Median = $\frac{3 + 3}{2} = 3$ (mean of 3rd and 4th numbers, after ordering)

Mode = 3

b.

$\bar{x} = \frac{\sum x}{n} = \frac{2 + \cdots + 4}{13} = \frac{40}{13} = 3.08$

Median = 3 (7th number, after ordering)

Mode = 3

c.

$\bar{x} = \frac{\sum x}{n} = \frac{51 + \cdots + 37}{10} = \frac{496}{10} = 49.6$

Median = $\frac{48 + 50}{2} = 49$ (mean of 5th and 6th numbers, after ordering)

Mode = 50

2.44

a.

The sample mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{529 + 355 + 301 + \cdots + 63}{26} = \frac{3757}{26} = 144.5$

The sample median is found by finding the average of the 13th and 14th observations once the data are arranged in order. The 13th and 14th observations are 100 and 105. The average of these two numbers (median) is:

median = $\frac{100 + 105}{2} = \frac{205}{2} = 102.5$

The mode is the observation appearing the most. For this data set, the mode is 70, which appears 3 times.

Since the mean is larger than the median, the data are skewed to the right.

b.

The sample mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{11 + 9 + 6 + \cdots + 4}{26} = \frac{136}{26} = 5.23$

The sample median is found by finding the average of the 13th and 14th observations once the data are arranged in order. The 13th and 14th observations are 5 and 5. The average of these two numbers (median) is:

median = $\frac{5 + 5}{2} = \frac{10}{2} = 5$

The mode is the observation appearing the most. For this data set, the mode is 6, which appears 6 times.

Since the mean and median are about the same, the data are somewhat symmetric.

2.46

a.


$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1.72 + 2.50 + 2.16 + \cdots + 1.95}{20} = \frac{37.62}{20} = 1.881$

The sample average surface roughness of the 20 observations is 1.881.

b.

The median is found as the average of the 10th and 11th observations, once the data have been ordered. The ordered data are: 1.06 1.09 1.19 1.26 1.27 1.40 1.51 1.72 1.95 2.03 2.05 2.13 2.13 2.16 2.24 2.31 2.41 2.50 2.57 2.64

The 10th and 11th observations are 2.03 and 2.05. The median is:

$\frac{2.03 + 2.05}{2} = \frac{4.08}{2} = 2.04$

The middle surface roughness measurement is 2.04. Half of the sample measurements were less than 2.04 and half were greater than 2.04.
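Because the 20 ordered measurements are listed in full, both the mean from part a and the median from part b can be checked with Python's standard `statistics` module. This is a minimal sketch; the list name `roughness` is an illustrative choice.

```python
from statistics import mean, median

# The 20 ordered surface roughness measurements listed in part b.
roughness = [
    1.06, 1.09, 1.19, 1.26, 1.27, 1.40, 1.51, 1.72, 1.95, 2.03,
    2.05, 2.13, 2.13, 2.16, 2.24, 2.31, 2.41, 2.50, 2.57, 2.64,
]

sample_mean = mean(roughness)
# With n = 20 (even), the median is the average of the 10th and 11th values.
sample_median = median(roughness)

print(f"mean = {sample_mean:.3f}, median = {sample_median:.2f}")
```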

c.

The data are somewhat skewed to the left. Thus, the median might be a better measure of central tendency than the mean. The few small values in the data tend to make the mean smaller than the median.

2.48

a.

Using MINITAB, the stem-and-leaf display is:

Stem-and-leaf of PAF   N = 17
Leaf Unit = 1.0

   6   0  000009
   8   1  25
  (2)  2  45
   7   3  13
   5   4  0
   4   5
   4   6  2
   3   7  057

b.

The median is the middle number once the data are arranged in order. The data arranged in order are: 0, 0, 0, 0, 0, 9, 12, 15, 24, 25, 31, 33, 40, 62, 70, 75, 77. The middle number or the median is 24.
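The ordered values above can be used to check the median, along with the mean and mode reported in the remaining parts of this exercise. The sketch below uses Python's standard `statistics` module; the list name `paf` simply follows the variable label in the MINITAB display.

```python
from statistics import mean, median, mode

# The 17 ordered observations listed in part b.
paf = [0, 0, 0, 0, 0, 9, 12, 15, 24, 25, 31, 33, 40, 62, 70, 75, 77]

paf_median = median(paf)  # 9th of 17 ordered values
paf_mean = mean(paf)      # sum of the values divided by 17
paf_mode = mode(paf)      # most frequently occurring value

print(paf_median, round(paf_mean, 2), paf_mode)
```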

c.

The mean of the data is $\bar{x} = \frac{\sum x}{n} = \frac{77 + 33 + 75 + \cdots + 31}{17} = \frac{473}{17} = 27.82$


d.

The number occurring most frequently is 0. The mode is 0.

e.

The mode corresponds to the smallest number. It does not seem to locate the center of the distribution. Both the mean and the median are in the middle of the stem-and-leaf display. Thus, it appears that both of them locate the center of the data.

2.50

a.

The sample mean length is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{42.5 + 44.0 + 41.5 + \cdots + 36.0}{144} = \frac{6165}{144} = 42.81$

The average length of the 144 fish is 42.81 cm.

The median is the average of the middle two observations once they have been ordered. The 72nd and 73rd observations are 45 and 45. The average of these two observations is 45. Half of the fish lengths are less than 45 cm and half are longer.

The mode is 46 cm. This observation occurred 12 times.

b.

The sample mean weight is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{732 + 795 + 547 + \cdots + 1433}{144} = \frac{151159}{144} = 1049.72$

The average weight of the 144 fish is 1049.72 grams.

The median is the average of the middle two observations once they have been ordered. The 72nd and 73rd observations are 989 and 1011. The average of these two observations is

median = $\frac{989 + 1{,}011}{2} = 1000$

Half of the fish weights are less than 1000 grams and half are heavier.

There are 2 modes, 886 and 1186. Each of these observations occurred 3 times.

c.

The sample mean DDT level is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{10 + 16 + 23 + \cdots + 1.9}{144} = \frac{3507.1}{144} = 24.35$

The average DDT level of the 144 fish is 24.35 parts per million.

The median is the average of the middle two observations once they have been ordered. The 72nd and 73rd observations are 7.1 and 7.2. The average of these two observations is

median = $\frac{7.1 + 7.2}{2} = 7.15$

Half of the fish DDT levels are less than 7.15 parts per million and half are greater.

The mode is 12. This observation occurred 8 times.

d.

From the graph in Exercise 2.24a, the data are skewed to the left. This corresponds to the relationship between the mean and the median. For data skewed to the left, the mean is less than the median. For the fish lengths, the mean is 42.81 and the median is 45.

e.

From the graph in Exercise 2.24b, the data are slightly skewed to the right. This corresponds to the relationship between the mean and the median. For data skewed to the right, the mean is more than the median. For the fish weights, the mean is 1049.72 and the median is 1000.

f.

From the graph in Exercise 2.24c, the data are skewed to the right. This corresponds to the relationship between the mean and the median. For data skewed to the right, the mean is more than the median. For the fish DDT levels, the mean is 24.35 and the median is 7.15.

2.52

a.

Due to the "elite" superstars, the salary distribution is skewed to the right. Since this implies that the median is less than the mean, the players' association would want to use the median.

b.

The owners, by the logic of part a, would want to use the mean.

2.54

a.


$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 3 + 4 + \cdots + 3}{20} = \frac{80}{20} = 4$

The sample median is found by finding the average of the 10th and 11th observations once the data are arranged in order. The data arranged in order are:

1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9 13

The 10th and 11th observations are 3 and 4. The average of these two numbers (median) is:

median = $\frac{3 + 4}{2} = \frac{7}{2} = 3.5$

The mode is the observation appearing the most. For this data set, the mode is 1, which appears 5 times.




b.

Eliminating the largest number, which is 13, results in the following:

The sample mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 3 + 4 + \cdots + 3}{19} = \frac{67}{19} = 3.53$

The sample median is found by finding the middle observation once the data are arranged in order. The data arranged in order are:

1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9

The 10th observation is 3. The median is 3.

The mode is the observation appearing the most. For this data set, the mode is 1, which appears 5 times.

By dropping the largest number, the mean is reduced from 4 to 3.53. The median is reduced from 3.5 to 3. There is no effect on the mode.

c.

The data arranged in order are:

1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9 13

If we drop the lowest 2 and largest 2 observations we are left with:

1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7

The sample 10% trimmed mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 1 + 1 + \cdots + 7}{16} = \frac{56}{16} = 3.5$

The advantage of the trimmed mean over the regular mean is that very large and very small numbers that could greatly affect the mean have been eliminated.
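The trimming step in part c can be sketched in Python as follows. The function name `trimmed_mean` and its `proportion` parameter are illustrative assumptions; the data are the 20 ordered observations from part a, and 10% of 20 means dropping 2 observations from each end.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of observations dropped per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 9, 13]

regular_mean = sum(data) / len(data)
ten_percent_trimmed = trimmed_mean(data, 0.10)

print(regular_mean, ten_percent_trimmed)
```

Comparing the two values shows how much the extreme observations (here mainly the 13) pull on the ordinary mean.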



2.56

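a.

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{84 - \frac{20^2}{10}}{10 - 1} = 4.8889$

$s = \sqrt{4.8889} = 2.211$

b.

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{380 - \frac{100^2}{40}}{40 - 1} = 3.3333$

$s = \sqrt{3.3333} = 1.826$

c.

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{18 - \frac{17^2}{20}}{20 - 1} = .1868$

$s = \sqrt{.1868} = .432$

2.58

a.

Range = 42 − 37 = 5

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{7935 - \frac{199^2}{5}}{5 - 1} = 3.7$

$s = \sqrt{3.7} = 1.92$

b.

Range = 100 − 1 = 99

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{25{,}795 - \frac{303^2}{9}}{9 - 1} = 1{,}949.25$

$s = \sqrt{1{,}949.25} = 44.15$

c.

Range = 100 − 2 = 98

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{20{,}033 - \frac{295^2}{8}}{8 - 1} = 1{,}307.84$

$s = \sqrt{1{,}307.84} = 36.16$

2.60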

This is one possibility for the two data sets.

Data Set 1: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
Data Set 2: 1, 1, 1, 1, 1, 5, 5, 5, 5, 5

$\bar{x}_1 = \frac{\sum x}{n} = \frac{1 + 1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5}{10} = \frac{30}{10} = 3$

$\bar{x}_2 = \frac{\sum x}{n} = \frac{1 + 1 + 1 + 1 + 1 + 5 + 5 + 5 + 5 + 5}{10} = \frac{30}{10} = 3$

Therefore, the two data sets have the same mean. The variances for the two data sets are:

$s_1^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{110 - \frac{30^2}{10}}{9} = \frac{20}{9} = 2.2222$

$s_2^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{130 - \frac{30^2}{10}}{9} = \frac{40}{9} = 4.4444$


The dot diagrams for the two data sets are shown below.
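The comparison of the two data sets can be checked numerically. The sketch below applies the shortcut variance formula used throughout this chapter and cross-checks it against `statistics.variance` from the Python standard library; the function name `shortcut_variance` is an illustrative assumption.

```python
from statistics import mean, variance

def shortcut_variance(values):
    """Sample variance via (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x_sq = sum(x * x for x in values)
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)

data_set_1 = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
data_set_2 = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]

for name, data in (("Data Set 1", data_set_1), ("Data Set 2", data_set_2)):
    print(f"{name}: mean = {mean(data)}, variance = {shortcut_variance(data):.4f}")
```

Both sets have the same mean, but the second set, with all of its values at the extremes, has the larger variance.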

2.62

a.

Range = 3 − 0 = 3

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{15 - \frac{7^2}{5}}{5 - 1} = 1.3$

$s = \sqrt{1.3} = 1.1402$

b.

After adding 3 to each of the data points,

Range = 6 − 3 = 3

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{102 - \frac{22^2}{5}}{5 - 1} = 1.3$

$s = \sqrt{1.3} = 1.1402$

c.

After subtracting 4 from each of the data points,

Range = −1 − (−4) = 3

$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{39 - \frac{(-13)^2}{5}}{5 - 1} = 1.3$

$s = \sqrt{1.3} = 1.1402$

d.

The range, variance, and standard deviation remain the same when any number is added to or subtracted from each measurement in the data set.

2.64

a.

The maximum age is 64. The minimum age is 39. The range is 64 – 39 = 25.

b.

The variance is:

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{125{,}764 - \frac{2494^2}{50}}{50 - 1} = 27.822$

c.

The standard deviation is: $s = \sqrt{s^2} = \sqrt{27.822} = 5.275$

d.

Since the standard deviation of the ages of the 50 most powerful women in Europe is 10 years and is greater than that in the U.S. (5.275 years), the age data for Europe is more variable.



2.66

a.

The maximum weight is 1.1 carats. The minimum weight is .18 carats. The range is 1.1 − .18 = .92 carats.

b.

The variance is:

$s^2 = \frac{\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}}{n - 1} = \frac{146.19 - \frac{194.32^2}{308}}{308 - 1} = .0768$ square carats

c.

The standard deviation is: $s = \sqrt{s^2} = \sqrt{.0768} = .2772$ carats

d.

The standard deviation. This gives us an idea about how spread out the data are in the same units as the original data.

2.68

a.

A worker's overall time to complete the operation under study is determined by adding the subtask-time averages.

Worker A

The average for subtask 1 is: $\bar{x} = \frac{\sum x}{n} = \frac{211}{7} = 30.14$

The average for subtask 2 is: $\bar{x} = \frac{\sum x}{n} = \frac{21}{7} = 3$

Worker A's overall time is 30.14 + 3 = 33.14.

Worker B

The average for subtask 1 is: $\bar{x} = \frac{\sum x}{n} = \frac{213}{7} = 30.43$

The average for subtask 2 is: $\bar{x} = \frac{\sum x}{n} = \frac{29}{7} = 4.14$

Worker B's overall time is 30.43 + 4.14 = 34.57.

b.

Worker A

s=

∑x

2

(∑ x) − n −1

2

n

=

2117 7 = 15.8095 = 3.98 7 −1

6455 −

Worker B

s= c.

∑x

2

(∑ x) − n −1

n

2

=

2132 7 = .9524 = .98 7 −1

6487 −

The standard deviations represent the amount of variability in the time it takes the worker to complete subtask 1.


23


d.

Worker A

∑x

s=

(∑ x) −

2

n −1

2

n

=

212 7 = .6667 = .82 7 −1

67 −

Worker B

∑x

s= e.

(∑ x) −

2

n −1

2

n

=

292 7 = 4.4762 = 2.12 7 −1

147 −

I would choose workers similar to worker B to perform subtask 1. Worker B has a slightly higher average time on subtask 1 (A: x = 30.14, B: x = 30.43). But, Worker B has a smaller variability in the time it takes to complete subtask 1 (part b). He or she is more consistent in the time needed to complete the task. I would choose workers similar to Worker A to perform subtask 2. Worker A has a smaller average time on subtask 2 (A: x = 3, B: x = 4.14). Worker A also has a smaller variability in the time needed to complete subtask 2 (part d).

2.70

Since no information is given about the data set, we can only use Chebyshev's Rule.

a.

Nothing can be said about the percentage of measurements which will fall between x̄ − s and x̄ + s.

b.

At least 3/4 or 75% of the measurements will fall between x̄ − 2s and x̄ + 2s.

c.

At least 8/9 or 89% of the measurements will fall between x̄ − 3s and x̄ + 3s.

2.72

a.

$\bar{x} = \frac{\sum x}{n} = \frac{206}{25} = 8.24$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{1778 - \frac{206^2}{25}}{25-1} = 3.357$

$s = \sqrt{s^2} = 1.83$

b.

Interval                     Number of Measurements in Interval   Percentage
x̄ ± s, or (6.41, 10.07)      18                                   18/25 = .72 or 72%
x̄ ± 2s, or (4.58, 11.90)     24                                   24/25 = .96 or 96%
x̄ ± 3s, or (2.75, 13.73)     25                                   25/25 = 1 or 100%

c.

The percentages in part b are in agreement with Chebyshev's Rule and agree fairly well with the percentages given by the Empirical Rule.

d.

Range = 12 − 5 = 7

s ≈ range/4 = 7/4 = 1.75

The range approximation provides a satisfactory estimate of s = 1.83 from part a.

2.74

From Chebyshev’s Theorem, we know that at least ¾ or 75% of all observations will fall within 2 standard deviations of the mean. From Exercise 2.47, x = .631. From Exercise 2.66, s = .2772. This interval is: x ± 2 s ⇒ .631 ± 2(.2772) ⇒ .631 ± .5544 ⇒ (.0766, 1.1854)
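The Chebyshev interval above can be reproduced with a short Python sketch. The function name `chebyshev_interval` is hypothetical, and the sketch assumes an integer number of standard deviations k; for k ≤ 1 the rule gives no information, so no bound is returned.

```python
from fractions import Fraction


def chebyshev_interval(mean, sd, k):
    """Return (low, high, bound): the interval mean ± k*sd and the
    Chebyshev lower bound 1 - 1/k^2 on the fraction of data inside it.
    The bound is None when k <= 1, where the rule says nothing."""
    bound = 1 - Fraction(1, k * k) if k > 1 else None
    return mean - k * sd, mean + k * sd, bound


# Values quoted in the solution: mean .631 and s = .2772.
low, high, bound = chebyshev_interval(0.631, 0.2772, 2)
print(round(low, 4), round(high, 4), bound)
```

Using `Fraction` keeps the bound in the same 3/4 and 8/9 form the manual quotes rather than a rounded decimal.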

2.76

a.

From the information given, we have x = 375 and s = 25. From Chebyshev's Rule, we know that at least three-fourths of the measurements are within the interval: x ± 2s, or (325, 425)

Thus, at most one-fourth of the measurements exceed 425. In other words, more than 425 vehicles used the intersection on at most 25% of the days.

b.

According to the Empirical Rule, approximately 95% of the measurements are within the interval: x̄ ± 2s, or (325, 425)

This leaves approximately 5% of the measurements to lie outside the interval. Because of the symmetry of a mound-shaped distribution, approximately 2.5% of these will lie below 325, and the remaining 2.5% will lie above 425. Thus, on approximately 2.5% of the days, more than 425 vehicles used the intersection.

2.78

a.

Since the sample mean (18.2) is larger than the sample median (15), it indicates that the distribution of years is skewed to the right. In addition, the maximum number of years is 50 and the minimum is 2. If the distribution were symmetric, the mean and median should be about halfway between these two numbers. Halfway between the maximum and minimum values is 26, which is much larger than either the mean or the median.

b.

The standard deviation can be estimated by the range divided by either 4 or 6. For this distribution, the range is: Range = Largest − smallest = 50 − 2 = 48. Dividing the range by 4, we get an estimate of the standard deviation to be 48/4 = 12. Dividing the range by 6, we get an estimate of the standard deviation to be 48/6 = 8. Thus, the standard deviation should be somewhere between 8 and 12. For this problem, the standard deviation is s = 10.64. This value falls in the estimated range of 8 to 12.
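The range-based estimate in part b is simple arithmetic, sketched here in Python. The function name `sd_range_estimates` is an illustrative assumption; the inputs are the minimum (2), maximum (50), and reported standard deviation (10.64) from the solution.

```python
def sd_range_estimates(smallest, largest):
    """Rough bounds on the standard deviation: range/6 and range/4."""
    data_range = largest - smallest
    return data_range / 6, data_range / 4


low, high = sd_range_estimates(2, 50)
# Check that the reported s = 10.64 lies between the two estimates.
print(low, high, low <= 10.64 <= high)
```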




c.

First, we calculate the number of standard deviations from the mean the value of 40 years is. To do this, we first subtract the mean and then divide by the value of the standard deviation.

Number of standard deviations is $\frac{40 - \bar{x}}{s} = \frac{40 - 18.2}{10.64} = 2.05 \approx 2$

Using Chebyshev's Rule, we know that at most $1/k^2$ or $1/2^2 = 1/4$ of the data will be more than 2 standard deviations from the mean. Thus, this would indicate that at most 25% of the Generation Xers responded with 40 years or more.

Next, we calculate the number of standard deviations from the mean the value of 8 years is.

Number of standard deviations is $\frac{8 - \bar{x}}{s} = \frac{8 - 18.2}{10.64} = -.96 \approx -1$

Using Chebyshev's Rule, we get no information about the data within 1 standard deviation of the mean. However, we know the median (15) is more than 8. By definition, 50% of the data are larger than the median. Thus, at least 50% of the Generation Xers responded with 8 years or more. No additional information can be obtained with the information given.

2.80

a.

Using MINITAB, the frequency histogram for the time in bankruptcy is:

[Histogram: Frequency versus Time in Bankrupt]

The Empirical Rule is not applicable because the data are not mound shaped.



b.

Using MINITAB, the descriptive measures are:

Descriptive Statistics: Time in Bankrupt

Variable      N     Mean   Median   TrMean    StDev  SE Mean
Time in      49    2.549    1.700    2.333    1.828    0.261

Variable   Minimum  Maximum       Q1       Q3
Time in      1.000   10.100    1.350    3.500

From Chebyshev's Theorem, we know that at least 75% of the observations will fall within 2 standard deviations of the mean. This interval is: x̄ ± 2s ⇒ 2.549 ± 2(1.828) ⇒ 2.549 ± 3.656 ⇒ (−1.107, 6.205)

c.

There are 47 of the 49 observations within this interval. The percentage would be (47/49) × 100% = 95.9%. This agrees with Chebyshev's Theorem (at least 75%). It also agrees with the Empirical Rule (approximately 95%).

d.

From the above interval we know that about 95% of all firms filing for prepackaged bankruptcy will be in bankruptcy between 0 and 6.2 months. Thus, we would estimate that a firm considering filing for bankruptcy will be in bankruptcy up to 6.2 months.

2.82

a.

Since it is given that the distribution is mound-shaped, we can use the Empirical Rule. We know that 1.84% is 2 standard deviations below the mean. The Empirical Rule states that approximately 95% of the observations will fall within 2 standard deviations of the mean and, consequently, approximately 5% will lie outside that interval. Since a mound-shaped distribution is symmetric, approximately 2.5% of the day's production of batches will fall below 1.84%.

b.

If the data are actually mound-shaped, it would be extremely unusual (less than 2.5%) to observe a batch with 1.80% zinc phosphide if the true mean is 2.0%. Thus, if we did observe 1.8%, we would conclude that the mean percent of zinc phosphide in today's production is probably less than 2.0%.

2.84

a.

Since we do not have any idea of the shape of the distribution of SAT-Math score changes, we must use Chebyshev’s Theorem. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval would be: x ± 3s ⇒ 19 ± 3(65) ⇒ 19 ± 195 ⇒ (−176, 214)

Thus, for a randomly selected student, we could be pretty sure that this student's score would be anywhere from 176 points below his/her previous SAT-Math score to 214 points above his/her previous SAT-Math score.

b.

Since we do not have any idea of the shape of the distribution of SAT-Verbal score changes, we must use Chebyshev's Theorem. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval would be: x̄ ± 3s ⇒ 7 ± 3(49) ⇒ 7 ± 147 ⇒ (−140, 154)

Thus, for a randomly selected student, we could be pretty sure that this student's score would be anywhere from 140 points below his/her previous SAT-Verbal score to 154 points above his/her previous SAT-Verbal score.

c.

A change of 140 points on the SAT-Math would be a little less than 2 standard deviations from the mean. A change of 140 points on the SAT-Verbal would be a little less than 3 standard deviations from the mean. Since the 140 point change for the SAT-Math is not as big a change as the 140 point change on the SAT-Verbal, it would be most likely that the score was a SAT-Math score.

2.86

a.

$z = \frac{x - \bar{x}}{s} = \frac{40 - 30}{5} = 2$ (sample); 2 standard deviations above the mean.

b.

$z = \frac{x - \mu}{\sigma} = \frac{90 - 89}{2} = .5$ (population); .5 standard deviations above the mean.

c.

$z = \frac{x - \mu}{\sigma} = \frac{50 - 50}{5} = 0$ (population); 0 standard deviations above the mean.

d.

$z = \frac{x - \bar{x}}{s} = \frac{20 - 30}{4} = -2.5$ (sample); 2.5 standard deviations below the mean.

2.88

The 50th percentile of a data set is the observation that has half of the observations less than it. Another name for the 50th percentile is the median.

2.90

Since the element 40 has a z-score of −2 and 90 has a z-score of 3,

$-2 = \frac{40 - \mu}{\sigma} \Rightarrow -2\sigma = 40 - \mu \Rightarrow \mu - 2\sigma = 40 \Rightarrow \mu = 40 + 2\sigma$

and

$3 = \frac{90 - \mu}{\sigma} \Rightarrow 3\sigma = 90 - \mu \Rightarrow \mu + 3\sigma = 90$

By substitution, 40 + 2σ + 3σ = 90 ⇒ 5σ = 50 ⇒ σ = 10

By substitution, μ = 40 + 2(10) = 60

Therefore, the population mean is 60 and the standard deviation is 10.

2.92

The percentile ranking of the age of 25 years would be 100% − 73.5% = 26.5%.


2.94

a.

From Exercise 2.77, x̄ = 94.91 and s = 4.83. The z-score for an observation of 78 is:

$z = \frac{x - \bar{x}}{s} = \frac{78 - 94.91}{4.83} = -3.50$

This z-score indicates that an observation of 78 is 3.5 standard deviations below the mean. Very few observations will be lower than this one.

b.

The z-score for an observation of 98 is:

$z = \frac{x - \bar{x}}{s} = \frac{98 - 94.91}{4.83} = 0.63$

This z-score indicates that an observation of 98 is .63 standard deviations above the mean. This score is not an unusual observation in the data set.

2.96

a.

From the problem, μ = 2.7 and σ = .5

$z = \frac{x - \mu}{\sigma} \Rightarrow z\sigma = x - \mu \Rightarrow x = \mu + z\sigma$

For z = 2.0, x = 2.7 + 2.0(.5) = 3.7
For z = −1.0, x = 2.7 − 1.0(.5) = 2.2
For z = .5, x = 2.7 + .5(.5) = 2.95
For z = −2.5, x = 2.7 − 2.5(.5) = 1.45

b.

For z = −1.6, x = 2.7 − 1.6(.5) = 1.9

c.

If we assume the distribution of GPAs is approximately mound-shaped, we can use the Empirical Rule. From the Empirical Rule, we know that ≈.025 or ≈2.5% of the students will have GPAs above 3.7 (with z = 2). Thus, the GPA corresponding to summa cum laude (top 2.5%) will be greater than 3.7 (z > 2). We know that ≈.16 or 16% of the students will have GPAs above 3.2 (z = 1). Thus, the limit on GPAs for cum laude (top 16%) will be greater than 3.2 (z > 1). We must assume the distribution is mound-shaped.
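The conversion x = μ + zσ used in parts a and b of this exercise can be sketched in Python. The function name `value_from_z` is an illustrative assumption; μ = 2.7 and σ = .5 come from the problem statement.

```python
def value_from_z(mu, sigma, z):
    """Invert z = (x - mu) / sigma to recover the measurement x."""
    return mu + z * sigma


# z-scores from parts a and b of the exercise.
for z in (2.0, -1.0, 0.5, -2.5, -1.6):
    print(z, round(value_from_z(2.7, 0.5, z), 2))
```

Rounding to two decimals hides small floating-point noise in sums such as 2.7 − 0.8.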




2.98

a.

Since the data are approximately mound-shaped, we can use the Empirical Rule. On the blue exam, the mean is 53% and the standard deviation is 15%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is: x ± s ⇒ 53 ± (15) ⇒ (38, 68)

About 95% of all students will score within 2 standard deviations of the mean. This interval is: x ± 2 s ⇒ 53 ± 2(15) ⇒ 53 ± 30 ⇒ (23, 83)

About 99.7% of all students will score within 3 standard deviations of the mean. This interval is: x ± 3s ⇒ 53 ± 3(15) ⇒ 53 ± 45 ⇒ (8, 98)

b.

Since the data are approximately mound-shaped, we can use the Empirical Rule. On the red exam, the mean is 39% and the standard deviation is 12%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is: x ± s ⇒ 39 ± (12) ⇒ (27, 51)

About 95% of all students will score within 2 standard deviations of the mean. This interval is: x ± 2 s ⇒ 39 ± 2(12) ⇒ 39 ± 24 ⇒ (15, 63)

About 99.7% of all students will score within 3 standard deviations of the mean. This interval is: x̄ ± 3s ⇒ 39 ± 3(12) ⇒ 39 ± 36 ⇒ (3, 75)

c.

The student would have been more likely to have taken the red exam. For the blue exam, we know that approximately 95% of all scores will be from 23% to 83%. The observed 20% score does not fall in this range. For the red exam, we know that approximately 95% of all scores will be from 15% to 63%. The observed 20% score does fall in this range. Thus, it is more likely that the student would have taken the red exam.

2.100

The 25th percentile, or lower quartile, is the measurement that has 25% of the measurements below it and 75% of the measurements above it. The 50th percentile, or median, is the measurement that has 50% of the measurements below it and 50% of the measurements above it. The 75th percentile, or upper quartile, is the measurement that has 75% of the measurements below it and 25% of the measurements above it.



2.102

a.

Median is approximately 4.

b.

QL is approximately 3 (Lower Quartile) QU is approximately 6 (Upper Quartile)

c.

IQR = QU − QL ≈ 6 − 3 = 3

d.

The data set is skewed to the right since the right whisker is longer than the left, there is one outlier, and there are two potential outliers.

e.

50% of the measurements are to the right of the median and 75% are to the left of the upper quartile.

f.

There are two potential outliers, 12 and 13. There is one outlier, 16.

2.104

a.

From the problem, x̄ = 52.33 and s = 9.22.

The highest salary is 75 (thousand). The z-score is $z = \frac{x - \bar{x}}{s} = \frac{75 - 52.33}{9.22} = 2.46$

Therefore, the highest salary is 2.46 standard deviations above the mean.

The lowest salary is 35.0 (thousand). The z-score is $z = \frac{x - \bar{x}}{s} = \frac{35.0 - 52.33}{9.22} = -1.88$

Therefore, the lowest salary is 1.88 standard deviations below the mean.

The mean salary offer is 52.33 (thousand). The z-score is $z = \frac{x - \bar{x}}{s} = \frac{52.33 - 52.33}{9.22} = 0$

The z-score for the mean salary offer is 0 standard deviations from the mean.

No, the highest salary offer is not unusually high. For any distribution, at least 8/9 of the salaries should have z-scores between −3 and 3. A z-score of 2.46 would not be that unusual.




b.

Using MINITAB, the box plot is:

Since no salaries are outside the inner fences, none of them are potentially faulty observations.

2.106

Using MINITAB, the side-by-side box plots are:

[Side-by-side box plots: AGE by GROUP (1, 2, 3)]

From the boxplots, there appears to be one outlier in the third group.

2.108

a.

First, we will compute the mean and standard deviation. The sample mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{393}{75} = 5.24$

The sample variance is:

$s^2 = \frac{\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}}{n-1} = \frac{5943 - \frac{393^2}{75}}{75-1} = 52.482$

The standard deviation is: $s = \sqrt{s^2} = \sqrt{52.482} = 7.244$

Since this data set is highly skewed, we will use 2 standard deviations from the mean as the cutoff for outliers. Z-scores with values greater than 2 in absolute value are considered outliers. An observation with a z-score of 2 would have the value:

$z = \frac{x - \bar{x}}{s} \Rightarrow 2 = \frac{x - 5.24}{7.244} \Rightarrow 2(7.244) = x - 5.24 \Rightarrow 14.488 = x - 5.24 \Rightarrow x = 19.728$

An observation with a z-score of −2 would have the value:

$z = \frac{x - \bar{x}}{s} \Rightarrow -2 = \frac{x - 5.24}{7.244} \Rightarrow -2(7.244) = x - 5.24 \Rightarrow -14.488 = x - 5.24 \Rightarrow x = -9.248$

Thus any observation that is greater than 19.728 or less than −9.248 would be considered an outlier. In this data set there would be 4 outliers: 21, 21, 25, 48.

b.

Deleting these 4 outliers, we will recalculate the mean, median, variance, and standard deviation. The median for the original data set is the middle number once they have been arranged in order and is the 38th observation which is 3. The new mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{278}{71} = 3.92$

The new sample variance is:

$s^2 = \frac{\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}}{n-1} = \frac{2132 - \frac{278^2}{71}}{71-1} = 14.907$

The new standard deviation is: $s = \sqrt{s^2} = \sqrt{14.907} = 3.861$

The new median is the 36th observation once the data have been arranged in order and is 3. In the original data set, the mean is 5.24, the standard deviation is 7.244, and the median is 3. In the revised data set, the mean is 3.92, the standard deviation is 3.861, and the median is 3. The mean has been decreased, the standard deviation has been almost halved, but the median stays the same.




2.110

For Perturbed Intrinsics, but no Perturbed Projections:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1.0 + 1.3 + 3.0 + 1.5 + 1.3}{5} = \frac{8.1}{5} = 1.62$

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{15.63 - \frac{8.1^2}{5}}{4} = \frac{2.508}{4} = .627$

$s = \sqrt{s^2} = \sqrt{.627} = .792$

The z-score corresponding to a value of 4.5 is

$z = \frac{x - \bar{x}}{s} = \frac{4.5 - 1.62}{.792} = 3.63$

Since this z-score is greater than 3, we would consider this an outlier for perturbed intrinsics, but no perturbed projections.

For Perturbed Projections, but no Perturbed Intrinsics:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{22.9 + 21.0 + 34.4 + 29.8 + 17.7}{5} = \frac{125.8}{5} = 25.16$

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{3350.1 - \frac{125.8^2}{5}}{4} = \frac{184.972}{4} = 46.243$

$s = \sqrt{s^2} = \sqrt{46.243} = 6.800$

The z-score corresponding to a value of 4.5 is

$z = \frac{x - \bar{x}}{s} = \frac{4.5 - 25.16}{6.800} = -3.038$

Since this z-score is less than −3, we would consider this an outlier for perturbed projections, but no perturbed intrinsics.

Since the z-score corresponding to 4.5 for the perturbed projections, but no perturbed intrinsics, is smaller in absolute value than that for perturbed intrinsics, but no perturbed projections, it is more likely that the type of camera perturbation is perturbed projections, but no perturbed intrinsics.
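The two z-scores in this exercise can be recomputed from the raw measurements with Python's standard `statistics` module. This is a sketch under the assumption that the five values listed in each sum above are the full samples; the helper name `z_score` is hypothetical. Because it works from the unrounded standard deviation, the last digit may differ slightly from the hand calculation.

```python
from statistics import mean, stdev


def z_score(x, sample):
    """Sample z-score of x: (x - sample mean) / sample standard deviation."""
    return (x - mean(sample)) / stdev(sample)


# Measurements as listed in the sums above.
intrinsics = [1.0, 1.3, 3.0, 1.5, 1.3]
projections = [22.9, 21.0, 34.4, 29.8, 17.7]

z_intr = z_score(4.5, intrinsics)
z_proj = z_score(4.5, projections)
print(round(z_intr, 2), round(z_proj, 2))
```

Comparing `abs(z_proj)` with `abs(z_intr)` mirrors the final comparison in the solution: the value 4.5 is relatively closer to the perturbed-projections mean.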



2.112

Using MINITAB, a scatterplot of the data is:

[Scatterplot: Var2 versus Var1]

2.114

Using MINITAB, the scatterplot of the data is:

[Scatterplot: Lawyers versus Offices]

As the number of offices increases, the number of lawyers also tends to increase.

2.116

a.

Using MINITAB, the scatterplot is:

[Scatterplot: 30th trial versus 10th trial]

It appears that as the completion time for the 10th trial increases, the completion time for the 30th trial decreases.


b.

[Scatterplot: 50th trial versus 10th trial]

It appears that as the completion time for the 10th trial increases, the completion time for the 50th trial increases.

c.

[Scatterplot: 50th trial versus 30th trial]

It appears that as the completion time for the 30th trial increases, the completion time for the 50th trial increases.



2.118

Using MINITAB, the scatterplot of the data is:

[Scatterplot of Mass vs Time]

There is evidence to indicate that the mass of the spill tends to diminish as time increases. As time is getting larger, the mass is decreasing.

2.120

The mean is sensitive to extreme values in a data set. Therefore, the median is preferred to the mean when a data set is skewed in one direction or the other.

2.122

a.

If we assume that the data are about mound-shaped, then any observation with a z-score greater than 3 in absolute value would be considered an outlier. From Exercise 1.121, the z-score corresponding to 50 is −1, the z-score corresponding to 70 is 1, and the z-score corresponding to 80 is 2. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.

b.

From Exercise 1.121, the z-score corresponding to 50 is −2, the z-score corresponding to 70 is 2, and the z-score corresponding to 80 is 4. Since the z-score corresponding to 80 is greater than 3, 80 would be considered an outlier.

c.

From Exercise 1.121, the z-score corresponding to 50 is 1, the z-score corresponding to 70 is 3, and the z-score corresponding to 80 is 4. Since the z-scores corresponding to 70 and 80 are greater than or equal to 3, 70 and 80 would be considered outliers.

d.

From Exercise 1.121, the z-score corresponding to 50 is .1, the z-score corresponding to 70 is .3, and the z-score corresponding to 80 is .4. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.




2.124

a.

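$\sum x = 4 + 6 + 6 + 5 + 6 + 7 = 34$

$\sum x^2 = 4^2 + 6^2 + 6^2 + 5^2 + 6^2 + 7^2 = 198$

$\bar{x} = \frac{\sum x}{n} = \frac{34}{6} = 5.67$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{198 - \frac{34^2}{6}}{6-1} = \frac{5.3333}{5} = 1.0667$

$s = \sqrt{1.067} = 1.03$

b.

$\sum x = -1 + 4 + (-3) + 0 + (-3) + (-6) = -9$

$\sum x^2 = (-1)^2 + 4^2 + (-3)^2 + 0^2 + (-3)^2 + (-6)^2 = 71$

$\bar{x} = \frac{\sum x}{n} = \frac{-9}{6} = -\$1.5$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{71 - \frac{(-9)^2}{6}}{6-1} = \frac{57.5}{5} = 11.5$ dollars squared

$s = \sqrt{11.5} = \$3.39$

c.

$\sum x = \frac{3}{5} + \frac{4}{5} + \frac{2}{5} + \frac{1}{5} + \frac{1}{16} = 2.0625$

$\sum x^2 = \left(\frac{3}{5}\right)^2 + \left(\frac{4}{5}\right)^2 + \left(\frac{2}{5}\right)^2 + \left(\frac{1}{5}\right)^2 + \left(\frac{1}{16}\right)^2 = 1.2039$

$\bar{x} = \frac{\sum x}{n} = \frac{2.0625}{5} = .4125\%$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{1.2039 - \frac{2.0625^2}{5}}{5-1} = \frac{.3531}{4} = .0883\%$ squared

$s = \sqrt{.0883} = .30\%$

d.

(a) Range = 7 − 4 = 3

(b) Range = $4 − ($−6) = $10

(c) Range = $\frac{4}{5}\% - \frac{1}{16}\% = \frac{64}{80}\% - \frac{5}{80}\% = \frac{59}{80}\% = .7375\%$

2.126

σ ≈ range/4 = 20/4 = 5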


2.128

Using MINITAB, a pie chart of the data is:

[Pie chart of defect: false 90.2%, true 9.8%]

A response of ‘true’ means the software contained defective code. Thus, only 9.8% of the modules contained defective software code.

2.130

The z-score would be:

$z = \frac{x - \bar{x}}{s} = \frac{408 - 603.7}{185.4} = -1.06$

Since this value is not very big, this is not an unusual value to observe.

2.132

a.

The variable of interest is opinion of book reviews. The values could be ‘would not recommend’, ‘cautious or very little recommendation’, ‘little or no preference’, ‘favorable/recommended’, and ‘outstanding/significant contribution’. Since these responses are not numerical, the variable is qualitative.

b.

Most of the books (63%) received a "favorable/recommended" review. About the same percentage of books received the following reviews: "cautious or very little recommendation" (10%), "little or no preference" (9%), and "outstanding/significant contribution" (12%). Only 5% of the books received "would not recommend" reviews.

c.

If the top two categories are added together, the percent recommended is 75% (actually slightly higher than 75%). This agrees with the study.

2.134

a.

To display the status, we use a pie chart. From the pie chart, we see that 58% of the Beanie babies are retired and 42% are current.




b.

Using Minitab, a histogram of the values is:

Most (40 of 50) Beanie babies have values less than $100. Of the remaining 10, 5 have values between $100 and $300, 1 has a value between $300 and $500, 1 has a value between $500 and $700, 2 have values between $700 and $900, and 1 has a value between $1900 and $2100.

c.

A plot of the value versus the age of the Beanie Baby is as follows:

From the plot, it appears that as the age increases, the value tends to increase.

2.136

a.

Using MINITAB, the stem-and-leaf display is:

Stem-and-leaf of C1   N = 46
Leaf Unit = 0.10

   4    0  3444
 (26)   0  55555555566666667777788889
  16    1  000011222334
   4    1  77
   2    2
   2    2
   2    3
   2    3  9
   1    4
   1    4  7

b.

The leaves that represent those brands that carry the American Dental Association seal are circled above.

c.

It appears that the brands approved by the ADA tend to have the lower costs. Thirteen of the twenty brands approved by the ADA, or (13/20) × 100% = 65%, are less than the median cost.

2.138

a.

Using MINITAB, the summary statistics are:

Descriptive Statistics: Marketing, Engineering, Accounting, Total

Variable      N     Mean   Median   TrMean    StDev  SE Mean
Marketin     50    4.766    5.400    4.732    2.584    0.365
Engineer     50    5.044    4.500    4.798    3.835    0.542
Accounti     50    3.652    0.800    2.548    6.256    0.885
Total        50   13.462   13.750   13.043    6.820    0.965

Variable   Minimum  Maximum       Q1       Q3
Marketin     0.100   11.000    2.825    6.250
Engineer     0.400   14.400    1.775    7.225
Accounti     0.100   30.000    0.200    3.725
Total        1.800   36.200    8.075   16.600

b.

The z-scores corresponding to the maximum time guidelines developed for each department and the total are as follows:

Marketing: $z = \frac{x - \bar{x}}{s} = \frac{6.5 - 4.77}{2.58} = .67$

Engineering: $z = \frac{x - \bar{x}}{s} = \frac{7.0 - 5.04}{3.84} = .51$

Accounting: $z = \frac{x - \bar{x}}{s} = \frac{8.5 - 3.65}{6.26} = .77$

Total: $z = \frac{x - \bar{x}}{s} = \frac{17 - 13.46}{6.82} = .52$

c.

To find the maximum processing time corresponding to a z-score of 3, we substitute the values of z, x̄, and s into the z formula and solve for x.

$z = \frac{x - \bar{x}}{s} \Rightarrow x - \bar{x} = zs \Rightarrow x = \bar{x} + zs$

Marketing: x = 4.77 + 3(2.58) = 4.77 + 7.74 = 12.51. None of the orders exceed this time.

Engineering: x = 5.04 + 3(3.84) = 5.04 + 11.52 = 16.56. None of the orders exceed this time.

These both agree with both the Empirical Rule and Chebyshev's Rule.

Accounting: x = 3.65 + 3(6.26) = 3.65 + 18.78 = 22.43. One of the orders exceeds this time or 1/50 = .02.

Total: x = 13.46 + 3(6.82) = 13.46 + 20.46 = 33.92. One of the orders exceeds this time or 1/50 = .02.

These both agree with Chebyshev's Rule but not the Empirical Rule. Both of these last two distributions are skewed to the right.

d.

Marketing: x = 4.77 + 2(2.58) = 4.77 + 5.16 = 9.93. Two of the orders exceed this time or 2/50 = .04.

Engineering: x = 5.04 + 2(3.84) = 5.04 + 7.68 = 12.72. Two of the orders exceed this time or 2/50 = .04.

Accounting: x = 3.65 + 2(6.26) = 3.65 + 12.52 = 16.17. Three of the orders exceed this time or 3/50 = .06.

Total: x = 13.46 + 2(6.82) = 13.46 + 13.64 = 27.10. Two of the orders exceed this time or 2/50 = .04.

All of these agree with Chebyshev's Rule but not the Empirical Rule.

e.

No observations exceed the guideline of 3 standard deviations for both Marketing and Engineering. One observation exceeds the guideline of 3 standard deviations for both Accounting (#23, time = 30.0 days) and Total (#23, time = 36.2 days). Therefore, only (1/10) × 100% = 10% of the "lost" quotes have times exceeding at least one of the 3 standard deviation guidelines.

Two observations exceed the guideline of 2 standard deviations for both Marketing (#31, time = 11.0 days and #48, time = 10.0 days) and Engineering (#4, time = 13.0 days and #49, time = 14.4 days). Three observations exceed the guideline of 2 standard deviations for Accounting (#20, time = 22.0 days; #23, time = 30.0 days; and #36, time = 18.2 days). Two observations exceed the guideline of 2 standard deviations for Total (#20, time = 30.2 days and #23, time = 36.2 days). Therefore, (7/10) × 100% = 70% of the "lost" quotes have times exceeding at least one of the 2 standard deviation guidelines.

We would recommend the 2 standard deviation guideline since it covers 70% of the lost quotes, while having very few other quotes exceed the guidelines.
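The 2- and 3-standard-deviation cutoffs compared in this exercise all come from x = x̄ + zs. The sketch below tabulates them in Python; the dictionary `summary` and the function `upper_cutoff` are illustrative names, and the means and standard deviations are the rounded values used in the solution.

```python
# (mean, standard deviation) per department, rounded as in the solution.
summary = {
    "Marketing": (4.77, 2.58),
    "Engineering": (5.04, 3.84),
    "Accounting": (3.65, 6.26),
    "Total": (13.46, 6.82),
}


def upper_cutoff(mean, sd, k):
    """Processing time that lies k standard deviations above the mean."""
    return mean + k * sd


for dept, (m, s) in summary.items():
    two_sd = round(upper_cutoff(m, s, 2), 2)
    three_sd = round(upper_cutoff(m, s, 3), 2)
    print(f"{dept:12s} 2s cutoff = {two_sd:6.2f}   3s cutoff = {three_sd:6.2f}")
```

Counting how many orders exceed each cutoff still requires the raw data, which is not reproduced here.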

2.140

a.

First, construct a relative frequency distribution for the departments.

Class   Department       Frequency   Relative Frequency
1       Production       13          .241
2       Maintenance      31          .574
3       Sales            3           .056
4       R&D              2           .037
5       Administration   5           .093
        TOTAL            54          1.001

The Pareto diagram is:

From the diagram, it is evident that the departments with the worst safety record are Maintenance and Production.

b.

First, construct a relative frequency distribution for the type of injury in the maintenance department.

Class   Injury         Frequency   Relative Frequency
1       Burn           6           .194
2       Back strain    5           .161
3       Eye damage     2           .065
4       Cuts           10          .323
5       Broken arm     2           .065
6       Broken leg     1           .032
7       Concussion     3           .097
8       Hearing loss   2           .065
        TOTAL          31          1.002

The Pareto diagram is:

From the Pareto diagram, it is evident that cuts is the most prevalent type of injury. Burns and back strain are the next most prevalent types of injuries.
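The relative frequencies and the Pareto ordering (largest class first) can be sketched in Python from the department counts in part a. The function name `pareto_table` is a hypothetical helper, not something from the text or from MINITAB.

```python
def pareto_table(counts):
    """Return (label, frequency, relative frequency) rows sorted from the
    most frequent class to the least, as a Pareto diagram orders its bars."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(label, freq, round(freq / total, 3)) for label, freq in ordered]


# Frequencies from the department table in part a.
departments = {
    "Production": 13,
    "Maintenance": 31,
    "Sales": 3,
    "R&D": 2,
    "Administration": 5,
}

pareto = pareto_table(departments)
for label, freq, rel in pareto:
    print(f"{label:15s} {freq:3d} {rel:.3f}")
```

The rounded relative frequencies need not sum to exactly 1, which is why the table totals read 1.001 and 1.002.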

2.142

a.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: MPG

Variable      N     Mean   Median   TrMean    StDev  SE Mean
MPG          36   40.056   40.000   40.063    2.177    0.363

Variable   Minimum  Maximum       Q1       Q3
MPG         35.000   45.000   39.000   41.000

The mean is 40.056 and the standard deviation is 2.177. Both of these measures are measured in the same units as the original data, which are miles per gallon.

b.

Since the sample mean is a good estimate of the population mean, the manufacturer should be satisfied. The sample mean is 40.056 which is greater than 40.

c.

The range of the data set is 45 − 35 = 10. Using Chebyshev's Rule, the range should cover approximately 6 standard deviations. Thus, a good estimate of the standard deviation would be 10/6 = 1.67. Using the Empirical Rule, the range should cover approximately 4 standard deviations. Thus, a good estimate of the standard deviation would be 10/4 = 2.5. The given standard deviation is 2.177, which is between these two estimates. Thus, it is a reasonable value.

d.

Using MINITAB, the frequency histogram is (the relative frequency histogram would have the same shape):

[Histogram: Frequency versus MPG]

Yes, the data appear to be mound-shaped.

e.

Because the data are mound-shaped, we can use the Empirical Rule. We would expect approximately 68% of the data within the interval x ± s, approximately 95% of the data within the interval x ± 2s, and approximately all of the data within the interval x ± 3s.

f.

The interval x ± s is 40.056 ± 2.177 or (37.879, 42.233). Twenty-seven of the observations fall in this interval or 27/36 = .75 or 75%. This number is a little larger than 68%. The interval x ± 2s is 40.056 ± 2(2.177) or (35.702, 44.410). Thirty-four of the observations fall in this interval or 34/36 = .94 or 94%. This number is very close to 95%. The interval x ± 3s is 40.056 ± 3(2.177) or (33.525, 46.587). Thirty-six of the observations fall in this interval or 36/36 = 1.00 or 100%. This number is the same as all of the observations.



2.144

a.

Both the height and width of the bars (peanuts) change. Thus, some readers may tend to equate the area of the peanuts with the frequency for each year.

b.

The frequency bar chart is:




The Kentucky Milk Case

(To accompany Chapters 1–2)

There are many things that could be included in a report about the possibility of collusion. I have concentrated on the incumbency rates, bid levels and dispersion, and average winning bids. With the data available, no comparison of market share can be made since there was so much missing data. Actually, with the data available, the exact analysis cannot be made, since only the winning bid information is provided. Thus, we have no idea what the losing bids were. I will present what I think is a reasonable solution. This is by no means the only solution to the case. Many other presentations could also be used.

Incumbency Rates

The incumbency rate is the percent of the school districts that are won by the same vendor who won the previous year. A table containing the incumbency rates is included as well as a plot. Notice in the plot that the incumbency rates in the Tri-county market are higher than those in the Surrounding market. From 1985 through 1988, the incumbency rate for the Tri-county market was never lower than .923, while in the same period in the Surrounding market, the incumbency rate was never higher than .730. This implies the possibility of collusion in the Tri-county market.

        Surrounding Market                     Tri-county Market
Year    Number of   Same      Incumbency      Number of   Same      Incumbency
        Districts   Vendors   Rate            Districts   Vendors   Rate
1984    26          16        .615            10          8         .800
1985    27          19        .704            12          12        1.000
1986    32          19        .594            13          13        1.000
1987    37          27        .730            13          12        .923
1988    37          25        .676            13          13        1.000
1989    37          23        .622            13          9         .692
1990    34          24        .706            13          10        .769
1991    5           3         .600            13          11        .846
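The incumbency rates in the table are the number of districts won by the same vendor divided by the number of districts. A minimal Python sketch of that calculation follows; the dictionary layout and the function name `incumbency_rate` are assumptions made for illustration, while the counts are copied from the table.

```python
# year -> (number of districts, districts won by the same vendor)
surrounding = {1984: (26, 16), 1985: (27, 19), 1986: (32, 19), 1987: (37, 27),
               1988: (37, 25), 1989: (37, 23), 1990: (34, 24), 1991: (5, 3)}
tri_county = {1984: (10, 8), 1985: (12, 12), 1986: (13, 13), 1987: (13, 12),
              1988: (13, 13), 1989: (13, 9), 1990: (13, 10), 1991: (13, 11)}


def incumbency_rate(districts, same_vendors):
    """Fraction of districts won by the previous year's vendor."""
    return same_vendors / districts


years = range(1985, 1989)  # the 1985 through 1988 period discussed above
tri_min = min(incumbency_rate(*tri_county[y]) for y in years)
sur_max = max(incumbency_rate(*surrounding[y]) for y in years)
print(round(tri_min, 3), round(sur_max, 3))
```

The two printed values correspond to the "never lower than" and "never higher than" figures quoted in the paragraph.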



The plot of the incumbency rates is:

Bid Levels and Dispersion

Since we only have access to the winning bids in each of the school districts, we cannot make a true analysis of the bid levels and dispersions. As a compromise, I have used the winning bids of the two dairies in question, Trauth and Meyer. I have looked at only the winning bids of these two dairies in both the Tri-county market and in the Surrounding market. If there was no collusion, then the winning bids and the dispersions of the winning bids should be similar in the two markets for the two dairies.

I looked at the box plots of the winning bids of the two dairies in each market for each type of milk: whole white, lowfat white and lowfat chocolate. I have included only a few of the box plots as illustrations. Those included are for 1985 and 1986.


1985 Winning Bids:

OBS   MARKET   WINNER   WHOLE WHITE   LOWFAT WHITE   LOWFAT CHOCOLATE
1     SUR      MEYER    0.1280        0.1250         0.1315
2     SUR      TRAUTH   0.1200        0.1110         0.1090
3     SUR      TRAUTH   .             0.1079         0.1079
4     SUR      TRAUTH   .             0.1190         0.1210
5     SUR      MEYER    0.1225        0.1130         0.1099
6     SUR      TRAUTH   0.1230        0.1130         0.1120
7     SUR      MEYER    0.1250        0.1145         0.1140
8     TRI      TRAUTH   0.1440        0.1440         .
9     TRI      TRAUTH   0.1450        0.1350         .
10    TRI      MEYER    0.1410        0.1410         0.1410
11    TRI      TRAUTH   0.1393        0.1393         .
12    TRI      MEYER    0.1340        0.1340         0.1340
13    TRI      MEYER    0.1445        0.1345         0.1395
14    TRI      MEYER    .             0.1345         .
15    TRI      TRAUTH   0.1449        0.1349         0.1399
16    TRI      TRAUTH   .             0.1299         0.1299
17    TRI      MEYER    0.1480        0.1480         0.1480
18    TRI      TRAUTH   0.1310        0.1290         .
19    TRI      MEYER    .             0.1380         .
20    TRI      TRAUTH   0.1435        0.1335         .

[Figure: Boxplots for Whole White Milk - 1985; WWBID by MARKET (Surround, Tri-county)]

[Figure: Boxplots for Lowfat White Milk - 1985; LFWBID by MARKET (Surround, Tri-county)]

[Figure: Boxplots for Lowfat Chocolate Milk - 1985; LFCBID by MARKET (Surround, Tri-county)]


For each type of milk, the mean and median winning bids for the Tri-county market were higher than the corresponding winning bids in the Surrounding market. Also, the dispersion, indicated by the width of the boxes and the length of the whiskers, for the Surrounding market is larger than for the Tri-county market in most cases. This is indicative of collusion in the Tri-county market. This same pattern also existed in 1986.

1986 Winning Bids:

OBS   MARKET   WINNER   WHOLE WHITE   LOWFAT WHITE   LOWFAT CHOCOLATE
  1   SUR      TRAUTH     0.1195         0.1100          0.1085
  2   SUR      TRAUTH     0.1330         0.1240          0.1290
  3   SUR      TRAUTH     0.1140         0.1070          0.1050
  4   SUR      MEYER      0.1350         0.1250          0.1315
  5   SUR      TRAUTH     0.1224         0.1124          0.1110
  6   SUR      TRAUTH      .             0.1110          0.1110
  7   SUR      TRAUTH      .             0.1180          0.1200
  8   SUR      TRAUTH     0.1250         0.1125          0.1115
  9   TRI      TRAUTH     0.1475         0.1475           .
 10   TRI      TRAUTH     0.1469         0.1369           .
 11   TRI      MEYER      0.1440         0.1340          0.1395
 12   TRI      TRAUTH     0.1420         0.1420           .
 13   TRI      MEYER      0.1390         0.1390          0.1390
 14   TRI      MEYER      0.1470         0.1370          0.1420
 15   TRI      MEYER       .             0.1380           .
 16   TRI      TRAUTH     0.1474         0.1374          0.1424
 17   TRI      TRAUTH      .             0.1349          0.1349
 18   TRI      MEYER      0.1505         0.1505          0.1505
 19   TRI      TRAUTH     0.1360         0.1320           .
 20   TRI      MEYER       .             0.1430           .
 21   TRI      TRAUTH     0.1460         0.1360           .

[Figure: Boxplots for Whole White Milk - 1986; WWBID by MARKET (Surround, Tri-county)]

[Figure: Boxplots for Lowfat White Milk - 1986; LFWBID by MARKET (Surround, Tri-county)]

[Figure: Boxplots for Lowfat Chocolate Milk - 1986; LFCBID by MARKET (Surround, Tri-county)]


The same pattern that existed for 1985 and 1986 also existed in 1984, 1987, and 1988. From 1989 on, the pattern no longer existed. Thus, from the plots, it appears that the two dairies were working together from 1984 through 1988 in the Tri-county market.

I also plotted the mean winning bids for the two dairies in each of the two markets from 1984 through 1991 for each type of milk. In all three plots, the mean winning bid in 1983 was almost the same in the two markets. Then, in 1984, the mean winning bid in the Tri-county market was higher than in the Surrounding market for all three types of milk. This trend holds basically through 1988; the one exception is lowfat white milk in 1988, where the mean winning bid for the Surrounding market was greater than the mean winning bid in the Tri-county market. After 1988, the mean winning bids in the two markets are almost the same. This points to collusion in the Tri-county market from 1984 through 1988.




The dispersion, measured using the standard deviation, of the winning bids for each of the three types of milk was basically smaller in the Tri-county market than in the Surrounding market for the years 1985 through 1988. Again, after 1988 this pattern no longer existed. Again, this points to collusion between the two dairies in the Tri-county market during the years 1984 through 1988.
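As one hedged illustration of the comparison described above, the sketch below computes the mean and sample standard deviation for a single year and milk type only: the 1985 lowfat white winning bids listed earlier in this case (chosen here because that column has no missing values). The variable names are illustrative, and this is not the author's original computation.

```python
from statistics import mean, stdev

# 1985 winning lowfat white bids for Trauth and Meyer, copied from the 1985 table.
surrounding = [0.1250, 0.1110, 0.1079, 0.1190, 0.1130, 0.1130, 0.1145]
tri_county = [0.1440, 0.1350, 0.1410, 0.1393, 0.1340, 0.1345, 0.1345,
              0.1349, 0.1299, 0.1480, 0.1290, 0.1380, 0.1335]

# (mean, sample standard deviation) for each market.
summary = {
    "Surrounding": (mean(surrounding), stdev(surrounding)),
    "Tri-county": (mean(tri_county), stdev(tri_county)),
}

for market, (avg, sd) in summary.items():
    print(f"{market}: mean = {avg:.4f}, sd = {sd:.4f}")
```

For this one slice of the data, the Tri-county mean is higher and its standard deviation is slightly smaller, which is the direction of the pattern the discussion describes.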





Chapter 3

Probability

3.2

a.

This is a Venn Diagram.

b.

If the sample points are equally likely, then P(1) = P(2) = P(3) = ⋅⋅⋅ = P(10) = 1/10

Therefore,

P(A) = P(4) + P(5) + P(6) = 1/10 + 1/10 + 1/10 = 3/10 = .3
P(B) = P(6) + P(7) = 1/10 + 1/10 = 2/10 = .2

c.

P(A) = P(4) + P(5) + P(6) = 1/20 + 1/20 + 3/20 = 5/20 = .25
P(B) = P(6) + P(7) = 3/20 + 3/20 = 6/20 = .3

3.4

a.

(9 choose 4) = 9! / (4!(9 − 4)!) = (9 ⋅ 8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / ((4 ⋅ 3 ⋅ 2 ⋅ 1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)) = 126

b.

(7 choose 2) = 7! / (2!(7 − 2)!) = (7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / ((2 ⋅ 1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)) = 21

c.

(4 choose 4) = 4! / (4!(4 − 4)!) = (4 ⋅ 3 ⋅ 2 ⋅ 1) / ((4 ⋅ 3 ⋅ 2 ⋅ 1)(1)) = 1

d.

(5 choose 0) = 5! / (0!(5 − 0)!) = (5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / ((1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)) = 1

e.

(6 choose 5) = 6! / (5!(6 − 5)!) = (6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / ((5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)(1)) = 6
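The five combinations in Exercise 3.4 can be checked numerically. The following is a small sketch; the helper name `n_choose_k` is illustrative and simply spells out the factorial formula used above.

```python
from math import comb, factorial


def n_choose_k(n, k):
    """Combinations rule written out with factorials: n! / (k!(n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# The five cases from Exercise 3.4, parts a through e.
cases = [(9, 4), (7, 2), (4, 4), (5, 0), (6, 5)]
results = [n_choose_k(n, k) for n, k in cases]
print(results)  # [126, 21, 1, 1, 6]

# math.comb (Python 3.8+) computes the same quantity directly.
builtin_results = [comb(n, k) for n, k in cases]
```

Note that 0! = 1, which is why parts c and d both evaluate to 1.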



3.6

a.

The 36 sample points are: 1,1 1,2 1,3 1,4 1,5 1,6 2,1 2,2 2,3 2,4 2,5 2,6 3,1 3,2 3,3 3,4 3,5 3,6 4,1 4,2 4,3 4,4 4,5 4,6 5,1 5,2 5,3 5,4 5,5 5,6 6,1 6,2 6,3 6,4 6,5 6,6

b.

If the dice are fair, then each of the sample points is equally likely. Each would have a probability of 1/36 of occurring.

c.

There is one sample point in A: 3,3. Thus, P(A) = 1/36.

There are 6 sample points in B: 1,6 2,5 3,4 4,3 5,2 and 6,1. Thus, P(B) = 6/36 = 1/6.

There are 18 sample points in C: 1,1 1,3 1,5 2,2 2,4 2,6 3,1 3,3 3,5 4,2 4,4 4,6 5,1 5,3 5,5 6,2 6,4 and 6,6. Thus, P(C) = 18/36 = 1/2.

3.8

Each student will obtain slightly different proportions. However, the proportions should be close to P(A) = 1/10, P(B) = 6/10 and P(C) = 3/10.

3.10

Define the following event: B: {Postal worker was assaulted on the job in the past year}

P(B) = 600/12,000 = .05

3.12

a.

The 5 sample points are: Total population, Agricultural change, Presence of industry, Growth, and Population concentration.

b.

The probabilities are best estimated with the sample proportions. Thus, P(Total population) = .18 P(Agricultural change) = .05 P(Presence of industry) = .27 P(Growth) = .05 P(Population concentration) = .45

c.

Define the following event: A: {Factor specified is population-related} P(A) = P(Total population) + P(Growth) + P(Population concentration) = .18 + .05 + .45 = .68.



3.14

a.

The sample points of this experiment correspond to each of the 8 possible types of commodities. Suppose we introduce notation to make the listing of the sample points easier.

A: {carload contains agricultural products}
CH: {carload contains chemicals}
CO: {carload contains coal}
F: {carload contains forest products}
MO: {carload contains metallic ores and minerals}
MV: {carload contains motor vehicles and equipment}
N: {carload contains nonmetallic minerals and products}
O: {carload contains other}

The eight sample points are: A CH CO F MO MV N O

b.

The probability of each sample point is found by dividing the number of carloads for each sample point by the total number of carloads. The probabilities are:

P(A) = 41,690 / 335,770 = .124
P(CH) = 38,331 / 335,770 = .114
P(CO) = 124,595 / 335,770 = .371
P(F) = 21,929 / 335,770 = .065
P(MO) = 34,521 / 335,770 = .103
P(MV) = 22,906 / 335,770 = .068
P(N) = 37,416 / 335,770 = .111
P(O) = 14,382 / 335,770 = .043

c.

P(MV) = .068

P(nonagricultural products) = P(CH) + P(CO) + P(F) + P(MO) + P(MV) + P(N) + P(O) = .114 + .371 + .065 + .103 + .068 + .111 + .043 = .875

d.

P(CH) + P(CO) = .114 + .371 = .485

e.

Since there were 335,770 carloads that week, the probability of selecting any one in particular would be 1 / 335,770 = .00000298. Thus, the probability of selecting the carload with the serial number 1003642 is .00000298.
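The relative-frequency calculations in Exercise 3.14 can be reproduced with a few lines of code. This is a sketch only; the dictionary keys reuse the event abbreviations defined in part a, and the other names are illustrative.

```python
# Number of carloads for each commodity type, as given in the solution to part b.
carloads = {"A": 41_690, "CH": 38_331, "CO": 124_595, "F": 21_929,
            "MO": 34_521, "MV": 22_906, "N": 37_416, "O": 14_382}

total = sum(carloads.values())  # 335,770 carloads in all

# Part b: each probability is a relative frequency, rounded to three decimals.
probs = {kind: round(count / total, 3) for kind, count in carloads.items()}

# Part c: sum the rounded probabilities of every nonagricultural commodity.
# (Summing before rounding would give .876 instead of .875.)
p_nonagricultural = round(sum(p for kind, p in probs.items() if kind != "A"), 3)

# Part d: chemicals or coal.
p_chemicals_or_coal = round(probs["CH"] + probs["CO"], 3)

# Part e: any one particular carload.
p_single_carload = 1 / total

print(probs)
print(p_nonagricultural, p_chemicals_or_coal, f"{p_single_carload:.8f}")
```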



3.16

a.

Since order does not matter, the number of different bets would be a combination of 8 things taken 2 at a time. The number of ways would be

(8 choose 2) = 8! / (2!(8 − 2)!) = (8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / ((2 ⋅ 1)(6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)) = 40,320 / 1440 = 28

b.

If all players are of equal ability, then each of the 28 sample points would be equally likely. Each would have a probability of occurring of 1/28. There is only one sample point with values 2 and 7. Thus, the probability of winning with a bet of 2-7 would be 1/28 or .0357.

3.18

a.

Let I = Infiniti 1435, TP = Toyota Prius, and C = Chevrolet Corvette. All possible rankings are as follows, where the first dealer listed is ranked first, the second dealer listed is ranked second, and the third dealer listed is ranked third:

I,TP,C    I,C,TP    C,I,TP    C,TP,I    TP,I,C    TP,C,I

b.

If each set of rankings is equally likely, then each has a probability of 1/6.

The probability that the Toyota Prius is ranked first = P(TP,I,C) + P(TP,C,I) = 1/6 + 1/6 = 2/6 = 1/3.

The probability that the Infiniti 1435 is ranked third = P(C,TP,I) + P(TP,C,I) = 1/6 + 1/6 = 2/6 = 1/3.

The probability that the Toyota Prius is ranked first and the Chevrolet Corvette is ranked second = P(TP,C,I) = 1/6.
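The six rankings and the three probabilities in Exercise 3.18 can be enumerated directly. The sketch below assumes, as the solution does, that all rankings are equally likely; the helper `prob` and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

cars = ["I", "TP", "C"]  # Infiniti, Toyota Prius, Chevrolet Corvette

# Every ordering of the three cars is one sample point.
rankings = list(permutations(cars))
p_each = Fraction(1, len(rankings))


def prob(event):
    """Add up the probabilities of the rankings for which event(ranking) is true."""
    return sum(p_each for ranking in rankings if event(ranking))


p_tp_first = prob(lambda r: r[0] == "TP")
p_i_third = prob(lambda r: r[2] == "I")
p_tp_first_c_second = prob(lambda r: r[0] == "TP" and r[1] == "C")

print(len(rankings), p_tp_first, p_i_third, p_tp_first_c_second)
```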

3.20

First, we need to compute the total number of ways we can select 2 bullets (pair) from 1,837 bullets. This is a combination of 1,837 things taken 2 at a time. The number of pairs is:

(1,837 choose 2) = 1,837! / (2!(1,837 − 2)!) = (1837 ⋅ 1836 ⋅⋅⋅ 1) / ((2 ⋅ 1)(1835 ⋅ 1834 ⋅⋅⋅ 1)) = (1837 ⋅ 1836) / 2 = 1,686,366

The probability of a false positive is the number of false positives divided by the number of pairs and is:

P(false positive) = # false positives / # pairs = 693 / 1,686,366 = .0004

This probability is very small. There would be only about 4 false positives out of every 10,000. I would have confidence in the FBI’s forensic evidence.
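The pair count and the false-positive probability in Exercise 3.20 are easy to verify numerically. A short sketch, with illustrative variable names:

```python
from math import comb

n_bullets = 1837
false_positives = 693

# Number of unordered pairs of bullets: 1837 choose 2.
n_pairs = comb(n_bullets, 2)

# Probability that a randomly chosen pair is a false positive.
p_false_positive = false_positives / n_pairs

print(n_pairs)                     # 1686366
print(round(p_false_positive, 4))  # 0.0004
```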



3.22

a.

P(Bc) = 1 − P(B) = 1 − .7 = .3

b.

P(Ac) = 1 − P(A) = 1 − .4 = .6

c.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .4 + .7 − .3 = .8

3.24

The experiment consists of rolling a pair of fair dice. The sample points are:

1, 1   1, 2   1, 3   1, 4   1, 5   1, 6
2, 1   2, 2   2, 3   2, 4   2, 5   2, 6
3, 1   3, 2   3, 3   3, 4   3, 5   3, 6
4, 1   4, 2   4, 3   4, 4   4, 5   4, 6
5, 1   5, 2   5, 3   5, 4   5, 5   5, 6
6, 1   6, 2   6, 3   6, 4   6, 5   6, 6

Since each die is fair, each sample point is equally likely. The probability of each sample point is 1/36.

a.

A: {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}

B: {(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (4, 1), (4, 2), (4, 3), (4, 5), (4, 6)}

A ∩ B: {(3, 4), (4, 3)}

A ∪ B: {(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (4, 1), (4, 2), (4, 3), (4, 5), (4, 6), (1, 6), (2, 5), (5, 2), (6, 1)}

Ac: {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3), (2, 4), (2, 6), (3, 1), (3, 2), (3, 3), (3, 5), (3, 6), (4, 1), (4, 2), (4, 4), (4, 5), (4, 6), (5, 1), (5, 3), (5, 4), (5, 5), (5, 6), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}

b.

P(A) = 6(1/36) = 6/36 = 1/6
P(B) = 11(1/36) = 11/36
P(A ∩ B) = 2(1/36) = 2/36 = 1/18
P(A ∪ B) = 15(1/36) = 15/36 = 5/12
P(Ac) = 30(1/36) = 30/36 = 5/6

c.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/6 + 11/36 − 1/18 = (6 + 11 − 2)/36 = 15/36 = 5/12

d.

A and B are not mutually exclusive. To be mutually exclusive, P(A ∩ B) must be 0. Here, P(A ∩ B) = 1/18.
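The probabilities in Exercise 3.24 can be confirmed by enumerating the 36 sample points. In the sketch below, the definitions of A (the two dice sum to 7) and B (at least one die shows a 4) are inferred from the sample points listed in part a, not quoted from the exercise; the helper `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair dice.
sample_space = set(product(range(1, 7), repeat=2))

# Events inferred from the listed sample points (an assumption of this sketch).
A = {pt for pt in sample_space if sum(pt) == 7}
B = {pt for pt in sample_space if 4 in pt}


def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(sample_space))


p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(A & B)
p_a_or_b = prob(A | B)
p_not_a = prob(sample_space - A)

# Additive rule from part c.
additive_rule = p_a + p_b - p_a_and_b

# Part d: mutually exclusive events would have an empty intersection.
mutually_exclusive = len(A & B) == 0

print(p_a, p_b, p_a_and_b, p_a_or_b, p_not_a, mutually_exclusive)
```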



3.26

a.

P(Ac) = P(E3) + P(E6) = .2 + .3 = .5

b.

P(Bc) = P(E1) + P(E7) = .10 + .06 = .16

c.

P(Ac ∩ B) = P(E3) + P(E6) = .2 + .3 = .5

d.

P(A ∪ B) = P(E1) + P(E2) + P(E3) + P(E4) + P(E5) + P(E6) + P(E7) = .10 + .05 + .20 + .20 + .06 + .30 + .06 = .97

e.

P(A ∩ B) = P(E2) + P(E4) + P(E5) = .05 + .20 + .06 = .31

f.

P(Ac ∪ Bc) = P(E1) + P(E7) + P(E3) + P(E6) = .10 + .06 + .20 + .30 = .66

g.

No. A and B are mutually exclusive if P(A ∩ B) = 0. Here, P(A ∩ B) = .31.

3.28

a.

The outcome "On" and "High" is A ∩ D.

b.

The outcome "Low" or "Medium" is Dc.

3.30

Define the following events:

A: {problems with absenteeism}
T: {problems with turnover}

From the problem, P(A) = .55, P(T) = .41, and P(A ∩ T) = .22

P(problems with either absenteeism or turnover) = P(A ∪ T) = P(A) + P(T) − P(A ∩ T) = .55 + .41 − .22 = .74

3.32

a.

The event A ∩ B is the event the outcome is black and odd. The event is A ∩ B: {11, 13, 15, 17, 29, 31, 33, 35}

b.

The event A ∪ B is the event the outcome is black or odd or both. The event A ∪ B is {2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35, 1, 3, 5, 7, 9, 19, 21, 23, 25, 27}

c.

Assuming all sample points are equally likely, each has a probability of 1/38.

P(A) = 18(1/38) = 18/38 = 9/19
P(B) = 18(1/38) = 18/38 = 9/19
P(A ∩ B) = 8(1/38) = 8/38 = 4/19
P(A ∪ B) = 28(1/38) = 28/38 = 14/19
P(C) = 18(1/38) = 18/38 = 9/19

d.

The event A ∩ B ∩ C is the event the outcome is odd and black and low. The event A ∩ B ∩ C is {11, 13, 15, 17}.

e.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 9/19 + 9/19 − 4/19 = 14/19

f.

P(A ∩ B ∩ C) = 4(1/38) = 4/38 = 2/19

g.

The event A ∪ B ∪ C is the event the outcome is odd or black or low. The event A ∪ B ∪ C is:

{1, 2, 3, ... , 29, 31, 33, 35} or {All sample points except 00, 0, 30, 32, 34, 36}

h.

P(A ∪ B ∪ C) = 32(1/38) = 32/38 = 16/19

3.34

a.

P ∩ S ∩ A

Products 6 and 7 are contained in this intersection.

b.

P(possess all the desired characteristics) = P(P ∩ S ∩ A) = P(6) + P(7) = 1/10 + 1/10 = 1/5

c.

A ∪ S

P(A ∪ S) = P(2) + P(3) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10) = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 = 8/10 = 4/5

d.

P ∩ S

P(P ∩ S) = P(2) + P(6) + P(7) = 1/10 + 1/10 + 1/10 = 3/10

3.36

First, convert the percentages in the table to probabilities by dividing the percent by 100%.

a.

P(A) = .259 + .169 + .115 = .543
P(B) = .003
P(C) = .037 + .078 + .016 + .002 + .047 + .027 = .207
P(D) = .414

b.

P(A ∩ D) = .156 + .094 + .043 = .293
P(A ∪ D) = P(A) + P(D) − P(A ∩ D) = .543 + .414 − .293 = .664

c.

Ac: {The worker is under 40}
Bc: {The worker is 20 or older or is not part-time}
Dc: {The worker is not part-time}

d.

P(Ac) = 1 − P(A) = 1 − .543 = .457
P(Bc) = 1 − P(B) = 1 − .003 = .997
P(Dc) = 1 − P(D) = 1 − .414 = .586

3.38

Define the following events:

A: {Wheelchair user had an injurious fall}
B: {Wheelchair user had all five features installed in the home}
C: {Wheelchair user had no falls}
D: {Wheelchair user had none of the features installed in the home}

a.

P(A) = 48/306 = .157

b.

P(B) = 9/306 = .029

c.

P(C ∩ D) = 89/306 = .291

3.40

There are a total of 6 × 6 × 6 = 216 possible outcomes from throwing 3 fair dice. To help demonstrate this, suppose the three dice are different colors: red, blue and green. When we roll these dice, we will record the outcome of the red die first, the blue die second, and the green die third. Thus, there are 6 possible outcomes for the first position, 6 for the second, and 6 for the third. This leads to the 216 possible outcomes.



The Grand Duke argued that the chance of getting a sum of 9 and the chance of getting a sum of 10 should be the same since the number of partitions for 9 and 10 are the same. These partitions are:

Sum of 9:    126   135   144   225   234   333
Sum of 10:   136   145   226   235   244   334

In each case, there are 6 partitions. However, if we take into account the three colors of the dice, then there are various ways to get each partition. For instance, to get a partition of 126, we could get 126, 162, 216, 261, 612, and 621 (again, think of the red die first, the blue die second, and the green die third). However, to get a partition of 333, there is only 1 way. To get a partition of 144, there are 3 ways: 144, 414, and 441. The numbers of ways to get each of the above partitions are:

   9     # ways        10     # ways
  126      6          136       6
  135      6          145       6
  144      3          226       3
  225      3          235       6
  234      6          244       3
  333      1          334       3
 Total    25         Total     27
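The counts in these tables can be checked by brute force. The following sketch lists all ordered outcomes of three dice (thinking of them as red, blue, and green, as in the solution) and tallies both the sums and the partitions; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Every ordered outcome (red, blue, green) of three fair dice.
rolls = list(product(range(1, 7), repeat=3))

# How many ordered outcomes give each sum.
sum_counts = Counter(sum(roll) for roll in rolls)

# How many ordered outcomes correspond to each partition (the sorted faces)
# among the rolls that sum to 9 or 10.
partition_counts = Counter(
    tuple(sorted(roll)) for roll in rolls if sum(roll) in (9, 10)
)

print(len(rolls))                        # 216
print(sum_counts[9], sum_counts[10])     # 25 27
for partition, ways in sorted(partition_counts.items()):
    print(partition, ways)
```

Partitions with three different faces appear 6 times, those with exactly two equal faces appear 3 times, and 333 appears once, which is where the totals of 25 and 27 come from.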

Thus, there are a total of 25 ways to get a sum of 9 and 27 ways to get a sum of 10. The chance of throwing a sum of 9 (25 chances out of 216 possibilities) is less than the chance of throwing a 10 (27 chances out of 216 possibilities).

3.42

a.

P(A ∩ B) = P(A | B)P(B) = .6(.2) = .12

b.

P(B | A) = P(A ∩ B)/P(A) = .12/.4 = .3

3.44

a.

Since A and B are mutually exclusive events, P(A ∪ B) = P(A) + P(B) = .30 + .55 = .85

b.

Since A and C are mutually exclusive events, P(A ∩ C) = 0

c.

P(A│B) = P(A ∩ B)/P(B) = 0/.55 = 0

d.

Since B and C are mutually exclusive events, P(B ∪ C) = P(B) + P(C) = .55 + .15 = .70

e.

No, B and C cannot be independent events because they are mutually exclusive events.


3.46

a.

If two fair coins are tossed, there are 4 possible outcomes or simple events. They are:

(1) HH    (2) HT    (3) TH    (4) TT

Event A contains the simple events (2), (3), and (4). Event B contains the simple events (2) and (3). A Venn diagram of this would be:

[Venn diagram: B, containing sample points 2 and 3, lies inside A, which also contains sample point 4; sample point 1 lies outside both events.]

Since the coins are fair, each of the sample points is equally likely. Each would have a probability of 1/4.

b.

P(A) = 3(1/4) = 3/4 = .75
P(B) = 2(1/4) = 2/4 = 1/2 = .5
P(A ∩ B) = P(2) + P(3) = 1/4 + 1/4 = 2/4 = 1/2 = .5

c.

P(A | B) = P(A ∩ B)/P(B) = .5/.5 = 1

P(B | A) = P(A ∩ B)/P(A) = .5/.75 = .667



3.48

The 36 possible outcomes obtained when tossing two dice are listed below:

(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

A: {(1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2), (3, 4), (3, 6), (4, 1), (4, 3), (4, 5), (5, 2), (5, 4), (5, 6), (6, 1), (6, 3), (6, 5)}

B: {(3, 6), (4, 5), (5, 4), (5, 6), (6, 3), (6, 5), (6, 6)}

A ∩ B: {(3, 6), (4, 5), (5, 4), (5, 6), (6, 3), (6, 5)}

If A and B are independent, then P(A)P(B) = P(A ∩ B).

P(A) = 18/36 = 1/2    P(B) = 7/36    P(A ∩ B) = 6/36 = 1/6

P(A)P(B) = (1/2)(7/36) = 7/72 ≠ 1/6 = P(A ∩ B). Thus, A and B are not independent.

3.50

Define the following events:

S: {cause of fatal crash is speeding}
C: {cause of fatal crash is missing a curve}

From the problem, we know P(S) = .3 and P(S ∩ C) = .12.

P(C | S) = P(C ∩ S)/P(S) = .12/.3 = .4

3.52

Define the following events:

A: {Winner is from the American League}
B: {Winner is from the National League}
C: {Winner is from the Eastern Division}
D: {Winner is from the Central Division}
E: {Winner is from the Western Division}

a.

P(C | A) = P(A ∩ C)/P(A) = (7/15)/(10/15) = 7/10 = .7

b.

P(B | D) = P(B ∩ D)/P(D) = (1/15)/(3/15) = 1/3 = .333

c.

P(D ∪ E | B) = P((D ∪ E) ∩ B)/P(B) = (2/15)/(5/15) = 2/5 = .4

3.54

Define the following events: A: {electrical switch monitors quality of power} B: {electrical switch not wired properly} From the problem, P(A) = .90 and P(B | A) = .90. P(A ∩ B) = P(B | A) P(A) = .90(.90) = .81.

3.56

Define the following events:

Ai: {ith CEO has bachelor's degree}

a.

P(A1) = 8/40 = .20

b.

If the first 4 CEOs have just a bachelor's degree, then on the next pick there are only 4 such CEOs left to choose from. Similarly, after picking 4 CEOs, there are only 36 observations left to choose from.

P(A5 | A1 ∩ A2 ∩ A3 ∩ A4) = 4/36 = .111

3.58

If A and B are independent, then P(A ∩ B) = P(A)P(B). For this exercise,

P(A) = (1385 + 786)/3934 = 2171/3934 = .552, P(B) = (1385 + 1175)/3934 = 2560/3934 = .651, and P(A ∩ B) = 1385/3934 = .352.

P(A)P(B) = .552(.651) = .359 ≠ .352 = P(A ∩ B). Thus, A and B are not independent.

3.60


The probability of a false positive is P(A | B).



3.62

First, define the following event:

A: {CVSA correctly determines the veracity of a suspect}

P(A) = .98 (from claim)

a.

The event that the CVSA is correct for all four suspects is the event A ∩ A ∩ A ∩ A.

P(A ∩ A ∩ A ∩ A) = .98(.98)(.98)(.98) = .9224

b.

The event that the CVSA is incorrect for at least one of the four suspects is the event (A ∩ A ∩ A ∩ A)c.

P[(A ∩ A ∩ A ∩ A)c] = 1 − P(A ∩ A ∩ A ∩ A) = 1 − .9224 = .0776

3.64

Define the following events:

I: {Leak ignites immediately (jet fire)}
D: {Leak has delayed ignition (flash fire)}

From the problem, P(I) = .01 and P(D | Ic) = .01

The probability of a jet fire or a flash fire = P(I ∪ D) = P(I) + P(D) − P(I ∩ D) = P(I) + P(D | Ic)P(Ic) − P(I ∩ D) = .01 + .01(1 − .01) − 0 = .01 + .0099 = .0199

A tree diagram of this problem is:

I (.01)                        P(I) = .01
Ic (.99), then D (.01)         P(Ic ∩ D) = .99(.01) = .0099
Ic (.99), then Dc (.99)        P(Ic ∩ Dc) = .99(.99) = .9801

3.66

Define the following events:

W: {Player wins the game Go}
F: {Player plays first (black stones)}

a.

P(W ∩ F) = 319/577 = .553



b.

P(W ∩ F│CA) = 34/34 = 1
P(W ∩ F│CB) = 69/79 = .873
P(W ∩ F│CC) = 66/118 = .559
P(W ∩ F│BA) = 40/54 = .741
P(W ∩ F│BB) = 52/95 = .547
P(W ∩ F│BC) = 27/79 = .342
P(W ∩ F│AA) = 15/28 = .536
P(W ∩ F│AB) = 11/51 = .216
P(W ∩ F│AC) = 3/39 = .077

c.

There are three combinations where the player with the black stones (first) is ranked higher than the player with the white stones: CA, CB, and BA. P(W ∩ F│CA ∪ CB ∪ BA) = (34 + 69 + 40)/(34 + 79 + 54) = 143/167 = .856

d.

There are three combinations where the players are of the same level: CC, BB, and AA. P(W ∩ F│CC ∪ BB ∪ AA) = (66 + 52 + 15)/(118 + 95 + 28) = 133/241 = .552

3.68

a.

Suppose the elements of the population are: 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10. The possible samples of size 2 are:

(1, 2) (1, 3) (1, 4) (1, 5) (1, 6) (1, 7) (1, 8) (1, 9) (1, 10)
(2, 3) (2, 4) (2, 5) (2, 6) (2, 7) (2, 8) (2, 9) (2, 10)
(3, 4) (3, 5) (3, 6) (3, 7) (3, 8) (3, 9) (3, 10)
(4, 5) (4, 6) (4, 7) (4, 8) (4, 9) (4, 10)
(5, 6) (5, 7) (5, 8) (5, 9) (5, 10)
(6, 7) (6, 8) (6, 9) (6, 10)
(7, 8) (7, 9) (7, 10)
(8, 9) (8, 10)
(9, 10)

Since there are N = 10 elements in the population, the number of samples of size n = 2 is a combination of 10 things taken 2 at a time or

(10 choose 2) = 10! / (2!8!) = (10 ⋅ 9 ⋅ 8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / ((2 ⋅ 1)(8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)) = 45

Therefore, there are 45 different samples of size n = 2 that can be selected from a population of N = 10.

b.


If random sampling is employed, every pair of elements has an equal probability of being selected. Therefore, the probability of drawing a particular pair is 1/45.



c.

To draw a random sample of 2 elements from 10, we will number the elements from 0 to 9. Then, starting in an arbitrary position in Table I, Appendix B, we will select two numbers by going either down a column or across a row. Suppose that we start in the third position of column 6 and row 9. We will proceed down the column. The first sample drawn will be 1 and 5. The second sample drawn will be 9 and 4. The 20 samples selected are:

Sample Number   Items Selected      Sample Number   Items Selected
      1              1, 5                 11              0, 9
      2              9, 4                 12              1, 0
      3              4, 2                 13              3, 7
      4              9, 3                 14              3, 9
      5              8, 1                 15              0, 8
      6              5, 6                 16              3, 4
      7              1, 3                 17              0, 4
      8              0, 2                 18              9, 7
      9              4, 6                 19              8, 4
     10              8, 0                 20              0, 5

There are actually two pairs of samples that match: Samples 10 and 15, and samples 4 and 14. Given the low probability of each pair occurring, it is not that likely to have two pairs of samples that match.

3.70

First, number the elements of the population from 1 to 200,000. Starting in row 10, column 1, of Table I of Appendix B and reading down, take the first ten 6-digit numbers. Eliminate any duplicates, the number 000000, and all numbers greater than 200,000. The 10 numbers selected for the random sample are: 094299 103656 071199 023682 010115 070569 024883 007425 053660 005820 Elements with the above numbers are selected for the sample.

3.72

To draw a random sample of 1,000 households from 534,322, we will number the households from 1 to 534,322. Then, starting in an arbitrary position in Table I, Appendix B, we will select 6-digit numbers by proceeding down a column. We will continue selecting numbers until we have 1,000 different 6-digit numbers, eliminating 000000 and any numbers between 534,323 and 999,999.
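The table-based procedure in Exercise 3.72 has a direct software analogue. The sketch below replaces Table I with a pseudorandom number generator (an assumption of this example, not part of the original solution) but keeps the same rules: draw 6-digit numbers, and discard 000000, numbers above 534,322, and duplicates. The function name and seed are illustrative.

```python
import random


def draw_sample(n_population, n_sample, rng):
    """Keep drawing 6-digit numbers until n_sample distinct usable ones are found."""
    chosen, seen = [], set()
    while len(chosen) < n_sample:
        candidate = rng.randrange(1_000_000)  # a number from 000000 through 999999
        # Skip 000000, numbers above the population size, and repeats.
        if candidate == 0 or candidate > n_population or candidate in seen:
            continue
        seen.add(candidate)
        chosen.append(candidate)
    return chosen


rng = random.Random(12345)  # arbitrary seed, used only so the sketch is repeatable
households = draw_sample(534_322, 1_000, rng)

print(len(households), min(households), max(households))
```

In practice, `random.sample(range(1, 534_323), 1_000)` gives an equivalent simple random sample in one call; the loop is written out only to mirror the steps described in the solution.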



3.74

a.

Give each stock in the NYSE-Composite Transactions table of the Wall Street Journal a number (1 to m). Using Table I of Appendix B, pick a starting point and read down using the same number of digits as in m until you have n different numbers between 1 and m, inclusive.

3.76

a.

P(B1 ∩ A) = P(A | B1)P(B1) = .3(.75) = .225

b.

P(B2 ∩ A) = P(A | B2)P(B2) = .5(.25) = .125

c.

P(A) = P(B1 ∩ A) + P(B2 ∩ A) = .225 + .125 = .35

d.

P(B1 | A) = P(B1 ∩ A)/P(A) = .225/.35 = .643

e.

P(B2 | A) = P(B2 ∩ A)/P(A) = .125/.35 = .357
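The five parts of Exercise 3.76 are the steps of Bayes's rule: joint probabilities, then the total probability of A, then the posterior probabilities. A minimal sketch with the exercise's numbers (the dictionary names are illustrative):

```python
# Prior probabilities P(Bi) and conditional probabilities P(A | Bi) from Exercise 3.76.
priors = {"B1": 0.75, "B2": 0.25}
likelihoods = {"B1": 0.3, "B2": 0.5}

# Parts a and b: P(Bi ∩ A) = P(A | Bi) P(Bi).
joint = {b: likelihoods[b] * priors[b] for b in priors}

# Part c: P(A) is the sum of the joint probabilities.
p_a = sum(joint.values())

# Parts d and e: P(Bi | A) = P(Bi ∩ A) / P(A).
posterior = {b: joint[b] / p_a for b in joint}

print({b: round(p, 3) for b, p in joint.items()})
print(round(p_a, 2))
print({b: round(p, 3) for b, p in posterior.items()})
```

The same three steps apply to any number of mutually exclusive and exhaustive events B1, B2, ..., Bk.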

3.78

If A is independent of B1, B2, and B3, then P(A | B1) = P(A) = .4. Then

P(B1 | A) = P(A | B1)P(B1)/P(A) = .4(.2)/.4 = .2

3.80

a.

P(E1 | error) = P(E1 ∩ error)/P(error)
= P(error | E1)P(E1) / [P(error | E1)P(E1) + P(error | E2)P(E2) + P(error | E3)P(E3)]
= .01(.30) / [.01(.30) + .03(.20) + .02(.50)] = .003/(.003 + .006 + .01) = .003/.019 = .158

b.

P(E2 | error) = P(E2 ∩ error)/P(error)
= P(error | E2)P(E2) / [P(error | E1)P(E1) + P(error | E2)P(E2) + P(error | E3)P(E3)]
= .03(.20) / [.01(.30) + .03(.20) + .02(.50)] = .006/(.003 + .006 + .01) = .006/.019 = .316

c.

P(E3 | error) = P(E3 ∩ error)/P(error)
= P(error | E3)P(E3) / [P(error | E1)P(E1) + P(error | E2)P(E2) + P(error | E3)P(E3)]
= .02(.50) / [.01(.30) + .03(.20) + .02(.50)] = .01/(.003 + .006 + .01) = .01/.019 = .526

d.

If there was a serious error, the probability that the error was made by engineer 3 is .526. This probability is higher than for any of the other engineers. Thus engineer #3 is most likely responsible for the error.

3.82

Define the following events:

D: {Defect in steel casting}
H: {NDE detects "Hit" or defect in steel casting}

From the problem, P(H | D) = .97, P(H | Dc) = .005, and P(D) = .01.

P(H) = P(H | D)P(D) + P(H | Dc)P(Dc) = .97(.01) + .005(.99) = .0097 + .00495 = .01465

P(D | H) = P(D ∩ H)/P(H) = P(H | D)P(D)/P(H) = .97(.01)/.01465 = .0097/.01465 = .6621

3.84

Define the following events:

A: {Alarm A sounds alarm}
B: {Alarm B sounds alarm}
I: {Intruder}

From the problem:

P(A | I) = .9
P(B | I) = .95
P(A | Ic) = .2
P(B | Ic) = .1
P(I) = .4

Since the two systems are operating independently of each other,

P(A ∩ B | I) = P(A | I)P(B | I) = .9(.95) = .855
P(A ∩ B ∩ I) = P(A ∩ B | I)P(I) = .855(.4) = .342
P(A ∩ B | Ic) = P(A | Ic)P(B | Ic) = .2(.1) = .02



P(A ∩ B ∩ Ic) = P(A ∩ B | Ic)P(Ic) = .02(.6) = .012

Thus, P(A ∩ B) = P(A ∩ B ∩ I) + P(A ∩ B ∩ Ic) = .342 + .012 = .354

Finally, P(I | A ∩ B) = P(A ∩ B ∩ I)/P(A ∩ B) = .342/.354 = .966

3.86

a.

The two probability rules for a sample space are that the probability for any sample point is between 0 and 1 and that the sum of the probabilities of all the sample points is 1. For this exercise, all the probabilities of the sample points are between 0 and 1 and, summing over i = 1 to 4,

∑ P(Si) = P(S1) + P(S2) + P(S3) + P(S4) = .2 + .1 + .3 + .4 = 1.0

b.

P(A) = P(S1) + P(S4) = .2 + .4 = .6

3.88

P ( A ∪ B ) = P ( A) + P( B) − P( A ∩ B) = .7 + .5 − .4 = .8

3.90

a.

If the Dow Jones Industrial Average increases, a large New York bank would tend to decrease the prime interest rate. Therefore, the two events are not mutually exclusive since they could occur simultaneously.

b.

The next sale by a PC retailer could not be both a laptop and a desktop computer. Since the two events cannot occur simultaneously, the events are mutually exclusive.

c.

Since both events cannot occur simultaneously, the events are mutually exclusive.

3.92

a.

Because events A and B are independent, we have:

P(A ∩ B) = P(A)P(B) = (.3)(.1) = .03

Thus, P(A ∩ B) ≠ 0, and the two events cannot be mutually exclusive.

b.

P(A│B) = P(A ∩ B)/P(B) = .03/.1 = .3

P(B│A) = P(A ∩ B)/P(A) = .03/.3 = .1

c.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .3 + .1 − .03 = .37

3.94

Mutually exclusive events are also dependent events since the assumption that one event occurs alters the probability of the occurrence of the other one. If we assume that one event has occurred, it is impossible for the other one to occur simultaneously since they are mutually exclusive. In other words, if A and B are mutually exclusive, P(A ∩ B) = 0. Then P(A│B) = P(A ∩ B)/P(B) = 0/P(B) = 0. Since P(A) ≠ 0, P(A│B) ≠ P(A), so A and B are dependent.



3.96

Define the following events: C: {Public school building has inadequate plumbing} D: {Public school has plans for repairing building} From the problem, we know P(C) = .25 and P(D|C) = .38. P (C ∩ D) = P ( D | C ) P(C ) = .38(.25) = .095

3.98

a.

The event {The manager was involved in the ISO 9000 registration} contains the sample points {The manager was very involved}, {The manager had moderate involvement}, and {The manager had minimal involvement}. Thus, P(A) is:

P(A) = 9/40 + 16/40 + 12/40 = 37/40 = .925

b.

The event {The length of time to achieve ISO 9000 registration was more than 2 years} contains the sample points {The length of time to achieve ISO 9000 registration was between 2.1 and 2.5 years} and {The length of time to achieve ISO 9000 registration was greater than 2.5 years}. Thus, P(B) is:

P(B) = 2/40 + 3/40 = 5/40 = .125

c.

We cannot determine if events A and B are independent from the data given because there is no way of finding P(A ∩ B). In order to find P(A ∩ B), the 40 individuals would have to be classified on both variables at the same time. In the data provided, the individuals are first classified on the first variable and then classified on the second variable.

3.100

a.

The experiment consists of selecting 159 employees and asking each to indicate how strongly he/she agreed or disagreed with the statement "I believe that management is committed to CQI." There are five sample points: "Strongly agree," "Agree," "Neither agree nor disagree," "Disagree," and "Strongly disagree."

b.

Since we have frequencies for each of the sample points, good estimates of the probabilities are the relative frequencies. To find the relative frequencies, divide all of the frequencies by the sample size of 159. The estimates of the probabilities are:

Strongly Agree    Agree    Neither Agree Nor Disagree    Disagree    Strongly Disagree
     .189          .403               .258                  .113            .038

c.

The probability that an employee agrees or strongly agrees with the statement is .189 + .403 = .592.

d.

The probability that an employee does not strongly agree with the statement is equal to the sum of all the probabilities except that for "strongly agree" = .403 + .258 + .113 + .038 = .812.

3.102

a.

There are a total of 9 × 2 = 18 sample points for this experiment. There are 9 sources of CO poisoning, and each source of poisoning has 2 possible outcomes, fatal or nonfatal. Suppose we introduce some notation to make it easier to write down the sample points. Let FI = Fire, AU = Auto exhaust, FU = Furnace, K = Kerosene or spaceheater, AP = Appliance, OG = Other gas-powered motors, FP = Fireplace, O = Other, and U = Unknown. Also, let F = Fatal and N = Nonfatal. The 18 sample points are:

FI, F    AU, F    FU, F    K, F    AP, F    OG, F    FP, F    O, F    U, F
FI, N    AU, N    FU, N    K, N    AP, N    OG, N    FP, N    O, N    U, N

b.

The set of all sample points is called the sample space.

c.

The event A is made up of the following sample points: FI, F and FI, N

Then, P(A) = P(FI, F) + P(FI, N) = 63/981 + 53/981 = 116/981 = .118

d.

The event B is made up of the following sample points: (FI, F); (AU, F); (FU, F); (K, F); (AP, F); (OG, F); (FP, F); (O, F); (U, F)

Then, P(B) = P(FI, F) + P(AU, F) + P(FU, F) + P(K, F) + P(AP, F) + P(OG, F) + P(FP, F) + P(O, F) + P(U, F) = 63/981 + 60/981 + 18/981 + 9/981 + 9/981 + 3/981 + 0/981 + 3/981 + 9/981 = 174/981 = .177

e.

The event C is made up of the following sample points: (AU, F) and (AU, N) Then, P(C) = P(AU, F) + P(AU, N) = 60/981 + 178/981 = 238/981 = .243

f.

The event D is made up of the following sample point: AU, F Then, P(D) = P(AU, F) = 60/981 = .061

g.

The event E is made up of the following sample point: FI, N Then, P(E) = P(FI, N) = 53/981 = .054

3.104

Since there are 11 individuals who are willing to serve on the panel, the number of different panels of 5 experts is a combination of 11 things taken 5 at a time or

(11 choose 5) = 11! / (5!6!) = (11 ⋅ 10 ⋅ 9 ⋅ 8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / ((5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)(6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)) = 462



3.106

The possible ways of ranking the blades are: GSW GWS

SGW SWG

WGS WSG

If the consumer had no preference but still ranked the blades, then the 6 possibilities are equally likely. Therefore, each of the 6 possibilities has a probability of 1/6 of occurring.

a.

P(Ranks G first) = P(GSW) + P(GWS) = 1/6 + 1/6 = 2/6 = 1/3

b.

P(Ranks G last) = P(SWG) + P(WSG) = 1/6 + 1/6 = 2/6 = 1/3

c.

P(Ranks G last and W second) = P(SWG) = 1/6

d.

P(WGS) = 1/6

3.108

a.

Consecutive tosses of a coin are independent events since what occurs one time would not affect the next outcome.

b.

If the individuals are randomly selected, then what one individual says should not affect what the next person says. They are independent events.

c.

The results in two consecutive at-bats are probably not independent. The player may have faced the same pitcher both times, which may affect the outcome.

d.

The amounts of gain or loss for two different stocks bought and sold on the same day are probably not independent. The market might be way up or down on a certain day so that all stocks are affected.

e.

The amounts of gain or loss for two different stocks that are bought and sold in different time periods are independent. What happens to one stock should not affect what happens to the other.

f.

The prices bid by two different development firms in response to the same building construction proposal would probably not be independent. The same variables would be present for both firms to consider in their bids (materials, labor, etc.).


3.110

a.

We will define the following events:

A: {The first activation device works properly; i.e., activates the sprinkler when it should}
B: {The second activation device works properly}

From the statement of the problem, we know P(A) = .91 and P(B) = .87.

Furthermore, since the activation devices work independently, we conclude that

P(A ∩ B) = P(A)P(B) = (.91)(.87) = .7917

Now, if a fire starts near a sprinkler head, the sprinkler will be activated if either the first activation device or the second activation device, or both, operates properly. Thus,

P(Sprinkler head will be activated) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .91 + .87 − .7917 = .9883

b.

The event that the sprinkler head will not be activated is the complement of the event that the sprinkler will be activated. Thus, P(Sprinkler head will not be activated) = 1 − P(Sprinkler head will be activated) = 1 − .9883 = .0117

c.

From part a, P(A ∩ B) = P(A)P(B) = .7917

d.

In terms of the events we have defined, we wish to determine P(A ∩ Bc) = P(A)P(Bc) (by independence) = .91(1 − .87) = .91(.13) = .1183
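
The four answers in this exercise rest only on the independence of the two devices, so they can be checked numerically. The following Python sketch is illustrative; the variable names are assumptions and do not appear in the text.

```python
# Reliabilities of the two activation devices, as given in the exercise.
p_a = 0.91
p_b = 0.87

# Independence: P(A and B) = P(A) * P(B)
p_both = p_a * p_b

# Additive rule: P(A or B) = P(A) + P(B) - P(A and B)
p_activated = p_a + p_b - p_both

# Complement rule: the sprinkler head is not activated
p_not_activated = 1 - p_activated

# First device works and second device fails
p_a_only = p_a * (1 - p_b)

print(round(p_activated, 4), round(p_not_activated, 4),
      round(p_both, 4), round(p_a_only, 4))
```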

3.112

Define the following events: S: {System shuts down} F1: {Hardware failure} F2: {Software failure} F3: {Power failure} From the Exercise, we know: P(F1) = .01, P(F2) = .05, and P(F3) = .02. Also, P(S|F1) = .73, P(S|F2) = .12, and P(S|F3) = .88.
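
With these inputs, the three posterior probabilities asked for in this exercise follow from Bayes's Rule. A minimal Python sketch is given here, assuming (as the solution does) that the three failure types are the only causes of a shutdown; the dictionary names are hypothetical.

```python
# Prior probabilities of each failure type and the conditional
# probabilities of a shutdown given that failure, from the exercise.
priors = {"hardware": 0.01, "software": 0.05, "power": 0.02}
p_shutdown_given = {"hardware": 0.73, "software": 0.12, "power": 0.88}

# Joint probabilities P(S | Fi) * P(Fi); their sum is the denominator P(S),
# under the assumption that these three failures are the only causes.
joint = {k: p_shutdown_given[k] * priors[k] for k in priors}
p_shutdown = sum(joint.values())

# Bayes's Rule: P(Fi | S) = P(S | Fi) P(Fi) / P(S)
posterior = {k: joint[k] / p_shutdown for k in joint}

for cause, prob in posterior.items():
    print(cause, round(prob, 4))
```

Because the same denominator is used for all three causes, the posterior probabilities sum to 1, which is a quick check on the arithmetic.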



The probability that the current shutdown is due to a hardware failure is:

$P(F_1 \mid S) = \frac{P(F_1 \cap S)}{P(S)} = \frac{P(S \mid F_1)P(F_1)}{P(S \mid F_1)P(F_1) + P(S \mid F_2)P(F_2) + P(S \mid F_3)P(F_3)}$

$= \frac{.73(.01)}{.73(.01) + .12(.05) + .88(.02)} = \frac{.0073}{.0073 + .006 + .0176} = \frac{.0073}{.0309} = .2362$

The probability that the current shutdown is due to a software failure is:

$P(F_2 \mid S) = \frac{P(F_2 \cap S)}{P(S)} = \frac{P(S \mid F_2)P(F_2)}{P(S \mid F_1)P(F_1) + P(S \mid F_2)P(F_2) + P(S \mid F_3)P(F_3)}$

$= \frac{.12(.05)}{.73(.01) + .12(.05) + .88(.02)} = \frac{.006}{.0073 + .006 + .0176} = \frac{.006}{.0309} = .1942$

The probability that the current shutdown is due to a power failure is:

$P(F_3 \mid S) = \frac{P(F_3 \cap S)}{P(S)} = \frac{P(S \mid F_3)P(F_3)}{P(S \mid F_1)P(F_1) + P(S \mid F_2)P(F_2) + P(S \mid F_3)P(F_3)}$

$= \frac{.88(.02)}{.73(.01) + .12(.05) + .88(.02)} = \frac{.0176}{.0073 + .006 + .0176} = \frac{.0176}{.0309} = .5696$

3.114

Define the following events:

C: {Committee judges joint acceptable}
I: {Inspector judges joint acceptable}

The sample points of this experiment are: C ∩ I, C ∩ Ic, Cc ∩ I, and Cc ∩ Ic.

a.

The probability the inspector judges the joint to be acceptable is:

P(I) = P(C ∩ I) + P(Cc ∩ I) = 101/153 + 23/153 = 124/153 = .810

The probability the committee judges the joint to be acceptable is:

P(C) = P(C ∩ I) + P(C ∩ Ic) = 101/153 + 10/153 = 111/153 = .725

b.

The probability that both the committee and the inspector judge the joint to be acceptable is:

P(C ∩ I) = 101/153 = .660

The probability that neither judges the joint to be acceptable is:

P(Cc ∩ Ic) = 19/153 = .124

c.

The probability the inspector and committee disagree is:

P(C ∩ Ic) + P(Cc ∩ I) = 10/153 + 23/153 = 33/153 = .216

The probability the inspector and committee agree is:

P(C ∩ I) + P(Cc ∩ Ic) = 101/153 + 19/153 = 120/153 = .784

3.116

a.

Define the following events:

A1: {Component 1 works properly}
A2: {Component 2 works properly}
B3: {Component 3 works properly}
B4: {Component 4 works properly}
A: {Subsystem A works properly}
B: {Subsystem B works properly}

The probability a component fails is .1, so the probability a component works properly is 1 − .1 = .9.

Subsystem A works properly if both components 1 and 2 work properly.

P(A) = P(A1 ∩ A2) = P(A1)P(A2) = .9(.9) = .81 (since the components operate independently)

Similarly, P(B) = P(B3 ∩ B4) = P(B3)P(B4) = .9(.9) = .81

The system operates properly if either subsystem A or B operates properly. The probability the system operates properly is:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A)P(B) = .81 + .81 − .81(.81) = .9639



b.

The probability exactly one subsystem fails is: P(A ∩ Bc) + P(Ac ∩ B) = P(A)P(Bc) + P(Ac)P(B) = .81(1 − .81) + (1 − .81).81 = .1539 + .1539 = .3078

c.

The probability the system fails is the probability that both subsystems fail or: P(Ac ∩ Bc) = P(Ac)P(Bc) = (1 − .81)(1 − .81) = .0361

d.

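The system operates correctly 99% of the time means it fails 1% of the time. The probability one subsystem fails is .19. Since the subsystems operate independently, the probability that all n subsystems fail is $.19^n$. Thus, we must find the smallest n such that $.19^n \le .01$. Since $.19^2 = .0361 > .01$ and $.19^3 = .0069 \le .01$, n = 3.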
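
The search for n in part d can be written as a short loop. This Python sketch is a minimal illustration that assumes, as the solution does, independent subsystems in parallel that each fail with probability .19; the variable names are hypothetical.

```python
p_subsystem_fails = 1 - 0.81   # .19, from part a
target_failure = 0.01          # the system may fail at most 1% of the time

# With n independent parallel subsystems, the system fails only if all n fail.
n = 1
while p_subsystem_fails ** n > target_failure:
    n += 1

print(n, round(p_subsystem_fails ** n, 6))
```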

3.118

Define the events:

A: {A bottle comes from machine A}
B: {A bottle comes from machine B}
R: {A bottle is rejected}

Then the given probabilities are: P(A) = .75, P(B) = .25, P(R | A) = 1/20, P(R | B) = 1/30

The proportion of rejected bottles is:

P(R) = P(A ∩ R) + P(B ∩ R) = P(R | A)P(A) + P(R | B)P(B) = (1/20)(.75) + (1/30)(.25) = .0458

The probability that a bottle comes from machine A, given that it is accepted, is:

$P(A \mid R^c) = \frac{P(A \cap R^c)}{P(R^c)} = \frac{P(R^c \mid A) \cdot P(A)}{1 - P(R)} = \frac{(19/20) \cdot (.75)}{1 - .0458} = .7467$


3.120

There are a total of 6 × 6 = 36 outcomes when rolling 2 dice. If we let the first number in the pair represent the outcome of die number 1 and the second number in the pair represent the outcome of die number 2, then the possible outcomes are: 1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

If both dice are fair, then each of these outcomes is equally likely and has a probability of 1/36.

a.

To win on the first roll, a player must roll a 7 or 11. There are 6 ways to roll a 7 and 2 ways to roll an 11. Thus the probability of winning on the first roll is:

P(7 or 11) = 8/36 = .2222

b.

To lose on the first roll, a player must roll a 2 or 3. There is 1 way to roll a 2 and 2 ways to roll a 3. Thus the probability of losing on the first roll is:

P(2 or 3) = 3/36 = .0833

c.

If a player rolls a 4 on the first roll, the game will end on the next roll if the player rolls the original roll again (player wins) or if the player rolls a seven (player loses). Now, there are 3 ways of getting a 4 on the first roll: 1,3; 2,2; or 3,1. If the first roll was 2,2, then the game would end on the next roll if the player threw a 2,2; 1,6; 2,5; 3,4; 4,3; 5,2; or 6,1 on the next roll. The probability of the game ending on the next roll would be:

P(2,2 or 7 on second toss | 2,2 on first) = 7/36 = .1944

Now, suppose the first roll ended with a 1 and a 3. Since the dice are not marked, this result could have happened two ways: 1,3 or 3,1. Regardless of how the original 1 and 3 were obtained, the player would have 2 ways of winning on the next roll: 1,3 or 3,1. For the game to end on the next roll, the player could throw 1,3; 3,1; 1,6; 2,5; 3,4; 4,3; 5,2; or 6,1. The probability of the game ending on the next roll would be:

P(1,3 or 3,1 or 7 on second toss | 1 and 3 on first) = 8/36 = .2222

Since there were 3 ways to get a 4 on the first roll, and each was equally likely, P(2,2) = 1/3 and P[1 and 3 (any order)] = 2/3.

The probability that the game ends on the second roll is

P(2,2 or 7 on second toss | 2,2 on first) P(2,2 on first) + P(1,3 or 3,1 or 7 on second toss | 1 and 3 on first) P(1 and 3 on first)

= .1944(1/3) + .2222(2/3) = .0648 + .1481 = .2129
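
The counts used in parts a through c can be reproduced by enumerating the 36 ordered outcomes. The Python sketch below is only an illustration and follows the solution's reading of part c, in which "the original roll again" means the same two faces in either order; the helper name `prob` is an assumption.

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely ordered outcomes (die 1, die 2).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on an ordered pair."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_win_first = prob(lambda o: sum(o) in (7, 11))    # part a
p_lose_first = prob(lambda o: sum(o) in (2, 3))    # part b

# Part c: condition on which ordered pair produced the initial 4.
first_rolls = [o for o in outcomes if sum(o) == 4]
p_game_ends = Fraction(0)
for first in first_rolls:
    # The game ends if the same faces reappear (either order) or a 7 is thrown.
    ends = prob(lambda o: sorted(o) == sorted(first) or sum(o) == 7)
    p_game_ends += ends * Fraction(1, len(first_rolls))

print(p_win_first, p_lose_first, p_game_ends, float(p_game_ends))
```

Exact fractions give 23/108 for part c; the hand calculation arrives at .2129 because the intermediate values .1944 and .2222 are already rounded.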

3.122

Suppose we define the following event:

E: {Error produced when dividing}

From the problem, we know that P(E) = 1 / 9,000,000,000.

The probability of no error produced when dividing is

P(Ec) = 1 − P(E) = 1 − 1 / 9,000,000,000 = 8,999,999,999 / 9,000,000,000 = .999999999 ≈ 1.0000

Suppose we want to find the probability of no errors in 2 divisions (assuming each division is independent):

P(Ec ∩ Ec) = .999999999(.999999999) = .999999999 ≈ 1.0000

Thus, in general, the probability of no errors in k divisions would be:

$P(\underbrace{E^c \cap E^c \cap \cdots \cap E^c}_{k \text{ times}}) = [P(E^c)]^k = [8{,}999{,}999{,}999 / 9{,}000{,}000{,}000]^k$

Suppose a user ran a program that performed 1 billion divisions. The probability of no errors in these 1 billion divisions would be:

$[P(E^c)]^{1{,}000{,}000{,}000} = [8{,}999{,}999{,}999 / 9{,}000{,}000{,}000]^{1{,}000{,}000{,}000} = .8948$

Thus, the probability of at least 1 error in 1 billion divisions would be

$1 - [P(E^c)]^{1{,}000{,}000{,}000} = 1 - .8948 = .1052$

For a heavy MINITAB user, this flawed chip would be a problem because the above probability is not that small.
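
Raising a number this close to 1 to the billionth power is sensitive to rounding, so it helps to work on the log scale. The Python sketch below is one way to do that; it uses the standard-library functions `math.log1p` and `math.expm1`, and the variable names are assumptions. Since ln(1 − p) ≈ −p for tiny p, the result is approximately e^(−1/9).

```python
import math

p_error = 1 / 9_000_000_000   # P(E) for a single division
k = 1_000_000_000             # number of independent divisions

# log1p(-p) evaluates ln(1 - p) accurately when p is tiny,
# so k * log1p(-p) is ln of the probability of no errors in k divisions.
log_p_no_error = k * math.log1p(-p_error)

p_no_error = math.exp(log_p_no_error)
p_at_least_one = -math.expm1(log_p_no_error)   # 1 - exp(...), computed stably

print(round(p_no_error, 4), round(p_at_least_one, 4))
```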

Chapter 4

Random Variables and Probability Distributions

4.2

a.

The closing price of a particular stock on the New York Stock Exchange is discrete. It can take on only a countable number of values.

b.

The number of shares of a particular stock that are traded on a particular day is discrete. It can take on only a countable number of values.

c.

The quarterly earnings of a particular firm is discrete. It can take on only a countable number of values.

d.

The percentage change in yearly earnings between 2005 and 2006 for a particular firm is continuous. It can take on any value in an interval.

e.

The number of new products introduced per year by a firm is discrete. It can take on only a countable number of values.

f.

The time until a pharmaceutical company gains approval from the U.S. Food and Drug Administration to market a new drug is continuous. It can take on any value in an interval of time.

4.4

The number of customers, x, waiting in line can take on values 0, 1, 2, 3, … . Even though the list is never ending, we call this list countable. Thus, the random variable is discrete.

4.6

A banker might be interested in the number of new accounts opened in a month, or the number of mortgages it currently has, both of which are discrete random variables.

4.8

The manager of a hotel might be concerned with the number of employees on duty at a specific time, or the number of vacancies there are on a certain night.

4.10

A stockbroker might be interested in the length of time until the stock market closes for the day.

4.12

a.

The variable x can take on values 1, 3, 5, 7, and 9.

b.

The value of x that has the highest probability associated with it is 5. It has a probability of .4.

c.

Using MINITAB, the probability distribution of x as a graph is:

d.

P(x = 7) = .2

e.

P(x ≥ 5) = p(5) + p(7) + p(9) = .4 + .2 + .1 = .7

f.

P(x > 2) = p(3) + p(5) + p(7) + p(9) = .2 + .4 + .2 + .1 = .9

4.14

a.

This is not a valid distribution because ∑ p(x) = .9 ≠ 1.

b.

This is a valid distribution because 0 ≤ p(x) ≤ 1 for all values of x and ∑ p(x) = 1.

c.

This is not a valid distribution because p(4) = −.3 < 0.

d.

The sum of the probabilities over all possible values of the random variable is ∑ p(x) = 1.1 > 1, so this is not a valid probability distribution.

4.16

a.

$\mu = E(x) = \sum x\,p(x)$ = 10(.05) + 20(.20) + 30(.30) + 40(.25) + 50(.10) + 60(.10) = .5 + 4 + 9 + 10 + 5 + 6 = 34.5

$\sigma^2 = E(x - \mu)^2 = \sum (x - \mu)^2 p(x)$

$= (10 - 34.5)^2(.05) + (20 - 34.5)^2(.20) + (30 - 34.5)^2(.30) + (40 - 34.5)^2(.25) + (50 - 34.5)^2(.10) + (60 - 34.5)^2(.10)$

= 30.0125 + 42.05 + 6.075 + 7.5625 + 24.025 + 65.025 = 174.75

$\sigma = \sqrt{174.75} = 13.219$

b.

c.

μ ± 2σ ⇒ 34.5 ± 2(13.219) ⇒ 34.5 ± 26.438 ⇒ (8.062, 60.938)

P(8.062 < x < 60.938) = p(10) + p(20) + p(30) + p(40) + p(50) + p(60) = .05 + .20 + .30 + .25 + .10 + .10 = 1.00

4.18

a.

It would seem that the mean of both would be 1 since they both are symmetric distributions centered at 1.

b.

The distribution of x seems more variable since there appears to be greater probability for the two extreme values of 0 and 2 than there is in the distribution of y.

c.

For x:

$\mu = E(x) = \sum x\,p(x)$ = 0(.3) + 1(.4) + 2(.3) = 0 + .4 + .6 = 1

$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (0 - 1)^2(.3) + (1 - 1)^2(.4) + (2 - 1)^2(.3)$ = .3 + 0 + .3 = .6

For y:

$\mu = E(y) = \sum y\,p(y)$ = 0(.1) + 1(.8) + 2(.1) = 0 + .8 + .2 = 1

$\sigma^2 = E[(y - \mu)^2] = \sum (y - \mu)^2 p(y) = (0 - 1)^2(.1) + (1 - 1)^2(.8) + (2 - 1)^2(.1)$ = .1 + 0 + .1 = .2

The variance for x is larger than that for y.

4.20

a.

Yes. Relative frequencies are observed values from a sample. Relative frequencies are commonly used to estimate unknown probabilities. In addition, relative frequencies have the same properties as the probabilities in a probability distribution, namely

1. all relative frequencies are greater than or equal to zero
2. the sum of all the relative frequencies is 1

b.

Using MINITAB, the graph of the probability distribution is:

[Graph: p(age) versus age]

c.

Let x = age of employee. Then

P(x > 30) = .13 + .15 + .12 = .40

P(x > 40) = 0

P(x < 30) = .02 + .04 + .05 + .07 + .04 + .02 + .07 + .02 + .11 + .07 = .51

d.

P(x = 25 or x = 26) = .02 + .07 = .09

4.22

a.

The probability distribution for x is:

Grill Display Combination    x     p(x)
1-2-3                        6     35 / 124 = .282
1-2-4                        7     8 / 124 = .065
1-2-5                        8     42 / 124 = .339
2-3-4                        9     4 / 124 = .032
2-3-5                        10    1 / 124 = .008
2-4-5                        11    34 / 124 = .274

b.

P(x > 10) = p(10) + p(11) = .008 + .274 = .282

4.24

a.

First, we must find the probability distribution of x. Define the following events:

C: {Chicken is contaminated}
N: {Chicken is not contaminated}

If 3 slaughtered chickens are randomly selected, then the possible outcomes are:

CCC, CCN, CNC, NCC, CNN, NCN, NNC, and NNN

These outcomes are NOT equally likely since P(C) = 1/100 = .01 and P(N) = 1 − P(C) = 1 − .01 = .99.

P(CCC) = P(C ∩ C ∩ C) = P(C)P(C)P(C) = .01(.01)(.01) = .000001

P(CCN) = P(CNC) = P(NCC) = P(C ∩ C ∩ N) = P(C)P(C)P(N) = .01(.01)(.99) = .000099

P(CNN) = P(NCN) = P(NNC) = P(C ∩ N ∩ N) = P(C)P(N)P(N) = .01(.99)(.99) = .009801

P(NNN) = P(N ∩ N ∩ N) = P(N)P(N)P(N) = .99(.99)(.99) = .970299

The variable x is defined as the number of contaminated chickens in the sample. The value of x for each of the outcomes is:

Event    x    p(x)
CCC      3    .000001
CCN      2    .000099
CNC      2    .000099
NCC      2    .000099
CNN      1    .009801
NCN      1    .009801
NNC      1    .009801
NNN      0    .970299

The probability distribution of x is:

x    p(x)
3    .000001
2    .000297
1    .029403
0    .970299

b.

Using MINITAB, the probability graph for x is:

[Graph: p(x) versus x]

c.

P(x ≤ 1) = P(x = 0) + P(x = 1) = .970299 + .029403 = .999702

4.26

To find the probability distribution of x, we first list the possible values of x. For this exercise, the possible values of x are −3, −1, and 5. Next, we list the number of cases, f(x), that result in the particular values of x. To find the probability distribution of x, we divide the number of cases for each value of x, f(x), by the total number of cases, 678. For x = −3, the probability is p(−3) = 68 / 678 = .100. For x = −1, the probability is p(−1) = 71 / 678 = .105. For x = 5, the probability is p(5) = 539 / 678 = .795. The probability distribution of x is:

x        f(x)    p(x)
−3       68      .100
−1       71      .105
5        539     .795
Total    678     1.000

Using MINITAB, the graph of the probability distribution is:

[Graph: p(x) versus x]

4.28

a.

$E(x) = \sum_{\text{All } x} x\,p(x)$

Firm A: E(x) = 0(.01) + 500(.01) + 1000(.01) + 1500(.02) + 2000(.35) + 2500(.30) + 3000(.25) + 3500(.02) + 4000(.01) + 4500(.01) + 5000(.01) = 0 + 5 + 10 + 30 + 700 + 750 + 750 + 70 + 40 + 45 + 50 = 2450

Firm B: E(x) = 0(.00) + 200(.01) + 700(.02) + 1200(.02) + 1700(.15) + 2200(.30) + 2700(.30) + 3200(.15) + 3700(.02) + 4200(.02) + 4700(.01) = 0 + 2 + 14 + 24 + 255 + 660 + 810 + 480 + 74 + 84 + 47 = 2450

b.

$\sigma = \sqrt{\sigma^2}$, where $\sigma^2 = \sum_{\text{All } x} (x - \mu)^2 p(x)$

Firm A: $\sigma^2 = (0 - 2450)^2(.01) + (500 - 2450)^2(.01) + \cdots + (5000 - 2450)^2(.01)$ = 60,025 + 38,025 + 21,025 + 18,050 + 70,875 + 750 + 75,625 + 22,050 + 24,025 + 42,025 + 65,025 = 437,500

σ = 661.44

Firm B: $\sigma^2 = (0 - 2450)^2(.00) + (200 - 2450)^2(.01) + \cdots + (4700 - 2450)^2(.01)$ = 0 + 50,625 + 61,250 + 31,250 + 84,375 + 18,750 + 18,750 + 84,375 + 31,250 + 61,250 + 50,625 = 492,500

σ = 701.78

Firm B faces greater risk of physical damage because it has a higher variance and standard deviation.
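
The long sums for the two firms are easy to mistype, so a short check is useful. The Python sketch below recomputes the mean, variance, and standard deviation from the two distributions used above; the function name `summarize` is an assumption made for illustration.

```python
import math

def summarize(dist):
    """Return (mean, variance, standard deviation) of a discrete
    distribution given as a list of (value, probability) pairs."""
    mu = sum(x * p for x, p in dist)
    var = sum((x - mu) ** 2 * p for x, p in dist)
    return mu, var, math.sqrt(var)

# Loss values and probabilities for each firm, as used in the solution.
firm_a = list(zip(range(0, 5001, 500),
                  [.01, .01, .01, .02, .35, .30, .25, .02, .01, .01, .01]))
firm_b = list(zip([0] + list(range(200, 4701, 500)),
                  [.00, .01, .02, .02, .15, .30, .30, .15, .02, .02, .01]))

mu_a, var_a, sd_a = summarize(firm_a)
mu_b, var_b, sd_b = summarize(firm_b)

print(round(mu_a), round(var_a), round(sd_a, 2))
print(round(mu_b), round(var_b), round(sd_b, 2))
```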




4.30

a.

If a large number of measurements are observed, then the relative frequencies should be very good estimators of the probabilities.

b.

$E(x) = \sum x\,p(x)$ = 1(.01) + 2(.04) + 3(.04) + 4(.08) + 5(.10) + 6(.15) + 7(.25) + 8(.20) + 9(.08) + 10(.05) = .01 + .08 + .12 + .32 + .50 + .90 + 1.75 + 1.60 + .72 + .50 = 6.50

The average number of checkout lanes per store is 6.5.

c.

$\sigma^2 = \sum_{\text{All } x} (x - \mu)^2 p(x) = (1 - 6.5)^2(.01) + (2 - 6.5)^2(.04) + (3 - 6.5)^2(.04) + (4 - 6.5)^2(.08) + (5 - 6.5)^2(.10) + (6 - 6.5)^2(.15) + (7 - 6.5)^2(.25) + (8 - 6.5)^2(.20) + (9 - 6.5)^2(.08) + (10 - 6.5)^2(.05)$

= .3025 + .8100 + .4900 + .5000 + .2250 + .0375 + .0625 + .4500 + .5000 + .6125 = 3.99

$\sigma = \sqrt{3.99} = 1.9975$

d.

Chebyshev's Rule says that at least 0 of the observations should fall in the interval μ ± σ.

Chebyshev's Rule says that at least 75% of the observations should fall in the interval μ ± 2σ.

e.

μ ± σ ⇒ 6.5 ± 1.9975 ⇒ (4.5025, 8.4975)

P(4.5025 ≤ x ≤ 8.4975) = .10 + .15 + .25 + .20 = .70. This is at least 0.

μ ± 2σ ⇒ 6.5 ± 2(1.9975) ⇒ 6.5 ± 3.995 ⇒ (2.505, 10.495)

P(2.505 ≤ x ≤ 10.495) = .04 + .08 + .10 + .15 + .25 + .20 + .08 + .05 = .95. This is at least .75 or 75%.

4.32

Let x = winnings in the Florida lottery. The probability distribution for x is:

x             p(x)
−$1           22,999,999/23,000,000
$6,999,999    1/23,000,000

The expected net winnings would be:

μ = E(x) = (−1)(22,999,999/23,000,000) + 6,999,999(1/23,000,000) = −$.70

The average winnings of all those who play the lottery is −$.70.


4.34

Each point in the system can have one of 2 status levels, “free” or “obstacle”. Define the following events:

AF: {Point A is free}    AO: {Point A is obstacle}
BF: {Point B is free}    BO: {Point B is obstacle}
CF: {Point C is free}    CO: {Point C is obstacle}

Thus, the sample points for the space are:

AFBFCF, AFBFCO, AFBOCF, AFBOCO, AOBFCF, AOBFCO, AOBOCF, AOBOCO

Since it is stated that the probability of any point in the system having a “free” status is .5, the probability of any point having an “obstacle” status is also .5. Thus, the probability of each of the sample points above is P(AiBiCi) = .5(.5)(.5) = .125.

The values of Y, the number of free links in the system, for each sample point are listed below. A link is free if both the points are free. Thus, a link from A to B is free if A is free and B is free. A link from B to C is free if B is free and C is free.

Sample point    Y    Probability
AFBFCF          2    .125
AFBFCO          1    .125
AFBOCF          0    .125
AFBOCO          0    .125
AOBFCF          1    .125
AOBFCO          0    .125
AOBOCF          0    .125
AOBOCO          0    .125

The probability distribution for Y is:

Y    Probability
0    .625
1    .250
2    .125
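
The table above can be generated by enumerating the 8 sample points directly. This Python sketch is an illustration only; it assumes, as the solution does, that the three points are free or obstacles independently, and the names `P_FREE` and `dist_y` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

P_FREE = Fraction(1, 2)   # probability that any one point is "free"

dist_y = Counter()
for statuses in product(("F", "O"), repeat=3):   # status of points A, B, C
    a, b, c = statuses
    # Independence: multiply the probabilities of the three point statuses.
    prob = Fraction(1)
    for s in statuses:
        prob *= P_FREE if s == "F" else 1 - P_FREE
    # A link is free only if both of its end points are free.
    y = int(a == "F" and b == "F") + int(b == "F" and c == "F")
    dist_y[y] += prob

for y in sorted(dist_y):
    print(y, float(dist_y[y]))
```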




4.36

a.

x is discrete. It can take on only six values.

b.

This is a binomial distribution.

c.

$p(0) = \binom{5}{0}(.7)^0(.3)^{5-0} = \frac{5!}{0!\,5!}(.7)^0(.3)^5 = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{1 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}(1)(.00243) = .00243$

$p(1) = \binom{5}{1}(.7)^1(.3)^{5-1} = \frac{5!}{1!\,4!}(.7)^1(.3)^4 = .02835$

$p(2) = \binom{5}{2}(.7)^2(.3)^{5-2} = \frac{5!}{2!\,3!}(.7)^2(.3)^3 = .1323$

$p(3) = \binom{5}{3}(.7)^3(.3)^{5-3} = \frac{5!}{3!\,2!}(.7)^3(.3)^2 = .3087$

$p(4) = \binom{5}{4}(.7)^4(.3)^{5-4} = \frac{5!}{4!\,1!}(.7)^4(.3)^1 = .36015$

$p(5) = \binom{5}{5}(.7)^5(.3)^{5-5} = \frac{5!}{5!\,0!}(.7)^5(.3)^0 = .16807$

d.

μ = np = 5(.7) = 3.5

$\sigma = \sqrt{npq} = \sqrt{5(.7)(.3)} = 1.0247$

e.

μ ± 2σ = 3.5 ± 2(1.0247) ⇒ (1.4506, 5.5494)
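
The binomial probabilities, mean, and standard deviation in parts c through e can be reproduced with a few lines of code. The Python sketch below uses `math.comb` from the standard library (Python 3.8 or later); the function name `binom_pmf` is an assumption made for illustration.

```python
import math

def binom_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.7
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

mu = n * p
sigma = math.sqrt(n * p * (1 - p))
interval = (mu - 2 * sigma, mu + 2 * sigma)

for x, prob in enumerate(pmf):
    print(x, round(prob, 5))
print(round(mu, 1), round(sigma, 4), tuple(round(v, 4) for v in interval))
```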



4.38

a.

$p(0) = \binom{3}{0}(.3)^0(.7)^{3-0} = \frac{3!}{0!\,3!}(.3)^0(.7)^3 = \frac{3 \cdot 2 \cdot 1}{1 \cdot 3 \cdot 2 \cdot 1}(1)(.7)^3 = .343$

$p(1) = \binom{3}{1}(.3)^1(.7)^{3-1} = \frac{3!}{1!\,2!}(.3)^1(.7)^2 = .441$

$p(2) = \binom{3}{2}(.3)^2(.7)^{3-2} = \frac{3!}{2!\,1!}(.3)^2(.7)^1 = .189$

$p(3) = \binom{3}{3}(.3)^3(.7)^{3-3} = \frac{3!}{3!\,0!}(.3)^3(.7)^0 = .027$

x    p(x)
0    .343
1    .441
2    .189
3    .027

4.40

a.

P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .167 − .046 = .121 (from Table II, Appendix B)

b.

P(x ≤ 5) = .034

c.

P(x > 1) = 1 − P(x ≤ 1) = 1 − .919 = .081

d.

P(x < 10) = P(x ≤ 9) = 0

e.

P(x ≥ 10) = 1 − P(x ≤ 9) = 1 − .002 = .998

f.

P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .206 − .069 = .137

4.42

a.

We will check the 5 characteristics of a binomial random variable.

1. The experiment consists of n = 200 identical trials.
2. There are only two possible outcomes on each trial. Let S = young adult owns a mobile phone with internet access and F = young adult does not own a mobile phone with internet access.
3. The probability of success (S) is the same from trial to trial. For each trial, p = P(S) = .20 and q = 1 − p = 1 − .20 = .80.
4. The trials are independent.
5. The binomial random variable x is the number of young adults in 200 trials that own a mobile phone with internet access.

Thus, x is a binomial random variable.

b.

From the exercise, p = .20. For any young adult, the probability that they own a mobile phone with internet access is .20.

c.

μ = E ( x) = np = 200(.20) = 40 . On the average, for every 200 young people surveyed, 40 will own mobile phones with internet access.




4.44

a.

We will check the 5 characteristics of a binomial random variable.

1. The experiment consists of n = 5 identical trials. We have to assume that the number of bottled water brands is large.
2. There are only 2 possible outcomes for each trial. Let S = brand of bottled water used tap water and F = brand of bottled water did not use tap water.
3. The probability of success (S) is the same from trial to trial. For each trial, p = P(S) = .25 and q = 1 − p = 1 − .25 = .75.
4. The trials are independent.
5. The binomial random variable x is the number of brands in the 5 trials that used tap water.

If the total number of brands of bottled water is large, then the above characteristics will be basically true. Thus, x is a binomial random variable.

b.

The formula for the probability distribution for x is $p(x) = \binom{5}{x}(.25)^x(.75)^{5-x}$, for x = 0, 1, 2, 3, 4, 5.

c.

$P(x = 2) = \binom{5}{2}(.25)^2(.75)^{5-2} = \frac{5!}{2!\,3!}(.25)^2(.75)^3 = .2637$

d.

$P(x \le 1) = P(x = 0) + P(x = 1) = \binom{5}{0}(.25)^0(.75)^{5-0} + \binom{5}{1}(.25)^1(.75)^{5-1}$

$= \frac{5!}{0!\,5!}(.25)^0(.75)^5 + \frac{5!}{1!\,4!}(.25)^1(.75)^4 = .2373 + .3955 = .6328$

4.46

a.

In order for x to be a binomial random variable, the n trials must be identical. We can assume that the process of selecting a worker is identical from trial to trial. There are two possible outcomes: a worker missed work due to a back injury or not. The probability of success must be the same from trial to trial. We can assume that the probability of missing work due to a back injury is constant. The trials must be independent of each other. We can assume that the outcome of one trial will not affect the outcome of any other. Thus, x is a binomial random variable.

b.

From the information given in the problem, the estimate of p is .40.

c.

The mean is μ = E(x) = np = 10(.40) = 4.

The standard deviation is $\sigma = \sqrt{np(1 - p)} = \sqrt{10(.40)(.60)} = \sqrt{2.4} = 1.549$

d.

Using Table II, Appendix B, with n = 10 and p = .40,

P(x = 1) = P(x ≤ 1) − P(x ≤ 0) = .046 − .006 = .040

P(x > 1) = 1 − P(x ≤ 1) = 1 − .046 = .954


4.48

Let x = number of packets observed by a network sensor in 150 trials. Then x has an approximate binomial distribution with n = 150 and p = .001. The virus will be detected if at least 1 packet is observed.

$P(x \ge 1) = 1 - P(x = 0) = 1 - \binom{150}{0}(.001)^0(.999)^{150-0} = 1 - \frac{150!}{0!\,150!}(.999)^{150} = 1 - .8606 = .1394$

4.50

a.

We must assume that the trials are identical, the probability of success is constant from trial to trial, and the trials are independent of each other.

b.

From the problem, we estimate p to be .20. Using Table II, Appendix B, with n = 25 and p = .20,

P(x ≤ 10) = .994

c.

E(x) = np = 25(.20) = 5

$\sigma = \sqrt{np(1 - p)} = \sqrt{25(.20)(.80)} = \sqrt{4} = 2$

d.

μ ± 2σ ⇒ 5 ± 2(2) ⇒ 5 ± 4 ⇒ (1, 9)

e.

Using Table II, Appendix B, with n = 25 and p = .20,

P(1 < x < 9) = P(x ≤ 8) − P(x ≤ 1) = .953 − .027 = .926

4.52

Assuming the supplier's claim is true,

μ = np = 500(.001) = .5

$\sigma = \sqrt{npq} = \sqrt{500(.001)(.999)} = \sqrt{.4995} = .707$

If the supplier's claim is true, we would only expect to find .5 defective switches in a sample of size 500. Therefore, it is not likely we would find 4. Based on the sample, the guarantee is probably inaccurate.

Note: $z = \frac{x - \mu}{\sigma} = \frac{4 - .5}{.707} = 4.95$

This is an unusually large z-score.

4.54

a.

For this test, x is a binomial random variable with n = 20 and p = .10. Using Table II, Appendix B, with n = 20 and p = .10,

P(x ≤ 1) = .392


b.

For the experiment in part a, the level of confidence is 1 − P(x ≤ 1) = 1 − .392 = .608. Since this value is not close to 1, this would not be an acceptable level.
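
The level of confidence used in this exercise, 1 − P(x ≤ K), can also be computed directly from the binomial formula rather than from Table II. The Python sketch below is a minimal illustration; the function names `binom_cdf`, `confidence`, and `smallest_n` are assumptions, and the search simply increases n until the target level is reached.

```python
import math

def binom_cdf(k, n, p):
    """P(x <= k) for a binomial random variable with parameters n and p."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(k + 1))

def confidence(n, k, p=0.10):
    """Level of confidence as defined in this exercise: 1 - P(x <= k)."""
    return 1 - binom_cdf(k, n, p)

def smallest_n(k, target=0.95, p=0.10):
    """Smallest sample size whose level of confidence reaches the target."""
    n = k + 1
    while confidence(n, k, p) < target:
        n += 1
    return n

print(round(confidence(20, 1), 3))     # n = 20, K = 1
print(smallest_n(0), smallest_n(1))    # sample sizes for K = 0 and K = 1
```

Because P(x ≤ K) decreases as n grows for fixed K and p, the first n that meets the target is the smallest one.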

c.

Suppose we increased n from 20 to 25. Using Table II, Appendix B, with n = 25 and p = .10,

P(x ≤ 1) = .271. This value is smaller than the value found in part a.

Now, suppose we keep n = 20, but change K to 0 instead of 1. Using Table II, Appendix B, with n = 20 and p = .10,

P(x ≤ 0) = .122. This value is, again, smaller than the value found in part a.

d.

Suppose we let K = 0. Now, we need to find n such that the level of confidence ≥ .95, which means that P(x = 0) ≤ .05.

$P(x = 0) = \binom{n}{0}(.1)^0(.9)^{n-0} \le .05 \Rightarrow \frac{n!}{0!\,n!}(.9)^n \le .05 \Rightarrow (.9)^n \le .05$

$\Rightarrow \ln(.9^n) \le \ln(.05) \Rightarrow n\ln(.9) \le \ln(.05) \Rightarrow n \ge \frac{\ln(.05)}{\ln(.9)} = \frac{-2.99573}{-.10536} = 28.4$

The inequality reverses in the last step because ln(.9) is negative. Thus, if K = 0, then we need a sample size of 29 to get a level of confidence of at least .95.

Now, suppose K = 1. Now, we need to find n such that the level of confidence is at least .95, which means that P(x ≤ 1) ≤ .05.

$P(x \le 1) = P(x = 0) + P(x = 1) = \binom{n}{0}(.1)^0(.9)^{n-0} + \binom{n}{1}(.1)^1(.9)^{n-1} \le .05$

$\Rightarrow \frac{n!}{0!\,n!}(.9)^n + \frac{n!}{1!\,(n-1)!}(.1)^1(.9)^{n-1} \le .05 \Rightarrow (.9)^n + n(.1)(.9)^{n-1} \le .05 \Rightarrow (.9)^{n-1}(.9 + .1n) \le .05$

From here, we will use trial and error.

n     $(.9)^{n-1}(.9 + .1n)$
30    $(.9)^{30-1}(.9 + .1(30)) = .1837$
40    $(.9)^{40-1}(.9 + .1(40)) = .0805$
45    $(.9)^{45-1}(.9 + .1(45)) = .0524$
46    $(.9)^{46-1}(.9 + .1(46)) = .0480$

Thus, for K = 1, we would need a sample size of 46 to get a level of confidence of at least .95.

4.56

μ = λ = 1.5

Using Table III of Appendix B:

a.

P(x ≤ 3) = .934

b.

P(x ≥ 3) = 1 − P(x ≤ 2) = 1 − .809 = .191

c.

P(x = 3) = P(x ≤ 3) − P(x ≤ 2) = .934 − .809 = .125

d.

P(x = 0) = .223

e.

P(x > 0) = 1 − P(x = 0) = 1 − .223 = .777

f.

P(x > 6) = 1 − P(x ≤ 6) = 1 − .999 = .001

4.58

a.

To graph the Poisson probability distribution with λ = 5, we need to calculate p(x) for x = 0 to 15. Using Table III, Appendix B,

p(0) = .007
p(1) = P(x ≤ 1) − P(x ≤ 0) = .040 − .007 = .033
p(2) = P(x ≤ 2) − P(x ≤ 1) = .125 − .040 = .085
p(3) = P(x ≤ 3) − P(x ≤ 2) = .265 − .125 = .140
p(4) = P(x ≤ 4) − P(x ≤ 3) = .440 − .265 = .175
p(5) = P(x ≤ 5) − P(x ≤ 4) = .616 − .440 = .176
p(6) = P(x ≤ 6) − P(x ≤ 5) = .762 − .616 = .146
p(7) = P(x ≤ 7) − P(x ≤ 6) = .867 − .762 = .105
p(8) = P(x ≤ 8) − P(x ≤ 7) = .932 − .867 = .065
p(9) = P(x ≤ 9) − P(x ≤ 8) = .968 − .932 = .036
p(10) = P(x ≤ 10) − P(x ≤ 9) = .986 − .968 = .018
p(11) = P(x ≤ 11) − P(x ≤ 10) = .995 − .986 = .009
p(12) = P(x ≤ 12) − P(x ≤ 11) = .998 − .995 = .003
p(13) = P(x ≤ 13) − P(x ≤ 12) = .999 − .998 = .001
p(14) = P(x ≤ 14) − P(x ≤ 13) = 1.000 − .999 = .001
p(15) = P(x ≤ 15) − P(x ≤ 14) = 1.000 − 1.000 = .000

The graph is shown at right:

b.

μ = λ = 5

$\sigma = \sqrt{\lambda} = \sqrt{5} = 2.2361$

μ ± 2σ ⇒ 5 ± 2(2.2361) ⇒ 5 ± 4.4722 ⇒ (.5278, 9.4722)

c.

P(.5278 < x < 9.4722) = P(1 ≤ x ≤ 9) = P(x ≤ 9) − P(x = 0) = .968 − .007 = .961

4.60

a.

E(x) = μ = λ = 6

$\sigma = \sqrt{\lambda} = \sqrt{6} = 2.449$

b.

$z = \frac{x - \mu}{\sigma} = \frac{1 - 6}{2.449} = -2.041$

c.

Using Table III, Appendix B, with λ = 6,

P(x ≤ 10) = .957

4.62

a.

In the problem, it is stated that E(x) = .03. This is also the value of λ.

σ² = λ = .03

b.

The experiment consists of counting the number of deaths or missing persons in a three-year interval. We must assume that the probability of a death or missing person in a three-year period is the same for any three-year period. We must also assume that the number of deaths or missing persons in any three-year period is independent of the number of deaths or missing persons in any other three-year period.

c.

$P(x = 1) = \frac{\lambda^1 e^{-\lambda}}{1!} = \frac{.03^1 e^{-.03}}{1!} = .0291$

$P(x = 0) = \frac{\lambda^0 e^{-\lambda}}{0!} = \frac{.03^0 e^{-.03}}{0!} = .9704$

4.64

a.

Using Table III and λ = 6.2,

P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .054 − .015 = .039

P(x = 6) = P(x ≤ 6) − P(x ≤ 5) = .574 − .414 = .160

P(x = 10) = P(x ≤ 10) − P(x ≤ 9) = .949 − .902 = .047

b.

The plot of the distribution is:

c.

μ = λ = 6.2, $\sigma = \sqrt{\lambda} = \sqrt{6.2} = 2.490$

μ ± σ ⇒ 6.2 ± 2.49 ⇒ (3.71, 8.69)

μ ± 2σ ⇒ 6.2 ± 2(2.49) ⇒ 6.2 ± 4.98 ⇒ (1.22, 11.18)

μ ± 3σ ⇒ 6.2 ± 3(2.49) ⇒ 6.2 ± 7.47 ⇒ (−1.27, 13.67)

See the plot in part b.

d.

First, we need to find the mean number of customers per hour. If the mean number of customers per 10 minutes is 6.2, then the mean number of customers per hour is 6.2(6) = 37.2 = λ.

μ = λ = 37.2 and $\sigma = \sqrt{\lambda} = \sqrt{37.2} = 6.099$

μ ± 3σ ⇒ 37.2 ± 3(6.099) ⇒ 37.2 ± 18.297 ⇒ (18.903, 55.497)

Using Chebyshev's Rule, we know at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean. The number 75 is way beyond the 3 standard deviation limit. Thus, it would be very unlikely that more than 75 customers entered the store per hour on Saturdays.
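
The table look-ups and the 3-standard-deviation argument in this exercise can be cross-checked from the Poisson formula p(x) = λ^x e^(−λ) / x!. The Python sketch below is illustrative only; the function and variable names are assumptions, and the exact tail probability it computes goes beyond the Chebyshev bound used in the solution.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson random variable with mean lam."""
    return sum(poisson_pmf(x, lam) for x in range(k + 1))

lam_10min = 6.2
p2, p6, p10 = (poisson_pmf(x, lam_10min) for x in (2, 6, 10))

lam_hour = 6 * lam_10min                 # six 10-minute periods per hour
sigma_hour = math.sqrt(lam_hour)
upper_3sd = lam_hour + 3 * sigma_hour    # upper end of the 3-sigma interval
p_more_than_75 = 1 - poisson_cdf(75, lam_hour)

print(round(p2, 3), round(p6, 3), round(p10, 3))
print(round(upper_3sd, 3), p_more_than_75)
```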




4.66

Let x = number of minor flaws in one square foot of a door's surface. Then x has a Poisson distribution with λ = .5.

μ = λ = .5. Using Table III, Appendix B:

P(fail inspection) = P(2 or more minor flaws in the square foot inspected) = P(x ≥ 2) = 1 − P(x ≤ 1) = 1 − .910 = .090

P(pass inspection) = P(x < 2) = P(x ≤ 1) = .910

4.68

If it takes exactly 5 minutes to wash a car and there are 5 cars in line, it will take 5(5) = 25 minutes to wash these 5 cars. Thus, for anyone to be in line at closing time, more than 1 car must arrive in the final ½ hour. In addition, if on average 10 cars arrive per hour, then an average of 5 cars will arrive per ½ hour (30 minutes). If we let x = number of cars to arrive in ½ hour, then x is a Poisson random variable with λ = 5. P(x > 1) = 1 – P(x ≤ 1) = 1 − .04 = .96 (Using Table III, Appendix B)

Since this probability is so big, it is very likely that someone will be in line at closing time.

4.70

From Exercise 4.69, f(x) = .04 for 20 ≤ x ≤ 45, and f(x) = 0 otherwise.

a.

P(20 ≤ x ≤ 30) = (30 − 20)(.04) = .4

b.

P(20 < x < 30) = (30 − 20)(.04) = .4

c.

P(x ≥ 30) = (45 − 30)(.04) = .6

d.

P(x ≥ 45) = (45 − 45)(.04) = 0

e.

P(x ≤ 40) = (40 − 20)(.04) = .8

f.

P(x < 40) = (40 − 20)(.04) = .8

g.

P(15 ≤ x ≤ 35) = (35 − 20)(.04) = .6

h.

P(21.5 ≤ x ≤ 31.5) = (31.5 − 21.5)(.04) = .4

4.72

From Exercise 4.71, f(x) = 1/4 for 3 ≤ x ≤ 7, and f(x) = 0 otherwise.

a.

P(x ≥ a) = .6 ⇒ (7 − a)(1/4) = .6 ⇒ 7 − a = 2.4 ⇒ a = 4.6


b.

P(x ≤ a) = .25 ⇒ (a − 3)(1/4) = .25 ⇒ a − 3 = 1 ⇒ a = 4

c.

P(x ≤ a) = 1 ⇒ (a − 3)(1/4) = 1 ⇒ a − 3 = 4 ⇒ a = 7
For any value of a ≥ 7, P(x ≤ a) = 1. Thus, a ≥ 7.

d.

P(4 ≤ x ≤ a) = .5 ⇒ (a − 4)(1/4) = .5 ⇒ a − 4 = 2 ⇒ a = 6

4.74

μ = (c + d)/2 = 10 ⇒ c + d = 20 ⇒ c = 20 − d

σ = (d − c)/√12 = 1 ⇒ d − c = √12

Substituting, d − (20 − d) = √12 ⇒ 2d − 20 = √12 ⇒ 2d = 20 + √12 ⇒ d = (20 + √12)/2 ⇒ d = 11.732

Since c + d = 20 ⇒ c + 11.732 = 20 ⇒ c = 8.268

f(x) = 1/(d − c) (c ≤ x ≤ d)

1/(d − c) = 1/(11.732 − 8.268) = 1/3.464 = .289

Therefore, f(x) = .289 for 8.268 ≤ x ≤ 11.732, and f(x) = 0 otherwise.

The graph of the probability distribution for x is given here.
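The substitution in Exercise 4.74 can also be checked numerically. The sketch below assumes only the two uniform-distribution formulas used above, μ = (c + d)/2 and σ = (d − c)/√12; the function name is hypothetical.

```python
import math

def uniform_endpoints(mu, sigma):
    """Solve mu = (c + d)/2 and sigma = (d - c)/sqrt(12) for c and d."""
    half_width = sigma * math.sqrt(12) / 2   # (d - c)/2
    return mu - half_width, mu + half_width

c, d = uniform_endpoints(10, 1)
height = 1 / (d - c)                         # value of f(x) on [c, d]
print(round(c, 3), round(d, 3), round(height, 3))
```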




4.76

a.

For this problem, c = 0 and d = 1.

f(x) = 1/(d − c) = 1/(1 − 0) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

μ = (c + d)/2 = (0 + 1)/2 = .5

σ² = (d − c)²/12 = (1 − 0)²/12 = 1/12 = .0833

b.

P(.2 < x < .4) = (.4 − .2)(1) = .2

c.

P(x > .995) = (1 − .995)(1) = .005. Since the probability of observing a trajectory greater than .995 is so small, we would not expect to see a trajectory exceeding .995.

4.78

a.

For layer 2, let x = amount of loss. Since the amount of loss is random between .01 and .05 million dollars, the uniform distribution for x is:

f(x) = 1/(d − c) (c ≤ x ≤ d)

1/(d − c) = 1/(.05 − .01) = 1/.04 = 25

Therefore, f(x) = 25 for .01 ≤ x ≤ .05, and f(x) = 0 otherwise.

A graph of the distribution looks like the following:

μ = (c + d)/2 = (.01 + .05)/2 = .03

σ = (d − c)/√12 = (.05 − .01)/√12 = .0115, σ² = (.0115)² = .00013

The mean loss for layer 2 is .03 million dollars and the variance of the loss for layer 2 is .00013 million dollars squared.



b.

For layer 6, let x = amount of loss. Since the amount of loss is random between .50 and 1.00 million dollars, the uniform distribution for x is:

f(x) = 1/(d − c) (c ≤ x ≤ d)

1/(d − c) = 1/(1.00 − .50) = 1/.50 = 2

Therefore, f(x) = 2 for .50 ≤ x ≤ 1.00, and f(x) = 0 otherwise.

A graph of the distribution looks like the following:

μ = (c + d)/2 = (.50 + 1.00)/2 = .75

σ = (d − c)/√12 = (1.00 − .50)/√12 = .1443, σ² = (.1443)² = .0208

The mean loss for layer 6 is .75 million dollars and the variance of the loss for layer 6 is .0208 million dollars squared.

c.

A loss of $10,000 corresponds to x = .01. P(x > .01) = 1

A loss of $25,000 corresponds to x = .025.
P(x < .025) = (Base)(Height) = (x − c)[1/(d − c)] = (.025 − .01)[1/(.05 − .01)] = .015(25) = .375




d.

A loss of $750,000 corresponds to x = .75. A loss of $1,000,000 corresponds to x = 1.
P(.75 < x < 1) = (Base)(Height) = (d − x)[1/(d − c)] = (1.00 − .75)[1/(1.00 − .50)] = .25(2) = .5

A loss of $900,000 corresponds to x = .90.
P(x > .9) = (Base)(Height) = (d − x)[1/(d − c)] = (1.00 − .90)[1/(1.00 − .50)] = .10(2) = .20

P(x = .9) = 0
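The (Base)(Height) calculations for the uniform distribution in Exercises 4.70 through 4.78 all follow one pattern, which can be sketched as below. The helper names are illustrative assumptions, and the interval is clipped to [c, d] so that cases such as P(15 ≤ x ≤ 35) in Exercise 4.70 are handled.

```python
def uniform_mean(c, d):
    """Mean of a uniform distribution on [c, d]."""
    return (c + d) / 2

def uniform_variance(c, d):
    """Variance of a uniform distribution on [c, d]."""
    return (d - c) ** 2 / 12

def uniform_prob(a, b, c, d):
    """P(a < x < b) for x uniform on [c, d]: base times height 1/(d - c)."""
    lo, hi = max(a, c), min(b, d)        # clip the interval to [c, d]
    return max(hi - lo, 0.0) / (d - c)

# Layer 2 (c = .01, d = .05) and layer 6 (c = .50, d = 1.00), in millions
print(uniform_mean(0.01, 0.05), uniform_variance(0.01, 0.05))
print(uniform_prob(0.01, 0.025, 0.01, 0.05))   # part c
print(uniform_prob(0.75, 1.00, 0.50, 1.00))    # part d
```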

4.80

Let x = cycle availability, where x has a uniform distribution on the interval from 0 to 1.

Mean = μ = (c + d)/2 = (0 + 1)/2 = .5

Standard deviation = σ = (d − c)/√12 = (1 − 0)/√12 = .289

The 10th percentile is that value of x such that 10% of all observations are below it. Let K1 = 10th percentile. P(x ≤ K1) = (K1 − 0)[1/(1 − 0)] = K1 = .10

The lower quartile is that value of x such that 25% of all observations are below it. Let K2 = 25th percentile. P(x ≤ K2) = (K2 − 0)[1/(1 − 0)] = K2 = .25

The upper quartile is that value of x such that 75% of all observations are below it. Let K3 = 75th percentile. P(x ≤ K3) = (K3 − 0)[1/(1 − 0)] = K3 = .75

4.82

a.

b.

μ = (c + d)/2 = (0 + 1)/2 = .5

σ = (d − c)/√12 = (1 − 0)/√12 = .289

σ² = (.289)² = .083

c.

P(p > .95) = (1 − .95)(1) = .05 P(p < .95) = (.95 − 0)(1) = .95

d.

The analyst should use a uniform probability distribution with c = .90 and d = .95.

f(p) = 1/(d − c) = 1/(.95 − .90) = 1/.05 = 20 for .90 ≤ p ≤ .95, and f(p) = 0 otherwise.

4.84

Table IV in the text gives the area between z = 0 and z = z0. In this exercise, the answers may thus be read directly from the table by looking up the appropriate z.

a.

P(0 < z < 2.0) = .4772

b.

P(0 < z < 3.0) = .4987

c.

P(0 < z < 1.5) = .4332

d.

P(0 < z < .80) = .2881

4.86

a.

P(−1 ≤ z ≤ 1) = A1 + A2 = .3413 + .3413 = .6826

b.

P(−2 ≤ z ≤ 2) = A1 + A2 = .4772 + .4772 = .9544

c.

P(−2.16 < z ≤ 0.55) = A1 + A2 = .4846 + .2088 = .6934


d.

P(−.42 < z < 1.96) = P(−.42 ≤ z ≤ 0) + P(0 ≤ z ≤ 1.96) = A1 + A2 = .1628 + .4750 = .6378

e.

P(z ≥ −2.33) = P(−2.33 ≤ z ≤ 0) + P(z ≥ 0) = A1 + A2 = .4901 + .5000 = .9901

f.

P(z < 2.33) = P(z ≤ 0) + P(0 ≤ z ≤ 2.33) = A1 + A2 = .5000 + .4901 = .9901

4.88

a.

P(z = 1) = 0, since a single point does not have an area.

b.

P(z ≤ 1) = P(z ≤ 0) + P(0 < z ≤ 1) = A1 + A2 = .5 + .3413 = .8413 (Table IV, Appendix B)

c.

P(z < 1) = P(z ≤ 1) = .8413 (Refer to part b.)

d.

P(z > 1) = 1 − P(z ≤ 1) = 1 − .8413 = .1587 (Refer to part b.)

4.90

Using Table IV, Appendix B:

a.

P(z ≥ z0) = .05 A1 = .5 − .05 = .4500 Looking up the area .4500 in Table IV gives z0 = 1.645.

b.


c.

P(z ≤ z0) = .025 A1 = .5 − .025 = .4750 Looking up the area .4750 in Table IV gives z = 1.96. Since z0 is to the left of 0, z0 = −1.96.

d.

e.

P(z > z0) = .10
A1 = .5 − .1 = .4
z0 = 1.28 (same as in d)

4.92

a.

z = 1

b.

z = −1

c.

z = 0

d.

z = −2.5

e.

z = 3

4.94

Using Table IV of Appendix B:

a.

To find the probability that x assumes a value more than 2 standard deviations from μ: P(x < μ − 2σ) + P(x > μ + 2σ) = P(z < −2) + P(z > 2) = 2P(z > 2) = 2(.5000 − .4772) = 2(.0228) = .0456 To find the probability that x assumes a value more than 3 standard deviations from μ: P(x < μ − 3σ) + P(x > μ + 3σ) = P(z < −3) + P(z > 3) = 2P(z > 3) = 2(.5000 − .4987) = 2(.0013) = .0026

b.

To find the probability that x assumes a value within 1 standard deviation of its mean: P(μ − σ < x < μ + σ) = P(−1 < z < 1) = 2P(0 < z < 1) = 2(.3413) = .6826




To find the probability that x assumes a value within 2 standard deviations of μ: P(μ − 2σ < x < μ + 2σ) = P(−2 < z < 2) = 2P(0 < z < 2) = 2(.4772) = .9544 c.

To find the value of x that represents the 80th percentile, we must first find the value of z that corresponds to the 80th percentile. P(z < z0) = .80. Thus, A1 + A2 = .80. Since A1 = .50, A2 = .80 − .50 = .30. Using the body of Table IV, z0 = .84. To find x, we substitute the values into the z-score formula:

z = (x − μ)/σ ⇒ .84 = (x − 1000)/10 ⇒ x = .84(10) + 1000 = 1008.4

To find the value of x that represents the 10th percentile, we must first find the value of z that corresponds to the 10th percentile.

P(z < z0) = .10. Thus, A1 = .50 − .10 = .40. Using the body of Table IV, z0 = −1.28. To find x, we substitute the values into the z-score formula:

z = (x − μ)/σ ⇒ −1.28 = (x − 1000)/10 ⇒ x = −1.28(10) + 1000 = 987.2

4.96

The random variable x has a normal distribution with μ = 50 and σ = 3.

a.

P(x ≤ x0) = .8413
So, A1 + A2 = .8413. Since A1 = .5, A2 = .8413 − .5 = .3413. Look up the area .3413 in the body of Table IV, Appendix B; z0 = 1.0.

To find x0, substitute all the values into the z-score formula:

z = (x − μ)/σ ⇒ 1.0 = (x0 − 50)/3 ⇒ x0 = 50 + 3(1.0) = 53

b.

P(x > x0) = .025
So, A = .5000 − .025 = .4750. Look up the area .4750 in the body of Table IV, Appendix B; z0 = 1.96. To find x0, substitute all the values into the z-score formula:

z = (x − μ)/σ ⇒ 1.96 = (x0 − 50)/3 ⇒ x0 = 50 + 3(1.96) = 55.88

c.

P(x > x0) = .95
So, A1 + A2 = .95. Since A2 = .5, A1 = .95 − .5 = .4500. Look up the area .4500 in the body of Table IV, Appendix B (since it is exactly between two values, average the z-scores); z0 ≈ −1.645. To find x0, substitute into the z-score formula:

z = (x − μ)/σ ⇒ −1.645 = (x0 − 50)/3 ⇒ x0 = 50 − 3(1.645) = 45.065

d.

P(41 ≤ x < x0) = .8630

z = (x − μ)/σ = (41 − 50)/3 = −3

A1 = P(41 ≤ x ≤ μ) = P(−3 ≤ z ≤ 0) = P(0 ≤ z ≤ 3) = .4987
A1 + A2 = .8630. Since A1 = .4987, A2 = .8630 − .4987 = .3643. Look up .3643 in the body of Table IV, Appendix B; z0 = 1.1.

To find x0, substitute into the z-score formula:

z = (x − μ)/σ ⇒ 1.1 = (x0 − 50)/3 ⇒ x0 = 50 + 3(1.1) = 53.3

e.

P(x < x0) = .10
So, A = .5000 − .10 = .4000. Look up area .4000 in the body of Table IV, Appendix B; z0 = 1.28. Since z0 is to the left of 0, z0 = −1.28. To find x0, substitute all the values into the z-score formula:

z = (x − μ)/σ ⇒ −1.28 = (x0 − 50)/3 ⇒ x0 = 50 − 1.28(3) = 46.16

f.

P(x > x0) = .01
So, A = .5000 − .01 = .4900. Look up area .4900 in the body of Table IV, Appendix B; z0 = 2.33. To find x0, substitute all the values into the z-score formula:

z = (x − μ)/σ ⇒ 2.33 = (x0 − 50)/3 ⇒ x0 = 50 + 2.33(3) = 56.99
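The table lookups in Exercise 4.96 amount to inverting the normal cumulative distribution. The sketch below is a check, not the manual's method: it uses Python's standard-library `statistics.NormalDist`, and the two helper names are assumptions. The results differ slightly from the hand solutions because Table IV rounds z to two decimals.

```python
from statistics import NormalDist

def value_with_area_below(p, mu, sigma):
    """Return x0 such that P(x < x0) = p for a normal(mu, sigma) variable."""
    return NormalDist(mu, sigma).inv_cdf(p)

def value_with_area_above(p, mu, sigma):
    """Return x0 such that P(x > x0) = p for a normal(mu, sigma) variable."""
    return NormalDist(mu, sigma).inv_cdf(1 - p)

mu, sigma = 50, 3
print(value_with_area_below(0.8413, mu, sigma))  # part a, close to 53
print(value_with_area_above(0.025, mu, sigma))   # part b, close to 55.88
print(value_with_area_above(0.95, mu, sigma))    # part c, close to 45.065
print(value_with_area_below(0.10, mu, sigma))    # part e, close to 46.16
print(value_with_area_above(0.01, mu, sigma))    # part f, close to 56.99
```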

4.98

a.

Using Table IV, Appendix B,

P(x > 0) = P(z > (0 − 5.26)/10) = P(z > −0.526) = .5 + P(−0.53 < z < 0) = .5 + .2019 = .7019

b.


15 − 5.26 ⎞ ⎛ 5 − 5.26


c.

P(x < 1) = P(z < (1 − 5.26)/10) = P(z < −0.426) = .5 − P(−0.43 < z < 0) = .5 − .1664 = .3336

d.

P(x ≤ −25) = P(z ≤ (−25 − 5.26)/10) = P(z ≤ −3.026) = .5 − P(−3.03 ≤ z < 0) = .5 − .4988 = .0012

Since the probability of seeing a win percentage of −25% or anything more unusual is so small (p = .0012), we would conclude that the average casino win percentage is not 5.26%.
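The normal probabilities in Exercise 4.98 (μ = 5.26, σ = 10) can be reproduced with the sketch below. It rounds z to two decimals before evaluating the cumulative distribution, to mimic a Table IV lookup; that rounding step and the helper name are assumptions made for this illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # mean 0, standard deviation 1

def prob_below(x, mu, sigma, table_rounding=True):
    """P(X < x) for a normal(mu, sigma) variable via the z-score."""
    z = (x - mu) / sigma
    if table_rounding:
        z = round(z, 2)     # Table IV lists z to two decimal places
    return STD_NORMAL.cdf(z)

mu, sigma = 5.26, 10
print(round(1 - prob_below(0, mu, sigma), 4))   # part a
print(round(prob_below(1, mu, sigma), 4))       # part c
print(round(prob_below(-25, mu, sigma), 4))     # part d
```

Setting `table_rounding=False` gives the probabilities for the unrounded z-scores, which differ from the table-based answers only in the third decimal place.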

4.100

Let x = driver’s head injury rating. The random variable x has a normal distribution with μ = 605 and σ = 185. Using Table IV, Appendix B, a.

b.

700 − 605 ⎞ ⎛ 500 − 605 P (500 < x < 700) = P ⎜
= P ( −1.11 < z < 0) − P ( −0.57 < z < 0) = .3665 − .2157 = .1508 c.

P(x < 850) = P(z < (850 − 605)/185) = P(z < 1.32) = .5 + P(0 < z < 1.32) = .5 + .4066 = .9066

d.

P(x > 1,000) = P(z > (1,000 − 605)/185) = P(z > 2.14) = .5 − P(0 < z < 2.14) = .5 − .4838 = .0162

4.102

a.

Let x = crop yield. The random variable x has a normal distribution with μ = 1,500 and σ = 250.

P(x < 1,600) = P(z < (1,600 − 1,500)/250) = P(z < .4) = .5 + .1554 = .6554
(Using Table IV)




b.

Let x1 = crop yield in first year and x2 = crop yield in second year. If x1 and x2 are independent, then the probability that the farm will lose money for two straight years is:

P(x1 < 1,600) P(x2 < 1,600) = P(z1 < (1,600 − 1,500)/250) P(z2 < (1,600 − 1,500)/250)
= P(z1 < .4) P(z2 < .4) = (.5 + .1554)(.5 + .1554) = .6554(.6554) = .4295
(Using Table IV)

c.

P(1,500 − 2σ ≤ x ≤ 1,500 + 2σ) = P(([1,500 − 2σ] − 1,500)/σ ≤ z ≤ ([1,500 + 2σ] − 1,500)/σ)
= P(−2 ≤ z ≤ 2) = 2P(0 ≤ z ≤ 2) = 2(.4772) = .9544
(Using Table IV)

4.104

Let x = wage rate. The random variable x is normally distributed with μ = 16 and σ = 1.25. Using Table IV, Appendix B,

a.

P(x > 17.30) = P(z > (17.30 − 16)/1.25) = P(z > 1.04) = .5 − P(0 < z < 1.04) = .5 − .3508 = .1492

b.

P(x > 17.30) = P(z > (17.30 − 16)/1.25) = P(z > 1.04) = .5 − P(0 < z < 1.04) = .5 − .3508 = .1492

c.

P(x ≤ η) = P(x ≥ η) = .5
Thus, μ = η = 16. (Recall from section 2.4 that in a symmetric distribution, the mean equals the median.)

4.106

a.

The contract will be profitable if total cost, x, is less than $1,000,000.

P(x < 1,000,000) = P(z < (1,000,000 − 850,000)/170,000) = P(z < .88) = .5 + .3106 = .8106

b.

The contract will result in a loss if total cost, x, exceeds 1,000,000. P(x > 1,000,000) = 1 − P(x < 1,000,000) = 1 − .8106 = .1894

c.

P(x < R) = .99. Find R.

P(x < R) = P(z < (R − 850,000)/170,000) = P(z < z0) = .99
A1 = .99 − .5 = .4900
Looking up the area .4900 in Table IV, z0 = 2.33

z0 = (R − 850,000)/170,000 ⇒ 2.33 = (R − 850,000)/170,000 ⇒ R = 2.33(170,000) + 850,000 = $1,246,100

4.108

a.

Let x = quantity injected per container. The random variable x has a normal distribution with μ = 10 and σ = .2.

P(x < 10) = P(z < (10 − 10)/.2) = P(z < 0.0) = .5

P(x ≥ 10) = P(z ≥ (10 − 10)/.2) = P(z ≥ 0.0) = .5

b.

Since the container needed to be reprocessed, it cost $10. Upon refilling, it contained 10.60 units with a cost of 10.60($20) = $212. Thus, the total cost for filling this container is $10 + $212 = $222. Since the container sells for $230, the profit is $230 − $222 = $8.

c.

Let x = quantity injected per container. The random variable x has a normal distribution with μ = 10.10 and σ = .2. The expected value of x is E(x) = μ = 10.10. The cost of a container with 10.10 units is 10.10($20) = $202. Thus, the expected profit would be the selling price minus the cost or $230 − $202 = $28.

4.110

a.

If z is a standard normal random variable, QL = zL is the value of the standard normal distribution which has 25% of the data to the left and 75% to the right. Find zL such that P(z < zL) = .25 A1 = .50 − .25 = .25. Look up the area A1 = .25 in the body of Table IV of Appendix B; zL = −.67 (taking the closest value). If interpolation is used, −.675 would be obtained.




QU = zU is the value of the standard normal distribution which has 75% of the data to the left and 25% to the right. Find zU such that P(z < zU) = .75 A1 + A2 = P(z ≤ 0) + P(0 ≤ z ≤ zU) = .5 + P(0 ≤ z ≤ zU) = .75 Therefore, P(0 ≤ z ≤ zU) = .25. Look up the area .25 in the body of Table IV of Appendix B; zU = .67 (taking the closest value). b.

Recall that the inner fences of a box plot are located 1.5(QU - QL) outside the hinges (QL and QU). To find the lower inner fence, QL − 1.5(QU − QL) = −.67 − 1.5(.67 − (−.67)) = -.67 − 1.5(1.34) = -2.68 (−2.70 if zL = −.675 and zU = +.675) The upper inner fence is: QU + 1.5(QU - QL) = .67 + 1.5(.67 − (−.67)) = .67 + 1.5(1.34) = 2.68 (+2.70 if zL = −.675 and zU = +.675)

c.

Recall that the outer fences of a box plot are located 3(QU − QL) outside the hinges (QL and QU). To find the lower outer fence, QL − 3(QU − QL) = −.67 − 3(.67 − (−.67)) = −.67 − 3(1.34) = -4.69 (−4.725 if zL = −.675 and zU = +.675) The upper outer fence is: QU + 3(QU − QL) = .67 + 3(.67 − (−.67)) = .67 + 3(1.34) = 4.69 (4.725 if zL = −.675 and zU = +.675)



d.

P(z < −2.68) + P(z > 2.68) = 2P(z > 2.68) = 2(.5000 − .4963) (Table IV, Appendix B) = 2(.0037) = .0074 (or 2(.5000 − .4965) = .0070 if −2.70 and 2.70 are used) P(z < −4.69) + P(z > 4.69) = 2P(z > 4.69) ≈ 2(.5000 − .5000) ≈ 0
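The quartiles, fences, and tail probabilities in parts a through d can be verified with the sketch below, which computes the quartiles of the standard normal distribution directly rather than reading them from Table IV. The helper names are illustrative assumptions; the results correspond to the interpolated values (±.675, ±2.70, ±4.725) quoted in the solution.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal distribution
q_lower = Z.inv_cdf(0.25)        # QL, about -.674
q_upper = Z.inv_cdf(0.75)        # QU, about +.674
iqr = q_upper - q_lower

def fences(multiplier):
    """Fences located multiplier * IQR outside the hinges (1.5 inner, 3 outer)."""
    return q_lower - multiplier * iqr, q_upper + multiplier * iqr

def prob_outside(lower, upper):
    """P(z < lower) + P(z > upper)."""
    return Z.cdf(lower) + (1 - Z.cdf(upper))

inner = fences(1.5)
outer = fences(3)
print(inner, prob_outside(*inner))
print(outer, prob_outside(*outer))
```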

e.

In a normal probability distribution, the probability of an observation being beyond the inner fences is only .0074 and the probability of an observation being beyond the outer fences is approximately zero. Since the probability is so small, there should not be any observations beyond the inner and outer fences. Therefore, they are probably outliers.

4.112

a.

IQR = QU − QL = 195 − 72 = 123

b.

IQR/s = 123/95 = 1.295

c.

Yes. Since IQR/s is approximately 1.3, this implies that the data are approximately normal.

4.114

a.

Using MINITAB, the stem-and-leaf display is:

Stem-and-leaf of X   N = 28
Leaf Unit = 0.10

  5   1  11266
  6   2  1
  8   3  35
 11   4  035
 14   5  039
 14   6  3457
 10   7  346
  7   8  24469
  2   9  47

Since the data do not form a mound shape, it indicates that the data may not be normally distributed.

b.

Using MINITAB, the descriptive statistics are:

Variable   N    Mean  Median  TrMean  StDev  SE Mean
X         28   5.511   6.100   5.519  2.765   0.5230

Variable  Minimum  Maximum     Q1     Q3
X           1.100    9.700  3.350  8.050

The standard deviation is 2.765.




c.

Using the printout from MINITAB in part b, QL = 3.35, and QU = 8.05. The IQR = QU − QL = 8.05 − 3.35 = 4.7. If the data are normally distributed, then IQR/s ≈ 1.3. For this data, IQR/s = 4.7/2.765 = 1.70. This is a fair amount larger than 1.3, which indicates that the data may not be normally distributed.

d.

Using MINITAB, the normal probability plot is:

The data at the extremes are not particularly on a straight line. This indicates that the data are not normally distributed.
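The IQR/s check used in part c of Exercise 4.114 (and in Exercise 4.112) can be sketched as follows. The reference value of about 1.3 is the ratio for a normal distribution, 2(.6745) ≈ 1.35; the function name is a hypothetical one chosen for this example.

```python
from statistics import NormalDist

# For a normal distribution, IQR/sigma = 2 * z(.75), roughly 1.35
NORMAL_IQR_OVER_S = 2 * NormalDist().inv_cdf(0.75)

def iqr_over_s(q_lower, q_upper, s):
    """Ratio of the interquartile range to the standard deviation."""
    return (q_upper - q_lower) / s

print(round(NORMAL_IQR_OVER_S, 3))
print(round(iqr_over_s(3.35, 8.05, 2.765), 2))  # Exercise 4.114
print(round(iqr_over_s(72, 195, 95), 3))        # Exercise 4.112
```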

4.116

From the normal probability plot, it appears that the data may not be normal. The points with small observed values and the points with large observed values do not fall on the straight line. This implies that the data may not be from a normal distribution.

4.118

a.

We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the fish weights is:

[Histogram of fish weights: Frequency versus Weight]

From the histogram, the data appear to be fairly mound-shaped. This indicates that the data may be normal.



Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

Descriptive Statistics: Weight

Variable    N    Mean  Median  TrMean  StDev  SE Mean
Weight    144  1049.7  1000.0  1039.4  376.5     31.4

Variable  Minimum  Maximum     Q1      Q3
Weight      173.0   2302.0  804.5  1263.3

x̄ ± s ⇒ 1049.7 ± 376.5 ⇒ (673.2, 1426.2)
98 of the 144 values fall in this interval. The proportion is .68. This is exactly the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 1049.7 ± 2(376.5) ⇒ 1049.7 ± 753 ⇒ (296.7, 1802.7)
140 of the 144 values fall in this interval. The proportion is .97. This is somewhat larger than the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 1049.7 ± 3(376.5) ⇒ 1049.7 ± 1129.5 ⇒ (−79.8, 2179.2)
143 of the 144 values fall in this interval. The proportion is .993. This is close to the 1.00 we would expect if the data were normal.

From this method, it appears that the data are normal.

Next, we look at the ratio of the IQR to s. IQR = QU − QL = 1263.3 − 804.5 = 458.8.

IQR/s = 458.8/376.5 = 1.22

This is close to the 1.3 we would expect if the data were normal. This method indicates the data are normal.

Finally, using MINITAB, the normal probability plot is:

[Normal probability plot for Weight (Percent versus Data), ML Estimates - 95% CI: Mean = 1049.72, StDev = 375.236, AD* = 0.793]

Since the data form a fairly straight line, the data appear to be normal.




From the 4 different methods, all indications are that the fish weight data are approximately normal.

b.

We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the fish DDT levels is:

[Histogram of fish DDT levels: Frequency versus DDT]

From the histogram, the data appear to be skewed to the right. This indicates that the data may not be normal.

Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

Descriptive Statistics: DDT

Variable    N   Mean  Median  TrMean  StDev  SE Mean
DDT       144  24.35    7.15   10.38  98.38     8.20

Variable  Minimum  Maximum    Q1     Q3
DDT          0.11  1100.00  3.33  13.00

x̄ ± s ⇒ 24.35 ± 98.38 ⇒ (−74.03, 122.73)
138 of the 144 values fall in this interval. The proportion is .96. This is much greater than the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 24.35 ± 2(98.38) ⇒ 24.35 ± 196.76 ⇒ (−172.41, 221.11)
142 of the 144 values fall in this interval. The proportion is .986. This is much larger than the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 24.35 ± 3(98.38) ⇒ 24.35 ± 295.14 ⇒ (−270.79, 319.49)
142 of the 144 values fall in this interval. The proportion is .986. This is somewhat lower than the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

Next, we look at the ratio of the IQR to s. IQR = QU − QL = 13.00 − 3.33 = 9.67.

IQR/s = 9.67/98.38 = 0.098

This is much smaller than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.

Finally, using MINITAB, the normal probability plot is:

[Normal probability plot for DDT (Percent versus Data), ML Estimates - 95% CI: Mean = 24.355, StDev = 98.0364, AD* = 38.58]

Since the data do not form a straight line, the data are not normal.

From the 4 different methods, all indications are that the fish DDT level data are not normal.

4.120

We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the sanitation scores is:

[Histogram of SCORE: Frequency versus SCORE]




From the histogram, the data appear to be skewed to the left. This indicates that the data are not normal.

Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics used in the calculations below are N = 169, Mean = 94.911, StDev = 4.825, Q1 = 93, and Q3 = 98.

x̄ ± s ⇒ 94.911 ± 4.825 ⇒ (90.086, 99.736)
137 of the 169 values fall in this interval. The proportion is .81. This is much larger than the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 94.911 ± 2(4.825) ⇒ 94.911 ± 9.65 ⇒ (85.261, 104.561)
165 of the 169 values fall in this interval. The proportion is .98. This is somewhat larger than the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 94.911 ± 3(4.825) ⇒ 94.911 ± 14.475 ⇒ (80.436, 109.386)
166 of the 169 values fall in this interval. The proportion is .982. This is somewhat smaller than the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

Next, we look at the ratio of the IQR to s. IQR = QU − QL = 98 − 93 = 5.

IQR/s = 5/4.825 = 1.036

This is much smaller than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.

Finally, using MINITAB, the normal probability plot is:

[Probability plot of SCORE (Percent versus SCORE), Normal - 95% CI: Mean = 94.91, StDev = 4.825, N = 169, AD = 7.216, P-Value < 0.005]


Since the data do not form a straight line, the data are not normal.

From the 4 different methods, all indications are that the sanitation scores data are not normal.

4.122

We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the tensile strength values is:

[Histogram of Strength: Frequency versus Strength]

From the histogram, the data appear to be somewhat skewed to the left. This might indicate that the data are not normal.

Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

Descriptive Statistics: Strength

Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
Strength  11   0  342.13     2.38   7.91   328.20  334.70  343.60  347.80   356.30

x̄ ± s ⇒ 342.13 ± 7.91 ⇒ (334.22, 350.04)
8 of the 11 values fall in this interval. The proportion is .73. This is somewhat larger than the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 342.13 ± 2(7.91) ⇒ 342.13 ± 15.82 ⇒ (326.31, 357.95)
All 11 of the 11 values fall in this interval. The proportion is 1.00. This is somewhat larger than the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 342.13 ± 3(7.91) ⇒ 342.13 ± 23.73 ⇒ (318.40, 365.86)
Again, all 11 of the 11 values fall in this interval. The proportion is 1.00. This is equal to the 1.00 we would expect if the data were normal.




From this method, it appears that the data are quite normal.

Next, we look at the ratio of the IQR to s. IQR = QU − QL = 347.80 − 334.70 = 13.1.

IQR/s = 13.1/7.91 = 1.656

This is much larger than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.

Finally, using MINITAB, the normal probability plot is:

[Probability plot of Strength (Percent versus Strength), Normal - 95% CI: Mean = 342.1, StDev = 7.907, N = 11, AD = 0.154, P-Value = 0.937]

Since the data do form a fairly straight line, the data could be normal.

From the 4 different methods, two (the histogram and the IQR/s ratio) indicate that the data may not be from a normal distribution, while the other two (the intervals and the normal probability plot) are consistent with normality.

4.124

a.

In order to approximate the binomial distribution with the normal distribution, the interval μ ± 3σ ⇒ np ± 3√(npq) should lie in the range 0 to n. When n = 25 and p = .4,

np ± 3√(npq) ⇒ 25(.4) ± 3√(25(.4)(1 − .4)) ⇒ 10 ± 3√6 ⇒ 10 ± 7.3485 ⇒ (2.6515, 17.3485)

Since the interval calculated does lie in the range 0 to 25, we can use the normal approximation.

b.

μ = np = 25(.4) = 10 σ2 = npq = 25(.4)(.6) = 6

c.

P(x ≥ 9) = 1 − P(x ≤ 8) = 1 − .274 = .726 (Table II, Appendix B)

d.

P(x ≥ 9) ≈ P(z ≥ ((9 − .5) − 10)/√6) = P(z ≥ −.61) = .5000 + .2291 = .7291
(Using Table IV in Appendix B.)
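Parts c and d of Exercise 4.124 compare the exact binomial probability with its normal approximation. The sketch below reproduces both under the same setup (n = 25, p = .4, continuity correction of .5); the function names are assumptions, and the approximation here uses the unrounded z-score, so it differs slightly from the Table IV answer.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(x <= k) for a binomial(n, p) random variable."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def exact_at_least(k, n, p):
    """Exact P(x >= k) = 1 - P(x <= k - 1)."""
    return 1 - binom_cdf(k - 1, n, p)

def normal_approx_at_least(k, n, p):
    """Normal approximation to P(x >= k) with the continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = ((k - 0.5) - mu) / sigma
    return 1 - NormalDist().cdf(z)

print(round(exact_at_least(9, 25, 0.4), 4))
print(round(normal_approx_at_least(9, 25, 0.4), 4))
```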



4.126

μ = np = 1000(.5) = 500, σ = √(npq) = √(1000(.5)(.5)) = 15.811

a.

Using the normal approximation,
P(x > 500) ≈ P(z > ((500 + .5) − 500)/15.811) = P(z > .03) = .5 − .0120 = .4880
(from Table IV, Appendix B)

b.

P(490 ≤ x < 500) ≈ P(((490 − .5) − 500)/15.811 ≤ z < ((500 − .5) − 500)/15.811) = P(−.66 ≤ z < −.03) = .2454 − .0120 = .2334
(from Table IV, Appendix B)

c.

P(x > 550) ≈ P(z > ((550 + .5) − 500)/15.811) = P(z > 3.19) ≈ .5 − .5 = 0
(from Table IV, Appendix B)

4.128

a.

E(x) = μ = np = 350(.27) = 94.5.

b.

σ = √σ² = √(npq) = √(350(.27)(.73)) = √68.985 = 8.306

c.

z = (x − μ)/σ = (99.5 − 94.5)/8.306 = 0.60

d.

To see if the normal approximation is appropriate, we use:

μ ± 3σ ⇒ 94.5 ± 3(8.306) ⇒ 94.5 ± 24.918 ⇒ (69.582, 119.418)

Since the interval lies in the range of 0 to 350, the normal approximation is appropriate.

P(x ≥ 100) ≈ P(z ≥ 0.60) = .5 − .2257 = .2743
(Using Table IV, Appendix B)

4.130

Let x = number of white-collar employees in good shape who will develop stress related illnesses in a sample of 400. Then x is a binomial random variable with n = 400 and p = .10. To see if the normal approximation is appropriate for this problem:

np ± 3√(npq) ⇒ 400(.1) ± 3√(400(.1)(.9)) ⇒ 40 ± 18 ⇒ (22, 58)

Since this interval is contained in the interval from 0 to n = 400, the normal approximation is appropriate.

P(x > 60) ≈ P(z > ((60 + .5) − 40)/6) = P(z > 3.42) ≈ .5000 − .5000 = 0




4.132

a.

For n = 100 and p = .01:
μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.01) ± 3√(100(.01)(.99)) ⇒ 1 ± 3(.995) ⇒ 1 ± 2.985 ⇒ (−1.985, 3.985)
Since the interval does not lie in the range 0 to 100, we cannot use the normal approximation to approximate the probabilities.

b.

For n = 100 and p = .5:
μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.5) ± 3√(100(.5)(.5)) ⇒ 50 ± 3(5) ⇒ 50 ± 15 ⇒ (35, 65)
Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probabilities.

c.

For n = 100 and p = .9:
μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.9) ± 3√(100(.9)(.1)) ⇒ 90 ± 3(3) ⇒ 90 ± 9 ⇒ (81, 99)
Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probabilities.
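The rule applied in Exercise 4.132 (and in Exercises 4.124 through 4.134) is that np ± 3√(npq) must lie between 0 and n. A minimal sketch of that check follows; the function names are hypothetical.

```python
import math

def three_sigma_interval(n, p):
    """Return np - 3*sqrt(npq) and np + 3*sqrt(npq) for a binomial(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu - 3 * sigma, mu + 3 * sigma

def normal_approx_ok(n, p):
    """True when the interval mu +/- 3*sigma lies within the range 0 to n."""
    low, high = three_sigma_interval(n, p)
    return low >= 0 and high <= n

for p in (0.01, 0.5, 0.9):
    print(p, three_sigma_interval(100, p), normal_approx_ok(100, p))
```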

4.134

b.

Let v = number of credit card users out of 100 who carry Visa. Then v is a binomial random variable with n = 100 and pv = .539. E(v) = npv = 100(.539) = 53.9.

Let d = number of credit card users out of 100 who carry Discover. Then d is a binomial random variable with n = 100 and pd = .040. E(d) = npd = 100(.040) = 4.0.

c.

To see if the normal approximation is valid, we use:

μ ± 3σ ⇒ npv ± 3√(npv qv) ⇒ 100(.539) ± 3√(100(.539)(.461)) ⇒ 53.9 ± 3(4.9848) ⇒ 53.9 ± 14.9544 ⇒ (38.946, 68.854)

Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probability.

P(v ≥ 50) ≈ P(z ≥ ((50 − .5) − 53.9)/4.985) = P(z ≥ −.88) = .5 + .3106 = .8106

Let a = number of credit card users out of 100 who carry American Express. Then a is a binomial random variable with n = 100 and pa = .132. To see if the normal approximation is valid, we use:

μ ± 3σ ⇒ npa ± 3√(npa qa) ⇒ 100(.132) ± 3√(100(.132)(.868)) ⇒ 13.2 ± 3(3.385) ⇒ 13.2 ± 10.155 ⇒ (3.045, 23.355)

Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probability.

P(a ≥ 50) ≈ P(z ≥ ((50 − .5) − 13.2)/3.385) = P(z ≥ 10.72) ≈ .5 − .5 = 0

d.

In order for the normal approximation to be valid, μ ± 3σ must lie in the interval (0, n). This check was done in part c for both portions of the question. In both cases, the normal approximation was justified.

4.136

a.

If 80% of the passengers pass through without their luggage being inspected, then 20% will be detained for luggage inspection. The expected number of passengers detained will be: E(x) = np = 1,500(.2) = 300

b.

For n = 4,000, E(x) = np = 4,000(.2) = 800

c.

P(x > 600) ≈ P(z > ((600 + .5) − 800)/√(4000(.2)(.8))) = P(z > −7.89) ≈ .5 + .5 = 1.0

4.140

E(x) = μ = ∑ x p(x) = 1(.2) + 2(.3) + 3(.2) + 4(.2) + 5(.1) = .2 + .6 + .6 + .8 + .5 = 2.7

E(x̄) = ∑ x̄ p(x̄) = 1.0(.04) + 1.5(.12) + 2.0(.17) + 2.5(.20) + 3.0(.20) + 3.5(.14) + 4.0(.08) + 4.5(.04) + 5.0(.01) = .04 + .18 + .34 + .50 + .60 + .49 + .32 + .18 + .05 = 2.7

4.144

The sampling distribution is approximately normal only if the sample size is sufficiently large or if the population being sampled from is normal.

4.146

a.

μ_x̄ = μ = 10, σ_x̄ = σ/√n = 3/√25 = 0.6

b.

μ_x̄ = μ = 100, σ_x̄ = σ/√n = 25/√25 = 5

c.

μ_x̄ = μ = 20, σ_x̄ = σ/√n = 40/√25 = 8

d.

μ_x̄ = μ = 10, σ_x̄ = σ/√n = 100/√25 = 20
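Each part of Exercise 4.146 applies the same formula, σ_x̄ = σ/√n. The short sketch below evaluates it for the four cases with n = 25; the function name is an assumption made for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)

# Parts a through d: population standard deviations 3, 25, 40, 100 with n = 25
for sigma in (3, 25, 40, 100):
    print(sigma, standard_error(sigma, 25))
```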




4.148

a.

μ_x̄ = μ = 20, σ_x̄ = σ/√n = 16/√64 = 2

b.

By the Central Limit Theorem, the distribution of x̄ is approximately normal. In order for the Central Limit Theorem to apply, n must be sufficiently large. For this problem, n = 64 is sufficiently large.

c.

z = (x̄ − μ_x̄)/σ_x̄ = (15.5 − 20)/2 = −2.25

d.

z = (x̄ − μ_x̄)/σ_x̄ = (23 − 20)/2 = 1.50

4.150

For this population and sample size, E(x̄) = μ = 100, σ_x̄ = σ/√n = 10/√900 = 1/3

a.

Approximately 95% of the time, x̄ will be within two standard deviations of the mean, i.e., μ ± 2σ_x̄ ⇒ 100 ± 2(1/3) ⇒ 100 ± 2/3 ⇒ (99.33, 100.67). Almost all of the time, the sample mean will be within three standard deviations of the mean, i.e., μ ± 3σ_x̄ ⇒ 100 ± 3(1/3) ⇒ 100 ± 1 ⇒ (99, 101).

b.

No more than three standard deviations, i.e., 3(1/3) = 1

c.

No, the previous answer only depended on the standard deviation of the sampling distribution of the sample mean, not the mean itself.

4.154

a.

μ_x̄ = μ = 98,500

b.

σ_x̄ = σ/√n = 30,000/√50 = 4,242.6407

c.

By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal.

d.

z = (x̄ − μ_x̄)/σ_x̄ = (89,500 − 98,500)/4,242.6407 = −2.12

e.

P(x̄ > 89,500) = P(z > −2.12) = .5 + .4830 = .9830 (Using Table IV, Appendix B)


4.156

a.

μ_x̄ = μ = 89.34; σ_x̄ = σ/√n = 7.74/√35 = 1.3083

b.

c.

d.

4.158

a.

P(x̄ > 88) = P(z > (88 − 89.34)/1.3083) = P(z > −1.02) = .5 + .3461 = .8461
(using Table IV, Appendix B)

P(x̄ < 87) = P(z < (87 − 89.34)/1.3083) = P(z < −1.79) = .5 − .4633 = .0367
(using Table IV, Appendix B)

Since the sample size is small, we also have to assume that the distribution from which the sample was drawn is normal.

μ_x̄ = μ = 1.8, σ_x̄ = σ/√n = .5/√20 = .1118

P(x̄ ≥ 1.85) = P(z ≥ (1.85 − 1.8)/.1118) = P(z ≥ 0.45) = .5 − .1736 = .3264
(using Table IV, Appendix B)

b.

Descriptive Statistics: Rough

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Rough     20   0  1.881    0.117  0.524    1.060  1.303   2.040  2.293    2.640

From this output, the value of x̄ is 1.881.

c.

For x̄ = 1.881:
P(x̄ ≥ 1.881) = P(z ≥ (1.881 − 1.8)/.1118) = P(z ≥ 0.72) = .5 − .2642 = .2358

Since this probability is so high, observing a sample mean of x̄ = 1.881 is not unusual. The assumptions in part a appear to be valid.

4.160

If the observations are independent of each other, then

P(1, 1) = p(1)p(1) = .2(.2) = .04 P(1, 2) = p(1)p(2) = .2(.3) = .06 P(1, 3) = p(1)p(3) = .2(.2) = .04 etc.




a.

Possible Sample   x̄     p(x̄)
1, 1              1     .04
1, 2              1.5   .06
1, 3              2     .04
1, 4              2.5   .04
1, 5              3     .02
2, 1              1.5   .06
2, 2              2     .09
2, 3              2.5   .06
2, 4              3     .06
2, 5              3.5   .03
3, 1              2     .04
3, 2              2.5   .06
3, 3              3     .04
3, 4              3.5   .04
3, 5              4     .02
4, 1              2.5   .04
4, 2              3     .06
4, 3              3.5   .04
4, 4              4     .04
4, 5              4.5   .02
5, 1              3     .02
5, 2              3.5   .03
5, 3              4     .02
5, 4              4.5   .02
5, 5              5     .01

Summing the probabilities, the probability distribution of x̄ is:

x̄      1    1.5  2    2.5  3    3.5  4    4.5  5
p(x̄)   .04  .12  .17  .20  .20  .14  .08  .04  .01
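The enumeration in part a can be checked with a short Python sketch. It assumes the population distribution p(1) = .2, p(2) = .3, p(3) = .2, p(4) = .2, p(5) = .1, which is read off the products in the table rather than stated in the exercise text here; the function name is illustrative only.

```python
from collections import defaultdict
from itertools import product

# Assumed population distribution, inferred from the tabled products.
p = {1: 0.2, 2: 0.3, 3: 0.2, 4: 0.2, 5: 0.1}

def sampling_distribution_of_mean(p, n=2):
    """Enumerate all ordered samples of size n (independent draws)
    and accumulate the probability of each possible sample mean."""
    dist = defaultdict(float)
    for sample in product(p, repeat=n):
        prob = 1.0
        for value in sample:
            prob *= p[value]
        dist[sum(sample) / n] += prob
    return {xbar: round(prob, 4) for xbar, prob in sorted(dist.items())}

xbar_dist = sampling_distribution_of_mean(p)
for xbar, prob in xbar_dist.items():
    print(xbar, prob)
```

Summing the last two entries of the result gives P(x̄ ≥ 4.5), the quantity used in part c.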

b.


c.

P( x ≥ 4.5) = .04 + .01 = .05

d.

No. The probability of observing x̄ = 4.5 or larger is small (.05).



4.162

For n = 36, μ_x̄ = μ = 406 and σ_x̄ = σ/√n = 10.1/√36 = 1.6833. By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal (n is large).

P(x̄ ≤ 400.8) = P(z ≤ (400.8 − 406)/1.6833) = P(z ≤ −3.09) = .5 − .4990 = .0010 (using Table IV, Appendix B)

The first. If the true value of μ is 406, it would be extremely unlikely to observe an x̄ as small as 400.8 or smaller (probability .0010). Thus, we would infer that the true value of μ is less than 406.

4.164

4.166

a.

This experiment consists of 100 trials. Each trial results in one of two outcomes: chip is defective or not defective. If the number of chips produced in one hour is much larger than 100, then we can assume the probability of a defective chip is the same on each trial and that the trials are independent. Thus, x is a binomial random variable. If, however, the number of chips produced in an hour is not much larger than 100, the trials would not be independent. Then x would not be a binomial random variable.

b.

This experiment consists of two trials. Each trial results in one of two outcomes: applicant qualified or not qualified. However, the trials are not independent. The probability of selecting a qualified applicant on the first trial is 3 out of 5. The probability of selecting a qualified applicant on the second trial depends on what happened on the first trial. Thus, x is not a binomial random variable. It is a hypergeometric random variable.

c.

The number of trials is not a specified number in this experiment, thus x is not a binomial random variable. In this experiment, x is counting the number of calls received.

d.

The number of trials in this experiment is 1000. Each trial can result in one of two outcomes: favor state income tax or not favor state income tax. Since 1000 is small compared to the number of registered voters in Florida, the probability of selecting a voter in favor of the state income tax is the same from trial to trial, and the trials are independent of each other. Thus, x is a binomial random variable.

a.

μ = ∑xp(x) = 10(.2) + 12(.3) + 18(.1) + 20(.4) = 15.4

σ² = ∑(x − μ)²p(x) = (10 − 15.4)²(.2) + (12 − 15.4)²(.3) + (18 − 15.4)²(.1) + (20 − 15.4)²(.4) = 18.44

σ = √18.44 ≈ 4.294

b.

P(x < 15) = p(10) + p(12) = .2 + .3 = .5

c.

μ ± 2σ = 15.4 ± 2(4.294) ⇒ (6.812, 23.988)

d.

P(6.812 < x < 23.988) = .2 + .3 + .1 + .4 = 1.0
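As a cross-check of parts a through d, the following Python sketch recomputes the mean, variance, and the probability inside μ ± 2σ directly from the distribution used above. The variable and function names are illustrative, not part of the text.

```python
import math

# Distribution from the solution: x values and their probabilities.
dist = {10: 0.2, 12: 0.3, 18: 0.1, 20: 0.4}

def mean_and_variance(dist):
    """Return (mu, sigma^2) for a discrete probability distribution."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

mu, var = mean_and_variance(dist)
sigma = math.sqrt(var)
low, high = mu - 2 * sigma, mu + 2 * sigma
# Probability that x falls strictly inside the interval mu +/- 2 sigma.
prob_within = sum(p for x, p in dist.items() if low < x < high)
print(mu, var, sigma, (low, high), prob_within)
```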




4.168

4.170


Using Table III, Appendix B, a.

When λ = 2, p(3) = P(x ≤ 3) − P(x ≤ 2) = .857 − .677 = .180

b.

When λ = 1, p(4) = P(x ≤ 4) − P(x ≤ 3) = .996 − .981 = .015

c.

When λ = .5, p(2) = P(x ≤ 2) − P(x ≤ 1) = .986 − .910 = .076
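The table differences above can be compared with the Poisson formula p(x) = λ^x e^(−λ)/x! evaluated directly. This is only a sketch for checking the three table lookups; the function name is an assumption.

```python
import math

def poisson_pmf(k, lam):
    """P(x = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# (lambda, k) pairs from parts a, b, and c.
for lam, k in [(2, 3), (1, 4), (0.5, 2)]:
    print(lam, k, round(poisson_pmf(k, lam), 3))
```

To three decimals the direct values agree with the cumulative-table differences.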

a.

f(x) = 1/(d − c) = 1/(90 − 10) = 1/80 for 10 ≤ x ≤ 90; f(x) = 0 otherwise

b.

μ = (c + d)/2 = (10 + 90)/2 = 50

σ = (d − c)/√12 = (90 − 10)/√12 = 23.094011

c.

The interval μ ± 2σ ⇒ 50 ± 2(23.094) ⇒ 50 ± 46.188 ⇒ (3.812, 96.188) is indicated on the graph.

d.

P(x ≤ 60) = Base(height) = (60 − 10)(1/80) = 5/8 = .625

e.

P(x ≥ 90) = 0

f.

P(x ≤ 80) = Base(height) = (80 − 10)(1/80) = 7/8 = .875

g.

P(μ − σ ≤ x ≤ μ + σ) = P(50 − 23.094 ≤ x ≤ 50 + 23.094) = P(26.906 ≤ x ≤ 73.094) = Base(height) = (73.094 − 26.906)(1/80) = 46.188/80 = .577

h.

P(x > 75) = Base(height) = (90 − 75)(1/80) = 15/80 = .1875


4.172

a.

P(z ≤ z0) = .5080 ⇒ P(0 ≤ z ≤ z0) = .5080 − .5 = .0080 Looking up the area .0080 in Table IV, ⇒ z0 = .02

b.

P(z ≥ z0) = .5517 ⇒ P(z0 ≤ z ≤ 0) = .5517 − .5 = .0517 Looking up the area .0517 in Table IV, z0 = −.13.

c.

P(z ≥ z0) = .1492 ⇒ P(0 ≤ z ≤ z0) = .5 − .1492 = .3508 Looking up the area .3508 in Table IV, ⇒ z0 = 1.04

d.

P(z0 ≤ z ≤ .59) = .4773 ⇒ P(z0 ≤ z ≤ 0) + P(0 ≤ z ≤ .59) = .4773 P(0 ≤ z ≤ .59) = .2224 Thus, P(z0 ≤ z ≤ 0) = .4773 − .2224 = .2549 Looking up the area .2549 in Table IV, z0 = -.69

4.174

μ = np = 100(.5) = 50, σ = √(npq) = √(100(.5)(.5)) = 5

a.

P(x ≤ 48) = P(z ≤ ((48 + .5) − 50)/5) = P(z ≤ −.30) = .5 − .1179 = .3821

b.

P(50 ≤ x ≤ 65) = P(((50 − .5) − 50)/5 ≤ z ≤ ((65 + .5) − 50)/5) = P(−.10 ≤ z ≤ 3.10) = .0398 + .5000 = .5398

c.

P(x ≥ 70) = P(z ≥ ((70 − .5) − 50)/5) = P(z ≥ 3.90) = .5 − .5 = 0

d.

P(55 ≤ x ≤ 58) = P(((55 − .5) − 50)/5 ≤ z ≤ ((58 + .5) − 50)/5) = P(.90 ≤ z ≤ 1.70) = P(0 ≤ z ≤ 1.70) − P(0 ≤ z ≤ .90) = .4554 − .3159 = .1395

e.

P(x = 62) = P(((62 − .5) − 50)/5 ≤ z ≤ ((62 + .5) − 50)/5) = P(2.30 ≤ z ≤ 2.50) = P(0 ≤ z ≤ 2.50) − P(0 ≤ z ≤ 2.30) = .4938 − .4893 = .0045

f.

P(x ≤ 49 or x ≥ 72) = P(z ≤ ((49 + .5) − 50)/5) + P(z ≥ ((72 − .5) − 50)/5) = P(z ≤ −.10) + P(z ≥ 4.30) = (.5 − .0398) + (.5 − .5) = .4602
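The continuity-corrected normal approximation used in parts a through f can be sketched in Python as follows. The standard normal CDF is computed from `math.erf` instead of Table IV, so results may differ from the table-based answers in the third or fourth decimal; the helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_normal_approx(lo, hi, n, p):
    """Approximate P(lo <= x <= hi) for binomial x using the normal
    distribution with a continuity correction of .5 at each end.
    Pass None for lo or hi to leave that side open."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    lower = 0.0 if lo is None else normal_cdf((lo - 0.5 - mu) / sigma)
    upper = 1.0 if hi is None else normal_cdf((hi + 0.5 - mu) / sigma)
    return upper - lower

print(round(binom_normal_approx(None, 48, 100, 0.5), 4))  # part a
print(round(binom_normal_approx(55, 58, 100, 0.5), 4))    # part d
print(round(binom_normal_approx(62, 62, 100, 0.5), 4))    # part e
```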

4.176

a.

First we must compute μ and σ. The probability distribution for x is:

x      1   2   3   4
p(x)   .3  .2  .2  .3

μ = E(x) = ∑xp(x) = 1(.3) + 2(.2) + 3(.2) + 4(.3) = 2.5

σ² = E[(x − μ)²] = ∑(x − μ)²p(x) = (1 − 2.5)²(.3) + (2 − 2.5)²(.2) + (3 − 2.5)²(.2) + (4 − 2.5)²(.3) = 1.45

μ_x̄ = μ = 2.5, σ_x̄ = σ/√n = √1.45/√40 = .1904


b.

By the Central Limit Theorem, the distribution of x̄ is approximately normal. The sample size, n = 40, is sufficiently large. Our answer does depend on n. If n is not sufficiently large, the Central Limit Theorem would not apply.

4.180

a.

In order to be a binomial random variable, the five characteristics must hold.

1. For this problem, there are 5 items scanned. We will assume that these 5 trials are identical.
2. For each item scanned, there are 2 possible outcomes: priced incorrectly (S) or priced correctly (F).
3. The probability of being priced incorrectly remains constant from trial to trial. For this problem, we will assume that the probability of being priced incorrectly is P(S) = 1/30 for each trial.
4. We will assume that whether one item is priced incorrectly is independent of any other.
5. The random variable x is the number of items priced incorrectly in 5 trials.

Thus, x is a binomial random variable.

b.

The estimate of p, the probability of an item being priced incorrectly, is 1/30.

c.

P(x = 1) = (5 choose 1)(1/30)^1(29/30)^4 = .1455

d.

P(x ≥ 1) = 1 − P(x = 0) = 1 − (5 choose 0)(1/30)^0(29/30)^5 = 1 − .8441 = .1559

e.

Let x = number of items with incorrect prices in 10,000 trials. Thus, x is a binomial random variable with n = 10,000 and p = 1/30 = .033.

μ ± 3σ ⇒ np ± 3√(npq) ⇒ 10,000(.033) ± 3√(10,000(.033)(.967)) ⇒ 330 ± 3√319.11 ⇒ 330 ± 3(17.864) ⇒ 330 ± 53.591 ⇒ (276.409, 383.591)

Since the interval lies in the range 0 to 10,000, we can use the normal approximation to approximate the probabilities.

P(x ≥ 100) ≈ P(z ≥ ((100 − .5) − 330)/17.864) = P(z ≥ −12.90) = P(−12.90 ≤ z < 0) + .5 ≈ .5 + .5 = 1 (using Table IV, Appendix B)




f.

Let x = number of items with incorrect prices in 100 trials. Thus, x is a binomial random variable with n = 100 and p = 1/30 = .033.

μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.033) ± 3√(100(.033)(.967)) ⇒ 3.3 ± 3√3.191 ⇒ 3.3 ± 3(1.786) ⇒ 3.3 ± 5.358 ⇒ (−2.058, 8.658)

Since the interval does not lie in the range 0 to 100, the normal approximation will not be appropriate.

4.182

a.

Using Table IV, Appendix B, with μ = 8.72 and σ = 1.10,

P(x < 6) = P(z < (6 − 8.72)/1.10) = P(z < −2.47) = .5 − .4932 = .0068

Thus, approximately .68% of the games would result in fewer than 6 hits.

b.

The probability of observing fewer than 6 hits in a game is p = .0068. The probability of observing 0 hits would be even smaller. Thus, it would be extremely unusual to observe a no hitter.

4.184

a.

Using Table III, Appendix B, with λ = 1, P(x = 3) = P(x ≤ 3) − P(x ≤ 2) = .981 − .920 = .061

b.

P(x > 2) = 1 − P(x ≤ 2) = 1 − .920 = .080.

4.186

a.

Let x = number of employees who have a drug problem in 1,000 trials. Then x is a binomial random variable with n = 1,000 and p = .052. E(x) = np = 1,000(.052) = 52

b.

Let x = number of employees who have an alcohol problem in 10 trials. Then x is a binomial random variable with n = 10 and p = .085.

P(x ≥ 1) = 1 − P(x = 0) = 1 − (10 choose 0)(.085)^0(.915)^(10−0) = 1 − [10!/(0!(10 − 0)!)](.085)^0(.915)^10 = 1 − .4113 = .5887

P(x = 2) = (10 choose 2)(.085)^2(.915)^(10−2) = [10!/(2!(10 − 2)!)](.085)^2(.915)^8 = .1597

c.

We had to assume that the probability of an employee having a substance abuse problem was constant from trial to trial and that the trials were independent.



4.188

Let x = demand for white bread. Then x is a normal random variable with μ = 7200 and σ = 300:

a.

P(x ≤ x0) = .94. Find x0.

P(x ≤ x0) = P(z ≤ (x0 − 7200)/300) = P(z ≤ z0) = .94

A1 = .94 − .50 = .4400. Using Table IV and area .4400, z0 = 1.555.

z0 = (x0 − 7200)/300 ⇒ 1.555 = (x0 − 7200)/300 ⇒ x0 = 7666.5 ≈ 7667

b.

If the company produces 7,667 loaves, the company will be left with more than 500 loaves if the demand is less than 7,667 − 500 = 7167.

P(x < 7167) = P(z < (7167 − 7200)/300) = P(z < −.11) = .5 − .0438 = .4562 (from Table IV, Appendix B)

Thus, on 45.62% of the days the company will be left with more than 500 loaves.

4.190

Let x = number of inches a gouge is from one end of the spindle. Then x has a uniform distribution with f(x) as follows:

f(x) = 1/(d − c) = 1/(18 − 0) = 1/18 for 0 ≤ x ≤ 18; f(x) = 0 otherwise

In order to get at least 14 consecutive inches without a gouge, the gouge must be within 4 inches of either end. Thus, we must find:

P(x < 4) + P(x > 14) = (4 − 0)(1/18) + (18 − 14)(1/18) = 4/18 + 4/18 = 8/18 = .4444

4.192

a.

μ_x̄ = μ = 3.5, σ_x̄ = σ/√n = .5/√100 = .05

b.

P(3.40 < x̄ < 3.60) = P((3.40 − 3.5)/.05 < z < (3.60 − 3.5)/.05) = P(−2.00 < z < 2.00)

c.

P(x̄ > 3.62) = P(z > (3.62 − 3.5)/.05) = P(z > 2.40) = .5 − .4918 = .0082 (using Table IV, Appendix B)

d.

μ_x̄ = μ = 3.5, σ_x̄ = σ/√n = .5/√200 = .03536

The mean of the sampling distribution of x̄ would stay the same, but the standard deviation would decrease.

P(3.40 < x̄ < 3.60) = P((3.40 − 3.5)/.03536 < z < (3.60 − 3.5)/.03536) = P(−2.83 < z < 2.83)

P(x̄ > 3.62) = P(z > (3.62 − 3.5)/.03536) = P(z > 3.39) ≈ .5 − .5 = 0 (using Table IV, Appendix B)

This probability is smaller than when the sample size was 100.

4.194

a.

Let p1 = probability of an error = 1/100 = .01 and p2 = probability of an error resulting in a significant problem = 1/500 = .002. Let x = number of errors in 60,000 trials. Then E(x) = μ1 = np1 = 60,000(.01) = 600.

b.

Let y = number of significant errors in 60,000 trials. Then E(y) = μ2 = np2 = 60,000(.002) = 120.

σ² = np2q2 = 60,000(.002)(.998) = 119.76

σ = √119.76 = 10.94

μ2 ± 3σ ⇒ 120 ± 3(10.94) ⇒ 120 ± 32.82 ⇒ (87.18, 152.82)

Using Chebyshev's Rule, at least 88.9% of the observations will fall within 3 standard deviations of the mean. We would expect the number of significant errors to fall between 87 and 153.

c.

We must assume that the trials are independent and that the probability of a significant error is constant from trial to trial.

4.196

a.

By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal since n > 30, with μ_x̄ = μ = 840 and σ_x̄ = σ/√n = 15/√50 = 2.1213.

b.

P(x̄ ≤ 830) = P(z ≤ (830 − 840)/2.1213) = P(z ≤ −4.71) ≈ .5 − .5 = 0

c.

Since the probability of observing a mean of 830 or less is extremely small (≈0) if the true mean is 840, we would tend to believe that the mean is not 840, but something less.

d.

By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal since n > 30, with μ_x̄ = μ = 840 and σ_x̄ = σ/√n = 45/√50 = 6.3640.

P(x̄ ≤ 830) = P(z ≤ (830 − 840)/6.3640) = P(z ≤ −1.57) ≈ .5 − .4418 = .0582

4.198

Let x = length of time a bus is late. Then x is a uniform random variable with probability distribution:

f(x) = 1/20 for 0 ≤ x ≤ 20; f(x) = 0 otherwise

a.

μ = (0 + 20)/2 = 10

b.

P(x ≥ 19) = (20 − 19)(1/20) = 1/20 = .05

c.

It would be doubtful that the director's claim is true, since the probability of the bus being more than 19 minutes late is so small.




The Furniture Fire Case (To accompany Chapters 3–4)

Using the entire data set of 3,005 invoices as the population, the mean profit margin is 48.901% and the standard deviation is 13.8291%. If a random sample is selected from this population, the sampling distribution of the sample mean (x̄) is approximately normal with a mean of 48.901% and a standard deviation of 13.8291%/√n by the Central Limit Theorem. If a random sample of 253 invoices is selected, then the probability of obtaining a sample mean of 50.8% or higher is:

P(x̄ ≥ 50.8) = P(z ≥ (50.8 − 48.901)/(13.8291/√253)) = P(z ≥ 2.18) = .5 − .4854 = .0146

Since the probability of obtaining a sample mean of 50.8% or higher from this population is extremely small (.0146), we would conclude that there is evidence of fraud.

If we look at the two samples separately, the evidence becomes even more damning. For the sample of 134 invoices, the probability of obtaining a sample mean of 50.6% or higher is:

P(x̄1 ≥ 50.6) = P(z ≥ (50.6 − 48.901)/(13.8291/√134)) = P(z ≥ 1.42) = .5 − .4222 = .0778

For the sample of 119 invoices, the probability of obtaining a sample mean of 51.0% or higher is:

P(x̄2 ≥ 51.0) = P(z ≥ (51.0 − 48.901)/(13.8291/√119)) = P(z ≥ 1.66) = .5 − .4515 = .0485

The probability of observing one sample mean of 50.6% or higher AND a second sample mean of 51.0% or higher is:

P(x̄1 ≥ 50.6, x̄2 ≥ 51.0) = .0778(.0485) = .0038

Again, since the probability of obtaining two sample means of 50.6% or higher and 51.0% or higher from this population is extremely small (.0038), we would conclude that there is evidence of fraud.
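The three z-scores and tail probabilities in this case can be reproduced with a short Python sketch. It uses `math.erf` for the normal CDF rather than Table IV, so the tail areas differ slightly from the table-based values because z is not rounded to two decimals first. The function names are illustrative assumptions.

```python
import math

MU, SIGMA = 48.901, 13.8291  # population mean and standard deviation (%)

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_tail_for_sample_mean(xbar, n, mu=MU, sigma=SIGMA):
    """Return (z, P(sample mean >= xbar)) under the Central Limit Theorem."""
    z = (xbar - mu) / (sigma / math.sqrt(n))
    return z, 1.0 - normal_cdf(z)

z_all, p_all = upper_tail_for_sample_mean(50.8, 253)
z_1, p_1 = upper_tail_for_sample_mean(50.6, 134)
z_2, p_2 = upper_tail_for_sample_mean(51.0, 119)
# Joint probability, treating the two samples as independent.
p_joint = p_1 * p_2
print(round(z_all, 2), round(z_1, 2), round(z_2, 2), round(p_joint, 4))
```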




5.2

5.4

a.

zα/2 = 1.96, using Table IV, Appendix B, P(0 ≤ z ≤ 1.96) = .4750. Thus, α/2 = .5000 − .4750 = .025, α = 2(.025) = .05, and 1 - α = 1 - .05 = .95. The confidence level is 100% × .95 = 95%.

b.

zα/2 = 1.645, using Table IV, Appendix B, P(0 ≤ z ≤ 1.645) = .45. Thus, α/2 = .50 − .45 = .05, α = 2(.05) = .1, and 1 − α = 1 − .1 = .90. The confidence level is 100% × .90 = 90%.

c.


d.


e.

zα/2 = .99, using Table IV, Appendix B, P(0 ≤ z ≤ .99) = .3389. Thus, α/2 = .5000 − .3389 = .1611, α = 2(.1611) = .3222, and 1 − α = 1 − .3222 = .6778. The confidence level is 100% × .6778 = 67.78%.

a.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:

x̄ ± z.025 s/√n ⇒ 25.9 ± 1.96(2.7/√90) ⇒ 25.9 ± .56 ⇒ (25.34, 26.46)

b.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The confidence interval is:

x̄ ± z.05 s/√n ⇒ 25.9 ± 1.645(2.7/√90) ⇒ 25.9 ± .47 ⇒ (25.43, 26.37)

c.

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The confidence interval is:

x̄ ± z.005 s/√n ⇒ 25.9 ± 2.58(2.7/√90) ⇒ 25.9 ± .73 ⇒ (25.17, 26.63)

5.6

Chapter 5

If we were to repeatedly draw samples from the population and form the interval x̄ ± 1.96σ_x̄ each time, approximately 95% of the intervals would contain μ. We have no way of knowing whether our interval estimate is one of the 95% that contain μ or one of the 5% that do not.
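The repeated-sampling interpretation can be illustrated with a small simulation. This Python sketch is not part of the exercise: it assumes a normal population with μ = 0 and σ = 1, samples of size 50, 2,000 repetitions, and a fixed seed, all chosen only for illustration.

```python
import math
import random

def z_interval(xbar, sigma, n, z=1.96):
    """Large-sample interval: xbar +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def coverage(mu=0.0, sigma=1.0, n=50, reps=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        low, high = z_interval(xbar, sigma, n)
        hits += low <= mu <= high
    return hits / reps

rate = coverage()
print(rate)  # close to .95, but not exactly .95 in any one simulation
```

The same `z_interval` helper, called as `z_interval(25.9, 2.7, 90, 1.96)`, reproduces the half-width used in Exercise 5.4a.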




5.8

a.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:

x̄ ± z.025 s/√n ⇒ 33.9 ± 1.96(3.3/√100) ⇒ 33.9 ± .647 ⇒ (33.253, 34.547)

b.

x̄ ± z.025 s/√n ⇒ 33.9 ± 1.96(3.3/√400) ⇒ 33.9 ± .323 ⇒ (33.577, 34.223)

c.

For part a, the width of the interval is 2(.647) = 1.294. For part b, the width of the interval is 2(.323) = .646. When the sample size is quadrupled, the width of the confidence interval is halved.

5.10

a.

A point estimate for the average number of latex gloves used per week by all healthcare workers with latex allergy is x̄ = 19.3.

b.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:

x̄ ± zα/2 s/√n ⇒ 19.3 ± 1.96(11.9/√46) ⇒ 19.3 ± 3.44 ⇒ (15.86, 22.74)

c.

We are 95% confident that the true average number of latex gloves used per week by all healthcare workers with a latex allergy is between 15.86 and 22.74.

d.

The conditions required for the interval to be valid are:

a. The sample selected was randomly selected from the target population.
b. The sample size is sufficiently large, i.e. n > 30.

5.12

a.

The point estimate for the mean charitable commitment of tax-exempt organizations is x = 74.9667.

b.

From the printout, the 95% confidence interval is (68.2371, 81.6962).

c.

The probability of estimating the true mean charitable commitment with a single number is 0. By estimating the true mean charitable commitment with an interval, we can be pretty confident that the true mean is in the interval.



5.14

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: r

Variable  N   Mean    Median  TrMean  StDev   SE Mean
r         34  0.4224  0.4300  0.4310  0.1998  0.0343

Variable  Minimum  Maximum  Q1      Q3
r         -0.0800  0.7400   0.2925  0.6000

x̄ ± z.025 s/√n ⇒ .4224 ± 1.96(.1998/√34) ⇒ .4224 ± .0672 ⇒ (.3552, .4895)

We are 95% confident that the mean value of r is between .3552 and .4895.

5.16

a.


Descriptive Statistics: Rate

Variable  N   Mean   Median  TrMean  StDev  SE Mean
Rate      30  79.73  80.00   80.15   5.96   1.09

Variable  Minimum  Maximum  Q1     Q3
Rate      60.00    90.00    76.75  84.00

x̄ ± z.05 s/√n ⇒ 79.73 ± 1.645(5.96/√30) ⇒ 79.73 ± 1.79 ⇒ (77.94, 81.52)

b.

We are 90% confident that the mean participation rate for all companies that have 401(k) plans is between 77.94% and 81.52%.

c.

We must assume that the sample size (n = 30) is sufficiently large so that the Central Limit Theorem applies.

d.

Yes. Since 71% is not included in the 90% confidence interval, it can be concluded that this company's participation rate is lower than the population mean.

e.

The center of the confidence interval is x̄. If 60% is changed to 80%, the value of x̄ will increase, thus indicating that the center point will be larger. The value of s² will decrease if 60% is replaced by 80%, thus causing the width of the interval to decrease.




5.18

a.

Using MINITAB, I generated 30 random numbers using the uniform distribution from 1 to 308. The random numbers were: 9, 15, 19, 36, 46, 47, 63, 73, 90, 92, 108, 112, 117, 127, 144, 145, 150, 151, 172, 178, 218, 229, 230, 241, 242, 246, 252, 267, 274, 282 I numbered the 308 observations in the order that they appear in the file. Using the random numbers generated above, I selected the 9th, 15th, 19th, etc. observations for the sample. The selected sample is: .31, .34, .34, .50, .52, .53, .64, .72, .70, .70, .75, .78, 1.00, 1.00, 1.03, 1.04, 1.07, 1.10, .21, .24, .58, 1.01, .50, .57, .58, .61, .70, .81, .85, 1.00

b.

Using MINITAB, the descriptive statistics for the sample of 30 observations are:

Descriptive Statistics: carats-samp

Variable  N   Mean    Median  TrMean  StDev   SE Mean
carats-s  30  0.6910  0.7000  0.6965  0.2620  0.0478

Variable  Minimum  Maximum  Q1      Q3
carats-s  0.2100   1.1000   0.5150  1.0000

From above, x̄ = .6910 and s = .2620.

c.

x̄ ± z.025 s/√n ⇒ .691 ± 1.96(.262/√30) ⇒ .691 ± .094 ⇒ (.597, .785)

d.

We are 95% confident that the mean number of carats is between .597 and .785.

e.

From Exercise 2.47, we computed the “population” mean to be .631. This mean does fall in the 95% confidence interval we computed in part c.

5.20

x̄ = 11,298/5,000 = 2.26

For confidence coefficient .95, α = .05 and α/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:

x̄ ± zα/2 s/√n ⇒ 2.26 ± 1.96(1.5/√5000) ⇒ 2.26 ± .04 ⇒ (2.22, 2.30)

We are 95% confident the mean number of roaches produced per roach per week is between 2.22 and 2.30.


5.22

a.

If x is normally distributed, the sampling distribution of x̄ is normal, regardless of the sample size.

b.

If nothing is known about the distribution of x, the sampling distribution of x̄ is approximately normal if n is sufficiently large. If n is not large, the distribution of x̄ is unknown if the distribution of x is not known.

5.24

a.

P(t ≥ t0) = .025 where df = 11 t0 = 2.201

b.

P(t ≥ t0) = .01 where df = 9 t0 = 2.821

c.

P(t ≤ t0) = .005 where df = 6 Because of symmetry, the statement can be rewritten P(t ≥ −t0) = .005 where df = 6 t0 = −3.707

d.

P(t ≤ t0) = .05 where df = 18 t0 = −1.734

5.26

For this sample,

x̄ = ∑x/n = 1567/16 = 97.9375

s² = (∑x² − (∑x)²/n)/(n − 1) = (155,867 − 1567²/16)/(16 − 1) = 159.9292

s = √s² = 12.6463

a.

For confidence coefficient, .80, α = 1 − .80 = .20 and α/2 = .20/2 = .10. From Table VI, Appendix B, with df = n − 1 = 16 − 1 = 15, t.10 = 1.341. The 80% confidence interval for μ is: s 12.6463 x ± t.10 ⇒ 97.94 ± 1.341 ⇒ 97.94 ± 4.240 ⇒ (93.700, 102.180) n 16

b.

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 1 = 16 − 1 = 15, t.025 = 2.131. The 95% confidence interval for μ is:

x̄ ± t.025 s/√n ⇒ 97.94 ± 2.131(12.6463/√16) ⇒ 97.94 ± 6.737 ⇒ (91.203, 104.677)

The 95% confidence interval for μ is wider than the 80% confidence interval for μ found in part a.




c.

For part a: We are 80% confident that the true population mean lies in the interval 93.700 to 102.180. For part b: We are 95% confident that the true population mean lies in the interval 91.203 to 104.677. The 95% confidence interval is wider than the 80% confidence interval because the more confident you want to be that μ lies in an interval, the wider the range of possible values.
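The summary statistics and both intervals for this exercise can be recomputed from ∑x = 1567, ∑x² = 155,867, and n = 16. The Python sketch below takes the critical values t.10 = 1.341 and t.025 = 2.131 (df = 15) as given from Table VI instead of computing them; the function name is hypothetical. Because it does not round x̄ to 97.94 first, the endpoints differ from the text in the third decimal.

```python
import math

def t_interval_from_sums(sum_x, sum_x2, n, t_crit):
    """Return (xbar, s, (low, high)) for a small-sample t interval,
    using the shortcut formula for the sample variance."""
    xbar = sum_x / n
    s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    half_width = t_crit * math.sqrt(s2 / n)
    return xbar, math.sqrt(s2), (xbar - half_width, xbar + half_width)

xbar, s, ci80 = t_interval_from_sums(1567, 155867, 16, 1.341)
_, _, ci95 = t_interval_from_sums(1567, 155867, 16, 2.131)
print(xbar, round(s, 4), ci80, ci95)
```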

5.28

a.


Descriptive Statistics: MTBE

Variable  N   N*  Mean  SE Mean  StDev  Minimum  Q1    Median  Q3     Maximum
MTBE      12  0   97.2  32.8     113.8  8.00     12.0  50.5    146.0  367.0

A point estimate for the true mean MTBE level for all well sites located near the New Jersey gasoline service station is x̄ = 97.2.

b.

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − 1 = 12 − 1 = 11, t.005 = 3.106. The 99% confidence interval is:

x̄ ± t.005 s/√n ⇒ 97.2 ± 3.106(113.8/√12) ⇒ 97.2 ± 102.04 ⇒ (−4.84, 199.24)

We are 99% confident that the true mean MTBE level for all well sites located near the New Jersey gasoline service station is between −4.84 and 199.24.

c.

We must assume that the data were sampled from a normal distribution. We will use the four methods to check for normality. First, we will look at a histogram of the data. Using MINITAB, the histogram of the data is:

[Histogram of MTBE: Frequency versus MTBE]


From the histogram, the data do not appear to be mound-shaped. This indicates that the data may not be normal.

Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using the summary statistics from MINITAB:

x̄ ± s ⇒ 97.2 ± 113.8 ⇒ (−16.6, 211.0). 10 of the 12 values fall in this interval. The proportion is .83. This is not very close to the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 97.2 ± 2(113.8) ⇒ 97.2 ± 227.6 ⇒ (−130.4, 324.8). 11 of the 12 values fall in this interval. The proportion is .92. This is somewhat smaller than the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 97.2 ± 3(113.8) ⇒ 97.2 ± 341.4 ⇒ (−244.2, 438.6). 12 of the 12 values fall in this interval. The proportion is 1.00. This is exactly the 1.00 we would expect if the data were normal.

From this method, it appears that the data may not be normal.

Next, we look at the ratio of the IQR to s. IQR = QU − QL = 146.0 − 12.0 = 134.0.

IQR/s = 134.0/113.8 = 1.18. This is somewhat smaller than the 1.3 we would expect if the data were normal. This method indicates the data may not be normal.

Finally, using MINITAB, the normal probability plot is:

[Probability Plot of MTBE (Normal, 95% CI): Percent versus MTBE. Mean = 97.17, StDev = 113.8, N = 12, AD = 0.929, P-Value = 0.012]

Since the data do not form a fairly straight line, the data may not be normal. From above, all the methods indicate the data may not be normal. It appears that the data probably are not normal.




5.30

We must assume that the distribution of the LOS's for all patients is normal.

a.

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, with df = n − 1 = 20 − 1 = 19, t.05 = 1.729. The 90% confidence interval is:

x̄ ± t.05 s/√n ⇒ 3.8 ± 1.729(1.2/√20) ⇒ 3.8 ± .464 ⇒ (3.336, 4.264)

b.

We are 90% confident that the mean LOS is between 3.336 and 4.264 days.

c.

“90% confidence” means that if repeated samples of size n are selected from a population and 90% confidence intervals are constructed, 90% of all intervals thus constructed will contain the population mean.

5.32

a.

The 95% confidence interval for the mean surface roughness of coated interior pipe is (1.63580, 2.12620).

b.

No. Since 2.5 does not fall in the 95% confidence interval, it would be very unlikely that the average surface roughness would be as high as 2.5 micrometers.

5.34

a.

The population is the set of all DOT permanent count stations in the state of Florida.

b.

Yes. There are several types of routes included in the sample. There are 3 recreational areas, 7 rural areas, 5 small cities, and 5 urban areas.

c.


Descriptive Statistics: 30th hour, 100th hour

Variable   N   Mean  Median  TrMean  StDev  SE Mean
30th hou   20  2206  2064    2165    1224   274
100th ho   20  2096  1999    2048    1203   269

Variable   Minimum  Maximum  Q1    Q3
30th hou   252      4905     1429  3068
100th ho   229      4815     1318  2877

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 1 = 20 − 1 = 19, t.025 = 2.093. The 95% confidence interval is:

x̄ ± t.025 s/√n ⇒ 2,206 ± 2.093(1,224/√20) ⇒ 2,206 ± 572.84 ⇒ (1,633.16, 2,778.84)

We are 95% confident that the mean traffic count at the 30th highest hour is between 1,633.16 and 2,778.84.

d.


We must assume that the distribution of the traffic counts at the 30th highest hour is normal. From the stem-and-leaf display, the data look fairly mound-shaped. Thus, the assumption of normality is probably met.



e.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 1 = 20 − 1 = 19, t.025 = 2.093. The 95% confidence interval is:

x̄ ± t.025 s/√n ⇒ 2,096 ± 2.093(1,203/√20) ⇒ 2,096 ± 563.01 ⇒ (1,532.99, 2,659.01)

We are 95% confident that the mean traffic count at the 100th highest hour is between 1,532.99 and 2,659.01. We must assume that the distribution of the traffic counts at the 100th highest hour is normal. From the stem-and-leaf display, the data look fairly mound-shaped. Thus, the assumption of normality is probably met.

f.

If μ = 2,700, it is very possible that it is the mean count for the 30th highest hour. It falls in the 95% confidence interval for the mean count for the 30th highest hour. It is not very likely that the mean count for the 100th highest hour is 2,700. It does not fall in the 95% confidence interval for the mean count for the 100th highest hour. (See parts c and e above.)

5.36

By the Central Limit Theorem, the sampling distribution of p̂ is approximately normal with mean μ_p̂ = p and standard deviation σ_p̂ = √(pq/n).

5.38

a.

The sample size is large enough if the interval p̂ ± 3σ_p̂ does not include 0 or 1.

p̂ ± 3σ_p̂ ⇒ p̂ ± 3√(pq/n) ⇒ p̂ ± 3√(p̂q̂/n) ⇒ .88 ± 3√(.88(1 − .88)/121) ⇒ .88 ± .089 ⇒ (.791, .969)

Since the interval lies within the interval (0, 1), the normal approximation will be adequate.

b.

For confidence coefficient .90, α = .10 and α/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:

p̂ ± z.05 √(pq/n) ⇒ p̂ ± 1.645√(p̂q̂/n) ⇒ .88 ± 1.645√(.88(.12)/121) ⇒ .88 ± .049 ⇒ (.831, .929)

c.

We must assume that the sample is a random sample from the population of interest.




5.40

a.

Of the 50 observations, 15 like the product ⇒ p̂ = 15/50 = .30.

To see if the sample size is sufficiently large:

p̂ ± 3σ_p̂ ≈ p̂ ± 3√(p̂q̂/n) ⇒ .3 ± 3√(.3(.7)/50) ⇒ .3 ± .194 ⇒ (.106, .494)

Since this interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.

For the confidence coefficient .80, α = .20 and α/2 = .10. From Table IV, Appendix B, z.10 = 1.28. The confidence interval is:

p̂ ± z.10 √(p̂q̂/n) ⇒ .3 ± 1.28√(.3(.7)/50) ⇒ .3 ± .083 ⇒ (.217, .383)

b.

We are 80% confident the proportion of all consumers who like the new snack food is between .217 and .383.

5.42

a.

The point estimate of p is pˆ = .11 .

b.

To see if the sample size is sufficiently large:

p̂ ± 3σ_p̂ ≈ p̂ ± 3√(p̂q̂/n) ⇒ .11 ± 3√(.11(.89)/150) ⇒ .11 ± .077 ⇒ (.033, .187)

Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:

p̂ ± z.025 √(p̂q̂/n) ⇒ .11 ± 1.96√(.11(.89)/150) ⇒ .11 ± .05 ⇒ (.06, .16)

c.

We are 95% confident that the true proportion of MSDS that are satisfactorily completed is between .06 and .16.

5.44

a.

The point estimate of p is p̂ = x/n = 16/308 = .052.

To see if the sample size is sufficiently large:

p̂ ± 3σ_p̂ ≈ p̂ ± 3√(p̂q̂/n) ⇒ .052 ± 3√(.052(.948)/308) ⇒ .052 ± .038 ⇒ (.014, .090)

Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.

p̂ ± z.005 √(p̂q̂/n) ⇒ .052 ± 2.58√(.052(.948)/308) ⇒ .052 ± .033 ⇒ (.019, .085)

We are 99% confident that the true proportion of diamonds for sale that are classified as “D” color is between .019 and .085.

b.

The point estimate of p is p̂ = x/n = 81/308 = .263.

To see if the sample size is sufficiently large:

p̂ ± 3σ_p̂ ≈ p̂ ± 3√(p̂q̂/n) ⇒ .263 ± 3√(.263(.737)/308) ⇒ .263 ± .075 ⇒ (.188, .338)

Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The confidence interval is:

p̂ ± z.005 √(p̂q̂/n) ⇒ .263 ± 2.58√(.263(.737)/308) ⇒ .263 ± .065 ⇒ (.198, .328)

We are 99% confident that the true proportion of diamonds for sale that are classified as “VS1” clarity is between .198 and .328.

5.46

a.

The population is all senior human resource executives at U.S. companies.

b.

The population parameter of interest is p, the proportion of all senior human resource executives at U.S. companies who believe that their hiring managers are interviewing too many people to find qualified candidates for the job.

c.


p̂ = x/n = 211/502 = .42. To see if the sample size is sufficiently large:

p̂ ± 3σ_p̂ ≈ p̂ ± 3√(p̂q̂/n) ⇒ .42 ± 3√(.42(.58)/502) ⇒ .42 ± .066 ⇒ (.354, .486)

Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.




d.


p̂ ± z.01 √(p̂q̂/n) ⇒ .42 ± 2.33√(.42(.58)/502) ⇒ .42 ± .051 ⇒ (.369, .471)

We are 98% confident that the true proportion of all senior human resource executives at U.S. companies who believe that their hiring managers are interviewing too many people to find qualified candidates for the job is between .369 and .471.
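The sample-size check and the interval for a proportion follow the same two steps in every exercise of this section, so a small Python sketch may help when verifying the arithmetic. The z value 2.33 is taken from the solution above, and the function names are assumptions made for illustration.

```python
import math

def proportion_interval(p_hat, n, z):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def sample_size_ok(p_hat, n):
    """True when p_hat +/- 3 standard errors lies wholly inside (0, 1)."""
    low, high = proportion_interval(p_hat, n, 3.0)
    return 0.0 < low and high < 1.0

print(sample_size_ok(0.42, 502))            # check from part c
print(proportion_interval(0.42, 502, 2.33)) # interval from part d
```

For comparison, `sample_size_ok(0.033, 100)` is False, matching the conclusion of Exercise 4.180f that the normal approximation is not appropriate there.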

e.

A 90% confidence interval would be narrower. If the interval was narrower, it would contain fewer values, thus, we would be less confident.

5.48

a.

The point estimate of p is p̂ = x/n = 35/55 = .636.

b.

We must check to see if the sample size is sufficiently large:

p̂ ± 3σ_p̂ ≈ p̂ ± 3√(p̂q̂/n) ⇒ .636 ± 3√(.636(.364)/55) ⇒ .636 ± .195 ⇒ (.441, .831)

Since the interval is wholly contained in the interval (0, 1), we may assume that the normal approximation is reasonable.

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.575. The confidence interval is:

p̂ ± z.005 √(p̂q̂/n) ⇒ .636 ± 2.575√(.636(.364)/55) ⇒ .636 ± .167 ⇒ (.469, .803)

c.

We are 99% confident that the true proportion of fatal accidents involving children is between .469 and .803.

d.

The sample proportion of children killed by air bags who were not wearing seat belts or were improperly restrained is 24/35 = .686. This is a rather large proportion. Whether a child is killed by an air bag could be related to whether or not he/she was properly restrained. Thus, the number of children killed by air bags could possibly be reduced if the child were properly restrained.

5.50


x 36 = = .434 . n 83

To see if the sample size is sufficiently large:

\[\hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .434 \pm 3\sqrt{\frac{.434(.566)}{83}} \Rightarrow .434 \pm .163 \Rightarrow (.271, .597)\]

Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, \(z_{.025}\) = 1.96. The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .434 \pm 1.96\sqrt{\frac{.434(.566)}{83}} \Rightarrow .434 \pm .107 \Rightarrow (.327, .541)\]

We are 95% confident that the true proportion of healthcare workers with latex allergies who actually suspect that they have the allergy is between .327 and .541.

5.52

To compute the necessary sample size, use

\[n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2}\]

where α = 1 − .95 = .05 and α/2 = .05/2 = .025.

From Table IV, Appendix B, \(z_{.025}\) = 1.96. Thus,

\[n = \frac{(1.96)^2(7.2)}{.3^2} = 307.328 \approx 308\]

You would need to take 308 samples.

5.54

a.

To compute the needed sample size, use:

\[n = \frac{(z_{\alpha/2})^2 pq}{SE^2}\]

where \(z_{.025}\) = 1.96 from Table IV, Appendix B.

Thus,

\[n = \frac{(1.96)^2(.2)(.8)}{.08^2} = 96.04 \approx 97\]

You would need to take a sample of size 97.

b.

To compute the needed sample size, use:

\[n = \frac{(z_{\alpha/2})^2 pq}{SE^2} = \frac{(1.96)^2(.5)(.5)}{.08^2} = 150.0625 \approx 151\]

You would need to take a sample of size 151.

5.56

a.

For a width of 5 units, SE = 5/2 = 2.5. To compute the needed sample size, use

\[n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2}\]

where α = 1 − .95 = .05 and α/2 = .025.

From Table IV, Appendix B, \(z_{.025}\) = 1.96. Thus,

\[n = \frac{(1.96)^2(14)^2}{2.5^2} = 120.47 \approx 121\]

You would need to take 121 samples at a cost of 121($10) = $1210. Yes, you do have sufficient funds.

b.

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, \(z_{.05}\) = 1.645.

\[n = \frac{(1.645)^2(14)^2}{2.5^2} = 84.86 \approx 85\]

You would need to take 85 samples at a cost of 85($10) = $850. You still have sufficient funds but have an increased risk of error.

5.58

The sample size will be larger than necessary for any p other than .5.
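The claim can be checked numerically. The sketch below is only an illustration: the function name `sample_size_for_proportion` is hypothetical, and the inputs (z = 1.96, SE = .08) are borrowed from Exercise 5.54. Because the product pq is largest at p = .5, the required n is largest there.

```python
import math

def sample_size_for_proportion(z, p, se):
    """Smallest whole n satisfying n >= z^2 * p * q / SE^2 (hypothetical helper)."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / se ** 2)

# Assumed inputs taken from Exercise 5.54: z.025 = 1.96 and SE = .08.
for p in (0.2, 0.5, 0.8):
    print(p, sample_size_for_proportion(1.96, p, 0.08))
```

With these inputs, p = .5 requires the largest sample, so planning with p = .5 is conservative for every other value of p.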

5.60

a.

The confidence level desired by the researchers is 90%.

b.

The sampling error desired by the researchers is SE = .05.

c.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, \(z_{.05}\) = 1.645. From the problem, we will use \(\hat{p} = x/n = 64/106 = .604\) to estimate p. Thus,

\[n = \frac{(z_{\alpha/2})^2 pq}{SE^2} = \frac{1.645^2(.604)(.396)}{.05^2} = 258.9 \approx 259\]

Thus, we would need a sample of size 259.

5.62

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, \(z_{.025}\) = 1.96. For this study,

\[n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} = \frac{1.96^2(5)^2}{1^2} = 96.04 \approx 97\]

The sample size needed is 97.
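The sample-size formula for a mean can be sketched in a few lines. This is an illustration only: the helper name `sample_size_for_mean` is hypothetical, and the $10 cost per observation is the figure quoted in Exercise 5.56.

```python
import math

def sample_size_for_mean(z, sigma, se):
    """Round n = (z * sigma / SE)^2 up to the next whole observation (hypothetical helper)."""
    return math.ceil((z * sigma / se) ** 2)

# Exercise 5.62: z.025 = 1.96, sigma = 5, SE = 1
n_562 = sample_size_for_mean(1.96, 5, 1)

# Exercise 5.56: sigma = 14, SE = 2.5, at 95% and 90% confidence
n_95 = sample_size_for_mean(1.96, 14, 2.5)
n_90 = sample_size_for_mean(1.645, 14, 2.5)
print(n_562, n_95, n_90, 10 * n_95, 10 * n_90)
```

Rounding up rather than to the nearest integer keeps the sampling error at or below the stated bound.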

5.64

For confidence coefficient .90, α = .10 and α/2 = .05. From Table IV, Appendix B, \(z_{.05}\) = 1.645. For a width of .06, SE = .06/2 = .03.

The sample size is

\[n = \frac{(z_{\alpha/2})^2 pq}{SE^2} = \frac{(1.645)^2(.17)(.83)}{.03^2} = 424.2 \approx 425\]

You would need to take n = 425 samples.

5.66

To compute the necessary sample size, use

\[n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2}\]

where α = 1 − .90 = .10 and α/2 = .05.

From Table IV, Appendix B, \(z_{.05}\) = 1.645. Thus,

\[n = \frac{(1.645)^2(10)^2}{1^2} = 270.6 \approx 271\]

5.68

a.

To compute the needed sample size, use

\[n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2}\]

where α = 1 − .90 = .10 and α/2 = .05.

From Table IV, Appendix B, \(z_{.05}\) = 1.645. Thus,

\[n = \frac{(1.645)^2(2)^2}{.1^2} = 1{,}082.41 \approx 1{,}083\]

b.

As the sample size decreases, the width of the confidence interval increases. Therefore, if we sample 100 parts instead of 1,083, the confidence interval would be wider.

c.

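To compute the maximum confidence level that could be attained meeting the management's specifications, solve for \(z_{\alpha/2}\) with n = 100:

\[n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} \Rightarrow 100 = \frac{(z_{\alpha/2})^2(2)^2}{.1^2} \Rightarrow (z_{\alpha/2})^2 = \frac{100(.01)}{4} = .25 \Rightarrow z_{\alpha/2} = .5\]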

Using Table IV, Appendix B, P(0 ≤ z ≤ .5) = .1915. Thus, α/2 = .5000 − .1915 = .3085,

α = 2(.3085) = .617, and 1 − α = 1 − .617 = .383. The maximum confidence level would be 38.3%.
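The same back-calculation can be sketched with Python's standard library in place of Table IV. The function name `attainable_confidence` is hypothetical; the normal cdf comes from `statistics.NormalDist`, so the result may differ from the table-based value in the third decimal place.

```python
import math
from statistics import NormalDist

def attainable_confidence(n, sigma, se):
    """Confidence level 1 - alpha implied by n = (z * sigma / SE)^2 (hypothetical helper)."""
    z = se * math.sqrt(n) / sigma           # solve the sample-size formula for z
    return 2 * NormalDist().cdf(z) - 1      # area between -z and +z

# Exercise 5.68c: n = 100, sigma = 2, SE = .1
print(round(attainable_confidence(100, 2, 0.1), 3))
```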


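5.70

\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N}}\]

a.

\[\sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{2500-1000}{2500}} = 4.90\]

b.

\[\sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{5000-1000}{5000}} = 5.66\]

c.

\[\sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{10{,}000-1000}{10{,}000}} = 6.00\]

d.

\[\sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{100{,}000-1000}{100{,}000}} = 6.293\]

5.72

a.

For n = 64, with the finite population correction factor:

\[\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = \frac{24}{\sqrt{64}}\sqrt{\frac{5000-64}{5000}} = 3\sqrt{.9872} = 2.9807\]

without the finite population correction factor:

\[\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{24}{\sqrt{64}} = 3\]

\(\hat{\sigma}_{\bar{x}}\) without the finite population correction factor is slightly larger.

b.

For n = 400, with the finite population correction factor:

\[\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = \frac{24}{\sqrt{400}}\sqrt{\frac{5000-400}{5000}} = 1.2\sqrt{.92} = 1.1510\]

without the finite population correction factor:

\[\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{24}{\sqrt{400}} = 1.2\]

c.

In part a, n is smaller relative to N than in part b. Therefore, the finite population correction factor did not make as much difference in the answer in part a as in part b.

5.74

An approximate 95% confidence interval for μ is:

\[\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 422 \pm 2\frac{14}{\sqrt{40}}\sqrt{\frac{375-40}{375}} \Rightarrow 422 \pm 4.184 \Rightarrow (417.816, 426.184)\]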
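The finite population correction used in Exercises 5.70 through 5.74 can be sketched as follows. The helper names `fpc_standard_error` and `fpc_interval` are hypothetical, and the multiplier of 2 is the approximate 95% value used in these solutions.

```python
import math

def fpc_standard_error(s, n, N):
    """Standard error of the sample mean with the finite population correction."""
    return (s / math.sqrt(n)) * math.sqrt((N - n) / N)

def fpc_interval(xbar, s, n, N, multiplier=2):
    """Approximate interval xbar +/- multiplier * corrected standard error."""
    half_width = multiplier * fpc_standard_error(s, n, N)
    return xbar - half_width, xbar + half_width

# Exercise 5.74: xbar = 422, s = 14, n = 40, N = 375
low, high = fpc_interval(422, 14, 40, 375)
print(round(low, 3), round(high, 3))
```

The correction shrinks the standard error most when n is a large fraction of N, which is the pattern seen in Exercise 5.72.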

5.76

a.

For N = 2,193, n = 223, \(\bar{x}\) = 116,754, and s = 39,185, the 95% confidence interval is:

\[\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 116{,}754 \pm 2\frac{39{,}185}{\sqrt{223}}\sqrt{\frac{2{,}193-223}{2{,}193}} \Rightarrow 116{,}754 \pm 4{,}974.06 \Rightarrow (111{,}779.94,\ 121{,}728.06)\]

b.

We are 95% confident that the mean salary of all vice presidents who subscribe to Quality Progress is between $111,779.94 and $121,728.06.

5.78

a.

The population of interest is the set of all households headed by women that have incomes of $25,000 or more in the database.

b.

Yes. Since n/N = 1,333/25,000 = .053 exceeds .05, we need to apply the finite population correction.

c.

The standard error for \(\hat{p}\) should be:

\[\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}\left(\frac{N-n}{N}\right)} = \sqrt{\frac{.708(1-.708)}{1333}\left(\frac{25{,}000-1{,}333}{25{,}000}\right)} = .012\]

d.

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, \(z_{.05}\) = 1.645. The approximate 90% confidence interval is:

\[\hat{p} \pm 1.645\hat{\sigma}_{\hat{p}} \Rightarrow .708 \pm 1.645(.012) \Rightarrow (.688, .728)\]

5.80

For N = 1,500, n = 35, \(\bar{x}\) = 1, and s = 124, the 95% confidence interval is:

\[\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\left(\frac{s}{\sqrt{n}}\right)\sqrt{\frac{N-n}{N}} \Rightarrow 1 \pm 2\left(\frac{124}{\sqrt{35}}\right)\sqrt{\frac{1{,}500-35}{1{,}500}} \Rightarrow 1 \pm 41.43 \Rightarrow (-40.43, 42.43)\]

We are 95% confident that the mean error of the new system is between -$40.43 and $42.43.

5.82

a.

For a small sample from a normal distribution with unknown standard deviation, we use the t statistic. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 1 = 23 − 1 = 22, t.025 = 2.074.

b.

For a large sample from a distribution with an unknown standard deviation, we can estimate the population standard deviation with s and use the z statistic. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.

c.

For a small sample from a normal distribution with known standard deviation, we use the z statistic. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.




5.84

d.

For a large sample from a distribution about which nothing is known, we can estimate the population standard deviation with s and use the z statistic. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.

e.

For a small sample from a distribution about which nothing is known, we can use neither z nor t.
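The five cases in Exercise 5.82 amount to a small decision rule, sketched below. The function name `choose_statistic` is hypothetical, and the cutoff of 30 for a "large" sample is an assumed rule of thumb. Only part a states its sample size (n = 23); the other sample sizes in the calls are made up for illustration.

```python
def choose_statistic(n, population_normal, sigma_known, large_n=30):
    """Return 'z', 't', or 'neither' following the cases of Exercise 5.82."""
    if n >= large_n:
        return "z"          # large sample: s may stand in for sigma
    if population_normal:
        return "z" if sigma_known else "t"
    return "neither"        # small sample, population shape unknown

print(choose_statistic(23, True, False))    # part a
print(choose_statistic(100, False, False))  # parts b and d
print(choose_statistic(20, True, True))     # part c
print(choose_statistic(20, False, False))   # part e
```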

a.

Of the 400 observations, 227 had the characteristic ⇒ \(\hat{p}\) = 227/400 = .5675.

To see if the sample size is sufficiently large:

\[\hat{p} \pm 3\sigma_{\hat{p}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .5675 \pm 3\sqrt{\frac{.5675(.4325)}{400}} \Rightarrow .5675 \pm .0743 \Rightarrow (.4932, .6418)\]

Since the interval lies within the interval (0, 1), the normal approximation will be adequate.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, \(z_{.025}\) = 1.96. The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .5675 \pm 1.96\sqrt{\frac{.5675(.4325)}{400}} \Rightarrow .5675 \pm .0486 \Rightarrow (.5189, .6161)\]

b.

For this problem, SE = .02. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, \(z_{.025}\) = 1.96. Thus,

\[n = \frac{(z_{\alpha/2})^2 pq}{SE^2} = \frac{(1.96)^2(.5675)(.4325)}{.02^2} = 2{,}357.2 \approx 2{,}358\]

Thus, the required sample size is 2,358.

5.86

a.

The finite population correction factor is: ( N − n) = N

b.


c.

(100 − 20) = .8944 100



(2,000 − 50) = .9874 2,000

(1,500 − 300) = .8944 1,500



5.88

a.

From the printout, the 90% confidence interval is (4.277, 6.184). We are 90% confident that the mean number of offices operated by all Florida law firms is between 4.277 and 6.184.

b.

From the histogram, it appears that the data probably are not from a normal distribution. The data appear to be skewed to the right.

c.

The interval constructed in part a depends on the assumption that the data came from a normal distribution. From part b, it appears that this assumption is not valid. Thus, the confidence interval is probably not valid.

5.90

a.

The point estimate of p is \(\hat{p} = x/n = 67/105 = .638\).

b.

To see if the sample size is sufficiently large:

\[\hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .638 \pm 3\sqrt{\frac{.638(.362)}{105}} \Rightarrow .638 \pm .141 \Rightarrow (.497, .779)\]

Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.

The 95% confidence interval, using \(z_{.025}\) = 1.96, is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .638 \pm 1.96\sqrt{\frac{.638(.362)}{105}} \Rightarrow .638 \pm .092 \Rightarrow (.546, .730)\]

c.

We are 95% confident that the true proportion of on-the-job homicide cases that occurred at night is between .546 and .730.

5.92

a.


Using MINITAB, the descriptive statistics are:

Descriptive Statistics: NJValues

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
NJValues  20   0  440.4     67.8  303.0    159.0  212.3   297.5  660.5   1190.0

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n – 1 = 20 – 1 = 19, \(t_{.025}\) = 2.093. The 95% confidence interval is:

\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 440.4 \pm 2.093\frac{303.0}{\sqrt{20}} \Rightarrow 440.4 \pm 141.81 \Rightarrow (298.59, 582.21)\]

b.

We are 95% confident that the true mean sales price is between $298,590 and $582,210.




c.

"95% confidence" means that in repeated sampling, 95% of all confidence intervals constructed will contain the true mean sales price and 5% will not.

d.

Using MINITAB, a histogram of the data is:

[Figure: Histogram of NJValues. Vertical axis: Frequency (0 to 9); horizontal axis: NJValues (200 to 1200).]

Since the sample size is small (n = 20), we must assume that the distribution of sales prices is normal. From the histogram, it does not appear that the data come from a normal distribution. Thus, this confidence interval is probably not valid. 5.94

a.

For confidence coefficient .90, α = .10 and α/2 = .05. From Table IV, Appendix B, \(z_{.05}\) = 1.645. The 90% confidence interval is:

\[\bar{x} \pm z_{.05}\frac{\sigma}{\sqrt{n}} \Rightarrow \bar{x} \pm 1.645\frac{s}{\sqrt{n}} \Rightarrow 12.2 \pm 1.645\frac{10}{\sqrt{100}} \Rightarrow 12.2 \pm 1.645 \Rightarrow (10.555, 13.845)\]

We are 90% confident that the mean number of days of sick leave taken by all its employees is between 10.555 and 13.845.

b.

For confidence coefficient .99, α = .01 and α/2 = .005. From Table IV, Appendix B, \(z_{.005}\) = 2.58. The sample size is

\[n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} = \frac{(2.58)^2(10)^2}{2^2} = 166.4 \approx 167\]

You would need to take n = 167 samples.

5.96

a.

The 99% confidence interval, using \(z_{.005}\) = 2.58, is:

\[\bar{x} \pm z_{.005}\frac{s}{\sqrt{n}} \Rightarrow 1.13 \pm 2.58\frac{2.21}{\sqrt{72}} \Rightarrow 1.13 \pm .67 \Rightarrow (.46, 1.80)\]

We are 99% confident that the mean number of pecks at the blue string is between .46 and 1.80.

b.

Yes. The mean number of pecks at the white string is 7.5. This value does not fall in the 99% confidence interval for the blue string found in part a. Thus, the chickens are more apt to peck at white string.

5.98

a.

First we must compute \(\hat{p}\):

\[\hat{p} = \frac{x}{n} = \frac{124}{159} = .78\]

To see if the sample size is sufficiently large:

\[\hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .78 \pm 3\sqrt{\frac{.78(.22)}{159}} \Rightarrow .78 \pm .099 \Rightarrow (.681, .879)\]

Since this interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.

The 90% confidence interval, using \(z_{.05}\) = 1.645, is:

\[\hat{p} \pm z_{.05}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .78 \pm 1.645\sqrt{\frac{.78(.22)}{159}} \Rightarrow .78 \pm .054 \Rightarrow (.726, .834)\]

We are 90% confident that the true proportion of all truck drivers who suffer from sleep apnea is between .726 and .834.

b.

Sleep researchers believe that 25% of the population suffer from obstructive sleep apnea. Since the 90% confidence interval for the proportion of truck drivers who suffer from sleep apnea does not contain .25, it appears that the true proportion of truck drivers who suffer from sleep apnea is larger than the proportion of the general population.

5.100

a.

The population of interest is the set of all debit cardholders in the U.S.

c.

Of the 1252 observations, 180 had used the debit card to purchase a product or service on the Internet ⇒ \(\hat{p}\) = 180/1252 = .144.

To see if the sample size is sufficiently large:

\[\hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .144 \pm 3\sqrt{\frac{.144(.856)}{1252}} \Rightarrow .144 \pm .030 \Rightarrow (.114, .174)\]

Since this interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.

d.

For confidence coefficient .98, α = 1 − .98 = .02 and α/2 = .02/2 = .01. From Table IV, Appendix B, \(z_{.01}\) = 2.33. The confidence interval is:

\[\hat{p} \pm z_{.01}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .144 \pm 2.33\sqrt{\frac{.144(.856)}{1252}} \Rightarrow .144 \pm .023 \Rightarrow (.121, .167)\]

We are 98% confident that the proportion of debit cardholders who have used their card in making purchases over the Internet is between .121 and .167.

e.

Since we would have less confidence with a 90% confidence interval than with a 98% confidence interval, the 90% interval would be narrower.

5.102

a.

Of the 100 cancer patients, 7 were fired or laid off ⇒ \(\hat{p}\) = 7/100 = .07.

To see if the sample size is sufficiently large:

\[\hat{p} \pm 3\sigma_{\hat{p}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .07 \pm 3\sqrt{\frac{.07(.93)}{100}} \Rightarrow .07 \pm .077 \Rightarrow (-.007, .147)\]

Since the interval does not lie within the interval (0, 1), the normal approximation will not be adequate. We will go ahead and construct the interval anyway.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, \(z_{.05}\) = 1.645. The confidence interval is:

\[\hat{p} \pm z_{.05}\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .07 \pm 1.645\sqrt{\frac{.07(.93)}{100}} \Rightarrow .07 \pm .042 \Rightarrow (.028, .112)\]

Converting these to percentages, we get (2.8%, 11.2%).

b.

We are 90% confident that the percentage of all cancer patients who are fired or laid off due to their illness is between 2.8% and 11.2%.

c.

Since the rate of being fired or laid off for all Americans is 1.3% and this value falls outside the confidence interval in part b, there is evidence to indicate that employees with cancer are fired or laid off at a rate that is greater than that of all Americans.


5.104

a.

\[\hat{p} = \frac{x}{n} = \frac{9296}{10{,}000} = .9296\]

The approximate 95% confidence interval is:

\[\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}\cdot\frac{N-n}{N}} \Rightarrow .9296 \pm 2\sqrt{\frac{.9296(.0704)}{10{,}000}\cdot\frac{500{,}000-10{,}000}{500{,}000}} \Rightarrow .9296 \pm 2\sqrt{.000006413} \Rightarrow .9296 \pm .0051 \Rightarrow (.9245, .9347)\]

b.

Only (10,000/500,000) × 100% = 2% of the subscribers returned the questionnaire. Often in mail surveys, those who respond are those with strong views. Thus, the 10,000 who responded may not be representative. I would question the estimate in part a.

5.106

a.

The point estimate for the fraction of the entire market who refuse to purchase bars is:

\[\hat{p} = \frac{x}{n} = \frac{23}{244} = .094\]

b.

To see if the sample size is sufficient:

\[\hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .094 \pm 3\sqrt{\frac{(.094)(.906)}{244}} \Rightarrow .094 \pm .056 \Rightarrow (.038, .150)\]

Since the interval above is contained in the interval (0, 1), the sample size is sufficiently large.

c.

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, \(z_{.025}\) = 1.96. The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .094 \pm 1.96\sqrt{\frac{(.094)(.906)}{244}} \Rightarrow .094 \pm .037 \Rightarrow (.057, .131)\]

d.

The best estimate of the true fraction of the entire market who refuse to purchase bars six months after the poisoning is .094. We are 95% confident the true fraction of the entire market who refuse to purchase bars six months after the poisoning is between .057 and .131.
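The two steps used throughout these proportion exercises, the p̂ ± 3σ check and the interval itself, can be sketched as follows. The function names are hypothetical; the z value is passed in from Table IV rather than computed.

```python
import math

def proportion_interval(p_hat, n, z):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def normal_approx_ok(p_hat, n):
    """True when p_hat +/- 3 standard errors lies wholly inside (0, 1)."""
    low, high = proportion_interval(p_hat, n, 3)
    return 0 < low and high < 1

# Exercise 5.106: p_hat = .094, n = 244, z.025 = 1.96
print(normal_approx_ok(0.094, 244))
low, high = proportion_interval(0.094, 244, 1.96)
print(round(low, 3), round(high, 3))

# Exercise 5.102: p_hat = .07, n = 100 fails the check
print(normal_approx_ok(0.07, 100))
```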




5.108

The bound is SE = .1. For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, \(z_{.005}\) = 2.575. We estimate p with \(\hat{p}\) from Exercise 5.48, which is \(\hat{p}\) = .636. Thus,

\[n = \frac{(z_{\alpha/2})^2 pq}{SE^2} \approx \frac{2.575^2(.636)(.364)}{.1^2} = 153.5 \Rightarrow 154\]

The necessary sample size would be 154.

5.110

Since the manufacturer wants to be reasonably certain the process is really out of control before shutting down the process, we would want to use a high level of confidence for our inference. We will form a 99% confidence interval for the mean breaking strength.

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n – 1 = 9 – 1 = 8, \(t_{.005}\) = 3.355. The 99% confidence interval is:

\[\bar{x} \pm t_{.005}\frac{s}{\sqrt{n}} \Rightarrow 985.6 \pm 3.355\frac{22.9}{\sqrt{9}} \Rightarrow 985.6 \pm 25.61 \Rightarrow (959.99, 1{,}011.21)\]

We are 99% confident that the true mean breaking strength is between 959.99 and 1,011.21. Since 1,000 is contained in this interval, it is not an unusual value for the true mean breaking strength. Thus, we would recommend that the process is not out of control.
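A small-sample t interval like this one can be sketched as below. The helper name `t_interval` is hypothetical, and the critical value is passed in as read from Table VI because Python's standard library has no t distribution.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n); t_crit is read from a t table."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Exercise 5.110: xbar = 985.6, s = 22.9, n = 9, t.005 = 3.355 with df = 8
low, high = t_interval(985.6, 22.9, 9, 3.355)
print(round(low, 2), round(high, 2))
print(low <= 1000 <= high)   # is the target mean of 1,000 inside the interval?
```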




Chapter 6

6.2

The test statistic is used to decide whether or not to reject the null hypothesis in favor of the alternative hypothesis.

6.4

A Type I error is rejecting the null hypothesis when it is true. A Type II error is accepting the null hypothesis when it is false.

α = the probability of committing a Type I error.
β = the probability of committing a Type II error.

6.6

We can compute a measure of reliability for rejecting the null hypothesis when it is true. This measure of reliability is the probability of rejecting the null hypothesis when it is true which is α. However, it is generally not possible to compute a measure of reliability for accepting the null hypothesis when it is false. We would have to compute the probability of accepting the null hypothesis when it is false, β, for every value of the parameter in the alternative hypothesis.

6.8

Let p = proportion of U.S. companies that have formal, written travel and entertainment policies for their employees. The null hypothesis would be: H0: p = .80

6.10

Let μ = average Libor rate for 3-month loans. Since many Western banks think that the reported average Libor rate (.054) is too high, they want to show that the average is less than .054. The appropriate hypotheses would be: H0: μ = .054 Ha: μ < .054

6.12

Let p = proportion of time the camera correctly detects liars. The null hypothesis would be: H0: p = .75

6.14

a.

A Type I error would be concluding the proposed user is unauthorized when, in fact, the proposed user is authorized. A Type II error would be concluding the proposed user is authorized when, in fact, the proposed user is unauthorized. In this case, a more serious error would be a Type II error. One would not want to conclude that the proposed user is authorized when he/she is not.

b.

The Type I error rate is 1%. This means that the probability of concluding the proposed user is unauthorized when, in fact, the proposed user is authorized is .01.


The Type II error rate is .00025%. This means that the probability of concluding the proposed user is authorized when, in fact, the proposed user is unauthorized is .0000025.

c.

The Type I error rate is .01%. This means that the probability of concluding the proposed user is unauthorized when, in fact, the proposed user is authorized is .0001. The Type II error rate is .005%. This means that the probability of concluding the proposed user is authorized when, in fact, the proposed user is unauthorized is .00005.

6.16

a.

The null hypothesis is: H0: There is no intrusion.

b.

The alternative hypothesis is: Ha: There is an intrusion.

c.

α = P(warning | no intrusion) = 1/1000 = .001.

β = P(no warning | intrusion) = 500/1000 = .5.

6.18

a.

The decision rule is to reject H0 if \(\bar{x}\) > 270. Recall that

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}}\]

Therefore, "reject H0 if \(\bar{x}\) > 270" can be written as: reject H0 if

\[z > \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{270-255}{63/\sqrt{81}} = 2.14\]

The decision rule in terms of z is to reject H0 if z > 2.14.

b.

P(z > 2.14) = .5 − P(0 < z < 2.14) = .5 − .4838 = .0162

6.20

a.

H0: μ = .36
Ha: μ < .36

The test statistic is

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{.323-.36}{\sqrt{.034}/\sqrt{64}} = -1.61\]

The rejection region requires α = .10 in the lower tail of the z-distribution. From Table IV, Appendix B, \(z_{.10}\) = 1.28. The rejection region is z < −1.28.

Since the observed value of the test statistic falls in the rejection region (z = −1.61 < −1.28), H0 is rejected. There is sufficient evidence to indicate the mean is less than .36 at α = .10.

b.

H0: μ = .36
Ha: μ ≠ .36

The test statistic is z = −1.61 (see part a). The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, \(z_{.05}\) = 1.645. The rejection region is z < −1.645 or z > 1.645. Since the observed value of the test statistic does not fall in the rejection region (z = −1.61 is not less than −1.645), H0 is not rejected. There is insufficient evidence to indicate the mean is different from .36 at α = .10.

6.22

a.

To determine whether the mean July, 2006 dealer price of the Toyota Prius differs from $25,000, we test:

H0: μ = 25,000
Ha: μ ≠ 25,000

b.

The sample mean is

\[\bar{x} = \frac{\sum x_i}{n} = \frac{4{,}076{,}271}{160} = 25{,}476.69\]

The sample variance is:

\[s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1} = \frac{104{,}788{,}653{,}115 - \frac{4{,}076{,}271^2}{160}}{160-1} = 5{,}904{,}057.862\]

The sample standard deviation is:

\[s = \sqrt{s^2} = \sqrt{5{,}904{,}057.862} = 2{,}429.8267\]

c.

The test statistic is

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{25{,}476.69-25{,}000}{2{,}429.8267/\sqrt{160}} = 2.48\]

d.

The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, \(z_{.025}\) = 1.96. The rejection region is z < −1.96 or z > 1.96.

e.

Since the observed value of the test statistic falls in the rejection region (z = 2.48 > 1.96), H0 is rejected. There is sufficient evidence to indicate the mean July, 2006 dealer price of the Toyota Prius differs from $25,000 at α = .05.


6.24

a.

A Type I error is rejecting H0 when H0 is true. In this case, we would conclude that the mean number of carats per diamond is different from .6 when, in fact, it is equal to .6. A Type II error is accepting H0 when H0 is false. In this case, we would conclude that the mean number of carats per diamond is equal to .6 when, in fact, it is different from .6.

b.

From Exercise 5.18, the random sample of 30 diamonds yielded \(\bar{x}\) = .691 and s = .262. Let μ = mean number of carats per diamond. To determine if the mean number of carats per diamond is different from .6, we test:

H0: μ = .6
Ha: μ ≠ .6

The test statistic is

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{.691-.6}{.262/\sqrt{30}} = 1.90\]

The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, \(z_{.025}\) = 1.96. The rejection region is z > 1.96 or z < −1.96.

Since the observed value of the test statistic does not fall in the rejection region (z = 1.90 is not greater than 1.96), H0 is not rejected. There is insufficient evidence to indicate the mean number of carats per diamond is different from .6 carats at α = .05.

c.

When α is changed, H0, Ha, and the test statistic remain the same. The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645 or z < −1.645. Since the observed value of the test statistic falls in the rejection region (z = 1.90 > 1.645), H0 is rejected. There is sufficient evidence to indicate the mean number of carats per diamond is different from .6 carats at α = .10.

d.

When the value of α changes, the decision can also change. Thus, it is very important to include the level of α used in all decisions.

6.26

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: GASTURBINE

Variable     N  N*   Mean  SE Mean  StDev  Minimum    Q1  Median     Q3  Maximum
GASTURBINE  67   0  11066      195   1595     8714  9918   10656  11842    16243

To determine if the mean heat rate of gas turbines augmented with high pressure inlet fogging exceeds 10,000 kJ/kWh, we test:

H0: μ = 10,000
Ha: μ > 10,000

The test statistic is

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{11{,}066-10{,}000}{1{,}595/\sqrt{67}} = 5.47\]

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, \(z_{.05}\) = 1.645. The rejection region is z > 1.645.

Since the observed value of the test statistic falls in the rejection region (z = 5.47 > 1.645), H0 is rejected. There is sufficient evidence to indicate the true mean heat rate of gas turbines augmented with high pressure inlet fogging exceeds 10,000 kJ/kWh at α = .05.

6.28

a.

Let μ = average full-service fee (in thousands of dollars) of U.S. funeral homes in 2006. To determine if the average full-service fee exceeds $6,500, we test: H0: μ = 6.50 Ha: μ > 6.50

b.

Using MINITAB, the output is:

Descriptive Statistics: FUNERAL

Variable   N   Mean  Median  StDev  SE Mean
Fee       36  6.819   6.600  1.265    0.211

Variable  Minimum  Maximum     Q1     Q3
Fee         5.200   11.600  6.025  7.400

H0: μ = 6.50
Ha: μ > 6.50

The test statistic is

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{6.819-6.50}{1.265/\sqrt{36}} = 1.51\]

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, \(z_{.05}\) = 1.645. The rejection region is z > 1.645.

Since the observed value of the test statistic does not fall in the rejection region (z = 1.51 is not greater than 1.645), H0 is not rejected. There is insufficient evidence to indicate the true mean full-service fee of U.S. funeral homes in 2006 exceeds $6,500 at α = .05.

c.

No. Since the sample size (n = 36) is greater than 30, the Central Limit Theorem applies. The distribution of x is approximately normal regardless of the population distribution.
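The large-sample z tests in Exercises 6.24 through 6.28 all follow one pattern, sketched below. The function name `z_test` and its `alternative` labels are hypothetical. Critical values come from `statistics.NormalDist().inv_cdf` rather than Table IV, so they differ slightly from the rounded table values.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, s, n, alpha, alternative):
    """Return (z, reject) for a large-sample test of H0: mu = mu0."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":            # Ha: mu > mu0
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":             # Ha: mu < mu0
        reject = z < std_normal.inv_cdf(alpha)
    else:                                   # Ha: mu != mu0
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z, reject

print(z_test(6.819, 6.50, 1.265, 36, 0.05, "greater"))   # Exercise 6.28
print(z_test(11066, 10000, 1595, 67, 0.05, "greater"))   # Exercise 6.26
print(z_test(0.691, 0.6, 0.262, 30, 0.05, "two-sided"))  # Exercise 6.24b
print(z_test(0.691, 0.6, 0.262, 30, 0.10, "two-sided"))  # Exercise 6.24c
```

The last two calls use the same data with different α and reach different decisions, which is the point made in Exercise 6.24d.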




6.30

a.

To determine if the sample data refute the manufacturer's claim, we test:

H0: μ = 10
Ha: μ < 10

b.

A Type I error is concluding the mean number of solder joints inspected per second is less than 10 when, in fact, it is 10 or more. A Type II error is concluding the mean number of solder joints inspected per second is at least 10 when, in fact, it is less than 10.

c.

Descriptive Statistics: PCB

Variable   N   Mean  Median  TrMean  StDev  SE Mean
PCB       48  9.292   9.000   9.432  2.103    0.304

Variable  Minimum  Maximum     Q1      Q3
PCB         0.000   13.000  9.000  10.000

H0: μ = 10
Ha: μ < 10

The test statistic is

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{9.292-10}{2.103/\sqrt{48}} = -2.33\]

The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, \(z_{.05}\) = 1.645. The rejection region is z < −1.645.

Since the observed value of the test statistic falls in the rejection region (z = −2.33 < −1.645), H0 is rejected. There is sufficient evidence to indicate the mean number of inspections per second is less than 10 at α = .05.

6.32

We will reject H0 if the p-value < α.

a.

.06 > .05, do not reject H0.

b.

.10 > .05, do not reject H0.

c.

.01 < .05, reject H0.

d.

.001 < .05, reject H0.

e.

.251 > .05, do not reject H0.

f.

.042 < .05, reject H0.


6.34

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{49.4-50}{4.1/\sqrt{100}} = -1.46\]

p-value = P(z ≥ −1.46) = .5 + .4279 = .9279

There is no evidence to reject H0 for α ≤ .10.

6.36

First, find the value of the test statistic:

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{10.7-10}{3.1/\sqrt{50}} = 1.60\]

p-value = P(z ≤ −1.60 or z ≥ 1.60) = 2P(z ≥ 1.60) = 2(.5 − .4452) = 2(.0548) = .1096 (using Table IV, Appendix B)

There is no evidence to reject H0 for α ≤ .10.

6.38

a.

The p-value reported by SAS is for a two-tailed test. Thus, P(z ≤ −1.63) + P(z ≥ 1.63) = .1032. For this one-tailed test, the p-value = P(z ≤ −1.63) = .1032/2 = .0516. Since the p-value = .0516 > α = .05, H0 is not rejected. There is insufficient evidence to indicate μ < 75 at α = .05.

b.

For this one-tailed test, the p-value = P(z ≤ 1.63). Since P(z ≤ −1.63) = .1032/2 = .0516, P(z ≤ 1.63) = 1 − .0516 = .9484. Since the p-value = .9484 > α = .10, H0 is not rejected. There is insufficient evidence to indicate μ < 75 at α = .10.

c.

For this one-tailed test, the p-value = P(z ≥ 1.63) = .1032/2 = .0516. Since the p-value = .0516 < α = .10, H0 is rejected. There is sufficient evidence to indicate μ > 75 at α = .10.

d.

For this two-tailed test, the p-value = .1032. Since the p-value = .1032 > α = .01, H0 is not rejected. There is insufficient evidence to indicate μ ≠ 75 at α = .01.
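The conversions between one- and two-tailed p-values in Exercise 6.38 can be sketched as follows. The function name `p_value` and its `alternative` labels are hypothetical. The z value of ±1.63 is the one implied by the solution, and the results agree with the SAS figures only to about three decimal places.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value of an observed z statistic for the stated alternative hypothesis."""
    cdf = NormalDist().cdf
    if alternative == "less":       # Ha: mu < mu0, lower-tail area
        return cdf(z)
    if alternative == "greater":    # Ha: mu > mu0, upper-tail area
        return 1 - cdf(z)
    return 2 * (1 - cdf(abs(z)))    # two-tailed: both tail areas

print(round(p_value(-1.63, "less"), 4))       # part a
print(round(p_value(1.63, "less"), 4))        # part b
print(round(p_value(1.63, "greater"), 4))     # part c
print(round(p_value(1.63, "two-sided"), 4))   # part d
```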

6.40

The p-value is p = 0.014. The probability of observing a test statistic of t = 2.48 or anything more unusual if μ = 25,000 is 0.014. Since p = 0.014 is so small, we would reject H0. There is sufficient evidence to indicate the mean prices for hybrid Toyota Prius cars is different than $25,000 for any value of α > .014.

6.42

From the printout, the p-value = .000. Since the p-value = .000 < α = .01, H0 is rejected. There is sufficient evidence to indicate that the true population mean weight of plastic golf tees is different from .250 at α = .01.




6.44

a.

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{52.3-51}{7.1/\sqrt{50}} = 1.29\]

The p-value is p = P(z ≥ 1.29) + P(z ≤ −1.29) = (.5 − .4015) + (.5 − .4015) = .1970. (Using Table IV, Appendix B.)

b.

The p-value is p = P(z ≥ 1.29) = (.5 − .4015) = .0985. (Using Table IV, Appendix B.)

c.

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{52.3-51}{10.4/\sqrt{50}} = 0.88\]

The p-value is p = P(z ≥ 0.88) + P(z ≤ −0.88) = (.5 − .3106) + (.5 − .3106) = .3788. (Using Table IV, Appendix B.)

d.

In part a, in order to reject H0, α would have to be greater than .1970. In part b, in order to reject H0, α would have to be greater than .0985. In part c, in order to reject H0, α would have to be greater than .3788.

e.

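For a two-tailed test, α/2 = .01/2 = .005. From Table IV, Appendix B, \(z_{.005}\) = 2.58.

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} \Rightarrow 2.58 = \frac{52.3-51}{s/\sqrt{50}} \Rightarrow 2.58\frac{s}{\sqrt{50}} = 52.3-51 \Rightarrow .3649s = 1.3 \Rightarrow s = 3.56\]

For a one-tailed test, α = .01. From Table IV, Appendix B, \(z_{.01}\) = 2.33.

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} \Rightarrow 2.33 = \frac{52.3-51}{s/\sqrt{50}} \Rightarrow 2.33\frac{s}{\sqrt{50}} = 52.3-51 \Rightarrow .3295s = 1.3 \Rightarrow s = 3.95\]

6.46

a.

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{10.2-0}{31.3/\sqrt{50}} = 2.30\]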

b.

For this two-sided test, the p-value = P(z ≥ 2.30) + P(z ≤ −2.30) = (.5 − .4893) + (.5 − .4893) = .0214. Since this value is so small, there is evidence to reject H0. There is sufficient evidence to indicate the mean level of feminization is different from 0% for any value of α > .0214.

c.

\[z = \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}} = \frac{15.0-0}{25.1/\sqrt{50}} = 4.23\]

For this two-sided test, the p-value = P(z ≥ 4.23) + P(z ≤ −4.23) ≈ (.5 − .5) + (.5 − .5) = 0. Since this value is so small, there is evidence to reject H0. There is sufficient evidence to indicate the mean level of feminization is different from 0% for any value of α > 0.0.

168

Chapter 6


6.48

6.50

a.

P(t > 1.440) = .10 (Using Table VI, Appendix B, with df = 6)

b.

P(t < −1.782) = .05 (Using Table VI, Appendix B, with df = 12)

c.

P(t < −2.060) + P(t > 2.060) = .025 + .025 = .05 (Using Table VI, Appendix B, with df = 25)

d.

The probability of a Type I error is computed above for each of the parts.

a.

H0: μ = 6 Ha: μ < 6 The test statistic is t =

x − μ0 s/ n

=

4.8 − 6 1.3/ 5

= −2.064

The necessary assumption is that the population is normal. The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − 1 = 5 − 1 = 4. From Table VI, Appendix B, t.05 = 2.132. The rejection region is t < −2.132. Since the observed value of the test statistic does not fall in the rejection region (t = −2.064
H0: μ = 6 Ha: μ ≠ 6 The test statistic is t = −2.064 (from a). The assumption is the same as in a. The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 1 = 5 − 1 = 4. From Table VI, Appendix B, t.025 = 2.776. The rejection region is t < −2.776 or t > 2.776. Since the observed value of the test statistic does not fall in the rejection region (t = −2.064



c.

For part a, the p-value = P(t ≤ −2.064). From Table VI, with df = 4, .05 < P(t ≤ −2.064) < .10 or .05 < p-value < .10. For part b, the p-value = P(t ≤ −2.064) + P(t ≥ 2.064). From Table VI, with df = 4, 2(.05) < p-value < 2(.10) or .10 < p-value < .20.
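Table VI only brackets these p-values. As a rough check, the tail area can be approximated by integrating the t density numerically. The sketch below assumes Simpson's rule with 2,000 subintervals is accurate enough for this purpose; the helper names t_pdf and t_upper_tail are illustrative and do not come from the text.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps                         # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3            # area beyond t = 0.5 minus area on [0, t]


xbar, mu0, s, n = 4.8, 6, 1.3, 5
t_stat = (xbar - mu0) / (s / sqrt(n))
p_lower = t_upper_tail(abs(t_stat), n - 1)   # P(t <= -2.064), by symmetry
p_two_sided = 2 * p_lower
print(f"t = {t_stat:.3f}, one-tailed p = {p_lower:.4f}, two-tailed p = {p_two_sided:.4f}")
```

Both results should land inside the intervals read from the table: between .05 and .10 for part a, and between .10 and .20 for part b.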

6.52

a.

To determine if the true mean breaking strength of the new bonding adhesive is less than 5.70 MPa, we test:

H0: μ = 5.70
Ha: μ < 5.70

b.

The rejection region requires α = .01 in the lower tail of the t-distribution with df = n − 1 = 10 − 1 = 9. From Table VI, Appendix B, t.01 = 2.821. The rejection region is t < −2.821.

c.

The test statistic is t = (x̄ − μ0)/(s/√n) = (5.07 − 5.70)/(.46/√10) = −4.33.

d.

Since the observed value of the test statistic falls in the rejection region (t = −4.33 < −2.821), H0 is rejected. There is sufficient evidence to indicate the true mean breaking strength of the new bonding adhesive is less than 5.70 MPa at α = .01.

e.

We must assume that the sample was random and selected from a normal population.

6.54

Some preliminary calculations are:

x̄ = Σx/n = 736/7 = 105.14

s² = (Σx² − (Σx)²/n)/(n − 1) = (78696 − (736)²/7)/(7 − 1) = 218.4762

s = √218.4762 = 14.7809
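The shortcut formula above works from the sums Σx = 736 and Σx² = 78696 alone. A minimal sketch of the same arithmetic follows; the function name summary_from_sums is hypothetical.

```python
from math import sqrt


def summary_from_sums(sum_x, sum_x2, n):
    """Mean, sample variance, and standard deviation from sum(x) and sum(x^2)."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # shortcut formula for s^2
    return mean, variance, sqrt(variance)


mean, var, sd = summary_from_sums(736, 78696, 7)
print(f"mean = {mean:.2f}, s^2 = {var:.4f}, s = {sd:.4f}")
```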

a.

To determine if the mean consumption rate of salad dressings in the Southeastern U.S. is different than the mean national consumption rate, we test: H0: μ = 100 Ha: μ ≠ 100

b.

Since the sample size is so small, we must assume that the population being sampled is normal. In addition, we must assume that the sample is random.

c.

The test statistic is t = (x̄ − μ0)/(s/√n) = (105.14 − 100)/(14.7809/√7) = .92

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution. From Table VI, Appendix B, with df = n − 1 = 7 − 1 = 6, t.025 = 2.447. The rejection region is t > 2.447 or t < −2.447. Since the value of the test statistic does not fall in the rejection region (t = .92 ≯ 2.447), H0 is not rejected. There is insufficient evidence to indicate the mean consumption rate of salad dressings in the Southeastern U.S. is different from the mean national consumption rate at α = .05.

d.

The observed significance level is p-value = P(t ≥ .92) + P(t ≤ −.92). Since we did not reject H0 in part c, we know that the p-value must be greater than .05. Using Table VI, Appendix B, with df = n − 1 = 7 − 1 = 6, p-value = P(t ≥ .92) + P(t ≤ −.92) > .1 + .1 = .2. Thus, with this table, we only know that the p-value is greater than .2.

6.56

a.

To determine if the mean repellency percentage of the new mosquito repellent is less than 95, we test:

H0: μ = 95
Ha: μ < 95

The test statistic is t = (x̄ − μ0)/(s/√n) = (83 − 95)/(15/√5) = −1.79

The rejection region requires α = .10 in the lower tail of the t distribution. From Table VI, Appendix B, with df = n − 1 = 5 − 1 = 4, t.10 = 1.533. The rejection region is t < −1.533. Since the observed value of the test statistic falls in the rejection region (t = −1.79 < −1.533), H0 is rejected. There is sufficient evidence to indicate that the true mean repellency percentage of the new mosquito repellent is less than 95 at α = .10.

b.

We must assume that the population of percent repellencies is normally distributed.

6.58

a.


Descriptive Statistics: Plants

Variable   N    Mean    Median   TrMean   StDev   SE Mean
Plants     20   4.000   3.500    3.667    3.061   0.684

Variable   Minimum   Maximum   Q1      Q3
Plants     1.000     13.000    1.250   5.000

Let μ = mean number of active nuclear power plants operating in all states. To determine if the mean number of active nuclear power plants operating in all states exceeds 3, we test:

H0: μ = 3 Ha: μ > 3


The test statistic is t = (x̄ − μ0)/(s/√n) = (4 − 3)/(3.061/√20) = 1.46

The rejection region requires α = .10 in the upper tail of the t-distribution with df = n − 1 = 20 − 1 = 19. From Table VI, Appendix B, t.10 = 1.328. The rejection region is t > 1.328. Since the observed value of the test statistic falls in the rejection region (t = 1.46 > 1.328), H0 is rejected. There is sufficient evidence to indicate the mean number of active nuclear power plants operating in all states exceeds 3 at α = .10.

b.

We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the number of power plants is:

[Histogram of Plants (MINITAB): Frequency versus Plants]

From the histogram, the data appear to be skewed to the right. This indicates that the data may not be normal. Next, we look at the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal.

x̄ ± s ⇒ 4 ± 3.061 ⇒ (.939, 7.061). 18 of the 20 values fall in this interval. The proportion is .90. This is much greater than the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 4 ± 2(3.061) ⇒ 4 ± 6.122 ⇒ (−2.122, 10.122). 19 of the 20 values fall in this interval. The proportion is .95. This is the same as the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 4 ± 3(3.061) ⇒ 4 ± 9.183 ⇒ (−5.183, 13.183). 20 of the 20 values fall in this interval. The proportion is 1.000. This is equal to the 1.00 we would expect if the data were normal.



From this method, it appears that the data are not normal. Next, we look at the ratio of the IQR to s. IQR = QU – QL = 5.00 – 1.25 = 3.75.

IQR/s = 3.75/3.061 = 1.22. This is close to the 1.3 we would expect if the data were normal. This method indicates the data may be normal.

Finally, using MINITAB, the normal probability plot is:

[Normal Probability Plot for Plants, ML Estimates, 95% CI: Percent versus Data. ML Estimates: Mean = 4, StDev = 2.98329. Goodness of Fit: AD* = 1.298]

Since the data do not form a straight line, the data are not normal. From 3 of the 4 different methods, the indications are that the number of power plants data are not normal.

c.

The two largest values are 9 and 13. The two lowest values are 1 and 1. Using MINITAB with the data deleted yields the descriptive statistics:

Descriptive Statistics: Plants2

Variable   N    Mean    Median   TrMean   StDev   SE Mean
Plants2    16   3.500   3.500    3.429    1.826   0.456

Variable   Minimum   Maximum   Q1      Q3
Plants2    1.000     7.000     2.000   5.000

To determine if the mean number of active nuclear power plants operating in all states exceeds 3 (using the reduced data set), we test: H0: μ = 3 Ha: μ > 3


The test statistic is t = (x̄ − μ0)/(s/√n) = (3.5 − 3)/(1.826/√16) = 1.10

The rejection region requires α = .10 in the upper tail of the t-distribution with df = n − 1 = 16 − 1 = 15. From Table VI, Appendix B, t.10 = 1.341. The rejection region is t > 1.341. Since the observed value of the test statistic does not fall in the rejection region (t = 1.10 ≯ 1.341), H0 is not rejected. There is insufficient evidence to indicate the mean number of active nuclear power plants operating in all states exceeds 3 at α = .10. By eliminating the top two and bottom two observations, we have changed the decision from rejecting H0 to not rejecting H0.

d.

It is very dangerous to eliminate data points to satisfy assumptions. The data may, in fact, not be normal. By eliminating data points, one has changed the kind of data that come from the parent population. Thus, incorrect decisions could be made.

6.60

Using MINITAB, the descriptive statistics for the 2 plants are:

Descriptive Statistics: AL1, AL2

Variable   N   N*   Mean      SE Mean   StDev     Minimum   Q1   Median    Q3   Maximum
AL1        2   0    0.00750   0.00250   0.00354   0.00500   *    0.00750   *    0.01000
AL2        2   0    0.0700    0.0200    0.0283    0.0500    *    0.0700    *    0.0900

To determine if plant 1 is violating the OSHA standard, we test:

H0: μ = .004
Ha: μ > .004

The test statistic is t = (x̄ − μ0)/(s/√n) = (.0075 − .004)/(.00354/√2) = 1.40

Since no α level was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the t-distribution with df = n − 1 = 2 − 1 = 1. From Table VI, Appendix B, t.05 = 6.314. The rejection region is t > 6.314. Since the observed value of the test statistic does not fall in the rejection region (t = 1.40 ≯ 6.314), H0 is not rejected. There is insufficient evidence to indicate the OSHA standard is violated by plant 1 at α = .05.

To determine if plant 2 is violating the OSHA standard, we test:

H0: μ = .004
Ha: μ > .004

The test statistic is t = (x̄ − μ0)/(s/√n) = (.07 − .004)/(.0283/√2) = 3.30


Since no α level was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the t-distribution with df = n − 1 = 2 − 1 = 1. From Table VI, Appendix B, t.05 = 6.314. The rejection region is t > 6.314. Since the observed value of the test statistic does not fall in the rejection region (t = 3.30 ≯ 6.314), H0 is not rejected. There is insufficient evidence to indicate the OSHA standard is violated by plant 2 at α = .05.

6.62

b.

First, check to see if n is large enough.

p0 ± 3σp̂ ⇒ p0 ± 3√(p0q0/n) ⇒ .70 ± 3√((.70)(.30)/100) ⇒ .70 ± .14 ⇒ (.56, .84)

Since the interval lies within the interval (0, 1), the normal approximation will be adequate.

H0: p = .70
Ha: p < .70

The test statistic is z = (p̂ − p0)/σp̂ = (p̂ − p0)/√(p0q0/n) = (.63 − .70)/√(.70(.30)/100) = −1.53

The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645. Since the observed value of the test statistic does not fall in the rejection region (z = −1.53 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that p is less than .70 at α = .05.

p-value = P(z ≤ −1.53) = .5 − .4370 = .0630. Since the p-value is not less than α = .05, H0 is not rejected.

6.64

a.

No. The p-value is the probability of observing your test statistic or anything more unusual if H0 is true. For this problem, the p-value = .3300/2 = .1650. Given the true value of the population proportion, p, is .5, the probability of observing a test statistic of z = .44 or larger is .1650. Since the p-value is not small (p = .1650), there is no evidence to reject H0. There is no evidence to indicate the population proportion is greater than .5.

b.

If the alternative hypothesis were two-tailed, the p-value would be 2 times the p-value for a one-tailed test. For this problem, the p-value = .3300. The probability of observing your test statistic or anything more unusual if H0 is true is .3300. There is no evidence to reject H0 for α ≤ .10. There is no evidence to indicate that p ≠ .5 for α ≤ .10.




6.66

a.

p̂ = x/n = 64/106 = .604

b.

H0: p = .70
Ha: p ≠ .70

c.

The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.604 − .70)/√(.70(.30)/106) = −2.16

d.

The rejection region requires α/2 = .01/2 = .005 in each tail of the z-distribution. From Table IV, Appendix B, z.005 = 2.58. The rejection region is z > 2.58 or z < −2.58.

e.

Since the observed value of the test statistic does not fall in the rejection region (z = −2.16 ≮ −2.58), H0 is not rejected. There is insufficient evidence to indicate that p differs from .70 at α = .01.

6.68

a.

The population parameter of interest is p = proportion of items that had the wrong price scanned at California Wal-Mart stores.

b.

To determine if the true proportion of items scanned at California Wal-Mart stores with the wrong price exceeds the 2% NIST standard, we test:

H0: p = .02
Ha: p > .02

c.

The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.083 − .02)/√(.02(.98)/1000) = 14.23

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.

d.

Since the observed value of the test statistic falls in the rejection region (z = 14.23 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the true proportion of items scanned at California Wal-Mart stores with the wrong price exceeds the 2% NIST standard at α = .05. This means that the proportion of items with wrong prices at California Wal-Mart stores is much higher than what is allowed.

e.

In order for the inference to be valid, the sampling distribution of p̂ must be approximately normal. We check this assumption:

p0 ± 3σp̂ ⇒ p0 ± 3√(p0q0/n) ⇒ .02 ± 3√(.02(.98)/1000) ⇒ .02 ± .013 ⇒ (.007, .033)

Since the above interval falls completely in the interval (0, 1), the normal distribution will be adequate.
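The large-sample proportion test and the p0 ± 3σp̂ adequacy check used in this exercise can be sketched in a few lines. The function name proportion_z_test is illustrative, and the script uses the exact normal distribution instead of Table IV.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_hat, p0, n):
    """Upper-tailed large-sample test of H0: p = p0 against Ha: p > p0."""
    se0 = sqrt(p0 * (1 - p0) / n)          # standard error computed under H0
    z = (p_hat - p0) / se0
    p_value = 1 - NormalDist().cdf(z)
    interval = (p0 - 3 * se0, p0 + 3 * se0)
    normal_ok = 0 < interval[0] and interval[1] < 1   # interval inside (0, 1)
    return z, p_value, interval, normal_ok


# Values quoted in this exercise: p-hat = .083, p0 = .02, n = 1000.
z, p_value, interval, normal_ok = proportion_z_test(0.083, 0.02, 1000)
print(f"z = {z:.2f}, interval = ({interval[0]:.3f}, {interval[1]:.3f}), adequate = {normal_ok}")
```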



6.70

a.

Let p = proportion of vacation-home owners who are minorities in 2003.

p̂ = x/n = 46/416 = .111

To determine if the percentage of vacation-home owners in 2006 who are minorities is larger than 6%, we test:

H0: p = .06
Ha: p > .06

The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.111 − .06)/√(.06(.94)/416) = 4.38

The rejection region requires α = .01 in the upper tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33. Since the observed value of the test statistic falls in the rejection region (z = 4.38 > 2.33), H0 is rejected. There is sufficient evidence to indicate that the true percentage of vacation-home owners in 2006 who are minorities is larger than 6% at α = .01.

b.

Since the return rate of the questionnaire was so small compared to the number sent out, one should be very skeptical of the results. It would be fairly unusual that the sample of returned questionnaires would be representative of the entire population.

6.72

Let p = proportion of firms in violation of the new 4-day rule for reporting material changes.

p̂ = x/n = 23/462 = .050

To determine if the percentage of firms in violation of the new 4-day rule for reporting material changes is less than 10%, we test:

H0: p = .10
Ha: p < .10

The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.050 − .10)/√(.10(.90)/462) = −3.58

The rejection region requires α = .01 in the lower tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z < −2.33. Since the observed value of the test statistic falls in the rejection region (z = −3.58 < −2.33), H0 is rejected. There is sufficient evidence to indicate that the true percentage of firms in violation of the new 4-day rule for reporting material changes is less than 10% at α = .01.




6.74

Let p = proportion of patients taking the pill who reported an improved condition. First we check to see if the normal approximation is adequate:

p0 ± 3σp̂ ⇒ p0 ± 3√(p0q0/n) ⇒ .5 ± 3√(.5(.5)/7000) ⇒ .5 ± .018 ⇒ (.482, .518)

Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate.

To determine if there really is a placebo effect at the clinic, we test:

H0: p = .5
Ha: p > .5

The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.7 − .5)/√(.5(.5)/7000) = 33.47

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 33.47 > 1.645), H0 is rejected. There is sufficient evidence to indicate that there really is a placebo effect at the clinic at α = .05.

6.76

a.

The power of a test increases when:
1. The distance between the null and alternative values of μ increases.
2. The value of α increases.
3. The sample size increases.

b.

The power of a test is equal to 1 − β. As β increases, the power decreases.


6.78

From Exercise 6.77 we want to test H0: μ = 500 against Ha: μ > 500 using α = .05, σ = 100, n = 25, and x̄0 = 532.9.

a.

β = P(x̄0 < 532.9 when μ = 575) = P(z < (532.9 − 575)/(100/√25)) = P(z < −2.11) = .5 − .4826 = .0174

b.

Power = 1 − β = 1 − .0174 = .9826

c.

In Exercise 6.77, β = .1949 and the power is .8051. The value of β has decreased in this exercise since μ = 575 is further from the hypothesized value than μ = 550. As a result, the power of the test in this exercise has increased (when β decreases, the power of the test increases).

6.80

a.

From Exercise 6.79, we want to test H0: μ = 75 against Ha: μ < 75 using α = .10, σ = 15, n = 49, and x̄0 = 72.257.

If μ = 74,
β = P(x̄0 > 72.257 when μ = 74) = P(z > (72.257 − 74)/(15/√49)) = P(z > −.81) = .5 + .2910 = .7910

If μ = 72,
β = P(x̄0 > 72.257 when μ = 72) = P(z > (72.257 − 72)/(15/√49)) = P(z > .12) = .5 − .0478 = .4522

If μ = 70,
β = P(x̄0 > 72.257 when μ = 70) = .1469 (Refer to Exercise 6.79, part c.)

If μ = 68,
β = P(x̄0 > 72.257 when μ = 68) = P(z > (72.257 − 68)/(15/√49)) = P(z > 1.99) = .5 − .4767 = .0233

If μ = 66,
β = P(x̄0 > 72.257 when μ = 66) = P(z > (72.257 − 66)/(15/√49)) = P(z > 2.92) = .5 − .4982 = .0018

In summary,

μ   74      72      70      68      66
β   .7910   .4522   .1469   .0233   .0018


b.

c.

Looking at the graph, β is approximately .62 when μ = 73.

d.

Power = 1 − β. Therefore,

μ       74      72      70      68      66
β       .7910   .4522   .1469   .0233   .0018
Power   .2090   .5478   .8531   .9767   .9982

The power curve starts out close to 1 when μ = 66 and decreases as μ increases, while the β curve is close to 0 when μ = 66 and increases as μ increases.
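The β and power values in this exercise can be recomputed directly from the cutoff x̄0 = 72.257. The sketch below is illustrative only: the function name beta_lower_tail is an assumption, and because it does not round z to two decimals before looking up the area, its results differ from the table-based values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_lower_tail(mu_true, xbar_cutoff, sigma, n):
    """P(x-bar > cutoff), the chance of not rejecting H0, when the mean is mu_true."""
    z = (xbar_cutoff - mu_true) / (sigma / sqrt(n))
    return 1 - STD_NORMAL.cdf(z)


# Lower-tailed test of H0: mu = 75 with sigma = 15, n = 49, cutoff 72.257.
for mu in (74, 72, 70, 68, 66):
    beta = beta_lower_tail(mu, 72.257, 15, 49)
    print(f"mu = {mu}: beta = {beta:.4f}, power = {1 - beta:.4f}")
```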

e.

As the distance between the true mean μ and the null hypothesized mean μ0 increases, β decreases and the power increases. We can also see that as β increases, the power decreases.

6.82

a.

To determine if the mean size of California homes exceeds the national average, we test: H0: μ = 2230 Ha: μ > 2230

The test statistic is z = (x̄ − μ0)/σx̄ = (2347 − 2230)/(257/√100) = 4.55

The rejection region requires α = .01 in the upper tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33. Since the observed value of the test statistic falls in the rejection region (z = 4.55 > 2.33), H0 is rejected. There is sufficient evidence to indicate the mean size of California homes exceeds the national average at α = .01.

b.

To compute the power, we must first set up the rejection region in terms of x̄.

x̄0 = μ0 + zα σx̄ ≈ μ0 + 2.33(s/√n) = 2,230 + 2.33(257/√100) = 2,289.88

We would reject H0 if x̄ > 2,289.88. The power of the test when μ = 2,330 would be:

Power = P(x̄ > 2,289.88 | μ = 2,330) = P(z > (x̄0 − μa)/σx̄) = P(z > (2,289.88 − 2,330)/(257/√100)) = P(z > −1.56) = .5 + .4406 = .9406

c.

The power of the test when μ = 2,280 would be:

Power = P(x̄ > 2,289.88 | μ = 2,280) = P(z > (x̄0 − μa)/σx̄) = P(z > (2,289.88 − 2,280)/(257/√100)) = P(z > 0.38) = .5 − .1480 = .3520

6.84

a.

To determine if the mean mpg for 2006 Honda Civic autos is greater than 38 mpg, we test: H0: μ = 38 Ha: μ > 38

b.


The test statistic is z = (x̄ − μ0)/σx̄ = (40.3 − 38)/(6.4/√36) = 2.16

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 2.16 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the mean mpg for 2006 Honda Civic autos is greater than 38 mpg at α = .05. We must assume that the sample was a random sample.




c.

First find x̄0 = μ0 + zα σx̄ = μ0 + zα(σ/√n), where zα = 1.645 from Table IV, Appendix B.

Thus, x̄0 = 38 + 1.645(6.4/√36) = 39.75

For μ = 38.5:
Power = P(x̄ > 39.75 | μ = 38.5) = P(z > (39.75 − 38.5)/(6.4/√36)) = P(z > 1.17) = .5 − .3790 = .1210

For μ = 39:
Power = P(x̄ > 39.75 | μ = 39) = P(z > (39.75 − 39)/(6.4/√36)) = P(z > .70) = .5 − .2580 = .2420

For μ = 39.5:
Power = P(x̄ > 39.75 | μ = 39.5) = P(z > (39.75 − 39.5)/(6.4/√36)) = P(z > .23) = .5 − .0910 = .4090

For μ = 40:
Power = P(x̄ > 39.75 | μ = 40) = P(z > (39.75 − 40)/(6.4/√36)) = P(z > −.23) = .5 + .0910 = .5910

For μ = 40.5:
Power = P(x̄ > 39.75 | μ = 40.5) = P(z > (39.75 − 40.5)/(6.4/√36)) = P(z > −.70) = .5 + .2580 = .7580

d.

The plot is:


e.

From the plot, the power is approximately .5. For μ = 39.75:

Power = P(x̄ > 39.75 | μ = 39.75) = P(z > (39.75 − 39.75)/(6.4/√36)) = P(z > 0) = .5

f.

From the plot, the power is approximately 1. For μ = 43:

Power = P(x̄ > 39.75 | μ = 43) = P(z > (39.75 − 43)/(6.4/√36)) = P(z > −3.05) = .5 + .4989 = .9989

If the true value of μ is 43, the approximate probability that the test will fail to reject H0 is 1 − .9989 = .0011.
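The power values in parts c through f all follow one recipe: find the cutoff x̄0, then compute the area above it under the alternative mean. A sketch of that recipe for an upper-tailed z test is given below; the function name power_upper_tail is an assumption, and the script uses an unrounded cutoff, so small differences from the table-based values are expected.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(mu_true, mu0, sigma, n, alpha):
    """Power of the test of H0: mu = mu0 vs Ha: mu > mu0 when the mean is mu_true."""
    std = NormalDist()
    se = sigma / sqrt(n)
    cutoff = mu0 + std.inv_cdf(1 - alpha) * se     # reject H0 when x-bar > cutoff
    return 1 - std.cdf((cutoff - mu_true) / se)


# mu0 = 38, standard deviation 6.4, n = 36, alpha = .05, as in this exercise.
powers = {mu: power_upper_tail(mu, 38, 6.4, 36, 0.05)
          for mu in (38.5, 39, 39.5, 40, 40.5, 43)}
for mu, power in powers.items():
    print(f"mu = {mu}: power = {power:.4f}")
```

When the true mean equals μ0, the same function returns α, which is a convenient sanity check on the cutoff.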

6.86

Using Table VII, Appendix B:

a.

For n = 12, df = n − 1 = 12 − 1 = 11
P(χ² > χ0²) = .10 ⇒ χ0² = 17.2750

b.

For n = 9, df = n − 1 = 9 − 1 = 8
P(χ² > χ0²) = .05 ⇒ χ0² = 15.5073

c.

For n = 5, df = n − 1 = 5 − 1 = 4
P(χ² > χ0²) = .025 ⇒ χ0² = 11.1433
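For an even number of degrees of freedom, the upper-tail area of the χ² distribution has a closed form, so the entries for parts b and c can be checked without the table. The sketch below relies on that closed form together with a simple bisection search; it does not cover part a, where df = 11 is odd. The function names are illustrative.

```python
from math import exp


def chi2_upper_tail(x, df):
    """P(chi-square > x) for an even number of degrees of freedom."""
    if df % 2 != 0:
        raise ValueError("the closed form used here requires even df")
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):              # sum of (x/2)^i / i! for i < df/2
        total += term
        term *= half / (i + 1)
    return exp(-half) * total


def chi2_critical(alpha, df, lo=0.0, hi=200.0):
    """Find x with P(chi-square > x) = alpha by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid                      # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2


print(f"df = 8, alpha = .05:  {chi2_critical(0.05, 8):.4f}")
print(f"df = 4, alpha = .025: {chi2_critical(0.025, 4):.4f}")
```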

6.88

a.

It would be necessary to assume that the population has a normal distribution.

b.

H0: σ² = 1
Ha: σ² > 1

The test statistic is χ² = (n − 1)s²/σ0² = 6(4.84)/1 = 29.04

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = n − 1 = 7 − 1 = 6. From Table VII, Appendix B, χ².05 = 12.5916. The rejection region is χ² > 12.5916. Since the observed value of the test statistic falls in the rejection region (29.04 > 12.5916), H0 is rejected. There is sufficient evidence to indicate that the variance is greater than 1 at α = .05.
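As a cross-check on part b, the test statistic and an exact upper-tail p-value can be computed directly. This sketch assumes df is even (here df = 6), which allows a closed-form tail area; the function name chi2_variance_test is hypothetical.

```python
from math import exp


def chi2_variance_test(s2, sigma0_sq, n):
    """Upper-tailed test of H0: sigma^2 = sigma0_sq; requires even df = n - 1."""
    df = n - 1
    if df % 2 != 0:
        raise ValueError("this sketch only handles even df")
    stat = df * s2 / sigma0_sq            # (n - 1) s^2 / sigma0^2
    half = stat / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):              # closed-form tail area for even df
        total += term
        term *= half / (i + 1)
    return stat, exp(-half) * total


stat, p_value = chi2_variance_test(4.84, 1, 7)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.6f}")
```

The p-value is far below α = .05, which agrees with the rejection-region conclusion.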




c.

H0: σ² = 1
Ha: σ² ≠ 1

The test statistic is χ² = (n − 1)s²/σ0² = 6(4.84)/1 = 29.04

The rejection region requires α/2 = .05/2 = .025 in each tail of the χ² distribution with df = n − 1 = 7 − 1 = 6. From Table VII, Appendix B, χ².975 = 1.237347 and χ².025 = 14.4494. The rejection region is χ² < 1.237347 or χ² > 14.4494.

Since the observed value of the test statistic falls in the rejection region (29.04 > 14.4494), H0 is rejected. There is sufficient evidence to indicate that the variance is not equal to 1 at α = .05.

6.90


s² = (Σx² − (Σx)²/n)/(n − 1) = (176 − 30²/7)/(7 − 1) = 7.9048

To determine if σ² < 1, we test:

H0: σ² = 1
Ha: σ² < 1

The test statistic is χ² = (n − 1)s²/σ0² = (7 − 1)7.9048/1 = 47.43

The rejection region requires α = .05 in the lower tail of the χ² distribution with df = n − 1 = 7 − 1 = 6. From Table VII, Appendix B, χ².95 = 1.63539. The rejection region is χ² < 1.63539. Since the observed value of the test statistic does not fall in the rejection region (χ² = 47.43 ≮ 1.63539), H0 is not rejected. There is insufficient evidence to indicate that σ² < 1 at α = .05.

6.92

a.

To determine if the breaking strength variance of the new adhesive is less than the variance of the standard composite adhesive, σ2 = .25, we test: H0: σ2 = .25 Ha: σ2 < .25

b.

The rejection region requires α = .01 in the lower tail of the χ² distribution with df = n − 1 = 10 − 1 = 9. From Table VII, Appendix B, χ².99 = 2.087912. The rejection region is χ² < 2.087912.

c.

The test statistic is χ² = (n − 1)s²/σ0² = (10 − 1)(.46)²/.25 = 7.6176.

d.

Since the observed value of the test statistic does not fall in the rejection region (χ² = 7.6176 ≮ 2.087912), H0 is not rejected. There is insufficient evidence to indicate the breaking strength variance of the new adhesive is less than .25 at α = .01.

e.

We must assume that the distribution of the breaking strengths is approximately normal and that a random sample was selected from this population.

6.94

To determine if the true standard deviation of the point-spread errors exceeds 15 (variance exceeds 225), we test:

H0: σ² = 225
Ha: σ² > 225

The test statistic is χ² = (n − 1)s²/σ0² = (240 − 1)13.3²/225 = 187.896

The rejection region requires α in the upper tail of the χ² distribution with df = n − 1 = 240 − 1 = 239. The maximum value of df in Table VII is 100. Thus, we cannot find the rejection region using Table VII. Using a statistical package, the p-value associated with χ² = 187.896 is .9938. Since the p-value is so large, there is no evidence to reject H0. There is insufficient evidence to indicate that the true standard deviation of the point-spread errors exceeds 15 for any reasonable value of α. (Since the observed variance (or standard deviation) is less than the hypothesized value of the variance (or standard deviation) under H0, there is no way H0 will be rejected for any reasonable value of α.)

6.96

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: GASTURBINE

Variable     N    N*   Mean    SE Mean   StDev   Minimum   Q1     Median   Q3      Maximum
GASTURBINE   67   0    11066   195       1595    8714      9918   10656    11842   16243

To determine if the heat rates of the augmented gas turbine engine are more variable than the heat rates of the standard gas turbine engine, we test:

H0: σ² = 1,500²
Ha: σ² > 1,500²

The test statistic is χ² = (n − 1)s²/σ0² = (67 − 1)1,595²/1,500² = 74.625.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = n − 1 = 67 − 1 = 66. From Table VII, Appendix B, χ².05 ≈ 85.95148. The rejection region is χ² > 85.95148. Since the observed value of the test statistic does not fall in the rejection region (χ² = 74.625 ≯ 85.95148), H0 is not rejected. There is insufficient evidence to indicate the heat rates of the augmented gas turbine engine are more variable than the heat rates of the standard gas turbine engine at α = .05.

6.98

For a large sample test of hypothesis about a population mean, no assumptions are necessary because the Central Limit Theorem assures that the test statistic will be approximately normally distributed. For a small sample test of hypothesis about a population mean, we must assume that the population being sampled from is normal. The test statistic for the large sample test is the z statistic, and the test statistic for the small sample test is the t statistic.

6.100

The elements of the test of hypothesis that should be specified prior to analyzing the data are: null hypothesis, alternative hypothesis, and rejection region based on α.

6.102

α = P(Type I error) = P(rejecting H0 when it is true). Thus, if rejection of H0 would cause your firm to go out of business, you would want this probability or α to be small.

6.104

a.

H0: μ = 8.3
Ha: μ ≠ 8.3

The test statistic is z = (x̄ − μ0)/σx̄ = (8.2 − 8.3)/(.79/√175) = −1.67

The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96. Since the observed value of the test statistic does not fall in the rejection region (z = −1.67 ≮ −1.96), H0 is not rejected. There is insufficient evidence to indicate that the mean is different from 8.3 at α = .05.

b.

H0: μ = 8.4
Ha: μ ≠ 8.4

The test statistic is z = (x̄ − μ0)/σx̄ = (8.2 − 8.4)/(.79/√175) = −3.35

The rejection region is the same as part a, z < −1.96 or z > 1.96.

Since the observed value of the test statistic falls in the rejection region (−3.35 < −1.96), H0 is rejected. There is sufficient evidence to indicate that the mean is different from 8.4 at α = .05.

c.

H0: σ = 1, Ha: σ ≠ 1 or H0: σ² = 1, Ha: σ² ≠ 1

The test statistic is χ² = (n − 1)s²/σ0² = (175 − 1)(.79)²/1 = 108.59

The rejection region requires α/2 = .05/2 = .025 in each tail of the χ² distribution with df = n − 1 = 175 − 1 = 174. From Table VII, Appendix B, χ².025 ≈ 129.561 and χ².975 ≈ 74.2219. The rejection region is χ² > 129.561 or χ² < 74.2219. Since the observed value of the test statistic does not fall in the rejection region (χ² = 108.59 ≯ 129.561 and χ² = 108.59 ≮ 74.2219), H0 is not rejected. There is insufficient evidence to indicate that the variance is different from 1 at α = .05.

d.

In part a, the rejection region is z < −1.96 or z > 1.96. In terms of x̄, the rejection region would be:

z = (x̄ − μ0)/σx̄ ⇒ 1.96 = (x̄U − 8.3)/(.79/√175) ⇒ .117 = x̄U − 8.3 ⇒ x̄U = 8.417

z = (x̄ − μ0)/σx̄ ⇒ −1.96 = (x̄L − 8.3)/(.79/√175) ⇒ −.117 = x̄L − 8.3 ⇒ x̄L = 8.183

Based on x̄, the rejection region would be: Reject H0 if x̄ < 8.183 or x̄ > 8.417.

The power of the test is the probability the test statistic falls in the rejection region, given the alternative hypothesis is true. In this case, we will let μa = 8.5.

Power = P(x̄ < 8.183 | μa = 8.5) + P(x̄ > 8.417 | μa = 8.5)
= P(z < (8.183 − 8.5)/(.79/√175)) + P(z > (8.417 − 8.5)/(.79/√175))
= P(z < −5.31) + P(z > −1.39)
= (.5 − .5) + (.5 + .4177) = .9177 (Using Table IV, Appendix B)
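The two-tailed power calculation can be sketched the same way: convert the z cutoffs to x̄ cutoffs, then add the two tail areas under the alternative mean. The function name power_two_sided is illustrative, and the exact normal distribution is used in place of Table IV.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(mu_true, mu0, s, n, alpha):
    """Power of the two-tailed z test of H0: mu = mu0 when the mean is mu_true."""
    std = NormalDist()
    se = s / sqrt(n)
    z_crit = std.inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se   # x-bar cutoffs
    return std.cdf((lower - mu_true) / se) + 1 - std.cdf((upper - mu_true) / se)


# mu0 = 8.3, s = .79, n = 175, alpha = .05, alternative mean 8.5.
power = power_two_sided(8.5, 8.3, 0.79, 175, 0.05)
print(f"power at mu = 8.5: {power:.4f}")
```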




6.106

a.

The p-value = .1288 = P(t ≥ 1.174). Since the p-value is not very small, there is no evidence to reject H0 for α ≤ .10. There is no evidence to indicate the mean is greater than 10.

b.

We must assume that a random sample was selected from a population that is normally distributed.

c.

For the alternative hypothesis Ha: μ ≠ 10, the p-value is 2 times the p-value for the one-tailed test. The p-value = 2(.1288) = .2576. There is no evidence to reject H0 for α ≤ .10. There is no evidence to indicate the mean is different from 10.

6.108

a.

If we wish to test the research hypothesis that the mean GHQ score for all unemployed men exceeds 10, we test: H0: μ = 10 Ha: μ > 10 This is a one-tailed test. We are only interested in rejecting H0 if the mean GHQ score for all unemployed men is greater than 10.

b.

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.

c.


The test statistic is z = (x̄ − μ0)/σx̄ = (10.94 − 10.0)/(5.10/√49) = 1.29

Since the observed value of the test statistic does not fall in the rejection region (z = 1.29 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the mean GHQ score for all unemployed men is greater than 10 at α = .05.

d.

The p-value is P(z ≥ 1.29) = .5 − .4015 = .0985. (Using Table IV, Appendix B) The probability of observing our test statistic or anything more unusual, given H0 is true, is .0985. Since this value is not less than α = .05, we do not reject H0. There is insufficient evidence to indicate the mean GHQ score is greater than 10.

6.110

a.

The population parameter of interest is p = proportion of all television viewers with access to cable-TV who agree with the statement “Overall, I find the quality of news on cable networks to be better than news on the ABC, CBS, and NBC networks.”

b.

p̂ = x/n = 248/500 = .496

c.

To determine if the true proportion of TV-viewers who find cable news to be better quality than network news differs from .50, we test:

H0: p = .50
Ha: p ≠ .50

d.

The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.496 − .50)/√(.50(.50)/500) = −0.18

The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645 or z < −1.645. Since the observed value of the test statistic does not fall in the rejection region (z = −0.18 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that the true proportion of TV-viewers who find cable news to be better quality than network news differs from .50 at α = .10.

e.

In order for the inference to be valid, the sampling distribution of p̂ must be approximately normal. We check this assumption:

p0 ± 3σp̂ ⇒ p0 ± 3√(p0q0/n) ⇒ .5 ± 3√(.5(.5)/500) ⇒ .5 ± .067 ⇒ (.433, .567)

Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate.

6.112

a.

First, check to see if the normal approximation is adequate:

p0 ± 3σp̂ ⇒ p0 ± 3√(p0q0/n) ⇒ .25 ± 3√((.25)(.75)/159) ⇒ .25 ± .103 ⇒ (.147, .353)

Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate.

p̂ = x/n = 124/159 = .786

To determine if the percentage of truckers who suffer from sleep apnea differs from 25%, we test:

H0: p = .25
Ha: p ≠ .25

The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.786 − .25)/√((.25)(.75)/159) = 15.61

The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645 or z > 1.645.


Since the observed value of the test statistic falls in the rejection region (z = 15.61 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the percentage of truckers who suffer from sleep apnea differs from 25% at α = .10.

b.

The observed significance level is the p-value and is: p-value = P(z ≥ 15.61) + P(z ≤ −15.61) ≈ (.5 − .5) + (.5 − .5) = 0 Since the p-value is so small, we would reject H0 for any reasonable value of α. There is sufficient evidence to indicate that the percentage of truckers who suffer from sleep apnea differs from 25%.

c.

The inference from a confidence interval and a test of hypothesis must agree because the same numbers are used in both if the same level of significance is used.

6.114

a.

Let p = proportion of shoppers using cents-off coupons. To determine if the proportion of shoppers using cents-off coupons exceeds .65, we test:

H0: p = .65
Ha: p > .65

The test statistic is z = (p̂ − p0)/√(p0q0/n) = (.77 − .65)/√(.65(.35)/1,000) = 7.96

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 7.96 > 1.645), H0 is rejected. There is sufficient evidence to indicate the proportion of shoppers using cents-off coupons exceeds .65 at α = .05.

b.

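The sample size is large enough if the interval \( p_0 \pm 3\sigma_{\hat{p}} \) does not include 0 or 1.

\[ p_0 \pm 3\sigma_{\hat{p}} \Rightarrow p_0 \pm 3\sqrt{\frac{p_0 q_0}{n}} \Rightarrow .65 \pm 3\sqrt{\frac{.65(.35)}{1{,}000}} \Rightarrow .65 \pm .045 \Rightarrow (.605, .695) \]

Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate.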

c.

The p-value is p = P(z ≥ 7.96) ≈ .5 − .5 = 0 (using Table IV, Appendix B). Since the p-value is smaller than α = .05, H0 is rejected. There is sufficient evidence to indicate the proportion of shoppers using cents-off coupons exceeds .65 at α = .05.


6.116

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Tunnel

Variable    N    Mean  Median  TrMean  StDev  SE Mean
Tunnel     10   989.8   970.5   987.9  160.7     50.8

Variable  Minimum  Maximum     Q1      Q3
Tunnel      735.0   1260.0  862.5  1096.8

To determine whether peak hour pricing succeeded in reducing the average number of vehicles attempting to use the Lincoln Tunnel during the peak rush hour, we test:

H0: μ = 1,220
Ha: μ < 1,220

The test statistic is

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{989.8 - 1{,}220}{160.7/\sqrt{10}} = -4.53 \]

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − 1 = 10 − 1 = 9. From Table VI, Appendix B, t.05 = 1.833. The rejection region is t < −1.833. Since the observed value of the test statistic falls in the rejection region (t = −4.53 < −1.833), H0 is rejected. There is sufficient evidence to indicate that peak hour pricing succeeded in reducing the average number of vehicles attempting to use the Lincoln Tunnel during the peak rush hour at α = .05. 6.118

a.

To determine if the true mean number of pecks at the blue string is less than 7.5, we test:

H0: μ = 7.5
Ha: μ < 7.5

The test statistic is

\[ z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{1.13 - 7.5}{2.21/\sqrt{72}} = -24.46 \]

The rejection region requires α = .01 in the lower tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z < −2.33. Since the observed value of the test statistic falls in the rejection region (z = −24.46 < −2.33), H0 is rejected. There is sufficient evidence to indicate the true mean number of pecks at the blue string is less than 7.5 at α = .01.

b.

From Exercise 5.96, the 99% confidence interval is (.46, 1.80). Since the hypothesized value of the mean (μ = 7.5) does not fall in the confidence interval, it is not a likely candidate for the true value of the mean. Thus, you would reject it. This agrees with the conclusion in part a.
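The agreement between the test and the confidence interval can be checked numerically. The sketch below is an illustration only; it assumes the summary values quoted above (mean 1.13, standard deviation 2.21, n = 72) and uses the exact normal quantile from the standard library rather than the rounded table value.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n, mu0 = 1.13, 2.21, 72, 7.5

se = s / sqrt(n)          # estimated standard error of the sample mean
z = (x_bar - mu0) / se    # large-sample test statistic

# 99% confidence interval: x-bar +/- z_{.005} * se
z_crit = NormalDist().inv_cdf(0.995)
ci = (x_bar - z_crit * se, x_bar + z_crit * se)

# The hypothesized mean lies outside the interval, matching the rejection.
inside = ci[0] <= mu0 <= ci[1]
print(round(z, 2), tuple(round(v, 2) for v in ci), inside)
```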




6.120

a.

\( \hat{p} = 24/40 = .6 \)

To determine if the proportion of shoplifters turned over to police is greater than .5, we test:

H0: p = .5
Ha: p > .5

The test statistic is

\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{.6 - .5}{\sqrt{\frac{.5(.5)}{40}}} = 1.26 \]

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic does not fall in the rejection region (z = 1.26 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the proportion of shoplifters turned over to police is greater than .5 at α = .05.

b.

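To determine if the normal approximation is appropriate, we check:

\[ p_0 \pm 3\sigma_{\hat{p}} \Rightarrow p_0 \pm 3\sqrt{\frac{p_0 q_0}{n}} \Rightarrow .5 \pm 3\sqrt{\frac{(.5)(.5)}{40}} \Rightarrow .5 \pm .237 \Rightarrow (.263, .737) \]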

Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate. c.

The observed significance level of the test is p-value = P(z ≥ 1.26) = .5 − .3962 = .1038. The probability of observing the value of our test statistic or anything more unusual if the true value of p is .5 is .1038. Since this p-value is so large, there is no evidence to reject H0. There is no evidence to indicate the true proportion of shoplifters turned over to police is greater than .5.
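As a quick check of the table lookup, the upper-tail p-value can be computed directly. This is a sketch that uses Python's standard-library normal distribution in place of Table IV; the variable names are illustrative.

```python
from statistics import NormalDist

z = 1.26        # observed test statistic from part a
alpha = 0.05    # significance level used in part a

# Upper-tail p-value: P(Z >= z) for a standard normal Z.
p_value = 1 - NormalDist().cdf(z)

# H0 is rejected only when the p-value is smaller than alpha.
reject = p_value < alpha
print(round(p_value, 4), reject)
```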

d.

Any value of α that is greater than the p-value would lead one to reject H0. Thus, for this problem, we would reject H0 for any value of α > .1038.

6.122

a.

To determine whether the mean profit change for restaurants with frequency programs is greater than $1047.34, we test: H0: μ = 1047.34 Ha: μ > 1047.34

b.

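Some preliminary calculations are:

\[ \bar{x} = \frac{\sum x}{n} = \frac{30{,}113.17}{12} = 2{,}509.43 \]

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{126{,}379{,}568.8 - \frac{30{,}113.17^2}{12}}{12 - 1} = 4{,}619{,}331.955 \]

\[ s = \sqrt{4{,}619{,}331.955} = 2149.2631 \]

The test statistic is

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2509.43 - 1047.34}{2149.2631/\sqrt{12}} = 2.36 \]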

The rejection region requires α = .05 in the upper tail of the t-distribution with df = n − 1 = 12 − 1 = 11. From Table VI, Appendix B, t.05 = 1.796. The rejection region is t > 1.796. Since the observed value of the test statistic falls in the rejection region (t = 2.36 > 1.796), H0 is rejected. There is sufficient evidence to indicate the mean profit change for restaurants with frequency programs is greater than $1047.34 for α = .05. It appears that the frequency program would be profitable for the company if adopted nationwide. 6.124

a.

A Type II error would be concluding the mean amount of PCB in the air is less than or equal to 3 parts per million when, in fact, it is more than 3 parts per million.

b.

From Exercise 6.123,

\[ z = \frac{\bar{x}_0 - \mu_0}{\sigma/\sqrt{n}} \Rightarrow \bar{x}_0 = z\frac{\sigma}{\sqrt{n}} + \mu_0 \Rightarrow \bar{x}_0 = 2.33\frac{.5}{\sqrt{50}} + 3 \Rightarrow \bar{x}_0 = 3.165 \]

For μ = 3.1,

\[ \beta = P(\bar{x} \le 3.165) = P\left(z \le \frac{3.165 - 3.1}{.5/\sqrt{50}}\right) = P(z \le .92) = .5 + .3212 = .8212 \]

(from Table IV, Appendix B)

c.

Power = 1 − β = 1 − .8212 = .1788

d.

For μ = 3.2,

\[ \beta = P(\bar{x} \le 3.165) = P\left(z \le \frac{3.165 - 3.2}{.5/\sqrt{50}}\right) = P(z \le -.49) = .5 - .1879 = .3121 \]

Power = 1 − β = 1 − .3121 = .6879

As the plant's mean PCB departs further from 3, the power increases.
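The β and power calculations can be reproduced with a short Python sketch. It is illustrative only: the helper name `beta` is hypothetical, and because it uses the exact normal distribution instead of z-values rounded to two decimals, its results differ slightly from the table-based answers.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 3.0, 0.5, 50
z_alpha = 2.33                # table value for alpha = .01, as in the solution

se = sigma / sqrt(n)          # standard error of the sample mean
x0 = mu0 + z_alpha * se       # boundary of the rejection region for x-bar


def beta(mu_a):
    """P(Type II error): probability x-bar falls below x0 when mu = mu_a."""
    return NormalDist().cdf((x0 - mu_a) / se)


for mu_a in (3.1, 3.2):
    b = beta(mu_a)
    print(mu_a, round(b, 4), round(1 - b, 4))  # alternative, beta, power
```

The loop shows β shrinking, and power growing, as the alternative mean moves away from 3.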




6.126

a.

Some preliminary calculations:

\[ \bar{x} = \frac{\sum x}{n} = \frac{79.93}{5} = 15.986 \]

\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1{,}277.7627 - \frac{79.93^2}{5}}{5 - 1} = .00043 \]

\[ s = \sqrt{.00043} = .0207 \]

To determine if the mean measurement differs from 16.01, we test:

H0: μ = 16.01
Ha: μ ≠ 16.01

The test statistic is

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{15.986 - 16.01}{.0207/\sqrt{5}} = -2.59 \]

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 1 = 5 − 1 = 4. From Table VI, Appendix B, t.025 = 2.776. The rejection region is t < −2.776 or t > 2.776. Since the observed value of the test statistic does not fall in the rejection region (t = −2.59 ≮ −2.776), H0 is not rejected. There is insufficient evidence to indicate the mean measurement differs from 16.01 at α = .05.

b.

We must assume that the sample of measurements was randomly selected from a population of measurements that is normally distributed.

c.

To determine if the standard deviation of the weight measurements is greater than .01, we test:

H0: σ² = .01²
Ha: σ² > .01²

The test statistic is

\[ \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(5 - 1)(.0207)^2}{(.01)^2} = 17.1396 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = n − 1 = 5 − 1 = 4. From Table VII, Appendix B, χ².05 = 9.48773. The rejection region is χ² > 9.48773. Since the observed value of the test statistic falls in the rejection region (χ² = 17.1396 > 9.48773), H0 is rejected. There is sufficient evidence to indicate the standard deviation of the weight measurements is greater than .01 at α = .05.


6.128

a.

Let pi = proportion of first-round games won by the ith seed. To determine if the higher seed has a better than 50-50 chance of winning a first-round game, we test:

H0: pi = .5
Ha: pi > .5 for i = 1, 2, 3, …, 8

The test statistic is

\[ z_i = \frac{\hat{p}_i - p_0}{\sqrt{\frac{p_0 q_0}{n}}} \]

No value of α was given. We will use α = .05. The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.

\( \hat{p}_i = x_i/n \). Thus,

\[ \hat{p}_1 = \frac{52}{52} = 1, \quad \hat{p}_2 = \frac{49}{52} = .942, \quad \hat{p}_3 = \frac{41}{52} = .788, \quad \hat{p}_4 = \frac{42}{52} = .808, \]

\[ \hat{p}_5 = \frac{37}{52} = .712, \quad \hat{p}_6 = \frac{36}{52} = .692, \quad \hat{p}_7 = \frac{35}{52} = .673, \quad \hat{p}_8 = \frac{22}{52} = .423 \]

The corresponding test statistics are:

\[ z_1 = \frac{1.00 - .5}{\sqrt{.5(.5)/52}} = 7.21, \quad z_2 = \frac{.942 - .5}{\sqrt{.5(.5)/52}} = 6.37, \]

\[ z_3 = \frac{.788 - .5}{\sqrt{.5(.5)/52}} = 4.15, \quad z_4 = \frac{.808 - .5}{\sqrt{.5(.5)/52}} = 4.44, \]

\[ z_5 = \frac{.712 - .5}{\sqrt{.5(.5)/52}} = 3.06, \quad z_6 = \frac{.692 - .5}{\sqrt{.5(.5)/52}} = 2.77, \]

\[ z_7 = \frac{.673 - .5}{\sqrt{.5(.5)/52}} = 2.50, \quad z_8 = \frac{.423 - .5}{\sqrt{.5(.5)/52}} = -1.11 \]

For games matching 1 and 16, since the observed value of the test statistic falls in the rejection region (z1 = 7.21 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #1 seed has a better than 50-50 chance of winning a first-round game at α = .05.




For games matching 2 and 15, since the observed value of the test statistic falls in the rejection region (z2 = 6.37 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #2 seed has a better than 50-50 chance of winning a first-round game at α = .05. For games matching 3 and 14, since the observed value of the test statistic falls in the rejection region (z3 = 4.15 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #3 seed has a better than 50-50 chance of winning a first-round game at α = .05. For games matching 4 and 13, since the observed value of the test statistic falls in the rejection region (z4 = 4.44 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #4 seed has a better than 50-50 chance of winning a first-round game at α = .05. For games matching 5 and 12, since the observed value of the test statistic falls in the rejection region (z5 = 3.06 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #5 seed has a better than 50-50 chance of winning a first-round game at α = .05. For games matching 6 and 11, since the observed value of the test statistic falls in the rejection region (z6 = 2.77 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #6 seed has a better than 50-50 chance of winning a first-round game at α = .05. For games matching 7 and 10, since the observed value of the test statistic falls in the rejection region (z7 = 2.50 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #7 seed has a better than 50-50 chance of winning a first-round game at α = .05. For games matching 8 and 9, since the observed value of the test statistic does not fall in the rejection region (z8 = −1.11 >/ 1.645), H0 is not rejected. There is insufficient evidence to indicate the #8 seed has a better than 50-50 chance of winning a first-round game at α = .05. b.

Let μi = mean margin of victory. To determine if the mean margin of victory is greater than 10 points, we test:

H0: μi = 10
Ha: μi > 10 for i = 1, 2, 3, and 4

The test statistic is

\[ z_i = \frac{\bar{x}_i - \mu_0}{\sigma_{\bar{x}}} \]

No value of α was given. We will use α = .05. The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.

The test statistics are:

\[ z_1 = \frac{\bar{x}_1 - \mu_0}{\sigma_{\bar{x}}} = \frac{22.9 - 10}{12.4/\sqrt{52}} = 7.50, \quad z_2 = \frac{\bar{x}_2 - \mu_0}{\sigma_{\bar{x}}} = \frac{17.2 - 10}{11.4/\sqrt{52}} = 4.55, \]

\[ z_3 = \frac{\bar{x}_3 - \mu_0}{\sigma_{\bar{x}}} = \frac{10.6 - 10}{12.0/\sqrt{52}} = 0.36, \quad z_4 = \frac{\bar{x}_4 - \mu_0}{\sigma_{\bar{x}}} = \frac{10.0 - 10}{12.5/\sqrt{52}} = 0 \]

For games matching 1 and 16, since the observed value of the test statistic falls in the rejection region (z1 = 7.50 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #1 seed wins by more than 10 points in first-round games at α = .05. For games matching 2 and 15, since the observed value of the test statistic falls in the rejection region (z2 = 4.55 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #2 seed wins by more than 10 points in first-round games at α = .05. For games matching 3 and 14, since the observed value of the test statistic does not fall in the rejection region (z3 = 0.36 >/ 1.645), H0 is not rejected. There is insufficient evidence to indicate the #3 seed wins by more than 10 points in first-round games at α = .05.

For games matching 4 and 13, since the observed value of the test statistic does not fall in the rejection region (z4 = 0 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the #4 seed wins by more than 10 points in first-round games at α = .05.

c.

Let μi = mean margin of victory. To determine if the mean margin of victory is less than 5 points, we test:

H0: μi = 5
Ha: μi < 5 for i = 5, 6, 7, and 8

The test statistic is

\[ z_i = \frac{\bar{x}_i - \mu_0}{\sigma_{\bar{x}}} \]

No value of α was given. We will use α = .05. The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645.

The test statistics are:

\[ z_5 = \frac{\bar{x}_5 - \mu_0}{\sigma_{\bar{x}}} = \frac{5.3 - 5}{10.4/\sqrt{52}} = 0.21, \quad z_6 = \frac{\bar{x}_6 - \mu_0}{\sigma_{\bar{x}}} = \frac{4.3 - 5}{10.7/\sqrt{52}} = -.47, \]

\[ z_7 = \frac{\bar{x}_7 - \mu_0}{\sigma_{\bar{x}}} = \frac{3.2 - 5}{10.5/\sqrt{52}} = -1.24, \quad z_8 = \frac{\bar{x}_8 - \mu_0}{\sigma_{\bar{x}}} = \frac{-2.1 - 5}{11.0/\sqrt{52}} = -4.65 \]

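For games matching 5 and 12, since the observed value of the test statistic does not fall in the rejection region (z5 = 0.21 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the mean margin of victory for the #5 seed is less than 5 points in first-round games at α = .05.

For games matching 6 and 11, since the observed value of the test statistic does not fall in the rejection region (z6 = −.47 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the mean margin of victory for the #6 seed is less than 5 points in first-round games at α = .05.

For games matching 7 and 10, since the observed value of the test statistic does not fall in the rejection region (z7 = −1.24 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the mean margin of victory for the #7 seed is less than 5 points in first-round games at α = .05.

For games matching 8 and 9, since the observed value of the test statistic falls in the rejection region (z8 = −4.65 < −1.645), H0 is rejected. There is sufficient evidence to indicate the mean margin of victory for the #8 seed is less than 5 points in first-round games at α = .05.

d.

To determine if the standard deviation of victory margin differs from 11, we test:

H0: σi² = 11² = 121
Ha: σi² ≠ 11² = 121

The test statistic is

\[ \chi_i^2 = \frac{(n - 1)s_i^2}{\sigma_0^2} \]

No α level was given, so we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each tail of the χ² distribution with df = n − 1 = 52 − 1 = 51. From Table VII, Appendix B, χ².025 = 71.4202 and χ².975 = 32.3574. The rejection region is χ² < 32.3574 or χ² > 71.4202. The test statistics are: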

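\[ \chi_1^2 = \frac{(n - 1)s_1^2}{\sigma_0^2} = \frac{(52 - 1)(12.4)^2}{121} = 64.808, \quad \chi_2^2 = \frac{(n - 1)s_2^2}{\sigma_0^2} = \frac{(52 - 1)(11.4)^2}{121} = 54.777, \]

\[ \chi_3^2 = \frac{(n - 1)s_3^2}{\sigma_0^2} = \frac{(52 - 1)(12.0)^2}{121} = 60.694, \quad \chi_4^2 = \frac{(n - 1)s_4^2}{\sigma_0^2} = \frac{(52 - 1)(12.5)^2}{121} = 65.857, \]

\[ \chi_5^2 = \frac{(n - 1)s_5^2}{\sigma_0^2} = \frac{(52 - 1)(10.4)^2}{121} = 45.588, \quad \chi_6^2 = \frac{(n - 1)s_6^2}{\sigma_0^2} = \frac{(52 - 1)(10.7)^2}{121} = 48.256, \]

\[ \chi_7^2 = \frac{(n - 1)s_7^2}{\sigma_0^2} = \frac{(52 - 1)(10.5)^2}{121} = 46.469, \quad \chi_8^2 = \frac{(n - 1)s_8^2}{\sigma_0^2} = \frac{(52 - 1)(11)^2}{121} = 51.000 \]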


For each of the eight matchups (1 and 16, 2 and 15, 3 and 14, 4 and 13, 5 and 12, 6 and 11, 7 and 10, 8 and 9), the observed value of the test statistic does not fall in the rejection region: every χi² lies between 32.3574 and 71.4202 (for example, χ1² = 64.808 ≯ 71.4202 and χ1² = 64.808 ≮ 32.3574). In each case H0 is not rejected. There is insufficient evidence to indicate the standard deviation of the victory margin differs from 11 for any of the matchups at α = .05.
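The eight chi-square statistics can be recomputed in a few lines of Python. This is a sketch for checking the arithmetic only; the dictionary layout is illustrative, and the critical values are the table values quoted above rather than values computed by the code.

```python
n = 52
sigma0_sq = 11 ** 2                 # hypothesized variance, 121

# Sample standard deviations of victory margin by seed, from the solution.
s = {1: 12.4, 2: 11.4, 3: 12.0, 4: 12.5, 5: 10.4, 6: 10.7, 7: 10.5, 8: 11.0}

lower, upper = 32.3574, 71.4202     # table values quoted in the text

# Chi-square statistic (n - 1) s^2 / sigma0^2 for each seed.
chi_sq = {seed: (n - 1) * sd ** 2 / sigma0_sq for seed, sd in s.items()}

# Seeds whose statistic falls in either tail of the rejection region.
rejected = [seed for seed, v in chi_sq.items() if v < lower or v > upper]

for seed, v in chi_sq.items():
    print(seed, round(v, 3))
print("rejected:", rejected)
```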



e.

Let μ = mean difference in game outcome and point spread. To determine if the point spread is a good predictor of the victory margin, we test:

H0: μ = 0
Ha: μ ≠ 0

The test statistic is

\[ z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{.7 - 0}{11.3/\sqrt{360}} = 1.18 \]

Since no α was given, we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z > 1.96 or z < −1.96. Since the observed value of the test statistic does not fall in the rejection region (z = 1.18 >/ 1.96), H0 is not rejected. There is insufficient evidence to indicate there is a difference in the game outcome and point spread at α = .05. There is no evidence to indicate the point spread is not a good predictor of the victory margin. 6.130

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Candy

Variable  N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Candy     5   0  24.00     1.67   3.74    21.00  21.00   23.00  27.50    30.00

To give the benefit of the doubt to the students we will use a small value of α. (We do not want to reject H0 when it is true to favor the students.) Thus, we will use α = .001.

We must also assume that the sample comes from a normal distribution. To determine if the mean number of candies exceeds 15, we test:

H0: μ = 15
Ha: μ > 15

The test statistic is

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{22 - 15}{3/\sqrt{5}} = 5.22 \]

The rejection region requires α = .001 in the upper tail of the z-distribution. From Table IV, Appendix B, z.001 = 3.08. The rejection region is z > 3.08. Since the observed value of the test statistic falls in the rejection region (z = 5.22 > 3.08), H0 is rejected. There is sufficient evidence to indicate the mean number of candies exceeds 15 at α = .001.

Inferences Based on Two Samples: Confidence Intervals and Tests of Hypothesis

Chapter 7

7.2

a.

\[ \mu_{\bar{x}_1} = \mu_1 = 12, \qquad \sigma_{\bar{x}_1} = \frac{\sigma_1}{\sqrt{n_1}} = \frac{4}{\sqrt{64}} = .5 \]

b.

\[ \mu_{\bar{x}_2} = \mu_2 = 10, \qquad \sigma_{\bar{x}_2} = \frac{\sigma_2}{\sqrt{n_2}} = \frac{3}{\sqrt{64}} = .375 \]

c.

\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 12 - 10 = 2 \]

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{4^2}{64} + \frac{3^2}{64}} = \sqrt{\frac{25}{64}} = .625 \]

d.

Since n1 ≥ 30 and n2 ≥ 30, the sampling distribution of \( \bar{x}_1 - \bar{x}_2 \) is approximately normal by the Central Limit Theorem.

7.4

Assumptions about the two populations:

1. Both sampled populations have relative frequency distributions that are approximately normal.
2. The population variances are equal.

Assumptions about the two samples: The samples are randomly and independently selected from the population.

7.6

a.

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(25 - 1)120 + (25 - 1)100}{25 + 25 - 2} = \frac{5280}{48} = 110 \]

b.

\[ s_p^2 = \frac{(20 - 1)12 + (10 - 1)20}{20 + 10 - 2} = \frac{408}{28} = 14.5714 \]

c.

\[ s_p^2 = \frac{(6 - 1).15 + (10 - 1).2}{6 + 10 - 2} = \frac{2.55}{14} = .1821 \]

d.

\[ s_p^2 = \frac{(16 - 1)3000 + (17 - 1)2500}{16 + 17 - 2} = \frac{85{,}000}{31} = 2741.9355 \]

e.

sp2 falls near the variance with the larger sample size.
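The pooled-variance formula used in parts a through d is easy to check with a small Python sketch. The function name `pooled_variance` is hypothetical and is introduced here only for illustration.

```python
def pooled_variance(n1, var1, n2, var2):
    """Pooled estimate of a common variance from two independent samples.

    Each sample variance is weighted by its degrees of freedom (n - 1).
    """
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


# Parts a through d: (n1, s1^2, n2, s2^2)
cases = {
    "a": (25, 120, 25, 100),
    "b": (20, 12, 10, 20),
    "c": (6, 0.15, 10, 0.2),
    "d": (16, 3000, 17, 2500),
}
for label, args in cases.items():
    print(label, round(pooled_variance(*args), 4))
```

Because the weights are the degrees of freedom, the result always lies between the two sample variances and sits closer to the one from the larger sample.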



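7.8

a.

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{9}{100} + \frac{16}{100}} = \sqrt{.25} = .5 \]

b.

The sampling distribution of \( \bar{x}_1 - \bar{x}_2 \) is approximately normal by the Central Limit Theorem since n1 ≥ 30 and n2 ≥ 30.

\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 10 \]

c.

\( \bar{x}_1 - \bar{x}_2 = 15.5 - 26.6 = -11.1 \)

Yes, it appears that \( \bar{x}_1 - \bar{x}_2 = -11.1 \) contradicts the null hypothesis H0: μ1 − μ2 = 10.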

d.

The rejection region requires α/2 = .025 = .05/2 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.

e.

H0: μ1 − μ2 = 10
Ha: μ1 − μ2 ≠ 10

The test statistic is

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - 10}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(15.5 - 26.6) - 10}{.5} = -42.2 \]

The rejection region is z < −1.96 or z > 1.96. (Refer to part d.) Since the observed value of the test statistic falls in the rejection region (z = −42.2 < −1.96), H0 is rejected. There is sufficient evidence to indicate the difference in the population means is not equal to 10 at α = .05. f.

The form of the confidence interval is:

\[ (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:

\[ (15.5 - 26.6) \pm 1.96\sqrt{\frac{9}{100} + \frac{16}{100}} \Rightarrow -11.1 \pm .98 \Rightarrow (-12.08, -10.12) \]

We are 95% confident that the difference in the two means is between −12.08 and −10.12. g.


The confidence interval gives more information.
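To see how the test in part e and the interval in part f come from the same ingredients, here is a minimal Python sketch. It assumes the values used in this exercise (known variances 9 and 16, n1 = n2 = 100) and takes the critical value from the standard-library normal distribution instead of Table IV.

```python
from math import sqrt
from statistics import NormalDist

x1, x2 = 15.5, 26.6                  # sample means
var1, var2, n1, n2 = 9, 16, 100, 100
d0 = 10                              # hypothesized difference under H0

# Standard error of x1-bar minus x2-bar.
se = sqrt(var1 / n1 + var2 / n2)

# Test statistic for H0: mu1 - mu2 = d0.
z = ((x1 - x2) - d0) / se

# 95% confidence interval for mu1 - mu2.
z_crit = NormalDist().inv_cdf(0.975)
ci = ((x1 - x2) - z_crit * se, (x1 - x2) + z_crit * se)

print(round(z, 1), tuple(round(v, 2) for v in ci))
```

The interval excludes 10, which is the same conclusion as the test, and it also shows where the difference plausibly lies.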



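7.10

Some preliminary calculations:

\[ \bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{654}{15} = 43.6 \]

\[ s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \frac{28934 - \frac{654^2}{15}}{15 - 1} = \frac{419.6}{14} = 29.9714 \]

\[ \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{858}{16} = 53.625 \]

\[ s_2^2 = \frac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = \frac{46450 - \frac{858^2}{16}}{16 - 1} = \frac{439.75}{15} = 29.3167 \]

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(15 - 1)29.9714 + (16 - 1)29.3167}{15 + 16 - 2} = \frac{859.3501}{29} = 29.6328 \]

a.

H0: μ2 − μ1 = 10
Ha: μ2 − μ1 > 10

The test statistic is

\[ t = \frac{(\bar{x}_2 - \bar{x}_1) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(53.625 - 43.6) - 10}{\sqrt{29.6328\left(\frac{1}{15} + \frac{1}{16}\right)}} = \frac{.025}{1.9564} = .013 \]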

The rejection region requires α = .01 in the upper tail of the t-distribution with df = n1 + n2 − 2 = 15 + 16 − 2 = 29. From Table VI, Appendix B, t.01 = 2.462. The rejection region is t > 2.462. Since the test statistic does not fall in the rejection region (t = .013 >/ 2.462), H0 is not rejected. There is insufficient evidence to conclude μ2 − μ1 > 10 at α = .01. b.

For confidence coefficient .98, α = .02 and α/2 = .01. From Table VI, Appendix B, with df = n1 + n2 − 2 = 15 + 16 − 2 = 29, t.01 = 2.462. The 98% confidence interval for (μ2 − μ1) is:

\[ (\bar{x}_2 - \bar{x}_1) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (53.625 - 43.6) \pm 2.462\sqrt{29.6328\left(\frac{1}{15} + \frac{1}{16}\right)} \Rightarrow 10.025 \pm 4.817 \Rightarrow (5.208, 14.842) \]

We are 98% confident that the difference between the mean of population 2 and the mean of population 1 is between 5.208 and 14.842.


7.12

a.

Let μ1 = mean carat size of diamonds certified by GIA and μ2 = mean carat size of diamonds certified by HRD. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:

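\[ (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \Rightarrow (.6723 - .8129) \pm 1.96\sqrt{\frac{.2456^2}{151} + \frac{.1831^2}{79}} \Rightarrow -.1406 \pm .0563 \Rightarrow (-.1969, -.0843) \]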

b.

We are 95% confident that the difference in mean carat size between diamonds certified by GIA and those certified by HRD is between -.1969 and -.0843.

c.

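Let μ3 = mean carat size of diamonds certified by IGI.

\[ (\bar{x}_1 - \bar{x}_3) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_3^2}{n_3}} \Rightarrow (.6723 - .3665) \pm 1.96\sqrt{\frac{.2456^2}{151} + \frac{.2163^2}{78}} \Rightarrow .3058 \pm .0620 \Rightarrow (.2438, .3678) \]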

d.

We are 95% confident that the difference in mean carat size between diamonds certified by GIA and those certified by IGI is between .2438 and .3678.

e.

\[ (\bar{x}_2 - \bar{x}_3) \pm z_{\alpha/2}\sqrt{\frac{\sigma_2^2}{n_2} + \frac{\sigma_3^2}{n_3}} \Rightarrow (.8129 - .3665) \pm 1.96\sqrt{\frac{.1831^2}{79} + \frac{.2163^2}{78}} \Rightarrow .4464 \pm .0627 \Rightarrow (.3837, .5091) \]

f.

We are 95% confident that the difference in mean carat size between diamonds certified by HRD and those certified by IGI is between .3837 and .5091.

7.14

a.

Let μ1 = mean score for males and μ2 = mean score for females. For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:

\[ (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \Rightarrow (39.08 - 38.79) \pm 1.645\sqrt{\frac{6.73^2}{127} + \frac{6.94^2}{114}} \Rightarrow 0.29 \pm 1.452 \Rightarrow (-1.162, 1.742) \]

We are 90% confident that the difference in mean service-rating scores between males and females is between −1.162 and 1.742.

b.

Because 0 falls in the 90% confidence interval, there is no evidence of a difference in the mean service-rating scores between males and females.


7.16

a.

The descriptive statistics are:

Descriptive Statistics: US, Japan

Variable  N   Mean  Median  TrMean  StDev  SE Mean
US        5  6.562   6.870   6.562  1.217    0.544
Japan     5  3.118   3.220   3.118  1.227    0.549

Variable  Minimum  Maximum     Q1     Q3
US          4.770    8.000  5.415  7.555
Japan       1.920    4.910  1.970  4.215

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(5 - 1)1.217^2 + (5 - 1)1.227^2}{5 + 5 - 2} = 1.4933 \]

To determine if the mean annual percentage turnover for U.S. plants exceeds that for Japanese plants, we test:

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 > 0

The test statistic is

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(6.562 - 3.118) - 0}{\sqrt{1.4933\left(\frac{1}{5} + \frac{1}{5}\right)}} = 4.456 \]

The rejection region requires α = .05 in the upper tail of the t-distribution with df = n1 + n2 − 2 = 5 + 5 − 2 = 8. From Table VI, Appendix B, t.05 = 1.860. The rejection region is t > 1.860. Since the observed value of the test statistic falls in the rejection region (t = 4.46 > 1.860), H0 is rejected. There is sufficient evidence to indicate the mean annual percentage turnover for U.S. plants exceeds that for Japanese plants at α = .05. b.

The p-value = P(t ≥ 4.456). Using Table VI, Appendix B, with df = n1 + n2 − 2 = 5 + 5 − 2 = 8, .001 < P(t ≥ 4.456) < .005. Since the p-value is so small, there is evidence to reject H0 for α ≥ .005.

c.

The necessary assumptions are:

1. Both sampled populations are approximately normal.
2. The population variances are equal.
3. The samples are randomly and independently sampled.

There is no indication that the populations are not normal. Both sample variances are similar, so there is no evidence the population variances are unequal. There is no indication the assumptions are not valid.
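The pooled two-sample t statistic in part a can be recomputed from the MINITAB summary values. The sketch below is illustrative only: `pooled_t` is a hypothetical helper, and the critical value 1.860 is the table value quoted in the solution, since the standard library has no t-distribution.

```python
from math import sqrt


def pooled_t(x1, s1, n1, x2, s2, n2, d0=0.0):
    """Return (pooled variance, t statistic, df) for H0: mu1 - mu2 = d0."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = ((x1 - x2) - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, df


# U.S. plants versus Japanese plants, from the descriptive statistics.
sp2, t, df = pooled_t(6.562, 1.217, 5, 3.118, 1.227, 5)

t_crit = 1.860  # t_.05 with 8 df, from Table VI as quoted in the text
print(round(sp2, 4), round(t, 3), df, t > t_crit)
```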




7.18

Let μ1 = the mean relational intimacy score for participants in the CMC group and μ2 = the mean relational intimacy score for participants in the FTF group. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: CMC, FTF

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
CMC       24   0  3.500    0.159  0.780    2.000  3.000   3.500  4.000    5.000
FTF       24   0  3.542    0.134  0.658    2.000  3.000   4.000  4.000    5.000

Some preliminary calculations are:

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(24 - 1).780^2 + (24 - 1).658^2}{24 + 24 - 2} = 0.5207 \]

To determine if the mean relational intimacy score for participants in the CMC group is lower than the mean relational intimacy score for participants in the FTF group, we test:

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 < 0

The test statistic is

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(3.500 - 3.542) - 0}{\sqrt{.5207\left(\frac{1}{24} + \frac{1}{24}\right)}} = \frac{-0.042}{.20831} = -.20 \]

The rejection region requires α = .10 in the lower tail of the t-distribution with df = n1 + n2 − 2 = 24 + 24 − 2 = 46. From Table VI, Appendix B, t.10 ≈ 1.303. The rejection region is t < −1.303. Since the observed value of the test statistic does not fall in the rejection region (t = −.20 ≮ −1.303), H0 is not rejected. There is insufficient evidence to indicate that the mean relational intimacy score for participants in the CMC group is lower than the mean relational intimacy score for participants in the FTF group at α = .10.

7.20

a.

The first population is the set of responses for all business students who have access to lecture notes and the second population is the set of responses for all business students not having access to lecture notes.


b.

To determine if there is a difference in the mean response of the two groups, we test:

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0

The test statistic is

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(8.48 - 7.80) - 0}{\sqrt{\frac{.94}{86} + \frac{2.99}{35}}} = 2.19 \]

The rejection region requires α/2 = .01/2 = .005 in each tail of the z-distribution. From Table IV, Appendix B, z.005 = 2.58. The rejection region is z < −2.58 or z > 2.58. Since the observed value of the test statistic does not fall in the rejection region (z = 2.19 >/ 2.58), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean response of the two groups at α = .01. c.

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The confidence interval is:

\[ (\bar{x}_1 - \bar{x}_2) \pm z_{.005}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \Rightarrow (8.48 - 7.80) \pm 2.58\sqrt{\frac{.94}{86} + \frac{2.99}{35}} \Rightarrow .68 \pm .801 \Rightarrow (-.121, 1.481) \]

We are 99% confident that the difference in the mean response between the two groups is between −.121 and 1.481.

d.

A 95% confidence interval would be narrower than the 99% confidence interval. The z value used in the 95% confidence interval is z.025 = 1.96 compared with the z value used in the 99% confidence interval of z.005 = 2.58.

7.22

a.

The bacteria counts are probably normally distributed because each count is the median of five measurements from the same specimen.

b.

Let μ1 = mean of the bacteria count for the discharge and μ2 = mean of the bacteria count upstream. Since we want to test if the mean of the bacteria count for the discharge exceeds the mean of the count upstream, we test: H0: μ1 − μ2 = 0 Ha: μ1 − μ2 > 0

c.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Plant, Upstream

Variable  N    Mean  Median  TrMean  StDev  SE Mean
Plant     6   32.10   31.75   32.10   3.19     1.30
Upstream  6  29.617  30.000  29.617  2.355    0.961

Variable  Minimum  Maximum      Q1      Q3
Plant       28.20    36.20   29.40   35.23
Upstream   26.400   32.300  27.075  31.850

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(6 - 1)3.19^2 + (6 - 1)2.355^2}{6 + 6 - 2} = 7.861 \]

The test statistic is

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(32.10 - 29.617) - 0}{\sqrt{7.861\left(\frac{1}{6} + \frac{1}{6}\right)}} = 1.53 \]

No α level was given, so we will use α = .05. The rejection region requires α = .05 in the upper tail of the t-distribution with df = n1 + n2 − 2 = 6 + 6 – 2 = 10. From Table VI, Appendix B, t.05 = 1.812. The rejection region is t > 1.812. Since the observed value of the test statistic does not fall in the rejection region (t = 1.53 >/ 1.812), H0 is not rejected. There is insufficient evidence to indicate the mean bacteria count for the discharge exceeds the mean of the count upstream at α = .05. d.

We must assume:

1. The mean counts per specimen for each location are normally distributed.
2. The variances of the 2 distributions are equal.
3. Independent and random samples were selected from each population.

7.24

a.

We cannot make inferences about the difference between the mean salaries of male and female accounting/finance/banking professionals because no standard deviations are provided.

b.

To determine if the mean salary for males is significantly greater than that for females, we test:

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 > 0

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. To make things easier, we will assume that the standard deviations for the 2 groups are the same. The test statistic is

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(69{,}848 - 52{,}012) - 0}{\sqrt{\sigma^2\left(\frac{1}{1400} + \frac{1}{1400}\right)}} = \frac{17{,}836}{\sigma(.037796)} = \frac{471{,}896.2038}{\sigma} \]

In order to reject H0 this test statistic must fall in the rejection region, or be greater than 1.645. Solving for σ we get:

\[ z = \frac{471{,}896.2038}{\sigma} > 1.645 \Rightarrow \sigma < \frac{471{,}896.2038}{1.645} = 286{,}866.99 \]

Thus, to reject H0 the common standard deviation of the two groups has to be less than $286,866.99.

c.

Yes. In fact, reasonable values for the standard deviation will be around $5,000, which is much smaller than the required $286,866.99.

d.

These data were collected from voluntary subjects who responded to a Web-based survey. Thus, this is not a random sample, but a self-selected sample. Generally, subjects who respond to surveys tend to have very strong opinions, which may not be the same as the population in general. Thus, the results from this self-selected sample may not reflect the results from the population in general.

7.26

a.

Pair  Difference
1     3
2     2
3     2
4     4
5     0
6     1

\[ \bar{d} = \frac{\sum_{i=1}^{n_d} d_i}{n_d} = \frac{12}{6} = 2 \]

\[ s_d^2 = \frac{\sum_{i=1}^{n_d} d_i^2 - \frac{\left(\sum_{i=1}^{n_d} d_i\right)^2}{n_d}}{n_d - 1} = \frac{34 - \frac{(12)^2}{6}}{5} = 2 \]

b.

μd = μ1 − μ2

c.

For confidence coefficient .95, α = .05 and α/2 = .025. From Table VI, Appendix B, with df = nD − 1 = 6 − 1 = 5, t.025 = 2.571. The confidence interval is:

\[ \bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n_d}} \Rightarrow 2 \pm 2.571\frac{\sqrt{2}}{\sqrt{6}} \Rightarrow 2 \pm 1.484 \Rightarrow (.516, 3.484) \]
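The paired-difference summary statistics and the interval can be verified with the standard library. This is a sketch, not the text's method: the critical value 2.571 is the table value quoted above (Python's standard library has no t quantile function), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

d = [3, 2, 2, 4, 0, 1]       # paired differences from part a
n_d = len(d)

d_bar = mean(d)              # mean difference
s_d_sq = variance(d)         # sample variance of the differences (n - 1 divisor)

t_crit = 2.571               # t_.025 with 5 df, from Table VI as quoted
half_width = t_crit * sqrt(s_d_sq / n_d)
ci = (d_bar - half_width, d_bar + half_width)

print(d_bar, s_d_sq, tuple(round(v, 3) for v in ci))
```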




d.

H0: μd = 0
Ha: μd ≠ 0

The test statistic is

\[ t = \frac{\bar{d}}{s_d/\sqrt{n_d}} = \frac{2}{\sqrt{2}/\sqrt{6}} = 3.46 \]

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = nD − 1 = 6 − 1 = 5. From Table VI, Appendix B, t.025 = 2.571. The rejection region is t < −2.571 or t > 2.571. Since the observed value of the test statistic falls in the rejection region (3.46 > 2.571), H0 is rejected. There is sufficient evidence to indicate that the mean difference is different from 0 at α = .05. 7.28

a.

H0: μ1 − μ2 = 0 Ha: μ1 − μ2 < 0 The rejection region requires α = .10 in the lower tail of the z-distribution. From Table IV, Appendix B, z.10 = 1.28. The rejection region is z < −1.28.

b.

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 < 0

The test statistic is

\[ z = \frac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \frac{-3.5 - 0}{\sqrt{21}/\sqrt{38}} = -4.71 \]

The rejection region is z < −1.28 (Refer to part a.) Since the observed value of the test statistic falls in the rejection region (z = −4.71 < −1.28), H0 is rejected. There is sufficient evidence to indicate μ1 − μ2 < 0 at α = .10. c.

Since the sample size of the number of pairs is greater than 30, we do not need to assume that the population of differences is normal. The sampling distribution of d is approximately normal by the Central Limit Theorem. We must assume that the differences are randomly selected.

d.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:

\[ \bar{d} \pm z_{.05}\frac{s_d}{\sqrt{n_d}} \Rightarrow -3.5 \pm 1.645\frac{\sqrt{21}}{\sqrt{38}} \Rightarrow -3.5 \pm 1.223 \Rightarrow (-4.723, -2.277) \]

e.

The confidence interval provides more information since it gives an interval of possible values for the difference between the population means.


7.30

a.

Let μ1 = the mean salary of technology professionals in 2003 and μ2 = the mean salary of technology professionals in 2005. Let μd = μ1 − μ2. To determine if the mean salary of technology professionals at all U.S. metropolitan areas has increased between 2003 and 2005, we test:

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 < 0

or, equivalently,

H0: μd = 0
Ha: μd < 0

b.

Metro Area          2003 Salary      2005 Salary      Difference
                    ($ thousands)    ($ thousands)    (2003 – 2005)
Silicon Valley          87.7             85.9              1.8
New York                78.6             80.3             −1.7
Washington, D.C.        71.4             77.4             −6.0
Los Angeles             70.8             77.1             −6.3
Denver                  73.0             77.1             −4.1
Boston                  76.3             80.1             −3.8
Atlanta                 73.6             73.2              0.4
Chicago                 71.1             73.0             −1.9
Philadelphia            69.5             69.8             −0.3
San Diego               69.0             77.1             −8.1
Seattle                 71.0             66.9              4.1
Dallas-Ft. Worth        73.0             71.0              2.0
Detroit                 62.3             64.1             −1.8

c.

\[ \bar{d} = \frac{\sum_{i=1}^{n_d} d_i}{n_d} = \frac{-25.7}{13} = -1.977 \]

\[ s_d^2 = \frac{\sum_{i=1}^{n_d} d_i^2 - \frac{\left(\sum_{i=1}^{n_d} d_i\right)^2}{n_d}}{n_d - 1} = \frac{206.59 - \frac{(-25.7)^2}{13}}{13 - 1} = 12.9819 \]

\[ s_d = \sqrt{s_d^2} = \sqrt{12.9819} = 3.603 \]

d.

The test statistic is

\[ t = \frac{\bar{d} - \mu_o}{s_d/\sqrt{n_d}} = \frac{-1.977 - 0}{3.603/\sqrt{13}} = -1.978 \]

e.

The rejection region requires α = .10 in the lower tail of the t-distribution with df = nd – 1 = 13 – 1 = 12. From Table VI, Appendix B, t.10 = 1.356. The rejection region is t < −1.356.

f.

Since the observed value of the test statistic falls in the rejection region (t = −1.978 < −1.356), H0 is rejected. There is sufficient evidence to indicate the mean salary of technology professionals at all U.S. metropolitan areas has increased between 2003 and 2005 at α = .10.
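The paired-difference calculations in parts b through d can be reproduced directly from the salary table. The following Python sketch is an illustration only and is not part of the original solution; the variable names are our own, and the salaries are those listed in part b.

```python
import math
import statistics

# Salaries in $ thousands, in the order the metro areas are listed in part b.
salary_2003 = [87.7, 78.6, 71.4, 70.8, 73.0, 76.3, 73.6,
               71.1, 69.5, 69.0, 71.0, 73.0, 62.3]
salary_2005 = [85.9, 80.3, 77.4, 77.1, 77.1, 80.1, 73.2,
               73.0, 69.8, 77.1, 66.9, 71.0, 64.1]

# Differences are taken as 2003 minus 2005, matching the table.
diffs = [a - b for a, b in zip(salary_2003, salary_2005)]

n_d = len(diffs)
d_bar = statistics.mean(diffs)       # sample mean of the differences
s_d = statistics.stdev(diffs)        # sample standard deviation (n - 1 divisor)
t_stat = (d_bar - 0) / (s_d / math.sqrt(n_d))
df = n_d - 1                         # degrees of freedom for the t-distribution

print(n_d, df, round(d_bar, 3), round(s_d, 3), round(t_stat, 3))
```

The tabled critical value still has to be looked up by hand with the degrees of freedom printed above, since the standard library has no t-distribution.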

g.

In order for the inference to be valid, we must assume that the population of differences is normal and that we have a random sample. Using MINITAB, the histogram of the differences is:

[Histogram of Diff: Frequency (vertical axis) versus Diff (horizontal axis)]

The graph is fairly mound-shaped although it is somewhat skewed to the right. Since there are only 13 observations, this graph is close enough to being mound-shaped to indicate the normal assumption is reasonable.

7.32


a.

The data should be analyzed as a paired difference experiment because each actor who won an Academy Award was paired with another actor with similar characteristics who did not win the award.

b.

Let μ1 = mean life expectancy of Academy Award winners and μ2 = mean life expectancy of non-Academy Award winners. To compare the mean life expectancies of Academy Award winners and non-winners, we test:

H0: μ1 − μ2 = μd = 0
Ha: μd ≠ 0

c.

Since the p-value was so small, there is sufficient evidence to indicate the mean life expectancies of the Academy Award winners and non-winners are different for any value of α > .003. Since the sample mean life expectancy of Academy Award winners is greater than that for non-winners, we can conclude that Academy Award winners have a longer mean life expectancy than non-winners.



7.34

a.

Let μ1 = mean driver chest injury rating and μ2 = mean passenger chest injury rating. Because the data are paired, we are interested in μ1 − μ2 = μd, the difference in mean chest injury ratings between drivers and passengers.

b.

The data were collected as matched pairs and thus, must be analyzed as matched pairs. Two ratings are obtained for each car – the driver’s chest injury rating and the passenger’s chest injury rating.

c.


Descriptive Statistics: DrivChst, PassChst, diff

Variable     N     Mean     Median   TrMean   StDev   SE Mean
DrivChst    98    49.663    50.000   49.682   6.670     0.674
PassChst    98    50.224    50.500   50.148   7.107     0.718
diff        98    -0.561     0.000   -0.420   5.517     0.557

Variable    Minimum   Maximum       Q1       Q3
DrivChst     34.000    68.000   45.000   54.000
PassChst     35.000    69.000   45.000   55.000
diff        -15.000    13.000   -4.000    3.000

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The 99% confidence interval is:

\[ \bar{d} \pm z_{.005}\frac{s_d}{\sqrt{n_d}} \Rightarrow -0.561 \pm 2.58\,\frac{5.517}{\sqrt{98}} \Rightarrow -0.561 \pm 1.438 \Rightarrow (-1.999,\ 0.877) \]

d.

We are 99% confident that the difference between the mean chest injury ratings of drivers and front-seat passengers is between −1.999 and 0.877. Since 0 is in the confidence interval, there is no evidence that the true mean driver chest injury rating exceeds the true mean passenger chest injury rating.

e.

Since the sample size is large, the sampling distribution of d̄ is approximately normal by the Central Limit Theorem. We must assume that the differences are randomly selected.

7.36

a.

Let μC1 = mean relational intimacy score for the CMC group on the first meeting and μC3 = mean relational intimacy score for the CMC group on the third meeting. Let μCd = difference in mean relational intimacy score between the first and third meetings for the CMC group. To determine if the mean relational intimacy score will increase between the first and third meetings, we test:

H0: μCd = 0
Ha: μCd < 0

b.

The researchers used the paired t-test because the same individuals participated in each of the three meeting sessions. Thus, the samples would not be independent.

c.

Since the p-value is so small (p = .003), H0 would be rejected. There is sufficient evidence to indicate that the mean relational intimacy score for participants in the CMC group increased from the first to the third meeting for any value of α > .003.




d.

Let μF1 = mean relational intimacy score for the FTF group on the first meeting and μF3 = mean relational intimacy score for the FTF group on the third meeting. Let μFd = difference in mean relational intimacy score between the first and third meetings for the FTF group. To determine if the mean relational intimacy score will change between the first and third meetings, we test:

H0: μFd = 0
Ha: μFd ≠ 0

e.

Since the p-value is not small (p = .39), H0 would not be rejected. There is insufficient evidence to indicate that the mean relational intimacy score for participants in the FTF group changed from the first to the third meeting for any value of α < .39.

7.38

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Method1, Method2, Diff

Variable    N   N*    Mean   SE Mean   StDev   Minimum       Q1   Median      Q3   Maximum
Method1    10    0   13.39      4.18   13.22      1.00     1.30    10.35   24.63     34.40
Method2    10    0   13.10      3.96   12.51      1.40     1.78     9.50   25.05     30.70
Diff       10    0   0.290     0.553   1.750    -2.200   -0.875   -0.150   1.575     3.700

To determine if the mean transition error for method 1 differs from the mean transition error for method 2, we test:

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0

or, equivalently,

H0: μd = 0
Ha: μd ≠ 0

The test statistic is

\[ t = \frac{\bar{d} - \mu_o}{s_d/\sqrt{n_d}} = \frac{0.290 - 0}{1.750/\sqrt{10}} = 0.52 \]

The rejection region requires α/2 = .10/2 = .05 in each tail of the t-distribution with df = nd – 1 = 10 – 1 = 9. From Table VI, Appendix B, t.05 = 1.833. The rejection region is t < −1.833 or t > 1.833.

Since the observed value of the test statistic does not fall in the rejection region (t = 0.52 ≯ 1.833), H0 is not rejected. There is insufficient evidence to indicate the mean transition error for method 1 differs from the mean transition error for method 2 at α = .10.

7.40

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: HMETER, HSTATIC, Diff

Variable    N   N*        Mean    SE Mean      StDev     Minimum          Q1
HMETER     40    0      1.0405    0.00638     0.0403      0.9936      1.0047
HSTATIC    40    0      1.0410    0.00649     0.0410      0.9930      1.0043
Diff       40    0   -0.000523   0.000204   0.001291   -0.004480   -0.001078

Variable       Median         Q3    Maximum
HMETER         1.0232     1.0883     1.1026
HSTATIC        1.0237     1.0908     1.1052
Diff        -0.000165   0.000317   0.001580



For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = nd – 1 = 40 – 1 = 39, t.025 ≈ 2.021. The 95% confidence interval is:

\[ \bar{d} \pm t_{.025}\frac{s_d}{\sqrt{n_d}} \Rightarrow -0.000523 \pm 2.021\,\frac{0.001291}{\sqrt{40}} \Rightarrow -0.000523 \pm 0.000413 \Rightarrow (-0.000936,\ -0.000110) \]

We are 95% confident that the true difference in mean density measurements between the two methods is between −0.000936 and −0.000110. Since every value in this interval is smaller in absolute value than the desired maximum difference of .002, the winery should choose the alternative method of measuring wine density.

7.42

a.


b.


c.

The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645.

d.

The rejection region requires α = .10 in the lower tail of the z-distribution. From Table IV, Appendix B, z.10 = 1.28. The rejection region is z < −1.28.

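7.44

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval for p1 − p2 is approximately:

a.

\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.65 - .58) \pm 1.96\sqrt{\frac{.65(1 - .65)}{400} + \frac{.58(1 - .58)}{400}} \Rightarrow .07 \pm .067 \Rightarrow (.003,\ .137) \]

b.

\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.31 - .25) \pm 1.96\sqrt{\frac{.31(1 - .31)}{180} + \frac{.25(1 - .25)}{250}} \Rightarrow .06 \pm .086 \Rightarrow (-.026,\ .146) \]

c.

\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.46 - .61) \pm 1.96\sqrt{\frac{.46(1 - .46)}{100} + \frac{.61(1 - .61)}{120}} \Rightarrow -.15 \pm .131 \Rightarrow (-.281,\ -.019) \]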
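The three intervals above all use the same large-sample formula, so they can be checked with one small function. This Python sketch is illustrative only; the name `two_prop_ci` and its parameters are our own, and z = 1.96 is the tabled value used above.

```python
import math

def two_prop_ci(p1, n1, p2, n2, z=1.96):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * se
    diff = p1 - p2
    return diff - margin, diff + margin

# Sample proportions and sample sizes from parts a, b, and c.
cases = {
    "a": (.65, 400, .58, 400),
    "b": (.31, 180, .25, 250),
    "c": (.46, 100, .61, 120),
}
intervals = {part: two_prop_ci(*args) for part, args in cases.items()}
for part, (lo, hi) in intervals.items():
    print(part, round(lo, 3), round(hi, 3))
```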

7.46

\[ \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{55(.7) + 65(.6)}{55 + 65} = \frac{77.5}{120} \approx .65 \]

\[ \hat{q} = 1 - \hat{p} = 1 - .65 = .35 \]

H0: p1 − p2 = 0
Ha: p1 − p2 > 0

The test statistic is

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.7 - .6) - 0}{\sqrt{.65(.35)\left(\frac{1}{55} + \frac{1}{65}\right)}} = \frac{.1}{.08739} = 1.14 \]

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.

Since the observed value of the test statistic does not fall in the rejection region (z = 1.14 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the proportion from population 1 is greater than that for population 2 at α = .05.

7.48

a.

Let p1 = proportion of men who prefer to keep track of appointments in their head and p2 = proportion of women who prefer to keep track of appointments in their head. To determine if the proportion of men who prefer to keep track of appointments in their head is greater than that of women, we test:

H0: p1 − p2 = 0
Ha: p1 − p2 > 0

b.

\[ \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{500(.56) + 500(.46)}{500 + 500} = .51 \quad\text{and}\quad \hat{q} = 1 - \hat{p} = 1 - .51 = .49 \]

The test statistic is

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.56 - .46) - 0}{\sqrt{.51(.49)\left(\frac{1}{500} + \frac{1}{500}\right)}} = 3.16 \]

c.

The rejection region requires α = .01 in the upper tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33.

d.

The p-value is p = P(z ≥ 3.16) ≈ .5 − .5 = 0.

e.

Since the observed value of the test statistic falls in the rejection region (z = 3.16 > 2.33), H0 is rejected. There is sufficient evidence to indicate the proportion of men who prefer to keep track of appointments in their head is greater than that of women at α = .01.

7.50

a.

Let p1 = proportion of customers returning the printed survey and p2 = proportion of customers returning the electronic survey. Some preliminary calculations are:

\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{261}{631} = .414 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{155}{414} = .374 \]

The 90% confidence interval for p1 − p2 is:

\[ (\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.414 - .374) \pm 1.645\sqrt{\frac{.414(.586)}{631} + \frac{.374(.626)}{414}} \Rightarrow .04 \pm .051 \Rightarrow (-.011,\ .091) \]

We are 90% confident that the difference in the response rates for the two types of surveys is between −.011 and .091.

b.

Since the value .05 falls in the 90% confidence interval, it is not an unusual value. Thus, there is no evidence that the difference in response rates is different from .05. The researchers would be able to make this inference.

7.52

a.

Let p1 = proportion of managers and professionals who are male and p2 = proportion of part-time MBA students who are male. To see if the samples are sufficiently large:

\[ \hat{p}_1 \pm 3\sigma_{\hat{p}_1} \Rightarrow \hat{p}_1 \pm 3\sqrt{\frac{p_1q_1}{n_1}} \Rightarrow \hat{p}_1 \pm 3\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1}} \Rightarrow .95 \pm 3\sqrt{\frac{(.95)(.05)}{162}} \Rightarrow .95 \pm .05 \Rightarrow (.90,\ 1.00) \]

\[ \hat{p}_2 \pm 3\sigma_{\hat{p}_2} \Rightarrow \hat{p}_2 \pm 3\sqrt{\frac{p_2q_2}{n_2}} \Rightarrow \hat{p}_2 \pm 3\sqrt{\frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow .689 \pm 3\sqrt{\frac{(.689)(.311)}{109}} \Rightarrow .689 \pm .133 \Rightarrow (.556,\ .822) \]

Since both intervals are contained within the interval (0, 1), the normal approximation will be adequate.

First, we calculate the overall estimate of the common proportion under H0.

\[ \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{162(.95) + 109(.689)}{162 + 109} = .845 \]

To determine if the population of managers and professionals consists of more males than the part-time MBA population, we test:

H0: p1 = p2
Ha: p1 > p2

The test statistic is

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.95 - .689) - 0}{\sqrt{.845(.155)\left(\frac{1}{162} + \frac{1}{109}\right)}} = 5.82 \]

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.

Since the observed value of the test statistic falls in the rejection region (z = 5.82 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the population of managers and professionals consists of more males than the part-time MBA population at α = .05.




b.

We had to assume: 1. Both samples were randomly selected 2. Both sample sizes are sufficiently large.

c.

First, we calculate the overall estimate of the common proportion under H0.

\[ \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{162(.912) + 109(.534)}{162 + 109} = .760 \]

To determine if the population of managers and professionals consists of more married individuals than the part-time MBA population, we test:

H0: p1 = p2
Ha: p1 > p2

The test statistic is

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.912 - .534) - 0}{\sqrt{.760(.240)\left(\frac{1}{162} + \frac{1}{109}\right)}} = 7.14 \]

The rejection region requires α = .01 in the upper tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33.

Since the observed value of the test statistic falls in the rejection region (z = 7.14 > 2.33), H0 is rejected. There is sufficient evidence to indicate that the population of managers and professionals consists of more married individuals than the part-time MBA population at α = .01.

d.

We had to assume: 1. Both samples were randomly selected 2. Both sample sizes are sufficiently large.
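The pooled test statistics in parts a and c follow the same steps, so both can be checked with one helper. This is a sketch under our own naming (`pooled_two_prop_z` is not from the text); it carries the pooled proportion at full precision rather than rounding it to three decimals first, which does not change the two-decimal z values.

```python
import math

def pooled_two_prop_z(p1, n1, p2, n2):
    """Pooled estimate of the common proportion and the large-sample z statistic
    for H0: p1 - p2 = 0."""
    p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p_pool, (p1 - p2) / se

# Part a: proportion male; part c: proportion married.
p_male, z_male = pooled_two_prop_z(.95, 162, .689, 109)
p_married, z_married = pooled_two_prop_z(.912, 162, .534, 109)

print(round(p_male, 3), round(z_male, 2))
print(round(p_married, 3), round(z_married, 2))
```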

7.54

Let p1 = accuracy rate for modules with correct code and p2 = accuracy rate for modules with defective code. Some preliminary calculations are:

\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{400}{449} = .891 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{20}{49} = .408 \]

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The 99% confidence interval is:

\[ (\hat{p}_1 - \hat{p}_2) \pm z_{.005}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.891 - .408) \pm 2.58\sqrt{\frac{.891(.109)}{449} + \frac{.408(.592)}{49}} \Rightarrow .483 \pm .185 \Rightarrow (.298,\ .668) \]

We are 99% confident that the difference in accuracy rates between modules with correct code and modules with defective code is between .298 and .668.

7.56

a.

Let p = proportion of all children who recognize Joe Camel.

\[ \hat{p} = \frac{x}{n} = \frac{15 + 46}{28 + 55} = .735 \qquad \hat{q} = 1 - \hat{p} = 1 - .735 = .265 \]

To see if the sample is sufficiently large:

\[ \hat{p} \pm 3\sigma_{\hat{p}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .735 \pm 3\sqrt{\frac{.735(.265)}{83}} \Rightarrow .735 \pm .145 \Rightarrow (.590,\ .880) \]

Since the interval lies within the interval (0, 1), the normal approximation will be adequate.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:

\[ \hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .735 \pm 1.96\sqrt{\frac{.735(.265)}{83}} \Rightarrow .735 \pm .095 \Rightarrow (.640,\ .830) \]

We are 95% confident that the proportion of all children who recognize Joe Camel is between .640 and .830.

b.

Let p1 = proportion of children under the age of 6 who recognize Joe Camel and p2 = proportion of children age 6 and over who recognize Joe Camel.

\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{15}{28} = .536 \qquad \hat{q}_1 = 1 - \hat{p}_1 = 1 - .536 = .464 \]

\[ \hat{p}_2 = \frac{x_2}{n_2} = \frac{46}{55} = .836 \qquad \hat{q}_2 = 1 - \hat{p}_2 = 1 - .836 = .164 \]

To see if the samples are sufficiently large:

\[ \hat{p}_1 \pm 3\sigma_{\hat{p}_1} \Rightarrow \hat{p}_1 \pm 3\sqrt{\frac{p_1q_1}{n_1}} \Rightarrow \hat{p}_1 \pm 3\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1}} \Rightarrow .536 \pm 3\sqrt{\frac{.536(.464)}{28}} \Rightarrow .536 \pm .283 \Rightarrow (.253,\ .819) \]

\[ \hat{p}_2 \pm 3\sigma_{\hat{p}_2} \Rightarrow \hat{p}_2 \pm 3\sqrt{\frac{p_2q_2}{n_2}} \Rightarrow \hat{p}_2 \pm 3\sqrt{\frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow .836 \pm 3\sqrt{\frac{.836(.164)}{55}} \Rightarrow .836 \pm .150 \Rightarrow (.686,\ .986) \]

Since both intervals lie within the interval (0, 1), the normal approximation will be adequate.

To determine if the recognition of Joe Camel increases with age, we test:

H0: p1 − p2 = 0
Ha: p1 − p2 < 0

The test statistic is

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.536 - .836) - 0}{\sqrt{.735(.265)\left(\frac{1}{28} + \frac{1}{55}\right)}} = -2.93 \]

The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645.

Since the observed value of the test statistic falls in the rejection region (z = −2.93 < −1.645), H0 is rejected. There is sufficient evidence to indicate that the recognition of Joe Camel increases with age at α = .05.

7.58

a.

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.

\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{ME^2} = \frac{(1.96)^2(15^2 + 17^2)}{3.2^2} = 192.83 \approx 193 \]

b.

If the range of each population is 60, we would estimate σ by:

σ ≈ 60/4 = 15

For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58.

\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{ME^2} = \frac{(2.58)^2(15^2 + 15^2)}{8^2} = 46.80 \approx 47 \]

c.

For confidence coefficient .9, α = 1 − .9 = .1 and α/2 = .1/2 = .05. From Table IV, Appendix B, z.05 = 1.645. For a width of 1, the bound is .5. Taking σ1² = 5.8 and σ2² = 7.5,

\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{ME^2} = \frac{(1.645)^2(5.8 + 7.5)}{.5^2} = 143.96 \approx 144 \]

7.60

First, find the sample sizes needed for width 5, or margin of error 2.5. For confidence coefficient .9, α = 1 − .9 = .1 and α/2 = .1/2 = .05. From Table IV, Appendix B, z.05 = 1.645.

\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{ME^2} = \frac{(1.645)^2(10^2 + 10^2)}{2.5^2} = 86.59 \approx 87 \]

Thus, the necessary sample size from each population is 87. Therefore, sufficient funds have been allocated to meet the specifications, since n1 = n2 = 100 are large enough samples.

7.62

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.

\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{1.96^2(3.189^2 + 2.355^2)}{1.5^2} = 26.8 \approx 27 \]

We would need to sample 27 specimens from each location.

7.64

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. Since no information is given about the values of p1 and p2, we will be conservative and use .5 for both. A width of .04 means the bound is .04/2 = .02.

\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{1.645^2(.5(.5) + .5(.5))}{.02^2} = 3{,}382.5 \approx 3{,}383 \]

7.66

a.

For confidence coefficient .80, α = 1 − .80 = .20 and α/2 = .20/2 = .10. From Table IV, Appendix B, z.10 = 1.28. Since we have no prior information about the proportions, we use p1 = p2 = .5 to get a conservative estimate. For a width of .06, the margin of error is .03.

\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{ME^2} = \frac{(1.28)^2(.5(1 - .5) + .5(1 - .5))}{.03^2} = 910.22 \approx 911 \]

b.

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. Using the formula for the sample size needed to estimate a proportion from Chapter 7,

\[ n = \frac{(z_{\alpha/2})^2\,pq}{ME^2} = \frac{1.645^2(.5(1 - .5))}{.02^2} = \frac{.6765}{.0004} = 1691.27 \approx 1692 \]

No, the sample size from part a is not large enough.
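The sample-size calculations in Exercises 7.58 through 7.66 all apply one of two formulas and then round up. A minimal Python sketch of both follows; the function names are our own, and the z values are the tabled values quoted in the solutions.

```python
import math

def n_for_mean_difference(z, var1, var2, me):
    """Equal sample sizes needed to estimate mu1 - mu2 to within margin of error me.
    var1 and var2 are the population variances (or estimates of them)."""
    return math.ceil(z ** 2 * (var1 + var2) / me ** 2)

def n_for_proportion_difference(z, p1, p2, me):
    """Equal sample sizes needed to estimate p1 - p2 to within margin of error me.
    Using p1 = p2 = .5 gives the conservative (largest) answer."""
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / me ** 2)

n_758a = n_for_mean_difference(1.96, 15 ** 2, 17 ** 2, 3.2)     # Exercise 7.58 a
n_760 = n_for_mean_difference(1.645, 10 ** 2, 10 ** 2, 2.5)     # Exercise 7.60
n_764 = n_for_proportion_difference(1.645, .5, .5, .02)         # Exercise 7.64
n_766a = n_for_proportion_difference(1.28, .5, .5, .03)         # Exercise 7.66 a

print(n_758a, n_760, n_764, n_766a)
```

Rounding up with `math.ceil` mirrors the convention in the solutions: a fractional result such as 910.22 becomes 911, so the margin of error is not exceeded.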




7.68

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .025. From Table IV, Appendix B, z.025 = 1.96.

\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{1.96^2(35^2 + 80^2)}{10^2} = 292.9 \approx 293 \]

7.70

a.

With ν1 = 2 and ν2 = 30, P(F ≥ 5.39) = .01 (Table XI, Appendix B)

b.

With ν1 = 24 and ν2 = 10, P(F ≥ 2.74) = .05 (Table IX, Appendix B) Thus, P(F < 2.74) = 1 − P(F ≥ 2.74) = 1 − .05 = .95.

c.

With ν1 = 7 and ν2 = 1, P(F ≥ 236.8) = .05 (Table IX, Appendix B) Thus, P(F < 236.8) = 1 − P(F ≥ 236.8) = 1 − .05 = .95.

d.

With ν1 = 40 and ν2 = 40, P(F > 2.11) = .01 (Table XI, Appendix B)

7.72

To test H0: σ1² = σ2² against Ha: σ1² ≠ σ2², the rejection region is F > Fα/2 with ν1 = 10 and ν2 = 12.

a.

α = .20, α/2 = .10 Reject H0 if F > F.10 = 2.19 (Table VIII, Appendix B)

b.

α = .10, α/2 = .05 Reject H0 if F > F.05 = 2.75 (Table IX, Appendix B)

c.

α = .05, α/2 = .025 Reject H0 if F > F.025 = 3.37 (Table X, Appendix B)

d.

α = .02, α/2 = .01 Reject H0 if F > F.01 = 4.30 (Table XI, Appendix B)

7.74

a.

To determine if a difference exists between the population variances, we test:

H0: σ1² = σ2²
Ha: σ1² ≠ σ2²

The test statistic is

\[ F = \frac{s_2^2}{s_1^2} = \frac{8.75}{3.87} = 2.26 \]

The rejection region requires α/2 = .10/2 = .05 in the upper tail of the F-distribution with ν1 = n2 − 1 = 27 − 1 = 26 and ν2 = n1 − 1 = 12 − 1 = 11. From Table IX, Appendix B, F.05 ≈ 2.60. The rejection region is F > 2.60.

Since the observed value of the test statistic does not fall in the rejection region (F = 2.26 ≯ 2.60), H0 is not rejected. There is insufficient evidence to indicate a difference between the population variances.

b.

The p-value is 2P(F ≥ 2.26). From Tables VIII and IX, with ν1 = 26 and ν2 = 11,

2(.05) < 2P(F ≥ 2.26) < 2(.10) ⇒ .10 < 2P(F ≥ 2.26) < .20

There is no evidence to reject H0 for α ≤ .10.
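For a two-tailed test of equal variances, the larger sample variance goes in the numerator and the degrees of freedom follow that ordering. The Python sketch below illustrates the bookkeeping; the function name `f_ratio` is our own, and the critical value 2.60 is the tabled value quoted above rather than something the code computes.

```python
def f_ratio(var1, n1, var2, n2):
    """Return (F, numerator df, denominator df) with the larger
    sample variance placed in the numerator."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

# Sample 1: s1^2 = 3.87 with n1 = 12; sample 2: s2^2 = 8.75 with n2 = 27.
f_stat, df_num, df_den = f_ratio(3.87, 12, 8.75, 27)

f_crit = 2.60                      # F.05 with (26, 11) df, read from the table
reject_h0 = f_stat > f_crit

print(round(f_stat, 2), df_num, df_den, reject_h0)
```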

7.76

Let σ1² = variance of carat size for diamonds certified by GIA, σ2² = variance of carat size for diamonds certified by HRD, and σ3² = variance of carat size for diamonds certified by IGI.

a.

To determine if the variation in carat size differs for diamonds certified by GIA and diamonds certified by HRD, we test:

H0: σ1² = σ2²
Ha: σ1² ≠ σ2²

The test statistic is

\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{.2456^2}{.1831^2} = 1.799 \]

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n1 – 1 = 151 – 1 = 150 and ν2 = n2 – 1 = 79 – 1 = 78. From Table X, Appendix B, F.025 ≈ 1.43. The rejection region is F > 1.43.

Since the observed value of the test statistic falls in the rejection region (F = 1.799 > 1.43), H0 is rejected. There is sufficient evidence to indicate the variation in carat size differs for diamonds certified by GIA and those certified by HRD at α = .05.

b.

To determine if the variation in carat size differs for diamonds certified by GIA and diamonds certified by IGI, we test:

H0: σ1² = σ3²
Ha: σ1² ≠ σ3²

The test statistic is

\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_3^2} = \frac{.2456^2}{.2163^2} = 1.289 \]

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n1 – 1 = 151 – 1 = 150 and ν2 = n3 – 1 = 78 – 1 = 77. From Table X, Appendix B, F.025 ≈ 1.43. The rejection region is F > 1.43.

Since the observed value of the test statistic does not fall in the rejection region (F = 1.289 ≯ 1.43), H0 is not rejected. There is insufficient evidence to indicate the variation in carat size differs for diamonds certified by GIA and those certified by IGI at α = .05.

c.

To determine if the variation in carat size differs for diamonds certified by HRD and diamonds certified by IGI, we test:

H0: σ2² = σ3²
Ha: σ2² ≠ σ3²

The test statistic is

\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_3^2}{s_2^2} = \frac{.2163^2}{.1831^2} = 1.396 \]

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n3 – 1 = 78 – 1 = 77 and ν2 = n2 – 1 = 79 – 1 = 78. From Table X, Appendix B, F.025 ≈ 1.67. The rejection region is F > 1.67.

Since the observed value of the test statistic does not fall in the rejection region (F = 1.396 ≯ 1.67), H0 is not rejected. There is insufficient evidence to indicate the variation in carat size differs for diamonds certified by HRD and those certified by IGI at α = .05.

d.

We will look at the 4 methods for determining if the data are normal. First, we will look at histograms of the data. Using MINITAB, the histograms of the carat sizes for the 3 certification bodies are:

[Histograms of carat size, one panel each for GIA, HRD, and IGI: Percent (vertical axis) versus carat size (horizontal axis)]

From the histograms, none of the data appear to be mound-shaped. It appears that none of the data sets are normal.



Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

Descriptive Statistics: GIA, IGI, HRD

Variable     N     Mean   Median   TrMean    StDev   SE Mean
GIA        151   0.6723   0.7000   0.6713   0.2456    0.0200
IGI         78   0.3665   0.2900   0.3406   0.2163    0.0245
HRD         79   0.8129   0.8100   0.8169   0.1831    0.0206

Variable   Minimum   Maximum       Q1       Q3
GIA         0.3000    1.1000   0.5000   0.9000
IGI         0.1800    1.0100   0.2100   0.4850
HRD         0.5000    1.0900   0.6500   1.0000

For GIA:

x̄ ± s ⇒ .6723 ± .2456 ⇒ (.4267, .9179). 84 of the 151 values fall in this interval. The proportion is .56. This is much smaller than the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ .6723 ± 2(.2456) ⇒ .6723 ± .4912 ⇒ (.1811, 1.1635). 151 of the 151 values fall in this interval. The proportion is 1.00. This is much larger than the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ .6723 ± 3(.2456) ⇒ .6723 ± .7368 ⇒ (−.0645, 1.4091). 151 of the 151 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

For IGI:

x̄ ± s ⇒ .3665 ± .2163 ⇒ (.1502, .5828). 69 of the 78 values fall in this interval. The proportion is .88. This is much larger than the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ .3665 ± 2(.2163) ⇒ .3665 ± .4326 ⇒ (−.0661, .7991). 74 of the 78 values fall in this interval. The proportion is .95. This is the same as the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ .3665 ± 3(.2163) ⇒ .3665 ± .6489 ⇒ (−.2824, 1.0154). 78 of the 78 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

For HRD:

x̄ ± s ⇒ .8129 ± .1831 ⇒ (.6298, .9960). 30 of the 79 values fall in this interval. The proportion is .38. This is much smaller than the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ .8129 ± 2(.1831) ⇒ .8129 ± .3662 ⇒ (.4467, 1.1791). 79 of the 79 values fall in this interval. The proportion is 1.00. This is much larger than the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ .8129 ± 3(.1831) ⇒ .8129 ± .5493 ⇒ (.2636, 1.3622). 79 of the 79 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

Next, we look at the ratio of the IQR to s.

For GIA:

IQR = QU – QL = .9 − .5 = .4, using the quartiles Q3 and Q1 from the MINITAB output.

IQR/s = .4/.2456 = 1.63. This is larger than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.

For IGI:

IQR = QU – QL = .485 − .21 = .275.

IQR/s = .275/.2163 = 1.27. This is close to the 1.3 we would expect if the data were normal, so for IGI this check by itself does not indicate non-normality.

For HRD:

IQR = QU – QL = 1.0 − .65 = .35.

IQR/s = .35/.1831 = 1.91. This is larger than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.
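The two numerical checks used above (the proportions of observations within 1, 2, and 3 standard deviations of the mean, and the ratio IQR/s) are easy to automate. The Python sketch below is an illustration only: the function names are our own, the small data list is hypothetical and is not the diamond data, and the quartiles come from `statistics.quantiles`, whose default interpolation may differ slightly from the quartiles MINITAB reports.

```python
import statistics

def proportion_within(data, k):
    """Proportion of observations falling in the interval mean ± k*s."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if x_bar - k * s <= x <= x_bar + k * s]
    return len(inside) / len(data)

def iqr_over_s(data):
    """Ratio of the interquartile range (Q3 - Q1) to the sample standard deviation."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / statistics.stdev(data)

# Hypothetical data, used only to show the calls.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]

within_1s = proportion_within(sample, 1)
within_2s = proportion_within(sample, 2)
ratio = iqr_over_s(sample)

print(round(within_1s, 3), round(within_2s, 3), round(ratio, 3))
```

For roughly normal data we would expect proportions near .68, .95, and 1.00, and a ratio near 1.3.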



Finally, using MINITAB, the normal probability plot for GIA is:

[Normal Probability Plot for GIA, ML Estimates - 95% CI: Percent versus Data. ML Estimates: Mean 0.672252, StDev 0.244757. Goodness of Fit: AD* 3.332]

Since the data do not form a straight line, the data are not normal.

Using MINITAB, the normal probability plot for IGI is:

[Normal Probability Plot for IGI, ML Estimates - 95% CI: Percent versus Data. ML Estimates: Mean 0.366538, StDev 0.214863. Goodness of Fit: AD* 5.622]

Since the data do not form a straight line, the data are not normal.




Using MINITAB, the normal probability plot for HRD is:

[Normal Probability Plot for HRD, ML Estimates - 95% CI: Percent versus Data. ML Estimates: Mean 0.812911, StDev 0.181890. Goodness of Fit: AD* 3.539]

Since the data do not form a straight line, the data are not normal.

Taken together, the 4 methods indicate that the carat size data are not normal for any of the certification bodies.

7.78

a.

The amount of variability of GHQ scores tells us how similar or different the members of the group are on GHQ scores. The larger the variability, the larger the differences are among the members on the GHQ scores. The smaller the variability, the smaller the differences are among the members on the GHQ scores.

b.

Let σ1² = variance of the mental health scores of the employed and σ2² = variance of the mental health scores of the unemployed. To determine if the variability in mental health scores differs for employed and unemployed workers, we test:

H0: σ1² = σ2²
Ha: σ1² ≠ σ2²

c.

The test statistic is

\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{5.10^2}{3.26^2} = 2.45 \]

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n2 − 1 = 49 − 1 = 48 and ν2 = n1 − 1 = 142 − 1 = 141. From Table X, Appendix B, F.025 ≈ 1.61. The rejection region is F > 1.61.

Since the observed value of the test statistic falls in the rejection region (F = 2.45 > 1.61), H0 is rejected. There is sufficient evidence to indicate that the variability in mental health scores differs for employed and unemployed workers for α = .05.



d.

We must assume that the 2 populations of mental health scores are normally distributed. We must also assume that we selected 2 independent random samples.

7.80

Let σ1² = variance of zinc measurements from the text-line, σ2² = variance of zinc measurements from the witness-line, and σ3² = variance of zinc measurements from the intersection. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Text-line, Witness-line, Intersection

Variable    N     Mean   Median   TrMean    StDev   SE Mean
Text-lin    3   0.3830   0.3740   0.3830   0.0531    0.0306
Witness-    6   0.3042   0.2955   0.3042   0.1015    0.0415
Intersec    5   0.3290   0.3190   0.3290   0.0443    0.0198

Variable   Minimum   Maximum       Q1       Q3
Text-lin    0.3350    0.4400   0.3350   0.4400
Witness-    0.1880    0.4390   0.2045   0.4075
Intersec    0.2850    0.3930   0.2900   0.3730

a.

To determine if the variation in the zinc measurements for the text-line and the intersection differ, we test:

H0: σ1² = σ3²
Ha: σ1² ≠ σ3²

The test statistic is

\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_3^2} = \frac{.0531^2}{.0443^2} = 1.437 \]

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n1 – 1 = 3 – 1 = 2 and ν2 = n3 – 1 = 5 – 1 = 4. From Table X, Appendix B, F.025 = 10.65. The rejection region is F > 10.65.

Since the observed value of the test statistic does not fall in the rejection region (F = 1.437 ≯ 10.65), H0 is not rejected. There is insufficient evidence to indicate the variation in the zinc measurements for the text-line and the intersection differ at α = .05.

b.

To determine if the variation in the zinc measurements for the witness-line and the intersection differ, we test:

H0: σ2² = σ3²
Ha: σ2² ≠ σ3²

The test statistic is

\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_3^2} = \frac{.1015^2}{.0443^2} = 5.250 \]

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n2 – 1 = 6 – 1 = 5 and ν2 = n3 – 1 = 5 – 1 = 4. From Table X, Appendix B, F.025 = 9.36. The rejection region is F > 9.36.

Since the observed value of the test statistic does not fall in the rejection region (F = 5.250 ≯ 9.36), H0 is not rejected. There is insufficient evidence to indicate the variation in the zinc measurements for the witness-line and the intersection differ at α = .05.




c.

There is no indication that the variances of the zinc measurements for the three locations differ.

d.

With only 3, 6, and 5 measurements, it is very difficult to check the assumptions.

7.82

Using MINITAB, some preliminary calculations are:

Descriptive Statistics: HEATRATE

Variable   ENGINE         N   N*    Mean   SE Mean   StDev   Minimum      Q1   Median
HEATRATE   Advanced      21    0    9764       139     639      9105    9252     9669
           Aeroderiv      7    0   12312      1002    2652      8714    9469    12414
           Traditional   39    0   11544       205    1279     10086   10592    11183

Variable   ENGINE           Q3   Maximum
HEATRATE   Advanced      10060     11588
           Aeroderiv     14628     16243
           Traditional   11964     14796

a.

To determine if the heat rate variances for traditional and aeroderivative augmented gas turbines differ, we test: H0: σ 22 = σ 32 Ha: σ 22 ≠ σ 32 89) The test statistic is

F=

Larger sample variance s22 26522 = 4.299 = = Smaller sample variance s32 12792

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F distribution with numerator df = ν2 = n2 – 1 = 7 – 1 = 6 and denominator df = ν3 = n3 – 1 = 39 – 1 = 38. From Table X, Appendix B, F.025 ≈ 2.74. The rejection region is F > 2.74. Since the observed value of the test statistic falls in the rejection region (F = 4.299 > 2.74), H0 is rejected. There is sufficient evidence to indicate the heat rate variances for traditional and aeroderivative augmented gas turbines differ at α = .05. Since the test in Exercise 7.23 a assumes that the population variances are the same, the validity of the test is suspect since we just found the variances are different.

b. To determine if the heat rate variances for advanced and aeroderivative augmented gas turbines differ, we test:

H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \ne \sigma_2^2$

The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{2652^2}{639^2} = 17.224$$

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F distribution with numerator df = ν2 = n2 – 1 = 7 – 1 = 6 and denominator df = ν1 = n1 – 1 = 21 – 1 = 20. From Table X, Appendix B, F.025 = 3.13. The rejection region is F > 3.13. Since the observed value of the test statistic falls in the rejection region (F = 17.224 > 3.13), H0 is rejected. There is sufficient evidence to indicate the heat rate variances for advanced and aeroderivative augmented gas turbines differ at α = .05. Since the test in Exercise 7.23 b assumes that the population variances are the same, the validity of the test is suspect since we just found the variances are different.

7.84

a. The 2 samples are randomly selected in an independent manner from the two populations. The sample sizes, n1 and n2, are large enough so that $\bar{x}_1$ and $\bar{x}_2$ each have approximately normal sampling distributions and so that $s_1^2$ and $s_2^2$ provide good approximations to $\sigma_1^2$ and $\sigma_2^2$. This will be true if n1 ≥ 30 and n2 ≥ 30.

b. 1. Both sampled populations have relative frequency distributions that are approximately normal.
   2. The population variances are equal.
   3. The samples are randomly and independently selected from the populations.

c. 1. The relative frequency distribution of the population of differences is normal.
   2. The sample of differences is randomly selected from the population of differences.

d. The two samples are independent random samples from binomial distributions. Both samples should be large enough so that the normal distribution provides an adequate approximation to the sampling distributions of $\hat{p}_1$ and $\hat{p}_2$.

e. The two samples are independent random samples from populations which are normally distributed.

7.86 a. H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \ne \sigma_2^2$

The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{120.1}{31.3} = 3.84$$

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with numerator df ν1 = n2 − 1 = 15 − 1 = 14 and denominator df ν2 = n1 − 1 = 20 − 1 = 19. From Table XI, Appendix B, F.025 ≈ 2.66. The rejection region is F > 2.66. Since the observed value of the test statistic falls in the rejection region (F = 3.84 > 2.66), H0 is rejected. There is sufficient evidence to conclude σ 12 ≠ σ 22 at α = .05.




b. No, we should not use a small sample t test to test H0: (μ1 − μ2) = 0 against Ha: (μ1 − μ2) ≠ 0 because the assumption of equal variances does not seem to hold since we concluded $\sigma_1^2 \ne \sigma_2^2$ in part a.

7.88

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{110}{200} = .55; \quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{130}{200} = .65; \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{110 + 130}{200 + 200} = \frac{240}{400} = .60$$

a. H0: (p1 − p2) = 0
Ha: (p1 − p2) < 0

The test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.55 - .65) - 0}{\sqrt{.6(1 - .6)\left(\frac{1}{200} + \frac{1}{200}\right)}} = \frac{-.10}{.049} = -2.04$$

The rejection region requires α = .10 in the lower tail of the z-distribution. From Table IV, Appendix B, z.10 = 1.28. The rejection region is z < −1.28. Since the observed value of the test statistic falls in the rejection region (z = −2.04 < −1.28), H0 is rejected. There is sufficient evidence to conclude (p1 − p2) < 0 at α = .10.

b. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval for (p1 − p2) is approximately:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.55 - .65) \pm 1.96\sqrt{\frac{.55(1 - .55)}{200} + \frac{.65(1 - .65)}{200}} \Rightarrow -.10 \pm .096 \Rightarrow (-.196, -.004)$$
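The interval in part b can be checked with a short Python sketch. The function name is hypothetical, and the z value here comes from the standard library's normal distribution instead of Table IV, so the endpoints agree with the solution only after rounding.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(p1_hat, n1, p2_hat, n2, confidence=0.95):
    """Large-sample confidence interval for (p1 - p2)."""
    # Upper-tail z value for the requested confidence coefficient.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Unpooled standard error, as used for confidence intervals.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


lower, upper = diff_proportion_ci(0.55, 200, 0.65, 200)
print(f"({lower:.3f}, {upper:.3f})")
```

Note that the interval uses the separate sample proportions in the standard error, whereas the test in part a uses the pooled estimate because H0 asserts the proportions are equal.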

c. From part b, z.025 = 1.96. Using the information from our samples, we can use p1 = .55 and p2 = .65. For a width of .01, the margin of error is .005.

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (p_1 q_1 + p_2 q_2)}{(ME)^2} = \frac{(1.96)^2\left(.55(1 - .55) + .65(1 - .65)\right)}{.005^2} = \frac{1.82476}{.000025} = 72990.4 \approx 72{,}991$$
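The sample-size formula above can be written as a small Python helper. The function name is hypothetical, and the z value is passed in explicitly so that the table value 1.96 used in the solution is reproduced exactly.

```python
from math import ceil


def sample_size_two_proportions(z, p1, p2, margin_of_error):
    """Equal sample sizes n1 = n2 needed to estimate (p1 - p2) within ME."""
    n = z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin_of_error ** 2
    # Always round up so the margin of error is not exceeded.
    return ceil(n)


# Exercise 7.88 c: z.025 = 1.96, p1 = .55, p2 = .65, ME = .005.
n_788 = sample_size_two_proportions(1.96, 0.55, 0.65, 0.005)
print(n_788)
```

Rounding up rather than to the nearest integer is the convention the solutions follow throughout (for example 72990.4 becomes 72,991).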



7.90

a.

Let p1 = proportion of Opening Doors students enrolled full time and p2 = proportion of traditional students enrolled full time. The target parameter for this comparison is p1 – p2.

b.

Let μ1 = mean GPA of Opening Doors students and μ2 = mean GPA of traditional students. The target parameter for this comparison is μ1 – μ2.

7.92

Using MINITAB, some preliminary calculations are:

Descriptive Statistics: Spillage

Variable  Cause        N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Spillage  Collision   10   0   76.6     22.3   70.4     31.0   35.0    41.5  102.0    257.0
          Fire        12   0   70.9     17.5   60.7     26.0   32.3    49.0   80.5    239.0
          Grounding   14   0  47.79     7.61  28.47    21.00  30.25   37.50  59.00   124.00
          HullFail    12   0   54.4     16.3   56.4     24.0   29.3    31.5   46.0    221.0
          Unknown      2   0  26.00     1.00   1.41    25.00      *   26.00      *    27.00

a.

Let μ1 = mean spillage for accidents caused by collision and μ2 = mean spillage for accidents caused by fire/explosion.

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1)70.4^2 + (12 - 1)60.7^2}{10 + 12 - 2} = 4{,}256.7415$$

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, with df = n1 + n2 − 2 = 10 + 12 – 2 = 20, t.05 = 1.725. The confidence interval is:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{.05}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (76.6 - 70.9) \pm 1.725\sqrt{4256.7415\left(\frac{1}{10} + \frac{1}{12}\right)} \Rightarrow 5.7 \pm 48.19 \Rightarrow (-42.49, 53.89)$$

b.

Let μ3 = mean spillage for accidents caused by grounding and μ4 = mean spillage for accidents caused by hull failure.

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(14 - 1)28.47^2 + (12 - 1)56.4^2}{14 + 12 - 2} = 1{,}896.9830$$

To determine if the mean spillage amount for accidents caused by grounding is different from the mean spillage amount caused by hull failure, we test:

H0: μ3 − μ4 = 0
Ha: μ3 − μ4 ≠ 0

The test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(47.79 - 54.4) - 0}{\sqrt{1896.983\left(\frac{1}{14} + \frac{1}{12}\right)}} = \frac{-6.61}{17.1342} = -.39$$

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n1 + n2 – 2 = 14 + 12 – 2 = 24. From Table VI, Appendix B, t.025 = 2.064. The rejection region is t < −2.064 or t > 2.064. Since the observed value of the test statistic does not fall in the rejection region (t = −.39 ≮ −2.064), H0 is not rejected. There is insufficient evidence to indicate the mean spillage amount for accidents caused by grounding is different from the mean spillage amount caused by hull failure at α = .05.

c. The necessary assumptions are: We must assume that the distributions from which the samples were selected are approximately normal, the samples are independent, and the variances of the two populations are equal. Below are the stem-and-leaf plots for each of the samples:

Stem-and-leaf of Spillage   Cause = Collision   N = 10
Leaf Unit = 10

 (6)   0  333444
  4    0  69
  2    1  2
  1    1
  1    2
  1    2  5

Stem-and-leaf of Spillage   Cause = Fire   N = 12
Leaf Unit = 10

  4    0  2333
 (4)   0  4455
  4    0  7
  3    0  8
  2    1
  2    1  3
  1    1
  1    1
  1    1
  1    2
  1    2  3

Stem-and-leaf of Spillage   Cause = Grounding   N = 14
Leaf Unit = 1.0

  3    2  168
 (5)   3  11678
  6    4  15
  4    5  8
  3    6  2
  2    7
  2    8
  2    9  1
  1   10
  1   11
  1   12  4

Stem-and-leaf of Spillage   Cause = Hull Failure   N = 12
Leaf Unit = 10

 (8)   0  22233333
  4    0  44
  2    0
  2    0
  2    1  0
  1    1
  1    1
  1    1
  1    1
  1    2
  1    2  2

Based on the shapes of the stem-and-leaf plots, it does not appear that the data are normally distributed. Also, we know that if the data are normally distributed, then the Interquartile Range, IQR, divided by the standard deviation should be approximately 1.3. We will compute IQR/s for each of the samples:

Collision:     IQR/s = (102.0 – 35.0) / 70.4 = .95
Fire:          IQR/s = (80.5 – 32.3) / 60.7 = .79
Grounding:     IQR/s = (59.0 – 30.25) / 28.47 = 1.01
Hull Failure:  IQR/s = (46.0 – 29.3) / 56.4 = .30

Since all of these ratios are quite a bit smaller than 1.3, it indicates that none of the samples come from normal distributions. Thus, it appears that the assumption of normal distributions is violated. The sample standard deviations are:

Collision:     s = 70.4
Fire:          s = 60.7
Grounding:     s = 28.47
Hull Failure:  s = 56.4

Without doing formal tests, it appears that the variances of the groups Collision, Fire, and Hull Failure are probably not significantly different. However, it appears that the variance for the grounding group is smaller than the others.
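The IQR/s check used above is easy to automate. The following Python sketch recomputes the ratios from the quartiles and standard deviations in the MINITAB summary; the dictionary layout and function name are illustrative choices, not part of the original solution.

```python
# (Q1, Q3, s) for each cause, copied from the MINITAB descriptive statistics.
summary = {
    "Collision": (35.0, 102.0, 70.4),
    "Fire": (32.3, 80.5, 60.7),
    "Grounding": (30.25, 59.00, 28.47),
    "Hull Failure": (29.3, 46.0, 56.4),
}


def iqr_over_s(q1, q3, s):
    """Interquartile range divided by the sample standard deviation."""
    return (q3 - q1) / s


ratios = {cause: iqr_over_s(*values) for cause, values in summary.items()}

for cause, ratio in ratios.items():
    # For normal data the ratio should be near 1.3.
    print(f"{cause}: IQR/s = {ratio:.2f}")
```

A ratio well below 1.3 signals that the standard deviation is inflated relative to the middle half of the data, which is what a few very large spills would produce.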




d. Let $\sigma_1^2$ = variance of spillage for accidents caused by collision and $\sigma_2^2$ = variance of spillage for accidents caused by grounding. To determine if the variances of the amounts of spillage due to collision and grounding differ, we test:

H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \ne \sigma_2^2$

The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{70.4^2}{28.47^2} = 6.11$$

The rejection region requires α/2 = .02/2 = .01 in the upper tail of the F distribution with numerator df = ν1 = n1 – 1 = 10 – 1 = 9 and denominator df = ν2 = n2 – 1 = 14 – 1 = 13. From Table XI, Appendix B, F.01 = 4.19. The rejection region is F > 4.19. Since the observed value of the test statistic falls in the rejection region (F = 6.11 > 4.19), H0 is rejected. There is sufficient evidence to indicate the variances of the amounts of spillage due to collision and grounding differ at α = .02.

7.94

a. Let μ1 = mean rating of concern about product tampering for males and μ2 = mean rating of concern about product tampering for females. To determine whether a difference exists in the mean level of concern about product tampering between males and females, we test:

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0

b. The p-value = .008. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate a difference exists in the mean level of concern about product tampering between males and females for α > .008.

c. We must assume the sample sizes were sufficiently large so that the Central Limit Theorem applies. We must also assume that we selected two random and independent samples from the two populations.

7.96 For confidence coefficient .95, α = .05 and α/2 = .025. From Table IV, Appendix B, z.025 = 1.96.

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (p_1 q_1 + p_2 q_2)}{(ME)^2} = \frac{1.96^2\left(.395(.605) + .293(.707)\right)}{.03^2} = 1904.26 \approx 1905$$

7.98 a. Let p1999 = proportion of adult Americans who would vote for a woman president in 1999 and p1975 = proportion of adult Americans who would vote for a woman president in 1975.



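b. To see if the samples are sufficiently large:

$$\hat{p}_{1999} \pm 3\sigma_{\hat{p}_{1999}} \Rightarrow \hat{p}_{1999} \pm 3\sqrt{\frac{p_{1999}q_{1999}}{n_{1999}}} \Rightarrow \hat{p}_{1999} \pm 3\sqrt{\frac{\hat{p}_{1999}\hat{q}_{1999}}{n_{1999}}} \Rightarrow .92 \pm 3\sqrt{\frac{.92(.08)}{2000}} \Rightarrow .92 \pm .02 \Rightarrow (.90, .94)$$

$$\hat{p}_{1975} \pm 3\sigma_{\hat{p}_{1975}} \Rightarrow \hat{p}_{1975} \pm 3\sqrt{\frac{p_{1975}q_{1975}}{n_{1975}}} \Rightarrow \hat{p}_{1975} \pm 3\sqrt{\frac{\hat{p}_{1975}\hat{q}_{1975}}{n_{1975}}} \Rightarrow .73 \pm 3\sqrt{\frac{.73(.27)}{2000}} \Rightarrow .73 \pm .03 \Rightarrow (.70, .76)$$

Since both intervals are contained within the interval (0, 1), the normal approximation will be adequate.

c. For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.92 - .73) \pm 1.645\sqrt{\frac{.92(.08)}{2000} + \frac{.73(.27)}{1500}} \Rightarrow .19 \pm .02 \Rightarrow (.17, .21)$$

We are 90% confident that the difference in the proportions of adult Americans who would vote for a woman president between 1999 and 1975 is between .17 and .21.

d. To see if the samples are sufficiently large:

$$\hat{p}_{1999} \pm 3\sigma_{\hat{p}_{1999}} \Rightarrow \hat{p}_{1999} \pm 3\sqrt{\frac{p_{1999}q_{1999}}{n_{1999}}} \Rightarrow \hat{p}_{1999} \pm 3\sqrt{\frac{\hat{p}_{1999}\hat{q}_{1999}}{n_{1999}}} \Rightarrow .92 \pm 3\sqrt{\frac{.92(.08)}{20}} \Rightarrow .92 \pm .18 \Rightarrow (.74, 1.10)$$

$$\hat{p}_{1975} \pm 3\sigma_{\hat{p}_{1975}} \Rightarrow \hat{p}_{1975} \pm 3\sqrt{\frac{p_{1975}q_{1975}}{n_{1975}}} \Rightarrow \hat{p}_{1975} \pm 3\sqrt{\frac{\hat{p}_{1975}\hat{q}_{1975}}{n_{1975}}} \Rightarrow .73 \pm 3\sqrt{\frac{.73(.27)}{50}} \Rightarrow .73 \pm .19 \Rightarrow (.54, .92)$$

Since the first interval is not contained within the interval (0, 1), the normal approximation will not be adequate.

7.100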

a.

For each measure, let μ1 = mean job satisfaction for day-shift nurses and μ2 = mean job satisfaction for night-shift nurses. To determine whether a difference in job satisfaction exists between day-shift and night-shift nurses, we test: H0: μ1 − μ2 = 0 Ha: μ1 − μ2 ≠ 0




b.

Hours of work: The p-value = .813. Since the p-value is so large, there is no evidence to reject H0. There is insufficient evidence to indicate a difference in mean job satisfaction exists between day-shift and night-shift nurses on hours of work for α ≤ .10. Free time: The p-value = .047. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate a difference in mean job satisfaction exists between day-shift and night-shift nurses on free time for α > .047. Breaks: The p-value = .0073. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate a difference in mean job satisfaction exists between day-shift and night-shift nurses on breaks for α > .0073.

c. We must make the following assumptions for each measure:

1. The job satisfaction scores for both day-shift and night-shift nurses are normally distributed.
2. The variances of job satisfaction scores for both day-shift and night-shift nurses are equal.
3. Random and independent samples were selected from both populations of job satisfaction scores.

7.102 For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. We estimate p1 = p2 = .5.

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (p_1 q_1 + p_2 q_2)}{(ME)^2} = \frac{(1.645)^2\left(.5(.5) + .5(.5)\right)}{.05^2} = 541.205 \approx 542$$

7.104 Let p1 = proportion of larvae that died in containers containing high carbon dioxide levels and p2 = proportion of larvae that died in containers containing normal carbon dioxide levels. The parameter of interest for this problem is p1 − p2, or the difference in the death rates for the two groups.

Some preliminary calculations are:

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{.10(80) + .05(80)}{80 + 80} = .075 \qquad \hat{q} = 1 - \hat{p} = 1 - .075 = .925$$

To determine if an increased level of carbon dioxide is effective in killing a higher percentage of leaf-eating larvae, we test:

H0: p1 − p2 = 0
Ha: p1 − p2 > 0

The test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{80} + \frac{1}{80}\right)}} = \frac{(.10 - .05) - 0}{\sqrt{.075(.925)\left(\frac{1}{80} + \frac{1}{80}\right)}} = 1.201$$



The rejection region requires α = .01 in the upper tail of the z distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33. Since the observed value of the test statistic does not fall in the rejection region (z = 1.201 ≯ 2.33), H0 is not rejected. There is insufficient evidence to indicate that an increased level of carbon dioxide is effective in killing a higher percentage of leaf-eating larvae at α = .01.

7.106 a. Let p1 = proportion of female students who switched due to loss of interest in SME and p2 = proportion of male students who switched due to lack of interest in SME.

Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{74}{172} = .43; \quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{72}{163} = .44; \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{74 + 72}{172 + 163} = .436$$

To determine if the proportion of female students who switch due to lack of interest in SME differs from the proportion of males who switch due to a lack of interest, we test:

H0: p1 − p2 = 0
Ha: p1 − p2 ≠ 0

The test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.43 - .44) - 0}{\sqrt{.436(.564)\left(\frac{1}{172} + \frac{1}{163}\right)}} = -0.18$$

The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645 or z > 1.645. Since the observed value of the test statistic does not fall in the rejection region (z = −0.18 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the proportion of female students who switch due to lack of interest in SME differs from the proportion of males who switch due to a lack of interest at α = .10.

b. Let p1 = proportion of female students who switched due to low grades in SME and p2 = proportion of male students who switched due to low grades in SME.

Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{33}{172} = .19; \quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{44}{163} = .27$$

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.19 - .27) \pm 1.645\sqrt{\frac{.19(.81)}{172} + \frac{.27(.73)}{163}} \Rightarrow -.08 \pm .075 \Rightarrow (-.155, -.005)$$




We are 90% confident that the difference between the proportions of female and male switchers who lost confidence due to low grades in SME is between −.155 and −.005. Since the interval does not include 0, there is evidence to indicate the proportion of female switchers due to low grades is less than the proportion of male switchers due to low grades.

7.108 For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The standard deviation can be estimated by dividing the range by 4:

$$\sigma \approx \frac{\text{Range}}{4} = \frac{4}{4} = 1$$

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{1.96^2(1^2 + 1^2)}{.2^2} = 192.08 \approx 193$$

7.110

$$s_1^2 = \frac{\sum x_1^2 - \frac{\left(\sum x_1\right)^2}{n_1}}{n_1 - 1} = \frac{10{,}251 - \frac{225^2}{5}}{5 - 1} = \frac{126}{4} = 31.5$$

$$s_2^2 = \frac{\sum x_2^2 - \frac{\left(\sum x_2\right)^2}{n_2}}{n_2 - 1} = \frac{10{,}351 - \frac{227^2}{5}}{5 - 1} = \frac{45.2}{4} = 11.3$$

Let $\sigma_1^2$ = variance for instrument A and $\sigma_2^2$ = variance for instrument B. Since we wish to determine if there is a difference in the precision of the two machines, we test:

H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \ne \sigma_2^2$

The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{31.5}{11.3} = 2.79$$

The rejection region requires α/2 = .10/2 = .05 in the upper tail of the F-distribution with ν1 = n1 − 1 = 5 − 1 = 4 and ν2 = n2 − 1 = 5 − 1 = 4. From Table IX, Appendix B, F.05 = 6.39. The rejection region is F > 6.39. Since the observed value of the test statistic does not fall in the rejection region (F = 2.79 ≯ 6.39), H0 is not rejected. There is insufficient evidence of a difference in the precision of the two instruments at α = .10.
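The shortcut variance formula used in Exercise 7.110 works from the sums Σx and Σx² alone. A minimal Python sketch follows; the function name is hypothetical, and 6.39 is the table value quoted in the solution.

```python
def variance_from_sums(sum_x, sum_x_sq, n):
    """Sample variance from sum(x) and sum(x^2): (Sx2 - (Sx)^2 / n) / (n - 1)."""
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)


# Instrument A: sum(x) = 225, sum(x^2) = 10,251, n = 5.
s1_sq = variance_from_sums(225, 10_251, 5)
# Instrument B: sum(x) = 227, sum(x^2) = 10,351, n = 5.
s2_sq = variance_from_sums(227, 10_351, 5)

f_stat = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
reject = f_stat > 6.39  # tabled F.05 with (4, 4) df

print(f"s1^2 = {s1_sq:.1f}, s2^2 = {s2_sq:.1f}, F = {f_stat:.2f}, reject H0: {reject}")
```

The shortcut form is convenient for hand calculation, but it subtracts two large, nearly equal numbers, so with real data a two-pass formula (subtracting the mean first) is usually the numerically safer choice.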



7.112 a. Let μ1 = mean change in bond prices handled by underwriter 1 and μ2 = mean change in bond prices handled by underwriter 2.

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(27 - 1).0098 + (23 - 1).002465}{27 + 23 - 2} = \frac{.30903}{48} = .006438$$

To determine if there is a difference in the mean change in bond prices handled by the 2 underwriters, we test:

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0

The test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{-.0491 - (-.0307) - 0}{\sqrt{.006438\left(\frac{1}{27} + \frac{1}{23}\right)}} = -.81$$

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n1 + n2 − 2 = 27 + 23 − 2 = 48. From Table VI, Appendix B, t.025 ≈ 1.96. The rejection region is t < −1.96 or t > 1.96. Since the observed value of the test statistic does not fall in the rejection region (t = −.81 ≮ −1.96), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean change in bond prices handled by the 2 underwriters at α = .05.

b. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = 48, t.025 ≈ 1.96. The confidence interval is:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{.025}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (-.0491 - (-.0307)) \pm 1.96\sqrt{.006438\left(\frac{1}{27} + \frac{1}{23}\right)} \Rightarrow -.0184 \pm .0446 \Rightarrow (-.063, .0262)$$

We are 95% confident the difference in the mean change in bond prices handled by underwriter 1 and underwriter 2 is somewhere between −.063 and .0262.

7.114

a. To determine if the mean salary of all males with post-graduate degrees exceeds the mean salary of all females with post-graduate degrees, we test:

H0: μM = μF
Ha: μM > μF

b. The test statistic is

$$z = \frac{(\bar{x}_M - \bar{x}_F) - 0}{\sqrt{s_{\bar{x}_M}^2 + s_{\bar{x}_F}^2}} = \frac{(61{,}340 - 32{,}227)}{\sqrt{2{,}185^2 + 932^2}} = 12.26$$

c. The rejection region requires α = .01 in the upper tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33.

d. Since the observed value of the test statistic falls in the rejection region (z = 12.26 > 2.33), H0 is rejected. There is sufficient evidence to indicate the mean salary of all males with post-graduate degrees exceeds the mean salary of all females with post-graduate degrees at α = .01.
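The test statistic in Exercise 7.114 is built from the two sample means and their standard errors. The following Python sketch reproduces it; the function name is hypothetical, and the critical value is computed from the standard library's normal distribution, which gives about 2.326 rather than the rounded table value 2.33.

```python
from math import sqrt
from statistics import NormalDist


def z_from_standard_errors(mean_1, se_1, mean_2, se_2, d0=0.0):
    """Large-sample z statistic when each mean comes with its standard error."""
    return (mean_1 - mean_2 - d0) / sqrt(se_1 ** 2 + se_2 ** 2)


# Males: mean 61,340 with standard error 2,185; females: 32,227 and 932.
z = z_from_standard_errors(61_340, 2_185, 32_227, 932)

# Upper-tail critical value for alpha = .01.
z_crit = NormalDist().inv_cdf(1 - 0.01)
reject = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```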



The Kentucky Milk Case—Part II (To accompany Chapters 5–7)

(1) Incumbency Rates

I have repeated the incumbency rates for the Tri-county market. If the "normal" incumbency rate is .7 in competitive markets, then we would like to test to see if the incumbency rate in the Tri-county market is larger than .7. We will run a test for each of the years from 1985 through 1988, and also for the four years combined.

Tri-County Market

Year   Number of Districts   Same Vendors   Incumbency Rate
1984           10                  8              .800
1985           12                 12             1.000
1986           13                 13             1.000
1987           13                 12              .923
1988           13                 13             1.000
1989           13                  9              .692
1990           13                 10              .769
1991           13                 11              .846

1985

One of the assumptions necessary for this test is that the sample size is sufficiently large. In order for the sample size to be sufficiently large, the interval $p_0 \pm 3\sigma_{\hat{p}}$ must not contain 0 or 1. For this problem, the interval is:

$$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow .7 \pm 3\sqrt{\frac{.7(.3)}{12}} \Rightarrow .7 \pm .397 \Rightarrow (.303, 1.097)$$

Since 1 is included in the interval, the sample size is not sufficiently large. The following test may not be valid.

To see if the incumbency rate in 1985 exceeds .7, we test:

H0: p = .7
Ha: p > .7

The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{1 - .7}{\sqrt{\frac{.7(.3)}{12}}} = 2.27$$

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 2.27 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the incumbency rate exceeds .7 in the Tri-county market at α = .05.

1986

In order for the sample size to be sufficiently large, the interval $p_0 \pm 3\sigma_{\hat{p}}$ must not contain 0 or 1. For this problem, the interval is:

$$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow .7 \pm 3\sqrt{\frac{.7(.3)}{13}} \Rightarrow .7 \pm .381 \Rightarrow (.319, 1.081)$$

Since 1 is included in the interval, the sample size is not sufficiently large. The following test may not be valid.

To see if the incumbency rate in 1986 exceeds .7, we test:

H0: p = .7
Ha: p > .7

The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{1 - .7}{\sqrt{\frac{.7(.3)}{13}}} = 2.36$$

Since the observed value of the test statistic falls in the rejection region (z = 2.36 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the incumbency rate exceeds .7 in the Tri-county market at α = .05.

1987

In order for the sample size to be sufficiently large, the interval $p_0 \pm 3\sigma_{\hat{p}}$ must not contain 0 or 1. For this problem, the interval is:

$$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow .7 \pm 3\sqrt{\frac{.7(.3)}{13}} \Rightarrow .7 \pm .381 \Rightarrow (.319, 1.081)$$

Since 1 is included in the interval, the sample size is not sufficiently large. The following test may not be valid.

To see if the incumbency rate in 1987 exceeds .7, we test:

H0: p = .7
Ha: p > .7

The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{.923 - .7}{\sqrt{\frac{.7(.3)}{13}}} = 1.75$$

Since the observed value of the test statistic falls in the rejection region (z = 1.75 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the incumbency rate exceeds .7 in the Tri-county market at α = .05.

1988

This test is the same as the test for 1986.

Combined 1985-1988

To see if the sample size is sufficiently large, the interval $p_0 \pm 3\sigma_{\hat{p}}$ must not contain 0 or 1. For this problem, the interval is:

$$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow .7 \pm 3\sqrt{\frac{.7(.3)}{51}} \Rightarrow .7 \pm .193 \Rightarrow (.507, .893)$$

Since neither 0 nor 1 is included in the interval, the sample size is sufficiently large.

$$\hat{p} = \frac{50}{51} = .980$$

To see if the incumbency rate in 1985–1988 exceeds .7, we test:

H0: p = .7
Ha: p > .7

The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{.980 - .7}{\sqrt{\frac{.7(.3)}{51}}} = 4.36$$

Since the observed value of the test statistic falls in the rejection region (z = 4.36 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the incumbency rate exceeds .7 in the Tri-county market at α = .05. Thus, there is evidence, based on the incumbency rates, that bid collusion is present in the Tri-county market.
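The yearly incumbency tests all follow the same two steps: check whether p0 ± 3σ stays inside (0, 1), then compute z. A Python sketch of that routine is below; the function and dictionary names are illustrative. Because it uses unrounded sample proportions (12/13 and 50/51 rather than .923 and .980), the z values can differ from the hand calculations in the second decimal place.

```python
from math import sqrt


def proportion_z_test(successes, n, p0):
    """One-sample z test for a proportion, with the p0 +/- 3 sigma size check."""
    sigma = sqrt(p0 * (1 - p0) / n)
    interval = (p0 - 3 * sigma, p0 + 3 * sigma)
    # The sample is "sufficiently large" when the interval excludes 0 and 1.
    large_enough = interval[0] > 0 and interval[1] < 1
    z = (successes / n - p0) / sigma
    return z, interval, large_enough


# (same vendors, number of districts) from the Tri-County Market table.
tri_county = {
    "1985": (12, 12),
    "1986": (13, 13),
    "1987": (12, 13),
    "1988": (13, 13),
    "1985-1988": (50, 51),
}

results = {
    label: proportion_z_test(same, districts, 0.7)
    for label, (same, districts) in tri_county.items()
}

for label, (z, interval, large_enough) in results.items():
    print(f"{label}: z = {z:.2f}, large enough: {large_enough}")
```

Only the combined sample passes the size check, which is why the single-year tests are flagged as possibly invalid while the combined test is not.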

(2) Bid Price Dispersion

Again, we can use only the data provided which are the winning bids in each of the school districts in both markets. The sample sizes and the variances for each of the milk products for each year and each market are provided in the table.

Whole White Milk

       Surround Market      Tri-County Market
YR      N      VAR           N      VAR
83     22   0.000212         8   0.000213
84     22   0.000188         9   0.000022
85     26   0.000174        10   0.000028
86     33   0.000120        10   0.000019
87     36   0.000105        12   0.000027
88     36   0.000128        12   0.000024
89     37   0.000056        12   0.000089
90     35   0.000063        12   0.000010
91      5   0.000042        13   0.000020

Lowfat White Milk

       Surround Market   Tri-County Market
YR      N                 N
83     24                10
84     26                12
85     29                13
86     33                13
87     35                13
88     35                13
89     35                13
90     34                13
91      5                12

Lowfat Chocolate Milk

       Surround Market   Tri-County Market
YR      N                 N
83     24                 5
84     25                 6
85     28                 6
86     34                 6
87     36                 7
88     36                 9
89     36                 9
90     33                10
91      5                11

I will write out the first test and then summarize the others in a table. The first test will be for the year 1983 and will compare the variances of the whole white milk. To determine if the variances in the winning bid prices differ for the two markets, we test:

H0: $\sigma_1^2 / \sigma_2^2 = 1$
Ha: $\sigma_1^2 / \sigma_2^2 \ne 1$

The test statistic is

$$F = \frac{\text{larger sample variance}}{\text{smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{.000213}{.000212} = 1.005$$

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n2 − 1 = 8 − 1 = 7 and ν2 = n1 − 1 = 22 − 1 = 21. From Table IX, Appendix B, F.025 = 2.97. The rejection region is F > 2.97. Since the observed value of the test statistic does not fall in the rejection region (F = 1.005 ≯ 2.97), H0 is not rejected. There is insufficient evidence to indicate that the variances of the winning bids are different for the two markets.

Whole White Milk

Year   ν1, ν2   F.025     F        Decision
1983    7,21    2.97     1.005    Do not reject
1984   21,8     4.00     8.545    Reject
1985   25,9     3.61     6.214    Reject
1986   32,9     3.56     6.316    Reject
1987   35,11    2.96     3.889    Reject
1988   35,11    2.96     5.333    Reject
1989   11,36    2.51     1.589    Do not reject
1990   34,11    3.12     6.300    Reject
1991    4,12    4.12     2.100    Do not reject



In all cases where there was a significant difference in the variances of the winning bids between the two markets, the variance in the Surrounding market was larger than the variance in the Tri-county market. This implies that collusion might be present in the Tri-county market.

Lowfat White Milk

Year   ν1, ν2   F.025     F        Decision
1983   23,9     3.62     1.800    Do not reject
1984   25,11    3.17     5.400    Reject
1985   28,12    3.02     7.500    Reject
1986   32,12    2.96     4.964    Reject
1987   34,12    2.96     3.102    Reject
1988   34,12    2.96     4.342    Reject
1989   12,34    2.41     1.581    Do not reject
1990   33,12    2.96     3.640    Reject
1991    4,11    4.28     1.500    Do not reject

Again, in all cases where there was a significant difference in the variances of the winning bids between the two markets, the variance in the Surrounding market was larger than the variance in the Tri-county market. This implies that collusion might be present in the Tri-county market.

Lowfat Chocolate Milk

Year   ν1, ν2   F.025     F        Decision
1983   23,4     8.56    19.133    Reject
1984   24,5     6.28     3.900    Do not reject
1985   27,5     6.28     6.526    Reject
1986   33,5     6.23     6.037    Do not reject
1987   35,6     5.07     4.075    Do not reject
1988   35,8     3.89    10.222    Reject
1989    8,35    2.65     1.450    Do not reject
1990   32,9     3.56     7.000    Reject
1991    4,10    4.47     2.333    Do not reject

Again, in all cases where there was a significant difference in the variances of the winning bids between the two markets, the variance in the Surrounding market was larger than the variance in the Tri-county market. This implies that collusion might be present in the Tri-county market. Based on the analysis of the three milk products, there appears to be collusion in the Tri-county market.
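The F column and decisions in the Whole White Milk summary can be regenerated from the variance table with a short Python sketch. The data structure and function name are illustrative; the critical values are the tabled F.025 entries from the summary, not computed here.

```python
# (year, Surrounding-market variance, Tri-county variance, tabled F.025)
WHOLE_WHITE = [
    (1983, 0.000212, 0.000213, 2.97),
    (1984, 0.000188, 0.000022, 4.00),
    (1985, 0.000174, 0.000028, 3.61),
    (1986, 0.000120, 0.000019, 3.56),
    (1987, 0.000105, 0.000027, 2.96),
    (1988, 0.000128, 0.000024, 2.96),
    (1989, 0.000056, 0.000089, 2.51),
    (1990, 0.000063, 0.000010, 3.12),
    (1991, 0.000042, 0.000020, 4.12),
]


def dispersion_test(var_surround, var_tri, f_critical):
    """Return (F, reject) with the larger variance in the numerator."""
    f_stat = max(var_surround, var_tri) / min(var_surround, var_tri)
    return f_stat, f_stat > f_critical


results = {
    year: dispersion_test(var_sur, var_tri, f_crit)
    for year, var_sur, var_tri, f_crit in WHOLE_WHITE
}

for year, (f_stat, reject) in results.items():
    print(year, f"{f_stat:.3f}", "Reject" if reject else "Do not reject")
```

In every year where this sketch rejects H0, the Surrounding-market variance is the larger one, which matches the pattern the analysis interprets as a sign of reduced price dispersion in the Tri-county market.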




(3) Average Winning Bid Price

I have provided the SAS output for computing the t-tests to compare the mean winning bid prices between the two markets for each of the years and each of the milk products. I will discuss the findings for each milk product separately.

For t-tests, we must assume that the two population variances are the same. If the population variances are not the same, there is an approximate test that takes into consideration the different variances. The SAS printout provided allows for the test of equal variances first. I used a p-value of .25 as the cutoff point. If the p-value was less than or equal to .25 for the F-test, I assumed that the variances were different and used the approximate test designated as UNEQUAL. If the p-value for the F-test was greater than .25, I assumed that the population variances were the same and used the test designated as EQUAL.

Whole White Milk:

Variable: Whole White Milk - 1983

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      22   0.1318   0.01458844   0.00311027   Unequal      2.4045   12.4   0.0326
TRI       8   0.1173   0.01462038   0.00516909   Equal        2.4071   28.0   0.0229*

For H0: Variances are equal, F' = 1.00   DF = (7,21)   Prob>F' = 0.9116
************************************************************************

Variable: Whole White Milk - 1984

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      22   0.1309   0.01374189   0.00292978   Unequal     -2.3904   28.6   0.0236*
TRI       9   0.1389   0.00474871   0.00158290   Equal       -1.6825   29.0   0.1032

DF = (21,8)
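The Unequal and Equal rows of a printout like the 1984 output above can be approximated from the summary statistics alone. The Python sketch below computes both t statistics; the function name is hypothetical, and the assumption that the Unequal row uses a Satterthwaite-type approximation for its degrees of freedom is mine, although the recomputed value rounds to the printed 28.6. Because the printed means are rounded to four decimals, the recomputed t values agree with the printout only approximately.

```python
from math import sqrt


def two_sample_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Unequal-variance and pooled t statistics from summary statistics."""
    v1, v2 = sd_1 ** 2 / n_1, sd_2 ** 2 / n_2  # squared standard errors
    diff = mean_1 - mean_2

    # Unequal variances: separate standard errors, approximate df
    # (Satterthwaite-type formula; assumed, not confirmed by the printout).
    t_unequal = diff / sqrt(v1 + v2)
    df_unequal = (v1 + v2) ** 2 / (v1 ** 2 / (n_1 - 1) + v2 ** 2 / (n_2 - 1))

    # Equal variances: pooled variance with n1 + n2 - 2 df.
    df_equal = n_1 + n_2 - 2
    pooled = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df_equal
    t_equal = diff / sqrt(pooled * (1 / n_1 + 1 / n_2))

    return {"unequal": (t_unequal, df_unequal), "equal": (t_equal, df_equal)}


# Whole White Milk, 1984: Surrounding market versus Tri-county market.
out_1984 = two_sample_t(0.1309, 0.01374189, 22, 0.1389, 0.00474871, 9)

for label, (t_value, df) in out_1984.items():
    print(f"{label}: t = {t_value:.2f}, df = {df:.1f}")
```

The two rows can lead to different conclusions, as they do for 1984, which is why the analysis first uses the F-test p-value to decide which row to read.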


Variable: Whole White Milk - 1985

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      26   0.1279   0.01321810   0.00259228   Unequal     -4.3968   33.8   0.0001*
TRI      10   0.1415   0.00534266   0.00168950   Equal       -3.1348   34.0   0.0035

DF = (25,9)   Prob>F' = 0.0077
************************************************************************




Variable: Whole White Milk - 1986

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      33   0.1253   0.01098665   0.00191253   Unequal     -8.1534   37.3   0.0001*
TRI      10   0.1446   0.00442846   0.00140040   Equal       -5.3943   41.0   0.0000

DF = (32,9)


Variable: Whole White Milk - 1987

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      36   0.1264   0.01026078   0.00171013   Unequal    -10.0785   37.5   0.0001*
TRI      12   0.1495   0.00527196   0.00152188   Equal       -7.4313   46.0   0.0000

DF = (35,11)


Variable: Whole White Milk - 1988

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      36   0.1277   0.01135449   0.00189242   Unequal     -9.9271   42.2   0.0001*
TRI      12   0.1513   0.00499090   0.00144075   Equal       -6.9441   46.0   0.0000

DF = (35,11)


Variable: Whole White Milk - 1989

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      37   0.1299   0.00752173   0.00123657   Unequal     -0.4890   15.8   0.6316
TRI      12   0.1314   0.00944991   0.00272795   Equal       -0.5501   47.0   0.5849NS

DF = (11,36)


Variable: Whole White Milk - 1990

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      35   0.1609   0.00794659   0.00134322   Unequal     -1.1177   43.7   0.2698NS
TRI      12   0.1628   0.00317904   0.00091771   Equal       -0.7673   45.0   0.4469

DF = (34,11)   Prob>F' = 0.0026
************************************************************************




Variable: Whole White Milk - 1991

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR       5   0.1452   0.00652012   0.00291589   Unequal      1.2585    5.6   0.2585
TRI      13   0.1412   0.00458169   0.00127073   Equal        1.4813   16.0   0.1580NS

DF = (4,12)   Prob>F' = 0.3095
************************************************************************

The mean winning bid prices were significantly different between the markets for all years except 1989, 1990, and 1991. In 1983, the mean winning bid for the Surrounding market was significantly larger than that for the Tri-county market. For the years 1984–1988, the mean winning bid price for the Tri-county market was significantly larger than that for the Surrounding market. This implies evidence of collusion for the years 1984–1988.

Lowfat White Milk:

Variable: Lowfat White Milk - 1983

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      24   0.1243   0.01672220   0.00341341   Unequal      2.5085   22.6   0.0198
TRI      10   0.1112   0.01246237   0.00394095   Equal        2.2214   32.0   0.0335*

DF = (23,9)   Prob>F' = 0.3627
************************************************************************

Variable: Lowfat White Milk - 1984

MARKET    N     Mean      Std Dev    Std Error   Variances         T     DF   Prob>|T|
------------------------------------------------------------------------------------------
SUR      26   0.1236   0.01469859   0.00288263   Unequal     -3.0061   36.0   0.0048*
TRI      12   0.1338   0.00635717   0.00183516   Equal       -2.3099   36.0   0.0267

DF = (25,11)


N

Mean

Std Dev

Std Error

Variances

T

DF

Prob>|T|

----------------------------------------------------------------------------SUR

29

0.1200

0.01452245

0.00269675

Unequal

-5.3857

39.2

0.0001*

TRI

13

0.1366

0.00537445

0.00149061

Equal

-3.9769

40.0

0.0003


DF = (28,12)

Prob>F' = 0.0008 ************************************************************************


Variable: Lowfat White Milk - 1986

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       33   0.1178   0.01180640   0.00205523   Unequal     -8.4010    43.0   0.0001*
TRI       13   0.1391   0.00533205   0.00147884   Equal       -6.2183    44.0   0.0000

DF = (32,12)

Variable: Lowfat White Milk - 1987

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       35   0.1173   0.01235100   0.00208770   Unequal     -8.7991    37.8   0.0001*
TRI       13   0.1424   0.00701738   0.00194627   Equal       -6.8995    46.0   0.0000

DF = (34,12)

Variable: Lowfat White Milk - 1988

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       35   0.1182   0.01285522   0.00217293   Unequal     -9.6219    42.7   0.0001*
TRI       13   0.1448   0.00618019   0.00171408   Equal       -7.1332    46.0   0.0000

DF = (34,12)

Variable: Lowfat White Milk - 1989

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       35   0.1187   0.00655938   0.00110874   Unequal     -2.1005    17.9   0.0501
TRI       13   0.1240   0.00828350   0.00229743   Equal       -2.3400    46.0   0.0237*

DF = (12,34)

Variable: Lowfat White Milk - 1990

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       34   0.1519   0.00954524   0.00163700   Unequal     -2.3772    39.8   0.0223*
TRI       13   0.1570   0.00508486   0.00141029   Equal       -1.8347    45.0   0.0732

DF = (33,12)    Prob>F' = 0.0238
************************************************************************

Variable: Lowfat White Milk - 1991

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR        5   0.1364   0.00718485   0.00321316   Unequal      0.2745     6.3   0.7925
TRI       12   0.1354   0.00585768   0.00169097   Equal        0.3001    15.0   0.7682NS

DF = (4,11)    Prob>F' = 0.5343
************************************************************************

The mean winning bid prices were significantly different between the markets for all years except 1991. In 1983, the mean winning bid for the Surrounding market was significantly larger than that for the Tri-county market. For the years 1984–1990, the mean winning bid price for the Tri-county market was significantly larger than that for the Surrounding market. This implies evidence of collusion for the years 1984–1990.

Lowfat Chocolate Milk:

Variable: Lowfat Chocolate Milk - 1983

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       24   0.1267   0.01696642   0.00346326   Unequal      5.3313    26.3   0.0001*
TRI        5   0.1060   0.00394740   0.00176533   Equal        2.6795    27.0   0.0124

For H0: Variances are equal, F' = 18.47    DF = (23,4)    Prob>F' = 0.0117
************************************************************************

Variable: Lowfat Chocolate Milk - 1984

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       25   0.1251   0.01530156   0.00306031   Unequal     -2.1693    15.7   0.0457*
TRI        6   0.1347   0.00778522   0.00317830   Equal       -1.4733    29.0   0.1514

DF = (24,5)

Variable: Lowfat Chocolate Milk - 1985

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       28   0.1206   0.01575587   0.00297758   Unequal     -4.6215    20.9   0.0001*
TRI        6   0.1387   0.00621914   0.00253895   Equal       -2.7384    32.0   0.0100

DF = (27,5)    Prob>F' = 0.0472
************************************************************************


Variable: Lowfat Chocolate Milk - 1986

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       34   0.1169   0.01279357   0.00219408   Unequal     -8.0140    18.2   0.0001*
TRI        6   0.1414   0.00521130   0.00212751   Equal       -4.5821    38.0   0.0000

DF = (33,5)

Variable: Lowfat Chocolate Milk - 1987

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       36   0.1184   0.01280507   0.00213418   Unequal     -7.8853    17.5   0.0001*
TRI        7   0.1436   0.00632926   0.00239224   Equal       -5.0675    41.0   0.0000

DF = (35,6)

Variable: Lowfat Chocolate Milk - 1988

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       36   0.1192   0.01359999   0.00226666   Unequal    -10.3636    40.6   0.0001*
TRI        9   0.1470   0.00425532   0.00141844   Equal       -5.9934    43.0   0.0000

For H0: Variances are equal, F' = 10.21    DF = (35,8)

Variable: Lowfat Chocolate Milk - 1989

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       36   0.1200   0.00776605   0.00129434   Unequal     -1.7178    10.9   0.1140
TRI        9   0.1258   0.00932923   0.00310974   Equal       -1.9216    43.0   0.0613NS

DF = (8,35)

Variable: Lowfat Chocolate Milk - 1990

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR       33   0.1531   0.00993298   0.00172911   Unequal     -3.9472    38.3   0.0003*
TRI       10   0.1614   0.00383030   0.00121125   Equal       -2.5773    41.0   0.0137

DF = (32,9)    Prob>F' = 0.0050
************************************************************************

Variable: Lowfat Chocolate Milk - 1991

MARKET    N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
-----------------------------------------------------------------------------------------
SUR        5   0.1402   0.00991020   0.00443197   Unequal     -0.4431     5.6   0.6743
TRI       11   0.1423   0.00650294   0.00196071   Equal       -0.5216    14.0   0.6101NS

DF = (4,10)    Prob>F' = 0.2552

The mean winning bid prices were significantly different between the markets for all years except 1989 and 1991. In 1983, the mean winning bid for the Surrounding market was significantly larger than that for the Tri-county market. For the years 1984–1988 and 1990, the mean winning bid price for the Tri-county market was significantly larger than that for the Surrounding market. This implies evidence of collusion for the years 1984–1988.
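The T and DF columns in these printouts can be approximated from the summary statistics alone. The sketch below is not the SAS procedure itself; it assumes the usual unequal-variance (Satterthwaite) and pooled-variance formulas, and the function name `two_sample_t` is hypothetical. It uses the Lowfat Chocolate Milk rows with N = 36 (SUR) and N = 9 (TRI). Because the printed means are rounded to four decimals, the results match the printout only approximately.

```python
import math


def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (unequal-variance t, Satterthwaite df, pooled t, pooled df)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    # Unequal-variance statistic and its approximate degrees of freedom
    t_unequal = (mean1 - mean2) / math.sqrt(v1 + v2)
    df_unequal = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Pooled-variance statistic
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    t_equal = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t_unequal, df_unequal, t_equal, n1 + n2 - 2


# Summary statistics copied from the SUR (n = 36) and TRI (n = 9) rows
t_u, df_u, t_e, df_e = two_sample_t(36, 0.1192, 0.01359999,
                                    9, 0.1470, 0.00425532)
print(f"Unequal: t = {t_u:.2f}, df = {df_u:.1f}")
print(f"Equal:   t = {t_e:.2f}, df = {df_e}")
```

Both statistics come out negative, as they must when the Tri-county mean is the larger of the two.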


Chapter 8

Design of Experiments and Analysis of Variance

8.2

The treatments are the combinations of levels of each of the two factors. There are 2 × 5 = 10 treatments. They are: (A, 50), (A, 60), (A, 70), (A, 80), (A, 90) (B, 50), (B, 60), (B, 70), (B, 80), (B, 90)
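As a small illustration, the treatment list is the Cartesian product of the two sets of factor levels. The variable names below are illustrative only.

```python
from itertools import product

# Levels of the two factors, as listed in the solution to 8.2
first_factor = ["A", "B"]
second_factor = [50, 60, 70, 80, 90]

# Each treatment is one (first factor level, second factor level) combination
treatments = list(product(first_factor, second_factor))

print(len(treatments))
for treatment in treatments:
    print(treatment)
```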

8.4

a. College GPA's are measured on college students. The experimental units are college students.

b. Household income is measured on households. The experimental units are households.

c. Gasoline mileage is measured on automobiles. The experimental units are the automobiles of a particular model.

d. The experimental units are the sectors on a computer diskette.

e. The experimental units are the states.

8.6

a. The response variable is the amount of the purchase.

b. There is one factor in this problem: type of credit card.

c. There are 4 treatments, corresponding to the 4 levels of the factor. The treatments are VISA, MasterCard, American Express, and Discover.

d. The experimental units are the credit card holders.

8.8

a. The response variable in this problem is the consumer’s opinion on the value of the discount offer.

b. There are two treatments in this problem: within-store price promotion and between-store price promotion.

c. The experimental units are the consumers.

8.10

a. There are 2 factors in the problem: Type of yeast and Temperature. Type of yeast has 2 levels: Brewer’s yeast and baker’s yeast. Temperature has 4 levels: 45°, 48°, 51°, and 54°C.

b. The response variable is the autolysis yield.

c. There are a total of 2 × 4 = 8 treatments in this experiment. The treatments are all the type of yeast-temperature combinations.

d. This is a designed experiment.

8.12

a. The response is the evaluation by the undergraduate student of the ethical behavior of the salesperson.

b. There are two factors: type of sales job at two levels (high tech. vs. low tech.) and sales task at two levels (new account development vs. account maintenance).

c. The treatments are the 2 × 2 = 4 combinations of type of sales job and sales task.

d. The experimental units are the college students.

8.14

a. From Table IX with ν1 = 4 and ν2 = 4, F.05 = 6.39.

b. From Table XI with ν1 = 4 and ν2 = 4, F.01 = 15.98.

c. From Table VIII with ν1 = 30 and ν2 = 40, F.10 = 1.54.

d. From Table X with ν1 = 15 and ν2 = 12, F.025 = 3.18.

8.16

a.

In diagram #2, the difference between the sample means is small relative to the variability within the sample observations. In diagram #1, the values in each of the samples are grouped together with a range of 4, while in diagram #2, the range of values is 8.

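b. For diagram #1,

\[ \bar{x}_1 = \frac{\sum x_1}{n} = \frac{7 + 8 + 9 + 9 + 10 + 11}{6} = \frac{54}{6} = 9 \]

\[ \bar{x}_2 = \frac{\sum x_2}{n} = \frac{12 + 13 + 14 + 14 + 15 + 16}{6} = \frac{84}{6} = 14 \]

For diagram #2,

\[ \bar{x}_1 = \frac{\sum x_1}{n} = \frac{5 + 5 + 7 + 11 + 13 + 13}{6} = \frac{54}{6} = 9 \]

\[ \bar{x}_2 = \frac{\sum x_2}{n} = \frac{10 + 10 + 12 + 16 + 18 + 18}{6} = \frac{84}{6} = 14 \]

c. For diagram #1,

\[ \text{SST} = \sum_{i=1}^{2} n_i(\bar{x}_i - \bar{x})^2 = 6(9 - 11.5)^2 + 6(14 - 11.5)^2 = 75 \]

where

\[ \bar{x} = \frac{\sum x}{n} = \frac{54 + 84}{12} = 11.5 \]

For diagram #2,

\[ \text{SST} = \sum_{i=1}^{2} n_i(\bar{x}_i - \bar{x})^2 = 6(9 - 11.5)^2 + 6(14 - 11.5)^2 = 75 \]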




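d. For diagram #1,

\[ s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \frac{496 - \frac{54^2}{6}}{6 - 1} = 2 \]

\[ s_2^2 = \frac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = \frac{1186 - \frac{84^2}{6}}{6 - 1} = 2 \]

\[ \text{SSE} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 = (6 - 1)2 + (6 - 1)2 = 20 \]

For diagram #2,

\[ s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \frac{558 - \frac{54^2}{6}}{6 - 1} = 14.4 \]

\[ s_2^2 = \frac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = \frac{1248 - \frac{84^2}{6}}{6 - 1} = 14.4 \]

\[ \text{SSE} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 = (6 - 1)14.4 + (6 - 1)14.4 = 144 \]

e.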

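For diagram #1, SS(Total) = SST + SSE = 75 + 20 = 95. SST is

\[ \frac{\text{SST}}{\text{SS(Total)}} \times 100\% = \frac{75}{95} \times 100\% = 78.95\% \text{ of SS(Total)} \]

For diagram #2, SS(Total) = SST + SSE = 75 + 144 = 219. SST is

\[ \frac{\text{SST}}{\text{SS(Total)}} \times 100\% = \frac{75}{219} \times 100\% = 34.25\% \text{ of SS(Total)} \]

f. For diagram #1,

\[ \text{MST} = \frac{\text{SST}}{k - 1} = \frac{75}{2 - 1} = 75 \qquad \text{MSE} = \frac{\text{SSE}}{n - k} = \frac{20}{12 - 2} = 2 \qquad F = \frac{\text{MST}}{\text{MSE}} = \frac{75}{2} = 37.5 \]

For diagram #2,

\[ \text{MST} = \frac{\text{SST}}{k - 1} = \frac{75}{2 - 1} = 75 \qquad \text{MSE} = \frac{\text{SSE}}{n - k} = \frac{144}{12 - 2} = 14.4 \qquad F = \frac{\text{MST}}{\text{MSE}} = \frac{75}{14.4} = 5.21 \]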


g.

The rejection region for both diagrams requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 2 − 1 = 1 and ν2 = n − k = 12 − 2 = 10. From Table IX, Appendix B, F.05 = 4.96. The rejection region is F > 4.96.

For diagram #1, the observed value of the test statistic falls in the rejection region (F = 37.5 > 4.96). Thus, H0 is rejected. There is sufficient evidence to indicate the samples were drawn from populations with different means at α = .05.

For diagram #2, the observed value of the test statistic falls in the rejection region (F = 5.21 > 4.96). Thus, H0 is rejected. There is sufficient evidence to indicate the samples were drawn from populations with different means at α = .05.
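The quantities in parts c through g can be checked directly from the raw observations. The following is a minimal sketch of the one-way ANOVA computation, not a reproduction of any particular software routine; the function name `one_way_anova` is hypothetical. The first sample of diagram #1 is taken to be 7, 8, 9, 9, 10, 11, which is an assumption: it is the set consistent with the sums Σx = 54 and Σx² = 496 used in the solution.

```python
def one_way_anova(*samples):
    """Return (SST, SSE, F) for a completely randomized design."""
    n = sum(len(s) for s in samples)
    k = len(samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-sample (treatment) sum of squares
    sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-sample (error) sum of squares
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    f_stat = (sst / (k - 1)) / (sse / (n - k))
    return sst, sse, f_stat


diagram1 = ([7, 8, 9, 9, 10, 11], [12, 13, 14, 14, 15, 16])
diagram2 = ([5, 5, 7, 11, 13, 13], [10, 10, 12, 16, 18, 18])

sst1, sse1, f1 = one_way_anova(*diagram1)
sst2, sse2, f2 = one_way_anova(*diagram2)
print(sst1, sse1, round(f1, 2))
print(sst2, sse2, round(f2, 2))
```

Both diagrams share the same SST because the sample means are identical; only the within-sample spread, and therefore SSE and F, changes.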

h. We must assume both populations are normally distributed with common variances.

8.18

Refer to Exercise 8.16. The ANOVA table is:

For diagram #1:

Source       df    SS     MS     F
Treatment     1    75     75     37.5
Error        10    20      2
Total        11    95

For diagram #2:

Source       df    SS     MS     F
Treatment     1    75     75     5.21
Error        10   144   14.4
Total        11   219

8.20

a.

df for Error is 41 − 6 = 35.

SSE = SS(Total) − SST = 46.5 − 17.5 = 29.0

\[ \text{MST} = \frac{\text{SST}}{k - 1} = \frac{17.5}{6} = 2.9167 \qquad \text{MSE} = \frac{\text{SSE}}{n - k} = \frac{29.0}{35} = .8286 \qquad F = \frac{\text{MST}}{\text{MSE}} = \frac{2.9167}{.8286} = 3.52 \]

The ANOVA table is:

Source       df    SS      MS       F
Treatment     6    17.5    2.9167   3.52
Error        35    29.0    .8286
Total        41    46.5


b.

The number of treatments is k. We know k − 1 = 6 ⇒ k = 7.

c.

To determine if there is a difference among the population means, we test: H0: μ1 = μ2 = ⋅⋅⋅ = μ7 Ha: At least one of the population means differs from the rest The test statistic is F = 3.52. The rejection region requires α = .10 in the upper tail of the F-distribution with numerator df = k − 1 = 6 and denominator df = n − k = 35. From Table VIII, Appendix B, F.10 ≈ 1.98. The rejection region is F > 1.98. Since the observed value of the test statistic falls in the rejection region (F = 3.52 > 1.98), H0 is rejected. There is sufficient evidence to indicate a difference among the population means at α = .10.

d.

The observed significance level is P(F ≥ 3.52). With numerator df = 6 and denominator df = 35, and Table XI, P(F ≥ 3.52) < .01.

e.

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\text{MSE}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{3.7 - 4.1}{\sqrt{.8286\left(\frac{1}{6} + \frac{1}{6}\right)}} = -.76 \]

The rejection region requires α/2 = .10/2 = .05 in each tail of the t-distribution with df = n − k = 35. From Table VI, Appendix B, t.05 ≈ 1.697. The rejection region is t < −1.697 or t > 1.697.

Since the observed value of the test statistic does not fall in the rejection region (t = −.76 is not less than −1.697), H0 is not rejected. There is insufficient evidence to indicate that μ1 and μ2 differ at α = .10.

f. For confidence coefficient .90, α = .10 and α/2 = .05. From Table VI, Appendix B, with df = 35, t.05 ≈ 1.697. The confidence interval is:

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{.05}\sqrt{\text{MSE}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (3.7 - 4.1) \pm 1.697\sqrt{.8286\left(\frac{1}{6} + \frac{1}{6}\right)} \Rightarrow -.4 \pm .892 \Rightarrow (-1.292, .492) \]

g. The confidence interval is:

\[ \bar{x}_1 \pm t_{.05}\sqrt{\text{MSE}/6} \Rightarrow 3.7 \pm 1.697\sqrt{.8286/6} \Rightarrow 3.7 \pm .631 \Rightarrow (3.069, 4.331) \]
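The arithmetic in parts e through g of Exercise 8.20 can be verified with a few lines of code. This is only a sketch: the critical value 1.697 is the tabled t.05 quoted in the solution, not something computed here, and the variable names are illustrative.

```python
import math

mse = 0.8286           # MSE from the ANOVA table in part a
n1 = n2 = 6            # observations per treatment
xbar1, xbar2 = 3.7, 4.1
t_crit = 1.697         # tabled t.05 value quoted in the solution

# Part e: t statistic for comparing two treatment means
se_diff = math.sqrt(mse * (1 / n1 + 1 / n2))
t_stat = (xbar1 - xbar2) / se_diff

# Part f: 90% confidence interval for the difference in means
half_diff = t_crit * se_diff
ci_diff = (xbar1 - xbar2 - half_diff, xbar1 - xbar2 + half_diff)

# Part g: 90% confidence interval for the first treatment mean
half_single = t_crit * math.sqrt(mse / n1)
ci_single = (xbar1 - half_single, xbar1 + half_single)

print(round(t_stat, 2))
print(tuple(round(v, 3) for v in ci_diff))
print(tuple(round(v, 3) for v in ci_single))
```

The interval for the difference contains 0, which agrees with the failure to reject H0 in part e.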


8.22

a.

The experimental unit in the study is the college tennis coach. The dependent variable is the response to the statement “the Prospective Student-Athlete Form on the web site contributes very little to the recruiting process” on a scale from 1 to 7. There is one factor in the study and it is the NCAA division of the college tennis coach. There are 3 levels of this factor, and thus, there are 3 treatments – Division I, Division II, and Division III.

b.

To determine if the mean responses of tennis coaches from the different divisions differ, we test:

H0: μ1 = μ2 = μ3
Ha: At least 1 μi differs

c. Since the observed p-value of the test (p < .003) is less than α = .05, H0 is rejected. There is sufficient evidence to indicate differences in mean response among coaches of the 3 divisions.

8.24

a. A completely randomized design was used.

b. There are 4 treatments: 3 robots/colony, 6 robots/colony, 9 robots/colony, and 12 robots/colony.

c. To determine if there was a difference in the mean energy expended (per robot) among the 4 colony sizes, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two means differ

d. Since the p-value (<.001) is less than α (.05), H0 is rejected. There is sufficient evidence to indicate a difference in mean energy expended per robot among the 4 colony sizes at α = .05.

8.26

a. To determine if differences exist in the mean rates of return among the three types of fund groups, we test:

H0: μ1 = μ2 = μ3
Ha: At least two means differ

b.

c. The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k – 1 = 3 – 1 = 2 and ν2 = N – k = 90 – 3 = 87. From Table XI, Appendix B, F.01 ≈ 4.98. The rejection region is F > 4.98.

Since the observed value of the test statistic falls in the rejection region (F = 69.65 > 4.98), H0 is rejected. There is sufficient evidence to indicate differences exist in the mean rates of return among the three types of fund groups at α = .01.


8.28

a.

The response variable for this study is the safety rating of nuclear power plants.

b.

There are three treatments in this study. The treatment groups are the scientists, the journalists, and the federal government policymakers.

c.

To determine whether there are differences in the attitudes of scientists, journalists, and government officials regarding the safety of nuclear power plants, we test:

H0: μ1 = μ2 = μ3
Ha: At least two means differ

d. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k = 300 − 3 = 297. From Table IX, Appendix B, F.05 ≈ 3.00. The rejection region is F > 3.00. In order to reject H0, the test statistic F must be greater than 3.00.

\[ F = \frac{\text{MST}}{\text{MSE}} > 3.00 \Rightarrow \text{MST} > 3.00(\text{MSE}) = 3.00(2.355) = 7.065 \]

Thus, MST must be greater than 7.065.

e. For MST = 11.280,

\[ F = \frac{\text{MST}}{\text{MSE}} = \frac{11.28}{2.355} = 4.79 \]

f. With ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k = 300 − 3 = 297, P(F > 4.79) ≈ .01, using Table XI, Appendix B. The approximate p-value is .01.

8.30

a. We will select size as the quantitative variable and color as the qualitative variable. To determine if the mean size of diamonds differs among the 6 colors, we test:

H0: μ1 = μ2 = μ3 = μ4 = μ5 = μ6
Ha: At least two means differ

b.

Using MINITAB, the ANOVA table is:

One-way ANOVA: Carats versus Color

Analysis of Variance for Carats
Source    DF        SS        MS       F       P
Color      5    0.7963    0.1593    2.11   0.064
Error    302   22.7907    0.0755
Total    307   23.5869

Level     N      Mean     StDev
D        16    0.6381    0.3195
E        44    0.6232    0.2677
F        82    0.5929    0.2648
G        65    0.5808    0.2792
H        61    0.6734    0.2643
I        40    0.7310    0.2918

Pooled StDev = 0.2747

[Individual 95% CIs for mean, based on pooled StDev: character plot for levels D through I, axis marked at 0.60, 0.70, 0.80]

The test statistic is F = 2.11 and the p-value is p = 0.064. Since the p-value (0.064) is less than α = .10, H0 is rejected. There is sufficient evidence to indicate the mean size of diamonds differs among the 6 colors at α = .10.

c.

We will check the assumptions of normality and equal variances. Using MINITAB, the stem-and-leaf plots are:

[Stem-and-Leaf Display: Carats, Leaf Unit = 0.010. One display for each color: Color = 1 (N = 16), Color = 2 (N = 44), Color = 3 (N = 82), Color = 4 (N = 65), Color = 5 (N = 61), Color = 6 (N = 40)]

The data for the 6 colors do not look particularly mound-shaped, so the assumption of normality is probably not valid. However, departures from this assumption often do not invalidate the ANOVA results. Using MINITAB, the box plots are:

[Box plots of Carats (vertical axis, 0.2 to 1.1) by Color (D, E, F, G, H, I)]

The spreads of all the colors appear to be about the same, so the assumption of constant variance is probably valid.



8.32

a.

The df for Groups = ν1 = k – 1 = 3 – 1 = 2. The df for Error = ν2 = n – k = 71 – 3 = 68. The completed ANOVA table is:

Source    df    SS           MS        F
Groups     2    128.70       64.35     0.16
Error     68    27,124.52    398.89

b. To determine if the total number of activities undertaken differed among the three groups of entrepreneurs, we test:

H0: μ1 = μ2 = μ3
Ha: At least one mean differs

The test statistic is F = 0.16.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k = 71 − 3 = 68. From Table IX, Appendix B, F.05 ≈ 3.15. The rejection region is F > 3.15.

Since the observed value of the test statistic does not fall in the rejection region (F = 0.16 ≯ 3.15), H0 is not rejected. There is insufficient evidence to indicate that the total number of activities differed among the groups of entrepreneurs at α = .05.

c. The p-value of the test is P(F > 0.16). From Table VIII, Appendix B, with ν1 = 2 and ν2 = 68, P(F > 0.16) > .10.

d.

No. Since our conclusion was that there was no evidence of a difference in the total number of activities among the groups, there would be no evidence to indicate a difference between two specific groups.

e.

This study would be observational. The group that each entrepreneur fell into was observed, not controlled. Since no differences were found, the type of study does not have an impact on the conclusions.
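For part a of Exercise 8.32, filling in the ANOVA table is mechanical once the sums of squares, the number of groups, and the sample size are known. A minimal sketch follows; the function name `complete_anova` and the dictionary layout are illustrative choices, not part of the exercise.

```python
def complete_anova(ss_groups, ss_error, k, n):
    """Fill in df, MS, and F for a completely randomized design."""
    df_groups = k - 1
    df_error = n - k
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return {
        "df_groups": df_groups,
        "df_error": df_error,
        "ms_groups": ms_groups,
        "ms_error": ms_error,
        "F": ms_groups / ms_error,
    }


# Sums of squares from the table in part a: 3 groups, 71 entrepreneurs
table = complete_anova(128.70, 27124.52, k=3, n=71)
for name, value in table.items():
    print(name, round(value, 2))
```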

8.34

The experimentwise error rate is the probability of making a Type I error for at least one of all of the comparisons made. If the experimentwise error rate is α = .05, then each individual comparison is made at a value of α which is less than .05.
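To see why each individual comparison must use a smaller α, consider the simplified case of c independent comparisons, each made at the same individual level. That independence is an assumption made here for illustration only (pairwise comparisons of means are not independent, and this is not how the Tukey procedure sets its error rates). The even split α/c shown below is the Bonferroni-style choice, named here as one conservative option.

```python
def experimentwise_rate(alpha_individual, c):
    """P(at least one Type I error) across c independent comparisons."""
    return 1 - (1 - alpha_individual) ** c


def individual_alpha(alpha_experimentwise, c):
    """Bonferroni-style split of the experimentwise rate over c comparisons."""
    return alpha_experimentwise / c


c = 6  # e.g., all pairwise comparisons among 4 treatment means
naive = experimentwise_rate(0.05, c)      # every comparison run at .05
per_comparison = individual_alpha(0.05, c)
controlled = experimentwise_rate(per_comparison, c)

print(round(naive, 4), round(per_comparison, 4), round(controlled, 4))
```

Running every comparison at .05 pushes the chance of at least one false rejection well above .05, while the smaller per-comparison level keeps it at or below .05.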

8.36

a.

From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, A and D, C and E, C and B, C and D, and E and D. All other pairs of means are not significantly different because they are connected by lines.

b.

From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and B, A and D, C and B, C and D, E and B, E and D, and B and D. All other pairs of means are not significantly different because they are connected by lines.


c. From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, and A and D. All other pairs of means are not significantly different because they are connected by lines.

d. From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, A and D, C and E, C and B, C and D, E and D, and B and D. All other pairs of means are not significantly different because they are connected by lines.

8.38

a. The total number of comparisons conducted is k(k – 1)/2 = 4(4 – 1)/2 = 6.

b. The mean energy expended by robots in the 12 robot colony is significantly smaller than the mean energy expended by robots in any of the other size colonies. There is no difference in the mean energy expended by robots in the 3 robot colony, the 6 robot colony, and the 9 robot colony.

8.40

a. There will be

\[ c = \frac{k(k - 1)}{2} = \frac{3(3 - 1)}{2} = 3 \]

pairwise comparisons.

b. Comparing the mean safety scores for government officials and journalists, the difference in mean safety scores is 4.2 − 3.7 = .5. The critical value for the Tukey comparison is .23. Since .5 > .23, we conclude that the mean safety score for government officials is higher than the mean safety score for journalists.

Comparing the mean safety scores for government officials and scientists, the difference in mean safety scores is 4.2 − 4.1 = .1. Since .1 < .23, we conclude that there is no difference in mean safety scores between government officials and scientists.

Comparing the mean safety scores for scientists and journalists, the difference in mean safety scores is 4.1 − 3.7 = .4. The critical value for the Tukey comparison is .23. Since .4 > .23, we conclude that the mean safety score for scientists is higher than the mean safety score for journalists.

A display of these conclusions is:

Journalists    Scientists    Gov. Officials
    3.7            4.1             4.2
               ____________________________

8.42

a. The probability of declaring at least one pair of means different when they are not is .01.

b. There are a total of

\[ \frac{k(k - 1)}{2} = \frac{3(3 - 1)}{2} = 3 \]

pair-wise comparisons. They are:

‘Under $30 thousand’ to ‘Between $30 and $60 thousand’
‘Under $30 thousand’ to ‘Over $60 thousand’
‘Between $30 and $60 thousand’ to ‘Over $60 thousand’
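The pairwise comparisons in Exercises 8.40 and 8.42 can be enumerated programmatically. The sketch below lists the k(k − 1)/2 pairs and then applies the decision rule from Exercise 8.40, using the sample means and the critical difference of .23 quoted in that solution. The critical difference is taken as given rather than derived, and the variable names are illustrative.

```python
from itertools import combinations

# Sample means and critical difference quoted in the solution to 8.40
means = {"Journalists": 3.7, "Scientists": 4.1, "Gov. Officials": 4.2}
critical_difference = 0.23

pairs = list(combinations(means, 2))
k = len(means)

# A pair is declared different when the gap between means exceeds the critical value
significant = [
    (a, b) for a, b in pairs
    if abs(means[a] - means[b]) > critical_difference
]

print(len(pairs))
for a, b in significant:
    print(f"{a} vs {b}: difference = {abs(means[a] - means[b]):.1f}")
```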

c. Means for groups in homogeneous subsets are displayed in the table:

                               Subsets
Income Group         N        1       2
Under $30,000       379     4.60
$30,000-$60,000     392             5.08
Over $60,000        267             5.15

d. Two of the comparisons in part b will yield confidence intervals that do not contain 0. They are:

‘Under $30 thousand’ to ‘Between $30 and $60 thousand’
‘Under $30 thousand’ to ‘Over $60 thousand’

8.44

From Exercise 8.30, we found that there were differences in the mean carats among the 6 levels of color. From Exercise 8.30, the mean carats for the 6 colors are:

Color      G         F         E         D         H         I
Mean    0.5808    0.5929    0.6232    0.6381    0.6734    0.7310

Using MINITAB, the Tukey confidence intervals are:

Tukey's pairwise comparisons

Family error rate = 0.100
Individual error rate = 0.0101
Critical value = 3.66

Intervals for (column level mean) - (row level mean)

             D           E           F           G           H

E      -0.1926
        0.2225

F      -0.1491     -0.1026
        0.2395      0.1631

G      -0.1411     -0.0964     -0.1059
        0.2558      0.1812      0.1302

H      -0.2350     -0.1909     -0.2007     -0.2194
        0.1644      0.0904      0.0397      0.0341

I      -0.3032     -0.2631     -0.2752     -0.2931     -0.2022
        0.1174      0.0475     -0.0010     -0.0074      0.0871

There are only 2 intervals that do not contain 0. The confidence interval for the difference in mean carats between colors G and I is (−0.2931, −0.0074). The confidence interval for the difference in mean carats between colors F and I is (−0.2752, −0.0010). Since 0 is not contained in these confidence intervals, there is sufficient evidence of a difference in the mean number of carats between colors G and I and between colors F and I. No other differences exist.

8.46

a.

There are 3 blocks used since Block df = b − 1 = 2 and 5 treatments since the treatment df = k − 1 = 4.

b.

There were 15 observations since the Total df = n − 1 = 14.

c.

H0: μ1 = μ2 = μ3 = μ4 = μ5 Ha: At least two treatment means differ

d. The test statistic is

\[ F = \frac{\text{MST}}{\text{MSE}} = 9.109 \]

e. The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = k − 1 = 5 − 1 = 4 and ν2 = n − k − b + 1 = 15 − 5 − 3 + 1 = 8. From Table XI, Appendix B, F.01 = 7.01. The rejection region is F > 7.01.

f. Since the observed value of the test statistic falls in the rejection region (F = 9.109 > 7.01), H0 is rejected. There is sufficient evidence to indicate that at least two treatment means differ at α = .01.

g. The assumptions necessary to assure the validity of the test are as follows:

1. The probability distributions of observations corresponding to all the block-treatment combinations are normal.
2. The variances of all the probability distributions are equal.

8.48

a. The ANOVA Table is as follows:

Source       df    SS        MS        F
Treatment     2    12.032    6.016     50.958
Block         3    71.749    23.916    202.586
Error         6    .708      .118
Total        11    84.489

b. To determine if the treatment means differ, we test:

H0: μA = μB = μC
Ha: At least two treatment means differ

The test statistic is

\[ F = \frac{\text{MST}}{\text{MSE}} = 50.958 \]

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k − b + 1 = 12 − 3 − 4 + 1 = 6. From Table IX, Appendix B, F.05 = 5.14. The rejection region is F > 5.14.

Since the observed value of the test statistic falls in the rejection region (F = 50.958 > 5.14), H0 is rejected. There is sufficient evidence to indicate that the treatment means differ at α = .05.

c. To see if the blocking was effective, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two block means differ

The test statistic is

\[ F = \frac{\text{MSB}}{\text{MSE}} = 202.586 \]

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = b − 1 = 4 − 1 = 3 and ν2 = n − k − b + 1 = 12 − 3 − 4 + 1 = 6. From Table IX, Appendix B, F.05 = 4.76. The rejection region is F > 4.76.

Since the observed value of the test statistic falls in the rejection region (F = 202.586 > 4.76), H0 is rejected. There is sufficient evidence to indicate that blocking was effective in reducing the experimental error at α = .05.

d. From the printouts, we are given the differences in the sample means. The differences between Treatment B and both Treatments A and C are positive (1.125 and 2.450), so Treatment B has the largest sample mean. The difference between Treatments A and C is positive (1.325), so Treatment A has a larger sample mean than Treatment C. So Treatment B has the largest sample mean, Treatment A has the next largest sample mean, and Treatment C has the smallest sample mean. From the printout, all the means are significantly different from each other.

e. The assumptions necessary to assure the validity of the inferences above are:

1. The probability distributions of observations corresponding to all the block-treatment combinations are normal.
2. The variances of all the probability distributions are equal.
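The randomized block ANOVA table in Exercise 8.48 can be rebuilt from its sums of squares. The sketch below uses the rounded SS values printed in the table, so the F ratios agree with the printed 50.958 and 202.586 only up to rounding. The function name `rbd_anova` is hypothetical.

```python
def rbd_anova(ss_treat, ss_block, ss_total, k, b):
    """Return (SSE, error df, F for treatments, F for blocks)."""
    n = k * b
    ss_error = ss_total - ss_treat - ss_block
    df_error = n - k - b + 1
    mst = ss_treat / (k - 1)
    msb = ss_block / (b - 1)
    mse = ss_error / df_error
    return ss_error, df_error, mst / mse, msb / mse


# k = 3 treatments, b = 4 blocks, sums of squares from the table in part a
ss_error, df_error, f_treat, f_block = rbd_anova(12.032, 71.749, 84.489, k=3, b=4)
print(round(ss_error, 3), df_error, round(f_treat, 2), round(f_block, 2))
```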




8.50

a.

This is a randomized block design. The blocks are the 12 plots of land. The treatments are the three methods used on the shrubs: fire, clipping, and control. The response variable is the mean number of flowers produced. The experimental units are the 36 shrubs.

b.

Plot

c.

To determine if there is a difference in the mean number of flowers produced among the three treatments, we test:

H0: μ1 = μ2 = μ3
Ha: The mean number of flowers produced differs for at least two of the methods.

The test statistic is F = 5.42 and p = .009. We can reject the null hypothesis for any α > .009. At least two of the methods differ with respect to mean number of flowers produced by pawpaws.

d. The means of Control and Clipping do not differ significantly. The means of Clipping and Burning do not differ significantly. The mean of treatment Burning exceeds that of the Control.

8.52

From the printout, the p-value for treatments or Decoy is p = .589. Since the p-value is not small, we cannot reject H0. There is insufficient evidence to indicate a difference in mean percentage of a goose flock to approach to within 46 meters of the pit blind among the three decoy types. This conclusion is valid for any reasonable value of α.


8.54

Using SAS, the ANOVA Table is:

The ANOVA Procedure

Dependent Variable: temp

Source              DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               11       18.53700000     1.68518182       0.52    0.8634
Error               18       58.03800000     3.22433333
Corrected Total     29       76.57500000

R-Square    Coeff Var    Root MSE    temp Mean
0.242076     1.885189    1.795643     95.25000

Source      DF       Anova SS    Mean Square    F Value    Pr > F
STUDENT      9    18.41500000     2.04611111       0.63    0.7537
PLANT        2     0.12200000     0.06100000       0.02    0.9813

To determine if there are differences among the mean temperatures among the three treatments, we test:

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

The test statistic is F = 0.02. The associated p-value is p = .9813. Since the p-value is very large, there is no evidence of a difference in mean temperature among the three treatments. Since there is no difference, we do not need to compare the means. It appears that the presence of plants or pictures of plants does not reduce stress.

8.56

a.


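\[ \text{CM} = \frac{(\sum y)^2}{n} = \frac{2.95^2}{20} = .435125 \]

\[ \text{SS(Total)} = \sum y^2 - \text{CM} = .4705 - .435125 = .035375 \]

\[ \text{SST} = \text{SS(DRUG)} = \frac{T_1^2}{b} + \frac{T_2^2}{b} - \text{CM} = \frac{1.62^2}{10} + \frac{1.33^2}{10} - .435125 = .004205 \]

\[ \text{MST} = \frac{\text{SST}}{k - 1} = \frac{.004205}{2 - 1} = .004205, \quad df = k - 1 = 1 \]

\[ \text{SSB} = \text{SS(DOG)} = \frac{B_1^2}{k} + \frac{B_2^2}{k} + \cdots + \frac{B_{10}^2}{k} - \text{CM} \]

\[ = \frac{.32^2 + .38^2 + .27^2 + .36^2 + .42^2 + .31^2 + .19^2 + .19^2 + .3^2 + .21^2}{2} - .435125 = .028925 \]

\[ \text{MSB} = \frac{\text{SSB}}{b - 1} = \frac{.028925}{10 - 1} = .003214, \quad df = b - 1 = 9 \]

SSE = SS(Total) − SST − SSB = .035375 − .004205 − .028925 = .002245

\[ \text{MSE} = \frac{\text{SSE}}{n - k - b + 1} = \frac{.002245}{20 - 2 - 10 + 1} = .0002494 \]

\[ F = \frac{\text{MST}}{\text{MSE}} = \frac{.004205}{.0002494} = 16.86 \qquad F = \frac{\text{MSB}}{\text{MSE}} = \frac{.003214}{.0002494} = 12.89 \]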

To determine if there is a difference in mean pressure readings for the two treatments, we test:

H0: μA = μB
Ha: μA ≠ μB

The test statistic is

\[ F = \frac{\text{MST}}{\text{MSE}} = 16.86 \]

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − 1 = 2 − 1 = 1 and ν2 = n − k − b + 1 = 20 − 2 − 10 + 1 = 9. From Table IX, Appendix B, F.05 = 5.12. The rejection region is F > 5.12.

Since the observed value of the test statistic falls in the rejection region (F = 16.86 > 5.12), H0 is rejected. There is sufficient evidence to indicate a difference in mean pressure readings for the two drugs at α = .05.

b.

Since there is expected to be much variation between the dogs, we use the dogs as blocks to eliminate this identified source of variation.

c.

Dog    Drug A    Drug B    (A − B) Difference
 1      .17       .15          .02
 2      .20       .18          .02
 3      .14       .13          .01
 4      .18       .18          .00
 5      .23       .19          .04
 6      .19       .12          .07
 7      .12       .07          .05
 8      .10       .09          .01
 9      .16       .14          .02
10      .13       .08          .05


Some preliminary calculations are:

$$\bar{d} = \frac{\sum d_i}{n_d} = \frac{.29}{10} = .029$$

$$s_d^2 = \frac{\sum d_i^2 - \dfrac{(\sum d_i)^2}{n_d}}{n_d - 1} = \frac{.0129 - \dfrac{(.29)^2}{10}}{10 - 1} = \frac{.00449}{9} = .0004989$$

$$s_d = \sqrt{s_d^2} = \sqrt{.0004989} = .02234$$

To determine if there is a difference in mean pressure readings for the two treatments, we test:

H0: μA = μB
Ha: μA ≠ μB

The test statistic is

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \frac{.029 - 0}{.02234/\sqrt{10}} = 4.105$$

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − 1 = 10 − 1 = 9. From Table VI, Appendix B, t.025 = 2.262. The rejection region is t < −2.262 or t > 2.262.

Since the observed value of the test statistic falls in the rejection region (t = 4.105 > 2.262), H0 is rejected. There is sufficient evidence to indicate a difference in the treatment means at α = .05.

d.

In part a, F = 16.86; and in part c, t = 4.105. Note that t² = 4.105² = 16.85 ≈ F. In part a, F.05 = 5.12; and in part c, t.025 = 2.262. Note that (t.025)² = 2.262² = 5.12 = F.05.
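The agreement between t² and F noted in part d can be checked directly from the dog data in part c. The following is a minimal sketch rather than the manual's own method; the variable names are illustrative. It recomputes the randomized block F statistic and the paired-difference t statistic without intermediate rounding, which is why the two agree exactly here while the hand calculation gives 16.85 versus 16.86.

```python
from math import sqrt

# Pressure readings for the 10 dogs, taken from the table in part c.
drug_a = [.17, .20, .14, .18, .23, .19, .12, .10, .16, .13]
drug_b = [.15, .18, .13, .18, .19, .12, .07, .09, .14, .08]

b = len(drug_a)   # number of blocks (dogs)
k = 2             # number of treatments (drugs)
n = k * b         # total number of observations

# Randomized block ANOVA, following the formulas in part a.
all_y = drug_a + drug_b
cm = sum(all_y) ** 2 / n
ss_total = sum(y * y for y in all_y) - cm
sst = (sum(drug_a) ** 2 + sum(drug_b) ** 2) / b - cm
ssb = sum((ya + yb) ** 2 for ya, yb in zip(drug_a, drug_b)) / k - cm
sse = ss_total - sst - ssb
f_stat = (sst / (k - 1)) / (sse / (n - k - b + 1))

# Paired-difference t statistic, following the formulas in part c.
d = [ya - yb for ya, yb in zip(drug_a, drug_b)]
d_bar = sum(d) / b
s_d = sqrt((sum(x * x for x in d) - sum(d) ** 2 / b) / (b - 1))
t_stat = d_bar / (s_d / sqrt(b))

print(f"F = {f_stat:.2f}, t = {t_stat:.3f}, t^2 = {t_stat ** 2:.2f}")
```

With only two treatments, the block design F test and the two-tailed paired t test are the same test, so either calculation leads to the same decision.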

e.

p-value = P(F ≥ 16.86) with ν1 = 1 and ν2 = 9. Using Table XI, Appendix B, F.01 = 10.56, so P(F ≥ 10.56) = .01. Since 16.86 > 10.56, the p-value is less than .01. The probability of a test statistic this extreme if the treatment means are the same is less than .01. This is very significant. We would reject H0 in favor of Ha for any α larger than the p-value.

8.58

a.

There are two factors.

b.

No, we cannot tell whether the factors are qualitative or quantitative.

c.

Yes. There are four levels of factor A and three levels of factor B.

d.

A treatment would consist of a combination of one level of factor A and one level of factor B. There are a total of 4 × 3 = 12 treatments.


e.

One problem with only one replicate is that there are no degrees of freedom for error. This is overcome by having at least two replicates.

8.60

a.

Factor A has 3 + 1 = 4 levels and factor B has 1 + 1 = 2 levels.

b.

There are a total of 23 + 1 = 24 observations and 4 × 2 = 8 treatments. Therefore, there were 24/8 = 3 observations for each treatment.

c.

AB df = (a − 1)(b − 1) = (4 − 1)(2 − 1) = 3
Error df = n − ab = 24 − 4(2) = 16

$$\text{MSA} = \frac{\text{SSA}}{a-1} \Rightarrow \text{SSA} = (a-1)\text{MSA} = (4-1)(.75) = 2.25$$

$$\text{MSB} = \frac{\text{SSB}}{b-1} = \frac{.95}{2-1} = .95$$

$$\text{MSAB} = \frac{\text{SSAB}}{(a-1)(b-1)} \Rightarrow \text{SSAB} = (a-1)(b-1)\text{MSAB} = (4-1)(2-1)(.30) = .9$$

$$\text{SSE} = \text{SS(Total)} - \text{SSA} - \text{SSB} - \text{SSAB} = 6.5 - 2.25 - .95 - .9 = 2.4$$

$$\text{MSE} = \frac{\text{SSE}}{n-ab} = \frac{2.4}{24-4(2)} = .15$$

SST = SSA + SSB + SSAB = 2.25 + .95 + .90 = 4.1
Treatment df = ab − 1 = 4(2) − 1 = 7

$$\text{MST} = \frac{\text{SST}}{ab-1} = \frac{4.1}{7} = .5857 \qquad F_T = \frac{\text{MST}}{\text{MSE}} = \frac{.5857}{.15} = 3.90$$

$$F_A = \frac{\text{MSA}}{\text{MSE}} = \frac{.75}{.15} = 5.00 \qquad F_B = \frac{\text{MSB}}{\text{MSE}} = \frac{.95}{.15} = 6.33 \qquad F_{AB} = \frac{\text{MSAB}}{\text{MSE}} = \frac{.30}{.15} = 2.00$$

The ANOVA table is:

Source        df     SS      MS      F
Treatments     7    4.10    .59    3.90
  A            3    2.25    .75    5.00
  B            1     .95    .95    6.33
  AB           3     .90    .30    2.00
Error         16    2.40    .15
Total         23    6.50
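The table for Exercise 8.60 can be rebuilt from the few values the solution starts from. This sketch assumes the given entries are the degrees of freedom for A, B, and Total, together with MSA = .75, SSB = .95, MSAB = .30, and SS(Total) = 6.5, as the calculations above imply; the variable names are illustrative.

```python
# Entries assumed to be given in the partially completed table.
df_a, df_b, df_total = 3, 1, 23
ms_a, ss_b, ms_ab, ss_total = .75, .95, .30, 6.5

# Design dimensions implied by the degrees of freedom.
a, b = df_a + 1, df_b + 1
n = df_total + 1
df_ab = df_a * df_b
df_error = n - a * b

# Fill in the missing sums of squares and mean squares.
ss_a = df_a * ms_a
ss_ab = df_ab * ms_ab
sse = ss_total - ss_a - ss_b - ss_ab
mse = sse / df_error

# Treatments pool the two main effects and the interaction.
sst = ss_a + ss_b + ss_ab
df_trt = a * b - 1
mst = sst / df_trt

rows = [
    ("Treatments", df_trt, sst, mst),
    ("A", df_a, ss_a, ms_a),
    ("B", df_b, ss_b, ss_b / df_b),
    ("AB", df_ab, ss_ab, ms_ab),
]
print(f"{'Source':<11}{'df':>4}{'SS':>8}{'MS':>8}{'F':>8}")
for name, df, ss, ms in rows:
    print(f"{name:<11}{df:>4}{ss:>8.2f}{ms:>8.3f}{ms / mse:>8.2f}")
print(f"{'Error':<11}{df_error:>4}{sse:>8.2f}{mse:>8.3f}")
print(f"{'Total':<11}{df_total:>4}{ss_total:>8.2f}")
```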



d.

To determine whether the treatment means differ, we test:

H0: μ1 = μ2 = ⋅⋅⋅ = μ8
Ha: At least two treatment means differ

The test statistic is F = MST/MSE = 3.90.

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = ab − 1 = 4(2) − 1 = 7 and ν2 = n − ab = 24 − 4(2) = 16. From Table VIII, Appendix B, F.10 = 2.13. The rejection region is F > 2.13.

Since the observed value of the test statistic falls in the rejection region (F = 3.90 > 2.13), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at α = .10.

e.

To determine if the factors interact, we test:

H0: Factors A and B do not interact to affect the response mean
Ha: Factors A and B do interact to affect the response mean

The test statistic is F = 2.00.

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (4 − 1)(2 − 1) = 3 and ν2 = n − ab = 24 − 4(2) = 16. From Table VIII, Appendix B, F.10 = 2.46. The rejection region is F > 2.46.

Since the observed value of the test statistic does not fall in the rejection region (F = 2.00 ≯ 2.46), H0 is not rejected. There is insufficient evidence to indicate factors A and B interact at α = .10.

To determine if the four means of factor A differ, we test:

H0: There is no difference in the four means of factor A
Ha: At least two of the factor A means differ

The test statistic is F = 5.00.

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = a − 1 = 4 − 1 = 3 and ν2 = n − ab = 24 − 4(2) = 16. From Table VIII, Appendix B, F.10 = 2.46. The rejection region is F > 2.46.

Since the observed value of the test statistic falls in the rejection region (F = 5.00 > 2.46), H0 is rejected. There is sufficient evidence to indicate at least two of the four means of factor A differ at α = .10.

To determine if the two means of factor B differ, we test:

H0: There is no difference in the two means of factor B
Ha: At least two of the factor B means differ

The test statistic is F = 6.33.

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = b − 1 = 2 − 1 = 1 and ν2 = n − ab = 24 − 4(2) = 16. From Table VIII, Appendix B, F.10 = 3.05. The rejection region is F > 3.05.

Since the observed value of the test statistic falls in the rejection region (F = 6.33 > 3.05), H0 is rejected. There is sufficient evidence to indicate the two means of factor B differ at α = .10.

All of the tests performed are warranted because interaction was not significant.

8.62

a.

The treatments are the combinations of the levels of factor A and the levels of factor B. There are 2 × 2 = 4 treatments. The treatment means are:

$$\bar{x}_{11} = \frac{\sum x_{11}}{2} = \frac{29.6 + 35.2}{2} = 32.4 \qquad \bar{x}_{12} = \frac{\sum x_{12}}{2} = \frac{47.3 + 42.1}{2} = 44.7$$

$$\bar{x}_{21} = \frac{\sum x_{21}}{2} = \frac{12.9 + 17.6}{2} = 15.25 \qquad \bar{x}_{22} = \frac{\sum x_{22}}{2} = \frac{28.4 + 22.7}{2} = 25.55$$

The factors do not appear to interact; the lines are almost parallel. The treatment means do appear to differ because the sample means range from 15.25 to 44.7.

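b.

$$\text{CM} = \frac{(\sum x)^2}{n} = \frac{235.8^2}{8} = 6950.205$$

$$\text{SS(Total)} = \sum x^2 - \text{CM} = 7922.92 - 6950.205 = 972.715$$

$$\text{SSA} = \frac{\sum A_i^2}{br} - \text{CM} = \frac{154.2^2}{2(2)} + \frac{81.6^2}{2(2)} - 6950.205 = 7609.05 - 6950.205 = 658.845$$

$$\text{SSB} = \frac{\sum B_j^2}{ar} - \text{CM} = \frac{95.3^2}{2(2)} + \frac{140.5^2}{2(2)} - 6950.205 = 7205.585 - 6950.205 = 255.38$$

$$\text{SSAB} = \frac{\sum\sum AB_{ij}^2}{r} - \text{SSA} - \text{SSB} - \text{CM} = \frac{64.8^2}{2} + \frac{89.4^2}{2} + \frac{30.5^2}{2} + \frac{51.1^2}{2} - 658.845 - 255.38 - 6950.205 = 7866.43 - 7864.43 = 2$$

$$\text{SSE} = \text{SS(Total)} - \text{SSA} - \text{SSB} - \text{SSAB} = 972.715 - 658.845 - 255.38 - 2 = 56.49$$

A:     df = a − 1 = 2 − 1 = 1
B:     df = b − 1 = 2 − 1 = 1
AB:    df = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1
Error: df = n − ab = 8 − 2(2) = 4
Total: df = n − 1 = 8 − 1 = 7

$$\text{MSA} = \frac{\text{SSA}}{a-1} = \frac{658.845}{1} = 658.845 \qquad \text{MSB} = \frac{\text{SSB}}{b-1} = \frac{255.38}{1} = 255.38$$

$$\text{MSAB} = \frac{\text{SSAB}}{(a-1)(b-1)} = \frac{2}{1} = 2 \qquad \text{MSE} = \frac{\text{SSE}}{n-ab} = \frac{56.49}{4} = 14.1225$$

$$F_A = \frac{\text{MSA}}{\text{MSE}} = \frac{658.845}{14.1225} = 46.65 \qquad F_B = \frac{\text{MSB}}{\text{MSE}} = \frac{255.38}{14.1225} = 18.08 \qquad F_{AB} = \frac{\text{MSAB}}{\text{MSE}} = \frac{2}{14.1225} = .14$$

The ANOVA table is:

Source    df       SS          MS          F
A          1    658.845     658.845     46.65
B          1    255.380     255.380     18.08
AB         1      2.000       2.000       .14
Error      4     56.490      14.1225
Total      7    972.715

c.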

SST = SSA + SSB + SSAB = 658.845 + 255.380 + 2.000 = 916.225
df = ab − 1 = 2(2) − 1 = 3

$$\text{MST} = \frac{\text{SST}}{ab-1} = \frac{916.225}{3} = 305.408 \qquad F_T = \frac{\text{MST}}{\text{MSE}} = \frac{305.408}{14.1225} = 21.63$$

To determine whether the treatment means differ, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two of the treatment means differ

The test statistic is F = 21.63.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 2(2) − 1 = 3 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 6.59. The rejection region is F > 6.59.

Since the observed value of the test statistic falls in the rejection region (F = 21.63 > 6.59), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at α = .05. This agrees with the conclusion in part a.

d.

Since there are differences among the treatment means, we test for the presence of interaction:

H0: Factors A and B do not interact to affect the response means
Ha: Factors A and B do interact to affect the response means

The test statistic is F = .14.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 7.71. The rejection region is F > 7.71.

Since the observed value of the test statistic does not fall in the rejection region (F = .14 ≯ 7.71), H0 is not rejected. There is insufficient evidence to indicate the factors interact at α = .05.

e.

Since the interaction was not significant, we test for main effects.

To determine whether the two means of factor A differ, we test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is F = 46.65.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = a − 1 = 2 − 1 = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 7.71. The rejection region is F > 7.71.

Since the observed value of the test statistic falls in the rejection region (F = 46.65 > 7.71), H0 is rejected. There is sufficient evidence to indicate the two means of factor A differ at α = .05.

To determine whether the two means of factor B differ, we test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is F = 18.08.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = b − 1 = 2 − 1 = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 7.71. The rejection region is F > 7.71.

Since the observed value of the test statistic falls in the rejection region (F = 18.08 > 7.71), H0 is rejected. There is sufficient evidence to indicate the two means of factor B differ at α = .05.

f.

The results of all the tests agree with those in part a.
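The sums of squares and F statistics in parts b through e can be reproduced from the eight raw observations. This is a sketch under one assumption: the assignment of observations to cells is inferred from the treatment means in part a and the factor totals in part g, and the helper `level_total` is a hypothetical name.

```python
# Observations keyed by (level of A, level of B); two replicates per cell.
cells = {
    (1, 1): [29.6, 35.2],
    (1, 2): [47.3, 42.1],
    (2, 1): [12.9, 17.6],
    (2, 2): [28.4, 22.7],
}
a, b, r = 2, 2, 2
n = a * b * r

all_x = [x for obs in cells.values() for x in obs]
cm = sum(all_x) ** 2 / n
ss_total = sum(x * x for x in all_x) - cm


def level_total(factor_index, level):
    """Total of all observations at one level of factor A (0) or B (1)."""
    return sum(sum(obs) for key, obs in cells.items()
               if key[factor_index] == level)


ss_a = sum(level_total(0, i) ** 2 for i in (1, 2)) / (b * r) - cm
ss_b = sum(level_total(1, j) ** 2 for j in (1, 2)) / (a * r) - cm
ss_ab = sum(sum(obs) ** 2 for obs in cells.values()) / r - ss_a - ss_b - cm
sse = ss_total - ss_a - ss_b - ss_ab
mse = sse / (n - a * b)

f_a = (ss_a / (a - 1)) / mse
f_b = (ss_b / (b - 1)) / mse
f_ab = (ss_ab / ((a - 1) * (b - 1))) / mse

print(f"SSA = {ss_a:.3f}, SSB = {ss_b:.3f}, SSAB = {ss_ab:.3f}, SSE = {sse:.3f}")
print(f"F_A = {f_a:.2f}, F_B = {f_b:.2f}, F_AB = {f_ab:.2f}")
```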

g.

Since no interaction is present, but the means of both factors A and B differ, we compare the two means of factor A and compare the two means of factor B. Since there are only two means to compare for each factor, the higher population mean corresponds to the higher sample mean.

Factor A:

$$\bar{x}_1 = \frac{\sum x_1}{br} = \frac{29.6 + 35.2 + 47.3 + 42.1}{2(2)} = 38.55 \qquad \bar{x}_2 = \frac{\sum x_2}{br} = \frac{12.9 + 17.6 + 28.4 + 22.7}{2(2)} = 20.4$$

The mean for level 1 of factor A is significantly higher than the mean for level 2.

Factor B:

$$\bar{x}_1 = \frac{\sum x_1}{ar} = \frac{29.6 + 35.2 + 12.9 + 17.6}{2(2)} = 23.825 \qquad \bar{x}_2 = \frac{\sum x_2}{ar} = \frac{47.3 + 42.1 + 28.4 + 22.7}{2(2)} = 35.125$$

The mean for level 2 of factor B is significantly higher than the mean for level 1.

8.64

a.

There are a total of 2 × 4 = 8 treatments.

b.

The interaction between temperature and type was significant. This means that the effect of type of yeast on the mean autolysis yield depends on the level of temperature.

c.

To determine if the main effect of type of yeast is significant, we test: H0: μBa = μBr Ha: μBa ≠ μBr To determine if the main effect of temperature is significant, we test: H0: μ1 = μ2 = μ3 = μ4 Ha: At least one mean differs

d.

The tests for the main effects should not be run until after the test for interaction is conducted. If interaction is significant, then these interaction effects could cover up the main effects. Thus, the main effect tests would not be informative. If the test for interaction is not significant, then the main effect tests could be run.




e.

Baker’s yeast: The mean yield for temperature 54° is significantly lower than the mean yields for the other 3 temperatures. There is no difference in the mean yields for the temperatures 45°, 48°, and 51°.

Brewer’s yeast: The mean yield for temperature 54° is significantly lower than the mean yields for the other 3 temperatures. There is no difference in the mean yields for the temperatures 45°, 48°, and 51°.

8.66

a.

This is an observational experiment. The researcher recorded the number of users per hour for each of 24 hours per day, 7 days per week, for 7 weeks. The researcher did not manipulate the weeks or days or hours.

b.

The two factors are (1) the day of the week with 7 levels and (2) the hour of the day with 24 levels.

c.

In a factorial experiment, a is the number of levels of factor A and b is the number of levels of factor B. If we let factor A be the day of the week and factor B be the hour of the day, then a = 7 and b = 24.

d.

To determine if the a × b = 7 × 24 = 168 treatment means differ, we test:

H0: μ1 = μ2 = μ3 = . . . = μ168
Ha: At least two means differ

The test statistic is

$$F = \frac{\text{MST}}{\text{MSE}} = \frac{1143.99}{45.65} = 25.06$$

The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = p − 1 = 168 − 1 = 167 and ν2 = n − p = 1172 − 168 = 1004. From Table XI, Appendix B, F.01 ≈ 1.00. The rejection region is F > 1.00.

Since the observed value of the test statistic falls in the rejection region (F = 25.06 > 1.00), H0 is rejected. There is sufficient evidence to indicate a difference in mean usage among the day-hour combinations at α = .01.

e.

The hypotheses used to test if an interaction effect exists are:

H0: Days and hours do not interact to affect the mean usage
Ha: Days and hours do interact to affect the mean usage

f.

The test statistic is

$$F = \frac{\text{MSAB}}{\text{MSE}} = \frac{55.69}{45.65} = 1.22$$

The p-value is p = .0527. Since the p-value is not less than α = .01, H0 is not rejected. There is insufficient evidence to indicate days and hours interact to affect usage at α = .01.

g.

To determine if the mean usage differs among the days of the week, we test:

H0: μ1 = μ2 = μ3 = μ4 = μ5 = μ6 = μ7
Ha: At least two means differ

The test statistic is

$$F = \frac{\text{MSA}}{\text{MSE}} = \frac{3122.02}{45.65} = 68.39$$

The p-value is p = .0001. Since the p-value is less than α = .01, H0 is rejected. There is sufficient evidence to indicate the mean usage differs among the days of the week at α = .01.

To determine if the mean usage differs among the hours of the day, we test:

H0: μ1 = μ2 = μ3 = . . . = μ24
Ha: At least two means differ

The test statistic is

$$F = \frac{\text{MSB}}{\text{MSE}} = \frac{7157.82}{45.65} = 156.80$$

The p-value is p = .0001. Since the p-value is less than α = .01, H0 is rejected. There is sufficient evidence to indicate the mean usage differs among the hours of the day at α = .01.

8.68

a.

The degrees of freedom for “Type of message retrieval system” is a − 1 = 2 − 1 = 1. The degrees of freedom for “Pricing option” is b − 1 = 2 − 1 = 1. The degrees of freedom for the interaction of Type of message retrieval system and Pricing option is (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1. The degrees of freedom for error is n − ab = 120 − 2(2) = 116.

Source                                df    SS    MS      F
Type of message retrieval system       1     -     -    2.001
Pricing option                         1     -     -    5.019
Type of system × pricing option        1     -     -    4.986
Error                                116     -     -
Total                                119     -

b.

To determine if “Type of system” and “Pricing option” interact to affect the mean willingness to buy, we test:

H0: “Type of system” and “Pricing option” do not interact
Ha: “Type of system” and “Pricing option” interact

c.

The test statistic is F = MSAB/MSE = 4.986.

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1 and ν2 = n − ab = 120 − 2(2) = 116. From Table IX, Appendix B, F.05 ≈ 3.92. The rejection region is F > 3.92.

Since the observed value of the test statistic falls in the rejection region (F = 4.986 > 3.92), H0 is rejected. There is sufficient evidence to indicate “Type of system” and “Pricing option” interact to affect the mean willingness to buy at α = .05.

d.

No. Since the test in part c indicated that interaction between “Type of system” and “Pricing option” is present, we should not test for the main effects. Instead, we should proceed directly to a multiple comparison procedure to compare selected treatment means. If interaction is present, it can cover up the main effects.

8.70

a.

The treatments are the 3 × 3 = 9 combinations of PES and Trust. The nine treatments are: (BC, Low), (PC, Low), (NA, Low), (BC, Med), (PC, Med), (NA, Med), (BC, High), (PC, High), and (NA, High).

b.

df(Trust) = 3 − 1 = 2

SSE = SSTot − SS(PES) − SS(Trust) − SSPT = 161.1162 − 2.1774 − 7.6367 − 1.7380 = 149.5641

$$\text{MS(PES)} = \frac{\text{SS(PES)}}{df(\text{PES})} = \frac{2.1774}{2} = 1.0887 \qquad \text{MS(Trust)} = \frac{\text{SS(Trust)}}{df(\text{Trust})} = \frac{7.6367}{2} = 3.81835$$

$$\text{MS(PT)} = \frac{\text{SS(PT)}}{df(\text{PT})} = \frac{1.7380}{4} = 0.4345 \qquad \text{MSE} = \frac{\text{SSE}}{df(\text{Error})} = \frac{149.5641}{206} = 0.7260$$

$$F_{PES} = \frac{\text{MS(PES)}}{\text{MSE}} = \frac{1.0887}{0.7260} = 1.50 \qquad F_{Trust} = \frac{\text{MS(Trust)}}{\text{MSE}} = \frac{3.81835}{0.7260} = 5.26 \qquad F_{PT} = \frac{\text{MS(PT)}}{\text{MSE}} = \frac{0.4345}{0.7260} = 0.60$$

The ANOVA table is:

Source          df        SS          MS         F
PES              2      2.1774     1.0887     1.50
Trust            2      7.6367     3.81835    5.26
PES × Trust      4      1.7380     0.4345     0.60
Error          206    149.5641     0.7260
Total          214    161.1162

c.

To determine if PES and Trust interact, we test:

H0: PES and Trust do not interact to affect the mean tension
Ha: PES and Trust do interact to affect the mean tension

The test statistic is F = 0.60.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (3 − 1)(3 − 1) = 4 and ν2 = n − ab = 215 − 3(3) = 206. From Table IX, Appendix B, F.05 ≈ 2.37. The rejection region is F > 2.37.

Since the observed value of the test statistic does not fall in the rejection region (F = 0.60 ≯ 2.37), H0 is not rejected. There is insufficient evidence to indicate that PES and Trust interact at α = .05.

d.

The plot of the treatment means (not reproduced here) shows the following. The mean tension scores for Low Trust are nearly the same for each level of PES. Similarly, the mean tension scores for Medium Trust are nearly the same for each level of PES. However, the mean tension scores for High Trust are not the same for each level of PES. For both PES levels BC and PC, as the level of trust increases, the mean tension scores decrease. However, for PES level NA, as trust goes from low to medium, the mean tension decreases, and as trust goes from medium to high, the mean tension increases. This pattern suggests that interaction is present, even though the test in part c did not find a significant interaction at α = .05.

e.

Because the interaction of PES and Trust was found to be significant, the tests for the main effects are irrelevant. If the factors interact, the interaction effect can cover up any main effect differences. In addition, interaction implies that the effects of one factor on the dependent variable are different at different levels of the second factor. Thus, there is no one "main" effect of the factor.

8.72

Using MINITAB, the ANOVA results are:

General Linear Model: Deviation versus Group, Trail

Factor    Type     Levels    Values
Group     fixed         4    F G M N
Trail     fixed         2    C E

Analysis of Variance for Deviatio, using Adjusted SS for Tests

Source          DF      Seq SS     Adj SS     Adj MS        F        P
Group            3     16271.2    13000.6     4333.5     5.91    0.001
Trail            1     46445.5    46445.5    46445.5    63.34    0.000
Group*Trail      3      2245.2     2245.2      748.4     1.02    0.386
Error          112     82131.7    82131.7      733.3
Total          119    147093.6

First, we must test for treatment effects.

SST = SS(Group) + SS(Trail) + SS(G×T) = 16,271.2 + 46,445.5 + 2,245.2 = 64,961.9. The df = 3 + 1 + 3 = 7.

$$\text{MST} = \frac{\text{SST}}{ab-1} = \frac{64{,}961.9}{4(2)-1} = 9{,}280.2714 \qquad F = \frac{\text{MST}}{\text{MSE}} = \frac{9{,}280.2714}{733.3} = 12.66$$

To determine if there are differences in mean ratings among the 8 treatments, we test:

H0: All treatment means are the same
Ha: At least two treatment means differ

The test statistic is F = 12.66.

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = ab − 1 = 4(2) − 1 = 7 and ν2 = n − ab = 120 − 4(2) = 112. From Table IX, Appendix B, F.05 ≈ 2.09. The rejection region is F > 2.09.

Since the observed value of the test statistic falls in the rejection region (F = 12.66 > 2.09), H0 is rejected. There is sufficient evidence that differences exist among the treatment means at α = .05.

Since differences exist, we now test for the interaction effect between Trail and Group. To determine if Trail and Group interact, we test:

H0: Trail and Group do not interact
Ha: Trail and Group do interact

The test statistic is F = 1.02 and p = .386. Since the p-value is greater than α (p = .386 > .05), H0 is not rejected. There is insufficient evidence that Trail and Group interact at α = .05.

Since the interaction was not significant, we test for the main effects of Trail and Group.

To determine if there are differences in the mean rating between the two levels of Trail, we test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is F = 63.34 and p = 0.000. Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence that the mean trail deviations differ between the fecal extract trail and the control trail at α = .05.

To determine if there are differences in the mean rating among the four levels of Group, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two means differ

The test statistic is F = 5.91 and p = 0.001. Since the p-value is less than α (p = 0.001 < .05), H0 is rejected. There is sufficient evidence that the mean trail deviations differ among the four groups at α = .05.



8.74

There are 3 × 2 = 6 treatments. They are A1B1, A1B2, A2B1, A2B2, A3B1, and A3B2.

8.76

a.

SSE = SSTot − SST = 62.55 − 36.95 = 25.60

df Treatment = p − 1 = 4 − 1 = 3
df Error = n − p = 20 − 4 = 16
df Total = n − 1 = 20 − 1 = 19

$$\text{MST} = \frac{\text{SST}}{df} = \frac{36.95}{3} = 12.32 \qquad \text{MSE} = \frac{\text{SSE}}{df} = \frac{25.60}{16} = 1.60 \qquad F = \frac{\text{MST}}{\text{MSE}} = \frac{12.32}{1.60} = 7.70$$

The ANOVA table:

Source       df      SS       MS       F
Treatment     3    36.95    12.32    7.70
Error        16    25.60     1.60
Total        19    62.55

b.

To determine if there is a difference in the treatment means, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two of the means differ

where μi represents the mean for the ith treatment.

The test statistic is F = MST/MSE = 7.70.

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = (p − 1) = (4 − 1) = 3 and ν2 = (n − p) = (20 − 4) = 16. From Table VIII, Appendix B, F.10 = 2.46. The rejection region is F > 2.46.

Since the observed value of the test statistic falls in the rejection region (F = 7.70 > 2.46), H0 is rejected. There is sufficient evidence to conclude that at least two of the means differ at α = .10.

c.

$$\bar{x}_4 = \frac{\sum x_4}{n_4} = \frac{57}{5} = 11.4$$

For confidence level .90, α = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, with df = 16, t.05 = 1.746. The confidence interval is:

$$\bar{x}_4 \pm t_{.05}\sqrt{\text{MSE}/n_4} \Rightarrow 11.4 \pm 1.746\sqrt{1.6/5} \Rightarrow 11.4 \pm .99 \Rightarrow (10.41, 12.39)$$
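The interval arithmetic above can be written as a small reusable function. This is an illustrative sketch: `treatment_mean_ci` is a hypothetical helper name, and the critical value t.05 = 1.746 is taken from the table lookup in the solution rather than computed.

```python
from math import sqrt


def treatment_mean_ci(mean, mse, n_i, t_crit):
    """Confidence interval for one treatment mean using the pooled MSE."""
    half_width = t_crit * sqrt(mse / n_i)
    return mean - half_width, mean + half_width


# Treatment 4: total 57 over 5 observations, MSE = 1.60 with 16 df.
lower, upper = treatment_mean_ci(mean=57 / 5, mse=1.60, n_i=5, t_crit=1.746)
print(f"({lower:.2f}, {upper:.2f})")
```

Using MSE instead of the treatment's own sample variance is what gives the interval 16 degrees of freedom rather than n4 − 1 = 4.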




8.78

a.

df(AB) = (a − 1)(b − 1) = 3(5) = 15
df(Error) = n − ab = 48 − 4(6) = 24

SSAB = MSAB(df) = 3.1(15) = 46.5
SS(Total) = SSA + SSB + SSAB + SSE = 2.6 + 9.2 + 46.5 + 18.7 = 77

$$\text{MSA} = \frac{\text{SSA}}{a-1} = \frac{2.6}{3} = .8667 \qquad \text{MSB} = \frac{\text{SSB}}{b-1} = \frac{9.2}{5} = 1.84 \qquad \text{MSE} = \frac{\text{SSE}}{n-ab} = \frac{18.7}{24} = .7792$$

$$F_A = \frac{\text{MSA}}{\text{MSE}} = \frac{.8667}{.7792} = 1.11 \qquad F_B = \frac{\text{MSB}}{\text{MSE}} = \frac{1.84}{.7792} = 2.36 \qquad F_{AB} = \frac{\text{MSAB}}{\text{MSE}} = \frac{3.1}{.7792} = 3.98$$

Source    df     SS       MS       F
A          3     2.6    .8667    1.11
B          5     9.2   1.84      2.36
AB        15    46.5   3.1       3.98
Error     24    18.7    .7792
Total     47    77.0

b.

Factor A has a = 3 + 1 = 4 levels and factor B has b = 5 + 1 = 6 levels. The number of treatments is ab = 4(6) = 24. The total number of observations is n = 47 + 1 = 48. Thus, two replicates were performed.
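The reasoning in part b (levels from degrees of freedom, then replicates from the total sample size) follows a fixed recipe, sketched below. The function name `design_from_df` is hypothetical, and the sketch assumes a complete factorial with equal replication.

```python
def design_from_df(df_a, df_b, df_total):
    """Recover (a, b, treatments, replicates) from ANOVA degrees of freedom."""
    a, b = df_a + 1, df_b + 1          # levels = df + 1 for each factor
    n = df_total + 1                   # total observations
    treatments = a * b
    replicates, remainder = divmod(n, treatments)
    if remainder:
        raise ValueError("n is not a multiple of ab; replication is unequal")
    return a, b, treatments, replicates


print(design_from_df(df_a=3, df_b=5, df_total=47))   # Exercise 8.78
print(design_from_df(df_a=3, df_b=1, df_total=23))   # Exercise 8.60
```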

c.

SST = SSA + SSB + SSAB = 2.6 + 9.2 + 46.5 = 58.3

$$\text{MST} = \frac{\text{SST}}{ab-1} = \frac{58.3}{4(6)-1} = 2.5347 \qquad F = \frac{\text{MST}}{\text{MSE}} = \frac{2.5347}{.7792} = 3.25$$

To determine whether the treatment means differ, we test:

H0: μ1 = μ2 = ⋅⋅⋅ = μ24
Ha: At least one treatment mean is different

The test statistic is F = MST/MSE = 3.25.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 4(6) − 1 = 23 and ν2 = n − ab = 48 − 4(6) = 24. From Table IX, Appendix B, F.05 ≈ 2.03. The rejection region is F > 2.03.

Since the observed value of the test statistic falls in the rejection region (F = 3.25 > 2.03), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at α = .05.

d.

Since there are differences among the treatment means, we test for the presence of interaction:

H0: Factor A and factor B do not interact to affect the response mean
Ha: Factor A and factor B do interact to affect the response mean

The test statistic is F = MSAB/MSE = 3.98.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (4 − 1)(6 − 1) = 15 and ν2 = n − ab = 48 − 4(6) = 24. From Table IX, Appendix B, F.05 = 2.11. The rejection region is F > 2.11.

Since the observed value of the test statistic falls in the rejection region (F = 3.98 > 2.11), H0 is rejected. There is sufficient evidence to indicate factors A and B interact to affect the response means at α = .05.

Since the interaction is significant, no further tests are warranted. Multiple comparisons need to be performed.

8.80

a.

This is a two-factor factorial design. It is also a completely randomized design.

b.

The two factors are "involvement in topic" and "question wording." Both are qualitative variables because neither is measured on a numerical scale.

c.

There are two levels of "involvement in topic": high and low. There are two levels of "question wording": positive and negative.

d.

There are 2 × 2 = 4 treatments. They are: (high, positive), (high, negative), (low, positive), and (low, negative).

e.

The experiment's dependent variable is the level of agreement.

8.82

a.

To determine if the mean vacancy rates of the eight office-property submarkets in Atlanta differ, we test: H0: μ1 = μ2 = μ3 = μ4 = μ5 = μ6 = μ7 = μ8 Ha: At least two means differ

b.

If quarterly data were used for nine years, there are 4 × 9 = 36 observations per submarket. Since there are 8 submarkets, the total sample size is 8 × 36 = 288.

Since no value of α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 8 − 1 = 7 and ν2 = n − k = 288 − 8 = 280. From Table IX, Appendix B, F.05 ≈ 2.01. The rejection region is F > 2.01.

Since the observed value of the test statistic falls in the rejection region (F = 17.54 > 2.01), H0 is rejected. There is sufficient evidence to indicate the mean vacancy rates of the eight office-property submarkets in Atlanta differ at α = .05.

c.

With ν1 = k − 1 = 8 − 1 = 7 and ν2 = n − k = 288 − 8 = 280, P(F > 17.54) < .01, using Table XI, Appendix B. Thus, the p-value is less than .01.

d.

We must assume that all eight samples are randomly drawn from normal populations, the eight population variances are the same, and the samples are independent.

e.

The mean vacancy rate for the South submarket is significantly larger than the mean vacancy rates for all other submarkets. The mean vacancy rate of the Downtown submarket is significantly larger than the mean vacancy rates for all other submarkets except the South. The mean vacancy rate of the North Lake submarket is significantly larger than the mean vacancy rates for all other submarkets except the South and Downtown. The mean vacancy rate of the Midtown submarket is significantly larger than the mean vacancy rates for all other submarkets except the South, Downtown, and North Lake. There are no other significant differences.

8.84

a.

The response is the weight of a brochure. There is one factor and it is carton. The treatments are the five different cartons, while the experimental units are the brochures.

b.

$$\text{CM} = \frac{(\sum y)^2}{n} = \frac{.75005^2}{40} = .01406437506$$

$$\text{SS(Total)} = \sum y^2 - \text{CM} = .014066537 - .01406437506 = .00000216264$$

$$\text{SST} = \sum \frac{T_i^2}{n_i} - \text{CM} = \frac{.14767^2}{8} + \frac{.15028^2}{8} + \frac{.14962^2}{8} + \frac{.15217^2}{8} + \frac{.15031^2}{8} - .01406437506$$

$$= .01406568209 - .01406437506 = .00000130703$$

$$\text{SSE} = \text{SS(Total)} - \text{SST} = .00000216264 - .00000130703 = .00000085561$$

$$\text{MST} = \frac{\text{SST}}{k-1} = \frac{.00000130703}{5-1} = .000000326756 \qquad \text{MSE} = \frac{\text{SSE}}{n-k} = \frac{.00000085561}{40-5} = .000000024446$$

$$F = \frac{\text{MST}}{\text{MSE}} = \frac{.000000326756}{.000000024446} = 13.37$$

Source        df         SS               MS              F
Treatments     4    .00000130703    .000000326756    13.37
Error         35    .00000085561    .000000024446
Total         39    .00000216264

To determine whether there are differences in mean weight per brochure among the five cartons, we test:

H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: At least two treatment means differ

The test statistic is F = 13.37.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 5 − 1 = 4 and ν2 = n − k = 40 − 5 = 35. From Table IX, Appendix B, F.05 ≈ 2.53. The rejection region is F > 2.53.

Since the observed value of the test statistic falls in the rejection region (F = 13.37 > 2.53), H0 is rejected. There is sufficient evidence to indicate a difference in mean weight per brochure among the five cartons at α = .05.

c.

We must assume that the distributions of weights for the brochures in the five cartons are normal, that the variances of the weights for the brochures in the five cartons are equal, and that random and independent samples were selected from each of the cartons.
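The one-way ANOVA quantities in part b can be recomputed from the five carton totals. The sketch below takes SS(Total) = .00000216264 as reported in the solution rather than recomputing it, since the individual brochure weights are not listed here; the variable names are illustrative.

```python
# Carton totals T_i (8 brochures per carton) and SS(Total) from part b.
totals = [.14767, .15028, .14962, .15217, .15031]
n_i = 8
ss_total = .00000216264

k = len(totals)        # number of treatments (cartons)
n = k * n_i            # total number of brochures

cm = sum(totals) ** 2 / n
sst = sum(t * t for t in totals) / n_i - cm
sse = ss_total - sst

mst = sst / (k - 1)
mse = sse / (n - k)
f_stat = mst / mse

print(f"SST = {sst:.11f}, SSE = {sse:.11f}, F = {f_stat:.2f}")
```

Because the sums of squares are differences of nearly equal numbers, it helps to carry all available digits, as the solution does, instead of rounding intermediate results.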

d.

Using MINITAB, the results of Tukey’s multiple comparison procedure are as follows (the character plots of the intervals are omitted):

Level       N      Mean        StDev
Carton1     8    0.018459    0.000105
Carton2     8    0.018785    0.000101
Carton3     8    0.018703    0.000109
Carton4     8    0.019021    0.000232
Carton5     8    0.018789    0.000188

Pooled StDev = 0.000156

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons

Individual confidence level = 99.32%

Carton1 subtracted from:

             Lower         Center        Upper
Carton2     0.0001013     0.0003262     0.0005512
Carton3     0.0000188     0.0002437     0.0004687
Carton4     0.0003375     0.0005625     0.0007875
Carton5     0.0001050     0.0003300     0.0005550

Carton2 subtracted from:

             Lower         Center        Upper
Carton3    -0.0003075    -0.0000825     0.0001425
Carton4     0.0000113     0.0002363     0.0004612
Carton5    -0.0002212     0.0000037     0.0002287

Carton3 subtracted from:

             Lower         Center        Upper
Carton4     0.0000938     0.0003187     0.0005437
Carton5    -0.0001387     0.0000862     0.0003112

Carton4 subtracted from:

             Lower         Center        Upper
Carton5    -0.0004575    -0.0002325    -0.0000075

The means arranged in order are:

Carton 1    Carton 3    Carton 2    Carton 5    Carton 4
.018459     .018703     .018785     .018789     .019021

The interpretation of the Tukey results is: The mean weight for carton 4 is significantly higher than the mean weights of all the other cartons. The mean weights of cartons 5, 2, and 3 are significantly higher than the mean weight of carton 1.

e.

Since there are differences among the cartons, management should sample from many cartons.

8.86

a.

This is a randomized block design.

Response: the length of time required for a cut to stop bleeding
Factor: drug
Factor type: qualitative
Treatments: drugs A, B, and C
Experimental units: subjects


b.

Using MINITAB, the results are:

General Linear Model: Y versus Drug, Person

Factor   Type    Levels   Values
Drug     fixed   3        A B C
Person   fixed   5        1 2 3 4 5

Analysis of Variance for Y, using Adjusted SS for Tests

Source   DF   Seq SS   Adj SS   Adj MS   F       P
Drug      2    156.4    156.4     78.2    3.91   0.066
Person    4   7645.8   7645.8   1911.5   95.51   0.000
Error     8    160.1    160.1     20.0
Total    14   7962.3

Tukey 90.0% Simultaneous Confidence Intervals
Response Variable Y
All Pairwise Comparisons among Levels of Drug

Drug = A subtracted from:

Drug   Lower    Center   Upper
B      -11.56   -4.820   1.922
C       -3.72    3.020   9.762

Drug = B subtracted from:

Drug   Lower    Center   Upper
C      1.098    7.840    14.58

Let μ1, μ2, and μ3 represent the mean clotting time for the three drugs.

H0: μ1 = μ2 = μ3
Ha: At least two means differ

The test statistic is

\[ F = \frac{MS(\text{Drug})}{MSE} = 3.91 \]

The p-value is p = 0.066. Since the observed level of significance is less than α = .10, H0 is rejected. There is sufficient evidence to indicate differences in the mean clotting times among the three drugs at α = .10.

c. The observed level of significance is given as 0.066.

d.

To determine if there is a significant difference in the mean response over blocks, we test:

H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: At least two block means differ

The test statistic is

\[ F = \frac{MS(\text{Person})}{MSE} = 95.51 \]

The p-value is p = 0.000. Since the observed level of significance is less than α = .10, H0 is rejected. There is sufficient evidence to indicate differences in the mean clotting times among the five people at α = .10.

e. The confidence interval to compare drugs A and B is (-11.56, 1.922). Since 0 is in the interval, there is no evidence of a difference in mean clotting times between drugs A and B.

The confidence interval to compare drugs A and C is (-3.72, 9.762). Since 0 is in the interval, there is no evidence of a difference in mean clotting times between drugs A and C.

The confidence interval to compare drugs B and C is (1.098, 14.58). Since 0 is not in the interval, there is evidence of a difference in mean clotting times between drugs B and C. Since the numbers are positive, the mean clotting time for drug C is greater than that for drug B.

In summary, the mean clotting time for drug C is greater than that for drug B. No other differences exist.

8.88

a.

\[ MSA = \frac{SSA}{df_A} = \frac{243.2}{1} = 243.2 \qquad MSB = \frac{SSB}{df_B} = \frac{57.8}{1} = 57.8 \]

\[ SSAB = SSTot - SSA - SSB - SSE = 976.3 - 243.2 - 57.8 - 670.8 = 4.5 \]

\[ MSAB = \frac{SSAB}{df_{AB}} = \frac{4.5}{1} = 4.5 \qquad MSE = \frac{SSE}{df_E} = \frac{670.8}{77} = 8.712 \]

\[ F_A = \frac{MSA}{MSE} = \frac{243.2}{8.712} = 27.92 \qquad F_B = \frac{MSB}{MSE} = \frac{57.8}{8.712} = 6.63 \qquad F_{AB} = \frac{MSAB}{MSE} = \frac{4.5}{8.712} = 0.52 \]

The ANOVA table is:

Source                   df   SS      MS      F
Recent Performance (A)    1   243.2   243.2   27.92
Risk attitude (B)         1    57.8    57.8    6.63
AB                        1     4.5     4.5    0.52
Error                    77   670.8   8.712
Total                    80   976.3

b.

To determine if factors A and B interact, we test:

H0: Factors A and B do not interact to affect the mean decision Ha: Factors A and B do interact to affect the mean decision The test statistic is F = 0.52.



The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1 and ν2 = n − ab = 81 − 2(2) = 77. From Table IX, Appendix B, F.05 ≈ 4.00. The rejection region is F > 4.00.

Since the observed value of the test statistic does not fall in the rejection region (F = .52 ≯ 4.00), H0 is not rejected. There is insufficient evidence to indicate that factors A and B interact at α = .05.

c. Since the interaction is not significant, the main effect tests are meaningful. To determine if an individual's risk attitude affects his or her budgetary decisions, we test:

H0: No difference exists between the risk attitude means
Ha: The risk attitude means differ

The test statistic is F = 6.63.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = b − 1 = 2 − 1 = 1 and ν2 = n − ab = 81 − 2(2) = 77. From Table IX, Appendix B, F.05 ≈ 4.00. The rejection region is F > 4.00.

Since the observed value of the test statistic falls in the rejection region (F = 6.63 > 4.00), H0 is rejected. There is sufficient evidence to indicate an individual's risk attitude affects his or her budgetary decisions at α = .05.

d. To determine if recent performance affects budgeting decisions, we test:

H0: No difference exists between the recent performance means
Ha: The recent performance means differ

The test statistic is F = 27.92.

The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = a − 1 = 2 − 1 = 1 and ν2 = n − ab = 81 − 2(2) = 77. From Table XI, Appendix B, F.01 ≈ 7.08. The rejection region is F > 7.08.

Since the observed value of the test statistic falls in the rejection region (F = 27.92 > 7.08), H0 is rejected. There is sufficient evidence to indicate that recent performance affects an individual's budgetary decisions at α = .01.
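The arithmetic behind the ANOVA table in part a can be checked with a short script. This is only a sketch: the function name `two_factor_anova` and its argument names are illustrative choices, not part of the text, and the inputs are the sums of squares and sample size quoted in the solution.

```python
def two_factor_anova(ss_a, ss_b, ss_total, ss_error, a, b, n):
    """Fill in a two-factor ANOVA table from the sums of squares given in the exercise."""
    ss = {
        "A": ss_a,
        "B": ss_b,
        "AB": ss_total - ss_a - ss_b - ss_error,  # interaction SS found by subtraction
        "Error": ss_error,
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": n - a * b}
    ms = {source: ss[source] / df[source] for source in ss}
    f = {source: ms[source] / ms["Error"] for source in ("A", "B", "AB")}
    return ss, df, ms, f


# Values from Exercise 8.88: a = b = 2 factor levels, n = 81 observations.
ss, df, ms, f = two_factor_anova(243.2, 57.8, 976.3, 670.8, a=2, b=2, n=81)
for source in ("A", "B", "AB"):
    print(f"{source:5s} df={df[source]:2d} SS={ss[source]:7.1f} "
          f"MS={ms[source]:8.3f} F={f[source]:6.2f}")
print(f"Error df={df['Error']:2d} SS={ss['Error']:7.1f} MS={ms['Error']:8.3f}")
```

The critical values (4.00 and 7.08) still come from the F tables; the script reproduces only the mean squares and F ratios.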




8.90

Let factor A be second plastic and factor B be metal density. Some preliminary calculations are:

\[ CM = \frac{(\sum y)^2}{n} = \frac{5.56^2}{8} = 3.8642 \]

\[ SS(\text{Total}) = \sum y^2 - CM = 9.1646 - 3.8642 = 5.3004 \]

\[ SSA = \sum \frac{A_i^2}{br} - CM = \frac{.92^2}{2(2)} + \frac{4.64^2}{2(2)} - 3.8642 = 5.594 - 3.8642 = 1.7298 \]

\[ SSB = \sum \frac{B_j^2}{ar} - CM = \frac{.57^2}{2(2)} + \frac{4.99^2}{2(2)} - 3.8642 = 6.30625 - 3.8642 = 2.44205 \]

\[ SSAB = \sum\sum \frac{AB_{ij}^2}{r} - SSA - SSB - CM = \frac{.06^2}{2} + \frac{.86^2}{2} + \frac{.51^2}{2} + \frac{4.13^2}{2} - 1.7298 - 2.44205 - 3.8642 = 9.0301 - 8.03605 = .99405 \]

\[ SSE = SS(\text{Total}) - SSA - SSB - SSAB = 5.3004 - 1.7298 - 2.44205 - .99405 = .1345 \]

\[ MSA = \frac{SSA}{a-1} = \frac{1.7298}{2-1} = 1.7298 \qquad MSB = \frac{SSB}{b-1} = \frac{2.44205}{2-1} = 2.44205 \]

\[ MSAB = \frac{SSAB}{(a-1)(b-1)} = \frac{.99405}{(1)(1)} = .99405 \qquad MSE = \frac{SSE}{n-ab} = \frac{.1345}{8-(2)(2)} = .033625 \]

\[ F(A) = \frac{MSA}{MSE} = \frac{1.7298}{.033625} = 51.44 \qquad F(B) = \frac{MSB}{MSE} = \frac{2.44205}{.033625} = 72.63 \qquad F(AB) = \frac{MSAB}{MSE} = \frac{.99405}{.033625} = 29.56 \]

Source   df   SS        MS        F
A         1   1.72980   1.72980   51.44
B         1   2.44205   2.44205   72.63
AB        1    .99405    .99405   29.56
Error     4    .13450    .033625
Total     7   5.30040

\[ SST = SSA + SSB + SSAB = 1.7298 + 2.44205 + .99405 = 5.1659 \]

\[ MST = \frac{SST}{ab-1} = \frac{5.1659}{2(2)-1} = 1.7220 \qquad F(T) = \frac{MST}{MSE} = \frac{1.7220}{.033625} = 51.21 \]
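The preliminary calculations above follow a fixed recipe, so they can be checked in a few lines. The sketch below assumes the four cell totals (.06, .86, .51, 4.13), r = 2 replicates per cell, and Σy² = 9.1646 as used in the solution; the function name `factorial_sums_of_squares` is hypothetical.

```python
def factorial_sums_of_squares(cell_totals, r, sum_y_squared):
    """Sums of squares for an a x b factorial with r observations per cell.

    cell_totals[i][j] is the total of the r observations at level i of A
    and level j of B.
    """
    a = len(cell_totals)
    b = len(cell_totals[0])
    n = a * b * r
    a_totals = [sum(row) for row in cell_totals]        # A_i
    b_totals = [sum(col) for col in zip(*cell_totals)]  # B_j
    cm = sum(a_totals) ** 2 / n                         # correction for the mean
    ss_total = sum_y_squared - cm
    ss_a = sum(t ** 2 for t in a_totals) / (b * r) - cm
    ss_b = sum(t ** 2 for t in b_totals) / (a * r) - cm
    ss_ab = sum(t ** 2 for row in cell_totals for t in row) / r - ss_a - ss_b - cm
    ss_e = ss_total - ss_a - ss_b - ss_ab
    return {"CM": cm, "Total": ss_total, "A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_e}


# Cell totals from Exercise 8.90 (rows: levels of A, columns: levels of B).
cell_totals = [[0.06, 0.86], [0.51, 4.13]]
ss = factorial_sums_of_squares(cell_totals, r=2, sum_y_squared=9.1646)
mse = ss["Error"] / (8 - 2 * 2)

for name, value in ss.items():
    print(f"{name:6s} {value:.5f}")
print(f"F(A)  = {ss['A'] / mse:.2f}")
print(f"F(B)  = {ss['B'] / mse:.2f}")
print(f"F(AB) = {ss['AB'] / mse:.2f}")
```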

To determine whether differences exist among the treatment means, we test:

H0: μ1 = μ2 = μ3 = μ4 Ha: At least two treatment means differ The test statistic is F = 51.21. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 2(2) − 1 = 3 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 6.59. The rejection region is F > 6.59. Since the observed value of the test statistic falls in the rejection region (F = 51.21 > 6.59), H0 is rejected. There is sufficient evidence to indicate differences in mean radiation among the four treatments at α = .05. Since there are differences among the treatment means, we next test to see if the two factors interact.

H0: Second plastic and metal density do not interact
Ha: Second plastic and metal density do interact

The test statistic is

\[ F = \frac{MSAB}{MSE} = 29.56 \]

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 7.71. The rejection region is F > 7.71.

Since the observed value of the test statistic falls in the rejection region (F = 29.56 > 7.71), H0 is rejected. There is sufficient evidence to indicate second plastic and metal density interact at α = .05.

Since interaction is present, no tests for main effects are necessary. Since we want to find the preferred method to protect patients, we will compare all four treatment means. There are four treatments, so the number of pairwise comparisons is

\[ c = \frac{p(p-1)}{2} = \frac{4(4-1)}{2} = 6 \]

For α* = α/c = .05/6 = .0083, α*/2 = .0083/2 = .0042 ≈ .005, and df = n − ab = 4, t.005 = 4.604 from Table VI, Appendix B.




We now form confidence intervals for the differences between each pair of means using the formula:

\[ (\bar{x}_i - \bar{x}_j) \pm t_{.005}\, s \sqrt{\frac{1}{n_i} + \frac{1}{n_j}} \quad \text{where } s = \sqrt{MSE} = \sqrt{.033625} = .1834 \]

Pair      Confidence interval
11 – 12   (.03 − .43) ± 4.604(.1834)√(1/2 + 1/2) ⇒ −.40 ± .844 ⇒ (−1.244, .444)
11 – 21   (.03 − .255) ± .844 ⇒ −.225 ± .844 ⇒ (−1.069, .619)
11 – 22   (.03 − 2.065) ± .844 ⇒ −2.035 ± .844 ⇒ (−2.879, −1.191)
12 – 21   (.43 − .255) ± .844 ⇒ .175 ± .844 ⇒ (−.669, 1.019)
12 – 22   (.43 − 2.065) ± .844 ⇒ −1.635 ± .844 ⇒ (−2.479, −.791)
21 – 22   (.255 − 2.065) ± .844 ⇒ −1.81 ± .844 ⇒ (−2.654, −.966)
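The six intervals can be reproduced mechanically. The following sketch assumes the four treatment means, MSE, and the tabled value t.005 = 4.604 used in the solution; the critical value is typed in from the table, not computed, and the variable names are illustrative.

```python
import math
from itertools import combinations

# Treatment means from Exercise 8.90, keyed by (level of A, level of B).
means = {"11": 0.03, "12": 0.43, "21": 0.255, "22": 2.065}
mse = 0.033625
r = 2            # observations per treatment
t_crit = 4.604   # t.005 with 4 df, taken from the table lookup in the solution

# Common margin of error: every treatment has the same sample size.
margin = t_crit * math.sqrt(mse) * math.sqrt(1 / r + 1 / r)

differing = []
for i, j in combinations(means, 2):
    diff = means[i] - means[j]
    lower, upper = diff - margin, diff + margin
    if not (lower <= 0 <= upper):   # 0 outside the interval: the means differ
        differing.append((i, j))
    print(f"{i} - {j}: {diff:7.3f} +/- {margin:.3f} -> ({lower:.3f}, {upper:.3f})")

print("Pairs that differ:", differing)
```

Because the margin is the same for every pair, a pair differs exactly when the absolute difference in means exceeds about .844.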

The means that differ are 11 and 22, 12 and 22, and 21 and 22. No other means are significantly different. Since we are looking for the treatment that gives the best protection (allows the smallest amount of radiation), we would pick any treatment except 22. Thus, use second plastic present and heavy alloy, second plastic present and light alloy, or second plastic not present and heavy alloy. Pick the one of these three which is the cheapest or the most convenient.

8.92

a.

There are a total of a × b = 3 × 3 = 9 treatments in this study.

b.

Using MINITAB, the ANOVA results are:

General Linear Model: Y versus Display, Price

Factor    Type    Levels   Values
Display   fixed   3        1 2 3
Price     fixed   3        1 2 3

Analysis of Variance for Y, using Adjusted SS for Tests

Source          DF   Seq SS    Adj SS    Adj MS    F         P
Display          2   1691393   1691393    845696   1709.37   0.000
Price            2   3089054   3089054   1544527   3121.89   0.000
Display*Price    4    510705    510705    127676    258.07   0.000
Error           18      8905      8905       495
Total           26   5300057

To get the SS for Treatments, we must add the SS for Display, SS for Price, and the SS for Interaction. Thus, SST = 1,691,393 + 3,089,054 + 510,705 = 5,291,152. The df = 2 + 2 + 4 = 8.

\[ MST = \frac{SST}{ab-1} = \frac{5{,}291{,}152}{3(3)-1} = 661{,}394 \qquad F = \frac{MST}{MSE} = \frac{661{,}394}{495} = 1336.15 \]



To determine whether the treatment means differ, we test:

H0: μ1 = μ2 = ⋅⋅⋅ = μ9
Ha: At least two treatment means differ

The test statistic is

\[ F = \frac{MST}{MSE} = 1336.15 \]

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = ab − 1 = 3(3) − 1 = 8 and ν2 = n − ab = 27 − 3(3) = 18. From Table VIII, Appendix B, F.10 = 2.04. The rejection region is F > 2.04.

Since the observed value of the test statistic falls in the rejection region (F = 1336.15 > 2.04), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at α = .10.

c. Since there are differences among the treatment means, we next test for the presence of interaction.

H0: Factors A and B do not interact to affect the response means
Ha: Factors A and B do interact to affect the response means

The test statistic is

\[ F = \frac{MSAB}{MSE} = 258.07 \]

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (3 − 1)(3 − 1) = 4 and ν2 = n − ab = 27 − 3(3) = 18. From Table VIII, Appendix B, F.10 = 2.29. The rejection region is F > 2.29.

Since the observed value of the test statistic falls in the rejection region (F = 258.07 > 2.29), H0 is rejected. There is sufficient evidence to indicate the two factors interact at α = .10.

d. The main effect tests are not warranted since interaction is present in part c.

e.

The nine treatment means need to be compared.

f.

From the graph, if the like letters are connected, the lines are not parallel. This implies interaction is present. This agrees with the results of part c.




8.94

a.

This is a completely randomized design with a complete four-factor factorial design.

b.

There are a total of 2 × 2 × 2 × 2 = 16 treatments.

c.

Using SAS, the output is:

Analysis of Variance Procedure

Dependent Variable: Y

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              15   546745.50        36449.70      5.11      0.0012
Error              16   114062.00         7128.88
Corrected Total    31   660807.50

R-Square   C.V.       Root MSE   Y Mean
0.827390   41.46478   84.433     203.63

Source                 DF   Anova SS    Mean Square   F Value   Pr > F
SPEED                   1    56784.50    56784.50      7.97     0.0123
FEED                    1    21218.00    21218.00      2.98     0.1037
SPEED*FEED              1    55444.50    55444.50      7.78     0.0131
COLLET                  1   165025.13   165025.13     23.15     0.0002
SPEED*COLLET            1    44253.13    44253.13      6.21     0.0241
FEED*COLLET             1   142311.13   142311.13     19.96     0.0004
SPEED*FEED*COLLET       1    54946.13    54946.13      7.71     0.0135
WEAR                    1      378.13      378.13      0.05     0.8208
SPEED*WEAR              1     1540.13     1540.13      0.22     0.6483
FEED*WEAR               1      946.13      946.13      0.13     0.7204
SPEED*FEED*WEAR         1      528.13      528.13      0.07     0.7890
COLLET*WEAR             1     1682.00     1682.00      0.24     0.6337
SPEED*COLLET*WEAR       1      512.00      512.00      0.07     0.7921
FEED*COLLET*WEAR        1       72.00       72.00      0.01     0.9212
SPEE*FEED*COLLE*WEAR    1     1104.50     1104.50      0.15     0.6991

d.

To determine if the interaction terms are significant, we must add together the sums of squares for all interaction terms, as well as their degrees of freedom.

SS(Interaction) = 55,444.50 + 44,253.13 + 142,311.13 + 54,946.13 + 1,540.13 + 946.13 + 528.13 + 1,682.00 + 512.00 + 72.00 + 1,104.50 = 303,339.78

df(Interaction) = 11

\[ MS(\text{Interaction}) = \frac{SS(\text{Interaction})}{df(\text{Interaction})} = \frac{303{,}339.78}{11} = 27{,}576.34364 \]

\[ F(\text{Interaction}) = \frac{MS(\text{Interaction})}{MSE} = \frac{27{,}576.34364}{7128.88} = 3.87 \]
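The pooled interaction F statistic is a sum followed by two divisions, which makes it easy to verify. In this sketch the eleven interaction sums of squares and the MSE are copied from the SAS output; the variable names are illustrative.

```python
# Anova SS for the 11 interaction terms in the SAS output (1 df each).
interaction_ss = [
    55444.50,   # SPEED*FEED
    44253.13,   # SPEED*COLLET
    142311.13,  # FEED*COLLET
    54946.13,   # SPEED*FEED*COLLET
    1540.13,    # SPEED*WEAR
    946.13,     # FEED*WEAR
    528.13,     # SPEED*FEED*WEAR
    1682.00,    # COLLET*WEAR
    512.00,     # SPEED*COLLET*WEAR
    72.00,      # FEED*COLLET*WEAR
    1104.50,    # four-way interaction
]
mse = 7128.88   # Error mean square from the same output

ss_int = sum(interaction_ss)
df_int = len(interaction_ss)   # each interaction term contributes 1 df
ms_int = ss_int / df_int
f_int = ms_int / mse

print(f"SS(Interaction) = {ss_int:.2f}, df = {df_int}")
print(f"MS(Interaction) = {ms_int:.5f}")
print(f"F(Interaction)  = {f_int:.2f}")
```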



To determine if interaction effects are present, we test:

H0: No interaction effects exist
Ha: Interaction effects exist

The test statistic is F = 3.87.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = 11 and ν2 = 16. From Table IX, Appendix B, F.05 ≈ 2.49. The rejection region is F > 2.49.

Since the observed value of the test statistic falls in the rejection region (F = 3.87 > 2.49), H0 is rejected. There is sufficient evidence to indicate that interaction effects exist at α = .05.

Since the sums of squares for a balanced factorial design are independent of each other, we can look at the SAS output to determine which of the interaction effects are significant. The three-way interaction between speed, feed, and collet is significant (p = .0135). There are three two-way interactions with p-values less than .05. However, all of these two-way interaction terms are embedded in the significant three-way interaction term.

e. Yes. Since the significant interaction terms do not include wear, it would be necessary to perform the main effect test for wear. All other main effects are contained in a significant interaction term. To determine if the mean finish measurements differ for the different levels of wear, we test:

H0: The mean finish measurements for the two levels of wear are the same
Ha: The mean finish measurements for the two levels of wear are different

The test statistic is F = 0.05.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = 1 and ν2 = 16. From Table IX, Appendix B, F.05 = 4.49. The rejection region is F > 4.49.

Since the observed value of the test statistic does not fall in the rejection region (F = .05 ≯ 4.49), H0 is not rejected. There is insufficient evidence to indicate that the mean finish measurements differ for the different levels of wear at α = .05.

f. We must assume that:

i. The populations sampled from are normal.
ii. The population variances are the same.
iii. The samples are random and independent.




Chapter 9

Categorical Data Analysis

9.2

The characteristics of the multinomial experiment are:

1. The experiment consists of n identical trials.
2. There are k possible outcomes to each trial.
3. The probabilities of the k outcomes, denoted p1, p2, ... , pk, remain the same from trial to trial, where p1 + p2 + ⋅⋅⋅ + pk = 1.
4. The trials are independent.
5. The random variables of interest are the counts n1, n2, ... , nk in each of the k cells.

The characteristics of the binomial are the same as those for the multinomial with k = 2.

9.4

The hypotheses of interest are:

H0: p1 = .25, p2 = .25, p3 = .50
Ha: At least one of the probabilities differs from the hypothesized value

E(n1) = np1,0 = 320(.25) = 80
E(n2) = np2,0 = 320(.25) = 80
E(n3) = np3,0 = 320(.50) = 160

The test statistic is

\[ \chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(78-80)^2}{80} + \frac{(60-80)^2}{80} + \frac{(182-160)^2}{160} = 8.075 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is χ² > 5.99147.

Since the observed value of the test statistic falls in the rejection region (χ² = 8.075 > 5.99147), H0 is rejected. There is sufficient evidence to indicate that at least one of the probabilities differs from its hypothesized value at α = .05.

9.6

a.

The qualitative variable of interest is the location of professional sports stadiums and ballparks. There are 3 levels or categories of this variable – downtown, central city, and suburban.

b.

Let p1 = proportion of major sports facilities located in downtown areas, p2 = proportion of major sports facilities located in central city areas, and p3 = proportion of major sports facilities located in suburban areas in 1997.



To determine if the proportions of major sports facilities in downtown, central city, and suburban areas in 1997 are different than in 1985, we test:

H0: p1 = .40, p2 = .30, p3 = .30
Ha: At least one of the proportions differs from its hypothesized value

c. E(n1) = np1,0 = 113(.40) = 45.2; E(n2) = np2,0 = 113(.30) = 33.9; E(n3) = np3,0 = 113(.30) = 33.9

d.

The test statistic is

\[ \chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(58-45.2)^2}{45.2} + \frac{(26-33.9)^2}{33.9} + \frac{(29-33.9)^2}{33.9} = 6.174 \]

e.

The degrees of freedom for the test statistic is k – 1 = 3 – 1 = 2. The p-value is p = P ( χ 2 ≥ 6.174) .

Using Table VII, Appendix B, with df = 2, the value 6.174 falls between χ².05 = 5.99147 and χ².025 = 7.37776, so .025 < P(χ² ≥ 6.174) < .05. Thus, .025 < p < .05. Since the p-value is smaller than α = .05, H0 is rejected. There is sufficient evidence to indicate the proportions of major sports facilities in downtown, central city, and suburban areas in 1997 are different than in 1985.

9.8

a.

The categorical variable is the rating of the student exposure to social and environmental issues. It has 5 levels: 1-star, 2-stars, 3-stars, 4-stars, and 5-stars.

b.

If there were no difference in the category proportions, then each proportion should be pi = 1/5 = .20. There were a total of n = 30 business schools sampled. The expected number would be: E(n1) = E(n2) = E(n3) = E(n4) = E(n5) = n(pi,0) = 30(.20) = 6

c.

To determine if there are differences in the star rating category proportions of all MBA programs, we test: H0: p1 = p2 = p3 = p4 = p5 = .20 Ha: At least one pi differs from its hypothesized value

d.

The test statistic is

\[ \chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(2-6)^2}{6} + \frac{(9-6)^2}{6} + \frac{(14-6)^2}{6} + \frac{(5-6)^2}{6} + \frac{(0-6)^2}{6} = 21 \]

e.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k – 1 = 5 – 1 = 4. From Table VII, Appendix B, χ².05 = 9.48773. The rejection region is χ² > 9.48773.




f.

Since the observed value of the test statistic falls in the rejection region (χ2 = 21 > 9.48773), H0 is rejected. There is sufficient evidence to indicate differences in the star rating category proportions of all MBA programs at α = .05.

g.

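Some preliminary calculations are:

\[ \hat{p}_3 = \frac{x_3}{n} = \frac{14}{30} = .467 \]

The confidence interval is:

\[ \hat{p}_3 \pm z_{.025} \sqrt{\frac{\hat{p}_3 \hat{q}_3}{n}} \Rightarrow .467 \pm 1.96 \sqrt{\frac{.467(.533)}{30}} \Rightarrow .467 \pm .179 \Rightarrow (.288, .646) \]

We are 95% confident that the proportion of all MBA programs that are ranked in the 3-star category is between .288 and .646.

9.10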

a.

Some preliminary calculations are: E(n1) = np1,0 = 1000(.50) = 500

E(n2) = np2,0 = 1000(.22) = 220

E(n3) = np3,0 = 1000(.11) = 110

E(n4) = np4,0 = 1000(.17) = 170

To determine if the percentages disagree with the percentages reported by Nielson/NetRatings, we test:

H0: p1 = .50, p2 = .22, p3 = .11, and p4 = .17
Ha: At least one pi differs from its hypothesized value

The test statistic is

\[ \chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(487-500)^2}{500} + \frac{(245-220)^2}{220} + \frac{(121-110)^2}{110} + \frac{(147-170)^2}{170} = 7.391 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k – 1 = 4 – 1 = 3. From Table VII, Appendix B, χ².05 = 7.81473. The rejection region is χ² > 7.81473.

Since the observed value of the test statistic does not fall in the rejection region (χ² = 7.391 ≯ 7.81473), H0 is not rejected. There is insufficient evidence to indicate the percentages disagree with the percentages reported by Nielson/NetRatings at α = .05.



b.

Some preliminary calculations are:

\[ \hat{p}_1 = \frac{x_1}{n} = \frac{487}{1000} = .487 \]

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:

\[ \hat{p}_1 \pm z_{.025} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n}} \Rightarrow .487 \pm 1.96 \sqrt{\frac{.487(.513)}{1000}} \Rightarrow .487 \pm .031 \Rightarrow (.456, .518) \]
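The same large-sample interval for a proportion appears in several exercises of this chapter, so a small helper is a convenient check. This is a sketch only: the function name `proportion_ci` is hypothetical, and z = 1.96 is the tabled value quoted above.

```python
import math


def proportion_ci(x, n, z):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


# Exercise 9.10b: 487 of 1,000 searches, 95% confidence (z.025 = 1.96).
p_hat, lower, upper = proportion_ci(487, 1000, 1.96)
print(f"p-hat = {p_hat:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```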

We are 95% confident that the percentage of all Internet searches that use the Google Search Engine is between 45.6% and 51.8%.

9.12

Some preliminary calculations are: E(n1) = np1,0 = 2,023(.45) = 910.35

E(n2) = np2,0 = 2,023 (.35) = 708.05

E(n3) = np3,0 = 2,023 (.15) = 303.45

E(n4) = np4,0 = 2,023 (.05) = 101.15

To determine if the percentages of all adults falling into the four response categories changed after the Enron scandal, we test:

H0: p1 = .45, p2 = .35, p3 = .15, and p4 = .05
Ha: At least one pi differs from its hypothesized value

The test statistic is

\[ \chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(1{,}173-910.35)^2}{910.35} + \frac{(587-708.05)^2}{708.05} + \frac{(182-303.45)^2}{303.45} + \frac{(81-101.15)^2}{101.15} = 149.096 \]

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = k – 1 = 4 – 1 = 3. From Table VII, Appendix B, χ².01 = 11.3449. The rejection region is χ² > 11.3449.

Since the observed value of the test statistic falls in the rejection region (χ² = 149.096 > 11.3449), H0 is rejected. There is sufficient evidence to indicate the percentages of all adults falling into the four response categories changed after the Enron scandal at α = .01.
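The one-way (goodness-of-fit) chi-square calculation used in Exercises 9.4 through 9.14 can be sketched as follows. The function name `chi_square_gof` is an illustrative choice, and the critical value is typed in from Table VII, not computed.

```python
def chi_square_gof(observed, null_probs):
    """Expected counts and chi-square statistic for a one-way table."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic


# Exercise 9.12: observed counts and hypothesized proportions.
observed = [1173, 587, 182, 81]
null_probs = [0.45, 0.35, 0.15, 0.05]
expected, chi2 = chi_square_gof(observed, null_probs)

critical = 11.3449   # chi-square .01 value with 3 df, from the table lookup
reject = chi2 > critical

print("Expected counts:", [round(e, 2) for e in expected])
print(f"Chi-square = {chi2:.3f}, reject H0: {reject}")
```

The test is valid only if every expected count is at least 5, which holds here since the smallest is 101.15.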




9.14

a.

Some preliminary calculations are:

E(n1) = np1,0 = 700(.09) = 63      E(n2) = np2,0 = 700(.02) = 14      E(n3) = np3,0 = 700(.02) = 14
E(n4) = np4,0 = 700(.04) = 28      E(n5) = np5,0 = 700(.12) = 84      E(n6) = np6,0 = 700(.02) = 14
E(n7) = np7,0 = 700(.03) = 21      E(n8) = np8,0 = 700(.02) = 14      E(n9) = np9,0 = 700(.09) = 63
E(n10) = np10,0 = 700(.01) = 7     E(n11) = np11,0 = 700(.01) = 7     E(n12) = np12,0 = 700(.04) = 28
E(n13) = np13,0 = 700(.02) = 14    E(n14) = np14,0 = 700(.06) = 42    E(n15) = np15,0 = 700(.08) = 56
E(n16) = np16,0 = 700(.02) = 14    E(n17) = np17,0 = 700(.01) = 7     E(n18) = np18,0 = 700(.06) = 42
E(n19) = np19,0 = 700(.04) = 28    E(n20) = np20,0 = 700(.06) = 42    E(n21) = np21,0 = 700(.04) = 28
E(n22) = np22,0 = 700(.02) = 14    E(n23) = np23,0 = 700(.02) = 14    E(n24) = np24,0 = 700(.01) = 7
E(n25) = np25,0 = 700(.02) = 14    E(n26) = np26,0 = 700(.01) = 7     E(n27) = np27,0 = 700(.02) = 14

\[ \chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(39-63)^2}{63} + \frac{(18-14)^2}{14} + \frac{(30-14)^2}{14} + \cdots + \frac{(34-14)^2}{14} = 360.48 \]

To determine if ScrabbleExpress “presents the player with unfair word selection opportunities” that are different from the Scrabble board game, we test:

H0: Proportions in ScrabbleExpress are the same as in the Scrabble board game
Ha: Proportions in ScrabbleExpress are different from those in the Scrabble board game

The test statistic is χ² = 360.47.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k – 1 = 27 – 1 = 26. From Table VII, Appendix B, χ².05 = 38.8852. The rejection region is χ² > 38.8852.

Since the observed value of the test statistic falls in the rejection region (χ² = 360.47 > 38.8852), H0 is rejected. There is sufficient evidence to indicate that ScrabbleExpress “presents the player with unfair word selection opportunities” that are different from the Scrabble board game at α = .05.

b.

The relative frequency of vowels for the board game is P(A) + P(E) + P(I) + P(O) + P(U) = .09 + .12 + .09 + .08 + .04 = .42

\[ \hat{p}_v = \frac{39 + 31 + 25 + 20 + 21}{700} = \frac{136}{700} = .194 \]

For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:

\[ \hat{p}_v \pm z_{.025} \sqrt{\frac{\hat{p}_v (1 - \hat{p}_v)}{n}} \Rightarrow .194 \pm 1.96 \sqrt{\frac{.194(.806)}{700}} \Rightarrow .194 \pm .029 \Rightarrow (.165, .223) \]

We are 95% confident that the true proportion of vowels in the ScrabbleExpress game is between .165 and .223. The true proportion from the board game is .42, which is much greater than the values in the interval.

9.16

a. df = (r − 1)(c − 1) = (5 − 1)(5 − 1) = 16. From Table VII, Appendix B, χ².05 = 26.2962. The rejection region is χ² > 26.2962.

b. df = (r − 1)(c − 1) = (3 − 1)(6 − 1) = 10. From Table VII, Appendix B, χ².10 = 15.9871. The rejection region is χ² > 15.9871.

c. df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2. From Table VII, Appendix B, χ².01 = 9.21034. The rejection region is χ² > 9.21034.

9.18

a. To convert the frequencies to percentages, divide the numbers in each column by the column total and multiply by 100. Also, divide the row totals by the overall total and multiply by 100. The column totals are 25, 64, and 78, while the row totals are 96 and 71. The overall sample size is 167. The table of percentages is:

                              Column
         1                    2                     3                     Totals
Row 1    9/25 ⋅ 100 = 36%     34/64 ⋅ 100 = 53.1%   53/78 ⋅ 100 = 67.9%   96/167 ⋅ 100 = 57.5%
Row 2    16/25 ⋅ 100 = 64%    30/64 ⋅ 100 = 46.9%   25/78 ⋅ 100 = 32.1%   71/167 ⋅ 100 = 42.5%

b. Using MINITAB, the graph is:

[Bar graph: Percent (vertical axis, 0 to 70) versus Column (1, 2, 3), with a reference line at 57.5%]

c. If the rows and columns are independent, the row percentages in each column would be close to the row total percentages. This pattern is not evident in the plot, implying the rows and columns are not independent.

9.20

a-b. To convert the frequencies to percentages, divide the numbers in each column by the column total and multiply by 100. Also, divide the row totals by the overall total and multiply by 100.

                                       B
             B1                     B2                     B3                     Totals
Row   A1     40/134 ⋅ 100 = 29.9%   72/163 ⋅ 100 = 44.2%   42/142 ⋅ 100 = 29.6%   154/439 ⋅ 100 = 35.1%
      A2     63/134 ⋅ 100 = 47.0%   53/163 ⋅ 100 = 32.5%   70/142 ⋅ 100 = 49.3%   186/439 ⋅ 100 = 42.4%
      A3     31/134 ⋅ 100 = 23.1%   38/163 ⋅ 100 = 23.3%   30/142 ⋅ 100 = 21.1%   99/439 ⋅ 100 = 22.6%

c. Using MINITAB, the graph is:

[Bar graph: Percent (vertical axis, 0 to 45) versus B (1, 2, 3), with a reference line at 35.1%]

The graph supports the conclusion that the rows and columns are not independent. If they were, then the height of all the bars would be essentially the same.

9.22

a.

The contingency table would be:

                      Itemize Deductions
Tax-motivation    Yes       No        Total
Yes               691       381       1,072
No                794       899       1,693
Total             1,485     1,280     2,765

b.

\[ E_{11} = \frac{R_1 C_1}{n} = \frac{1{,}072(1{,}485)}{2{,}765} = 575.7 \qquad E_{12} = \frac{R_1 C_2}{n} = \frac{1{,}072(1{,}280)}{2{,}765} = 496.3 \]

\[ E_{21} = \frac{R_2 C_1}{n} = \frac{1{,}693(1{,}485)}{2{,}765} = 909.3 \qquad E_{22} = \frac{R_2 C_2}{n} = \frac{1{,}693(1{,}280)}{2{,}765} = 783.7 \]

c. The test statistic is:

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{[691-575.7]^2}{575.7} + \frac{[381-496.3]^2}{496.3} + \frac{[794-909.3]^2}{909.3} + \frac{[899-783.7]^2}{783.7} = 81.46 \]
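The expected counts and test statistic for a two-way table follow the same pattern in every contingency-table exercise, so a general sketch may help. The function name `chi_square_independence` is hypothetical. Note that the code carries the expected counts at full precision, so it gives roughly 81.4, slightly below the 81.46 obtained above from expected counts rounded to one decimal.

```python
def chi_square_independence(table):
    """Expected counts, chi-square statistic, and df for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = (row total i)(column total j) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Exercise 9.22: rows are tax-motivation (Yes, No), columns are itemize (Yes, No).
table = [[691, 381], [794, 899]]
expected, chi2, df = chi_square_independence(table)

for row in expected:
    print([round(e, 1) for e in row])
print(f"Chi-square = {chi2:.2f} with {df} df")
```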

d.

To determine if tax-motivation and itemize-deduction are related for charitable givers, we test:

H0: Tax-motivation and itemize-deduction are independent
Ha: Tax-motivation and itemize-deduction are dependent

The test statistic is χ² = 81.46.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1. From Table VII, Appendix B, χ².05 = 3.84146. The rejection region is χ² > 3.84146.

Since the observed value of the test statistic falls in the rejection region (χ² = 81.46 > 3.84146), H0 is rejected. There is sufficient evidence to indicate that tax-motivation and itemize-deduction are related for charitable givers at α = .05.

e. To compute the bar graph, we first convert frequencies to percentages by dividing the numbers in each column by the column total and multiplying by 100%. Also, divide the row totals by the overall total and multiply by 100%.

                           Itemize Deductions
Tax-motivation    Yes                        No                         Total
Yes               691/1485 ⋅ 100% = 46.5%    381/1280 ⋅ 100% = 29.8%    1072/2765 ⋅ 100% = 38.8%
No                794/1485 ⋅ 100% = 53.5%    899/1280 ⋅ 100% = 70.2%    1693/2765 ⋅ 100% = 61.2%
Total             1,485                      1,280                      2,765

Using MINITAB, the bar graph is:

[Bar graph: Percent (vertical axis, 0 to 50) versus Itemize (Yes, No), with a reference line at 38.8%]

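9.24

a. Some preliminary calculations are:

\[ \hat{p}_{C1} = \frac{x_{C1}}{n_1} = \frac{175}{6{,}222} = .028 \qquad \hat{p}_{C2} = \frac{x_{C2}}{n_2} = \frac{236}{4{,}692} = .050 \qquad \hat{p}_{C3} = \frac{x_{C3}}{n_3} = \frac{319}{7{,}140} = .045 \]

\[ \hat{p}_{C4} = \frac{x_{C4}}{n_4} = \frac{231}{6{,}120} = .038 \qquad \hat{p}_{C5} = \frac{x_{C5}}{n_5} = \frac{480}{10{,}353} = .046 \qquad \hat{p}_{C6} = \frac{x_{C6}}{n_6} = \frac{187}{4{,}794} = .039 \]

The proportions range from .028 to .050. Since .050 is about twice as big as .028, there may be evidence to conclude some of the proportions are different.

b. Some preliminary calculations are:

\[ E_{11} = \frac{R_1 C_1}{n} = \frac{6{,}222(37{,}693)}{39{,}321} = 5{,}964.39 \qquad E_{12} = \frac{R_1 C_2}{n} = \frac{6{,}222(1{,}628)}{39{,}321} = 257.61 \]

\[ E_{21} = \frac{R_2 C_1}{n} = \frac{4{,}692(37{,}693)}{39{,}321} = 4{,}497.74 \qquad E_{22} = \frac{R_2 C_2}{n} = \frac{4{,}692(1{,}628)}{39{,}321} = 194.26 \]

\[ E_{31} = \frac{R_3 C_1}{n} = \frac{7{,}140(37{,}693)}{39{,}321} = 6{,}844.38 \qquad E_{32} = \frac{R_3 C_2}{n} = \frac{7{,}140(1{,}628)}{39{,}321} = 295.62 \]

\[ E_{41} = \frac{R_4 C_1}{n} = \frac{6{,}120(37{,}693)}{39{,}321} = 5{,}866.61 \qquad E_{42} = \frac{R_4 C_2}{n} = \frac{6{,}120(1{,}628)}{39{,}321} = 253.39 \]

\[ E_{51} = \frac{R_5 C_1}{n} = \frac{10{,}353(37{,}693)}{39{,}321} = 9{,}924.36 \qquad E_{52} = \frac{R_5 C_2}{n} = \frac{10{,}353(1{,}628)}{39{,}321} = 428.64 \]

\[ E_{61} = \frac{R_6 C_1}{n} = \frac{4{,}794(37{,}693)}{39{,}321} = 4{,}595.51 \qquad E_{62} = \frac{R_6 C_2}{n} = \frac{4{,}794(1{,}628)}{39{,}321} = 198.49 \]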

To determine if the proportions of censored measurements differ for the six tractor lines, we test:

H0: Tractor lines and Censored measurements are independent
Ha: Tractor lines and Censored measurements are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(6047-5964.39)^2}{5964.39} + \frac{(175-257.61)^2}{257.61} + \frac{(4456-4497.74)^2}{4497.74} + \cdots + \frac{(187-198.49)^2}{198.49} = 48.0978 \]

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = (r – 1)(c – 1) = (6 – 1)(2 − 1) = 5. From Table VII, Appendix B, χ².01 = 15.0863. The rejection region is χ² > 15.0863.

Since the observed value of the test statistic falls in the rejection region (χ² = 48.0978 > 15.0863), H0 is rejected. There is sufficient evidence to indicate that the proportions of censored measurements differ for the six tractor lines at α = .01.

c. Even though there are differences in the proportions of censored data among the 6 tractor lines, these proportions range from .028 to .050. In practice, there is very little difference between .028 and .050.

9.26


\[ E_{11} = \frac{R_1 C_1}{n} = \frac{95(118)}{262} = 42.8 \qquad E_{12} = \frac{R_1 C_2}{n} = \frac{95(144)}{262} = 52.2 \]

\[ E_{21} = \frac{R_2 C_1}{n} = \frac{69(118)}{262} = 31.1 \qquad E_{22} = \frac{R_2 C_2}{n} = \frac{69(144)}{262} = 37.9 \]

\[ E_{31} = \frac{R_3 C_1}{n} = \frac{42(118)}{262} = 18.9 \qquad E_{32} = \frac{R_3 C_2}{n} = \frac{42(144)}{262} = 23.1 \]

\[ E_{41} = \frac{R_4 C_1}{n} = \frac{56(118)}{262} = 25.2 \qquad E_{42} = \frac{R_4 C_2}{n} = \frac{56(144)}{262} = 30.8 \]

To determine whether a pig farmer’s education level has an impact on the size of the pig farm, we test:

H0: Pig farmer’s education level and size of pig farm are independent
Ha: Pig farmer’s education level and size of pig farm are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(42-42.8)^2}{42.8} + \frac{(53-52.2)^2}{52.2} + \frac{(27-31.1)^2}{31.1} + \frac{(42-37.9)^2}{37.9} + \frac{(22-18.9)^2}{18.9} + \frac{(20-23.1)^2}{23.1} + \frac{(27-25.2)^2}{25.2} + \frac{(29-30.8)^2}{30.8} = 2.17 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r – 1)(c – 1) = (4 – 1)(2 – 1) = 3. From Table VII, Appendix B, χ².05 = 7.81473. The rejection region is χ² > 7.81473.

Since the observed value of the test statistic does not fall in the rejection region (χ² = 2.17 ≯ 7.81473), H0 is not rejected. There is insufficient evidence to indicate that a pig farmer’s education level has an impact on the size of the pig farm at α = .05.

To compute the bar graph, we first convert frequencies to percentages by dividing the numbers in each row by the row total and multiplying by 100%. Also, divide the column totals by the overall total and multiply by 100%.

                              Education Level
Farm Size           No college                  College                     Total
<1,000 pigs         42/95 ⋅ 100% = 44.2%        53/95 ⋅ 100% = 55.8%        95
1,000-2,000 pigs    27/69 ⋅ 100% = 39.1%        42/69 ⋅ 100% = 60.9%        69
2,000-5,000 pigs    22/42 ⋅ 100% = 52.4%        20/42 ⋅ 100% = 47.6%        42
>5,000 pigs         27/56 ⋅ 100% = 48.2%        29/56 ⋅ 100% = 51.8%        56
Total               118/262 ⋅ 100% = 45.0%      144/262 ⋅ 100% = 55.0%      262

[Bar graph: Percent (vertical axis, 0 to 50) versus Farm Size (<1,000; 1,000-2,000; 2,000-5,000; >5,000), with a reference line at 45.0%]

9.28

a. Some preliminary calculations are:

E11 = R1C1/n = 53(35)/70 = 26.5
E12 = R1C2/n = 53(35)/70 = 26.5
E21 = R2C1/n = 17(35)/70 = 8.5
E22 = R2C2/n = 17(35)/70 = 8.5

To determine if the severity of the ethical issue influenced whether the issue was identified or not by the auditors, we test:

H0: Severity of ethical issue and identification are independent
Ha: Severity of ethical issue and identification are dependent

The test statistic is

χ² = ΣΣ [nij − Eij]²/Eij = (27 − 26.5)²/26.5 + (26 − 26.5)²/26.5 + (8 − 8.5)²/8.5 + (9 − 8.5)²/8.5 = .078

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1. From Table VII, Appendix B, χ².05 = 3.84146. The rejection region is χ² > 3.84146.

Since the observed value of the test statistic does not fall in the rejection region (χ² = .078 ≯ 3.84146), H0 is not rejected. There is insufficient evidence to indicate that the severity of the ethical issue influenced whether the issue was identified or not by the auditors at α = .05.

b. No. If there were a 0 in the bottom cell of the column, then the expected count for that cell would be less than 5, and one of the assumptions necessary for the test statistic to have a χ² distribution would not hold.
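For a 2 × 2 table, the statistic in part a can also be obtained from the algebraically equivalent shortcut χ² = n(ad − bc)²/(R1R2C1C2), where a, b, c, d are the four cell counts. This shortcut is a standard identity rather than the method used in this solution; the sketch below, with the hypothetical helper name `chi_square_2x2`, uses it only as a cross-check.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]].

    No continuity correction is applied.
    """
    n = a + b + c + d
    r1, r2 = a + b, c + d  # row totals
    c1, c2 = a + c, b + d  # column totals
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


# Part a counts: issue identified (27, 26), issue not identified (8, 9)
chi2_a = chi_square_2x2(27, 26, 8, 9)
print(round(chi2_a, 3))  # agrees with the hand value .078
```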




c. Suppose we change the numbers in the table to be as follows:

                                Severity of Ethical Issue
                                Moderate    Severe
Ethical Issue Identified        32          21
Ethical Issue Not Identified    3           14

Since the row and column totals are the same, the expected cell counts are the same as above.

The test statistic is

χ² = ΣΣ [nij − Eij]²/Eij = (32 − 26.5)²/26.5 + (21 − 26.5)²/26.5 + (3 − 8.5)²/8.5 + (14 − 8.5)²/8.5 = 9.401

Now the test statistic would fall in the rejection region.

9.30

a. The contingency table is:

            Flight Response
Altitude    Low    High    Totals
< 300       85     105     190
300-600     77     121     198
≥ 600       17     59      76
Totals      179    285     464

b. E11 = R1C1/n = 190(179)/464 = 73.297

E12 = R1C2/n = 190(285)/464 = 116.703

E21 = R2C1/n = 198(179)/464 = 76.384

E22 = R2C2/n = 198(285)/464 = 121.616

E31 = R3C1/n = 76(179)/464 = 29.319

E32 = R3C2/n = 76(285)/464 = 46.681

To determine if flight response of the geese depends on the altitude of the helicopter, we test:

H0: Flight response and Altitude of helicopter are independent Ha: Flight response and Altitude of helicopter are dependent



The test statistic is

χ² = ΣΣ [nij − Eij]²/Eij

= (85 − 73.297)²/73.297 + (105 − 116.703)²/116.703 + (77 − 76.384)²/76.384 + (121 − 121.616)²/121.616 + (17 − 29.319)²/29.319 + (59 − 46.681)²/46.681

= 11.477

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = (r – 1)(c – 1) = (3 – 1)(2 − 1) = 2. From Table VII, Appendix B, χ².01 = 9.21034. The rejection region is χ² > 9.21034.

Since the observed value of the test statistic falls in the rejection region (χ² = 11.477 > 9.21034), H0 is rejected. There is sufficient evidence to indicate that the flight response of the geese depends on the altitude of the helicopter at α = .01.

c. The contingency table is:

                    Flight Response
Lateral Distance    Low    High    Totals
< 1000              37     243     280
1000-2000           68     37      105
2000-3000           44     4       48
≥ 3000              30     1       31
Totals              179    285     464

d. E11 = R1C1/n = 280(179)/464 = 108.017

E12 = R1C2/n = 280(285)/464 = 171.983

E21 = R2C1/n = 105(179)/464 = 40.506

E22 = R2C2/n = 105(285)/464 = 64.494

E31 = R3C1/n = 48(179)/464 = 18.517

E32 = R3C2/n = 48(285)/464 = 29.483

E41 = R4C1/n = 31(179)/464 = 11.959

E42 = R4C2/n = 31(285)/464 = 19.041




To determine if flight response of the geese depends on the lateral distance of the helicopter, we test:

H0: Flight response and Lateral distance of the helicopter are independent
Ha: Flight response and Lateral distance of the helicopter are dependent

The test statistic is

χ² = ΣΣ [nij − Eij]²/Eij

= (37 − 108.017)²/108.017 + (243 − 171.983)²/171.983 + (68 − 40.506)²/40.506 + (37 − 64.494)²/64.494 + (44 − 18.517)²/18.517 + (4 − 29.483)²/29.483 + (30 − 11.959)²/11.959 + (1 − 19.041)²/19.041

= 207.814

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = (r – 1)(c – 1) = (4 – 1)(2 − 1) = 3. From Table VII, Appendix B, χ².01 = 11.3449. The rejection region is χ² > 11.3449.

Since the observed value of the test statistic falls in the rejection region (χ² = 207.814 > 11.3449), H0 is rejected. There is sufficient evidence to indicate that the flight response of the geese depends on the lateral distance of the helicopter at α = .01.

e. Using SAS, the contingency table for altitude by response with the column percents is:

Table of ALTGRP by RESPONSE

ALTGRP     RESPONSE

Frequency|
Percent  |
Row Pct  |
Col Pct  |LOW     |HIGH    |  Total
---------+--------+--------+
<300     |     85 |    105 |    190
         |  18.32 |  22.63 |  40.95
         |  44.74 |  55.26 |
         |  47.49 |  36.84 |
---------+--------+--------+
300-600  |     77 |    121 |    198
         |  16.59 |  26.08 |  42.67
         |  38.89 |  61.11 |
         |  43.02 |  42.46 |
---------+--------+--------+
600+     |     17 |     59 |     76
         |   3.66 |  12.72 |  16.38
         |  22.37 |  77.63 |
         |   9.50 |  20.70 |
---------+--------+--------+
Total         179      285      464
            38.58    61.42   100.00

Statistics for Table of ALTGRP by RESPONSE

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     2     11.4770    0.0032
Likelihood Ratio Chi-Square    2     12.1040    0.0024
Mantel-Haenszel Chi-Square     1     10.2104    0.0014
Phi Coefficient                       0.1573
Contingency Coefficient               0.1554
Cramer's V                            0.1573

Sample Size = 464

From the row percents, it appears that the lower the plane, the lower the response. For altitude <300 m, 55.26% of the geese had a high response. For altitude 300-600 m, 61.11% of the geese had a high response. For altitude 600+ m, 77.63% of the geese had a high response. Thus, instead of setting a minimum altitude for the planes, we need to set a maximum altitude. For this data, the lowest response is at an altitude of < 300 meters.

Using SAS, the contingency table for lateral distance by response with the column percents is:

The FREQ Procedure

Table of LATGRP by RESPONSE

LATGRP     RESPONSE

Frequency |
Percent   |
Row Pct   |
Col Pct   |LOW     |HIGH    |  Total
----------+--------+--------+
<1000     |     37 |    242 |    279
          |   7.99 |  52.27 |  60.26
          |  13.26 |  86.74 |
          |  20.67 |  85.21 |
----------+--------+--------+
1000-2000 |     68 |     37 |    105
          |  14.69 |   7.99 |  22.68
          |  64.76 |  35.24 |
          |  37.99 |  13.03 |
----------+--------+--------+
2000-3000 |     44 |      4 |     48
          |   9.50 |   0.86 |  10.37
          |  91.67 |   8.33 |
          |  24.58 |   1.41 |
----------+--------+--------+
3000+     |     30 |      1 |     31
          |   6.48 |   0.22 |   6.70
          |  96.77 |   3.23 |
          |  16.76 |   0.35 |
----------+--------+--------+
Total          179      284      463
             38.66    61.34   100.00

Frequency Missing = 1

Statistics for Table of LATGRP by RESPONSE

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     3    207.0800    <.0001
Likelihood Ratio Chi-Square    3    226.8291    <.0001
Mantel-Haenszel Chi-Square     1    189.2843    <.0001
Phi Coefficient                       0.6688
Contingency Coefficient               0.5559
Cramer's V                            0.6688

Effective Sample Size = 463
Frequency Missing = 1




From the row percents, it appears that the greater the lateral distance, the lower the response. For a lateral distance of 3000+ m, only 3.23% of the geese had a high response. Thus, the further away the plane is laterally, the lower the response. For this data, the lowest response is when the plane is further than 3000 meters. Thus the recommendation would be a maximum height of 300 m and a minimum lateral distance of 3000 m.

9.32

a. Some preliminary calculations are:

E11 = 50(50)/250 = 10       E21 = 100(50)/250 = 20       E31 = 100(50)/250 = 20
E12 = 50(90)/250 = 18       E22 = 100(90)/250 = 36       E32 = 100(90)/250 = 36
E13 = 50(110)/250 = 22      E23 = 100(110)/250 = 44      E33 = 100(110)/250 = 44

To determine if the rows and columns are dependent, we test:

H0: Rows and columns are independent
Ha: Rows and columns are dependent

The test statistic is

χ² = ΣΣ [nij − Eij]²/Eij = (20 − 10)²/10 + ··· + (30 − 44)²/44 = 54.14

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4. From Table VII, Appendix B, χ².05 = 9.48773. The rejection region is χ² > 9.48773.

Since the observed value of the test statistic falls in the rejection region (χ² = 54.14 > 9.48773), H0 is rejected. There is sufficient evidence to indicate a dependence between rows and columns at α = .05.

b. No, the analysis remains identical.

c. Yes, the assumptions on the sampling differ.


d. The percentages are in the table below.

          Column
Row       1                      2                       3                        Totals
1         20/50 × 100% = 40%     20/90 × 100% = 22.2%    10/110 × 100% = 9.1%     50/250 × 100% = 20%
2         10/50 × 100% = 20%     20/90 × 100% = 22.2%    70/110 × 100% = 63.6%    100/250 × 100% = 40%
3         20/50 × 100% = 40%     50/90 × 100% = 55.6%    30/110 × 100% = 27.3%    100/250 × 100% = 40%

e. [Bar chart: Percent (vertical axis, 0 to 40) by Column (1, 2, 3), with a reference line at 20%.]

The graph supports the decision in part a. In part a, we rejected the null hypothesis and concluded that the rows and columns were dependent. If they were independent, then we would expect the three bars to be the same height. In this graph, they are not the same height.

9.34

a.

If Bon Appetit readers do not have a preference for their least favorite vegetable, then the values of p1, p2, p3, and p4 should all be the same. Since there are four categories, then p1 = p2 = p3 = p4 = .25.

b. To determine if the Bon Appetit readers have a preference for at least one of the vegetables as “least favorite”, we test:

H0: p1 = p2 = p3 = p4 = .25
Ha: At least one pi ≠ .25




c. Some preliminary calculations:

n = Σni = 46 + 76 + 44 + 34 = 200

E(ni) = npi,0 = 200(.25) = 50, i = 1, 2, 3, or 4

The test statistic is

χ² = Σ [ni − E(ni)]²/E(ni) = (46 − 50)²/50 + (76 − 50)²/50 + (44 − 50)²/50 + (34 − 50)²/50 = 19.68

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table VII, Appendix B, χ².05 = 7.81473. The rejection region is χ² > 7.81473.

Since the observed value of the test statistic falls in the rejection region (χ² = 19.68 > 7.81473), H0 is rejected. There is sufficient evidence to indicate the Bon Appetit readers have a preference for at least one of the vegetables as “least favorite” at α = .05.

d. We must assume that:

1. The sample is random.
2. The sample size is sufficiently large (every cell has an expected value of at least 5).
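The one-way (goodness-of-fit) calculation in part c, together with the expected-count condition in part d, can be sketched in a few lines of Python. This is an illustrative check rather than part of the original solution; the function name `chi_square_gof` is an assumption.

```python
def chi_square_gof(observed, probs):
    """Return (expected counts, chi-square statistic, df) for a one-way table."""
    n = sum(observed)
    expected = [n * p for p in probs]  # E(n_i) = n * p_i0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2, len(observed) - 1


observed = [46, 76, 44, 34]  # counts for the four vegetables
expected, chi2, df = chi_square_gof(observed, [0.25] * 4)
large_enough = all(e >= 5 for e in expected)  # assumption from part d
print(expected, round(chi2, 2), df, large_enough)
```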

9.36

a. E11 = R1C1/n = 242(473)/549 = 208.499

E12 = R1C2/n = 242(76)/549 = 33.501

E21 = R2C1/n = 212(473)/549 = 182.652

E22 = R2C2/n = 212(76)/549 = 29.348

E31 = R3C1/n = 95(473)/549 = 81.849

E32 = R3C2/n = 95(76)/549 = 13.151

To determine if the likelihood for stress is dependent on an employee’s fitness level, we test:

H0: Stress and Fitness level are independent
Ha: Stress and Fitness level are dependent



The test statistic is

χ² = ΣΣ [nij − Eij]²/Eij

= (204 − 208.499)²/208.499 + (38 − 33.501)²/33.501 + (184 − 182.652)²/182.652 + (28 − 29.348)²/29.348 + (85 − 81.849)²/81.849 + (10 − 13.151)²/13.151

= 1.648

Since no α level was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r – 1)(c – 1) = (3 – 1)(2 − 1) = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is χ² > 5.99147.

Since the observed value of the test statistic does not fall in the rejection region (χ² = 1.648 ≯ 5.99147), H0 is not rejected. There is insufficient evidence to indicate that the likelihood for stress is dependent on an employee’s fitness level at α = .05.

b. A Type I error is rejecting H0 when H0 is true. In this case, it would be concluding that Stress and Fitness level are dependent when, in fact, they are independent. A Type II error is accepting H0 when H0 is false. In this case, it would be concluding that Stress and Fitness level are independent when, in fact, they are dependent.

c. To convert frequencies to percentages, divide the numbers in each row by the row total and multiply by 100. Also, divide the column totals by the overall total and multiply by 100.

                 Stress Level
Fitness Level    No Stress                 Stress
Poor             204/242 ⋅ 100 = 84.3%     38/242 ⋅ 100 = 15.7%
Average          184/212 ⋅ 100 = 86.8%     28/212 ⋅ 100 = 13.2%
Good             85/95 ⋅ 100 = 89.5%       10/95 ⋅ 100 = 10.5%
Total            473/549 ⋅ 100 = 86.2%     76/549 ⋅ 100 = 13.8%

Using MINITAB, the bar chart is:

[Bar chart "Chart of Percent with Stress": Percent (vertical axis, 0 to 16) by Fitness Level (Poor, Average, Good), with a reference line at 13.8%.]

9.38

a. E(n1) = np1,0 = 370(.30) = 111
E(n2) = np2,0 = 370(.20) = 74
E(n3) = np3,0 = 370(.20) = 74
E(n4) = np4,0 = 370(.10) = 37
E(n5) = np5,0 = 370(.10) = 37
E(n6) = np6,0 = 370(.10) = 37

b. The test statistic is

χ² = Σ [ni − E(ni)]²/E(ni) = (84 − 111)²/111 + (79 − 74)²/74 + (75 − 74)²/74 + (49 − 37)²/37 + (36 − 37)²/37 + (47 − 37)²/37 = 13.541

c. To determine if the true percentages of the colors produced differ from the manufacturer’s stated percentages, we test:

H0: p1 = .30, p2 = .20, p3 = .20, p4 = .10, p5 = .10, p6 = .10
Ha: At least one pi does not equal its hypothesized value.

The test statistic is χ² = 13.541.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 6 − 1 = 5. From Table VII, Appendix B, χ².05 = 11.0705. The rejection region is χ² > 11.0705.

Since the observed value of the test statistic falls in the rejection region (χ² = 13.541 > 11.0705), H0 is rejected. There is sufficient evidence to indicate the true percentages of the colors produced differ from the manufacturer’s stated percentages at α = .05.

9.40

a. The expected cell counts are:

E11 = R1C1/n = 20(11)/31 = 7.097
E12 = R1C2/n = 20(20)/31 = 12.903
E21 = R2C1/n = 11(11)/31 = 3.903
E22 = R2C2/n = 11(20)/31 = 7.097

b. One of the assumptions for the chi-square test is that the sample size, n, is large enough so that, for every cell, the expected cell count, Eij, will be equal to 5 or more. For cell (2, 1), the expected cell count is only 3.903.
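The check in part b can be automated for any table once the row and column totals are known. The sketch below is an illustration, not part of the original solution; the function name `small_expected_cells` and its `minimum` parameter are assumptions introduced here.

```python
def small_expected_cells(row_totals, col_totals, minimum=5):
    """List ((row, column), expected count) for cells with expected count < minimum."""
    n = sum(row_totals)
    cells = []
    for i, r in enumerate(row_totals, start=1):
        for j, c in enumerate(col_totals, start=1):
            e = r * c / n  # E_ij = R_i * C_j / n
            if e < minimum:
                cells.append(((i, j), round(e, 3)))
    return cells


# Row totals (20, 11) and column totals (11, 20) from part a
print(small_expected_cells([20, 11], [11, 20]))
```

An empty list would mean that every expected count meets the requirement.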

c. To determine if insider ownership and size are independent, we test:

H0: Insider ownership and size are independent
Ha: Insider ownership and size are dependent

The p-value is .0043. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that insider ownership and size are dependent for α > .0043.

d. First, we find the percentages by dividing each cell count by the column total and multiplying by 100. The row totals are divided by the total sample size. The percentages are found in the table:

                     Size
Insider Ownership    Small                   Large                 Totals
Low                  3/11 × 100% = 27.3%     17/20 × 100% = 85%    20/31 × 100% = 64.5%
High                 8/11 × 100% = 72.7%     3/20 × 100% = 15%     11/31 × 100% = 35.5%

Using MINITAB, the bar chart is:

[Bar chart: Percent (vertical axis, 0 to 90) by Size (Small, Large), with a reference line at 64.5%.]

Since the bars are not the same height, there is evidence that insider ownership and size are dependent. This is what we found in part c.

9.42

E11 = R1C1/n = 100(171)/500 = 34.2      E12 = R1C2/n = 100(207)/500 = 41.4
E13 = R1C3/n = 100(80)/500 = 16.0       E14 = R1C4/n = 100(42)/500 = 8.4

E21 = R2C1/n = 175(171)/500 = 59.9      E22 = R2C2/n = 175(207)/500 = 72.5
E23 = R2C3/n = 175(80)/500 = 28.0       E24 = R2C4/n = 175(42)/500 = 14.7

E31 = R3C1/n = 145(171)/500 = 49.6      E32 = R3C2/n = 145(207)/500 = 60.0
E33 = R3C3/n = 145(80)/500 = 23.2       E34 = R3C4/n = 145(42)/500 = 12.2

E41 = R4C1/n = 80(171)/500 = 27.4       E42 = R4C2/n = 80(207)/500 = 33.1
E43 = R4C3/n = 80(80)/500 = 12.8        E44 = R4C4/n = 80(42)/500 = 6.7

To determine if there is a dependence between a son's choice of occupation and his father's occupation, we test:

H0: Son's choice of occupation and his father's occupation are independent
Ha: Son's choice of occupation and his father's occupation are dependent

The test statistic is

χ² = ΣΣ [nij − Eij]²/Eij

= (55 − 34.2)²/34.2 + (38 − 41.4)²/41.4 + (7 − 16.0)²/16.0 + (0 − 8.4)²/8.4
+ (79 − 59.9)²/59.9 + (71 − 72.5)²/72.5 + (25 − 28)²/28 + (0 − 14.7)²/14.7
+ (22 − 49.6)²/49.6 + (75 − 60)²/60 + (38 − 23.2)²/23.2 + (10 − 12.2)²/12.2
+ (15 − 27.4)²/27.4 + (23 − 33.1)²/33.1 + (10 − 12.8)²/12.8 + (32 − 6.7)²/6.7

= 181.32

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r – 1)(c – 1) = (4 – 1)(4 – 1) = 9. From Table VII, Appendix B, χ².05 = 16.9190. The rejection region is χ² > 16.9190.

Since the observed value of the test statistic falls in the rejection region (χ² = 181.32 > 16.9190), H0 is rejected. There is sufficient evidence to indicate a dependence between a son’s choice of occupation and his father’s occupation at α = .05.

9.44

a. Some preliminary calculations are:

E11 = R1C1/n = 57(52)/86 = 34.465
E12 = R1C2/n = 57(34)/86 = 22.535
E21 = R2C1/n = 29(52)/86 = 17.535
E22 = R2C2/n = 29(34)/86 = 11.465

To determine if manufacturing firms were more likely to be involved with TQM than service firms, we test:

H0: Type of firm and TQM are independent
Ha: Type of firm and TQM are dependent

The test statistic is

χ² = ΣΣ [nij − Eij]²/Eij = (34 − 34.465)²/34.465 + (23 − 22.535)²/22.535 + (18 − 17.535)²/17.535 + (11 − 11.465)²/11.465 = .047

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1. From Table VII, Appendix B, χ².05 = 3.84146. The rejection region is χ² > 3.84146.




Since the observed value of the test statistic does not fall in the rejection region (χ² = .047 ≯ 3.84146), H0 is not rejected. There is insufficient evidence to indicate that the type of firm and TQM are dependent at α = .05. There is no evidence to indicate that manufacturing firms are more likely to be involved with TQM than service firms.

b. The p-value is P(χ² > .047). From Table VII, Appendix B, with df = 1, .10 < P(χ² > .047) < .90.
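The table only brackets this p-value, but with df = 1 it can be computed directly, because a χ² variable with 1 degree of freedom is the square of a standard normal variable, so P(χ² > x) = P(|Z| > √x). The sketch below uses that identity with Python's standard `math.erfc`; it is an illustrative cross-check, not the table-based method of the solution, and the function name is an assumption.

```python
import math


def chi_square_upper_tail_df1(x):
    """P(chi-square with 1 df > x), using chi-square(1) = Z**2 for standard normal Z."""
    # P(|Z| > sqrt(x)) = erfc(sqrt(x) / sqrt(2)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2))


p_value = chi_square_upper_tail_df1(0.047)
print(round(p_value, 3))
```

The result falls inside the interval (.10, .90) read from the table. The identity holds only for df = 1.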

c. We must assume:

1. The n observed counts are a random sample from the population of interest. We may then consider this to be a multinomial experiment with r × c = 2 × 2 = 4 possible outcomes.

2. The sample size, n, will be large enough so that, for every cell, the expected cell count, E(nij), will be equal to 5 or more.

9.46

a. Some preliminary calculations are:

E(n1) = np1,0 = 85(.26) = 22.1
E(n2) = np2,0 = 85(.30) = 25.5
E(n3) = np3,0 = 85(.11) = 9.35
E(n4) = np4,0 = 85(.14) = 11.9
E(n5) = np5,0 = 85(.19) = 16.15

To determine if probabilities differ from the hypothesized values, we test:

H0: p1 = .26, p2 = .30, p3 = .11, p4 = .14, p5 = .19
Ha: At least one of the probabilities differs from its hypothesized value.

The test statistic is

χ² = Σ [ni − E(ni)]²/E(ni) = (32 − 22.1)²/22.1 + (26 − 25.5)²/25.5 + (15 − 9.35)²/9.35 + (6 − 11.9)²/11.9 + (6 − 16.15)²/16.15 = 17.16

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 5 − 1 = 4. From Table VII, Appendix B, χ².05 = 9.48773. The rejection region is χ² > 9.48773.

Since the observed value of the test statistic falls in the rejection region (χ² = 17.16 > 9.48773), reject H0. There is sufficient evidence to indicate the probabilities differ from their hypothesized values at α = .05.



b. p̂1 = n1/n = 32/85 = .37647

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:

p̂1 ± z.025 √(p̂1(1 − p̂1)/n) ⇒ .376 ± 1.96 √(.37647(1 − .37647)/85) ⇒ .376 ± .103 ⇒ (.273, .479)
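The interval above can be reproduced with a short Python sketch. It is an illustration of the large-sample formula p̂ ± z√(p̂(1 − p̂)/n) used in part b; the function name `proportion_ci` and its default `z=1.96` are assumptions made for this example.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 32 of the 85 patients fall in the first category
low, high = proportion_ci(32, 85)
print(round(low, 3), round(high, 3))
```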

c. We are 95% confident that between 27.3% and 47.9% of the Avonex MS patients are exacerbation-free during a two-year period. Since this interval is completely above the percentage of placebo patients (26%), it seems that the Avonex patients are more likely to have no exacerbations than placebo patients.

9.48

a. The contingency table is:

Shift     Defectives    Non-Defectives    Totals
1         25            175               200
2         35            165               200
3         80            120               200
Totals    140           460               600

Some preliminary calculations are:

E11 = E21 = E31 = R1C1/n = 200(140)/600 = 46.667
E12 = E22 = E32 = R1C2/n = 200(460)/600 = 153.333

To determine if quality of the filters is related to shift, we test:

H0: Quality of filters and shift are independent
Ha: Quality of filters and shift are dependent

The test statistic is

χ² = ΣΣ [nij − Eij]²/Eij

= (25 − 46.667)²/46.667 + (175 − 153.333)²/153.333 + (35 − 46.667)²/46.667 + (165 − 153.333)²/153.333 + (80 − 46.667)²/46.667 + (120 − 153.333)²/153.333

= 47.98

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is χ² > 5.99147.

Since the observed value of the test statistic falls in the rejection region (χ² = 47.98 > 5.99147), H0 is rejected. There is sufficient evidence to indicate quality of filters and shift are related at α = .05.

b. The form of the confidence interval for p1 is:

p̂1 ± zα/2 √(p̂1q̂1/n1) where p̂1 = 25/200 = .125

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:

.125 ± 1.96 √(.125(.875)/200) ⇒ .125 ± .046 ⇒ (.079, .171)

9.50

Using SAS, the output is:

The FREQ Procedure

Table of CANDIDATE by TIME

CANDIDATE     TIME

Frequency|
Col Pct  |       1|       2|       3|       4|       5|       6|  Total
---------+--------+--------+--------+--------+--------+--------+
SMITH    |    208 |    208 |    451 |    392 |    351 |    410 |   2020
         |  52.53 |  55.32 |  55.34 |  55.92 |  56.16 |  55.33 |
---------+--------+--------+--------+--------+--------+--------+
COPPIN   |     55 |     51 |    109 |     98 |     88 |    104 |    505
         |  13.89 |  13.56 |  13.37 |  13.98 |  14.08 |  14.04 |
---------+--------+--------+--------+--------+--------+--------+
MONTES   |    133 |    117 |    255 |    211 |    186 |    227 |   1129
         |  33.59 |  31.12 |  31.29 |  30.10 |  29.76 |  30.63 |
---------+--------+--------+--------+--------+--------+--------+
Total         396      376      815      701      625      741     3654

Statistics for Table of CANDIDATE by TIME

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                    10      2.2839    0.9937
Likelihood Ratio Chi-Square   10      2.2722    0.9938
Mantel-Haenszel Chi-Square     1      0.9851    0.3209
Phi Coefficient                       0.0250
Contingency Coefficient               0.0250
Cramer's V                            0.0177

Sample Size = 3654

To determine if candidates received votes independent of time period, we test:

H0: Voting and Time period are independent
Ha: Voting and Time period are dependent

The test statistic is χ² = 2.2839.



Since no value of α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r – 1)(c – 1) = (3 – 1)(6 − 1) = 10. From Table VII, Appendix B, χ².05 = 18.3070. The rejection region is χ² > 18.3070.

Since the observed value of the test statistic does not fall in the rejection region (χ² = 2.2839 ≯ 18.3070), H0 is not rejected. There is insufficient evidence to indicate Voting and Time period are dependent at α = .05.

Thus, we can conclude that voting and time period are independent. This means that regardless of time period, the percentage of votes received by each candidate is the same. In the table created by SAS, the bottom number in each cell is the column percent. This is the percent of votes received by the candidate in each time period. An inspection of these percents indicates that candidate Smith received approximately 55.3% of the votes each time period, candidate Coppin received approximately 13.8% of the vote, and candidate Montes received approximately 30.9% of the vote. All of this indicates that the election was rigged.
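The column percents discussed above can be recomputed directly from the vote counts in the SAS table. The following Python sketch is only an illustration; the variable names are assumptions, and the counts are copied from the output shown in this solution.

```python
# Votes per time period (1 through 6) for each candidate, from the SAS table
votes = {
    "SMITH": [208, 208, 451, 392, 351, 410],
    "COPPIN": [55, 51, 109, 98, 88, 104],
    "MONTES": [133, 117, 255, 211, 186, 227],
}

# Total votes cast in each time period (the column totals)
period_totals = [sum(col) for col in zip(*votes.values())]

# Column percent = candidate's votes in a period / total votes in that period
column_percents = {
    name: [round(100 * v / t, 2) for v, t in zip(counts, period_totals)]
    for name, counts in votes.items()
}

for name, pcts in column_percents.items():
    print(name, pcts)
```

Printing the percents side by side makes it easy to see how little each candidate's share changes from one time period to the next.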




Discrimination in the Work Place

(To accompany Chapters 8–9)

Part I:

If we assume that those selected for termination were randomly selected from all workers, then the Chi-squared test for independence is appropriate. Using SAS, the output is:

TABLE OF RACE BY DECISION

RACE       DECISION

Frequency|
Percent  |
Row Pct  |
Col Pct  |RETAINED|LAIDOFF |  Total
---------+--------+--------+
WHITE    |   1051 |     31 |   1082
         |  86.50 |   2.55 |  89.05
         |  97.13 |   2.87 |
         |  90.29 |  60.78 |
---------+--------+--------+
BLACK    |    113 |     20 |    133
         |   9.30 |   1.65 |  10.95
         |  84.96 |  15.04 |
         |   9.71 |  39.22 |
---------+--------+--------+
Total        1164       51     1215
            95.80     4.20   100.00

STATISTICS FOR TABLE OF RACE BY DECISION

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      43.641     0.001
Likelihood Ratio Chi-Square    1      29.260     0.001
Continuity Adj. Chi-Square     1      40.666     0.001
Mantel-Haenszel Chi-Square     1      43.605     0.001
Fisher's Exact Test (Left)                       1.000
                    (Right)                   6.43E-08
                    (2-Tail)                  6.43E-08
Phi Coefficient                        0.190
Contingency Coefficient                0.186
Cramer's V                             0.190

Sample Size = 1215




To determine if the variables Race and Decision are related, we test:

H0: Race and Decision are independent
Ha: Race and Decision are dependent

The test statistic is χ² = 43.641. The p-value is p = .001. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate that Race and Decision are related. From the table, only 2.9% of whites were terminated. However, 15.0% of blacks were terminated. There is a significant difference in these percentages. This supports the plaintiff's position.

However, this is all based on the assumption that those selected to be laid off were randomly selected. If the company made its decision based on performance, as it claims, then those selected to be terminated were not randomly selected and thus, the test of hypothesis is invalid.

Part II:

If the workers to be terminated were truly selected at random, then the Chi-square test for independence is appropriate. Using SAS, the output is:

TABLE OF STATUS BY AGE1

STATUS       AGE1

Frequency  |
Percent    |
Row Pct    |
Col Pct    |UNDER 40|40 +    |  Total
-----------+--------+--------+
ACTIVE     |     18 |     13 |     31
           |  32.73 |  23.64 |  56.36
           |  58.06 |  41.94 |
           |  72.00 |  43.33 |
-----------+--------+--------+
TERMINATED |      7 |     17 |     24
           |  12.73 |  30.91 |  43.64
           |  29.17 |  70.83 |
           |  28.00 |  56.67 |
-----------+--------+--------+
Total            25       30       55
              45.45    54.55   100.00

STATISTICS FOR TABLE OF STATUS BY AGE1

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1       4.556     0.033
Likelihood Ratio Chi-Square    1       4.651     0.031
Continuity Adj. Chi-Square     1       3.465     0.063
Mantel-Haenszel Chi-Square     1       4.473     0.034
Fisher's Exact Test (Left)                       0.993
                    (Right)                      0.031
                    (2-Tail)                     0.055
Phi Coefficient                        0.288
Contingency Coefficient                0.277
Cramer's V                             0.288

Sample Size = 55

To determine if the variables Status and Age are related, we test:

H0: Age and Status are independent
Ha: Age and Status are dependent

The test statistic is χ² = 4.556. The p-value is p = .033. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate that Age and Status are related. From the table, 56.7% of those aged 40 and over were terminated. However, only 28.0% of those aged under 40 were terminated. There is a significant difference in these percentages. This supports the plaintiff's position.

We can also look at some other revealing statistics. If we compare the mean wages of those terminated against those who remained active, there is a significant difference. The mean wages of those terminated is significantly higher than the mean wages of those who remained active. Also, the mean age of those who remained active (33.0) is significantly less than the mean age of those who were terminated (44.08). Also, the mean wage of those under 40 ($26,452.20) was significantly less than the mean wage of those 40 or over ($39,044.17). All of this implies that those who were terminated were those who were older with the higher salaries. It appears that the company wanted to not only reduce the work force, but also reduce its mean expenses for those remaining on the workforce. I can find nothing to support the defendant's position.

TTEST PROCEDURE

Variable: WAGES

STATUS        N         Mean      Std Dev    Std Error   Variances          T     DF   Prob>|T|
-----------------------------------------------------------------------------------------------
ACTIVE       31     28772.26    6302.5283    1131.9675   Unequal      -6.8124   52.9     0.0001
TERMINATED   24     39195.42    5042.9673    1029.3914   Equal        -6.6214   53.0     0.0000*

For H0: Variances are equal, F' = 1.56 DF = (30,23) Prob>F' = 0.2738

************************************************************************

Variable: AGE

STATUS        N         Mean      Std Dev    Std Error   Variances          T     DF   Prob>|T|
-----------------------------------------------------------------------------------------------
ACTIVE       31      33.0000       8.0000       1.4368   Unequal      -5.7661   53.0     0.0001*
TERMINATED   24      44.0833       6.2549       1.2768   Equal        -5.5886   53.0     0.0000

For H0: Variances are equal, F' = 1.64 DF = (30,23) Prob>F' = 0.2273

************************************************************************

Variable: WAGES

AGE1          N         Mean      Std Dev    Std Error   Variances          T     DF   Prob>|T|
-----------------------------------------------------------------------------------------------
UNDER 40     25   26452.2000    4739.5548     947.9110   Unequal     -10.1970   49.3     0.0001
40 +         30   39044.1667    4334.8764     791.4365   Equal       -10.2814   53.0     0.0000*

For H0: Variances are equal, F' = 1.20 DF = (24,29) Prob>F' = 0.6409
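The "Unequal" t statistics in the output can be reproduced from the summary statistics alone, using t = (ȳ1 − ȳ2)/√(s1²/n1 + s2²/n2). The Python sketch below is an illustrative check with a hypothetical function name; it does not compute the degrees of freedom or p-values that SAS reports.

```python
import math


def unequal_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic that does not assume equal variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# WAGES by STATUS: ACTIVE versus TERMINATED
t_wages = unequal_variance_t(28772.26, 6302.5283, 31, 39195.42, 5042.9673, 24)

# AGE by STATUS: ACTIVE versus TERMINATED
t_age = unequal_variance_t(33.0, 8.0, 31, 44.0833, 6.2549, 24)

print(round(t_wages, 4), round(t_age, 4))
```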





Chapter 10

10.2

For all problems below, we use:

Slope = "rise"/"run" = (y2 − y1)/(x2 − x1)

a. Slope = (5 − 1)/(5 − 1) = 1 = β1

If y = β0 + β1x, then β0 = y − β1x. Since a given point is (1, 1) and β1 = 1, the y-intercept is β0 = 1 − 1(1) = 0.

b. Slope = (0 − 3)/(3 − 0) = −1 = β1

If y = β0 + β1x, then β0 = y − β1x. Since (0, 3) is given, the y-intercept is β0 = 3 − (−1)(0) = 3.

c. Slope = (2 − 1)/(4 − (−1)) = 1/5 = .2 = β1

If y = β0 + β1x, then β0 = y − β1x. Since a given point is (−1, 1) and β1 = 1/5, the y-intercept is β0 = 1 − .2(−1) = 1.2.

d. Slope = (6 − (−3))/(2 − (−6)) = 9/8 = 1.125 = β1

If y = β0 + β1x, then β0 = y − β1x. Since a given point is (−6, −3) and β1 = 9/8, the y-intercept is β0 = −3 − 1.125(−6) = 3.75.

10.4

a. The equation for a straight line (deterministic) is y = β0 + β1x. If the line passes through (1, 1), then 1 = β0 + β1(1) ⇒ 1 = β0 + β1. Likewise, through (5, 5), 5 = β0 + β1(5).

Solving these two equations:

    1 = β0 + β1
  −(5 = β0 + β1(5))
  ────────
   −4 = −4β1 ⇒ β1 = 1

Substituting β1 = 1 into the first equation, we get 1 = β0 + 1 ⇒ β0 = 0.

The equation is y = 0 + 1x or y = x.

b. The equation for a straight line is y = β0 + β1x. If the line passes through (0, 3), then 3 = β0 + β1(0), which implies β0 = 3. Likewise, through the point (3, 0), then 0 = β0 + 3β1 or −β0 = 3β1. Substituting β0 = 3, we get −3 = 3β1 or β1 = −1. Therefore, the line passing through (0, 3) and (3, 0) is y = 3 − x.

c. The equation for a straight line is y = β0 + β1x. If the line passes through (−1, 1), then 1 = β0 + β1(−1). Likewise, through the point (4, 2), 2 = β0 + β1(4). Solving these two equations:

    2 = β0 + β1(4)
  −(1 = β0 − β1(1))
  ────────
    1 = 5β1 or β1 = 1/5

Solving for β0, 1 = β0 + (1/5)(−1) or 1 = β0 − 1/5 or β0 = 1 + 1/5 = 6/5.

The equation, with β0 = 6/5 and β1 = 1/5, is y = 6/5 + (1/5)x.

d. The equation for a straight line is y = β0 + β1x. If the line passes through (−6, −3), then −3 = β0 − β1(6). Likewise, through the point (2, 6), 6 = β0 + β1(2). Solving these equations simultaneously:

      6 = β0 + β1(2)
  −[(−3) = β0 − β1(6)]
  ─────────
      9 = 8β1 or β1 = 9/8

Solving for β0, 6 = β0 + 2(9/8) ⇒ 6 − 18/8 = β0 or β0 = 30/8.

Therefore, y = 30/8 + (9/8)x.
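The four lines found in this exercise can be verified with exact rational arithmetic. The sketch below applies the slope formula (y2 − y1)/(x2 − x1) and then β0 = y − β1x; it is an illustration only, and the function name `line_through` is an assumption.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (intercept, slope) of the line through two points with different x."""
    (x1, y1), (x2, y2) = p1, p2
    slope = Fraction(y2 - y1, x2 - x1)  # "rise" over "run"
    intercept = y1 - slope * x1         # beta0 = y - beta1 * x
    return intercept, slope


point_pairs = [((1, 1), (5, 5)), ((0, 3), (3, 0)), ((-1, 1), (4, 2)), ((-6, -3), (2, 6))]
for pair in point_pairs:
    b0, b1 = line_through(*pair)
    print(pair, "beta0 =", b0, "beta1 =", b1)
```

Note that `Fraction` reduces to lowest terms, so the intercept 30/8 in part d is printed as 15/4.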



10.6

a. y = 4 + x. The slope is β1 = 1. The intercept is β0 = 4.

b. y = 5 − 2x. The slope is β1 = −2. The intercept is β0 = 5.

c. y = −4 + 3x. The slope is β1 = 3. The intercept is β0 = −4.

d. y = −2x. The slope is β1 = −2. The intercept is β0 = 0.

e. y = x. The slope is β1 = 1. The intercept is β0 = 0.

f. y = .5 + 1.5x. The slope is β1 = 1.5. The intercept is β0 = .5.

10.8

The "line of means" is the deterministic component in a probabilistic model.

10.10

a.

    xi    yi    xi²        xi yi
    7     2     7² = 49    7(2) = 14
    4     4     4² = 16    4(4) = 16
    6     2     6² = 36    6(2) = 12
    2     5     2² = 4     2(5) = 10
    1     7     1² = 1     1(7) = 7
    1     6     1² = 1     1(6) = 6
    3     5     3² = 9     3(5) = 15

Totals:

∑xi = 7 + 4 + 6 + 2 + 1 + 1 + 3 = 24
∑yi = 2 + 4 + 2 + 5 + 7 + 6 + 5 = 31
∑xi² = 49 + 16 + 36 + 4 + 1 + 1 + 9 = 116
∑xi yi = 14 + 16 + 12 + 10 + 7 + 6 + 15 = 80

b.

SSxy = ∑xi yi − (∑xi)(∑yi)/n = 80 − (24)(31)/7 = 80 − 106.2857143 = −26.2857143

c.

SSxx = ∑xi² − (∑xi)²/n = 116 − (24)²/7 = 116 − 82.28571429 = 33.71428571

d.

βˆ1 = SSxy/SSxx = −26.2857143/33.71428571 = −.779661017 ≈ −.7797

e.

x̄ = ∑xi/n = 24/7 = 3.428571429    ȳ = ∑yi/n = 31/7 = 4.428571429

f.

βˆ0 = ȳ − βˆ1x̄ = 4.428571429 − (−.779661017)(3.428571429) = 4.428571429 − (−2.673123487) = 7.101694916 ≈ 7.102

g.

The least squares line is yˆ = βˆ0 + βˆ1x = 7.102 − .7797x.
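The shortcut formulas for SSxy, SSxx, βˆ1, and βˆ0 used in this exercise can be sketched in a few lines of Python. This is an illustrative check under the assumption that the raw (x, y) pairs are available; the function name `least_squares` is hypothetical and does not come from the text.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line using the shortcut sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # SSxy = sum(xy) - (sum x)(sum y)/n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    # SSxx = sum(x^2) - (sum x)^2/n
    ss_xx = sum(a * a for a in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx                  # slope estimate
    b0 = sum_y / n - b1 * sum_x / n     # intercept: y-bar minus b1 times x-bar
    return b0, b1


# Data from the table in part a.
x = [7, 4, 6, 2, 1, 1, 3]
y = [2, 4, 2, 5, 7, 6, 5]
b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:.3f} + ({b1:.4f})x")
```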



10.12

a.

b.

Choose y = 1 + x since it best describes the relation of x and y.

c.

For yˆ = 1 + x:

    x      y    yˆ = 1 + x    y − yˆ
    .5     2    1.5           2 − 1.5 = .5
    1.0    1    2.0           1 − 2.0 = −1.0
    1.5    3    2.5           3 − 2.5 = .5
                              Sum of errors = 0

For yˆ = 3 − x:

    x      y    yˆ = 3 − x        y − yˆ
    .5     2    3 − .5 = 2.5      2 − 2.5 = −.5
    1.0    1    3 − 1.0 = 2.0     1 − 2.0 = −1.0
    1.5    3    3 − 1.5 = 1.5     3 − 1.5 = 1.5
                                  Sum of errors = 0

d.

SSE = ∑(y − yˆ)²

SSE for 1st model: y = 1 + x, SSE = (.5)² + (−1)² + (.5)² = 1.5
SSE for 2nd model: y = 3 − x, SSE = (−.5)² + (−1)² + (1.5)² = 3.5

The best fitting straight line is the one with the smallest sum of squared errors (SSE). The model y = 1 + x has the smaller SSE, and therefore it verifies the visual check in part a.

e.


∑x = 3    ∑y = 6    ∑xy = 6.5    ∑x² = 3.5

SSxy = ∑xy − (∑x)(∑y)/n = 6.5 − (3)(6)/3 = .5

SSxx = ∑x² − (∑x)²/n = 3.5 − (3)²/3 = .5

βˆ1 = SSxy/SSxx = .5/.5 = 1;    x̄ = ∑x/n = 3/3 = 1;    ȳ = ∑y/n = 6/3 = 2

βˆ0 = ȳ − βˆ1x̄ = 2 − 1(1) = 1 ⇒ yˆ = βˆ0 + βˆ1x = 1 + x

The least squares line is the same as the second line given.

10.14

a.

The straight-line model would be: y = β0 + β1x + ε

b.

The least squares line is: yˆ = −2,298.4 + 11,598.9x

c.

Since the range of observed values for the number of carats (x) does not include 0, the y-intercept has no meaning.

d.

The slope of the line is β1. In terms of this problem, β1 is the change in the mean asking price for each additional carat. This interpretation is meaningful for values of x within the observed range. The observed range of x is .18 to 1.10.

e.

yˆ = −2,298.4 + 11,598.9(.52) = 3,733.028. The predicted asking price for a .52-carat diamond is $3,733.03.

10.16

a.

∑x = 62    ∑y = 97.8    ∑xy = 1,087.78
∑x² = 720.52    ∑y² = 1,710.2

x̄ = ∑x/n = 62/6 = 10.33333333    ȳ = ∑y/n = 97.8/6 = 16.3

SSxy = ∑xy − (∑x)(∑y)/n = 1,087.78 − 62(97.8)/6 = 1,087.78 − 1,010.6 = 77.18

SSxx = ∑x² − (∑x)²/n = 720.52 − (62)²/6 = 720.52 − 640.667 = 79.8533333

βˆ1 = SSxy/SSxx = 77.18/79.8533333 = 0.966521957 ≈ 0.9665

βˆ0 = ȳ − βˆ1x̄ = 16.3 − 0.966521957(10.33333333) = 6.312606448 ≈ 6.3126

yˆ = 6.3126 + .9665x

b.


Since x = 0 is not in the observed range of the mean pore diameters, the y-intercept has no meaning.

c.

For each unit increase in mean pore diameter, the mean value of porosity is estimated to increase by .9665.

d.

For x = 10, yˆ = 6.3126 + .9665(10) = 15.9776

10.18

a.


∑x = 6167    ∑y = 135.8    ∑xy = 34,764.5
∑x² = 1,641,115    n = 24

SSxy = ∑xy − (∑x)(∑y)/n = 34,764.5 − (6167)(135.8)/24 = −130.44167

SSxx = ∑x² − (∑x)²/n = 1,641,115 − (6167)²/24 = 56,452.95833

βˆ1 = SSxy/SSxx = −130.44167/56,452.958 = −.002310625 ≈ −.0023

βˆ0 = ȳ − βˆ1x̄ = 135.8/24 − (−.002310625)(6167/24) = 6.252067683 ≈ 6.25

The least squares line is yˆ = 6.25 − .0023x

b.

βˆ0 = 6.25. Since x = 0 is not in the observed range, βˆ0 has no interpretation other than being the y-intercept.

βˆ1 = −.0023. For each additional increase of 1 part per million of pectin, the mean sweetness index is estimated to decrease by .0023.

c.

yˆ = 6.25 − .0023(300) = 5.56

10.20

a.

A proposed model is E(y) = βo + β1x.

b.


∑x = 1,292.7    ∑y = 3,781.1    ∑xy = 218,291.63
∑x² = 88,668.43    ∑y² = 651,612.45

x̄ = ∑x/n = 1,292.7/22 = 58.75909091    ȳ = ∑y/n = 3,781.1/22 = 171.8681818

SSxy = ∑xy − (∑x)(∑y)/n = 218,291.63 − 1,292.7(3,781.1)/22 = 218,291.63 − 222,173.9986 = −3,882.3686

SSxx = ∑x² − (∑x)²/n = 88,668.43 − (1,292.7)²/22 = 88,668.43 − 75,957.87682 = 12,710.55318

βˆ1 = SSxy/SSxx = −3,882.3686/12,710.55318 = −0.305444503 ≈ −0.305

βˆ0 = ȳ − βˆ1x̄ = 171.8681818 − (−0.305444503)(58.75909091) = 189.8158231 ≈ 189.816

The fitted regression line is: yˆ = 189.816 − 0.305x

c.

Using MINITAB, a graph of the fitted regression line is:

[Fitted Line Plot: FCAT-Math = 189.8 − 0.3054 Percent; S = 5.36572, R-Sq = 67.3%, R-Sq(adj) = 65.7%; x-axis: Percent, y-axis: FCAT-Math]

From the fitted regression line, the relationship between the two variables is negative.

d.

βˆ0 = 189.816. Since 0 is not in the range of observed values of the variable % Below Poverty, the y-intercept has no meaning.

βˆ1 = −0.305. For each one-unit increase in % Below Poverty, the mean value of FCAT-Math is estimated to decrease by 0.305.

e.

A proposed model is E(y) = β0 + β1x. Some preliminary calculations are:

∑x = 1,292.7    ∑y = 3,764.2    ∑xy = 217,738.81
∑x² = 88,668.43    ∑y² = 645,221.16

x̄ = ∑x/n = 1,292.7/22 = 58.75909091    ȳ = ∑y/n = 3,764.2/22 = 171.1

SSxy = ∑xy − (∑x)(∑y)/n = 217,738.81 − 1,292.7(3,764.2)/22 = 217,738.81 − 221,180.97 = −3,442.16

SSxx = ∑x² − (∑x)²/n = 88,668.43 − (1,292.7)²/22 = 88,668.43 − 75,957.87682 = 12,710.55318

βˆ1 = SSxy/SSxx = −3,442.16/12,710.55318 = −0.270811187 ≈ −0.271

βˆ0 = ȳ − βˆ1x̄ = 171.1 − (−0.270811187)(58.75909091) = 187.0126192 ≈ 187.013

The fitted regression line is: yˆ = 187.013 − 0.271x




Using MINITAB, a graph of the fitted regression line is:

[Fitted Line Plot: FCAT-Read = 187.0 − 0.2708 Percent; S = 3.42319, R-Sq = 79.9%, R-Sq(adj) = 78.9%; x-axis: Percent, y-axis: FCAT-Read]

From the fitted regression line, the relationship between the two variables is negative.

βˆ0 = 187.013. Since 0 is not in the range of observed values of the variable % Below Poverty, the y-intercept has no meaning.

βˆ1 = −0.271. For each one-unit increase in % Below Poverty, the mean value of FCAT-Reading is estimated to decrease by .271.

10.22

a.

We will select Average Salary as the dependent variable and Mean GMAT as the independent variable.

b.


∑x = 6,944    ∑y = 1,080,288    ∑xy = 751,698,490
∑x² = 4,824,680    ∑y² = 118,151,669,430

x̄ = ∑x/n = 6,944/10 = 694.4    ȳ = ∑y/n = 1,080,288/10 = 108,028.8

SSxy = ∑xy − (∑x)(∑y)/n = 751,698,490 − 6,944(1,080,288)/10 = 751,698,490 − 750,151,987.2 = 1,546,502.8

SSxx = ∑x² − (∑x)²/n = 4,824,680 − (6,944)²/10 = 4,824,680 − 4,821,913.6 = 2,766.4

βˆ1 = SSxy/SSxx = 1,546,502.8/2,766.4 = 559.0307981 ≈ 559.031

βˆ0 = ȳ − βˆ1x̄ = 108,028.8 − (559.0307981)(694.4) = −280,162.1862 ≈ −280,162.186

The fitted regression line is: yˆ = −280,162.186 + 559.031x

βˆ0 = −280,162.186. Since 0 is not in the range of observed values of the variable Mean GMAT, the y-intercept has no meaning.

βˆ1 = 559.031. For each additional point increase in the mean GMAT score, the mean value of Average Salary is estimated to increase by $559.031.

10.24

The graph in b would have the smallest s² because the spread of its data points about the line is the smallest.

10.26

a.

SSE = SSyy − βˆ1SSxy = 95 − .75(50) = 57.5

s² = SSE/(n − 2) = 57.5/(20 − 2) = 3.19444

b.

SSyy = ∑y² − (∑y)²/n = 860 − 50²/40 = 797.5

SSE = SSyy − βˆ1SSxy = 797.5 − .2(2700) = 257.5

s² = SSE/(n − 2) = 257.5/(40 − 2) = 6.776315789 ≈ 6.7763

c.

SSyy = ∑(yi − ȳ)² = 58

βˆ1 = SSxy/SSxx = 91/170 = .535294117

SSE = SSyy − βˆ1SSxy = 58 − .535294117(91) = 9.2882353 ≈ 9.288

s² = SSE/(n − 2) = 9.2882353/(10 − 2) = 1.161029413 ≈ 1.1610
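The two-step computation in this exercise (SSE from SSyy, βˆ1, and SSxy, then s² by dividing by n − 2) can be written as a small helper. This is a sketch for checking the arithmetic only; the function name `residual_variance` is hypothetical.

```python
def residual_variance(ss_yy, b1, ss_xy, n):
    """Return (SSE, s^2) where SSE = SSyy - b1*SSxy and s^2 = SSE/(n - 2)."""
    sse = ss_yy - b1 * ss_xy
    return sse, sse / (n - 2)


# Inputs taken from parts a, b, and c of this exercise.
for ss_yy, b1, ss_xy, n in [(95, 0.75, 50, 20), (797.5, 0.2, 2700, 40), (58, 91 / 170, 91, 10)]:
    sse, s2 = residual_variance(ss_yy, b1, ss_xy, n)
    print(f"SSE = {sse:.4f}, s^2 = {s2:.4f}")
```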

10.28

a.

From the printout, SSE = 382,178,624, s2 = MSE = 1,248,950, and s = 1,117.56.

b.

s = 1,117.56. We would expect approximately 95% of the observed values of y to fall within 2s or 2(1,117.56) = 2,235.12 of their least squares predicted values.




10.30

a.

From part a of Exercise 10.17, SSxy = 20.00833333, ∑y = 239, ∑y² = 10,255, and βˆ1 = 35.91623038.

SSyy = ∑y² − (∑y)²/n = 10,255 − (239)²/6 = 10,255 − 9,520.166667 = 734.8333333

SSE = SSyy − βˆ1SSxy = 734.833333 − 35.91623068(20.00833333) = 16.2094179

s² = MSE = SSE/(n − 2) = 16.2094179/(6 − 2) = 4.052354475 and s = √4.052354475 = 2.013

b.

s = 2.013. We would expect approximately 95% of the observed values of y (Drug release rate) to fall within 2s or 2(2.013) = 4.026 units of their least squares predicted values.

10.32

a.

Using MINITAB, the scattergram of the data is:

b.

∑x = 44.71    ∑y = 131,670    ∑xy = 493,117.7
∑x² = 167.4615    ∑y² = 1,514,402,100

x̄ = ∑x/n = 44.71/12 = 3.7258333    ȳ = ∑y/n = 131,670/12 = 10,972.5

SSxy = ∑xy − (∑x)(∑y)/n = 493,117.7 − 44.71(131,670)/12 = 493,117.7 − 490,580.475 = 2,537.225

SSxx = ∑x² − (∑x)²/n = 167.4615 − 44.71²/12 = 167.4615 − 166.5820083 = .8794917

βˆ1 = SSxy/SSxx = 2,537.225/.8794917 = 2884.876571 ≈ 2884.877

βˆ0 = ȳ − βˆ1x̄ = 10,972.5 − 2884.876571(3.7258333) = 10,972.5 − 10,748.56929 = 223.93071 ≈ 223.931

The fitted regression line is yˆ = 223.931 + 2884.877x

c.

SSyy = ∑y² − (∑y)²/n = 1,514,402,100 − 131,670²/12 = 1,514,402,100 − 1,444,749,075 = 69,653,025

SSE = SSyy − βˆ1SSxy = 69,653,025 − 2,884.876571(2,537.225) = 69,653,025 − 7,319,580.958 = 62,333,444.04

s² = SSE/(n − 2) = 62,333,444.04/(12 − 2) = 6,233,344.404

s = √s² = √6,233,344.404 = 2,496.6667

We would expect most of the hospital charges to fall within 2s or 2($2,496.6667) = $4,993.3333 of the least squares line.

d.

For x = 4, yˆ = 223.931 + 2,884.877(4) = 11,763.439

yˆ ± 2s ⇒ 11,763.439 ± 4,993.3333 ⇒ (6,770.106, 16,756.772)

e.

Only one state (California) had an average hospital charge more than 2s from the least squares line. Thus, 11 out of 12 or 11/12 or .917 of the states had average hospital charges within 2s of the least squares line.

10.34

Some preliminary calculations for Brand A are:

∑x = 750    ∑y = 44.8    ∑xy = 2,022
∑x² = 40,500    ∑y² = 168.70

SSxy = ∑xy − (∑x)(∑y)/n = 2,022 − 750(44.8)/15 = −218

SSxx = ∑x² − (∑x)²/n = 40,500 − 750²/15 = 3,000

SSyy = ∑y² − (∑y)²/n = 168.70 − 44.8²/15 = 34.89733333

βˆ1 = SSxy/SSxx = −218/3,000 = −0.0726666667 ≈ −0.0727

βˆ0 = ȳ − βˆ1x̄ = 44.8/15 − (−0.0726666667)(750/15) = 6.62

The least squares prediction equation for Brand A is: yˆ = 6.62 − 0.0727x

Some preliminary calculations for Brand B are:

∑x = 750    ∑y = 58.9    ∑xy = 2,622
∑x² = 40,500    ∑y² = 270.89

SSxy = ∑xy − (∑x)(∑y)/n = 2,622 − 750(58.9)/15 = −323

SSxx = ∑x² − (∑x)²/n = 40,500 − 750²/15 = 3,000

SSyy = ∑y² − (∑y)²/n = 270.89 − 58.9²/15 = 39.60933333

βˆ1 = SSxy/SSxx = −323/3,000 = −0.1076666667 ≈ −0.1077

βˆ0 = ȳ − βˆ1x̄ = 58.9/15 − (−0.1076666667)(750/15) = 9.31

The least squares prediction equation for Brand B is: yˆ = 9.31 − 0.1077x

For Brand A, SSE = SSyy − βˆ1SSxy = 34.89733333 − (−0.072666667)(−218) = 19.0560

s² = MSE = SSE/(n − 2) = 19.0560/(15 − 2) = 1.4658 and s = √1.4658 = 1.211

For Brand B, SSE = SSyy − βˆ1SSxy = 39.60933333 − (−0.107666667)(−323) = 4.833

s² = MSE = SSE/(n − 2) = 4.833/(15 − 2) = 0.37177 and s = √0.37177 = .61

For Brand A, yˆ = 6.62 − .0727x. For x = 70, yˆ = 6.62 − .0727(70) = 1.531
2s = 2(1.211) = 2.422
Therefore, yˆ ± 2s ⇒ 1.531 ± 2.422 ⇒ (−.891, 3.953)

For Brand B, yˆ = 9.31 − .1077x. For x = 70, yˆ = 9.31 − .1077(70) = 1.771
2s = 2(.61) = 1.22
Therefore, yˆ ± 2s ⇒ 1.771 ± 1.22 ⇒ (.551, 2.991)

We are more confident with Brand B since there is less variation (s is smaller).
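The yˆ ± 2s calculation for the two brands can be sketched as follows. This is an illustrative helper, not part of the original solution; the name `two_s_band` is hypothetical, and the inputs are the rounded coefficients and s values quoted in this exercise.

```python
def two_s_band(b0, b1, x, s):
    """Return (lower, upper) for the rough interval y-hat +/- 2s at the given x."""
    y_hat = b0 + b1 * x
    return y_hat - 2 * s, y_hat + 2 * s


# Brand A and Brand B at x = 70, using the rounded estimates from the text.
for name, b0, b1, s in [("A", 6.62, -0.0727, 1.211), ("B", 9.31, -0.1077, 0.61)]:
    lower, upper = two_s_band(b0, b1, 70, s)
    print(f"Brand {name}: ({lower:.3f}, {upper:.3f}), width = {upper - lower:.3f}")
```

The width of each band is 4s, so the brand with the smaller s gives the narrower band.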



10.36

a.

b.


∑x = 21    ∑y = 21    ∑xy = 86
∑x² = 91    ∑y² = 89

SSxy = ∑xy − (∑x)(∑y)/n = 86 − 21(21)/7 = 86 − 63 = 23

SSxx = ∑x² − (∑x)²/n = 91 − 21²/7 = 91 − 63 = 28

SSyy = ∑y² − (∑y)²/n = 89 − 21²/7 = 26

βˆ1 = SSxy/SSxx = 23/28 = .821428571 ≈ .821

βˆ0 = ȳ − βˆ1x̄ = 21/7 − .821428571(21/7) = 3 − 2.4642857 = .535714285 ≈ .536

The fitted line is yˆ = .536 + .821x.

c.

See the plot in part a.

d.

To test whether x contributes significant information for predicting y, we test:

H0: β1 = 0
Ha: β1 ≠ 0

e.

The test statistic is t = (βˆ1 − 0)/sβˆ1, where sβˆ1 = s/√SSxx

SSE = SSyy − βˆ1SSxy = 26 − .821428571(23) = 7.107142857

s² = SSE/(n − 2) = 7.107142857/(7 − 2) = 1.421428571    s = √1.42143 = 1.1922

sβˆ1 = 1.1922/√28 = .2253    t = (.82143 − 0)/.2253 = 3.646

The degrees of freedom for this t is df = n − 2 = 7 − 2 = 5.




f.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution. From Table VI, Appendix B, t.025 = 2.571 with df = n − 2 = 7 − 2 = 5. The rejection region is t > 2.571 or t < −2.571. Since the observed value of the test statistic falls in the rejection region (t = 3.646 > 2.571), H0 is rejected. There is sufficient evidence to indicate that x contributes information for the prediction of y at α = .05.
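The test of H0: β1 = 0 worked above can be sketched in code. This is a minimal illustration under stated assumptions: the function name `slope_t_statistic` is hypothetical, and the critical value is passed in from the t-table rather than computed, so no statistics library is assumed.

```python
import math


def slope_t_statistic(ss_xy, ss_xx, ss_yy, n):
    """Return (t, df) for testing H0: beta1 = 0 in simple linear regression."""
    b1 = ss_xy / ss_xx
    sse = ss_yy - b1 * ss_xy          # SSE = SSyy - b1*SSxy
    s = math.sqrt(sse / (n - 2))      # estimated standard deviation of the errors
    se_b1 = s / math.sqrt(ss_xx)      # standard error of the slope estimate
    return b1 / se_b1, n - 2


t, df = slope_t_statistic(ss_xy=23, ss_xx=28, ss_yy=26, n=7)
t_crit = 2.571  # t.025 with df = 5, taken from the t-table as in the text
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_crit}")
```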

10.38


∑x = 21    ∑y = 19    ∑xy = 65
∑x² = 91    ∑y² = 65

SSxy = ∑xy − (∑x)(∑y)/n = 65 − 21(19)/6 = 65 − 66.5 = −1.5

SSxx = ∑x² − (∑x)²/n = 91 − 21²/6 = 91 − 73.5 = 17.5

SSyy = ∑y² − (∑y)²/n = 65 − 19²/6 = 65 − 60.166667 = 4.8333333

βˆ1 = SSxy/SSxx = −1.5/17.5 = −.085714285 ≈ −.0857

SSE = SSyy − βˆ1SSxy = 4.8333333 − (−.085714285)(−1.5) = 4.704761903

s² = SSE/(n − 2) = 4.704761903/(6 − 2) = 1.176190476    s = √1.176190476 = 1.0845

To determine whether a straight line is useful for characterizing the relationship between x and y, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = (βˆ1 − 0)/sβˆ1 = (−.08571 − 0)/(1.0845/√17.5) = −.33

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 6 − 2 = 4. From Table VI, Appendix B, t.025 = 2.776. The rejection region is t > 2.776 or t < −2.776.

Since the observed value of the test statistic does not fall in the rejection region (t = −.33 is not less than −2.776), H0 is not rejected. There is insufficient evidence to indicate that a straight line is useful for characterizing the relationship between x and y at α = .05.


10.40

a.

To determine if the average state SAT score in 2005 has a positive relationship with the average state SAT score in 1990, we test:

H0: β1 = 0 Ha: β1 > 0 b.

From the printout in Exercise 10.15, the p-value is p = 0.000. This is the p-value for a 2-tailed test. The p-value for this one-tailed test is 0.000/2 = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate the average state SAT score in 2005 has a positive relationship with the average state SAT score in 1990 at α = .05.

c.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n – 2 = 51 – 2 = 49, t.025 ≈ 2.011. The 95% confidence interval is:

βˆ1 ± t.025 sβˆ1 ⇒ 1.073 ± 2.011(.056) ⇒ 1.073 ± .113 ⇒ (.960, 1.186)

We are 95% confident that for each additional point in the 1990 average state SAT score, the increase in the 2005 average state SAT score is between .960 and 1.186.

10.42

From Exercise 10.18, SSxy = −130.44167, βˆ1 = −0.002310625, and SSxx = 56,452.95833.

∑y = 135.8    ∑y² = 769.72

SSyy = ∑y² − (∑y)²/n = 769.72 − 135.8²/24 = 1.3183333

SSE = SSyy − βˆ1SSxy = 1.3183333 − (−0.002310625)(−130.44167) = 1.016931516

s² = MSE = SSE/(n − 2) = 1.016931516/(24 − 2) = 0.046224159 and s = √0.046224159 = 0.214998

sβˆ1 = √MSE/√SSxx = 0.214998/√56,452.95833 = 0.0009049

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n – 2 = 24 – 2 = 22, t.025 = 2.074. The confidence interval is:

βˆ1 ± t.025 sβˆ1 ⇒ −0.0023 ± 2.074(0.0009049) ⇒ −0.0023 ± 0.0019 ⇒ (−0.0042, −0.0004)

We are 95% confident that for each additional 1 part per million of soluble pectin, the mean sweetness index decreases by between .0004 and .0042 points.
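The confidence interval for the slope follows the pattern βˆ1 ± t(α/2) · s/√SSxx. A minimal sketch of that calculation is given below; the function name `slope_confidence_interval` is hypothetical, and the critical value is supplied from the t-table rather than computed.

```python
import math


def slope_confidence_interval(b1, s, ss_xx, t_crit):
    """Return (lower, upper) for b1 +/- t_crit * s / sqrt(SSxx)."""
    half_width = t_crit * s / math.sqrt(ss_xx)
    return b1 - half_width, b1 + half_width


# Values quoted in this exercise; t.025 with df = 22 is read from the t-table.
lower, upper = slope_confidence_interval(
    b1=-0.002310625, s=0.214998, ss_xx=56452.95833, t_crit=2.074
)
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

Because both endpoints are negative, the interval supports a decreasing relationship at this confidence level.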




10.44

a.

From Exercise 10.23, SSxy = −787.51087, SSxx = 6,906.6087, ∑y = 60.1, ∑y² = 262.271, and βˆ1 = −0.114022801.

SSyy = ∑y² − (∑y)²/n = 262.271 − (60.1)²/23 = 262.271 − 157.043913 = 105.227087

SSE = SSyy − βˆ1SSxy = 105.227087 − (−0.114022801)(−787.51087) = 15.43289179

s² = MSE = SSE/(n − 2) = 15.43289179/(23 − 2) = 0.734899609 and s = √0.734899609 = 0.8573

sβˆ1 = √MSE/√SSxx = √0.734899609/√6,906.6087 = 0.010315

To determine if the mass of the spill tends to diminish linearly as time increases, we test:

H0: β1 = 0
Ha: β1 < 0

The test statistic is t = (βˆ1 − 0)/sβˆ1 = −0.114022801/0.010315 = −11.05

The rejection region requires α = .05 in the lower tail of the t-distribution with df = n – 2 = 23 – 2 = 21. From Table VI, Appendix B, t.05 = 1.721. The rejection region is t < −1.721.

Since the observed value of the test statistic falls in the rejection region (t = −11.05 < −1.721), H0 is rejected. There is sufficient evidence to indicate the mass of the spill tends to diminish linearly as time increases at α = .05.

b.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n – 2 = 23 – 2 = 21, t.025 = 2.080. The 95% confidence interval is:

βˆ1 ± t.025 sβˆ1 ⇒ −0.1140 ± 2.080(0.010315) ⇒ −0.1140 ± 0.02146 ⇒ (−0.13546, −0.09254)

We are 95% confident that for each additional minute of elapsed time, the decrease in spill mass is between 0.09254 and 0.13546.



10.46

a.

Using MINITAB, the scattergram is:

It appears from the plot that as the percentage of the population that is minority increases, the number of people per branch bank tends to increase.

b.

The value of β1 will be positive. As one variable increases, the other tends to increase.

c.

∑x = 363.8    ∑y = 56,560    ∑xy = 1,075,763
∑x² = 9,020.86    ∑y² = 158,763,894

x̄ = ∑x/n = 363.8/21 = 17.32380952    ȳ = ∑y/n = 56,560/21 = 2,693.33333

SSxy = ∑xy − (∑x)(∑y)/n = 1,075,763 − 363.8(56,560)/21 = 1,075,763 − 979,834.6667 = 95,928.3333

SSxx = ∑x² − (∑x)²/n = 9,020.86 − 363.8²/21 = 9,020.86 − 6,302.401905 = 2,718.458095

βˆ1 = SSxy/SSxx = 95,928.3333/2,718.458095 = 35.28777342 ≈ 35.288

SSyy = ∑y² − (∑y)²/n = 158,763,894 − 56,560²/21 = 158,763,894 − 152,334,933.3 = 6,428,960.7

SSE = SSyy − βˆ1SSxy = 6,428,960.7 − 35.28777342(95,928.3333) = 6,428,960.7 − 3,385,097.29 = 3,043,863.41

s² = SSE/(n − 2) = 3,043,863.41/(21 − 2) = 160,203.3374

s = √s² = √160,203.3374 = 400.2541

To determine if the data support the charge made against the New Jersey banking community, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = (βˆ1 − 0)/sβˆ1 = (35.288 − 0)/(400.2541/√2,718.458095) = 4.597

The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n − 2 = 21 − 2 = 19. From Table VI, Appendix B, t.005 = 2.861. The rejection region is t < −2.861 or t > 2.861.

Since the observed value of the test statistic falls in the rejection region (t = 4.597 > 2.861), H0 is rejected. There is sufficient evidence to support the charge made against the New Jersey banking community at α = .01.

10.48

a.

b.

Using MINITAB, the regression analysis is:

Regression Analysis: Index versus Interactions

The regression equation is
Index = 44.1 + 0.237 Interactions

Predictor      Coef    SE Coef      T       P
Constant     44.130      9.362   4.71   0.000
Interact     0.2366     0.1865   1.27   0.222

S = 19.40    R-Sq = 8.6%    R-Sq(adj) = 3.3%

Analysis of Variance

Source            DF       SS      MS      F       P
Regression         1    606.0   606.0   1.61   0.222
Residual Error    17   6400.6   376.5
Total             18   7006.6

From the printout, the least squares line is yˆ = 44.13 + .2366x.


c.

From the printout, s = 19.40. The standard deviation s represents the spread of the manager success index about the least squares line. Approximately 95% of the manager success indexes should lie within 2s = 2(19.40) = 38.8 of the least squares line.

d.

Refer to the scattergram in part a. The number of interactions with outsiders might contribute some information in the prediction of managerial success, but it does not look like a very strong relationship.

e.

To determine if the number of interactions contributes information for the prediction of managerial success, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = (βˆ1 − 0)/sβˆ1 = 1.27

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 19 − 2 = 17. From Table VI, Appendix B, t.025 = 2.110. The rejection region is t > 2.110 or t < −2.110.

Since the observed value of the test statistic does not fall in the rejection region (t = 1.27 ≯ 2.110), H0 is not rejected. There is insufficient evidence to indicate the number of interactions contributes information for the prediction of managerial success at α = .05.

f.

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = 17, t.025 = 2.110. The 95% confidence interval is:

βˆ1 ± t.025 sβˆ1 ⇒ .2366 ± 2.110(.1865) ⇒ .2366 ± .3935 ⇒ (−.1569, .6301)

We are 95% confident the change in the mean manager success index for each additional interaction with outsiders is between −.1569 and .6301.

10.50

a.

Using MINITAB, the regression analysis is:

Regression Analysis: Risk versus Credit

The regression equation is
Risk = 56.2 - 0.400 Credit

Predictor       Coef    SE Coef       T       P
Constant      56.215      6.033    9.32   0.000
Credit      -0.39961    0.09152   -4.37   0.000

S = 12.6777    R-Sq = 33.4%    R-Sq(adj) = 31.7%

Analysis of Variance

Source            DF       SS       MS       F       P
Regression         1   3064.4   3064.4   19.07   0.000
Residual Error    38   6107.5    160.7
Total             39   9171.9

To determine if country credit risk contributes information for the prediction of market volatility, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = (βˆ1 − 0)/sβˆ1 = −4.37 (from printout).

The p-value is .000. Since the p-value is so small, there is strong evidence to indicate that country credit risk contributes information for the prediction of market volatility for any α > .000.

b.

Using MINITAB, a scattergram of the data with the fitted regression line is:

[Regression Plot: Risk = 56.22 − .3996 Credit; S = 12.6777, R-Sq = 33.4%, R-Sq(adj) = 31.7%; x-axis: Credit, y-axis: Risk]

From the plot, there appear to be several outliers. Observations 1, 19, 34, and 36 have arrows pointing at them.



c.

Eliminating those four data points and using MINITAB, the regression analysis is as follows:

The regression equation is
Risk = 48.9 - 0.316 Credit

Predictor       Coef      Stdev   t-ratio       p
Constant      48.891      3.991     12.25   0.000
Credit      -0.31599    0.05883     -5.37   0.000

s = 7.46401    R-sq = 45.9%    R-sq(adj) = 44.3%

Analysis of Variance

SOURCE        DF       SS       MS       F       p
Regression     1   1607.4   1607.4   28.85   0.000
Error         34   1894.2     55.7
Total         35   3501.6

Unusual Observations
Obs.     C2      C1     Fit   Stdev.Fit   Residual   St.Resid
   4   35.1   63.70   37.80        2.13      25.90      3.62R
  25   25.3   23.30   40.90        2.62     -17.60     -2.52R
  27   55.6   46.40   31.32        1.35      15.08      2.05R

R denotes an obs. with a large st. resid.

After eliminating the four data points, the regression analysis is very similar. The fitted regression line is:

yˆ = 48.891 − .31599x

To determine if country credit risk contributes information for the prediction of market volatility, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = (βˆ1 − 0)/sβˆ1 = −5.37 (from printout).

The p-value is .000. Since the p-value is so small, there is strong evidence to indicate that country credit risk contributes information for the prediction of market volatility for any α > .000.

The standard error for the analysis when the four data points have been removed (s = 7.464) is much smaller than the standard error with all the data points (s = 12.6777).




10.52

a.

r = 1 implies x and y are perfectly, positively linearly related.

b.

r = −1 implies x and y are perfectly, negatively linearly related.

c.

r = 0 implies x and y are not linearly related.

d.

r = .90 implies x and y are positively related. Since r is close to 1, the strength of the relationship is very high.

e.

r = .10 implies x and y are positively related. Since r is close to 0, the relationship is fairly weak.

f.

r = −.88 implies x and y are negatively related. Since r is close to −1, the relationship is fairly strong.

10.54

a.


∑x = 0    ∑y = 12    ∑xy = 20
∑x² = 10    ∑y² = 70

SSxy = ∑xy − (∑x)(∑y)/n = 20 − 0(12)/5 = 20

SSxx = ∑x² − (∑x)²/n = 10 − 0²/5 = 10

SSyy = ∑y² − (∑y)²/n = 70 − 12²/5 = 41.2

r = SSxy/√(SSxx SSyy) = 20/√(10(41.2)) = .9853

r² = .9853² = .9709

Since r = .9853, there is a very strong positive linear relationship between x and y. Since r² = .9709, 97.09% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.



b.


∑x = 0    ∑y = 16    ∑xy = −15
∑x² = 10    ∑y² = 74

SSxy = ∑xy − (∑x)(∑y)/n = −15 − 0(16)/5 = −15

SSxx = ∑x² − (∑x)²/n = 10 − 0²/5 = 10

SSyy = ∑y² − (∑y)²/n = 74 − 16²/5 = 22.8

r = SSxy/√(SSxx SSyy) = −15/√(10(22.8)) = −.9934

r² = (−.9934)² = .9868

Since r = −.9934, there is a very strong negative linear relationship between x and y. Since r² = .9868, 98.68% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.

c.


∑x = 18    ∑y = 14    ∑xy = 36
∑x² = 52    ∑y² = 32

SSxy = ∑xy − (∑x)(∑y)/n = 36 − 18(14)/7 = 0

SSxx = ∑x² − (∑x)²/n = 52 − 18²/7 = 5.71428571

SSyy = ∑y² − (∑y)²/n = 32 − 14²/7 = 4

r = SSxy/√(SSxx SSyy) = 0/√(5.71428571(4)) = 0

r² = 0² = 0

Since r = 0, this implies that x and y are not linearly related. Since r² = 0, 0% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.

d.


∑x = 15    ∑y = 4    ∑xy = 12
∑x² = 71    ∑y² = 6

SSxy = ∑xy − (∑x)(∑y)/n = 12 − 15(4)/5 = 0

SSxx = ∑x² − (∑x)²/n = 71 − 15²/5 = 26

SSyy = ∑y² − (∑y)²/n = 6 − 4²/5 = 2.8

r = SSxy/√(SSxx SSyy) = 0/√(26(2.8)) = 0

r² = 0² = 0

Since r = 0, this implies that x and y are not linearly related. Since r² = 0, 0% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.
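The four parts of this exercise repeat the same computation of r and r² from the summary sums. A minimal sketch of that computation follows; the function name `correlation_from_sums` is hypothetical and is used only for illustration.

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (r, r^2) from the summary sums, where r = SSxy / sqrt(SSxx * SSyy)."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return r, r * r


# Summary sums for parts a through d of this exercise.
parts = {
    "a": dict(n=5, sum_x=0, sum_y=12, sum_x2=10, sum_y2=70, sum_xy=20),
    "b": dict(n=5, sum_x=0, sum_y=16, sum_x2=10, sum_y2=74, sum_xy=-15),
    "c": dict(n=7, sum_x=18, sum_y=14, sum_x2=52, sum_y2=32, sum_xy=36),
    "d": dict(n=5, sum_x=15, sum_y=4, sum_x2=71, sum_y2=6, sum_xy=12),
}
for label, sums in parts.items():
    r, r2 = correlation_from_sums(**sums)
    print(f"{label}: r = {r:.4f}, r^2 = {r2:.4f}")
```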

10.56

a.

From the printout, r² = R-Sq = 89.3%. 89.3% of the total sample variability around the sample mean asking price is explained by the linear relationship between asking price and number of carats for diamonds.

b.

r = √r² = √.893 = .945. The value of r has the same sign as βˆ1, which is positive. Since r is very close to 1, there is a strong positive linear relationship between asking price and number of carats for diamonds.

10.58

a.

Since r = .43, there is a fairly weak positive linear relationship between total time allotted to sports and audience rating.

b.

r² = .43² = .1849. Since r² = .1849, 18.49% of the total sample variability around the sample mean audience rating is explained by the linear relationship between audience rating and total time allotted to sports.

10.60

a.

Using MINITAB, a scattergram of the data is:

[Scatterplot of NetWorth vs Age; x-axis: Age, y-axis: NetWorth]

There appears to be a slight increase in the Net Worth as age increases, but the relationship is fairly weak.

b.


∑x = 859    ∑y = 303.8    ∑xy = 17,841.6
∑x² = 53,567    ∑y² = 8,202.28

SSxy = ∑xy − (∑x)(∑y)/n = 17,841.6 − 859(303.8)/15 = 17,841.6 − 17,397.61333 = 443.98667

SSxx = ∑x² − (∑x)²/n = 53,567 − (859)²/15 = 53,567 − 49,192.06667 = 4,374.93333

SSyy = ∑y² − (∑y)²/n = 8,202.28 − (303.8)²/15 = 8,202.28 − 6,152.962667 = 2,049.317333

βˆ1 = SSxy/SSxx = 443.98667/4,374.93333 = 0.101484213 ≈ 0.1015

r = SSxy/√(SSxx SSyy) = 443.98667/(√4,374.93333 √2,049.317333) = .1483

Since r is positive and close to 0, there is a very weak positive linear relationship between a person’s net worth and his/her age.

c.

If r had a negative sign, the interpretation would be: Since r is negative, there is a very weak negative linear relationship between a person’s net worth and his/her age.

10.62

From Exercises 10.23 and 10.44, SSxy = −787.51087, SSxx = 6,906.6087, and SSyy = 105.227087.

r = SSxy/√(SSxx SSyy) = −787.51087/(√6,906.6087 √105.227087) = −.924

There is a very strong negative linear relationship between mass of spill and elapsed time of the spill.

r² = (−.924)² = .854

Approximately 85.4% of the variability in the mass of the spill around the sample mean is explained by the linear relationship between mass of the spill and elapsed time of the spill.



10.64

a.


[Scattergram of WeightChg vs Digest; x-axis: Digest, y-axis: WeightChg]

b.


∑x = 1,266.5    ∑y = 46    ∑xy = 4,103.25
∑x² = 57,390.75    ∑y² = 1,075.5

SSxy = ∑xy − (∑x)(∑y)/n = 4,103.25 − 1,266.5(46)/42 = 2,716.130952

SSxx = ∑x² − (∑x)²/n = 57,390.75 − (1,266.5)²/42 = 19,199.74405

SSyy = ∑y² − (∑y)²/n = 1,075.5 − 46²/42 = 1,025.119048

βˆ1 = SSxy/SSxx = 2,716.130952/19,199.74405 = 0.141467039

r = SSxy/√(SSxx SSyy) = 2,716.130952/(√19,199.74405 √1,025.119048) = .6122

There is a moderate positive linear relationship between digestion efficiency and weight change.

c.

To determine whether weight change is correlated with digestion, we test:

H0: ρ = 0
Ha: ρ ≠ 0

The test statistic is t = r/√((1 − r²)/(n − 2)) = .6122/√((1 − .6122²)/(42 − 2)) = 4.90

The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n – 2 = 42 – 2 = 40. From Table VI, Appendix B, t.005 = 2.704. The rejection region is t > 2.704 or t < −2.704.

Since the observed value of the test statistic falls in the rejection region (t = 4.90 > 2.704), H0 is rejected. There is sufficient evidence to indicate weight change and digestion are correlated at α = .01.

d.

After deleting the data corresponding to duck chow, the preliminary calculations are:

∑x = 701.50    ∑y = −18    ∑xy = 99.5
∑x² = 21,069    ∑y² = 404.00

SSxy = ∑xy − (∑x)(∑y)/n = 99.5 − 701.50(−18)/33 = 482.1363636

SSxx = ∑x² − (∑x)²/n = 21,069 − (701.50)²/33 = 6,156.81061

SSyy = ∑y² − (∑y)²/n = 404 − (−18)²/33 = 394.1818182

βˆ1 = SSxy/SSxx = 482.1363636/6,156.81061 = 0.078309435

r = SSxy/√(SSxx SSyy) = 482.1363636/(√6,156.81061 √394.1818182) = .3095

There is a rather weak positive linear relationship between digestion efficiency and weight change.

To determine whether weight change is correlated with digestion, we test:

H0: ρ = 0
Ha: ρ ≠ 0

The test statistic is t = r/√((1 − r²)/(n − 2)) = .3095/√((1 − .3095²)/(33 − 2)) = 1.81

The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n – 2 = 33 – 2 = 31. From Table VI, Appendix B, t.005 = 2.750. The rejection region is t > 2.750 or t < −2.750.

Since the observed value of the test statistic does not fall in the rejection region (t = 1.81 ≯ 2.750), H0 is not rejected. There is insufficient evidence to indicate weight change and digestion are correlated at α = .01.
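The test statistic for H0: ρ = 0 depends only on r and n, so the two cases (all 42 feeding trials, and the 33 trials left after removing duck chow) can be compared with one small function. This is an illustrative sketch; the name `correlation_t` is hypothetical, and the critical values are taken from the t-table as in the text.

```python
import math


def correlation_t(r, n):
    """Return t = r / sqrt((1 - r^2)/(n - 2)) for testing H0: rho = 0."""
    return r / math.sqrt((1 - r * r) / (n - 2))


# (r, n, two-tailed critical value t.005 from the t-table)
for r, n, t_crit in [(0.6122, 42, 2.704), (0.3095, 33, 2.750)]:
    t = correlation_t(r, n)
    print(f"r = {r}, n = {n}: t = {t:.2f}, reject H0: {abs(t) > t_crit}")
```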



e.


[Scattergram of Digest vs Fiber; x-axis: Fiber, y-axis: Digest]

∑x = 943.5    ∑y = 1,266.5    ∑xy = 21,405.5
∑x² = 24,533.25    ∑y² = 57,390.75

SSxy = ∑xy − (∑x)(∑y)/n = 21,405.5 − 943.5(1,266.5)/42 = −7,045.51786

SSxx = ∑x² − (∑x)²/n = 24,533.25 − (943.5)²/42 = 3,338.19643

SSyy = ∑y² − (∑y)²/n = 57,390.75 − 1,266.5²/42 = 19,199.74405

βˆ1 = SSxy/SSxx = −7,045.51786/3,338.19643 = −2.110576177

r = SSxy/√(SSxx SSyy) = −7,045.51786/(√3,338.19643 √19,199.74405) = −.8801

There is a fairly strong negative linear relationship between digestion efficiency and acid-detergent fiber.

To determine whether acid-detergent fiber is correlated with digestion, we test:

H0: ρ = 0
Ha: ρ ≠ 0

The test statistic is t = r/√((1 − r²)/(n − 2)) = −.8801/√((1 − (−.8801)²)/(42 − 2)) = −11.72

The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n – 2 = 42 – 2 = 40. From Table VI, Appendix B, t.005 = 2.704. The rejection region is t > 2.704 or t < −2.704.

Since the observed value of the test statistic falls in the rejection region (t = −11.72 < −2.704), H0 is rejected. There is sufficient evidence to indicate acid-detergent fiber and digestion are correlated at α = .01.

After deleting the data corresponding to duck chow, the preliminary calculations are:

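\[ \sum x = 877 \qquad \sum x^2 = 24{,}036.5 \qquad \sum xy = 17{,}274 \qquad \sum y = 701.50 \qquad \sum y^2 = 21{,}069 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 17{,}274 - \frac{877(701.50)}{33} = -1{,}368.89394 \]

\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 24{,}036.5 - \frac{(877)^2}{33} = 729.56061 \]

\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 21{,}069 - \frac{(701.50)^2}{33} = 6{,}156.81061 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-1{,}368.89394}{729.56061} = -1.876326547 \]

\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{-1{,}368.89394}{\sqrt{729.56061\,(6{,}156.81061)}} = -.6459 \]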

There is a moderate negative linear relationship between digestion efficiency and acid-detergent fiber. To determine whether acid-detergent fiber is correlated with digestion, we test: H0: ρ = 0 Ha: ρ ≠ 0

The test statistic is

\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{-.6459}{\sqrt{\dfrac{1 - (-.6459)^2}{33 - 2}}} = -4.71 \]

The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n – 2 = 33 – 2 = 31. From Table VI, Appendix B, t.005 = 2.750. The rejection region is t > 2.750 or t < −2.750. Since the observed value of the test statistic falls in the rejection region (t = −4.71 < −2.750), H0 is rejected. There is sufficient evidence to indicate acid-detergent fiber and digestion are correlated at α = .01. 10.66

a.

b.


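\[ \sum x = 28 \qquad \sum x^2 = 224 \qquad \sum xy = 254 \qquad \sum y = 37 \qquad \sum y^2 = 307 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 254 - \frac{28(37)}{7} = 106 \]

\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 224 - \frac{28^2}{7} = 112 \]

\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 307 - \frac{37^2}{7} = 111.4285714 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{106}{112} = .946428571 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{37}{7} - .946428571\left(\frac{28}{7}\right) = 1.5 \]

The least squares line is yˆ = 1.5 + .946x.

c.

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 111.4285714 - (.946428571)(106) = 11.1071429 \]

\[ s^2 = \frac{SSE}{n - 2} = \frac{11.1071429}{7 - 2} = 2.22143 \]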




d.

The form of the confidence interval is:

\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \qquad \text{where } s = \sqrt{s^2} = \sqrt{2.22143} = 1.4904 \]

For xp = 3, yˆ = 1.5 + .946(3) = 4.338 and x̄ = 28/7 = 4.

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, t.05 = 2.015 with df = n − 2 = 7 − 2 = 5. The 90% confidence interval is:

\[ 4.338 \pm 2.015(1.4904)\sqrt{\frac{1}{7} + \frac{(3 - 4)^2}{112}} \Rightarrow 4.338 \pm 1.170 \Rightarrow (3.168,\ 5.508) \]

e.

The form of the prediction interval is:

\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]

The 90% prediction interval is:

\[ 4.338 \pm 2.015(1.4904)\sqrt{1 + \frac{1}{7} + \frac{(3 - 4)^2}{112}} \Rightarrow 4.338 \pm 3.223 \Rightarrow (1.115,\ 7.561) \]

f.

The 90% prediction interval for y is wider than the 90% confidence interval for the mean value of y when xp = 3. The error of predicting a particular value of y will be larger than the error of estimating the mean value of y for a particular x value. This is true since the error in estimating the mean value of y for a given x value is the distance between the least squares line and the true line of means, while the error in predicting some future value of y is the sum of two errors: the error of estimating the mean of y plus the random error that is a component of the value of y to be predicted.
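The two interval formulas differ only by the extra 1 under the square root. A minimal Python sketch, assuming the values from parts d and e (the function name `mean_ci_and_pi` is hypothetical, and the t value is taken from the table rather than computed):

```python
import math


def mean_ci_and_pi(y_hat, t_crit, s, n, x_p, x_bar, ss_xx):
    """Return (confidence interval for E(y), prediction interval for y) at x = x_p."""
    leverage = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(leverage)        # error of estimating the mean
    pi_half = t_crit * s * math.sqrt(1 + leverage)    # adds the random error of a single y
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


ci, pi = mean_ci_and_pi(y_hat=4.338, t_crit=2.015, s=1.4904, n=7, x_p=3, x_bar=4, ss_xx=112)
print([round(v, 3) for v in ci], [round(v, 3) for v in pi])
```

Because the prediction half-width always contains the additional 1, the prediction interval is wider than the confidence interval for any xp.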

10.68

a.

The form of the confidence interval is:

\[ \bar{y} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \qquad \text{where } \bar{y} = \frac{\sum y}{n} = \frac{22}{10} = 2.2 \]

\[ s^2 = \frac{\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}}{n - 1} = \frac{82 - \dfrac{(22)^2}{10}}{10 - 1} = 3.7333 \qquad \text{and } s = 1.9322 \]

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, t.025 = 2.262 with df = n − 1 = 10 − 1 = 9. The 95% confidence interval is:

\[ 2.2 \pm 2.262\,\frac{1.9322}{\sqrt{10}} \Rightarrow 2.2 \pm 1.382 \Rightarrow (.818,\ 3.582) \]


b.

c.

The confidence intervals computed in Exercise 10.63 are much narrower than that found in part a. Thus, x appears to contribute information about the mean value of y.

d.

From Exercise 10.63, βˆ1 = .843, s = .8619, SSxx = 38.9, and n = 10.

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is

\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{\hat{\beta}_1 - 0}{s / \sqrt{SS_{xx}}} = \frac{.843 - 0}{.8619 / \sqrt{38.9}} = 6.10 \]

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 10 − 2 = 8. From Table VI, Appendix B, t.025 = 2.306. The rejection region is t > 2.306 or t < −2.306. Since the observed value of the test statistic falls in the rejection region (t = 6.10 > 2.306), H0 is rejected. There is sufficient evidence to indicate the straight-line model contributes information for the prediction of y at α = .05. 10.70

a.

The 95% confidence interval for E(y) when x = .52 is (3,598.1, 3,868.1). We are 95% confident that the mean asking price for a diamond weighing .52 carats is between $3,598.10 and $3,868.10.

b.

The 95% prediction interval for y when x = .52 is (1,529.8, 5,936.3). We are 95% confident that the actual asking price for a diamond weighing .52 carats is between $1,529.80 and $5,936.30.

10.72

Answers may vary. One possible answer is: The 90% confidence interval for x = 220.00 is (5.64898, 5.83848). We are 90% confident that the mean sweetness index of all orange juice samples will be between 5.64898 and 5.83848 parts per million when the pectin value is 220.00.




10.74

a.

Using MINITAB, the results of the regression analysis are:

Regression Analysis: Managers versus UnitsSold

The regression equation is
Managers = 5.33 + 0.586 UnitsSold

Predictor      Coef      SE Coef      T        P
Constant       5.325     1.180        4.51     0.000
UnitsSol       0.58610   0.03818      15.35    0.000

S = 2.566    R-Sq = 92.9%    R-Sq(adj) = 92.5%

Analysis of Variance

Source            DF    SS        MS        F         P
Regression        1     1552.0    1552.0    235.63    0.000
Residual Error    18    118.6     6.6
Total             19    1670.5

To determine the usefulness of the model, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is

\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = 15.35 \quad \text{(from printout)} \]

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 20 − 2 = 18. From Table VI, Appendix B, t.025 = 2.101. The rejection region is t > 2.101 or t < −2.101. Since the observed value of the test statistic falls in the rejection region (t = 15.35 > 2.101), H0 is rejected. There is sufficient evidence to indicate the model is useful at α = .05. Therefore, the monthly sales is useful in predicting the number of managers at α = .05. b.

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, t.05 = 1.734 with df = 18.

For xp = 39, x̄ = ∑x/n = 540/20 = 27, and yˆ = 5.325 + .5861(39) = 28.1829.

The form of the prediction interval is:

\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 28.183 \pm 1.734(2.5664)\sqrt{1 + \frac{1}{20} + \frac{(39 - 27)^2}{4{,}518}} \]

\[ \Rightarrow 28.183 \pm 4.629 \Rightarrow (23.554,\ 32.812) \]

c.

We are 90% confident the actual number of managers needed when 39 units are sold is between 23.55 and 32.81.


10.76

a.

From Exercise 10.34, SSxx = 3000 and x̄ = 50. Also, for Brand A, s = 1.211; for Brand B, s = .610. For Brand A, yˆ = 6.62 − .0727(45) = 3.349, while for Brand B, yˆ = 9.31 − .1077(45) = 4.464. The degrees of freedom for both brands is n − 2 = 15 − 2 = 13. For confidence coefficient .90 (i.e., for all parts of this question), α = .10 and α/2 = .05. From Table VI, Appendix B, with df = 13, t.05 = 1.771. The form of both confidence intervals is

\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]

For Brand A, we obtain:

\[ 3.349 \pm 1.771(1.211)\sqrt{\frac{1}{15} + \frac{(45 - 50)^2}{3000}} \Rightarrow 3.349 \pm .587 \Rightarrow (2.762,\ 3.936) \]

For Brand B, we obtain:

\[ 4.464 \pm 1.771(.610)\sqrt{\frac{1}{15} + \frac{(45 - 50)^2}{3000}} \Rightarrow 4.464 \pm .296 \Rightarrow (4.168,\ 4.760) \]

The first interval is wider, caused by the larger value of s.

b.

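The form of both prediction intervals is

\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]

For Brand A, we obtain:

\[ 3.349 \pm 1.771(1.211)\sqrt{1 + \frac{1}{15} + \frac{(45 - 50)^2}{3000}} \Rightarrow 3.349 \pm 2.224 \Rightarrow (1.125,\ 5.573) \]

For Brand B, we obtain:

\[ 4.464 \pm 1.771(.610)\sqrt{1 + \frac{1}{15} + \frac{(45 - 50)^2}{3000}} \Rightarrow 4.464 \pm 1.120 \Rightarrow (3.344,\ 5.584) \]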

Again, the first interval is wider, caused by the larger value of s. Each of these intervals is wider than its counterpart from part a, since, for the same x, a prediction interval for an individual y is always wider than a confidence interval for the mean of y. This is due to an individual observation having a greater variance than the variance of the mean of a set of observations.




c.

To obtain a confidence interval for the life of a brand A cutting tool that is operated at 100 meters per minute, we use:

\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \]

For x = 100, yˆ = 6.62 − .0727(100) = −.65. The degrees of freedom are n − 2 = 15 − 2 = 13. For confidence coefficient .95, α = .05 and α/2 = .025. From Table VI, Appendix B, with df = 13, t.025 = 2.160. Here, we obtain:

\[ -.65 \pm 2.160(1.211)\sqrt{1 + \frac{1}{15} + \frac{(100 - 50)^2}{3000}} \Rightarrow -.65 \pm 3.606 \Rightarrow (-4.256,\ 2.956) \]

The additional assumption would be that the straight line model fits the data well for the x's actually observed all the way up to the value under consideration, 100. Clearly from the estimated value of −.65, this is not true (usually, negative "useful lives" are not found). 10.78

a.

b.

One possible line is yˆ = x.

x     y     yˆ     y − yˆ
1     1     1      0
3     3     3      0
5     5     5      0
                   Sum = 0

For this example ∑(y − yˆ) = 0.

A second possible line is yˆ = 3.

x     y     yˆ     y − yˆ
1     1     3      −2
3     3     3      0
5     5     3      2
                   Sum = 0

For this example ∑(y − yˆ) = 0.


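c.

\[ \sum x = 9 \qquad \sum x^2 = 35 \qquad \sum xy = 35 \qquad \sum y = 9 \qquad \sum y^2 = 35 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 35 - \frac{9(9)}{3} = 8 \]

\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 35 - \frac{9^2}{3} = 8 \]

\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 35 - \frac{9^2}{3} = 8 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{8}{8} = 1 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{9}{3} - 1\left(\frac{9}{3}\right) = 0 \]

The least squares line is yˆ = 0 + 1x = x.

d.

For yˆ = x, SSE = SSyy − βˆ1 SSxy = 8 − 1(8) = 0.

For yˆ = 3, SSE = ∑(yi − yˆi)² = (1 − 3)² + (3 − 3)² + (5 − 3)² = 8.

The least squares line has the smallest SSE of all possible lines.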
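The comparison of the two candidate lines can be reproduced with a short Python sketch. The helper names `least_squares` and `sse` are hypothetical; the data are the three points (1, 1), (3, 3), (5, 5) used in this exercise.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = ss_xy / ss_xx
    return y_bar - slope * x_bar, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared errors of the line y-hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


xs, ys = [1, 3, 5], [1, 3, 5]
b0, b1 = least_squares(xs, ys)
print(b0, b1)                 # fitted line
print(sse(xs, ys, b0, b1))    # SSE for the least squares line
print(sse(xs, ys, 3, 0))      # SSE for the horizontal line y-hat = 3
```

Both lines have residuals that sum to zero, but only the least squares line minimizes the sum of squared residuals.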

10.80

a.

The variables x and y do appear to be related. It appears when x increases, y tends to increase. b.

r = √r² = √.612 = .7823. The correlation between concentration and exhaustion index is .7823. This relationship is positive since r > 0. The relationship is fairly strong. No, this does not mean that concentration causes emotional exhaustion. They are just related.




c.

To determine if the straight-line relationship is useful, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is

\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = 6.03 \]

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 25 − 2 = 23. From Table VI, Appendix B, t.025 = 2.069. The rejection region is t > 2.069 or t < −2.069. Since the observed value of the test statistic falls in the rejection region (t = 6.03 > 2.069), H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting burnout at α = .05. d.

r2 = .612 61.2% of the sample variation of exhaustion index is explained by the linear relationship between the exhaustion index and concentration.

e.

For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n – 2 = 25 – 2 = 23, t.025 = 2.069. The 95% confidence interval is:

\[ \hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow 8.865 \pm 2.069(1.471) \Rightarrow 8.865 \pm 3.043 \Rightarrow (5.822,\ 11.908) \]

We are 95% confident that the change in mean exhaustion index for each unit change in concentration is between 5.822 and 11.908. f.

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, t.025 = 2.069 with df = 23. The confidence interval is:

\[ \hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \qquad \text{where } \hat{y} = -29.497 + 8.865(80) = 679.703 \]

\[ \Rightarrow 679.703 \pm 2.069(174.2074)\sqrt{\frac{1}{25} + \frac{(80 - 68.56)^2}{14{,}026.16}} \Rightarrow 679.703 \pm 80.054 \Rightarrow (599.649,\ 759.757) \]

We are 95% confident that the interval from 599.649 to 759.757 encloses the mean exhaustion level for all professionals who have 80% of their social contacts within their work groups.


10.82

a.

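\[ \sum x = 590{,}124 \qquad \sum x^2 = 27{,}727{,}637{,}890 \qquad \sum xy = 1{,}396{,}503{,}941 \qquad \sum y = 30{,}537.4 \qquad \sum y^2 = 73{,}506{,}140.4 \]

\[ SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 1{,}396{,}503{,}941 - \frac{590{,}124(30{,}537.4)}{13} = 10{,}284{,}507 \]

\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 27{,}727{,}637{,}890 - \frac{590{,}124^2}{13} = 939{,}458{,}250 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{10{,}284{,}507}{939{,}458{,}250} = .010947274 \approx .0109 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{30{,}537.4}{13} - \frac{.010947274(590{,}124)}{13} = 1852.088523 \approx 1852.089 \]

The least squares line is yˆ = 1852.089 + .0109x.

b.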

The plot of the data is:

c.

Based on the graph, it does not appear that the line fits the data very well. The points do not lie very close to the line.

d.

Some preliminary calculations are:

\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 73{,}506{,}140.4 - \frac{(30{,}537.4)^2}{13} = 1{,}772{,}848.19 \]

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 1{,}772{,}848.19 - (0.010947274)(10{,}284{,}507) = 1{,}660{,}260.874 \]

\[ s^2 = MSE = \frac{SSE}{n - 2} = \frac{1{,}660{,}260.874}{13 - 2} = 150{,}932.8067 \qquad \text{and } s = \sqrt{150{,}932.8067} = 388.501 \]




For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n – 2 = 13 – 2 = 11, t.025 = 2.201. The 95% confidence interval is:

\[ \hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow .0109 \pm 2.201\,\frac{388.501}{\sqrt{939{,}458{,}250}} \Rightarrow .0109 \pm .0279 \Rightarrow (-0.0170,\ 0.0388) \]

e.

Since 0 is contained in the 95% confidence interval, there is no evidence to indicate that there is a linear relationship between buying income and retail sales.

10.84

a.

r = .14. Because this value is close to 0, there is a very weak positive linear relationship between math confidence and computer interest for boys.

b.

r = .33. Because this value is fairly close to 0, there is a weak positive linear relationship between math confidence and computer interest for girls.

10.86

a.

βˆ1 = .020. For each additional 1% increase in leaves infected, the mean log of the average number of infections per leaf is estimated to increase by .02.

b.

r2 = .816. 81.6% of the total sample variability around the sample mean log of the average number of infections per leaf is explained by the linear relationship between the log of the average number of infections per leaf and the percentage of leaves infected.

c.

s = .288. We would expect most of the observed values of the log of the average number of infections per leaf to fall within ±2s or ±2(.288) or .576 units of their predicted values.

d.

r = √.816 = .903. Because this number is close to 1, there is a fairly strong positive linear relationship between the log of the average number of infections per leaf and the percentage of leaves infected.

e.

To determine if there is a linear relationship between the log of the average number of infections per leaf and the percentage of leaves infected, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is

\[ t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{.903}{\sqrt{(1 - .816)/(100 - 2)}} = 20.83 \]

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 100 − 2 = 98. From Table VI, Appendix B, t.025 ≈ 1.99. The rejection region is t < −1.99 or t > 1.99. Since the observed value of the test statistic falls in the rejection region (t = 20.83 > 1.99), H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between the log of the average number of infections per leaf and the percentage of leaves infected at α = .05.

f.

For xp = 80%, yˆ = −.939 + .020(80) = .661. The antilog (base 10) of .661 is 4.58. Thus, when the percentage of leaves infected is 80%, the average number of infections per leaf is predicted to be 4.58.

10.88

a.

A straight line model relating an NFL team’s current value to its operating income is: y = β0 + β1x + ε

b.

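\[ \sum x = 1{,}037.6 \qquad \sum y = 26{,}207 \qquad \sum xy = 879{,}473.1 \qquad \sum x^2 = 38{,}996.28 \qquad \sum y^2 = 22{,}024{,}389 \]

\[ \bar{x} = \frac{\sum x}{n} = \frac{1{,}037.6}{32} = 32.425 \qquad \bar{y} = \frac{\sum y}{n} = \frac{26{,}207}{32} = 818.96875 \]

\[ SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 879{,}473.1 - \frac{1{,}037.6(26{,}207)}{32} = 879{,}473.1 - 849{,}761.975 = 29{,}711.125 \]

\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 38{,}996.28 - \frac{(1{,}037.6)^2}{32} = 38{,}996.28 - 33{,}644.18 = 5{,}352.1 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{29{,}711.125}{5{,}352.1} = 5.551302293 \approx 5.551 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 818.96875 - (5.551302293)(32.425) = 638.9677731 \approx 638.968 \]

The fitted regression line is: yˆ = 638.968 + 5.551x

c.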

βˆ1 = 5.551. When operating income increases by 1 million dollars, the mean current value is estimated to increase by 5.551 million dollars. This is meaningful for values of operating income between 7.8 and 54.3 million dollars.

βˆ0 = 638.968. This has no meaning since x = 0 is not in the observed range. d.

Some additional calculations are:

\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 22{,}024{,}389 - \frac{(26{,}207)^2}{32} = 22{,}024{,}389 - 21{,}462{,}714.03 = 561{,}674.97 \]

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 561{,}674.97 - 5.551302293(29{,}711.125) = 396{,}739.5337 \]

\[ s^2 = MSE = \frac{SSE}{n - 2} = \frac{396{,}739.5337}{32 - 2} = 13{,}224.65112 \qquad \text{and } s = \sqrt{13{,}224.65112} = 114.9985 \]

To determine if a linear relationship exists between current value and operating income, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is

\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{5.551 - 0}{114.9985 / \sqrt{5{,}352.1}} = 3.53 \]

No α was given so we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n – 2 = 32 – 2 = 30. From Table VI, Appendix B, t.025 = 2.042. The rejection region is t > 2.042 or t < −2.042. Since the observed value of the test statistic falls in the rejection region (t = 3.53 > 2.042), H0 is rejected. There is sufficient evidence to indicate a significant linear relationship between current value and operating income at α = .05.

\[ r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{561{,}674.97 - 396{,}739.5337}{561{,}674.97} = .29365 \approx .294 \]

29.4% of the sample variation around the sample mean current value is explained by the linear relationship between current value and operating income. There is a significant linear relationship between current value and operating income. However, the relationship is not particularly strong. 10.90

a.

Using MINITAB, the regression analysis is:

Regression Analysis: BTU versus Area

The regression equation is
BTU = - 99045 + 103 Area

Predictor      Coef       SE Coef      T        P
Constant       -99045     261618       -0.38    0.709
Area           102.81     15.86        6.48     0.000

S = 628185    R-Sq = 67.8%    R-Sq(adj) = 66.1%

Analysis of Variance

Source            DF    SS             MS             F        P
Regression        1     1.65850E+13    1.65850E+13    42.03    0.000
Residual Error    20    7.89232E+12    3.94616E+11
Total             21    2.44773E+13

Predicted Values for New Observations

New Obs    Fit       SE Fit    95.0% CI              95.0% PI
1          723467    165874    (377459, 1069475)     (-631816, 2078750)

Values of Predictors for New Observations

New Obs    Area
1          8000

βˆ0 (INTERCEP) = −99045
βˆ1 (AREA) = 102.81

b.

To determine if energy consumption is positively linearly related to the shell area, we test: H0: β1 = 0 Ha: β1 > 0 The test statistic is t = 6.48 (from printout). The rejection region requires α = .10 in the upper tail of the t-distribution with df = n − 2 = 22 − 2 = 20. From Table VI, Appendix B, t.10 = 1.325. The rejection region is t > 1.325. Since the observed value of the test statistic falls in the rejection region (t = 6.48 > 1.325), H0 is rejected. There is sufficient evidence to indicate that energy consumption is positively linearly related to the shell area at α = .10.

c.

Since this is a one-tailed test but the output calculates the p-value for a two-tailed test, the observed significance level is:

\[ \tfrac{1}{2}\left(\text{Prob} > |T|\right) \le \tfrac{1}{2}(.000) = .000 \]

This is the probability of observing our value of t (6.481) or anything larger if β1 = 0. Since this probability is so small, there is strong evidence to reject H0. d.

r2 = R-Square = .678 67.8% of the total sample variability in energy consumption around its mean is explained by the linear relationship between energy consumption and shell area.

e.

From the printout, for xp = 8000, yˆ = 723,467. The 95% prediction interval is (−631,816, 2,078,750). This interval is so wide, and includes negative BTU values, that it is not very useful.




10.92


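\[ \sum x = 4305 \qquad \sum y = 201{,}558 \qquad \sum x^2 = 1{,}652{,}025 \qquad \sum xy = 76{,}652{,}695 \qquad \sum y^2 = 3{,}571{,}211{,}200 \]

a.

\[ \hat{\beta}_1 = \frac{\sum xy}{\sum x^2} = \frac{76{,}652{,}695}{1{,}652{,}025} = 46.39923427 \approx 46.3992 \]

The least squares line is yˆ = 46.3992x.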

b.

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 76{,}652{,}695 - \frac{4305(201{,}558)}{15} = 18{,}805{,}549 \]

\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 1{,}652{,}025 - \frac{4305^2}{15} = 416{,}490 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{18{,}805{,}549}{416{,}490} = 45.15246224 \approx 45.1525 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{201{,}558}{15} - 45.15246224\left(\frac{4305}{15}\right) = 478.4433 \]

The least squares line is yˆ = 478.4433 + 45.1525x.

c.

Because x = 0 is not in the observed range, we are trying to represent the data on the observed interval with the best fitting line. We are not concerned with whether the line goes through (0, 0) or not.
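The two fits in parts a and b use different slope formulas: the line through the origin uses ∑xy/∑x², while the line with an intercept uses SSxy/SSxx. A minimal Python sketch of the contrast, using the summary sums from this exercise (the variable names are illustrative):

```python
# Summary sums for Exercise 10.92 (n = 15)
n = 15
sum_x, sum_y = 4305, 201558
sum_x2, sum_xy = 1652025, 76652695

# Part a: line forced through the origin, y-hat = b1 * x
slope_origin = sum_xy / sum_x2

# Part b: ordinary least squares line, y-hat = b0 + b1 * x
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
slope = ss_xy / ss_xx
intercept = sum_y / n - slope * sum_x / n

print(round(slope_origin, 4), round(slope, 4), round(intercept, 4))
```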


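d.

\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 3{,}571{,}211{,}200 - \frac{201{,}558^2}{15} = 862{,}836{,}042 \]

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 862{,}836{,}042 - 45.15246224(18{,}805{,}549) = 13{,}719{,}200.88 \]

\[ s^2 = \frac{SSE}{n - 2} = \frac{13{,}719{,}200.88}{15 - 2} = 1{,}055{,}323.145 \qquad s = 1027.2892 \]

To determine whether β0 differs from 0, we test:

H0: β0 = 0
Ha: β0 ≠ 0

The test statistic is

\[ t = \frac{\hat{\beta}_0 - 0}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}}} = \frac{478.443}{1027.2892\sqrt{\dfrac{1}{15} + \dfrac{287^2}{416{,}490}}} = .906 \]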

The rejection region requires α/2 = .10/2 = .05 in each tail of the t-distribution with df = n − 2 = 15 − 2 = 13. From Table VI, Appendix B, t.05 = 1.771. The rejection region is t < −1.771 or t > 1.771. Since the observed value of the test statistic does not fall in the rejection region (t = .906 ≯ 1.771), H0 is not rejected. There is insufficient evidence to indicate β0 is different from 0 at α = .10. Thus, β0 should not be included in the model.

10.94

Answers may vary. Possible answer: The scaffold-drop survey provides the most accurate estimate of spall rate in a given wall segment. However, the drop areas were not selected at random from the entire complex; rather, drops were made at areas with high spall concentrations. Therefore, if the photo spall rates could be shown to be related to drop spall rates, then the 83 photo spall rates could be used to predict what the drop spall rates would be. a.

Construct a scattergram for the data.

The scattergram shows a positive relationship between the photo spall rate (x) and the drop spall rate (y).




b.

Find the prediction equation for drop spall rate. The MINITAB output shows the results of the analysis.

The regression equation is
drop = 2.55 + 2.76 photo

Predictor      Coef      StDev     T        P
Constant       2.548     1.637     1.56     0.154
photo          2.7599    0.2180    12.66    0.000

S = 4.164    R-Sq = 94.7%    R-Sq(adj) = 94.1%

Analysis of Variance

Source            DF    SS        MS        F         P
Regression        1     2777.5    2777.5    160.23    0.000
Residual Error    9     156.0     17.3
Total             10    2933.5

Unusual Observations

Obs    photo    drop     Fit      StDev Fit    Residual    St Resid
11     11.8     43.00    35.11    1.97         7.89        2.15R

R denotes an observation with a large standardized residual

yˆ = 2.55 + 2.76x

c.

Conduct a formal statistical hypothesis test to determine if the photo spall rates contribute information for the prediction of drop spall rates. H0: β1 = 0 Ha: β1 ≠ 0 The test statistic is t = 12.66, with p-value < .0001. Reject H0 for any level of significance ≥ .0001. There is sufficient evidence to indicate that photo spall rates contribute information for the prediction of drop spall rates at α ≥ .0001.

d.

One could now use the 83 photo spall rates to predict values for 83 drop spall rates, then use this information to estimate the true spall rate at a given wall segment and to estimate the total spall damage.


Chapter 11

Multiple Regression and Model Building

11.2

a.

βˆ0 = 506.346, βˆ1 = −941.900, βˆ2 = −429.060

b.

yˆ = 506.346 − 941.900x1 − 429.060x2

c.

SSE = 151,016, MSE = 8883, s = 94.251

We expect about 95% of the y-values to fall within ±2s or ±2(94.251) or ±188.502 units of the fitted regression equation.

d.

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is

\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-941.900}{275.08} = -3.42 \]

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 20 - (2 + 1) = 17. From Table VI, Appendix B, t.025 = 2.110. The rejection region is t < −2.110 or t > 2.110. Since the observed value of the test statistic falls in the rejection region (t = −3.42 < −2.110), H0 is rejected. There is sufficient evidence to indicate β1 ≠ 0 at α = .05. e.

For confidence coefficient .95, α = .05 and α/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 20 − (2 + 1) = 17, t.025 = 2.110. The 95% confidence interval is:

\[ \hat{\beta}_2 \pm t_{.025}\, s_{\hat{\beta}_2} \Rightarrow -429.060 \pm 2.110(379.83) \Rightarrow -429.060 \pm 801.441 \Rightarrow (-1230.501,\ 372.381) \]

f.

R2 = R-Sq = 45.9% . 45.9% of the total sample variation of the y values is explained by the model containing x1 and x2. R2a = R-Sq(adj) = 39.6%. 39.6% of the total sample variation of the y values is explained by the model containing x1 and x2, adjusted for the sample size and the number of parameters in the model.




g.

To determine if at least one of the independent variables is significant in prediction y, we test: H0: β1 = β2 = 0 Ha: At least one βi ≠ 0 From the printout, the test statistic is F = 7.22 Since no α level was given, we will choose α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 2 and ν2 = n – (k + 1) = 20 – (2 + 1) = 17. From Table IX, Appendix B, F.05 = 3.59. The rejection region is F > 3.59. Since the observed value of the test statistic falls in the rejection region ( F = 7.22 > 3.59), H0 is rejected. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at α = .05.

h.

The observed significance level of the test is p-value = 0.005. Since the p-value is so small, we will reject H0 for most reasonable values of α. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at α greater than 0.005.

11.4

a.

We are given βˆ1 = 3.1, sβˆ1 = 2.3, and n = 25.

H0: β1 = 0
Ha: β1 > 0

The test statistic is

\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{3.1}{2.3} = 1.35 \]

The rejection region requires α = .05 in the upper tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717. Since the observed value of the test statistic does not fall in the rejection region (t = 1.35 ≯ 1.717), H0 is not rejected. There is insufficient evidence to indicate β1 > 0 at α = .05.

b.

We are given βˆ2 = .92, sβˆ2 = .27, and n = 25.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is

\[ t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{.92}{.27} = 3.41 \]

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074. Since the observed value of the test statistic falls in the rejection region (t = 3.41 > 2.074), reject H0. There is sufficient evidence to indicate β2 ≠ 0 at α = .05. c.

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.05 = 1.717. The confidence interval is:

\[ \hat{\beta}_1 \pm t_{.05}\, s_{\hat{\beta}_1} \Rightarrow 3.1 \pm 1.717(2.3) \Rightarrow 3.1 \pm 3.949 \Rightarrow (-.849,\ 7.049) \]

We are 90% confident that β1 falls between −.849 and 7.049. d.

For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.005 = 2.819. The confidence interval is:

\[ \hat{\beta}_2 \pm t_{.005}\, s_{\hat{\beta}_2} \Rightarrow .92 \pm 2.819(.27) \Rightarrow .92 \pm .761 \Rightarrow (.159,\ 1.681) \]

We are 99% confident that β2 falls between .159 and 1.681. 11.6

a.

For x2 = 1 and x3 = 3, E(y) = 1 + 2x1 + 1 − 3(3) E(y) = 2x1 − 7 The graph is :




b.

For x2 = −1 and x3 = 1 E(y) = 1 + 2x1 + (−1) − 3(1) E(y) = 2x1 − 3 The graph is:

c.

They are parallel, each with a slope of 2. They have different y-intercepts.

d.

The relationship will be parallel lines.

11.8

No. There may be other independent variables that are important that have not been included in the model, while there may also be some variables included in the model which are not important. The only conclusion is that at least one of the independent variables is a good predictor of y.

11.10

a.

The first order model is: E(y) = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5

b.

R2 = .58. 58% of the total sample variation of the levels of trust is explained by the model containing the 5 independent variables.

c.

The test statistic is

\[ F = \frac{R^2 / k}{(1 - R^2) / [n - (k + 1)]} = \frac{.58 / 5}{(1 - .58) / [66 - (5 + 1)]} = 16.57 \]

d.

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = k = 5 and ν2 = n – (k + 1) = 66 – (5 + 1) = 60. From Table VIII, Appendix B, F.10 = 1.95. The rejection region is F > 1.95. Since the observed value of the test statistic falls in the rejection region (F = 16.57 > 1.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 5 independent variables is useful in the prediction of level of trust at α = .10.
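The global F statistic depends only on R², the sample size n, and the number of independent variables k. A short Python sketch of that calculation (the function name `global_f` is hypothetical):

```python
def global_f(r2: float, n: int, k: int) -> float:
    """F statistic for H0: beta_1 = ... = beta_k = 0, computed from R-squared."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


# Exercise 11.10: R^2 = .58, n = 66, k = 5
print(round(global_f(0.58, 66, 5), 2))
```

The same function applies to any first-order model once R², n, and k are known; the critical value still has to be read from the F table for the chosen α.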

11.12

a.

The least squares prediction equation is:

yˆ = 3.70 + .34 x1 + .49 x2 + .72 x3 + 1.14 x4 + 1.51x5 + .26 x6 − .14 x7 − .10 x8 − .10 x9 .



b.

βˆ0 = 3.70. This is the estimate of the y-intercept. It has no other meaning because the point with all independent variables equal to 0 is not in the observed range.

βˆ1 = 0.34. For each additional walk, the mean number of runs scored is estimated to increase by .34, holding all other variables constant.

βˆ2 = 0.49 . For each additional single, the mean number of runs scored is estimated to increase by .49, holding all other variables constant.

βˆ3 = 0.72 . For each additional double, the mean number of runs scored is estimated to increase by .72, holding all other variables constant.

βˆ4 = 1.14 . For each additional triple, the mean number of runs scored is estimated to increase by 1.14, holding all other variables constant.

βˆ5 = 1.51 . For each additional home run, the mean number of runs scored is estimated to increase by 1.51, holding all other variables constant.

βˆ6 = 0.26 . For each additional stolen base, the mean number of runs scored is estimated to increase by .26, holding all other variables constant.

βˆ7 = −0.14. For each additional time a runner is caught stealing, the mean number of runs scored is estimated to decrease by .14, holding all other variables constant.

βˆ8 = −0.10 . For each additional strikeout, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.

βˆ9 = −0.10 . For each additional out, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant. c.

H0: β7 = 0
Ha: β7 < 0

The test statistic is

\[ t = \frac{\hat{\beta}_7 - 0}{s_{\hat{\beta}_7}} = \frac{-.14 - 0}{.14} = -1.00 \]

The rejection region requires α = .05 in the lower tail of the t-distribution with df = n – (k + 1) = 234 – (9 + 1) = 224. From Table VI, Appendix B, t.05 = 1.645. The rejection region is t < −1.645. Since the observed value of the test statistic does not fall in the rejection region (t = −1.00 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that β7 is less than 0 at α = .05.


d.

For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = 224, t.025 = 1.96. The 95% confidence interval is:

\[ \hat{\beta}_5 \pm t_{\alpha/2}\, s_{\hat{\beta}_5} \Rightarrow 1.51 \pm 1.96(.05) \Rightarrow 1.51 \pm 0.098 \Rightarrow (1.412,\ 1.608) \]

We are 95% confident that the mean number of runs will increase by anywhere from 1.412 to 1.608 for each additional home run, holding all other variables constant.

11.14

a.

R2 = .31. 31% of the total sample variation of the natural log of the level of CO2 emissions in 1996 is explained by the model containing the 7 independent variables.

b.

The test statistic is

\[ F = \frac{R^2 / k}{(1 - R^2) / [n - (k + 1)]} = \frac{.31 / 7}{(1 - .31) / [66 - (7 + 1)]} = 3.72 \]

The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 7 and ν2 = n – (k + 1) = 66 – (7 + 1) = 58. From Table XI, Appendix B, F.01 = 2.95. The rejection region is F > 2.95. Since the observed value of the test statistic falls in the rejection region (F = 3.72 > 2.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 7 independent variables is useful in the prediction of natural log of the level of CO2 emissions in 1996 at α = .01. c.

To determine if foreign investments in 1980 is a useful predictor of CO2 emissions in 1996, we test: H0: β1 = 0 Ha: β1 ≠ 0

d.

The test statistic is t = 2.52 and the p-value is p < 0.05. Since the observed p-value is less than α (p < .05), H0 is rejected. There is sufficient evidence to indicate foreign investments in 1980 is a useful predictor of CO2 emissions in 1996 at α = .05.

11.16

a.

From MINITAB, the output is:

Regression Analysis: DDT versus Mile, Length, Weight

The regression equation is
DDT = - 108 + 0.0851 Mile + 3.77 Length - 0.0494 Weight

Predictor      Coef        SE Coef     T        P
Constant       -108.07     62.70       -1.72    0.087
Mile           0.08509     0.08221     1.03     0.302
Length         3.771       1.619       2.33     0.021
Weight         -0.04941    0.02926     -1.69    0.094

S = 97.48    R-Sq = 3.9%    R-Sq(adj) = 1.8%

Analysis of Variance

Source            DF     SS         MS       F       P
Regression        3      53794      17931    1.89    0.135
Residual Error    140    1330210    9501
Total             143    1384003


The least squares prediction equation is:

yˆ = −108.07 + 0.08509x1 + 3.771x2 − 0.04941x3

b.

s = 97.48. We would expect about 95% of the observed values of DDT level to fall within 2s or 2(97.48) = 194.96 units of their least squares predicted values.

c.

To determine if at least one of the variables is useful in predicting the DDT level, we test: H0: β1 = β2 = β3 = 0 Ha: At least 1 βi ≠ 0

The test statistic is F = 1.89 and the p-value is p = .135. Since the p-value is not less than α = .05 (p = .135 > .05), H0 is not rejected. There is insufficient evidence to indicate that at least one of the variables is useful in predicting the DDT level at α = .05.

d.

To determine if DDT level increases as length increases, we test: H0: β2 = 0 Ha: β2 > 0

The test statistic is t = 2.33. The p-value is p = .021/2 = .0105. Since the p-value is less than α (p = .0105 < .05), H0 is rejected. There is sufficient evidence to indicate that DDT level increases as length increases, holding the other variables constant, at α = .05. The observed significance level is p = .0105.

e.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 144 − (3 + 1) = 140, t.025 = 1.96. The 95% confidence interval is:

βˆ3 ± tα/2 sβˆ3 ⇒ −0.04941 ± 1.96(0.02926) ⇒ −0.04941 ± 0.05735 ⇒ (−0.10676, 0.00794)

We are 95% confident that the change in the mean DDT level for each additional unit increase in weight is between −0.10676 and 0.00794, holding length and mile constant. Since 0 is in the interval, there is no evidence that weight and DDT level are linearly related.
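The interval arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `coefficient_interval` is illustrative, and the critical value 1.96 is taken from the solution rather than computed.

```python
def coefficient_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate ± t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

# Weight coefficient and its standard error from the MINITAB output;
# t_crit = 1.96 is the table value quoted in the solution.
lower, upper = coefficient_interval(-0.04941, 0.02926, 1.96)
contains_zero = lower < 0 < upper  # True means no evidence of a linear relationship

print(round(lower, 5), round(upper, 5), contains_zero)
```

The same function applies to any coefficient interval in this chapter once the estimate, its standard error, and the table value of t are supplied.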




11.18

a.

From MINITAB, the output is:

Regression Analysis: WeightChg versus Digest, Fiber

The regression equation is
WeightChg = 12.2 - 0.0265 Digest - 0.458 Fiber

Predictor      Coef  SE Coef      T      P
Constant     12.180    4.402   2.77  0.009
Digest     -0.02654  0.05349  -0.50  0.623
Fiber       -0.4578   0.1283  -3.57  0.001

S = 3.519   R-Sq = 52.9%   R-Sq(adj) = 50.5%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2   542.03  271.02  21.88  0.000
Residual Error  39   483.08   12.39
Total           41  1025.12

The least squares prediction equation is yˆ = 12.2 − .0265x1 − .458x2

b.

βˆ0 = 12.2 = the estimate of the y-intercept.

βˆ1 = −.0265. We estimate that the mean weight change will decrease by .0265% for each additional 1% increase in digestion efficiency, with acid-detergent fiber held constant.

βˆ2 = −.458. We estimate that the mean weight change will decrease by .458% for each additional 1% increase in acid-detergent fiber, with digestion efficiency held constant.

c.

To determine if digestion efficiency is a useful predictor of weight change, we test: H0: β1 = 0 Ha: β1 ≠ 0 The test statistic is t = −.50. The p-value is p = .623. Since the p-value is greater than α (p = .623 > .01), H0 is not rejected. There is insufficient evidence to indicate that digestion efficiency is a useful linear predictor of weight change at α = .01.

d.

For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 42 − (2 + 1) = 39, t.005 ≈ 2.704. The 99% confidence interval is:

βˆ2 ± t.005 sβˆ2 ⇒ −.4578 ± 2.704(.1283) ⇒ −.4578 ± .3469 ⇒ (−.8047, −.1109)

We are 99% confident that the change in mean weight change for each unit increase in acid-detergent fiber, holding digestion efficiency constant, is between −.8047% and −.1109%.



e.

R² = R-Sq = 52.9%. 52.9% of the total sample variation of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber. R²a = R-Sq(adj) = 50.5%. 50.5% of the total sample variation of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber, adjusting for the sample size and the number of parameters in the model.

f.

To determine if at least one of the variables is useful in predicting weight change, we test: H0: β1 = β2 = 0 Ha: At least 1 βi ≠ 0 The test statistic is F = 21.88 and the p-value is p = .000. Since the p-value is less than α = .05 (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate at least one of the variables is useful in predicting weight change at α = .05.
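The F statistic and R² reported by MINITAB can be reproduced from the sums of squares in the analysis of variance table. The following is a minimal sketch; the function name `anova_summary` is illustrative and not part of any package.

```python
def anova_summary(ss_regression, ss_error, k, n):
    """Mean squares, F, and R-squared for a model with k predictors and n observations."""
    df_error = n - (k + 1)
    msr = ss_regression / k
    mse = ss_error / df_error
    return {
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "R2": ss_regression / (ss_regression + ss_error),
    }

# Sums of squares from the WeightChg printout: k = 2 predictors, n = 42 observations.
summary = anova_summary(542.03, 483.08, k=2, n=42)
print({name: round(value, 3) for name, value in summary.items()})
```

Small differences from the printout in the last digit come from rounding in the displayed sums of squares.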

11.20

a.

The least squares prediction equation is: yˆ = −4.30 − .002x1 + .336x2 + .384x3 + .067x4 − .143x5 + .081x6 + .134x7

b.

To determine if the model is adequate, we test: H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0 Ha: At least one βi ≠ 0, i = 1, 2, 3, ..., 7 The test statistic is F = 111.1 (from table). Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 7 and ν2 = n − (k + 1) = 268 − (7 + 1) = 260. From Table IX, Appendix B, F.05 ≈ 2.01. The rejection region is F > 2.01. Since the observed value of the test statistic falls in the rejection region (F = 111.1 > 2.01), H0 is rejected. There is sufficient evidence to indicate that the model is adequate for predicting the logarithm of the audit fees at α = .05.

c.

βˆ3 = .384.

For each additional subsidiary of the auditee, the mean of the logarithm of audit fee is estimated to increase by .384 units.




d.

To determine if β4 > 0, we test: H0: β4 = 0 Ha: β4 > 0 The test statistic is t = 1.76 (from table). The p-value for the test is .079. Since the p-value is not less than α (p = .079 > .05), H0 is not rejected. There is insufficient evidence to indicate that β4 > 0, holding all the other variables constant, at α = .05.

e.

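To determine if β1 < 0, we test: H0: β1 = 0 Ha: β1 < 0 The test statistic is t = −0.049 (from table). The p-value for the test is .961. Since the p-value is not less than α (p = .961 > .05), H0 is not rejected. There is insufficient evidence to indicate that β1 < 0, holding all the other variables constant, at α = .05.
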
11.22

To determine if the model is useful, we test: H0: β1 = β2 = ⋅⋅⋅ = β18 = 0 Ha: At least one βi ≠ 0, i = 1, 2, ... , 18 The test statistic is

F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.95/18) / {(1 − .95)/[20 − (18 + 1)]} = 1.06

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 18 and ν2 = n − (k + 1) = 20 − (18 + 1) = 1. From Table IX, Appendix B, F.05 ≈ 245.9. The rejection region is F > 245.9. Since the observed value of the test statistic does not fall in the rejection region (F = 1.06 < 245.9), H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .05. Note: Although R2 is large, there are so many variables in the model that ν2 is small.
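When only R², k, and n are reported, the global F statistic can be computed directly from them. A short sketch follows; `f_from_r2` is an illustrative name.

```python
def f_from_r2(r2, k, n):
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

# Exercise 11.22: R^2 = .95 with 18 predictors and only 20 observations.
f_11_22 = f_from_r2(0.95, k=18, n=20)
# Exercise 11.14: R^2 = .31 with 7 predictors and 66 observations.
f_11_14 = f_from_r2(0.31, k=7, n=66)

print(round(f_11_22, 2), round(f_11_14, 2))
```

Comparing the two calls illustrates the note above: a large R² can still give a small F when n − (k + 1) is tiny.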



11.24

a.

From MINITAB, the output is:

Regression Analysis: Labor versus Pounds, Units, Weight

The regression equation is
Labor = 132 + 2.73 Pounds + 0.0472 Units - 2.59 Weight

Predictor     Coef  SE Coef      T      P
Constant    131.92    25.69   5.13  0.000
Pounds       2.726    2.275   1.20  0.248
Units      0.04722  0.09335   0.51  0.620
Weight     -2.5874   0.6428  -4.03  0.001

S = 9.810   R-Sq = 77.0%   R-Sq(adj) = 72.7%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  5158.3  1719.4  17.87  0.000
Residual Error  16  1539.9    96.2
Total           19  6698.2

Source  DF  Seq SS
Pounds   1  3400.6
Units    1   198.4
Weight   1  1559.3

The least squares equation is: yˆ = 131.92 + 2.726x1 + .0472x2 − 2.587x3

b.

To test the usefulness of the model, we test: H0: β1 = β2 = β3 = 0 Ha: At least one βi ≠ 0, for i = 1, 2, 3 The test statistic is

F = MSR/MSE = 1719.4/96.2 = 17.87

The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 3 and ν2 = n − (k + 1) = 20 − (3 + 1) = 16. From Table XI, Appendix B, F.01 = 5.29. The rejection region is F > 5.29. Since the observed value of the test statistic falls in the rejection region (F = 17.87 > 5.29), H0 is rejected. There is sufficient evidence to indicate a relationship exists between hours of labor and at least one of the independent variables at α = .01. c.

H0: β2 = 0 Ha: β2 ≠ 0 The test statistic is t = .51. The p-value = .620. We reject H0 if p-value < α. Since .620 > .05, do not reject H0. There is insufficient evidence to indicate a relationship exists between hours of labor and percentage of units shipped by truck, all other variables held constant, at α = .05.




d.

R2 is printed as R-Sq. R2 = .770. We conclude that 77% of the sample variation of the labor hours is explained by the regression model, including the independent variables pounds shipped, percentage of units shipped by truck, and weight.

e.

If the average number of pounds per shipment increases from 20 to 21, the estimated change in mean number of hours of labor is −2.587. Thus, it will cost $7.50(2.587) = $19.4025 less, if the variables x1 and x2 are constant.

f.

Since s = Standard Error = 9.81, we can estimate approximately with ±2s precision or ±2(9.81) or ±19.62 hours.

g.

No. Regression analysis only determines if variables are related. It cannot be used to determine cause and effect.

11.26

From the printout, the 90% prediction interval is (−151.996, 175.4874). We are 90% confident that the actual DDT level for a fish caught 100 miles upstream that is 40 centimeters long and weighs 800 grams will be between −151.996 and 175.4874. Since the DDT level cannot be negative, the interval would be between 0 and 175.4874.

11.28

a.

From MINITAB, the output is:

Regression Analysis: Precip versus Altitude, Latit, Coast

The regression equation is
Precip = - 102 + 0.00409 Altitude + 3.45 Latit - 0.143 Coast

Predictor      Coef   SE Coef      T      P
Constant    -102.36     29.21  -3.50  0.002
Altitude   0.004091  0.001218   3.36  0.002
Latit        3.4511    0.7949   4.34  0.000
Coast      -0.14286   0.03634  -3.93  0.001

S = 11.10   R-Sq = 60.0%   R-Sq(adj) = 55.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  4809.4  1603.1  13.02  0.000
Residual Error  26  3202.3   123.2
Total           29  8011.7

Source    DF  Seq SS
Altitude   1   730.7
Latit      1  2175.3
Coast      1  1903.4

Predicted Values for New Observations
  Fit  SE Fit         95.0% CI         95.0% PI
29.25    5.60  (17.75, 40.76)   (3.71, 54.80)

Values of Predictors for New Observations
New Obs  Altitude  Latit  Coast
      1      6360   36.6    145

The fitted regression line is:

yˆ = −102.36 + 0.00409 x1 + 3.4511x2 − 0.1429 x3



b.

To determine if the first-order model is useful for predicting annual precipitation, we test:

H0: β1 = β2 = β3 = 0 Ha: At least one βi ≠ 0, i = 1, 2, 3 The test statistic is 13.02 and the p-value is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that the model is useful for predicting annual precipitation at α = .05. c.

The prediction interval is (3.71, 54.80). With 95% confidence, we can conclude that the annual precipitation for an individual meteorological station with characteristics x1 = 6360 feet, x2 = 36.6°, x3 = 145 miles will fall between 3.71 inches and 54.80 inches.
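The fitted value and prediction interval on the printout can be approximated by hand from the coefficient table. The sketch below assumes the usual prediction-interval form, fit ± t·sqrt(s² + (SE Fit)²), and uses t.025 = 2.056 for 26 df, a table value that is not quoted in the solution; the function names are illustrative.

```python
import math

def predict(coefs, x):
    """coefs = [b0, b1, ..., bk]; x = [x1, ..., xk]."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))

def prediction_interval(fit, s, se_fit, t_crit):
    """Interval for an individual observation: fit ± t * sqrt(s^2 + se_fit^2)."""
    half_width = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
    return fit - half_width, fit + half_width

# Coefficients, S, and SE Fit from the Precip printout.
coefs = [-102.36, 0.004091, 3.4511, -0.14286]
fit = predict(coefs, [6360, 36.6, 145])
# t_crit = 2.056 is an assumed table value (alpha/2 = .025, df = 26).
lower, upper = prediction_interval(fit, s=11.10, se_fit=5.60, t_crit=2.056)

print(round(fit, 2), round(lower, 2), round(upper, 2))
```

The result agrees with the MINITAB interval (3.71, 54.80) to within rounding of the printed inputs.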

11.30

The first order model is:

E(y) = β0 + β1x1 + β2x2 + β3x5

We want to find a 95% prediction interval for the actual voltage when the volume fraction of the disperse phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2). Using MINITAB, the output is:

The regression equation is
y = 0.993 - 0.0243 x1 + 0.142 x2 + 0.385 x5

Predictor       Coef     StDev      T      P
Constant      0.9326    0.2482   3.76  0.002
x1         -0.024272  0.004900  -4.95  0.000
x2           0.14206   0.07573   1.88  0.080
x5           0.38457   0.09801   3.92  0.001

S = 0.4796   R-Sq = 66.6%   R-Sq(adj) = 59.9%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       3   6.8701  2.2900  9.95  0.001
Residual Error  15   3.4509  0.2301
Total           18  10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x5       1  3.5422

Unusual Observations
Obs    x1      y    Fit  StDev Fit  Residual  St Resid
  3  40.0  3.200  2.068      0.239     1.132     2.72R

R denotes an observation with a large standardized residual

Predicted Values
   Fit  StDev Fit          95.0% CI          95.0% PI
-0.098      0.232  (-0.592, 0.396)  (-1.233, 1.038)

The 95% prediction interval is (−1.233, 1.038). We are 95% confident that the actual voltage is between −1.233 and 1.038 kw/cm when the volume fraction of the disperse phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2). 11.32

a.

E(y) = β0 + β1x1 + β2x2 + β3x1x2

b.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3

11.34

a.

R² = 1 − SSE/SSyy = 1 − 21/479 = .956

95.6% of the total variability of the y values is explained by this model. b.

To test the utility of the model, we test:

H0: β1 = β2 = β3 = 0 Ha: At least one βi ≠ 0, i = 1, 2, 3 The test statistic is

F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.956/3) / {(1 − .956)/[32 − (3 + 1)]} = 202.8

The rejection region requires α = .05 in the upper tail of the F distribution, with ν1 = k = 3 and ν2 = n − (k + 1) = 32 − (3 + 1) = 28. From Table IX, Appendix B, F.05 = 2.95. The rejection region is F > 2.95. Since the observed value of the test statistic falls in the rejection region (F = 202.8 > 2.95), H0 is rejected. There is sufficient evidence that the model is adequate for predicting y at α = .05.



c.

The relationship between y and x1 depends on the level of x2.

d.

To determine if x1 and x2 interact, we test:

H0: β3 = 0 Ha: β3 ≠ 0

The test statistic is t = (βˆ3 − 0)/sβˆ3 = 10/4 = 2.5.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 32 − (3 + 1) = 28. From Table VI, Appendix B, t.025 = 2.048. The rejection region is t < −2.048 or t > 2.048. Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 2.048), H0 is rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .05. 11.36

a.

To determine if the overall model is useful for predicting y, we test:

H0: β1 = β2 = β3 = 0 Ha: At least one βi is not 0 The test statistic is F = 226.35 and the p-value is p < .001. Since the p-value is less than α (p < .001 < .05), H0 is rejected. There is sufficient evidence to indicate the overall model is useful for predicting y, the willingness of the consumer to shop at a retailer’s store in the future, at α = .05. b.

To determine if consumer satisfaction and retailer interest interact to affect willingness to shop at retailer’s shop in future, we test:

H0: β3 = 0 Ha: β3 ≠ 0 The test statistic is t = −3.09 and the p-value is p < .01. Since the p-value is less than α (p < .01 < .05), H0 is rejected. There is sufficient evidence to indicate consumer satisfaction and retailer interest interact to affect willingness to shop at retailer’s shop in future at α = .05. c.

When x2 = 1,

yˆ = βˆ0 + .426x1 + .044x2 − .157x1x2
   = βˆ0 + .426x1 + .044(1) − .157x1(1)
   = βˆ0 + .044 + (.426 − .157)x1
   = βˆ0 + .044 + .269x1

Since no value is given for βˆ0, we will use βˆ0 = 1 for graphing purposes. Using MINITAB, a graph might look like:

[Figure: Scatterplot of YHAT vs X1 when X2 = 1; vertical axis YHAT, horizontal axis X1 from 1 to 7]

d.

When x2 = 7,

yˆ = βˆ0 + .426x1 + .044x2 − .157x1x2
   = βˆ0 + .426x1 + .044(7) − .157x1(7)
   = βˆ0 + .308 + (.426 − 1.099)x1
   = βˆ0 + .308 − .673x1

Since no value is given for βˆ0, we will again use βˆ0 = 1 for graphing purposes. Using MINITAB, a graph might look like:

[Figure: Scatterplot of YHAT vs X1 when X2 = 7; vertical axis YHAT, horizontal axis X1 from 1 to 7]

e.

Using MINITAB, both plots on the same graph would be:

[Figure: Scatterplot of YHAT vs X1 with separate lines for x2 = 1 and x2 = 7; vertical axis YHAT, horizontal axis X1 from 1 to 7]

Since the lines are not parallel, it indicates that interaction is present. 11.38

a.

The hypothesized regression model including the interaction between x1 and x2 would be:

E(y) = β0 + β1x1 + β2x2 + β3x1x2

b.

If “x1 and x2 interact to affect y” then the effect of x1 on y depends on the level of x2. Also, the effect of x2 on y depends on the level of x1.




c.

Since the p-value is not small (p = .25), H0 is not rejected. There is insufficient evidence to indicate x1 and x2 interact to affect y.

d.

β1 corresponds to x1, the number ahead in line. If the “negative feeling score” gets larger as the number of people ahead increases, then β1 is positive. β2 corresponds to x2, the number behind in line. If the “negative feeling score” gets lower as the number of people behind increases, then β2 is negative.

11.40

a.

If client credibility and linguistic delivery style interact, then the effect of client credibility on the likelihood value depends on the level of linguistic delivery style.

b.

To determine the overall model adequacy, we test:

H0: β1 = β2 = β3 = 0 Ha: At least one βi ≠ 0 c.

The test statistic is F = 55.35 and the p-value is p < 0.0005. Since the p-value is so small (p < 0.0005), H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate that the model is adequate at α > 0.0005.

d.

To determine if client credibility and linguistic delivery style interact, we test:

H0: β3 = 0 Ha: β3 ≠ 0 e.

The test statistic is t = 4.008 and the p-value is p < 0.005. Since the p-value is so small (p < 0.005), H0 is rejected. There is sufficient evidence to indicate that client credibility and linguistic delivery style interact at α > 0.005.

f.

When x1 = 22, the least squares line is:

yˆ = 15.865 + 0.037(22) − 0.678 x2 + 0.036 x2 (22) = 16.679 + 0.114 x2 The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 22 is 0.114. When client credibility is equal to 22, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.114. g.

When x1 = 46, the least squares line is:

yˆ = 15.865 + 0.037(46) − 0.678 x2 + 0.036 x2 (46) = 17.567 + 0.978 x2 The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 46 is 0.978. When client credibility is equal to 46, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.978.
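The substitutions in parts f and g follow one pattern: fixing x1 in an interaction model leaves a straight line in x2. A small sketch of that pattern follows; the function name `conditional_line` is illustrative.

```python
def conditional_line(b0, b1, b2, b3, x1):
    """For y = b0 + b1*x1 + b2*x2 + b3*x1*x2, return (intercept, slope) of the line in x2."""
    intercept = b0 + b1 * x1
    slope = b2 + b3 * x1
    return intercept, slope

# Estimates from the least squares equation used in parts f and g.
estimates = dict(b0=15.865, b1=0.037, b2=-0.678, b3=0.036)

for credibility in (22, 46):
    intercept, slope = conditional_line(x1=credibility, **estimates)
    print(credibility, round(intercept, 3), round(slope, 3))
```

Because the slope b2 + b3·x1 changes with x1, the two lines are not parallel, which is what interaction means in this model.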



11.42

a.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

b.

H0: β4 = 0

c.

t = 4.408, p-value = .001 Since the p-value is so small, there is strong evidence to reject H0. There is sufficient evidence to indicate that the strength of client-therapist relationship contributes information for the prediction of a client's reaction for any α > .001.

d.

Answers may vary.

e.

R2 = .2946. 29.46% of the variability in the client's reaction scores can be explained by this model.

11.44

a.

βˆ1 = .02. The mean level of support for a military response is estimated to increase by .02 for each day increase in level of TV news exposure, all other variables held constant.

b.

To determine if an increase in TV news exposure is associated with an increase in support for military resolution, we test:

H0: β1 = 0 Ha: β1 > 0 The p-value is p = .03/2 = .015. Since the p-value is less than α (p = .015 < .05), H0 is rejected. There is sufficient evidence to indicate that an increase in TV news exposure is associated with an increase in support for military resolution, all other variables held constant, at α = .05. c.

To determine if the relationship between support for military resolution and gender depends on political knowledge, we test:

H0: β8 = 0 Ha: β8 ≠ 0 The p-value is p = .02. Since the p-value is less than α (p = .02 < .05), H0 is rejected. There is sufficient evidence to indicate that the relationship between support for a military resolution and gender depends on political knowledge, all other variables held constant, at α = .05. d.

To determine if the relationship between support for military resolution and race depends on political knowledge, we test:

H0: β9 = 0 Ha: β9 ≠ 0 The p-value is p = .08. Since the p-value is not less than α (p = .08 > .05), H0 is not rejected. There is insufficient evidence to indicate that the relationship between support for a military resolution and race depends on political knowledge, all other variables held constant, at α = .05. e.

R² = .194. 19.4% of the variation in the support for military resolution is explained by the model containing the seven independent variables and the two interaction terms.

f.

H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = β8 = β9 = 0 Ha: At least one βi ≠ 0, i = 1, 2, 3, ... , 9 The test statistic is

F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.194/9) / {(1 − .194)/[1763 − (9 + 1)]} = 46.88

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 9 and ν2 = n − (k + 1) = 1763 − (9 + 1) = 1753. From Table IX, Appendix B, F.05 ≈ 1.88. The rejection region is F > 1.88. Since the observed value of the test statistic falls in the rejection region (F = 46.88 > 1.88), H0 is rejected. There is sufficient evidence to indicate that the model is useful at α = .05. 11.46

a.


H0: β2 = 0 Ha: β2 ≠ 0

The test statistic is t = (βˆ2 − 0)/sβˆ2 = (.47 − 0)/.15 = 3.133

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074. Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 2.074), H0 is rejected. There is sufficient evidence to indicate the quadratic term should be included in the model at α = .05. b.

H0: β2 = 0 Ha: β2 > 0 The test statistic is the same as in part a, t = 3.133. The rejection region requires α = .05 in the upper tail of the t distribution with df = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717. Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 1.717), H0 is rejected. There is sufficient evidence to indicate the quadratic curve opens upward at α = .05.
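Parts a and b use the same test statistic and differ only in the rejection region. The following sketch makes that explicit; the critical values 2.074 and 1.717 are the table values quoted in the solution, and the function name is illustrative.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

t = t_statistic(0.47, 0.15)

# Part a: two-tailed test, alpha/2 = .025 in each tail, df = 22.
reject_two_tailed = abs(t) > 2.074
# Part b: upper-tailed test, alpha = .05 in the upper tail, df = 22.
reject_upper_tailed = t > 1.717

print(round(t, 3), reject_two_tailed, reject_upper_tailed)
```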



11.48

a.

b.

It moves the graph to the right (−2x) or to the left (+2x) compared to the graph of y = 1 + x².

c.

It controls whether the graph opens up (+x²) or down (−x²). It also controls how steep the curvature is, i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.

11.50

a.

βˆ0 has no meaning because x = 0 would not be in the observed range of values. In this case, x is the year with values between 1984 and 1999.

b.

βˆ1 = −321.67. Since the quadratic effect is included in the model, the linear term is just a location parameter and has no meaning.

c.

βˆ2 = .0794. Since the value of βˆ2 is positive, the curvature is upward.

d.

Since no data have been collected past 1999, we have no idea if the relationship between the two variables from 1984 to 1999 will remain the same until 2021.




11.52

a.

Using MINITAB, a sketch of the least squares prediction equation is:

[Figure: Scatterplot of yhat vs Dose; vertical axis yhat from 0 to 12, horizontal axis Dose from 0 to 800]

b.

For x = 500, yˆ = 10.25 + .0053(500) − .0000266(500²) = 10.25 + 2.65 − 6.65 = 6.25

c.

For x = 0, yˆ = 10.25 + .0053(0) − .0000266(0²) = 10.25

d.

For x = 100, yˆ = 10.25 + .0053(100) − .0000266(100²) = 10.25 + .53 − .266 = 10.514. This value is slightly larger than that for the control group (10.25).

For x = 200, yˆ = 10.25 + .0053(200) − .0000266(200²) = 10.25 + 1.06 − 1.064 = 10.246. This value is slightly smaller than that for the control group (10.25).

So, the largest value of x which yields an estimated weight change that is closest to, but just less than, the estimated weight change for the control group is x = 200.
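The evaluations in parts b through d can be reproduced by coding the prediction equation once. This is a sketch only; the function name and the extra `crossing_dose` calculation (the dose at which the curve returns to the control value, found by solving b1·x + b2·x² = 0) are additions for illustration.

```python
B0, B1, B2 = 10.25, 0.0053, -0.0000266

def predicted_weight_change(dose):
    """Second-order prediction equation: B0 + B1*dose + B2*dose^2."""
    return B0 + B1 * dose + B2 * dose ** 2

for dose in (0, 100, 200, 500):
    print(dose, round(predicted_weight_change(dose), 3))

# Nonzero dose at which the prediction equals the control value B0.
crossing_dose = -B1 / B2
print(round(crossing_dose, 2))
```

The crossing point lies just below 200, which is consistent with the prediction at x = 200 falling slightly under the control value.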

11.54

a.

A first order model is: E(y) = β0 + β1x

b.

A second order model is: E(y) = β0 + β1x + β2x²



c.

Using MINITAB, a scattergram of these data is:

[Figure: Scatterplot of International vs Domestic; vertical axis International from 0 to 1200, horizontal axis Domestic from 100 to 600]

From the plot, it appears that the first order model might fit the data better. There does not appear to be much of a curve to the relationship. d.

Using MINITAB, the output is:

Regression Analysis: International versus Domestic, Dsq

The regression equation is
International = 203 - 0.58 Domestic + 0.00364 Dsq

Predictor      Coef   SE Coef      T      P
Constant      202.9     245.0   0.83  0.424
Domestic     -0.581     1.510  -0.38  0.707
Dsq        0.003638  0.002085   1.74  0.107

S = 142.696   R-Sq = 78.8%   R-Sq(adj) = 75.2%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2   906515  453258  22.26  0.000
Residual Error  12   244345   20362
Total           14  1150860

Source    DF  Seq SS
Domestic   1  844526
Dsq        1   61990

To investigate the usefulness of the model, we test:

H0: β1 = β2 = 0 Ha: At least one βi ≠ 0, i = 1, 2




The test statistic is F = 22.26. The p-value is p = 0.000. Since the p-value is so small, we reject H0. There is sufficient evidence to indicate the model is useful for predicting foreign gross revenue. To determine if a curvilinear relationship exists between foreign and domestic gross revenues, we test:

H0: β2 = 0 Ha: β2 ≠ 0 The test statistic is t = 1.74. The p-value is p = 0.107. Since the p-value is greater than α = .05 (p = 0.107 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that a curvilinear relationship exists between foreign and domestic gross revenues at α = .05.

e.

From the analysis in part d, the first-order model better explains the variation in foreign gross revenues. In part d, we concluded that the second-order term did not improve the model.

11.56

a.

b.

It moves the graph to the right (−2x) or to the left (+2x) compared to the graph of y = 1 + x².

c.

It controls whether the graph opens up (+x²) or down (−x²). It also controls how steep the curvature is, i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.



11.58

a.

A scatterplot of the data is:

[Figure: character scatterplot of Y versus X; Y axis marked at 3500, 7000, and 10500, X axis from 0.0 to 40.0]

b.

From the plot, it looks like a second-order model would fit the data better than a first-order model. There is little evidence that a third-order model would fit the data better than a second-order model.

c.

Using MINITAB, the output for fitting a first-order model is:

The regression equation is
Y = 2752 + 122 X

Predictor    Coef  Stdev  t-ratio      p
Constant   2752.4  613.5     4.49  0.000
X          122.34  26.08     4.69  0.000

s = 1904   R-sq = 36.7%   R-sq(adj) = 35.0%

Analysis of Variance

SOURCE      DF         SS        MS      F      p
Regression   1   79775688  79775688  22.01  0.000
Error       38  137726224   3624374
Total       39  217501920

Unusual Observations
Obs.     X      Y   Fit  Stdev.Fit  Residual  St.Resid
  27  27.0   2007  6056        345     -4049    -2.16R
  40  40.0  11520  7646        591      3874     2.14R





To see if there is a significant linear relationship between day and demand, we test:

H0: β1 = 0 Ha: β1 ≠ 0 The test statistic is t = 4.69. The p-value for the test is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between day and demand at α = .05. d.

Using MINITAB, the output for fitting a second-order model is:

The regression equation is
Y = 5120 - 216 X + 8.25 XSQ

Predictor     Coef  Stdev  t-ratio      p
Constant    5120.2  816.9     6.27  0.000
X          -215.92  91.89    -2.35  0.024
XSQ          8.250  2.173     3.80  0.001

s = 1637   R-sq = 54.4%   R-sq(adj) = 52.0%

Analysis of Variance

SOURCE      DF         SS        MS      F      p
Regression   2  118377056  59188528  22.09  0.000
Error       37   99124856   2679050
Total       39  217501920

SOURCE  DF    SEQ SS
X        1  79775688
XSQ      1  38601372

Unusual Observations
Obs.     X     Y   Fit  Stdev.Fit  Residual  St.Resid
  27  27.0  2007  5305        357     -3298    -2.06R


To see if there is a significant quadratic relationship between day and demand, we test:

H0: β2 = 0 Ha: β2 ≠ 0 The test statistic is t = 3.80. The p-value for the test is p = 0.001. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a quadratic relationship between day and demand at α = .05.
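The R-sq and R-sq(adj) values on the two printouts can be recomputed from the sums of squares, which gives a side-by-side view of the first-order and second-order fits. A minimal sketch with illustrative function names:

```python
def r_squared(ss_regression, ss_total):
    return ss_regression / ss_total

def adjusted_r_squared(r2, n, k):
    """1 - (1 - R^2) * (n - 1) / (n - (k + 1)) for k predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

SS_TOTAL, N = 217501920, 40

# Regression sums of squares from the first-order and second-order printouts.
r2_first = r_squared(79775688, SS_TOTAL)
r2_second = r_squared(118377056, SS_TOTAL)
adj_first = adjusted_r_squared(r2_first, N, k=1)
adj_second = adjusted_r_squared(r2_second, N, k=2)

print(round(r2_first, 3), round(adj_first, 3))
print(round(r2_second, 3), round(adj_second, 3))
```

The adjusted value rises from about 35% to about 52% when the squared term is added, which agrees with the significant t test on β2.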



e.

Since the quadratic term is significant in the second-order model in part d, the second-order model is better.

11.60

The model is E(y) = β0 + β1x1 + β2x2 where

x1 = 1 if the variable is at level 2, 0 otherwise

x2 = 1 if the variable is at level 3, 0 otherwise

β0 = mean value of y when qualitative variable is at level 1. β1 = difference in mean value of y between level 2 and level 1 of qualitative variable. β2 = difference in mean value of y between level 3 and level 1 of qualitative variable. 11.62

a.

The least squares prediction equation is: yˆ = 80 + 16.8x1 + 40.4x2

b.

βˆ1 estimates the difference in the mean value of the dependent variable between level 2 and level 1 of the independent variable.

βˆ2 estimates the difference in the mean value of the dependent variable between level 3 and level 1 of the independent variable. c.

The hypothesis H0: β1 = β2 = 0 is the same as H0: μ1 = μ2 = μ3. The hypothesis Ha: At least one of the parameters β1 and β2 differs from 0 is the same as Ha: At least one mean (μ1, μ2, or μ3) is different.
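The link between the β parameters and the level means can be seen by coding the dummy variables explicitly. The sketch below uses the estimates from part a; the function names are illustrative, and level 1 is taken as the base level, as in the interpretations in part b.

```python
def dummy_code(level):
    """Return (x1, x2) for a three-level qualitative variable with level 1 as base."""
    return (1 if level == 2 else 0, 1 if level == 3 else 0)

def predicted_mean(level, b0=80.0, b1=16.8, b2=40.4):
    x1, x2 = dummy_code(level)
    return b0 + b1 * x1 + b2 * x2

means = {level: predicted_mean(level) for level in (1, 2, 3)}
print(means)
```

The estimated means differ from the level 1 mean by exactly βˆ1 and βˆ2, so β1 = β2 = 0 holds only when all three means are equal.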

d.


The test statistic is F = MSR/MSE = 2059.5/83.3 = 24.72

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with numerator df = k = 2 and denominator df = n − (k + 1) = 15 − (2 + 1) = 12. From Table IX, Appendix B, F.05 = 3.89. The rejection region is F > 3.89. Since the observed value of the test statistic falls in the rejection region (F = 24.72 > 3.89), H0 is rejected. There is sufficient evidence to indicate at least one of the means is different at α = .05. 11.64

a.

A confidence interval for the difference of two population means could be used. Since both sample sizes are over 30, the large sample confidence interval is used (with independent samples).

b.

Let x1 = 1 if public college, 0 otherwise.

The model is E(y) = β0 + β1x1




c.

β1 is the difference between the two population means. A point estimate for β1 is βˆ1 . A confidence interval for β1 could be used to estimate the difference in the two population means.

11.66

a.

Let x1 = 1 if no, 0 if yes.

The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for those who responded ‘yes’ to the question "Flextime of the position applied for" and β1 is the difference in the mean job preference between those who responded ‘no’ to the question and those who answered ‘yes’ to the question.

b.

Let x1 = 1 if referral, 0 if not, and x2 = 1 if on-premise, 0 if not.

The model would be E(y) = β0 + β1x1 + β2x2

In this model, β0 is the mean job preference for those who responded ‘none’ to level of day care support required, β1 is the difference in the mean job preference between those who responded ‘referral’ and those who responded ‘none’, and β2 is the difference in the mean job preference between those who responded ‘on-premise’ and those who responded ‘none’.

c.

Let x1 = 1 if counseling, 0 if not, and x2 = 1 if active search, 0 if not.

The model would be E(y) = β0 + β1x1 + β2x2

In this model, β0 is the mean job preference for those who responded ‘none’ to spousal transfer support required, β1 is the difference in the mean job preference between those who responded ‘counseling’ and those who responded ‘none’, and β2 is the difference in the mean job preference between those who responded ‘active search’ and those who responded ‘none’.

d.

Let x1 = 1 if not married, 0 if married.

The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for those who responded ‘married’ to marital status and β1 is the difference in the mean job preference between those who responded ‘not married’ and those who answered ‘married’.



e.

Let x1 = 1 if female, 0 if male.

The model would be E(y) = β0 + β1x1

In this model, β0 is the mean job preference for males and β1 is the difference in the mean job preference between females and males.

11.68

a.

βˆ4 = .296 The difference in the mean value of DTVA between when the operating earnings are negative and lower than last year and when the operating earnings are not negative and lower than last year is estimated to be .296, holding all other variables constant.

b.

To determine if the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms, we test: H0: β4 = 0 Ha: β4 > 0 The p-value for this test is p = .001/2 = .0005. Since the p-value is so small, we would reject H0 for α = .05. There is sufficient evidence to indicate the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms at α = .05.
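Halving a reported two-tailed p-value, as done above, is appropriate only when the estimate falls on the side named in Ha. The sketch below spells out that rule for a symmetric t distribution; the function name and the "greater"/"less" labels are illustrative.

```python
def one_sided_p(two_sided_p, estimate, alternative):
    """Convert a two-tailed p-value to a one-tailed one for a symmetric test statistic."""
    half = two_sided_p / 2
    if alternative == "greater":
        return half if estimate > 0 else 1 - half
    if alternative == "less":
        return half if estimate < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")

# Exercise 11.68b: estimate .296, reported two-tailed p = .001, Ha: beta4 > 0.
p_11_68 = one_sided_p(0.001, 0.296, "greater")
# Exercise 11.16d: estimate 3.771, reported two-tailed p = .021, Ha: beta2 > 0.
p_11_16 = one_sided_p(0.021, 3.771, "greater")

print(p_11_68, p_11_16)
```

If the estimate had the opposite sign from the one in Ha, the one-tailed p-value would be 1 minus half the reported value, and H0 would not be rejected.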

c.

R²a = .280. 28% of the variability in the DTVA scores is explained by the model containing the 5 independent variables, adjusted for the number of variables in the model and the sample size.

11.70

a.

To determine if there is a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 8.14. Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume that n is greater than 4 since the data used are from 1972 and 1997. With α = .05 and df = n − 2 > 2, the critical value of t for the rejection region will be smaller than 4.303. Thus, with α = .05, t = 8.14 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05. However, the value of R² is .1818. The model used is explaining only 18.18% of the variability in the monthly rate of return. This is not a particularly large value.




To determine if there is a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = −3.46. Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume that n is greater than 5 since the data used are from 1972 and 1997. With α = .05 and df = n − 2 > 3, the critical value of t for the rejection region will be smaller than 3.182. Thus, with α = .05, t = −3.46 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05. However, the value of R² is .0387. The model used is explaining only 3.87% of the variability in the monthly rate of return. This is a very small value.

b.

For the first model, β1 is the difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy. For the second model, β1 is the difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy.

c.

The least squares prediction equation for the equity REIT index is: yˆ = 0.01863 − 0.01582x.

When the Federal Reserve’s monetary policy is restrictive, x = 1. The predicted mean monthly rate of return for the equity REIT index is yˆ = 0.01863 − 0.01582(1) = .00281.

When the Federal Reserve’s monetary policy is expansive, x = 0. The predicted mean monthly rate of return for the equity REIT index is yˆ = 0.01863 − 0.01582(0) = .01863.

11.72

a.

The first-order model is E(y) = β0 + β1x1

b.

The new model is E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise


c.

To allow for interactions, the model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

d.

The response lines will be parallel if β4 = β5 = 0

e.

There will be one response line if β2 = β3 = β4 = β5 = 0

11.74

a.

When x2 = x3 = 0, E(y) = β0 + β1x1
When x2 = 1 and x3 = 0, E(y) = β0 + β1x1 + β2
When x2 = 0 and x3 = 1, E(y) = β0 + β1x1 + β3

b.

For level 1, yˆ = 44.8 + 2.2x1
For level 2, yˆ = 44.8 + 2.2x1 + 9.4 = 54.2 + 2.2x1
For level 3, yˆ = 44.8 + 2.2x1 + 15.6 = 60.4 + 2.2x1

11.76

The model is E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x4

where x1 is the quantitative variable and
x2 = 1 if level 2 of qualitative variable, 0 otherwise
x3 = 1 if level 3 of qualitative variable, 0 otherwise
x4 = 1 if level 4 of qualitative variable, 0 otherwise




11.78

a.

E(y) = β0 + β1x1 + β2x2 + β3x1x2

where x2 = 1 if diet is duck chow, 0 otherwise

b.

Using MINITAB, the printout is:

The regression equation is
WtChg = -2.21 + 0.0783 x1 + 10.4 x2 - 0.095 x1x2

Predictor       Coef     StDev       T       P
Constant      -2.210     1.250   -1.77   0.085
x1           0.07831   0.04947    1.58   0.122
x2            10.354     8.538    1.21   0.233
x1x2         -0.0948    0.1418   -0.67   0.508

S = 3.882   R-Sq = 44.1%   R-Sq(adj) = 39.7%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         3    452.54   150.85   10.01   0.000
Residual Error    38    572.58    15.07
Total             41   1025.12

Source    DF   Seq SS
x1         1   384.24
x2         1    61.57
x1x2       1     6.73

Unusual Observations

Obs     x1    WtChg     Fit   StDev Fit   Residual   St Resid
 12   30.0   -8.500   0.139       0.802     -8.639     -2.27R
 37   42.5    8.000   7.445       2.990      0.555      0.22 X
 40   75.0    8.500   6.910       2.077      1.590      0.48 X

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

The fitted equation is yˆ = −2.21 + .0783x1 + 10.4x2 − .095x1x2
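Because x2 is a 0/1 indicator, the fitted interaction model describes one straight line in x1 for each diet. The short Python sketch below recovers the two intercepts and slopes; it assumes the unrounded values from the Coef column of the printout, and the helper name `line_for_diet` is hypothetical. Working from the rounded equation instead gives slightly different values in the later digits.

```python
# Coefficient estimates copied from the Coef column of the MINITAB printout.
b0, b1, b2, b3 = -2.210, 0.07831, 10.354, -0.0948


def line_for_diet(x2):
    """Return (intercept, slope) of the fitted line in x1 for diet indicator x2."""
    intercept = b0 + b2 * x2  # beta0 + beta2 * x2
    slope = b1 + b3 * x2      # beta1 + beta3 * x2
    return intercept, slope


plants = line_for_diet(0)     # x2 = 0: plants
duck_chow = line_for_diet(1)  # x2 = 1: duck chow

print("plants:    intercept = %.3f, slope = %.5f" % plants)
print("duck chow: intercept = %.3f, slope = %.5f" % duck_chow)
```

The difference between the two slopes is the interaction coefficient, so a test of β3 = 0 is a test of whether the two diet lines are parallel.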



c.

For diet = plants, x2 = 0 yˆ = −2.21 + .0783x1 + 10.4(0) − .095x1(0) = −2.21 + .0783x1

The slope is .0783. For each unit increase in digestion efficiency, the mean weight change is estimated to increase by .0783 for goslings fed plants.

d.

For diet = duck chow, x2 = 1

yˆ = −2.21 + .0783x1 + 10.4(1) − .095x1(1) = 8.19 − .0167x1

The slope is −.0167. For each unit increase in digestion efficiency, the mean weight change is estimated to decrease by .0167 for goslings fed duck chow.

e.

To determine if the slopes associated with the two diets differ, we test:

H0: β3 = 0
Ha: β3 ≠ 0

From MINITAB, the test statistic is t = −.67 with p-value = .508. Since the p-value exceeds α = .05, we fail to reject H0. There is insufficient evidence to conclude that the slopes associated with the two diets are significantly different at α = .05.

11.80

a.

Let x2 = 1 if intervention group, 0 otherwise

The first-order model would be: E(y) = β0 + β1x1 + β2x2

b.

For the control group, x2 = 0. The first-order model is: E(y) = β0 + β1x1 + β2(0) = β0 + β1x1

For the intervention group, x2 = 1. The first-order model is: E(y) = β0 + β1x1 + β2(1) = β0 + β1x1 + β2 = (β0 + β2) + β1x1

In both models, the slope of the line is β1.

c.

If pretest score and group interact, the first-order model would be: E(y) = β0 + β1x1 + β2x2 + β3x1x2




d.

For the control group, x2 = 0. The first-order model including the interaction is: E(y) = β0 + β1x1 + β2(0) + β3x1(0) = β0 + β1x1

For the intervention group, x2 = 1. The first-order model including the interaction is: E(y) = β0 + β1x1 + β2(1) + β3x1(1) = β0 + β1x1 + β2 + β3x1 = (β0 + β2) + (β1 + β3)x1

The slope of the model for the control group is β1. The slope of the model for the intervention group is β1 + β3.

11.82

a.

The first-order model is: E(y) = β0 + β1x1 + β2x2

b.

For the high-tech firms, x2 = 1. The model for the high-tech firm is: E(y) = β0 + β1x1 + β2(1) = β0 + β2 + β1x1

The slope of the line would be β1.

c.

The new model would include the interaction term: E(y) = β0 + β1x1 + β2x2 + β3x1x2

d.

For the high-tech firms, x2 = 1. The model for the high-tech firm is: E(y) = β0 + β1x1 + β2(1) + β3x1(1) = β0 + β2 + (β1 + β3)x1

The slope of the line would be β1 + β3.

11.84

By adding variables to the model, SSE will decrease or stay the same. Thus, SSEC ≤ SSER. The only circumstance under which we will reject H0 is if SSEC is much smaller than SSER. If SSEC is much smaller than SSER, F will be large. Thus, the test is only one-tailed.

11.86

a.

Ha: At least one βi ≠ 0, i = 3, 4, 5

b.

The reduced model would be E(y) = β0 + β1x1 + β2x2

c.

The numerator df = k − g = 5 − 2 = 3 and the denominator df = n − (k + 1) = 30 − (5 + 1) = 24.



d.

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]} = [(1250.2 − 1125.2)/(5 − 2)] / {1125.2/[30 − (5 + 1)]} = 41.6667/46.8833 = .89

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k − g = 5 − 2 = 3 and denominator df = n − (k + 1) = 30 − (5 + 1) = 24. From Table IX, Appendix B, F.05 = 3.01. The rejection region is F > 3.01. Since the observed value of the test statistic does not fall in the rejection region (F = .89 ≯ 3.01), H0 is not rejected. There is insufficient evidence to indicate the second-order terms are useful at α = .05.

11.88

a.

Let variables x1 through x4 be the Demographic variables, variables x5 through x11 be the Diagnostic variables, variables x12 through x15 be the Treatment variables, and variables x16 through x21 be the Community variables. The complete model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + β9x9 + β10x10 + β11x11 + β12x12 + β13x13 + β14x14 + β15x15 + β16x16 + β17x17 + β18x18 + β19x19 + β20x20 + β21x21

b.

To determine if the 7 Diagnostic variables contribute information for the prediction of y, we test: H0: β5 = β6 = …= β11 = 0

c.

The reduced model would be:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β12x12 + β13x13 + β14x14 + β15x15 + β16x16 + β17x17 + β18x18 + β19x19 + β20x20 + β21x21

d.

Since the p-value is so small (p < .0001), H0 is rejected. There is sufficient evidence to indicate at least one of the seven diagnostic variables contributes information for the prediction of y.

11.90

a.

The complete second-order model is: E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x1x2 + β5x1²x2

where x1 = age and x2 = 1 if current, 0 otherwise




b.

To determine if the quadratic terms are important, we test: H0: β2 = β5 = 0

c.

To determine if the interaction terms are important, we test: H0: β4 = β5 = 0

d.

From MINITAB, the outputs from fitting the three models are:

Regression Analysis: Value versus Age, AgeSq, Status, AgeSt, AgeSqSt

The regression equation is
Value = 83 - 5.7 Age + 0.236 AgeSq - 62 Status + 5.4 AgeSt - 0.234 AgeSqSt

Predictor       Coef   SE Coef       T       P
Constant        83.4     316.3    0.26   0.793
Age            -5.74     18.68   -0.31   0.760
AgeSq         0.2361    0.2549    0.93   0.359
Status         -62.1     354.8   -0.18   0.862
AgeSt           5.36     24.81    0.22   0.830
AgeSqSt      -0.2337    0.4080   -0.57   0.570

S = 286.8   R-Sq = 24.7%   R-Sq(adj) = 16.1%

Analysis of Variance

Source            DF        SS       MS      F       P
Regression         5   1186549   237310   2.89   0.024
Residual Error    44   3618994    82250
Total             49   4805542

Source     DF   Seq SS
Age         1   865746
AgeSq       1   138871
Status      1    77594
AgeSt       1    77342
AgeSqSt     1    26996

Regression Analysis: Value versus Age, Status, AgeSt

The regression equation is
Value = - 176 + 11.2 Age + 196 Status - 11.4 AgeSt

Predictor       Coef   SE Coef       T       P
Constant      -176.1     145.0   -1.21   0.231
Age           11.166     3.902    2.86   0.006
Status         196.5     178.9    1.10   0.278
AgeSt        -11.432     6.763   -1.69   0.098

S = 283.2   R-Sq = 23.2%   R-Sq(adj) = 18.2%

Analysis of Variance

Source            DF        SS       MS      F       P
Regression         3   1116017   372006   4.64   0.006
Residual Error    46   3689526    80207
Total             49   4805543

Source     DF   Seq SS
Age         1   865746
Status      1    21097
AgeSt       1   229174


Regression Analysis: Value versus Age, AgeSq, Status

The regression equation is
Value = 166 - 8.8 Age + 0.253 AgeSq - 106 Status

Predictor       Coef   SE Coef       T       P
Constant       165.8     182.7    0.91   0.369
Age            -8.81     10.89   -0.81   0.423
AgeSq         0.2535    0.1632    1.55   0.127
Status        -105.6     107.9   -0.98   0.333

S = 284.5   R-Sq = 22.5%   R-Sq(adj) = 17.5%

Analysis of Variance

Source            DF        SS       MS      F       P
Regression         3   1082210   360737   4.46   0.008
Residual Error    46   3723332    80942
Total             49   4805542

Source     DF   Seq SS
Age         1   865746
AgeSq       1   138871
Status      1    77594

Test for part b: The test statistic is:

F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]} = [(3,689,526 − 3,618,994)/2] / 82,250 = .429

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = 2 numerator degrees of freedom and ν2 = 44 denominator degrees of freedom. From Table IX, Appendix B, F.05 ≈ 3.23. The rejection region is F > 3.23. Since the observed value of the test statistic does not fall in the rejection region (F = .429 ≯ 3.23), H0 is not rejected. There is insufficient evidence to indicate the quadratic terms are important for predicting market value at α = .05.

Test for part c: The test statistic is:

F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]} = [(3,723,332 − 3,618,994)/(5 − 3)] / 82,250 = .634

The rejection region is the same as in the previous test. Reject H0 if F > 3.23. Since the observed value of the test statistic does not fall in the rejection region (F = .634 ≯ 3.23), H0 is not rejected. There is insufficient evidence to indicate the interaction terms are important for predicting market value at α = .05.
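The denominator 82,250 used in both F statistics is the MSE of the complete model, and the R-Sq values in that printout follow from the same ANOVA sums of squares. The Python sketch below is only a cross-check of those printed figures; the function names are hypothetical, and n = 50 is inferred from the Total df of 49.

```python
def mean_square_error(ss_error, n, k):
    """MSE = SSE / [n - (k + 1)] for a model with k predictors."""
    return ss_error / (n - (k + 1))


def r_squared(ss_regression, ss_total):
    """Proportion of the total sample variability explained by the model."""
    return ss_regression / ss_total


def adjusted_r_squared(ss_error, ss_total, n, k):
    """R-squared adjusted for the sample size and the number of predictors."""
    return 1 - (ss_error / (n - (k + 1))) / (ss_total / (n - 1))


# Sums of squares from the complete-model ANOVA table (5 predictors).
ss_reg, ss_err, ss_tot = 1186549, 3618994, 4805542
n, k = 50, 5  # n inferred from Total df = 49

mse = mean_square_error(ss_err, n, k)
r2 = r_squared(ss_reg, ss_tot)
r2_adj = adjusted_r_squared(ss_err, ss_tot, n, k)

print(f"MSE = {mse:.0f}, R-Sq = {100 * r2:.1f}%, R-Sq(adj) = {100 * r2_adj:.1f}%")
```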




11.92

a.

The reduced model for testing if the mean posttest scores differ for the intervention and control groups would be: E(y) = β0 + β1x1

b.

The reported p-value is .03. Since the p-value is so small, H0 is rejected. There is evidence to indicate that the mean posttest sun safety knowledge scores differ for the intervention and control groups for α > .03.

c.

The reported p-value is .033. Since the p-value is so small, H0 is rejected. There is evidence to indicate that the mean posttest sun safety comprehension scores differ for the intervention and control groups for α > .033.

d.

The reported p-value is .322. Since the p-value is not small, H0 is not rejected. There is no evidence to indicate that the mean posttest sun safety application scores differ for the intervention and control groups for α < .322.

11.94

a.

To determine whether the rate of increase of emotional distress with experience is different for the two groups, we test: H0: β4 = β5 = 0 Ha: At least one βi ≠ 0, i = 4, 5

b.

To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test: H0: β3 = β4 = β5 = 0 Ha: At least one βi ≠ 0, i = 3, 4, 5

c.

To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]} = [(795.23 − 783.9)/(5 − 2)] / {783.9/[200 − (5 + 1)]} = .93

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − g = 5 − 2 = 3 and ν2 = n − (k + 1) = 200 − (5 + 1) = 194. From Table IX, Appendix B, F.05 ≈ 2.60. The rejection region is F > 2.60. Since the observed value of the test statistic does not fall in the rejection region (F = .93 ≯ 2.60), H0 is not rejected. There is insufficient evidence to indicate that there are differences in mean emotional distress levels that are attributable to exposure group at α = .05.
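The nested-model F statistic used here can be checked with a small Python function. This is a sketch rather than part of the original solution: the name `nested_f` is hypothetical, and the critical value is simply the approximate table value quoted above, not something the code computes.

```python
def nested_f(sse_reduced, sse_complete, n, k, g):
    """F statistic comparing a complete model (k predictors) with a
    nested reduced model (g predictors), both fit to the same n points."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


# Values from part c: SSE_R = 795.23, SSE_C = 783.9, n = 200, k = 5, g = 2.
f_stat = nested_f(sse_reduced=795.23, sse_complete=783.9, n=200, k=5, g=2)

# Approximate table value quoted in the solution (alpha = .05, df = 3 and 194).
f_critical = 2.60

print(f"F = {f_stat:.2f}; reject H0: {f_stat > f_critical}")
```

Because adding terms can never increase SSE, the numerator is never negative, which is why only large values of F lead to rejection.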



11.96

a.

The best one-variable predictor of y is the one whose t statistic has the largest absolute value. The t statistics for each of the variables are:

Independent Variable    t = βˆi / sβˆi
x1                      t = 1.6/.42 = 3.81
x2                      t = −.9/.01 = −90
x3                      t = 3.4/1.14 = 2.98
x4                      t = 2.5/2.06 = 1.21
x5                      t = −4.4/.73 = −6.03
x6                      t = .3/.35 = .86

The variable x2 is the best one-variable predictor of y. The absolute value of the corresponding t score is 90. This is larger than any of the others.
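The screening step above amounts to dividing each estimate by its standard error and picking the largest absolute value. A minimal Python sketch, using the estimates and standard errors from the table (the dictionary layout and variable names are only illustrative):

```python
# (estimate, standard error) for each one-variable model, from the table above.
estimates = {
    "x1": (1.6, 0.42),
    "x2": (-0.9, 0.01),
    "x3": (3.4, 1.14),
    "x4": (2.5, 2.06),
    "x5": (-4.4, 0.73),
    "x6": (0.3, 0.35),
}

# t statistic for H0: beta_i = 0 in each one-variable model.
t_stats = {name: b / se for name, (b, se) in estimates.items()}

# The best one-variable predictor has the largest |t|.
best = max(t_stats, key=lambda name: abs(t_stats[name]))

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
print("best one-variable predictor:", best)
```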

b.

Yes. In the stepwise procedure, the first variable entered is the one which has the largest absolute value of t, provided the absolute value of the t falls in the rejection region.

c.

Once x2 is entered, the next variable that is entered is the one that, in conjunction with x2, has the largest absolute t value associated with it.

11.98

a.

In step 1, all one-variable models are fit. Thus, there are a total of 11 models fit.

b.

In step 2, all two-variable models are fit, where 1 of the variables is the best one selected in step 1. Thus, a total of 10 two-variable models are fit.

c.

In the 11th step, only one model is fit – the model containing all the independent variables.

d.

The model would be:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β7x7 + β9x9 + β10x10 + β11x11

e.

67.7% of the total sample variability of overall satisfaction is explained by the model containing the independent variables safety on bus, seat availability, dependability, travel time, convenience of route, safety at bus stops, hours of service, and frequency of service.

f.

Using stepwise regression does not guarantee that the best model will be found. There may be better combinations of the independent variables that are never found, because of the order in which the independent variables are entered into the model.




11.100 a.

The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved shape. Such a pattern usually indicates that curvature needs to be added to the model.

b.

The plot of the residuals reveals a nonrandom pattern. The residuals versus the predicted values shows a pattern where the range in values of the residuals increases as yˆ increases. This indicates that the variance of the random error, ε, becomes larger as the estimate of E(y) increases in value. Since E(y) depends on the x-values in the model, this implies that the variance of ε is not constant for all settings of the x's.

c.

This plot reveals an outlier, since all or almost all of the residuals should fall within 3 standard deviations of their mean of 0.

d.

This frequency distribution of the residuals is skewed to the right. This may be due to outliers or could indicate the need for a transformation of the dependent variable.

11.102 a.

Since all the pairwise correlations are .45 or less in absolute value, there is little evidence of extreme multicollinearity.

b.

No. The overall model test is significant (p < .001). This implies that at least one variable contributes to the prediction of the urban/rural rating. Looking at the individual t-tests, there are several that are significant, namely x1, x3, and x5. There is no evidence that multicollinearity is present.

11.104 First, we need to compute the value of the residual:

Residual = y − yˆ = 87 − 29.63 = 57.37 We are given that the standard deviation is s = 24.68. Thus, an observation with a residual of 57.37 is 57.37 / 24.68 = 2.32 standard deviations from the fitted regression line. Since this is less than 3 standard deviations from the regression line, this point is not considered an outlier.
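The outlier check above is a two-line calculation, sketched here in Python with the values given in the solution. The 3-standard-deviation cutoff is the rule of thumb used in the text; the variable names are illustrative.

```python
y_observed = 87
y_predicted = 29.63
s = 24.68  # estimated standard deviation of the random error

residual = y_observed - y_predicted
standardized = residual / s          # residual in units of s
is_outlier = abs(standardized) > 3   # rule of thumb used in the solution

print(f"residual = {residual:.2f}")
print(f"standard deviations from the fitted line = {standardized:.2f}")
print("outlier:", is_outlier)
```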



11.106 a.

From MINITAB, the output is:

Regression Analysis: Food versus Income, Size

The regression equation is
Food = 2.79 - 0.00016 Income + 0.383 Size

Predictor        Coef    SE Coef       T       P
Constant       2.7944     0.4363    6.40   0.000
Income      -0.000164   0.006564   -0.02   0.980
Size          0.38348    0.07189    5.33   0.000

S = 0.7188   R-Sq = 55.8%   R-Sq(adj) = 52.0%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         2   15.0027   7.5013   14.52   0.000
Residual Error    23   11.8839   0.5167
Total             25   26.8865

Source    DF    Seq SS
Income     1    0.2989
Size       1   14.7037

Correlations: Income, Size
Pearson correlation of Income and Size = -0.137
P-Value = 0.506

No; Income and household size do not seem to be highly correlated. The correlation coefficient between income and household size is −.137.

b.

Using MINITAB, the residual plots are:

[Figure: Histogram of the Residuals (response is Food); Frequency versus Residual]

[Figure: Residuals Versus the Fitted Values (response is Food); Residual versus Fitted Value]

[Figure: Residuals Versus Income (response is Food); Residual versus Income]

[Figure: Residuals Versus Size (response is Food); Residual versus Size]

Yes; the residuals versus income and residuals versus household size exhibit a curved shape. Such a pattern could indicate that a second-order model may be more appropriate.



c.

No; the residuals versus the predicted values reveals varying spreads for different values of yˆ. This implies that the variance of ε is not constant for all settings of the x's.

d.

Yes; The outlier shows up in several plots and is the 26th household (Food consumption = $7500, income = $7300 and household size = 5).

e.

No; The frequency distribution of the residuals shows that the outlier skews the frequency distribution to the right.

11.108 Using MINITAB, the residual plots are:

[Figure: Residual Plots for DDT, four panels: Normal Probability Plot of the Residuals (Percent versus Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual versus Fitted Value); Histogram of the Residuals (Frequency versus Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual versus Observation Order)]

[Figure: Residuals Versus WEIGHT (response is DDT); Standardized Residual versus WEIGHT]

[Figure: Residuals Versus LENGTH (response is DDT); Standardized Residual versus LENGTH]

[Figure: Residuals Versus MILE (response is DDT); Standardized Residual versus MILE]

From the normal probability plot, the points do not fall on a straight line, indicating the residuals are not normal. The histogram of the residuals indicates the residuals are skewed to the right, which also indicates that the residuals are not normal. The plot of the residuals versus yˆ indicates that there is at least one outlier and the variance is not constant. One observation has a standardized residual of more than 10 and several others have standardized residuals greater than 3. This is also evident in the plots of the residuals versus each of the independent variables. Since the assumptions of normality and constant variance appear to be violated, we could consider transforming the data. We should also check the outlying observations to see if there are any errors connected with these observations.

11.110 a.

To determine if at least one of the β parameters is not zero, we test:

H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.83/4) / {(1 − .83)/[25 − (4 + 1)]} = 24.41
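When only R², n, and k are reported, the global F statistic can be computed directly from them. The Python sketch below assumes nothing beyond the formula in the solution; the function name `global_f` is hypothetical.

```python
def global_f(r2, n, k):
    """Global F statistic for H0: beta_1 = ... = beta_k = 0,
    computed from R-squared, sample size n, and k predictors."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


# Values from this exercise: R-squared = .83, n = 25, k = 4.
f_stat = global_f(r2=0.83, n=25, k=4)
print(f"F = {f_stat:.2f} with df = 4 and {25 - (4 + 1)}")
```

The statistic grows with R² and with the error degrees of freedom, so a modest R² can still give a significant F when n is large relative to k.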


The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 4 and denominator df = n − (k + 1) = 25 − (4 + 1) = 20. From Table IX, Appendix B, F.05 = 2.87. The rejection region is F > 2.87. Since the observed value of the test statistic falls in the rejection region (F = 24.41 > 2.87), H0 is rejected. There is sufficient evidence to indicate at least one of the β parameters is nonzero at α = .05.

b.

H0: β1 = 0
Ha: β1 < 0

The test statistic is t = (βˆ1 − 0)/sβˆ1 = (−2.43 − 0)/1.21 = −2.01

The rejection region requires α = .05 in the lower tail of the t distribution with df = n − (k + 1) = 25 − (4 + 1) = 20. From Table VI, Appendix B, t.05 = 1.725. The rejection region is t < −1.725. Since the observed value of the test statistic falls in the rejection region (t = −2.01 < −1.725), H0 is rejected. There is sufficient evidence to indicate β1 is less than 0 at α = .05.

c.

H0: β2 = 0
Ha: β2 > 0

The test statistic is t = (βˆ2 − 0)/sβˆ2 = (.05 − 0)/.16 = .31

The rejection region requires α = .05 in the upper tail of the t distribution. From part b above, the rejection region is t > 1.725. Since the observed value of the test statistic does not fall in the rejection region (t = .31 ≯ 1.725), H0 is not rejected. There is insufficient evidence to indicate β2 is greater than 0 at α = .05.

d.


H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (βˆ3 − 0)/sβˆ3 = (.62 − 0)/.26 = 2.38

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = 20. From Table VI, Appendix B, t.025 = 2.086. The rejection region is t < −2.086 or t > 2.086. Since the observed value of the test statistic falls in the rejection region (t = 2.38 > 2.086), H0 is rejected. There is sufficient evidence to indicate β3 is different from 0 at α = .05.
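The three individual t-tests in parts b through d differ only in the direction of the alternative hypothesis. The Python sketch below restates the decision rules; the function names and the tuple layout are illustrative, and the critical values are the table values quoted in the solution (df = 20) rather than values the code derives.

```python
def t_statistic(beta_hat, se, null_value=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (beta_hat - null_value) / se


def rejects(t, critical, alternative):
    """Apply the rejection region for a lower-, upper-, or two-tailed test."""
    if alternative == "less":
        return t < -critical
    if alternative == "greater":
        return t > critical
    if alternative == "two-sided":
        return abs(t) > critical
    raise ValueError(f"unknown alternative: {alternative!r}")


# (parameter, estimate, standard error, table critical value, alternative)
tests = [
    ("beta1", -2.43, 1.21, 1.725, "less"),
    ("beta2", 0.05, 0.16, 1.725, "greater"),
    ("beta3", 0.62, 0.26, 2.086, "two-sided"),
]

results = {}
for name, b, se, crit, alt in tests:
    t = t_statistic(b, se)
    results[name] = (t, rejects(t, crit, alt))
    print(f"{name}: t = {t:.4f}, reject H0: {results[name][1]}")
```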




11.112 The error of prediction is smallest when the values of x1, x2, and x3 are equal to their sample means. The further x1, x2, and x3 are from their means, the larger the error. When x1 = 60, x2 = .4, and x3 = 900, the observed values are outside the observed ranges of the x values. When x1 = 30, x2 = .6, and x3 = 1300, the observed values are within the observed ranges and consequently the x values are closer to their means. Thus, when x1 = 30, x2 = .6, and x3 = 1300, the error of prediction is smaller.

11.114 From the plot of the residuals for the straight line model, there appears to be a mound shape which implies the quadratic model should be used.

11.116 a.

Ha: At least one of β4 and β5 ≠ 0

b.

The regression model E(y) = β0 + β1x1 + β2x2 + β3x2² + β4x1x2 + β5x1x2² is fit to the 35 data points, yielding a sum of squares for error, denoted SSEC. The regression model E(y) = β0 + β1x1 + β2x2 + β3x2² is also fit to the data and its sum of squares for error is obtained, denoted SSER. Then the test statistic is:

F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]}

where k = 5, g = 3, and n = 35.

c.

The numerator degrees of freedom is k − g = 5 − 3 = 2, and the denominator degrees of freedom is n − (k + 1) = 35 − (5 + 1) = 29.

d.

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = 2 and denominator df = 29. From Table IX, Appendix B, F.05 = 3.33. The rejection region is F > 3.33.

11.118 a.

E(y) = β0 + β1x1 + β2x2 + β3x3

where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise

b.

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x1x2 + β6x1x3 + β7x1²x2 + β8x1²x3 where x1, x2, and x3 are as in part a.



11.120 a.

E(y) = β0 + β1x1 + β2x2

b.

E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x2² + β5x1x2

11.122 a.

1. The "Quantitative GMAT score" is measured on a numerical scale, so it is a quantitative variable.
2. The "Verbal GMAT score" is measured on a numerical scale, so it is a quantitative variable.
3. The "Undergraduate GPA" is measured on a numerical scale, so it is a quantitative variable.
4. The "First-year graduate GPA" is measured on a numerical scale, so it is a quantitative variable.
5. The "Student cohort" has 3 categories, so it is a qualitative variable. Note that the numerical scale is meaningless in this situation. (It is possible to consider this as a quantitative variable. However, for this problem we will consider it as qualitative.)

b.

The quantitative variables GMAT score, verbal GMAT score, undergraduate GPA, and first-year graduate GPA should all be positively correlated to final GPA.

c.

x5 = 1 if student entered doctoral program in year 3, 0 otherwise
x6 = 1 if student entered doctoral program in year 5, 0 otherwise

d.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6

e.

β0 = the y-intercept for students entering in year 1.
β1 = change in mean final GPA for each one-unit increase in quantitative GMAT score, holding the remaining variables constant.
β2 = change in mean final GPA for each one-unit increase in verbal GMAT score, holding the remaining variables constant.
β3 = change in mean final GPA for each one-point increase in undergraduate GPA, holding the remaining variables constant.
β4 = change in mean final GPA for each one-point increase in first-year graduate GPA, holding the remaining variables constant.
β5 = difference in mean final GPA between students entering in year 3 and students entering in year 1.
β6 = difference in mean final GPA between students entering in year 5 and students entering in year 1.

f.

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x1x5 + β8x1x6 + β9x2x5 + β10x2x6 + β11x3x5 + β12x3x6 + β13x4x5 + β14x4x6




g.

For the year 1 cohort, x5 = x6 = 0. The model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5(0) + β6(0) + β7x1(0) + β8x1(0) + β9x2(0) + β10x2(0) + β11x3(0) + β12x3(0) + β13x4(0) + β14x4(0) = β0 + β1x1 + β2x2 + β3x3 + β4x4 The slopes for the four variables are β1, β2, β3 and β4 respectively.

11.124 a.

The hypothesized model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5

β0 = y-intercept. It has no interpretation in this model.
β1 = difference in the mean salaries between males and females, all other variables held constant.
β2 = difference in the mean salaries between whites and nonwhites, all other variables held constant.
β3 = change in the mean salary for each additional year of education, all other variables held constant.
β4 = change in the mean salary for each additional year of tenure with firm, all other variables held constant.
β5 = change in the mean salary for each additional hour worked per week, all other variables held constant.

b.

The least squares equation is:

yˆ = 15.491 + 12.774x1 + .713x2 + 1.519x3 + .32x4 + .205x5

βˆ0 = estimate of the y-intercept. It has no interpretation in this model. βˆ1 : We estimate the difference in the mean salaries between males and females to be $12.774, all other variables held constant.

βˆ2 : We estimate the difference in the mean salaries between whites and nonwhites to be $.713, all other variables held constant.

βˆ3 : We estimate the change in the mean salary for each additional year of education to be $1.519, all other variables held constant.

βˆ4 : We estimate the change in the mean salary for each additional year of tenure with firm to be $.320, all other variables held constant.

βˆ5 : We estimate the change in the mean salary for each additional hour worked per week to be $.205, all other variables held constant.



c.

R² = .240. 24% of the total variability of salaries is explained by the model containing gender, race, educational level, tenure with firm, and number of hours worked per week.

To determine if the model is useful for predicting annual salary, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.24/5) / {(1 − .24)/[191 − (5 + 1)]} = 11.68

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 5 and denominator df = n − (k + 1) = 191 − (5 + 1) = 185. From Table IX, Appendix B, F.05 ≈ 2.21. The rejection region is F > 2.21. Since the observed value of the test statistic falls in the rejection region (F = 11.68 > 2.21), H0 is rejected. There is sufficient evidence to indicate the model containing gender, race, educational level, tenure with firm, and number of hours worked per week is useful for predicting annual salary for α = .05.

d.

To determine if male managers are paid more than female managers, we test: H0: β1 = 0 Ha: β1 > 0 The p-value given for the test < .05/2 = .025. Since the p-value is less than α = .05, there is evidence to reject H0. There is evidence to indicate male managers are paid more than female managers, holding all other variables constant, for α > .025.

e.

The salary paid an individual depends on many factors other than gender. Thus, in order to adjust for other factors influencing salary, we include them in the model.

11.126 a.

The main effects model would be: E(y) = β0 + β1x1 + β8x8

b.

βˆ1 = −.28. The mean value for the relative error of the effort estimate for developers is estimated to be .28 units below that of project leaders, holding previous accuracy constant.

βˆ8 = .27. The mean value for the relative error of the effort estimate if previous accuracy is more than 20% is estimated to be .27 units above that if previous accuracy is less than 20%, holding company role of estimator constant.

c.

One possible reason for the sign of βˆ1 being opposite from what is expected could be that company role of estimator and previous accuracy could be correlated.




11.128 a.

R² = .45. 45% of the total variability of the suicide rates is explained by the model containing unemployment rate, percentage of females in the work force, divorce rate, logarithm of GNP, and annual percent change in GNP.

To determine if the model is useful for predicting suicide rate, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.45/5) / {(1 − .45)/[45 − (5 + 1)]} = 6.38

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 5 and denominator df = n − (k + 1) = 45 − (5 + 1) = 39. From Table IX, Appendix B, F.05 ≈ 2.45. The rejection region is F > 2.45. Since the observed value of the test statistic falls in the rejection region (F = 6.38 > 2.45), H0 is rejected. There is sufficient evidence to indicate the model containing unemployment rate, percentage of females in the work force, divorce rate, logarithm of GNP and annual percent change in GNP is useful for predicting suicide rate for α = .05.

b.

βˆ0 = .002 = estimate of the y-intercept. It has no interpretation in this model. βˆ1 : We estimate the change in suicide rate for each unit change in unemployment rate to be .0204, all other variables held constant.

βˆ2 : We estimate the change in suicide rate for each unit change in percentage of females in the work force to be −.0231, all other variables held constant.

βˆ 3 : We estimate the change in suicide rate for each unit change in divorce rate to be .0765, all other variables held constant.

βˆ4 : We estimate the change in suicide rate for each unit change in logarithm of GNP to be .2760, all other variables held constant.

βˆ5 : We estimate the change in suicide rate for each unit change in annual percent change in GNP to be .0018, all other variables held constant.

The p-values for unemployment rate and percentage of females in the work force are less than .05. This indicates that both are important in predicting suicide rate. The p-values for divorce rate, logarithm of GNP, and annual percent change in GNP are all greater than .10. This indicates that none of these variables is important in predicting suicide rate. We must view these conclusions with caution. Some of these independent variables may be highly correlated with each other. If so, some of the variables declared nonsignificant may be significant if the other variables are removed from the model.



c.

To determine if unemployment rate is a useful predictor of the suicide rate, we test: H0: β1 = 0 Ha: β1 ≠ 0 The p-value = .002. Since this p-value is less than α = .05, there is evidence to reject H0. There is sufficient evidence to indicate unemployment rate is a useful predictor of the suicide rate for α = .05.

d.

Curvature: It may be possible that the relationship between the suicide rate and some of the independent variables is not linear, but curved. Thus, some of the variables that do not appear to be useful predictors may, in fact, be useful predictors if a second-order term were added to the model.

Interaction: Again, it may be possible that the effect of some independent variables on the suicide rate is different for different levels of other independent variables. This possibility should be explored before throwing out certain independent variables.

Multicollinearity: Some of these independent variables may be highly correlated with each other. If so, some of the variables declared nonsignificant may be significant if other variables are removed from the model.

11.130 CEO income (x1) and stock percentage (x2) are said to interact if the effect of one variable, say CEO income, on the dependent variable profit (y) depends on the level of the second variable, stock percentage. 11.132 a.

The SAS output is:

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

                      SUM OF          MEAN
SOURCE     DF        SQUARES        SQUARE    F VALUE    PROB>F
MODEL       3    25784705.01    8594901.67    241.758    0.0001
ERROR      16      568826.19   35551.63709
C TOTAL    19    26353531.20

ROOT MSE    188.5514    R-SQUARE    0.9784
DEP MEAN      3014.2    ADJ R-SQ    0.9744
C.V.        6.255438

PARAMETER ESTIMATES

                  PARAMETER       STANDARD      T FOR H0:
VARIABLE   DF      ESTIMATE          ERROR    PARAMETER=0    PROB > |T|
INTERCEP    1    1333.17830      290.99944          4.581        0.0003
X1          1   -0.15122302     0.37864583         -0.399        0.6949
X2          1   -2.62532461     5.34596285         -0.491        0.6300
X1X2        1    0.05195415    0.006863831          7.569        0.0001

The fitted model is yˆ = 1333.18 − .151x1 − 2.625x2 + .052x1x2




b.

To determine if the overall model is useful, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is F = MSR/MSE = 8,594,901.67/35,551.637 = 241.758

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 3 and denominator df = n − (k + 1) = 20 − (3 + 1) = 16. From Table IX, Appendix B, F.05 = 3.24. The rejection region is F > 3.24. Since the observed value of the test statistic falls in the rejection region (F = 241.758 > 3.24), H0 is rejected. There is sufficient evidence to indicate the model is useful at α = .05. c.

To determine if the interaction is present, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = (βˆ3 − 0)/sβˆ3 = 7.569.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 20 − (3 + 1) = 16. From Table VI, Appendix B, t.025 = 2.120. The rejection region is t < −2.120 or t > 2.120. Since the observed value of the test statistic falls in the rejection region (t = 7.569 > 2.120), H0 is rejected. There is sufficient evidence to indicate the interaction between advertising expenditure and shelf space is present at α = .05.


d.

Advertising expenditure and shelf space are said to interact if the effect of advertising expenditure on sales is different at different levels of shelf space.

e.

If a first-order model was used, the effect of advertising expenditure on sales would be the same regardless of the amount of shelf space. If interaction really exists, the effect of advertising expenditure on sales would depend on which level of shelf space was present.
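To make the idea of interaction concrete, the sketch below uses the rounded coefficients of the fitted model from part a. It assumes x1 is advertising expenditure and x2 is shelf space (the pairing suggested by part c), and the shelf-space values passed in are hypothetical, chosen only for illustration.

```python
# Rounded coefficients of the fitted interaction model from part a.
B0, B1, B2, B3 = 1333.18, -0.151, -2.625, 0.052


def predict_sales(x1: float, x2: float) -> float:
    """Predicted y from y-hat = B0 + B1*x1 + B2*x2 + B3*x1*x2."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2


def slope_in_x1(x2: float) -> float:
    """Change in predicted y per unit increase in x1, at a fixed x2."""
    return B1 + B3 * x2


# Hypothetical x2 levels: the slope in x1 changes with x2.
for x2 in (0, 10, 20):
    print(x2, round(slope_in_x1(x2), 3))
```

In a first-order model the interaction coefficient would be zero, so `slope_in_x1` would return the same value at every level of x2.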



11.134 a.

There is a curvilinear trend. b.

From MINITAB, the output is:

The regression equation is
y = 42.2 - 0.0114x + 0.000001 xsq

Predictor          Coef        StDev        T        P
Constant         42.247        5.712     7.40    0.000
x             -0.011404     0.005053    -2.26    0.037
xsq          0.00000061   0.00000037     1.66    0.115

S = 21.81    R-Sq = 34.9%    R-Sq(adj) = 27.2%

Analysis of Variance

Source            DF         SS        MS       F        P
Regression         2     4325.4    2162.7    4.55    0.026
Residual Error    17     8085.5     475.6
Total             19    12410.9

Source    DF    Seq SS
x          1    3013.3
xsq        1    1312.1

Unusual Observations

Obs       x1       y       Fit    StDev Fit    Residual    St Resid
 16     9150    4.60    -11.21        16.24       15.81      1.09 X
 17    15022    2.20      8.09        21.40       -5.89     -1.41 X

X denotes an observation whose X value gives it large influence.

The fitted model is yˆ = 42.2 − .0114x + .00000061x²




c.

To determine if a curvilinear relationship exists, we test: H0: β2 = 0 Ha: β2 ≠ 0

From MINITAB, the test statistic is t = 1.66 with p-value = .115. Since the p-value is greater than α = .05, do not reject H0. There is insufficient evidence to indicate that a curvilinear relationship exists between dissolved phosphorus percentage and soil loss at α = .05. 11.136 a.

The first order model for this problem is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4

b.

Using MINITAB, the printout is:

Regression Analysis

The regression equation is
y = 28.9 -0.000000 x1 + 0.844 x2 - 0.360 x3 - 0.300 x4

Predictor           Coef        StDev        T        P
Constant           28.87        12.67     2.28    0.034
x1           -0.00000011   0.00000028    -0.38    0.708
x2                0.8440       0.2326     3.63    0.002
x3               -0.3600       0.1316    -2.74    0.013
x4               -0.3003       0.1834    -1.64    0.117

S = 5.989    R-Sq = 51.2%    R-Sq(adj) = 41.5%

Analysis of Variance

Source            DF         SS        MS       F        P
Regression         4     753.76    188.44    5.25    0.005
Residual Error    20     717.40     35.87
Total             24    1471.17

Source    DF    Seq SS
x1         1    129.96
x2         1    355.43
x3         1    172.19
x4         1     96.17

Unusual Observations

Obs          x1        y      Fit    StDev Fit    Residual    St Resid
  4    11940345    32.60    17.25         3.40       15.35       3.11R
 12     4905123    27.00    16.17         4.36       10.83       2.63R

R denotes an observation with a large standardized residual



The least squares prediction line is yˆ = 28.9 − .00000011x1 + .844x2 − .360x3 − .300x4. To determine if the model is useful for predicting percentage of problem mortgages, we test: H0: β1 = β2 = β3 = β4 = 0 Ha: At least one of the coefficients is nonzero


The test statistic is F = MS(Model)/MSE = 188.44/35.87 = 5.25

The p-value is p = .005. Since the p-value is less than α = .05 (p = .005 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages at α = .05. c.

βˆ0 = 28.9. This is merely the y-intercept. It has no other meaning in this problem. βˆ1 = −0.00000011. For each unit increase in total mortgage loans, the mean percentage of problem mortgages is estimated to decrease by 0.00000011, holding percentage of invested assets, percentage of commercial mortgages, and percentage of residential mortgages constant.

βˆ2 = 0.844. For each unit increase in percentage of invested assets, the mean percentage of problem mortgages is estimated to increase by 0.844, holding total mortgage loans, percentage of commercial mortgages, and percentage of residential mortgages constant.

βˆ3 = −0.360. For each unit increase in percentage of commercial mortgages, the mean percentage of problem mortgages is estimated to decrease by 0.360, holding total mortgage loans, percentage of invested assets, and percentage of residential mortgages constant.

βˆ4 = −0.300. For each unit increase in percentage of residential mortgages, the mean percentage of problem mortgages is estimated to decrease by 0.300, holding total mortgage loans, percentage of invested assets, and percentage of commercial mortgages constant.




d.

Using MINITAB, the scattergrams are:

From the scattergrams, it appears that possibly x2 and x4 might warrant inclusion in the model as second order terms.



e.

Using MINITAB, the printout is:

Regression Analysis

The regression equation is
y = 56.2 -0.000000 x1 - 1.82 x2 - 0.449 x3 + 0.223 x4 + 0.0771 x2sq - 0.0189 x4sq

Predictor           Coef        StDev        T        P
Constant           56.17        13.81     4.07    0.001
x1           -0.00000008   0.00000025    -0.31    0.760
x2               -1.8177       0.9935    -1.83    0.084
x3               -0.4494       0.1127    -3.99    0.001
x4                0.2227       0.6079     0.37    0.718
x2sq             0.07707      0.02665     2.89    0.010
x4sq            -0.01887      0.02334    -0.81    0.429

S = 4.956    R-Sq = 69.9%    R-Sq(adj) = 59.9%

Analysis of Variance

Source            DF         SS        MS       F        P
Regression         6    1029.03    171.51    6.98    0.001
Residual Error    18     442.13     24.56
Total             24    1471.17

Source    DF    Seq SS
x1         1    129.96
x2         1    355.43
x3         1    172.19
x4         1     96.17
x2sq       1    259.22
x4sq       1     16.05

Unusual Observations

Obs          x1         y       Fit    StDev Fit    Residual    St Resid
  4    11940345    32.600    26.777        4.038       5.823       2.03R
 10     5328142     7.500    16.105        2.599      -8.605      -2.04R
 12     4905123    27.000    16.559        3.607      10.441       3.07R
 20     2978628     3.200    11.759        2.679      -8.559      -2.05R

R denotes an observation with a large standardized residual

The least squares prediction equation is yˆ = 56.2 − .00000008x1 − 1.82x2 − .449x3 + .223x4 + .0771x2² − .0189x4²

To determine if the model is useful for predicting percentage of problem mortgages, we test:

H0: β1 = β2 = β3 = β4 = β5 = β6 = 0
Ha: At least one of the coefficients is nonzero





The test statistic is F = MS(Model)/MSE = 171.51/24.56 = 6.98

The p-value is p = .001. Since the p-value is less than α = .05 (p = .001 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages at α = .05. f.

To determine if one or more of the second-order terms of our model contribute information for the prediction of the percentage of problem mortgages, we test: H0: β5 = β6 = 0 Ha: At least one of the coefficients is nonzero


The test statistic is

F = [(SSE_R − SSE_C)/(k − g)] / {SSE_C/[n − (k + 1)]} = [(717.40 − 442.13)/(6 − 4)] / {442.13/[25 − (6 + 1)]} = 5.60
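The nested-model (partial) F statistic can be verified the same way. The following is a minimal sketch; the function name `nested_f` and its argument names are illustrative, and the inputs are the SSE values read from the two MINITAB printouts above.

```python
def nested_f(sse_reduced: float, sse_complete: float, k: int, g: int, n: int) -> float:
    """Partial F statistic comparing a reduced model (g terms) to a complete model (k terms)."""
    numerator = (sse_reduced - sse_complete) / (k - g)   # drop in SSE per added term
    denominator = sse_complete / (n - (k + 1))           # MSE of the complete model
    return numerator / denominator


# First-order model: SSE = 717.40 (g = 4); second-order model: SSE = 442.13 (k = 6); n = 25.
f_stat = nested_f(sse_reduced=717.40, sse_complete=442.13, k=6, g=4, n=25)
print(round(f_stat, 2))
```

The result is compared with the tabled F value for k − g numerator and n − (k + 1) denominator degrees of freedom, as in the text.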

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (k − g) = (6 − 4) = 2 and ν2 = n − (k + 1) = 25 − (6 + 1) = 18. From Table IX, Appendix B, F.05 = 3.55. The rejection region is F > 3.55. Since the observed value of the test statistic falls in the rejection region (F = 5.60 > 3.55), H0 is rejected. There is sufficient evidence to indicate one or more of the second-order terms of our model contribute information for the prediction of the percentage of problem mortgages at α = .05. 11.138 a.

Using SAS, the output for fitting the model is:

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

                     SUM OF         MEAN
SOURCE     DF       SQUARES       SQUARE    F VALUE    PROB>F
MODEL       3    2396.36410    798.78803     99.394    0.0001
ERROR      16     128.58590      8.03662
C TOTAL    19    2524.95000

ROOT MSE     2.83489    R-SQUARE    0.9491
DEP MEAN    23.05000    ADJ R-SQ    0.9395
C.V.        12.29889

PARAMETER ESTIMATES

                  PARAMETER      STANDARD      T FOR H0:
VARIABLE   DF      ESTIMATE         ERROR    PARAMETER=0    PROB > |T|
INTERCEP    1    -11.768830    3.05032146         -3.858        0.0014
X1          1     10.293782    1.43788129          7.159        0.0001
X1SQ        1     -0.417991    0.16132974         -2.591        0.0197
X2          1     13.244076    1.50325080          8.810        0.0001

The fitted model is: yˆ = −11.8 + 10.3x1 − .418x1² + 13.2x2 b.

To determine if the second-order term is necessary, we test: H0: β2 = 0 Ha: β2 ≠ 0

The test statistic is t = −2.591. The p-value is p = .0197. Since the p-value is less than α (p = .0197 < .05), H0 is rejected. There is sufficient evidence to conclude that the second-order term in the model proposed by the operations manager is necessary at α = .05. c.

The reduced model E(y) = β0 + β3x2 was fit to the data. The SAS output is:

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

                     SUM OF          MEAN
SOURCE     DF       SQUARES        SQUARE    F VALUE    PROB>F
MODEL       1    1.25000000    1.25000000      0.009    0.9258
ERROR      18    2523.70000     140.20556
C TOTAL    19    2524.95000

ROOT MSE    11.84084    R-SQUARE     0.0005
DEP MEAN       23.05    ADJ R-SQ    -0.0550
C.V.        51.37025

PARAMETER ESTIMATES

                   PARAMETER      STANDARD      T FOR H0:
VARIABLE   DF       ESTIMATE         ERROR    PARAMETER=0    PROB > |T|
INTERCEP    1    23.30000000    3.74440323          6.223        0.0001
X2          1    -0.50000000    5.29538583         -0.094        0.9258




The fitted model is yˆ = 23.3 − .5x2. The hypotheses are:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is

F = [(SSE_R − SSE_C)/(k − g)] / {SSE_C/[n − (k + 1)]} = [(2523.7 − 128.586)/(3 − 1)] / {128.586/[20 − (3 + 1)]} = 1197.557/8.036625 = 149.01


The rejection region requires α = .10 in the upper tail of the F distribution with numerator df = k − g = 3 − 1 = 2 and denominator df = n − (k + 1) = 20 − (3 + 1) = 16. From Table VIII, Appendix B, F.10 = 2.67. The rejection region is F > 2.67. Since the observed value of the test statistic falls in the rejection region (F = 149.01 > 2.67), H0 is rejected. There is sufficient evidence to indicate the age of the machine contributes information to the model at α = .10. After adjusting for machine type, there is evidence that down time is related to age. 11.140 a.

For a sunny weekday, x1 = 0 and x2 = 1:
x3 = 70 ⇒ yˆ = 250 − 700(0) + 100(1) + 5(70) + 15(0)(70) = 700
x3 = 80 ⇒ yˆ = 250 − 700(0) + 100(1) + 5(80) + 15(0)(80) = 750
x3 = 90 ⇒ yˆ = 800
x3 = 100 ⇒ yˆ = 850

For a sunny weekend, x1 = 1 and x2 = 1:
x3 = 70 ⇒ yˆ = 250 − 700(1) + 100(1) + 5(70) + 15(1)(70) = 1050
x3 = 80 ⇒ yˆ = 250 − 700(1) + 100(1) + 5(80) + 15(1)(80) = 1250
x3 = 90 ⇒ yˆ = 1450
x3 = 100 ⇒ yˆ = 1650
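The predictions above can be reproduced with a few lines of code. This sketch encodes the prediction equation exactly as it is used in the computations (yˆ = 250 − 700x1 + 100x2 + 5x3 + 15x1x3); the function and argument names are illustrative only.

```python
def predicted_attendance(weekend: int, sunny: int, temp: float) -> float:
    """y-hat = 250 - 700*x1 + 100*x2 + 5*x3 + 15*x1*x3 (x1 = weekend, x2 = sunny, x3 = temp)."""
    return 250 - 700 * weekend + 100 * sunny + 5 * temp + 15 * weekend * temp


# Sunny weekdays versus sunny weekend days at the four temperatures used above.
for temp in (70, 80, 90, 100):
    weekday = predicted_attendance(weekend=0, sunny=1, temp=temp)
    weekend = predicted_attendance(weekend=1, sunny=1, temp=temp)
    print(temp, weekday, weekend)
```

Each 10-degree increase adds 50 to the weekday prediction and 200 to the weekend prediction, which is the difference in slopes produced by the interaction term.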



For both sunny weekdays and sunny weekend days, as the predicted high temperature increases, so does the predicted day's attendance. However, the predicted day's attendance on sunny weekend days increases at a faster rate than on sunny weekdays. Also, the predicted day's attendance is higher on sunny weekend days than on sunny weekdays. b.

To determine if the interaction term is a useful addition to the model, we test:

H0: β4 = 0
Ha: β4 ≠ 0

The test statistic is t = βˆ4/sβˆ4 = 15/3 = 5

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 30 − (4 + 1) = 25. From Table VI, Appendix B, t.025 = 2.06. The rejection region is t < −2.06 or t > 2.06. Since the observed value of the test statistic falls in the rejection region (t = 5 > 2.06), H0 is rejected. There is sufficient evidence to indicate the interaction term is a useful addition to the model at α = .05. c.

For x1 = 0, x2 = 1, and x3 = 95, yˆ = 250 − 700(0) + 100(1) + 5(95) + 15(0)(95) = 825

d.

The width of the interval in Exercise 11.139e is 1245 − 645 = 600, while the width is 850 − 800 = 50 for the model containing the interaction term. The smaller the width of the interval, the smaller the variance. This implies that the interaction term is quite useful in predicting daily attendance. It has reduced the unexplained error.




e.

Because an interaction term including x1 is in the model, the coefficient corresponding to x1 must be interpreted with caution. For all observed values of x3 (temperature), the interaction term value is greater than 700.

11.142 a.

From MINITAB, the output is:

Regression Analysis: y versus x1, x2, x1sq, x2sq, x1x2

The regression equation is
y = - 9.92 + 0.167 x1 + 0.138 x2 - 0.00111 x1sq -0.000843 x2sq +0.000241 x1x2

Predictor          Coef      SE Coef        T        P
Constant         -9.917        1.354    -7.32    0.000
x1              0.16681      0.02124     7.85    0.000
x2              0.13760      0.02673     5.15    0.000
x1sq         -0.0011082    0.0001173    -9.45    0.000
x2sq         -0.0008433    0.0001594    -5.29    0.000
x1x2          0.0002411    0.0001440     1.67    0.103

S = 0.1871    R-Sq = 93.7%    R-Sq(adj) = 92.7%

Analysis of Variance

Source            DF         SS        MS         F        P
Regression         5    17.5827    3.5165    100.41    0.000
Residual Error    34     1.1908    0.0350
Total             39    18.7735

Source    DF    Seq SS
x1         1    5.2549
x2         1    7.5311
x1sq       1    3.6434
x2sq       1    1.0552
x1x2       1    0.0982

The least squares prediction equation is: yˆ = −9.917 + .167x1 + .138x2 − .00111x1² − .000843x2² + .000241x1x2

b.

The standard deviation for the first-order model is s = .4023. The standard deviation for the second-order model is s = .1871. The relative precision for the first-order model is ± 2(.4023) = ± .8046. The relative precision for the second-order model is ± 2(.1871) = ± .3742.

c.

To determine if the model is useful, we test: H0: β1 = β2 = β3 = β4 = β5 = 0 Ha: At least one βi ≠ 0, i = 1, 2, ... , 5


The test statistic is F = MSR/MSE = 3.5165/.0350 = 100.41

The p-value is .0000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting GPA at α = .05.



d.

To determine if the interaction term is important, we test: H0: β5 = 0 Ha: β5 ≠ 0

The test statistic is t = 1.67. The p-value is .103. Since the p-value is not less than α = .10, H0 is not rejected. There is insufficient evidence to indicate the interaction term is important for predicting GPA at α = .10. e.

From MINITAB, the plots are:

[Plot: Residuals Versus x1 (response is y); vertical axis: Residual, horizontal axis: x1]

[Plot: Residuals Versus x2 (response is y); vertical axis: Residual, horizontal axis: x2]

The plots of the residuals against x1 and against x2 for the second-order model show no mound or bowl shape in either graph. This implies that second order is the highest order necessary. We have eliminated the mound shape from the plots of the residuals against x1 and the residuals against x2 for the first-order model. From the plots and the results of the tests in 11.145, it appears the second-order model is preferable for predicting GPA. f.

To see if the second-order terms are useful, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is

F = [(SSE_R − SSE_C)/(k − g)] / {SSE_C/[n − (k + 1)]} = [(5.9876 − 1.1908)/3] / .0350 = 45.68

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − g = 5 − 2 = 3 and ν2 = n − [k + 1] = 40 − (5 + 1) = 34. From Table IX, Appendix B, F.05 ≈ 2.92. The rejection region is F > 2.92. Since the observed value of the test statistic falls in the rejection region (F = 45.68 > 2.92), H0 is rejected. There is sufficient evidence that at least one second-order term is useful at α = .05.



11.144 a.

The model is E(y) = β0 + β1x1 A sketch of the response curve might be:

b.

The model is E(y) = β0 + β1x1 + β2x2 + β3x3, where

x2 = 1 if brand 2, 0 otherwise
x3 = 1 if brand 3, 0 otherwise

A sketch of the response curve might be:

c.

The model is E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 A sketch of the response curve might be:




The Condo Sales Case (To accompany Chapters 10–11)

Several models were fit to obtain the final model. I first fit a model with only the main effects for Floor, Distance, View, Endunit, and Furnish. Of these, only Furnish, adjusted for the other variables, was not significant. See the output below.

The regression equation is
Price = 184 - 3.81 Floor + 1.74 Distance + 40.3 View - 32.7 Endunit + 4.28 Furnish

Predictor       Coef     Stdev    t-ratio        p
Constant     183.570     5.221      35.16    0.000
Floor        -3.8076    0.7482      -5.09    0.000
Distance      1.7414    0.3750       4.64    0.000
View          40.325     3.456      11.67    0.000
Endunit      -32.716     9.581      -3.41    0.001
Furnish        4.279     3.602       1.19    0.236

s = 24.39    R-sq = 49.4%    R-sq(adj) = 48.2%

Analysis of Variance

SOURCE         DF        SS       MS        F        p
Regression      5    118091    23618    39.69    0.000
Error         203    120802      595
Total         208    238893

SOURCE      DF    SEQ SS
Floor        1     14149
Distance     1     21208
View         1     75065
Endunit      1      6829
Furnish      1       840

I then added Floor² and Distance² to the model with all main effects. For this model, all of the main effects, including Furnish, were significant along with both squared terms. The output follows.

The regression equation is
Price = 220 - 13.3 Floor - 7.01 Distance + 38.9 View - 22.0 Endunit + 7.31 Furnish + 1.05 FlSq + 0.572 DiSq

Predictor       Coef     Stdev    t-ratio        p
Constant     220.258     8.178      26.93    0.000
Floor        -13.296     3.253      -4.09    0.000
Distance      -7.007     1.614      -4.34    0.000
View          38.927     3.202      12.16    0.000
Endunit      -21.967     9.086      -2.42    0.017
Furnish        7.308     3.419       2.14    0.034
FlSq          1.0512    0.3492       3.01    0.003
DiSq          0.5719    0.1033       5.54    0.000

s = 22.49    R-sq = 57.4%    R-sq(adj) = 56.0%

Analysis of Variance

SOURCE         DF        SS       MS        F        p
Regression      7    137234    19605    38.76    0.000
Error         201    101659      506
Total         208    238893

SOURCE      DF    SEQ SS
Floor        1     14149
Distance     1     21208
View         1     75065
Endunit      1      6829
Furnish      1       840
FlSq         1      3640
DiSq         1     15503

I then did a stepwise regression, forcing all the main effects and the two squared terms into the model, to see if any two-way interaction terms could be added to the model. From this, only the interaction between Floor and View was significant. The output from the final model is:

The regression equation is
Price = 206 - 9.93 Floor - 7.02 Distance + 66.0 View - 22.5 Endunit + 6.48 Furnish + 1.02 FlSq + 0.577 DiSq - 6.04 FV

Predictor       Coef      Stdev    t-ratio        p
Constant     206.123      8.379      24.60    0.000
Floor         -9.927      3.186      -3.12    0.002
Distance      -7.020      1.539      -4.56    0.000
View          65.952      6.619       9.96    0.000
Endunit      -22.451      8.662      -2.59    0.010
Furnish        6.485      3.265       1.99    0.048
FlSq          1.0207     0.3330       3.07    0.002
DiSq         0.57720    0.09848       5.86    0.000
FV            -6.037      1.312      -4.60    0.000

s = 21.44    R-sq = 61.5%    R-sq(adj) = 60.0%

Analysis of Variance

SOURCE         DF        SS       MS        F        p
Regression      8    146965    18371    39.97    0.000
Error         200     91928      460
Total         208    238893

SOURCE      DF    SEQ SS
Floor        1     14149
Distance     1     21208
View         1     75065
Endunit      1      6829
Furnish      1       840
FlSq         1      3640
DiSq         1     15503
FV           1      9731



This final model is fairly good. The R-squared value is .615. Thus, 61.5% of the variation in prices can be explained by the model that includes the following variables: Floor and Floor-squared, Distance and Distance-squared, View, Endunit, Furnish, and the interaction of Floor and View. The residual plots are as follows:

From the residual plots, it appears that the data are normally distributed, but there may be a couple of outliers. This is evident from the two points whose standardized residuals are less than −3. Also, it appears that there is constant variance. Thus, the model looks to be fairly good. It would be better if the R-squared value were higher, however.

The final model is:

Price = 206 − 9.93 Floor − 7.02 Distance + 66.0 View − 22.5 Endunit + 6.48 Furnish + 1.02 FlSq + 0.577 DiSq − 6.04 FV

I have included graphs to indicate how each variable affects the price. These graphs reflect the relationship between Price and a selected variable, holding the other variables constant.

The first graph is a graph of Price by Floor for each level of View, since Floor and View interact. Both lines are curved to reflect the quadratic relationship between Floor and Price. For the Non-ocean view, the price is fairly constant. There is a slight decrease in price as the Floor increases until Floor 5, and then a slight increase as the floor increases. For the Ocean view, the price decreases at a decreasing rate as the Floor increases.

The second graph is a graph of the Price by Distance. Again, the quadratic relationship is reflected by the curved line. As the distance increases, the price decreases until a distance of 6 is reached. Then the price begins to increase again as the distance increases.
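The turning points mentioned in the description of the graphs (Floor 5 and a distance of 6) follow from the fitted quadratic terms. The sketch below locates them using the rounded coefficients of the final model. It assumes View = 1 codes an ocean view, which is an inference from the positive View coefficient rather than something stated in the output; the function name is illustrative.

```python
def quadratic_turning_point(linear_coef: float, squared_coef: float) -> float:
    """Value of x where b1*x + b2*x**2 stops decreasing (or increasing): x = -b1 / (2*b2)."""
    return -linear_coef / (2.0 * squared_coef)


# Floor, non-ocean view (View = 0): linear term is -9.93, squared term is 1.02.
floor_min_non_ocean = quadratic_turning_point(-9.93, 1.02)

# Floor, ocean view (assumed View = 1): the FV interaction adds -6.04 to the Floor slope.
floor_min_ocean = quadratic_turning_point(-9.93 - 6.04, 1.02)

# Distance: linear term is -7.02, squared term is 0.577.
distance_min = quadratic_turning_point(-7.02, 0.577)

print(round(floor_min_non_ocean, 2), round(floor_min_ocean, 2), round(distance_min, 2))
```

The non-ocean turning point falls near Floor 5 and the Distance turning point near 6, matching the description of the graphs; under the coding assumed here, the ocean-view curve does not turn until a higher floor.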




The third graph is a graph of the Price by View, for each Floor. Again, we must look at the relationship between Price and View at each Floor because of the significant interaction. For all Floors, the price of the Ocean View is higher than the price of the Non-ocean View. However, the difference between the two views depends on the floor. The fourth graph is a graph of the Price by Endunit. From the graph, the price of the end units is lower than that of the other units. The last graph is a graph of the Price by Furnish. From the graph, the price of the furnished units is higher than the price of the non-furnished units.





Chapter 12

12.2

If rational subgrouping is not used, it is possible that a change in the process mean will go undetected. In rational subgrouping, samples are selected so that a change in the process mean occurs between samples, not within samples.

12.4

An x̄-chart is used to monitor the process mean.

12.6

The variation of a process must be stable. If it were not, the control limits of the x̄-chart would be meaningless since they are a function of the process variation.

12.8

a.

According to Rule 4 (14 points in a row alternating up and down), the process is out of control. Therefore, it is affected by both common and special causes of variation. An in-control process is affected by only common causes. Rule 4 says that if we observe 14 points in a row alternating up and down, that is an indication of the presence of special causes of variation in addition to common causes. Points 2 through 16 alternate up and down.

b.

The extended x̄-chart is:

[x̄-chart: sample means plotted against Sample Number (1 to 30), with UCL, LCL, the centerline, and Zones A, B, and C marked]

The additional points suggest that the process is out of control. Rule 1 (One point beyond Zone A), Rule 5 (2 out of 3 points in a row in Zone A or beyond), and Rule 6 (4 out of 5 points in a row in Zone B or beyond) indicate the process is out of control. 12.10

a.

x̿ = (x̄1 + x̄2 + ... + x̄25)/k = 2008.8/25 = 80.352

R̄ = (R1 + R2 + ... + R25)/k = 198.7/25 = 7.948


b.

Centerline = x̿ = 80.352

From Table XII, Appendix B, with n = 5, A2 = .577.

Upper control limit = x̿ + A2R̄ = 80.352 + .577(7.948) = 84.938
Lower control limit = x̿ − A2R̄ = 80.352 − .577(7.948) = 75.766

c – d.

Upper A–B boundary = x̿ + (2/3)(A2R̄) = 80.352 + (2/3)(.577)(7.948) = 83.409
Lower A–B boundary = x̿ − (2/3)(A2R̄) = 80.352 − (2/3)(.577)(7.948) = 77.295
Upper B–C boundary = x̿ + (1/3)(A2R̄) = 80.352 + (1/3)(.577)(7.948) = 81.881
Lower B–C boundary = x̿ − (1/3)(A2R̄) = 80.352 − (1/3)(.577)(7.948) = 78.823
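The control limits and zone boundaries all come from the same three inputs: the mean of the sample means, the mean range, and the tabled constant A2. A minimal sketch follows; the function name and dictionary keys are illustrative, and A2 must still be looked up in Table XII for the subgroup size.

```python
def xbar_chart_limits(grand_mean: float, mean_range: float, a2: float) -> dict:
    """Centerline, control limits, and A-B / B-C zone boundaries for an x-bar chart."""
    half_width = a2 * mean_range  # distance from the centerline to each control limit
    return {
        "UCL": grand_mean + half_width,
        "upper_AB": grand_mean + 2.0 * half_width / 3.0,
        "upper_BC": grand_mean + half_width / 3.0,
        "centerline": grand_mean,
        "lower_BC": grand_mean - half_width / 3.0,
        "lower_AB": grand_mean - 2.0 * half_width / 3.0,
        "LCL": grand_mean - half_width,
    }


# Exercise 12.10 inputs: grand mean 80.352, mean range 7.948, A2 = .577 for n = 5.
limits = xbar_chart_limits(grand_mean=80.352, mean_range=7.948, a2=0.577)
for name, value in limits.items():
    print(f"{name:>10}: {value:.3f}")
```

The same function reproduces the limits in the other x̄-chart exercises once their grand mean, mean range, and A2 are supplied.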

The x̄-chart is:

Rule 1: One point beyond Zone A: Point 10 is beyond Zone A. This indicates the process is out of control.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

Rule 1 indicates the process is out of control.
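The first four pattern-analysis rules lend themselves to a mechanical check. The sketch below is one possible encoding under the rule wording used above; the function names are hypothetical, and the sample means themselves are not reproduced in this manual, so the data must be supplied by the reader. Rules 5 and 6 are omitted for brevity.

```python
def rule1_beyond_zone_a(means, lcl, ucl):
    """Return the 1-based positions of points outside the control limits."""
    return [i + 1 for i, m in enumerate(means) if m > ucl or m < lcl]


def rule2_nine_on_one_side(means, centerline, run=9):
    """True if `run` consecutive points fall on the same side of the centerline."""
    count, side = 0, 0
    for m in means:
        s = (m > centerline) - (m < centerline)   # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            count += 1
        else:
            side, count = s, (1 if s != 0 else 0)
        if count >= run:
            return True
    return False


def rule3_six_trending(means, run=6):
    """True if `run` consecutive points steadily increase or steadily decrease."""
    up = down = 1
    for prev, cur in zip(means, means[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        if up >= run or down >= run:
            return True
    return False


def rule4_alternating(means, run=14):
    """True if `run` consecutive points alternate up and down."""
    length, prev_sign = 1, 0
    for prev, cur in zip(means, means[1:]):
        sign = (cur > prev) - (cur < prev)
        if sign == 0:
            length = 1            # a tie breaks the pattern
        elif prev_sign != 0 and sign == -prev_sign:
            length += 1           # direction flipped, pattern continues
        else:
            length = 2            # pattern restarts with the current pair
        prev_sign = sign
        if length >= run:
            return True
    return False
```

A chart is flagged as out of control if any rule fires; which rule fires can help suggest where to look for a special cause.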




12.12

a.

From Table XII, Appendix B, with n = 4, A2 = .729.

x̿ = .6733 and R̄ = .335

Upper control limit = x̿ + A2R̄ = .6733 + .729(.335) = .9175
Lower control limit = x̿ − A2R̄ = .6733 − .729(.335) = .4291

b.

Upper A–B boundary = x̿ + (2/3)(A2R̄) = .6733 + (2/3)(.729)(.335) = .8361
Lower A–B boundary = x̿ − (2/3)(A2R̄) = .6733 − (2/3)(.729)(.335) = .5105
Upper B–C boundary = x̿ + (1/3)(A2R̄) = .6733 + (1/3)(.729)(.335) = .7547
Lower B–C boundary = x̿ − (1/3)(A2R̄) = .6733 − (1/3)(.729)(.335) = .5919

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: There are nine points (Points 9 through 17) in a row in Zone C (on one side of the centerline) or beyond. This indicates that the process is out of control.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

Rule 2 indicates the process is out of control.

c.


These control limits should not be used to monitor future output because the process is out of control. One or more special causes of variation are affecting the process mean. These should be identified and eliminated in order to bring the process into control.



12.14

a.

The process of interest is the production of bolts used in military aircraft.

b.


Descriptive Statistics: Length by Hour

Variable   Hour    N      Mean    Median    TrMean    StDev
Length        1    4    36.973    36.965    36.973    0.098
              2    4    36.957    36.970    36.957    0.079
              3    4    37.067    37.060    37.067    0.081
              4    4    37.065    37.040    37.065    0.096
              5    4    36.948    36.940    36.948    0.121
              6    4    36.998    36.985    36.998    0.101
              7    4    37.000    36.995    37.000    0.054
              8    4    37.005    36.995    37.005    0.087
              9    4    37.027    37.020    37.027    0.111
             10    4    36.970    36.950    36.970    0.106
             11    4    37.020    37.050    37.020    0.098
             12    4    36.983    36.985    36.983    0.066
             13    4    37.070    37.075    37.070    0.132
             14    4    37.073    37.075    37.073    0.025
             15    4    36.993    37.020    36.993    0.069
             16    4    36.955    36.965    36.955    0.040
             17    4    37.038    37.035    37.038    0.097
             18    4    37.010    37.010    37.010    0.085
             19    4    36.955    36.965    36.955    0.058
             20    4    37.035    37.045    37.035    0.109
             21    4    36.995    36.985    36.995    0.044
             22    4    37.023    37.020    37.023    0.096
             23    4    37.003    37.010    37.003    0.039
             24    4    36.995    37.005    36.995    0.071
             25    4    37.010    37.020    37.010    0.083

Variable   Hour    SE Mean    Minimum    Maximum        Q1        Q3
Length        1      0.049     36.880     37.080    36.885    37.067
              2      0.040     36.850     37.040    36.878    37.025
              3      0.040     36.990     37.160    36.995    37.147
              4      0.048     36.980     37.200    36.990    37.165
              5      0.061     36.810     37.100    36.835    37.068
              6      0.051     36.890     37.130    36.908    37.100
              7      0.027     36.940     37.070    36.953    37.053
              8      0.044     36.910     37.120    36.927    37.093
              9      0.055     36.900     37.170    36.927    37.135
             10      0.053     36.870     37.110    36.880    37.080
             11      0.049     36.880     37.100    36.918    37.093
             12      0.033     36.900     37.060    36.920    37.043
             13      0.066     36.910     37.220    36.940    37.195
             14      0.013     37.040     37.100    37.048    37.095
             15      0.035     36.890     37.040    36.920    37.038
             16      0.020     36.900     36.990    36.913    36.988
             17      0.049     36.940     37.140    36.948    37.130
             18      0.043     36.910     37.110    36.927    37.093
             19      0.029     36.880     37.010    36.895    37.005
             20      0.055     36.900     37.150    36.925    37.135
             21      0.022     36.960     37.050    36.960    37.040
             22      0.048     36.930     37.120    36.935    37.113
             23      0.019     36.950     37.040    36.963    37.035
             24      0.036     36.900     37.070    36.923    37.058
             25      0.041     36.900     37.100    36.927    37.083

For each sample, we compute R = range = largest measurement - smallest measurement.




The results are listed in the table:

Sample No.     R      Sample No.     R
     1        .20         14        .06
     2        .19         15        .15
     3        .17         16        .09
     4        .22         17        .20
     5        .29         18        .20
     6        .24         19        .13
     7        .13         20        .25
     8        .21         21        .09
     9        .27         22        .19
    10        .24         23        .09
    11        .22         24        .17
    12        .16         25        .20
    13        .31

x̿ = (x̄1 + x̄2 + ... + x̄25)/k = 925.1650/25 = 37.0066

R̄ = (R1 + R2 + ... + R25)/k = 4.67/25 = .1868

Centerline = x̿ = 37.007

From Table XII, Appendix B, with n = 4, A2 = .729.

Upper control limit = x̿ + A2R̄ = 37.007 + .729(.1868) = 37.143
Lower control limit = x̿ − A2R̄ = 37.007 − .729(.1868) = 36.871

Upper A–B boundary = x̿ + (2/3)(A2R̄) = 37.007 + (2/3)(.729)(.1868) = 37.098
Lower A–B boundary = x̿ − (2/3)(A2R̄) = 37.007 − (2/3)(.729)(.1868) = 36.916
Upper B–C boundary = x̿ + (1/3)(A2R̄) = 37.007 + (1/3)(.729)(.1868) = 37.052
Lower B–C boundary = x̿ − (1/3)(A2R̄) = 37.007 − (1/3)(.729)(.1868) = 36.962



The x̄-chart is:

c.

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

The process appears to be in control. No special causes of variation appear to be present.
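The six pattern checks above can be automated. The following is a minimal sketch, not the text's own procedure: the function name `check_pattern_rules` and its arguments are hypothetical, and it assumes the zone width `sigma` is taken as (upper control limit − centerline)/3.

```python
def _windows(seq, n):
    """Yield every run of n consecutive items from seq."""
    for i in range(len(seq) - n + 1):
        yield seq[i:i + n]


def check_pattern_rules(points, center, sigma):
    """Return {rule number: True if that out-of-control signal occurs}.

    sigma is the width of one zone, assumed here to be (UCL - center) / 3.
    """
    z = [(p - center) / sigma for p in points]           # zone units
    diffs = [b - a for a, b in zip(points, points[1:])]  # point-to-point changes

    def same_side(n):
        # n points in a row strictly on one side of the centerline
        return any(all(v > 0 for v in w) or all(v < 0 for v in w)
                   for w in _windows(z, n))

    def k_of_m_beyond(k, m, limit):
        # at least k of m consecutive points beyond +/- limit, on the same side
        return any(sum(v > limit for v in w) >= k or
                   sum(v < -limit for v in w) >= k
                   for w in _windows(z, m))

    return {
        1: any(abs(v) > 3 for v in z),
        2: same_side(9),
        # six steadily rising or falling points = five same-signed changes
        3: any(all(d > 0 for d in w) or all(d < 0 for d in w)
               for w in _windows(diffs, 5)),
        # fourteen alternating points = thirteen sign-alternating changes
        4: any(all(a * b < 0 for a, b in zip(w, w[1:]))
               for w in _windows(diffs, 13)),
        5: k_of_m_beyond(2, 3, 2),
        6: k_of_m_beyond(4, 5, 1),
    }
```

For an R-chart or p-chart, only the entries for Rules 1 through 4 would be consulted.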

d.

An example of a special cause of variation would be if the machine used to produce the bolts slipped out of alignment and started producing bolts of a different length. An example of common cause variation would be the grade of the raw material used to make the bolts.

e.

Since the process appears to be in control, it is appropriate to use these limits to monitor future process output.

12.16

a.

$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{16}}{k} = \frac{868.18}{16} = 54.26125$

$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{16}}{k} = \frac{44.1}{16} = 2.75625$

Centerline = $\bar{\bar{x}}$ = 54.26125

From Table XII, Appendix B, with n = 5, A2 = .577.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R}$ = 54.26125 + .577(2.75625) = 55.8516
Lower control limit = $\bar{\bar{x}} - A_2\bar{R}$ = 54.26125 − .577(2.75625) = 52.6709
Upper A–B boundary = $\bar{\bar{x}} + \frac{2}{3}(A_2\bar{R})$ = 54.26125 + (2/3)(.577)(2.75625) = 55.3215
Lower A–B boundary = $\bar{\bar{x}} - \frac{2}{3}(A_2\bar{R})$ = 54.26125 − (2/3)(.577)(2.75625) = 53.2010
Upper B–C boundary = $\bar{\bar{x}} + \frac{1}{3}(A_2\bar{R})$ = 54.26125 + (1/3)(.577)(2.75625) = 54.7914
Lower B–C boundary = $\bar{\bar{x}} - \frac{1}{3}(A_2\bar{R})$ = 54.26125 − (1/3)(.577)(2.75625) = 53.7311

The $\bar{x}$-chart is:

b.

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: One point is beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are two sets of three consecutive points (data points 3, 4, and 5 and data points 4, 5, and 6) that have two points in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

Special causes of variation appear to be present. The process appears to be out of control. Rules 1 and 5 indicate the process is out of control.

c.

Since the process is out of control, these control limits should not be used to monitor future process outputs.
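As a cross-check on the arithmetic in part a, the following is a minimal sketch of the $\bar{x}$-chart line calculations. The function name `xbar_chart_lines` is hypothetical; the inputs are the values quoted in the solution ($\bar{\bar{x}}$ = 54.26125, $\bar{R}$ = 2.75625, A2 = .577).

```python
def xbar_chart_lines(grand_mean, r_bar, a2):
    """Centerline, control limits, and zone boundaries for an x-bar chart."""
    half_width = a2 * r_bar  # distance from the centerline to a control limit
    return {
        "UCL": grand_mean + half_width,
        "upper A-B": grand_mean + 2 * half_width / 3,
        "upper B-C": grand_mean + half_width / 3,
        "centerline": grand_mean,
        "lower B-C": grand_mean - half_width / 3,
        "lower A-B": grand_mean - 2 * half_width / 3,
        "LCL": grand_mean - half_width,
    }


# Values from Exercise 12.16, part a
lines = xbar_chart_lines(54.26125, 2.75625, 0.577)
for name, value in lines.items():
    print(f"{name:>10}: {value:.4f}")
```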



12.18

The R-chart is designed to monitor the variation of the process.

12.20

Using Table XII, Appendix B:

a.

With n = 4, D3 = 0.000 and D4 = 2.282

b.

With n = 12, D3 = 0.283 and D4 = 1.717

c.

With n = 24, D3 = 0.451 and D4 = 1.548

12.22

a.

From Exercise 12.11, the R values are:

Sample No.   R      Sample No.   R
 1          1.8     11          3.2
 2          2.8     12          0.9
 3          3.8     13          2.6
 4          2.5     14          4.0
 5          3.7     15          2.2
 6          5.0     16          4.3
 7          5.5     17          3.6
 8          3.5     18          2.5
 9          2.5     19          2.2
10          4.1     20          5.5

$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{20}}{k} = \frac{66.2}{20} = 3.31$

Centerline = $\bar{R}$ = 3.31

From Table XII, Appendix B, with n = 4, D4 = 2.282, and D3 = 0.

Upper control limit = $\bar{R}D_4$ = 3.31(2.282) = 7.553

Since D3 = 0, the lower control limit is negative and is not included on the chart.

b.

From Table XII, Appendix B, with n = 4, d2 = 2.059, and d3 = .880.

Upper A–B boundary = $\bar{R} + 2d_3\frac{\bar{R}}{d_2}$ = 3.31 + 2(.880)(3.31/2.059) = 6.139

Lower A–B boundary = $\bar{R} - 2d_3\frac{\bar{R}}{d_2}$ = 3.31 − 2(.880)(3.31/2.059) = 0.481

Upper B–C boundary = $\bar{R} + d_3\frac{\bar{R}}{d_2}$ = 3.31 + (.880)(3.31/2.059) = 4.725

Lower B–C boundary = $\bar{R} - d_3\frac{\bar{R}}{d_2}$ = 3.31 − (.880)(3.31/2.059) = 1.895
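The R-chart lines can be checked in the same way. This is a minimal sketch with a hypothetical function name (`r_chart_lines`); the table constants are passed in rather than looked up, and the lower control limit is returned as `None` when D3 = 0, matching the solution's convention of leaving it off the chart.

```python
def r_chart_lines(r_bar, big_d3, big_d4, d2, d3):
    """Control limit and zone boundaries for an R-chart.

    big_d3, big_d4 are the constants D3 and D4; d2 and d3 are the
    constants used to estimate the standard deviation of R.
    """
    sigma_r = d3 * r_bar / d2  # estimated standard deviation of R
    return {
        "UCL": r_bar * big_d4,
        "upper A-B": r_bar + 2 * sigma_r,
        "upper B-C": r_bar + sigma_r,
        "centerline": r_bar,
        "lower B-C": r_bar - sigma_r,
        "lower A-B": r_bar - 2 * sigma_r,
        # With D3 = 0 the lower limit is not drawn on the chart.
        "LCL": r_bar * big_d3 if big_d3 > 0 else None,
    }


# Values from Exercise 12.22 (n = 4)
r_lines = r_chart_lines(3.31, 0.0, 2.282, 2.059, 0.880)
print({k: (round(v, 3) if v is not None else None) for k, v in r_lines.items()})
```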


c.

The R-chart is:

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control.

12.24

a.

From Table XII, Appendix B, with n = 4, D3 = 0, and D4 = 2.282.

$\bar{R}$ = .335

Upper control limit = $\bar{R}D_4$ = .335(2.282) = .7645

Since D3 = 0, the lower control limit is negative and is not included on the chart.

b.

To determine if special causes of variation are present, we need to complete the R-chart. From Table XII, Appendix B, with n = 4, d2 = 2.059, and d3 = .880.

Upper A–B boundary = $\bar{R} + 2d_3\frac{\bar{R}}{d_2}$ = .335 + 2(.880)(.335/2.059) = .6213

Lower A–B boundary = $\bar{R} - 2d_3\frac{\bar{R}}{d_2}$ = .335 − 2(.880)(.335/2.059) = .0486

Upper B–C boundary = $\bar{R} + d_3\frac{\bar{R}}{d_2}$ = .335 + (.880)(.335/2.059) = .4782

Lower B–C boundary = $\bar{R} - d_3\frac{\bar{R}}{d_2}$ = .335 − (.880)(.335/2.059) = .1918

The R-chart is: (lines at UCL = .7645, upper A–B = .6213, upper B–C = .4782, $\bar{R}$ = .335, lower B–C = .1918, lower A–B = .0486)

To determine if the process is in control, we check the four rules.

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: There are not nine points in a row in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

It appears that the process is in control.

c.

Yes. This process appears to be in control. Therefore, these control limits could be used to monitor future output.

d.

Of the 30 R values plotted, there are only 6 different values. Most of the R values take on one of three values. This indicates that the data must be discrete (take on a countable number of values), or that the path widths are multiples of each other.




12.26

a.

$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{20}}{k} = \frac{4 + 6 + \cdots + 15}{20} = \frac{176}{20} = 8.8$

Centerline = $\bar{R}$ = 8.8

From Table XII, Appendix B, with n = 5, D4 = 2.114 and D3 = 0.

Upper control limit = $\bar{R}D_4$ = 8.8(2.114) = 18.603

Since D3 = 0, the lower control limit is negative and is not included on the chart.

From Table XII, Appendix B, with n = 5, d2 = 2.326 and d3 = 0.864.

Upper A–B boundary = $\bar{R} + 2d_3\frac{\bar{R}}{d_2}$ = 8.8 + 2(.864)(8.8/2.326) = 15.338

Lower A–B boundary = $\bar{R} - 2d_3\frac{\bar{R}}{d_2}$ = 8.8 − 2(.864)(8.8/2.326) = 2.262

Upper B–C boundary = $\bar{R} + d_3\frac{\bar{R}}{d_2}$ = 8.8 + (.864)(8.8/2.326) = 12.069

Lower B–C boundary = $\bar{R} - d_3\frac{\bar{R}}{d_2}$ = 8.8 − (.864)(8.8/2.326) = 5.531

The R-chart is:

b.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control since none of the out-of-control signals are observed. No special causes of variation appear to be present.

c.

Since the process appears to be in control, the control limits of the R-chart could be used to monitor future replacement cycle times.

d.

From part b, we decided that the process was in control. However, there does appear to be a pattern emerging in the R-chart. As the sample number increases, the value of R is tending to increase. If this process was monitored for a longer period of time, the R-chart might indicate that the process was out of control.

12.28

a.

$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{16}}{k} = \frac{.4 + 1.4 + \cdots + 2.6}{16} = \frac{44.1}{16} = 2.756$

Centerline = $\bar{R}$ = 2.756

From Table XII, Appendix B, with n = 5, D4 = 2.114 and D3 = 0.

Upper control limit = $\bar{R}D_4$ = 2.756(2.114) = 5.826

Since D3 = 0, the lower control limit is negative and is not included on the chart.

From Table XII, Appendix B, with n = 5, d2 = 2.326 and d3 = 0.864.

Upper A–B boundary = $\bar{R} + 2d_3\frac{\bar{R}}{d_2}$ = 2.756 + 2(.864)(2.756/2.326) = 4.803

Lower A–B boundary = $\bar{R} - 2d_3\frac{\bar{R}}{d_2}$ = 2.756 − 2(.864)(2.756/2.326) = .709

Upper B–C boundary = $\bar{R} + d_3\frac{\bar{R}}{d_2}$ = 2.756 + (.864)(2.756/2.326) = 3.780

Lower B–C boundary = $\bar{R} - d_3\frac{\bar{R}}{d_2}$ = 2.756 − (.864)(2.756/2.326) = 1.732


The R-chart is:

b.

The R-chart is designed to monitor the process variation.

c.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control. None of the out-of-control signals are present. There is no indication that special causes of variation are present.

12.30

The p-chart is designed to monitor the proportion of defective units produced by a process.

12.32

a.

To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 200:

$\hat{p} = \frac{\text{No. of defectives}}{\text{No. in sample}}$

The sample proportions are listed in the table:

Sample No.   p̂       Sample No.   p̂
 1          .080     14          .060
 2          .070     15          .070
 3          .045     16          .055
 4          .055     17          .040
 5          .075     18          .035
 6          .040     19          .060
 7          .060     20          .075
 8          .080     21          .045
 9          .085     22          .080
10          .065     23          .065
11          .075     24          .055
12          .050     25          .050
13          .045

b.

To get the total number of defectives, sum the number of defectives for all 25 samples. The sum is 303. To get the total number of units sampled, multiply the sample size by the number of samples: 200(25) = 5000.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{303}{5000} = .0606$

Centerline = $\bar{p}$ = .0606

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0606 + 3√(.0606(.9394)/200) = .1112

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0606 − 3√(.0606(.9394)/200) = .0100

c.

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0606 + 2√(.0606(.9394)/200) = .0943

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0606 − 2√(.0606(.9394)/200) = .0269

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0606 + √(.0606(.9394)/200) = .0775

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0606 − √(.0606(.9394)/200) = .0437
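The p-chart limits and zone boundaries all come from one standard-error term, which the following minimal sketch makes explicit. The function name `p_chart_lines` is hypothetical; the inputs are the values from this exercise ($\bar{p}$ = .0606, n = 200).

```python
import math


def p_chart_lines(p_bar, n):
    """Centerline, control limits, and zone boundaries for a p-chart."""
    sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)  # standard error of p-hat
    return {
        "UCL": p_bar + 3 * sigma_p,
        "upper A-B": p_bar + 2 * sigma_p,
        "upper B-C": p_bar + sigma_p,
        "centerline": p_bar,
        "lower B-C": p_bar - sigma_p,
        "lower A-B": p_bar - 2 * sigma_p,
        "LCL": p_bar - 3 * sigma_p,
    }


# Values from Exercise 12.32
p_lines = p_chart_lines(0.0606, 200)
for name, value in p_lines.items():
    print(f"{name:>10}: {value:.4f}")
```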




d.

The p-chart is:

e.

To determine if the process is in or out of control, we check the four rules: Rule 1: Rule 2:


Rule 3: Rule 4:

The process appears to be in control. There do not appear to be any special causes of variation.

12.34

a.

The sample size is determined by the following:

$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .01)}{.01} = 891$

The minimum sample size is 892.

b.

$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .05)}{.05} = 171$

The minimum sample size is 172.

c.

$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .10)}{.10} = 81$

The minimum sample size is 82.

d.

$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .20)}{.20} = 36$

The minimum sample size is 37.

12.36

a.

$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .07)}{.07} = 119.6 \approx 120$

The minimum sample size is 120.

b.

To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 120:

$\hat{p} = \frac{\text{No. defectives}}{\text{No. in sample}}$

Sample No.   p̂       Sample No.   p̂
 1          .092     11          .083
 2          .042     12          .100
 3          .033     13          .067
 4          .067     14          .050
 5          .083     15          .083
 6          .108     16          .042
 7          .075     17          .083
 8          .067     18          .083
 9          .083     19          .025
10          .092     20          .067

To get the total number of defectives, sum the number of defectives for all 20 samples. The sum is 171. To get the total number of units sampled, multiply the sample size by the number of samples: 120(20) = 2400.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{171}{2400} = .071$

Centerline = $\bar{p}$ = .071

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .071 + 3√(.071(.929)/120) = .141

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .071 − 3√(.071(.929)/120) = .001

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .071 + 2√(.071(.929)/120) = .118

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .071 − 2√(.071(.929)/120) = .024

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .071 + √(.071(.929)/120) = .094

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .071 − √(.071(.929)/120) = .048

The p-chart is:

c.

The process appears to be in control.

d.

Since the process is in control, it is appropriate to use the control limits to monitor future process output.

e.

No. The number of defectives recorded was per day, not per hour. Therefore, the p-chart is not capable of signaling hour-to-hour changes in p.

12.38

a.

To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 200:

$\hat{p} = \frac{\text{No. defectives}}{\text{No. in sample}}$

Sample No.   p̂       Sample No.   p̂
 1          .065     16          .015
 2          .025     17          .005
 3          .010     18          .010
 4          .015     19          .015
 5          .010     20          .005
 6          .015     21          .045
 7          .005     22          .025
 8          .010     23          .010
 9          .005     24          .005
10          .005     25          .015
11          .055     26          .010
12          .030     27          .020
13          .010     28          .010
14          .015     29          .005
15          .005     30          .005

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{96}{6000} = .016$

The centerline is $\bar{p}$ = .016

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .016 + 3√(.016(1 − .016)/200) = .0426

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .016 − 3√(.016(1 − .016)/200) = −.0106

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .016 + 2√(.016(1 − .016)/200) = .0337

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .016 − 2√(.016(1 − .016)/200) = −.0017

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .016 + √(.016(1 − .016)/200) = .0249

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .016 − √(.016(1 − .016)/200) = .0071

The p-chart is:

b.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: There are 3 points beyond Zone A: Points 1, 11, and 21.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process does not appear to be in control. Rule 1 indicates that the process is out of control.

12.40

Specification spread is the difference between the upper specification limit and the lower specification limit. The specification spread is determined by customers, management, and product designers. Process spread is the spread of the actual output and is a function of the standard deviation of the data.

12.42

There are two reasons why CP should not be used in isolation. First, CP is a statistic and is subject to sampling error. The sample standard deviation is used to estimate the population standard deviation which is used to calculate the process spread. Thus, the estimate of the process spread can vary from sample to sample. Second, CP does not reflect the shape of the output distribution. Distributions with different shapes can have the same CP value.

12.44

The specification spread is the difference between the upper specification limit and the lower specification limit.

a.

Specification spread = USL − LSL = 19.65 − 12.45 = 7.20

b.

Specification spread = USL − LSL = .0010 − .0008 = .0002

c.

Specification spread = USL − LSL = 1.43 − 1.27 = 0.16

d.

Specification spread = USL − LSL = 490 − 486 = 4

12.46

$C_P = \frac{\text{Specification spread}}{\text{Process spread}} = \frac{USL - LSL}{6\sigma}$

a.

$C_P \approx \frac{USL - LSL}{6s} = \frac{1.0065 - 1.0035}{6(.0005)} = \frac{.003}{.003} = 1$

b.

$C_P \approx \frac{USL - LSL}{6s} = \frac{22 - 21}{6(.2)} = \frac{1}{1.2} = .8333$

c.

$C_P \approx \frac{USL - LSL}{6s} = \frac{875 - 870}{6(.75)} = \frac{5}{4.5} = 1.111$

12.48

a.

If the output distribution is normal with a mean of 1000 and a standard deviation of 100, then the proportion of the output that is unacceptable is:

P(x < 980) + P(x > 1,020)
= $P\left(z < \frac{980 - 1{,}000}{100}\right) + P\left(z > \frac{1{,}020 - 1{,}000}{100}\right)$
= P(z < −.2) + P(z > .2)
= (.5 − .0793) + (.5 − .0793) = .8414 (using Table IV, Appendix B)

The percentage of unacceptable output is 84.14%.

b.

$C_P = \frac{USL - LSL}{6\sigma} = \frac{1{,}020 - 980}{6(100)} = \frac{40}{600} \approx .067$

Since the value of CP is less than 1, the process is not capable.
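Both parts of this exercise can be reproduced numerically. The sketch below uses Python's standard `statistics.NormalDist` in place of Table IV, so the proportion differs from the table value in the fourth or fifth decimal; the function names are hypothetical.

```python
from statistics import NormalDist


def capability_index(usl, lsl, sigma):
    """C_P = specification spread / process spread."""
    return (usl - lsl) / (6 * sigma)


def proportion_out_of_spec(usl, lsl, mean, sigma):
    """P(x < LSL) + P(x > USL) for normally distributed output."""
    dist = NormalDist(mean, sigma)
    return dist.cdf(lsl) + (1 - dist.cdf(usl))


# Values from Exercise 12.48
print(round(proportion_out_of_spec(1020, 980, 1000, 100), 4))
print(round(capability_index(1020, 980, 100), 3))
```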




12.50

a.

A capability diagram is: LSL = 35 is off the chart.

b.

Fifty-two of the observations are above the upper specification limit. Thus, the percentage is (52/100) × 100% = 52%.

c.

From the sample, $\bar{x}$ = 37.007 and s = .083.

$C_P \approx \frac{USL - LSL}{6s} = \frac{37 - 35}{6(.083)} = \frac{2}{.498} = 4.016$

d.

Since the CP value is greater than 1, the process is capable.

12.52

The quality of a good or service is indicated by the extent to which it satisfies the needs and preferences of its users. Its eight dimensions are: performance, features, reliability, conformance, durability, serviceability, aesthetics, and other perceptions that influence judgments of quality.

12.54

A process is a series of actions or operations that transform inputs to outputs. A process produces output over time. Organizational process: Manufacturing a product. Personnel Process: Balancing a checkbook.

12.56

The six major sources of process variation are: people, machines, materials, methods, measurements, and environment.

12.62

Common causes of variation are the methods, materials, equipment, personnel, and environment that make up a process and the inputs required by the process. That is, common causes are attributable to the design of the process. Special causes of variation are events or actions that are not part of the process design. Typically, they are transient, fleeting events that affect only local areas or operations within the process for a brief period of time. Occasionally, however, such events may have a persistent or recurrent effect on the process.

12.64

If a process is capable, then it is necessarily in control. If a process is in control, then the control chart should be used to monitor the process.



12.66

The probability of observing a value of $\bar{x}$ more than 3 standard deviations from its mean is:

$P(\bar{x} > \mu + 3\sigma_{\bar{x}}) + P(\bar{x} < \mu - 3\sigma_{\bar{x}})$ = P(z > 3) + P(z < −3) = .5000 − .4987 + .5000 − .4987 = .0026

To find how many standard deviations from the mean the control limits should be set so that the probability of the chart falsely indicating the presence of a special cause of variation is .10, we must find the z-score z0 such that:

P(z > z0) + P(z < −z0) = .1000, or P(z > z0) = .0500

Using Table IV, Appendix B, z0 = 1.645. Thus the control limits should be set 1.645 standard deviations above and below the mean.
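The two calculations above can be checked with the standard normal distribution directly. This is a minimal sketch using `statistics.NormalDist` from the Python standard library instead of Table IV, so the first result comes out as about .0027 rather than the table-rounded .0026; the function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def false_alarm_probability(k):
    """Chance that an in-control point falls more than k sigma from the mean."""
    return 2 * (1 - STANDARD_NORMAL.cdf(k))


def sigma_multiple_for(alpha):
    """Number of sigmas at which to set limits for false-alarm probability alpha."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


print(round(false_alarm_probability(3), 4))
print(round(sigma_multiple_for(0.10), 3))
```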

12.68

a.

The centerline = $\bar{x} = \frac{\sum x}{n} = \frac{150.58}{20} = 7.529$

The time series plot is:

b.

The variation pattern that best describes the pattern in this time series is the level shift. Points 1 through 10 all have fairly low values, while points 11 through 20 all have fairly high values.

12.70

a.

Yes. The minimum sample size necessary so the lower control limit is not negative is:

$n > \frac{9(1 - p_0)}{p_0}$

From the data, p0 ≈ .06. Thus, $n > \frac{9(1 - .06)}{.06} = 141$. Our sample size was 200.
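The sample-size rule n > 9(1 − p0)/p0 is easy to get wrong by one at exact boundaries, so the following minimal sketch uses exact fractions rather than floating point. The function name `min_sample_size` is hypothetical; it returns the smallest integer strictly greater than the bound.

```python
import math
from fractions import Fraction


def min_sample_size(p0):
    """Smallest integer n with n > 9(1 - p0)/p0, so the p-chart LCL is positive."""
    p = Fraction(str(p0))       # exact decimal value, e.g. 0.01 -> 1/100
    bound = 9 * (1 - p) / p
    return math.floor(bound) + 1


for p0 in (0.01, 0.05, 0.10, 0.20, 0.07, 0.06):
    print(p0, min_sample_size(p0))
```

With p0 = .06 the sketch returns 142, which the sample size of 200 used here comfortably exceeds.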


b.

To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 200:

$\hat{p} = \frac{\text{No. of defectives}}{\text{No. in sample}}$

The sample proportions are listed in the table:

Sample No.   p̂       Sample No.   p̂
 1          .02      12          .10
 2          .03      13          .10
 3          .055     14          .085
 4          .06      15          .065
 5          .025     16          .05
 6          .05      17          .055
 7          .04      18          .035
 8          .08      19          .03
 9          .085     20          .04
10          .10      21          .045
11          .14

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{258}{4200} = .0614$

Centerline = $\bar{p}$ = .0614

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0614 + 3√(.0614(.9386)/200) = .1123

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0614 − 3√(.0614(.9386)/200) = .0105

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0614 + 2√(.0614(.9386)/200) = .0953

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0614 − 2√(.0614(.9386)/200) = .0275

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0614 + √(.0614(.9386)/200) = .0784

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .0614 − √(.0614(.9386)/200) = .0444

The p-chart is:

c.

To determine if the control limits should be used to monitor future process output, we need to check the four rules.

Rule 1: One point beyond Zone A: The 11th point is beyond Zone A. This indicates the process is out of control.
Rule 2: Nine points in a row in Zone C or beyond: There are not nine points in a row in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

Rule 1 indicates the process is out of control. These control limits should not be used to monitor future process output.

12.72

a.

In order for the $\bar{x}$-chart to be meaningful, we must assume the variation in the process is constant (i.e., stable).

For each sample, we compute $\bar{x} = \frac{\sum x}{n}$ and R = range = largest measurement − smallest measurement. The results are listed in the table:

Sample No.   x̄        R       Sample No.   x̄        R
 1          32.325   11.6     13          31.050   13.3
 2          30.825   12.4     14          34.400    9.6
 3          30.450    7.8     15          31.350    7.3
 4          34.525   10.2     16          28.150    8.6
 5          31.725    9.1     17          30.950    7.6
 6          33.850   10.4     18          32.225    5.6
 7          32.100   10.1     19          29.050   10.0
 8          28.250    6.8     20          31.400    8.7
 9          32.375    8.7     21          30.350    8.9
10          30.125    6.3     22          34.175   10.5
11          32.200    7.1     23          33.275   13.0
12          29.150    9.3     24          30.950    8.9

$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{24}}{k} = \frac{755.225}{24} = 31.4677$

$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{24}}{k} = \frac{221.8}{24} = 9.242$

Centerline = $\bar{\bar{x}}$ = 31.468

From Table XII, Appendix B, with n = 4, A2 = .729.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R}$ = 31.468 + .729(9.242) = 38.205
Lower control limit = $\bar{\bar{x}} - A_2\bar{R}$ = 31.468 − .729(9.242) = 24.731
Upper A–B boundary = $\bar{\bar{x}} + \frac{2}{3}(A_2\bar{R})$ = 31.468 + (2/3)(.729)(9.242) = 35.960
Lower A–B boundary = $\bar{\bar{x}} - \frac{2}{3}(A_2\bar{R})$ = 31.468 − (2/3)(.729)(9.242) = 26.976
Upper B–C boundary = $\bar{\bar{x}} + \frac{1}{3}(A_2\bar{R})$ = 31.468 + (1/3)(.729)(9.242) = 33.714
Lower B–C boundary = $\bar{\bar{x}} - \frac{1}{3}(A_2\bar{R})$ = 31.468 − (1/3)(.729)(9.242) = 29.222

The $\bar{x}$-chart is:

b.

To determine if the process is in or out of control, we check the six rules.

Rule 1: Rule 2: Rule 3: Rule 4:

Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

The process appears to be in control. There are no indications that special causes of variation are affecting the process.

c.

Since the process appears to be in control, these limits should be used to monitor future process output.

12.74

a.

A capability analysis diagram is:

b.

For an upper specification limit of 5, there are 27 observations above this limit. Thus, (27/100) × 100% = 27% of the observations are unacceptable. It does not appear that the process is capable.

c.

From Exercise 12.73, the process appears to be in control. Thus, it is appropriate to estimate CP. From the sample, $\bar{x}$ = 3.867 and s = 2.190.

$C_P \approx \frac{USL - LSL}{6s} = \frac{5 - 0}{6(2.19)} = \frac{5}{13.14} = .381$

Since the CP value is less than 1, the process is not capable.

d.

There is no lower specification limit because management has no time limit below which is unacceptable. The variable being measured is time customers wait in line. The actual lower limit would be 0.

12.76

a.



The centerline is $\bar{p}$ = .048

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .048 + 3√(.048(1 − .048)/160) = .099

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .048 − 3√(.048(1 − .048)/160) = −.003

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .048 + 2√(.048(1 − .048)/160) = .082

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .048 − 2√(.048(1 − .048)/160) = .014

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .048 + √(.048(1 − .048)/160) = .065

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$ = .048 − √(.048(1 − .048)/160) = .031

The p-chart is:

b.

To determine if the process is in or out of control, we check the four rules (the same four used for the R-chart):

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control. Thus, there is no indication that special causes of variation are present.



c.

The Pareto diagram is:

Most of the defects are due to microcracks. Thus, "microcracks" are the "vital few." The other types of defectives are broken stands, gaps between layers, and internal voids. These are the "trivial many."




Chapter 13

Time Series: Descriptive Analyses, Models, and Forecasting

13.2

a.

The simple composite index is calculated as follows: First, sum the observations for all the series of interest at each time period. Select the base time period. Divide each sum by the sum in the base time period and multiply by 100.

b.

To calculate a weighted composite index, we follow the following steps: First, multiply the observations in each time series by its appropriate weight. Then sum the weighted observations across all times series for each time period. Select the base time period. Divide each weighted sum by the weighted sum in the base time period and multiply by 100.

c.

The steps necessary to compute a Laspeyres Index are:

1. Collect data for each of k price series.
2. Select a base time period and collect purchase quantity information for each of the k series at the base time period.
3. Using the purchase quantity values at the base period as weights, multiply each value in the kth series by its corresponding weight.
4. Sum the products for each time period.
5. Divide each sum by the sum corresponding to the base period and multiply by 100.

d.

The steps necessary to compute a Paasche index are:

1. Collect data for each of k price series.
2. Select a base period.
3. Collect purchase quantity information for each series at each time period.
4. For each time period, multiply the value in each price series by its corresponding purchase quantity for that time period. Sum the products for each time period.
5. To find the value of the Paasche index at a particular time period, multiply the purchase quantity values (weights) for that time period by the corresponding price values of the base time period. Sum the results for the base period.

The Paasche Index is then found by dividing the sum found in (4) by the sum found in (5).

13.4

a.

The simple index for the quarter 4 price of product A, using quarter 1 as the base period is (4.25 / 3.25) × 100 = 130.77.

b.

The simple index for the quarter 2 price of product B, using quarter 1 as the base period is (1.25 / 1.75) × 100 = 71.43.

c.

To find the simple composite index, we must first sum the prices for all three products over the base period and the quarter for which we want to compute the simple composite index. The sum for quarter 1 is 3.25 + 1.75 + 8.00 = 13.00. The sum for quarter 4 is 4.25 + 1.00 + 10.50 = 15.75. The simple composite index for quarter 4 using quarter 1 as the base period is (15.75 / 13.00) × 100 = 121.15.

d.

The sum of all the products for quarter 2 is 3.50 + 1.25 + 9.35 = 14.10. The simple composite index for quarter 4 using quarter 2 as the base period is (15.75 / 14.10) × 100 = 111.70.

13.6

a.

To find the simple index, divide each value by the value for the base year and multiply by 100. The index numbers are:

Year   Simple Index (Base Year = 1975)        Simple Index (Base Year = 1980)
1975   (13,719/13,719) × 100 = 100.00         (13,719/21,023) × 100 = 65.26
1980   (21,023/13,719) × 100 = 153.24         (21,023/21,023) × 100 = 100.00
1985   (27,735/13,719) × 100 = 202.16         (27,735/21,023) × 100 = 131.93
1990   (35,353/13,719) × 100 = 257.69         (35,353/21,023) × 100 = 168.16
1995   (40,611/13,719) × 100 = 296.02         (40,611/21,023) × 100 = 193.17
2000   (50,890/13,719) × 100 = 370.95         (50,890/21,023) × 100 = 242.07

b.

The index value for 1990 is 257.69 when the base is 1975. Thus, the median annual family income for 1990 increased by 257.69 – 100 = 157.69% over the median annual family income in 1975. The index value for 1990 is 168.16 when the base is 1980. Thus, the median annual family income for 1990 increased by 168.16 – 100 = 68.16% over the median annual family income in 1980.
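The simple and simple composite index calculations used in Exercises 13.4 and 13.6 reduce to a ratio against the base period. The following minimal sketch (function names are hypothetical) reproduces a few of the figures quoted above.

```python
def simple_index(value, base_value):
    """Value expressed as a percentage of the base-period value."""
    return value / base_value * 100


def simple_composite_index(values, base_values):
    """Sum of several series at one period as a percentage of the base-period sum."""
    return sum(values) / sum(base_values) * 100


# Exercise 13.6: 1990 median family income against the 1975 and 1980 bases
print(round(simple_index(35353, 13719), 2))
print(round(simple_index(35353, 21023), 2))

# Exercise 13.4c: quarter 4 prices against quarter 1 prices
print(round(simple_composite_index([4.25, 1.00, 10.50], [3.25, 1.75, 8.00]), 2))
```

Subtracting 100 from an index value gives the percentage change relative to the base period, which is how the 157.69% and 68.16% figures above are obtained.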

13.8

a.

To compute the simple index, divide each housing start value by the 2001, Quarter 1 value, 274, and then multiply by 100.

Year   Quarter   Simple Index
2001   1         (274/274) × 100 = 100.00
       2         (374/274) × 100 = 136.50
       3         (341/274) × 100 = 124.45
       4         (285/274) × 100 = 104.01
2002   1         (293/274) × 100 = 106.93
       2         (386/274) × 100 = 140.88
       3         (361/274) × 100 = 131.75
       4         (319/274) × 100 = 116.42
2003   1         (304/274) × 100 = 110.95
       2         (406/274) × 100 = 148.18
       3         (412/274) × 100 = 150.36
       4         (377/274) × 100 = 137.59
2004   1         (345/274) × 100 = 125.91
       2         (456/274) × 100 = 166.42
       3         (440/274) × 100 = 160.58
       4         (370/274) × 100 = 135.04
2005   1         (369/274) × 100 = 134.67
       2         (485/274) × 100 = 177.01
       3         (471/274) × 100 = 171.90
       4         (392/274) × 100 = 143.07

b.

The value of the index for Quarter 2, 2004 is 166.42. Thus, the housing starts in Quarter 2, 2004 increased by 166.42 – 100 = 66.42% over the housing starts in the base quarter, Quarter 1, 2001.

c.

The value of the index for Quarter 4, 2005 is 143.07. Thus, the housing starts in Quarter 4, 2005 increased by 143.07 – 100 = 43.07% over the housing starts in the base quarter, Quarter 1, 2001.

d.

The number of housing starts for Quarter 1, 2003 is 304 thousand. The number of housing starts for Quarter 4, 2005 is 392 thousand. Using Quarter 1, 2003 as the base, the index for Quarter 4, 2005 is (392/304) × 100 = 128.95. Thus, the number of housing starts in Quarter 4, 2005 increased by 128.95 – 100 = 28.95% over the housing starts in Quarter 1, 2003.

13.10

a.

To compute the simple index for the agricultural data, divide each farm value by the 1980 value 3,364 and then multiply by 100. To compute the simple index for the nonagricultural data, divide each nonfarm value by the 1980 value 95,938 and then multiply by 100. The two indices are:

Year   Farm Index                         Nonfarm Index
1980   (3,364/3,364) × 100 = 100.00       (95,938/95,938) × 100 = 100.00
1985   (3,179/3,364) × 100 = 94.50        (103,971/95,938) × 100 = 108.37
1990   (3,223/3,364) × 100 = 95.81        (115,570/95,938) × 100 = 120.46
1995   (3,440/3,364) × 100 = 102.26       (121,460/95,938) × 100 = 126.60
2000   (2,464/3,364) × 100 = 73.25        (134,427/95,938) × 100 = 140.12
2003   (2,275/3,364) × 100 = 67.63        (135,461/95,938) × 100 = 141.20

b.

The nonfarm segment has shown the greater percentage change in employment over the time period. The nonfarm employment in 2003 was 41.20% greater than in 1980. The farm employment in 2003 was 32.37% lower than in 1980.

c.

To compute the simple composite index, first sum the two values (farm and nonfarm) for every time period. Then divide the sum by the sum in 1980, 99,302, and then multiply by 100. The simple composite index is:

Year   Sum       Simple Composite Index
1980    99,302   (99,302/99,302) × 100 = 100.00
1985   107,150   (107,150/99,302) × 100 = 107.90
1990   118,793   (118,793/99,302) × 100 = 119.63
1995   124,900   (124,900/99,302) × 100 = 125.78
2000   136,891   (136,891/99,302) × 100 = 137.85
2003   137,736   (137,736/99,302) × 100 = 138.70

d.

The simple composite index value for 2003 is 138.70. The composite employment is 38.70% higher in 2003 than in 1980.

13.12

a.

To find the Laspeyres index, we multiply the durable goods by 10.9, the nondurable goods by 14.02, and the services by 42.6. The three products are then summed. The index is found by dividing the weighted sum at each time period by the weighted sum of 1970, 17,108.86, and then multiplying by 100. The Laspeyres index and the simple composite index for 1970 (computed in Exercise 13.11) are:

Year   Simple Composite Index-1970   Weighted Sum   Laspeyres Index
1960          51.43                    8,409.95          49.16
1965          68.77                   11,442.51          66.88
1970         100.00                   17,108.86         100.00
1975         158.52                   27,509.89         160.79
1980         270.39                   48,215.53         281.82
1985         412.59                   76,167.86         445.20
1990         581.78                  110,254.64         644.43
1995         768.60                  150,193.08         877.87
2000       1,033.83                  202,856.51       1,185.68
2004       1,272.99                  251,152.45       1,467.97

b.

The plot of the two indices is:

[Plot: Index (0 to 1600) versus Year (1960 to 2000); series: I-1970 (simple composite index) and Laspeyres]

The two indices are very similar from 1960 to approximately 1980. After 1980, the difference between the two indices becomes larger, with the Laspeyres index increasing faster than the simple composite index.




13.14

a.

To get the simple composite price index, sum the prices for the three metals for each month, divide by 2,090.35 (the sum of the prices for the base period January), and multiply by 100. To get the simple composite quantity index, sum the quantities for the three metals for each month, divide by 8,793.40 (the sum of the quantities for the base period January), and multiply by 100. The indices are:

Month    Price Total    Price Index    Quantity Total    Quantity Index
Jan       2,090.35        100.00         8,793.40           100.00
Feb       2,495.72        119.39         8,531.70            97.02
Mar       2,536.85        121.36         9,406.50           106.97
Apr       2,409.55        115.27         9,047.10           102.89
May       2,550.70        122.02         9,303.20           105.80
Jun       2,603.20        124.53         9,152.10           104.08
Jul       2,719.30        130.09         9,301.80           105.78
Aug       2,998.52        143.45         9,457.90           107.56
Sep       2,978.98        142.51         9,382.90           106.70
Oct       2,997.82        143.41         9,698.20           110.29
Nov       3,038.80        145.37         9,127.00           103.79
Dec       3,018.57        144.41         8,807.90           100.16

b.

To compute the Laspeyres index, multiply the price for each month by the quantity for each of the metals for January, sum the products for the three metals, divide by 1,768,700.64 (the sum for the base period January), and multiply by 100. The Laspeyres index is:

Month    Total           Laspeyres Index
Jan      1,768,700.64        100.00
Feb      2,077,067.24        117.43
Mar      2,345,138.00        132.59
Apr      2,114,563.64        119.55
May      1,760,956.32         99.56
Jun      1,746,326.88         98.74
Jul      2,117,568.80        119.72
Aug      2,377,017.20        134.39
Sep      2,100,958.72        118.79
Oct      2,276,109.40        128.69
Nov      2,366,980.72        133.83
Dec      2,155,654.92        121.88



c.

The plots of the simple composite price index, the simple composite quantity index, and Laspeyres index are:

[Figure: time series plot of the Price, Quantity, and Laspeyres indices against Month, Jan to Dec; vertical axis Index, 90 to 150.]

The quantity index appears to be fairly stable while the price index steadily increases. The Laspeyres index is rather unstable, as it varies much more than the other two indices.

d.

The following steps are used to compute the Paasche index:

1. First, multiply the price × production for copper, steel, and lead for each month. The numerator of the index is the sum of these three products for each month.
2. Next, multiply the production values of copper by 1,133, the production of steel by 187.75, and the production of lead by 769.6 (the January prices). The denominator is the sum of these three products for each month.
3. The values of the Paasche index are the ratios of these two values for each month, times 100.

The Paasche index is:

Month    Paasche Numerator    Paasche Denominator    Paasche Index
Jan        1,768,700.64          1,768,700.64           100.00
Feb        2,013,192.24          1,714,396.58           117.43
Mar        2,500,128.80          1,884,813.60           132.65
Apr        2,180,640.81          1,823,938.71           119.56
May        1,858,912.26          1,867,861.77            99.52
Jun        1,822,735.92          1,844,379.26            98.83
Jul        2,230,984.40          1,864,385.48           119.66
Aug        2,549,791.96          1,898,332.74           134.32
Sep        2,244,369.96          1,888,977.74           118.81
Oct        2,504,067.86          1,946,822.77           128.62
Nov        2,450,159.20          1,831,683.15           133.77
Dec        2,175,046.70          1,781,166.44           122.11
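The difference between the two weighted indices is only in which quantities serve as weights. The sketch below illustrates this with a hypothetical two-commodity data set (the prices and quantities are invented for illustration and are not the metals data, which are not reproduced in this solution). The function names are also assumptions. The final line checks the February Paasche value against the numerator and denominator in the table.

```python
def laspeyres_index(prices, base_prices, base_quantities):
    """Weight current and base prices by the BASE-period quantities."""
    numerator = sum(p * q for p, q in zip(prices, base_quantities))
    denominator = sum(p0 * q for p0, q in zip(base_prices, base_quantities))
    return 100 * numerator / denominator


def paasche_index(prices, base_prices, quantities):
    """Weight current and base prices by the CURRENT-period quantities."""
    numerator = sum(p * q for p, q in zip(prices, quantities))
    denominator = sum(p0 * q for p0, q in zip(base_prices, quantities))
    return 100 * numerator / denominator


# Hypothetical data for two commodities (illustration only).
base_prices = [10.0, 20.0]
base_quantities = [1.0, 2.0]
prices = [12.0, 25.0]
quantities = [2.0, 1.0]

laspeyres = laspeyres_index(prices, base_prices, base_quantities)
paasche = paasche_index(prices, base_prices, quantities)

# February Paasche index from the table: numerator / denominator * 100.
feb_paasche = round(100 * 2013192.24 / 1714396.58, 2)
print(laspeyres, paasche, feb_paasche)
```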



e.

The plot of the Laspeyres index and the Paasche index is:

[Figure: Time Series Plot of Laspeyres, Paasche against Month, Jan to Dec; vertical axis 100 to 135.]

The two indices are almost identical.

f.

The values of the Laspeyres index for September and December are 118.79 and 121.88. The values of the Paasche index for September and December are 118.81 and 122.11. These values are almost identical. Because the Laspeyres and Paasche indices are so close, neither is superior to the other.

13.16

a.

The exponentially smoothed employment for the first period is equal to the employment for that period. For the rest of the time periods, the exponentially smoothed employment values are found by multiplying .5 times the employment value of that time period and adding to that (1 − .5) times the value of the exponentially smoothed employment figure of the previous time period. The exponentially smoothed employment value for time period 2 is .5(281) + (1 − .5)(280) = 280.5. The rest of the values are shown in the table.

Month    t     Yt     Exponentially Smoothed Series (w = .5)
Jan.     1     280              280.0
Feb.     2     281              280.5
Mar.     3     250              265.3
Apr.     4     246              255.6
May      5     239              247.3
June     6     218              232.7
July     7     218              225.3
Aug.     8     210              217.7
Sept.    9     205              211.3
Oct.    10     206              208.7
Nov.    11     200              204.3
Dec.    12     200              202.2
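The recursion described in part a can be sketched in Python as follows. The function name `exponential_smoothing` is an assumption chosen for illustration; the data are the twelve employment values from the table.

```python
def exponential_smoothing(series, w):
    """E_1 = Y_1; E_t = w*Y_t + (1 - w)*E_(t-1) for t >= 2."""
    smoothed = [float(series[0])]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


employment = [280, 281, 250, 246, 239, 218, 218, 210, 205, 206, 200, 200]
smoothed = exponential_smoothing(employment, w=0.5)
for t, (y, e) in enumerate(zip(employment, smoothed), start=1):
    print(t, y, round(e, 2))
```

The same function applies to the other exercises in this section by changing the series and the smoothing constant w.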



b.

The graph of the time series and the exponentially smoothed series is:

[Figure: plot of Yt and the exponentially smoothed series against Time Period, 1 to 12; vertical axis Series, 200 to 280.]

13.18

a.

The exponentially smoothed fish catch for Chile for the first period is equal to the fish catch for that period. For the rest of the time periods, the exponentially smoothed fish catch values are found by multiplying .5 times the fish catch of that time period and adding to that (1 − .5) times the value of the exponentially smoothed fish catch figure of the previous time period. The exponentially smoothed fish catch for Chile for the time period 1995 is .5(7,590.5) + (1 − .5)(5,195.4) = 6,392.95. The rest of the values are shown in the table. Similarly, the exponentially smoothed fish catch for Brazil for the first period is equal to the fish catch for that period. For the rest of the time periods, the exponentially smoothed fish catch values are found by multiplying .5 times the fish catch of that time period and adding to that (1 − .5) times the value of the exponentially smoothed fish catch figure of the previous time period. The exponentially smoothed fish catch for Brazil for time period 1995 is .5(800.0) + (1 − .5)(802.9) = 801.45. The rest of the values are shown in the table.




                        Chile w = .5                       Brazil w = .5
                        Exponentially                      Exponentially
Year    Chile Catch     Smoothed Catch     Brazil Catch    Smoothed Catch
1990      5,195.4          5,195.40           802.9            802.90
1995      7,590.5          6,392.95           800.0            801.45
1998      3,265.3          4,829.13           706.8            754.13
1999      5,050.2          4,939.66           703.9            729.01
2000      4,300.0          4,619.83           766.8            747.91
2001      3,797.1          4,208.47           806.7            777.30
2002      4,271.5          4,239.98           822.1            799.70

b.

The plot of the two time series and the two exponentially smoothed series is:

[Figure: plot of Chile, Brazil, Chile-Exp, and Brazil-Exp against Year, 1990 to 2002; vertical axis Fish Catch, 0 to 8000.]

Both the time series and the exponentially smoothed series for the fish catch in Brazil are fairly stable over time. There is a decrease and then increase for both series in Brazil. Both the time series and exponentially smoothed series for the fish catch in Chile show a decrease over time. The exponentially smoothed series is more stable than the actual time series.



13.20

a.

The exponentially smoothed expenditure for the first time period is equal to the expenditure for that period. For the rest of the time periods, the exponentially smoothed expenditures are found by multiplying the expenditures for the time period by w = .2 and adding to that (1 − .2) times the exponentially smoothed value above it. The exponentially smoothed value for the year 1991 is .2(548.9) + (1 − .2)(590.1) = 581.86. The rest of the values appear in the table. The process is repeated with w = .8.

                         w = .2 Exponentially    w = .8 Exponentially
Year    Expenditures     Smoothed Value          Smoothed Value
1990       590.1              590.10                  590.10
1991       548.9              581.86                  557.14
1992       581.1              581.71                  576.31
1993       607.6              586.89                  601.34
1994       643.2              598.15                  634.83
1995       654.6              609.44                  650.65
1996       687.1              624.97                  679.81
1997       727.4              645.46                  717.88
1998       779.3              672.23                  767.02
1999       831.6              704.10                  818.68
2000       853.4              733.96                  846.46
2001       872.0              761.57                  866.89
2002       890.9              787.43                  886.10
2003       912.3              812.41                  907.06
2004       925.6              835.05                  921.89
2005       931.5              854.34                  929.58

b.

The plot of the original series and the two smoothed series is:

[Figure: plot of Expend, Exp-.2, and Exp-.8 against Year, 1990 to 2005; vertical axis Expenditures, 500 to 900.]

The trend in personal consumption expenditure on transportation increased at a faster rate in the 1990s than in the 2000s. In the 2000s, consumption expenditure is still increasing, but at a slower rate.




13.22

a.

The exponentially smoothed Stock Index for the first time period is equal to the Stock Index for that time period. For the rest of the time periods, the exponentially smoothed stock price is found by multiplying w = .3 times the stock price for that time period and adding to that (1 − .3) times the value of the exponentially smoothed stock price for the previous time period. The exponentially smoothed stock price for the second time period is .3(1372.7) + (1 − .3)(1286.4) = 1312.29. The rest of the values are shown in the table.

                               Exponentially      Exponentially
                               Smoothed Series    Smoothed Series
Year    Quarter    S&P 500     w = .3             w = .7
1999       1       1286.4        1286.4             1286.4
           2       1372.7        1312.3             1346.8
           3       1282.7        1303.4             1301.9
           4       1469.2        1353.1             1419.0
2000       1       1498.6        1396.8             1474.7
           2       1454.6        1414.1             1460.6
           3       1436.5        1420.8             1443.7
           4       1320.3        1390.7             1357.3
2001       1       1160.3        1321.6             1219.4
           2       1224.4        1292.4             1222.9
           3       1040.9        1217.0             1095.5
           4       1148.1        1196.3             1132.3
2002       1       1147.4        1181.6             1142.9
           2        989.8        1124.1             1035.7
           3        815.3        1031.4              881.4
           4        879.8         986.0              880.3
2003       1        848.2         944.6              857.8
           2        974.5         953.6              939.5
           3        996.0         966.3              979.0
           4       1111.9        1010.0             1072.0
2004       1       1126.2        1044.9             1110.0
           2       1140.8        1073.6             1131.5
           3       1114.6        1085.9             1119.7
           4       1211.9        1123.7             1184.2
2005       1       1180.6        1140.8             1181.7
           2       1191.3        1155.9             1188.4
           3       1228.8        1177.8             1216.7
           4       1248.3        1198.9             1238.8
2006       1       1294.9        1227.7             1278.1
           2       1270.2        1240.5             1272.6
           3       1335.8        1269.1             1316.8



The plot of the original series and the exponentially smoothed series with w = .3 is:

[Figure: plot of S&P 500 and Exp-.3 by quarter, Q1 1999 to Q3 2006; vertical axis S&P 500, 800 to 1500.]

b.

The same procedure is followed for w = .7. The exponentially smoothed Stock Index for the first time period is equal to the Stock Index for that time period. For the rest of the time periods, the exponentially smoothed stock price is found by multiplying w = .7 times the stock price for that time period and adding to that (1 − .7) times the value of the exponentially smoothed stock price for the previous time period. The exponentially smoothed stock price for the second time period is .7(1372.7) + (1 − .7)(1286.4) = 1346.8. The rest of the values are shown in the table in part a.

The plot of the original series and the exponentially smoothed series with w = .7 is:

[Figure: plot of S&P 500 and Exp-.7 by quarter, Q1 1999 to Q3 2006; vertical axis S&P 500, 800 to 1500.]

c.

The exponentially smoothed series with w = .3 better describes the trends in the series. The exponentially smoothed series with w = .7 is almost exactly like the original series.




13.24

a.

The missing trend value for quarter 3 is:

T3 = v(E3 – E2) + (1 – v)T2 = .6(3.78 – 3.50) + (1 − .6)(.25) = .27

b.

The missing smoothed value for quarter 4 is:

E4 = wY4 + (1 – w)(E3 + T3) = .2(4.25) + (1 − .2)(3.78 + .27) = 4.09

c.

The forecast for quarter 5 is:

FQ5 = Ft+1 = Et + Tt = 4.09 + .29 = 4.38

13.26

a.

To compute the exponentially smoothed values, we follow these steps:

E1 = Y1 = 345
E2 = wY2 + (1 – w)E1 = .6(456) + (1 − .6)(345) = 411.60
E3 = wY3 + (1 – w)E2 = .6(440) + (1 − .6)(411.60) = 428.64

The rest of the values are computed in a similar manner and are listed in the table:

                                      Exponentially Smoothed
Year    Quarter    Housing Starts     w = .6
2004       1            345              345.00
           2            456              411.60
           3            440              428.64
           4            370              393.46
2005       1            369              378.78
           2            485              442.51
           3            471              459.61
           4            392              419.04

b.

Using MINITAB, the plot is:

[Figure: plot of Housing and Exp-.6 by quarter, Q1 2004 to Q4 2005; vertical axis Starts, 350 to 500.]

c.

To forecast using exponentially smoothed values, we use the following:

F2006,1 = Ft+1 = Et = 419.04
F2006,2 = Ft+2 = Ft+1 = 419.04
F2006,3 = Ft+3 = Ft+1 = 419.04
F2006,4 = Ft+4 = Ft+1 = 419.04



13.28

a.

Using the information from Exercise 13.21, the forecast using the exponentially smoothed values with w = .9 is: F2006 = Ft+2 = Ft+ 1 = Et = 1815.3

b.

We first compute the Holt-Winters values for years 1974-2004. With w = .3 and v = .8,

E2 = Y2 = 1171
T2 = Y2 – Y1 = 1171 – 926 = 245
E3 = wY3 + (1 – w)(E2 + T2) = .3(1663) + (1 − .3)(1171 + 245) = 1490.1
T3 = v(E3 – E2) + (1 – v)T2 = .8(1490.1 – 1171) + (1 − .8)(245) = 304.28

The rest of the Et's and Tt's appear in the table:

                             Et          Tt
Year     t     Imports    w = .3      w = .3
                          v = .8      v = .8
1974     1       926
1975     2     1,171     1171.00      245.00
1976     3     1,663     1490.10      304.28
1977     4     2,058     1873.47      367.55
1978     5     1,892     2136.31      283.79
1979     6     1,866     2253.87      150.80
1980     7     1,414     2107.47      −86.96
1981     8     1,067     1734.46     −315.80
1982     9       633     1182.96     −504.36
1983    10       540      637.02     −537.62
1984    11       553      235.48     −428.76
1985    12       479        8.40     −267.41
1986    13       771       50.00      −20.21
1987    14       876      283.65      182.88
1988    15       987      622.67      307.79
1989    16     1,232     1020.93      380.16
1990    17     1,282     1365.36      351.58
1991    18     1,233     1571.76      235.43
1992    19     1,247     1639.14      100.99
1993    20     1,339     1619.79        4.72
1994    21     1,307     1529.25      −71.48
1995    22     1,303     1411.34     −108.63
1996    23     1,258     1289.30     −119.36
1997    24     1,378     1232.36      −69.42
1998    25     1,522     1270.65       16.75
1999    26     1,543     1364.08       78.09
2000    27     1,664     1508.72      131.33
2001    28     1,770     1679.04      162.52
2002    29     1,490     1736.09       78.14
2003    30     1,671     1771.26       43.77
2004    31     1,833     1820.42       48.08




To forecast using the Holt-Winters model with w = .3 and v = .8:

F2006 = Ft+2 = Et + 2Tt = 1,820.42 + 2(48.08) = 1,916.58

c.

The forecast error for the exponentially smoothed series is Yt+2 – Ft+2 = 2,100 – 1,815.3 = 284.7.

The forecast error for the Holt-Winters series is Yt+2 – Ft+2 = 2,100 – 1,916.58 = 183.42.

The error for the Holt-Winters forecast is smaller than the error for the exponentially smoothed forecast.
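The Holt-Winters recursion used in part b (without a seasonal component) can be sketched in Python as below. The function names `holt_smoothing` and `holt_forecast` are assumptions made for illustration; the import values are those listed in the table for 1974-2004, and the starting values E2 = Y2 and T2 = Y2 – Y1 follow the solution above.

```python
def holt_smoothing(series, w, v):
    """Return (E, T) for t = 2, ..., n; list index 0 corresponds to t = 2.

    E_2 = Y_2, T_2 = Y_2 - Y_1, and for t >= 3:
      E_t = w*Y_t + (1 - w)*(E_(t-1) + T_(t-1))
      T_t = v*(E_t - E_(t-1)) + (1 - v)*T_(t-1)
    """
    E = [float(series[1])]
    T = [float(series[1] - series[0])]
    for y in series[2:]:
        e = w * y + (1 - w) * (E[-1] + T[-1])
        trend = v * (e - E[-1]) + (1 - v) * T[-1]
        E.append(e)
        T.append(trend)
    return E, T


def holt_forecast(last_E, last_T, steps_ahead):
    """k-step-ahead forecast: F_(t+k) = E_t + k*T_t."""
    return last_E + steps_ahead * last_T


imports = [926, 1171, 1663, 2058, 1892, 1866, 1414, 1067, 633, 540, 553,
           479, 771, 876, 987, 1232, 1282, 1233, 1247, 1339, 1307, 1303,
           1258, 1378, 1522, 1543, 1664, 1770, 1490, 1671, 1833]

E, T = holt_smoothing(imports, w=0.3, v=0.8)
forecast_2006 = holt_forecast(E[-1], T[-1], steps_ahead=2)
print(round(E[-1], 2), round(T[-1], 2), round(forecast_2006, 2))
```

Because the code carries full precision from step to step, its final values may differ from the table in the last decimal place.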

13.30

a.

We first compute the Holt-Winters values for the years 2003-2005. With w = .3 and v = .5,

E2 = Y2 = 974.5
T2 = Y2 – Y1 = 974.5 – 848.2 = 126.3
E3 = wY3 + (1 – w)(E2 + T2) = .3(996.0) + (1 − .3)(974.5 + 126.3) = 1,069.36
T3 = v(E3 – E2) + (1 – v)T2 = .5(1,069.36 – 974.5) + (1 − .5)(126.3) = 110.58

The rest of the Et's and Tt's appear in the table that follows.

                               Et         Tt         Et         Tt
                             w = .3     w = .3     w = .7     w = .7
Year    Quarter    S&P 500   v = .5     v = .5     v = .5     v = .5
2003       1        848.2
           2        974.5     974.50     126.30     974.50     126.30
           3        996.0    1069.36     110.58    1027.44      89.62
           4       1111.9    1159.53     100.37    1113.45      87.81
2004       1       1126.2    1219.79      80.32    1148.72      61.54
           2       1140.8    1252.32      56.42    1161.64      37.23
           3       1114.6    1250.50      27.30    1139.88       7.74
           4       1211.9    1258.03      17.42    1192.62      30.24
2005       1       1180.6    1246.99       3.19    1193.28      15.45
           2       1191.3    1232.52      −5.64    1196.53       9.35
           3       1228.8    1227.45      −5.35    1221.92      17.37
           4       1248.3    1229.96      −1.42    1245.60      20.52
2006       1       1294.9
           2       1270.2
           3       1335.8



To forecast using the Holt-Winters Model with w = .3 and v = .5:

F2006,1 = Ft+1 = Et + Tt = 1,229.96 – 1.42 = 1,228.54
F2006,2 = Ft+2 = Et + 2Tt = 1,229.96 + 2(–1.42) = 1,227.12
F2006,3 = Ft+3 = Et + 3Tt = 1,229.96 + 3(–1.42) = 1,225.70

With w = .7 and v = .5,

E2 = Y2 = 974.5
T2 = Y2 – Y1 = 974.5 – 848.2 = 126.3
E3 = wY3 + (1 – w)(E2 + T2) = .7(996.0) + (1 − .7)(974.5 + 126.3) = 1,027.44
T3 = v(E3 – E2) + (1 – v)T2 = .5(1,027.44 – 974.5) + (1 − .5)(126.3) = 89.62

The rest of the Et's and Tt's appear in the table above.

To forecast using the Holt-Winters Model with w = .7 and v = .5:

F2006,1 = Ft+1 = Et + Tt = 1,245.60 + 20.52 = 1,266.12
F2006,2 = Ft+2 = Et + 2Tt = 1,245.60 + 2(20.52) = 1,286.64
F2006,3 = Ft+3 = Et + 3Tt = 1,245.60 + 3(20.52) = 1,307.16

13.32

a.

From Exercise 13.25a, the forecasts for 2003-2005 using w = .3 are:

F2003 = 199.48
F2004 = 199.48
F2005 = 199.48

The errors are the differences between the actual values and the predicted values. Thus, the errors are:

Y2003 − F2003 = 195 − 199.48 = −4.48
Y2004 − F2004 = 197 − 199.48 = −2.48
Y2005 − F2005 = 195 − 199.48 = −4.48

b.

From Exercise 13.25a, the forecasts for 2003-2005 using w = .7 are:

F2003 = 199.74
F2004 = 199.74
F2005 = 199.74

The errors are:

Y2003 − F2003 = 195 − 199.74 = −4.74
Y2004 − F2004 = 197 − 199.74 = −2.74
Y2005 − F2005 = 195 − 199.74 = −4.74




c.

For the exponentially smoothed forecasts with w = .3,

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|195 - 199.48| + |197 - 199.48| + |195 - 199.48|}{3} = \frac{11.44}{3} = 3.81
\]

\[
\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{195 - 199.48}{195}\right| + \left|\frac{197 - 199.48}{197}\right| + \left|\frac{195 - 199.48}{195}\right|}{3}\right] 100 = \left[\frac{.0585}{3}\right] 100 = 1.9512
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(195 - 199.48)^2 + (197 - 199.48)^2 + (195 - 199.48)^2}{3}} = \sqrt{\frac{46.2912}{3}} = 3.928
\]

d.

For the exponentially smoothed forecasts with w = .7,

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|195 - 199.74| + |197 - 199.74| + |195 - 199.74|}{3} = \frac{12.22}{3} = 4.07
\]

\[
\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{195 - 199.74}{195}\right| + \left|\frac{197 - 199.74}{197}\right| + \left|\frac{195 - 199.74}{195}\right|}{3}\right] 100 = \left[\frac{.0625}{3}\right] 100 = 2.0841
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(195 - 199.74)^2 + (197 - 199.74)^2 + (195 - 199.74)^2}{3}} = \sqrt{\frac{52.4428}{3}} = 4.181
\]
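The three accuracy measures can be computed with short Python functions. This is a sketch with illustrative function names (`mad`, `mape`, `rmse`), checked against the w = .3 forecasts of 199.48 and the actual values 195, 197, and 195 used above.

```python
import math


def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error of the forecasts."""
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual))


actual = [195, 197, 195]
forecast_w3 = [199.48, 199.48, 199.48]

mad_w3 = mad(actual, forecast_w3)
mape_w3 = mape(actual, forecast_w3)
rmse_w3 = rmse(actual, forecast_w3)
print(round(mad_w3, 2), round(mape_w3, 3), round(rmse_w3, 3))
```

Small differences from the hand calculation in the last decimal place are possible because the manual rounds intermediate sums.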



13.34

a.

From Exercise 13.29a, the forecasts for the 3 quarters of 2006 using w = .7 are:

F2006,1 = 1,238.8
F2006,2 = 1,238.8
F2006,3 = 1,238.8

For the exponentially smoothed forecasts with w = .7:

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|1294.9 - 1238.8| + |1270.2 - 1238.8| + |1335.8 - 1238.8|}{3} = \frac{184.5}{3} = 61.5
\]

\[
\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{1294.9 - 1238.8}{1294.9}\right| + \left|\frac{1270.2 - 1238.8}{1270.2}\right| + \left|\frac{1335.8 - 1238.8}{1335.8}\right|}{3}\right] 100 = \left[\frac{.1407}{3}\right] 100 = 4.689
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(1294.9 - 1238.8)^2 + (1270.2 - 1238.8)^2 + (1335.8 - 1238.8)^2}{3}} = \sqrt{\frac{13,542.17}{3}} = 67.187
\]

b.

From Exercise 13.29b, the forecasts for the 3 quarters of 2006 using w = .3 are:

F2006,1 = 1,198.9
F2006,2 = 1,198.9
F2006,3 = 1,198.9

For the exponentially smoothed forecasts with w = .3:

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|1294.9 - 1198.9| + |1270.2 - 1198.9| + |1335.8 - 1198.9|}{3} = \frac{304.2}{3} = 101.4
\]

\[
\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{1294.9 - 1198.9}{1294.9}\right| + \left|\frac{1270.2 - 1198.9}{1270.2}\right| + \left|\frac{1335.8 - 1198.9}{1335.8}\right|}{3}\right] 100 = \left[\frac{.2328}{3}\right] 100 = 7.759
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(1294.9 - 1198.9)^2 + (1270.2 - 1198.9)^2 + (1335.8 - 1198.9)^2}{3}} = \sqrt{\frac{33,041.3}{3}} = 104.946
\]

c.

For all three measures of error, the value for the exponentially smoothed series with w = .7 is smaller than the value for the exponentially smoothed series with w = .3. Thus, the more accurate forecasts come from the exponentially smoothed series with w = .7.

13.36

a.

From Exercise 13.31, the actual data and the forecasts using the exponential smoothing and the Holt-Winters forecasts are:

                               Exponential    Holt-Winters
                               Forecast       Forecast
Year    Month    Gold Price    w = .5         w = .5, v = .5
2005    Jan        424.2        433.47          454.09
        Feb        423.4        433.47          466.55
        Mar        434.2        433.47          479.01
        Apr        428.9        433.47          491.47
        May        421.9        433.47          503.93
        Jun        430.7        433.47          516.39
        Jul        424.5        433.47          528.85
        Aug        437.9        433.47          541.31
        Sep        456.0        433.47          553.77
        Oct        469.9        433.47          566.23
        Nov        476.7        433.47          578.69
        Dec        509.8        433.47          591.15

For the exponential smoothing forecasts with w = .5:

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|424.2 - 433.47| + |423.4 - 433.47| + \cdots + |509.8 - 433.47|}{12} = \frac{230.9}{12} = 19.242
\]

\[
\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{424.2 - 433.47}{424.2}\right| + \left|\frac{423.4 - 433.47}{423.4}\right| + \cdots + \left|\frac{509.8 - 433.47}{509.8}\right|}{12}\right] 100 = \left[\frac{.4904}{12}\right] 100 = 4.087
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(424.2 - 433.47)^2 + (423.4 - 433.47)^2 + \cdots + (509.8 - 433.47)^2}{12}} = \sqrt{\frac{9,980.2268}{12}} = 28.839
\]

For the Holt-Winters forecasts with w = .5 and v = .5:

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|424.2 - 454.09| + |423.4 - 466.55| + \cdots + |509.8 - 591.15|}{12} = \frac{933.34}{12} = 77.778
\]

\[
\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{424.2 - 454.09}{424.2}\right| + \left|\frac{423.4 - 466.55}{423.4}\right| + \cdots + \left|\frac{509.8 - 591.15}{509.8}\right|}{12}\right] 100 = \left[\frac{2.0897}{12}\right] 100 = 17.415
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(424.2 - 454.09)^2 + (423.4 - 466.55)^2 + \cdots + (509.8 - 591.15)^2}{12}} = \sqrt{\frac{80,190.7476}{12}} = 81.747
\]

For all three measures of forecast errors, the exponential smoothing forecasts had smaller errors. Thus, the exponential smoothing forecasts are better.




b.

From Exercise 13.31, the actual data and the forecasts using the exponential smoothing one-step-ahead and the Holt-Winters one-step-ahead forecasts are:

                               Exponential    Holt-Winters
                               Forecast       Forecast
Year    Month    Gold Price    w = .5         w = .5, v = .5
2005    Jan        424.2        433.47          454.09
        Feb        423.4        428.83          444.12
        Mar        434.2        426.12          433.57
        Apr        428.9        430.16          433.84
        May        421.9        429.53          430.10
        Jun        430.7        425.71          422.67
        Jul        424.5        428.21          425.37
        Aug        437.9        426.35          423.40
        Sep        456.0        432.13          432.74
        Oct        469.9        444.06          452.27
        Nov        476.7        456.98          473.40
        Dec        509.8        466.84          488.19

For the exponential smoothing one-step-ahead forecasts with w = .5:

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|424.2 - 433.47| + |423.4 - 428.83| + \cdots + |509.8 - 466.84|}{12} = \frac{164.32}{12} = 13.693
\]

\[
\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{424.2 - 433.47}{424.2}\right| + \left|\frac{423.4 - 428.83}{423.4}\right| + \cdots + \left|\frac{509.8 - 466.84}{509.8}\right|}{12}\right] 100 = \left[\frac{.3540}{12}\right] 100 = 2.950
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(424.2 - 433.47)^2 + (423.4 - 428.83)^2 + \cdots + (509.8 - 466.84)^2}{12}} = \sqrt{\frac{3,884.9754}{12}} = 17.993
\]



For the Holt-Winters one-step-ahead forecasts with w = .5 and v = .5:

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|424.2 - 454.09| + |423.4 - 444.12| + \cdots + |509.8 - 488.19|}{12} = \frac{153.58}{12} = 12.798
\]

\[
\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{424.2 - 454.09}{424.2}\right| + \left|\frac{423.4 - 444.12}{423.4}\right| + \cdots + \left|\frac{509.8 - 488.19}{509.8}\right|}{12}\right] 100 = \left[\frac{.3434}{12}\right] 100 = 2.862
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(424.2 - 454.09)^2 + (423.4 - 444.12)^2 + \cdots + (509.8 - 488.19)^2}{12}} = \sqrt{\frac{3,019.9854}{12}} = 15.864
\]

For all three measures of forecast errors, the Holt-Winters forecasts have smaller errors. Thus, the Holt-Winters forecasts are better.

13.38

a.

Using MINITAB, the output is:

Regression Analysis: Price versus t

The regression equation is
Price = 24.7 + 0.0910 t

Predictor      Coef       SE Coef      T        P
Constant       24.6975    0.7851       31.46    0.000
t              0.09103    0.08119       1.12    0.281

S = 1.497    R-Sq = 8.2%    R-Sq(adj) = 1.7%

Analysis of Variance

Source            DF    SS        MS       F       P
Regression         1     2.817    2.817    1.26    0.281
Residual Error    14    31.379    2.241
Total             15    34.197

Predicted Values for New Observations

New Obs    Fit       SE Fit    95.0% CI             95.0% PI
1          26.245    0.785     (24.561, 27.929)     (22.619, 29.871)
2          26.336    0.857     (24.497, 28.175)     (22.636, 30.036)

Values of Predictors for New Observations

New Obs    t
1          17.0
2          18.0

b.

The estimates of the parameters in the model, E(Yt) = β0 + β1t, are:

β̂0 = 24.6975. The price is estimated to be 24.6975 cents/pound for t = 0 or for 1991.

β̂1 = .09103. The price is estimated to increase by .091 cents/pound for each additional year.

c.

The forecast for 2007 is: Using t = 17, Ŷ2007 = 24.6975 + .09103(17) = 26.2450

The forecast for 2008 is: Using t = 18, Ŷ2008 = 24.6975 + .09103(18) = 26.3360

Yes, these agree with the predicted values on the printout.
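The two point forecasts can be checked by evaluating the fitted line directly. The sketch below uses the coefficients from the printout; the function name `trend_forecast` is an assumption for illustration.

```python
def trend_forecast(b0, b1, t):
    """Point forecast from the fitted straight-line trend: b0 + b1 * t."""
    return b0 + b1 * t


# Least squares estimates taken from the MINITAB printout.
b0, b1 = 24.6975, 0.09103

forecast_t17 = round(trend_forecast(b0, b1, 17), 3)
forecast_t18 = round(trend_forecast(b0, b1, 18), 3)
print(forecast_t17, forecast_t18)
```

These match the "Fit" column of the printout for the two new observations.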

d.

From the printout, the 95% forecast intervals are: 2007 (22.619, 29.871) 2008 (22.636, 30.036) We are 95% confident that the actual price in 2007 will be between 22.619 and 29.871. We are 95% confident that the actual price in 2008 will be between 22.636 and 30.036.

e.

No, we would not recommend that this model be used to forecast annual price. If we were to test if there is a significant linear relationship between time and annual price (H0: β1 = 0 vs Ha: β1 ≠ 0), the test statistic would be t = 1.12 and the p-value would be p = .281. Thus, we would conclude there is insufficient evidence to indicate a linear relationship exists between time and annual price. (Do not reject H0.)

13.40

The major advantage of regression forecasts over the exponentially smoothed forecasts is that prediction intervals can be formed using the regression forecasts and not using the exponentially smoothed forecasts.



13.42

a.

Using MINITAB, the results are:

Regression Analysis: Price versus Time

The regression equation is
Price = 4.76 + 0.309 Time

Predictor      Coef       SE Coef      T        P
Constant       4.7608     0.4184       11.38    0.000
Time           0.30857    0.04601       6.71    0.000

S = 0.769971    R-Sq = 77.6%    R-Sq(adj) = 75.8%

Analysis of Variance

Source            DF    SS        MS        F        P
Regression         1    26.661    26.661    44.97    0.000
Residual Error    13     7.707     0.593
Total             14    34.368

Unusual Observations

Obs    Time    Price     Fit      SE Fit    Residual    St Resid
15     15.0    10.740    9.389    0.379     1.351       2.01R

R denotes an observation with a large standardized residual.

Predicted Values for New Observations

New Obs    Fit       SE Fit    95% CI              95% PI
1           9.698    0.418     (8.794, 10.602)     (7.805, 11.591)
2          10.006    0.459     (9.014, 10.999)     (8.069, 11.943)

Values of Predictors for New Observations

New Obs    Time
1          16.0
2          17.0

From the printout:

β̂0 = 4.7608. The price of gas is estimated to be 4.7608 dollars per 1,000 cubic feet in 1989.

β̂1 = .30857. For each additional year, the price of gas is estimated to increase by .30857 dollars per 1,000 cubic feet.




b.

To determine the model fit, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = 6.71 (from the printout). The p-value is p = 0.000. Since the p-value is so small, H0 is rejected for any reasonable value of α. There is sufficient evidence that the model has an adequate fit.

c.

The 95% prediction interval for 2005 is (7.805, 11.591). We are 95% confident that the actual annual price of natural gas in 2005 is between 7.805 and 11.591 dollars per 1,000 cubic feet. The 95% prediction interval for 2006 is (8.069, 11.943). We are 95% confident that the actual annual price of natural gas in 2006 is between 8.069 and 11.943 dollars per 1,000 cubic feet.

d.

There are basically two problems with using simple linear regression for predicting time series data. First, we must predict values of the time series for values of time outside the observed range. We observe data for time periods 1, 2, … , t and use the regression model to predict values of the time series for t + 1, t + 2, … . The second problem is that simple linear regression does not allow for any cyclical effects such as seasonal trends.

13.44

a.

The regression model is: E(Yt) = β0 + β1t + β2Q1 + β3Q2 + β4Q3

b.

Using MINITAB, the output is:

Regression Analysis: Sales versus t, Q1, Q2, Q3

The regression equation is
Sales = 120 + 16.5 t + 262 Q1 + 223 Q2 + 106 Q3

Predictor      Coef      SE Coef     T        P
Constant       119.85    16.95        7.07    0.000
t              16.512     1.028      16.07    0.000
Q1             262.34    16.73       15.68    0.000
Q2             222.83    16.57       13.45    0.000
Q3             105.51    16.48        6.40    0.000

S = 26.00    R-Sq = 96.9%    R-Sq(adj) = 96.1%

Analysis of Variance

Source            DF    SS        MS       F         P
Regression         4    318560    79640    117.82    0.000
Residual Error    15     10139      676
Total             19    328700

Source    DF    Seq SS
t          1    114343
Q1         1     81883
Q2         1     94610
Q3         1     27724

Predicted Values for New Observations

New Obs    Fit       SE Fit    95.0% CI             95.0% PI
1          728.95    16.95     (692.82, 765.08)     (662.80, 795.10)
2          705.95    16.95     (669.82, 742.08)     (639.80, 772.10)
3          605.15    16.95     (569.02, 641.28)     (539.00, 671.30)
4          516.15    16.95     (480.02, 552.28)     (450.00, 582.30)

Values of Predictors for New Observations

New Obs    t       Q1          Q2          Q3
1          21.0    1.00        0.000000    0.000000
2          22.0    0.000000    1.00        0.000000
3          23.0    0.000000    0.000000    1.00
4          24.0    0.000000    0.000000    0.000000

The least squares equation is: Ŷt = 119.85 + 16.512t + 262.34Q1 + 222.83Q2 + 105.51Q3

β̂1 = 16.512. For every increase in time period (1 quarter), the mean sales index increases by an estimated 16.512.

β̂2 = 262.34. The difference in mean sales index between the first and fourth quarters is estimated to be 262.34.

β̂3 = 222.83. The difference in the mean sales index between the second and fourth quarters is estimated to be 222.83.

β̂4 = 105.51. The difference in the mean sales index between the third and fourth quarters is estimated to be 105.51.

To determine if the model is useful, we test: H0: β1 = β2 = β3 = β4 = 0 Ha: At least one βi ≠ 0, i = 1, 2, 3, 4 The test statistic is F = 117.82




Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with numerator df = k = 4 and denominator df = n − (k + 1) = 20 − (4 + 1) = 15. From Table IX, Appendix B, F.05 = 3.06. The rejection region is F > 3.06. Since the observed value of the test statistic falls in the rejection region (F = 117.82 > 3.06), H0 is rejected. There is sufficient evidence to indicate the model is useful at α = .05.

c.

The assumption of independent error terms is in doubt.

d.

The forecasts and the 95% prediction intervals are found at the bottom of the printout and are:

Year    Quarter    Forecast    95% Lower Limit    95% Upper Limit
2007    I          728.95          662.8              795.1
        II         705.95          639.8              772.1
        III        605.15          539.0              671.3
        IV         516.15          450.0              582.3

13.46

a.

d = 3.9 indicates the residuals are very strongly negatively autocorrelated.

b.

d = .2 indicates the residuals are very strongly positively autocorrelated.

c.

d = 1.99 indicates the residuals are probably uncorrelated.

13.48

a.

To determine if the overall model contributes information for the prediction of monthly passenger car and light truck sales, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least 1 βi ≠ 0

The test statistic is

\[
F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.856/5}{(1 - .856)/[144 - (5 + 1)]} = 164.067
\]

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 5 and ν2 = n – (k + 1) = 144 – (5 + 1) = 138. From Table IX, Appendix B, F.05 ≈ 2.29. The rejection region is F > 2.29. Since the observed value of the test statistic falls in the rejection region (F = 164.067 > 2.29), H0 is rejected. There is sufficient evidence to indicate the overall model contributes information for the prediction of monthly passenger car and light truck sales at α = .05.

b.

To determine if positive autocorrelation is present, we test: H0: No first-order autocorrelation Ha: Positive first-order autocorrelation of residuals



The test statistic is d = 1.01. For α = .05, the rejection region is d < dL,α = dL,.05 ≈ 1.57. The value dL,.05 is found in Table XIII, Appendix B, with k = 5, n = 144, and α = .05. Since the observed value of the test statistic falls in the rejection region (d = 1.01 < 1.57), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.
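The F statistic in part a depends only on R², n, and k, so it is easy to verify numerically, and the one-tailed Durbin-Watson decision in part b is a simple comparison with the tabled lower bound. The sketch below uses illustrative function names (`f_from_r2`, `reject_no_positive_autocorrelation`), and the critical value 1.57 is the approximate table value quoted above.

```python
def f_from_r2(r2, n, k):
    """Global F statistic computed from R-squared, sample size n, and k predictors."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


def reject_no_positive_autocorrelation(d, d_lower):
    """One-tailed Durbin-Watson rule used above: reject H0 when d < d_L."""
    return d < d_lower


f_stat = f_from_r2(0.856, n=144, k=5)
reject = reject_no_positive_autocorrelation(d=1.01, d_lower=1.57)
print(round(f_stat, 3), reject)
```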

c.

One of the requirements for the validity of the test in part a is that the error terms are independent. Since H0 was rejected in part b, there is evidence that positive autocorrelation exists. Since the error terms are not independent, the test in part a may not be valid.

13.50

a.

There is a tendency for the residuals to have long positive runs and negative runs. Residuals 1 through 6 are positive, while residuals 7 through 25 are negative. Residuals 26 through 35 are positive. This indicates the error terms are correlated.

b.

From the printout, the Durbin-Watson d is d = .0627. To determine if the time series residuals are autocorrelated, we test: H0: No first-order autocorrelation of residuals Ha: Positive or negative first-order autocorrelation of residuals The test statistic is d = .0627. For α = .10, the rejection region is d < dL,α/2 = dL,.05 = 1.40 or (4 − d) < dL,.05 = 1.40. The value dL,.05 is found in Table XIII, Appendix B, with k = 1, n = 35, and α = .10. Since the observed value of the test statistic falls in the rejection region (d = .0627 < 1.40), H0 is rejected. There is sufficient evidence to indicate the time series residuals are autocorrelated at α = .10.

c.

We must assume the residuals are normally distributed.
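For reference, the Durbin-Watson d statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below computes it for two short hypothetical residual series (invented for illustration; the actual residuals for this exercise are not listed in the solution). The function name `durbin_watson` is also an assumption.

```python
def durbin_watson(residuals):
    """d = sum_{t=2..n} (e_t - e_(t-1))^2 / sum_{t=1..n} e_t^2."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals with long runs of the same sign (positive autocorrelation).
long_runs = [2.0, 2.0, 1.0, -1.0, -2.0, -2.0]
# Hypothetical residuals that alternate in sign (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]

d_runs = durbin_watson(long_runs)
d_alternating = durbin_watson(alternating)
print(round(d_runs, 3), d_alternating)
```

Long runs of same-signed residuals, like those described in part a, push d toward 0, while alternating signs push it toward 4.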




13.52

a.

Using MINITAB, the plot of the residuals against t is:

[Figure: Scatterplot of RESI1 vs Time; horizontal axis Time, 0 to 16; vertical axis RESI1, -1.0 to 1.5, with a reference line at 0.]

There is not a random scattering of the residuals. The first 5 residuals are positive, the next 6 are negative, the next one is positive, the next one is negative and the last 2 are positive. This does not appear to be a random scattering. The plot suggests the possibility of autocorrelation.

b.

Using MINITAB, the output is:

Regression Analysis: Price versus Time

The regression equation is
Price = 4.76 + 0.309 Time

Predictor      Coef       SE Coef      T        P
Constant       4.7608     0.4184       11.38    0.000
Time           0.30857    0.04601       6.71    0.000

S = 0.769971    R-Sq = 77.6%    R-Sq(adj) = 75.8%

Analysis of Variance

Source            DF    SS        MS        F        P
Regression         1    26.661    26.661    44.97    0.000
Residual Error    13     7.707     0.593
Total             14    34.368

Unusual Observations

Obs    Time    Price     Fit      SE Fit    Residual    St Resid
15     15.0    10.740    9.389    0.379     1.351       2.01R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 1.39909


Chapter 13


To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals

The test statistic is d = 1.399. For α = .05, the rejection region is d < dL,α = dL,.05 = 1.08. The value dL,.05 is found in Table XIII, Appendix B, with k = 1, n = 15, and α = .05. Since the observed value of the test statistic does not fall in the rejection region (d = 1.399 > dU,.05 = 1.36), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.

c.

Since the error terms do not appear to be dependent, the test for model adequacy appears to be valid.

13.54

a.

Using MINITAB, the plot of the residuals against t is:

[Figure: Scatterplot of RESI1 vs t. Vertical axis: RESI1 (-30 to 30); horizontal axis: t (0 to 35).]

Since there appear to be groups of consecutive positive and groups of consecutive negative residuals, the data appear to be autocorrelated.




b.

Using MINITAB, the output is:

Regression Analysis: Policies versus t

The regression equation is
Policies = 385 - 0.363 t

Predictor   Coef      SE Coef   T       P
Constant    385.326   5.280     72.98   0.000
t           -0.3632   0.2632    -1.38   0.177

S = 15.0555   R-Sq = 5.6%   R-Sq(adj) = 2.7%

Analysis of Variance

Source           DF   SS       MS      F      P
Regression        1    431.6   431.6   1.90   0.177
Residual Error   32   7253.3   226.7
Total            33   7685.0

Unusual Observations

t     Policies   Fit      SE Fit   Residual   St Resid
1.0   355.00     384.96   5.05     -29.96     -2.11R


To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals

The test statistic is d = 0.42. For α = .05, the rejection region is d < dL,α = dL,.05 = 1.39. The value dL,.05 is found in Table XIII, Appendix B, with k = 1, n = 34, and α = .05. Since the observed value of the test statistic falls in the rejection region (d = .42 < 1.39), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.

c.

Since the error terms do not appear to be independent, the validity of the test for model adequacy is in question.


13.56

a.

The exponentially smoothed price for the first time period is equal to the price for that period. For the rest of the time periods, the exponentially smoothed prices are found by multiplying the price for that time period by w = .5 and adding to that (1 − .5) times the exponentially smoothed price for the preceding time period. The prices and the exponentially smoothed values appear in the table:

Year   Cold Finished   Exponentially Smoothed   Hot Rolled   Galvanized
       Price           Value (w = .5)           Price        Price
1995   25.70           25.70                    25.32        34.47
2000   23.08           24.39                    15.67        21.38
2001   22.76           23.58                    11.71        16.41
2002   23.26           23.42                    16.46        22.00
2003   25.15           24.28                    14.80        20.08
2004   38.67           31.48                    30.84        36.69

b.

The plots of the three price series and the exponentially smoothed series are:

[Figure: Cold Finished. Price versus Year (1995–2004); variables CF and CF-Exp-.5.]

[Figure: Hot Rolled. Price versus Year (1995–2004); variables HR and HR-Exp-.5.]

[Figure: Galvanized. Price versus Year (1995–2004); variables Gal and Gal-Exp-.5.]

c.

The exponential smoothing forecasts for 2005 are:

Cold Finished: F2005 = E2004 = 31.48
Hot Rolled: F2005 = E2004 = 23.19
Galvanized: F2005 = E2004 = 28.89

One of the main drawbacks of this kind of forecast is that no prediction interval can be attached to it, so there is no measure of the reliability of the forecast.
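The smoothing recursion and the 2005 forecasts can be checked with a short Python sketch. The function and variable names are illustrative, not part of the text; the prices and w = .5 are taken from the exercise.

```python
def exponential_smooth(series, w):
    """E_1 = Y_1; E_t = w * Y_t + (1 - w) * E_{t-1} for t > 1."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


prices = {
    "Cold Finished": [25.70, 23.08, 22.76, 23.26, 25.15, 38.67],
    "Hot Rolled": [25.32, 15.67, 11.71, 16.46, 14.80, 30.84],
    "Galvanized": [34.47, 21.38, 16.41, 22.00, 20.08, 36.69],
}

# The forecast for 2005 is the last smoothed value, E_2004.
forecasts = {
    name: round(exponential_smooth(series, 0.5)[-1], 2)
    for name, series in prices.items()
}
```

Under these inputs the last smoothed values round to 31.48, 23.19, and 28.89, matching the forecasts above.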



13.58

a.

To compute the Laspeyres index, multiply the price for each year by the quantity for each of the items for 1990, sum the products for the four items, divide by 14.05 (the sum for the base period 1990), and multiply by 100. The Laspeyres index is:

Year   Spaghetti   Ground Beef   Eggs   Potatoes   Total   Laspeyres
1990   0.85        1.63          1.00   0.32       14.05   100.00
1995   0.88        1.40          1.16   0.38       13.72    97.65
2000   0.88        1.63          0.96   0.35       14.37   102.28
2004   0.95        2.14          0.98   0.51       18.68   132.95

b.

From 1990 to 2004, the price of the "basket" of foods increased by 132.95 − 100 = 32.95%.

13.60

a.

We first calculate the exponentially smoothed values for 1980–2006.

E1 = Y1 = 56.50
E2 = .8Y2 + (1 − .8)E1 = .8(27.0) + .2(56.50) = 32.90
E3 = .8Y3 + (1 − .8)E2 = .8(38.75) + .2(32.90) = 37.58

The rest of the values appear in the table.

Year   Closing Price   Exponentially Smoothed Value (w = .8)
1980   56.50           56.50
1981   27.00           32.90
1982   38.75           37.58
1983   45.25           43.72
1984   41.75           42.14
1985   68.37           63.12
1986   45.62           49.12
1987   48.02           48.24
1988   48.01           48.06
1989   64.03           60.84
1990   45.00           48.17
1991   68.07           64.09
1992   30.03           36.84
1993   29.05           30.61
1994   32.05           31.76
1995   41.05           39.19
1996   50.75           48.44
1997   65.50           62.09
1998   49.00           51.62
1999   36.31           39.37
2000   48.44           46.63
2001   55.75           53.93
2002   40.00           42.79
2003   46.60           45.84
2004   46.65           46.49
2005   39.43           40.84
2006   43.80           43.21


The forecasts for 2007 and 2008 are:

F2007 = Ft+1 = Et = 43.21
F2008 = Ft+2 = Et = 43.21

The expected gain is F2008 − Y2006 = 43.21 − 43.80 = −.59. Since this number is negative, it is actually a loss.

b.

We first calculate the Holt-Winters values for 1980-2006. For w = .8 and v = .5,

E2 = Y2 = 27.00
T2 = Y2 − Y1 = 27.00 − 56.50 = −29.50
E3 = .8Y3 + (1 − .8)(E2 + T2) = .8(38.75) + .2(27 − 29.50) = 30.50
T3 = .5(E3 − E2) + (1 − .5)(T2) = .5(30.50 − 27.00) + .5(−29.50) = −13.00

The rest of the values appear in the table.

                       Holt-Winters (w = .8, v = .5)
Year   Closing Price   Et       Tt
1980   56.50
1981   27.00           27.00    −29.50
1982   38.75           30.50    −13.00
1983   45.25           39.70     −1.90
1984   41.75           40.96     −0.32
1985   68.37           62.82     10.77
1986   45.62           51.22     −0.42
1987   48.02           48.58     −1.53
1988   48.01           47.82     −1.14
1989   64.03           60.56      5.80
1990   45.00           49.27     −2.74
1991   68.07           63.76      5.87
1992   30.03           37.95     −9.97
1993   29.05           28.84     −9.54
1994   32.05           29.50     −4.44
1995   41.05           37.85      1.96
1996   50.75           48.56      6.33
1997   65.50           63.38     10.58
1998   49.00           53.99      0.59
1999   36.31           39.96     −6.72
2000   48.44           45.40     −0.64
2001   55.75           53.55      3.76
2002   40.00           43.46     −3.17
2003   46.60           45.34     −0.65
2004   46.65           46.26      0.14
2005   39.43           40.82     −2.65
2006   43.80           42.67     −0.40
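The Holt-Winters recursion used to fill in the table can be sketched as follows. This is a minimal illustration with hypothetical function names; it starts the level and trend at the second period, as in the hand calculation, and is checked here only against the first few rows of the table.

```python
def holt_winters(series, w, v):
    """Return (levels, trends) starting at period 2.

    E_2 = Y_2 and T_2 = Y_2 - Y_1; afterwards
    E_t = w * Y_t + (1 - w) * (E_{t-1} + T_{t-1})
    T_t = v * (E_t - E_{t-1}) + (1 - v) * T_{t-1}
    """
    levels = [series[1]]
    trends = [series[1] - series[0]]
    for y in series[2:]:
        level = w * y + (1 - w) * (levels[-1] + trends[-1])
        trend = v * (level - levels[-1]) + (1 - v) * trends[-1]
        levels.append(level)
        trends.append(trend)
    return levels, trends


def hw_forecast(levels, trends, steps_ahead):
    """k-step-ahead forecast: F_{t+k} = E_t + k * T_t."""
    return levels[-1] + steps_ahead * trends[-1]


# First five closing prices (1980-1984) from the exercise.
closing = [56.50, 27.00, 38.75, 45.25, 41.75]
levels, trends = holt_winters(closing, w=0.8, v=0.5)
```

For these five prices the sketch gives levels 27.00, 30.50, 39.70, 40.96 and trends −29.50, −13.00, −1.90, −0.32, in agreement with the rows for 1981 through 1984.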


The forecasts for 2007 and 2008 are:

F2007 = Ft+1 = Et + Tt = 42.67 + (−.40) = 42.27
F2008 = Ft+2 = Et + 2Tt = 42.67 + 2(−.40) = 41.87

The expected gain is F2008 − Y2006 = 41.87 − 43.80 = −1.93. Since this number is negative, it is actually a loss.

13.62

a.

To compute the simple index for the IRA series, divide each IRA value by the 1990 value, 140, and then multiply by 100. To compute the simple index for the 401(k) series, divide each 401(k) value by the 1990 value, 35, and then multiply by 100. The values for the indices are in the table:

Year   IRA    IRA Simple Index   401(k)   401(k) Simple Index
1990    140    100.00               35      100.00
1994    350    250.00              184      525.71
1995    476    340.00              266      760.00
1996    598    427.14              346      988.57
1997    767    547.86              466     1331.43
1998    960    685.71              616     1760.00
1999   1234    881.43              810     2314.29
2000   1232    880.00              815     2328.57
2001   1161    829.29              794     2268.57
2002   1034    738.57              706     2017.14
2003   1307    933.57              919     2625.71
2004   1490   1064.29             1086     3102.86

b.

The time series plot is:

[Figure: Time series plot of Index versus Year (1990–2004); variables IRAindex and 401(K)index.]

c.

Both the IRA and 401(k) funds have increased since 1990. However, the 401(k) fund has increased at a higher rate than the IRA fund.

13.64

a.

Using MINITAB, the results from fitting the model E(Yt) = β0 + β1t are:

Regression Analysis: GDP versus t

The regression equation is
GDP = 9595 + 79.5 t

Predictor   Coef      SE Coef   T        P
Constant    9594.96   45.28     211.89   0.000
t           79.537    3.780      21.04   0.000

S = 97.4825   R-Sq = 96.1%   R-Sq(adj) = 95.9%

Analysis of Variance

Source           DF   SS        MS        F        P
Regression        1   4206863   4206863   442.70   0.000
Residual Error   18    171051      9503
Total            19   4377914

Unusual Observations

t     GDP      Fit      SE Fit   Residual   St Resid
1.0   9876.0   9674.5   42.0     201.5      2.29R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 0.236602

Predicted Values for New Observations

t      Fit       SE Fit   95% CI                 95% PI
21.0   11265.2   45.3     (11170.1, 11360.4)     (11039.4, 11491.1)
22.0   11344.8   48.6     (11242.6, 11446.9)     (11115.9, 11573.6)
23.0   11424.3   52.0     (11315.0, 11533.6)     (11192.2, 11656.5)
24.0   11503.8   55.5     (11387.3, 11620.4)     (11268.2, 11739.5)X

X denotes a point that is an outlier in the predictors.

The fitted regression line is: Ŷt = 9,594.96 + 79.537t

From the printout, the 2006 quarterly GDP forecasts are:

Year   Quarter   Forecast   95% Lower Limit   95% Upper Limit
2006   Q1        11,265.2   11,039.4          11,491.1
       Q2        11,344.8   11,115.9          11,573.6
       Q3        11,424.3   11,192.2          11,656.5
       Q4        11,503.8   11,268.2          11,739.5

b.

The following model is fit: E(Yt) = β0 + β1t + β2Q1 + β3Q2 + β4Q3

where Q1 = 1 if quarter 1, 0 otherwise
      Q2 = 1 if quarter 2, 0 otherwise
      Q3 = 1 if quarter 3, 0 otherwise

The MINITAB printout is:

Regression Analysis: GDP versus t, Q1, Q2, Q3

The regression equation is
GDP = 9573 + 79.8 t + 29.4 Q1 + 21.1 Q2 + 25.8 Q3

Predictor   Coef      SE Coef   T        P
Constant    9572.60   69.10     138.53   0.000
t           79.850    4.190      19.06   0.000
Q1          29.35     68.20       0.43   0.673
Q2          21.10     67.56       0.31   0.759
Q3          25.85     67.17       0.38   0.706

S = 105.993   R-Sq = 96.2%   R-Sq(adj) = 95.1%

Analysis of Variance

Source           DF   SS        MS        F       P
Regression        4   4209395   1052349   93.67   0.000
Residual Error   15    168519     11235
Total            19   4377914

Source   DF   Seq SS
t         1   4206863
Q1        1       656
Q2        1       212
Q3        1      1664

Unusual Observations

t     GDP      Fit      SE Fit   Residual   St Resid
1.0   9876.0   9681.8   58.1     194.2      2.19R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 0.238059

Predicted Values for New Observations

t      Q1   Q2   Q3   Fit       SE Fit   95% CI                 95% PI
21.0   1    0    0    11278.8   69.1     (11131.5, 11426.1)     (11009.1, 11548.5)
22.0   0    1    0    11350.4   69.1     (11203.1, 11497.7)     (11080.7, 11620.1)
23.0   0    0    1    11435.0   69.1     (11287.7, 11582.3)     (11165.3, 11704.7)
24.0   0    0    0    11489.0   69.1     (11341.7, 11636.3)     (11219.3, 11758.7)

The fitted regression line is: Ŷt = 9,572.6 + 79.85t + 29.35Q1 + 21.10Q2 + 25.85Q3

To determine whether the data indicate a significant seasonal component, we test:

H0: β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0, i = 2, 3, 4

The test statistic is

\[
F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}
  = \frac{(171{,}051 - 168{,}519)/(4 - 1)}{168{,}519/[20 - (4 + 1)]}
  = \frac{844}{11{,}234.6} = 0.075
\]
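As a quick check of the arithmetic, the nested-model F statistic can be computed in Python. The function name is illustrative; the SSE values come from the two printouts, and the critical value 3.29 is the tabled F.05 with 3 and 15 degrees of freedom quoted in the solution.

```python
def nested_f(sse_reduced, sse_complete, k, g, n):
    """F = [(SSE_R - SSE_C) / (k - g)] / [SSE_C / (n - (k + 1))]."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


f_stat = nested_f(sse_reduced=171051, sse_complete=168519, k=4, g=1, n=20)

# Critical value taken from the solution (Table IX, alpha = .05, df = 3 and 15).
reject_h0 = f_stat > 3.29
```

The statistic comes out near 0.075, well below 3.29, so H0 is not rejected.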

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k – g = 4 – 1 = 3 and ν2 = n – (k + 1) = 20 – (4 + 1) = 15. From Table IX, Appendix B, F.05 = 3.29. The rejection region is F > 3.29. Since the observed value of the test statistic does not fall in the rejection region (F = .075 ≯ 3.29), H0 is not rejected. There is insufficient evidence to indicate a seasonal component at α = .05. This supports the assertion that the data have been seasonally adjusted.

c.

From the printout, the 2006 quarterly forecasts are:

Year   Quarter   Forecast   95% Lower Limit   95% Upper Limit
2006   Q1        11,278.8   11,009.1          11,548.5
       Q2        11,350.4   11,080.7          11,620.1
       Q3        11,435.0   11,165.3          11,704.7
       Q4        11,489.0   11,219.3          11,758.7

d.

To determine if the time series residuals are autocorrelated, we test:

H0: No first-order autocorrelation of residuals
Ha: Positive or negative first-order autocorrelation of residuals

The test statistic is d = 0.24. For α = .10, the rejection region is d < dL,α/2 = dL,.05 = .90 or (4 – d) < dL,.05 = .90. The value of dL,.05 is found in Table XIII, Appendix B, with k = 4 and n = 20. Since the observed value of the test statistic falls in the rejection region (d = 0.24 < .90), H0 is rejected. There is sufficient evidence to indicate the time series residuals are autocorrelated at α = .10.

13.66

a.

Using MINITAB, the results from fitting the model E(Yt) = β0 + β1t are:

Regression Analysis: Revolving versus t

The regression equation is
Revolving = - 84.5 + 33.8 t

Predictor   Coef     SE Coef   T       P
Constant    -84.54   23.41     -3.61   0.001
t           33.768   1.575     21.44   0.000

S = 56.7803   R-Sq = 95.2%   R-Sq(adj) = 95.0%

Analysis of Variance

Source           DF   SS        MS        F        P
Regression        1   1482334   1482334   459.78   0.000
Residual Error   23     74152      3224
Total            24   1556486

Unusual Observations

t     Revolving   Fit     SE Fit   Residual   St Resid
1.0   55.0        -50.8   22.0     105.8      2.02R

R denotes an observation with a large standardized residual.

Predicted Values for New Observations

t      Fit     SE Fit   95% CI           95% PI
27.0   827.2   24.8     (775.9, 878.5)   (699.0, 955.4)
28.0   861.0   26.2     (806.7, 915.2)   (731.6, 990.3)

The fitted regression line is: Ŷt = −84.54 + 33.768t

For the years 2006 and 2007, t = 27 and 28. From the printout, the predicted values and 95% prediction intervals for 2006 and 2007 are:

Year   Forecast   95% Lower Limit   95% Upper Limit
2006   827.2      699.0             955.4
2007   861.0      731.6             990.3

b.

To compute the Holt-Winters values for the years 1980-2004, with w = .7 and v = .7:

E2 = Y2 = 61
T2 = Y2 – Y1 = 61 – 55 = 6
E3 = wY3 + (1 – w)(E2 + T2) = .7(66) + (1 − .7)(61 + 6) = 66.3
T3 = v(E3 – E2) + (1 – v)T2 = .7(66.3 – 61) + (1 − .7)(6) = 5.51

The rest of the values appear in the table:

                   Holt-Winters (w = .7, v = .7)
Year   Revolving   Et       Tt
1980    55
1981    61          61.00    6.00
1982    66          66.30    5.51
1983    79          76.84    9.03
1984   100          95.76   15.95
1985   122         118.91   20.99
1986   136         137.17   19.08
1987   153         153.98   17.49
1988   174         173.24   18.73
1989   198         196.19   21.69
1990   239         232.66   32.04
1991   245         250.91   22.38
1992   257         261.89   14.40
1993   288         284.49   20.14
1994   338         327.99   36.49
1995   443         419.44   74.97
1996   499         497.62   77.22
1997   530         543.45   55.24
1998   579         584.91   45.59
1999   608         614.75   34.57
2000   678         669.40   48.62
2001   722         720.81   50.57
2002   738         748.01   34.22
2003   759         765.97   22.83
2004   794         792.44   25.38

Using the Holt-Winters series, the forecasts for 2006 and 2007 are:

F2006 = Ft+2 = Et + 2Tt = 792.44 + 2(25.38) = 843.20
F2007 = Ft+3 = Et + 3Tt = 792.44 + 3(25.38) = 868.58

These values are very similar to the forecasts found using regression.

13.68

a.

From Example 13.4, the exponentially smoothed value for September 2005 is 80.333. The forecasts for October through December 2005 are:

F2005,Oct = Ft+1 = Et = 80.333
F2005,Nov = Ft+2 = Ft+1 = 80.333
F2005,Dec = Ft+3 = Ft+1 = 80.333

The forecast errors are the differences between the actual values and the forecasted values. The forecast errors are:

Year        Yt+i    Ft+i     Difference
2005, Oct   81.88   80.333   1.55
2005, Nov   88.90   80.333   8.57
2005, Dec   82.20   80.333   1.87

b.

Using MINITAB, the results of fitting the model are:

Regression Analysis: IBM versus Time

The regression equation is
IBM = 95.8 - 0.740 Time

Predictor   Coef      SE Coef   T       P
Constant    95.777    2.622     36.53   0.000
Time        -0.7401   0.2088    -3.54   0.002

S = 5.79351   R-Sq = 39.8%   R-Sq(adj) = 36.6%

Analysis of Variance

Source           DF   SS        MS       F       P
Regression        1    421.71   421.71   12.56   0.002
Residual Error   19    637.73    33.56
Total            20   1059.44

Unusual Observations

Time   IBM     Fit     SE Fit   Residual   St Resid
12.0   98.58   86.90   1.28     11.68      2.07R

Predicted Values for New Observations

Time   Fit     SE Fit   95% CI           95% PI
22.0   79.50   2.62     (74.01, 84.98)   (66.19, 92.81)
23.0   78.76   2.81     (72.88, 84.63)   (65.28, 92.23)
24.0   78.02   2.99     (71.75, 84.28)   (64.37, 91.67)

The least squares fitted model is: Ŷt = 95.777 − .7401t

β̂0 = 95.777: The estimated stock price for IBM in December 2003 (t = 0) is 95.777.

β̂1 = −.7401: The estimated decrease in the value of the stock for IBM for each additional month is .7401.

c.

The approximate precision is ±2s or ±2(5.79) or ±11.58 .

d.

The forecasts and prediction intervals are found at the bottom of the printout in part b.

Year        Forecast   95% Lower Limit   95% Upper Limit
2005, Oct   79.50      66.19             92.81
2005, Nov   78.76      65.28             92.23
2005, Dec   78.02      64.37             91.67

The precision for October is approximately (92.81 − 66.19)/2 = 13.31.

The precision for November is approximately (92.23 − 65.28)/2 = 13.48.

The precision for December is approximately (91.67 − 64.37)/2 = 13.65.

All of these are close to the 11.58 from part c.

e.

The MAD, MAPE, and RMSE for the smoothed series are:

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m}
= \frac{|81.88 - 80.33| + |88.90 - 80.33| + |82.20 - 80.33|}{3}
= \frac{11.98}{3} = 3.994
\]

\[
\text{MAPE} = \left[ \frac{\sum_{t=1}^{m} \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100
= \left[ \frac{\frac{|81.88 - 80.33|}{81.88} + \frac{|88.90 - 80.33|}{88.90} + \frac{|82.20 - 80.33|}{82.20}}{3} \right] 100
= \left[ \frac{.1380}{3} \right] 100 = 4.599
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}}
= \sqrt{\frac{(81.88 - 80.33)^2 + (88.90 - 80.33)^2 + (82.20 - 80.33)^2}{3}}
= \sqrt{\frac{79.2724}{3}} = 5.140
\]

The MAD, MAPE, and RMSE for the regression model are:

\[
\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m}
= \frac{|81.88 - 79.50| + |88.90 - 78.76| + |82.20 - 78.02|}{3}
= \frac{16.70}{3} = 5.567
\]

\[
\text{MAPE} = \left[ \frac{\sum_{t=1}^{m} \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100
= \left[ \frac{\frac{|81.88 - 79.50|}{81.88} + \frac{|88.90 - 78.76|}{88.90} + \frac{|82.20 - 78.02|}{82.20}}{3} \right] 100
= \left[ \frac{.1940}{3} \right] 100 = 6.466
\]

\[
\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}}
= \sqrt{\frac{(81.88 - 79.50)^2 + (88.90 - 78.76)^2 + (82.20 - 78.02)^2}{3}}
= \sqrt{\frac{125.9564}{3}} = 6.480
\]
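The three accuracy measures are easy to recompute, for example with the Python sketch below. The function names are illustrative; the actual values and both sets of forecasts are taken from parts a and d, with the smoothed forecast carried at 80.333.

```python
import math


def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error (in percent)."""
    total = sum(abs((y - f) / y) for y, f in zip(actual, forecast))
    return 100 * total / len(actual)


def rmse(actual, forecast):
    """Root mean squared error of the forecasts."""
    total = sum((y - f) ** 2 for y, f in zip(actual, forecast))
    return math.sqrt(total / len(actual))


actual = [81.88, 88.90, 82.20]          # Oct, Nov, Dec 2005
smoothed = [80.333, 80.333, 80.333]     # exponential smoothing forecasts
regression = [79.50, 78.76, 78.02]      # regression forecasts

smoothed_scores = (mad(actual, smoothed), mape(actual, smoothed), rmse(actual, smoothed))
regression_scores = (mad(actual, regression), mape(actual, regression), rmse(actual, regression))
```

Each smoothed score is smaller than the corresponding regression score, which is the comparison the solution draws.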

The values of MAD, MAPE, and RMSE for the exponentially smoothed model are all smaller than their corresponding values for the regression model.

f.

We have to assume that the error terms are independent.

g.

To determine if positive autocorrelation is present, we test: H0: No first-order autocorrelation of residuals Ha: Positive first-order autocorrelation of residuals The test statistic is d = 0.69. The rejection region is d < dL,α = dL,.05 = 1.22. The value of dL,.05 is found in Table XIII, Appendix B, with k = 1 and n = 21 . Since the observed value of the test statistic falls in the rejection region (d = .69 < 1.22), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05. Since there is evidence of positive autocorrelation, the validity of the regression model is questioned.




The Gasket Manufacturing Case (To accompany Chapters 12–13)

For this study, I constructed an R chart and an x̄-chart for both the original data (5.1) and for the new data (5.2). First, we will analyze data set 5.1 (collected while the operator adjusted the process at his/her discretion). We must compute the mean and range for each sample. The range = R = largest measurement − smallest measurement. The results are listed in the table:

Sample   Measurements               x̄        Range
 1       0.0440  0.0446  0.0437    0.0441   0.0009
 2       0.0438  0.0425  0.0443    0.0435   0.0018
 3       0.0453  0.0428  0.0433    0.0438   0.0025
 4       0.0451  0.0441  0.0434    0.0442   0.0017
 5       0.0459  0.0466  0.0476    0.0467   0.0017
 6       0.0449  0.0471  0.0451    0.0457   0.0022
 7       0.0472  0.0477  0.0452    0.0467   0.0025
 8       0.0457  0.0459  0.0472    0.0463   0.0015
 9       0.0464  0.0457  0.0447    0.0456   0.0017
10       0.0451  0.0447  0.0457    0.0452   0.0010
11       0.0456  0.0455  0.0445    0.0452   0.0011
12       0.0448  0.0423  0.0442    0.0438   0.0025
13       0.0459  0.0468  0.0452    0.0460   0.0016
14       0.0456  0.0471  0.0450    0.0459   0.0021
15       0.0472  0.0465  0.0461    0.0466   0.0011
16       0.0462  0.0463  0.0471    0.0465   0.0009
17       0.0427  0.0437  0.0445    0.0436   0.0018
18       0.0431  0.0448  0.0429    0.0436   0.0019
19       0.0425  0.0442  0.0432    0.0433   0.0017
20       0.0429  0.0447  0.0450    0.0442   0.0021
21       0.0443  0.0441  0.0450    0.0445   0.0009
22       0.0443  0.0423  0.0447    0.0438   0.0024
23       0.0429  0.0427  0.0464    0.0440   0.0037
24       0.0448  0.0451  0.0428    0.0442   0.0023

x̿ = (x̄1 + x̄2 + ... + x̄24)/24 = 1.0770/24 = .0449

R̄ = (R1 + R2 + ... + R24)/24 = .0436/24 = .0018

We now construct an R chart. From Table XVII, Appendix B, with n = 3, D3 = .000 and D4 = 2.574.

R̄ = .0018

Upper control limit = R̄D4 = .0018(2.574) = .0046

Since D3 = 0, the lower control limit is negative and is not included on the chart.

From Table XVII, Appendix B, with n = 3, d2 = 1.693 and d3 = .888.

Upper A–B boundary = R̄ + 2d3(R̄/d2) = .0018 + 2(.888)(.0018/1.693) = .0037

Lower A–B boundary = R̄ − 2d3(R̄/d2) = .0018 − 2(.888)(.0018/1.693) = −.0001 = 0

Upper B–C boundary = R̄ + d3(R̄/d2) = .0018 + (.888)(.0018/1.693) = .0027

Lower B–C boundary = R̄ − d3(R̄/d2) = .0018 − (.888)(.0018/1.693) = .0009

The R-chart is:

To determine if the process is in control, we check the four rules.

Rule 1: One point beyond Zone A: There are no points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control. No rule is violated.

Next, we construct the x̄-chart.


Centerline = x̿ = .0449

From Table XVII, Appendix B, with n = 3, A2 = 1.023.

Upper control limit = x̿ + A2R̄ = .0449 + 1.023(.0018) = .0467
Lower control limit = x̿ − A2R̄ = .0449 − 1.023(.0018) = .0431

Upper A–B boundary = x̿ + (2/3)(A2R̄) = .0449 + (2/3)(1.023(.0018)) = .0461

Lower A–B boundary = x̿ − (2/3)(A2R̄) = .0449 − (2/3)(1.023(.0018)) = .0437

Upper B–C boundary = x̿ + (1/3)(A2R̄) = .0449 + (1/3)(1.023(.0018)) = .0455

Lower B–C boundary = x̿ − (1/3)(A2R̄) = .0449 − (1/3)(1.023(.0018)) = .0443
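The control limits and zone boundaries for both charts follow the same pattern, so they can be generated by a small helper. The Python sketch below uses hypothetical function names; the constants A2, D4, d2, and d3 for n = 3 are the table values quoted in the solution.

```python
# Control chart constants for samples of size n = 3 (Table XVII values quoted above).
A2, D4, D2_CONST, D3_CONST = 1.023, 2.574, 1.693, 0.888


def r_chart_limits(r_bar):
    """Upper control limit and zone boundaries for the R chart."""
    sigma_r = D3_CONST * r_bar / D2_CONST  # estimated standard deviation of R
    return {
        "ucl": r_bar * D4,
        "upper_ab": r_bar + 2 * sigma_r,
        "lower_ab": max(0.0, r_bar - 2 * sigma_r),  # negative boundary is set to 0
        "upper_bc": r_bar + sigma_r,
        "lower_bc": r_bar - sigma_r,
    }


def xbar_chart_limits(center, r_bar):
    """Control limits and zone boundaries for the x-bar chart."""
    three_sigma = A2 * r_bar
    return {
        "ucl": center + three_sigma,
        "lcl": center - three_sigma,
        "upper_ab": center + (2 / 3) * three_sigma,
        "lower_ab": center - (2 / 3) * three_sigma,
        "upper_bc": center + (1 / 3) * three_sigma,
        "lower_bc": center - (1 / 3) * three_sigma,
    }


r_limits = r_chart_limits(0.0018)
xbar_limits = xbar_chart_limits(0.0449, 0.0018)
```

Rounded to four decimals, the results reproduce the limits computed by hand for data set 5.1.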

The x̄-chart is:

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are six groups of three consecutive points in which at least two fall in Zone A or beyond: points 5–7, points 6–8, points 7–9, points 14–16, points 17–19, and points 18–20.
Rule 6: Four out of five points in a row in Zone B or beyond: There are six groups of points that satisfy this rule: points 5–9, points 6–10, points 17–21, points 18–22, points 19–23, and points 20–24.



The process appears to be out of control. Rules 5 and 6 indicate that the process is out of control. Since the process is out of control, a capability analysis is not appropriate. However, I will include a dot diagram which indicates that many of the actual observations are outside of the specification limits. The dot plot is:

[Figure: Dot plot of the 72 observations; horizontal axis from 0.0430 to 0.0480.]

The specification limits are .043 to .047. There are 11 points below .043 and 8 above .047. Thus, 19 out of the 72 points or .264 of the points are outside of the specification limits. This indicates that the present system, when the operator is allowed to adjust the system at his/her discretion, is not capable of meeting the needs of the customers.

Next, we analyze the second set of data, 5.2. First, we must compute the mean and range for each sample. The range = R = largest measurement − smallest measurement. The results are listed in the table:

Sample   Measurements               x̄        Range
 1       0.0445  0.0455  0.0457    0.0452   0.0012
 2       0.0435  0.0453  0.0450    0.0446   0.0018
 3       0.0438  0.0459  0.0428    0.0442   0.0031
 4       0.0449  0.0449  0.0467    0.0455   0.0018
 5       0.0433  0.0461  0.0451    0.0448   0.0028
 6       0.0455  0.0454  0.0461    0.0457   0.0007
 7       0.0455  0.0458  0.0445    0.0453   0.0013
 8       0.0445  0.0451  0.0436    0.0444   0.0015
 9       0.0443  0.0450  0.0441    0.0445   0.0009
10       0.0449  0.0448  0.0467    0.0455   0.0019
11       0.0465  0.0449  0.0448    0.0454   0.0017
12       0.0461  0.0439  0.0452    0.0451   0.0022
13       0.0443  0.0434  0.0454    0.0444   0.0020
14       0.0456  0.0459  0.0452    0.0456   0.0007
15       0.0447  0.0442  0.0457    0.0449   0.0015
16       0.0454  0.0445  0.0451    0.0450   0.0009
17       0.0445  0.0471  0.0465    0.0460   0.0026
18       0.0438  0.0445  0.0472    0.0452   0.0034
19       0.0453  0.0444  0.0451    0.0449   0.0009
20       0.0455  0.0435  0.0443    0.0444   0.0020
21       0.0440  0.0438  0.0444    0.0441   0.0006
22       0.0444  0.0450  0.0467    0.0454   0.0023
23       0.0445  0.0447  0.0461    0.0451   0.0016
24       0.0450  0.0463  0.0456    0.0456   0.0013

x̿ = (x̄1 + x̄2 + ... + x̄24)/24 = 1.0808/24 = .0450

R̄ = (R1 + R2 + ... + R24)/24 = .0407/24 = .0017

First, we construct an R chart. From Table XVII, Appendix B, with n = 3, D3 = .000 and D4 = 2.574.

R̄ = .0017

Upper control limit = R̄D4 = .0017(2.574) = .0044

Since D3 = 0, the lower control limit is negative and is not included on the chart.

From Table XVII, Appendix B, with n = 3, d2 = 1.693 and d3 = .888.

Upper A–B boundary = R̄ + 2d3(R̄/d2) = .0017 + 2(.888)(.0017/1.693) = .0035

Lower A–B boundary = R̄ − 2d3(R̄/d2) = .0017 − 2(.888)(.0017/1.693) = −.0001 = 0

Upper B–C boundary = R̄ + d3(R̄/d2) = .0017 + (.888)(.0017/1.693) = .0026

Lower B–C boundary = R̄ − d3(R̄/d2) = .0017 − (.888)(.0017/1.693) = .0008

The R-chart is:

To determine if the process is in control, we check the four rules.

Rule 1: One point beyond Zone A: There are no points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control. No rule is violated.

Next, we construct the x̄-chart.

Centerline = x̿ = .0450

From Table XVII, Appendix B, with n = 3, A2 = 1.023.

Upper control limit = x̿ + A2R̄ = .0450 + 1.023(.0017) = .0467
Lower control limit = x̿ − A2R̄ = .0450 − 1.023(.0017) = .0433

Upper A–B boundary = x̿ + (2/3)(A2R̄) = .0450 + (2/3)(1.023(.0017)) = .0462

Lower A–B boundary = x̿ − (2/3)(A2R̄) = .0450 − (2/3)(1.023(.0017)) = .0438

Upper B–C boundary = x̿ + (1/3)(A2R̄) = .0450 + (1/3)(1.023(.0017)) = .0456

Lower B–C boundary = x̿ − (1/3)(A2R̄) = .0450 − (1/3)(1.023(.0017)) = .0444

The x̄-chart is:

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: This pattern does not exist.
Rule 6: Four out of five points in a row in Zone B or beyond: This pattern does not exist.

The process appears to be in control. No rules are violated. Since the process is in control, we will perform a capability analysis to see if the process can meet the customer's demand. I will include a dot diagram which shows where the actual observations fall relative to the specification limits. The dot plot is:

[Figure: Dot plot of the 72 observations; horizontal axis from 0.04320 to 0.04720.]

The specification limits are .043 to .047. There is one point below .043 and two points above .047. Thus, 3 out of the 72 points or .042 of the points are outside of the specification limits. This indicates that the present system, when the operator does not adjust the system at his/her discretion, might be able to meet the needs of the customers.

We will also compute the capability index. The capability index is defined as the ratio of the specification spread to 6 standard deviations, or:

Cp = (upper specification limit − lower specification limit)/(6σ)

Since σ is not known, we will estimate it with s. In this case, s = .00095. The capability index is:

Cp = (.047 − .043)/(6(.00095)) = .702
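A minimal Python check of this calculation, with a hypothetical function name and the specification limits and s taken from the text:

```python
def capability_index(upper_spec, lower_spec, sigma):
    """Cp = (USL - LSL) / (6 * sigma), with sigma estimated by s if unknown."""
    return (upper_spec - lower_spec) / (6 * sigma)


cp = capability_index(upper_spec=0.047, lower_spec=0.043, sigma=0.00095)
process_capable = cp >= 1  # Cp below 1 means the spread exceeds the spec width
```

Here Cp is about .702, so `process_capable` is False.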

Since the capability index is less than 1, it indicates that the process is not capable of meeting the customer's needs. Even though this process (operator does not make adjustments) is in control, it is not capable of meeting the needs of the customers. In conclusion, it appears that the engineers are correct—the present equipment is not capable of producing gasket material within the necessary limits.

Chapter 14

14.2

a.

Since the normal distribution is symmetric, the probability that a randomly selected observation exceeds the mean of a normal distribution is .5.

b.

By the definition of "median," the probability that a randomly selected observation exceeds the median of a normal distribution is .5.

c.

If the distribution is not normal, the probability that a randomly selected observation exceeds the mean depends on the distribution. With the information given, the probability cannot be determined.

d.

By definition of "median," the probability that a randomly selected observation exceeds the median of a non-normal distribution is .5.

14.4

a.

H0: η = 9 Ha: η > 9 The test statistic is S = {Number of observations greater than 9} = 7. The p-value = P(x ≥ 7) where x is a binomial random variable with n = 10 and p = .5. From Table II, p-value = P(x ≥ 7) = 1 − P(x ≤ 6) = 1 − .828 = .172 Since the p-value = .172 > α = .05, H0 is not rejected. There is insufficient evidence to indicate the median is greater than 9 at α = .05.

b.

H0 : η = 9 Ha: η ≠ 9 S1 = {Number of observations less than 9} = 3 and S2 = {Number of observations greater than 9} = 7 The test statistic is S = larger of S1 and S2 = 7. The p-value = 2P(x ≥ 7) where x is a binomial random variable with n = 10 and p = .5. From Table II, p-value = 2P(x ≥ 7) = 2(1 − P(x ≤ 6)) = 2(1 - .828) = .344 Since the p-value = .344 > α = .05, H0 is not rejected. There is insufficient evidence to indicate the median is different than 9 at α = .05.


529


c.

H0: η = 20 Ha: η < 20 The test statistic is S = {Number of observations less than 20} = 9. The p-value = P(x ≥ 9) where x is a binomial random variable with n = 10 and p = .5. From Table II, p-value = P(x ≥ 9) = 1 − P(x ≤ 8) = 1 − .989 = .011 Since the p-value = .011 < α= .05, H0 is rejected. There is sufficient evidence to indicate the median is less than 20 at α = .05.

d.

H0: η = 20 Ha: η ≠ 20 S1 = {Number of observations less than 20} = 9 and S2 = {Number of observations greater than 20} = 1 The test statistic is S = larger of S1 and S2 = 9. The p-value = 2P(x ≥ 9) where x is a binomial random variable with n = 10 and p = .5. From Table II, p-value = 2P(x ≥ 9) = 2(1 − P(x ≤ 8)) = 2(1 − .989) = .022 Since the p-value = .022 < α = .05, H0 is rejected. There is sufficient evidence to indicate the median is different than 20 at α = .05.
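The p-values in parts a through d are upper-tail binomial probabilities with p = .5, so they can be computed directly instead of being read from Table II. The sketch below uses only the Python standard library; the function names are illustrative.

```python
from math import comb


def binomial_upper_tail(s, n, p=0.5):
    """P(x >= s) for a binomial random variable with parameters n and p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(s, n + 1))


def sign_test_p_value(s, n, two_tailed=False):
    """p-value of the sign test with test statistic S = s and n observations."""
    tail = binomial_upper_tail(s, n)
    return min(1.0, 2 * tail) if two_tailed else tail


p_a = sign_test_p_value(7, 10)                    # part a
p_b = sign_test_p_value(7, 10, two_tailed=True)   # part b
p_c = sign_test_p_value(9, 10)                    # part c
p_d = sign_test_p_value(9, 10, two_tailed=True)   # part d
```

The exact values agree with the table-based answers (.172, .344, .011, .022) to within table rounding.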

e.

For all parts, μ = np = 10(.5) = 5 and σ = √(npq) = √(10(.5)(.5)) = 1.581.

For part a, P(x ≥ 7) ≈ P(z ≥ ((7 − .5) − 5)/1.581) = P(z ≥ .95) = .5 − .3289 = .1711

This is close to the probability .172 in part a. The conclusion is the same.

For part b, 2P(x ≥ 7) ≈ 2P(z ≥ ((7 − .5) − 5)/1.581) = 2P(z ≥ .95) = 2(.5 − .3289) = .3422

This is close to the probability .344 in part b. The conclusion is the same.

For part c, P(x ≥ 9) ≈ P(z ≥ ((9 − .5) − 5)/1.581) = P(z ≥ 2.21) = .5 − .4864 = .0136

This is close to the probability .011 in part c. The conclusion is the same.

For part d, 2P(x ≥ 9) ≈ 2P(z ≥ ((9 − .5) − 5)/1.581) = 2P(z ≥ 2.21) = 2(.5 − .4864) = .0272

This is close to the probability .022 in part d. The conclusion is the same.
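The normal approximation with the continuity correction can be sketched in Python as follows. The function names are illustrative; the upper-tail normal probability is computed from the standard library's complementary error function rather than from the table, so the results differ slightly from the values obtained with z rounded to two decimals.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal random variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def sign_test_normal_p(s, n):
    """Approximate P(x >= s) for binomial(n, .5) using a continuity correction."""
    mu = n * 0.5
    sigma = math.sqrt(n * 0.5 * 0.5)
    z = ((s - 0.5) - mu) / sigma
    return normal_upper_tail(z)


approx_a = sign_test_normal_p(7, 10)   # compare with .1711 from the table
approx_c = sign_test_normal_p(9, 10)   # compare with .0136 from the table
```

Doubling either value gives the two-tailed approximations used in parts b and d.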

f.

We must assume only that the sample is selected randomly from a continuous probability distribution.

14.6

a.

To determine if the median amount of caffeine in Breakfast Blend coffee exceeds 300 milligrams, we test:

H0: η = 300
Ha: η > 300

b.

S = 4

c.

Using Table II, Appendix B, with n = 6 and p = .5,

P(x ≥ 4) = 1 − P(x ≤ 3) = 1 − .656 = .344

d.

Since the probability in part c is greater than α = .05, H0 is not rejected. There is insufficient evidence to indicate the median amount of caffeine in Breakfast Blend coffee exceeds 300 milligrams at α = .05.

14.8

a.

To determine if cohesiveness will deteriorate after storage, we test:

H0: η = 0
Ha: η > 0

b.

The test statistic is S = {number of measurements greater than 0} = 13. The p-value = P(x ≥ 13) where x is a binomial random variable with n = 20 and p = .5. From Table II, p-value = P(x ≥ 13) = 1 – P(x ≤ 12) = 1 − .868 = .132

c.

Since the p-value = .132 > α = .05, H0 is not rejected. There is insufficient evidence to indicate cohesiveness will deteriorate after storage at α = .05.

14.10

a.

I would recommend the sign test because five of the sample measurements are of similar magnitude, but the 6th is about three times as large as the others. It would be very unlikely to observe this sample if the population were normal.

b.

To determine if the airline is meeting the requirement, we test: H0: η = 30 Ha: η < 30




c.

The test statistic is S = number of measurements less than 30 = 5. H0 will be rejected if the p-value < α = .01.

d.

The test statistic is S = 5. The p-value = P(x ≥ 5) where x is a binomial random variable with n = 6 and p = .5. From Table II, p-value = P(x ≥ 5) = 1 − P(x ≤ 4) = 1 − .891 = .109 Since the p-value = .109 is not less than α = .01, H0 is not rejected. There is insufficient evidence to indicate the airline is meeting the maintenance requirement at α = .01.

14.12

To determine if the median surface roughness of coated interior pipe differs from 2 micrometers, we test:

H0: η = 2
Ha: η ≠ 2

S1 = {Number of measurements < 2} = 9. S2 = {Number of measurements > 2} = 11. The test statistic is S = Larger of S1 and S2 = 11.

The p-value = 2P(x ≥ 11) where x is a binomial random variable with n = 20 and p = .5. From Table II, Appendix B, p-value = 2P(x ≥ 11) = 2(1 − P(x ≤ 10)) = 2(1 − .588) = .824

Since the p-value = .824 is larger than any commonly used α, H0 is not rejected. There is insufficient evidence to indicate the median surface roughness of coated interior pipe differs from 2 micrometers.

14.14

To determine if the distribution of A is shifted to the left of distribution B, we test: H0: The two sampled populations have identical distributions Ha: The probability distribution for population A is shifted to the left of population B.

The test statistic is

$$z = \frac{T_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{173 - \dfrac{15(15 + 15 + 1)}{2}}{\sqrt{\dfrac{15(15)(15 + 15 + 1)}{12}}} = -2.47$$

The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, z.05 = 1.645. The rejection region is z < −1.645.

Since the observed value of the test statistic falls in the rejection region (z = −2.47 < −1.645), H0 is rejected. There is sufficient evidence to indicate the distribution of A is shifted to the left of distribution B.
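As a cross-check on the arithmetic, the large-sample Wilcoxon rank sum statistic can be computed in a few lines. This is a sketch under the same formula used above; the function name `rank_sum_z` is hypothetical.

```python
from math import sqrt

def rank_sum_z(t1, n1, n2):
    """Large-sample z statistic for the Wilcoxon rank sum test."""
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (t1 - mean) / sd

# Exercise 14.14: T1 = 173 with n1 = n2 = 15
z = rank_sum_z(173, 15, 15)
print(round(z, 2))
```

The same function applies to any exercise in this section that supplies T1, n1, and n2 with both sample sizes large enough for the normal approximation.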



14.16

Sample from                   Sample from
Population 1     Rank         Population 2     Rank
     15          13                 5            2.5
     10           8.5              12           10.5
     12          10.5               9            6.5
     16          14                 9            6.5
     13          12                 8            4.5
      8           4.5               4            1
                                    5            2.5
                                   10            8.5
              T1 = 62.5                      T2 = 42.5

a.

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for population 1 is shifted to the left or to the right of that for population 2

The test statistic is T1 = 62.5 since sample 1 has the smaller number of measurements. The null hypothesis will be rejected if T1 ≤ TL or T1 ≥ TU where TL and TU correspond to α = .05 (two-tailed), n1 = 6 and n2 = 8. From Table XV, Appendix B, TL = 29 and TU = 61. Reject H0 if T1 ≤ 29 or T1 ≥ 61.

Since T1 = 62.5 ≥ 61, we reject H0 and conclude there is sufficient evidence to indicate population 1 is shifted to the left or right of population 2 at α = .05.

b.

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for population 1 is shifted to the right of that for population 2

The test statistic remains T1 = 62.5. The null hypothesis will be rejected if T1 ≥ TU where TU corresponds to α = .05 (one-tailed), n1 = 6 and n2 = 8. From Table XV, Appendix B, TU = 58. Reject H0 if T1 ≥ 58.

Since T1 = 62.5 ≥ 58, we reject H0 and conclude there is sufficient evidence to indicate population 1 is shifted to the right of population 2 at α = .05.
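The rank sums for Exercise 14.16 can be reproduced by pooling the two samples and assigning tied observations the average of the ranks they occupy. The sketch below is illustrative; `midranks` is a hypothetical helper written for clarity rather than speed.

```python
def midranks(values):
    """Rank values from smallest to largest, averaging ranks over ties."""
    s = sorted(values)
    # first position of v (1-based) plus half the number of extra tied copies
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]

sample1 = [15, 10, 12, 16, 13, 8]
sample2 = [5, 12, 9, 9, 8, 4, 5, 10]

ranks = midranks(sample1 + sample2)
t1 = sum(ranks[:len(sample1)])   # rank sum for population 1
t2 = sum(ranks[len(sample1):])   # rank sum for population 2

print(t1, t2)
```

As a sanity check, t1 + t2 should equal n(n + 1)/2 = 14(15)/2 = 105 for the pooled sample of 14 observations.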




14.18

a.

Some preliminary calculations:

Private Sector    Rank        Public Sector    Rank
     2.58          10              5.40         15
     5.05          13              2.55          9
     0.05           1              9.00         16
     2.10           5             10.55         17
     4.30          12              1.02          2
     2.25           6              5.11         14
     2.50           8             12.42         18
     1.94           4              1.67          3
     2.33           7              3.33         11
                T1 = 66                     T2 = 105

b.

To determine if the distribution for public sector organizations is located to the right of the distribution for private sector firms, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution of the public sector is located to the right of that for the private sector

The test statistic is T2 = 105. The null hypothesis will be rejected if T2 ≥ TU where TU corresponds to α = .05 (one-tailed), and n1 = n2 = 9. From Table XV, Appendix B, TU = 105. Reject H0 if T2 ≥ 105.

Since T2 = 105 ≥ 105, H0 is rejected. There is sufficient evidence to indicate that the distribution in the public sector organization is located to the right of the distribution for the private sector firms at α = .05.

c.

The null hypothesis will be rejected if T2 ≥ TU where TU corresponds to α = .05 (one-tailed), and n1 = n2 = 9. From Table XV, Appendix B, TU = 105. Since T2 = 105, we would reject H0. Thus, the p-value is less than or equal to α = .05.

d.

The assumptions necessary for the test are:

1. The two samples are random and independent.
2. The two probability distributions from which the samples were drawn are continuous.


14.20

a.

American Purchasing Managers        Mexican Purchasing Managers
Sample 1       Rank                 Sample 2       Rank
   50          20.5                    10            4.5
   10           4.5                    90           29
   35          15.5                    65           24
   30          13.5                    50           20.5
   20          10.5                    20           10.5
   15           7.5                    15            7.5
    8           3                      60           23
   40          17.5                    80           26.5
   80          26.5                    85           28
   75          25                      35           15.5
   19           9                       5            1.5
   11           6                      55           22
    5           1.5                    40           17.5
   25          12                      45           19
   30          13.5                    95           30
            T1 = 186                             T2 = 279

b.

To determine whether American and Mexican purchasing managers perceive the given ethical situation differently, we test: H0: The two sampled populations have identical probability distributions Ha: The probability distribution of the American managers is shifted to the right or left of the probability distribution of the Mexican managers.


The test statistic is

$$z = \frac{T_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{186 - \dfrac{15(15 + 15 + 1)}{2}}{\sqrt{\dfrac{15(15)(15 + 15 + 1)}{12}}} = -1.929$$

The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.

Since the observed value of the test statistic does not fall in the rejection region (z = −1.929 ≮ −1.96), H0 is not rejected. There is insufficient evidence to indicate that American and Mexican purchasing managers perceive the given ethical situation differently at α = .05.

In order to use the t-test, we need to assume that the two populations being sampled from are normal and that the variances of the two populations are equal. To check these assumptions, we will use stem-and-leaf plots and dot plots.




The stem-and-leaf plots are:

Stem-and-leaf of Ethics    Managers = 1    N = 15
Leaf Unit = 1.0

   2    0  58
   6    1  0159
  (2)   2  05
   7    3  005
   4    4  0
   3    5  0
   2    6
   2    7  5
   1    8  0

Stem-and-leaf of Ethics    Managers = 2    N = 15
Leaf Unit = 1.0

   1    0  5
   3    1  05
   4    2  0
   5    3  5
   7    4  05
  (2)   5  05
   6    6  05
   4    7
   4    8  05
   2    9  05

Neither of these two stem-and-leaf plots looks mound-shaped. The assumption that the populations are normal may not be valid. The dot plots are:

[Figure: dot plots of Ethics for Managers 1 and Managers 2, drawn on a common scale from 0 to 100]

The spread of the two data sets looks approximately equal. The assumption that the variances of the two populations are the same appears to be valid.



14.22

a.

Using MINITAB, histograms of the two data sets are:

[Figure: Histogram of HEATRATE, with Frequency on the vertical axis and HEATRATE (9000 to 16000) on the horizontal axis; panel variable ENGINE, with panels Aeroderiv and Traditional]

From the histograms, the data for each group do not look like they are mound-shaped. The variance of the aeroderivative engines is greater than that of the traditional engines. Thus, the assumptions of normal distributions and equal variances necessary for the t-test are probably not met.

b.

The p-value = .3431. Since this p-value is not small, H0 is not rejected. There is no evidence to indicate that the heat rate distribution of the traditional turbine engines is shifted to the right or left of that for the aeroderivative turbine engines.

14.24

a.

We first rank all the data:

Firms with Successful MIS (1)           Firms with Unsuccessful MIS (2)
Score   Rank    Score   Rank            Score   Rank    Score   Rank
 52      5       90     25.5             60     10.5     65     12.5
 70     15       75     17               50      4       55      7
 40      1.5     80     19               55      7       70     15
 80     19       95     29.5             70     15       90     25.5
 82     21       90     25.5             41      3       85     22
 65     12.5     86     23               40      1.5     80     19
 59      9       95     29.5             55      7       90     25.5
 60     10.5     93     28
             T1 = 290.5                              T2 = 174.5


To determine whether the distribution of quality scores for the successfully implemented systems differs from that for the unsuccessfully implemented systems, we test: H0: The two sampled distributions are identical Ha: The probability distribution for the successful MIS is shifted to the right or left of that for the unsuccessful MIS


The test statistic is

$$z = \frac{T_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{290.5 - \dfrac{16(16 + 14 + 1)}{2}}{\sqrt{\dfrac{16(14)(16 + 14 + 1)}{12}}} = 1.767$$

The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.

Since the observed value of the test statistic does not fall in the rejection region (z = 1.767 ≯ 1.96), H0 is not rejected. There is insufficient evidence to indicate the distribution of quality scores for the successfully implemented systems differs from that for the unsuccessfully implemented systems at α = .05.

b.

We could use the two-sample t-test if:

1. Both populations are normal.
2. The variances of the two populations are the same.

14.26

a.

The test statistic is T− or T+, the smaller of the two. The rejection region is T ≤ 152, from Table XVI, Appendix B, with n = 30, α = .10, and two-tailed.

b.

The test statistic is T−. The rejection region is T− ≤ 60, from Table XVI, Appendix B, with n = 20, α = .05, and one-tailed.

c.

The test statistic is T+. The rejection region is T+ ≤ 0, from Table XVI, Appendix B, with n = 8, α = .005, and one-tailed.

14.28

a.

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.

b.

The large sample test statistic is

$$z = \frac{T_+ - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n(n + 1)(2n + 1)}{24}}} = \frac{273 - \dfrac{25(26)}{4}}{\sqrt{\dfrac{25(26)(51)}{24}}} = 2.97$$

Since the observed value of the test statistic falls in the rejection region (z = 2.97 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the responses for A tend to be larger than those for B at α = .05.
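The large-sample signed rank statistic and its upper-tail area can be checked numerically. This is a sketch only; `signed_rank_z` is a hypothetical name, and the tail area is computed from the unrounded z, so it may differ from a table look-up in the last digit.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_z(t_plus, n):
    """Large-sample z statistic for the Wilcoxon signed rank test."""
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t_plus - mean) / sd

# Exercise 14.28: T+ = 273 with n = 25 nonzero differences
z = signed_rank_z(273, 25)
p_value = 1 - NormalDist().cdf(z)   # upper-tailed test

print(round(z, 2), round(p_value, 4))
```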



c.

p-value = P(z ≥ 2.97) = .5 − P(0 < z < 2.97) = .5 − .4985 = .0015 (from Table IV, Appendix B) Thus, we can reject H0 for any preselected α greater than .0015.

14.30

a.

To determine if the chest injury ratings of drivers and front-seat passengers differ, we test: H0: The two sampled populations have identical probability distributions Ha: The probability distribution of drivers is shifted to the right or left of that for front-seat passengers

b.

Using MINITAB, the results are:

Wilcoxon Signed Rank Test: Diff

Test of median = 0.000000 versus median not = 0.000000

               N for    Wilcoxon              Estimated
         N      Test   Statistic        P        Median
Diff    18        16        23.0    0.021        -4.000

From the printout, the test statistic is T+ = 23.

c.

The rejection region is T+ ≤ T0 where T0 corresponds to α = .01 (two-tailed) and n = 16. From Table XVI, Appendix B, T0 = 19. The rejection region is T+ ≤ 19.

d.

Since the observed value of the test statistic does not fall in the rejection region (T+ = 23 ≰ 19), H0 is not rejected. There is insufficient evidence to indicate the chest injury ratings of drivers and front-seat passengers differ at α = .01. From the printout, the p-value is p = .021.

14.32

Some preliminary calculations:

                  High School    Geography    Difference    Rank of Absolute
Theme             Teachers       Alumni       T − A         Differences
Tourism               10             2            8             11
Physical               2             1            1              1.5
Transportation         7             3            4              8
People                 1             6           −5              9
History                2             5           −3              6
Climate                6             4            2              3.5
Forestry               5             8           −3              6
Agriculture            7            10           −3              6
Fishing                9             7            2              3.5
Energy                 2             8           −6             10
Mining                10            11           −1              1.5
Manufacturing         12            12            0          (eliminated)

Positive rank sum T+ = 27.5

To determine if the distributions of theme rankings for the two groups differ, we test:

H0: The probability distributions for the two populations are identical
Ha: The probability distribution of the high school teachers is shifted to the right or left of the probability distribution of the geography alumni

The test statistic is T+ = 27.5. Reject H0 if T+ ≤ T0 where T0 is based on α = .05 and n = 11 (two-tailed): Reject H0 if T+ ≤ 11 (from Table XVI, Appendix B).

Since the observed value of the test statistic does not fall in the rejection region (T+ = 27.5 ≰ 11), H0 is not rejected. There is insufficient evidence to indicate that the distributions of these rankings for the two groups differ at α = .05. Practically, this means that the thematic content of a new atlas could be based on the views of either educators or geography alumni.

14.34


            Before      After       Difference    Rank of Absolute
Employee    Flextime    Flextime    (B − A)       Difference
    1          54          68         −14             7
    2          25          42         −17             9
    3          80          80           0         (eliminated)
    4          76          91         −15             8
    5          63          70          −7             5
    6          82          88          −6             3.5
    7          94          90           4             2
    8          72          81          −9             6
    9          33          39          −6             3.5
   10          90          93          −3             1

T+ = 2

To determine if the pilot flextime program is a success, we test: H0: The two probability distributions are identical Ha: The probability distribution before is shifted to the left of that after The test statistic is T+ = 2. The rejection region is T+ ≤ 8, from Table XVI, Appendix B, with n = 9 and α = .05. Since the observed value of the test statistic falls in the rejection region (T+ = 2 ≤ 8), H0 is rejected. There is sufficient evidence to indicate the pilot flextime program has been a success at α = .05.
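The signed rank sums for the flextime data can be reproduced directly from the before and after scores. The sketch below is illustrative; `signed_rank_sums` is a hypothetical helper that drops zero differences and averages the ranks of tied absolute differences, as the table above does.

```python
def signed_rank_sums(before, after):
    """Return (T+, T-) for paired data, eliminating zero differences."""
    diffs = [b - a for b, a in zip(before, after) if b != a]
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank(x):
        # first 1-based position of x plus half the number of extra ties
        return abs_sorted.index(x) + 1 + (abs_sorted.count(x) - 1) / 2

    t_plus = sum(rank(abs(d)) for d in diffs if d > 0)
    t_minus = sum(rank(abs(d)) for d in diffs if d < 0)
    return t_plus, t_minus

before = [54, 25, 80, 76, 63, 82, 94, 72, 33, 90]
after = [68, 42, 80, 91, 70, 88, 90, 81, 39, 93]

t_plus, t_minus = signed_rank_sums(before, after)
print(t_plus, t_minus)
```

The two sums add to n(n + 1)/2 = 9(10)/2 = 45 for the nine nonzero differences, which is a quick check on the ranking.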



14.36


                     Difference         Rank of Absolute
Science    Math      Science − Math     Difference
   0         2            −2                5
   4         3             1                2
   3         0             3                7.5
   1         1             0            (eliminated)
   3         1             2                5
   2         3            −1                2
   4         0             4                9
   2         1             1                2
   3         1             2                5
   4         1             3                7.5

Negative rank sum T− = 7
Positive rank sum T+ = 38

To determine if there are differences in the levels of family involvement between math and science homework, we test:

H0: The distributions of the science and math levels of family involvement are the same
Ha: The distributions of the science and math levels of family involvement differ

The test statistic is T− = 7. The rejection region is T− ≤ T0 where T0 corresponds to α = .05 (two-tailed) and n = 9. From Table XVI, Appendix B, T0 = 6. The rejection region is T− ≤ 6.

Since the observed value of the test statistic does not fall in the rejection region (T− = 7 ≰ 6), H0 is not rejected. There is insufficient evidence to indicate there are differences in the levels of family involvement between math and science homework at α = .05.

14.38

a.

The hypotheses are: H0: The three probability distributions are identical Ha: At least two of the three probability distributions differ in location

b.

The test statistic is:

$$H = \frac{12}{n(n + 1)} \sum \frac{R_j^2}{n_j} - 3(n + 1) = \frac{12}{45(46)} \left[ \frac{230^2}{15} + \frac{440^2}{15} + \frac{365^2}{15} \right] - 3(46) = 146.754 - 138 = 8.754$$

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is H > 5.99147.

Since the observed value of the test statistic falls in the rejection region (H = 8.754 > 5.99147), H0 is rejected. There is sufficient evidence to indicate that the probability distributions of at least two of the populations A, B, and C, differ in location at α = .05.

c.

The approximate p-value is P(χ² ≥ 8.754). From Table VII, Appendix B, with df = 2, .01 < P(χ² ≥ 8.754) < .025.

d.

$$\bar{R}_A = \frac{R_A}{15} = \frac{230}{15} = 15.333 \qquad \bar{R}_B = \frac{R_B}{15} = \frac{440}{15} = 29.333 \qquad \bar{R}_C = \frac{R_C}{15} = \frac{365}{15} = 24.333$$

$$\bar{R} = \frac{n + 1}{2} = \frac{45 + 1}{2} = 23$$

$$H = \frac{12}{n(n + 1)} \sum n_j (\bar{R}_j - \bar{R})^2 = \frac{12}{45(46)} \left[ 15(15.333 - 23)^2 + 15(29.333 - 23)^2 + 15(24.333 - 23)^2 \right] = 8.754$$

14.40

a.

In order to compare the three population means using parametric techniques, we must assume that all populations being sampled from are normal and all population variances are the same. It is quite possible that these two conditions are not met with this data.

b.

Since we want to compare 3 groups, we will use the Kruskal-Wallis test.

c.

The test statistic is

$$H = \frac{12}{n(n + 1)} \sum \frac{R_j^2}{n_j} - 3(n + 1) = \frac{12}{161(161 + 1)} \left( \frac{5335^2}{67} + \frac{3937^2}{57} + \frac{3769^2}{37} \right) - 3(161 + 1) = 11.201$$

d.

Since the p-value is so small (p = .0037), H0 will be rejected. There is sufficient evidence to indicate DEF distributions differ for the 3 tax litigation forums for α > .0037.

14.42

a.

To determine if the distributions of office rental growth rates differ among the four market cycle phases, we test: H0: The four probability distributions are identical Ha: At least two of the growth rate distributions differ



b.

The ranks of the measurements are:

Phase I    Rank      Phase II    Rank      Phase III    Rank      Phase IV    Rank
   2.7      9          10.5       20           6.1       14         −1.0        4.5
  −1.0      4.5        11.5       23           1.2        7          6.2       15.5
   1.1      6           9.4       19          11.4       22        −10.8        1
   3.4     10          12.2       24           4.4       13          2.0        8
   4.2     12           8.6       18           6.2       15.5       −1.1        3
   3.5     11          10.9       21           7.6       17         −2.3        2
        R1 = 52.5             R2 = 125               R3 = 88.5              R4 = 34

c.

The rank sums appear in the table above. The test statistic is:

$$H = \frac{12}{n(n + 1)} \sum \frac{R_j^2}{n_j} - 3(n + 1) = \frac{12}{24(24 + 1)} \left( \frac{52.5^2}{6} + \frac{125^2}{6} + \frac{88.5^2}{6} + \frac{34^2}{6} \right) - 3(24 + 1) = 16.23$$

d.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p – 1 = 4 – 1 = 3. From Table VII, Appendix B, χ².05 = 7.81473. The rejection region is H > 7.81473.

e.

Since the observed value of the test statistic falls in the rejection region (H = 16.23 > 7.81473), H0 is rejected. There is sufficient evidence to indicate the distributions of office rental growth rates differ among the four market cycle phases at α = .05.

14.44

Some preliminary calculations are:

Aromatics    Ranks      Chloroalkanes    Ranks      Esters    Ranks
   1.06       26             1.58         32          0.29      7
   0.79       19             1.45         31          0.06      1.5
   0.82       20             0.57         15          0.44     11
   0.89       22             1.16         30          0.61     17
   1.05       25             1.12         27.5        0.55     14
   0.95       24             0.91         23          0.43      9.5
   0.65       18             0.83         21          0.51     12
   1.15       29             0.43          9.5        0.10      4
   1.12       27.5                                    0.34      8
                                                      0.53     13
                                                      0.06      1.5
                                                      0.09      3
                                                      0.17      5.5
                                                      0.60     16
                                                      0.17      5.5
          R1 = 210.5                  R2 = 189               R3 = 128.5



To determine if the sorption rate distributions differ among the three solvents, we test: H0: The three probability distributions are identical Ha: At least two of the three probability distributions differ in location

The test statistic is

$$H = \frac{12}{n(n + 1)} \sum \frac{R_j^2}{n_j} - 3(n + 1) = \frac{12}{32(32 + 1)} \left( \frac{210.5^2}{9} + \frac{189^2}{8} + \frac{128.5^2}{15} \right) - 3(32 + 1) = 20.197$$

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = p – 1 = 3 – 1 = 2. From Table VII, Appendix B, χ².01 = 9.21034. The rejection region is H > 9.21034.

Since the observed value of the test statistic falls in the rejection region (H = 20.197 > 9.21034), H0 is rejected. There is sufficient evidence to indicate the sorption rate distributions differ among the three solvents at α = .01.

14.46

a.

The F-test would be appropriate if:

1. All p populations sampled from are normal.
2. The variances of the p populations are equal.
3. The p samples are independent.

b.

The variances for the three populations are probably not the same and the populations are probably not normal.

c.

To determine whether the salary distributions differ among the three cities, we test:

H0: The three probability distributions are identical
Ha: At least two of the three probability distributions differ in location

Some preliminary calculations are:

    1                        2                         3
 Atlanta     Rank       Los Angeles    Rank       Washington, D.C.    Rank
  34,600       1           42,400        4             38,000            2
  84,900      19          135,000       21             76,900           16
  61,700      11           63,000       12             48,000            6
  38,900       3           43,700        5             72,600           14
  77,200      17           69,400       13             73,200           15
  83,600      18           97,000       20             51,800            8
  59,800      10           49,500        7             55,000            9
           R1 = 79                   R2 = 82                         R3 = 70

The test statistic is

$$H = \frac{12}{n(n + 1)} \sum \frac{R_j^2}{n_j} - 3(n + 1) = \frac{12}{21(22)} \left( \frac{79^2}{7} + \frac{82^2}{7} + \frac{70^2}{7} \right) - 3(22) = 66.2894 - 66 = .2894$$

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is H > 5.99147.

Since the observed value of the test statistic does not fall in the rejection region (H = .2894 ≯ 5.99147), H0 is not rejected. There is insufficient evidence to indicate the salary distributions differ among the three cities at α = .05.

We must assume we have independent random samples, sample sizes greater than or equal to 5 from each population, and that all populations are continuous.

14.48

a.

The hypotheses are: H0: The probability distributions for three treatments are identical Ha: At least two of the probability distributions differ in location

b.

The rejection region requires α = .10 in the upper tail of the χ² distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, χ².10 = 4.60517. The rejection region is Fr > 4.60517.

c.

Some preliminary calculations are:

Block     A     Rank      B     Rank      C     Rank
  1       9      1       11      2       18      3
  2      13      2       13      2       13      2
  3      11      1       12      2.5     12      2.5
  4      10      1       15      2       16      3
  5       9      2        8      1       10      3
  6      14      2       12      1       16      3
  7      10      1       12      2       15      3
             RA = 10         RB = 12.5       RC = 19.5

The test statistic is

$$F_r = \frac{12}{bp(p + 1)} \sum R_j^2 - 3b(p + 1) = \frac{12}{7(3)(4)} \left[ 10^2 + 12.5^2 + 19.5^2 \right] - 3(7)(4) = 90.9286 - 84 = 6.9286$$

Since the observed value of the test statistic falls in the rejection region (Fr = 6.9286 > 4.60517), H0 is rejected. There is sufficient evidence to indicate the effectiveness of the three different treatments differs at α = .10.
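The Friedman calculation for Exercise 14.48 can be sketched in code by ranking within each block and then applying the formula for Fr. The helper names `midranks` and `friedman_statistic` are hypothetical, and tied values within a block receive the average of the tied ranks, as in the table above.

```python
def midranks(values):
    """Rank values from smallest to largest, averaging ranks over ties."""
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]

def friedman_statistic(blocks):
    """Return (rank sums per treatment, Fr) for a randomized block design."""
    b = len(blocks)        # number of blocks
    p = len(blocks[0])     # number of treatments
    rank_sums = [0.0] * p
    for row in blocks:
        for j, r in enumerate(midranks(row)):
            rank_sums[j] += r
    fr = 12 / (b * p * (p + 1)) * sum(r * r for r in rank_sums) - 3 * b * (p + 1)
    return rank_sums, fr

# Each tuple is one block: (treatment A, treatment B, treatment C)
blocks = [(9, 11, 18), (13, 13, 13), (11, 12, 12), (10, 15, 16),
          (9, 8, 10), (14, 12, 16), (10, 12, 15)]

rank_sums, fr = friedman_statistic(blocks)
print(rank_sums, round(fr, 4))
```

This sketch applies the same uncorrected formula as the solution; software packages may report a slightly different value when they adjust for ties.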




14.50

a.

The Friedman test statistic is

$$F_r = \frac{12}{bp(p + 1)} \sum R_j^2 - 3b(p + 1) = \frac{12}{6(5)(5 + 1)} (27^2 + 25^2 + 18^2 + 11^2 + 9^2) - 3(6)(5 + 1) = 17.333$$

b.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p – 1 = 5 – 1 = 4. From Table VII, Appendix B, χ².05 = 9.48773. The rejection region is Fr > 9.48773.

c.

Since the observed value of the test statistic falls in the rejection region (Fr = 17.333 > 9.48773), H0 is rejected. There is sufficient evidence to indicate there is a difference in the levels of farm production among the five conditions at α = .05.

14.52

a.

To determine if the distributions of rotary oil rigs differ among the three states, we test: H0: The probability distributions of the rotary oil rigs for the 3 states are the same Ha: At least two of the probability distributions of rotary oil rigs differ in location

b.

The ranked data are:

Month/Year     California     Utah       Alaska
Nov. 2000           3           2           1
Oct. 2001           3           2           1
Nov. 2001           3           2           1
                R1 = 9       R2 = 6      R3 = 3

c.

The test statistic is

$$F_r = \frac{12}{bp(p + 1)} \sum R_j^2 - 3b(p + 1) = \frac{12}{3(3)(3 + 1)} \left( 9^2 + 6^2 + 3^2 \right) - 3(3)(3 + 1) = 6$$

d.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p – 1 = 3 – 1 = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is Fr > 5.99147.

e.

Since the observed value of the test statistic falls in the rejection region (Fr = 6 > 5.99147), H0 is rejected. There is sufficient evidence to indicate the distributions of rotary oil rigs differ among the three states at α = .05.



14.54


Location     Temephos  Rank   Malathion  Rank   Fenitrothion  Rank   Fenthion  Rank   Chlorpyrifos  Rank
Anguilla        4.6     5        1.2      1         1.5        2.5      1.8     4         1.5        2.5
Antigua         9.2     5        2.9      3         2.0        1.5      7.0     4         2.0        1.5
Dominica        7.8     5        1.4      1         2.4        2        4.2     4         4.1        3
Guyana          1.7     2        1.9      4         2.2        5        1.5     1         1.8        3
Jamaica         3.4     3        3.7      4         2.0        2        1.5     1         7.1        5
St. Lucia       6.7     4        2.7      1.5       2.7        1.5      4.8     3         8.7        5
Suriname        1.4     1        1.9      3         2.0        4        2.1     5         1.7        2
                    R1 = 25          R2 = 17.5           R3 = 18.5          R4 = 22           R5 = 22

To determine if the resistance ratio distributions of the 5 insecticides differ, we test:

H0: The distributions of the 5 insecticide ratios are the same
Ha: At least two of the distributions of insecticide ratios differ

The test statistic is

$$F_r = \frac{12}{bp(p + 1)} \sum R_j^2 - 3b(p + 1) = \frac{12}{7(5)(5 + 1)} (25^2 + 17.5^2 + 18.5^2 + 22^2 + 22^2) - 3(7)(5 + 1) = 2.086$$


Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p – 1 = 5 – 1 = 4. From Table VII, Appendix B, χ².05 = 9.48773. The rejection region is Fr > 9.48773.

Since the observed value of the test statistic does not fall in the rejection region (Fr = 2.086 ≯ 9.48773), H0 is not rejected. There is insufficient evidence to indicate that the resistance ratio distributions of the 5 insecticides differ at α = .05.

14.56


Week     Monday     Tuesday     Wednesday     Thursday     Friday
  1        5           1            4             2           3
  2        5           4            3             1           2
  3        2.5         2.5          5             1           4
  4        2           1            3.5           5           3.5
  5        5           1            2             3           4
  6        4           2            3             1           5
  7        5           3.5          1.5           3.5         1.5
  8        4           2            1             3           5
  9        1           2            5             3           4
       R1 = 33.5    R2 = 19      R3 = 28      R4 = 22.5    R5 = 32

To determine if the distributions of days of the weeks differ, we test: H0: The probability distributions of the 5 days of the week are the same Ha: At least two of the probability distributions of the 5 days of the week differ in location




The test statistic is

$$F_r = \frac{12}{bp(p + 1)} \sum R_j^2 - 3b(p + 1) = \frac{12}{9(5)(5 + 1)} \left( 33.5^2 + 19^2 + 28^2 + 22.5^2 + 32^2 \right) - 3(9)(5 + 1) = 6.778$$

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p – 1 = 5 – 1 = 4. From Table VII, Appendix B, χ².05 = 9.48773. The rejection region is Fr > 9.48773.

Since the observed value of the test statistic does not fall in the rejection region (Fr = 6.778 ≯ 9.48773), H0 is not rejected. There is insufficient evidence to indicate the distributions of the absentee rate for the days of the week differ at α = .05.

14.58

a.

From Table XVII with n = 10, rs,α/2 = rs,.025 = .648. The rejection region is rs > .648 or rs < −.648.

b.

From Table XVII with n = 20, rs,α = rs,.025 = .450. The rejection region is rs > .450.

c.

From Table XVII with n = 30, rs,α = rs,.01 = .432. The rejection region is rs < −.432.

14.60

a.

H0: ρs = 0 Ha: ρs ≠ 0

b.

The test statistic is rs = SSuv / √(SSuu SSvv).

   x     Rank, u      y     Rank, v       u²        v²        uv
   0       3          0       1.5         9         2.25      4.5
   3       5.5        2       5          30.25     25        27.5
   0       3          2       5           9        25        15
  −4       1          0       1.5         1         2.25      1.5
   3       5.5        3       7          30.25     49        38.5
   0       3          1       3           9         9         9
   4       7          2       5          49        25        35
        ∑u = 28            ∑v = 28   ∑u² = 137.5  ∑v² = 137.5  ∑uv = 131

$$SS_{uv} = \sum uv - \frac{(\sum u)(\sum v)}{n} = 131 - \frac{28(28)}{7} = 19$$

$$SS_{uu} = \sum u^2 - \frac{(\sum u)^2}{n} = 137.5 - \frac{(28)^2}{7} = 25.5$$

$$SS_{vv} = \sum v^2 - \frac{(\sum v)^2}{n} = 137.5 - \frac{(28)^2}{7} = 25.5$$

$$r_s = \frac{SS_{uv}}{\sqrt{SS_{uu} SS_{vv}}} = \frac{19}{\sqrt{25.5(25.5)}} = .745$$

Reject H0 if rs < −rs,α/2 or rs > rs,α/2 where α/2 = .025 and n = 7: Reject H0 if rs < −.786 or rs > .786 (from Table XVII, Appendix B).

Since the observed value of the test statistic does not fall in the rejection region (rs = .745 ≯ .786), do not reject H0. There is insufficient evidence to indicate x and y are correlated at α = .05.

c.

The p-value is P(rs ≥ .745) + P(rs ≤ −.745). For n = 7, rs = .745 is below rs,.025 (where α/2 = .025) and above rs,.05 (where α/2 = .05). Therefore, 2(.025) = .05 < p-value < 2(.05) = .10.

d.

The assumptions of the test are that the samples are randomly selected and the probability distributions of the two variables are continuous.

14.62

a.

Some preliminary calculations are:

Brand     Expert 1     Expert 2     Difference di     di²
  A          6            5               1            1
  B          5            6              −1            1
  C          1            2              −1            1
  D          3            1               2            4
  E          2            4              −2            4
  F          4            3               1            1
                                                  ∑di² = 12

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{6(6^2 - 1)} = 1 - .343 = .657$$

b.

To determine if there is a positive correlation in the rankings of the two experts, we test:

H0: ρs = 0
Ha: ρs > 0

The test statistic is rs = .657. Reject H0 if rs > rs,α where α = .05 and n = 6. From Table XVII, Appendix B, rs,.05 = .829. Reject H0 if rs > .829.

Since the observed value of the test statistic does not fall in the rejection region (rs = .657 ≯ .829), H0 is not rejected. There is insufficient evidence to indicate a positive correlation in the rankings of the two experts at α = .05.
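The shortcut formula for Spearman's rank correlation used in Exercise 14.62 is easy to verify in code. This is a minimal sketch; `spearman_from_ranks` is a hypothetical name, and the shortcut is exact only when neither set of ranks contains ties, which holds for the two experts here.

```python
def spearman_from_ranks(u, v):
    """Spearman's r_s from two untied rankings via 1 - 6*sum(d^2)/(n(n^2 - 1))."""
    n = len(u)
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1 - 6 * d2 / (n * (n * n - 1))

expert1 = [6, 5, 1, 3, 2, 4]   # ranks for brands A through F
expert2 = [5, 6, 2, 1, 4, 3]

rs = spearman_from_ranks(expert1, expert2)
print(round(rs, 3))
```

When ties are present, the sums-of-squares form rs = SSuv / √(SSuu SSvv) is the safer choice.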




14.64

a.

Some preliminary calculations are:

   x       u        y       v         u²         v²         uv
  5.2      1       220      4.5        1         20.25       4.5
  5.5      7       227      7.5       49         56.25      52.5
  6.0     23.5     259     15.5      552.25     240.25     364.25
  5.9     20.5     210      1        420.25       1         20.5
  5.8     16       224      6        256         36         96
  6.0     23.5     215      3        552.25       9         70.5
  5.8     16       231      9        256         81        144
  5.6     10       268     19        100        361        190
  5.6     10       239     11        100        121        110
  5.9     20.5     212      2        420.25       4         41
  5.4      5       410     24         25        576        120
  5.6     10       256     14        100        196        140
  5.8     16       306     22        256        484        352
  5.5      7       259     15.5       49        240.25     108.5
  5.3      3       284     21          9        441         63
  5.3      3       383     23          9        529         69
  5.7     12.5     271     20        156.25     400        250
  5.5      7       264     18         49        324        126
  5.7     12.5     227      7.5      156.25      56.25      93.75
  5.3      3       263     17          9        289         51
  5.9     20.5     232     10        420.25     100        205
  5.8     16       220      4.5      256         20.25      72
  5.8     16       246     13        256        169        208
  5.9     20.5     241     12        420.25     144        246
       ∑u = 300         ∑v = 300   ∑u² = 4878  ∑v² = 4898.5  ∑uv = 3197.5

$$SS_{uv} = \sum uv - \frac{(\sum u)(\sum v)}{n} = 3197.5 - \frac{300(300)}{24} = -552.5$$

$$SS_{uu} = \sum u^2 - \frac{(\sum u)^2}{n} = 4878 - \frac{300^2}{24} = 1128$$

$$SS_{vv} = \sum v^2 - \frac{(\sum v)^2}{n} = 4898.5 - \frac{300^2}{24} = 1148.5$$

$$r_s = \frac{SS_{uv}}{\sqrt{SS_{uu} SS_{vv}}} = \frac{-552.5}{\sqrt{1128(1148.5)}} = -.4854$$

Since the magnitude of the correlation coefficient is not particularly large, there is a fairly weak negative relationship between sweetness index and pectin.



b.

To determine if there is a negative association between the sweetness index and the amount of pectin, we test: H0: ρs = 0 Ha: ρs < 0 The test statistic is rs = −.4854 Reject H0 if rs < −rs,α where α = .01 and n = 24. Reject H0 if rs < −.485 (from Table XVII, Appendix B) Since the observed value of the test statistic falls in the rejection region (rs = −.4854 < −.485), H0 is rejected. There is sufficient evidence to indicate there is a negative association between the sweetness index and the amount of pectin at α = .01.

14.66

a.

Some preliminary calculations are:

Parent     Rank, u     Subsid     Rank, v     Difference di     di²
  643        11         2,617       11              0            0
  381        10         1,724        9              1            1
  342         9         1,867       10             −1            1
  251         8         1,238        7              1            1
  216         7           890        5              2            4
  208         6           681        4              2            4
  192         5         1,534        8             −3            9
  141         4           899        6             −2            4
  131         3           492        1              2            4
  128         2           579        2              0            0
  124         1           672        3             −2            4
                                                            ∑di² = 32

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(32)}{11(11^2 - 1)} = 1 - .145 = .855$$

Since this correlation coefficient is fairly close to 1, it indicates that there is a relatively strong positive relationship between the number of parent companies and the number of subsidiaries. To determine if the number of parent companies is positively related to the number of subsidiaries, we test: H0: ρs = 0 Ha: ρs > 0 The test statistic is rs = .855.




From Table XVII, Appendix B, rs,.05 = .523, with n = 11. The rejection region is rs > .523.

Since the observed value of the test statistic falls in the rejection region (rs = .855 > .523), H0 is rejected. There is sufficient evidence to indicate that the number of parent companies is positively related to the number of subsidiaries at α = .05.

b.

We must assume:

1. The sample is randomly selected.
2. The probability distributions of both of the variables are continuous.

The actual numbers of companies and subsidiaries are not continuous. However, since the numbers of companies/subsidiaries are very large, this assumption is basically met. From the information given, we cannot tell whether the sample was random or not.

14.68

b.

Some preliminary calculations:

                                    Differences
Involvement      ui       vi       di = ui − vi       di²
     1            8        9           −1              1
     2            6        7           −1              1
     3           10       10            0              0
     4            2        1            1              1
     5            5        5            0              0
     6            9        8            1              1
     7            1        2           −1              1
     8            4        4            0              0
     9            7        6            1              1
    10           11       11            0              0
    11            3        3            0              0
                                                  ∑di² = 6

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(6)}{11(11^2 - 1)} = .972$$

To determine if a positive relationship exists between participation rates and cost savings rates, we test: H0: ρs = 0 Ha: ρs > 0 The test statistic is rs = .972. From Table XVII, Appendix B, rs,.01 = .736, with n = 11. The rejection region is rs > .736.



Since the observed value of the test statistic falls in the rejection region (rs = .972 > .736), H0 is rejected. There is sufficient evidence to indicate that a positive relationship exists between participation rates and cost savings rates at α = .01.

c.

In order for the above test to be valid, we must assume:

1. The sample is randomly selected.
2. The probability distributions of both of the variables are continuous.

In order to use the Pearson correlation coefficient, we must assume that both populations are normally distributed. It is very unlikely that the data are normally distributed.

14.70

The appropriate test for this completely randomized design is the Kruskal-Wallis H-test. Some preliminary calculations are:

Sample 1   Rank      Sample 2   Rank      Sample 3   Rank
   18       4.5         12       2           87       16
   32       6           33       7           53       11
   43       9           10       1           65       14
   15       3           34       8           50       10
   63      12           18       4.5         64       13
                                             77       15
        R1 = 34.5            R2 = 22.5            R3 = 79

To determine whether at least two of the populations differ in location, we test:

H0: The three probability distributions are identical
Ha: At least two of the three probability distributions differ in location

The test statistic is

H = [12 / (n(n + 1))] ∑ (Rj² / nj) − 3(n + 1)
  = [12 / (16(16 + 1))] [(34.5)²/5 + (22.5)²/5 + (79)²/6] − 3(16 + 1)
  = 60.859 − 51 = 9.859

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is H > 5.99147. Since the observed value of the test statistic falls in the rejection region (H = 9.859 > 5.99147), reject H0. There is sufficient evidence to indicate a difference in location for at least two of the three probability distributions at α = .05.
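The ranking and the H statistic for this exercise can be reproduced with a short script. This is a minimal sketch, not the manual's method: the helper names average_ranks and kruskal_wallis_h are hypothetical, tied observations receive the average of their ranks, and no tie correction is applied to H, which matches the hand calculation above.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of observations tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(samples):
    """H = 12/(n(n+1)) * sum(Rj^2 / nj) - 3(n+1), without a tie correction."""
    pooled = [x for s in samples for x in s]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for s in samples:
        r_j = sum(ranks[start:start + len(s)])  # rank sum for this sample
        total += r_j ** 2 / len(s)
        start += len(s)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


samples = [
    [18, 32, 43, 15, 63],
    [12, 33, 10, 34, 18],
    [87, 53, 65, 50, 64, 77],
]
h = kruskal_wallis_h(samples)
print(f"H = {h:.3f}")
print("reject H0:", h > 5.99147)  # chi-square critical value quoted in the solution
```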




14.72

The appropriate test for two independent samples is the Wilcoxon rank sum test. Some preliminary calculations are:

Sample 1   Rank      Sample 2   Rank
  1.2       4          1.5       6
  1.9       8.5        1.3       5
   .7       1          2.9      12
  2.5      10          1.9       8.5
  1.0       2          2.7      11
  1.8       7          3.5      13
  1.1       3
        T1 = 35.5            T2 = 55.5
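The two rank sums can be verified by ranking the pooled observations and averaging the ranks of ties. The sketch below assumes nothing beyond the data in the table; the function name rank_sums is illustrative, and the cutoffs 28 and 56 are the tabled values this solution uses for n1 = 7, n2 = 6, and α = .05 (two-tailed).

```python
def rank_sums(sample1, sample2):
    """Return (T1, T2): rank sums in the pooled ordering, with average ranks for ties."""
    pooled = sorted(sample1 + sample2)
    avg_rank = {}
    for x in set(pooled):
        first = pooled.index(x) + 1      # 1-based position of the first copy of x
        count = pooled.count(x)          # number of tied copies of x
        avg_rank[x] = first + (count - 1) / 2
    t1 = sum(avg_rank[x] for x in sample1)
    t2 = sum(avg_rank[x] for x in sample2)
    return t1, t2


sample1 = [1.2, 1.9, 0.7, 2.5, 1.0, 1.8, 1.1]
sample2 = [1.5, 1.3, 2.9, 1.9, 2.7, 3.5]

t1, t2 = rank_sums(sample1, sample2)
print("T1 =", t1, "T2 =", t2)

# Two-tailed decision rule with the tabled limits TL = 28 and TU = 56
reject = t2 <= 28 or t2 >= 56
print("reject H0:", reject)
```

As a further check, T1 + T2 should equal n(n + 1)/2 for the n = 13 pooled observations.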

To determine if there is a difference between the locations of the probability distributions, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for population 1 is shifted to the left or right of that for 2

The test statistic is T2 = 55.5. Reject H0 if T2 ≤ TL or T2 ≥ TU where α = .05 (two-tailed), n1 = 7 and n2 = 6: Reject H0 if T2 ≤ 28 or T2 ≥ 56 (from Table XV, Appendix B). Since T2 = 55.5 ≰ 28 and T2 = 55.5 ≱ 56, do not reject H0. There is insufficient evidence to indicate a difference between the locations of the probability distributions for the sampled populations at α = .05.

14.74

a.

To determine whether the median biting rate is higher in bright, sunny weather, we test: H0: η = 5 Ha: η > 5

b.

z = [(S − .5) − .5n] / (.5√n) = [(95 − .5) − .5(122)] / (.5√122) = 6.07

(where S = number of observations greater than 5)


The p-value is p = P(z ≥ 6.07). From Table IV, Appendix B, p = P(z ≥ 6.07) ≈ 0.0000.

c.

Since the observed p-value is less than α (p = 0.0000 < .01), H0 is rejected. There is sufficient evidence to indicate that the median biting rate in bright, sunny weather is greater than 5 at α = .01.
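The large-sample sign test statistic and its upper-tail p-value for this exercise can be checked with the standard library. This is a sketch under the same normal approximation the solution uses; the function names are hypothetical, and the normal tail area is computed from math.erfc rather than read from Table IV.

```python
import math


def sign_test_z(s, n):
    """Large-sample sign test statistic with continuity correction."""
    return ((s - 0.5) - 0.5 * n) / (0.5 * math.sqrt(n))


def upper_tail_p(z):
    """P(Z >= z) for a standard normal random variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = sign_test_z(95, 122)   # S = 95 observations above 5, n = 122
p = upper_tail_p(z)
print(f"z = {z:.2f}")
print(f"p-value = {p:.2e}")
print("reject H0 at alpha = .01:", p < 0.01)
```

The computed tail area is far smaller than any entry in a printed normal table, which is why the solution reports it as approximately 0.0000.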


14.76

a.

Some preliminary calculations are:

Difference                 Rank of Absolute
Highway 1 − Highway 2      Differences
        −25                     5
          4                     1
        −23                     4
        −16                     2.5
        −16                     2.5
                             T+ = 1
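The signed rank sums can be recomputed from the five differences. The sketch below drops zero differences, ranks the absolute differences with average ranks for ties, and sums the ranks by sign; the function name signed_rank_sums is an illustrative choice.

```python
def signed_rank_sums(diffs):
    """Return (T+, T-) for the Wilcoxon signed rank test, using average ranks for ties."""
    nonzero = [d for d in diffs if d != 0]          # zero differences are discarded
    abs_sorted = sorted(abs(d) for d in nonzero)

    def avg_rank(a):
        first = abs_sorted.index(a) + 1             # 1-based position of first tied value
        return first + (abs_sorted.count(a) - 1) / 2

    t_plus = sum(avg_rank(abs(d)) for d in nonzero if d > 0)
    t_minus = sum(avg_rank(abs(d)) for d in nonzero if d < 0)
    return t_plus, t_minus


diffs = [-25, 4, -23, -16, -16]   # Highway 1 − Highway 2
t_plus, t_minus = signed_rank_sums(diffs)
print("T+ =", t_plus, "T- =", t_minus)
```

The two sums should add to n(n + 1)/2 = 15 for n = 5 nonzero differences.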

To determine if the heavily patrolled highway tends to have fewer speeders per 100 cars than the occasionally patrolled highway, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for highway 1 is shifted to the left of that for highway 2

The test statistic is T+ = 1. The rejection region is T+ ≤ 1 from Table XVI, Appendix B, with n = 5 and α = .05. Since the observed value of the test statistic falls in the rejection region (T+ = 1 ≤ 1), H0 is rejected. There is sufficient evidence to indicate the probability distribution for highway 1 is shifted to the left of that for highway 2 at α = .05.

b.

Some preliminary calculations are:

Day    Difference (Highway 1 − Highway 2)
 1              −25
 2                4
 3              −23
 4              −16
 5              −16

d̄ = ∑di / n = −76 / 5 = −15.2

sd² = [∑di² − (∑di)²/n] / (n − 1) = [1682 − (−76)²/5] / (5 − 1) = 131.7

sd = √131.7 = 11.4761

To determine if the mean number of speeders per 100 cars differs for the two highways, we test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is t = (d̄ − 0) / (sd / √n) = −15.2 / (11.4761 / √5) = −2.96
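The paired-difference calculations can be confirmed numerically from the five differences. The sketch below uses the same computing formula for the variance as the hand calculation; the variable names are illustrative only.

```python
import math

diffs = [-25, 4, -23, -16, -16]   # Highway 1 − Highway 2, days 1 to 5
n = len(diffs)

d_bar = sum(diffs) / n                                     # mean difference
sum_sq = sum(d * d for d in diffs)                         # sum of squared differences
var_d = (sum_sq - sum(diffs) ** 2 / n) / (n - 1)           # sample variance of differences
s_d = math.sqrt(var_d)
t = (d_bar - 0) / (s_d / math.sqrt(n))                     # paired t statistic

print(f"d_bar = {d_bar:.1f}, var = {var_d:.1f}, s_d = {s_d:.4f}, t = {t:.2f}")
```

The statistic is then compared with the t critical value for n − 1 = 4 degrees of freedom.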



The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 1 = 5 − 1 = 4. From Table VI, Appendix B, t.025 = 2.776. The rejection region is t > 2.776 or t < −2.776. Since the observed value of the test statistic falls in the rejection region (t = −2.96 < −2.776), H0 is rejected. There is sufficient evidence to indicate the mean number of speeders per 100 cars differs for the two highways at α = .05.

We must assume that the population of differences is normally distributed and that a random sample of differences was selected.

14.78

a.

Since only 70 of the 80 customers responded to the question, only the 70 will be included. To determine if the median amount spent on hamburgers at lunch at McDonald's is less than $2.25, we test:

H0: η = 2.25
Ha: η < 2.25

S = number of measurements less than 2.25 = 20. The test statistic is

z = [(S − .5) − .5n] / (.5√n) = [(20 − .5) − .5(70)] / (.5√70) = −3.71

No α was given in the exercise. We will use α = .05. The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic does not fall in the rejection region (z = −3.71 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate that the median amount spent on hamburgers at lunch at McDonald's is less than $2.25 at α = .05.


b.

No. The survey was done in Boston only. The eating habits of those living in Boston are probably not representative of all Americans.

c.

We must assume that the sample is randomly selected from a continuous probability distribution.



14.80

Some preliminary calculations:

    1                  2                    3
  Urban    Rank     Suburban   Rank       Rural    Rank
   4.3      4.5       5.9      14          5.1      9
   5.2     10.5       6.7      17          4.8      7
   6.2     15.5       7.6      19          3.9      2
   5.6     12         4.9       8          6.2     15.5
   3.8      1         5.2      10.5        4.2      3
   5.8     13         6.8      18          4.3      4.5
   4.7      6
        R1 = 62.5          R2 = 86.5            R3 = 41

To determine if there is a difference in the level of property taxes among the three types of school districts, we test:

H0: The three probability distributions are identical
Ha: At least two of the three probability distributions differ in location

The test statistic is

H = [12 / (n(n + 1))] ∑ (Rj² / nj) − 3(n + 1)
  = [12 / (19(19 + 1))] [62.5²/7 + 86.5²/6 + 41²/6] − 3(20)
  = 65.8498 − 60 = 5.8498

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is H > 5.99147. Since the observed value of the test statistic does not fall in the rejection region (H = 5.8498 ≯ 5.99147), H0 is not rejected. There is insufficient evidence to indicate that there is a difference in the level of property taxes among the three types of school districts at α = .05.

14.82

a. Some preliminary calculations are:

         Static Weight    Weigh-in-Motion    Weigh-in-Motion
Truck    of Truck (ui)    Prior (vi)         After (wi)         uivi     uiwi
  1           3               3                  3                9        9
  2           4               4                  4               16       16
  3          10               9                 10               90      100
  4           1               1.5                2                1.5      2
  5           6               6                  6               36       36
  6           8               8                  8               64       64
  7           2               1.5                1                3        2
  8           5               5                  5               25       25
  9           7               7                  7               49       49
 10           9              10                  9               90       81
Total        55              55                 55              383.5    384



SSuv = ∑uivi − (∑ui ∑vi)/n = 383.5 − 55(55)/10 = 81

SSuw = ∑uiwi − (∑ui ∑wi)/n = 384 − 55(55)/10 = 81.5

SSuu = ∑ui² − (∑ui)²/n = 385 − 55²/10 = 82.5

SSvv = ∑vi² − (∑vi)²/n = 384.5 − 55²/10 = 82

SSww = ∑wi² − (∑wi)²/n = 385 − 55²/10 = 82.5

rs1 = SSuv / √(SSuu SSvv) = 81 / √(82.5(82)) = .9848

rs2 = SSuw / √(SSuu SSww) = 81.5 / √(82.5(82.5)) = .9879
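Because the weigh-in-motion rankings contain ties, rs is computed here as the ordinary correlation coefficient applied to the ranks rather than with the shortcut formula. The sketch below reproduces that calculation from the rank columns; the function name rank_correlation is hypothetical.

```python
import math


def rank_correlation(u, v):
    """Correlation of two rank vectors: SSuv / sqrt(SSuu * SSvv)."""
    n = len(u)
    ss_uv = sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n
    ss_uu = sum(a * a for a in u) - sum(u) ** 2 / n
    ss_vv = sum(b * b for b in v) - sum(v) ** 2 / n
    return ss_uv / math.sqrt(ss_uu * ss_vv)


# Rank columns from the table: static weight, weigh-in-motion prior, weigh-in-motion after
u = [3, 4, 10, 1, 6, 8, 2, 5, 7, 9]
v = [3, 4, 9, 1.5, 6, 8, 1.5, 5, 7, 10]
w = [3, 4, 10, 2, 6, 8, 1, 5, 7, 9]

rs1 = rank_correlation(u, v)
rs2 = rank_correlation(u, w)
print(f"rs1 = {rs1:.4f}, rs2 = {rs2:.4f}")
```

When there are no ties, this approach and the shortcut formula 1 − 6∑di²/[n(n² − 1)] give the same value.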

The correlation coefficient for x and y1 is rs1 = .9848. Since rs1 > 0, the relationship between static weight and weigh-in-motion prior to adjustment is positive. Because the value is close to 1, the relationship is very strong. It is larger than r1 = .965 found in Exercise 10.89.

The correlation coefficient for x and y2 is rs2 = .9879. Since rs2 > 0, the relationship between static weight and weigh-in-motion after the adjustment is positive. Because the value is close to 1, the relationship is very strong. It is smaller than r2 = .996 found in Exercise 10.89.

b.

In order for rs to be exactly 1, the rankings for the static weight and the weigh-in-motion must be the same for each truck. In order for rs to be exactly −1, the ranking for one of the variables (static weight) must be equal to 11 minus the ranking of the other variable (weigh-in-motion) for each truck, so that the two rankings are in exactly reverse order.

14.84

a.

To determine if the median level differs from the target, we test: H0: η = .75 Ha: η ≠ .75

b.

S1 = number of observations less than .75 and S2 = number of observations greater than .75. The test statistic is S = larger of S1 and S2. The p-value = 2P(x ≥ S) where x is a binomial random variable with n = 25 and p = .5. If the p-value is less than α = .10, reject H0.



c.

A Type I error would be concluding the median level is not .75 when it is. If a Type I error were committed, the supervisor would correct the fluoridation process when it was not necessary. A Type II error would be concluding the median level is .75 when it is not. If a Type II error were committed, the supervisor would not correct the fluoridation process when it was necessary.

d.

S1 = number of observations less than .75 = 7 and S2 = number of observations greater than .75 = 18. The test statistic is S = larger of S1 and S2 = 18. The p-value = 2P(x ≥ 18) where x is a binomial random variable with n = 25 and p = .5. From Table II, p-value = 2P(x ≥ 18) = 2(1 − P(x ≤ 17)) = 2(1 − .978) = 2(.022) = .044 Since the p-value = .044 < α = .10, H0 is rejected. There is sufficient evidence to indicate the median level of fluoridation differs from the target of .75 at α = .10.
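The binomial p-value in part d can also be computed exactly instead of being read from Table II. This is a small sketch using only the standard library; the function name two_sided_sign_p is illustrative.

```python
from math import comb


def two_sided_sign_p(s, n):
    """Two-sided sign test p-value: 2 * P(X >= s) for X ~ Binomial(n, 0.5), capped at 1."""
    upper = sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)


p_value = two_sided_sign_p(18, 25)   # S = 18, n = 25
print(f"p-value = {p_value:.4f}")
print("reject H0 at alpha = .10:", p_value < 0.10)
```

The exact value is slightly below the .044 obtained from the table because the tabled cumulative probability .978 is rounded; the decision is the same.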

e.

A distribution heavily skewed to the right might look something like the following:

[Figure: sketch of a distribution heavily skewed to the right]

One assumption necessary for the t-test is that the distribution from which the sample is drawn is normal. A distribution which is heavily skewed in one direction is not normal. Thus, the sign test would be preferred.

14.86

Some preliminary calculations are:

                  Fraction
Hours    Rank     Defective    Rank     di     di²
  1       1         .02         1        0      0
  2       2         .05         3       −1      1
  3       3         .03         2        1      1
  4       4         .08         5       −1      1
  5       5         .06         4        1      1
  6       6         .09         6        0      0
  7       7         .11         8       −1      1
  8       8         .10         7        1      1
                                          ∑di² = 6



To determine if the fraction defective increases as the day progresses, we test:

H0: ρs = 0
Ha: ρs > 0

The test statistic is rs = 1 − 6∑di² / [n(n² − 1)] = 1 − 6(6) / [8(8² − 1)] = 1 − .071 = .929

Reject H0 if rs > rs,α where α = .05 and n = 8: Reject H0 if rs > .643 (from Table XVII, Appendix B). Since rs = .929 > .643, reject H0. There is sufficient evidence to indicate that the fraction defective increases as the day progresses at α = .05.

14.88

a.

The design utilized was a completely randomized design.

b.

Some preliminary calculations are:

Site 1   Rank      Site 2   Rank      Site 3   Rank
 34.3      6        39.3     17        34.5      7
 35.5     11        45.5     25        29.3      2
 32.1      3        50.2     28        37.2     14
 28.3      1        72.1     29        33.2      5
 40.5     19        48.6     27        32.6      4
 36.2     12        42.2     21        38.3     16
 43.5     23       103.5     30        43.3     22
 34.7      8        47.9     26        36.7     13
 38.0     15        41.2     20        40.0     18
 35.1      9        44.0     24        35.2     10
       R1 = 107           R2 = 247           R3 = 111

To determine if the probability distributions for the three sites differ, we test:

H0: The three sampled population probability distributions are identical
Ha: At least two of the three sampled population probability distributions differ in location

The test statistic is

H = [12 / (n(n + 1))] ∑ (Rj² / nj) − 3(n + 1)
  = [12 / (30(31))] [107²/10 + 247²/10 + 111²/10] − 3(31)
  = 109.3923 − 93 = 16.3923

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is H > 5.99147. Since the observed value of the test statistic falls in the rejection region (H = 16.3923 > 5.99147), H0 is rejected. There is sufficient evidence to indicate the probability distributions for at least two of the three sites differ at α = .05.

c.

Since H0 was rejected, we need to compare all pairs of sites. Some preliminary calculations are:

Sites 1 and 2:

Site 1   Rank      Site 2   Rank
 34.3      3        39.3      9
 35.5      6        45.5     15
 32.1      2        50.2     18
 28.3      1        72.1     19
 40.5     10        48.6     17
 36.2      7        42.2     12
 43.5     13       103.5     20
 34.7      4        47.9     16
 38.0      8        41.2     11
 35.1      5        44.0     14
       T1 = 59            T2 = 151

Sites 1 and 3:

Site 1   Rank      Site 3   Rank
 34.3      6        34.5      7
 35.5     11        29.3      2
 32.1      3        37.2     14
 28.3      1        33.2      5
 40.5     18        32.6      4
 36.2     12        38.3     16
 43.5     20        43.3     19
 34.7      8        36.7     13
 38.0     15        40.0     17
 35.1      9        35.2     10
       T1 = 103           T3 = 107

Sites 2 and 3:

Site 2   Rank      Site 3   Rank
 39.3      9        34.5      4
 45.5     15        29.3      1
 50.2     18        37.2      7
 72.1     19        33.2      3
 48.6     17        32.6      2
 42.2     12        38.3      8
103.5     20        43.3     13
 47.9     16        36.7      6
 41.2     11        40.0     10
 44.0     14        35.2      5
       T2 = 151           T3 = 59

For each pair, we test: H0: The two sampled population probability distributions are identical Ha: The probability distribution for one site is shifted to the right or left of the other. The rejection region for each pair is T ≤ 79 or T ≥ 131 from Table XV, Appendix B, with n1 = n2 = 10 and α = .05.




For sites 1 and 2: The test statistic is T1 = 59. Since the observed value of the test statistic falls in the rejection region (T1 = 59 ≤ 79), H0 is rejected. There is sufficient evidence to indicate the probability distribution for site 1 is shifted to the left of that for site 2 at α = .05.

For sites 1 and 3: The test statistic is T1 = 103. Since the observed value of the test statistic does not fall in the rejection region (T1 = 103 ≰ 79 and T1 = 103 ≱ 131), H0 is not rejected. There is insufficient evidence to indicate the probability distribution for site 1 is shifted to the right or left of that for site 3 at α = .05.

For sites 2 and 3: The test statistic is T2 = 151. Since the observed value of the test statistic falls in the rejection region (T2 = 151 ≥ 131), H0 is rejected. There is sufficient evidence to indicate the probability distribution for site 2 is shifted to the right of that for site 3 at α = .05.

Thus, the income for those at site 2 is significantly higher than at the other two sites.

d.

The necessary assumptions are:
1. The three samples are random and independent.
2. There are five or more measurements in each sample.
3. The three probability distributions from which the samples are drawn are continuous.

For parametric tests, the assumptions are:
1. The three populations are normal.
2. The samples are random and independent.
3. The three population variances are equal.


14.90

Using MINITAB, the results of the Wilcoxon Rank Sum Test (Mann-Whitney Test) for each of the variables are:

Mann-Whitney Test and CI: CREATIVE-S, CREATIVE-NS

               N    Median
CREATIVE-S    47    5.0000
CREATIVE-NS   67    4.0000

Point estimate for ETA1-ETA2 is 1.0000
95.0 Percent CI for ETA1-ETA2 is (0.9999,1.0000)
W = 3734.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0000
The test is significant at 0.0000 (adjusted for ties)

Mann-Whitney Test and CI: INFO-S, INFO-NS

               N    Median
INFO-S        47    5.000
INFO-NS       67    5.000

Point estimate for ETA1-ETA2 is 0.000
95.0 Percent CI for ETA1-ETA2 is (-0.000,1.000)
W = 2888.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.2856
The test is significant at 0.2743 (adjusted for ties)

Mann-Whitney Test and CI: DECPERS-S, DECPERS-NS

               N    Median
DECPERS-S     47    3.000
DECPERS-NS    67    2.000

Point estimate for ETA1-ETA2 is -0.000
95.0 Percent CI for ETA1-ETA2 is (-0.000,1.000)
W = 2963.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1337
The test is significant at 0.1228 (adjusted for ties)

Mann-Whitney Test and CI: SKILLS-S, SKILLS-NS

               N    Median
SKILLS-S      47    6.0000
SKILLS-NS     67    5.0000

Point estimate for ETA1-ETA2 is 1.0000
95.0 Percent CI for ETA1-ETA2 is (0.9999,1.9999)
W = 3498.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0000
The test is significant at 0.0000 (adjusted for ties)




Mann-Whitney Test and CI: TASKID-S, TASKID-NS

               N    Median
TASKID-S      47    5.000
TASKID-NS     67    4.000

Mann-Whitney Test and CI: AGE-S, AGE-NS

               N    Median
AGE-S         47    47.000
AGE-NS        67    45.000

Mann-Whitney Test and CI: EDYRS-S, EDYRS-NS

               N    Median
EDYRS-S       47    13.000
EDYRS-NS      67    13.000

Point estimate for ETA1-ETA2 is -0.000
95.0 Percent CI for ETA1-ETA2 is (0.000,-0.000)
W = 2664.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.8268
The test is significant at 0.8191 (adjusted for ties)

A summary of the tests above and the t-tests from Chapter 7 are listed in the table:

             Wilcoxon Test
Variable     Statistic, T2     p-value       t        p-value
CREATIVE        3734.5          0.000       8.847      0.000
INFO            2888.5          0.274       1.503      0.136
DECPERS         2963.5          0.123       1.506      0.135
SKILLS          3498.5          0.000       4.766      0.000
TASKID          3028.0          0.057       1.738      0.087
AGE             2891.5          0.277       0.742      0.460
EDYRS           2664.0          0.819      -0.623      0.534

The p-values for the Wilcoxon Rank Sum Tests and the t-tests are similar and the decisions are the same. Since the sample sizes are large (n = 47 and n = 67), the Central Limit Theorem applies. Thus, the t-tests (or z-tests) are valid. One assumption for the Wilcoxon Rank Sum test is that the distributions are continuous. Obviously, this is not true. There are many ties in the data, so the Wilcoxon Rank Sum tests may not be valid.
