To download more slides, ebook, solutions and test bank, visit http://downloadslide.blogspot.com
INSTRUCTOR'S SOLUTIONS MANUAL to Accompany James T. McClave P. George Benson and Terry Sincich's
STATISTICS FOR BUSINESS AND ECONOMICS Tenth Edition
Nancy S. Boudreau
Bowling Green State University
Upper Saddle River, New Jersey Columbus, Ohio
Contents
Preface
Chapter 1: Statistics, Data, and Statistical Thinking
Chapter 2: Methods for Describing Sets of Data
The Kentucky Milk Case
Chapter 3: Probability
Chapter 4: Random Variables and Probability Distributions
The Furniture Fire Case
Chapter 5: Inferences Based on a Single Sample: Estimation with Confidence Intervals
Chapter 6: Inferences Based on a Single Sample: Tests of Hypothesis
Chapter 7: Inferences Based on Two Samples: Confidence Intervals and Tests of Hypotheses
The Kentucky Milk Case – Part II
Chapter 8: Design of Experiments and Analysis of Variance
Chapter 9: Categorical Data Analysis
Discrimination in the Work Place
Chapter 10: Simple Linear Regression
Chapter 11: Multiple Regression and Model Building
The Condo Sales Case
Chapter 12: Methods for Quality Improvement
Chapter 13: Time Series: Descriptive Analyses, Models, and Forecasting
The Gasket Manufacturing Case
Chapter 14: Nonparametric Statistics
Preface
This solutions manual is designed to accompany the text, Statistics for Business and Economics, Tenth Edition, by James T. McClave, P. George Benson, and Terry Sincich. It provides answers to most even-numbered exercises for each chapter in the text. Other methods of solution may also be appropriate; however, the author has presented one that she believes to be most instructive to the beginning Statistics student. This manual is provided to help instructors save time in preparing presentations of the solutions and possibly to provide another point of view regarding their meaning. Some of the exercises are subjective in nature. Subjective decisions regarding these exercises have been made and are explained by the author. Solutions based on these decisions are presented; the solution to this type of exercise is often most instructive. When an alternative interpretation of an exercise may occur, the author has often addressed it and given justification for the approach taken.
I would like to thank Kelly Barber for creating the artwork and for typing this work.
Nancy S. Boudreau Bowling Green State University Bowling Green, Ohio
Statistics, Data, and Statistical Thinking
Chapter 1
1.2
Descriptive statistics utilizes numerical and graphical methods to look for patterns, to summarize, and to present the information in a set of data. Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.
1.4
The first element of inferential statistics is the population of interest. The population is a set of existing units. The second element is one or more variables that are to be investigated. A variable is a characteristic or property of an individual population unit. The third element is the sample. A sample is a subset of the units of a population. The fourth element is the inference about the population based on information contained in the sample. A statistical inference is an estimate, prediction, or generalization about a population based on information contained in a sample. The fifth and final element of inferential statistics is the measure of reliability for the inference. The reliability of an inference is how confident one is that the inference is correct.
1.6
Quantitative data are measurements that are recorded on a meaningful numerical scale. Qualitative data are measurements that are not numerical in nature; they can only be classified into one of a group of categories.
1.8
A population is a set of existing units such as people, objects, transactions, or events. A sample is a subset of the units of a population.
1.10
An inference without a measure of reliability is nothing more than a guess. A measure of reliability separates statistical inference from fortune telling or guessing. Reliability gives a measure of how confident one is that the inference is correct.
1.12
Statistical thinking involves applying rational thought processes to critically assess data and inferences made from the data. It involves not taking all data and inferences presented at face value, but rather making sure the inferences and data are valid.
1.14
a.
The two variables measured are ‘type of credit card used’ and ‘amount of purchase.’ ‘Type of credit card used’ is qualitative. It has no meaningful number associated with it, only the name of the card used. ‘Amount of purchase’ is quantitative. It has a meaningful number associated with it.
b.
In Study 1, it says that all purchases were tracked. Thus, the data represent a population.
1.16
a.
High school GPA is a number usually between 0.0 and 4.0. Therefore, it is quantitative.
b.
Honors/awards would have responses that name things. Therefore, it would be qualitative.
c.
The scores on the SATs are numbers between 200 and 800. Therefore, it is quantitative.
d.
Gender is either male or female. Therefore, it is qualitative.
e.
Parent's income is a number: $25,000, $45,000, etc. Therefore, it is quantitative.
f.
Age is a number: 17, 18, etc. Therefore, it is quantitative.
1.18
a.
1.
The variable of interest is the status of a company’s e-commerce strategy. Since a company either has an e-commerce strategy or not, the variable is qualitative.
2.
The variable of interest is when the company will implement an e-commerce plan. Since the time of implementation will be a date, this variable will be qualitative.
3.
The variable of interest is whether the company is delivering products over the internet or not. Since the company is either delivering products or not, the variable is qualitative.
4.
The variable of interest is the company’s total revenue in the last fiscal year. Since this is a meaningful number, this variable is quantitative.
b.
Since there are many more than 154 companies in the U.S., this represents a sample rather than a population.
1.20
a.
The population of interest is the collection of computer security personnel at all U.S. corporations and government agencies.
b.
Surveys were sent to computer security personnel at all U.S. corporations and government agencies. However, in 2006, only 616 organizations responded to the survey. There could be nonresponse bias. Often, only those subjects with strong opinions will respond to a survey. Thus, the responses may not reflect what the population as a whole thinks.
c.
The variable measured in the survey is whether or not there was unauthorized use of computer systems at the firms during the year. Since the responses will be either ‘Yes’ or ‘No’, the variable is qualitative.
d.
If we assume that the responses were a random sample from the population, we could infer that about 52% of all computer security personnel will admit to unauthorized use of computer systems at their firms during the year.
1.22
a.
The data collection method used is a designed experiment.
b.
The experimental units in the study are the 50,000 smokers.
c.
The variable of interest is the age at which the scanning method first detects a tumor. Since this is a meaningful number, this variable is quantitative.
d.
The population of interest is the set of all smokers in the U.S. The sample of interest is the set of 50,000 smokers surveyed.
e.
The researchers want to compare the age at first detection for the 2 methods to see if one is more sensitive than the other.
1.24
a.
The variable of interest to the researchers is the rating of highway bridges.
b.
Since the rating of a bridge can be categorized as one of three possible values, it is qualitative.
c.
The data set analyzed is a population since all highway bridges in the U.S. were categorized.
d.
The data were collected observationally. Each bridge was observed in its natural setting.
1.26
a.
The population of interest is the set of all New York accounting firms employing two or more professionals. There are two variables of interest: Whether or not the firm uses audit sampling methods, and if so, whether or not it uses random sampling. The sample is the set of 163 firms whose responses were useable. The inference of interest to the New York Society of CPAs is the proportion of all New York accounting firms employing two or more professionals that use sampling methods in auditing their clients.
b.
The four responses that were unusable could have been returned blank or could have been filled out incorrectly.
c.
Any time a survey is mailed it is questionable whether the returned questionnaires represent a random sample. Often times, only those with very strong opinions return the surveys. In such a case, the returned surveys would not be representative of the entire population.
1.28
a.
The experimental units in this study are the 24 projects.
b.
The population from which the sample was selected is the set of all new software development projects.
c.
The variable of interest in this project is the outcome of reusing previously developed software for the new software development projects.
d.
In the sample, 9 of the 24 projects were judged failures. This is (9 / 24)*100% = 37.5%. We could infer that approximately 37.5% of all projects would be judged failures.
1.30
a.
The process being studied is the process of filling beverage cans with softdrink at CCSB's Wakefield plant.
b.
The variable of interest is the amount of carbon dioxide added to each can of beverage.
c.
The sampling plan was to monitor five filled cans every 15 minutes. The sample is the total number of cans selected.
d.
The company's immediate interest is learning about the process of filling beverage cans with softdrink at CCSB's Wakefield plant. To do this, they are measuring the amount of carbon dioxide added to a can of beverage to make an inference about the process of filling beverage cans. In particular, they might use the mean amount of carbon dioxide added to the sampled cans of beverage to estimate the mean amount of carbon dioxide added to all the cans on the process line.
e.
The technician would then be dealing with a population. The cans of beverage have already been processed. He/she is now interested in the outputs.
Methods for Describing Sets of Data
Chapter 2
2.2
a.
To find the frequency for each class, count the number of times each letter occurs. The frequencies for the three classes are:
Class    Frequency
X        8
Y        9
Z        3
Total    20
b.
The relative frequency for each class is found by dividing the frequency by the total sample size. The relative frequency for the class X is 8/20 = .40. The relative frequency for the class Y is 9/20 = .45. The relative frequency for the class Z is 3/20 = .15.
Class    Frequency    Relative Frequency
X        8            .40
Y        9            .45
Z        3            .15
Total    20           1.00
c.
The frequency bar chart is:
d.
The pie chart for the frequency distribution is:
2.4
a.
The variable summarized in the table is ‘Reason for requesting the installation of the passenger-side on-off switch.’ The values this variable could assume are: Infant, Child, Medical, Infant & Medical, Child & Medical, Infant & Child, and Infant & Child & Medical. Since the responses name something, the variable is qualitative.
b.
The relative frequencies are found by dividing the number of requests for each category by the total number of requests. For the category ‘Infant’, the relative frequency is 1,852/30,337 = .061. The rest of the relative frequencies are found in the table below:
Reason                      Number of Requests    Relative Frequency
Infant                      1,852                 1,852/30,337 = .061
Child                       17,148                17,148/30,337 = .565
Medical                     8,377                 8,377/30,337 = .276
Infant & Medical            44                    44/30,337 = .0014
Child & Medical             903                   903/30,337 = .030
Infant & Child              1,878                 1,878/30,337 = .062
Infant & Child & Medical    135                   135/30,337 = .0045
TOTAL                       30,337                .9999
c.
Using MINITAB, a pie chart of the data is:
[Pie chart of Reason: Child (17,148, 56.5%); Medical (8,377, 27.6%); Infant & Child (1,878, 6.2%); Infant (1,852, 6.1%); Child & Medical (903, 3.0%); Infant & Child & Medical (135, 0.4%); Infant & Medical (44, 0.1%)]
d.
There are 4 categories where Medical is mentioned as a reason: Medical, Infant & Medical, Child & Medical, and Infant & Child & Medical. The sum of the frequencies for these 4 categories is 8,377 + 44 + 903 + 135 = 9,459. The proportion listing Medical as one of the reasons is 9,459/30,337 = .312.
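The arithmetic in parts b and d can be checked with a short script. This is a minimal sketch: the counts are copied from the table in part b, while the variable names (`requests`, `relative_freq`, `medical_share`) are illustrative choices, not anything taken from the text.

```python
# Request counts copied from the table in part b.
requests = {
    "Infant": 1852,
    "Child": 17148,
    "Medical": 8377,
    "Infant & Medical": 44,
    "Child & Medical": 903,
    "Infant & Child": 1878,
    "Infant & Child & Medical": 135,
}

total = sum(requests.values())

# Relative frequency = category count / total number of requests.
relative_freq = {reason: count / total for reason, count in requests.items()}

# Part d: pool every category in which "Medical" is one of the reasons.
medical_count = sum(c for reason, c in requests.items() if "Medical" in reason)
medical_share = medical_count / total

for reason, rf in relative_freq.items():
    print(f"{reason:<26} {rf:.4f}")
print(f"Medical mentioned: {medical_count} of {total} = {medical_share:.3f}")
```

Computing the shares from the raw counts, rather than adding rounded relative frequencies, avoids the small rounding drift that makes the table total .9999 instead of 1.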
2.6
a.
To find relative frequencies, we divide the frequencies of each category by the total number of incidents. The relative frequencies of the number of incidents for each of the cause categories are:
Management System Cause Category    Number of Incidents    Relative Frequencies
Engineering & Design                27                     27 / 83 = .325
Procedures & Practices              24                     24 / 83 = .289
Management & Oversight              22                     22 / 83 = .265
Training & Communication            10                     10 / 83 = .120
TOTAL                               83                     1
b.
The Pareto diagram is:
[Pareto diagram of Management System Cause Category: percent (vertical axis) by category (horizontal axis), in the order Eng&Des, Proc&Pract, Mgmt&Over, Trn&Comm]
c.
The category with the highest relative frequency of incidents is Engineering and Design. The category with the lowest relative frequency of incidents is Training and Communication.
2.8
a.
The data collection method was a survey.
b.
Since the data were numbers (percentage of US labor and materials), the variable is quantitative. Once the data were collected, they were grouped into 4 categories.
c.
Using MINITAB, a pie chart of the data is:
[Pie chart of Made in USA: 100% (64, 60.4%); 75-99% (20, 18.9%); 50-74% (18, 17.0%); <50% (4, 3.8%)]
About 60% of those surveyed believe that “Made in USA” means 100% US labor and materials.
2.10
Using MINITAB, a bar chart of the frequency of occurrence of the industry types is:
[Bar chart of INDUSTRY: count (vertical axis) for each industry type (horizontal axis), from Aerospace/Defense through Utilities]
2.12
Using MINITAB, the side-by-side bar charts are:
[Side-by-side bar charts for 1999 and 2006: relative frequency (vertical axis) of the responses Yes, No, and Don't know on unauthorized use of computer systems]
The relative frequency of unauthorized use of computer systems has decreased from 1999 to 2006.
2.14
a.
Using MINITAB, the side-by-side graphs are:
[Chart of Exposure, Opportunity, Content, Faculty vs Stars: four bar-chart panels (Exposure, Opportunity, Content, Faculty), each showing frequency (vertical axis) by number of stars (5, 4, 3, 2)]
From these graphs, one can see that very few of the top 30 MBA programs got 5-stars in any criteria. In addition, about the same number of programs got 4 stars in each of the 4 criteria. The biggest difference in ratings among the 4 criteria was in the number of programs receiving 3-stars. More programs received 3-stars in Course Content than in any of the other criteria. Consequently, fewer programs received 2-stars in Course Content than in any of the other criteria.
b.
Since this chart lists the rankings of only the top 30 MBA programs in the world, it is reasonable that none of these best programs would be rated as 1-star on any criteria.
2.16
2.18
a.
The original data set has 1 + 3 + 5 + 7 + 4 + 3 = 23 observations.
b.
For the bottom row of the stem-and-leaf display: The stem is 0. The leaves are 0, 1, 2. The numbers in the original data set are 0, 1, and 2.
c.
The dot plot corresponding to all the data points is:
2.20
a.
The measurement class that contains the highest proportion of respondents is “none”. Sixty-one percent of the respondents said that their companies did not outsource any computer security functions.
b.
From the graph, 6% of the respondents indicated that they outsourced between 20% and 40% of their computer security functions.
c.
The proportion of the 609 respondents who outsourced at least 40% of computer security functions is .04 + .01 + .01 = .06.
d.
The number of the 609 respondents who outsourced less than 20% of computer security functions is (.27 + .61)*609 = .88(609) = 536.
2.22
a.
Using MINITAB, the stem-and-leaf display of the data is:
Stem-and-Leaf Display: SCORE
Stem-and-leaf of SCORE  N = 169
Leaf Unit = 1.0
    1    6  2
    1    6
    2    7  2
    3    7  8
    4    8  4
   15    8  66677888899
   56    9  00001111111222222222233333333344444444444
(100)    9  55555555555555555555556666666666666666666777777777777777777888888+
   13   10  0000000000000
b.
From the stem-and-leaf display, we see that there are only 4 observations with sanitation scores less than the acceptable score of 86. The proportion of ships that have an accepted sanitation standard would be (169 – 4) / 169 = .976.
c.
The sanitation score of 84 is in bold in the stem-and-leaf display in part a.
2.24
a.
Using MINITAB, the frequency histogram is:
[Frequency histogram of Length: frequency (vertical axis) by Length (horizontal axis)]
b.
Using MINITAB, the frequency histogram is:
[Frequency histogram of Weight: frequency (vertical axis) by Weight (horizontal axis)]
c.
Using MINITAB, the frequency histogram is:
[Frequency histogram of DDT: frequency (vertical axis) by DDT (horizontal axis)]
2.26
Using MINITAB, the two dot plots are: Dotplot for Arrive-Depart
Yes. Most of the numbers of items arriving at the work center per hour are in the 135 to 165 area. Most of the numbers of items departing the work center per hour are in the 110 to 140 area. Because the number of items arriving is larger than the number of items departing, there will probably be some sort of bottleneck.
2.28
a.
Using MINITAB, the three frequency histograms are as follows (the same starting point and class interval were used for each):
Histogram of C1  N = 25
Tenth Performance
Midpoint  Count
    4.00      0
    8.00      0
   12.00      1  *
   16.00      5  *****
   20.00     10  **********
   24.00      6  ******
   28.00      0
   32.00      2  **
   36.00      0
   40.00      1  *
Histogram of C2  N = 25
Thirtieth Performance
Midpoint  Count
    4.00      1  *
    8.00      9  *********
   12.00     12  ************
   16.00      2  **
   20.00      1  *
Histogram of C3  N = 25
Fiftieth Performance
Midpoint  Count
    4.00      3  ***
    8.00     15  ***************
   12.00      4  ****
   16.00      2  **
   20.00      1  *
b.
The histogram for the tenth performance shows a much greater spread of the observations than the other two histograms. The thirtieth performance histogram shows a shift to the left—implying shorter completion times than for the tenth performance. In addition, the fiftieth performance histogram shows an additional shift to the left compared to that for the thirtieth performance. However, the last shift is not as great as the first shift. This agrees with statements made in the problem.
2.30
a.
A stem-and-leaf display is as follows, where the stems are the units place and the leaves are the decimal places:
Stem  Leaves
   1  0 0 0 0 1 1 2 2 2 2 2 3 4 4 4 4 4 4 4 5 5 5 5 6 7 9
   2  1 1 4 4 6 7 9 9
   3  0 0 2 8 9 9
   4  1 1 1 2
   5  2 4 5
   6
   7  8
   8
   9
  10  1
b.
A little more than half (26/49 = .53) of all companies spent less than 2 months in bankruptcy. Only two of the 49 companies spent more than 6 months in bankruptcy. It appears then, in general, the length of time in bankruptcy for firms using "prepacks" is less than that of firms not using "prepacks."
c.
A dot diagram will be used to compare the time in bankruptcy for the three types of "prepack" firms:
d.
The circled times in part a correspond to companies that were reorganized through a leverage buyout. There does not appear to be any pattern to these points. They appear to be scattered about evenly throughout the distribution of all times.
2.32
Using MINITAB, the stem-and-leaf display for the data is:
Stem-and-leaf of Time  N = 25
Leaf Unit = 1.0
   3    3  239
   7    4  3499
 (7)    5  0011469
  11    6  34458
   6    7  13
   4    8  26
   2    9  5
   1   10  2
The numbers in bold represent delivery times associated with customers who subsequently did not place additional orders with the firm. Since there were only 2 customers with delivery times of 68 days or longer that placed additional orders, I would say the maximum tolerable delivery time is about 65 to 67 days. Everyone with delivery times less than 67 days placed additional orders.
2.34
a.
$\sum x = 3 + 8 + 4 + 5 + 3 + 4 + 6 = 33$
b.
$\sum x^2 = 3^2 + 8^2 + 4^2 + 5^2 + 3^2 + 4^2 + 6^2 = 175$
c.
$\sum (x - 5)^2 = (3 - 5)^2 + (8 - 5)^2 + (4 - 5)^2 + (5 - 5)^2 + (3 - 5)^2 + (4 - 5)^2 + (6 - 5)^2 = 20$
d.
$\sum (x - 2)^2 = (3 - 2)^2 + (8 - 2)^2 + (4 - 2)^2 + (5 - 2)^2 + (3 - 2)^2 + (4 - 2)^2 + (6 - 2)^2 = 71$
e.
$\left(\sum x\right)^2 = (3 + 8 + 4 + 5 + 3 + 4 + 6)^2 = 33^2 = 1089$
2.36
a.
$\sum x = 6 + 0 + (-2) + (-1) + 3 = 6$
b.
$\sum x^2 = 6^2 + 0^2 + (-2)^2 + (-1)^2 + 3^2 = 50$
c.
$\sum x^2 - \frac{\left(\sum x\right)^2}{5} = 50 - \frac{6^2}{5} = 50 - 7.2 = 42.8$
2.38
a.
$\bar{x} = \frac{\sum x}{n} = \frac{85}{10} = 8.5$
b.
$\bar{x} = \frac{400}{16} = 25$
c.
$\bar{x} = \frac{35}{45} = .78$
d.
$\bar{x} = \frac{242}{18} = 13.44$
2.40
The median is the middle number once the data have been arranged in order. If n is even, there is not a single middle number. Thus, to compute the median, we take the average of the middle two numbers. If n is odd, there is a single middle number. The median is this middle number. A data set with five measurements arranged in order is 1, 3, 5, 6, 8. The median is the middle number, which is 5. A data set with six measurements arranged in order is 1, 3, 5, 5, 6, 8. The median is the average of the middle two numbers, which is $\frac{5 + 5}{2} = \frac{10}{2} = 5$.
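The odd/even rule described above can be sketched in a few lines of Python. The function name `median` is an illustrative choice; the two data sets are the ones used in the explanation.

```python
def median(values):
    """Return the middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: a single middle number exists.
        return ordered[mid]
    # n even: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 3, 5, 6, 8]))     # five measurements
print(median([1, 3, 5, 5, 6, 8]))  # six measurements
```

Sorting inside the function means the caller does not need to arrange the data first.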
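2.42
a.
$\bar{x} = \frac{\sum x}{n} = \frac{7 + \cdots + 4}{6} = \frac{15}{6} = 2.5$
Median = $\frac{3 + 3}{2} = 3$ (mean of 3rd and 4th numbers, after ordering)
Mode = 3
b.
$\bar{x} = \frac{\sum x}{n} = \frac{2 + \cdots + 4}{13} = \frac{40}{13} = 3.08$
Median = 3 (7th number, after ordering)
Mode = 3
c.
$\bar{x} = \frac{\sum x}{n} = \frac{51 + \cdots + 37}{10} = \frac{496}{10} = 49.6$
Median = $\frac{48 + 50}{2} = 49$ (mean of 5th and 6th numbers, after ordering)
Mode = 50
2.44
a.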
The sample mean is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{529 + 355 + 301 + \cdots + 63}{26} = \frac{3757}{26} = 144.5$
The sample median is found by finding the average of the 13th and 14th observations once the data are arranged in order. The 13th and 14th observations are 100 and 105. The average of these two numbers (median) is:
$\text{median} = \frac{100 + 105}{2} = \frac{205}{2} = 102.5$
The mode is the observation appearing the most. For this data set, the mode is 70, which appears 3 times. Since the mean is larger than the median, the data are skewed to the right.
b.
The sample mean is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{11 + 9 + 6 + \cdots + 4}{26} = \frac{136}{26} = 5.23$
The sample median is found by finding the average of the 13th and 14th observations once the data are arranged in order. The 13th and 14th observations are 5 and 5. The average of these two numbers (median) is:
$\text{median} = \frac{5 + 5}{2} = \frac{10}{2} = 5$
The mode is the observation appearing the most. For this data set, the mode is 6, which appears 6 times. Since the mean and median are about the same, the data are somewhat symmetric.
2.46
a.
The sample mean is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1.72 + 2.50 + 2.16 + \cdots + 1.95}{20} = \frac{37.62}{20} = 1.881$
The sample average surface roughness of the 20 observations is 1.881.
b.
The median is found as the average of the 10th and 11th observations, once the data have been ordered. The ordered data are:
1.06 1.09 1.19 1.26 1.27 1.40 1.51 1.72 1.95 2.03 2.05 2.13 2.13 2.16 2.24 2.31 2.41 2.50 2.57 2.64
The 10th and 11th observations are 2.03 and 2.05. The median is:
$\frac{2.03 + 2.05}{2} = \frac{4.08}{2} = 2.04$
The middle surface roughness measurement is 2.04. Half of the sample measurements were less than 2.04 and half were greater than 2.04.
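The mean and median for the surface roughness data can be reproduced with Python's standard `statistics` module. This is a sketch only; the list is the ordered data given in part b, and the variable names are illustrative.

```python
from statistics import mean, median

# Ordered surface roughness measurements listed in part b.
roughness = [
    1.06, 1.09, 1.19, 1.26, 1.27, 1.40, 1.51, 1.72, 1.95, 2.03,
    2.05, 2.13, 2.13, 2.16, 2.24, 2.31, 2.41, 2.50, 2.57, 2.64,
]

sample_mean = mean(roughness)
sample_median = median(roughness)  # averages the 10th and 11th ordered values

print(f"n = {len(roughness)}")
print(f"mean = {sample_mean:.3f}")
print(f"median = {sample_median:.2f}")

# A mean below the median is the pattern expected for left-skewed data.
print("mean < median:", sample_mean < sample_median)
```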
c.
The data are somewhat skewed to the left. Thus, the median might be a better measure of central tendency than the mean. The few small values in the data tend to make the mean smaller than the median.
2.48
a.
Using MINITAB, the stem-and-leaf display is:
Stem-and-leaf of PAF  N = 17
Leaf Unit = 1.0
   6   0  000009
   8   1  25
 (2)   2  45
   7   3  13
   5   4  0
   4   5
   4   6  2
   3   7  057
b.
The median is the middle number once the data are arranged in order. The data arranged in order are: 0, 0, 0, 0, 0, 9, 12, 15, 24, 25, 31, 33, 40, 62, 70, 75, 77. The middle number or the median is 24.
c.
The mean of the data is $\bar{x} = \frac{\sum x}{n} = \frac{77 + 33 + 75 + \cdots + 31}{17} = \frac{473}{17} = 27.82$
d.
The number occurring most frequently is 0. The mode is 0.
e.
The mode corresponds to the smallest number. It does not seem to locate the center of the distribution. Both the mean and the median are in the middle of the stem-and-leaf display. Thus, it appears that both of them locate the center of the data.
2.50
a.
The sample mean length is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{42.5 + 44.0 + 41.5 + \cdots + 36.0}{144} = \frac{6165}{144} = 42.81$
The average length of the 144 fish is 42.81 cm. The median is the average of the middle two observations once they have been ordered. The 72nd and 73rd observations are 45 and 45. The average of these two observations is 45. Half of the fish lengths are less than 45 cm and half are longer. The mode is 46 cm. This observation occurred 12 times.
b.
The sample mean weight is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{732 + 795 + 547 + \cdots + 1433}{144} = \frac{151159}{144} = 1049.72$
The average weight of the 144 fish is 1049.72 grams. The median is the average of the middle two observations once they have been ordered. The 72nd and 73rd observations are 989 and 1011. The average of these two observations is:
$\text{median} = \frac{989 + 1011}{2} = 1000$
Half of the fish weights are less than 1000 grams and half are heavier. There are 2 modes, 886 and 1186. Each of these observations occurred 3 times.
c.
The sample mean DDT level is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{10 + 16 + 23 + \cdots + 1.9}{144} = \frac{3507.1}{144} = 24.35$
The average DDT level of the 144 fish is 24.35 parts per million.
The median is the average of the middle two observations once they have been ordered. The 72nd and 73rd observations are 7.1 and 7.2. The average of these two observations is:
$\text{median} = \frac{7.1 + 7.2}{2} = 7.15$
Half of the fish DDT levels are less than 7.15 parts per million and half are greater. The mode is 12. This observation occurred 8 times.
d.
From the graph in Exercise 2.24a, the data are skewed to the left. This corresponds to the relationship between the mean and the median. For data skewed to the left, the mean is less than the median. For the fish lengths, the mean is 42.81 and the median is 45.
e.
From the graph in Exercise 2.24b, the data are slightly skewed to the right. This corresponds to the relationship between the mean and the median. For data skewed to the right, the mean is more than the median. For the fish weights, the mean is 1049.72 and the median is 1000.
f.
From the graph in Exercise 2.24c, the data are skewed to the right. This corresponds to the relationship between the mean and the median. For data skewed to the right, the mean is more than the median. For the fish DDT levels, the mean is 24.35 and the median is 7.15.
2.52
a.
Due to the "elite" superstars, the salary distribution is skewed to the right. Since this implies that the median is less than the mean, the players' association would want to use the median.
b.
The owners, by the logic of part a, would want to use the mean.
2.54
a.
The sample mean is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 3 + 4 + \cdots + 3}{20} = \frac{80}{20} = 4$
The sample median is found by finding the average of the 10th and 11th observations once the data are arranged in order. The data arranged in order are:
1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9 13
The 10th and 11th observations are 3 and 4. The average of these two numbers (median) is:
$\text{median} = \frac{3 + 4}{2} = \frac{7}{2} = 3.5$
The mode is the observation appearing the most. For this data set, the mode is 1, which appears 5 times.
b.
Eliminating the largest number, which is 13, results in the following:
The sample mean is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 3 + 4 + \cdots + 3}{19} = \frac{67}{19} = 3.53$
The sample median is found by finding the middle observation once the data are arranged in order. The data arranged in order are:
1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9
The 10th observation is 3. The median is 3.
The mode is the observation appearing the most. For this data set, the mode is 1, which appears 5 times.
By dropping the largest number, the mean is reduced from 4 to 3.53. The median is reduced from 3.5 to 3. There is no effect on the mode.
c.
The data arranged in order are:
1 1 1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7 9 13
If we drop the lowest 2 and largest 2 observations we are left with:
1 1 1 2 2 3 3 3 4 4 4 5 5 5 6 7
The sample 10% trimmed mean is:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 1 + 1 + \cdots + 7}{16} = \frac{56}{16} = 3.5$
The advantage of the trimmed mean over the regular mean is that very large and very small numbers that could greatly affect the mean have been eliminated.
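The trimming step in part c can be sketched as a small helper. The function name `trimmed_mean` and its `proportion` parameter are illustrative assumptions; the data are the 20 ordered observations from this exercise.

```python
def trimmed_mean(values, proportion):
    """Drop the given proportion of observations from each end, then average the rest."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of observations dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 9, 13]

regular_mean = sum(data) / len(data)
trimmed = trimmed_mean(data, 0.10)  # drops the lowest 2 and largest 2 of 20 values

print(f"regular mean     = {regular_mean}")
print(f"10% trimmed mean = {trimmed}")
```

With 20 observations, a 10% trim removes two values from each end, which matches the 16 values averaged in part c.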
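2.56
a.
$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{84 - \frac{20^2}{10}}{10 - 1} = 4.8889$
$s = \sqrt{4.8889} = 2.211$
b.
$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{380 - \frac{100^2}{40}}{40 - 1} = 3.3333$
$s = \sqrt{3.3333} = 1.826$
c.
$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{18 - \frac{17^2}{20}}{20 - 1} = .1868$
$s = \sqrt{.1868} = .432$
2.58
a.
Range = 42 − 37 = 5
$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{7935 - \frac{199^2}{5}}{5 - 1} = 3.7$
$s = \sqrt{3.7} = 1.92$
b.
Range = 100 − 1 = 99
$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{25{,}795 - \frac{303^2}{9}}{9 - 1} = 1{,}949.25$
$s = \sqrt{1{,}949.25} = 44.15$
c.
Range = 100 − 2 = 98
$s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} = \frac{20{,}033 - \frac{295^2}{8}}{8 - 1} = 1{,}307.84$
$s = \sqrt{1{,}307.84} = 36.16$
2.60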
This is one possibility for the two data sets. Data Set 1: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5 Data Set 2: 1, 1, 1, 1, 1, 5, 5, 5, 5, 5
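$\bar{x}_1 = \frac{\sum x}{n} = \frac{1 + 1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5}{10} = \frac{30}{10} = 3$

$\bar{x}_2 = \frac{\sum x}{n} = \frac{1 + 1 + 1 + 1 + 1 + 5 + 5 + 5 + 5 + 5}{10} = \frac{30}{10} = 3$

Therefore, the two data sets have the same mean. The variances for the two data sets are:

$s_1^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{110 - \frac{30^2}{10}}{9} = \frac{20}{9} = 2.2222$

$s_2^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{130 - \frac{30^2}{10}}{9} = \frac{40}{9} = 4.4444$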
The dot diagrams for the two data sets are shown below.
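The means and variances of the two data sets can be checked with Python's standard `statistics` module. The sketch below is illustrative only; the helper `dot_diagram` is a hypothetical name introduced here, and it prints a rough text version of a dot diagram rather than a plotted figure.

```python
from statistics import mean, variance

set1 = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
set2 = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]


def dot_diagram(values):
    """Return a text dot diagram: one row per integer value, one dot per observation."""
    rows = []
    for v in range(min(values), max(values) + 1):
        rows.append(f"{v}: " + "." * values.count(v))
    return "\n".join(rows)


for name, data in (("Data Set 1", set1), ("Data Set 2", set2)):
    # statistics.variance uses the n - 1 divisor, matching the sample variance s^2.
    print(name, "mean =", mean(data), "variance =", round(variance(data), 4))
    print(dot_diagram(data))
```

Both sets share the same mean, while the second set places every observation at an extreme, which is why its variance is twice as large.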
2.62
a. Range = 3 − 0 = 3

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{15 - \frac{7^2}{5}}{5-1} = 1.3$

$s = \sqrt{1.3} = 1.1402$

b. After adding 3 to each of the data points, Range = 6 − 3 = 3

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{102 - \frac{22^2}{5}}{5-1} = 1.3$

$s = \sqrt{1.3} = 1.1402$

c. After subtracting 4 from each of the data points, Range = −1 − (−4) = 3

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{39 - \frac{(-13)^2}{5}}{5-1} = 1.3$

$s = \sqrt{1.3} = 1.1402$

d. The range, variance, and standard deviation remain the same when any number is added to or subtracted from each measurement in the data set.

2.64
a. The maximum age is 64. The minimum age is 39. The range is 64 – 39 = 25.

b. The variance is:

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{125{,}764 - \frac{2494^2}{50}}{50-1} = 27.822$

c. The standard deviation is: $s = \sqrt{s^2} = \sqrt{27.822} = 5.275$

d. Since the standard deviation of the ages of the 50 most powerful women in Europe is 10 years and is greater than that in the U.S. (5.275 years), the age data for Europe is more variable.
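The shortcut formula used for the variance in this exercise needs only n, the sum of the observations, and the sum of their squares. A minimal sketch follows; the function name `shortcut_variance` is an assumption made for this illustration, and the inputs are the sums quoted in the solution.

```python
import math


def shortcut_variance(n, sum_x, sum_x2):
    """Sample variance from n, the sum of x, and the sum of x squared."""
    if n < 2:
        raise ValueError("need at least two observations")
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


# Sums quoted in the solution: n = 50, sum of x = 2494, sum of x^2 = 125,764.
s2 = shortcut_variance(50, 2494, 125_764)
s = math.sqrt(s2)

print("variance:", round(s2, 3))
print("standard deviation:", round(s, 3))
```

The same function applies to any exercise in this section that reports the two sums rather than the raw data.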
2.66
a. The maximum weight is 1.1 carats. The minimum weight is .18 carats. The range is 1.1 − .18 = .92 carats.

b. The variance is:

$s^2 = \frac{\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}}{n-1} = \frac{146.19 - \frac{194.32^2}{308}}{308-1} = .0768$ square carats

c. The standard deviation is: $s = \sqrt{s^2} = \sqrt{.0768} = .2772$ carats

d. The standard deviation. This gives us an idea about how spread out the data are in the same units as the original data.

2.68
a. A worker's overall time to complete the operation under study is determined by adding the subtask-time averages.

Worker A

The average for subtask 1 is: $\bar{x} = \frac{\sum x}{n} = \frac{211}{7} = 30.14$

The average for subtask 2 is: $\bar{x} = \frac{\sum x}{n} = \frac{21}{7} = 3$

Worker A's overall time is 30.14 + 3 = 33.14.

Worker B

The average for subtask 1 is: $\bar{x} = \frac{\sum x}{n} = \frac{213}{7} = 30.43$

The average for subtask 2 is: $\bar{x} = \frac{\sum x}{n} = \frac{29}{7} = 4.14$

Worker B's overall time is 30.43 + 4.14 = 34.57.
b. Worker A

$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{6455 - \frac{211^2}{7}}{7-1}} = \sqrt{15.8095} = 3.98$

Worker B

$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{6487 - \frac{213^2}{7}}{7-1}} = \sqrt{.9524} = .98$

c. The standard deviations represent the amount of variability in the time it takes the worker to complete subtask 1.
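d. Worker A

$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{67 - \frac{21^2}{7}}{7-1}} = \sqrt{.6667} = .82$

Worker B

$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{147 - \frac{29^2}{7}}{7-1}} = \sqrt{4.4762} = 2.12$

e.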
I would choose workers similar to worker B to perform subtask 1. Worker B has a slightly higher average time on subtask 1 (A: x = 30.14, B: x = 30.43). But, Worker B has a smaller variability in the time it takes to complete subtask 1 (part b). He or she is more consistent in the time needed to complete the task. I would choose workers similar to Worker A to perform subtask 2. Worker A has a smaller average time on subtask 2 (A: x = 3, B: x = 4.14). Worker A also has a smaller variability in the time needed to complete subtask 2 (part d).
2.70
Since no information is given about the data set, we can only use Chebyshev's Rule.

a. Nothing can be said about the percentage of measurements which will fall between $\bar{x} - s$ and $\bar{x} + s$.

b. At least 3/4 or 75% of the measurements will fall between $\bar{x} - 2s$ and $\bar{x} + 2s$.

c. At least 8/9 or 89% of the measurements will fall between $\bar{x} - 3s$ and $\bar{x} + 3s$.

2.72
a. $\bar{x} = \frac{\sum x}{n} = \frac{206}{25} = 8.24$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{1778 - \frac{206^2}{25}}{25-1} = 3.357$

$s = \sqrt{s^2} = 1.83$
b.
Interval                                Number of Measurements in Interval   Percentage
$\bar{x} \pm s$, or (6.41, 10.07)       18                                   18/25 = .72 or 72%
$\bar{x} \pm 2s$, or (4.58, 11.90)      24                                   24/25 = .96 or 96%
$\bar{x} \pm 3s$, or (2.75, 13.73)      25                                   25/25 = 1 or 100%

c. The percentages in part b are in agreement with Chebyshev's Rule and agree fairly well with the percentages given by the Empirical Rule.
d.
Range = 12 − 5 = 7 s ≈ range/4 = 7/4 = 1.75 The range approximation provides a satisfactory estimate of s = 1.83 from part a.
2.74
From Chebyshev’s Theorem, we know that at least ¾ or 75% of all observations will fall within 2 standard deviations of the mean. From Exercise 2.47, x = .631. From Exercise 2.66, s = .2772. This interval is: x ± 2 s ⇒ .631 ± 2(.2772) ⇒ .631 ± .5544 ⇒ (.0766, 1.1854)
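Chebyshev's bound and the corresponding interval are simple enough to compute directly. The sketch below is one way to do so; the function names `chebyshev_bound` and `interval` are introduced here for illustration, and the inputs are the mean and standard deviation quoted above.

```python
def chebyshev_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    return 1 - 1 / k ** 2


def interval(x_bar, s, k):
    """Return the interval x_bar ± k*s as a (low, high) tuple."""
    return (x_bar - k * s, x_bar + k * s)


# Values quoted in the solution: x_bar = .631 and s = .2772.
low, high = interval(0.631, 0.2772, 2)

print("at least", chebyshev_bound(2), "of observations within 2 standard deviations")
print("interval:", (round(low, 4), round(high, 4)))
```

The guard on `k` reflects the fact, noted in Exercise 2.70, that the rule says nothing useful about the interval within 1 standard deviation.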
2.76
a. From the information given, we have $\bar{x} = 375$ and s = 25. From Chebyshev's Rule, we know that at least three-fourths of the measurements are within the interval: $\bar{x} \pm 2s$, or (325, 425)

Thus, at most one-fourth of the measurements exceed 425. In other words, more than 425 vehicles used the intersection on at most 25% of the days.

b. According to the Empirical Rule, approximately 95% of the measurements are within the interval: $\bar{x} \pm 2s$, or (325, 425)

This leaves approximately 5% of the measurements to lie outside the interval. Because of the symmetry of a mound-shaped distribution, approximately 2.5% of these will lie below 325, and the remaining 2.5% will lie above 425. Thus, on approximately 2.5% of the days, more than 425 vehicles used the intersection.

2.78
a.
Since the sample mean (18.2) is larger than the sample median (15), it indicates that the distribution of years is skewed to the right. In addition, the maximum number of years is 50 and the minimum is 2. If the distribution were symmetric, the mean and median should be about halfway between these two numbers. Halfway between the maximum and minimum values is 26, which is much larger than either the mean or the median.
b.
The standard deviation can be estimated by the range divided by either 4 or 6. For this distribution, the range is: Range = Largest − smallest = 50 − 2 = 48. Dividing the range by 4, we get an estimate of the standard deviation to be 48/4 = 12. Dividing the range by 6, we get an estimate of the standard deviation to be 48/6 = 8. Thus, the standard deviation should be somewhere between 8 and 12. For this problem, the standard deviation is s = 10.64. This value falls in the estimated range of 8 to 12.
c. First, we calculate the number of standard deviations from the mean the value of 40 years is. To do this, we first subtract the mean and then divide by the value of the standard deviation.

Number of standard deviations is $\frac{40 - \bar{x}}{s} = \frac{40 - 18.2}{10.64} = 2.05 \approx 2$

Using Chebyshev's Rule, we know that at most $1/k^2$ or $1/2^2 = 1/4$ of the data will be more than 2 standard deviations from the mean. Thus, this would indicate that at most 25% of the Generation Xers responded with 40 years or more.

Next, we calculate the number of standard deviations from the mean the value of 8 years is.

Number of standard deviations is $\frac{8 - \bar{x}}{s} = \frac{8 - 18.2}{10.64} = -.96 \approx -1$

Using Chebyshev's Rule, we get no information about the data within 1 standard deviation of the mean. However, we know the median (15) is more than 8. By definition, 50% of the data are larger than the median. Thus, at least 50% of the Generation Xers responded with 8 years or more. No additional information can be obtained with the information given.

2.80
a. Using MINITAB, the frequency histogram for the time in bankruptcy is:

[Histogram: Frequency versus Time in Bankrupt]

The Empirical Rule is not applicable because the data are not mound shaped.
b. Using MINITAB, the descriptive measures are:

Descriptive Statistics: Time in Bankrupt

Variable      N     Mean   Median   TrMean    StDev   SE Mean
Time in      49    2.549    1.700    2.333    1.828     0.261

Variable   Minimum   Maximum       Q1       Q3
Time in      1.000    10.100    1.350    3.500
From Chebyshev’s Theorem, we know that at least 75% of the observations will fall within 2 standard deviations of the mean. This interval is: x ± 2 s ⇒ 2.549 ± 2(1.828) ⇒ 2.549 ± 3.656 ⇒ (−1.107, 6.205)
c. There are 47 of the 49 observations within this interval. The percentage would be (47/49) × 100% = 95.9%. This agrees with Chebyshev’s Theorem (at least 75%). It also agrees with the Empirical Rule (approximately 95%).

d. From the above interval we know that about 95% of all firms filing for prepackaged bankruptcy will be in bankruptcy between 0 and 6.2 months. Thus, we would estimate that a firm considering filing for bankruptcy will be in bankruptcy up to 6.2 months.

2.82
a. Since it is given that the distribution is mound-shaped, we can use the Empirical Rule. We know that 1.84% is 2 standard deviations below the mean. The Empirical Rule states that approximately 95% of the observations will fall within 2 standard deviations of the mean and, consequently, approximately 5% will lie outside that interval. Since a mound-shaped distribution is symmetric, then approximately 2.5% of the day's production of batches will fall below 1.84%.

b. If the data are actually mound-shaped, it would be extremely unusual (less than 2.5%) to observe a batch with 1.80% zinc phosphide if the true mean is 2.0%. Thus, if we did observe 1.8%, we would conclude that the mean percent of zinc phosphide in today's production is probably less than 2.0%.

2.84
a.
Since we do not have any idea of the shape of the distribution of SAT-Math score changes, we must use Chebyshev’s Theorem. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval would be: x ± 3s ⇒ 19 ± 3(65) ⇒ 19 ± 195 ⇒ (−176, 214)
Thus, for a randomly selected student, we could be pretty sure that this student’s score would be anywhere from 176 points below his/her previous SAT-Math score to 214 points above his/her previous SAT-Math score.
b.
Since we do not have any idea of the shape of the distribution of SAT-Verbal score changes, we must use Chebyshev’s Theorem. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval would be: x ± 3s ⇒ 7 ± 3(49) ⇒ 7 ± 147 ⇒ (−140, 154)
Thus, for a randomly selected student, we could be pretty sure that this student’s score would be anywhere from 140 points below his/her previous SAT-Verbal score to 154 points above his/her previous SAT-Verbal score.
c. A change of 140 points on the SAT-Math would be a little less than 2 standard deviations from the mean. A change of 140 points on the SAT-Verbal would be a little less than 3 standard deviations from the mean. Since the 140 point change for the SAT-Math is not as big a change as the 140 point change on the SAT-Verbal, it would be most likely that the score was a SAT-Math score.

2.86
a. $z = \frac{x - \bar{x}}{s} = \frac{40 - 30}{5} = 2$ (sample); 2 standard deviations above the mean.

b. $z = \frac{x - \mu}{\sigma} = \frac{90 - 89}{2} = .5$ (population); .5 standard deviations above the mean.

c. $z = \frac{x - \mu}{\sigma} = \frac{50 - 50}{5} = 0$ (population); 0 standard deviations above the mean.

d. $z = \frac{x - \bar{x}}{s} = \frac{20 - 30}{4} = -2.5$ (sample); 2.5 standard deviations below the mean.
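The four z-scores in Exercise 2.86 use the same calculation whether the center and spread are sample or population values. A minimal sketch, with the function name `z_score` chosen here for illustration:

```python
def z_score(x, center, spread):
    """Number of standard deviations that x lies from the center."""
    if spread <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - center) / spread


# (x, mean, standard deviation) for parts a through d.
cases = {"a": (40, 30, 5), "b": (90, 89, 2), "c": (50, 50, 5), "d": (20, 30, 4)}

results = {}
for label, (x, center, spread) in cases.items():
    z = z_score(x, center, spread)
    results[label] = z
    direction = "above" if z >= 0 else "below"
    print(f"{label}. z = {z:g}: {abs(z):g} standard deviations {direction} the mean")
```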
2.88
The 50th percentile of a data set is the observation that has half of the observations less than it. Another name for the 50th percentile is the median.
2.90
Since the element 40 has a z-score of −2 and 90 has a z-score of 3,

$-2 = \frac{40 - \mu}{\sigma} \Rightarrow -2\sigma = 40 - \mu \Rightarrow \mu - 2\sigma = 40 \Rightarrow \mu = 40 + 2\sigma$

and

$3 = \frac{90 - \mu}{\sigma} \Rightarrow 3\sigma = 90 - \mu \Rightarrow \mu + 3\sigma = 90$

By substitution, $40 + 2\sigma + 3\sigma = 90 \Rightarrow 5\sigma = 50 \Rightarrow \sigma = 10$

By substitution, $\mu = 40 + 2(10) = 60$

Therefore, the population mean is 60 and the standard deviation is 10.

2.92
The percentile ranking of the age of 25 years would be 100% − 73.5% = 26.5%.
2.94
a. From Exercise 2.77, $\bar{x} = 94.91$ and s = 4.83. The z-score for an observation of 78 is:

$z = \frac{x - \bar{x}}{s} = \frac{78 - 94.91}{4.83} = -3.50$

This z-score indicates that an observation of 78 is 3.5 standard deviations below the mean. Very few observations will be lower than this one.

b. The z-score for an observation of 98 is:

$z = \frac{x - \bar{x}}{s} = \frac{98 - 94.91}{4.83} = 0.63$

This z-score indicates that an observation of 98 is .63 standard deviations above the mean. This score is not an unusual observation in the data set.

2.96
a. From the problem, μ = 2.7 and σ = .5

$z = \frac{x - \mu}{\sigma} \Rightarrow z\sigma = x - \mu \Rightarrow x = \mu + z\sigma$

For z = 2.0, x = 2.7 + 2.0(.5) = 3.7
For z = −1.0, x = 2.7 − 1.0(.5) = 2.2
For z = .5, x = 2.7 + .5(.5) = 2.95
For z = −2.5, x = 2.7 − 2.5(.5) = 1.45

b. For z = −1.6, x = 2.7 − 1.6(.5) = 1.9
c.
If we assume the distribution of GPAs is approximately mound-shaped, we can use the Empirical Rule. From the Empirical Rule, we know that ≈.025 or ≈2.5% of the students will have GPAs above 3.7 (with z = 2). Thus, the GPA corresponding to summa cum laude (top 2.5%) will be greater than 3.7 (z > 2). We know that ≈.16 or 16% of the students will have GPAs above 3.2 (z = 1). Thus, the limit on GPAs for cum laude (top 16%) will be greater than 3.2 (z > 1). We must assume the distribution is mound-shaped.
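Solving the z-score formula for x gives x = μ + zσ, which can be applied to every z value in this exercise at once. The sketch below assumes the values μ = 2.7 and σ = .5 from the problem; the function name `value_from_z` is hypothetical.

```python
def value_from_z(z, mu, sigma):
    """Invert the z-score formula: x = mu + z * sigma."""
    return mu + z * sigma


mu, sigma = 2.7, 0.5

# z-scores from parts a and b, plus the cutoffs z = 2 and z = 1 used in part c.
gpas = {z: round(value_from_z(z, mu, sigma), 2) for z in (2.0, -1.0, 0.5, -2.5, -1.6, 1.0)}

for z, gpa in gpas.items():
    print(f"z = {z:+.1f}  ->  GPA = {gpa}")
```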
2.98
a.
Since the data are approximately mound-shaped, we can use the Empirical Rule. On the blue exam, the mean is 53% and the standard deviation is 15%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is: x ± s ⇒ 53 ± (15) ⇒ (38, 68)
About 95% of all students will score within 2 standard deviations of the mean. This interval is: x ± 2 s ⇒ 53 ± 2(15) ⇒ 53 ± 30 ⇒ (23, 83)
About 99.7% of all students will score within 3 standard deviations of the mean. This interval is: x ± 3s ⇒ 53 ± 3(15) ⇒ 53 ± 45 ⇒ (8, 98)
b.
Since the data are approximately mound-shaped, we can use the Empirical Rule. On the red exam, the mean is 39% and the standard deviation is 12%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is: x ± s ⇒ 39 ± (12) ⇒ (27, 51)
About 95% of all students will score within 2 standard deviations of the mean. This interval is: x ± 2 s ⇒ 39 ± 2(12) ⇒ 39 ± 24 ⇒ (15, 63)
About 99.7% of all students will score within 3 standard deviations of the mean. This interval is:
$\bar{x} \pm 3s \Rightarrow 39 \pm 3(12) \Rightarrow 39 \pm 36 \Rightarrow (3, 75)$

c. The student would have been more likely to have taken the red exam. For the blue exam, we know that approximately 95% of all scores will be from 23% to 83%. The observed 20% score does not fall in this range. For the red exam, we know that approximately 95% of all scores will be from 15% to 63%. The observed 20% score does fall in this range. Thus, it is more likely that the student would have taken the red exam.

2.100
The 25th percentile, or lower quartile, is the measurement that has 25% of the measurements below it and 75% of the measurements above it. The 50th percentile, or median, is the measurement that has 50% of the measurements below it and 50% of the measurements above it. The 75th percentile, or upper quartile, is the measurement that has 75% of the measurements below it and 25% of the measurements above it.
2.102
a.
Median is approximately 4.
b.
QL is approximately 3 (Lower Quartile) QU is approximately 6 (Upper Quartile)
c. IQR = QU − QL ≈ 6 − 3 = 3

d. The data set is skewed to the right since the right whisker is longer than the left, there is one outlier, and there are two potential outliers.

e. 50% of the measurements are to the right of the median and 75% are to the left of the upper quartile.

f. There are two potential outliers, 12 and 13. There is one outlier, 16.

2.104
a. From the problem, $\bar{x} = 52.33$ and s = 9.22.

The highest salary is 75 (thousand). The z-score is $z = \frac{x - \bar{x}}{s} = \frac{75 - 52.33}{9.22} = 2.46$

Therefore, the highest salary is 2.46 standard deviations above the mean.

The lowest salary is 35.0 (thousand). The z-score is $z = \frac{x - \bar{x}}{s} = \frac{35.0 - 52.33}{9.22} = -1.88$

Therefore, the lowest salary is 1.88 standard deviations below the mean.

The mean salary offer is 52.33 (thousand). The z-score is $z = \frac{x - \bar{x}}{s} = \frac{52.33 - 52.33}{9.22} = 0$

The z-score for the mean salary offer is 0 standard deviations from the mean.

No, the highest salary offer is not unusually high. For any distribution, at least 8/9 of the salaries should have z-scores between −3 and 3. A z-score of 2.46 would not be that unusual.
b.
Using MINITAB, the box plot is:
Since no salaries are outside the inner fences, none of them are potentially faulty observations.

2.106
Using MINITAB, the side-by-side box plots are:

[Side-by-side box plots: AGE versus GROUP (1, 2, 3)]

From the boxplots, there appears to be one outlier in the third group.

2.108
a. First, we will compute the mean and standard deviation.

The sample mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{393}{75} = 5.24$

The sample variance is:

$s^2 = \frac{\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}}{n-1} = \frac{5943 - \frac{393^2}{75}}{75-1} = 52.482$

The standard deviation is: $s = \sqrt{s^2} = \sqrt{52.482} = 7.244$

Since this data set is highly skewed, we will use 2 standard deviations from the mean as the cutoff for outliers. Z-scores with values greater than 2 in absolute value are considered outliers. An observation with a z-score of 2 would have the value:

$z = \frac{x - \bar{x}}{s} \Rightarrow 2 = \frac{x - 5.24}{7.244} \Rightarrow 2(7.244) = x - 5.24 \Rightarrow 14.488 = x - 5.24 \Rightarrow x = 19.728$

An observation with a z-score of −2 would have the value:

$z = \frac{x - \bar{x}}{s} \Rightarrow -2 = \frac{x - 5.24}{7.244} \Rightarrow -2(7.244) = x - 5.24 \Rightarrow -14.488 = x - 5.24 \Rightarrow x = -9.248$

Thus any observation that is greater than 19.728 or less than −9.248 would be considered an outlier. In this data set there would be 4 outliers: 21, 21, 25, 48.

b.
Deleting these 4 outliers, we will recalculate the mean, median, variance, and standard deviation. The median for the original data set is the middle number once they have been arranged in order and is the 38th observation which is 3.

The new mean is:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{278}{71} = 3.92$

The new sample variance is:

$s^2 = \frac{\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}}{n-1} = \frac{2132 - \frac{278^2}{71}}{71-1} = 14.907$

The new standard deviation is: $s = \sqrt{s^2} = \sqrt{14.907} = 3.861$
The new median is the 36th observation once the data have been arranged in order and is 3. In the original data set, the mean is 5.24, the standard deviation is 7.244, and the median is 3. In the revised data set, the mean is 3.92, the standard deviation is 3.861, and the median is 3. The mean has been decreased, the standard deviation has been almost halved, but the median stays the same.
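The summary statistics and the 2-standard-deviation cutoffs in this exercise follow from the quoted sums alone. The sketch below is a rough check, not the manual's procedure; the helper names are assumptions, and the standard deviation is rounded to three decimals to mirror the hand calculation.

```python
import math


def summary_from_sums(n, sum_x, sum_x2):
    """Return (mean, standard deviation) from n, sum of x, and sum of x squared."""
    x_bar = sum_x / n
    s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return x_bar, math.sqrt(s2)


def z_fences(x_bar, s, z_cut):
    """Values whose z-scores equal -z_cut and +z_cut."""
    return (x_bar - z_cut * s, x_bar + z_cut * s)


# Original data: n = 75, sum of x = 393, sum of x^2 = 5943.
x_bar, s = summary_from_sums(75, 393, 5943)
low, high = z_fences(round(x_bar, 2), round(s, 3), 2)

# The four observations the solution lists as outliers.
listed = [21, 21, 25, 48]
flagged = [x for x in listed if x < low or x > high]

# Revised data after deleting the outliers: n = 71, sums 278 and 2132.
new_x_bar, new_s = summary_from_sums(71, 278, 2132)

print("original:", round(x_bar, 2), round(s, 3), (round(low, 3), round(high, 3)))
print("flagged:", flagged)
print("revised:", round(new_x_bar, 2), round(new_s, 3))
```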
2.110
For Perturbed Intrinsics, but no Perturbed Projections:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1.0 + 1.3 + 3.0 + 1.5 + 1.3}{5} = \frac{8.1}{5} = 1.62$

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{15.63 - \frac{8.1^2}{5}}{4} = \frac{2.508}{4} = .627$

$s = \sqrt{s^2} = \sqrt{.627} = .792$

The z-score corresponding to a value of 4.5 is

$z = \frac{x - \bar{x}}{s} = \frac{4.5 - 1.62}{.792} = 3.63$

Since this z-score is greater than 3, we would consider this an outlier for perturbed intrinsics, but no perturbed projections.

For Perturbed Projections, but no Perturbed Intrinsics:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{22.9 + 21.0 + 34.4 + 29.8 + 17.7}{5} = \frac{125.8}{5} = 25.16$

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{3350.1 - \frac{125.8^2}{5}}{4} = \frac{184.972}{4} = 46.243$

$s = \sqrt{s^2} = \sqrt{46.243} = 6.800$

The z-score corresponding to a value of 4.5 is

$z = \frac{x - \bar{x}}{s} = \frac{4.5 - 25.16}{6.800} = -3.038$

Since this z-score is less than −3, we would consider this an outlier for perturbed projections, but no perturbed intrinsics.

Since the z-score corresponding to 4.5 for perturbed projections, but no perturbed intrinsics, is smaller in absolute value than that for perturbed intrinsics, but no perturbed projections, it is more likely that the type of camera perturbation is perturbed projections, but no perturbed intrinsics.
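The comparison of the two z-scores can be reproduced from the raw values listed in the solution. This is a minimal sketch using Python's `statistics` module; the variable and function names are introduced here for illustration. Because it does not round intermediate values, the first z-score may differ from the hand calculation in the last digit.

```python
from statistics import mean, stdev

# Values listed in the solution for each type of camera perturbation.
intrinsics = [1.0, 1.3, 3.0, 1.5, 1.3]
projections = [22.9, 21.0, 34.4, 29.8, 17.7]
observed = 4.5


def z_for(value, sample):
    """z-score of value relative to the sample mean and sample standard deviation."""
    return (value - mean(sample)) / stdev(sample)


z_int = z_for(observed, intrinsics)
z_proj = z_for(observed, projections)

print("perturbed intrinsics: z =", round(z_int, 3))
print("perturbed projections: z =", round(z_proj, 3))

# The smaller |z| identifies the group in which 4.5 is less extreme.
more_likely = "projections" if abs(z_proj) < abs(z_int) else "intrinsics"
print("more likely group:", more_likely)
```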
2.112
Using MINITAB, a scatterplot of the data is:

[Scatterplot: Var2 versus Var1]
2.114
Using MINITAB, the scatterplot of the data is:

[Scatterplot: Lawyers versus Offices]

As the number of offices increases, the number of lawyers also tends to increase.

2.116
a. Using MINITAB, the scatterplot is:

[Scatterplot: 30th versus 10th]

It appears that as the completion time for the 10th trial increases, the completion time for the 30th trial decreases.
b. Using MINITAB, the scatterplot is:

[Scatterplot: 50th versus 10th]

It appears that as the completion time for the 10th trial increases, the completion time for the 50th trial increases.

c. Using MINITAB, the scatterplot is:

[Scatterplot: 50th versus 30th]

It appears that as the completion time for the 30th trial increases, the completion time for the 50th trial increases.
2.118
Using MINITAB, the scatterplot of the data is:

[Scatterplot of Mass vs Time]

There is evidence to indicate that the mass of the spill tends to diminish as time increases. As time is getting larger, the mass is decreasing.

2.120
The mean is sensitive to extreme values in a data set. Therefore, the median is preferred to the mean when a data set is skewed in one direction or the other.
2.122
a.
If we assume that the data are about mound-shaped, then any observation with a z-score greater than 3 in absolute value would be considered an outlier. From Exercise 2.121, the z-score corresponding to 50 is −1, the z-score corresponding to 70 is 1, and the z-score corresponding to 80 is 2. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.
b.
From Exercise 2.121, the z-score corresponding to 50 is −2, the z-score corresponding to 70 is 2, and the z-score corresponding to 80 is 4. Since the z-score corresponding to 80 is greater than 3, 80 would be considered an outlier.
c.
From Exercise 2.121, the z-score corresponding to 50 is 1, the z-score corresponding to 70 is 3, and the z-score corresponding to 80 is 4. Since the z-scores corresponding to 70 and 80 are greater than or equal to 3, 70 and 80 would be considered outliers.
d.
From Exercise 2.121, the z-score corresponding to 50 is .1, the z-score corresponding to 70 is .3, and the z-score corresponding to 80 is .4. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.
2.124
a. $\sum x = 4 + 6 + 6 + 5 + 6 + 7 = 34$

$\sum x^2 = 4^2 + 6^2 + 6^2 + 5^2 + 6^2 + 7^2 = 198$

$\bar{x} = \frac{\sum x}{n} = \frac{34}{6} = 5.67$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{198 - \frac{34^2}{6}}{6-1} = \frac{5.3333}{5} = 1.0667$

$s = \sqrt{1.067} = 1.03$

b. $\sum x = -1 + 4 + (-3) + 0 + (-3) + (-6) = -9$

$\sum x^2 = (-1)^2 + 4^2 + (-3)^2 + 0^2 + (-3)^2 + (-6)^2 = 71$

$\bar{x} = \frac{\sum x}{n} = \frac{-9}{6} = -\$1.5$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{71 - \frac{(-9)^2}{6}}{6-1} = \frac{57.5}{5} = 11.5$ dollars squared

$s = \sqrt{11.5} = \$3.39$

c. $\sum x = \frac{3}{5} + \frac{4}{5} + \frac{2}{5} + \frac{1}{5} + \frac{1}{16} = 2.0625$

$\sum x^2 = \left(\frac{3}{5}\right)^2 + \left(\frac{4}{5}\right)^2 + \left(\frac{2}{5}\right)^2 + \left(\frac{1}{5}\right)^2 + \left(\frac{1}{16}\right)^2 = 1.2039$

$\bar{x} = \frac{\sum x}{n} = \frac{2.0625}{5} = .4125\%$

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{1.2039 - \frac{2.0625^2}{5}}{5-1} = \frac{.3531}{4} = .0883$ % squared

$s = \sqrt{.0883} = .30\%$

d. (a) Range = 7 − 4 = 3

(b) Range = $4 − (−$6) = $10

(c) Range = (4/5)% − (1/16)% = (64/80)% − (5/80)% = (59/80)% = .7375%

2.126
σ ≈ range/4 = 20/4 = 5
2.128
Using MINITAB, a pie chart of the data is:

[Pie chart of defect: false 90.2%, true 9.8%]

A response of ‘true’ means the software contained defective code. Thus, only 9.8% of the modules contained defective software code.

2.130
The z-score would be:

$z = \frac{x - \bar{x}}{s} = \frac{408 - 603.7}{185.4} = -1.06$

Since this value is not very big, this is not an unusual value to observe.

2.132
a. The variable of interest is opinion of book reviews. The values could be ‘would not recommend’, ‘cautious or very little recommendation’, ‘little or no preference’, ‘favorable/recommended’, and ‘outstanding/significant contribution’. Since these responses are not numerical, the variable is qualitative.

b. Most of the books (63%) received a "favorable/recommended" review. About the same percentage of books received the following reviews: "cautious or very little recommendation" (10%), "little or no preference" (9%), and "outstanding/significant contribution" (12%). Only 5% of the books received "would not recommend" reviews.

c. If the top two categories are added together, the percent recommended is 75% (actually slightly higher than 75%). This agrees with the study.

2.134
a.
To display the status, we use a pie chart. From the pie chart, we see that 58% of the Beanie babies are retired and 42% are current.
b. Using Minitab, a histogram of the values is:

Most (40 of 50) Beanie babies have values less than $100. Of the remaining 10, 5 have values between $100 and $300, 1 has a value between $300 and $500, 1 has a value between $500 and $700, 2 have values between $700 and $900, and 1 has a value between $1900 and $2100.

c. A plot of the value versus the age of the Beanie Baby is as follows:

From the plot, it appears that as the age increases, the value tends to increase.

2.136
a. Using MINITAB, the stem-and-leaf display is:

Stem-and-leaf of C1   N = 46
Leaf Unit = 0.10

   4    0  3444
 (25)   0  55555555566666667777788889
  16    1  000011222334
   4    1  77
   2    2
   2    2
   2    3
   2    3  9
   1    4
   1    4  7

b. The leaves that represent those brands that carry the American Dental Association seal are circled above.

c. It appears that the cost of the brands approved by the ADA tend to have the lower costs. Thirteen of the twenty brands approved by the ADA, or (13/20) × 100% = 65% are less than the median cost.

2.138
a.
Using MINITAB, the summary statistics are:

Descriptive Statistics: Marketing, Engineering, Accounting, Total

Variable     N     Mean   Median   TrMean    StDev   SE Mean
Marketin    50    4.766    5.400    4.732    2.584     0.365
Engineer    50    5.044    4.500    4.798    3.835     0.542
Accounti    50    3.652    0.800    2.548    6.256     0.885
Total       50   13.462   13.750   13.043    6.820     0.965

Variable   Minimum   Maximum       Q1       Q3
Marketin     0.100    11.000    2.825    6.250
Engineer     0.400    14.400    1.775    7.225
Accounti     0.100    30.000    0.200    3.725
Total        1.800    36.200    8.075   16.600

b. The z-scores corresponding to the maximum time guidelines developed for each department and the total are as follows:

Marketing: $z = \frac{x - \bar{x}}{s} = \frac{6.5 - 4.77}{2.58} = .67$

Engineering: $z = \frac{x - \bar{x}}{s} = \frac{7.0 - 5.04}{3.84} = .51$

Accounting: $z = \frac{x - \bar{x}}{s} = \frac{8.5 - 3.65}{6.26} = .77$

Total: $z = \frac{x - \bar{x}}{s} = \frac{17 - 13.46}{6.82} = .52$

c. To find the maximum processing time corresponding to a z-score of 3, we substitute the values of z, $\bar{x}$, and s into the z formula and solve for x.

$z = \frac{x - \bar{x}}{s} \Rightarrow x - \bar{x} = zs \Rightarrow x = \bar{x} + zs$
Marketing: x = 4.77 + 3(2.58) = 4.77 + 7.74 = 12.51. None of the orders exceed this time.

Engineering: x = 5.04 + 3(3.84) = 5.04 + 11.52 = 16.56. None of the orders exceed this time.

These both agree with both the Empirical Rule and Chebyshev's Rule.

Accounting: x = 3.65 + 3(6.26) = 3.65 + 18.78 = 22.43. One of the orders exceeds this time or 1/50 = .02.

Total: x = 13.46 + 3(6.82) = 13.46 + 20.46 = 33.92. One of the orders exceeds this time or 1/50 = .02.

These both agree with Chebyshev's Rule but not the Empirical Rule. Both of these last two distributions are skewed to the right.

d. Marketing: x = 4.77 + 2(2.58) = 4.77 + 5.16 = 9.93. Two of the orders exceed this time or 2/50 = .04.

Engineering: x = 5.04 + 2(3.84) = 5.04 + 7.68 = 12.72. Two of the orders exceed this time or 2/50 = .04.

Accounting: x = 3.65 + 2(6.26) = 3.65 + 12.52 = 16.17. Three of the orders exceed this time or 3/50 = .06.

Total: x = 13.46 + 2(6.82) = 13.46 + 13.64 = 27.10. Two of the orders exceed this time or 2/50 = .04.

All of these agree with Chebyshev's Rule but not the Empirical Rule.

e.
No observations exceed the guideline of 3 standard deviations for both Marketing and Engineering. One observation exceeds the guideline of 3 standard deviations for both Accounting (#23, time = 30.0 days) and Total (#23, time = 36.2 days). Therefore, only (1/10) × 100% = 10% of the "lost" quotes have times exceeding at least one of the 3 standard deviation guidelines. Two observations exceed the guideline of 2 standard deviations for both Marketing (#31, time = 11.0 days and #48, time = 10.0 days) and Engineering (#4, time = 13.0 days and #49, time = 14.4 days). Three observations exceed the guideline of 2 standard deviations for Accounting (#20, time = 22.0 days; #23, time = 30.0 days; and #36, time = 18.2 days). Two observations exceed the guideline of 2 standard deviations for Total (#20, time = 30.2 days and #23, time = 36.2 days). Therefore, (7/10) × 100% = 70% of the "lost" quotes have times exceeding at least one of the 2 standard deviation guidelines. We would recommend the 2 standard deviation guideline since it covers 70% of the lost quotes, while having very few other quotes exceed the guidelines.
2.140
a. First, construct a relative frequency distribution for the departments.

Class   Department       Frequency   Relative Frequency
1       Production       13          .241
2       Maintenance      31          .574
3       Sales             3          .056
4       R&D               2          .037
5       Administration    5          .093
        TOTAL            54         1.001

The Pareto diagram is:

From the diagram, it is evident that the departments with the worst safety record are Maintenance and Production.
b. First, construct a relative frequency distribution for the type of injury in the maintenance department.

Class   Injury         Frequency   Relative Frequency
1       Burn            6          .194
2       Back strain     5          .161
3       Eye damage      2          .065
4       Cuts           10          .323
5       Broken arm      2          .065
6       Broken leg      1          .032
7       Concussion      3          .097
8       Hearing loss    2          .065
        TOTAL          31         1.002

The Pareto diagram is:

From the Pareto diagram, it is evident that cuts is the most prevalent type of injury. Burns and back strain are the next most prevalent types of injuries.
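The relative frequencies and the Pareto ordering for the maintenance department can be reproduced from the frequency column. The following sketch assumes the counts in the table above; the function name `relative_frequencies` is introduced for illustration.

```python
# Injury counts for the maintenance department, taken from the table.
injuries = {
    "Burn": 6,
    "Back strain": 5,
    "Eye damage": 2,
    "Cuts": 10,
    "Broken arm": 2,
    "Broken leg": 1,
    "Concussion": 3,
    "Hearing loss": 2,
}


def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


rel = relative_frequencies(injuries)

# A Pareto diagram orders the classes from most to least frequent.
pareto_order = sorted(injuries, key=injuries.get, reverse=True)

for name in pareto_order:
    print(f"{name:<13}{injuries[name]:>3}{rel[name]:>7.3f}")
```

The unrounded relative frequencies sum to exactly 1; the totals of 1.001 and 1.002 in the tables come from rounding each entry to three decimals.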
2.142
a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: MPG

Variable     N     Mean   Median   TrMean    StDev   SE Mean
MPG         36   40.056   40.000   40.063    2.177     0.363

Variable   Minimum   Maximum       Q1       Q3
MPG         35.000    45.000   39.000   41.000

The mean is 40.056 and the standard deviation is 2.177. Both of these measures are measured in the same units as the original data, which are miles per gallon.

b.
Since the sample mean is a good estimate of the population mean, the manufacturer should be satisfied. The sample mean is 40.056 which is greater than 40.
c.
The range of the data set is 45 − 35 = 10. Using Chebyshev's Rule, the range should cover approximately 6 standard deviations. Thus, a good estimate of the standard deviation would be 10/6 = 1.67. Using the Empirical Rule, the range should cover approximately 4 standard deviations. Thus, a good estimate of the standard deviation would be 10/4 = 2.5. The given standard deviation is 2.177, which is between these two estimates. Thus, it is a reasonable value.
d.
Using MINITAB, the frequency histogram is (the relative frequency histogram would have the same shape):
[Histogram: Frequency (0 to 9) versus MPG (35 to 45)]
Yes, the data appear to be mound-shaped.
e.
Because the data are mound-shaped, we can use the Empirical Rule. We would expect approximately 68% of the data within the interval x̄ ± s, approximately 95% of the data within the interval x̄ ± 2s, and approximately all of the data within the interval x̄ ± 3s.
f.
The interval x̄ ± s is 40.056 ± 2.177 or (37.879, 42.233). Twenty-seven of the observations fall in this interval or 27/36 = .75 or 75%. This number is a little larger than 68%. The interval x̄ ± 2s is 40.056 ± 2(2.177) or (35.702, 44.410). Thirty-four of the observations fall in this interval or 34/36 = .94 or 94%. This number is very close to 95%. The interval x̄ ± 3s is 40.056 ± 3(2.177) or (33.525, 46.587). Thirty-six of the observations fall in this interval or 36/36 = 1.00 or 100%. This number is the same as all of the observations.
2.144
a.
Both the height and width of the bars (peanuts) change. Thus, some readers may tend to equate the area of the peanuts with the frequency for each year.
b.
The frequency bar chart is:
The Kentucky Milk Case
(To accompany Chapters 1–2)
There are many things that could be included in a report about the possibility of collusion. I have concentrated on the incumbency rates, bid levels and dispersion, and average winning bids. With the data available, no comparison of market share can be made since there was so much missing data. Actually, with the data available, the exact analysis cannot be made, since only the winning bid information is provided. Thus, we have no idea what the losing bids were. I will present what I think is a reasonable solution. This is by no means the only solution to the case. Many other presentations could also be used.
Incumbency Rates
The incumbency rate is the percent of the school districts that are won by the same vendor who won the previous year. A table containing the incumbency rates is included as well as a plot. Notice in the plot that the incumbency rate in the Tri-county market is higher than that in the Surrounding market. From 1985 through 1988, the incumbency rate for the Tri-county market was never lower than .923, while in the same period in the Surrounding market, the incumbency rate was never higher than .730. This implies the possibility of collusion in the Tri-county market.
        Surrounding Market                   Tri-county Market
        Number of   Same      Incumbency    Number of   Same      Incumbency
Year    Districts   Vendors   Rate          Districts   Vendors   Rate
1984       26         16       .615            10          8        .800
1985       27         19       .704            12         12       1.000
1986       32         19       .594            13         13       1.000
1987       37         27       .730            13         12        .923
1988       37         25       .676            13         13       1.000
1989       37         23       .622            13          9        .692
1990       34         24       .706            13         10        .769
1991        5          3       .600            13         11        .846
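The incumbency rates in the table are the number of districts won by the same vendor divided by the number of districts. A minimal Python sketch of that calculation follows. It assumes the counts are exactly as tabulated, and the tuple layout and names are illustrative only.

```python
# (year, surrounding districts, surrounding same-vendor wins,
#        tri-county districts, tri-county same-vendor wins), copied from the table.
rows = [
    (1984, 26, 16, 10, 8),
    (1985, 27, 19, 12, 12),
    (1986, 32, 19, 13, 13),
    (1987, 37, 27, 13, 12),
    (1988, 37, 25, 13, 13),
    (1989, 37, 23, 13, 9),
    (1990, 34, 24, 13, 10),
    (1991, 5, 3, 13, 11),
]

def incumbency_rate(same_vendor, districts):
    """Fraction of districts won by the previous year's winner."""
    return same_vendor / districts

surrounding = {yr: incumbency_rate(s_same, s_n) for yr, s_n, s_same, _, _ in rows}
tri_county = {yr: incumbency_rate(t_same, t_n) for yr, _, _, t_n, t_same in rows}

# Extremes over 1985-1988, the period discussed in the text.
period = range(1985, 1989)
lowest_tri = min(tri_county[yr] for yr in period)
highest_sur = max(surrounding[yr] for yr in period)
print(round(lowest_tri, 3), round(highest_sur, 3))
```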
The plot of the incumbency rates is:
Bid Levels and Dispersion
Since we only have access to the winning bids in each of the school districts, we cannot make a true analysis of the bid levels and dispersions. As a compromise, I have used the winning bids of the two dairies in question, Trauth and Meyer. I have looked at only the winning bids of these two dairies in both the Tri-county market and in the Surrounding market. If there was no collusion, then the winning bids and the dispersions of the winning bids should be similar in the two markets for the two dairies. I looked at the box plots of the winning bids of the two dairies in each market for each type of milk: whole white, lowfat white and lowfat chocolate. I have included only a few of the box plots as illustrations. Those included are for 1985 and 1986.
1985 Winning Bids:
OBS  MARKET  WINNER  WHOLE WHITE  LOWFAT WHITE  LOWFAT CHOCOLATE
  1  SUR     MEYER     0.1280       0.1250        0.1315
  2  SUR     TRAUTH    0.1200       0.1110        0.1090
  3  SUR     TRAUTH    .            0.1079        0.1079
  4  SUR     TRAUTH    .            0.1190        0.1210
  5  SUR     MEYER     0.1225       0.1130        0.1099
  6  SUR     TRAUTH    0.1230       0.1130        0.1120
  7  SUR     MEYER     0.1250       0.1145        0.1140
  8  TRI     TRAUTH    0.1440       0.1440        .
  9  TRI     TRAUTH    0.1450       0.1350        .
 10  TRI     MEYER     0.1410       0.1410        0.1410
 11  TRI     TRAUTH    0.1393       0.1393        .
 12  TRI     MEYER     0.1340       0.1340        0.1340
 13  TRI     MEYER     0.1445       0.1345        0.1395
 14  TRI     MEYER     .            0.1345        .
 15  TRI     TRAUTH    0.1449       0.1349        0.1399
 16  TRI     TRAUTH    .            0.1299        0.1299
 17  TRI     MEYER     0.1480       0.1480        0.1480
 18  TRI     TRAUTH    0.1310       0.1290        .
 19  TRI     MEYER     .            0.1380        .
 20  TRI     TRAUTH    0.1435       0.1335        .
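As a rough numerical companion to the box plots, the 1985 lowfat white bids (the one column with no missing values) can be summarised by market. The sketch below uses only Python's standard `statistics` module. The list names are illustrative, and the values are copied from the table above.

```python
from statistics import mean, median, stdev

# 1985 lowfat white winning bids, copied from the table above (OBS 1-7 and 8-20).
sur_lfw = [0.1250, 0.1110, 0.1079, 0.1190, 0.1130, 0.1130, 0.1145]
tri_lfw = [0.1440, 0.1350, 0.1410, 0.1393, 0.1340, 0.1345, 0.1345,
           0.1349, 0.1299, 0.1480, 0.1290, 0.1380, 0.1335]

summary = {
    "Surrounding": (mean(sur_lfw), median(sur_lfw), stdev(sur_lfw)),
    "Tri-county": (mean(tri_lfw), median(tri_lfw), stdev(tri_lfw)),
}

for market, (avg, med, sd) in summary.items():
    print(f"{market:12s} mean={avg:.4f} median={med:.4f} stdev={sd:.4f}")
```

The same pattern extends to the other two milk types once the missing bids (shown as ".") are dropped from each list.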
Box Plots for Whole White Milk, 1985 [figure: WWBID by MARKET, Surrounding versus Tri-county]
Box Plots for Lowfat White Milk, 1985 [figure: LFWBID by MARKET, Surrounding versus Tri-county]
Box Plots for Lowfat Chocolate Milk, 1985 [figure: LFCBID by MARKET, Surrounding versus Tri-county]
For each type of milk, the mean and median winning bids for the Tri-county market were higher than the corresponding winning bids in the Surrounding market. Also, the dispersion, indicated by the width of the boxes and the length of the whiskers, for the Surrounding market is larger than for the Tri-county market in most cases. This is indicative of collusion in the Tri-county market. This same pattern also existed in 1986.
1986 Winning Bids:
OBS  MARKET  WINNER  WHOLE WHITE  LOWFAT WHITE  LOWFAT CHOCOLATE
  1  SUR     TRAUTH    0.1195       0.1100        0.1085
  2  SUR     TRAUTH    0.1330       0.1240        0.1290
  3  SUR     TRAUTH    0.1140       0.1070        0.1050
  4  SUR     MEYER     0.1350       0.1250        0.1315
  5  SUR     TRAUTH    0.1224       0.1124        0.1110
  6  SUR     TRAUTH    .            0.1110        0.1110
  7  SUR     TRAUTH    .            0.1180        0.1200
  8  SUR     TRAUTH    0.1250       0.1125        0.1115
  9  TRI     TRAUTH    0.1475       0.1475        .
 10  TRI     TRAUTH    0.1469       0.1369        .
 11  TRI     MEYER     0.1440       0.1340        0.1395
 12  TRI     TRAUTH    0.1420       0.1420        .
 13  TRI     MEYER     0.1390       0.1390        0.1390
 14  TRI     MEYER     0.1470       0.1370        0.1420
 15  TRI     MEYER     .            0.1380        .
 16  TRI     TRAUTH    0.1474       0.1374        0.1424
 17  TRI     TRAUTH    .            0.1349        0.1349
 18  TRI     MEYER     0.1505       0.1505        0.1505
 19  TRI     TRAUTH    0.1360       0.1320        .
 20  TRI     MEYER     .            0.1430        .
 21  TRI     TRAUTH    0.1460       0.1360        .
Box Plots for Whole White Milk, 1986 [figure: WWBID by MARKET, Surrounding versus Tri-county]
Box Plots for Lowfat White Milk, 1986 [figure: LFWBID by MARKET, Surrounding versus Tri-county]
Box Plots for Lowfat Chocolate Milk, 1986 [figure: LFCBID by MARKET, Surrounding versus Tri-county]
The same pattern that existed for 1985 and 1986 also existed in 1984, 1987, and 1988. From 1989 on, the pattern no longer existed. Thus, from the plots, it appears that the two dairies were working together from 1984 through 1988 in the Tri-county market. I also plotted the mean winning bids for the two dairies in each of the two markets from 1984 through 1991 for each type of milk. In all three plots, the mean winning bid in 1983 was almost the same in the two markets. Then, in 1984, the mean winning bid in the Tri-county market was higher than in the Surrounding market for all three types of milk. This trend holds basically through 1988 (the lowfat white milk mean winning bid for the Surrounding market was greater than the mean winning bid in the Tri-county market in 1988). After 1988, the mean winning bids in the two markets are almost the same. This points to collusion in the Tri-county market from 1984 through 1988.
The dispersion, measured using the standard deviation, of the winning bids for each of the three types of milk was basically smaller in the Tri-county market than in the Surrounding market for the years 1985 through 1988. Again, after 1988 this pattern no longer existed. Again, this points to collusion between the two dairies in the Tri-county market during the years 1984 through 1988.
Chapter 3
Probability
3.2
a.
This is a Venn Diagram.
b.
If the sample points are equally likely, then P(1) = P(2) = P(3) = ⋅⋅⋅ = P(10) = 1/10
Therefore,
P(A) = P(4) + P(5) + P(6) = 1/10 + 1/10 + 1/10 = 3/10 = .3
P(B) = P(6) + P(7) = 1/10 + 1/10 = 2/10 = .2
c.
P(A) = P(4) + P(5) + P(6) = 1/20 + 1/20 + 3/20 = 5/20 = .25
P(B) = P(6) + P(7) = 3/20 + 3/20 = 6/20 = .3
3.4
a.
(9 choose 4) = 9! / [4!(9 − 4)!] = (9 ⋅ 8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / [(4 ⋅ 3 ⋅ 2 ⋅ 1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)] = 126
b.
(7 choose 2) = 7! / [2!(7 − 2)!] = (7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / [(2 ⋅ 1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)] = 21
c.
(4 choose 4) = 4! / [4!(4 − 4)!] = (4 ⋅ 3 ⋅ 2 ⋅ 1) / [(4 ⋅ 3 ⋅ 2 ⋅ 1)(1)] = 1
d.
(5 choose 0) = 5! / [0!(5 − 0)!] = (5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / [(1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)] = 1
e.
(6 choose 5) = 6! / [5!(6 − 5)!] = (6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / [(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)(1)] = 6
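These five combination counts can be checked mechanically. The sketch below compares Python's built-in `math.comb` with the factorial formula n!/[r!(n − r)!] used above. The helper name `n_choose_r` is an illustrative choice.

```python
from math import comb, factorial

def n_choose_r(n, r):
    """Number of combinations from the factorial formula n! / (r!(n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

# (n, r) pairs for parts a through e.
parts = {"a": (9, 4), "b": (7, 2), "c": (4, 4), "d": (5, 0), "e": (6, 5)}

answers = {label: n_choose_r(n, r) for label, (n, r) in parts.items()}

for label, (n, r) in parts.items():
    # math.comb should agree with the factorial formula in every case.
    print(label, n, r, answers[label], comb(n, r))
```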
3.6
a.
The 36 sample points are: 1,1 1,2 1,3 1,4 1,5 1,6 2,1 2,2 2,3 2,4 2,5 2,6 3,1 3,2 3,3 3,4 3,5 3,6 4,1 4,2 4,3 4,4 4,5 4,6 5,1 5,2 5,3 5,4 5,5 5,6 6,1 6,2 6,3 6,4 6,5 6,6
b.
If the dice are fair, then each of the sample points is equally likely. Each would have a probability of 1/36 of occurring.
c.
There is one sample point in A: 3,3. Thus, P(A) = 1/36.
There are 6 sample points in B: 1,6 2,5 3,4 4,3 5,2 and 6,1. Thus, P(B) = 6/36 = 1/6.
There are 18 sample points in C: 1,1 1,3 1,5 2,2 2,4 2,6 3,1 3,3 3,5 4,2 4,4 4,6 5,1 5,3 5,5 6,2 6,4 and 6,6. Thus, P(C) = 18/36 = 1/2.
3.8
Each student will obtain slightly different proportions. However, the proportions should be close to P(A) = 1/10, P(B) = 6/10 and P(C) = 3/10.
3.10
Define the following event:
B: {Postal worker was assaulted on the job in the past year}
P(B) = 600/12,000 = .05
3.12
a.
The 5 sample points are: Total population, Agricultural change, Presence of industry, Growth, and Population concentration.
b.
The probabilities are best estimated with the sample proportions. Thus, P(Total population) = .18 P(Agricultural change) = .05 P(Presence of industry) = .27 P(Growth) = .05 P(Population concentration) = .45
c.
Define the following event: A: {Factor specified is population-related} P(A) = P(Total population) + P(Growth) + P(Population concentration) = .18 + .05 + .45 = .68.
3.14
a.
The sample points of this experiment correspond to each of the 8 possible types of commodities. Suppose we introduce notation to make the listing of the sample points easier.
A: {carload contains agricultural products}
CH: {carload contains chemicals}
CO: {carload contains coal}
F: {carload contains forest products}
MO: {carload contains metallic ores and minerals}
MV: {carload contains motor vehicles and equipment}
N: {carload contains nonmetallic minerals and products}
O: {carload contains other}
The eight sample points are: A CH CO F MO MV N O
b.
The probability of each sample point is found by dividing the number of carloads for each sample point by the total number of carloads. The probabilities are:
P(A) = 41,690 / 335,770 = .124
P(CH) = 38,331 / 335,770 = .114
P(CO) = 124,595 / 335,770 = .371
P(F) = 21,929 / 335,770 = .065
P(MO) = 34,521 / 335,770 = .103
P(MV) = 22,906 / 335,770 = .068
P(N) = 37,416 / 335,770 = .111
P(O) = 14,382 / 335,770 = .043
c.
P(MV) = .068
P(nonagricultural products) = P(CH) + P(CO) + P(F) + P(MO) + P(MV) + P(N) + P(O) = .114 + .371 + .065 + .103 + .068 + .111 + .043 = .875
d.
P(CH) + P(CO) = .114 + .371 = .485
e.
Since there were 335,770 carloads that week, the probability of selecting any one in particular would be 1 / 335,770 = .00000298. Thus, the probability of selecting the carload with the serial number 1003642 is .00000298.
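The probabilities in parts b through e are sample proportions, so they can be recomputed directly from the carload counts quoted above. The following Python sketch assumes those counts. The dictionary keys reuse the event labels from part a, and the other names are illustrative.

```python
# Carloads per commodity type, copied from part b.
carloads = {
    "A": 41_690, "CH": 38_331, "CO": 124_595, "F": 21_929,
    "MO": 34_521, "MV": 22_906, "N": 37_416, "O": 14_382,
}

total = sum(carloads.values())

# Probability of each sample point = its share of all carloads.
probs = {label: n / total for label, n in carloads.items()}

p_nonagricultural = 1 - probs["A"]            # complement of A
p_chem_or_coal = probs["CH"] + probs["CO"]    # part d
p_single_carload = 1 / total                  # part e

for label, p in probs.items():
    print(f"P({label}) = {p:.3f}")
print(f"{p_nonagricultural:.3f} {p_chem_or_coal:.3f} {p_single_carload:.8f}")
```

Working from unrounded proportions, the nonagricultural probability comes out near .876. The .875 in part c arises because already-rounded values are added.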
3.16
a.
Since order does not matter, the number of different bets would be a combination of 8 things taken 2 at a time. The number of ways would be
(8 choose 2) = 8! / [2!(8 − 2)!] = (8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / [(2 ⋅ 1)(6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)] = 40,320 / 1,440 = 28
b.
If all players are of equal ability, then each of the 28 sample points would be equally likely. Each would have a probability of occurring of 1/28. There is only one sample point with values 2 and 7. Thus, the probability of winning with a bet of 2-7 would be 1/28 or .0357.
3.18
a.
Let I = Infiniti 1435, TP = Toyota Prius, and C = Chevrolet Corvette. All possible rankings are as follows, where the first dealer listed is ranked first, the second dealer listed is ranked second, and the third dealer listed is ranked third:
I,TP,C    I,C,TP    C,I,TP    C,TP,I    TP,I,C    TP,C,I
b.
If each set of rankings is equally likely, then each has a probability of 1/6.
The probability that the Toyota Prius is ranked first = P(TP,I,C) + P(TP,C,I) = 1/6 + 1/6 = 2/6 = 1/3.
The probability that the Infiniti 1435 is ranked third = P(C,TP,I) + P(TP,C,I) = 1/6 + 1/6 = 2/6 = 1/3.
The probability that the Toyota Prius is ranked first and the Chevrolet Corvette is ranked second = P(TP,C,I) = 1/6.
3.20
First, we need to compute the total number of ways we can select 2 bullets (pair) from 1,837 bullets. This is a combination of 1,837 things taken 2 at a time. The number of pairs is:
(1,837 choose 2) = 1,837! / [2!(1,837 − 2)!] = (1,837 ⋅ 1,836 ⋅ ⋅⋅⋅ ⋅ 1) / [(2 ⋅ 1)(1,835 ⋅ 1,834 ⋅ ⋅⋅⋅ ⋅ 1)] = (1,837 ⋅ 1,836) / 2 = 1,686,366
The probability of a false positive is the number of false positives divided by the number of pairs and is:
P(false positive) = # false positives / # pairs = 693 / 1,686,366 = .0004
This probability is very small. There would be only about 4 false positives out of every 10,000. I would have confidence in the FBI’s forensic evidence.
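The pair count and the false-positive rate are easy to reproduce. This short Python check takes the 1,837 bullets and 693 false positives from the solution above. The variable names are illustrative.

```python
from math import comb

n_bullets = 1_837
n_false_positives = 693

# Number of unordered pairs of bullets that can be compared.
n_pairs = comb(n_bullets, 2)

# Relative frequency of false positives among all pairs.
p_false_positive = n_false_positives / n_pairs

print(n_pairs, round(p_false_positive, 4))
```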
3.22
a.
P(Bc) = 1 − P(B) = 1 − .7 = .3
b.
P(Ac) = 1 − P(A) = 1 − .4 = .6
c.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .4 + .7 − .3 = .8
3.24
The experiment consists of rolling a pair of fair dice. The sample points are:
1, 1   1, 2   1, 3   1, 4   1, 5   1, 6
2, 1   2, 2   2, 3   2, 4   2, 5   2, 6
3, 1   3, 2   3, 3   3, 4   3, 5   3, 6
4, 1   4, 2   4, 3   4, 4   4, 5   4, 6
5, 1   5, 2   5, 3   5, 4   5, 5   5, 6
6, 1   6, 2   6, 3   6, 4   6, 5   6, 6
Since each die is fair, each sample point is equally likely. The probability of each sample point is 1/36.
a.
A: {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
B: {(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (4, 1), (4, 2), (4, 3), (4, 5), (4, 6)}
A ∩ B: {(3, 4), (4, 3)}
A ∪ B: {(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (4, 1), (4, 2), (4, 3), (4, 5), (4, 6), (1, 6), (2, 5), (5, 2), (6, 1)}
Ac: {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3), (2, 4), (2, 6), (3, 1), (3, 2), (3, 3), (3, 5), (3, 6), (4, 1), (4, 2), (4, 4), (4, 5), (4, 6), (5, 1), (5, 3), (5, 4), (5, 5), (5, 6), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
b.
P(A) = 6(1/36) = 6/36 = 1/6
P(B) = 11(1/36) = 11/36
P(A ∩ B) = 2(1/36) = 2/36 = 1/18
P(A ∪ B) = 15(1/36) = 15/36 = 5/12
P(Ac) = 30(1/36) = 30/36 = 5/6
c.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/6 + 11/36 − 1/18 = (6 + 11 − 2)/36 = 15/36 = 5/12
d.
A and B are not mutually exclusive. To be mutually exclusive, P(A ∩ B) must be 0. Here, P(A ∩ B) = 1/18.
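Parts a through d can be verified by enumerating the 36 sample points. In the sketch below, the event definitions (A: the sum is 7; B: at least one die shows a 4) are inferred from the listed sample points rather than quoted from the exercise, so treat them as assumptions. Exact fractions are used so the results can be compared with the answers above.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two fair dice.
space = set(product(range(1, 7), repeat=2))

# Assumed event definitions, inferred from the sample points listed in part a.
A = {pt for pt in space if sum(pt) == 7}   # the two dice sum to 7
B = {pt for pt in space if 4 in pt}        # at least one die shows a 4

def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(space))

p_union_direct = prob(A | B)                          # count points in A ∪ B
p_union_additive = prob(A) + prob(B) - prob(A & B)    # additive rule

print(prob(A), prob(B), prob(A & B), p_union_direct, prob(space - A))
```

Because `A & B` is non-empty, the two events are not mutually exclusive, which matches part d.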
3.26
a.
P(Ac) = P(E3) + P(E6) = .2 + .3 = .5
b.
P(Bc) = P(E1) + P(E7) = .10 + .06 = .16
c.
P(Ac ∩ B) = P(E3) + P(E6) = .2 + .3 = .5
d.
P(A ∪ B) = P(E1) + P(E2) + P(E3) + P(E4) + P(E5) + P(E6) + P(E7) = .10 + .05 + .20 + .20 + .06 + .30 + .06 = .97
e.
P(A ∩ B) = P(E2) + P(E4) + P(E5) = .05 + .20 + .06 = .31
f.
P(Ac ∪ Bc) = P(E1) + P(E7) + P(E3) + P(E6) = .10 + .06 + .20 + .30 = .66
g.
No. A and B are mutually exclusive if P(A ∩ B) = 0. Here, P(A ∩ B) = .31.
3.28
a.
The outcome "On" and "High" is A ∩ D.
b.
The outcome "Low" or "Medium" is Dc.
3.30
Define the following events:
A: {problems with absenteeism}
T: {problems with turnover}
From the problem, P(A) = .55, P(T) = .41, and P(A ∩ T) = .22
P(problems with either absenteeism or turnover) = P(A ∪ T) = P(A) + P(T) − P(A ∩ T) = .55 + .41 − .22 = .74
3.32
a.
The event A ∩ B is the event the outcome is black and odd. The event is A ∩ B: {11, 13, 15, 17, 29, 31, 33, 35}
b.
The event A ∪ B is the event the outcome is black or odd or both. The event A ∪ B is {2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35, 1, 3, 5, 7, 9, 19, 21, 23, 25, 27}
c.
Assuming all events are equally likely, each has a probability of 1/38.
P(A) = 18(1/38) = 18/38 = 9/19
P(B) = 18(1/38) = 18/38 = 9/19
P(A ∩ B) = 8(1/38) = 8/38 = 4/19
P(A ∪ B) = 28(1/38) = 28/38 = 14/19
P(C) = 18(1/38) = 18/38 = 9/19
d.
The event A ∩ B ∩ C is the event the outcome is odd and black and low. The event A ∩ B ∩ C is {11, 13, 15, 17}.
e.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 9/19 + 9/19 − 4/19 = 14/19
f.
P(A ∩ B ∩ C) = 4(1/38) = 4/38 = 2/19
g.
The event A ∪ B ∪ C is the event the outcome is odd or black or low. The event A ∪ B ∪ C is:
{1, 2, 3, ... , 29, 31, 33, 35} or {All sample points except 00, 0, 30, 32, 34, 36}
h.
P(A ∪ B ∪ C) = 32(1/38) = 32/38 = 16/19
3.34
a.
P ∩ S ∩ A
Products 6 and 7 are contained in this intersection.
b.
P(possess all the desired characteristics) = P(P ∩ S ∩ A) = P(6) + P(7) = 1/10 + 1/10 = 1/5
c.
A ∪ S
P(A ∪ S) = P(2) + P(3) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10) = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 = 8/10 = 4/5
d.
P ∩ S
P(P ∩ S) = P(2) + P(6) + P(7) = 1/10 + 1/10 + 1/10 = 3/10
3.36
First, convert the percentages in the table to probabilities by dividing the percent by 100%.
a.
P(A) = .259 + .169 + .115 = .543
P(B) = .003
P(C) = .037 + .078 + .016 + .002 + .047 + .027 = .207
P(D) = .414
b.
P(A ∩ D) = .156 + .094 + .043 = .293
P(A ∪ D) = P(A) + P(D) − P(A ∩ D) = .543 + .414 − .293 = .664
c.
Ac: {The worker is under 40}
Bc: {The worker is 20 or older or is not part-time}
Dc: {The worker is not part-time}
d.
P(Ac) = 1 − P(A) = 1 − .543 = .457
P(Bc) = 1 − P(B) = 1 − .003 = .997
P(Dc) = 1 − P(D) = 1 − .414 = .586
3.38
Define the following events:
A: {Wheelchair user had an injurious fall}
B: {Wheelchair user had all five features installed in the home}
C: {Wheelchair user had no falls}
D: {Wheelchair user had none of the features installed in the home}
a.
P(A) = 48/306 = .157
b.
P(B) = 9/306 = .029
c.
P(C ∩ D) = 89/306 = .291
3.40
There are a total of 6 × 6 × 6 = 216 possible outcomes from throwing 3 fair dice. To help demonstrate this, suppose the three dice are different colors: red, blue and green. When we roll these dice, we will record the outcome of the red die first, the blue die second, and the green die third. Thus, there are 6 possible outcomes for the first position, 6 for the second, and 6 for the third. This leads to the 216 possible outcomes.
The Grand Duke argued that the chance of getting a sum of 9 and the chance of getting a sum of 10 should be the same since the number of partitions for 9 and 10 is the same. These partitions are:

Sum of 9    Sum of 10
126         136
135         145
144         226
225         235
234         244
333         334

In each case, there are 6 partitions. However, if we take into account the three colors of the dice, then there are various ways to get each partition. For instance, to get a partition of 126, we could get 126, 162, 216, 261, 612, and 621 (again, think of the red die first, the blue die second, and the green die third). However, to get a partition of 333, there is only 1 way. To get a partition of 144, there are 3 ways: 144, 414, and 441. The numbers of ways to get each of the above partitions are:

Sum of 9    # ways    Sum of 10    # ways
126            6      136             6
135            6      145             6
144            3      226             3
225            3      235             6
234            6      244             3
333            1      334             3
Total         25      Total          27
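The counts of 25 and 27 can be confirmed by brute force over all 216 ordered outcomes (red, blue, green). The Python sketch below uses only the standard library, and its names are illustrative.

```python
from collections import Counter
from itertools import product

# Every ordered outcome of three distinguishable fair dice.
outcomes = list(product(range(1, 7), repeat=3))

# Number of ordered outcomes producing each possible sum.
ways = Counter(sum(roll) for roll in outcomes)

ways_9, ways_10 = ways[9], ways[10]
print(len(outcomes), ways_9, ways_10)
print(ways_9 / len(outcomes), ways_10 / len(outcomes))
```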
Thus, there are a total of 25 ways to get a sum of 9 and 27 ways to get a sum of 10. The chance of throwing a sum of 9 (25 chances out of 216 possibilities) is less than the chance of throwing a 10 (27 chances out of 216 possibilities).
3.42
a.
P(A ∩ B) = P(A | B)P(B) = .6(.2) = .12
b.
P(B | A) = P(A ∩ B) / P(A) = .12 / .4 = .3
3.44
a.
Since A and B are mutually exclusive events, P(A ∪ B) = P(A) + P(B) = .30 + .55 = .85
b.
Since A and C are mutually exclusive events, P(A ∩ C) = 0
c.
P(A | B) = P(A ∩ B) / P(B) = 0 / .55 = 0
d.
Since B and C are mutually exclusive events, P(B ∪ C) = P(B) + P(C) = .55 + .15 = .70
e.
No, B and C cannot be independent events because they are mutually exclusive events.
3.46
a.
If two fair coins are tossed, there are 4 possible outcomes or simple events. They are:
(1) HH    (2) HT    (3) TH    (4) TT
Event A contains the simple events (2), (3), and (4). Event B contains the simple events (2) and (3). A Venn diagram of this would be:
[Venn diagram: region B, containing sample points 2 and 3, lies inside region A, which also contains sample point 4; sample point 1 lies outside A]
Since the coins are fair, each of the sample points is equally likely. Each would have a probability of 1/4.
b.
P(A) = 3(1/4) = 3/4 = .75
P(B) = 2(1/4) = 2/4 = 1/2 = .5
P(A ∩ B) = P(2) + P(3) = 1/4 + 1/4 = 2/4 = 1/2 = .5
c.
P(A | B) = P(A ∩ B) / P(B) = .5 / .5 = 1
P(B | A) = P(A ∩ B) / P(A) = .5 / .75 = .667
3.48
The 36 possible outcomes obtained when tossing two dice are listed below:
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)
A: {(1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2), (3, 4), (3, 6), (4, 1), (4, 3), (4, 5), (5, 2), (5, 4), (5, 6), (6, 1), (6, 3), (6, 5)}
B: {(3, 6), (4, 5), (5, 4), (5, 6), (6, 3), (6, 5), (6, 6)}
A ∩ B: {(3, 6), (4, 5), (5, 4), (5, 6), (6, 3), (6, 5)}
If A and B are independent, then P(A)P(B) = P(A ∩ B).
P(A) = 18/36 = 1/2    P(B) = 7/36    P(A ∩ B) = 6/36 = 1/6
P(A)P(B) = (1/2)(7/36) = 7/72 ≠ 1/6 = P(A ∩ B). Thus, A and B are not independent.
3.50
Define the following events:
S: {cause of fatal crash is speeding}
C: {cause of fatal crash is missing a curve}
From the problem, we know P(S) = .3 and P(S ∩ C) = .12.
P(C | S) = P(C ∩ S) / P(S) = .12 / .3 = .4
3.52
Define the following events:
A: {Winner is from the American League}
B: {Winner is from the National League}
C: {Winner is from the Eastern Division}
D: {Winner is from the Central Division}
E: {Winner is from the Western Division}
a.
P(C | A) = P(A ∩ C) / P(A) = (7/15) / (10/15) = 7/10 = .7
b.
P(B | D) = P(B ∩ D) / P(D) = (1/15) / (3/15) = 1/3 = .333
c.
P(D ∪ E | B) = P((D ∪ E) ∩ B) / P(B) = (2/15) / (5/15) = 2/5 = .4
3.54
Define the following events:
A: {electrical switch monitors quality of power}
B: {electrical switch not wired properly}
From the problem, P(A) = .90 and P(B | A) = .90.
P(A ∩ B) = P(B | A) P(A) = .90(.90) = .81.
3.56
Define the following events:
Ai: {ith CEO has bachelor's degree}
a.
P(A1) = 8/40 = .20
b.
If the first 4 CEOs have just bachelor's degrees, then on the next pick there are only 4 left to choose from. Similarly, after picking 4 CEOs, there are only 36 observations left to choose from.
P(A5 | A1 ∩ A2 ∩ A3 ∩ A4) = 4/36 = .111
3.58
If A and B are independent, then P(A ∩ B) = P(A)P(B). For this exercise,
P(A) = (1385 + 786)/3934 = 2171/3934 = .552, P(B) = (1385 + 1175)/3934 = 2560/3934 = .651, and P(A ∩ B) = 1385/3934 = .352.
P(A)P(B) = .552(.651) = .359 ≠ .352 = P(A ∩ B). Thus, A and B are not independent.
3.60
The probability of a false positive is P(A | B).
3.62
First, define the following event: A: {CVSA correctly determines the veracity of a suspect} P(A) = .98 (from claim)
a.
The event that the CVSA is correct for all four suspects is the event A ∩ A ∩ A ∩ A.
P(A ∩ A ∩ A ∩ A) = .98(.98)(.98)(.98) = .9224
b.
The event that the CVSA is incorrect for at least one of the four suspects is the event (A ∩ A ∩ A ∩ A)c.
P[(A ∩ A ∩ A ∩ A)c] = 1 − P(A ∩ A ∩ A ∩ A) = 1 − .9224 = .0776
3.64
Define the following events:
I: {Leak ignites immediately (jet fire)}
D: {Leak has delayed ignition (flash fire)}
From the problem, P(I) = .01 and P(D | Ic) = .01
The probability of a jet fire or a flash fire = P(I ∪ D) = P(I) + P(D) − P(I ∩ D) = P(I) + P(D | Ic)P(Ic) − P(I ∩ D) = .01 + .01(1 − .01) − 0 = .01 + .0099 = .0199
A tree diagram of this problem is:
I (.01)
Ic (.99), then D (.01): Ic ∩ D, .99(.01) = .0099
Ic (.99), then Dc (.99): Ic ∩ Dc, .99(.99) = .9801
3.66
Define the following events:
W: {Player wins the game Go}
F: {Player plays first (black stones)}
a.
P(W ∩ F) = 319/577 = .553
b.
P(W ∩ F│CA) = 34/34 = 1
P(W ∩ F│CB) = 69/79 = .873
P(W ∩ F│CC) = 66/118 = .559
P(W ∩ F│BA) = 40/54 = .741
P(W ∩ F│BB) = 52/95 = .547
P(W ∩ F│BC) = 27/79 = .342
P(W ∩ F│AA) = 15/28 = .536
P(W ∩ F│AB) = 11/51 = .216
P(W ∩ F│AC) = 3/39 = .077
c.
There are three combinations where the player with the black stones (first) is ranked higher than the player with the white stones: CA, CB, and BA. P(W ∩ F│CA ∪ CB ∪ BA) = (34 + 69 + 40)/(34 + 79 + 54) = 143/167 = .856
d.
There are three combinations where the players are of the same level: CC, BB, and AA. P(W ∩ F│CC ∪ BB ∪ AA) = (66 + 52 + 15)/(118 + 95 + 28) = 133/241 = .552
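Parts c and d pool wins and games across several pairings before dividing. A small Python sketch of that pooling follows. The win and game counts are the numerators and denominators quoted in part b. The dictionary layout and the helper name `pooled_rate` are illustrative assumptions.

```python
# (games won by the first player, games played) for each pairing label in part b.
results = {
    "CA": (34, 34), "CB": (69, 79), "CC": (66, 118),
    "BA": (40, 54), "BB": (52, 95), "BC": (27, 79),
    "AA": (15, 28), "AB": (11, 51), "AC": (3, 39),
}

def pooled_rate(pairings):
    """Total first-player wins divided by total games over the given pairings."""
    wins = sum(results[p][0] for p in pairings)
    games = sum(results[p][1] for p in pairings)
    return wins / games

rate_first_ranked_higher = pooled_rate(["CA", "CB", "BA"])   # part c
rate_same_level = pooled_rate(["CC", "BB", "AA"])            # part d

print(round(rate_first_ranked_higher, 3), round(rate_same_level, 3))
```

Pooling counts before dividing weights each pairing by its number of games, which is not the same as averaging the nine individual proportions.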
3.68
a.
Suppose the elements of the population are: 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10. The possible samples of size 2 are:
(1, 2) (1, 3) (1, 4) (1, 5) (1, 6) (1, 7) (1, 8) (1, 9) (1, 10)
(2, 3) (2, 4) (2, 5) (2, 6) (2, 7) (2, 8) (2, 9) (2, 10)
(3, 4) (3, 5) (3, 6) (3, 7) (3, 8) (3, 9) (3, 10)
(4, 5) (4, 6) (4, 7) (4, 8) (4, 9) (4, 10)
(5, 6) (5, 7) (5, 8) (5, 9) (5, 10)
(6, 7) (6, 8) (6, 9) (6, 10)
(7, 8) (7, 9) (7, 10)
(8, 9) (8, 10)
(9, 10)
Since there are N = 10 elements in the population, the number of samples of size n = 2 is a combination of 10 things taken 2 at a time or
(10 choose 2) = 10! / (2!8!) = (10 ⋅ 9 ⋅ 8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / [(2 ⋅ 1)(8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)] = 45
Therefore, there are 45 different samples of size n = 2 that can be selected from a population of N = 10.
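The listing of all samples of size 2 can also be generated rather than written out by hand. A minimal Python sketch using `itertools.combinations` follows. The variable names are illustrative.

```python
from itertools import combinations
from math import comb

population = range(1, 11)   # elements 1 through 10

# Every unordered sample of size 2, in the same order as the listing above.
samples = list(combinations(population, 2))

print(len(samples), comb(10, 2))
print(samples[:3], samples[-1])

# Under random sampling each pair is equally likely.
p_particular_pair = 1 / len(samples)
```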
b.
If random sampling is employed, every pair of elements has an equal probability of being selected. Therefore, the probability of drawing a particular pair is 1/45.
c.
To draw a random sample of 2 elements from 10, we will number the elements from 0 to 9. Then, starting in an arbitrary position in Table I, Appendix B, we will select two numbers by going either down a column or across a row. Suppose that we start in the third position of column 6 and row 9. We will proceed down the column. The first sample drawn will be 1 and 5. The second sample drawn will be 9 and 4. The 20 samples selected are:

Sample Number   Items Selected     Sample Number   Items Selected
      1             1, 5                11             0, 9
      2             9, 4                12             1, 0
      3             4, 2                13             3, 7
      4             9, 3                14             3, 9
      5             8, 1                15             0, 8
      6             5, 6                16             3, 4
      7             1, 3                17             0, 4
      8             0, 2                18             9, 7
      9             4, 6                19             8, 4
     10             8, 0                20             0, 5

There are actually two pairs of samples that match: Samples 10 and 15, and samples 4 and 14. Given the low probability of each pair occurring, it is not that likely to have two pairs of samples that match.
3.70
First, number the elements of the population from 1 to 200,000. Starting in row 10, column 1, of Table I of Appendix B and reading down, take the first ten 6-digit numbers. Eliminate any duplicates, the number 000000, and all numbers greater than 200,000. The 10 numbers selected for the random sample are: 094299 103656 071199 023682 010115 070569 024883 007425 053660 005820 Elements with the above numbers are selected for the sample.
3.72
To draw a random sample of 1,000 households from 534,322, we will number the households from 1 to 534,322. Then, starting in an arbitrary position in Table I, Appendix B, we will select 6-digit numbers by proceeding down a column. We will continue selecting numbers until we have 1,000 different 6-digit numbers, eliminating 000000 and any numbers between 534,323 and 999,999.
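For readers without the random number table at hand, the same selection can be approximated with a pseudo-random generator. This swaps in a different technique from the table lookup described above: it is a sketch using Python's standard `random` module, and the seed value is arbitrary.

```python
import random

N_HOUSEHOLDS = 534_322   # households numbered 1 to 534,322
SAMPLE_SIZE = 1_000

# A seeded generator makes the draw reproducible; the seed is arbitrary.
rng = random.Random(2024)

# random.sample draws without replacement, so no household number repeats,
# and every number outside 1..534,322 is excluded by construction.
sample = rng.sample(range(1, N_HOUSEHOLDS + 1), SAMPLE_SIZE)

print(len(sample), min(sample), max(sample))
```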
3.74
a.
Give each stock in the NYSE-Composite Transactions table of the Wall Street Journal a number (1 to m). Using Table I of Appendix B, pick a starting point and read down using the same number of digits as in m until you have n different numbers between 1 and m, inclusive.
3.76
a.
P ( B1 ∩ A) = P ( A | B1 ) P ( B1 ) = .3(.75) = .225
b.
P( B2 ∩ A) = P( A | B2 ) P( B2 ) = .5(.25) = .125
c.
P ( A) = P ( B1 ∩ A) + P ( B2 ∩ A) = .225 + .125 = .35
d.
P(B1 | A) = P(B1 ∩ A) / P(A) = .225 / .35 = .643
e.
P(B2 | A) = P(B2 ∩ A) / P(A) = .125 / .35 = .357
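Parts a through e follow the usual Bayes' rule pattern: multiply each prior by its conditional probability, sum to get P(A), then divide. The Python sketch below applies that pattern to the numbers in this exercise. The function name `posterior` and the dictionary layout are illustrative choices.

```python
def posterior(priors, likelihoods):
    """Return P(A) and the posterior P(Bi | A) for each Bi via Bayes' rule."""
    # Joint probabilities P(Bi ∩ A) = P(A | Bi) P(Bi).
    joint = {b: likelihoods[b] * priors[b] for b in priors}
    # Total probability P(A) = sum of the joint probabilities.
    p_a = sum(joint.values())
    return p_a, {b: j / p_a for b, j in joint.items()}

priors = {"B1": 0.75, "B2": 0.25}        # P(B1), P(B2)
likelihoods = {"B1": 0.3, "B2": 0.5}     # P(A | B1), P(A | B2)

p_a, post = posterior(priors, likelihoods)
print(round(p_a, 3), {b: round(p, 3) for b, p in post.items()})
```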
3.78
If A is independent of B1, B2, and B3, then P(A | B1) = P(A) = .4. Then
P(B1 | A) = P(A | B1)P(B1) / P(A) = .4(.2) / .4 = .2
3.80
a.
P(E1 | error) = P(E1 ∩ error) / P(error)
= P(error | E1)P(E1) / [P(error | E1)P(E1) + P(error | E2)P(E2) + P(error | E3)P(E3)]
= .01(.30) / [.01(.30) + .03(.20) + .02(.50)] = .003 / (.003 + .006 + .01) = .003 / .019 = .158
b.
P(E2 | error) = P(E2 ∩ error) / P(error)
= P(error | E2)P(E2) / [P(error | E1)P(E1) + P(error | E2)P(E2) + P(error | E3)P(E3)]
= .03(.20) / [.01(.30) + .03(.20) + .02(.50)] = .006 / (.003 + .006 + .01) = .006 / .019 = .316
c.
P(E3 | error) = P(E3 ∩ error) / P(error)
= P(error | E3)P(E3) / [P(error | E1)P(E1) + P(error | E2)P(E2) + P(error | E3)P(E3)]
= .02(.50) / [.01(.30) + .03(.20) + .02(.50)] = .01 / (.003 + .006 + .01) = .01 / .019 = .526
d.
If there was a serious error, the probability that the error was made by engineer 3 is .526. This probability is higher than for any of the other engineers. Thus engineer #3 is most likely responsible for the error.
3.82
Define the following events:
D: {Defect in steel casting}
H: {NDE detects "Hit" or defect in steel casting}
From the problem, P(H | D) = .97, P(H | Dc) = .005, and P(D) = .01.
P(H) = P(H | D)P(D) + P(H | Dc)P(Dc) = .97(.01) + .005(.99) = .0097 + .00495 = .01465
P(D | H) = P(D ∩ H) / P(H) = P(H | D)P(D) / P(H) = .97(.01) / .01465 = .0097 / .01465 = .6621
3.84
Define the following events:
A: {Alarm A sounds alarm}
B: {Alarm B sounds alarm}
I: {Intruder}
From the problem:
P(A | I) = .9
P(B | I) = .95
P(A | Ic) = .2
P(B | Ic) = .1
P(I) = .4
Since the two systems are operating independently of each other,
P(A ∩ B | I) = P(A | I) P(B | I) = .9(.95) = .855
P(A ∩ B ∩ I) = P(A ∩ B | I) P(I) = .855(.4) = .342
P(A ∩ B | Ic) = P(A | Ic) P(B | Ic) = .2(.1) = .02
P(A ∩ B ∩ Ic) = P(A ∩ B | Ic) P(Ic) = .02(.6) = .012
Thus, P(A ∩ B) = P(A ∩ B ∩ I) + P(A ∩ B ∩ Ic) = .342 + .012 = .354
Finally, P(I | A ∩ B) = P(A ∩ B ∩ I) / P(A ∩ B) = .342 / .354 = .966
3.86
a.
The two probability rules for a sample space are that the probability for any sample point is between 0 and 1 and that the sum of the probabilities of all the sample points is 1. For this Exercise, all the probabilities of the sample points are between 0 and 1 and
∑ P(Si) = P(S1) + P(S2) + P(S3) + P(S4) = .2 + .1 + .3 + .4 = 1.0, where the sum runs over i = 1 to 4.
b.
P(A) = P(S1) + P(S4) = .2 + .4 = .6
3.88
P ( A ∪ B ) = P ( A) + P( B) − P( A ∩ B) = .7 + .5 − .4 = .8
3.90
a.
If the Dow Jones Industrial Average increases, a large New York bank would tend to decrease the prime interest rate. Therefore, the two events are not mutually exclusive since they could occur simultaneously.
b.
The next sale by a PC retailer could not be both a laptop and a desktop computer. Since the two events cannot occur simultaneously, the events are mutually exclusive.
c.
Since both events cannot occur simultaneously, the events are mutually exclusive.
3.92
Mutually exclusive events are also dependent events since the assumption that one event occurs alters the probability of the occurrence of the other one. If we assume that one event has occurred, it is impossible for the other one to occur simultaneously since they are mutually exclusive. In other words, if A and B are mutually exclusive, P(A ∩ B) = 0. Then
P(A│B) = P(A ∩ B) / P(B) = 0 / P(B) = 0
Since P(A) ≠ 0, P(A│B) ≠ P(A), so A and B are dependent.
3.94
a.
Because events A and B are independent, we have:
P(A ∩ B) = P(A)P(B) = (.3)(.1) = .03
Thus, P(A ∩ B) ≠ 0, and the two events cannot be mutually exclusive.
b.
P(A│B) = P(A ∩ B) / P(B) = .03 / .1 = .3
P(B│A) = P(A ∩ B) / P(A) = .03 / .3 = .1
c.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .3 + .1 − .03 = .37
3.96
Define the following events: C: {Public school building has inadequate plumbing} D: {Public school has plans for repairing building} From the problem, we know P(C) = .25 and P(D|C) = .38. P (C ∩ D) = P ( D | C ) P(C ) = .38(.25) = .095
3.98
a.
The event {The manager was involved in the ISO 9000 registration} contains the sample points {The manager was very involved}, {The manager had moderate involvement}, and {The manager had minimal involvement}. Thus, P(A) is:
P(A) = 9/40 + 16/40 + 12/40 = 37/40 = .925
b.
The event {The length of time to achieve ISO 9000 registration was more than 2 years} contains the sample points {The length of time to achieve ISO 9000 registration was between 2.1 and 2.5 years} and {The length of time to achieve ISO 9000 registration was greater than 2.5 years}. Thus, P(B) is:
P(B) = 2/40 + 3/40 = 5/40 = .125
c.
We cannot determine if events A and B are independent from the data given because there is no way of finding P(A ∩ B). In order to find P(A ∩ B), the 40 individuals would have to be classified on both variables at the same time. In the data provided, the individuals are first classified on the first variable and then classified on the second variable.
3.100
a.
The experiment consists of selecting 159 employees and asking each to indicate how strongly he/she agreed or disagreed with the statement "I believe that management is committed to CQI." There are five sample points: "Strongly agree," "Agree," "Neither agree nor disagree," "Disagree," and "Strongly disagree."
b.
Since we have frequencies for each of the sample points, good estimates of the probabilities are the relative frequencies. To find the relative frequencies, divide all of the frequencies by the sample size of 159. The estimates of the probabilities are:
Response                      Probability
Strongly Agree                .189
Agree                         .403
Neither Agree Nor Disagree    .258
Disagree                      .113
Strongly Disagree             .038
c.
The probability that an employee agrees or strongly agrees with the statement is .189 + .403 = .592.
d.
The probability that an employee does not strongly agree with the statement is equal to the sum of all the probabilities except that for "strongly agree" = .403 + .258 + .113 + .038 = .812.
3.102
a.
There are a total of 9 × 2 = 18 sample points for this experiment. There are 9 sources of CO poisoning, and each source of poisoning has 2 possible outcomes, fatal or nonfatal. Suppose we introduce some notation to make it easier to write down the sample points. Let FI = Fire, AU = Auto exhaust, FU = Furnace, K = Kerosene or space heater, AP = Appliance, OG = Other gas-powered motors, FP = Fireplace, O = Other, and U = Unknown. Also, let F = Fatal and N = Nonfatal. The 18 sample points are:
FI, F    AU, F    FU, F    K, F    AP, F    OG, F    FP, F    O, F    U, F
FI, N    AU, N    FU, N    K, N    AP, N    OG, N    FP, N    O, N    U, N
b.
The set of all sample points is called the sample space.
c.
The event A is made up of the following sample points: (FI, F) and (FI, N)
Then, P(A) = P(FI, F) + P(FI, N) = 63/981 + 53/981 = 116/981 = .118
d.
The event B is made up of the following sample points: (FI, F); (AU, F); (FU, F); (K, F); (AP, F); (OG, F); (FP, F); (O, F); (U, F)
Then, P(B) = P(FI, F) + P(AU, F) + P(FU, F) + P(K, F) + P(AP, F) + P(OG, F) + P(FP, F) + P(O, F) + P(U, F)
= 63/981 + 60/981 + 18/981 + 9/981 + 9/981 + 3/981 + 0/981 + 3/981 + 9/981 = 174/981 = .177
e.
The event C is made up of the following sample points: (AU, F) and (AU, N) Then, P(C) = P(AU, F) + P(AU, N) = 60/981 + 178/981 = 238/981 = .243
f.
The event D is made up of the following sample point: AU, F Then, P(D) = P(AU, F) = 60/981 = .061
g.
The event E is made up of the following sample point: FI, N Then, P(E) = P(FI, N) = 53/981 = .054
3.104
Since there are 11 individuals who are willing to serve on the panel, the number of different panels of 5 experts is a combination of 11 things taken 5 at a time or
(11 choose 5) = 11! / (5!6!) = (11 ⋅ 10 ⋅ 9 ⋅ 8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / [(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)(6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)] = 462
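The count of panels can be checked numerically. This short sketch uses the standard-library function `math.comb` and also enumerates the panels explicitly; the expert labels 1 through 11 are placeholders, not names from the exercise.

```python
import math
from itertools import combinations

n_experts, panel_size = 11, 5     # values from Exercise 3.104

# Closed form: 11! / (5! 6!)
n_panels = math.comb(n_experts, panel_size)

# Brute-force check: list every unordered panel of 5 from experts 1..11
all_panels = list(combinations(range(1, n_experts + 1), panel_size))
```

Both approaches should give the same count, since `combinations` yields each unordered selection exactly once.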
3.106
The possible ways of ranking the blades are:
GSW    SGW    WGS
GWS    SWG    WSG
If the consumer had no preference but still ranked the blades, then the 6 possibilities are equally likely. Therefore, each of the 6 possibilities has a probability of 1/6 of occurring.
a.
P(Ranks G first) = P(GSW) + P(GWS) = 1/6 + 1/6 = 2/6 = 1/3
b.
P(Ranks G last) = P(SWG) + P(WSG) = 1/6 + 1/6 = 2/6 = 1/3
c.
P(Ranks G last and W second) = P(SWG) = 1/6
d.
P(WGS) = 1/6
3.108
a.
Consecutive tosses of a coin are independent events since what occurs one time would not affect the next outcome.
b.
If the individuals are randomly selected, then what one individual says should not affect what the next person says. They are independent events.
c.
The results in two consecutive at-bats are probably not independent. The player may have faced the same pitcher both times, which may affect the outcome.
d.
The amount of gain and loss for two different stocks bought and sold on the same day are probably not independent. The market might be way up or down on a certain day so that all stocks are affected.
e.
The amount of gain or loss for two different stocks that are bought and sold in different time periods are independent. What happens to one stock should not affect what happens to the other.
f.
The prices bid by two different development firms in response to the same building construction proposal would probably not be independent. The same variables would be present for both firms to consider in their bids (materials, labor, etc.).
3.110
a.
We will define the following events: A:{The first activation device works properly; i.e., activates the sprinkler when it should} B:{The second activation device works properly} From the statement of the problem, we know P(A) = .91 and P(B) = .87 Furthermore, since the activation devices work independently, we conclude that P(A ∩ B) = P(A)P(B) = (.91)(.87) = .7917 Now, if a fire starts near a sprinkler head, the sprinkler will be activated if either the first activation device or the second activation device, or both, operates properly. Thus, P(Sprinkler head will be activated) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .91 + .87 − .7917 = .9883
b.
The event that the sprinkler head will not be activated is the complement of the event that the sprinkler will be activated. Thus, P(Sprinkler head will not be activated) = 1 − P(Sprinkler head will be activated) = 1 − .9883 = .0117
c.
From part a, P(A ∩ B) = P(A)P(B) = .7917
d.
In terms of the events we have defined, we wish to determine P(A ∩ Bc) = P(A)P(Bc) (by independence) = .91(1 − .87) = .91(.13) = .1183
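The four answers in Exercise 3.110 all follow from the two device reliabilities and the independence assumption. A minimal sketch of the arithmetic, with variable names chosen only for illustration:

```python
p_a = 0.91   # P(A): first activation device works (from the exercise)
p_b = 0.87   # P(B): second activation device works (from the exercise)

# Independence lets each joint probability be written as a product.
p_both = p_a * p_b                      # part c: P(A ∩ B)
p_activated = p_a + p_b - p_both        # part a: P(A ∪ B), additive rule
p_not_activated = 1 - p_activated       # part b: complement of part a
p_first_only = p_a * (1 - p_b)          # part d: P(A ∩ Bc)

# Cross-check for part b: neither device works
p_neither = (1 - p_a) * (1 - p_b)
```

The cross-check works because the complement of "at least one device works" is "both devices fail", and the failures are independent as well.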
3.112
Define the following events: S: {System shuts down} F1: {Hardware failure} F2: {Software failure} F3: {Power failure} From the Exercise, we know: P(F1) = .01, P(F2) = .05, and P(F3) = .02. Also, P(S|F1) = .73, P(S|F2) = .12, and P(S|F3) = .88.
The probability that the current shutdown is due to a hardware failure is:
P(F1 | S) = P(F1 ∩ S) / P(S) = P(S | F1)P(F1) / [P(S | F1)P(F1) + P(S | F2)P(F2) + P(S | F3)P(F3)]
= .73(.01) / [.73(.01) + .12(.05) + .88(.02)] = .0073 / (.0073 + .006 + .0176) = .0073 / .0309 = .2362
The probability that the current shutdown is due to a software failure is:
P(F2 | S) = P(F2 ∩ S) / P(S) = P(S | F2)P(F2) / [P(S | F1)P(F1) + P(S | F2)P(F2) + P(S | F3)P(F3)]
= .12(.05) / [.73(.01) + .12(.05) + .88(.02)] = .006 / (.0073 + .006 + .0176) = .006 / .0309 = .1942
The probability that the current shutdown is due to a power failure is:
P(F3 | S) = P(F3 ∩ S) / P(S) = P(S | F3)P(F3) / [P(S | F1)P(F1) + P(S | F2)P(F2) + P(S | F3)P(F3)]
= .88(.02) / [.73(.01) + .12(.05) + .88(.02)] = .0176 / (.0073 + .006 + .0176) = .0176 / .0309 = .5696
3.114
Define the following events:
C: {Committee judges joint acceptable}
I: {Inspector judges joint acceptable}
The sample points of this experiment are: C ∩ I, C ∩ Ic, Cc ∩ I, Cc ∩ Ic
a.
The probability the inspector judges the joint to be acceptable is:
P(I) = P(C ∩ I) + P(Cc ∩ I) = 101/153 + 23/153 = 124/153 = .810
The probability the committee judges the joint to be acceptable is:
P(C) = P(C ∩ I) + P(C ∩ Ic) = 101/153 + 10/153 = 111/153 = .725
b.
The probability that both the committee and the inspector judge the joint to be acceptable is:
P(C ∩ I) = 101/153 = .660
The probability that neither judges the joint to be acceptable is:
P(Cc ∩ Ic) = 19/153 = .124
c.
The probability the inspector and committee disagree is:
P(C ∩ Ic) + P(Cc ∩ I) = 10/153 + 23/153 = 33/153 = .216
The probability the inspector and committee agree is:
P(C ∩ I) + P(Cc ∩ Ic) = 101/153 + 19/153 = 120/153 = .784
3.116
a.
Define the following events:
A1: {Component 1 works properly}
A2: {Component 2 works properly}
B3: {Component 3 works properly}
B4: {Component 4 works properly}
A: {Subsystem A works properly}
B: {Subsystem B works properly}
The probability a component fails is .1, so the probability a component works properly is 1 − .1 = .9. Subsystem A works properly if both components 1 and 2 work properly.
P(A) = P(A1 ∩ A2) = P(A1)P(A2) = .9(.9) = .81 (since the components operate independently)
Similarly, subsystem B works properly if both components 3 and 4 work properly.
P(B) = P(B3 ∩ B4) = P(B3)P(B4) = .9(.9) = .81
The system operates properly if either subsystem A or B operates properly. The probability the system operates properly is:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A)P(B) = .81 + .81 − .81(.81) = .9639
b.
The probability exactly one subsystem fails is: P(A ∩ Bc) + P(Ac ∩ B) = P(A)P(Bc) + P(Ac)P(B) = .81(1 − .81) + (1 − .81).81 = .1539 + .1539 = .3078
c.
The probability the system fails is the probability that both subsystems fail or: P(Ac ∩ Bc) = P(Ac)P(Bc) = (1 − .81)(1 − .81) = .0361
d.
The system operates correctly 99% of the time means it fails 1% of the time. The probability one subsystem fails is .19. The probability n subsystems fail is .19^n. Thus, we must find n such that .19^n ≤ .01. Since .19^2 = .0361 > .01 and .19^3 = .006859 ≤ .01, n = 3.
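The search for n in part d can be written as a short loop. This is a sketch under the exercise's assumption that subsystems fail independently, each with probability .19; the function name `min_subsystems` is hypothetical.

```python
def min_subsystems(p_subsystem_fail, max_system_fail):
    """Smallest n with p_subsystem_fail ** n <= max_system_fail.

    Assumes 0 < p_subsystem_fail < 1 and max_system_fail > 0, so the
    loop is guaranteed to terminate.
    """
    n = 1
    while p_subsystem_fail ** n > max_system_fail:
        n += 1
    return n


# Values from Exercise 3.116 d: each subsystem fails with probability .19,
# and the whole system may fail at most 1% of the time.
n_needed = min_subsystems(0.19, 0.01)
```

The parallel system fails only when every subsystem fails, which is why the failure probability is the product of n identical terms.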
3.118
Define the events:
A: {A bottle comes from machine A}
B: {A bottle comes from machine B}
R: {A bottle is rejected}
Then the given probabilities are: P(A) = .75, P(B) = .25, P(R│A) = 1/20, P(R│B) = 1/30
The proportion of rejected bottles is:
P(R) = P(A ∩ R) + P(B ∩ R) = P(R│A)P(A) + P(R│B)P(B) = (1/20)(.75) + (1/30)(.25) = .0458
The probability that a bottle comes from machine A, given that it is accepted, is:
P(A│Rc) = P(A ∩ Rc) / P(Rc) = P(Rc│A)P(A) / [1 − P(R)] = (19/20)(.75) / (1 − .0458) = .7467
3.120
There are a total of 6 × 6 = 36 outcomes when rolling 2 dice. If we let the first number in the pair represent the outcome of die number 1 and the second number in the pair represent the outcome of die number 2, then the possible outcomes are:
1,1    1,2    1,3    1,4    1,5    1,6
2,1    2,2    2,3    2,4    2,5    2,6
3,1    3,2    3,3    3,4    3,5    3,6
4,1    4,2    4,3    4,4    4,5    4,6
5,1    5,2    5,3    5,4    5,5    5,6
6,1    6,2    6,3    6,4    6,5    6,6
If both dice are fair, then each of these outcomes is equally likely and has a probability of 1/36.
a.
To win on the first roll, a player must roll a 7 or 11. There are 6 ways to roll a 7 and 2 ways to roll an 11. Thus the probability of winning on the first roll is:
P(7 or 11) = 8/36 = .2222
b.
To lose on the first roll, a player must roll a 2 or 3. There is 1 way to roll a 2 and 2 ways to roll a 3. Thus the probability of losing on the first roll is:
P(2 or 3) = 3/36 = .0833
c.
If a player rolls a 4 on the first roll, the game will end on the next roll if the player rolls the original roll again (player wins) or if the player rolls a seven (player loses). Now, there are 3 ways of getting a 4 on the first roll: (1,3), (2,2), or (3,1).
If the first roll was (2,2), then the game would end on the next roll if the player threw (2,2), (1,6), (2,5), (3,4), (4,3), (5,2), or (6,1). The probability of the game ending on the next roll would be:
P((2,2) or 7 on second toss | (2,2) on first) = 7/36 = .1944
Now, suppose the first roll ended with a 1 and a 3. Since the dice are not marked, this result could have happened two ways: (1,3) or (3,1). Regardless of how the original 1 and 3 were obtained, the player would have 2 ways of winning on the next roll: (1,3) or (3,1). For the game to end on the next roll, the player could throw (1,3), (3,1), (1,6), (2,5), (3,4), (4,3), (5,2), or (6,1). The probability of the game ending on the next roll would be:
P((1,3) or (3,1) or 7 on second toss | 1 and 3 on first) = 8/36 = .2222
Since there were 3 ways to get a 4 on the first roll, and each was equally likely, P(2,2) = 1/3 and P[1 and 3 (any order)] = 2/3.
The probability that the game ends on the second roll is
P((2,2) or 7 on second toss | (2,2) on first) P((2,2) on first) + P((1,3) or (3,1) or 7 on second toss | 1 and 3 on first) P(1 and 3 on first)
= .1944(1/3) + .2222(2/3) = .0648 + .1481 = .2129
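The counts used in parts a and b of Exercise 3.120 can be confirmed by listing all 36 outcomes. A minimal sketch, assuming two fair six-sided dice; the helper name `prob_sum_in` is illustrative only.

```python
from fractions import Fraction
from itertools import product

# All ordered (die 1, die 2) pairs; each has probability 1/36 for fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob_sum_in(targets):
    """Probability that the two dice total one of the values in targets."""
    hits = sum(1 for a, b in outcomes if a + b in targets)
    return Fraction(hits, len(outcomes))


p_win_first = prob_sum_in({7, 11})    # part a: win on the first roll
p_lose_first = prob_sum_in({2, 3})    # part b: lose on the first roll
```

Exact fractions are used so the results can be compared directly with 8/36 and 3/36 before rounding.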
3.122
Suppose we define the following event:
E: {Error produced when dividing}
From the problem, we know that P(E) = 1 / 9,000,000,000
The probability of no error produced when dividing is
P(Ec) = 1 − P(E) = 1 − 1 / 9,000,000,000 = 8,999,999,999 / 9,000,000,000 ≈ .999999999 ≈ 1.0000
Suppose we want to find the probability of no errors in 2 divisions (assuming each division is independent):
P(Ec ∩ Ec) ≈ .999999999(.999999999) ≈ .999999999 ≈ 1.0000
Thus, in general, the probability of no errors in k divisions would be:
P(Ec ∩ Ec ∩ Ec ∩ … ∩ Ec) (k times) = P(Ec)^k = [8,999,999,999 / 9,000,000,000]^k
Suppose a user ran a program that performed 1 billion divisions. The probability of no errors in these 1 billion divisions would be:
P(Ec)^1,000,000,000 = [8,999,999,999 / 9,000,000,000]^1,000,000,000 = .8948
Thus, the probability of at least 1 error in 1 billion divisions would be
1 − P(Ec)^1,000,000,000 = 1 − [8,999,999,999 / 9,000,000,000]^1,000,000,000 = 1 − .8948 = .1052
For a heavy MINITAB user, this flawed chip would be a problem because the above probability is not that small.
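Raising a number this close to 1 to the billionth power is awkward by hand and loses accuracy in a naive calculator evaluation. The sketch below evaluates the expression through logarithms, assuming P(E) = 1 / 9,000,000,000 and independent divisions as stated in the solution; the variable names are illustrative.

```python
import math

p_error = 1 / 9_000_000_000      # P(E), from the exercise
k = 1_000_000_000                # number of independent divisions

# P(no errors in k divisions) = (1 - p_error) ** k.
# log1p(-p) computes ln(1 - p) accurately when p is tiny.
p_no_errors = math.exp(k * math.log1p(-p_error))

# P(at least one error) is the complement.
p_at_least_one = 1 - p_no_errors
```

Because ln(1 − p) is approximately −p for very small p, the result is close to exp(−k · p), which gives a quick sanity check on the order of magnitude.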
Chapter 4
Random Variables and Probability Distributions
4.2
a.
The closing price of a particular stock on the New York Stock Exchange is discrete. It can take on only a countable number of values.
b.
The number of shares of a particular stock that are traded on a particular day is discrete. It can take on only a countable number of values.
c.
The quarterly earnings of a particular firm is discrete. It can take on only a countable number of values.
d.
The percentage change in yearly earnings between 2005 and 2006 for a particular firm is continuous. It can take on any value in an interval.
e.
The number of new products introduced per year by a firm is discrete. It can take on only a countable number of values.
f.
The time until a pharmaceutical company gains approval from the U.S. Food and Drug Administration to market a new drug is continuous. It can take on any value in an interval of time.
4.4
The number of customers, x, waiting in line can take on values 0, 1, 2, 3, … . Even though the list is never ending, we call this list countable. Thus, the random variable is discrete.
4.6
A banker might be interested in the number of new accounts opened in a month, or the number of mortgages it currently has, both of which are discrete random variables.
4.8
The manager of a hotel might be concerned with the number of employees on duty at a specific time, or the number of vacancies there are on a certain night.
4.10
A stockbroker might be interested in the length of time until the stock market is closed for the day.
4.12
a.
The variable x can take on values 1, 3, 5, 7, and 9.
b.
The value of x that has the highest probability associated with it is 5. It has a probability of .4.
c.
Using MINITAB, the probability distribution of x as a graph is:
d.
P(x = 7) = .2
e.
P(x ≥ 5) = p(5) + p(7) + p(9) = .4 + .2 + .1 = .7
f.
P(x > 2) = p(3) + p(5) + p(7) + p(9) = .2 + .4 + .2 + .1 = .9
4.14
a.
This is not a valid distribution because ∑ p(x) = .9 ≠ 1.
b.
This is a valid distribution because 0 ≤ p(x) ≤ 1 for all values of x and ∑ p(x) = 1.
c.
This is not a valid distribution because p(4) = −.3 < 0.
d.
The sum of the probabilities over all possible values of the random variable is ∑ p(x) = 1.1 > 1, so this is not a valid probability distribution.
4.16
a.
μ = E(x) = ∑ xp(x) = 10(.05) + 20(.20) + 30(.30) + 40(.25) + 50(.10) + 60(.10) = .5 + 4 + 9 + 10 + 5 + 6 = 34.5
σ^2 = E(x − μ)^2 = ∑ (x − μ)^2 p(x)
= (10 − 34.5)^2(.05) + (20 − 34.5)^2(.20) + (30 − 34.5)^2(.30) + (40 − 34.5)^2(.25) + (50 − 34.5)^2(.10) + (60 − 34.5)^2(.10)
= 30.0125 + 42.05 + 6.075 + 7.5625 + 24.025 + 65.025 = 174.75
σ = √174.75 = 13.219
b.
c.
μ ± 2σ ⇒ 34.5 ± 2(13.219) ⇒ 34.5 ± 26.438 ⇒ (8.062, 60.938)
P(8.062 < x < 60.938) = p(10) + p(20) + p(30) + p(40) + p(50) + p(60) = .05 + .20 + .30 + .25 + .10 + .10 = 1.00
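The mean, variance, and interval probability in Exercise 4.16 can be reproduced from the probability table alone. A minimal sketch, assuming the distribution is stored as a dictionary from values of x to p(x); the function names are hypothetical.

```python
import math

# Probability distribution from Exercise 4.16: value of x -> p(x)
dist = {10: 0.05, 20: 0.20, 30: 0.30, 40: 0.25, 50: 0.10, 60: 0.10}


def mean(d):
    """E(x) = sum of x * p(x)."""
    return sum(x * p for x, p in d.items())


def variance(d):
    """E[(x - mu)^2] = sum of (x - mu)^2 * p(x)."""
    mu = mean(d)
    return sum((x - mu) ** 2 * p for x, p in d.items())


mu = mean(dist)
var = variance(dist)
sigma = math.sqrt(var)

# Part c: probability that x falls within mu ± 2 sigma
low, high = mu - 2 * sigma, mu + 2 * sigma
p_within = sum(p for x, p in dist.items() if low < x < high)
```

The same two functions apply to any discrete distribution given as a value-to-probability mapping, provided the probabilities sum to 1.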
4.18
a.
It would seem that the mean of both would be 1 since they both are symmetric distributions centered at 1.
b.
The distribution of x seems more variable since there appears to be greater probability for the two extreme values of 0 and 2 than there is in the distribution of y.
c.
For x:
μ = E(x) = ∑ xp(x) = 0(.3) + 1(.4) + 2(.3) = 0 + .4 + .6 = 1
σ^2 = E[(x − μ)^2] = ∑ (x − μ)^2 p(x) = (0 − 1)^2(.3) + (1 − 1)^2(.4) + (2 − 1)^2(.3) = .3 + 0 + .3 = .6
For y:
μ = E(y) = ∑ yp(y) = 0(.1) + 1(.8) + 2(.1) = 0 + .8 + .2 = 1
σ^2 = E[(y − μ)^2] = ∑ (y − μ)^2 p(y) = (0 − 1)^2(.1) + (1 − 1)^2(.8) + (2 − 1)^2(.1) = .1 + 0 + .1 = .2
The variance for x is larger than that for y.
4.20
a.
Yes. Relative frequencies are observed values from a sample. Relative frequencies are commonly used to estimate unknown probabilities. In addition, relative frequencies have the same properties as the probabilities in a probability distribution, namely 1. all relative frequencies are greater than or equal to zero 2. the sum of all the relative frequencies is 1
b.
Using MINITAB, the graph of the probability distribution is:
[Graph: p(age) versus age]
c.
Let x = age of employee. Then
P(x > 30) = .13 + .15 + .12 = .40
P(x > 40) = 0
P(x < 30) = .02 + .04 + .05 + .07 + .04 + .02 + .07 + .02 + .11 + .07 = .51
d.
P(x = 25 or x = 26) = .02 + .07 = .09
4.22
a.
The probability distribution for x is:
Grill Display Combination    x     p(x)
1-2-3                        6     35 / 124 = .282
1-2-4                        7     8 / 124 = .065
1-2-5                        8     42 / 124 = .339
2-3-4                        9     4 / 124 = .032
2-3-5                        10    1 / 124 = .008
2-4-5                        11    34 / 124 = .274
b.
P(x > 10) = p(10) + p(11) = .008 + .274 = .282
4.24
a.
First, we must find the probability distribution of x. Define the following events:
C: {Chicken is contaminated}
N: {Chicken is not contaminated}
If 3 slaughtered chickens are randomly selected, then the possible outcomes are:
CCC, CCN, CNC, NCC, CNN, NCN, NNC, and NNN
These outcomes are NOT equally likely since P(C) = 1/100 = .01 and P(N) = 1 − P(C) = 1 − .01 = .99.
P(CCC) = P(C ∩ C ∩ C) = P(C) P(C) P(C) = .01(.01)(.01) = .000001
P(CCN) = P(CNC) = P(NCC) = P(C ∩ C ∩ N) = P(C) P(C) P(N) = .01(.01)(.99) = .000099
P(CNN) = P(NCN) = P(NNC) = P(C ∩ N ∩ N) = P(C) P(N) P(N) = .01(.99)(.99) = .009801
P(NNN) = P(N ∩ N ∩ N) = P(N) P(N) P(N) = .99(.99)(.99) = .970299
The variable x is defined as the number of contaminated chickens in the sample. The value of x for each of the outcomes is:
Event    x    p(x)
CCC      3    .000001
CCN      2    .000099
CNC      2    .000099
NCC      2    .000099
CNN      1    .009801
NCN      1    .009801
NNC      1    .009801
NNN      0    .970299
The probability distribution of x is:
x    p(x)
3    .000001
2    .000297
1    .029403
0    .970299
b.
Using MINITAB, the probability graph for x is:
[Graph: p(x) versus x]
c.
P(x ≤ 1) = P(x = 0) + P(x = 1) = .970299 + .029403 = .999702
4.26
To find the probability distribution of x, we first list the possible values of x. For this exercise, the possible values of x are −3, −1, and 5. Next, we list the number of cases, f(x), that result in the particular values of x. To find the probability distribution of x, we divide the number of cases for each value of x, f(x), by the total number of cases, 678. For x = −3, the probability is p(−3) = 68 / 678 = .100. For x = −1, the probability is p(−1) = 71 / 678 = .105. For x = 5, the probability is p(5) = 539 / 678 = .795. The probability distribution of x is:
x        f(x)    p(x)
−3       68      .100
−1       71      .105
5        539     .795
Total    678     1.000
Using MINITAB, the graph of the probability distribution is:
[Graph: p(x) versus x]
4.28
a.
E(x) = ∑ xp(x), summed over all x
Firm A: E(x) = 0(.01) + 500(.01) + 1000(.01) + 1500(.02) + 2000(.35) + 2500(.30) + 3000(.25) + 3500(.02) + 4000(.01) + 4500(.01) + 5000(.01)
= 0 + 5 + 10 + 30 + 700 + 750 + 750 + 70 + 40 + 45 + 50 = 2450
Firm B: E(x) = 0(.00) + 200(.01) + 700(.02) + 1200(.02) + 1700(.15) + 2200(.30) + 2700(.30) + 3200(.15) + 3700(.02) + 4200(.02) + 4700(.01)
= 0 + 2 + 14 + 24 + 255 + 660 + 810 + 480 + 74 + 84 + 47 = 2450
b.
σ^2 = ∑ (x − μ)^2 p(x), summed over all x, and σ = √(σ^2)
Firm A: σ^2 = (0 − 2450)^2(.01) + (500 − 2450)^2(.01) + ⋅⋅⋅ + (5000 − 2450)^2(.01)
= 60,025 + 38,025 + 21,025 + 18,050 + 70,875 + 750 + 75,625 + 22,050 + 24,025 + 42,025 + 65,025 = 437,500
σ = 661.44
Firm B: σ^2 = (0 − 2450)^2(.00) + (200 − 2450)^2(.01) + ⋅⋅⋅ + (4700 − 2450)^2(.01)
= 0 + 50,625 + 61,250 + 31,250 + 84,375 + 18,750 + 18,750 + 84,375 + 31,250 + 61,250 + 50,625 = 492,500
σ = 701.78
Firm B faces greater risk of physical damage because it has a higher variance and standard deviation.
4.30
a.
If a large number of measurements are observed, then the relative frequencies should be very good estimators of the probabilities.
b.
E(x) = ∑ xp(x) = 1(.01) + 2(.04) + 3(.04) + 4(.08) + 5(.10) + 6(.15) + 7(.25) + 8(.20) + 9(.08) + 10(.05)
= .01 + .08 + .12 + .32 + .50 + .90 + 1.75 + 1.60 + .72 + .50 = 6.50
The average number of checkout lanes per store is 6.5.
c.
σ^2 = ∑ (x − μ)^2 p(x), summed over all x
= (1 − 6.5)^2(.01) + (2 − 6.5)^2(.04) + (3 − 6.5)^2(.04) + (4 − 6.5)^2(.08) + (5 − 6.5)^2(.10) + (6 − 6.5)^2(.15) + (7 − 6.5)^2(.25) + (8 − 6.5)^2(.20) + (9 − 6.5)^2(.08) + (10 − 6.5)^2(.05)
= .3025 + .8100 + .4900 + .5000 + .2250 + .0375 + .0625 + .4500 + .5000 + .6125 = 3.99
σ = √3.99 = 1.9975
d.
Chebyshev's Rule says that at least 0 of the observations should fall in the interval μ ± σ.
Chebyshev's Rule says that at least 75% of the observations should fall in the interval μ ± 2σ.
e.
μ ± σ ⇒ 6.5 ± 1.9975 ⇒ (4.5025, 8.4975)
P(4.5025 ≤ x ≤ 8.4975) = .10 + .15 + .25 + .20 = .70
This is at least 0.
μ ± 2σ ⇒ 6.5 ± 2(1.9975) ⇒ 6.5 ± 3.995 ⇒ (2.505, 10.495)
P(2.505 ≤ x ≤ 10.495) = .04 + .08 + .10 + .15 + .25 + .20 + .08 + .05 = .95
This is at least .75 or 75%.
4.32
Let x = winnings in the Florida lottery. The probability distribution for x is:
x             p(x)
−$1           22,999,999/23,000,000
$6,999,999    1/23,000,000
The expected net winnings would be:
μ = E(x) = (−1)(22,999,999/23,000,000) + 6,999,999(1/23,000,000) = −$.70 The average winnings of all those who play the lottery is −$.70.
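The expected value in Exercise 4.32 can be checked with exact fractions, which avoids any rounding until the final step. A minimal sketch, assuming the two-outcome distribution given above; the dictionary name `lottery` is illustrative.

```python
from fractions import Fraction

# Net winnings x -> p(x), from Exercise 4.32
lottery = {
    -1: Fraction(22_999_999, 23_000_000),       # lose the $1 ticket price
    6_999_999: Fraction(1, 23_000_000),         # win $7 million less the $1 ticket
}

# E(x) = sum of x * p(x)
expected = sum(x * p for x, p in lottery.items())
expected_dollars = round(float(expected), 2)
```

The exact result simplifies to a small fraction, which shows that the loss per ticket comes almost entirely from the ticket price itself.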
4.34
Each point in the system can have one of 2 status levels, "free" or "obstacle". Define the following events:
AF: {Point A is free}    AO: {Point A is obstacle}
BF: {Point B is free}    BO: {Point B is obstacle}
CF: {Point C is free}    CO: {Point C is obstacle}
Thus, the sample points for the space are: AFBFCF, AFBFCO, AFBOCF, AFBOCO, AOBFCF, AOBFCO, AOBOCF, AOBOCO
Since it is stated that the probability of any point in the system having a "free" status is .5, the probability of any point having an "obstacle" status is also .5. Thus, the probability of each of the sample points above is P(AiBiCi) = .5(.5)(.5) = .125.
The values of Y, the number of free links in the system, for each sample point are listed below. A link is free if both the points are free. Thus, a link from A to B is free if A is free and B is free. A link from B to C is free if B is free and C is free.
Sample point    Y    Probability
AFBFCF          2    .125
AFBFCO          1    .125
AFBOCF          0    .125
AFBOCO          0    .125
AOBFCF          1    .125
AOBFCO          0    .125
AOBOCF          0    .125
AOBOCO          0    .125
The probability distribution for Y is:
Y    Probability
0    .625
1    .250
2    .125
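The distribution of Y in Exercise 4.34 can be derived by enumerating the eight sample points directly. This sketch assumes, as in the solution, that each of the points A, B, and C is independently "free" with probability .5; the status codes "F" and "O" and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

P_FREE = 0.5   # probability a point is free (from the exercise)

y_dist = Counter()
for a, b, c in product("FO", repeat=3):       # F = free, O = obstacle
    # A link is free only if both of its end points are free.
    free_links = int(a == "F" and b == "F") + int(b == "F" and c == "F")

    # Probability of this sample point under independence
    prob = 1.0
    for status in (a, b, c):
        prob *= P_FREE if status == "F" else 1 - P_FREE

    y_dist[free_links] += prob                # accumulate P(Y = free_links)
```

Changing `P_FREE` shows how the distribution of Y shifts when the points are more or less likely to be free, since the sample points are then no longer equally likely.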
4.36
a.
x is discrete. It can take on only six values.
b.
This is a binomial distribution.
c.
p(0) = (5 choose 0)(.7)^0(.3)^(5−0) = [5! / (0!5!)](.7)^0(.3)^5 = [(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) / (1 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)](1)(.00243) = .00243
p(1) = (5 choose 1)(.7)^1(.3)^(5−1) = [5! / (1!4!)](.7)^1(.3)^4 = .02835
p(2) = (5 choose 2)(.7)^2(.3)^(5−2) = [5! / (2!3!)](.7)^2(.3)^3 = .1323
p(3) = (5 choose 3)(.7)^3(.3)^(5−3) = [5! / (3!2!)](.7)^3(.3)^2 = .3087
p(4) = (5 choose 4)(.7)^4(.3)^(5−4) = [5! / (4!1!)](.7)^4(.3)^1 = .36015
p(5) = (5 choose 5)(.7)^5(.3)^(5−5) = [5! / (5!0!)](.7)^5(.3)^0 = .16807
d.
μ = np = 5(.7) = 3.5
σ = √(npq) = √(5(.7)(.3)) = 1.0247
e.
μ ± 2σ = 3.5 ± 2(1.0247) ⇒ (1.4506, 5.5494)
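The binomial probabilities in Exercise 4.36 all come from one formula, so they are easy to generate in a loop. A minimal sketch, assuming n = 5 and p = .7 as in the exercise; the function name `binom_pmf` is hypothetical.

```python
import math


def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable: C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 5, 0.7                                  # values from Exercise 4.36
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

mu = n * p                                     # mean np
sigma = math.sqrt(n * p * (1 - p))             # standard deviation sqrt(npq)
interval = (mu - 2 * sigma, mu + 2 * sigma)    # mu ± 2 sigma
```

The list `pmf` is indexed by x, so `pmf[3]` is p(3); summing the list is a quick check that the six probabilities add to 1.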
4.38
a.
p(0) = (3 choose 0)(.3)^0(.7)^(3−0) = [3! / (0!3!)](.3)^0(.7)^3 = [(3 ⋅ 2 ⋅ 1) / (1 ⋅ 3 ⋅ 2 ⋅ 1)](1)(.7)^3 = .343
p(1) = (3 choose 1)(.3)^1(.7)^(3−1) = [3! / (1!2!)](.3)^1(.7)^2 = .441
p(2) = (3 choose 2)(.3)^2(.7)^(3−2) = [3! / (2!1!)](.3)^2(.7)^1 = .189
p(3) = (3 choose 3)(.3)^3(.7)^(3−3) = [3! / (3!0!)](.3)^3(.7)^0 = .027
x    p(x)
0    .343
1    .441
2    .189
3    .027
4.40
a.
P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .167 − .046 = .121 (from Table II, Appendix B)
b.
P(x ≤ 5) = .034
c.
P(x > 1) = 1 − P(x ≤ 1) = 1 − .919 = .081
d.
P(x < 10) = P(x ≤ 9) = 0
e.
P(x ≥ 10) = 1 − P(x ≤ 9) = 1 − .002 = .998
f.
P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .206 − .069 = .137
a.
We will check the 5 characteristics of a binomial random variable.
1. The experiment consists of n = 200 identical trials.
2. There are only two possible outcomes on each trial. Let S = young adult owns a mobile phone with internet access and F = young adult does not own a mobile phone with internet access.
3. The probability of success (S) is the same from trial to trial. For each trial, p = P(S) = .20 and q = 1 − p = 1 − .20 = .80.
4. The trials are independent.
5. The binomial random variable x is the number of young adults in 200 trials that own a mobile phone with internet access.
Thus, x is a binomial random variable.
b.
From the exercise, p = .20. For any young adult, the probability that they own a mobile phone with internet access is .20.
c.
μ = E ( x) = np = 200(.20) = 40 . On the average, for every 200 young people surveyed, 40 will own mobile phones with internet access.
4.44
a.
We will check the 5 characteristics of a binomial random variable. 1. The experiment consists of n = 5 identical trials. We have to assume that the number of bottled water brands is large. 2. There are only 2 possible outcomes for each trial. Let S = brand of bottled water used tap water and F = brand of bottled water did not use tap water. 3. The probability of success (S) is the same from trial to trial. For each trial, p = P(S) = .25 and q = 1 – p = 1 - .25 = .75. 4. The trials are independent. 5. The binomial random variable x is the number of brands in the 5 trials that used tap water. If the total number of brands of bottled water is large, then the above characteristics will be basically true. Thus, x is a binomial random variable.
b.
The formula for the probability distribution for x is p(x) = (5 choose x)(.25)^x(.75)^(5−x), for x = 0, 1, 2, 3, 4, 5.
c.
P(x = 2) = (5 choose 2)(.25)^2(.75)^(5−2) = [5! / (2!3!)](.25)^2(.75)^3 = .2637
d.
P(x ≤ 1) = P(x = 0) + P(x = 1) = (5 choose 0)(.25)^0(.75)^(5−0) + (5 choose 1)(.25)^1(.75)^(5−1)
= [5! / (0!5!)](.25)^0(.75)^5 + [5! / (1!4!)](.25)^1(.75)^4 = .2373 + .3955 = .6328
4.46
a.
In order for x to be a binomial random variable, the n trials must be identical. We can assume that the process of selecting a worker is identical from trial to trial. There are two possible outcomes: a worker missed work due to a back injury or did not. The probability of success must be the same from trial to trial. We can assume that the probability of missing work due to a back injury is constant. The trials must be independent of each other. We can assume that the outcome of one trial will not affect the outcome of any other. Thus, x is a binomial random variable.
b.
From the information given in the problem, the estimate of p is .40.
c.
The mean is μ = E(x) = np = 10(.40) = 4. The standard deviation is \( \sigma = \sqrt{np(1-p)} = \sqrt{10(.40)(.60)} = \sqrt{2.4} = 1.549 \)
d.
Using Table II, Appendix B, with n = 10 and p = .40,
P(x = 1) = P(x ≤ 1) − P(x ≤ 0) = .046 − .006 = .040
P(x > 1) = 1 − P(x ≤ 1) = 1 − .046 = .954
4.48
Let x = number of packets observed by a network sensor in 150 trials. Then x has an approximate binomial distribution with n = 150 and p = .001. The virus will be detected if at least 1 packet is observed.
\( P(x \ge 1) = 1 - P(x = 0) = 1 - \binom{150}{0}(.001)^{0}(.999)^{150-0} = 1 - \frac{150!}{0!\,150!}(.999)^{150} = 1 - .8606 = .1394 \)
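The "at least one" probability above reduces to one minus the probability of no successes. A short sketch of that arithmetic follows; the variable names are illustrative assumptions, and n and p are the values stated in the solution.

```python
n, p = 150, 0.001          # trials and per-trial probability from the solution

p_none = (1 - p) ** n      # P(x = 0): no packet observed in any of the n trials
p_detect = 1 - p_none      # P(x >= 1): the virus is detected

print(round(p_none, 4), round(p_detect, 4))
```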
4.50
a.
We must assume that the trials are identical, the probability of success is constant from trial to trial, and the trials are independent of each other.
b.
From the problem, we estimate p to be .20. Using Table II, Appendix B, with n = 25 and p = .20,
P(x ≤ 10) = .994
c.
E(x) = np = 25(.20) = 5
\( \sigma = \sqrt{np(1-p)} = \sqrt{25(.20)(.80)} = \sqrt{4} = 2 \)
d.
μ ± 2σ ⇒ 5 ± 2(2) ⇒ 5 ± 4 ⇒ (1, 9)
e.
Using Table II, Appendix B, with n = 25 and p = .20,
P(1 < x < 9) = P(x ≤ 8) − P(x ≤ 1) = .953 − .027 = .926
4.52
Assuming the supplier's claim is true,
μ = np = 500(.001) = .5
\( \sigma = \sqrt{npq} = \sqrt{500(.001)(.999)} = \sqrt{.4995} = .707 \)
If the supplier's claim is true, we would only expect to find .5 defective switches in a sample of size 500. Therefore, it is not likely we would find 4. Based on the sample, the guarantee is probably inaccurate.
Note: \( z = \frac{x - \mu}{\sigma} = \frac{4 - .5}{.707} = 4.95 \)
This is an unusually large z-score.
4.54
a.
For this test, n = 20 and p = .10. Then x is a binomial random variable with n = 20 and p = .10. Using Table II, Appendix B, with n = 20 and p = .10,
P(x ≤ 1) = .392
b.
For the experiment in part a, the level of confidence is 1 − P(x ≤ 1) = 1 − .392 = .608. Since this value is not close to 1, this would not be an acceptable level.
c.
Suppose we increased n from 20 to 25. Using Table II, Appendix B, with n = 25 and p = .10,
P(x ≤ 1) = .271. This value is smaller than the value found in part a. Now, suppose we keep n = 20, but change K to 0 instead of 1. Using Table II, Appendix B, with n = 20 and p = .10,
P(x ≤ 0) = .122. This value is, again, smaller than the value found in part a.
d.
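Suppose we let K = 0. Now, we need to find n such that the level of confidence is at least .95, which means that P(x = 0) ≤ .05.
\( P(x = 0) = \binom{n}{0}(.1)^{0}(.9)^{n-0} \le .05 \Rightarrow \frac{n!}{0!\,n!}(.9)^{n} \le .05 \Rightarrow (.9)^{n} \le .05 \)
\( \Rightarrow \ln\left((.9)^{n}\right) \le \ln(.05) \Rightarrow n\ln(.9) \le \ln(.05) \Rightarrow n \ge \frac{\ln(.05)}{\ln(.9)} = \frac{-2.99573}{-.10536} = 28.4 \)
The inequality reverses in the last step because ln(.9) is negative. Since n must be a whole number, if K = 0, then we need a sample size of 29 to get a level of confidence of at least .95 (for n = 28, P(x = 0) = .0523, which is still greater than .05).
Now, suppose K = 1. We need to find n such that the level of confidence is at least .95, which means that P(x ≤ 1) ≤ .05.
\( P(x \le 1) = P(x = 0) + P(x = 1) = \binom{n}{0}(.1)^{0}(.9)^{n-0} + \binom{n}{1}(.1)^{1}(.9)^{n-1} \le .05 \)
\( \Rightarrow \frac{n!}{0!\,n!}(.9)^{n} + \frac{n!}{1!\,(n-1)!}(.1)^{1}(.9)^{n-1} \le .05 \Rightarrow (.9)^{n} + n(.1)(.9)^{n-1} \le .05 \Rightarrow (.9)^{n-1}(.9 + .1n) \le .05 \)
From here, we will use trial and error.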
n     (.9)^(n−1)(.9 + .1n)
30    (.9)^(30−1)(.9 + .1(30)) = .1837
40    (.9)^(40−1)(.9 + .1(40)) = .0805
45    (.9)^(45−1)(.9 + .1(45)) = .0524
46    (.9)^(46−1)(.9 + .1(46)) = .0480
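The trial-and-error search tabulated above can be automated by stepping n upward until P(x ≤ K) first drops to .05 or below. This is a minimal sketch under the exercise's assumption p = .10; the function names `accept_prob` and `smallest_n` are hypothetical helpers, not notation from the text.

```python
from math import comb

def accept_prob(n, k, p=0.10):
    """P(x <= k) for a binomial(n, p) random variable."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def smallest_n(k, p=0.10, alpha=0.05):
    """Smallest sample size n with P(x <= k) <= alpha (confidence >= 1 - alpha)."""
    n = k + 1
    while accept_prob(n, k, p) > alpha:
        n += 1
    return n

for k in (0, 1):
    n = smallest_n(k)
    print(k, n, round(accept_prob(n, k), 4))
```

Because P(x ≤ K) decreases as n grows for fixed K and p, the first n that satisfies the inequality is the minimum sample size.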
Thus, for K = 1, we would need a sample size of 46 to get a level of confidence of at least .95.
4.56
μ = λ = 1.5 Using Table III of Appendix B:
4.58
a.
P(x ≤ 3) = .934
b.
P(x ≥ 3) = 1 − P(x ≤ 2) = 1 − .809 = .191
c.
P(x = 3) = P(x ≤ 3) − P(x ≤ 2) = .934 − .809 = .125
d.
P(x = 0) = .223
e.
P(x > 0) = 1 − P(x = 0) = 1 − .223 = .777
f.
P(x > 6) = 1 − P(x ≤ 6) = 1 − .999 = .001
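The Table III lookups above can be cross-checked by computing the Poisson probabilities directly. The sketch below assumes λ = 1.5 as in this exercise; `poisson_pmf` and `poisson_cdf` are illustrative helper names.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x), summing the pmf from 0 through x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 1.5
print(round(poisson_cdf(3, lam), 3))                        # P(x <= 3)
print(round(1 - poisson_cdf(2, lam), 3))                    # P(x >= 3)
print(round(poisson_cdf(3, lam) - poisson_cdf(2, lam), 3))  # P(x = 3)
print(round(poisson_pmf(0, lam), 3))                        # P(x = 0)
```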
a.
To graph the Poisson probability distribution with λ = 5, we need to calculate p(x) for x = 0 to 15. Using Table III, Appendix B,
p(0) = .007
p(1) = P(x ≤ 1) − P(x ≤ 0) = .040 − .007 = .033
p(2) = P(x ≤ 2) − P(x ≤ 1) = .125 − .040 = .085
p(3) = P(x ≤ 3) − P(x ≤ 2) = .265 − .125 = .140
p(4) = P(x ≤ 4) − P(x ≤ 3) = .440 − .265 = .175
p(5) = P(x ≤ 5) − P(x ≤ 4) = .616 − .440 = .176
p(6) = P(x ≤ 6) − P(x ≤ 5) = .762 − .616 = .146
p(7) = P(x ≤ 7) − P(x ≤ 6) = .867 − .762 = .105
p(8) = P(x ≤ 8) − P(x ≤ 7) = .932 − .867 = .065
p(9) = P(x ≤ 9) − P(x ≤ 8) = .968 − .932 = .036
p(10) = P(x ≤ 10) − P(x ≤ 9) = .986 − .968 = .018
p(11) = P(x ≤ 11) − P(x ≤ 10) = .995 − .986 = .009
p(12) = P(x ≤ 12) − P(x ≤ 11) = .998 − .995 = .003
p(13) = P(x ≤ 13) − P(x ≤ 12) = .999 − .998 = .001
p(14) = P(x ≤ 14) − P(x ≤ 13) = 1.000 − .999 = .001
p(15) = P(x ≤ 15) − P(x ≤ 14) = 1.000 − 1.000 = .000
The graph is shown at right:
4.60
b.
μ = λ = 5
\( \sigma = \sqrt{\lambda} = \sqrt{5} = 2.2361 \)
μ ± 2σ ⇒ 5 ± 2(2.2361) ⇒ 5 ± 4.4722 ⇒ (.5278, 9.4722)
c.
P(.5278 < x < 9.4722) = P(1 ≤ x ≤ 9) = P(x ≤ 9) − P(x = 0) = .968 − .007 = .961
a.
E(x) = μ = λ = 6
\( \sigma = \sqrt{\lambda} = \sqrt{6} = 2.449 \)
b.
\( z = \frac{x - \mu}{\sigma} = \frac{1 - 6}{2.449} = -2.041 \)
c.
Using Table III, Appendix B, with λ = 6,
P(x ≤ 10) = .957
4.62
a.
In the problem, it is stated that E(x) = .03. This is also the value of λ.
σ² = λ = .03
b.
The experiment consists of counting the number of deaths or missing persons in a three-year interval. We must assume that the probability of a death or missing person in a three-year period is the same for any three-year period. We must also assume that the number of deaths or missing persons in any three-year period is independent of the number of deaths or missing persons in any other three-year period.
c.
\( P(x = 1) = \frac{\lambda^{1}e^{-\lambda}}{1!} = \frac{(.03)^{1}e^{-.03}}{1!} = .0291 \)
\( P(x = 0) = \frac{\lambda^{0}e^{-\lambda}}{0!} = \frac{(.03)^{0}e^{-.03}}{0!} = .9704 \)
4.64
a.
Using Table III and λ = 6.2,
P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .054 − .015 = .039
P(x = 6) = P(x ≤ 6) − P(x ≤ 5) = .574 − .414 = .160
P(x = 10) = P(x ≤ 10) − P(x ≤ 9) = .949 − .902 = .047
b.
The plot of the distribution is:
c.
\( \mu = \lambda = 6.2 \), \( \sigma = \sqrt{\lambda} = \sqrt{6.2} = 2.490 \)
μ ± σ ⇒ 6.2 ± 2.49 ⇒ (3.71, 8.69)
μ ± 2σ ⇒ 6.2 ± 2(2.49) ⇒ 6.2 ± 4.98 ⇒ (1.22, 11.18)
μ ± 3σ ⇒ 6.2 ± 3(2.49) ⇒ 6.2 ± 7.47 ⇒ (−1.27, 13.67)
See the plot in part b.
d.
First, we need to find the mean number of customers per hour. If the mean number of customers per 10 minutes is 6.2, then the mean number of customers per hour is 6.2(6) = 37.2 = λ.
\( \mu = \lambda = 37.2 \) and \( \sigma = \sqrt{\lambda} = \sqrt{37.2} = 6.099 \)
μ ± 3σ ⇒ 37.2 ± 3(6.099) ⇒ 37.2 ± 18.297 ⇒ (18.903, 55.497)
Using Chebyshev's Rule, we know at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean. The number 75 is far beyond the 3 standard deviation limit. Thus, it would be very unlikely that more than 75 customers entered the store per hour on Saturdays.
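Chebyshev's Rule only bounds the probability of falling outside 3 standard deviations. If the hourly count is assumed to be Poisson with λ = 37.2, as in this part, the tail probability P(x > 75) can also be computed directly. The following is a hedged sketch; `poisson_tail` is an illustrative helper that builds the pmf with the recurrence p(x) = p(x − 1)·λ/x to avoid large factorials.

```python
from math import exp

def poisson_tail(k, lam):
    """P(X > k) for a Poisson(lam) variable, using p(x) = p(x-1) * lam / x."""
    term = exp(-lam)   # p(0)
    cdf = term
    for x in range(1, k + 1):
        term *= lam / x
        cdf += term
    return 1.0 - cdf

lam = 6.2 * 6          # mean customers per hour, as derived in the solution
print(poisson_tail(75, lam))
```

The resulting probability is many orders of magnitude smaller than the Chebyshev bound of 1/9, which supports the same conclusion.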
4.66
Let x = number of minor flaws in one square foot of a door's surface. Then x has a Poisson distribution with λ = .5.
μ= λ = .5, using Table III, Appendix B: P(fail inspection) = P(2 or more minor flaws in the square foot inspected) = P(x ≥ 2) = 1 − P(x ≤ 1) = 1 − .910 = .090 P(pass inspection) = P(x < 2) = P(x ≤ 1) = .910
4.68
If it takes exactly 5 minutes to wash a car and there are 5 cars in line, it will take 5(5) = 25 minutes to wash these 5 cars. Thus, for anyone to be in line at closing time, more than 1 car must arrive in the final ½ hour. In addition, if on average 10 cars arrive per hour, then an average of 5 cars will arrive per ½ hour (30 minutes). If we let x = number of cars to arrive in ½ hour, then x is a Poisson random variable with λ = 5. P(x > 1) = 1 – P(x ≤ 1) = 1 − .04 = .96 (Using Table III, Appendix B)
Since this probability is so large, it is very likely that someone will be in line at closing time.
4.70
4.72
From Exercise 4.69, \( f(x) = \begin{cases} .04 & 20 \le x \le 45 \\ 0 & \text{otherwise} \end{cases} \)
a.
P(20 ≤ x ≤ 30) = (30 − 20)(.04) = .4
b.
P(20 < x < 30) = (30 − 20)(.04) = .4
c.
P(x ≥ 30) = (45 − 30)(.04) = .6
d.
P(x ≥ 45) = (45 − 45)(.04) = 0
e.
P(x ≤ 40) = (40 − 20)(.04) = .8
f.
P(x < 40) = (40 − 20)(.04) = .8
g.
P(15 ≤ x ≤ 35) = (35 − 20)(.04) = .6
h.
P(21.5 ≤ x ≤ 31.5) = (31.5 − 21.5)(.04) = .4
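Each probability in parts a through h is the area of a rectangle: the part of the interval that overlaps [20, 45], multiplied by the height .04. The sketch below shows that rule; `uniform_prob` is a hypothetical helper, with c = 20 and d = 45 taken from the density above.

```python
def uniform_prob(a, b, c=20.0, d=45.0):
    """P(a <= x <= b) for x uniform on [c, d]; the interval is clipped to [c, d]."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0.0) / (d - c)

print(uniform_prob(20, 30))      # part a
print(uniform_prob(30, 45))      # part c
print(uniform_prob(15, 35))      # part g: only 20 to 35 carries probability
print(uniform_prob(21.5, 31.5))  # part h
```

Clipping the interval explains part g, where the portion from 15 to 20 lies outside the support and contributes nothing.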
From Exercise 4.71, \( f(x) = \begin{cases} \frac{1}{4} & 3 \le x \le 7 \\ 0 & \text{otherwise} \end{cases} \)
a.
\( P(x \ge a) = .6 \Rightarrow (7 - a)\left(\frac{1}{4}\right) = .6 \Rightarrow 7 - a = 2.4 \Rightarrow a = 4.6 \)
b.
\( P(x \le a) = .25 \Rightarrow (a - 3)\left(\frac{1}{4}\right) = .25 \Rightarrow a - 3 = 1 \Rightarrow a = 4 \)
c.
\( P(x \le a) = 1 \Rightarrow (a - 3)\left(\frac{1}{4}\right) = 1 \Rightarrow a - 3 = 4 \Rightarrow a = 7 \)
For any value of a ≥ 7, P(x ≤ a) = 1. Thus, a ≥ 7.
d.
\( P(4 \le x \le a) = .5 \Rightarrow (a - 4)\left(\frac{1}{4}\right) = .5 \Rightarrow a - 4 = 2 \Rightarrow a = 6 \)
4.74
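\( \mu = \frac{c+d}{2} = 10 \Rightarrow c + d = 20 \Rightarrow c = 20 - d \)
\( \sigma = \frac{d-c}{\sqrt{12}} = 1 \Rightarrow d - c = \sqrt{12} \)
Substituting, \( d - (20 - d) = \sqrt{12} \Rightarrow 2d - 20 = \sqrt{12} \Rightarrow 2d = 20 + \sqrt{12} \Rightarrow d = \frac{20 + \sqrt{12}}{2} \Rightarrow d = 11.732 \)
Since c + d = 20 ⇒ c + 11.732 = 20 ⇒ c = 8.268
\( f(x) = \frac{1}{d-c} \) (c ≤ x ≤ d), where \( \frac{1}{d-c} = \frac{1}{11.732 - 8.268} = \frac{1}{3.464} = .289 \)
Therefore, \( f(x) = \begin{cases} .289 & 8.268 \le x \le 11.732 \\ 0 & \text{otherwise} \end{cases} \)
The graph of the probability distribution for x is given here.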
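The two equations for c and d can be solved in closed form: the half-width of the interval is σ√12/2, so c and d sit that far below and above μ. A minimal sketch, with `uniform_bounds` as an illustrative helper name:

```python
from math import sqrt

def uniform_bounds(mu, sigma):
    """Endpoints (c, d) of a uniform distribution with the given mean and std. dev."""
    half_width = sigma * sqrt(12) / 2
    return mu - half_width, mu + half_width

c, d = uniform_bounds(10, 1)
height = 1 / (d - c)   # value of f(x) on [c, d]
print(round(c, 3), round(d, 3), round(height, 3))
```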
4.76
a.
For this problem, c = 0 and d = 1.
\( f(x) = \begin{cases} \frac{1}{d-c} = \frac{1}{1-0} = 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \)
\( \mu = \frac{c+d}{2} = \frac{0+1}{2} = .5 \)
\( \sigma^2 = \frac{(d-c)^2}{12} = \frac{(1-0)^2}{12} = \frac{1}{12} = .0833 \)
b.
P(.2 < x < .4) = (.4 − .2)(1) = .2
c.
P(x > .995) = (1 − .995)(1) = .005. Since the probability of observing a trajectory greater than .995 is so small, we would not expect to see a trajectory exceeding .995.
4.78
a.
For layer 2, let x = amount of loss. Since the amount of loss is random between .01 and .05 million dollars, the uniform distribution for x is:
\( f(x) = \frac{1}{d-c} \) (c ≤ x ≤ d), where \( \frac{1}{d-c} = \frac{1}{.05 - .01} = \frac{1}{.04} = 25 \)
Therefore, \( f(x) = \begin{cases} 25 & .01 \le x \le .05 \\ 0 & \text{otherwise} \end{cases} \)
A graph of the distribution looks like the following:
\( \mu = \frac{c+d}{2} = \frac{.01 + .05}{2} = .03 \)
\( \sigma = \frac{d-c}{\sqrt{12}} = \frac{.05 - .01}{\sqrt{12}} = .0115 \), \( \sigma^2 = (.0115)^2 = .00013 \)
The mean loss for layer 2 is .03 million dollars and the variance of the loss for layer 2 is .00013 million dollars squared.
b.
For layer 6, let x = amount of loss. Since the amount of loss is random between .50 and 1.00 million dollars, the uniform distribution for x is:
\( f(x) = \frac{1}{d-c} \) (c ≤ x ≤ d), where \( \frac{1}{d-c} = \frac{1}{1.00 - .50} = \frac{1}{.50} = 2 \)
Therefore, \( f(x) = \begin{cases} 2 & .50 \le x \le 1.00 \\ 0 & \text{otherwise} \end{cases} \)
A graph of the distribution looks like the following:
\( \mu = \frac{c+d}{2} = \frac{.50 + 1.00}{2} = .75 \)
\( \sigma = \frac{d-c}{\sqrt{12}} = \frac{1.00 - .50}{\sqrt{12}} = .1443 \), \( \sigma^2 = (.1443)^2 = .0208 \)
The mean loss for layer 6 is .75 million dollars and the variance of the loss for layer 6 is .0208 million dollars squared.
c.
A loss of $10,000 corresponds to x = .01. P(x > .01) = 1
A loss of $25,000 corresponds to x = .025.
\( P(x < .025) = (\text{Base})(\text{Height}) = (x - c)\left(\frac{1}{d-c}\right) = (.025 - .01)\left(\frac{1}{.05 - .01}\right) = .015(25) = .375 \)
d.
A loss of $750,000 corresponds to x = .75. A loss of $1,000,000 corresponds to x = 1.
\( P(.75 < x < 1) = (\text{Base})(\text{Height}) = (d - x)\left(\frac{1}{d-c}\right) = (1.00 - .75)\left(\frac{1}{1.00 - .50}\right) = .25(2) = .5 \)
A loss of $900,000 corresponds to x = .90.
\( P(x > .9) = (\text{Base})(\text{Height}) = (d - x)\left(\frac{1}{d-c}\right) = (1.00 - .90)\left(\frac{1}{1.00 - .50}\right) = .10(2) = .20 \)
P(x = .9) = 0
4.80
Let x = cycle availability, where x has a uniform distribution on the interval from 0 to 1.
Mean = \( \mu = \frac{c+d}{2} = \frac{0+1}{2} = .5 \)
Standard deviation = \( \sigma = \frac{d-c}{\sqrt{12}} = \frac{1-0}{\sqrt{12}} = .289 \)
The 10th percentile is that value of x such that 10% of all observations are below it. Let K1 = 10th percentile. \( P(x \le K_1) = (K_1 - 0)\left(\frac{1}{1-0}\right) = K_1 = .10 \)
The lower quartile is that value of x such that 25% of all observations are below it. Let K2 = 25th percentile. \( P(x \le K_2) = (K_2 - 0)\left(\frac{1}{1-0}\right) = K_2 = .25 \)
The upper quartile is that value of x such that 75% of all observations are below it. Let K3 = 75th percentile. \( P(x \le K_3) = (K_3 - 0)\left(\frac{1}{1-0}\right) = K_3 = .75 \)
4.82
a.
b.
\( \mu = \frac{c+d}{2} = \frac{0+1}{2} = .5 \)
\( \sigma = \frac{d-c}{\sqrt{12}} = \frac{1-0}{\sqrt{12}} = .289 \)
\( \sigma^2 = (.289)^2 = .083 \)
c.
P(p > .95) = (1 − .95)(1) = .05 P(p < .95) = (.95 − 0)(1) = .95
d.
The analyst should use a uniform probability distribution with c = .90 and d = .95.
\( f(p) = \begin{cases} \frac{1}{d-c} = \frac{1}{.95 - .90} = \frac{1}{.05} = 20 & .90 \le p \le .95 \\ 0 & \text{otherwise} \end{cases} \)
4.84
4.86
Table IV in the text gives the area between z = 0 and z = z0. In this exercise, the answers may thus be read directly from the table by looking up the appropriate z. a.
P(0 < z < 2.0) = .4772
b.
P(0 < z < 3.0) = .4987
c.
P(0 < z < 1.5) = .4332
d.
P(0 < z < .80) = .2881
a.
P(−1 ≤ z ≤ 1) = A1 + A2 = .3413 + .3413 = .6826
b.
P(−2 ≤ z ≤ 2) = A1 + A2 = .4772 + .4772 = .9544
c.
P(−2.16 < z ≤ 0.55) = A1 + A2 = .4846 + .2088 = .6934
4.88
4.90
d.
P(−.42 < z < 1.96) = P(−.42 ≤ z ≤ 0) + P(0 ≤ z ≤ 1.96) = A1 + A2 = .1628 + .4750 = .6378
e.
P(z ≥ −2.33) = P(−2.33 ≤ z ≤ 0) + P(z ≥ 0) = A1 + A2 = .4901 + .5000 = .9901
f.
P(z < 2.33) = P(z ≤ 0) + P(0 ≤ z ≤ 2.33) = A1 + A2 = .5000 + .4901 = .9901
a.
P(z = 1) = 0, since a single point does not have an area.
b.
P(z ≤ 1) = P(z ≤ 0) + P(0 < z ≤ 1) = A1 + A2 = .5 + .3413 = .8413 (Table IV, Appendix B)
c.
P(z < 1) = P(z ≤ 1) = .8413 (Refer to part b.)
d.
P(z > 1) = 1 − P(z ≤ 1) = 1 − .8413 = .1587 (Refer to part b.)
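The Table IV areas used in these exercises can be checked against the standard normal cumulative distribution function, which can be written with the error function. This is a sketch for verification only; `phi` is an illustrative name, and small differences from the table reflect its rounding to four decimals.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(round(phi(1), 4))                   # P(z <= 1)
print(round(1 - phi(1), 4))               # P(z > 1)
print(round(phi(1) - phi(-1), 4))         # P(-1 <= z <= 1)
print(round(phi(0.55) - phi(-2.16), 4))   # P(-2.16 < z <= 0.55)
```

A table area between 0 and z0 corresponds to `phi(z0) - 0.5` in this notation.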
Using Table IV, Appendix B: a.
P(z ≥ z0) = .05 A1 = .5 − .05 = .4500 Looking up the area .4500 in Table IV gives z0 = 1.645.
b.
P(z ≥ z0) = .025 A1 = .5 − .025 = .4750 Looking up the area .4750 in Table IV gives z0 = 1.96.
c.
P(z ≤ z0) = .025 A1 = .5 − .025 = .4750 Looking up the area .4750 in Table IV gives z = 1.96. Since z0 is to the left of 0, z0 = −1.96.
4.92
4.94
d.
P(z ≥ z0) = .10 A1 = .5 − .1 = .4 Looking up the area .4000 in Table IV gives z0 = 1.28.
e.
P(z > z0) = .10 A1 = .5 − .1 = .4 z0 = 1.28 (same as in d)
a.
z=1
b.
z = −1
c.
z=0
d.
z = −2.5
e.
z = 3
Using Table IV of Appendix B:
a.
To find the probability that x assumes a value more than 2 standard deviations from μ: P(x < μ − 2σ) + P(x > μ + 2σ) = P(z < −2) + P(z > 2) = 2P(z > 2) = 2(.5000 − .4772) = 2(.0228) = .0456 To find the probability that x assumes a value more than 3 standard deviations from μ: P(x < μ − 3σ) + P(x > μ + 3σ) = P(z < −3) + P(z > 3) = 2P(z > 3) = 2(.5000 − .4987) = 2(.0013) = .0026
b.
To find the probability that x assumes a value within 1 standard deviation of its mean: P(μ − σ < x < μ + σ) = P(−1 < z < 1) = 2P(0 < z < 1) = 2(.3413) = .6826
To find the probability that x assumes a value within 2 standard deviations of μ: P(μ − 2σ < x < μ + 2σ) = P(−2 < z < 2) = 2P(0 < z < 2) = 2(.4772) = .9544
c.
To find the value of x that represents the 80th percentile, we must first find the value of z that corresponds to the 80th percentile. P(z < z0) = .80. Thus, A1 + A2 = .80. Since A1 = .50, A2 = .80 − .50 = .30. Using the body of Table IV, z0 = .84. To find x, we substitute the values into the z-score formula:
\( z = \frac{x - \mu}{\sigma} \Rightarrow .84 = \frac{x - 1000}{10} \Rightarrow x = .84(10) + 1000 = 1008.4 \)
To find the value of x that represents the 10th percentile, we must first find the value of z that corresponds to the 10th percentile.
P(z < z0) = .10. Thus, A1 = .50 − .10 = .40. Using the body of Table IV, z0 = −1.28. To find x, we substitute the values into the z-score formula:
\( z = \frac{x - \mu}{\sigma} \Rightarrow -1.28 = \frac{x - 1000}{10} \Rightarrow x = -1.28(10) + 1000 = 987.2 \)
4.96
The random variable x has a normal distribution with μ = 50 and σ = 3. a.
P(x ≤ x0) = .8413 So, A1 + A2 = .8413 Since A1 = .5, A2 = .8413 − .5 = .3413. Look up the area .3413 in the body of Table IV, Appendix B; z0 = 1.0.
To find x0, substitute all the values into the z-score formula:
\( z = \frac{x - \mu}{\sigma} \Rightarrow 1.0 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 + 3(1.0) = 53 \)
b.
P(x > x0) = .025
So, A = .5000 − .025 = .4750
Look up the area .4750 in the body of Table IV, Appendix B; z0 = 1.96.
To find x0, substitute all the values into the z-score formula:
\( z = \frac{x - \mu}{\sigma} \Rightarrow 1.96 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 + 3(1.96) = 55.88 \)
c.
P(x > x0) = .95
So, A1 + A2 = .95. Since A2 = .5, A1 = .95 − .5 = .4500.
Look up the area .4500 in the body of Table IV, Appendix B (since it is exactly between two values, average the z-scores); z0 ≈ −1.645.
To find x0, substitute into the z-score formula:
\( z = \frac{x - \mu}{\sigma} \Rightarrow -1.645 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 - 3(1.645) = 45.065 \)
d.
P(41 ≤ x < x0) = .8630
\( z = \frac{x - \mu}{\sigma} = \frac{41 - 50}{3} = -3 \)
A1 = P(41 ≤ x ≤ μ) = P(−3 ≤ z ≤ 0) = P(0 ≤ z ≤ 3) = .4987
A1 + A2 = .8630. Since A1 = .4987, A2 = .8630 − .4987 = .3643. Look up .3643 in the body of Table IV, Appendix B; z0 = 1.1.
To find x0, substitute into the z-score formula:
\( z = \frac{x - \mu}{\sigma} \Rightarrow 1.1 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 + 3(1.1) = 53.3 \)
e.
P(x < x0) = .10
So A = .5000 − .10 = .4000
Look up area .4000 in the body of Table IV, Appendix B; z0 = 1.28. Since z0 is to the left of 0, z0 = −1.28.
To find x0, substitute all the values into the z-score formula:
\( z = \frac{x - \mu}{\sigma} \Rightarrow -1.28 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 - 1.28(3) = 46.16 \)
f.
P(x > x0) = .01
So A = .5000 − .01 = .4900
Look up area .4900 in the body of Table IV, Appendix B; z0 = 2.33.
To find x0, substitute all the values into the z-score formula:
\( z = \frac{x - \mu}{\sigma} \Rightarrow 2.33 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 + 2.33(3) = 56.99 \)
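The reverse lookups in this exercise (find x0 for a given probability) amount to evaluating the inverse of the normal cumulative distribution. A minimal sketch using Python's standard library `statistics.NormalDist` follows, assuming μ = 50 and σ = 3 as stated; results differ slightly from the table-based answers because Table IV rounds z to two decimals.

```python
from statistics import NormalDist

x = NormalDist(mu=50, sigma=3)

a = x.inv_cdf(0.8413)                  # P(x <= x0) = .8413
b = x.inv_cdf(1 - 0.025)               # P(x > x0) = .025
c = x.inv_cdf(1 - 0.95)                # P(x > x0) = .95
d = x.inv_cdf(x.cdf(41) + 0.8630)      # P(41 <= x < x0) = .8630
e = x.inv_cdf(0.10)                    # P(x < x0) = .10
f = x.inv_cdf(1 - 0.01)                # P(x > x0) = .01

for label, value in zip("abcdef", (a, b, c, d, e, f)):
    print(label, round(value, 2))
```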
4.98
a.
Using Table IV, Appendix B,
\( P(x > 0) = P\left(z > \frac{0 - 5.26}{10}\right) = P(z > -0.526) = .5 + P(-0.53 < z < 0) = .5 + .2019 = .7019 \)
b.
15 − 5.26 ⎞ ⎛ 5 − 5.26
c.
\( P(x < 1) = P\left(z < \frac{1 - 5.26}{10}\right) = P(z < -0.426) = .5 - P(-0.43 < z < 0) = .5 - .1664 = .3336 \)
d.
\( P(x \le -25) = P\left(z \le \frac{-25 - 5.26}{10}\right) = P(z \le -3.026) = .5 - P(-3.03 \le z < 0) = .5 - .4988 = .0012 \)
Since the probability of seeing a win percentage of −25% or anything more unusual is so small (p = .0012), we would conclude that the average casino win percentage is not 5.26%.
4.100
Let x = driver’s head injury rating. The random variable x has a normal distribution with μ = 605 and σ = 185. Using Table IV, Appendix B, a.
b.
700 − 605 ⎞ ⎛ 500 − 605 P (500 < x < 700) = P ⎜
= P ( −1.11 < z < 0) − P ( −0.57 < z < 0) = .3665 − .2157 = .1508 c.
d.
\( P(x < 850) = P\left(z < \frac{850 - 605}{185}\right) = P(z < 1.32) = .5 + P(0 < z < 1.32) = .5 + .4066 = .9066 \)
\( P(x > 1{,}000) = P\left(z > \frac{1{,}000 - 605}{185}\right) = P(z > 2.14) = .5 - P(0 < z < 2.14) = .5 - .4838 = .0162 \)
4.102
a.
Let x = crop yield. The random variable x has a normal distribution with μ = 1,500 and σ = 250.
\( P(x < 1{,}600) = P\left(z < \frac{1{,}600 - 1{,}500}{250}\right) = P(z < .4) = .5 + .1554 = .6554 \) (Using Table IV)
b.
Let x1 = crop yield in first year and x2 = crop yield in second year. If x1 and x2 are independent, then the probability that the farm will lose money for two straight years is:
\( P(x_1 < 1{,}600)\,P(x_2 < 1{,}600) = P\left(z_1 < \frac{1{,}600 - 1{,}500}{250}\right) P\left(z_2 < \frac{1{,}600 - 1{,}500}{250}\right) \)
= P(z1 < .4) P(z2 < .4) = (.5 + .1554)(.5 + .1554) = .6554(.6554) = .4295 (Using Table IV)
c.
\( P(1{,}500 - 2\sigma \le x \le 1{,}500 + 2\sigma) = P\left(\frac{[1{,}500 - 2\sigma] - 1{,}500}{\sigma} \le z \le \frac{[1{,}500 + 2\sigma] - 1{,}500}{\sigma}\right) \)
= P(−2 ≤ z ≤ 2) = 2P(0 ≤ z ≤ 2) = 2(.4772) = .9544 (Using Table IV)
4.104
Let x = wage rate. The random variable x is normally distributed with μ = 16 and σ = 1.25. Using Table IV, Appendix B,
a.
\( P(x > 17.30) = P\left(z > \frac{17.30 - 16}{1.25}\right) = P(z > 1.04) = .5 - P(0 < z < 1.04) = .5 - .3508 = .1492 \)
b.
\( P(x > 17.30) = P\left(z > \frac{17.30 - 16}{1.25}\right) = P(z > 1.04) = .5 - P(0 < z < 1.04) = .5 - .3508 = .1492 \)
c.
P(x ≤ η) = P(x ≥ η) = .5. Thus, μ = η = 16. (Recall from Section 2.4 that in a symmetric distribution, the mean equals the median.)
4.106
a.
The contract will be profitable if total cost, x, is less than $1,000,000.
\( P(x < 1{,}000{,}000) = P\left(z < \frac{1{,}000{,}000 - 850{,}000}{170{,}000}\right) = P(z < .88) = .5 + .3106 = .8106 \)
b.
The contract will result in a loss if total cost, x, exceeds 1,000,000. P(x > 1,000,000) = 1 − P(x < 1,000,000) = 1 − .8106 = .1894
c.
P(x < R) = .99. Find R.
\( P(x < R) = P\left(z < \frac{R - 850{,}000}{170{,}000}\right) = P(z < z_0) = .99 \)
A1 = .99 − .5 = .4900
Looking up the area .4900 in Table IV, z0 = 2.33
\( z_0 = \frac{R - 850{,}000}{170{,}000} \Rightarrow 2.33 = \frac{R - 850{,}000}{170{,}000} \Rightarrow R = 2.33(170{,}000) + 850{,}000 = \$1{,}246{,}100 \)
4.108
a.
Let x = quantity injected per container. The random variable x has a normal distribution with μ = 10 and σ = .2.
\( P(x < 10) = P\left(z < \frac{10 - 10}{.2}\right) = P(z < 0.0) = .5 \)
\( P(x \ge 10) = P\left(z \ge \frac{10 - 10}{.2}\right) = P(z \ge 0.0) = .5 \)
4.110
b.
Since the container needed to be reprocessed, it cost $10. Upon refilling, it contained 10.60 units with a cost of 10.60($20) = $212. Thus, the total cost for filling this container is $10 + $212 = $222. Since the container sells for $230, the profit is $230 − $222 = $8.
c.
Let x = quantity injected per container. The random variable x has a normal distribution with μ = 10.10 and σ = .2. The expected value of x is E(x) = μ = 10.10. The cost of a container with 10.10 units is 10.10($20) = $202. Thus, the expected profit would be the selling price minus the cost or $230 − $202 = $28.
a.
If z is a standard normal random variable, QL = zL is the value of the standard normal distribution which has 25% of the data to the left and 75% to the right. Find zL such that P(z < zL) = .25 A1 = .50 − .25 = .25. Look up the area A1 = .25 in the body of Table IV of Appendix B; zL = −.67 (taking the closest value). If interpolation is used, −.675 would be obtained.
QU = zU is the value of the standard normal distribution which has 75% of the data to the left and 25% to the right. Find zU such that P(z < zU) = .75 A1 + A2 = P(z ≤ 0) + P(0 ≤ z ≤ zU) = .5 + P(0 ≤ z ≤ zU) = .75 Therefore, P(0 ≤ z ≤ zU) = .25. Look up the area .25 in the body of Table IV of Appendix B; zU = .67 (taking the closest value). b.
Recall that the inner fences of a box plot are located 1.5(QU - QL) outside the hinges (QL and QU). To find the lower inner fence, QL − 1.5(QU − QL) = −.67 − 1.5(.67 − (−.67)) = -.67 − 1.5(1.34) = -2.68 (−2.70 if zL = −.675 and zU = +.675) The upper inner fence is: QU + 1.5(QU - QL) = .67 + 1.5(.67 − (−.67)) = .67 + 1.5(1.34) = 2.68 (+2.70 if zL = −.675 and zU = +.675)
c.
Recall that the outer fences of a box plot are located 3(QU − QL) outside the hinges (QL and QU). To find the lower outer fence, QL − 3(QU − QL) = −.67 − 3(.67 − (−.67)) = −.67 − 3(1.34) = -4.69 (−4.725 if zL = −.675 and zU = +.675) The upper outer fence is: QU + 3(QU − QL) = .67 + 3(.67 − (−.67)) = .67 + 3(1.34) = 4.69 (4.725 if zL = −.675 and zU = +.675)
d.
P(z < −2.68) + P(z > 2.68) = 2P(z > 2.68) = 2(.5000 − .4963) (Table IV, Appendix B) = 2(.0037) = .0074 (or 2(.5000 − .4965) = .0070 if −2.70 and 2.70 are used) P(z < −4.69) + P(z > 4.69) = 2P(z > 4.69) ≈ 2(.5000 − .5000) ≈ 0
4.112
4.114
e.
In a normal probability distribution, the probability of an observation being beyond the inner fences is only .0074 and the probability of an observation being beyond the outer fences is approximately zero. Since the probability is so small, there should not be any observations beyond the inner and outer fences. Therefore, they are probably outliers.
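The quartiles, fences, and tail probabilities in this exercise can be reproduced without table rounding. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because it uses the unrounded quartile ±.6745, its fences correspond to the ±2.70 and ±4.725 values given in parentheses in the solution.

```python
from statistics import NormalDist

z = NormalDist()                          # standard normal: mean 0, std. dev. 1
q_l, q_u = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr = q_u - q_l

inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)
outer = (q_l - 3.0 * iqr, q_u + 3.0 * iqr)

# By symmetry, the two-tail probability is twice the lower-tail area.
p_beyond_inner = 2 * z.cdf(inner[0])
p_beyond_outer = 2 * z.cdf(outer[0])

print([round(v, 3) for v in inner], round(p_beyond_inner, 4))
print([round(v, 3) for v in outer], p_beyond_outer)
```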
a.
IQR = QU − QL = 195 − 72 = 123
b.
IQR/s = 123/95 = 1.295
c.
Yes. Since IQR/s is approximately 1.3, this implies that the data are approximately normal.
a.
Using MINITAB, the stem-and-leaf display is:
Stem-and-leaf of X   N = 28
Leaf Unit = 0.10
  5   1   11266
  6   2   1
  8   3   35
 11   4   035
 14   5   039
 14   6   3457
 10   7   346
  7   8   24469
  2   9   47
Since the data are not mound-shaped, the data may not be normally distributed.
b.
Using MINITAB, the descriptive statistics are:
Variable   N    Mean    Median   TrMean   StDev   SE Mean
X          28   5.511   6.100    5.519    2.765   0.5230

Variable   Minimum   Maximum   Q1      Q3
X          1.100     9.700     3.350   8.050
The standard deviation is 2.765.
c.
Using the printout from MINITAB in part b, QL = 3.35, and QU = 8.05. The IQR = QU − QL = 8.05 − 3.35 = 4.7. If the data are normally distributed, then IQR/s ≈ 1.3. For this data, IQR/s = 4.7/2.765 = 1.70. This is a fair amount larger than 1.3, which indicates that the data may not be normally distributed.
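The IQR/s check, together with the proportions of observations within 1, 2, and 3 standard deviations of the mean, can be computed for any data set. The following is a hedged sketch: `normality_checks` is a hypothetical helper, the sample values are made up for illustration (the exercise's raw data are not reproduced here), and the quartile convention of `statistics.quantiles` may differ slightly from MINITAB's.

```python
from statistics import mean, stdev, quantiles

def normality_checks(data):
    """Return (IQR/s, {k: proportion of data within k std. devs. of the mean})."""
    xbar, s = mean(data), stdev(data)
    q1, _, q3 = quantiles(data, n=4)   # quartiles, default "exclusive" method
    within = {
        k: sum(abs(x - xbar) <= k * s for x in data) / len(data)
        for k in (1, 2, 3)
    }
    return (q3 - q1) / s, within

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # made-up illustrative values
ratio, within = normality_checks(sample)
print(round(ratio, 3), within)
```

For approximately normal data the ratio should be near 1.3 and the proportions near .68, .95, and 1.00.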
d.
Using MINITAB, the normal probability plot is:
The data at the extremes are not particularly on a straight line. This indicates that the data are not normally distributed.
4.116
From the normal probability plot, it appears that the data may not be normal. The points with small observed values and the points with large observed values do not fall on the straight line. This implies that the data may not be from a normal distribution.
4.118
a.
We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the fish weights is:
[Figure: histogram of fish weights; x-axis: Weight, y-axis: Frequency]
From the histogram, the data appear to be fairly mound-shaped. This indicates that the data may be normal.
Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:
Descriptive Statistics: Weight
Variable   N     Mean     Median   TrMean   StDev   SE Mean
Weight     144   1049.7   1000.0   1039.4   376.5   31.4

Variable   Minimum   Maximum   Q1      Q3
Weight     173.0     2302.0    804.5   1263.3

x̄ ± s ⇒ 1049.7 ± 376.5 ⇒ (673.2, 1426.2). 98 of the 144 values fall in this interval. The proportion is .68. This is exactly the .68 we would expect if the data were normal.
x̄ ± 2s ⇒ 1049.7 ± 2(376.5) ⇒ 1049.7 ± 753 ⇒ (296.7, 1802.7). 140 of the 144 values fall in this interval. The proportion is .97. This is somewhat larger than the .95 we would expect if the data were normal.
x̄ ± 3s ⇒ 1049.7 ± 3(376.5) ⇒ 1049.7 ± 1129.5 ⇒ (−79.8, 2179.2). 143 of the 144 values fall in this interval. The proportion is .993. This is close to the 1.00 we would expect if the data were normal.
From this method, it appears that the data are normal.
Next, we look at the ratio of the IQR to s. IQR = QU – QL = 1263.3 – 804.5 = 458.8.
\( \frac{IQR}{s} = \frac{458.8}{376.5} = 1.22 \). This is close to the 1.3 we would expect if the data were normal. This method indicates the data are normal.
Finally, using MINITAB, the normal probability plot is:
[Figure: Normal Probability Plot for Weight, ML Estimates - 95% CI; x-axis: Data, y-axis: Percent. ML Estimates: Mean 1049.72, StDev 375.236. Goodness of Fit: AD* 0.793]
Since the data form a fairly straight line, the data appear to be normal.
From the 4 different methods, all indications are that the fish weight data are approximately normal. b.
We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the fish DDT levels is:
[Figure: histogram of fish DDT levels; x-axis: DDT, y-axis: Frequency]
From the histogram, the data appear to be skewed to the right. This indicates that the data may not be normal.
Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:
Descriptive Statistics: DDT
Variable   N     Mean    Median   TrMean   StDev   SE Mean
DDT        144   24.35   7.15     10.38    98.38   8.20

Variable   Minimum   Maximum   Q1     Q3
DDT        0.11      1100.00   3.33   13.00

x̄ ± s ⇒ 24.35 ± 98.38 ⇒ (−74.03, 122.73). 138 of the 144 values fall in this interval. The proportion is .96. This is much greater than the .68 we would expect if the data were normal.
x̄ ± 2s ⇒ 24.35 ± 2(98.38) ⇒ 24.35 ± 196.76 ⇒ (−172.41, 221.11). 142 of the 144 values fall in this interval. The proportion is .986. This is much larger than the .95 we would expect if the data were normal.
x̄ ± 3s ⇒ 24.35 ± 3(98.38) ⇒ 24.35 ± 295.14 ⇒ (−270.79, 319.49). 142 of the 144 values fall in this interval. The proportion is .986. This is somewhat lower than the 1.00 we would expect if the data were normal.
From this method, it appears that the data are not normal.
Next, we look at the ratio of the IQR to s. IQR = QU – QL = 13.00 – 3.33 = 9.67.
\( \frac{IQR}{s} = \frac{9.67}{98.38} = 0.098 \). This is much smaller than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.
Finally, using MINITAB, the normal probability plot is:
[Figure: Normal Probability Plot for DDT, ML Estimates - 95% CI; x-axis: Data, y-axis: Percent. ML Estimates: Mean 24.355, StDev 98.0364. Goodness of Fit: AD* 38.58]
Since the data do not form a straight line, the data are not normal. From the 4 different methods, all indications are that the fish DDT level data are not normal. 4.120
We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the sanitation scores is:
[Figure: Histogram of SCORE; x-axis: SCORE, y-axis: Frequency]
From the histogram, the data appear to be skewed to the left. This indicates that the data are not normal.
Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics for the 169 sanitation scores give x̄ = 94.911 and s = 4.825, with QL = 93 and QU = 98.
x̄ ± s ⇒ 94.911 ± 4.825 ⇒ (90.086, 99.736). 137 of the 169 values fall in this interval. The proportion is .81. This is much larger than the .68 we would expect if the data were normal.
x̄ ± 2s ⇒ 94.911 ± 2(4.825) ⇒ 94.911 ± 9.65 ⇒ (85.261, 104.561). 165 of the 169 values fall in this interval. The proportion is .98. This is somewhat larger than the .95 we would expect if the data were normal.
x̄ ± 3s ⇒ 94.911 ± 3(4.825) ⇒ 94.911 ± 14.475 ⇒ (80.436, 109.386). 166 of the 169 values fall in this interval. The proportion is .982. This is somewhat smaller than the 1.00 we would expect if the data were normal.
From this method, it appears that the data are not normal. Next, we look at the ratio of the IQR to s. IQR = QU − QL = 98 − 93 = 5.
IQR/s = 5/4.825 = 1.036. This is smaller than the 1.3 we would expect if the data were normal. This method indicates the data are not normal. Finally, using MINITAB, the normal probability plot is:
[Figure: Probability Plot of SCORE (Normal, 95% CI). Axes: Percent versus SCORE. Mean = 94.91, StDev = 4.825, N = 169, AD = 7.216, P-Value < 0.005.]
Since the data do not form a straight line, the data are not normal. From the 4 different methods, all indications are that the sanitation scores data are not normal.
4.122
We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the tensile strength values is:
[Figure: Histogram of Strength. Axes: Frequency versus Strength.]
From the histogram, the data appear to be somewhat skewed to the left. This might indicate that the data are not normal. Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

Descriptive Statistics: Strength

Variable     N   N*     Mean   SE Mean   StDev
Strength    11    0   342.13      2.38    7.91

Variable   Minimum       Q1   Median       Q3   Maximum
Strength    328.20   334.70   343.60   347.80    356.30
x̄ ± s ⇒ 342.13 ± 7.91 ⇒ (334.22, 350.04). 8 of the 11 values fall in this interval. The proportion is .73. This is somewhat larger than the .68 we would expect if the data were normal.
x̄ ± 2s ⇒ 342.13 ± 2(7.91) ⇒ 342.13 ± 15.82 ⇒ (326.31, 357.95). All 11 of the 11 values fall in this interval. The proportion is 1.00. This is somewhat larger than the .95 we would expect if the data were normal.
x̄ ± 3s ⇒ 342.13 ± 3(7.91) ⇒ 342.13 ± 23.73 ⇒ (318.40, 365.86). Again, all 11 of the 11 values fall in this interval. The proportion is 1.00. This is equal to the 1.00 we would expect if the data were normal.
From this method, it appears that the data are quite normal. Next, we look at the ratio of the IQR to s. IQR = QU − QL = 347.80 − 334.70 = 13.1.
IQR/s = 13.1/7.91 = 1.656. This is much larger than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.
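The interval check and the IQR/s check used above can be sketched in a few lines of Python. This is only an illustration: the function name `normality_checks` and the `sample` list are hypothetical (the raw tensile strength data are not reproduced in this manual), and the quartiles come from Python's default "exclusive" method, which may differ slightly from MINITAB's in other data sets.

```python
import statistics


def normality_checks(data):
    """Return the share of values within 1, 2 and 3 s of the mean, plus IQR/s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    shares = {}
    for k in (1, 2, 3):
        lo, hi = mean - k * s, mean + k * s
        shares[k] = sum(lo <= v <= hi for v in data) / len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # lower quartile, median, upper quartile
    return shares, (q3 - q1) / s


# Hypothetical measurements, NOT the data set from the exercise.
sample = [328.2, 331.5, 334.7, 337.0, 340.2, 343.6, 345.1, 346.4, 347.8, 351.9, 356.3]
shares, ratio = normality_checks(sample)
print(shares, round(ratio, 3))
```

For approximately normal data the three shares should be near .68, .95 and 1.00, and the ratio should be near 1.3.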
Finally, using MINITAB, the normal probability plot is:
[Figure: Probability Plot of Strength (Normal, 95% CI). Axes: Percent versus Strength. Mean = 342.1, StDev = 7.907, N = 11, AD = 0.154, P-Value = 0.937.]
Since the data do form a fairly straight line, the data could be normal. From the 4 different methods, three of the four indicate that the data probably are not from a normal distribution.
4.124
a. In order to approximate the binomial distribution with the normal distribution, the interval μ ± 3σ ⇒ np ± 3√(npq) should lie in the range 0 to n. When n = 25 and p = .4,
np ± 3√(npq) ⇒ 25(.4) ± 3√(25(.4)(1 − .4)) ⇒ 10 ± 3√6 ⇒ 10 ± 7.3485 ⇒ (2.6515, 17.3485)
Since the interval calculated does lie in the range 0 to 25, we can use the normal approximation.
b. μ = np = 25(.4) = 10, σ² = npq = 25(.4)(.6) = 6
c. P(x ≥ 9) = 1 − P(x ≤ 8) = 1 − .274 = .726 (Table II, Appendix B)
d. P(x ≥ 9) ≈ P(z ≥ ((9 − .5) − 10)/√6) = P(z ≥ −.61) = .5000 + .2291 = .7291 (using Table IV, Appendix B)
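As a cross-check on parts c and d of 4.124, the exact binomial tail and its normal approximation with the .5 continuity correction can be computed directly. This is a minimal sketch; the helper names `binom_tail_exact` and `binom_tail_normal` are illustrative and not from the text, and the software values differ slightly from the table answers because z is not rounded to two decimals.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_tail_exact(n, p, k):
    """Exact P(x >= k) for a binomial(n, p) random variable."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binom_tail_normal(n, p, k):
    """Normal approximation to P(x >= k) with the .5 continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    z = ((k - 0.5) - mu) / sigma
    return 1 - NormalDist().cdf(z)


exact = binom_tail_exact(25, 0.4, 9)
approx = binom_tail_normal(25, 0.4, 9)
print(round(exact, 3), round(approx, 3))
```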
4.126
μ = np = 1000(.5) = 500, σ = √(npq) = √(1000(.5)(.5)) = 15.811
a. Using the normal approximation,
P(x > 500) ≈ P(z > ((500 + .5) − 500)/15.811) = P(z > .03) = .5 − .0120 = .4880 (from Table IV, Appendix B)
b. P(490 ≤ x < 500) ≈ P(((490 − .5) − 500)/15.811 ≤ z < ((500 − .5) − 500)/15.811) = P(−.66 ≤ z < −.03) = .2454 − .0120 = .2334 (from Table IV, Appendix B)
c. P(x > 550) ≈ P(z > ((550 + .5) − 500)/15.811) = P(z > 3.19) ≈ .5 − .5 = 0 (from Table IV, Appendix B)
4.128
a.
E(x) = μ = np = 350(.27) = 94.5.
b. σ = √σ² = √(npq) = √(350(.27)(.73)) = √68.985 = 8.306
c. z = (x − μ)/σ = (99.5 − 94.5)/8.306 = 0.60
d. To see if the normal approximation is appropriate, we use:
μ ± 3σ ⇒ 94.5 ± 3(8.306) ⇒ 94.5 ± 24.918 ⇒ (69.582, 119.418)
Since the interval lies in the range of 0 to 350, the normal approximation is appropriate.
P(x ≥ 100) ≈ P(z ≥ 0.60) = .5 − .2257 = .2743 (using Table IV, Appendix B)
4.130
Let x = number of white-collar employees in good shape who will develop stress related illnesses in a sample of 400. Then x is a binomial random variable with n = 400 and p = .10. To see if the normal approximation is appropriate for this problem:
np ± 3√(npq) ⇒ 400(.1) ± 3√(400(.1)(.9)) ⇒ 40 ± 18 ⇒ (22, 58)
Since this interval is contained in the interval (0, n = 400), the normal approximation is appropriate.
P(x > 60) ≈ P(z > ((60 + .5) − 40)/6) = P(z > 3.42) ≈ .5000 − .5000 = 0
4.132
a. For n = 100 and p = .01:
μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.01) ± 3√(100(.01)(.99)) ⇒ 1 ± 3(.995) ⇒ 1 ± 2.985 ⇒ (−1.985, 3.985)
Since the interval does not lie in the range 0 to 100, we cannot use the normal approximation to approximate the probabilities.
b. For n = 100 and p = .5:
μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.5) ± 3√(100(.5)(.5)) ⇒ 50 ± 3(5) ⇒ 50 ± 15 ⇒ (35, 65)
Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probabilities.
c. For n = 100 and p = .9:
μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.9) ± 3√(100(.9)(.1)) ⇒ 90 ± 3(3) ⇒ 90 ± 9 ⇒ (81, 99)
Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probabilities.
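The μ ± 3σ rule applied in 4.132 is mechanical enough to script. The sketch below (with the illustrative helper name `normal_approx_ok`, not a function from the text) returns whether np ± 3√(npq) stays inside 0 to n, along with the interval itself.

```python
from math import sqrt


def normal_approx_ok(n, p):
    """Check whether np ± 3*sqrt(npq) lies within the range 0 to n."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    return (0 <= lo and hi <= n), (lo, hi)


for p in (0.01, 0.5, 0.9):
    ok, (lo, hi) = normal_approx_ok(100, p)
    print(p, ok, round(lo, 3), round(hi, 3))
```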
4.134
b. Let v = number of credit card users out of 100 who carry Visa. Then v is a binomial random variable with n = 100 and p_v = .539. E(v) = np_v = 100(.539) = 53.9.
Let d = number of credit card users out of 100 who carry Discover. Then d is a binomial random variable with n = 100 and p_d = .040. E(d) = np_d = 100(.040) = 4.0.
c. To see if the normal approximation is valid, we use:
μ ± 3σ ⇒ np_v ± 3√(np_v q_v) ⇒ 100(.539) ± 3√(100(.539)(.461)) ⇒ 53.9 ± 3(4.9848) ⇒ 53.9 ± 14.9544 ⇒ (38.946, 68.854)
Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probability.
P(v ≥ 50) ≈ P(z ≥ ((50 − .5) − 53.9)/4.985) = P(z ≥ −.88) = .5 + .3106 = .8106
Let a = number of credit card users out of 100 who carry American Express. Then a is a binomial random variable with n = 100 and p_a = .132. To see if the normal approximation is valid, we use:
μ ± 3σ ⇒ np_a ± 3√(np_a q_a) ⇒ 100(.132) ± 3√(100(.132)(.868)) ⇒ 13.2 ± 3(3.385) ⇒ 13.2 ± 10.155 ⇒ (3.045, 23.355)
Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probability.
P(a ≥ 50) ≈ P(z ≥ ((50 − .5) − 13.2)/3.385) = P(z ≥ 10.72) ≈ .5 − .5 = 0
d. In order for the normal approximation to be valid, μ ± 3σ must lie in the interval (0, n). This check was done in part c for both portions of the question. In both cases, the normal approximation was justified.
4.136
a. If 80% of the passengers pass through without their luggage being inspected, then 20% will be detained for luggage inspection. The expected number of passengers detained will be: E(x) = np = 1,500(.2) = 300
b. For n = 4,000, E(x) = np = 4,000(.2) = 800
c. P(x > 600) ≈ P(z > ((600 + .5) − 800)/√(4000(.2)(.8))) = P(z > −7.89) = .5 + .5 = 1.0
4.140
E(x) = μ = Σ x p(x) = 1(.2) + 2(.3) + 3(.2) + 4(.2) + 5(.1) = .2 + .6 + .6 + .8 + .5 = 2.7
E(x̄) = Σ x̄ p(x̄) = 1.0(.04) + 1.5(.12) + 2.0(.17) + 2.5(.20) + 3.0(.20) + 3.5(.14) + 4.0(.08) + 4.5(.04) + 5.0(.01) = .04 + .18 + .34 + .50 + .60 + .49 + .32 + .18 + .05 = 2.7
4.144
The sampling distribution is approximately normal only if the sample size is sufficiently large or if the population being sampled from is normal.
4.146
a. μ_x̄ = μ = 10, σ_x̄ = σ/√n = 3/√25 = 0.6
b. μ_x̄ = μ = 100, σ_x̄ = σ/√n = 25/√25 = 5
c. μ_x̄ = μ = 20, σ_x̄ = σ/√n = 40/√25 = 8
d. μ_x̄ = μ = 10, σ_x̄ = σ/√n = 100/√25 = 20
4.148
a. μ_x̄ = μ = 20, σ_x̄ = σ/√n = 16/√64 = 2
b. By the Central Limit Theorem, the distribution of x̄ is approximately normal. In order for the Central Limit Theorem to apply, n must be sufficiently large. For this problem, n = 64 is sufficiently large.
c. z = (x̄ − μ_x̄)/σ_x̄ = (15.5 − 20)/2 = −2.25
d. z = (x̄ − μ_x̄)/σ_x̄ = (23 − 20)/2 = 1.50
4.150
For this population and sample size, E(x̄) = μ = 100, σ_x̄ = σ/√n = 10/√900 = 1/3
a. Approximately 95% of the time, x̄ will be within two standard deviations of the mean, i.e., μ ± 2σ_x̄ ⇒ 100 ± 2(1/3) ⇒ 100 ± 2/3 ⇒ (99.33, 100.67). Almost all of the time, the sample mean will be within three standard deviations of the mean, i.e., μ ± 3σ_x̄ ⇒ 100 ± 3(1/3) ⇒ 100 ± 1 ⇒ (99, 101).
b. No more than three standard deviations, i.e., 3(1/3) = 1.
c. No, the previous answer only depended on the standard deviation of the sampling distribution of the sample mean, not the mean itself.
4.154
a. μ_x̄ = μ = 98,500
b. σ_x̄ = σ/√n = 30,000/√50 = 4,242.6407
c. By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal.
d. z = (x̄ − μ_x̄)/σ_x̄ = (89,500 − 98,500)/4,242.6407 = −2.12
e. P(x̄ > 89,500) = P(z > −2.12) = .5 + .4830 = .9830 (using Table IV, Appendix B)
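The standard-error and z-score steps of 4.154 can be reproduced with Python's standard library. This is a sketch under the same assumption the solution makes (the sampling distribution of x̄ is approximately normal); the helper name `prob_mean_above` is illustrative, and the result differs from the table answer in the fourth decimal because z is not rounded.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_above(x_bar, mu, sigma, n):
    """P(sample mean > x_bar), treating the sample mean as normal with sd sigma/sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return 1 - NormalDist(mu, standard_error).cdf(x_bar)


p = prob_mean_above(89_500, 98_500, 30_000, 50)
print(round(p, 4))
```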
4.156
a. μ_x̄ = μ = 89.34; σ_x̄ = σ/√n = 7.74/√35 = 1.3083
b.
c. P(x̄ > 88) = P(z > (88 − 89.34)/1.3083) = P(z > −1.02) = .5 + .3461 = .8461 (using Table IV, Appendix B)
d. P(x̄ < 87) = P(z < (87 − 89.34)/1.3083) = P(z < −1.79) = .5 − .4633 = .0367 (using Table IV, Appendix B)
4.158
a. Since the sample size is small, we also have to assume that the distribution from which the sample was drawn is normal.
μ_x̄ = μ = 1.8, σ_x̄ = σ/√n = .5/√20 = .1118
P(x̄ ≥ 1.85) = P(z ≥ (1.85 − 1.8)/.1118) = P(z ≥ 0.45) = .5 − .1736 = .3264 (using Table IV, Appendix B)
b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Rough

Variable    N   N*    Mean   SE Mean   StDev   Minimum      Q1   Median      Q3   Maximum
Rough      20    0   1.881     0.117   0.524     1.060   1.303    2.040   2.293     2.640

From this output, the value of x̄ is 1.881.
c. For x̄ = 1.881:
P(x̄ ≥ 1.881) = P(z ≥ (1.881 − 1.8)/.1118) = P(z ≥ 0.72) = .5 − .2642 = .2358
Since this probability is not small, observing a sample mean of x̄ = 1.881 is not unusual. The assumptions in part a appear to be valid.
4.160
If the observations are independent of each other, then
P(1, 1) = p(1)p(1) = .2(.2) = .04 P(1, 2) = p(1)p(2) = .2(.3) = .06 P(1, 3) = p(1)p(3) = .2(.2) = .04 etc.
a.
Possible Sample    x̄     p(x̄)
1, 1               1     .04
1, 2               1.5   .06
1, 3               2     .04
1, 4               2.5   .04
1, 5               3     .02
2, 1               1.5   .06
2, 2               2     .09
2, 3               2.5   .06
2, 4               3     .06
2, 5               3.5   .03
3, 1               2     .04
3, 2               2.5   .06
3, 3               3     .04
3, 4               3.5   .04
3, 5               4     .02
4, 1               2.5   .04
4, 2               3     .06
4, 3               3.5   .04
4, 4               4     .04
4, 5               4.5   .02
5, 1               3     .02
5, 2               3.5   .03
5, 3               4     .02
5, 4               4.5   .02
5, 5               5     .01

Summing the probabilities, the probability distribution of x̄ is:

x̄      1     1.5   2     2.5   3     3.5   4     4.5   5
p(x̄)   .04   .12   .17   .20   .20   .14   .08   .04   .01
b.
c. P(x̄ ≥ 4.5) = .04 + .01 = .05
d. No. The probability of observing x̄ = 4.5 or larger is small (.05).
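The sampling distribution of x̄ in 4.160 can also be built by enumerating all 25 ordered samples of size 2. The sketch below assumes, as the solution does, that the two observations are independent; the variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Population distribution p(x) from the exercise.
p = {1: 0.2, 2: 0.3, 3: 0.2, 4: 0.2, 5: 0.1}

# Accumulate P(first, second) = p(first) * p(second) by sample mean.
dist = defaultdict(float)
for first, second in product(p, repeat=2):
    dist[(first + second) / 2] += p[first] * p[second]

expected = sum(mean * prob for mean, prob in dist.items())
tail = sum(prob for mean, prob in dist.items() if mean >= 4.5)

for mean in sorted(dist):
    print(mean, round(dist[mean], 2))
print(round(expected, 2), round(tail, 2))
```

The printed distribution matches the table in part a, E(x̄) agrees with the value found in 4.140, and the tail probability matches part c.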
4.162
For n = 36, μ_x̄ = μ = 406 and σ_x̄ = σ/√n = 10.1/√36 = 1.6833. By the Central Limit Theorem, the sampling distribution is approximately normal (n is large).
P(x̄ ≤ 400.8) = P(z ≤ (400.8 − 406)/1.6833) = P(z ≤ −3.09) = .5 − .4990 = .0010 (using Table IV, Appendix B)
The first. If the true value of μ is 406, it would be extremely unlikely to observe an x̄ as small as 400.8 or smaller (probability .0010). Thus, we would infer that the true value of μ is less than 406.
4.164
a. This experiment consists of 100 trials. Each trial results in one of two outcomes: chip is defective or not defective. If the number of chips produced in one hour is much larger than 100, then we can assume the probability of a defective chip is the same on each trial and that the trials are independent. Thus, x is a binomial random variable. If, however, the number of chips produced in an hour is not much larger than 100, the trials would not be independent. Then x would not be a binomial random variable.
b. This experiment consists of two trials. Each trial results in one of two outcomes: applicant qualified or not qualified. However, the trials are not independent. The probability of selecting a qualified applicant on the first trial is 3 out of 5. The probability of selecting a qualified applicant on the second trial depends on what happened on the first trial. Thus, x is not a binomial random variable. It is a hypergeometric random variable.
c. The number of trials is not a specified number in this experiment, thus x is not a binomial random variable. In this experiment, x is counting the number of calls received.
d. The number of trials in this experiment is 1000. Each trial can result in one of two outcomes: favor state income tax or not favor state income tax. Since 1000 is small compared to the number of registered voters in Florida, the probability of selecting a voter in favor of the state income tax is the same from trial to trial, and the trials are independent of each other. Thus, x is a binomial random variable.
4.166
a. μ = Σ x p(x) = 10(.2) + 12(.3) + 18(.1) + 20(.4) = 15.4
σ² = Σ (x − μ)² p(x) = (10 − 15.4)²(.2) + (12 − 15.4)²(.3) + (18 − 15.4)²(.1) + (20 − 15.4)²(.4) = 18.44
σ = √18.44 ≈ 4.294
b. P(x < 15) = p(10) + p(12) = .2 + .3 = .5
c. μ ± 2σ = 15.4 ± 2(4.294) ⇒ (6.812, 23.988)
d. P(6.812 < x < 23.988) = .2 + .3 + .1 + .4 = 1.0
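The mean, variance and μ ± 2σ calculations in 4.166 follow directly from the definitions and can be verified with a short script. This is a sketch only; the dictionary `dist` simply holds the p(x) values given in the exercise.

```python
from math import sqrt

# Probability distribution p(x) from the exercise.
dist = {10: 0.2, 12: 0.3, 18: 0.1, 20: 0.4}

mu = sum(x * p for x, p in dist.items())
var = sum((x - mu) ** 2 * p for x, p in dist.items())
sigma = sqrt(var)

# Probability that x falls strictly inside mu ± 2 sigma.
lo, hi = mu - 2 * sigma, mu + 2 * sigma
within = sum(p for x, p in dist.items() if lo < x < hi)

print(round(mu, 2), round(var, 2), round(sigma, 3), round(within, 2))
```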
4.168
Using Table III, Appendix B,
a. When λ = 2, p(3) = P(x ≤ 3) − P(x ≤ 2) = .857 − .677 = .180
b. When λ = 1, p(4) = P(x ≤ 4) − P(x ≤ 3) = .996 − .981 = .015
c. When λ = .5, p(2) = P(x ≤ 2) − P(x ≤ 1) = .986 − .910 = .076
4.170
a. f(x) = 1/(d − c) = 1/(90 − 10) = 1/80 for 10 ≤ x ≤ 90; f(x) = 0 otherwise
b. μ = (c + d)/2 = (10 + 90)/2 = 50
σ = (d − c)/√12 = (90 − 10)/√12 = 23.094011
c. The interval μ ± 2σ ⇒ 50 ± 2(23.094) ⇒ 50 ± 46.188 ⇒ (3.812, 96.188) is indicated on the graph.
d. P(x ≤ 60) = Base(height) = (60 − 10)(1/80) = 5/8 = .625
e. P(x ≥ 90) = 0
f. P(x ≤ 80) = Base(height) = (80 − 10)(1/80) = 7/8 = .875
g. P(μ − σ ≤ x ≤ μ + σ) = P(50 − 23.094 ≤ x ≤ 50 + 23.094) = P(26.906 ≤ x ≤ 73.094) = Base(height) = (73.094 − 26.906)(1/80) = 46.188/80 = .577
h. P(x > 75) = Base(height) = (90 − 75)(1/80) = 15/80 = .1875
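Each probability in 4.170 is a rectangle area, base times height. A small helper makes the pattern explicit; `uniform_prob` is an illustrative name, and its defaults are the endpoints c = 10 and d = 90 from this exercise.

```python
from math import sqrt


def uniform_prob(a, b, c=10, d=90):
    """P(a <= x <= b) for x uniform on [c, d]; the interval is clipped to [c, d]."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0) / (d - c)


mu = (10 + 90) / 2
sigma = (90 - 10) / sqrt(12)

print(uniform_prob(10, 60), uniform_prob(10, 80), uniform_prob(75, 90))
print(round(uniform_prob(mu - sigma, mu + sigma), 3))
```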
4.172
a.
P(z ≤ z0) = .5080 ⇒ P(0 ≤ z ≤ z0) = .5080 − .5 = .0080 Looking up the area .0080 in Table IV, ⇒ z0 = .02
b.
P(z ≥ z0) = .5517 ⇒ P(z0 ≤ z ≤ 0) = .5517 − .5 = .0517 Looking up the area .0517 in Table IV, z0 = −.13.
c.
P(z ≥ z0) = .1492 ⇒ P(0 ≤ z ≤ z0) = .5 − .1492 = .3508 Looking up the area .3508 in Table IV, ⇒ z0 = 1.04
d.
P(z0 ≤ z ≤ .59) = .4773 ⇒ P(z0 ≤ z ≤ 0) + P(0 ≤ z ≤ .59) = .4773 P(0 ≤ z ≤ .59) = .2224 Thus, P(z0 ≤ z ≤ 0) = .4773 − .2224 = .2549 Looking up the area .2549 in Table IV, z0 = -.69
4.174
μ = np = 100(.5) = 50, σ = √(npq) = √(100(.5)(.5)) = 5
a. P(x ≤ 48) = P(z ≤ ((48 + .5) − 50)/5) = P(z ≤ −.30) = .5 − .1179 = .3821
b. P(50 ≤ x ≤ 65) = P(((50 − .5) − 50)/5 ≤ z ≤ ((65 + .5) − 50)/5) = P(−.10 ≤ z ≤ 3.10) = .0398 + .5000 = .5398
c. P(x ≥ 70) = P(z ≥ ((70 − .5) − 50)/5) = P(z ≥ 3.90) = .5 − .5 = 0
d. P(55 ≤ x ≤ 58) = P(((55 − .5) − 50)/5 ≤ z ≤ ((58 + .5) − 50)/5) = P(.90 ≤ z ≤ 1.70) = P(0 ≤ z ≤ 1.70) − P(0 ≤ z ≤ .90) = .4554 − .3159 = .1395
e. P(x = 62) = P(((62 − .5) − 50)/5 ≤ z ≤ ((62 + .5) − 50)/5) = P(2.30 ≤ z ≤ 2.50) = P(0 ≤ z ≤ 2.50) − P(0 ≤ z ≤ 2.30) = .4938 − .4893 = .0045
f. P(x ≤ 49 or x ≥ 72) = P(z ≤ ((49 + .5) − 50)/5) + P(z ≥ ((72 − .5) − 50)/5) = P(z ≤ −.10) + P(z ≥ 4.30) = (.5 − .0398) + (.5 − .5) = .4602
4.176
a. First we must compute μ and σ. The probability distribution for x is:

x      1    2    3    4
p(x)   .3   .2   .2   .3

μ = E(x) = Σ x p(x) = 1(.3) + 2(.2) + 3(.2) + 4(.3) = 2.5
σ² = E[(x − μ)²] = Σ (x − μ)² p(x) = (1 − 2.5)²(.3) + (2 − 2.5)²(.2) + (3 − 2.5)²(.2) + (4 − 2.5)²(.3) = 1.45
μ_x̄ = μ = 2.5, σ_x̄ = σ/√n = √1.45/√40 = .1904
b. By the Central Limit Theorem, the distribution of x̄ is approximately normal. The sample size, n = 40, is sufficiently large. Our answer does depend on n. If n is not sufficiently large, the Central Limit Theorem would not apply.
4.180
a. In order to be a binomial random variable, the five characteristics must hold.
1. For this problem, there are 5 items scanned. We will assume that these 5 trials are identical.
2. For each item scanned, there are 2 possible outcomes: priced incorrectly (S) or priced correctly (F).
3. The probability of being priced incorrectly remains constant from trial to trial. For this problem, we will assume that the probability of being priced incorrectly is P(S) = 1/30 for each trial.
4. We will assume that whether one item is priced incorrectly is independent of any other.
5. The random variable x is the number of items priced incorrectly in 5 trials.
Thus, x is a binomial random variable.
b. The estimate of p, the probability of an item being priced incorrectly, is 1/30.
c. P(x = 1) = C(5, 1)(1/30)¹(29/30)⁴ = .1455
d. P(x ≥ 1) = 1 − P(x = 0) = 1 − C(5, 0)(1/30)⁰(29/30)⁵ = 1 − .8441 = .1559
e. Let x = number of items with incorrect prices in 10,000 trials. Thus, x is a binomial random variable with n = 10,000 and p = 1/30 = .033.
μ ± 3σ ⇒ np ± 3√(npq) ⇒ 10,000(.033) ± 3√(10,000(.033)(.967)) ⇒ 330 ± 3√319.11 ⇒ 330 ± 3(17.864) ⇒ 330 ± 53.591 ⇒ (276.409, 383.591)
Since the interval lies in the range 0 to 10,000, we can use the normal approximation to approximate the probabilities.
P(x ≥ 100) ≈ P(z ≥ ((100 − .5) − 330)/17.864) = P(z ≥ −12.90) = P(−12.90 ≤ z < 0) + .5 ≈ .5 + .5 = 1 (using Table IV, Appendix B)
f. Let x = number of items with incorrect prices in 100 trials. Thus, x is a binomial random variable with n = 100 and p = 1/30 = .033.
μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.033) ± 3√(100(.033)(.967)) ⇒ 3.3 ± 3√3.191 ⇒ 3.3 ± 3(1.786) ⇒ 3.3 ± 5.358 ⇒ (−2.058, 8.658)
Since the interval does not lie in the range 0 to 100, the normal approximation will not be appropriate.
4.182
a. Using Table IV, Appendix B, with μ = 8.72 and σ = 1.10,
P(x < 6) = P(z < (6 − 8.72)/1.10) = P(z < −2.47) = .5 − .4932 = .0068
Thus, approximately .68% of the games would result in fewer than 6 hits.
b. The probability of observing fewer than 6 hits in a game is p = .0068. The probability of observing 0 hits would be even smaller. Thus, it would be extremely unusual to observe a no hitter.
4.184
a. Using Table III, Appendix B, with λ = 1, P(x = 3) = P(x ≤ 3) − P(x ≤ 2) = .981 − .920 = .061
b. P(x > 2) = 1 − P(x ≤ 2) = 1 − .920 = .080
4.186
a.
Let x = number of employees who have a drug problem in 1,000 trials. Then x is a binomial random variable with n = 1,000 and p = .052. E(x) = np = 1,000(.052) = 52
b.
Let x = number of employees who have an alcohol problem in 10 trials. Then x is a binomial random variable with n = 10 and p = .085.
P(x ≥ 1) = 1 − P(x = 0) = 1 − C(10, 0)(.085)⁰(.915)¹⁰ = 1 − [10!/(0!(10 − 0)!)](.085)⁰(.915)¹⁰ = 1 − .4113 = .5887
P(x = 2) = C(10, 2)(.085)²(.915)⁸ = [10!/(2!(10 − 2)!)](.085)²(.915)⁸ = .1597
c. We had to assume that the probability of an employee having a substance abuse problem was constant from trial to trial and that the trials were independent.
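The binomial probabilities in 4.186 b can be confirmed with the binomial formula written out in code. This is a minimal sketch; `binom_pmf` is an illustrative helper built on `math.comb`, not a function referenced by the text.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(x = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


at_least_one = 1 - binom_pmf(0, 10, 0.085)
exactly_two = binom_pmf(2, 10, 0.085)
print(round(at_least_one, 4), round(exactly_two, 4))
```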
4.188
Let x = demand for white bread. Then x is a normal random variable with μ = 7200 and σ = 300.
a. P(x ≤ x0) = .94. Find x0.
P(x ≤ x0) = P(z ≤ (x0 − 7200)/300) = P(z ≤ z0) = .94
A1 = .94 − .50 = .4400
Using Table IV and area .4400, z0 = 1.555.
z0 = (x0 − 7200)/300 ⇒ 1.555 = (x0 − 7200)/300 ⇒ x0 = 7666.5 ≈ 7667
b. If the company produces 7,667 loaves, the company will be left with more than 500 loaves if the demand is less than 7,667 − 500 = 7167.
P(x < 7167) = P(z < (7167 − 7200)/300) = P(z < −.11) = .5 − .0438 = .4562 (from Table IV, Appendix B)
Thus, on 45.62% of the days the company will be left with more than 500 loaves.
4.190
Let x = number of inches a gouge is from one end of the spindle. Then x has a uniform distribution with f(x) as follows:
f(x) = 1/(d − c) = 1/(18 − 0) = 1/18 for 0 ≤ x ≤ 18; f(x) = 0 otherwise
In order to get at least 14 consecutive inches without a gouge, the gouge must be within 4 inches of either end. Thus, we must find:
P(x < 4) + P(x > 14) = (4 − 0)(1/18) + (18 − 14)(1/18) = 4/18 + 4/18 = 8/18 = .4444
4.192
a. μ_x̄ = μ = 3.5, σ_x̄ = σ/√n = .5/√100 = .05
b. P(3.40 < x̄ < 3.60) = P((3.40 − 3.5)/.05 < z < (3.60 − 3.5)/.05) = P(−2.00 < z < 2.00) = 2(.4772) = .9544 (using Table IV, Appendix B)
c. P(x̄ > 3.62) = P(z > (3.62 − 3.5)/.05) = P(z > 2.40) = .5 − .4918 = .0082 (using Table IV, Appendix B)
d. μ_x̄ = μ = 3.5, σ_x̄ = σ/√n = .5/√200 = .03536
The mean of the sampling distribution of x̄ would stay the same, but the standard deviation would decrease.
P(3.40 < x̄ < 3.60) = P((3.40 − 3.5)/.03536 < z < (3.60 − 3.5)/.03536) = P(−2.83 < z < 2.83) = 2(.4977) = .9954 (using Table IV, Appendix B)
P(x̄ > 3.62) = P(z > (3.62 − 3.5)/.03536) = P(z > 3.39) ≈ .5 − .5 = 0 (using Table IV, Appendix B)
This probability is smaller than when the sample size was 100.
4.194
a. Let p1 = probability of an error = 1/100 = .01 and p2 = probability of an error resulting in a significant problem = 1/500 = .002. Let x = number of errors in 60,000 trials. Then E(x) = μ1 = np1 = 60,000(.01) = 600.
b. Let y = number of significant errors in 60,000 trials. Then E(y) = μ2 = np2 = 60,000(.002) = 120.
σ² = np2q2 = 60,000(.002)(.998) = 119.76
σ = √119.76 = 10.94
μ2 ± 3σ ⇒ 120 ± 3(10.94) ⇒ 120 ± 32.82 ⇒ (87.18, 152.82)
Using Chebyshev's Rule, at least 88.9% of the observations will fall within 3 standard deviations of the mean. We would expect the number of significant errors to fall between 87 and 153.
c. We must assume that the trials are independent and that the probability of a significant error is constant from trial to trial.
4.196
a. By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal since n > 30, with
μ_x̄ = μ = 840 and σ_x̄ = σ/√n = 15/√50 = 2.1213
b. P(x̄ ≤ 830) = P(z ≤ (830 − 840)/2.1213) = P(z ≤ −4.71) ≈ .5 − .5 = 0
c. Since the probability of observing a mean of 830 or less is extremely small (≈ 0) if the true mean is 840, we would tend to believe that the mean is not 840, but something less.
d. By the Central Limit Theorem, the sampling distribution of x̄ is approximately normal since n > 30, with
μ_x̄ = μ = 840 and σ_x̄ = σ/√n = 45/√50 = 6.3640
P(x̄ ≤ 830) = P(z ≤ (830 − 840)/6.3640) = P(z ≤ −1.57) ≈ .5 − .4418 = .0582
4.198
Let x = length of time a bus is late. Then x is a uniform random variable with probability distribution:
f(x) = 1/20 for 0 ≤ x ≤ 20; f(x) = 0 otherwise
a. μ = (0 + 20)/2 = 10
b. P(x ≥ 19) = (20 − 19)(1/20) = 1/20 = .05
c.
It would be doubtful that the director's claim is true, since the probability of the bus being more than 19 minutes late is so small.
The Furniture Fire Case (To accompany Chapters 3–4)
Using the entire data set of 3,005 invoices as the population, the mean profit margin is 48.901% and the standard deviation is 13.8291%. If a random sample is selected from this population, the sampling distribution of the sample mean (x̄) is approximately normal with a mean of 48.901% and a standard deviation of 13.8291%/√n by the Central Limit Theorem. If a random sample of 253 invoices is selected, then the probability of obtaining a sample mean of 50.8% or higher is:
P(x̄ ≥ 50.8) = P(z ≥ (50.8 − 48.901)/(13.8291/√253)) = P(z ≥ 2.18) = .5 − .4854 = .0146
Since the probability of obtaining a sample mean of 50.8% or higher from this population is extremely small (.0146), we would conclude that there is evidence of fraud.
If we look at the two samples separately, the evidence becomes even more damning. For the sample of 134 invoices, the probability of obtaining a sample mean of 50.6% or higher is:
P(x̄1 ≥ 50.6) = P(z ≥ (50.6 − 48.901)/(13.8291/√134)) = P(z ≥ 1.42) = .5 − .4222 = .0778
For the sample of 119 invoices, the probability of obtaining a sample mean of 51.0% or higher is:
P(x̄2 ≥ 51.0) = P(z ≥ (51.0 − 48.901)/(13.8291/√119)) = P(z ≥ 1.66) = .5 − .4515 = .0485
The probability of observing one sample mean of 50.6% or higher AND a second sample mean of 51.0% or higher is:
P(x̄1 ≥ 50.6, x̄2 ≥ 51.0) = .0778(.0485) = .0038
Again, since the probability of obtaining two sample means of 50.6% or higher and 51.0% or higher from this population is extremely small (.0038), we would conclude that there is evidence of fraud.
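The three tail probabilities in the case can be recomputed as a check. The sketch below uses the population mean and standard deviation quoted above, treats the sample mean as normal, and (as the solution does when it multiplies the two probabilities) assumes the two samples are independent. The names `upper_tail`, `MU` and `SIGMA` are illustrative, and the results differ slightly from the table-based values because z is not rounded.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 48.901, 13.8291  # population mean and standard deviation (percent)


def upper_tail(x_bar, n):
    """P(sample mean >= x_bar) for a sample of n invoices."""
    z = (x_bar - MU) / (SIGMA / sqrt(n))
    return 1 - NormalDist().cdf(z)


p_all = upper_tail(50.8, 253)
p1 = upper_tail(50.6, 134)
p2 = upper_tail(51.0, 119)
joint = p1 * p2  # valid only if the two samples are independent

print(round(p_all, 4), round(p1, 4), round(p2, 4), round(joint, 4))
```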
Chapter 5
Inferences Based on a Single Sample: Estimation with Confidence Intervals
5.2
a. zα/2 = 1.96, using Table IV, Appendix B, P(0 ≤ z ≤ 1.96) = .4750. Thus, α/2 = .5000 − .4750 = .025, α = 2(.025) = .05, and 1 − α = 1 − .05 = .95. The confidence level is 100% × .95 = 95%.
b. zα/2 = 1.645, using Table IV, Appendix B, P(0 ≤ z ≤ 1.645) = .45. Thus, α/2 = .50 − .45 = .05, α = 2(.05) = .1, and 1 − α = 1 − .1 = .90. The confidence level is 100% × .90 = 90%.
c. zα/2 = 2.575, using Table IV, Appendix B, P(0 ≤ z ≤ 2.575) = .495. Thus, α/2 = .500 − .495 = .005, α = 2(.005) = .01, and 1 − α = 1 − .01 = .99. The confidence level is 100% × .99 = 99%.
d. zα/2 = 1.282, using Table IV, Appendix B, P(0 ≤ z ≤ 1.282) = .4. Thus, α/2 = .5 − .4 = .1, α = 2(.1) = .2, and 1 − α = 1 − .2 = .80. The confidence level is 100% × .80 = 80%.
e. zα/2 = .99, using Table IV, Appendix B, P(0 ≤ z ≤ .99) = .3389. Thus, α/2 = .5000 − .3389 = .1611, α = 2(.1611) = .3222, and 1 − α = 1 − .3222 = .6778. The confidence level is 100% × .6778 = 67.78%.
5.4
a. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
x̄ ± z.025 (s/√n) ⇒ 25.9 ± 1.96(2.7/√90) ⇒ 25.9 ± .56 ⇒ (25.34, 26.46)
b. For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The confidence interval is:
x̄ ± z.05 (s/√n) ⇒ 25.9 ± 1.645(2.7/√90) ⇒ 25.9 ± .47 ⇒ (25.43, 26.37)
c. For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The confidence interval is:
x̄ ± z.005 (s/√n) ⇒ 25.9 ± 2.58(2.7/√90) ⇒ 25.9 ± .73 ⇒ (25.17, 26.63)
5.6
If we were to repeatedly draw samples from the population and form the interval x̄ ± 1.96σ_x̄ each time, approximately 95% of the intervals would contain μ. We have no way of knowing whether our interval estimate is one of the 95% that contain μ or one of the 5% that do not.
5.8
a. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
x̄ ± z.025 (s/√n) ⇒ 33.9 ± 1.96(3.3/√100) ⇒ 33.9 ± .647 ⇒ (33.253, 34.547)
b. x̄ ± z.025 (s/√n) ⇒ 33.9 ± 1.96(3.3/√400) ⇒ 33.9 ± .323 ⇒ (33.577, 34.223)
c. For part a, the width of the interval is 2(.647) = 1.294. For part b, the width of the interval is 2(.323) = .646. When the sample size is quadrupled, the width of the confidence interval is halved.
5.10
a. A point estimate for the average number of latex gloves used per week by all healthcare workers with latex allergy is x̄ = 19.3.
b. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
x̄ ± zα/2 (s/√n) ⇒ 19.3 ± 1.96(11.9/√46) ⇒ 19.3 ± 3.44 ⇒ (15.86, 22.74)
c. We are 95% confident that the true average number of latex gloves used per week by all healthcare workers with a latex allergy is between 15.86 and 22.74.
d. The conditions required for the interval to be valid are:
a. The sample selected was randomly selected from the target population.
b. The sample size is sufficiently large, i.e., n > 30.
5.12
a.
The point estimate for the mean charitable commitment of tax-exempt organizations is x = 74.9667.
b.
From the printout, the 95% confidence interval is (68.2371, 81.6962).
c.
The probability of estimating the true mean charitable commitment with a single number is 0. By estimating the true mean charitable commitment with an interval, we can be pretty confident that the true mean is in the interval.
5.14
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: r

Variable    N     Mean   Median   TrMean    StDev   SE Mean
r          34   0.4224   0.4300   0.4310   0.1998    0.0343

Variable   Minimum   Maximum       Q1       Q3
r          -0.0800    0.7400   0.2925   0.6000

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
x̄ ± zα/2 (s/√n) ⇒ .4224 ± 1.96(.1998/√34) ⇒ .4224 ± .0672 ⇒ (.3552, .4895)
We are 95% confident that the mean value of r is between .3552 and .4895.
5.16
a.
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Rate

Variable    N    Mean   Median   TrMean   StDev   SE Mean
Rate       30   79.73    80.00    80.15    5.96      1.09

Variable   Minimum   Maximum      Q1      Q3
Rate         60.00     90.00   76.75   84.00

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The confidence interval is:
x̄ ± zα/2 (s/√n) ⇒ 79.73 ± 1.645(5.96/√30) ⇒ 79.73 ± 1.79 ⇒ (77.94, 81.52)
b.
We are 90% confident that the mean participation rate for all companies that have 401(k) plans is between 77.94% and 81.52%.
c.
We must assume that the sample size (n = 30) is sufficiently large so that the Central Limit Theorem applies.
d.
Yes. Since 71% is not included in the 90% confidence interval, it can be concluded that this company's participation rate is lower than the population mean.
e.
The center of the confidence interval is \(\bar{x}\). If 60% is changed to 80%, the value of \(\bar{x}\) will increase, so the center of the interval will be larger. The value of \(s^2\) will decrease if 60% is replaced by 80%, thus causing the width of the interval to decrease.
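The large-sample interval used in Exercises 5.14 and 5.16 can be checked with a few lines of Python. This is only a sketch: the function name `z_interval` is hypothetical, and the critical value is passed in from Table IV rather than computed.

```python
import math

def z_interval(xbar, s, n, z):
    """Return (lower, upper) for xbar ± z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Summary statistics from the MINITAB printout for Exercise 5.16.
# z = 1.645 is the Table IV value for a .90 confidence coefficient.
lower, upper = z_interval(xbar=79.73, s=5.96, n=30, z=1.645)
print(f"90% CI: ({lower:.2f}, {upper:.2f})")  # 90% CI: (77.94, 81.52)

# Part d: 71% lies outside the interval.
print(lower <= 71 <= upper)  # False
```

Passing the tabled critical value keeps the result identical to the hand calculation; a value computed to more decimal places would differ slightly in the last digit.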
5.18
a.
Using MINITAB, I generated 30 random numbers using the uniform distribution from 1 to 308. The random numbers were:

9, 15, 19, 36, 46, 47, 63, 73, 90, 92, 108, 112, 117, 127, 144, 145, 150, 151, 172, 178, 218, 229, 230, 241, 242, 246, 252, 267, 274, 282

I numbered the 308 observations in the order that they appear in the file. Using the random numbers generated above, I selected the 9th, 15th, 19th, etc. observations for the sample. The selected sample is:

.31, .34, .34, .50, .52, .53, .64, .72, .70, .70, .75, .78, 1.00, 1.00, 1.03, 1.04, 1.07, 1.10, .21, .24, .58, 1.01, .50, .57, .58, .61, .70, .81, .85, 1.00
b.
Using MINITAB, the descriptive statistics for the sample of 30 observations are:
Descriptive Statistics: carats-samp

Variable   N    Mean  Median  TrMean   StDev  SE Mean
carats-s  30  0.6910  0.7000  0.6965  0.2620   0.0478

Variable  Minimum  Maximum      Q1      Q3
carats-s   0.2100   1.1000  0.5150  1.0000

From above, \(\bar{x}\) = .6910 and s = .2620.
c.
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
\[ \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow .691 \pm 1.96\frac{.262}{\sqrt{30}} \Rightarrow .691 \pm .094 \Rightarrow (.597, .785) \]
d.
We are 95% confident that the mean number of carats is between .597 and .785.
e.
From Exercise 2.47, we computed the “population” mean to be .631. This mean does fall in the 95% confidence interval we computed in part c.
5.20
\[ \bar{x} = \frac{11{,}298}{5{,}000} = 2.26 \]
For confidence coefficient .95, α = .05 and α/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
\[ \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 2.26 \pm 1.96\frac{1.5}{\sqrt{5000}} \Rightarrow 2.26 \pm .04 \Rightarrow (2.22, 2.30) \]
We are 95% confident the mean number of roaches produced per roach per week is between 2.22 and 2.30.
5.22
a.
If x is normally distributed, the sampling distribution of \(\bar{x}\) is normal, regardless of the sample size.
b.
If nothing is known about the distribution of x, the sampling distribution of \(\bar{x}\) is approximately normal if n is sufficiently large. If n is not large, the distribution of \(\bar{x}\) is unknown if the distribution of x is not known.
5.24
a.
P(t ≥ t0) = .025 where df = 11; t0 = 2.201
b.
P(t ≥ t0) = .01 where df = 9; t0 = 2.821
c.
P(t ≤ t0) = .005 where df = 6. Because of symmetry, the statement can be rewritten P(t ≥ −t0) = .005 where df = 6; t0 = −3.707
d.
P(t ≤ t0) = .05 where df = 18; t0 = −1.734
5.26
For this sample,
\[ \bar{x} = \frac{\sum x}{n} = \frac{1567}{16} = 97.9375 \]
\[ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{155{,}867 - \frac{1567^2}{16}}{16 - 1} = 159.9292 \]
\[ s = \sqrt{s^2} = 12.6463 \]
a.
For confidence coefficient .80, α = 1 − .80 = .20 and α/2 = .20/2 = .10. From Table VI, Appendix B, with df = n − 1 = 16 − 1 = 15, t.10 = 1.341. The 80% confidence interval for μ is:
\[ \bar{x} \pm t_{.10}\frac{s}{\sqrt{n}} \Rightarrow 97.94 \pm 1.341\frac{12.6463}{\sqrt{16}} \Rightarrow 97.94 \pm 4.240 \Rightarrow (93.700, 102.180) \]
b.
For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 1 = 16 − 1 = 15, t.025 = 2.131. The 95% confidence interval for μ is:
\[ \bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 97.94 \pm 2.131\frac{12.6463}{\sqrt{16}} \Rightarrow 97.94 \pm 6.737 \Rightarrow (91.203, 104.677) \]
The 95% confidence interval for μ is wider than the 80% confidence interval for μ found in part a.
c.
For part a: We are 80% confident that the true population mean lies in the interval 93.700 to 102.180. For part b: We are 95% confident that the true population mean lies in the interval 91.203 to 104.677. The 95% confidence interval is wider than the 80% confidence interval because the more confident you want to be that μ lies in an interval, the wider the range of possible values.
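The arithmetic in Exercise 5.26 can be reproduced with a short Python sketch. The function names are hypothetical, and the t values are taken from Table VI as quoted above rather than computed. Because the code carries the unrounded mean 97.9375 instead of 97.94, its endpoints differ from the hand calculation in the third decimal place.

```python
import math

def summary_from_sums(sum_x, sum_x2, n):
    """Mean, sample variance, and standard deviation from sum(x) and sum(x^2)."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance, math.sqrt(variance)

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar ± t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Sums and sample size from Exercise 5.26.
mean, var, sd = summary_from_sums(sum_x=1567, sum_x2=155_867, n=16)
print(f"mean={mean:.4f}  s^2={var:.4f}  s={sd:.4f}")

# Table VI values with df = 15: t.10 = 1.341 and t.025 = 2.131.
ci80 = t_interval(mean, sd, 16, t_crit=1.341)
ci95 = t_interval(mean, sd, 16, t_crit=2.131)

# The higher confidence level needs the larger critical value, so it is wider.
print(ci95[1] - ci95[0] > ci80[1] - ci80[0])  # True
```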
5.28
a.
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: MTBE

Variable   N  N*  Mean  SE Mean  StDev  Minimum    Q1  Median     Q3  Maximum
MTBE      12   0  97.2     32.8  113.8     8.00  12.0    50.5  146.0    367.0

A point estimate for the true mean MTBE level for all well sites located near the New Jersey gasoline service station is \(\bar{x}\) = 97.2.
b.
For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − 1 = 12 − 1 = 11, t.005 = 3.106. The 99% confidence interval is:
\[ \bar{x} \pm t_{.005}\frac{s}{\sqrt{n}} \Rightarrow 97.2 \pm 3.106\frac{113.8}{\sqrt{12}} \Rightarrow 97.2 \pm 102.04 \Rightarrow (-4.84, 199.24) \]
We are 99% confident that the true mean MTBE level for all well sites located near the New Jersey gasoline service station is between −4.84 and 199.24.
c.
We must assume that the data were sampled from a normal distribution. We will use the four methods to check for normality. First, we will look at a histogram of the data. Using MINITAB, the histogram of the data is:

[Figure: Histogram of MTBE. Horizontal axis: MTBE (0 to 350); vertical axis: Frequency (0 to 5).]

From the histogram, the data do not appear to be mound-shaped. This indicates that the data may not be normal.

Next, we look at the intervals \(\bar{x} \pm s\), \(\bar{x} \pm 2s\), and \(\bar{x} \pm 3s\). If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using the MINITAB summary statistics:

\[ \bar{x} \pm s \Rightarrow 97.2 \pm 113.8 \Rightarrow (-16.6, 211.0) \]
10 of the 12 values fall in this interval. The proportion is .83. This is not very close to the .68 we would expect if the data were normal.

\[ \bar{x} \pm 2s \Rightarrow 97.2 \pm 2(113.8) \Rightarrow 97.2 \pm 227.6 \Rightarrow (-130.4, 324.8) \]
11 of the 12 values fall in this interval. The proportion is .92. This is somewhat smaller than the .95 we would expect if the data were normal.

\[ \bar{x} \pm 3s \Rightarrow 97.2 \pm 3(113.8) \Rightarrow 97.2 \pm 341.4 \Rightarrow (-244.2, 438.6) \]
12 of the 12 values fall in this interval. The proportion is 1.00. This is exactly the 1.00 we would expect if the data were normal.

From this method, it appears that the data may not be normal.

Next, we look at the ratio of the IQR to s. IQR = QU − QL = 146.0 − 12.0 = 134.0.
\[ \frac{IQR}{s} = \frac{134.0}{113.8} = 1.18 \]
This is somewhat smaller than the 1.3 we would expect if the data were normal. This method indicates the data may not be normal.
Finally, using MINITAB, the normal probability plot is:

[Figure: Probability Plot of MTBE (Normal - 95% CI). Horizontal axis: MTBE; vertical axis: Percent. Mean = 97.17, StDev = 113.8, N = 12, AD = 0.929, P-Value = 0.012.]

Since the data do not form a fairly straight line, the data may not be normal. From above, all four methods indicate the data may not be normal. It appears that the data probably are not normal.
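Two of the normality checks in Exercise 5.28 need only the summary statistics, so they can be sketched in Python without the raw MTBE values (which are not reproduced in this solution). The function names are hypothetical; the counts 10, 11, and 12 are the ones reported above.

```python
def k_sigma_intervals(xbar, s):
    """Intervals xbar ± k*s for k = 1, 2, 3."""
    return [(xbar - k * s, xbar + k * s) for k in (1, 2, 3)]

def iqr_to_s_ratio(q1, q3, s):
    """Ratio IQR / s; roughly 1.3 is expected for normal data."""
    return (q3 - q1) / s

# Summary statistics from the MINITAB printout for Exercise 5.28.
intervals = k_sigma_intervals(xbar=97.2, s=113.8)
observed = [10 / 12, 11 / 12, 12 / 12]   # counts reported in the solution
expected = [0.68, 0.95, 1.00]            # proportions expected for normal data

for (lo, hi), obs, exp in zip(intervals, observed, expected):
    print(f"({lo:.1f}, {hi:.1f})  observed {obs:.2f}  expected {exp:.2f}")

print(f"IQR/s = {iqr_to_s_ratio(q1=12.0, q3=146.0, s=113.8):.2f}")  # IQR/s = 1.18
```

The histogram and the normal probability plot still have to be judged by eye, so this sketch covers only the two numerical checks.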
5.30
We must assume that the distribution of the LOS's for all patients is normal.
a.
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, with df = n − 1 = 20 − 1 = 19, t.05 = 1.729. The 90% confidence interval is:
\[ \bar{x} \pm t_{.05}\frac{s}{\sqrt{n}} \Rightarrow 3.8 \pm 1.729\frac{1.2}{\sqrt{20}} \Rightarrow 3.8 \pm .464 \Rightarrow (3.336, 4.264) \]
b.
We are 90% confident that the mean LOS is between 3.336 and 4.264 days.
c.
“90% confidence” means that if repeated samples of size n are selected from a population and 90% confidence intervals are constructed, 90% of all intervals thus constructed will contain the population mean.
5.32
a.
The 95% confidence interval for the mean surface roughness of coated interior pipe is (1.63580, 2.12620).
b.
No. Since 2.5 does not fall in the 95% confidence interval, it would be very unlikely that the average surface roughness would be as high as 2.5 micrometers.
5.34
a.
The population is the set of all DOT permanent count stations in the state of Florida.
b.
Yes. There are several types of routes included in the sample. There are 3 recreational areas, 7 rural areas, 5 small cities, and 5 urban areas.
c.
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: 30th hour, 100th hour

Variable   N  Mean  Median  TrMean  StDev  SE Mean
30th hou  20  2206    2064    2165   1224      274
100th ho  20  2096    1999    2048   1203      269

Variable  Minimum  Maximum    Q1    Q3
30th hou      252     4905  1429  3068
100th ho      229     4815  1318  2877

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 1 = 20 − 1 = 19, t.025 = 2.093. The 95% confidence interval is:
\[ \bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 2{,}206 \pm 2.093\frac{1{,}224}{\sqrt{20}} \Rightarrow 2{,}206 \pm 572.84 \Rightarrow (1{,}633.16, 2{,}778.84) \]
We are 95% confident that the mean traffic count at the 30th highest hour is between 1,633.16 and 2,778.84.
d.
We must assume that the distribution of the traffic counts at the 30th highest hour is normal. From the stem-and-leaf display, the data look fairly mound-shaped. Thus, the assumption of normality is probably met.
e.
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 1 = 20 − 1 = 19, t.025 = 2.093. The 95% confidence interval is:
\[ \bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 2{,}096 \pm 2.093\frac{1{,}203}{\sqrt{20}} \Rightarrow 2{,}096 \pm 563.01 \Rightarrow (1{,}532.99, 2{,}659.01) \]
We are 95% confident that the mean traffic count at the 100th highest hour is between 1,532.99 and 2,659.01. We must assume that the distribution of the traffic counts at the 100th highest hour is normal. From the stem-and-leaf display, the data look fairly mound-shaped. Thus, the assumption of normality is probably met.
f.
If μ = 2,700, it is very possible that it is the mean count for the 30th highest hour. It falls in the 95% confidence interval for the mean count for the 30th highest hour. It is not very likely that the mean count for the 100th highest hour is 2,700. It does not fall in the 95% confidence interval for the mean count for the 100th highest hour. (See parts c and e above.)
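The comparison in part f of Exercise 5.34 amounts to asking whether a hypothesized mean falls inside each small-sample interval. A minimal Python sketch follows; the helper names `t_interval` and `contains` are hypothetical, and t.025 = 2.093 is the Table VI value quoted above.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar ± t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def contains(interval, value):
    """True if value lies inside the closed interval."""
    lower, upper = interval
    return lower <= value <= upper

# Means and standard deviations from the MINITAB printout in Exercise 5.34.
ci_30th = t_interval(xbar=2206, s=1224, n=20, t_crit=2.093)
ci_100th = t_interval(xbar=2096, s=1203, n=20, t_crit=2.093)

print(contains(ci_30th, 2700))   # True
print(contains(ci_100th, 2700))  # False
```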
5.36
By the Central Limit Theorem, the sampling distribution of \(\hat{p}\) is approximately normal with mean \(\mu_{\hat{p}} = p\) and standard deviation \(\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}\).
5.38
a.
The sample size is large enough if the interval \(\hat{p} \pm 3\sigma_{\hat{p}}\) does not include 0 or 1.
\[ \hat{p} \pm 3\sigma_{\hat{p}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .88 \pm 3\sqrt{\frac{.88(1 - .88)}{121}} \Rightarrow .88 \pm .089 \Rightarrow (.791, .969) \]
Since the interval lies within the interval (0, 1), the normal approximation will be adequate.
b.
For confidence coefficient .90, α = .10 and α/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:
\[ \hat{p} \pm z_{.05}\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .88 \pm 1.645\sqrt{\frac{.88(.12)}{121}} \Rightarrow .88 \pm .049 \Rightarrow (.831, .929) \]
c.
We must assume that the sample is a random sample from the population of interest.
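The two steps used for proportions throughout this section (check that \(\hat{p} \pm 3\sigma_{\hat{p}}\) stays inside (0, 1), then form the interval) can be sketched in Python as follows. The function names are hypothetical, and the z value is supplied from Table IV.

```python
import math

def proportion_se(p_hat, n):
    """Estimated standard error sqrt(p_hat * q_hat / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def sample_large_enough(p_hat, n):
    """True if p_hat ± 3 standard errors lies strictly inside (0, 1)."""
    se = proportion_se(p_hat, n)
    return 0 < p_hat - 3 * se and p_hat + 3 * se < 1

def proportion_interval(p_hat, n, z):
    """Return (lower, upper) for p_hat ± z * sqrt(p_hat * q_hat / n)."""
    half_width = z * proportion_se(p_hat, n)
    return p_hat - half_width, p_hat + half_width

# Exercise 5.38: p_hat = .88, n = 121, z.05 = 1.645.
print(sample_large_enough(0.88, 121))  # True
lo, hi = proportion_interval(0.88, 121, z=1.645)
print(f"({lo:.3f}, {hi:.3f})")  # (0.831, 0.929)
```

The same check returns False for \(\hat{p}\) = .07 with n = 100, the case in Exercise 5.102 where the lower limit of the 3-standard-error interval is negative.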
5.40
a.
Of the 50 observations, 15 like the product ⇒ \(\hat{p} = \frac{15}{50} = .30\).
To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .3 \pm 3\sqrt{\frac{.3(.7)}{50}} \Rightarrow .3 \pm .194 \Rightarrow (.106, .494) \]
Since this interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.
For the confidence coefficient .80, α = .20 and α/2 = .10. From Table IV, Appendix B, z.10 = 1.28. The confidence interval is:
\[ \hat{p} \pm z_{.10}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .3 \pm 1.28\sqrt{\frac{.3(.7)}{50}} \Rightarrow .3 \pm .083 \Rightarrow (.217, .383) \]
b.
We are 80% confident the proportion of all consumers who like the new snack food is between .217 and .383.
5.42
a.
The point estimate of p is \(\hat{p}\) = .11.
b.
To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .11 \pm 3\sqrt{\frac{.11(.89)}{150}} \Rightarrow .11 \pm .077 \Rightarrow (.033, .187) \]
Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
\[ \hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .11 \pm 1.96\sqrt{\frac{.11(.89)}{150}} \Rightarrow .11 \pm .05 \Rightarrow (.06, .16) \]
c.
We are 95% confident that the true proportion of MSDS that are satisfactorily completed is between .06 and .16.
5.44
a.
The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{16}{308} = .052\).
To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .052 \pm 3\sqrt{\frac{.052(.948)}{308}} \Rightarrow .052 \pm .038 \Rightarrow (.014, .090) \]
Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.
For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The confidence interval is:
\[ \hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .052 \pm 2.58\sqrt{\frac{.052(.948)}{308}} \Rightarrow .052 \pm .033 \Rightarrow (.019, .085) \]
We are 99% confident that the true proportion of diamonds for sale that are classified as “D” color is between .019 and .085.
b.
The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{81}{308} = .263\).
To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .263 \pm 3\sqrt{\frac{.263(.737)}{308}} \Rightarrow .263 \pm .075 \Rightarrow (.188, .338) \]
Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.
For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The confidence interval is:
\[ \hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .263 \pm 2.58\sqrt{\frac{.263(.737)}{308}} \Rightarrow .263 \pm .065 \Rightarrow (.198, .328) \]
We are 99% confident that the true proportion of diamonds for sale that are classified as “VS1” clarity is between .198 and .328.
5.46
a.
The population is all senior human resource executives at U.S. companies.
b.
The population parameter of interest is p, the proportion of all senior human resource executives at U.S. companies who believe that their hiring managers are interviewing too many people to find qualified candidates for the job.
c.
The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{211}{502} = .42\). To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .42 \pm 3\sqrt{\frac{.42(.58)}{502}} \Rightarrow .42 \pm .066 \Rightarrow (.354, .486) \]
Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.
d.
For confidence coefficient .98, α = .02 and α/2 = .02/2 = .01. From Table IV, Appendix B, z.01 = 2.33. The confidence interval is:
\[ \hat{p} \pm z_{.01}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .42 \pm 2.33\sqrt{\frac{.42(.58)}{502}} \Rightarrow .42 \pm .051 \Rightarrow (.369, .471) \]
We are 98% confident that the true proportion of all senior human resource executives at U.S. companies who believe that their hiring managers are interviewing too many people to find qualified candidates for the job is between .369 and .471.
e.
A 90% confidence interval would be narrower. If the interval were narrower, it would contain fewer values; thus, we would be less confident.
5.48
a.
The point estimate of p is \(\hat{p} = x/n = 35/55 = .636\).
b.
We must check to see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .636 \pm 3\sqrt{\frac{.636(.364)}{55}} \Rightarrow .636 \pm .195 \Rightarrow (.441, .831) \]
Since the interval is wholly contained in the interval (0, 1), we may assume that the normal approximation is reasonable.
For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.575. The confidence interval is:
\[ \hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .636 \pm 2.575\sqrt{\frac{.636(.364)}{55}} \Rightarrow .636 \pm .167 \Rightarrow (.469, .803) \]
c.
We are 99% confident that the true proportion of fatal accidents involving children is between .469 and .803.
d.
The sample proportion of children killed by air bags who were not wearing seat belts or were improperly restrained is 24/35 = .686. This is a rather large proportion. Whether a child is killed by an air bag could be related to whether or not he/she was properly restrained. Thus, the number of children killed by air bags could possibly be reduced if the child were properly restrained.
5.50
The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{36}{83} = .434\).
To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .434 \pm 3\sqrt{\frac{.434(.566)}{83}} \Rightarrow .434 \pm .163 \Rightarrow (.271, .597) \]
Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
\[ \hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .434 \pm 1.96\sqrt{\frac{.434(.566)}{83}} \Rightarrow .434 \pm .107 \Rightarrow (.327, .541) \]
We are 95% confident that the true proportion of healthcare workers with latex allergies who suspect that they have the allergy is between .327 and .541.
5.52
To compute the necessary sample size, use
\[ n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} \]
where α = 1 − .95 = .05 and α/2 = .05/2 = .025.
From Table IV, Appendix B, z.025 = 1.96. Thus,
\[ n = \frac{(1.96)^2(7.2)}{.3^2} = 307.328 \approx 308 \]
You would need to take a sample of size 308.
5.54
a.
To compute the needed sample size, use:
\[ n = \frac{(z_{\alpha/2})^2 pq}{SE^2} \]
where z.025 = 1.96 from Table IV, Appendix B.
Thus,
\[ n = \frac{(1.96)^2(.2)(.8)}{.08^2} = 96.04 \approx 97 \]
You would need to take a sample of size 97.
b.
To compute the needed sample size, use:
\[ n = \frac{(z_{\alpha/2})^2 pq}{SE^2} = \frac{(1.96)^2(.5)(.5)}{.08^2} = 150.0625 \approx 151 \]
You would need to take a sample of size 151.
5.56
a.
For a width of 5 units, SE = 5/2 = 2.5. To compute the needed sample size, use
\[ n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} \]
where α = 1 − .95 = .05 and α/2 = .025.
From Table IV, Appendix B, z.025 = 1.96. Thus,
\[ n = \frac{(1.96)^2(14)^2}{2.5^2} = 120.47 \approx 121 \]
You would need a sample of size 121 at a cost of 121($10) = $1210. Yes, you do have sufficient funds.
b.
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645.
\[ n = \frac{(1.645)^2(14)^2}{2.5^2} = 84.86 \approx 85 \]
You would need a sample of size 85 at a cost of 85($10) = $850. You still have sufficient funds but have an increased risk of error.
5.58
The sample size will be larger than necessary for any p other than .5.
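The sample-size formulas used in Exercises 5.52 through 5.58 can be sketched in Python as follows. The function names are hypothetical, z values are supplied from Table IV, and the result is always rounded up to the next whole observation.

```python
import math

def sample_size_mean(z, variance, se):
    """n = z^2 * sigma^2 / SE^2, rounded up."""
    return math.ceil(z ** 2 * variance / se ** 2)

def sample_size_proportion(z, p, se):
    """n = z^2 * p * q / SE^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / se ** 2)

print(sample_size_mean(1.96, 7.2, 0.3))         # Exercise 5.52: 308
print(sample_size_mean(1.96, 14 ** 2, 2.5))     # Exercise 5.56a: 121
print(sample_size_mean(1.645, 14 ** 2, 2.5))    # Exercise 5.56b: 85
print(sample_size_proportion(1.96, 0.2, 0.08))  # Exercise 5.54a: 97
print(sample_size_proportion(1.96, 0.5, 0.08))  # Exercise 5.54b: 151
```

The last two lines illustrate Exercise 5.58: the product pq is largest at p = .5, so planning with p = .5 gives a sample at least as large as any other value of p requires.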
5.60
a.
The confidence level desired by the researchers is 90%.
b.
The sampling error desired by the researchers is SE = .05.
c.
For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. From the problem, we will use \(\hat{p} = \frac{x}{n} = \frac{64}{106} = .604\) to estimate p. Thus,
\[ n = \frac{(z_{\alpha/2})^2 pq}{(SE)^2} = \frac{1.645^2(.604)(.396)}{.05^2} = 258.9 \approx 259 \]
Thus, we would need a sample of size 259.
5.62
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. For this study,
\[ n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} \approx \frac{1.96^2(5)^2}{1^2} = 96.04 \approx 97 \]
The sample size needed is 97.
5.64
For confidence coefficient .90, α = .10 and α/2 = .05. From Table IV, Appendix B, z.05 = 1.645. For a width of .06, SE = .06/2 = .03.
The sample size is
\[ n = \frac{(z_{\alpha/2})^2 pq}{SE^2} = \frac{(1.645)^2(.17)(.83)}{.03^2} = 424.2 \approx 425 \]
You would need to take a sample of size n = 425.
5.66
To compute the necessary sample size, use
\[ n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} \]
where α = 1 − .90 = .10 and α/2 = .05.
From Table IV, Appendix B, z.05 = 1.645. Thus,
\[ n = \frac{(1.645)^2(10)^2}{1^2} = 270.6 \approx 271 \]
5.68
a.
To compute the needed sample size, use
\[ n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} \]
where α = 1 − .90 = .10 and α/2 = .05.
From Table IV, Appendix B, z.05 = 1.645. Thus,
\[ n = \frac{(1.645)^2(2)^2}{.1^2} = 1{,}082.41 \approx 1{,}083 \]
b.
As the sample size decreases, the width of the confidence interval increases. Therefore, if we sample 100 parts instead of 1,083, the confidence interval would be wider.
c.
To compute the maximum confidence level that could be attained meeting the management's specifications,
\[ n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} \Rightarrow 100 = \frac{(z_{\alpha/2})^2(2)^2}{.1^2} \Rightarrow (z_{\alpha/2})^2 = \frac{100(.01)}{4} = .25 \Rightarrow z_{\alpha/2} = .5 \]
Using Table IV, Appendix B, P(0 ≤ z ≤ .5) = .1915. Thus, α/2 = .5000 − .1915 = .3085, α = 2(.3085) = .617, and 1 − α = 1 − .617 = .383. The maximum confidence level would be 38.3%.
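Part c of Exercise 5.68 solves the sample-size formula for z and then converts z to a confidence level. A Python sketch of that step is below. It uses `statistics.NormalDist` from the standard library in place of the Table IV lookup, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def max_confidence_level(n, sigma, se):
    """Solve n = z^2 * sigma^2 / SE^2 for z, then return 1 - alpha."""
    z = se * math.sqrt(n) / sigma
    # P(-z <= Z <= z) for a standard normal Z.
    return 2 * NormalDist().cdf(z) - 1

# Exercise 5.68c: n = 100, sigma = 2, SE = .1 gives z = .5.
level = max_confidence_level(n=100, sigma=2, se=0.1)
print(f"{level:.3f}")  # 0.383
```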
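5.70
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N}} \]
a.
\[ \sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{2500 - 1000}{2500}} = 4.90 \]
b.
\[ \sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{5000 - 1000}{5000}} = 5.66 \]
c.
\[ \sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{10{,}000 - 1000}{10{,}000}} = 6.00 \]
d.
\[ \sigma_{\bar{x}} = \frac{200}{\sqrt{1000}}\sqrt{\frac{100{,}000 - 1000}{100{,}000}} = 6.293 \]
5.72
a.
For n = 64, with the finite population correction factor:
\[ \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N}} = \frac{24}{\sqrt{64}}\sqrt{\frac{5000 - 64}{5000}} = 3\sqrt{.9872} = 2.9807 \]
without the finite population correction factor:
\[ \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{24}{\sqrt{64}} = 3 \]
\(\hat{\sigma}_{\bar{x}}\) without the finite population correction factor is slightly larger.
b.
For n = 400, with the finite population correction factor:
\[ \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N}} = \frac{24}{\sqrt{400}}\sqrt{\frac{5000 - 400}{5000}} = 1.2\sqrt{.92} = 1.1510 \]
without the finite population correction factor:
\[ \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{24}{\sqrt{400}} = 1.2 \]
c.
In part a, n is smaller relative to N than in part b. Therefore, the finite population correction factor did not make as much difference in the answer in part a as in part b.
5.74
An approximate 95% confidence interval for μ is:
\[ \bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N}} \Rightarrow 422 \pm 2\frac{14}{\sqrt{40}}\sqrt{\frac{375 - 40}{375}} \Rightarrow 422 \pm 4.184 \Rightarrow (417.816, 426.184) \]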
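The finite population correction used in Exercises 5.70 through 5.74 can be sketched in Python as follows. The function names `fpc` and `se_mean` are hypothetical; passing `N=None` gives the uncorrected standard error for comparison.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / N)."""
    return math.sqrt((N - n) / N)

def se_mean(sigma, n, N=None):
    """Standard error of the mean, with the correction when N is given."""
    se = sigma / math.sqrt(n)
    return se if N is None else se * fpc(N, n)

# Exercise 5.70: sigma = 200, n = 1000, increasing population sizes.
for N in (2500, 5000, 10_000, 100_000):
    print(N, round(se_mean(200, 1000, N), 3))

# Exercise 5.72: the correction matters more when n is large relative to N.
print(round(se_mean(24, 64, 5000), 4), round(se_mean(24, 64), 4))
print(round(se_mean(24, 400, 5000), 4), round(se_mean(24, 400), 4))

# Exercise 5.74: approximate 95% interval, xbar ± 2 * corrected standard error.
half_width = 2 * se_mean(14, 40, 375)
print(round(422 - half_width, 3), round(422 + half_width, 3))
```

As N grows with n fixed, the factor approaches 1 and the corrected standard error approaches the uncorrected value 200/√1000 ≈ 6.325.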
5.76
a.
For N = 2,193, n = 223, \(\bar{x}\) = 116,754, and s = 39,185, the 95% confidence interval is:
\[ \bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N}} \Rightarrow 116{,}754 \pm 2\frac{39{,}185}{\sqrt{223}}\sqrt{\frac{2{,}193 - 223}{2{,}193}} \Rightarrow 116{,}754 \pm 4{,}974.06 \Rightarrow (111{,}779.94, 121{,}728.06) \]
b.
We are 95% confident that the mean salary of all vice presidents who subscribe to Quality Progress is between $111,779.94 and $121,728.06.
5.78
a.
The population of interest is the set of all households headed by women that have incomes of $25,000 or more in the database.
b.
Yes. Since n/N = 1,333/25,000 = .053 exceeds .05, we need to apply the finite population correction.
c.
The standard error for \(\hat{p}\) should be:
\[ \hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}\left(\frac{N - n}{N}\right)} = \sqrt{\frac{.708(1 - .708)}{1{,}333}\left(\frac{25{,}000 - 1{,}333}{25{,}000}\right)} = .012 \]
d.
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The approximate 90% confidence interval is:
\[ \hat{p} \pm 1.645\hat{\sigma}_{\hat{p}} \Rightarrow .708 \pm 1.645(.012) \Rightarrow (.688, .728) \]
5.80
For N = 1,500, n = 35, \(\bar{x}\) = 1, and s = 124, the 95% confidence interval is:
\[ \bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\left(\frac{s}{\sqrt{n}}\right)\sqrt{\frac{N - n}{N}} \Rightarrow 1 \pm 2\left(\frac{124}{\sqrt{35}}\right)\sqrt{\frac{1{,}500 - 35}{1{,}500}} \Rightarrow 1 \pm 41.43 \Rightarrow (-40.43, 42.43) \]
We are 95% confident that the mean error of the new system is between −$40.43 and $42.43.
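For proportions, the same correction enters under the square root, as in Exercise 5.78. A Python sketch is below, with hypothetical function names and the .05 sampling-fraction rule taken from part b of that exercise.

```python
import math

def needs_fpc(n, N, threshold=0.05):
    """True when the sampling fraction n/N exceeds the threshold."""
    return n / N > threshold

def se_proportion_fpc(p_hat, n, N):
    """sqrt( p_hat(1 - p_hat)/n * (N - n)/N )."""
    return math.sqrt(p_hat * (1 - p_hat) / n * (N - n) / N)

# Exercise 5.78: p_hat = .708, n = 1,333, N = 25,000, z.05 = 1.645.
print(needs_fpc(1333, 25_000))  # True
se = se_proportion_fpc(0.708, 1333, 25_000)
lower, upper = 0.708 - 1.645 * se, 0.708 + 1.645 * se
print(f"se={se:.3f}  CI=({lower:.3f}, {upper:.3f})")  # se=0.012  CI=(0.688, 0.728)
```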
5.82
a.
For a small sample from a normal distribution with unknown standard deviation, we use the t statistic. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 1 = 23 − 1 = 22, t.025 = 2.074.
b.
For a large sample from a distribution with an unknown standard deviation, we can estimate the population standard deviation with s and use the z statistic. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.
c.
For a small sample from a normal distribution with known standard deviation, we use the z statistic. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.
d.
For a large sample from a distribution about which nothing is known, we can estimate the population standard deviation with s and use the z statistic. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.
e.
For a small sample from a distribution about which nothing is known, we can use neither z nor t.
5.84
a.
Of the 400 observations, 227 had the characteristic ⇒ \(\hat{p}\) = 227/400 = .5675. To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .5675 \pm 3\sqrt{\frac{.5675(.4325)}{400}} \Rightarrow .5675 \pm .0743 \Rightarrow (.4932, .6418) \]
Since the interval lies within the interval (0, 1), the normal approximation will be adequate.
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
\[ \hat{p} \pm z_{.025}\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .5675 \pm 1.96\sqrt{\frac{.5675(.4325)}{400}} \Rightarrow .5675 \pm .0486 \Rightarrow (.5189, .6161) \]
b.
For this problem, SE = .02. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. Thus,
\[ n = \frac{(z_{\alpha/2})^2 pq}{SE^2} = \frac{(1.96)^2(.5675)(.4325)}{.02^2} = 2{,}357.2 \approx 2{,}358 \]
Thus, the required sample size is 2,358.
5.86
a.
The finite population correction factor is:
\[ \sqrt{\frac{N - n}{N}} = \sqrt{\frac{2{,}000 - 50}{2{,}000}} = .9874 \]
b.
The finite population correction factor is:
\[ \sqrt{\frac{N - n}{N}} = \sqrt{\frac{100 - 20}{100}} = .8944 \]
c.
The finite population correction factor is:
\[ \sqrt{\frac{N - n}{N}} = \sqrt{\frac{1{,}500 - 300}{1{,}500}} = .8944 \]
5.88
a.
From the printout, the 90% confidence interval is (4.277, 6.184). We are 90% confident that the mean number of offices operated by all Florida law firms is between 4.277 and 6.184.
b.
From the histogram, it appears that the data probably are not from a normal distribution. The data appear to be skewed to the right.
c.
The interval constructed in part a depends on the assumption that the data came from a normal distribution. From part b, it appears that this assumption is not valid. Thus, the confidence interval is probably not valid.
5.90
a.
The point estimate of p is \(\hat{p} = \frac{x}{n} = \frac{67}{105} = .638\).
b.
To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .638 \pm 3\sqrt{\frac{.638(.362)}{105}} \Rightarrow .638 \pm .141 \Rightarrow (.497, .779) \]
Since the interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
\[ \hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .638 \pm 1.96\sqrt{\frac{.638(.362)}{105}} \Rightarrow .638 \pm .092 \Rightarrow (.546, .730) \]
c.
We are 95% confident that the true proportion of on-the-job homicide cases that occurred at night is between .546 and .730.
5.92
a.
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: NJValues

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
NJValues  20   0  440.4     67.8  303.0    159.0  212.3   297.5  660.5   1190.0

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 1 = 20 − 1 = 19, t.025 = 2.093. The 95% confidence interval is:
\[ \bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 440.4 \pm 2.093\frac{303.0}{\sqrt{20}} \Rightarrow 440.4 \pm 141.81 \Rightarrow (298.59, 582.21) \]
b.
We are 95% confident that the true mean sales price is between $298,590 and $582,210.
c.
"95% confidence" means that in repeated sampling, 95% of all confidence intervals constructed will contain the true mean sales price and 5% will not.
d.
Using MINITAB, a histogram of the data is:

[Figure: Histogram of NJValues. Horizontal axis: NJValues (200 to 1200); vertical axis: Frequency (0 to 9).]

Since the sample size is small (n = 20), we must assume that the distribution of sales prices is normal. From the histogram, it does not appear that the data come from a normal distribution. Thus, this confidence interval is probably not valid.
5.94
a.
For confidence coefficient .90, α = .10 and α/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:
\[ \bar{x} \pm z_{.05}\frac{\sigma}{\sqrt{n}} \Rightarrow \bar{x} \pm 1.645\frac{s}{\sqrt{n}} \Rightarrow 12.2 \pm 1.645\frac{10}{\sqrt{100}} \Rightarrow 12.2 \pm 1.645 \Rightarrow (10.555, 13.845) \]
We are 90% confident that the mean number of days of sick leave taken by all its employees is between 10.555 and 13.845.
b.
For confidence coefficient .99, α = .01 and α/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The sample size is
\[ n = \frac{(z_{\alpha/2})^2\sigma^2}{SE^2} = \frac{(2.58)^2(10)^2}{2^2} = 166.4 \approx 167 \]
You would need to take a sample of size n = 167.
5.96
a.
For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The confidence interval is:
\[ \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 1.13 \pm 2.58\frac{2.21}{\sqrt{72}} \Rightarrow 1.13 \pm .67 \Rightarrow (.46, 1.80) \]
We are 99% confident that the mean number of pecks at the blue string is between .46 and 1.80.
b.
Yes. The mean number of pecks at the white string is 7.5. This value does not fall in the 99% confidence interval for the blue string found in part a. Thus, the chickens are more apt to peck at white string.
5.98
a.
First we must compute \(\hat{p}\):
\[ \hat{p} = \frac{x}{n} = \frac{124}{159} = .78 \]
To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .78 \pm 3\sqrt{\frac{.78(.22)}{159}} \Rightarrow .78 \pm .099 \Rightarrow (.681, .879) \]
Since this interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.
For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The confidence interval is:
\[ \hat{p} \pm z_{.05}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .78 \pm 1.645\sqrt{\frac{.78(.22)}{159}} \Rightarrow .78 \pm .054 \Rightarrow (.726, .834) \]
We are 90% confident that the true proportion of all truck drivers who suffer from sleep apnea is between .726 and .834.
b.
Sleep researchers believe that 25% of the population suffer from obstructive sleep apnea. Since the 90% confidence interval for the proportion of truck drivers who suffer from sleep apnea does not contain .25, it appears that the true proportion of truck drivers who suffer from sleep apnea is larger than the proportion of the general population.
5.100
a.
The population of interest is the set of all debit cardholders in the U.S.
c.
Of the 1252 observations, 180 had used the debit card to purchase a product or service on the Internet ⇒ \(\hat{p} = \frac{180}{1252} = .144\)
To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \approx \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .144 \pm 3\sqrt{\frac{.144(.856)}{1252}} \Rightarrow .144 \pm .030 \Rightarrow (.114, .174) \]
Since this interval is wholly contained in the interval (0, 1), we may conclude that the normal approximation is reasonable.
d.
For confidence coefficient .98, α = 1 − .98 = .02 and α/2 = .02/2 = .01. From Table IV, Appendix B, z.01 = 2.33. The confidence interval is:
\[ \hat{p} \pm z_{.01}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .144 \pm 2.33\sqrt{\frac{.144(.856)}{1252}} \Rightarrow .144 \pm .023 \Rightarrow (.121, .167) \]
We are 98% confident that the proportion of debit cardholders who have used their card in making purchases over the Internet is between .121 and .167.
e.
Since we would have less confidence with a 90% confidence interval than with a 98% confidence interval, the 90% interval would be narrower.
5.102
a.
Of the 100 cancer patients, 7 were fired or laid off ⇒ \(\hat{p}\) = 7/100 = .07. To see if the sample size is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .07 \pm 3\sqrt{\frac{.07(.93)}{100}} \Rightarrow .07 \pm .077 \Rightarrow (-.007, .147) \]
Since the interval does not lie within the interval (0, 1), the normal approximation will not be adequate. We will go ahead and construct the interval anyway.
For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The confidence interval is:
\[ \hat{p} \pm z_{.05}\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .07 \pm 1.645\sqrt{\frac{.07(.93)}{100}} \Rightarrow .07 \pm .042 \Rightarrow (.028, .112) \]
Converting these to percentages, we get (2.8%, 11.2%).
b.
We are 90% confident that the percentage of all cancer patients who are fired or laid off due to their illness is between 2.8% and 11.2%.
c.
Since the rate of being fired or laid off for all Americans is 1.3% and this value falls outside the confidence interval in part b, there is evidence to indicate that employees with cancer are fired or laid off at a rate that is greater than that of all Americans.
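5.104
a.
$\hat{p} = \frac{x}{n} = \frac{9296}{10{,}000} = .9296$
The approximate 95% confidence interval is:
$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}\left(\frac{N-n}{N}\right)} \Rightarrow .9296 \pm 2\sqrt{\frac{.9296(.0704)}{10{,}000}\left(\frac{500{,}000 - 10{,}000}{500{,}000}\right)}$
$\Rightarrow .9296 \pm 2\sqrt{.000006413} \Rightarrow .9296 \pm .0051 \Rightarrow (.9245, .9347)$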
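As a check on the arithmetic, the interval with the finite population correction can be sketched as follows. The function name `proportion_ci_fpc` is hypothetical, and the multiplier 2 is the approximate 95% value used in the solution above.

```python
from math import sqrt

def proportion_ci_fpc(x, n, N, multiplier=2.0):
    """Approximate confidence interval for a proportion when sampling
    n units from a finite population of size N."""
    p_hat = x / n
    fpc = (N - n) / N                        # finite population correction
    variance = p_hat * (1 - p_hat) / n * fpc
    half_width = multiplier * sqrt(variance)
    return p_hat - half_width, p_hat + half_width

# 9,296 of 10,000 returned questionnaires, 500,000 subscribers
low, high = proportion_ci_fpc(9296, 10_000, 500_000)
print(round(low, 4), round(high, 4))
```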
b.
Only $\frac{10{,}000}{500{,}000} \times 100\% = 2\%$ of the subscribers returned the questionnaire. Often in mail surveys, those that respond are those with strong views. Thus, the 10,000 that responded may not be representative. I would question the estimate in part a.
5.106
a.
The point estimate for the fraction of the entire market who refuse to purchase bars is:
$\hat{p} = \frac{x}{n} = \frac{23}{244} = .094$
b.
To see if the sample size is sufficient:
$\hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .094 \pm 3\sqrt{\frac{(.094)(.906)}{244}} \Rightarrow .094 \pm .056 \Rightarrow (.038, .150)$
Since the interval above is contained in the interval (0, 1), the sample size is sufficiently large.
c.
For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .094 \pm 1.96\sqrt{\frac{(.094)(.906)}{244}} \Rightarrow .094 \pm .037 \Rightarrow (.057, .131)$
d.
The best estimate of the true fraction of the entire market who refuse to purchase bars six months after the poisoning is .094. We are 95% confident the true fraction of the entire market who refuse to purchase bars six months after the poisoning is between .057 and .131.
5.108
The bound is SE = .1. For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.575. We estimate p with $\hat{p}$ from Exercise 7.48, which is $\hat{p}$ = .636. Thus,
$n = \frac{(z_{\alpha/2})^2 pq}{SE^2} \approx \frac{2.575^2(.636)(.364)}{.1^2} = 153.5 \Rightarrow 154$
The necessary sample size would be 154.
5.110
Since the manufacturer wants to be reasonably certain the process is really out of control before shutting down the process, we would want to use a high level of confidence for our inference. We will form a 99% confidence interval for the mean breaking strength. For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n – 1 = 9 – 1 = 8, t.005 = 3.355. The 99% confidence interval is:
$\bar{x} \pm t_{.005}\frac{s}{\sqrt{n}} \Rightarrow 985.6 \pm 3.355\frac{22.9}{\sqrt{9}} \Rightarrow 985.6 \pm 25.61 \Rightarrow (959.99, 1{,}011.21)$
We are 99% confident that the true mean breaking strength is between 959.99 and 1,011.21. Since 1,000 is contained in this interval, it is not an unusual value for the true mean breaking strength. Thus, we would recommend that the process is not out of control.
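The small-sample interval above can be reproduced with a short Python sketch. The helper name `t_interval` is our own, and the critical value 3.355 is assumed to be taken from Table VI (df = 8) rather than computed by the code.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Small-sample confidence interval for a mean.

    t_crit is the table value t_{alpha/2} with n - 1 degrees of freedom.
    """
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Breaking strength data: xbar = 985.6, s = 22.9, n = 9, t_.005 = 3.355
low, high = t_interval(985.6, 22.9, 9, 3.355)
contains_target = low <= 1000 <= high   # is 1,000 a plausible value for the mean?
print(round(low, 2), round(high, 2), contains_target)
```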
Inferences Based on a Single Sample: Tests of Hypothesis
Chapter 6
6.2
The test statistic is used to decide whether or not to reject the null hypothesis in favor of the alternative hypothesis.
6.4
A Type I error is rejecting the null hypothesis when it is true. A Type II error is accepting the null hypothesis when it is false.
α = the probability of committing a Type I error. β = the probability of committing a Type II error.
6.6
We can compute a measure of reliability for rejecting the null hypothesis when it is true. This measure of reliability is the probability of rejecting the null hypothesis when it is true which is α. However, it is generally not possible to compute a measure of reliability for accepting the null hypothesis when it is false. We would have to compute the probability of accepting the null hypothesis when it is false, β, for every value of the parameter in the alternative hypothesis.
6.8
Let p = proportion of U.S. companies that have formal, written travel and entertainment policies for their employees. The null hypothesis would be: H0: p = .80
6.10
Let μ = average Libor rate for 3-month loans. Since many Western banks think that the reported average Libor rate (.054) is too high, they want to show that the average is less than .054. The appropriate hypotheses would be: H0: μ = .054 Ha: μ < .054
6.12
Let p = proportion of time the camera correctly detects liars. The null hypothesis would be: H0: p = .75
6.14
a.
A Type I error would be concluding the proposed user is unauthorized when, in fact, the proposed user is authorized. A Type II error would be concluding the proposed user is authorized when, in fact, the proposed user is unauthorized. In this case, a more serious error would be a Type II error. One would not want to conclude that the proposed user is authorized when he/she is not.
b.
The Type I error rate is 1%. This means that the probability of concluding the proposed user is unauthorized when, in fact, the proposed user is authorized is .01.
The Type II error rate is .00025%. This means that the probability of concluding the proposed user is authorized when, in fact, the proposed user is unauthorized is .0000025.
c.
The Type I error rate is .01%. This means that the probability of concluding the proposed user is unauthorized when, in fact, the proposed user is authorized is .0001. The Type II error rate is .005%. This means that the probability of concluding the proposed user is authorized when, in fact, the proposed user is unauthorized is .00005.
6.16
a.
The null hypothesis is: H0: There is no intrusion.
b.
The alternative hypothesis is: Ha: There is an intrusion.
c.
$\alpha = P(\text{warning} \mid \text{no intrusion}) = \frac{1}{1000} = .001$
$\beta = P(\text{no warning} \mid \text{intrusion}) = \frac{500}{1000} = .5$
6.18
a.
The decision rule is to reject H0 if $\bar{x} > 270$. Recall that $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$.
Therefore, "reject H0 if $\bar{x} > 270$" can be written as "reject H0 if $z > \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$", that is,
$z > \frac{270 - 255}{63/\sqrt{81}} \Rightarrow z > 2.14$
The decision rule in terms of z is to reject H0 if z > 2.14.
b.
P(z > 2.14) = .5 − P(0 < z < 2.14) = .5 − .4838 = .0162
6.20
a.
H0: μ = .36
Ha: μ < .36
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{.323 - .36}{\sqrt{.034}/\sqrt{64}} = -1.61$
The rejection region requires α = .10 in the lower tail of the z-distribution. From Table IV, Appendix B, z.10 = 1.28. The rejection region is z < −1.28.
Since the observed value of the test statistic falls in the rejection region (z = −1.61 < −1.28), H0 is rejected. There is sufficient evidence to indicate the mean is less than .36 at α = .10.
b.
H0: μ = .36
Ha: μ ≠ .36
The test statistic is z = −1.61 (see part a). The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645 or z > 1.645. Since the observed value of the test statistic does not fall in the rejection region (z = −1.61 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the mean is different from .36 at α = .10.
6.22
a.
To determine whether the mean July, 2006 dealer price of the Toyota Prius differs from $25,000, we test: H0: μ = 25,000 Ha: μ ≠ 25,000
b.
The sample mean is $\bar{x} = \frac{\sum x_i}{n} = \frac{4{,}076{,}271}{160} = 25{,}476.69$
The sample variance is:
$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1} = \frac{104{,}788{,}653{,}115 - \frac{4{,}076{,}271^2}{160}}{160 - 1} = 5{,}904{,}057.862$
The sample standard deviation is: $s = \sqrt{s^2} = \sqrt{5{,}904{,}057.862} = 2{,}429.8267$
c.
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{25{,}476.69 - 25{,}000}{2{,}429.8267/\sqrt{160}} = 2.48$
d.
The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.
e.
Since the observed value of the test statistic falls in the rejection region (z = 2.48 > 1.96), H0 is rejected. There is sufficient evidence to indicate the mean July, 2006 dealer price of the Toyota Prius differs from $25,000 at α = .05.
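The large-sample test of a mean used in Exercise 6.22 can be sketched in Python from the summary statistics alone. The function name `z_test_mean` and its `alternative` argument are our own choices; the p-value comes from the standard normal distribution in Python's `statistics` module rather than from Table IV, so it may differ from table-based values in the last digit.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, s, n, mu0, alternative="two-sided"):
    """Large-sample z test for a population mean, with s estimating sigma.

    alternative: "less", "greater", or "two-sided".
    Returns the test statistic and its p-value.
    """
    z = (xbar - mu0) / (s / sqrt(n))
    cdf = NormalDist().cdf(z)          # P(Z <= z) for a standard normal Z
    if alternative == "less":
        p_value = cdf
    elif alternative == "greater":
        p_value = 1 - cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value

# Prius prices: xbar = 25,476.69, s = 2,429.8267, n = 160, H0: mu = 25,000
z, p_value = z_test_mean(25476.69, 2429.8267, 160, 25000)
print(round(z, 2), p_value < 0.05)
```

Rejecting H0 when the p-value is less than α = .05 is equivalent to checking whether z falls in the rejection region z < −1.96 or z > 1.96.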
6.24
a.
A Type I error is rejecting H0 when H0 is true. In this case, we would conclude that the mean number of carats per diamond is different from .6 when, in fact, it is equal to .6. A Type II error is accepting H0 when H0 is false. In this case, we would conclude that the mean number of carats per diamond is equal to .6 when, in fact, it is different from .6.
b.
From Exercise 5.18, the random sample of 30 diamonds yielded $\bar{x}$ = .691 and s = .262. Let μ = mean number of carats per diamond. To determine if the mean number of carats per diamond is different from .6, we test:
H0: μ = .6
Ha: μ ≠ .6
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{.691 - .6}{.262/\sqrt{30}} = 1.90$
The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z > 1.96 or z < −1.96. Since the observed value of the test statistic does not fall in the rejection region (z = 1.90 ≯ 1.96), H0 is not rejected. There is insufficient evidence to indicate the mean number of carats per diamond is different from .6 carats at α = .05.
c.
When α is changed, H0, Ha, and the test statistic remain the same. The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645 or z < −1.645. Since the observed value of the test statistic falls in the rejection region (z = 1.90 > 1.645), H0 is rejected. There is sufficient evidence to indicate the mean number of carats per diamond is different from .6 carats at α = .10.
d.
When the value of α changes, the decision can also change. Thus, it is very important to include the level of α used in all decisions.
6.26
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: GASTURBINE

Variable     N   N*   Mean  SE Mean  StDev  Minimum    Q1  Median     Q3  Maximum
GASTURBINE  67    0  11066      195   1595     8714  9918   10656  11842    16243
To determine if the mean heat rate of gas turbines augmented with high pressure inlet fogging exceeds 10,000 kJ/kWh, we test:
H0: μ = 10,000
Ha: μ > 10,000
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{11{,}066 - 10{,}000}{1{,}595/\sqrt{67}} = 5.47$
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 5.47 > 1.645), H0 is rejected. There is sufficient evidence to indicate the true mean heat rate of gas turbines augmented with high pressure inlet fogging exceeds 10,000 kJ/kWh at α = .05.
6.28
a.
Let μ = average full-service fee (in thousands of dollars) of U.S. funeral homes in 2006. To determine if the average full-service fee exceeds $6,500, we test: H0: μ = 6.50 Ha: μ > 6.50
b.
Using MINITAB, the output is:

Descriptive Statistics: FUNERAL

Variable   N   Mean  Median  StDev  SE Mean
Fee       36  6.819   6.600  1.265    0.211

Variable  Minimum  Maximum     Q1     Q3
Fee         5.200   11.600  6.025  7.400

H0: μ = 6.50
Ha: μ > 6.50
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{6.819 - 6.50}{1.265/\sqrt{36}} = 1.51$
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic does not fall in the rejection region (z = 1.51 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the true mean full-service fee of U.S. funeral homes in 2006 exceeds $6,500 at α = .05.
c.
No. Since the sample size (n = 36) is greater than 30, the Central Limit Theorem applies. The distribution of x is approximately normal regardless of the population distribution.
6.30
a.
To determine if the sample data refute the manufacturer's claim, we test:
H0: μ = 10
Ha: μ < 10
b.
A Type I error is concluding the mean number of solder joints inspected per second is less than 10 when, in fact, it is 10 or more. A Type II error is concluding the mean number of solder joints inspected per second is at least 10 when, in fact, it is less than 10.
c.
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: PCB

Variable   N   Mean  Median  TrMean  StDev  SE Mean
PCB       48  9.292   9.000   9.432  2.103    0.304

Variable  Minimum  Maximum     Q1      Q3
PCB         0.000   13.000  9.000  10.000

H0: μ = 10
Ha: μ < 10
The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{9.292 - 10}{2.103/\sqrt{48}} = -2.33$
The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645. Since the observed value of the test statistic falls in the rejection region (z = −2.33 < −1.645), H0 is rejected. There is sufficient evidence to indicate the mean number of inspections per second is less than 10 at α = .05.
6.32
We will reject H0 if the p-value < α.
a.
.06 ≮ .05, do not reject H0.
b.
.10 ≮ .05, do not reject H0.
c.
.01 < .05, reject H0.
d.
.001 < .05, reject H0.
e.
.251 ≮ .05, do not reject H0.
f.
.042 < .05, reject H0.
6.34
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{49.4 - 50}{4.1/\sqrt{100}} = -1.46$
p-value = P(z ≥ −1.46) = .5 + .4279 = .9279
There is no evidence to reject H0 for α ≤ .10.
6.36
First, find the value of the test statistic:
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{10.7 - 10}{3.1/\sqrt{50}} = 1.60$
p-value = P(z ≤ −1.60 or z ≥ 1.60) = 2P(z ≥ 1.60) = 2(.5 − .4452) = 2(.0548) = .1096 (using Table IV, Appendix B)
There is no evidence to reject H0 for α ≤ .10.
6.38
a.
The p-value reported by SAS is for a two-tailed test. Thus, P(z ≤ −1.63) + P(z ≥ 1.63) = .1032. For this one-tailed test, the p-value = P(z ≤ −1.63) = .1032/2 = .0516. Since the p-value = .0516 > α = .05, H0 is not rejected. There is insufficient evidence to indicate μ < 75 at α = .05.
b.
For this one-tailed test, the p-value = P(z ≤ 1.63). Since P(z ≤ −1.63) = .1032/2 = .0516, P(z ≤ 1.63) = 1 − .0516 = .9484. Since the p-value = .9484 > α = .10, H0 is not rejected. There is insufficient evidence to indicate μ < 75 at α = .10.
c.
For this one-tailed test, the p-value = P(z ≥ 1.63) = .1032/2 = .0516. Since the p-value = .0516 < α = .10, H0 is rejected. There is sufficient evidence to indicate μ > 75 at α = .10.
d.
For this two-tailed test, the p-value = .1032. Since the p-value = .1032 > α = .01, H0 is not rejected. There is insufficient evidence to indicate μ ≠ 75 at α = .01.
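The conversions in parts a through c follow one rule: half of the two-tailed p-value applies when the sign of z agrees with the direction of Ha, and one minus that half applies otherwise. A minimal sketch, with the hypothetical helper name `one_tailed_p`:

```python
def one_tailed_p(two_tailed_p, z, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    alternative: "less" for Ha: mu < mu0, "greater" for Ha: mu > mu0.
    z is the observed test statistic; only its sign is used.
    """
    half = two_tailed_p / 2
    if alternative == "less":
        return half if z < 0 else 1 - half
    if alternative == "greater":
        return half if z > 0 else 1 - half
    raise ValueError("alternative must be 'less' or 'greater'")

# The three one-tailed cases of Exercise 6.38 (reported two-tailed p-value .1032)
p_a = one_tailed_p(0.1032, -1.63, "less")
p_b = one_tailed_p(0.1032, 1.63, "less")
p_c = one_tailed_p(0.1032, 1.63, "greater")
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```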
6.40
The p-value is p = 0.014. The probability of observing a test statistic of t = 2.48 or anything more unusual if μ = 25,000 is 0.014. Since p = 0.014 is so small, we would reject H0. There is sufficient evidence to indicate the mean price of hybrid Toyota Prius cars is different from $25,000 for any value of α > .014.
6.42
From the printout, the p-value = .000. Since the p-value = .000 < α = .01, H0 is rejected. There is sufficient evidence to indicate that the true population mean weight of plastic golf tees is different from .250 at α = .01.
6.44
a.
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{52.3 - 51}{7.1/\sqrt{50}} = 1.29$
The p-value is p = P ( z ≥ 1.29)+P ( z ≤ −1.29) = (.5 − .4015) + (.5 − .4015) = .1970 . (Using Table IV, Appendix B.)
b.
The p-value is p = P ( z ≥ 1.29)= (.5 − .4015) = .0985 . (Using Table IV, Appendix B.)
c.
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{52.3 - 51}{10.4/\sqrt{50}} = 0.88$
The p-value is p = P(z ≥ 0.88) + P(z ≤ −0.88) = (.5 − .3106) + (.5 − .3106) = .3788. (Using Table IV, Appendix B.)
d.
In part a, in order to reject H0, α would have to be greater than .1970. In part b, in order to reject H0, α would have to be greater than .0985. In part c, in order to reject H0, α would have to be greater than .3788.
e.
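For a two-tailed test, α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58.
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \Rightarrow 2.58 = \frac{52.3 - 51}{s/\sqrt{50}} \Rightarrow 2.58\frac{s}{\sqrt{50}} = 52.3 - 51 \Rightarrow .3649s = 1.3 \Rightarrow s = 3.56$
For a one-tailed test, α = .01. From Table IV, Appendix B, z.01 = 2.33.
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \Rightarrow 2.33 = \frac{52.3 - 51}{s/\sqrt{50}} \Rightarrow 2.33\frac{s}{\sqrt{50}} = 52.3 - 51 \Rightarrow .3295s = 1.3 \Rightarrow s = 3.95$
6.46
a.
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{10.2 - 0}{31.3/\sqrt{50}} = 2.30$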
b.
For this two-sided test, the p-value = P(z ≥ 2.30) + P(z ≤ −2.30) = (.5 − .4893) + (.5 − .4893) = .0214. Since this value is so small, there is evidence to reject H0. There is sufficient evidence to indicate the mean level of feminization is different from 0% for any value of α > .0214.
c.
$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{15.0 - 0}{25.1/\sqrt{50}} = 4.23$
For this two-sided test, the p-value = P(z ≥ 4.23) + P(z ≤ −4.23) ≈ (.5 − .5) + (.5 − .5) = 0. Since this value is so small, there is evidence to reject H0. There is sufficient evidence to indicate the mean level of feminization is different from 0% for any value of α > 0.0.
6.48
a.
P(t > 1.440) = .10 (Using Table VI, Appendix B, with df = 6)
b.
P(t < −1.782) = .05 (Using Table VI, Appendix B, with df = 12)
c.
P(t < −2.060) + P(t > 2.060) = .025 + .025 = .05 (Using Table VI, Appendix B, with df = 25)
d.
The probability of a Type I error is computed above for each of the parts.
6.50
a.
H0: μ = 6
Ha: μ < 6
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4.8 - 6}{1.3/\sqrt{5}} = -2.064$
The necessary assumption is that the population is normal. The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − 1 = 5 − 1 = 4. From Table VI, Appendix B, t.05 = 2.132. The rejection region is t < −2.132. Since the observed value of the test statistic does not fall in the rejection region (t = −2.064 ≮ −2.132), H0 is not rejected. There is insufficient evidence to indicate the mean is less than 6 at α = .05.
b.
H0: μ = 6
Ha: μ ≠ 6
The test statistic is t = −2.064 (from a). The assumption is the same as in a. The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 1 = 5 − 1 = 4. From Table VI, Appendix B, t.025 = 2.776. The rejection region is t < −2.776 or t > 2.776. Since the observed value of the test statistic does not fall in the rejection region (t = −2.064 ≮ −2.776), H0 is not rejected. There is insufficient evidence to indicate the mean is different from 6 at α = .05.
c.
For part a, the p-value = P(t ≤ −2.064). From Table VI, with df = 4, .05 < P(t ≤ −2.064) < .10 or .05 < p-value < .10. For part b, the p-value = P(t ≤ −2.064) + P(t ≥ 2.064). From Table VI, with df = 4, 2(.05) < p-value < 2(.10) or .10 < p-value < .20.
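The small-sample test in Exercise 6.50 can be sketched in Python as follows. The helper names `t_statistic` and `rejects_h0` are our own, and the critical values are assumed to be read from Table VI, since the Python standard library does not provide t-distribution quantiles.

```python
from math import sqrt

def t_statistic(xbar, s, n, mu0):
    """One-sample t statistic computed from summary statistics."""
    return (xbar - mu0) / (s / sqrt(n))

def rejects_h0(t, t_crit, alternative):
    """Compare t with a (positive) table critical value.

    alternative: "less", "greater", or "two-sided".
    """
    if alternative == "less":
        return t < -t_crit
    if alternative == "greater":
        return t > t_crit
    return abs(t) > t_crit

# Exercise 6.50: xbar = 4.8, s = 1.3, n = 5, H0: mu = 6, df = 4
t = t_statistic(4.8, 1.3, 5, 6)
part_a = rejects_h0(t, 2.132, "less")        # alpha = .05, lower tail
part_b = rejects_h0(t, 2.776, "two-sided")   # alpha/2 = .025 in each tail
print(round(t, 3), part_a, part_b)
```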
6.52
a.
To determine if the true mean breaking strength of the new bonding adhesive is less than 5.70 Mpa, we test: H0: μ = 5.70 Ha: μ < 5.70
b.
The rejection region requires α = .01 in the lower tail of the t-distribution with df = n – 1 = 10 – 1 = 9. From Table VI, Appendix B, t.01 = 2.821. The rejection region is t < −2.821.
c.
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{5.07 - 5.70}{.46/\sqrt{10}} = -4.33$.
d.
Since the observed value of the test statistic falls in the rejection region (t = −4.33 < −2.821), H0 is rejected. There is sufficient evidence to indicate the true mean breaking strength of the new bonding adhesive is less than 5.70 Mpa at α = .01.
e.
We must assume that the sample was random and selected from a normal population.
6.54
Some preliminary calculations are:
$\bar{x} = \frac{\sum x}{n} = \frac{736}{7} = 105.14$
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{78696 - \frac{(736)^2}{7}}{7 - 1} = 218.4762$
$s = \sqrt{218.4762} = 14.7809$
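These preliminary calculations use the shortcut formula for the sample variance, which needs only n, Σx, and Σx². A small Python sketch (the function name `summary_from_sums` is our own):

```python
from math import sqrt

def summary_from_sums(sum_x, sum_x2, n):
    """Sample mean, variance, and standard deviation from the
    sum of the observations and the sum of their squares."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # shortcut formula
    return mean, variance, sqrt(variance)

# Exercise 6.54: sum of x = 736, sum of x^2 = 78696, n = 7
mean, variance, std_dev = summary_from_sums(736, 78696, 7)
print(round(mean, 2), round(variance, 4), round(std_dev, 4))
```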
a.
To determine if the mean consumption rate of salad dressings in the Southeastern U.S. is different than the mean national consumption rate, we test: H0: μ = 100 Ha: μ ≠ 100
b.
Since the sample size is so small, we must assume that the population being sampled is normal. In addition, we must assume that the sample is random.
c.
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{105.14 - 100}{14.7809/\sqrt{7}} = .92$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution. From Table VI, Appendix B, with df = n − 1 = 7 − 1 = 6, t.025 = 2.447. The rejection region is t > 2.447 or t < −2.447. Since the value of the test statistic does not fall in the rejection region (t = .92 ≯ 2.447), H0 is not rejected. There is insufficient evidence to indicate the mean consumption rate of salad dressings in the Southeastern U.S. is different than the mean national consumption rate at α = .05.
d.
The observed significance level is p-value = P(t ≥ .92) + P(t ≤ −.92). Since we did not reject H0 in part c, we know that the p-value must be greater than .05. Using Table VI, Appendix B, with df = n − 1 = 7 − 1 = 6, p-value = P(t ≥ .92) + P(t ≤ −.92) > .1 + .1 = .2. Thus, with this table, we only know that the p-value is greater than .2.
6.56
a.
To determine if the mean repellency percentage of the new mosquito repellent is less than 95, we test:
H0: μ = 95
Ha: μ < 95
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{83 - 95}{15/\sqrt{5}} = -1.79$
The rejection region requires α = .10 in the lower tail of the t distribution. From Table VI, Appendix B, with df = n − 1 = 5 − 1 = 4, t.10 = 1.533. The rejection region is t < −1.533. Since the observed value of the test statistic falls in the rejection region (t = −1.79 < −1.533), H0 is rejected. There is sufficient evidence to indicate that the true mean repellency percentage of the new mosquito repellent is less than 95 at α = .10.
b.
We must assume that the population of percent repellencies is normally distributed.
6.58
a.
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Plants

Variable   N   Mean  Median  TrMean  StDev  SE Mean
Plants    20  4.000   3.500   3.667  3.061    0.684

Variable  Minimum  Maximum     Q1     Q3
Plants      1.000   13.000  1.250  5.000
Let μ = mean number of active nuclear power plants operating in all states. To determine if the mean number of active nuclear power plants operating in all states exceeds 3, we test:
H0: μ = 3 Ha: μ > 3
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4 - 3}{3.061/\sqrt{20}} = 1.46$
The rejection region requires α = .10 in the upper tail of the t-distribution with df = n – 1 = 20 – 1 = 19. From Table VI, Appendix B, t.10 = 1.328. The rejection region is t > 1.328. Since the observed value of the test statistic falls in the rejection region (t = 1.46 > 1.328), H0 is rejected. There is sufficient evidence to indicate the mean number of active nuclear power plants operating in all states exceeds 3 at α = .10.
b.
We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the number of power plants is:
[Histogram of Plants: Frequency (vertical axis) versus number of plants (horizontal axis)]
From the histogram, the data appear to be skewed to the right. This indicates that the data may not be normal. Next, we look at the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal.
$\bar{x} \pm s \Rightarrow 4 \pm 3.061 \Rightarrow (.939, 7.061)$. 18 of the 20 values fall in this interval. The proportion is .90. This is much greater than the .68 we would expect if the data were normal.
$\bar{x} \pm 2s \Rightarrow 4 \pm 2(3.061) \Rightarrow 4 \pm 6.122 \Rightarrow (-2.122, 10.122)$. 19 of the 20 values fall in this interval. The proportion is .95. This is the same as the .95 we would expect if the data were normal.
$\bar{x} \pm 3s \Rightarrow 4 \pm 3(3.061) \Rightarrow 4 \pm 9.183 \Rightarrow (-5.183, 13.183)$. 20 of the 20 values fall in this interval. The proportion is 1.000. This is equal to the 1.00 we would expect if the data were normal.
From this method, it appears that the data are not normal. Next, we look at the ratio of the IQR to s. IQR = QU – QL = 5.00 – 1.25 = 3.75.
$\frac{IQR}{s} = \frac{3.75}{3.061} = 1.22$. This is close to the 1.3 we would expect if the data were normal. This method indicates the data may be normal.
Finally, using MINITAB, the normal probability plot is:
[Normal Probability Plot for Plants, ML Estimates - 95% CI: Percent (vertical axis) versus Data (horizontal axis). ML Estimates: Mean = 4, StDev = 2.98329. Goodness of Fit: AD* = 1.298]
Since the data do not form a straight line, the data are not normal. From 3 of the 4 different methods, the indications are that the number of power plants data are not normal.
c.
The two largest values are 9 and 13. The two lowest values are 1 and 1. Using MINITAB with the data deleted yields the descriptive statistics:
Descriptive Statistics: Plants2

Variable   N   Mean  Median  TrMean  StDev  SE Mean
Plants2   16  3.500   3.500   3.429  1.826    0.456

Variable  Minimum  Maximum     Q1     Q3
Plants2     1.000    7.000  2.000  5.000
To determine if the mean number of active nuclear power plants operating in all states exceeds 3 (using the reduced data set), we test: H0: μ = 3 Ha: μ > 3
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.5 - 3}{1.826/\sqrt{16}} = 1.10$
The rejection region requires α = .10 in the upper tail of the t-distribution with df = n – 1 = 16 – 1 = 15. From Table VI, Appendix B, t.10 = 1.341. The rejection region is t > 1.341. Since the observed value of the test statistic does not fall in the rejection region (t = 1.10 ≯ 1.341), H0 is not rejected. There is insufficient evidence to indicate the mean number of active nuclear power plants operating in all states exceeds 3 at α = .10. By eliminating the top two and bottom two observations, we have changed the decision from rejecting H0 to not rejecting H0.
d.
It is very dangerous to eliminate data points to satisfy assumptions. The data may, in fact, not be normal. By eliminating data points, one has changed the kind of data that come from the parent population. Thus, incorrect decisions could be made.
6.60
Using MINITAB, the descriptive statistics for the 2 plants are:

Descriptive Statistics: AL1, AL2

Variable  N  N*     Mean  SE Mean    StDev  Minimum  Q1   Median  Q3  Maximum
AL1       2   0  0.00750  0.00250  0.00354  0.00500   *  0.00750   *  0.01000
AL2       2   0   0.0700   0.0200   0.0283   0.0500   *   0.0700   *   0.0900
To determine if plant 1 is violating the OSHA standard, we test:
H0: μ = .004
Ha: μ > .004
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{.0075 - .004}{.00354/\sqrt{2}} = 1.40$
Since no α level was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the t-distribution with df = n – 1 = 2 – 1 = 1. From Table VI, Appendix B, t.05 = 6.314. The rejection region is t > 6.314. Since the observed value of the test statistic does not fall in the rejection region (t = 1.40 ≯ 6.314), H0 is not rejected. There is insufficient evidence to indicate the OSHA standard is violated by plant 1 at α = .05.
To determine if plant 2 is violating the OSHA standard, we test:
H0: μ = .004
Ha: μ > .004
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{.07 - .004}{.0283/\sqrt{2}} = 3.30$
Since no α level was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the t-distribution with df = n – 1 = 2 – 1 = 1. From Table VI, Appendix B, t.05 = 6.314. The rejection region is t > 6.314. Since the observed value of the test statistic does not fall in the rejection region (t = 3.30 ≯ 6.314), H0 is not rejected. There is insufficient evidence to indicate the OSHA standard is violated by plant 2 at α = .05.
6.62
b.
First, check to see if n is large enough.
$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow p_0 \pm 3\sqrt{\frac{p_0 q_0}{n}} \Rightarrow .70 \pm 3\sqrt{\frac{(.70)(.30)}{100}} \Rightarrow .70 \pm .14 \Rightarrow (.56, .84)$
Since the interval lies within the interval (0, 1), the normal approximation will be adequate.
H0: p = .70
Ha: p < .70
The test statistic is $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{.63 - .70}{\sqrt{\frac{.70(.30)}{100}}} = -1.53$
The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645. Since the observed value of the test statistic does not fall in the rejection region (z = −1.53 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that the proportion is less than .70 at α = .05.
c.
p-value = P(z ≤ −1.53) = .5 − .4370 = .0630 Since p is not less than α = .05, H0 is not rejected.
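The steps of Exercise 6.62 (sample size check, test statistic, p-value) can be sketched in Python. The function name `z_test_proportion` is hypothetical. The p-value is computed from the unrounded z with the standard normal distribution in the `statistics` module, so it can differ slightly from the Table IV value of .0630, which is based on z rounded to −1.53.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(p_hat, p0, n, alternative="two-sided"):
    """Large-sample z test for a population proportion.

    Returns (z, p_value, large_enough), where large_enough reports
    whether p0 +/- 3 standard errors lies wholly inside (0, 1).
    """
    se0 = sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    large_enough = 0 < p0 - 3 * se0 and p0 + 3 * se0 < 1
    z = (p_hat - p0) / se0
    cdf = NormalDist().cdf(z)
    if alternative == "less":
        p_value = cdf
    elif alternative == "greater":
        p_value = 1 - cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value, large_enough

# Exercise 6.62: p-hat = .63, H0: p = .70, Ha: p < .70, n = 100
z, p_value, large_enough = z_test_proportion(0.63, 0.70, 100, "less")
print(round(z, 2), p_value < 0.05, large_enough)
```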
6.64
a.
No. The p-value is the probability of observing your test statistic or anything more unusual if H0 is true. For this problem, the p-value = .3300/2 = .1650. Given the true value of the population proportion, p, is .5, the probability of observing a test statistic of z = .44 or larger is .1650. Since the p-value is not small (p = .1650), there is no evidence to reject H0. There is no evidence to indicate the population proportion is greater than .5.
b.
If the alternative hypothesis were two-tailed, the p-value would be 2 times the p-value for a one-tailed test. For this problem, the p-value = .3300. The probability of observing your test statistic or anything more unusual if H0 is true is .3300. There is no evidence to reject H0 for α ≤ .10. There is no evidence to indicate that p ≠ .5 for α ≤ .10.
6.66
a.
$\hat{p} = \frac{x}{n} = \frac{64}{106} = .604$
b.
H0: p = .70
Ha: p ≠ .70
c.
The test statistic is $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{.604 - .70}{\sqrt{\frac{.70(.30)}{106}}} = -2.16$
d.
The rejection region requires α/2 = .01/2 = .005 in each tail of the z-distribution. From Table IV, Appendix B, z.005 = 2.58. The rejection region is z > 2.58 or z < −2.58.
e.
Since the observed value of the test statistic does not fall in the rejection region (z = −2.16 ≮ −2.58), H0 is not rejected. There is insufficient evidence to indicate the true proportion of consumers who believe “Made in the USA” means 100% of labor and materials are from the United States is different from .70 at α = .01.
6.68
a.
The population parameter of interest is p = proportion of items that had the wrong price scanned at California Wal-Mart stores.
b.
To determine if the true proportion of items scanned at California Wal-Mart stores with the wrong price exceeds the 2% NIST standard, we test:
H0: p = .02
Ha: p > .02
c.
The test statistic is $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{.083 - .02}{\sqrt{\frac{.02(.98)}{1000}}} = 14.23$
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.
d.
Since the observed value of the test statistic falls in the rejection region (z = 14.23 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the true proportion of items scanned at California Wal-Mart stores with the wrong price exceeds the 2% NIST standard at α = .05. This means that the proportion of items with wrong prices at California Wal-Mart stores is much higher than what is allowed.
e.
In order for the inference to be valid, the sampling distribution of $\hat{p}$ must be approximately normal. We check this assumption:
$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow p_0 \pm 3\sqrt{\frac{p_0 q_0}{n}} \Rightarrow .02 \pm 3\sqrt{\frac{.02(.98)}{1000}} \Rightarrow .02 \pm .013 \Rightarrow (.007, .033)$
Since the above interval falls completely in the interval (0, 1), the normal distribution will be adequate.
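The large-sample test for a proportion used in Exercises 6.66 and 6.68 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `proportion_z_test` and `normal_check_interval` are hypothetical and do not come from the text. The inputs are the values from Exercise 6.68.

```python
from math import sqrt

def proportion_z_test(p_hat, p0, n):
    """Large-sample z statistic for H0: p = p0 (hypothetical helper)."""
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    return (p_hat - p0) / se

def normal_check_interval(p0, n):
    """Return p0 +/- 3*sigma_phat; the normal approximation is judged
    adequate when this interval lies inside (0, 1)."""
    half_width = 3 * sqrt(p0 * (1 - p0) / n)
    return p0 - half_width, p0 + half_width

# Values from Exercise 6.68: p_hat = .083, p0 = .02, n = 1000
z = proportion_z_test(0.083, 0.02, 1000)
low, high = normal_check_interval(0.02, 1000)
print(round(z, 2), round(low, 3), round(high, 3))
```

The z statistic is then compared with the tabled critical value (1.645 for an upper-tailed test at α = .05).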
6.70
a.
Let p = proportion of vacation-home owners who are minorities in 2003.
$\hat{p} = \dfrac{x}{n} = \dfrac{46}{416} = .111$
To determine if the percentage of vacation-home owners in 2006 who are minorities is larger than 6%, we test: H0: p = .06 Ha: p > .06
The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.111 - .06}{\sqrt{.06(.94)/416}} = 4.38$
The rejection region requires α = .01 in the upper tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33. Since the observed value of the test statistic falls in the rejection region (z = 4.38 > 2.33), H0 is rejected. There is sufficient evidence to indicate that the true percentage of vacation-home owners in 2006 who are minorities is larger than 6% at α = .01.
b.
Since the return rate of the questionnaire was so small compared to the number sent out, one should be very skeptical of the results. It would be fairly unusual that the sample of returned questionnaires would be representative of the entire population.
6.72
Let p = proportion of firms in violation of the new 4-day rule for reporting material changes.
$\hat{p} = \dfrac{x}{n} = \dfrac{23}{462} = .050$
To determine if the percentage of firms in violation of the new 4-day rule for reporting material changes is less than 10%, we test: H0: p = .10 Ha: p < .10
The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.050 - .10}{\sqrt{.10(.90)/462}} = -3.58$
The rejection region requires α = .01 in the lower tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z < −2.33. Since the observed value of the test statistic falls in the rejection region (z = −3.58 < −2.33), H0 is rejected. There is sufficient evidence to indicate that the true percentage of firms in violation of the new 4-day rule for reporting material changes is less than 10% at α = .01.
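The critical values read from Table IV (2.33, 1.645, 2.58) can be reproduced, up to table rounding, with the inverse normal CDF in Python's standard library. The helper name `z_critical` and its `tail` argument are assumptions made for this sketch, not notation from the text.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Boundary of the rejection region for a z test (hypothetical helper).

    tail is 'lower', 'upper', or 'two'; a two-tailed test returns both
    boundaries, each cutting off alpha/2.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return std_normal.inv_cdf(alpha)
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)
        return -z, z
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

lower_01 = z_critical(0.01, "lower")  # Exercise 6.72: reject if z < about -2.33
upper_05 = z_critical(0.05, "upper")  # Exercise 6.68: reject if z > about 1.645
two_01 = z_critical(0.01, "two")      # Exercise 6.66: reject if |z| > about 2.58

# Decision for Exercise 6.72, where the observed statistic is z = -3.58
reject_672 = -3.58 < lower_01
```

The computed boundaries carry more decimals than the table, so they agree with the tabled values only after rounding.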
6.74
Let p = proportion of patients taking the pill who reported an improved condition. First we check to see if the normal approximation is adequate:
$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow p_0 \pm 3\sqrt{\dfrac{p_0 q_0}{n}} \Rightarrow .5 \pm 3\sqrt{\dfrac{.5(.5)}{7000}} \Rightarrow .5 \pm .018 \Rightarrow (.482, .518)$
Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate. To determine if there really is a placebo effect at the clinic, we test: H0: p = .5 Ha: p > .5
The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.7 - .5}{\sqrt{.5(.5)/7000}} = 33.47$
The rejection region requires α = .05 in the upper tail of the z distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 33.47 > 1.645), H0 is rejected. There is sufficient evidence to indicate that there really is a placebo effect at the clinic at α = .05.
6.76
a.
The power of a test increases when:
1. The distance between the null and alternative values of μ increases.
2. The value of α increases.
3. The sample size increases.
b.
The power of a test is equal to 1 − β. As β increases, the power decreases.
6.78
From Exercise 6.77 we want to test H0: μ = 500 against Ha: μ > 500 using α = .05, σ = 100, n = 25, and $\bar{x}_0 = 532.9$.
a.
$\beta = P(\bar{x}_0 < 532.9 \text{ when } \mu = 575) = P\left(z < \dfrac{532.9 - 575}{100/\sqrt{25}}\right) = P(z < -2.11) = .5 - .4826 = .0174$
b.
Power = 1 − β = 1 − .0174 = .9826
c.
In Exercise 6.77, β = .1949 and the power is .8051. The value of β has decreased in this exercise since μ = 575 is further from the hypothesized value than μ = 550. As a result, the power of the test in this exercise has increased (when β decreases, the power of the test increases).
6.80
a.
From Exercise 6.79, we want to test H0: μ = 75 against Ha: μ < 75 using α = .10, σ = 15, n = 49, and $\bar{x}_0 = 72.257$.
If μ = 74,
$\beta = P(\bar{x}_0 > 72.257 \text{ when } \mu = 74) = P\left(z > \dfrac{72.257 - 74}{15/\sqrt{49}}\right) = P(z > -.81) = .5 + .2910 = .7910$
If μ = 72,
$\beta = P(\bar{x}_0 > 72.257 \text{ when } \mu = 72) = P\left(z > \dfrac{72.257 - 72}{15/\sqrt{49}}\right) = P(z > .12) = .5 - .0478 = .4522$
If μ = 70,
β = P($\bar{x}_0$ > 72.257 when μ = 70) = .1469 (Refer to Exercise 6.79, part c.)
If μ = 68,
$\beta = P(\bar{x}_0 > 72.257 \text{ when } \mu = 68) = P\left(z > \dfrac{72.257 - 68}{15/\sqrt{49}}\right) = P(z > 1.99) = .5 - .4767 = .0233$
If μ = 66,
$\beta = P(\bar{x}_0 > 72.257 \text{ when } \mu = 66) = P\left(z > \dfrac{72.257 - 66}{15/\sqrt{49}}\right) = P(z > 2.92) = .5 - .4982 = .0018$
In summary,
μ    74     72     70     68     66
β    .7910  .4522  .1469  .0233  .0018
b.
[Graph of β plotted against μ]
c.
Looking at the graph, β is approximately .62 when μ = 73.
d.
Power = 1 − β. Therefore,
μ      74     72     70     68     66
β      .7910  .4522  .1469  .0233  .0018
Power  .2090  .5478  .8531  .9767  .9982
The power curve starts out close to 1 when μ = 66 and decreases as μ increases, while the β curve is close to 0 when μ = 66 and increases as μ increases.
e.
As the distance between the true mean μ and the null hypothesized mean μ0 increases, β decreases and the power increases. We can also see that as β increases, the power decreases.
6.82
a.
To determine if the mean size of California homes exceeds the national average, we test: H0: μ = 2230 Ha: μ > 2230
The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{2347 - 2230}{257/\sqrt{100}} = 4.55$
The rejection region requires α = .01 in the upper tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33. Since the observed value of the test statistic falls in the rejection region (z = 4.55 > 2.33), H0 is rejected. There is sufficient evidence to indicate the mean size of California homes exceeds the national average at α = .01.
b.
To compute the power, we must first set up the rejection region in terms of $\bar{x}$.
$\bar{x}_0 = \mu_0 + z_\alpha \sigma_{\bar{x}} \approx \mu_0 + 2.33\left(\dfrac{s}{\sqrt{n}}\right) = 2{,}230 + 2.33\left(\dfrac{257}{\sqrt{100}}\right) = 2{,}289.88$
We would reject H0 if $\bar{x}$ > 2,289.88. The power of the test when μ = 2,330 would be:
$\text{Power} = P(\bar{x} > 2{,}289.88 \mid \mu = 2{,}330) = P\left(z > \dfrac{\bar{x}_0 - \mu_a}{\sigma_{\bar{x}}}\right) = P\left(z > \dfrac{2{,}289.88 - 2{,}330}{257/\sqrt{100}}\right) = P(z > -1.56) = .5 + .4406 = .9406$
c.
The power of the test when μ = 2,280 would be:
$\text{Power} = P(\bar{x} > 2{,}289.88 \mid \mu = 2{,}280) = P\left(z > \dfrac{\bar{x}_0 - \mu_a}{\sigma_{\bar{x}}}\right) = P\left(z > \dfrac{2{,}289.88 - 2{,}280}{257/\sqrt{100}}\right) = P(z > 0.38) = .5 - .1480 = .3520$
6.84
a.
To determine if the mean mpg for 2006 Honda Civic autos is greater than 38 mpg, we test: H0: μ = 38 Ha: μ > 38
b.
The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{40.3 - 38}{6.4/\sqrt{36}} = 2.16$
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 2.16 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the mean mpg for 2006 Honda Civic autos is greater than 38 mpg at α = .05. We must assume that the sample was a random sample.
c.
First find: $\bar{x}_0 = \mu_0 + z_\alpha \sigma_{\bar{x}} = \mu_0 + z_\alpha \dfrac{\sigma}{\sqrt{n}}$, where zα = 1.645 from Table IV, Appendix B.
Thus, $\bar{x}_0 = 38 + 1.645\,\dfrac{6.4}{\sqrt{36}} = 39.75$
For μ = 38.5:
$\text{Power} = P(\bar{x} > 39.75 \mid \mu = 38.5) = P\left(z > \dfrac{39.75 - 38.5}{6.4/\sqrt{36}}\right) = P(z > 1.17) = .5 - .3790 = .1210$
For μ = 39:
$\text{Power} = P(\bar{x} > 39.75 \mid \mu = 39) = P\left(z > \dfrac{39.75 - 39}{6.4/\sqrt{36}}\right) = P(z > .70) = .5 - .2580 = .2420$
For μ = 39.5:
$\text{Power} = P(\bar{x} > 39.75 \mid \mu = 39.5) = P\left(z > \dfrac{39.75 - 39.5}{6.4/\sqrt{36}}\right) = P(z > .23) = .5 - .0910 = .4090$
For μ = 40:
$\text{Power} = P(\bar{x} > 39.75 \mid \mu = 40) = P\left(z > \dfrac{39.75 - 40}{6.4/\sqrt{36}}\right) = P(z > -.23) = .5 + .0910 = .5910$
For μ = 40.5:
$\text{Power} = P(\bar{x} > 39.75 \mid \mu = 40.5) = P\left(z > \dfrac{39.75 - 40.5}{6.4/\sqrt{36}}\right) = P(z > -.70) = .5 + .2580 = .7580$
d.
The plot is:
[Plot of power against μ]
e.
From the plot, the power is approximately .5. For μ = 39.75:
$\text{Power} = P(\bar{x} > 39.75 \mid \mu = 39.75) = P\left(z > \dfrac{39.75 - 39.75}{6.4/\sqrt{36}}\right) = P(z > 0) = .5$
f.
From the plot, the power is approximately 1. For μ = 43:
$\text{Power} = P(\bar{x} > 39.75 \mid \mu = 43) = P\left(z > \dfrac{39.75 - 43}{6.4/\sqrt{36}}\right) = P(z > -3.05) = .5 + .4989 = .9989$
If the true value of μ is 43, the approximate probability that the test will fail to reject H0 is 1 − .9989 = .0011.
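The power calculations of Exercise 6.84 follow one recipe: find the rejection boundary $\bar{x}_0$ under H0, then find the probability of exceeding it under the alternative mean. A minimal Python sketch of that recipe follows; the function name `power_upper_tailed` is hypothetical. Because the sketch does not round zα or $\bar{x}_0$, its results can differ from the table-based values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed(mu0, mu_a, sigma, n, alpha):
    """Power of the upper-tailed z test of H0: mu = mu0 when the true
    mean is mu_a (hypothetical helper)."""
    se = sigma / sqrt(n)
    # Rejection boundary in terms of the sample mean: reject H0 if xbar > x0
    x0 = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Probability that xbar exceeds x0 when xbar ~ Normal(mu_a, se)
    return 1 - NormalDist(mu_a, se).cdf(x0)

# Exercise 6.84: mu0 = 38, sigma estimated by s = 6.4, n = 36, alpha = .05
power_curve = {
    mu: power_upper_tailed(38, mu, 6.4, 36, 0.05)
    for mu in (38.5, 39, 39.5, 39.75, 40, 40.5, 43)
}
for mu, power in power_curve.items():
    print(mu, round(power, 4), round(1 - power, 4))  # mu, power, beta
```

Evaluating the function at μ = μ0 returns α, which is a quick sanity check on the boundary calculation.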
6.86
Using Table VII, Appendix B:
a.
For n = 12, df = n − 1 = 12 − 1 = 11
$P(\chi^2 > \chi_0^2) = .10 \Rightarrow \chi_0^2 = 17.2750$
b.
For n = 9, df = n − 1 = 9 − 1 = 8
$P(\chi^2 > \chi_0^2) = .05 \Rightarrow \chi_0^2 = 15.5073$
c.
For n = 5, df = n − 1 = 5 − 1 = 4
$P(\chi^2 > \chi_0^2) = .025 \Rightarrow \chi_0^2 = 11.1433$
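The tabled values above can be checked by computing the upper-tail area of the χ² distribution directly. The sketch below uses only the Python standard library and a power series for the regularized incomplete gamma function; the function name `chi2_upper_tail` is hypothetical, and a statistical package would normally be used instead.

```python
from math import exp, lgamma, log

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for x > 0.

    Uses the series for the lower incomplete gamma function:
    gamma(a, z) = z**a * exp(-z) * sum_k z**k / (a (a+1) ... (a+k)),
    with a = df/2 and z = x/2 (hypothetical helper).
    """
    a, z = df / 2, x / 2
    term = 1 / a          # k = 0 term of the series
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    lower = total * exp(-z + a * log(z) - lgamma(a))  # regularized lower area
    return 1 - lower

# Exercise 6.86: each tabled value should cut off the stated upper-tail area
area_a = chi2_upper_tail(17.2750, 11)  # expected to be close to .10
area_b = chi2_upper_tail(15.5073, 8)   # expected to be close to .05
area_c = chi2_upper_tail(11.1433, 4)   # expected to be close to .025
```

The same function gives a p-value for a χ² test statistic whose degrees of freedom exceed the range of Table VII.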
6.88
a.
It would be necessary to assume that the population has a normal distribution.
b.
H0: σ² = 1 Ha: σ² > 1
The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{6(4.84)}{1} = 29.04$
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = n − 1 = 7 − 1 = 6. From Table VII, Appendix B, $\chi^2_{.05}$ = 12.5916. The rejection region is χ² > 12.5916. Since the observed value of the test statistic falls in the rejection region (29.04 > 12.5916), H0 is rejected. There is sufficient evidence to indicate that the variance is greater than 1 at α = .05.
c.
H0: σ² = 1 Ha: σ² ≠ 1
The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{6(4.84)}{1} = 29.04$
The rejection region requires α/2 = .05/2 = .025 in each tail of the χ² distribution with df = n − 1 = 7 − 1 = 6. From Table VII, Appendix B, $\chi^2_{.975}$ = 1.237347 and $\chi^2_{.025}$ = 14.4494. The rejection region is χ² < 1.237347 or χ² > 14.4494.
Since the observed value of the test statistic falls in the rejection region (29.04 > 14.4494), H0 is rejected. There is sufficient evidence to indicate that the variance is not equal to 1 at α = .05.
6.90
Some preliminary calculations are:
$s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \dfrac{176 - \frac{30^2}{7}}{7 - 1} = 7.9048$
To determine if σ² < 1, we test: H0: σ² = 1 Ha: σ² < 1
The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(7-1)7.9048}{1} = 47.43$
The rejection region requires α = .05 in the lower tail of the χ² distribution with df = n − 1 = 7 − 1 = 6. From Table VII, Appendix B, $\chi^2_{.95}$ = 1.63539. The rejection region is χ² < 1.63539. Since the observed value of the test statistic does not fall in the rejection region (χ² = 47.43 ≮ 1.63539), H0 is not rejected. There is insufficient evidence to indicate the variance is less than 1.
6.92
a.
To determine if the breaking strength variance of the new adhesive is less than the variance of the standard composite adhesive, σ² = .25, we test: H0: σ² = .25 Ha: σ² < .25
b.
The rejection region requires α = .01 in the lower tail of the χ² distribution with df = n − 1 = 10 − 1 = 9. From Table VII, Appendix B, $\chi^2_{.99}$ = 2.087912. The rejection region is χ² < 2.087912.
c.
The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(10-1)(.46)^2}{.25} = 7.6176$.
d.
Since the observed value of the test statistic does not fall in the rejection region (χ² = 7.6176 ≮ 2.087912), H0 is not rejected. There is insufficient evidence to indicate the breaking strength variance of the new adhesive is less than the variance of the standard composite adhesive, σ² = .25, at α = .01.
e.
We must assume that the distribution of the breaking strengths is approximately normal and that a random sample was selected from this population.
6.94
To determine if the true standard deviation of the point-spread errors exceeds 15 (variance exceeds 225), we test: H0: σ² = 225 Ha: σ² > 225
The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(240-1)(13.3)^2}{225} = 187.896$
The rejection region requires α in the upper tail of the χ² distribution with df = n − 1 = 240 − 1 = 239. The maximum value of df in Table VII is 100. Thus, we cannot find the rejection region using Table VII. Using a statistical package, the p-value associated with χ² = 187.896 is .9938. Since the p-value is so large, there is no evidence to reject H0. There is insufficient evidence to indicate that the true standard deviation of the point-spread errors exceeds 15 for any reasonable value of α. (Since the observed variance (or standard deviation) is less than the hypothesized value of the variance (or standard deviation) under H0, there is no way H0 will be rejected for any reasonable value of α.)
6.96
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: GASTURBINE
Variable    N   N*  Mean   SE Mean  StDev  Minimum  Q1    Median  Q3     Maximum
GASTURBINE  67  0   11066  195      1595   8714     9918  10656   11842  16243
To determine if the heat rates of the augmented gas turbine engine are more variable than the heat rates of the standard gas turbine engine, we test: H0: σ² = 1,500² Ha: σ² > 1,500²
The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(67-1)1{,}595^2}{1{,}500^2} = 74.625$.
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = n − 1 = 67 − 1 = 66. From Table VII, Appendix B, $\chi^2_{.05}$ ≈ 85.95148. The rejection region is χ² > 85.95148. Since the observed value of the test statistic does not fall in the rejection region (χ² = 74.625 ≯ 85.95148), H0 is not rejected. There is insufficient evidence to indicate the heat rates of the augmented gas turbine engine are more variable than the heat rates of the standard gas turbine engine at α = .05.
6.98
For a large sample test of hypothesis about a population mean, no assumptions are necessary because the Central Limit Theorem assures that the test statistic will be approximately normally distributed. For a small sample test of hypothesis about a population mean, we must assume that the population being sampled from is normal. The test statistic for the large sample test is the z statistic, and the test statistic for the small sample test is the t statistic.
6.100
The elements of the test of hypothesis that should be specified prior to analyzing the data are: null hypothesis, alternative hypothesis, and rejection region based on α.
6.102
α = P(Type I error) = P(rejecting H0 when it is true). Thus, if rejection of H0 would cause your firm to go out of business, you would want this probability or α to be small.
6.104
a.
H0: μ = 8.3 Ha: μ ≠ 8.3
The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{8.2 - 8.3}{.79/\sqrt{175}} = -1.67$
The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96. Since the observed value of the test statistic does not fall in the rejection region (−1.67 ≮ −1.96), H0 is not rejected. There is insufficient evidence to indicate that the mean is different from 8.3 at α = .05.
b.
H0: μ = 8.4 Ha: μ ≠ 8.4
The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{8.2 - 8.4}{.79/\sqrt{175}} = -3.35$
The rejection region is the same as part a, z < −1.96 or z > 1.96.
Since the observed value of the test statistic falls in the rejection region (−3.35 < −1.96), H0 is rejected. There is sufficient evidence to indicate that the mean is different from 8.4 at α = .05. c.
H0: σ = 1 Ha: σ ≠ 1
or
H0: σ² = 1 Ha: σ² ≠ 1
The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(175-1)(.79)^2}{1} = 108.59$
The rejection region requires α/2 = .05/2 = .025 in each tail of the χ² distribution with df = n − 1 = 175 − 1 = 174. From Table VII, Appendix B, $\chi^2_{.025}$ ≈ 129.561 and $\chi^2_{.975}$ ≈ 74.2219. The rejection region is χ² > 129.561 or χ² < 74.2219. Since the observed value of the test statistic does not fall in the rejection region (χ² = 108.59 ≯ 129.561 and χ² = 108.59 ≮ 74.2219), H0 is not rejected. There is insufficient evidence to indicate the variance differs from 1 at α = .05.
d.
In part a, the rejection region is z < −1.96 or z > 1.96. In terms of $\bar{x}$, the rejection region would be:
$z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \Rightarrow 1.96 = \dfrac{\bar{x}_U - 8.3}{.79/\sqrt{175}} \Rightarrow .117 = \bar{x}_U - 8.3 \Rightarrow \bar{x}_U = 8.417$
$z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \Rightarrow -1.96 = \dfrac{\bar{x}_L - 8.3}{.79/\sqrt{175}} \Rightarrow -.117 = \bar{x}_L - 8.3 \Rightarrow \bar{x}_L = 8.183$
Based on $\bar{x}$, the rejection region would be: Reject H0 if $\bar{x}$ < 8.183 or $\bar{x}$ > 8.417
The power of the test is the probability the test statistic falls in the rejection region, given the alternative hypothesis is true. In this case, we will let μa = 8.5.
$\text{Power} = P(\bar{x} < 8.183 \mid \mu_a = 8.5) + P(\bar{x} > 8.417 \mid \mu_a = 8.5)$
$= P\left(z < \dfrac{8.183 - 8.5}{.79/\sqrt{175}}\right) + P\left(z > \dfrac{8.417 - 8.5}{.79/\sqrt{175}}\right)$
$= P(z < -5.31) + P(z > -1.39) = (.5 - .5) + (.5 + .4177) = .9177$ (Using Table IV, Appendix B)
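For a two-tailed test the power is the sum of two tail areas, one for each rejection boundary. The following Python sketch mirrors the calculation in Exercise 6.104, part d; the function name `power_two_tailed` is hypothetical, and σ is estimated by s = .79 as in the exercise. Since z.025 is not rounded to 1.96 here, small differences from the table-based answer are expected.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(mu0, mu_a, sigma, n, alpha):
    """Rejection boundaries and power of the two-tailed z test of
    H0: mu = mu0 when the true mean is mu_a (hypothetical helper)."""
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    x_low, x_high = mu0 - z * se, mu0 + z * se  # reject if xbar is outside
    sampling = NormalDist(mu_a, se)             # distribution of xbar under mu_a
    power = sampling.cdf(x_low) + (1 - sampling.cdf(x_high))
    return x_low, x_high, power

# Exercise 6.104d: mu0 = 8.3, mu_a = 8.5, s = .79, n = 175, alpha = .05
x_low, x_high, power = power_two_tailed(8.3, 8.5, 0.79, 175, 0.05)
print(round(x_low, 3), round(x_high, 3), round(power, 4))
```

Nearly all of the power comes from the upper boundary, because the alternative mean 8.5 lies above μ0.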
6.106
a.
The p-value = .1288 = P(t ≥ 1.174). Since the p-value is not very small, there is no evidence to reject H0 for α ≤ .10. There is no evidence to indicate the mean is greater than 10.
b.
We must assume that a random sample was selected from a population that is normally distributed.
c.
For the alternative hypothesis Ha: μ ≠ 10, the p-value is 2 times the p-value for the one-tailed test. The p-value = 2(.1288) = .2576. There is no evidence to reject H0 for α ≤ .10. There is no evidence to indicate the mean is different from 10.
6.108
a.
If we wish to test the research hypothesis that the mean GHQ score for all unemployed men exceeds 10, we test: H0: μ = 10 Ha: μ > 10 This is a one-tailed test. We are only interested in rejecting H0 if the mean GHQ score for all unemployed men is greater than 10.
b.
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.
c.
The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{10.94 - 10.0}{5.10/\sqrt{49}} = 1.29$
Since the observed value of the test statistic does not fall in the rejection region (z = 1.29 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the mean GHQ score for all unemployed men is greater than 10 at α = .05.
d.
The p-value is P(z ≥ 1.29) = .5 − .4015 = .0985. (Using Table IV, Appendix B) The probability of observing our test statistic or anything more unusual, given H0 is true, is .0985. Since this value is not less than α = .05, we do not reject H0. There is insufficient evidence to indicate the mean GHQ score is greater than 10.
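The test statistic and p-value for Exercise 6.108 can be reproduced with a short Python sketch. The function name `z_test_mean_upper` is hypothetical; the population standard deviation is estimated by the sample value s = 5.10, as in the solution above.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean_upper(xbar, mu0, s, n):
    """Large-sample z statistic and upper-tailed p-value for
    H0: mu = mu0 against Ha: mu > mu0 (hypothetical helper)."""
    z = (xbar - mu0) / (s / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value

# Exercise 6.108: xbar = 10.94, mu0 = 10, s = 5.10, n = 49
z, p_value = z_test_mean_upper(10.94, 10.0, 5.10, 49)
reject_at_05 = p_value < 0.05
print(round(z, 2), round(p_value, 4), reject_at_05)
```

Comparing the p-value with α leads to the same decision as comparing z with the critical value 1.645.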
6.110
a.
The population parameter of interest is p = proportion of all television viewers with access to cable-TV who agree with the statement “Overall, I find the quality of news on cable networks to be better than news on the ABC, CBS, and NBC networks.”
b.
$\hat{p} = \dfrac{x}{n} = \dfrac{248}{500} = .496$
c.
To determine if the true proportion of TV-viewers who find cable news to be better quality than network news differs from .50, we test:
H0: p = .50 Ha: p ≠ .50
d.
The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.496 - .50}{\sqrt{.50(.50)/500}} = -0.18$
The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645 or z < −1.645. Since the observed value of the test statistic does not fall in the rejection region (z = −0.18 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the true proportion of TV-viewers who find cable news to be better quality than network news differs from .50 at α = .10.
e.
In order for the inference to be valid, the sampling distribution of $\hat{p}$ must be approximately normal. We check this assumption:
$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow p_0 \pm 3\sqrt{\dfrac{p_0 q_0}{n}} \Rightarrow .5 \pm 3\sqrt{\dfrac{.5(.5)}{500}} \Rightarrow .5 \pm .067 \Rightarrow (.433, .567)$
Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate.
6.112
a.
First, check to see if the normal approximation is adequate:
$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow p_0 \pm 3\sqrt{\dfrac{p_0 q_0}{n}} \Rightarrow .25 \pm 3\sqrt{\dfrac{(.25)(.75)}{159}} \Rightarrow .25 \pm .103 \Rightarrow (.147, .353)$
Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate.
$\hat{p} = \dfrac{x}{n} = \dfrac{124}{159} = .786$
To determine if the percentage of truckers who suffer from sleep apnea differs from 25%, we test: H0: p = .25 Ha: p ≠ .25
The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.786 - .25}{\sqrt{(.25)(.75)/159}} = 15.61$
The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645 or z > 1.645.
Since the observed value of the test statistic falls in the rejection region (z = 15.61 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the percentage of truckers who suffer from sleep apnea differs from 25% at α = .10.
b.
The observed significance level is the p-value and is: p-value = P(z ≥ 15.61) + P(z ≤ −15.61) ≈ (.5 − .5) + (.5 − .5) = 0 Since the p-value is so small, we would reject H0 for any reasonable value of α. There is sufficient evidence to indicate that the percentage of truckers who suffer from sleep apnea differs from 25%.
c.
The inference from a confidence interval and a test of hypothesis must agree because the same numbers are used in both if the same level of significance is used.
6.114
a.
Let p = proportion of shoppers using cents-off coupons. To determine if the proportion of shoppers using cents-off coupons exceeds .65, we test: H0: p = .65 Ha: p > .65
The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.77 - .65}{\sqrt{.65(.35)/1{,}000}} = 7.96$
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 7.96 > 1.645), H0 is rejected. There is sufficient evidence to indicate the proportion of shoppers using cents-off coupons exceeds .65 at α = .05.
b.
The sample size is large enough if the interval does not include 0 or 1.
$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow p_0 \pm 3\sqrt{\dfrac{p_0 q_0}{n}} \Rightarrow .65 \pm 3\sqrt{\dfrac{.65(.35)}{1{,}000}} \Rightarrow .65 \pm .045 \Rightarrow (.605, .695)$
Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate.
c.
The p-value is p = P(z ≥ 7.96) = (.5 − .5) ≈ 0. (Using Table IV, Appendix B.) Since the p-value is smaller than α = .05, H0 is rejected. There is sufficient evidence to indicate the proportion of shoppers using cents-off coupons exceeds .65 at α = .05.
6.116
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Tunnel
Variable  N   Mean   Median  TrMean  StDev  SE Mean  Minimum  Maximum  Q1     Q3
Tunnel    10  989.8  970.5   987.9   160.7  50.8     735.0    1260.0   862.5  1096.8
To determine whether peak hour pricing succeeded in reducing the average number of vehicles attempting to use the Lincoln Tunnel during the peak rush hour, we test: H0: μ = 1,220 Ha: μ < 1,220
The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{989.8 - 1{,}220}{160.7/\sqrt{10}} = -4.53$
Since no α is given, we will use α = .05. The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − 1 = 10 − 1 = 9. From Table VI, Appendix B, t.05 = 1.833. The rejection region is t < −1.833. Since the observed value of the test statistic falls in the rejection region (t = −4.53 < −1.833), H0 is rejected. There is sufficient evidence to indicate that peak hour pricing succeeded in reducing the average number of vehicles attempting to use the Lincoln Tunnel during the peak rush hour at α = .05.
6.118
a.
To determine if the true mean number of pecks at the blue string is less than 7.5, we test: H0: μ = 7.5 Ha: μ < 7.5
The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{1.13 - 7.5}{2.21/\sqrt{72}} = -24.46$
The rejection region requires α = .01 in the lower tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z < −2.33. Since the observed value of the test statistic falls in the rejection region (z = −24.46 < −2.33), H0 is rejected. There is sufficient evidence to indicate the true mean number of pecks at the blue string is less than 7.5 at α = .01.
b.
From Exercise 5.96, the 99% confidence interval is (.46, 1.80). Since the hypothesized value of the mean (μ = 7.5) does not fall in the confidence interval, it is not a likely candidate for the true value of the mean. Thus, you would reject it. This agrees with the conclusion in part a.
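The agreement between the confidence interval and the test can be illustrated numerically. The sketch below rebuilds the 99% interval from the summary values used in part a and checks whether the hypothesized mean falls inside it. The function name `z_confidence_interval` is hypothetical, and the use of a large-sample z interval is an assumption about how Exercise 5.96 obtained (.46, 1.80).

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, s, n, confidence):
    """Large-sample confidence interval for a mean (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Summary values from part a: xbar = 1.13, s = 2.21, n = 72
low, high = z_confidence_interval(1.13, 2.21, 72, 0.99)
mu0 = 7.5
mu0_inside = low <= mu0 <= high  # False means mu0 is not a plausible value
print(round(low, 2), round(high, 2), mu0_inside)
```

A hypothesized value outside the interval corresponds to rejecting H0, which matches the conclusion of part a.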
6.120
a.
$\hat{p}$ = 24/40 = .6
To determine if the proportion of shoplifters turned over to police is greater than .5, we test: H0: p = .5 Ha: p > .5
The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.6 - .5}{\sqrt{.5(.5)/40}} = 1.26$
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic does not fall in the rejection region (z = 1.26 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the proportion of shoplifters turned over to police is greater than .5 at α = .05.
b.
To determine if the normal approximation is appropriate, we check:
$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow p_0 \pm 3\sqrt{\dfrac{p_0 q_0}{n}} \Rightarrow .5 \pm 3\sqrt{\dfrac{(.5)(.5)}{40}} \Rightarrow .5 \pm .237 \Rightarrow (.263, .737)$
Since the interval falls completely in the interval (0, 1), the normal distribution will be adequate.
c.
The observed significance level of the test is p-value = P(z ≥ 1.26) = .5 − .3962 = .1038. The probability of observing the value of our test statistic or anything more unusual if the true value of p is .5 is .1038. Since this p-value is so large, there is no evidence to reject H0. There is no evidence to indicate the true proportion of shoplifters turned over to police is greater than .5.
d.
Any value of α that is greater than the p-value would lead one to reject H0. Thus, for this problem, we would reject H0 for any value of α > .1038.
6.122
a.
To determine whether the mean profit change for restaurants with frequency programs is greater than $1047.34, we test: H0: μ = 1047.34 Ha: μ > 1047.34
b.
Some preliminary calculations are:
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{30{,}113.17}{12} = 2{,}509.43$
$s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \dfrac{126{,}379{,}568.8 - \frac{30{,}113.17^2}{12}}{12 - 1} = 4{,}619{,}331.955$
$s = \sqrt{4{,}619{,}331.955} = 2149.2631$
The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{2509.43 - 1047.34}{2149.2631/\sqrt{12}} = 2.36$
The rejection region requires α = .05 in the upper tail of the t-distribution with df = n − 1 = 12 − 1 = 11. From Table VI, Appendix B, t.05 = 1.796. The rejection region is t > 1.796. Since the observed value of the test statistic falls in the rejection region (t = 2.36 > 1.796), H0 is rejected. There is sufficient evidence to indicate the mean profit change for restaurants with frequency programs is greater than $1047.34 for α = .05. It appears that the frequency program would be profitable for the company if adopted nationwide.
6.124
a.
A Type II error would be concluding the mean amount of PCB in the air is less than or equal to 3 parts per million when, in fact, it is more than 3 parts per million.
b.
From Exercise 6.123, $z = \dfrac{\bar{x}_0 - \mu_0}{\sigma/\sqrt{n}} \Rightarrow \bar{x}_0 = z\,\dfrac{\sigma}{\sqrt{n}} + \mu_0 \Rightarrow \bar{x}_0 = 2.33\,\dfrac{.5}{\sqrt{50}} + 3 \Rightarrow \bar{x}_0 = 3.165$
For μ = 3.1, $\beta = P(\bar{x} \le 3.165) = P\left(z \le \dfrac{3.165 - 3.1}{.5/\sqrt{50}}\right) = P(z \le .92) = .5 + .3212 = .8212$ (from Table IV, Appendix B)
c.
Power = 1 − β = 1 − .8212 = .1788
d.
For μ = 3.2, $\beta = P(\bar{x} \le 3.165) = P\left(z \le \dfrac{3.165 - 3.2}{.5/\sqrt{50}}\right) = P(z \le -.49) = .5 - .1879 = .3121$
Power = 1 − β = 1 − .3121 = .6879
As the plant's mean PCB departs further from 3, the power increases.
6.126
a.
Some preliminary calculations:
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{79.93}{5} = 15.986$
$s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \dfrac{1{,}277.7627 - \frac{79.93^2}{5}}{5 - 1} = .00043$
$s = \sqrt{.00043} = .0207$
To determine if the mean measurement differs from 16.01, we test: H0: μ = 16.01 Ha: μ ≠ 16.01
The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{15.986 - 16.01}{.0207/\sqrt{5}} = -2.59$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 1 = 5 − 1 = 4. From Table VI, Appendix B, t.025 = 2.776. The rejection region is t < −2.776 or t > 2.776. Since the observed value of the test statistic does not fall in the rejection region (t = −2.59 ≮ −2.776), H0 is not rejected. There is insufficient evidence to indicate the true mean measurement differs from 16.01 at α = .05.
b.
We must assume that the sample of measurements was randomly selected from a population of measurements that is normally distributed.
c.
To determine if the standard deviation of the weight measurements is greater than .01, we test: H0: σ² = .01² Ha: σ² > .01²
The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(5-1)(.0207)^2}{(.01)^2} = 17.1396$.
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = n − 1 = 5 − 1 = 4. From Table VII, Appendix B, $\chi^2_{.05}$ = 9.48773. The rejection region is χ² > 9.48773. Since the observed value of the test statistic falls in the rejection region (χ² = 17.1396 > 9.48773), H0 is rejected. There is sufficient evidence to indicate the standard deviation of the weight measurements is greater than .01 at α = .05.
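The preliminary calculations in part a use the shortcut formula for the sample variance, which needs only n, Σx, and Σx². The sketch below reproduces them and the t statistic; the function names `summary_from_sums` and `t_statistic` are hypothetical. The critical value still has to come from Table VI, since the Python standard library does not provide the t-distribution.

```python
from math import sqrt

def summary_from_sums(n, sum_x, sum_x2):
    """Sample mean, variance, and standard deviation from the sums
    (hypothetical helper using the shortcut variance formula)."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance, sqrt(variance)

def t_statistic(mean, s, n, mu0):
    """One-sample t statistic for H0: mu = mu0 (hypothetical helper)."""
    return (mean - mu0) / (s / sqrt(n))

# Exercise 6.126a: n = 5, sum of x = 79.93, sum of x squared = 1,277.7627
mean, variance, s = summary_from_sums(5, 79.93, 1277.7627)
t = t_statistic(mean, s, 5, 16.01)
reject = abs(t) > 2.776  # t.025 with 4 df, taken from Table VI
print(round(mean, 3), round(variance, 5), round(t, 2), reject)
```

The shortcut formula subtracts two nearly equal numbers, so the sums must be carried with enough decimal places for the variance to be meaningful.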
6.128
a.
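Let $p_i$ = proportion of first round games won by the ith seed. To determine if the higher seed has a better than 50-50 chance of winning a first-round game, we test: H0: $p_i$ = .5 Ha: $p_i$ > .5 for i = 1, 2, 3, …, 8
The test statistic is $z_i = \dfrac{\hat{p}_i - p_0}{\sqrt{p_0 q_0 / n}}$.
No value of α was given. We will use α = .05. The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.
$\hat{p}_i = \dfrac{x_i}{n}$. Thus,
$\hat{p}_1 = \dfrac{x_1}{n} = \dfrac{52}{52} = 1$, $\hat{p}_2 = \dfrac{x_2}{n} = \dfrac{49}{52} = .942$, $\hat{p}_3 = \dfrac{x_3}{n} = \dfrac{41}{52} = .788$, $\hat{p}_4 = \dfrac{x_4}{n} = \dfrac{42}{52} = .808$,
$\hat{p}_5 = \dfrac{x_5}{n} = \dfrac{37}{52} = .712$, $\hat{p}_6 = \dfrac{x_6}{n} = \dfrac{36}{52} = .692$, $\hat{p}_7 = \dfrac{x_7}{n} = \dfrac{35}{52} = .673$, $\hat{p}_8 = \dfrac{x_8}{n} = \dfrac{22}{52} = .423$
The corresponding test statistics are:
$z_1 = \dfrac{\hat{p}_1 - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{1.00 - .5}{\sqrt{.5(.5)/52}} = 7.21$, $z_2 = \dfrac{\hat{p}_2 - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.942 - .5}{\sqrt{.5(.5)/52}} = 6.37$,
$z_3 = \dfrac{\hat{p}_3 - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.788 - .5}{\sqrt{.5(.5)/52}} = 4.15$, $z_4 = \dfrac{\hat{p}_4 - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.808 - .5}{\sqrt{.5(.5)/52}} = 4.44$,
$z_5 = \dfrac{\hat{p}_5 - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.712 - .5}{\sqrt{.5(.5)/52}} = 3.06$, $z_6 = \dfrac{\hat{p}_6 - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.692 - .5}{\sqrt{.5(.5)/52}} = 2.77$,
$z_7 = \dfrac{\hat{p}_7 - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.673 - .5}{\sqrt{.5(.5)/52}} = 2.50$, $z_8 = \dfrac{\hat{p}_8 - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.423 - .5}{\sqrt{.5(.5)/52}} = -1.11$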
For games matching 1 and 16, since the observed value of the test statistic falls in the rejection region (z1 = 7.21 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #1 seed has a better than 50-50 chance of winning a first-round game at α = .05.
For games matching 2 and 15, since the observed value of the test statistic falls in the rejection region (z2 = 6.37 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #2 seed has a better than 50-50 chance of winning a first-round game at α = .05.
For games matching 3 and 14, since the observed value of the test statistic falls in the rejection region (z3 = 4.15 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #3 seed has a better than 50-50 chance of winning a first-round game at α = .05.
For games matching 4 and 13, since the observed value of the test statistic falls in the rejection region (z4 = 4.44 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #4 seed has a better than 50-50 chance of winning a first-round game at α = .05.
For games matching 5 and 12, since the observed value of the test statistic falls in the rejection region (z5 = 3.06 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #5 seed has a better than 50-50 chance of winning a first-round game at α = .05.
For games matching 6 and 11, since the observed value of the test statistic falls in the rejection region (z6 = 2.77 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #6 seed has a better than 50-50 chance of winning a first-round game at α = .05.
For games matching 7 and 10, since the observed value of the test statistic falls in the rejection region (z7 = 2.50 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #7 seed has a better than 50-50 chance of winning a first-round game at α = .05.
For games matching 8 and 9, since the observed value of the test statistic does not fall in the rejection region (z8 = −1.11 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the #8 seed has a better than 50-50 chance of winning a first-round game at α = .05.
b.
Let μi = mean margin of victory. To determine if the mean margin of victory is greater than 10 points, we test: H0: μi = 10 Ha: μi > 10 i = 1, 2, 3, and 4 The test statistic is zi =
xi − μ0
σx
No value of α was given. We will use α = .05. The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.
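The test statistics are:
$z_1 = \dfrac{\bar{x}_1 - \mu_0}{\sigma_{\bar{x}}} = \dfrac{22.9 - 10}{12.4/\sqrt{52}} = 7.50$, $z_2 = \dfrac{\bar{x}_2 - \mu_0}{\sigma_{\bar{x}}} = \dfrac{17.2 - 10}{11.4/\sqrt{52}} = 4.55$,
$z_3 = \dfrac{\bar{x}_3 - \mu_0}{\sigma_{\bar{x}}} = \dfrac{10.6 - 10}{12.0/\sqrt{52}} = 0.36$, $z_4 = \dfrac{\bar{x}_4 - \mu_0}{\sigma_{\bar{x}}} = \dfrac{10.0 - 10}{12.5/\sqrt{52}} = 0$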
For games matching 1 and 16, since the observed value of the test statistic falls in the rejection region (z1 = 7.50 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #1 seed wins by more than 10 points in first-round games at α = .05.
For games matching 2 and 15, since the observed value of the test statistic falls in the rejection region (z2 = 4.55 > 1.645), H0 is rejected. There is sufficient evidence to indicate the #2 seed wins by more than 10 points in first-round games at α = .05.
For games matching 3 and 14, since the observed value of the test statistic does not fall in the rejection region (z3 = 0.36 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the #3 seed wins by more than 10 points in first-round games at α = .05.
For games matching 4 and 13, since the observed value of the test statistic does not fall in the rejection region (z4 = 0 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the #4 seed wins by more than 10 points in first-round games at α = .05.
c. Let μi = mean margin of victory. To determine if the mean margin of victory is less than 5 points, we test:
H0: μi = 5
Ha: μi < 5, i = 5, 6, 7, and 8
The test statistic is $z_i = \dfrac{\bar{x}_i - \mu_0}{\sigma_{\bar{x}}}$
No value of α was given. We will use α = .05. The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645.
The test statistics are:
$z_5 = \dfrac{\bar{x}_5 - \mu_0}{\sigma_{\bar{x}}} = \dfrac{5.3 - 5}{10.4/\sqrt{52}} = 0.21$, $z_6 = \dfrac{\bar{x}_6 - \mu_0}{\sigma_{\bar{x}}} = \dfrac{4.3 - 5}{10.7/\sqrt{52}} = -.47$,
$z_7 = \dfrac{\bar{x}_7 - \mu_0}{\sigma_{\bar{x}}} = \dfrac{3.2 - 5}{10.5/\sqrt{52}} = -1.24$, $z_8 = \dfrac{\bar{x}_8 - \mu_0}{\sigma_{\bar{x}}} = \dfrac{-2.1 - 5}{11.0/\sqrt{52}} = -4.65$
For games matching 5 and 12, since the observed value of the test statistic does not fall in the rejection region (z5 = 0.21 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the #5 seed wins by less than 5 points in first-round games at α = .05.
For games matching 6 and 11, since the observed value of the test statistic does not fall in the rejection region (z6 = −0.47 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the #6 seed wins by less than 5 points in first-round games at α = .05.
For games matching 7 and 10, since the observed value of the test statistic does not fall in the rejection region (z7 = −1.24 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the #7 seed wins by less than 5 points in first-round games at α = .05.
For games matching 8 and 9, since the observed value of the test statistic falls in the rejection region (z8 = −4.65 < −1.645), H0 is rejected. There is sufficient evidence to indicate the #8 seed wins by less than 5 points in first-round games at α = .05.
d. To determine if the standard deviation of victory margin differs from 11, we test:
H0: $\sigma_i^2 = 11^2 = 121$
Ha: $\sigma_i^2 \neq 11^2 = 121$
The test statistic is $\chi_i^2 = \dfrac{(n-1)s_i^2}{\sigma_0^2}$
No α level was given, so we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each tail of the χ² distribution with df = n − 1 = 52 − 1 = 51. From Table VII, Appendix B, $\chi_{.025}^2 = 71.4202$ and $\chi_{.975}^2 = 32.3574$. The rejection region is χ² < 32.3574 or χ² > 71.4202. The test statistics are:
$\chi_1^2 = \dfrac{(n-1)s_1^2}{\sigma_0^2} = \dfrac{(52-1)(12.4)^2}{121} = 64.808$, $\chi_2^2 = \dfrac{(n-1)s_2^2}{\sigma_0^2} = \dfrac{(52-1)(11.4)^2}{121} = 54.777$,
$\chi_3^2 = \dfrac{(n-1)s_3^2}{\sigma_0^2} = \dfrac{(52-1)(12.0)^2}{121} = 60.694$, $\chi_4^2 = \dfrac{(n-1)s_4^2}{\sigma_0^2} = \dfrac{(52-1)(12.5)^2}{121} = 65.857$,
$\chi_5^2 = \dfrac{(n-1)s_5^2}{\sigma_0^2} = \dfrac{(52-1)(10.4)^2}{121} = 45.588$, $\chi_6^2 = \dfrac{(n-1)s_6^2}{\sigma_0^2} = \dfrac{(52-1)(10.7)^2}{121} = 48.256$,
$\chi_7^2 = \dfrac{(n-1)s_7^2}{\sigma_0^2} = \dfrac{(52-1)(10.5)^2}{121} = 46.469$, $\chi_8^2 = \dfrac{(n-1)s_8^2}{\sigma_0^2} = \dfrac{(52-1)(11)^2}{121} = 51.000$
For games matching 1 and 16, since the observed value of the test statistic does not fall in the rejection region ($\chi_1^2$ = 64.808 ≯ 71.4202 and $\chi_1^2$ = 64.808 ≮ 32.3574), H0 is not rejected. There is insufficient evidence to indicate the standard deviation of victory margin differs from 11 at α = .05.
For games matching 2 and 15, since the observed value of the test statistic does not fall in the rejection region ($\chi_2^2$ = 54.777 ≯ 71.4202 and $\chi_2^2$ = 54.777 ≮ 32.3574), H0 is not rejected. There is insufficient evidence to indicate the standard deviation of victory margin differs from 11 at α = .05.
For games matching 3 and 14, since the observed value of the test statistic does not fall in the rejection region ($\chi_3^2$ = 60.694 ≯ 71.4202 and $\chi_3^2$ = 60.694 ≮ 32.3574), H0 is not rejected. There is insufficient evidence to indicate the standard deviation of victory margin differs from 11 at α = .05.
For games matching 4 and 13, since the observed value of the test statistic does not fall in the rejection region ($\chi_4^2$ = 65.857 ≯ 71.4202 and $\chi_4^2$ = 65.857 ≮ 32.3574), H0 is not rejected. There is insufficient evidence to indicate the standard deviation of victory margin differs from 11 at α = .05.
For games matching 5 and 12, since the observed value of the test statistic does not fall in the rejection region ($\chi_5^2$ = 45.588 ≯ 71.4202 and $\chi_5^2$ = 45.588 ≮ 32.3574), H0 is not rejected. There is insufficient evidence to indicate the standard deviation of victory margin differs from 11 at α = .05.
For games matching 6 and 11, since the observed value of the test statistic does not fall in the rejection region ($\chi_6^2$ = 48.256 ≯ 71.4202 and $\chi_6^2$ = 48.256 ≮ 32.3574), H0 is not rejected. There is insufficient evidence to indicate the standard deviation of victory margin differs from 11 at α = .05.
For games matching 7 and 10, since the observed value of the test statistic does not fall in the rejection region ($\chi_7^2$ = 46.469 ≯ 71.4202 and $\chi_7^2$ = 46.469 ≮ 32.3574), H0 is not rejected. There is insufficient evidence to indicate the standard deviation of victory margin differs from 11 at α = .05.
For games matching 8 and 9, since the observed value of the test statistic does not fall in the rejection region ($\chi_8^2$ = 51.000 ≯ 71.4202 and $\chi_8^2$ = 51.000 ≮ 32.3574), H0 is not rejected. There is insufficient evidence to indicate the standard deviation of victory margin differs from 11 at α = .05.
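The eight chi-square statistics in part d can be checked with a short script. This is a minimal sketch, not the author's method: the function name `chi_square_variance_stat` is hypothetical, and the critical values are the Table VII figures quoted in the solution rather than values computed by software.

```python
def chi_square_variance_stat(n: int, s: float, sigma0: float) -> float:
    """Return (n - 1) * s^2 / sigma0^2, the statistic for H0: sigma = sigma0."""
    return (n - 1) * s ** 2 / sigma0 ** 2


# Critical values for df = 51 as quoted from Table VII in the solution.
CHI2_LOWER, CHI2_UPPER = 32.3574, 71.4202

# Sample standard deviations of victory margin used in part d,
# listed for the #1 seed through the #8 seed.
std_devs = [12.4, 11.4, 12.0, 12.5, 10.4, 10.7, 10.5, 11.0]

for seed, s in enumerate(std_devs, start=1):
    stat = chi_square_variance_stat(52, s, 11.0)
    reject = stat < CHI2_LOWER or stat > CHI2_UPPER
    print(f"seed {seed}: chi-square = {stat:.3f}, reject H0: {reject}")
```

Every statistic lies between the two critical values, which matches the eight "H0 is not rejected" conclusions.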
e. Let μ = mean difference in game outcome and point spread. To determine if the point spread is a good predictor of the victory margin, we test:
H0: μ = 0
Ha: μ ≠ 0
The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{.7 - 0}{11.3/\sqrt{360}} = 1.18$
Since no α was given, we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z > 1.96 or z < −1.96. Since the observed value of the test statistic does not fall in the rejection region (z = 1.18 ≯ 1.96), H0 is not rejected. There is insufficient evidence to indicate there is a difference in the game outcome and point spread at α = .05. There is no evidence to indicate the point spread is not a good predictor of the victory margin.
6.130
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Candy
Variable  N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Candy     5   0  24.00     1.67   3.74    21.00  21.00   23.00  27.50    30.00
To give the benefit of the doubt to the students we will use a small value of α. (We do not want to reject H0 when it is true to favor the students.) Thus, we will use α = .001.
We must also assume that the sample comes from a normal distribution. To determine if the mean number of candies exceeds 15, we test:
H0: μ = 15
Ha: μ > 15
The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{22 - 15}{3/\sqrt{5}} = 5.22$
The rejection region requires α = .001 in the upper tail of the z-distribution. From Table IV, Appendix B, z.001 = 3.08. The rejection region is z > 3.08. Since the observed value of the test statistic falls in the rejection region (z = 5.22 > 3.08), H0 is rejected. There is sufficient evidence to indicate the mean number of candies exceeds 15 at α = .001.
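The one-sample z computation above can be sketched in a few lines. The helper name `z_statistic` is hypothetical, the inputs (22, 15, 3, 5) are the values substituted in the solution's formula, and the critical value is the table figure quoted in the text rather than one computed here.

```python
import math


def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Return (xbar - mu0) / (sigma / sqrt(n)) for a one-sample z test."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


Z_CRIT = 3.08  # z_.001 as quoted from Table IV in the solution

z = z_statistic(22, 15, 3, 5)
print(f"z = {z:.2f}, reject H0: {z > Z_CRIT}")
```

The same helper reproduces the first-round margin statistics earlier in the chapter, for example `z_statistic(22.9, 10, 12.4, 52)` for the #1 seed.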
Chapter 7
Inferences Based on Two Samples: Confidence Intervals and Tests of Hypothesis
7.2
a. $\mu_{\bar{x}_1} = \mu_1 = 12$, $\sigma_{\bar{x}_1} = \dfrac{\sigma_1}{\sqrt{n_1}} = \dfrac{4}{\sqrt{64}} = .5$
b. $\mu_{\bar{x}_2} = \mu_2 = 10$, $\sigma_{\bar{x}_2} = \dfrac{\sigma_2}{\sqrt{n_2}} = \dfrac{3}{\sqrt{64}} = .375$
c. $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 12 - 10 = 2$, $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{4^2}{64} + \dfrac{3^2}{64}} = \sqrt{\dfrac{25}{64}} = .625$
d. Since n1 ≥ 30 and n2 ≥ 30, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal by the Central Limit Theorem.
7.4
Assumptions about the two populations:
1. Both sampled populations have relative frequency distributions that are approximately normal.
2. The population variances are equal.
Assumptions about the two samples: The samples are randomly and independently selected from the populations.
7.6
a. $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(25 - 1)120 + (25 - 1)100}{25 + 25 - 2} = \dfrac{5280}{48} = 110$
b. $s_p^2 = \dfrac{(20 - 1)12 + (10 - 1)20}{20 + 10 - 2} = \dfrac{408}{28} = 14.5714$
c. $s_p^2 = \dfrac{(6 - 1).15 + (10 - 1).2}{6 + 10 - 2} = \dfrac{2.55}{14} = .1821$
d. $s_p^2 = \dfrac{(16 - 1)3000 + (17 - 1)2500}{16 + 17 - 2} = \dfrac{85{,}000}{31} = 2741.9355$
e. $s_p^2$ falls closer to the variance of the sample with the larger sample size.
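The four pooled variances in Exercise 7.6 follow one formula, so they are easy to verify together. The sketch below assumes a hypothetical helper `pooled_variance` that takes sample sizes and sample variances (not standard deviations).

```python
def pooled_variance(n1: int, var1: float, n2: int, var2: float) -> float:
    """Weighted average of two sample variances, weighted by degrees of freedom."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


# (n1, s1^2, n2, s2^2) for parts a through d of Exercise 7.6
cases = {
    "a": (25, 120, 25, 100),
    "b": (20, 12, 10, 20),
    "c": (6, 0.15, 10, 0.2),
    "d": (16, 3000, 17, 2500),
}

for part, args in cases.items():
    print(part, round(pooled_variance(*args), 4))
```

Because the weights are n − 1, the result always lies between the two sample variances and is pulled toward the one from the larger sample, which is the observation in part e.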
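7.8
a. $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 10$, $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{9}{100} + \dfrac{16}{100}} = \sqrt{.25} = .5$
b. The sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal by the Central Limit Theorem since n1 ≥ 30 and n2 ≥ 30.
c. $\bar{x}_1 - \bar{x}_2 = 15.5 - 26.6 = -11.1$. Yes, it appears that $\bar{x}_1 - \bar{x}_2 = -11.1$ contradicts the null hypothesis H0: μ1 − μ2 = 10.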
d.
The rejection region requires α/2 = .025 = .05/2 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.
e. H0: μ1 − μ2 = 10
Ha: μ1 − μ2 ≠ 10
The test statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 10}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(15.5 - 26.6) - 10}{.5} = -42.2$
The rejection region is z < −1.96 or z > 1.96. (Refer to part d.) Since the observed value of the test statistic falls in the rejection region (z = −42.2 < −1.96), H0 is rejected. There is sufficient evidence to indicate the difference in the population means is not equal to 10 at α = .05.
f. The form of the confidence interval is: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The confidence interval is:
$(15.5 - 26.6) \pm 1.96\sqrt{\dfrac{9}{100} + \dfrac{16}{100}} \Rightarrow -11.1 \pm .98 \Rightarrow (-12.08, -10.12)$
We are 95% confident that the difference in the two means is between −12.08 and −10.12.
g. The confidence interval gives more information.
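Parts e and f of Exercise 7.8 share the same standard error, which a small script makes explicit. This is an illustrative sketch: `two_sample_z` is a hypothetical helper, the variances 9 and 16 and the sample sizes of 100 are taken from the solution, and 1.96 is the table value quoted there.

```python
import math


def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, d0=0.0):
    """Return (z, standard error) for H0: mu1 - mu2 = d0 with known variances."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - d0) / se, se


z, se = two_sample_z(15.5, 26.6, 9, 16, 100, 100, d0=10)

# 95% confidence interval for mu1 - mu2 using the tabled z_.025 = 1.96
diff = 15.5 - 26.6
half_width = 1.96 * se
ci = (diff - half_width, diff + half_width)

print(f"z = {z:.1f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The hypothesized difference of 10 lies far outside the interval, which is consistent with rejecting H0 in part e.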
7.10
Some preliminary calculations:
$\bar{x}_1 = \dfrac{\sum x_1}{n_1} = \dfrac{654}{15} = 43.6$
$s_1^2 = \dfrac{\sum x_1^2 - \dfrac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \dfrac{28934 - \dfrac{654^2}{15}}{15 - 1} = \dfrac{419.6}{14} = 29.9714$
$\bar{x}_2 = \dfrac{\sum x_2}{n_2} = \dfrac{858}{16} = 53.625$
$s_2^2 = \dfrac{\sum x_2^2 - \dfrac{(\sum x_2)^2}{n_2}}{n_2 - 1} = \dfrac{46450 - \dfrac{858^2}{16}}{16 - 1} = \dfrac{439.75}{15} = 29.3167$
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(15 - 1)29.9714 + (16 - 1)29.3167}{15 + 16 - 2} = \dfrac{859.3501}{29} = 29.6328$
a. H0: μ2 − μ1 = 10
Ha: μ2 − μ1 > 10
The test statistic is $t = \dfrac{(\bar{x}_2 - \bar{x}_1) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{(53.625 - 43.6) - 10}{\sqrt{29.6328\left(\dfrac{1}{15} + \dfrac{1}{16}\right)}} = \dfrac{.025}{1.9564} = .013$
The rejection region requires α = .01 in the upper tail of the t-distribution with df = n1 + n2 − 2 = 15 + 16 − 2 = 29. From Table VI, Appendix B, t.01 = 2.462. The rejection region is t > 2.462. Since the test statistic does not fall in the rejection region (t = .013 ≯ 2.462), H0 is not rejected. There is insufficient evidence to conclude μ2 − μ1 > 10 at α = .01.
b. For confidence coefficient .98, α = .02 and α/2 = .01. From Table VI, Appendix B, with df = n1 + n2 − 2 = 15 + 16 − 2 = 29, t.01 = 2.462. The 98% confidence interval for (μ2 − μ1) is:
$(\bar{x}_2 - \bar{x}_1) \pm t_{\alpha/2}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} \Rightarrow (53.625 - 43.6) \pm 2.462\sqrt{29.6328\left(\dfrac{1}{15} + \dfrac{1}{16}\right)} \Rightarrow 10.025 \pm 4.817 \Rightarrow (5.208, 14.842)$
We are 98% confident that the difference between the mean of population 2 and the mean of population 1 is between 5.208 and 14.842.
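The chain of calculations in Exercise 7.10 (sums, sample variances, pooled variance, t statistic, interval half-width) can be traced with the sketch below. The helper name `sample_variance_from_sums` is hypothetical, and the critical value 2.462 is the Table VI figure quoted in the solution.

```python
import math


def sample_variance_from_sums(sum_x: float, sum_x2: float, n: int) -> float:
    """Shortcut formula: (sum of squares - (sum)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


n1, n2 = 15, 16
xbar1, xbar2 = 654 / n1, 858 / n2
var1 = sample_variance_from_sums(654, 28934, n1)
var2 = sample_variance_from_sums(858, 46450, n2)

# Pooled variance and the standard error of (xbar2 - xbar1)
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = ((xbar2 - xbar1) - 10) / se  # H0: mu2 - mu1 = 10
T_CRIT = 2.462                   # t_.01 with 29 df, as quoted from Table VI
half_width = T_CRIT * se         # half-width of the 98% confidence interval

print(f"s1^2 = {var1:.4f}, s2^2 = {var2:.4f}, sp^2 = {sp2:.4f}")
print(f"t = {t:.3f}, half-width = {half_width:.3f}")
```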
7.12
a. Let μ1 = mean carat size of diamonds certified by GIA and μ2 = mean carat size of diamonds certified by HRD. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \Rightarrow (.6723 - .8129) \pm 1.96\sqrt{\dfrac{.2456^2}{151} + \dfrac{.1831^2}{79}} \Rightarrow -.1406 \pm .0563 \Rightarrow (-.1969, -.0843)$
b. We are 95% confident that the difference in mean carat size between diamonds certified by GIA and those certified by HRD is between −.1969 and −.0843.
c. Let μ3 = mean carat size of diamonds certified by IGI. The 95% confidence interval is:
$(\bar{x}_1 - \bar{x}_3) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_3^2}{n_3}} \Rightarrow (.6723 - .3665) \pm 1.96\sqrt{\dfrac{.2456^2}{151} + \dfrac{.2163^2}{78}} \Rightarrow .3058 \pm .0620 \Rightarrow (.2438, .3678)$
d. We are 95% confident that the difference in mean carat size between diamonds certified by GIA and those certified by IGI is between .2438 and .3678.
e. $(\bar{x}_2 - \bar{x}_3) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_2^2}{n_2} + \dfrac{\sigma_3^2}{n_3}} \Rightarrow (.8129 - .3665) \pm 1.96\sqrt{\dfrac{.1831^2}{79} + \dfrac{.2163^2}{78}} \Rightarrow .4464 \pm .0627 \Rightarrow (.3837, .5091)$
f. We are 95% confident that the difference in mean carat size between diamonds certified by HRD and those certified by IGI is between .3837 and .5091.
7.14
a. Let μ1 = mean score for males and μ2 = mean score for females. For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \Rightarrow (39.08 - 38.79) \pm 1.645\sqrt{\dfrac{6.73^2}{127} + \dfrac{6.94^2}{114}} \Rightarrow 0.29 \pm 1.452 \Rightarrow (-1.162, 1.742)$
We are 90% confident that the difference in mean service-rating scores between males and females is between −1.162 and 1.742.
b. Because 0 falls in the 90% confidence interval, there is no evidence of a difference in the mean service-rating scores between males and females.
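The large-sample intervals in Exercises 7.12 and 7.14 all have the same form, so one helper covers them. This is a sketch under stated assumptions: `diff_means_ci` is a hypothetical name, sample standard deviations stand in for the population values (as the solutions do for large samples), and the z multipliers are the table values quoted in the text.

```python
import math


def diff_means_ci(xbar1, s1, n1, xbar2, s2, n2, z_crit):
    """Large-sample confidence interval for mu1 - mu2."""
    diff = xbar1 - xbar2
    half_width = z_crit * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff - half_width, diff + half_width


# (mean, standard deviation, n) for each certification body in Exercise 7.12
gia = (0.6723, 0.2456, 151)
hrd = (0.8129, 0.1831, 79)
igi = (0.3665, 0.2163, 78)

gia_hrd = diff_means_ci(*gia, *hrd, z_crit=1.96)
gia_igi = diff_means_ci(*gia, *igi, z_crit=1.96)
hrd_igi = diff_means_ci(*hrd, *igi, z_crit=1.96)

# Exercise 7.14: males versus females, 90% interval
service = diff_means_ci(39.08, 6.73, 127, 38.79, 6.94, 114, z_crit=1.645)

for name, ci in [("GIA-HRD", gia_hrd), ("GIA-IGI", gia_igi),
                 ("HRD-IGI", hrd_igi), ("7.14", service)]:
    print(f"{name}: ({ci[0]:.4f}, {ci[1]:.4f})")
```

An interval that excludes 0 (all three diamond comparisons) signals a difference in means; the service-rating interval contains 0.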
7.16
a. The descriptive statistics are:
Descriptive Statistics: US, Japan
Variable  N   Mean  Median  TrMean  StDev  SE Mean
US        5  6.562   6.870   6.562  1.217    0.544
Japan     5  3.118   3.220   3.118  1.227    0.549

Variable  Minimum  Maximum     Q1     Q3
US          4.770    8.000  5.415  7.555
Japan       1.920    4.910  1.970  4.215

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(5 - 1)1.217^2 + (5 - 1)1.227^2}{5 + 5 - 2} = 1.4933$
To determine if the mean annual percentage turnover for U.S. plants exceeds that for Japanese plants, we test:
H0: μ1 − μ2 = 0
Ha: μ1 − μ2 > 0
The test statistic is $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{(6.562 - 3.118) - 0}{\sqrt{1.4933\left(\dfrac{1}{5} + \dfrac{1}{5}\right)}} = 4.456$
The rejection region requires α = .05 in the upper tail of the t-distribution with df = n1 + n2 − 2 = 5 + 5 − 2 = 8. From Table VI, Appendix B, t.05 = 1.860. The rejection region is t > 1.860. Since the observed value of the test statistic falls in the rejection region (t = 4.456 > 1.860), H0 is rejected. There is sufficient evidence to indicate the mean annual percentage turnover for U.S. plants exceeds that for Japanese plants at α = .05.
b. The p-value = P(t ≥ 4.456). Using Table VI, Appendix B, with df = n1 + n2 − 2 = 5 + 5 − 2 = 8, .001 < P(t ≥ 4.456) < .005. Since the p-value is so small, there is evidence to reject H0 for any α ≥ .005.
c. The necessary assumptions are:
1. Both sampled populations are approximately normal.
2. The population variances are equal.
3. The samples are randomly and independently sampled.
There is no indication that the populations are not normal. Both sample variances are similar, so there is no evidence the population variances are unequal. There is no indication the assumptions are not valid.
7.18
Let μ1 = the mean relational intimacy score for participants in the CMC group and μ2 = the mean relational intimacy score for participants in the FTF group. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: CMC, FTF
Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
CMC       24   0  3.500    0.159  0.780    2.000  3.000   3.500  4.000    5.000
FTF       24   0  3.542    0.134  0.658    2.000  3.000   4.000  4.000    5.000

Some preliminary calculations are:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(24 - 1).780^2 + (24 - 1).658^2}{24 + 24 - 2} = 0.5207$
To determine if the mean relational intimacy score for participants in the CMC group is lower than the mean relational intimacy score for participants in the FTF group, we test:
H0: μ1 − μ2 = 0
Ha: μ1 − μ2 < 0
The test statistic is $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{(3.500 - 3.542) - 0}{\sqrt{.5207\left(\dfrac{1}{24} + \dfrac{1}{24}\right)}} = \dfrac{-0.042}{.20831} = -.20$
The rejection region requires α = .10 in the lower tail of the t-distribution with df = n1 + n2 − 2 = 24 + 24 − 2 = 46. From Table VI, Appendix B, t.10 ≈ 1.303. The rejection region is t < −1.303. Since the observed value of the test statistic does not fall in the rejection region (t = −.20 ≮ −1.303), H0 is not rejected. There is insufficient evidence to indicate that the mean relational intimacy score for participants in the CMC group is lower than the mean relational intimacy score for participants in the FTF group at α = .10.
7.20
a. The first population is the set of responses for all business students who have access to lecture notes and the second population is the set of responses for all business students not having access to lecture notes.
b. To determine if there is a difference in the mean response of the two groups, we test:
H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0
The test statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(8.48 - 7.80) - 0}{\sqrt{\dfrac{.94}{86} + \dfrac{2.99}{35}}} = 2.19$
The rejection region requires α/2 = .01/2 = .005 in each tail of the z-distribution. From Table IV, Appendix B, z.005 = 2.58. The rejection region is z < −2.58 or z > 2.58. Since the observed value of the test statistic does not fall in the rejection region (z = 2.19 ≯ 2.58), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean response of the two groups at α = .01.
c. For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The confidence interval is:
$(\bar{x}_1 - \bar{x}_2) \pm z_{.005}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \Rightarrow (8.48 - 7.80) \pm 2.58\sqrt{\dfrac{.94}{86} + \dfrac{2.99}{35}} \Rightarrow .68 \pm .801 \Rightarrow (-.121, 1.481)$
We are 99% confident that the difference in the mean response between the two groups is between −.121 and 1.481.
d. A 95% confidence interval would be narrower than the 99% confidence interval. The z value used in the 95% confidence interval is z.025 = 1.96 compared with the z value used in the 99% confidence interval of z.005 = 2.58.
7.22
a. The bacteria counts are probably normally distributed because each count is the median of five measurements from the same specimen.
b.
Let μ1 = mean of the bacteria count for the discharge and μ2 = mean of the bacteria count upstream. Since we want to test if the mean of the bacteria count for the discharge exceeds the mean of the count upstream, we test: H0: μ1 − μ2 = 0 Ha: μ1 − μ2 > 0
c. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Plant, Upstream
Variable  N    Mean  Median  TrMean  StDev  SE Mean
Plant     6   32.10   31.75   32.10   3.19     1.30
Upstream  6  29.617  30.000  29.617  2.355    0.961

Variable  Minimum  Maximum      Q1      Q3
Plant       28.20    36.20   29.40   35.23
Upstream   26.400   32.300  27.075  31.850

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(6 - 1)3.19^2 + (6 - 1)2.355^2}{6 + 6 - 2} = 7.861$
The test statistic is $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{(32.10 - 29.617) - 0}{\sqrt{7.861\left(\dfrac{1}{6} + \dfrac{1}{6}\right)}} = 1.53$
No α level was given, so we will use α = .05. The rejection region requires α = .05 in the upper tail of the t-distribution with df = n1 + n2 − 2 = 6 + 6 − 2 = 10. From Table VI, Appendix B, t.05 = 1.812. The rejection region is t > 1.812. Since the observed value of the test statistic does not fall in the rejection region (t = 1.53 ≯ 1.812), H0 is not rejected. There is insufficient evidence to indicate the mean bacteria count for the discharge exceeds the mean of the count upstream at α = .05.
d. We must assume:
1. The mean counts per specimen for each location are normally distributed.
2. The variances of the 2 distributions are equal.
3. Independent and random samples were selected from each population.
7.24
a. We cannot make inferences about the difference between the mean salaries of male and female accounting/finance/banking professionals because no standard deviations are provided.
b. To determine if the mean salary for males is significantly greater than that for females, we test:
H0: μ1 − μ2 = 0
Ha: μ1 − μ2 > 0
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. To make things easier, we will assume that the standard deviations for the 2 groups are the same. The test statistic is
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(69{,}848 - 52{,}012) - 0}{\sqrt{\sigma^2\left(\dfrac{1}{1400} + \dfrac{1}{1400}\right)}} = \dfrac{17{,}836}{\sigma(.037796)} = \dfrac{471{,}896.2038}{\sigma}$
In order to reject H0 this test statistic must fall in the rejection region, or be greater than 1.645. Solving for σ we get:
$z = \dfrac{471{,}896.2038}{\sigma} > 1.645 \Rightarrow \sigma < \dfrac{471{,}896.2038}{1.645} = 286{,}866.99$
Thus, to reject H0 the common standard deviation of the two groups has to be less than $286,866.99.
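The bound on σ in part b comes from rearranging z > z.05, and the arithmetic can be checked as follows. This is a sketch only: `max_common_sigma` is a hypothetical helper, the difference of 17,836 is the value used in the solution's own computation, and equal group standard deviations are assumed as in the text.

```python
import math


def max_common_sigma(diff: float, n1: int, n2: int, z_crit: float) -> float:
    """Largest common sigma for which diff / (sigma * sqrt(1/n1 + 1/n2)) exceeds z_crit."""
    return diff / (z_crit * math.sqrt(1 / n1 + 1 / n2))


bound = max_common_sigma(17836, 1400, 1400, 1.645)
print(f"H0 is rejected when sigma < {bound:,.2f}")
```

Any common standard deviation below this bound puts the test statistic in the rejection region.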
c. Yes. In fact, reasonable values for the standard deviation will be around $5,000, which is much smaller than the required $286,866.99.
d. These data were collected from voluntary subjects who responded to a Web-based survey. Thus, this is not a random sample, but a self-selected sample. Generally, subjects who respond to surveys tend to have very strong opinions, which may not be the same as the population in general. Thus, the results from this self-selected sample may not reflect the results from the population in general.
7.26
a.
Pair  Difference
1     3
2     2
3     2
4     4
5     0
6     1

$\bar{d} = \dfrac{\sum_{i=1}^{n_d} d_i}{n_d} = \dfrac{12}{6} = 2$
$s_d^2 = \dfrac{\sum_{i=1}^{n_d} d_i^2 - \dfrac{\left(\sum_{i=1}^{n_d} d_i\right)^2}{n_d}}{n_d - 1} = \dfrac{34 - \dfrac{(12)^2}{6}}{5} = 2$
b. μd = μ1 − μ2
c. For confidence coefficient .95, α = .05 and α/2 = .025. From Table VI, Appendix B, with df = nd − 1 = 6 − 1 = 5, t.025 = 2.571. The confidence interval is:
$\bar{d} \pm t_{\alpha/2}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow 2 \pm 2.571\dfrac{\sqrt{2}}{\sqrt{6}} \Rightarrow 2 \pm 1.484 \Rightarrow (.516, 3.484)$
d. H0: μd = 0
Ha: μd ≠ 0
The test statistic is $t = \dfrac{\bar{d}}{s_d/\sqrt{n_d}} = \dfrac{2}{\sqrt{2}/\sqrt{6}} = 3.46$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = nd − 1 = 6 − 1 = 5. From Table VI, Appendix B, t.025 = 2.571. The rejection region is t < −2.571 or t > 2.571. Since the observed value of the test statistic falls in the rejection region (3.46 > 2.571), H0 is rejected. There is sufficient evidence to indicate that the mean difference is different from 0 at α = .05.
7.28
a.
H0: μ1 − μ2 = 0 Ha: μ1 − μ2 < 0 The rejection region requires α = .10 in the lower tail of the z-distribution. From Table IV, Appendix B, z.10 = 1.28. The rejection region is z < −1.28.
b. H0: μ1 − μ2 = 0
Ha: μ1 − μ2 < 0
The test statistic is $z = \dfrac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \dfrac{-3.5 - 0}{\sqrt{21}/\sqrt{38}} = -4.71$
The rejection region is z < −1.28. (Refer to part a.) Since the observed value of the test statistic falls in the rejection region (z = −4.71 < −1.28), H0 is rejected. There is sufficient evidence to indicate μ1 − μ2 < 0 at α = .10.
c. Since the sample size of the number of pairs is greater than 30, we do not need to assume that the population of differences is normal. The sampling distribution of $\bar{d}$ is approximately normal by the Central Limit Theorem. We must assume that the differences are randomly selected.
d. For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:
$\bar{d} \pm z_{.05}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow -3.5 \pm 1.645\dfrac{\sqrt{21}}{\sqrt{38}} \Rightarrow -3.5 \pm 1.223 \Rightarrow (-4.723, -2.277)$
e. The confidence interval provides more information since it gives an interval of possible values for the difference between the population means.
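The paired-difference test statistic and interval in Exercise 7.28 can be reproduced from the summary values alone. The sketch below assumes a hypothetical helper `paired_z_inference` and reads the 21 in the solution as the variance of the differences, which is the reading that reproduces the printed −4.71.

```python
import math


def paired_z_inference(d_bar: float, var_d: float, n_d: int, z_crit: float):
    """Return the z statistic for H0: mu_d = 0 and a confidence interval for mu_d."""
    se = math.sqrt(var_d / n_d)
    z = d_bar / se
    return z, (d_bar - z_crit * se, d_bar + z_crit * se)


# Mean difference -3.5, variance of differences 21, 38 pairs, tabled z_.05 = 1.645
z, ci = paired_z_inference(-3.5, 21, 38, 1.645)
print(f"z = {z:.2f}, 90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval lies entirely below 0, which agrees with rejecting H0 in favor of μ1 − μ2 < 0.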
7.30
a. Let μ1 = the mean salary of technology professionals in 2003 and μ2 = the mean salary of technology professionals in 2005. Let μd = μ1 − μ2. To determine if the mean salary of technology professionals at all U.S. metropolitan areas has increased between 2003 and 2005, we test:
H0: μ1 − μ2 = 0 OR H0: μd = 0
Ha: μ1 − μ2 < 0 OR Ha: μd < 0
b.
Metro Area         2003 Salary     2005 Salary     Difference
                   ($ thousands)   ($ thousands)   (2003 − 2005)
Silicon Valley     87.7            85.9             1.8
New York           78.6            80.3            −1.7
Washington, D.C.   71.4            77.4            −6.0
Los Angeles        70.8            77.1            −6.3
Denver             73.0            77.1            −4.1
Boston             76.3            80.1            −3.8
Atlanta            73.6            73.2             0.4
Chicago            71.1            73.0            −1.9
Philadelphia       69.5            69.8            −0.3
San Diego          69.0            77.1            −8.1
Seattle            71.0            66.9             4.1
Dallas-Ft. Worth   73.0            71.0             2.0
Detroit            62.3            64.1            −1.8

c. $\bar{d} = \dfrac{\sum_{1}^{n_d} d_i}{n_d} = \dfrac{-25.7}{13} = -1.977$
$s_d^2 = \dfrac{\sum_{1}^{n_d} d_i^2 - \dfrac{\left(\sum_{1}^{n_d} d_i\right)^2}{n_d}}{n_d - 1} = \dfrac{206.59 - \dfrac{(-25.7)^2}{13}}{13 - 1} = 12.9819$, $s_d = \sqrt{s_d^2} = \sqrt{12.9819} = 3.603$
d. The test statistic is $t = \dfrac{\bar{d} - \mu_0}{s_d/\sqrt{n_d}} = \dfrac{-1.977 - 0}{3.603/\sqrt{13}} = -1.978$
e. The rejection region requires α = .10 in the lower tail of the t-distribution with df = nd − 1 = 13 − 1 = 12. From Table VI, Appendix B, t.10 = 1.356. The rejection region is t < −1.356.
f. Since the observed value of the test statistic falls in the rejection region (t = −1.978 < −1.356), H0 is rejected. There is sufficient evidence to indicate the mean salary of technology professionals at all U.S. metropolitan areas has increased between 2003 and 2005 at α = .10.
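The mean difference, standard deviation, and test statistic for Exercise 7.30 can be recomputed directly from the salary table. This is a minimal sketch using only Python's standard `statistics` module; the variable names are illustrative.

```python
import math
import statistics

# Salaries in $ thousands, in the same metro-area order as the table in part b.
salary_2003 = [87.7, 78.6, 71.4, 70.8, 73.0, 76.3, 73.6,
               71.1, 69.5, 69.0, 71.0, 73.0, 62.3]
salary_2005 = [85.9, 80.3, 77.4, 77.1, 77.1, 80.1, 73.2,
               73.0, 69.8, 77.1, 66.9, 71.0, 64.1]

# Paired differences, 2003 minus 2005
diffs = [a - b for a, b in zip(salary_2003, salary_2005)]

d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)               # sample standard deviation (n - 1 divisor)
t = d_bar / (s_d / math.sqrt(len(diffs)))   # H0: mu_d = 0

print(f"d-bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.3f}, df = {len(diffs) - 1}")
```

Working from the raw pairs rather than the rounded summaries guards against transcription slips in the difference column.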
g.
In order for the inference to be valid, we must assume that the population of differences is normal and that we have a random sample. Using MINITAB, the histogram of the differences is:
[Figure: Histogram of Diff; horizontal axis: Diff, vertical axis: Frequency]
The graph is fairly mound-shaped although it is somewhat skewed to the right. Since there are only 13 observations, this graph is close enough to being mound-shaped to indicate the normal assumption is reasonable.
7.32
a.
The data should be analyzed as a paired difference experiment because each actor who won an Academy Award was paired with another actor with similar characteristics who did not win the award.
b.
Let μ1 = mean life expectancy of Academy Award winners and μ2 = mean life expectancy of non-Academy Award winners. To compare the mean life expectancies of Academy Award winners and non-winners, we test: H0: μ1 − μ2 = μd = 0 Ha: μd ≠ 0
c.
Since the p-value was so small, there is sufficient evidence to indicate the mean life expectancies of the Academy Award winners and non-winners are different for any value of α > .003. Since the sample mean life expectancy of Academy Award winners is greater than that for non-winners, we can conclude that Academy Award winners have a longer mean life expectancy than non-winners.
7.34
a.
Let μ1 = mean driver chest injury rating and μ2 = mean passenger chest injury rating. Because the data are paired, we are interested in μ1 − μ2 = μd, the difference in mean chest injury ratings between drivers and passengers.
b.
The data were collected as matched pairs and thus, must be analyzed as matched pairs. Two ratings are obtained for each car – the driver’s chest injury rating and the passenger’s chest injury rating.
c.
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: DrivChst, PassChst, diff
Variable   N    Mean  Median  TrMean  StDev  SE Mean
DrivChst  98  49.663  50.000  49.682  6.670    0.674
PassChst  98  50.224  50.500  50.148  7.107    0.718
diff      98  -0.561   0.000  -0.420  5.517    0.557

Variable  Minimum  Maximum      Q1      Q3
DrivChst   34.000   68.000  45.000  54.000
PassChst   35.000   69.000  45.000  55.000
diff      -15.000   13.000  -4.000   3.000

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The 99% confidence interval is:
$\bar{d} \pm z_{.005}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow -0.561 \pm 2.58\dfrac{5.517}{\sqrt{98}} \Rightarrow -0.561 \pm 1.438 \Rightarrow (-1.999, 0.877)$
d. We are 99% confident that the difference between the mean chest injury ratings of drivers and front-seat passengers is between −1.999 and 0.877. Since 0 is in the confidence interval, there is no evidence that the true mean driver chest injury rating exceeds the true mean passenger chest injury rating.
e. Since the sample size is large, the sampling distribution of $\bar{d}$ is approximately normal by the Central Limit Theorem. We must assume that the differences are randomly selected.
7.36
a.
Let μC1 = mean relational intimacy score for the CMC group on the first meeting and μC3 = mean relational intimacy score for the CMC group on the third meeting. Let μCd = difference in mean relational intimacy score between the first and third meetings for the CMC group. To determine if the mean relational intimacy score will increase between the first and third meetings, we test:
H0: μCd = 0
Ha: μCd < 0
b.
The researchers used the paired t-test because the same individuals participated in each of the three meeting sessions. Thus, the samples would not be independent.
c.
Since the p-value is so small (p = .003), H0 would be rejected. There is sufficient evidence to indicate that the mean relational intimacy score for participants in the CMC group increased from the first to the third meeting for any value of α > .003.
d.
Let μF1 = mean relational intimacy score for the FTF group on the first meeting and μF3 = mean relational intimacy score for the FTF group on the third meeting. Let μFd = difference in mean relational intimacy score between the first and third meetings for the FTF group. To determine if the mean relational intimacy score will change between the first and third meetings, we test: H0: μFd = 0 Ha: μFd ≠ 0
e. Since the p-value is not small (p = .39), H0 would not be rejected. There is insufficient evidence to indicate that the mean relational intimacy score for participants in the FTF group changed from the first to the third meeting for any value of α < .39.
7.38
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Method1, Method2, Diff
Variable   N  N*   Mean  SE Mean  StDev  Minimum      Q1  Median     Q3  Maximum
Method1   10   0  13.39     4.18  13.22    1.00    1.30   10.35  24.63    34.40
Method2   10   0  13.10     3.96  12.51    1.40    1.78    9.50  25.05    30.70
Diff      10   0  0.290    0.553  1.750  -2.200  -0.875  -0.150  1.575    3.700
To determine if the mean transition error for method 1 differs from the mean transition error for method 2, we test:
H0: μ1 − μ2 = 0 OR H0: μd = 0
Ha: μ1 − μ2 ≠ 0 OR Ha: μd ≠ 0
The test statistic is
\[ t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n_d}} = \frac{0.290 - 0}{1.750 / \sqrt{10}} = 0.52 \]
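The paired-difference test statistic can be checked directly from the MINITAB summary values for Diff. The following is a minimal sketch, not part of the original solution; the function name `paired_t_statistic` is illustrative, and the critical value 1.833 is simply the table value quoted in the text rather than something computed here.

```python
import math


def paired_t_statistic(mean_diff, sd_diff, n, hypothesized=0.0):
    """Return the paired-difference t statistic from summary statistics."""
    standard_error = sd_diff / math.sqrt(n)
    return (mean_diff - hypothesized) / standard_error


# Summary values for Diff from the MINITAB output (mean, StDev, N).
t = paired_t_statistic(mean_diff=0.290, sd_diff=1.750, n=10)

# t.05 with df = 9, copied from the table value quoted in the text (assumption: two-tailed test at alpha = .10).
t_critical = 1.833
reject = abs(t) > t_critical

print(f"t = {t:.2f}, reject H0: {reject}")
```

Because the statistic is built only from the mean, standard deviation, and count of the differences, the same function applies to any paired comparison for which those three values are reported.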
The rejection region requires α/2 = .10/2 = .05 in each tail of the t-distribution with df = nd – 1 = 10 – 1 = 9. From Table VI, Appendix B, t.05 = 1.833. The rejection region is t < −1.833 or t > 1.833. Since the observed value of the test statistic does not fall in the rejection region (t = 0.52 ≯ 1.833), H0 is not rejected. There is insufficient evidence to indicate the mean transition error for method 1 differs from the mean transition error for method 2 at α = .10.
7.40
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: HMETER, HSTATIC, Diff
Variable   N  N*       Mean   SE Mean     StDev    Minimum         Q1     Median        Q3   Maximum
HMETER    40   0     1.0405   0.00638    0.0403     0.9936     1.0047     1.0232    1.0883    1.1026
HSTATIC   40   0     1.0410   0.00649    0.0410     0.9930     1.0043     1.0237    1.0908    1.1052
Diff      40   0  -0.000523  0.000204  0.001291  -0.004480  -0.001078  -0.000165  0.000317  0.001580
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = nd – 1 = 40 – 1 = 39, t.025 ≈ 2.021. The 95% confidence interval is:
\[ \bar{d} \pm t_{.025} \frac{s_d}{\sqrt{n_d}} \Rightarrow -0.000523 \pm 2.021 \frac{0.001291}{\sqrt{40}} \Rightarrow -0.000523 \pm 0.000413 \Rightarrow (-0.000936,\ -0.000110) \]
We are 95% confident that the true difference in mean density measurements between the two methods is between −0.000936 and −0.000110. Since every value in this interval is smaller in absolute value than the desired maximum difference of .002, the winery should choose the alternative method of measuring wine density.
7.42
a.
The rejection region requires α = .01 in the lower tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z < −2.33.
b.
The rejection region requires α = .025 in the lower tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96.
c.
The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645.
d.
The rejection region requires α = .10 in the lower tail of the z-distribution. From Table IV, Appendix B, z.10 = 1.28. The rejection region is z < −1.28.
7.44
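For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval for p1 − p2 is approximately:
a.
\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.65 - .58) \pm 1.96 \sqrt{\frac{.65(1 - .65)}{400} + \frac{.58(1 - .58)}{400}} \Rightarrow .07 \pm .067 \Rightarrow (.003,\ .137) \]
b.
\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.31 - .25) \pm 1.96 \sqrt{\frac{.31(1 - .31)}{180} + \frac{.25(1 - .25)}{250}} \Rightarrow .06 \pm .086 \Rightarrow (-.026,\ .146) \]
c.
\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.46 - .61) \pm 1.96 \sqrt{\frac{.46(1 - .46)}{100} + \frac{.61(1 - .61)}{120}} \Rightarrow -.15 \pm .131 \Rightarrow (-.281,\ -.019) \]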
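The three large-sample intervals for p1 − p2 in Exercise 7.44 all follow the same arithmetic, so they can be checked with one small helper. This is a sketch under the assumption that the z value (1.96) is supplied from the table; the names `two_proportion_ci` and `cases` are illustrative and do not come from the text.

```python
import math


def two_proportion_ci(p1, n1, p2, n2, z):
    """Large-sample confidence interval for p1 - p2 using unpooled standard error."""
    difference = p1 - p2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * standard_error
    return difference - margin, difference + margin


# Sample proportions and sample sizes for parts a, b, and c of Exercise 7.44.
cases = {
    "a": (0.65, 400, 0.58, 400),
    "b": (0.31, 180, 0.25, 250),
    "c": (0.46, 100, 0.61, 120),
}

for part, (p1, n1, p2, n2) in cases.items():
    low, high = two_proportion_ci(p1, n1, p2, n2, z=1.96)
    print(f"{part}. ({low:.3f}, {high:.3f})")
```

An interval that contains 0 (part b) gives no evidence of a difference between the two proportions, while intervals entirely above or below 0 (parts a and c) do.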
7.46
\[ \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{55(.7) + 65(.6)}{55 + 65} = \frac{77.5}{120} \approx .65 \]
\[ \hat{q} = 1 - \hat{p} = 1 - .65 = .35 \]
H0: p1 − p2 = 0
Ha: p1 − p2 > 0
The test statistic is
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.7 - .6) - 0}{\sqrt{.65(.35)\left(\frac{1}{55} + \frac{1}{65}\right)}} = \frac{.1}{.08739} = 1.14 \]
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic does not fall in the rejection region (z = 1.14 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the proportion from population 1 is greater than that for population 2 at α = .05.
7.48
a.
Let p1 = proportion of men who prefer to keep track of appointments in their head and p2 = proportion of women who prefer to keep track of appointments in their head. To determine if the proportion of men who prefer to keep track of appointments in their head is greater than that of women, we test:
H0: p1 − p2 = 0 Ha: p1 − p2 > 0 b.
\[ \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{500(.56) + 500(.46)}{500 + 500} = .51 \quad \text{and} \quad \hat{q} = 1 - \hat{p} = 1 - .51 = .49 \]
The test statistic is
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.56 - .46) - 0}{\sqrt{.51(.49)\left(\frac{1}{500} + \frac{1}{500}\right)}} = 3.16 \]
c.
The rejection region requires α = .01 in the upper tail of the z distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33.
d.
The p-value is p = P(z ≥ 3.16) ≈ .5 − .5 = 0.
e.
Since the observed value of the test statistic falls in the rejection region (z = 3.16 > 2.33), H0 is rejected. There is sufficient evidence to indicate the proportion of men who prefer to keep track of appointments in their head is greater than that of women at α = .01.
7.50
a.
Let p1 = proportion of customers returning the printed survey and p2 = proportion of customers returning the electronic survey. Some preliminary calculations are:
\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{261}{631} = .414 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{155}{414} = .374 \]
For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:
\[ (\hat{p}_1 - \hat{p}_2) \pm z_{.05} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.414 - .374) \pm 1.645 \sqrt{\frac{.414(.586)}{631} + \frac{.374(.626)}{414}} \Rightarrow .04 \pm .051 \Rightarrow (-.011,\ .091) \]
We are 90% confident that the difference in the response rates for the two types of surveys is between −.011 and .091.
b.
Since the value .05 falls in the 90% confidence interval, it is not an unusual value. Thus, there is no evidence that the difference in response rates is different from .05. The researchers would be able to make this inference.
7.52
a.
Let p1 = proportion of managers and professionals who are male and p2 = proportion of part-time MBA students who are male. To see if the samples are sufficiently large:
\[ \hat{p}_1 \pm 3\sigma_{\hat{p}_1} \Rightarrow \hat{p}_1 \pm 3\sqrt{\frac{p_1 q_1}{n_1}} \Rightarrow \hat{p}_1 \pm 3\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1}} \Rightarrow .95 \pm 3\sqrt{\frac{(.95)(.05)}{162}} \Rightarrow .95 \pm .05 \Rightarrow (.90,\ 1.00) \]
\[ \hat{p}_2 \pm 3\sigma_{\hat{p}_2} \Rightarrow \hat{p}_2 \pm 3\sqrt{\frac{p_2 q_2}{n_2}} \Rightarrow \hat{p}_2 \pm 3\sqrt{\frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow .689 \pm 3\sqrt{\frac{(.689)(.311)}{109}} \Rightarrow .689 \pm .133 \Rightarrow (.556,\ .822) \]
Since both intervals are contained within the interval (0, 1), the normal approximation will be adequate. First, we calculate the overall estimate of the common proportion under H0.
\[ \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{162(.95) + 109(.689)}{162 + 109} = .845 \]
To determine if the population of managers and professionals consists of more males than the part-time MBA population, we test:
H0: p1 = p2
Ha: p1 > p2
The test statistic is
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.95 - .689) - 0}{\sqrt{.845(.155)\left(\frac{1}{162} + \frac{1}{109}\right)}} = 5.82 \]
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 5.82 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the population of managers and professionals consists of more males than the part-time MBA population at α = .05.
b.
We had to assume: 1. Both samples were randomly selected 2. Both sample sizes are sufficiently large.
c.
First, we calculate the overall estimate of the common proportion under H0.
\[ \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{162(.912) + 109(.534)}{162 + 109} = .760 \]
To determine if the population of managers and professionals consists of more married individuals than the part-time MBA population, we test:
H0: p1 = p2
Ha: p1 > p2
The test statistic is
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.912 - .534) - 0}{\sqrt{.760(.240)\left(\frac{1}{162} + \frac{1}{109}\right)}} = 7.14 \]
The rejection region requires α = .01 in the upper tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33. Since the observed value of the test statistic falls in the rejection region (z = 7.14 > 2.33), H0 is rejected. There is sufficient evidence to indicate that the population of managers and professionals consists of more married individuals than the part-time MBA population at α = .01.
d.
We had to assume: 1. Both samples were randomly selected 2. Both sample sizes are sufficiently large.
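The pooled z statistics for the two tests in Exercise 7.52 can be recomputed from the sample proportions and sample sizes. The sketch below is illustrative only: the function name `pooled_two_proportion_z` is an assumption, and the upper-tail p-value is obtained from Python's standard-library normal distribution instead of Table IV.

```python
import math
from statistics import NormalDist


def pooled_two_proportion_z(p1, n1, p2, n2):
    """Return (z, pooled proportion) for testing H0: p1 - p2 = 0."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error, pooled


# Part a: proportion male; part c: proportion married (n1 = 162, n2 = 109).
z_male, pooled_male = pooled_two_proportion_z(0.95, 162, 0.689, 109)
z_married, pooled_married = pooled_two_proportion_z(0.912, 162, 0.534, 109)

# Upper-tail p-values for Ha: p1 > p2.
p_value_male = 1 - NormalDist().cdf(z_male)
p_value_married = 1 - NormalDist().cdf(z_married)

print(f"male:    pooled = {pooled_male:.3f}, z = {z_male:.2f}")
print(f"married: pooled = {pooled_married:.3f}, z = {z_married:.2f}")
```

Both statistics are far beyond the tabled critical values (1.645 and 2.33), which is why the p-values are effectively zero.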
7.54
Let p1 = accuracy rate for modules with correct code and p2 = accuracy rate for modules with defective code. Some preliminary calculations are:
\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{400}{449} = .891 \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{20}{49} = .408 \]
For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58. The 99% confidence interval is:
\[ (\hat{p}_1 - \hat{p}_2) \pm z_{.005} \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.891 - .408) \pm 2.58 \sqrt{\frac{.891(.109)}{449} + \frac{.408(.592)}{49}} \Rightarrow .483 \pm .185 \Rightarrow (.298,\ .668) \]
We are 99% confident that the difference in accuracy rates between modules with correct code and modules with defective code is between .298 and .668. 7.56
a.
Let p = proportion of all children who recognize Joe Camel.
\[ \hat{p} = \frac{x}{n} = \frac{15 + 46}{28 + 55} = .735 \qquad \hat{q} = 1 - \hat{p} = 1 - .735 = .265 \]
To see if the sample is sufficiently large:
\[ \hat{p} \pm 3\sigma_{\hat{p}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{pq}{n}} \Rightarrow \hat{p} \pm 3\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .735 \pm 3\sqrt{\frac{.735(.265)}{83}} \Rightarrow .735 \pm .145 \Rightarrow (.590,\ .880) \]
Since the interval lies within the interval (0, 1), the normal approximation will be adequate. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:
\[ \hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .735 \pm 1.96\sqrt{\frac{.735(.265)}{83}} \Rightarrow .735 \pm .095 \Rightarrow (.640,\ .830) \]
We are 95% confident that the proportion of all children who recognize Joe Camel is between .640 and .830. b.
Let p1 = proportion of children under the age of 6 who recognize Joe Camel and p2 = proportion of children age 6 and over who recognize Joe Camel.
\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{15}{28} = .536 \qquad \hat{q}_1 = 1 - \hat{p}_1 = 1 - .536 = .464 \]
\[ \hat{p}_2 = \frac{x_2}{n_2} = \frac{46}{55} = .836 \qquad \hat{q}_2 = 1 - \hat{p}_2 = 1 - .836 = .164 \]
To see if the samples are sufficiently large:
\[ \hat{p}_1 \pm 3\sigma_{\hat{p}_1} \Rightarrow \hat{p}_1 \pm 3\sqrt{\frac{p_1 q_1}{n_1}} \Rightarrow \hat{p}_1 \pm 3\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1}} \Rightarrow .536 \pm 3\sqrt{\frac{.536(.464)}{28}} \Rightarrow .536 \pm .283 \Rightarrow (.253,\ .819) \]
\[ \hat{p}_2 \pm 3\sigma_{\hat{p}_2} \Rightarrow \hat{p}_2 \pm 3\sqrt{\frac{p_2 q_2}{n_2}} \Rightarrow \hat{p}_2 \pm 3\sqrt{\frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow .836 \pm 3\sqrt{\frac{.836(.164)}{55}} \Rightarrow .836 \pm .150 \Rightarrow (.686,\ .986) \]
Since both intervals lie within the interval (0, 1), the normal approximation will be adequate. To determine if the recognition of Joe Camel increases with age, we test:
H0: p1 − p2 = 0
Ha: p1 − p2 < 0
The test statistic is
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.536 - .836) - 0}{\sqrt{.735(.265)\left(\frac{1}{28} + \frac{1}{55}\right)}} = -2.93 \]
The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645. Since the observed value of the test statistic falls in the rejection region (z = −2.93 < −1.645), H0 is rejected. There is sufficient evidence to indicate that the recognition of Joe Camel increases with age at α = .05. 7.58
a.
For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.
\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.96)^2(15^2 + 17^2)}{3.2^2} = 192.83 \approx 193 \]
b.
If the range of each population is 60, we would estimate σ by: σ ≈ 60/4 = 15
For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table IV, Appendix B, z.005 = 2.58.
\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(2.58)^2(15^2 + 15^2)}{8^2} = 46.80 \approx 47 \]
c.
For confidence coefficient .9, α = 1 − .9 = .1 and α/2 = .1/2 = .05. From Table IV, Appendix B, z.05 = 1.645. For a width of 1, the bound is .5. With σ₁² = 5.8 and σ₂² = 7.5:
\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.645)^2(5.8 + 7.5)}{.5^2} = 143.96 \approx 144 \]
7.60
First, find the sample sizes needed for width 5, or margin of error 2.5. For confidence coefficient .9, α = 1 − .9 = .1 and α/2 = .1/2 = .05. From Table IV, Appendix B, z.05 = 1.645.
\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.645)^2(10^2 + 10^2)}{2.5^2} = 86.59 \approx 87 \]
Thus, the necessary sample size from each population is 87. Therefore, sufficient funds have been allocated to meet the specifications, since n1 = n2 = 100 are large enough samples.
7.62
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96.
\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{1.96^2(3.189^2 + 2.355^2)}{1.5^2} = 26.8 \approx 27 \]
We would need to sample 27 specimens from each location. 7.64
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. Since no information is given about the values of p1 and p2, we will be conservative and use .5 for both. A width of .04 means the bound is .04/2 = .02.
\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1 q_1 + p_2 q_2)}{(ME)^2} = \frac{1.645^2(.5(.5) + .5(.5))}{.02^2} = 3{,}382.5 \approx 3{,}383 \]
7.66
a.
For confidence coefficient .80, α = 1 − .80 = .20 and α/2 = .20/2 = .10. From Table IV, Appendix B, z.10 = 1.28. Since we have no prior information about the proportions, we use p1 = p2 = .5 to get a conservative estimate. For a width of .06, the margin of error is .03.
\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1 q_1 + p_2 q_2)}{(ME)^2} = \frac{(1.28)^2(.5(1 - .5) + .5(1 - .5))}{.03^2} = 910.22 \approx 911 \]
b.
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. Using the formula for the sample size needed to estimate a single proportion from Chapter 5,
\[ n = \frac{(z_{\alpha/2})^2\, pq}{(ME)^2} = \frac{1.645^2(.5(1 - .5))}{.02^2} = \frac{.6765}{.0004} = 1691.27 \approx 1692 \]
No, the sample size from part a is not large enough.
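The sample-size calculations in Exercises 7.58 through 7.66 all apply the same two formulas and then round up. A minimal sketch follows; the function names are illustrative, the z values are the table values quoted in the text, and rounding up with `math.ceil` mirrors the convention used in these solutions.

```python
import math


def n_for_mean_difference(z, var1, var2, margin):
    """Equal sample sizes needed to estimate mu1 - mu2 to within `margin`."""
    return math.ceil(z ** 2 * (var1 + var2) / margin ** 2)


def n_for_proportion_difference(z, p1, p2, margin):
    """Equal sample sizes needed to estimate p1 - p2 to within `margin`."""
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin ** 2)


# Exercise 7.58 a: sigma1 = 15, sigma2 = 17, ME = 3.2, 95% confidence.
print(n_for_mean_difference(1.96, 15 ** 2, 17 ** 2, 3.2))

# Exercise 7.58 b: sigma1 = sigma2 = 15, ME = 8, 99% confidence.
print(n_for_mean_difference(2.58, 15 ** 2, 15 ** 2, 8))

# Exercise 7.64: conservative p1 = p2 = .5, ME = .02, 90% confidence.
print(n_for_proportion_difference(1.645, 0.5, 0.5, 0.02))

# Exercise 7.66 a: conservative p1 = p2 = .5, ME = .03, 80% confidence.
print(n_for_proportion_difference(1.28, 0.5, 0.5, 0.03))
```

Using p = .5 maximizes p(1 − p), so the proportion-based sample sizes are conservative whenever no prior estimate is available.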
7.68
For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .025. From Table IV, Appendix B, z.025 = 1.96.
\[ n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{1.96^2(35^2 + 80^2)}{10^2} = 292.9 \approx 293 \]
7.70
a.
With ν1 = 2 and ν2 = 30, P(F ≥ 5.39) = .01 (Table XI, Appendix B)
b.
With ν1 = 24 and ν2 = 10, P(F ≥ 2.74) = .05 (Table IX, Appendix B) Thus, P(F < 2.74) = 1 − P(F ≥ 2.74) = 1 − .05 = .95.
c.
With ν1 = 7 and ν2 = 1, P(F ≥ 236.8) = .05 (Table IX, Appendix B) Thus, P(F < 236.8) = 1 − P(F ≥ 236.8) = 1 − .05 = .95.
d.
With ν1 = 40 and ν2 = 40, P(F > 2.11) = .01 (Table XI, Appendix B)
7.72
To test H0: σ₁² = σ₂² against Ha: σ₁² ≠ σ₂², the rejection region is F > Fα/2 with ν1 = 10 and ν2 = 12.
a.
α = .20, α/2 = .10 Reject H0 if F > F.10 = 2.19 (Table VIII, Appendix B)
b.
α = .10, α/2 = .05 Reject H0 if F > F.05 = 2.75 (Table IX, Appendix B)
c.
α = .05, α/2 = .025 Reject H0 if F > F.025 = 3.37 (Table X, Appendix B)
d.
α = .02, α/2 = .01 Reject H0 if F > F.01 = 4.30 (Table XI, Appendix B)
7.74
a.
To determine if a difference exists between the population variances, we test:
H0: σ₁² = σ₂²
Ha: σ₁² ≠ σ₂²
The test statistic is
\[ F = \frac{s_2^2}{s_1^2} = \frac{8.75}{3.87} = 2.26 \]
The rejection region requires α/2 = .10/2 = .05 in the upper tail of the F-distribution with ν1 = n2 − 1 = 27 − 1 = 26 and ν2 = n1 − 1 = 12 − 1 = 11. From Table IX, Appendix B, F.05 ≈ 2.60. The rejection region is F > 2.60. Since the observed value of the test statistic does not fall in the rejection region (F = 2.26 ≯ 2.60), H0 is not rejected. There is insufficient evidence to indicate a difference between the population variances.
b.
The p-value is 2P(F ≥ 2.26). From Tables VIII and IX, with ν1 = 26 and ν2 = 11, 2(.05) < 2P(F ≥ 2.26) < 2(.10) ⇒ .10 < 2P(F ≥ 2.26) < .20 There is no evidence to reject H0 for α ≤ .10.
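The F statistic for comparing two variances places the larger sample variance in the numerator, and its degrees of freedom follow from whichever sample supplied each variance. The sketch below reproduces the Exercise 7.74 calculation; the function name `variance_ratio_test` is illustrative, and the critical value 2.60 is the approximate table value quoted in the text, not one computed by the code.

```python
def variance_ratio_test(var1, n1, var2, n2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1


# Exercise 7.74: s1^2 = 3.87 with n1 = 12, s2^2 = 8.75 with n2 = 27.
f_stat, df_num, df_den = variance_ratio_test(3.87, 12, 8.75, 27)

# F.05 with (26, 11) df, approximate table value quoted in the text (alpha = .10, two-tailed).
f_critical = 2.60
reject = f_stat > f_critical

print(f"F = {f_stat:.2f} with df = ({df_num}, {df_den}); reject H0: {reject}")
```

Returning the degrees of freedom alongside F helps avoid looking up the wrong table entry when the second sample has the larger variance.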
7.76
Let σ₁² = variance of carat size for diamonds certified by GIA, σ₂² = variance of carat size for diamonds certified by HRD, and σ₃² = variance of carat size for diamonds certified by IGI.
a.
To determine if the variation in carat size differs for diamonds certified by GIA and diamonds certified by HRD, we test:
H0: σ₁² = σ₂²
Ha: σ₁² ≠ σ₂²
The test statistic is
\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{.2456^2}{.1831^2} = 1.799 \]
The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n1 – 1 = 151 – 1 = 150 and ν2 = n2 – 1 = 79 – 1 = 78. From Table X, Appendix B, F.025 ≈ 1.43. The rejection region is F > 1.43.
Since the observed value of the test statistic falls in the rejection region (F = 1.799 > 1.43), H0 is rejected. There is sufficient evidence to indicate the variation in carat size differs for diamonds certified by GIA and those certified by HRD at α = .05. b.
To determine if the variation in carat size differs for diamonds certified by GIA and diamonds certified by IGI, we test:
H0: σ₁² = σ₃²
Ha: σ₁² ≠ σ₃²
The test statistic is
\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_3^2} = \frac{.2456^2}{.2163^2} = 1.289 \]
The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n1 – 1 = 151 – 1 = 150 and ν2 = n3 – 1 = 78 – 1 = 77. From Table X, Appendix B, F.025 ≈ 1.43. The rejection region is F > 1.43.
Since the observed value of the test statistic does not fall in the rejection region (F = 1.289 ≯ 1.43), H0 is not rejected. There is insufficient evidence to indicate the variation in carat size differs for diamonds certified by GIA and those certified by IGI at α = .05.
c.
To determine if the variation in carat size differs for diamonds certified by HRD and diamonds certified by IGI, we test:
H0: σ₂² = σ₃²
Ha: σ₂² ≠ σ₃²
The test statistic is
\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_3^2}{s_2^2} = \frac{.2163^2}{.1831^2} = 1.396 \]
The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n3 – 1 = 78 – 1 = 77 and ν2 = n2 – 1 = 79 – 1 = 78. From Table X, Appendix B, F.025 ≈ 1.67. The rejection region is F > 1.67.
Since the observed value of the test statistic does not fall in the rejection region (F = 1.396 ≯ 1.67), H0 is not rejected. There is insufficient evidence to indicate the variation in carat size differs for diamonds certified by HRD and those certified by IGI at α = .05.
d.
We will look at the 4 methods for determining if the data are normal. First, we will look at histograms of the data. Using MINITAB, the histograms of the carat sizes for the 3 certification bodies are:
[Histograms of carat size for GIA, HRD, and IGI. Horizontal axis: carat size, 0.0 to 1.0; vertical axis: Percent, 0 to 40.]
From the histograms, none of the data appear to be mound-shaped. It appears that none of the data sets are normal.
Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:
Descriptive Statistics: GIA, IGI, HRD
Variable    N    Mean  Median  TrMean   StDev  SE Mean  Minimum  Maximum      Q1      Q3
GIA       151  0.6723  0.7000  0.6713  0.2456   0.0200   0.3000   1.1000  0.5000  0.9000
IGI        78  0.3665  0.2900  0.3406  0.2163   0.0245   0.1800   1.0100  0.2100  0.4850
HRD        79  0.8129  0.8100  0.8169  0.1831   0.0206   0.5000   1.0900  0.6500  1.0000
For GIA:
x̄ ± s ⇒ .6723 ± .2456 ⇒ (.4267, .9179)
84 of the 151 values fall in this interval. The proportion is .56. This is much smaller than the .68 we would expect if the data were normal.
x̄ ± 2s ⇒ .6723 ± 2(.2456) ⇒ .6723 ± .4912 ⇒ (.1811, 1.1635)
151 of the 151 values fall in this interval. The proportion is 1.00. This is much larger than the .95 we would expect if the data were normal.
x̄ ± 3s ⇒ .6723 ± 3(.2456) ⇒ .6723 ± .7368 ⇒ (−.0645, 1.4091)
151 of the 151 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.
From this method, it appears that the data are not normal.
For IGI:
x̄ ± s ⇒ .3665 ± .2163 ⇒ (.1502, .5828)
69 of the 78 values fall in this interval. The proportion is .88. This is much larger than the .68 we would expect if the data were normal.
x̄ ± 2s ⇒ .3665 ± 2(.2163) ⇒ .3665 ± .4326 ⇒ (−.0661, .7991)
74 of the 78 values fall in this interval. The proportion is .95. This is the same as the .95 we would expect if the data were normal.
x̄ ± 3s ⇒ .3665 ± 3(.2163) ⇒ .3665 ± .6489 ⇒ (−.2824, 1.0154)
78 of the 78 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.
From this method, it appears that the data are not normal.
For HRD:
x̄ ± s ⇒ .8129 ± .1831 ⇒ (.6298, .9960)
30 of the 79 values fall in this interval. The proportion is .38. This is much smaller than the .68 we would expect if the data were normal.
x̄ ± 2s ⇒ .8129 ± 2(.1831) ⇒ .8129 ± .3662 ⇒ (.4467, 1.1791)
79 of the 79 values fall in this interval. The proportion is 1.00. This is much larger than the .95 we would expect if the data were normal.
x̄ ± 3s ⇒ .8129 ± 3(.1831) ⇒ .8129 ± .5493 ⇒ (.2636, 1.3622)
79 of the 79 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.
From this method, it appears that the data are not normal.
Next, we look at the ratio of the IQR to s, using the quartiles Q1 and Q3 from the summary statistics.
For GIA:
IQR = QU – QL = .9 − .5 = .4.
IQR/s = .4/.2456 = 1.63
This is larger than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.
For IGI:
IQR = QU – QL = .485 − .21 = .275.
IQR/s = .275/.2163 = 1.27
This is close to the 1.3 we would expect if the data were normal. This method, by itself, does not indicate that the data are non-normal.
For HRD:
IQR = QU – QL = 1.0 − .65 = .35.
IQR/s = .35/.1831 = 1.91
This is larger than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.
Finally, using MINITAB, the normal probability plot for GIA is:
[Normal probability plot for GIA, ML estimates with 95% CI. Horizontal axis: Data; vertical axis: Percent. ML estimates: Mean = 0.672252, StDev = 0.244757. Goodness of fit: AD* = 3.332.]
Since the data do not form a straight line, the data are not normal. Using MINITAB, the normal probability plot for IGI is:
[Normal probability plot for IGI, ML estimates with 95% CI. Horizontal axis: Data; vertical axis: Percent. ML estimates: Mean = 0.366538, StDev = 0.214863. Goodness of fit: AD* = 5.622.]
Since the data do not form a straight line, the data are not normal.
Using MINITAB, the normal probability plot for HRD is:
[Normal probability plot for HRD, ML estimates with 95% CI. Horizontal axis: Data; vertical axis: Percent. ML estimates: Mean = 0.812911, StDev = 0.181890. Goodness of fit: AD* = 3.539.]
Since the data do not form a straight line, the data are not normal. Taken together, the 4 methods indicate that the carat size data are not normal for any of the certification bodies.
7.78
a.
The amount of variability of GHQ scores tells us how similar or different the members of the group are on GHQ scores. The larger the variability, the larger the differences are among the members on the GHQ scores. The smaller the variability, the smaller the differences are among the members on the GHQ scores.
b.
Let σ₁² = variance of the mental health scores of the employed and σ₂² = variance of the mental health scores of the unemployed. To determine if the variability in mental health scores differs for employed and unemployed workers, we test:
H0: σ₁² = σ₂²
Ha: σ₁² ≠ σ₂²
c.
The test statistic is
\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{5.10^2}{3.26^2} = 2.45 \]
The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n2 − 1 = 49 − 1 = 48 and ν2 = n1 − 1 = 142 − 1 = 141. From Table X, Appendix B, F.025 ≈ 1.61. The rejection region is F > 1.61. Since the observed value of the test statistic falls in the rejection region (F = 2.45 > 1.61), H0 is rejected. There is sufficient evidence to indicate that the variability in mental health scores differs for employed and unemployed workers for α = .05.
d.
We must assume that the 2 populations of mental health scores are normally distributed. We must also assume that we selected 2 independent random samples.
7.80
Let σ₁² = variance of zinc measurements from the text-line, σ₂² = variance of zinc measurements from the witness-line, and σ₃² = variance of zinc measurements from the intersection. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Text-line, Witness-line, Intersection
Variable   N    Mean  Median  TrMean   StDev  SE Mean  Minimum  Maximum      Q1      Q3
Text-lin   3  0.3830  0.3740  0.3830  0.0531   0.0306   0.3350   0.4400  0.3350  0.4400
Witness-   6  0.3042  0.2955  0.3042  0.1015   0.0415   0.1880   0.4390  0.2045  0.4075
Intersec   5  0.3290  0.3190  0.3290  0.0443   0.0198   0.2850   0.3930  0.2900  0.3730
a.
To determine if the variation in the zinc measurements for the text-line and the intersection differ, we test:
H0: σ₁² = σ₃²
Ha: σ₁² ≠ σ₃²
The test statistic is
\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_3^2} = \frac{.0531^2}{.0443^2} = 1.437 \]
The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n1 – 1 = 3 – 1 = 2 and ν2 = n3 – 1 = 5 – 1 = 4. From Table X, Appendix B, F.025 = 10.65. The rejection region is F > 10.65. Since the observed value of the test statistic does not fall in the rejection region (F = 1.437 ≯ 10.65), H0 is not rejected. There is insufficient evidence to indicate the variation in the zinc measurements for the text-line and the intersection differ at α = .05.
b.
To determine if the variation in the zinc measurements for the witness-line and the intersection differ, we test:
H0: σ₂² = σ₃²
Ha: σ₂² ≠ σ₃²
The test statistic is
\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_3^2} = \frac{.1015^2}{.0443^2} = 5.250 \]
The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n2 – 1 = 6 – 1 = 5 and ν2 = n3 – 1 = 5 – 1 = 4. From Table X, Appendix B, F.025 = 9.36. The rejection region is F > 9.36. Since the observed value of the test statistic does not fall in the rejection region (F = 5.250 ≯ 9.36), H0 is not rejected. There is insufficient evidence to indicate the variation in the zinc measurements for the witness-line and the intersection differ at α = .05.
c.
There is no indication that the variances of the zinc measurements for the three locations differ.
d.
With only 3, 6, and 5 measurements, it is very difficult to check the assumptions.
7.82
Using MINITAB, some preliminary calculations are:
Descriptive Statistics: HEATRATE
Variable  ENGINE        N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
HEATRATE  Advanced     21   0   9764      139    639     9105   9252    9669  10060    11588
          Aeroderiv     7   0  12312     1002   2652     8714   9469   12414  14628    16243
          Traditional  39   0  11544      205   1279    10086  10592   11183  11964    14796
a.
To determine if the heat rate variances for traditional and aeroderivative augmented gas turbines differ, we test:
H0: σ₂² = σ₃²
Ha: σ₂² ≠ σ₃²
The test statistic is
\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_3^2} = \frac{2652^2}{1279^2} = 4.299 \]
The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F distribution with numerator df = ν2 = n2 – 1 = 7 – 1 = 6 and denominator df = ν3 = n3 – 1 = 39 – 1 = 38. From Table X, Appendix B, F.025 ≈ 2.74. The rejection region is F > 2.74. Since the observed value of the test statistic falls in the rejection region (F = 4.299 > 2.74), H0 is rejected. There is sufficient evidence to indicate the heat rate variances for traditional and aeroderivative augmented gas turbines differ at α = .05. Since the test in Exercise 7.23 a assumes that the population variances are the same, the validity of the test is suspect since we just found the variances are different. b.
To determine if the heat rate variances for advanced and aeroderivative augmented gas turbines differ, we test:
H0: σ₁² = σ₂²
Ha: σ₁² ≠ σ₂²
The test statistic is
\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{2652^2}{639^2} = 17.224 \]
The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F distribution with numerator df = ν2 = n2 – 1 = 7 – 1 = 6 and denominator df = ν1 = n1 – 1 = 21 – 1 = 20. From Table X, Appendix B, F.025 = 3.13. The rejection region is F > 3.13. Since the observed value of the test statistic falls in the rejection region (F = 17.224 > 3.13), H0 is rejected. There is sufficient evidence to indicate the heat rate variances for advanced and aeroderivative augmented gas turbines differ at α = .05. Since the test in Exercise 7.23 b assumes that the population variances are the same, the validity of the test is suspect since we just found the variances are different.
7.84
a.
The 2 samples are randomly selected in an independent manner from the two populations. The sample sizes, n1 and n2, are large enough so that x̄1 and x̄2 each have approximately normal sampling distributions and so that s₁² and s₂² provide good approximations to σ₁² and σ₂². This will be true if n1 ≥ 30 and n2 ≥ 30.
b.
1. Both sampled populations have relative frequency distributions that are approximately normal.
2. The population variances are equal.
3. The samples are randomly and independently selected from the populations.
c.
1. The relative frequency distribution of the population of differences is normal.
2. The sample of differences is randomly selected from the population of differences.
d.
The two samples are independent random samples from binomial distributions. Both samples should be large enough so that the normal distribution provides an adequate approximation to the sampling distributions of p̂1 and p̂2.
e.
The two samples are independent random samples from populations which are normally distributed.
7.86
a.
H0: σ₁² = σ₂²
Ha: σ₁² ≠ σ₂²
The test statistic is
\[ F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{120.1}{31.3} = 3.84 \]
The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with numerator df ν1 = n2 − 1 = 15 − 1 = 14 and denominator df ν2 = n1 − 1 = 20 − 1 = 19. From Table X, Appendix B, F.025 ≈ 2.66. The rejection region is F > 2.66. Since the observed value of the test statistic falls in the rejection region (F = 3.84 > 2.66), H0 is rejected. There is sufficient evidence to conclude σ₁² ≠ σ₂² at α = .05.
b. No, we should not use a small-sample t test to test H0: (μ1 − μ2) = 0 against Ha: (μ1 − μ2) ≠ 0, because the assumption of equal variances does not seem to hold: we concluded $\sigma_1^2 \neq \sigma_2^2$ in part a.
7.88
Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{110}{200} = .55; \quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{130}{200} = .65; \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{110 + 130}{200 + 200} = \frac{240}{400} = .60$$

a. H0: (p1 − p2) = 0
Ha: (p1 − p2) < 0
The test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.55 - .65) - 0}{\sqrt{.6(1 - .6)\left(\frac{1}{200} + \frac{1}{200}\right)}} = \frac{-.10}{.049} = -2.04$$

The rejection region requires α = .10 in the lower tail of the z-distribution. From Table IV, Appendix B, z.10 = 1.28. The rejection region is z < −1.28.
Since the observed value of the test statistic falls in the rejection region (z = −2.04 < −1.28), H0 is rejected. There is sufficient evidence to conclude (p1 − p2) < 0 at α = .10.
b. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval for (p1 − p2) is approximately:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.55 - .65) \pm 1.96\sqrt{\frac{.55(1 - .55)}{200} + \frac{.65(1 - .65)}{200}} \Rightarrow -.10 \pm .096 \Rightarrow (-.196, -.004)$$

c. From part b, z.025 = 1.96. Using the information from our samples, we can use p1 = .55 and p2 = .65. For a width of .01, the margin of error is .005.

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (p_1 q_1 + p_2 q_2)}{ME^2} = \frac{(1.96)^2 \left(.55(1 - .55) + .65(1 - .65)\right)}{.005^2} = \frac{1.82476}{.000025} = 72990.4 \approx 72{,}991$$
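The three parts of Exercise 7.88 can be reproduced with the sketch below. The function names are hypothetical. The interval uses the exact normal quantile from Python's `statistics.NormalDist` rather than the tabled 1.96, so the last digit may differ slightly from a hand calculation; the sample-size step passes the tabled value explicitly.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_prop_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def two_prop_ci(x1, n1, x2, n2, conf):
    """Large-sample confidence interval for p1 - p2 (unpooled)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    me = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - me, p1 - p2 + me


def equal_n_for_margin(p1, p2, me, z):
    """Common sample size n1 = n2 for a given margin of error, rounded up."""
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / me ** 2)


z = two_prop_z(110, 200, 130, 200)            # part a
lo, hi = two_prop_ci(110, 200, 130, 200, 0.95)  # part b
n = equal_n_for_margin(0.55, 0.65, 0.005, 1.96)  # part c

print(round(z, 2), round(lo, 3), round(hi, 3), n)
```

The test pools the two samples because H0 asserts a common p, while the interval keeps the two sample proportions separate.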
7.90
a.
Let p1 = proportion of Opening Doors students enrolled full time and p2 = proportion of traditional students enrolled full time. The target parameter for this comparison is p1 – p2.
b.
Let μ1 = mean GPA of Opening Doors students and μ2 = mean GPA of traditional students. The target parameter for this comparison is μ1 – μ2.
7.92
Using MINITAB, some preliminary calculations are:

Descriptive Statistics: Spillage

Variable  Cause       N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Spillage  Collision   10  0   76.6   22.3     70.4   31.0     35.0   41.5    102.0  257.0
          Fire        12  0   70.9   17.5     60.7   26.0     32.3   49.0    80.5   239.0
          Grounding   14  0   47.79  7.61     28.47  21.00    30.25  37.50   59.00  124.00
          HullFail    12  0   54.4   16.3     56.4   24.0     29.3   31.5    46.0   221.0
          Unknown     2   0   26.00  1.00     1.41   25.00    *      26.00   *      27.00

a. Let μ1 = mean spillage for accidents caused by collision and μ2 = mean spillage for accidents caused by fire/explosion.

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1)70.4^2 + (12 - 1)60.7^2}{10 + 12 - 2} = 4{,}256.7415$$

For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, with df = n1 + n2 − 2 = 10 + 12 – 2 = 20, t.05 = 1.725. The confidence interval is:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{.05}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (76.6 - 70.9) \pm 1.725\sqrt{4256.7415\left(\frac{1}{10} + \frac{1}{12}\right)} \Rightarrow 5.7 \pm 48.19 \Rightarrow (-42.49, 53.89)$$

b.
Let μ3 = mean spillage for accidents caused by grounding and μ4 = mean spillage for accidents caused by hull failure.

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(14 - 1)28.47^2 + (12 - 1)56.4^2}{14 + 12 - 2} = 1{,}896.9830$$

To determine if the mean spillage amount for accidents caused by grounding is different from the mean spillage amount caused by hull failure, we test:
H0: μ3 − μ4 = 0
Ha: μ3 − μ4 ≠ 0
The test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(47.79 - 54.4) - 0}{\sqrt{1896.983\left(\frac{1}{14} + \frac{1}{12}\right)}} = \frac{-6.61}{17.1342} = -.39$$

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n1 + n2 – 2 = 14 + 12 – 2 = 24. From Table VI, Appendix B, t.025 = 2.064. The rejection region is t < −2.064 or t > 2.064.
Since the observed value of the test statistic does not fall in the rejection region (t = −.39 is not less than −2.064), H0 is not rejected. There is insufficient evidence to indicate the mean spillage amount for accidents caused by grounding is different from the mean spillage amount caused by hull failure at α = .05.
c.
The necessary assumptions are: We must assume that the distributions from which the samples were selected are approximately normal, the samples are independent, and the variances of the two populations are equal. Below are the stem-and-leaf plots for each of the samples:

Stem-and-leaf of Spillage   Cause = Collision   N = 10
Leaf Unit = 10

 (6)   0  333444
  4    0  69
  2    1  2
  1    1
  1    2
  1    2  5

Stem-and-leaf of Spillage   Cause = Fire   N = 12
Leaf Unit = 10

  4    0  2333
 (4)   0  4455
  4    0  7
  3    0  8
  2    1
  2    1  3
  1    1
  1    1
  1    1
  1    2
  1    2  3

Stem-and-leaf of Spillage   Cause = Grounding   N = 14
Leaf Unit = 1.0

  3     2  168
 (5)    3  11678
  6     4  15
  4     5  8
  3     6  2
  2     7
  2     8
  2     9  1
  1    10
  1    11
  1    12  4

Stem-and-leaf of Spillage   Cause = Hull Failure   N = 12
Leaf Unit = 10

 (8)   0  22233333
  4    0  44
  2    0
  2    0
  2    1  0
  1    1
  1    1
  1    1
  1    1
  1    2
  1    2  2
Based on the shapes of the stem-and-leaf plots, it does not appear that the data are normally distributed. Also, we know that if the data are normally distributed, then the Interquartile Range, IQR, divided by the standard deviation should be approximately 1.3. We will compute IQR/s for each of the samples:

Collision:     IQR/s = (102.0 – 35.0) / 70.4 = .95
Fire:          IQR/s = (80.5 – 32.3) / 60.7 = .79
Grounding:     IQR/s = (59.0 – 30.25) / 28.47 = 1.01
Hull Failure:  IQR/s = (46.0 – 29.3) / 56.4 = .29

Since all of these ratios are quite a bit smaller than 1.3, it indicates that none of the samples come from normal distributions. Thus, it appears that the assumption of normal distributions is violated.
The sample standard deviations are:

Collision:     s = 70.4
Fire:          s = 60.7
Grounding:     s = 28.47
Hull Failure:  s = 56.4

Without doing formal tests, it appears that the variances of the groups Collision, Fire, and Hull Failure are probably not significantly different. However, it appears that the variance for the grounding group is smaller than the others.
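The pooled-variance calculations in parts a and b, and the IQR/s screening in part c, can be reproduced from the MINITAB summary statistics. This is a minimal sketch with hypothetical helper names; the t critical value is the tabled t.05 for 20 df quoted in part a, not something the code derives.

```python
from math import sqrt


def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of a common variance from two sample SDs."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def pooled_se(s1, n1, s2, n2):
    """Standard error of the difference in means under equal variances."""
    return sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))


# Part a: 90% interval for collision minus fire.
T_05_DF20 = 1.725  # tabled value from the solution
diff = 76.6 - 70.9
margin = T_05_DF20 * pooled_se(70.4, 10, 60.7, 12)
ci = (diff - margin, diff + margin)

# Part b: t statistic for grounding minus hull failure.
t_stat = (47.79 - 54.4) / pooled_se(28.47, 14, 56.4, 12)

# Part c: IQR / s should be near 1.3 for normal data.
summary = {  # cause: (Q1, Q3, s)
    "Collision": (35.0, 102.0, 70.4),
    "Fire": (32.3, 80.5, 60.7),
    "Grounding": (30.25, 59.0, 28.47),
    "Hull Failure": (29.3, 46.0, 56.4),
}
iqr_over_s = {cause: (q3 - q1) / s for cause, (q1, q3, s) in summary.items()}

print(round(margin, 2), round(t_stat, 2))
print({cause: round(r, 2) for cause, r in iqr_over_s.items()})
```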
d. Let $\sigma_1^2$ = variance of spillage for accidents caused by collision and $\sigma_2^2$ = variance of spillage for accidents caused by grounding. To determine if the variances of the amounts of spillage due to collision and grounding differ, we test:
H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \neq \sigma_2^2$
The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{70.4^2}{28.47^2} = 6.11$$

The rejection region requires α/2 = .02/2 = .01 in the upper tail of the F distribution with numerator df = ν1 = n1 – 1 = 10 – 1 = 9 and denominator df = ν2 = n2 – 1 = 14 – 1 = 13. From Table XI, Appendix B, F.01 = 4.19. The rejection region is F > 4.19.
Since the observed value of the test statistic falls in the rejection region (F = 6.11 > 4.19), H0 is rejected. There is sufficient evidence to indicate the variances of the amounts of spillage due to collision and grounding differ at α = .02.
7.94
a. Let μ1 = mean rating of concern about product tampering for males and μ2 = mean rating of concern about product tampering for females. To determine whether a difference exists in the mean level of concern about product tampering between males and females, we test:
H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0
b. The p-value = .008. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate a difference exists in the mean level of concern about product tampering between males and females for α > .008.
c. We must assume the sample sizes were sufficiently large so that the Central Limit Theorem applies. We must also assume that we selected two random and independent samples from the two populations.
7.96
For confidence coefficient .95, α = .05 and α/2 = .025. From Table IV, Appendix B, z.025 = 1.96.

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (p_1 q_1 + p_2 q_2)}{(ME)^2} = \frac{1.96^2 \left(.395(.605) + .293(.707)\right)}{.03^2} = 1904.26 \approx 1905$$

7.98
a. Let p1999 = proportion of adult Americans who would vote for a woman president in 1999 and p1975 = proportion of adult Americans who would vote for a woman president in 1975.
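b. To see if the samples are sufficiently large:

$$\hat{p}_{1999} \pm 3\sigma_{\hat{p}_{1999}} \Rightarrow \hat{p}_{1999} \pm 3\sqrt{\frac{p_{1999}q_{1999}}{n_{1999}}} \Rightarrow \hat{p}_{1999} \pm 3\sqrt{\frac{\hat{p}_{1999}\hat{q}_{1999}}{n_{1999}}} \Rightarrow .92 \pm 3\sqrt{\frac{.92(.08)}{2000}} \Rightarrow .92 \pm .02 \Rightarrow (.90, .94)$$

$$\hat{p}_{1975} \pm 3\sigma_{\hat{p}_{1975}} \Rightarrow \hat{p}_{1975} \pm 3\sqrt{\frac{p_{1975}q_{1975}}{n_{1975}}} \Rightarrow \hat{p}_{1975} \pm 3\sqrt{\frac{\hat{p}_{1975}\hat{q}_{1975}}{n_{1975}}} \Rightarrow .73 \pm 3\sqrt{\frac{.73(.27)}{2000}} \Rightarrow .73 \pm .03 \Rightarrow (.70, .76)$$

Since both intervals are contained within the interval (0, 1), the normal approximation will be adequate.
c. For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The 90% confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.92 - .73) \pm 1.645\sqrt{\frac{.92(.08)}{2000} + \frac{.73(.27)}{1500}} \Rightarrow .19 \pm .02 \Rightarrow (.17, .21)$$

We are 90% confident that the difference in the proportions of adult Americans who would vote for a woman president between 1999 and 1975 is between .17 and .21.
d. To see if the samples are sufficiently large:

$$\hat{p}_{1999} \pm 3\sigma_{\hat{p}_{1999}} \Rightarrow \hat{p}_{1999} \pm 3\sqrt{\frac{p_{1999}q_{1999}}{n_{1999}}} \Rightarrow \hat{p}_{1999} \pm 3\sqrt{\frac{\hat{p}_{1999}\hat{q}_{1999}}{n_{1999}}} \Rightarrow .92 \pm 3\sqrt{\frac{.92(.08)}{20}} \Rightarrow .92 \pm .18 \Rightarrow (.74, 1.10)$$

$$\hat{p}_{1975} \pm 3\sigma_{\hat{p}_{1975}} \Rightarrow \hat{p}_{1975} \pm 3\sqrt{\frac{p_{1975}q_{1975}}{n_{1975}}} \Rightarrow \hat{p}_{1975} \pm 3\sqrt{\frac{\hat{p}_{1975}\hat{q}_{1975}}{n_{1975}}} \Rightarrow .73 \pm 3\sqrt{\frac{.73(.27)}{50}} \Rightarrow .73 \pm .19 \Rightarrow (.54, .92)$$

Since the first interval is not contained within the interval (0, 1), the normal approximation will not be adequate.
7.100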
a.
For each measure, let μ1 = mean job satisfaction for day-shift nurses and μ2 = mean job satisfaction for night-shift nurses. To determine whether a difference in job satisfaction exists between day-shift and night-shift nurses, we test: H0: μ1 − μ2 = 0 Ha: μ1 − μ2 ≠ 0
b.
Hours of work: The p-value = .813. Since the p-value is so large, there is no evidence to reject H0. There is insufficient evidence to indicate a difference in mean job satisfaction exists between day-shift and night-shift nurses on hours of work for α ≤ .10. Free time: The p-value = .047. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate a difference in mean job satisfaction exists between day-shift and night-shift nurses on free time for α > .047. Breaks: The p-value = .0073. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate a difference in mean job satisfaction exists between day-shift and night-shift nurses on breaks for α > .0073.
c. We must make the following assumptions for each measure:
   1. The job satisfaction scores for both day-shift and night-shift nurses are normally distributed.
   2. The variances of job satisfaction scores for both day-shift and night-shift nurses are equal.
   3. Random and independent samples were selected from both populations of job satisfaction scores.
7.102
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. We estimate p1 = p2 = .5.

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (p_1 q_1 + p_2 q_2)}{(ME)^2} = \frac{(1.645)^2 \left(.5(.5) + .5(.5)\right)}{.05^2} = 541.205 \approx 542$$

7.104
Let p1 = proportion of larvae that died in containers containing high carbon dioxide levels and p2 = proportion of larvae that died in containers containing normal carbon dioxide levels. The parameter of interest for this problem is p1 − p2, or the difference in the death rates for the two groups.
Some preliminary calculations are:

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{.10(80) + .05(80)}{80 + 80} = .075 \qquad \hat{q} = 1 - \hat{p} = 1 - .075 = .925$$

To determine if an increased level of carbon dioxide is effective in killing a higher percentage of leaf-eating larvae, we test:
H0: p1 − p2 = 0
Ha: p1 − p2 > 0
The test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{80} + \frac{1}{80}\right)}} = \frac{(.10 - .05) - 0}{\sqrt{.075(.925)\left(\frac{1}{80} + \frac{1}{80}\right)}} = 1.201$$

The rejection region requires α = .01 in the upper tail of the z distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33.
Since the observed value of the test statistic does not fall in the rejection region (z = 1.201 is not greater than 2.33), H0 is not rejected. There is insufficient evidence to indicate that an increased level of carbon dioxide is effective in killing a higher percentage of leaf-eating larvae at α = .01.
7.106
a. Let p1 = proportion of female students who switched due to loss of interest in SME and p2 = proportion of male students who switched due to lack of interest in SME.
Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{74}{172} = .43; \quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{72}{163} = .44; \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{74 + 72}{172 + 163} = .436$$

To determine if the proportion of female students who switch due to lack of interest in SME differs from the proportion of males who switch due to a lack of interest, we test:
H0: p1 − p2 = 0
Ha: p1 − p2 ≠ 0
The test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.43 - .44) - 0}{\sqrt{.436(.564)\left(\frac{1}{172} + \frac{1}{163}\right)}} = -0.18$$

The rejection region requires α/2 = .10/2 = .05 in each tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z < −1.645 or z > 1.645.
Since the observed value of the test statistic does not fall in the rejection region (z = −0.18 is not less than −1.645), H0 is not rejected. There is insufficient evidence to indicate the proportion of female students who switch due to lack of interest in SME differs from the proportion of males who switch due to a lack of interest in SME at α = .10.
b. Let p1 = proportion of female students who switched due to low grades in SME and p2 = proportion of male students who switched due to low grades in SME.
Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{33}{172} = .19; \quad \hat{p}_2 = \frac{x_2}{n_2} = \frac{44}{163} = .27$$

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table IV, Appendix B, z.05 = 1.645. The confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.19 - .27) \pm 1.645\sqrt{\frac{.19(.81)}{172} + \frac{.27(.73)}{163}} \Rightarrow -.08 \pm .075 \Rightarrow (-.155, -.005)$$
We are 90% confident that the difference between the proportions of female and male switchers who lost confidence due to low grades in SME is between −.155 and −.005. Since the interval does not include 0, there is evidence to indicate the proportion of female switchers due to low grades is less than the proportion of male switchers due to low grades.
7.108
For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The standard deviation can be estimated by dividing the range by 4:

$$\sigma \approx \frac{\text{Range}}{4} = \frac{4}{4} = 1$$

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{1.96^2 (1^2 + 1^2)}{.2^2} = 192.08 \approx 193$$

7.110
Some preliminary calculations are:

$$s_1^2 = \frac{\sum x_1^2 - \frac{\left(\sum x_1\right)^2}{n_1}}{n_1 - 1} = \frac{10{,}251 - \frac{225^2}{5}}{5 - 1} = \frac{126}{4} = 31.5$$

$$s_2^2 = \frac{\sum x_2^2 - \frac{\left(\sum x_2\right)^2}{n_2}}{n_2 - 1} = \frac{10{,}351 - \frac{227^2}{5}}{5 - 1} = \frac{45.2}{4} = 11.3$$

Let $\sigma_1^2$ = variance for instrument A and $\sigma_2^2$ = variance for instrument B. Since we wish to determine if there is a difference in the precision of the two machines, we test:
H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \neq \sigma_2^2$
The test statistic is

$$F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{31.5}{11.3} = 2.79$$

The rejection region requires α/2 = .10/2 = .05 in the upper tail of the F-distribution with ν1 = n1 − 1 = 5 − 1 = 4 and ν2 = n2 − 1 = 5 − 1 = 4. From Table IX, Appendix B, F.05 = 6.39. The rejection region is F > 6.39.
Since the observed value of the test statistic does not fall in the rejection region (F = 2.79 is not greater than 6.39), H0 is not rejected. There is insufficient evidence of a difference in the precision of the two instruments at α = .10.
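The shortcut formula used for the two sample variances in Exercise 7.110 can be sketched as follows. The function name is hypothetical, and the critical value is the tabled F.05 for (4, 4) df quoted in the solution.

```python
def variance_from_sums(sum_x, sum_x_sq, n):
    """Sample variance from the sum and sum of squares of n observations."""
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)


s1_sq = variance_from_sums(225, 10251, 5)  # instrument A
s2_sq = variance_from_sums(227, 10351, 5)  # instrument B

# Larger sample variance goes in the numerator.
F = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)

F_CRIT = 6.39  # tabled F.05 with (4, 4) df, from the solution
reject_h0 = F > F_CRIT

print(round(s1_sq, 1), round(s2_sq, 1), round(F, 2), reject_h0)
```

The shortcut form avoids needing the individual readings, which is why only the two sums appear in the preliminary calculations.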
7.112
a. Let μ1 = mean change in bond prices handled by underwriter 1 and μ2 = mean change in bond prices handled by underwriter 2.

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(27 - 1).0098 + (23 - 1).002465}{27 + 23 - 2} = \frac{.30903}{48} = .006438$$

To determine if there is a difference in the mean change in bond prices handled by the two underwriters, we test:
H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0
The test statistic is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{-.0491 - (-.0307) - 0}{\sqrt{.006438\left(\frac{1}{27} + \frac{1}{23}\right)}} = -.81$$

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n1 + n2 − 2 = 27 + 23 − 2 = 48. From Table VI, Appendix B, t.025 ≈ 1.96. The rejection region is t < −1.96 or t > 1.96.
Since the observed value of the test statistic does not fall in the rejection region (t = −.81 is not less than −1.96), H0 is not rejected. There is insufficient evidence to indicate there is a difference in the mean change in bond prices handled by the two underwriters at α = .05.
b. For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = 48, t.025 ≈ 1.96. The confidence interval is:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{.025}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (-.0491 - (-.0307)) \pm 1.96\sqrt{.006438\left(\frac{1}{27} + \frac{1}{23}\right)} \Rightarrow -.0184 \pm .0446 \Rightarrow (-.063, .0262)$$

We are 95% confident the difference in the mean change in bond prices handled by underwriter 1 and underwriter 2 is somewhere between −.063 and .0262.
7.114
a.
To determine if the mean salary of all males with post-graduate degrees exceeds the mean salary of all females with post-graduate degrees, we test: H0: μM = μF Ha: μM > μF
b. The test statistic is

$$z = \frac{(\bar{x}_M - \bar{x}_F) - 0}{\sqrt{s_{\bar{x}_M}^2 + s_{\bar{x}_F}^2}} = \frac{61{,}340 - 32{,}227}{\sqrt{2{,}185^2 + 932^2}} = 12.26$$
c.
The rejection region requires α = .01 in the upper tail of the z-distribution. From Table IV, Appendix B, z.01 = 2.33. The rejection region is z > 2.33.
d.
Since the observed value of the test statistic falls in the rejection region (z = 12.26 > 2.33), H0 is rejected. There is sufficient evidence to indicate the mean salary of all males with post-graduate degrees exceeds the mean salary of all females with post-graduate degrees at α = .01.
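The large-sample statistic in Exercise 7.114 is built directly from the two reported standard errors, which the sketch below makes explicit. The function name is hypothetical, and the critical value is computed with `statistics.NormalDist` rather than read from Table IV.

```python
from math import sqrt
from statistics import NormalDist


def z_from_standard_errors(mean_1, se_1, mean_2, se_2, d0=0.0):
    """z statistic for a difference in means, given each mean's standard error."""
    return (mean_1 - mean_2 - d0) / sqrt(se_1 ** 2 + se_2 ** 2)


z = z_from_standard_errors(61340, 2185, 32227, 932)

# Upper-tail critical value for alpha = .01.
z_crit = NormalDist().inv_cdf(1 - 0.01)
reject_h0 = z > z_crit

print(round(z, 2), round(z_crit, 2), reject_h0)
```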
The Kentucky Milk Case—Part II (To accompany Chapters 5–7)
(1) Incumbency Rates

I have repeated the incumbency rates for the Tri-county market. If the "normal" incumbency rate is .7 in competitive markets, then we would like to test to see if the incumbency rate in the Tri-county market is larger than .7. We will run a test for each of the years from 1985 through 1988, and also for the four years combined.

Tri-County Market

Year   Number of Districts   Same Vendors   Incumbency Rate
1984   10                    8              .800
1985   12                    12             1.000
1986   13                    13             1.000
1987   13                    12             .923
1988   13                    13             1.000
1989   13                    9              .692
1990   13                    10             .769
1991   13                    11             .846
1985
One of the assumptions necessary for this test is that the sample size is sufficiently large. In order for the sample size to be sufficiently large, the interval $p_0 \pm 3\sigma_{\hat{p}}$ must not contain 0 or 1. For this problem, the interval is:

$$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow .7 \pm 3\sqrt{\frac{.7(.3)}{12}} \Rightarrow .7 \pm .397 \Rightarrow (.303, 1.097)$$

Since 1 is included in the interval, the sample size is not sufficiently large. The following test may not be valid.
To see if the incumbency rate in 1985 exceeds .7, we test:
H0: p = .7
Ha: p > .7
The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{1 - .7}{\sqrt{\frac{.7(.3)}{12}}} = 2.27$$

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.
Since the observed value of the test statistic falls in the rejection region (z = 2.27 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the incumbency rate exceeds .7 in the Tri-county market at α = .05.
1986
In order for the sample size to be sufficiently large, the interval $p_0 \pm 3\sigma_{\hat{p}}$ must not contain 0 or 1. For this problem, the interval is:

$$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow .7 \pm 3\sqrt{\frac{.7(.3)}{13}} \Rightarrow .7 \pm .381 \Rightarrow (.319, 1.081)$$

Since 1 is included in the interval, the sample size is not sufficiently large. The following test may not be valid.
To see if the incumbency rate in 1986 exceeds .7, we test:
H0: p = .7
Ha: p > .7
The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{1 - .7}{\sqrt{\frac{.7(.3)}{13}}} = 2.36$$

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.
Since the observed value of the test statistic falls in the rejection region (z = 2.36 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the incumbency rate exceeds .7 in the Tri-county market at α = .05.
1987
In order for the sample size to be sufficiently large, the interval $p_0 \pm 3\sigma_{\hat{p}}$ must not contain 0 or 1. For this problem, the interval is:

$$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow .7 \pm 3\sqrt{\frac{.7(.3)}{13}} \Rightarrow .7 \pm .381 \Rightarrow (.319, 1.081)$$

Since 1 is included in the interval, the sample size is not sufficiently large. The following test may not be valid.
To see if the incumbency rate in 1987 exceeds .7, we test:
H0: p = .7
Ha: p > .7
The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{.923 - .7}{\sqrt{\frac{.7(.3)}{13}}} = 1.75$$

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.
Since the observed value of the test statistic falls in the rejection region (z = 1.75 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the incumbency rate exceeds .7 in the Tri-county market at α = .05.
1988
This test is the same as the test for 1986.
Combined 1985-1988
To see if the sample size is sufficiently large, the interval $p_0 \pm 3\sigma_{\hat{p}}$ must not contain 0 or 1. For this problem, the interval is:

$$p_0 \pm 3\sigma_{\hat{p}} \Rightarrow .7 \pm 3\sqrt{\frac{.7(.3)}{51}} \Rightarrow .7 \pm .193 \Rightarrow (.507, .893)$$

Since neither 0 nor 1 is included in the interval, the sample size is sufficiently large.

$$\hat{p} = \frac{50}{51} = .980$$

To see if the incumbency rate in 1985–1988 exceeds .7, we test:
H0: p = .7
Ha: p > .7
The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{.980 - .7}{\sqrt{\frac{.7(.3)}{51}}} = 4.36$$

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.
Since the observed value of the test statistic falls in the rejection region (z = 4.36 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the incumbency rate exceeds .7 in the Tri-county market at α = .05.
Thus, there is evidence, based on the incumbency rates, that bid collusion is present in the Tri-county market.
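The incumbency tests above all follow the same two steps: check whether p0 ± 3σ stays inside (0, 1), then compute z. A minimal sketch of that routine is given below. The function names are hypothetical, and the sample proportion is rounded to three decimals only to mirror the hand calculations (for example .923 and .980); using the unrounded fraction can shift the last digit of z.

```python
from math import sqrt

P0 = 0.7  # "normal" incumbency rate under competition


def sample_is_large_enough(p0, n):
    """True when p0 +/- 3 * sigma lies strictly inside (0, 1)."""
    half_width = 3 * sqrt(p0 * (1 - p0) / n)
    return 0 < p0 - half_width and p0 + half_width < 1


def one_prop_z(same_vendors, districts, p0):
    """z statistic for H0: p = p0, with p-hat rounded as in the solution."""
    p_hat = round(same_vendors / districts, 3)
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / districts)


# (same vendors, number of districts) from the Tri-county table
data = {
    "1985": (12, 12),
    "1986": (13, 13),
    "1987": (12, 13),
    "1988": (13, 13),
    "1985-1988": (50, 51),
}

results = {
    label: (round(one_prop_z(x, n, P0), 2), sample_is_large_enough(P0, n))
    for label, (x, n) in data.items()
}

for label, (z, adequate) in results.items():
    print(label, z, "z > 1.645:", z > 1.645, "large enough:", adequate)
```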
(2) Bid Price Dispersion

Again, we can use only the data provided which are the winning bids in each of the school districts in both markets. The sample sizes and the variances for each of the milk products for each year and each market are provided in the table.

Whole White Milk

       Surround Market     Tri-County Market
YR     N     VAR           N     VAR
83     22    0.000212      8     0.000213
84     22    0.000188      9     0.000022
85     26    0.000174      10    0.000028
86     33    0.000120      10    0.000019
87     36    0.000105      12    0.000027
88     36    0.000128      12    0.000024
89     37    0.000056      12    0.000089
90     35    0.000063      12    0.000010
91     5     0.000042      13    0.000020

Lowfat White Milk

       Surround Market     Tri-County Market
YR     N     VAR           N     VAR
83     24    0.000279      10    0.000155
84     26    0.000216      12    0.000040
85     29    0.000210      13    0.000028
86     33    0.000139      13    0.000028
87     35    0.000152      13    0.000049
88     35    0.000165      13    0.000038
89     35    0.000043      13    0.000068
90     34    0.000091      13    0.000025
91     5     0.000051      12    0.000034

Lowfat Chocolate Milk

       Surround Market     Tri-County Market
YR     N     VAR           N     VAR
83     24    0.000287      5     0.000015
84     25    0.000234      6     0.000060
85     28    0.000248      6     0.000038
86     34    0.000163      6     0.000027
87     36    0.000163      7     0.000040
88     36    0.000184      9     0.000087
89     36    0.000060      9     0.000087
90     33    0.000098      10    0.000014
91     5     0.000098      11    0.000042
I will write out the first test and then summarize the others in a table. The first test will be for the year 1983 and will compare the variances of the whole white milk. To determine if the variances in the winning bid prices differ for the two markets, we test:

H0: $\frac{\sigma_1^2}{\sigma_2^2} = 1$
Ha: $\frac{\sigma_1^2}{\sigma_2^2} \neq 1$

The test statistic is

$$F = \frac{\text{larger sample variance}}{\text{smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{.000213}{.000212} = 1.005$$

The rejection region requires α/2 = .05/2 = .025 in the upper tail of the F-distribution with ν1 = n2 − 1 = 8 − 1 = 7 and ν2 = n1 − 1 = 22 − 1 = 21. From Table IX, Appendix B, F.025 = 2.97. The rejection region is F > 2.97.
Since the observed value of the test statistic does not fall in the rejection region (F = 1.005 is not greater than 2.97), H0 is not rejected. There is insufficient evidence to indicate that the variances of the winning bids are different for the two markets.

Whole White Milk

Year   ν1, ν2   F.025   F       Decision
1983   7,21     2.97    1.005   Do not reject
1984   21,8     4.00    8.545   Reject
1985   25,9     3.61    6.214   Reject
1986   32,9     3.56    6.316   Reject
1987   35,11    2.96    3.889   Reject
1988   35,11    2.96    5.333   Reject
1989   11,36    2.51    1.589   Do not reject
1990   34,11    3.12    6.300   Reject
1991   4,12     4.12    2.100   Do not reject
In all cases where there was a significant difference in the variances of the winning bids between the two markets, the variance in the Surrounding market was larger than the variance in the Tri-county market. This implies that collusion might be present in the Tri-county market.

Lowfat White Milk

Year   ν1, ν2   F.025   F       Decision
1983   23,9     3.62    1.800   Do not reject
1984   25,11    3.17    5.400   Reject
1985   28,12    3.02    7.500   Reject
1986   32,12    2.96    4.964   Reject
1987   34,12    2.96    3.102   Reject
1988   34,12    2.96    4.342   Reject
1989   12,34    2.41    1.581   Do not reject
1990   33,12    2.96    3.640   Reject
1991   4,11     4.28    1.500   Do not reject
Again, in all cases where there was a significant difference in the variances of the winning bids between the two markets, the variance in the Surrounding market was larger than the variance in the Tri-county market. This implies that collusion might be present in the Tri-county market.

Lowfat Chocolate Milk

Year   ν1, ν2   F.025   F        Decision
1983   23,4     8.56    19.133   Reject
1984   24,5     6.28    3.900    Do not reject
1985   27,5     6.28    6.526    Reject
1986   33,5     6.23    6.037    Do not reject
1987   35,6     5.07    4.075    Do not reject
1988   35,8     3.89    10.222   Reject
1989   8,35     2.65    1.450    Do not reject
1990   32,9     3.56    7.000    Reject
1991   4,10     4.47    2.333    Do not reject
Again, in all cases where there was a significant difference in the variances of the winning bids between the two markets, the variance in the Surrounding market was larger than the variance in the Tri-county market. This implies that collusion might be present in the Tri-county market. Based on the analysis of the three milk products, there appears to be collusion in the Tri-county market.
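The summary tables in this section repeat one calculation per year, so they lend themselves to a loop. The sketch below reproduces the Whole White Milk table from the sample sizes and variances given earlier. The data layout and the function name are assumptions made for illustration, and the critical values are the tabled F.025 entries copied from the summary table rather than computed.

```python
# (year, n surround, var surround, n tri-county, var tri-county, tabled F.025)
WHOLE_WHITE = [
    (1983, 22, 0.000212, 8, 0.000213, 2.97),
    (1984, 22, 0.000188, 9, 0.000022, 4.00),
    (1985, 26, 0.000174, 10, 0.000028, 3.61),
    (1986, 33, 0.000120, 10, 0.000019, 3.56),
    (1987, 36, 0.000105, 12, 0.000027, 2.96),
    (1988, 36, 0.000128, 12, 0.000024, 2.96),
    (1989, 37, 0.000056, 12, 0.000089, 2.51),
    (1990, 35, 0.000063, 12, 0.000010, 3.12),
    (1991, 5, 0.000042, 13, 0.000020, 4.12),
]


def variance_test(n_sur, var_sur, n_tri, var_tri, f_crit):
    """Return (df_num, df_den, F, reject) with the larger variance on top."""
    if var_sur >= var_tri:
        f_stat, dfs = var_sur / var_tri, (n_sur - 1, n_tri - 1)
    else:
        f_stat, dfs = var_tri / var_sur, (n_tri - 1, n_sur - 1)
    return dfs[0], dfs[1], f_stat, f_stat > f_crit


rows = {year: variance_test(*rest) for year, *rest in WHOLE_WHITE}

for year, (df1, df2, f_stat, reject) in rows.items():
    decision = "Reject" if reject else "Do not reject"
    print(year, f"{df1},{df2}", f"{f_stat:.3f}", decision)
```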
(3) Average Winning Bid Price

I have provided the SAS output for computing the t-tests to compare the mean winning bid prices between the two markets for each of the years and each of the milk products. I will discuss the findings for each milk product separately. For t-tests, we must assume that the two population variances are the same. If the population variances are not the same, there is an approximate test that takes into consideration the different variances. The SAS printout provided allows for the test of equal variances first. I used a p-value of .25 as the cutoff point. If the p-value was less than or equal to .25 for the F-test, I assumed that the variances were different and used the approximate test designated as UNEQUAL. If the p-value for the F-test was greater than .25, I assumed that the population variances were the same and used the test designated as EQUAL.

Whole White Milk:

Variable: Whole White Milk - 1983

MARKET   N    Mean     Std Dev      Std Error    Variances   T         DF     Prob>|T|
---------------------------------------------------------------------------------------
SUR      22   0.1318   0.01458844   0.00311027   Unequal     2.4045    12.4   0.0326
TRI      8    0.1173   0.01462038   0.00516909   Equal       2.4071    28.0   0.0229*

For H0: Variances are equal, F' = 1.00   DF = (7,21)   Prob>F' = 0.9116
************************************************************************
Variable: Whole White Milk - 1984

MARKET   N    Mean     Std Dev      Std Error    Variances   T         DF     Prob>|T|
---------------------------------------------------------------------------------------
SUR      22   0.1309   0.01374189   0.00292978   Unequal     -2.3904   28.6   0.0236*
TRI      9    0.1389   0.00474871   0.00158290   Equal       -1.6825   29.0   0.1032

For H0: Variances are equal, F' = 8.37   DF = (21,8)   Prob>F' = 0.0044
************************************************************************
Variable: Whole White Milk - 1985

MARKET   N    Mean     Std Dev      Std Error    Variances   T         DF     Prob>|T|
---------------------------------------------------------------------------------------
SUR      26   0.1279   0.01321810   0.00259228   Unequal     -4.3968   33.8   0.0001*
TRI      10   0.1415   0.00534266   0.00168950   Equal       -3.1348   34.0   0.0035

For H0: Variances are equal, F' = 6.12   DF = (25,9)   Prob>F' = 0.0077
************************************************************************
Variable: Whole White Milk - 1986

MARKET   N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
----------------------------------------------------------------------------------------
SUR      33   0.1253   0.01098665   0.00191253   Unequal     -8.1534    37.3   0.0001*
TRI      10   0.1446   0.00442846   0.00140040   Equal       -5.3943    41.0   0.0000

For H0: Variances are equal, F' = 6.15   DF = (32,9)   Prob>F' = 0.0070
************************************************************************
Variable: Whole White Milk - 1987

MARKET   N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
----------------------------------------------------------------------------------------
SUR      36   0.1264   0.01026078   0.00171013   Unequal     -10.0785   37.5   0.0001*
TRI      12   0.1495   0.00527196   0.00152188   Equal       -7.4313    46.0   0.0000

For H0: Variances are equal, F' = 3.79   DF = (35,11)   Prob>F' = 0.0224
************************************************************************
Variable: Whole White Milk - 1988

MARKET   N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
----------------------------------------------------------------------------------------
SUR      36   0.1277   0.01135449   0.00189242   Unequal     -9.9271    42.2   0.0001*
TRI      12   0.1513   0.00499090   0.00144075   Equal       -6.9441    46.0   0.0000

For H0: Variances are equal, F' = 5.18   DF = (35,11)   Prob>F' = 0.0060
************************************************************************
Variable: Whole White Milk - 1989

MARKET   N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
----------------------------------------------------------------------------------------
SUR      37   0.1299   0.00752173   0.00123657   Unequal     -0.4890    15.8   0.6316
TRI      12   0.1314   0.00944991   0.00272795   Equal       -0.5501    47.0   0.5849NS

For H0: Variances are equal, F' = 1.58   DF = (11,36)   Prob>F' = 0.2947
************************************************************************
Variable: Whole White Milk - 1990

MARKET   N    Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
----------------------------------------------------------------------------------------
SUR      35   0.1609   0.00794659   0.00134322   Unequal     -1.1177    43.7   0.2698NS
TRI      12   0.1628   0.00317904   0.00091771   Equal       -0.7673    45.0   0.4469

For H0: Variances are equal, F' = 6.25   DF = (34,11)   Prob>F' = 0.0026
************************************************************************
Variable: Whole White Milk - 1991
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      5   0.1452   0.00652012   0.00291589   Unequal      1.2585    5.6    0.2585
TRI      13  0.1412   0.00458169   0.00127073   Equal        1.4813    16.0   0.1580NS
For H0: Variances are equal, F' = 2.03   DF = (4,12)   Prob>F' = 0.3095
************************************************************************
The mean winning bid prices were significantly different between the markets for all years except 1989, 1990, and 1991. In 1983, the mean winning bid for the Surrounding market was significantly larger than that for the Tri-county market. For the years 1984–1988, the mean winning bid price for the Tri-county market was significantly larger than that for the Surrounding market. This implies evidence of collusion for the years 1984–1988.
Lowfat White Milk:
Variable: Lowfat White Milk - 1983
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      24  0.1243   0.01672220   0.00341341   Unequal      2.5085    22.6   0.0198
TRI      10  0.1112   0.01246237   0.00394095   Equal        2.2214    32.0   0.0335*
For H0: Variances are equal, F' = 1.80   DF = (23,9)   Prob>F' = 0.3627
************************************************************************
Variable: Lowfat White Milk - 1984
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      26  0.1236   0.01469859   0.00288263   Unequal     -3.0061    36.0   0.0048*
TRI      12  0.1338   0.00635717   0.00183516   Equal       -2.3099    36.0   0.0267
For H0: Variances are equal, F' = 5.35   DF = (25,11)   Prob>F' = 0.0059
************************************************************************
Variable: Lowfat White Milk - 1985
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      29  0.1200   0.01452245   0.00269675   Unequal     -5.3857    39.2   0.0001*
TRI      13  0.1366   0.00537445   0.00149061   Equal       -3.9769    40.0   0.0003
For H0: Variances are equal, F' = 7.30   DF = (28,12)   Prob>F' = 0.0008
************************************************************************
Variable: Lowfat White Milk - 1986
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      33  0.1178   0.01180640   0.00205523   Unequal     -8.4010    43.0   0.0001*
TRI      13  0.1391   0.00533205   0.00147884   Equal       -6.2183    44.0   0.0000
For H0: Variances are equal, F' = 4.90   DF = (32,12)   Prob>F' = 0.0055
************************************************************************
Variable: Lowfat White Milk - 1987
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      35  0.1173   0.01235100   0.00208770   Unequal     -8.7991    37.8   0.0001*
TRI      13  0.1424   0.00701738   0.00194627   Equal       -6.8995    46.0   0.0000
For H0: Variances are equal, F' = 3.10   DF = (34,12)   Prob>F' = 0.0404
************************************************************************
Variable: Lowfat White Milk - 1988
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      35  0.1182   0.01285522   0.00217293   Unequal     -9.6219    42.7   0.0001*
TRI      13  0.1448   0.00618019   0.00171408   Equal       -7.1332    46.0   0.0000
For H0: Variances are equal, F' = 4.33   DF = (34,12)   Prob>F' = 0.0095
************************************************************************
Variable: Lowfat White Milk - 1989
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      35  0.1187   0.00655938   0.00110874   Unequal     -2.1005    17.9   0.0501
TRI      13  0.1240   0.00828350   0.00229743   Equal       -2.3400    46.0   0.0237*
For H0: Variances are equal, F' = 1.59   DF = (12,34)   Prob>F' = 0.2798
************************************************************************
Variable: Lowfat White Milk - 1990
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      34  0.1519   0.00954524   0.00163700   Unequal     -2.3772    39.8   0.0223*
TRI      13  0.1570   0.00508486   0.00141029   Equal       -1.8347    45.0   0.0732
For H0: Variances are equal, F' = 3.52   DF = (33,12)   Prob>F' = 0.0238
************************************************************************
Variable: Lowfat White Milk - 1991
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      5   0.1364   0.00718485   0.00321316   Unequal      0.2745    6.3    0.7925
TRI      12  0.1354   0.00585768   0.00169097   Equal        0.3001    15.0   0.7682NS
For H0: Variances are equal, F' = 1.50   DF = (4,11)   Prob>F' = 0.5343
************************************************************************
The mean winning bid prices were significantly different between the markets for all years except 1991. In 1983, the mean winning bid for the Surrounding market was significantly larger than that for the Tri-county market. For the years 1984–1990, the mean winning bid price for the Tri-county market was significantly larger than that for the Surrounding market. This implies evidence of collusion for the years 1984–1990.
Lowfat Chocolate Milk:
Variable: Lowfat Chocolate Milk - 1983
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      24  0.1267   0.01696642   0.00346326   Unequal      5.3313    26.3   0.0001*
TRI      5   0.1060   0.00394740   0.00176533   Equal        2.6795    27.0   0.0124
For H0: Variances are equal, F' = 18.47   DF = (23,4)   Prob>F' = 0.0117
************************************************************************
Variable: Lowfat Chocolate Milk - 1984
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      25  0.1251   0.01530156   0.00306031   Unequal     -2.1693    15.7   0.0457*
TRI      6   0.1347   0.00778522   0.00317830   Equal       -1.4733    29.0   0.1514
For H0: Variances are equal, F' = 3.86   DF = (24,5)   Prob>F' = 0.1379
************************************************************************
Variable: Lowfat Chocolate Milk - 1985
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      28  0.1206   0.01575587   0.00297758   Unequal     -4.6215    20.9   0.0001*
TRI      6   0.1387   0.00621914   0.00253895   Equal       -2.7384    32.0   0.0100
For H0: Variances are equal, F' = 6.42   DF = (27,5)   Prob>F' = 0.0472
************************************************************************
Variable: Lowfat Chocolate Milk - 1986
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      34  0.1169   0.01279357   0.00219408   Unequal     -8.0140    18.2   0.0001*
TRI      6   0.1414   0.00521130   0.00212751   Equal       -4.5821    38.0   0.0000
For H0: Variances are equal, F' = 6.03   DF = (33,5)   Prob>F' = 0.0533
************************************************************************
Variable: Lowfat Chocolate Milk - 1987
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      36  0.1184   0.01280507   0.00213418   Unequal     -7.8853    17.5   0.0001*
TRI      7   0.1436   0.00632926   0.00239224   Equal       -5.0675    41.0   0.0000
For H0: Variances are equal, F' = 4.09   DF = (35,6)   Prob>F' = 0.0832
************************************************************************
Variable: Lowfat Chocolate Milk - 1988
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      36  0.1192   0.01359999   0.00226666   Unequal     -10.3636   40.6   0.0001*
TRI      9   0.1470   0.00425532   0.00141844   Equal       -5.9934    43.0   0.0000
For H0: Variances are equal, F' = 10.21   DF = (35,8)   Prob>F' = 0.0019
************************************************************************
Variable: Lowfat Chocolate Milk - 1989
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      36  0.1200   0.00776605   0.00129434   Unequal     -1.7178    10.9   0.1140
TRI      9   0.1258   0.00932923   0.00310974   Equal       -1.9216    43.0   0.0613NS
For H0: Variances are equal, F' = 1.44   DF = (8,35)   Prob>F' = 0.4274
************************************************************************
Variable: Lowfat Chocolate Milk - 1990
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      33  0.1531   0.00993298   0.00172911   Unequal     -3.9472    38.3   0.0003*
TRI      10  0.1614   0.00383030   0.00121125   Equal       -2.5773    41.0   0.0137
For H0: Variances are equal, F' = 6.73   DF = (32,9)   Prob>F' = 0.0050
************************************************************************
Variable: Lowfat Chocolate Milk - 1991
MARKET   N   Mean     Std Dev      Std Error    Variances   T          DF     Prob>|T|
--------------------------------------------------------------------------------------
SUR      5   0.1402   0.00991020   0.00443197   Unequal     -0.4431    5.6    0.6743
TRI      11  0.1423   0.00650294   0.00196071   Equal       -0.5216    14.0   0.6101NS
For H0: Variances are equal, F' = 2.32   DF = (4,10)   Prob>F' = 0.2552
************************************************************************
The mean winning bid prices were significantly different between the markets for all years except 1989 and 1991. In 1983, the mean winning bid for the Surrounding market was significantly larger than that for the Tri-county market. For the years 1984–1988 and 1990, the mean winning bid price for the Tri-county market was significantly larger than that for the Surrounding market. This implies evidence of collusion for the years 1984–1988 and 1990.
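The T and DF columns in the printouts above can be approximately reproduced from the N, Mean, and Std Dev columns. The following is a minimal sketch, not the SAS procedure itself: the function name two_sample_t is hypothetical, and because the printed means are rounded to four decimals the results only agree with the printout to about two decimals. It uses the Lowfat Chocolate Milk - 1990 summary statistics.

```python
import math

def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t_unequal, df_unequal, t_equal, df_equal) from summary statistics."""
    # Unequal-variance (Satterthwaite) statistic and approximate df
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t_unequal = (mean1 - mean2) / math.sqrt(v1 + v2)
    df_unequal = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Equal-variance (pooled) statistic and df
    df_equal = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_equal
    t_equal = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t_unequal, df_unequal, t_equal, df_equal

# SUR and TRI rows of the Lowfat Chocolate Milk - 1990 printout
t_u, df_u, t_e, df_e = two_sample_t(33, 0.1531, 0.00993298, 10, 0.1614, 0.00383030)
print(round(t_u, 2), round(df_u, 1), round(t_e, 2), df_e)
```

A negative T value means the Tri-county sample mean is the larger of the two, since the difference is taken as SUR minus TRI.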
Chapter 8
Design of Experiments and Analysis of Variance
8.2
The treatments are the combinations of levels of each of the two factors. There are 2 × 5 = 10 treatments. They are: (A, 50), (A, 60), (A, 70), (A, 80), (A, 90), (B, 50), (B, 60), (B, 70), (B, 80), (B, 90)
8.4
a.
College GPAs are measured on college students. The experimental units are college students.
b.
Household income is measured on households. The experimental units are households.
c.
Gasoline mileage is measured on automobiles. The experimental units are the automobiles of a particular model.
d.
The experimental units are the sectors on a computer diskette.
e.
The experimental units are the states.
8.6
a.
The response variable is the amount of the purchase.
b.
There is one factor in this problem: type of credit card.
c.
There are 4 treatments, corresponding to the 4 levels of the factor. The treatments are VISA, MasterCard, American Express, and Discover.
d.
The experimental units are the credit card holders.
8.8
a.
The response variable in this problem is the consumer’s opinion on the value of the discount offer.
b.
There are two treatments in this problem: within-store price promotion and between-store price promotion.
c.
The experimental units are the consumers.
8.10
a.
There are 2 factors in the problem: Type of yeast and Temperature. Type of yeast has 2 levels – brewer’s yeast and baker’s yeast. Temperature has 4 levels – 45°, 48°, 51°, and 54°C.
b.
The response variable is the autolysis yield.
c.
There are a total of 2 × 4 = 8 treatments in this experiment. The treatments are all the type of yeast-temperature combinations.
d.
This is a designed experiment.
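In a two-factor experiment such as this one, the treatments are the Cartesian product of the factor levels. A short sketch that enumerates them (the variable names are illustrative only):

```python
from itertools import product

yeast_types = ["brewer's yeast", "baker's yeast"]   # factor 1: 2 levels
temperatures_c = [45, 48, 51, 54]                   # factor 2: 4 levels (degrees C)

# Each treatment is one (yeast type, temperature) combination
treatments = list(product(yeast_types, temperatures_c))
for yeast, temp in treatments:
    print(f"{yeast} at {temp} degrees C")
print(len(treatments), "treatments")
```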
8.12
a.
The response is the undergraduate student's evaluation of the ethical behavior of the salesperson.
b.
There are two factors: type of sales job at two levels (high tech. vs. low tech.) and sales task at two levels (new account development vs. account maintenance).
c.
The treatments are the 2 × 2 = 4 combinations of type of sales job and sales task.
d.
The experimental units are the college students.
8.14
a.
From Table IX with ν1 = 4 and ν2 = 4, F.05 = 6.39.
b.
From Table XI with ν1 = 4 and ν2 = 4, F.01 = 15.98.
c.
From Table VIII with ν1 = 30 and ν2 = 40, F.10 = 1.54.
d.
From Table X with ν1 = 15 and ν2 = 12, F.025 = 3.18.
8.16
a.
In dot diagram #2, the difference between the sample means is small relative to the variability within the sample observations. In dot diagram #1, the values in each sample are grouped together with a range of 4, while in dot diagram #2 the range of values in each sample is 8.
b.
For diagram #1,
$\bar{x}_1 = \frac{\sum x_1}{n} = \frac{7 + 8 + 9 + 9 + 10 + 11}{6} = \frac{54}{6} = 9$
$\bar{x}_2 = \frac{\sum x_2}{n} = \frac{12 + 13 + 14 + 14 + 15 + 16}{6} = \frac{84}{6} = 14$
For diagram #2,
$\bar{x}_1 = \frac{\sum x_1}{n} = \frac{5 + 5 + 7 + 11 + 13 + 13}{6} = \frac{54}{6} = 9$
$\bar{x}_2 = \frac{\sum x_2}{n} = \frac{10 + 10 + 12 + 16 + 18 + 18}{6} = \frac{84}{6} = 14$
c.
For diagram #1,
$SST = \sum_{i=1}^{2} n_i(\bar{x}_i - \bar{x})^2 = 6(9 - 11.5)^2 + 6(14 - 11.5)^2 = 75$
$\left(\bar{x} = \frac{\sum x}{n} = \frac{54 + 84}{12} = 11.5\right)$
For diagram #2,
$SST = \sum_{i=1}^{2} n_i(\bar{x}_i - \bar{x})^2 = 6(9 - 11.5)^2 + 6(14 - 11.5)^2 = 75$
d.
For diagram #1,
$s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \frac{496 - \frac{54^2}{6}}{6 - 1} = 2$
$s_2^2 = \frac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = \frac{1186 - \frac{84^2}{6}}{6 - 1} = 2$
$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 = (6 - 1)2 + (6 - 1)2 = 20$
For diagram #2,
$s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = \frac{558 - \frac{54^2}{6}}{6 - 1} = 14.4$
$s_2^2 = \frac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = \frac{1248 - \frac{84^2}{6}}{6 - 1} = 14.4$
$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 = (6 - 1)14.4 + (6 - 1)14.4 = 144$
e.
For diagram #1, SS(Total) = SST + SSE = 75 + 20 = 95
SST is $\frac{SST}{SS(Total)} \times 100\% = \frac{75}{95} \times 100\% = 78.95\%$ of SS(Total)
For diagram #2, SS(Total) = SST + SSE = 75 + 144 = 219
SST is $\frac{SST}{SS(Total)} \times 100\% = \frac{75}{219} \times 100\% = 34.25\%$ of SS(Total)
f.
For diagram #1,
$MST = \frac{SST}{k - 1} = \frac{75}{2 - 1} = 75$, $MSE = \frac{SSE}{n - k} = \frac{20}{12 - 2} = 2$, $F = \frac{MST}{MSE} = \frac{75}{2} = 37.5$
For diagram #2,
$MST = \frac{SST}{k - 1} = \frac{75}{2 - 1} = 75$, $MSE = \frac{SSE}{n - k} = \frac{144}{12 - 2} = 14.4$, $F = \frac{MST}{MSE} = \frac{75}{14.4} = 5.21$
g.
The rejection region for both diagrams requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 2 − 1 = 1 and ν2 = n − k = 12 − 2 = 10. From Table IX, Appendix B, F.05 = 4.96. The rejection region is F > 4.96. For diagram #1, the observed value of the test statistic falls in the rejection region (F = 37.5 > 4.96). Thus, H0 is rejected. There is sufficient evidence to indicate the samples were drawn from populations with different means at α = .05. For diagram #2, the observed value of the test statistic falls in the rejection region (F = 5.21 > 4.96). Thus, H0 is rejected. There is sufficient evidence to indicate the samples were drawn from populations with different means at α = .05.
h.
We must assume both populations are normally distributed with a common variance.
8.18
Refer to Exercise 8.16. The ANOVA tables are:
For diagram #1:
Source       df    SS     MS      F
Treatment     1    75     75      37.5
Error        10    20      2
Total        11    95
For diagram #2:
Source       df    SS     MS      F
Treatment     1    75     75      5.21
Error        10   144     14.4
Total        11   219
8.20
a.
df for Error is 41 − 6 = 35
SSE = SS(Total) − SST = 46.5 − 17.5 = 29.0
$MST = \frac{SST}{k - 1} = \frac{17.5}{6} = 2.9167$
$MSE = \frac{SSE}{n - k} = \frac{29.0}{35} = .8286$
$F = \frac{MST}{MSE} = \frac{2.9167}{.8286} = 3.52$
The ANOVA table is:
Source       df    SS      MS       F
Treatment     6    17.5    2.9167   3.52
Error        35    29.0     .8286
Total        41    46.5
b.
The number of treatments is k. We know k − 1 = 6 ⇒ k = 7.
c.
To determine if there is a difference among the population means, we test: H0: μ1 = μ2 = ⋅⋅⋅ = μ7 Ha: At least one of the population means differs from the rest The test statistic is F = 3.52. The rejection region requires α = .10 in the upper tail of the F-distribution with numerator df = k − 1 = 6 and denominator df = n − k = 35. From Table VIII, Appendix B, F.10 ≈ 1.98. The rejection region is F > 1.98. Since the observed value of the test statistic falls in the rejection region (F = 3.52 > 1.98), H0 is rejected. There is sufficient evidence to indicate a difference among the population means at α = .10.
d.
The observed significance level is P(F ≥ 3.52). Using Table XI with numerator df = 6 and denominator df = 35, P(F ≥ 3.52) < .01.
e.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
The test statistic is $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{3.7 - 4.1}{\sqrt{.8286\left(\frac{1}{6} + \frac{1}{6}\right)}} = -.76$
The rejection region requires α/2 = .10/2 = .05 in each tail of the t-distribution with df = n − k = 35. From Table VI, Appendix B, t.05 ≈ 1.697. The rejection region is t < −1.697 or t > 1.697. Since the observed value of the test statistic does not fall in the rejection region (t = −.76 is not less than −1.697), H0 is not rejected. There is insufficient evidence to indicate that μ1 and μ2 differ at α = .10.
f.
For confidence coefficient .90, α = .10 and α/2 = .05. From Table VI, Appendix B, with df = 35, t.05 ≈ 1.697. The confidence interval is:
$(\bar{x}_1 - \bar{x}_2) \pm t_{.05}\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (3.7 - 4.1) \pm 1.697\sqrt{.8286\left(\frac{1}{6} + \frac{1}{6}\right)} \Rightarrow -.4 \pm .892 \Rightarrow (-1.292, .492)$
g.
The confidence interval is:
$\bar{x}_1 \pm t_{.05}\sqrt{MSE/6} \Rightarrow 3.7 \pm 1.697\sqrt{.8286/6} \Rightarrow 3.7 \pm .631 \Rightarrow (3.069, 4.331)$
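The arithmetic in Exercise 8.20 can be checked with a few lines of code. This is a sketch under the values stated in the solution: the sums of squares and degrees of freedom from part a, the sample means 3.7 and 4.1 with 6 observations each, and the tabled value t.05 ≈ 1.697 (taken from the solution, not computed here). The variable names are illustrative.

```python
import math

# Part a: complete the ANOVA table
ss_total, ss_treat = 46.5, 17.5
df_total, df_treat = 41, 6
df_error = df_total - df_treat          # 35
ss_error = ss_total - ss_treat          # 29.0
mst = ss_treat / df_treat
mse = ss_error / df_error
f_stat = mst / mse

# Parts e-g: comparison of two treatment means using MSE
n1 = n2 = 6
xbar1, xbar2 = 3.7, 4.1
t_crit = 1.697                          # t.05 as read from the table in the solution
se_diff = math.sqrt(mse * (1 / n1 + 1 / n2))
t_stat = (xbar1 - xbar2) / se_diff
diff = xbar1 - xbar2
ci_diff = (diff - t_crit * se_diff, diff + t_crit * se_diff)
half_width = t_crit * math.sqrt(mse / n1)
ci_mean1 = (xbar1 - half_width, xbar1 + half_width)

print(round(mst, 4), round(mse, 4), round(f_stat, 2))
print(round(t_stat, 2), [round(v, 3) for v in ci_diff], [round(v, 3) for v in ci_mean1])
```

Because the interval for the difference in part f contains 0, it agrees with the decision in part e not to reject H0.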
8.22
a.
The experimental unit in the study is the college tennis coach. The dependent variable is the response to the statement “the Prospective Student-Athlete Form on the web site contributes very little to the recruiting process” on a scale from 1 to 7. There is one factor in the study and it is the NCAA division of the college tennis coach. There are 3 levels of this factor, and thus, there are 3 treatments – Division I, Division II, and Division III.
b.
To determine if the mean responses of tennis coaches from the different divisions differ, we test:
H0: μ1 = μ2 = μ3
Ha: At least 1 μi differs
c.
Since the observed p-value of the test (p < .003) is less than α = .05, H0 is rejected. There is sufficient evidence to indicate differences in mean response among coaches of the 3 divisions.
8.24
a.
A completely randomized design was used.
b.
There are 4 treatments: 3 robots/colony, 6 robots/colony, 9 robots/colony, and 12 robots/colony.
c.
To determine if there was a difference in the mean energy expended (per robot) among the 4 colony sizes, we test:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least two means differ
d.
Since the p-value (<.001) is less than α (.05), H0 is rejected. There is sufficient evidence to indicate a difference in mean energy expended per robot among the 4 colony sizes at α = .05.
8.26
a.
To determine if differences exist in the mean rates of return among the three types of fund groups, we test:
H0: μ1 = μ2 = μ3
Ha: At least two means differ
b.
c.
The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k – 1 = 3 – 1 = 2 and ν2 = N – k = 90 – 3 = 87. From Table XI, Appendix B, F.01 ≈ 4.98. The rejection region is F > 4.98. Since the observed value of the test statistic falls in the rejection region (F = 69.65 > 4.98), H0 is rejected. There is sufficient evidence to indicate differences exist in the mean rates of return among the three types of fund groups at α = .01.
8.28
a.
The response variable for this study is the safety rating of nuclear power plants.
b.
There are three treatments in this study. The treatment groups are the scientists, the journalists, and the federal government policymakers.
c.
To determine whether there are differences in the attitudes of scientists, journalists, and government officials regarding the safety of nuclear power plants, we test:
H0: μ1 = μ2 = μ3
Ha: At least two means differ
d.
The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k = 300 − 3 = 297. From Table IX, Appendix B, F.05 ≈ 3.00. The rejection region is F > 3.00. In order to reject H0, the test statistic F must be greater than 3.00.
$F = \frac{MST}{MSE} > 3.00 \Rightarrow MST > 3.00(MSE) = 3.00(2.355) = 7.065$. Thus, MST must be greater than 7.065.
e.
For MST = 11.280, $F = \frac{MST}{MSE} = \frac{11.28}{2.355} = 4.79$
f.
With ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k = 300 − 3 = 297, P(F > 4.79) ≈ .01, using Table XI, Appendix B. The approximate p-value is .01.
8.30
a.
We will select size as the quantitative variable and color as the qualitative variable. To determine if the mean size of diamonds differs among the 6 colors, we test:
H0: μ1 = μ2 = μ3 = μ4 = μ5 = μ6
Ha: At least two means differ
b.
Using MINITAB, the ANOVA table is:
One-way ANOVA: Carats versus Color
Analysis of Variance for Carats
Source    DF    SS        MS       F      P
Color      5    0.7963    0.1593   2.11   0.064
Error    302   22.7907    0.0755
Total    307   23.5869
Level    N     Mean     StDev
D        16    0.6381   0.3195
E        44    0.6232   0.2677
F        82    0.5929   0.2648
G        65    0.5808   0.2792
H        61    0.6734   0.2643
I        40    0.7310   0.2918
Pooled StDev = 0.2747
[Plot: Individual 95% CIs For Mean Based on Pooled StDev, one interval per color level, horizontal axis marked at 0.60, 0.70, 0.80]
The test statistic is F = 2.11 and the p-value is p = 0.064. Since the p-value (0.064) is less than α = .10, H0 is rejected. There is sufficient evidence to indicate the mean size of diamonds differs among the 6 colors at α = .10.
c.
We will check the assumptions of normality and equal variances. Using MINITAB, stem-and-leaf displays of Carats (Leaf Unit = 0.010) were produced for each color:
Color = 1, N = 16
Color = 2, N = 44
Color = 3, N = 82
Color = 4, N = 65
Color = 5, N = 61
Color = 6, N = 40
[Stem-and-leaf displays: Carats, one display per color]
The data for the 6 colors do not look particularly mound-shaped, so the assumption of normality is probably not valid. However, departures from this assumption often do not invalidate the ANOVA results. Using MINITAB, the box plots are:
[Box plots: Carats (vertical axis, 0.2 to 1.1) by Color (D, E, F, G, H, I)]
The spreads of all the colors appear to be about the same, so the assumption of constant variance is probably valid.
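As a numerical complement to the box plots, the sample standard deviations reported in the MINITAB output for part b can be compared directly, and the pooled standard deviation can be recomputed from them. This is only a sketch using the printed N and StDev values; comparing the largest and smallest standard deviations is an informal check added here, not part of the original solution.

```python
import math

# (N, StDev) per color level, copied from the MINITAB output in part b
groups = {
    "D": (16, 0.3195), "E": (44, 0.2677), "F": (82, 0.2648),
    "G": (65, 0.2792), "H": (61, 0.2643), "I": (40, 0.2918),
}

n_total = sum(n for n, _ in groups.values())
k = len(groups)

# SSE is the sum of (n_i - 1) * s_i^2; the pooled StDev is sqrt(SSE / (n - k))
sse = sum((n - 1) * sd ** 2 for n, sd in groups.values())
pooled_sd = math.sqrt(sse / (n_total - k))

sds = [sd for _, sd in groups.values()]
sd_ratio = max(sds) / min(sds)

print(round(sse, 2), round(pooled_sd, 4), round(sd_ratio, 2))
```

The recomputed values match the Error SS (22.7907) and Pooled StDev (0.2747) in the printout up to rounding, and the largest standard deviation is only modestly larger than the smallest.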
8.32
a.
The df for Groups = ν1 = k – 1 = 3 – 1 = 2. The df for Error = ν2 = n – k = 71 – 3 = 68. The completed ANOVA table is:
Source    df    SS           MS       F
Groups     2    128.70       64.35    0.16
Error     68    27,124.52    398.89
b.
To determine if the total number of activities undertaken differed among the three groups of entrepreneurs, we test:
H0: μ1 = μ2 = μ3
Ha: At least one mean differs
The test statistic is F = 0.16. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k = 71 − 3 = 68. From Table IX, Appendix B, F.05 ≈ 3.15. The rejection region is F > 3.15. Since the observed value of the test statistic does not fall in the rejection region (F = 0.16 is not greater than 3.15), H0 is not rejected. There is insufficient evidence to indicate that the total number of activities differed among the groups of entrepreneurs at α = .05.
c.
The p-value of the test is P(F > 0.16). From Table VIII, Appendix B, with ν1 = 2 and ν2 = 68, P(F > 0.16) > .10.
d.
No. Since our conclusion was that there was no evidence of a difference in the total number of activities among the groups, there would be no evidence to indicate a difference between two specific groups.
e.
This study would be observational. The group that each entrepreneur fell into was observed, not controlled. Since no differences were found, the type of study does not have an impact on the conclusions.
8.34
The experimentwise error rate is the probability of making a Type I error for at least one of all of the comparisons made. If the experimentwise error rate is α = .05, then each individual comparison is made at a value of α which is less than .05.
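A rough numerical illustration of why the individual comparisons must use a smaller α: if c comparisons were each made at level α and were independent (an assumption made here only for illustration; pairwise comparisons of means are not truly independent), the chance of at least one Type I error would be 1 − (1 − α)^c. The Bonferroni division α/c shown below is one simple way to choose a per-comparison level and is not necessarily the method used in the text.

```python
def num_pairs(k):
    """Number of pairwise comparisons among k treatment means."""
    return k * (k - 1) // 2

def experimentwise_rate(alpha, c):
    """P(at least one Type I error) for c independent comparisons at level alpha."""
    return 1 - (1 - alpha) ** c

def bonferroni_alpha(alpha_e, c):
    """Per-comparison level that keeps the experimentwise rate at or below alpha_e."""
    return alpha_e / c

c = num_pairs(4)                         # 6 comparisons for 4 treatments
print(c, round(experimentwise_rate(0.05, c), 4), round(bonferroni_alpha(0.05, c), 4))
```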
8.36
a.
From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, A and D, C and E, C and B, C and D, and E and D. All other pairs of means are not significantly different because they are connected by lines.
b.
From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and B, A and D, C and B, C and D, E and B, E and D, and B and D. All other pairs of means are not significantly different because they are connected by lines.
c.
From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, and A and D. All other pairs of means are not significantly different because they are connected by lines.
d.
From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, A and D, C and E, C and B, C and D, E and D, and B and D. All other pairs of means are not significantly different because they are connected by lines.
8.38
a.
The total number of comparisons conducted is k(k – 1)/2 = 4(4 – 1)/2 = 6.
b.
The mean energy expended by robots in the 12 robot colony is significantly smaller than the mean energy expended by robots in any of the other size colonies. There is no difference in the mean energy expended by robots in the 3 robot colony, the 6 robot colony, and the 9 robot colony.
8.40
a.
There will be c = k(k − 1)/2 = 3(3 − 1)/2 = 3 pairwise comparisons.
b.
Comparing the mean safety scores for government officials and journalists, the difference in mean safety scores is 4.2 − 3.7 = .5. The critical value for the Tukey comparison is .23. Since .5 > .23, we conclude that the mean safety score for government officials is higher than the mean safety score for journalists.
Comparing the mean safety scores for government officials and scientists, the difference in mean safety scores is 4.2 − 4.1 = .1. Since .1 < .23, we conclude that there is no difference in mean safety scores between government officials and scientists.
Comparing the mean safety scores for scientists and journalists, the difference in mean safety scores is 4.1 − 3.7 = .4. The critical value for the Tukey comparison is .23. Since .4 > .23, we conclude that the mean safety score for scientists is higher than the mean safety score for journalists.
A display of these conclusions is:
Journalists    Scientists    Gov. Officials
3.7            4.1           4.2
               ----------------------------
8.42
a.
The probability of declaring at least one pair of means different when they are not is .01.
b.
There are a total of k(k − 1)/2 = 3(3 − 1)/2 = 3 pairwise comparisons. They are:
‘Under $30 thousand’ to ‘Between $30 and $60 thousand’
‘Under $30 thousand’ to ‘Over $60 thousand’
‘Between $30 and $60 thousand’ to ‘Over $60 thousand’
c.
Means for groups in homogeneous subsets are displayed in the table:
                            Subsets
Income Group        N       1       2
Under $30,000       379     4.60
$30,000-$60,000     392             5.08
Over $60,000        267             5.15
d.
Two of the comparisons in part b will yield confidence intervals that do not contain 0. They are:
‘Under $30 thousand’ to ‘Between $30 and $60 thousand’
‘Under $30 thousand’ to ‘Over $60 thousand’
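The reasoning in part d can be written out mechanically: two groups differ significantly exactly when they do not appear together in any homogeneous subset. A small sketch using the subsets from the table in part c (the helper name `differ` is hypothetical):

```python
from itertools import combinations

groups = ["Under $30,000", "$30,000-$60,000", "Over $60,000"]

# Homogeneous subsets from part c: subset 1 and subset 2
subsets = [
    {"Under $30,000"},
    {"$30,000-$60,000", "Over $60,000"},
]

def differ(a, b):
    """True when groups a and b never share a homogeneous subset."""
    return not any(a in s and b in s for s in subsets)

different_pairs = [(a, b) for a, b in combinations(groups, 2) if differ(a, b)]
for a, b in different_pairs:
    print(f"{a} vs {b}: interval does not contain 0")
```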
8.44
From Exercise 8.30, we found that there were differences in the mean carats among the 6 levels of color. From Exercise 8.30, the mean carats for the 6 colors are:
G         F         E         D         H         I
0.5808    0.5929    0.6232    0.6381    0.6734    0.7310
Using MINITAB, the Tukey confidence intervals are:
Tukey's pairwise comparisons
Family error rate = 0.100
Individual error rate = 0.0101
Critical value = 3.66
Intervals for (column level mean) - (row level mean)
         D          E          F          G          H
E     -0.1926
       0.2225
F     -0.1491    -0.1026
       0.2395     0.1631
G     -0.1411    -0.0964    -0.1059
       0.2558     0.1812     0.1302
H     -0.2350    -0.1909    -0.2007    -0.2194
       0.1644     0.0904     0.0397     0.0341
I     -0.3032    -0.2631    -0.2752    -0.2931    -0.2022
       0.1174     0.0475    -0.0010    -0.0074     0.0871
There are only 2 intervals that do not contain 0: The confidence interval for the difference in mean carats between colors G and I is (−0.2931, −0.0074). The confidence interval for the difference in mean carats between colors F and I is (−0.2752, −0.0010). Since 0 is not contained in these confidence intervals, there is sufficient evidence of a difference in the mean number of carats between colors G and I and between colors F and I. No other differences exist.
8.46
a.
There are 3 blocks used since Block df = b − 1 = 2 and 5 treatments since the treatment df = k − 1 = 4.
b.
There were 15 observations since the Total df = n − 1 = 14.
c.
H0: μ1 = μ2 = μ3 = μ4 = μ5 Ha: At least two treatment means differ
d.
The test statistic is $F = \frac{MST}{MSE} = 9.109$
e.
The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = k − 1 = 5 − 1 = 4 and ν2 = n − k − b + 1 = 15 − 5 − 3 + 1 = 8. From Table XI, Appendix B, F.01 = 7.01. The rejection region is F > 7.01.
f.
Since the observed value of the test statistic falls in the rejection region (F = 9.109 > 7.01), H0 is rejected. There is sufficient evidence to indicate that at least two treatment means differ at α = .01.
g.
The assumptions necessary to assure the validity of the test are as follows:
1. The probability distributions of observations corresponding to all the block-treatment combinations are normal.
2. The variances of all the probability distributions are equal.
8.48
a.
The ANOVA Table is as follows:
Source       df    SS        MS        F
Treatment     2    12.032     6.016     50.958
Block         3    71.749    23.916    202.586
Error         6      .708      .118
Total        11    84.489
b.
To determine if the treatment means differ, we test:
H0: μA = μB = μC
Ha: At least two treatment means differ
The test statistic is F = MST/MSE = 50.958
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k − b + 1 = 12 − 3 − 4 + 1 = 6. From Table IX, Appendix B, F.05 = 5.14. The rejection region is F > 5.14. Since the observed value of the test statistic falls in the rejection region (F = 50.958 > 5.14), H0 is rejected. There is sufficient evidence to indicate that the treatment means differ at α = .05.
c.
To see if the blocking was effective, we test:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least two block means differ
The test statistic is F = MSB/MSE = 202.586
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = b − 1 = 4 − 1 = 3 and ν2 = n − k − b + 1 = 12 − 3 − 4 + 1 = 6. From Table IX, Appendix B, F.05 = 4.76. The rejection region is F > 4.76. Since the observed value of the test statistic falls in the rejection region (F = 202.586 > 4.76), H0 is rejected. There is sufficient evidence to indicate that blocking was effective in reducing the experimental error at α = .05.
d.
From the printouts, we are given the differences in the sample means. The differences between Treatment B and Treatments A and C are both positive (1.125 and 2.450), so Treatment B has the largest sample mean. The difference between Treatments A and C is positive (1.325), so Treatment A has a larger sample mean than Treatment C. Thus, Treatment B has the largest sample mean, Treatment A the next largest, and Treatment C the smallest. From the printout, all the means are significantly different from each other.
e.
The assumptions necessary to assure the validity of the inferences above are:
1. The probability distributions of observations corresponding to all the block-treatment combinations are normal.
2. The variances of all the probability distributions are equal.
8.50
a.
This is a randomized block design. The blocks are the 12 plots of land. The treatments are the three methods used on the shrubs: fire, clipping, and control. The response variable is the mean number of flowers produced. The experimental units are the 36 shrubs.
b.
Plot
c.
To determine if there is a difference in the mean number of flowers produced among the three treatments, we test:
H0: μ1 = μ2 = μ3
Ha: The mean number of flowers produced differs for at least two of the methods.
The test statistic is F = 5.42 and p = .009. We can reject the null hypothesis for any level of significance α > .009. At least two of the methods differ with respect to the mean number of flowers produced by pawpaws.
d.
The means of Control and Clipping do not differ significantly. The means of Clipping and Burning do not differ significantly. The mean of treatment Burning exceeds that of the Control.
8.52
From the printout, the p-value for treatments or Decoy is p = .589. Since the p-value is not small, we cannot reject H0. There is insufficient evidence to indicate a difference among the three decoy types in the mean percentage of a goose flock approaching to within 46 meters of the pit blind. This conclusion is valid for any reasonable value of α.
8.54
Using SAS, the ANOVA Table is:

The ANOVA Procedure
Dependent Variable: temp

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              11    18.53700000       1.68518182     0.52       0.8634
Error              18    58.03800000       3.22433333
Corrected Total    29    76.57500000

R-Square    Coeff Var    Root MSE    temp Mean
0.242076    1.885189     1.795643    95.25000

Source     DF    Anova SS       Mean Square    F Value    Pr > F
STUDENT     9    18.41500000    2.04611111     0.63       0.7537
PLANT       2     0.12200000    0.06100000     0.02       0.9813
To determine if there are differences in the mean temperatures among the three treatments, we test:
H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ
The test statistic is F = 0.02. The associated p-value is p = .9813. Since the p-value is very large, there is no evidence of a difference in mean temperature among the three treatments. Since no difference was detected, we do not need to compare the means. It appears that the presence of plants or pictures of plants does not reduce stress.
8.56
a.
Some preliminary calculations are:
CM = (∑y)²/n = 2.95²/20 = .435125
SS(Total) = ∑y² − CM = .4705 − .435125 = .035375
SST = SS(DRUG) = T1²/b + T2²/b − CM = 1.62²/10 + 1.33²/10 − .435125 = .004205
MST = SST/(k − 1) = .004205/(2 − 1) = .004205, df = k − 1 = 1
SSB = SS(DOG) = B1²/k + B2²/k + ⋅⋅⋅ + B10²/k − CM = (.32² + .38² + .27² + .36² + .42² + .31² + .19² + .19² + .30² + .21²)/2 − .435125 = .028925
MSB = SSB/(b − 1) = .028925/(10 − 1) = .003214, df = b − 1 = 9
SSE = SS(Total) − SST − SSB = .035375 − .004205 − .028925 = .002245
MSE = SSE/(n − k − b + 1) = .002245/(20 − 2 − 10 + 1) = .0002494
F = MST/MSE = .004205/.0002494 = 16.86
F = MSB/MSE = .003214/.0002494 = 12.89
To determine if there is a difference in mean pressure readings for the two treatments, we test:
H0: μA = μB
Ha: μA ≠ μB
The test statistic is F = MST/MSE = 16.86
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − 1 = 2 − 1 = 1 and ν2 = n − k − b + 1 = 20 − 2 − 10 + 1 = 9. From Table IX, Appendix B, F.05 = 5.12. The rejection region is F > 5.12. Since the observed value of the test statistic falls in the rejection region (F = 16.86 > 5.12), H0 is rejected. There is sufficient evidence to indicate a difference in mean pressure readings for the two drugs at α = .05.
b.
Since there is expected to be much variation between the dogs, we use the dogs as blocks to eliminate this identified source of variation.
c.
Dog    Drug A    Drug B    (A − B) Differences
 1     .17       .15       .02
 2     .20       .18       .02
 3     .14       .13       .01
 4     .18       .18       .00
 5     .23       .19       .04
 6     .19       .12       .07
 7     .12       .07       .05
 8     .10       .09       .01
 9     .16       .14       .02
10     .13       .08       .05

Some preliminary calculations are:
d̄ = ∑di/nd = .29/10 = .029
sd² = [∑di² − (∑di)²/nd]/(nd − 1) = [.0129 − (.29)²/10]/(10 − 1) = .00449/9 = .0004989
sd = √sd² = √.0004989 = .02234
To determine if there is a difference in mean pressure readings for the two treatments, we test:
H0: μA = μB
Ha: μA ≠ μB
The test statistic is t = (d̄ − 0)/(sd/√nd) = (.029 − 0)/(.02234/√10) = 4.105
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − 1 = 10 − 1 = 9. From Table VI, Appendix B, t.025 = 2.262. The rejection region is t < −2.262 or t > 2.262. Since the observed value of the test statistic falls in the rejection region (t = 4.105 > 2.262), H0 is rejected. There is sufficient evidence to indicate a difference in the treatment means at α = .05.
d.
In part a, F = 16.86; and in part c, t = 4.105. Note that t² = 4.105² = 16.85 ≈ F (the small discrepancy is due to rounding). In part a, F.05 = 5.12; and in part c, t.025 = 2.262. Note that (t.025)² = 2.262² = 5.12 = F.05.
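The relationship t² = F for two treatments in a randomized block design can be checked numerically from the dog data. The sketch below is illustrative only: it uses the same sums-of-squares formulas as part a and the paired-difference formulas of part c, and the variable names are assumptions of this example. Because it carries full precision, its values may differ from the hand calculations in the last digit.

```python
import math

# Pressure readings for the 10 dogs (blocks) under each drug (treatment).
drug_a = [.17, .20, .14, .18, .23, .19, .12, .10, .16, .13]
drug_b = [.15, .18, .13, .18, .19, .12, .07, .09, .14, .08]

b = len(drug_a)  # number of blocks
k = 2            # number of treatments
n = k * b        # total number of observations

# Randomized block ANOVA (part a).
all_y = drug_a + drug_b
cm = sum(all_y) ** 2 / n
ss_total = sum(y * y for y in all_y) - cm
sst = (sum(drug_a) ** 2 + sum(drug_b) ** 2) / b - cm
ssb = sum((ya + yb) ** 2 for ya, yb in zip(drug_a, drug_b)) / k - cm
sse = ss_total - sst - ssb
mse = sse / (n - k - b + 1)
f_stat = (sst / (k - 1)) / mse

# Paired difference t-test (part c).
d = [ya - yb for ya, yb in zip(drug_a, drug_b)]
d_bar = sum(d) / b
s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (b - 1))
t_stat = d_bar / (s_d / math.sqrt(b))

print(f"F = {f_stat:.4f}, t = {t_stat:.4f}, t squared = {t_stat ** 2:.4f}")
```

The two statistics agree exactly (up to floating-point error), which is why the two tests always lead to the same conclusion when k = 2.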
e.
p-value = P(F ≥ 16.86) with ν1 = 1 and ν2 = 9. Using Table XI, Appendix B, P(F ≥ 10.56) = .01, so P(F ≥ 16.86) < .01. Thus, the p-value is < .01. The probability of observing a test statistic this extreme if the treatment means are the same is less than .01. This is very significant. We would reject H0 in favor of Ha if α is larger than the p-value.
8.58
a.
There are two factors.
b.
No, we cannot tell whether the factors are qualitative or quantitative.
c.
Yes. There are four levels of factor A and three levels of factor B.
d.
A treatment would consist of a combination of one level of factor A and one level of factor B. There are a total of 4 × 3 = 12 treatments.
e.
One problem with only one replicate is there are no degrees of freedom for error. This is overcome by having at least two replicates.
8.60
a.
Factor A has 3 + 1 = 4 levels and factor B has 1 + 1 = 2 levels.
b.
There are a total of 23 + 1 = 24 observations and 4 × 2 = 8 treatments. Therefore, there were 24/8 = 3 observations for each treatment.
c.
AB df = (a − 1)(b − 1) = (4 − 1)(2 − 1) = 3
Error df = n − ab = 24 − 4(2) = 16
MSA = SSA/(a − 1) ⇒ SSA = (a − 1)MSA = (4 − 1)(.75) = 2.25
MSB = SSB/(b − 1) = .95/(2 − 1) = .95
MSAB = SSAB/[(a − 1)(b − 1)] ⇒ SSAB = (a − 1)(b − 1)MSAB = (4 − 1)(2 − 1)(.30) = .9
SSE = SS(Total) − SSA − SSB − SSAB = 6.5 − 2.25 − .95 − .9 = 2.4
MSE = SSE/(n − ab) = 2.4/[24 − 4(2)] = .15
SST = SSA + SSB + SSAB = 2.25 + .95 + .90 = 4.1
Treatment df = ab − 1 = 4(2) − 1 = 7
MST = SST/(ab − 1) = 4.1/7 = .5857
FT = MST/MSE = .5857/.15 = 3.90
FA = MSA/MSE = .75/.15 = 5.00
FB = MSB/MSE = .95/.15 = 6.33
FAB = MSAB/MSE = .30/.15 = 2.00

The ANOVA table is:

Source        df    SS      MS     F
Treatments     7    4.1     .59    3.90
  A            3    2.25    .75    5.00
  B            1     .95    .95    6.33
  AB           3     .90    .30    2.00
Error         16    2.40    .15
Total         23    6.50
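Completing a partially filled two-factor ANOVA table is mostly bookkeeping, so it can be scripted. The following is a hedged sketch that starts from the entries used in the solution to part c (df for A, B, and Total; MSA, SSB, MSAB, and SS(Total)); the variable names are illustrative.

```python
# Entries taken from the partially completed ANOVA table of Exercise 8.60.
df_a, df_b, df_total = 3, 1, 23
ms_a, ss_b, ms_ab, ss_total = 0.75, 0.95, 0.30, 6.5

# Design dimensions implied by the degrees of freedom.
a, b = df_a + 1, df_b + 1
n = df_total + 1
df_ab = df_a * df_b
df_error = n - a * b

# Sums of squares and mean squares.
ss_a = df_a * ms_a
ss_ab = df_ab * ms_ab
ss_error = ss_total - ss_a - ss_b - ss_ab
ms_b = ss_b / df_b
ms_error = ss_error / df_error

# Overall treatment line obtained by pooling A, B, and AB.
ss_treat = ss_a + ss_b + ss_ab
df_treat = a * b - 1
ms_treat = ss_treat / df_treat

# F statistics, each with MSE in the denominator.
f_treat = ms_treat / ms_error
f_a = ms_a / ms_error
f_b = ms_b / ms_error
f_ab = ms_ab / ms_error

for name, df, ss, f in [
    ("Treatments", df_treat, ss_treat, f_treat),
    ("A", df_a, ss_a, f_a),
    ("B", df_b, ss_b, f_b),
    ("AB", df_ab, ss_ab, f_ab),
]:
    print(f"{name:<11}{df:>3}{ss:>8.2f}{ss / df:>8.4f}{f:>8.2f}")
print(f"{'Error':<11}{df_error:>3}{ss_error:>8.2f}{ms_error:>8.4f}")
```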
d.
To determine whether the treatment means differ, we test:
H0: μ1 = μ2 = ⋅⋅⋅ = μ8
Ha: At least two treatment means differ
The test statistic is F = MST/MSE = 3.90
The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = ab − 1 = 4(2) − 1 = 7 and ν2 = n − ab = 24 − 4(2) = 16. From Table VIII, Appendix B, F.10 = 2.13. The rejection region is F > 2.13. Since the observed value of the test statistic falls in the rejection region (F = 3.90 > 2.13), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at α = .10.
e.
To determine if the factors interact, we test:
H0: Factors A and B do not interact to affect the response mean
Ha: Factors A and B do interact to affect the response mean
The test statistic is F = 2.00. The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (4 − 1)(2 − 1) = 3 and ν2 = n − ab = 24 − 4(2) = 16. From Table VIII, Appendix B, F.10 = 2.46. The rejection region is F > 2.46. Since the observed value of the test statistic does not fall in the rejection region (F = 2.00 ≯ 2.46), H0 is not rejected. There is insufficient evidence to indicate factors A and B interact at α = .10.
To determine if the four means of factor A differ, we test:
H0: There is no difference in the four means of factor A
Ha: At least two of the factor A means differ
The test statistic is F = 5.00. The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = a − 1 = 4 − 1 = 3 and ν2 = n − ab = 24 − 4(2) = 16. From Table VIII, Appendix B, F.10 = 2.46. The rejection region is F > 2.46. Since the observed value of the test statistic falls in the rejection region (F = 5.00 > 2.46), H0 is rejected. There is sufficient evidence to indicate at least two of the four means of factor A differ at α = .10.
To determine if the 2 means of factor B differ, we test:
H0: There is no difference in the two means of factor B
Ha: The two factor B means differ
The test statistic is F = 6.33. The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = b − 1 = 2 − 1 = 1 and ν2 = n − ab = 24 − 4(2) = 16. From Table VIII, Appendix B, F.10 = 3.05. The rejection region is F > 3.05. Since the observed value of the test statistic falls in the rejection region (F = 6.33 > 3.05), H0 is rejected. There is sufficient evidence to indicate the two means of factor B differ at α = .10. The main-effect tests are warranted because the interaction was not significant.
8.62
a.
The treatments are the combinations of the levels of factor A and the levels of factor B. There are 2 × 2 = 4 treatments. The treatment means are:
x̄11 = ∑x11/2 = (29.6 + 35.2)/2 = 32.4
x̄12 = ∑x12/2 = (47.3 + 42.1)/2 = 44.7
x̄21 = ∑x21/2 = (12.9 + 17.6)/2 = 15.25
x̄22 = ∑x22/2 = (28.4 + 22.7)/2 = 25.55
The factors do not appear to interact, since the lines are almost parallel. The treatment means do appear to differ because the sample means range from 15.25 to 44.7.
b.
CM = (∑xi)²/n = 235.8²/8 = 6950.205
SS(Total) = ∑x² − CM = 7922.92 − 6950.205 = 972.715
SSA = ∑Ai²/(br) − CM = 154.2²/[2(2)] + 81.6²/[2(2)] − 6950.205 = 7609.05 − 6950.205 = 658.845
SSB = ∑Bi²/(ar) − CM = 95.3²/[2(2)] + 140.5²/[2(2)] − 6950.205 = 7205.585 − 6950.205 = 255.38
SSAB = ∑∑ABij²/r − SSA − SSB − CM = 64.8²/2 + 89.4²/2 + 30.5²/2 + 51.1²/2 − 658.845 − 255.38 − 6950.205 = 7866.43 − 7864.43 = 2
SSE = SS(Total) − SSA − SSB − SSAB = 972.715 − 658.845 − 255.38 − 2 = 56.49

A        df = a − 1 = 2 − 1 = 1
B        df = b − 1 = 2 − 1 = 1
AB       df = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1
Error    df = n − ab = 8 − 2(2) = 4
Total    df = n − 1 = 8 − 1 = 7

MSA = SSA/(a − 1) = 658.845/1 = 658.845
MSB = SSB/(b − 1) = 255.38/1 = 255.38
MSAB = SSAB/[(a − 1)(b − 1)] = 2/1 = 2
MSE = SSE/(n − ab) = 56.49/4 = 14.1225
FA = MSA/MSE = 658.845/14.1225 = 46.65
FB = MSB/MSE = 255.38/14.1225 = 18.08
FAB = MSAB/MSE = 2/14.1225 = .14

The ANOVA table is:

Source    df    SS         MS          F
A          1    658.845    658.845     46.65
B          1    255.380    255.380     18.08
AB         1      2.000      2.000       .14
Error      4     56.490     14.1225
Total      7    972.715
c.
SST = SSA + SSB + SSAB = 658.845 + 255.380 + 2.000 = 916.225
df = ab − 1 = 2(2) − 1 = 3
MST = SST/(ab − 1) = 916.225/3 = 305.408
FT = MST/MSE = 305.408/14.1225 = 21.63
To determine whether the treatment means differ, we test:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least two of the treatment means differ
The test statistic is F = 21.63. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 2(2) − 1 = 3 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 6.59. The rejection region is F > 6.59.
Since the observed value of the test statistic falls in the rejection region (F = 21.63 > 6.59), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at α = .05. This agrees with the conclusion in part a.
d.
Since there are differences among the treatment means, we test for the presence of interaction:
H0: Factors A and B do not interact to affect the response means
Ha: Factors A and B do interact to affect the response means
The test statistic is F = .14. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 7.71. The rejection region is F > 7.71. Since the observed value of the test statistic does not fall in the rejection region (F = .14 ≯ 7.71), H0 is not rejected. There is insufficient evidence to indicate the factors interact at α = .05.
e.
Since the interaction was not significant, we test for main effects. To determine whether the two means of factor A differ, we test:
H0: μ1 = μ2
Ha: μ1 ≠ μ2
The test statistic is F = 46.65. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = a − 1 = 2 − 1 = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 7.71. The rejection region is F > 7.71. Since the observed value of the test statistic falls in the rejection region (F = 46.65 > 7.71), H0 is rejected. There is sufficient evidence to indicate the two means of factor A differ at α = .05.
To determine whether the two means of factor B differ, we test:
H0: μ1 = μ2
Ha: μ1 ≠ μ2
The test statistic is F = 18.08. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = b − 1 = 2 − 1 = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 7.71. The rejection region is F > 7.71.
Since the observed value of the test statistic falls in the rejection region (F = 18.08 > 7.71), H0 is rejected. There is sufficient evidence to indicate the two means of factor B differ at α = .05.
f.
The results of all the tests agree with those in part a.
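The sums of squares and F statistics for this 2 × 2 factorial experiment with r = 2 replicates can be reproduced from the eight observations. The sketch below is illustrative: it assumes the observations are keyed by (level of A, level of B), as implied by the treatment means in part a, and applies the same computing formulas (CM, SSA, SSB, SSAB, SSE) used in part b.

```python
# Observations keyed by (level of factor A, level of factor B).
cells = {
    (1, 1): [29.6, 35.2],
    (1, 2): [47.3, 42.1],
    (2, 1): [12.9, 17.6],
    (2, 2): [28.4, 22.7],
}

a_levels = sorted({i for i, _ in cells})
b_levels = sorted({j for _, j in cells})
a, b = len(a_levels), len(b_levels)
r = 2            # replicates per treatment
n = a * b * r    # total number of observations

all_x = [x for obs in cells.values() for x in obs]
cm = sum(all_x) ** 2 / n
ss_total = sum(x * x for x in all_x) - cm

# Factor totals: sum over all observations at each level.
a_totals = [sum(sum(cells[i, j]) for j in b_levels) for i in a_levels]
b_totals = [sum(sum(cells[i, j]) for i in a_levels) for j in b_levels]

ss_a = sum(t * t for t in a_totals) / (b * r) - cm
ss_b = sum(t * t for t in b_totals) / (a * r) - cm
ss_ab = sum(sum(obs) ** 2 for obs in cells.values()) / r - ss_a - ss_b - cm
ss_e = ss_total - ss_a - ss_b - ss_ab

mse = ss_e / (n - a * b)
f_a = (ss_a / (a - 1)) / mse
f_b = (ss_b / (b - 1)) / mse
f_ab = (ss_ab / ((a - 1) * (b - 1))) / mse

print(f"SSA={ss_a:.3f} SSB={ss_b:.3f} SSAB={ss_ab:.3f} SSE={ss_e:.3f}")
print(f"FA={f_a:.2f} FB={f_b:.2f} FAB={f_ab:.2f}")
```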
g.
Since no interaction is present, but the means of both factors A and B differ, we compare the two means of factor A and compare the two means of factor B. Since there are only two means to compare for each factor, the higher population mean corresponds to the higher sample mean.
Factor A:
x̄1 = ∑x1/(br) = (29.6 + 35.2 + 47.3 + 42.1)/[2(2)] = 38.55
x̄2 = ∑x2/(br) = (12.9 + 17.6 + 28.4 + 22.7)/[2(2)] = 20.4
The mean for level 1 of factor A is significantly higher than the mean for level 2.
Factor B:
x̄1 = ∑x1/(ar) = (29.6 + 35.2 + 12.9 + 17.6)/[2(2)] = 23.825
x̄2 = ∑x2/(ar) = (47.3 + 42.1 + 28.4 + 22.7)/[2(2)] = 35.125
The mean for level 2 of factor B is significantly higher than the mean for level 1.
8.64
a.
There are a total of 2 × 4 = 8 treatments.
b.
The interaction between temperature and type was significant. This means that the effect of type of yeast on the mean autolysis yield depends on the level of temperature.
c.
To determine if the main effect of type of yeast is significant, we test: H0: μBa = μBr Ha: μBa ≠ μBr To determine if the main effect of temperature is significant, we test: H0: μ1 = μ2 = μ3 = μ4 Ha: At least one mean differs
d.
The tests for the main effects should not be run until after the test for interaction is conducted. If interaction is significant, then these interaction effects could cover up the main effects. Thus, the main effect tests would not be informative. If the test for interaction is not significant, then the main effect tests could be run.
e.
Baker’s yeast: The mean yield for temperature 54° is significantly lower than the mean yields for the other 3 temperatures. There is no difference in the mean yields for the temperatures 45°, 48°, and 51°.
Brewer’s yeast: The mean yield for temperature 54° is significantly lower than the mean yields for the other 3 temperatures. There is no difference in the mean yields for the temperatures 45°, 48°, and 51°.
8.66
a.
This is an observational experiment. The researcher recorded the number of users per hour for each of 24 hours per day, 7 days per week, for 7 weeks. The researcher did not manipulate the weeks or days or hours.
b.
The two factors are (1) the day of the week with 7 levels and (2) the hour of the day with 24 levels.
c.
In a factorial experiment, a is the number of levels of factor A and b is the number of levels of factor B. If we let factor A be the day of the week and factor B be the hour of the day, then a = 7 and b = 24.
d.
To determine if the a × b = 7 × 24 = 168 treatment means differ, we test:
H0: μ1 = μ2 = μ3 = . . . = μ168
Ha: At least two means differ
The test statistic is F = MST/MSE = 1143.99/45.65 = 25.06
The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = p − 1 = 168 − 1 = 167 and ν2 = n − p = 1172 − 168 = 1004. From Table XI, Appendix B, F.01 ≈ 1.00. The rejection region is F > 1.00. Since the observed value of the test statistic falls in the rejection region (F = 25.06 > 1.00), H0 is rejected. There is sufficient evidence to indicate a difference in mean usage among the day-hour combinations at α = .01.
e.
The hypotheses used to test if an interaction effect exists are:
H0: Days and hours do not interact to affect the mean usage
Ha: Days and hours do interact to affect the mean usage
f.
The test statistic is F = MSAB/MSE = 55.69/45.65 = 1.22
The p-value is p = .0527. Since the p-value is not less than α = .01, H0 is not rejected. There is insufficient evidence to indicate days and hours interact to affect usage at α = .01.
g.
To determine if the mean usage differs among the days of the week, we test:
H0: μ1 = μ2 = μ3 = μ4 = μ5 = μ6 = μ7
Ha: At least two means differ
The test statistic is F = MSA/MSE = 3122.02/45.65 = 68.39
The p-value is p = .0001. Since the p-value is less than α = .01, H0 is rejected. There is sufficient evidence to indicate the mean usage differs among the days of the week at α = .01.
To determine if the mean usage differs among the hours of the day, we test:
H0: μ1 = μ2 = μ3 = . . . = μ24
Ha: At least two means differ
The test statistic is F = MSB/MSE = 7157.82/45.65 = 156.80
The p-value is p = .0001. Since the p-value is less than α = .01, H0 is rejected. There is sufficient evidence to indicate the mean usage differs among the hours of the day at α = .01.
8.68
a.
The degrees of freedom for “Type of message retrieval system” is a − 1 = 2 − 1 = 1. The degrees of freedom for “Pricing option” is b − 1 = 2 − 1 = 1. The degrees of freedom for the interaction of Type of message retrieval system and Pricing option is (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1. The degrees of freedom for error is n − ab = 120 − 2(2) = 116.

Source                              Df     SS    MS    F
Type of message retrieval system      1    -     -     2.001
Pricing Option                        1    -     -     5.019
Type of system × pricing option       1    -     -     4.986
Error                               116    -     -
Total                               119    -
b.
To determine if “Type of system” and “Pricing option” interact to affect the mean willingness to buy, we test: H0: “Type of system” and “Pricing option” do not interact Ha: “Type of system” and “Pricing option” interact
c.
The test statistic is F = MSAB/MSE = 4.986
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1 and ν2 = n − ab = 120 − 2(2) = 116. From Table IX, Appendix B, F.05 ≈ 3.92. The rejection region is F > 3.92.
Since the observed value of the test statistic falls in the rejection region (F = 4.986 > 3.92), H0 is rejected. There is sufficient evidence to indicate “Type of system” and “Pricing option” interact to affect the mean willingness to buy at α = .05.
d.
No. Since the test in part c indicated that interaction between “Type of system” and “Pricing option” is present, we should not test for the main effects. Instead, we should proceed directly to a multiple comparison procedure to compare selected treatment means. If interaction is present, it can cover up the main effects.
8.70
a.
The treatments are the 3 × 3 = 9 combinations of PES and Trust. The nine treatments are: (BC, Low), (PC, Low), (NA, Low), (BC, Med), (PC, Med), (NA, Med), (BC, High), (PC, High), and (NA, High).
b.
df(Trust) = 3 − 1 = 2
SSE = SSTot − SS(PES) − SS(Trust) − SSPT = 161.1162 − 2.1774 − 7.6367 − 1.7380 = 149.5641
MS(PES) = SS(PES)/df(PES) = 2.1774/2 = 1.0887
MS(Trust) = SS(Trust)/df(Trust) = 7.6367/2 = 3.81835
MS(PT) = SS(PT)/df(PT) = 1.7380/4 = 0.4345
MSE = SSE/df(Error) = 149.5641/206 = 0.7260
FPES = MS(PES)/MSE = 1.0887/0.7260 = 1.50
FTrust = MS(Trust)/MSE = 3.81835/0.7260 = 5.26
FPT = MS(PT)/MSE = 0.4345/0.7260 = 0.60

The ANOVA table is:

Source         df     SS          MS         F
PES              2      2.1774    1.0887     1.50
Trust            2      7.6367    3.81835    5.26
PES × Trust      4      1.7380    0.4345     0.60
Error          206    149.5641    0.7260
Total          214    161.1162
c.
To determine if PES and Trust interact, we test:
H0: PES and Trust do not interact to affect the mean tension
Ha: PES and Trust do interact to affect the mean tension
The test statistic is F = 0.60.
The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (3 − 1)(3 − 1) = 4 and ν2 = n − ab = 215 − 3(3) = 206. From Table IX, Appendix B, F.05 ≈ 2.37. The rejection region is F > 2.37. Since the observed value of the test statistic does not fall in the rejection region (F = 0.60 ≯ 2.37), H0 is not rejected. There is insufficient evidence to indicate that PES and Trust interact at α = .05.
d.
The plot of the treatment means is: The mean tension scores for Low Trust are relatively the same for each level of PES. Similarly, the mean tension scores for Medium Trust are relatively the same for each level of PES. However, the mean tension scores for High Trust are not the same for each level of PES. For both PES levels BC and PC, as the level of trust increases, the mean tension scores decrease. However, for PES level NA, as trust goes from low to medium, the mean tension decreases. As the trust goes from medium to high, the mean tension increases. This pattern in the sample means suggests that interaction may be present, although the test in part c did not find the interaction to be statistically significant at α = .05.
e.
Because the interaction of PES and Trust was found to be significant, the tests for the main effects are irrelevant. If the factors interact, the interaction effect can cover up any main effect differences. In addition, interaction implies that the effects of one factor on the dependent variable are different at different levels of the second factor. Thus, there is no one "main" effect of the factor.
8.72
Using MINITAB, the ANOVA results are:

General Linear Model: Deviation versus Group, Trail

Factor    Type     Levels    Values
Group     fixed    4         F G M N
Trail     fixed    2         C E

Analysis of Variance for Deviatio, using Adjusted SS for Tests

Source          DF     Seq SS      Adj SS     Adj MS     F        P
Group             3     16271.2    13000.6     4333.5     5.91    0.001
Trail             1     46445.5    46445.5    46445.5    63.34    0.000
Group*Trail       3      2245.2     2245.2      748.4     1.02    0.386
Error           112     82131.7    82131.7      733.3
Total           119    147093.6
First, we must test for treatment effects. SST = SS(Group) + SS(Trail) + SS(GxT) = 16,271.2 + 46,445.5 + 2,245.2 = 64,961.9. The df = 3 + 1 + 3 = 7.
MST = SST/(ab − 1) = 64,961.9/[4(2) − 1] = 9,280.2714
F = MST/MSE = 9,280.2714/733.3 = 12.66
To determine if there are differences in the mean trail deviation among the 8 treatments, we test:
H0: All treatment means are the same
Ha: At least two treatment means differ
The test statistic is F = 12.66. Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = ab − 1 = 4(2) − 1 = 7 and ν2 = n − ab = 120 − 4(2) = 112. From Table IX, Appendix B, F.05 ≈ 2.09. The rejection region is F > 2.09. Since the observed value of the test statistic falls in the rejection region (F = 12.66 > 2.09), H0 is rejected. There is sufficient evidence that differences exist among the treatment means at α = .05.
Since differences exist, we now test for the interaction effect between Trail and Group. To determine if Trail and Group interact, we test:
H0: Trail and Group do not interact
Ha: Trail and Group do interact
The test statistic is F = 1.02 and p = .386. Since the p-value is greater than α (p = .386 > .05), H0 is not rejected. There is insufficient evidence that Trail and Group interact at α = .05.
Since the interaction was not significant, we test for the main effects of Trail and Group. To determine if there are differences in the mean trail deviation between the two levels of Trail, we test:
H0: μ1 = μ2
Ha: μ1 ≠ μ2
The test statistic is F = 63.34 and p = 0.000. Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence that the mean trail deviations differ between the fecal extract trail and the control trail at α = .05.
To determine if there are differences in the mean trail deviation among the four levels of Group, we test:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least 2 means differ
The test statistic is F = 5.91 and p = 0.001. Since the p-value is less than α (p = 0.001 < .05), H0 is rejected. There is sufficient evidence that the mean trail deviations differ among the four groups at α = .05.
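The overall treatment F statistic is not printed by MINITAB, so it has to be pooled from the factor and interaction lines. A minimal sketch of that pooling, following the solution in using the sequential sums of squares from the printout (the dictionary names are illustrative):

```python
# Sequential sums of squares and degrees of freedom from the MINITAB printout.
seq_ss = {"Group": 16271.2, "Trail": 46445.5, "Group*Trail": 2245.2}
df = {"Group": 3, "Trail": 1, "Group*Trail": 3}
mse = 733.3  # Adj MS for Error

# Pool the three lines into a single treatment line.
ss_treat = sum(seq_ss.values())
df_treat = sum(df.values())
ms_treat = ss_treat / df_treat
f_treat = ms_treat / mse

print(f"SST = {ss_treat:.1f}, df = {df_treat}, MST = {ms_treat:.4f}, F = {f_treat:.2f}")
```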
8.74
There are 3 × 2 = 6 treatments. They are A1B1, A1B2, A2B1, A2B2, A3B1, and A3B2.
8.76
a.
SSE = SSTot − SST = 62.55 − 36.95 = 25.60
df Treatment = p − 1 = 4 − 1 = 3
df Error = n − p = 20 − 4 = 16
df Total = n − 1 = 20 − 1 = 19
MST = SST/df = 36.95/3 = 12.32
MSE = SSE/df = 25.60/16 = 1.60
F = MST/MSE = 12.32/1.60 = 7.70

The ANOVA table:

Source       df    SS       MS       F
Treatment     3    36.95    12.32    7.70
Error        16    25.60     1.60
Total        19    62.55
b.
To determine if there is a difference in the treatment means, we test:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least two of the means differ
where μi represents the mean for the ith treatment.
The test statistic is F = MST/MSE = 7.70
The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = (p − 1) = (4 − 1) = 3 and ν2 = (n − p) = (20 − 4) = 16. From Table VIII, Appendix B, F.10 = 2.46. The rejection region is F > 2.46. Since the observed value of the test statistic falls in the rejection region (F = 7.70 > 2.46), H0 is rejected. There is sufficient evidence to conclude that at least two of the means differ at α = .10.
c.
x̄4 = ∑x4/n4 = 57/5 = 11.4
For confidence level .90, α = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, with df = 16, t.05 = 1.746. The confidence interval is:
x̄4 ± t.05 √(MSE/n4) ⇒ 11.4 ± 1.746 √(1.6/5) ⇒ 11.4 ± .99 ⇒ (10.41, 12.39)
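The interval for the treatment 4 mean can be verified with a few lines of code. This sketch takes the critical value t.05 = 1.746 (16 df) from the table lookup quoted in the solution rather than computing it; the variable names are illustrative.

```python
import math

# Quantities from the solution: treatment 4 total, sample size, and MSE.
total_4, n_4, mse = 57, 5, 1.60
t_crit = 1.746  # t.05 with 16 df, taken from Table VI as quoted in the text

x_bar_4 = total_4 / n_4
half_width = t_crit * math.sqrt(mse / n_4)
lower, upper = x_bar_4 - half_width, x_bar_4 + half_width

print(f"{x_bar_4} ± {half_width:.2f} -> ({lower:.2f}, {upper:.2f})")
```

Note that MSE, not the variance of the treatment 4 sample alone, is used as the estimate of σ², which is why the t distribution has the error degrees of freedom (16).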
8.78
a.
df(AB) = (a − 1)(b − 1) = 3(5) = 15
df(Error) = n − ab = 48 − 4(6) = 24
SSAB = MSAB(df) = 3.1(15) = 46.5
SS(Total) = SSA + SSB + SSAB + SSE = 2.6 + 9.2 + 46.5 + 18.7 = 77
MSA = SSA/(a − 1) = 2.6/3 = .8667
MSB = SSB/(b − 1) = 9.2/5 = 1.84
MSE = SSE/(n − ab) = 18.7/24 = .7792
FA = MSA/MSE = .8667/.7792 = 1.11
FB = MSB/MSE = 1.84/.7792 = 2.36
FAB = MSAB/MSE = 3.1/.7792 = 3.98

Source    df    SS      MS       F
A          3     2.6    .8667    1.11
B          5     9.2    1.84     2.36
AB        15    46.5    3.1      3.98
Error     24    18.7    .7792
Total     47    77.0
b.
Factor A has a = 3 + 1 = 4 levels and factor B has b = 5 + 1 = 6 levels. The number of treatments is ab = 4(6) = 24. The total number of observations is n = 47 + 1 = 48. Thus, two replicates were performed.
c.
SST = SSA + SSB + SSAB = 2.6 + 9.2 + 46.5 = 58.3
MST = SST/(ab − 1) = 58.3/[4(6) − 1] = 2.5347
F = MST/MSE = 2.5347/.7792 = 3.25
To determine whether the treatment means differ, we test:
H0: μ1 = μ2 = ⋅⋅⋅ = μ24
Ha: At least one treatment mean is different
The test statistic is F = MST/MSE = 3.25
The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 4(6) − 1 = 23 and ν2 = n − ab = 48 − 4(6) = 24. From Table IX, Appendix B, F.05 ≈ 2.03. The rejection region is F > 2.03. Since the observed value of the test statistic falls in the rejection region (F = 3.25 > 2.03), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at α = .05.
d.
Since there are differences among the treatment means, we test for the presence of interaction:
H0: Factor A and factor B do not interact to affect the response mean
Ha: Factor A and factor B do interact to affect the response mean
The test statistic is F = MSAB/MSE = 3.98
The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (4 − 1)(6 − 1) = 15 and ν2 = n − ab = 48 − 4(6) = 24. From Table IX, Appendix B, F.05 = 2.11. The rejection region is F > 2.11. Since the observed value of the test statistic falls in the rejection region (F = 3.98 > 2.11), H0 is rejected. There is sufficient evidence to indicate factors A and B interact to affect the response means at α = .05. Since the interaction is significant, no further tests are warranted. Multiple comparisons need to be performed.
8.80
a.
This is a two-factor factorial design. It is also a completely randomized design.
b.
The two factors are "involvement in topic" and "question wording." Both are qualitative variables because neither is measured on a numerical scale.
c.
There are two levels of "involvement in topic": high and low. There are two levels of "question wording": positive and negative.
d.
There are 2 × 2 = 4 treatments. They are: (high, positive), (high, negative), (low, positive), and (low, negative).
e.
The experiment's dependent variable is the level of agreement.
8.82
a.
To determine if the mean vacancy rates of the eight office-property submarkets in Atlanta differ, we test: H0: μ1 = μ2 = μ3 = μ4 = μ5 = μ6 = μ7 = μ8 Ha: At least two means differ
b.
If quarterly data were used for nine years, there are 4 × 9 = 36 observations per submarket. Since there are 8 submarkets, the total sample size is 8 × 36 = 288. Since no value of α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 8 − 1 = 7 and ν2 = n − k = 288 − 8 = 280. From Table IX, Appendix B, F.05 ≈ 2.01. The rejection region is F > 2.01.
Since the observed value of the test statistic falls in the rejection region (F = 17.54 > 2.01), H0 is rejected. There is sufficient evidence to indicate the mean vacancy rates of the eight office-property submarkets in Atlanta differ at α = .05.
c.
With ν1 = k − 1 = 8 − 1 = 7 and ν2 = n − k = 288 − 8 = 280, P(F > 17.54) < .01, using Table XI, Appendix B. Thus, the p-value is less than .01.
d.
We must assume that all eight samples are randomly drawn from normal populations, the eight population variances are the same, and the samples are independent.
e.
The mean vacancy rate for the South submarket is significantly larger than the mean vacancy rates for all other submarkets. The mean vacancy rate of the Downtown submarket is significantly larger than the mean vacancy rates for all other submarkets except the South. The mean vacancy rate of the North Lake submarket is significantly larger than the mean vacancy rates for all other submarkets except the South and Downtown. The mean vacancy rate of the Midtown submarket is significantly larger than the mean vacancy rates for all other submarkets except the South, Downtown, and North Lake. There are no other significant differences.
8.84
a.
The response is the weight of a brochure. There is one factor and it is carton. The treatments are the five different cartons, while the experimental units are the brochures.
b.
CM = (∑y)²/n = .75005²/40 = .01406437506
SS(Total) = ∑y² − CM = .014066537 − .01406437506 = .00000216264
SST = ∑(Ti²/ni) − CM = .14767²/8 + .15028²/8 + .14962²/8 + .15217²/8 + .15031²/8 − .01406437506
    = .01406568209 − .01406437506 = .00000130703
SSE = SS(Total) − SST = .00000216264 − .00000130703 = .00000085561
MST = SST/(k − 1) = .00000130703/(5 − 1) = .000000326756
MSE = SSE/(n − k) = .00000085561/(40 − 5) = .000000024446
F = MST/MSE = .000000326756/.000000024446 = 13.37

Source       df   SS             MS              F
Treatments    4   .00000130703   .000000326756   13.37
Error        35   .00000085561   .000000024446
Total        39   .00000216264
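Because these sums of squares are differences of nearly equal numbers, the arithmetic is easy to get wrong by hand. The sketch below repeats it in Python, assuming equal group sizes; the function name `one_way_anova` is illustrative, and SS(Total) is passed in as reported in the solution rather than recomputed from the raw data.

```python
def one_way_anova(totals, n_per_group, ss_total):
    """One-way ANOVA from treatment totals, equal group sizes, and a given SS(Total)."""
    k = len(totals)
    n = k * n_per_group
    cm = sum(totals) ** 2 / n                              # correction for the mean
    sst = sum(t ** 2 for t in totals) / n_per_group - cm   # treatment sum of squares
    sse = ss_total - sst
    mst, mse = sst / (k - 1), sse / (n - k)
    return {"CM": cm, "SST": sst, "SSE": sse, "MST": mst, "MSE": mse, "F": mst / mse}

# Carton totals and SS(Total) as reported in the solution.
carton_totals = [0.14767, 0.15028, 0.14962, 0.15217, 0.15031]
result = one_way_anova(carton_totals, n_per_group=8, ss_total=0.00000216264)
print(f"SST = {result['SST']:.11f}, F = {result['F']:.2f}")
```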
To determine whether there are differences in mean weight per brochure among the five cartons, we test:
H0: μ1 = μ2 = μ3 = μ4 = μ5 Ha: At least two treatment means differ
The test statistic is F = 13.37. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 5 − 1 = 4 and ν2 = n − k = 40 − 5 = 35. From Table IX, Appendix B, F.05 ≈ 2.53. The rejection region is F > 2.53. Since the observed value of the test statistic falls in the rejection region (F = 13.37 > 2.53), H0 is rejected. There is sufficient evidence to indicate a difference in mean weight per brochure among the five cartons at α = .05.
c.
We must assume that the distributions of weights for the brochures in the five cartons are normal, that the variances of the weights for the brochures in the five cartons are equal, and that random and independent samples were selected from each of the cartons.
d.
Using MINITAB, the results of Tukey's multiple comparison procedure are:

Level     N   Mean       StDev
Carton1   8   0.018459   0.000105
Carton2   8   0.018785   0.000101
Carton3   8   0.018703   0.000109
Carton4   8   0.019021   0.000232
Carton5   8   0.018789   0.000188

Pooled StDev = 0.000156

(Character plots of the individual 95% CIs and of the simultaneous intervals are omitted.)

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 99.32%

Carton1 subtracted from:
           Lower        Center       Upper
Carton2    0.0001013    0.0003262    0.0005512
Carton3    0.0000188    0.0002437    0.0004687
Carton4    0.0003375    0.0005625    0.0007875
Carton5    0.0001050    0.0003300    0.0005550

Carton2 subtracted from:
           Lower        Center       Upper
Carton3   -0.0003075   -0.0000825    0.0001425
Carton4    0.0000113    0.0002363    0.0004612
Carton5   -0.0002212    0.0000037    0.0002287

Carton3 subtracted from:
           Lower        Center       Upper
Carton4    0.0000938    0.0003187    0.0005437
Carton5   -0.0001387    0.0000862    0.0003112

Carton4 subtracted from:
           Lower        Center       Upper
Carton5   -0.0004575   -0.0002325   -0.0000075
The means arranged in order are:

Carton 1   Carton 3   Carton 2   Carton 5   Carton 4
.018459    .018703    .018785    .018789    .019021
The interpretation of the Tukey results is: The mean weight for carton 4 is significantly higher than the mean weights of all the other cartons. The mean weights of cartons 5, 2, and 3 are significantly higher than the mean weight of carton 1.
e.
Since there are differences among the cartons, management should sample from many cartons.
8.86
a.
This is a randomized block design.
Response: the length of time required for a cut to stop bleeding
Factor: drug
Factor type: qualitative
Treatments: drugs A, B, and C
Experimental units: subjects
b.
Using MINITAB, the results are:

General Linear Model: Y versus Drug, Person

Factor   Type    Levels   Values
Drug     fixed   3        A B C
Person   fixed   5        1 2 3 4 5

Analysis of Variance for Y, using Adjusted SS for Tests

Source   DF   Seq SS   Adj SS   Adj MS   F       P
Drug      2    156.4    156.4     78.2    3.91   0.066
Person    4   7645.8   7645.8   1911.5   95.51   0.000
Error     8    160.1    160.1     20.0
Total    14   7962.3

Tukey 90.0% Simultaneous Confidence Intervals
Response Variable Y
All Pairwise Comparisons among Levels of Drug

Drug = A subtracted from:
Drug   Lower    Center   Upper
B      -11.56   -4.820   1.922
C       -3.72    3.020   9.762

Drug = B subtracted from:
Drug   Lower   Center   Upper
C      1.098   7.840    14.58

(Character plots of the intervals are omitted.)
Let μ1, μ2, and μ3 represent the mean clotting time for the three drugs.
H0: μ1 = μ2 = μ3
Ha: At least two means differ
The test statistic is F = MS(Drug)/MSE = 3.91.
The p-value is p = 0.066. Since the observed level of significance is less than α = .10, H0 is rejected. There is sufficient evidence to indicate differences in the mean clotting times among the three drugs at α = .10.
c.
The observed level of significance is given as 0.066.
d.
To determine if there is a significant difference in the mean response over blocks, we test:
H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: At least two block means differ
The test statistic is F = MS(Person)/MSE = 95.51.
The p-value is p = 0.000. Since the observed level of significance is less than α = .10, H0 is rejected. There is sufficient evidence to indicate differences in the mean clotting times among the five people at α = .10.
e.
The confidence interval to compare drugs A and B is (-11.56, 1.922). Since 0 is in the interval, there is no evidence of a difference in mean clotting times between drugs A and B. The confidence interval to compare drugs A and C is (-3.72, 9.762). Since 0 is in the interval, there is no evidence of a difference in mean clotting times between drugs A and C. The confidence interval to compare drugs B and C is (1.098, 14.58). Since 0 is not in the interval, there is evidence of a difference in mean clotting times between drugs B and C. Since the numbers are positive, the mean clotting time for drug C is greater than that for drug B. In summary, the mean clotting time for drug C is greater than that for drug B. No other differences exist.
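The rule applied to each interval (a pair of means differs when the interval for their difference excludes 0) can be sketched in a few lines of Python. The dictionary layout and the helper name `differs` are illustrative; the interval endpoints are the Tukey 90% values from the MINITAB output.

```python
def differs(interval):
    """Return True when a confidence interval for a difference excludes zero."""
    lower, upper = interval
    return lower > 0 or upper < 0

# Tukey 90% simultaneous intervals from the MINITAB output.
tukey_intervals = {
    ("B", "A"): (-11.56, 1.922),   # mean of B minus mean of A
    ("C", "A"): (-3.72, 9.762),    # mean of C minus mean of A
    ("C", "B"): (1.098, 14.58),    # mean of C minus mean of B
}

significant = [pair for pair, ci in tukey_intervals.items() if differs(ci)]
print(significant)
```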
8.88
a.
MSA = SSA/dfA = 243.2/1 = 243.2
MSB = SSB/dfB = 57.8/1 = 57.8
SSAB = SS(Total) − SSA − SSB − SSE = 976.3 − 243.2 − 57.8 − 670.8 = 4.5
MSAB = SSAB/dfAB = 4.5/1 = 4.5
MSE = SSE/dfE = 670.8/77 = 8.712
FA = MSA/MSE = 243.2/8.712 = 27.92
FB = MSB/MSE = 57.8/8.712 = 6.63
FAB = MSAB/MSE = 4.5/8.712 = 0.52

The ANOVA table is:

Source                   df   SS      MS      F
Recent performance (A)    1   243.2   243.2   27.92
Risk attitude (B)         1    57.8    57.8    6.63
AB                        1     4.5     4.5    0.52
Error                    77   670.8   8.712
Total                    80   976.3
b.
To determine if factors A and B interact, we test:
H0: Factors A and B do not interact to affect the mean decision Ha: Factors A and B do interact to affect the mean decision The test statistic is F = 0.52.
The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1 and ν2 = n − ab = 81 − 2(2) = 77. From Table IX, Appendix B, F.05 ≈ 4.00. The rejection region is F > 4.00. Since the observed value of the test statistic does not fall in the rejection region (F = .52 ≯ 4.00), H0 is not rejected. There is insufficient evidence to indicate that factors A and B interact at α = .05.
c.
Since the interaction is not significant, the main effect tests are meaningful. To determine if an individual's risk attitude affects his or her budgetary decisions, we test:
H0: No difference exists between the risk attitude means
Ha: The risk attitude means differ
The test statistic is F = 6.63. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = b − 1 = 2 − 1 = 1 and ν2 = n − ab = 81 − 2(2) = 77. From Table IX, Appendix B, F.05 ≈ 4.00. The rejection region is F > 4.00. Since the observed value of the test statistic falls in the rejection region (F = 6.63 > 4.00), H0 is rejected. There is sufficient evidence to indicate an individual's risk attitude affects his or her budgetary decisions at α = .05.
d.
To determine if recent performance affects budgeting decisions, we test:
H0: No difference exists between the recent performance means
Ha: The recent performance means differ
The test statistic is F = 27.92. The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = a − 1 = 2 − 1 = 1 and ν2 = n − ab = 81 − 2(2) = 77. From Table XI, Appendix B, F.01 ≈ 7.08. The rejection region is F > 7.08. Since the observed value of the test statistic falls in the rejection region (F = 27.92 > 7.08), H0 is rejected. There is sufficient evidence to indicate that an individual's recent performance affects his or her budgetary decisions at α = .01.
8.90
Let factor A be second plastic and factor B be metal density. Some preliminary calculations are:
CM = (∑y)²/n = 5.56²/8 = 3.8642
SS(Total) = ∑y² − CM = 9.1646 − 3.8642 = 5.3004
SSA = ∑Ai²/(br) − CM = .92²/(2(2)) + 4.64²/(2(2)) − 3.8642 = 5.594 − 3.8642 = 1.7298
SSB = ∑Bj²/(ar) − CM = .57²/(2(2)) + 4.99²/(2(2)) − 3.8642 = 6.30625 − 3.8642 = 2.44205
SSAB = ∑ABij²/r − SSA − SSB − CM
     = .06²/2 + .86²/2 + .51²/2 + 4.13²/2 − 1.7298 − 2.44205 − 3.8642
     = 9.0301 − 8.03605 = .99405
SSE = SS(Total) − SSA − SSB − SSAB = 5.3004 − 1.7298 − 2.44205 − .99405 = .1345
MSA = SSA/(a − 1) = 1.7298/(2 − 1) = 1.7298
MSB = SSB/(b − 1) = 2.44205/(2 − 1) = 2.44205
MSAB = SSAB/[(a − 1)(b − 1)] = .99405/[(1)(1)] = .99405
MSE = SSE/(n − ab) = .1345/(8 − (2)(2)) = .033625
F(A) = MSA/MSE = 1.7298/.033625 = 51.44
F(B) = MSB/MSE = 2.44205/.033625 = 72.63
F(AB) = MSAB/MSE = .99405/.033625 = 29.56

Source   df   SS        MS        F
A         1   1.72980   1.72980   51.44
B         1   2.44205   2.44205   72.63
AB        1    .99405    .99405   29.56
Error     4    .13450   .033625
Total     7   5.30040
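The same sums of squares can be reproduced from the four cell totals. The following Python sketch assumes a balanced design with r replicates per cell; the function name `factorial_anova` and the grid layout (rows for factor A, columns for factor B) are illustrative choices.

```python
def factorial_anova(cell_totals, r, sum_y_sq):
    """Two-factor ANOVA sums of squares from an a x b grid of cell totals.

    cell_totals[i][j] is the total of the r replicates at level i of A and level j of B.
    """
    a, b = len(cell_totals), len(cell_totals[0])
    n = a * b * r
    a_totals = [sum(row) for row in cell_totals]
    b_totals = [sum(col) for col in zip(*cell_totals)]
    cm = sum(a_totals) ** 2 / n                      # correction for the mean
    ss_total = sum_y_sq - cm
    ss_a = sum(t ** 2 for t in a_totals) / (b * r) - cm
    ss_b = sum(t ** 2 for t in b_totals) / (a * r) - cm
    ss_ab = sum(t ** 2 for row in cell_totals for t in row) / r - ss_a - ss_b - cm
    ss_e = ss_total - ss_a - ss_b - ss_ab
    mse = ss_e / (n - a * b)
    f = {
        "A": ss_a / (a - 1) / mse,
        "B": ss_b / (b - 1) / mse,
        "AB": ss_ab / ((a - 1) * (b - 1)) / mse,
    }
    return {"SSA": ss_a, "SSB": ss_b, "SSAB": ss_ab, "SSE": ss_e, "MSE": mse, "F": f}

# Cell totals taken from the SSAB computation (rows: levels of A; columns: levels of B).
anova = factorial_anova([[0.06, 0.86], [0.51, 4.13]], r=2, sum_y_sq=9.1646)
print({k: round(v, 2) for k, v in anova["F"].items()})
```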
SST = SSA + SSB + SSAB = 1.7298 + 2.44205 + .99405 = 5.1659
MST = SST/(ab − 1) = 5.1659/(2(2) − 1) = 1.7220
F(T) = MST/MSE = 1.7220/.033625 = 51.21
To determine whether differences exist among the treatment means, we test:
H0: μ1 = μ2 = μ3 = μ4 Ha: At least two treatment means differ The test statistic is F = 51.21. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 2(2) − 1 = 3 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 6.59. The rejection region is F > 6.59. Since the observed value of the test statistic falls in the rejection region (F = 51.21 > 6.59), H0 is rejected. There is sufficient evidence to indicate differences in mean radiation among the four treatments at α = .05. Since there are differences among the treatment means, we next test to see if the two factors interact.
H0: Second plastic and metal density do not interact
Ha: Second plastic and metal density do interact
The test statistic is F = MSAB/MSE = 29.56.
The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table IX, Appendix B, F.05 = 7.71. The rejection region is F > 7.71. Since the observed value of the test statistic falls in the rejection region (F = 29.56 > 7.71), H0 is rejected. There is sufficient evidence to indicate second plastic and metal density interact at α = .05. Since interaction is present, no tests for main effects are necessary. Since we want to find the preferred method to protect patients, we will compare all four treatment means. There are four treatments, so c = p(p − 1)/2 = 4(4 − 1)/2 = 6. For α* = α/c = .05/6 = .0083, α*/2 = .0083/2 = .0042 ≈ .005, and df = n − ab = 4, t.005 = 4.604 from Table VI, Appendix B.
We now form confidence intervals for the differences between each pair of means using the formula:
(x̄i − x̄j) ± t.005 s √(1/ni + 1/nj), where s = √MSE = √.033625 = .1834

Pair
11 – 12   (.03 − .43) ± 4.604(.1834)√(1/2 + 1/2) ⇒ −.40 ± .844 ⇒ (−1.244, .444)
11 – 21   (.03 − .255) ± .844 ⇒ −.225 ± .844 ⇒ (−1.069, .619)
11 – 22   (.03 − 2.065) ± .844 ⇒ −2.035 ± .844 ⇒ (−2.879, −1.191)
12 – 21   (.43 − .255) ± .844 ⇒ .175 ± .844 ⇒ (−.669, 1.019)
12 – 22   (.43 − 2.065) ± .844 ⇒ −1.635 ± .844 ⇒ (−2.479, −.791)
21 – 22   (.255 − 2.065) ± .844 ⇒ −1.81 ± .844 ⇒ (−2.654, −.966)
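As a check on the six intervals, the common half-width and the endpoints can be recomputed in Python. This is only a sketch: the function name `pairwise_intervals` is illustrative, and the critical value is supplied by the caller (here the tabled t.005 = 4.604 with 4 df) rather than computed.

```python
from itertools import combinations
from math import sqrt

def pairwise_intervals(means, n_per_cell, mse, t_crit):
    """Confidence intervals for all pairwise differences of treatment means.

    t_crit is the tabled critical value chosen by the caller.
    """
    half_width = t_crit * sqrt(mse) * sqrt(1 / n_per_cell + 1 / n_per_cell)
    return {
        (i, j): (means[i] - means[j] - half_width, means[i] - means[j] + half_width)
        for i, j in combinations(means, 2)
    }

# Treatment (cell) means used in the intervals, keyed by treatment label.
cell_means = {"11": 0.03, "12": 0.43, "21": 0.255, "22": 2.065}
intervals = pairwise_intervals(cell_means, n_per_cell=2, mse=0.033625, t_crit=4.604)

# A pair differs when its interval excludes zero.
different = [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]
print(different)
```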
The means that differ are 11 and 22, 12 and 22, and 21 and 22. No other means are significantly different. Since we are looking for the treatment that gives the best protection (allows the smallest amount of radiation), we would pick any treatment except 22. Thus, use second plastic present and heavy alloy, second plastic present and light alloy, or second plastic not present and heavy alloy. Pick the one of these three which is the cheapest or the most convenient.
8.92
a.
There are a total of a × b = 3 × 3 = 9 treatments in this study.
b.
Using MINITAB, the ANOVA results are:

General Linear Model: Y versus Display, Price

Factor    Type    Levels   Values
Display   fixed   3        1 2 3
Price     fixed   3        1 2 3

Analysis of Variance for Y, using Adjusted SS for Tests

Source          DF   Seq SS    Adj SS    Adj MS    F         P
Display          2   1691393   1691393    845696   1709.37   0.000
Price            2   3089054   3089054   1544527   3121.89   0.000
Display*Price    4    510705    510705    127676    258.07   0.000
Error           18      8905      8905       495
Total           26   5300057
To get the SS for Treatments, we must add the SS for Display, SS for Price, and the SS for Interaction. Thus, SST = 1,691,393 + 3,089,054 + 510,705 = 5,291,152. The df = 2 + 2 + 4 = 8.
MST = SST/(ab − 1) = 5,291,152/(3(3) − 1) = 661,394
F = MST/MSE = 661,394/495 = 1336.15
To determine whether the treatment means differ, we test:
H0: μ1 = μ2 = ⋅⋅⋅ = μ9
Ha: At least two treatment means differ
The test statistic is F = MST/MSE = 1336.15.
The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = ab − 1 = 3(3) − 1 = 8 and ν2 = n − ab = 27 − 3(3) = 18. From Table VIII, Appendix B, F.10 = 2.04. The rejection region is F > 2.04. Since the observed value of the test statistic falls in the rejection region (F = 1336.15 > 2.04), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at α = .10.
c.
Since there are differences among the treatment means, we next test for the presence of interaction.
H0: Factors A and B do not interact to affect the response means
Ha: Factors A and B do interact to affect the response means
The test statistic is F = MSAB/MSE = 258.07.
The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (3 − 1)(3 − 1) = 4 and ν2 = n − ab = 27 − 3(3) = 18. From Table VIII, Appendix B, F.10 = 2.29. The rejection region is F > 2.29. Since the observed value of the test statistic falls in the rejection region (F = 258.07 > 2.29), H0 is rejected. There is sufficient evidence to indicate the two factors interact at α = .10.
d.
The main effect tests are not warranted since interaction is present in part c.
e.
The nine treatment means need to be compared.
f.
From the graph, if the like letters are connected, the lines are not parallel. This implies interaction is present. This agrees with the results of part c.
8.94
a.
This is a completely randomized design with a complete four-factor factorial design.
b.
There are a total of 2 × 2 × 2 × 2 = 16 treatments.
c.
Using SAS, the output is:

Analysis of Variance Procedure
Dependent Variable: Y

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             15   546745.50        36449.70      5.11      0.0012
Error             16   114062.00         7128.88
Corrected Total   31   660807.50

R-Square   C.V.       Root MSE   Y Mean
0.827390   41.46478   84.433     203.63

Source                 DF   Anova SS    Mean Square   F Value   Pr > F
SPEED                   1    56784.50    56784.50      7.97     0.0123
FEED                    1    21218.00    21218.00      2.98     0.1037
SPEED*FEED              1    55444.50    55444.50      7.78     0.0131
COLLET                  1   165025.13   165025.13     23.15     0.0002
SPEED*COLLET            1    44253.13    44253.13      6.21     0.0241
FEED*COLLET             1   142311.13   142311.13     19.96     0.0004
SPEED*FEED*COLLET       1    54946.13    54946.13      7.71     0.0135
WEAR                    1      378.13      378.13      0.05     0.8208
SPEED*WEAR              1     1540.13     1540.13      0.22     0.6483
FEED*WEAR               1      946.13      946.13      0.13     0.7204
SPEED*FEED*WEAR         1      528.13      528.13      0.07     0.7890
COLLET*WEAR             1     1682.00     1682.00      0.24     0.6337
SPEED*COLLET*WEAR       1      512.00      512.00      0.07     0.7921
FEED*COLLET*WEAR        1       72.00       72.00      0.01     0.9212
SPEE*FEED*COLLE*WEAR    1     1104.50     1104.50      0.15     0.6991
d.
To determine if the interaction terms are significant, we must add together the sums of squares for all interaction terms, as well as their degrees of freedom.
SS(Interaction) = 55,444.50 + 44,253.13 + 142,311.13 + 54,946.13 + 1,540.13 + 946.13 + 528.13 + 1,682.00 + 512.00 + 72.00 + 1,104.50 = 303,339.78
df(Interaction) = 11
MS(Interaction) = SS(Interaction)/df(Interaction) = 303,339.78/11 = 27,576.34364
F(Interaction) = MS(Interaction)/MSE = 27,576.34364/7128.88 = 3.87
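The pooling step is just a sum over the eleven interaction rows of the SAS output followed by one division. A short Python sketch of it, with illustrative variable names:

```python
# Anova SS for the 11 interaction terms in the SAS output (each has 1 df).
interaction_ss = [
    55444.50, 44253.13, 142311.13, 54946.13,   # SPEED*FEED, SPEED*COLLET, FEED*COLLET, SPEED*FEED*COLLET
    1540.13, 946.13, 528.13, 1682.00,          # SPEED*WEAR, FEED*WEAR, SPEED*FEED*WEAR, COLLET*WEAR
    512.00, 72.00, 1104.50,                    # SPEED*COLLET*WEAR, FEED*COLLET*WEAR, four-way term
]
mse = 7128.88                                  # error mean square from the SAS output (16 df)

ss_interaction = sum(interaction_ss)
df_interaction = len(interaction_ss)           # one degree of freedom per term
f_interaction = (ss_interaction / df_interaction) / mse
print(round(ss_interaction, 2), df_interaction, round(f_interaction, 2))
```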
To determine if interaction effects are present, we test:
H0: No interaction effects exist
Ha: Interaction effects exist
The test statistic is F = 3.87. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = 11 and ν2 = 16. From Table IX, Appendix B, F.05 ≈ 2.49. The rejection region is F > 2.49. Since the observed value of the test statistic falls in the rejection region (F = 3.87 > 2.49), H0 is rejected. There is sufficient evidence to indicate that interaction effects exist at α = .05.
Since the sums of squares for a balanced factorial design are independent of each other, we can look at the SAS output to determine which of the interaction effects are significant. The three-way interaction between speed, feed, and collet is significant (p = .0135). There are three two-way interactions with p-values less than .05. However, all of these two-way interaction terms are embedded in the significant three-way interaction term.
e.
Yes. Since the significant interaction terms do not include wear, it would be necessary to perform the main effect test for wear. All other main effects are contained in a significant interaction term. To determine if the mean finish measurements differ for the different levels of wear, we test:
H0: The mean finish measurements for the two levels of wear are the same
Ha: The mean finish measurements for the two levels of wear are different
The test statistic is F = 0.05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = 1 and ν2 = 16. From Table IX, Appendix B, F.05 = 4.49. The rejection region is F > 4.49. Since the observed value of the test statistic does not fall in the rejection region (F = .05 ≯ 4.49), H0 is not rejected. There is insufficient evidence to indicate that the mean finish measurements differ for the different levels of wear at α = .05.
f.
We must assume that:
i. The populations sampled from are normal.
ii. The population variances are the same.
iii. The samples are random and independent.
Chapter 9
Categorical Data Analysis
9.2
The characteristics of the multinomial experiment are:
1. The experiment consists of n identical trials.
2. There are k possible outcomes to each trial.
3. The probabilities of the k outcomes, denoted p1, p2, ... , pk, remain the same from trial to trial, where p1 + p2 + ⋅⋅⋅ + pk = 1.
4. The trials are independent.
5. The random variables of interest are the counts n1, n2, ... , nk in each of the k cells.
The characteristics of the binomial are the same as those for the multinomial with k = 2.
9.4
The hypotheses of interest are:
H0: p1 = .25, p2 = .25, p3 = .50
Ha: At least one of the probabilities differs from the hypothesized value
E(n1) = np1,0 = 320(.25) = 80
E(n2) = np2,0 = 320(.25) = 80
E(n3) = np3,0 = 320(.50) = 160
The test statistic is
χ² = ∑[ni − E(ni)]²/E(ni) = (78 − 80)²/80 + (60 − 80)²/80 + (182 − 160)²/160 = 8.075
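The expected counts and the test statistic can be reproduced with a short Python sketch. The function name `chi_square_gof` is illustrative, and the critical value 5.99147 is the tabled χ² value for α = .05 with 2 degrees of freedom (Table VII), not something the code derives.

```python
def chi_square_gof(observed, probs):
    """Chi-square goodness-of-fit statistic and expected counts for a one-way table."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

# Observed counts and hypothesized probabilities from this exercise.
stat, expected = chi_square_gof([78, 60, 182], [0.25, 0.25, 0.50])
reject = stat > 5.99147   # tabled chi-square critical value, alpha = .05, df = 2
print(round(stat, 3), reject)
```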
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table VII, Appendix B, χ².05 = 5.99147. The rejection region is χ² > 5.99147. Since the observed value of the test statistic falls in the rejection region (χ² = 8.075 > 5.99147), H0 is rejected. There is sufficient evidence to indicate that at least one of the probabilities differs from its hypothesized value at α = .05.
9.6
a.
The qualitative variable of interest is the location of professional sports stadiums and ballparks. There are 3 levels or categories of this variable – downtown, central city, and suburban.
b.
Let p1 = proportion of major sports facilities located in downtown areas, p2 = proportion of major sports facilities located in central city areas, and p3 = proportion of major sports facilities located in suburban areas in 1997.
To determine if the proportions of major sports facilities in downtown, central city, and suburban areas in 1997 are different from those in 1985, we test:
H0: p1 = .40, p2 = .30, p3 = .30
Ha: At least one of the proportions differs from its hypothesized value
c.
E(n1) = np1,0 = 113(.40) = 45.2; E(n2) = np2,0 = 113(.30) = 33.9; E(n3) = np3,0 = 113(.30) = 33.9
d.
The test statistic is
χ² = ∑[ni − E(ni)]²/E(ni) = (58 − 45.2)²/45.2 + (26 − 33.9)²/33.9 + (29 − 33.9)²/33.9 = 6.174
e.
The degrees of freedom for the test statistic are k − 1 = 3 − 1 = 2. The p-value is p = P(χ² ≥ 6.174).
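For the special case of 2 degrees of freedom, this tail probability has a closed form, because a χ² variable with 2 df is exponentially distributed with mean 2. The Python sketch below uses that fact as a quick numerical check; the function name is illustrative, and the shortcut does not apply to other degrees of freedom.

```python
from math import exp

def chi_square_df2_p_value(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 df.

    With 2 df the distribution is exponential with mean 2, so P(X >= x) = exp(-x / 2).
    """
    return exp(-stat / 2)

p_value = chi_square_df2_p_value(6.174)
print(f"p = {p_value:.4f}")
```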
Using Table VII, Appendix B, with df = 2, the observed value 6.174 falls between χ².05 = 5.99147 and χ².025 = 7.37776, so .05 > P(χ² ≥ 6.174) > .025. Thus, .025 < p < .05. Since the p-value is smaller than α = .05, H0 is rejected. There is sufficient evidence to indicate the proportions of major sports facilities in downtown, central city, and suburban areas in 1997 are different from those in 1985.
9.8
a.
The categorical variable is the rating of the student exposure to social and environmental issues. It has 5 levels: 1-star, 2-stars, 3-stars, 4-stars, and 5-stars.
b.
If there were no difference in the category proportions, then each proportion should be pi = 1/5 = .20. There were a total of n = 30 business schools sampled. The expected number would be: E(n1) = E(n2) = E(n3) = E(n4) = E(n5) = n(pi,0) = 30(.20) = 6
c.
To determine if there are differences in the star rating category proportions of all MBA programs, we test: H0: p1 = p2 = p3 = p4 = p5 = .20 Ha: At least one pi differs from its hypothesized value
d.
The test statistic is
χ² = ∑[ni − E(ni)]²/E(ni) = (2 − 6)²/6 + (9 − 6)²/6 + (14 − 6)²/6 + (5 − 6)²/6 + (0 − 6)²/6 = 21
e.
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 5 − 1 = 4. From Table VII, Appendix B, χ².05 = 9.48773. The rejection region is χ² > 9.48773.
f.
Since the observed value of the test statistic falls in the rejection region (χ2 = 21 > 9.48773), H0 is rejected. There is sufficient evidence to indicate differences in the star rating category proportions of all MBA programs at α = .05.
g.
Some preliminary calculations are:
p̂3 = x3/n = 14/30 = .467
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:
p̂3 ± z.025 √(p̂3q̂3/n) ⇒ .467 ± 1.96 √(.467(.533)/30) ⇒ .467 ± .179 ⇒ (.288, .646)
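The interval can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the sample proportion is rounded to three decimals before the margin is computed, mirroring the hand calculation. The function name `proportion_ci` and its `digits` parameter are illustrative.

```python
from math import sqrt

def proportion_ci(x, n, z=1.96, digits=3):
    """Large-sample confidence interval for a proportion.

    The sample proportion is rounded to `digits` places before the margin is
    computed, mirroring the hand calculation; pass digits=None to skip rounding.
    """
    p_hat = x / n if digits is None else round(x / n, digits)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 14 of the 30 sampled programs fall in the 3-star category.
lower, upper = proportion_ci(14, 30)
print(f"({lower:.3f}, {upper:.3f})")
```

Without the intermediate rounding (`digits=None`), the endpoints shift slightly in the third decimal place, which is a rounding effect rather than a different method.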
We are 95% confident that the proportion of all MBA programs that are ranked in the 3-star category is between .288 and .646.
9.10
a.
Some preliminary calculations are: E(n1) = np1,0 = 1000(.50) = 500
E(n2) = np2,0 = 1000(.22) = 220
E(n3) = np3,0 = 1000(.11) = 110
E(n4) = np4,0 = 1000(.17) = 170
To determine if the percentages disagree with the percentages reported by Nielson/NetRatings, we test:
H0: p1 = .50, p2 = .22, p3 = .11, and p4 = .17
Ha: At least one pi differs from its hypothesized value
The test statistic is
χ² = ∑[ni − E(ni)]²/E(ni) = (487 − 500)²/500 + (245 − 220)²/220 + (121 − 110)²/110 + (147 − 170)²/170 = 7.391
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table VII, Appendix B, χ².05 = 7.81473. The rejection region is χ² > 7.81473. Since the observed value of the test statistic does not fall in the rejection region (χ² = 7.391 ≯ 7.81473), H0 is not rejected. There is insufficient evidence to indicate the percentages disagree with the percentages reported by Nielson/NetRatings at α = .05.
b.
Some preliminary calculations are:
p̂1 = x1/n = 487/1000 = .487
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:
p̂1 ± z.025 √(p̂1q̂1/n) ⇒ .487 ± 1.96 √(.487(.513)/1000) ⇒ .487 ± .031 ⇒ (.456, .518)
We are 95% confident that the percentage of all Internet searches that use the Google Search Engine is between 45.6% and 51.8%.
9.12
Some preliminary calculations are: E(n1) = np1,0 = 2,023(.45) = 910.35
E(n2) = np2,0 = 2,023 (.35) = 708.05
E(n3) = np3,0 = 2,023 (.15) = 303.45
E(n4) = np4,0 = 2,023 (.05) = 101.15
To determine if the percentages of all adults falling into the four response categories changed after the Enron scandal, we test:
H0: p1 = .45, p2 = .35, p3 = .15, and p4 = .05
Ha: At least one pi differs from its hypothesized value
The test statistic is
χ² = ∑[ni − E(ni)]²/E(ni) = (1,173 − 910.35)²/910.35 + (587 − 708.05)²/708.05 + (182 − 303.45)²/303.45 + (81 − 101.15)²/101.15 = 149.096
The rejection region requires α = .01 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table VII, Appendix B, χ².01 = 11.3449. The rejection region is χ² > 11.3449. Since the observed value of the test statistic falls in the rejection region (χ² = 149.096 > 11.3449), H0 is rejected. There is sufficient evidence to indicate the percentages of all adults falling into the four response categories changed after the Enron scandal at α = .01.
9.14
a.
Some preliminary calculations are:
E(n1) = np1,0 = 700(.09) = 63
E(n2) = np2,0 = 700(.02) = 14
E(n3) = np3,0 = 700(.02) = 14
E(n4) = np4,0 = 700(.04) = 28
E(n5) = np5,0 = 700(.12) = 84
E(n6) = np6,0 = 700(.02) = 14
E(n7) = np7,0 = 700(.03) = 21
E(n8) = np8,0 = 700(.02) = 14
E(n9) = np9,0 = 700(.09) = 63
E(n10) = np10,0 = 700(.01) = 7
E(n11) = np11,0 = 700(.01) = 7
E(n12) = np12,0 = 700(.04) = 28
E(n13) = np13,0 = 700(.02) = 14
E(n14) = np14,0 = 700(.06) = 42
E(n15) = np15,0 = 700(.08) = 56
E(n16) = np16,0 = 700(.02) = 14
E(n17) = np17,0 = 700(.01) = 7
E(n18) = np18,0 = 700(.06) = 42
E(n19) = np19,0 = 700(.04) = 28
E(n20) = np20,0 = 700(.06) = 42
E(n21) = np21,0 = 700(.04) = 28
E(n22) = np22,0 = 700(.02) = 14
E(n23) = np23,0 = 700(.02) = 14
E(n24) = np24,0 = 700(.01) = 7
E(n25) = np25,0 = 700(.02) = 14
E(n26) = np26,0 = 700(.01) = 7
E(n27) = np27,0 = 700(.02) = 14

χ² = ∑[ni − E(ni)]²/E(ni) = (39 − 63)²/63 + (18 − 14)²/14 + (30 − 14)²/14 + ... + (34 − 14)²/14 = 360.48
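With 27 categories, it is worth confirming that the hypothesized proportions total 1 and that the expected counts total n before trusting the statistic. A small Python sketch of that bookkeeping follows; the proportions are entered in hundredths to keep the arithmetic exact, and the variable names are illustrative.

```python
# Hypothesized proportions for categories 1..27, in hundredths
# (taken from the expected-count list; they must total 100).
percents = [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2, 6,
            8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1, 2]
n = 700

assert sum(percents) == 100                   # category probabilities sum to 1
expected = [n * p // 100 for p in percents]   # exact here because n is a multiple of 100
print(expected[:5], min(expected))
```

The smallest expected count is the quantity to inspect when judging whether the usual large-sample condition for the χ² approximation (every expected count at least 5) is met.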
To determine if ScrabbleExpress “presents the player with unfair word selection opportunities” that are different from the Scrabble board game, we test:
H0: Proportions in ScrabbleExpress are the same as in the Scrabble board game
Ha: Proportions in ScrabbleExpress are different from those in the Scrabble board game
The test statistic is χ² = 360.47.
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 27 − 1 = 26. From Table VII, Appendix B, χ².05 = 38.8852. The rejection region is χ² > 38.8852. Since the observed value of the test statistic falls in the rejection region (χ² = 360.47 > 38.8852), H0 is rejected. There is sufficient evidence to indicate that ScrabbleExpress “presents the player with unfair word selection opportunities” that are different from the Scrabble board game at α = .05.
b.
The relative frequency of vowels for the board game is P(A) + P(E) + P(I) + P(O) + P(U) = .09 + .12 + .09 +.08 + .04 = .42 pˆ v =
304
39 + 31 + 25 + 20 + 21 136 = = .194 700 700
Chapter 9
To download more slides, ebook, solutions and test bank, visit http://downloadslide.blogspot.com
For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is: pˆ v (1 − pˆ v ) .194(.806) ⇒ .194 ± 1.96 ⇒ .194 ± .029 ⇒ (.165, .223) n 700
pˆ v ± z.025
We are 95% confident that the true proportion of vowels in the ScrabbleExpress game is between .165 and .223. The true proportion from the board game is .42 which is much greater than the values in the interval. 9.16
a.
df = (r − 1)(c − 1) = (5 − 1)(5 − 1) = 16. From Table VII, Appendix B, $\chi^2_{.05} = 26.2962$. The rejection region is χ² > 26.2962.
b.
df = (r − 1)(c − 1) = (3 − 1)(6 − 1) = 10. From Table VII, Appendix B, $\chi^2_{.10} = 15.9871$. The rejection region is χ² > 15.9871.
c.
df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2. From Table VII, Appendix B, $\chi^2_{.01} = 9.21034$. The rejection region is χ² > 9.21034.
9.18
a.
To convert the frequencies to percentages, divide the numbers in each column by the column total and multiply by 100. Also, divide the row totals by the overall total and multiply by 100. The column totals are 25, 64, and 78, while the row totals are 96 and 71. The overall sample size is 167. The table of percentages is:

         Column 1             Column 2               Column 3               Totals
Row 1    9/25 ⋅ 100 = 36%     34/64 ⋅ 100 = 53.1%    53/78 ⋅ 100 = 67.9%    96/167 ⋅ 100 = 57.5%
Row 2    16/25 ⋅ 100 = 64%    30/64 ⋅ 100 = 46.9%    25/78 ⋅ 100 = 32.1%    71/167 ⋅ 100 = 42.5%

b.
Using MINITAB, the graph is:

[Bar graph: Percent (0 to 70) by Column (1, 2, 3), with a reference line at 57.5%.]

c.
If the rows and columns are independent, the row percentages in each column would be close to the row total percentages. This pattern is not evident in the plot, implying the rows and columns are not independent.
9.20
a-b.
To convert the frequencies to percentages, divide the numbers in each column by the column total and multiply by 100. Also, divide the row totals by the overall total and multiply by 100.

Row   B1                      B2                      B3                      Totals
A1    40/134 ⋅ 100 = 29.9%    72/163 ⋅ 100 = 44.2%    42/142 ⋅ 100 = 29.6%    154/439 ⋅ 100 = 35.1%
A2    63/134 ⋅ 100 = 47.0%    53/163 ⋅ 100 = 32.5%    70/142 ⋅ 100 = 49.3%    186/439 ⋅ 100 = 42.4%
A3    31/134 ⋅ 100 = 23.1%    38/163 ⋅ 100 = 23.3%    30/142 ⋅ 100 = 21.1%    99/439 ⋅ 100 = 22.6%

c.
Using MINITAB, the graph is:

[Bar graph: Percent (0 to 45) by B (1, 2, 3), with a reference line at 35.1%.]

The graph supports the conclusion that the rows and columns are not independent. If they were, then the height of all the bars would be essentially the same.
9.22
a.
The contingency table would be:

                    Itemize Deductions
Tax-motivation      Yes       No        Total
Yes                 691       381       1,072
No                  794       899       1,693
Total               1,485     1,280     2,765

b.
E11 = R1C1/n = 1,072(1,485)/2,765 = 575.7    E12 = R1C2/n = 1,072(1,280)/2,765 = 496.3
E21 = R2C1/n = 1,693(1,485)/2,765 = 909.3    E22 = R2C2/n = 1,693(1,280)/2,765 = 783.7

c.
The test statistic is:

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{[691 - 575.7]^2}{575.7} + \frac{[381 - 496.3]^2}{496.3} + \frac{[794 - 909.3]^2}{909.3} + \frac{[899 - 783.7]^2}{783.7} = 81.46 \]

d.
To determine if tax-motivation and itemize-deduction are related for charitable givers, we test:

H0: Tax-motivation and itemize-deduction are independent
Ha: Tax-motivation and itemize-deduction are dependent

The test statistic is χ² = 81.46.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1. From Table VII, Appendix B, $\chi^2_{.05} = 3.84146$. The rejection region is χ² > 3.84146. Since the observed value of the test statistic falls in the rejection region (χ² = 81.46 > 3.84146), H0 is rejected. There is sufficient evidence to indicate that tax-motivation and itemize-deduction are related for charitable givers at α = .05.
e.
To compute the bar graph, we first convert frequencies to percentages by dividing the numbers in each column by the column total and multiplying by 100%. Also, divide the row totals by the overall total and multiply by 100%.

                    Itemize Deductions
Tax-motivation      Yes                          No                           Total
Yes                 691/1,485 ⋅ 100% = 46.5%     381/1,280 ⋅ 100% = 29.8%     1,072/2,765 ⋅ 100% = 38.8%
No                  794/1,485 ⋅ 100% = 53.5%     899/1,280 ⋅ 100% = 70.2%     1,693/2,765 ⋅ 100% = 61.2%
Total               1,485                        1,280                        2,765

Using MINITAB, the bar graph is:

[Bar graph: Percent (0 to 50) by Itemize (Yes, No), with a reference line at 38.8%.]
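The expected counts in part b and the test statistic in part c can be cross-checked with a few lines of Python. This is only an illustrative sketch and not part of the text's solution; the function names `expected_counts` and `chi_square_statistic` are hypothetical helpers introduced here, and only the observed counts come from the table in part a.

```python
# Observed counts from the contingency table in part a
# (rows: tax-motivation Yes/No; columns: itemize deductions Yes/No).
observed = [[691, 381],
            [794, 899]]


def expected_counts(table):
    """Return E_ij = (row total)(column total) / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (n_ij - E_ij)^2 / E_ij over all cells of the table."""
    expected = expected_counts(table)
    return sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))


expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print([[round(e, 1) for e in row] for row in expected])
print(round(chi2, 2), df)
```

Because the code carries the expected counts at full precision, its statistic (about 81.4) differs slightly from the hand-computed 81.46, which uses expected counts rounded to one decimal place. The conclusion is unchanged.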
9.24.
a.
Some preliminary calculations are:

p̂C1 = xC1/n1 = 175/6,222 = .028     p̂C2 = xC2/n2 = 236/4,692 = .050
p̂C3 = xC3/n3 = 319/7,140 = .045     p̂C4 = xC4/n4 = 231/6,120 = .038
p̂C5 = xC5/n5 = 480/10,353 = .046    p̂C6 = xC6/n6 = 187/4,794 = .039

The proportions range from .028 to .050. Since .050 is about twice as big as .028, there may be evidence to conclude some of the proportions are different.
b.
Some preliminary calculations are:

E11 = R1C1/n = 6,222(37,693)/39,321 = 5,964.39     E12 = R1C2/n = 6,222(1,628)/39,321 = 257.61
E21 = R2C1/n = 4,692(37,693)/39,321 = 4,497.74     E22 = R2C2/n = 4,692(1,628)/39,321 = 194.26
E31 = R3C1/n = 7,140(37,693)/39,321 = 6,844.38     E32 = R3C2/n = 7,140(1,628)/39,321 = 295.62
E41 = R4C1/n = 6,120(37,693)/39,321 = 5,866.61     E42 = R4C2/n = 6,120(1,628)/39,321 = 253.39
E51 = R5C1/n = 10,353(37,693)/39,321 = 9,924.36    E52 = R5C2/n = 10,353(1,628)/39,321 = 428.64
E61 = R6C1/n = 4,794(37,693)/39,321 = 4,595.51     E62 = R6C2/n = 4,794(1,628)/39,321 = 198.49

To determine if the proportions of censored measurements differ for the six tractor lines, we test:

H0: Tractor lines and Censored measurements are independent
Ha: Tractor lines and Censored measurements are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(6047 - 5964.39)^2}{5964.39} + \frac{(175 - 257.61)^2}{257.61} + \frac{(4456 - 4497.74)^2}{4497.74} + \cdots + \frac{(187 - 198.49)^2}{198.49} = 48.0978 \]

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (6 − 1)(2 − 1) = 5. From Table VII, Appendix B, $\chi^2_{.01} = 15.0863$. The rejection region is χ² > 15.0863. Since the observed value of the test statistic falls in the rejection region (χ² = 48.0978 > 15.0863), H0 is rejected. There is sufficient evidence to indicate that the proportions of censored measurements differ for the six tractor lines at α = .01.
c.
Even though there are differences in the proportions of censored data among the 6 tractor lines, these proportions range only from .028 to .050. In practice, there is very little difference between .028 and .050.
9.26
Some preliminary calculations are:

E11 = R1C1/n = 95(118)/262 = 42.8    E12 = R1C2/n = 95(144)/262 = 52.2
E21 = R2C1/n = 69(118)/262 = 31.1    E22 = R2C2/n = 69(144)/262 = 37.9
E31 = R3C1/n = 42(118)/262 = 18.9    E32 = R3C2/n = 42(144)/262 = 23.1
E41 = R4C1/n = 56(118)/262 = 25.2    E42 = R4C2/n = 56(144)/262 = 30.8

To determine whether a pig farmer’s education level has an impact on the size of the pig farm, we test:

H0: Pig farmer’s education level and size of pig farm are independent
Ha: Pig farmer’s education level and size of pig farm are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(42 - 42.8)^2}{42.8} + \frac{(53 - 52.2)^2}{52.2} + \frac{(27 - 31.1)^2}{31.1} + \frac{(42 - 37.9)^2}{37.9} + \frac{(22 - 18.9)^2}{18.9} + \frac{(20 - 23.1)^2}{23.1} + \frac{(27 - 25.2)^2}{25.2} + \frac{(29 - 30.8)^2}{30.8} = 2.17 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (4 − 1)(2 − 1) = 3. From Table VII, Appendix B, $\chi^2_{.05} = 7.81473$. The rejection region is χ² > 7.81473. Since the observed value of the test statistic does not fall in the rejection region (χ² = 2.17 ≯ 7.81473), H0 is not rejected. There is insufficient evidence to indicate that a pig farmer’s education level has an impact on the size of the pig farm at α = .05.

To compute the bar graph, we first convert frequencies to percentages by dividing the numbers in each row by the row total and multiplying by 100%. Also, divide the column totals by the overall total and multiply by 100%.

                       Education Level
Farm Size              No college                  College                     Total
<1,000 pigs            42/95 ⋅ 100% = 44.2%        53/95 ⋅ 100% = 55.8%        95
1,000-2,000 pigs       27/69 ⋅ 100% = 39.1%        42/69 ⋅ 100% = 60.9%        69
2,000-5,000 pigs       22/42 ⋅ 100% = 52.4%        20/42 ⋅ 100% = 47.6%        42
>5,000 pigs            27/56 ⋅ 100% = 48.2%        29/56 ⋅ 100% = 51.8%        56
Total                  118/262 ⋅ 100% = 45.0%      144/262 ⋅ 100% = 55.0%      262

Using MINITAB, the bar graph is:

[Bar graph: Percent (0 to 50) by Farm Size (<1,000, 1,000-2,000, 2,000-5,000, >5,000), with a reference line at 45.0%.]
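The row percentages used for the bar graph can be reproduced with a short Python sketch. The dictionary layout and the helper names `row_percentages` and `overall_percentages` are assumptions made for illustration; only the counts come from the exercise.

```python
# Observed counts from Exercise 9.26 (columns: no college, college).
counts = {
    "<1,000 pigs": (42, 53),
    "1,000-2,000 pigs": (27, 42),
    "2,000-5,000 pigs": (22, 20),
    ">5,000 pigs": (27, 29),
}


def row_percentages(table):
    """Divide each count by its row total and express it as a percent."""
    result = {}
    for label, row in table.items():
        total = sum(row)
        result[label] = tuple(round(100 * x / total, 1) for x in row)
    return result


def overall_percentages(table):
    """Divide each column total by the overall sample size."""
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(col_totals)
    return tuple(round(100 * c / n, 1) for c in col_totals)


row_pct = row_percentages(counts)
overall_pct = overall_percentages(counts)

for label, pct in row_pct.items():
    print(label, pct)
print("Total", overall_pct)
```

If education level and farm size were independent, each row's percentages would be close to the overall percentages in the last line, which is the comparison the bar graph makes visually.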
9.28
a.
Some preliminary calculations are:

E11 = R1C1/n = 53(35)/70 = 26.5    E12 = R1C2/n = 53(35)/70 = 26.5
E21 = R2C1/n = 17(35)/70 = 8.5     E22 = R2C2/n = 17(35)/70 = 8.5

To determine if the severity of the ethical issue influenced whether the issue was identified or not by the auditors, we test:

H0: Severity of ethical issue and identification are independent
Ha: Severity of ethical issue and identification are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(27 - 26.5)^2}{26.5} + \frac{(26 - 26.5)^2}{26.5} + \frac{(8 - 8.5)^2}{8.5} + \frac{(9 - 8.5)^2}{8.5} = .078 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1. From Table VII, Appendix B, $\chi^2_{.05} = 3.84146$. The rejection region is χ² > 3.84146. Since the observed value of the test statistic does not fall in the rejection region (χ² = .078 ≯ 3.84146), H0 is not rejected. There is insufficient evidence to indicate that the severity of the ethical issue influenced whether the issue was identified or not by the auditors at α = .05.
b.
No. If there were 0 in the bottom cell of the column, then the expected count for that cell would be less than 5. One of the assumptions necessary for the test statistic to have a χ² distribution would not hold.
c.
Suppose we change the numbers in the table to be as follows:

                                Severity of Ethical Issue
                                Moderate    Severe
Ethical Issue Identified        32          21
Ethical Issue Not Identified    3           14

Since the row and column totals are the same, the expected cell counts are the same as above.

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(32 - 26.5)^2}{26.5} + \frac{(21 - 26.5)^2}{26.5} + \frac{(3 - 8.5)^2}{8.5} + \frac{(14 - 8.5)^2}{8.5} = 9.401 \]

Now the test statistic would fall in the rejection region (χ² = 9.401 > 3.84146).
9.30
a.
The contingency table is:

              Flight Response
Altitude      Low     High    Totals
< 300         85      105     190
300-600       77      121     198
≥ 600         17      59      76
Totals        179     285     464

b.
Some preliminary calculations are:

E11 = R1C1/n = 190(179)/464 = 73.297    E12 = R1C2/n = 190(285)/464 = 116.703
E21 = R2C1/n = 198(179)/464 = 76.384    E22 = R2C2/n = 198(285)/464 = 121.616
E31 = R3C1/n = 76(179)/464 = 29.319     E32 = R3C2/n = 76(285)/464 = 46.681

To determine if flight response of the geese depends on the altitude of the helicopter, we test:

H0: Flight response and Altitude of helicopter are independent
Ha: Flight response and Altitude of helicopter are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(85 - 73.297)^2}{73.297} + \frac{(105 - 116.703)^2}{116.703} + \frac{(77 - 76.384)^2}{76.384} + \frac{(121 - 121.616)^2}{121.616} + \frac{(17 - 29.319)^2}{29.319} + \frac{(59 - 46.681)^2}{46.681} = 11.477 \]

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2. From Table VII, Appendix B, $\chi^2_{.01} = 9.21034$. The rejection region is χ² > 9.21034. Since the observed value of the test statistic falls in the rejection region (χ² = 11.477 > 9.21034), H0 is rejected. There is sufficient evidence to indicate that the flight response of the geese depends on the altitude of the helicopter at α = .01.
c.
The contingency table is:

                    Flight Response
Lateral Distance    Low     High    Totals
< 1000              37      243     280
1000-2000           68      37      105
2000-3000           44      4       48
≥ 3000              30      1       31
Totals              179     285     464

d.
Some preliminary calculations are:

E11 = R1C1/n = 280(179)/464 = 108.017    E12 = R1C2/n = 280(285)/464 = 171.983
E21 = R2C1/n = 105(179)/464 = 40.506     E22 = R2C2/n = 105(285)/464 = 64.494
E31 = R3C1/n = 48(179)/464 = 18.517      E32 = R3C2/n = 48(285)/464 = 29.483
E41 = R4C1/n = 31(179)/464 = 11.959      E42 = R4C2/n = 31(285)/464 = 19.041

To determine if flight response of the geese depends on the lateral distance of the helicopter, we test:

H0: Flight response and Lateral distance of the helicopter are independent
Ha: Flight response and Lateral distance of the helicopter are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(37 - 108.017)^2}{108.017} + \frac{(243 - 171.983)^2}{171.983} + \frac{(68 - 40.506)^2}{40.506} + \frac{(37 - 64.494)^2}{64.494} + \frac{(44 - 18.517)^2}{18.517} + \frac{(4 - 29.483)^2}{29.483} + \frac{(30 - 11.959)^2}{11.959} + \frac{(1 - 19.041)^2}{19.041} = 207.814 \]

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (4 − 1)(2 − 1) = 3. From Table VII, Appendix B, $\chi^2_{.01} = 11.3449$. The rejection region is χ² > 11.3449. Since the observed value of the test statistic falls in the rejection region (χ² = 207.814 > 11.3449), H0 is rejected. There is sufficient evidence to indicate that the flight response of the geese depends on the lateral distance of the helicopter at α = .01.
e.
Using SAS, the contingency table for altitude by response with the column percents is:

Table of ALTGRP by RESPONSE

ALTGRP     RESPONSE

Frequency|
Percent  |
Row Pct  |
Col Pct  |LOW     |HIGH    |  Total
---------+--------+--------+
<300     |     85 |    105 |    190
         |  18.32 |  22.63 |  40.95
         |  44.74 |  55.26 |
         |  47.49 |  36.84 |
---------+--------+--------+
300-600  |     77 |    121 |    198
         |  16.59 |  26.08 |  42.67
         |  38.89 |  61.11 |
         |  43.02 |  42.46 |
---------+--------+--------+
600+     |     17 |     59 |     76
         |   3.66 |  12.72 |  16.38
         |  22.37 |  77.63 |
         |   9.50 |  20.70 |
---------+--------+--------+
Total         179      285      464
            38.58    61.42   100.00

Statistics for Table of ALTGRP by RESPONSE

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     2     11.4770    0.0032
Likelihood Ratio Chi-Square    2     12.1040    0.0024
Mantel-Haenszel Chi-Square     1     10.2104    0.0014
Phi Coefficient                       0.1573
Contingency Coefficient               0.1554
Cramer's V                            0.1573

Sample Size = 464

From the row percents, it appears that the lower the plane, the lower the response. For altitude <300 m, 55.26% of the geese had a high response. For altitude 300-600 m, 61.11% of the geese had a high response. For altitude 600+ m, 77.63% of the geese had a high response. Thus, instead of setting a minimum altitude for the planes, we need to set a maximum altitude. For this data, the lowest response is at an altitude of < 300 meters.

Using SAS, the contingency table for lateral distance by response with the column percents is:

The FREQ Procedure

Table of LATGRP by RESPONSE

LATGRP     RESPONSE

Frequency |
Percent   |
Row Pct   |
Col Pct   |LOW     |HIGH    |  Total
----------+--------+--------+
<1000     |     37 |    242 |    279
          |   7.99 |  52.27 |  60.26
          |  13.26 |  86.74 |
          |  20.67 |  85.21 |
----------+--------+--------+
1000-2000 |     68 |     37 |    105
          |  14.69 |   7.99 |  22.68
          |  64.76 |  35.24 |
          |  37.99 |  13.03 |
----------+--------+--------+
2000-3000 |     44 |      4 |     48
          |   9.50 |   0.86 |  10.37
          |  91.67 |   8.33 |
          |  24.58 |   1.41 |
----------+--------+--------+
3000+     |     30 |      1 |     31
          |   6.48 |   0.22 |   6.70
          |  96.77 |   3.23 |
          |  16.76 |   0.35 |
----------+--------+--------+
Total          179      284      463
             38.66    61.34   100.00

Frequency Missing = 1

Statistics for Table of LATGRP by RESPONSE

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     3    207.0800    <.0001
Likelihood Ratio Chi-Square    3    226.8291    <.0001
Mantel-Haenszel Chi-Square     1    189.2843    <.0001
Phi Coefficient                       0.6688
Contingency Coefficient               0.5559
Cramer's V                            0.6688

Effective Sample Size = 463
Frequency Missing = 1

From the row percents, it appears that the greater the lateral distance, the lower the response. For a lateral distance of 3000+ m, only 3.23% of the geese had a high response. Thus, the further away the plane is laterally, the lower the response. For this data, the lowest response is when the plane is further than 3000 meters.

Thus the recommendation would be a maximum height of 300 m and a minimum lateral distance of 3000 m.
9.32
a.
Some preliminary calculations are:

E11 = 50(50)/250 = 10     E12 = 50(90)/250 = 18     E13 = 50(110)/250 = 22
E21 = 100(50)/250 = 20    E22 = 100(90)/250 = 36    E23 = 100(110)/250 = 44
E31 = 100(50)/250 = 20    E32 = 100(90)/250 = 36    E33 = 100(110)/250 = 44

To determine if the rows and columns are dependent, we test:

H0: Rows and columns are independent
Ha: Rows and columns are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(20 - 10)^2}{10} + \cdots + \frac{(30 - 44)^2}{44} = 54.14 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4. From Table VII, Appendix B, $\chi^2_{.05} = 9.48773$. The rejection region is χ² > 9.48773. Since the observed value of the test statistic falls in the rejection region (χ² = 54.14 > 9.48773), H0 is rejected. There is sufficient evidence to indicate a dependence between rows and columns at α = .05.
b.
No, the analysis remains identical.
c.
Yes, the assumptions on the sampling differ.
d.
The percentages are in the table below.

         Column 1                Column 2                  Column 3                   Totals
Row 1    20/50 × 100% = 40%      20/90 × 100% = 22.2%      10/110 × 100% = 9.1%       50/250 × 100% = 20%
Row 2    10/50 × 100% = 20%      20/90 × 100% = 22.2%      70/110 × 100% = 63.6%      100/250 × 100% = 40%
Row 3    20/50 × 100% = 40%      50/90 × 100% = 55.6%      30/110 × 100% = 27.3%      100/250 × 100% = 40%

e.
Using MINITAB, the bar graph is:

[Bar graph: Percent (0 to 40) by Column (1, 2, 3), with a reference line at 20%.]

The graph supports the decision in part a. In part a, we rejected the null hypothesis and concluded that the rows and columns were dependent. If they were independent, then we would expect the three bars to be the same height. In this graph, they are not the same height.
9.34
a.
If Bon Appetit readers do not have a preference for their least favorite vegetable, then the values of p1, p2, p3, and p4 should all be the same. Since there are four categories, then p1 = p2 = p3 = p4 = .25.
b.
To determine if the Bon Appetit readers have a preference for at least one of the vegetables as “least favorite”, we test:

H0: p1 = p2 = p3 = p4 = .25
Ha: At least one pi ≠ .25
c.
Some preliminary calculations:

n = ∑ni = 46 + 76 + 44 + 34 = 200

E(ni) = npi,0 = 200(.25) = 50, i = 1, 2, 3, or 4

The test statistic is

\[ \chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(46 - 50)^2}{50} + \frac{(76 - 50)^2}{50} + \frac{(44 - 50)^2}{50} + \frac{(34 - 50)^2}{50} = 19.68 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table VII, Appendix B, $\chi^2_{.05} = 7.81473$. The rejection region is χ² > 7.81473. Since the observed value of the test statistic falls in the rejection region (χ² = 19.68 > 7.81473), H0 is rejected. There is sufficient evidence to indicate the Bon Appetit readers have a preference for at least one of the vegetables as “least favorite” at α = .05.
d.
We must assume that:
1. The sample is random.
2. The sample size is sufficiently large (every cell has an expected value of at least 5).
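The one-way calculation in part c, together with the expected-count assumption in part d, can be sketched in Python as follows. The function name `chi_square_gof` is a hypothetical helper introduced for illustration, and the critical value is copied from the text's Table VII lookup rather than computed.

```python
def chi_square_gof(observed, null_probs):
    """Return the chi-square statistic and the expected counts n * p_i0."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


# Observed "least favorite" counts from part c; H0 says every p_i = .25.
observed = [46, 76, 44, 34]
stat, expected = chi_square_gof(observed, [0.25] * 4)

critical_value = 7.81473  # Table VII value for alpha = .05, df = 3 (from the text)
reject_h0 = stat > critical_value

# Part d: every expected count should be at least 5.
assumption_ok = all(e >= 5 for e in expected)

print(round(stat, 2), reject_h0, assumption_ok)
```

The same helper works for unequal hypothesized proportions, since the null probabilities are passed in as a list.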
9.36
a.
Some preliminary calculations are:

E11 = R1C1/n = 242(473)/549 = 208.499    E12 = R1C2/n = 242(76)/549 = 33.501
E21 = R2C1/n = 212(473)/549 = 182.652    E22 = R2C2/n = 212(76)/549 = 29.348
E31 = R3C1/n = 95(473)/549 = 81.849      E32 = R3C2/n = 95(76)/549 = 13.151

To determine if the likelihood for stress is dependent on an employee’s fitness level, we test:

H0: Stress and Fitness level are independent
Ha: Stress and Fitness level are dependent
The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(204 - 208.499)^2}{208.499} + \frac{(38 - 33.501)^2}{33.501} + \frac{(184 - 182.652)^2}{182.652} + \frac{(28 - 29.348)^2}{29.348} + \frac{(85 - 81.849)^2}{81.849} + \frac{(10 - 13.151)^2}{13.151} = 1.648 \]

Since no α level was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2. From Table VII, Appendix B, $\chi^2_{.05} = 5.99147$. The rejection region is χ² > 5.99147. Since the observed value of the test statistic does not fall in the rejection region (χ² = 1.648 ≯ 5.99147), H0 is not rejected. There is insufficient evidence to indicate that the likelihood for stress is dependent on an employee’s fitness level at α = .05.
b.
A Type I error is rejecting H0 when H0 is true. In this case, it would be concluding that Stress and Fitness level are dependent when, in fact, they are independent. A Type II error is accepting H0 when H0 is false. In this case, it would be concluding that Stress and Fitness level are independent when, in fact, they are dependent.
c.
To convert frequencies to percentages, divide the numbers in each row by the row total and multiply by 100. Also, divide the column totals by the overall total and multiply by 100.

                 Stress Level
Fitness Level    No Stress                  Stress
Poor             204/242 ⋅ 100 = 84.3%      38/242 ⋅ 100 = 15.7%
Average          184/212 ⋅ 100 = 86.8%      28/212 ⋅ 100 = 13.2%
Good             85/95 ⋅ 100 = 89.5%        10/95 ⋅ 100 = 10.5%
Total            473/549 ⋅ 100 = 86.2%      76/549 ⋅ 100 = 13.8%

Using MINITAB, the bar chart is:

[Bar chart of Percent with Stress (0 to 16) by Fitness Level (Poor, Average, Good), with a reference line at 13.8%.]
9.38
a.
E(n1) = np1,0 = 370(.30) = 111
E(n2) = np2,0 = 370(.20) = 74
E(n3) = np3,0 = 370(.20) = 74
E(n4) = np4,0 = 370(.10) = 37
E(n5) = np5,0 = 370(.10) = 37
E(n6) = np6,0 = 370(.10) = 37
b.
The test statistic is

\[ \chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(84 - 111)^2}{111} + \frac{(79 - 74)^2}{74} + \frac{(75 - 74)^2}{74} + \frac{(49 - 37)^2}{37} + \frac{(36 - 37)^2}{37} + \frac{(47 - 37)^2}{37} = 13.541 \]
c.
To determine if the true percentages of the colors produced differ from the manufacturer’s stated percentages, we test:

H0: p1 = .30, p2 = .20, p3 = .20, p4 = .10, p5 = .10, p6 = .10
Ha: At least one pi does not equal its hypothesized value.

The test statistic is χ² = 13.541.
The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 6 − 1 = 5. From Table VII, Appendix B, $\chi^2_{.05} = 11.0705$. The rejection region is χ² > 11.0705. Since the observed value of the test statistic falls in the rejection region (χ² = 13.541 > 11.0705), H0 is rejected. There is sufficient evidence to indicate the true percentages of the colors produced differ from the manufacturer’s stated percentages at α = .05.
9.40
a.
The expected cell counts are:

E11 = R1C1/n = 20(11)/31 = 7.097    E12 = R1C2/n = 20(20)/31 = 12.903
E21 = R2C1/n = 11(11)/31 = 3.903    E22 = R2C2/n = 11(20)/31 = 7.097
b.
One of the assumptions for the chi-square test is that the sample size, n, is large enough so that, for every cell, the expected cell count, Eij, will be equal to 5 or more. For cell (2, 1), the expected cell count is only 3.903.
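The check described in part b can be automated. The sketch below recomputes the expected counts from the observed counts reported in part d of this exercise (rows: low and high insider ownership; columns: small and large) and lists every cell whose expected count falls below 5. The helper name `expected_counts` and the variable `small_cells` are assumptions introduced for illustration.

```python
# Observed counts reported in part d of this exercise.
observed = [[3, 17],   # low insider ownership: small, large
            [8, 3]]    # high insider ownership: small, large


def expected_counts(table):
    """Return E_ij = (row total)(column total) / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)

# Collect (row, column, expected count) for cells violating E_ij >= 5.
small_cells = [(i + 1, j + 1, round(e, 3))
               for i, row in enumerate(expected)
               for j, e in enumerate(row)
               if e < 5]

print(small_cells)
```

Only cell (2, 1) is flagged, which agrees with the expected count of 3.903 noted above.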
c.
To determine if insider ownership and size are independent, we test:

H0: Insider ownership and size are independent
Ha: Insider ownership and size are dependent

The p-value is .0043. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that insider ownership and size are dependent for any α > .0043.
d.
First, we find the percentages by dividing each cell count by the column total and multiplying by 100. The row totals are divided by the total sample size. The percentages are found in the table:

                     Size
Insider Ownership    Small                   Large                  Totals
Low                  3/11 × 100% = 27.3%     17/20 × 100% = 85%     20/31 × 100% = 64.5%
High                 8/11 × 100% = 72.7%     3/20 × 100% = 15%      11/31 × 100% = 35.5%

Using MINITAB, the bar chart is:

[Bar chart: Percent (0 to 90) by Size (Small, Large), with a reference line at 64.5%.]

Since the bars are not the same height, there is evidence that insider ownership and size are dependent. This is what we found in part c.
9.42
Some preliminary calculations are:

E11 = R1C1/n = 100(171)/500 = 34.2    E12 = R1C2/n = 100(207)/500 = 41.4
E13 = R1C3/n = 100(80)/500 = 16.0     E14 = R1C4/n = 100(42)/500 = 8.4
E21 = R2C1/n = 175(171)/500 = 59.9    E22 = R2C2/n = 175(207)/500 = 72.5
E23 = R2C3/n = 175(80)/500 = 28.0     E24 = R2C4/n = 175(42)/500 = 14.7
E31 = R3C1/n = 145(171)/500 = 49.6    E32 = R3C2/n = 145(207)/500 = 60.0
E33 = R3C3/n = 145(80)/500 = 23.2     E34 = R3C4/n = 145(42)/500 = 12.2
E41 = R4C1/n = 80(171)/500 = 27.4     E42 = R4C2/n = 80(207)/500 = 33.1
E43 = R4C3/n = 80(80)/500 = 12.8      E44 = R4C4/n = 80(42)/500 = 6.7

To determine if there is a dependence between a son's choice of occupation and his father's occupation, we test:

H0: Son's choice of occupation and his father's occupation are independent
Ha: Son's choice of occupation and his father's occupation are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(55 - 34.2)^2}{34.2} + \frac{(38 - 41.4)^2}{41.4} + \frac{(7 - 16.0)^2}{16.0} + \frac{(0 - 8.4)^2}{8.4} + \frac{(79 - 59.9)^2}{59.9} + \frac{(71 - 72.5)^2}{72.5} + \frac{(25 - 28)^2}{28} + \frac{(0 - 14.7)^2}{14.7} \]
\[ \quad + \frac{(22 - 49.6)^2}{49.6} + \frac{(75 - 60)^2}{60} + \frac{(38 - 23.2)^2}{23.2} + \frac{(10 - 12.2)^2}{12.2} + \frac{(15 - 27.4)^2}{27.4} + \frac{(23 - 33.1)^2}{33.1} + \frac{(10 - 12.8)^2}{12.8} + \frac{(32 - 6.7)^2}{6.7} = 181.32 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (4 − 1)(4 − 1) = 9. From Table VII, Appendix B, $\chi^2_{.05} = 16.9190$. The rejection region is χ² > 16.9190. Since the observed value of the test statistic falls in the rejection region (χ² = 181.32 > 16.9190), H0 is rejected. There is sufficient evidence to indicate a dependence between a son’s choice of occupation and his father’s occupation at α = .05.
9.44
a.
Some preliminary calculations are:

E11 = R1C1/n = 57(52)/86 = 34.465    E12 = R1C2/n = 57(34)/86 = 22.535
E21 = R2C1/n = 29(52)/86 = 17.535    E22 = R2C2/n = 29(34)/86 = 11.465

To determine if manufacturing firms were more likely to be involved with TQM than service firms, we test:

H0: Type of firm and TQM are independent
Ha: Type of firm and TQM are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(34 - 34.465)^2}{34.465} + \frac{(23 - 22.535)^2}{22.535} + \frac{(18 - 17.535)^2}{17.535} + \frac{(11 - 11.465)^2}{11.465} = .047 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1. From Table VII, Appendix B, $\chi^2_{.05} = 3.84146$. The rejection region is χ² > 3.84146.

Since the observed value of the test statistic does not fall in the rejection region (χ² = .047 ≯ 3.84146), H0 is not rejected. There is insufficient evidence to indicate that the type of firm and TQM are dependent at α = .05. There is no evidence to indicate that manufacturing firms are more likely to be involved with TQM than service firms.
b.
The p-value is P(χ2 > .047). From Table VII, Appendix B, with df = 1, .10 < P(χ2 > .047) < .90.
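Table VII only brackets this p-value. For df = 1 the exact tail area can be obtained from the standard normal distribution, because a χ² variable with 1 degree of freedom is the square of a standard normal variable. The sketch below uses that identity with Python's standard library; the function name `chi_square_sf_df1` is a hypothetical helper, and this is an optional check rather than the method used in the text.

```python
import math


def chi_square_sf_df1(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


p_value = chi_square_sf_df1(0.047)

# Sanity check: the tail area beyond the tabled critical value should be .05.
tail_at_critical = chi_square_sf_df1(3.84146)

print(round(p_value, 3), round(tail_at_critical, 3))
```

The computed p-value is roughly .83, which lies inside the bracket .10 < p < .90 read from the table.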
c.
We must assume:
1. The n observed counts are a random sample from the population of interest. We may then consider this to be a multinomial experiment with r × c = 2 × 2 = 4 possible outcomes.
2. The sample size, n, will be large enough so that, for every cell, the expected cell count, E(nij), will be equal to 5 or more.
9.46
a.
Some preliminary calculations are:

E(n1) = np1,0 = 85(.26) = 22.1
E(n2) = np2,0 = 85(.30) = 25.5
E(n3) = np3,0 = 85(.11) = 9.35
E(n4) = np4,0 = 85(.14) = 11.9
E(n5) = np5,0 = 85(.19) = 16.15

To determine if probabilities differ from the hypothesized values, we test:

H0: p1 = .26, p2 = .30, p3 = .11, p4 = .14, p5 = .19
Ha: At least one of the probabilities differs from its hypothesized value.

The test statistic is

\[ \chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(32 - 22.1)^2}{22.1} + \frac{(26 - 25.5)^2}{25.5} + \frac{(15 - 9.35)^2}{9.35} + \frac{(6 - 11.9)^2}{11.9} + \frac{(6 - 16.15)^2}{16.15} = 17.16 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 5 − 1 = 4. From Table VII, Appendix B, $\chi^2_{.05} = 9.48773$. The rejection region is χ² > 9.48773. Since the observed value of the test statistic falls in the rejection region (χ² = 17.16 > 9.48773), reject H0. There is sufficient evidence to indicate the probabilities differ from their hypothesized values at α = .05.
b.
\[ \hat{p}_1 = \frac{n_1}{n} = \frac{32}{85} = .37647 \]

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:

\[ \hat{p}_1 \pm z_{.025}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n}} \Rightarrow .376 \pm 1.96\sqrt{\frac{.37647(1 - .37647)}{85}} \Rightarrow .376 \pm .103 \Rightarrow (.273, .479) \]
c.
The interval tells us that between 27.3% and 47.9% of the Avonex MS patients are exacerbation-free during a two-year period. Since this interval is completely above the percentage of placebo patients (26%), it seems that the Avonex patients are more likely to have no exacerbations than placebo patients.
9.48
a.
Some preliminary calculations are:

The contingency table is:

Shift     Defectives    Non-Defectives    Total
1         25            175               200
2         35            165               200
3         80            120               200
Total     140           460               600

E11 = E21 = E31 = R1C1/n = 200(140)/600 = 46.667
E12 = E22 = E32 = R1C2/n = 200(460)/600 = 153.333

To determine if quality of the filters is related to shift, we test:

H0: Quality of filters and shift are independent
Ha: Quality of filters and shift are dependent

The test statistic is

\[ \chi^2 = \sum\sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(25 - 46.667)^2}{46.667} + \frac{(175 - 153.333)^2}{153.333} + \frac{(35 - 46.667)^2}{46.667} + \frac{(165 - 153.333)^2}{153.333} + \frac{(80 - 46.667)^2}{46.667} + \frac{(120 - 153.333)^2}{153.333} = 47.98 \]

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2. From Table VII, Appendix B, $\chi^2_{.05} = 5.99147$. The rejection region is χ² > 5.99147. Since the observed value of the test statistic falls in the rejection region (χ² = 47.98 > 5.99147), H0 is rejected. There is sufficient evidence to indicate quality of filters and shift are related at α = .05.
b.
The form of the confidence interval for p is:

\[ \hat{p}_1 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n}} \quad \text{where } \hat{p}_1 = \frac{25}{200} = .125 \]

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table IV, Appendix B, z.025 = 1.96. The 95% confidence interval is:

\[ .125 \pm 1.96\sqrt{\frac{.125(.875)}{200}} \Rightarrow .125 \pm .046 \Rightarrow (.079, .171) \]
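The large-sample interval above is easy to reproduce numerically. The following Python sketch is illustrative only; `proportion_ci` is a hypothetical helper name, and z = 1.96 is taken from the text's Table IV lookup rather than computed.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Shift 1 produced 25 defectives out of 200 filters.
p_hat, margin, (lower, upper) = proportion_ci(25, 200)

print(p_hat, round(margin, 3), (round(lower, 3), round(upper, 3)))
```

The same function applies to the other proportion intervals in this chapter by changing the count and the sample size.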
9.50
Using SAS, the output is:

The FREQ Procedure

Table of CANDIDATE by TIME

CANDIDATE     TIME

Frequency|
Col Pct  |       1|       2|       3|       4|       5|       6|  Total
---------+--------+--------+--------+--------+--------+--------+
SMITH    |    208 |    208 |    451 |    392 |    351 |    410 |   2020
         |  52.53 |  55.32 |  55.34 |  55.92 |  56.16 |  55.33 |
---------+--------+--------+--------+--------+--------+--------+
COPPIN   |     55 |     51 |    109 |     98 |     88 |    104 |    505
         |  13.89 |  13.56 |  13.37 |  13.98 |  14.08 |  14.04 |
---------+--------+--------+--------+--------+--------+--------+
MONTES   |    133 |    117 |    255 |    211 |    186 |    227 |   1129
         |  33.59 |  31.12 |  31.29 |  30.10 |  29.76 |  30.63 |
---------+--------+--------+--------+--------+--------+--------+
Total         396      376      815      701      625      741     3654

Statistics for Table of CANDIDATE by TIME

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                    10      2.2839    0.9937
Likelihood Ratio Chi-Square   10      2.2722    0.9938
Mantel-Haenszel Chi-Square     1      0.9851    0.3209
Phi Coefficient                       0.0250
Contingency Coefficient               0.0250
Cramer's V                            0.0177

Sample Size = 3654
To determine if candidates received votes independent of time period, we test:

H0: Voting and Time period are independent
Ha: Voting and Time period are dependent

The test statistic is χ² = 2.2839.

Since no value of α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (3 − 1)(6 − 1) = 10. From Table VII, Appendix B, $\chi^2_{.05} = 18.3070$. The rejection region is χ² > 18.3070.

Since the observed value of the test statistic does not fall in the rejection region (χ² = 2.2839 ≯ 18.3070), H0 is not rejected. There is insufficient evidence to indicate Voting and Time period are dependent at α = .05.

Thus, we can conclude that voting and time period are independent. This means that regardless of time period, the percentage of votes received by each candidate is the same. In the table created by SAS, the bottom number in each cell is the column percent. This is the percent of votes received by the candidate in each time period. An inspection of these percents indicates that candidate Smith received approximately 55.3% of the votes each time period, candidate Coppin received approximately 13.8% of the vote, and candidate Montes received approximately 30.9% of the vote. All of this indicates that the election was rigged.
Discrimination in the Work Place
(To accompany Chapters 8–9)
Part I:
If we assume that those selected for termination were randomly selected from all workers, then the Chi-square test for independence is appropriate. Using SAS, the output is:

TABLE OF RACE BY DECISION

RACE        DECISION
Frequency|
Percent  |
Row Pct  |
Col Pct  |RETAINED|LAIDOFF |  Total
---------+--------+--------+
WHITE    |   1051 |     31 |   1082
         |  86.50 |   2.55 |  89.05
         |  97.13 |   2.87 |
         |  90.29 |  60.78 |
---------+--------+--------+
BLACK    |    113 |     20 |    133
         |   9.30 |   1.65 |  10.95
         |  84.96 |  15.04 |
         |   9.71 |  39.22 |
---------+--------+--------+
Total        1164       51     1215
            95.80     4.20   100.00

STATISTICS FOR TABLE OF RACE BY DECISION

Statistic                      DF      Value      Prob
------------------------------------------------------
Chi-Square                      1     43.641     0.001
Likelihood Ratio Chi-Square     1     29.260     0.001
Continuity Adj. Chi-Square      1     40.666     0.001
Mantel-Haenszel Chi-Square      1     43.605     0.001
Fisher's Exact Test (Left)                       1.000
                    (Right)                   6.43E-08
                    (2-Tail)                  6.43E-08
Phi Coefficient                        0.190
Contingency Coefficient                0.186
Cramer's V                             0.190

Sample Size = 1215
To determine if the variables Race and Decision are related, we test:

H0: Race and Decision are independent
Ha: Race and Decision are dependent

The test statistic is χ² = 43.641. The p-value is p = .001. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate that Race and Decision are related.

From the table, only 2.9% of white workers were terminated, whereas 15.0% of black workers were terminated. There is a significant difference in these percentages. This supports the plaintiff's position. However, this is all based on the assumption that those selected to be laid off were randomly selected. If the company made its decision based on performance, as it claims, then those selected to be terminated were not randomly selected, and thus the test of hypothesis is invalid.

Part II:
If the workers to be terminated were truly selected at random, then the Chi-square test for independence is appropriate. Using SAS, the output is:

TABLE OF STATUS BY AGE1

STATUS        AGE1
Frequency  |
Percent    |
Row Pct    |
Col Pct    |UNDER 40|40 +    |  Total
-----------+--------+--------+
ACTIVE     |     18 |     13 |     31
           |  32.73 |  23.64 |  56.36
           |  58.06 |  41.94 |
           |  72.00 |  43.33 |
-----------+--------+--------+
TERMINATED |      7 |     17 |     24
           |  12.73 |  30.91 |  43.64
           |  29.17 |  70.83 |
           |  28.00 |  56.67 |
-----------+--------+--------+
Total            25       30       55
              45.45    54.55   100.00

STATISTICS FOR TABLE OF STATUS BY AGE1

Statistic                      DF      Value      Prob
------------------------------------------------------
Chi-Square                      1      4.556     0.033
Likelihood Ratio Chi-Square     1      4.651     0.031
Continuity Adj. Chi-Square      1      3.465     0.063
Mantel-Haenszel Chi-Square      1      4.473     0.034
Fisher's Exact Test (Left)                       0.993
                    (Right)                      0.031
                    (2-Tail)                     0.055
Phi Coefficient                        0.288
Contingency Coefficient                0.277
Cramer's V                             0.288

Sample Size = 55
To determine if the variables Status and Age are related, we test:

H0: Age and Status are independent
Ha: Age and Status are dependent

The test statistic is χ² = 4.556. The p-value is p = .033. Since the p-value is so small, there is evidence to reject H0. There is sufficient evidence to indicate that Age and Status are related.

From the table, 56.7% of those aged 40 and over were terminated. However, only 28.0% of those aged under 40 were terminated. There is a significant difference in these percentages. This supports the plaintiff's position.

We can also look at some other revealing statistics. If we compare the mean wages of those terminated against those who remained active, there is a significant difference. The mean wages of those terminated is significantly higher than the mean wages of those who remained active. Also, the mean age of those who remained active (33.0) is significantly less than the mean age of those who were terminated (44.08). Also, the mean wage of those under 40 ($26,452.20) was significantly less than the mean wage of those 40 or over ($39,044.17). All of this implies that those who were terminated were those who were older with the higher salaries. It appears that the company wanted to not only reduce the work force, but also reduce its mean expenses for those remaining on the workforce. I can find nothing to support the defendant's position.

TTEST PROCEDURE

Variable: WAGES
STATUS        N        Mean      Std Dev    Std Error   Variances          T     DF   Prob>|T|
----------------------------------------------------------------------------------------------
ACTIVE       31    28772.26    6302.5283    1131.9675   Unequal      -6.8124   52.9     0.0001
TERMINATED   24    39195.42    5042.9673    1029.3914   Equal        -6.6214   53.0     0.0000*
For H0: Variances are equal, F' = 1.56   DF = (30,23)   Prob>F' = 0.2738

Variable: AGE
STATUS        N        Mean      Std Dev    Std Error   Variances          T     DF   Prob>|T|
----------------------------------------------------------------------------------------------
ACTIVE       31     33.0000       8.0000       1.4368   Unequal      -5.7661   53.0     0.0001*
TERMINATED   24     44.0833       6.2549       1.2768   Equal        -5.5886   53.0     0.0000
For H0: Variances are equal, F' = 1.64   DF = (30,23)   Prob>F' = 0.2273

Variable: WAGES
AGE1          N        Mean      Std Dev    Std Error   Variances          T     DF   Prob>|T|
----------------------------------------------------------------------------------------------
UNDER 40     25  26452.2000    4739.5548     947.9110   Unequal     -10.1970   49.3     0.0001
40 +         30  39044.1667    4334.8764     791.4365   Equal       -10.2814   53.0     0.0000*
For H0: Variances are equal, F' = 1.20   DF = (24,29)   Prob>F' = 0.6409
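The T values in the first TTEST panel (WAGES by STATUS) can be reproduced from the printed group sizes, means, and standard deviations alone. This is a rough sketch under that assumption; the function names are illustrative and are not SAS terminology.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic assuming equal variances (the 'Equal' row)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


def unequal_variance_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic without pooling (the 'Unequal' row)."""
    std_err = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / std_err


# Summary statistics copied from the WAGES by STATUS panel
active = (31, 28772.26, 6302.5283)
terminated = (24, 39195.42, 5042.9673)

t_equal, df_equal = pooled_t(*active, *terminated)
t_unequal = unequal_variance_t(*active, *terminated)
print(round(t_equal, 4), df_equal, round(t_unequal, 4))
```

The approximate degrees of freedom for the unequal-variance row (52.9 in the printout) are not computed here.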
Chapter 10
Simple Linear Regression

10.2
For all problems below, we use: $\text{Slope} = \frac{\text{"rise"}}{\text{"run"}} = \frac{y_2 - y_1}{x_2 - x_1}$

a. $\text{Slope} = \frac{5 - 1}{5 - 1} = 1 = \beta_1$
If $y = \beta_0 + \beta_1 x$, then $\beta_0 = y - \beta_1 x$. Since a given point is (1, 1) and $\beta_1 = 1$, the y-intercept is $\beta_0 = 1 - 1(1) = 0$.

b. $\text{Slope} = \frac{0 - 3}{3 - 0} = -1 = \beta_1$
If $y = \beta_0 + \beta_1 x$, then $\beta_0 = y - \beta_1 x$. Since (0, 3) is given, the y-intercept is $\beta_0 = 3 - (-1)(0) = 3$.

c. $\text{Slope} = \frac{2 - 1}{4 - (-1)} = \frac{1}{5} = .2 = \beta_1$
If $y = \beta_0 + \beta_1 x$, then $\beta_0 = y - \beta_1 x$. Since a given point is (−1, 1) and $\beta_1 = 1/5$, the y-intercept is $\beta_0 = 1 - .2(-1) = 1.2$.

d. $\text{Slope} = \frac{6 - (-3)}{2 - (-6)} = \frac{9}{8} = 1.125 = \beta_1$
If $y = \beta_0 + \beta_1 x$, then $\beta_0 = y - \beta_1 x$. Since a given point is (−6, −3) and $\beta_1 = 9/8$, the y-intercept is $\beta_0 = -3 - 1.125(-6) = 3.75$.

10.4
a. The equation for a straight line (deterministic) is $y = \beta_0 + \beta_1 x$.
If the line passes through (1, 1), then $1 = \beta_0 + \beta_1(1) \Rightarrow 1 = \beta_0 + \beta_1$.
Likewise, through (5, 5), $5 = \beta_0 + \beta_1(5)$.

Solving these two equations by subtracting the second from the first:

$1 = \beta_0 + \beta_1$
$-(5 = \beta_0 + 5\beta_1)$
$-4 = -4\beta_1 \Rightarrow \beta_1 = 1$

Substituting $\beta_1 = 1$ into the first equation, we get $1 = \beta_0 + 1 \Rightarrow \beta_0 = 0$.
The equation is y = 0 + 1x or y = x.
b.
The equation for a straight line is y = β0 + β1x. If the line passes through (0, 3), then 3 = β0 + β1(0), which implies β0 = 3. Likewise, through the point (3, 0), then 0 = β0 + 3β1 or −β0 = 3β1. Substituting β0 = 3, we get −3 = 3β1 or β1 = −1. Therefore, the line passing through (0, 3) and (3, 0) is y = 3 − x.
c. The equation for a straight line is $y = \beta_0 + \beta_1 x$. If the line passes through (−1, 1), then $1 = \beta_0 + \beta_1(-1)$. Likewise, through the point (4, 2), $2 = \beta_0 + \beta_1(4)$. Solving these two equations:

$2 = \beta_0 + 4\beta_1$
$-(1 = \beta_0 - \beta_1)$
$1 = 5\beta_1$ or $\beta_1 = \frac{1}{5}$

Solving for $\beta_0$: $1 = \beta_0 + \frac{1}{5}(-1)$ or $1 = \beta_0 - \frac{1}{5}$ or $\beta_0 = 1 + \frac{1}{5} = \frac{6}{5}$

The equation, with $\beta_0 = \frac{6}{5}$ and $\beta_1 = \frac{1}{5}$, is $y = \frac{6}{5} + \frac{1}{5}x$.

d. The equation for a straight line is $y = \beta_0 + \beta_1 x$. If the line passes through (−6, −3), then $-3 = \beta_0 - 6\beta_1$. Likewise, through the point (2, 6), $6 = \beta_0 + 2\beta_1$. Solving these equations simultaneously:

$6 = \beta_0 + 2\beta_1$
$-[(-3) = \beta_0 - 6\beta_1]$
$9 = 8\beta_1$ or $\beta_1 = \frac{9}{8}$

Solving for $\beta_0$: $6 = \beta_0 + 2\left(\frac{9}{8}\right) \Rightarrow 6 - \frac{18}{8} = \beta_0$ or $\beta_0 = \frac{30}{8}$

Therefore, $y = \frac{30}{8} + \frac{9}{8}x$.
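The four lines in Exercises 10.2 and 10.4 can be verified with exact rational arithmetic. The sketch below is only a check on the hand calculations; `line_through` is an illustrative helper name, not something from the text.

```python
from fractions import Fraction


def line_through(point1, point2):
    """Return (intercept, slope) of the line y = b0 + b1*x through two points."""
    (x1, y1), (x2, y2) = point1, point2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    slope = Fraction(y2 - y1, x2 - x1)      # rise over run
    intercept = Fraction(y1) - slope * x1   # b0 = y - b1*x at a known point
    return intercept, slope


point_pairs = [
    ((1, 1), (5, 5)),      # part a
    ((0, 3), (3, 0)),      # part b
    ((-1, 1), (4, 2)),     # part c
    ((-6, -3), (2, 6)),    # part d
]
lines = [line_through(p, q) for p, q in point_pairs]
for label, (b0, b1) in zip("abcd", lines):
    print(f"{label}. y = {b0} + ({b1})x")
```

Part d prints the intercept in lowest terms as 15/4, which equals the 30/8 (or 3.75) obtained above.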
10.6
a. y = 4 + x. The slope is β1 = 1. The intercept is β0 = 4.
b. y = 5 − 2x. The slope is β1 = −2. The intercept is β0 = 5.
c. y = −4 + 3x. The slope is β1 = 3. The intercept is β0 = −4.
d. y = −2x. The slope is β1 = −2. The intercept is β0 = 0.
e. y = x. The slope is β1 = 1. The intercept is β0 = 0.
f. y = .5 + 1.5x. The slope is β1 = 1.5. The intercept is β0 = .5.
10.8
The "line of means" is the deterministic component in a probabilistic model.
10.10
a.
x_i    y_i    x_i²         x_i y_i
7      2      7² = 49      7(2) = 14
4      4      4² = 16      4(4) = 16
6      2      6² = 36      6(2) = 12
2      5      2² = 4       2(5) = 10
1      7      1² = 1       1(7) = 7
1      6      1² = 1       1(6) = 6
3      5      3² = 9       3(5) = 15

Totals:
$\sum x_i = 7 + 4 + 6 + 2 + 1 + 1 + 3 = 24$
$\sum y_i = 2 + 4 + 2 + 5 + 7 + 6 + 5 = 31$
$\sum x_i^2 = 49 + 16 + 36 + 4 + 1 + 1 + 9 = 116$
$\sum x_i y_i = 14 + 16 + 12 + 10 + 7 + 6 + 15 = 80$

b. $SS_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} = 80 - \frac{(24)(31)}{7} = 80 - 106.2857143 = -26.2857143$

c. $SS_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 116 - \frac{(24)^2}{7} = 116 - 82.28571429 = 33.71428571$

d. $\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-26.2857143}{33.71428571} = -.779661017 \approx -.7797$

e. $\bar{x} = \frac{\sum x_i}{n} = \frac{24}{7} = 3.428571429$ and $\bar{y} = \frac{\sum y_i}{n} = \frac{31}{7} = 4.428571429$

f. $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 4.428571429 - (-.779661017)(3.428571429) = 4.428571429 - (-2.673123487) = 7.101694916 \approx 7.102$

g. The least squares line is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 7.102 - .7797x$.
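Parts a through g of Exercise 10.10 follow the same recipe that recurs throughout this chapter, so a small script is a convenient check. This is a minimal sketch; `least_squares` is an illustrative name, and the data are the seven (x, y) pairs from the table in part a.

```python
def least_squares(xs, ys):
    """Fit y = b0 + b1*x with the shortcut formulas for SSxy and SSxx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * (sum_x / n)   # y-bar minus b1 times x-bar
    return b0, b1, ss_xy, ss_xx


xs = [7, 4, 6, 2, 1, 1, 3]
ys = [2, 4, 2, 5, 7, 6, 5]
b0, b1, ss_xy, ss_xx = least_squares(xs, ys)
print(f"SSxy = {ss_xy:.7f}, SSxx = {ss_xx:.8f}")
print(f"y-hat = {b0:.3f} + ({b1:.4f})x")
```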
10.12
a.
b. Choose y = 1 + x since it best describes the relation of x and y.

c.
x      y    ŷ = 1 + x    y − ŷ
.5     2    1.5          2 − 1.5 = .5
1.0    1    2.0          1 − 2.0 = −1.0
1.5    3    2.5          3 − 2.5 = .5
                         Sum of errors = 0

x      y    ŷ = 3 − x         y − ŷ
.5     2    3 − .5 = 2.5      2 − 2.5 = −.5
1.0    1    3 − 1.0 = 2.0     1 − 2.0 = −1.0
1.5    3    3 − 1.5 = 1.5     3 − 1.5 = 1.5
                              Sum of errors = 0

d. $SSE = \sum (y - \hat{y})^2$

SSE for 1st model: y = 1 + x, SSE = (.5)² + (−1)² + (.5)² = 1.5
SSE for 2nd model: y = 3 − x, SSE = (−.5)² + (−1)² + (1.5)² = 3.5

The best fitting straight line is the one that has the smallest SSE. The model y = 1 + x has a smaller SSE, and therefore it verifies the visual check in part a.

e. Some preliminary calculations are:

$\sum x = 3$, $\sum y = 6$, $\sum xy = 6.5$, $\sum x^2 = 3.5$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 6.5 - \frac{(3)(6)}{3} = .5$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 3.5 - \frac{(3)^2}{3} = .5$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{.5}{.5} = 1$; $\bar{x} = \frac{\sum x}{3} = \frac{3}{3} = 1$; $\bar{y} = \frac{\sum y}{3} = \frac{6}{3} = 2$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 2 - 1(1) = 1 \Rightarrow \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 1 + x$

The least squares line is the same as the second line given.
10.14
a. The straight-line model would be: $y = \beta_0 + \beta_1 x + \varepsilon$

b. The least squares line is: $\hat{y} = -2,298.4 + 11,598.9x$

c. Since the range of observed values for the number of carats (x) does not include 0, the y-intercept has no meaning.

d. The slope of the line is $\beta_1$. In terms of this problem, $\beta_1$ is the change in the mean asking price for each additional carat. This interpretation is meaningful for values of x within the observed range. The observed range of x is .18 to 1.10.

e. $\hat{y} = -2,298.4 + 11,598.9(.52) = 3,733.028$. The predicted asking price for a .52 carat diamond is $3,733.028.

10.16
a. Some preliminary calculations are:

$\sum x = 62$, $\sum y = 97.8$, $\sum xy = 1,087.78$, $\sum x^2 = 720.52$, $\sum y^2 = 1,710.2$

$\bar{x} = \frac{\sum x}{n} = \frac{62}{6} = 10.33333333$ and $\bar{y} = \frac{\sum y}{n} = \frac{97.8}{6} = 16.3$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1,087.78 - \frac{62(97.8)}{6} = 1,087.78 - 1,010.6 = 77.18$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 720.52 - \frac{(62)^2}{6} = 720.52 - 640.667 = 79.8533333$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{77.18}{79.8533333} = 0.966521957 \approx 0.9665$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 16.3 - 0.966521957(10.33333333) = 6.312606448 \approx 6.3126$

$\hat{y} = 6.3126 + .9665x$

b. Since x = 0 is not in the observed range of the mean pore diameters, the y-intercept has no meaning.

c. For each unit increase in mean pore diameter, the mean value of porosity is estimated to increase by .9665.

d. For x = 10, $\hat{y} = 6.3126 + .9665(10) = 15.9776$

10.18
a. Some preliminary calculations are:

$\sum x = 6167$, $\sum y = 135.8$, $\sum x^2 = 1,641,115$, $\sum xy = 34,764.5$, $n = 24$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 34,764.5 - \frac{(6167)(135.8)}{24} = -130.44167$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1,641,115 - \frac{(6167)^2}{24} = 56,452.95833$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-130.44167}{56,452.958} = -.002310625 \approx -.0023$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = \frac{135.8}{24} - (-.002310625)\left(\frac{6167}{24}\right) = 6.252067683 \approx 6.25$

The least squares line is $\hat{y} = 6.25 - .0023x$.

b. $\hat{\beta}_0 = 6.25$. Since x = 0 is not in the observed range, $\hat{\beta}_0$ has no interpretation other than being the y-intercept.

$\hat{\beta}_1 = -.0023$. For each additional increase of 1 part per million of pectin, the mean sweetness index is estimated to decrease by .0023.

c. $\hat{y} = 6.25 - .0023(300) = 5.56$

10.20
a. A proposed model is $E(y) = \beta_0 + \beta_1 x$.

b. Some preliminary calculations are:

$\sum x = 1,292.7$, $\sum y = 3,781.1$, $\sum xy = 218,291.63$, $\sum x^2 = 88,668.43$, $\sum y^2 = 651,612.45$

$\bar{x} = \frac{\sum x}{n} = \frac{1,292.7}{22} = 58.75909091$ and $\bar{y} = \frac{\sum y}{n} = \frac{3,781.1}{22} = 171.8681818$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 218,291.63 - \frac{1,292.7(3,781.1)}{22} = 218,291.63 - 222,173.9986 = -3,882.3686$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 88,668.43 - \frac{(1,292.7)^2}{22} = 88,668.43 - 75,957.87682 = 12,710.55318$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-3,882.3686}{12,710.55318} = -0.305444503 \approx -0.305$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 171.8681818 - (-0.305444503)(58.75909091) = 189.8158231 \approx 189.816$

The fitted regression line is: $\hat{y} = 189.816 - 0.305x$
c.
Using MINITAB, a graph of the fitted regression line is:

[Fitted Line Plot: FCAT-Math = 189.8 − 0.3054 Percent; S = 5.36572, R-Sq = 67.3%, R-Sq(adj) = 65.7%. Horizontal axis: Percent; vertical axis: FCAT-Math.]

From the fitted regression line, the relationship between the two variables is negative.

d. $\hat{\beta}_0 = 189.816$. Since 0 is not in the range of observed values of the variable % Below Poverty, the y-intercept has no meaning.

$\hat{\beta}_1 = -0.305$. For each one-unit increase in % Below Poverty, the mean value of FCAT-Math is estimated to decrease by 0.305.
e.
A proposed model is $E(y) = \beta_0 + \beta_1 x$. Some preliminary calculations are:

$\sum x = 1,292.7$, $\sum y = 3,764.2$, $\sum xy = 217,738.81$, $\sum x^2 = 88,668.43$, $\sum y^2 = 645,221.16$

$\bar{x} = \frac{\sum x}{n} = \frac{1,292.7}{22} = 58.75909091$ and $\bar{y} = \frac{\sum y}{n} = \frac{3,764.2}{22} = 171.1$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 217,738.81 - \frac{1,292.7(3,764.2)}{22} = 217,738.81 - 221,180.97 = -3,442.16$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 88,668.43 - \frac{(1,292.7)^2}{22} = 88,668.43 - 75,957.87682 = 12,710.55318$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-3,442.16}{12,710.55318} = -0.270811187 \approx -0.271$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 171.1 - (-0.270811187)(58.75909091) = 187.0126192 \approx 187.013$

The fitted regression line is: $\hat{y} = 187.013 - 0.271x$
Using MINITAB, a graph of the fitted regression line is:

[Fitted Line Plot: FCAT-Read = 187.0 − 0.2708 Percent; S = 3.42319, R-Sq = 79.9%, R-Sq(adj) = 78.9%. Horizontal axis: Percent; vertical axis: FCAT-Read.]

From the fitted regression line, the relationship between the two variables is negative.

$\hat{\beta}_0 = 187.013$. Since 0 is not in the range of observed values of the variable % Below Poverty, the y-intercept has no meaning.

$\hat{\beta}_1 = -0.271$. For each one-unit increase in % Below Poverty, the mean value of FCAT-Reading is estimated to decrease by .271.

10.22
a. We will select Average Salary as the dependent variable and Mean GMAT as the independent variable.

b. Some preliminary calculations are:

$\sum x = 6,944$, $\sum y = 1,080,288$, $\sum xy = 751,698,490$, $\sum x^2 = 4,824,680$, $\sum y^2 = 118,151,669,430$

$\bar{x} = \frac{\sum x}{n} = \frac{6,944}{10} = 694.4$ and $\bar{y} = \frac{\sum y}{n} = \frac{1,080,288}{10} = 108,028.8$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 751,698,490 - \frac{6,944(1,080,288)}{10} = 751,698,490 - 750,151,987.2 = 1,546,502.8$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 4,824,680 - \frac{(6,944)^2}{10} = 4,824,680 - 4,821,913.6 = 2,766.4$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{1,546,502.8}{2,766.4} = 559.0307981 \approx 559.031$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 108,028.8 - (559.0307981)(694.4) = -280,162.1862 \approx -280,162.186$

The fitted regression line is: $\hat{y} = -280,162.186 + 559.031x$

$\hat{\beta}_0 = -280,162.186$. Since 0 is not in the range of observed values of the variable Mean GMAT, the y-intercept has no meaning.

$\hat{\beta}_1 = 559.031$. For each additional point increase in the mean GMAT score, the mean value of Average Salary is estimated to increase by $559.031.
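When only the summary sums are given, as in Exercise 10.22, the same estimates can be obtained without the raw data. The following is a hedged sketch; `fit_from_sums` is an illustrative name, and the inputs are the sums listed in part b.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least squares intercept and slope computed from summary sums only."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1


# Sums from Exercise 10.22 (x = Mean GMAT, y = Average Salary, n = 10)
b0, b1 = fit_from_sums(10, 6944, 1080288, 751698490, 4824680)
print(f"y-hat = {b0:,.3f} + {b1:,.3f}x")
```

Because SSxy here is a small difference between two very large numbers, a single mistyped digit in the intermediate product changes the slope drastically, which is a good reason to recompute it mechanically.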
10.24
The graph in b would have the smallest s² because the spread of the data points about the line is the smallest.
10.26
a. $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 95 - .75(50) = 57.5$

$s^2 = \frac{SSE}{n - 2} = \frac{57.5}{20 - 2} = 3.19444$

b. $SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 860 - \frac{50^2}{40} = 797.5$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 797.5 - .2(2700) = 257.5$

$s^2 = \frac{SSE}{n - 2} = \frac{257.5}{40 - 2} = 6.776315789 \approx 6.7763$

c. $SS_{yy} = \sum (y_i - \bar{y})^2 = 58$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{91}{170} = .535294117$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 58 - .535294117(91) = 9.2882353 \approx 9.288$

$s^2 = \frac{SSE}{n - 2} = \frac{9.2882353}{10 - 2} = 1.161029413 \approx 1.1610$
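The three parts of Exercise 10.26 apply one formula to different inputs, which a short function makes explicit. This is a minimal sketch with an illustrative function name; the inputs are the quantities stated in each part.

```python
import math


def error_variance(ss_yy, b1, ss_xy, n):
    """Return (SSE, s^2, s) using SSE = SSyy - b1*SSxy and s^2 = SSE/(n - 2)."""
    sse = ss_yy - b1 * ss_xy
    s2 = sse / (n - 2)
    return sse, s2, math.sqrt(s2)


part_a = error_variance(95, 0.75, 50, 20)
part_b = error_variance(860 - 50 ** 2 / 40, 0.2, 2700, 40)
part_c = error_variance(58, 91 / 170, 91, 10)
for label, (sse, s2, s) in zip("abc", (part_a, part_b, part_c)):
    print(f"{label}. SSE = {sse:.4f}, s^2 = {s2:.4f}, s = {s:.4f}")
```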
10.28
a.
From the printout, SSE = 382,178,624, s2 = MSE = 1,248,950, and s = 1,117.56.
b.
s = 1,117.56. We would expect approximately 95% of the observed values of y to fall within 2s or 2(1,117.56) = 2,235.12 of their least squares predicted values.
10.30
a. From part a of Exercise 10.17, $SS_{xy} = 20.00833333$, $\sum y = 239$, $\sum y^2 = 10,255$, and $\hat{\beta}_1 = 35.91623038$.

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 10,255 - \frac{(239)^2}{6} = 10,255 - 9520.166667 = 734.8333333$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 734.833333 - 35.91623068(20.00833333) = 16.2094179$

$s^2 = MSE = \frac{SSE}{n - 2} = \frac{16.2094179}{6 - 2} = 4.052354475$ and $s = \sqrt{4.052354475} = 2.013$

b. s = 2.013. We would expect approximately 95% of the observed values of y (Drug release rate) to fall within 2s or 2(2.013) = 4.026 units of their least squares predicted values.

10.32
a. Using MINITAB, the scattergram of the data is:

b. Some preliminary calculations are:

$\sum x = 44.71$, $\sum y = 131,670$, $\sum xy = 493,117.7$, $\sum x^2 = 167.4615$, $\sum y^2 = 1,514,402,100$

$\bar{x} = \frac{\sum x}{n} = \frac{44.71}{12} = 3.7258333$ and $\bar{y} = \frac{\sum y}{n} = \frac{131,670}{12} = 10,972.5$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 493,117.7 - \frac{44.71(131,670)}{12} = 493,117.7 - 490,580.475 = 2,537.225$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 167.4615 - \frac{44.71^2}{12} = 167.4615 - 166.5820083 = .8794917$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{2,537.225}{.8794917} = 2884.876571 \approx 2884.877$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 10,972.5 - 2884.876571(3.7258333) = 10,972.5 - 10,748.56929 = 223.93071 \approx 223.931$

The fitted regression line is $\hat{y} = 223.931 + 2884.877x$.
c. $SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1,514,402,100 - \frac{131,670^2}{12} = 1,514,402,100 - 1,444,749,075 = 69,653,025$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 69,653,025 - 2,884.876571(2,537.225) = 69,653,025 - 7,319,580.958 = 62,333,444.04$

$s^2 = \frac{SSE}{n - 2} = \frac{62,333,444.04}{12 - 2} = 6,233,344.404$

$s = \sqrt{s^2} = \sqrt{6,233,344.404} = 2,496.6667$

We would expect to see most of the hospital charges fall within 2s or 2($2,496.6667) = $4,993.3333 of the least squares line.

d. For x = 4, $\hat{y} = 223.931 + 2,884.877(4) = 11,763.439$

$\hat{y} \pm 2s \Rightarrow 11,763.439 \pm 4,993.3333 \Rightarrow (6,770.106, 16,756.772)$

e. Only one state (California) had an average hospital charge more than 2 standard errors from the least squares line. Thus, 11 out of 12 or 11/12 or .917 of the states had average hospital charges within 2 standard errors of the least squares line.

10.34
Some preliminary calculations for Brand A are:

$\sum x = 750$, $\sum y = 44.8$, $\sum xy = 2,022$, $\sum x^2 = 40,500$, $\sum y^2 = 168.70$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 2,022 - \frac{750(44.8)}{15} = -218$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 40,500 - \frac{750^2}{15} = 3,000$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 168.70 - \frac{44.8^2}{15} = 34.89733333$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-218}{3,000} = -0.0726666667 \approx -0.0727$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = \frac{44.8}{15} - (-0.0726666667)\frac{750}{15} = 6.62$
The least squares prediction equation for Brand A is: $\hat{y} = 6.62 - 0.0727x$

Some preliminary calculations for Brand B are:

$\sum x = 750$, $\sum y = 58.9$, $\sum xy = 2,622$, $\sum x^2 = 40,500$, $\sum y^2 = 270.89$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 2,622 - \frac{750(58.9)}{15} = -323$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 40,500 - \frac{750^2}{15} = 3,000$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 270.89 - \frac{58.9^2}{15} = 39.60933333$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-323}{3,000} = -0.1076666667 \approx -0.1077$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = \frac{58.9}{15} - (-0.1076666667)\frac{750}{15} = 9.31$

The least squares prediction equation for Brand B is: $\hat{y} = 9.31 - 0.1077x$

For Brand A,
$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 34.89733333 - (-0.072666667)(-218) = 19.0560$

$s^2 = MSE = \frac{SSE}{n - 2} = \frac{19.0560}{15 - 2} = 1.4658$ and $s = \sqrt{1.4658} = 1.211$

For Brand B,
$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 39.60933333 - (-0.107666667)(-323) = 4.833$

$s^2 = MSE = \frac{SSE}{n - 2} = \frac{4.833}{15 - 2} = 0.37177$ and $s = \sqrt{0.37177} = .61$

For Brand A, $\hat{y} = 6.62 - .0727x$. For x = 70, $\hat{y} = 6.62 - .0727(70) = 1.531$
2s = 2(1.211) = 2.422
Therefore, $\hat{y} \pm 2s \Rightarrow 1.531 \pm 2.422 \Rightarrow (-.891, 3.953)$

For Brand B, $\hat{y} = 9.31 - .1077x$. For x = 70, $\hat{y} = 9.31 - .1077(70) = 1.771$
2s = 2(.61) = 1.22
Therefore, $\hat{y} \pm 2s \Rightarrow 1.771 \pm 1.22 \Rightarrow (.551, 2.991)$

More confident with Brand B since there is less variation (s is smaller).
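The Brand A and Brand B comparison at x = 70 can be checked with a few lines of arithmetic. This is a sketch only: `two_s_band` is an illustrative name, and the coefficients and s values are the rounded figures from the solution above.

```python
def two_s_band(b0, b1, s, x):
    """Predicted value at x and the rough interval y-hat +/- 2s."""
    y_hat = b0 + b1 * x
    return y_hat, (y_hat - 2 * s, y_hat + 2 * s)


brand_a = two_s_band(6.62, -0.0727, 1.211, 70)
brand_b = two_s_band(9.31, -0.1077, 0.61, 70)

for name, (y_hat, (low, high)) in (("A", brand_a), ("B", brand_b)):
    print(f"Brand {name}: y-hat = {y_hat:.3f}, interval = ({low:.3f}, {high:.3f})")
```

The interval for Brand B is about half as wide as the one for Brand A, which is the basis for preferring Brand B.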
10.36
a.
b. Some preliminary calculations are:

$\sum x = 21$, $\sum y = 21$, $\sum xy = 86$, $\sum x^2 = 91$, $\sum y^2 = 89$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 86 - \frac{21(21)}{7} = 86 - 63 = 23$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 91 - \frac{21^2}{7} = 91 - 63 = 28$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 89 - \frac{21^2}{7} = 26$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{23}{28} = .821428571 \approx .821$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = \frac{21}{7} - .821428571\left(\frac{21}{7}\right) = 3 - 2.4642857 = .535714285 \approx .536$

The fitted line is $\hat{y} = .536 + .821x$.

c. See the plot in part a.

d. To test whether x contributes significant information for predicting y, we test:

$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$

e. The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}}$ where $s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 26 - .821428571(23) = 7.107142857$

$s^2 = \frac{SSE}{n - 2} = \frac{7.107142857}{7 - 2} = 1.421428571$ and $s = \sqrt{1.42143} = 1.1922$

$s_{\hat{\beta}_1} = \frac{1.1922}{\sqrt{28}} = .2253$ and $t = \frac{.82143 - 0}{.2253} = 3.646$

The degrees of freedom for this t is df = n − 2 = 7 − 2 = 5.
f.
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution. From Table VI, Appendix B, t.025 = 2.571 with df = n − 2 = 7 − 2 = 5. The rejection region is t > 2.571 or t < −2.571. Since the observed value of the test statistic falls in the rejection region (t = 3.646 > 2.571), H0 is rejected. There is sufficient evidence to indicate that x contributes information for the prediction of y at α = .05.
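The whole chain in Exercise 10.36, from the summary sums to the test decision, can be sketched as a single function. The function name is illustrative, and the critical value 2.571 is taken from the table lookup in part f rather than computed.

```python
import math


def slope_t_statistic(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """t statistic and df for testing H0: beta1 = 0 in simple linear regression."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    b1 = ss_xy / ss_xx
    sse = ss_yy - b1 * ss_xy
    s = math.sqrt(sse / (n - 2))        # estimated standard deviation of the errors
    se_b1 = s / math.sqrt(ss_xx)        # standard error of the slope
    return b1 / se_b1, n - 2


t_stat, df = slope_t_statistic(7, 21, 21, 86, 91, 89)
t_critical = 2.571                      # t_.025 with 5 df, from the table in part f
reject_h0 = abs(t_stat) > t_critical
print(round(t_stat, 3), df, reject_h0)
```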
10.38
Some preliminary calculations are:

$\sum x = 21$, $\sum y = 19$, $\sum xy = 65$, $\sum x^2 = 91$, $\sum y^2 = 65$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 65 - \frac{21(19)}{6} = 65 - 66.5 = -1.5$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 91 - \frac{21^2}{6} = 91 - 73.5 = 17.5$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 65 - \frac{19^2}{6} = 65 - 60.166667 = 4.8333333$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-1.5}{17.5} = -.085714285 \approx -.0857$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 4.8333333 - (-.085714285)(-1.5) = 4.704761903$

$s^2 = \frac{SSE}{n - 2} = \frac{4.704761903}{6 - 2} = 1.176190476$ and $s = \sqrt{1.176190476} = 1.0845$

To determine whether a straight line is useful for characterizing the relationship between x and y, we test:

$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$

The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-.08571 - 0}{1.0845/\sqrt{17.5}} = -.33$

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 6 − 2 = 4. From Table VI, Appendix B, $t_{.025} = 2.776$. The rejection region is t > 2.776 or t < −2.776.

Since the observed value of the test statistic does not fall in the rejection region (t = −.33 ≮ −2.776), H0 is not rejected. There is insufficient evidence to indicate that a straight line is useful for characterizing the relationship between x and y at α = .05.
10.40
a. To determine if the average state SAT score in 2005 has a positive relationship with the average state SAT score in 1990, we test:

$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 > 0$

b. From the printout in Exercise 10.15, the p-value is p = 0.000. This is the p-value for a 2-tailed test. The p-value for this one-tailed test is 0.000/2 = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate the average state SAT score in 2005 has a positive relationship with the average state SAT score in 1990 at α = .05.

c. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 2 = 51 − 2 = 49, $t_{.025} \approx 2.011$. The 95% confidence interval is:

$\hat{\beta}_1 \pm t_{.025} s_{\hat{\beta}_1} \Rightarrow 1.073 \pm 2.011(.056) \Rightarrow 1.073 \pm .113 \Rightarrow (.960, 1.186)$

We are 95% confident that for each additional point in the 1990 average state SAT score, the increase in the 2005 average state SAT score is between .960 and 1.186.

10.42
From Exercise 10.18, $SS_{xy} = -130.44167$, $\hat{\beta}_1 = -0.002310625$, and $SS_{xx} = 56,452.95833$.

$\sum y = 135.8$, $\sum y^2 = 769.72$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 769.72 - \frac{135.8^2}{24} = 1.3183333$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 1.3183333 - (-0.002310625)(-130.44167) = 1.016931516$

$s^2 = MSE = \frac{SSE}{n - 2} = \frac{1.016931516}{24 - 2} = 0.046224159$ and $s = \sqrt{0.046224159} = 0.214998$

$s_{\hat{\beta}_1} = \frac{\sqrt{MSE}}{\sqrt{SS_{xx}}} = \frac{0.214998}{\sqrt{56,452.95833}} = 0.0009049$

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 2 = 24 − 2 = 22, $t_{.025} = 2.074$. The confidence interval is:

$\hat{\beta}_1 \pm t_{.025} s_{\hat{\beta}_1} \Rightarrow -0.0023 \pm 2.074(0.0009049) \Rightarrow -0.0023 \pm 0.0019 \Rightarrow (-0.0042, -0.0004)$

We are 95% confident that for each additional 1 part per million increase in the amount of soluble pectin, the mean sweetness index will decrease by between .0004 and .0042 points.
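The interval arithmetic in Exercise 10.42 is simple enough to script once the slope, its standard error, and the tabled t value are known. A minimal sketch follows; the function name is illustrative, and the critical value 2.074 is supplied from the table lookup above rather than computed.

```python
def slope_confidence_interval(b1, se_b1, t_critical):
    """Confidence interval for the slope: b1 +/- t * (standard error of b1)."""
    half_width = t_critical * se_b1
    return b1 - half_width, b1 + half_width


# Values from Exercise 10.42 (22 df, 95% confidence)
low, high = slope_confidence_interval(-0.002310625, 0.0009049, 2.074)
print(round(low, 4), round(high, 4))
```

Using the unrounded slope instead of −0.0023 gives the same endpoints to four decimal places.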
10.44
a. From Exercise 10.23, $SS_{xy} = -787.51087$, $SS_{xx} = 6,906.6087$, $\sum y = 60.1$, $\sum y^2 = 262.271$, and $\hat{\beta}_1 = -0.114022801$.

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 262.271 - \frac{(60.1)^2}{23} = 262.271 - 157.043913 = 105.227087$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 105.227087 - (-0.114022801)(-787.51087) = 15.43289179$

$s^2 = MSE = \frac{SSE}{n - 2} = \frac{15.43289179}{23 - 2} = 0.734899609$ and $s = \sqrt{0.734899609} = 0.8573$

$s_{\hat{\beta}_1} = \frac{\sqrt{MSE}}{\sqrt{SS_{xx}}} = \frac{\sqrt{0.734899609}}{\sqrt{6,906.6087}} = 0.010315$

To determine if the mass of the spill tends to diminish linearly as time increases, we test:

$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 < 0$

The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-0.114022801}{0.010315} = -11.05$

The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − 2 = 23 − 2 = 21. From Table VI, Appendix B, $t_{.05} = 1.721$. The rejection region is t < −1.721.

Since the observed value of the test statistic falls in the rejection region (t = −11.05 < −1.721), H0 is rejected. There is sufficient evidence to indicate the mass of the spill tends to diminish linearly as time increases at α = .05.

b. For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − 2 = 23 − 2 = 21, $t_{.025} = 2.080$. The 95% confidence interval is:

$\hat{\beta}_1 \pm t_{.025} s_{\hat{\beta}_1} \Rightarrow -0.1140 \pm 2.080(0.010315) \Rightarrow -0.1140 \pm 0.02146 \Rightarrow (-0.13546, -0.09254)$

We are 95% confident that for each additional minute of elapsed time, the decrease in spill mass is between 0.09254 and 0.13546.
10.46
a. Using MINITAB, the scattergram is:

It appears from the plot that as the percentage of the population that is minority increases, the number of people per branch bank tends to increase.

b. The value of $\beta_1$ will be positive. As one variable increases, the other tends to increase.

c. $\sum x = 363.8$, $\sum y = 56,560$, $\sum xy = 1,075,763$, $\sum x^2 = 9,020.86$, $\sum y^2 = 158,763,894$

$\bar{x} = \frac{\sum x}{n} = \frac{363.8}{21} = 17.32380952$ and $\bar{y} = \frac{\sum y}{n} = \frac{56,560}{21} = 2,693.33333$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1,075,763 - \frac{363.8(56,560)}{21} = 1,075,763 - 979,834.6667 = 95,928.3333$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 9,020.86 - \frac{363.8^2}{21} = 9,020.86 - 6,302.401905 = 2,718.458095$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{95,928.3333}{2,718.458095} = 35.28777342 \approx 35.288$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 158,763,894 - \frac{56,560^2}{21} = 158,763,894 - 152,334,933.3 = 6,428,960.7$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 6,428,960.7 - 35.28777342(95,928.3333) = 6,428,960.7 - 3,385,097.29 = 3,043,863.41$

$s^2 = \frac{SSE}{n - 2} = \frac{3,043,863.41}{21 - 2} = 160,203.3374$

$s = \sqrt{s^2} = \sqrt{160,203.3374} = 400.2541$

To determine if the data support the charge made against the New Jersey banking community, we test:

$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$

The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{35.288 - 0}{400.2541/\sqrt{2,718.458095}} = 4.597$

The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n − 2 = 21 − 2 = 19. From Table VI, Appendix B, $t_{.005} = 2.861$. The rejection region is t < −2.861 or t > 2.861.

Since the observed value of the test statistic falls in the rejection region (t = 4.597 > 2.861), H0 is rejected. There is sufficient evidence to support the charge made against the New Jersey banking community at α = .01.

10.48
a.
b.
Using MINITAB, the regression analysis is: Regression Analysis: Index versus Interactions The regression equation is Index = 44.1 + 0.237 Interactions Predictor Constant Interact S = 19.40
Coef 44.130 0.2366
SE Coef 9.362 0.1865
R-Sq = 8.6%
T 4.71 1.27
P 0.000 0.222
R-Sq(adj) = 3.3%
Analysis of Variance Source Regression Residual Error Total
DF 1 17 18
SS 606.0 6400.6 7006.6
MS 606.0 376.5
F 1.61
P 0.222
From the printout, the least squares line is yˆ = 44.13 + .2366x.
c.
From the printout, s = 19.40 The standard deviation s represents the spread of the manager success index about the least squares line. Approximately 95% of the manager success indexes should lie within 2s = 2(19.40) = 38.8 of the least squares line.
d.
Refer to the scattergram in part a. The number of interactions with outsiders might contribute some information in the prediction of managerial success, but it does not look like a very strong relationship.
e.
To determine if the number of interactions contributes information for the prediction of managerial success, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = 1.27$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 19 − 2 = 17. From Table VI, Appendix B, t.025 = 2.110. The rejection region is t > 2.110 or t < −2.110. Since the observed value of the test statistic does not fall in the rejection region (t = 1.27 ≯ 2.110), H0 is not rejected. There is insufficient evidence to indicate the number of interactions contributes information for the prediction of managerial success at α = .05.
f.
For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = 17, t.025 = 2.110. The 95% confidence interval is:
$\hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow .2366 \pm 2.110(.1865) \Rightarrow .2366 \pm .3935 \Rightarrow (-.1569, .6301)$
We are 95% confident the change in the mean manager success index for each additional interaction with outsiders is between −.1569 and .6301.
10.50
a.
Using MINITAB, the regression analysis is:

Regression Analysis: Risk versus Credit

The regression equation is
Risk = 56.2 - 0.400 Credit

Predictor       Coef  SE Coef      T      P
Constant      56.215    6.033   9.32  0.000
Credit      -0.39961  0.09152  -4.37  0.000

S = 12.6777   R-Sq = 33.4%   R-Sq(adj) = 31.7%

Analysis of Variance
Source           DF      SS      MS      F      P
Regression        1  3064.4  3064.4  19.07  0.000
Residual Error   38  6107.5   160.7
Total            39  9171.9
To determine if country credit risk contributes information for the prediction of market volatility, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = -4.37$ (from printout).
The p-value is .000. Since the p-value is so small, there is strong evidence to indicate that country credit risk contributes information for the prediction of market volatility for any α > .000.
b.
Using MINITAB, a scattergram of the data with the fitted regression line is:
[Regression Plot: Risk (vertical axis) versus Credit (horizontal axis) with fitted line Risk = 56.22 − .3996 Credit; S = 12.6777, R-Sq = 33.4%, R-Sq(adj) = 31.7%]
From the plot, there appear to be several outliers. Observations 1, 19, 34, and 36 have arrows pointing at them.
c.
Eliminating those four data points and using MINITAB, the regression analysis is as follows:

The regression equation is
Risk = 48.9 - 0.316 Credit

Predictor       Coef    Stdev  t-ratio      p
Constant      48.891    3.991    12.25  0.000
Credit      -0.31599  0.05883    -5.37  0.000

s = 7.46401   R-sq = 45.9%   R-sq(adj) = 44.3%

Analysis of Variance
SOURCE       DF      SS      MS      F      p
Regression    1  1607.4  1607.4  28.85  0.000
Error        34  1894.2    55.7
Total        35  3501.6

Unusual Observations
Obs.    C2     C1    Fit  Stdev.Fit  Residual  St.Resid
  4   35.1  63.70  37.80       2.13     25.90     3.62R
 25   25.3  23.30  40.90       2.62    -17.60    -2.52R
 27   55.6  46.40  31.32       1.35     15.08     2.05R

R denotes an obs. with a large st. resid.
After eliminating the four data points, the regression analysis is very similar. The fitted regression line is:
$\hat{y} = 48.891 - .31599x$
To determine if country credit risk contributes information for the prediction of market volatility, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = -5.37$ (from printout).
The p-value is .000. Since the p-value is so small, there is strong evidence to indicate that country credit risk contributes information for the prediction of market volatility for any α > .000. The standard error for the analysis when the four data points have been removed (s = 7.464) is much smaller than the standard error with all the data points (s = 12.6777).
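The t values quoted from the two printouts are simply the slope estimate divided by its standard error. A minimal sketch, assuming the Coef and SE Coef (Stdev) entries for Credit shown in the printouts; the function name `t_ratio` is hypothetical.

```python
def t_ratio(coef, se_coef):
    """Test statistic for H0: beta1 = 0, computed from printout entries."""
    return coef / se_coef

# Credit row, all 40 observations
t_all = t_ratio(-0.39961, 0.09152)
# Credit row, after removing the four flagged observations
t_trimmed = t_ratio(-0.31599, 0.05883)

print(round(t_all, 2), round(t_trimmed, 2))
```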
10.52
a.
r = 1 implies x and y are perfectly, positively related.
b.
r = −1 implies x and y are perfectly, negatively related.
c.
r = 0 implies x and y are not linearly related.
d.
r = .90 implies x and y are positively related. Since r is close to 1, the strength of the relationship is very high.
e.
r = .10 implies x and y are positively related. Since r is close to 0, the relationship is fairly weak.
f.
r = −.88 implies x and y are negatively related. Since r is close to −1, the relationship is fairly strong.
10.54
a.
Some preliminary calculations are:
$\sum x = 0$, $\sum y = 12$, $\sum x^2 = 10$, $\sum xy = 20$, $\sum y^2 = 70$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 20 - \frac{0(12)}{5} = 20$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 10 - \frac{0^2}{5} = 10$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 70 - \frac{12^2}{5} = 41.2$

$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{20}{\sqrt{10(41.2)}} = .9853$

$r^2 = .9853^2 = .9709$

Since r = .9853, there is a very strong positive linear relationship between x and y. Since r² = .9709, 97.09% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.
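The same steps apply to every part of this exercise, so they can be sketched once as a function. This is an illustrative sketch, assuming the summary sums for part a (n = 5); the name `correlation_from_sums` is hypothetical.

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Coefficient of correlation r computed from summary sums."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Part a: sum x = 0, sum y = 12, sum xy = 20, sum x^2 = 10, sum y^2 = 70
r = correlation_from_sums(5, 0, 12, 20, 10, 70)
print(round(r, 4), round(r ** 2, 4))
```

Squaring the returned value gives the coefficient of determination used in the interpretation.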
b.
Some preliminary calculations are:
$\sum x = 0$, $\sum y = 16$, $\sum x^2 = 10$, $\sum xy = -15$, $\sum y^2 = 74$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = -15 - \frac{0(16)}{5} = -15$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 10 - \frac{0^2}{5} = 10$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 74 - \frac{16^2}{5} = 22.8$

$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{-15}{\sqrt{10(22.8)}} = -.9934$

$r^2 = (-.9934)^2 = .9868$

Since r = −.9934, there is a very strong negative linear relationship between x and y. Since r² = .9868, 98.68% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.
c.
Some preliminary calculations are:
$\sum x = 18$, $\sum y = 14$, $\sum x^2 = 52$, $\sum xy = 36$, $\sum y^2 = 32$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 36 - \frac{18(14)}{7} = 0$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 52 - \frac{18^2}{7} = 5.71428571$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 32 - \frac{14^2}{7} = 4$

$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{0}{\sqrt{5.71428571(4)}} = 0$

$r^2 = 0^2 = 0$

Since r = 0, this implies that x and y are not linearly related. Since r² = 0, 0% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.
d.
Some preliminary calculations are:
$\sum x = 15$, $\sum y = 4$, $\sum x^2 = 71$, $\sum xy = 12$, $\sum y^2 = 6$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 12 - \frac{15(4)}{5} = 0$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 71 - \frac{15^2}{5} = 26$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 6 - \frac{4^2}{5} = 2.8$

$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{0}{\sqrt{26(2.8)}} = 0$

$r^2 = 0^2 = 0$

Since r = 0, this implies that x and y are not linearly related. Since r² = 0, 0% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.
10.56
a.
From the printout, r² = R-Sq = 89.3%. 89.3% of the total sample variability around the sample mean asking price is explained by the linear relationship between asking price and number of carats of the diamond.
b.
$r = \sqrt{r^2} = \sqrt{.893} = .945$. The value of r has the same sign as $\hat{\beta}_1$, which is positive. Since r is very close to 1, there is a strong positive linear relationship between asking price and number of carats of the diamond.
10.58
a.
Since r = .43, there is a fairly weak positive linear relationship between total time allotted to sports and audience rating.
b.
$r^2 = .43^2 = .1849$. Since r² = .1849, 18.49% of the total sample variability around the sample mean audience rating is explained by the linear relationship between audience rating and total time allotted to sports.
10.60
a.
Using MINITAB, a scattergram of the data is:
[Scatterplot of NetWorth (vertical axis) vs Age (horizontal axis)]
There appears to be a slight increase in the Net Worth as age increases, but the relationship is fairly weak. b.
Some preliminary calculations are:
$\sum x = 859$, $\sum y = 303.8$, $\sum x^2 = 53,567$, $\sum y^2 = 8,202.28$, $\sum xy = 17,841.6$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 17,841.6 - \frac{859(303.8)}{15} = 17,841.6 - 17,397.61333 = 443.98667$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 53,567 - \frac{(859)^2}{15} = 53,567 - 49,192.06667 = 4,374.93333$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 8,202.28 - \frac{(303.8)^2}{15} = 8,202.28 - 6,152.962667 = 2,049.317333$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{443.98667}{4,374.93333} = 0.101484213 \approx 0.1015$

$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{443.98667}{\sqrt{4,374.93333(2,049.317333)}} = .1483$
Since r is positive, there is a very weak positive linear relationship between a person’s net worth and his/her age. c.
If r had a negative sign, the interpretation would be: Since r is negative, there is a very weak negative linear relationship between a person’s net worth and his/her age.
10.62
From Exercises 10.23 and 10.44, SSxy = -787.51087, SSxx = 6,906.6087, and SSyy = 105.227087.
$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{-787.51087}{\sqrt{6,906.6087(105.227087)}} = -.924$
There is a very strong negative linear relationship between mass of spill and elapsed time of the spill.
$r^2 = (-.924)^2 = .854$
Approximately 85.4% of the variability in the mass of the spill around the sample mean is explained by the linear relationship between mass of the spill and elapsed time of the spill.
10.64
a.
Using MINITAB, the scattergram is:
[Scattergram of WeightChg (vertical axis) versus Digest (horizontal axis)]
b.
Some preliminary calculations are:
$\sum x = 1,266.5$, $\sum y = 46$, $\sum x^2 = 57,390.75$, $\sum y^2 = 1,075.5$, $\sum xy = 4,103.25$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 4,103.25 - \frac{1,266.5(46)}{42} = 2,716.130952$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 57,390.75 - \frac{(1,266.5)^2}{42} = 19,199.74405$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1,075.5 - \frac{46^2}{42} = 1,025.119048$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{2,716.130952}{19,199.74405} = 0.141467039$

$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{2,716.130952}{\sqrt{19,199.74405(1,025.119048)}} = .6122$
There is a moderate positive linear relationship between digestion efficiency and weight change. c.
To determine whether weight change is correlated with digestion, we test:
H0: ρ = 0
Ha: ρ ≠ 0
The test statistic is $t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{.6122}{\sqrt{\frac{1 - .6122^2}{42 - 2}}} = 4.90$
The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n – 2 = 42 – 2 = 40. From Table VI, Appendix B, t.005 = 2.704. The rejection region is t > 2.704 or t < −2.704. Since the observed value of the test statistic falls in the rejection region (t = 4.90 > 2.704), H0 is rejected. There is sufficient evidence to indicate weight change and digestion are correlated at α = .01. d.
After deleting the data corresponding to duck chow, the preliminary calculations are:
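$\sum x = 701.50$, $\sum y = -18$, $\sum x^2 = 21,069$, $\sum y^2 = 404.00$, $\sum xy = 99.5$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 99.5 - \frac{701.50(-18)}{33} = 482.1363636$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 21,069 - \frac{(701.50)^2}{33} = 6,156.81061$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 404 - \frac{(-18)^2}{33} = 394.1818182$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{482.1363636}{6,156.81061} = 0.078309435$

$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{482.1363636}{\sqrt{6,156.81061(394.1818182)}} = .3095$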
There is a rather weak positive linear relationship between digestion efficiency and weight change.
To determine whether weight change is correlated with digestion, we test:
H0: ρ = 0
Ha: ρ ≠ 0
The test statistic is $t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{.3095}{\sqrt{\frac{1 - .3095^2}{33 - 2}}} = 1.81$
The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n – 2 = 33 – 2 = 31. From Table VI, Appendix B, t.005 = 2.750. The rejection region is t > 2.750 or t < −2.750.
Since the observed value of the test statistic does not fall in the rejection region (t = 1.81 ≯ 2.750), H0 is not rejected. There is insufficient evidence to indicate weight change and digestion are correlated at α = .01.
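The two correlation tests in parts c and d differ only in r and n, which makes the effect of dropping the duck chow observations easy to see in code. A minimal sketch, assuming the r values and table critical values quoted above; the function names are hypothetical.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0 with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def rejects_two_tailed(t, t_crit):
    """True when t falls in the two-tailed rejection region |t| > t_crit."""
    return abs(t) > t_crit

# Part c: all 42 observations, critical value t.005 = 2.704 (df = 40)
t_all = correlation_t(0.6122, 42)
# Part d: duck chow removed, 33 observations, critical value t.005 = 2.750 (df = 31)
t_no_duck_chow = correlation_t(0.3095, 33)

print(round(t_all, 2), rejects_two_tailed(t_all, 2.704))
print(round(t_no_duck_chow, 2), rejects_two_tailed(t_no_duck_chow, 2.750))
```

The critical values are passed in from Table VI rather than computed, so the sketch needs only the standard library.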
e.
Using MINITAB, the scattergram is:
[Scattergram of Digest (vertical axis) versus Fiber (horizontal axis)]
Some preliminary calculations are:
$\sum x = 943.5$, $\sum y = 1,266.5$, $\sum x^2 = 24,533.25$, $\sum y^2 = 57,390.75$, $\sum xy = 21,405.5$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 21,405.5 - \frac{943.5(1,266.5)}{42} = -7,045.51786$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 24,533.25 - \frac{(943.5)^2}{42} = 3,338.19643$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 57,390.75 - \frac{1,266.5^2}{42} = 19,199.74405$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-7,045.51786}{3,338.19643} = -2.110576177$

$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{-7,045.51786}{\sqrt{3,338.19643(19,199.74405)}} = -.8801$
There is a fairly strong negative linear relationship between digestion efficiency and acid-detergent fiber.
To determine whether acid-detergent fiber is correlated with digestion, we test:
H0: ρ = 0
Ha: ρ ≠ 0
The test statistic is $t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{-.8801}{\sqrt{\frac{1 - (-.8801)^2}{42 - 2}}} = -11.72$
The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n – 2 = 42 – 2 = 40. From Table VI, Appendix B, t.005 = 2.704. The rejection region is t > 2.704 or t < −2.704. Since the observed value of the test statistic falls in the rejection region (t = −11.72 < −2.704), H0 is rejected. There is sufficient evidence to indicate acid-detergent fiber and digestion are correlated at α = .01. After deleting the data corresponding to duck chow, the preliminary calculations are:
$\sum x = 877$, $\sum y = 701.50$, $\sum x^2 = 24,036.5$, $\sum y^2 = 21,069$, $\sum xy = 17,274$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 17,274 - \frac{877(701.50)}{33} = -1,368.89394$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 24,036.5 - \frac{(877)^2}{33} = 729.56061$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 21,069 - \frac{(701.50)^2}{33} = 6,156.81061$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-1,368.89394}{729.56061} = -1.876326547$

$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{-1,368.89394}{\sqrt{729.56061(6,156.81061)}} = -.6459$
There is a moderate negative linear relationship between digestion efficiency and acid-detergent fiber. To determine whether acid-detergent fiber is correlated with digestion, we test: H0: ρ = 0 Ha: ρ ≠ 0
The test statistic is $t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{-.6459}{\sqrt{\frac{1 - (-.6459)^2}{33 - 2}}} = -4.71$
The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n – 2 = 33 – 2 = 31. From Table VI, Appendix B, t.005 = 2.750. The rejection region is t > 2.750 or t < −2.750. Since the observed value of the test statistic falls in the rejection region (t = −4.71 < −2.750), H0 is rejected. There is sufficient evidence to indicate acid-detergent fiber and digestion are correlated at α = .01. 10.66
a.
b.
Some preliminary calculations are:
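$\sum x = 28$, $\sum y = 37$, $\sum x^2 = 224$, $\sum y^2 = 307$, $\sum xy = 254$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 254 - \frac{28(37)}{7} = 106$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 224 - \frac{28^2}{7} = 112$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 307 - \frac{37^2}{7} = 111.4285714$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{106}{112} = .946428571$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{37}{7} - .946428571\left(\frac{28}{7}\right) = 1.5$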
The least squares line is yˆ = 1.5 + .946x. c.
$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 111.4285714 - (.946428571)(106) = 11.1071429$

$s^2 = \frac{SSE}{n - 2} = \frac{11.1071429}{7 - 2} = 2.22143$
d.
The form of the confidence interval is:
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
where $s = \sqrt{s^2} = \sqrt{2.22143} = 1.4904$
For $x_p = 3$, $\hat{y} = 1.5 + .946(3) = 4.338$ and $\bar{x} = \frac{28}{7} = 4$
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, t.05 = 2.015 with df = n − 2 = 7 − 2 = 5. The 90% confidence interval is:
$4.338 \pm 2.015(1.4904)\sqrt{\frac{1}{7} + \frac{(3 - 4)^2}{112}} \Rightarrow 4.338 \pm 1.170 \Rightarrow (3.168, 5.508)$
e.
The form of the prediction interval is:
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
The 90% prediction interval is:
$4.338 \pm 2.015(1.4904)\sqrt{1 + \frac{1}{7} + \frac{(3 - 4)^2}{112}} \Rightarrow 4.338 \pm 3.223 \Rightarrow (1.115, 7.561)$
f.
The 90% prediction interval for y is wider than the 90% confidence interval for the mean value of y when xp = 3. The error of predicting a particular value of y will be larger than the error of estimating the mean value of y for a particular x value. This is true since the error in estimating the mean value of y for a given x value is the distance between the least squares line and the true line of means, while the error in predicting some future value of y is the sum of two errors: the error of estimating the mean of y plus the random error that is a component of the value of y to be predicted.
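The difference between the two intervals comes down to the extra 1 under the square root. The following sketch, assuming the values used in parts d and e (t.05 = 2.015, s = 1.4904, n = 7, xp = 3, mean of x = 4, SSxx = 112), computes both half-widths; the function name `half_widths` is hypothetical.

```python
import math

def half_widths(t_crit, s, n, x_p, x_bar, ss_xx):
    """Half-widths of the confidence interval for E(y) and the prediction interval for y."""
    core = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(core)       # estimating the mean of y
    pi_half = t_crit * s * math.sqrt(1 + core)   # predicting an individual y
    return ci_half, pi_half

ci_half, pi_half = half_widths(2.015, 1.4904, 7, 3, 4, 112)
print(round(ci_half, 3), round(pi_half, 3))
```

Because `1 + core` always exceeds `core`, the prediction half-width is larger for any choice of inputs, which is the point made in part f.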
10.68
a.
The form of the confidence interval is:
$\bar{y} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$ where $\bar{y} = \frac{\sum y}{n} = \frac{22}{10} = 2.2$

$s^2 = \frac{\sum y^2 - \frac{(\sum y)^2}{n}}{n - 1} = \frac{82 - \frac{(22)^2}{10}}{10 - 1} = 3.7333$ and $s = 1.9322$

For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, t.025 = 2.262 with df = n − 1 = 10 − 1 = 9. The 95% confidence interval is:
$2.2 \pm 2.262 \frac{1.9322}{\sqrt{10}} \Rightarrow 2.2 \pm 1.382 \Rightarrow (.818, 3.582)$
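This interval ignores x entirely, so it needs only the sums of y and y². A minimal sketch, assuming n = 10, ∑y = 22, ∑y² = 82, and the table value t.025 = 2.262 from above; the function name `mean_ci` is hypothetical.

```python
import math

def mean_ci(n, sum_y, sum_y2, t_crit):
    """Confidence interval for the mean of y, ignoring x."""
    y_bar = sum_y / n
    s = math.sqrt((sum_y2 - sum_y ** 2 / n) / (n - 1))  # sample standard deviation
    half = t_crit * s / math.sqrt(n)
    return y_bar - half, y_bar + half

low, high = mean_ci(10, 22, 82, 2.262)
print(round(low, 3), round(high, 3))
```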
b.
c.
The confidence intervals computed in Exercise 10.63 are much narrower than that found in part a. Thus, x appears to contribute information about the mean value of y.
d.
From Exercise 10.63, $\hat{\beta}_1 = .843$, s = .8619, SSxx = 38.9, and n = 10.
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{\hat{\beta}_1 - 0}{s / \sqrt{SS_{xx}}} = \frac{.843 - 0}{.8619 / \sqrt{38.9}} = 6.10$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 10 − 2 = 8. From Table VI, Appendix B, t.025 = 2.306. The rejection region is t > 2.306 or t < −2.306. Since the observed value of the test statistic falls in the rejection region (t = 6.10 > 2.306), H0 is rejected. There is sufficient evidence to indicate the straight-line model contributes information for the prediction of y at α = .05. 10.70
a.
The 95% confidence interval for E(y) when x = .52 is (3,598.1, 3,868.1). We are 95% confident that the mean asking price for a diamond weighing .52 carats is between $3,598.10 and $3,868.10.
b.
The 95% prediction interval for y when x = .52 is (1,529.8, 5,936.3). We are 95% confident that the actual asking price for a diamond weighing .52 carats is between $1,529.80 and $5,936.30.
10.72
Answers may vary. One possible answer is: The 90% confidence interval for x = 220.00 is (5.64898, 5.83848). We are 90% confident that the mean sweetness index of all orange juice samples will be between 5.64898 and 5.83848 when the pectin value is 220.00 parts per million.
10.74
a.
Using MINITAB, the results of the regression analysis are:

Regression Analysis: Managers versus UnitsSold

The regression equation is
Managers = 5.33 + 0.586 UnitsSold

Predictor      Coef  SE Coef      T      P
Constant      5.325    1.180   4.51  0.000
UnitsSol    0.58610  0.03818  15.35  0.000

S = 2.566   R-Sq = 92.9%   R-Sq(adj) = 92.5%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        1  1552.0  1552.0  235.63  0.000
Residual Error   18   118.6     6.6
Total            19  1670.5
To determine the usefulness of the model, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = 15.35$ (from printout).
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 20 − 2 = 18. From Table VI, Appendix B, t.025 = 2.101. The rejection region is t > 2.101 or t < −2.101. Since the observed value of the test statistic falls in the rejection region (t = 15.35 > 2.101), H0 is rejected. There is sufficient evidence to indicate the model is useful at α = .05. Therefore, the monthly sales is useful in predicting the number of managers at α = .05. b.
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, t.05 = 1.734 with df = 18.
For $x_p = 39$, $\bar{x} = \frac{\sum x}{n} = \frac{540}{20} = 27$, and $\hat{y} = 5.325 + .5861(39) = 28.1829$.
The form of the prediction interval is:
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 28.183 \pm 1.734(2.5664)\sqrt{1 + \frac{1}{20} + \frac{(39 - 27)^2}{4,518}}$
$\Rightarrow 28.183 \pm 4.629 \Rightarrow (23.554, 32.812)$
c.
We are 90% confident the actual number of managers needed when 39 units are sold is between 23.55 and 32.81.
10.76
a.
From Exercise 10.34, SSxx = 3000 and $\bar{x} = 50$. Also, for Brand A, s = 1.211; for Brand B, s = .610. For Brand A, $\hat{y} = 6.62 - .0727(45) = 3.349$, while for Brand B, $\hat{y} = 9.31 - .1077(45) = 4.464$. The degrees of freedom for both brands is n − 2 = 15 − 2 = 13. For confidence coefficient .90 (i.e., for all parts of this question), α = .10 and α/2 = .05. From Table VI, Appendix B, with df = 13, t.05 = 1.771.
The form of both confidence intervals is
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
For Brand A, we obtain:
$3.349 \pm 1.771(1.211)\sqrt{\frac{1}{15} + \frac{(45 - 50)^2}{3000}} \Rightarrow 3.349 \pm .587 \Rightarrow (2.762, 3.936)$
For Brand B, we obtain:
$4.464 \pm 1.771(.610)\sqrt{\frac{1}{15} + \frac{(45 - 50)^2}{3000}} \Rightarrow 4.464 \pm .296 \Rightarrow (4.168, 4.760)$
The first interval is wider, caused by the larger value of s.
b.
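The form of both prediction intervals is
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
For Brand A, we obtain:
$3.349 \pm 1.771(1.211)\sqrt{1 + \frac{1}{15} + \frac{(45 - 50)^2}{3000}} \Rightarrow 3.349 \pm 2.224 \Rightarrow (1.125, 5.573)$
For Brand B, we obtain:
$4.464 \pm 1.771(.610)\sqrt{1 + \frac{1}{15} + \frac{(45 - 50)^2}{3000}} \Rightarrow 4.464 \pm 1.120 \Rightarrow (3.344, 5.584)$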
Again, the first interval is wider, caused by the larger value of s. Each of these intervals is wider than its counterpart from part a, since, for the same x, a prediction interval for an individual y is always wider than a confidence interval for the mean of y. This is due to an individual observation having a greater variance than the variance of the mean of a set of observations.
c.
To obtain a prediction interval for the life of a brand A cutting tool that is operated at 100 meters per minute, we use:
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
For x = 100, $\hat{y} = 6.62 - .0727(100) = -.65$. The degrees of freedom are n − 2 = 15 − 2 = 13. For confidence coefficient .95, α = .05 and α/2 = .025. From Table VI, Appendix B, with df = 13, t.025 = 2.160. Here, we obtain:
$-.65 \pm 2.160(1.211)\sqrt{1 + \frac{1}{15} + \frac{(100 - 50)^2}{3000}} \Rightarrow -.65 \pm 3.606 \Rightarrow (-4.256, 2.956)$
The additional assumption would be that the straight line model fits the data well for the x's actually observed all the way up to the value under consideration, 100. Clearly from the estimated value of −.65, this is not true (usually, negative "useful lives" are not found). 10.78
a.
b.
One possible line is $\hat{y} = x$.

  x    y    ŷ    y − ŷ
  1    1    1      0
  3    3    3      0
  5    5    5      0
              Total  0

For this example $\sum (y - \hat{y}) = 0$

A second possible line is $\hat{y} = 3$.

  x    y    ŷ    y − ŷ
  1    1    3     −2
  3    3    3      0
  5    5    3      2
              Total  0

For this example $\sum (y - \hat{y}) = 0$
c.
Some preliminary calculations are:
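$\sum x = 9$, $\sum y = 9$, $\sum x^2 = 35$, $\sum y^2 = 35$, $\sum xy = 35$

$SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 35 - \frac{9(9)}{3} = 8$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 35 - \frac{9^2}{3} = 8$

$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 35 - \frac{9^2}{3} = 8$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{8}{8} = 1$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{9}{3} - 1\left(\frac{9}{3}\right) = 0$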
The least squares line is yˆ = 0 + 1x = x. d.
For $\hat{y} = x$, $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 8 - 1(8) = 0$
For $\hat{y} = 3$, $SSE = \sum (y_i - \hat{y}_i)^2 = (1 - 3)^2 + (3 - 3)^2 + (5 - 3)^2 = 8$
The least squares line has the smallest SSE of all possible lines.
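Both candidate lines have errors that sum to zero, so only the sum of squared errors separates them. A small sketch, assuming the three data points (1, 1), (3, 3), (5, 5) from part b; the function name `residual_summary` is hypothetical.

```python
# Data points from part b
xs = [1, 3, 5]
ys = [1, 3, 5]

def residual_summary(predict):
    """Return (sum of errors, sum of squared errors) for a candidate line."""
    residuals = [y - predict(x) for x, y in zip(xs, ys)]
    return sum(residuals), sum(e ** 2 for e in residuals)

line_y_equals_x = residual_summary(lambda x: x)   # candidate line y-hat = x
line_y_equals_3 = residual_summary(lambda x: 3)   # candidate line y-hat = 3

print(line_y_equals_x)  # (0, 0)
print(line_y_equals_3)  # (0, 8)
```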
10.80
a.
The variables x and y do appear to be related. It appears when x increases, y tends to increase. b.
$r = \sqrt{r^2} = \sqrt{.612} = .7823$
The correlation between concentration and exhaustion index is .7823. This relationship is positive since r > 0. The relationship is fairly strong. No, this does not mean that concentration causes emotional exhaustion. They are just related.
c.
To determine if the straight-line relationship is useful, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = 6.03$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 25 − 2 = 23. From Table VI, Appendix B, t.025 = 2.069. The rejection region is t > 2.069 or t < −2.069. Since the observed value of the test statistic falls in the rejection region (t = 6.03 > 2.069), H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting burnout at α = .05. d.
r² = .612. 61.2% of the sample variation of exhaustion index is explained by the linear relationship between the exhaustion index and concentration.
e.
For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n – 2 = 25 – 2 = 23, t.025 = 2.069. The 95% confidence interval is:
$\hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow 8.865 \pm 2.069(1.471) \Rightarrow 8.865 \pm 3.043 \Rightarrow (5.822, 11.908)$
We are 95% confident that the change in mean exhaustion index for each unit change in concentration is between 5.822 and 11.908. f.
For confidence coefficient .95, α = 1 − .95 = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, t.025 = 2.069 with df = 23. The confidence interval is:
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$ where $\hat{y} = -29.497 + 8.865(80) = 679.703$
$\Rightarrow 679.703 \pm 2.069(174.2074)\sqrt{\frac{1}{25} + \frac{(80 - 68.56)^2}{14,026.16}} \Rightarrow 679.703 \pm 80.054 \Rightarrow (599.649, 759.757)$
We are 95% confident that the interval from 599.649 to 759.757 encloses the mean exhaustion level for all professionals who have 80% of their social contacts within their work groups.
10.82
a.
$\sum x = 590,124$, $\sum y = 30,537.4$, $\sum x^2 = 27,727,637,890$, $\sum y^2 = 73,506,140.4$, $\sum xy = 1,396,503,941$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1,396,503,941 - \frac{590,124(30,537.4)}{13} = 10,284,507$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 27,727,637,890 - \frac{590,124^2}{13} = 939,458,250$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{10,284,507}{939,458,250} = .010947274 \approx .0109$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{30,537.4}{13} - \frac{.010947274(590,124)}{13} = 1852.088523 \approx 1852.089$
The least squares line is yˆ = 1852.089 + .0109x. b.
The plot of the data is:
c.
Based on the graph, it does not appear that the line fits the data very well. The points do not lie very close to the line.
d.
Some preliminary calculations are:
$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 73,506,140.4 - \frac{(30,537.4)^2}{13} = 1,772,848.19$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 1,772,848.19 - (0.010947274)(10,284,507) = 1,660,260.874$

$s^2 = MSE = \frac{SSE}{n - 2} = \frac{1,660,260.874}{13 - 2} = 150,932.8067$ and $s = \sqrt{150,932.8067} = 388.501$
For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n – 2 = 13 – 2 = 11, t.025 = 2.201. The 95% confidence interval is:
$\hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow .0109 \pm 2.201 \frac{388.501}{\sqrt{939,458,250}} \Rightarrow .0109 \pm .0279 \Rightarrow (-0.0170, 0.0388)$
e.
Since 0 is contained in the 95% confidence interval, there is no evidence to indicate that there is a linear relationship between buying income and retail sales.
10.84
a.
r = .14. Because this value is close to 0, there is a very weak positive linear relationship between math confidence and computer interest for boys.
b.
r = .33. Because this value is fairly close to 0, there is a weak positive linear relationship between math confidence and computer interest for girls.
10.86
a.
βˆ1 = .020. For each additional 1% increase in leaves infected, the mean log of the average number of infections per leaf is estimated to increase by .02.
b.
r2 = .816. 81.6% of the total sample variability around the sample mean log of the average number of infections per leaf is explained by the linear relationship between the log of the average number of infections per leaf and the percentage of leaves infected.
c.
s = .288. We would expect most of the observed values of the log of the average number of infections per leaf to fall within ±2s or ±2(.288) or .576 units of their predicted values.
d.
$r = \sqrt{.816} = .903$. Because this number is close to 1, there is a fairly strong positive linear relationship between the log of the average number of infections per leaf and the percentage of leaves infected.
e.
To determine if there is a linear relationship between the log of the average number of infections per leaf and the percentage of leaves infected, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is $t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{.903}{\sqrt{(1 - .816)/(100 - 2)}} = 20.83$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 100 − 2 = 98. From Table VI, Appendix B, t.025 ≈ 1.99. The rejection region is t < −1.99 or t > 1.99. Since the observed value of the test statistic falls in the rejection region (t = 20.83 > 1.99), H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between the log of the average number of infections per leaf and the percentage of leaves infected at α = .05.
f.
For xp = 80%, $\hat{y} = -.939 + .020(80) = .661$. The antilog (base 10) of .661 is 4.58. Thus, when the percentage of leaves infected is 80%, the average number of infections per leaf is predicted to be 4.58.
10.88
a.
A straight line model relating an NFL team’s current value to its operating income is: y = β0 + β1x + ε
b.
$\sum x = 1,037.6$, $\sum y = 26,207$, $\sum x^2 = 38,996.28$, $\sum y^2 = 22,024,389$, $\sum xy = 879,473.1$

$\bar{x} = \frac{\sum x}{n} = \frac{1,037.6}{32} = 32.425$

$\bar{y} = \frac{\sum y}{n} = \frac{26,207}{32} = 818.96875$

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 879,473.1 - \frac{1,037.6(26,207)}{32} = 879,473.1 - 849,761.975 = 29,711.125$

$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 38,996.28 - \frac{(1,037.6)^2}{32} = 38,996.28 - 33,644.18 = 5,352.1$

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{29,711.125}{5,352.1} = 5.551302293 \approx 5.551$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 818.96875 - (5.551302293)(32.425) = 638.9677731 \approx 638.968$

The fitted regression line is: $\hat{y} = 638.968 + 5.551x$
c.
$\hat{\beta}_1 = 5.551$. When operating income increases by 1 million dollars, the mean current value is estimated to increase by 5.551 million dollars. This is meaningful for values of operating income between 7.8 and 54.3 million dollars.
βˆ0 = 638.968. This has no meaning since x = 0 is not in the observed range. d.
Some additional calculations are:
$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 22,024,389 - \frac{(26,207)^2}{32} = 22,024,389 - 21,462,714.03 = 561,674.97$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 561,674.97 - 5.551302293(29,711.125) = 396,739.5337$
$s^2 = MSE = \frac{SSE}{n - 2} = \frac{396,739.5337}{32 - 2} = 13,224.65112$ and $s = \sqrt{13,224.65112} = 114.9985$
To determine if a linear relationship exists between current value and operating income, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{5.551 - 0}{114.9985 / \sqrt{5,352.1}} = 3.53$
No α was given so we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n – 2 = 32 – 2 = 30. From Table VI, Appendix B, t.025 = 2.042. The rejection region is t > 2.042 or t < −2.042. Since the observed value of the test statistic falls in the rejection region (t = 3.53 > 2.042), H0 is rejected. There is sufficient evidence to indicate a significant linear relationship between current value and operating income at α = .05.
$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{561,674.97 - 396,739.5337}{561,674.97} = .29365 \approx .294$
29.4% of the sample variation around the sample mean current value is explained by the linear relationship between current value and operating income. There is a significant linear relationship between current value and operating income. However, the relationship is not particularly strong. 10.90
a.
Using MINITAB, the regression analysis is:

Regression Analysis: BTU versus Area

The regression equation is
BTU = - 99045 + 103 Area

Predictor     Coef  SE Coef      T      P
Constant    -99045   261618  -0.38  0.709
Area        102.81    15.86   6.48  0.000

S = 628185   R-Sq = 67.8%   R-Sq(adj) = 66.1%

Analysis of Variance

Source          DF           SS           MS      F      P
Regression       1  1.65850E+13  1.65850E+13  42.03  0.000
Residual Error  20  7.89232E+12  3.94616E+11
Total           21  2.44773E+13

Predicted Values for New Observations

New Obs     Fit  SE Fit            95.0% CI             95.0% PI
1        723467  165874  (377459, 1069475)  (-631816, 2078750)

Values of Predictors for New Observations

New Obs  Area
1        8000

βˆ0 (INTERCEP) = −99045
βˆ1 (AREA) = 102.81
b.
To determine if energy consumption is positively linearly related to the shell area, we test: H0: β1 = 0 Ha: β1 > 0 The test statistic is t = 6.48 (from printout). The rejection region requires α = .10 in the upper tail of the t-distribution with df = n − 2 = 22 − 2 = 20. From Table VI, Appendix B, t.10 = 1.325. The rejection region is t > 1.325. Since the observed value of the test statistic falls in the rejection region (t = 6.48 > 1.325), H0 is rejected. There is sufficient evidence to indicate that energy consumption is positively linearly related to the shell area at α = .10.
c.
Since this is a one-tailed test but the output calculates the p-value for a two-tailed test, the observed significance level is:
\[ \tfrac{1}{2}\left(\text{Prob} > |T|\right) \le \tfrac{1}{2}(.000) = .000 \]
This is the probability of observing our value of t (6.481) or anything larger if β1 = 0. Since this probability is so small, there is strong evidence to reject H0.
d.
$r^2$ = R-Square = .678. 67.8% of the total sample variability in energy consumption around its mean is explained by the linear relationship between energy consumption and shell area.
e.
From the printout, for xp = 8000, $\hat{y}$ = 723,467. The 95% prediction interval is (−631,816, 2,078,750). This interval is so wide, and includes negative BTU values, that it is not very useful.
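The confidence and prediction intervals on the printout can be reproduced approximately from the values of Fit, SE Fit, and S. The sketch below is illustrative only: the critical value t.025 = 2.086 for df = 20 is a standard table value that the solution does not quote, so it is an assumption here, and the variable names are hypothetical.

```python
import math

fit = 723_467.0      # Fit: predicted BTU at Area = 8000 (from the printout)
se_fit = 165_874.0   # SE Fit (from the printout)
s = 628_185.0        # S: estimated standard deviation of the error term
t_crit = 2.086       # assumed table value of t.025 with df = 20

# Confidence interval for the mean response: fit +/- t * SE(fit)
ci_half = t_crit * se_fit
ci = (fit - ci_half, fit + ci_half)

# Prediction interval for one new observation:
# fit +/- t * sqrt(s^2 + SE(fit)^2)
pi_half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - pi_half, fit + pi_half)

print("95% CI:", tuple(round(v) for v in ci))
print("95% PI:", tuple(round(v) for v in pi))
```

The results agree with the printout to within rounding of the critical value. The prediction interval is much wider than the confidence interval because it adds the error variance s² to the variance of the fitted mean.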
10.92
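Some preliminary calculations are:
\[ \sum x = 4305 \qquad \sum y = 201{,}558 \qquad \sum xy = 76{,}652{,}695 \qquad \sum x^2 = 1{,}652{,}025 \qquad \sum y^2 = 3{,}571{,}211{,}200 \]
a.
\[ \hat{\beta}_1 = \frac{\sum xy}{\sum x^2} = \frac{76{,}652{,}695}{1{,}652{,}025} = 46.39923427 \approx 46.3992 \]
The least squares line is $\hat{y} = 46.3992x$.
b.
\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 76{,}652{,}695 - \frac{4305(201{,}558)}{15} = 18{,}805{,}549 \]
\[ SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 1{,}652{,}025 - \frac{4305^2}{15} = 416{,}490 \]
\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{18{,}805{,}549}{416{,}490} = 45.15246224 \approx 45.1525 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{201{,}558}{15} - 45.15246224\left(\frac{4305}{15}\right) = 478.4433 \]
The least squares line is $\hat{y} = 478.4433 + 45.1525x$.
c.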
Because x = 0 is not in the observed range, we are trying to represent the data on the observed interval with the best fitting line. We are not concerned with whether the line goes through (0, 0) or not.
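The two fits in parts a and b differ only in whether the line is forced through the origin. A minimal sketch of both computations from the summary sums quoted above follows; the variable names are illustrative and not part of the original solution.

```python
# Summary sums quoted in the solution (n = 15 observations).
n = 15
sum_x, sum_y = 4305.0, 201_558.0
sum_xy, sum_x2 = 76_652_695.0, 1_652_025.0

# Part a: line through the origin, y = b1 * x, with b1 = sum(xy) / sum(x^2).
b1_origin = sum_xy / sum_x2

# Part b: line with an intercept, y = b0 + b1 * x.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n

print(f"through origin: y-hat = {b1_origin:.4f}x")
print(f"with intercept: y-hat = {b0:.4f} + {b1:.4f}x")
```

Dropping the intercept changes the slope estimate, which is why the choice between the two models is tested formally in part d rather than decided by inspection.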
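d.
Some preliminary calculations are:
\[ SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 3{,}571{,}211{,}200 - \frac{201{,}558^2}{15} = 862{,}836{,}042 \]
\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 862{,}836{,}042 - 45.15246224(18{,}805{,}549) = 13{,}719{,}200.88 \]
\[ s^2 = \frac{SSE}{n-2} = \frac{13{,}719{,}200.88}{15-2} = 1{,}055{,}323.145 \qquad s = 1027.2892 \]
H0: β0 = 0
Ha: β0 ≠ 0
The test statistic is
\[ t = \frac{\hat{\beta}_0 - 0}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}}} = \frac{478.443}{1027.2892\sqrt{\dfrac{1}{15} + \dfrac{287^2}{416{,}490}}} = .906 \]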
The rejection region requires α/2 = .10/2 = .05 in each tail of the t-distribution with df = n − 2 = 15 − 2 = 13. From Table VI, Appendix B, t.05 = 1.771. The rejection region is t < −1.771 or t > 1.771.
Since the observed value of the test statistic does not fall in the rejection region (t = .906 ≯ 1.771), H0 is not rejected. There is insufficient evidence to indicate β0 is different from 0 at α = .10. Thus, β0 should not be included in the model.
10.94
Answers may vary. Possible answer: The scaffold-drop survey provides the most accurate estimate of spall rate in a given wall segment. However, the drop areas were not selected at random from the entire complex; rather, drops were made at areas with high spall concentrations. Therefore, if the photo spall rates could be shown to be related to drop spall rates, then the 83 photo spall rates could be used to predict what the drop spall rates would be. a.
Construct a scattergram for the data.
The scattergram shows a positive relationship between the photo spall rate (x) and the drop spall rate (y).
b.
Find the prediction equation for drop spall rate. The MINITAB output shows the results of the analysis.

The regression equation is
drop = 2.55 + 2.76 photo

Predictor    Coef   StDev      T      P
Constant    2.548   1.637   1.56  0.154
photo      2.7599  0.2180  12.66  0.000

S = 4.164   R-Sq = 94.7%   R-Sq(adj) = 94.1%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       1  2777.5  2777.5  160.23  0.000
Residual Error   9   156.0    17.3
Total           10  2933.5

Unusual Observations

Obs  photo   drop    Fit  StDev Fit  Residual  St Resid
 11   11.8  43.00  35.11       1.97      7.89     2.15R

R denotes an observation with a large standardized residual

$\hat{y} = 2.55 + 2.76x$
c.
Conduct a formal statistical hypothesis test to determine if the photo spall rates contribute information for the prediction of drop spall rates.
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = 12.66, with p-value < .0001. Reject H0 for any level of significance ≥ .0001. There is sufficient evidence to indicate that photo spall rates contribute information for the prediction of drop spall rates at α ≥ .0001.
d.
One could now use the 83 photo spall rates to predict values for the 83 drop spall rates, then use this information to estimate the true spall rate at a given wall segment and to estimate the total spall damage.
Chapter 11
Multiple Regression and Model Building
11.2
a.
βˆ0 = 506.346, βˆ1 = −941.900, βˆ2 = −429.060
b.
$\hat{y} = 506.346 - 941.900x_1 - 429.060x_2$
c.
SSE = 151,016, MSE = 8883, s = 94.251
We expect about 95% of the y-values to fall within ±2s or ±2(94.251) or ±188.502 units of the fitted regression equation.
d.
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is
\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-941.900}{275.08} = -3.42 \]
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 20 - (2 + 1) = 17. From Table VI, Appendix B, t.025 = 2.110. The rejection region is t < −2.110 or t > 2.110. Since the observed value of the test statistic falls in the rejection region (t = −3.42 < −2.110), H0 is rejected. There is sufficient evidence to indicate β1 ≠ 0 at α = .05. e.
For confidence coefficient .95, α = .05 and α/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 20 − (2 + 1) = 17, t.025 = 2.110. The 95% confidence interval is:
\[ \hat{\beta}_2 \pm t_{.025}\, s_{\hat{\beta}_2} \Rightarrow -429.060 \pm 2.110(379.83) \Rightarrow -429.060 \pm 801.441 \Rightarrow (-1230.501,\ 372.381) \]
f.
$R^2$ = R-Sq = 45.9%. 45.9% of the total sample variation of the y values is explained by the model containing x1 and x2.
$R_a^2$ = R-Sq(adj) = 39.6%. 39.6% of the total sample variation of the y values is explained by the model containing x1 and x2, adjusted for the sample size and the number of parameters in the model.
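The adjustment for sample size and number of parameters can be made explicit. The sketch below uses the standard adjusted R² formula with the values from this exercise (n = 20, k = 2); the function name is hypothetical and the code is not part of the original solution.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)


# R-Sq = 45.9% from the printout, n = 20 observations, k = 2 predictors.
r2_adj = adjusted_r2(0.459, n=20, k=2)
print(f"adjusted R^2 = {r2_adj:.3f}")
```

The result is close to the printed 39.6%. Any small discrepancy arises because the printout's R-Sq is itself rounded to one decimal place.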
g.
To determine if at least one of the independent variables is significant in predicting y, we test:
H0: β1 = β2 = 0
Ha: At least one βi ≠ 0
From the printout, the test statistic is F = 7.22.
Since no α level was given, we will choose α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 2 and ν2 = n − (k + 1) = 20 − (2 + 1) = 17. From Table IX, Appendix B, F.05 = 3.59. The rejection region is F > 3.59.
Since the observed value of the test statistic falls in the rejection region (F = 7.22 > 3.59), H0 is rejected. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at α = .05.
h.
The observed significance level of the test is p-value = 0.005. Since the p-value is so small, we will reject H0 for most reasonable values of α. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at α greater than 0.005.
11.4
a.
We are given $\hat{\beta}_1 = 3.1$, $s_{\hat{\beta}_1} = 2.3$, and n = 25.
H0: β1 = 0
Ha: β1 > 0
The test statistic is
\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{3.1}{2.3} = 1.35 \]
The rejection region requires α = .05 in the upper tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717.
Since the observed value of the test statistic does not fall in the rejection region (t = 1.35 ≯ 1.717), H0 is not rejected. There is insufficient evidence to indicate β1 > 0 at α = .05.
b.
We are given $\hat{\beta}_2 = .92$, $s_{\hat{\beta}_2} = .27$, and n = 25.
H0: β2 = 0
Ha: β2 ≠ 0
The test statistic is
\[ t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{.92}{.27} = 3.41 \]
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074. Since the observed value of the test statistic falls in the rejection region (t = 3.41 > 2.074), reject H0. There is sufficient evidence to indicate β2 ≠ 0 at α = .05. c.
For confidence coefficient .90, α = 1 − .90 = .10 and α/2 = .10/2 = .05. From Table VI, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.05 = 1.717. The confidence interval is:
\[ \hat{\beta}_1 \pm t_{.05}\, s_{\hat{\beta}_1} \Rightarrow 3.1 \pm 1.717(2.3) \Rightarrow 3.1 \pm 3.949 \Rightarrow (-.849,\ 7.049) \]
We are 90% confident that β1 falls between −.849 and 7.049. d.
For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 25 − (2 + 1) = 22, t.005 = 2.819. The confidence interval is:
\[ \hat{\beta}_2 \pm t_{.005}\, s_{\hat{\beta}_2} \Rightarrow .92 \pm 2.819(.27) \Rightarrow .92 \pm .761 \Rightarrow (.159,\ 1.681) \]
We are 99% confident that β2 falls between .159 and 1.681. 11.6
a.
For x2 = 1 and x3 = 3,
E(y) = 1 + 2x1 + 1 − 3(3)
E(y) = 2x1 − 7
The graph is:
b.
For x2 = −1 and x3 = 1,
E(y) = 1 + 2x1 + (−1) − 3(1)
E(y) = 2x1 − 3
The graph is:
c.
They are parallel, each with a slope of 2. They have different y-intercepts.
d.
The relationship will be parallel lines.
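The parallel-lines conclusion can be checked numerically. The sketch below assumes the model E(y) = 1 + 2x1 + x2 − 3x3 implied by the substitutions in parts a and b; the function names are hypothetical and not part of the original solution.

```python
def mean_response(x1: float, x2: float, x3: float) -> float:
    """E(y) = 1 + 2*x1 + x2 - 3*x3, the model implied by parts a and b."""
    return 1 + 2 * x1 + x2 - 3 * x3


def line_in_x1(x2: float, x3: float) -> tuple:
    """Return (intercept, slope) of E(y) as a function of x1 with x2 and x3 fixed."""
    intercept = mean_response(0, x2, x3)
    slope = mean_response(1, x2, x3) - intercept
    return intercept, slope


print(line_in_x1(1, 3))    # part a: x2 = 1, x3 = 3
print(line_in_x1(-1, 1))   # part b: x2 = -1, x3 = 1
```

Because the model has no interaction terms, changing x2 and x3 shifts only the intercept, so the slope in x1 is the same for every combination of fixed values.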
11.8
No. There may be other independent variables that are important that have not been included in the model, while there may also be some variables included in the model which are not important. The only conclusion is that at least one of the independent variables is a good predictor of y.
11.10
a.
The first order model is: E(y) = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5
b.
R2 = .58. 58% of the total sample variation of the levels of trust is explained by the model containing the 5 independent variables.
c.
\[ F = \frac{R^2/k}{(1-R^2)/[n-(k+1)]} = \frac{.58/5}{(1-.58)/[66-(5+1)]} = 16.57 \]
d.
The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = k = 5 and ν2 = n − (k + 1) = 66 − (5 + 1) = 60. From Table VIII, Appendix B, F.10 = 1.95. The rejection region is F > 1.95.
Since the observed value of the test statistic falls in the rejection region (F = 16.57 > 1.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 5 independent variables is useful in the prediction of level of trust at α = .10.
11.12
a.
The least squares prediction equation is:
yˆ = 3.70 + .34 x1 + .49 x2 + .72 x3 + 1.14 x4 + 1.51x5 + .26 x6 − .14 x7 − .10 x8 − .10 x9 .
b.
βˆ0 = 3.70. This is the estimate of the y-intercept. It has no other meaning because the point with all independent variables equal to 0 is not in the observed range.
βˆ1 = 0.34. For each additional walk, the mean number of runs scored is estimated to increase by .34, holding all other variables constant.
βˆ2 = 0.49 . For each additional single, the mean number of runs scored is estimated to increase by .49, holding all other variables constant.
βˆ3 = 0.72 . For each additional double, the mean number of runs scored is estimated to increase by .72, holding all other variables constant.
βˆ4 = 1.14 . For each additional triple, the mean number of runs scored is estimated to increase by 1.14, holding all other variables constant.
βˆ5 = 1.51 . For each additional home run, the mean number of runs scored is estimated to increase by 1.51, holding all other variables constant.
βˆ6 = 0.26 . For each additional stolen base, the mean number of runs scored is estimated to increase by .26, holding all other variables constant.
βˆ7 = −0.14. For each additional time a runner is caught stealing, the mean number of runs scored is estimated to decrease by .14, holding all other variables constant.
βˆ8 = −0.10 . For each additional strikeout, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant.
βˆ9 = −0.10 . For each additional out, the mean number of runs scored is estimated to decrease by .10, holding all other variables constant. c.
H0: β7 = 0
Ha: β7 < 0
The test statistic is
\[ t = \frac{\hat{\beta}_7 - 0}{s_{\hat{\beta}_7}} = \frac{-.14 - 0}{.14} = -1.00 \]
The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − (k + 1) = 234 − (9 + 1) = 224. From Table VI, Appendix B, t.05 = 1.645. The rejection region is t < −1.645.
Since the observed value of the test statistic does not fall in the rejection region (t = −1.00 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that the mean number of runs decreases as the number of runners caught stealing increases, holding all other variables constant, at α = .05.
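The decision rule for this lower-tailed test can be written out directly. This is a minimal sketch using the estimate, standard error, and critical value quoted in the solution; the function name is hypothetical.

```python
def t_statistic(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """t = (estimate - null_value) / std_error."""
    return (estimate - null_value) / std_error


t = t_statistic(-0.14, 0.14)   # estimate and standard error of beta_7
t_crit = 1.645                 # t.05 quoted in the solution for df = 224
reject_h0 = t < -t_crit        # lower-tailed test: reject only for small t

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

For an upper-tailed alternative the comparison would be `t > t_crit`, and for a two-tailed alternative `abs(t) > t_crit` with the critical value taken at α/2.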
d.
For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = 224, t.025 = 1.96. The 95% confidence interval is:
\[ \hat{\beta}_5 \pm t_{\alpha/2}\, s_{\hat{\beta}_5} \Rightarrow 1.51 \pm 1.96(.05) \Rightarrow 1.51 \pm 0.098 \Rightarrow (1.412,\ 1.608) \]
We are 95% confident that the mean number of runs will increase by anywhere from 1.412 to 1.608 for each additional home run, holding all other variables constant. 11.14. a.
$R^2$ = .31. 31% of the total sample variation of the natural log of the level of CO2 emissions in 1996 is explained by the model containing the 7 independent variables.
b.
The test statistic is
\[ F = \frac{R^2/k}{(1-R^2)/[n-(k+1)]} = \frac{.31/7}{(1-.31)/[66-(7+1)]} = 3.72 \]
The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 7 and ν2 = n – (k + 1) = 66 – (7 + 1) = 58. From Table XI, Appendix B, F.01 = 2.95. The rejection region is F > 2.95. Since the observed value of the test statistic falls in the rejection region (F = 3.72 > 2.95), H0 is rejected. There is sufficient evidence to indicate that at least one of the 7 independent variables is useful in the prediction of natural log of the level of CO2 emissions in 1996 at α = .01. c.
To determine if foreign investments in 1980 is a useful predictor of CO2 emissions in 1996, we test: H0: β1 = 0 Ha: β1 ≠ 0
d.
The test statistic is t = 2.52 and the p-value is p < 0.05. Since the observed p-value is less than α (p < .05), H0 is rejected. There is sufficient evidence to indicate foreign investments in 1980 is a useful predictor of CO2 emissions in 1996 at α = .05.
11.16
a.
From MINITAB, the output is:

Regression Analysis: DDT versus Mile, Length, Weight

The regression equation is
DDT = - 108 + 0.0851 Mile + 3.77 Length - 0.0494 Weight

Predictor      Coef  SE Coef      T      P
Constant    -108.07    62.70  -1.72  0.087
Mile        0.08509  0.08221   1.03  0.302
Length        3.771    1.619   2.33  0.021
Weight     -0.04941  0.02926  -1.69  0.094

S = 97.48   R-Sq = 3.9%   R-Sq(adj) = 1.8%

Analysis of Variance

Source           DF       SS     MS     F      P
Regression        3    53794  17931  1.89  0.135
Residual Error  140  1330210   9501
Total           143  1384003

The least squares prediction equation is:
$\hat{y} = -108.07 + 0.08509x_1 + 3.771x_2 - 0.04941x_3$
b.
s = 97.48. We would expect about 95% of the observed values of DDT level to fall within 2s or 2(97.48) = 194.96 units of their least squares predicted values.
c.
To determine if at least one of the variables is useful in predicting the DDT level, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 1.89 and the p-value is p = .135. Since the p-value is not less than α = .05 (p = .135 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate at least one of the variables is useful in predicting the DDT level at α = .05.
d.
To determine if DDT level increases as length increases, we test: H0: β2 = 0 Ha: β2 > 0
The test statistic is t = 2.33. The p-value is p = .021/2 = .0105. Since the p-value is less than α (p = .0105 < .05), H0 is rejected. There is sufficient evidence to indicate that DDT level increases as length increases, holding the other variables constant, at α = .05. The observed significance level is p = .0105.
e.
For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table VI, Appendix B, with df = n − (k + 1) = 144 − (3 + 1) = 140, t.025 = 1.96. The 95% confidence interval is:
\[ \hat{\beta}_3 \pm t_{\alpha/2}\, s_{\hat{\beta}_3} \Rightarrow -0.04941 \pm 1.96(0.02926) \Rightarrow -0.04941 \pm 0.05735 \Rightarrow (-0.10676,\ 0.00794) \]
We are 95% confident that the mean DDT level will change from −0.10676 to 0.00794 for each additional point increase in weight, holding length and mile constant. Since 0 is in the interval, there is no evidence that weight and DDT level are linearly related.
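The interval arithmetic, and the check for whether 0 lies inside the interval, can be sketched as follows. The function name is hypothetical; the estimate, standard error, and critical value are those quoted in the solution.

```python
def coef_interval(estimate: float, std_error: float, t_crit: float) -> tuple:
    """Confidence interval estimate +/- t_crit * std_error for one coefficient."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# Weight coefficient from the printout, with t.025 = 1.96 for df = 140.
low, high = coef_interval(-0.04941, 0.02926, 1.96)
contains_zero = low < 0 < high

print(f"({low:.5f}, {high:.5f}), contains 0: {contains_zero}")
```

An interval that contains 0 leads to the same conclusion as a two-tailed t-test of H0: β3 = 0 at the matching α level that fails to reject.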
11.18
a.
From MINITAB, the output is:

Regression Analysis: WeightChg versus Digest, Fiber

The regression equation is
WeightChg = 12.2 - 0.0265 Digest - 0.458 Fiber

Predictor      Coef  SE Coef      T      P
Constant     12.180    4.402   2.77  0.009
Digest     -0.02654  0.05349  -0.50  0.623
Fiber       -0.4578   0.1283  -3.57  0.001

S = 3.519   R-Sq = 52.9%   R-Sq(adj) = 50.5%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2   542.03  271.02  21.88  0.000
Residual Error  39   483.08   12.39
Total           41  1025.12

$\hat{y} = 12.2 - .0265x_1 - .458x_2$
b.
βˆ0 = 12.2 = the estimate of the y-intercept.
βˆ1 = −.0265. We estimate that the mean weight change will decrease by .0265% for each additional increase of 1% in digestion efficiency, with acid-detergent fibre held constant.
βˆ2 = −.458. We estimate that the mean weight change will decrease by .458% for each additional increase of 1% in acid-detergent fibre, with digestion efficiency held constant. c.
To determine if digestion efficiency is a useful predictor of weight change, we test: H0: β1 = 0 Ha: β1 ≠ 0 The test statistic is t = −.50. The p-value is p = .623. Since the p-value is greater than α (p = .623 > .01), H0 is not rejected. There is insufficient evidence to indicate that digestion efficiency is a useful linear predictor of weight change at α = .01.
d.
For confidence coefficient .99, α = 1 − .99 = .01 and α/2 = .01/2 = .005. From Table VI, Appendix B, with df = n − (k + 1) = 42 − (2 + 1) = 39, t.005 ≈ 2.704. The 99% confidence interval is:
\[ \hat{\beta}_2 \pm t_{.005}\, s_{\hat{\beta}_2} \Rightarrow -.4578 \pm 2.704(.1283) \Rightarrow -.4578 \pm .3469 \Rightarrow (-.8047,\ -.1109) \]
We are 99% confident that the change in mean weight change for each unit change in acid-detergent fiber, holding digestion efficiency constant is between −.8047% and −.1109%.
e.
$R^2$ = R-Sq = 52.9%. 52.9% of the total sample variance of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber.
$R_a^2$ = R-Sq(adj) = 50.5%. 50.5% of the total sample variance of the weight changes is explained by the model containing the 2 independent variables, digestion efficiency and acid-detergent fiber, adjusting for the sample size and the number of parameters in the model.
f.
To determine if at least one of the variables is useful in predicting weight change, we test: H0: β1 = β2 = 0 Ha: At least 1 βi ≠ 0 The test statistic is F = 21.88 and the p-value is p = .000. Since the p-value is less than α = .05 (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate at least one of the variables is useful in predicting weight change at α = .05.
11.20
a.
The least squares prediction equation is: yˆ = −4.30 − .002x1 + .336x2 + .384x3 + .067x4 − .143x5 + .081x6 + .134x7
b.
To determine if the model is adequate, we test: H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = 0 Ha: At least one βi ≠ 0, i = 1, 2, 3, ..., 7 The test statistic is F = 111.1 (from table). Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 7 and ν2 = n − (k + 1) = 268 − (7 + 1) = 260. From Table IX, Appendix B, F.05 ≈ 2.01. The rejection region is F > 2.01. Since the observed value of the test statistic falls in the rejection region (F = 111.1 > 2.01), H0 is rejected. There is sufficient evidence to indicate that the model is adequate for predicting the logarithm of the audit fees at α = .05.
c.
βˆ3 = .384.
For each additional subsidiary of the auditee, the mean of the logarithm of audit fee is estimated to increase by .384 units.
d.
To determine if β4 > 0, we test:
H0: β4 = 0
Ha: β4 > 0
The test statistic is t = 1.76 (from table). The p-value for the test is .079. Since the p-value is not less than α (p = .079 ≮ α = .05), H0 is not rejected. There is insufficient evidence to indicate that β4 > 0, holding all the other variables constant, at α = .05.
e.
To determine if β1 < 0, we test:
H0: β1 = 0
Ha: β1 < 0
The test statistic is t = −0.049 (from table). The p-value for the test is .961. Since the p-value is not less than α (p = .961 ≮ α = .05), H0 is not rejected. There is insufficient evidence to indicate that β1 < 0, holding all the other variables constant, at α = .05. There is insufficient evidence to indicate that the new auditors charge less than incumbent auditors.
11.22
To determine if the model is useful, we test:
H0: β1 = β2 = ⋅⋅⋅ = β18 = 0
Ha: At least one βi ≠ 0, i = 1, 2, ..., 18
The test statistic is
\[ F = \frac{R^2/k}{(1-R^2)/[n-(k+1)]} = \frac{.95/18}{(1-.95)/[20-(18+1)]} = 1.06 \]
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 18 and ν2 = n − (k + 1) = 20 − (18 + 1) = 1. From Table IX, Appendix B, F.05 ≈ 245.9. The rejection region is F > 245.9.
Since the observed value of the test statistic does not fall in the rejection region (F = 1.06 ≯ 245.9), H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .05.
Note: Although $R^2$ is large, there are so many variables in the model that ν2 is small.
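The note about the small denominator degrees of freedom can be illustrated numerically. The sketch below computes the global F statistic from R², n, and k for the exercise values, and then for a hypothetical larger sample (n = 60) that is not part of the exercise and is used only for comparison. The function name is also hypothetical.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


# Exercise values: R^2 = .95, k = 18 predictors, n = 20 observations.
f_small_n = f_from_r2(0.95, n=20, k=18)

# Hypothetical comparison: the same R^2 and k with n = 60 observations.
f_large_n = f_from_r2(0.95, n=60, k=18)

print(f"n = 20: F = {f_small_n:.2f} on (18, 1) df")
print(f"n = 60: F = {f_large_n:.2f} on (18, 41) df")
```

With the same R², the F statistic grows in proportion to the error degrees of freedom, while the critical value falls sharply as ν2 increases. A model with nearly as many parameters as observations can therefore show a large R² without being statistically useful.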
11.24
a.
From MINITAB, the output is:

Regression Analysis: Labor versus Pounds, Units, Weight

The regression equation is
Labor = 132 + 2.73 Pounds + 0.0472 Units - 2.59 Weight

Predictor     Coef  SE Coef      T      P
Constant    131.92    25.69   5.13  0.000
Pounds       2.726    2.275   1.20  0.248
Units      0.04722  0.09335   0.51  0.620
Weight     -2.5874   0.6428  -4.03  0.001

S = 9.810   R-Sq = 77.0%   R-Sq(adj) = 72.7%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  5158.3  1719.4  17.87  0.000
Residual Error  16  1539.9    96.2
Total           19  6698.2

Source  DF  Seq SS
Pounds   1  3400.6
Units    1   198.4
Weight   1  1559.3

The least squares equation is: $\hat{y} = 131.92 + 2.726x_1 + .0472x_2 - 2.587x_3$
b.
To test the usefulness of the model, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, for i = 1, 2, 3
The test statistic is
\[ F = \frac{MSR}{MSE} = \frac{1719.4}{96.2} = 17.87 \]
The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 3 and ν2 = n − (k + 1) = 20 − (3 + 1) = 16. From Table XI, Appendix B, F.01 = 5.29. The rejection region is F > 5.29. Since the observed value of the test statistic falls in the rejection region (F = 17.87 > 5.29), H0 is rejected. There is sufficient evidence to indicate a relationship exists between hours of labor and at least one of the independent variables at α = .01. c.
H0: β2 = 0 Ha: β2 ≠ 0 The test statistic is t = .51. The p-value = .620. We reject H0 if p-value < α. Since .620 > .05, do not reject H0. There is insufficient evidence to indicate a relationship exists between hours of labor and percentage of units shipped by truck, all other variables held constant, at α = .05.
d.
R2 is printed as R-Sq. R2 = .770. We conclude that 77% of the sample variation of the labor hours is explained by the regression model, including the independent variables pounds shipped, percentage of units shipped by truck, and weight.
e.
If the average number of pounds per shipment increases from 20 to 21, the estimated change in mean number of hours of labor is −2.587. Thus, it will cost $7.50(2.587) = $19.4025 less, if the variables x1 and x2 are constant.
f.
Since s = Standard Error = 9.81, we can estimate approximately with ±2s precision or ±2(9.81) or ±19.62 hours.
g.
No. Regression analysis only determines if variables are related. It cannot be used to determine cause and effect.
11.26
From the printout, the 90% prediction interval is (−151.996, 175.4874). We are 90% confident that an actual DDT level for a fish caught 100 miles upstream that is 40 centimeters long and weighs 800 grams will be between −151.996 and 175.4874. Since the DDT level cannot be negative, the interval would be between 0 and 175.4874.
11.28
a.
From MINITAB, the output is:

Regression Analysis: Precip versus Altitude, Latit, Coast

The regression equation is
Precip = - 102 + 0.00409 Altitude + 3.45 Latit - 0.143 Coast

Predictor       Coef   SE Coef      T      P
Constant     -102.36     29.21  -3.50  0.002
Altitude    0.004091  0.001218   3.36  0.002
Latit         3.4511    0.7949   4.34  0.000
Coast       -0.14286   0.03634  -3.93  0.001

S = 11.10   R-Sq = 60.0%   R-Sq(adj) = 55.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  4809.4  1603.1  13.02  0.000
Residual Error  26  3202.3   123.2
Total           29  8011.7

Source    DF  Seq SS
Altitude   1   730.7
Latit      1  2175.3
Coast      1  1903.4

Predicted Values for New Observations

New Obs    Fit  SE Fit        95.0% CI       95.0% PI
1        29.25    5.60  (17.75, 40.76)  (3.71, 54.80)

Values of Predictors for New Observations

New Obs  Altitude  Latit  Coast
1            6360   36.6    145

The fitted regression line is:
$\hat{y} = -102.36 + 0.00409x_1 + 3.4511x_2 - 0.1429x_3$
b.
To determine if the first-order model is useful for the predicting annual precipitation, we test:
H0: β1 = β2 = β3 = 0 Ha: At least one βi ≠ 0, i = 1, 2, 3 The test statistic is 13.02 and the p-value is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that the model is useful for predicting annual precipitation at α = .05. c.
The prediction interval is (3.71, 54.80). With 95% confidence, we can conclude that the annual precipitation for an individual meteorological station with characteristics x1 = 6360 feet, x2 = 36.6°, x3 = 145 miles will fall between 3.71 inches and 54.80 inches.
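The point prediction at the center of this interval follows directly from the fitted coefficients. Below is a minimal sketch that plugs the new observation into the estimates from the printout; the dictionary layout and variable names are illustrative choices, not part of the original solution.

```python
# Coefficient estimates from the MINITAB printout.
coefs = {
    "Constant": -102.36,
    "Altitude": 0.004091,
    "Latit": 3.4511,
    "Coast": -0.14286,
}

# Predictor values for the new observation.
new_obs = {"Altitude": 6360, "Latit": 36.6, "Coast": 145}

# Fitted value: intercept plus each coefficient times its predictor value.
fit = coefs["Constant"] + sum(coefs[name] * value for name, value in new_obs.items())

# The prediction interval from the printout is symmetric about the fit.
pi_lower, pi_upper = 3.71, 54.80
midpoint = (pi_lower + pi_upper) / 2

print(f"Fit = {fit:.2f}, PI midpoint = {midpoint:.3f}")
```

The fitted value agrees with the Fit column of the printout and with the midpoint of the prediction interval, up to rounding.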
11.30
The first order model is:
E(y) = β0 + β1x1 + β2x2 + β3x5
We want to find a 95% prediction interval for the actual voltage when the volume fraction of the disperse phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2).
Using MINITAB, the output is:

The regression equation is
y = 0.933 - 0.0243 x1 + 0.142 x2 + 0.385 x5

Predictor       Coef     StDev      T      P
Constant      0.9326    0.2482   3.76  0.002
x1         -0.024272  0.004900  -4.95  0.000
x2           0.14206   0.07573   1.88  0.080
x5           0.38457   0.09801   3.92  0.001

S = 0.4796   R-Sq = 66.6%   R-Sq(adj) = 59.9%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       3   6.8701  2.2900  9.95  0.001
Residual Error  15   3.4509  0.2301
Total           18  10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x5       1  3.5422

Unusual Observations

Obs    x1      y    Fit  StDev Fit  Residual  St Resid
  3  40.0  3.200  2.068      0.239     1.132     2.72R

R denotes an observation with a large standardized residual

Predicted Values

   Fit  StDev Fit         95.0% CI         95.0% PI
-0.098      0.232  (-0.592, 0.396)  (-1.233, 1.038)
The 95% prediction interval is (−1.233, 1.038). We are 95% confident that the actual voltage is between −1.233 and 1.038 kw/cm when the volume fraction of the disperse phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2). 11.32
a.
E(y) = β0 + β1x1 + β2x2 + β3x1x2
b.
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3
11.34
a.
\[ R^2 = 1 - \frac{SSE}{SS_{yy}} = 1 - \frac{21}{479} = .956 \]
95.6% of the total variability of the y values is explained by this model. b.
To test the utility of the model, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3
The test statistic is
\[ F = \frac{R^2/k}{(1-R^2)/[n-(k+1)]} = \frac{.956/3}{(1-.956)/[32-(3+1)]} = 202.8 \]
The rejection region requires α = .05 in the upper tail of the F distribution, with ν1 = k = 3 and ν2 = n − (k + 1) = 32 − (3 + 1) = 28. From Table IX, Appendix B, F.05 = 2.95. The rejection region is F > 2.95. Since the observed value of the test statistic falls in the rejection region (F = 202.8 > 2.95), H0 is rejected. There is sufficient evidence that the model is adequate for predicting y at α = .05.
c.
The relationship between y and x1 depends on the level of x2.
d.
To determine if x1 and x2 interact, we test:
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is
\[ t = \frac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \frac{10}{4} = 2.5 \]
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 32 − (3 + 1) = 28. From Table VI, Appendix B, t.025 = 2.048. The rejection region is t < −2.048 or t > 2.048. Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 2.048), H0 is rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .05. 11.36
a.
To determine if the overall model is useful for predicting y, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi is not 0
The test statistic is F = 226.35 and the p-value is p < .001. Since the p-value is less than α (p < .001 < .05), H0 is rejected. There is sufficient evidence to indicate the overall model is useful for predicting y, the willingness of the consumer to shop at a retailer’s store in the future, at α = .05.
b.
To determine if consumer satisfaction and retailer interest interact to affect willingness to shop at retailer’s shop in future, we test:
H0: β3 = 0 Ha: β3 ≠ 0 The test statistic is t = -3.09 and the p-value is p < .01. Since the p-value is less than α (p < .01 < .05), H0 is rejected. There is sufficient evidence to indicate
consumer satisfaction and retailer interest interact to affect willingness to shop at retailer’s shop in future at α = .05. c.
When x2 = 1,
\[ \begin{aligned} \hat{y} &= \hat{\beta}_0 + .426x_1 + .044x_2 - .157x_1x_2 \\ &= \hat{\beta}_0 + .426x_1 + .044(1) - .157x_1(1) \\ &= \hat{\beta}_0 + .044 + (.426 - .157)x_1 \\ &= \hat{\beta}_0 + .044 + .269x_1 \end{aligned} \]
Since no value is given for β̂0, we will use β̂0 = 1 for graphing purposes. Using MINITAB, a graph might look like:
[Figure: Scatterplot of YHAT vs X1 when X2=1]
d.
When x2 = 7,
ŷ = β̂0 + .426x1 + .044x2 − .157x1x2 = β̂0 + .426x1 + .044(7) − .157x1(7)
= β̂0 + .308 + (.426 − 1.099)x1 = β̂0 + .308 − .673x1
Since no value is given for β̂0, we will again use β̂0 = 1 for graphing purposes.
Using MINITAB, a graph might look like:
[Figure: Scatterplot of YHAT vs X1 when X2=7]
e.
Using MINITAB, both plots on the same graph would be:
[Figure: Scatterplot of YHAT vs X1, with one line for x2=1 and one line for x2=7]
Since the lines are not parallel, interaction is present.
11.38
a.
The hypothesized regression model including the interaction between x1 and x2 would be:
E(y) = β0 + β1x1 + β2x2 + β3x1x2
b.
If “x1 and x2 interact to affect y” then the effect of x1 on y depends on the level of x2. Also, the effect of x2 on y depends on the level of x1.
c.
Since the p-value is not small (p = .25), H0 is not rejected. There is insufficient evidence to indicate x1 and x2 interact to affect y.
d.
β1 corresponds to x1, the number ahead in line. If the “negative feeling score” gets larger as the number of people ahead increases, then β1 is positive. β2 corresponds to x2, the number behind in line. If the “negative feeling score” gets lower as the number of people behind increases, then β2 is negative.
11.40
a.
If client credibility and linguistic delivery style interact, then the effect of client credibility on the likelihood value depends on the level of linguistic delivery style.
b.
To determine the overall model adequacy, we test:
H0: β1 = β2 = β3 = 0 Ha: At least one βi ≠ 0
c.
The test statistic is F = 55.35 and the p-value is p < 0.0005. Since the p-value is so small (p < 0.0005), H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate that the model is adequate at α > 0.0005.
d.
To determine if client credibility and linguistic delivery style interact, we test:
H0: β3 = 0 Ha: β3 ≠ 0
e.
The test statistic is t = 4.008 and the p-value is p < 0.005. Since the p-value is so small (p < 0.005), H0 is rejected. There is sufficient evidence to indicate that client credibility and linguistic delivery style interact at α > 0.005.
f.
When x1 = 22, the least squares line is:
ŷ = 15.865 + 0.037(22) − 0.678x2 + 0.036x2(22) = 16.679 + 0.114x2
The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 22 is 0.114. When client credibility is equal to 22, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.114.
g.
When x1 = 46, the least squares line is:
ŷ = 15.865 + 0.037(46) − 0.678x2 + 0.036x2(46) = 17.567 + 0.978x2
The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 46 is 0.978. When client credibility is equal to 46, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.978.
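The substitutions in parts f and g follow one pattern: with x1 held fixed, the interaction model reduces to a straight line in x2 with intercept β̂0 + β̂1x1 and slope β̂2 + β̂3x1. A minimal Python sketch of that arithmetic, using the estimates quoted in parts f and g (the function name `line_in_x2` is ours for illustration and is not part of the exercise or of MINITAB):

```python
def line_in_x2(b0, b1, b2, b3, x1):
    """Return (intercept, slope) of y-hat as a function of x2 when x1 is fixed.

    Model: y-hat = b0 + b1*x1 + b2*x2 + b3*x1*x2
    """
    intercept = b0 + b1 * x1
    slope = b2 + b3 * x1
    return intercept, slope


# Estimates as quoted in parts f and g
B0, B1, B2, B3 = 15.865, 0.037, -0.678, 0.036

for credibility in (22, 46):
    intercept, slope = line_in_x2(B0, B1, B2, B3, credibility)
    print(f"x1 = {credibility}: y-hat = {intercept:.3f} + {slope:.3f} x2")
```

Because the slope β̂2 + β̂3x1 changes with x1, the two lines are not parallel, which is what interaction means here.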
11.42
a.
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5
b.
H0: β4 = 0
c.
t = 4.408, p-value = .001 Since the p-value is so small, there is strong evidence to reject H0. There is sufficient evidence to indicate that the strength of client-therapist relationship contributes information for the prediction of a client's reaction for any α > .001.
d.
Answers may vary.
e.
R² = .2946. 29.46% of the variability in the client's reaction scores can be explained by this model.
11.44
a.
βˆ1 = .02. The mean level of support for a military response is estimated to increase by .02 for each day increase in level of TV news exposure, all other variables held constant.
b.
To determine if an increase in TV news exposure is associated with an increase in support for military resolution, we test:
H0: β1 = 0 Ha: β1 > 0 The p-value is p = .03/2 = .015. Since the p-value is less than α (p = .015 < .05), H0 is rejected. There is sufficient evidence to indicate that an increase in TV news exposure is associated with an increase in support for military resolution, all other variables held constant, at α = .05. c.
To determine if the relationship between support for military resolution and gender depends on political knowledge, we test:
H0: β8 = 0 Ha: β8 ≠ 0 The p-value is p = .02. Since the p-value is less than α (p = .02 < .05), H0 is rejected. There is sufficient evidence to indicate that the relationship between support for a military resolution and gender depends on political knowledge, all other variables held constant, at α = .05. d.
To determine if the relationship between support for military resolution and race depends on political knowledge, we test:
H0: β9 = 0 Ha: β9 ≠ 0 The p-value is p = .08. Since the p-value is not less than α (p = .08 > .05), H0 is not rejected. There is insufficient evidence to indicate that the relationship between
support for a military resolution and race depends on political knowledge, all other variables held constant, at α = .05.
e.
R² = .194. 19.4% of the variation in the support for military resolution is explained by the model containing the seven independent variables and the two interaction terms.
f.
H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = β8 = β9 = 0 Ha: At least one βi ≠ 0, i = 1, 2, 3, ... , 9 The test statistic is
$F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.194/9}{(1 - .194)/[1763 - (9 + 1)]} = 46.88$
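As a quick check on the arithmetic, the global F statistic can be computed directly from R², n, and k. The sketch below is only an illustration; the function name `global_f_from_r2` is ours.

```python
def global_f_from_r2(r2, n, k):
    """Global F statistic from R-squared, sample size n, and k predictors."""
    numerator = r2 / k                           # explained variation per predictor
    denominator = (1.0 - r2) / (n - (k + 1))     # unexplained variation per error df
    return numerator / denominator


f_stat = global_f_from_r2(r2=0.194, n=1763, k=9)
print(round(f_stat, 2))
```

With n as large as 1763, even a modest R² of .194 produces a very large F, so a useful-model conclusion does not by itself imply that the model explains much of the variation.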
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 9 and ν2 = n − (k + 1) = 1763 − (9 + 1) = 1753. From Table IX, Appendix B, F.05 ≈ 1.88. The rejection region is F > 1.88. Since the observed value of the test statistic falls in the rejection region (F = 46.88 > 1.88), H0 is rejected. There is sufficient evidence to indicate that the model is useful at α = .05.
11.46
a.
H0: β2 = 0 Ha: β2 ≠ 0 The test statistic is
$t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \dfrac{.47 - 0}{.15} = 3.133$
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 25 − (2 + 1) = 22. From Table VI, Appendix B, t.025 = 2.074. The rejection region is t < −2.074 or t > 2.074. Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 2.074), H0 is rejected. There is sufficient evidence to indicate the quadratic term should be included in the model at α = .05. b.
H0: β2 = 0 Ha: β2 > 0 The test statistic is the same as in part a, t = 3.133. The rejection region requires α = .05 in the upper tail of the t distribution with df = 22. From Table VI, Appendix B, t.05 = 1.717. The rejection region is t > 1.717. Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 1.717), H0 is rejected. There is sufficient evidence to indicate the quadratic curve opens upward at α = .05.
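Parts a and b use the same test statistic and differ only in the rejection region. The sketch below makes that explicit. It assumes the critical values are read from Table VI as quoted above (2.074 and 1.717 for df = 22); the function names are ours.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error


def in_rejection_region(t, critical, alternative):
    """Decide whether t falls in the rejection region.

    `critical` is the positive table value (t.025 for a two-tailed test,
    t.05 for a one-tailed test when alpha = .05).
    """
    if alternative == "two-sided":
        return abs(t) > critical
    if alternative == "greater":
        return t > critical
    if alternative == "less":
        return t < -critical
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


t = t_statistic(0.47, 0.15)
print(round(t, 3))
print(in_rejection_region(t, 2.074, "two-sided"))  # part a, df = 22
print(in_rejection_region(t, 1.717, "greater"))    # part b, df = 22
```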
11.48
a.
b.
It moves the graph to the right (−2x) or to the left (+2x) compared to the graph of y = 1 + x².
c.
It controls whether the graph opens up (+x²) or down (−x²). It also controls how steep the curvature is, i.e., the larger the absolute value of the coefficient of x², the narrower the curve is.
11.50
a.
βˆ0 has no meaning because x = 0 would not be in the observed range of values. In this case, x is the year with values between 1984 and 1999.
b.
βˆ1 = −321.67. Since the quadratic effect is included in the model, the linear term is just a location parameter and has no meaning.
c.
βˆ2 = .0794. Since the value of βˆ2 is positive, the curvature is upward.
d.
Since no data have been collected past 1999, we have no idea if the relationship between the two variables from 1984 to 1999 will remain the same until 2021.
11.52
a.
Using MINITAB, a sketch of the least squares prediction equation is:
[Figure: Scatterplot of yhat vs Dose; Dose from 0 to 800 on the horizontal axis, yhat from 0 to 12 on the vertical axis]
b.
For x = 500, ŷ = 10.25 + .0053(500) − .0000266(500²) = 10.25 + 2.65 − 6.65 = 6.25
c.
For x = 0, ŷ = 10.25 + .0053(0) − .0000266(0²) = 10.25
d.
For x = 100, ŷ = 10.25 + .0053(100) − .0000266(100²) = 10.25 + .53 − .266 = 10.514. This value is slightly larger than that for the control group (10.25).
For x = 200, ŷ = 10.25 + .0053(200) − .0000266(200²) = 10.25 + 1.06 − 1.064 = 10.246. This value is slightly smaller than that for the control group (10.25).
So, the largest value of x which yields an estimated weight change that is closest to, but just less than the estimated weight change for the control group is x = 200.
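The evaluations in parts b through d can be reproduced with a short Python sketch of the fitted quadratic. The crossing-point calculation at the end is our own supplementary check, not part of the exercise: setting ŷ equal to the control value 10.25 gives .0053x = .0000266x², so the curve returns to the control level near x = .0053/.0000266.

```python
def predicted_weight_change(dose):
    """Fitted second-order equation from the exercise."""
    return 10.25 + 0.0053 * dose - 0.0000266 * dose ** 2


control = predicted_weight_change(0)
for dose in (100, 200, 500):
    y_hat = predicted_weight_change(dose)
    relation = "above" if y_hat > control else "below"
    print(f"dose {dose}: y-hat = {y_hat:.3f} ({relation} control)")

# Supplementary check (our addition): dose at which y-hat equals the control value
crossing = 0.0053 / 0.0000266
print(round(crossing, 1))
```

The crossing falls just under 200, which is consistent with the answer in part d: at x = 200 the estimate has only just dropped below the control value.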
11.54
a.
A first order model is: E(y) = βo + β1x
b.
A second order model is: E(y) = β0 + β1x + β2x²
c.
Using MINITAB, a scattergram of these data is:
[Figure: Scatterplot of International vs Domestic]
From the plot, it appears that the first order model might fit the data better. There does not appear to be much of a curve to the relationship.
d.
Using MINITAB, the output is:

Regression Analysis: International versus Domestic, Dsq

The regression equation is
International = 203 - 0.58 Domestic + 0.00364 Dsq

Predictor      Coef   SE Coef      T      P
Constant      202.9     245.0   0.83  0.424
Domestic     -0.581     1.510  -0.38  0.707
Dsq        0.003638  0.002085   1.74  0.107

S = 142.696   R-Sq = 78.8%   R-Sq(adj) = 75.2%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2   906515  453258  22.26  0.000
Residual Error  12   244345   20362
Total           14  1150860

Source    DF  Seq SS
Domestic   1  844526
Dsq        1   61990
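The summary lines of the printout (S, R-Sq, R-Sq(adj), and F) all follow from the sums of squares in the Analysis of Variance table. The sketch below recomputes them as a consistency check. It takes n = 15 from the Total DF of 14 and k = 2 from the Regression DF; the function name `anova_summary` is ours.

```python
import math


def anova_summary(ss_regression, ss_error, n, k):
    """Recompute F, s, R-squared, and adjusted R-squared from an ANOVA table."""
    ss_total = ss_regression + ss_error
    df_error = n - (k + 1)
    ms_regression = ss_regression / k
    ms_error = ss_error / df_error
    return {
        "F": ms_regression / ms_error,
        "s": math.sqrt(ms_error),
        "r2": ss_regression / ss_total,
        "r2_adj": 1.0 - ms_error / (ss_total / (n - 1)),
    }


summary = anova_summary(ss_regression=906515, ss_error=244345, n=15, k=2)
for name, value in summary.items():
    print(name, round(value, 4))
```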
To investigate the usefulness of the model, we test:
H0: β1 = β2 = 0 Ha: At least one βi ≠ 0, i = 1, 2
The test statistic is F = 22.26. The p-value is p = 0.000. Since the p-value is so small, we reject H0. There is sufficient evidence to indicate the model is useful for predicting foreign gross revenue. To determine if a curvilinear relationship exists between foreign and domestic gross revenues, we test:
H0: β2 = 0 Ha: β2 ≠ 0 The test statistic is t = 1.74 The p-value is p = 0.107. Since the p-value is greater than α = .05 (p = 0.107 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that a curvilinear relationship exists between foreign and domestic gross revenues at α = .05. e.
From the analysis in part d, the first-order model better explains the variation in foreign gross revenues. In part d, we concluded that the second-order term did not improve the model.
11.56
a.
b.
It moves the graph to the right (−2x) or to the left (+2x) compared to the graph of y = 1 + x2.
c.
It controls whether the graph opens up (+x2) or down (−x2). It also controls how steep the curvature is, i.e., the larger the absolute value of the coefficient of x2 , the narrower the curve is.
11.58
a.
A scatterplot of the data is:
[Figure: character plot of Y (vertical axis, marked at 3500, 7000, and 10500) versus X (horizontal axis, 0.0 to 40.0)]
b.
From the plot, it looks like a second-order model would fit the data better than a first-order model. There is little evidence that a third-order model would fit the data better than a second-order model.
c.
Using MINITAB, the output for fitting a first-order model is:

The regression equation is
Y = 2752 + 122 X

Predictor    Coef  Stdev  t-ratio      p
Constant   2752.4  613.5     4.49  0.000
X          122.34  26.08     4.69  0.000

s = 1904   R-sq = 36.7%   R-sq(adj) = 35.0%

Analysis of Variance
SOURCE      DF         SS        MS      F      p
Regression   1   79775688  79775688  22.01  0.000
Error       38  137726224   3624374
Total       39  217501920

Unusual Observations
Obs.     X      Y   Fit  Stdev.Fit  Residual  St.Resid
  27  27.0   2007  6056        345     -4049    -2.16R
  40  40.0  11520  7646        591      3874     2.14R

R denotes an obs. with a large st. resid.
To see if there is a significant linear relationship between day and demand, we test:
H0: β1 = 0 Ha: β1 ≠ 0 The test statistic is t = 4.69. The p-value for the test is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between day and demand at α = .05. d.
Using MINITAB, the output for fitting a second-order model is:

The regression equation is
Y = 5120 - 216 X + 8.25 XSQ

Predictor     Coef  Stdev  t-ratio      p
Constant    5120.2  816.9     6.27  0.000
X          -215.92  91.89    -2.35  0.024
XSQ          8.250  2.173     3.80  0.001

s = 1637   R-sq = 54.4%   R-sq(adj) = 52.0%

Analysis of Variance
SOURCE      DF         SS        MS      F      p
Regression   2  118377056  59188528  22.09  0.000
Error       37   99124856   2679050
Total       39  217501920

SOURCE  DF    SEQ SS
X        1  79775688
XSQ      1  38601372

Unusual Observations
Obs.     X     Y   Fit  Stdev.Fit  Residual  St.Resid
  27  27.0  2007  5305        357     -3298    -2.06R

R denotes an obs. with a large st. resid.
To see if there is a significant quadratic relationship between day and demand, we test:
H0: β2 = 0 Ha: β2 ≠ 0 The test statistic is t = 3.80. The p-value for the test is p = 0.001. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate that there is a quadratic relationship between day and demand at α = .05.
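As a cross-check that is not part of the original solution, the t test on the single quadratic term agrees with the drop in SSE between the first-order and second-order printouts: when one term is added, the partial F statistic equals t² up to rounding. A small sketch using the values printed in parts c and d:

```python
# Values copied from the two MINITAB printouts in parts c and d
sse_first_order = 137726224   # Error SS, first-order model (df = 38)
sse_second_order = 99124856   # Error SS, second-order model (df = 37)
mse_second_order = sse_second_order / 37

# Partial F for adding the single XSQ term (1 numerator df)
partial_f = (sse_first_order - sse_second_order) / 1 / mse_second_order

# t-ratio recomputed from the XSQ coefficient and its standard deviation
t_ratio = 8.250 / 2.173

print(round(partial_f, 2), round(t_ratio ** 2, 2))
```

The two values differ only in the later decimal places because the printed coefficient and standard deviation are rounded.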
e.
Since the quadratic term is significant in the second-order model in part d, the second-order model is better.
11.60
The model is E(y) = β0 + β1x1 + β2x2 where
x1 = 1 if the variable is at level 2, 0 otherwise
x2 = 1 if the variable is at level 3, 0 otherwise
β0 = mean value of y when the qualitative variable is at level 1.
β1 = difference in mean value of y between level 2 and level 1 of the qualitative variable.
β2 = difference in mean value of y between level 3 and level 1 of the qualitative variable.
11.62
a.
The least squares prediction equation is: yˆ = 80 + 16.8x1 + 40.4x2
b.
β̂1 estimates the difference in the mean value of the dependent variable between level 2 and level 1 of the independent variable.
β̂2 estimates the difference in the mean value of the dependent variable between level 3 and level 1 of the independent variable.
c.
The hypothesis H0: β1 = β2 = 0 is the same as H0: μ1 = μ2 = μ3. The hypothesis Ha: At least one of the parameters β1 and β2 differs from 0 is the same as Ha: At least one mean (μ1, μ2, or μ3) is different.
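The correspondence between the β parameters and the level means can be illustrated with the prediction equation from part a. The sketch below assumes the dummy coding described in Exercise 11.60 (level 1 as the base level); the helper names `dummy_code` and `predicted_mean` are ours.

```python
def dummy_code(level):
    """Return (x1, x2) for a three-level qualitative variable, level 1 as base."""
    return (1 if level == 2 else 0, 1 if level == 3 else 0)


def predicted_mean(level, b0=80.0, b1=16.8, b2=40.4):
    """Estimated mean of y at a given level, from y-hat = b0 + b1*x1 + b2*x2."""
    x1, x2 = dummy_code(level)
    return b0 + b1 * x1 + b2 * x2


means = {level: predicted_mean(level) for level in (1, 2, 3)}
print(means)

# If b1 = b2 = 0, every level has the same estimated mean,
# which is why H0: beta1 = beta2 = 0 matches H0: mu1 = mu2 = mu3.
equal_means = {level: predicted_mean(level, b1=0.0, b2=0.0) for level in (1, 2, 3)}
print(equal_means)
```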
d.
The test statistic is
$F = \dfrac{MSR}{MSE} = \dfrac{2059.5}{83.3} = 24.72$
Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 2 and denominator df = n − (k + 1) = 15 − (2 + 1) = 12. From Table IX, Appendix B, F.05 = 3.89. The rejection region is F > 3.89. Since the observed value of the test statistic falls in the rejection region (F = 24.72 > 3.89), H0 is rejected. There is sufficient evidence to indicate at least one of the means is different at α = .05.
11.64
a.
A confidence interval for the difference of two population means could be used. Since both sample sizes are over 30, the large sample confidence interval is used (with independent samples).
b.
Let x1 = 1 if public college, 0 otherwise. The model is E(y) = β0 + β1x1
c.
β1 is the difference between the two population means. A point estimate for β1 is βˆ1 . A confidence interval for β1 could be used to estimate the difference in the two population means.
11.66
a.
Let x1 = 1 if no, 0 if yes. The model would be E(y) = β0 + β1x1. In this model, β0 is the mean job preference for those who responded ‘yes’ to the question "Flextime of the position applied for" and β1 is the difference in the mean job preference between those who responded ‘no’ to the question and those who answered ‘yes’ to the question.
b.
Let x1 = 1 if referral, 0 if not, and x2 = 1 if on-premise, 0 if not.
The model would be E(y) = β0 + β1x1 + β2x2. In this model, β0 is the mean job preference for those who responded ‘none’ to level of day care support required, β1 is the difference in the mean job preference between those who responded ‘referral’ and those who responded ‘none’, and β2 is the difference in the mean job preference between those who responded ‘on-premise’ and those who responded ‘none’.
c.
Let x1 = 1 if counseling, 0 if not, and x2 = 1 if active search, 0 if not.
The model would be E(y) = β0 + β1x1 + β2x2. In this model, β0 is the mean job preference for those who responded ‘none’ to spousal transfer support required, β1 is the difference in the mean job preference between those who responded ‘counseling’ and those who responded ‘none’, and β2 is the difference in the mean job preference between those who responded ‘active search’ and those who responded ‘none’.
d.
Let x1 = 1 if not married, 0 if married. The model would be E(y) = β0 + β1x1. In this model, β0 is the mean job preference for those who responded ‘married’ to marital status and β1 is the difference in the mean job preference between those who responded ‘not married’ and those who answered ‘married’.
e.
Let x1 = 1 if female, 0 if male. The model would be E(y) = β0 + β1x1. In this model, β0 is the mean job preference for males and β1 is the difference in the mean job preference between females and males.
11.68
a.
βˆ4 = .296 The difference in the mean value of DTVA between when the operating earnings are negative and lower than last year and when the operating earnings are not negative and lower than last year is estimated to be .296, holding all other variables constant.
b.
To determine if the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms, we test: H0: β4 = 0 Ha: β4 > 0 The p-value for this test is p = .001 / 2 = .0005. Since the p-value is so small, we would reject H0 for α = .05. There is sufficient evidence to indicate the mean DTVA for firms with negative earnings and earnings lower than last year exceeds the mean DTVA of other firms at α = .05.
c.
Ra² = .280. 28% of the variability in the DTVA scores is explained by the model containing the 5 independent variables, adjusted for the number of variables in the model and the sample size.
11.70
a.
To determine if there is a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test: H0: β1 = 0 Ha: β1 ≠ 0 The test statistic is t = 8.14. Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume that the degrees of freedom, n − 2, exceed 2 since the data used are from 1972 to 1997. With α = .05, the critical value of t for the rejection region will then be smaller than 4.303. Thus, with α = .05, t = 8.14 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05. However, the value of R² is .1818. The model used is explaining only 18.18% of the variability in the monthly rate of return. This is not a particularly large value.
To determine if there is a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy, we test: H0: β1 = 0 Ha: β1 ≠ 0 The test statistic is t = −3.46. Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume that the degrees of freedom, n − 2, exceed 3 since the data used are from 1972 to 1997. With α = .05, the critical value of t for the rejection region will then be smaller than 3.182. Thus, with α = .05, t = −3.46 will fall in the rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy at α = .05. However, the value of R² is .0387. The model used is explaining only 3.87% of the variability in the monthly rate of return. This is a very small value.
b.
For the first model, β1 is the difference in the mean monthly rate of return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy. For the second model, β1 is the difference in the mean monthly rate of return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary policy.
c.
The least squares prediction equation for the equity REIT index is: yˆ = 0.01863 − 0.01582x. When the Federal Reserve’s monetary policy is restrictive, x = 1. The predicted mean monthly rate of return for the equity REIT index is
ŷ = 0.01863 − 0.01582(1) = .00281. When the Federal Reserve’s monetary policy is expansive, x = 0. The predicted mean monthly rate of return for the equity REIT index is ŷ = 0.01863 − 0.01582(0) = .01863.
11.72
a.
The first-order model is E(y) = β0 + β1x1
b.
The new model is E(y) = β0 + β1x1 + β2x2 + β3x3 where x2 = 1 if level 2, 0 otherwise, and x3 = 1 if level 3, 0 otherwise
c.
To allow for interactions, the model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3
d.
The response lines will be parallel if β4 = β5 = 0
e.
There will be one response line if β2 = β3 = β4 = β5 = 0
11.74
a.
When x2 = x3 = 0, E(y) = β0 + β1x1
When x2 = 1 and x3 = 0, E(y) = β0 + β1x1 + β2
When x2 = 0 and x3 = 1, E(y) = β0 + β1x1 + β3
b.
For level 1, ŷ = 44.8 + 2.2x1
For level 2, ŷ = 44.8 + 2.2x1 + 9.4 = 54.2 + 2.2x1
For level 3, ŷ = 44.8 + 2.2x1 + 15.6 = 60.4 + 2.2x1
11.76
The model is E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x4 where x1 is the quantitative variable and
x2 = 1 if level 2 of qualitative variable, 0 otherwise
x3 = 1 if level 3 of qualitative variable, 0 otherwise
x4 = 1 if level 4 of qualitative variable, 0 otherwise
11.78
a.
E(y) = β0 + β1x1 + β2x2 + β3x1x2
where x2 = 1 if diet is duck chow, 0 otherwise
b.
Using MINITAB, the printout is:

The regression equation is
WtChg = -2.21 + 0.0783x1 + 10.4x2 - 0.095x1x2

Predictor     Coef    StDev      T      P
Constant    -2.210    1.250  -1.77  0.085
x1         0.07831  0.04947   1.58  0.122
x2          10.354    8.538   1.21  0.233
x1x2       -0.0948   0.1418  -0.67  0.508

S = 3.882   R-Sq = 44.1%   R-Sq(adj) = 39.7%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       3   452.54  150.85  10.01  0.000
Residual Error  38   572.58   15.07
Total           41  1025.12

Source  DF  Seq SS
x1       1  384.24
x2       1   61.57
x1x2     1    6.73

Unusual Observations
Obs    x1   WtChg    Fit  StDev Fit  Residual  St Resid
 12  30.0  -8.500  0.139      0.802    -8.639    -2.27R
 37  42.5   8.000  7.445      2.990     0.555     0.22 X
 40  75.0   8.500  6.910      2.077     1.590     0.48 X

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
The fitted equation is yˆ = −2.21 + .0783x1 + 10.4x2 − .095x1x2
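Because x2 is a 0/1 dummy variable, the fitted equation splits into one straight line per diet: the intercept shifts by β̂2 and the slope shifts by β̂3 when x2 = 1. A short sketch of that split, using the rounded coefficients of the fitted equation (the function name `diet_line` is ours):

```python
# Rounded coefficients of the fitted equation
B0, B1, B2, B3 = -2.21, 0.0783, 10.4, -0.095


def diet_line(x2):
    """Return (intercept, slope in x1) of the fitted line for dummy value x2."""
    return (B0 + B2 * x2, B1 + B3 * x2)


for label, x2 in (("plants", 0), ("duck chow", 1)):
    intercept, slope = diet_line(x2)
    print(f"{label}: y-hat = {intercept:.2f} + ({slope:.4f}) x1")

# The difference between the two slopes is exactly the interaction coefficient
slope_difference = diet_line(1)[1] - diet_line(0)[1]
print(round(slope_difference, 4))
```

This is why a test of H0: β3 = 0 is a test of whether the two diet slopes differ.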
c.
For diet = plants, x2 = 0 yˆ = −2.21 + .0783x1 + 10.4(0) − .095x1(0) = −2.21 + .0783x1
The slope is .0783. For each unit increase in digestion efficiency, the mean weight change is estimated to increase by .0783 for goslings fed plants. d.
For diet = duck chow, x2 = 1
yˆ = −2.21 + .0783x1 + 10.4(1) − .095x1(1) = 8.19 − .0167x1 The slope is −.0167. For each unit increase in digestion efficiency, the mean weight change is estimated to decrease by .0167 for goslings fed duck chow. e.
To determine if the slopes associated with the two diets differ, we test: H0: β3 = 0 Ha: β3 ≠ 0
From MINITAB, the test statistic is t = −.67 with p-value = .508. Since the p-value exceeds α = .05, we fail to reject H0. There is insufficient evidence to conclude that the slopes associated with the two diets differ at α = .05.
11.80
a.
Let x2 = 1 if intervention group, 0 otherwise. The first-order model would be: E(y) = β0 + β1x1 + β2x2
b.
For the control group, x2 = 0. The first-order model is: E(y) = β0 + β1x1 + β2(0) = β0 + β1x1
For the intervention group, x2 = 1. The first-order model is: E(y) = β0 + β1x1 + β2(1) = β0 + β1x1 + β2 = (β0 + β2) + β1x1
In both models, the slope of the line is β1.
c.
If pretest score and group interact, the first-order model would be: E(y) = β0 + β1x1 + β2x2 + β3x1x2
d.
For the control group, x2 = 0. The first-order model including the interaction is: E(y) = β0 + β1x1 + β2(0) + β3x1(0) = β0 + β1x1
For the intervention group, x2 = 1. The first-order model including the interaction is: E(y) = β0 + β1x1 + β2(1) + β3x1(1) = β0 + β1x1 + β2 + β3x1 = (β0 + β2) + (β1 + β3)x1
The slope of the model for the control group is β1. The slope of the model for the intervention group is β1 + β3.
11.82
a.
The first-order model is: E(y) = β0 + β1x1 + β2x2
b.
For the high-tech firms, x2 = 1. The model for the high-tech firm is: E(y) = β0 + β1x1 + β2(1) = β0 + β2 + β1x1
The slope of the line would be β1.
c.
The new model would include the interaction term: E(y) = β0 + β1x1 + β2x2 + β3x1x2
d.
For the high-tech firms, x2 = 1. The model for the high-tech firm is: E(y) = β0 + β1x1 + β2(1) + β3x1(1) = β0 + β2 + (β1 + β3)x1
The slope of the line would be β1 + β3.
11.84
By adding variables to the model, SSE will decrease or stay the same. Thus, SSEC ≤ SSER. The only circumstance under which we will reject H0 is if SSEC is much smaller than SSER. If SSEC is much smaller than SSER, F will be large. Thus, the test is only one-tailed.
11.86
a.
Ha: At least one βi ≠ 0, i = 3, 4, 5
b.
The reduced model would be E(y) = β0 + β1x1 + β2x2
c.
The numerator df = k − g = 5 − 2 = 3 and the denominator df = n − (k + 1) = 30 − (5 + 1) = 24.
d.
H0: β3 = β4 = β5 = 0 Ha: At least one βi ≠ 0, i = 3, 4, 5
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(1250.2 - 1125.2)/(5 - 2)}{1125.2/[30 - (5 + 1)]} = \dfrac{41.6667}{46.8833} = .89$
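The nested-model F statistic depends only on the two error sums of squares and the model sizes, so it is easy to verify numerically. A minimal sketch, where k is the number of β parameters (excluding β0) in the complete model and g the number in the reduced model; the function name `nested_f` is ours:

```python
def nested_f(sse_reduced, sse_complete, n, k, g):
    """F statistic for comparing a reduced model (g terms) to a complete model (k terms)."""
    numerator = (sse_reduced - sse_complete) / (k - g)   # drop in SSE per added term
    denominator = sse_complete / (n - (k + 1))           # MSE of the complete model
    return numerator / denominator


f_stat = nested_f(sse_reduced=1250.2, sse_complete=1125.2, n=30, k=5, g=2)
print(round(f_stat, 2))

# Table value F.05 with 3 and 24 df, as read from Table IX
print(f_stat > 3.01)
```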
The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k − g = 5 − 2 = 3 and denominator df = n − (k + 1) = 30 − (5 + 1) = 24. From Table IX, Appendix B, F.05 = 3.01. The rejection region is F > 3.01. Since the observed value of the test statistic does not fall in the rejection region (F = .89 ≯ 3.01), H0 is not rejected. There is insufficient evidence to indicate the second-order terms are useful at α = .05.
11.88
a.
Let variables x1 through x4 be the Demographic variables, variables x5 through x11 be the Diagnostic variables, variables x12 through x15 be the Treatment variables, and variables x16 through x21 be the Community variables. The complete model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + β9x9 + β10x10 + β11x11 + β12x12 + β13x13 + β14x14 + β15x15 + β16x16 + β17x17 + β18x18 + β19x19 + β20x20 + β21x21
b.
To determine if the 7 Diagnostic variables contribute information for the prediction of y, we test: H0: β5 = β6 = …= β11 = 0
c.
The reduced model would be: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β12x12 + β13x13 + β14x14 + β15x15 + β16x16 + β17x17 + β18x18 + β19x19 + β20x20 + β21x21
d.
Since the p-value is so small (p < .0001), H0 is rejected. There is sufficient evidence to indicate at least one of the seven diagnostic variables contributes information for the prediction of y.
11.90
a.
The complete second order model is: E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x1x2 + β5x1²x2 where x1 = age and x2 = 1 if current, 0 otherwise
b.
To determine if the quadratic terms are important, we test: H0: β2 = β5 = 0
c.
To determine if the interaction terms are important, we test: H0: β4 = β5 = 0
d.
From MINITAB, the outputs from fitting the three models are:

Regression Analysis: Value versus Age, AgeSq, Status, AgeSt, AgeSqSt

The regression equation is
Value = 83 - 5.7 Age + 0.236 AgeSq - 62 Status + 5.4 AgeSt - 0.234 AgeSqSt

Predictor     Coef  SE Coef      T      P
Constant      83.4    316.3   0.26  0.793
Age          -5.74    18.68  -0.31  0.760
AgeSq       0.2361   0.2549   0.93  0.359
Status       -62.1    354.8  -0.18  0.862
AgeSt         5.36    24.81   0.22  0.830
AgeSqSt    -0.2337   0.4080  -0.57  0.570

S = 286.8   R-Sq = 24.7%   R-Sq(adj) = 16.1%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       5  1186549  237310  2.89  0.024
Residual Error  44  3618994   82250
Total           49  4805542

Source   DF  Seq SS
Age       1  865746
AgeSq     1  138871
Status    1   77594
AgeSt     1   77342
AgeSqSt   1   26996
Regression Analysis: Value versus Age, Status, AgeSt

The regression equation is
Value = - 176 + 11.2 Age + 196 Status - 11.4 AgeSt

Predictor     Coef  SE Coef      T      P
Constant    -176.1    145.0  -1.21  0.231
Age         11.166    3.902   2.86  0.006
Status       196.5    178.9   1.10  0.278
AgeSt      -11.432    6.763  -1.69  0.098

S = 283.2   R-Sq = 23.2%   R-Sq(adj) = 18.2%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3  1116017  372006  4.64  0.006
Residual Error  46  3689526   80207
Total           49  4805543

Source  DF  Seq SS
Age      1  865746
Status   1   21097
AgeSt    1  229174
Regression Analysis: Value versus Age, AgeSq, Status

The regression equation is
Value = 166 - 8.8 Age + 0.253 AgeSq - 106 Status

Predictor    Coef  SE Coef      T      P
Constant    165.8    182.7   0.91  0.369
Age         -8.81    10.89  -0.81  0.423
AgeSq      0.2535   0.1632   1.55  0.127
Status     -105.6    107.9  -0.98  0.333

S = 284.5   R-Sq = 22.5%   R-Sq(adj) = 17.5%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3  1082210  360737  4.46  0.008
Residual Error  46  3723332   80942
Total           49  4805542

Source  DF  Seq SS
Age      1  865746
AgeSq    1  138871
Status   1   77594
Test for part b:

The test statistic is:

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(3{,}689{,}526 - 3{,}618{,}994)/2}{82{,}250} = .429$$

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = 2 numerator degrees of freedom and ν2 = 44 denominator degrees of freedom. From Table IX, Appendix B, F.05 ≈ 3.23. The rejection region is F > 3.23.

Since the observed value of the test statistic does not fall in the rejection region (F = .429 ≯ 3.23), H0 is not rejected. There is insufficient evidence to indicate the quadratic terms are important for predicting market value at α = .05.

Test for part c:

The test statistic is:

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(3{,}723{,}332 - 3{,}618{,}994)/(5 - 3)}{82{,}250} = .634$$

The rejection region is the same as in the previous test. Reject H0 if F > 3.23.

Since the observed value of the test statistic does not fall in the rejection region (F = .634 ≯ 3.23), H0 is not rejected. There is insufficient evidence to indicate the interaction terms are important for predicting market value at α = .05.
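The two nested-model F statistics above can be checked numerically. The following is a minimal Python sketch, not part of the original solution; the function name `nested_f` is hypothetical, and n = 50 is inferred from the total degrees of freedom (49) in the printouts.

```python
def nested_f(sse_reduced, sse_complete, k, g, n):
    """Partial F statistic comparing a reduced model (g terms)
    with a complete model (k terms) fit to n observations."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))  # MSE of the complete model
    return numerator / denominator

# SSE values are taken from the MINITAB printouts above.
f_quadratic = nested_f(3_689_526, 3_618_994, k=5, g=3, n=50)    # part b
f_interaction = nested_f(3_723_332, 3_618_994, k=5, g=3, n=50)  # part c

F_CRITICAL = 3.23  # approximate table value quoted in the text (2 and 44 df)
print(round(f_quadratic, 3), f_quadratic > F_CRITICAL)
print(round(f_interaction, 3), f_interaction > F_CRITICAL)
```

Neither statistic exceeds the tabled critical value, which agrees with the decisions reached above.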
11.92 a.
The reduced model for testing if the mean posttest scores differ for the intervention and control groups would be: E(y) = β0 + β1x1
b.
The reported p-value is .03. Since the p-value is so small, H0 is rejected. There is evidence to indicate that the mean posttest sun safety knowledge scores differ for the intervention and control groups for α > .03.
c.
The reported p-value is .033. Since the p-value is so small, H0 is rejected. There is evidence to indicate that the mean posttest sun safety comprehension scores differ for the intervention and control groups for α > .033.
d.
The reported p-value is .322. Since the p-value is not small, H0 is not rejected. There is no evidence to indicate that the mean posttest sun safety application scores differ for the intervention and control groups for α < .322.
11.94 a.
To determine whether the rate of increase of emotional distress with experience is different for the two groups, we test:
H0: β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 4, 5
b.
To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test:
H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5
c.
To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test:
H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(795.23 - 783.9)/(5 - 2)}{783.9/[200 - (5 + 1)]} = .93$$

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − g = 5 − 2 = 3 and ν2 = n − (k + 1) = 200 − (5 + 1) = 194. From Table IX, Appendix B, F.05 ≈ 2.60. The rejection region is F > 2.60.

Since the observed value of the test statistic does not fall in the rejection region (F = .93 ≯ 2.60), H0 is not rejected. There is insufficient evidence to indicate that there are differences in mean emotional distress levels that are attributable to exposure group at α = .05.
11.96 a.
The best one-variable predictor of y is the one whose t statistic has the largest absolute value. The t statistics, t = β̂i / s(β̂i), for each of the variables are:

Independent Variable    t = β̂i / s(β̂i)
x1                      t = 1.6/.42 = 3.81
x2                      t = −.9/.01 = −90
x3                      t = 3.4/1.14 = 2.98
x4                      t = 2.5/2.06 = 1.21
x5                      t = −4.4/.73 = −6.03
x6                      t = .3/.35 = .86

The variable x2 is the best one-variable predictor of y. The absolute value of the corresponding t score is 90. This is larger than any of the others.
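The selection in part a can be reproduced with a few lines of Python. This is only an illustrative sketch; the dictionary layout and variable names are assumptions, while the estimates and standard errors are the ones used in the table above.

```python
# (estimate, standard error) pairs from the table in part a.
estimates = {
    "x1": (1.6, 0.42),
    "x2": (-0.9, 0.01),
    "x3": (3.4, 1.14),
    "x4": (2.5, 2.06),
    "x5": (-4.4, 0.73),
    "x6": (0.3, 0.35),
}

# t statistic for each one-variable model: estimate divided by its standard error.
t_stats = {name: b / se for name, (b, se) in estimates.items()}

# The best one-variable predictor has the largest |t|.
best = max(t_stats, key=lambda name: abs(t_stats[name]))

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
print("best one-variable predictor:", best)
```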
b.
Yes. In the stepwise procedure, the first variable entered is the one which has the largest absolute value of t, provided the absolute value of the t falls in the rejection region.
c.
Once x2 is entered, the next variable that is entered is the one that, in conjunction with x2, has the largest absolute t value associated with it.
11.98 a.
In step 1, all one-variable models are fit. Thus, there are a total of 11 models fit.
b.
In step 2, all two-variable models are fit, where one of the variables is the best one selected in step 1. Thus, a total of 10 two-variable models are fit.
c.
In the 11th step, only one model is fit: the model containing all the independent variables.
d.
The model would be:
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β7x7 + β9x9 + β10x10 + β11x11
e.
67.7% of the total sample variability of overall satisfaction is explained by the model containing the independent variables safety on bus, seat availability, dependability, travel time, convenience of route, safety at bus stops, hours of service, and frequency of service.
f.
Using stepwise regression does not guarantee that the best model will be found. There may be better combinations of the independent variables that are never found, because of the order in which the independent variables are entered into the model.
11.100 a.
The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved shape. Such a pattern usually indicates that curvature needs to be added to the model.
b.
The plot of the residuals reveals a nonrandom pattern. The plot of the residuals versus the predicted values shows that the range of the residuals increases as ŷ increases. This indicates that the variance of the random error, ε, becomes larger as the estimate of E(y) increases in value. Since E(y) depends on the x-values in the model, this implies that the variance of ε is not constant for all settings of the x's.
c.
This plot reveals an outlier, since all or almost all of the residuals should fall within 3 standard deviations of their mean of 0.
d.
This frequency distribution of the residuals is skewed to the right. This may be due to outliers or could indicate the need for a transformation of the dependent variable.
11.102 a.
Since all the pairwise correlations are .45 or less in absolute value, there is little evidence of extreme multicollinearity.
b.
No. The overall model test is significant (p < .001). This implies that at least one variable contributes to the prediction of the urban/rural rating. Looking at the individual t-tests, there are several that are significant, namely x1, x3, and x5. There is no evidence that multicollinearity is present.
11.104 First, we need to compute the value of the residual:

Residual = y − ŷ = 87 − 29.63 = 57.37

We are given that the standard deviation is s = 24.68. Thus, an observation with a residual of 57.37 is 57.37 / 24.68 = 2.32 standard deviations from the fitted regression line. Since this is less than 3 standard deviations from the regression line, this point is not considered an outlier.
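The same outlier check can be written as a short Python sketch. The function name `residual_in_sd_units` is hypothetical; the 3-standard-deviation cutoff is the rule of thumb used in the solution above.

```python
def residual_in_sd_units(y, y_hat, s):
    """Number of model standard deviations between an observation and its fitted value."""
    return (y - y_hat) / s

z = residual_in_sd_units(y=87, y_hat=29.63, s=24.68)

# Rule of thumb from the text: flag observations more than 3 standard deviations away.
is_outlier = abs(z) > 3

print(f"residual is {z:.2f} standard deviations from the fitted line; outlier: {is_outlier}")
```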
11.106 a.
From MINITAB, the output is:

Regression Analysis: Food versus Income, Size

The regression equation is
Food = 2.79 - 0.00016 Income + 0.383 Size

Predictor        Coef   SE Coef      T      P
Constant       2.7944    0.4363   6.40  0.000
Income      -0.000164  0.006564  -0.02  0.980
Size          0.38348   0.07189   5.33  0.000

S = 0.7188   R-Sq = 55.8%   R-Sq(adj) = 52.0%

Analysis of Variance

Source           DF       SS      MS      F      P
Regression        2  15.0027  7.5013  14.52  0.000
Residual Error   23  11.8839  0.5167
Total            25  26.8865

Source   DF   Seq SS
Income    1   0.2989
Size      1  14.7037

Correlations: Income, Size
Pearson correlation of Income and Size = -0.137
P-Value = 0.506

No; income and household size do not seem to be highly correlated. The correlation coefficient between income and household size is −.137.
b.
Using MINITAB, the residual plots are:

[Figure: Histogram of the Residuals (response is Food); horizontal axis: Residual, vertical axis: Frequency]
[Figure: Residuals Versus the Fitted Values (response is Food); horizontal axis: Fitted Value, vertical axis: Residual]
[Figure: Residuals Versus Income (response is Food); horizontal axis: Income, vertical axis: Residual]
[Figure: Residuals Versus Size (response is Food); horizontal axis: Size, vertical axis: Residual]
Yes; the plots of the residuals versus income and the residuals versus household size exhibit a curved shape. Such a pattern could indicate that a second-order model may be more appropriate.
c.
No; the plot of the residuals versus the predicted values reveals varying spreads for different values of ŷ. This implies that the variance of ε is not constant for all settings of the x's.
d.
Yes; The outlier shows up in several plots and is the 26th household (Food consumption = $7500, income = $7300 and household size = 5).
e.
No; The frequency distribution of the residuals shows that the outlier skews the frequency distribution to the right.
11.108 Using MINITAB, the residual plots are:

[Figure: Residual Plots for DDT, four panels: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data. Residuals are standardized.]
[Figure: Residuals Versus WEIGHT (response is DDT); horizontal axis: WEIGHT, vertical axis: Standardized Residual]
[Figure: Residuals Versus LENGTH (response is DDT); horizontal axis: LENGTH, vertical axis: Standardized Residual]
[Figure: Residuals Versus MILE (response is DDT); horizontal axis: MILE, vertical axis: Standardized Residual]
From the normal probability plot, the points do not fall on a straight line, indicating the residuals are not normal. The histogram of the residuals indicates the residuals are skewed to the right, which also indicates that the residuals are not normal. The plot of the residuals versus ŷ indicates that there is at least one outlier and the variance is not constant. One observation has a standardized residual of more than 10 and several others have standardized residuals greater than 3. This is also evident in the plots of the residuals versus each of the independent variables. Since the assumptions of normality and constant variance appear to be violated, we could consider transforming the data. We should also check the outlying observations to see if there are any errors connected with these observations.
11.110 a.
To determine if at least one of the β parameters is not zero, we test:
H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0

The test statistic is

$$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.83/4}{(1 - .83)/[25 - (4 + 1)]} = 24.41$$

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 4 and denominator df = n − (k + 1) = 25 − (4 + 1) = 20. From Table IX, Appendix B, F.05 = 2.87. The rejection region is F > 2.87.

Since the observed value of the test statistic falls in the rejection region (F = 24.41 > 2.87), H0 is rejected. There is sufficient evidence to indicate at least one of the β parameters is nonzero at α = .05.
b.
H0: β1 = 0
Ha: β1 < 0

The test statistic is

$$t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-2.43 - 0}{1.21} = -2.01$$

The rejection region requires α = .05 in the lower tail of the t distribution with df = n − (k + 1) = 25 − (4 + 1) = 20. From Table VI, Appendix B, t.05 = 1.725. The rejection region is t < −1.725.

Since the observed value of the test statistic falls in the rejection region (t = −2.01 < −1.725), H0 is rejected. There is sufficient evidence to indicate β1 is less than 0 at α = .05.
c.
H0: β2 = 0
Ha: β2 > 0

The test statistic is

$$t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{.05 - 0}{.16} = .31$$

The rejection region requires α = .05 in the upper tail of the t distribution. From part b above, the rejection region is t > 1.725.

Since the observed value of the test statistic does not fall in the rejection region (t = .31 ≯ 1.725), H0 is not rejected. There is insufficient evidence to indicate β2 is greater than 0 at α = .05.
d.
H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is

$$t = \frac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \frac{.62 - 0}{.26} = 2.38$$

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = 20. From Table VI, Appendix B, t.025 = 2.086. The rejection region is t < −2.086 or t > 2.086.

Since the observed value of the test statistic falls in the rejection region (t = 2.38 > 2.086), H0 is rejected. There is sufficient evidence to indicate β3 is different from 0 at α = .05.
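The four tests in Exercise 11.110 can be verified with a short Python sketch. The helper names `global_f` and `t_stat` are hypothetical, and the critical values are the table values quoted in the solution rather than values computed here.

```python
def global_f(r2, k, n):
    """Global F statistic computed from R-squared for a model with k terms and n observations."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

def t_stat(estimate, std_error, hypothesized=0.0):
    """t statistic for testing a single beta parameter."""
    return (estimate - hypothesized) / std_error

f = global_f(r2=0.83, k=4, n=25)  # part a
t1 = t_stat(-2.43, 1.21)          # part b, lower-tailed test
t2 = t_stat(0.05, 0.16)           # part c, upper-tailed test
t3 = t_stat(0.62, 0.26)           # part d, two-tailed test

# Critical values as quoted from Tables IX and VI in the text (df = 20).
reject_a = f > 2.87
reject_b = t1 < -1.725
reject_c = t2 > 1.725
reject_d = abs(t3) > 2.086

print(round(f, 2), reject_a, reject_b, reject_c, reject_d)
```

The decisions match the solution: H0 is rejected in parts a, b, and d, and not rejected in part c.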
11.112 The error of prediction is smallest when the values of x1, x2, and x3 are equal to their sample means. The further x1, x2, and x3 are from their means, the larger the error. When x1 = 60, x2 = .4, and x3 = 900, the settings are outside the observed ranges of the x values. When x1 = 30, x2 = .6, and x3 = 1300, the settings are within the observed ranges and consequently the x values are closer to their means. Thus, when x1 = 30, x2 = .6, and x3 = 1300, the error of prediction is smaller.
11.114 From the plot of the residuals for the straight line model, there appears to be a mound shape, which implies the quadratic model should be used.
11.116 a.
H0: β4 = β5 = 0
Ha: At least one of β4 and β5 ≠ 0
b.
The regression model

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_2^2 + \beta_4 x_1 x_2 + \beta_5 x_1 x_2^2$$

is fit to the 35 data points, yielding a sum of squares for error, denoted SSE_C. The regression model

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_2^2$$

is also fit to the data and its sum of squares for error is obtained, denoted SSE_R. Then the test statistic is:

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$$

where k = 5, g = 3, and n = 35.
c.
The numerator degrees of freedom is k − g = 5 − 3 = 2, and the denominator degrees of freedom is n − (k + 1) = 35 − (5 + 1) = 29.
d.
The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = 2 and denominator df = 29. From Table IX, Appendix B, F.05 = 3.33. The rejection region is F > 3.33.
11.118 a.
E(y) = β0 + β1x1 + β2x2 + β3x3
where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise
b.
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 x_1 x_2 + \beta_6 x_1 x_3 + \beta_7 x_1^2 x_2 + \beta_8 x_1^2 x_3$$
where x1, x2, and x3 are as in part a.
11.120 a.
E(y) = β0 + β1x1 + β2x2
b.
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_2^2 + \beta_5 x_1 x_2$$
11.122 a.
1. The "Quantitative GMAT score" is measured on a numerical scale, so it is a quantitative variable.
2. The "Verbal GMAT score" is measured on a numerical scale, so it is a quantitative variable.
3. The "Undergraduate GPA" is measured on a numerical scale, so it is a quantitative variable.
4. The "First-year graduate GPA" is measured on a numerical scale, so it is a quantitative variable.
5. The "Student cohort" has 3 categories, so it is a qualitative variable. Note that the numerical scale is meaningless in this situation. (It is possible to consider this as a quantitative variable. However, for this problem we will consider it as qualitative.)
b.
The quantitative variables quantitative GMAT score, verbal GMAT score, undergraduate GPA, and first-year graduate GPA should all be positively correlated with final GPA.
c.
x5 = 1 if student entered doctoral program in year 3, 0 otherwise
x6 = 1 if student entered doctoral program in year 5, 0 otherwise
d.
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6
e.
β0 = the y-intercept for students entering in year 1.
β1 = change in the mean final GPA for each one-unit increase in quantitative GMAT score, holding the remaining variables constant.
β2 = change in the mean final GPA for each one-unit increase in verbal GMAT score, holding the remaining variables constant.
β3 = change in the mean final GPA for each one-point increase in undergraduate GPA, holding the remaining variables constant.
β4 = change in the mean final GPA for each one-point increase in first-year graduate GPA, holding the remaining variables constant.
β5 = difference in mean final GPA between the year 3 and year 1 student cohorts, holding the remaining variables constant.
β6 = difference in mean final GPA between the year 5 and year 1 student cohorts, holding the remaining variables constant.
f.
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x1x5 + β8x1x6 + β9x2x5 + β10x2x6 + β11x3x5 + β12x3x6 + β13x4x5 + β14x4x6
g.
For the year 1 cohort, x5 = x6 = 0. The model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5(0) + β6(0) + β7x1(0) + β8x1(0) + β9x2(0) + β10x2(0) + β11x3(0) + β12x3(0) + β13x4(0) + β14x4(0) = β0 + β1x1 + β2x2 + β3x3 + β4x4 The slopes for the four variables are β1, β2, β3 and β4 respectively.
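The dummy coding and interaction terms of Exercise 11.122 can be illustrated with a small Python sketch. It assumes the coding of part c (x5 flags the year 3 cohort, x6 flags the year 5 cohort, year 1 is the base level); the function names and the student scores are hypothetical.

```python
def cohort_dummies(cohort_year):
    """Dummy variables for student cohort, with year 1 as the base level."""
    x5 = 1 if cohort_year == 3 else 0
    x6 = 1 if cohort_year == 5 else 0
    return x5, x6

def interaction_row(x1, x2, x3, x4, cohort_year):
    """Values of the 14 terms multiplying beta1..beta14 in the part f model."""
    x5, x6 = cohort_dummies(cohort_year)
    quantitative = [x1, x2, x3, x4]
    main_effects = quantitative + [x5, x6]
    # Ordered as x1x5, x1x6, x2x5, x2x6, x3x5, x3x6, x4x5, x4x6.
    interactions = [x * d for x in quantitative for d in (x5, x6)]
    return main_effects + interactions

# Hypothetical student scores, for illustration only.
year1_row = interaction_row(700, 600, 3.5, 3.8, cohort_year=1)
year3_row = interaction_row(700, 600, 3.5, 3.8, cohort_year=3)

print(year1_row)  # every dummy and interaction term is 0 for the year 1 cohort
print(year3_row)  # only the x5 terms are switched on for the year 3 cohort
```

For the base cohort every term beyond x4 is zero, which is why the model in part g collapses to β0 + β1x1 + β2x2 + β3x3 + β4x4.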
11.124 a.
The hypothesized model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5
β0 = y-intercept. It has no interpretation in this model.
β1 = difference in the mean salaries between males and females, all other variables held constant.
β2 = difference in the mean salaries between whites and nonwhites, all other variables held constant.
β3 = change in the mean salary for each additional year of education, all other variables held constant.
β4 = change in the mean salary for each additional year of tenure with firm, all other variables held constant.
β5 = change in the mean salary for each additional hour worked per week, all other variables held constant.
b.
The least squares equation is:

ŷ = 15.491 + 12.774x1 + .713x2 + 1.519x3 + .32x4 + .205x5

β̂0 = estimate of the y-intercept. It has no interpretation in this model.
β̂1: We estimate the difference in the mean salaries between males and females to be $12.774, all other variables held constant.
β̂2: We estimate the difference in the mean salaries between whites and nonwhites to be $.713, all other variables held constant.
β̂3: We estimate the change in the mean salary for each additional year of education to be $1.519, all other variables held constant.
β̂4: We estimate the change in the mean salary for each additional year of tenure with firm to be $.320, all other variables held constant.
β̂5: We estimate the change in the mean salary for each additional hour worked per week to be $.205, all other variables held constant.
c.
R² = .240. 24% of the total variability of salaries is explained by the model containing gender, race, educational level, tenure with firm, and number of hours worked per week.

To determine if the model is useful for predicting annual salary, we test:
H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic is

$$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.24/5}{(1 - .24)/[191 - (5 + 1)]} = 11.68$$

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 5 and denominator df = n − (k + 1) = 191 − (5 + 1) = 185. From Table IX, Appendix B, F.05 ≈ 2.21. The rejection region is F > 2.21.

Since the observed value of the test statistic falls in the rejection region (F = 11.68 > 2.21), H0 is rejected. There is sufficient evidence to indicate the model containing gender, race, educational level, tenure with firm, and number of hours worked per week is useful for predicting annual salary for α = .05.
d.
To determine if male managers are paid more than female managers, we test:
H0: β1 = 0
Ha: β1 > 0

The p-value for this one-tailed test is less than .05/2 = .025. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate male managers are paid more than female managers, holding all other variables constant, for α > .025.
e.
The salary paid an individual depends on many factors other than gender. Thus, in order to adjust for other factors influencing salary, we include them in the model.
11.126 a.
The main effects model would be: E(y) = β0 + β1x1 + β8x8
b.
β̂1 = −.28. The mean value for the relative error of the effort estimate for developers is estimated to be .28 units below that of project leaders, holding previous accuracy constant.
β̂8 = .27. The mean value for the relative error of the effort estimate if previous accuracy is more than 20% is estimated to be .27 units above that if previous accuracy is less than 20%, holding company role of estimator constant.
c.
One possible reason for the sign of β̂1 being opposite from what is expected could be that company role of estimator and previous accuracy are correlated.
11.128 a.
R² = .45. 45% of the total variability of the suicide rates is explained by the model containing unemployment rate, percentage of females in the work force, divorce rate, logarithm of GNP, and annual percent change in GNP.

To determine if the model is useful for predicting suicide rate, we test:
H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic is

$$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.45/5}{(1 - .45)/[45 - (5 + 1)]} = 6.38$$

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 5 and denominator df = n − (k + 1) = 45 − (5 + 1) = 39. From Table IX, Appendix B, F.05 ≈ 2.45. The rejection region is F > 2.45.

Since the observed value of the test statistic falls in the rejection region (F = 6.38 > 2.45), H0 is rejected. There is sufficient evidence to indicate the model containing unemployment rate, percentage of females in the work force, divorce rate, logarithm of GNP, and annual percent change in GNP is useful for predicting suicide rate for α = .05.
b.
β̂0 = .002 = estimate of the y-intercept. It has no interpretation in this model.
β̂1: We estimate the change in suicide rate for each unit change in unemployment rate to be .0204, all other variables held constant.
β̂2: We estimate the change in suicide rate for each unit change in percentage of females in the work force to be −.0231, all other variables held constant.
β̂3: We estimate the change in suicide rate for each unit change in divorce rate to be .0765, all other variables held constant.
β̂4: We estimate the change in suicide rate for each unit change in logarithm of GNP to be .2760, all other variables held constant.
β̂5: We estimate the change in suicide rate for each unit change in annual percent change in GNP to be .0018, all other variables held constant.

The p-values for unemployment rate and percentage of females in the work force are less than .05. This indicates that both are important in predicting suicide rate. The p-values for divorce rate, logarithm of GNP, and annual percent change in GNP are all greater than .10. This indicates that none of these variables are important in predicting suicide rate.

We must view these conclusions with caution. Some of these independent variables may be highly correlated with each other. If so, some of the variables declared nonsignificant may be significant if the other variables are removed from the model.
c.
To determine if unemployment rate is a useful predictor of the suicide rate, we test:
H0: β1 = 0
Ha: β1 ≠ 0

The p-value = .002. Since this p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate unemployment rate is a useful predictor of the suicide rate for α = .05.
d.
Curvature: It may be possible that the relationship between the suicide rate and some of the independent variables is not linear, but curved. Thus, some of the variables that do not appear to be useful predictors may, in fact, be useful predictors if second-order terms were added to the model.

Interaction: Again, it may be possible that the effect of some independent variables on the suicide rate is different for different levels of other independent variables. This possibility should be explored before throwing out certain independent variables.

Multicollinearity: Some of these independent variables may be highly correlated with each other. If so, some of the variables declared nonsignificant may be significant if other variables are removed from the model.
11.130 CEO income (x1) and stock percentage (x2) are said to interact if the effect of one variable, say CEO income, on the dependent variable profit (y) depends on the level of the second variable, stock percentage.
11.132 a.
The SAS output is:

DEP VARIABLE: Y
ANALYSIS OF VARIANCE

                     SUM OF         MEAN
SOURCE    DF        SQUARES       SQUARE   F VALUE   PROB>F
MODEL      3    25784705.01   8594901.67   241.758   0.0001
ERROR     16      568826.19  35551.63709
C TOTAL   19    26353531.20

ROOT MSE   188.5514    R-SQUARE   0.9784
DEP MEAN     3014.2    ADJ R-SQ   0.9744
C.V.       6.255438

PARAMETER ESTIMATES

                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF     ESTIMATE         ERROR   PARAMETER=0   PROB > |T|
INTERCEP    1   1333.17830     290.99944         4.581       0.0003
X1          1  -0.15122302    0.37864583        -0.399       0.6949
X2          1  -2.62532461    5.34596285        -0.491       0.6300
X1X2        1   0.05195415   0.006863831         7.569       0.0001

The fitted model is ŷ = 1333.18 − .151x1 − 2.625x2 + .052x1x2
b.
To determine if the overall model is useful, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3

The test statistic is

$$F = \frac{MSR}{MSE} = \frac{8{,}594{,}901.67}{35{,}551.637} = 241.758$$

The rejection region requires α = .05 in the upper tail of the F distribution with numerator df = k = 3 and denominator df = n − (k + 1) = 20 − (3 + 1) = 16. From Table IX, Appendix B, F.05 = 3.24. The rejection region is F > 3.24.

Since the observed value of the test statistic falls in the rejection region (F = 241.758 > 3.24), H0 is rejected. There is sufficient evidence to indicate the model is useful at α = .05.
c.
To determine if the interaction is present, we test:
H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is

$$t = \frac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = 7.569$$

The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 20 − (3 + 1) = 16. From Table VI, Appendix B, t.025 = 2.120. The rejection region is t < −2.120 or t > 2.120.

Since the observed value of the test statistic falls in the rejection region (t = 7.569 > 2.120), H0 is rejected. There is sufficient evidence to indicate the interaction between advertising expenditure and shelf space is present at α = .05.
d.
Advertising expenditure and shelf space are said to interact if the effect of advertising expenditure on sales is different at different levels of shelf space.
e.
If a first-order model were used, the effect of advertising expenditure on sales would be the same regardless of the amount of shelf space. If interaction really exists, the effect of advertising expenditure on sales would depend on which level of shelf space was present.
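What interaction means for the fitted model of Exercise 11.132 can be seen in a short Python sketch: the change in ŷ per unit increase in x1 equals β̂1 + β̂3x2, so it depends on x2. The coefficients are the rounded estimates from the fitted model above; the function names and the x2 values 10 and 50 are hypothetical choices for illustration.

```python
# Rounded coefficients of the fitted interaction model from part a.
B0, B1, B2, B3 = 1333.18, -0.151, -2.625, 0.052

def predict(x1, x2):
    """Fitted value from y-hat = B0 + B1*x1 + B2*x2 + B3*x1*x2."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2

def slope_x1(x2):
    """Change in y-hat per unit increase in x1 at a fixed level of x2."""
    return B1 + B3 * x2

# Hypothetical x2 levels, chosen only to show that the slope changes.
for x2 in (10, 50):
    print(f"x2 = {x2}: slope of x1 = {slope_x1(x2):.3f}")

# A one-unit step in x1 changes the prediction by exactly that slope.
step = predict(101, 10) - predict(100, 10)
```

In a first-order model B3 would be 0 and `slope_x1` would return the same value for every x2.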
11.134 a.
There is a curvilinear trend.
b.
From MINITAB, the output is:

The regression equation is
y = 42.2 - 0.0114 x + 0.000001 xsq

Predictor         Coef       StDev      T      P
Constant        42.247       5.712   7.40  0.000
x            -0.011404    0.005053  -2.26  0.037
xsq         0.00000061  0.00000037   1.66  0.115

S = 21.81   R-Sq = 34.9%   R-Sq(adj) = 27.2%

Analysis of Variance

Source           DF       SS      MS     F      P
Regression        2   4325.4  2162.7  4.55  0.026
Residual Error   17   8085.5   475.6
Total            19  12410.9

Source   DF  Seq SS
x         1  3013.3
xsq       1  1312.1

Unusual Observations
Obs     x1     y     Fit  StDev Fit  Residual  St Resid
 16   9150  4.60  -11.21      16.24     15.81    1.09 X
 17  15022  2.20    8.09      21.40     -5.89   -1.41 X

X denotes an observation whose X value gives it large influence.

The fitted model is ŷ = 42.2 − .0114x + .00000061x²
c.
To determine if a curvilinear relationship exists, we test: H0: β2 = 0 Ha: β2 ≠ 0
From MINITAB, the test statistic is t = 1.66 with p-value = .115. Since the p-value is greater than α = .05, do not reject H0. There is insufficient evidence to indicate that a curvilinear relationship exists between dissolved phosphorus percentage and soil loss at α = .05.
11.136 a.
The first-order model for this problem is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4
b.
Using MINITAB, the printout is:

Regression Analysis

The regression equation is
y = 28.9 - 0.000000 x1 + 0.844 x2 - 0.360 x3 - 0.300 x4

Predictor          Coef       StDev      T      P
Constant          28.87       12.67   2.28  0.034
x1          -0.00000011  0.00000028  -0.38  0.708
x2               0.8440      0.2326   3.63  0.002
x3              -0.3600      0.1316  -2.74  0.013
x4              -0.3003      0.1834  -1.64  0.117

S = 5.989   R-Sq = 51.2%   R-Sq(adj) = 41.5%

Analysis of Variance

Source           DF       SS      MS     F      P
Regression        4   753.76  188.44  5.25  0.005
Residual Error   20   717.40   35.87
Total            24  1471.17

Source   DF  Seq SS
x1        1  129.96
x2        1  355.43
x3        1  172.19
x4        1   96.17

Unusual Observations
Obs        x1      y    Fit  StDev Fit  Residual  St Resid
  4  11940345  32.60  17.25       3.40     15.35     3.11R
 12   4905123  27.00  16.17       4.36     10.83     2.63R

R denotes an observation with a large standardized residual
The least squares prediction line is ŷ = 28.9 − .00000011x1 + .844x2 − .360x3 − .300x4.

To determine if the model is useful for predicting percentage of problem mortgages, we test:
H0: β1 = β2 = β3 = β4 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is

$$F = \frac{MS(\text{Model})}{MSE} = 5.25$$

The p-value is p = .005. Since the p-value is less than α = .05 (p = .005 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages at α = .05.
c.
β̂0 = 28.9. This is merely the y-intercept. It has no other meaning in this problem.
β̂1 = −0.00000011. For each unit increase in total mortgage loans, the mean percentage of problem mortgages is estimated to decrease by 0.00000011, holding percentage of invested assets, percentage of commercial mortgages, and percentage of residential mortgages constant.
β̂2 = 0.844. For each unit increase in percentage of invested assets, the mean percentage of problem mortgages is estimated to increase by 0.844, holding total mortgage loans, percentage of commercial mortgages, and percentage of residential mortgages constant.
β̂3 = −0.360. For each unit increase in percentage of commercial mortgages, the mean percentage of problem mortgages is estimated to decrease by 0.360, holding total mortgage loans, percentage of invested assets, and percentage of residential mortgages constant.
β̂4 = −0.300. For each unit increase in percentage of residential mortgages, the mean percentage of problem mortgages is estimated to decrease by 0.300, holding total mortgage loans, percentage of invested assets, and percentage of commercial mortgages constant.
d.
Using MINITAB, the scattergrams are:
From the scattergrams, it appears that possibly x2 and x4 might warrant inclusion in the model as second order terms.
e.
Using MINITAB, the printout is:

Regression Analysis

The regression equation is
y = 56.2 - 0.000000 x1 - 1.82 x2 - 0.449 x3 + 0.223 x4 + 0.0771 x2sq - 0.0189 x4sq

Predictor          Coef        StDev       T       P
Constant          56.17        13.81    4.07   0.001
x1          -0.00000008   0.00000025   -0.31   0.760
x2              -1.8177       0.9935   -1.83   0.084
x3              -0.4494       0.1127   -3.99   0.001
x4               0.2227       0.6079    0.37   0.718
x2sq            0.07707      0.02665    2.89   0.010
x4sq           -0.01887      0.02334   -0.81   0.429

S = 4.956    R-Sq = 69.9%    R-Sq(adj) = 59.9%

Analysis of Variance

Source            DF        SS       MS      F       P
Regression         6   1029.03   171.51   6.98   0.001
Residual Error    18    442.13    24.56
Total             24   1471.17

Source   DF   Seq SS
x1        1   129.96
x2        1   355.43
x3        1   172.19
x4        1    96.17
x2sq      1   259.22
x4sq      1    16.05

Unusual Observations

Obs         x1        y      Fit   StDev Fit   Residual   St Resid
  4   11940345   32.600   26.777       4.038      5.823      2.03R
 10    5328142    7.500   16.105       2.599     -8.605     -2.04R
 12    4905123   27.000   16.559       3.607     10.441      3.07R
 20    2978628    3.200   11.759       2.679     -8.559     -2.05R

R denotes an observation with a large standardized residual
The least squares prediction equation is
yˆ = 56.2 − .00000008x1 − 1.82x2 − .449x3 + .223x4 + .0771(x2)² − .0189(x4)²

To determine if the model is useful for predicting percentage of problem mortgages, we test:
H0: β1 = β2 = β3 = β4 = β5 = β6 = 0
Ha: At least one of the coefficients is nonzero

The test statistic is F = MS(Model)/MSE = 171.51/24.56 = 6.98
The p-value is p = .001. Since the p-value is less than α = .05 (p = .001 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages at α = .05. f.
To determine if one or more of the second-order terms of our model contribute information for the prediction of the percentage of problem mortgages, we test: H0: β5 = β6 = 0 Ha: At least one of the coefficients is nonzero
The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))] = [(717.40 − 442.13)/(6 − 4)] / [442.13/(25 − (6 + 1))] = 5.60
The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = (k − g) = (6 − 4) = 2 and ν2 = n − (k + 1) = 25 − (6 + 1) = 18. From Table IX, Appendix B, F.05 = 3.55. The rejection region is F > 3.55. Since the observed value of the test statistic falls in the rejection region (F = 5.60 > 3.55), H0 is rejected. There is sufficient evidence to indicate one or more of the second-order terms of our model contribute information for the prediction of the percentage of problem mortgages at α = .05. 11.138 a.
Using SAS, the output for fitting the model is:

DEP VARIABLE: Y
ANALYSIS OF VARIANCE

                     SUM OF         MEAN
SOURCE     DF       SQUARES       SQUARE    F VALUE    PROB>F
MODEL       3    2396.36410    798.78803     99.394    0.0001
ERROR      16     128.58590      8.03662
C TOTAL    19    2524.95000

ROOT MSE     2.83489    R-SQUARE    0.9491
DEP MEAN    23.05000    ADJ R-SQ    0.9395
C.V.        12.29889

PARAMETER ESTIMATES

                   PARAMETER      STANDARD      T FOR H0:
VARIABLE    DF      ESTIMATE         ERROR    PARAMETER=0    PROB > |T|
INTERCEP     1    -11.768830    3.05032146         -3.858        0.0014
X1           1     10.293782    1.43788129          7.159        0.0001
X1SQ         1     -0.417991    0.16132974         -2.591        0.0197
X2           1     13.244076    1.50325080          8.810        0.0001

The fitted model is: yˆ = −11.8 + 10.3x1 − .418(x1)² + 13.2x2
b.
To determine if the second-order term is necessary, we test: H0: β2 = 0 Ha: β2 ≠ 0
The test statistic is t = −2.591. The p-value is p = .0197. Since the p-value is less than α (p = .0197 < .05), H0 is rejected. There is sufficient evidence to conclude that the second-order term in the model proposed by the operations manager is necessary at α = .05. c.
The reduced model E(y) = β0 + β3x2 was fit to the data. The SAS output is:

DEP VARIABLE: Y
ANALYSIS OF VARIANCE

                     SUM OF          MEAN
SOURCE     DF       SQUARES        SQUARE    F VALUE    PROB>F
MODEL       1    1.25000000    1.25000000      0.009    0.9258
ERROR      18    2523.70000     140.20556
C TOTAL    19    2524.95000

ROOT MSE    11.84084    R-SQUARE     0.0005
DEP MEAN    23.05       ADJ R-SQ    -0.0550
C.V.        51.37025

PARAMETER ESTIMATES

                    PARAMETER      STANDARD      T FOR H0:
VARIABLE    DF       ESTIMATE         ERROR    PARAMETER=0    PROB > |T|
INTERCEP     1    23.30000000    3.74440323          6.223        0.0001
X2           1    -0.50000000    5.29538583         -0.094        0.9258

The fitted model is yˆ = 23.3 − .5x2.

The hypotheses are:
H0: β1 = β2 = 0
Ha: At least one βi ≠ 0, i = 1, 2

The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))] = [(2523.7 − 128.586)/(3 − 1)] / [128.586/(20 − (3 + 1))] = 1197.557/8.036625 = 149.01
The rejection region requires α = .10 in the upper tail of the F distribution with numerator df = k − g = 3 − 1 = 2 and denominator df = n − (k + 1) = 20 − (3 + 1) = 16. From Table VIII, Appendix B, F.10 = 2.67. The rejection region is F > 2.67. Since the observed value of the test statistic falls in the rejection region (F = 149.01 > 2.67), H0 is rejected. There is sufficient evidence to indicate the age of the machine contributes information to the model at α = .10. After adjusting for machine type, there is evidence that down time is related to age. 11.140 a.
For a sunny weekday, x1 = 0 and x2 = 1: x3 = 70 ⇒ yˆ = 250 − 700(0) + 100(1) + 5(70) + 15(0)(70) = 700 x3 = 80 ⇒ yˆ = 250 − 700(0) + 100(1) + 5(80) + 15(0)(80) = 750 x3 = 90 ⇒ yˆ = 800 x3 = 100 ⇒ yˆ = 850
For a sunny weekend, x1 = 1 and x2 = 1: x3 = 70 ⇒ yˆ = 250 − 700(1) + 100(1) + 5(70) + 15(1)(70) = 1050 x3 = 80 ⇒ yˆ = 250 − 700(1) + 100(1) + 5(80) + 15(1)(80) = 1250 x3 = 90 ⇒ yˆ = 1450 x3 = 100 ⇒ yˆ = 1650
For both sunny weekdays and sunny weekend days, as the predicted high temperature increases, so does the predicted day's attendance. However, the predicted day's attendance on sunny weekend days increases at a faster rate than on sunny weekdays. Also, the predicted day's attendance is higher on sunny weekend days than on sunny weekdays. b.
To determine if the interaction term is a useful addition to the model, we test: H0: β4 = 0 Ha: β4 ≠ 0
The test statistic is t = βˆ4/s(βˆ4) = 15/3 = 5
The rejection region requires α/2 = .05/2 = .025 in each tail of the t distribution with df = n − (k + 1) = 30 − (4 + 1) = 25. From Table VI, Appendix B, t.025 = 2.06. The rejection region is t < −2.06 or t > 2.06. Since the observed value of the test statistic falls in the rejection region (t = 5 > 2.06), H0 is rejected. There is sufficient evidence to indicate the interaction term is a useful addition to the model at α = .05. c.
For x1 = 0, x2 = 1, and x3 = 95, yˆ = 250 − 700(0) + 100(1) + 5(95) + 15(0)(95) = 825
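The predictions in parts a and c can be reproduced with a short Python sketch of the fitted interaction model. The function name and argument names are illustrative only; the coefficients are the ones used in the hand calculations above.

```python
def predicted_attendance(weekend, sunny, temp):
    """Fitted model from Exercise 11.140:
    y-hat = 250 - 700*x1 + 100*x2 + 5*x3 + 15*x1*x3,
    where x1 = 1 for a weekend day, x2 = 1 for a sunny day,
    and x3 = predicted high temperature.
    """
    return 250 - 700 * weekend + 100 * sunny + 5 * temp + 15 * weekend * temp


# Part a: sunny weekdays versus sunny weekend days at four temperatures.
for temp in (70, 80, 90, 100):
    weekday = predicted_attendance(0, 1, temp)
    weekend = predicted_attendance(1, 1, temp)
    print(temp, weekday, weekend)

# Part c: sunny weekday with a predicted high of 95.
part_c = predicted_attendance(0, 1, 95)
print(part_c)
```

Because of the interaction term, the slope with respect to temperature is 5 on weekdays and 5 + 15 = 20 on weekend days, which is why the weekend predictions rise faster.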
d.
The width of the interval in Exercise 11.139e is 1245 − 645 = 600, while the width is 850 − 800 = 50 for the model containing the interaction term. The smaller the width of the interval, the smaller the variance. This implies that the interaction term is quite useful in predicting daily attendance. It has reduced the unexplained error.
e.
Because an interaction term including x1 is in the model, the coefficient corresponding to x1 must be interpreted with caution. For all observed values of x3 (temperature), the interaction term value is greater than 700.

11.142 a.
From MINITAB, the output is:

Regression Analysis: y versus x1, x2, x1sq, x2sq, x1x2

The regression equation is
y = - 9.92 + 0.167 x1 + 0.138 x2 - 0.00111 x1sq - 0.000843 x2sq + 0.000241 x1x2

Predictor          Coef     SE Coef       T       P
Constant         -9.917       1.354   -7.32   0.000
x1              0.16681     0.02124    7.85   0.000
x2              0.13760     0.02673    5.15   0.000
x1sq         -0.0011082   0.0001173   -9.45   0.000
x2sq         -0.0008433   0.0001594   -5.29   0.000
x1x2          0.0002411   0.0001440    1.67   0.103

S = 0.1871    R-Sq = 93.7%    R-Sq(adj) = 92.7%

Analysis of Variance

Source            DF        SS       MS        F       P
Regression         5   17.5827   3.5165   100.41   0.000
Residual Error    34    1.1908   0.0350
Total             39   18.7735

Source   DF   Seq SS
x1        1   5.2549
x2        1   7.5311
x1sq      1   3.6434
x2sq      1   1.0552
x1x2      1   0.0982
The least squares prediction equation is: yˆ = −9.917 + .167x1 + .138x2 − .00111(x1)² − .000843(x2)² + .000241x1x2
b.
The standard deviation for the first-order model is s = .4023. The standard deviation for the second-order model is s = .1871. The relative precision for the first-order model is ± 2(.4023) = ± .8046. The relative precision for the second-order model is ± 2(.1871) = ± .3742.
c.
To determine if the model is useful, we test: H0: β1 = β2 = β3 = β4 = β5 = 0 Ha: At least one βi ≠ 0, i = 1, 2, ... , 5
The test statistic is F = MSR/MSE = 3.5165/.0350 = 100.41
The p-value is .0000. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting GPA at α = .05.
d.
To determine if the interaction term is important, we test: H0: β5 = 0 Ha: β5 ≠ 0
The test statistic is t = 1.67. The p-value is .103. Since the p-value is not less than α = .10, H0 is not rejected. There is insufficient evidence to indicate the interaction term is important for predicting GPA at α = .10. e.
From MINITAB, the plots are:
[Figure: Residuals Versus x1 (response is y); vertical axis: Residual, horizontal axis: x1]

[Figure: Residuals Versus x2 (response is y); vertical axis: Residual, horizontal axis: x2]
The plots of the residuals against x1 and against x2 for the second-order model indicate there is no mound or bowl shape in either graph. This implies that second-order is the highest order necessary. We have eliminated the mound shape from the plots of the residuals against x1 and the residuals against x2 for the first-order model. From the plots and the results of the tests in 11.145, it appears the second-order model is preferable for predicting GPA.
f.
To see if the second-order terms are useful, we test:
H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))] = [(5.9876 − 1.1908)/3] / .0350 = 45.68
Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − g = 5 − 2 = 3 and ν2 = n − [k + 1] = 40 − (5 + 1) = 34. From Table IX, Appendix B, F.05 ≈ 2.92. The rejection region is F > 2.92. Since the observed value of the test statistic falls in the rejection region (F = 45.68 > 2.92), H0 is rejected. There is sufficient evidence that at least one second-order term is useful at α = .05.
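The nested-model F statistics used in Exercises 11.136f, 11.138c, and 11.142f all follow the same formula, so they can be checked with one small helper. This is a minimal Python sketch; the function name `partial_f` is an assumption for illustration, and the inputs are the SSE values and counts quoted in those solutions.

```python
def partial_f(sse_reduced, sse_complete, k, g, n):
    """Nested-model F statistic.

    k = number of beta terms (excluding beta0) in the complete model,
    g = number of beta terms (excluding beta0) in the reduced model,
    n = sample size.
    """
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


f_11_136 = partial_f(717.40, 442.13, k=6, g=4, n=25)
f_11_138 = partial_f(2523.7, 128.586, k=3, g=1, n=20)
f_11_142 = partial_f(5.9876, 1.1908, k=5, g=2, n=40)

print(round(f_11_136, 2), round(f_11_138, 2), round(f_11_142, 2))
```

For Exercise 11.142f the unrounded result is about 45.65 rather than 45.68, because the hand calculation rounds the MSE to .0350 before dividing. The conclusion of the test is unaffected.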
11.144 a.
The model is E(y) = β0 + β1x1 A sketch of the response curve might be:
b.
The model is E(y) = β0 + β1x1 + β2x2 + β3x3
where x2 = 1 if brand 2, 0 otherwise
      x3 = 1 if brand 3, 0 otherwise
A sketch of the response curve might be:
c.
The model is E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 A sketch of the response curve might be:
The Condo Sales Case (To accompany Chapters 10–11)
Several models were fit to obtain the final model. I first fit a model with only the main effects for Floor, Distance, View, Endunit, and Furnish. Of these, only Furnish, adjusted for the other variables, was not significant. See the output below.

The regression equation is
Price = 184 - 3.81 Floor + 1.74 Distance + 40.3 View - 32.7 Endunit + 4.28 Furnish

Predictor       Coef    Stdev   t-ratio       p
Constant     183.570    5.221     35.16   0.000
Floor        -3.8076   0.7482     -5.09   0.000
Distance      1.7414   0.3750      4.64   0.000
View          40.325    3.456     11.67   0.000
Endunit      -32.716    9.581     -3.41   0.001
Furnish        4.279    3.602      1.19   0.236

s = 24.39    R-sq = 49.4%    R-sq(adj) = 48.2%

Analysis of Variance

SOURCE         DF       SS      MS       F       p
Regression      5   118091   23618   39.69   0.000
Error         203   120802     595
Total         208   238893

SOURCE      DF   SEQ SS
Floor        1    14149
Distance     1    21208
View         1    75065
Endunit      1     6829
Furnish      1      840
I then added Floor² and Distance² to the model with all main effects. For this model, all of the main effects, including Furnish, were significant along with both squared terms. The output follows.

The regression equation is
Price = 220 - 13.3 Floor - 7.01 Distance + 38.9 View - 22.0 Endunit + 7.31 Furnish + 1.05 FlSq + 0.572 DiSq

Predictor       Coef    Stdev   t-ratio       p
Constant     220.258    8.178     26.93   0.000
Floor        -13.296    3.253     -4.09   0.000
Distance      -7.007    1.614     -4.34   0.000
View          38.927    3.202     12.16   0.000
Endunit      -21.967    9.086     -2.42   0.017
Furnish        7.308    3.419      2.14   0.034
FlSq          1.0512   0.3492      3.01   0.003
DiSq          0.5719   0.1033      5.54   0.000

s = 22.49    R-sq = 57.4%    R-sq(adj) = 56.0%

Analysis of Variance

SOURCE         DF       SS      MS       F       p
Regression      7   137234   19605   38.76   0.000
Error         201   101659     506
Total         208   238893

SOURCE      DF   SEQ SS
Floor        1    14149
Distance     1    21208
View         1    75065
Endunit      1     6829
Furnish      1      840
FlSq         1     3640
DiSq         1    15503
I then did a stepwise regression, forcing all the main effects and the two squared terms into the model, to see if any two-way interaction terms could be added to the model. From this, only the interaction between Floor and View was significant. The output from the final model is:

The regression equation is
Price = 206 - 9.93 Floor - 7.02 Distance + 66.0 View - 22.5 Endunit + 6.48 Furnish + 1.02 FlSq + 0.577 DiSq - 6.04 FV

Predictor       Coef     Stdev   t-ratio       p
Constant     206.123     8.379     24.60   0.000
Floor         -9.927     3.186     -3.12   0.002
Distance      -7.020     1.539     -4.56   0.000
View          65.952     6.619      9.96   0.000
Endunit      -22.451     8.662     -2.59   0.010
Furnish        6.485     3.265      1.99   0.048
FlSq          1.0207    0.3330      3.07   0.002
DiSq         0.57720   0.09848      5.86   0.000
FV            -6.037     1.312     -4.60   0.000

s = 21.44    R-sq = 61.5%    R-sq(adj) = 60.0%

Analysis of Variance

SOURCE         DF       SS      MS       F       p
Regression      8   146965   18371   39.97   0.000
Error         200    91928     460
Total         208   238893

SOURCE      DF   SEQ SS
Floor        1    14149
Distance     1    21208
View         1    75065
Endunit      1     6829
Furnish      1      840
FlSq         1     3640
DiSq         1    15503
FV           1     9731
This final model is fairly good. The R-squared value is .615. Thus, 61.5% of the variation in prices can be explained by the model that includes the following variables: Floor and Floor-squared, Distance and Distance-squared, View, Endunit, Furnish, and the interaction of Floor and View. The residual plots are as follows:
From the residual plots, it appears that the data are normally distributed, but there may be a couple of outliers. This is evident from the two points whose standardized residuals are less than −3. Also, it appears that there is constant variance. Thus, the model looks to be fairly good. It would be better if the R-squared value were higher, however.

The final model is:
Price = 206 − 9.93 Floor − 7.02 Distance + 66.0 View − 22.5 Endunit + 6.48 Furnish + 1.02 FlSq + 0.577 DiSq − 6.04 FV

I have included graphs to indicate how each variable affects the price. These graphs reflect the relationship between Price and a selected variable, holding the other variables constant.

The first graph is a graph of Price by Floor for each level of View, since Floor and View interact. Both lines are curved to reflect the quadratic relationship between Floor and Price. For the Non-ocean view, the price is fairly constant. There is a slight decrease in price as the Floor increases until Floor 5, and then a slight increase as the floor increases. For the Ocean view, the price decreases at a decreasing rate as the Floor increases.

The second graph is a graph of the Price by Distance. Again, the quadratic relationship is reflected by the curved line. As the distance increases, the price decreases until a distance of 6 is reached. Then the price begins to increase again as the distance increases.

The third graph is a graph of the Price by View, for each Floor. Again, we must look at the relationship between Price and View at each Floor because of the significant interaction. For all Floors, the price of the Ocean View is higher than the price of the Non-ocean View. However, the difference in the two views depends on the floor.

The fourth graph is a graph of the Price by Endunit. From the graph, the price of the endunits is less than that of the others.

The last graph is a graph of the Price by Furnish. From the graph, the price of the furnished units is higher than the price of the non-furnished units.
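The turning points quoted for the Floor and Distance curves (about Floor 5 for the Non-ocean view and a distance of about 6) can be checked from the final-model coefficients. The sketch below is illustrative Python, not part of the original analysis; it assumes View is coded 1 for Ocean view and 0 otherwise, and uses the fact that a quadratic b·x + c·x² with c > 0 reaches its minimum at x = −b/(2c).

```python
def turning_point(linear_coef, quadratic_coef):
    """x-value at which linear_coef*x + quadratic_coef*x**2 stops decreasing."""
    return -linear_coef / (2 * quadratic_coef)


# Final-model coefficients from the printout.
FLOOR, FLSQ, FV = -9.927, 1.0207, -6.037
DISTANCE, DISQ = -7.020, 0.57720

# Non-ocean view (View = 0): the Floor slope is just the Floor coefficient.
floor_non_ocean = turning_point(FLOOR, FLSQ)

# Ocean view (View = 1, by assumption): the interaction adds FV to the Floor slope.
floor_ocean = turning_point(FLOOR + FV, FLSQ)

distance_min = turning_point(DISTANCE, DISQ)

print(round(floor_non_ocean, 2), round(floor_ocean, 2), round(distance_min, 2))
```

Under that coding, the Ocean-view curve bottoms out later than the Non-ocean curve, which is consistent with the description of a price that decreases at a decreasing rate over most of the observed floors.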
Methods for Quality Improvement
Chapter 12
12.2
If rational subgrouping is not used, it is possible that a change in the process mean will go undetected. In rational subgrouping, samples are selected so that a change in the process mean occurs between samples, not within samples.
12.4
An x̄-chart is used to monitor the process mean.
12.6
The variation of a process must be stable. If it were not, the control limits of the x̄-chart would be meaningless since they are a function of the process variation.
12.8
a.
According to rule 4 (14 points in a row alternating up and down), the process is out of control. Therefore, it is affected by both common and special causes of variation. An in-control process is affected by only common causes. Rule 4 says that if we observe 14 points in a row alternating up and down, that is an indication of the presence of special causes of variation in addition to common causes. Points 2 through 16 alternate up and down.
b.
The extended x̄-chart is:
[Figure: x̄-chart showing UCL, LCL, the centerline, and Zones A, B, and C on each side of the centerline; horizontal axis: Sample Number (1 to 30)]
The additional points suggest that the process is out of control. Rule 1 (One point beyond Zone A), Rule 5 (2 out of 3 points in a row in Zone A or beyond), and Rule 6 (4 out of 5 points in a row in Zone B or beyond) indicate the process is out of control. 12.10
a.
x̿ = (x̄1 + x̄2 + ⋯ + x̄25)/k = 2008.8/25 = 80.352
R̄ = (R1 + R2 + ⋯ + R25)/k = 198.7/25 = 7.948
b.
Centerline = x̿ = 80.352
From Table XII, Appendix B, with n = 5, A2 = .577.
Upper control limit = x̿ + A2R̄ = 80.352 + .577(7.948) = 84.938
Lower control limit = x̿ − A2R̄ = 80.352 − .577(7.948) = 75.766
c – d.
Upper A–B boundary = x̿ + (2/3)(A2R̄) = 80.352 + (2/3)(.577)(7.948) = 83.409
Lower A–B boundary = x̿ − (2/3)(A2R̄) = 80.352 − (2/3)(.577)(7.948) = 77.295
Upper B–C boundary = x̿ + (1/3)(A2R̄) = 80.352 + (1/3)(.577)(7.948) = 81.881
Lower B–C boundary = x̿ − (1/3)(A2R̄) = 80.352 − (1/3)(.577)(7.948) = 78.823
The x̄-chart is:
Rule 1: One point beyond Zone A: Point 10 is beyond Zone A. This indicates the process is out of control.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.
Rule 1 indicates the process is out of control.
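The control limits and zone boundaries for an x̄-chart all come from the same half-width A2·R̄, so the arithmetic in Exercise 12.10 can be verified with a short Python sketch. The function and key names are illustrative assumptions; the inputs are the values used in the solution above.

```python
def xbar_chart_lines(x_double_bar, r_bar, a2):
    """Centerline, control limits, and zone boundaries for an x-bar chart.

    The 3-sigma half-width is a2 * r_bar; the A-B and B-C boundaries sit
    at two-thirds and one-third of that distance from the centerline.
    """
    half_width = a2 * r_bar
    return {
        "UCL": x_double_bar + half_width,
        "upper_AB": x_double_bar + 2 * half_width / 3,
        "upper_BC": x_double_bar + half_width / 3,
        "centerline": x_double_bar,
        "lower_BC": x_double_bar - half_width / 3,
        "lower_AB": x_double_bar - 2 * half_width / 3,
        "LCL": x_double_bar - half_width,
    }


# Exercise 12.10: grand mean 80.352, mean range 7.948, A2 = .577 for n = 5.
lines = xbar_chart_lines(80.352, 7.948, 0.577)
for name, value in lines.items():
    print(f"{name:>10}: {value:.3f}")
```

The same function applies to Exercises 12.12, 12.14, and 12.16 by changing the three inputs.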
12.12
a.
From Table XII, Appendix B, with n = 4, A2 = .729.
x̿ = .6733 and R̄ = .335
Upper control limit = x̿ + A2R̄ = .6733 + .729(.335) = .9175
Lower control limit = x̿ − A2R̄ = .6733 − .729(.335) = .4291
b.
Upper A–B boundary = x̿ + (2/3)(A2R̄) = .6733 + (2/3)(.729)(.335) = .8361
Lower A–B boundary = x̿ − (2/3)(A2R̄) = .6733 − (2/3)(.729)(.335) = .5105
Upper B–C boundary = x̿ + (1/3)(A2R̄) = .6733 + (1/3)(.729)(.335) = .7547
Lower B–C boundary = x̿ − (1/3)(A2R̄) = .6733 − (1/3)(.729)(.335) = .5919
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: There are nine points (Points 9 through 17) in a row in Zone C (on one side of the centerline) or beyond. This indicates that the process is out of control.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.
Rule 2 indicates the process is out of control.
c.
These control limits should not be used to monitor future output because the process is out of control. One or more special causes of variation are affecting the process mean. These should be identified and eliminated in order to bring the process into control.
12.14
a.
The process of interest is the production of bolts used in military aircraft.
b.
Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Length by Hour

Variable   Hour   N     Mean   Median   TrMean   StDev
Length        1   4   36.973   36.965   36.973   0.098
              2   4   36.957   36.970   36.957   0.079
              3   4   37.067   37.060   37.067   0.081
              4   4   37.065   37.040   37.065   0.096
              5   4   36.948   36.940   36.948   0.121
              6   4   36.998   36.985   36.998   0.101
              7   4   37.000   36.995   37.000   0.054
              8   4   37.005   36.995   37.005   0.087
              9   4   37.027   37.020   37.027   0.111
             10   4   36.970   36.950   36.970   0.106
             11   4   37.020   37.050   37.020   0.098
             12   4   36.983   36.985   36.983   0.066
             13   4   37.070   37.075   37.070   0.132
             14   4   37.073   37.075   37.073   0.025
             15   4   36.993   37.020   36.993   0.069
             16   4   36.955   36.965   36.955   0.040
             17   4   37.038   37.035   37.038   0.097
             18   4   37.010   37.010   37.010   0.085
             19   4   36.955   36.965   36.955   0.058
             20   4   37.035   37.045   37.035   0.109
             21   4   36.995   36.985   36.995   0.044
             22   4   37.023   37.020   37.023   0.096
             23   4   37.003   37.010   37.003   0.039
             24   4   36.995   37.005   36.995   0.071
             25   4   37.010   37.020   37.010   0.083

Variable   Hour   SE Mean   Minimum   Maximum       Q1       Q3
Length        1     0.049    36.880    37.080   36.885   37.067
              2     0.040    36.850    37.040   36.878   37.025
              3     0.040    36.990    37.160   36.995   37.147
              4     0.048    36.980    37.200   36.990   37.165
              5     0.061    36.810    37.100   36.835   37.068
              6     0.051    36.890    37.130   36.908   37.100
              7     0.027    36.940    37.070   36.953   37.053
              8     0.044    36.910    37.120   36.927   37.093
              9     0.055    36.900    37.170   36.927   37.135
             10     0.053    36.870    37.110   36.880   37.080
             11     0.049    36.880    37.100   36.918   37.093
             12     0.033    36.900    37.060   36.920   37.043
             13     0.066    36.910    37.220   36.940   37.195
             14     0.013    37.040    37.100   37.048   37.095
             15     0.035    36.890    37.040   36.920   37.038
             16     0.020    36.900    36.990   36.913   36.988
             17     0.049    36.940    37.140   36.948   37.130
             18     0.043    36.910    37.110   36.927   37.093
             19     0.029    36.880    37.010   36.895   37.005
             20     0.055    36.900    37.150   36.925   37.135
             21     0.022    36.960    37.050   36.960   37.040
             22     0.048    36.930    37.120   36.935   37.113
             23     0.019    36.950    37.040   36.963   37.035
             24     0.036    36.900    37.070   36.923   37.058
             25     0.041    36.900    37.100   36.927   37.083
For each sample, we compute R = range = largest measurement - smallest measurement.
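The range calculation can be sketched in Python from the Minimum and Maximum columns of the MINITAB output. The list names are illustrative; the values are copied from the printout, and each difference is rounded to two decimals to match the precision of the measurements.

```python
# Minimum and maximum bolt lengths for the 25 hourly samples (from the printout).
minimum = [36.88, 36.85, 36.99, 36.98, 36.81, 36.89, 36.94, 36.91, 36.90,
           36.87, 36.88, 36.90, 36.91, 37.04, 36.89, 36.90, 36.94, 36.91,
           36.88, 36.90, 36.96, 36.93, 36.95, 36.90, 36.90]
maximum = [37.08, 37.04, 37.16, 37.20, 37.10, 37.13, 37.07, 37.12, 37.17,
           37.11, 37.10, 37.06, 37.22, 37.10, 37.04, 36.99, 37.14, 37.11,
           37.01, 37.15, 37.05, 37.12, 37.04, 37.07, 37.10]

# R = largest measurement - smallest measurement, one value per sample.
ranges = [round(hi - lo, 2) for lo, hi in zip(minimum, maximum)]

# Mean range across the k = 25 samples.
r_bar = sum(ranges) / len(ranges)

print(ranges)
print(round(r_bar, 4))
```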
The results are listed in the table:

Sample No.     R      Sample No.     R
     1        .20         14        .06
     2        .19         15        .15
     3        .17         16        .09
     4        .22         17        .20
     5        .29         18        .20
     6        .24         19        .13
     7        .13         20        .25
     8        .21         21        .09
     9        .27         22        .19
    10        .24         23        .09
    11        .22         24        .17
    12        .16         25        .20
    13        .31

x̿ = (x̄1 + x̄2 + ⋯ + x̄25)/k = 925.1650/25 = 37.0066
R̄ = (R1 + R2 + ⋯ + R25)/k = 4.67/25 = .1868

Centerline = x̿ = 37.007
From Table XII, Appendix B, with n = 4, A2 = .729.

Upper control limit = x̿ + A2R̄ = 37.007 + .729(.1868) = 37.143
Lower control limit = x̿ − A2R̄ = 37.007 − .729(.1868) = 36.871
Upper A–B boundary = x̿ + (2/3)(A2R̄) = 37.007 + (2/3)(.729)(.1868) = 37.098
Lower A–B boundary = x̿ − (2/3)(A2R̄) = 37.007 − (2/3)(.729)(.1868) = 36.916
Upper B–C boundary = x̿ + (1/3)(A2R̄) = 37.007 + (1/3)(.729)(.1868) = 37.052
Lower B–C boundary = x̿ − (1/3)(A2R̄) = 37.007 − (1/3)(.729)(.1868) = 36.962

The x̄-chart is:
c.
To determine if the process is in or out of control, we check the six rules:
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.
The process appears to be in control. No special causes of variation appear to be present.
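The first three pattern-analysis rules can be checked programmatically against the 25 sample means from part b. The following Python sketch is only an illustration: the helper names are hypothetical, Rules 4 through 6 are not implemented, and the centerline and limits are the rounded values computed above.

```python
means = [36.973, 36.957, 37.067, 37.065, 36.948, 36.998, 37.000, 37.005,
         37.027, 36.970, 37.020, 36.983, 37.070, 37.073, 36.993, 36.955,
         37.038, 37.010, 36.955, 37.035, 36.995, 37.023, 37.003, 36.995,
         37.010]
CENTER, LCL, UCL = 37.0066, 36.871, 37.143


def points_beyond_limits(points, lcl, ucl):
    """Rule 1: sample numbers (1-based) falling outside the control limits."""
    return [i + 1 for i, p in enumerate(points) if p < lcl or p > ucl]


def longest_run_one_side(points, center):
    """Rule 2: length of the longest run on one side of the centerline."""
    longest = current = 0
    previous_side = 0
    for p in points:
        side = (p > center) - (p < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            current += 1
        elif side != 0:
            current = 1
        else:
            current = 0
        previous_side = side
        longest = max(longest, current)
    return longest


def longest_monotone_run(points):
    """Rule 3: number of points in the longest steadily rising or falling stretch."""
    longest = 1 if points else 0
    current = 1
    previous_direction = 0
    for a, b in zip(points, points[1:]):
        direction = (b > a) - (b < a)
        if direction != 0 and direction == previous_direction:
            current += 1
        elif direction != 0:
            current = 2
        else:
            current = 1
        previous_direction = direction
        longest = max(longest, current)
    return longest


rule1_hits = points_beyond_limits(means, LCL, UCL)
rule2_run = longest_run_one_side(means, CENTER)
rule3_run = longest_monotone_run(means)
print(rule1_hits, rule2_run, rule3_run)
```

A Rule 2 signal would require a run of at least nine and a Rule 3 signal a run of at least six; neither occurs for these means, in agreement with the conclusion above.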
d.
An example of a special cause of variation would be if the machine used to produce the bolts slipped out of alignment and started producing bolts of a different length. An example of common cause variation would be the grade of the raw material used to make the bolts.
e.
Since the process appears to be in control, it is appropriate to use these limits to monitor future process output.

12.16
a.
x̿ = (x̄1 + x̄2 + ⋯ + x̄16)/k = 868.18/16 = 54.26125
R̄ = (R1 + R2 + ⋯ + R16)/k = 44.1/16 = 2.75625

Centerline = x̿ = 54.26125
From Table XII, Appendix B, with n = 5, A2 = .577

Upper control limit = x̿ + A2R̄ = 54.26125 + .577(2.75625) = 55.8516
Lower control limit = x̿ − A2R̄ = 54.26125 − .577(2.75625) = 52.6709
Upper A–B boundary = x̿ + (2/3)(A2R̄) = 54.26125 + (2/3)(.577)(2.75625) = 55.3215
Lower A–B boundary = x̿ − (2/3)(A2R̄) = 54.26125 − (2/3)(.577)(2.75625) = 53.2010
Upper B–C boundary = x̿ + (1/3)(A2R̄) = 54.26125 + (1/3)(.577)(2.75625) = 54.7914
Lower B–C boundary = x̿ − (1/3)(A2R̄) = 54.26125 − (1/3)(.577)(2.75625) = 53.7311

The x̄-chart is:
b.
To determine if the process is in or out of control, we check the six rules:
Rule 1: One point beyond Zone A: One point is beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are two sets of three consecutive points (data points 3, 4, and 5 and data points 4, 5, and 6) that have two points in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.
Special causes of variation appear to be present. The process appears to be out of control. Rules 1 and 5 indicate the process is out of control.
c.
Since the process is out of control, these control limits should not be used to monitor future process outputs.
12.18
The R-chart is designed to monitor the variation of the process.
12.20
Using Table XII, Appendix B:
a. With n = 4, D3 = 0.000 and D4 = 2.282
b. With n = 12, D3 = 0.283 and D4 = 1.717
c. With n = 24, D3 = 0.451 and D4 = 1.548

12.22
a.
From Exercise 12.11, the R values are:

Sample No.     R      Sample No.     R
     1        1.8         11        3.2
     2        2.8         12        0.9
     3        3.8         13        2.6
     4        2.5         14        4.0
     5        3.7         15        2.2
     6        5.0         16        4.3
     7        5.5         17        3.6
     8        3.5         18        2.5
     9        2.5         19        2.2
    10        4.1         20        5.5

R̄ = (R1 + R2 + ⋯ + R20)/k = 66.2/20 = 3.31

Centerline = R̄ = 3.31
From Table XII, Appendix B, with n = 4, D4 = 2.282, and D3 = 0.

Upper control limit = R̄D4 = 3.31(2.282) = 7.553
Since D3 = 0, the lower control limit is negative and is not included on the chart.
b.
From Table XII, Appendix B, with n = 4, d2 = 2.059, and d3 = .880.

Upper A–B boundary = R̄ + 2d3(R̄/d2) = 3.31 + 2(.880)(3.31/2.059) = 6.139
Lower A–B boundary = R̄ − 2d3(R̄/d2) = 3.31 − 2(.880)(3.31/2.059) = 0.481
Upper B–C boundary = R̄ + d3(R̄/d2) = 3.31 + (.880)(3.31/2.059) = 4.725
Lower B–C boundary = R̄ − d3(R̄/d2) = 3.31 − (.880)(3.31/2.059) = 1.895
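The R-chart lines in Exercise 12.22 can be reproduced with a small Python sketch. The function and parameter names are illustrative assumptions (`big_d3` and `big_d4` stand for the table constants D3 and D4, to keep them distinct from d3); the constants are the Table XII values quoted above for n = 4.

```python
def r_chart_lines(r_bar, big_d3, big_d4, d2, d3):
    """Control limits and zone boundaries for an R-chart.

    The estimated standard deviation of R is d3 * (r_bar / d2); the zone
    boundaries sit one and two of these units from the centerline.
    """
    sigma_r = d3 * r_bar / d2
    return {
        "UCL": r_bar * big_d4,
        "upper_AB": r_bar + 2 * sigma_r,
        "upper_BC": r_bar + sigma_r,
        "centerline": r_bar,
        "lower_BC": r_bar - sigma_r,
        "lower_AB": r_bar - 2 * sigma_r,
        "LCL": r_bar * big_d3,
    }


# Exercise 12.22: mean range 3.31, with D3 = 0, D4 = 2.282, d2 = 2.059, d3 = .880.
r_lines = r_chart_lines(3.31, big_d3=0.0, big_d4=2.282, d2=2.059, d3=0.880)
for name, value in r_lines.items():
    print(f"{name:>10}: {value:.3f}")
```

Changing the inputs to R̄ = .335 (n = 4), R̄ = 8.8 (n = 5), or R̄ = 2.756 (n = 5) with the matching constants gives the lines for Exercises 12.24, 12.26, and 12.28.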
c.
The R-chart is:
To determine if the process is in or out of control, we check the four rules:
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
The process appears to be in control. 12.24
a.
From Table XII, Appendix B, with n = 4, D3 = 0, and D4 = 2.282.
R̄ = .335
Upper control limit = R̄D4 = .335(2.282) = .7645
Since D3 = 0, the lower control limit is negative and is not included on the chart.
b.
To determine if special causes of variation are present, we need to complete the R-chart. From Table XII, Appendix B, with n = 4, d2 = 2.059, and d3 = .880.

Upper A–B boundary = R̄ + 2d3(R̄/d2) = .335 + 2(.880)(.335/2.059) = .6213
Lower A–B boundary = R̄ − 2d3(R̄/d2) = .335 − 2(.880)(.335/2.059) = .0486
Upper B–C boundary = R̄ + d3(R̄/d2) = .335 + (.880)(.335/2.059) = .4782
Lower B–C boundary = R̄ − d3(R̄/d2) = .335 − (.880)(.335/2.059) = .1918

The R-chart is:
[Figure: R-chart with lines labeled UCL = .7646, .6213, .4782, R̄ = 0.335, .1918, and .0486]
To determine if the process is in control, we check the four rules.
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: There are not nine points in a row in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
It appears that the process is in control.
c.
Yes. This process appears to be in control. Therefore, these control limits could be used to monitor future output.
d.
Of the 30 R values plotted, there are only 6 different values. Most of the R values take on one of three values. This indicates that the data must be discrete (take on a countable number of values), or that the path widths are multiples of each other.
12.26
a.
R̄ = (R1 + R2 + ⋯ + R20)/k = (4 + 6 + ⋯ + 15)/20 = 176/20 = 8.8

Centerline = R̄ = 8.8

From Table XII, Appendix B, with n = 5, D4 = 2.114 and D3 = 0.
Upper control limit = R̄D4 = 8.8(2.114) = 18.603

Since D3 = 0, the lower control limit is negative and is not included on the chart.

From Table XII, Appendix B, with n = 5, d2 = 2.326 and d3 = 0.864.
Upper A–B boundary = R̄ + 2d3(R̄/d2) = 8.8 + 2(.864)(8.8/2.326) = 15.338
Lower A–B boundary = R̄ − 2d3(R̄/d2) = 8.8 − 2(.864)(8.8/2.326) = 2.262
Upper B–C boundary = R̄ + d3(R̄/d2) = 8.8 + (.864)(8.8/2.326) = 12.069
Lower B–C boundary = R̄ − d3(R̄/d2) = 8.8 − (.864)(8.8/2.326) = 5.531
The R-chart is:
b.
To determine if the process is in or out of control, we check the four rules:
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
The process appears to be in control since none of the out-of-control signals are observed. No special causes of variation appear to be present.
c.
Since the process appears to be in control, the control limits of the R-chart could be used to monitor future replacement cycle times.
d.
From part b, we decided that the process was in control. However, there does appear to be a pattern emerging in the R-chart. As the sample number increases, the value of R tends to increase. If this process were monitored for a longer period of time, the R-chart might indicate that the process was out of control.
12.28
a.
$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{16}}{k} = \frac{.4 + 1.4 + \cdots + 2.6}{16} = \frac{44.1}{16} = 2.756$
Centerline = $\bar{R}$ = 2.756
From Table XII, Appendix B, with n = 5, D4 = 2.114 and D3 = 0.
Upper control limit = $\bar{R} D_4$ = 2.756(2.114) = 5.826
Since D3 = 0, the lower control limit is negative and is not included on the chart.
From Table XII, Appendix B, with n = 5, d2 = 2.326 and d3 = 0.864.
Upper A–B boundary = $\bar{R} + 2 d_3 \frac{\bar{R}}{d_2} = 2.756 + 2(.864)\frac{2.756}{2.326} = 4.803$
Lower A–B boundary = $\bar{R} - 2 d_3 \frac{\bar{R}}{d_2} = 2.756 - 2(.864)\frac{2.756}{2.326} = .709$
Upper B–C boundary = $\bar{R} + d_3 \frac{\bar{R}}{d_2} = 2.756 + (.864)\frac{2.756}{2.326} = 3.780$
Lower B–C boundary = $\bar{R} - d_3 \frac{\bar{R}}{d_2} = 2.756 - (.864)\frac{2.756}{2.326} = 1.732$
The R-chart is:
b.
The R-chart is designed to monitor the process variation.
c.
To determine if the process is in or out of control, we check the four rules:
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
The process appears to be in control. None of the out-of-control signals are present. There is no indication that special causes of variation are present.
12.30
The p-chart is designed to monitor the proportion of defective units produced by a process.
12.32
a.
To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 200:
$\hat{p} = \frac{\text{No. of defectives}}{\text{No. in sample}}$
The sample proportions are listed in the table:
Sample No.   $\hat{p}$     Sample No.   $\hat{p}$
 1           .080           14          .060
 2           .070           15          .070
 3           .045           16          .055
 4           .055           17          .040
 5           .075           18          .035
 6           .040           19          .060
 7           .060           20          .075
 8           .080           21          .045
 9           .085           22          .080
10           .065           23          .065
11           .075           24          .055
12           .050           25          .050
13           .045
b.
To get the total number of defectives, sum the number of defectives for all 25 samples. The sum is 303. To get the total number of units sampled, multiply the sample size by the number of samples: 200(25) = 5000.
$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{303}{5000} = .0606$
Centerline = $\bar{p}$ = .0606
Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0606 + 3\sqrt{\frac{.0606(.9394)}{200}} = .1112$
Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0606 - 3\sqrt{\frac{.0606(.9394)}{200}} = .0100$
c.
Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0606 + 2\sqrt{\frac{.0606(.9394)}{200}} = .0943$
Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0606 - 2\sqrt{\frac{.0606(.9394)}{200}} = .0269$
Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0606 + \sqrt{\frac{.0606(.9394)}{200}} = .0775$
Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0606 - \sqrt{\frac{.0606(.9394)}{200}} = .0437$
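As a cross-check on the p-chart arithmetic, the following Python sketch computes the centerline, control limits, and zone boundaries from the totals given in the solution. The function name `p_chart_limits` and its argument names are illustrative assumptions, not notation from the text.

```python
import math


def p_chart_limits(total_defectives, sample_size, num_samples):
    """Return p-bar and the 1-, 2-, and 3-sigma boundaries of a p-chart.

    The name and return layout are hypothetical; the formulas follow the
    solution: p-bar +/- m * sqrt(p-bar * (1 - p-bar) / n) for m = 1, 2, 3.
    """
    p_bar = total_defectives / (sample_size * num_samples)
    sigma_p = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    return {
        "centerline": p_bar,
        "ucl": p_bar + 3 * sigma_p,
        "lcl": p_bar - 3 * sigma_p,
        "upper_ab": p_bar + 2 * sigma_p,
        "lower_ab": p_bar - 2 * sigma_p,
        "upper_bc": p_bar + sigma_p,
        "lower_bc": p_bar - sigma_p,
    }


# Exercise 12.32: 303 defectives in 25 samples of 200 units each.
limits = p_chart_limits(303, 200, 25)
for name, value in limits.items():
    print(f"{name}: {value:.4f}")
```

A negative `lcl`, as occurs in some later exercises, simply means the lower limit is not drawn on the chart.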
d.
The p-chart is:
e.
To determine if the process is in or out of control, we check the four rules:
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
The process appears to be in control. There do not appear to be any special causes of variation.
12.34
a.
The sample size is determined by the following:
$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .01)}{.01} = 891$
The minimum sample size is 892.
b.
The sample size is determined by the following:
$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .05)}{.05} = 171$
The minimum sample size is 172.
c.
The sample size is determined by the following:
$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .10)}{.10} = 81$
The minimum sample size is 82.
d.
The sample size is determined by the following:
$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .20)}{.20} = 36$
The minimum sample size is 37.
12.36
a.
The sample size is determined by the following:
$n > \frac{9(1 - p_0)}{p_0} = \frac{9(1 - .07)}{.07} = 119.6 \approx 120$
The minimum sample size is 120.
b.
To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 120:
$\hat{p} = \frac{\text{No. defectives}}{\text{No. in sample}}$
The sample proportions are listed in the table:
Sample No.   $\hat{p}$     Sample No.   $\hat{p}$
 1           .092           11          .083
 2           .042           12          .100
 3           .033           13          .067
 4           .067           14          .050
 5           .083           15          .083
 6           .108           16          .042
 7           .075           17          .083
 8           .067           18          .083
 9           .083           19          .025
10           .092           20          .067
To get the total number of defectives, sum the number of defectives for all 20 samples. The sum is 171. To get the total number of units sampled, multiply the sample size by the number of samples: 120(20) = 2400.
$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{171}{2400} = .071$
Centerline = $\bar{p}$ = .071
Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .071 + 3\sqrt{\frac{.071(.929)}{120}} = .141$
Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .071 - 3\sqrt{\frac{.071(.929)}{120}} = .001$
Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .071 + 2\sqrt{\frac{.071(.929)}{120}} = .118$
Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .071 - 2\sqrt{\frac{.071(.929)}{120}} = .024$
Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .071 + \sqrt{\frac{.071(.929)}{120}} = .094$
Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .071 - \sqrt{\frac{.071(.929)}{120}} = .048$
The p-chart is:
c.
To determine if the process is in or out of control, we check the four rules:
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
The process appears to be in control.
d.
Since the process is in control, it is appropriate to use the control limits to monitor future process output.
e.
No. The number of defectives recorded was per day, not per hour. Therefore, the p-chart is not capable of signaling hour-to-hour changes in p.
12.38
a.
To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 200:
$\hat{p} = \frac{\text{No. defectives}}{\text{No. in sample}}$
The sample proportions are listed in the table:
Sample No.   $\hat{p}$     Sample No.   $\hat{p}$
 1           .065           16          .015
 2           .025           17          .005
 3           .010           18          .010
 4           .015           19          .015
 5           .010           20          .005
 6           .015           21          .045
 7           .005           22          .025
 8           .010           23          .010
 9           .005           24          .005
10           .005           25          .015
11           .055           26          .010
12           .030           27          .020
13           .010           28          .010
14           .015           29          .005
15           .005           30          .005
To get the total number of defectives, sum the number of defectives for all 30 samples. The sum is 96. To get the total number of units sampled, multiply the sample size by the number of samples: 200(30) = 6000.
$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{96}{6000} = .016$
The centerline is $\bar{p}$ = .016
Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .016 + 3\sqrt{\frac{.016(1 - .016)}{200}} = .0426$
Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .016 - 3\sqrt{\frac{.016(1 - .016)}{200}} = -.0106$
Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .016 + 2\sqrt{\frac{.016(1 - .016)}{200}} = .0337$
Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .016 - 2\sqrt{\frac{.016(1 - .016)}{200}} = -.0017$
Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .016 + \sqrt{\frac{.016(1 - .016)}{200}} = .0249$
Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .016 - \sqrt{\frac{.016(1 - .016)}{200}} = .0071$
The p-chart is:
b.
To determine if the process is in or out of control, we check the four rules:
Rule 1: One point beyond Zone A: There are 3 points beyond Zone A: Points 1, 11, and 21.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
The process does not appear to be in control. Rule 1 indicates that the process is out of control.
12.40
Specification spread is the difference between the upper specification limit and the lower specification limit. The specification spread is determined by customers, management, and product designers. Process spread is the spread of the actual output and is a function of the standard deviation of the data.
12.42
There are two reasons why CP should not be used in isolation. First, CP is a statistic and is subject to sampling error. The sample standard deviation is used to estimate the population standard deviation which is used to calculate the process spread. Thus, the estimate of the process spread can vary from sample to sample. Second, CP does not reflect the shape of the output distribution. Distributions with different shapes can have the same CP value.
12.44
The specification spread is the difference between the upper specification limit and the lower specification limit.
a.
Specification spread = USL − LSL = 19.65 − 12.45 = 7.20
b.
Specification spread = USL − LSL = .0010 − .0008 = .0002
c.
Specification spread = USL − LSL = 1.43 − 1.27 = 0.16
d.
Specification spread = USL − LSL = 490 − 486 = 4
12.46
$C_P = \frac{\text{Specification spread}}{\text{Process spread}} = \frac{USL - LSL}{6\sigma}$
a.
$C_P \approx \frac{USL - LSL}{6s} = \frac{1.0065 - 1.0035}{6(.0005)} = \frac{.003}{.003} = 1$
b.
$C_P \approx \frac{USL - LSL}{6s} = \frac{22 - 21}{6(.2)} = \frac{1}{1.2} = .8333$
c.
$C_P \approx \frac{USL - LSL}{6s} = \frac{875 - 870}{6(.75)} = \frac{5}{4.5} = 1.111$
12.48
a.
If the output distribution is normal with a mean of 1,000 and a standard deviation of 100, then the proportion of the output that is unacceptable is:
$P(x < 980) + P(x > 1{,}020) = P\left(z < \frac{980 - 1{,}000}{100}\right) + P\left(z > \frac{1{,}020 - 1{,}000}{100}\right)$
$= P(z < -.2) + P(z > .2) = (.5 - .0793) + (.5 - .0793) = .8414$ (using Table IV, Appendix B)
The percentage of unacceptable output is 84.14%.
b.
$C_P = \frac{USL - LSL}{6\sigma} = \frac{1{,}020 - 980}{6(100)} = \frac{40}{600} \approx .067$
Since the value of CP is less than 1, the process is not capable.
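The two calculations in this exercise can be sketched in Python using only the standard library. The helper names `capability_index` and `fraction_out_of_spec` are illustrative, and the normality of the output is the same assumption the exercise makes. Because the code uses the exact normal distribution rather than Table IV, the result may differ from the table-based figure in the fourth decimal place.

```python
from statistics import NormalDist


def capability_index(usl, lsl, sigma):
    """C_P = specification spread / process spread = (USL - LSL) / (6 sigma)."""
    return (usl - lsl) / (6 * sigma)


def fraction_out_of_spec(usl, lsl, mu, sigma):
    """Proportion of normally distributed output outside [LSL, USL]."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(lsl) + (1 - dist.cdf(usl))


# Exercise 12.48: mean 1,000, standard deviation 100, limits 980 and 1,020.
cp = capability_index(1020, 980, 100)
bad = fraction_out_of_spec(1020, 980, 1000, 100)
print(f"C_P = {cp:.3f}, proportion unacceptable = {bad:.4f}")
```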
12.50
a.
A capability diagram is: LSL = 35 is off the chart.
b.
Fifty-two of the observations are above the upper specification limit. Thus, the percentage is (52/100) × 100% = 52%.
c.
From the sample, $\bar{x}$ = 37.007 and s = .083.
$C_P \approx \frac{USL - LSL}{6s} = \frac{37 - 35}{6(.083)} = \frac{2}{.498} = 4.016$
d.
Since the CP value is greater than 1, the process is capable.
12.52
The quality of a good or service is indicated by the extent to which it satisfies the needs and preferences of its users. Its eight dimensions are: performance, features, reliability, conformance, durability, serviceability, aesthetics, and other perceptions that influence judgments of quality.
12.54
A process is a series of actions or operations that transform inputs to outputs. A process produces output over time. Organizational process: Manufacturing a product. Personal process: Balancing a checkbook.
12.56
The six major sources of process variation are: people, machines, materials, methods, measurements, and environment.
12.62
Common causes of variation are the methods, materials, equipment, personnel, and environment that make up a process and the inputs required by the process. That is, common causes are attributable to the design of the process. Special causes of variation are events or actions that are not part of the process design. Typically, they are transient, fleeting events that affect only local areas or operations within the process for a brief period of time. Occasionally, however, such events may have a persistent or recurrent effect on the process.
12.64
If a process is capable, then it is necessarily in control. If a process is in control, then the control chart should be used to monitor the process.
12.66
The probability of observing a value of $\bar{x}$ more than 3 standard deviations from its mean is:
$P(\bar{x} > \mu + 3\sigma_{\bar{x}}) + P(\bar{x} < \mu - 3\sigma_{\bar{x}}) = P(z > 3) + P(z < -3) = (.5000 - .4987) + (.5000 - .4987) = .0026$
To find the number of standard deviations from the mean at which the control limits should be set so that the probability of the chart falsely indicating the presence of a special cause of variation is .10, we must find the z score such that:
$P(z > z_0) + P(z < -z_0) = .1000$, or $P(z > z_0) = .0500$
Using Table IV, Appendix B, $z_0$ = 1.645. Thus, the control limits should be set 1.645 standard deviations above and below the mean.
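Both directions of this calculation can be checked with Python's standard library. The function names below are illustrative. Because the code uses the exact normal distribution rather than the four-decimal Table IV, the false-alarm probability for 3-sigma limits comes out near .0027 rather than the table-based .0026.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def false_alarm_probability(z0):
    """P(z > z0) + P(z < -z0): chance an in-control point falls outside the limits."""
    return 2 * STANDARD_NORMAL.cdf(-z0)


def limit_multiplier(alpha):
    """Number of standard deviations z0 giving a false-alarm probability alpha."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


print(f"3-sigma false-alarm probability: {false_alarm_probability(3):.4f}")
print(f"z0 for a .10 false-alarm probability: {limit_multiplier(0.10):.3f}")
```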
12.68
a.
The centerline = $\bar{x} = \frac{\sum x}{n} = \frac{150.58}{20} = 7.529$
The time series plot is:
b.
The variation pattern that best describes the pattern in this time series is the level shift. Points 1 through 10 all have fairly low values, while points 11 through 20 all have fairly high values.
12.70
a.
Yes. The minimum sample size necessary so the lower control limit is not negative is:
$n > \frac{9(1 - p_0)}{p_0}$
From the data, $p_0 \approx .06$. Thus, $n > \frac{9(1 - .06)}{.06} = 141$. Our sample size was 200.
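The sample-size rule n > 9(1 − p0)/p0 is easy to mechanize. The sketch below uses exact rational arithmetic so that a bound that is a whole number (such as 141) is not disturbed by floating-point rounding. The function name `min_sample_size` is illustrative, and it reads the rule as a strict inequality, which matches the answers of 892, 172, 82, and 37 in Exercise 12.34.

```python
import math
from fractions import Fraction


def min_sample_size(p0):
    """Smallest integer n with n > 9(1 - p0)/p0.

    p0 is passed as a string (for example "0.06") so that Fraction can
    represent it exactly. The function name is a hypothetical helper.
    """
    p0 = Fraction(p0)
    bound = 9 * (1 - p0) / p0
    return math.floor(bound) + 1  # smallest integer strictly above the bound


for p0 in ("0.01", "0.05", "0.06", "0.07", "0.10", "0.20"):
    print(p0, min_sample_size(p0))
```

For p0 = .06 the bound is 141, so any sample size of 142 or more, including the 200 used here, keeps the lower control limit from being negative.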
b.
To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 200:
$\hat{p} = \frac{\text{No. of defectives}}{\text{No. in sample}}$
The sample proportions are listed in the table:
Sample No.   $\hat{p}$     Sample No.   $\hat{p}$
 1           .02            12          .10
 2           .03            13          .10
 3           .055           14          .085
 4           .06            15          .065
 5           .025           16          .05
 6           .05            17          .055
 7           .04            18          .035
 8           .08            19          .03
 9           .085           20          .04
10           .10            21          .045
11           .14
To get the total number of defectives, sum the number of defectives for all 21 samples. The sum is 258. To get the total number of units sampled, multiply the sample size by the number of samples: 200(21) = 4200.
$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{258}{4200} = .0614$
Centerline = $\bar{p}$ = .0614
Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 + 3\sqrt{\frac{.0614(.9386)}{200}} = .1123$
Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 - 3\sqrt{\frac{.0614(.9386)}{200}} = .0105$
Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 + 2\sqrt{\frac{.0614(.9386)}{200}} = .0953$
Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 - 2\sqrt{\frac{.0614(.9386)}{200}} = .0275$
Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 + \sqrt{\frac{.0614(.9386)}{200}} = .0784$
Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 - \sqrt{\frac{.0614(.9386)}{200}} = .0444$
The p-chart is:
c.
To determine if the control limits should be used to monitor future process output, we need to check the four rules.
Rule 1: One point beyond Zone A: The 11th point is beyond Zone A. This indicates the process is out of control.
Rule 2: Nine points in a row in Zone C or beyond: There are not nine points in a row in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 1 indicates the process is out of control. These control limits should not be used to monitor future process output.
12.72
a.
In order for the $\bar{x}$-chart to be meaningful, we must assume the variation in the process is constant (i.e., stable).
For each sample, we compute $\bar{x} = \frac{\sum x}{n}$ and R = range = largest measurement − smallest measurement. The results are listed in the table:
Sample No.   $\bar{x}$    R        Sample No.   $\bar{x}$    R
 1           32.325       11.6     13           31.050       13.3
 2           30.825       12.4     14           34.400        9.6
 3           30.450        7.8     15           31.350        7.3
 4           34.525       10.2     16           28.150        8.6
 5           31.725        9.1     17           30.950        7.6
 6           33.850       10.4     18           32.225        5.6
 7           32.100       10.1     19           29.050       10.0
 8           28.250        6.8     20           31.400        8.7
 9           32.375        8.7     21           30.350        8.9
10           30.125        6.3     22           34.175       10.5
11           32.200        7.1     23           33.275       13.0
12           29.150        9.3     24           30.950        8.9
$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{24}}{k} = \frac{755.225}{24} = 31.4677$
$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{24}}{k} = \frac{221.8}{24} = 9.242$
Centerline = $\bar{\bar{x}}$ = 31.468
From Table XII, Appendix B, with n = 4, A2 = .729.
Upper control limit = $\bar{\bar{x}} + A_2\bar{R}$ = 31.468 + .729(9.242) = 38.205
Lower control limit = $\bar{\bar{x}} - A_2\bar{R}$ = 31.468 − .729(9.242) = 24.731
Upper A–B boundary = $\bar{\bar{x}} + \frac{2}{3}(A_2\bar{R}) = 31.468 + \frac{2}{3}(.729)(9.242) = 35.960$
Lower A–B boundary = $\bar{\bar{x}} - \frac{2}{3}(A_2\bar{R}) = 31.468 - \frac{2}{3}(.729)(9.242) = 26.976$
Upper B–C boundary = $\bar{\bar{x}} + \frac{1}{3}(A_2\bar{R}) = 31.468 + \frac{1}{3}(.729)(9.242) = 33.714$
Lower B–C boundary = $\bar{\bar{x}} - \frac{1}{3}(A_2\bar{R}) = 31.468 - \frac{1}{3}(.729)(9.242) = 29.222$
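The limits and zone boundaries for the mean chart can be reproduced with a few lines of Python. The function name `xbar_chart_limits` and the dictionary keys are illustrative; A2 = .729 is the n = 4 constant quoted from Table XII, and the rounded centerline and mean range are taken from the solution.

```python
def xbar_chart_limits(grand_mean, r_bar, a2):
    """Return control limits and zone boundaries for an x-bar chart.

    The 3-sigma half-width is A2 * R-bar; Zones A, B, and C each span one
    third of it. Names are hypothetical; formulas follow the solution.
    """
    half_width = a2 * r_bar
    return {
        "centerline": grand_mean,
        "ucl": grand_mean + half_width,
        "lcl": grand_mean - half_width,
        "upper_ab": grand_mean + 2 * half_width / 3,
        "lower_ab": grand_mean - 2 * half_width / 3,
        "upper_bc": grand_mean + half_width / 3,
        "lower_bc": grand_mean - half_width / 3,
    }


# Exercise 12.72: rounded grand mean 31.468, mean range 9.242, A2 = .729 (n = 4).
limits = xbar_chart_limits(31.468, 9.242, 0.729)
for name, value in limits.items():
    print(f"{name}: {value:.3f}")
```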
The $\bar{x}$-chart is:
b.
To determine if the process is in or out of control, we check the six rules.
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.
The process appears to be in control. There are no indications that special causes of variation are affecting the process.
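Checking the pattern-analysis rules by eye is error-prone, so a small Python sketch may help. It covers only Rules 1 through 4 (Rules 5 and 6 are omitted to keep it short), the function names are illustrative, and the data are the 24 sample means and the limits from part a.

```python
def rule1(points, lcl, ucl):
    """Rule 1: return the 1-based positions of points beyond Zone A."""
    return [i for i, x in enumerate(points, start=1) if x < lcl or x > ucl]


def rule2(points, centerline, run=9):
    """Rule 2: `run` points in a row on one side of the centerline."""
    count, side = 0, 0
    for x in points:
        s = (x > centerline) - (x < centerline)  # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            count += 1
        else:
            side, count = s, (1 if s != 0 else 0)
        if count >= run:
            return True
    return False


def rule3(points, run=6):
    """Rule 3: `run` points in a row steadily increasing or decreasing."""
    up = down = 1
    for prev, cur in zip(points, points[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        if up >= run or down >= run:
            return True
    return False


def rule4(points, run=14):
    """Rule 4: `run` points in a row alternating up and down."""
    length, prev_dir = 1, 0
    for prev, cur in zip(points, points[1:]):
        d = (cur > prev) - (cur < prev)
        if d == 0:
            length, prev_dir = 1, 0
        elif prev_dir == 0 or d == -prev_dir:
            length, prev_dir = length + 1, d
        else:
            length, prev_dir = 2, d  # the run restarts with the last two points
        if length >= run:
            return True
    return False


# Sample means from Exercise 12.72, with the limits computed in part a.
xbars = [32.325, 30.825, 30.450, 34.525, 31.725, 33.850, 32.100, 28.250,
         32.375, 30.125, 32.200, 29.150, 31.050, 34.400, 31.350, 28.150,
         30.950, 32.225, 29.050, 31.400, 30.350, 34.175, 33.275, 30.950]
print(rule1(xbars, 24.731, 38.205), rule2(xbars, 31.468),
      rule3(xbars), rule4(xbars))
```

An empty list from `rule1` and `False` from the other three functions correspond to "no signal" for those rules.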
c.
Since the process appears to be in control, these limits should be used to monitor future process output.
12.74
a.
A capability analysis diagram is:
b.
For an upper specification limit of 5, there are 27 observations above this limit. Thus, (27/100) × 100% = 27% of the observations are unacceptable. It does not appear that the process is capable.
c.
From Exercise 12.73, the process appears to be in control. Thus, it is appropriate to estimate CP. From the sample, $\bar{x}$ = 3.867 and s = 2.190.
$C_P \approx \frac{USL - LSL}{6s} = \frac{5 - 0}{6(2.19)} = \frac{5}{13.14} = .381$
Since the CP value is less than 1, the process is not capable.
d.
There is no lower specification limit because management has not set a time below which a wait is unacceptable. The variable being measured is the time customers wait in line. The actual lower limit would be 0.
12.76
a.
To get the total number of defectives, sum the number of defectives for all 36 samples. The sum is 279. To get the total number of units sampled, multiply the sample size by the number of samples: 160(36) = 5760.
$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{279}{5760} = .048$
The centerline is $\bar{p}$ = .048
Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 + 3\sqrt{\frac{.048(1 - .048)}{160}} = .099$
Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 - 3\sqrt{\frac{.048(1 - .048)}{160}} = -.003$
Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 + 2\sqrt{\frac{.048(1 - .048)}{160}} = .082$
Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 - 2\sqrt{\frac{.048(1 - .048)}{160}} = .014$
Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 + \sqrt{\frac{.048(1 - .048)}{160}} = .065$
Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 - \sqrt{\frac{.048(1 - .048)}{160}} = .031$
The p-chart is:
b.
To determine if the process is in or out of control, we check the four rules used for the R-chart:
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
The process appears to be in control. Thus, there is no indication that special causes of variation are present.
c.
The Pareto diagram is:
Most of the defects are due to microcracks. Thus, "microcracks" are the "vital few." The other types of defectives are broken stands, gaps between layers, and internal voids. These are the "trivial many."
Chapter 13
Time Series: Descriptive Analyses, Models, and Forecasting
13.2
a.
The simple composite index is calculated as follows: First, sum the observations for all the series of interest at each time period. Select the base time period. Divide each sum by the sum in the base time period and multiply by 100.
b.
To calculate a weighted composite index, we use the following steps: First, multiply the observations in each time series by its appropriate weight. Then sum the weighted observations across all time series for each time period. Select the base time period. Divide each weighted sum by the weighted sum in the base time period and multiply by 100.
c.
The steps necessary to compute a Laspeyres index are:
1. Collect data for each of k price series.
2. Select a base time period and collect purchase quantity information for each of the k series at the base time period.
3. Using the purchase quantity values at the base period as weights, multiply each value in each of the k series by its corresponding weight.
4. Sum the products for each time period.
5. Divide each sum by the sum corresponding to the base period and multiply by 100.
d.
The steps necessary to compute a Paasche index are:
1. Collect data for each of k price series.
2. Select a base period.
3. Collect purchase quantity information for each series at each time period.
4. For each time period, multiply the value in each price series by its corresponding purchase quantity for that time period. Sum the products for each time period.
5. To find the value of the Paasche index at a particular time period, multiply the purchase quantity values (weights) for that time period by the corresponding price values of the base time period. Sum the results for the base period. The Paasche index is then found by dividing the sum found in (4) by the sum found in (5) and multiplying by 100.
13.4
a.
The simple index for the quarter 4 price of product A, using quarter 1 as the base period is (4.25 / 3.25) × 100 = 130.77.
b.
The simple index for the quarter 2 price of product B, using quarter 1 as the base period is (1.25 / 1.75) × 100 = 71.43.
c.
To find the simple composite index, we must first sum the prices for all three products over the base period and the quarter for which we want to compute the simple composite index. The sum for quarter 1 is 3.25 + 1.75 + 8.00 = 13.00. The sum for quarter 4 is 4.25 + 1.00 + 10.50 = 15.75. The simple composite index for quarter 4 using quarter 1 as the base period is (15.75 / 13.00) × 100 = 121.15.
d.
The sum of all the products for quarter 2 is 3.50 + 1.25 + 9.35 = 14.10. The simple composite index for quarter 4 using quarter 2 as the base period is (15.75 / 14.10) × 100 = 111.70.
13.6
a.
To find the simple index, divide each value by the value for the base year and multiply by 100. The index numbers are:
Year   Simple Index (Base Year = 1975)        Simple Index (Base Year = 1980)
1975   (13,719/13,719) × 100 = 100.00         (13,719/21,023) × 100 =  65.26
1980   (21,023/13,719) × 100 = 153.24         (21,023/21,023) × 100 = 100.00
1985   (27,735/13,719) × 100 = 202.16         (27,735/21,023) × 100 = 131.93
1990   (35,353/13,719) × 100 = 257.69         (35,353/21,023) × 100 = 168.16
1995   (40,611/13,719) × 100 = 296.02         (40,611/21,023) × 100 = 193.17
2000   (50,890/13,719) × 100 = 370.95         (50,890/21,023) × 100 = 242.07
b.
The index value for 1990 is 257.69 when the base is 1975. Thus, the median annual family income for 1990 increased by 257.69 – 100 = 157.69% over the median annual family income in 1975. The index value for 1990 is 168.16 when the base is 1980. Thus, the median annual family income for 1990 increased by 168.16 – 100 = 68.16% over the median annual family income in 1980.
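The simple index calculation lends itself to a one-line function. The Python sketch below uses the income figures from the table; the name `simple_index` and the convention of passing the base period as a list position are illustrative choices.

```python
def simple_index(series, base_position):
    """Divide every value by the base-period value and multiply by 100."""
    base = series[base_position]
    return [100 * value / base for value in series]


years = [1975, 1980, 1985, 1990, 1995, 2000]
incomes = [13719, 21023, 27735, 35353, 40611, 50890]

index_1975 = simple_index(incomes, 0)  # base year 1975
index_1980 = simple_index(incomes, 1)  # base year 1980
for year, i75, i80 in zip(years, index_1975, index_1980):
    print(f"{year}: {i75:7.2f} {i80:7.2f}")
```

Subtracting 100 from any entry gives the percentage change relative to the base year, as in part b.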
13.8
a.
To compute the simple index, divide each housing start value by the 2001, Quarter 1 value, 274, and then multiply by 100.
Year   Quarter   Simple Index
2001   1         (274/274) × 100 = 100.00
       2         (374/274) × 100 = 136.50
       3         (341/274) × 100 = 124.45
       4         (285/274) × 100 = 104.01
2002   1         (293/274) × 100 = 106.93
       2         (386/274) × 100 = 140.88
       3         (361/274) × 100 = 131.75
       4         (319/274) × 100 = 116.42
2003   1         (304/274) × 100 = 110.95
       2         (406/274) × 100 = 148.18
       3         (412/274) × 100 = 150.36
       4         (377/274) × 100 = 137.59
2004   1         (345/274) × 100 = 125.91
       2         (456/274) × 100 = 166.42
       3         (440/274) × 100 = 160.58
       4         (370/274) × 100 = 135.04
2005   1         (369/274) × 100 = 134.67
       2         (485/274) × 100 = 177.01
       3         (471/274) × 100 = 171.90
       4         (392/274) × 100 = 143.07
b.
The value of the index for Quarter 2, 2004 is 166.42. Thus, the housing starts in Quarter 2, 2004 increased by 166.42 – 100 = 66.42% over the housing starts in the base quarter, Quarter 1, 2001.
c.
The value of the index for Quarter 4, 2005 is 143.07. Thus, the housing starts in Quarter 4, 2005 increased by 143.07 – 100 = 43.07% over the housing starts in the base quarter, Quarter 1, 2001.
d.
The number of housing starts for Quarter 1, 2003 is 304 thousand. The number of housing starts for Quarter 4, 2005 is 392 thousand. Using Quarter 1, 2003 as the base, the index for Quarter 4, 2005 is (392/304) × 100 = 128.95. Thus, the number of housing starts in Quarter 4, 2005 increased by 128.95 – 100 = 28.95% over the housing starts in Quarter 1, 2003.
13.10
a.
To compute the simple index for the agricultural data, divide each farm value by the 1980 value, 3,364, and then multiply by 100. To compute the simple index for the nonagricultural data, divide each nonfarm value by the 1980 value, 95,938, and then multiply by 100. The two indices are:
Year   Farm Index                           Nonfarm Index
1980   (3,364/3,364) × 100 = 100.00         (95,938/95,938) × 100 = 100.00
1985   (3,179/3,364) × 100 =  94.50         (103,971/95,938) × 100 = 108.37
1990   (3,223/3,364) × 100 =  95.81         (115,570/95,938) × 100 = 120.46
1995   (3,440/3,364) × 100 = 102.26         (121,460/95,938) × 100 = 126.60
2000   (2,464/3,364) × 100 =  73.25         (134,427/95,938) × 100 = 140.12
2003   (2,275/3,364) × 100 =  67.63         (135,461/95,938) × 100 = 141.20
b.
The nonfarm segment has shown the greater percentage change in employment over the time period. The nonfarm employment in 2003 was 41.20% greater than in 1980. The farm employment in 2003 was 32.37% lower than in 1980.
c.
To compute the simple composite index, first sum the two values (farm and nonfarm) for every time period. Then divide the sum by the sum in 1980, 99,302, and then multiply by 100. The simple composite index is:
Year   Sum       Simple Composite Index
1980    99,302   (99,302/99,302) × 100 = 100.00
1985   107,150   (107,150/99,302) × 100 = 107.90
1990   118,793   (118,793/99,302) × 100 = 119.63
1995   124,900   (124,900/99,302) × 100 = 125.78
2000   136,891   (136,891/99,302) × 100 = 137.85
2003   137,736   (137,736/99,302) × 100 = 138.70
d.
The simple composite index value for 2003 is 138.70. The composite employment is 38.70% higher in 2003 than in 1980.
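The composite calculation in part c can be sketched in Python as follows. The function name `simple_composite_index` is illustrative; the farm and nonfarm employment figures are the ones used in parts a and c.

```python
def simple_composite_index(series_list, base_position):
    """Sum the series period by period, then index the totals to the base period.

    Returns the list of period totals and the list of index values.
    """
    totals = [sum(values) for values in zip(*series_list)]
    base = totals[base_position]
    return totals, [100 * total / base for total in totals]


years = [1980, 1985, 1990, 1995, 2000, 2003]
farm = [3364, 3179, 3223, 3440, 2464, 2275]
nonfarm = [95938, 103971, 115570, 121460, 134427, 135461]

totals, composite = simple_composite_index([farm, nonfarm], 0)
for year, total, value in zip(years, totals, composite):
    print(f"{year}: {total:7d} {value:7.2f}")
```

Because the nonfarm values are roughly thirty times larger than the farm values, the composite index tracks the nonfarm index closely, which is why it rises even though farm employment falls.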
13.12
a.
To find the Laspeyres index, we multiply the durable goods by 10.9, the nondurable goods by 14.02, and the services by 42.6. The three products are then summed. The index is found by dividing the weighted sum at each time period by the weighted sum of 1970, 17,108.86, and then multiplying by 100. The Laspeyres index and the simple composite index for 1970 (computed in Exercise 13.11) are:
Year   Simple Composite Index-1970   Weighted Sum   Laspeyres Index
1960        51.43                       8,409.95          49.16
1965        68.77                      11,442.51          66.88
1970       100.00                      17,108.86         100.00
1975       158.52                      27,509.89         160.79
1980       270.39                      48,215.53         281.82
1985       412.59                      76,167.86         445.20
1990       581.78                     110,254.64         644.43
1995       768.60                     150,193.08         877.87
2000     1,033.83                     202,856.51       1,185.68
2004     1,272.99                     251,152.45       1,467.97
b.
The plot of the two indices is:
[Time series plot: Index (vertical axis, 0 to 1600) versus Year (horizontal axis, 1960 to 2000); series plotted: I-1970 (simple composite index) and Laspeyres]
The two indices are very similar from 1960 to approximately 1980. After 1980, the difference between the two indices becomes larger, with the Laspeyres index increasing faster than the simple composite index.
13.14
a.
To get the simple composite price index, sum the prices for the three metals for each month, divide by 2,090.35 (the sum of the prices for the base period January), and multiply by 100. To get the simple composite quantity index, sum the quantities for the three metals for each month, divide by 8,793.40 (the sum of the quantities for the base period January), and multiply by 100. The indices are:
Month   Price Total   Price Index   Quantity Total   Quantity Index
Jan     2,090.35      100.00        8,793.40         100.00
Feb     2,495.72      119.39        8,531.70          97.02
Mar     2,536.85      121.36        9,406.50         106.97
Apr     2,409.55      115.27        9,047.10         102.89
May     2,550.70      122.02        9,303.20         105.80
Jun     2,603.20      124.53        9,152.10         104.08
Jul     2,719.30      130.09        9,301.80         105.78
Aug     2,998.52      143.45        9,457.90         107.56
Sep     2,978.98      142.51        9,382.90         106.70
Oct     2,997.82      143.41        9,698.20         110.29
Nov     3,038.80      145.37        9,127.00         103.79
Dec     3,018.57      144.41        8,807.90         100.16
b.
To compute the Laspeyres index, multiply the price for each month by the quantity for each of the metals for January, sum the products for the three metals, divide by 1,768,700.64 (the sum for the base period January), and multiply by 100. The Laspeyres index is:
Month   Total           Laspeyres Index
Jan     1,768,700.64    100.00
Feb     2,077,067.24    117.43
Mar     2,345,138.00    132.59
Apr     2,114,563.64    119.55
May     1,760,956.32     99.56
Jun     1,746,326.88     98.74
Jul     2,117,568.80    119.72
Aug     2,377,017.20    134.39
Sep     2,100,958.72    118.79
Oct     2,276,109.40    128.69
Nov     2,366,980.72    133.83
Dec     2,155,654.92    121.88
c.
The plots of the simple composite price index, the simple composite quantity index, and Laspeyres index are:
[Time series plot: Index (vertical axis, 90 to 150) versus Month (horizontal axis, Jan to Dec); series plotted: Price, Quantity, Laspeyres]
The quantity index appears to be fairly stable while the price index steadily increases. The Laspeyres index is rather unstable, as it varies much more than the other two indices. d.
The following steps are used to compute the Paasche index:
1. First, multiply the price × production for copper, steel, and lead for each month. The numerator of the index is the sum of these three quantities at each month.
2. Next, multiply the production values of copper by 1,133, the production of steel by 187.75, and the production of lead by 769.6 (the base-period January prices). The denominator is the sum of these three quantities at each month.
3. The values of the Paasche index are the ratios of these two values at each month times 100.
The Paasche index is:

Month   Paasche Numerator   Paasche Denominator   Paasche Index
Jan          1,768,700.64          1,768,700.64          100.00
Feb          2,013,192.24          1,714,396.58          117.43
Mar          2,500,128.80          1,884,813.60          132.65
Apr          2,180,640.81          1,823,938.71          119.56
May          1,858,912.26          1,867,861.77           99.52
Jun          1,822,735.92          1,844,379.26           98.83
Jul          2,230,984.40          1,864,385.48          119.66
Aug          2,549,791.96          1,898,332.74          134.32
Sep          2,244,369.96          1,888,977.74          118.81
Oct          2,504,067.86          1,946,822.77          128.62
Nov          2,450,159.20          1,831,683.15          133.77
Dec          2,175,046.70          1,781,166.44          122.11
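The difference between the two weighted indices can be sketched in a few lines of Python. The January prices (1,133, 187.75, 769.6) come from the steps above; the later-period prices and all quantities below are hypothetical values chosen only for illustration, since the per-metal data are not reproduced in this solution. The function names are likewise illustrative.

```python
def laspeyres_index(prices_t, prices_0, quantities_0):
    """100 * sum(P_t * Q_0) / sum(P_0 * Q_0): base-period quantities are the weights."""
    numerator = sum(p * q for p, q in zip(prices_t, quantities_0))
    denominator = sum(p * q for p, q in zip(prices_0, quantities_0))
    return 100.0 * numerator / denominator


def paasche_index(prices_t, prices_0, quantities_t):
    """100 * sum(P_t * Q_t) / sum(P_0 * Q_t): current-period quantities are the weights."""
    numerator = sum(p * q for p, q in zip(prices_t, quantities_t))
    denominator = sum(p * q for p, q in zip(prices_0, quantities_t))
    return 100.0 * numerator / denominator


# January (base-period) prices for copper, steel, and lead, from the steps above.
base_prices = [1133.0, 187.75, 769.6]

# Hypothetical values, for illustration only (not from the exercise data).
base_quantities = [90.0, 5200.0, 310.0]
later_prices = [1200.0, 190.0, 800.0]
later_quantities = [100.0, 5000.0, 300.0]

laspeyres = laspeyres_index(later_prices, base_prices, base_quantities)
paasche = paasche_index(later_prices, base_prices, later_quantities)
print(f"Laspeyres: {laspeyres:.2f}  Paasche: {paasche:.2f}")
```

Both functions return 100 when the current prices equal the base prices, which matches the January row of each table.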
e.
The plot of the Laspeyres index and the Paasche index is:

[Figure: Time Series Plot of Laspeyres, Paasche; vertical axis: Data, horizontal axis: Month (Jan through Dec)]

The two indices are almost identical.
f.
The values of the Laspeyres index for September and December are 118.79 and 121.88. The values of the Paasche index for September and December are 118.81 and 122.11. These values are almost identical. Because the Laspeyres and Paasche indices are so close to each other, neither is superior to the other.

13.16
a.
The exponentially smoothed employment for the first period is equal to the employment for that period. For the rest of the time periods, the exponentially smoothed employment values are found by multiplying .5 times the employment value of that time period and adding to that (1 − .5) times the value of the exponentially smoothed employment figure of the previous time period. The exponentially smoothed employment value for the time period 2 is .5(281) + (1 − .5)(280) = 280.5. The rest of the values are shown in the table.
Month    t    Yt    Exponentially Smoothed Series (w = .5)
Jan.     1   280    280.0
Feb.     2   281    280.5
Mar.     3   250    265.3
Apr.     4   246    255.6
May      5   239    247.3
June     6   218    232.7
July     7   218    225.3
Aug.     8   210    217.7
Sept.    9   205    211.3
Oct.    10   206    208.7
Nov.    11   200    204.3
Dec.    12   200    202.2
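The recursion described in part a can be sketched in Python as follows. The function name `exponential_smoothing` is an illustrative choice; the employment values are the Yt column of the table.

```python
def exponential_smoothing(y, w):
    """E_1 = Y_1; E_t = w * Y_t + (1 - w) * E_(t-1) for t >= 2."""
    smoothed = [float(y[0])]
    for value in y[1:]:
        smoothed.append(w * value + (1 - w) * smoothed[-1])
    return smoothed


# Monthly employment values (Yt) from the table.
employment = [280, 281, 250, 246, 239, 218, 218, 210, 205, 206, 200, 200]
smoothed = exponential_smoothing(employment, w=0.5)

for t, (y_t, e_t) in enumerate(zip(employment, smoothed), start=1):
    print(f"{t:2d}  {y_t:4d}  {e_t:8.3f}")
```

The unrounded values agree with the table to one decimal place; for example, the third smoothed value is 265.25, which the table reports as 265.3.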
b.
The graph of the time series and the exponentially smoothed series is:

[Figure: plot of Yt and the exponentially smoothed series; vertical axis: Series, horizontal axis: Time Period (1 through 12)]
13.18
a.
The exponentially smoothed fish catch for Chile for the first period is equal to the fish catch for that period. For the rest of the time periods, the exponentially smoothed fish catch values are found by multiplying .5 times the fish catch of that time period and adding to that (1 − .5) times the value of the exponentially smoothed fish catch figure of the previous time period. The exponentially smoothed fish catch for Chile for the time period 1995 is .5(7,590.5) + (1 − .5)(5,195.4) = 6,392.95. The rest of the values are shown in the table. Similarly, the exponentially smoothed fish catch for Brazil for the first period is equal to the fish catch for that period. For the rest of the time periods, the exponentially smoothed fish catch values are found by multiplying .5 times the fish catch of that time period and adding to that (1 − .5) times the value of the exponentially smoothed fish catch figure of the previous time period. The exponentially smoothed fish catch for Brazil for time period 1995 is .5(800.0) + (1 − .5)(802.9) = 801.45. The rest of the values are shown in the table.
Year   Chile Catch   Chile Exponentially Smoothed Catch (w = .5)   Brazil Catch   Brazil Exponentially Smoothed Catch (w = .5)
1990       5,195.4   5,195.40                                             802.9   802.90
1995       7,590.5   6,392.95                                             800.0   801.45
1998       3,265.3   4,829.13                                             706.8   754.13
1999       5,050.2   4,939.66                                             703.9   729.01
2000       4,300.0   4,619.83                                             766.8   747.91
2001       3,797.1   4,208.47                                             806.7   777.30
2002       4,271.5   4,239.98                                             822.1   799.70

b.
The plot of the two time series and the two exponentially smoothed series is:

[Figure: time series plot of Chile, Brazil, Chile-Exp, and Brazil-Exp; vertical axis: Fish Catch, horizontal axis: Year (1990 through 2002)]
Both the time series and the exponentially smoothed series for the fish catch in Brazil are fairly stable over time. There is a decrease and then increase for both series in Brazil. Both the time series and exponentially smoothed series for the fish catch in Chile show a decrease over time. The exponentially smoothed series is more stable than the actual time series.
13.20
a.
The exponentially smoothed expenditure for the first time period is equal to the expenditure for that period. For the rest of the time periods, the exponentially smoothed expenditures are found by multiplying the expenditures for the time period by w = .2 and adding to that (1 − .2) times the exponentially smoothed value above it. The exponentially smoothed value for the year 1991 is .2(548.9) + (1 − .2)(590.1) = 581.86. The rest of the values appear in the table. The process is repeated with w = .8.
Year   Expenditures   Exponentially Smoothed Value (w = .2)   Exponentially Smoothed Value (w = .8)
1990          590.1   590.10                                  590.10
1991          548.9   581.86                                  557.14
1992          581.1   581.71                                  576.31
1993          607.6   586.89                                  601.34
1994          643.2   598.15                                  634.83
1995          654.6   609.44                                  650.65
1996          687.1   624.97                                  679.81
1997          727.4   645.46                                  717.88
1998          779.3   672.23                                  767.02
1999          831.6   704.10                                  818.68
2000          853.4   733.96                                  846.46
2001          872.0   761.57                                  866.89
2002          890.9   787.43                                  886.10
2003          912.3   812.41                                  907.06
2004          925.6   835.05                                  921.89
2005          931.5   854.34                                  929.58

b.
The plot of the original series and the two exponentially smoothed series is:

[Figure: time series plot of Expend, Exp-.2, and Exp-.8; vertical axis: Expenditures, horizontal axis: Year]

The trend in personal consumption expenditure on transportation increased at a faster rate in the 1990s than in the 2000s. In the 2000s, the consumption expenditure is still increasing, but at a slower rate.
13.22
a.
The exponentially smoothed Stock Index for the first time period is equal to the Stock Index for that time period. For the rest of the time periods, the exponentially smoothed stock price is found by multiplying w = .3 times the stock price for that time period and adding to that (1 − .3) times the exponentially smoothed stock price for the previous time period. The exponentially smoothed stock price for the second time period is .3(1372.7) + (1 − .3)(1286.4) = 1312.29. The rest of the values are shown in the table.
Year   Quarter   S&P 500   Exponentially Smoothed Series (w = .3)   Exponentially Smoothed Series (w = .7)
1999   1          1286.4   1286.4                                   1286.4
       2          1372.7   1312.3                                   1346.8
       3          1282.7   1303.4                                   1301.9
       4          1469.2   1353.1                                   1419.0
2000   1          1498.6   1396.8                                   1474.7
       2          1454.6   1414.1                                   1460.6
       3          1436.5   1420.8                                   1443.7
       4          1320.3   1390.7                                   1357.3
2001   1          1160.3   1321.6                                   1219.4
       2          1224.4   1292.4                                   1222.9
       3          1040.9   1217.0                                   1095.5
       4          1148.1   1196.3                                   1132.3
2002   1          1147.4   1181.6                                   1142.9
       2           989.8   1124.1                                   1035.7
       3           815.3   1031.4                                    881.4
       4           879.8    986.0                                    880.3
2003   1           848.2    944.6                                    857.8
       2           974.5    953.6                                    939.5
       3           996      966.3                                    979.0
       4          1111.9   1010.0                                   1072.0
2004   1          1126.2   1044.9                                   1110.0
       2          1140.8   1073.6                                   1131.5
       3          1114.6   1085.9                                   1119.7
       4          1211.9   1123.7                                   1184.2
2005   1          1180.6   1140.8                                   1181.7
       2          1191.3   1155.9                                   1188.4
       3          1228.8   1177.8                                   1216.7
       4          1248.3   1198.9                                   1238.8
2006   1          1294.9   1227.7                                   1278.1
       2          1270.2   1240.5                                   1272.6
       3          1335.8   1269.1                                   1316.8
The plot of the original series and the exponentially smoothed series with w = .3 is:

[Figure: time series plot of S&P 500 and Exp-.3; vertical axis: S&P 500, horizontal axis: Quarter/Year (Q1 1999 through 2006)]

b.
The same procedure is followed for w = .7. The exponentially smoothed Stock Index for the first time period is equal to the Stock Index for that time period. For the rest of the time periods, the exponentially smoothed stock price is found by multiplying w = .7 times the stock price for that time period and adding to that (1 − .7) times the exponentially smoothed stock price for the previous time period. The exponentially smoothed stock price for the second time period is .7(1372.7) + (1 − .7)(1286.4) = 1346.8. The rest of the values are shown in the table in part a. The plot of the original series and the exponentially smoothed series with w = .7 is:

[Figure: time series plot of S&P 500 and Exp-.7; vertical axis: S&P 500, horizontal axis: Quarter/Year (Q1 1999 through 2006)]

c.
The exponentially smoothed series with w = .3 better describes the trends in the series. The exponentially smoothed series with w = .7 is almost exactly like the original series.
13.24
a.
The missing trend value for quarter 3 is:
T3 = v(E3 – E2) + (1 – v)T2 = .6(3.78 – 3.50) + (1 − .6)(.25) = .27
b.
The missing smoothed value for quarter 4 is:
E4 = wY4 + (1 – w)(E3 + T3) = .2(4.25) + (1 − .2)(3.78 + .27) = 4.09
c.
The forecast for quarter 5 is:
FQ5 = Ft+1 = Et + Tt = 4.09 + .29 = 4.38

13.26
a.
To compute the exponentially smoothed values, we follow these steps:
E1 = Y1 = 345
E2 = wY2 + (1 – w)E1 = .6(456) + (1 − .6)(345) = 411.60
E3 = wY3 + (1 – w)E2 = .6(440) + (1 − .6)(411.60) = 428.64
The rest of the values are computed in a similar manner and are listed in the table:

Year   Quarter   Housing Starts   Exponentially Smoothed (w = .6)
2004   1                    345   345.00
       2                    456   411.60
       3                    440   428.64
       4                    370   393.46
2005   1                    369   378.78
       2                    485   442.51
       3                    471   459.61
       4                    392   419.04

b.
Using MINITAB, the plot is:

[Figure: time series plot of Housing and Exp-.6; vertical axis: Starts, horizontal axis: Quarter/Year (Q1 2004 through Q4 2005)]
c. To forecast using exponentially smoothed values, we use the following:
F2006,1 = Ft+1 = Et = 419.04
F2006,2 = Ft+2 = Ft+1 = 419.04
F2006,3 = Ft+3 = Ft+1 = 419.04
F2006,4 = Ft+4 = Ft+1 = 419.04
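A minimal Python sketch of this forecasting rule follows: every future period receives the last smoothed value. The helper names `exponential_smoothing` and `flat_forecasts` are illustrative and not from the text; the housing starts are the eight quarterly values for 2004-2005.

```python
def exponential_smoothing(y, w):
    """E_1 = Y_1; E_t = w * Y_t + (1 - w) * E_(t-1) for t >= 2."""
    smoothed = [float(y[0])]
    for value in y[1:]:
        smoothed.append(w * value + (1 - w) * smoothed[-1])
    return smoothed


def flat_forecasts(last_smoothed, steps):
    """With simple exponential smoothing, F_(t+k) = E_t for every k >= 1."""
    return [last_smoothed] * steps


# Quarterly housing starts for 2004-2005.
housing_starts = [345, 456, 440, 370, 369, 485, 471, 392]
smoothed = exponential_smoothing(housing_starts, w=0.6)
forecasts_2006 = flat_forecasts(smoothed[-1], steps=4)

print([round(f, 2) for f in forecasts_2006])
```

Because the forecast carries no trend term, it stays at the final smoothed value no matter how far ahead it is projected.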
13.28
a.
Using the information from Exercise 13.21, the forecast using the exponentially smoothed values with w = .9 is:
F2006 = Ft+2 = Ft+1 = Et = 1815.3

b.
We first compute the Holt-Winters values for years 1974-2004. With w = .3 and v = .8,
E2 = Y2 = 1171
T2 = Y2 – Y1 = 1171 – 926 = 245
E3 = wY3 + (1 – w)(E2 + T2) = .3(1663) + (1 − .3)(1171 + 245) = 1490.1
T3 = v(E3 – E2) + (1 – v)T2 = .8(1490.1 – 1171) + (1 − .8)(245) = 304.28
The rest of the Et's and Tt's appear in the table:

Year    t   Imports   Et (w = .3, v = .8)   Tt (w = .3, v = .8)
1974    1       926
1975    2     1,171   1171.00                245.00
1976    3     1,663   1490.10                304.28
1977    4     2,058   1873.47                367.55
1978    5     1,892   2136.31                283.79
1979    6     1,866   2253.87                150.80
1980    7     1,414   2107.47                −86.96
1981    8     1,067   1734.46               −315.80
1982    9       633   1182.96               −504.36
1983   10       540    637.02               −537.62
1984   11       553    235.48               −428.76
1985   12       479      8.40               −267.41
1986   13       771     50.00                −20.21
1987   14       876    283.65                182.88
1988   15       987    622.67                307.79
1989   16     1,232   1020.93                380.16
1990   17     1,282   1365.36                351.58
1991   18     1,233   1571.76                235.43
1992   19     1,247   1639.14                100.99
1993   20     1,339   1619.79                  4.72
1994   21     1,307   1529.25                −71.48
1995   22     1,303   1411.34               −108.63
1996   23     1,258   1289.30               −119.36
1997   24     1,378   1232.36                −69.42
1998   25     1,522   1270.65                 16.75
1999   26     1,543   1364.08                 78.09
2000   27     1,664   1508.72                131.33
2001   28     1,770   1679.04                162.52
2002   29     1,490   1736.09                 78.14
2003   30     1,671   1771.26                 43.77
2004   31     1,833   1820.42                 48.08
To forecast using the Holt-Winters Model: For w = .3 and v = .8,
F2006 = Ft+2 = Et + 2Tt = 1,820.42 + 2(48.08) = 1,916.58

c.
The forecast error for the exponentially smoothed series is
Yt+2 – Ft+2 = 2,100 – 1815.3 = 284.7
The forecast error for the Holt-Winters series is
Yt+2 – Ft+2 = 2,100 – 1,916.58 = 183.42
The error for the Holt-Winters forecast is smaller than the error for the exponentially smoothed forecast.
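The Holt-Winters recursion and the k-step-ahead forecast used in this exercise can be sketched in Python as below. The function names are illustrative; only the first four import values are used here so that the output can be compared with the first rows of the table, and the final E and T values passed to the forecast are taken from the table rather than recomputed.

```python
def holt_smooth(y, w, v):
    """Holt-Winters smoothing with trend (no seasonal component).

    E_2 = Y_2 and T_2 = Y_2 - Y_1; for t >= 3,
    E_t = w * Y_t + (1 - w) * (E_(t-1) + T_(t-1))
    T_t = v * (E_t - E_(t-1)) + (1 - v) * T_(t-1)
    Index 0 of each returned list is None because the recursion starts at t = 2.
    """
    level = [None, float(y[1])]
    trend = [None, float(y[1] - y[0])]
    for t in range(2, len(y)):
        e_t = w * y[t] + (1 - w) * (level[t - 1] + trend[t - 1])
        t_t = v * (e_t - level[t - 1]) + (1 - v) * trend[t - 1]
        level.append(e_t)
        trend.append(t_t)
    return level, trend


def holt_forecast(last_level, last_trend, k):
    """F_(t+k) = E_t + k * T_t."""
    return last_level + k * last_trend


# Imports for 1974-1977 (first four rows of the table).
imports = [926, 1171, 1663, 2058]
level, trend = holt_smooth(imports, w=0.3, v=0.8)
print(level[1:], trend[1:])

# Two-step-ahead forecast for 2006 from the tabled 2004 values.
forecast_2006 = holt_forecast(1820.42, 48.08, k=2)
print(round(forecast_2006, 2))
```

Unlike simple exponential smoothing, the forecast changes with the horizon k because the trend component is added once per step.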
13.30
a.
We first compute the Holt-Winters values for the years 2003-2005. With w = .3 and v = .5,
E2 = Y2 = 974.5
T2 = Y2 – Y1 = 974.5 – 848.2 = 126.3
E3 = wY3 + (1 – w)(E2 + T2) = .3(996.0) + (1 − .3)(974.5 + 126.3) = 1,069.36
T3 = v(E3 – E2) + (1 – v)T2 = .5(1,069.36 – 974.5) + (1 − .5)(126.3) = 110.58
The rest of the Et's and Tt's appear in the table that follows.
Year   Quarter   S&P 500   Et (w = .3, v = .5)   Tt (w = .3, v = .5)   Et (w = .7, v = .5)   Tt (w = .7, v = .5)
2003   1           848.2
       2           974.5    974.5                126.30                 974.5                126.30
       3           996.0   1069.36               110.58                1027.44                89.62
       4          1111.9   1159.53               100.37                1113.45                87.81
2004   1          1126.2   1219.79                80.32                1148.72                61.54
       2          1140.8   1252.32                56.42                1161.64                37.23
       3          1114.6   1250.50                27.30                1139.88                 7.74
       4          1211.9   1258.03                17.42                1192.62                30.24
2005   1          1180.6   1246.99                 3.19                1193.28                15.45
       2          1191.3   1232.52                -5.64                1196.53                 9.35
       3          1228.8   1227.45                -5.35                1221.92                17.37
       4          1248.3   1229.96                -1.42                1245.60                20.52
2006   1          1294.9
       2          1270.2
       3          1335.8
To forecast using the Holt-Winters Model with w = .3 and v = .5:
F2006,1 = Ft+1 = Et + Tt = 1,229.96 – 1.42 = 1,228.54
F2006,2 = Ft+2 = Et + 2Tt = 1,229.96 + 2(–1.42) = 1,227.12
F2006,3 = Ft+3 = Et + 3Tt = 1,229.96 + 3(–1.42) = 1,225.70

With w = .7 and v = .5,
E2 = Y2 = 974.5
T2 = Y2 – Y1 = 974.5 – 848.2 = 126.3
E3 = wY3 + (1 – w)(E2 + T2) = .7(996.0) + (1 − .7)(974.5 + 126.3) = 1,027.44
T3 = v(E3 – E2) + (1 – v)T2 = .5(1,027.44 – 974.5) + (1 − .5)(126.3) = 89.62
The rest of the Et's and Tt's appear in the table above.

To forecast using the Holt-Winters Model with w = .7 and v = .5:
F2006,1 = Ft+1 = Et + Tt = 1,245.60 + 20.52 = 1,266.12
F2006,2 = Ft+2 = Et + 2Tt = 1,245.60 + 2(20.52) = 1,286.64
F2006,3 = Ft+3 = Et + 3Tt = 1,245.60 + 3(20.52) = 1,307.16

13.32
a.
From Exercise 13.25a, the forecasts for 2003-2005 using w = .3 are:
F2003 = 199.48
F2004 = 199.48
F2005 = 199.48
The errors are the differences between the actual values and the predicted values. Thus, the errors are:
Y2003 − F2003 = 195 − 199.48 = −4.48
Y2004 − F2004 = 197 − 199.48 = −2.48
Y2005 − F2005 = 195 − 199.48 = −4.48

b.
From Exercise 13.25a, the forecasts for 2003-2005 using w = .7 are:
F2003 = 199.74
F2004 = 199.74
F2005 = 199.74
The errors are:
Y2003 − F2003 = 195 − 199.74 = −4.74
Y2004 − F2004 = 197 − 199.74 = −2.74
Y2005 − F2005 = 195 − 199.74 = −4.74
c.
For the exponentially smoothed forecasts with w = .3,

$$\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|195 - 199.48| + |197 - 199.48| + |195 - 199.48|}{3} = \frac{11.44}{3} = 3.81$$

$$\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{195 - 199.48}{195}\right| + \left|\frac{197 - 199.48}{197}\right| + \left|\frac{195 - 199.48}{195}\right|}{3}\right] 100 = \left[\frac{.0585}{3}\right] 100 = 1.9512$$

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(195 - 199.48)^2 + (197 - 199.48)^2 + (195 - 199.48)^2}{3}} = \sqrt{\frac{46.2912}{3}} = 3.928$$

d.
For the exponentially smoothed forecasts with w = .7,

$$\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|195 - 199.74| + |197 - 199.74| + |195 - 199.74|}{3} = \frac{12.22}{3} = 4.07$$

$$\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{195 - 199.74}{195}\right| + \left|\frac{197 - 199.74}{197}\right| + \left|\frac{195 - 199.74}{195}\right|}{3}\right] 100 = \left[\frac{.0625}{3}\right] 100 = 2.0841$$

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(195 - 199.74)^2 + (197 - 199.74)^2 + (195 - 199.74)^2}{3}} = \sqrt{\frac{52.4428}{3}} = 4.181$$
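The three accuracy measures can be checked with a short Python sketch. The function names `mad`, `mape`, and `rmse` are illustrative; the actual values (195, 197, 195) and the two flat forecasts (199.48 and 199.74) are those used in parts c and d.

```python
import math


def mad(actual, forecast):
    """Mean absolute deviation: sum |Y_t - F_t| / m."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error: 100 * sum |(Y_t - F_t) / Y_t| / m."""
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error: sqrt(sum (Y_t - F_t)^2 / m)."""
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual))


actual = [195, 197, 195]  # 2003-2005 actual values
for w, flat_value in ((0.3, 199.48), (0.7, 199.74)):
    forecast = [flat_value] * len(actual)
    print(f"w = {w}: MAD = {mad(actual, forecast):.2f}, "
          f"MAPE = {mape(actual, forecast):.4f}, "
          f"RMSE = {rmse(actual, forecast):.3f}")
```

All three measures are smaller for w = .3 in this exercise, so the same function calls also support a direct comparison of the two smoothing constants.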
13.34
a.
From Exercise 13.29a, the forecasts for the 3 quarters of 2006 using w = .7 are:
F2006,1 = 1,238.8
F2006,2 = 1,238.8
F2006,3 = 1,238.8

For the exponentially smoothed forecasts with w = .7:

$$\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|1294.9 - 1238.8| + |1270.2 - 1238.8| + |1335.8 - 1238.8|}{3} = \frac{184.5}{3} = 61.5$$

$$\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{1294.9 - 1238.8}{1294.9}\right| + \left|\frac{1270.2 - 1238.8}{1270.2}\right| + \left|\frac{1335.8 - 1238.8}{1335.8}\right|}{3}\right] 100 = \left[\frac{.1407}{3}\right] 100 = 4.689$$

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(1294.9 - 1238.8)^2 + (1270.2 - 1238.8)^2 + (1335.8 - 1238.8)^2}{3}} = \sqrt{\frac{13{,}542.17}{3}} = 67.187$$

b.
From Exercise 13.29b, the forecasts for the 3 quarters of 2006 using w = .3 are:
F2006,1 = 1,198.9
F2006,2 = 1,198.9
F2006,3 = 1,198.9

For the exponentially smoothed forecasts with w = .3:

$$\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|1294.9 - 1198.9| + |1270.2 - 1198.9| + |1335.8 - 1198.9|}{3} = \frac{304.2}{3} = 101.4$$

$$\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{1294.9 - 1198.9}{1294.9}\right| + \left|\frac{1270.2 - 1198.9}{1270.2}\right| + \left|\frac{1335.8 - 1198.9}{1335.8}\right|}{3}\right] 100 = \left[\frac{.2328}{3}\right] 100 = 7.759$$

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(1294.9 - 1198.9)^2 + (1270.2 - 1198.9)^2 + (1335.8 - 1198.9)^2}{3}} = \sqrt{\frac{33{,}041.3}{3}} = 104.946$$

c.
For all three measures of error, the value for the exponentially smoothed forecasts with w = .7 is smaller than the value for the exponentially smoothed forecasts with w = .3. Thus, the more accurate forecasts are those from the exponentially smoothed series with w = .7.

13.36
a.
From Exercise 13.31, the actual data and the forecasts using the exponential smoothing and the Holt-Winters forecasts are:
Year   Month   Gold Price   Exponential Forecast (w = .5)   Holt-Winters Forecast (w = .5, v = .5)
2005   Jan          424.2   433.47                          454.09
       Feb          423.4   433.47                          466.55
       Mar          434.2   433.47                          479.01
       Apr          428.9   433.47                          491.47
       May          421.9   433.47                          503.93
       Jun          430.7   433.47                          516.39
       Jul          424.5   433.47                          528.85
       Aug          437.9   433.47                          541.31
       Sep          456.0   433.47                          553.77
       Oct          469.9   433.47                          566.23
       Nov          476.7   433.47                          578.69
       Dec          509.8   433.47                          591.15
For the exponential smoothing forecasts with w = .5:

$$\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|424.2 - 433.47| + |423.4 - 433.47| + \cdots + |509.8 - 433.47|}{12} = \frac{230.9}{12} = 19.242$$

$$\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{424.2 - 433.47}{424.2}\right| + \left|\frac{423.4 - 433.47}{423.4}\right| + \cdots + \left|\frac{509.8 - 433.47}{509.8}\right|}{12}\right] 100 = \left[\frac{.4904}{12}\right] 100 = 4.087$$

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(424.2 - 433.47)^2 + (423.4 - 433.47)^2 + \cdots + (509.8 - 433.47)^2}{12}} = \sqrt{\frac{9{,}980.2268}{12}} = 28.839$$
For the Holt-Winters forecasts with w = .5 and v = .5:

$$\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|424.2 - 454.09| + |423.4 - 466.55| + \cdots + |509.8 - 591.15|}{12} = \frac{933.34}{12} = 77.778$$

$$\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{424.2 - 454.09}{424.2}\right| + \left|\frac{423.4 - 466.55}{423.4}\right| + \cdots + \left|\frac{509.8 - 591.15}{509.8}\right|}{12}\right] 100 = \left[\frac{2.0897}{12}\right] 100 = 17.415$$

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(424.2 - 454.09)^2 + (423.4 - 466.55)^2 + \cdots + (509.8 - 591.15)^2}{12}} = \sqrt{\frac{80{,}190.7476}{12}} = 81.747$$
For all three measures of forecast errors, the exponential smoothing forecasts had smaller errors. Thus, the exponential smoothing forecasts are better.
b.
From Exercise 13.31, the actual data and the forecasts using the exponential smoothing one-step-ahead and the Holt-Winters one-step-ahead forecasts are:
Year   Month   Gold Price   Exponential Forecast (w = .5)   Holt-Winters Forecast (w = .5, v = .5)
2005   Jan          424.2   433.47                          454.09
       Feb          423.4   428.83                          444.12
       Mar          434.2   426.12                          433.57
       Apr          428.9   430.16                          433.84
       May          421.9   429.53                          430.10
       Jun          430.7   425.71                          422.67
       Jul          424.5   428.21                          425.37
       Aug          437.9   426.35                          423.40
       Sep          456.0   432.13                          432.74
       Oct          469.9   444.06                          452.27
       Nov          476.7   456.98                          473.40
       Dec          509.8   466.84                          488.19
For the exponential smoothing one-step-ahead forecasts with w = .5:

$$\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|424.2 - 433.47| + |423.4 - 428.83| + \cdots + |509.8 - 466.84|}{12} = \frac{164.32}{12} = 13.693$$

$$\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{424.2 - 433.47}{424.2}\right| + \left|\frac{423.4 - 428.83}{423.4}\right| + \cdots + \left|\frac{509.8 - 466.84}{509.8}\right|}{12}\right] 100 = \left[\frac{.3540}{12}\right] 100 = 2.950$$

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(424.2 - 433.47)^2 + (423.4 - 428.83)^2 + \cdots + (509.8 - 466.84)^2}{12}} = \sqrt{\frac{3{,}884.9754}{12}} = 17.993$$
For the Holt-Winters one-step-ahead forecasts with w = .5 and v = .5:

$$\text{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|424.2 - 454.09| + |423.4 - 444.12| + \cdots + |509.8 - 488.19|}{12} = \frac{153.58}{12} = 12.798$$

$$\text{MAPE} = \left[\frac{\sum_{t=1}^{m} \left|\frac{Y_t - F_t}{Y_t}\right|}{m}\right] 100 = \left[\frac{\left|\frac{424.2 - 454.09}{424.2}\right| + \left|\frac{423.4 - 444.12}{423.4}\right| + \cdots + \left|\frac{509.8 - 488.19}{509.8}\right|}{12}\right] 100 = \left[\frac{.3434}{12}\right] 100 = 2.862$$

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(424.2 - 454.09)^2 + (423.4 - 444.12)^2 + \cdots + (509.8 - 488.19)^2}{12}} = \sqrt{\frac{3{,}019.9854}{12}} = 15.864$$
For all three measures of forecast errors, the Holt-Winters forecasts have smaller errors. Thus, the Holt-Winters forecasts are better.

13.38
a.
Using MINITAB, the output is:

Regression Analysis: Price versus t

The regression equation is
Price = 24.7 + 0.0910 t

Predictor      Coef   SE Coef       T       P
Constant    24.6975    0.7851   31.46   0.000
t           0.09103   0.08119    1.12   0.281

S = 1.497   R-Sq = 8.2%   R-Sq(adj) = 1.7%

Analysis of Variance
Source           DF       SS      MS      F       P
Regression        1    2.817   2.817   1.26   0.281
Residual Error   14   31.379   2.241
Total            15   34.197

Predicted Values for New Observations
New Obs      Fit   SE Fit           95.0% CI             95.0% PI
1         26.245    0.785   (24.561, 27.929)   (22.619, 29.871)

Values of Predictors for New Observations
New Obs      t
1         17.0

Predicted Values for New Observations
New Obs      Fit   SE Fit           95.0% CI             95.0% PI
2         26.336    0.857   (24.497, 28.175)   (22.636, 30.036)

Values of Predictors for New Observations
New Obs      t
2         18.0

b.
The estimates of the parameters in the model, E(Yt) = β0 + β1t, are:

βˆ0 = 24.6975. The price is estimated to be 24.6975 cents/pound for t = 0 or for 1991.

βˆ1 = .09103. The price is estimated to increase by .091 cents/pound for each additional year.

c.
The forecast for 2007 is: Using t = 17, Yˆ2007 = 24.6975 + .09103(17) = 26.2450
The forecast for 2008 is: Using t = 18, Yˆ2008 = 24.6975 + .09103(18) = 26.3360
Yes, these agree with the predicted values on the printout.
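The point forecasts can be reproduced from the fitted coefficients with a short Python sketch. The function name `trend_forecast` is illustrative; the coefficients are the Coef values on the printout, and the forecast periods t = 17 and t = 18 are those stated in the solution.

```python
def trend_forecast(b0, b1, t):
    """Point forecast from the fitted straight-line trend: b0 + b1 * t."""
    return b0 + b1 * t


b0, b1 = 24.6975, 0.09103  # estimates from the MINITAB printout

forecast_2007 = trend_forecast(b0, b1, 17)
forecast_2008 = trend_forecast(b0, b1, 18)
print(f"2007: {forecast_2007:.4f}  2008: {forecast_2008:.4f}")
```

This reproduces only the point forecasts; the prediction intervals on the printout also depend on the standard error of prediction, which is not recomputed here.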
d.
From the printout, the 95% forecast intervals are:
2007: (22.619, 29.871)
2008: (22.636, 30.036)
We are 95% confident that the actual price in 2007 will be between 22.619 and 29.871. We are 95% confident that the actual price in 2008 will be between 22.636 and 30.036.

e.
No, we would not recommend that this model be used to forecast annual price. If we were to test if there is a significant linear relationship between time and annual price (H0: β1 = 0 vs Ha: β1 ≠ 0), the test statistic would be t = 1.12 and the p-value would be p = .281. Thus, we would conclude there is insufficient evidence to indicate a linear relationship exists between time and annual price. (Do not reject H0.)

13.40
The major advantage of regression forecasts over the exponentially smoothed forecasts is that prediction intervals can be formed using the regression forecasts and not using the exponentially smoothed forecasts.
13.42
a.
Using MINITAB, the results are:

Regression Analysis: Price versus Time

The regression equation is
Price = 4.76 + 0.309 Time

Predictor      Coef   SE Coef       T       P
Constant     4.7608    0.4184   11.38   0.000
Time        0.30857   0.04601    6.71   0.000

S = 0.769971   R-Sq = 77.6%   R-Sq(adj) = 75.8%

Analysis of Variance
Source           DF       SS       MS       F       P
Regression        1   26.661   26.661   44.97   0.000
Residual Error   13    7.707    0.593
Total            14   34.368

Unusual Observations
Obs   Time    Price     Fit   SE Fit   Residual   St Resid
 15   15.0   10.740   9.389    0.379      1.351      2.01R

R denotes an observation with a large standardized residual.

Predicted Values for New Observations
New Obs      Fit   SE Fit            95% CI             95% PI
1          9.698    0.418   (8.794, 10.602)   (7.805, 11.591)

Values of Predictors for New Observations
New Obs   Time
1         16.0

Predicted Values for New Observations
New Obs      Fit   SE Fit            95% CI             95% PI
1         10.006    0.459   (9.014, 10.999)   (8.069, 11.943)

Values of Predictors for New Observations
New Obs   Time
1         17.0
From the printout:
βˆ0 = 4.7608. The price of gas is estimated to be 4.7608 dollars per 1,000 cubic feet in 1989.

βˆ1 = .30857. For each additional year, the price of gas is estimated to increase by .30857 dollars per 1,000 cubic feet.
b.
To determine the model fit, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = 6.71 (from the printout). The p-value is p = 0.000. Since the p-value is so small, H0 is rejected for any reasonable value of α. There is sufficient evidence that the model has an adequate fit.
c.
The 95% prediction interval for 2005 is (7.805, 11.591). We are 95% confident that the actual annual price of natural gas in 2005 is between 7.805 and 11.591 dollars per 1,000 cubic feet. The 95% prediction interval for 2006 is (8.069, 11.943). We are 95% confident that the actual annual price of natural gas in 2006 is between 8.069 and 11.943 dollars per 1,000 cubic feet.
d.
There are basically two problems with using simple linear regression for predicting time series data. First, we must predict values of the time series for values of time outside the observed range. We observe data for time periods 1, 2, … , t and use the regression model to predict values of the time series for t + 1, t + 2, … . The second problem is that simple linear regression does not allow for any cyclical effects such as seasonal trends.

13.44
a.
The regression model is: E(Yt) = β0 + β1t + β2Q1 + β3Q2 + β4Q3, where Qi = 1 if the observation is in quarter i and 0 otherwise (quarter 4 is the base level).
b.
Using MINITAB, the output is:

Regression Analysis: Sales versus t, Q1, Q2, Q3

The regression equation is
Sales = 120 + 16.5 t + 262 Q1 + 223 Q2 + 106 Q3

Predictor     Coef   SE Coef       T       P
Constant    119.85     16.95    7.07   0.000
t           16.512     1.028   16.07   0.000
Q1          262.34     16.73   15.68   0.000
Q2          222.83     16.57   13.45   0.000
Q3          105.51     16.48    6.40   0.000

S = 26.00   R-Sq = 96.9%   R-Sq(adj) = 96.1%

Analysis of Variance
Source           DF       SS      MS        F       P
Regression        4   318560   79640   117.82   0.000
Residual Error   15    10139     676
Total            19   328700

Source   DF   Seq SS
t         1   114343
Q1        1    81883
Q2        1    94610
Q3        1    27724

Predicted Values for New Observations
New Obs      Fit   SE Fit           95.0% CI             95.0% PI
1         728.95    16.95   (692.82, 765.08)   (662.80, 795.10)

Values of Predictors for New Observations
New Obs      t         Q1         Q2         Q3
1         21.0       1.00   0.000000   0.000000

Predicted Values for New Observations
New Obs      Fit   SE Fit           95.0% CI             95.0% PI
1         705.95    16.95   (669.82, 742.08)   (639.80, 772.10)

Values of Predictors for New Observations
New Obs      t         Q1         Q2         Q3
1         22.0   0.000000       1.00   0.000000

Predicted Values for New Observations
New Obs      Fit   SE Fit           95.0% CI             95.0% PI
1         605.15    16.95   (569.02, 641.28)   (539.00, 671.30)

Values of Predictors for New Observations
New Obs      t         Q1         Q2         Q3
1         23.0   0.000000   0.000000       1.00

Predicted Values for New Observations
New Obs      Fit   SE Fit           95.0% CI             95.0% PI
1         516.15    16.95   (480.02, 552.28)   (450.00, 582.30)

Values of Predictors for New Observations
New Obs      t         Q1         Q2         Q3
1         24.0   0.000000   0.000000   0.000000
The least squares equation is: Yˆt = 119.85 + 16.512t + 262.34Q1 + 222.83Q2 + 105.51Q3

βˆ1 = 16.512. For every increase in time period (1 quarter), the mean sales index increases by an estimated 16.512.

βˆ2 = 262.34. The difference in mean sales index between the first and fourth quarters is estimated to be 262.34.

βˆ3 = 222.83. The difference in the mean sales index between the second and fourth quarters is estimated to be 222.83.

βˆ4 = 105.51. The difference in the mean sales index between the third and fourth quarters is estimated to be 105.51.
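To see how the trend and the quarterly dummy variables combine, the fitted equation can be evaluated for the four quarters of 2007 (t = 21 through 24 on the printout). The Python sketch below uses the rounded coefficients from the printout, so the results may differ from the printed fits in the second decimal place. The dictionary and function names are illustrative.

```python
# Coefficient estimates from the MINITAB printout (quarter 4 is the base level).
COEF = {"const": 119.85, "t": 16.512, "Q1": 262.34, "Q2": 222.83, "Q3": 105.51}


def seasonal_forecast(t, quarter):
    """Evaluate the fitted model for time period t falling in the given quarter (1-4)."""
    y_hat = COEF["const"] + COEF["t"] * t
    if quarter in (1, 2, 3):
        # Only the dummy for the matching quarter equals 1; quarter 4 adds nothing.
        y_hat += COEF[f"Q{quarter}"]
    return y_hat


# Quarters I-IV of 2007 correspond to t = 21, 22, 23, 24.
forecasts = {quarter: seasonal_forecast(20 + quarter, quarter) for quarter in (1, 2, 3, 4)}
for quarter, value in forecasts.items():
    print(f"2007 Q{quarter}: {value:.2f}")
```

The forecasts fall from quarter I to quarter IV even though the trend is positive, because the seasonal shifts for quarters 1 through 3 are larger than the per-quarter trend increase.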
To determine if the model is useful, we test:
H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0, i = 1, 2, 3, 4
The test statistic is F = 117.82.

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with numerator df = k = 4 and denominator df = n − (k + 1) = 20 − (4 + 1) = 15. From Table IX, Appendix B, F.05 = 3.06. The rejection region is F > 3.06. Since the observed value of the test statistic falls in the rejection region (F = 117.82 > 3.06), H0 is rejected. There is sufficient evidence to indicate the model is useful at α = .05.

c.
The assumption of independent error terms is in doubt.
d.
The forecasts and the 95% prediction intervals are found at the bottom of the printout and are:
Year   Quarter   Forecast   95% Lower Limit   95% Upper Limit
2007   I           728.95             662.8             795.1
       II          705.95             639.8             772.1
       III         605.15             539.0             671.3
       IV          516.15             450.0             582.3

13.46
a.
d = 3.9 indicates the residuals are very strongly negatively autocorrelated.
b.
d = .2 indicates the residuals are very strongly positively autocorrelated.
c.
d = 1.99 indicates the residuals are probably uncorrelated.

13.48
a.
To determine if the overall model contributes information for the prediction of monthly passenger car and light truck sales, we test:
H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least 1 βi ≠ 0
The test statistic is

$$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.856/5}{(1 - .856)/[144 - (5 + 1)]} = 164.067$$
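The arithmetic for this test statistic can be checked with a short Python sketch. The function name `f_from_r_squared` is illustrative; the inputs R² = .856, n = 144, and k = 5 are those used in the solution.

```python
def f_from_r_squared(r_squared, n, k):
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r_squared / k) / ((1 - r_squared) / (n - (k + 1)))


f_stat = f_from_r_squared(0.856, n=144, k=5)
print(round(f_stat, 3))
```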
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 5 and ν2 = n – (k + 1) = 144 – (5 + 1) = 138. From Table IX, Appendix B, F.05 ≈ 2.29. The rejection region is F > 2.29. Since the observed value of the test statistic falls in the rejection region (F = 164.067 > 2.29), H0 is rejected. There is sufficient evidence to indicate the overall model contributes information for the prediction of monthly passenger car and light truck sales at α = .05.

b.
To determine if positive autocorrelation is present, we test:
H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals
The test statistic is d = 1.01. For α = .05, the rejection region is d < dL,α = dL,.05 ≈ 1.57. The value dL,.05 is found in Table XIII, Appendix B, with k = 5, n = 144, and α = .05. Since the observed value of the test statistic falls in the rejection region (d = 1.01 < 1.57), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.
c.
One of the requirements for the validity of the test in part a is that the error terms are independent. Since H0 was rejected in part b, there is evidence that positive autocorrelation exists. Since the error terms are not independent, the test in part a may not be valid.

13.50
a.
There is a tendency for the residuals to have long positive runs and negative runs. Residuals 1 through 6 are positive, while residuals 7 through 25 are negative. Residuals 26 through 35 are positive. This indicates the error terms are correlated.
b.
From the printout, the Durbin-Watson d is d = .0627. To determine if the time series residuals are autocorrelated, we test: H0: No first-order autocorrelation of residuals Ha: Positive or negative first-order autocorrelation of residuals The test statistic is d = .0627. For α = .10, the rejection region is d < dL,α/2 = dL,.05 = 1.40 or (4 − d) < dL,.05 = 1.40. The value dL,.05 is found in Table XIII, Appendix B, with k = 1, n = 35, and α = .10. Since the observed value of the test statistic falls in the rejection region (d = .0627 < 1.40), H0 is rejected. There is sufficient evidence to indicate the time series residuals are autocorrelated at α = .10.
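The two-sided Durbin-Watson decision rule used here can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the bounds must still be read from a Durbin-Watson table, and the upper bound dU = 1.52 used in the example call is an assumed table value that the solution does not quote.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def two_sided_dw_decision(d, d_lower, d_upper):
    """Two-sided test; d_lower and d_upper are the table bounds for alpha/2."""
    if d < d_lower or (4 - d) < d_lower:
        return "reject H0"
    if d > d_upper and (4 - d) > d_upper:
        return "do not reject H0"
    return "inconclusive"


# d = .0627 and dL = 1.40 come from the solution; dU = 1.52 is an assumption.
print(two_sided_dw_decision(0.0627, d_lower=1.40, d_upper=1.52))
```

Values of d between the two bounds (on either tail) leave the test inconclusive, which is why both bounds are passed in.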
c.
We must assume the residuals are normally distributed.
13.52
a.
Using MINITAB, the plot of the residuals against t is:
[Figure: Scatterplot of RESI1 vs Time; vertical axis RESI1 from -1.0 to 1.5, horizontal axis Time from 0 to 16]
There is not a random scattering of the residuals. The first 5 residuals are positive, the next 6 are negative, the next one is positive, the next one is negative and the last 2 are positive. This does not appear to be a random scattering. The plot suggests the possibility of autocorrelation. b.
Using MINITAB, the output is:

Regression Analysis: Price versus Time

The regression equation is
Price = 4.76 + 0.309 Time

Predictor     Coef  SE Coef      T      P
Constant    4.7608   0.4184  11.38  0.000
Time       0.30857  0.04601   6.71  0.000

S = 0.769971   R-Sq = 77.6%   R-Sq(adj) = 75.8%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  26.661  26.661  44.97  0.000
Residual Error  13   7.707   0.593
Total           14  34.368

Unusual Observations
Obs  Time   Price    Fit  SE Fit  Residual  St Resid
 15  15.0  10.740  9.389   0.379     1.351     2.01R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 1.39909
To determine if positive autocorrelation is present, we test:
H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals
The test statistic is d = 1.399. For α = .05, the rejection region is d < dL,α = dL,.05 = 1.08. The value dL,.05 is found in Table XIII, Appendix B, with k = 1, n = 15, and α = .05. Since the observed value of the test statistic does not fall in the rejection region (d = 1.399 is not less than 1.08), H0 is not rejected. From Table XIII, Appendix B, dU,α = 1.36 with k = 1, n = 15, and α = .05. Since the observed value of the test statistic falls above the upper limit (d = 1.399 > 1.36), there is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.
c.
Since there is no evidence that the error terms are dependent, the test for model adequacy appears to be valid.
13.54
a.
Using MINITAB, the plot of the residuals against t is:
[Figure: Scatterplot of RESI1 vs t; vertical axis RESI1 from -30 to 30, horizontal axis t from 0 to 35]
Since there appear to be groups of consecutive positive and groups of consecutive negative residuals, the data appear to be autocorrelated.
b.
Using MINITAB, the output is:

Regression Analysis: Policies versus t

The regression equation is
Policies = 385 - 0.363 t

Predictor     Coef  SE Coef      T      P
Constant   385.326    5.280  72.98  0.000
t          -0.3632   0.2632  -1.38  0.177

S = 15.0555   R-Sq = 5.6%   R-Sq(adj) = 2.7%

Analysis of Variance
Source          DF      SS     MS     F      P
Regression       1   431.6  431.6  1.90  0.177
Residual Error  32  7253.3  226.7
Total           33  7685.0

Unusual Observations
Obs    t  Policies     Fit  SE Fit  Residual  St Resid
  1  1.0    355.00  384.96    5.05    -29.96    -2.11R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 0.424942
To determine if positive autocorrelation is present, we test:
H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals
The test statistic is d = 0.42. For α = .05, the rejection region is d < dL,α = dL,.05 = 1.39. The value dL,.05 is found in Table XIII, Appendix B, with k = 1, n = 34, and α = .05. Since the observed value of the test statistic falls in the rejection region (d = .42 < 1.39), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.
c.
Since the error terms do not appear to be independent, the validity of the test for model adequacy is in question.
13.56
a.
The exponentially smoothed price for the first time period is equal to the price for that period. For the rest of the time periods, the exponentially smoothed prices are found by multiplying the price for that time period by w = .5 and adding to that (1 − .5) times the exponentially smoothed price for the time period preceding it. The exponentially smoothed values (w = .5) for each of the price series appear in the table:

        Cold Finished       Hot Rolled          Galvanized
Year    Price   Smoothed    Price   Smoothed    Price   Smoothed
1995    25.70   25.70       25.32   25.32       34.47   34.47
2000    23.08   24.39       15.67   20.50       21.38   27.93
2001    22.76   23.58       11.71   16.10       16.41   22.17
2002    23.26   23.42       16.46   16.28       22.00   22.08
2003    25.15   24.28       14.80   15.54       20.08   21.08
2004    38.67   31.48       30.84   23.19       36.69   28.89
b.
The plots of the three price series and their exponentially smoothed series are:
[Figure: Cold Finished price (CF) and exponentially smoothed series (CF-Exp-.5) plotted against Year, 1995-2004]
[Figure: Hot Rolled price (HR) and exponentially smoothed series (HR-Exp-.5) plotted against Year, 1995-2004]
[Figure: Galvanized price (Gal) and exponentially smoothed series (Gal-Exp-.5) plotted against Year, 1995-2004]
c.
The exponential smoothing forecasts for 2005 are: Cold Finished: F2005 = E2004 = 31.48 Hot Rolled: F2005 = E2004 = 23.19 Galvanized: F2005 = E2004 = 28.89 One of the main drawbacks of this kind of forecast is the inability to forecast future values using prediction intervals.
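The smoothing recursion and the resulting flat forecast can be sketched as follows, using the Cold Finished series from the table in part a. The function name `exponential_smoothing` is illustrative only; the unrounded values it carries forward may differ from the table in the last digit.

```python
def exponential_smoothing(series, w):
    """E1 = Y1; Et = w*Yt + (1 - w)*E(t-1) for every later period."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(w * value + (1 - w) * smoothed[-1])
    return smoothed


# Cold Finished prices for the six periods listed in the solution.
cold_finished = [25.70, 23.08, 22.76, 23.26, 25.15, 38.67]
smoothed = exponential_smoothing(cold_finished, w=0.5)

# The forecast for any future period is the last smoothed value.
forecast_2005 = smoothed[-1]
print(round(forecast_2005, 2))
```

Because every future forecast equals the last smoothed value, the method gives no measure of forecast uncertainty, which is the drawback noted above.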
13.58
a.
To compute the Laspeyres index, multiply the price for each year by the quantity for each of the items for 1990, sum the products for the four items, divide by 14.05 (the sum for the base period 1990), and multiply by 100. The Laspeyres index is:
Year   Spaghetti   Ground Beef   Eggs   Potatoes   Total   Laspeyres
1990   0.85        1.63          1.00   0.32       14.05   100.00
1995   0.88        1.40          1.16   0.38       13.72    97.65
2000   0.88        1.63          0.96   0.35       14.37   102.28
2004   0.95        2.14          0.98   0.51       18.68   132.95
b.
From 1990 to 2004, the “basket” of foods increased by 132.95 – 100 = 32.95%.
13.60
a.
We first calculate the exponentially smoothed values for 1980–2006.
E1 = Y1 = 56.50
E2 = .8Y2 + (1 − .8)E1 = .8(27.0) + .2(56.50) = 32.90
E3 = .8Y3 + (1 − .8)E2 = .8(38.75) + .2(32.90) = 37.58
The rest of the values appear in the table.

Year   Closing Price   Exponentially Smoothed Value (w = .8)
1980   56.50           56.50
1981   27.00           32.90
1982   38.75           37.58
1983   45.25           43.72
1984   41.75           42.14
1985   68.37           63.12
1986   45.62           49.12
1987   48.02           48.24
1988   48.01           48.06
1989   64.03           60.84
1990   45.00           48.17
1991   68.07           64.09
1992   30.03           36.84
1993   29.05           30.61
1994   32.05           31.76
1995   41.05           39.19
1996   50.75           48.44
1997   65.50           62.09
1998   49.00           51.62
1999   36.31           39.37
2000   48.44           46.63
2001   55.75           53.93
2002   40.00           42.79
2003   46.60           45.84
2004   46.65           46.49
2005   39.43           40.84
2006   43.80           43.21
The forecasts for 2007 and 2008 are: F2007 = Ft+1 = Et = 43.21 F2008 = Ft+2 = Et = 43.21 The expected gain is F2008 – Y2006 = 43.21 – 43.80 = −.59. Since this number is negative, it is actually a loss. b.
We first calculate the Holt-Winters values for 1980-2006. For w = .8 and v = .5,
E2 = Y2 = 27.00
T2 = Y2 − Y1 = 27.00 − 56.50 = −29.50
E3 = .8Y3 + (1 − .8)(E2 + T2) = .8(38.75) + .2(27 − 29.50) = 30.50
T3 = .5(E3 − E2) + (1 − .5)(T2) = .5(30.50 − 27.00) + .5(−29.50) = −13.00
The rest of the values appear in the table.

                        Holt-Winters (w = .8, v = .5)
Year   Closing Price    Et       Tt
1980   56.50
1981   27.00            27.00    −29.50
1982   38.75            30.50    −13.00
1983   45.25            39.70     −1.90
1984   41.75            40.96     −0.32
1985   68.37            62.82     10.77
1986   45.62            51.22     −0.42
1987   48.02            48.58     −1.53
1988   48.01            47.82     −1.14
1989   64.03            60.56      5.80
1990   45.00            49.27     −2.74
1991   68.07            63.76      5.87
1992   30.03            37.95     −9.97
1993   29.05            28.84     −9.54
1994   32.05            29.50     −4.44
1995   41.05            37.85      1.96
1996   50.75            48.56      6.33
1997   65.50            63.38     10.58
1998   49.00            53.99      0.59
1999   36.31            39.96     −6.72
2000   48.44            45.40     −0.64
2001   55.75            53.55      3.76
2002   40.00            43.46     −3.17
2003   46.60            45.34     −0.65
2004   46.65            46.26      0.14
2005   39.43            40.82     −2.65
2006   43.80            42.67     −0.40
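The two-parameter recursion behind the table can be sketched as below. This is an illustration under the same starting rule as the solution (E2 = Y2, T2 = Y2 − Y1); the function names are hypothetical, and only the first four closing prices are used so that the output can be compared with the hand calculations above.

```python
def holt_smoothing(series, w, v):
    """Return (levels, trends), both starting at the second observation.

    E_t = w*Y_t + (1 - w)*(E_{t-1} + T_{t-1})
    T_t = v*(E_t - E_{t-1}) + (1 - v)*T_{t-1}
    """
    levels = [series[1]]
    trends = [series[1] - series[0]]
    for value in series[2:]:
        new_level = w * value + (1 - w) * (levels[-1] + trends[-1])
        new_trend = v * (new_level - levels[-1]) + (1 - v) * trends[-1]
        levels.append(new_level)
        trends.append(new_trend)
    return levels, trends


def holt_forecast(level, trend, steps):
    """k-step-ahead forecast: last level plus k times last trend."""
    return level + steps * trend


# First four closing prices (1980-1983) from the table.
levels, trends = holt_smoothing([56.50, 27.00, 38.75, 45.25], w=0.8, v=0.5)
print([round(e, 2) for e in levels], [round(t, 2) for t in trends])
```

A forecast k periods past the last observation is then `holt_forecast(E, T, k)` evaluated at the final smoothed level and trend.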
The forecasts for 2007 and 2008 are: F2007 = Ft+1 = Et + Tt = 42.67 + (−.40) = 42.27 F2008 = Ft+2 = Et + 2Tt = 42.67 + 2(−.40) = 41.87 The expected gain is F2008 – Y2006 = 41.87 – 43.80 = −1.93. Since this number is negative, it is actually a loss. 13.62
a.
To compute the simple index for the IRA series, divide each IRA value by the 1990 value, 140, and then multiply by 100. To compute the simple index for the 401(k) series, divide each 401(k) value by the 1990 value, 35, and then multiply by 100. The values for the indices are in the table:
Year   IRA    IRA Simple Index   401(k)   401(k) Simple Index
1990    140    100.00              35      100.00
1994    350    250.00             184      525.71
1995    476    340.00             266      760.00
1996    598    427.14             346      988.57
1997    767    547.86             466     1331.43
1998    960    685.71             616     1760.00
1999   1234    881.43             810     2314.29
2000   1232    880.00             815     2328.57
2001   1161    829.29             794     2268.57
2002   1034    738.57             706     2017.14
2003   1307    933.57             919     2625.71
2004   1490   1064.29            1086     3102.86
b.
The time series plot is:
[Figure: Time series plot of IRAindex and 401(K)index against Year, 1990-2004; vertical axis Index from 0 to 3500]
c.
Both the IRA and 401(k) funds have increased since 1990. However, the 401(k) fund has increased at a higher rate than the IRA fund.
13.64
a.
Using MINITAB, the results from fitting the model E(Yt) = β0 + β1t are:

Regression Analysis: GDP versus t

The regression equation is
GDP = 9595 + 79.5 t

Predictor     Coef  SE Coef       T      P
Constant   9594.96    45.28  211.89  0.000
t           79.537    3.780   21.04  0.000

S = 97.4825   R-Sq = 96.1%   R-Sq(adj) = 95.9%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       1  4206863  4206863  442.70  0.000
Residual Error  18   171051     9503
Total           19  4377914

Unusual Observations
Obs    t     GDP     Fit  SE Fit  Residual  St Resid
  1  1.0  9876.0  9674.5    42.0     201.5     2.29R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 0.236602

Predicted Values for New Observations (one new observation for each value of t)
   t      Fit  SE Fit              95% CI              95% PI
21.0  11265.2    45.3  (11170.1, 11360.4)  (11039.4, 11491.1)
22.0  11344.8    48.6  (11242.6, 11446.9)  (11115.9, 11573.6)
23.0  11424.3    52.0  (11315.0, 11533.6)  (11192.2, 11656.5)
24.0  11503.8    55.5  (11387.3, 11620.4)  (11268.2, 11739.5)X

X denotes a point that is an outlier in the predictors.
The fitted regression line is: Ŷt = 9,594.96 + 79.537t
From the printout, the 2006 quarterly GDP forecasts are:

Year   Quarter   Forecast   95% Lower Limit   95% Upper Limit
2006   Q1        11,265.2   11,039.4          11,491.1
       Q2        11,344.8   11,115.9          11,573.6
       Q3        11,424.3   11,192.2          11,656.5
       Q4        11,503.8   11,268.2          11,739.5
b.
The following model is fit: E(Yt) = β0 + β1t + β2Q1 + β3Q2 + β4Q3
where Q1 = 1 if quarter 1, 0 otherwise
      Q2 = 1 if quarter 2, 0 otherwise
      Q3 = 1 if quarter 3, 0 otherwise
The MINITAB printout is:

Regression Analysis: GDP versus t, Q1, Q2, Q3

The regression equation is
GDP = 9573 + 79.8 t + 29.4 Q1 + 21.1 Q2 + 25.8 Q3

Predictor     Coef  SE Coef       T      P
Constant   9572.60    69.10  138.53  0.000
t           79.850    4.190   19.06  0.000
Q1           29.35    68.20    0.43  0.673
Q2           21.10    67.56    0.31  0.759
Q3           25.85    67.17    0.38  0.706

S = 105.993   R-Sq = 96.2%   R-Sq(adj) = 95.1%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       4  4209395  1052349  93.67  0.000
Residual Error  15   168519    11235
Total           19  4377914

Source  DF   Seq SS
t        1  4206863
Q1       1      656
Q2       1      212
Q3       1     1664

Unusual Observations
Obs    t     GDP     Fit  SE Fit  Residual  St Resid
  1  1.0  9876.0  9681.8    58.1     194.2     2.19R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 0.238059

Predicted Values for New Observations (one new observation for each row)
   t    Q1    Q2    Q3      Fit  SE Fit              95% CI              95% PI
21.0  1.00  0.00  0.00  11278.8    69.1  (11131.5, 11426.1)  (11009.1, 11548.5)
22.0  0.00  1.00  0.00  11350.4    69.1  (11203.1, 11497.7)  (11080.7, 11620.1)
23.0  0.00  0.00  1.00  11435.0    69.1  (11287.7, 11582.3)  (11165.3, 11704.7)
24.0  0.00  0.00  0.00  11489.0    69.1  (11341.7, 11636.3)  (11219.3, 11758.7)
The fitted regression line is: Ŷt = 9,572.6 + 79.85t + 29.35Q1 + 21.10Q2 + 25.85Q3
To determine whether the data indicate a significant seasonal component, we test:
H0: β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0, i = 2, 3, 4
The test statistic is
\[ F = \frac{(SSE_R - SSE_C)/(k-g)}{SSE_C/[n-(k+1)]} = \frac{(171{,}051 - 168{,}519)/(4-1)}{168{,}519/[20-(4+1)]} = \frac{844}{11{,}234.6} = 0.075 \]
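The nested-model F statistic can be recomputed directly from the two error sums of squares on the printouts. This is a small sketch; `nested_f` is an illustrative name, with k the number of predictors in the complete model and g the number in the reduced model.

```python
def nested_f(sse_reduced, sse_complete, k, g, n):
    """F statistic for testing the k - g extra terms of the complete model."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


# SSE values from the two MINITAB printouts: reduced (t only) and complete (t, Q1, Q2, Q3).
f_seasonal = nested_f(sse_reduced=171051, sse_complete=168519, k=4, g=1, n=20)
print(round(f_seasonal, 3))
```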
Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k – g = 4 – 1 = 3 and ν2 = n – (k + 1) = 20 – (4 + 1) = 15. From Table IX, Appendix B, F.05 = 3.29. The rejection region is F > 3.29. Since the observed value of the test statistic does not fall in the rejection region (F = .075 is not greater than 3.29), H0 is not rejected. There is insufficient evidence to indicate a seasonal component at α = .05. This supports the assertion that the data have been seasonally adjusted.
c.
From the printout, the 2006 quarterly forecasts are:

Year   Quarter   Forecast   95% Lower Limit   95% Upper Limit
2006   Q1        11,278.8   11,009.1          11,548.5
       Q2        11,350.4   11,080.7          11,620.1
       Q3        11,435.0   11,165.3          11,704.7
       Q4        11,489.0   11,219.3          11,758.7
d.
To determine if the time series residuals are autocorrelated, we test:
H0: No first-order autocorrelation of residuals
Ha: Positive or negative first-order autocorrelation of residuals
The test statistic is d = 0.24. For α = .10, the rejection region is d < dL,α/2 = dL,.05 = .90 or (4 – d) < dL,.05 = .90. The value of dL,.05 is found in Table XIII, Appendix B, with k = 4 and n = 20. Since the observed value of the test statistic falls in the rejection region (d = 0.24 < .90), H0 is rejected. There is sufficient evidence to indicate the time series residuals are autocorrelated at α = .10.
13.66
a.
Using MINITAB, the results from fitting the model E(Yt) = β0 + β1t are:

Regression Analysis: Revolving versus t

The regression equation is
Revolving = - 84.5 + 33.8 t

Predictor    Coef  SE Coef      T      P
Constant   -84.54    23.41  -3.61  0.001
t          33.768    1.575  21.44  0.000

S = 56.7803   R-Sq = 95.2%   R-Sq(adj) = 95.0%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       1  1482334  1482334  459.78  0.000
Residual Error  23    74152     3224
Total           24  1556486

Unusual Observations
Obs    t  Revolving    Fit  SE Fit  Residual  St Resid
  1  1.0       55.0  -50.8    22.0     105.8     2.02R

R denotes an observation with a large standardized residual.

Predicted Values for New Observations (one new observation for each value of t)
   t    Fit  SE Fit          95% CI          95% PI
27.0  827.2    24.8  (775.9, 878.5)  (699.0, 955.4)
28.0  861.0    26.2  (806.7, 915.2)  (731.6, 990.3)

The fitted regression line is: Ŷt = −84.54 + 33.768t
For the years 2006 and 2007, t = 27 and 28. From the printout, the predicted values and 95% prediction intervals for 2006 and 2007 are:

Year   Forecast   95% Lower Limit   95% Upper Limit
2006   827.2      699.0             955.4
2007   861.0      731.6             990.3
b.
To compute the Holt-Winters values for the years 1980-2004: With w = .7 and v = .7,
E2 = Y2 = 61
T2 = Y2 – Y1 = 61 – 55 = 6
E3 = wY3 + (1 – w)(E2 + T2) = .7(66) + (1 − .7)(61 + 6) = 66.3
T3 = v(E3 – E2) + (1 – v)T2 = .7(66.3 – 61) + (1 − .7)(6) = 5.51
The rest of the values appear in the table:

                    Holt-Winters (w = .7, v = .7)
Year   Revolving    Et        Tt
1980    55
1981    61           61.00     6.00
1982    66           66.30     5.51
1983    79           76.84     9.03
1984   100           95.76    15.95
1985   122          118.91    20.99
1986   136          137.17    19.08
1987   153          153.98    17.49
1988   174          173.24    18.73
1989   198          196.19    21.69
1990   239          232.66    32.04
1991   245          250.91    22.38
1992   257          261.89    14.40
1993   288          284.49    20.14
1994   338          327.99    36.49
1995   443          419.44    74.97
1996   499          497.62    77.22
1997   530          543.45    55.24
1998   579          584.91    45.59
1999   608          614.75    34.57
2000   678          669.40    48.62
2001   722          720.81    50.57
2002   738          748.01    34.22
2003   759          765.97    22.83
2004   794          792.44    25.38
Using the Holt-Winters series, the forecasts for 2006 and 2007 are: F2006 = Ft+2 = Et + 2Tt = 792.44 + 2(25.38) = 843.20 F2007 = Ft+3 = Et + 3Tt = 792.44 + 3(25.38) = 868.58 These values are very similar to forecasts found using regression. 13.68
a.
From Example 13.4, the exponentially smoothed value for September 2005 is 80.333. The forecasts for October through December 2005 are:
F2005,Oct = Ft+1 = Et = 80.333
F2005,Nov = Ft+2 = Ft+1 = 80.333
F2005,Dec = Ft+3 = Ft+1 = 80.333
The forecast errors are the differences between the actual values and the forecasted values. The forecast errors are:

Year        Yt+i    Ft+i     Difference
2005, Oct   81.88   80.333   1.55
2005, Nov   88.90   80.333   8.57
2005, Dec   82.20   80.333   1.87
b.
Using MINITAB, the results of fitting the model are:

Regression Analysis: IBM versus Time

The regression equation is
IBM = 95.8 - 0.740 Time

Predictor     Coef  SE Coef      T      P
Constant    95.777    2.622  36.53  0.000
Time       -0.7401   0.2088  -3.54  0.002

S = 5.79351   R-Sq = 39.8%   R-Sq(adj) = 36.6%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   421.71  421.71  12.56  0.002
Residual Error  19   637.73   33.56
Total           20  1059.44

Unusual Observations
Obs  Time    IBM    Fit  SE Fit  Residual  St Resid
 12  12.0  98.58  86.90    1.28     11.68     2.07R

R denotes an observation with a large standardized residual.

Durbin-Watson statistic = 0.688518

Predicted Values for New Observations (one new observation for each value of Time)
Time    Fit  SE Fit          95% CI          95% PI
22.0  79.50    2.62  (74.01, 84.98)  (66.19, 92.81)
23.0  78.76    2.81  (72.88, 84.63)  (65.28, 92.23)
24.0  78.02    2.99  (71.75, 84.28)  (64.37, 91.67)
The least squares fitted model is: Ŷt = 95.777 − .7401t
β̂0 = 95.777. The estimated stock price for IBM in December 2003 is 95.777.
β̂1 = −.7401. The estimated decrease in the value of the stock for IBM for each additional month is .7401.
c.
The approximate precision is ±2s or ±2(5.79) or ±11.58 .
d.
The forecasts and prediction intervals are found at the bottom of the printout in part b.

Year        Forecast   95% Lower Limit   95% Upper Limit
2005, Oct   79.50      66.19             92.81
2005, Nov   78.76      65.28             92.23
2005, Dec   78.02      64.37             91.67

The precision for October is approximately (92.81 − 66.19)/2 = 13.31.
The precision for November is approximately (92.23 − 65.28)/2 = 13.48.
The precision for December is approximately (91.67 − 64.37)/2 = 13.65.
All of these are close to the 11.58 from part c.
e.
The MAD, MAPE, and RMSE for the smoothed series are:
\[ \mathrm{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|81.88 - 80.33| + |88.90 - 80.33| + |82.20 - 80.33|}{3} = \frac{11.98}{3} = 3.994 \]
\[ \mathrm{MAPE} = \left[ \frac{\sum_{t=1}^{m} \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|81.88 - 80.33|}{81.88} + \frac{|88.90 - 80.33|}{88.90} + \frac{|82.20 - 80.33|}{82.20}}{3} \right] 100 = \left[ \frac{.1380}{3} \right] 100 = 4.599 \]
\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(81.88 - 80.33)^2 + (88.90 - 80.33)^2 + (82.20 - 80.33)^2}{3}} = \sqrt{\frac{79.2724}{3}} = 5.140 \]
The MAD, MAPE, and RMSE for the regression model are:
\[ \mathrm{MAD} = \frac{\sum_{t=1}^{m} |Y_t - F_t|}{m} = \frac{|81.88 - 79.50| + |88.90 - 78.76| + |82.20 - 78.02|}{3} = \frac{16.70}{3} = 5.567 \]
\[ \mathrm{MAPE} = \left[ \frac{\sum_{t=1}^{m} \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|81.88 - 79.50|}{81.88} + \frac{|88.90 - 78.76|}{88.90} + \frac{|82.20 - 78.02|}{82.20}}{3} \right] 100 = \left[ \frac{.1940}{3} \right] 100 = 6.466 \]
\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{m} (Y_t - F_t)^2}{m}} = \sqrt{\frac{(81.88 - 79.50)^2 + (88.90 - 78.76)^2 + (82.20 - 78.02)^2}{3}} = \sqrt{\frac{125.9564}{3}} = 6.480 \]
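The three accuracy measures can be recomputed for both sets of forecasts with a short script. This is a sketch only; the function names are illustrative, and the smoothed forecast is entered as 80.333 (the unrounded value from part a), which is what reproduces the reported results.

```python
from math import sqrt


def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error of the forecasts."""
    return sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual))


actual = [81.88, 88.90, 82.20]            # Oct, Nov, Dec 2005
smoothed = [80.333, 80.333, 80.333]       # exponential smoothing forecasts
regression = [79.50, 78.76, 78.02]        # regression forecasts

for name, forecast in [("smoothed", smoothed), ("regression", regression)]:
    print(name, round(mad(actual, forecast), 3),
          round(mape(actual, forecast), 3), round(rmse(actual, forecast), 3))
```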
The values of MAD, MAPE, and RMSE for the exponentially smoothed model are all smaller than their corresponding values for the regression model. f.
We have to assume that the error terms are independent.
g.
To determine if positive autocorrelation is present, we test: H0: No first-order autocorrelation of residuals Ha: Positive first-order autocorrelation of residuals The test statistic is d = 0.69. The rejection region is d < dL,α = dL,.05 = 1.22. The value of dL,.05 is found in Table XIII, Appendix B, with k = 1 and n = 21 . Since the observed value of the test statistic falls in the rejection region (d = .69 < 1.22), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05. Since there is evidence of positive autocorrelation, the validity of the regression model is questioned.
The Gasket Manufacturing Case (To accompany Chapters 12–13)
For this study, I constructed an R chart and an x̄-chart for both the original data (5.1) and for the new data (5.2). First, we will analyze the data set 5.1 (that collected under the discretion of the operator). We must compute the mean and range for each sample. The range = R = largest measurement − smallest measurement. The results are listed in the table:

Sample   Measurements               x̄        Range
 1       0.0440  0.0446  0.0437    0.0441   0.0009
 2       0.0438  0.0425  0.0443    0.0435   0.0018
 3       0.0453  0.0428  0.0433    0.0438   0.0025
 4       0.0451  0.0441  0.0434    0.0442   0.0017
 5       0.0459  0.0466  0.0476    0.0467   0.0017
 6       0.0449  0.0471  0.0451    0.0457   0.0022
 7       0.0472  0.0477  0.0452    0.0467   0.0025
 8       0.0457  0.0459  0.0472    0.0463   0.0015
 9       0.0464  0.0457  0.0447    0.0456   0.0017
10       0.0451  0.0447  0.0457    0.0452   0.0010
11       0.0456  0.0455  0.0445    0.0452   0.0011
12       0.0448  0.0423  0.0442    0.0438   0.0025
13       0.0459  0.0468  0.0452    0.0460   0.0016
14       0.0456  0.0471  0.0450    0.0459   0.0021
15       0.0472  0.0465  0.0461    0.0466   0.0011
16       0.0462  0.0463  0.0471    0.0465   0.0009
17       0.0427  0.0437  0.0445    0.0436   0.0018
18       0.0431  0.0448  0.0429    0.0436   0.0019
19       0.0425  0.0442  0.0432    0.0433   0.0017
20       0.0429  0.0447  0.0450    0.0442   0.0021
21       0.0443  0.0441  0.0450    0.0445   0.0009
22       0.0443  0.0423  0.0447    0.0438   0.0024
23       0.0429  0.0427  0.0464    0.0440   0.0037
24       0.0448  0.0451  0.0428    0.0442   0.0023

\[ \bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{24}}{n} = \frac{1.0770}{24} = .0449 \]
\[ \bar{R} = \frac{R_1 + R_2 + \cdots + R_{24}}{n} = \frac{.0436}{24} = .0018 \]
We now construct an R chart. From Table XVII, Appendix B, with n = 3, D3 = .000 and D4 = 2.574.
R̄ = .0018
Upper control limit = R̄D4 = .0018(2.574) = .0046
Since D3 = 0, the lower control limit is negative and is not included on the chart.
From Table XVII, Appendix B, with n = 3, d2 = 1.693 and d3 = .888.
Upper A–B boundary = R̄ + 2d3(R̄/d2) = .0018 + 2(.888)(.0018/1.693) = .0037
Lower A–B boundary = R̄ − 2d3(R̄/d2) = .0018 − 2(.888)(.0018/1.693) = −.0001, which is set to 0
Upper B–C boundary = R̄ + d3(R̄/d2) = .0018 + (.888)(.0018/1.693) = .0027
Lower B–C boundary = R̄ − d3(R̄/d2) = .0018 − (.888)(.0018/1.693) = .0009
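The R-chart limits and zone boundaries above follow one pattern, which can be sketched as a small helper. The function name and dictionary keys are illustrative; the constants are the tabled values for samples of size 3 quoted in the solution, and negative boundaries are set to 0 as is done above.

```python
def r_chart_limits(r_bar, d2, d3, big_d4):
    """Upper control limit and zone boundaries for an R chart."""
    sigma_r = d3 * r_bar / d2  # estimated standard deviation of the range
    return {
        "ucl": r_bar * big_d4,
        "upper_ab": r_bar + 2 * sigma_r,
        "lower_ab": max(0.0, r_bar - 2 * sigma_r),  # a range cannot be negative
        "upper_bc": r_bar + sigma_r,
        "lower_bc": max(0.0, r_bar - sigma_r),
    }


limits = r_chart_limits(r_bar=0.0018, d2=1.693, d3=0.888, big_d4=2.574)
print({name: round(value, 4) for name, value in limits.items()})
```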
The R-chart is:
To determine if the process is in control, we check the four rules.
Rule 1: One point beyond Zone A: There are no points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
The process appears to be in control. No rule is violated.
Next, we construct the x̄-chart.
Centerline = x̿ = .0449
From Table XVII, Appendix B, with n = 3, A2 = 1.023.
Upper control limit = x̿ + A2R̄ = .0449 + 1.023(.0018) = .0467
Lower control limit = x̿ − A2R̄ = .0449 − 1.023(.0018) = .0431
Upper A–B boundary = x̿ + (2/3)(A2R̄) = .0449 + (2/3)(1.023(.0018)) = .0461
Lower A–B boundary = x̿ − (2/3)(A2R̄) = .0449 − (2/3)(1.023(.0018)) = .0437
Upper B–C boundary = x̿ + (1/3)(A2R̄) = .0449 + (1/3)(1.023(.0018)) = .0455
Lower B–C boundary = x̿ − (1/3)(A2R̄) = .0449 − (1/3)(1.023(.0018)) = .0443
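The x̄-chart limits can be generated the same way: the control limits sit A2R̄ from the centerline, and the zone boundaries sit at one third and two thirds of that distance. A minimal sketch, with illustrative names:

```python
def xbar_chart_limits(grand_mean, r_bar, a2):
    """Control limits and zone boundaries for an x-bar chart."""
    half_width = a2 * r_bar  # distance from the centerline to each control limit
    return {
        "ucl": grand_mean + half_width,
        "lcl": grand_mean - half_width,
        "upper_ab": grand_mean + 2 * half_width / 3,
        "lower_ab": grand_mean - 2 * half_width / 3,
        "upper_bc": grand_mean + half_width / 3,
        "lower_bc": grand_mean - half_width / 3,
    }


xbar_limits = xbar_chart_limits(grand_mean=0.0449, r_bar=0.0018, a2=1.023)
print({name: round(value, 4) for name, value in xbar_limits.items()})
```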
The x -chart is:
To determine if the process is in or out of control, we check the six rules:
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are six groups of three consecutive points with at least two in Zone A or beyond: points 5–7, points 6–8, points 7–9, points 14–16, points 17–19, and points 18–20.
Rule 6: Four out of five points in a row in Zone B or beyond: There are six groups of points that satisfy this rule: points 5–9, points 6–10, points 17–21, points 18–22, points 19–23, and points 20–24.
The process appears to be out of control. Rules 5 and 6 indicate that the process is out of control. Since the process is out of control, a capability analysis is not appropriate. However, I will include a dot diagram, which indicates that many of the actual observations are outside of the specification limits. The dot plot is:
[Figure: Dot plot of the 72 individual measurements; horizontal axis from 0.0430 to 0.0480]
The specification limits are .043 to .047. There are 11 points below .043 and 8 above .047. Thus, 19 out of the 72 points, or .264 of the points, are outside of the specification limits. This indicates that the present system, when the operator is allowed to adjust the system at his/her discretion, is not capable of meeting the needs of the customers.
Next, we analyze the second set of data, 5.2. First, we must compute the mean and range for each sample. The range = R = largest measurement − smallest measurement. The results are listed in the table:
Sample   Measurements               x̄        Range
 1       0.0445  0.0455  0.0457    0.0452   0.0012
 2       0.0435  0.0453  0.0450    0.0446   0.0018
 3       0.0438  0.0459  0.0428    0.0442   0.0031
 4       0.0449  0.0449  0.0467    0.0455   0.0018
 5       0.0433  0.0461  0.0451    0.0448   0.0028
 6       0.0455  0.0454  0.0461    0.0457   0.0007
 7       0.0455  0.0458  0.0445    0.0453   0.0013
 8       0.0445  0.0451  0.0436    0.0444   0.0015
 9       0.0443  0.0450  0.0441    0.0445   0.0009
10       0.0449  0.0448  0.0467    0.0455   0.0019
11       0.0465  0.0449  0.0448    0.0454   0.0017
12       0.0461  0.0439  0.0452    0.0451   0.0022
13       0.0443  0.0434  0.0454    0.0444   0.0020
14       0.0456  0.0459  0.0452    0.0456   0.0007
15       0.0447  0.0442  0.0457    0.0449   0.0015
16       0.0454  0.0445  0.0451    0.0450   0.0009
17       0.0445  0.0471  0.0465    0.0460   0.0026
18       0.0438  0.0445  0.0472    0.0452   0.0034
19       0.0453  0.0444  0.0451    0.0449   0.0009
20       0.0455  0.0435  0.0443    0.0444   0.0020
21       0.0440  0.0438  0.0444    0.0441   0.0006
22       0.0444  0.0450  0.0467    0.0454   0.0023
23       0.0445  0.0447  0.0461    0.0451   0.0016
24       0.0450  0.0463  0.0456    0.0456   0.0013

\[ \bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{24}}{n} = \frac{1.0808}{24} = .0450 \]
\[ \bar{R} = \frac{R_1 + R_2 + \cdots + R_{24}}{n} = \frac{.0407}{24} = .0017 \]
First, we construct an R chart. From Table XVII, Appendix B, with n = 3, D3 = .000 and D4 = 2.574.
R̄ = .0017
Upper control limit = R̄D4 = .0017(2.574) = .0044
Since D3 = 0, the lower control limit is negative and is not included on the chart.
From Table XVII, Appendix B, with n = 3, d2 = 1.693 and d3 = .888.
Upper A–B boundary = R̄ + 2d3(R̄/d2) = .0017 + 2(.888)(.0017/1.693) = .0035
Lower A–B boundary = R̄ − 2d3(R̄/d2) = .0017 − 2(.888)(.0017/1.693) = −.0001, which is set to 0
Upper B–C boundary = R̄ + d3(R̄/d2) = .0017 + (.888)(.0017/1.693) = .0026
Lower B–C boundary = R̄ − d3(R̄/d2) = .0017 − (.888)(.0017/1.693) = .0008
The R-chart is:
To determine if the process is in control, we check the four rules.
Rule 1: One point beyond Zone A: There are no points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
The process appears to be in control. No rule is violated.
Next, we construct the x̄-chart.
Centerline = x̿ = .0450
From Table XVII, Appendix B, with n = 3, A2 = 1.023.
Upper control limit = $\bar{\bar{x}} + A_2\bar{R}$ = .0450 + 1.023(.0017) = .0467
Lower control limit = $\bar{\bar{x}} - A_2\bar{R}$ = .0450 − 1.023(.0017) = .0433
Upper A–B boundary = $\bar{\bar{x}} + \frac{2}{3}(A_2\bar{R})$ = .0450 + $\frac{2}{3}$(1.023(.0017)) = .0462
Lower A–B boundary = $\bar{\bar{x}} - \frac{2}{3}(A_2\bar{R})$ = .0450 − $\frac{2}{3}$(1.023(.0017)) = .0438
Upper B–C boundary = $\bar{\bar{x}} + \frac{1}{3}(A_2\bar{R})$ = .0450 + $\frac{1}{3}$(1.023(.0017)) = .0456
Lower B–C boundary = $\bar{\bar{x}} - \frac{1}{3}(A_2\bar{R})$ = .0450 − $\frac{1}{3}$(1.023(.0017)) = .0444
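The limit and zone-boundary arithmetic for both charts can be checked with a short script. This is only a sketch: the function names are hypothetical, and the constants (A2, D4, d2, d3 for n = 3) are the values quoted from Table XVII in this solution, passed in as defaults.

```python
def xbar_chart_limits(xbarbar, rbar, a2=1.023):
    """Control limits and zone boundaries for an x-bar chart."""
    half_width = a2 * rbar  # distance from the centerline to a control limit
    return {
        "ucl": xbarbar + half_width,
        "lcl": xbarbar - half_width,
        "upper_ab": xbarbar + 2 * half_width / 3,
        "lower_ab": xbarbar - 2 * half_width / 3,
        "upper_bc": xbarbar + half_width / 3,
        "lower_bc": xbarbar - half_width / 3,
    }


def r_chart_limits(rbar, d4=2.574, d2=1.693, d3=0.888):
    """Upper control limit and zone boundaries for an R chart."""
    sigma_r = d3 * rbar / d2  # estimated standard deviation of the range
    return {
        "ucl": rbar * d4,
        "upper_ab": rbar + 2 * sigma_r,
        "lower_ab": max(0.0, rbar - 2 * sigma_r),  # a negative boundary is set to 0
        "upper_bc": rbar + sigma_r,
        "lower_bc": max(0.0, rbar - sigma_r),
    }


x_limits = xbar_chart_limits(0.0450, 0.0017)
r_limits = r_chart_limits(0.0017)
for name, value in x_limits.items():
    print(f"x-bar {name}: {value:.4f}")
for name, value in r_limits.items():
    print(f"R {name}: {value:.4f}")
```

Rounded to four decimals, the printed values should agree with the hand calculations above.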
The $\bar{x}$-chart is:
To determine if the process is in or out of control, we check the six rules:
Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: This pattern does not exist.
Rule 6: Four out of five points in a row in Zone B or beyond: This pattern does not exist.
The process appears to be in control. No rules are violated. Since the process is in control, we will perform a capability analysis to see if the process can meet the customer's demand. A dot plot of the actual observations shows that some of them fall outside the specification limits. The dot plot is:
[Dot plot of the 72 observations; horizontal axis from 0.04320 to 0.04720]
The specification limits are .043 to .047. There is one point below .043 and two points above .047. Thus, 3 out of the 72 points or .042 of the points are outside of the specification limits. This indicates that the present system, when the operator does not adjust the system at his/her discretion, might be able to meet the needs of the customers. We will also compute the capability index. The capability index is defined as the ratio of the specification spread (the upper specification limit minus the lower specification limit) to 6 standard deviations, or:
$C_p = \dfrac{\text{upper specification limit} - \text{lower specification limit}}{6\sigma}$
Since σ is not known, we will estimate it with s. In this case, s = .00095. The capability index is:
$C_p = \dfrac{.047 - .043}{6(.00095)} = .702$
Since the capability index is less than 1, it indicates that the process is not capable of meeting the customer's needs. Even though this process (operator does not make adjustments) is in control, it is not capable of meeting the needs of the customers. In conclusion, it appears that the engineers are correct—the present equipment is not capable of producing gasket material within the necessary limits.
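As a quick check of the capability arithmetic, the following sketch recomputes the index and the fraction of observations outside the specification limits. The function name is hypothetical; the inputs are the values stated in the solution.

```python
def capability_index(usl, lsl, sigma):
    """Cp: specification spread divided by six standard deviations."""
    return (usl - lsl) / (6 * sigma)


cp = capability_index(usl=0.047, lsl=0.043, sigma=0.00095)
fraction_outside = 3 / 72  # one point below .043 and two above .047

print(f"Cp = {cp:.3f}")
print(f"Fraction outside specification = {fraction_outside:.3f}")
```

A value of Cp below 1 means the specification spread is narrower than six standard deviations of the process output.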
Chapter 14
Nonparametric Statistics
14.2
a.
Since the normal distribution is symmetric, the probability that a randomly selected observation exceeds the mean of a normal distribution is .5.
b.
By the definition of "median," the probability that a randomly selected observation exceeds the median of a normal distribution is .5.
c.
If the distribution is not normal, the probability that a randomly selected observation exceeds the mean depends on the distribution. With the information given, the probability cannot be determined.
d.
By definition of "median," the probability that a randomly selected observation exceeds the median of a non-normal distribution is .5.
14.4
a.
H0: η = 9 Ha: η > 9 The test statistic is S = {Number of observations greater than 9} = 7. The p-value = P(x ≥ 7) where x is a binomial random variable with n = 10 and p = .5. From Table II, p-value = P(x ≥ 7) = 1 − P(x ≤ 6) = 1 − .828 = .172 Since the p-value = .172 > α = .05, H0 is not rejected. There is insufficient evidence to indicate the median is greater than 9 at α = .05.
b.
H0 : η = 9 Ha: η ≠ 9 S1 = {Number of observations less than 9} = 3 and S2 = {Number of observations greater than 9} = 7 The test statistic is S = larger of S1 and S2 = 7. The p-value = 2P(x ≥ 7) where x is a binomial random variable with n = 10 and p = .5. From Table II, p-value = 2P(x ≥ 7) = 2(1 − P(x ≤ 6)) = 2(1 - .828) = .344 Since the p-value = .344 > α = .05, H0 is not rejected. There is insufficient evidence to indicate the median is different than 9 at α = .05.
c.
H0: η = 20 Ha: η < 20 The test statistic is S = {Number of observations less than 20} = 9. The p-value = P(x ≥ 9) where x is a binomial random variable with n = 10 and p = .5. From Table II, p-value = P(x ≥ 9) = 1 − P(x ≤ 8) = 1 − .989 = .011 Since the p-value = .011 < α= .05, H0 is rejected. There is sufficient evidence to indicate the median is less than 20 at α = .05.
d.
H0: η = 20 Ha: η ≠ 20 S1 = {Number of observations less than 20} = 9 and S2 = {Number of observations greater than 20} = 1 The test statistic is S = larger of S1 and S2 = 9. The p-value = 2P(x ≥ 9) where x is a binomial random variable with n = 10 and p = .5. From Table II, p-value = 2P(x ≥ 9) = 2(1 − P(x ≤ 8)) = 2(1 − .989) = .022 Since the p-value = .022 < α = .05, H0 is rejected. There is sufficient evidence to indicate the median is different than 20 at α = .05.
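The binomial tail probabilities read from Table II in parts a through d can also be computed directly. The sketch below uses hypothetical function names and only the standard library; small differences from the table values come from the table's three-decimal rounding.

```python
from math import comb


def binom_upper_tail(s, n, p=0.5):
    """P(x >= s) for a binomial random variable with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s, n + 1))


def sign_test_p_value(s, n, two_tailed=False):
    """p-value of the sign test when S observations fall on the relevant side."""
    tail = binom_upper_tail(s, n)
    return min(1.0, 2 * tail) if two_tailed else tail


print(round(sign_test_p_value(7, 10), 3))                   # part a
print(round(sign_test_p_value(7, 10, two_tailed=True), 3))  # part b
print(round(sign_test_p_value(9, 10), 3))                   # part c
print(round(sign_test_p_value(9, 10, two_tailed=True), 3))  # part d
```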
e.
For all parts, μ = np = 10(.5) = 5 and σ = $\sqrt{npq} = \sqrt{10(.5)(.5)}$ = 1.581.
For part a, P(x ≥ 7) ≈ $P\left(z \ge \dfrac{(7 - .5) - 5}{1.581}\right)$ = P(z ≥ .95) = .5 − .3289 = .1711
This is close to the probability .172 in part a. The conclusion is the same.
For part b, 2P(x ≥ 7) ≈ $2P\left(z \ge \dfrac{(7 - .5) - 5}{1.581}\right)$ = 2P(z ≥ .95) = 2(.5 − .3289) = .3422
This is close to the probability .344 in part b. The conclusion is the same.
For part c, P(x ≥ 9) ≈ $P\left(z \ge \dfrac{(9 - .5) - 5}{1.581}\right)$ = P(z ≥ 2.21) = .5 − .4864 = .0136
This is close to the probability .011 in part c. The conclusion is the same.
For part d, 2P(x ≥ 9) ≈ $2P\left(z \ge \dfrac{(9 - .5) - 5}{1.581}\right)$ = 2P(z ≥ 2.21) = 2(.5 − .4864) = .0272
This is close to the probability .022 in part d. The conclusion is the same.
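The normal approximation with continuity correction used in part e can be sketched as follows. The function names are hypothetical, and the standard normal tail is computed with `math.erfc` rather than read from Table IV, so the results differ slightly from the table-based values because z is not rounded to two decimals.

```python
from math import erfc, sqrt


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal random variable."""
    return 0.5 * erfc(z / sqrt(2))


def sign_test_normal_approx(s, n):
    """Return (z, approximate P(x >= s)) using the continuity correction."""
    mu = n * 0.5
    sigma = sqrt(n * 0.5 * 0.5)
    z = ((s - 0.5) - mu) / sigma
    return z, normal_upper_tail(z)


z7, p7 = sign_test_normal_approx(7, 10)
z9, p9 = sign_test_normal_approx(9, 10)
print(f"S = 7: z = {z7:.2f}, one-tailed p = {p7:.4f}, two-tailed p = {2 * p7:.4f}")
print(f"S = 9: z = {z9:.2f}, one-tailed p = {p9:.4f}, two-tailed p = {2 * p9:.4f}")
```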
f.
We must assume only that the sample is selected randomly from a continuous probability distribution.
14.6
a.
To determine if the median amount of caffeine in Breakfast Blend coffee exceeds 300 milligrams, we test: H0: η = 300 Ha: η > 300
b.
S=4
c.
Using Table II, Appendix B, with n = 6 and p = .5,
P(x ≥ 4) = 1 − P(x ≤ 3) = 1 − .656 = .344
d.
Since the probability in part c is greater than α = .05, H0 is not rejected. There is insufficient evidence to indicate the median amount of caffeine in Breakfast Blend coffee exceeds 300 milligrams at α = .05.
14.8
a.
To determine if cohesiveness will deteriorate after storage, we test: H0: η = 0 Ha: η > 0
b.
The test statistic is S = {number of measurements greater than 0} = 13. The p-value = P(x ≥ 13) where x is a binomial random variable with n = 20 and p = .5. From Table II, p-value = P(x ≥ 13) = 1 – P(x ≤ 12) = 1 − .868 = .132
c.
Since the p-value = .132 > α = .05, H0 is not rejected. There is insufficient evidence to indicate cohesiveness will deteriorate after storage at α = .05.
14.10
a.
I would recommend the sign test because five of the sample measurements are of similar magnitude, but the 6th is about three times as large as the others. It would be very unlikely to observe this sample if the population were normal.
b.
To determine if the airline is meeting the requirement, we test: H0: η = 30 Ha: η < 30
c.
The test statistic is S = number of measurements less than 30 = 5. H0 will be rejected if the p-value < α = .01.
d.
The test statistic is S = 5. The p-value = P(x ≥ 5) where x is a binomial random variable with n = 6 and p = .5. From Table II, p-value = P(x ≥ 5) = 1 − P(x ≤ 4) = 1 − .891 = .109 Since the p-value = .109 is not less than α = .01, H0 is not rejected. There is insufficient evidence to indicate the airline is meeting the maintenance requirement at α = .01.
14.12
To determine if the median surface roughness of coated interior pipe differs from 2 micrometers, we test: H0: η = 2 Ha: η ≠ 2 S1 = {Number of measurements < 2} = 9. S2 = {Number of measurements > 2} = 11. The test statistic is S = Larger of S1 and S2 = 11. The p-value = 2P(x ≥ 11) where x is a binomial random variable with n = 20 and p = .5. From Table II, Appendix B, p-value = 2P(x ≥ 11) = 2(1 − P(x ≤ 10)) = 2(1 − .588) = .824 Since the p-value = .824 > α = .05, H0 is not rejected. There is insufficient evidence to indicate the median surface roughness of coated interior pipe differs from 2 micrometers at α = .05.
14.14
To determine if the distribution of A is shifted to the left of distribution B, we test: H0: The two sampled populations have identical distributions Ha: The probability distribution for population A is shifted to the left of population B.
The test statistic is
$z = \dfrac{T_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2(n_1 + n_2 + 1)}{12}}} = \dfrac{173 - \dfrac{15(15 + 15 + 1)}{2}}{\sqrt{\dfrac{15(15)(15 + 15 + 1)}{12}}} = -2.47$
The rejection region requires α = .05 in the lower tail of the z-distribution. From Table IV, z.05 = 1.645. The rejection region is z < −1.645.
Since the observed value of the test statistic falls in the rejection region (z = −2.47 < −1.645), H0 is rejected. There is sufficient evidence to indicate the distribution of A is shifted to the left of distribution B.
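The large-sample Wilcoxon rank sum statistic used here is easy to recompute. The sketch below uses a hypothetical function name and takes the rank sum T1 and the two sample sizes as inputs.

```python
from math import sqrt


def rank_sum_z(t1, n1, n2):
    """Large-sample z statistic for the Wilcoxon rank sum test."""
    mean_t1 = n1 * (n1 + n2 + 1) / 2
    sd_t1 = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (t1 - mean_t1) / sd_t1


z = rank_sum_z(t1=173, n1=15, n2=15)
print(f"z = {z:.2f}")
print("Reject H0" if z < -1.645 else "Do not reject H0")
```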
14.16
Sample from Population 1   Rank     Sample from Population 2   Rank
15                         13       5                          2.5
10                         8.5      12                         10.5
12                         10.5     9                          6.5
16                         14       9                          6.5
13                         12       8                          4.5
8                          4.5      4                          1
                                    5                          2.5
                                    10                         8.5
                      T1 = 62.5                           T2 = 42.5
a.
H0: The two sampled populations have identical probability distributions Ha: The probability distribution for population 1 is shifted to the left or to the right of that for 2 The test statistic is T1 = 62.5 since sample 1 has the smallest number of measurements. The null hypothesis will be rejected if T1 ≤ TL or T1 ≥ TU where TL and TU correspond to α = .05 (two-tailed), n1 = 6 and n2 = 8. From Table XV, Appendix B, TL = 29 and TU = 61. Reject H0 if T1 ≤ 29 or T1 ≥ 61. Since T1 = 62.5 ≥ 61, we reject H0 and conclude there is sufficient evidence to indicate population 1 is shifted to the left or right of population 2 at α = .05.
b.
H0: The two sampled populations have identical probability distributions Ha: The probability distribution for population 1 is shifted to the right of population 2 The test statistic remains T1 = 62.5. The null hypothesis will be rejected if T1 ≥ TU where TU corresponds to α = .05 (one-tailed), n1 = 6 and n2 = 8. From Table XV, Appendix B, TU = 58. Reject H0 if T1 ≥ 58. Since T1 = 62.5 ≥ 58, we reject H0 and conclude there is sufficient evidence to indicate population 1 is shifted to the right of population 2 at α = .05.
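The rank sums for this exercise come from ranking the combined samples and giving tied values the average of the ranks they occupy. A minimal sketch of that step follows; the helper name `average_ranks` is hypothetical, and the data are the two samples from this exercise.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


sample1 = [15, 10, 12, 16, 13, 8]
sample2 = [5, 12, 9, 9, 8, 4, 5, 10]
ranks = average_ranks(sample1 + sample2)
t1 = sum(ranks[: len(sample1)])
t2 = sum(ranks[len(sample1):])
print(f"T1 = {t1}, T2 = {t2}")
```

The two rank sums always add to n(n + 1)/2 for n combined observations, which gives a quick arithmetic check.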
14.18
a.
Some preliminary calculations:
Private Sector   Rank     Public Sector   Rank
2.58             10       5.40            15
5.05             13       2.55            9
0.05             1        9.00            16
2.10             5        10.55           17
4.30             12       1.02            2
2.25             6        5.11            14
2.50             8        12.42           18
1.94             4        1.67            3
2.33             7        3.33            11
             T1 = 66                  T2 = 105
b.
To determine if the distribution for public sector organizations is located to the right of the distribution for private sector firms, we test: H0: The two sampled populations have identical probability distributions Ha: The probability distribution of the public sector is located to the right of that for the private sector The test statistic is T2 = 105. The null hypothesis will be rejected if T2 ≥ TU where TU corresponds to α = .05 (one-tailed), and n1 = n2 = 9. From Table XV, Appendix B, TU = 105. Reject H0 if T2 ≥ 105. Since T2 = 105 ≥ 105, H0 is rejected. There is sufficient evidence to indicate that the distribution in the public sector organization is located to the right of the distribution for the private sector firms at α = .05.
c.
The null hypothesis will be rejected if T2 ≥ TU where TU corresponds to α = .05 (one-tailed), and n1 = n2 = 9. From Table XV, Appendix B, TU = 105. Since T2 = 105, we would reject H0. Thus, the p-value is less than or equal to α = .05.
d.
The assumptions necessary for the test are:
1. The two samples are random and independent.
2. The two probability distributions from which the samples were drawn are continuous.
14.20
a.
American Purchasing Managers     Mexican Purchasing Managers
Sample 1   Rank                  Sample 2   Rank
50         20.5                  10         4.5
10         4.5                   90         29
35         15.5                  65         24
30         13.5                  50         20.5
20         10.5                  20         10.5
15         7.5                   15         7.5
8          3                     60         23
40         17.5                  80         26.5
80         26.5                  85         28
75         25                    35         15.5
19         9                     5          1.5
11         6                     55         22
5          1.5                   40         17.5
25         12                    45         19
30         13.5                  95         30
       T1 = 186                         T2 = 279
b.
To determine whether American and Mexican purchasing managers perceive the given ethical situation differently, we test: H0: The two sampled populations have identical probability distributions Ha: The probability distribution of the American managers is shifted to the right or left of the probability distribution of the Mexican managers.
The test statistic is
$z = \dfrac{T_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2(n_1 + n_2 + 1)}{12}}} = \dfrac{186 - \dfrac{15(15 + 15 + 1)}{2}}{\sqrt{\dfrac{15(15)(15 + 15 + 1)}{12}}} = -1.929$
The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.
Since the observed value of the test statistic does not fall in the rejection region (z = −1.929 ≮ −1.96), H0 is not rejected. There is insufficient evidence to indicate American and Mexican purchasing managers perceive the given ethical situation differently at α = .05.
c.
In order to use the t-test, we need to assume that the two populations being sampled from are normal and that the variances of the two populations are equal. To check these assumptions, we will use stem-and-leaf plots and dot plots.
The stem-and-leaf plots are:

Stem-and-leaf of Ethics   Managers = 1   N = 15
Leaf Unit = 1.0
  2    0  58
  6    1  0159
 (2)   2  05
  7    3  005
  4    4  0
  3    5  0
  2    6
  2    7  5
  1    8  0

Stem-and-leaf of Ethics   Managers = 2   N = 15
Leaf Unit = 1.0
  1    0  5
  3    1  05
  4    2  0
  5    3  5
  7    4  05
 (2)   5  05
  6    6  05
  4    7
  4    8  05
  2    9  05
Neither of these two stem-and-leaf plots looks mound-shaped. The assumption that the populations are normal may not be valid. The dot plots are:
[Dot plots of Ethics for Managers 1 and Managers 2; common horizontal axis from 0 to 100]
The spreads of the two data sets look approximately equal. The assumption that the variances of the two populations are the same appears to be valid.
14.22
a.
Using MINITAB, histograms of the two data sets are:
[Histogram of HEATRATE, panel variable ENGINE (panels: Aeroderiv, Traditional); vertical axis: Frequency, 0 to 20; horizontal axis: HEATRATE, 9000 to 16000]
From the histograms, the data for each group do not look mound-shaped. The variance of the aeroderivative engines is greater than that of the traditional engines. Thus, the assumptions of normal distributions and equal variances necessary for the t-test are probably not met.
b.
The p-value = .3431. Since this p-value is not small, H0 is not rejected. There is no evidence to indicate that the heat rate distribution of the traditional turbine engines is shifted to the right or left of that for the aeroderivative turbine engines.
14.24
a.
We first rank all the data:
Firms with Successful MIS (1)        Firms with Unsuccessful MIS (2)
Score   Rank    Score   Rank         Score   Rank    Score   Rank
52      5       90      25.5         60      10.5    65      12.5
70      15      75      17           50      4       55      7
40      1.5     80      19           55      7       70      15
80      19      95      29.5         70      15      90      25.5
82      21      90      25.5         41      3       85      22
65      12.5    86      23           40      1.5     80      19
59      9       95      29.5         55      7       90      25.5
60      10.5    93      28
T1 = 290.5                           T2 = 174.5
To determine whether the distribution of quality scores for the successfully implemented systems differs from that for the unsuccessfully implemented systems, we test: H0: The two sampled distributions are identical Ha: The probability distribution for the successful MIS is shifted to the right or left of that for the unsuccessful MIS
The test statistic is
$z = \dfrac{T_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2(n_1 + n_2 + 1)}{12}}} = \dfrac{290.5 - \dfrac{16(16 + 14 + 1)}{2}}{\sqrt{\dfrac{16(14)(16 + 14 + 1)}{12}}} = 1.767$
The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table IV, Appendix B, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.
Since the observed value of the test statistic does not fall in the rejection region (z = 1.767 ≯ 1.96), H0 is not rejected. There is insufficient evidence to indicate the distribution of quality scores for the successfully implemented systems differs from that for the unsuccessfully implemented systems at α = .05.
b.
We could use the two-sample t-test if:
1. Both populations are normal.
2. The variances of the two populations are the same.
14.26
a.
The test statistic is T− or T+, the smaller of the two. The rejection region is T ≤ 152, from Table XVI, Appendix B, with n = 30, α = .10, and two-tailed.
b.
The test statistic is T−. The rejection region is T− ≤ 60, from Table XVI, Appendix B, with n = 20, α = .05, and one-tailed.
c.
The test statistic is T+. The rejection region is T+ ≤ 0, from Table XVI, Appendix B, with n = 8, α = .005, and one-tailed.
14.28
a.
The rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645.
b.
The large sample test statistic is
$z = \dfrac{T_+ - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n(n + 1)(2n + 1)}{24}}} = \dfrac{273 - \dfrac{25(26)}{4}}{\sqrt{\dfrac{25(26)(51)}{24}}} = 2.97$
Since the observed value of the test statistic falls in the rejection region (z = 2.97 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the responses for A tend to be larger than those for B at α = .05.
c.
p-value = P(z ≥ 2.97) = .5 − P(0 < z < 2.97) = .5 − .4985 = .0015 (from Table IV, Appendix B) Thus, we can reject H0 for any preselected α greater than .0015.
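The large-sample signed rank statistic and its upper-tail p-value can be reproduced with the sketch below. The function name is hypothetical, and the normal tail is computed with `math.erfc` instead of Table IV, so the p-value may differ from the table-based figure in the fourth decimal.

```python
from math import erfc, sqrt


def signed_rank_z(t_plus, n):
    """Large-sample z statistic for the Wilcoxon signed rank test."""
    mean_t = n * (n + 1) / 4
    sd_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t_plus - mean_t) / sd_t


z = signed_rank_z(t_plus=273, n=25)
p_value = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for a standard normal Z
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```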
14.30
a.
To determine if the chest injury ratings of drivers and front-seat passengers differ, we test: H0: The two sampled populations have identical probability distributions Ha: The probability distribution of drivers is shifted to the right or left of that for front-seat passengers
b.
Using MINITAB, the results are:
Wilcoxon Signed Rank Test: Diff
Test of median = 0.000000 versus median not = 0.000000
              N for   Wilcoxon            Estimated
        N     Test    Statistic    P      Median
Diff    18    16      23.0         0.021  -4.000
From the printout, the test statistic is T+ = 23.
c.
The rejection region is T+ ≤ To where To corresponds to α = .01 (two-tailed) and n = 16. From Table XVI, Appendix B, To = 19. The rejection region is T+ ≤ 19.
d.
Since the observed value of the test statistic does not fall in the rejection region (T+ = 23 ≰ 19), H0 is not rejected. There is insufficient evidence to indicate the chest injury ratings of drivers and front-seat passengers differ at α = .01. From the printout, the p-value is p = .021.
14.32
Some preliminary calculations:
                 High School   Geography   Difference   Rank of Absolute
Theme            Teachers      Alumni      T − A        Differences
Tourism          10            2           8            11
Physical         2             1           1            1.5
Transportation   7             3           4            8
People           1             6           −5           9
History          2             5           −3           6
Climate          6             4           2            3.5
Forestry         5             8           −3           6
Agriculture      7             10          −3           6
Fishing          9             7           2            3.5
Energy           2             8           −6           10
Mining           10            11          −1           1.5
Manufacturing    12            12          0            (eliminated)
Positive rank sum T+ = 27.5
To determine if the distributions of theme rankings for the two groups differ, we test: H0: The probability distributions for the two populations are identical Ha: The probability distribution of the high school teachers is shifted to the right or left of the probability distribution of the geography alumni The test statistic is T+ = 27.5. Reject H0 if T+ ≤ T0 where T0 is based on α = .05 and n = 11 (two-tailed): Reject H0 if T+ ≤ 11 (from Table XVI, Appendix B) Since the observed value of the test statistic does not fall in the rejection region (T+ = 27.5 ≰ 11), H0 is not rejected. There is insufficient evidence to indicate that the distributions of these rankings for the two groups differ at α = .05. Practically, this means that the thematic content of a new atlas could be based on the views of either educators or geography alumni.
14.34
Some preliminary calculations are:
           Before     After      Difference   Rank of Absolute
Employee   Flextime   Flextime   (B − A)      Difference
1          54         68         −14          7
2          25         42         −17          9
3          80         80         0            (Eliminated)
4          76         91         −15          8
5          63         70         −7           5
6          82         88         −6           3.5
7          94         90         4            2
8          72         81         −9           6
9          33         39         −6           3.5
10         90         93         −3           1
                                              T+ = 2
To determine if the pilot flextime program is a success, we test: H0: The two probability distributions are identical Ha: The probability distribution before is shifted to the left of that after The test statistic is T+ = 2. The rejection region is T+ ≤ 8, from Table XVI, Appendix B, with n = 9 and α = .05. Since the observed value of the test statistic falls in the rejection region (T+ = 2 ≤ 8), H0 is rejected. There is sufficient evidence to indicate the pilot flextime program has been a success at α = .05.
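The signed rank sums for this exercise can be recomputed from the before and after scores. This is a sketch with a hypothetical function name: zero differences are dropped, and tied absolute differences share the average of the ranks they occupy.

```python
def signed_rank_sums(before, after):
    """Return (T+, T-, n) for paired data, where n counts nonzero differences."""
    diffs = [b - a for b, a in zip(before, after) if b != a]
    abs_sorted = sorted(abs(d) for d in diffs)
    t_plus = t_minus = 0.0
    for d in diffs:
        first = abs_sorted.index(abs(d)) + 1  # lowest rank held by this |d|
        count = abs_sorted.count(abs(d))      # number of ties at this |d|
        rank = first + (count - 1) / 2        # average rank across the ties
        if d > 0:
            t_plus += rank
        else:
            t_minus += rank
    return t_plus, t_minus, len(diffs)


before = [54, 25, 80, 76, 63, 82, 94, 72, 33, 90]
after = [68, 42, 80, 91, 70, 88, 90, 81, 39, 93]
t_plus, t_minus, n = signed_rank_sums(before, after)
print(f"T+ = {t_plus}, T- = {t_minus}, n = {n}")
```

Because T+ and T− must add to n(n + 1)/2, the two sums give a check on the ranking.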
14.36
Some preliminary calculations are:
                    Difference        Rank of Absolute
Science    Math     Science − Math    Difference
0          2        −2                5
4          3        1                 2
3          0        3                 7.5
1          1        0                 (eliminated)
3          1        2                 5
2          3        −1                2
4          0        4                 9
2          1        1                 2
3          1        2                 5
4          1        3                 7.5
Negative rank sum T− = 7
Positive rank sum T+ = 38
To determine if there are differences in the levels of family involvement between math and science homework, we test: H0: The distributions of the science and math levels of family involvement are the same Ha: The distributions of the science and math levels of family involvement differ The test statistic is T− = 7. The rejection region is T− ≤ To where To corresponds to α = .05 (two-tailed) and n = 9. From Table XVI, Appendix B, To = 6. The rejection region is T− ≤ 6. Since the observed value of the test statistic does not fall in the rejection region (T− = 7 ≰ 6), H0 is not rejected. There is insufficient evidence to indicate there are differences in the levels of family involvement between math and science homework at α = .05.
14.38
a.
The hypotheses are: H0: The three probability distributions are identical Ha: At least two of the three probability distributions differ in location
b.
The test statistic is:
$H = \dfrac{12}{n(n + 1)}\sum\dfrac{R_j^2}{n_j} - 3(n + 1) = \dfrac{12}{45(46)}\left[\dfrac{230^2}{15} + \dfrac{440^2}{15} + \dfrac{365^2}{15}\right] - 3(46) = 146.754 - 138 = 8.754$
The rejection region requires α = .05 in the upper tail of the χ2 distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, $\chi^2_{.05}$ = 5.99147. The rejection region is H > 5.99147.
Since the observed value of the test statistic falls in the rejection region (H = 8.754 > 5.99147), H0 is rejected. There is sufficient evidence to indicate that the probability distributions of at least two of the populations A, B, and C, differ in location at α = .05.
c.
The approximate p-value is P(χ2 ≥ 8.754). From Table VII, Appendix B, with df = 2, .01 ≤ P(χ2 ≥ 8.754) ≤ .025.
d.
$\bar{R}_A = \dfrac{R_A}{15} = \dfrac{230}{15} = 15.333$
$\bar{R}_B = \dfrac{R_B}{15} = \dfrac{440}{15} = 29.333$
$\bar{R}_C = \dfrac{R_C}{15} = \dfrac{365}{15} = 24.333$
$\bar{R} = \dfrac{n + 1}{2} = \dfrac{45 + 1}{2} = 23$
$H = \dfrac{12}{n(n + 1)}\sum n_j(\bar{R}_j - \bar{R})^2 = \dfrac{12}{45(46)}\left[15(15.333 - 23)^2 + 15(29.333 - 23)^2 + 15(24.333 - 23)^2\right] = 8.754$
14.40
a.
In order to compare the three population means using parametric techniques, we must assume that all populations being sampled from are normal and all population variances are the same. It is quite possible that these two conditions are not met with these data.
b.
Since we want to compare 3 groups, we will use the Kruskal-Wallis test.
c.
The test statistic is
$H = \dfrac{12}{n(n + 1)}\sum\dfrac{R_j^2}{n_j} - 3(n + 1) = \dfrac{12}{161(161 + 1)}\left(\dfrac{5335^2}{67} + \dfrac{3937^2}{57} + \dfrac{3769^2}{37}\right) - 3(161 + 1) = 11.201$
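The Kruskal-Wallis statistic depends only on the rank sums and sample sizes, so it can be recomputed as in the sketch below. The function name is hypothetical and no correction for ties is applied. For df = 2 only, the chi-square upper-tail area has the closed form exp(−H/2), which is used here in place of Table VII; for other degrees of freedom a table or a statistics library would be needed.

```python
from math import exp


def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from the rank sums R_j and sample sizes n_j."""
    n = sum(sizes)
    weighted = sum(r * r / m for r, m in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


h = kruskal_wallis_h(rank_sums=[5335, 3937, 3769], sizes=[67, 57, 37])
p_value = exp(-h / 2)  # chi-square upper tail, valid only when df = 2
print(f"H = {h:.3f}, approximate p-value = {p_value:.4f}")
```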
d.
Since the p-value is so small (p = .0037), H0 will be rejected. There is sufficient evidence to indicate DEF distributions differ for the 3 tax litigation forums for α > .0037.
14.42
a.
To determine if the distributions of office rental growth rates differ among the four market cycle phases, we test: H0: The four probability distributions are identical Ha: At least two of the growth rate distributions differ
b.
The ranks of the measurements are:
Phase I   Rank     Phase II   Rank     Phase III   Rank     Phase IV   Rank
2.7       9        10.5       20       6.1         14       −1.0       4.5
−1.0      4.5      11.5       23       1.2         7        6.2        15.5
1.1       6        9.4        19       11.4        22       −10.8      1
3.4       10       12.2       24       4.4         13       2.0        8
4.2       12       8.6        18       6.2         15.5     −1.1       3
3.5       11       10.9       21       7.6         17       −2.3       2
     R1 = 52.5          R2 = 125            R3 = 88.5            R4 = 34
c.
The rank sums appear in the table above. The test statistic is:
$H = \dfrac{12}{n(n + 1)}\sum\dfrac{R_j^2}{n_j} - 3(n + 1) = \dfrac{12}{24(24 + 1)}\left(\dfrac{52.5^2}{6} + \dfrac{125^2}{6} + \dfrac{88.5^2}{6} + \dfrac{34^2}{6}\right) - 3(24 + 1) = 16.23$
d.
The rejection region requires α = .05 in the upper tail of the χ2 distribution with df = p – 1 = 4 – 1 = 3. From Table VII, Appendix B, $\chi^2_{.05}$ = 7.81473. The rejection region is H > 7.81473.
e.
Since the observed value of the test statistic falls in the rejection region (H = 16.23 > 7.81473), H0 is rejected. There is sufficient evidence to indicate the distributions of office rental growth rates differ among the four market cycle phases at α = .05.
14.44
Some preliminary calculations are:
Aromatics   Ranks     Chloroalkanes   Ranks     Esters   Ranks
1.06        26        1.58            32        0.29     7
0.79        19        1.45            31        0.06     1.5
0.82        20        0.57            15        0.44     11
0.89        22        1.16            30        0.61     17
1.05        25        1.12            27.5      0.55     14
0.95        24        0.91            23        0.43     9.5
0.65        18        0.83            21        0.51     12
1.15        29        0.43            9.5       0.10     4
1.12        27.5                                0.34     8
                                                0.53     13
                                                0.06     1.5
                                                0.09     3
                                                0.17     5.5
                                                0.60     16
                                                0.17     5.5
      R1 = 210.5               R2 = 189              R3 = 128.5
To determine if the sorption rate distributions differ among the three solvents, we test: H0: The three probability distributions are identical Ha: At least two of the three probability distributions differ in location
The test statistic is
$H = \dfrac{12}{n(n + 1)}\sum\dfrac{R_j^2}{n_j} - 3(n + 1) = \dfrac{12}{32(32 + 1)}\left(\dfrac{210.5^2}{9} + \dfrac{189^2}{8} + \dfrac{128.5^2}{15}\right) - 3(32 + 1) = 20.197$
The rejection region requires α = .01 in the upper tail of the χ2 distribution with df = p – 1 = 3 – 1 = 2. From Table VII, Appendix B, $\chi^2_{.01}$ = 9.21034. The rejection region is H > 9.21034. Since the observed value of the test statistic falls in the rejection region (H = 20.197 > 9.21034), H0 is rejected. There is sufficient evidence to indicate the sorption rate distributions differ among the three solvents at α = .01.
14.46
a.
The F-test would be appropriate if:
1. All p populations sampled from are normal.
2. The variances of the p populations are equal.
3. The p samples are independent.
b.
The variances for the three populations are probably not the same and the populations are probably not normal.
c.
To determine whether the salary distributions differ among the three cities, we test: H0: The three probability distributions are identical Ha: At least two of the three probability distributions differ in location
Some preliminary calculations are:
1                     2                         3
Atlanta    Rank       Los Angeles   Rank        Washington, D.C.   Rank
34,600     1          42,400        4           38,000             2
84,900     19         135,000       21          76,900             16
61,700     11         63,000        12          48,000             6
38,900     3          43,700        5           72,600             14
77,200     17         69,400        13          73,200             15
83,600     18         97,000        20          51,800             8
59,800     10         49,500        7           55,000             9
      R1 = 79                  R2 = 82                       R3 = 70
The test statistic is
$H = \dfrac{12}{n(n + 1)}\sum\dfrac{R_j^2}{n_j} - 3(n + 1) = \dfrac{12}{21(22)}\left(\dfrac{79^2}{7} + \dfrac{82^2}{7} + \dfrac{70^2}{7}\right) - 3(22) = 66.2894 - 66 = .2894$
The rejection region requires α = .05 in the upper tail of the χ2 distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, $\chi^2_{.05}$ = 5.99147. The rejection region is H > 5.99147. Since the observed value of the test statistic does not fall in the rejection region (H = .2894 ≯ 5.99147), H0 is not rejected. There is insufficient evidence to indicate the salary distributions differ among the three cities at α = .05. We must assume we have independent random samples, sample sizes greater than or equal to 5 from each population, and that all populations are continuous.
14.48
a.
The hypotheses are: H0: The probability distributions for three treatments are identical Ha: At least two of the probability distributions differ in location
b.
The rejection region requires α = .10 in the upper tail of the χ2 distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, $\chi^2_{.10}$ = 4.60517. The rejection region is Fr > 4.60517.
c.
Some preliminary calculations are:
Block   A     Rank     B     Rank     C     Rank
1       9     1        11    2        18    3
2       13    2        13    2        13    2
3       11    1        12    2.5      12    2.5
4       10    1        15    2        16    3
5       9     2        8     1        10    3
6       14    2        12    1        16    3
7       10    1        12    2        15    3
          RA = 10         RB = 12.5       RC = 19.5
The test statistic is
$F_r = \dfrac{12}{bp(p + 1)}\sum R_j^2 - 3b(p + 1) = \dfrac{12}{7(3)(4)}\left[10^2 + 12.5^2 + 19.5^2\right] - 3(7)(4) = 90.9286 - 84 = 6.9286$
Since the observed value of the test statistic falls in the rejection region (Fr = 6.9286 > 4.60517), H0 is rejected. There is sufficient evidence to indicate the effectiveness of the three different treatments differ at α = .10.
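The Friedman statistic ranks the treatments within each block and then works with the treatment rank sums. A sketch that redoes the calculation for this exercise follows; the function name is hypothetical, tied values within a block share the average rank, and no tie correction is applied to Fr.

```python
def friedman_statistic(blocks):
    """Return (rank_sums, Fr) for a list of blocks, one row of p values each."""
    b, p = len(blocks), len(blocks[0])
    rank_sums = [0.0] * p
    for row in blocks:
        for j, value in enumerate(row):
            smaller = sum(v < value for v in row)
            equal = sum(v == value for v in row)  # includes the value itself
            rank_sums[j] += smaller + (equal + 1) / 2  # average rank in block
    fr = 12 / (b * p * (p + 1)) * sum(r * r for r in rank_sums) - 3 * b * (p + 1)
    return rank_sums, fr


# Rows are blocks 1 to 7; columns are treatments A, B, C.
blocks = [
    (9, 11, 18),
    (13, 13, 13),
    (11, 12, 12),
    (10, 15, 16),
    (9, 8, 10),
    (14, 12, 16),
    (10, 12, 15),
]
rank_sums, fr = friedman_statistic(blocks)
print(f"Rank sums = {rank_sums}, Fr = {fr:.4f}")
```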
14.50
a.
The Friedman test statistic is
$F_r = \dfrac{12}{bp(p + 1)}\sum R_j^2 - 3b(p + 1) = \dfrac{12}{6(5)(5 + 1)}\left(27^2 + 25^2 + 18^2 + 11^2 + 9^2\right) - 3(6)(5 + 1) = 17.333$
b.
The rejection region requires α = .05 in the upper tail of the χ2 distribution with df = p – 1 = 5 – 1 = 4. From Table VII, Appendix B, $\chi^2_{.05}$ = 9.48773. The rejection region is Fr > 9.48773.
c.
Since the observed value of the test statistic falls in the rejection region (Fr = 17.333 > 9.48773), H0 is rejected. There is sufficient evidence to indicate there is a difference in the levels of farm production among the five conditions at α = .05.
14.52
a.
To determine if the distributions of rotary oil rigs differ among the three states, we test: H0: The probability distributions of the rotary oil rigs for the 3 states are the same Ha: At least two of the probability distributions of rotary oil rigs differ in location
b.
The ranked data are:
Month/Year   California   Utah     Alaska
Nov. 2000    3            2        1
Oct. 2001    3            2        1
Nov. 2001    3            2        1
             R1 = 9       R2 = 6   R3 = 3
c.
The test statistic is
$F_r = \dfrac{12}{bp(p + 1)}\sum R_j^2 - 3b(p + 1) = \dfrac{12}{3(3)(3 + 1)}\left(9^2 + 6^2 + 3^2\right) - 3(3)(3 + 1) = 6$
d.
The rejection region requires α = .05 in the upper tail of the χ2 distribution with df = p – 1 = 3 – 1 = 2. From Table VII, Appendix B, $\chi^2_{.05}$ = 5.99147. The rejection region is Fr > 5.99147.
e.
Since the observed value of the test statistic falls in the rejection region (Fr = 6 > 5.99147), H0 is rejected. There is sufficient evidence to indicate the distributions of rotary oil rigs differ among the three states at α = .05.
14.54
Some preliminary calculations are:
Location    Temephos  Rank    Malathion  Rank    Fenitrothion  Rank    Fenthion  Rank    Chlorpyrifos  Rank
Anguilla    4.6       5       1.2        1       1.5           2.5     1.8       4       1.5           2.5
Antigua     9.2       5       2.9        3       2.0           1.5     7.0       4       2.0           1.5
Dominica    7.8       5       1.4        1       2.4           2       4.2       4       4.1           3
Guyana      1.7       2       1.9        4       2.2           5       1.5       1       1.8           3
Jamaica     3.4       3       3.7        4       2.0           2       1.5       1       7.1           5
St. Lucia   6.7       4       2.7        1.5     2.7           1.5     4.8       3       8.7           5
Suriname    1.4       1       1.9        3       2.0           4       2.1       5       1.7           2
            R1 = 25           R2 = 17.5          R3 = 18.5             R4 = 22           R5 = 22
To determine if the resistance ratio distributions of the 5 insecticides differ, we test: H0: The distributions of the 5 insecticide ratios are the same Ha: At least two of the distributions of insecticide ratios differ
The test statistic is
$F_r = \dfrac{12}{bp(p + 1)}\sum R_j^2 - 3b(p + 1) = \dfrac{12}{7(5)(5 + 1)}\left(25^2 + 17.5^2 + 18.5^2 + 22^2 + 22^2\right) - 3(7)(5 + 1) = 2.086$
Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ2 distribution with df = p – 1 = 5 – 1 = 4. From Table VII, Appendix B, $\chi^2_{.05}$ = 9.48773. The rejection region is Fr > 9.48773. Since the observed value of the test statistic does not fall in the rejection region (Fr = 2.086 ≯ 9.48773), H0 is not rejected. There is insufficient evidence to indicate that the resistance ratio distributions of the 5 insecticides differ at α = .05.
14.56
Some preliminary calculations are (ranks within each week):

Week     Monday      Tuesday    Wednesday    Thursday     Friday
 1         5            1           4           2           3
 2         5            4           3           1           2
 3         2.5          2.5         5           1           4
 4         2            1           3.5         5           3.5
 5         5            1           2           3           4
 6         4            2           3           1           5
 7         5            3.5         1.5         3.5         1.5
 8         4            2           1           3           5
 9         1            2           5           3           4
       R1 = 33.5     R2 = 19     R3 = 28    R4 = 22.5    R5 = 32
To determine if the distributions of days of the weeks differ, we test: H0: The probability distributions of the 5 days of the week are the same Ha: At least two of the probability distributions of the 5 days of the week differ in location
The test statistic is
$F_r = \frac{12}{bp(p + 1)}\sum R_j^2 - 3b(p + 1) = \frac{12}{9(5)(5 + 1)}\left(33.5^2 + 19^2 + 28^2 + 22.5^2 + 32^2\right) - 3(9)(5 + 1) = 6.778$
Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p – 1 = 5 – 1 = 4. From Table VII, Appendix B, $\chi^2_{.05}$ = 9.48773. The rejection region is Fr > 9.48773. Since the observed value of the test statistic does not fall in the rejection region (Fr = 6.778 ≯ 9.48773), H0 is not rejected. There is insufficient evidence to indicate the distributions of the absentee rate for the days of the week differ at α = .05.
14.58
a.
From Table XVII with n = 10, rs,α/2 = rs,.025 = .648. The rejection region is rs > .648 or rs < −.648.
b.
From Table XVII with n = 20, rs,α = rs,.025 = .450. The rejection region is rs > .450.
c.
From Table XVII with n = 30, rs,α = rs,.01 = .432. The rejection region is rs < −.432.
14.60
a.
H0: ρs = 0 Ha: ρs ≠ 0
b.
The test statistic is $r_s = \frac{SS_{uv}}{\sqrt{SS_{uu}SS_{vv}}}$, where u and v are the ranks of x and y:

   x    Rank, u     y    Rank, v      u²        v²        uv
   0      3         0      1.5        9         2.25      4.5
   3      5.5       2      5         30.25     25        27.5
   0      3         2      5          9        25        15
  −4      1         0      1.5        1         2.25      1.5
   3      5.5       3      7         30.25     49        38.5
   0      3         1      3          9         9         9
   4      7         2      5         49        25        35
        Σu = 28          Σv = 28   Σu² = 137.5  Σv² = 137.5  Σuv = 131

$SS_{uv} = \sum uv - \frac{(\sum u)(\sum v)}{n} = 131 - \frac{28(28)}{7} = 19$

$SS_{uu} = \sum u^2 - \frac{(\sum u)^2}{n} = 137.5 - \frac{(28)^2}{7} = 25.5$

$SS_{vv} = \sum v^2 - \frac{(\sum v)^2}{n} = 137.5 - \frac{(28)^2}{7} = 25.5$

$r_s = \frac{19}{\sqrt{25.5(25.5)}} = .745$

Reject H0 if rs < −rs,α/2 or rs > rs,α/2 where α/2 = .025 and n = 7: Reject H0 if rs < −.786 or rs > .786 (from Table XVII, Appendix B).
Since the observed value of the test statistic does not fall in the rejection region (rs = .745 ≯ .786), do not reject H0. There is insufficient evidence to indicate x and y are correlated at α = .05.
c.
The p-value is P(rs ≥ .745) + P(rs ≤ −.745). For n = 7, rs = .745 is below rs,.025 = .786 (where α/2 = .025) and above rs,.05 (where α/2 = .05). Therefore, 2(.025) = .05 < p-value < 2(.05) = .10.
d.
The assumptions of the test are that the samples are randomly selected and the probability distributions of the two variables are continuous.
14.62
a.
Some preliminary calculations are:

Brand    Expert 1    Expert 2    Difference di    di²
  A         6           5             1            1
  B         5           6            −1            1
  C         1           2            −1            1
  D         3           1             2            4
  E         2           4            −2            4
  F         4           3             1            1
                                               Σdi² = 12

$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{6(6^2 - 1)} = 1 - .343 = .657$
b.
To determine if there is a positive correlation in the rankings of the two experts, we test:
H0: ρs = 0
Ha: ρs > 0
The test statistic is rs = .657. Reject H0 if rs > rs,α where α = .05 and n = 6. From Table XVII, Appendix B, rs,.05 = .829. Reject H0 if rs > .829. Since the observed value of the test statistic does not fall in the rejection region (rs = .657 ≯ .829), H0 is not rejected. There is insufficient evidence to indicate a positive correlation in the rankings of the two experts at α = .05.
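When there are no ties, the shortcut formula used in part a is easy to verify numerically. The following is a minimal sketch under that no-ties assumption; the function name `spearman_shortcut` is hypothetical, and the critical value is the Table XVII entry for n = 6 and α = .05.

```python
def spearman_shortcut(ranks_u, ranks_v):
    """Spearman's r_s via 1 - 6 * sum(d_i^2) / (n (n^2 - 1)).

    Assumes both inputs are already ranks with no ties.
    """
    n = len(ranks_u)
    sum_d2 = sum((u - v) ** 2 for u, v in zip(ranks_u, ranks_v))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Rankings of brands A-F by the two experts.
expert1 = [6, 5, 1, 3, 2, 4]
expert2 = [5, 6, 2, 1, 4, 3]

rs = spearman_shortcut(expert1, expert2)

# One-tailed critical value for n = 6, alpha = .05 (Table XVII).
RS_CRIT = 0.829
reject_h0 = rs > RS_CRIT
print(round(rs, 3), reject_h0)
```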
14.64
a.
Some preliminary calculations are:

   x       u        y       v        u²        v²        uv
  5.2      1       220     4.5       1        20.25      4.5
  5.5      7       227     7.5      49        56.25     52.5
  6.0     23.5     259    15.5     552.25    240.25    364.25
  5.9     20.5     210     1       420.25      1        20.5
  5.8     16       224     6       256        36        96
  6.0     23.5     215     3       552.25      9        70.5
  5.8     16       231     9       256        81       144
  5.6     10       268    19       100       361       190
  5.6     10       239    11       100       121       110
  5.9     20.5     212     2       420.25      4        41
  5.4      5       410    24        25       576       120
  5.6     10       256    14       100       196       140
  5.8     16       306    22       256       484       352
  5.5      7       259    15.5      49       240.25    108.5
  5.3      3       284    21         9       441        63
  5.3      3       383    23         9       529        69
  5.7     12.5     271    20       156.25    400       250
  5.5      7       264    18        49       324       126
  5.7     12.5     227     7.5     156.25     56.25     93.75
  5.3      3       263    17         9       289        51
  5.9     20.5     232    10       420.25    100       205
  5.8     16       220     4.5     256        20.25     72
  5.8     16       246    13       256       169       208
  5.9     20.5     241    12       420.25    144       246
       Σu = 300        Σv = 300  Σu² = 4878  Σv² = 4898.5  Σuv = 3197.5

$SS_{uv} = \sum uv - \frac{(\sum u)(\sum v)}{n} = 3197.5 - \frac{300(300)}{24} = -552.5$

$SS_{uu} = \sum u^2 - \frac{(\sum u)^2}{n} = 4878 - \frac{300^2}{24} = 1128$

$SS_{vv} = \sum v^2 - \frac{(\sum v)^2}{n} = 4898.5 - \frac{300^2}{24} = 1148.5$

$r_s = \frac{SS_{uv}}{\sqrt{SS_{uu}SS_{vv}}} = \frac{-552.5}{\sqrt{1128(1148.5)}} = -.4854$
Since the magnitude of the correlation coefficient is not particularly large, there is a fairly weak negative relationship between sweetness index and pectin.
b.
To determine if there is a negative association between the sweetness index and the amount of pectin, we test: H0: ρs = 0 Ha: ρs < 0 The test statistic is rs = −.4854 Reject H0 if rs < −rs,α where α = .01 and n = 24. Reject H0 if rs < −.485 (from Table XVII, Appendix B) Since the observed value of the test statistic falls in the rejection region (rs = −.4854 < −.485), H0 is rejected. There is sufficient evidence to indicate there is a negative association between the sweetness index and the amount of pectin at α = .01.
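Because the data in this exercise contain ties, rs is computed from the sums of squares of the midranks rather than from the shortcut formula. The sketch below reproduces that calculation from the raw x and y values listed in part a; the helper names `midranks` and `spearman_rs` are hypothetical, and only the Python standard library is assumed.

```python
from math import sqrt


def midranks(values):
    """Rank the values from smallest to largest, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """r_s = SS_uv / sqrt(SS_uu * SS_vv), computed on the midranks u and v."""
    u, v = midranks(x), midranks(y)
    n = len(x)
    ss_uv = sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n
    ss_uu = sum(a * a for a in u) - sum(u) ** 2 / n
    ss_vv = sum(b * b for b in v) - sum(v) ** 2 / n
    return ss_uv / sqrt(ss_uu * ss_vv)


# x and y values as listed in the table for part a.
x = [5.2, 5.5, 6.0, 5.9, 5.8, 6.0, 5.8, 5.6, 5.6, 5.9, 5.4, 5.6,
     5.8, 5.5, 5.3, 5.3, 5.7, 5.5, 5.7, 5.3, 5.9, 5.8, 5.8, 5.9]
y = [220, 227, 259, 210, 224, 215, 231, 268, 239, 212, 410, 256,
     306, 259, 284, 383, 271, 264, 227, 263, 232, 220, 246, 241]

rs = spearman_rs(x, y)
print(round(rs, 4))
```

Working from the raw values rather than the printed ranks also serves as a check on the ranking step itself.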
14.66
a.
Some preliminary calculations are:

Parent    Rank, u    Subsid    Rank, v    Difference di    di²
 643        11        2,617      11            0            0
 381        10        1,724       9            1            1
 342         9        1,867      10           −1            1
 251         8        1,238       7            1            1
 216         7          890       5            2            4
 208         6          681       4            2            4
 192         5        1,534       8           −3            9
 141         4          899       6           −2            4
 131         3          492       1            2            4
 128         2          579       2            0            0
 124         1          672       3           −2            4
                                                        Σdi² = 32

$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(32)}{11(11^2 - 1)} = 1 - .145 = .855$
Since this correlation coefficient is fairly close to 1, it indicates that there is a relatively strong positive relationship between the number of parent companies and the number of subsidiaries. To determine if the number of parent companies is positively related to the number of subsidiaries, we test: H0: ρs = 0 Ha: ρs > 0 The test statistic is rs = .855.
From Table XVII, Appendix B, rs,.05 = .523, with n = 11. The rejection region is rs > .523. Since the observed value of the test statistic falls in the rejection region (rs = .855 > .523), H0 is rejected. There is sufficient evidence to indicate that the number of parent companies is positively related to the number of subsidiaries at α = .05.
b.
We must assume:
1. The sample is randomly selected.
2. The probability distributions of both of the variables are continuous.

The actual numbers of companies and subsidiaries are not continuous. However, since the numbers of companies/subsidiaries are very large, this assumption is basically met. From the information given, we cannot tell whether the sample was random or not.
14.68
b.
Some preliminary calculations:

Involvement    ui    vi    Differences di = ui − vi    di²
     1          8     9              −1                 1
     2          6     7              −1                 1
     3         10    10               0                 0
     4          2     1               1                 1
     5          5     5               0                 0
     6          9     8               1                 1
     7          1     2              −1                 1
     8          4     4               0                 0
     9          7     6               1                 1
    10         11    11               0                 0
    11          3     3               0                 0
                                                    Σdi² = 6

$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(6)}{11(11^2 - 1)} = .972$
To determine if a positive relationship exists between participation rates and cost savings rates, we test: H0: ρs = 0 Ha: ρs > 0 The test statistic is rs = .972. From Table XVII, Appendix B, rs,.01 = .736, with n = 11. The rejection region is rs > .736.
Since the observed value of the test statistic falls in the rejection region (rs = .972 > .736), H0 is rejected. There is sufficient evidence to indicate that a positive relationship exists between participation rates and cost savings rates at α = .01.
c.
In order for the above test to be valid, we must assume:
1. The sample is randomly selected.
2. The probability distributions of both of the variables are continuous.

In order to use the Pearson correlation coefficient, we must assume that both populations are normally distributed. It is very unlikely that the data are normally distributed.
14.70
The appropriate test for this completely randomized design is the Kruskal-Wallis H-test. Some preliminary calculations are:

Sample 1    Rank      Sample 2    Rank      Sample 3    Rank
   18        4.5         12        2           87        16
   32        6           33        7           53        11
   43        9           10        1           65        14
   15        3           34        8           50        10
   63       12           18        4.5         64        13
                                               77        15
         R1 = 34.5             R2 = 22.5             R3 = 79

To determine whether at least two of the populations differ in location, we test:
H0: The three probability distributions are identical
Ha: At least two of the three probability distributions differ in location

The test statistic is
$H = \frac{12}{n(n + 1)}\sum \frac{R_j^2}{n_j} - 3(n + 1) = \frac{12}{16(16 + 1)}\left[\frac{(34.5)^2}{5} + \frac{(22.5)^2}{5} + \frac{(79)^2}{6}\right] - 3(16 + 1) = 60.859 - 51 = 9.859$

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, $\chi^2_{.05}$ = 5.99147. The rejection region is H > 5.99147. Since the observed value of the test statistic falls in the rejection region (H = 9.859 > 5.99147), reject H0. There is sufficient evidence to indicate a difference in location for at least two of the three probability distributions at α = .05.
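The Kruskal-Wallis statistic depends only on the rank sums and the sample sizes, so the arithmetic above can be confirmed in a few lines. This is a minimal sketch with a hypothetical function name, `kruskal_wallis_h`; it does not apply a correction for ties, matching the formula used in the solution.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from rank sums R_j and sample sizes n_j.

    H = 12 / (n (n + 1)) * sum(R_j^2 / n_j) - 3 (n + 1), with n = sum(n_j).
    No tie correction is applied.
    """
    n = sum(sizes)
    weighted = sum(r * r / nj for r, nj in zip(rank_sums, sizes))
    return 12.0 * weighted / (n * (n + 1)) - 3 * (n + 1)


h = kruskal_wallis_h([34.5, 22.5, 79], [5, 5, 6])

# Critical value copied from the text (chi-square, df = 2, alpha = .05).
CHI2_05_DF2 = 5.99147
reject_h0 = h > CHI2_05_DF2
print(round(h, 3), reject_h0)
```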
14.72
The appropriate test for two independent samples is the Wilcoxon rank sum test. Some preliminary calculations are:

Sample 1    Rank      Sample 2    Rank
  1.2        4           1.5       6
  1.9        8.5         1.3       5
   .7        1           2.9      12
  2.5       10           1.9       8.5
  1.0        2           2.7      11
  1.8        7           3.5      13
  1.1        3
         T1 = 35.5             T2 = 55.5

To determine if there is a difference between the locations of the probability distributions, we test:
H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for population 1 is shifted to the left or right of that for population 2

The test statistic is T2 = 55.5. Reject H0 if T2 ≤ TL or T2 ≥ TU where α = .05 (two-tailed), n1 = 7 and n2 = 6: Reject H0 if T2 ≤ 28 or T2 ≥ 56 (from Table XV, Appendix B). Since T2 = 55.5 ≰ 28 and T2 = 55.5 ≱ 56, do not reject H0. There is insufficient evidence to indicate a difference between the locations of the probability distributions for the sampled populations at α = .05.
14.74
a.
To determine whether the median biting rate is higher in bright, sunny weather, we test: H0: η = 5 Ha: η > 5
b.
The test statistic is
$z = \frac{(S - .5) - .5n}{.5\sqrt{n}} = \frac{(95 - .5) - .5(122)}{.5\sqrt{122}} = 6.07$
(where S = number of observations greater than 5)

The p-value is p = P(z ≥ 6.07). From Table IV, Appendix B, p = P(z ≥ 6.07) ≈ 0.0000.
c.
Since the observed p-value is less than α (p = 0.0000 < .01), H0 is rejected. There is sufficient evidence to indicate that the median biting rate in bright, sunny weather is greater than 5 at α = .01.
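The large-sample sign test statistic in part b can be reproduced directly. The sketch below is illustrative only: `sign_test_z` and `upper_tail_p` are hypothetical helper names, and the upper-tail probability is obtained from the standard normal distribution via `math.erfc` instead of Table IV.

```python
from math import erfc, sqrt


def sign_test_z(s, n):
    """Large-sample sign test statistic z = ((S - .5) - .5 n) / (.5 sqrt(n))."""
    return ((s - 0.5) - 0.5 * n) / (0.5 * sqrt(n))


def upper_tail_p(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


# S = 95 observations above the hypothesized median of 5, out of n = 122.
z = sign_test_z(95, 122)
p_value = upper_tail_p(z)
print(round(z, 2), p_value < 0.01)
```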
14.76
a.
Some preliminary calculations are:

Difference (Highway 1 − Highway 2)    Rank of Absolute Differences
            −25                                  5
              4                                  1
            −23                                  4
            −16                                  2.5
            −16                                  2.5
                                              T+ = 1

To determine if the heavily patrolled highway tends to have fewer speeders per 100 cars than the occasionally patrolled highway, we test:
H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for highway 1 is shifted to the left of that for highway 2

The test statistic is T+ = 1. The rejection region is T+ ≤ 1 from Table XVI, Appendix B, with n = 5 and α = .05. Since the observed value of the test statistic falls in the rejection region (T+ = 1 ≤ 1), H0 is rejected. There is sufficient evidence to indicate the probability distribution for highway 1 is shifted to the left of that for highway 2 at α = .05.
b.
Some preliminary calculations are:

Day    Difference (Highway 1 − Highway 2)
 1                −25
 2                  4
 3                −23
 4                −16
 5                −16

$\bar{d} = \frac{\sum d_i}{n} = \frac{-76}{5} = -15.2$

$s_d^2 = \frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n}}{n - 1} = \frac{1682 - \frac{(-76)^2}{5}}{5 - 1} = 131.7 \qquad s_d = \sqrt{131.7} = 11.4761$

To determine if the mean number of speeders per 100 cars differs for the two highways, we test:
H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is
$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{-15.2}{11.4761/\sqrt{5}} = -2.96$

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 1 = 5 − 1 = 4. From Table VI, Appendix B, t.025 = 2.776. The rejection region is t > 2.776 or t < −2.776. Since the observed value of the test statistic falls in the rejection region (t = −2.96 < −2.776), H0 is rejected. There is sufficient evidence to indicate the mean number of speeders per 100 cars differs for the two highways at α = .05. We must assume that the population of differences is normally distributed and that a random sample of differences was selected.
14.78
a.
Since only 70 of the 80 customers responded to the question, only the 70 will be included. To determine if the median amount spent on hamburgers at lunch at McDonald's is less than $2.25, we test:
H0: η = 2.25
Ha: η < 2.25

S = number of measurements less than 2.25 = 20. The test statistic is
$z = \frac{(S - .5) - .5n}{.5\sqrt{n}} = \frac{(20 - .5) - .5(70)}{.5\sqrt{70}} = -3.71$

No α was given in the exercise. We will use α = .05. Because S counts the measurements that support Ha, the rejection region requires α = .05 in the upper tail of the z-distribution. From Table IV, Appendix B, z.05 = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic does not fall in the rejection region (z = −3.71 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate that the median amount spent on hamburgers at lunch at McDonald's is less than $2.25 at α = .05.
b.
No. The survey was done in Boston only. The eating habits of those living in Boston are probably not representative of all Americans.
c.
We must assume that the sample is randomly selected from a continuous probability distribution.
14.80
Some preliminary calculations:

1 Urban    Rank      2 Suburban    Rank      3 Rural    Rank
  4.3       4.5         5.9         14         5.1        9
  5.2      10.5         6.7         17         4.8        7
  6.2      15.5         7.6         19         3.9        2
  5.6      12           4.9          8         6.2       15.5
  3.8       1           5.2         10.5       4.2        3
  5.8      13           6.8         18         4.3        4.5
  4.7       6
        R1 = 62.5                R2 = 86.5            R3 = 41

To determine if there is a difference in the level of property taxes among the three types of school districts, we test:
H0: The three probability distributions are identical
Ha: At least two of the three probability distributions differ in location

The test statistic is
$H = \frac{12}{n(n + 1)}\sum \frac{R_j^2}{n_j} - 3(n + 1) = \frac{12}{19(19 + 1)}\left(\frac{62.5^2}{7} + \frac{86.5^2}{6} + \frac{41^2}{6}\right) - 3(20) = 65.8498 - 60 = 5.8498$

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, $\chi^2_{.05}$ = 5.99147. The rejection region is H > 5.99147. Since the observed value of the test statistic does not fall in the rejection region (H = 5.8498 ≯ 5.99147), H0 is not rejected. There is insufficient evidence to indicate that there is a difference in the level of property taxes among the three types of school districts at α = .05.
14.82
a.
Some preliminary calculations are:

Truck    Static Weight of Truck (ui)    Weigh-in-Motion Prior (vi)    Weigh-in-Motion After (wi)    uivi     uiwi
  1                  3                              3                             3                   9        9
  2                  4                              4                             4                  16       16
  3                 10                              9                            10                  90      100
  4                  1                              1.5                           2                   1.5      2
  5                  6                              6                             6                  36       36
  6                  8                              8                             8                  64       64
  7                  2                              1.5                           1                   3        2
  8                  5                              5                             5                  25       25
  9                  7                              7                             7                  49       49
 10                  9                             10                             9                  90       81
                    55                             55                            55                 383.5    384

$SS_{uv} = \sum u_i v_i - \frac{(\sum u_i)(\sum v_i)}{n} = 383.5 - \frac{55(55)}{10} = 81$

$SS_{uw} = \sum u_i w_i - \frac{(\sum u_i)(\sum w_i)}{n} = 384 - \frac{55(55)}{10} = 81.5$

$SS_{uu} = \sum u_i^2 - \frac{(\sum u_i)^2}{n} = 385 - \frac{55^2}{10} = 82.5$

$SS_{vv} = \sum v_i^2 - \frac{(\sum v_i)^2}{n} = 384.5 - \frac{55^2}{10} = 82$

$SS_{ww} = \sum w_i^2 - \frac{(\sum w_i)^2}{n} = 385 - \frac{55^2}{10} = 82.5$

$r_{s1} = \frac{SS_{uv}}{\sqrt{SS_{uu}SS_{vv}}} = \frac{81}{\sqrt{82.5(82)}} = .9848$

$r_{s2} = \frac{SS_{uw}}{\sqrt{SS_{uu}SS_{ww}}} = \frac{81.5}{\sqrt{82.5(82.5)}} = .9879$
The correlation coefficient for x and y1 is rs1 = .9848. Since rs1 > 0, the relationship between static weight and weigh-in-motion prior to adjustment is positive. Because the value is close to 1, the relationship is very strong. It is larger than r1 = .965 found in Exercise 10.89.

The correlation coefficient for x and y2 is rs2 = .9879. Since rs2 > 0, the relationship between static weight and weigh-in-motion after the adjustment is positive. Because the value is close to 1, the relationship is very strong. It is smaller than r2 = .996 found in Exercise 10.89.
b.
In order for rs to be exactly 1, the rankings for the static weight and the weigh-in-motion must be the same for each truck. In order for rs to be exactly 0, there must be no monotonic association between the two sets of rankings, so that SSuv = 0. (If the ranking of one variable were equal to 11 minus the ranking of the other variable for each truck, the rankings would be exactly reversed and rs would be −1, not 0.)
14.84
a.
To determine if the median level differs from the target, we test: H0: η = .75 Ha: η ≠ .75
b.
S1 = number of observations less than .75 and S2 = number of observations greater than .75. The test statistic is S = larger of S1 and S2. The p-value = 2P(x ≥ S) where x is a binomial random variable with n = 25 and p = .5. If the p-value is less than α = .10, reject H0.
c.
A Type I error would be concluding the median level is not .75 when it is. If a Type I error were committed, the supervisor would correct the fluoridation process when it was not necessary. A Type II error would be concluding the median level is .75 when it is not. If a Type II error were committed, the supervisor would not correct the fluoridation process when it was necessary.
d.
S1 = number of observations less than .75 = 7 and S2 = number of observations greater than .75 = 18. The test statistic is S = larger of S1 and S2 = 18. The p-value = 2P(x ≥ 18) where x is a binomial random variable with n = 25 and p = .5. From Table II, p-value = 2P(x ≥ 18) = 2(1 − P(x ≤ 17)) = 2(1 − .978) = 2(.022) = .044 Since the p-value = .044 < α = .10, H0 is rejected. There is sufficient evidence to indicate the median level of fluoridation differs from the target of .75 at α = .10.
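The binomial p-value in part d can also be computed exactly instead of being read from Table II. This is a minimal sketch with a hypothetical function name, `sign_test_p_two_sided`; it assumes Python 3.8 or later for `math.comb`. The exact value comes out slightly below the .044 obtained above, because the table entry P(x ≤ 17) = .978 is rounded to three decimals.

```python
from math import comb


def sign_test_p_two_sided(s, n):
    """Two-sided sign test p-value 2 * P(X >= s), X ~ Binomial(n, .5).

    s is the larger of the two counts S1 and S2; the result is capped at 1.
    """
    upper = sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)


# S = larger of S1 = 7 and S2 = 18, with n = 25 observations.
p_value = sign_test_p_two_sided(18, 25)
reject_h0 = p_value < 0.10
print(round(p_value, 4), reject_h0)
```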
e.
A distribution heavily skewed to the right might look something like the following:

[Figure: sketch of a distribution heavily skewed to the right]

One assumption necessary for the t-test is that the distribution from which the sample is drawn is normal. A distribution which is heavily skewed in one direction is not normal. Thus, the sign test would be preferred.
14.86
Some preliminary calculations are:

Hours    Rank    Fraction Defective    Rank    di    di²
  1       1            .02              1       0     0
  2       2            .05              3      −1     1
  3       3            .03              2       1     1
  4       4            .08              5      −1     1
  5       5            .06              4       1     1
  6       6            .09              6       0     0
  7       7            .11              8      −1     1
  8       8            .10              7       1     1
                                                  Σdi² = 6

To determine if the fraction defective increases as the day progresses, we test:
H0: ρs = 0
Ha: ρs > 0

The test statistic is
$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(6)}{8(8^2 - 1)} = 1 - .071 = .929$

Reject H0 if rs > rs,α where α = .05 and n = 8: Reject H0 if rs > .643 (from Table XVII, Appendix B). Since rs = .929 > .643, reject H0. There is sufficient evidence to indicate that the fraction defective increases as the day progresses at α = .05.
14.88
a.
The design utilized was a completely randomized design.
b.
Some preliminary calculations are:

Site 1    Rank      Site 2    Rank      Site 3    Rank
 34.3       6        39.3      17        34.5       7
 35.5      11        45.5      25        29.3       2
 32.1       3        50.2      28        37.2      14
 28.3       1        72.1      29        33.2       5
 40.5      19        48.6      27        32.6       4
 36.2      12        42.2      21        38.3      16
 43.5      23       103.5      30        43.3      22
 34.7       8        47.9      26        36.7      13
 38.0      15        41.2      20        40.0      18
 35.1       9        44.0      24        35.2      10
        R1 = 107            R2 = 247            R3 = 111

To determine if the probability distributions for the three sites differ, we test:
H0: The three sampled population probability distributions are identical
Ha: At least two of the three sampled population probability distributions differ in location

The test statistic is
$H = \frac{12}{n(n + 1)}\sum \frac{R_j^2}{n_j} - 3(n + 1) = \frac{12}{30(31)}\left[\frac{107^2}{10} + \frac{247^2}{10} + \frac{111^2}{10}\right] - 3(31) = 109.3923 - 93 = 16.3923$

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = p − 1 = 3 − 1 = 2. From Table VII, Appendix B, $\chi^2_{.05}$ = 5.99147. The rejection region is H > 5.99147. Since the observed value of the test statistic falls in the rejection region (H = 16.3923 > 5.99147), H0 is rejected. There is sufficient evidence to indicate the probability distributions for at least two of the three sites differ at α = .05.
c.
Since H0 was rejected, we need to compare all pairs of sites. Some preliminary calculations are:

Sites 1 and 2:
Site 1    Rank      Site 2    Rank
 34.3       3        39.3       9
 35.5       6        45.5      15
 32.1       2        50.2      18
 28.3       1        72.1      19
 40.5      10        48.6      17
 36.2       7        42.2      12
 43.5      13       103.5      20
 34.7       4        47.9      16
 38.0       8        41.2      11
 35.1       5        44.0      14
        T1 = 59             T2 = 151

Sites 1 and 3:
Site 1    Rank      Site 3    Rank
 34.3       6        34.5       7
 35.5      11        29.3       2
 32.1       3        37.2      14
 28.3       1        33.2       5
 40.5      18        32.6       4
 36.2      12        38.3      16
 43.5      20        43.3      19
 34.7       8        36.7      13
 38.0      15        40.0      17
 35.1       9        35.2      10
        T1 = 103            T3 = 107

Sites 2 and 3:
Site 2    Rank      Site 3    Rank
 39.3       9        34.5       4
 45.5      15        29.3       1
 50.2      18        37.2       7
 72.1      19        33.2       3
 48.6      17        32.6       2
 42.2      12        38.3       8
103.5      20        43.3      13
 47.9      16        36.7       6
 41.2      11        40.0      10
 44.0      14        35.2       5
        T2 = 151            T3 = 59

For each pair, we test:
H0: The two sampled population probability distributions are identical
Ha: The probability distribution for one site is shifted to the right or left of the other.

The rejection region for each pair is T ≤ 79 or T ≥ 131 from Table XV, Appendix B, with n1 = n2 = 10 and α = .05.
For sites 1 and 2: The test statistic is T1 = 59. Since the observed value of the test statistic falls in the rejection region (T1 = 59 ≤ 79), H0 is rejected. There is sufficient evidence to indicate the probability distribution for site 1 is shifted to the left of that for site 2 at α = .05.

For sites 1 and 3: The test statistic is T1 = 103. Since the observed value of the test statistic does not fall in the rejection region (T1 = 103 ≰ 79 and 103 ≱ 131), H0 is not rejected. There is insufficient evidence to indicate the probability distribution for site 1 is shifted to the right or left of that for site 3 at α = .05.

For sites 2 and 3: The test statistic is T2 = 151. Since the observed value of the test statistic falls in the rejection region (T2 = 151 ≥ 131), H0 is rejected. There is sufficient evidence to indicate the probability distribution for site 2 is shifted to the right of that for site 3 at α = .05.

Thus, the income for those at site 2 is significantly higher than at the other two sites.
d.
The necessary assumptions are:
1. The three samples are random and independent.
2. There are five or more measurements in each sample.
3. The three probability distributions from which the samples are drawn are continuous.

For parametric tests, the assumptions are:
1. The three populations are normal.
2. The samples are random and independent.
3. The three population variances are equal.
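The pairwise rank sums used in part c can be reproduced by ranking each pair of samples jointly. The sketch below is illustrative: `rank_sums` is a hypothetical helper, tied values receive the average of their ranks, and the site 3 values are the ones listed in part b.

```python
def rank_sums(sample1, sample2):
    """Return (T1, T2), the Wilcoxon rank sums of two jointly ranked samples.

    Tied observations receive the average of the ranks they occupy.
    """
    combined = list(sample1) + list(sample2)
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and combined[order[j + 1]] == combined[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])


site1 = [34.3, 35.5, 32.1, 28.3, 40.5, 36.2, 43.5, 34.7, 38.0, 35.1]
site2 = [39.3, 45.5, 50.2, 72.1, 48.6, 42.2, 103.5, 47.9, 41.2, 44.0]
site3 = [34.5, 29.3, 37.2, 33.2, 32.6, 38.3, 43.3, 36.7, 40.0, 35.2]

# Rejection region from Table XV for n1 = n2 = 10, alpha = .05 (two-tailed).
T_LOWER, T_UPPER = 79, 131

for label, a, b in [("1 vs 2", site1, site2),
                    ("1 vs 3", site1, site3),
                    ("2 vs 3", site2, site3)]:
    t_a, t_b = rank_sums(a, b)
    reject = t_a <= T_LOWER or t_a >= T_UPPER
    print(label, t_a, t_b, "reject H0" if reject else "do not reject H0")
```

Because the two samples in each pair have equal size, either rank sum can serve as the test statistic; the loop uses the first one, as the solution does.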
14.90
Using MINITAB, the results of the Wilcoxon Rank Sum Test (Mann-Whitney Test) for each of the variables are:

Mann-Whitney Test and CI: CREATIVE-S, CREATIVE-NS

               N    Median
CREATIVE-S    47    5.0000
CREATIVE-NS   67    4.0000

Point estimate for ETA1-ETA2 is 1.0000
95.0 Percent CI for ETA1-ETA2 is (0.9999,1.0000)
W = 3734.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0000
The test is significant at 0.0000 (adjusted for ties)

Mann-Whitney Test and CI: INFO-S, INFO-NS

           N    Median
INFO-S    47    5.000
INFO-NS   67    5.000

Point estimate for ETA1-ETA2 is 0.000
95.0 Percent CI for ETA1-ETA2 is (-0.000,1.000)
W = 2888.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.2856
The test is significant at 0.2743 (adjusted for ties)

Mann-Whitney Test and CI: DECPERS-S, DECPERS-NS

              N    Median
DECPERS-S    47    3.000
DECPERS-NS   67    2.000

Point estimate for ETA1-ETA2 is -0.000
95.0 Percent CI for ETA1-ETA2 is (-0.000,1.000)
W = 2963.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1337
The test is significant at 0.1228 (adjusted for ties)

Mann-Whitney Test and CI: SKILLS-S, SKILLS-NS

             N    Median
SKILLS-S    47    6.0000
SKILLS-NS   67    5.0000

Point estimate for ETA1-ETA2 is 1.0000
95.0 Percent CI for ETA1-ETA2 is (0.9999,1.9999)
W = 3498.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0000
The test is significant at 0.0000 (adjusted for ties)
Mann-Whitney Test and CI: TASKID-S, TASKID-NS

             N    Median
TASKID-S    47    5.000
TASKID-NS   67    4.000

Point estimate for ETA1-ETA2 is 1.000
95.0 Percent CI for ETA1-ETA2 is (-0.000,1.000)
W = 3028.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0614
The test is significant at 0.0566 (adjusted for ties)

Mann-Whitney Test and CI: AGE-S, AGE-NS

          N    Median
AGE-S    47    47.000
AGE-NS   67    45.000

Point estimate for ETA1-ETA2 is 1.000
95.0 Percent CI for ETA1-ETA2 is (-1.000,4.001)
W = 2891.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.2779
The test is significant at 0.2771 (adjusted for ties)

Mann-Whitney Test and CI: EDYRS-S, EDYRS-NS

            N    Median
EDYRS-S    47    13.000
EDYRS-NS   67    13.000

Point estimate for ETA1-ETA2 is -0.000
95.0 Percent CI for ETA1-ETA2 is (0.000,-0.000)
W = 2664.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.8268
The test is significant at 0.8191 (adjusted for ties)
A summary of the tests above and the t-tests from Chapter 7 are listed in the table:

Variable    Wilcoxon Test Statistic, T2    p-value       t       p-value
CREATIVE             3734.5                 0.000      8.847      0.000
INFO                 2888.5                 0.274      1.503      0.136
DECPERS              2963.5                 0.123      1.506      0.135
SKILLS               3498.5                 0.000      4.766      0.000
TASKID               3028.0                 0.057      1.738      0.087
AGE                  2891.5                 0.277      0.742      0.460
EDYRS                2664.0                 0.819     -0.623      0.534
The p-values for the Wilcoxon Rank Sum Tests and the t-tests are similar and the decisions are the same. Since the sample sizes are large (n = 47 and n = 67), the Central Limit Theorem applies. Thus, the t-tests (or z-tests) are valid. One assumption for the Wilcoxon Rank Sum test is that the distributions are continuous. Obviously, this is not true. There are many ties in the data, so the Wilcoxon Rank Sum tests may not be valid.