5 4
Random Sample and Central Limit Theorem; X-Bar and R Control Charts
Exercise 1: (Example 1) Suppose X1, X2, …, X20 is a sample from a normal distribution N(µ, σ²) with µ = 5, σ² = 4. Find
(a) the expectation and variance of X̄;
(b) the distribution of X̄.
Exercise 2: (Example 2) Given that X is normally distributed with mean 50 and standard deviation 4, compute the following for n = 25.
(a) Mean and variance of X̄
(b) P(X̄ ≤ 49)
(c) P(X̄ > 52)
(d) P(49 ≤ X̄ ≤ 51.5)
Probability and Statistics Work Book
Exercise 3: (Tutorial 5, No.1) Given that X is normally distributed with mean 20 and standard deviation 2, compute the following for n = 40.
(a) Mean and variance of X̄
(b) P(X̄ ≤ 19)
(c) P(X̄ > 22)
(d) P(19 ≤ X̄ ≤ 21.5)
Solution:
(a) Mean of X̄ = 20 and variance of X̄ = 4/40 = 0.1
(b) P(X̄ ≤ 19) = P(Z ≤ (19 − 20)/√0.1) = P(Z ≤ −3.16) = 0.000789
(c) P(X̄ > 22) = P(Z > (22 − 20)/√0.1) = P(Z > 6.32) = 1 − P(Z ≤ 6.32) = 1 − 1 = 0
(d) P(19 ≤ X̄ ≤ 21.5) = P((19 − 20)/√0.1 ≤ Z ≤ (21.5 − 20)/√0.1) = P(−3.16 ≤ Z ≤ 4.74) = Φ(4.74) − Φ(−3.16) = 1 − 0.000789 = 0.999211
Exercise 4: (Tutorial 5, No.2) Let X denote the number of flaws in a 1-inch length of copper wire. The pmf of X is given in the following table:
x           0      1      2      3
P(X = x)    0.48   0.39   0.12   0.01
100 wires are sampled from this population. What is the probability that the average number of flaws per wire in this sample is less than 0.5?
Solution: Given that,
Mean of X = 0(0.48) + 1(0.39) + 2(0.12) + 3(0.01) = 0.66
Variance of X = [0²(0.48) + 1²(0.39) + 2²(0.12) + 3²(0.01)] − (0.66)² = 0.5244
If n = 100, the mean of X̄ is 0.66 and the variance of X̄ is 0.5244/100 = 0.005244.
So, P(X̄ < 0.5) = P(Z < (0.5 − 0.66)/√0.005244) = P(Z < −2.21) = 0.0136
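The normal-approximation steps used in the solutions to Exercises 3 and 4 can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `phi` and `prob_mean_below` are illustrative assumptions and do not come from the workbook.

```python
import math

def phi(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_mean_below(c, mu, var, n):
    """P(sample mean < c) when the sample mean is (approximately) N(mu, var/n)."""
    se = math.sqrt(var / n)          # standard error of the sample mean
    return phi((c - mu) / se)

# Exercise 3: X ~ N(20, 2^2), n = 40
p3b = prob_mean_below(19, 20, 4, 40)             # P(Xbar <= 19)
p3d = prob_mean_below(21.5, 20, 4, 40) - p3b     # P(19 <= Xbar <= 21.5)

# Exercise 4: mean and variance from the pmf, then the CLT with n = 100
pmf = {0: 0.48, 1: 0.39, 2: 0.12, 3: 0.01}
mu = sum(x * p for x, p in pmf.items())
var = sum(x * x * p for x, p in pmf.items()) - mu ** 2
p4 = prob_mean_below(0.5, mu, var, 100)          # P(Xbar < 0.5)

print(round(p3b, 6), round(p3d, 6), round(mu, 2), round(var, 4), round(p4, 4))
```

Small differences from the tabulated answers are expected, because the worked solutions round z to two decimals before using the normal table.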
Exercise 5: (Tutorial 5, No.3) At a large university, the mean age of the students is 22.3 years, and the standard deviation is 4 years. A random sample of 64 students is drawn. What is the probability that the average age of these students is greater than 23 years?
Solution: Given that the mean of X is 22.3 and the variance of X is 16. If n = 64, the mean of X̄ is 22.3 and the variance of X̄ is 16/64 = 0.25.
So, P(X̄ > 23) = P(Z > (23 − 22.3)/√0.25) = P(Z > 1.4) = 1 − P(Z ≤ 1.4) = 1 − Φ(1.4) = 1 − 0.919 = 0.081
Exercise 6: The flexural strength (in MPa) of certain concrete beams is X ~ N(8, 2.25). Find the probability that the sample mean strength of 16 concrete beams lies in the interval (7.55, 8.75).
Exercise 7: (Example 3) A component part for a jet aircraft engine is manufactured by an investment casting process. The vane opening on this casting is an important functional parameter of the part. We will illustrate the use of X̄ and R control charts to assess the statistical stability of this process. The table presents 20 samples of five parts each. The values given in the table have been coded by using the last three digits of the dimension; that is, 31.6 should be 0.50316 inch.
Sample Number   x1   x2   x3   x4   x5   X̄      r
1               33   29   31   32   33   31.6   4
2               33   31   35   37   31   33.4   6
3               35   37   33   34   36   35.0   4
4               30   31   33   34   33   32.2   4
5               33   34   35   33   34   33.8   2
6               38   37   39   40   38   38.4   3
7               30   31   32   34   31   31.6   4
8               29   39   38   39   39   36.8   10
9               28   33   35   36   43   35.0   15
10              38   33   32   35   32   34.0   6
11              28   30   28   32   31   29.8   4
12              31   35   35   35   34   34.0   4
13              27   32   34   35   37   33.0   10
14              33   33   35   37   36   34.8   4
15              35   37   32   35   39   35.6   7
16              33   33   27   31   30   30.8   6
17              35   34   34   30   32   33.0   5
18              32   33   30   30   33   31.6   3
19              25   27   34   27   28   28.2   9
20              35   35   36   33   30   33.8   6
(a) Construct X̄ and R control charts.
(b) After the process is in control, estimate the process mean and standard deviation.
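As a starting point for part (a), the trial control limits can be computed from the X̄ and r columns of the table. The sketch below assumes the standard control chart constants for subgroups of size n = 5 (A2 = 0.577, D3 = 0, D4 = 2.115); these constants come from standard control chart tables, not from the workbook, and the variable names are illustrative.

```python
# Sample means and ranges copied from the Exercise 7 table (20 samples of size 5)
xbars = [31.6, 33.4, 35.0, 32.2, 33.8, 38.4, 31.6, 36.8, 35.0, 34.0,
         29.8, 34.0, 33.0, 34.8, 35.6, 30.8, 33.0, 31.6, 28.2, 33.8]
ranges = [4, 6, 4, 4, 2, 3, 4, 10, 15, 6, 4, 4, 10, 4, 7, 6, 5, 3, 9, 6]

# Assumed constants for subgroup size n = 5 (standard control chart tables)
A2, D3, D4 = 0.577, 0.0, 2.115

xbarbar = sum(xbars) / len(xbars)     # grand mean: center line of the X-bar chart
rbar = sum(ranges) / len(ranges)      # average range: center line of the R chart

xbar_lcl, xbar_ucl = xbarbar - A2 * rbar, xbarbar + A2 * rbar
r_lcl, r_ucl = D3 * rbar, D4 * rbar

# Sample numbers (1-based) that plot outside the trial limits
out_xbar = [i + 1 for i, x in enumerate(xbars) if not xbar_lcl <= x <= xbar_ucl]
out_r = [i + 1 for i, r in enumerate(ranges) if not r_lcl <= r <= r_ucl]

print("X-bar chart:", round(xbar_lcl, 2), round(xbarbar, 2), round(xbar_ucl, 2), out_xbar)
print("R chart:    ", round(r_lcl, 2), round(rbar, 2), round(r_ucl, 2), out_r)
```

For part (b), the flagged samples would be dropped and the same calculation repeated on the remaining samples before estimating the process mean and standard deviation.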
Exercise 8: (Tutorial 5, No.4) The overall length of a skew used in a knee replacement device is monitored using X̄ and R charts. The following table gives the length for 20 samples of size 4. (Measurements are coded from 2.00 mm; that is, 15 is 2.15 mm.)
          Observation                  Observation
Sample    1    2    3    4    Sample   1    2    3    4
1         16   18   15   13   11       14   14   15   13
2         16   15   17   16   12       15   13   15   16
3         15   16   20   16   13       13   17   16   15
4         14   16   14   12   14       11   14   14   21
5         14   15   13   16   15       14   15   14   13
6         16   14   16   15   16       18   15   16   14
7         16   16   14   15   17       14   16   19   16
8         17   13   17   16   18       16   14   13   19
9         15   11   13   16   19       17   19   17   13
10        15   18   14   13   20       12   15   12   17
(i) Using all the data, find trial control limits for X̄ and R charts, construct the chart, and plot the data.
(ii) Use the trial control limits from part (i) to identify out-of-control points. If necessary, revise your control limits, assuming that any samples that plot outside the control limits can be eliminated.
(iii) Assuming that the process is in control, estimate the process mean and process standard deviation.
Solution: (i)
The trial control limits are as follows.
(ii) Based on the control charts, there is a single sample beyond the control limits: sample 14 is above the upper control limit on the R chart. With sample 14 removed, the control limits and charts are as follows.
All points are within the control limits. The process is said to be in statistical control.
(iii) The estimated process mean is 15.14. The estimated process standard deviation is R̄/d2 = 3.895/2.059 = 1.892.
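The figures quoted in this solution can be reproduced from the Exercise 8 data. The sketch below assumes the standard control chart constants for subgroups of size n = 4 (A2 = 0.729, D3 = 0, D4 = 2.282, d2 = 2.059); the function names `chart_limits` and `out_of_control` are hypothetical helpers introduced here for illustration.

```python
# Exercise 8 data: sample number -> four coded length measurements
data = {
    1: [16, 18, 15, 13], 2: [16, 15, 17, 16], 3: [15, 16, 20, 16],
    4: [14, 16, 14, 12], 5: [14, 15, 13, 16], 6: [16, 14, 16, 15],
    7: [16, 16, 14, 15], 8: [17, 13, 17, 16], 9: [15, 11, 13, 16],
    10: [15, 18, 14, 13], 11: [14, 14, 15, 13], 12: [15, 13, 15, 16],
    13: [13, 17, 16, 15], 14: [11, 14, 14, 21], 15: [14, 15, 14, 13],
    16: [18, 15, 16, 14], 17: [14, 16, 19, 16], 18: [16, 14, 13, 19],
    19: [17, 19, 17, 13], 20: [12, 15, 12, 17],
}

# Assumed constants for subgroup size n = 4 (standard control chart tables)
A2, D3, D4, D2 = 0.729, 0.0, 2.282, 2.059

def chart_limits(samples):
    """Return (xbarbar, rbar, X-bar limits, R limits) for a dict of samples."""
    means = [sum(v) / len(v) for v in samples.values()]
    rngs = [max(v) - min(v) for v in samples.values()]
    xbarbar = sum(means) / len(means)
    rbar = sum(rngs) / len(rngs)
    return (xbarbar, rbar,
            (xbarbar - A2 * rbar, xbarbar + A2 * rbar),
            (D3 * rbar, D4 * rbar))

def out_of_control(samples, x_lim, r_lim):
    """Sample numbers whose mean or range falls outside the given limits."""
    flagged = []
    for k, v in samples.items():
        mean, rng = sum(v) / len(v), max(v) - min(v)
        if not (x_lim[0] <= mean <= x_lim[1]) or not (r_lim[0] <= rng <= r_lim[1]):
            flagged.append(k)
    return flagged

# (i) trial limits, (ii) flag and remove, (iii) estimate mean and sigma
xbb, rb, x_lim, r_lim = chart_limits(data)
flagged = out_of_control(data, x_lim, r_lim)
revised = {k: v for k, v in data.items() if k not in flagged}
xbb2, rb2, x_lim2, r_lim2 = chart_limits(revised)
sigma_hat = rb2 / D2

print("trial:", round(xbb, 2), round(rb, 2), x_lim, r_lim, flagged)
print("revised:", round(xbb2, 2), round(rb2, 3), round(sigma_hat, 3))
```

A second call to `out_of_control(revised, x_lim2, r_lim2)` confirms whether any further samples need to be removed.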
Exercise 9: The thickness of a printed circuit board (PCB) is an important quality parameter. Data on board thickness (in cm) are given below for 25 samples of three boards each.
Sample   1        2        3          Sample   1        2        3
1        0.0629   0.0636   0.0640     14       0.0645   0.0640   0.0631
2        0.0630   0.0631   0.0622     15       0.0619   0.0644   0.0632
3        0.0628   0.0631   0.0633     16       0.0631   0.0627   0.0630
4        0.0634   0.0630   0.0631     17       0.0616   0.0623   0.0631
5        0.0619   0.0628   0.0630     18       0.0630   0.0630   0.0626
6        0.0613   0.0629   0.0634     19       0.0636   0.0631   0.0629
7        0.0630   0.0639   0.0625     20       0.0640   0.0635   0.0629
8        0.0628   0.0627   0.0622     21       0.0628   0.0625   0.0616
9        0.0623   0.0626   0.0633     22       0.0615   0.0625   0.0619
10       0.0631   0.0631   0.0633     23       0.0630   0.0632   0.0630
11       0.0635   0.0630   0.0638     24       0.0635   0.0629   0.0635
12       0.0623   0.0630   0.0630     25       0.0623   0.0629   0.0630
13       0.0635   0.0631   0.0630
(i) Using all the data, find trial control limits for X̄ and R charts, construct the chart, and plot the data.
(ii) Use the trial control limits from part (i) to identify out-of-control points. If necessary, revise your control limits, assuming that any samples that plot outside the control limits can be eliminated.
(iii) Assuming that the process is in control, estimate the process mean and process standard deviation.
6 5
Hypothesis Testing - One Population
Exercise 1: (Example 1) A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system-activation temperature is 130°F. A sample of 9 systems, when tested, yields an average activation temperature of 131.08°F. If the distribution of activation times is normal with standard deviation 1.5°F, does the data contradict the firm’s claim at level of significance α = 0.01? What is the P-value for this test?
Exercise 2: (Example 2) A random sample of 50 battery packs is selected and subjected to a life test. The average life of these batteries is 4.05 hours. Assume that the battery life is normally distributed with standard deviation equal to 0.2 hour. Is there evidence to support the claim that mean battery life exceeds 4 hours? Use α = 0.05. What is the P-value for this test?
Exercise 3: A new cure has been developed for a certain type of cement that results in a compressive strength of 5000 kilograms per square centimeter with a standard deviation of 120 kilograms; the strength follows a normal distribution. To test the null hypothesis that µ = 5000 against the alternative that µ < 5000, a random sample of 50 pieces of cement is observed. The critical region is defined to be X̄ < 4970. (a) Find the probability of committing a type I error when H0 is true. (b) Evaluate β (the probability of type II error) if µ = 4960.
Exercise 4: (Tutorial 6, No.1) A civil engineer is analyzing the compressive strength of concrete. Compressive strength is approximately normally distributed with variance σ² = 1000 psi². A random sample of 12 specimens has a mean compressive strength of x̄ = 3255.42 psi.
(a) Test the hypothesis that mean compressive strength is 3500 psi. Use a fixed-level test with α = 0.01;
(b) What is the smallest level of significance at which you would be willing to reject the null hypothesis?;
(c) Construct a 95% two-sided CI on mean compressive strength; and
(d) Construct a 99% two-sided CI on mean compressive strength. Compare the width of this confidence interval with the width of the one in part (c). What is your comment?
Solution:
(a) (i) The parameter of interest is the true mean compressive strength, µ.
(ii) The hypothesis testing: H0: µ = 3500 vs H1: µ ≠ 3500
(iii) The significance level α = 0.01
(iv) The test statistic is: z0 = (x̄ − µ0)/(σ/√n)
Computation: x̄ = 3255.42, σ = 31.62 ⇒ z0 = (3255.42 − 3500)/(31.62/√12) = −26.79
(v) Decision:
Reject H0 if z0 < −zα/2 or z0 > zα/2, where zα/2 = z0.005 = 2.58.
(vi) Result and conclusion: Since −26.79 < −2.58, we reject the null hypothesis and conclude that the true mean compressive strength is significantly different from 3500 psi at α = 0.01.
(b) The smallest level of significance at which we are willing to reject the null hypothesis is the P-value = 2[1 − Φ(26.79)] = 2[1 − 1] = 0.
(c) A 95% two-sided CI on mean compressive strength is
x̄ − z0.025 σ/√n ≤ µ ≤ x̄ + z0.025 σ/√n
3255.42 − 1.96(31.62/√12) ≤ µ ≤ 3255.42 + 1.96(31.62/√12)
3237.53 ≤ µ ≤ 3273.31
With 95% confidence, we believe the true mean compressive strength is between 3237.53 psi and 3273.31 psi.
(d) A 99% two-sided CI on mean compressive strength is
x̄ − z0.005 σ/√n ≤ µ ≤ x̄ + z0.005 σ/√n
3231.96 ≤ µ ≤ 3278.88
With 99% confidence, we believe that the true mean compressive strength is between 3231.96 psi and 3278.88 psi. The 99% confidence interval is wider than the 95% confidence interval. A larger level of confidence always results in a wider confidence interval when x̄, σ², and n are held constant.
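The z-test and the 95% interval in this solution can be verified with a few lines of code. This is a minimal sketch using only the standard library; `phi` is an illustrative helper for the standard normal CDF, and the critical value 1.96 is taken from the normal table rather than computed.

```python
import math

def phi(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exercise 4 (Tutorial 6, No.1): known variance, two-sided test of mu = 3500
xbar, mu0, var, n = 3255.42, 3500.0, 1000.0, 12
se = math.sqrt(var / n)                    # sigma / sqrt(n)

z0 = (xbar - mu0) / se                     # test statistic
p_value = 2.0 * (1.0 - phi(abs(z0)))       # two-sided P-value

z_025 = 1.96                               # table value for a 95% two-sided interval
ci95 = (xbar - z_025 * se, xbar + z_025 * se)

print(round(z0, 2), p_value, tuple(round(v, 2) for v in ci95))
```

Replacing `z_025` with the table value for z0.005 gives the 99% interval of part (d); the half-width grows in proportion to the critical value, which is why the 99% interval is wider.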
Exercise 5: (Example 3) A new process for producing synthetic diamonds can be operated at a profitable level only if the average weight of the diamonds is greater than 0.5 karat. To evaluate the profitability of the process, six diamonds are generated with recorded weights 0.46, 0.61, 0.52, 0.48, 0.57 and 0.54 karat. (a) At the 5% significance level, do the six measurements present sufficient evidence that the average weight of the diamonds produced by the process is in excess of 0.5 karat? (b) Use the P-value approach to test the null hypothesis. (c) Construct a 95% CI on the average weight of diamonds.
Exercise 6: (Tutorial 6, No.2) A cigarette company claims that its cigarettes contain an average of only 10 mg of tar. A random sample of 25 cigarettes shows the average tar content to be 12.5 mg with a standard deviation of 4.5 mg. (a) Construct a hypothesis test to determine whether the average tar content of cigarettes exceeds 10 mg, using the P-value approach; (b) Construct a 95% two-sided CI on the average tar content of cigarettes.
Solution:
(a) (i) The parameter of interest is the true mean tar content, µ.
(ii) The hypothesis testing: H0: µ = 10 mg vs H1: µ > 10 mg
(iii) The test statistic is:
t0 = (x̄ − µ)/(s/√n) = (12.5 − 10)/(4.5/5) = 2.778
(v) Decision: Reject H0 if the P-value is smaller than 0.05.
(vi) Conclusion: From a t-distribution table, for a t-distribution with 24 degrees of freedom, t0 = 2.778 falls between two values: 2.492, for which α = 0.01, and 2.797, for which α = 0.005. So the P-value satisfies 0.005 < P < 0.01. Since P < 0.05, we reject H0 and conclude that the mean tar content of the cigarettes exceeds 10 mg.
(b) A 95% two-sided CI on mean tar content is
x̄ = 12.5, s = 4.5, n = 25, tα/2,n−1 = t0.025,24 = 2.064
x̄ − tα/2,n−1 s/√n ≤ µ ≤ x̄ + tα/2,n−1 s/√n
12.5 − (2.064)(4.5/√25) ≤ µ ≤ 12.5 + (2.064)(4.5/√25)
10.642 ≤ µ ≤ 14.358
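The arithmetic of this t-test and interval can be checked as follows. The sketch uses only the standard library, so the critical value t0.025,24 = 2.064 is copied from the t-table quoted in the solution rather than computed; the variable names are illustrative.

```python
import math

# Exercise 6 (Tutorial 6, No.2): unknown variance, small sample
xbar, s, n, mu0 = 12.5, 4.5, 25, 10.0
se = s / math.sqrt(n)                 # estimated standard error of the mean

t0 = (xbar - mu0) / se                # test statistic with n - 1 = 24 df

t_crit = 2.064                        # t_{0.025, 24}, taken from the t-table
ci95 = (xbar - t_crit * se, xbar + t_crit * se)

# Bracketing the one-sided P-value with the two table values used in the solution
between_table_values = 2.492 < t0 < 2.797   # True means 0.005 < P < 0.01

print(round(t0, 3), tuple(round(v, 3) for v in ci95), between_table_values)
```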
Exercise 7: (Example 4) Regardless of age, about 20% of Malaysian adults participate in fitness activities at least twice a week. In a local survey of 100 adults over 40 years old, a total of 15 people indicated that they participated in a fitness activity at least twice a week. (a) Do these data indicate that the participation rate for adults over 40 years of age is significantly less than 20%? Carry out a test at 10% significance level and draw appropriate conclusion. (b) Construct a 95% two-sided CI on the participation rate.
Exercise 8: (Tutorial 6, No.3) A survey done one year ago showed that 45% of the population participated in recycling programs. In a recent poll, a random sample of 1250 people showed that 588 participate in recycling programs. (a) Test the hypothesis that the proportion of the population who participate in recycling programs is greater than it was one year ago. Use a 5% significance level. (b) Construct a 95% two-sided CI on the proportion.
Solution:
(a) (i) The parameter of interest is the proportion of the population who participate in recycling programs, p.
(ii) The hypothesis testing: H0: p = 0.45 vs H1: p > 0.45
(iii) The significance level α = 0.05
(iv) The test statistic is:
z0 = (p̂ − p0)/√(p0(1 − p0)/n), with p̂ = X/n
z0 = (588/1250 − 0.45)/√((0.45)(0.55)/1250) = 1.450
(v) Decision: Reject H0 if z0 > zα, where zα = z0.05 = 1.645.
(vi) Conclusion: Since 1.450 < 1.645, we do not reject the null hypothesis. At the 0.05 level of significance, there is insufficient evidence that the proportion of the population who participate in recycling programs is greater than 45%.
(b) 95% two-sided CI is
p̂ − zα/2 √(p̂(1 − p̂)/n) ≤ p ≤ p̂ + zα/2 √(p̂(1 − p̂)/n)
0.47 − 1.96 √((0.47)(0.53)/1250) ≤ p ≤ 0.47 + 1.96 √((0.47)(0.53)/1250)
0.442 ≤ p ≤ 0.498
Since p = 0.45 is inside the interval, we cannot reject the null hypothesis.
Exercise 9: An Ipoh city council member gave a speech in which she said that 18% of all private homes in the city had been undervalued by the county tax assessor’s office. In a follow-up story, the local newspaper reported that it had taken a random sample of 91 private homes. Using a professional evaluator to evaluate the property and checking against county tax records, it found that 14 of the homes had been undervalued. (i) Do these data indicate that the proportion of private homes that are undervalued by the county tax assessor is different from 18%? Use a 5% significance level. (ii) Construct a 95% two-sided CI on the proportion.
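The one-proportion z-test and interval worked out in Exercise 8 follow a fixed recipe, and the same steps apply to Exercise 9 with a two-sided alternative. The sketch below applies the recipe to the Exercise 8 numbers; `one_prop_z` and `prop_ci` are hypothetical helper names, and the critical values 1.645 and 1.96 are table values.

```python
import math

def one_prop_z(x, n, p0):
    """z statistic for H0: p = p0, using the null standard error."""
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)

def prop_ci(x, n, z_crit):
    """Large-sample two-sided CI for p, using the estimated standard error."""
    p_hat = x / n
    half = z_crit * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

# Exercise 8: 588 of 1250 participate; test H1: p > 0.45 at alpha = 0.05
z0 = one_prop_z(588, 1250, 0.45)
reject = z0 > 1.645                    # upper-tailed test, z_{0.05} = 1.645
lo, hi = prop_ci(588, 1250, 1.96)      # 95% two-sided interval

print(round(z0, 3), reject, round(lo, 3), round(hi, 3))
```

The interval here uses p̂ = 588/1250 without rounding, so its endpoints can differ in the third decimal from a hand calculation that rounds p̂ to 0.47 first.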
Exercise 10: (Example 5) Engineers designing the front-wheel-drive half shaft of a new model automobile claim that the variance in the displacement of the constant velocity joints of the shaft is less than 1.5 mm. 20 simulations were conducted and the following results were obtained: x̄ = 3.39 and s = 1.41. (i) At α = 0.05, do these data support the claim of the engineers? (ii) What is the P-value for this test? (iii) Construct a two-sided CI for σ.
Exercise 11: (Tutorial 6, No.4) Aerospace engineers claim that the standard deviation of the percentage in an alloy used in aerospace casting is greater than 0.3. 51 parts were randomly selected and the sample standard deviation of the percentage in an alloy used in aerospace casting is s = 0.37. (i) At α = 0.05, do these data support the claim of the engineers? (ii) What is the P-value for this test? (iii) Construct a 95% two-sided CI for σ. What is the conclusion?
Solution:
(i) (a) The parameter of interest is the population variance σ².
(b) The hypothesis testing: H0: σ² = (0.3)² vs H1: σ² > (0.3)²
(c) The significance level α = 0.05
(d) The test statistic is: χ0² = (n − 1)s²/σ0² = 50(0.37)²/(0.3)² = 76.056
(e) Decision: Reject H0 if χ0² > χ²0.05,50 = 67.50
(f) Conclusion: Since 76.056 > 67.50, we reject the null hypothesis and conclude that the engineers' claim is supported at the 0.05 level of significance.
(ii) From the χ² table, χ²0.01,50 = 76.15 and χ²0.025,50 = 71.42. Since 71.42 < 76.056 < 76.15, the P-value satisfies 0.01 < P < 0.025. Because the P-value is smaller than α = 0.05, we reject the null hypothesis, in agreement with part (i).
(iii) 95% two-sided CI is
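(n − 1)s²/χ²α/2,n−1 ≤ σ² ≤ (n − 1)s²/χ²1−α/2,n−1
50(0.37)²/71.42 ≤ σ² ≤ 50(0.37)²/32.36
0.0958 ≤ σ² ≤ 0.2115
0.310 ≤ σ ≤ 0.460
Since the whole interval for σ lies above 0.3, the interval is consistent with the engineers' claim.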
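The chi-square statistic and the interval for σ in Exercise 11 can be recomputed directly. The standard library has no chi-square quantile function, so the sketch below takes the percentage points χ²0.05,50 = 67.50, χ²0.025,50 = 71.42 and χ²0.975,50 = 32.36 from the chi-square table as given; the variable names are illustrative.

```python
import math

# Exercise 11 (Tutorial 6, No.4): test H0: sigma^2 = 0.3^2 vs H1: sigma^2 > 0.3^2
n, s, sigma0 = 51, 0.37, 0.3
df = n - 1

chi2_0 = df * s ** 2 / sigma0 ** 2         # test statistic

# Table values (assumed from a chi-square table, 50 degrees of freedom)
chi2_05, chi2_025, chi2_975 = 67.50, 71.42, 32.36

reject = chi2_0 > chi2_05                   # upper-tailed test at alpha = 0.05

# 95% two-sided CI for sigma^2, then square roots for sigma
var_lo = df * s ** 2 / chi2_025
var_hi = df * s ** 2 / chi2_975
sigma_lo, sigma_hi = math.sqrt(var_lo), math.sqrt(var_hi)

print(round(chi2_0, 3), reject, round(var_lo, 4), round(var_hi, 4),
      round(sigma_lo, 3), round(sigma_hi, 3))
```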
Exercise 12: Scientists claim that the variance of the sugar content of the syrup in canned peaches is 18 mg². A random sample of 10 cans yields a sample standard deviation of 4.8 mg. (i) At α = 0.05, do these data support the claim of the scientists? (ii) What is the P-value for this test? (iii) Construct a 95% two-sided CI for σ. What is the conclusion?
7 5
Hypothesis Testing - Two Populations
Exercise 1: (Example 1) A random sample of size n1 = 25 taken from a normal population with σ1 = 5.2 has a mean equal to 81. A second random sample of size n2 = 36, taken from a different normal population with σ2 = 3.4, has a mean equal to 76. (a) Do the data indicate that the true mean values µ1 and µ2 are different? Carry out a test at α = 0.01. (b) Find a 90% CI on the difference in mean strength.
Exercise 2: (Example 2) Two machines are used for filling plastic bottles with a net volume of 16.0 oz. The fill volume can be assumed normal, with s1 = 0.02 and s2 = 0.025. A member of the quality engineering staff suspects that both machines fill to the same mean net volume, whether or not this volume is 16.0 oz. A random sample of 10 bottles is taken from the output of each machine with the following results: (a) Do you think the engineer is correct? Use the P-value approach. (b) Find a 95% CI on the difference in means.
Exercise 3: (Tutorial 7, No.1) Two machines are used to fill plastic bottles with dishwashing detergent. The standard deviations of fill volume are known to be σ1 = 0.01 and σ2 = 0.15 fluid ounce for the two machines, respectively. Two random samples of n1 = 12 bottles from machine 1 and n2 = 10 bottles from machine 2 are selected, and the sample mean fill volumes are x̄1 = 30.61 and x̄2 = 30.24 fluid ounces. Assume normality.
(i) Test the hypothesis that both machines fill to the same mean volume. Use the P-value approach;
(ii) Construct a 90% two-sided CI on the mean difference in fill volume; and
(iii) Construct a 95% two-sided CI on the mean difference in fill volume. Compare and comment on the width of this interval to the width of the interval in part (ii).
Exercise 4: (Example 3) To find out whether a new serum will arrest leukemia, 9 mice, all with an advanced stage of the disease, are selected. 5 mice receive the treatment and 4 do not. Survival times, in years, from the time the experiment commenced are as follows:
Treatment      2.1   5.3   1.4   4.6   0.9
No treatment   1.9   0.5   2.8   3.1
At the 0.05 level of significance, can the serum be said to be effective? Assume the two distributions to be of equal variances.
Exercise 5: (Tutorial 7, No.2) A new policy regarding overtime pay was implemented. This policy decreased the pay factor for overtime work. Neither the staffing pattern nor the work loads changed. To determine if overtime loads changed under the policy, a random sample of employees was selected. Their overtime hours for a randomly selected week before and for another randomly selected week after the policy change were recorded as follows: Employees: Before: After:
1 5
2 4
3
2 7
3 4 5 6 7 8 9 10 11 12 8 10 4 9 3 6 0 1 5 5 3 7 4 4 1 2 3 2 2
Assume that the two population variances are equal and the underlying population is normally distributed. (i) Is there any evidence to support the claim that the average number of hours worked as overtime per week changed after the policy went into effect. Use a P-value approach in arriving at this conclusion. (ii) Construct a 95% CI for the difference in mean before and after the policy change. Interpret this interval.
Exercise 6: The diameter of steel rods manufactured on two different extrusion machines is being investigated. Two random samples of sizes n1 = 15 and n2 = 17 are selected, and x̄1 = 8.37, s1² = 0.35 and x̄2 = 8.68, s2² = 0.40, respectively. Assume that data are drawn from a normal distribution with equal variances. (a) Is there evidence to support the claim that the two machines produce rods with different mean diameters? Use the P-value approach. (b) Construct a 95% CI on the difference in mean rod diameter.
Exercise 7: (Example 4)
The following data represent the running times of films produced by 2 motion-picture companies. Test the hypothesis that the average running time of films produced by company 2 exceeds the average running time of films produced by company 1 by 10 minutes against the one-sided alternative that the difference is less than 10 minutes. Use α = 0.01 and assume the distributions of times to be approximately normal with unequal variances.
Company   Time
X1        102   86    98   109   92
X2        81    165   97   134   92   87   114
Exercise 8: Two companies manufacture a rubber material intended for use in an automotive application. 25 samples of material from each company are tested, and the amount of wear after 1000 cycles is observed. For company 1, the sample mean and standard deviation of wear are x̄1 = 20.12 mg/1000 cycles and s1 = 1.9 mg/1000 cycles, and for company 2, we obtain x̄2 = 11.64 mg/1000 cycles and s2 = 7.9 mg/1000 cycles. (a) Do the sample data support the claim that the two companies produce material with different mean wear? Assume each population is normally distributed but that the variances are unequal. (b) Construct a 95% CI for the difference in mean wear of these two companies. Interpret this interval.
Exercise 9: (Tutorial 7, No.3)
Professor A claims that a probability and statistics student can increase his or her score on tests if the person is provided with a pre-test the week before the exam. To test her theory, she selected 16 probability and statistics students at random and gave these students a pre-test the week before an exam. She also selected an independent random sample of 12 students who were given the same exam but did not have access to the pre-test. The first group had a mean score of 79.4 with standard deviation 8.8. The second group had a sample mean score of 71.2 with standard deviation 7.9. (i) Do the data support Professor A's claim that the mean score of students who get a pre-test is different from the mean score of those who do not get a pre-test before an exam? Use the P-value approach and assume that the variances are not equal. (ii) Construct a 95% CI for the difference in mean score of students who get a pre-test and those who do not get a pre-test before an exam. Interpret this interval.
Exercise 10: (Example 5) A vote is to be taken among residents of a town and the surrounding county to determine whether a proposed chemical plant should be constructed. If 120 of 200 town voters favour the proposal and 240 of 500 county residents favour it, would you agree that the proportion of town voters favouring the proposal is higher than the proportion of county voters? Use α = 0.05.
Exercise 11: (Tutorial 7, No.4)
The rollover rate of sport utility vehicles is a transportation safety issue. Safety advocates claim that manufacturer A’s vehicle has a higher rollover rate than that of manufacturer B. One hundred crashes for each of these vehicles were examined. The rollover rates were pA = 0.35 and pB = 0.25.
(i) By using the P-value approach, does manufacturer A’s vehicle have a higher rollover rate than manufacturer B’s?
(ii) Construct a 95% CI on the difference in the two rollover rates of the vehicles. Interpret this interval.
Exercise 12: Professor Rady gave 58 A’s and B’s to a class of 125 students in his section of English 101. The next term, Professor Hady gave 45 A’s and B’s to a class of 115 students in his section of English 101. (i) By using a 5% significance level, test the claim that Professor Rady gives a higher percentage of A’s and B’s in English 101 than Professor Hady does. What is your comment? (ii) Construct a 95% CI on the difference in the percentage of A’s and B’s in English 101 given by these two professors.
8 5
Simple Linear Regression
Exercise 1: (Example 1) The manager of a car plant wishes to investigate how the plant’s electricity usage depends upon the plant production. The data are given below.
Production (RM million) (x)   4.51   3.58   4.31   5.06   5.64   4.99   5.29   5.83   4.7    5.61   4.9    4.2
Electricity usage (y)         2.48   2.26   2.47   2.77   2.99   3.05   3.18   3.46   3.03   3.26   2.67   2.53
(a) Estimate the linear regression equation Y = β0 + β1x.
(b) Estimate the electricity usage when x = 5.
(c) Find a 90% confidence interval for the electricity usage.
Exercise 2: An experiment was set up to investigate the variation of the specific heat of a certain chemical with temperature. The data are given below.
Temperature °F (x)   50     60     70     80     90     100
Heat (y)             1.60   1.63   1.67   1.70   1.71   1.71
                     1.64   1.65   1.67   1.72   1.72   1.74
(a) Estimate the linear regression equation Y = β0 + β1x.
(b) Plot the results on a scatter diagram.
(c) Estimate the specific heat when the temperature is 75°F.
(d) Find a 95% confidence interval for the specific heat.
Exercise 3: (Example 2) An engineer at a semiconductor company wants to model the relationship between the device HFE (y) and the parameter Emitter-RS (x1). Data for Emitter-RS were first collected, a statistical analysis was carried out, and the output is displayed in the table given.
Regression Analysis: y = 1075.2 – 63.87 x1
Predictor   Coef     SE Coef   T       P-value
Constant    1075.2   121.1     8.88    0.000
x1          -63.87   8.002     -7.98   0.000
S = 19.4    R-Sq = 0.78
Analysis of variance
Source       DF   SS      MS      F
Regression   1    23965   23965   63.70
Residual     18   6772    376
Total        19   30737
(a) Estimate HFE when the Emitter-RS is 14.5.
(b) Obtain a 95% confidence interval for the true slope β.
(c) Test for significance of regression for α = 0.05.
Exercise 4:
A chemical engineer wants to model the relationship between the purity of oxygen (y) produced in a chemical distillation process and the percentage of hydrocarbons (x) that are present in the main condenser of the distillation unit. A statistical analysis is carried out and the output is displayed in the table given.
Regression Analysis: y = 74.3 + 14.9x
Predictor   Coef     SE Coef   T       P-value
Constant    74.283   1.593     46.62   0.000
x1          14.947   1.317     11.35   0.000
S = 1.087   R-Sq = 87.7%
Analysis of variance
Source       DF   SS       MS       F
Regression   1    152.13   152.13   128.86
Residual     18   21.25    1.18
Total        19   173.38
(a) Estimate the purity of oxygen when the percentage of hydrocarbons is 1%.
(b) Obtain a 95% confidence interval for the true slope β.
(c) Test for significance of regression for α = 0.05.
Exercise 5: (Tutorial 8, No.1) Regression methods were used to analyze the data from a study investigating the relationship between roadway surface temperature (x) and pavement deflection (y). The data follow.
Temperature x   Deflection y   Temperature x   Deflection y
70.0            0.621          72.7            0.637
77.0            0.657          67.8            0.627
72.1            0.640          76.6            0.652
72.8            0.623          73.4            0.630
78.3            0.661          70.5            0.627
74.5            0.641          72.1            0.631
74.0            0.637          71.2            0.641
72.4            0.630          73.0            0.631
75.2            0.644          72.7            0.634
76.0            0.639          71.4            0.638
(a) Estimate the intercept and slope regression coefficients. Write the estimated regression line.
(b) Compute SSE and estimate the variance.
(c) Find the standard error of the slope and intercept coefficients.
(d) Show that
(e) Compute the coefficient of determination, R². Comment on the value.
(f) Use a t-test to test for significance of the intercept and slope coefficients at . Give the P-values of each and comment on your results.
(g) Construct the ANOVA table and test for significance of regression using the P-value. Comment on your results and their relationship to your results in part (f).
(h) Construct 95% CIs on the intercept and slope. Comment on the relationship of these CIs and your findings in parts (f) and (g).
Exercise 6: (Tutorial 8, No.2) The designers of a database information system that allows its users to search backwards for several days wanted to develop a formula to predict the time it would take to search. Actual elapsed time was measured for several different values of days. The measured data are shown in the following table:
Number of Days   1      2      4      8      16     25
Elapsed Time     0.65   0.79   1.36   2.26   3.59   5.39
(i) Estimate the intercept and slope regression coefficients. Write the estimated regression line.
(ii) Compute SSE and estimate the variance.
(iii) Find the standard error of the slope and intercept coefficients.
(iv) Show that
(v) Compute the coefficient of determination, R². Comment on the value.
(vi) Use a t-test to test for significance of the intercept and slope coefficients at . Give the P-values of each and comment on your results.
(vii) Construct the ANOVA table and test for significance of regression using the P-value. Comment on your results and their relationship to your results in part (vi).
(viii) Construct 95% CIs on the intercept and slope. Comment on the relationship of these CIs and your findings in parts (vi) and (vii).
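The least-squares quantities requested in parts (i) to (v) can be computed from first principles. The following is a minimal sketch on the Exercise 6 data using only the standard library; the variable names are illustrative, and the formulas are the usual simple linear regression estimators (slope Sxy/Sxx, variance estimate SSE/(n − 2)).

```python
import math

# Exercise 6 data: number of days searched (x) and elapsed time (y)
days = [1, 2, 4, 8, 16, 25]
elapsed = [0.65, 0.79, 1.36, 2.26, 3.59, 5.39]
n = len(days)

x_bar = sum(days) / n
y_bar = sum(elapsed) / n
sxx = sum((x - x_bar) ** 2 for x in days)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, elapsed))

b1 = sxy / sxx                 # slope estimate
b0 = y_bar - b1 * x_bar        # intercept estimate

fitted = [b0 + b1 * x for x in days]
sse = sum((y - f) ** 2 for y, f in zip(elapsed, fitted))   # error sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)                # regression sum of squares
sst = sum((y - y_bar) ** 2 for y in elapsed)               # total sum of squares

sigma2_hat = sse / (n - 2)                                 # variance estimate
se_b1 = math.sqrt(sigma2_hat / sxx)
se_b0 = math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / sxx))
r2 = ssr / sst                                             # coefficient of determination

print(round(b0, 4), round(b1, 4), round(sse, 4), round(sigma2_hat, 4),
      round(se_b0, 4), round(se_b1, 4), round(r2, 4))
```

The t statistics for part (vi) are `b0 / se_b0` and `b1 / se_b1`, each compared with a t-table value on n − 2 = 4 degrees of freedom.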
9 5
Multiple Linear Regressions
Exercise 1: (Example 1) Given the data:
Test Number   y     x1   x2
1             1.6   1    1
2             2.1   1    2
3             2.4   2    1
4             2.8   2    2
5             3.6   2    3
6             3.8   3    2
7             4.3   2    4
8             4.9   4    2
9             5.7   4    3
10            5     3    4
(a) Fit a multiple linear regression model to these data.
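One way to fit the model y = β0 + β1x1 + β2x2 is to form and solve the normal equations (X'X)b = X'y. The sketch below does this for the Exercise 1 data with a small Gaussian-elimination routine so that it needs only the standard library; `solve` is a hypothetical helper written for this illustration, and a linear algebra library would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

# Exercise 1 data
y = [1.6, 2.1, 2.4, 2.8, 3.6, 3.8, 4.3, 4.9, 5.7, 5.0]
x1 = [1, 1, 2, 2, 2, 3, 2, 4, 4, 3]
x2 = [1, 2, 1, 2, 3, 2, 4, 2, 3, 4]

# Design matrix with an intercept column, then the normal equations
X = [[1.0, a, b] for a, b in zip(x1, x2)]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]

beta = solve(XtX, Xty)                               # [b0, b1, b2]
fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print([round(b, 4) for b in beta])
```

A quick sanity check on any least-squares fit with an intercept is that the residuals sum to zero and are orthogonal to each regressor.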
Exercise 2: Given the data:
Observation Number   Pull Strength y   Wire Length x1   Die Height x2
1                    9.95              2                50
2                    24.45             8                110
3                    31.75             11               120
4                    35.00             10               550
5                    25.02             8                295
6                    16.86             4                200
7                    14.38             2                375
8                    9.60              2                52
9                    24.35             9                100
10                   27.50             8                300
11                   17.08             4                412
12                   37.00             11               400
13                   41.95             12               500
14                   11.66             2                360
15                   21.65             4                205
16                   17.89             4                400
17                   69.00             20               600
18                   10.30             1                585
19                   34.93             10               540
20                   46.59             15               250
21                   44.88             15               290
22                   54.12             16               510
23                   56.63             17               590
24                   22.13             6                100
25                   21.15             5                400
(b) Fit a multiple linear regression model to these data.
Exercise 3: A study was performed to investigate the shear strength of soil (y) as it related to depth in meters (x1) and percentage moisture content (x2). Ten observations were collected and the following summary quantities obtained:
n = 10, Σ x_i1 = 223, Σ x_i2 = 553, Σ y_i = 1,916,
Σ x_i1² = 5,200.9, Σ x_i2² = 31,729, Σ x_i1 x_i2 = 12,352,
Σ x_i1 y_i = 43,550.8, Σ x_i2 y_i = 104,736.8, Σ y_i² = 371,595.6
(a) Estimate the parameters to fit the multiple regression model for these data.
(b) What is the predicted strength when x1 = 18 meters and x2 = 43%?
Exercise 4: (Example 2) A set of experimental runs was made to determine a way of predicting cooking time y at various levels of oven width x1 and temperature x2. The data were recorded as follows:
y        x1      x2
6.4      1.32    1.15
15.05    2.69    3.4
18.75    3.56    4.1
30.25    4.41    8.75
44.86    5.35    14.82
48.94    6.3     15.15
51.55    7.12    15.32
61.5     8.87    18.18
100.44   9.8     35.19
111.42   10.65   40.4
(a) Fit a multiple linear regression model to these data.
(b) Estimate σ² and the standard errors of the regression coefficients.
(c) Test for significance of β1 and β2.
(d) Predict the useful range when brightness = 80 and contrast = 75. Construct a 95% PI.
(e) Compute the mean response of the useful range when brightness = 80 and contrast = 75. Compute a 95% CI.
(f) Interpret parts (d) and (e) and comment on the comparison between the 95% PI and 95% CI.
Exercise 5: (Tutorial 9, No.1) An article in Optical Engineering (“Operating Curve Extraction of a Correlator's Filter,” Vol. 43, 2004, pp. 2775–2779) reported the use of an optical correlator to perform an experiment by varying brightness and contrast. The resulting modulation is characterized by the useful range of gray levels. The data are shown below:

Brightness (%):      54   61   65  100  100  100   50   57   54
Contrast (%):        56   80   70   50   65   80   25   35   26
Useful range (ng):   96   50   50  112   96   80  155  144  255
(a) Fit a multiple linear regression model to these data.
(b) Estimate σ² and the standard errors of the regression coefficients.
(c) Test for significance of β1 and β2.
(d) Predict the useful range when brightness = 80 and contrast = 75. Construct a 95% PI.
(e) Compute the mean response of the useful range when brightness = 80 and contrast = 75. Compute a 95% CI.
(f) Interpret parts (d) and (e) and comment on the comparison between the 95% PI and 95% CI.
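The difference between the prediction interval in (d) and the confidence interval in (e) can be sketched as below. The sketch assumes the data read as nine (brightness, contrast, useful range) triples, uses the tabulated value 2.447 for t with 0.025 and 6 degrees of freedom, and takes x0 = (1, 80, 75). The helper name `inverse` is illustrative.

```python
from math import sqrt

# Optical correlator data from Exercise 5 (nine runs).
brightness = [54, 61, 65, 100, 100, 100, 50, 57, 54]
contrast = [56, 80, 70, 50, 65, 80, 25, 35, 26]
useful = [96, 50, 50, 112, 96, 80, 155, 144, 255]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [row[n:] for row in m]


X = [[1.0, float(u), float(v)] for u, v in zip(brightness, contrast)]
n, p = len(X), 3
XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
Xty = [sum(r[i] * yi for r, yi in zip(X, useful)) for i in range(p)]
C = inverse(XtX)
b = [sum(C[i][j] * Xty[j] for j in range(p)) for i in range(p)]
sse = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
          for r, yi in zip(X, useful))
sigma2 = sse / (n - p)

x0 = [1.0, 80.0, 75.0]                       # brightness = 80, contrast = 75
y0 = sum(bj * xj for bj, xj in zip(b, x0))   # point estimate for both intervals
leverage = sum(x0[i] * C[i][j] * x0[j] for i in range(p) for j in range(p))

t_crit = 2.447  # t table value for alpha/2 = 0.025 with n - p = 6 degrees of freedom
ci_half = t_crit * sqrt(sigma2 * leverage)        # mean response
pi_half = t_crit * sqrt(sigma2 * (1 + leverage))  # single future observation

print(f"95% CI: {y0 - ci_half:.2f} to {y0 + ci_half:.2f}")
print(f"95% PI: {y0 - pi_half:.2f} to {y0 + pi_half:.2f}")
```

Both intervals share the same center; the PI is always wider because it adds the variance of a new observation to the variance of the estimated mean.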
Exercise 6: (Tutorial 9, No.2) A study was performed on wear of a bearing y and its relationship to x1 = oil viscosity and x2 = load. The following data were obtained:

x1      x2      y
 1.6     851    293
15.5     816    230
22.0    1058    172
43.0    1201    113
33.0    1357    125
40.0    1115     91
(a) Fit a multiple regression model to these data.
(b) Estimate σ² and the standard errors of the regression coefficients.
(c) Use the model to predict wear when x1 = 25 and x2 = 1000.
(d) Fit a multiple regression model with an interaction term to these data.
(e) Estimate σ² and se(βj) for this new model. How did these quantities change? Does this tell you anything about the value of adding the interaction term to the model?
(f) Use the model in (d) to predict wear when x1 = 25 and x2 = 1000. Compare this prediction with the predicted value from part (c) above.
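Parts (a) through (f) compare a main-effects model with one that adds the product x1·x2 as a third regressor. The sketch below fits both by the normal equations, assuming the y values pair with x1 and x2 in the order they are listed in the exercise (an assumption, since the original layout is ambiguous). The helper names `solve`, `fit` and `predict` are illustrative.

```python
# Bearing wear data from Exercise 6 (row pairing is an assumption).
x1 = [1.6, 15.5, 22.0, 43.0, 33.0, 40.0]
x2 = [851.0, 816.0, 1058.0, 1201.0, 1357.0, 1115.0]
y = [293.0, 230.0, 172.0, 113.0, 125.0, 91.0]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(X, y):
    """Return least squares coefficients and the error sum of squares."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, sse


def predict(beta, row):
    return sum(bj * xj for bj, xj in zip(beta, row))


n = len(y)
X_main = [[1.0, a, c] for a, c in zip(x1, x2)]
X_int = [[1.0, a, c, a * c] for a, c in zip(x1, x2)]  # adds the x1*x2 column

beta_main, sse_main = fit(X_main, y)
beta_int, sse_int = fit(X_int, y)
sigma2_main = sse_main / (n - 3)
sigma2_int = sse_int / (n - 4)

print(f"main effects: sigma2 = {sigma2_main:.2f}, "
      f"prediction = {predict(beta_main, [1.0, 25.0, 1000.0]):.2f}")
print(f"interaction:  sigma2 = {sigma2_int:.2f}, "
      f"prediction = {predict(beta_int, [1.0, 25.0, 1000.0]):.2f}")
```

Adding a regressor can never increase SSE, but it costs one error degree of freedom, so the estimate of σ² can move in either direction; that trade-off is what part (e) asks about.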
10
Factorial Experiments – 2² Factorial Design
Exercise 1: (Example 1) An engineer is investigating the thickness of an epitaxial layer, which is subject to two levels of A, deposition time (+ for short time and – for long time), and two levels of B, arsenic flow rate (– for 55% and + for 59%). The engineer conducts a 2² factorial design with n = 4 replicates. The data are as follows:
                  Arsenic Level
Deposition Time   B– (Low – 55%)                 B+ (High – 59%)
A– (Long)         14.037 14.165 13.972 13.907    13.880 13.860 14.032 13.914
A+ (Short)        14.821 14.757 14.843 14.878    14.888 14.921 14.415 14.932
a) Construct the 2 × 2 factorial design table.
b) Estimate all main effects and the interaction.
c) Construct the ANOVA table and, for each effect, test the null hypothesis that the effect is equal to 0.

Exercise 2: (Tutorial No1) A two-factor experimental design was conducted to investigate the lifetime of a component being manufactured. The two factors are A (design) and B (cost of material). Two levels ((+) and (-)) of each factor are considered. Three components are manufactured with each combination of design and material, and the total lifetime measured (in hours) is as shown in the table below.
Treatment      Design   Material          Total lifetime of 3
Combination    A        B          AB     components (in hours)
(1)            -        -          +      122
a              +        -          -      60
b              -        +          -      120
ab             +        +          +      118
(a) Perform a two-way analysis of variance to estimate the effects of design and material expense on the component lifetime.
(b) Based on your results in part (a), what conclusions can you draw from the factorial experiment?
(c) Indicate which effects are significant to the lifetime of a component.
(d) Write the least squares fitted model using only the significant sources.
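The effect estimates and their sums of squares in a 2² design follow directly from the treatment totals. A minimal sketch, assuming the totals (1) = 122, a = 60, b = 120 and ab = 118 with n = 3 components per combination:

```python
n = 3  # components per treatment combination
totals = {"(1)": 122, "a": 60, "b": 120, "ab": 118}

# Contrasts use the signs of the A, B and AB columns of the design table.
contrasts = {
    "A": totals["a"] + totals["ab"] - totals["b"] - totals["(1)"],
    "B": totals["b"] + totals["ab"] - totals["a"] - totals["(1)"],
    "AB": totals["ab"] + totals["(1)"] - totals["a"] - totals["b"],
}

# Effect = contrast / (2n); sum of squares = contrast^2 / (4n).
effects = {name: c / (2 * n) for name, c in contrasts.items()}
sums_of_squares = {name: c ** 2 / (4 * n) for name, c in contrasts.items()}

for name in ("A", "B", "AB"):
    print(f"{name}: effect = {effects[name]:.3f}, SS = {sums_of_squares[name]:.3f}")
```

The error sum of squares cannot be recovered from the totals alone; it needs the individual lifetimes or the total sum of squares.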
Exercise 3: An engineer suspects that the surface finish of metal parts is influenced by the type of paint used and the drying time. He selected three drying times (20, 25, and 30 minutes) and used two types of paint. Three parts are tested with each combination of paint type and drying time. The data are as follows:

          Drying Time (min)
Paint     20 min      25 min      30 min
ICI       74 64 50    73 61 44    78 85 92
NIPPON    92 86 68    98 73 88    66 45 85
(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design. Include in your analysis hypothesis tests for each coefficient, as well as residual analysis. State your final conclusions about the adequacy of the model. Compare your results to part (c) and comment.
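Because drying time has three levels, the paint experiment is a 2 × 3 factorial, and a two-way analysis of variance is a natural way to organize it. The sketch below computes the cell means (the points of an interaction plot) and the sum of squares decomposition, assuming the data read as three replicates per paint and drying time cell.

```python
# Surface finish data: three replicates per (paint, drying time) cell.
data = {
    ("ICI", 20): [74, 64, 50], ("ICI", 25): [73, 61, 44], ("ICI", 30): [78, 85, 92],
    ("NIPPON", 20): [92, 86, 68], ("NIPPON", 25): [98, 73, 88],
    ("NIPPON", 30): [66, 45, 85],
}
paints = ["ICI", "NIPPON"]
times = [20, 25, 30]
n = 3
a, b = len(paints), len(times)
N = a * b * n

# Cell means are the values plotted in a two-factor interaction plot.
cell_means = {key: sum(v) / n for key, v in data.items()}

grand_total = sum(sum(v) for v in data.values())
correction = grand_total ** 2 / N
ss_total = sum(obs ** 2 for v in data.values() for obs in v) - correction
ss_paint = sum(sum(sum(data[p, t]) for t in times) ** 2 for p in paints) / (b * n) - correction
ss_time = sum(sum(sum(data[p, t]) for p in paints) ** 2 for t in times) / (a * n) - correction
ss_cells = sum(sum(v) ** 2 for v in data.values()) / n - correction
ss_inter = ss_cells - ss_paint - ss_time
ss_error = ss_total - ss_cells

ms_error = ss_error / (a * b * (n - 1))
f_paint = (ss_paint / (a - 1)) / ms_error
f_time = (ss_time / (b - 1)) / ms_error
f_inter = (ss_inter / ((a - 1) * (b - 1))) / ms_error

print(f"SS paint = {ss_paint:.2f}, SS time = {ss_time:.2f}, "
      f"SS interaction = {ss_inter:.2f}, SS error = {ss_error:.2f}")
print(f"F paint = {f_paint:.2f}, F time = {f_time:.2f}, F interaction = {f_inter:.2f}")
```

Each F ratio is compared with the tabulated F value for its numerator degrees of freedom and 12 error degrees of freedom.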
Exercise 4: (Tutorial 10, No.2) An experiment involves a storage battery used in the launching mechanism of a shoulder-fired ground-to-air missile. Two material types can be used to make the battery plates. The objective is to design a battery that is relatively unaffected by the ambient temperature. The output response from the battery is effective life in hours. Two temperature levels are selected, and a factorial experiment with four replicates is run. The data are as follows:

            Temperature (°F)
Material    Low                High
1           130 155  74 180     20  70  82  58
2           138 110 168 160     96 104  82  60
(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design. Include in your analysis hypothesis tests for each coefficient, as well as residual analysis. State your final conclusions about the adequacy of the model. Compare your results to part (c) and comment.
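For a replicated 2² design, parts (a), (c) and (d) reduce to a few lines of arithmetic. The sketch below makes several assumptions that are not stated in the exercise: A is material (type 1 is the low level), B is temperature (Low is the low level), the battery data read as four replicates per cell, and 2.179 is the tabulated t value for 0.025 and 12 degrees of freedom.

```python
from math import sqrt

# Battery life data, keyed by treatment combination (coding is an assumption).
obs = {
    "(1)": [130, 155, 74, 180],   # material 1, low temperature
    "a": [138, 110, 168, 160],    # material 2, low temperature
    "b": [20, 70, 82, 58],        # material 1, high temperature
    "ab": [96, 104, 82, 60],      # material 2, high temperature
}
n = 4
totals = {k: sum(v) for k, v in obs.items()}

effects = {
    "A": (totals["a"] + totals["ab"] - totals["b"] - totals["(1)"]) / (2 * n),
    "B": (totals["b"] + totals["ab"] - totals["a"] - totals["(1)"]) / (2 * n),
    "AB": (totals["ab"] + totals["(1)"] - totals["a"] - totals["b"]) / (2 * n),
}

# Pooled within-cell variance with 4(n - 1) degrees of freedom.
ss_error = sum((value - sum(v) / n) ** 2 for v in obs.values() for value in v)
sigma2 = ss_error / (4 * (n - 1))

# Each effect is a difference of two means of 2n observations, so its
# variance is sigma^2/(2n) + sigma^2/(2n) = sigma^2/n.
se_effect = sqrt(sigma2 / n)

t_crit = 2.179  # t table value for alpha/2 = 0.025 with 12 degrees of freedom
for name, effect in effects.items():
    t_ratio = effect / se_effect
    low, high = effect - t_crit * se_effect, effect + t_crit * se_effect
    print(f"{name}: effect = {effect:.3f}, t = {t_ratio:.2f}, "
          f"95% CI = ({low:.2f}, {high:.2f})")
```

An effect is judged significant at the 5% level when its t ratio exceeds 2.179 in absolute value, which is equivalent to its 95% CI excluding zero.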
Exercise 5: An article in the IEEE Transactions on Semiconductor Manufacturing (Vol. 5, 1992, pp. 214–222) describes an experiment to investigate the surface charge on a silicon wafer. The factors thought to influence induced surface charge are cleaning method (spin rinse dry, or SRD, and spin dry, or SD) and the position on the wafer where the charge was measured. The surface charge (×10¹¹ q/cm³) response data are shown below.

Cleaning    Test Position
Method      L                      R
SD           1.66   1.90   1.92     1.84   1.84   1.62
SRD         -4.21  -1.35  -2.08    -7.58  -2.20  -5.36
(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design. Include in your analysis hypothesis tests for each coefficient, as well as residual analysis. State your final conclusions about the adequacy of the model. Compare your results to part (c) and comment.