p, so that the sample mean converges stochastically to p. We will show this in a more general setting in a later section.
The normal distribution also is referred to frequently as the Gaussian distribution.
3.3  SPECIAL CONTINUOUS DISTRIBUTIONS
The normal distribution arises frequently in physical problems; there is a theoretical reason for this, which will be developed in Chapter 7. First, we will verify that the normal pdf integrates to 1, and then we will verify that the values of the parameters μ and σ² are indeed the mean and variance of X. Making the change of variable z = (x − μ)/σ, with dx = σ dz, gives
∫_{−∞}^{∞} f(x; μ, σ) dx = ∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz = (2/√(2π)) ∫_{0}^{∞} e^{−z²/2} dz

If we let w = z²/2, then z = √(2w) and dz = (w^{−1/2}/√2) dw, so

(2/√(2π)) ∫_{0}^{∞} e^{−z²/2} dz = (1/√π) ∫_{0}^{∞} w^{1/2−1} e^{−w} dw = Γ(1/2)/√π = 1

which follows from equation (3.3.7).
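As a quick numeric sanity check (not part of the original text), the two facts used above, Γ(1/2) = √π and the normalization of the standard normal pdf, can be confirmed directly with Python's standard library:

```python
import math

# Numeric check: Gamma(1/2) = sqrt(pi), and the standard normal
# pdf integrates to 1.

def phi(z):
    """Standard normal pdf."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

gamma_half = math.gamma(0.5)
assert abs(gamma_half - math.sqrt(math.pi)) < 1e-12

# Midpoint-rule integration over [-10, 10]; the tails beyond +/-10
# contribute a negligible amount of probability.
n = 200_000
h = 20.0 / n
total = sum(phi(-10.0 + (i + 0.5) * h) for i in range(n)) * h
print(abs(total - 1.0) < 1e-9)  # True
```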
The integrand obtained following the substitution z = (x − μ)/σ is an important special case known as the standard normal pdf. We will adopt a special notation for this pdf, namely

φ(z) = (1/√(2π)) e^{−z²/2},  −∞ < z < ∞   (3.3.29)

If Z has pdf (3.3.29), then Z ~ N(0, 1), and the standard normal CDF is given by

Φ(z) = ∫_{−∞}^{z} φ(t) dt   (3.3.30)

Some basic geometric properties of the standard normal pdf can be obtained by the methods of calculus. Notice that

φ(−z) = φ(z)   (3.3.31)

for all real z, so φ(z) is an even function of z. In other words, the standard normal distribution is symmetric about z = 0. Furthermore, because of the special form of φ(z) we have
φ′(z) = −z φ(z)   (3.3.32)

and

φ″(z) = (z² − 1) φ(z)   (3.3.33)

Consequently, φ(z) has a unique maximum at z = 0 and inflection points at z = ±1. Note also that φ(z) → 0 and z φ(z) = z/[√(2π) exp(z²/2)] → 0 as z → ±∞.
CHAPTER 3 SPECIAL PROBABILITY DISTRIBUTIONS
It is also possible, using equations (3.3.32) and (3.3.33), to find E(Z) and E(Z2). Specifically,
E(Z) = ∫_{−∞}^{∞} z φ(z) dz = −φ(z) |_{−∞}^{∞} = 0

and

E(Z²) = ∫_{−∞}^{∞} z² φ(z) dz = −z φ(z) |_{−∞}^{∞} + ∫_{−∞}^{∞} φ(z) dz = 0 + 1 = 1

Similar results follow for the more general case X ~ N(μ, σ²). Based on the substitution z = (x − μ)/σ, we have
E(X) = ∫_{−∞}^{∞} x (1/(σ√(2π))) exp[−(1/2)((x − μ)/σ)²] dx
     = ∫_{−∞}^{∞} (μ + σz) φ(z) dz
     = μ ∫_{−∞}^{∞} φ(z) dz + σ ∫_{−∞}^{∞} z φ(z) dz
     = μ

and

E(X²) = ∫_{−∞}^{∞} x² (1/(σ√(2π))) exp[−(1/2)((x − μ)/σ)²] dx
      = ∫_{−∞}^{∞} (μ + σz)² φ(z) dz
      = μ² ∫_{−∞}^{∞} φ(z) dz + 2μσ ∫_{−∞}^{∞} z φ(z) dz + σ² ∫_{−∞}^{∞} z² φ(z) dz
      = μ² + σ²

It follows that Var(X) = E(X²) − μ² = (μ² + σ²) − μ² = σ².
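The mean and variance derivation above can be checked numerically; the following sketch (not part of the text) integrates the N(μ, σ²) pdf by the midpoint rule, with μ = 2 and σ = 3 as arbitrary illustration values:

```python
import math

# Numeric check: for X ~ N(mu, sigma^2), E(X) = mu and Var(X) = sigma^2.
# mu = 2 and sigma = 3 are arbitrary illustration values.
mu, sigma = 2.0, 3.0

def f(x):
    """pdf of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

# Midpoint-rule integration over mu +/- 10 sigma (the tails beyond
# that carry negligible probability).
n = 200_000
a, b = mu - 10 * sigma, mu + 10 * sigma
h = (b - a) / n
m0 = m1 = m2 = 0.0
for i in range(n):
    x = a + (i + 0.5) * h
    w = f(x) * h
    m0 += w          # integral of the pdf (should be 1)
    m1 += x * w      # E(X)
    m2 += x * x * w  # E(X^2)

var = m2 - m1 ** 2
print(round(m0, 6), round(m1, 6), round(var, 6))  # 1.0 2.0 9.0
```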
FIGURE 3.4  A normal pdf
The graph of y = f(x; μ, σ) is shown in Figure 3.4. The general topic of transformation of variables will be discussed later, but it is convenient to consider the following theorem at this point.
Theorem 3.3.4  If X ~ N(μ, σ²), then

1.  Z = (X − μ)/σ ~ N(0, 1)   (3.3.34)
2.  F(x) = Φ((x − μ)/σ)

Proof
F_Z(z) = P[Z ≤ z] = P[(X − μ)/σ ≤ z] = P[X ≤ μ + zσ]

After the substitution w = (x − μ)/σ, we have

F_Z(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−w²/2} dw = Φ(z)

Part 1 follows by differentiation, f_Z(z) = F′_Z(z) = φ(z).
We obtain Part 2 as follows:

F(x) = P[X ≤ x] = P[(X − μ)/σ ≤ (x − μ)/σ] = Φ((x − μ)/σ)   (3.3.35)
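The relationship of Part 2 is easy to confirm computationally; the following sketch uses Python's standard-library `statistics.NormalDist` (a modern convenience, not part of the text), with arbitrary parameter values μ = 10 and σ = 4:

```python
from statistics import NormalDist

# Check: the CDF of X ~ N(mu, sigma^2) equals Phi((x - mu)/sigma).
# mu = 10, sigma = 4 are arbitrary illustration values.
mu, sigma = 10.0, 4.0
X = NormalDist(mu=mu, sigma=sigma)
Z = NormalDist()  # standard normal N(0, 1)

for x in (2.0, 10.0, 14.5):
    assert abs(X.cdf(x) - Z.cdf((x - mu) / sigma)) < 1e-12
print("F(x) = Phi((x - mu)/sigma) verified at sample points")
```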
Standard normal cumulative probabilities, 'I(z), are provided in Table 3 in Appendix C for positive values of z. Because of the symmetry of the normal density, cumulative probabilities may be determined for negative values of z by the relationship
Φ(−z) = 1 − Φ(z)   (3.3.36)
We will let z_γ denote the γth percentile of the standard normal distribution, Φ(z_γ) = γ. For example, for γ = 0.95, z_0.95 = 1.645 from Table 3 (Appendix C). By symmetry, we know that Φ(−1.645) = 1 − 0.95 = 0.05. That is, z_0.05 = −z_{1−0.05}. It follows that

P[z_0.05 < Z < z_0.95] = P[−z_{1−0.05} < Z < z_{1−0.05}] = 0.90   (3.3.37)
Some authors find it more convenient to use z_α to denote the value that has area α to the right; however, we will use the notation above, which is more consistent with our notation for percentiles. Thus, in general,

P[−z_{1−α/2} < Z < z_{1−α/2}] = 1 − α   (3.3.38)

which corresponds to equation (3.3.37) with α = 0.10 and where z_{1−α/2} = z_0.95 = 1.645. Similarly, if α = 0.05, then z_{1−α/2} = z_0.975 = 1.96, and

P[−1.96 < Z < 1.96] = 0.95

More generally, if X ~ N(μ, σ²), then

P[μ − 1.96σ < X < μ + 1.96σ] = Φ(1.96) − Φ(−1.96) = 0.95

That is, 95% of the area under a normal pdf is within 1.96 standard deviations of the mean, 90% is within 1.645 standard deviations, and so on.
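The percentile values quoted above can be reproduced with the standard library's `statistics.NormalDist`, used here in place of Table 3 (a convenience, not part of the text's approach):

```python
from statistics import NormalDist

# Numerical confirmation of the percentile facts above.
Z = NormalDist()  # standard normal

z95 = Z.inv_cdf(0.95)             # the 95th percentile z_0.95
print(round(z95, 3))              # 1.645

# Symmetry: z_0.05 = -z_0.95
print(round(Z.inv_cdf(0.05), 3))  # -1.645

# Equation (3.3.38) with alpha = 0.05
prob = Z.cdf(1.96) - Z.cdf(-1.96)
print(round(prob, 3))             # 0.95
```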
Example 3.3.4  Let X represent the lifetime in months of a battery, and assume that approximately X ~ N(60, 36). The fraction of batteries that will fail within a four-year warranty period is given by

P[X ≤ 48] = Φ((48 − 60)/6) = Φ(−2) = 0.0228

If one wished to know what warranty period would correspond to 5% failures, then

P[X ≤ x_0.05] = Φ((x_0.05 − 60)/6) = 0.05

which means that (x_0.05 − 60)/6 = −1.645, and x_0.05 = −1.645(6) + 60 = 50.13 months.
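Both computations of this example can be reproduced with the standard library's `statistics.NormalDist` (a modern convenience, not part of the text's Table 3 approach):

```python
from statistics import NormalDist

# Example 3.3.4: X ~ N(60, 36), so sigma = 6.
X = NormalDist(mu=60, sigma=6)

p_fail = X.cdf(48)       # fraction failing within the 48-month warranty
print(round(p_fail, 4))  # 0.0228

x05 = X.inv_cdf(0.05)    # warranty period corresponding to 5% failures
print(round(x05, 2))     # 50.13
```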
In general, we see that the 100 × pth percentile is x_p = μ + z_p σ. In the previous example, note also that

P[X ≤ 0] = Φ((0 − 60)/6) = Φ(−10) ≈ 0   (3.3.39)
Thus, although the normal random variable theoretically takes on values over the whole real line, it still may provide a reasonable model for a variable that takes on only positive values if very little probability is associated with the negative values. Another possibility is to consider a truncated normal model when the variable must be positive, although we need not bother with that here; the probability assigned to the negative values is so small that the truncated model would be essentially the same as the untruncated model. Of course, there is still no guarantee that the normal model is a good choice for this variable. In particular, the normal distribution is symmetric, and it is not uncommon for lifetimes to follow skewed distributions. The question of model selection is a statistical topic to be considered later, but we will not exclude the possibility of using a normal model for positive variables in examples, with the understanding that it may be approximating the more theoretically correct truncated normal model. Some additional properties are given by the following theorem.
Theorem 3.3.5  If X ~ N(μ, σ²), then

M_X(t) = exp(μt + σ²t²/2)   (3.3.40)

E[(X − μ)^{2r}] = (2r)! σ^{2r} / (2^r r!),  r = 1, 2, ...   (3.3.41)

E[(X − μ)^{2r−1}] = 0,  r = 1, 2, ...   (3.3.42)
Proof  To show equation (3.3.40), we note that the MGF for a standard normal random variable is given by

M_Z(t) = ∫_{−∞}^{∞} e^{tz} (1/√(2π)) e^{−z²/2} dz
       = e^{t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z−t)²/2} dz
       = e^{t²/2}

The integral of the first factor in the second integral is 1, because it is the integral of a normal pdf with mean t and variance 1. Because X = σZ + μ,

M_X(t) = M_{σZ+μ}(t) = e^{μt} M_Z(σt) = exp(μt + σ²t²/2)

Equations (3.3.41) and (3.3.42) follow from the series expansion

e^{σ²t²/2} = Σ_{r=0}^{∞} (σ²t²/2)^r / r! = Σ_{r=0}^{∞} [σ^{2r} (2r)! / (2^r r!)] t^{2r}/(2r)!

This expansion contains only even integer powers of t, and the coefficient of t^{2r}/(2r)! is the 2rth moment of (X − μ).
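The central-moment formulas (3.3.41) and (3.3.42) can be checked by numerical integration; the following sketch (not part of the text) uses arbitrary illustration values μ = 1 and σ = 2:

```python
import math
from statistics import NormalDist

# Numeric check of (3.3.41) and (3.3.42):
# E[(X - mu)^(2r)] = (2r)! sigma^(2r) / (2^r r!), odd central moments 0.
mu, sigma = 1.0, 2.0
X = NormalDist(mu=mu, sigma=sigma)

def central_moment(k, n=80_000, width=10):
    """Midpoint-rule approximation of E[(X - mu)^k]."""
    a = mu - width * sigma
    h = 2 * width * sigma / n
    return sum((a + (i + 0.5) * h - mu) ** k * X.pdf(a + (i + 0.5) * h)
               for i in range(n)) * h

for r in (1, 2, 3):
    predicted = (math.factorial(2 * r) * sigma ** (2 * r)
                 / (2 ** r * math.factorial(r)))
    assert abs(central_moment(2 * r) - predicted) < 1e-4 * predicted
    assert abs(central_moment(2 * r - 1)) < 1e-6  # odd moments vanish
print("central moments agree with (3.3.41) and (3.3.42)")
```

For r = 2, for instance, the formula gives the familiar fourth central moment 3σ⁴.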
The mean of a normal distribution is an example of a special type of parameter known as a location parameter, and the standard deviation is a scale parameter.
3.4 LOCATION AND SCALE PARAMETERS

In each of the following definitions, F0(z) represents a completely specified CDF, and f0(z) is the corresponding pdf.
Definition 3.4.1  Location Parameter
A quantity η is a location parameter for the distribution of X if the CDF has the form

F(x; η) = F0(x − η)   (3.4.1)

In other words, the pdf has the form

f(x; η) = f0(x − η)   (3.4.2)
Example 3.4.1  A distribution that often is encountered in life-testing applications has pdf

f(x; η) = e^{−(x−η)},  x > η

and zero otherwise. The location parameter, η, in this application usually is called a threshold parameter, because the probability of a failure before η is zero. This is illustrated in Figure 3.5.

FIGURE 3.5  An exponential pdf with a threshold parameter
Example 3.4.2  It is more common for a location parameter to be a measure of central tendency of X, such as a mean or a median. Consider the pdf

f0(z) = (1/2) e^{−|z|},  −∞ < z < ∞

If X has pdf of the form

f(x; η) = (1/2) e^{−|x−η|},  −∞ < x < ∞

FIGURE 3.6  A double-exponential pdf with a location parameter
then the location parameter, η, is the mean of the distribution. Because f(x; η) is symmetric about η and has a unique maximum, η is also the median and the mode in this example. This is illustrated in Figure 3.6. The notion of a scale parameter was mentioned earlier in this chapter. A more precise definition now will be given.
Definition 3.4.2  Scale Parameter
A positive quantity θ is a scale parameter for the distribution of X if the CDF has the form

F(x; θ) = F0(x/θ)   (3.4.3)

In other words, the pdf has the form

f(x; θ) = (1/θ) f0(x/θ)   (3.4.4)
A frequently encountered example of a random variable whose distribution has a scale parameter is X ~ EXP(θ). The standard deviation, σ, often turns out to be a scale parameter, but sometimes it is more convenient to use something else. For example, if X ~ WEI(θ, 2), then θ is a scale parameter, but it is not the standard deviation of X. Often, both types of parameters are required.
Definition 3.4.3  Location-Scale Parameters
Quantities η and θ > 0 are called location-scale parameters for the distribution of X if the CDF has the form

F(x; θ, η) = F0((x − η)/θ)   (3.4.5)

In other words, the pdf has the form

f(x; θ, η) = (1/θ) f0((x − η)/θ)   (3.4.6)
The normal distribution is the most commonly encountered location-scale distribution, but there are other important examples.
Example 3.4.3  Consider a pdf of the form

f0(z) = 1/[π(1 + z²)],  −∞ < z < ∞   (3.4.7)
If X has pdf of the form (1/θ)f0[(x − η)/θ], with f0(z) given by equation (3.4.7), then X is said to have the Cauchy distribution with location-scale parameters η and θ, denoted

X ~ CAU(θ, η)   (3.4.8)

It is easy to show that the mean and variance of X do not exist, so η and θ cannot be related to a mean and standard deviation. We still can interpret η as either the median or the mode.
Another location-scale distribution, which is frequently encountered in life-testing applications, has pdf

f(x; θ, η) = (1/θ) exp[−(x − η)/θ],  x > η   (3.4.9)

and zero otherwise. This is called the two-parameter exponential distribution, denoted by

X ~ EXP(θ, η)   (3.4.10)
A location-scale distribution based on the pdf f0(z) of Example 3.4.2 is called the Laplace or double-exponential distribution, denoted by

X ~ DE(θ, η)   (3.4.11)
It also is possible to define three-parameter models if we replace f0(z) with a pdf, f0(z; β), that depends on another parameter, say β. For example, if Z ~ WEI(1, β), then X has the three-parameter Weibull distribution, with location-scale parameters η and θ and shape parameter β, if its pdf is of the form f(x; θ, η, β) = (1/θ)f0[(x − η)/θ; β]. Similarly, if Z ~ GAM(1, κ), then X has the three-parameter gamma distribution. These are denoted, respectively, by X ~ WEI(θ, η, β) and X ~ GAM(θ, η, κ).
SUMMARY
The purpose of this chapter was to develop special probability distributions. Special discrete distributions, such as the binomial, hypergeometric, negative binomial, and Poisson distributions, provide useful models for experiments that involve counting or other integer-valued responses. Special continuous distributions, such as the uniform, exponential, gamma, Weibull, and normal distributions, provide useful models when experiments involve measurements on a continuous scale such as time, length, or weight.
EXERCISES
1.  An office has 10 dot matrix printers. Each requires a new ribbon approximately every seven weeks. If the stock clerk finds at the beginning of a certain week that there are only five ribbons in stock, what is the probability that the supply will be exhausted during that week?
2.  In a 10-question true-false test:
(a) What is the probability of getting all answers correct by guessing?
(b) What is the probability of getting eight correct by guessing?
3.  A basketball player shoots 10 shots and the probability of hitting is 0.5 on each shot.
(a) What is the probability of hitting eight shots?
(b) What is the probability of hitting eight shots if the probability on each shot is 0.6?
(c) What are the expected value and variance of the number of shots hit if p = 0.5?

4.  A four-engine plane can fly if at least two engines work.
(a) If the engines operate independently and each malfunctions with probability q, what is the probability that the plane will fly safely?
(b) A two-engine plane can fly if at least one engine works. If an engine malfunctions with probability q, what is the probability that the plane will fly safely?
(c) Which plane is the safest?
5.  (a) The Chevalier de Mere used to bet that he would get at least one 6 in four rolls of a die. Was this a good bet?
(b) He also bet that he would get at least one pair of 6's in 24 rolls of two dice. What was his probability of winning this bet?
(c) Compare the probability of at least one 6 when six dice are rolled with the probability of at least two 6's when 12 dice are rolled.
6.  If the probability of picking a winning horse in a race is 0.2, and if X is the number of winning picks out of 20 races, what is:
(a) P[X = 4].
(b) P[X ≤ 4].
(c) E(X) and Var(X).
7.  If X ~ BIN(n, p), derive E(X) using Definition 2.2.3.
8.  A jar contains 30 green jelly beans and 20 purple jelly beans. Suppose 10 jelly beans are selected at random from the jar.
(a) Find the probability of obtaining exactly five purple jelly beans if they are selected with replacement.
(b) Find the probability of obtaining exactly five purple jelly beans if they are selected without replacement.
9. An office has 10 employees, three men and seven women. The manager chooses four at random to attend a short course on quality improvement.
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
EXERCISES
129
(a) What is the probability that an equal number of men and women are chosen?
(b) What is the probability that more women are chosen?

10.  Five cards are drawn without replacement from a regular deck of 52 cards. Give the probability of each of the following events:
(a) Exactly two aces.
(b) Exactly two kings.
(c) Less than two aces.
(d) At least two aces.
11.  A shipment of 50 mechanical devices consists of 42 good ones and eight defective. An inspector selects five devices at random without replacement.
(a) What is the probability that exactly three are good?
(b) What is the probability that at most three are good?

12.  Repeat Exercise 10 if cards are drawn with replacement.
13.  A man pays $1 a throw to try to win a $3 Kewpie doll. His probability of winning on each throw is 0.1.
(a) What is the probability that two throws will be required to win the doll?
(b) What is the probability that x throws will be required to win the doll?
(c) What is the probability that more than three throws will be required to win the doll?
(d) What is the expected number of throws needed to win a doll?

14.  Three men toss coins to see who pays for coffee. If all three match, they toss again. Otherwise, the "odd man" pays for coffee.
(a) What is the probability that they will need to do this more than once?
(b) What is the probability of tossing at most twice?
15.  The man in Exercise 13 has three children, and he must win a Kewpie doll for each one.
(a) What is the probability that 10 throws will be required to win the three dolls?
(b) What is the probability that at least four throws will be required?
(c) What is the expected number of throws needed to win three dolls?
16.  Consider a seven-game world series between team A and team B, where for each game P(A wins) = 0.6.
(a) Find P(A wins the series in x games).
(b) You hold a ticket for the seventh game. What is the probability that you will get to use it?
(c) If P(A wins a game) = p, what value of p maximizes your chance in (b)?
(d) What is the most likely number of games to be played in the series for p = 0.6?
17.  The probability of a successful missile launch is 0.9. Test launches are conducted until three successful launches are achieved. What is the probability of each of the following?
(a) Exactly six launches will be required.
(b) Fewer than six launches will be required.
(c) At least four launches will be required.
18.  Let X ~ GEO(p).
(a) Derive the MGF of X.
(b) Find the FMGF of X.
(c) Find E(X).
(d) Find E[X(X − 1)].
(e) Find Var(X).
19.  Let X ~ NB(r, p).
(a) Derive the MGF of X.
(b) Find E(X).
(c) Find Var(X).
20.  Suppose an ordinary six-sided die is rolled repeatedly, and the outcome (1, 2, 3, 4, 5, or 6) is noted on each roll.
(a) What is the probability that the third 6 occurs on the seventh roll?
(b) What is the probability that the number of rolls until the first 6 occurs is at most 10?
21.  The number of calls that arrive at a switchboard during one hour is Poisson distributed with mean μ = 10. Find the probability of occurrence during an hour of each of the following events:
(a) Exactly seven calls arrive.
(b) At most seven calls arrive.
(c) Between three and seven calls (inclusive) arrive.
22.  If X has a Poisson distribution and if P[X = 0] = 0.2, find P[X > 4].

23.  A certain assembly line produces electronic components, and defective components occur independently with probability 0.01. The assembly line produces 500 components per hour.
(a) For a given hour, what is the probability that the number of defective components is at most two?
(b) Give the Poisson approximation for (a).
24.  The probability that a certain type of electronic component will fail during the first hour of operation is 0.005. If 400 components are tested independently, find the Poisson approximation of the probability that at most two will fail during the first hour.

25.  Suppose that 3% of the items produced by an assembly line are defective. An inspector selects 100 items at random from the assembly line. Approximate the probability that exactly five defectives are selected.
26.  The number of vehicles passing a certain intersection in the time interval [0, t] is a Poisson process X(t) with mean E[X(t)] = 3t, where the unit of time is minutes.
(a) Find the probability that at least two vehicles will pass during a given minute.
(b) Define the events A = [at least four vehicles pass during the first minute] and B = [at most two vehicles pass during the second minute]. Find the probability that both A and B occur.
27.  Let X ~ POI(μ).
(a) Find the factorial moment generating function (FMGF) of X, G_X(t).
(b) Use G_X(t) to find E(X).
(c) Use G_X(t) to find E[X(X − 1)].
28.  Suppose that X ~ POI(10).
(a) Find P[5 < X < 15].
(b) Use the Chebychev Inequality to find a lower bound for P[5 < X < 15].
Find a lower bound for P[1 - k
30.  Let X ~ DU(N). Derive the MGF of X. Hint: Make use of the identity

s + s² + ... + s^N = s(1 − s^N)/(1 − s),  for s ≠ 1
31.  Let X ~ UNIF(a, b). Derive the MGF of X.

32.  The hardness of a certain alloy (measured on the Rockwell scale) is a random variable X. Assume that X ~ UNIF(50, 75).
(a) Give the CDF of X.
(b) Find P[60 ...].
33.  If Q ~ UNIF(0, 3), find the probability that the roots of the equation g(t) = 0 are real, where g(t) = 4t² + 4Qt + Q + 2.

34.  Suppose a value x is chosen at random in the interval [0, 10]; in other words, x is an observed value of a random variable X ~ UNIF(0, 10). The value x divides the interval [0, 10] into two subintervals.
(a) Find the CDF of the length of the shorter subinterval.
(b) What is the probability that the ratio of lengths of the shorter to the longer subinterval is less than 1/4?
35.  Prove that Γ(1/2) = √π. Hint: Use the following steps.
(a) In the integral Γ(1/2) = ∫₀^∞ t^{−1/2} e^{−t} dt, make the substitution x = √t, so that Γ(1/2) = 2 ∫₀^∞ e^{−x²} dx.
(b) Change to polar coordinates in the double integral

[Γ(1/2)]² = 4 ∫₀^∞ ∫₀^∞ exp[−(x² + y²)] dx dy
36.  Use the properties of Theorem 3.3.1 to find each of the following:
(a) Γ(5).
(b) Γ(5/2).
(c) Give an expression for the binomial coefficient, C(n, r), in terms of the gamma function.
37.  The survival time (in days) of a white rat that was subjected to a certain level of X-ray radiation is a random variable X ~ GAM(5, 4). Use Theorem 3.3.2 to find:
(a) P[X ≤ 15].
(b) P[15 < X < 20].
(c) The expected survival time, E(X).
38.  The time (in minutes) until the third customer of the day enters a store is a random variable X ~ GAM(1, 3). If the store opens at 8 A.M., find the probability that:
(a) the third customer arrives between 8:05 and 8:10;
(b) the third customer arrives after 8:10.
(c) Sketch the graph of the pdf of X.
39.  Suppose that for the variable Q of Exercise 33, instead of a uniform distribution we assume Q ~ EXP(1.5). Find the probability that the roots of g(t) = 0 are real.
40.  Assume that the time (in hours) until failure of a transistor is a random variable X ~ EXP(100).
(a) Find the probability that X > 15.
(b) Find the probability that X > 110.
(c) It is observed after 95 hours that the transistor still is working. Find the conditional probability that X > 110. How does this compare to (a)? Explain this result.
(d) What is Var(X)?
41.  If X ~ GAM(1, 2), find the mode of X.

42.  For a switchboard, suppose the time X (in minutes) until the third call of the day arrives is gamma distributed with scale parameter θ = 2 and shape parameter κ = 3. If the switchboard is activated at 8 A.M., find the probability that the third call arrives before 8:06 A.M.
43.  If X ~ WEI(θ, β), derive E(X^k), assuming that k > −β.
44.  Suppose X ~ PAR(θ, κ).
(a) Derive E(X); κ > 1.
(b) Derive E(X²); κ > 2.
45.  If X ~ PAR(100, 3), find E(X) and Var(X).

46.  The shear strength (in pounds) of a spot weld is a Weibull distributed random variable X ~ WEI(400, 2/3).
(a) Find P[X > 410].
(b) Find the conditional probability P[X > 410 | X > 390].
(c) Find E(X).
(d) Find Var(X).
47.  The distance (in meters) that a bomb hits from the center of a target area is a random variable X ~ WEI(10, 2).
(a) Find the probability that the bomb hits at least 20 meters from the center of the target.
(b) Sketch the graph of the pdf of X.
(c) Find E(X) and Var(X).
48.  Suppose that X ~ PAR(θ, κ).
(a) Derive the 100 × pth percentile of X.
(b) Find the median of X if θ = 10 and κ = 2.
49.  Rework Exercise 37 assuming that, rather than being gamma distributed, the survival time is a random variable X ~ PAR(4, 1.2).
50.  Rework Exercise 40 assuming that, rather than being exponential, the failure time has a Pareto distribution X ~ PAR(100, 2).
51.
Suppose that Z N(0, 1). Find the following probabilities: P(Z 1.53).
P(Z> 0.49). P(0.35
P(IZI
52.  Suppose that X ~ N(3, 0.16). Find the following probabilities:
(a) P(X > 3).
(b) P(X > 3.3).
(c) P(2.8 ≤ X ≤ 3.1).
(d) Find the 98th percentile of X.
(e) Find the value c such that P(3 − c < X < 3 + c) = 0.90.
53.  The Rockwell hardness of a metal specimen is determined by impressing the surface of the specimen with a hardened point and then measuring the depth of penetration. The hardness of a certain alloy is normally distributed with mean of 70 units and standard deviation of 3 units.
(a) If a specimen is acceptable only if its hardness is between 66 and 74 units, what is the probability that a randomly chosen specimen is acceptable?
(b) If the acceptable range is 70 ± c, for what value of c would 95% of all specimens be acceptable?
54.  Suppose that X ~ N(10, 16). Find:
(a) P[X ≤ 14].
(b) P[4 ≤ X ≤ 18].
(c) P[2X − 10 ≤ 18].
(d) x_0.95, the 95th percentile of X.
55.  Assume the amount of light X (in lumens) produced by a certain type of light bulb is normally distributed with mean μ = 350 and variance σ² = 400. Find P[325 ...].
56.  Suppose that X ~ N(1, 2).
(a) Find E(X − 1).
(b) Find E(X⁴).
57.  Suppose the computer store in Exercise 26 of Chapter 2 expands its marketing operation and orders 10 copies of the software package. As before, the annual demand is a random variable X, and unsold copies are discarded, but assume now that X ~ BIN(10, p).
(a) Find the expected net profit to the store as a function of p.
(b) How large must p be to produce a positive expected net profit?
(c) If instead X ~ POI(2), would the store make a greater expected net profit by ordering more copies of the software?
58.  Consider the following continuous analog of Exercise 57. Let X represent the annual demand for some commodity that is measured on a continuous scale, such as a liquid pesticide, which can be measured in gallons (or fractions thereof). At the beginning of the year, a farm-supply store orders c gallons at d₁ dollars per gallon and sells it to customers at d₂ dollars per gallon. The pesticide loses effectiveness if it is stored during the off-season, so any amount unsold at the end of the year is a loss.
(a) If S is the amount sold, show that

E(S) = ∫₀^c x f(x) dx + c[1 − F(c)]

(b) Show that the amount c that maximizes the expected net profit is the 100 × pth percentile of X with p = (d₂ − d₁)/d₂.
(c) If d₁ = 6, d₂ = 14, and X ~ UNIF(980, 1020), find the optimum choice for c.
(d) Rework (c) if, instead, X ~ N(1000, 100).
59.  The solution of Exercise 58 can be extended to the discrete case. Suppose now that X is discrete as in Exercise 57, and the store pays d₁ dollars per copy and charges each customer d₂ dollars per copy. Furthermore, let the demand X be an arbitrary nonnegative integer-valued random variable, with pdf f(x) and CDF F(x). Again, let c be the number of copies ordered by the store.
(a) Show that E(S) = Σ_{x=0}^{c} x f(x) + c[1 − F(c)].
(b) Express the net profit Y as a linear function of S, and find E(Y).
(c) Verify that the solution that maximizes E(Y) is the smallest integer c such that F(c) ≥ (d₂ − d₁)/d₂. Hint: Note that the expected net profit is a function of c, say g(c) = E(Y), and the optimum solution will be the smallest c such that g(c + 1) ≤ g(c).
CHAPTER 4

JOINT DISTRIBUTIONS

4.1 INTRODUCTION

In many applications there will be more than one random variable of interest, say X1, X2, ..., Xk. It is convenient mathematically to regard these variables as components of a k-dimensional vector, X = (X1, X2, ..., Xk), which is capable of assuming values x = (x1, x2, ..., xk) in a k-dimensional Euclidean space. Note, for example, that an observed value x may be the result of measuring k characteristics once each, or the result of measuring one characteristic k times. That is, in the latter case x could represent the outcomes on k repeated trials of an experiment concerning a single variable. As before, we will develop the discrete and continuous cases separately.
4.2 JOINT DISCRETE DISTRIBUTIONS

Definition 4.2.1
The joint probability density function (joint pdf) of the k-dimensional discrete random variable X = (X1, X2, ..., Xk) is defined to be

f(x1, x2, ..., xk) = P[X1 = x1, X2 = x2, ..., Xk = xk]   (4.2.1)

for all possible values x = (x1, x2, ..., xk) of X.
In this context, the notation [X1 = x1, X2 = x2, ..., Xk = xk] represents the intersection of k events, [X1 = x1] ∩ [X2 = x2] ∩ ... ∩ [Xk = xk]. Another notation for the joint pdf involves subscripts, namely f_{X1, X2, ..., Xk}(x1, x2, ..., xk). This notation is a bit more cumbersome, and we will use it only when necessary.
Example 4.2.1  Recall in Example 3.2.6 that a bin contained 1000 flower seeds and 400 were red flowering seeds. Of the remaining seeds, 400 are white flowering and 200 are pink flowering. If 10 seeds are selected at random without replacement, then the number of red flowering seeds, X1, and the number of white flowering seeds, X2, in the sample are jointly distributed discrete random variables. The joint pdf of the pair (X1, X2) is obtained easily by the methods of Section 1.6. Specifically,

f(x1, x2) = C(400, x1) C(400, x2) C(200, 10 − x1 − x2) / C(1000, 10)   (4.2.2)

for all 0 ≤ x1, 0 ≤ x2, and x1 + x2 ≤ 10. The probability of obtaining exactly two red, five white, and three pink flowering seeds is f(2, 5) = 0.0331. Notice that once the values of x1 and x2 are specified, the number of pink is also determined, namely 10 − x1 − x2, so it suffices to consider only two variables. This is a special case of a more general type of hypergeometric distribution.
EXTENDED HYPERGEOMETRIC DISTRIBUTION

The hypergeometric distribution of equation (3.2.10) can be generalized to apply in cases where there are more than two types of outcomes of interest. Suppose that a collection consists of a finite number of items N and that there
are k + 1 different types; M1 of type 1, M2 of type 2, and so on. Select n items at random without replacement, and let X_i be the number of items of type i that are selected. The vector X = (X1, X2, ..., Xk) has an extended hypergeometric distribution and a joint pdf of the form

f(x1, ..., xk) = [C(M1, x1) C(M2, x2) ... C(Mk, xk) C(M_{k+1}, x_{k+1})] / C(N, n)   (4.2.3)

for all 0 ≤ x_i ≤ M_i, where M_{k+1} = N − Σ_{i=1}^{k} M_i and x_{k+1} = n − Σ_{i=1}^{k} x_i. A special notation for this is

X ~ HYP(n, M1, M2, ..., Mk, N)   (4.2.4)
Note that only k random variables are involved here, and x_{k+1} is used only as a notational convenience. The corresponding problem when the items are selected with replacement can be solved with a more general form of the binomial distribution known as the multinomial distribution.

MULTINOMIAL DISTRIBUTION
Suppose that there are k + 1 mutually exclusive and exhaustive events, say E1, E2, ..., Ek, E_{k+1}, which can occur on any trial of an experiment, and let p_i = P(E_i) for i = 1, 2, ..., k + 1. On n independent trials of the experiment, we let X_i be the number of occurrences of the event E_i. The vector X = (X1, X2, ..., Xk) is said to have the multinomial distribution, which has a joint pdf of the form

f(x1, x2, ..., xk) = [n!/(x1! x2! ... x_{k+1}!)] p1^{x1} p2^{x2} ... p_{k+1}^{x_{k+1}}   (4.2.5)

for all 0 ≤ x_i ≤ n, where x_{k+1} = n − Σ_{i=1}^{k} x_i and p_{k+1} = 1 − Σ_{i=1}^{k} p_i. A special notation for this is

X ~ MULT(n, p1, p2, ..., pk)   (4.2.6)

The rationale for equation (4.2.5) is similar to that of the binomial distribution. To have exactly x_i occurrences of E_i, it is necessary to have some permutation of x1 E1's, x2 E2's, and so on. The total number of such permutations is n!/[(x1!)(x2!) ... (x_{k+1}!)], and each permutation occurs with probability p1^{x1} p2^{x2} ... p_{k+1}^{x_{k+1}}. Just as the binomial provides an approximation to the hypergeometric distribution, under certain conditions equation (4.2.5) approximates equation (4.2.3). In Example 4.2.1, let us approximate the value of f(2, 5) with (4.2.5), where p1 = p2 = 0.4 and p3 = 0.2. This yields an approximate value of 0.0330, which
agrees with the exact answer to three decimal places. Actually, this corresponds to the situation of sampling with replacement. In other words, if n is small relative to N and to the values of M_i, then the effect of replacement or nonreplacement is negligible.
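The closeness of the approximation can be seen directly; the following sketch evaluates the exact extended hypergeometric probability of equation (4.2.2) and its multinomial approximation (4.2.5) for the seed example:

```python
from math import comb, factorial

def hyp(x1, x2):
    """Equation (4.2.2): 400 red, 400 white, 200 pink seeds, n = 10."""
    return (comb(400, x1) * comb(400, x2) * comb(200, 10 - x1 - x2)
            / comb(1000, 10))

def mult(x1, x2, n=10, p1=0.4, p2=0.4):
    """Equation (4.2.5) with k = 2."""
    x3 = n - x1 - x2
    coeff = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coeff * p1 ** x1 * p2 ** x2 * (1 - p1 - p2) ** x3

exact, approx = hyp(2, 5), mult(2, 5)
print(round(exact, 3), round(approx, 3))  # 0.033 0.033
```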
Example 4.2.2  The four-sided die of Example 2.1.1 is rolled 20 times, and the number of occurrences of each side is recorded. The probability of obtaining four 1's, six 2's, five 3's, and five 4's can be computed from equation (4.2.5) with each pi = 0.25, namely

[20!/(4! 6! 5! 5!)](0.25)^20 = 0.0089

If we were concerned only with recording 1's, 3's, and even numbers, then equation (4.2.5) would apply with p1 = p2 = 0.25 and 1 − p1 − p2 = 0.5. The probability of four 1's, five 3's, and 11 even numbers would be

[20!/(4! 5! 11!)](0.25)^4 (0.25)^5 (0.5)^11 = 0.0394
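Both die probabilities can be reproduced directly from equation (4.2.5); a minimal sketch in Python (the helper name `multinomial_pmf` is ours, not the text's):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Joint pmf (4.2.5): n!/(x1!...xk+1!) * p1^x1 * ... * pk+1^xk+1."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)
    prob = 1.0
    for x, p in zip(counts, probs):
        prob *= p ** x
    return coef * prob

# Example 4.2.2: fair four-sided die, 20 rolls.
print(round(multinomial_pmf([4, 6, 5, 5], [0.25] * 4), 4))       # 0.0089
print(round(multinomial_pmf([4, 5, 11], [0.25, 0.25, 0.5]), 4))  # 0.0394
```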
The functions defined by equations (4.2.3) and (4.2.5) both sum to 1 when summed over all possible values of x = (x1, x2, ..., xk), and both are nonnegative. This is necessary to define a discrete pdf.
Theorem 4.2.1  A function f(x1, x2, ..., xk) is the joint pdf for some vector-valued random variable X = (X1, X2, ..., Xk) if and only if the following properties are satisfied:

f(x1, x2, ..., xk) ≥ 0  for all possible values (x1, x2, ..., xk)    (4.2.7)

and

Σ_{x1} ... Σ_{xk} f(x1, x2, ..., xk) = 1    (4.2.8)
In some two-dimensional problems it is convenient to present the joint pdf in a tabular form, particularly if a simple functional form for the joint pdf f(x1, x2) is not known. For the purpose of illustration, let X1 and X2 be discrete random variables with joint probabilities f(x1, x2) as given in Table 4.1. These values represent probabilities from a multinomial distribution, (X1, X2) ~ MULT(3, 0.4, 0.4). For example, this model would apply to Example 4.2.1 if the sampling had been with replacement, or it would be an approximation to the extended hypergeometric model for the without-replacement case. First notice that

Σ_{x1=0}^{3} Σ_{x2=0}^{3} f(x1, x2) = 1
as shown in the table. It is convenient to include impossible outcomes such as (3, 3) in the table and assign them probability zero. Care must be taken with the limits of the summations so that the points with zero probability are not included inadvertently when the nonzero portion of the pdf is summed.
TABLE 4.1
CHAPTER 4 JOINT DISTRIBUTIONS
Values of the discrete pdf of MULT(3; 0.4, 0.4)

                  x2
  x1        0       1       2       3     f1(x1)
   0      0.008   0.048   0.096   0.064   0.216
   1      0.048   0.192   0.192   0.000   0.432
   2      0.096   0.192   0.000   0.000   0.288
   3      0.064   0.000   0.000   0.000   0.064
 f2(x2)   0.216   0.432   0.288   0.064   1.000
Now we are interested in the "marginal" probability, say P[X1 = 0], without regard to what value X2 may assume. Relative to the joint sample space, X2 has the effect of partitioning the event, say A, that X1 = 0; computing the marginal probability P(A) = P[X1 = 0] is equivalent to computing a total probability as discussed in Section 1.5. That is, if Bj denotes the event that X2 = j, then A = (A ∩ B0) ∪ (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3) and

P(A) = Σ_{j=0}^{3} P(A ∩ Bj) = Σ_{j=0}^{3} P[X1 = 0, X2 = j] = Σ_{j=0}^{3} f(0, j) = 0.216
as shown in the right margin of the top row of the table. Similarly, we could compute P[X1 = 1], P[X1 = 2], and so on. The numerical values of f1(x1) = P[X1 = x1] are given in the right margin of the table for each possible value of x1. Clearly,

Σ_{x1} f1(x1) = Σ_{x1} Σ_{x2} f(x1, x2) = 1    (4.2.9)

so f1(x1) is a legitimate pdf and is referred to as the marginal pdf of X1 relative to the original joint sample space. Similarly, numerical values of the function

f2(x2) = P[X2 = x2] = Σ_{x1=0}^{3} f(x1, x2)
are given in the bottom margin of the table, and this provides a means of finding the pdf's of X1 and X2 from f(x1, x2).
Definition 4.2.2  If the pair (X1, X2) of discrete random variables has the joint pdf f(x1, x2), then the marginal pdf's of X1 and X2 are

f1(x1) = Σ_{x2} f(x1, x2)    (4.2.10)

and

f2(x2) = Σ_{x1} f(x1, x2)    (4.2.11)
Because (4.2.10) and (4.2.11) are the pdf's of X1 and X2, another notation would be fX1(x1) and fX2(x2). Although the marginal pdf's were motivated by means of a tabled distribution, it often is possible to derive formulas for f1(x1) and f2(x2) if an analytic expression for f(x1, x2) is available. For example, if (X1, X2) ~ MULT(n, p1, p2), then the marginal pdf of X1 is

f1(x1) = Σ_{x2} f(x1, x2)
       = Σ_{x2=0}^{n−x1} [n!/(x1! x2! [(n − x1) − x2]!)] p1^x1 p2^x2 [(1 − p1) − p2]^((n−x1)−x2)
       = [n!/(x1! (n − x1)!)] p1^x1 Σ_{x2=0}^{n−x1} [(n − x1)!/(x2! [(n − x1) − x2]!)] p2^x2 [(1 − p1) − p2]^((n−x1)−x2)
       = [n!/(x1! (n − x1)!)] p1^x1 [p2 + (1 − p1) − p2]^(n−x1)
       = [n!/(x1! (n − x1)!)] p1^x1 (1 − p1)^(n−x1)
That is, X1 ~ BIN(n, p1). This is what we would expect, because X1 is counting the number of occurrences of some event E1 on n independent trials. For the flower seed example, n = 3, p1 = 0.4, and p2 = 0.4. If we are interested only in X1, the number of red flowering seeds obtained, then we can lump the white and pink flowering seeds together and reduce the problem to a binomial-type problem. Specifically, X1 ~ BIN(3, 0.4). Similarly, the marginal distribution of X2 is BIN(n, p2) = BIN(3, 0.4). Of course, probabilities other than marginal probabilities can be computed from joint pdf's.
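The factor-and-sum argument can be confirmed numerically for the flower seed case n = 3, p1 = p2 = 0.4: summing the joint pmf over x2 reproduces the BIN(3, 0.4) marginal of Table 4.1. A sketch, with helper names of our choosing:

```python
from math import comb, factorial

def mult_pmf(x1, x2, n=3, p1=0.4, p2=0.4):
    """MULT(3; 0.4, 0.4) joint pmf, zero off the support x1 + x2 <= n."""
    x3 = n - x1 - x2
    if x3 < 0:
        return 0.0
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1**x1 * p2**x2 * (1 - p1 - p2)**x3

def binom_pmf(x, n=3, p=0.4):
    """BIN(n, p) pmf."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

for x1 in range(4):
    marginal = sum(mult_pmf(x1, x2) for x2 in range(4))
    print(x1, round(marginal, 3), round(binom_pmf(x1), 3))
# 0 0.216 0.216
# 1 0.432 0.432
# 2 0.288 0.288
# 3 0.064 0.064
```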
If X is a k-dimensional discrete random variable and A is an event, then

P[X ∈ A] = Σ_{x ∈ A} f(x1, x2, ..., xk)    (4.2.12)
For example, in the flower seed problem, we may want to know the probability that the number of red flowering seeds is less than the number of white flowering seeds. In other words, the event of interest is [X1 < X2], which corresponds to a region in the plane, in this example A = {(x1, x2) | x1 < x2}. Summing the entries of Table 4.1 over this region gives

P[X1 < X2] = f(0, 1) + f(0, 2) + f(0, 3) + f(1, 2) = 0.048 + 0.096 + 0.064 + 0.192 = 0.400

Marginal probabilities can be evaluated as discussed in Section 2.2 by summing over the marginal pdf's. A joint probability of special importance involves sets of the form A = (−∞, x1] × ... × (−∞, xk]; in other words, Cartesian products of intervals of the type (−∞, xi], i = 1, 2, ..., k.
FIGURE 4.1  Region corresponding to the event [X1 < X2]
Definition 4.2.3  Joint CDF  The joint cumulative distribution function of the k random variables X1, X2, ..., Xk is the function defined by

F(x1, ..., xk) = P[X1 ≤ x1, ..., Xk ≤ xk]    (4.2.13)

That is, F(x1, ..., xk) denotes the probability that the random vector X will assume a value in the indicated k-dimensional rectangle, A. As in the one-dimensional case, other events can be expressed in terms of events of type A, so that the CDF completely specifies the probability model. As in the one-dimensional case, the only requirement that a function must satisfy to qualify as a joint CDF is that it must assign probabilities to events in such a way that Definition 1.3.1 will be satisfied. In particular, a joint CDF must satisfy properties analogous to the properties given in Theorem 2.2.3 for the one-dimensional case. Properties of a bivariate CDF are listed below, and properties of a k-dimensional CDF would be similar.

Theorem 4.2.2  A function F(x1, x2) is a bivariate CDF if and only if

lim_{x1→−∞} F(x1, x2) = F(−∞, x2) = 0  for all x2    (4.2.14)

lim_{x2→−∞} F(x1, x2) = F(x1, −∞) = 0  for all x1    (4.2.15)

lim_{x1→∞, x2→∞} F(x1, x2) = F(∞, ∞) = 1    (4.2.16)

F(b, d) − F(b, c) − F(a, d) + F(a, c) ≥ 0  for all a < b, c < d    (4.2.17)

lim_{h→0+} F(x1 + h, x2) = lim_{h→0+} F(x1, x2 + h) = F(x1, x2)  for all x1 and x2    (4.2.18)
Note that property (4.2.17) is a monotonicity condition that is the two-dimensional version of equation (2.2.11). This is needed to prevent the assignment of negative values as probabilities to events of the form A = (a, b] × (c, d]. In particular,

P[a < X1 ≤ b, c < X2 ≤ d] = F(b, d) − F(b, c) − F(a, d) + F(a, c)

which is the value on the left of inequality (4.2.17).

Property (4.2.18) asserts that F(x1, x2) is continuous from the right in each variable separately. Also note that (4.2.17) is something more than simply requiring that F(x1, x2) be nondecreasing in each variable separately.
Example 4.2.3  Consider the function defined as follows:

F(x1, x2) = 0  if x1 + x2 < −1
F(x1, x2) = 1  if x1 + x2 ≥ −1    (4.2.19)

If we let a = c = −1 and b = d = 1 in (4.2.17), then

F(1, 1) − F(1, −1) − F(−1, 1) + F(−1, −1) = 1 − 1 − 1 + 0 = −1

which means that (4.2.17) is not satisfied. However, it is not hard to verify that (4.2.19) is nondecreasing in each variable separately, and all of the other properties are satisfied. Thus, a set function based on (4.2.19) would violate property (1.3.1) of the definition of probability.
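A few lines of Python make the failure of the rectangle inequality concrete:

```python
def F(x1, x2):
    """The step function of (4.2.19): 0 below the line x1 + x2 = -1, 1 on or above it."""
    return 1.0 if x1 + x2 >= -1 else 0.0

# Rectangle term of (4.2.17) with a = c = -1, b = d = 1:
a = c = -1
b = d = 1
rect = F(b, d) - F(b, c) - F(a, d) + F(a, c)
print(rect)  # -1.0, so F would assign "probability" -1 to the rectangle (-1, 1] x (-1, 1]
```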
4.3 JOINT CONTINUOUS DISTRIBUTIONS

The joint CDF provides a means for defining a joint continuous distribution.

Definition 4.3.1  A k-dimensional vector-valued random variable X = (X1, X2, ..., Xk) is said to be continuous if there is a function f(x1, x2, ..., xk), called the joint probability density function (joint pdf) of X, such that the joint CDF can be written as

F(x1, ..., xk) = ∫_{−∞}^{xk} ... ∫_{−∞}^{x1} f(t1, ..., tk) dt1 ... dtk    (4.3.1)
As in the one-dimensional case, the joint pdf can be obtained from the joint CDF by differentiation. In particular,

f(x1, ..., xk) = ∂^k F(x1, ..., xk)/(∂x1 ... ∂xk)    (4.3.2)

wherever the partial derivatives exist. To serve the purpose of a joint pdf, two properties must be satisfied.
Theorem 4.3.1  Any function f(x1, x2, ..., xk) is a joint pdf of a k-dimensional random variable if and only if

f(x1, x2, ..., xk) ≥ 0  for all x1, ..., xk    (4.3.3)
and

∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f(x1, x2, ..., xk) dx1 ... dxk = 1    (4.3.4)

Numerous applications can be modeled by joint continuous variables.
Example 4.3.1  Let X1 denote the concentration of a certain substance in one trial of an experiment, and X2 the concentration of the substance in a second trial of the experiment. Assume that the joint pdf is given by f(x1, x2) = 4x1x2; 0 < x1 < 1, 0 < x2 < 1, and zero otherwise. For 0 < x1 < 1 and 0 < x2 < 1, the joint CDF is

F(x1, x2) = ∫_{−∞}^{x2} ∫_{−∞}^{x1} f(t1, t2) dt1 dt2 = ∫_0^{x2} ∫_0^{x1} 4 t1 t2 dt1 dt2 = x1^2 x2^2

This defines F(x1, x2) over the region (0, 1) × (0, 1), but there are four other regions of the plane where it must be defined. In particular, see Figure 4.2 for the definition of F(x1, x2) on the five regions.
FIGURE 4.2  Values of a joint CDF
It also is possible to evaluate joint probabilities by integrating the joint pdf over the appropriate region. For example, we will find the probability that for both trials of the experiment the "average concentration" is less than 0.5. This event can be represented by [(X1 + X2)/2 < 0.5], or more generally by [(X1, X2) ∈ A], where A = {(x1, x2) | (x1 + x2)/2 < 0.5}. Thus,

P[(X1 + X2)/2 < 0.5] = P[(X1, X2) ∈ A] = ∫∫_A f(x1, x2) dx1 dx2 = ∫_0^1 ∫_0^{1−x2} 4x1x2 dx1 dx2 = 1/6
The region A is illustrated in Figure 4.3. For a general k-dimensional continuous random variable X = (X1, ..., Xk) and a k-dimensional event A, we have

P[X ∈ A] = ∫ ... ∫_A f(x1, ..., xk) dx1 ... dxk    (4.3.5)
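For the pdf of Example 4.3.1, the probability of this event works out to ∫_0^1 ∫_0^{1−x2} 4x1x2 dx1 dx2 = 1/6, which can be checked by numerical integration; a sketch using scipy.integrate.dblquad (which integrates the first argument of the integrand over the inner limits):

```python
from scipy.integrate import dblquad

# f(x1, x2) = 4*x1*x2 on the unit square; integrate over the region x1 + x2 < 1.
# dblquad integrates func(y, x) for y from gfun(x) to hfun(x), then x from a to b.
prob, err = dblquad(lambda x1, x2: 4 * x1 * x2,  # inner variable x1 comes first
                    0, 1,                        # outer: x2 from 0 to 1
                    lambda x2: 0,                # inner: x1 from 0 ...
                    lambda x2: 1 - x2)           # ... to 1 - x2
print(round(prob, 4))  # 0.1667, i.e. 1/6
```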
FIGURE 4.3  Region corresponding to the event [(X1 + X2)/2 < 0.5]

Earlier, the notion of marginal distributions was discussed for joint discrete random variables. A similar concept can be developed for joint
continuous random variables, but the approach is slightly different. In particular, consider the joint CDF, F(x1, x2), of a pair of random variables X = (X1, X2). The CDF of X1 is

F1(x1) = P[X1 ≤ x1] = P[X1 ≤ x1, X2 < ∞] = F(x1, ∞) = ∫_{−∞}^{x1} [∫_{−∞}^{∞} f(t1, x2) dx2] dt1
Thus, for the continuous case, the distribution function that we can interpret as the marginal CDF of X1 is given by F(x1, ∞), and the pdf associated with F1(x1) is the quantity enclosed in brackets. That is,

f1(x1) = (d/dx1) F1(x1) = (d/dx1) ∫_{−∞}^{x1} [∫_{−∞}^{∞} f(t1, x2) dx2] dt1 = ∫_{−∞}^{∞} f(x1, x2) dx2
Similar results can be obtained for X2, which suggests the following definition.
Definition 4.3.2  If the pair (X1, X2) of continuous random variables has the joint pdf f(x1, x2), then the marginal pdf's of X1 and X2 are

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2    (4.3.6)

and

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1    (4.3.7)
It follows from the preceding argument that f1(x1) and f2(x2) are the pdf's of X1 and X2, and consequently, another possible notation is fX1(x1) and fX2(x2). Consider the joint pdf in Example 4.3.1. The marginal pdf of X1 is

f1(x1) = ∫_0^1 4x1x2 dx2 = 4x1 ∫_0^1 x2 dx2 = 2x1

for any 0 < x1 < 1, and zero otherwise. Similarly, f2(x2) = 2x2 for any 0 < x2 < 1. Actually, the argument preceding Definition 4.3.2 provides a general approach to defining marginal distributions.
Definition 4.3.3  If X = (X1, X2, ..., Xk) is a k-dimensional random variable with joint CDF F(x1, x2, ..., xk), then the marginal CDF of Xj is

Fj(xj) = lim_{xi→∞, all i≠j} F(x1, ..., xj, ..., xk)    (4.3.8)

Furthermore, if X is discrete, the marginal pdf is

fj(xj) = Σ_{xi, all i≠j} f(x1, ..., xj, ..., xk)    (4.3.9)

and if X is continuous, the marginal pdf is

fj(xj) = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f(x1, ..., xj, ..., xk) dx1 ... dx(j−1) dx(j+1) ... dxk    (4.3.10)
Example 4.3.2  Let X1, X2, and X3 be continuous with a joint pdf of the form f(x1, x2, x3) = 6; 0 < x1 < x2 < x3 < 1, and zero otherwise. The marginal pdf of X3 is

f3(x3) = ∫_0^{x3} ∫_0^{x2} 6 dx1 dx2 = 6 ∫_0^{x3} x2 dx2 = 3x3^2

if 0 < x3 < 1, and zero otherwise.
Notice that a similar procedure can be used to obtain the joint pdf of any subset of the original set of random variables. For example, the joint pdf of the pair (X1, X2) is obtained by integrating f(x1, x2, x3) with respect to x3 as follows:

f(x1, x2) = ∫_{−∞}^{∞} f(x1, x2, x3) dx3 = ∫_{x2}^{1} 6 dx3 = 6(1 − x2)

if 0 < x1 < x2 < 1, and zero otherwise.
A general formula for the joint pdf of a subset of an arbitrary collection of random variables would involve a rather complicated expression, and we will not attempt to give such a formula. However, the procedure described above, which involves integrating with respect to the "unwanted" variables, provides a general approach to this problem.
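The marginals of Example 4.3.2 can be spot-checked by numerically integrating out the "unwanted" variables; a sketch using scipy.integrate (function names are ours):

```python
from scipy.integrate import dblquad, quad

# f(x1, x2, x3) = 6 on the ordered region 0 < x1 < x2 < x3 < 1.
def f3(x3):
    """Marginal of X3: integrate out x1 and x2 over 0 < x1 < x2 < x3."""
    val, _ = dblquad(lambda x1, x2: 6.0, 0, x3, lambda x2: 0, lambda x2: x2)
    return val

def f12(x1, x2):
    """Joint pdf of (X1, X2): integrate out x3 from x2 to 1."""
    val, _ = quad(lambda x3: 6.0, x2, 1)
    return val

print(round(f3(0.5), 6))        # 0.75  = 3 * 0.5**2
print(round(f12(0.2, 0.4), 6))  # 3.6   = 6 * (1 - 0.4)
```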
4.4 INDEPENDENT RANDOM VARIABLES

Suppose that X1 and X2 are discrete random variables with joint probabilities as given in Table 4.2. Note that f(1, 1) = 0.2 = f1(1)f2(1). That is, P[X1 = 1 and X2 = 1] = P[X1 = 1]P[X2 = 1], and we would say that the events [X1 = 1] and [X2 = 1] are independent events in the usual sense of Chapter 1. However, for example, f(1, 2) = 0.1 ≠ f1(1)f2(2), so these events are not independent. Thus there is in general a dependence between the random variables X1 and X2, although certain events are independent. If f(x1, x2) = f1(x1)f2(x2) for all possible (x1, x2), then it would be reasonable to say in general that the random variables X1 and X2 are independent.
TABLE 4.2  Values of the joint pdf of two dependent random variables

                x2
  x1        0      1      2     f1(x1)
   0       0.1    0.2    0.1     0.4
   1       0.1    0.2    0.1     0.4
   2       0.1    0.1    0.0     0.2
 f2(x2)    0.3    0.5    0.2     1.0
Similarly, for continuous random variables, suppose that f(x1, x2) = f1(x1)f2(x2) for all x1 and x2. It follows that if a ≤ b and c ≤ d, then

P[a ≤ X1 ≤ b, c ≤ X2 ≤ d] = ∫_c^d ∫_a^b f(x1, x2) dx1 dx2
                           = ∫_c^d ∫_a^b f1(x1)f2(x2) dx1 dx2
                           = ∫_a^b f1(x1) dx1 ∫_c^d f2(x2) dx2
                           = P[a ≤ X1 ≤ b]P[c ≤ X2 ≤ d]
This sort of property characterizes a general concept of independence of random variables, where random variables X1 and X2 would be said to be independent if all events of this type are independent. Note that these concepts apply to both discrete and continuous random variables.
Definition 4.4.1  Independent Random Variables  Random variables X1, ..., Xk are said to be independent if for every ai < bi, i = 1, ..., k,

P[a1 ≤ X1 ≤ b1, ..., ak ≤ Xk ≤ bk] = Π_{i=1}^{k} P[ai ≤ Xi ≤ bi]    (4.4.1)
The expression on the right of equation (4.4.1) is the product of the marginal probabilities P[a1 ≤ X1 ≤ b1], ..., P[ak ≤ Xk ≤ bk]. The terminology stochastically independent also is often used in this context. If (4.4.1) does not hold for all ai < bi, the random variables are called dependent. Some properties that are equivalent to independence are stated in the following theorem.

Theorem 4.4.1  Random variables X1, ..., Xk are independent if and only if either of the following properties holds:

F(x1, ..., xk) = F1(x1) ... Fk(xk)    (4.4.2)

f(x1, ..., xk) = f1(x1) ... fk(xk)    (4.4.3)

where Fi(xi) and fi(xi) are the marginal CDF and pdf of Xi, respectively.
Clearly, the random variables in Example 4.3.1 are independent. Indeed, anytime the limits of the variables are not functionally related and the joint pdf can be factored into a function of x1 times a function of x2, say f(x1, x2) = g(x1)h(x2), then it can be factored into a product of the marginal pdf's, say f(x1, x2) = f1(x1)f2(x2), by adjusting the constants properly. This is formalized as follows.
Theorem 4.4.2  Two random variables X1 and X2 with joint pdf f(x1, x2) are independent if and only if:

1. the "support set," {(x1, x2) | f(x1, x2) > 0}, is a Cartesian product, A × B, and
2. the joint pdf can be factored into the product of functions of x1 and x2, f(x1, x2) = g(x1)h(x2).
Example 4.4.1  The joint pdf of a pair X1 and X2 is

f(x1, x2) = 8x1x2;  0 < x1 < x2 < 1

and zero otherwise. This function clearly can be factored according to part (2) of the theorem, but the support set, {(x1, x2) | 0 < x1 < x2 < 1}, is not a Cartesian product, so X1 and X2 are dependent.
Example 4.4.2  Consider now a pair X1 and X2 with joint pdf

f(x1, x2) = x1 + x2;  0 < x1 < 1, 0 < x2 < 1

and zero otherwise. In this case the support set is {(x1, x2) | 0 < x1 < 1, 0 < x2 < 1}, which is a Cartesian product, but f(x1, x2) cannot be factored into a function of x1 times a function of x2, so X1 and X2 are dependent.
Many interesting problems can be modeled in terms of independent random variables.
Example 4.4.3  Two components in a rocket operate independently, and the probability that each component fails on a launch is p. Let X denote the number of launches required to have a failure of component 1, and let Y denote the number of launches required to have a failure of component 2. Assuming the launch trials are independent, each variable would follow a geometric distribution, X, Y ~ GEO(p), and if the components are assumed to operate independently, a reasonable model for the pair of variables (X, Y) would be

fX,Y(x, y) = fX(x)fY(y) = pq^(x−1) pq^(y−1) = p^2 q^(x+y−2);  x = 1, 2, ...;  y = 1, 2, ...
where q = 1 − p. The joint CDF of X and Y is

F(x, y) = Σ_{j=1}^{y} Σ_{i=1}^{x} pq^(i−1) pq^(j−1) = (1 − q^x)(1 − q^y) = FX(x)FY(y)

which also could be obtained directly from property (4.4.2) of Theorem 4.4.1.
We also are interested in the random variables X and T, where T = X + Y is the number of launches needed to get a failure in both components. Now, to have X = x and T = t, it is necessary to have X = x and Y = t − x, so the joint pdf of X and T can be given by

fX,T(x, t) = P[X = x, T = t] = P[X = x, Y = t − x] = fX,Y(x, t − x) = pq^(x−1) pq^(t−x−1) = p^2 q^(t−2);  t = x + 1, x + 2, ...
It is clear from Theorem 4.4.2 that X and T are dependent because the support set cannot be a Cartesian product. This is also reasonable intuitively, because a large value of X implies an even larger value of T. As noted earlier, the probability model is specified completely by either the pdf or the CDF. For example, the joint CDF of X and T is
FX,T(x, t) = Σ_{k=1}^{x} Σ_{i=k+1}^{t} p^2 q^(i−2) = 1 − q^x − xpq^(t−1)
for t = x + 1, x + 2, ...; x = 1, 2, ..., t − 1. The marginal CDFs of X and T can be obtained by taking limits as in equation (4.3.8). Specifically,

FX(x) = lim_{t→∞} FX,T(x, t) = 1 − q^x;  x = 1, 2, ...

FT(t) = lim_{x→∞} FX,T(x, t) = FX,T(t − 1, t) = 1 − q^(t−1) − p(t − 1)q^(t−1);  t = 2, 3, ...
The marginal pdf of T could be obtained by taking differences, fT(t) = FT(t) − FT(t − 1), or directly from Definition 4.2.2:

fT(t) = Σ_{x=1}^{t−1} fX,T(x, t) = (t − 1)p^2 q^(t−2);  t = 2, 3, ...
This is the pdf of a negative binomial variable with parameters r = 2 and p, which should not be surprising because T is the number of trials required to obtain two failures. Thus, T ~ NB(2, p). It also should come as no surprise that X ~ GEO(p); this was assumed at the outset. As noted earlier, the variables X and T are dependent, and thus fX,T(x, t) ≠ fX(x)fT(t) for some values of x and t. It might be worth noting that verification of independence by seeing that the joint pdf factors into the product of the marginal pdf's requires verification for every pair of values. To see this, suppose we take p = 1/2 in the above example. It turns out that the events [X = 1] and [T = 3] are independent, because fX,T(1, 3) = 1/8 = fX(1)fT(3). However, X and T are not independent unless all such events are independent. This is not true in this case because, for example, fX,T(2, 3) = 1/8 but fX(2)fT(3) = 1/16. To show dependence, it suffices to find only one such pair of values.
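The p = 1/2 checks are easy to script from the closed-form pmf's of this example; a sketch with our own function names:

```python
p = 0.5
q = 1 - p

def f_xt(x, t):
    """Joint pmf of (X, T): p^2 * q^(t-2) for x = 1, ..., t - 1."""
    return p**2 * q**(t - 2) if 1 <= x <= t - 1 else 0.0

def f_x(x):
    """Marginal of X: GEO(p)."""
    return p * q**(x - 1)

def f_t(t):
    """Marginal of T: NB(2, p)."""
    return (t - 1) * p**2 * q**(t - 2)

# [X = 1] and [T = 3] happen to be independent events ...
print(f_xt(1, 3), f_x(1) * f_t(3))  # 0.125 0.125
# ... but X and T are dependent: one counterexample suffices.
print(f_xt(2, 3), f_x(2) * f_t(3))  # 0.125 0.0625
```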
4.5 CONDITIONAL DISTRIBUTIONS

Recall that independence also is related to the concept of conditional probability, and this suggests that the definition of conditional probability of events could be extended to the concept of conditional random variables. In the previous example, one may be interested in a general formula for expressing conditional probabilities of the form

P[T = t | X = x] = P[X = x, T = t]/P[X = x] = fX,T(x, t)/fX(x)

which suggests the following definition.
Definition 4.5.1  Conditional pdf  If X1 and X2 are discrete or continuous random variables with joint pdf f(x1, x2), then the conditional probability density function (conditional pdf) of X2 given X1 = x1 is defined to be

f(x2 | x1) = f(x1, x2)/f1(x1)    (4.5.1)

for values x1 such that f1(x1) > 0, and zero otherwise.
Similarly, the conditional pdf of X1 given X2 = x2 is

f(x1 | x2) = f(x1, x2)/f2(x2)    (4.5.2)

for x2 such that f2(x2) > 0, and zero otherwise.
As noted in the previous example, for discrete variables a conditional pdf is actually a conditional probability. For example, if X1 and X2 are discrete, then f(x2 | x1) is the conditional probability of the event [X2 = x2] given the event [X1 = x1]. In the case of continuous random variables, the interpretation of the conditional pdf is not as obvious, because P[X1 = x1] = 0. Although f(x2 | x1) cannot be interpreted as a conditional probability in this case, it can be thought of as assigning conditional "probability density" to arbitrarily small intervals [x2, x2 + Δx2], in much the same way that the marginal pdf, f2(x2), assigns marginal probability density. Thus, in the continuous case, the conditional probability of an event of the form [a ≤ X2 ≤ b] given X1 = x1 is
P[a ≤ X2 ≤ b | X1 = x1] = ∫_a^b f(x2 | x1) dx2 = ∫_a^b f(x1, x2) dx2 / ∫_{−∞}^{∞} f(x1, x2) dx2    (4.5.3)
That is, the denominator is the total area under the joint pdf at X1 = x1, and the numerator is the amount of that area for which a ≤ X2 ≤ b (see Figure 4.4). This could be regarded as a way of assigning probability to an event [a ≤ X2 ≤ b] over a slice, X1 = x1, of the joint sample space of the pair (X1, X2). For this to be a valid way of assigning probability, f(x2 | x1) must satisfy the usual properties of a pdf in the variable x2 with x1 fixed. The fact that f(x2 | x1) ≥ 0 follows from (4.5.2); also note that

∫_{−∞}^{∞} f(x2 | x1) dx2 = [1/f1(x1)] ∫_{−∞}^{∞} f(x1, x2) dx2 = f1(x1)/f1(x1) = 1
for the continuous case. The discrete case is similar. The concept of conditional distribution can be extended to vectors of random variables. Suppose, for example, that X = (X1, ..., Xr, ..., Xk) has joint pdf f(x), and X1 = (X1, ..., Xr) has joint pdf f1(x1). If X2 = (Xr+1, ..., Xk), then the conditional pdf of X2 given X1 = x1 is f(x2 | x1) = f(x)/f1(x1) for values x1 such that f1(x1) > 0. As an illustration, consider the random variables X1, X2, and X3 of
FIGURE 4.4  Conditional distribution of probability
Example 4.3.2. The conditional pdf of X3 given (X1, X2) = (x1, x2) is

f(x3 | x1, x2) = f(x1, x2, x3)/f(x1, x2) = 6/[6(1 − x2)] = 1/(1 − x2);  x2 < x3 < 1

for 0 < x1 < x2 < 1, and zero otherwise. Some properties of conditional pdf's, which correspond to similar properties of conditional probability, are stated in the following theorem.
Theorem 4.5.1  If X1 and X2 are random variables with joint pdf f(x1, x2) and marginal pdf's f1(x1) and f2(x2), then

f(x1, x2) = f1(x1)f(x2 | x1) = f2(x2)f(x1 | x2)    (4.5.4)

and if X1 and X2 are independent, then

f(x2 | x1) = f2(x2)    (4.5.5)

and

f(x1 | x2) = f1(x1)    (4.5.6)
Other notations often are used for conditional pdf's. For example, if X and Y are jointly distributed random variables, the conditional pdf of Y given X = x also could be written as fY|X(y | x) or possibly fY|x(y). In most applications, there will be no confusion if we suppress the subscripts and use the simpler notation f(y | x). It is also common practice to speak of "conditional random variables," denoted by Y | X or Y | X = x.
Example 4.5.1  Consider the variables X and T of Example 4.4.3. The conditional pdf of T given X = x is

f(t | x) = p^2 q^(t−2)/(pq^(x−1)) = pq^(t−x−1);  t = x + 1, x + 2, ...

Notice that this means that for any x = 1, 2, ..., the conditional pdf of U = T − X given X = x is

fU(u | x) = P[T − X = u | X = x] = P[T = u + x | X = x] = f(u + x | x) = pq^(u−1);  u = 1, 2, 3, ...

Thus, conditional on X = x, U ~ GEO(p). The conditional pdf of X given T = t is

f(x | t) = p^2 q^(t−2)/[(t − 1)p^2 q^(t−2)] = 1/(t − 1);  x = 1, 2, ..., t − 1

Thus, conditional on T = t, X ~ DU(t − 1), the discrete uniform distribution on the integers 1, 2, ..., t − 1.

Example 4.5.2
A piece of flat land is in the form of a right triangle with a southern boundary two miles long and an eastern boundary one mile long (see Figure 4.5). The point at which an airborne seed lands is of interest. The following assumption will be made: Given that the seed lands within the boundaries, its coordinates X and Y are uniformly distributed over the surface of the triangle. In other words, the pair (X, Y) has a constant joint pdf f(x, y) = c > 0 for points within the boundaries, and zero otherwise. By property (4.3.4), and because the triangle has area 1, we have c = 1. The marginal pdf's of X and Y are

f1(x) = ∫_0^{x/2} dy = x/2;  0 < x < 2

f2(y) = ∫_{2y}^{2} dx = 2(1 − y);  0 < y < 1
FIGURE 4.5  A triangular region (hypotenuse y = x/2)
The conditional pdf of Y given X = x is

f(y | x) = f(x, y)/f1(x) = 1/(x/2) = 2/x;  0 < y < x/2

and zero otherwise. This means that conditional on X = x, Y ~ UNIF(0, x/2). One possible interpretation of this result is the following. Suppose we are able to observe the x-coordinate but not the y-coordinate. If we observe that X = x, then it is sensible to use this information in assigning probability to events relative to Y. Thus, if we observe X = 0.5, then the conditional probability that Y is in some interval, say [0.1, 0.7], is

P[0.1 ≤ Y ≤ 0.7 | X = 0.5] = ∫_{0.1}^{0.7} f(y | 0.5) dy = ∫_{0.1}^{0.25} 4 dy = 0.6
where the change in the upper limit is because f(y | x) = 0 if x = 0.5 and y > 0.25 (see Figure 4.5). A conditional probability can take this sort of information into account, whereas a marginal probability cannot. For comparison, the marginal probability of this event is

P[0.1 ≤ Y ≤ 0.7] = ∫_{0.1}^{0.7} ∫_{2y}^{2} f(x, y) dx dy = ∫_{0.1}^{0.7} f2(y) dy = ∫_{0.1}^{0.7} 2(1 − y) dy = 0.72
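Both the conditional and marginal probabilities in this example reduce to one-dimensional integrals; a quick numerical check:

```python
from scipy.integrate import quad

# Conditional pdf of Y given X = 0.5: uniform on (0, 0.25) with density 2/x = 4.
cond, _ = quad(lambda y: 4.0, 0.1, 0.25)  # f(y | x) = 0 beyond y = x/2 = 0.25
print(round(cond, 2))  # 0.6

# Marginal pdf of Y: f2(y) = 2(1 - y) on (0, 1).
marg, _ = quad(lambda y: 2 * (1 - y), 0.1, 0.7)
print(round(marg, 2))  # 0.72
```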
Mathematically, there is nothing wrong with the marginal probability, but it cannot take information about related variables into account.
In this example, the conditional pdf's turned out to be uniform, but this is not always the case.

Example 4.5.3
Consider the joint density discussed in Example 4.4.2,

f(x, y) = x + y;  0 < x < 1, 0 < y < 1

In this case, the marginal pdf of X is f1(x) = x + 1/2, so for any x between 0 and 1,

f(y | x) = (x + y)/(x + 1/2);  0 < y < 1

For example,

P[0 < Y < 0.5 | X = 0.25] = ∫_0^{0.5} (0.25 + y)/(0.25 + 0.5) dy = 1/3
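For Example 4.5.3, the conditional probability P[0 < Y < 0.5 | X = 0.25] can be checked numerically; the marginal f1(x) = x + 1/2 normalizes the slice of the joint pdf f(x, y) = x + y:

```python
from scipy.integrate import quad

x = 0.25
f1_x = x + 0.5                          # marginal f1(x) = x + 1/2 on (0, 1)
num, _ = quad(lambda y: x + y, 0, 0.5)  # slice of the joint pdf over 0 < y < 0.5
print(round(num / f1_x, 4))  # 0.3333, i.e. 1/3
```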
4.6 RANDOM SAMPLES

If X represents a random variable of interest, say the lifetime of a certain type of light bulb, then f(x) represents the population probability density function of this variable. If one light bulb is selected "at random" from the population and placed
in operation, then f(x) provides the failure density for that light bulb. This process might be described as one trial of an experiment. If one assumes that conceptually there is an infinite population of light bulbs of this type, or if the actual population of light bulbs is sufficiently large so that the population may be assumed to remain essentially the same after a finite number, n, of light bulbs is drawn from it, then it would be reasonable to assume that the population probability density function also is applicable to the lifetime of the second light bulb drawn. Indeed, we could conduct n trials of the experiment, and to distinguish the n trials we may let Xi denote the lifetime of the light bulb obtained on the ith trial, where each Xi ~ f(xi). That is, each Xi is distributed according to the common parent population density. In addition, if the items are sampled (or the trials are conducted) in such a way that the outcome on one trial does not affect the probability distribution of the variable on a different trial, then the variables may be assumed to be independent. Ordinarily, we will assume that the trials of
the experiment or the sampling are conducted in such a way that these two conditions are satisfied, and we will refer to this as "random sampling."
Definition 4.6.1  Random Sample  The set of random variables X1, ..., Xn is said to be a random sample of size n from a population with density function f(x) if the joint pdf has the form

f(x1, x2, ..., xn) = f(x1)f(x2) ... f(xn)    (4.6.1)
That is, random sampling assumes that the sample is taken in such a way that the random variables for each trial are independent and follow the common population density function. In this case, the joint density function is the product of the common marginal densities. It also is common practice to refer to the set of observed values, or data, x1, x2, ..., xn, obtained from the experiment as a random sample. In many cases it is necessary to obtain actual observed data from a population to help validate an assumed model or to help select an appropriate model. If the data can be assumed to represent a random sample, then equation (4.6.1) provides the connecting link between the observed data and the mathematical model.
Example 4.6.1  The lifetime of a certain type of light bulb is assumed to follow an "exponential" population density function given by

f(x) = e^(−x);  0 < x < ∞    (4.6.2)

where the lifetime is measured in years. If a random sample of size two is obtained from this population, then we would have

f(x1, x2) = e^(−x1−x2);  0 < x1 < ∞, 0 < x2 < ∞    (4.6.3)
Now suppose that the total lifetime of the two light bulbs turned out to be x1 + x2 = 0.5 years. One may wonder whether this sample result is reasonable when the population density is given by equation (4.6.2). If not, then it may be that a different population model is more appropriate. Questions of this type can be answered by using equation (4.6.3). In particular, consider

P[X1 + X2 ≤ c] = ∫_0^c ∫_0^{c−x2} e^(−x1−x2) dx1 dx2 = 1 − e^(−c) − ce^(−c)

For c = 0.5, P[X1 + X2 ≤ 0.5] = 0.09; thus it would be unlikely to find the total lifetime of the two bulbs to be 0.5 years or less, if the true population model is given by equation (4.6.2).
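The closed form P[X1 + X2 ≤ c] = 1 − e^(−c) − ce^(−c) can also be verified by simulation; a sketch (sample size and seed are arbitrary choices of ours):

```python
import math
import random

def p_total_le(c):
    """Closed form of P[X1 + X2 <= c] for two independent EXP(1) lifetimes."""
    return 1 - math.exp(-c) - c * math.exp(-c)

print(round(p_total_le(0.5), 2))  # 0.09

# Monte Carlo check (seeded so the run is reproducible).
random.seed(1)
n = 200_000
hits = sum(random.expovariate(1) + random.expovariate(1) <= 0.5 for _ in range(n))
print(round(hits / n, 2))  # close to 0.09
```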
Specific techniques for making decisions or drawing inferences based on sample data will be emphasized in later chapters; we now are interested primarily in developing the mathematical properties that will be needed in carrying out those statistical procedures.
EMPIRICAL DISTRIBUTIONS

It is possible to use some of the work on discrete distributions to study the adequacy of specific continuous models. For example, let X1, X2, ..., Xn be a random sample of size n, each distributed as X ~ f(x), where, respectively, f(x) and F(x) are the population pdf and CDF. For each real x, let W be the number of variables, Xi, in the random sample that are less than or equal to x. We can regard the occurrence of the event [Xi ≤ x] for some i as "success," and because the variables in a random sample are independent, W simply counts the number of successes on n independent Bernoulli trials with probability of success p = P[X ≤ x]. Thus, W ~ BIN(n, p) with p = F(x). The relative frequency of a success on n trials of the experiment would be W/n, which we will denote by Fn(x), referred to as the empirical CDF. The property of statistical regularity, as discussed in Section 1.2, suggests that Fn(x) should be close to F(x) for large n. Thus, for any proposed model, the corresponding CDF, F(x), should be consistent with the empirical CDF, Fn(x), based on data from F(x). We now take a set of data x1, x2, ..., xn from a random sample of size n from f(x), and let y1 ≤ y2 ≤ ... ≤ yn denote the ordered values of the data. Then

Fn(x) = 0 if x < y1;  Fn(x) = i/n if yi ≤ x < yi+1;  Fn(x) = 1 if yn ≤ x    (4.6.4)

Example 4.6.2
Consider the following data from a simulated sample of size n = 10 from the distribution of Example 2.3.2, where the CDF has the form F(x) = 1 - (1 +
x >0:
X.:
0.85, 1.08, 0.35, 3.28, 1.24, 2.58, 0.02, 0.13, 0.22, 0,52
y:
0.02, 0.13, 0.22, 0.35,
52,
x)2;
0.85, 1.08, 1.24, 2.58, 3.28
The graphs of F(x) and F10(x) are shown in Figure 4.6. Although the graph of an empirical CDF, Fn(x), is a step function, it generally should be possible to get at least a rough idea of the shape of the corresponding CDF, F(x). In this example, the sample size n = 10 is probably too small to conclude much, but the graphs in Figure 4.6 show fairly good agreement.
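The comparison in Example 4.6.2 can be sketched directly from equation (4.6.4); the data and the CDF F(x) = 1 − (1 + x)^{−2} are taken from the example.

```python
data = [0.85, 1.08, 0.35, 3.28, 1.24, 2.58, 0.02, 0.13, 0.22, 0.52]

def empirical_cdf(xs, x):
    # Fn(x) = (number of observations <= x) / n, per equation (4.6.4)
    return sum(v <= x for v in xs) / len(xs)

def F(x):
    # population CDF from Example 4.6.2
    return 1 - (1 + x) ** -2 if x > 0 else 0.0

for x in (0.5, 1.0, 2.0):
    print(x, empirical_cdf(data, x), round(F(x), 3))
```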
FIGURE 4.6  Comparison of a CDF with an empirical CDF
To select a suitable distribution for the population, it is helpful to have numerical estimates for unknown population parameters. For example, suppose it is desired to estimate the population mean μ. The mean of the empirical distribution, computed from the sample data, is a natural choice as an estimate of μ. Corresponding to the empirical CDF is a discrete distribution with pdf, say fn(x), which assigns the value 1/n at each of the data values x = x1, x2, ..., xn and zero otherwise. The mean of this distribution is

Σ xi fn(xi) = x1(1/n) + ... + xn(1/n)

which provides the desired estimate of μ. This estimate, which is simply the arithmetic average of the data, is called the sample mean, denoted by x̄ = Σ xi/n. This
rationale also can be used to obtain estimates of other unknown parameter values. For example, an estimate of the population variance σ² would be the variance of the empirical distribution, Σ (xi − x̄)² fn(xi) = (x1 − x̄)²/n + ... + (xn − x̄)²/n. However, it turns out that such a procedure tends to underestimate the population variance. This point will be discussed in Chapter 8, where it will be shown that the following modified version does not suffer from this problem. The sample variance is defined as s² = Σ (xi − x̄)²/(n − 1). Another illustration involves the estimation of a population proportion. For example, suppose a study is concerned with whether individuals in a population have been exposed to a particular contagious disease. The proportion, say p, who have been exposed is another parameter of the population. If n individuals are selected at random from the population and y is the number who have been exposed, then the sample proportion is defined as p̂ = y/n. This also can be treated as a special type of sample mean. For the ith individual in the sample, define xi = 1 if he has been exposed and zero otherwise. The data x1, ..., xn correspond to observed values of a random sample from a Bernoulli distribution with parameter p, and x̄ = (x1 + ... + xn)/n = y/n = p̂.
HISTOGRAMS
It usually is easier to study the distribution of probability in terms of the pdf, f(x), rather than the CDF. This leads us to consider a different type of empirical distribution, known as a histogram. Although this concept generally is considered a purely descriptive method in lower-level developments of probability and statistics, it is possible to provide a rationale in terms of the multinomial distribution, which was presented in Section 4.2.
Suppose the data can be sorted into k disjoint intervals, say Ij = (aj, aj+1]; j = 1, 2, ..., k. Then the relative frequency, fj, with which an observation falls into Ij gives at least a rough indication of what range of values the pdf, f(x), might have over that interval. This can be made more precise by considering k + 1 events E1, E2, ..., Ek+1, where Ej occurs if and only if some variable, Xi, from the random sample is in the interval Ij, if j = 1, 2, ..., k, and Ek+1 occurs if an Xi is not in any Ij. If Yj is the number of variables from the random sample that fall into Ij, and Y = (Y1, Y2, ..., Yk), then Y ~ MULT(n, p1, ..., pk), where

pj = F(aj+1) − F(aj) = ∫_{aj}^{aj+1} f(x) dx     (4.6.5)

Again, because of statistical regularity we would expect the observed relative frequency, fj = yj/n, to be close to pj for large n. It usually is possible to choose the intervals so that the probability of Ek+1 is negligible. Actually, in practice, this is accomplished by choosing the intervals after the data have been obtained. This is a convenient practice, although theoretically incorrect.
Example 4.6.3
The following observations represent the observed lifetimes in months of a random sample of 40 electrical parts, which have been ordered from smallest to largest:

0.15    2.37    2.90    7.39    7.99    12.05   15.17   17.56
22.40   34.84   35.39   36.38   39.52   41.07   46.50   50.52
52.54   58.91   58.93   66.71   71.48   71.84   77.66   79.31
80.90   90.87   91.22   96.35   108.92  112.26  122.71  126.87
127.05  137.96  167.59  183.53  282.49  335.33  341.19  409.97
We will use k = 9 intervals of length 50, I1 = (0, 50], I2 = (50, 100], and so on. The distribution of the data is summarized in Table 4.3. It is seen, for example, that the proportion 15/40 = 0.375 of the sample values fall below 50 months, so one would expect approximately 37.5% of the total population to fall below 50 months, and so on. Of course, the accuracy of these estimates or approximations would depend primarily on the sample size n. To plot this information so that it directly approximates the population pdf, place a
TABLE 4.3  Frequency distribution of lifetimes of 40 electrical parts

Limits of Ij    fj      fj/n     Height
0-50            15      0.375    0.0075
50-100          13      0.325    0.0065
100-150          6      0.150    0.0030
150-200          2      0.050    0.0010
200-250          0      0.000    0.0000
250-300          1      0.025    0.0005
300-350          2      0.050    0.0010
350-400          0      0.000    0.0000
400-450          1      0.025    0.0005
rectangle over the interval (0, 50] with area 0.375, a rectangle over the interval (50, 100] with area 0.325, and so on. To achieve this, the height of the rectangles should be taken as the fraction desired divided by the length of the interval. Thus the height of the rectangle over (0, 50] should be 0.375/50 = 0.0075, the height over (50, 100] should be 0.325/50 = 0.0065, and so on. This results in Figure 4.7, which sometimes is referred to as a modified relative frequency histogram.
FIGURE 4.7  Comparison of an exponential pdf with a modified relative frequency histogram
A smooth curve through the tops of the rectangles then would provide a direct approximation to the population pdf. The number and length of the intervals can be adjusted as desired, taking into account such factors as sample size or range of the data. Such a decision is purely subjective, however, and there is no universally accepted rule for doing this.
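The height calculation described above (relative frequency divided by interval length) can be sketched directly from the Table 4.3 frequencies:

```python
freqs = [15, 13, 6, 2, 0, 1, 2, 0, 1]    # fj from Table 4.3
n = sum(freqs)                            # 40 parts in all
width = 50.0                              # each interval has length 50 months

# height of each histogram rectangle = (fj / n) / width, so areas sum to 1
heights = [f / n / width for f in freqs]
print(round(heights[0], 4), round(heights[1], 4))
```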
The shape of the histogram of Figure 4.7 appears to be consistent with an exponential pdf, say

f(x) = (1/100)e^{−x/100}     0 < x < ∞

and the graph of this pdf also is shown in the figure. In this case, the CDF is

F(x) = 1 − e^{−x/100}     0 ≤ x

According to this model, P[X ≤ 50] = F(50) = 0.393, P[50 < X ≤ 100] = F(100) − F(50) = 0.239, and so on. These probabilities are given in Table 4.4 and are compared with the observed frequencies for the sample of size 40.
TABLE 4.4  Observed and fitted probabilities

                Interval probabilities      Cumulative probabilities
Limits of Ij    Observed    Exponential     Observed    Exponential
0-50            0.375       0.393           0.375       0.393
50-100          0.325       0.239           0.700       0.632
100-150         0.150       0.145           0.850       0.777
150-200         0.050       0.088           0.900       0.865
200-250         0.000       0.053           0.900       0.918
250-300         0.025       0.032           0.925       0.950
300-350         0.050       0.020           0.975       0.970
350-400         0.000       0.012           0.975       0.982
400-450         0.025       0.007           1.000       0.989
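The exponential columns of Table 4.4 come directly from F(x) = 1 − e^{−x/100}; a quick sketch:

```python
import math

def F(x):
    # fitted exponential CDF with mean 100 months
    return 1 - math.exp(-x / 100)

edges = [0, 50, 100, 150, 200, 250, 300, 350, 400, 450]
interval = [round(F(b) - F(a), 3) for a, b in zip(edges, edges[1:])]
cumulative = [round(F(b), 3) for b in edges[1:]]

print(interval[:3])     # 0.393, 0.239, 0.145
print(cumulative[:3])   # 0.393, 0.632, 0.777
```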
The data in this example were simulated from an EXP(100) model, and the discrepancy between the empirical distribution and the true model results from the natural "sampling error" involved. As the sample size increases, the histogram or empirical distribution should approach the true model. Of course, in practice the true model would be unknown. Perhaps it should be noted that Definition 4.6.1 of a random sample does not apply to the case of "random sampling without replacement" from a finite population. The "random sampling" terminology is used in this case to reflect the fact that on each trial the elements remaining in the population are equally likely to be selected, but the trials are not independent, and the usual counting techniques discussed earlier would apply here. The random sample terminology also may refer here to the idea that each subset of n elements of the population is equally likely to be selected as the sample. Definition 4.6.1 is applicable to sampling from finite populations if the sampling is with replacement (or it would be approximately suitable if the population is quite large).
SUMMARY

The purpose of this chapter was to further develop the concept of a random variable to include experiments that involve two or more numerical responses. The joint pdf and CDF provide ways to express the joint probability distribution. When the random variables are considered individually, the marginal pdf's and CDFs express their probability distributions. When the joint pdf can be factored into the product of the marginal pdf's, then the random variables are independent. If the jointly distributed random variables are dependent, then the conditional pdf's provide a valuable way to express this dependence.
Information about the nature of the true probability model may be obtained by conducting n independent trials of an experiment, and obtaining n observed values of the random variable of interest. These observations constitute a random sample from a real or conceptual population. Useful information about the population distribution can be obtained by descriptive methods such as the empirical CDF or a histogram.
EXERCISES

1. For the discrete random variables defined in Exercise 1 of Chapter 3, tabulate:
(a) the joint pdf of Y and Z.
(b) the joint pdf of Z and W.
2. In Exercise 2 of Chapter 3, a game consisted of rolling a die and tossing a coin. If X denotes the number of spots showing on the die plus the number of heads showing on the coin, and if Y denotes just the number of spots showing on the die, tabulate the joint pdf of X and Y.
3. Five cards are drawn without replacement from a regular deck of 52 cards. Let X represent the number of aces, Y the number of kings, and Z the number of queens obtained. Give the probability of each of the following events:
(a) A = [X = 2].
(b) B = [Y ≥ 2].
(c) A ∩ B.
(d) A ∪ B.
(e) A given B.
(f) [X = x].
(g) [X < 2].
(h) [X ≥ 2].
(i) [X = 2, Y = 2, Z = 1].
(j) Write an expression for the joint pdf of X, Y, and Z.
4. In reference to Example 4.2.1, find the probability for each of the following events:
(a) Exactly five red and two white.
(b) Exactly five red and two pink.
5. Rework Exercise 3, assuming that the cards were drawn with replacement.

6. An ordinary six-sided die is rolled 12 times. If X1 is the number of 1's, X2 is the number of 2's, and so on, give the probability for each of the following events:
(a) [X1 = 2, X2 = 3, X3 = 1, X4 = 0, X5 = 4, X6 = 2].
(b) [X1 = X2 = X3 = X4 = X5 = X6].
(c) [X1 = 1, X2 = 2, X3 = 3, X4 = 4].
(d) Write an expression for the joint pdf of X1, X3, and X5.
7. Suppose that X1 and X2 are discrete random variables with joint pdf of the form

f(x1, x2) = c(x1 + x2)     x1 = 0, 1, 2; x2 = 0, 1, 2

and zero otherwise. Find the constant c.
8.
if X and Y are discrete random variables with joint pdf
f(x,Y)c,1
x = 0, 1, 2, ...; y = 0, 1, 2, ...
and zero otherwise.
(a) Find the constant c.
(b) Find the marginal pdf's of X and Y.
(c) Are X and Y independent? Why or why not?
9. Let X1 and X2 be discrete random variables with joint pdff(x1, x2) given by the following table: X2
2
1/12
x,
2
O
3
1/18
1/6 1/9 1/4
0 1/5
2/15
(a) Find the marginal pdf's of X1 and X2.
(b) Are X1 and X2 independent? Why or why not?
(c) Find P[X1 ≤ 2].
(d) Find P[X1 ≤ X2].
(e) Tabulate the conditional pdf's, f(x2 | x1) and f(x1 | x2).

10.
Two cards are drawn at random without replacement from an ordinary deck. Let X be the number of hearts and Y the number of black cards obtained.
(a) Write an expression for the joint pdf, f(x, y).
(b) Tabulate the joint CDF, F(x, y).
(c) Find the marginal pdf's, f1(x) and f2(y).
(d) Are X and Y independent?
(e) Find P[Y = 1 | X = 1].
(f) Find P[Y = y | X = 1].
(g) Find P[Y = y | X = x].
(h) Tabulate P[X + Y = z]; z = 0, 1, 2.

11. Rework Exercise 10, assuming that the cards are drawn with replacement.

12. Consider the function F(x1, x2) defined as follows:

F(x1, x2) = 0.25(x1 + x2)²   if 0 ≤ x1 < 1 and 0 ≤ x2 < 1
          = 0                if x1 < 0 or x2 < 0
          = 1                otherwise

Is F(x1, x2) a bivariate CDF? Hint: Check the properties of Theorem 4.2.2.
13. Prove Theorem 4.4.2.

14. Suppose the joint pdf of lifetimes of a certain part and a spare is given by

f(x, y) = e^{−(x+y)}     0 < x, 0 < y

and zero otherwise. Find each of the following:
(a) The marginal pdf's, f1(x) and f2(y).
(b) The joint CDF, F(x, y).
(c) P[X > 2].
(d) P[X < Y].
(e) P[X + Y > 2].
(f) Are X and Y independent?

15. Suppose X1 and X2 are the survival times (in days) of two white rats that were subjected to different levels of radiation. Assume that X1 and X2 are independent, X1 ~ PAR(1, 1) and X2 ~ PAR(1, 2).
(a) Give the joint pdf of X1 and X2.
(b) Find the probability that the second rat outlives the first rat, P[X2 > X1].
16. Assume that X and Y are independent with X ~ UNIF(−1, 1) and Y ~ UNIF(0, 1). Find the probability that the roots of the equation h(t) = 0 are real, where h(t) = t² + 2Xt + Y.

17. For the random variables X1, X2, and X3 of Example 4.3.2:
(a) Find the marginal pdf f1(x1).
(b) Find the marginal pdf f2(x2).
(c) Find the joint pdf of the pair (X1, X2).
18. Consider a pair of continuous random variables X and Y with a joint CDF of the form

F(x, y) = 0.5xy(x + y)   if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1
        = 0.5y(y + 1)    if 0 ≤ y ≤ 1 and 1 < x
        = 0.5x(x + 1)    if 0 ≤ x ≤ 1 and 1 < y
        = 1              if 1 ≤ x and 1 ≤ y

and zero otherwise. Find each of the following:
(a) The joint pdf, f(x, y).
(b) P[X ≤ 0.5, Y ≤ 0.5].
(c) P[X ≤ 0.5].
(d) P[X + Y ≤ 1.5].
(e) P[X + Y ≤ z]; 0 < z ≤ 1.
19. Let X and Y be continuous random variables with a joint pdf of the form

f(x, y) = k(x + y)     0 ≤ x ≤ y ≤ 1

and zero otherwise.
(a) Find k so that f(x, y) is a joint pdf.
(b) Find the marginals, f1(x) and f2(y).
(c) Find the joint CDF, F(x, y).
(d) Find the conditional pdf f(y | x).
(e) Find the conditional pdf f(x | y).
20. Suppose that X and Y have the joint pdf

f(x, y) = 8xy     0 ≤ x ≤ y ≤ 1

and zero otherwise. Find each of the following:
(a) The joint CDF, F(x, y).
(b) f(y | x).
(c) f(x | y).
(d) P[X ≤ 0.5 | Y = 0.75].
(e) P[X ≤ 0.5 | Y ≤ 0.75].
21. Suppose that X and Y have the joint pdf

f(x, y) = (2/3)(x + 1)     0 ≤ x ≤ 1, 0 ≤ y ≤ 1

and zero otherwise. Find each of the following:
(a) f1(x).
(b) f2(y).
(c) f(y | x).
(d) P[X + Y ≤ 1].
(e) P[X < 2Y < 3X].
(f) Are X and Y independent?
22. Let X1, X2, ..., Xn denote a random sample from a population with pdf f(x) = 3x²; 0 < x < 1, and zero otherwise.
(a) Write down the joint pdf of X1, X2, ..., Xn.
(b) Find the probability that the first observation is less than 0.5, P[X1 < 0.5].
(c) Find the probability that all of the observations are less than 0.5.

23. Rework Exercise 22 if the random sample is from a Weibull population, Xi ~ WEI(1, 2).
24. The following set of data consists of weight measurements (in ounces) for 60 major league baseballs:

5.09  5.26  5.27  5.26  5.28  5.25  5.08  5.10  5.26  5.18
5.24  5.28  5.21  5.28  5.17  5.13  5.23  5.24  5.17  5.29
5.19  5.08  5.23  5.26  5.07  5.27  5.28  5.25  5.27  5.24
5.24  5.09  5.28  5.17  5.22  5.24  5.12  5.24  5.18  5.09
5.26  5.27  5.16  5.26  5.27  5.16  5.27  5.26  5.18  5.17
5.25  5.24  5.24  5.22  5.19  5.13  5.26  5.23  5.27  5.09
(a) Construct a frequency distribution by sorting the data into five intervals of length 0.05, starting at 5.05.
(b) Based on the result of (a), graph a modified relative frequency histogram.
(c) Construct a table that compares observed and fitted probabilities based on the intervals of (a), and for a pdf f(x) that is uniform on the interval [5.05, 5.30].
25. For the first 10 observations in Exercise 24, graph the empirical CDF, F10(x), and also graph the CDF, F(x), of a uniform distribution on the interval [5.05, 5.30].
26. Consider the jointly distributed random variables X1, X2, and X3 of Example 4.3.2.
(a) Find the joint pdf of X1 and X3.
(b) Find the joint pdf of X2 and X3.
(c) Find the conditional pdf of X2 given (X1, X3) = (x1, x3).
(d) Find the conditional pdf of X1 given (X2, X3) = (x2, x3).
(e) Find the conditional pdf of (X1, X2) given X3 = x3.
27. Suppose X1, X2 is a random sample of size n = 2 from a discrete distribution with pdf given by f(1) = f(3) = .2 and f(2) = .6.
(a) Tabulate the values of the joint pdf of X1 and X2.
(b) Tabulate the values of the joint CDF of X1 and X2, F(x1, x2).
(c) Find P[X1 + X2 ≤ 4].
28. Suppose X and Y are continuous random variables with joint pdf f(x, y) = 4(x − xy) if 0 < x ...

29. Suppose X and Y are continuous random variables with joint pdf given by f(x, y) = 24xy if 0 < ...
(a) Find P[Y > 2X].
(b) Find the marginal pdf of X.
30. Suppose X and Y are continuous random variables with joint pdf f(x, y) = 60x²y if 0 < x, 0 ... Find P[Y > 0.1 | X = 0.5].

31. Suppose X1 and X2 are continuous random variables with joint pdf given by f(x1, x2) = 2(x1 + x2) if 0 < x1 ...
CHAPTER 5

PROPERTIES OF RANDOM VARIABLES
5.1 INTRODUCTION

The use of a random variable and its probability distribution has been discussed as a way of expressing a mathematical model for a nondeterministic physical phenomenon. The random variable may be associated with some numerical characteristic of a real or conceptual population of items, and the pdf represents the distribution of the population over the possible values of the characteristic. Quite often the true population density may be unknown. One possibility in this case is to consider a family of density functions indexed by an unknown parameter as a possible model and then concentrate on selecting a value for the parameter. A major emphasis in statistics is to develop estimates of unknown parameters based on sample data. In some cases a parameter may represent a physically meaningful quantity, such as an average or mean value of the population. Thus, it is worthwhile to define and study various properties of random variables that may be useful in representing and interpreting the original population, as well as
useful in estimating or selecting an appropriate model. In some cases, special properties of a model (such as the no-memory property of the exponential distribution) may be quite helpful in indicating the type of physical assumptions that would be consistent with that model, although the implications of a model usually are less clear. In such a case, more reliance may be placed on basic
descriptive measures such as the mean and variance of a distribution. In this chapter, additional descriptive measures and further properties of random variables will be developed.
5.2 PROPERTIES OF EXPECTED VALUES

As noted in Chapter 2, it often is necessary to consider the expected value of some function of one or more random variables. For example, a study might involve a vector of k random variables, X = (X1, X2, ..., Xk), and we would wish to know the expected value of some function of X, say Y = u(X). We could use the standard notation E(Y), or another possibility would be E[u(X)], or EX[u(X)],
where the subscript emphasizes that the sum or integral used to evaluate this expected value is taken relative to the joint pdf of X. The following theorem asserts that both approaches yield the same result.
Theorem 5.2.1  If X = (X1, ..., Xk) has a joint pdf f(x1, ..., xk), and if Y = u(X1, ..., Xk) is a function of X, then E(Y) = EX[u(X1, ..., Xk)], where

EX[u(X1, ..., Xk)] = Σ_{x1} ··· Σ_{xk} u(x1, ..., xk) f(x1, ..., xk)     (5.2.1)

if X is discrete, and

EX[u(X1, ..., Xk)] = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} u(x1, ..., xk) f(x1, ..., xk) dx1 ··· dxk     (5.2.2)

if X is continuous.
The proof will be omitted here, but the method of proof will be discussed in Chapter 6. We use the results of the theorem to derive some additional properties of expected values.

Theorem 5.2.2  If X1 and X2 are random variables with joint pdf f(x1, x2), then

E(X1 + X2) = E(X1) + E(X2)     (5.2.3)
Proof
Note that the expected value on the left side of equation (5.2.3) is relative to the joint pdf of X = (X1, X2), while the terms on the right side could be relative to either the joint or the marginal pdf's. Thus, a more precise statement of equation (5.2.3) would be

EX(X1 + X2) = E1(X1) + E2(X2)

We will show this for the continuous case:

EX(X1 + X2) = ∫∫ (x1 + x2) f(x1, x2) dx1 dx2
            = ∫ x1 [∫ f(x1, x2) dx2] dx1 + ∫ x2 [∫ f(x1, x2) dx1] dx2
            = ∫ x1 f1(x1) dx1 + ∫ x2 f2(x2) dx2
            = E1(X1) + E2(X2) = E(X1) + E(X2)

The discrete case is similar.
It is possible to combine the preceding theorems to show that if a1, a2, ..., ak are constants and X1, X2, ..., Xk are jointly distributed random variables, then

E(Σ_{i=1}^{k} ai Xi) = Σ_{i=1}^{k} ai E(Xi)     (5.2.4)
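Equation (5.2.4) is easy to check numerically on any joint distribution; the sketch below uses a small made-up joint pmf for (X1, X2).

```python
# hypothetical joint pmf of (x1, x2) -> probability
pmf = {(0, 1): 0.2, (1, 1): 0.3, (1, 2): 0.4, (2, 2): 0.1}

def expect(g):
    # E[g(X1, X2)] taken over the joint pmf
    return sum(g(x1, x2) * p for (x1, x2), p in pmf.items())

a1, a2 = 3.0, -2.0
lhs = expect(lambda x1, x2: a1 * x1 + a2 * x2)
rhs = a1 * expect(lambda x1, x2: x1) + a2 * expect(lambda x1, x2: x2)
print(lhs, rhs)   # the two sides of (5.2.4) agree
```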
Another commonly encountered function of random variables is the product.

Theorem 5.2.3  If X and Y are independent random variables and g(x) and h(y) are functions, then

E[g(X)h(Y)] = E[g(X)]E[h(Y)]     (5.2.5)
Proof
In the continuous case,

E[g(X)h(Y)] = ∫∫ g(x)h(y) f(x, y) dx dy
            = ∫∫ g(x)h(y) f1(x) f2(y) dx dy
            = [∫ g(x) f1(x) dx][∫ h(y) f2(y) dy]
            = E[g(X)]E[h(Y)]

It is possible to generalize this theorem to more than two variables. Specifically, if X1, ..., Xk are independent random variables, and u1(x1), ..., uk(xk) are functions, then

E[u1(X1) ··· uk(Xk)] = E[u1(X1)] ··· E[uk(Xk)]     (5.2.6)
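As a quick illustration of (5.2.5), a simulation with independent draws shows the product rule holding up to sampling error; the choice of g, h, and the uniform/exponential pair here is arbitrary.

```python
import math
import random

random.seed(7)
n = 100_000
xs = [random.random() for _ in range(n)]          # X ~ UNIF(0, 1)
ys = [random.expovariate(1.0) for _ in range(n)]  # Y ~ EXP(1), independent of X

def g(x):
    return x * x

def h(y):
    return math.exp(-y)

lhs = sum(g(x) * h(y) for x, y in zip(xs, ys)) / n
rhs = (sum(map(g, xs)) / n) * (sum(map(h, ys)) / n)
print(round(lhs, 3), round(rhs, 3))   # both near E[X^2] E[e^-Y] = (1/3)(1/2)
```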
Certain expected values provide information about the relationship between
two variables.
Definition 5.2.1
The covariance of a pair of random variables X and Y is defined by

Cov(X, Y) = E[(X − μX)(Y − μY)]     (5.2.7)

Another common notation for covariance is σXY.
Some properties that are useful in dealing with covariances are given in the
following theorems.
Theorem 5.2.4
If X and Y are random variables and a and b are constants, then Cov(aX, bY) = ab Cov(X, Y)
(5.2.8)
Cov(X + a, Y + b) = Cov(X, Y)
(5.2.9)
Cov(X, aX + b) = a Var(X)
(5.2.10)
Proof See Exercise 26.
Theorem 5.2.5  If X and Y are random variables, then

Cov(X, Y) = E(XY) − E(X)E(Y)     (5.2.11)

and Cov(X, Y) = 0 whenever X and Y are independent.
Proof See Exercise 27.
Theorem 5.2.2 dealt with the expected value of a sum of two random variables. The following theorem deals with the variance of a sum.

Theorem 5.2.6  If X1 and X2 are random variables with joint pdf f(x1, x2), then

Var(X1 + X2) = Var(X1) + Var(X2) + 2 Cov(X1, X2)     (5.2.12)

and

Var(X1 + X2) = Var(X1) + Var(X2)     (5.2.13)

whenever X1 and X2 are independent.
Proof
For convenience, denote the expected values of X1 and X2 by μi = E(Xi), i = 1, 2. Then

Var(X1 + X2) = E[(X1 + X2) − (μ1 + μ2)]²
             = E[(X1 − μ1) + (X2 − μ2)]²
             = E[(X1 − μ1)²] + E[(X2 − μ2)²] + 2E[(X1 − μ1)(X2 − μ2)]
             = Var(X1) + Var(X2) + 2 Cov(X1, X2)

which establishes equation (5.2.12). Equation (5.2.13) follows from Theorem 5.2.5.
It also can be verified that if X1, ..., Xk are random variables and a1, ..., ak are constants, then

Var(Σ_{i=1}^{k} ai Xi) = Σ_{i=1}^{k} ai² Var(Xi) + 2 Σ Σ_{i<j} ai aj Cov(Xi, Xj)     (5.2.14)

and if X1, ..., Xk are independent, then

Var(Σ_{i=1}^{k} ai Xi) = Σ_{i=1}^{k} ai² Var(Xi)     (5.2.15)
Example 5.2.1
Suppose that Y ~ BIN(n, p). Because binomial random variables represent the number of successes in n independent trials of a Bernoulli experiment, Y is distributed as a sum, Y = Σ_{i=1}^{n} Xi, of independent Bernoulli variables, Xi ~ BIN(1, p). It follows from equations (5.2.4) and (5.2.15) that the mean and variance of Y are E(Y) = np and Var(Y) = npq, because the mean and variance of a Bernoulli variable are E(Xi) = p and Var(Xi) = pq. This is somewhat easier than the approach used in Chapter 3.
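The sum representation in Example 5.2.1 is easy to check by simulation; the values of n and p below are arbitrary.

```python
import random

random.seed(3)
n, p = 20, 0.3
trials = 50_000

# each Y is a sum of n independent Bernoulli(p) variables
ys = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

mean = sum(ys) / trials
var = sum((y - mean) ** 2 for y in ys) / trials
print(round(mean, 2), round(var, 2))   # near np = 6 and npq = 4.2
```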
Example 5.2.2  The approach of the previous examples also can be used if Y ~ HYP(n, M, N), but the derivation is more difficult because draws are dependent if sampling is done without replacement. For example, suppose that a set of N components contains M defective ones, and n components are drawn at random without replacement. If Xi represents the number of defective components (either 0 or 1) obtained on the ith draw, then X1, X2, ..., Xn are dependent Bernoulli variables.

Consider the pair (X1, X2). It is not difficult to see that the marginal distribution of the first variable is X1 ~ BIN(1, p1) where p1 = M/N, and conditional on X1 = x1, X2 ~ BIN(1, p2), where p2 = (M − x1)/(N − 1). Thus, the joint pdf of (X1, X2) is

f(x1, x2) = p1^{x1}(1 − p1)^{1−x1} p2^{x2}(1 − p2)^{1−x2}     xi = 0, 1     (5.2.16)

from which we can obtain the covariance,

Cov(X1, X2) = −(M/N)(1 − M/N) · 1/(N − 1)     (5.2.17)

Actually, it can be shown that for any pair (Xi, Xj) with i ≠ j, Cov(Xi, Xj) is given by equation (5.2.17), and for any i,

E(Xi) = M/N     (5.2.18)

and

Var(Xi) = (M/N)(1 − M/N)     (5.2.19)

It follows from equations (5.2.4) and (5.2.14) that

E(Y) = n(M/N)     (5.2.20)

and

Var(Y) = n(M/N)(1 − M/N)(N − n)/(N − 1)     (5.2.21)
In the case of equation (5.2.21), there are n terms of the form (5.2.19) and n(n - 1) of the form (5.2.17), and the result follows after simplification.
Note that the mean and variance of the hypergeometric distribution have forms similar to the mean and variance of the binomial, with p replaced by M/N, except for the last factor in the variance, (N − n)/(N − 1), which is referred to as the finite multiplier term. Because the hypergeometric random variable, Y, represents the number of defectives obtained when sampling from a finite population without replacement, it is clear that Var(Y) must approach zero as n approaches N. That is, Y = M when n = N, with variance zero, whereas this effect does not occur for the binomial case, which corresponds to sampling with replacement.
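Formulas (5.2.20) and (5.2.21) can be checked against the exact hypergeometric pmf; the values of N, M, and n below are arbitrary.

```python
from math import comb

N, M, n = 20, 8, 5   # population size, defectives, draws (illustrative values)

# exact hypergeometric pmf: P[Y = y] = C(M, y) C(N-M, n-y) / C(N, n)
pmf = {y: comb(M, y) * comb(N - M, n - y) / comb(N, n) for y in range(n + 1)}

mean = sum(y * p for y, p in pmf.items())
var = sum((y - mean) ** 2 * p for y, p in pmf.items())

p = M / N
print(mean, n * p)                                   # equation (5.2.20)
print(var, n * p * (1 - p) * (N - n) / (N - 1))      # equation (5.2.21)
```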
APPROXIMATE MEAN AND VARIANCE

Chapter 2 discussed a method for approximating the mean and variance of a function of a random variable X. Similar results can be developed for a function of more than one variable. For example, consider a pair of random variables (X, Y) with means μ1 and μ2, variances σ1² and σ2², and covariance σ12; further suppose that the function H(x, y) has partial derivatives in an open rectangle containing (μ1, μ2). Using Taylor approximations, we obtain the following approximate formulas for the mean and variance of H(X, Y):

E[H(X, Y)] ≈ H(μ1, μ2) + (1/2)[(∂²H/∂x²) σ1² + (∂²H/∂y²) σ2²] + (∂²H/∂x∂y) σ12

Var[H(X, Y)] ≈ (∂H/∂x)² σ1² + (∂H/∂y)² σ2² + 2(∂H/∂x)(∂H/∂y) σ12

where the partial derivatives are evaluated at the means (μ1, μ2).
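As a sketch of these approximations, take H(x, y) = x/y (a common ratio case) with made-up moments, and compare the first-order variance formula with a simulation; here X and Y are taken independent normal, so σ12 = 0.

```python
import random

mu1, mu2 = 2.0, 10.0        # illustrative means
s1, s2 = 0.3, 0.5           # illustrative standard deviations; Cov(X, Y) = 0

# approximations above for H(x, y) = x / y, derivatives evaluated at the means
dHdx, dHdy = 1 / mu2, -mu1 / mu2**2
var_approx = dHdx**2 * s1**2 + dHdy**2 * s2**2
mean_approx = mu1 / mu2 + 0.5 * (2 * mu1 / mu2**3) * s2**2   # H_xx = 0 for this H

random.seed(5)
n = 200_000
ratios = [random.gauss(mu1, s1) / random.gauss(mu2, s2) for _ in range(n)]
mean_mc = sum(ratios) / n
var_mc = sum((r - mean_mc) ** 2 for r in ratios) / n

print(round(mean_approx, 4), round(mean_mc, 4))
print(round(var_approx, 5), round(var_mc, 5))
```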
5.3 CORRELATION

The importance of the mean and variance in characterizing the distribution of a random variable was discussed earlier, and the covariance was described as a useful measure of dependence between two random variables. It was shown in Theorem 5.2.5 that Cov(X, Y) = 0 whenever X and Y are independent. The converse, in general, is not true.

Example 5.3.1  Consider a pair of discrete random variables X and Y with joint pdf f(x, y) = 1/4 if (x, y) = (0, 1), (1, 0), (0, −1), or (−1, 0). The marginal pdf of X is fX(±1) = 1/4, fX(0) = 1/2, and fX(x) = 0 otherwise. The pdf of Y is similar. Thus, E(X) = −1(1/4) + 0(1/2) + 1(1/4) = 0. Because xy = 0 whenever f(x, y) > 0, it follows that
E(XY) = 0. Consequently, Cov(X, Y) = E(XY) − E(X)E(Y) = 0. However, f(0, 0) = 0 ≠ fX(0)fY(0), so X and Y are dependent. In general, we can conclude that X and Y are dependent if Cov(X, Y) ≠ 0, but Cov(X, Y) = 0 does not necessarily imply that X and Y are independent.
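The four-point distribution of Example 5.3.1 can be checked directly:

```python
# joint pmf of Example 5.3.1: mass 1/4 on each of four axis points
pmf = {(0, 1): 0.25, (1, 0): 0.25, (0, -1): 0.25, (-1, 0): 0.25}

E = lambda g: sum(g(x, y) * p for (x, y), p in pmf.items())
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

# dependence: P[X = 0, Y = 0] = 0, but P[X = 0] P[Y = 0] = 1/4
px0 = sum(p for (x, y), p in pmf.items() if x == 0)
py0 = sum(p for (x, y), p in pmf.items() if y == 0)
print(cov, pmf.get((0, 0), 0.0), px0 * py0)
```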
Definition 5.3.1
If X and Y are random variables with variances σX² and σY² and covariance σXY = Cov(X, Y), then the correlation coefficient of X and Y is

ρ = σXY / (σX σY)     (5.3.1)
The random variables X and Y are said to be uncorrelated if ρ = 0; otherwise they are said to be correlated. A subscripted notation, ρXY, also is sometimes used. The following theorem gives some important properties of the correlation coefficient.

Theorem 5.3.1  If ρ is the correlation coefficient of X and Y, then

−1 ≤ ρ ≤ 1     (5.3.2)

and

ρ = ±1 if and only if Y = aX + b with probability 1 for some a ≠ 0 and b     (5.3.3)
Proof
For convenience we will use the simplified notation μ1 = μX, μ2 = μY, σ1² = σX², σ2² = σY², and σ12 = σXY. To show equation (5.3.2), let

W = Y/σ2 − ρX/σ1

so that

Var(W) = (1/σ2)² σ2² + (ρ/σ1)² σ1² − 2(ρ/σ1)(1/σ2) σ12
       = 1 + ρ² − 2ρ²
       = 1 − ρ² ≥ 0

because Var(W) ≥ 0.
To show (5.3.3), notice that ρ = ±1 implies Var(W) = 0, which by Theorem 2.4.7 implies that P[W = μW] = 1, so that with probability 1, Y/σ2 − ρX/σ1 = μ2/σ2 − ρμ1/σ1, or Y = aX + b where a = ρσ2/σ1 and b = μ2 − μ1ρσ2/σ1. On the other hand, if Y = aX + b, then by Theorems 2.4.3 and 5.2.4, σ2 = |a|σ1 and σ12 = aσ1², in which case ρ = a/|a|, so that ρ = 1 if a > 0 and ρ = −1 if a < 0.

Example 5.3.2
Consider random variables X and Y with joint pdf of the form f(x, y) = 1/20 if (x, y) ∈ C, and zero otherwise, where

C = {(x, y) | 0 < x < 10, x < y < x + 2}

FIGURE 5.1  Region corresponding to 0 < x < 10, x < y < x + 2

Although Y is not a function of X, the joint distribution of X and Y is "clustered" around the line y = x, and we should expect ρ to be near 1. The variance of X is σ1² = (10)²/12 = 25/3, the variance of Y is σ2² = (10)²/12 + (2)²/12 = 26/3, and the covariance is σ12 = E(XY) − E(X)E(Y) = 25/3. Thus, the correlation coefficient is

ρ = (25/3) / (√(25/3) √(26/3)) = √(25/26) ≈ 0.981

which, as expected, is close to 1.
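The correlation in Example 5.3.2 can be reproduced by simulation, drawing (X, Y) uniformly over the strip C (equivalently Y = X + U with U ~ UNIF(0, 2)):

```python
import math
import random

random.seed(11)
n = 100_000
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [x + random.uniform(0, 2) for x in xs]   # (X, Y) uniform over the strip C

mx, my = sum(xs) / n, sum(ys) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

rho = cov / (sx * sy)
print(round(rho, 3))   # near sqrt(25/26) = 0.981
```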
5.4 CONDITIONAL EXPECTATION

It is possible to extend the notions of expectation and variance to the conditional framework.

Definition 5.4.1
If X and Y are jointly distributed random variables, then the conditional expectation of Y given X = x is given by

E(Y | x) = Σ_y y f(y | x)     if X and Y are discrete     (5.4.1)

E(Y | x) = ∫_{−∞}^{∞} y f(y | x) dy     if X and Y are continuous     (5.4.2)

Other common notations for conditional expectation are E_{Y|x}(Y) and E(Y | X = x).
Consider Example 4.5.2, where a certain airborne particle lands at a random point (X, Y) on a triangular region. For this example, the conditional pdf of Y given X = x is

f(y | x) = 2/x     0 < y < x/2

The conditional expectation is

E(Y | x) = ∫_0^{x/2} y (2/x) dy = (2/x)(x/2)²/2 = x/4

If we are trying to "predict" the value of the vertical coordinate of (X, Y), then E(Y | x) should be more useful than the marginal expectation, E(Y), because it uses information about the horizontal coordinate. Of course, this assumes that such information is available.
Notice that the conditional expectation of Y given X = x is a function of x, say u(x) = E(Y | x). The following theorem says that, in general, the random variable u(X) = E(Y | X) has expectation equal to the marginal expectation of Y, E(Y).
Theorem 5.4.1
If X and Y are jointly distributed random variables, then

E[E(Y | X)] = E(Y)     (5.4.3)

Proof
Consider the continuous case:

E[E(Y | X)] = ∫_{−∞}^{∞} E(Y | x) f1(x) dx
            = ∫_{−∞}^{∞} [∫_{−∞}^{∞} y f(y | x) dy] f1(x) dx
            = ∫_{−∞}^{∞} y [∫_{−∞}^{∞} f(x, y) dx] dy
            = ∫_{−∞}^{∞} y f2(y) dy
            = E(Y)
In the previous example, suppose that we wish to know the marginal expectation, E(Y). One possibility would be to use the previous theorem, with E(Y | x) = x/4 and f1(x) = x/2 for 0 < x < 2:

E(Y) = E[E(Y | X)] = ∫_0^2 (x/4)(x/2) dx = 1/3

Example 5.4.2
Suppose the number of misspelled words in a student's term paper is a Poissondistributed random variable X with mean E(X) = 20. The student's roommate is asked to proofread the paper in hopes of finding the spelling errors. On average, the roommate finds 85% of such errors, and if x errors are present in the paper,
then it is reasonable to consider the number of spelling errors found to be a binomial variable with parameters n = x and p = .85. In other words, conditional on X = x, Y|x ~ BIN(x, .85), and because, in general, the mean of a binomial distribution is np, the conditional expectation is E(Y|x) = .85x. Thus, the expected number of spelling errors that are found by the roommate would be E(Y) = E[E(Y|X)] = E(.85X) = .85E(X) = 17.

An interesting situation occurs when X and Y are independent.

Theorem 5.4.2  If X and Y are independent random variables, then E(Y|x) = E(Y) and E(X|y) = E(X).
Proof
If X and Y are independent, then f(x, y) = f₁(x)f₂(y), so that f(y|x) = f₂(y) and f(x|y) = f₁(x). In the continuous case,

E(Y|x) = ∫_{−∞}^{∞} y f(y|x) dy = ∫_{−∞}^{∞} y f₂(y) dy = E(Y)

The discrete case is similar.
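Theorem 5.4.1 and Example 5.4.2 can be checked numerically. A minimal simulation sketch of the misspelled-words model, X ~ POI(20) and Y|x ~ BIN(x, .85); the sampling helpers are ad hoc stdlib implementations, not from the text:

```python
import math
import random

random.seed(2)

def rpois(mu):
    # Knuth's method: multiply uniforms until the product drops below e^{-mu}
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def rbinom(n, p):
    # count successes in n Bernoulli(p) trials
    return sum(random.random() < p for _ in range(n))

n = 100_000
ys = []
for _ in range(n):
    x = rpois(20)               # number of misspelled words
    ys.append(rbinom(x, 0.85))  # errors found, Y|x ~ BIN(x, .85)

mean_y = sum(ys) / n
print(round(mean_y, 2))  # theory: E(Y) = E[.85 X] = .85 * 20 = 17
```
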
It also is useful to study the variance of a conditional distribution, which usually is referred to as the conditional variance.
Definition 5.4.2
The conditional variance of Y given X = x is given by

Var(Y|x) = E{[Y − E(Y|x)]² | x}   (5.4.4)

An equivalent form, which is analogous to equation (2.4.8), is

Var(Y|x) = E(Y²|x) − [E(Y|x)]²   (5.4.5)
Theorem 5.4.3  If X and Y are jointly distributed random variables, then

Var(Y) = E[Var(Y|X)] + Var[E(Y|X)]   (5.4.6)

Proof
E[Var(Y|X)] = E_X{E(Y²|X) − [E(Y|X)]²}
= E(Y²) − E{[E(Y|X)]²}
= E(Y²) − [E(Y)]² − {E{[E(Y|X)]²} − [E(Y)]²}
= Var(Y) − Var[E(Y|X)]

This theorem indicates that, on average (over X), the conditional variance is smaller than the unconditional variance. Of course, they would be equal if X and Y were independent, because E(Y|X) then would not be a function of X, and Var[E(Y|X)] would be zero. If one is interested in estimating the mean height of an individual, E(Y), then the theorem suggests that it might be easier to estimate the individual's (conditional) height if you know the person's weight,
because the unconditional population of heights would have greater variance than the reduced population of individuals all with fixed weight, x. This fact leads to the important area of regression analysis, where information about one variable is used to aid in understanding a related variable.
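The decomposition in Theorem 5.4.3 can be illustrated with the particle example above: since f(y|x) = 2/x on (0, x/2), the conditional distribution is Y|x ~ UNIF(0, x/2), so E[Var(Y|X)] = E[X²/48] = 1/24 and Var[E(Y|X)] = Var(X/4) = 1/72, giving Var(Y) = 1/18. A minimal simulation sketch:

```python
import random

random.seed(3)

n = 200_000
ys = []
for _ in range(n):
    # sample X by inverse CDF: F1(x) = x^2/4 on (0, 2)  =>  x = 2*sqrt(u)
    x = 2 * random.random() ** 0.5
    # conditional on X = x, Y is uniform on (0, x/2) since f(y|x) = 2/x there
    ys.append(random.uniform(0, x / 2))

my = sum(ys) / n
vy = sum((y - my) ** 2 for y in ys) / n
print(round(vy, 4))  # theory: 1/24 + 1/72 = 1/18 ~ 0.0556
```
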
An argument similar to that of Theorem 5.4.1 yields the following, more general, theorem.

Theorem 5.4.4  If X and Y are jointly distributed random variables and h(x, y) is a function, then

E[h(X, Y)] = E{E[h(X, Y) | X]}   (5.4.7)

This theorem says that a joint expectation, such as the left side of equation (5.4.7), can be evaluated by first finding the conditional expectation E[h(x, Y) | x] and then finding its expectation relative to the marginal distribution of X. This theorem is very useful in certain applications when used in conjunction with the following theorem, which is stated without proof.
Theorem 5.4.5  If X and Y are jointly distributed random variables, and g(x) is a function, then

E[g(X)Y | x] = g(x)E(Y|x)   (5.4.8)

Example 5.4.3
If (X, Y) ~ MULT(n, p₁, p₂), then by straightforward derivation it follows that X ~ BIN(n, p₁), Y ~ BIN(n, p₂), and conditional on X = x, Y|x ~ BIN(n − x, p), where p = p₂/(1 − p₁). The means and variances of X and Y follow from the results of Section 5.2. Note also that E(Y|x) = (n − x)p₂/(1 − p₁). The two previous theorems can be used to find the covariance of X and Y. Specifically,

E(XY) = E[E(XY|X)]
= E[X E(Y|X)]
= E[X(n − X)p₂/(1 − p₁)]
= [p₂/(1 − p₁)][nE(X) − E(X²)]

If we substitute E(X) = np₁ and E(X²) = Var(X) + (np₁)² = np₁[1 + (n − 1)p₁] and simplify, then the result is

E(XY) = n(n − 1)p₁p₂
Thus,

Cov(X, Y) = n(n − 1)p₁p₂ − (np₁)(np₂) = −np₁p₂
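The identity Cov(X, Y) = −np₁p₂ is easy to check by simulating trinomial counts. A minimal sketch; the parameter values n = 10, p₁ = 0.2, p₂ = 0.3 are arbitrary assumptions for illustration:

```python
import random

random.seed(4)

def trinomial(n, p1, p2):
    # classify n independent trials into category 1, category 2, or "other"
    x = y = 0
    for _ in range(n):
        u = random.random()
        if u < p1:
            x += 1
        elif u < p1 + p2:
            y += 1
    return x, y

trials, n, p1, p2 = 100_000, 10, 0.2, 0.3
pairs = [trinomial(n, p1, p2) for _ in range(trials)]

mx = sum(x for x, _ in pairs) / trials
my = sum(y for _, y in pairs) / trials
cov = sum((x - mx) * (y - my) for x, y in pairs) / trials
print(round(cov, 2))  # theory: -n*p1*p2 = -0.6
```
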
As in Theorem 5.3.1, we adopt the convenient notation μ₁ = E(X), μ₂ = E(Y), σ₁² = Var(X), and σ₂² = Var(Y).
Theorem 5.4.6  If E(Y|x) is a linear function of x, then

E(Y|x) = μ₂ + ρ(σ₂/σ₁)(x − μ₁)   (5.4.9)

and

E[Var(Y|X)] = σ₂²(1 − ρ²)   (5.4.10)

Proof
Consider (5.4.9). If E(Y|x) = ax + b, then

μ₂ = E(Y) = E[E(Y|X)] = E_X(aX + b) = aμ₁ + b

and

σ_XY = E[(X − μ₁)(Y − μ₂)]
= E[(X − μ₁)Y] − 0
= E{E[(X − μ₁)Y | X]}
= E_X[(X − μ₁)E(Y|X)]
= E_X[(X − μ₁)(aX + b)]
= aσ₁²

Thus,

a = σ_XY/σ₁² = ρ(σ₂/σ₁)   and   b = μ₂ − ρ(σ₂/σ₁)μ₁

Equation (5.4.10) follows from Theorem 5.4.3:

E[Var(Y|X)] = Var(Y) − Var_X[μ₂ + ρ(σ₂/σ₁)(X − μ₁)]
= Var(Y) − ρ²(σ₂²/σ₁²)Var(X)
= σ₂²(1 − ρ²)
Note that if the conditional variance does not depend on x, then

Var(Y|x) = E_X[Var(Y|X)] = σ₂²(1 − ρ²)

Thus, the amount of decrease in the variance of the conditional population compared to the unconditional population depends on the correlation, ρ, between the variables. An important case will now be discussed in which E(Y|x) is a linear function of x and Var(Y|x) does not depend on x.
BIVARIATE NORMAL DISTRIBUTION
A pair of continuous random variables X and Y is said to have a bivariate normal distribution if it has a joint pdf of the form

f(x, y) = [1/(2πσ₁σ₂√(1 − ρ²))] exp{−[1/(2(1 − ρ²))][((x − μ₁)/σ₁)² − 2ρ((x − μ₁)/σ₁)((y − μ₂)/σ₂) + ((y − μ₂)/σ₂)²]}   (5.4.11)

for −∞ < x < ∞ and −∞ < y < ∞. A special notation for this is

(X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ)   (5.4.12)

which depends on five parameters: −∞ < μ₁ < ∞, −∞ < μ₂ < ∞, σ₁ > 0, σ₂ > 0, and −1 < ρ < 1.

Theorem 5.4.7  If (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ), then X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²), and ρ is the correlation coefficient of X and Y.

Proof
See Exercise 23.

Strictly speaking, first we should have established that ∫∫ f(x, y) dx dy = 1, but this would follow from Theorem 5.4.7.
It was noted in Section 5.2 that independent random variables are uncorrelated. In other words, if X and Y are independent, then ρ = 0. Notice that the joint pdf (5.4.11) factors into the product of the marginal pdfs if ρ = 0. Thus, for bivariate normal variables, the terms "uncorrelated" and "independent" can be used interchangeably, although this is not true for general bivariate distributions.

Theorem 5.4.8  If (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ), then conditional on X = x,

Y|x ~ N(μ₂ + ρ(σ₂/σ₁)(x − μ₁), σ₂²(1 − ρ²))

and conditional on Y = y,

X|y ~ N(μ₁ + ρ(σ₁/σ₂)(y − μ₂), σ₁²(1 − ρ²))

Proof
See Exercise 24.
This theorem shows that both conditional expectations are linear functions of the conditioning variable, and both conditional variances are constant. If ρ is close to zero, then the conditional variance is close to the marginal variance, and if ρ is close to ±1, then the conditional variance is close to zero.

As noted earlier in the chapter, a conditional expectation, say E(Y|X), is a function of x that, when applied to X, has the same expected value as Y. This function sometimes is referred to as a regression function, and the graph of E(Y|x) is called the regression curve of Y on X = x. The previous theorem asserts that for bivariate normal variables we have a linear regression function. It follows from Theorem 5.4.3 that the variance of E(Y|X) is less than or equal to that of Y, and thus the regression function, in general, should be a better estimator of μ₂ = E(Y) than is Y itself. Of course, this could be explained by the fact that the marginal distribution of Y makes no use of information about X, whereas the conditional distribution of Y given X = x does.
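Theorem 5.4.8 also gives a direct way to generate bivariate normal pairs: draw X from its marginal and then Y from the stated conditional. A minimal sketch checking that the marginal correlation comes out as ρ; the parameter values μ₁ = 1, μ₂ = 2, σ₁ = 2, σ₂ = 3, ρ = 0.6 are arbitrary assumptions:

```python
import random

random.seed(5)

m1, m2, s1, s2, rho = 1.0, 2.0, 2.0, 3.0, 0.6

n = 200_000
pairs = []
for _ in range(n):
    x = random.gauss(m1, s1)
    # Theorem 5.4.8: Y|x ~ N(m2 + rho*(s2/s1)*(x - m1), s2^2*(1 - rho^2))
    y = random.gauss(m2 + rho * (s2 / s1) * (x - m1),
                     s2 * (1 - rho ** 2) ** 0.5)
    pairs.append((x, y))

mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
vx = sum((x - mx) ** 2 for x, _ in pairs) / n
vy = sum((y - my) ** 2 for _, y in pairs) / n
cov = sum((x - mx) * (y - my) for x, y in pairs) / n
r = cov / (vx * vy) ** 0.5
print(round(r, 2))  # should be near rho = 0.6
```
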
5.5 JOINT MOMENT GENERATING FUNCTIONS

The moment generating function concept can be generalized to k-dimensional random variables.
Definition 5.5.1
The joint MGF of X = (X₁, ..., X_k), if it exists, is defined to be

M_X(t) = E[exp(Σᵢ₌₁ᵏ tᵢXᵢ)]   (5.5.1)

where t = (t₁, ..., t_k) and −h < tᵢ < h for some h > 0.
The joint MGF has properties analogous to those of the univariate MGF. Mixed moments such as E[Xᵢʳ Xⱼˢ] may be obtained by differentiating the joint MGF r times with respect to tᵢ and s times with respect to tⱼ and then setting all tᵢ = 0. The joint MGF also uniquely determines the joint distribution of the variables X₁, ..., X_k. Note that it also is possible to obtain the MGFs of the marginal distributions from the joint MGF. For example,

M_X(t₁) = M_{X,Y}(t₁, 0)   (5.5.2)

M_Y(t₂) = M_{X,Y}(0, t₂)   (5.5.3)

Theorem 5.5.1  If M_{X,Y}(t₁, t₂) exists, then the random variables X and Y are independent if and only if M_{X,Y}(t₁, t₂) = M_X(t₁)M_Y(t₂).
Example 5.5.1
Suppose that X = (X₁, ..., X_k) ~ MULT(n, p₁, ..., p_k). We have discussed earlier that the marginal distributions are binomial, Xᵢ ~ BIN(n, pᵢ). The joint MGF of the multinomial distribution may be evaluated along the lines followed for the binomial distribution, to obtain

M_X(t) = E[exp(Σᵢ tᵢXᵢ)] = (p₁e^{t₁} + ··· + p_k e^{t_k} + p_{k+1})ⁿ   (5.5.4)

where p_{k+1} = 1 − p₁ − ··· − p_k.
Clearly, the joint marginal distributions also are multinomial. For example, if (X₁, X₂, X₃) ~ MULT(n, p₁, p₂, p₃), then

M_{X₁,X₂}(t₁, t₂) = M_{X₁,X₂,X₃}(t₁, t₂, 0)
= [p₁e^{t₁} + p₂e^{t₂} + p₃ + (1 − p₁ − p₂ − p₃)]ⁿ
= [p₁e^{t₁} + p₂e^{t₂} + 1 − p₁ − p₂]ⁿ

so (X₁, X₂) ~ MULT(n, p₁, p₂).
Example 5.5.2
Consider a pair of bivariate normal random variables with means μ₁ and μ₂, variances σ₁² and σ₂², and correlation coefficient ρ. In other words, (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ). The joint MGF of X and Y can be evaluated directly by integration, ∫∫ exp(t₁x + t₂y) f(x, y) dx dy, with f(x, y) given by equation (5.4.11). The direct approach is somewhat tedious, so we will make use of some of the results on conditional expectations. Specifically, from Theorems 5.4.4 and 5.4.5, it follows that

M_{X,Y}(t₁, t₂) = E[exp(t₁X + t₂Y)]
= E_X{E[exp(t₁X + t₂Y) | X]}
= E_X{exp(t₁X) E[exp(t₂Y) | X]}   (5.5.5)

Furthermore, by Theorem 5.4.8,

Y|x ~ N(μ₂ + ρ(σ₂/σ₁)(x − μ₁), σ₂²(1 − ρ²))

so that

E[exp(t₂Y) | X = x] = exp{[μ₂ + ρ(σ₂/σ₁)(x − μ₁)]t₂ + σ₂²(1 − ρ²)t₂²/2}

After substitution into equation (5.5.5) and some simplification, we obtain

M_{X,Y}(t₁, t₂) = exp[μ₁t₁ + μ₂t₂ + (σ₁²t₁² + σ₂²t₂² + 2ρσ₁σ₂t₁t₂)/2]   (5.5.6)
SUMMARY

The main purpose of this chapter was to develop general properties involving expected values of functions of random variables. Sums and products are important functions of random variables that are given special attention. For example, it is shown that the expected value of a sum is the sum of the (marginal)
expected values. If the random variables are independent, then the expected value of a product is the product of the (marginal) expected values and the variance of a sum is the sum of the (marginal) variances.
The correlation coefficient provides a measure of dependence between two random variables. When the correlation coefficient is zero, the random variables
are said to be uncorrelated. For two random variables to be independent, it is necessary, but not sufficient, that they be uncorrelated. When the random variables are dependent, the conditional expectation is useful in attempting to predict the value of one variable given an observed value of the other variable.
EXERCISES

1. Let X₁, X₂, X₃, and X₄ be independent random variables, each having the same distribution with mean 5 and standard deviation 3, and let Y = X₁ + 2X₂ + X₃ − X₄.
(a) Find E(Y).
(b) Find Var(Y).
2. Suppose the weight (in ounces) of a major league baseball is a random variable X with mean μ = 5 and standard deviation σ = 2/5. A carton contains 144 baseballs. Assume that the weights of individual baseballs are independent, and let T represent the total weight of all the baseballs in the carton.
(a) Find the expected total weight, E(T).
(b) Find the variance, Var(T).
3. Suppose X and Y are continuous random variables with joint pdf f(x, y) = 24xy if 0 < x, 0 < y, and x + y < 1, and zero otherwise.
(a) Find Cov(X + 1, Y − 2).
(b) Find Cov(X + 1, 5Y − 2).
(c) Find Cov(3X + 5, X).
4. Let X and Y be discrete random variables with joint pdf f(x, y) = 4/(5xy) if x = 1, 2 and y = 2, 3, and zero otherwise.
(a) Find E(X).
(b) Find E(Y).
(c) Find E(XY).
(d) Find Cov(X, Y).
5. Let X and Y be continuous random variables with joint pdf f(x, y) = x + y if 0 < x < 1 and 0 < y < 1, and zero otherwise. Find:
(b) E(X + Y).
(c) E(XY).
(d) Cov(2X, 3Y).
(e) E(Y|x).
6. If X, Y, Z, and W are random variables, then show that:
(a) Cov(X ± Y, Z) = Cov(X, Z) ± Cov(Y, Z).
(b) Cov(X + Y, Z + W) = Cov(X, Z) + Cov(X, W) + Cov(Y, Z) + Cov(Y, W).
(c) Cov(X + Y, X − Y) = Var(X) − Var(Y).
7. Suppose X and Y are independent random variables with E(X) = 2, E(Y) = Var(X) = 4, and Var(Y) = 16.
(a) Find E(5X − Y).
(b) Find Var(5X − Y).
(c) Find Cov(3X + Y, Y).
(d) Find Cov(X, 5X − Y).
8. If X₁, X₂, ..., X_k and Y₁, Y₂, ..., Y_m are jointly distributed random variables, and if a₁, a₂, ..., a_k and b₁, b₂, ..., b_m are constants, show that

Cov(Σᵢ₌₁ᵏ aᵢXᵢ, Σⱼ₌₁ᵐ bⱼYⱼ) = Σᵢ₌₁ᵏ Σⱼ₌₁ᵐ aᵢbⱼ Cov(Xᵢ, Yⱼ)
9. Use the result of Exercise 8 to verify equations (5.2.14) and (5.2.15).

10. For the random variables in Exercise 5, find the approximate mean and variance of W = XY.
11. Let f(x, y) = 6x if 0 < x < y < 1, and zero otherwise. Find:
(a) Cov(X, Y).
(b) ρ.
(c) f(y|x).
(d) E(Y|x).
12. Suppose X and Y are continuous random variables with joint pdf f(x, y) = 4(x − xy) if 0 < x < 1 and 0 < y < 1, and zero otherwise.
(a) Find E(X − Y).
(b) Find Var(X − Y).
(c) What is the correlation coefficient of X and Y?
(d) What is E(Y|x)?
13. Let f(x, y) = 1 if 0 < y < 2x and 0 < x < 1, and zero otherwise. Find:
(a) f(y|x).
(b) E(Y|x).
(c) ρ.
14. For the joint pdf of Exercise 30 in Chapter 4 (page 170):
(a) Find the correlation coefficient of X and Y.
(b) Find the conditional expectation, E(Y|x).
(c) Find the conditional variance, Var(Y|x).

15. (a) Determine E(Y|x) in Exercise 4.
(b) Determine Var(Y|x) in Exercise 4.
16. Let X and Y have joint pdf f(x, y) = e^{−y} if 0 < x < y < ∞, and zero otherwise. Find E(X|y).
17. Suppose that the conditional distribution of Y given X = x is Poisson with mean E(Y|x) = x, that is, Y|x ~ POI(x), and that X ~ EXP(1).
(a) Find E(Y).
(b) Find Var(Y).
18. One box contains five red and six black marbles. A second box contains 10 red and five black marbles. One marble is drawn from box 1 and placed in box 2. Two marbles then are drawn from box 2 without replacement. What is the expected number of red marbles obtained on the second draw?

19. The number of times a batter gets to bat in a game follows a binomial distribution, N ~ BIN(6, 0.8). Given the number of times at bat, n, the number of hits he gets conditionally follows a binomial distribution, X|n ~ BIN(n, 0.3).
(a) Find E(X).
(b) Find Var(X).
(c) Find E(X²).
20. Let X be the number of customers arriving in a given minute at the drive-up window of a local bank, and let Y be the number who make withdrawals. Assume that X is Poisson distributed with expected value E(X) = 3, and that the conditional expectation and variance of Y given X = x are E(Y|x) = x/2 and Var(Y|x) = (x + 1)/3.
(a) Find E(Y).
(b) Find Var(Y).
(c) Find E(XY).
21. Suppose that Y₁ and Y₂ are continuous with joint pdf f(y₁, y₂) = 2e^{−y₁−y₂} if 0 < y₁ < y₂ < ∞, and zero otherwise.
22. Find the joint MGF of the continuous random variables X and Y with joint pdf f(x, y) = e^{−y} if 0 < x < y < ∞, and zero otherwise.
23. Prove Theorem 5.4.7. Hint: Use the joint MGF of Example 5.5.2.

24. Prove Theorem 5.4.8.
25. Let X₁ and X₂ be independent normal random variables, Xᵢ ~ N(μᵢ, σᵢ²), and let Y₁ = X₁ and Y₂ = X₁ + X₂.
(a) Show that Y₁ and Y₂ are bivariate normal.
(b) What are the means, variances, and correlation coefficient of Y₁ and Y₂?
(c) Find the conditional distribution of Y₂ given Y₁ = y₁. Hint: Use Theorem 5.4.8.
26. Prove Theorem 5.2.4.

27. Prove Theorem 5.2.5.
CHAPTER 6

FUNCTIONS OF RANDOM VARIABLES
6.1 INTRODUCTION

In Chapter 1, probability was defined first in a set-theoretic framework. The concept of a random variable then was introduced so that events could be associated with sets of real numbers in the range space of the random variable. This makes it possible to mathematically express the probability model for the population or characteristic of interest in the form of a pdf or a CDF for the associated random variable, say X. In this case, X represents the initial characteristic of interest, and the pdf, f(x), may be referred to as the population pdf. It often may be the case that some function of this variable also is of interest. Thus, if X represents the age in weeks of some component, another experimenter may express the age, Y, in days, so that Y = 7X. Similarly, W = ln X or some other function of X may be of interest. Any function of a random variable X is itself a random variable, and the probability distribution of a function of X is determined by the probability distribution of X. For example, for Y above,
P[14 < Y ≤ 21] = P[2 < X ≤ 3], and so on. Clearly, probabilities concerning functions of a random variable may be of interest, and it is useful to be able to express the pdf or CDF of a function of a random variable in terms of the pdf or CDF of the original variable. Such pdfs sometimes are referred to as "derived" distributions. Of course, a certain pdf may represent a population pdf in one application, but correspond to a derived distribution in a different application. In Example 4.6.1, the total lifetime, X₁ + X₂, of two light bulbs was of interest, and a method of deriving the distribution of this variable was suggested. General
techniques for deriving the pdf of a function of random variables will be discussed in this chapter.
6.2 THE CDF TECHNIQUE

We will assume that a random variable X has CDF F_X(x), and that some function of X is of interest, say Y = u(X). The idea behind the CDF technique is to express the CDF of Y in terms of the distribution of X. Specifically, for each real y, we can define a set A_y = {x | u(x) ≤ y}. It follows that [Y ≤ y] and [X ∈ A_y] are equivalent events, in the sense discussed in Section 2.1, and consequently

F_Y(y) = P[u(X) ≤ y]   (6.2.1)

which also can be expressed as P[X ∈ A_y]. This probability can be expressed as the integral of the pdf, f_X(x), over the set A_y if X is continuous, or the summation of f_X(x) over x in A_y if X is discrete. For example, it often is possible to express [u(X) ≤ y] in terms of an equivalent event [x₁ ≤ X ≤ x₂], where one or both of the limits x₁ and x₂ depend on y. In the continuous case,

F_Y(y) = ∫_{x₁}^{x₂} f_X(x) dx = F_X(x₂) − F_X(x₁)   (6.2.2)

and, of course, the pdf is f_Y(y) = (d/dy)F_Y(y).
Example 6.2.1
Suppose that F_X(x) = 1 − e^{−2x}, 0 < x < ∞, and consider Y = e^X. We have

F_Y(y) = P[Y ≤ y] = P[e^X ≤ y] = P[X ≤ ln y] = F_X(ln y) = 1 − y^{−2},   1 < y < ∞

In this case, x₁ = −∞ and x₂ = ln y, and the pdf of Y is

f_Y(y) = (d/dy)F_Y(y) = 2y^{−3},   1 < y < ∞
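The CDF just derived can be checked empirically by sampling X (its CDF 1 − e^{−2x} is that of an exponential with rate 2) and comparing the empirical CDF of Y = e^X with 1 − y⁻². A minimal sketch:

```python
import math
import random

random.seed(7)

n = 100_000
# X has CDF 1 - exp(-2x), i.e., exponential with rate 2
ys = [math.exp(random.expovariate(2.0)) for _ in range(n)]

y0 = 2.0
emp = sum(y <= y0 for y in ys) / n
print(round(emp, 3))  # theory: F_Y(2) = 1 - 2**(-2) = 0.75
```
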
Example 6.2.2
Consider a continuous random variable X, and let Y = X². It follows that

F_Y(y) = P[X² ≤ y] = P[−√y ≤ X ≤ √y] = F_X(√y) − F_X(−√y)   (6.2.3)

The pdf of Y can be expressed easily in terms of the pdf of X in this case, because

f_Y(y) = (d/dy)[F_X(√y) − F_X(−√y)] = [f_X(√y) + f_X(−√y)]/(2√y)   (6.2.4)

for y > 0. Evaluation of equation (6.2.1) can be more complicated than the form given by equation (6.2.2), because a union of intervals may occur.
Example 6.2.3
A signal is sent to a two-sided rotating antenna, and the angle of the antenna at the time the signal is received can be assumed to be uniformly distributed from 0 to 2π, Θ ~ UNIF(0, 2π). The signal can be received if Y = tan Θ > y₀. For example, y₀ = 1 corresponds to the angles 45° < Θ < 90° and 225° < Θ < 270°. The CDF of Y when y < 0 is

F_Y(y) = P[tan Θ ≤ y]
= P[π/2 < Θ ≤ π + tan⁻¹(y)] + P[3π/2 < Θ ≤ 2π + tan⁻¹(y)]
= 1/2 + (1/π) tan⁻¹(y)

By symmetry, P[Y > y] = P[Y < −y]. Thus, for y > 0,

F_Y(y) = 1 − P[Y > y] = 1 − P[Y < −y] = 1 − [1/2 + (1/π) tan⁻¹(−y)] = 1/2 + (1/π) tan⁻¹(y)
It is interesting that

f_Y(y) = dF_Y(y)/dy = 1/[π(1 + y²)],   −∞ < y < ∞

This is the pdf of a Cauchy distribution (defined in Chapter 3), Y ~ CAU(1, 0).
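The antenna example can be simulated directly, comparing the empirical CDF of tan Θ with 1/2 + (1/π) tan⁻¹(y); a minimal sketch:

```python
import math
import random

random.seed(8)

n = 200_000
# uniform antenna angle on (0, 2*pi); Y = tan(theta)
ys = [math.tan(random.uniform(0, 2 * math.pi)) for _ in range(n)]

for y0 in (-1.0, 0.0, 1.0):
    emp = sum(y <= y0 for y in ys) / n
    exact = 0.5 + math.atan(y0) / math.pi
    print(y0, round(emp, 3), round(exact, 3))
```
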
The CDF technique also can be extended to apply to a function of several variables, although the analysis generally is more complicated.
Theorem 6.2.1  Let X = (X₁, X₂, ..., X_k) be a k-dimensional vector of continuous random variables with joint pdf f(x₁, x₂, ..., x_k). If Y = u(X) is a function of X, then

F_Y(y) = P[u(X) ≤ y] = ∫···∫_A f(x₁, ..., x_k) dx₁ ··· dx_k   (6.2.5)

where A = {x | u(x) ≤ y}.
Of course, the limits of the integral (6.2.5) are functions of y, and the convenience of this method will depend on the complexity of the resulting limits.
Example 6.2.4
In Example 4.6.1 we considered the sum of two independent random variables, say Y = X₁ + X₂, where Xᵢ ~ EXP(1). The set required in (6.2.5) is, as shown in Figure 6.1,

A = {(x₁, x₂) | 0 < x₁ ≤ y − x₂, 0 < x₂ < y}

FIGURE 6.1  Region A such that x₁ + x₂ ≤ y.
and consequently

F_Y(y) = ∫₀^y ∫₀^{y−x₂} e^{−(x₁+x₂)} dx₁ dx₂ = 1 − e^{−y} − ye^{−y}

and

f_Y(y) = (d/dy)F_Y(y) = ye^{−y},   y > 0
It is possible in many cases to derive the pdf directly, without first deriving the CDF.
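The result of Example 6.2.4 is easy to verify by simulation, comparing the empirical CDF of X₁ + X₂ with 1 − e⁻ʸ − ye⁻ʸ; a minimal sketch:

```python
import math
import random

random.seed(9)

n = 100_000
# sum of two independent EXP(1) lifetimes
ys = [random.expovariate(1.0) + random.expovariate(1.0) for _ in range(n)]

y0 = 2.0
emp = sum(y <= y0 for y in ys) / n
exact = 1 - math.exp(-y0) - y0 * math.exp(-y0)
print(round(emp, 3), round(exact, 3))  # the two values should agree closely
```
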
6.3 TRANSFORMATION METHODS

First we will consider transformations of variables in one dimension. Let u(x) be a real-valued function of a real variable x. If the equation y = u(x) can be solved uniquely, say x = w(y), then we say the transformation is one-to-one. It will be necessary to consider discrete and continuous cases separately, and also whether the function is one-to-one.
ONE-TO-ONE TRANSFORMATIONS
Theorem 6.3.1  Discrete Case. Suppose that X is a discrete random variable with pdf f_X(x) and that Y = u(X) defines a one-to-one transformation. In other words, the equation y = u(x) can be solved uniquely, say x = w(y). Then the pdf of Y is

f_Y(y) = f_X(w(y)),   y ∈ B   (6.3.1)

where B = {y | f_Y(y) > 0}.

Proof
This follows because f_Y(y) = P[Y = y] = P[u(X) = y] = P[X = w(y)] = f_X(w(y)).
Example 6.3.1
Let X ~ GEO(p), so that

f_X(x) = pq^{x−1},   x = 1, 2, 3, ...

Another frequently encountered random variable that also is called geometric is of the form Y = X − 1, so that u(x) = x − 1, w(y) = y + 1, and

f_Y(y) = f_X(y + 1) = pq^y,   y = 0, 1, 2, ...

which is nothing more than the pdf of the number of failures before the first success.
Theorem 6.3.2  Continuous Case. Suppose that X is a continuous random variable with pdf f_X(x), and assume that Y = u(X) defines a one-to-one transformation from A = {x | f_X(x) > 0} onto B = {y | f_Y(y) > 0} with inverse transformation x = w(y). If the derivative (d/dy)w(y) is continuous and nonzero on B, then the pdf of Y is

f_Y(y) = f_X(w(y)) |(d/dy)w(y)|,   y ∈ B   (6.3.2)

Proof
If y = u(x) is one-to-one, then it is either monotonic increasing or monotonic decreasing. If we first assume that it is increasing, then u(x) ≤ y if and only if x ≤ w(y). Thus,

F_Y(y) = P[u(X) ≤ y] = P[X ≤ w(y)] = F_X(w(y))

and, consequently,

f_Y(y) = (d/dy)F_X(w(y)) = f_X(w(y)) (d/dy)w(y) = f_X(w(y)) |(d/dy)w(y)|

because (d/dy)w(y) > 0 in this case. In the decreasing case, u(x) ≤ y if and only if w(y) ≤ x, and thus

F_Y(y) = P[u(X) ≤ y] = P[X ≥ w(y)] = 1 − F_X(w(y))

and

f_Y(y) = −f_X(w(y)) (d/dy)w(y) = f_X(w(y)) |(d/dy)w(y)|

because (d/dy)w(y) < 0 in this case.
In this context, the derivative of w(y) usually is referred to as the Jacobian of the transformation, and denoted by J = (d/dy)w(y). Note also that transforming a continuous random variable is equivalent to the problem of making a change of variables in an integral. This should not be surprising, because a continuous pdf is simply the function that is integrated over events to obtain probabilities.
Example 6.3.2
We wish to use Theorem 6.3.2 to determine the pdf of Y = e^X in Example 6.2.1. We obtain the inverse transformation x = w(y) = ln y and the Jacobian J = w'(y) = 1/y, so that

f_Y(y) = f_X(ln y) |1/y| = 2e^{−2 ln y} (1/y) = 2y^{−3},   y ∈ B = (1, ∞)

In a transformation problem it is always important to identify the set B where f_Y(y) > 0, which in this example is B = (1, ∞), because e^x > 1 when x > 0.
The transformation Y = eX, when applied to a normally distributed random variable, yields a positive-valued random variable with an important special distribution.
Example 6.3.3
A distribution that is related to the normal distribution, but for which the random variable assumes only positive values, is the lognormal distribution, which is defined by the pdf

f_Y(y) = [1/(yσ√(2π))] e^{−(ln y − μ)²/(2σ²)},   0 < y < ∞   (6.3.3)

with parameters −∞ < μ < ∞ and σ > 0. This will be denoted by Y ~ LOGN(μ, σ²), and it is related to the normal distribution by the relationship Y ~ LOGN(μ, σ²) if and only if X = ln Y ~ N(μ, σ²).

In some cases, the lognormal distribution is reparameterized by letting μ = ln θ, which gives

f_Y(y) = [1/(yσ√(2π))] e^{−(ln y − ln θ)²/(2σ²)},   0 < y < ∞   (6.3.4)

and in this notation θ becomes a scale parameter. It is clear that cumulative lognormal probabilities can be expressed in terms of normal probabilities, because if Y ~ LOGN(μ, σ²), then

F_Y(y) = P[Y ≤ y] = P[ln Y ≤ ln y] = P[X ≤ ln y] = Φ((ln y − μ)/σ)   (6.3.5)
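Equation (6.3.5) can be checked numerically, using Φ(z) = [1 + erf(z/√2)]/2 and simulated Y = e^X with X normal; the parameter values μ = 0.5, σ = 1.2 are arbitrary assumptions for illustration:

```python
import math
import random

random.seed(10)

mu, sigma = 0.5, 1.2
n = 200_000
# Y = exp(X) with X ~ N(mu, sigma^2), i.e., Y ~ LOGN(mu, sigma^2)
ys = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]

y0 = 3.0
emp = sum(y <= y0 for y in ys) / n
# standard normal CDF via the error function
exact = 0.5 * (1 + math.erf((math.log(y0) - mu) / (sigma * math.sqrt(2))))
print(round(emp, 3), round(exact, 3))  # the two values should agree closely
```
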
Another important special distribution is obtained by a log transformation applied to a Pareto variable.
Example 6.3.4
If X ~ PAR(1, 1), then the pdf of Z = ln X is

f_Z(z) = e^z/(1 + e^z)²

and the CDF is

F_Z(z) = e^z/(1 + e^z)

for all real z. If we introduce location and scale parameters η and θ, respectively, by the transformation y = u(z) = η + θz, then the pdf of Y = u(Z) is

f_Y(y) = (1/θ) exp[(y − η)/θ] / {1 + exp[(y − η)/θ]}²   (6.3.6)

for all real y. The distribution of Y is known as the logistic distribution, denoted by Y ~ LOG(θ, η). This is another example of a symmetric distribution, which follows by noting that

f_Z(−z) = e^{−z}/(1 + e^{−z})² = e^z/(e^z + 1)² = f_Z(z)

The transformation y = η + θz provides a general approach to introducing location and scale parameters into a model.
Recall that Theorem 2.4.1 was stated without proof. A proof for the special case of a continuous random variable X, under the conditions of Theorem 6.3.2, now will be provided. Consider the case in which u(x) is an increasing function, so that the inverse transformation, x = w(y), also is increasing:

E[u(X)] = ∫_{−∞}^{∞} u(x) f_X(x) dx = ∫ u[w(y)] f_X[w(y)] (d/dy)w(y) dy = ∫ y f_Y(y) dy = E(Y)

The case in which u(x) is decreasing is similar. A very useful special transformation is given by the following theorem.
Theorem 6.3.3  Probability Integral Transformation. If X is continuous with CDF F(x), then U = F(X) ~ UNIF(0, 1).

Proof
We will prove the theorem in the case where F(x) is one-to-one, so that the inverse, F⁻¹(u), exists:

F_U(u) = P[F(X) ≤ u] = P[X ≤ F⁻¹(u)] = F(F⁻¹(u)) = u,   0 < u < 1

Because 0 ≤ F(x) ≤ 1, we have F_U(u) = 0 if u ≤ 0 and F_U(u) = 1 if u ≥ 1.

A more general proof is obtained if F⁻¹(u) is replaced by the function G(u) that assigns to each value u the minimum value of x such that u ≤ F(x),

G(u) = min{x | u ≤ F(x)},   0 < u < 1   (6.3.7)

The function G(u) exists for any CDF, F(x), and it agrees with F⁻¹(u) if F(x) is a one-to-one function. The following example involves a continuous distribution with a CDF that is not one-to-one.
Example 6.3.5
Let X be a continuous random variable with pdf

f_X(x) = 1/2   if 0 < x < 1 or 3 < x < 4

and zero otherwise. The CDF of X, whose graph is shown in Figure 6.2, is not one-to-one, because it assumes the value 1/2 for all 1 ≤ x ≤ 3.

FIGURE 6.2  A CDF that is continuous but not one-to-one.
The function G(u), for this example, is

G(u) = 2u         if 0 < u ≤ 1/2
G(u) = 2(u + 1)   if 1/2 < u < 1
The function G(u) has another important application, which follows from the next theorem.
Theorem 6.3.4  Let F(x) be a CDF and let G(u) be the function defined by (6.3.7). If U ~ UNIF(0, 1), then X = G(U) has CDF F(x).
Proof See Exercise 5.
An important application of the preceding theorem is the generation of "pseudo-random" variables from some specified distribution using a computer. In other words, the computer generates data that are distributed as the observations of a random sample from some specified distribution with CDF F(x). Specifically, if n "random numbers," say u₁, u₂, ..., u_n, are generated by a random number generator, these represent a simulated random sample of size n from UNIF(0, 1). It follows, then, that x₁, x₂, ..., x_n, where

xᵢ = G(uᵢ),   i = 1, 2, ..., n   (6.3.8)

corresponds to a simulated random sample from a distribution with CDF F(x). Of course, in many examples the CDF is one-to-one, and we could use xᵢ = F⁻¹(uᵢ). Equation (6.3.8) also can be used with discrete distributions.
Example 6.3.6
If X ~ BIN(1, 1/2), then

F(x) = 0 if x < 0,   F(x) = 1/2 if 0 ≤ x < 1,   and F(x) = 1 if 1 ≤ x

and

G(u) = 0 if 0 < u ≤ 1/2,   and G(u) = 1 if 1/2 < u < 1
TRANSFORMATIONS THAT ARE NOT ONE-TO-ONE
Suppose that the function u(x) is not one-to-one over A = {x | f_X(x) > 0}. Although this means that no unique solution to the equation y = u(x) exists, it usually is possible to partition A into disjoint subsets A₁, A₂, ... such that u(x) is
one-to-one over each A_j. Then, for each y in the range of u(x), the equation y = u(x) has a unique solution x_j = w_j(y) over the set A_j. In the discrete case, it follows that Theorem 6.3.1 can be extended to functions that are not one-to-one by replacing equation (6.3.1) with

f_Y(y) = Σ_j f_X(w_j(y))   (6.3.9)

That is, f_Y(y) = Σ f_X(x_j), where the sum is over all x_j such that u(x_j) = y.
Example 6.3.7
Let f_X(x) = (4 choose 2 + x)(1/2)⁴, x = −2, −1, 0, 1, 2, and consider Y = |X|. Clearly, B = {0, 1, 2} and

f_Y(0) = f_X(0) = 3/8
f_Y(1) = f_X(−1) + f_X(1) = 1/2
f_Y(2) = f_X(−2) + f_X(2) = 1/8

Another way to express this is

f_Y(y) = (1/2)⁴ [(4 choose 2 − y) + (4 choose 2 + y)],   y = 1, 2

and f_Y(0) = (4 choose 2)(1/2)⁴.
An expression analogous to equation (6.3.9) for the discrete case is obtained for continuous functions that are not one-to-one by extending equation (6.3.2) to

f_Y(y) = Σ_j f_X(w_j(y)) |(d/dy)w_j(y)|   (6.3.10)

That is, for the transformation Y = u(X), the summation is again over all the values x_j for which u(x_j) = y, although the Jacobian enters into the equation for the continuous case. We found by the CDF method in Example 6.2.2 that if Y = X², then

F_Y(y) = F_X(√y) − F_X(−√y)

and by taking derivatives,

f_Y(y) = [1/(2√y)][f_X(√y) + f_X(−√y)]   (6.3.11)

Equation (6.3.11) also follows directly by the transformation method, by applying equation (6.3.10).
Example 6.3.8
Suppose that X ~ UNIF(−1, 1) and Y = X². If we partition A = (−1, 1) into A₁ = (−1, 0) and A₂ = (0, 1), then y = x² has unique solutions x₁ = w₁(y) = −√y and x₂ = w₂(y) = √y over these intervals. We can neglect the point x = 0 in this partition, because X is continuous. The pdf of Y is thus

f_Y(y) = f_X(−√y)[1/(2√y)] + f_X(√y)[1/(2√y)] = (1/2)[1/(2√y)] + (1/2)[1/(2√y)] = 1/(2√y),   y ∈ B = (0, 1)
If the limits of the function u(x) are not the same over each set A_j of the partition, then greater care must be exercised in applying the equations. This is illustrated in the following example.
Example 6.3.9
With f_X(x) = x²/3, −1 < x < 2, consider Y = X². For 0 < y < 1 there are two points with nonzero pdf that map into y, namely x₁ = −√y and x₂ = √y, whereas for 1 < y < 4 there is only one, x₂ = √y. Thus,

f_Y(y) = [1/(2√y)][f_X(√y) + f_X(−√y)] = [1/(2√y)][(√y)²/3 + (−√y)²/3] = √y/3,   0 < y < 1

f_Y(y) = [1/(2√y)] f_X(√y) = [1/(2√y)](y/3) = √y/6,   1 < y < 4

and zero otherwise.
In the previous example, notice that it is possible to solve the problem without explicitly using the functional notation u(x) or w_j(y). Thus, as suggested earlier, a simpler way of expressing equations (6.3.9) and (6.3.10), respectively, is

f_Y(y) = Σ f_X(x)   (6.3.12)

and

f_Y(y) = Σ f_X(x) |dx/dy|   (6.3.13)

where it must be kept in mind that each x = w_j(y) is a function of y. This simpler notation will be convenient in expressing the results of joint transformations of several random variables.
JOINT TRANSFORMATIONS The preceding theorems can be extended to apply to functions of several random variables.
Example 6.3.10 Consider the geometric variables X, Y ~ GEO(p) of Example 4.4.3. As we found in that example, the joint pdf of X and T = X + Y can be expressed in terms of the joint pdf of X and Y, namely

f_{X,T}(x, t) = f_{X,Y}(x, t - x)

where x and t - x are the solutions of the joint transformation defined by u1(x, y) = x and u2(x, y) = x + y.

This can be generalized as follows. Consider a k-dimensional vector X = (X1, X2, ..., Xk) of random variables, and suppose that u1(x), u2(x), ..., uk(x) are k functions of x, so that Yi = ui(X) for i = 1, ..., k defines another vector of random variables, Y = (Y1, Y2, ..., Yk). A more concise way to express this is Y = u(X).
In the discrete case, we can state a k-dimensional version of Theorem 6.3.1.
Theorem 6.3.5 If X is a vector of discrete random variables with joint pdf fX(x) and Y = u(X) defines a one-to-one transformation, then the joint pdf of Y is

fY(y1, y2, ..., yk) = fX(x1, x2, ..., xk)    (6.3.14)

where x1, x2, ..., xk are the solutions of y = u(x), and consequently depend on y1, y2, ..., yk.

If the transformation is not one-to-one, and if a partition exists, say A1, A2, ..., such that the equation y = u(x) has a unique solution

x_j = (x_{1j}, x_{2j}, ..., x_{kj})    (6.3.15)

over A_j, then the pdf of Y is

fY(y1, ..., yk) = Σ_j fX(x_{1j}, ..., x_{kj})    (6.3.16)
Joint transformations of continuous random variables can be accomplished, although the notion of the Jacobian must be generalized. Suppose, for example, that u1(x1, x2) and u2(x1, x2) are functions, and x1 and x2 are unique solutions to the transformation y1 = u1(x1, x2) and y2 = u2(x1, x2). Then the Jacobian of the transformation is the determinant

J = | ∂x1/∂y1  ∂x1/∂y2 |
    | ∂x2/∂y1  ∂x2/∂y2 |    (6.3.17)
Example 6.3.11 It is desired to transform x1 and x2 into x1 and the product x1x2. Specifically, let y1 = x1 and y2 = x1x2. The solution is x1 = y1 and x2 = y2/y1, and the Jacobian is

J = | 1         0    |
    | -y2/y1²   1/y1 | = 1/y1
For a transformation of k variables y = u(x), with a unique solution x = (x1, x2, ..., xk), the Jacobian is the determinant of the k × k matrix of partial derivatives:

J = det[∂xi/∂yj],  i, j = 1, 2, ..., k    (6.3.18)
Theorem 6.3.2 can be generalized as follows.
Theorem 6.3.6 Suppose that X = (X1, X2, ..., Xk) is a vector of continuous random variables with joint pdf fX(x1, x2, ..., xk) > 0 on A, and Y = (Y1, Y2, ..., Yk) is defined by the one-to-one transformation

Yi = ui(X1, X2, ..., Xk),  i = 1, 2, ..., k

If the Jacobian is continuous and nonzero over the range of the transformation, then the joint pdf of Y is

fY(y1, ..., yk) = fX(x1, ..., xk)|J|    (6.3.19)

where x = (x1, ..., xk) is the solution of y = u(x).
Proof As noted earlier, the problem of finding the pdf of a function of a random variable is related to a change of variables in an integral. This approach extends readily to transformations of k variables. Denote by B the range of a transformation y = u(x) with inverse x = w(y). Assume D ⊆ B, and let C be the set of all points x = (x1, ..., xk) that map into D under the transformation. We have

P[Y ∈ D] = ∫···∫_D fY(y1, ..., yk) dy1 ··· dyk = ∫···∫_C fX(x1, ..., xk) dx1 ··· dxk
But this also can be written as

∫···∫_D fX[w1(y1, ..., yk), ..., wk(y1, ..., yk)] |J| dy1 ··· dyk

as the result of a standard theorem on change of variables in an integral. Because this is true for arbitrary D ⊆ B, equation (6.3.19) follows.
Example 6.3.12 Let X1 and X2 be independent and exponential, Xi ~ EXP(1). Thus, the joint pdf is

f_{X1,X2}(x1, x2) = e^(-(x1+x2)),  (x1, x2) ∈ A

where A = {(x1, x2) | 0 < x1, 0 < x2}. Consider the random variables Y1 = X1 and Y2 = X1 + X2. This corresponds to the transformation y1 = x1 and y2 = x1 + x2, which has a unique solution, x1 = y1 and x2 = y2 - y1. The Jacobian is

J = | 1   0 |
    | -1  1 | = 1

and thus

f_{Y1,Y2}(y1, y2) = f_{X1,X2}(y1, y2 - y1) = e^(-y2),  (y1, y2) ∈ B

and zero otherwise. The set B is obtained by transforming the set A; this corresponds to y1 = x1 > 0 and y2 - y1 = x2 > 0. Thus B = {(y1, y2) | 0 < y1 < y2}.
FIGURE 6.3 Regions corresponding to the transformation y1 = x1 and y2 = x1 + x2
The marginal pdf's of Y1 and Y2 are given as follows:

f1(y1) = ∫_{y1}^{∞} e^(-y2) dy2 = e^(-y1),  y1 > 0

f2(y2) = ∫_{0}^{y2} e^(-y2) dy1 = y2 e^(-y2),  y2 > 0

Note that Y2 ~ GAM(1, 2).
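As a quick numerical check (an illustration, not part of the text), the claim that Y2 = X1 + X2 ~ GAM(1, 2) can be verified against the CDF ∫_{0}^{y} t e^(-t) dt = 1 - (1 + y)e^(-y):

```python
import math
import random

def gam12_cdf(y):
    # CDF of GAM(1, 2): integral of t*exp(-t) over (0, y] = 1 - (1 + y)exp(-y)
    return 1.0 - (1.0 + y) * math.exp(-y)

def sim_cdf(y, n=200_000, seed=2):
    # Simulate Y2 = X1 + X2 for independent X1, X2 ~ EXP(1).
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rng.expovariate(1.0) + rng.expovariate(1.0) <= y)
    return hits / n

for y in (0.5, 1.0, 2.0, 4.0):
    assert abs(gam12_cdf(y) - sim_cdf(y)) < 0.01
```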
Example 6.3.13 Suppose that, instead of the transformation of the previous example, we consider a different transformation, y1 = x1 - x2 and y2 = x1 + x2. The solution is x1 = (y1 + y2)/2 and x2 = (y2 - y1)/2, so the Jacobian is

J = | 1/2   1/2 |
    | -1/2  1/2 | = 1/2

The joint pdf is given by

f_{Y1,Y2}(y1, y2) = (1/2)e^(-y2),  (y1, y2) ∈ B

where in this example B = {(y1, y2) | -y2 < y1 < y2, y2 > 0}.
FIGURE 6.4 Region corresponding to the transformation y1 = x1 - x2 and y2 = x1 + x2
The marginal pdf's of Y1 and Y2 are

f1(y1) = ∫_{-y1}^{∞} (1/2)e^(-y2) dy2 = (1/2)e^(y1),  y1 < 0

f1(y1) = ∫_{y1}^{∞} (1/2)e^(-y2) dy2 = (1/2)e^(-y1),  y1 > 0

f2(y2) = ∫_{-y2}^{y2} (1/2)e^(-y2) dy1 = y2 e^(-y2),  y2 > 0

As in the previous example, Y2 ~ GAM(1, 2), and Y1 has the double exponential distribution DE(1, 0).
It is possible to extend Theorem 6.3.6 to transformations that are not one-to-one, in a manner similar to equation (6.3.13). Specifically, if the equation y = u(x) can be solved uniquely over each set in a partition A1, A2, ..., to yield solutions such as in equation (6.3.15), and if these solutions have nonzero continuous Jacobians, then

fY(y1, ..., yk) = Σ_j fX(x_{1j}, ..., x_{kj}) |J_j|    (6.3.20)

where J_j is the Jacobian of the solution over A_j. An important application of equation (6.3.20) will be considered in Section 6.5, but first we will consider methods for dealing with sums.
6.4 SUMS OF RANDOM VARIABLES

Special methods are provided here for dealing with the important special case of sums of random variables.
CONVOLUTION FORMULA

If one is interested only in the pdf of a sum S = X1 + X2, where X1 and X2 are continuous with joint pdf f(x1, x2), then a general formula can be derived using the approach of Example 6.3.12, namely

fS(s) = ∫_{-∞}^{∞} f(t, s - t) dt    (6.4.1)
If X1 and X2 are independent, then this usually is referred to as the convolution formula,

fS(s) = ∫_{-∞}^{∞} f1(t) f2(s - t) dt    (6.4.2)

Example 6.4.1 Let X1 and X2 be independent and uniform, Xi ~ UNIF(0, 1), and let S = X1 + X2. The region B corresponding to the transformation t = x1 and s = x1 + x2 is B = {(s, t) | 0 < t < 1, t < s < t + 1}, bounded by the lines t = s and t = s - 1.

FIGURE 6.5 Regions corresponding to the transformation t = x1 and s = x1 + x2

Thus, from equation (6.4.2) we have

fS(s) = ∫_{0}^{s} dt = s,  0 < s ≤ 1

fS(s) = ∫_{s-1}^{1} dt = 2 - s,  1 < s < 2

and zero otherwise.
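The convolution integral (6.4.2) can also be evaluated numerically. The following sketch (an illustration using a simple midpoint rule, not from the text) reproduces the triangular density just derived:

```python
def f_unif(x):
    # pdf of UNIF(0, 1)
    return 1.0 if 0.0 < x < 1.0 else 0.0

def conv_pdf(s, m=4000):
    # Midpoint-rule approximation of f_S(s) = integral of f1(t) f2(s - t) dt,
    # integrating t over (-1, 3), which covers the support.
    a, b = -1.0, 3.0
    h = (b - a) / m
    total = 0.0
    for k in range(m):
        t = a + (k + 0.5) * h
        total += f_unif(t) * f_unif(s - t)
    return total * h

def triangular_pdf(s):
    # The closed form derived in Example 6.4.1.
    if 0.0 < s <= 1.0:
        return s
    if 1.0 < s < 2.0:
        return 2.0 - s
    return 0.0

for s in (0.3, 0.8, 1.2, 1.7):
    assert abs(conv_pdf(s) - triangular_pdf(s)) < 0.01
```

The same numerical routine works for any pair of marginal pdf's, which is convenient when the piecewise limits of integration are hard to track by hand.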
In some cases it may be necessary to consider points other than just the boundaries in determining the new range space B. Care must be exercised in determining the appropriate limits of integration, depending on B.
Example 6.4.2 Suppose that X1 and X2 are independent gamma variables with

f(x1, x2) = x1^(α-1) x2^(β-1) e^(-(x1+x2)) / [Γ(α)Γ(β)],  0 < x1 < ∞, 0 < x2 < ∞
Let Y1 = X1 + X2 and Y2 = X1/(X1 + X2), with inverse transformation x1 = y1y2 and x2 = y1(1 - y2). We have

J = | y2      y1  |
    | 1 - y2  -y1 | = -y1y2 - y1(1 - y2) = -y1

and

f_{Y1,Y2}(y1, y2) = [y1^(α+β-1) e^(-y1) / Γ(α+β)] · [Γ(α+β)/(Γ(α)Γ(β))] y2^(α-1) (1 - y2)^(β-1)    (6.4.3)

if (y1, y2) ∈ B = {(y1, y2) | 0 < y1 < ∞, 0 < y2 < 1}. To verify the form of B, note that lines of the form x2 = kx1, k > 0, include all points in A, and such a line maps into the line y2 = x1/(x1 + kx1) = 1/(1 + k), where y1 = x1 + kx1 = (1 + k)x1 goes between 0 and ∞ as 0 < x1 < ∞. Thus, as k goes from 0 to ∞, B is composed of all the parallel lines between y2 = 0 and y2 = 1.

It is interesting to observe from equation (6.4.3) that Y1 and Y2 are independent random variables, with Y1 ~ GAM(1, α + β) and Y2 ~ BETA(α, β).
Example 6.4.3 Assume that X1, X2, and X3 are independent gamma variables, Xi ~ GAM(1, κi); i = 1, 2, 3. The joint pdf is

f_{X1,X2,X3}(x1, x2, x3) = ∏_{i=1}^{3} [xi^(κi-1) e^(-xi) / Γ(κi)],  0 < xi < ∞

Let Yi = Xi / Σ_{j=1}^{3} Xj, i = 1, 2; and Y3 = Σ_{j=1}^{3} Xj, with inverse transformation

x1 = y1y3,  x2 = y2y3,  and  x3 = y3(1 - y1 - y2)

We have

J = | y3    0     y1           |
    | 0     y3    y2           |
    | -y3   -y3   1 - y1 - y2  | = y3²
and

f_{Y1,Y2,Y3}(y1, y2, y3) = [y3^(κ1+κ2+κ3-1) e^(-y3) / (Γ(κ1)Γ(κ2)Γ(κ3))] y1^(κ1-1) y2^(κ2-1) (1 - y1 - y2)^(κ3-1),  (y1, y2, y3) ∈ B

where B = {(y1, y2, y3) | 0 < y1, 0 < y2, y1 + y2 < 1, 0 < y3}. It also follows, by the same argument as in the previous example, that

Yk = Σ_{i=1}^{k} Xi ~ GAM(1, Σ_{i=1}^{k} κi)
Sums of independent random variables often arise in practice. A technique based on moment generating functions usually is much more convenient than using transformations for determining the distribution of sums of independent random variables; this approach will be discussed next.
MOMENT GENERATING FUNCTION METHOD

Theorem 6.4.1 If X1, ..., Xn are independent random variables with MGFs M_{Xi}(t), then the MGF of Y = Σ_{i=1}^{n} Xi is

MY(t) = M_{X1}(t) ··· M_{Xn}(t)    (6.4.4)

Proof Notice that e^(tY) = e^(t(X1+···+Xn)) = e^(tX1) ··· e^(tXn), so by property (5.2.6),

MY(t) = E(e^(tY)) = E(e^(tX1) ··· e^(tXn)) = E(e^(tX1)) ··· E(e^(tXn)) = M_{X1}(t) ··· M_{Xn}(t)
This has a special form when X1, ..., Xn represents a random sample from a population with common pdf f(x) and MGF M(t), namely

MY(t) = [M(t)]^n    (6.4.5)
As noted in Chapter 2, the MGF of a random variable uniquely determines its distribution.
It is clear that the MGF can be used as a technique for determining the distribution of a function of a random variable, and it is undoubtedly more important for this purpose than for computing moments. The MGF approach is particularly useful for determining the distribution of a sum of independent random variables, and it often will be much more convenient than trying to carry out a joint transformation. If the MGF of a variable is ascertained, then it is necessary to recognize what distribution has that MGF. The MGFs of many of the most common distributions have been included in Appendix B.

Example 6.4.4 Let X1, ..., Xk be independent binomial random variables with respective parameters ni and p, Xi ~ BIN(ni, p), and let Y = Σ_{i=1}^{k} Xi. It follows that

MY(t) = M_{X1}(t) ··· M_{Xk}(t) = (pe^t + q)^(n1) ··· (pe^t + q)^(nk) = (pe^t + q)^(n1+···+nk)

We recognize this as the binomial MGF with parameters n1 + ··· + nk and p, and thus Y ~ BIN(n1 + ··· + nk, p).
Example 6.4.5 Let X1, ..., Xn be independent Poisson-distributed random variables, Xi ~ POI(μi), and let Y = X1 + ··· + Xn. The MGF of Xi is M_{Xi}(t) = exp[μi(e^t - 1)], and consequently the MGF of Y is

MY(t) = exp[μ1(e^t - 1)] ··· exp[μn(e^t - 1)] = exp[(μ1 + ··· + μn)(e^t - 1)]

which shows that Y ~ POI(μ1 + ··· + μn).
Example 6.4.6 Suppose that X1, ..., Xn are independent gamma-distributed random variables with respective shape parameters κ1, κ2, ..., κn and common scale parameter θ, Xi ~ GAM(θ, κi) for i = 1, ..., n. The MGF of Xi is

M_{Xi}(t) = (1 - θt)^(-κi),  t < 1/θ

If Y = Σ_{i=1}^{n} Xi, then the MGF of Y is

MY(t) = (1 - θt)^(-κ1) ··· (1 - θt)^(-κn) = (1 - θt)^(-(κ1+···+κn))

and consequently, Y ~ GAM(θ, κ1 + ··· + κn). Of course, this is consistent with some earlier examples.

Example 6.4.7 Let X1, ..., Xn be independent normally distributed random variables, Xi ~ N(μi, σi²), and let Y = Σ Xi. The MGF of Xi is

M_{Xi}(t) = exp(μi t + σi² t²/2)

and thus the MGF of Y is

MY(t) = exp(μ1 t + σ1² t²/2) ··· exp(μn t + σn² t²/2) = exp[(μ1 + ··· + μn)t + (σ1² + ··· + σn²)t²/2]

which shows that Y ~ N(μ1 + ··· + μn, σ1² + ··· + σn²). This includes the special case of a random sample X1, ..., Xn from a normally distributed population, say Xi ~ N(μ, σ²). In this case, μi = μ and σi² = σ² for all i = 1, ..., n, and consequently Σ_{i=1}^{n} Xi ~ N(nμ, nσ²). It also follows readily in this case that the sample mean X̄ = Σ_{i=1}^{n} Xi/n is normally distributed, X̄ ~ N(μ, σ²/n).
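The MGF results above are easy to confirm numerically. For instance, the Poisson case of Example 6.4.5 says that a sum of independent Poissons is Poisson with the rates added; the sketch below (an illustration with arbitrarily chosen rates μ1 = 2.0 and μ2 = 3.5) checks this against the direct convolution of the two pmf's:

```python
import math

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)

def sum_pmf(k, mu1, mu2):
    # P[X1 + X2 = k] by direct convolution of the two pmf's
    return sum(poisson_pmf(j, mu1) * poisson_pmf(k - j, mu2)
               for j in range(k + 1))

mu1, mu2 = 2.0, 3.5
for k in range(15):
    assert abs(sum_pmf(k, mu1, mu2) - poisson_pmf(k, mu1 + mu2)) < 1e-12
```

Unlike the simulation checks used earlier, this identity holds exactly (up to floating-point rounding), which is what the uniqueness of the MGF guarantees.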
An important application of the transformation method that involves ordered random variables is discussed next.
6.5 ORDER STATISTICS

The concept of a random sample of size n was discussed earlier, and the joint density function of the associated n independent random variables, say X1, ..., Xn, is given by

f(x1, ..., xn) = f(x1) ··· f(xn)    (6.5.1)

For example, if a random sample of five light bulbs is tested, the observed failure times might be (in months) (x1, ..., x5) = (5, 11, 4, 100, 17). Now, the actual observations would have taken place in the order x3 = 4, x1 = 5, x2 = 11, x5 = 17, and x4 = 100. It often is useful to consider the "ordered" random sample of size n, denoted by (x_{1:n}, x_{2:n}, ..., x_{n:n}). That is, in this example x_{1:5} = x3 = 4, x_{2:5} = x1 = 5, x_{3:5} = x2 = 11, x_{4:5} = x5 = 17, and x_{5:5} = x4 = 100. Because we do not really care which bulbs happened to be labeled number 1, number 2, and
so on, one could equivalently record the ordered data as they are taken, without keeping track of the initial labeling. In some cases one may desire to stop after the r smallest ordered observations out of n have been observed, because this could result in a great saving of time. In the example, 100 months were required before all five light bulbs failed, but the first four failed within 17 months.
The joint distribution of the ordered variables is not the same as the joint density of the unordered variables. For example, the 5! = 120 different permutations of a sample of five observations would correspond to just one ordered result. This suggests the result of the following theorem. We will consider a transformation that orders the values x1, x2, ..., xn. For example,

y1 = u1(x1, x2, ..., xn) = min (x1, x2, ..., xn)
...
yn = un(x1, x2, ..., xn) = max (x1, x2, ..., xn)

and in general yi = ui(x1, x2, ..., xn) represents the ith smallest of x1, x2, ..., xn. For an example of this transformation, see the light bulb data above. Sometimes we will use the notation x_{i:n} for ui(x1, x2, ..., xn), but ordinarily we will use the simpler notation yi. Similarly, when this transformation is applied to a random sample X1, X2, ..., Xn, we will obtain a set of ordered random variables, called the order statistics and denoted by either X_{1:n}, X_{2:n}, ..., X_{n:n} or Y1, Y2, ..., Yn.
Theorem 6.5.1 If X1, X2, ..., Xn is a random sample from a population with continuous pdf f(x), then the joint pdf of the order statistics Y1, Y2, ..., Yn is

g(y1, y2, ..., yn) = n! f(y1)f(y2) ··· f(yn)    (6.5.2)

if y1 < y2 < ··· < yn, and zero otherwise.
This is an example of a transformation of continuous random variables that is not one-to-one, and it may be carried out by partitioning the domain into subsets A1, A2, ... such that the transformation is one-to-one on each subset, and then summing as suggested by equation (6.3.20). Rather than attempting a general proof of the theorem, we will illustrate it for the case n = 3. In this case, the sample space can be partitioned into the following 3! = 6 disjoint sets:

A1 = {(x1, x2, x3) | x1 < x2 < x3}
A2 = {(x1, x2, x3) | x2 < x1 < x3}
A3 = {(x1, x2, x3) | x1 < x3 < x2}
A4 = {(x1, x2, x3) | x2 < x3 < x1}
A5 = {(x1, x2, x3) | x3 < x1 < x2}
A6 = {(x1, x2, x3) | x3 < x2 < x1}
In transforming to the ordered random sample, we have the one-to-one transformations

Y1 = X1, Y2 = X2, Y3 = X3  with |J1| = 1 on A1
Y1 = X2, Y2 = X1, Y3 = X3  with |J2| = 1 on A2
Y1 = X1, Y2 = X3, Y3 = X2  with |J3| = 1 on A3

and so forth. Notice that in each case |Jj| = 1. Furthermore, for each region, the joint pdf is the product of factors f(yi) multiplied in some order, but it can be written as f(y1)f(y2)f(y3) regardless of the order. If we sum over all 3! = 6 subsets, then the joint pdf of Y1, Y2, and Y3 is

g(y1, y2, y3) = Σ_{j=1}^{6} f(y1)f(y2)f(y3) = 3! f(y1)f(y2)f(y3),  y1 < y2 < y3

and zero otherwise. The argument for a sample of size n, as given by equation (6.5.2), is similar.
Example 6.5.1 Suppose that X1, X2, and X3 represent a random sample of size 3 from a population with pdf

f(x) = 2x,  0 < x < 1

and zero otherwise. It follows that the joint pdf of the order statistics Y1, Y2, and Y3 is

g(y1, y2, y3) = 3!(2y1)(2y2)(2y3) = 48 y1 y2 y3,  0 < y1 < y2 < y3 < 1

and zero otherwise.

Quite often one may be interested in the marginal density of a single order statistic, say Yk, and this density can be obtained in the usual fashion by integrating over the other variables. In this example, let us find the marginal pdf of the smallest order statistic, Y1:

g1(y1) = ∫_{y1}^{1} ∫_{y2}^{1} 48 y1 y2 y3 dy3 dy2 = 6y1(1 - y1²)²,  0 < y1 < 1

If we want to know the probability that the smallest observation is below some value, say 0.1, it follows that

P[Y1 < 0.1] = ∫_{0}^{0.1} g1(y1) dy1 = 1 - (1 - 0.1²)³ ≈ 0.030
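The value P[Y1 < 0.1] ≈ 0.030 can be double-checked by simulation. In the sketch below (illustrative, standard library only), observations from f(x) = 2x are generated by the inverse-CDF relation X = √U with U ~ UNIF(0, 1), since F(x) = x²:

```python
import random

def p_min_below(c, n=3, trials=300_000, seed=3):
    # Simulate the smallest of n draws from f(x) = 2x on (0, 1).
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y1 = min(rng.random() ** 0.5 for _ in range(n))
        hits += y1 < c
    return hits / trials

exact = 1.0 - (1.0 - 0.1 ** 2) ** 3   # 1 - [1 - F(0.1)]^3 = 0.029701
assert abs(p_min_below(0.1) - exact) < 0.005
```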
It is possible to derive an explicit general formula for the distribution of the kth order statistic in terms of the pdf, f(x), and CDF, F(x), of the population random
variable X. If X is a continuous random variable with f(x) > 0 on a < x < b, then for the case n = 3,

g1(y1) = ∫_{y1}^{b} ∫_{y2}^{b} 3! f(y1)f(y2)f(y3) dy3 dy2
       = 3! f(y1) ∫_{y1}^{b} f(y2)[F(b) - F(y2)] dy2
       = 3! f(y1) { -[F(b) - F(y2)]²/2 } evaluated from y2 = y1 to b
       = 3 f(y1)[1 - F(y1)]²,  a < y1 < b
Similarly,

g2(y2) = ∫_{a}^{y2} ∫_{y2}^{b} 3! f(y1)f(y2)f(y3) dy3 dy1
       = 3! f(y2)[F(y2) - F(a)] ∫_{y2}^{b} f(y3) dy3
       = 3! f(y2)[1 - F(y2)]F(y2),  a < y2 < b

where F(a) = 0 and F(b) = 1.
These results may be generalized to the n-dimensional case to obtain the following theorem.

Theorem 6.5.2 Suppose that X1, ..., Xn denotes a random sample of size n from a continuous pdf f(x), where f(x) > 0 for a < x < b. Then the marginal pdf of the kth order statistic Yk is, if a < yk < b,

gk(yk) = n!/[(k - 1)!(n - k)!] [F(yk)]^(k-1) [1 - F(yk)]^(n-k) f(yk)    (6.5.3)

and zero otherwise.
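Equation (6.5.3) can be checked numerically for a particular case. The sketch below is illustrative; it uses the uniform population F(y) = y with n = 5 and k = 2 (chosen arbitrarily), integrates gk, and compares with a simulated probability:

```python
import math
import random

def gk_pdf(y, k, n, f, F):
    # Equation (6.5.3): pdf of the kth order statistic
    c = math.factorial(n) // (math.factorial(k - 1) * math.factorial(n - k))
    return c * F(y) ** (k - 1) * (1.0 - F(y)) ** (n - k) * f(y)

# P[Y_2 <= 0.5] for a sample of n = 5 from UNIF(0, 1):
f = lambda y: 1.0
F = lambda y: y
m = 2000
h = 0.5 / m
integral = sum(gk_pdf((j + 0.5) * h, 2, 5, f, F) * h for j in range(m))

rng = random.Random(5)
trials = 200_000
hits = sum(1 for _ in range(trials)
           if sorted(rng.random() for _ in range(5))[1] <= 0.5)
assert abs(integral - hits / trials) < 0.01
```

For this case the exact probability is 26/32 = 0.8125, and both the quadrature and the simulation agree with it.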
An interesting heuristic argument can be given, based on the notion that the "likelihood" of an observation is assigned by the pdf. To have Yk = yk, one must have k - 1 observations less than yk, one at yk, and n - k observations greater than yk, where P[X ≤ yk] = F(yk), P[X ≥ yk] = 1 - F(yk), and the likelihood of an observation at yk is f(yk). There are n!/[(k - 1)! 1! (n - k)!] possible orderings of the n independent observations, and gk(yk) is given by the multinomial expression (6.5.3). This is illustrated in Figure 6.6. A similar argument can be used to easily give the joint pdf of any set of order statistics. For example, consider a pair of order statistics Yi and Yj, where i < j.
FIGURE 6.6 The kth ordered observation
To have Yi = yi and Yj = yj, one must have i - 1 observations less than yi, one at yi, j - i - 1 between yi and yj, one at yj, and n - j greater than yj (see Figure 6.7). Applying the multinomial form gives the joint pdf of Yi and Yj as

g_{ij}(yi, yj) = n!/[(i - 1)!(j - i - 1)!(n - j)!] [F(yi)]^(i-1) f(yi) [F(yj) - F(yi)]^(j-i-1) [1 - F(yj)]^(n-j) f(yj)    (6.5.4)

if a < yi < yj < b, and zero otherwise.
The pdf's of the smallest and largest order statistics, Y1 and Yn, which are special cases of equation (6.5.3), are

g1(y1) = n[1 - F(y1)]^(n-1) f(y1),  a < y1 < b    (6.5.5)

and

gn(yn) = n[F(yn)]^(n-1) f(yn),  a < yn < b    (6.5.6)
For discrete and continuous random variables, the CDF of the minimum or maximum of the sample can be derived directly by following the CDF technique. For the minimum,

G1(y1) = P[Y1 ≤ y1] = 1 - P[Y1 > y1] = 1 - P[all Xi > y1] = 1 - [1 - F(y1)]^n    (6.5.7)

FIGURE 6.7 The ith and jth ordered observations
For the maximum,

Gn(yn) = P[Yn ≤ yn] = P[all Xi ≤ yn] = [F(yn)]^n    (6.5.8)
Following similar arguments, it is possible to express the CDF of the kth order statistic. In this case, Yk ≤ yk if k or more of the Xi are at most yk, where the number of Xi that are at most yk follows a binomial distribution with parameters n and p = F(yk). That is, let Aj denote the event that exactly j of the Xi's are less than or equal to yk, and let B denote the event that Yk ≤ yk; then

B = ∪_{j=k}^{n} Aj

where the Aj are disjoint and

P(Aj) = (n choose j) p^j (1 - p)^(n-j)

It follows that P(B) = Σ_{j=k}^{n} P(Aj), which gives the result stated in the following theorem.
Theorem 6.5.3 For a random sample of size n from a discrete or continuous CDF F(x), the marginal CDF of the kth order statistic is given by

Gk(yk) = Σ_{j=k}^{n} (n choose j) [F(yk)]^j [1 - F(yk)]^(n-j)    (6.5.9)
Example 6.5.2 Consider the result of two rolls of the four-sided die in Example 2.1.1. The graph of the CDF of the maximum is shown in Figure 2.3. Although this function was obtained numerically from a table of the pdf, we can obtain an analytic expression using equation (6.5.8). Specifically, let X1 and X2 represent a random sample of size 2 from the discrete uniform distribution, Xi ~ DU(4). The CDF of Xi is F(x) = [x]/4 for 1 ≤ x ≤ 4, where [x] is the greatest integer not exceeding x. If Y2 = max (X1, X2), then G2(y2) = ([y2]/4)² for 1 ≤ y2 ≤ 4, according to equation (6.5.8). The CDF of the minimum, Y1 = min (X1, X2), would be given by G1(y1) = 1 - (1 - [y1]/4)² for 1 ≤ y1 ≤ 4, according to equation (6.5.7).
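A short check of equation (6.5.9) against the die example (illustrative, not from the text): the binomial-sum CDF must reduce to (6.5.8) for the maximum and to (6.5.7) for the minimum.

```python
from math import comb

def order_stat_cdf(k, n, F):
    # Equation (6.5.9): G_k(y) = sum_{j=k}^{n} C(n, j) F^j (1 - F)^(n - j)
    return sum(comb(n, j) * F ** j * (1.0 - F) ** (n - j)
               for j in range(k, n + 1))

# Two rolls of a four-sided die: F(y) = [y]/4 at y = 1, 2, 3, 4
for y in (1, 2, 3, 4):
    F = y / 4
    # maximum (k = n = 2) agrees with (6.5.8) ...
    assert abs(order_stat_cdf(2, 2, F) - F ** 2) < 1e-12
    # ... and minimum (k = 1) agrees with (6.5.7)
    assert abs(order_stat_cdf(1, 2, F) - (1 - (1 - F) ** 2)) < 1e-12
```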
Example 6.5.3
Consider a random sample of size n from a distribution with pdf and CDF given by f(x) = 2x and F(x) = x², 0 < x < 1. From equations (6.5.5) and (6.5.6), the pdf's of the smallest and largest order statistics are

g1(y1) = n(2y1)(1 - y1²)^(n-1),  0 < y1 < 1

and

gn(yn) = n(2yn)(yn²)^(n-1) = 2n yn^(2n-1),  0 < yn < 1

The corresponding CDFs may be obtained by integration or directly from equations (6.5.7) and (6.5.8).
Example 6.5.4 Suppose that in Example 6.5.3 we are interested in the density of the range of the sample, R = Yn - Y1. From expression (6.5.4), we have

g_{1n}(y1, yn) = [n!/(n - 2)!] (2y1)[yn² - y1²]^(n-2) (2yn),  0 < y1 < yn < 1

Making the transformation R = Yn - Y1, S = Y1 yields the inverse transformation y1 = s, yn = r + s, and |J| = 1. Thus, the joint pdf of R and S is

h(r, s) = 4n(n - 1) s(r + s)[r² + 2rs]^(n-2),  0 < s < 1 - r, 0 < r < 1

The regions A and B of the transformation are shown in Figure 6.8.

FIGURE 6.8 Regions corresponding to the transformation r = yn - y1 and s = y1

The marginal density of the range then is given by

h1(r) = ∫_{0}^{1-r} h(r, s) ds

For example, for the case n = 2, we have

h1(r) = ∫_{0}^{1-r} 8s(r + s) ds = (4/3)(r + 2)(1 - r)²    (6.5.10)

for 0 < r < 1.
An interesting general expression can be obtained for the marginal CDF of R, because

H1(r) = P[R ≤ r] = ∫_{-∞}^{∞} ∫_{0}^{r} g_{1n}(s, x + s) dx ds
      = ∫_{-∞}^{∞} ∫_{0}^{r} [n!/(n - 2)!] f(s)[F(x + s) - F(s)]^(n-2) f(x + s) dx ds
      = ∫_{-∞}^{∞} n f(s)[F(r + s) - F(s)]^(n-1) ds    (6.5.11)

Note, however, that great care must be taken in applying this formula to an example where the region with f(x) > 0 has finite limits.
Example 6.5.5 Again consider Example 6.5.4. In that case F(s + r) = 1 if s > 1 - r, so equation (6.5.11) becomes

H1(r) = ∫_{0}^{1-r} n(2s)[(r + s)² - s²]^(n-1) ds + ∫_{1-r}^{1} n(2s)[1 - s²]^(n-1) ds

For the case n = 2,

H1(r) = ∫_{0}^{1-r} 4s(r² + 2rs) ds + ∫_{1-r}^{1} 4s(1 - s²) ds = (8r - 6r² + r⁴)/3

which is consistent with the pdf given by equation (6.5.10).
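The n = 2 case can also be verified by simulating the range directly; the sketch below (illustrative, standard library only) generates draws via the inverse CDF X = √U and compares the empirical CDF of |X1 - X2| with the closed form obtained by integrating (6.5.10):

```python
import random

def range_cdf_exact(r):
    # Integrating h1(r) = (4/3)(r + 2)(1 - r)^2 from equation (6.5.10)
    # over (0, r) gives H1(r) = (8r - 6r^2 + r^4)/3 for the case n = 2.
    return (8.0 * r - 6.0 * r ** 2 + r ** 4) / 3.0

def range_cdf_sim(r, trials=200_000, seed=4):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1 = rng.random() ** 0.5   # F(x) = x^2, so X = sqrt(U)
        x2 = rng.random() ** 0.5
        hits += abs(x1 - x2) <= r
    return hits / trials

assert abs(range_cdf_exact(1.0) - 1.0) < 1e-12
for r in (0.2, 0.5, 0.8):
    assert abs(range_cdf_exact(r) - range_cdf_sim(r)) < 0.01
```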
CENSORED SAMPLING

As mentioned earlier, in certain types of problems such as life-testing experiments, the ordered observations may occur naturally. In such cases a great savings in time and cost may be realized by terminating the experiment after only the first r ordered observations have occurred, rather than waiting for all n failures to occur. This usually is referred to as Type II censored sampling. In this case, the joint marginal density function of the first r order statistics may be obtained by integrating over the remaining variables. Censored sampling is applicable to many different types of problems, but for convenience the variable will be referred to as "time" in the following discussion.

Theorem 6.5.4 Type II Censored Sampling The joint marginal density function of the first r order statistics from a random sample of size n from a continuous pdf f(x) is given by

g(y1, ..., yr) = [n!/(n - r)!] [1 - F(yr)]^(n-r) ∏_{i=1}^{r} f(yi)    (6.5.12)

if y1 < ··· < yr, and zero otherwise.
In Type II censored sampling the number of observations, r, is fixed, but the length of the experiment, Yr, is a random variable. If one terminates the experiment after a fixed time t0, the procedure is referred to as Type I censored sampling. In this case the number of observations, R, is a random variable. The probability that a failure occurs before time t0 for any given trial is p = F(t0), so for a random sample of size n the random variable R follows a binomial distribution:

R ~ BIN(n, F(t0))    (6.5.13)
Type I censored sampling is related to the concepts of truncated sampling and truncated distributions. Consider a random variable X with pdf f(x) and CDF F(x). If it is given that a random variable from this distribution has a value less than t0, then the CDF of X given X ≤ t0 is referred to as the distribution of X truncated on the right at t0, and is given by

F(x | x ≤ t0) = P[X ≤ x, X ≤ t0] / P[X ≤ t0] = F(x)/F(t0),  x ≤ t0    (6.5.14)

and

f(x | x ≤ t0) = f(x)/F(t0),  x ≤ t0

Distributions truncated on the left are defined similarly.
Now, consider a random sample of size n from f(x), and suppose it is given that r observations occur before the truncation point t0; then, given R = r, the joint conditional density function of these values, say x1, ..., xr, is given by

h(x1, ..., xr | r) = ∏_{i=1}^{r} f(xi | xi ≤ t0) = [F(t0)]^(-r) ∏_{i=1}^{r} f(xi)    (6.5.15)

if all xi ≤ t0, and zero otherwise.

Equation (6.5.15) also would be the density function of a random sample of size r when the parent population density function is assumed to be the truncated density f(x)/F(t0). Thus, equation (6.5.15) may arise either when the pdf of the population sampled originally is in the form of a truncated density, or when the original population density is not truncated but the observed sample values are restricted or truncated. This restriction could result from any of several reasons, including limitations of measuring devices.

Thus, one could have a sample of size r from a truncated density or a truncated sample from a regular density. In the first case, equation (6.5.15) provides the usual density function for a random sample of size r, and in the second case it provides the conditional density function for the r observations that were given
to have values less than t0. It is interesting to note that equation (6.5.15) does not involve the original sample size n. Indeed, truncated sampling may occur in two slightly different ways. Suppose that the failure time of a unit follows the density f(x), and that the unit is guaranteed for t0 years. If a unit fails under warranty, then it is returned to a certain repair center, and the failure times of these units are recorded until r failure times are observed. The conditional density function of these r failure times then would follow equation (6.5.15), which does not depend on n, and the original number of units, n, placed in service may be known or unknown. Also note that the data would again naturally occur as ordered data, and the original labeling of the random units placed in service would be unimportant or unknown. Thus, it again would be reasonable to consider directly the joint density of the ordered observations, given by

g(y1, ..., yr | r) = r! [F(t0)]^(-r) ∏_{i=1}^{r} f(yi)    (6.5.16)

if y1 < ··· < yr < t0, and zero otherwise.

In a slightly different setup, one may place a known number of units, say n, in service and record failure times until time t0. Again, as mentioned, the conditional density function still is given by equation (6.5.15) [or by equation (6.5.16) for the ordered data], but in this case additional information is available, namely that n - r items survived longer than time t0. This information is not ignored if the unconditional joint density of Y1, ..., YR is considered, and this usually is a preferred approach when the sample size n is known. This situation usually is referred to as Type I censored sampling (on the right), rather than truncated sampling.
Theorem 6.5.5 Type I Censored Sampling If y1 < ··· < yr denote the observed values of a random sample of size n from f(x) that is Type I censored on the right at t0, then the joint pdf of Y1, ..., YR and R is given by

f_{Y1,...,YR,R}(y1, ..., yr, r) = [n!/(n - r)!] [1 - F(t0)]^(n-r) ∏_{i=1}^{r} f(yi)    (6.5.17)

if y1 < ··· < yr < t0, and

P[R = 0] = [1 - F(t0)]^n

Proof This follows by factoring the joint pdf into the product of the marginal pdf of R and the conditional pdf of Y1, ..., YR given R = r. Specifically,

f_{Y1,...,YR,R}(y1, ..., yr, r) = g(y1, ..., yr | r) b(r; n, F(t0))
= { r! [F(t0)]^(-r) ∏_{i=1}^{r} f(yi) } { n!/[r!(n - r)!] } [F(t0)]^r [1 - F(t0)]^(n-r)

which simplifies to equation (6.5.17).
Note that the forms of equations (6.5.12) and (6.5.17) are quite similar, with t0 replacing yr.

As suggested earlier, we will wish to use sample data to make statistical inferences about the probability model for a given experiment. The joint density function or "likelihood function" of the sample data is the connecting link between the observed data and the mathematical model, and indeed many statistical procedures are expressed directly in terms of the likelihood function of the data. In the case of censored data, equations (6.5.12), (6.5.16), or (6.5.17) give the likelihood function or joint density function of the available ordered data, and statistical or probabilistic results must be based on these equations. Thus it is clear that the type of data available and the methods of sampling can affect the likelihood function of the observed data.

Example 6.5.6
We will assume that failure times of airplane air conditioners follow an exponential model, EXP(θ). We will study properties of random variables in the next chapter that will help us characterize a distribution and interpret the physical meaning of parameters such as θ. However, for illustration purposes, suppose the manufacturer claims that an exponential distribution with θ = 200 provides a good model for the failure times of such air conditioners, but the mechanics feel θ = 150 provides a better model. Thirteen airplanes were placed in service, and the first 10 air conditioner failure times were as follows (Proschan, 1963):

23, 50, 50, 55, 74, 90, 97, 102, 130, 194

For Type II censored sampling, the likelihood function for the exponential distribution is given by equation (6.5.12) as

g(y1, ..., yr; θ) = [n!/(n - r)!] θ^(-r) exp{ -[Σ_{i=1}^{r} yi + (n - r)yr]/θ }

For the above data, r = 10, n = 13, and

T = Σ_{i=1}^{10} yi + (13 - 10)y10 = 1447

It would be interesting to compare the likelihoods of the observed data assuming θ = 200 and θ = 150. The ratio of the likelihoods is

g(y1, ..., y10; 200)/g(y1, ..., y10; 150) = (150/200)^10 exp[ -1447(1/200 - 1/150) ] = 0.628
Thus we see that the observed data values are more likely under the assumption θ = 150 than when θ = 200. Based on these data, it would be reasonable to infer that the exponential model with θ = 150 provides the better model. Indeed, it is possible to show that the value of θ that yields the maximum value of the likelihood is

θ̂ = T/r = 1447/10 = 144.7

Thus, if one wished to choose a value of θ based on these data, the value θ̂ = 144.7 seems reasonable.

For illustration purposes, suppose that Type I censoring had been used and that the experiment had been conducted for 200 flying hours for each plane to obtain the preceding data. The likelihood function now is given by equation (6.5.17):

f(y1, ..., yr, r; θ) = [n!/(n - r)!] θ^(-r) exp{ -[Σ_{i=1}^{r} yi + (n - r)t0]/θ }

For our example, r = 10, n = 13, and t0 = 200. It is interesting that the likelihood function is maximized in this case by the value of θ given by

θ̂ = [Σ_{i=1}^{r} yi + (n - r)t0]/r = 146.5

As a final illustration, suppose that a large fleet of planes is placed in service and a repair depot decides to record the failure times that occur before 200 hours. However, some units in service may be taken to a different depot for repair, so it is unknown how many units have not failed after 200 hours. That is, the sample size n is unknown. Given that r ordered observations have been recorded, the conditional likelihood is given by equation (6.5.16):

g(y1, ..., yr; θ, t0 | r) = r! θ^(-r) exp( -Σ_{i=1}^{r} yi/θ ) / [1 - exp(-t0/θ)]^r

where r = 10 and t0 = 200.

The value of θ that maximizes this joint pdf cannot be expressed in closed form; however, the approximate value for this case based on the given data is θ̂ ≈ 245. This value is not too close to the other values obtained, but of course the data were not actually obtained under this mode of sampling. If two different assumptions are made about the same data, then one cannot expect to always get similar results (although the Type I and Type II censoring formulas are quite similar).
CHAPTER 6 FUNCTIONS OF RANDOM VARIABLES
SUMMARY The main purpose of this chapter was to develop methods for deriving the distribution of a function of one or more random variables. The CDF technique is a general method that involves expressing the CDF of the "new" random variable in terms of the distribution of the "old" random variable (or variables). When one k-dimensional vector of random variables (new variables) is defined as a function of another k-dimensional vector of random variables (old variables) by means of a set of equations, transformation methods make it possible to express the joint pdf of the new random variables in terms of the joint pdf of the old random variables. The continuous case also involves multiplying by a function called the Jacobian of the transformation. A special transformation, called the probability integral transformation, and its inverse are useful in applications such as computer simulation of data. The transformation that orders the values in a random sample from smallest to largest can be used to define the order statistics. A set of order statistics in which a specified subset is not observed is termed a censored sample. This concept is useful in applications such as life-testing of manufactured components, where it is not feasible to wait for all components to fail before analyzing the data.
EXERCISES

1. Let X be a random variable with pdf f(x) = 4x^3 if 0 < x < 1 and zero otherwise. Use the cumulative (CDF) technique to determine the pdf of each of the following random variables:
(a) Y = X^4. (b) W = e^X. (c) Z = ln X. (d) U = (X − 0.5)^2.

2. Let X be a random variable that is uniformly distributed, X ~ UNIF(0, 1). Use the CDF technique to determine the pdf of each of the following:
(a) Y = X^(1/4). (b) W = e^(−X). (c) Z = 1 − e^(−X). (d) U = X(1 − X).

3. The measured radius of a circle, R, has pdf f(r) = 6r(1 − r); 0 < r < 1, and zero otherwise.
4. If X is Weibull distributed, X ~ WEI(θ, β), find both the CDF and the pdf of each of the following:
(a) Y = (X/θ)^β. (b) W = ln X. (c) Z = (ln X)^2.
5. Prove Theorem 6.3.4, assuming that the CDF F(x) is a one-to-one function.
6. Let X have the pdf given in Exercise 1. Find the transformation y = u(x) such that Y = u(X) ~ UNIF(0, 1).

7. Let X ~ UNIF(0, 1). Find transformations y = G_1(u) and w = G_2(u) such that:
(a) Y = G_1(U) ~ EXP(1). (b) W = G_2(U) ~ BIN(3, 1/2).
8. Rework Exercise 1 using transformation methods.

9. Rework Exercise 2 using transformation methods.

10. Suppose X has pdf f(x) = (1/2) exp(−|x|) for all −∞ < x < ∞.
(a) Find the pdf of Y = |X|.
(b) Let W = 0 if X ≤ 0 and W = 1 if X > 0. Find the CDF of W.

11. If X ~ BIN(n, p), then find the pdf of Y = n − X.

12. If X ~ NB(r, p), then find the pdf of Y = X − r.
13. Let X have pdf f(x) = x^2/24; −2 < x < 4, and zero otherwise. Find the pdf of Y = X^2.
14. Let X and Y have joint pdf f(x, y) = 4e^{−2(x+y)}; 0 < x < ∞, 0 < y < ∞, and zero otherwise.
(a) Find the CDF of W = X + Y.
(b) Find the joint pdf of U = X/Y and V = X.
(c) Find the marginal pdf of U.

15. If X_1 and X_2 denote a random sample of size 2 from a Poisson distribution, X_i ~ POI(μ), find the pdf of Y = X_1 + X_2.
16. Let X_1 and X_2 denote a random sample of size 2 from a distribution with pdf f(x) = 1/x^2; 1 ≤ x < ∞, and zero otherwise.
(a) Find the joint pdf of U = X_1 X_2 and V = X_1.
(b) Find the marginal pdf of U.
17. Suppose that X_1 and X_2 denote a random sample of size 2 from a gamma distribution, X_i ~ GAM(2, 1/2).
(a) Find the pdf of Y = √(X_1 + X_2).
(b) Find the pdf of W = X_1/X_2.

18. Let X and Y have joint pdf f(x, y) = e^{−x−y}; 0 < x < ∞, 0 < y < ∞, and zero otherwise.
(a) Find the joint pdf of S = X + Y and T = X.
(b) Find the marginal pdf of T.
(c) Find the marginal pdf of S.

19. Suppose that X_1, X_2, ..., X_k are independent random variables and let Y_i = u_i(X_i) for i = 1, 2, ..., k. Show that Y_1, Y_2, ..., Y_k are independent. Consider only the case where X_i is continuous and y_i = u_i(x_i) is one-to-one. Hint: If x_i = w_i(y_i) is the inverse transformation, then the Jacobian has the form

J = Π_{i=1}^{k} (d/dy_i) w_i(y_i)
20. Prove Theorem 5.4.5 in the case of discrete random variables X and Y. Hint: Use the transformation s = x and t = g(x)y.
21. Suppose X and Y are continuous random variables with joint pdf f(x, y) = 2(x + y) if 0 < x < y < 1, and zero otherwise.
22. As in Exercise 2 of Chapter 5 (page 189), assume the weight (in ounces) of a major league baseball is a random variable, and recall that a carton contains 144 baseballs. Assume now that the weights of individual baseballs are independent and normally distributed with mean μ = 5 and standard deviation σ = 2/5, and let T represent the total weight of all baseballs in the carton. Find the probability that the total weight of baseballs in a carton is at most 725 ounces.
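The computation requested in Exercise 22 can be sketched numerically (the helper `norm_cdf` built on the error function is ours, not part of the text):

```python
import math

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# T = sum of 144 independent N(5, (2/5)^2) weights, so
# T ~ N(144 * 5, 144 * (2/5)^2) = N(720, 23.04), sd = 4.8
mu, sd = 144 * 5, math.sqrt(144) * (2 / 5)
p = norm_cdf((725 - mu) / sd)
print(round(p, 4))
```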
23. Suppose that X_1, X_2, ..., X_n are independent random variables, and let Y = X_1 + X_2 + ··· + X_n. If X_i ~ GEO(p), then find the MGF of Y. What is the distribution of Y?

24. Let X_1, X_2, ..., X_10 be a random sample of size n = 10 from an exponential distribution with mean 2, X_i ~ EXP(2).
(a) Find the MGF of the sum Y = Σ_{i=1}^{10} X_i.
(b) What is the pdf of Y?
25. Let X_1, X_2, X_3, and X_4 be independent random variables. Assume that X_2, X_3, and X_4 each are Poisson distributed with mean 5, and suppose that Y = X_1 + X_2 + X_3 + X_4 ~ POI(25).
(a) What is the distribution of X_1?
(b) What is the distribution of W = X_1 + X_2?
26. Let X_1 and X_2 be independent negative binomial random variables, X_1 ~ NB(r_1, p) and X_2 ~ NB(r_2, p).
(a) Find the MGF of Y = X_1 + X_2.
(b) What is the distribution of Y?
27. Recall that Y ~ LOGN(μ, σ^2) if ln Y ~ N(μ, σ^2). Assume that Y_i ~ LOGN(μ_i, σ_i^2), i = 1, ..., n, are independent. Find the distribution of:
(a) Π_{i=1}^{n} Y_i.
(b) Π_{i=1}^{n} Y_i^{a_i}.
(c) Y_1/Y_2.
(d) Find E[Π_{i=1}^{n} Y_i].
28. Let X_1 and X_2 be a random sample of size n = 2 from a continuous distribution with pdf of the form f(x) = 2x if 0 < x < 1, and zero otherwise.
29. Consider a random sample of size n from a distribution with pdf f(x) = 1/x^2 if 1 ≤ x < ∞, and zero otherwise.
(a) Give the joint pdf of the order statistics.
(b) Give the pdf of the smallest order statistic, Y_1.
(c) Give the pdf of the largest order statistic, Y_n.
(d) Derive the pdf of the sample range, R = Y_n − Y_1, for n = 2.
(e) Give the pdf of the sample median, Y_r, assuming that n is odd so that r = (n + 1)/2.
30. Consider a random sample of size n = 5 from a Pareto distribution, X_i ~ PAR(1, 2).
(a) Give the joint pdf of the second and fourth order statistics, Y_2 and Y_4.
(b) Give the joint pdf of the first three order statistics, Y_1, Y_2, and Y_3.
(c) Give the CDF of the sample median, Y_3.
31. Consider a random sample of size n from an exponential distribution, X_i ~ EXP(1). Give the pdf of each of the following:
(a) The smallest order statistic, Y_1.
(b) The largest order statistic, Y_n.
(c) The sample range, R = Y_n − Y_1.
(d) The first r order statistics, Y_1, ..., Y_r.
32. A system is composed of five independent components connected in series.
(a) If the pdf of the time to failure of each component is exponential, X_i ~ EXP(1), then give the pdf of the time to failure of the system.
(b) Repeat (a), but assume that the components are connected in parallel.
(c) Suppose that the five-component system fails when at least three components fail. Give the pdf of the time to failure of the system.
(d) Suppose that n independent components are not distributed identically, but rather X_i ~ EXP(θ_i). Give the pdf of the time to failure of a series system in this case.
33. Consider a random sample of size n from a geometric distribution, X_i ~ GEO(p). Give the CDF of each of the following:
(a) The minimum, Y_1.
(b) The kth smallest, Y_k.
(c) The maximum, Y_n.
(d) Find P[Y_1 = 1].
34. Suppose X_1 and X_2 are continuous random variables with joint pdf f(x_1, x_2). Prove Theorem 5.2.1 assuming the transformation y_1 = u(x_1, x_2), y_2 = x_2 is one-to-one. Hint: First derive the marginal pdf of Y_1 = u(X_1, X_2) and show that

E(Y_1) = ∫ y_1 f_{Y_1}(y_1) dy_1 = ∫∫ u(x_1, x_2) f(x_1, x_2) dx_1 dx_2

Use a similar proof in the case of discrete random variables. Notice that proofs for the cases of k variables and transformations that are not one-to-one are similar but more complicated.
35. Suppose X_1, X_2 are independent exponentially distributed random variables, X_i ~ EXP(θ), and let Y = X_1 − X_2.
(a) Find the MGF of Y.
(b) What is the distribution of Y?
36. Show that if X_1, ..., X_k are independent random variables with FMGFs G_1(t), ..., G_k(t), and Y = X_1 + ··· + X_k, then the FMGF of Y is G_Y(t) = G_1(t) ··· G_k(t).
CHAPTER 7

LIMITING DISTRIBUTIONS

7.1 INTRODUCTION
In Chapter 6, general methods were discussed for deriving the distribution of a function of n random variables, say Y = u(X_1, ..., X_n). In some cases the pdf of Y is obtained easily, but there are many important cases where the derivation is not tractable. In many of these, it is possible to obtain useful approximate results that apply when n is large. These results are based on the notions of convergence in distribution and limiting distribution.
7.2 SEQUENCES OF RANDOM VARIABLES

Consider a sequence of random variables Y_1, Y_2, ... with a corresponding sequence of CDFs G_1(y), G_2(y), ..., so that for each n = 1, 2, ...,

G_n(y) = P[Y_n ≤ y]   (7.2.1)
Definition 7.2.1
If Y_n ~ G_n(y) for each n = 1, 2, ..., and if for some CDF G(y),

lim_{n→∞} G_n(y) = G(y)   (7.2.2)

for all values y at which G(y) is continuous, then the sequence Y_1, Y_2, ... is said to converge in distribution to Y ~ G(y), denoted by Y_n →d Y. The distribution corresponding to the CDF G(y) is called the limiting distribution of Y_n.
Example 7.2.1
Let X_1, ..., X_n be a random sample from a uniform distribution, X_i ~ UNIF(0, 1), and let Y_n = X_{n:n}, the largest order statistic. From the results of Chapter 6, it follows that the CDF of Y_n is

G_n(y) = y^n;  0 < y < 1   (7.2.3)

zero if y ≤ 0, and one if y ≥ 1.

FIGURE 7.1  Comparison of CDFs G_n(y) with limiting degenerate CDF G(y)

Of course, when 0 < y < 1, y^n approaches zero as n approaches ∞, and when y ≤ 0 or y ≥ 1, G_n(y) equals its
respective limits 0 or 1. Thus, lim_{n→∞} G_n(y) = G(y), where

G(y) = 0,  y < 1;  G(y) = 1,  y ≥ 1   (7.2.4)
This situation is illustrated in Figure 7.1, which shows G(y) and G_n(y) for n = 2, 5, and 10.
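The pointwise convergence of G_n(y) = y^n can be checked directly; a minimal sketch:

```python
# pointwise convergence of G_n(y) = y**n on 0 < y < 1 toward the
# degenerate limiting CDF: each column of values heads to 0 as n grows,
# while G_n(y) = 1 for every n whenever y >= 1
for y in (0.5, 0.9, 0.99):
    print(y, [round(y ** n, 4) for n in (2, 5, 10, 100)])
```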
The function defined by equation (7.2.4) is the CDF of a random variable that is concentrated at one value, y = 1. Such distributions occur often as limiting distributions.
Definition 7.2.2
The function G(y) is the CDF of a degenerate distribution at the value y = c if

G(y) = 0,  y < c;  G(y) = 1,  y ≥ c   (7.2.5)

In other words, G(y) is the CDF of a discrete distribution that assigns probability one at the value y = c and zero otherwise.
Example 7.2.2
Let X_1, X_2, ..., X_n be a random sample from an exponential distribution, X_i ~ EXP(θ), and let Y_n = X_{1:n} be the smallest order statistic. It follows that the CDF of Y_n is

G_n(y) = 1 − e^{−ny/θ};  y > 0   (7.2.6)

and zero otherwise. We have lim_{n→∞} G_n(y) = 1 if y > 0, because e^{−y/θ} < 1 in this case. Thus, the limit is zero if y ≤ 0 and one if y > 0, which corresponds to a degenerate distribution at the value y = 0. Notice that the limit at y = 0 is zero, which means that the limiting function is not only discontinuous at y = 0, but also not even continuous from the right at y = 0, which is a requirement of a CDF. This is not a problem, because Definition 7.2.1 requires only that the limiting function agree with a CDF at its points of continuity.
Definition 7.2.3
A sequence of random variables, Y_1, Y_2, ..., is said to converge stochastically to a constant c if it has a limiting distribution that is degenerate at y = c.
An alternative formulation of stochastic convergence will be considered in Section 7.6, and a more general concept called convergence in probability will be discussed in Section 7.7. Not all limiting distributions are degenerate, as seen in the next example. The following limits are useful in many problems:

lim_{n→∞} (1 + c/n)^{nb} = e^{bc}   (7.2.7)

lim_{n→∞} [1 + c/n + d(n)/n]^{nb} = e^{bc}  if lim_{n→∞} d(n) = 0   (7.2.8)

These are obtained easily from expansions involving the natural logarithm. For example, limit (7.2.7) follows from the expansion nb ln(1 + c/n) = nb(c/n + ···) = cb + ···, where the rest of the terms approach zero as n → ∞.

Example 7.2.3
Suppose that X_1, ..., X_n is a random sample from a Pareto distribution, X_i ~ PAR(1, 1), and let Y_n = nX_{1:n}. The CDF of X_i is F(x) = 1 − (1 + x)^{−1}; x > 0, so the CDF of Y_n is

G_n(y) = 1 − (1 + y/n)^{−n};  y > 0   (7.2.9)

Using limit (7.2.7), we obtain the limit G(y) = 1 − e^{−y}; y > 0, and zero otherwise, which is the CDF of an exponential distribution, EXP(1). This is illustrated in Figure 7.2, which shows the graphs of G(y) and G_n(y) for n = 1, 2, and 5.
FIGURE 7.2  Comparison of CDFs G_n(y) (n = 1, 2, 5) with limiting CDF G(y) = 1 − e^{−y}
The following example shows that a sequence of random variables need not have a limiting distribution.
Example 7.2.4
For the random sample of the previous example, let us consider the largest order statistic, Y_n = X_{n:n}. The CDF of Y_n is

G_n(y) = [y/(1 + y)]^n;  y > 0   (7.2.10)

and zero otherwise. Because y/(1 + y) < 1, we have lim_{n→∞} G_n(y) = G(y) = 0 for all y, which is not a CDF because it does not approach one as y → ∞.
Example 7.2.5
In the previous example, suppose instead that we consider a rescaled variable, Y_n = (1/n)X_{n:n}, which has CDF

G_n(y) = [1 + 1/(ny)]^{−n};  y > 0   (7.2.11)

and zero otherwise. Using limit (7.2.7), we obtain the CDF G(y) = e^{−1/y}; y > 0.
Example 7.2.6
For the random sample of Example 7.2.2, consider the modified sequence Y_n = (1/θ)X_{n:n} − ln n. The CDF is

G_n(y) = [1 − (1/n)e^{−y}]^n;  y > −ln n   (7.2.12)

and zero otherwise. Following from limit (7.2.7), the limiting CDF is

G(y) = exp(−e^{−y});  −∞ < y < ∞

We now illustrate the accuracy when this limiting CDF is used as an approximation to G_n(y) for large n. Suppose that the lifetime in months of a certain type of component is a random variable X ~ EXP(1), and suppose that 10 independent components are connected in a parallel system. The time to failure of the system is T = X_{10:10}, and the CDF is F_T(t) = (1 − e^{−t})^{10}; t > 0. This CDF is evaluated at t = 1, 2, 5, and 7 months in the table below. To approximate these probabilities with the limiting distribution,

F_T(t) = P[T ≤ t] = P[Y_10 + ln 10 ≤ t] = P[Y_10 ≤ t − ln 10] ≈ G(t − ln 10) = exp(−e^{−(t − ln 10)}) = exp(−10e^{−t})
The approximate probabilities are given in the table for comparison.

t:              1      2      5      7
F_T(t):         0.010  0.234  0.935  0.9909
G(t − ln 10):   0.025  0.258  0.935  0.9909

The approximation should improve as n increases.
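The table entries can be reproduced; a short sketch of the exact and approximate CDFs:

```python
import math

# exact parallel-system CDF versus the extreme-value approximation
rows = []
for t in (1, 2, 5, 7):
    exact = (1 - math.exp(-t)) ** 10       # F_T(t) = (1 - e**-t)**10
    approx = math.exp(-10 * math.exp(-t))  # G(t - ln 10) = exp(-10 e**-t)
    rows.append((t, round(exact, 3), round(approx, 3)))
    print(rows[-1])
```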
Example 7.2.7
Consider the sample mean of a random sample from a normal distribution, X_i ~ N(μ, σ^2), and let Y_n = X̄_n. From the results of the previous chapter, X̄_n ~ N(μ, σ^2/n), and

G_n(y) = Φ[√n(y − μ)/σ]   (7.2.13)

The limiting CDF is degenerate at y = μ, because lim_{n→∞} G_n(y) = 0 if y < μ and 1 if y > μ.

Certain limiting distributions are easier to derive by using moment generating functions.
7.3 THE CENTRAL LIMIT THEOREM

In the previous examples, the exact CDF was known for each finite n, and the limiting distribution was obtained directly from this sequence. One advantage of limiting distributions is that it often may be possible to determine the limiting distribution without knowing the exact form of the CDF for finite n. The limiting distribution then may provide a useful approximation when the exact probabilities are not available. One method of accomplishing this result is to make use of MGFs. The following theorem is stated without proof.

Theorem 7.3.1
Let Y_1, Y_2, ... be a sequence of random variables with respective CDFs G_1(y), G_2(y), ... and MGFs M_1(t), M_2(t), .... If M(t) is the MGF of a CDF G(y), and if lim_{n→∞} M_n(t) = M(t) for all t in an open interval containing zero, −h < t < h, then lim_{n→∞} G_n(y) = G(y) for all continuity points of G(y).
Example 7.3.1
Let X_1, ..., X_n be a random sample from a Bernoulli distribution, X_i ~ BIN(1, p), and consider Y_n = Σ_{i=1}^{n} X_i. If we let p → 0 as n → ∞ in such a way that np = μ, for fixed μ > 0, then

M_n(t) = (pe^t + q)^n = [1 + μ(e^t − 1)/n]^n   (7.3.1)

and from limit (7.2.7) we have

lim_{n→∞} M_n(t) = e^{μ(e^t − 1)}   (7.3.2)

which is the MGF of the Poisson distribution with mean μ. This is consistent with the result of Theorem 3.2.3 and is somewhat easier to verify. We conclude that Y_n →d Y ~ POI(μ).
Example 7.3.2  Bernoulli Law of Large Numbers
Suppose now that we keep p fixed and consider the sequence of sample proportions, p̂_n = Y_n/n. By using the series expansion e^u = 1 + u + u^2/2 + ··· with u = t/n, we obtain

M_n(t) = (pe^{t/n} + q)^n = [1 + pt/n + d(n)/n]^n   (7.3.3)

where d(n)/n involves the disregarded terms of the series expansion, and d(n) → 0 as n → ∞. From limit (7.2.8) we have

lim_{n→∞} M_n(t) = e^{pt}   (7.3.4)

which is the MGF of a degenerate distribution at y = p, and thus p̂_n converges stochastically to p as n approaches infinity.
Note that this example provides an approach to answering the question that was raised in Chapter 1 about statistical regularity. If, in a sequence of M independent trials of an experiment, Y_M represents the number of occurrences of an event A, then f_A = Y_M/M is the relative frequency of occurrence of A. Because the
Bernoulli parameter has the value p = P(A) in this case, it follows that f_A converges stochastically to P(A) as M → ∞. For example, if a coin is tossed repeatedly, and A = {H}, then the successive relative frequencies of A correspond to a sequence of random variables that will converge stochastically to p = 1/2 for an unbiased coin. Even though different sequences of tosses generally produce different observed numerical sequences of f_A, in the long run they all tend to stabilize near 1/2.
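This stabilization of relative frequencies can be illustrated by simulation; a sketch with a seeded generator (the seed and sample sizes are arbitrary choices):

```python
import random

# simulated coin tossing: the relative frequency of heads settles near 1/2
rng = random.Random(1)
tosses = [rng.random() < 0.5 for _ in range(100_000)]
freqs = {n: sum(tosses[:n]) / n for n in (10, 100, 1000, 10_000, 100_000)}
for n, f in freqs.items():
    print(n, f)
```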
Example 7.3.3
Now we consider the sequence of "standardized" variables:

Z_n = (Y_n − np)/√(npq)   (7.3.5)

With the simplified notation σ_n = √(npq), we have Z_n = Y_n/σ_n − np/σ_n. Using the series expansion of the previous example,

M_{Z_n}(t) = e^{−npt/σ_n}(pe^{t/σ_n} + q)^n = [e^{−pt/σ_n}(pe^{t/σ_n} + q)]^n = [1 + t^2/(2n) + d(n)/n]^n   (7.3.6)

where d(n) → 0 as n → ∞. Thus,

lim_{n→∞} M_{Z_n}(t) = e^{t^2/2}   (7.3.7)

which is the MGF of the standard normal distribution, and so Z_n →d Z ~ N(0, 1).
This is an example of a special limiting result known as the Central Limit Theorem.
Theorem 7.3.2  Central Limit Theorem (CLT)
If X_1, ..., X_n is a random sample from a distribution with mean μ and variance σ^2 < ∞, then the limiting distribution of

Z_n = (Σ_{i=1}^{n} X_i − nμ)/(√n σ)   (7.3.8)

is the standard normal, Z_n →d Z ~ N(0, 1) as n → ∞.
Proof
This limiting result holds for random samples from any distribution with finite mean and variance, but the proof will be outlined under the stronger assumption that the MGF of the distribution exists. The proof can be modified for the more general case by using a more general concept called a characteristic function, which we will not consider here.

Let m(t) denote the MGF of X − μ, m(t) = M_{X−μ}(t), and note that m(0) = 1, m′(0) = E(X − μ) = 0, and m″(0) = E(X − μ)^2 = σ^2. Expanding m(t) by the Taylor series formula about 0 gives, for some ξ between 0 and t,

m(t) = m(0) + m′(0)t + m″(ξ)t^2/2 = 1 + σ^2 t^2/2 + [m″(ξ) − σ^2]t^2/2   (7.3.9)

by adding and subtracting σ^2 t^2/2. Now we may write Z_n = Σ_{i=1}^{n}(X_i − μ)/(√n σ), and

M_{Z_n}(t) = [m(t/(√n σ))]^n = [1 + t^2/(2n) + (m″(ξ) − σ^2)t^2/(2nσ^2)]^n,  |ξ| < |t|/(√n σ)

As n → ∞, t/(√n σ) → 0, so ξ → 0 and m″(ξ) − σ^2 → 0, and thus

M_{Z_n}(t) = [1 + t^2/(2n) + d(n)/n]^n   (7.3.10)

where d(n) → 0 as n → ∞. It follows that

lim_{n→∞} M_{Z_n}(t) = e^{t^2/2}   (7.3.11)
or

lim_{n→∞} F_{Z_n}(z) = Φ(z)   (7.3.12)

which means that Z_n →d Z ~ N(0, 1).
Note that the variable in limit (7.3.8) also can be related to the sample mean,

Z_n = √n(X̄_n − μ)/σ   (7.3.13)

The major application of the CLT is to provide an approximate distribution in cases where the exact distribution is unknown or intractable.
Example 7.3.4
Let X_1, ..., X_n be a random sample from a uniform distribution, X_i ~ UNIF(0, 1), and let Y_n = Σ_{i=1}^{n} X_i. Because E(X_i) = 1/2 and Var(X_i) = 1/12, we have the approximation

Y_n ≈ N(n/2, n/12)

For example, if n = 12, then approximately Y_12 − 6 ~ N(0, 1). This approximation is so close that it often is used to simulate standard normal random numbers in computer applications. Of course, this requires 12 uniform random numbers to be generated to obtain one normal random number.
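The 12-uniform construction can be checked by simulation; a sketch (the seed and replication count are arbitrary choices):

```python
import random

rng = random.Random(0)

def approx_normal():
    # sum of 12 UNIF(0,1) draws, centered: mean 12*(1/2) - 6 = 0,
    # variance 12*(1/12) = 1
    return sum(rng.random() for _ in range(12)) - 6

zs = [approx_normal() for _ in range(20_000)]
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / len(zs)
print(round(mean, 2), round(var, 2))  # near 0 and near 1
```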
7.4 APPROXIMATIONS FOR THE BINOMIAL DISTRIBUTION

Examples 7.3.1 through 7.3.3 demonstrated that various limiting distributions apply, depending on how the sequence of binomial variables is standardized and also on assumptions about the behavior of p as n → ∞. Example 7.3.1 suggests that for a binomial variable Y_n ~ BIN(n, p), if n is large and p is small, then approximately Y_n ≈ POI(np). This was discussed in a different context, and an illustration was given in Example 3.2.9 of Chapter 3.

Example 7.3.3 considered a fixed value of p, and a suitably standardized sequence was found to have a standard normal limiting distribution, suggesting a normal approximation. In particular, it suggests that for large n and fixed p, approximately Y_n ≈ N(np, npq). This approximation works best when p is close to 0.5, because the binomial distribution is symmetric when p = 0.5. The accuracy
required in any approximation depends on the application. One guideline is to use the normal approximation when np ≥ 5 and nq ≥ 5, but again this would depend on the accuracy required.

Example 7.4.1
The probability that a basketball player hits a shot is p = 0.5. If he takes 20 shots, what is the probability that he hits at least nine? The exact probability is

P[Y_20 ≥ 9] = 1 − P[Y_20 ≤ 8] = 1 − Σ_{y=0}^{8} C(20, y)(0.5)^y(0.5)^{20−y} = 0.7483

A normal approximation is

P[Y_20 ≥ 9] = 1 − P[Y_20 ≤ 8] ≈ 1 − Φ[(8 − 10)/√5] = 1 − Φ(−0.89) = 0.8133

Because the binomial distribution is discrete and the normal distribution is continuous, the approximation can be improved by making a continuity correction. In particular, each binomial probability b(y; n, p) has the same value as the area of a rectangle of height b(y; n, p) and with the interval [y − 0.5, y + 0.5] as its base, because the length of the base is one unit. The area of this rectangle can be approximated by the area under the pdf of Y ~ N(np, npq), which corresponds to fitting a normal distribution with the same mean and variance as BIN(n, p). This is illustrated for the case of n = 20, p = 0.5, and y = 7 in Figure 7.3, where the exact probability is b(7; 20, 0.5) = C(20, 7)(0.5)^7(0.5)^13 = 0.0739. The approximation, which is the shaded area in the figure, is
FIGURE 7.3  Continuity correction for the normal approximation of a binomial probability; the rectangle over [6.5, 7.5] has area b(7; 20, 0.5) = 0.0739
Φ[(7.5 − 10)/√5] − Φ[(6.5 − 10)/√5] = Φ(−1.12) − Φ(−1.57) = 0.0732

The same idea can be used with other binomial probabilities, such as

P[Y_20 ≥ 9] = 1 − P[Y_20 ≤ 8] ≈ 1 − Φ[(8.5 − 10)/√5] = 1 − Φ(−0.67) = 0.7486

which is much closer to the exact value than without the continuity correction. The situation is shown in Figure 7.4. In general, if Y ~ BIN(n, p) and a ≤ b are integers, then

P[a ≤ Y ≤ b] ≈ Φ[(b + 0.5 − np)/√(npq)] − Φ[(a − 0.5 − np)/√(npq)]   (7.4.1)

Continuity corrections also are useful with other discrete distributions that can be approximated by the normal distribution.
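Formula (7.4.1) and the example's numbers can be verified; a sketch comparing the exact tail probability with the normal approximations, with and without the continuity correction (the helper `norm_cdf` is ours):

```python
import math

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 20, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))

# exact tail P[Y >= 9] for Y ~ BIN(20, 0.5)
exact = sum(math.comb(n, y) * p**y * (1 - p)**(n - y) for y in range(9, n + 1))

plain = 1 - norm_cdf((8 - mu) / sd)        # no continuity correction
corrected = 1 - norm_cdf((8.5 - mu) / sd)  # with continuity correction
print(round(exact, 4), round(plain, 4), round(corrected, 4))
```

The corrected value lands much closer to the exact tail probability than the uncorrected one, matching the discussion above.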
FIGURE 7.4
The normal approximation for a binomial distribution
Example 7.4.2
Suppose that Y_n ~ POI(n), where n is a positive integer. From the results of Chapter 6, we know that Y_n has the same distribution as a sum Σ_{i=1}^{n} X_i, where X_1, ..., X_n are independent, X_i ~ POI(1). According to the CLT, Z_n = (Y_n − n)/√n →d Z ~ N(0, 1), which suggests the approximation Y_n ≈ N(n, n) for large n. For example, with n = 20, we desire to find P[10 ≤ Y_20 ≤ 30]. The exact value is

Σ_{y=10}^{30} e^{−20}(20)^y/y! = 0.982

and the approximate value is

Φ[(30.5 − 20)/√20] − Φ[(9.5 − 20)/√20] = Φ(2.35) − Φ(−2.35) = 0.981

which is quite close to the exact value.
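The Poisson example can be verified directly; a short sketch:

```python
import math

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

lam = 20
# exact P[10 <= Y <= 30] for Y ~ POI(20)
exact = sum(math.exp(-lam) * lam**y / math.factorial(y) for y in range(10, 31))
# normal approximation with continuity correction
approx = norm_cdf((30.5 - lam) / math.sqrt(lam)) - norm_cdf((9.5 - lam) / math.sqrt(lam))
print(round(exact, 3), round(approx, 3))
```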
7.5 ASYMPTOTIC NORMAL DISTRIBUTIONS

From the CLT it follows that when the sample mean is standardized according to equation (7.3.13), the corresponding sequence Z_n →d Z ~ N(0, 1). It would not be unreasonable to consider the distribution of the sample mean X̄_n as approximately N(μ, σ^2/n) for large n. This is an example of a more general notion.
Definition 7.5.1
If Y_1, Y_2, ... is a sequence of random variables and m and c are constants such that

Z_n = √n(Y_n − m)/c →d Z ~ N(0, 1)   (7.5.1)

as n → ∞, then Y_n is said to have an asymptotic normal distribution with asymptotic mean m and asymptotic variance c^2/n.
Example 7.5.1
Consider the random sample of Example 4.6.3, which involved n = 40 lifetimes of electrical parts, X_i ~ EXP(100). By the CLT, X̄_n has an asymptotic normal distribution with mean m = 100 and variance c^2/n = (100)^2/40 = 250.
ASYMPTOTIC DISTRIBUTION OF CENTRAL ORDER STATISTICS

In Section 7.2 we showed several examples that involved extreme order statistics, such as the largest and smallest, with limiting distributions that were not normal. Under certain conditions, it is possible to show that "central" order statistics are asymptotically normal.

Theorem 7.5.1
Let X_1, ..., X_n be a random sample from a continuous distribution with a pdf f(x) that is continuous and nonzero at the pth percentile, x_p, for 0 < p < 1. If k/n → p, then the sequence of kth order statistics, Y_k = X_{k:n}, is asymptotically normal with asymptotic mean x_p and asymptotic variance c^2/n, where

c^2 = p(1 − p)/[f(x_p)]^2   (7.5.2)

Example 7.5.2
Let X_1, ..., X_n be a random sample from an exponential distribution, X_i ~ EXP(1), so that f(x) = e^{−x} and F(x) = 1 − e^{−x}; x > 0. For odd n, let k = (n + 1)/2, so that Y_k = X_{k:n} is the sample median. If p = 0.5, then the median is x_{0.5} = −ln(0.5) = ln 2 and

c^2 = 0.5(1 − 0.5)/[f(ln 2)]^2 = 0.25/(0.5)^2 = 1

Thus, X_{k:n} is asymptotically normal with asymptotic mean x_{0.5} = ln 2 and asymptotic variance c^2/n = 1/n.
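The asymptotic N(ln 2, 1/n) behavior of the sample median can be checked by simulation; a sketch (the choices of n, replication count, and seed are arbitrary):

```python
import math
import random

# simulate the sample median of n EXP(1) observations; by Theorem 7.5.1
# it is approximately N(ln 2, 1/n) for large odd n
rng = random.Random(7)
n, reps = 401, 2000
medians = []
for _ in range(reps):
    xs = sorted(rng.expovariate(1.0) for _ in range(n))
    medians.append(xs[n // 2])  # the (n + 1)/2-th order statistic

mean = sum(medians) / reps
var = sum((m - mean) ** 2 for m in medians) / reps
print(round(mean, 3), round(n * var, 2))  # mean near ln 2, n*var near 1
```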
Example 7.5.3
Suppose that X_1, ..., X_n is a random sample from a uniform distribution, X_i ~ UNIF(0, 1), so that f(x) = 1 and F(x) = x; 0 < x < 1. Also assume that n is odd and k = (n + 1)/2, so that Y_k = X_{k:n} is the middle order statistic or sample median. Formula (6.5.3) gives the pdf of Y_k, which has a special form because k − 1 = n − k = (n − 1)/2 in this example. The pdf is

g_k(y) = n!/{[(n − 1)/2]!}^2 [y(1 − y)]^{(n−1)/2};  0 < y < 1   (7.5.3)

According to the theorem, with p = 0.5, the pth percentile is x_{0.5} = 0.5 and c^2 = 0.5(1 − 0.5)/[1]^2 = 0.25, so that Z_n = √n(Y_k − 0.5)/0.5 →d Z ~ N(0, 1). Actually, this is strongly suggested by the pdf (7.5.3) after the transformation z = √n(y − 0.5)/0.5, which has inverse transformation y = 0.5 + 0.5z/√n and Jacobian J = 0.5/√n. The resulting pdf is

f_n(z) = [n!(0.5)^n / ({[(n − 1)/2]!}^2 √n)] (1 − z^2/n)^{(n−1)/2};  |z| < √n   (7.5.4)
It follows from limit (7.2.7), and the fact that (1 − z^2/n)^{−1/2} → 1, that

lim_{n→∞} (1 − z^2/n)^{(n−1)/2} = e^{−z^2/2}

and it is also possible to show that the constant in (7.5.4) approaches 1/√(2π) as n → ∞. Thus, in the example, the sequence of pdfs corresponding to Z_n converges to a standard normal pdf. It is not obvious that this will imply that the CDFs also converge, but this can be proved. However, we will not pursue this point.
7.6 PROPERTIES OF STOCHASTIC CONVERGENCE

We encountered several examples in which a sequence of random variables converged stochastically to a constant. For instance, in Example 7.3.2 we discovered that the sample proportion converges stochastically to the population proportion. Clearly, this is a useful general concept for evaluating estimators of unknown population parameters, and it would be reasonable to require that a good estimator have the property that it converges stochastically to the parameter value as the sample size approaches infinity. The following theorem, stated without proof, provides an alternate criterion for showing stochastic convergence.
Theorem 7.6.1
The sequence Y_1, Y_2, ... converges stochastically to c if and only if for every ε > 0,

lim_{n→∞} P[|Y_n − c| < ε] = 1   (7.6.1)

A sequence of random variables that satisfies this condition is also said to converge in probability to the constant c, denoted by Y_n →p c. The notion of convergence in probability will be discussed in a more general context in the next section.
Example 7.6.1
Example 7.3.2 verified the so-called Bernoulli Law of Large Numbers with the MGF approach. It also can be verified with the previous theorem and the Chebychev inequality. Specifically, the mean and variance of p̂_n are E(p̂_n) = p and
Var(p̂_n) = pq/n, so that

P[|p̂_n − p| < ε] ≥ 1 − pq/(nε^2)   (7.6.2)

for any ε > 0, so lim_{n→∞} P[|p̂_n − p| < ε] = 1.
This same approach can be used to prove a more general result, usually referred to as the Law of Large Numbers (LLN).
Theorem 7.6.2
If X_1, ..., X_n is a random sample from a distribution with finite mean μ and variance σ^2, then the sequence of sample means converges in probability to μ, X̄_n →p μ.

Proof
This follows from the fact that E(X̄_n) = μ and Var(X̄_n) = σ^2/n, and thus

P[|X̄_n − μ| < ε] ≥ 1 − σ^2/(nε^2)   (7.6.3)

so that lim_{n→∞} P[|X̄_n − μ| < ε] = 1.

These results further illustrate that the sample mean provides a good estimate of the population mean, in the sense that the probability approaches 1 that X̄_n is arbitrarily close to μ as n → ∞. Actually, the right side of inequality (7.6.3) provides additional information. Namely, for any ε > 0 and 0 < δ < 1, if n > σ^2/(ε^2 δ), then

P[|X̄_n − μ| < ε] ≥ 1 − δ
Theorem 7.6.3
If Z_n = √n(Y_n − m)/c →d Z ~ N(0, 1), then Y_n →p m.
Example 7.6.2
We found in Examples 7.5.2 and 7.5.3 that the sample median X_{k:n} is asymptotically normal with asymptotic mean x_{0.5}, the distribution median. It follows from the theorem that X_{k:n} →p x_{0.5} as n → ∞, with k/n → 0.5. Similarly, under the conditions of Theorem 7.5.1, it follows that if k/n → p, then the kth smallest order statistic converges stochastically to the pth percentile, X_{k:n} →p x_p.
7.7 ADDITIONAL LIMIT THEOREMS

Definition 7.7.1  Convergence in Probability
The sequence of random variables Y_n is said to converge in probability to Y, written Y_n →p Y, if

lim_{n→∞} P[|Y_n − Y| < ε] = 1  for every ε > 0   (7.7.1)

It follows from equation (7.6.1) that stochastic convergence is equivalent to convergence in probability to the constant c, and for the most part we will restrict attention to this special case. Note that convergence in probability is a stronger property than convergence in distribution. This should not be surprising, because convergence in distribution does not impose any requirement on the joint distribution of Y_n and Y, whereas convergence in probability does. The following theorem is stated without proof.
Theorem 7.7.1
For a sequence of random variables, if Y_n →p Y, then Y_n →d Y.

For the special case Y = c, the limiting distribution is the degenerate distribution P[Y = c] = 1. This was the condition we initially used to define stochastic convergence.
Theorem 7.7.2
If Y_n →p c, then for any function g(y) that is continuous at c, g(Y_n) →p g(c).

Proof
Because g(y) is continuous at c, it follows that for every ε > 0 there exists a δ > 0 such that |y − c| < δ implies |g(y) − g(c)| < ε. This, in turn, implies that

P[|g(Y_n) − g(c)| < ε] ≥ P[|Y_n − c| < δ]
because P(B) ≥ P(A) whenever A ⊂ B. But because Y_n →p c, it follows for every ε > 0 that

lim_{n→∞} P[|g(Y_n) − g(c)| < ε] ≥ lim_{n→∞} P[|Y_n − c| < δ] = 1

The left-hand limit cannot exceed 1, so it must equal 1, and g(Y_n) →p g(c).
Theorem 7.7.2 is also valid if Y_n and c are k-dimensional vectors. Thus this theorem is very useful, and examples of the types of results that follow are listed in the next theorem.

Theorem 7.7.3 If X_n and Y_n are two sequences of random variables such that X_n →p c and Y_n →p d, then:
1. aX_n + bY_n →p ac + bd.
2. X_n Y_n →p cd.
3. X_n/c →p 1 for c ≠ 0.
4. 1/X_n →p 1/c if P[X_n ≠ 0] = 1 for all n, c ≠ 0.
5. √X_n →p √c if P[X_n ≥ 0] = 1 for all n, c ≥ 0.

Example 7.7.1 Suppose that Y_n ~ BIN(n, p). We know that p̂_n = Y_n/n →p p. Thus it follows that

p̂_n(1 − p̂_n) →p p(1 − p)

The following theorem is helpful in determining limits in distribution.

Theorem 7.7.4 Slutsky's Theorem If X_n and Y_n are two sequences of random variables such that X_n →p c and Y_n →d Y, then:
1. X_n + Y_n →d c + Y.
2. X_n Y_n →d cY.
3. Y_n/X_n →d Y/c; c ≠ 0.
Note that as a special case X_n could be an ordinary numerical sequence such as X_n = n/(n − 1).
Example 7.7.2 Consider a random sample of size n from a Bernoulli distribution, X_i ~ BIN(1, p). We know that

(p̂_n − p)/√(p(1 − p)/n) →d Z ~ N(0, 1)

We also know that p̂_n(1 − p̂_n) →p p(1 − p), so dividing by [p̂_n(1 − p̂_n)/(p(1 − p))]^{1/2} gives

(p̂_n − p)/√(p̂_n(1 − p̂_n)/n) →d Z ~ N(0, 1)   (7.7.2)
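The convergence in (7.7.2) is easy to check numerically. A simulation sketch (Python with NumPy; the values n = 1000 and p = 0.3 are our illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 1000, 0.3, 20000

# Draw reps independent binomial counts Y ~ BIN(n, p) and form p_hat = Y/n.
y = rng.binomial(n, p, size=reps)
p_hat = y / n

# Studentized statistic from (7.7.2): p(1-p) replaced by p_hat(1-p_hat).
z = (p_hat - p) / np.sqrt(p_hat * (1 - p_hat) / n)

# If Z ~ N(0, 1), about 95% of the values should fall in (-1.96, 1.96).
coverage = np.mean(np.abs(z) < 1.96)
print(round(coverage, 2))
```

The simulated coverage comes out close to 0.95, as the standard normal limit predicts.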
Theorem 7.7.2 also may be generalized.

Theorem 7.7.5 If Y_n →d Y, then for any continuous function g(y), g(Y_n) →d g(Y). Note that g(y) is assumed not to depend on n.
Theorem 7.7.6 If √n(Y_n − m)/c →d Z ~ N(0, 1), and if g(y) has a nonzero derivative at y = m, g′(m) ≠ 0, then

√n[g(Y_n) − g(m)]/[c g′(m)] →d N(0, 1)

Proof
Define u(y) = [g(y) − g(m)]/(y − m) − g′(m) if y ≠ m, and let u(m) = 0. It follows that u(y) is continuous at m with u(m) = 0, and thus g′(m) + u(Y_n) →p g′(m). Furthermore,

√n[g(Y_n) − g(m)]/[c g′(m)] = [√n(Y_n − m)/c] · [g′(m) + u(Y_n)]/g′(m)

From Theorem 7.7.3, we have [g′(m) + u(Y_n)]/g′(m) →p 1, and the result follows from Theorem 7.7.4.
According to our earlier interpretation of an asymptotic normal distribution, we conclude that for large n, if Y_n ~ N(m, c²/n) approximately, then approximately

g(Y_n) ~ N(g(m), c²[g′(m)]²/n)   (7.7.3)

Note the similarities between this result and the approximate mean and variance formulas given in Section 2.4.
Example 7.7.3 The Central Limit Theorem says that the sample mean is asymptotically normally distributed,

√n(X̄_n − μ)/σ →d Z ~ N(0, 1)

or, approximately for large n, X̄_n ~ N(μ, σ²/n). We now know from Theorem 7.7.6 that differentiable functions of X̄_n also will be asymptotically normally distributed. For example, if g(x̄_n) = x̄_n², then g′(μ) = 2μ and, approximately, X̄_n² ~ N(μ², 4μ²σ²/n).
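This delta-method conclusion can be checked by simulation. A sketch in Python (the EXP(1) population, so that μ = σ = 1, the choice g(x) = x², and n = 400 are our illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 400, 10000
mu, sigma = 1.0, 1.0                     # mean and sd of EXP(1)

# Replicated sample means of n exponential variables, then g(xbar) = xbar**2.
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
g = xbar ** 2

# (7.7.3) with g'(mu) = 2*mu: g(Xbar) is approximately N(mu**2, 4*mu**2*sigma**2/n).
approx_sd = 2 * mu * sigma / np.sqrt(n)
print(round(g.mean(), 2), round(g.std(), 3), round(approx_sd, 3))
```

The simulated mean and standard deviation of X̄² agree closely with the delta-method values μ² and 2μσ/√n.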
7.8* ASYMPTOTIC DISTRIBUTIONS OF EXTREME ORDER STATISTICS

As noted in Section 7.5, the central order statistics X_{k:n} are asymptotically normally distributed as n → ∞ with k/n → p. If extreme order statistics such as X_{1:n}, X_{2:n}, and X_{n:n} are standardized so that they have a nondegenerate limiting distribution, this limiting distribution will not be normal. Examples of such limiting distributions were given earlier. It can be shown that the nondegenerate limiting distribution of an extreme order statistic must belong to one of three possible types of distributions. Thus, these three types of distributions are useful when studying extremes, analogous to the way the normal distribution is useful when studying means through the Central Limit Theorem.

For example, in studying floods, the variable of interest may be the maximum flood stage during the year. This variable may behave approximately like the maximum of a large number of independent flood levels attained through the year. Thus, one of the three limiting types may provide a good model for this variable. Similarly, the strength of a chain is equal to that of its weakest link, or the strength of a ceramic may be the strength at its weakest flaw, where the number of flaws, n, may be quite large. Also, the lifetime of a system of independent and identically distributed components connected in series is equal to the minimum lifetime of the components. Again, one of the limiting distributions may provide a good approximation for the lifetime of the system, even though the distribution of the lifetimes of the individual components may not be known. Similarly, the lifetime of a system of components connected in parallel is equal to the maximum lifetime of the components. The following theorems, which are stated without proof, are useful in studying the asymptotic behavior of extreme order statistics.

* Advanced (or optional) topic
Theorem 7.8.1 If the limit of a sequence of CDFs is a continuous CDF, F(y) = lim_{n→∞} F_n(y), then for any a > 0 and b,

lim_{n→∞} F_n(a_n y + b_n) = F(ay + b)   (7.8.1)

if and only if lim_{n→∞} a_n = a > 0 and lim_{n→∞} b_n = b.

Theorem 7.8.2 If the limit of a sequence of CDFs is a continuous CDF, and if lim_{n→∞} F_n(a_n y + b_n) = G(y) for a_n > 0 and all real y, then

lim_{n→∞} F_n(α_n y + β_n) = G(y)

for α_n > 0, if and only if α_n/a_n → 1 and (β_n − b_n)/a_n → 0 as n → ∞.
LIMITING DISTRIBUTIONS OF MAXIMUMS

Let X_{1:n}, ..., X_{n:n} denote an ordered random sample of size n from a distribution with CDF F(x). In the context of extreme-value theory, the maximum X_{n:n} is said to have a (nondegenerate) limiting distribution G(y) if there exist sequences of standardizing constants {a_n} and {b_n} with a_n > 0 such that the standardized variable Y_n = (X_{n:n} − b_n)/a_n converges in distribution to G(y),

Y_n = (X_{n:n} − b_n)/a_n →d Y ~ G(y)   (7.8.2)
That is, if we say that X_{n:n} has a limiting distribution of type G, we will mean that the limiting distribution of the standardized variable Y_n is a nondegenerate distribution G(y). As suggested by Theorems 7.8.1 and 7.8.2, if G(y) is continuous, the sequence of standardizing constants will not be unique; however, it is not possible to obtain a limiting distribution of a different type by changing the standardizing constants. Recall that the exact distribution of X_{n:n} is given by

F_{n:n}(x) = [F(x)]^n   (7.8.3)
If we consider Y_n = (X_{n:n} − b_n)/a_n, then the exact distribution of Y_n is

G_n(y) = P[Y_n ≤ y] = F_{n:n}(a_n y + b_n) = [F(a_n y + b_n)]^n   (7.8.4)

Thus, the limiting distribution of X_{n:n} (or, more correctly, of Y_n) is given by

G(y) = lim_{n→∞} G_n(y) = lim_{n→∞} [F(a_n y + b_n)]^n   (7.8.5)
Thus, equation (7.8.5) provides a direct approach for determining a limiting extreme-value distribution, if sequences {a_n} and {b_n} can be found that result in a nondegenerate limit. Recall from Example 7.2.6 that if X_i ~ EXP(1), then we may let a_n = 1 and b_n = ln n. Thus,

G_n(y) = [F(y + ln n)]^n = [1 − (1/n)e^{−y}]^n   (7.8.6)

and thus,

G(y) = lim_{n→∞} [1 − (1/n)e^{−y}]^n = exp(−e^{−y})   (7.8.7)
The three possible types of limiting distributions are provided in the following theorem, which is stated without proof.

Theorem 7.8.3 If Y_n = (X_{n:n} − b_n)/a_n has a limiting distribution G(y), then G(y) must be one of the following three types of extreme-value distributions:

Type I (for maximums) (Exponential type)
G^{(1)}(y) = exp(−e^{−y})   −∞ < y < ∞   (7.8.8)

Type II (for maximums) (Cauchy type)
G^{(2)}(y) = exp(−y^{−γ})   y > 0, γ > 0   (7.8.9)

Type III (for maximums) (Limited type)
G^{(3)}(y) = exp[−(−y)^γ] if y ≤ 0, and G^{(3)}(y) = 1 if y > 0; γ > 0   (7.8.10)
The limiting distribution of the maximum from densities such as the normal, lognormal, logistic, and gamma distributions is a Type I extreme-value distribution. Generally speaking, such densities have tails no thicker than the exponential distribution. This class includes a large number of the most common distributions, and the Type I extreme-value distribution (for maximums) should provide a useful model for many types of variables related to maximums. Of course, a location parameter and a scale parameter would need to be introduced into the model when applied directly to the nonstandardized variable X_{n:n}. The Type II limiting distribution results for maximums from densities with thicker tails, such as the Cauchy distribution. The Type III case may arise from densities with finite upper limits on the range of the variables. The following theorem provides an alternative form to equation (7.8.5), which is sometimes more convenient for carrying out the limit.

Theorem 7.8.4 (Gnedenko) In determining the limiting distribution of Y_n = (X_{n:n} − b_n)/a_n,

lim_{n→∞} G_n(y) = lim_{n→∞} [F(a_n y + b_n)]^n = G(y)   (7.8.11)

if and only if

lim_{n→∞} n[1 − F(a_n y + b_n)] = −ln G(y)   (7.8.12)
In many cases the greatest difficulty involves determining suitable standardizing sequences so that a nondegenerate limiting distribution will result. For a given CDF, F(x), it is possible to use Theorem 7.8.4 to solve for a_n and b_n in terms of F(x) for each of the three possible types of limiting distributions. Thus, if the limiting type for F(x) is known, then a_n and b_n can be computed. If the type is not known, then a_n and b_n can be computed for each type and then applied to see which type works out. One property of a CDF that is useful in expressing the standardizing constants is its "characteristic largest value."

Definition 7.8.1 The characteristic largest value, u_n, of a CDF F(x) is defined by the equation

n[1 − F(u_n)] = 1   (7.8.13)

For a random sample of size n from F(x), the expected number of observations that will exceed u_n is 1. The probability that one observation will exceed u_n is

p = P[X > u_n] = 1 − F(u_n)

and the expected number for n independent observations is

np = n[1 − F(u_n)]
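For CDFs in closed form, (7.8.13) is just u_n = F⁻¹(1 − 1/n), so u_n can be written down directly. A small sketch (Python; the two example CDFs are the ones used in this section):

```python
import math

# Characteristic largest value: n * (1 - F(u_n)) = 1, i.e. u_n = F^{-1}(1 - 1/n).

def u_exponential(n, theta=1.0):
    # EXP(theta): F(x) = 1 - exp(-x/theta), so u_n = theta * ln(n).
    return theta * math.log(n)

def u_pareto_tail(n, theta=1.0):
    # F(x) = 1 - x**(-theta) for x >= 1, so u_n = n**(1/theta).
    return n ** (1.0 / theta)

n = 1000
# Check the defining equation n * (1 - F(u_n)) = 1 for both CDFs.
print(n * math.exp(-u_exponential(n)))
print(n * u_pareto_tail(n, 2.0) ** -2.0)
```

Both printed values equal 1 up to rounding error, confirming that the closed forms satisfy the defining equation.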
Theorem 7.8.5 Let X_i ~ F(x), and assume that Y_n = (X_{n:n} − b_n)/a_n has a limiting distribution.

1. If F(x) is continuous and strictly increasing, then the limiting distribution of Y_n is of exponential type if and only if

lim_{n→∞} n[1 − F(a_n y + b_n)] = e^{−y}   (7.8.14)

where b_n = u_n and a_n is the solution of F(a_n + u_n) = 1 − (ne)^{−1}.

2. G(y) is of Cauchy type if and only if

lim_{y→∞} [1 − F(y)]/[1 − F(ky)] = k^γ   k > 0, γ > 0   (7.8.15)

and in this case, a_n = u_n and b_n = 0.

3. G(y) is of limited type if and only if

lim_{y→0⁻} [1 − F(ky + x₀)]/[1 − F(y + x₀)] = k^γ   k > 0   (7.8.16)

where x₀ = max{x | F(x) < 1}, the upper limit of x. Also, b_n = x₀ and a_n = x₀ − u_n.
Example 7.8.1 Suppose again that X_i ~ EXP(θ), and we are interested in the maximum of a random sample of size n. The characteristic largest value u_n is obtained from

n[1 − F(u_n)] = n[1 − (1 − e^{−u_n/θ})] = 1

which gives u_n = θ ln n. We happen to know that the exponential density falls in the Type I case, so we will try that case first. We have b_n = u_n = θ ln n, and a_n is determined from

F(a_n + u_n) = 1 − e^{−(a_n + u_n)/θ} = 1 − 1/(ne)

which gives a_n = θ. Thus, if the exponential density is in the Type I case, we know that

(X_{n:n} − θ ln n)/θ →d Y ~ G^{(1)}(y)   (7.8.17)
This is verified easily by using condition 1 of Theorem 7.8.5, because

lim_{n→∞} n[1 − F(a_n y + b_n)] = lim_{n→∞} n e^{−y − ln n} = e^{−y}
Example 7.8.2 The density for the CDF F(x) = 1 − x^{−θ}, x ≥ 1, has a thick upper tail, so one would expect the limiting distribution to be of Cauchy type. If we check the Cauchy-type condition given in Theorem 7.8.5, we find

lim_{y→∞} [1 − F(y)]/[1 − F(ky)] = lim_{y→∞} (ky)^θ/y^θ = k^θ   (7.8.18)

so the limiting distribution is of Cauchy type with γ = θ. Also, we have n[1 − F(u_n)] = n u_n^{−θ} = 1, which gives u_n = n^{1/θ} = a_n, and we let b_n = 0 in this case. Thus we know that

X_{n:n}/n^{1/θ} →d Y ~ G^{(2)}(y)   (7.8.19)

Now that we know how to standardize the variable, we also can verify this result directly by Theorem 7.8.4. We have

lim_{n→∞} n[1 − F(a_n y + b_n)] = lim_{n→∞} n(n^{1/θ} y)^{−θ} = y^{−θ} = −ln G(y)   (7.8.20)

so G(y) = exp(−y^{−θ}), which is the Cauchy type with γ = θ.
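A simulation sketch of (7.8.19) (Python; θ = 2 and n = 400 are our choices, with samples drawn by inverting the CDF):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 400, 10000

# Inverse-CDF sampling from F(x) = 1 - x**(-theta), x >= 1.
x = (1.0 - rng.random(size=(reps, n))) ** (-1.0 / theta)

# Standardized maxima X_{n:n} / n**(1/theta); limit G2(y) = exp(-y**(-theta)).
y = x.max(axis=1) / n ** (1.0 / theta)

for t in (0.5, 1.0, 2.0):
    print(round(np.mean(y <= t), 3), round(np.exp(-t ** -theta), 3))
```

The empirical CDF of the standardized maximum tracks the Type II limit closely even at moderate n.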
Example 7.8.3 For X ~ UNIF(0, 1), where F(x) = x, 0 < x < 1, we should expect a Type III limiting distribution. We have

n[1 − F(u_n)] = n(1 − u_n) = 1

which gives u_n = 1 − 1/n. Thus, b_n = x₀ = 1 and a_n = x₀ − u_n = 1/n. Checking condition 3 of Theorem 7.8.5,

lim_{y→0⁻} [1 − F(ky + x₀)]/[1 − F(y + x₀)] = lim_{y→0⁻} [1 − (ky + x₀)]/[1 − (y + x₀)] = lim_{y→0⁻} (−ky)/(−y) = k

so the limiting distribution of Y_n = n(X_{n:n} − 1) is Type III with γ = 1. Again, if we look directly at Theorem 7.8.4 to further illustrate, we have

lim_{n→∞} n[1 − F(a_n y + b_n)] = lim_{n→∞} n[1 − (y/n + 1)] = −y = −ln G(y)   (7.8.21)
and Y_n = n(X_{n:n} − 1) →d Y ~ G(y), where

G(y) = G^{(3)}(y) = e^y if y ≤ 0, and G^{(3)}(y) = 1 if y > 0

which is the limited type with γ = 1.
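The uniform case is easy to check by simulation, since Y_n = n(X_{n:n} − 1) has exact CDF (1 + y/n)^n → e^y for y ≤ 0. A sketch (Python; n = 200 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 20000

# Y_n = n(X_{n:n} - 1) for UNIF(0, 1) samples; the limit is G3(y) = e**y, y <= 0.
y = n * (rng.random(size=(reps, n)).max(axis=1) - 1.0)

for t in (-2.0, -1.0, -0.5):
    print(round(np.mean(y <= t), 3), round(np.exp(t), 3))
```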
LIMITING DISTRIBUTIONS OF MINIMUMS

If a nondegenerate limiting distribution exists for the minimum of a random sample, then it also will be one of three possible types. Indeed, the distribution of a minimum can be related to the distribution of a maximum, because

min(x₁, ..., x_n) = −max(−x₁, ..., −x_n)   (7.8.22)

Thus, all the results on maximums can be modified to apply to minimums if the details can be sorted out. Let X be continuous, X ~ F_X(x), and let Z = −X, so that F_Z(z) = 1 − F_X(−z). Note also that X_{1:n} = −Z_{n:n}. Now consider W_n = (X_{1:n} + b_n)/a_n. We have

G_{W_n}(w) = P[(X_{1:n} + b_n)/a_n ≤ w] = P[(Z_{n:n} − b_n)/a_n ≥ −w] = 1 − G_{Y_n}(−w)

The limiting distribution of W_n, say H(w), then is given by

H(w) = lim_{n→∞} G_{W_n}(w) = lim_{n→∞} [1 − G_{Y_n}(−w)] = 1 − G(−w)

where G(y) now denotes the limiting distribution of Y_n = (Z_{n:n} − b_n)/a_n. Thus, to find H(w), the limiting distribution for a minimum, the first step is to determine

F_Z(z) = 1 − F_X(−z)

then determine a_n, b_n, and the limiting distribution G(y) by the methods described for maximums as applied to F_Z(z). Then the limiting distribution for W_n is

H(w) = 1 − G(−w)   (7.8.23)
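The identity (7.8.22) and the relation behind (7.8.23) have exact finite-n analogues that can be checked directly on simulated data. A sketch (Python; the N(0, 1) population and the point w = −1.5 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 5000
x = rng.normal(size=(reps, n))

# (7.8.22): the minimum of each sample equals minus the maximum of the negated sample.
mins = x.min(axis=1)
neg_maxs = (-x).max(axis=1)
assert np.allclose(mins, -neg_maxs)

# Finite-n version of H(w) = 1 - G(-w): P[min <= w] = 1 - P[max(-X) < -w].
w = -1.5
print(np.mean(mins <= w), 1.0 - np.mean(neg_maxs < -w))
```

The two printed probabilities coincide, which is exactly the CDF relation used to pass from maximums to minimums.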
Note that if F_X(x) belongs to one limiting type, it is possible that F_Z(z) will belong to a different type. For example, maximums from EXP(θ) have a Type I limiting distribution, whereas F_Z(z) in this case has a Type III limiting distribution, so the limiting distribution of the minimum will be a transformed Type III distribution. In summary, a straightforward procedure for determining a_n, b_n, and H(w) is first to find F_Z(z) and apply the methods for maximums to determine G(y) for Y_n = (Z_{n:n} − b_n)/a_n, and then to use equation (7.8.23) to obtain H(w). It also is possible to express the results directly in terms of the original distribution F_X(x).

Definition 7.8.2 The smallest characteristic value is the value s_n defined by

nF(s_n) = 1   (7.8.24)

It follows from equation (7.8.22) that s_n(x) = −u_n(z). Similarly, the condition F_Z(a_n + u_n(z)) = 1 − 1/(ne) becomes F(s_n − a_n) = 1/(ne), and so on.
Theorem 7.8.6 If W_n = (X_{1:n} + b_n)/a_n has a limiting distribution H(w), then H(w) must be one of the following three types of extreme-value distributions:

1. Type I (for minimums) (Exponential type) In this case, b_n = −s_n, a_n is defined by F(s_n − a_n) = 1/(ne), W_n = (X_{1:n} − s_n)/a_n, and

H^{(1)}(w) = 1 − G^{(1)}(−w) = 1 − exp(−e^w)

if and only if lim_{n→∞} nF(a_n y + s_n) = e^y.

2. Type II (for minimums) (Cauchy type) In this case, a_n = −s_n, b_n = 0, W_n = −X_{1:n}/s_n, and

H^{(2)}(w) = 1 − G^{(2)}(−w) = 1 − exp[−(−w)^{−γ}]   w < 0, γ > 0

if and only if

lim_{y→−∞} F(y)/F(ky) = k^γ   k > 0

or lim_{n→∞} nF(s_n y) = y^{−γ}, y > 0.
3. Type III (for minimums) (Limited type) If x₁ = min{x | F(x) > 0} denotes the lower limit for x (that is, x₁ = −x₀(z)), then

W_n = (X_{1:n} − x₁)/(s_n − x₁),  a_n = s_n − x₁,  b_n = −x₁

and

H^{(3)}(w) = 1 − G^{(3)}(−w) = 1 − exp(−w^γ)   w > 0, γ > 0

if and only if

lim_{y→0⁺} F(ky + x₁)/F(y + x₁) = k^γ

or

lim_{n→∞} nF[(s_n − x₁)y + x₁] = y^γ

Note that the Type I distribution for minimums is known as the Type I extreme-value distribution. Also, the Type III distribution for minimums is a Weibull distribution. Recall that the limiting distribution for maximums is Type I for many of the common densities. In determining the type of limiting distribution of the minimum, it is necessary to consider the thickness of the right-hand tail of F_Z(z), where Z = −X. Thus the limiting distribution of the minimum for some of these common densities, such as the exponential and gamma, belongs to Type III. This may be one reason that the Weibull distribution often is encountered in applications.
Example 7.8.4 We now consider the minimum of a random sample of size n from EXP(θ). We already know in this case that X_{1:n} ~ EXP(θ/n), and so nX_{1:n}/θ ~ EXP(1). Thus the limiting distribution of nX_{1:n}/θ is also EXP(1), which is the Type III case with γ = 1. If we did not know the answer, then we would guess that the limiting distribution was Type III, because the range of the variable Z = −X is limited on the right. Checking condition 3 in Theorem 7.8.6, we have x₁ = 0 and

lim_{y→0⁺} F(ky + x₁)/F(y + x₁) = lim_{y→0⁺} [1 − exp(−ky/θ)]/[1 − exp(−y/θ)] = lim_{y→0⁺} k exp(−ky/θ)/exp(−y/θ) = k

so γ = 1. Thus, we know that H(w) = 1 − e^{−w}, where

W_n = (X_{1:n} − x₁)/(s_n − x₁) = X_{1:n}/s_n
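Because nX_{1:n}/θ is exactly EXP(1) for every n here, the Weibull (γ = 1) limit can be checked at any finite n by simulation. A sketch (Python; θ = 2 and n = 100 are our choices):

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 2.0, 100, 20000

# W = n * X_{1:n} / theta is EXP(1) for each n, matching H(w) = 1 - e**(-w).
w = n * rng.exponential(theta, size=(reps, n)).min(axis=1) / theta

for t in (0.5, 1.0, 2.0):
    print(round(np.mean(w <= t), 3), round(1.0 - np.exp(-t), 3))
```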
In this case, s_n is given by

F(s_n) = 1 − e^{−s_n/θ} = 1/n,  or  s_n = −θ ln(1 − 1/n)

This does not yield identically the same standardizing constant as suggested earlier; however, the results are consistent because

lim_{n→∞} [−ln(1 − 1/n)]/(1/n) = 1
SUMMARY

The purpose of this chapter was to introduce and develop the notions of convergence in distribution, limiting distributions, and convergence in probability. These concepts are important in studying the asymptotic behavior of sequences of random variables and their distributions. The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) deal with the limiting behavior of certain functions of the sample mean as the sample size approaches infinity. Specifically, the LLN asserts that a sequence of sample means converges stochastically to the population mean under certain mild conditions. This type of convergence is also equivalent to convergence in probability in this case, because the limit is constant. Under certain conditions, the CLT asserts that a suitably transformed sequence of sample means has a normal limiting distribution. These theorems have important theoretical implications in probability and statistics, and they also provide useful approximations in many applied situations. For example, the CLT yields a very good approximation for the binomial distribution.
EXERCISES

1. Consider a random sample of size n from a distribution with CDF F(x) = 1 − 1/x if 1 ≤ x < ∞, and zero otherwise.
(a) Derive the CDF of the smallest order statistic, X_{1:n}.
(b) Find the limiting distribution of X_{1:n}.
(c) Find the limiting distribution of X_{1:n}^n.
2. Consider a random sample of size n from a distribution with CDF F(x) = (1 + e^{−x})^{−1} for all real x.
(a) Does the largest order statistic, X_{n:n}, have a limiting distribution?
(b) Does X_{n:n} − ln n have a limiting distribution? If so, what is it?
3. Consider a random sample of size n from a distribution with CDF F(x) = 1 − x^{−2} if x ≥ 1, and zero otherwise. Determine whether each of the following sequences has a limiting distribution; if so, then give the limiting distribution:
(a) X_{1:n}
(b) X_{n:n}
X1.. X,,,. 4. Let X1, X2, ... be independent Bernoulli random variables, X
BIN(1, pi), and let
= >(X - p)/n. Show that the sequence Y1, Y2, ... converges stochastically to c = 0 as n - cc. Hint: Use the Chebychev inequality.
5. Suppose that Z_i ~ N(0, 1) and that Z₁, Z₂, ... are independent. Use moment generating functions to find the limiting distribution of Σ_{i=1}^n (Z_i + 1/n)/√n.
6. Show that the limit in equation (7.3.2) is still correct if the assumption np = μ is replaced by the weaker assumption np_n → μ as n → ∞.
7. Consider a random sample from a Weibull distribution, X_i ~ WEI(1, 2). Find approximate values a and b such that for n = 35:
(a) P[a < X̄ < b] = 0.95.
(b) P[a < X̃ < b] = 0.95, where X̃ = X_{18:35} is the sample median.
8. In Exercise 2 of Chapter 5, a carton contains 144 baseballs, each of which has a mean weight of 5 ounces and a standard deviation of 2/5 ounces. Use the Central Limit Theorem to approximate the probability that the total weight of the baseballs in the carton is at most 725 ounces.
9. Let X₁, X₂, ..., X₁₀₀ be a random sample from an exponential distribution, X_i ~ EXP(1), and let Y = X₁ + X₂ + ... + X₁₀₀.
(a) Give an approximation for P[Y > 110].
(b) If X̄ is the sample mean, then approximate P[1.1 < X̄ < 1.2].

10. Assume X_n ~ GAM(1, n) and let Z_n = (X_n − n)/√n. Show that Z_n →d Z ~ N(0, 1). Hint: Show that M_{Z_n}(t) = exp(−√n t − n ln(1 − t/√n)) and then use the expansion ln(1 − s) = −s − (1 + ε)s²/2, where ε → 0 as s → 0. Does the above limiting distribution also follow as a result of the CLT? Explain your answer.
11. Let X_i ~ UNIF(0, 1), where X₁, X₂, ..., X₂₀ are independent. Find normal approximations for each of the following:
(a) P[Σ_{i=1}^{20} X_i ≤ 12].
(b) The 90th percentile of Σ_{i=1}^{20} X_i.
12. A certain type of weapon has probability p of working successfully. We test n weapons, and the stockpile is replaced if the number of failures, X, is at least one. How large must n be to have P[X ≥ 1] ≥ 0.99 when p = 0.95?
(a) Use the exact binomial.
(b) Use the normal approximation.
(c) Use the Poisson approximation.
(d) Rework (a) through (c) with p = 0.90.
13. Suppose that Y_n ~ NB(n, p). Give a normal approximation for P[Y_n ≤ y] for large n. Hint: Y_n is distributed as the sum of n independent geometric random variables.
14. For the sequence {Y_n} of Exercise 13:
(a) Show that Y_n/n converges stochastically to 1/p, using Theorem 7.6.2.
(b) Rework (a) using Theorem 7.6.3.
15. Let W_i be the weight of the ith airline passenger's luggage. Assume that the weights are independent, each with pdf

f(w) = θw^{θ−1}/B^θ   if 0 ≤ w ≤ B

and zero otherwise.
(a) For n = 100, θ = 3, and B = 80, approximate P[Σ_{i=1}^{100} W_i > 6025].
(b) If W_{1:n} is the smallest value out of n, then show that W_{1:n} →p 0 as n → ∞.
(c) If W_{n:n} is the largest value out of n, then show that W_{n:n} →p B as n → ∞.
(d) Find the limiting distribution of (W_{n:n}/B)^n.
(e) Find the asymptotic normal distribution of the median, W_{k:n}, where k/n → 0.5 with k − 0.5n bounded.
(f) To what does W_{k:n} of (e) converge stochastically?
(g) What is the limiting distribution of n^{1/θ} W_{1:n}/B?
16. Consider a random sample from a Poisson distribution, X_i ~ POI(μ).
(a) Show that Y_n = e^{−X̄_n} converges stochastically to P[X = 0] = e^{−μ}.
(b) Find the asymptotic normal distribution of Y_n.
(c) Show that X̄_n exp(−X̄_n) converges stochastically to P[X = 1] = μe^{−μ}.

17. Let X₁, X₂, ..., X_n be a random sample of size n from a normal distribution, N(μ, σ²), and let X̃_n be the sample median. Find constants m and c such that X̃_n is asymptotically normal N(m, c²/n).
18. In Exercise 1, find the limiting distribution of n ln X_{1:n}.
19. In Exercise 2, find the limiting distribution of (1/n) exp(X_{n:n}).
20. Under the assumptions of Theorem 7.5.1:
(a) Show that X_{k:n} converges stochastically to x_p.
(b) Show that F(X_{k:n}) →p p if F(x) is continuous.
21. As noted in the chapter, convergence of a sequence of real numbers to a limit can be regarded as a special case of stochastic convergence. That is, if P[Y_n = c_n] = 1 for each n, and c_n → c, then Y_n →p c. Use this along with Theorem 7.7.3 to show that if a_n → a, b_n → b, and Y_n →p c, then a_n + b_n Y_n →p a + bc.

22. Consider the sequence of independent Bernoulli variables in Exercise 4. Show that if Σ_{i=1}^n p_i/n → 1 as n → ∞, then Σ_{i=1}^n X_i/n →p 1.

23. For the sequence of random variables Z_n of Exercise 10, suppose that Y_n is another sequence such that Y_n →p c, and let W_n = Y_n Z_n.
(a) Does W_n have a limiting distribution? If so, what is it?
(b) What is the limiting distribution of Z_n/Y_n?
24. Use the normal approximation to work Exercise 6 of Chapter 3.
25. Use the theorems of Section 7.8 to determine the standardizing constants and the appropriate extreme-value distributions for each of the following:
(a) X_{1:n} and X_{n:n}, where F(x) = (1 + e^{−x})^{−1}.
(b) X_{1:n} and X_{n:n}, where X_i ~ WEI(θ, β).
(c) X_{1:n} and X_{n:n}, where X_i ~ EV(θ, η).
(d) X_{1:n} and X_{n:n}, where X_i ~ PAR(θ, κ).

26. Consider the CDF of Exercise 1. Find the limiting extreme-value distribution of X_{1:n} and compare this result to the results of Exercise 1.

27. Consider a random sample from a gamma distribution, X_i ~ GAM(θ, κ). Determine the limiting extreme-value distribution of X_{1:n}.

28. Consider a random sample from a Cauchy distribution, X_i ~ CAU(1, 0). Determine the type of limiting extreme-value distribution of X_{n:n}.
CHAPTER 8

STATISTICS AND SAMPLING DISTRIBUTIONS
8.1 INTRODUCTION

In Chapter 4, the notion of random sampling was presented. The empirical distribution function was used to provide a rationale for the sample mean and sample variance as intuitive estimates of the mean and variance of the population distribution. The purpose of this chapter is to introduce the concept of a statistic, which includes the sample mean and sample variance as special cases, and to derive properties of certain statistics that play an important role in later chapters.
8.2 STATISTICS

Consider a set of observable random variables X₁, ..., X_n. For example, suppose the variables are a random sample of size n from a population.
Definition 8.2.1 A function of observable random variables, T = t(X₁, ..., X_n), which does not depend on any unknown parameters, is called a statistic.
In our notation, script t is the function that we apply to X₁, ..., X_n to define the statistic, which is denoted by capital T.

It is required that the variables be observable because of the intended use of a statistic. The intent is to make inferences about the distribution of the set of random variables, and if the variables are not observable or if the function t(x₁, ..., x_n) depends on unknown parameters, then T would not be useful in making such inferences. For example, consider the data of Example 4.6.3, which were obtained by observing the lifetimes of 40 randomly selected electrical parts. It is reasonable to assume that they are the observed values of a random sample of size 40 from the population of all such parts. Typically, such a population will have one or more unknown parameters, such as an unknown population mean, say μ. To make an inference about the population, suppose it is necessary to numerically evaluate some function of the data that also depends on the unknown parameter, such as t(x₁, ..., x₄₀) = (x₁ + ... + x₄₀)/μ or x₁:₄₀ − μ. Of course, such computations would be impossible, because μ is unknown, and these functions would not be suitable for defining statistics.

Also note that, in general, the set of observable random variables need not be a random sample. For example, the set of ordered random variables Y₁, ..., Y₁₀ of Example 6.5.6 is not a random sample. However, a function of these variables that does not depend on unknown parameters, such as t(y₁, ..., y₁₀) = (y₁ + ... + y₁₀) + 3y₁₀, would be a statistic. Most of the discussion in the chapters that follow will involve random samples.
Example 8.2.1 Let X₁, ..., X_n represent a random sample from a population with pdf f(x). The sample mean, as defined in Chapter 4, provides an example of a statistic with the function t(x₁, ..., x_n) = (x₁ + ... + x_n)/n. This statistic usually is denoted by

X̄ = Σ_{i=1}^n X_i / n   (8.2.1)

When a random sample is observed, the value of X̄ computed from the data usually is denoted by lowercase x̄. As noted in Chapter 4, x̄ is useful as an estimate of the population mean, μ = E(X). The following theorem provides important properties of the sample mean.
Theorem 8.2.1 If X₁, ..., X_n denotes a random sample from f(x) with E(X) = μ and Var(X) = σ², then

E(X̄) = μ   (8.2.2)

and

Var(X̄) = σ²/n   (8.2.3)
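Both properties are easy to verify by simulation: replicate many samples, compute one x̄ per sample, and compare the mean and variance of the replicated x̄ values with μ and σ²/n. A sketch (Python; the N(10, 9) population and n = 25 are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 25, 40000
mu, sigma = 10.0, 3.0

# Many replicated samples from N(mu, sigma^2); one sample mean per row.
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Theorem 8.2.1: E(X-bar) = mu and Var(X-bar) = sigma**2 / n = 9/25 = 0.36.
print(round(xbar.mean(), 2))
print(round(xbar.var(), 2))
```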
Property (8.2.2) indicates that if the sample mean is used to estimate the population mean, then the values of sample estimates will, on the average, equal the population mean μ. Of course, for any one sample the value x̄ may differ substantially from μ. A statistic with this property is said to be unbiased for the parameter it is intended to estimate. This property will receive more attention in the next chapter. An important special case of the theorem occurs when the population distribution is Bernoulli.
Example 8.2.2 Consider the random variables X₁, ..., X_n of Example 5.2.1, which we can regard as a random sample of size n from a Bernoulli distribution, X_i ~ BIN(1, p). The Bernoulli distribution provides a model for a dichotomous or two-valued population. The mean and variance of such a population are μ = p and σ² = pq, where, as usual, q = 1 − p. The sample mean in this case is X̄ = Y/n, where Y is the binomial variable of Example 5.2.1, and it usually is called the sample proportion, denoted p̂ = Y/n. It is rather straightforward to show that p̂ is an unbiased estimate of p,

E(p̂) = p   (8.2.4)

and that

Var(p̂) = pq/n   (8.2.5)
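Equations (8.2.4) and (8.2.5) can be checked the same way with binomial counts. A sketch (Python; n = 50 and p = 0.2 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, reps = 50, 0.2, 40000

# p_hat = Y/n with Y ~ BIN(n, p); one sample proportion per replication.
p_hat = rng.binomial(n, p, size=reps) / n

# (8.2.4): E(p_hat) = p = 0.2; (8.2.5): Var(p_hat) = p*q/n = 0.16/50 = 0.0032.
print(round(p_hat.mean(), 3))
print(round(p_hat.var(), 4))
```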
As noted earlier, the binomial distribution provides a model for the situation of sampling with replacement. In Example 5.2.2, the comparable result for sampling without replacement was considered, with Y ~ HYP(n, M, N). In that example, suppose that we want to estimate M/N, the proportion of defective components in the population, based on the sample proportion, Y/n. We know the mean and variance of Y from equations (5.2.20) and (5.2.21). Specifically,

E(Y/n) = M/N = p   (8.2.6)

which means that Y/n is unbiased for p, and Var(Y/n) is shown easily to approach zero as n increases. Actually, in this example, it is possible for the variance to attain the value 0 if n = N, which means that the entire population has been inspected.

Example 8.2.3 The function t(x₁, ..., x_n) = [(x₁ − x̄)² + ... + (x_n − x̄)²]/(n − 1), when applied to data, corresponds to the sample variance, which was discussed in Chapter 4.
Specifically, the sample variance is given by

S² = Σ_{i=1}^n (X_i − X̄)² / (n − 1)   (8.2.7)

The following alternate forms may be obtained by expanding the square:

S² = [Σ_{i=1}^n X_i² − (Σ_{i=1}^n X_i)²/n] / (n − 1)   (8.2.8)

   = [Σ_{i=1}^n X_i² − nX̄²] / (n − 1)   (8.2.9)
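The forms (8.2.7) through (8.2.9) are algebraically identical, which a quick numerical check confirms (a Python sketch on arbitrary simulated data):

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.normal(5.0, 2.0, size=30)
n, xbar = len(x), x.mean()

s2_a = np.sum((x - xbar) ** 2) / (n - 1)              # (8.2.7)
s2_b = (np.sum(x**2) - np.sum(x)**2 / n) / (n - 1)    # (8.2.8)
s2_c = (np.sum(x**2) - n * xbar**2) / (n - 1)         # (8.2.9)

print(np.allclose([s2_a, s2_b], s2_c))                # prints True
```

All three agree (up to floating-point rounding) with NumPy's own `np.var(x, ddof=1)`.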
The following theorem provides important properties of the sample variance.
Theorem 8.2.2 If X₁, ..., X_n denotes a random sample of size n from f(x) with E(X) = μ and Var(X) = σ², then

E(S²) = σ²   (8.2.10)

and

Var(S²) = (1/n)[μ₄ − (n − 3)σ⁴/(n − 1)]   n > 1   (8.2.11)

where μ₄ = E(X − μ)⁴.

Proof
Consider property (8.2.10). Based on equation (8.2.9), we have

E(S²) = [Σ E(X_i²) − nE(X̄²)]/(n − 1)
      = [n(μ² + σ²) − n(μ² + σ²/n)]/(n − 1)
      = [(n − 1)σ²]/(n − 1) = σ²

The proof of equation (8.2.11) is omitted.
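For a normal population μ₄ = 3σ⁴, so (8.2.11) reduces to Var(S²) = 2σ⁴/(n − 1), and both (8.2.10) and (8.2.11) can be checked by simulation. A sketch (Python; σ² = 4 and n = 10 are our choices):

```python
import numpy as np

rng = np.random.default_rng(10)
n, reps = 10, 100000
sigma2 = 4.0

# One sample variance S^2 (divisor n-1) per replicated N(0, 4) sample.
s2 = rng.normal(0.0, 2.0, size=(reps, n)).var(axis=1, ddof=1)

# (8.2.10): E(S^2) = sigma^2 = 4.
print(round(s2.mean(), 2))

# (8.2.11) with mu_4 = 3*sigma^4: Var(S^2) = 2*sigma^4/(n-1) = 32/9.
var_theory = (3 * sigma2**2 - (n - 3) * sigma2**2 / (n - 1)) / n
print(round(var_theory, 3), round(s2.var(), 3))
```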
According to property (8.2.10), the sample variance provides another example of an unbiased statistic, and this is the principal reason for using the divisor n − 1 rather than n.
8.3 SAMPLING DISTRIBUTIONS

A statistic is also a random variable, the distribution of which depends on the distribution of the random sample and on the form of the function t(x₁, x₂, ..., x_n). The distribution of a statistic sometimes is referred to as a derived distribution or sampling distribution, in contrast to the population distribution. Many important statistics can be expressed as linear combinations of independent normal random variables.
LINEAR COMBINATIONS OF NORMAL VARIABLES

Theorem 8.3.1 If X_i ~ N(μ_i, σ_i²), i = 1, ..., n, denote independent normal variables, then

Y = Σ_{i=1}^n a_i X_i ~ N(Σ_{i=1}^n a_i μ_i, Σ_{i=1}^n a_i²σ_i²)   (8.3.1)

Proof

M_Y(t) = Π_{i=1}^n M_{X_i}(a_i t) = Π_{i=1}^n exp(a_i μ_i t + a_i²σ_i²t²/2) = exp[t Σ a_i μ_i + t² Σ a_i²σ_i²/2]

which is the MGF of a normal variable with mean Σ a_i μ_i and variance Σ a_i²σ_i².

Corollary 8.3.1 If X₁, ..., X_n denotes a random sample from N(μ, σ²), then X̄ ~ N(μ, σ²/n).

Example 8.3.1 In the situation of Example 3.3.4, we wish to investigate the claim that X ~ N(60, 36), so 25 batteries are life-tested and the average of the survival times of the 25 batteries is computed. If the claim is true, the average life of 25 batteries
should exceed what value 95% of the time? We have E(X̄) = 60, Var(X̄) = 36/25, and

P[X̄ > c] = 1 - Φ((c - 60)/(6/5)) = 0.95

so

(c - 60)/(6/5) = -z₀.₉₅ = -1.645

and c = 58.026 months.
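The cutoff can be reproduced with Python's standard library; the distribution parameters are those assumed in the example:

```python
from statistics import NormalDist

# Under the claim X ~ N(60, 36), the mean of 25 lifetimes satisfies
# Xbar ~ N(60, 36/25), i.e. standard deviation 6/5.
xbar_dist = NormalDist(mu=60, sigma=6 / 5)

# c such that P[Xbar > c] = 0.95, i.e. the 5th percentile of Xbar.
c = xbar_dist.inv_cdf(0.05)
print(round(c, 3))  # 58.026
```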
In general, for a prescribed probability level 1 - α, one would have

c = μ - z₁₋α σ/√n    (8.3.2)

or, in terms of percentiles directly available in Table 3 (Appendix C) for small α, one could write c = μ + z_α σ/√n.
Thus a reasonable procedure would be to accept the claim if the observed x̄ ≥ 58.026, but to disbelieve or reject the claim if x̄ < 58.026, because that should happen with very small probability (less than 0.05) if the claim is true. If one wished to be more certain before rejecting, then a smaller α, say α = 0.01, could be used to determine the critical value c. This test procedure favors the consumer, because it does not reject when a large mean life is indicated. An appropriate test for the other direction (or both directions) also could be constructed.
Example 8.3.2  Consider two independent random samples X₁, X₂, ..., X_{n₁} and Y₁, Y₂, ..., Y_{n₂}, with respective sample sizes n₁ and n₂, from normally distributed populations, Xᵢ ~ N(μ₁, σ₁²) and Yᵢ ~ N(μ₂, σ₂²), and denote by X̄ and Ȳ the sample means. It follows from Theorem 8.3.1 that the difference also is normally distributed, X̄ - Ȳ ~ N(μ₁ - μ₂, σ₁²/n₁ + σ₂²/n₂). It is clear that the first n₁ terms of the difference have coefficient aᵢ = 1/n₁ and the last n₂ terms have coefficient aᵢ = -1/n₂. Consequently, the mean of the difference is n₁(1/n₁)μ₁ + n₂(-1/n₂)μ₂ = μ₁ - μ₂, and the variance is n₁(1/n₁)²σ₁² + n₂(-1/n₂)²σ₂² = σ₁²/n₁ + σ₂²/n₂.
Certain additional properties involve a special case of the gamma distribution.

CHI-SQUARE DISTRIBUTION

Consider a special gamma distribution with θ = 2 and κ = ν/2. The variable Y is said to follow a chi-square distribution with ν degrees of freedom if
Y ~ GAM(2, ν/2). A special notation for this is

Y ~ χ²(ν)    (8.3.4)

Theorem 8.3.2  If Y ~ χ²(ν), then

M_Y(t) = (1 - 2t)^{-ν/2}    (8.3.5)

E(Yʳ) = 2ʳ Γ(ν/2 + r)/Γ(ν/2)    (8.3.6)

E(Y) = ν    (8.3.7)

Var(Y) = 2ν    (8.3.8)
Proof  These results follow from the corresponding properties of the gamma distribution.

The cumulative chi-square distribution has been extensively tabulated in the literature. In most cases percentiles, χ²_γ(ν), are provided for particular γ levels of interest and for different values of ν. Specifically, if Y ~ χ²(ν), then χ²_γ(ν) is the value such that

P[Y ≤ χ²_γ(ν)] = γ    (8.3.9)

Values of χ²_γ(ν) are provided in Table 4 (Appendix C) for various values of γ and ν. These values also can be used to obtain percentiles for the gamma distribution.
Theorem 8.3.3  If X ~ GAM(θ, κ), then Y = 2X/θ ~ χ²(2κ).

Proof

M_Y(t) = M_{2X/θ}(t) = M_X(2t/θ) = (1 - 2t)^{-2κ/2}

which is the MGF of a chi-square distribution with 2κ degrees of freedom.
The gamma CDF also can be expressed in terms of the chi-square notation. If X ~ GAM(θ, κ), and if H(y; ν) denotes a chi-square CDF with ν degrees of freedom, then

F(x) = H(2x/θ; 2κ)    (8.3.10)
Cumulative chi-square probabilities H(c; ν) are provided in Table 5 (Appendix C) for various values of c and ν.

Example 8.3.3  The time to failure (in years) of a certain type of component follows a gamma distribution with θ = 3 and κ = 2. It is desired to determine a guarantee period for which 90% of the components will survive. That is, the 10th percentile, x₀.₁₀, is desired such that P[X ≤ x₀.₁₀] = 0.10. We find

P[X ≤ x₀.₁₀] = H(2x₀.₁₀/θ; 2κ) = 0.10

Thus setting 2x₀.₁₀/θ = χ²₀.₁₀(2κ) gives

x₀.₁₀ = θχ²₀.₁₀(2κ)/2

For θ = 3 and κ = 2,

x₀.₁₀ = 3χ²₀.₁₀(4)/2 = 3(1.06)/2 = 1.59 years

It is clear in general that the γth percentile of the gamma distribution may be expressed as

x_γ = θχ²_γ(2κ)/2    (8.3.11)
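For integer κ the chi-square CDF with 2κ degrees of freedom has a closed form, so the percentile in (8.3.11) can be computed directly instead of from the table; this sketch (not from the text) inverts H(y; 4) by bisection:

```python
import math

def chisq_cdf_even(y, df):
    """Chi-square CDF H(y; df) for even df, via the closed form
    H(y; 2m) = 1 - exp(-y/2) * sum_{j=0}^{m-1} (y/2)^j / j!."""
    m = df // 2
    return 1.0 - math.exp(-y / 2) * sum((y / 2) ** j / math.factorial(j) for j in range(m))

def chisq_percentile_even(gamma, df, lo=0.0, hi=200.0):
    """Invert the CDF by bisection to obtain the gamma-th percentile."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chisq_cdf_even(mid, df) < gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example 8.3.3: theta = 3, kappa = 2; 10th percentile of GAM(3, 2).
theta, kappa = 3.0, 2
x_10 = theta * chisq_percentile_even(0.10, 2 * kappa) / 2
print(round(x_10, 2))  # about 1.60 (1.59 when the rounded table value 1.06 is used)
```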
The following theorem states the useful property that the sum of independent chi-square variables also is chi-square distributed.

Theorem 8.3.4  If Yᵢ ~ χ²(νᵢ); i = 1, ..., n are independent chi-square variables, then

Σᵢ₌₁ⁿ Yᵢ ~ χ²(Σᵢ₌₁ⁿ νᵢ)    (8.3.12)

Proof

M_{ΣYᵢ}(t) = Πᵢ₌₁ⁿ (1 - 2t)^{-νᵢ/2} = (1 - 2t)^{-Σνᵢ/2}

which is the MGF of χ²(Σνᵢ).
The following theorem establishes a connection between standard normal and chi-square variables.
Theorem 8.3.5  If Z ~ N(0, 1), then Z² ~ χ²(1).

Proof

M_{Z²}(t) = E[e^{tZ²}]
         = ∫_{-∞}^{∞} (1/√(2π)) e^{tz²} e^{-z²/2} dz
         = ∫_{-∞}^{∞} (1/√(2π)) e^{-z²(1-2t)/2} dz
         = (1 - 2t)^{-1/2}

for t < 1/2, because the integrand is, up to the factor (1 - 2t)^{-1/2}, the pdf of a normal variable with mean 0 and variance (1 - 2t)^{-1}. This is the MGF of a chi-square distribution with one degree of freedom.
Corollary 8.3.2  If X₁, ..., Xₙ denotes a random sample from N(μ, σ²), then

Σᵢ₌₁ⁿ (Xᵢ - μ)²/σ² ~ χ²(n)    (8.3.13)

n(X̄ - μ)²/σ² ~ χ²(1)    (8.3.14)

The sample variance was discussed previously, and for a sample from a normal population its distribution can be related to a chi-square distribution. The sampling distribution of S² does not follow directly from the previous corollary, because the terms Xᵢ - X̄ are not independent. Indeed, they are functionally dependent because

Σᵢ₌₁ⁿ (Xᵢ - X̄) = 0
Theorem 8.3.6  If X₁, ..., Xₙ denotes a random sample from N(μ, σ²), then

1. X̄ and the terms Xᵢ - X̄; i = 1, ..., n are independent.
2. X̄ and S² are independent.
3. (n - 1)S²/σ² ~ χ²(n - 1).    (8.3.15)
Proof
To obtain part 1, first note that by adding and subtracting x̄ and then expanding, we obtain the relationship

Σᵢ₌₁ⁿ (xᵢ - μ)² = Σᵢ₌₁ⁿ (xᵢ - x̄)² + n(x̄ - μ)²    (8.3.16)

Thus the joint density of X₁, ..., Xₙ may be expressed as

f(x₁, ..., xₙ) = (2π)^{-n/2} σ^{-n} exp[-(1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ - μ)²]
             = (2π)^{-n/2} σ^{-n} exp{-(1/(2σ²)) [Σᵢ₌₁ⁿ (xᵢ - x̄)² + n(x̄ - μ)²]}

Now consider the joint transformation

y₁ = x̄,  yᵢ = xᵢ - x̄;  i = 2, ..., n

We know that

x₁ - x̄ = -Σᵢ₌₂ⁿ (xᵢ - x̄) = -Σᵢ₌₂ⁿ yᵢ

so

Σᵢ₌₁ⁿ (xᵢ - x̄)² = (Σᵢ₌₂ⁿ yᵢ)² + Σᵢ₌₂ⁿ yᵢ²

and

f(y₁, ..., yₙ) = n(2π)^{-n/2} σ^{-n} exp{-(1/(2σ²)) [(Σᵢ₌₂ⁿ yᵢ)² + Σᵢ₌₂ⁿ yᵢ² + n(y₁ - μ)²]}

It is easy to see that the Jacobian, J, is a constant, and in particular it can be shown that |J| = n. Therefore, the joint density function factors into the marginal density function of y₁ times a function of y₂, ..., yₙ only, which shows that Y₁ = X̄ and the terms Yᵢ = Xᵢ - X̄ for i = 2, ..., n are independent. Because X₁ - X̄ = -Σᵢ₌₂ⁿ (Xᵢ - X̄), it follows that X̄ and X₁ - X̄ also are independent.

Part 2 follows from part 1, because S² is a function only of the Xᵢ - X̄.
To obtain part 3, consider again equation (8.3.16) applied to the random sample, which gives V₁ = V₂ + V₃, where V₁ = Σᵢ₌₁ⁿ (Xᵢ - μ)²/σ², V₂ = (n - 1)S²/σ², and V₃ = n(X̄ - μ)²/σ².
From Corollary 8.3.2, V₁ ~ χ²(n) and V₃ ~ χ²(1). Also, V₂ and V₃ are independent, so

M_{V₁}(t) = M_{V₂}(t) M_{V₃}(t)

and

M_{V₂}(t) = M_{V₁}(t)/M_{V₃}(t) = (1 - 2t)^{-n/2}/(1 - 2t)^{-1/2} = (1 - 2t)^{-(n-1)/2}

Thus, V₂ = (n - 1)S²/σ² ~ χ²(n - 1). Consequently, if c_γ is the γth percentile of the distribution of S², then

c_γ = σ²χ²_γ(n - 1)/(n - 1)    (8.3.17)
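Theorem 8.3.6 can be illustrated by simulation; the sketch below (not from the text) draws repeated samples from N(0, 1) and checks that (n - 1)S²/σ² averages about its chi-square mean n - 1, and that X̄ and S² are nearly uncorrelated:

```python
import math
import random
import statistics

random.seed(42)  # fixed seed for a reproducible run

n, reps = 5, 2000
means, variances = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]  # mu = 0, sigma = 1
    means.append(statistics.mean(sample))
    variances.append(statistics.variance(sample))    # divisor n - 1

# (n - 1)S^2/sigma^2 should average about n - 1 = 4.
avg_scaled = (n - 1) * statistics.mean(variances)

# Independence of Xbar and S^2 implies near-zero sample correlation.
mx, mv = statistics.mean(means), statistics.mean(variances)
cov = sum((a - mx) * (b - mv) for a, b in zip(means, variances))
corr = cov / math.sqrt(sum((a - mx) ** 2 for a in means)
                       * sum((b - mv) ** 2 for b in variances))

print(round(avg_scaled, 2), round(corr, 3))
```

Zero correlation is of course only a necessary consequence of independence, not a proof of it; the simulation is merely consistent with the theorem.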
Consider again Example 8.3.1, where it was assumed that X ~ N(60, 36). Suppose that it was decided to sample 25 batteries, and to reject the claim that σ² = 36 if s² ≥ 54.63, and not reject the claim if s² < 54.63. Under this procedure, what would be the probability of rejecting the claim when in fact σ² = 36? We see that

P[S² ≥ 54.63] = P[24S²/36 ≥ 36.42] = 1 - H(36.42; 24) = 0.05

If instead one wished to be wrong only 1% of the time when rejecting the claim, then the procedure would be to reject if s² ≥ c₀.₉₉, where c₀.₉₉ = σ²χ²₀.₉₉(n - 1)/(n - 1) = 36(42.98)/24 = 64.47.
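The tail probability above can be checked without tables using the even-degrees-of-freedom closed form of the chi-square CDF (a sketch, not from the text):

```python
import math

def chisq_sf_even(y, df):
    """Chi-square survival function 1 - H(y; df) for even df:
    1 - H(y; 2m) = exp(-y/2) * sum_{j=0}^{m-1} (y/2)^j / j!."""
    m = df // 2
    return math.exp(-y / 2) * sum((y / 2) ** j / math.factorial(j) for j in range(m))

# Rejection rule: reject sigma^2 = 36 when s^2 >= 54.63 with n = 25, so
# P[S^2 >= 54.63] = P[24 S^2 / 36 >= 36.42] = 1 - H(36.42; 24).
alpha = chisq_sf_even(24 * 54.63 / 36, 24)
print(round(alpha, 3))  # about 0.05
```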
8.4 THE t, F, AND BETA DISTRIBUTIONS

Certain functions of normal samples are very important in statistical analysis of populations.

STUDENT'S t DISTRIBUTION

We noticed that S² can be used to make inferences about the parameter σ² in a normal distribution. Similarly, X̄ is useful concerning the parameter μ; however, the distribution of X̄ also depends on the parameter σ². This makes it impossible
to use for certain types of statistical procedures concerning the mean when σ² is unknown. It turns out that if σ is replaced by S in the quantity

√n(X̄ - μ)/σ

then the resulting distribution is no longer standard normal, but it does not depend on σ, and it can be derived using transformation methods.

Theorem 8.4.1  If Z ~ N(0, 1) and V ~ χ²(ν), and if Z and V are independent, then the distribution of

T = Z/√(V/ν)    (8.4.1)

is referred to as Student's t distribution with ν degrees of freedom, denoted by T ~ t(ν). The pdf is given by

f(t; ν) = Γ((ν + 1)/2) / [Γ(ν/2)√(νπ)] (1 + t²/ν)^{-(ν+1)/2}    -∞ < t < ∞    (8.4.2)
Proof
The joint density of Z and V is given by

f_{Z,V}(z, v) = (1/√(2π)) e^{-z²/2} v^{ν/2-1} e^{-v/2} / [Γ(ν/2) 2^{ν/2}]    0 < v

Consider the transformation T = Z/√(V/ν), W = V, with inverse transformation v = w, z = t√(w/ν). The Jacobian is J = √(w/ν) and

f_{T,W}(t, w) = (w/ν)^{1/2} w^{ν/2-1} e^{-w/2} e^{-t²w/(2ν)} / [√(2π) Γ(ν/2) 2^{ν/2}]    -∞ < t < ∞, 0 < w < ∞

After some simplification, the marginal pdf f(t; ν) = ∫₀^∞ f_{T,W}(t, w) dw yields equation (8.4.2).
The t distribution is symmetric about zero, and its general shape is similar to that of the standard normal distribution. Indeed, the t distribution approaches the standard normal distribution as ν → ∞. For smaller ν the t distribution is flatter with thicker tails and, in fact, T ~ CAU(1, 0) when ν = 1. Percentiles, t_γ(ν), are provided in Table 6 (Appendix C) for selected values of γ and for ν = 1, ..., 30, 40, 60, 120, ∞.
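The pdf (8.4.2) is easy to evaluate with the standard library; the check below (not from the text) confirms the ν = 1 Cauchy case and that for large ν the density at t = 0 approaches the standard normal value 1/√(2π):

```python
import math

def t_pdf(t, v):
    """Student's t pdf, equation (8.4.2), with v degrees of freedom.
    lgamma is used because gamma() overflows for large v."""
    log_const = math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
    return math.exp(log_const) / math.sqrt(v * math.pi) * (1 + t * t / v) ** (-(v + 1) / 2)

phi0 = 1 / math.sqrt(2 * math.pi)   # standard normal pdf at 0
print(round(t_pdf(0, 1), 4))        # Cauchy case: 1/pi = 0.3183
print(round(t_pdf(0, 1000), 4), round(phi0, 4))  # nearly equal for large v
```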
Theorem 8.4.2  If T ~ t(ν), then for ν > 2r,

E(T²ʳ) = Γ((2r + 1)/2) Γ((ν - 2r)/2) νʳ / [Γ(1/2) Γ(ν/2)]    r = 1, 2, ...    (8.4.3)

E(T²ʳ⁻¹) = 0    (8.4.4)

Var(T) = ν/(ν - 2)    ν > 2    (8.4.5)

Proof
The 2rth moment is

E(T²ʳ) = E(Z²ʳ) E[(V/ν)⁻ʳ]

where Z ~ N(0, 1) and V ~ χ²(ν). Substitution of the normal and chi-square moments gives the required result.
As suggested earlier, one application of the t distribution arises when sampling from a normal distribution, as illustrated by the following theorem.
Theorem 8.4.3  If X₁, ..., Xₙ denotes a random sample from N(μ, σ²), then

√n(X̄ - μ)/S ~ t(n - 1)    (8.4.6)

Proof
This follows from Theorem 8.4.1, because Z = √n(X̄ - μ)/σ ~ N(0, 1) and, by Theorem 8.3.6, V = (n - 1)S²/σ² ~ χ²(n - 1), and X̄ and S² are independent.
SNEDECOR'S F DISTRIBUTION

Another derived distribution of great importance in statistics is called Snedecor's F distribution.

Theorem 8.4.4  If V₁ ~ χ²(ν₁) and V₂ ~ χ²(ν₂) are independent, then the random variable

X = (V₁/ν₁)/(V₂/ν₂)    (8.4.7)
has the following pdf for x > 0:

g(x; ν₁, ν₂) = Γ((ν₁ + ν₂)/2) / [Γ(ν₁/2) Γ(ν₂/2)] (ν₁/ν₂)^{ν₁/2} x^{ν₁/2 - 1} (1 + (ν₁/ν₂)x)^{-(ν₁+ν₂)/2}    (8.4.8)

This is known as Snedecor's F distribution with ν₁ and ν₂ degrees of freedom, and is denoted by X ~ F(ν₁, ν₂). Some authors use the notation F rather than X for the ratio (8.4.7).

The pdf (8.4.8) can be derived in a manner similar to that of the t distribution as in Theorem 8.4.1.
Theorem 8.4.5  If X ~ F(ν₁, ν₂), then

E(Xʳ) = (ν₂/ν₁)ʳ Γ(ν₁/2 + r) Γ(ν₂/2 - r) / [Γ(ν₁/2) Γ(ν₂/2)]    ν₂ > 2r    (8.4.9)

E(X) = ν₂/(ν₂ - 2)    ν₂ > 2    (8.4.10)

Var(X) = 2ν₂²(ν₁ + ν₂ - 2) / [ν₁(ν₂ - 2)²(ν₂ - 4)]    ν₂ > 4    (8.4.11)

Proof
These results follow from the fact that V₁ and V₂ are independent, and from the chi-square moments (8.3.6). Specifically, they can be obtained from

E(Xʳ) = (ν₂/ν₁)ʳ E(V₁ʳ) E(V₂⁻ʳ)    (8.4.12)
Percentiles f_γ(ν₁, ν₂) of X ~ F(ν₁, ν₂) such that

P[X ≤ f_γ(ν₁, ν₂)] = γ    (8.4.13)

are provided in Table 7 (Appendix C) for selected values of γ, ν₁, and ν₂. Percentiles for small values of γ can be obtained by using the fact that if X ~ F(ν₁, ν₂), then Y = 1/X ~ F(ν₂, ν₁). Thus,

1 - γ = P[X ≤ f₁₋γ(ν₁, ν₂)] = 1 - P[Y < 1/f₁₋γ(ν₁, ν₂)]

so that

f_γ(ν₂, ν₁) = 1/f₁₋γ(ν₁, ν₂)    (8.4.14)
Example 8.4.1  Let X₁, ..., X_{n₁} and Y₁, ..., Y_{n₂} be independent random samples from populations with respective distributions Xᵢ ~ N(μ₁, σ₁²) and Yᵢ ~ N(μ₂, σ₂²). If ν₁ = n₁ - 1 and ν₂ = n₂ - 1, then ν₁S₁²/σ₁² ~ χ²(ν₁) and ν₂S₂²/σ₂² ~ χ²(ν₂), so that

P[(S₁²/σ₁²)/(S₂²/σ₂²) ≤ f₀.₉₅(ν₁, ν₂)] = 0.95

and

P[σ₁²/σ₂² ≥ S₁²/(S₂² f₀.₉₅(ν₁, ν₂))] = 0.95

If n₁ = 16 and n₂ = 21, then f₀.₉₅(15, 20) = 2.20, and for two such samples it usually is said that we are 95% "confident" that the ratio σ₁²/σ₂² > s₁²/[s₂² f₀.₉₅(15, 20)]. This notion will be further developed in a later chapter.

BETA DISTRIBUTION

An F variable can be transformed to have the beta distribution. If X ~ F(ν₁, ν₂), then the random variable

Y = (ν₁/ν₂)X / [1 + (ν₁/ν₂)X]    (8.4.15)
has the pdf

f(y; a, b) = Γ(a + b) / [Γ(a)Γ(b)] y^{a-1}(1 - y)^{b-1}    0 < y < 1    (8.4.16)

where a = ν₁/2 and b = ν₂/2. This pdf defines the beta distribution with parameters a > 0 and b > 0, denoted Y ~ BETA(a, b). The mean and variance of Y easily are shown to be

E(Y) = a/(a + b)    (8.4.17)

Var(Y) = ab / [(a + b + 1)(a + b)²]    (8.4.18)

The γth percentile of a beta distribution can be expressed in terms of a percentile of the F distribution as a result of equation (8.4.15), namely

y_γ(a, b) = a f_γ(2a, 2b) / [b + a f_γ(2a, 2b)]    (8.4.19)

If a and b are positive integers, then successive integration by parts leads to a relationship between the CDFs of beta and binomial distributions. If X ~ BIN(n, p) and Y ~ BETA(n - i + 1, i), then F_X(i - 1) = F_Y(1 - p). The beta distribution arises in connection with distributions of order statistics.
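For integer parameters the beta CDF reduces to a binomial sum, so the stated identity F_X(i - 1) = F_Y(1 - p) can be verified numerically; this sketch (not from the text) uses arbitrary values of n, p, and i:

```python
from math import comb

def binom_cdf(x, n, p):
    """P[X <= x] for X ~ BIN(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(x + 1))

def beta_cdf_int(y, a, b):
    """Beta CDF for integer a, b, via the binomial-sum identity:
    P[Y <= y] = P[BIN(a + b - 1, y) >= a]."""
    n = a + b - 1
    return sum(comb(n, j) * y**j * (1 - y)**(n - j) for j in range(a, n + 1))

n, p, i = 10, 0.3, 4
lhs = binom_cdf(i - 1, n, p)                # F_X(i - 1), X ~ BIN(n, p)
rhs = beta_cdf_int(1 - p, n - i + 1, i)     # F_Y(1 - p), Y ~ BETA(n - i + 1, i)
print(round(lhs, 6), round(rhs, 6))  # both 0.649611
```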
For a continuous random variable X ~ f(x), the pdf of the kth order statistic from a random sample of size n is given by

g_k(x_{k:n}) = n! / [(k - 1)!(n - k)!] [F(x_{k:n})]^{k-1} [1 - F(x_{k:n})]^{n-k} f(x_{k:n})

Making the change of variable u_{k:n} = F(x_{k:n}) gives

U_{k:n} ~ BETA(k, n - k + 1)

Because U = F(X) ~ UNIF(0, 1), it also follows that U_{k:n} represents the kth smallest ordered uniform random variable. The CDF of X_{k:n} can be expressed in terms of a beta CDF, because

G_k(x_{k:n}) = P[X_{k:n} ≤ x_{k:n}] = P[F(X_{k:n}) ≤ F(x_{k:n})] = H(F(x_{k:n}); k, n - k + 1)

where H(y; a, b) denotes the CDF of Y ~ BETA(a, b).
Example 8.4.2  Suppose that X ~ EXP(θ), and one wishes to compute probabilities concerning X_{k:n}. We have

F(x) = 1 - e^{-x/θ}

U_{k:n} = F(X_{k:n}) ~ BETA(k, n - k + 1)
and

P[X_{k:n} ≤ c] = P[F(X_{k:n}) ≤ F(c)]
             = P[U_{k:n} ≤ F(c)]
             = P[(n - k + 1)U_{k:n} / (k(1 - U_{k:n})) ≤ (n - k + 1)F(c) / (k(1 - F(c)))]

where this last probability involves a variable distributed as F(2k, 2(n - k + 1)). Thus for specified values of θ, c, k, and n, this probability can be obtained from a cumulative beta table, or from a cumulative F table if the proper γ level is available. For the purpose of illustration, if we wish to know c such that P[X_{k:n} ≤ c] = γ, then

f_γ(2k, 2(n - k + 1)) = (n - k + 1)F(c) / (k[1 - F(c)]) = (n - k + 1)(1 - exp(-c/θ)) / (k exp(-c/θ))

and

c = θ ln[1 + k f_γ(2k, 2(n - k + 1)) / (n - k + 1)]    (8.4.20)

If n = 11, k = 6, and γ = 0.95, then

c = θ ln[1 + 6(2.69)/6] = 1.310θ

and

P[X_{6:11} ≤ 1.310θ] = 0.95

or

P[X_{6:11}/1.310 ≤ θ] = 0.95

where X_{6:11} is the median of the sample, and θ is the mean of the population.
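The probability can be double-checked through the beta/binomial identity rather than the F table; the sketch below (not from the text) takes θ = 1 and the cutoff c ≈ 1.31 from the example:

```python
import math
from math import comb

def beta_cdf_int(y, a, b):
    """Beta CDF for integer a, b via the binomial-sum identity:
    P[Y <= y] = P[BIN(a + b - 1, y) >= a]."""
    n = a + b - 1
    return sum(comb(n, j) * y**j * (1 - y)**(n - j) for j in range(a, n + 1))

theta, n, k, c = 1.0, 11, 6, 1.31
u = 1 - math.exp(-c / theta)          # F(c) for EXP(theta)
prob = beta_cdf_int(u, k, n - k + 1)  # P[X_{6:11} <= c]
print(round(prob, 3))  # about 0.95
```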
We have defined the beta distribution and we have seen its relationship to the F distribution and the binomial CDF, as well as its application to the distribution of ordered uniform random variables. The beta distribution represents a generalization of the uniform distribution, and provides a rather flexible two-parameter model for various types of variables that must lie between 0 and 1.
8.5 LARGE-SAMPLE APPROXIMATIONS

The sampling distributions discussed in the earlier sections have approximations that apply for large sample sizes.
Theorem 8.5.1  If Y_ν ~ χ²(ν), then

Z_ν = (Y_ν - ν)/√(2ν) → Z ~ N(0, 1)  in distribution as ν → ∞

Proof
This follows from the CLT, because for integer ν = n, Y_n is distributed as a sum ΣXᵢ where X₁, ..., Xₙ are independent and Xᵢ ~ χ²(1), so that E(Xᵢ) = 1 and Var(Xᵢ) = 2.

We also would expect the pdf's of Z_ν to closely approximate the pdf of Z for large ν. This is illustrated in Figure 8.1, which shows the pdf's for ν = 20, 80, and 200.
FIGURE 8.1
Comparison of pdf's of standardized chi-square and standard normal distributions
It follows that chi-square percentiles can be approximated in terms of standard normal percentiles for large ν. Specifically,

γ = P[Y_ν ≤ χ²_γ(ν)] = P[Z_ν ≤ (χ²_γ(ν) - ν)/√(2ν)] ≈ Φ((χ²_γ(ν) - ν)/√(2ν))

so that

χ²_γ(ν) ≈ ν + z_γ√(2ν)    (8.5.1)

For example, for ν = 30 and γ = 0.95,

χ²₀.₉₅(30) ≈ 30 + 1.645√60 = 42.74
compared to the exact value χ²₀.₉₅(30) = 43.77. A more accurate approximation, known as the Wilson-Hilferty approximation, is given by

χ²_γ(ν) ≈ ν[1 - 2/(9ν) + z_γ√(2/(9ν))]³    (8.5.2)

This gives approximate values of χ²_γ(ν)/ν within 0.01 of exact values for ν ≥ 3 and 0.01 ≤ γ ≤ 0.99. For example, if ν = 30 and γ = 0.95, approximation (8.5.2) gives χ²₀.₉₅(30) ≈ 43.77, which agrees to two decimal places with the exact value.
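Both approximations are one-liners; the comparison below (not from the text) reproduces the ν = 30, γ = 0.95 figures:

```python
import math

Z95 = 1.6449  # z_{0.95}, standard normal 95th percentile

def chisq_pct_normal(v, z):
    """Crude normal approximation (8.5.1)."""
    return v + z * math.sqrt(2 * v)

def chisq_pct_wh(v, z):
    """Wilson-Hilferty approximation (8.5.2)."""
    return v * (1 - 2 / (9 * v) + z * math.sqrt(2 / (9 * v))) ** 3

print(round(chisq_pct_normal(30, Z95), 2))  # 42.74
print(round(chisq_pct_wh(30, Z95), 2))      # 43.77, the exact value to two decimals
```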
It also is possible to derive asymptotic normal distributions directly for S² and S.

Example 8.5.1  Let Sₙ² denote the sample variance from a random sample of size n from N(μ, σ²). We know that

(n - 1)Sₙ²/σ² ~ χ²(n - 1)

and from Theorem 8.5.1,

[(n - 1)Sₙ²/σ² - (n - 1)] / √(2(n - 1)) → Z ~ N(0, 1)

That is,

√(n - 1) [Sₙ²/σ² - 1]/√2 → Z    (8.5.3)
or approximately,

Sₙ² ~ N(σ², 2σ⁴/(n - 1))    (8.5.4)

If Y = Sₙ² and g(y) = √y, then g'(y) = 1/(2√y), g'(σ²) = 1/(2σ), and approximately,

Sₙ ~ N(σ, σ²/[2(n - 1)])    (8.5.5)
It also is possible to show that a t-distributed variable has a limiting standard normal distribution as the degrees of freedom ν increases. To see this, consider a variable T ~ t(ν), where

T = Z/√(V/ν)

We know that E(V/ν) = 1, Var(V/ν) = 2/ν, and by Chebychev's inequality,

P[|V/ν - 1| < ε] ≥ 1 - 2/(νε²)

so that V/ν → 1 in probability as ν → ∞. Thus Student's t distribution has a limiting standard normal distribution by Theorem 7.7.4, part 3:

T = Z/√(V/ν) → N(0, 1)    (8.5.6)

FIGURE 8.2  Comparison of pdf's of t and standard normal distributions
This is illustrated by Figure 8.2, which shows the pdf's of N(0, 1) and t(ν) for ν = 1, 3, and 10. This suggests that the t percentile, t_γ(ν), is approximately equal to z_γ for large ν, and it leads to the last row of values in Table 6 (Appendix C) that correspond to the standard normal percentiles.
A similar rationale yields approximate percentiles for an F distribution. Suppose X = (V₁/ν₁)/(V₂/ν₂). As noted in the above discussion, V₂/ν₂ → 1 as ν₂ → ∞. Thus, if ν₁ is kept fixed, X → V₁/ν₁ as ν₂ → ∞. The resulting approximation for an F percentile is f_γ(ν₁, ν₂) ≈ χ²_γ(ν₁)/ν₁ for large ν₂. A similar argument leads to an approximation for large ν₁, namely, f_γ(ν₁, ν₂) ≈ ν₂/χ²₁₋γ(ν₂). These also provide the limiting entries in Table 7 (Appendix C).
SUMMARY

Our purpose in this chapter was to study properties of the normal distribution and to derive other related distributions that arise in the statistical analysis of data from normally distributed populations. An important property of the normal distribution is that linear combinations of independent normal random variables are also normally distributed, which means, among other things, that the sample mean is normally distributed. A certain function of the sample variance is shown to be chi-square distributed, and the sample mean and sample variance are shown to be independent random variables. This is important in the development of statistical methods for the analysis of the population mean when the population variance is unknown. This corresponds to Student's t distribution, which is obtained as the distribution of a standard normal variable divided by the square root of an independent chi-square variable over its degrees of freedom. Another example involves Snedecor's F distribution, which is obtained as the distribution of the ratio of two independent chi-square variables over their respective degrees of freedom. Variables of the latter type are important in statistical analyses that compare the variances of two normally distributed populations.
EXERCISES

1. Let X denote the weight in pounds of a bag of feed, where X ~ N(101, 4). What is the probability that 20 bags will weigh at least a ton?

2. Let S denote the diameter of a shaft and B the diameter of a bearing, where S and B are independent with S ~ N(1, 0.0004) and B ~ N(1.01, 0.0009).
(a) If a shaft and a bearing are selected at random, what is the probability that the shaft diameter will exceed the bearing diameter?
(b) Assume equal variances, σ_S² = σ_B² = σ², and find the value of σ that will yield a probability of noninterference of 0.95.

3. Let X₁, X₂, ..., Xₙ be a random sample of size n from a normal distribution, Xᵢ ~ N(μ, σ²), and define U = ΣXᵢ and W = ΣXᵢ².
(a) Find a statistic that is a function of U and W and unbiased for the parameter θ = 2μ - 5σ².
(b) Find a statistic that is unbiased for σ² + μ².
(c) Let c be a constant, and define Yᵢ = 1 if Xᵢ ≤ c and zero otherwise. Find a statistic that is a function of Y₁, Y₂, ..., Yₙ and also unbiased for F_X(c).
4. Assume that X₁ and X₂ are independent normal random variables, Xᵢ ~ N(μ, σ²), and let Y₁ = X₁ + X₂ and Y₂ = X₁ - X₂. Show that Y₁ and Y₂ are independent and normally distributed.
5. A new component is placed in service and nine spares are available. The times to failure in days are independent exponential variables, Tᵢ ~ EXP(100).
(a) What is the distribution of ΣTᵢ?
(b) What is the probability that successful operation can be maintained for at least 1.5 years? Hint: Use Theorem 8.3.3 to transform to a chi-square variable.
(c) How many spares would be needed to be 95% sure of successful operation for at least two years?
6. Repeat Exercise 5 assuming Tᵢ ~ GAM(100, 1.2).

7. Five independent tasks are to be performed, where the time in hours to complete the ith task is given by Tᵢ ~ GAM(100, κᵢ), where κᵢ = 3 + i/3. What is the probability that it will take less than 2600 hours to complete all five tasks?

8. Suppose that X ~ χ²(m), Y ~ χ²(n), and X and Y are independent. Is Y - X ~ χ²(n - m) if n > m?

9. Suppose that X ~ χ²(m), S = X + Y ~ χ²(m + n), and X and Y are independent. Use MGFs to show that S - X ~ χ²(n).

10. A random sample of size n = 15 is drawn from EXP(θ). Find c so that P[cX̄ < θ] = 0.95, where X̄ is the sample mean.
11. Let Z ~ N(0, 1).
(a) Find P[Z² < 3.84] using tabled values of the standard normal distribution.
(b) Find P[Z² < 3.84] using tabled values of the chi-square distribution.
12. The distance in feet by which a parachutist misses a target is D = √(X₁² + X₂²), where X₁ and X₂ are independent with Xᵢ ~ N(0, 25). Find P[D ≤ 12.25 feet].
13. Consider independent random variables Zᵢ ~ N(0, 1), i = 1, ..., 16, and let Z̄ be the sample mean. Find:
(a) P[Z̄ ≤ …]
(b) P[Σᵢ₌₁¹⁶ (Zᵢ - Z̄)² < 25]
14. If T ~ t(ν), give the distribution of T².

15. Suppose that Xᵢ ~ N(μ, σ²); i = 1, ..., n, and Zᵢ ~ N(0, 1); i = 1, ..., k, and all variables are independent. State the distribution of each of the following variables if it is a "named" distribution, or otherwise state "unknown":
(a) X₁X₂
(b) X₂ + 2X₃
(c) X₁ - X₂
(e) √n(X̄ - μ)/σ
(f) Z₁ + Z₂
(h) …
(i) …
(j) …
(k) nS²/σ²
(n) …
(p) Σ (Zᵢ - Z̄)²
16. Let X₁, X₂, ..., X₉ be a random sample from a normal distribution, Xᵢ ~ N(6, 25), and denote by X̄ and S² the sample mean and sample variance. Use tables from Appendix C to find each of the following:
(a) P[3 < X̄ < 7]
(b) P[1.860 < 3(X̄ - 6)/S]
(c) P[S² ≤ 31.9375]
17. Use tabled values from Appendix C to find the following:
(a) P[7.26 < Y < 22.31] if Y ~ χ²(15).
(b) The value b such that P[Y ≤ b] = 0.75 if Y ~ χ²(23).
(c) P[Y > …] if Y ~ χ²(6).
(d) P[0.87 < T < 2.65] if T ~ t(13).
(e) The value b such that P[T ≤ b] = … if T ~ t(26).
(f) The value c such that P[|T| ≥ c] = 0.02 if T ~ t(23).
(g) P[X > 0.25] if X ~ F(7, 12).
(h) P[2.91 < …] if X ~ F(20, 8).
18. Assume that Z, V₁, and V₂ are independent random variables with Z ~ N(0, 1), V₁ ~ χ²(5), and V₂ ~ χ²(9). Find the following:
(a) P[V₁ + V₂ < 8.6]
(b) P[Z/√(V₁/5) < 2.015]
(c) P[Z > 0.611√V₂]
(d) P[V₁/V₂ < 1.450]
(e) The value b such that P[V₁/(V₁ + V₂) < b] = …

19. If T ~ t(1), then show the following:
(a) The CDF of T is F(t) = 1/2 + arctan(t)/π.
(b) The 100 × γth percentile is t_γ(1) = tan[π(γ - 1/2)].
20. Show that if X ~ F(2, 2b), then:
(a) P[X > x] = (1 + x/b)^{-b} for all x > 0.
(b) The 100 × γth percentile is f_γ(2, 2b) = b[(1 - γ)^{-1/b} - 1].
21. Show that if F(x; μ) is the CDF of X ~ POI(μ), and if H(y; ν) is the CDF of a chi-square distribution with ν degrees of freedom, then F(x; μ) = 1 - H[2μ; 2(x + 1)]. Hint: Use Theorem 3.3.2 and the fact that Y ~ χ²(ν) corresponds to Y ~ GAM(2, ν/2).

22. If X ~ BETA(p, q), derive E(Xʳ).
23. Consider a random sample from a beta distribution, Xᵢ ~ BETA(1, 2). Use the CLT (Theorem 7.3.2) to approximate P[X̄ ≤ 0.5] for n = 12.

24. Let Yₙ ~ χ²(n). Find the limiting distribution of (Yₙ - n)/√(2n) as n → ∞, using moment generating functions.
25. Rework Exercise 5(b) and (c) using a normal approximation, and compare to the exact results.
26. Let X₁, X₂, ..., Xₙ be a random sample from a distribution whose first four moments exist, and let

Sₙ² = Σᵢ₌₁ⁿ (Xᵢ - X̄)² / (n - 1)

Show that Sₙ² → σ² in probability as n → ∞. Hint: Use Theorem 8.2.2 and the Chebychev inequality.
27. Compare the Wilson-Hilferty approximation (equation (8.5.2)) to the exact tabled values of χ²₀.₉₅(10) and χ²₀.₀₅(10).
CHAPTER 9

POINT ESTIMATION
9.1 INTRODUCTION

The previous chapters were concerned with developing the concepts of probability and random variable to build mathematical models of nondeterministic physical phenomena. A certain numerical characteristic of the physical phenomenon may be of interest, but its value cannot be computed directly. Instead, it is possible to observe one or more random variables, the distribution of which depends on the characteristic of interest. Our main objective in the next few chapters will be to develop methods to analyze the observed values of random variables in order to gain information about the unknown characteristic. As mentioned earlier, the process of obtaining an observed result of a physical phenomenon is called an experiment. Suppose that the result of the experiment is a random variable X, and f(x; θ) represents its pdf. It is common practice to consider X as a measurement value obtained from an individual chosen at random from a population. In this context, f(x; θ) will be referred to as the
population pdf, and it reflects the distribution of individual measurements in the population. Complete specification of f(x; θ) achieves the goal of identifying the distribution of the response of interest. In some cases it is possible to arrive at a specified model based on axiomatic assumptions or other knowledge about the population, as was the case in certain counting problems discussed earlier. More often the experimenter is not able to specify the pdf completely, but it may be possible to assume that f(x; θ) is a member of some known family of distributions (such as normal, gamma, Weibull, or Poisson), and that θ is an unknown parameter such as the mean or variance of the distribution. The objective of point estimation is to assign an appropriate value for θ based on observed data from the population. The observed results of repeated trials of an experiment can be modeled mathematically as a random sample from the population pdf. In other words, it is assumed that a set of n independent random variables, X₁, X₂, ..., Xₙ, each with pdf f(x; θ), will be observed, resulting in a set of data x₁, x₂, ..., xₙ. Of course, it is possible to represent the joint pdf of the random sample as a product:

f(x₁, x₂, ..., xₙ; θ) = f(x₁; θ)f(x₂; θ) ⋯ f(xₙ; θ)    (9.1.1)
This joint pdf provides the connection between the observed data and the mathematical model for the population. In this chapter we will be concerned with ways to make use of such data in estimating the unknown value of the parameter θ.

In subsequent chapters, other kinds of analyses will be developed. For example, the data not only can provide information about the parameter value, but also can provide information about more basic questions, such as what family of pdf's should be considered to begin with. This notion, which is generally referred to as goodness-of-fit, will be considered in a later chapter. It also is possible to answer certain questions about the population without assuming a functional form for f(x; θ). Such methods, known as nonparametric methods, as well as other types of analyses, such as confidence intervals and tests of hypotheses about the value of θ, also will be considered later. In this chapter we will assume that the distribution of a population of interest can be represented by a member of some specified family of pdf's, f(x; θ), indexed by a parameter θ. In some cases, the parameter will be vector-valued, and we will use boldfaced θ to denote this. We will let Ω, called the parameter space, denote the set of all possible values that the parameter θ could assume. If θ is a vector, then Ω will be a subset of a Euclidean space of the same dimension, and the dimension of Ω will correspond to the number of unknown real parameters. In what follows, we will assume that X₁, X₂, ..., Xₙ is a random sample from f(x; θ) and that τ(θ) is some function of θ.
Definition 9.1.1
A statistic, T = 𝓉(X₁, X₂, ..., Xₙ), that is used to estimate the value of τ(θ) is called an estimator of τ(θ), and an observed value of the statistic, t = 𝓉(x₁, x₂, ..., xₙ), is called an estimate of τ(θ).

Of course, this includes the case of estimating the parameter value itself, if we let τ(θ) = θ.

Notice that we are using three different kinds of letters in our notation. The capital T represents the statistic that we use as an estimator, the lowercase t is an observed value or estimate, and the script 𝓉 represents the function that we apply to the random sample. Another fairly suggestive notation involves the use of a circumflex (also called a "hat") above the parameter, θ̂, to distinguish between the unknown parameter value and its estimator. Yet another common notation involves the use of a tilde, θ̃. The practice of using capital and lowercase letters to distinguish between estimators and estimates usually is not followed with notations such as θ̂ and θ̃. Two of the most frequently used approaches to the problem of estimation are given in the next section.
9.2 SOME METHODS OF ESTIMATION

In some cases, reasonable estimators can be found on the basis of intuition, but various general methods have been developed for deriving estimators.

METHOD OF MOMENTS

The sample mean, X̄, was proposed in Chapter 8 as an estimator of the population mean μ. A more general approach, which produces estimators known as the method of moments estimators (MMEs), can be developed. Consider a population pdf, f(x; θ₁, ..., θ_k), depending on one or more parameters θ₁, ..., θ_k. In Chapter 2, the moments about the origin, μ'_j, were defined. Generally, these will depend on the parameters, say

μ'_j(θ₁, ..., θ_k) = E(Xʲ)    j = 1, 2, ..., k    (9.2.1)

It is possible to define estimators of these distribution moments.
Definition 9.2.1
If X₁, ..., Xₙ is a random sample from f(x; θ₁, ..., θ_k), the first k sample moments are given by

M'_j = Σᵢ₌₁ⁿ Xᵢʲ / n    j = 1, 2, ..., k    (9.2.2)

As noted in Chapter 2, the first moment is the mean, μ'₁ = μ. Similarly, the first sample moment is the sample mean, M'₁ = X̄. Consider the simple case of one unknown parameter, say θ = θ₁. That X̄ = M'₁ is generally a reasonable estimator of μ = μ'₁(θ) suggests using the solution θ̂ of the equation M'₁ = μ'₁(θ̂) as an estimator of θ. In other words, because M'₁ tends to be close to μ'₁(θ), under certain conditions we might expect that θ̂ will tend to be close to θ. More generally, the method of moments principle is to choose as estimators of the parameters θ₁, ..., θ_k the values θ̂₁, ..., θ̂_k that render the population moments equal to the sample moments. In other words, θ̂₁, ..., θ̂_k are solutions of the equations

M'_j = μ'_j(θ̂₁, ..., θ̂_k)    j = 1, 2, ..., k    (9.2.3)
Consider a random sample from a distribution with two unknown parameters, the mean μ and the variance σ². We know from earlier considerations that μ = μ'1 and σ² = E(X²) − μ² = μ'2 − (μ'1)², so that the MMEs are solutions of the equations M'1 = μ̂ and M'2 = σ̂² + (μ̂)², which are μ̂ = X̄ and

σ̂² = (1/n) Σ Xi² − X̄² = (1/n) Σ (Xi − X̄)²

Notice that the MME of σ² is closely related to the sample variance that was defined in Chapter 8, namely σ̂² = [(n − 1)/n]S².

Example 9.2.2  Consider a random sample from a two-parameter exponential distribution, Xi ~ EXP(1, η). We know that the mean is μ = μ'1(η) = η + 1, and if we set X̄ = η̂ + 1, then η̂ = X̄ − 1 is the MME of η.

Example 9.2.3  Consider now a random sample from an exponential distribution, Xi ~ EXP(θ), and suppose we wish to estimate the probability p(θ) = P(X > 1) = e^(−1/θ). Notice that μ'1(θ) = μ = θ, so the MME of θ is θ̂ = X̄. If we reparameterize the model with p = p(θ) = e^(−1/θ), then θ = θ(p) = −1/ln p, and if we equate X̄ = θ(p̂) = −1/ln p̂, then the MME of p is p̂ = e^(−1/X̄). Thus, in this case we see that
p̂ = p(θ̂). If a class of estimators has this property, it is said to have an "invariance" property.

Thus, to estimate τ(θ), one might first solve X̄ = μ'1(θ̂) to obtain the MME of θ and then use τ(θ̂), or else one might express μ'1 directly in terms of τ and solve X̄ = μ'1(τ̂) for the MME of τ. It is not clear that both approaches will always give the same result, but if θ̂ is an MME of θ, then we also will refer to τ(θ̂) as an MME of τ(θ). In general, if the MMEs θ̂1, ..., θ̂k of the natural parameters are obtained, then τ̂(θ1, ..., θk) = τ(θ̂1, ..., θ̂k) will be used to estimate other functions of the natural parameters, rather than require that the moment equations be expressed directly in terms of the τ's.
Example 9.2.4  Consider a random sample from a gamma distribution, Xi ~ GAM(θ, κ). Because μ'1 = μ = κθ and μ'2 = σ² + μ² = κθ² + κ²θ² = κ(1 + κ)θ², the moment equations are

κ̂θ̂ = X̄   and   κ̂(1 + κ̂)θ̂² = (1/n) Σ Xi²

The resulting MMEs are

θ̂ = Σ (Xi − X̄)²/(nX̄) = [(n − 1)/n]S²/X̄   and   κ̂ = X̄/θ̂ = nX̄²/Σ (Xi − X̄)²
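As a quick illustration of how the two moment equations are solved from data, here is a minimal sketch (the function name `gamma_mme` and the sample values are illustrative, not from the text):

```python
def gamma_mme(xs):
    """Method-of-moments estimates for GAM(theta, kappa), solving
    kappa*theta = xbar and kappa*theta^2 = (1/n) * sum (x - xbar)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    m2c = sum((x - xbar) ** 2 for x in xs) / n  # (1/n) * sum (x_i - xbar)^2
    theta_hat = m2c / xbar                      # [(n-1)/n] * s^2 / xbar
    kappa_hat = xbar / theta_hat                # n * xbar^2 / sum (x_i - xbar)^2
    return theta_hat, kappa_hat

xs = [1.2, 0.7, 2.3, 1.5, 0.9, 1.8]
th, k = gamma_mme(xs)
print(round(k * th, 6))  # recovers the sample mean, 1.4
```

Note that the product κ̂θ̂ reproduces the first sample moment exactly, which is a handy check on any MME computation.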
METHOD OF MAXIMUM LIKELIHOOD

We now will consider a method that quite often leads to estimators possessing desirable properties, particularly large-sample properties. The idea is to use as an estimate of an unknown parameter the value in the parameter space that corresponds to the largest "likelihood" for the observed data.
Example 9.2.5  Suppose that a coin is biased, and it is known that the average proportion of heads is one of the three values p = 0.20, 0.30, or 0.80. An experiment consists of tossing the coin twice and observing the number of heads. This could be modeled mathematically as a random sample X1, X2 of size n = 2 from a Bernoulli distribution, Xi ~ BIN(1, p), where the parameter space is Ω = {0.20, 0.30, 0.80}. Notice that the MME of p, which is X̄, does not produce reasonable estimates in this example, because x̄ = 0, 0.5, or 1 are the only possibilities, and these are not values in Ω. Consider now the joint pdf of the random sample,

f(x1, x2; p) = p^(x1+x2) (1 − p)^(2−x1−x2)

for x1 = 0 or 1 and x2 = 0 or 1. The values of f(x1, x2; p) are provided in Table 9.1 for the various pairs (x1, x2) and values of p.
TABLE 9.1  Joint pdf of the numbers of heads for two tosses of a biased coin

              (x1, x2)
  p      (0, 0)   (0, 1)   (1, 0)   (1, 1)
  0.20    0.64     0.16     0.16     0.04
  0.30    0.49     0.21     0.21     0.09
  0.80    0.04     0.16     0.16     0.64
Suppose that the experiment results in the observed pair (x1, x2) = (0, 0). From
Table 9.1, it would seem more likely that p = 0.20 rather than one of the other two values. Similarly, (x1, x2) = (0, 1) or (1, 0) would correspond to p = 0.30, and (x1, x2) = (1, 1) would correspond to p = 0.80. Thus, the estimate that maximizes the "likelihood" for an observed pair (x1, x2) is

p̂ = 0.20   if (x1, x2) = (0, 0)
p̂ = 0.30   if (x1, x2) = (0, 1) or (1, 0)
p̂ = 0.80   if (x1, x2) = (1, 1)
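Because the parameter space is finite, this discrete maximization can be carried out by brute force; a small sketch (the function name is illustrative):

```python
def coin_mle(x1, x2):
    """Return the p in {0.20, 0.30, 0.80} that maximizes the joint pdf
    f(x1, x2; p) = p**(x1 + x2) * (1 - p)**(2 - x1 - x2), as in Table 9.1."""
    def L(p):
        s = x1 + x2
        return p ** s * (1 - p) ** (2 - s)
    return max([0.20, 0.30, 0.80], key=L)

print(coin_mle(0, 0), coin_mle(0, 1), coin_mle(1, 1))  # 0.2 0.3 0.8
```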
More generally, for a set of discrete random variables, the joint density function of a random sample evaluated at a particular set of sample data, say f(x1, ..., xn; θ), represents the probability that the observed set of data x1, ..., xn will occur. For continuous random variables, f(x1, ..., xn; θ) is not a probability, but it still reflects the relative "likelihood" that the set of data will occur, and this likelihood depends on the true value of the parameter.
Definition 9.2.2 Likelihood Function
The joint density function of n random variables X1, ..., Xn evaluated at x1, ..., xn, say f(x1, ..., xn; θ), is referred to as the likelihood function. For fixed x1, ..., xn the likelihood function is a function of θ and often is denoted by L(θ).

If X1, ..., Xn represents a random sample from f(x; θ), then

L(θ) = f(x1; θ) ··· f(xn; θ)   (9.2.4)

For a given observed set of data, L(θ) gives the likelihood of that set occurring as a function of θ. The maximum likelihood principle of estimation is to choose as the estimate of θ, for a given set of data, that value for which the observed set of data would have been most likely to occur. That is, if the likelihood of observing a given set of observations is much higher when θ = θ1 than when θ = θ2, then it is reasonable to choose θ1 as an estimate of θ rather than θ2.
Definition 9.2.3 Maximum Likelihood Estimator
Let L(θ) = f(x1, ..., xn; θ), θ ∈ Ω, be the joint pdf of X1, ..., Xn. For a given set of observations, (x1, ..., xn), a value θ̂ in Ω at which L(θ) is a maximum is called a maximum likelihood estimate (MLE) of θ. That is, θ̂ is a value of θ that satisfies

f(x1, ..., xn; θ̂) = max over θ ∈ Ω of f(x1, ..., xn; θ)   (9.2.5)
Notice that if each set of observations (x1, ..., xn) corresponds to a unique θ̂, then this procedure defines a function, θ̂ = 𝓉(x1, ..., xn). This same function, when applied to the random sample, θ̂ = 𝓉(X1, ..., Xn), is called the maximum likelihood estimator, also denoted MLE. Usually, the same notation, θ̂, is used for both the ML estimate and the ML estimator.
In most applications, L(θ) represents the joint pdf of a random sample, although the maximum likelihood principle also applies to other cases such as sets of order statistics. If Ω is an open interval, and if L(θ) is differentiable and assumes a maximum on Ω, then the MLE will be a solution of the equation (maximum likelihood equation)

(d/dθ) L(θ) = 0   (9.2.6)

Strictly speaking, if one or more solutions of equation (9.2.6) exist, then it should be verified which, if any, maximize L(θ). Note also that any value of θ that maximizes L(θ) also will maximize the log-likelihood, ln L(θ), so for computational convenience the alternate form of the maximum likelihood equation,

(d/dθ) ln L(θ) = 0   (9.2.7)

often will be used.
Example 9.2.6  Consider a random sample from a Poisson distribution, Xi ~ POI(θ). The likelihood function is

L(θ) = ∏ e^(−θ) θ^(xi)/xi! = e^(−nθ) θ^(Σ xi) / ∏ xi!
and the log-likelihood is

ln L(θ) = −nθ + (Σ xi) ln θ − ln ∏ xi!

The maximum likelihood equation is

(d/dθ) ln L(θ) = −n + Σ xi/θ = 0

which has the solution θ̂ = Σ xi/n = x̄. It is possible to verify that this is a maximum by use of the second derivative,

(d²/dθ²) ln L(θ) = −Σ xi/θ²

which is negative when evaluated at x̄ (there it equals −n/x̄, provided Σ xi > 0).

Suppose now that we wish to estimate τ = τ(θ) = P[X = 0] = e^(−θ). We may reparameterize the model in terms of τ by letting θ = −ln τ. We obtain

f(x; τ) = τ(−ln τ)^x / x!

If L*(τ) represents the likelihood function relative to τ, then

ln L*(τ) = n ln τ + (Σ xi) ln(−ln τ) − ln ∏ xi!

(d/dτ) ln L*(τ) = n/τ + Σ xi/(τ ln τ)

and setting the derivative equal to zero gives

ln τ̂ = −Σ xi/n = −x̄   or   τ̂ = e^(−x̄)

In this example, it follows that τ̂ = e^(−θ̂) = τ(θ̂).
We could have maximized L*(τ) relative to τ in this case directly by the chain rule, without carrying out the reparameterization. Specifically,

(d/dθ) ln L(θ) = [(d/dτ) ln L*(τ)] (dτ/dθ)

and if dτ/dθ ≠ 0, then (d/dτ) ln L*(τ) = 0 whenever (d/dθ) ln L(θ) = 0, so that the maximum with respect to τ occurs at τ(θ̂).
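The conclusion τ̂ = e^(−x̄) can be checked numerically by maximizing ln L*(τ) over a grid; a rough sketch (the counts in `xs` are illustrative):

```python
import math

xs = [2, 0, 3, 1, 2]               # illustrative Poisson counts
xbar = sum(xs) / len(xs)
theta_hat = xbar                    # MLE of theta
tau_hat = math.exp(-xbar)           # MLE of tau = P[X = 0], by invariance

def loglik_tau(tau):
    # ln L*(tau) = n ln(tau) + (sum x_i) ln(-ln(tau)), dropping the constant
    return len(xs) * math.log(tau) + sum(xs) * math.log(-math.log(tau))

grid = [i / 10000 for i in range(1, 10000)]   # tau in (0, 1)
tau_grid = max(grid, key=loglik_tau)
print(abs(tau_grid - tau_hat) < 1e-3)  # True: grid maximum agrees with e**(-xbar)
```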
It should be noted that we are using the notation τ to represent both a function of θ and a value in the range of the function. This is a slight misuse of standard mathematical notation, but it is a convenient practice that often is used in problems involving reparameterization.
In general, if u is some one-to-one function with inverse u⁻¹, and if τ = u(θ), then we can define L*(τ) = L(u⁻¹(τ)). It follows that τ̂ will maximize L*(τ) if u⁻¹(τ̂) = θ̂, or equivalently, τ̂ = u(θ̂). When u is not one-to-one, there is no unique solution θ of τ = u(θ) for every value of τ. The usual approach in this case is to extend the definition of the function L*(τ). For example, if for every value of τ, L(θ) attains a maximum over the subset of Ω such that τ = u(θ), then we define L*(τ) to be this maximum value. This generalizes the reparameterized likelihood function to cases where u is not one-to-one, and it follows that τ̂ = u(θ̂) maximizes L*(τ) when θ̂ maximizes L(θ). (See Exercise 43.) These results are summarized in the following theorem.

Theorem 9.2.1 Invariance Property
If θ̂ is the MLE of θ and if u(θ) is a function of θ, then u(θ̂) is an MLE of u(θ).

In other words, if we reparameterize by τ = τ(θ), then the MLE of τ is τ̂ = τ(θ̂).
Example 9.2.7  Consider a random sample from an exponential distribution, Xi ~ EXP(θ). The likelihood function for a sample of size n is

L(θ) = θ^(−n) e^(−Σ xi/θ)

Thus,

ln L(θ) = −n ln θ − Σ xi/θ

and

(d/dθ) ln L(θ) = −n/θ + Σ xi/θ²

Equating this derivative to zero gives θ̂ = x̄. If we wish to estimate p(θ) = P(X > 1) = e^(−1/θ), then we know from Theorem 9.2.1 that the MLE is p(θ̂) = e^(−1/x̄).
There are cases where the MLE exists but cannot be obtained as a solution of the ML equation.
Example 9.2.8  Consider a random sample from a two-parameter exponential distribution, Xi ~ EXP(1, η). The likelihood function is L(η) = exp[−Σ (xi − η)] if all xi ≥ η, and zero otherwise. If we denote the minimum of x1, ..., xn by x1:n, then we can write L(η) = exp[n(η − x̄)] if x1:n ≥ η, and zero otherwise. The graph of L(η) is shown in Figure 9.1.
FIGURE 9.1  The likelihood function for a random sample from EXP(1, η)
It is clear from the figure that L(η) is maximized at η̂ = x1:n, so the ML estimator is the first order statistic, η̂ = X1:n. This is an example where the MLE and the MME are different (see Example 9.2.2). As noted earlier, the ML principle can be used in situations where the observed variables are not independent or identically distributed.
Example 9.2.9  The lifetime of a certain component follows an exponential distribution, Xi ~ EXP(θ). Suppose that n components are randomly selected and placed on test, and the first r observed failure times are denoted by x1:n, ..., xr:n. From equation (6.5.12) the joint density of X1:n, ..., Xr:n is given by

L(θ) = f(x1:n, ..., xr:n; θ) = [n!/(n − r)!] θ^(−r) exp{−[Σ (i = 1 to r) xi:n + (n − r)xr:n]/θ}

Note that T = Σ (i = 1 to r) Xi:n + (n − r)Xr:n represents the total survival time of the n items on test until the experiment is terminated. To obtain the MLE of θ based on these data, we have

ln L(θ) = const − r ln θ − T/θ

(d/dθ) ln L(θ) = −r/θ + T/θ²

Setting the derivative equal to zero gives θ̂ = T/r. If the complete sample is observed, then r = n and θ̂ = x̄, as before.
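A small sketch of the censored-sample estimate θ̂ = T/r (the function name and failure times are illustrative):

```python
def exp_mle_censored(failures, n):
    """MLE of theta from the first r of n exponential lifetimes (Type II
    censoring): theta_hat = T / r, with T the total time on test."""
    r = len(failures)
    xr = max(failures)                    # the r-th smallest failure time
    T = sum(failures) + (n - r) * xr      # total survival time of all n items
    return T / r

# 5 units on test, first 3 failures observed: T = 7.5 + 2 * 4.0 = 15.5
print(round(exp_mle_censored([1.0, 2.5, 4.0], n=5), 4))  # 5.1667
```

When every failure is observed the same function returns the sample mean, in agreement with Example 9.2.7.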
The previous examples have involved distributions with one unknown parameter. The definitions of likelihood function and maximum likelihood estimator can be applied in the case of more than one unknown parameter if θ represents a vector of parameters, say θ = (θ1, ..., θk). Although Ω could, in general, be almost any sort of k-dimensional set, in most examples it is a Cartesian product of k intervals. When Ω is of this form, if the partial derivatives of L(θ1, ..., θk) exist, and if the MLEs do not occur on the boundary of Ω, then the MLEs will be solutions of the simultaneous equations

(∂/∂θj) ln L(θ1, ..., θk) = 0,   j = 1, ..., k   (9.2.8)

These are called the maximum likelihood equations, and the solutions are denoted by θ̂1, ..., θ̂k. As in the one-parameter case, it generally is necessary to verify that the solutions of the ML equations maximize L(θ1, ..., θk).
Theorem 9.2.2 Invariance Property
If θ̂ = (θ̂1, ..., θ̂k) denotes the MLE of θ = (θ1, ..., θk), then the MLE of τ = (τ1(θ), ..., τr(θ)) is τ̂ = (τ̂1, ..., τ̂r) = (τ1(θ̂), ..., τr(θ̂)) for 1 ≤ r ≤ k.
The situation here is similar to the case of a single parameter. If τ represents a one-to-one transformation, then a reparameterized likelihood function can be defined, and the MLE of τ is obtained as the transformation of the MLE of θ. In the case of a transformation that is not one-to-one, the likelihood function relative to τ must be extended in a manner similar to the single-parameter case. Note that the multiparameter estimators often are not the same as the individual estimators obtained when the other parameters are assumed to be known. This is illustrated by the following example.
Example 9.2.10  For a set of random variables Xi ~ N(μ, θ) with θ = σ², the MLEs of μ and θ based on a random sample of size n are desired. We have

f(x; μ, θ) = (2πθ)^(−1/2) e^(−(x−μ)²/(2θ))

L(μ, θ) = (2πθ)^(−n/2) exp[−Σ (xi − μ)²/(2θ)]

ln L(μ, θ) = const − (n/2) ln θ − Σ (xi − μ)²/(2θ)

(∂/∂μ) ln L(μ, θ) = Σ (xi − μ)/θ

and

(∂/∂θ) ln L(μ, θ) = −n/(2θ) + Σ (xi − μ)²/(2θ²)

Setting these derivatives equal to zero and solving simultaneously for the solution values μ̂ and θ̂ yields the MLEs

μ̂ = x̄   and   θ̂ = σ̂² = (1/n) Σ (xi − x̄)²

The ML method enjoys the invariance property in the two-parameter normal case. For example, if the likelihood function is maximized with respect to μ and σ, then one obtains μ̂ = x̄ and σ̂ = √[(1/n) Σ (xi − x̄)²], and similarly for other functions of the parameters. Notice that if θ = σ₀² is known, then the first ML equation,

Σ (xi − μ)/σ₀² = 0

yields μ̂ = x̄ as before; but if μ = μ₀ is assumed known, then the second ML equation yields

σ̂² = (1/n) Σ (xi − μ₀)²
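The joint solution above amounts to two lines of arithmetic; a minimal sketch (the function name and sample values are illustrative):

```python
def normal_mle(xs):
    """Joint MLEs for N(mu, sigma^2): mu_hat = xbar and
    sigma2_hat = (1/n) * sum (x - xbar)^2 (divisor n, not n - 1)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    return mu, sigma2

mu, s2 = normal_mle([2.0, 4.0, 6.0])
print(mu, round(s2, 4))  # 4.0 2.6667  (sigma2_hat = 8/3)
```

Note the divisor n rather than n − 1: the MLE of σ² is the biased estimator [(n − 1)/n]S² discussed again in Section 9.3.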
Example 9.2.11  Consider a random sample from a two-parameter exponential distribution with both parameters unknown, Xi ~ EXP(θ, η). The population pdf is

f(x; θ, η) = (1/θ) e^(−(x−η)/θ),   x ≥ η

The likelihood function is

L(θ, η) = θ^(−n) exp[−Σ (xi − η)/θ],   all xi ≥ η

and the log-likelihood is

ln L(θ, η) = −n ln θ − Σ (xi − η)/θ,   x1:n ≥ η

where x1:n is the minimum of x1, ..., xn. As in Example 9.2.8, the likelihood is maximized with respect to η by taking η̂ = x1:n. To maximize relative to θ, we may differentiate ln L(θ, η̂) with respect to θ and solve the resulting equation,

(d/dθ) ln L(θ, η̂) = −n/θ + Σ (xi − η̂)/θ² = 0
which yields

θ̂ = (1/n) Σ (xi − η̂) = x̄ − x1:n

The percentile xγ such that F(xγ) = γ is given by xγ = −θ ln(1 − γ) + η, and the MLE of xγ is x̂γ = −θ̂ ln(1 − γ) + η̂ by Theorem 9.2.2.
Example 9.2.12  Let us consider ML estimation for the parameters of a gamma distribution, Xi ~ GAM(θ, κ), based on a random sample of size n. We have

L(θ, κ) = [θ^(nκ) Γ^n(κ)]^(−1) (∏ xi)^(κ−1) exp(−Σ xi/θ)

and

ln L(θ, κ) = −nκ ln θ − n ln Γ(κ) + (κ − 1) ln ∏ xi − Σ xi/θ

The partial derivatives are

(∂/∂θ) ln L(θ, κ) = −nκ/θ + Σ xi/θ²

(∂/∂κ) ln L(θ, κ) = −n ln θ − nΓ'(κ)/Γ(κ) + ln ∏ xi

If we let x̃ = (∏ xi)^(1/n) denote the geometric mean of the sample and let Ψ(κ) = Γ'(κ)/Γ(κ) denote the psi function, then setting the derivatives equal to zero gives the equations

θ̂ = x̄/κ̂

ln(κ̂) − Ψ(κ̂) − ln(x̄/x̃) = 0

This provides an example where the ML equations cannot be solved in closed form, although a numerical solution for κ̂ can be obtained from the last equation, for example by using tables of the psi function. We see that κ̂ is a function of x̄/x̃, and not a function of x̄, x̃, and n separately. Thus it is convenient to provide tables of κ̂ in terms of x̄/x̃. Perhaps the best approach to ML estimation for the gamma distribution is the use of the following rational approximation [Greenwood and Durand (1960)]:

κ̂ = (0.5000876 + 0.1648852M − 0.0544274M²)/M,   0 < M ≤ 0.5772

κ̂ = (8.898919 + 9.059950M + 0.9775373M²)/[M(17.79728 + 11.968477M + M²)],   0.5772 < M ≤ 17

where M = ln(x̄/x̃).
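The rational approximation is straightforward to code; a sketch, under the assumption that the coefficients above are transcribed correctly (the function name and sample values are illustrative):

```python
import math

def gamma_mle_approx(xs):
    """Approximate MLEs for GAM(theta, kappa) via the Greenwood-Durand
    rational approximation in M = ln(xbar / geometric mean)."""
    n = len(xs)
    xbar = sum(xs) / n
    gmean = math.exp(sum(math.log(x) for x in xs) / n)
    M = math.log(xbar / gmean)          # M > 0 since xbar >= gmean
    if M <= 0.5772:
        kappa = (0.5000876 + 0.1648852 * M - 0.0544274 * M * M) / M
    else:                               # 0.5772 < M <= 17
        kappa = (8.898919 + 9.059950 * M + 0.9775373 * M * M) / (
            M * (17.79728 + 11.968477 * M + M * M))
    return kappa, xbar / kappa          # theta_hat = xbar / kappa_hat

xs = [1.2, 0.7, 2.3, 1.5, 0.9, 1.8]
kappa, theta = gamma_mle_approx(xs)
print(kappa > 0 and theta > 0)  # True; note kappa * theta = xbar exactly
```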
The MLEs are not the same as the MMEs, but the ML estimate of the mean is μ̂ = θ̂κ̂ = x̄.
It is also possible to have solutions to the ML equations that can be obtained only by numerical methods.
Example 9.2.13  Consider a random sample of size n from a Weibull distribution with both scale and shape parameters unknown, Xi ~ WEI(θ, β). The population pdf is

f(x; θ, β) = (β/θ)(x/θ)^(β−1) exp[−(x/θ)^β]

for x > 0; θ > 0 and β > 0, and the log-likelihood function is

ln L(θ, β) = n ln(β/θ) + (β − 1) Σ ln(xi/θ) − Σ (xi/θ)^β

which leads to the ML equations

(∂/∂θ) ln L(θ, β) = −nβ/θ + (β/θ) Σ (xi/θ)^β = 0

(∂/∂β) ln L(θ, β) = n/β + Σ ln(xi/θ) − Σ (xi/θ)^β ln(xi/θ) = 0

After some algebra, the MLEs are the solutions β̂ and θ̂ of the equations

g(β) = [Σ xi^β ln xi]/[Σ xi^β] − 1/β − (1/n) Σ ln xi = 0

θ = (Σ xi^β/n)^(1/β)

The equation g(β) = 0 cannot be solved explicitly as a function of the data, but for a given set of data it is possible to solve for β̂ by an iterative numerical method such as the Newton-Raphson method. Specifically, we can define a sequence β1, β2, ... such that

β(m+1) = β(m) − g(β(m))/g'(β(m))

where β0 > 0 is an initial value, g'(β) is the derivative of g(β), and β(m) → β̂ as m → ∞.
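A sketch of the iteration, using a numerical derivative in place of an explicit g'(β) (the function name, starting value, and data are illustrative):

```python
import math

def weibull_mle(xs, beta0=1.0, tol=1e-10):
    """Solve g(beta) = sum(x^b ln x)/sum(x^b) - 1/b - mean(ln x) = 0 by
    Newton-Raphson, then set theta_hat = (sum(x^b)/n) ** (1/b)."""
    n = len(xs)
    mlog = sum(math.log(x) for x in xs) / n
    def g(b):
        s0 = sum(x ** b for x in xs)
        s1 = sum(x ** b * math.log(x) for x in xs)
        return s1 / s0 - 1.0 / b - mlog
    b = beta0
    for _ in range(100):
        h = 1e-6 * b
        gprime = (g(b + h) - g(b - h)) / (2 * h)  # numerical g'(b)
        b_new = b - g(b) / gprime
        if b_new <= 0:          # keep the iterate in the valid range beta > 0
            b_new = b / 2
        done = abs(b_new - b) < tol
        b = b_new
        if done:
            break
    theta = (sum(x ** b for x in xs) / n) ** (1.0 / b)
    return b, theta

xs = [1.3, 0.8, 2.1, 1.7, 0.5, 1.1]
b, th = weibull_mle(xs)
print(b > 0 and th > 0)  # True once the iteration has converged
```

In practice one would verify that the iterate has actually converged to a root of g before reporting β̂.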
Some large-sample properties of MLEs will be discussed in Section 9.4, and additional methods of estimation will be presented in Section 9.5.
9.3 CRITERIA FOR EVALUATING ESTIMATORS

Several properties of estimators would appear to be desirable, including unbiasedness.
Definition 9.3.1 Unbiased Estimator
An estimator T is said to be an unbiased estimator of τ(θ) if

E(T) = τ(θ)   (9.3.1)

for all θ ∈ Ω. Otherwise, we say that T is a biased estimator of τ(θ).
If an unbiased estimator is used to assign a value to τ(θ), then the correct value of τ(θ) may not be achieved by any given estimate, t, but the "average" value of T will be τ(θ).
Example 9.3.1  Consider a random sample from a distribution f(x; θ) with θ = (μ, σ²), where μ and σ² are the mean and variance of the population. It was shown in Section 8.2 that the sample mean and variance, X̄ and S², are unbiased estimators of μ and σ², respectively. If both μ and σ² are unknown, then the appropriate parameter space is a subset of two-dimensional Euclidean space. In particular, Ω is the Cartesian product of the intervals (−∞, ∞) and (0, ∞); Ω = (−∞, ∞) × (0, ∞). If only one parameter is unknown, then Ω will consist of the corresponding one-dimensional set. For example, suppose the population is normal with unknown mean, μ, but known variance σ² = 9. The appropriate parameter space is Ω = (−∞, ∞), because in general for the mean of a normal distribution, −∞ < μ < ∞.

We may desire to estimate a percentile, say the 95th percentile, of the distribution N(μ, 9). This is an example of a function of the parameter, because τ(μ) = μ + σz(0.95) = μ + 4.95. It follows that T = X̄ + 4.95 is an unbiased estimator of τ(μ), because E(T) = E(X̄ + 4.95) = E(X̄) + 4.95 = μ + 4.95, regardless of the value of μ.
It is possible to have a reasonable estimator that is biased, and often an estimator can be adjusted to make it unbiased.
Example 9.3.2
Consider a random sample of size n from an exponential distribution, Xi ~ EXP(θ). Because θ is the mean of the distribution, we know that the MLE, X̄, is unbiased for θ. If we wish to estimate the reciprocal of the mean, τ(θ) = 1/θ, then by the invariance property the MLE is T1 = 1/X̄. However, T1 is a biased estimator of 1/θ, which follows from results in Chapter 8. In particular, from Theorems 8.3.3 and 8.3.4, we know that Y = 2nX̄/θ ~ χ²(2n). Furthermore, it follows from equation (8.3.6) that E(Y⁻¹) = 1/[2(n − 1)], and consequently E(T1) = [n/(n − 1)](1/θ). Although this shows that T1 is a biased estimator of 1/θ, it also follows that an adjusted estimator of the form cT1, where c = (n − 1)/n, is unbiased for 1/θ. This also suggests that while the unadjusted estimator, T1, is biased, it still might be reasonable, because the amount of bias, 1/[(n − 1)θ], is small for large n.

It is not always possible to adjust a biased estimator in this manner. For example, suppose that it is desired to estimate 1/θ using only the smallest observation, which would correspond to observing only the first order statistic, X1:n. It was shown in Example 7.2.2 that X1:n ~ EXP(θ/n), and consequently nX1:n is another example of an unbiased estimator of θ. This suggests that T2 = 1/(nX1:n) also could be used to estimate 1/θ. The statistic T2 cannot be adjusted in the above manner to be unbiased for 1/θ, because E(T2) does not even exist. The statistics T1 and T2 illustrate a possible flaw in the concept of unbiasedness as a general principle. In particular, if θ̂ is an unbiased estimator of θ, then τ(θ̂) will not necessarily be an unbiased estimator of τ(θ). Yet τ(θ̂) may be a reasonable estimator of τ(θ), as in the case θ̂ = X̄ and τ(θ̂) = 1/X̄.
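A Monte Carlo sketch of the bias in T1 and its removal by the factor c = (n − 1)/n (the choices θ = 2, n = 5, and the seed are arbitrary, for illustration only):

```python
import random

# T1 = 1/Xbar has mean [n/(n-1)](1/theta); c*T1 with c = (n-1)/n is unbiased.
random.seed(1)
theta, n, reps = 2.0, 5, 100_000
total = 0.0
for _ in range(reps):
    xbar = sum(random.expovariate(1 / theta) for _ in range(n)) / n
    total += 1 / xbar
e_t1 = total / reps                       # estimates E(T1) = 0.625 here
print(abs(e_t1 - (n / (n - 1)) / theta) < 0.01)       # True: biased upward
print(abs((n - 1) / n * e_t1 - 1 / theta) < 0.01)     # True: adjusted ~ 1/theta
```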
It often is possible to derive several different potential estimators of a parameter. For example, in some cases the MLEs and the MMEs have basically different forms. This raises the obvious question of how to decide which estimators are "best" in some sense, and this question will be discussed next. A very general idea is to select the estimator that tends to be closest or "most concentrated" around the true value of the parameter. It might be reasonable to say that T1 is more concentrated than T2 about τ(θ) if

P[τ(θ) − ε < T1 < τ(θ) + ε] ≥ P[τ(θ) − ε < T2 < τ(θ) + ε]   (9.3.2)

for all ε > 0, and that an estimator is most concentrated if it is more concentrated than any other estimator. The idea of a more concentrated estimator is illustrated in Figure 9.2, which shows the pdf's of two estimators T1 and T2.

FIGURE 9.2  The concept of "more concentrated"

It is not clear how to obtain an estimator that is most concentrated, but some other concepts will be discussed that may partially achieve this goal. For example, if T is an unbiased estimator of τ(θ), it follows from the Chebychev inequality that

P[τ(θ) − ε < T < τ(θ) + ε] ≥ 1 − Var(T)/ε²   (9.3.3)

for all ε > 0. This suggests that for unbiased estimators, one with a smaller variance will tend to be more concentrated and thus may be preferable.
Example 9.3.3  Let us reconsider Example 9.3.2, where we are interested only in estimating the mean, θ. If θ̂1 = X̄ and θ̂2 = nX1:n, then both estimators are unbiased for θ, but Var(θ̂1) = θ²/n and Var(θ̂2) = θ². Thus, for n > 1, Var(θ̂1) < Var(θ̂2) for all θ > 0, and θ̂1 is the better estimator by this criterion.
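The variance comparison is easy to see by simulation; a sketch (θ = 3, n = 10, and the seed are illustrative choices):

```python
import random

# Both Xbar and n*X(1) are unbiased for theta, but Var(Xbar) = theta^2/n
# while Var(n*X(1)) = theta^2, so Xbar is far more concentrated for n > 1.
random.seed(7)
theta, n, reps = 3.0, 10, 50_000
est1, est2 = [], []
for _ in range(reps):
    xs = [random.expovariate(1 / theta) for _ in range(n)]
    est1.append(sum(xs) / n)     # Xbar
    est2.append(n * min(xs))     # n * X(1)

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

print(var(est1) < var(est2))  # True: theta^2/n versus theta^2
```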
In some cases one estimator may have a smaller variance for some values of θ and a larger variance for other values of θ. In such a case neither estimator can be said to be better than the other in general. In certain cases it is possible to show that a particular unbiased estimator has the smallest possible variance among all unbiased estimators for all values of θ. In such a case one could restrict attention to that particular estimator.
UNIFORMLY MINIMUM VARIANCE UNBIASED ESTIMATORS
Definition 9.3.2
Let X1, X2, ..., Xn be a random sample of size n from f(x; θ). An estimator T* of τ(θ) is called a uniformly minimum variance unbiased estimator (UMVUE) of τ(θ) if T* is unbiased for τ(θ) and, for any other unbiased estimator T of τ(θ), Var(T*) ≤ Var(T) for all θ ∈ Ω.
In some cases, lower bounds can be derived for the variance of unbiased estimators. If an unbiased estimator can be found that attains such a lower bound, then it follows that the estimator is a UMVUE. In the following discussion, if the appropriate derivatives exist and can be passed under the integral sign (or summation), then a lower bound for the variance of an unbiased estimator can be established. Among other things, this will require that the domain of the integrand must not depend on θ.

If T is an unbiased estimator of τ(θ), then the Cramér-Rao lower bound (CRLB), based on a random sample, is

Var(T) ≥ [τ'(θ)]² / (n E{[(∂/∂θ) ln f(X; θ)]²})   (9.3.4)
Assuming differentiability conditions as mentioned earlier, the CRLB can be developed as follows. We will assume the case of sampling from a continuous distribution; the discrete case is similar. Consider the function defined by

u(x1, ..., xn; θ) = (∂/∂θ) ln f(x1, ..., xn; θ)

which also can be written

u(x1, ..., xn; θ) = [(∂/∂θ) f(x1, ..., xn; θ)] / f(x1, ..., xn; θ)   (9.3.5)

If we define a random variable U = u(X1, ..., Xn; θ), then

E(U) = ∫···∫ u(x1, ..., xn; θ) f(x1, ..., xn; θ) dx1 ··· dxn
     = ∫···∫ (∂/∂θ) f(x1, ..., xn; θ) dx1 ··· dxn
     = (∂/∂θ) ∫···∫ f(x1, ..., xn; θ) dx1 ··· dxn
     = (∂/∂θ)(1)
     = 0

Note also that if T = 𝓉(X1, ..., Xn) is unbiased for τ(θ), then

τ(θ) = E(T) = ∫···∫ 𝓉(x1, ..., xn) f(x1, ..., xn; θ) dx1 ··· dxn
If we differentiate with respect to θ, then

τ'(θ) = ∫···∫ 𝓉(x1, ..., xn) (∂/∂θ) f(x1, ..., xn; θ) dx1 ··· dxn
      = ∫···∫ 𝓉(x1, ..., xn) u(x1, ..., xn; θ) f(x1, ..., xn; θ) dx1 ··· dxn
      = E(TU)

It also follows, from equations (2.4.6) and (5.2.11) and the fact that E(U) = 0, that Var(U) = E(U²) and Cov(T, U) = E(TU). Because the correlation coefficient is always between ±1 [see equation (5.3.2)], it follows that [Cov(T, U)]² ≤ Var(T) Var(U), and consequently Var(T)E(U²) ≥ [τ'(θ)]², so that

Var(T) ≥ [τ'(θ)]² / E{[(∂/∂θ) ln f(X1, ..., Xn; θ)]²}   (9.3.6)

When X1, ..., Xn represent a random sample,

ln f(x1, ..., xn; θ) = Σ ln f(xi; θ)

so that

U = Σ (∂/∂θ) ln f(Xi; θ)

in which case

E(U²) = Var(U) = n Var[(∂/∂θ) ln f(X; θ)] = n E{[(∂/∂θ) ln f(X; θ)]²}

which yields inequality (9.3.4).
Note that if the proper differentiability conditions hold, as mentioned earlier, it can be shown that

E{[(∂/∂θ) ln f(X; θ)]²} = −E[(∂²/∂θ²) ln f(X; θ)]

Example 9.3.4  Consider a random sample from an exponential distribution, Xi ~ EXP(θ). Because

ln f(x; θ) = −x/θ − ln θ

we have

(∂/∂θ) ln f(x; θ) = x/θ² − 1/θ = (x − θ)/θ²
Thus,

E{[(∂/∂θ) ln f(X; θ)]²} = E[(X − θ)²/θ⁴] = θ²/θ⁴ = 1/θ²

and the CRLB for τ(θ) = θ is 1/[n(1/θ²)] = θ²/n. Because Var(X̄) = θ²/n, it follows that X̄ is the UMVUE of θ.
It is possible to obtain more information about the type of estimator whose variance can attain the CRLB by further considering the derivation of inequality (9.3.6). The lower bound is attained only when the correlation coefficient of T and U is ±1. It follows from Theorem 5.3.1 that this occurs if and only if T and U are linearly related, say T = aU + b with probability 1 for some constants a and b. Thus, for T to attain the CRLB for τ(θ), it must be a linear function of U = Σ (∂/∂θ) ln f(Xi; θ).
Example 9.3.5
We take a random sample X1, ..., Xn from a geometric distribution with parameter θ = p, Xi ~ GEO(θ), and we wish to find a UMVUE for τ(θ) = 1/θ. Because

ln f(x; θ) = ln θ + (x − 1) ln(1 − θ)

(∂/∂θ) ln f(x; θ) = 1/θ − (x − 1)/(1 − θ) = (x − 1/θ)/(θ − 1)

For the variance of an unbiased estimator T to attain the CRLB, it must have the form

T = a Σ (Xi − 1/θ)/(θ − 1) + b

which also can be expressed as a linear function of the sample mean, say T = cX̄ + d for constants c and d. Because X̄ is unbiased for 1/θ, necessarily c = 1 and d = 0, so that T = X̄ is the only such estimator. The variance of X̄ is Var(X̄) = (1 − θ)/(nθ²), which also can be shown to be the CRLB for this case.
This discussion also suggests that only certain types of functions will admit an unbiased estimator whose variance can attain the CRLB.
Theorem 9.3.1  If an unbiased estimator of τ(θ) exists whose variance achieves the CRLB, then only a linear function of τ(θ) will admit an unbiased estimator whose variance achieves the corresponding CRLB.
Thus, in the previous example, there is no unbiased estimator whose variance attains the CRLB for unbiased estimators of θ, because θ is not a linear function of 1/θ. It cannot be concluded from this that a UMVUE of θ does not exist, only that it cannot be found by the CRLB approach. In the next chapter we will study a method that often works when the present approach fails.

Comparisons involving the variances of estimators often are used to decide which method makes more efficient use of the data.
Definition 9.3.3 Efficiency
The relative efficiency of an unbiased estimator T of τ(θ) to another unbiased estimator T* of τ(θ) is given by

re(T, T*) = Var(T*)/Var(T)   (9.3.7)

An unbiased estimator T* of τ(θ) is said to be efficient if re(T, T*) ≤ 1 for all unbiased estimators T of τ(θ) and all θ ∈ Ω. The efficiency of an unbiased estimator T of τ(θ) is given by

e(T) = re(T, T*)   (9.3.8)

if T* is an efficient estimator of τ(θ).
Notice that in this terminology an efficient estimator is just a UMVUE. The notion of relative efficiency can be interpreted in terms of the sample sizes required for two types of estimators to estimate a parameter with comparable accuracy. Specifically, suppose that T1 and T2 are unbiased estimators of τ(θ), and that the variances are of the form Var(T1) = k1/n and Var(T2) = k2/n. The relative efficiency in this case is re(T1, T2) = k2/k1. If it is desired to choose sample sizes, say n1 and n2, to achieve the same variance by either method, then k1/n1 = k2/n2, which implies n2/n1 = re(T1, T2). In other words, if T1 is less efficient than T2, one could choose a larger sample size, by a factor of k1/k2, to achieve equal variances.

Some authors define the efficiency of T to be the ratio CRLB/Var(T), which allows the possibility that a UMVUE could exist but not be efficient by this definition. However, it does follow that if CRLB/Var(T) = 1, then T is an efficient estimator by Definition 9.3.3. At this point, use of the CRLB is the only convenient means we have to verify that an estimator is efficient.
Example 9.3.6
Recall from Example 9.3.2 that the estimator T = (n − 1)/(nX̄) is unbiased for τ(θ) = 1/θ. In this case τ'(θ) = −1/θ² and the CRLB is [−1/θ²]²/[n(1/θ²)] = 1/(nθ²). It was found in Example 9.3.4 that the variance of X̄ attains the CRLB for unbiased estimators of θ. Because τ(θ) = 1/θ is not a linear function of θ, there is no unbiased estimator of 1/θ whose variance equals 1/(nθ²). In terms of the random variable Y = 2nX̄/θ, we can express T as T = [2(n − 1)/θ]Y⁻¹. From this and equation (8.3.6) we can show that Var(T) = 1/[(n − 2)θ²]. Even though Var(T) does not attain the CRLB, it is quite close to it for large n. It often is possible to obtain an unbiased estimator whose variance is close to the CRLB even though it does not achieve it exactly, so the CRLB can be useful in evaluating a proposed estimator, whether a UMVUE exists or not. Actually, we will be able to show in the next chapter that the estimator T is a UMVUE for 1/θ. This means that T is an example of an efficient estimator that cannot be obtained by the CRLB method.
Example 9.3.7  Recall from Example 9.3.3 that we had the unbiased estimators θ̂1 = X̄ and θ̂2 = nX1:n of θ. It was later found that θ̂1 is a UMVUE. Thus θ̂1 = X̄ is an efficient estimator of θ, and the efficiency of θ̂2 is

e(θ̂2) = re(θ̂2, θ̂1) = Var(θ̂1)/Var(θ̂2) = (θ²/n)/θ² = 1/n

and thus θ̂2 is a very poor estimator of θ, because its efficiency is small for large n.
A slightly biased estimator that is highly concentrated about the parameter of interest may be preferable to an unbiased estimator that is less concentrated. Thus, it is desirable to have more general criteria that allow for both biased and unbiased estimators to be compared.
Definition 9.3.4
If T is an estimator of τ(θ), then the bias is given by

b(T) = E(T) − τ(θ)   (9.3.9)

and the mean squared error (MSE) of T is given by

MSE(T) = E[T − τ(θ)]²   (9.3.10)
Theorem 9.3.2  If T is an estimator of τ(θ), then

MSE(T) = Var(T) + [b(T)]²   (9.3.11)
Proof

MSE(T) = E[T − τ(θ)]² = E[T − E(T) + E(T) − τ(θ)]² = E[T − E(T)]² + 2[E(T) − τ(θ)]E[T − E(T)] + [E(T) − τ(θ)]² = Var(T) + [b(T)]²

because the middle term vanishes, since E[T − E(T)] = 0.
The MSE is a reasonable criterion that considers both the variance and the bias of an estimator, and it agrees with the variance criterion if attention is restricted to unbiased estimators. It provides a useful means for comparing two or more estimators, but it is not possible to obtain an estimator that has uniformly minimum MSE over all θ ∈ Ω and all possible estimators.
Example 9.3.8
Consider a family of pdf's f(x; θ) where the parameter space Ω contains at least two values. If no restrictions are placed on the type of estimators under consideration, then constant estimators, θ̂_c = c for c ∈ Ω, cannot be excluded. Such estimators clearly are not desirable from a practical point of view, because they do not even depend on the sample, yet each such estimator has a small MSE for values of θ near c. In particular, MSE(θ̂_c) = (c − θ)², which is zero if θ = c. This means that a uniformly minimum MSE estimator θ̂ necessarily would have MSE(θ̂) = 0 for all θ ∈ Ω. This would mean that θ̂ is constant, say θ̂ = c* (with probability 1). Now, if θ ∈ Ω and θ ≠ c*, then MSE(θ̂) = (c* − θ)² > 0, in which case θ̂ does not have uniformly minimum MSE.
If the class of estimators under consideration can be restricted to a smaller class, then it may be possible to find a uniformly minimum MSE estimator. For example, restriction to unbiased estimators eliminates estimators of the constant type, because θ̂_c = c is not an unbiased estimator of θ.
Example 9.3.9
Consider a random sample from a two-parameter exponential distribution with known scale parameter, say θ = 1, and unknown location parameter η. In other words, Xᵢ ~ EXP(1, η). We wish to compare the MME and the MLE, η̂₁ and η̂₂ respectively. It is easy to show that η̂₁ = X̄ − 1 and η̂₂ = X₁:ₙ. Specifically, X̄ − η ~ GAM(1/n, n) and X₁:ₙ − η ~ EXP(1/n), and it follows that

E(η̂₁) = E(X̄ − 1) = E(X̄) − 1 = 1 + η − 1 = η
and

E(η̂₂) = E(X₁:ₙ) = E(X₁:ₙ − η) + η = 1/n + η

Thus, η̂₁ is unbiased and η̂₂ is biased with bias term b(η̂₂) = 1/n. Their MSEs are

MSE(η̂₁) = Var(X̄ − 1) = Var(X̄) = 1/n

and

MSE(η̂₂) = Var(η̂₂) + (1/n)² = Var(X₁:ₙ) + (1/n)² = Var(X₁:ₙ − η) + (1/n)² = (1/n)² + (1/n)² = 2/n²

Thus, for n > 2 the biased estimator has a much smaller MSE than does the unbiased estimator.
It also is possible to adjust η̂₂ to be unbiased, say η̂₃ = X₁:ₙ − 1/n, so that E(η̂₃) = E(X₁:ₙ) − 1/n = η + 1/n − 1/n = η and MSE(η̂₃) = Var(η̂₃) = Var(X₁:ₙ − η) = 1/n². Thus, for n > 1, η̂₃ has the smallest MSE of the three. It is interesting to note that in Example 9.3.3, when the sample was assumed to be from an exponential distribution with unknown scale parameter θ, the MLE of θ, which is X̄, was much superior to the estimator based on X₁:ₙ. In the present example, where the distribution is exponential with a known scale but unknown location parameter, the result is just the reverse.
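A Monte Carlo sketch of this comparison, under the convention (assumed here) that EXP(1, η) has pdf e^{−(x−η)} for x ≥ η:

```python
import random

# Monte Carlo sketch (assumption: EXP(1, eta) has pdf exp(-(x - eta)), x >= eta).
# Compare the MSEs of eta1 = Xbar - 1, eta2 = X(1:n), and eta3 = X(1:n) - 1/n.
random.seed(1)
eta, n, reps = 5.0, 10, 20000
sq = {"eta1": 0.0, "eta2": 0.0, "eta3": 0.0}
for _ in range(reps):
    x = [eta + random.expovariate(1.0) for _ in range(n)]
    xbar, x1 = sum(x) / n, min(x)
    sq["eta1"] += (xbar - 1 - eta) ** 2
    sq["eta2"] += (x1 - eta) ** 2
    sq["eta3"] += (x1 - 1 / n - eta) ** 2
mse = {k: v / reps for k, v in sq.items()}
# Theory: MSE(eta1) = 1/n = 0.1, MSE(eta2) = 2/n^2 = 0.02, MSE(eta3) = 1/n^2 = 0.01
print({k: round(v, 3) for k, v in mse.items()})
```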
9.4 LARGE-SAMPLE PROPERTIES

We have discussed properties of estimators such as unbiasedness and uniformly minimum variance. These are defined for any fixed sample size n, and are examples of "small-sample" properties. It also is useful to consider asymptotic or "large-sample" properties of a particular type of estimator. An estimator may have undesirable properties for small n, but still be a reasonable estimator in certain applications if it has good asymptotic properties as the sample size increases. It also is possible quite often to evaluate the asymptotic properties of an estimator when small-sample properties are difficult to determine.
Definition 9.4.1  Simple Consistency  Let {Tₙ} be a sequence of estimators of τ(θ). These estimators are said to be consistent estimators of τ(θ) if for every ε > 0,

lim_{n→∞} P[|Tₙ − τ(θ)| < ε] = 1   (9.4.1)

for every θ ∈ Ω.
In the terminology of Chapter 7, Tₙ converges stochastically to τ(θ) as n → ∞. Sometimes this also is referred to as simple consistency. One interpretation of consistency is that for larger sample sizes the estimator tends to be more concentrated about τ(θ), and by making n sufficiently large Tₙ can be made as concentrated as desired. Another slightly stronger type of consistency is based on the MSE.
Definition 9.4.2  MSE Consistency  If {Tₙ} is a sequence of estimators of τ(θ), then they are called mean squared error consistent if

lim_{n→∞} E[Tₙ − τ(θ)]² = 0   (9.4.2)

for every θ ∈ Ω.
Another desirable property is asymptotic unbiasedness.
Definition 9.4.3  Asymptotic Unbiasedness  A sequence {Tₙ} is said to be asymptotically unbiased for τ(θ) if

lim_{n→∞} E(Tₙ) = τ(θ)   (9.4.3)

for all θ ∈ Ω.
It can be shown that an MSE consistent sequence also is asymptotically unbiased and simply consistent.
Theorem 9.4.1  A sequence {Tₙ} of estimators of τ(θ) is mean squared error consistent if and only if it is asymptotically unbiased and lim_{n→∞} Var(Tₙ) = 0.
Proof  This follows immediately from Theorem 9.3.2, because

MSE(Tₙ) = Var(Tₙ) + [E(Tₙ) − τ(θ)]²

Because both terms on the right are nonnegative, MSE(Tₙ) → 0 implies both Var(Tₙ) → 0 and E(Tₙ) → τ(θ). The converse is obvious.
Example 9.4.1
In Example 9.3.2 we considered the reciprocal of the sample mean, which we now denote by Tₙ = 1/X̄, as an estimator of τ(θ) = 1/θ. As noted earlier, Y = 2nX̄/θ ~ χ²(2n). It follows from equation (8.3.6) that E(Tₙ) = [n/(n − 1)](1/θ) and Var(Tₙ) = [n/(n − 1)]²/[(n − 2)θ²], so that E(Tₙ) → 1/θ and Var(Tₙ) → 0 as n → ∞. Thus, even though Tₙ is not unbiased, it is asymptotically unbiased and MSE consistent for τ(θ) = 1/θ.
As mentioned earlier, MSE consistency is a stronger property than simple consistency.
Theorem 9.4.2  If a sequence {Tₙ} is mean squared error consistent, then it also is simply consistent.

Proof  This follows from the Markov inequality (2.4.11), with X = Tₙ − τ(θ), r = 2, and c = ε, so that

P[|Tₙ − τ(θ)| ≥ ε] ≤ E[Tₙ − τ(θ)]²/ε² → 0
Example 9.4.2
Let X₁, ..., Xₙ be a random sample from a distribution with finite mean μ and variance σ². It was shown in Chapter 7 that the sample mean, X̄ₙ, converges stochastically to μ, and if the fourth moment, μ₄, is finite, then the sample variance, Sₙ², converges stochastically to σ². Actually, because X̄ₙ and Sₙ² are unbiased and their respective variances approach zero as n → ∞, it follows that they are both simply and MSE consistent.

If the distribution is exponential, X ~ EXP(θ), then it follows that X̄ₙ is MSE consistent for θ, but the estimator θ̃ = nX₁:ₙ is not even simply consistent, because nX₁:ₙ ~ EXP(θ). If the distribution is the two-parameter exponential distribution, EXP(1, η), as in Example 9.3.10, then the unbiased estimator η̂₃ = X₁:ₙ − 1/n is MSE consistent, because MSE(η̂₃) = 1/n² → 0. However, η̂₄ = c(X₁:ₙ − 1/n) is not MSE consistent for fixed c ≠ 1 and η ≠ 0, because MSE(η̂₄) = c²/n² + (c − 1)²η² → (c − 1)²η² ≠ 0. The choice of c that minimized MSE when η = 1 was c = n²/(1 + n²), which has limit 1. In general, if c = cₙ → 1 as n → ∞, then MSE(η̂₄) → 0, and η̂₄ is MSE consistent, and also asymptotically unbiased for η.
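The contrast between X̄ₙ and nX₁:ₙ can be sketched by simulation (not from the text); the fraction of estimates falling near the true θ grows with n for the sample mean but stays flat for nX₁:ₙ:

```python
import random

# Simulation sketch: the sample mean concentrates at theta as n grows, while
# n*X(1:n) keeps the same EXP(theta) spread for every n, so it is not consistent.
random.seed(2)
theta = 3.0

def frac_near_theta(values, eps=0.5):
    """Fraction of estimates within eps of the true theta."""
    return sum(abs(v - theta) < eps for v in values) / len(values)

def simulate(n, reps=4000):
    xbar_vals, nmin_vals = [], []
    for _ in range(reps):
        x = [random.expovariate(1.0 / theta) for _ in range(n)]
        xbar_vals.append(sum(x) / n)   # consistent
        nmin_vals.append(n * min(x))   # ~ EXP(theta) for every n
    return xbar_vals, nmin_vals

for n in (10, 200):
    xbar_vals, nmin_vals = simulate(n)
    print(n, round(frac_near_theta(xbar_vals), 2), round(frac_near_theta(nmin_vals), 2))
```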
Theorem 9.4.3  If {Tₙ} is simply consistent for τ(θ) and if g(t) is continuous at each value of τ(θ), then g(Tₙ) is simply consistent for g(τ(θ)).
Proof  This follows immediately from Theorem 7.7.2 with Yₙ = Tₙ and c = τ(θ).
A special application of this theorem is that if τ(θ) is a continuous function of θ and θ̂ₙ is simply consistent for θ, then τ(θ̂ₙ) is simply consistent for τ(θ). It also is possible to formulate an asymptotic version of efficiency.
Definition 9.4.4  Asymptotic Efficiency  Let {Tₙ} and {Tₙ*} be two asymptotically unbiased sequences of estimators for τ(θ). The asymptotic relative efficiency of Tₙ relative to Tₙ* is given by

are(Tₙ, Tₙ*) = lim_{n→∞} Var(Tₙ*)/Var(Tₙ)   (9.4.4)

The sequence {Tₙ*} is said to be asymptotically efficient if are(Tₙ, Tₙ*) ≤ 1 for all other asymptotically unbiased sequences {Tₙ} and all θ ∈ Ω. The asymptotic efficiency of an asymptotically unbiased sequence {Tₙ} is given by

ae(Tₙ) = are(Tₙ, Tₙ*)   (9.4.5)

if {Tₙ*} is asymptotically efficient.
The CRLB is not always attainable for fixed n, but it often is attainable asymptotically, in which case it can be quite useful in determining asymptotic efficiency.
Example 9.4.3  Recall that in Example 9.4.1, which involved sampling from EXP(θ), the sequence Tₙ = 1/X̄ was shown to be asymptotically unbiased for 1/θ. The variance is Var(Tₙ) = [n/(n − 1)]²/[(n − 2)θ²] and the CRLB is [−1/θ²]²/[n(1/θ²)] = 1/(nθ²). Because

lim_{n→∞} CRLB/Var(Tₙ) = lim_{n→∞} [1/(nθ²)] / {[n/(n − 1)]²/[(n − 2)θ²]} = lim_{n→∞} (n − 2)(n − 1)²/n³ = 1

it follows that Tₙ is asymptotically efficient for estimating 1/θ.
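Numerically, the ratio (n − 2)(n − 1)²/n³ approaches 1 rather slowly; a short sketch:

```python
# Numerical sketch of the limit: CRLB/Var(T_n) = (n - 2)(n - 1)^2 / n^3 -> 1.
def eff(n):
    return (n - 2) * (n - 1) ** 2 / n**3

for n in (10, 100, 1000):
    print(n, round(eff(n), 4))   # increases toward 1
```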
Example 9.4.4  Consider again Example 9.3.9, where the population pdf is a two-parameter exponential, Xᵢ ~ EXP(1, η). Because the range of Xᵢ depends on η, the CRLB cannot be used here. The estimators η̂₂ = X₁:ₙ and η̂₃ = X₁:ₙ − 1/n are both asymptotically unbiased, and both have the same variance, Var(η̂₂) = Var(η̂₃) = 1/n², and thus

are(η̂₃, η̂₂) = lim_{n→∞} (1/n²)/(1/n²) = 1
We will see later that η̂₃ is a UMVUE for η, and thus η̂₂ also is asymptotically efficient for estimating η.

Another idea might be to compare MSEs rather than variances. The present example suggests that this generally will not provide the same criterion as comparing variances, because

MSE(η̂₃)/MSE(η̂₂) = (1/n²)/(1/n² + 1/n²) = 1/2

This results from the fact that the bias squared and the variance are both the same power of 1/n, namely 1/n². Thus, some limitation on the bias must be considered or the asymptotic relative efficiency may be misleading.
In Example 9.3.9, another unbiased estimator, η̂₁ = X̄ − 1, also was considered, and Var(η̂₁) = 1/n. Thus

are(η̂₁, η̂₃) = lim_{n→∞} (1/n²)/(1/n) = lim_{n→∞} 1/n = 0

so η̂₁ is not as desirable as η̂₃. It should be noted that this is an unusual example; in most cases the variance of an estimator is of the form c/n, whereas in the case of η̂₃ it is 1/n², which is a higher power of 1/n.
An estimator with variance of order 1/n² usually is referred to as a superefficient estimator.

It often is difficult to obtain an exact expression for the variance of a proposed estimator. Another approach, which sometimes is used, restricts attention only to estimators that are asymptotically normal and replaces the exact variances with asymptotic variances in the definition of asymptotic relative efficiency. Specifically, if {Tₙ} and {Tₙ*} are asymptotically normal with asymptotic mean τ(θ) and respective asymptotic variances k(θ)/n and k*(θ)/n, then the alternative definition is

are(Tₙ, Tₙ*) = k*(θ)/k(θ)   (9.4.6)

Of course, this approach is appropriate only for comparing asymptotically normal estimators, but it is somewhat simpler to use in many cases. An estimator Tₙ* is at least as good as Tₙ if k*(θ) ≤ k(θ) for all θ ∈ Ω, and Tₙ* is asymptotically efficient, in this sense, if this inequality holds for all asymptotically normal estimators Tₙ. Such an estimator often is referred to as best asymptotically normal (BAN).
Example 9.4.5  A random sample of size n is drawn from an exponential distribution, Xᵢ ~ EXP(θ), and it is desired to estimate the distribution median, τ(θ) = (ln 2)θ. One possible estimator is the sample median, Tₙ = X_{rₙ:n}, with rₙ/n → 1/2 as n → ∞. Under the conditions of Theorem 7.5.1 with p = 1/2, Tₙ is asymptotically normal with asymptotic mean τ(θ) and asymptotic variance θ²/n. Another possibility is based on the sample mean, Tₙ* = (ln 2)X̄ₙ. It follows from the central limit theorem that Zₙ = √n(X̄ₙ − θ)/θ → N(0, 1) as n → ∞, and consequently Tₙ* is asymptotically normal with asymptotic mean and variance, respectively, τ(θ) and (ln 2)²θ²/n. Thus, k(θ) = θ² and k*(θ) = (ln 2)²θ², and by definition (9.4.6) the asymptotic relative efficiency is (ln 2)²θ²/θ² = (ln 2)² ≈ 0.48, and Tₙ* is the better estimator. Actually, it is possible to show, by comparison with the CRLB, that Tₙ* is efficient, but it still might be useful in some applications to know that for large samples, a method based on the sample median is 48% as efficient as one based on the sample mean.
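A simulation sketch of the (ln 2)² ≈ 0.48 ratio (the sample size and replication count below are arbitrary choices):

```python
import random, statistics, math

# Monte Carlo sketch of the asymptotic relative efficiency: compare the
# sampling variance of the sample median with that of (ln 2)*Xbar for
# EXP(theta) data; the ratio should be near (ln 2)^2 ~ 0.48.
random.seed(3)
theta, n, reps = 1.0, 201, 3000
med, lin_mean = [], []
for _ in range(reps):
    x = [random.expovariate(1.0 / theta) for _ in range(n)]
    med.append(statistics.median(x))
    lin_mean.append(math.log(2) * sum(x) / n)
v_med = statistics.pvariance(med)        # approx theta^2 / n
v_lm = statistics.pvariance(lin_mean)    # approx (ln 2)^2 * theta^2 / n
print(round(v_lm / v_med, 2))            # near 0.48
```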
ASYMPTOTIC PROPERTIES OF MLEs

Under certain circumstances, it can be shown that MLEs have very desirable properties. Specifically, if certain regularity conditions are satisfied, then the solutions θ̂ₙ of the maximum likelihood equations have the following properties:

1. θ̂ₙ exists and is unique,
2. θ̂ₙ is a consistent estimator of θ,
3. θ̂ₙ is asymptotically normal with asymptotic mean θ and variance

1/{nE[∂/∂θ ln f(X; θ)]²}

and
4. θ̂ₙ is asymptotically efficient.
Of course, for an MLE to result from solving the ML equation (9.2.6), it is necessary that the partial derivative of ln f(x; θ) with respect to θ exists, and also that the set A = {x : f(x; θ) > 0} does not depend on θ. Additional conditions involving the derivatives of ln f(x; θ) and f(x; θ) also are required, but we will not discuss them here. Different sets of regularity conditions are discussed by Wasan (1970, p. 158) and Bickel and Doksum (1977, p. 150). Notice that the asymptotic efficiency of θ̂ₙ follows from the fact that the asymptotic variance is the same as the CRLB for unbiased estimators of θ. Thus, for large n, approximately

θ̂ₙ ~ N(θ, CRLB)

It also follows from Theorem 7.7.6 that if τ(θ) is a function with nonzero derivative, then τ̂ₙ = τ(θ̂ₙ) also is asymptotically normal with asymptotic mean τ(θ) and variance [τ′(θ)]²·CRLB. Notice also that the asymptotic variance of τ̂ₙ is the CRLB for variances of unbiased estimators of τ = τ(θ), so that τ̂ₙ also is asymptotically efficient.
Example 9.4.6
Recall in Example 9.2.7 that the MLE of the mean θ of an exponential distribution is the sample mean, θ̂ = X̄. It is possible to infer the same asymptotic properties either from the preceding discussion or from the Central Limit Theorem. In particular, X̄ₙ is asymptotically normal with asymptotic mean θ and variance θ²/n. It also was shown in Example 9.3.4 that CRLB = θ²/n. We also know the exact distribution of X̄ₙ in this case, because

2nX̄ₙ/θ ~ χ²(2n)

which is consistent with the asymptotic normal result

√n(X̄ₙ − θ)/θ → N(0, 1)

because a properly standardized chi-square variable has a standard normal limiting distribution. Suppose that now we are interested in estimating

R = R(t; θ) = P(X > t) = exp(−t/θ)

An approximation for the variance of R̂ = exp(−t/X̄) is given by the asymptotic variance

Var(R̂) ≈ [∂R(t; θ)/∂θ]²(θ²/n) = [exp(−t/θ)(t/θ²)]²(θ²/n) = [exp(−t/θ)(t/θ)]²/n = [R ln R]²/n

and thus for large n, approximately R̂ ~ N(R, [R ln R]²/n).
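The delta-method approximation can be checked numerically; the sketch below compares the Monte Carlo variance of R̂ with [R ln R]²/n (the values of θ, t, and n are arbitrary choices):

```python
import random, math

# Simulation sketch: for Xi ~ EXP(theta), Rhat = exp(-t/Xbar) estimates
# R = exp(-t/theta); the delta method gives Var(Rhat) ~ [R ln R]^2 / n.
random.seed(4)
theta, t, n, reps = 2.0, 1.0, 400, 5000
R = math.exp(-t / theta)
approx_var = (R * math.log(R)) ** 2 / n
vals = []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0 / theta) for _ in range(n)) / n
    vals.append(math.exp(-t / xbar))
mean = sum(vals) / len(vals)
mc_var = sum((v - mean) ** 2 for v in vals) / len(vals)
print(round(mc_var / approx_var, 2))   # ratio near 1
```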
Example 9.4.7  Consider a random sample from a Pareto distribution, Xᵢ ~ PAR(1, κ), where κ is unknown. Because

f(x; κ) = κ(1 + x)^(−κ−1),  0 < x

it follows that

ln L(κ) = n ln κ − (κ + 1) Σ ln(1 + xᵢ)

and the ML equation is

n/κ − Σ ln(1 + xᵢ) = 0
which yields the MLE

κ̂ = n / Σ ln(1 + xᵢ)
To find the CRLB, note that

ln f(x; κ) = ln κ − (κ + 1) ln(1 + x)
∂/∂κ ln f(x; κ) = 1/κ − ln(1 + x)

and thus

CRLB = 1/{nE[1/κ − ln(1 + X)]²}

To evaluate this last expression, it is convenient to consider the transformation

Y = ln(1 + X) ~ EXP(1/κ)

so that

E[ln(1 + X)] = 1/κ
E[1/κ − ln(1 + X)]² = Var[ln(1 + X)] = 1/κ²

Thus Var(κ̂) ≈ CRLB = κ²/n, and approximately κ̂ ~ N(κ, κ²/n).
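A numerical sketch of this MLE, using the fact derived above that ln(1 + X) ~ EXP(1/κ) to simulate Pareto data (κ and n below are arbitrary choices):

```python
import random, math

# Sketch: MLE for PAR(1, kappa), with pdf kappa*(1 + x)^(-kappa - 1), x > 0.
# Since ln(1 + X) ~ EXP(1/kappa), simulate X as exp(E) - 1 where E is
# exponential with mean 1/kappa.
random.seed(5)
kappa, n = 2.0, 2000
x = [math.exp(random.expovariate(kappa)) - 1.0 for _ in range(n)]
kappa_hat = n / sum(math.log(1.0 + xi) for xi in x)
print(round(kappa_hat, 2))   # near kappa = 2; asymptotic sd is kappa/sqrt(n)
```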
Example 9.4.8
Consider a random sample from the two-parameter exponential distribution, Xᵢ ~ EXP(1, η). We recall from Example 9.2.8 that the MLE, η̂ = X₁:ₙ, cannot be obtained as a solution to the ML equation (9.2.7), because ln L(η) is not differentiable over the whole parameter space. Of course, the difficulty results from the fact that the set A = {x : f(x; η) > 0} = [η, ∞) depends on the parameter η. Thus, we have an example where the asymptotic normality of the MLE is not expected to hold. As a matter of fact, we know from the results of Chapter 7 that the first order statistic, X₁:ₙ, is not asymptotically normal; rather, for a suitable choice of norming constants, the corresponding limiting distribution is an extreme-value type for minimums.
Asymptotic properties such as those discussed earlier in the section exist for MLEs in the multiparameter case, but they cannot be expressed conveniently without matrix notation. Consequently, we will not consider them here.
9.5 BAYES AND MINIMAX ESTIMATORS

When an estimate differs from the true value of the parameter being estimated, one may consider the loss involved to be a function of this difference. If it is assumed that the loss increases as the square of the difference, then the MSE criterion simply considers the average squared error loss associated with the estimator. Clearly the MSE criterion can be generalized to other types of loss functions besides squared error.
Definition 9.5.1  Loss Function  If T is an estimator of τ(θ), then a loss function is any real-valued function, L(t; θ), such that

L(t; θ) ≥ 0 for every t   (9.5.1)

and

L(t; θ) = 0 when t = τ(θ)   (9.5.2)
Definition 9.5.2  Risk Function  The risk function is defined to be the expected loss,

RT(θ) = E[L(T; θ)]   (9.5.3)
Thus, if a parameter or a function of a parameter is being estimated, one may choose an appropriate loss function depending on the problem, and then try to find an estimator whose average loss, or risk function, is small for all possible values of the parameter. If the loss function is taken to be squared error loss, then the risk becomes the MSE as considered previously. Another reasonable loss function is absolute error, which gives the risk function RT(θ) = E|T − τ(θ)|.

As for the MSE, it usually will not be possible to determine, for other risk functions, an estimator that has smaller risk than all other estimators uniformly for all θ. When comparing two specific estimators, however, it is possible that one may have a smaller risk than the other for all θ.
Definition 9.5.3  Admissible Estimator  An estimator T₁ is a better estimator than T₂ if and only if

RT₁(θ) ≤ RT₂(θ) for all θ ∈ Ω

and

RT₁(θ) < RT₂(θ) for at least one θ

An estimator T is admissible if and only if there is no better estimator.
Thus, if one estimator has uniformly smaller risk than another, we will retain the first estimator for consideration and eliminate the latter as not admissible. Typically, some estimators will have smallest risk for some values of θ but not for others. As mentioned earlier, one possible approach to selecting a best estimator is to restrict the class of estimators. This was discussed in connection with the class of unbiased estimators with MSE risk. There is no guarantee that this approach will work in every case.
Example 9.5.1  In Example 9.3.9 we found an unbiased estimator, η̂₃ = X₁:ₙ − 1/n, which appeared to be reasonable for estimating the location parameter η. We now consider a class of estimators of the form η̂₄ = cη̂₃ for some constant c > 0. Such estimators will be biased except in the case c = 1, and the MSE risk would be MSE(η̂₄) = Var(cη̂₃) + [b(cη̂₃)]² = c²/n² + (c − 1)²η².

Let us attempt to find a member of this class of estimators with minimum MSE. This corresponds to choosing c = n²η²/(1 + n²η²). Unfortunately, this depends upon the unknown parameter that we are trying to estimate. However, this suggests the possibility of choosing c to obtain an estimator that will have smaller risk, at least over a portion of the parameter space. For example, if it is suspected that η is somewhere close to 1, then the appropriate constant would be c = n²/(1 + n²). For this choice of c, MSE(η̂₄) < MSE(η̂₃) if and only if η satisfies the inequality c²/n² + (c − 1)²η² < 1/n², which corresponds to η² < 2 + 1/n².

For a sample of size n = 3, c = 0.9, MSE(η̂₄) = 0.09 + 0.01η², and MSE(η̂₃) = 1/9; these MSEs are compared in Figure 9.3.
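These formulas are easy to verify numerically; the sketch below reproduces the crossing point η ≈ ±1.45 shown in Figure 9.3:

```python
import math

# Sketch: with n = 3 and c = n^2/(1 + n^2) = 0.9, MSE(eta4) = 0.09 + 0.01*eta^2
# while MSE(eta3) = 1/9; the curves cross where eta^2 = 2 + 1/n^2.
n = 3
c = n**2 / (1 + n**2)          # 0.9
mse3 = 1 / n**2                # 1/9

def mse4(eta):
    return c**2 / n**2 + (c - 1)**2 * eta**2

cross = math.sqrt(2 + 1 / n**2)
print(round(cross, 2))                   # 1.45, matching Figure 9.3
print(abs(mse4(cross) - mse3) < 1e-9)    # True: the MSEs are equal at the crossing
```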
FIGURE 9.3  Comparison of MSEs for two different estimators of the threshold parameter (the two MSE curves cross at η = ±1.45)
Another criterion that sometimes is used to select an estimator from the class of admissible estimators is the minimax criterion.
Definition 9.5.4  Minimax Estimator  An estimator T₁ is a minimax estimator if

max_θ RT₁(θ) ≤ max_θ RT(θ)   (9.5.4)

for every estimator T.

In other words, T₁ is an estimator that minimizes the maximum risk, or

max_θ RT₁(θ) = min_T max_θ RT(θ)   (9.5.5)

Of course, this assumes that the risk function attains a maximum value for some θ and that such maximum values attain a minimum for some T. In a more general treatment of the topic, the maximum and minimum could be replaced with the more general concepts of least upper bound and greatest lower bound, respectively.
The minimax principle is a conservative approach, because it attempts to protect against the worst risk that can occur.
Example 9.5.2  Consider the class of estimators of the form η̂₄ = c(X₁:ₙ − 1/n) discussed in Example 9.3.10, and MSE risk. Recall that MSE(η̂₄) = c²/n² + (c − 1)²η², which depends on η except when c = 1. This last case corresponds to the unbiased estimator η̂₃. If 0 < c < 1, then neither η̂₃ nor η̂₄ has uniformly smaller MSE for all η, so we might consider using the minimax principle. Because

max_η MSE(η̂₃) = 1/n²

and

max_η MSE(η̂₄) = max_η [c²/n² + (c − 1)²η²] = ∞

the unbiased estimator, η̂₃, is the minimax estimator within this class of estimators. It is not clear, at this point, whether η̂₃ is minimax for any larger class of estimators.

One possible flaw of the minimax principle is illustrated by the graph of two possible risk functions in Figure 9.4. The minimax principle would choose T₁ over T₂, yet T₂ is much better than T₁ for most values of θ.
In Example 9.3.10 it was suggested that an experimenter might have some prior knowledge about where, at least approximately, the parameter may be located. More generally, one might want to use an estimator that has small risk for values of θ that are "most likely" to occur in a given experiment. This can be modeled mathematically by treating θ as a random variable, say Θ ~ p(θ), where p(θ) is a function that has the usual properties (2.3.4) and (2.3.5) of a pdf in the variable θ. A reasonable approach then would be to compute the average or expected risk of an estimator, averaged over values of θ with respect to the pdf p(θ), and choose an estimator with smallest average risk.
Definition 9.5.5  Bayes Risk  For a random sample from f(x; θ), the Bayes risk of an estimator T relative to a risk function RT(θ) and pdf p(θ) is the average risk with respect to p(θ),

E_θ[RT(θ)] = ∫ RT(θ)p(θ) dθ   (9.5.6)

If an estimator has the smallest Bayes risk, then it is referred to as a Bayes estimator.
FIGURE 9.4
Comparison of risk functions for two estimators
Definition 9.5.6  Bayes Estimator  For a random sample from f(x; θ), the Bayes estimator T* relative to the risk function RT(θ) and pdf p(θ) is the estimator with minimum expected risk,

E_θ[RT*(θ)] ≤ E_θ[RT(θ)]   (9.5.7)

for every estimator T.
In some kinds of problems it is reasonable to assume that the parameter varies for different cases, and it may be proper to treat θ as a random variable. In other cases, p(θ) may reflect prior information or belief as to what the true value of the parameter may be. In either case, introduction of the pdf p(θ), which usually is called a prior density for the parameter θ, constitutes an additional assumption that may be helpful or harmful depending on its correctness. In any event, averaging the risk relative to a pdf p(θ) is a procedure that provides a possible way to discriminate between two estimators when neither of their risk functions is uniformly smaller than the other for all θ. A whole class of estimators can be produced by considering different pdf's p(θ).

It is useful to have a class of estimators in a problem, although if there is some physical reason to justify choosing a particular p(θ), then the estimator associated with that p(θ) would presumably be the best one to use in that problem. There are different philosophies involved with choosing prior densities p(θ), but we will not be too concerned with how p(θ) is chosen in this work. In some cases θ may indeed act like a random variable, and p(θ) would reflect this fact. Alternatively, p(θ) may represent a degree of belief concerning the value of θ arrived at from previous sampling information, or by other means. In any event, potentially useful estimators can be developed through this structure. The subject of choosing a prior pdf is discussed in books by DeGroot (1970) and Zellner (1971).
Example 9.5.3  Consider again the estimators η̂₃ = X₁:ₙ − 1/n and η̂₄ = 0.9(X₁:ₙ − 1/n) of Example 9.3.10. With squared error loss we found that η̂₃ is better by the minimax principle, but η̂₄ is better if it is known that η² < 2 + 1/n², because it has smaller MSE for η in this subset of Ω. We now assume a standard normal prior density, η ~ N(0, 1), and compare the Bayes risks. It follows that E_η[R_η̂₃(η)] = E(1/n²) = 1/n² and E_η[R_η̂₄(η)] = E[0.81/n² + 0.01η²] = 0.81/n² + 0.01. According to this criterion, η̂₃ is better if n ≥ 5 and η̂₄ is better if n ≤ 4.

A few results now are considered that are useful in determining a Bayes estimator. Note that in this framework the density function f(x; θ) is interpreted as a conditional density function f(x | θ).
Definition 9.5.7  Posterior Distribution  The conditional density of θ given the sample observations x = (x₁, ..., xₙ) is called the posterior density or posterior pdf, and is given by

f(θ | x) = f(x₁, ..., xₙ | θ)p(θ) / ∫ f(x₁, ..., xₙ | θ)p(θ) dθ   (9.5.8)
The Bayes estimator is the estimator that minimizes the average risk over θ, E_θ[RT(θ)]. However,

E_θ[RT(θ)] = E_θ{E[L(T; θ) | θ]} = E_x{E[L(T; θ) | x]}   (9.5.9)

and an estimator T that minimizes E[L(T; θ) | x] for each x also minimizes the average over x. Thus the Bayes estimator may be obtained by minimizing the expected loss relative to the posterior distribution.

Theorem 9.5.1  If X₁, ..., Xₙ denotes a random sample from f(x | θ), then the Bayes estimator is the estimator that minimizes the expected loss relative to the posterior distribution of θ given x,

E[L(T; θ) | x]

For certain types of loss functions, expressions for the Bayes estimator can be determined more explicitly in terms of the posterior distribution.

Theorem 9.5.2
The Bayes estimator, T*, of τ(θ) under the squared error loss function,

L(T; θ) = [T − τ(θ)]²   (9.5.10)

is the conditional mean of τ(θ) relative to the posterior distribution,

T* = E[τ(θ) | x] = ∫ τ(θ)f(θ | x) dθ   (9.5.11)

Proof  See Exercise 41.
Example 9.5.4  Consider a random sample of size n from a Bernoulli distribution,

f(x | θ) = θˣ(1 − θ)^(1−x),  x = 0, 1
and let Θ ~ UNIF(0, 1). Equation (9.5.8) yields the posterior density

f(θ | x) = θ^(Σxᵢ)(1 − θ)^(n−Σxᵢ) / ∫₀¹ θ^(Σxᵢ)(1 − θ)^(n−Σxᵢ) dθ,  0 < θ < 1

It is convenient to express this using the notation of the beta distribution. As noted in Chapter 8, a random variable Y has a beta distribution with parameters a and b, denoted Y ~ BETA(a, b), if it has pdf of the form f(y; a, b) = y^(a−1)(1 − y)^(b−1)/B(a, b) when 0 < y < 1, and zero otherwise, with constant B(a, b) = Γ(a)Γ(b)/Γ(a + b). In this notation, recall that the mean of the distribution is E(Y) = a/(a + b). It also is possible to express the posterior distribution in terms of this notation. Specifically,

f(θ | x) = θ^(Σxᵢ)(1 − θ)^(n−Σxᵢ) / B(Σxᵢ + 1, n − Σxᵢ + 1),  0 < θ < 1

In other words, θ | x ~ BETA(Σxᵢ + 1, n − Σxᵢ + 1). Consequently, E(Θ | x) = (Σxᵢ + 1)/[(Σxᵢ + 1) + (n − Σxᵢ + 1)] = (Σxᵢ + 1)/(n + 2). With squared error loss, we have by Theorem 9.5.2 that the Bayes estimator of θ is

T* = (ΣXᵢ + 1)/(n + 2)
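The estimator T* = (Σxᵢ + 1)/(n + 2) is easy to compute; the sketch below applies it to a made-up 0/1 sample and contrasts it with the MLE x̄:

```python
# Sketch of the Bernoulli/uniform-prior Bayes estimator T* = (sum(x) + 1)/(n + 2),
# applied to a hypothetical 0/1 sample and compared with the MLE sum(x)/n.
def bayes_bernoulli(x):
    """Posterior mean of theta under a UNIF(0, 1) prior."""
    return (sum(x) + 1) / (len(x) + 2)

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical sample: 7 successes in 10
print(bayes_bernoulli(data))            # (7 + 1)/(10 + 2) = 2/3
print(sum(data) / len(data))            # MLE: 0.7
```

Note how the Bayes estimate is pulled slightly toward 1/2, the prior mean, relative to the MLE.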
Example 9.5.5  Suppose that Xᵢ ~ POI(θ), and we are interested in a Bayes estimator of θ assuming squared error loss. We choose to consider the class of gamma prior densities, Θ ~ GAM(β, κ):

p(θ) = θ^(κ−1)e^(−θ/β) / [β^κ Γ(κ)]

where β and κ are known arbitrary constants. The posterior distribution is given by

f(θ | x) = [e^(−nθ)θ^(Σxᵢ) · θ^(κ−1)e^(−θ/β) / (Πxᵢ! β^κΓ(κ))] / ∫₀^∞ e^(−nθ)θ^(Σxᵢ) · θ^(κ−1)e^(−θ/β) / (Πxᵢ! β^κΓ(κ)) dθ

That is,

θ | x ~ GAM[(n + 1/β)^(−1), Σxᵢ + κ]   (9.5.13)

The Bayes estimator of θ is therefore

T* = E(θ | x) = (ΣXᵢ + κ)/(n + 1/β)

A prior density with large β and small κ makes this estimator close to the MLE, θ̂ = x̄.
The risk in this case is

RT(θ) = E[T − θ]² = Var(T) + [E(T) − θ]² = nVar(X)/(n + 1/β)² + [(nθ + κ)/(n + 1/β) − θ]² = [nθ + (κ − θ/β)²]/(n + 1/β)²
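The risk formula can be checked by simulation; this sketch uses a simple exponential-waiting-time Poisson sampler, and the values of θ, β, κ, and n are arbitrary choices:

```python
import random

# Simulation sketch: for a Poisson(theta) sample with GAM(beta, kappa) prior,
# the Bayes estimator is T = (sum(x) + kappa)/(n + 1/beta), with risk
# [n*theta + (kappa - theta/beta)^2] / (n + 1/beta)^2.
random.seed(6)
theta, beta, kappa, n, reps = 4.0, 2.0, 1.0, 25, 20000

def poisson(mean):
    """Poisson sampler via exponential waiting times (fine for small means)."""
    k, s = 0, random.expovariate(1.0)
    while s < mean:
        k += 1
        s += random.expovariate(1.0)
    return k

risk_theory = (n * theta + (kappa - theta / beta) ** 2) / (n + 1 / beta) ** 2
sq = 0.0
for _ in range(reps):
    x = [poisson(theta) for _ in range(n)]
    t = (sum(x) + kappa) / (n + 1 / beta)
    sq += (t - theta) ** 2
print(round(sq / reps, 3), round(risk_theory, 3))   # the two should be close
```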
Some authors define the Bayes estimator to be the mean of the posterior distribution, which (according to Theorem 9.5.2) results from squared error loss. However, it is desirable in some applications to use other loss functions.
Theorem 9.5.3  The Bayes estimator, θ̂, of θ under absolute error loss,

L(θ̂; θ) = |θ̂ − θ|   (9.5.14)

is the median of the posterior distribution f(θ | x).
Proof See Exercise 42.
The Bayes estimator structure sometimes is helpful in finding a minimax estimator.
Theorem 9.5.4  If T* is a Bayes estimator with constant risk, RT*(θ) = c, then T* is a minimax estimator.
Proof  We have max_θ RT*(θ) = max_θ c = c = RT*(θ), and because RT*(θ) is constant over θ,

RT*(θ) = E_θ[RT*(θ)] ≤ E_θ[RT(θ)]

for every T, because T* is the Bayes estimator. Now the average of a variable is not larger than the maximum value of the variable, so

E_θ[RT(θ)] ≤ max_θ RT(θ)

and

max_θ RT*(θ) ≤ max_θ RT(θ)

which shows that T* is a minimax estimator.
It follows that if an appropriate prior pdf, p(θ), can be found that will yield a Bayes estimator with constant risk, then the Bayes estimator also will be the minimax estimator.
Recall the prior and posterior distributions of Example 9.5.4, but now consider a "weighted" squared error loss,

L(t; θ) = (t − θ)² / [θ(1 − θ)]   (9.5.15)

which gives more weight to the values of θ that are closer to zero or one. Note that

E[L(t; Θ) | x] = ∫₀¹ [(t − θ)²/(θ(1 − θ))] · θ^(Σxᵢ)(1 − θ)^(n−Σxᵢ)/B(Σxᵢ + 1, n − Σxᵢ + 1) dθ = C(x) ∫₀¹ (t − θ)² θ^(Σxᵢ−1)(1 − θ)^(n−Σxᵢ−1)/B(Σxᵢ, n − Σxᵢ) dθ

with C(x) = B(Σxᵢ, n − Σxᵢ)/B(Σxᵢ + 1, n − Σxᵢ + 1), which means that the expression E[L(t; Θ) | x] is minimized when the latter integral is minimized. Notice that this integral corresponds to the conditional expectation of ordinary squared error loss (t − θ)² relative to the posterior distribution BETA(Σxᵢ, n − Σxᵢ). By Theorem 9.5.2, this integral is minimized when t is the mean of BETA(Σxᵢ, n − Σxᵢ). This mean is t = Σxᵢ/(Σxᵢ + n − Σxᵢ) = x̄. It follows that t*(x) = x̄, and the Bayes estimator is T* = X̄. Furthermore, the risk is

RT*(θ) = E[(X̄ − θ)²]/[θ(1 − θ)] = [θ(1 − θ)/n]/[θ(1 − θ)] = 1/n

which is constant with respect to θ. By Theorem 9.5.4, X̄ is a minimax estimator in this example.
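The constant risk RT*(θ) = 1/n can be checked by simulation at two different values of θ (the sample size and replication count below are arbitrary choices):

```python
import random

# Simulation sketch: under the weighted loss (t - theta)^2 / [theta(1 - theta)],
# the risk of Xbar for Bernoulli(theta) data is exactly 1/n for every theta.
random.seed(7)
n, reps = 20, 40000

def mc_risk(theta):
    """Monte Carlo estimate of the weighted-loss risk of Xbar."""
    total = 0.0
    for _ in range(reps):
        xbar = sum(random.random() < theta for _ in range(n)) / n
        total += (xbar - theta) ** 2 / (theta * (1 - theta))
    return total / reps

r_low, r_half = mc_risk(0.1), mc_risk(0.5)
print(round(r_low, 3), round(r_half, 3))   # both near 1/n = 0.05
```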
SUM MARY Our purpose in this chapter was to provide general methods for estimating unknown parameters and to present criteria for evaluating the properties of esti-
mators. The two methods receiving the most attention were the method of moments and the method of maximum likelihood. The MLEs were found to have
desirable asymptotic properties under certain conditions. For example, the MLEs in certain cases are asymptotically efficient and asymptotically normally distributed. In general, it is desirable to have the distribution of the estimator highly concentrated about the true value of the parameter being estimated. This concentration may be reflected by an appropriate loss function, but most attention is centered on squared error loss and MSE risk. If the estimator is unbiased, then the MSE simply becomes the variance of the estimator. Within the class of unbiased estimators there may exist an estimator with uniformly minimum variance for all possible values of the parameter. This estimator is referred to as the UMVUE. At this point a direct method for finding a UMVUE has not been
provided; however, if an estimator satisfies the CRLB, then we know it is a UMVUE. The concepts of sufficiency and completeness discussed in the next chapter will provide a more systematic approach for attempting to find a UMVUE.
In lieu of finding an estimator that has uniformly minimum variance over the parameter, θ, we considered the principle of minimizing the maximum variance (risk) over θ (minimax estimator) and minimizing the average variance (or, more generally, risk) over θ, giving the Bayes estimators. Bayes estimation requires specifying an additional prior density p(θ). Information was provided on how to compute a Bayes estimator, but very little on how to find a minimax estimator.
EXERCISES

1. Find method of moments estimators (MMEs) of θ based on a random sample X1, ..., Xn from each of the following pdf's:
(a) f(x; θ) = θx^(θ−1); 0 < x < 1, zero otherwise; θ > 0.
(b) f(x; θ) = (θ + 1)x^(−θ−2); 1 < x, zero otherwise; θ > 0.
2. Find the MMEs based on a random sample of size n from each of the following distributions (see Appendix B):
(a) Xi ~ NB(3, p).
(b) Xi ~ GAM(2, κ).
(c) Xi ~ WEI(θ, 1/2).
(d) Xi ~ DE(θ, η) with both θ and η unknown.
(e) Xi ~ EV(θ, η) with both θ and η unknown.
(f) Xi ~ PAR(θ, κ) with both θ and κ unknown.
3. Find maximum likelihood estimators (MLEs) of θ based on a random sample of size n for each of the pdf's in Exercise 1.
4. Find the MLEs based on a random sample X1, ..., Xn from each of the following distributions:
(a) Xi ~ BIN(1, p).
(b) Xi ~ GEO(p).
(c) Xi ~ NB(3, p).
(d) Xi ~ N(0, θ).
(e) Xi ~ GAM(θ, 2).
(f) Xi ~ DE(θ, 0).
(g) Xi ~ WEI(θ, 1/2).
(h) Xi ~ PAR(1, κ).
5. Find the MLE for θ based on a random sample of size n from a distribution with pdf

f(x; θ) = 2θ²x⁻³ if θ ≤ x, and zero otherwise; 0 < θ.
6. Find the MLEs based on a random sample X1, ..., Xn from each of the following pdf's:
(a) f(x; θ1, θ2) = 1/(θ2 − θ1); θ1 ≤ x ≤ θ2, zero otherwise.
(b) f(x; θ, η) = θη^θ x^(−θ−1); η ≤ x, zero otherwise; 0 < θ, 0 < η < ∞.

7. Let X1, ..., Xn be a random sample from a geometric distribution, Xi ~ GEO(p). Find the MLEs of the following quantities:
(a) E(X) = 1/p.
(b) Var(X) = (1 − p)/p².
(c) P[X > k] = (1 − p)^k for arbitrary k = 1, 2, .... Hint: Use the invariance property of MLEs.
8. Based on a random sample of size n from a normal distribution, Xi ~ N(μ, σ²), find the MLEs of the following:
(a) P[X > c] for arbitrary c.
(b) The 95th percentile of X.
9. Suppose that x1:n and xn:n are the smallest and largest observed values of a random sample of size n from a distribution with pdf f(x; θ).
(a) If f(x; θ) = 1 for θ − 0.5 ≤ x ≤ θ + 0.5, zero otherwise, then show that any value θ̂ such that xn:n − 0.5 ≤ θ̂ ≤ x1:n + 0.5 is an ML estimate of θ.
(b) If f(x; θ) = 1/θ for θ ≤ x ≤ 2θ, zero otherwise, then show that θ̂ = 0.5xn:n is an ML estimate of θ.
10. Consider a random sample of size n from a double exponential distribution, Xi ~ DE(θ, η).
(a) Find the MLE of η when θ = 1. Hint: Show first that if x1, ..., xn are observed values, then the sum Σ|xi − a| is minimized when a is the sample median.
(b) Find the MLEs when both θ and η are unknown.

11. Consider a random sample of size n from a Pareto distribution, Xi ~ PAR(θ, 2).
(a) Find the ML equation, as in equation (9.2.7).
(b) From the data of Example 4.6.2, compute the ML estimate, θ̂, to three decimal places. Note: The ML equation cannot be solved explicitly for θ̂, but it can be solved numerically, by an iterative method, or by trial and error.
12. Let Y1, ..., Yn be a random sample from a lognormal distribution, Yi ~ LOGN(μ, σ²).
(a) Find the MLEs of μ and σ².
(b) Find the MLE of E(Y). Hint: Recall that Y ~ LOGN(μ, σ²) if ln(Y) ~ N(μ, σ²), or, equivalently, Y = exp(X) where X ~ N(μ, σ²).
13. Consider independent random samples X1, ..., Xn1 and Y1, ..., Yn2 from normal distributions with a common mean μ but possibly different variances, so that Xi ~ N(μ, σ1²) and Yj ~ N(μ, σ2²). Find the MLEs of μ, σ1², and σ2².
14. Let X be the number of independent trials of some component until it fails, where 1 − p is the probability of failure on each trial. We record the exact number of trials, Y = X, if X ≤ r; otherwise we record Y = r + 1, where r is a fixed positive integer.
(a) Show that the discrete pdf of Y is

f(y; p) = (1 − p)p^(y−1) if y = 1, ..., r, and f(r + 1; p) = p^r

(b) Let Y1, ..., Yn be a random sample from f(y; p). Find the MLE of p. Hint: f(y; p) = c(y; p)p^(y−1), where c(y; p) = 1 − p if y = 1, ..., r and c(r + 1; p) = 1. It follows that Π c(yi; p) = (1 − p)^m, where m is the number of observed yi that are less than r + 1.

15. Let X ~ BIN(n, p) and p̂ = X/n.
(a) Find a constant c so that E[c p̂(1 − p̂)] = p(1 − p).
(b) Find an unbiased estimator of Var(X).
(c) Consider a random sample of size N from BIN(n, p). Find unbiased estimators of p and Var(X) based on the random sample.

16.
(Truncated Poisson.) Let X ~ POI(μ), and suppose we cannot observe X = 0, so the observed random variable, Y, has discrete pdf

f(y; μ) = e^(−μ)μ^y / [y!(1 − e^(−μ))], y = 1, 2, ...; zero otherwise

We desire to estimate P[X > 0] = 1 − e^(−μ). Show that an unbiased (but unreasonable) estimator of 1 − e^(−μ) is given by u(Y), where u(y) = 0 if y is odd and u(y) = 2 if y is even. Hint: Consider the power series expansion of (e^μ + e^(−μ))/2.
17. Let X1, ..., Xn be a random sample from a uniform distribution, Xi ~ UNIF(θ − 1, θ + 1).
(a) Show that the sample mean, X̄, is an unbiased estimator of θ.
(b) Show that the "midrange," (X1:n + Xn:n)/2, is an unbiased estimator of θ.
18. Suppose that X is continuous and its pdf, f(x; μ), is symmetric about μ; that is,

f(μ + c; μ) = f(μ − c; μ) for all c > 0

(a) Show that for a random sample of size n where n is odd (n = 2k − 1), the sample median, Xk:n, is an unbiased estimator of μ.
(b) Show that Z = X1:n − μ and W = μ − Xn:n have the same distribution, and thus that the midrange (X1:n + Xn:n)/2 is unbiased for μ.
19. Consider a random sample of size n from a uniform distribution, Xi ~ UNIF(−θ, θ); θ > 0. Find a constant c so that c(Xn:n − X1:n) is an unbiased estimator of θ.
20. Let S be the sample standard deviation, based on a random sample of size n from a distribution with pdf f(x; μ, σ²) with mean μ and variance σ².
(a) Show that E(S) ≤ σ, where equality holds if and only if the distribution is degenerate at μ, P[X = μ] = 1. Hint: Consider Var(S).
(b) If X ~ N(μ, σ²), find a constant c such that cS is an unbiased estimator of σ. Hint: Use the fact that (n − 1)S²/σ² ~ χ²(n − 1) and S = (S²)^(1/2).
(c) Relative to (b), find a function of X̄ and S that is unbiased for the 95th percentile of X ~ N(μ, σ²).
21. Consider a random sample of size n from a Bernoulli distribution, Xi ~ BIN(1, p).
(a) Find the CRLB for the variances of unbiased estimators of p.
(b) Find the CRLB for the variances of unbiased estimators of p(1 − p).
(c) Find a UMVUE of p.
22. Consider a random sample of size n from a normal distribution, Xi ~ N(μ, 9).
(a) Find the CRLB for variances of unbiased estimators of μ.
(b) Is the MLE, μ̂ = X̄, a UMVUE of μ?
(c) Is the MLE of the 95th percentile a UMVUE?
23. Let X1, ..., Xn be a random sample from a normal distribution, N(0, θ).
(a) Is the MLE, θ̂, an unbiased estimator of θ?
(b) Is θ̂ a UMVUE of θ?
24. Let X ~ POI(μ), and let θ = P[X = 0] = e^(−μ).
(a) Is θ̂ = e^(−X) an unbiased estimator of θ?
(b) Show that θ̃ = u(X) is an unbiased estimator of θ, where u(0) = 1 and u(x) = 0 if x = 1, 2, ....
(c) Compare the MSEs of θ̂ and θ̃ for estimating θ = e^(−μ) when μ = 1 and μ = 2.
25. Consider the estimator T1 = 1/X̄ of Example 9.3.2. Compare the MSEs of T1 and cT1 for estimating 1/θ, where c = (n − 1)/n.
26. Consider a random sample of size n from a distribution with pdf f(x; θ) = 1/θ if 0 < x < θ, and zero otherwise; 0 < θ.
(a) Find the MLE θ̂.
(b) Find the MME θ̃.
(c) Is θ̂ unbiased? Is θ̃ unbiased?
(d) Compare the MSEs of θ̂ and θ̃.
27. Consider a random sample of size n = 2 from a normal distribution, Xi ~ N(θ, 1), where Ω = {θ : 0 ≤ θ ≤ 1}. Define estimators as follows: θ̂1 = (1/2)X1 + (1/2)X2, θ̂2 = (1/4)X1 + (3/4)X2, θ̂3 = (2/3)X1, and θ̂4 = (2/3)θ̂1. Consider squared error loss L(t; θ) = (t − θ)².
(a) Compare the risk functions for these estimators.
(b) Compare the estimators by the minimax principle.
(c) Find the Bayes risk of the estimators, using θ ~ UNIF(0, 1).
(d) Find the Bayes risk of the estimators, using θ ~ BETA(2, 1).
28. Let X1, ..., Xn be a random sample from EXP(θ), and define θ̂1 = X̄ and θ̂2 = nX̄/(n + 1).
(a) Find the variances of θ̂1 and θ̂2.
(b) Find the MSEs of θ̂1 and θ̂2.
(c) Compare the variances of θ̂1 and θ̂2 for n = 2.
(d) Compare the MSEs of θ̂1 and θ̂2 for n = 2.
(e) Find the Bayes risk of θ̂1 using θ ~ EXP(2).
29. Consider a random sample of size n from a Bernoulli distribution, Xi ~ BIN(1, p). For a uniform prior density, p ~ UNIF(0, 1), and squared error loss, find the following:
(a) Bayes estimator of p.
(b) Bayes estimator of p(1 − p).
(c) Bayes risk for the estimator in (a).
30. Let X ~ POI(μ), and consider the loss function L(t; μ) = (t − μ)²/μ. Assume a gamma prior density, μ ~ GAM(θ, κ), where θ and κ are known.
(a) Find the Bayes estimator of μ.
(b) Show that μ̂ = X is the minimax estimator.
31. Let θ̂ and θ̃ be the MLE and MME, respectively, for θ in Exercise 26.
(a) Show that θ̂ is MSE consistent.
(b) Show that θ̃ is MSE consistent.
32. Show that the MLE of θ in Exercise 5 is simply consistent.
33. Consider a random sample of size n from a Poisson distribution, Xi ~ POI(μ).
(a) Find the CRLB for the variances of unbiased estimators of μ.
(b) Find the CRLB for the variances of unbiased estimators of θ = e^(−μ).
(c) Find a UMVUE of μ.
(d) Find the MLE θ̂ of θ. Is θ̂ an unbiased estimator of θ? Is θ̂ asymptotically unbiased?
(e) Show that θ̃ = [(n − 1)/n]^Y, where Y = ΣXi, is an unbiased estimator of θ. Find Var(θ̃) and compare it to the CRLB of (b). Hint: Note that Y = ΣXi ~ POI(nμ), and that E(θ̃) and Var(θ̃) are related to the MGF of Y.
34. Consider a random sample of size n from a distribution with discrete pdf f(x; p) = p(1 − p)^x; x = 0, 1, ..., zero otherwise.
(a) Find the MLE of p.
(b) Find the MLE of θ = (1 − p)/p.
(c) Find the CRLB for variances of unbiased estimators of θ.
(d) Is the MLE of θ a UMVUE?
(e) Is the MLE of θ MSE consistent?
(f) Find the asymptotic distribution of the MLE of θ.
(g) Let θ̃ = nX̄/(n + 1). Find the risk functions of both θ̂ and θ̃, using the loss function L(t; θ) = (t − θ)²/(θ² + θ).
35. Find the asymptotic distribution of the MLE of p in Exercise 4(a).

36. Find the asymptotic distribution of the MLE of θ in Exercise 4(d).

37. Let X1, ..., Xn be a random sample with an odd sample size (n = 2k − 1).
(a) If Xi ~ N(μ, 1), find the asymptotic relative efficiency, as defined by equation (9.4.6), of μ̂n = Xk:n relative to μ̃n = X̄n.
(b) If Xi ~ DE(1, η), find the asymptotic relative efficiency, as defined by equation (9.4.6), of η̂n = Xk:n relative to η̃n = X̄n.
38. An estimator θ̂ is said to be median unbiased if P[θ̂ < θ] = P[θ̂ > θ]. Consider a random sample of size n from an exponential distribution, Xi ~ EXP(θ).
(a) Find a median unbiased estimator of θ that has the form θ̂ = cX̄.
(b) Find the relative efficiency of θ̂ compared to X̄.
(c) Compare the MSEs of θ̂ and X̄ when n = 5.
39. Suppose that θ̂i, i = 1, ..., n, are independent unbiased estimators of θ with Var(θ̂i) = σi². Consider a combined estimator θ̂ = Σ ai θ̂i, where Σ ai = 1.
(a) Show that θ̂ is unbiased.
(b) It can be shown that Var(θ̂) is minimized by letting ai = (1/σi²)/Σj(1/σj²). Verify this for the case n = 2.
40. Let X be a random variable with CDF F(x).
(a) Show that E[(X − c)²] is minimized by the value c = E(X).
(b) Assuming that X is continuous, show that E[|X − c|] is minimized if c is the median, that is, the value such that F(c) = 1/2.
41. Prove Theorem 9.5.2. Hint: Use Exercise 40(a) applied to the posterior distribution for fixed x.

42. Prove Theorem 9.5.3. Hint: Use Exercise 40(b).
43. Consider the functions L(θ), L*(t), and u(θ) in the discussion preceding Theorem 9.2.1. Show that û = u(θ̂) maximizes L*(t) if θ̂ maximizes L(θ). Hint: Note that L*(t) ≥ L(θ) for all t in the range of the function u(θ) and all θ with u(θ) = t, and that L(θ̂) = L*(û) if û = u(θ̂).
44. Let X1, X2, ..., Xn be a random sample from an exponential distribution with mean 1/θ, f(x | θ) = θ exp(−θx) for x > 0, and assume that the prior density of θ also is exponential with mean 1/β, where β is known.
(a) Show that the posterior distribution is θ|x ~ GAM[(β + Σxi)⁻¹, n + 1].
(b) Using squared error loss, find the Bayes estimator of θ.
(c) Using squared error loss, find the Bayes estimator of μ = 1/θ.
(d) Using absolute error loss, find the Bayes estimator of θ. Use chi-square notation to express the solution.
(e) Using absolute error loss, find the Bayes estimator of μ = 1/θ.
CHAPTER 10

SUFFICIENCY AND COMPLETENESS
10.1 INTRODUCTION

Chapter 9 presented methods for deriving point estimators based on a random sample to estimate unknown parameters of the population distribution. In some cases, it is possible to show, in a certain sense, that a particular statistic or set of statistics contains all of the "information" in the sample about the parameters. It then would be reasonable to restrict attention to such statistics when estimating or otherwise making inferences about the parameters. More generally, the idea of sufficiency involves the reduction of a data set to a more concise set of statistics with no loss of information about the unknown parameter. Roughly, a statistic S will be considered a "sufficient" statistic for a parameter θ if the conditional distribution of any other statistic T given the value of S does not involve θ. In other words, once the value of a sufficient statistic is known, the observed value of any other statistic does not contain any further information about the parameter.
Example 10.1.1 A coin is tossed n times, and the outcome is recorded for each toss. As usual, this process could be modeled in terms of a random sample X1, ..., Xn from a Bernoulli distribution. Suppose that it is not known whether the coin is fair, and we wish to estimate θ = P(head). It would seem that the total number of heads, S = ΣXi, should provide as much information about the value θ as the actual outcomes. To check this out, consider

f(x1, ..., xn; θ) = θ^(Σxi)(1 − θ)^(n−Σxi)

We also know that S ~ BIN(n, θ), so that

f_S(s; θ) = (n choose s) θ^s (1 − θ)^(n−s), s = 0, 1, ..., n

If Σxi = s, then the events [X1 = x1, ..., Xn = xn, S = s] and [X1 = x1, ..., Xn = xn] are equivalent, and

f_{X|s}(x1, ..., xn) = P[X1 = x1, ..., Xn = xn, S = s] / P[S = s]
= f(x1, ..., xn; θ) / f_S(s; θ)
= θ^s(1 − θ)^(n−s) / [(n choose s) θ^s(1 − θ)^(n−s)]
= 1 / (n choose s)

If Σxi ≠ s, then the conditional pdf is zero. In either case, it does not involve θ.

Furthermore, let T = t(X1, ..., Xn) be any other statistic, and define the set C_t = {(x1, ..., xn) : t(x1, ..., xn) = t}. The conditional pdf of T given S = s is

f_{T|s}(t) = P[T = t | S = s] = Σ_{C_t} f_{X|s}(x1, ..., xn)

which also does not involve θ.
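The conditional-distribution computation in Example 10.1.1 can be verified by brute force for a small n. The sketch below (our own code, not the text's) enumerates every binary sample with Σxi = s and confirms that P[X = x | S = s] is the same for two different values of θ:

```python
from itertools import product
from math import comb

def conditional_probs(n, s, theta):
    # P[X = x | S = s] for each binary sample x of length n with sum(x) == s
    joint = {x: theta**s * (1 - theta)**(n - s)
             for x in product((0, 1), repeat=n) if sum(x) == s}
    total = sum(joint.values())          # this is P[S = s]
    return {x: p / total for x, p in joint.items()}

# The conditional distribution is free of theta: uniform with probability
# 1 / (n choose s) on each compatible sample.
probs_a = conditional_probs(4, 2, 0.3)
probs_b = conditional_probs(4, 2, 0.8)
```

Both dictionaries assign probability 1/6 to each of the six samples with two heads, regardless of θ.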
It will be desirable to have a more precise definition of sufficiency for more than one parameter, and a set of jointly sufficient statistics.
10.2 SUFFICIENT STATISTICS

As in the previous chapter, a set of data x1, ..., xn will be modeled mathematically as observed values of a set of random variables X1, ..., Xn. For convenience, we will use vector notation, X = (X1, ..., Xn) and x = (x1, ..., xn), to refer to the observed random variables and their possible values. We also will allow the possibility of a vector-valued parameter θ and vector-valued statistics S and T.
Definition 10.2.1 Jointly Sufficient Statistics Let X = (X1, ..., Xn) have joint pdf f(x; θ), and let S = (S1, ..., Sk) be a k-dimensional statistic. Then S1, ..., Sk is a set of jointly sufficient statistics for θ if for any other vector of statistics, T, the conditional pdf of T given S = s, denoted by f_{T|s}(t), does not depend on θ. In the one-dimensional case, we simply say that S is a sufficient statistic for θ.
Again, the idea is that if S is observed, then additional information about θ cannot be obtained from T if the conditional distribution of T given S = s is free of θ. We usually will assume that X1, ..., Xn is a random sample from a population pdf f(x; θ), and for convenience we often will refer to the vector X = (X1, ..., Xn) as the random sample. However, in general, X could represent some other vector of observed random variables, such as a censored sample or some other set of order statistics. The primary purpose is to reduce the sample to the smallest set of sufficient statistics, referred to as a "minimal set" of sufficient statistics. If k unknown parameters are present in the model, then quite often there will exist a set of k sufficient statistics. In some cases, the number of sufficient statistics will exceed the number of parameters, and indeed in some cases no reduction in the number of statistics is possible. The whole sample is itself a set of sufficient statistics, but
when we refer to sufficient statistics we ordinarily will be thinking of some smaller set of sufficient statistics.
Definition 10.2.2 A set of statistics is called a minimal sufficient set if the members of the set are jointly sufficient for the parameters and if they are a function of every other set of jointly sufficient statistics.
For example, the order statistics will be shown to be jointly sufficient. In a sense, this does represent a reduction of the sample, although the number of statistics in this case is not reduced. In some cases, the order statistics may be a minimal sufficient set, but of course we hope to reduce the sample to a small number of jointly sufficient statistics.

Clearly one cannot actually consider all possible statistics, T, in attempting to use Definition 10.2.1 to verify that S is a sufficient statistic. However, because T may be written as a function of the sample, X = (X1, ..., Xn), one possible approach would be to show that f_{X|s}(x) is free of θ. Actually, this approach was used in Example 10.1.1, where X was a random sample from a Bernoulli distribution. Essentially the same derivation could be used in the more general situation where X is discrete and S and θ are vector-valued. Suppose that S = (S1, ..., Sk) where Sj = dj(X1, ..., Xn) for j = 1, ..., k, and denote by d(x1, ..., xn) the vector-valued function whose jth coordinate is dj(x1, ..., xn). In a manner analogous to Example 10.1.1, the conditional pdf of X = (X1, ..., Xn) given S = s can be written as

f_{X|s}(x1, ..., xn) = f(x1, ..., xn; θ)/f_S(s; θ) if d(x1, ..., xn) = s, and 0 otherwise    (10.2.1)
This would not be a standard situation for continuous random variables, because we have an n-dimensional vector of random variables with the distribution of probability restricted to an n − k dimensional subspace. Consequently, care must be taken with regard to the meaning of an expression such as (10.2.1) in the continuous case. In general, we can say that S1, ..., Sk are jointly sufficient for θ if equation (10.2.1) is free of θ.
Some authors avoid any concern with technical difficulties by directly defining S1, ..., Sk to be jointly sufficient for θ if f(x1, ..., xn; θ)/f_S(s; θ) is free of θ. In any event, equation (10.2.1) will be used here without resorting to a more careful mathematical development.
Example 10.2.1 Consider a random sample from an exponential distribution, Xi ~ EXP(θ). It follows that

f(x1, ..., xn; θ) = (1/θⁿ) exp(−Σxi/θ), xi > 0

which suggests checking the statistic S = ΣXi. We also know that S ~ GAM(θ, n), so that

f_S(s; θ) = s^(n−1) e^(−s/θ) / [θⁿ Γ(n)], s > 0

If s = Σxi, then

f(x1, ..., xn; θ)/f_S(s; θ) = Γ(n)/s^(n−1)

which is free of θ, and thus by equation (10.2.1) S is sufficient for θ.
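The ratio in Example 10.2.1 can be evaluated numerically for a fixed sample at different values of θ to confirm that it is free of θ. A quick sketch (function names are ours):

```python
import math

def joint_pdf(xs, theta):
    # f(x1,...,xn; theta) = theta^(-n) exp(-sum(x)/theta) for an EXP(theta) sample
    return theta**(-len(xs)) * math.exp(-sum(xs) / theta)

def gamma_pdf(s, theta, n):
    # pdf of S = sum(X_i) ~ GAM(theta, n)
    return s**(n - 1) * math.exp(-s / theta) / (theta**n * math.gamma(n))

xs = [0.5, 1.2, 2.0, 0.7]
s = sum(xs)
ratio_1 = joint_pdf(xs, 1.0) / gamma_pdf(s, 1.0, len(xs))
ratio_2 = joint_pdf(xs, 3.5) / gamma_pdf(s, 3.5, len(xs))
# Both ratios equal Gamma(n) / s^(n-1), with no trace of theta.
```

For this sample the ratio is Γ(4)/s³ no matter which θ is plugged in.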
A slightly simpler criterion also can be derived. In particular, if S1, ..., Sk are jointly sufficient for θ, then

f(x1, ..., xn; θ) = f_S(s; θ)f_{X|s}(x1, ..., xn) = g(s; θ)h(x1, ..., xn)    (10.2.2)

That is, the joint pdf of the sample can be factored into a function of s and θ times a function of x = (x1, ..., xn) that does not involve θ. Conversely, suppose that f(x1, ..., xn; θ) = g(s; θ)h(x1, ..., xn), where it is assumed that for fixed s, h(x1, ..., xn) does not depend on θ. Note that this means that if the joint pdf of X1, ..., Xn is zero over some region of the xi's, then it must be possible to identify this region in terms of s and θ, and in terms of x and s, without otherwise involving the xi's with θ. If this is not possible, then the joint pdf really is not completely specified in the form stated. Basically, then, if equation (10.2.2) holds for some functions g and h, then the marginal pdf of S must be of the form f_S(s; θ) = g(s; θ)c(s)
because for fixed s, integrating or summing over the remaining variables cannot bring θ into the function. Thus

f(x1, ..., xn; θ) = f_S(s; θ)h(x1, ..., xn)/c(s)

and

f(x1, ..., xn; θ)/f_S(s; θ) = h(x1, ..., xn)/c(s)

which is independent of θ. This provides the outline of the proof of the following theorem.

Theorem 10.2.1 Factorization Criterion If X1, ..., Xn have joint pdf f(x1, ..., xn; θ), and if S = (S1, ..., Sk), then S1, ..., Sk are jointly sufficient for θ if and only if

f(x1, ..., xn; θ) = g(s; θ)h(x1, ..., xn)    (10.2.3)

where g(s; θ) does not depend on x1, ..., xn except through s, and h(x1, ..., xn) does not involve θ.
Example 10.2.2 Consider the random sample of Example 10.1.1, where Xi ~ BIN(1, θ). In that example, we conjectured that S = ΣXi would be sufficient, and then verified it directly by deriving the conditional pdf of X given S = s. The procedure is somewhat simpler if we use the factorization criterion. In particular, we have

f(x1, ..., xn; θ) = θ^(Σxi)(1 − θ)^(n−Σxi) = θ^s(1 − θ)^(n−s) = g(s; θ)h(x1, ..., xn)

where s = Σxi and, in this case, we define h(x1, ..., xn) = 1 if all xi = 0 or 1, and zero otherwise. It should be noted that the sample proportion, θ̂ = S/n, also is sufficient for θ. In general, if a statistic S is sufficient for θ, then any one-to-one function of S also is sufficient for θ.
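The factorization g(s; θ)h(x1, ..., xn) can be checked numerically for the Bernoulli case. A small illustrative sketch (our own code):

```python
def f_joint(x, theta):
    # Bernoulli joint pdf: theta^s (1 - theta)^(n - s) when every x_i is 0 or 1
    if not all(xi in (0, 1) for xi in x):
        return 0.0
    s, n = sum(x), len(x)
    return theta**s * (1 - theta)**(n - s)

def g(s, n, theta):
    return theta**s * (1 - theta)**(n - s)

def h(x):
    # indicator that every coordinate is 0 or 1
    return 1.0 if all(xi in (0, 1) for xi in x) else 0.0

samples = [(1, 0, 1, 1), (0, 0, 0, 0), (1, 2, 0, 0)]
checks = [abs(f_joint(x, 0.4) - g(sum(x), len(x), 0.4) * h(x)) < 1e-12
          for x in samples]
```

The last sample lies outside the support, and h correctly forces both sides to zero there.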
It is important to specify completely the functions involved in the factorization criterion, including the identification of regions of zero probability. The following example shows that care must be exercised in this matter.
Example 10.2.3 Consider a random sample from a uniform distribution, Xi ~ UNIF(0, θ), where θ is unknown. The joint pdf of X1, ..., Xn is

f(x1, ..., xn; θ) = 1/θⁿ, 0 < xi < θ

and zero otherwise. It is easier to specify this pdf in terms of the minimum, x1:n, and maximum, xn:n, of x1, ..., xn. In particular,

f(x1, ..., xn; θ) = 1/θⁿ if 0 < x1:n and xn:n < θ

which means that

f(x1, ..., xn; θ) = g(xn:n; θ)h(x1, ..., xn)

where g(s; θ) = 1/θⁿ if s < θ and zero otherwise, and h(x1, ..., xn) = 1 if 0 < x1:n and zero otherwise. It follows from the factorization criterion that the largest order statistic, S = Xn:n, is sufficient for θ.
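The role of the zero-probability region can be made concrete in code: the factor g depends on the data only through the maximum, while h flags only the lower endpoint. A sketch with our own function names:

```python
def joint_pdf(x, theta):
    # f(x1,...,xn; theta) = theta^(-n) if 0 < x_i < theta for all i, else 0
    return theta**(-len(x)) if all(0 < xi < theta for xi in x) else 0.0

def g(s, theta, n):
    # depends on the sample only through s = max(x)
    return theta**(-n) if s < theta else 0.0

def h(x):
    # flags the lower boundary 0 < x_{1:n}, free of theta
    return 1.0 if 0 < min(x) else 0.0

x = [0.2, 0.5, 0.9]
ok_inside = abs(joint_pdf(x, 1.0) - g(max(x), 1.0, len(x)) * h(x)) < 1e-12
ok_outside = joint_pdf(x, 0.8) == g(max(x), 0.8, len(x)) * h(x) == 0.0
```

With θ = 0.8 the sample falls outside the support (0.9 > 0.8), and g, evaluated at the maximum alone, correctly produces zero.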
This type of problem is made more clear by using "indicator function" notation, which allows the conditions on the limits of the variables to be incorporated directly into the functional form of the pdf.
Definition 10.2.3 If A is a set, then the indicator function of A, denoted by I_A, is defined as

I_A(x) = 1 if x ∈ A, and I_A(x) = 0 if x ∉ A
In the previous example, if we let A = (0, θ), then

f(x; θ) = (1/θ)I_(0,θ)(x)

so that

f(x1, ..., xn; θ) = (1/θⁿ) ∏ I_(0,θ)(xi)

Because ∏ I_(0,θ)(xi) = 1 if and only if 0 < x1:n and xn:n < θ, equation (10.2.3) is satisfied with s = xn:n,

g(s; θ) = (1/θⁿ)I_(0,θ)(s)

and

h(x1, ..., xn) = I_(0,∞)(x1:n)
Example 10.2.4 Consider a random sample from a normal distribution, Xi ~ N(μ, σ²), where both μ and σ² are unknown. It follows that

f(x1, ..., xn; μ, σ²) = (2πσ²)^(−n/2) exp[−Σ(xi − μ)²/(2σ²)]

Because Σ(xi − μ)² = Σxi² − 2μΣxi + nμ², it follows that equation (10.2.3) holds with s1 = Σxi, s2 = Σxi²,

g(s1, s2; μ, σ²) = (2πσ²)^(−n/2) exp[−(s2 − 2μs1 + nμ²)/(2σ²)]

and h(x1, ..., xn) = 1. Thus, by the factorization criterion, S1 = ΣXi and S2 = ΣXi² are jointly sufficient for θ = (μ, σ²). Notice also that the MLEs, μ̂ = X̄ = S1/n and σ̂² = Σ(Xi − X̄)²/n = S2/n − (S1/n)², correspond to a one-to-one transformation of S1 and S2, so that μ̂ and σ̂² also are jointly sufficient for μ and σ².
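The identity Σ(xi − μ)² = s2 − 2μs1 + nμ² means the entire normal log-likelihood can be computed from (s1, s2) alone, which is easy to confirm. A minimal sketch (our own code):

```python
import math

def loglik_direct(x, mu, var):
    # sum of log N(mu, var) densities, point by point
    return sum(-0.5 * math.log(2 * math.pi * var) - (xi - mu)**2 / (2 * var)
               for xi in x)

def loglik_from_stats(s1, s2, n, mu, var):
    # the same value, using only the sufficient statistics s1 = sum(x), s2 = sum(x^2)
    return -n / 2 * math.log(2 * math.pi * var) - (s2 - 2 * mu * s1 + n * mu**2) / (2 * var)

x = [1.2, -0.4, 2.5, 0.3]
a = loglik_direct(x, 0.7, 2.0)
b = loglik_from_stats(sum(x), sum(t * t for t in x), len(x), 0.7, 2.0)
```

The two computations agree for every (μ, σ²), so two samples sharing (s1, s2) carry identical likelihood information.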
In the next section, the general connection between MLEs and sufficient statistics will be established. When a minimal set of sufficient statistics exists, we might expect the number
of sufficient statistics to be equal to the number of unknown parameters. In the following example, two statistics are required to obtain sufficiency for a single parameter.

Example 10.2.5 Consider a random sample from a uniform distribution, Xi ~ UNIF(θ, θ + 1). Notice that the length of the interval is one unit, but the endpoints are assumed to be unknown. The pdf of Xi is the indicator function of the interval, f(x; θ) = I_(θ,θ+1)(x), so the joint pdf of X1, ..., Xn is

f(x1, ..., xn; θ) = ∏ I_(θ,θ+1)(xi)

This function assumes the value 1 if and only if θ < x1:n and xn:n < θ + 1, so that

f(x1, ..., xn; θ) = I_(θ,∞)(x1:n) I_(−∞,θ+1)(xn:n)

which shows, by the factorization criterion, that the smallest and largest order statistics, S1 = X1:n and S2 = Xn:n, are jointly sufficient for θ. Actually, it can be shown that S1 and S2 are minimal sufficient. Methods for verifying whether a set of statistics is minimal sufficient are discussed by Wasan (1970), but we will not elaborate on them here.
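For the UNIF(θ, θ + 1) sample, the likelihood is an indicator that can be computed either from the full sample or from (x1:n, xn:n) alone; comparing the two over a grid of θ values illustrates the joint sufficiency of the extremes. An illustrative sketch (our own code):

```python
def likelihood(x, theta):
    # joint pdf of a UNIF(theta, theta + 1) sample: 1 iff theta < x_i < theta + 1 for all i
    return 1.0 if all(theta < xi < theta + 1 for xi in x) else 0.0

def likelihood_from_extremes(lo, hi, theta):
    # the same indicator, written in terms of x_{1:n} = lo and x_{n:n} = hi
    return 1.0 if hi - 1 < theta < lo else 0.0

x = [2.3, 2.9, 2.5]
thetas = [1.5, 2.0, 2.2, 2.4]
pairs = [(likelihood(x, t), likelihood_from_extremes(min(x), max(x), t))
         for t in thetas]
```

The two columns of `pairs` agree at every θ: the likelihood is positive exactly on the interval (xn:n − 1, x1:n).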
10.3 FURTHER PROPERTIES OF SUFFICIENT STATISTICS

It is possible to relate sufficiency to several of the concepts that were discussed in earlier chapters.
Theorem 10.3.1 If S1, ..., Sk are jointly sufficient for θ and if θ̂ is a unique maximum likelihood estimator of θ, then θ̂ is a function of S = (S1, ..., Sk).

Proof
By the factorization criterion,

L(θ) = f(x1, ..., xn; θ) = g(s; θ)h(x1, ..., xn)

which means that a value that maximizes the likelihood function must depend on s, say θ̂ = t(s). If the MLE is unique, this defines a function of s.
Actually, the result can be stated more generally: If there exist jointly sufficient statistics, and if there exists an MLE, then there exists an MLE that is a function of the sufficient statistics.

It also follows that if the MLEs, θ̂, are unique and jointly sufficient, then they are a minimal sufficient set, because the factorization criterion applies for every set of jointly sufficient statistics.

The following simple example shows that it is possible to have a sufficient statistic S and an MLE that is not a function of S.
Example 10.3.1 Suppose that X is discrete with pdf f(x; θ) and Ω = {0, 1}, where, with the use of indicator functions,

f(x; θ) = [(1 − θ)/4 + θι(x)/12] I_{1,2,3,4}(x)

If ι(x) = 3I_{1,4}(x) + 4I_{2}(x) + 2I_{3}(x), then S = ι(X) is sufficient for θ, which can be seen from the factorization criterion with h(x) = I_{1,2,3,4}(x) and

g(s; θ) = (1 − θ)/4 + θs/12

Furthermore, more than one MLE exists. For example, the functions t1(x) = I_{1,2}(x) and t2(x) = I_{1,2,4}(x) both produce MLEs, θ̂1 = t1(X) and θ̂2 = t2(X), because the corresponding estimates maximize f(x; θ) for each fixed x. Clearly, θ̂1 is not a function of S because ι(1) = ι(4) = 3, but t1(1) = 1 while t1(4) = 0. However, θ̂2 = t̃(S), where t̃(s) = I_{3,4}(s).
This shows that some care must be taken in stating the relationship between sufficient statistics and MLEs. If the MLE is unique, however, then the situation is rather straightforward.
Example 10.3.2 Consider a random sample of size n from a Bernoulli distribution, Xi ~ BIN(1, p). We know that S = ΣXi is sufficient for p, and that S ~ BIN(n, p). Thus, we may determine the MLE of p directly from the pdf of S, giving p̂ = S/n as before.
Theorem 10.3.2 If S is sufficient for θ, then any Bayes estimator will be a function of S.
Proof
Because the function h(x1, ..., xn) in the factorization criterion does not depend on θ, it can be eliminated in equation (9.5.8), and the posterior density f_{θ|x}(θ) can be replaced by

g(s; θ)p(θ) / ∫ g(s; θ)p(θ) dθ
As mentioned earlier, the order statistics are jointly sufficient.
Theorem 10.3.3 If X1, ..., Xn is a random sample from a continuous distribution with pdf f(x; θ), then the order statistics form a jointly sufficient set for θ.

Proof
For fixed x1:n ≤ ··· ≤ xn:n and associated permuted values x1, ..., xn, the conditional pdf of the sample given the order statistics is

f(x1; θ) ··· f(xn; θ) / [n! f(x1:n; θ) ··· f(xn:n; θ)] = 1/n!

and zero otherwise.

Generally, sufficient statistics are involved in the construction of UMVUEs.

Theorem 10.3.4 Rao-Blackwell Let X1, ..., Xn have joint pdf f(x1, ..., xn; θ), and let S = (S1, ..., Sk) be a vector of jointly sufficient statistics for θ. If T is any unbiased estimator of τ(θ), and if T* = E(T | S), then
1. T* is an unbiased estimator of τ(θ),
2. T* is a function of S, and
3. Var(T*) ≤ Var(T) for every θ, and Var(T*) < Var(T) for some θ unless T* = T with probability 1.
Proof
By sufficiency, f_{T|s}(t) does not involve θ, and thus the function t*(s) = E(T | s) does not depend on θ. Thus, T* = t*(S) = E(T | S) is an estimator that is a function of S, and furthermore,

E(T*) = E_S(T*) = E_S[E(T | S)] = E(T) = τ(θ)
by Theorem 5.4.1. From Theorem 5.4.3,

Var(T) = Var[E(T | S)] + E[Var(T | S)] ≥ Var[E(T | S)] = Var(T*)

with equality if and only if E[Var(T | S)] = 0, which occurs if and only if Var(T | S) = 0 with probability 1, or equivalently T = E(T | S) = T*.
It is clear from the Rao-Blackwell theorem that if we are searching for an unbiased estimator with small variance, we may as well restrict attention to functions of sufficient statistics. If any unbiased estimator exists, then there will be one that is a function of sufficient statistics, namely E(T | S), which also is unbiased and has variance at least as small. In particular, we still are interested in knowing how to find a UMVUE for a parameter, and the above theorem narrows our problem down somewhat. For example, consider a one-parameter model f(x; θ), and assume that a single sufficient statistic, S, exists. We know we must consider only unbiased functions of S in searching for a UMVUE. In some cases it may be possible to show that only one function of S is unbiased, and in that case we would know that it is a UMVUE. The concept of "completeness" is helpful in determining unique unbiased estimators, and this concept is defined in the next section.
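The variance reduction in the Rao-Blackwell theorem is easy to see by simulation in the Bernoulli case: T = X1 is unbiased for p, and conditioning on the sufficient statistic S = ΣXi gives T* = E(X1 | S) = S/n (by symmetry). A Monte Carlo sketch; the sample size, p, and replication count are our own choices:

```python
import random

random.seed(1)
n, p, reps = 10, 0.3, 20000

t_vals, t_star_vals = [], []
for _ in range(reps):
    x = [1 if random.random() < p else 0 for _ in range(n)]
    t_vals.append(x[0])              # T = X1: unbiased but crude
    t_star_vals.append(sum(x) / n)   # T* = E(T | S) = S/n

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((u - m)**2 for u in v) / len(v)

# Both estimators average near p = 0.3, but Var(T*) is roughly Var(T)/n.
```

Running this shows both sample means close to 0.3 while the Rao-Blackwellized estimator's variance is about a tenth of the crude one's.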
10.4 COMPLETENESS AND EXPONENTIAL CLASS
Definition 10.4.1 Completeness A family of density functions {f(t; θ); θ ∈ Ω} is called complete if E[u(T)] = 0 for all θ ∈ Ω implies u(T) = 0 with probability 1 for all θ ∈ Ω.
This sometimes is expressed by saying that there are no nontrivial unbiased estimators of zero. In particular, it means that two different functions of T cannot have the same expected value. For example, if E[u1(T)] = τ(θ) and E[u2(T)] = τ(θ), then E[u1(T) − u2(T)] = 0, which implies u1(T) − u2(T) = 0, or u1(T) = u2(T) with probability 1, if the family of density functions is complete. That is, any unbiased estimator is unique in this case. We primarily are interested in knowing that the family of density functions of a sufficient statistic is complete, because in that case an unbiased function of the sufficient statistic will be unique, and it must be a UMVUE by the Rao-Blackwell theorem.
CHAPTER 10 SUFFICIENCY AND COMPLETENESS
A sufficient statistic whose density is a member of a complete family of density functions will be referred to as a complete sufficient statistic.
Theorem 10.4.1 Lehmann-Scheffé Let X1, ..., Xn have joint pdf f(x1, ..., xn; θ), and let S be a vector of jointly complete sufficient statistics for θ. If T* = t*(S) is a statistic that is unbiased for τ(θ) and a function of S, then T* is a UMVUE of τ(θ).
Proof It follows by completeness that any statistic that is a function of S and an unbiased estimator of τ(θ) must be equal to T* with probability 1. If T is any other statistic that is an unbiased estimator of τ(θ), then by the Rao-Blackwell theorem E(T|S) also is unbiased for τ(θ) and a function of S, so by uniqueness, T* = E(T|S) with probability 1. Furthermore, Var(T*) ≤ Var(T) for all θ. Thus, T* is a UMVUE of τ(θ).
Example 10.4.1 Let X1, X2, ..., Xn denote a random sample from a Poisson distribution, POI(μ), so that

f(x1, ..., xn; μ) = e^(−nμ) μ^(Σ xi) / Π(xi!)

By the factorization criterion, S = Σ Xi is a sufficient statistic. We know that S ~ POI(nμ), and we can show that a Poisson family is complete. For convenience, let θ = nμ, and consider any function u(s). We have

E[u(S)] = Σ_{s=0}^∞ u(s) e^(−θ) θ^s / s!

Because e^(−θ) ≠ 0, setting E[u(S)] = 0 requires all the coefficients u(s)/s! of θ^s to be zero. But u(s)/s! = 0 implies u(s) = 0. By completeness, X̄ = S/n is the unique function of S that is unbiased for E(X̄) = μ, and by Theorem 10.4.1 it must be a UMVUE of μ.
This particular result also can be verified by comparing Var(X̄) to the CRLB; however, the CRLB approach will not work for a nonlinear function of S. The present approach, on the other hand, can be used to find the UMVUE of τ(θ) = E[u(S)] for any function u(s) for which the expected value exists. For example, in the Poisson case, E(X̄²) = μ² + μ/n, so that X̄² = (S/n)² is the UMVUE of μ² + μ/n. It also follows that X̄² − X̄/n = (S/n)² − S/n² is the UMVUE of μ². If a UMVUE is desired for any specified τ(μ), it is only necessary to find some function of S that is unbiased for τ(μ); then that will be the UMVUE. If there is difficulty in finding a u(s) such that E[u(S)] = τ(μ), one possibility is to find any function h(X1, ..., Xn) = T that is unbiased, and then E(T|S) will be an unbiased estimator that is a function of S. Thus, the use of complete sufficient statistics and the Rao-Blackwell theorem provides one possible systematic approach for attempting to find UMVUEs in certain cases. Note that completeness is a property of a family of densities, and the family must be large enough or "complete" to enjoy this property. That is, there may be a nonzero u(S) whose mean is zero for some densities, but this situation may not hold if more densities are added to the family. If one considers a single Poisson distribution, say μ = 1, then E(S − n) = 0, and a family consisting of this single Poisson density function is not complete, because u(s) = s − n ≠ 0 if s ≠ n. If the range of the random variable does not depend on parameters, then one may essentially restrict attention to families of densities in the form of the exponential class when considering complete sufficient statistics, so we need not consider these families individually in detail.
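As an illustrative aside (not part of the text), the Poisson claims above can be checked by simulation; the parameter values, sample size, and seed below are arbitrary choices:

```python
import random

random.seed(1)

def poisson(mu):
    # Knuth's multiplication method; adequate for small mu
    limit = 2.718281828459045 ** (-mu)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

mu, n, reps = 3.0, 10, 100_000
sum_sq = sum_adj = 0.0
for _ in range(reps):
    xbar = sum(poisson(mu) for _ in range(n)) / n
    sum_sq += xbar ** 2                # estimates mu^2 + mu/n
    sum_adj += xbar ** 2 - xbar / n    # estimates mu^2

print(round(sum_sq / reps, 2))   # near mu^2 + mu/n = 9.3
print(round(sum_adj / reps, 2))  # near mu^2 = 9.0
```

The uncorrected X̄² overshoots by roughly μ/n, while X̄² − X̄/n averages close to μ², consistent with the derivation above.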
Definition 10.4.2 Exponential Class A density function is said to be a member of the regular exponential class if it can be expressed in the form

f(x; θ) = c(θ)h(x) exp[Σ_{j=1}^k q_j(θ) t_j(x)]    x ∈ A    (10.4.1)

and zero otherwise, where θ = (θ1, ..., θk) is a vector of k unknown parameters, if the parameter space has the form

Ω = {θ : a_i ≤ θ_i ≤ b_i; i = 1, ..., k}

(note that a_i = −∞ and b_i = ∞ are permissible values), and if it satisfies regularity conditions 1, 2, and 3a or 3b given by:

1. The set A = {x : f(x; θ) > 0} does not depend on θ.
2. The functions q_j(θ) are nontrivial, functionally independent, continuous functions of the θ_i.
3a. For a continuous random variable, the derivatives t_j'(x) are linearly independent continuous functions of x over A.
3b. For a discrete random variable, the t_j(x) are nontrivial functions of x on A, and none is a linear function of the others.

For convenience, we will write that f(x; θ) is a member of REC(q1, ..., qk) or simply REC.
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
348
CHAPTER 10 SUFFICIENCY AND COMPLETENESS
Example 10.4.2 Consider a Bernoulli distribution, X ~ BIN(1, p). It follows that

f(x; p) = p^x (1 − p)^(1−x) = (1 − p) exp{x ln[p/(1 − p)]}    x ∈ A = {0, 1}

which is REC(q1) with q1(p) = ln[p/(1 − p)] and t1(x) = x.
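The algebraic identity behind this REC form can be spot-checked numerically (a throwaway sketch, not part of the text; the values of p are arbitrary):

```python
import math

# Check that p^x (1-p)^(1-x) equals (1-p) * exp(x * ln(p/(1-p)))
# for both support points x = 0, 1 and several values of p.
for p in (0.2, 0.5, 0.9):
    for x in (0, 1):
        direct = p ** x * (1 - p) ** (1 - x)
        rec_form = (1 - p) * math.exp(x * math.log(p / (1 - p)))
        assert abs(direct - rec_form) < 1e-12
print("Bernoulli REC form verified")
```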
Note that the notion of REC, with slightly modified regularity conditions, can be extended to the case where X is a vector.
It can be shown that the REC is a complete family for the special case when t_j(x) = x. Many of the common density functions, such as binomial, Poisson, exponential, gamma, and normal pdf's, are in the form of the REC, but we are particularly interested in knowing that the pdf's of the sufficient statistics from these models are complete. If a random sample is considered from a member of the REC, then a set of joint sufficient statistics is identified readily by the factorization criterion; moreover, the pdf of these sufficient statistics also turns out to be in the special form of a (possibly multivariate) REC, and therefore they are complete sufficient statistics.
Theorem 10.4.2 If X1, ..., Xn is a random sample from a member of the regular exponential class REC(q1, ..., qk), then the statistics

S_j = Σ_{i=1}^n t_j(X_i)    j = 1, ..., k

are a minimal set of complete sufficient statistics for θ1, ..., θk.

Example 10.4.3 Consider the previous example, X ~ BIN(1, p). For a random sample of size n, t1(x) = x and S = Σ Xi is a complete sufficient statistic for p. If we desire a UMVUE of Var(X) = p(1 − p), we might try X̄(1 − X̄). Now

E[X̄(1 − X̄)] = E(X̄) − E(X̄²)
= p − [p² + Var(X̄)]
= p − p² − p(1 − p)/n
= p(1 − p)(1 − 1/n)

and thus E[nX̄(1 − X̄)/(n − 1)] = p(1 − p), and this gives cX̄(1 − X̄), where c = n/(n − 1), as the UMVUE of p(1 − p).
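A quick Monte Carlo check of the bias factor (1 − 1/n) derived above (illustrative only; p, n, and the seed are arbitrary):

```python
import random

random.seed(7)
p, n, reps = 0.3, 5, 100_000
raw = corrected = 0.0
for _ in range(reps):
    xbar = sum(random.random() < p for _ in range(n)) / n
    raw += xbar * (1 - xbar)                      # biased low by factor (1 - 1/n)
    corrected += n * xbar * (1 - xbar) / (n - 1)  # UMVUE of p(1-p)

print(round(raw / reps, 3))        # near p(1-p)(1 - 1/n) = 0.168
print(round(corrected / reps, 3))  # near p(1-p) = 0.210
```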
Example 10.4.4 If X ~ N(μ, σ²), then

f(x; μ, σ²) = (2πσ²)^(−1/2) exp[−μ²/(2σ²)] exp[(μ/σ²)x − (1/(2σ²))x²]

For a random sample of size n, it is clear that S1 = Σ Xi and S2 = Σ Xi² are jointly complete and sufficient statistics for μ and σ². Because the MLEs are one-to-one functions of S1 and S2, they also could be used as the jointly complete sufficient statistics here.
It can be shown that, under mild regularity conditions, families of density functions that admit k-dimensional sufficient statistics for all sample sizes must be k-parameter RECs. Thus, for the regular case, for most practical purposes Theorem 10.4.2 covers all of the models that admit complete sufficient statistics, and there is no point in attempting to find complete sufficient statistics in the regular case for models that are not in the REC form. We have seen that a close connection exists among the REC, complete sufficient statistics, and UMVUEs. Also, MLEs are functions of minimal sufficient statistics, and the MLEs are asymptotically efficient with asymptotic variance being the CRLB. If we call an estimator whose variance achieves the CRLB a CRLB estimator, then the following theorems can be stated.
Theorem 10.4.3 If a CRLB estimator T exists for τ(θ), then a single sufficient statistic exists, and T is a function of the sufficient statistic. Conversely, if a single sufficient statistic exists and the CRLB exists, then a CRLB estimator exists for some τ(θ).
Theorem 10.4.4 If the CRLB exists, then a CRLB estimator will exist for some function τ(θ) if and only if the density function is a member of the REC. Furthermore, the CRLB estimator of τ(θ) will be τ(θ̂), where θ̂ is the MLE of θ.
Most pdf's of practical interest that are not included in the REC belong to another general class, which allows the range of X, denoted by A = {x : f(x; θ) > 0}, to depend on θ.
Definition 10.4.3 A density function is said to be a member of the range-dependent exponential class, denoted by RDEC(q1, ..., qk), if it satisfies regularity conditions 2 and 3a or 3b of Definition 10.4.2 for j = 3, ..., k, and if it has the form

f(x; θ) = c(θ)h(x) exp[Σ_{j=3}^k q_j(θ3, ..., θk) t_j(x)]    (10.4.2)

where A = {x : q1(θ1, θ2) < x < q2(θ1, θ2)}.

We will include as special cases the following:

1. The one-parameter case, where

f(x; θ) = c(θ)h(x)    (10.4.3)

with A = {x : q1(θ) < x < q2(θ)}.

2. The two-parameter case, where

f(x; θ1, θ2) = c(θ1, θ2)h(x)    (10.4.4)

with A = {x : q1(θ1, θ2) < x < q2(θ1, θ2)}.
Theorem 10.4.5 Let X1, ..., Xn be a random sample from a member of the RDEC(q1, ..., qk).

1. If k > 2, then S1 = X_{1:n}, S2 = X_{n:n}, and S3, ..., Sk, where S_j = Σ_{i=1}^n t_j(X_i), are jointly sufficient for θ = (θ1, ..., θk).
2. In the two-parameter case, S1 = X_{1:n} and S2 = X_{n:n} are jointly sufficient for θ = (θ1, θ2).
3. In the one-parameter case, S1 = X_{1:n} and S2 = X_{n:n} are jointly sufficient for θ. If q1(θ) is increasing and q2(θ) is decreasing, then T1 = min[q1^(−1)(X_{1:n}), q2^(−1)(X_{n:n})] is a single sufficient statistic for θ. If q1(θ) is decreasing and q2(θ) is increasing, then T2 = max[q1^(−1)(X_{1:n}), q2^(−1)(X_{n:n})] is a single sufficient statistic for θ.

If one of the limits is constant and the other depends on a single parameter, say θ1, then the following theorem can be stated.
Theorem 10.4.6 Suppose that X1, ..., Xn is a random sample from a member of the RDEC.

1. If k > 2 and the lower limit is constant, say q1(θ) = a, then X_{n:n} and the statistics Σ_{i=1}^n t_j(X_i) are jointly sufficient for θ1 and θ_j; j = 3, ..., k. If the upper limit is constant, say q2(θ) = b, then X_{1:n} and the statistics Σ_{i=1}^n t_j(X_i) are jointly sufficient for θ1 and θ_j; j = 3, ..., k.
2. In the one-parameter case, if q1(θ) does not depend on θ, then S2 = X_{n:n} is sufficient for θ, and if q2(θ) does not depend on θ, then S1 = X_{1:n} is sufficient for θ.
Example 10.4.5 Consider the pdf

f(x; θ) = 1/(2θ)    −θ < x < θ

and zero otherwise. We have q1(θ) = −θ, a decreasing function, and q2(θ) = θ, an increasing function of θ. Thus, by Theorem 10.4.5, T2 = max[−X_{1:n}, X_{n:n}] is a single sufficient statistic for θ.

Example 10.4.6 Consider a two-parameter exponential distribution, X ~ EXP(θ, η):

f(x; θ, η) = (1/θ) exp[−(x − η)/θ] = (1/θ) exp(η/θ) exp(−x/θ)    x > η

If X1, ..., Xn is a random sample, then it follows from Theorem 10.4.6 that X_{1:n} and Σ Xi are jointly sufficient statistics for (θ, η). Because the upper limit is not a function of parameters, X_{n:n} is not involved. Suppose that θ is known, say θ = 1. Then

f(x; η) = e^(−(x−η)) = e^η e^(−x)    x > η

We see that X_{1:n} is sufficient for η. This is consistent with earlier results, where we found that estimators of η based on X_{1:n} were better than estimators based on other statistics, such as X̄, for this model.
Example 10.4.7 Consider a random sample of size n from a uniform distribution, UNIF(θ1, θ2). Because

f(x; θ1, θ2) = 1/(θ2 − θ1)    θ1 < x < θ2

it follows from Theorem 10.4.5 that X_{1:n} and X_{n:n} are jointly sufficient for (θ1, θ2).
The previous results deal only with finding sufficient statistics for members of the RDEC. These statistics also may be complete, but this must be verified by a separate argument.
Example 10.4.8 Consider a random sample of size n from a uniform distribution, X ~ UNIF(0, θ). It follows from the previous theorem that X_{n:n} is sufficient for θ. The pdf of S = X_{n:n} is

f(s; θ) = n s^(n−1)/θ^n    0 < s < θ

and zero otherwise. To verify completeness, assume that E[u(S)] = 0 for all θ > 0, which means that

∫_0^θ u(s) n s^(n−1)/θ^n ds = 0

If we multiply by θ^n and differentiate with respect to θ, then u(θ)nθ^(n−1) = 0 for all θ > 0, which implies u(s) = 0 for all s > 0, and thus S is a complete sufficient statistic for θ.
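The density of S = X_{n:n} above implies E(S) = nθ/(n + 1), so (n + 1)S/n is unbiased for θ; a simulation sketch, not from the text, with arbitrary θ, n, and seed:

```python
import random

random.seed(3)
theta, n, reps = 2.0, 4, 100_000
total = 0.0
for _ in range(reps):
    # Sample maximum of n uniforms on (0, theta)
    total += max(random.uniform(0, theta) for _ in range(n))

mean_max = total / reps
print(round(mean_max, 2))                # near n*theta/(n+1) = 1.6
print(round((n + 1) / n * mean_max, 2))  # near theta = 2.0
```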
The following interesting theorem sometimes is useful for establishing certain distributional results.

Theorem 10.4.7 Basu Let X1, ..., Xn have joint pdf f(x1, ..., xn; θ); θ ∈ Ω. Suppose that S = (S1, ..., Sk), where S1, ..., Sk are jointly complete sufficient statistics for θ, and suppose that T is any other statistic. If the distribution of T does not involve θ, then S and T are stochastically independent.
Proof We will consider the discrete case. Denote by f(t), f(s; θ), and f(t|s) the pdf of T, the pdf of S, and the conditional pdf of T given S = s, respectively. Consider the following expected value relative to the distribution of S:

E_S[f(t) − f(t|S)] = f(t) − Σ_s f(t|s)f(s; θ)
= f(t) − Σ_s f(s, t; θ)
= f(t) − f(t) = 0

Because S is a complete sufficient statistic, f(t|s) = f(t), which means that S and T are stochastically independent. The continuous case is similar.
Example 10.4.9 Consider a random sample of size n from a normal distribution, X_i ~ N(μ, σ²), and consider the MLEs μ̂ = X̄ and σ̂² = Σ(X_i − X̄)²/n. It is easy to verify that X̄ is a complete sufficient statistic for μ, for fixed values of σ². Also,

nσ̂²/σ² ~ χ²(n − 1)

which does not depend on μ. It follows that X̄ and σ̂² are independent random variables. Also, X̄ and σ̂² are jointly complete sufficient statistics for μ and σ², and quantities of the form (X_i − X̄)/σ̂ are distributed independently of μ and σ, so these quantities are stochastically independent of X̄ and σ̂².
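An illustrative simulation of this independence (not from the text; μ, σ, n, and the seed are arbitrary): the sample correlation between X̄ and σ̂² across many normal samples should be near zero. Zero correlation does not by itself prove independence, but here independence is the known result being illustrated.

```python
import random

random.seed(11)
n, reps = 5, 50_000
pairs = []
for _ in range(reps):
    xs = [random.gauss(10, 2) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n  # MLE of the variance
    pairs.append((xbar, s2))

mean_x = sum(a for a, _ in pairs) / reps
mean_s = sum(b for _, b in pairs) / reps
cov = sum((a - mean_x) * (b - mean_s) for a, b in pairs) / reps
var_x = sum((a - mean_x) ** 2 for a, _ in pairs) / reps
var_s = sum((b - mean_s) ** 2 for _, b in pairs) / reps
corr = cov / (var_x * var_s) ** 0.5
print(round(corr, 3))  # near 0
```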
SUMMARY
Our purpose in this chapter was to introduce the concepts of sufficiency and completeness. Generally speaking, a statistic provides a reduction of a set of data from some distribution to a more concise form. If a statistic is sufficient, then it contains, in a certain sense, all of the "information" in the data concerning an unknown parameter of the distribution. Although sufficiency can be verified directly from the definition, at least theoretically, this usually can be accomplished more easily by using the factorization criterion. If a statistic is sufficient and a unique MLE exists, then the MLE is a function of the sufficient statistic. Sufficient statistics also are important in the construction of UMVUEs. If a statistic is complete as well as sufficient for a parameter, and if an unbiased estimator of the parameter (or a function of the parameter) exists, then a UMVUE exists, and it is a function of the complete sufficient statistic. It often is difficult to verify completeness directly from the definition, but a special class of pdf's, known as the exponential class, provides a convenient way to identify complete sufficient statistics.
EXERCISES

1. Let X1, ..., Xn be a random sample from a Poisson distribution, X_i ~ POI(μ). Verify that S = Σ X_i is sufficient for μ by using equation (10.2.1).

2. Consider a random sample of size n from a geometric distribution, X_i ~ GEO(p). Use equation (10.2.1) to show that S = Σ X_i is sufficient for p.

3. Suppose that X1, ..., Xn is a random sample from a normal distribution, X_i ~ N(0, θ). Show that equation (10.2.1) does not depend on θ if S = Σ X_i².
4. Consider a random sample of size n from a two-parameter exponential distribution, X_i ~ EXP(1, η). Show that S = X_{1:n} is sufficient for η by using equation (10.2.1).

5. Let X1, ..., Xn be a random sample from a gamma distribution, X_i ~ GAM(θ, 2). Show that S = Σ X_i is sufficient for θ (a) by using equation (10.2.1), and (b) by the factorization criterion of equation (10.2.3).

6. Suppose that X1, X2, ..., Xn are independent, with X_i ~ BIN(m_i, p); i = 1, 2, ..., n. Show that S = Σ X_i is sufficient for p by the factorization criterion.

7. Let X1, X2, ..., Xn be independent with X_i ~ NB(r_i, p). Find a sufficient statistic for p.

8. Rework Exercise 4 using the factorization criterion.

9. Consider a random sample of size n from a Weibull distribution, X_i ~ WEI(θ, β). (a) Find a sufficient statistic for θ with β known, say β = 2. (b) If β is unknown, can you find a single sufficient statistic for β?

10. Let X1, ..., Xn be a random sample from a normal distribution, X_i ~ N(μ, σ²). (a) Find a single sufficient statistic for μ with σ² known. (b) Find a single sufficient statistic for σ² with μ known.

11. Consider a random sample of size n from a uniform distribution, X_i ~ UNIF(θ1, θ2). (a) Show that X_{1:n} is sufficient for θ1 if θ2 is known. (b) Show that X_{1:n} and X_{n:n} are jointly sufficient for θ1 and θ2.

12. Let X1, ..., Xn be a random sample from a two-parameter exponential distribution, X_i ~ EXP(θ, η). Show that X_{1:n} and X̄ are jointly sufficient for θ and η.

13. Suppose that X1, ..., Xn is a random sample from a beta distribution, X_i ~ BETA(θ1, θ2). Find joint sufficient statistics for θ1 and θ2.

14. Consider a random sample of size n from a uniform distribution, X_i ~ UNIF(θ, 2θ); θ > 0. Can you find a single sufficient statistic for θ? Can you find a pair of jointly sufficient statistics for θ?

15. For the random sample of Exercise 2, find the estimator of p obtained by maximizing the pdf of S = Σ X_i, and compare this with the usual MLE of p.

16. For the random variables X1, ..., Xn in Exercise 7, find the MLE of p by maximizing the pdf of the sufficient statistic. Is this the same as the usual MLE? Explain why this result is expected.

17. Consider the sufficient statistic, S = X_{1:n}, of Exercise 4. (a) Show that S also is complete. (b) Verify that X_{1:n} − 1/n is the UMVUE of η. (c) Find the UMVUE of the pth percentile.
18. Let X ~ N(0, θ); θ > 0. (a) Show that X² is complete and sufficient for θ. (b) Show that N(0, θ) is not a complete family.

19. Show that N(μ, μ²) does not belong to the regular exponential class.

20. Show that the following families of distributions belong to the regular exponential class, and for each case use this information to find complete sufficient statistics based on a random sample X1, ..., Xn: (a) BIN(1, p); 0 < p < 1.

21. Let X1, ..., Xn be a random sample from a Bernoulli distribution, X_i ~ BIN(1, p); 0 < p < 1.

22. Consider a random sample of size n from a Poisson distribution, X_i ~ POI(μ); μ > 0. Find the UMVUE of P[X = 0] = e^(−μ). Hint: Recall Exercise 33(g) of Chapter 9.

23. Suppose that X1, ..., Xn is a random sample from a normal distribution, X_i ~ N(μ, 9). (a) Find the UMVUE of the 95th percentile. (b) Find the UMVUE of P[X ≤ c], where c is a known constant. Hint: Find the conditional distribution of X1 given X̄ = x̄, and apply the Rao-Blackwell theorem with T = u(X1), where u(x1) = 1 if x1 ≤ c, and zero otherwise.

24. If X ~ POI(μ), show that T = (−1)^X is the UMVUE of e^(−2μ). Is this a reasonable estimator?

25. Consider a random sample of size n from a distribution with pdf f(x; θ) = θx^(θ−1) if 0 < x < 1, and zero otherwise; θ > 0. (a) Find the UMVUE of 1/θ. Hint: E[−ln X] = 1/θ. (b) Find the UMVUE of θ.

26. For the random sample of Exercise 11, show that the jointly sufficient statistics X_{1:n} and X_{n:n} also are complete. Suppose that it is desired to estimate the mean μ = (θ1 + θ2)/2. Find the UMVUE of μ. Hint: First find the expected values E(X_{1:n}) and E(X_{n:n}), and show that (X_{1:n} + X_{n:n})/2 is unbiased for the mean.
27. Let X1, ..., Xn be a random sample from a normal distribution, X_i ~ N(μ, σ²). (a) Find the UMVUE of σ². (b) Find the UMVUE of the 95th percentile. Hint: Recall Exercise 20 of Chapter 9.

28. Use Theorems 10.4.5 and 10.4.6 to find sufficient statistics for the parameters of the distributions in Exercises 5 and 6(b) of Chapter 9.

29. Consider a random sample of size n from a gamma distribution, X_i ~ GAM(θ, κ), and let X̄ = (1/n) Σ X_i and X̃ = (Π X_i)^(1/n) be the sample mean and geometric mean, respectively. (a) Show that X̄ and X̃ are jointly complete and sufficient for θ and κ. (b) Find the UMVUE of μ = θκ. (c) Find the UMVUE of μ². (d) Show that the distribution of T = X̃/X̄ does not depend on θ. (e) Show that X̄ and T are stochastically independent random variables. (f) Show that the conditional pdf of X̃ given X̄ does not depend on κ.

30. Consider a random sample of size n from a two-parameter exponential distribution, X_i ~ EXP(θ, η). Recall from Exercise 12 that X_{1:n} and X̄ are jointly sufficient for θ and η. (a) Because X_{1:n} is complete and sufficient for η for each fixed value of θ, argue from Theorem 10.4.7 that X_{1:n} and T = X̄ − X_{1:n} are stochastically independent. (b) Find the MLE θ̂ of θ. (c) Find the UMVUE of η. (d) Show that the conditional pdf of X_{1:n} given X̄ does not depend on θ. (e) Show that the distribution of Q = (X_{1:n} − η)/θ̂ is free of η and θ.

31. Let X1, ..., Xn be a random sample of size n from a distribution with pdf

f(x; θ) = θ(1 + x)^(−(1+θ))    0 < x; 0 < θ

and zero otherwise. (a) Find the MLE of θ. (b) Find a complete sufficient statistic for θ. (c) Find the CRLB for 1/θ. (d) Find the UMVUE of 1/θ. (e) Find the asymptotic normal distribution for θ̂ and also for τ(θ̂) = 1/θ̂. (f) Find the UMVUE of θ.
32. Consider a random sample of size n from a distribution with pdf

f(x; θ) = (ln θ)^x / (θ x!)    x = 0, 1, ...; θ > 1

and zero otherwise. (a) Find a complete sufficient statistic for θ. (b) Find the MLE of θ. (c) Find the CRLB for θ. (d) Find the UMVUE of ln θ. (e) Find the UMVUE of (ln θ)². (f) Find the CRLB for (ln θ)².

33. Suppose that only the first r order statistics are observed, based on a random sample of size n from an exponential distribution, X_i ~ EXP(θ). In other words, we have a Type II censored sample. (a) Find the MLE of θ based only on X_{1:n}, ..., X_{r:n}. (b) Relative to these order statistics, find a complete sufficient statistic for θ.
CHAPTER 11

INTERVAL ESTIMATION
11.1 INTRODUCTION
The problem of point estimation was discussed in Chapter 9. Along with a point estimate of the value of a parameter, we want to have some understanding of how close we can expect our estimate to be to the true value. Some information on this question is provided by knowing the variance or the MSE of the estimator. Another approach would be to consider interval estimates; one then could consider the probability that such an interval will contain the true parameter value. Indeed, one could adjust the interval to achieve some prescribed probability level, and thus a measure of its accuracy would be incorporated automatically into the interval estimate.
Example 11.1.1 In Example 4.6.3, the observed lifetimes (in months) of 40 electrical parts were given, and we argued that an exponential distribution of lifetimes might be reasonable. Consequently, we will assume that the data are the observed values of a
random sample of size n = 40 from an exponential distribution, X ~ EXP(θ), where θ is the mean lifetime. Recall that in Example 9.3.4 we found that the sample mean, X̄, is the UMVUE of θ. For the given set of data, the estimate of θ is x̄ = 93.1 months. Although we know that this estimate is based on an estimator with optimal properties, a point estimate in itself does not provide information about accuracy. Our solution to this problem will be to derive an interval whose endpoints are random variables that include the true value of θ between them with probability near 1, for example, 0.95. It was noted in Example 9.3.2 that 2nX̄/θ ~ χ²(2n), and we know that percentiles of the chi-square distribution are given in Table 4 (Appendix C). For example, with n = 40 and ν = 80, we find that χ²_{0.025}(80) = 57.15 and χ²_{0.975}(80) = 106.63. It follows that P[57.15 < 80X̄/θ < 106.63] = 0.975 − 0.025 = 0.95, and consequently P[80X̄/106.63 < θ < 80X̄/57.15] = 0.95.
In general, an interval with random endpoints will be called a random interval. In particular, the interval (80X̄/106.63, 80X̄/57.15) is a random interval that contains the true value of θ with probability 0.95. If we now replace X̄ with the estimate x̄ = 93.1, then the resulting interval is (69.9, 130.3). We will refer to this interval as a 95% confidence interval for θ. Because the estimated interval has known endpoints, it is not appropriate to say that it contains the true value of θ with probability 0.95. That is, the parameter θ, although unknown, is a constant, and this particular interval either does or does not contain θ. However, the fact that the associated random interval had probability 0.95 prior to estimation might lead us to assert that we are "95% confident" that 69.9 < θ < 130.3. The rest of the chapter will include a formal definition of confidence intervals and a discussion of general methods for deriving confidence intervals.
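The arithmetic of this example is simple enough to reproduce directly (a sketch using the tabled percentiles quoted above):

```python
# Recompute the 95% interval of Example 11.1.1 from the tabled percentiles
# chi2_{.025}(80) = 57.15 and chi2_{.975}(80) = 106.63.
n, xbar = 40, 93.1
lower = 2 * n * xbar / 106.63
upper = 2 * n * xbar / 57.15
print(round(lower, 1), round(upper, 1))  # close to the (69.9, 130.3) reported above
```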
11.2 CONFIDENCE INTERVALS
Let X1, ..., Xn have joint pdf f(x1, ..., xn; θ); θ ∈ Ω, where Ω is an interval. Suppose that L and U are statistics, say L = ℓ(X1, ..., Xn) and U = u(X1, ..., Xn). If an experiment yields data x1, ..., xn, then we have observed values ℓ(x1, ..., xn) and u(x1, ..., xn).
Definition 11.2.1 Confidence Interval An interval (ℓ(x1, ..., xn), u(x1, ..., xn)) is called a 100γ% confidence interval for θ if

P[ℓ(X1, ..., Xn) < θ < u(X1, ..., Xn)] = γ    (11.2.1)

where 0 < γ < 1.
Other notations that often are encountered in the statistical literature are θ_L and θ_U for lower and upper confidence limits, respectively. We also sometimes will use the abbreviated notations ℓ(x) = ℓ(x1, ..., xn) and u(x) = u(x1, ..., xn) to denote the observed limits.
Strictly speaking, a distinction should be made between the random interval (L, U) and the observed interval (ℓ(x), u(x)), as mentioned previously. This situation is analogous to the distinction in point estimation between an estimator and an estimate. Other terminology, which is useful in maintaining this distinction, is to call (L, U) an interval estimator and (ℓ(x), u(x)) an interval estimate. The probability level, γ, also is called the confidence coefficient or confidence level.
Perhaps the most common interpretation of a confidence interval is based on the relative frequency property of probability. Specifically, if such interval estimates are computed from many different samples, then in the long run we would expect approximately 100γ% of the intervals to include the true value of θ. That is, our confidence is in the method, and because of Definition 11.2.1, the confidence level reflects the long-term frequency interpretation of probability. It often is desirable to have either a lower or an upper confidence limit, but not both.
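This long-run frequency interpretation can be sketched by simulation (not from the text; θ, the sample size, and the seed are arbitrary): generate many exponential samples with a known mean, form the 95% interval of Example 11.1.1 for each, and count how often the interval covers the true θ.

```python
import random

random.seed(5)
theta, n, reps = 100.0, 40, 20_000
cover = 0
for _ in range(reps):
    xbar = sum(random.expovariate(1 / theta) for _ in range(n)) / n
    lower = 2 * n * xbar / 106.63   # chi2_{.975}(80)
    upper = 2 * n * xbar / 57.15    # chi2_{.025}(80)
    if lower < theta < upper:
        cover += 1

print(round(cover / reps, 3))  # near 0.95
```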
Definition 11.2.2 One-Sided Confidence Limits If

P[ℓ(X1, ..., Xn) < θ] = γ    (11.2.2)

then ℓ(x) = ℓ(x1, ..., xn) is called a one-sided lower 100γ% confidence limit for θ. If

P[θ < u(X1, ..., Xn)] = γ    (11.2.3)

then u(x) = u(x1, ..., xn) is called a one-sided upper 100γ% confidence limit for θ.
It may not always be clear how to obtain confidence limits that satisfy Definitions 11.2.1 or 11.2.2. The concept of sufficiency often offers some aid in this problem. If a single sufficient statistic S exists, then one might consider finding confidence limits that are functions of S. Otherwise, another reasonable statistic, such as an MLE, might be considered.
Example 11.2.1 We take a random sample of size n from an exponential distribution, X_i ~ EXP(θ), and we wish to derive a one-sided lower 100γ% confidence limit for θ. We know that X̄ is sufficient for θ and also that 2nX̄/θ ~ χ²(2n). As mentioned in Chapter 8, γth percentiles, χ²_γ(ν), are provided in Table 4 (Appendix C). Thus

γ = P[2nX̄/θ < χ²_γ(2n)] = P[2nX̄/χ²_γ(2n) < θ]

If x̄ is observed, then a one-sided lower 100γ% confidence limit is given by

ℓ(x) = 2nx̄/χ²_γ(2n)    (11.2.4)

Similarly, a one-sided upper 100γ% confidence limit is given by

u(x) = 2nx̄/χ²_{1−γ}(2n)    (11.2.5)

Notice that in the case of an upper limit we must use the value 1 − γ rather than γ when we read Table 4. For example, if a one-sided upper 90% confidence limit is desired, 1 − γ = 1 − 0.90 = 0.10. For a sample of size n = 40, the required percentile is χ²_{0.10}(80) = 64.28, and the desired upper confidence limit has the form u(x) = 80x̄/64.28.

Suppose that we want a 100γ% confidence interval for θ. If we choose values α1 > 0 and α2 > 0 such that α1 + α2 = α = 1 − γ, then it follows that

P[χ²_{α1}(2n) < 2nX̄/θ < χ²_{1−α2}(2n)] = 1 − α1 − α2

and thus

P[2nX̄/χ²_{1−α2}(2n) < θ < 2nX̄/χ²_{α1}(2n)] = γ

It is common in practice to let α1 = α2, which is known as the equal-tailed choice, and this would imply α1 = α2 = α/2. The corresponding confidence interval has the form

(2nx̄/χ²_{1−α/2}(2n), 2nx̄/χ²_{α/2}(2n))    (11.2.6)
Generally speaking, for a prescribed confidence level, we want to use a method that produces an interval with some optimal property, such as minimal length. Actually, the length U − L of the corresponding random interval generally will be a random variable, so a criterion such as minimum expected length might be more appropriate. For some problems, the equal-tailed choice of α1 and α2 will
provide the minimum expected length, but for others it will not. For example, interval (11.2.6) of the previous example does not have this property (see Exercise 26).
Example 11.2.2 Consider a random sample from a normal distribution, X_i ~ N(μ, σ²), where σ² is assumed to be known. In this case X̄ is sufficient for μ, and it is known that Z = √n(X̄ − μ)/σ ~ N(0, 1). By symmetry, we also know that z_{α/2} = −z_{1−α/2}, and thus

1 − α = P[−z_{1−α/2} < √n(X̄ − μ)/σ < z_{1−α/2}]
= P[X̄ − z_{1−α/2} σ/√n < μ < X̄ + z_{1−α/2} σ/√n]

It follows that a 100(1 − α)% confidence interval for μ is given by

(x̄ − z_{1−α/2} σ/√n, x̄ + z_{1−α/2} σ/√n)    (11.2.7)

For example, for a 95% confidence interval, 1 − α/2 = 0.975 and the upper and lower confidence limits are x̄ ± 1.96σ/√n.
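A minimal numeric sketch of interval (11.2.7); μ, σ, n, and the seed are made-up values for illustration only:

```python
import random

random.seed(2)
mu, sigma, n = 50.0, 4.0, 25          # hypothetical values; sigma is "known"
xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
half_width = 1.96 * sigma / n ** 0.5  # 1.96 * 4 / 5 = 1.568
print(round(xbar - half_width, 2), round(xbar + half_width, 2))
```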
Notice that this solution is not acceptable if σ² is unknown, because the confidence limits then would depend on an unknown parameter and could not be computed. With a slightly modified derivation it will be possible to obtain a confidence interval for μ, even if σ² is an unknown "nuisance parameter." Indeed, a major difficulty in determining confidence intervals arises in multiparameter cases where unknown nuisance parameters are present. A general method that often provides a way of dealing with this problem is presented in the next section. In multiparameter cases it also may be desirable to have a "joint confidence region" that applies to all parameters simultaneously. Also, a confidence region for a single parameter, in the one-dimensional case, could be some set other than an interval. In general, if θ ∈ Ω, then any region A(x1, ..., xn) in Ω is a 100γ% confidence region if the probability is γ that A(X1, ..., Xn) contains the true value of θ.
11.3 PIVOTAL QUANTITY METHOD
Suppose that X1, ..., Xn has joint pdf f(x1, ..., xn; θ), and we wish to obtain confidence limits for θ, where other unknown nuisance parameters also may be present.
Definition 11.3.1 Pivotal Quantity If Q = Q(X1, ..., Xn; θ) is a random variable that is a function only of X1, ..., Xn and θ, then Q is called a pivotal quantity if its distribution does not depend on θ or any other unknown parameters.
Example 11.3.1 In Example 11.2.1, we encountered a chi-square distributed random variable, which will be denoted here as Q = 2nX̄/θ, and which clearly satisfies the definition of a pivotal quantity. In that example we were able to proceed from a probability statement about Q to obtain confidence limits for θ. More generally, if Q is a pivotal quantity for a parameter θ and if percentiles of Q, say q1 and q2, are available such that

P[q1 < Q(X1, ..., Xn; θ) < q2] = γ    (11.3.1)

then for an observed sample, x1, ..., xn, a 100γ% confidence region for θ is the set of θ ∈ Ω that satisfy

q1 < Q(x1, ..., xn; θ) < q2    (11.3.2)
Such a confidence region will not necessarily be an interval, and in general it might be quite complicated. However, in some rather important situations confidence intervals can be obtained. One general situation that will always yield an interval occurs when, for each fixed set of values x1, ..., xn, the function q(x1, ..., xn; θ) is a monotonic increasing (or decreasing) function of θ. It also is possible to identify certain types of distributions that will admit pivotal quantities. Specifically, Chapter 3 included a discussion of location and scale parameter models, which include most of the special distributions we have considered. Recall that a parameter θ is a location parameter if the pdf has the form f(x; θ) = f0(x − θ), and it is a scale parameter if it has the form f(x; θ) = (1/θ)f0(x/θ), where f0(z) is a pdf that is free of unknown parameters (including θ). In the case of location-scale parameters, say θ1 and θ2, the pdf has the form f(x; θ1, θ2) = (1/θ2)f0[(x − θ1)/θ2]. If MLEs exist in any of these cases, then they can be used to form pivotal quantities.
Theorem 11.3.1 Let X1, ..., Xn be a random sample from a distribution with pdf f(x; θ) for θ ∈ Ω, and assume that an MLE θ̂ exists.
1. If θ is a location parameter, then Q = θ̂ − θ is a pivotal quantity.
2. If θ is a scale parameter, then Q = θ̂/θ is a pivotal quantity.
We already have seen examples of pivotal quantities that are slight variations of the ones suggested in this theorem. Specifically, recall Example 11.2.2, where
Xi ~ N(μ, σ²). With σ² known, μ is a location parameter, and the MLE is X̄; thus X̄ − μ is a pivotal quantity. In Example 11.2.1, Xi ~ EXP(θ), so that θ is a scale parameter and the MLE is X̄; thus X̄/θ is a pivotal quantity. Notice that it sometimes is convenient to make a slight modification, such as multiplying by a known scale factor, so that the pivotal quantity has a known distribution. For example, we know that 2nX̄/θ ~ χ²(2n), which has tabulated percentiles, so it might be better to let this be our pivotal quantity rather than X̄/θ.
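As a numerical illustration (our own sketch, not part of the text; the function name is ours, and SciPy's chi-square percentiles stand in for the tables in Appendix C), the pivotal quantity 2ΣXi/θ = 2nX̄/θ ~ χ²(2n) can be inverted to produce a two-sided interval for θ:

```python
from scipy.stats import chi2

def exp_theta_ci(x, conf=0.95):
    """CI for theta in EXP(theta), from the pivotal quantity 2*sum(x)/theta ~ chi2(2n)."""
    n = len(x)
    t = 2.0 * sum(x)                  # observed value of 2*sum(X_i)
    a = 1.0 - conf
    # Invert  chi2_{a/2}(2n) < 2*sum(x)/theta < chi2_{1-a/2}(2n)  for theta.
    return t / chi2.ppf(1 - a / 2, 2 * n), t / chi2.ppf(a / 2, 2 * n)
```

The interval always contains the point estimate x̄, since the chi-square percentiles straddle the mean 2n of the χ²(2n) distribution.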
Theorem 11.3.2 Let X1, ..., Xn be a random sample from a distribution with location-scale parameters θ1 and θ2,

f(x; θ1, θ2) = (1/θ2) f0[(x − θ1)/θ2]

If MLEs θ̂1 and θ̂2 exist, then (θ̂1 − θ1)/θ̂2 and θ̂2/θ2 are pivotal quantities for θ1 and θ2, respectively.
We will not prove this theorem here, but details are provided by Antle and Bain (1969).
Notice also that (θ̂1 − θ1)/θ2 has a distribution that is free of unknown parameters, but it is not a pivotal quantity unless θ2 is known. If sufficient statistics exist, then MLEs can be found that are functions of them, and the method should provide good results.
Example 11.3.2 Consider a random sample from a normal distribution, Xi ~ N(μ, σ²), where both μ and σ² are unknown. If μ̂ and σ̂ are the MLEs of μ and σ, then (μ̂ − μ)/σ̂ and σ̂/σ are pivotal quantities, which could be used to derive confidence intervals for each parameter with the other considered as an unknown nuisance parameter.
It will be convenient to express the results in terms of the unbiased estimator S² = nσ̂²/(n − 1) to take advantage of some known distributional properties, namely
(X̄ − μ)/(S/√n) ~ t(n − 1)    (11.3.3)

and

(n − 1)S²/σ² ~ χ²(n − 1)    (11.3.4)
If t1−α/2 = t1−α/2(n − 1) is the (1 − α/2)th percentile of the t distribution with n − 1 degrees of freedom, then
1 − α = P[−t1−α/2 < (X̄ − μ)/(S/√n) < t1−α/2]
      = P[X̄ − t1−α/2 S/√n < μ < X̄ + t1−α/2 S/√n]

which means that a 100(1 − α)% confidence interval for μ is given by

(x̄ − t1−α/2 s/√n, x̄ + t1−α/2 s/√n)    (11.3.5)
with observed values x̄ and s.
Similarly, if χ²α/2 = χ²α/2(n − 1) and χ²1−α/2 = χ²1−α/2(n − 1) are the (α/2)th and (1 − α/2)th percentiles of the chi-square distribution with n − 1 degrees of freedom, then

1 − α = P[χ²α/2 < (n − 1)S²/σ² < χ²1−α/2]
      = P[(n − 1)S²/χ²1−α/2 < σ² < (n − 1)S²/χ²α/2]

and a 100(1 − α)% confidence interval for σ² is given by

((n − 1)s²/χ²1−α/2, (n − 1)s²/χ²α/2)    (11.3.6)
Also, confidence limits for σ are obtained by computing the positive square roots of these limits.
In general, if (θL, θU) is a 100γ% confidence interval for a parameter θ, and if τ(θ) is a monotonic increasing function of θ ∈ Ω, then (τ(θL), τ(θU)) is a 100γ% confidence interval for τ(θ).
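A small sketch (ours, not from the text; SciPy supplies the t and chi-square percentiles) computing the intervals (11.3.5) and (11.3.6) from raw data:

```python
import math
from scipy.stats import t, chi2

def normal_cis(x, conf=0.95):
    """t-interval (11.3.5) for mu and chi-square interval (11.3.6) for sigma^2."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # unbiased S^2
    a = 1.0 - conf
    half = t.ppf(1 - a / 2, n - 1) * math.sqrt(s2 / n)
    mu_ci = (xbar - half, xbar + half)
    var_ci = ((n - 1) * s2 / chi2.ppf(1 - a / 2, n - 1),
              (n - 1) * s2 / chi2.ppf(a / 2, n - 1))
    return mu_ci, var_ci
```

Taking square roots of the variance limits gives the interval for σ, as noted above.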
Example 11.3.3 In Example 9.2.13, the computation of MLEs for the parameters of a Weibull distribution, X ~ WEI(θ, β), was discussed. Although the Weibull distribution is not a location-scale model, it is not difficult to show that the distribution of Y = ln X is an extreme-value distribution that is a location-scale model. Specifically,
f(y; θ1, θ2) = (1/θ2) f0[(y − θ1)/θ2]    (11.3.7)

where f0(z) = exp(z − e^z). The relationship between parameters is θ2 = 1/β and θ1 = ln θ, and thus

Q1 = β̂ ln(θ̂/θ) = (θ̂1 − θ1)/θ̂2    (11.3.8)
and

Q2 = β̂/β = (θ̂2/θ2)⁻¹    (11.3.9)
are pivotal quantities for θ and β. Because the MLEs must be computed by iterative methods for this model, there is no known way to derive the exact distributions of Q1 and Q2, but percentiles can be obtained by computer simulation.
Tables of these percentiles and derivations of the confidence intervals are given by Bain and Engelhardt (1991). Approximate distributions are given in Chapter 16.
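A rough sketch of how such percentiles might be simulated (our own code, not the published tables): by pivotality we may generate data with θ = β = 1, i.e., standard Weibull samples, and SciPy's `weibull_min.fit` with `floc=0` serves as the iterative MLE. Percentile estimates will of course vary with the number of replications.

```python
import numpy as np
from scipy.stats import weibull_min

def q2_percentiles(n, reps=500, probs=(0.05, 0.95), seed=0):
    """Approximate percentiles of Q2 = beta_hat/beta by simulation at theta = beta = 1."""
    rng = np.random.default_rng(seed)
    q2 = np.empty(reps)
    for i in range(reps):
        sample = rng.weibull(1.0, size=n)                 # WEI(1, 1) data
        beta_hat, _, _ = weibull_min.fit(sample, floc=0)  # shape MLE (loc fixed at 0)
        q2[i] = beta_hat                                  # beta = 1, so Q2 = beta_hat
    return np.quantile(q2, probs)
```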
It may not always be possible to find a pivotal quantity based on MLEs, but for a sample from a continuous distribution with a single unknown parameter, at least one pivotal quantity can always be derived by use of the probability integral transform. If X ~ f(x; θ) and if F(x; θ) is the CDF of X, then it follows from Theorem 6.3.3 that F(X; θ) ~ UNIF(0, 1), and consequently −ln F(Xi; θ) ~ EXP(1). For a random sample X1, ..., Xn, it follows that
−2 Σ ln F(Xi; θ) ~ χ²(2n)    (11.3.10)
so that

P[χ²α/2(2n) < −2 Σ ln F(Xi; θ) < χ²1−α/2(2n)] = 1 − α    (11.3.11)
and inverting this statement will provide a confidence region for θ. If the CDF is not in closed form or if it is too complicated, then the inversion may have to be done numerically. If F(x; θ) is a monotonic increasing (or decreasing) function of θ, then the resulting confidence region will be an interval. Notice also that 1 − F(X; θ) ~ UNIF(0, 1), and
−2 Σ ln [1 − F(Xi; θ)] ~ χ²(2n)    (11.3.12)
In general, expressions (11.3.10) and (11.3.12) will give different intervals, and perhaps computational convenience would be a reasonable criterion for choosing between them.
Example 11.3.4 Consider a random sample from a Pareto distribution, Xi ~ PAR(1, κ). The CDF is

F(x; κ) = 1 − (1 + x)⁻κ,  x > 0
If we use equation (11.3.12), then −ln [1 − F(x; κ)] = κ ln(1 + x), so

2κ Σ ln(1 + Xi) ~ χ²(2n)
and a 100(1 − α)% confidence interval for κ has the form

(χ²α/2(2n) / [2 Σ ln(1 + xi)], χ²1−α/2(2n) / [2 Σ ln(1 + xi)])

The solution based on equation (11.3.10) would be much harder, because the resulting inequality would have to be solved numerically.
For discrete distributions, and for some multiparameter problems, a pivotal quantity may not exist. However, an approximate pivotal quantity often can be obtained based on asymptotic results. The normal approximation to the binomial distribution as discussed in Chapter 7 is an example.

APPROXIMATE CONFIDENCE INTERVALS

Let X1, ..., Xn be a random sample from a distribution with pdf f(x; θ). As noted in Chapter 9, MLEs are asymptotically normal under certain conditions.
Example 11.3.5 Consider a random sample from a Bernoulli distribution, Xi ~ BIN(1, p). The MLE of p is p̂ = ΣXi/n. We also know that ΣXi is sufficient and that ΣXi ~ BIN(n, p), but there is no pivotal quantity for p. However, by the CLT,

(p̂ − p)/√(p(1 − p)/n) → Z ~ N(0, 1)    (11.3.13)
and consequently, for large n,

P[−z1−α/2 < (p̂ − p)/√(p(1 − p)/n) < z1−α/2] ≈ 1 − α    (11.3.14)
This approximation is enhanced by using the continuity correction, as discussed in Chapter 7, but we will not pursue this point. Limits for an approximate 100(1 − α)% confidence interval (pL, pU) for p are obtained by solving for the smaller solution of

(p̂ − p0)/√(p0(1 − p0)/n) = z1−α/2    (11.3.15)

and the larger solution of

(p̂ − p1)/√(p1(1 − p1)/n) = −z1−α/2    (11.3.16)
The common practice in this problem is to simplify the limits by using the limiting result that

(p̂ − p)/√(p̂(1 − p̂)/n) → Z ~ N(0, 1)    (11.3.17)

as n → ∞, which was shown in Example 7.7.2. Thus, for large n, we also have the approximate result

P[−z1−α/2 < (p̂ − p)/√(p̂(1 − p̂)/n) < z1−α/2] ≈ 1 − α    (11.3.18)
This statement is much easier to invert, and approximate confidence limits for p are given by

p̂ ± z1−α/2 √(p̂(1 − p̂)/n)    (11.3.19)
An important point here is that the random variables defined by expressions (11.3.13) and (11.3.17) are not pivotal quantities for any finite n, because their exact distributions depend on p. However, the limiting distribution is standard normal, which does not involve p, and hence the degree to which the exact distribution depends on p should be small for large n, and the variables can be regarded as approximate pivotal quantities.
Other important distributions also admit approximate pivotal quantities.
Example 11.3.6 Consider a random sample of size n from a Poisson distribution, Xi ~ POI(μ). By the CLT, we know that

(X̄ − μ)/√(μ/n) → Z ~ N(0, 1)    (11.3.20)

and thus by Theorem 7.7.4 that

(X̄ − μ)/√(X̄/n) → Z ~ N(0, 1)    (11.3.21)
as n → ∞. Either of these random variables could be used to derive approximate confidence intervals, although expression (11.3.21) would be more convenient.
Actually, it is possible to generalize this approach when MLEs are asymptotically normal (see Exercise 29).
11.4 GENERAL METHOD

If a pivotal quantity is not available, then it is still possible to determine a confidence region for a parameter θ if a statistic exists with a distribution that depends on θ but not on any other unknown nuisance parameters. Specifically, let X1, ..., Xn have joint pdf f(x1, ..., xn; θ), and let S = s(X1, ..., Xn) ~ g(s; θ).
Preferably S will be sufficient for θ, or possibly some reasonable estimator such as an MLE, but this is not required. Now, for each possible value of θ, assume that we can find values h1(θ) and h2(θ) such that
P[h1(θ) < S < h2(θ)] = 1 − α    (11.4.1)

If we observe S = s, then the set of values θ ∈ Ω that satisfy h1(θ) < s < h2(θ) is a 100(1 − α)% confidence region for θ.
Example 11.4.1 Consider a random sample of size n from the continuous distribution with pdf

f(x; θ) = (1/θ²) exp[−(x − θ)/θ²],  x ≥ θ
f(x; θ) = 0,  x < θ

with θ > 0. There is no single sufficient statistic, but X1:n and X̄ are jointly sufficient for θ. It is desired to derive a 90% confidence interval for θ based on the statistic S = X1:n. The CDF of S is

G(s; θ) = 1 − exp[−n(s − θ)/θ²],  s ≥ θ
G(s; θ) = 0,  s < θ
One possible choice of functions h1(θ) and h2(θ) that satisfy equation (11.4.1) is obtained by solving

G(h1(θ); θ) = 0.05  and  G(h2(θ); θ) = 0.95
FIGURE 11.1 Functions h1(θ) and h2(θ) for the general method of constructing confidence intervals.
This yields the functions

h1(θ) = θ − ln(0.95) θ²/n = θ + 0.0513 θ²/n

and

h2(θ) = θ − ln(0.05) θ²/n = θ + 2.996 θ²/n
The graphs of h1(θ) and h2(θ) with n = 10 are shown in Figure 11.1.
Suppose now that a sample of size n = 10 yields a minimum observation s = x1:10 = 2.50. The solutions of 2.50 = h1(θ) and 2.50 = h2(θ) are θ1 = 2.469 and θ2 = 1.667. Because h1(θ) and h2(θ) are increasing, the set of all θ > 0 such that h1(θ) < 2.50 < h2(θ) is the interval (1.667, 2.469), which is a 90% confidence interval for θ.
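The two solutions can be checked numerically. In this sketch (ours), SciPy's `brentq` root-finder inverts h1 and h2 at s = 2.50 with n = 10:

```python
import math
from scipy.optimize import brentq

n, s = 10, 2.50  # sample size and observed minimum from the example

def h1(th):
    # h1(theta) = theta - ln(0.95) * theta^2 / n
    return th - math.log(0.95) * th * th / n

def h2(th):
    # h2(theta) = theta - ln(0.05) * theta^2 / n
    return th - math.log(0.05) * th * th / n

theta_u = brentq(lambda th: h1(th) - s, 1e-6, 10.0)  # solve h1(theta_U) = s
theta_l = brentq(lambda th: h2(th) - s, 1e-6, 10.0)  # solve h2(theta_L) = s
```

The roots agree with the values 1.667 and 2.469 quoted in the example.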
Because confidence limits in this approach are values of θ that satisfy h1(θ) = s and h2(θ) = s, a more suggestive notation might be θL and θU rather than ℓ(x) and u(x).
In general, if h1(θ) and h2(θ) are both increasing, then the endpoints of the confidence interval can be determined for any observed s by solving for the lower limit θL such that h2(θL) = s, and for the upper limit θU such that h1(θU) = s. The argument that (θL, θU) is a 100(1 − α)% confidence interval is illustrated graphically by Figure 11.2. If θ0 is the true value of θ, then P[h1(θ0) < S < h2(θ0)] = 1 − α, and the event h1(θ0) < S < h2(θ0) occurs if and only if θL < θ0 < θU, so this probability is 1 − α. If h1(θ) and h2(θ) are both decreasing, then the argument is similar, but in this case h1(θL) = s and h2(θU) = s. These results can be conveniently formulated in terms of the CDF of S.
FIGURE 11.2 A confidence interval based on the general method.
Theorem 11.4.1 Let the statistic S be continuous with CDF G(s; θ), and suppose that h1(θ) and h2(θ) are increasing functions that satisfy

G(h1(θ); θ) = α1    (11.4.2)

and

G(h2(θ); θ) = 1 − α2    (11.4.3)

for each θ ∈ Ω, where 0 < α1 < 1 and 0 < α2 < 1. A one-sided lower 100(1 − α2)% confidence limit, θL, is the solution of

h2(θL) = s    (11.4.4)

A one-sided upper 100(1 − α1)% confidence limit, θU, is the solution of

h1(θU) = s    (11.4.5)

If α = α1 + α2 and 0 < α < 1, then (θL, θU) is a 100(1 − α)% confidence interval for θ.
The theorem is modified easily for the case where h1(θ) and h2(θ) are decreasing. In particular, if h1(θL) = s, then θL is a one-sided lower 100(1 − α1)% confidence limit, and if h2(θU) = s, then θU is a one-sided upper 100(1 − α2)% confidence limit.
Example 11.4.2 Consider again a random sample from an exponential distribution, Xi ~ EXP(θ), and recall that S = ΣXi is sufficient for θ. Because 2S/θ ~ χ²(2n), we have

α1 = P[S ≤ h1(θ)] = P[2S/θ ≤ 2h1(θ)/θ]

which implies

2h1(θ)/θ = χ²α1(2n)

and

h1(θ) = θ χ²α1(2n)/2

This is an increasing function of θ, and the solution of h1(θU) = s provides a one-sided upper 100(1 − α1)% confidence limit θU = 2s/χ²α1(2n), which we also obtained by the pivotal quantity approach. The function h2(θ) and θL are obtained in a similar manner.
There exist examples where h1(θ) and h2(θ) are not monotonic, and where the resulting confidence region is not an interval (see Exercise 25). Also note that in practice it is not necessary to know h1(θ) and h2(θ) for all θ, but it will be necessary to know the values of θ for which h1(θ) = s and h2(θ) = s, and whether these functions are increasing or decreasing, in order to know which function gives an upper limit and which gives a lower limit. It can be shown that if G(s; θ) is a decreasing function of θ for each fixed s, then both h1(θ) and h2(θ) are increasing functions of θ. This suggests the following theorem.
Theorem 11.4.2 Suppose that the statistic S is continuous with CDF G(s; θ), and let s be an observed value of S. If G(s; θ) is a decreasing function of θ, then the following statements hold:
1. A one-sided lower 100(1 − α2)% confidence limit, θL, is provided by a solution of

G(s; θL) = 1 − α2    (11.4.6)

2. A one-sided upper 100(1 − α1)% confidence limit, θU, is provided by a solution of

G(s; θU) = α1    (11.4.7)

If α = α1 + α2 and 0 < α < 1, then (θL, θU) is a 100(1 − α)% confidence interval for θ.
A similar theorem can be stated for the case where G(s; θ) is increasing in θ. In particular, if G(s; θU) = 1 − α2, then θU is a one-sided upper 100(1 − α2)% confidence limit; and if G(s; θL) = α1, then θL is a one-sided lower 100(1 − α1)% confidence limit.
Example 11.4.3 Consider the statistic S = X1:n of Example 11.4.1. For any fixed s, with the substitution t = s/θ, G(s; θ) can be written as 1 − exp[−(n/s)(t − 1)t] for t ≥ 1. The derivative with respect to t is (n/s)(2t − 1) exp[−(n/s)(t − 1)t], which is positive because 2t − 1 > 0 when t ≥ 1. Consequently, G(s; θ) is an increasing function of t and thus a decreasing function of θ = s/t. It follows from the theorem that one-sided lower and upper 95% confidence limits for observed s = 2.50 and n = 10 are obtained by solving G(2.50; θL) = 0.95 and G(2.50; θU) = 0.05. These solutions are θL = 1.667 and θU = 2.469, as before.
It also is possible to state a more general theorem that includes discrete cases, but it is not always possible to achieve a prescribed confidence level when the observed statistic is discrete. However, "conservative" confidence intervals, in general, can be obtained.
Definition 11.4.1 An observed confidence interval (θL, θU) is called a conservative 100(1 − α)% confidence interval for θ if the corresponding random interval contains the true value of θ with probability at least 1 − α.
Conservative one-sided confidence limits can be defined similarly.
Theorem 11.4.3 Let S be a statistic with CDF G(s; θ), and let h1(θ) and h2(θ) be functions that satisfy

G(h1(θ); θ) = α1  and  P[S < h2(θ); θ] = 1 − α2    (11.4.8)

where 0 < α1 < 1 and 0 < α2 < 1.
1. If h1(θ) and h2(θ) are increasing functions, then a conservative one-sided lower 100(1 − α2)% confidence limit for θ, based on an observed value s of S, is a solution of h2(θL) = s, or θ = θL such that

P[S < s; θL] = 1 − α2    (11.4.9)

A conservative one-sided upper 100(1 − α1)% confidence limit is a solution of h1(θU) = s, or G(s; θU) = α1.
2. If h1(θ) and h2(θ) are decreasing functions, then a conservative one-sided lower 100(1 − α1)% confidence limit is a solution of h1(θL) = s, or G(s; θL) = α1. A conservative one-sided upper 100(1 − α2)% confidence limit is a solution of h2(θU) = s, or θ = θU such that

P[S < s; θU] = 1 − α2    (11.4.10)

In either case, if α = α1 + α2 and 0 < α < 1, then (θL, θU) is a conservative 100(1 − α)% confidence interval for θ.
An exact prescribed confidence level may not be achievable if S is discrete, but the confidence levels will be at least the stated levels. This requires keeping the strict inequality in conditions (11.4.8), (11.4.9), and (11.4.10). Of course, if S is continuous, then P[S < s; 0] = G(s; 0), and the previous theorems apply, yielding exact confidence levels.
Consider the case of a discrete distribution, G(s; θ), where h1(θ) is an increasing function and G(s; θ) is a decreasing function of θ. Let S assume the discrete values s1, s2, ..., and suppose that there are parameter values θ1, θ2, ... such that G(si; θi) = α1. If S = si is observed, then let θU = θi be the upper 100(1 − α1)% confidence limit. The confidence level will be greater than 1 − α1 for intermediate values of θ. If θi−1 < θ < θi, then the confidence interval will contain θ if the observed value of S is greater than or equal to si, which will occur with probability

P[S ≥ si | θi−1 < θ < θi] = 1 − G(si−1; θ) ≥ 1 − G(si−1; θi−1) = 1 − α1

Similarly, suppose that G(si−1; θi) = 1 − α2; that is, if S = si, then θL = θi is the solution of G(si−1; θL) = 1 − α2. Now consider a value θi < θ < θi+1. This value will be in the confidence interval if S ≤ si, which occurs with probability

P[S ≤ si | θi < θ < θi+1] = G(si; θ) ≥ G(si; θi+1) = 1 − α2
Example 11.4.4 In Example 11.3.5, two approaches were presented for obtaining approximate confidence intervals for the binomial parameter p, based on large-sample approximations. We now desire to derive conservative one-sided (1 − α)100% confidence limits for p. We know that S = ΣXi is a sufficient statistic, and S ~ BIN(n, p). We will not find explicit expressions for h1(p) and h2(p) in this example, but note that
G(s; p) = B(s; n, p) is a decreasing function of p. Thus, for an observed value s, a solution pU of

α1 = B(s; n, pU) = Σ_{y=0}^{s} C(n, y) pU^y (1 − pU)^(n−y)

is a conservative one-sided upper limit, and a solution pL of

1 − α2 = B(s − 1; n, pL) = Σ_{y=0}^{s−1} C(n, y) pL^y (1 − pL)^(n−y)

is a conservative one-sided lower limit. If α = α1 + α2, then a conservative 100(1 − α)% confidence interval is given by (pL, pU). For a specified value of S, pL and pU can be determined by interpolation from a cumulative binomial table such as Table 1 (Appendix C), or they can be obtained by numerical methods applied to the CDF. For example, suppose that n = 10 and s = 2. If α1 = 0.05, then B(2; 10, 0.5) = 0.0547 and B(2; 10, 0.55) = 0.0274. Linear interpolation yields pU ≈ 0.509. By trying a few more values, we obtain a closer value, B(2; 10, 0.507) = 0.05, so that pU ≈ 0.507. Similarly, if α2 = 0.05, we can find B(1; 10, 0.037) = 0.95 and thus pL = 0.037. It follows also that (0.037, 0.507) is a conservative 90% confidence interval for p.
Example 11.4.5 Recall that methods for deriving approximate confidence intervals for the mean, μ, of a Poisson distribution were discussed in Example 11.3.6. For a random sample of size n, Xi ~ POI(μ), a sufficient statistic is S = ΣXi, and S ~ POI(nμ). Because the CDF of the Poisson distribution is related to the CDF of a chi-square distribution (see Exercise 21, Chapter 8), the confidence limits can be expressed conveniently in terms of chi-square percentiles. If we denote by H(y; ν) the CDF of a chi-square variable with ν degrees of freedom, then a conservative upper 100(1 − α1)% confidence limit for μ, for an observed value s, is a solution of
α1 = G(s; μU) = 1 − H(2nμU; 2s + 2)

which means that

2nμU = χ²1−α1(2s + 2)

and thus

μU = χ²1−α1(2s + 2)/2n
Similarly, a conservative lower 100(1 − α2)% confidence limit for μ is a solution of

1 − α2 = G(s − 1; μL) = 1 − H(2nμL; 2s)

so that

2nμL = χ²α2(2s)
and

μL = χ²α2(2s)/2n

If α1 = α2 = α/2, then a conservative 100(1 − α)% confidence interval for μ is given by

(χ²α/2(2Σxi)/2n, χ²1−α/2(2Σxi + 2)/2n)    (11.4.11)
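A sketch (ours, using SciPy chi-square percentiles) of the interval (11.4.11), with `total` denoting the observed Σxi:

```python
from scipy.stats import chi2

def poisson_conservative_ci(total, n, conf=0.90):
    """Interval (11.4.11): (chi2_{a/2}(2*total)/2n, chi2_{1-a/2}(2*total + 2)/2n)."""
    a = 1.0 - conf
    lo = chi2.ppf(a / 2, 2 * total) / (2 * n) if total > 0 else 0.0
    hi = chi2.ppf(1 - a / 2, 2 * total + 2) / (2 * n)
    return lo, hi
```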
The general method can be applied to any problem that contains a single unknown parameter. The following theorem may be helpful in identifying a statistic that can be used in the presence of location-scale nuisance parameters.

Theorem 11.4.4 Let X1, ..., Xn be a random sample of size n from a distribution with pdf of the form
f(x; θ1, θ2, κ) = (1/θ2) f0((x − θ1)/θ2; κ)    (11.4.12)

where −∞ < θ1 < ∞ and θ2 > 0, and where f0 does not depend on θ1 or θ2. If there exist MLEs θ̂1, θ̂2, and κ̂, then the distributions of (θ̂1 − θ1)/θ̂2, θ̂2/θ2, and κ̂ do not depend on θ1 and θ2.
It follows that the general method can be used with the statistic κ̂ to determine confidence limits on κ, with θ1 and θ2 being unknown nuisance parameters. Of course, if κ is known, then the pivotal quantities (θ̂1 − θ1)/θ̂2 and θ̂2/θ2 can be used to find confidence intervals for θ1 and θ2. Theorem 11.3.2 also would apply in this situation, because θ1 and θ2 are location-scale parameters when κ is known. It may not be clear how to derive confidence limits for θ1 and θ2 if κ is unknown.
Theorem 11.4.5 Let X1, ..., Xn be a random sample of size n from a distribution with CDF F(x; θ1, θ2), where θ1 and θ2 are location-scale parameters, and suppose that MLEs θ̂1 and θ̂2 exist. If t is a fixed value, then F(t; θ̂1, θ̂2) is a statistic whose distribution depends on t, θ1, and θ2 only through F(t; θ1, θ2).
Consider the case where F(x; θ1, θ2) = F0[(x − θ1)/θ2] with F0(z) a one-to-one function. Let

c = (t − θ1)/θ2 = F0⁻¹[F(t; θ1, θ2)]

which depends on t, θ1, and θ2 only through F(t; θ1, θ2). It follows that

F(t; θ̂1, θ̂2) = F0[(t − θ̂1)/θ̂2] = F0[c(θ2/θ̂2) − (θ̂1 − θ1)/θ̂2]
which is a function only of c and the pivotal quantities (θ̂1 − θ1)/θ̂2 and θ2/θ̂2. Consequently, its distribution depends on F(t; θ1, θ2), but not on any other unknown nuisance parameters.
Example 11.4.6 Consider the quantity

R(t) = P[X > t] = 1 − F(t; θ1, θ2)

This is an important quantity in applications where X represents a failure time or lifetime of some experimental unit. In some applications it is called the reliability function, and in others it is called the survivor function. It follows from the previous theorem that the distribution of the MLE of reliability, R̂(t) = 1 − F(t; θ̂1, θ̂2), depends only on R(t). Thus, the general method can be used to find confidence limits on R(t).
For a more specific example, consider a random sample of size n from a two-parameter exponential distribution, Xi ~ EXP(θ, η). The MLEs are η̂ = X1:n and θ̂ = X̄ − X1:n, and

R̂(t) = 1 − F(t; θ̂, η̂) = exp[−(t − η̂)/θ̂] = exp[(θ/θ̂) ln R(t) − (η − η̂)/θ̂]

If Y = R̂(t), then it can be shown that the CDF, G(y; R(t)), is decreasing in R(t) for each fixed y. Thus, by Theorem 11.4.2, a one-sided lower (1 − α)100% confidence limit, RL(t), is obtained by solving G(R̂(t); RL(t)) = 1 − α. The CDF of R̂(t) is rather complicated in this case, and we will not attempt to derive it.
11.5 TWO-SAMPLE PROBLEMS

Quite often random samples are taken for the purpose of comparing two or more populations. One may be interested in comparing the mean yields of two processes or the relative variation in yields of two processes. Confidence intervals are quite informative in making such comparisons.
TWO-SAMPLE NORMAL PROCEDURES
Consider independent random samples of sizes n1 and n2 from two normally distributed populations, X ~ N(μ1, σ1²) and Y ~ N(μ2, σ2²), respectively. Denote by X̄, Ȳ, S1², and S2² the sample means and sample variances. Suppose we wish to know whether one population has a smaller variance than the other. For example, two methods of producing baseballs might be compared to see which method produces baseballs with a smaller variation in their elasticity. The unbiased point estimators S1² and S2² can be computed, but if there is only a small difference in the estimates, then it may not be clear whether this
difference results from a true difference in the variances or whether it results from random sampling error. In other words, if the distribution variances are the same, then the difference in sample variances, based on two more samples, might be just as likely to be in the other direction. Note also that if the sample sizes are large, then even a small difference between estimates may indicate a difference between parameter values; but if the sample sizes are small, then fairly large differences might result from chance. The confidence interval approach incorporates this kind of information into the interval estimate.

PROCEDURE FOR VARIANCES
A confidence interval for the ratio σ2²/σ1² can be derived by using Snedecor's F distribution, as suggested in Example 8.4.1. In particular, we know that

(S1²/σ1²)/(S2²/σ2²) = S1²σ2²/(S2²σ1²) ~ F(n1 − 1, n2 − 1)    (11.5.1)
which provides a pivotal quantity for σ2²/σ1². Percentiles for the F distribution with ν1 = n1 − 1 and ν2 = n2 − 1 can be obtained from Table 7 (Appendix C), so
P[fα/2(ν1, ν2) < S1²σ2²/(S2²σ1²) < f1−α/2(ν1, ν2)] = 1 − α    (11.5.2)

and thus, if s1² and s2² are estimates, then a (1 − α)100% confidence interval for σ2²/σ1² is given by
((s2²/s1²) fα/2(n1 − 1, n2 − 1), (s2²/s1²) f1−α/2(n1 − 1, n2 − 1))    (11.5.3)

Example 11.5.1 Random samples of size n1 = 16 and n2 = 21 yield estimates s1² = 0.60 and s2² = 0.20, and a 90% confidence interval is desired. From Table 7 (Appendix C), f0.95(15, 20) = 2.20 and f0.05(15, 20) = 1/f0.95(20, 15) = 1/2.33 = 0.429. It follows that (0.143, 0.733) is a 90% confidence interval for σ2²/σ1². Because the interval does not contain the value 1, we might conclude that σ1² ≠ σ2² (or σ2²/σ1² ≠ 1), and that the two populations have different variances. Because the confidence level is 90%, only 10% of such conclusions, on the average, will be incorrect. This type of reasoning will be developed more formally in the next chapter.
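The computation in this example can be reproduced as follows (our sketch; SciPy's F percentiles replace Table 7):

```python
from scipy.stats import f

def var_ratio_ci(s1sq, n1, s2sq, n2, conf=0.90):
    """CI (11.5.3) for sigma2^2/sigma1^2 from the F pivotal quantity (11.5.1)."""
    a = 1.0 - conf
    r = s2sq / s1sq
    return (r * f.ppf(a / 2, n1 - 1, n2 - 1),
            r * f.ppf(1 - a / 2, n1 - 1, n2 - 1))
```

With s1² = 0.60, n1 = 16, s2² = 0.20, n2 = 21 this returns approximately (0.143, 0.733), matching the table-based computation.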
PROCEDURE FOR MEANS

If the variances σ1² and σ2² are known, then a pivotal quantity for the difference, μ2 − μ1, is easily obtained. Specifically, because
Ȳ − X̄ ~ N(μ2 − μ1, σ1²/n1 + σ2²/n2)    (11.5.4)
it follows that

Z = [Ȳ − X̄ − (μ2 − μ1)] / √(σ1²/n1 + σ2²/n2) ~ N(0, 1)    (11.5.5)

With this choice of Z, the statement P[−z1−α/2 < Z < z1−α/2] = 1 − α can be inverted to give a 100(1 − α)% confidence interval for μ2 − μ1 with limits

ȳ − x̄ ± z1−α/2 √(σ1²/n1 + σ2²/n2)    (11.5.6)
In most cases the variances will not be known, but in some cases it will be reasonable to assume that the variances are unknown but equal. For example, one may wish to study the effect on mean yields when an additive or other modification is introduced in an existing method. In some cases it might be reasonable to assume that the additive could affect the mean yield but would not affect the variation in the process. If σ1² = σ2² = σ², then the common variance can be eliminated in much the same way as in the one-sample case, using a Student's t variable. A pooled estimator of the common variance is the weighted average
Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)    (11.5.7)

If

V = (n1 + n2 − 2)Sp²/σ²    (11.5.8)

then

V = (n1 − 1)S1²/σ² + (n2 − 1)S2²/σ² ~ χ²(n1 + n2 − 2)    (11.5.9)
It is also true that X̄ and Ȳ are independent of S1² and S2², so with Z as given by equation (11.5.5), with σ1² = σ2² = σ², and with V given by equation (11.5.8), it follows from Theorem 8.4.1 that
T = Z / √[V/(n1 + n2 − 2)] = [Ȳ − X̄ − (μ2 − μ1)] / [Sp √(1/n1 + 1/n2)] ~ t(n1 + n2 − 2)    (11.5.10)
Limits for a (1 − α)100% confidence interval for μ2 − μ1 are given by

ȳ − x̄ ± t1−α/2(n1 + n2 − 2) sp √(1/n1 + 1/n2)    (11.5.11)
Example 11.5.2 Random samples of size n1 = 16 and n2 = 21 yield estimates x̄ = 4.31, ȳ = 5.22, s1² = 0.12, and s2² = 0.10. We might first consider a confidence interval for the ratio of variances to check the assumption of equal variances. A 90% confidence
interval for σ2²/σ1² is (0.358, 1.83), which contains the value 1. Thus, there is not strong evidence that the variances are unequal, and we will assume σ1² = σ2². The pooled estimate of variance is sp² = [(15)(0.12) + (20)(0.10)]/35 ≈ 0.109, and sp = 0.330. Suppose that a 95% confidence interval for μ2 − μ1 is desired. By linear interpolation between t0.975(30) = 2.042 and t0.975(40) = 2.021 in Table 6 (Appendix C), we obtain t0.975(35) = 2.032. The desired confidence interval, based on the limits in equation (11.5.11), is (0.688, 1.133).
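A sketch (ours, with SciPy's t percentiles in place of interpolating Table 6) reproducing this computation:

```python
import math
from scipy.stats import t

def pooled_t_ci(xbar1, s1sq, n1, xbar2, s2sq, n2, conf=0.95):
    """CI (11.5.11) for mu2 - mu1 under the equal-variance assumption."""
    sp2 = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)
    tcrit = t.ppf(1 - (1 - conf) / 2, n1 + n2 - 2)
    half = tcrit * math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    d = xbar2 - xbar1
    return d - half, d + half
```

With the data of Example 11.5.2 this gives roughly (0.688, 1.132); the tiny difference from the text comes from using the exact t percentile rather than linear interpolation in the table.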
APPROXIMATE METHODS
It is not easy to eliminate unknown variances to obtain a pivotal quantity for μ2 − μ1 when the variances are unequal. One possible approach would be a large-sample method. Specifically, as n1 → ∞ and n2 → ∞,

[Ȳ − X̄ − (μ2 − μ1)] / √(S1²/n1 + S2²/n2) → Z ~ N(0, 1)    (11.5.12)
Thus, for large sample sizes, approximate confidence limits for μ2 − μ1 may be easily obtained from expression (11.5.12).
Note that the above limiting results also hold if the samples are not from normal distributions, so this provides a general large-sample result for differences of means. The size of the samples required to make the limiting approximation close would depend somewhat on the form of the densities. For small samples from normal distributions, the distribution of the random variable in expression (11.5.12) depends on σ1² and σ2², but good small-sample approximations can be based on Student's t distribution. One such approximation, which comes from Welch (1949), is
T = [Ȳ − X̄ − (μ2 − μ1)] / √(S1²/n1 + S2²/n2) ~ t(ν)  (approximately)    (11.5.13)
where the degrees of freedom are estimated as follows:

ν = (s1²/n1 + s2²/n2)² / {[(s1²/n1)²/(n1 − 1)] + [(s2²/n2)²/(n2 − 1)]}    (11.5.14)
Notice that this generally will produce noninteger degrees of freedom, but linear interpolation in Table 6 (Appendix C) can be used to obtain the required percentiles for constructing confidence intervals. The general problem of making inferences about μ2 − μ1 with unequal variances is known as the Behrens-Fisher problem. Welch's solution is just one of many that have been proposed in the statistical literature. It was studied by Wang (1971), who found it to be quite good.
PAIRED-SAMPLE PROCEDURE
All of the above results assume that the random samples are independent. In some cases, such as test-retest experiments, dependent samples are appropriate. For example, to measure the effectiveness of a diet plan, we would select n people
at random and weigh them both before and after the diet. The observations would be independent between pairs, but the observations within a pair would not be independent because they were taken on the same individual.
We have a random sample of n pairs, (Xi, Yi), and we assume that the differences Di = Yi − Xi for i = 1, ..., n are normally distributed with mean μD = μ2 − μ1 and variance σD² = σ1² + σ2² − 2σ12, or

Di ~ N(μ2 − μ1, σD²)
Let

D̄ = Σᵢ₌₁ⁿ Dᵢ/n        (11.5.15)

and

S_D² = [Σᵢ₌₁ⁿ Dᵢ² - (Σᵢ₌₁ⁿ Dᵢ)²/n] / (n - 1)        (11.5.16)
It follows from the results of Chapter 6 that

T = [D̄ - (μ₂ - μ₁)] / (S_D/√n) ~ t(n - 1)        (11.5.17)

and thus a (1 - α)100% confidence interval for μ₂ - μ₁ has limits of the form

d̄ ± t₁₋α/₂(n - 1) s_D/√n        (11.5.18)

where d̄ and s_D are the observed values.
Note that this method remains valid if the samples are independent, because in this case Dᵢ ~ N(μ₂ - μ₁, σ₁² + σ₂²). However, the degrees of freedom in the paired-sample procedure is n - 1, whereas in the independent-sample case with σ₁ = σ₂ we obtained a t statistic with 2n - 2 degrees of freedom. The effective sample size is twice as large in the independent-sample case, and consequently the paired-sample method would not be as good. However, if there is a reason for pairing and the pairs are highly correlated, then σ_D² = σ₁² + σ₂² - 2σ₁₂ may be much smaller than σ₁² + σ₂², and this could offset the loss in effective sample size. Thus pairing is a useful technique, but it should not be used indiscriminately. It is interesting to note that if two independent samples have equal sample size, and if the variances are not equal, then the paired-sample procedure still can be used to provide an exact t statistic, but the resulting confidence interval would tend to be wider than one based on an approximate t variable such as that of expression (11.5.13).
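The paired-sample computation (11.5.15)-(11.5.18) can be sketched with Python's standard library; the function name is illustrative, and the t percentile t₁₋α/₂(n - 1) is again supplied by the caller:

```python
import math
import statistics

def paired_ci(x, y, t_quantile):
    """Paired-sample CI (11.5.18) for mu2 - mu1: form the differences
    d_i = y_i - x_i and use dbar +/- t_{1-alpha/2}(n-1) * s_d / sqrt(n)."""
    d = [yi - xi for xi, yi in zip(x, y)]
    n = len(d)
    dbar = statistics.mean(d)
    s_d = statistics.stdev(d)          # divisor n - 1, as in (11.5.16)
    half = t_quantile * s_d / math.sqrt(n)
    return dbar - half, dbar + half
```

For example, with hypothetical before/after weights x and y of length n = 4, the caller would pass t₀.₉₇₅(3) = 3.182 for a 95% interval.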
TWO-SAMPLE BINOMIAL PROCEDURE
Suppose that X₁ ~ BIN(n₁, p₁) and X₂ ~ BIN(n₂, p₂). Letting p̂₁ = X₁/n₁ and p̂₂ = X₂/n₂, from the results of Chapter 7 we have

Z = [p̂₂ - p̂₁ - (p₂ - p₁)] / √(p₁(1 - p₁)/n₁ + p₂(1 - p₂)/n₂) ~ N(0, 1)        (11.5.19)

It is clear that approximate large-sample confidence limits for p₂ - p₁ can be obtained in a manner similar to the one-sample case, namely

p̂₂ - p̂₁ ± z₁₋α/₂ √(p̂₁(1 - p̂₁)/n₁ + p̂₂(1 - p̂₂)/n₂)        (11.5.20)
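As a minimal sketch of (11.5.20), assuming Python's standard library (the `statistics.NormalDist` class provides the normal percentile z₁₋α/₂; the function name is illustrative):

```python
from statistics import NormalDist

def two_sample_binomial_ci(x1, n1, x2, n2, conf=0.90):
    """Approximate large-sample CI (11.5.20) for p2 - p1, with the
    sample proportions replacing the unknown p's in the variance."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    d = p2 - p1
    return d - z * se, d + z * se
```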
BAYESIAN INTERVAL ESTIMATION
Bayes estimators were discussed briefly in Chapter 9 for the case of point estimation. There the parameter was treated mathematically as a random variable. In certain cases, this may be a physically meaningful assumption. For example, the parameter may behave as a variable over different conditions in the experiment. The prior density, p(θ), may be considered to reflect prior knowledge or belief about the true value of the parameter, and the Bayesian structure provides a convenient framework for using this prior belief to order the risk functions and select the best (smallest average risk) estimator. In this case, the prior density is not unlike a class of confidence intervals indexed by α. As α varies from 0 to 1, the resulting confidence intervals for θ could be represented as producing a probability distribution for θ. The induced distribution in this case is based on sample data rather than on subjective criteria. In any event, suppose that a prior density p(θ) exists or is introduced into the problem and f(x; θ) is interpreted as a conditional pdf, f(x|θ). Consider again the posterior density of θ given the sample x = (x₁, ..., xₙ),

f_{θ|x}(θ) = f(x₁, ..., xₙ|θ)p(θ) / ∫ f(x₁, ..., xₙ|θ)p(θ) dθ        (11.6.1)
The prior density p(θ) can be interpreted as specifying an initial probability distribution for the possible values of θ, and in this context f_{θ|x}(θ) would represent a revised distribution adjusted by the observed random sample. For a particular 1 - α level, a Bayesian confidence interval for θ is given by (θ_L, θ_U), where θ_L and
θ_U satisfy

∫_{θ_L}^{θ_U} f_{θ|x}(θ) dθ = 1 - α        (11.6.2)
If θ is a true random variable, then the Bayesian interval would have the usual probability interpretation. Of course, in any such problem the results are correct only to the extent that the assumed models are correct. If p(θ) represents a degree of belief about the values of θ, then presumably the interval (θ_L, θ_U) also would be interpreted in the degree-of-belief sense.
Example 11.6.1  In Example 9.5.5 it was assumed that Xᵢ ~ POI(θ) and that θ ~ GAM(β, κ). The posterior distribution was found to be given by

θ|x ~ GAM[(n + 1/β)⁻¹, Σxᵢ + κ]        (11.6.3)
It follows that

2(n + 1/β)θ | x ~ χ²[2(Σxᵢ + κ)]        (11.6.4)

and

P[χ²_{α/2}(ν) < 2(n + 1/β)θ < χ²_{1-α/2}(ν)] = 1 - α        (11.6.5)

where ν = 2(Σxᵢ + κ). Thus, a 100(1 - α)% Bayesian confidence interval for θ is given by (θ_L, θ_U), where θ_L = χ²_{α/2}(ν)/[2(n + 1/β)] and θ_U = χ²_{1-α/2}(ν)/[2(n + 1/β)].
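Because chi-square percentiles are not available in Python's standard library, the interval of this example can be sketched by Monte Carlo: sample from the gamma posterior (11.6.3) and take empirical quantiles. The function name and defaults are illustrative; the exact limits would use the chi-square percentiles of (11.6.5).

```python
import random

def bayes_interval_poisson_gamma(n, sum_x, beta, kappa, alpha=0.10,
                                 draws=200_000, seed=1):
    """Monte Carlo approximation to the 100(1-alpha)% Bayesian interval
    of Example 11.6.1, sampling from GAM[(n + 1/beta)^(-1), sum_x + kappa]."""
    rng = random.Random(seed)
    scale = 1.0 / (n + 1.0 / beta)      # gamma scale parameter
    shape = sum_x + kappa               # gamma shape parameter
    sample = sorted(rng.gammavariate(shape, scale) for _ in range(draws))
    lo = sample[int(alpha / 2 * draws)]
    hi = sample[int((1 - alpha / 2) * draws) - 1]
    return lo, hi
```

For instance, with n = 45, Σxᵢ = 76.5, β = 1, and κ = 2 (the numbers of Exercise 12 with this prior), the 90% interval is roughly (1.39, 2.03), bracketing the posterior mean 78.5/46 ≈ 1.71.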
SUMMARY

Our purpose in this chapter was to introduce the concept of an interval estimate or confidence interval. A point estimator in itself does not provide direct information about accuracy. An interval estimator gives one possible solution to this problem. The concept involves an interval whose endpoints are statistics that include the true value of the parameter between them with high probability. This probability corresponds to the confidence level of the interval estimator. Ordinarily, the term confidence interval (or interval estimate) refers to the observed interval that is computed from data.
There are two basic methods for constructing confidence intervals. One method, which is especially useful in certain applications where unknown nuisance parameters are present, involves the notion of a pivotal quantity. This amounts to finding a random variable that is a function of the observed random variables and the parameter of interest, but not of any other unknown parameters. It also is required that the distribution of the pivotal quantity be free of any
unknown parameters. In the case of location-scale parameters, pivotal quantities can be expressed in terms of the MLEs if they exist. Approximate large-sample pivotal quantities can be based on asymptotic normal results in some cases.
The other method, which is referred to as the general method, does not require the existence of a pivotal quantity, but has the disadvantage that it cannot be used when a nuisance parameter is present. This method can be applied with any statistic whose distribution can be expressed in terms of the parameter. The percentiles are functions of the parameter, and the limits of the confidence interval are obtained by solving equations that involve certain percentiles and the observed value of the statistic. Interval estimates obtained by either method can be interpreted in terms of the relative frequency with which the true value of the parameter will be included in the interval, which corresponds to the probability that the interval estimator will contain the true value. Another type of interval is based on the Bayesian approach. This approach provides a convenient way to use prior information or, in some cases, subjective judgment about the unknown parameter, although the relative frequency interpretation may be inappropriate in some instances.
EXERCISES

1. Consider a random sample of size n from a normal distribution, Xᵢ ~ N(μ, σ²).
(a) If it is known that σ² = 9, find a 90% confidence interval for μ based on the estimate x̄ = 19.3 with n = 16.
(b) Based on the information in (a), find a one-sided lower 90% confidence limit for μ. Also, find a one-sided upper 90% confidence limit for μ.
(c) For a confidence interval of the form given by expression (11.2.7), derive a formula for the sample size required to obtain an interval of specified length. If σ² = 9, then what sample size is needed to achieve a 90% confidence interval of length 2?
(d) Suppose now that σ² is unknown. Find a 90% confidence interval for μ if x̄ = 19.3 and s² = 10.24 with n = 16.
(e) Based on the data in (d), find a 99% confidence interval for σ².

2. Assume that the weight data of Exercise 24, Chapter 4, are observed values of a random sample of size n = 60 from a normal distribution.
(a) Find a 99% confidence interval for the mean weight of major league baseballs.
(b) Find a 99% confidence interval for the standard deviation.
3. Let X₁, ..., Xₙ be a random sample from an exponential distribution, X ~ EXP(θ).
(a) If x̄ = 17.9 with n = 50, then find a one-sided lower 95% confidence limit for θ.
(b) Find a one-sided lower 95% confidence limit for P(X > t) = e^{-t/θ}, where t is an arbitrary known value.
4. The following data are times (in hours) between failures of air conditioning equipment in a particular airplane: 74, 57, 48, 29, 502, 12, 70, 21, 29, 386, 59, 27, 153, 26, 326. Assume that the data are observed values of a random sample from an exponential distribution, EXP(θ).
(a) Find a 90% confidence interval for the mean time between failures, θ.
(b) Find a one-sided lower 95% confidence limit for the 10th percentile of the distribution of time between failures.
5. Consider a random sample of size n from a two-parameter exponential distribution, Xᵢ ~ EXP(1, η).
(a) Show that Q = X₁:ₙ - η is a pivotal quantity and find its distribution.
(b) Derive a 100γ% equal-tailed confidence interval for η.
(c) The following data are mileages for 19 military personnel carriers that failed in service: 162, 200, 271, 320, 393, 508, 539, 629, 706, 777, 884, 1008, 1101, 1182, 1463, 1603, 1984, 2355, 2880. Assuming that these data are observations of a random sample from an exponential distribution, find a 90% confidence interval for η. Assume that θ = 850 is known.
6. Let X₁, ..., Xₙ be a random sample from a two-parameter exponential distribution, Xᵢ ~ EXP(θ, η).
(a) Assuming it is known that η = 150, find a pivotal quantity for the parameter θ based on the sufficient statistic.
(b) Using the data of Exercise 5, find a one-sided lower 95% confidence limit for θ.
7. Let X₁, X₂, ..., Xₙ be a random sample from a Weibull distribution, Xᵢ ~ WEI(θ, 2).
(a) Show that Q = 2ΣXᵢ²/θ² ~ χ²(2n).
(b) Use Q to derive an equal-tailed 100γ% confidence interval for θ.
(c) Find a lower 100γ% confidence limit for P(X > t) = exp[-(t/θ)²].
(d) Find an upper 100γ% confidence limit for the pth percentile of the distribution.
8. Consider a random sample of size n from a uniform distribution, Xᵢ ~ UNIF(0, θ), θ > 0, and let Xₙ:ₙ be the largest order statistic.
(a) Find the probability that the random interval (Xₙ:ₙ, 2Xₙ:ₙ) contains θ.
(b) Find the constant c such that (xₙ:ₙ, c xₙ:ₙ) is a 100(1 - α)% confidence interval for θ.
9. Use the approach of Example 11.3.4 with the data of Example 4.6.2 to find a 95% confidence interval for κ.
10. Suppose that the exact values of the data x₁, ..., x₅₀ in Exercise 3(b) are not known, but it is known that 40 of the 50 measurements are larger than t.
(a) Find an approximate one-sided lower 95% confidence limit for P(X > t) based on this information.
(b) Note that under the exponential assumption P(X > t) = exp(-t/θ). If t = 5, use the result from (a) to find an approximate one-sided lower 95% confidence limit for θ, and compare this to the confidence limit of Exercise 3(a).
11. Let p be the proportion of people in the United States with red hair. In a sample of size 40, five people with red hair were observed. Find an approximate 90% confidence interval for p.
12. Suppose that 45 workers in a textile mill are selected at random in a study of accident rates. The number of accidents per worker is assumed to be Poisson distributed with mean μ. The average number of accidents per worker is x̄ = 1.7.
(a) Find an approximate one-sided lower 90% confidence limit for μ using equation (11.3.20).
(b) Repeat (a) using equation (11.3.21) instead.
(c) Find a conservative one-sided lower 90% confidence limit for μ using the approach of Example 11.4.5.
13. Consider a random sample of size n from a gamma distribution, Xᵢ ~ GAM(θ, κ).
(a) Assuming κ is known, derive a 100(1 - α)% equal-tailed confidence interval for θ based on the sufficient statistic.
(b) Assuming that θ = 1, and for n = 1, find an equal-tailed 90% confidence interval for κ if x₁ = 10 is observed. Hint: Note that 2X₁ ~ χ²(2κ), and use interpolation in Table 4 (Appendix C).
14. Assume that the number of defects in a piece of wire that is t yards in length is Xₜ ~ POI(λt) for any t > 0.
(a) If five defects are found in a 100-yard roll of wire, find a conservative one-sided upper 95% confidence limit for the mean number of defects in such a roll.
(b) If a total of 15 defects are found in five 100-yard rolls of wire, find a conservative one-sided upper 95% confidence limit for λ.
15. Let X₁, ..., Xₙ be a random sample from a Weibull distribution, Xᵢ ~ WEI(θ, β), where β is known.
(a) Use the general method of Section 11.4 to derive a 100(1 - α)% confidence interval for θ based on the statistic S₁ = X₁:ₙ.
(b) Use the general method to find a (1 - α)100% confidence interval for θ based on the statistic S₂ = ΣXᵢ^β.
16. Let f(x; p) = pf₁(x) + (1 - p)f₂(x), where f₁ is the pdf of N(1, 1) and f₂ is the pdf of N(0, 1). Based on a sample of size n = 1 from f(x; p), derive a one-sided lower 100γ% confidence limit for p.
17. Suppose that X ~ GEO(p).
(a) Derive a conservative one-sided lower 100γ% confidence limit for p based on a single observation x.
(b) If x = 5, find a conservative one-sided lower 90% confidence limit for p.
(c) If X₁, ..., Xₙ is a random sample from GEO(p), describe the form of a conservative one-sided lower 100γ% confidence limit for p based on the sufficient statistic.
18. Let X₁, ..., Xₙ be a random sample from a normal distribution, Xᵢ ~ N(μ, σ²). If t is a fixed real number, find a statistic that is a function of the sufficient statistics and whose distribution depends on t, μ, and σ² only through F(t; μ, σ²) = P(X ≤ t).
19. Consider independent random samples from two normal distributions, Xᵢ ~ N(μ₁, σ₁²) and Yⱼ ~ N(μ₂, σ₂²); i = 1, ..., n₁, j = 1, ..., n₂. Assuming that μ₁ and μ₂ are known, derive a 100(1 - α)% confidence interval for σ₂²/σ₁² based on sufficient statistics.
20. Consider independent random samples from two exponential distributions, Xᵢ ~ EXP(θ₁) and Yⱼ ~ EXP(θ₂); i = 1, ..., n₁, j = 1, ..., n₂.
(a) Show that (θ₂/θ₁)(X̄/Ȳ) ~ F(2n₁, 2n₂).
(b) Derive a 100γ% confidence interval for θ₂/θ₁.
21. Compute a 95% Bayesian confidence interval for μ based on the results of Exercise 12. Use the prior density of Example 11.6.1 with β = 1 and κ = 2.
22. Let X₁, ..., Xₙ be a random sample from a Bernoulli distribution, Xᵢ ~ BIN(1, θ), and assume a uniform prior, θ ~ UNIF(0, 1). Derive a 100(1 - α)% Bayesian interval estimate of θ. Hint: The posterior distribution is given in Example 9.5.4.
23. Using the densities f(x|θ) and p(θ) of Exercise 44 of Chapter 9:
(a) Derive a 100(1 - α)% Bayesian confidence interval for θ.
(b) Derive a 100(1 - α)% Bayesian confidence interval for μ = 1/θ.
(c) Compute a 90% Bayesian confidence interval for θ if n = 10, Σxᵢ = 5, and β = 2.

24. Suppose that θ_L and θ_U are one-sided lower and upper confidence limits for θ with confidence coefficients 1 - α₁ and 1 - α₂, respectively. Show that (θ_L, θ_U) is a conservative 100(1 - α)% confidence interval for θ if α = α₁ + α₂ < 1. Hint: Use Bonferroni's inequality (1.4.7), with [θ_L < θ < θ_U] = [θ_L < θ] ∩ [θ < θ_U].
25. Consider a random sample of size n from a distribution with CDF

F(x; θ) = 1 - exp[-θ(x - θ)] if x > θ, and F(x; θ) = 0 if x ≤ θ

with θ > 0.
(a) Find the CDF, G(s; θ), of S = X₁:ₙ.
(b) Find the function h(θ) such that G(h(θ); θ) = 1 - α, and show that it is not monotonic.
(c) Show that h(θ) = s has two solutions, θ₁ = [s - √(s² + 4(ln α)/n)]/2 and θ₂ = [s + √(s² + 4(ln α)/n)]/2, if s² > -4(ln α)/n, and that h(θ) > s if and only if either θ < θ₁ or θ > θ₂. Thus, (0, θ₁) ∪ (θ₂, ∞) is a 100(1 - α)% confidence region for θ, but it is not an interval.
26. Consider the equal-tailed confidence interval of equation (11.2.6).
(a) Use the fact that Q = 2nX̄/θ ~ χ²(2n) to derive a formula for the expected length of the corresponding random interval.
(b) More generally, a 100(1 - α)% confidence interval for θ has the form (2nx̄/q₂, 2nx̄/q₁), where F_Q(q₁) = α₁ and F_Q(q₂) = 1 - α₂ with α = α₁ + α₂ < 1, and the expected length is proportional to 1/q₁ - 1/q₂. Note that q₂ is an implicit function of q₁, because F_Q(q₂) - F_Q(q₁) = 1 - α (which is fixed), and consequently dq₂/dq₁ = f_Q(q₁)/f_Q(q₂). Use this to show that the values of q₁ and q₂ that minimize 1/q₁ - 1/q₂ must satisfy q₁²f_Q(q₁) = q₂²f_Q(q₂), which is not the equal-tailed choice for a chi-square pdf.
27. Consider the equal-tailed confidence interval of equation (11.2.7).
(a) More generally, if Z = (X̄ - μ)/(σ/√n), then a 100(1 - α)% confidence interval for μ is given by (x̄ - z₂σ/√n, x̄ - z₁σ/√n), where Φ(z₁) = α₁ and Φ(z₂) = 1 - α₂ with α = α₁ + α₂ < 1. Show that the interval of this form with minimum length is given by equation (11.2.7).
(b) In the case when σ² is unknown, can a similar claim be made about the expected length of the random interval corresponding to equation (11.3.5)?
(c) In a manner similar to that of Exercise 26, show that the equal-tailed confidence interval given by equation (11.3.6) does not have minimum expected length.
28. Based on the pivotal quantities Q₁ and Q₂ given by equations (11.3.8) and (11.3.9), derive one-sided lower 100γ% confidence limits for the parameters of the extreme-value distribution. Leave the answer in terms of arbitrary percentiles q₁ and q₂.
29. As noted in Section 9.4, under certain regularity conditions, the MLEs θ̂ₙ are asymptotically normal, θ̂ₙ ~ N(θ, c²(θ)/n), where c²(θ)/n is the CRLB. Assuming further that c(θ) is a continuous function of θ, it follows from Theorem 7.7.2 that c(θ̂ₙ) converges stochastically to c(θ).
(a) Using other results from Chapter 7, show that if Zₙ = √n(θ̂ₙ - θ)/c(θ̂ₙ), then Zₙ →d Z ~ N(0, 1) as n → ∞.
(b) From (a), show that limits for an approximate large-sample 100(1 - α)% confidence interval are given by θ̂ₙ ± z₁₋α/₂ c(θ̂ₙ)/√n.
(c) Based on the results of Example 9.4.7, derive an approximate 100(1 - α)% confidence interval for the parameter κ where X ~ PAR(1, κ).
(d) Use the data of Example 4.6.2 to find an approximate 95% confidence interval for κ, and compare this with the confidence interval of Exercise 9. Would you expect a close approximation in this example?
30. Suppose that θ̂ₙ is asymptotically normal, N(θ, c²(θ)/n). It sometimes is desirable to consider a function, say g(θ), such that the asymptotic variance of g(θ̂ₙ) does not depend on θ. Such a function is called a variance-stabilizing transformation. If we apply Theorem 7.7.6 with Yₙ = θ̂ₙ, m = θ, and c = c(θ), then g(θ) would have to satisfy the equation [c(θ)g′(θ)]² = k, a constant.
(a) If X₁, ..., Xₙ is a random sample from a Poisson distribution, X ~ POI(μ), and θ̂ₙ = X̄, show that g(μ) = √μ is a variance-stabilizing transformation.
(b) Derive an approximate large-sample 100(1 - α)% confidence interval for μ based on √X̄.
(c) Consider a random sample of size n from EXP(θ), and let θ̂ₙ = X̄. Find a variance-stabilizing transformation and use it to derive an approximate large-sample confidence interval for θ.
CHAPTER 12

TESTS OF HYPOTHESES
12.1 INTRODUCTION

In scientific activities, much attention is devoted to answering questions about the validity of theories or hypotheses concerning physical phenomena. Is a new drug effective? Does a lot of manufactured items contain an excessive number of defectives? Is the mean lifetime of a component at least some specified amount? Ordinarily, information about such phenomena can be obtained only by performing experiments whose outcomes have some bearing on the hypotheses of interest. The term hypothesis testing will refer to the process of trying to decide the truth or falsity of such hypotheses on the basis of experimental evidence.
For instance, we may suspect that a certain hypothesis, perhaps an accepted theory, is false, and an experiment is conducted. An outcome that is inconsistent with the hypothesis will cast doubt on its validity. For example, the hypothesis to be tested may specify that a physical constant has some specified value. In general, experimental measurements are subject to random error, and thus any decision about the truth or falsity of the hypothesis, based on experimental evidence, also
is subject to error. It will not be possible to avoid an occasional decision error, but it will be possible to construct tests so that such errors occur infrequently and at some prescribed rate. A simple example illustrates the concept of hypothesis testing.
Example 12.1.1  A theory proposes that the yield of a certain chemical reaction is normally distributed, X ~ N(μ, 16). Past experience indicates that μ = 10 if a certain mineral is not present, and μ = 11 if the mineral is present. Our experiment would be to take a random sample of size n. On the basis of that sample, we would try to decide which case is true. That is, we wish to test the "null hypothesis" H₀ : μ = μ₀ = 10 against the "alternative hypothesis" Hₐ : μ = μ₁ = 11.
Definition 12.1.1  If X ~ f(x; θ), a statistical hypothesis is a statement about the distribution of X. If the hypothesis completely specifies f(x; θ), then it is referred to as a simple hypothesis; otherwise it is called composite.
Quite often the distribution in question has a known parametric form with a single unknown parameter θ, and the hypothesis consists of a statement about θ. In this framework, a statistical hypothesis corresponds to a subset of the parameter space, and the objective of a test would be to decide whether the true value of the parameter is in the subset. Thus, a null hypothesis would correspond to a subset Ω₀ of Ω, and the alternative hypothesis would correspond to its complement, Ω - Ω₀. In the case of simple hypotheses, these sets consist of only one element each, Ω₀ = {θ₀} and Ω - Ω₀ = {θ₁}. Most experiments have some goal or research hypothesis that one hopes to support with statistical evidence, and this hypothesis should be taken as the alternative hypothesis. The reason for this will become clear as we proceed. In our example, if we have strong evidence that the mineral is present, then we may wish to spend a large amount of money to begin mining operations, so we associate this case with the alternative hypothesis. We now must consider sample data, and decide on the basis of the data whether we have sufficient statistical evidence to reject H₀ in favor of the alternative Hₐ, or whether we do not have sufficient evidence. That is, our philosophy will be to divide the sample space into two regions, the critical region or rejection region C, and the nonrejection region S - C. If the observed sample data fall in C, then we will reject H₀, and if they do not fall in C, then we will not reject H₀.
Definition 12.1.2 The critical region for a test of hypotheses is the subset of the sample space that corresponds to rejecting the null hypothesis.
In our example, X̄ is a sufficient statistic for μ, so we may conveniently express the critical region directly in terms of the univariate variable x̄, and we will refer to X̄ as the test statistic. Because μ₁ > μ₀, a natural form for the critical region in this problem is to let C = {(x₁, ..., xₙ) | x̄ ≥ c} for some appropriate constant c. That is, we will reject H₀ if x̄ ≥ c, and we will not reject H₀ if x̄ < c. There are two possible errors we may make under this procedure. We might reject H₀ when H₀ is true, or we might fail to reject H₀ when H₀ is false. These errors are referred to as follows:
Type I error: Reject a true H₀.
Type II error: Fail to reject a false H₀.

Occasionally, for convenience, we may refer to the Type II error as "accepting a false H₀" and to S - C as the "acceptance" region, but it should be understood that this is not strictly a correct interpretation. That is, failure to have enough statistical evidence to reject H₀ is not the same as having strong evidence to support H₀. We hope to choose a test statistic and a critical region so that we would have a small probability of making these two errors. We will adopt the following notations for these error probabilities:

P[Type I error] = P[TI] = α
P[Type II error] = P[TII] = β
Definition 12.1.3  For a simple null hypothesis, H₀, the probability of rejecting a true H₀, α = P[TI], is referred to as the significance level of the test. For a composite null hypothesis, H₀, the size of the test (or size of the critical region) is the maximum probability of rejecting H₀ when H₀ is true (maximized over the values of the parameter under H₀).
Notice that for a simple H0 the significance level is also the size of the test.
The standard approach is to specify or select some acceptable level of error, such as α = 0.05 or α = 0.01, for the significance level of the test, and then to determine a critical region that will achieve this α. Among all critical regions of size α, we would select the one that has the smallest P[TII]. In our example, if n = 25, then α = 0.05 gives c = μ₀ + z₀.₉₅ σ/√n = 10 + 1.645(4)/5 = 11.316. This
is verified easily, because

P[X̄ ≥ c | μ = μ₀ = 10] = P[(X̄ - μ₀)/(σ/√n) ≥ (c - μ₀)/(σ/√n)]
                        = P[Z ≥ (11.316 - 10)/(4/5)]
                        = P[Z ≥ 1.645]
                        = 0.05
Thus, a size 0.05 test of H₀ : μ = 10 against the alternative Hₐ : μ = 11 is to reject H₀ if the observed value x̄ ≥ 11.316. Note that this critical region provides a size 0.05 test for any alternative value μ = μ₁, but the fact that μ₁ > μ₀ means that we will get a smaller Type II error by taking the critical region as the right-hand tail of the distribution of X̄ rather than as some other region of size 0.05. For an alternative of this type, the critical value is determined under H₀ for specified α. The probability of Type II error for the critical region C is

β = P[TII] = P[X̄ < 11.316 | μ = μ₁ = 11]
           = P[(X̄ - 11)/(4/5) < (11.316 - 11)/(4/5)]
           = P[Z < 0.395]
           = 0.654

These concepts are illustrated in Figure 12.1. At this point, there is no theoretical reason for choosing a critical region of the
form C over any other. For example, the critical region C₁ = {(x₁, ..., xₙ) | 10 ≤ x̄ ≤ 10.1006} also has size α = 0.05 because, under H₀,

P[10 ≤ X̄ ≤ 10.1006] = P[0 ≤ (X̄ - 10)/(4/5) ≤ 0.1257]
                     = 0.05

FIGURE 12.1  Probabilities of Type I and Type II errors
However, P[TII] for this critical region is

P[TII] = 1 - P[10 ≤ X̄ ≤ 10.1006 | μ = 11]
       = 1 - P[(10 - 11)/(4/5) ≤ Z ≤ (10.1006 - 11)/(4/5)]
       = 1 - P[-1.25 ≤ Z ≤ -1.124]
       = 0.975

which is much larger than the value obtained with C. Suppose now that a sample size n = 100 is available. To maintain a critical region of size α = 0.05, we would now use

c = μ₀ + z₁₋α σ/√n = 10 + 1.645(4)/10 = 10.658

The value of P[TII] in this case is

P[X̄ < 10.658 | μ = 11] = P[Z < (10.658 - 11)/(4/10)]
                        = P[Z < -0.855]
                        = 0.196

If the choice of sample size is flexible, then one can specify both α and β in this problem and determine what sample size is necessary.
More generally, we may wish to test H₀ : μ = μ₀ against Hₐ : μ = μ₁ (where μ₁ > μ₀) at the α significance level. A test based on the test statistic

Z₀ = (X̄ - μ₀)/(σ/√n)        (12.1.1)

is equivalent to one using X̄, so we may conveniently express our test as rejecting H₀ if z₀ ≥ z₁₋α, where z₀ is the computed value of Z₀. Clearly, under H₀,
P[Z₀ ≥ z₁₋α] = α, and we have a critical region of size α. The probability of Type II error for the alternative μ₁ is

P[TII] = P[Z₀ < z₁₋α | μ = μ₁] = P[X̄ < μ₀ + z₁₋α σ/√n | μ = μ₁]

so that

P[TII] = Φ(z₁₋α + √n(μ₀ - μ₁)/σ)        (12.1.2)

The sample size n that will render P[TII] = β is the solution to

z₁₋α + √n(μ₀ - μ₁)/σ = z_β        (12.1.3)

which gives

n = (z₁₋α + z₁₋β)²σ²/(μ₁ - μ₀)²        (12.1.4)
For α = 0.05, β = 0.10, μ₀ = 10, μ₁ = 11, and σ = 4, we obtain n = (1.645 + 1.282)²(16)/1 ≈ 137. In considering the error probabilities of a test, it sometimes is convenient to use the "power function" of the test.
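The error-rate and sample-size formulas (12.1.2) and (12.1.4) can be checked numerically; the following is a minimal stdlib-only Python sketch with illustrative function names. (With full-precision normal percentiles the right side of (12.1.4) is about 137.02, so rounding up gives n = 138, while the rounded table values 1.645 and 1.282 give the 137 quoted above.)

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size(alpha, beta, mu0, mu1, sigma):
    """Smallest n satisfying (12.1.4) for the one-sided normal test
    of H0: mu = mu0 versus Ha: mu = mu1 with known sigma."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)
    z_b = nd.inv_cdf(1 - beta)
    return ceil((z_a + z_b) ** 2 * sigma ** 2 / (mu1 - mu0) ** 2)

def type2_prob(n, alpha, mu0, mu1, sigma):
    """P[TII] from (12.1.2) for the size-alpha right-tailed test."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)
    return nd.cdf(z_a + (mu0 - mu1) * sqrt(n) / sigma)
```

These reproduce the earlier computations: with n = 25 the Type II error probability is about 0.654, and with n = 100 it drops to about 0.196.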
Definition 12.1.4  The power function, π(θ), of a test of H₀ is the probability of rejecting H₀ when the true value of the parameter is θ.
For simple hypotheses H₀ : θ = θ₀ versus Hₐ : θ = θ₁, we have π(θ₀) = P[TI] = α and π(θ₁) = 1 - P[TII] = 1 - β. For composite hypotheses, say H₀ : θ ∈ Ω₀ versus Hₐ : θ ∈ Ω - Ω₀, the size of the test (or critical region) is

α = max_{θ∈Ω₀} π(θ)        (12.1.5)

and if the true value θ falls in Ω - Ω₀, then π(θ) = 1 - P[TII], where we note that P[TII] depends on θ. In other words, the value of the power function is always the area under the pdf of the test statistic and over the critical region, giving P[TI] for values of θ in the null hypothesis and 1 - P[TII] for values of θ in the alternative hypothesis. This is illustrated for a test of means in Figure 12.2.
In the next section, tests concerning the mean of a normal distribution will illustrate further the notion of composite hypotheses.
FIGURE 12.2  The relationship of the power function to the probability of a Type II error
12.2 COMPOSITE HYPOTHESES

Again, we assume that X ~ N(μ, σ²), where σ² is known, and we wish to test H₀ : μ = μ₀ against the composite alternative Hₐ : μ > μ₀. It was suggested in the previous example that the critical region should be located on the right-hand tail for any alternative μ₁ > μ₀, but the critical value c did not depend on the value of μ₁. Thus it is clear that the test for the simple alternative also is valid for this composite alternative. A test at significance level α still would reject H₀ if

z₀ = √n(x̄ - μ₀)/σ ≥ z₁₋α        (12.2.1)

The power of this test at any value μ is

π(μ) = P[X̄ ≥ μ₀ + z₁₋α σ/√n | μ] = P[Z ≥ z₁₋α + √n(μ₀ - μ)/σ]

so that

π(μ) = 1 - Φ(z₁₋α + √n(μ₀ - μ)/σ)        (12.2.2)

For μ = μ₀ we have π(μ₀) = α, and for μ > μ₀ we have π(μ) = 1 - P[TII]. We also may consider a composite null hypothesis. Suppose that we wish to test H₀ : μ ≤ μ₀ against Hₐ : μ > μ₀, and we reject H₀ if inequality (12.2.1) is satisfied. This is still a size α test for the composite null hypothesis. The probability of rejecting H₀ for any μ is π(μ), and π(μ) ≤ π(μ₀) = α for μ ≤ μ₀, and thus the size is max_{μ≤μ₀} π(μ) = α. That is, if the critical region is chosen to have size α
FIGURE 12.3  Comparison of power functions, π(μ) = P[reject H₀ | μ], for tests based on two different sample sizes
at μ₀, then the Type I error probability will be less than α for any μ < μ₀, so the original critical region still is appropriate here. Thus, the α level tests developed for simple null hypotheses often are applicable to the more realistic composite hypotheses, and P[TI] will be no worse than α. The general shape of the power function is given in Figure 12.3 for n = 20 and n = 100.
From Figure 12.3, it is rather obvious why failure to reject H₀ should not be interpreted as acceptance of H₀. In particular, we always can find an alternative value of μ sufficiently close to μ₀ so that the power of the test π(μ) is arbitrarily close to α. This would not be a serious problem, in particular, if we could determine an indifference region, which is a subset of the alternative on which we are willing to tolerate low power. In other words, it might not be too important to detect alternative values of μ that are close to μ₀. In our example, we may not be too concerned about rejecting H₀ when μ is in some small interval, say (μ₀, μ₁], which we could take as our indifference region. When μ > μ₁, a sample size can be determined from equation (12.1.4) that will provide power π(μ) ≥ 1 - β. That is, for alternative values outside of the indifference region, a test can be constructed that will achieve or exceed prescribed error rates for both types of error.
P-VALUE

There is not always general agreement about how small α should be for rejection of H₀ to constitute strong evidence in support of Hₐ. Experimenter 1 may consider α = 0.05 sufficiently small, while experimenter 2 insists on using α = 0.01. Thus, it would be possible for experimenter 1 to reject when experimenter 2 fails to reject, based on the same data. If the experimenters agree to use the same test statistic, then this problem may be overcome by reporting the results of the experiment in terms of the observed size or p-value of the test, which is defined as the smallest size α at which H₀ can be rejected, based on the observed value of the test statistic.
Example 12.2.1 On the basis of a sample of size n = 25 from a normal distribution, Xi ~ N(μ, 16), we wish to test H0 : μ = 10 versus Ha : μ > 10. Suppose that we observe x̄ = 11.4. The p-value is P[X̄ ≥ 11.4 | μ = 10] = 1 − Φ((11.4 − 10)/(4/√25)) = 1 − Φ(1.75) = 1 − 0.9599 = 0.0401. Because 0.01 < 0.0401 < 0.05, the test would reject at the 0.05 level but not at the 0.01 level. If the p-value is reported, then interested readers can apply their own criteria.
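The p-value calculation in Example 12.2.1 can be verified with a few lines of Python. This is only a sketch using the standard library; the numbers are those of the example:

```python
from math import sqrt
from statistics import NormalDist

# Example 12.2.1: n = 25, sigma^2 = 16, H0: mu = 10 vs Ha: mu > 10
n, sigma, mu0 = 25, 4.0, 10.0
xbar = 11.4                             # observed sample mean

z0 = (xbar - mu0) / (sigma / sqrt(n))   # standardized test statistic
p_value = 1 - NormalDist().cdf(z0)      # P[Xbar >= 11.4 | mu = 10]

print(round(z0, 2), round(p_value, 4))  # 1.75 0.0401
```

Comparing the p-value to any desired α then gives the reject/fail-to-reject decision directly.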
To test H0 : μ ≥ μ0 against Ha : μ < μ0, similar results are obtained by rejecting H0 if

z0 ≤ −z1−α     (12.2.3)

That is, the critical region of size α now is taken on the left-hand tail of the distribution of the test statistic. These tests are known as one-sided tests of hypotheses. The test with a critical region of form (12.2.1) is called an upper one-sided test, and the test of form (12.2.3) corresponds to a lower one-sided test.
Another common type of test involves a two-sided alternative. We may wish to test H0 : μ = μ0 against the alternative Ha : μ ≠ μ0. If we choose the right-hand tail for our critical region, then we will have good power for rejecting H0 when μ > μ0, but we will have poor power when μ < μ0. A natural test in this case is to reject H0 if

z0 ≤ −z1−α/2   or   z0 ≥ z1−α/2     (12.2.4)

It is reasonable to use an equal-tailed test (each tail of size α/2) in this case because of symmetry considerations, and it is common practice to use equal tails, for the sake of convenience, in most two-sided tests. The power function for the two-sided test is
π(μ) = 1 − P[−z1−α/2 ≤ Z0 ≤ z1−α/2 | μ]

which gives

π(μ) = 1 − Φ(z1−α/2 + (μ0 − μ)√n/σ) + Φ(−z1−α/2 + (μ0 − μ)√n/σ)     (12.2.5)
If μ > μ0, then, as suggested in Figure 12.4, the last normal probability term in the above power function will be near zero. Similarly, if μ < μ0, the first term, Φ(z1−α/2 + (μ0 − μ)√n/σ), will be near one, and the power comes primarily from the last term.
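The two-sided power function (12.2.5) is easy to evaluate numerically. The sketch below (standard library only) checks that π(μ0) = α and that the power grows as μ moves away from μ0:

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def power_two_sided(mu, mu0, sigma, n, alpha):
    """Power function (12.2.5) for the two-sided z test of H0: mu = mu0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{1-alpha/2}
    shift = (mu0 - mu) * sqrt(n) / sigma
    return 1 - Phi(z + shift) + Phi(-z + shift)

# at mu = mu0 the power equals the size alpha
print(round(power_two_sided(10, 10, 4, 25, 0.05), 4))   # 0.05
# power increases with |mu - mu0|
print(round(power_two_sided(12, 10, 4, 25, 0.05), 3))
```

The parameter values above are those of Example 12.2.1 and are illustrative only.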
CHAPTER 12 TESTS OF HYPOTHESES

FIGURE 12.4   Critical region for a two-sided test
It is interesting to consider, for an observed value x̄, which hypothesized values of μ0 would not have been rejected. We see from expression (12.2.4) that these are the values of μ0 that satisfy

x̄ − z1−α/2 σ/√n < μ0 < x̄ + z1−α/2 σ/√n     (12.2.6)

That is, the set of values of μ0 that are in the "acceptance region" (nonrejection region) of the test is the same as our earlier 100(1 − α)% confidence interval for μ. The values of μ0 in the "acceptance regions" of the one-sided tests correspond to the one-sided confidence intervals we discussed earlier. Indeed, one could carry out a test of H0 : μ = μ0 by computing the corresponding confidence interval and rejecting H0 if the interval does not contain μ0.
12.3 TESTS FOR THE NORMAL DISTRIBUTION

In this section, we will state theorems that summarize the common test procedures for the parameters of a normal distribution. In a later section, we will show that some of these tests have optimal properties.

TESTS FOR THE MEAN (σ² KNOWN)

The results discussed in the previous section are summarized in the following theorem.

Theorem 12.3.1 Suppose that x1, ..., xn is an observed random sample from N(μ, σ²), where σ² is known, and let

z0 = (x̄ − μ0)/(σ/√n)     (12.3.1)
1. A size α test of H0 : μ ≤ μ0 versus Ha : μ > μ0 is to reject H0 if z0 ≥ z1−α. The power function for this test is

π(μ) = 1 − Φ(z1−α + (μ0 − μ)√n/σ)     (12.3.2)

2. A size α test of H0 : μ ≥ μ0 versus Ha : μ < μ0 is to reject H0 if

z0 ≤ −z1−α     (12.3.3)

3. A size α test of H0 : μ = μ0 versus Ha : μ ≠ μ0 is to reject H0 if z0 ≤ −z1−α/2 or z0 ≥ z1−α/2.

The sample size required to achieve a size α test with power 1 − β for an alternative value μ1 is given by

n = (z1−α + z1−β)²σ²/(μ0 − μ1)²     (12.3.4)

for a one-sided test, and

n = (z1−α/2 + z1−β)²σ²/(μ0 − μ1)²     (12.3.5)

for a two-sided test.
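Formula (12.3.4) translates directly into code; this sketch computes the one-sided sample size and rounds up, since n must be an integer. The numerical inputs are illustrative, not taken from the text:

```python
from math import ceil
from statistics import NormalDist

def n_one_sided(alpha, beta, sigma, mu0, mu1):
    """Sample size (12.3.4) for a one-sided z test with size alpha
    and power 1 - beta at the alternative mu1."""
    z_a = NormalDist().inv_cdf(1 - alpha)   # z_{1-alpha}
    z_b = NormalDist().inv_cdf(1 - beta)    # z_{1-beta}
    return ceil((z_a + z_b) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2)

# e.g. alpha = 0.05, power 0.90, sigma = 4, detect mu1 = 12 vs mu0 = 10
print(n_one_sided(0.05, 0.10, 4, 10, 12))   # 35
```

Replacing z1−α with z1−α/2 gives the two-sided version (12.3.5).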
TESTS FOR THE MEAN (σ² UNKNOWN)

In most practical applications it is not possible to assume that the variance is known. It is clear that the pivotal quantities and other statistics considered in developing confidence intervals can be applied to the associated hypothesis-testing problems. Later we will discuss general techniques for deriving statistical tests.
Tests for μ with σ unknown can be based on Student's t statistic, and they are similar to the tests based on the standard normal test statistic for the case in which the variance is known, with σ² replaced by the observed sample variance s².
Theorem 12.3.2 Let x1, ..., xn be an observed random sample from N(μ, σ²), where σ² is unknown, and let

t0 = (x̄ − μ0)/(s/√n)     (12.3.6)
1. A size α test of H0 : μ ≤ μ0 versus Ha : μ > μ0 is to reject H0 if

t0 ≥ t1−α(n − 1)

2. A size α test of H0 : μ ≥ μ0 versus Ha : μ < μ0 is to reject H0 if

t0 ≤ −t1−α(n − 1)

3. A size α test of H0 : μ = μ0 versus Ha : μ ≠ μ0 is to reject H0 if t0 ≤ −t1−α/2(n − 1) or t0 ≥ t1−α/2(n − 1).
The power function and the related sample-size problem are more complicated than in the case where σ is known. For part 1 of the theorem, for an alternative μ > μ0, the power function is

π(μ) = P[(X̄ − μ0)/(S/√n) ≥ t1−α(ν) | μ]

so that

π(μ) = P[(Z + δ)/√(V/ν) ≥ t1−α(ν)]     (12.3.7)

where ν = n − 1, δ = √n(μ − μ0)/σ, and Z and V are independent, Z ~ N(0, 1) and V = (n − 1)S²/σ² ~ χ²(ν). The random variable in equation (12.3.7) is said to have a noncentral t distribution with ν degrees of freedom and noncentrality parameter δ. It has the usual Student's t distribution only when δ = 0. Otherwise, the distribution is rather hard to evaluate. Tables of the noncentral t distribution are available, and π(μ) can be determined from these for given values of δ. Similarly, the sample size required to give a desired power can be determined for specified values of δ. This can be quite useful if approximate values or previous estimates of σ are available. Table 8 (Appendix C) gives the sample size required to achieve π(μ) = 1 − β in terms of d = |δ|/√n = |μ − μ0|/σ for the tests described in Theorem 12.3.2.
Example 12.3.1 It is desired to test, at the α = 0.05 level, H0 : μ = 10 versus Ha : μ > 10 for a normal distribution, N(μ, σ²), with σ² unknown, and we wish to have power 0.99 if the true value μ is two standard deviations greater than μ0 = 10. In other words, 0.99 = π(μ) when d = |μ − μ0|/σ = 2. From Table 8 (Appendix C), we obtain n = 6.
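The noncentral t power (12.3.7) can be evaluated with SciPy (assumed available here; `nct` is its noncentral t distribution). The sketch checks the situation of Example 12.3.1, where n = 6 should give power of roughly 0.99 at d = 2:

```python
from math import sqrt
from scipy.stats import t, nct

def power_one_sample_t(n, d, alpha):
    """Power (12.3.7) of the one-sided t test at an alternative with
    d = |mu - mu0| / sigma; delta = sqrt(n) * d is the noncentrality."""
    nu = n - 1
    t_crit = t.ppf(1 - alpha, nu)           # t_{1-alpha}(n - 1)
    return nct.sf(t_crit, nu, sqrt(n) * d)  # P[T' >= t_crit]

print(round(power_one_sample_t(6, 2.0, 0.05), 3))
```

Power is increasing in n, so the smallest n meeting a target power can be found by stepping n upward.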
Tests also can be constructed for other parameters, such as the variance.
TEST FOR VARIANCE

It is possible to construct tests of hypotheses such as H0 : σ² = σ0² versus Ha : σ² > σ0² based on the test statistic

V0 = (n − 1)S²/σ0²     (12.3.8)

because V0 ~ χ²(n − 1) when H0 is true. An observed value of the sample variance, s², that is large relative to σ0² would support Ha. This suggests choosing the right-hand tail of the distribution of V0 as the critical region for such a test. This test would be very useful for deciding whether the amount of variability in a population is excessive relative to some standard value, σ0². Similar remarks apply for a composite null hypothesis of the form H0 : σ² ≤ σ0². In the following theorem, H(c; ν) is the CDF of χ²(ν).
Theorem 12.3.3 Let x1, ..., xn be an observed random sample from N(μ, σ²), and let

v0 = (n − 1)s²/σ0²     (12.3.9)

1. A size α test of H0 : σ² ≤ σ0² versus Ha : σ² > σ0² is to reject H0 if v0 ≥ χ²1−α(n − 1). The power function for this test is

π(σ²) = 1 − H[(σ0²/σ²)χ²1−α(n − 1); n − 1]     (12.3.10)

2. A size α test of H0 : σ² ≥ σ0² versus Ha : σ² < σ0² is to reject H0 if v0 ≤ χ²α(n − 1). The power function for this test is

π(σ²) = H[(σ0²/σ²)χ²α(n − 1); n − 1]     (12.3.11)

3. A size α test of H0 : σ² = σ0² versus Ha : σ² ≠ σ0² is to reject H0 if v0 ≤ χ²α/2(n − 1) or v0 ≥ χ²1−α/2(n − 1).

Proof
We will derive the power function for part 1 and leave the other details as an exercise.

π(σ²) = P[V0 ≥ χ²1−α(n − 1) | σ²]
= P[(n − 1)S²/σ² ≥ (σ0²/σ²)χ²1−α(n − 1) | σ²]
= 1 − H[(σ0²/σ²)χ²1−α(n − 1); n − 1]

Notice that, in particular, π(σ0²) = 1 − H[χ²1−α(n − 1); n − 1] = 1 − (1 − α) = α, and because π(σ²) is increasing, the size of the critical region is α.
In practice, it is convenient to use an equal-tailed two-sided test as described in part 3, but unequal tail values α1 and α2 with α = α1 + α2 may be desirable in some situations.
It is possible to solve for a sample size to achieve prescribed levels of size and power for these tests. For example, for the test of part 1, we may wish to have π(σ1²) = 1 − β for some σ1² > σ0². This requires finding the value n such that

(σ0²/σ1²)χ²1−α(n − 1) = χ²β(n − 1)     (12.3.12)

This cannot be solved explicitly for n, but an iterative procedure based on the percentiles in Table 4 (Appendix C) can be used for specified values of α, β, and σ0²/σ1². It also would be desirable to have an approximate formula that gives n explicitly as a function of these values. Such an approximation can be based on the normal approximation χ²γ(ν) ≈ ν + zγ√(2ν), which was given in Chapter 8. If this approximation is used in both sides of equation (12.3.12), then it is possible to derive the approximation

n ≈ 1 + 2[(σ0²/σ1²)z1−α − zβ]²/[1 − σ0²/σ1²]²     (12.3.13)
Example 12.3.2 We desire to test H0 : σ² = 16 versus Ha : σ² > 16 with size α = 0.10 and power 1 − β = 0.75 when σ1² = 32 is true. Based on approximation (12.3.13) with z1−α = 1.282, zβ = −0.674, and σ0²/σ1² = 0.5, we obtain n ≈ 15. If we compute both sides of equation (12.3.12) for values of ν close to ν = 15 − 1 = 14, we obtain the best agreement when ν = 15, which corresponds to n = 16.
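Both the approximation (12.3.13) and the iterative check of (12.3.12) are easy to carry out numerically (SciPy assumed for the chi-square percentiles). The numbers below reproduce Example 12.3.2:

```python
from statistics import NormalDist
from scipy.stats import chi2

alpha, beta = 0.10, 0.25          # size 0.10, power 0.75
ratio = 16 / 32                   # sigma0^2 / sigma1^2

# approximation (12.3.13)
z_a = NormalDist().inv_cdf(1 - alpha)   # z_{1-alpha}
z_b = NormalDist().inv_cdf(beta)        # z_beta (negative here)
n_approx = 1 + 2 * (ratio * z_a - z_b) ** 2 / (1 - ratio) ** 2
print(round(n_approx))            # 15

# iterative check of (12.3.12): find nu where the two sides agree best
def gap(nu):
    return abs(ratio * chi2.ppf(1 - alpha, nu) - chi2.ppf(beta, nu))

best_nu = min(range(5, 40), key=gap)
print(best_nu + 1)                # n = best_nu + 1
```

The iterative search confirms that exact agreement occurs near ν = 15, i.e., n = 16, slightly above the approximate answer.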
TWO-SAMPLE TESTS

It is possible to construct tests of hypotheses concerning the variances of two normal distributions, such as H0 : σ1²/σ2² = d0, based on an F statistic. In particular, consider the test statistic

F0 = S1²/(d0 S2²)     (12.3.14)

where F0 ~ F(n1 − 1, n2 − 1) if H0 is true.

Theorem 12.3.4 Suppose that x1, ..., xn1 and y1, ..., yn2 are observed values of independent random samples from N(μ1, σ1²) and N(μ2, σ2²), respectively, and let

f0 = s1²/(d0 s2²)     (12.3.15)
1. A size α test of H0 : σ1²/σ2² ≤ d0 versus Ha : σ1²/σ2² > d0 is to reject H0 if f0 ≥ f1−α(n1 − 1, n2 − 1).

2. A size α test of H0 : σ1²/σ2² ≥ d0 versus Ha : σ1²/σ2² < d0 is to reject H0 if f0 ≤ 1/f1−α(n2 − 1, n1 − 1).
3. A size α test of H0 : σ1²/σ2² = d0 versus Ha : σ1²/σ2² ≠ d0 is to reject H0 if f0 ≤ 1/f1−α/2(n2 − 1, n1 − 1) or f0 ≥ f1−α/2(n1 − 1, n2 − 1).

If the variances are unknown but equal, then tests of hypotheses concerning the means, such as H0 : μ2 − μ1 = d0, can be constructed based on the t distribution. In particular, let

t0 = (ȳ − x̄ − d0)/(sp √(1/n1 + 1/n2))     (12.3.16)

where sp is the pooled estimate defined by equation (11.5.7).
Theorem 12.3.5 Suppose that x1, ..., xn1 and y1, ..., yn2 are observed values of independent random samples from N(μ1, σ1²) and N(μ2, σ2²), respectively, where σ1² = σ2².

1. A size α test of H0 : μ2 − μ1 ≤ d0 versus Ha : μ2 − μ1 > d0 is to reject H0 if t0 ≥ t1−α(n1 + n2 − 2).

2. A size α test of H0 : μ2 − μ1 ≥ d0 versus Ha : μ2 − μ1 < d0 is to reject H0 if t0 ≤ −t1−α(n1 + n2 − 2).

3. A size α test of H0 : μ2 − μ1 = d0 versus Ha : μ2 − μ1 ≠ d0 is to reject H0 if t0 ≤ −t1−α/2(n1 + n2 − 2) or t0 ≥ t1−α/2(n1 + n2 − 2).
The power functions for these tests involve the noncentral t distribution. It is possible to determine equal sample sizes n1 = n2 = n for a one-sided size α test with power 1 − β by using Table 8 (Appendix C) with d = |μ2 − μ1|/σ. For a two-sided test, the size is 2α. Again, it is necessary to have a value to use for σ, or else to express the difference in standard deviations. Of course, the test will have the proper size whatever n is used.
For unequal variances, an approximate test can be constructed based on Welch's approximate t statistic as given by equation (11.5.13). Similarly, tests for other cases, such as the paired-sample case, can be set up easily.
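The pooled two-sample statistic (12.3.16) with d0 = 0 can be sketched as follows and cross-checked against SciPy's `ttest_ind` (SciPy assumed available; the data are invented for illustration):

```python
from math import sqrt
from statistics import mean, variance
from scipy.stats import t as t_dist

x = [10.2, 9.8, 11.1, 10.5, 9.9, 10.7]   # hypothetical sample 1
y = [11.0, 11.4, 10.9, 12.1, 11.6]       # hypothetical sample 2
n1, n2 = len(x), len(y)

# pooled variance estimate (equation 11.5.7)
sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
t0 = (mean(y) - mean(x)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# one-sided test of H0: mu2 - mu1 <= 0 at alpha = 0.05
reject = t0 >= t_dist.ppf(0.95, n1 + n2 - 2)
print(round(t0, 3), reject)
```

The statistic agrees with `scipy.stats.ttest_ind(y, x, equal_var=True)`, which uses the same pooled-variance construction.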
PAIRED-SAMPLE t TEST

All of the above tests assume that the random samples are independent. As noted in Chapter 11, there are situations in which an experiment involves only one set of individuals or experimental objects, and two observations are made on each individual or object. For example, one possible way to test the effectiveness of a diet plan would be to weigh each one of a set of n individuals both before and after the diet period. The result would be paired data (x1, y1), ..., (xn, yn), with xi and yi the weight measurements, respectively before and after the diet, for the ith individual in the study. Of course, one might reasonably expect a dependence
between observations within a pair because they are measurements on the same individual. We will assume that such paired data are observations on a set of independent pairs of random variables from a bivariate population, (X, Y) ~ f(x, y), and that the differences D = Y − X are normally distributed, D ~ N(μD, σD²), with μD = E(Y) − E(X) = μ2 − μ1. One possibility that leads to this situation is if the pairs are a random sample from a bivariate normal population. That is, if there is independence between pairs, and each pair has the same bivariate normal distribution, then the differences are independent normal with mean μ2 − μ1. Thus, a test based on the T variable of equation (11.5.17) applied to the differences di = yi − xi yields a test of μ2 − μ1 with σD unknown.

Theorem 12.3.6 Denote by (x1, y1), ..., (xn, yn) the observed values of n independent pairs of random variables, (X1, Y1), ..., (Xn, Yn), and assume that the differences Di = Yi − Xi are normally distributed, each with mean μD = μ2 − μ1 and variance σD². Let d̄ and sd² be the sample mean and sample variance based on the differences di = yi − xi, for i = 1, ..., n, and let

t0 = (d̄ − d0)/(sd/√n)     (12.3.17)
1. A size α test of H0 : μ2 − μ1 ≤ d0 versus Ha : μ2 − μ1 > d0 is to reject H0 if t0 ≥ t1−α(n − 1).

2. A size α test of H0 : μ2 − μ1 ≥ d0 versus Ha : μ2 − μ1 < d0 is to reject H0 if t0 ≤ −t1−α(n − 1).

3. A size α test of H0 : μ2 − μ1 = d0 versus Ha : μ2 − μ1 ≠ d0 is to reject H0 if either t0 ≤ −t1−α/2(n − 1) or t0 ≥ t1−α/2(n − 1).
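A sketch of the paired test of Theorem 12.3.6 (with d0 = 0), cross-checked against SciPy's `ttest_rel`; the before/after weights are invented for illustration:

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t as t_dist

before = [82.0, 90.5, 77.3, 95.1, 88.0, 70.2, 84.4, 91.0]  # x_i
after  = [80.1, 88.0, 77.9, 92.4, 86.2, 69.0, 82.5, 89.7]  # y_i

d = [y - x for x, y in zip(before, after)]   # differences d_i = y_i - x_i
n = len(d)
t0 = mean(d) / (stdev(d) / sqrt(n))          # equation (12.3.17), d0 = 0

# test H0: mu2 - mu1 >= 0 vs Ha: mu2 - mu1 < 0 (diet reduces weight)
reject = t0 <= -t_dist.ppf(0.95, n - 1)
print(round(t0, 3), reject)
```

The key point is that the two-sample structure reduces to a one-sample t test on the differences.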
12.4 BINOMIAL TESTS

The techniques used to set confidence intervals for the binomial distribution also can be modified to obtain tests of hypotheses. Suppose that Xi ~ BIN(1, p), i = 1, ..., n. Then tests for p will be based on the sufficient statistic S = ΣXi ~ BIN(n, p).

Theorem 12.4.1 Let S ~ BIN(n, p). For large n, an approximate size α test of H0 : p ≤ p0 against Ha : p > p0 is to reject H0 if

z0 = (s − np0)/√(np0(1 − p0)) ≥ z1−α
2. An approximate size α test of H0 : p ≥ p0 against Ha : p < p0 is to reject H0 if

z0 ≤ −z1−α

3. An approximate size α test of H0 : p = p0 against Ha : p ≠ p0 is to reject H0 if

z0 ≤ −z1−α/2   or   z0 ≥ z1−α/2

These results follow from the fact that when p = p0, then

(S − np0)/√(np0(1 − p0)) →d Z ~ N(0, 1)     (12.4.1)

As in the previous one-sided examples, the probability of rejecting a true H0 will be less than α for other values of p in the null hypotheses.
Exact tests also can be based on S, analogous to the way exact confidence intervals were obtained using the general method. These tests also may be conservative in the sense that the size of the test actually may be less than α for all parameter values under H0, because of the discreteness of the random variable.
Theorem 12.4.2 Suppose that S ~ BIN(n, p), and B(s; n, p) denotes a binomial CDF. Denote by s an observed value of S.

1. A conservative size α test of H0 : p ≤ p0 against Ha : p > p0 is to reject H0 if

1 − B(s − 1; n, p0) ≤ α

2. A conservative size α test of H0 : p ≥ p0 against Ha : p < p0 is to reject H0 if

B(s; n, p0) ≤ α

3. A conservative two-sided test of H0 : p = p0 against Ha : p ≠ p0 is to reject H0 if

B(s; n, p0) ≤ α/2   or   1 − B(s − 1; n, p0) ≤ α/2

The concept of hypothesis testing is well illustrated by this model. If one is testing H0 : p ≥ p0, then H0 is rejected if the observed s is so small that it would have been very unlikely (≤ α) to have obtained such a small value of s when p ≥ p0. Also, it is clear that the indicated critical regions have size ≤ α. In case 2, for example, if p0 happens to be a value such that B(s; n, p0) = α, then the test will have exact size α; otherwise it is conservative. Note also that B(s; n, p0), for example, is the p-value or observed size of the test in case 2, if one desires to use that concept.
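The conservative tests of Theorem 12.4.2 need only the binomial CDF, which can be computed exactly with the standard library (`math.comb`). A sketch for part 1, with illustrative values n = 10, p0 = 0.5, s = 9:

```python
from math import comb

def binom_cdf(s, n, p):
    """B(s; n, p) = P[S <= s] for S ~ BIN(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(s + 1))

# part 1: test H0: p <= 0.5 vs Ha: p > 0.5 with n = 10, observed s = 9
n, p0, s, alpha = 10, 0.5, 9, 0.05
p_value = 1 - binom_cdf(s - 1, n, p0)        # 1 - B(s-1; n, p0) = P[S >= s]
print(round(p_value, 5), p_value <= alpha)   # 0.01074 True
```

Here P[S ≥ 9] = (10 + 1)/2¹⁰ = 11/1024 ≈ 0.0107, so H0 is rejected at the 0.05 level.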
It also is possible to construct tests of hypotheses about the equality of two population proportions. In particular, suppose that X ~ BIN(n1, p1) and Y ~ BIN(n2, p2), and that X and Y are independent. The MLEs are p̂1 = X/n1 and
p̂2 = Y/n2. Under H0 : p1 = p2, it would seem appropriate to have a pooled estimate of their common value, say p̂ = (X + Y)/(n1 + n2). It can be shown by the methods of Chapter 7 that if p1 = p2, then

Z = (p̂2 − p̂1)/√(p̂(1 − p̂)(1/n1 + 1/n2)) →d N(0, 1)     (12.4.2)

and consequently an approximate size α test of H0 : p1 = p2 versus Ha : p1 ≠ p2 would reject H0 if z0 ≤ −z1−α/2 or z0 ≥ z1−α/2.
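The pooled statistic (12.4.2) in code (standard library only; the counts below are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x, n1, y, n2):
    """Pooled z statistic (12.4.2) for H0: p1 = p2."""
    p1_hat, p2_hat = x / n1, y / n2
    p_hat = (x + y) / (n1 + n2)            # pooled estimate under H0
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p2_hat - p1_hat) / se

z0 = two_prop_z(40, 100, 55, 100)          # e.g. 40/100 vs 55/100 successes
z_crit = NormalDist().inv_cdf(0.975)       # z_{0.975} for alpha = 0.05
print(round(z0, 3), abs(z0) >= z_crit)
```

Note that the pooled estimate p̂ is used in the standard error because the null hypothesis asserts a common value of p.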
12.5 POISSON TESTS

Tests for the mean of a Poisson distribution can be based on the sufficient statistic S = ΣXi. In the following theorem, F(x; μ) is the CDF of X ~ POI(μ).

Theorem 12.5.1 Let x1, ..., xn be an observed random sample from POI(μ), and let s = Σxi.

1. A conservative size α test of H0 : μ ≤ μ0 versus Ha : μ > μ0 is to reject H0 if 1 − F(s − 1; nμ0) ≤ α.

2. A conservative size α test of H0 : μ ≥ μ0 versus Ha : μ < μ0 is to reject H0 if F(s; nμ0) ≤ α.

Using the results in Exercise 18 of Chapter 8, it is possible to give tests in terms of chi-square percentiles. In particular, H0 of part 1 is rejected if 2nμ0 ≤ χ²α(2s), and H0 of part 2 is rejected if 2nμ0 ≥ χ²1−α(2s + 2). A two-sided size α test of H0 : μ = μ0 versus Ha : μ ≠ μ0 would reject H0 if 2nμ0 ≤ χ²α/2(2s) or 2nμ0 ≥ χ²1−α/2(2s + 2). Again, the concept of p-value may be useful in this problem, where for an observed s, 1 − F(s − 1; nμ0) and F(s; nμ0) are the p-values for cases 1 and 2, respectively.
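The conservative Poisson test and its chi-square restatement can be checked against each other; the Poisson CDF is summed directly (standard library), while the chi-square CDF comes from SciPy (assumed available). The parameters n = 5, μ0 = 2, s = 17 are illustrative:

```python
from math import exp, factorial
from scipy.stats import chi2

def pois_cdf(s, lam):
    """F(s; lam) = P[S <= s] for S ~ POI(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(s + 1))

# part 1: H0: mu <= 2 vs Ha: mu > 2, with n = 5 and observed s = 17
n, mu0, s, alpha = 5, 2.0, 17, 0.05
p_value = 1 - pois_cdf(s - 1, n * mu0)     # 1 - F(s-1; n*mu0)

# chi-square form: reject iff 2*n*mu0 <= chi^2_alpha(2s); equivalently
# p_value = P[chi2(2s) <= 2*n*mu0]
assert abs(p_value - chi2.cdf(2 * n * mu0, 2 * s)) < 1e-10
print(round(p_value, 4), p_value <= alpha)
```

The in-line assertion verifies the Poisson/chi-square identity underlying the percentile form of the test.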
12.6 MOST POWERFUL TESTS

In the previous sections, the terminology of hypothesis testing has been developed, and some intuitively appealing tests have been described based on pivotal quantities or appropriate sufficient statistics. In most cases these tests are closely related to analogous confidence intervals discussed in the previous chapter. The tests presented earlier were based on reasonable test statistics, but no rationale
was provided to suggest that they are best in any sense. If confronted with choosing between two or more tests of the same size, it would seem reasonable to select the one with the greatest chance of detecting a true alternative value. In other words, the strategy will be to select a test with maximum power for alternative values of the parameter. We will approach this problem by considering a method for deriving critical regions corresponding to tests that are most powerful tests of a given size for testing simple hypotheses. Let X1, ..., Xn have joint pdf f(x1, ..., xn; θ) and consider a critical region C. The notation for the power function corresponding to C is

πC(θ) = P[(X1, ..., Xn) ∈ C | θ]     (12.6.1)
Definition 12.6.1 A test of H0 : θ = θ0 versus Ha : θ = θ1 based on a critical region C* is said to be a most powerful test of size α if

1. πC*(θ0) = α, and

2. πC*(θ1) ≥ πC(θ1) for any other critical region C of size α [that is, πC(θ0) = α].
Such a critical region, C*, is called a most powerful critical region of size α. The following theorem shows how to derive a most powerful critical region for testing simple hypotheses.

Theorem 12.6.1 Neyman-Pearson Lemma Suppose that X1, ..., Xn have joint pdf f(x1, ..., xn; θ). Let

λ = λ(x1, ..., xn; θ0, θ1) = f(x1, ..., xn; θ0)/f(x1, ..., xn; θ1)     (12.6.2)

and let C* be the set

C* = {(x1, ..., xn) : λ ≤ k}     (12.6.3)

where k is a constant such that

P[(X1, ..., Xn) ∈ C* | θ0] = α     (12.6.4)

Then C* is a most powerful critical region of size α for testing H0 : θ = θ0 versus Ha : θ = θ1.
Proof
For convenience, we will adopt vector notation, X = (X1, ..., Xn) and x = (x1, ..., xn). Also, if A is an n-dimensional event, let

P[X ∈ A | θ] = ∫···∫_A f(x; θ) dx1 ··· dxn     (12.6.5)

for the continuous case. The discrete case would be similar, with integrals replaced by summations. We also will denote the complement of a set C by C̄.
Note that if A is a subset of C*, then

P[X ∈ A | θ0] ≤ kP[X ∈ A | θ1]     (12.6.6)

because ∫_A f(x; θ0) dx ≤ ∫_A kf(x; θ1) dx. Similarly, if A is a subset of C̄*, then

P[X ∈ A | θ0] ≥ kP[X ∈ A | θ1]     (12.6.7)

Notice that for any critical region C we have

C* = (C* ∩ C) ∪ (C* ∩ C̄)   and   C = (C ∩ C*) ∪ (C ∩ C̄*)

Thus,

πC*(θ) = P[X ∈ C* ∩ C | θ] + P[X ∈ C* ∩ C̄ | θ]

and

πC(θ) = P[X ∈ C ∩ C* | θ] + P[X ∈ C ∩ C̄* | θ]

and the difference is

πC*(θ) − πC(θ) = P[X ∈ C* ∩ C̄ | θ] − P[X ∈ C ∩ C̄* | θ]     (12.6.8)

Combining equation (12.6.8) with θ = θ1 and inequalities (12.6.6) and (12.6.7), we have

πC*(θ1) − πC(θ1) ≥ (1/k){P[X ∈ C* ∩ C̄ | θ0] − P[X ∈ C ∩ C̄* | θ0]}

Again, using (12.6.8) with θ = θ0 on the right side of this inequality, we obtain

πC*(θ1) − πC(θ1) ≥ (1/k)[πC*(θ0) − πC(θ0)]

If C is a critical region of size α, then πC*(θ0) − πC(θ0) = α − α = 0, so the right side of the last inequality is 0, and thus πC*(θ1) ≥ πC(θ1).
The general philosophy of the Neyman-Pearson approach to hypothesis testing is to put sample points into the critical region until it reaches size α. To maximize power, points should be put into the critical region that are more likely under Ha than under H0. In particular, the Neyman-Pearson lemma says that the criterion for choosing sample points to be included should be based on the magnitude of the ratio of the likelihood functions under H0 and Ha.
Example 12.6.1 Consider a random sample of size n from an exponential distribution, Xi ~ EXP(θ). We wish to test H0 : θ = θ0 against Ha : θ = θ1, where θ1 > θ0. The Neyman-Pearson lemma says to reject H0 if

λ(x; θ0, θ1) = (θ1/θ0)ⁿ exp(−Σxi/θ0)/exp(−Σxi/θ1) ≤ k

where k is such that P[λ(X; θ0, θ1) ≤ k] = α under θ = θ0. Now

P[λ(X; θ0, θ1) ≤ k | θ0] = P[(1/θ1 − 1/θ0)ΣXi ≤ ln((θ0/θ1)ⁿk) | θ0]

so that

P[X ∈ C* | θ0] = P[ΣXi ≥ k1 | θ0]     (12.6.9)

where k1 = ln((θ0/θ1)ⁿk)/(1/θ1 − 1/θ0). Notice that the direction of the inequality changed because 1/θ1 − 1/θ0 < 0 in this case. Thus, a most powerful critical region has the form C* = {(x1, ..., xn) : Σxi ≥ k1}. Notice that under H0 : θ = θ0 we have 2ΣXi/θ0 ~ χ²(2n), so that k1 = θ0χ²1−α(2n)/2 would give a critical region of size α, and an equivalent test would be to reject H0 if 2Σxi/θ0 ≥ χ²1−α(2n). The original constant k could be computed if desired, but it is not necessary in order to perform the test. Similarly, if we wish to test H0 : θ = θ0 versus Ha : θ = θ1 with θ1 < θ0, then the most powerful test of size α is to reject H0 if 2Σxi/θ0 ≤ χ²α(2n). The only difference between the two tests comes about because of the difference in the sign of 1/θ1 − 1/θ0 in the two cases. In other words, the right side of equation (12.6.9) becomes P[ΣXi ≤ k1 | θ0] if θ1 < θ0, which corresponds to 1/θ1 − 1/θ0 > 0. Note also that this is the only way in which C* depends on the alternative value in this example. That is, the most powerful test of H0 : θ = θ0 versus Ha : θ = θ2 is exactly the same, provided that θ1 and θ2 are both greater (or both less) than θ0. This makes it possible to extend the concept of most powerful test for a simple alternative to a "uniformly most powerful" test for a composite alternative such as Ha : θ > θ0.
This concept is considered more fully in the next section.
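The size of the most powerful test in Example 12.6.1 can be verified by simulation (SciPy assumed for the chi-square percentile; the seed and parameter values are arbitrary):

```python
import random
from scipy.stats import chi2

random.seed(1)
n, theta0, alpha = 10, 100.0, 0.05
crit = chi2.ppf(1 - alpha, 2 * n)          # chi^2_{0.95}(2n)

reps, rejections = 20000, 0
for _ in range(reps):
    # a sample of size n from EXP(theta0), i.e. mean theta0
    total = sum(random.expovariate(1 / theta0) for _ in range(n))
    if 2 * total / theta0 >= crit:         # reject H0: theta = theta0
        rejections += 1

print(round(rejections / reps, 3))          # should be close to 0.05
```

Because 2ΣXi/θ0 is exactly χ²(2n) under H0, the empirical rejection rate should match α up to simulation noise.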
Example 12.6.2 Consider a random sample of size n from a normal distribution with mean zero, Xi ~ N(0, σ²). We wish to test H0 : σ² = σ0² versus Ha : σ² = σ1², with σ1² > σ0². In this case,

λ = [(√(2π)σ1)ⁿ/(√(2π)σ0)ⁿ] exp[−Σxi²/2σ0²]/exp[−Σxi²/2σ1²] = (σ1/σ0)ⁿ exp[−(1/σ0² − 1/σ1²)Σxi²/2]

Thus, λ ≤ k is equivalent to

(1/σ0² − 1/σ1²)Σxi² ≥ k1

and, because 1/σ0² − 1/σ1² > 0 and ΣXi²/σ0² ~ χ²(n) under H0, a size α test is to reject H0 if

Σxi²/σ0² ≥ χ²1−α(n)
The previous examples involve continuous distributions, so that a test with a prescribed size α is possible. For discrete distributions it may not be possible to achieve an exact size α, but one could choose k to give size at most α and as close to α as possible. In this case the Neyman-Pearson test is the most powerful test of size πC*(θ0) = α1 ≤ α, and it would be a conservative test for a prescribed size α.
Example 12.6.3 We wish to determine the form of the most powerful test of H0 : p = p0 against Ha : p = p1 > p0 based on the statistic S ~ BIN(n, p). We have

λ = [(n choose s) p0^s (1 − p0)^(n−s)]/[(n choose s) p1^s (1 − p1)^(n−s)] ≤ k

so that

[p0(1 − p1)/(p1(1 − p0))]^s ≤ k1

or

s ln[p0(1 − p1)/(p1(1 − p0))] ≤ ln k1

Because p0(1 − p1)/[p1(1 − p0)] < 1, the log term is negative and the test is to reject H0 if s ≥ k2, which is the same form as the binomial test suggested earlier. Now

P[S ≥ i | p = p0] = 1 − B(i − 1; n, p0) = αi

so for integer values i = 1, ..., n, exact most powerful tests of size αi are achieved by rejecting H0 if s ≥ i. For other prescribed levels of α the test would be chosen to be conservative as discussed earlier.
The Neyman-Pearson lemma does not claim that the conservative tests would be most powerful. Somewhat artificially, one can increase the power of the conservative test by adding a fraction of a sample point to the critical region in the
discrete case so that P[TI] = α. That is, if P[S ≥ 7] ≤ α, then one could reject H0 if s ≥ 7 and some fraction of the time when s = 6, depending on what fraction is needed to make the size of the critical region equal α. This is referred to as a randomized test, because if s = 6 is observed one could flip a coin (appropriately biased) to decide whether to reject. In most cases it seems more reasonable to select some exact α1 close to the level desired, and then use the exact, most powerful test for this α1 significance level, rather than randomize on an additional discrete point to get a prescribed size α. Note that the Neyman-Pearson principle applies to testing any completely specified H0 : f0(x; θ0) against any completely specified alternative Ha : f1(x; θ1). In most applications x will result from a random sample from a density with possibly different values of a parameter, but x could result from a set of order statistics or some other multivariate variable. Also, the densities need not be of the same type under H0 and Ha, as long as they are completely specified, so that the statistic can be computed and the critical region can be determined with size α under H0.
Example 12.6.4 We have a random sample of size n, and we wish to test H0 : X ~ UNIF(0, 1) against Ha : X ~ EXP(1). We have

λ = f0(x1, ..., xn)/f1(x1, ..., xn) = 1/e^(−Σxi) = e^(Σxi)

so we reject H0 if Σxi ≤ k1 = ln k. The distribution of a sum of uniform variables is not easy to express, but the central limit theorem can be used to obtain an approximate critical value. We know that if X ~ UNIF(0, 1), then E(X) = 1/2 and Var(X) = 1/12, so under H0

√(12n)(X̄ − 1/2) →d N(0, 1)

Thus, an approximate size α test is to reject H0 if

z0 = √(12n)(x̄ − 0.5) ≤ −z1−α
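A quick check of the approximate test in Example 12.6.4 (standard library only; n and the seed are arbitrary choices): under H0 the rejection rate should be near α.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(7)
n, alpha = 30, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)     # z_{1-alpha}

def reject_h0(sample):
    """Reject H0: UNIF(0,1) in favor of EXP(1) if z0 <= -z_{1-alpha}."""
    xbar = sum(sample) / len(sample)
    z0 = sqrt(12 * len(sample)) * (xbar - 0.5)
    return z0 <= -z_crit

reps = 20000
rate = sum(reject_h0([random.random() for _ in range(n)])
           for _ in range(reps)) / reps
print(round(rate, 3))                         # close to 0.05
```

Even at n = 30, the CLT approximation for a sum of uniforms is quite accurate, so the empirical size tracks α closely.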
The concept of a most powerful test now will be extended to the case of composite hypotheses.
12.7 UNIFORMLY MOST POWERFUL TESTS

In the last section we saw that in some cases the same test is most powerful against several different alternative values. If a test is most powerful against every possible value in a composite alternative, then it will be called a uniformly most powerful test.
Definition 12.7.1 Let X1, ..., Xn have joint pdf f(x1, ..., xn; θ) for θ ∈ Ω, and consider hypotheses of the form H0 : θ ∈ Ω0 versus Ha : θ ∈ Ω − Ω0, where Ω0 is a subset of Ω. A critical region C*, and the associated test, are said to be uniformly most powerful (UMP) of size α if

max over θ ∈ Ω0 of πC*(θ) = α     (12.7.1)

and

πC*(θ) ≥ πC(θ)     (12.7.2)

for all θ ∈ Ω − Ω0 and all critical regions C of size α.
That is, C* defines a UMP test of size α if it has size α and if, for all parameter values in the alternative, it has maximum power relative to all critical regions of size α.
A UMP test often exists in the case of a one-sided composite alternative, and a possible technique for determining a UMP test is first to derive the Neyman-Pearson test for a particular alternative value and then show that the test does not depend on the specific alternative value.
Example 12.7.1 Consider a random sample of size n from an exponential distribution, Xi ~ EXP(θ). It was found in Example 12.6.1 that the most powerful test of size α of H0 : θ = θ0 versus Ha : θ = θ1, when θ1 > θ0, is to reject H0 if 2nx̄/θ0 = 2Σxi/θ0 ≥ χ²1−α(2n). Because this does not depend on the particular value of θ1, but only on the fact that θ1 > θ0, it follows that it is a UMP test of H0 : θ = θ0 versus Ha : θ > θ0. Note also that the power function for this test can be expressed in terms of a chi-square CDF, H(c; ν), with ν = 2n. In particular,

πC*(θ) = 1 − H[(θ0/θ)χ²1−α(2n); 2n]     (12.7.3)

because (θ0/θ)[2ΣXi/θ0] = 2ΣXi/θ ~ χ²(2n) when θ is the true value. Because πC*(θ) is an increasing function of θ, the maximum of πC*(θ) over θ ≤ θ0 is πC*(θ0) = α, and the test is also a UMP test of size α for the composite hypotheses H0 : θ ≤ θ0 versus Ha : θ > θ0. Similarly, a UMP test of either H0 : θ = θ0 or H0 : θ ≥ θ0 versus Ha : θ < θ0 is to reject H0 if 2nx̄/θ0 ≤ χ²α(2n), and the associated power function is

π(θ) = H[(θ0/θ)χ²α(2n); 2n]     (12.7.4)
In Example 4.6.3, observed lifetimes of 40 electrical parts were given, and it was conjectured that these observations might be exponentially distributed with mean lifetime θ = 100 months. In a particular application, suppose that the parts will be unsuitable if the mean is less than 100. We will carry out a size α = 0.05 test of
H0 : θ ≥ 100 versus Ha : θ < 100. The sample mean is x̄ = 93.1, and consequently 2nx̄/θ0 = (80)(93.1)/100 = 74.48 > 60.39 = χ²0.05(80), which means that we cannot reject H0 at the 0.05 level of significance. Suppose that we wish to know P[TII] if, in fact, the mean is θ = 50 months. According to function (12.7.4), π(50) = H(120.78; 80). Table 5 (Appendix C) is generally useful for determining such quantities, although in this case ν = 80 exceeds the values given in the table. From Table 4 we can determine that the power is between 0.995 and 0.999, because 116.32 < 120.78 < 124.84; however, an approximate value can be found from the approximation provided with Table 5. Specifically, H(120.78; 80) ≈ 0.9978, so P[TII] ≈ 1 − 0.9978 = 0.0022, and a Type II error is quite unlikely for this alternative.
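The power computation at the end of Example 12.7.1 can be reproduced with SciPy's chi-square CDF (SciPy assumed available):

```python
from scipy.stats import chi2

n, theta0, alpha = 40, 100.0, 0.05
nu = 2 * n                                   # 80 degrees of freedom
crit = chi2.ppf(alpha, nu)                   # chi^2_{0.05}(80), about 60.39

xbar = 93.1
print(round(2 * n * xbar / theta0, 2))       # 74.48, above crit: cannot reject

def power(theta):
    """Power function (12.7.4) for the lower-tail exponential test."""
    return chi2.cdf((theta0 / theta) * crit, nu)

print(round(power(50.0), 4))                 # about 0.9978
```

Using `chi2.ppf` in place of a table avoids the interpolation needed when ν exceeds the tabled values.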
80, it is not possible
to find a test that is UMP for every alternative value. For an alternative value 01 > 0 a right tail critical region is optimal, but if the true O is 02 < 00 then the right tail critical region is very poor, and vice versa As suggested earlier we could compromise in this case and take a two sided critical region but it is not most powerful for any particular alternative It is possible to extend the concept of unbiasedness to tests of hypotheses, and in the restricted class of unbiased tests there may be a UMP unbiased test for a two-sided composite alternative. This concept will be discussed briefly later It is easy to see that the other Neyman-Pearson tests illustrated in Examples 12.6.2 and 12.6.3 also provide UMP tests for the corresponding one-sided composite alternatives General results along these lines can be stated for any pdf that satisfies a property known as the "monotone likelihood ratio."
For the sake of brevity, most of the distributional results in the rest of the chapter will be stated in vector notation. For example, if X = (X1, ..., X), then X'-f(x, 0) will mean that X1, , X, have joint pdff(x1 0) ,
Definition 12.7.2 A joint pdf f(x; θ) is said to have a monotone likelihood ratio (MLR) in the statistic T = t(X) if for any two values of the parameter, θ1 < θ2, the ratio f(x; θ2)/f(x; θ1) depends on x only through the function t(x), and this ratio is a nondecreasing function of t(x).

Notice that the MLR property also will hold for any increasing function of t(x).
Example 12.7.2  Consider a random sample of size n from an exponential distribution, Xᵢ ~ EXP(θ). Because f(x; θ) = (1/θ)ⁿ exp(−Σxᵢ/θ), we have

   f(x; θ₂)/f(x; θ₁) = (θ₁/θ₂)ⁿ exp[(1/θ₁ − 1/θ₂) Σxᵢ]
CHAPTER 12 TESTS OF HYPOTHESES
which is a nondecreasing function of t(x) = Σxᵢ if θ₂ > θ₁. Thus, f(x; θ) has the MLR property in the statistic T = ΣXᵢ. Notice that the MLR property also holds for the statistic X̄, because it is an increasing function of T.
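The monotonicity in Example 12.7.2 is easy to check numerically; the sketch below (sample size and parameter values are arbitrary) evaluates the ratio as a function of t(x) = Σxᵢ.

```python
import math

# Numerical sketch of the MLR property for the exponential model:
# f(x; theta2)/f(x; theta1) depends on x only through t(x) = sum(x)
# and is nondecreasing in sum(x) when theta2 > theta1.
def likelihood_ratio(total, n, theta1, theta2):
    """f(x; theta2)/f(x; theta1) as a function of t(x) = sum(x)."""
    return (theta1 / theta2) ** n * math.exp((1 / theta1 - 1 / theta2) * total)

n, theta1, theta2 = 5, 1.0, 2.0
ratios = [likelihood_ratio(t, n, theta1, theta2) for t in (1.0, 2.0, 5.0, 10.0)]
assert all(a < b for a, b in zip(ratios, ratios[1:]))  # nondecreasing
```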
The MLR property is useful in deriving UMP tests.

Theorem 12.7.1  If a joint pdf f(x; θ) has a monotone likelihood ratio in the statistic T = t(X), then a UMP test of size α for H₀: θ ≤ θ₀ versus Hₐ: θ > θ₀ is to reject H₀ if t(x) ≥ k, where P[t(X) ≥ k | θ₀] = α.
The dual problem of testing H₀: θ ≥ θ₀ versus Hₐ: θ < θ₀ also can be handled by the MLR approach, but the inequalities in Theorem 12.7.1 should be reversed. Also, if the ratio is a nonincreasing function of t(x), then H₀ of the theorem can be rejected with the inequalities in t(x) reversed. In many applications, the terms nondecreasing and nonincreasing can be replaced with the terms increasing and decreasing, respectively, but not in all applications, as the next example demonstrates.
Example 12.7.3  Consider a random sample of size n from a two-parameter exponential distribution, Xᵢ ~ EXP(1, η). The joint pdf is

   f(x; η) = exp[−Σ(xᵢ − η)]  if x₁:ₙ ≥ η,  and 0 otherwise

If η₁ < η₂, then

   f(x; η₂)/f(x; η₁) = exp[n(η₂ − η₁)]  if x₁:ₙ ≥ η₂,  and 0 if η₁ ≤ x₁:ₙ < η₂

That this function is not defined for x₁:ₙ < η₁ is not a problem, because P[X₁:ₙ < η₁] = 0 when η₁ is the true value of η. Thus, the ratio is a nondecreasing function of x₁:ₙ, and the MLR property holds for T = X₁:ₙ. According to Theorem 12.7.1, a UMP test of size α for H₀: η ≤ η₀ versus Hₐ: η > η₀ is to reject H₀ if x₁:ₙ ≥ k, where α = P[X₁:ₙ ≥ k | η₀] = exp[−n(k − η₀)], and thus k = η₀ − (ln α)/n.
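The critical value k = η₀ − (ln α)/n of Example 12.7.3 is easy to compute, and the size of the test can be checked by simulation; the values of n, η₀, and α below are arbitrary.

```python
import math
import random

# Sketch of the UMP test in Example 12.7.3: reject H0: eta <= eta0 when
# the sample minimum x_(1) >= k = eta0 - ln(alpha)/n. The Monte Carlo
# loop checks that the size is close to alpha at eta = eta0.
random.seed(2)
n, eta0, alpha = 10, 5.0, 0.05
k = eta0 - math.log(alpha) / n
reps = 50_000
rejections = sum(
    min(eta0 + random.expovariate(1.0) for _ in range(n)) >= k
    for _ in range(reps)
)
size = rejections / reps
print(round(size, 3))  # should be near 0.05
```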
Theorem 12.7.2  Suppose that X₁, ..., Xₙ have joint pdf of the form

   f(x; θ) = c(θ)h(x) exp[q(θ)t(x)]                              (12.7.5)

where q(θ) is an increasing function of θ.

1. A UMP test of size α for H₀: θ ≤ θ₀ versus Hₐ: θ > θ₀ is to reject H₀ if t(x) ≥ k, where P[t(X) ≥ k | θ₀] = α.
12.7 UNIFORMLY MOST POWERFUL TESTS
2. A UMP test of size α for H₀: θ ≥ θ₀ versus Hₐ: θ < θ₀ is to reject H₀ if t(x) ≤ k, where P[t(X) ≤ k | θ₀] = α.
Proof  If θ₁ < θ₂, then q(θ₁) < q(θ₂), so that

   f(x; θ₂)/f(x; θ₁) = [c(θ₂)/c(θ₁)] exp{[q(θ₂) − q(θ₁)]t(x)}

which is an increasing function of t(x) because q(θ₂) − q(θ₁) > 0. The theorem follows by the MLR property.
An obvious application of the theorem occurs when X₁, ..., Xₙ is a random sample from a member of the regular exponential class, say f(x; θ) = c(θ)h(x) exp[q(θ)u(x)], with t(x) = Σu(xᵢ) and q(θ) an increasing function of θ.

Example 12.7.4  Consider a random sample of size n from a Poisson distribution, Xᵢ ~ POI(μ). The joint pdf is
   f(x; μ) = [e^{−nμ}/(x₁! ··· xₙ!)] exp[(ln μ) Σxᵢ],   all xᵢ = 0, 1, ...

The theorem applies with q(μ) = ln μ and t(x) = Σxᵢ. A UMP test of size α for H₀: μ ≤ μ₀ versus Hₐ: μ > μ₀ would reject H₀ if T = ΣXᵢ ≥ k, where P[T ≥ k | μ₀] = α. Because T ~ POI(nμ₀) under μ = μ₀, we must have

   Σ_{t≥k} e^{−nμ₀}(nμ₀)ᵗ/t! = α

We again have a discreteness problem, but the tests described in Theorem 12.7.2 are UMP for the particular values of α that can be achieved.
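The discreteness problem can be illustrated by computing the smallest k whose attainable size does not exceed a nominal α; the values n = 10, μ₀ = 0.5, α = 0.05 below are arbitrary.

```python
import math

# Sketch: smallest k with P[T >= k | mu0] <= alpha for T ~ POI(n*mu0),
# illustrating the discreteness of the achievable sizes in Example 12.7.4.
def poisson_upper_tail(k, mean):
    """P[T >= k] for T ~ POI(mean)."""
    return 1.0 - sum(math.exp(-mean) * mean ** t / math.factorial(t)
                     for t in range(k))

n, mu0, alpha = 10, 0.5, 0.05
mean = n * mu0
k = 0
while poisson_upper_tail(k, mean) > alpha:
    k += 1
print(k, round(poisson_upper_tail(k, mean), 4))  # achievable size <= 0.05
```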
As mentioned earlier, there is a close relationship between tests and confidence intervals. If one tests H₀: θ = θ₀ against Hₐ: θ ≠ θ₀ at the α significance level, then for a given sample, the set of θ₀ for which H₀ would not be rejected represents a 100(1 − α)% confidence region for θ. Loosely speaking, if the acceptance region of a size α test is an interval, then it is a 100(1 − α)% confidence interval. Thus, one approach to finding a confidence interval is first to derive an associated test by one of the techniques discussed earlier. Goodness properties of confidence intervals usually are expressed in terms of the associated test. For example, a confidence region associated with a UMP test is termed uniformly most accurate (UMA).
UNBIASED TESTS
It was mentioned earlier that in some cases where a UMP test may not exist, particularly for a two-sided alternative, there may exist a UMP test among the restricted class of "unbiased" tests.

Definition 12.7.3  A test of H₀: θ ∈ Ω₀ versus Hₐ: θ ∈ Ω − Ω₀ is unbiased if

   min_{θ∈Ω−Ω₀} π(θ) ≥ max_{θ∈Ω₀} π(θ)                          (12.7.6)

In other words, the probability of rejecting H₀ when it is false is at least as large as the probability of rejecting H₀ when it is true.
Example 12.7.5  Consider a random sample of size n from a normal distribution with mean zero, Xᵢ ~ N(0, σ²). It is desired to test H₀: σ² = σ₀² versus a two-sided alternative, Hₐ: σ² ≠ σ₀², based on the test statistic S₀ = ΣXᵢ²/σ₀². Under H₀ we know that S₀ ~ χ²(n), so an equal-tailed critical region, similar to that of part 3 of Theorem 12.3.3, would reject H₀ if s₀ ≥ χ²_{1−α/2}(n) or s₀ ≤ χ²_{α/2}(n). In particular, consider a sample of size n = 2 and a test of size α = 0.05 for H₀: σ² = 1. The graph of the power function for this test is given in Figure 12.5.

FIGURE 12.5  The power function of a biased test.
The minimum value of π(σ²) occurs at a value σ² < σ₀², and thus the test is less likely to reject H₀ for some values σ² ≠ σ₀² than it is when σ² = σ₀². Consequently, the test is not unbiased. It is possible to construct an unbiased two-sided test if we abandon the convenient equal-tailed test in favor of one with a particular
choice of critical values χ²_{α₁}(n) and χ²_{1−α₂}(n) with α₁ + α₂ = α, but α₁ ≠ α₂ (see Exercise 26). Such a test is not very convenient to use, and the biased equal-tailed test usually is preferred in practice.
It can be shown that the test described above is a UMP test among the restricted class of unbiased tests of H₀. In fact, it can be shown, under certain conditions, that for joint pdf's of the form given in equation (12.7.5), a uniformly most powerful unbiased (UMPU) test of H₀: θ = θ₀ versus Hₐ: θ ≠ θ₀ exists. Methods for deriving UMPU tests are given by Lehmann (1959), but they will not be discussed here.
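For the case n = 2 and σ₀² = 1 in the example above, the power function can be written in closed form, because χ²(2) is an exponential distribution with mean 2; the sketch below confirms that the equal-tailed test has size 0.05 at σ² = 1 but power below 0.05 for some σ² < 1.

```python
import math

# Power of the equal-tailed chi-square test with n = 2, sigma0^2 = 1:
# S0 = sum(X_i^2)/sigma0^2 and S0 * sigma0^2/sigma^2 ~ chi2(2) = EXP(2),
# so all probabilities are exponential tail areas.
c1 = -2 * math.log(1 - 0.025)   # chi2_{0.025}(2), about 0.0506
c2 = -2 * math.log(0.025)       # chi2_{0.975}(2), about 7.378

def power(sigma_sq):
    """pi(sigma^2) = P[S0 <= c1] + P[S0 >= c2]."""
    lo = 1 - math.exp(-c1 / (2 * sigma_sq))
    hi = math.exp(-c2 / (2 * sigma_sq))
    return lo + hi

assert abs(power(1.0) - 0.05) < 1e-12   # size alpha at sigma^2 = 1
assert power(0.7) < 0.05                # biased: power dips below alpha
```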
12.8 GENERALIZED LIKELIHOOD RATIO TESTS

The Neyman-Pearson lemma provides a method for deriving a most powerful test of simple hypotheses, and quite often this test also will be UMP for a one-sided composite alternative. Two theorems that were stated in the previous section also are useful in deriving UMP tests when there is a single unknown parameter.
Methods for deriving tests also are needed when unknown nuisance parameters are present, or in other situations where the methods for determining a UMP test do not appear applicable. For example, in a two-sample normal problem, one may test the hypothesis that the means are equal, H₀: μ₁ = μ₂, without specifying them to be equal to a particular value. We discussed several natural tests in the first five sections, which were mostly suggested by analogous confidence-interval results. Of course, some of these tests also are UMP tests, but it is clear more generally that a test can be based on any statistic for which a critical region of the desired size can be determined. The problem then is to choose a good test statistic and a reasonable form for the critical region to obtain a test with high power. If the distribution depends on a single unknown parameter, then a single sufficient statistic or an MLE may be available, and a test could be based on this statistic. For a multiparameter problem, a test statistic might be some function of joint sufficient statistics or joint MLEs, but it may not always be clear what test statistic would be most suitable. Of course, the distribution of the statistic must be such that the size of the critical region can be computed and does not depend on unknown parameters. For example, Student's t statistic may be used to test the mean of a normal distribution when the variance is unknown. Given a suitable test statistic, then, as with the Neyman-Pearson lemma, sample points should be included in the critical region that are less likely to occur under H₀ and more likely to occur under Hₐ.
The generalized likelihood ratio test is a generalization of the Neyman-Pearson test, and it provides a desirable test in many applications.
Definition 12.8.1  Let X = (X₁, ..., Xₙ), where X₁, ..., Xₙ have joint pdf f(x; θ) for θ ∈ Ω, and consider the hypotheses H₀: θ ∈ Ω₀ versus Hₐ: θ ∈ Ω − Ω₀. The generalized likelihood ratio (GLR) is defined by

   λ(x) = [max_{θ∈Ω₀} f(x; θ)] / [max_{θ∈Ω} f(x; θ)] = f(x; θ̂₀)/f(x; θ̂)        (12.8.1)

where θ̂ denotes the usual MLE of θ, and θ̂₀ denotes the MLE under the restriction that H₀ is true.

In other words, θ̂ and θ̂₀ are obtained by maximizing f(x; θ) over Ω and Ω₀, respectively. The generalized likelihood ratio test is to reject H₀ if λ(x) ≤ k, where k is chosen to provide a size α test.
Another, slightly different approach is to maximize over θ ∈ Ω − Ω₀ in the denominator, but the form in Definition 12.8.1 often is easier to evaluate and yields equivalent results.

Essentially, the GLR principle determines the critical region, and associated test, by deciding which points will be included according to the ratio of estimated likelihoods of the observed data, where the numerator is estimated under the restriction that H₀ is true. This is similar to the Neyman-Pearson principle, where the likelihoods are completely specified, but it is not, strictly speaking, a complete generalization of the Neyman-Pearson principle, because the unrestricted estimate, θ̂, could possibly be in Ω₀.
We see that λ(X) is a valid test statistic that is not a function of unknown parameters; in many cases the distribution of λ(X) is free of parameters, and the exact critical value k can be determined. In some cases the distribution of λ(X) under H₀ depends on unknown parameters, and an exact size α critical region cannot be determined. If regularity conditions hold, which ensure that the MLEs are asymptotically normally distributed, then it can be shown that the asymptotic distribution of λ(X) is free of parameters, and an approximate size α test will be available for large n. In particular, if X ~ f(x; θ₁, ..., θₖ), then under H₀: (θ₁, ..., θᵣ) = (θ₁₀, ..., θᵣ₀), r ≤ k, approximately, for large n,

   −2 ln λ(X) ~ χ²(r)                                            (12.8.2)

Thus an approximate size α test is to reject H₀ if

   −2 ln λ(x) ≥ χ²_{1−α}(r)                                      (12.8.3)
Example 12.8.1  Suppose that Xᵢ ~ N(μ, σ²), where σ² is known, and we wish to test H₀: μ = μ₀ against Hₐ: μ ≠ μ₀. The usual (unrestricted) MLE is μ̂ = x̄, and the GLR is

   λ(x) = f(x; μ₀)/f(x; μ̂)
        = (2πσ²)^{−n/2} exp[−Σ(xᵢ − μ₀)²/2σ²] / {(2πσ²)^{−n/2} exp[−Σ(xᵢ − x̄)²/2σ²]}

which, after some simplification, can be expressed as

   λ(x) = exp[−n(x̄ − μ₀)²/2σ²]

Rejecting H₀ if λ(x) ≤ k is equivalent to rejecting H₀ if

   z² = [√n(x̄ − μ₀)/σ]² ≥ −2 ln k = k₁

where Z ~ N(0, 1) and Z² ~ χ²(1) under H₀. Thus, a size α test is to reject H₀ if

   z² ≥ χ²_{1−α}(1)

or equivalently, reject H₀ if

   z ≤ −z_{1−α/2}   or   z ≥ z_{1−α/2}

Thus the likelihood ratio test may be reduced to the usual two-sided equal-tailed normal test. It is interesting to note that the asymptotic approximation, −2 ln λ(X) ~ χ²(1), is exact in this example.
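The exactness of the chi-square relation in this example can be verified numerically; the data values below are made up.

```python
import math

# Numerical check of Example 12.8.1: for the normal mean with known
# variance, -2 ln lambda(x) equals z^2 exactly.
x = [4.2, 5.1, 3.8, 6.0, 5.5]
mu0, sigma = 4.0, 1.0
n = len(x)
xbar = sum(x) / n
lam = math.exp(-n * (xbar - mu0) ** 2 / (2 * sigma ** 2))  # GLR statistic
z = math.sqrt(n) * (xbar - mu0) / sigma
assert abs(-2 * math.log(lam) - z ** 2) < 1e-12
```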
Example 12.8.2  We now consider the hypotheses H₀: μ = μ₀ against Hₐ: μ > μ₀. For practical purposes, the GLR test in this case reduces to the one-sided UMP test based on z; however, there is one technical difference. In this case, the MLE relative to Ω = [μ₀, ∞) is

   μ̂ = x̄  if x̄ ≥ μ₀,   and   μ̂ = μ₀  if x̄ < μ₀

Because the size of the test, α, usually is quite small, and we will be rejecting H₀ for large x̄, ordinarily we will be concerned only with determining a critical region for the GLR statistic for the case when x̄ > μ₀. Specifically, we have

   λ(x) = exp[−n(x̄ − μ₀)²/2σ²]  if x̄ > μ₀,   and   λ(x) = 1  if x̄ ≤ μ₀

but under H₀, P[λ(X) < 1] = P[X̄ > μ₀] = 0.5. So, for α < 0.5, k < 1, and the critical region will not contain any x such that λ(x) = 1. In particular, the
GLR test is to reject H₀ if x̄ > μ₀ and λ(x) ≤ k; or x̄ > μ₀ and

   z² = [√n(x̄ − μ₀)/σ]² ≥ k₁

When x̄ > μ₀, z² ≥ k₁ if and only if z ≥ √k₁. Thus, a size α test (for α < 0.5) is to reject H₀ if z ≥ z_{1−α}, which also is the UMP test for these hypotheses.

The GLR test of H₀: μ ≤ μ₀ against Hₐ: μ > μ₀ is somewhat similar. In this case, μ̂ = x̄, but maximizing the likelihood function over Ω₀ = (−∞, μ₀] gives

   μ̂₀ = x̄  if x̄ ≤ μ₀,   and   μ̂₀ = μ₀  if x̄ > μ₀

Thus,

   λ(x) = f(x; μ̂₀)/f(x; μ̂) = exp[−n(x̄ − μ₀)²/2σ²]  if x̄ > μ₀,   and   λ(x) = 1  if x̄ ≤ μ₀

This is the same result as obtained for testing the simple H₀: μ = μ₀ against Hₐ: μ > μ₀. The same critical region also gives a size α test for the composite null hypothesis H₀: μ ≤ μ₀, because, with z = (x̄ − μ₀)/(σ/√n), P[Z ≥ z_{1−α} | μ₀] = α, and

   P[Z ≥ z_{1−α} | μ] ≤ α   for all μ ≤ μ₀
It sometimes is desired to test a hypothesis about one unknown parameter in the presence of another unknown nuisance parameter.
Example 12.8.3  Suppose now that Xᵢ ~ N(μ, σ²), where σ² is assumed unknown, and we wish to test H₀: μ = μ₀ against Hₐ: μ ≠ μ₀. This does not represent a simple null hypothesis, because the distribution is not specified completely under H₀. The parameter space is two-dimensional, Ω = (−∞, ∞) × (0, ∞) = {(μ, σ²) | −∞ < μ < ∞, σ² > 0}, and the restricted space is Ω₀ = {μ₀} × (0, ∞) = {(μ₀, σ²) | σ² > 0}. These sets are illustrated in Figure 12.6.

FIGURE 12.6  A subset Ω₀ of hypothesized values μ = μ₀
Maximizing f(x; μ, σ²) over Ω yields the usual MLEs μ̂ = x̄ and σ̂² = Σ(xᵢ − x̄)²/n, but over Ω₀ we obtain μ̂₀ = μ₀ and σ̂₀² = Σ(xᵢ − μ₀)²/n. Thus,

   λ(x) = f(x; μ̂₀, σ̂₀²)/f(x; μ̂, σ̂²)
        = (2πσ̂₀²)^{−n/2} exp[−Σ(xᵢ − μ₀)²/2σ̂₀²] / {(2πσ̂²)^{−n/2} exp[−Σ(xᵢ − x̄)²/2σ̂²]}
        = [σ̂²/σ̂₀²]^{n/2}

and consequently,

   [λ(x)]^{−2/n} = Σ(xᵢ − μ₀)²/Σ(xᵢ − x̄)²
                 = [Σ(xᵢ − x̄)² + n(x̄ − μ₀)²]/Σ(xᵢ − x̄)²
                 = 1 + t²(x)/(n − 1)

where

   t(x) = √n(x̄ − μ₀)/s,   s² = Σ(xᵢ − x̄)²/(n − 1)

Under H₀, T = t(X) ~ t(n − 1) and T² ~ F(1, n − 1). Thus, rejecting H₀ when λ(x) is small is equivalent to rejecting H₀ when t² is large, and a size α test is to reject H₀ if t² ≥ f_{1−α}(1, n − 1), or alternatively if

   t ≤ −t_{1−α/2}(n − 1)   or   t ≥ t_{1−α/2}(n − 1)

Thus the two-sided test proposed earlier based on Student's t pivotal quantity is now seen to be a GLR test. This two-sided test is not UMP, but it can be shown to be UMPU. The GLR approach also can be used to derive tests for two-sample problems.
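The algebraic identity [λ(x)]^{−2/n} = 1 + t²(x)/(n − 1) can be verified numerically; the data values below are made up.

```python
import math

# Numerical check of Example 12.8.3: the GLR for the unknown-variance
# case satisfies lambda^(-2/n) = 1 + t^2/(n - 1).
x = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9]
mu0 = 4.0
n = len(x)
xbar = sum(x) / n
ss_xbar = sum((xi - xbar) ** 2 for xi in x)   # sum of squares about xbar
ss_mu0 = sum((xi - mu0) ** 2 for xi in x)     # sum of squares about mu0
lam = (ss_xbar / ss_mu0) ** (n / 2)           # GLR statistic
s = math.sqrt(ss_xbar / (n - 1))
t = math.sqrt(n) * (xbar - mu0) / s
assert abs(lam ** (-2 / n) - (1 + t ** 2 / (n - 1))) < 1e-9
```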
Example 12.8.4  Suppose that X ~ BIN(n₁, p₁) and Y ~ BIN(n₂, p₂), with X and Y independent, and we wish to test whether the proportions are equal, H₀: p₁ = p₂ = p against Hₐ: p₁ ≠ p₂, where p is unknown. The parameter space is Ω = (0, 1) × (0, 1) = {(p₁, p₂) | 0 < p₁ < 1 and 0 < p₂ < 1}, and the subset Ω₀ is illustrated in Figure 12.7.
FIGURE 12.7  A subset Ω₀ of hypothesized values p₁ = p₂ = p
Based on x and y, the MLEs are p̂₁ = x/n₁ and p̂₂ = y/n₂ over Ω, and p̂₀ = (x + y)/(n₁ + n₂) over Ω₀. The GLR statistic is

   λ(x, y) = f(x; p̂₀)f(y; p̂₀) / [f(x; p̂₁)f(y; p̂₂)]
           = p̂₀^{x+y}(1 − p̂₀)^{n₁+n₂−x−y} / [p̂₁ˣ(1 − p̂₁)^{n₁−x} p̂₂ʸ(1 − p̂₂)^{n₂−y}]

Except for the cancellation of the binomial coefficients, this particular GLR statistic does not appear to simplify greatly, but it can be computed easily. The distribution of λ(X, Y) will depend on p under H₀, however, so an exact size α critical region cannot be determined. The chi-square approximation should be useful here for large sample sizes. The hypothesis H₀ represents a one-dimensional restriction in the parameter space, because it corresponds to H₀: θ = p₂ − p₁ = 0; so r = 1 in expression (12.8.2), and approximately, for a size α test,

   −2 ln λ(X, Y) ~ χ²(1)

and H₀ is rejected if −2 ln λ(x, y) ≥ χ²_{1−α}(1). The more common approach in this case is to use the approximate normal test statistic given by expression (12.4.2). It also is interesting to note that an exact conditional test can be constructed in this case, as discussed in the next section.
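A sketch of the computation of −2 ln λ(x, y) for made-up counts follows; the binomial coefficients cancel in the ratio, so they are omitted from the log-likelihoods.

```python
import math

# Sketch of the GLR statistic in Example 12.8.4 (made-up counts).
def log_lik(k, n, p):
    """Binomial log-likelihood without the constant binomial coefficient."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

x, n1 = 30, 100
y, n2 = 45, 100
p1, p2 = x / n1, y / n2
p0 = (x + y) / (n1 + n2)
stat = -2 * (log_lik(x, n1, p0) + log_lik(y, n2, p0)
             - log_lik(x, n1, p1) - log_lik(y, n2, p2))
# Compare stat with chi2_{0.95}(1) = 3.841 for an approximate 0.05-level test.
print(round(stat, 3))
```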
k-SAMPLE TESTS

The GLR approach also is helpful in deriving important tests involving samples from two or more normal distributions. Suppose that independent random samples are obtained from k normal distributions. For each j = 1, ..., k, we denote the mean and variance of the jth distribution by μⱼ and σⱼ², respectively, and we denote by nⱼ the sample size. Also, we denote by xᵢⱼ the observed value of the ith random variable from the jth random sample, Xᵢⱼ ~ N(μⱼ, σⱼ²). We will denote the individual jth sample means and sample variances, respectively, by x̄ⱼ = Σᵢ xᵢⱼ/nⱼ and sⱼ² = Σᵢ (xᵢⱼ − x̄ⱼ)²/(nⱼ − 1). It also will be convenient to adopt a notation for the mean of the pooled samples, namely x̄ = Σⱼ Σᵢ xᵢⱼ/N = Σⱼ nⱼx̄ⱼ/N, with N = n₁ + ··· + nₖ.
+flk. A test for the equality of distribution means now will be derived assuming a Lommon but unknown variance, say = c2 Specifically, we will derive the GLR test of H0 : p1 = = p, versus the alternative that p í for at least one pair i j. The likelihood is
L=L(ul,...,p,a2)=I11
i
exp
=
(x-
i -j
p
which yields the following log-likelihood: In L
=-
in
(2it2) -
(x -
-
= {,
a > O}, < Relative to the parameter space p, we compute the MLEs by taking partials of In L with respect to each of the k + i parameters and equating them to zero
lnL= 2aY2(x )(l)=O J=1 i 11nL=Nl2a22(2)2E()(l)
,
k
Solving these equations simultaneously we obtain the MLEs N
For the subspace CL under H0, the ¡'s have a common but unknown value, say p, and the MLEs are obtained by equating to zero the partials of in L with
respect to μ and σ², leading to the solutions

   μ̂₀ = x̄,   σ̂₀² = Σⱼ Σᵢ (xᵢⱼ − x̄)²/N

The GLR statistic is then

   λ(x) = f(x; μ̂₀, ..., μ̂₀, σ̂₀²)/f(x; μ̂₁, ..., μ̂ₖ, σ̂²)
        = (2πσ̂₀²)^{−N/2} exp(−N/2) / [(2πσ̂²)^{−N/2} exp(−N/2)]
        = (σ̂₀²/σ̂²)^{−N/2}
        = [Σⱼ Σᵢ (xᵢⱼ − x̄ⱼ)² / Σⱼ Σᵢ (xᵢⱼ − x̄)²]^{N/2}         (12.8.4)
-
It is possible to express this test in terms of an F-distributed statistic. To show this we first note the following identity (see Exercise 40):
+ n()2
(12.8.5)
From this and equation (12.8.4), it follows that A(x) =
which means that the GLR test is equivalent to rejecting H0 if for some c> O
(x -
Next we define variables V1 = V3
=
(12.8.6)
C
(x1 - .)2/cr2,
(x -
)2/cr2 and
n( - )2/cr2, and note that we can write V2 = (n 1)s/a2 Thus, V2
is a function of the sample variances and V₃ is a function of the sample means. Because the x̄ⱼ's and sⱼ²'s are independent, it follows that V₂ and V₃ also are independent. Furthermore, from identity (12.8.5) we know that V₁ = V₂ + V₃, so from the results of Chapter 8 about sums of independent chi-square variables, we have under H₀ that V₁ ~ χ²(N − 1) and V₂ ~ χ²(N − k), from which it follows that V₃ ~ χ²[(N − 1) − (N − k)] = χ²(k − 1). Thus, the ratio on the left of equation (12.8.6) is proportional to an F-distributed statistic, namely

   F = [Σⱼ nⱼ(x̄ⱼ − x̄)²/(k − 1)] / [Σⱼ Σᵢ (xᵢⱼ − x̄ⱼ)²/(N − k)] ~ F(k − 1, N − k)        (12.8.7)

Finally, we note that the denominator of equation (12.8.7) is a pooled estimate of σ², namely s_p² = Σⱼ (nⱼ − 1)sⱼ²/(N − k).
The above remarks are summarized in the following theorem.

Theorem 12.8.1  For k normal populations with common variance, Xᵢⱼ ~ N(μⱼ, σ²), j = 1, ..., k, a size α test of H₀: μ₁ = ··· = μₖ is to reject H₀ if

   [Σⱼ nⱼ(x̄ⱼ − x̄)²/(k − 1)] / s_p² ≥ f_{1−α}(k − 1, N − k)

where s_p² = Σⱼ (nⱼ − 1)sⱼ²/(N − k) and N = Σⱼ nⱼ. Furthermore, this test is equivalent to the GLR test.

An important application of this theorem involves testing the effects of k different experimental treatments. For example, suppose it is desired to test whether k different brands of plant food are equally effective in promoting growth in garden plants. If μⱼ is the mean growth per plant using brand j, then the test of Theorem 12.8.1 would be appropriate for testing whether the different brands of plant food are equivalent in this respect.

This test also is related to a procedure called analysis of variance. This terminology is motivated by the identity (12.8.5). The term on the left reflects total variability of the pooled sample data. On the other hand, the first term on the right reflects variation "within" the individual samples, while the second term reflects variation "between" the samples. Strictly speaking, this corresponds to a one-way analysis of variance, because it considers only one factor, such as the brand of plant food. It also is possible to consider a second factor in the experiment, such as the amount of water applied to the plants. The appropriate procedure in this case is called a two-way analysis of variance, but we will not pursue this point.
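The F statistic of Theorem 12.8.1, and the decomposition (12.8.5) that underlies the analysis-of-variance terminology, can be sketched as follows for three made-up samples.

```python
# One-way ANOVA F statistic of Theorem 12.8.1 for made-up samples;
# compare F with f_{1-alpha}(k - 1, N - k).
samples = [
    [23.1, 24.5, 22.8, 25.0],
    [26.2, 27.1, 25.5],
    [22.0, 21.5, 23.3, 22.7, 21.9],
]
k = len(samples)
N = sum(len(s) for s in samples)
grand = sum(sum(s) for s in samples) / N
means = [sum(s) / len(s) for s in samples]
between = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
F = (between / (k - 1)) / (within / (N - k))
# Check the decomposition (12.8.5): total = within + between.
total = sum((x - grand) ** 2 for s in samples for x in s)
assert abs(total - (within + between)) < 1e-9
print(round(F, 2))
```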
The GLR approach also can be used to derive a test of equality of variances, H₀: σ₁² = ··· = σₖ². We will not present this derivation, but an equivalent test
and an approximate distribution that is useful in performing such a test are given in the following theorems.
Theorem 12.8.2  Let

   M(νⱼ, wⱼ) = (Σⱼ νⱼ) ln(Σⱼ νⱼwⱼ / Σⱼ νⱼ) − Σⱼ νⱼ ln wⱼ

and

   c = 1 + [Σⱼ (1/νⱼ) − 1/Σⱼ νⱼ] / [3(k − 1)]

Then approximately M/c ~ χ²(k − 1) if νⱼ ≥ 4. For smaller νⱼ, critical values may be determined from Tables 31 and 32 of Pearson and Hartley (1958).

Now for normal samples, (nⱼ − 1)sⱼ²/σⱼ² ~ χ²(nⱼ − 1), and if one lets wⱼ = sⱼ², then the σⱼ² will cancel out if they are all equal; thus M(nⱼ − 1, sⱼ²) may be used as a test statistic for testing H₀: σ₁² = ··· = σₖ², and its distribution may be expressed approximately in terms of the chi-square distribution under H₀. This statistic is minimized when all of the observed sⱼ² are equal, and the statistic becomes larger for unequal sⱼ², which favors the alternative.
Theorem 12.8.3  For k normal populations Xᵢⱼ ~ N(μⱼ, σⱼ²), j = 1, ..., k, an approximate size α test of H₀: σ₁² = ··· = σₖ² against the alternative of at least one inequality is to reject H₀ if

   M(nⱼ − 1, sⱼ²) > c χ²_{1−α}(k − 1)
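A sketch of the statistic of Theorems 12.8.2 and 12.8.3 for made-up sample variances follows.

```python
import math

# Bartlett-type statistic M(n_j - 1, s_j^2) and correction factor c of
# Theorems 12.8.2-12.8.3, for made-up degrees of freedom and variances.
nu = [9, 11, 14]            # degrees of freedom n_j - 1
s2 = [2.1, 2.8, 1.7]        # sample variances s_j^2
k = len(nu)
nu_tot = sum(nu)
pooled = sum(v * w for v, w in zip(nu, s2)) / nu_tot
M = nu_tot * math.log(pooled) - sum(v * math.log(w) for v, w in zip(nu, s2))
c = 1 + (sum(1 / v for v in nu) - 1 / nu_tot) / (3 * (k - 1))
# Compare M/c with chi2_{1-alpha}(k - 1), e.g. 5.991 for alpha = 0.05, k = 3.
print(round(M / c, 3))
```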
12.9 CONDITIONAL TESTS

It sometimes is possible to eliminate unknown nuisance parameters and obtain exact size α tests by considering tests based on conditional variables. For example, if a sufficient statistic S exists for an unknown nuisance parameter θ, then the distribution of X | S will not depend on θ. This technique will be illustrated for the two-sample binomial test.
Example 12.9.1  Again let X ~ BIN(n₁, p₁) and Y ~ BIN(n₂, p₂), with X and Y independent. We wish a size α test of H₀: p₁ = p₂ = p against Hₐ: p₁ < p₂. Under H₀, the joint pdf is

   f(x, y) = C(n₁, x)C(n₂, y) p^{x+y}(1 − p)^{n₁+n₂−(x+y)}

where C(n, x) denotes the binomial coefficient, and it is clear that S = X + Y is sufficient for the common unknown p in this density. This suggests considering a test based on the conditional distribution of (X, Y) given S = s. Because Y = S − X, it suffices to base the test on the conditional distribution of Y given S = s. Under H₀, S ~ BIN(n₁ + n₂, p), and thus

   f_{Y|s}(y) = f_{S,Y}(s, y)/f_S(s) = C(n₁, s − y)C(n₂, y) / C(n₁ + n₂, s)

which is a hypergeometric distribution. This distribution does not involve p, and an exact size α critical region can be determined under H₀ for any given observed value of s. For Hₐ: p₁ < p₂, large values of y support the alternative, so for a size α test, reject H₀ if y ≥ c, where c is chosen so that

   Σ_{y≥c} C(n₁, s − y)C(n₂, y) / C(n₁ + n₂, s) = α

Tests for other alternatives can be obtained in a similar manner. Except for the discreteness problem, this provides an exact size α test. In other words, it is exact for values of α that the above sum can attain. Otherwise, the test is conservative for the prescribed α.
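The conditional test of Example 12.9.1 is easy to carry out with exact hypergeometric tail probabilities; the counts below are made up.

```python
from math import comb

# Sketch of the conditional (Fisher exact) test of Example 12.9.1:
# given S = X + Y = s, Y is hypergeometric, free of the nuisance p.
def upper_tail(y, s, n1, n2):
    """P[Y >= y | S = s] under H0."""
    hi = min(n2, s)
    total = sum(comb(n1, s - j) * comb(n2, j) for j in range(y, hi + 1))
    return total / comb(n1 + n2, s)

n1, n2 = 10, 10
x, y = 2, 7          # made-up counts; s = 9
s = x + y
p_value = upper_tail(y, s, n1, n2)
print(round(p_value, 4))  # reject H0 in favor of p1 < p2 if this is small
```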
The following theorem is useful for constructing tests for hypotheses concerning a parameter θ in the presence of nuisance parameters, κ = (κ₁, ..., κₘ).
Theorem 12.9.1  Let X = (X₁, ..., Xₙ), where X₁, ..., Xₙ have joint pdf of the form

   f(x; θ, κ) = c(θ, κ)h(x) exp[θ t(x) + Σᵢ κᵢsᵢ(x)]              (12.9.1)

If Sᵢ = sᵢ(X) for i = 1, ..., m, and T = t(X), then S₁, ..., Sₘ are jointly sufficient for κ₁, ..., κₘ for each fixed θ, and the conditional pdf f_{T|s}(t; θ) does not depend on κ. Furthermore,

1. A size α test of H₀: θ ≤ θ₀ versus Hₐ: θ > θ₀ is to reject H₀ if t(x) ≥ k(s), where P[T ≥ k(s) | s] = α when θ = θ₀.
2. A size α test of H₀: θ ≥ θ₀ versus Hₐ: θ < θ₀ is to reject H₀ if t(x) ≤ k(s), where P[T ≤ k(s) | s] = α when θ = θ₀.

Under certain regularity conditions on equation (12.9.1), it also is possible to show that these tests are UMP unbiased tests. For more details, see Lehmann (1959).
Example 12.9.2  Consider a random sample from a gamma distribution, Xᵢ ~ GAM(λ, κ). If we reparameterize with θ = 1/λ, then the joint pdf is

   f(x; θ, κ) = [θ^{nκ}/Γⁿ(κ)] (Πxᵢ)^{κ−1} exp(−θ Σxᵢ)
              = c(θ, κ)h(x) exp[θ(−Σxᵢ) + (κ − 1) ln(Πxᵢ)]

where h(x) = 1 if all xᵢ > 0, and 0 otherwise. According to the theorem, S = ln(ΠXᵢ) is sufficient for κ for any fixed θ, and if T = −ΣXᵢ, then the distribution of T given S = s does not depend on κ. The conditional pdf f_{T|s}(t) is quite complicated in this example, but tables that can be used to perform an equivalent test are given by Engelhardt and Bain (1977). Note that a conditional size α test also is a size α test unconditionally, because, for example, if P[T ≥ k(s) | s] = α, then

   P[T ≥ k(S)] = E_S{P[T ≥ k(S) | S]} = E_S(α) = α
12.10 SEQUENTIAL TESTS

We found earlier that for a fixed sample size n the Neyman-Pearson approach could be used to construct most powerful size α tests for simple hypotheses. Also, in some cases, formulas are available for computing the sample size n that yields
a size α test with a specified power, or equivalently, a specified level of Type II error, say β. In this section we consider a sequential test procedure in which the sample size is not fixed in advance.
Example 12.10.1  The force (in pounds) that will cause a certain type of ceramic automobile part to break is normally distributed with mean μ and standard deviation σ = 40. The original factory parts are known to have a mean breaking strength of 100 pounds. A manufacturer of replacement parts claims that its parts are better than the originals, and that the mean breaking strength of the replacement parts is 120 pounds. To demonstrate its claim, a test is proposed in which the breaking strength is measured for a fixed number n of the replacement parts, yielding data x₁, ..., xₙ. Based on the resulting data, the manufacturer decides to perform a size α = 0.05 test of H₀: μ ≤ 100 versus Hₐ: μ > 100.

We first consider the question of determining a fixed sample size based on methods discussed earlier in the chapter. It follows from Theorem 12.7.1 that a UMP test of size α for H₀: μ ≤ μ₀ versus Hₐ: μ > μ₀ rejects H₀ if x̄ ≥ c for an appropriate choice of c. Furthermore, we have from Theorem 12.3.1 that this is equivalent to rejecting H₀ if z₀ = √n(x̄ − μ₀)/σ ≥ z_{1−α}, and that the sample size required to achieve a test with a specified β = P[Type II error] is given by n = (z_{1−α} + z_{1−β})²σ²/(μ₀ − μ₁)². Thus, to have a size α = 0.05 test of the replacement parts with β = 0.10 when μ = 120, a sample of size n = (1.645 + 1.282)²(40)²/(100 − 120)² ≈ 34 is required. Obviously, the cost of such a project will depend on the number of parts tested, which might lead one to seek a procedure that requires that fewer parts be tested. One possibility is to consider a sequential test. In other words, it might be possible to devise a procedure such that testing of the first few parts would produce sufficient evidence to accept or reject H₀ without the need for further testing.
SEQUENTIAL PROBABILITY RATIO TESTS

Consider the situation of testing a simple null hypothesis H₀: θ = θ₀ against a simple alternative hypothesis Hₐ: θ = θ₁. If x₁, ..., xₙ is a random sample of size n from a distribution with pdf f(x; θ), then we know from the Neyman-Pearson lemma that a most powerful critical region is determined by an inequality of the form

   f(x₁; θ₀) ··· f(xₙ; θ₀) / [f(x₁; θ₁) ··· f(xₙ; θ₁)] ≤ k

where k is a positive constant. A sequential probability ratio test (SPRT) is defined in terms of a sequence of such ratios. Specifically, we define

   λₘ = λₘ(x₁, ..., xₘ) = f(x₁; θ₀) ··· f(xₘ; θ₀) / [f(x₁; θ₁) ··· f(xₘ; θ₁)]
for m = 1, 2, ..., and adopt the following procedure: Let k₀ < k₁. If λₘ ≤ k₀, then reject H₀; if λₘ ≥ k₁, then accept H₀; and if k₀ < λₘ < k₁, then continue sampling by taking another observation. In other words, the procedure is to continue sampling as long as λₘ remains between k₀ and k₁, and to stop as soon as either λₘ ≤ k₀ or λₘ ≥ k₁. Consequently, the critical region for such a test is the union of disjoint sets Cₙ of the form

   Cₙ = {(x₁, ..., xₙ) | k₀ < λⱼ < k₁ for j = 1, ..., n − 1, and λₙ ≤ k₀},   n = 1, 2, ...

In other words, if for some n, a point (x₁, ..., xₙ) is in Cₙ, then H₀ is rejected for a sample of size n. On the other hand, H₀ is accepted if such a point is in an acceptance region, say A, which is the union of disjoint sets Aₙ of the following form:

   Aₙ = {(x₁, ..., xₙ) | k₀ < λⱼ < k₁ for j = 1, ..., n − 1, and λₙ ≥ k₁}
In the case of the Neyman-Pearson test for fixed sample size n, the constant k was determined so that the size of the test would be some prescribed α. Now it is necessary to find constants k₀ and k₁ so that the SPRT will have prescribed values α and β for the respective probabilities of Type I and Type II error,

   α = P[reject H₀ | θ₀] = Σₙ ∫_{Cₙ} Lₙ(θ₀)                      (12.10.1)

and

   β = P[accept H₀ | θ₁] = Σₙ ∫_{Aₙ} Lₙ(θ₁)                      (12.10.2)

where Lₙ(θ) = f(x₁; θ) ··· f(xₙ; θ), and the integral notations are defined as follows:

   ∫_{Cₙ} Lₙ(θ₀) = ∫···∫_{Cₙ} f(x₁; θ₀) ··· f(xₙ; θ₀) dx₁ ··· dxₙ

and

   ∫_{Aₙ} Lₙ(θ₁) = ∫···∫_{Aₙ} f(x₁; θ₁) ··· f(xₙ; θ₁) dx₁ ··· dxₙ

The constants k₀ and k₁ are solutions of the integral equations (12.10.1) and (12.10.2), and, as might be expected, an exact determination of these constants is not trivial. Fortunately, there is a rather simple approximation available that we will consider shortly.
Before we proceed, there are a number of points to consider about SPRTs. In particular, because the sample size depends on the observed values of the sequence of random variables X₁, X₂, ..., it is itself a random variable, say N. As one might suspect, the distribution of N is quite complicated, and we will not derive it. Another concern is the possibility that testing could continue indefinitely. Although we will not attempt a proof, it can be shown that an SPRT will terminate in a finite number of steps. Specifically, it can be shown that N < ∞ with probability 1. Of course, another point that was raised earlier concerns whether the size of the sample can be reduced by using a sequential test rather than a test with a fixed sample size. We will discuss this latter point after we consider the question of approximations for k₀ and k₁.
APPROXIMATE SEQUENTIAL TESTS

Suppose it is required to perform a sequential test with prescribed probabilities of Type I and Type II errors, α and β, respectively. As noted above, the constants k₀ and k₁ can be obtained by solving the integral equations (12.10.1) and (12.10.2), and exact solutions, in general, will be difficult to achieve. Fortunately, it is possible to obtain approximate solutions that are much easier to compute and rather accurate. If α and β are the exact levels desired, then we define the constants

   k₀* = α/(1 − β)   and   k₁* = (1 − α)/β
The following discussion suggests using k₀* and k₁* as approximations for k₀ and k₁. Using the above-stated property that N < ∞ with probability 1, and that λₙ(x₁, ..., xₙ) ≤ k₀ when (x₁, ..., xₙ) is in Cₙ, it follows that

   α = P[reject H₀ | θ₀] = Σₙ ∫_{Cₙ} Lₙ(θ₀)
                         ≤ Σₙ ∫_{Cₙ} k₀Lₙ(θ₁)
                         = k₀ P[reject H₀ | θ₁]
                         = k₀(1 − β)

and hence α/(1 − β) ≤ k₀. Similarly, because λₙ(x₁, ..., xₙ) ≥ k₁ when (x₁, ..., xₙ) is in Aₙ, it follows that

   1 − α = P[accept H₀ | θ₀] = Σₙ ∫_{Aₙ} Lₙ(θ₀)
                             ≥ Σₙ ∫_{Aₙ} k₁Lₙ(θ₁)
                             = k₁ P[accept H₀ | θ₁] = k₁β

and hence k₁ ≤ (1 − α)/β. These results imply the inequality k₀* ≤ k₀ < k₁ ≤ k₁*.
A relationship now will be established between the errors for the exact test and those of the approximate test. Denote by α* and β* the actual error sizes of the approximate SPRT based on using the constants k₀* and k₁*. Also, denote by Cₙ* and Aₙ* the sets that define, respectively, the critical and acceptance regions for the approximate test based on k₀* and k₁*. It follows, by an argument similar to that given above, that

   α* = Σₙ ∫_{Cₙ*} Lₙ(θ₀) ≤ Σₙ ∫_{Cₙ*} k₀*Lₙ(θ₁) = k₀*(1 − β*)

and

   1 − α* = Σₙ ∫_{Aₙ*} Lₙ(θ₀) ≥ Σₙ ∫_{Aₙ*} k₁*Lₙ(θ₁) = k₁*β*

It follows that α*(1 − β) ≤ α(1 − β*) and (1 − α)β* ≤ (1 − α*)β, and consequently that α*(1 − β) + (1 − α)β* ≤ α(1 − β*) + (1 − α*)β, which, after simplification, yields the inequality

   α* + β* ≤ α + β

Thus, if the experimenter uses the approximate SPRT based on the constants k₀* = α/(1 − β) and k₁* = (1 − α)/β rather than the exact SPRT based on the constants k₀ and k₁, then the sum of the errors of the approximate test is bounded above by the sum of the errors of the exact test.
EXPECTED SAMPLE SIZE

We now consider a way of assessing the effectiveness of SPRTs in reducing the amount of sampling relative to tests based on fixed sample sizes. Our criterion involves the expected number of observations required to reach a decision. As before, we denote by N the number of observations required to reach a decision, either reject H0 or accept H0. Theoretically, we might attempt to compute its expectation directly from the definition, but as noted previously the distribution of N is quite complicated, and thus we will resort to a different approach. Recall that the test is based on observed values of a sequence of random variables X1, X2, ..., which are independent and identically distributed with pdf f(x; θ). Theoretically, we could continue taking observations indefinitely, but according to the sequential procedure defined above, we will terminate as soon as λn ≤ k0 or λn ≥ k1 for some n, and we define N as the first such value n.

We now define a new random variable, say Z = ln f(X; θ0) − ln f(X; θ1), where X ~ f(x; θ) for either θ = θ0 or θ = θ1. In a similar manner, we can define a whole sequence of such random variables Z1, Z2, ..., based on the sequence X1, X2, ..., and we also can define the sequence of sums Sm = Σ_{i=1}^m Zi.
12.10 SEQUENTIAL TESTS
Notice that these sums are related to the likelihood ratios:

    Sm = ln[λm(X1, ..., Xm)]    m = 1, 2, ...

It follows that N is the subscript of the first sum Sn such that either Sn ≤ ln(k0) or Sn ≥ ln(k1), and we denote the corresponding sum as SN = Z1 + ... + ZN. It is possible to show that E(SN) = E(N)E(Z) when E(N) < ∞. This relationship, which is known as Wald's equation, is useful in deriving an approximation to the expected sample size. We will not attempt to prove it here. If the sequential test rejects H0 at step N, then SN ≤ ln(k0), and we would expect the sum to be close to ln(k0), because it first dropped below this value at the Nth step. Similarly, if the test accepts H0 at step N, then SN ≥ ln(k1), and we would expect the sum to be close to ln(k1) in this case. These remarks, together with Wald's equation, suggest the following approximation:
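Wald's equation can also be checked empirically. The following sketch (our own illustrative simulation, not part of the text) runs a normal-mean SPRT under H0 with the same numbers as the ceramic-part example that follows (μ0 = 100, μ1 = 120, σ = 40, α = 0.05, β = 0.10) and compares the average of SN with the average of N times E(Z):

```python
import math, random

# Monte Carlo check of Wald's equation E(S_N) = E(N)E(Z) for a
# normal-mean SPRT; all numeric settings are illustrative.
random.seed(1)
mu0, mu1, sigma = 100.0, 120.0, 40.0
ln_k0 = math.log(0.05 / 0.90)   # ln k0*, reject H0 at or below this
ln_k1 = math.log(0.95 / 0.10)   # ln k1*, accept H0 at or above this

def z_increment(x):
    # Z = ln f(x; mu0) - ln f(x; mu1) = (mu0-mu1)/(2 sigma^2) * (2x - (mu0+mu1))
    return (mu0 - mu1) / (2.0 * sigma**2) * (2.0 * x - (mu0 + mu1))

ns, sns = [], []
for _ in range(4000):
    s, n = 0.0, 0
    while ln_k0 < s < ln_k1:                        # continue sampling in the band
        s += z_increment(random.gauss(mu0, sigma))  # observations drawn under H0
        n += 1
    ns.append(n)
    sns.append(s)

mean_n = sum(ns) / len(ns)
mean_sn = sum(sns) / len(sns)
ez0 = (mu0 - mu1)**2 / (2.0 * sigma**2)   # E(Z | mu0) = 0.125
# Wald's equation predicts mean_sn is close to mean_n * ez0
```

The simulated averages agree closely, which is exactly what E(SN) = E(N)E(Z) predicts.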
    E(N) ≈ E(SN)/E(Z) ≈ {ln(k0)P[reject H0] + ln(k1)P[accept H0]}/E(Z)

By using the approximations k0 ≈ k0* = α/(1 − β) and k1 ≈ k1* = (1 − α)/β, we obtain the following approximation to the expected sample size when H0 is true:

    E(N | θ0) ≈ {α ln[α/(1 − β)] + (1 − α) ln[(1 − α)/β]}/E(Z | θ0)

Similarly, an approximation when Ha is true is given by

    E(N | θ1) ≈ {(1 − β) ln[α/(1 − β)] + β ln[(1 − α)/β]}/E(Z | θ1)
Example 12.10.2  We consider again the problem of Example 12.10.1, which dealt with the force (in pounds) required to break a certain type of ceramic part whose breaking strength is normally distributed with mean μ and standard deviation σ = 40. Suppose we wish to test the simple null hypothesis H0 : μ = 100 versus the simple alternative Ha : μ = 120 with α = 0.05 and β = 0.10. Thus, the approximate critical values for a SPRT are k0* = 0.05/(1 − 0.10) ≈ 0.056 and k1* = (1 − 0.05)/0.10 = 9.5, and such a test would reject H0 as soon as λn ≤ 0.056, and accept H0 as soon as λn ≥ 9.5. In this case, it also is possible to express the test in terms of the sum of the data. Specifically, because

    f(x; μ) = [1/(√(2π) σ)] exp[−(x − μ)²/(2σ²)]

we can
write
    zi = ln f(xi; μ0) − ln f(xi; μ1)
       = −[1/(2σ²)](xi − μ0)² + [1/(2σ²)](xi − μ1)²
       = [1/(2σ²)][2xi(μ0 − μ1) − (μ0² − μ1²)]
       = [(μ0 − μ1)/(2σ²)][2xi − (μ0 + μ1)]

It follows that

    sn = [(μ0 − μ1)/(2σ²)][2 Σ_{i=1}^n xi − n(μ0 + μ1)]

which is a linear function of Σ xi. Thus, the criterion of stopping the test if sn ≤ ln(k0*) or sn ≥ ln(k1*) is equivalent to stopping as soon as x̄n crosses one of two boundaries c0(n) or c1(n), with c0(n) and c1(n) determined by n, k0*, k1*, μ0, μ1, and σ². It also would be interesting to approximate the expected sample size and compare it to the sample size required to achieve the corresponding test with a fixed sample size. The expression given above for zi also can be used to obtain E(Z). Specifically, it follows that
    E(Z) = [(μ0 − μ1)/(2σ²)][2E(X) − (μ0 + μ1)]

and thus,

    E(Z | μi) = [(μ0 − μ1)/(2σ²)][2μi − (μ0 + μ1)]    i = 0, 1

As a result, we have that E(Z | μi) = (−1)^i (μ0 − μ1)²/(2σ²) for i = 0, 1, and in our example,

    E(N | μ0 = 100) ≈ {0.05 ln[0.05/(1 − 0.10)] + (1 − 0.05) ln[(1 − 0.05)/0.10]} / {(100 − 120)²/[2(40)²]} ≈ 16

and

    E(N | μ1 = 120) ≈ {(1 − 0.10) ln[0.05/(1 − 0.10)] + 0.10 ln[(1 − 0.05)/0.10]} / {−(100 − 120)²/[2(40)²]} ≈ 19
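These two approximations are easy to reproduce numerically; a sketch using the example's values:

```python
import math

# Approximate expected SPRT sample sizes for the ceramic-part example:
# alpha = 0.05, beta = 0.10, mu0 = 100, mu1 = 120, sigma = 40.
alpha, beta = 0.05, 0.10
mu0, mu1, sigma = 100.0, 120.0, 40.0

ln_k0 = math.log(alpha / (1 - beta))   # ln k0*
ln_k1 = math.log((1 - alpha) / beta)   # ln k1*
ez = (mu0 - mu1)**2 / (2 * sigma**2)   # |E(Z | mu_i)| = 0.125

# E(N | H0) uses weights alpha and 1 - alpha; E(N | Ha) uses 1 - beta and beta,
# with E(Z | mu1) = -ez because the drift reverses sign under the alternative.
en_h0 = (alpha * ln_k0 + (1 - alpha) * ln_k1) / ez
en_ha = ((1 - beta) * ln_k0 + beta * ln_k1) / (-ez)
# en_h0 is about 16 and en_ha about 19, as in the text
```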
For the SPRT, the expected sample sizes 16 and 19 under H0 and H1, respectively, compare to the sample size n = 34 for the corresponding Neyman-Pearson test considered in Example 12.10.1.
For additional reading about sequential tests, the book by Ghosh (1970) is recommended.
SUMMARY

Our purpose in this chapter was to introduce the concept of hypothesis testing, which corresponds to a process of attempting to determine the truth or falsity of specified statistical hypotheses on the basis of experimental evidence. A statistical
hypothesis is a statement about the distribution of the random variable that models the characteristic of interest. If the hypothesis completely specifies the distribution, then it is called a simple hypothesis; otherwise it is called composite. A Type I error occurs when a true null hypothesis is rejected by the test, and a
Type II error occurs when a test fails to reject a false null hypothesis. It is not possible to avoid an occasional decision error, but it is possible in many situations to design tests that lead to such errors with a specified relative frequency. If a test is based on a set of data consisting of n measurements, then the critical region (or rejection region) is a subset of an n-dimensional Euclidean space. The null hypothesis is rejected when the n-dimensional vector of data is contained in the critical region. Often it is possible to express the critical region in terms of a test statistic. For a simple hypothesis, the significance level is the probability of committing a Type I error. In the case of a composite hypothesis, the size of the test (or size of the associated critical region) is the largest probability of a Type I error relative to all distributions specified in the null hypothesis. The power function gives the probability of rejecting a false null hypothesis for
the different alternative values. By means of the Neyman-Pearson lemma, it is possible to derive a most powerful test of a given size, and in some cases this test is a UMP test. In many cases a UMP test cannot be obtained, but it is possible to derive reasonable tests by means of the generalized likelihood ratio approach. For example, this approach can be used in many cases to derive tests of hypotheses where nuisance parameters are present, and also in situations involving two-sided alternatives. In a few cases, it is possible to derive UMP unbiased tests that also can be used in nuisance parameter problems and with two-sided alternative hypotheses.
EXERCISES

1. Suppose X1, ..., X16 is a random sample of size n = 16 from a normal distribution, Xi ~ N(μ, 1), and we wish to test H0 : μ = 20 at significance level α = 0.05, based on the sample mean X̄.
(a) Determine critical regions of the form A = {x̄ : a ≤ x̄ < ∞} and B = {x̄ : −∞ < x̄ ≤ b}.
(b) Find the probability of a Type II error, β = P[TII], for each critical region in (a) for the alternative Ha : μ = 21. Which of these critical regions is unreasonable for this alternative?
(c) Rework (b) for the alternative Ha : μ = 19.
(d) What is the significance level for a test with the critical region A ∪ B?
(e) What is β = P[TII] for a test with critical region A ∪ B if |μ − 20| = 1?
2. Suppose a box contains four marbles, θ white ones and 4 − θ black ones.
(a) Test H0 : θ = 2 against Ha : θ ≠ 2 as follows: Draw two marbles with replacement, and reject H0 if both marbles are the same color; otherwise do not reject. Compute the probability of Type I error.
(b) Compute the probability of Type II error for all possible situations.
(c) Rework (a) and (b) if the two marbles are drawn without replacement.
3. Consider a random sample of size 20 from a normal distribution, X ~ N(μ, σ²), and suppose that x̄ = 11 and s² = 16.
(a) Assuming it is known that σ² = 4, test H0 : μ ≥ 12 versus Ha : μ < 12 at the significance level α = 0.01.
(b) What is β = P[TII] if in fact μ = 10.5?
(c) What sample size is needed for the power of the test to be 0.90 for the alternative value μ = 10.5?
(d) Test the hypotheses of (a) assuming σ² unknown (t test).
(e) Test H0 : σ² ≤ 9 against Ha : σ² > 9 with significance level α = 0.01.
(f) What sample size is needed for the test of (e) to be 90% certain of rejecting H0 when in fact σ² = 18? What is β = P[TII] in this case?
4. Consider the biased coin discussed in Example 9.2.5, where the probability of a head, p, is known to be 0.20, 0.30, or 0.80. The coin is tossed repeatedly, and we let X be the number of tosses required to obtain the first head.
(a) To test H0 : p = 0.80, suppose we reject H0 if X ≥ 3, and do not reject otherwise. What is the probability of Type I error, P[TI]?
(b) What is the probability of a Type II error, P[TII], for each of the other two values of p?
(c) For a test of H0 : p = 0.30, suppose we use a critical region of the form C = {1, 14, 15, ...}. Find P[TI], and also find P[TII] for each of the other values of p.

5.
It is desired to test the hypothesis that the mean melting point of an alloy is 200 degrees Celsius (°C), so that a difference of 20°C is detected with probability 0.95. Assuming
normality, and with an initial guess that σ = 25°C, how many specimens of the alloy must be tested to perform a t test with α = 0.01? Describe the null and alternative hypotheses, and the form of the associated critical region.

6. Let X1, ..., Xn be a random sample of size n from an exponential distribution, Xi ~ EXP(1, η). A test of H0 : η ≤ η0 versus Ha : η > η0 is desired, based on X1:n.
(a) Find a critical region of size α of the form {x1:n ≥ c}.
(b) Derive the power function for the test of (a).
(c) Derive a formula to determine the sample size n for a test of size α with β = P[TII].
7. A coin is tossed 20 times and x = 6 heads are observed. Let p = P(head). A test of H0 : p ≥ 0.5 versus Ha : p < 0.5 of size at most 0.10 is desired.
(a) Perform a test using Theorem 12.4.1.
(b) Perform a test using Theorem 12.4.2.
(c) What is the power of a size α = 0.0577 test of H0 : p ≥ 0.5 for the alternative p = 0.2?
(d) What is the p-value for the test in (b)? That is, what is the observed size?
8. Suppose that the number of defects in a piece of wire of length t yards is Poisson distributed, X ~ POI(λt), and one defect is found in a 100-yard piece of wire.
(a) Test H0 : λ ≥ 0.05 against Ha : λ < 0.05 with significance level at most 0.01 by means of Theorem 12.5.1.
(b) What is the p-value for such a test?
(c) Suppose a total of two defects are found in two 100-yard pieces of wire. Test H0 : λ ≥ 0.05 versus Ha : λ < 0.05 at significance level α = 0.0103.
(d) Find the power of the test in (c) if λ = 0.01.

9.
Consider independent random samples from two normal distributions, Xi ~ N(μ1, σ1²) for i = 1, ..., n1 and Yj ~ N(μ2, σ2²) for j = 1, ..., n2. Let n1 = n2 = 9, x̄ = 16, ȳ = 10, s1² = 36, and s2² = 45.
(a) Assuming equal variances, test H0 : μ1 = μ2 against Ha : μ1 ≠ μ2 at the α = 0.10 level of significance.
(b) Perform an approximate α = 0.10 level test of H0 : μ1 = μ2 against Ha : μ1 ≠ μ2 using equation (11.5.13).
(c) Perform a test of these hypotheses at the α = 0.10 significance level using equation (11.5.17), assuming the data were obtained from paired samples with sd² = 81.
(d) Test H0 : σ2²/σ1² ≤ 1 versus Ha : σ2²/σ1² > 1 at the α = 0.05 level.
(e) Use Table 7 (Appendix C) to find the power of this test if σ2²/σ1² = 1.33.
10. A certain type of component is manufactured by two different companies, and the respective probabilities of a nondefective component are p1 and p2. In random samples of 200 components each, 180 from company 1 are nondefective, and 190 from company 2 are nondefective. Test H0 : p1 = p2 against Ha : p1 ≠ p2 at significance level α = 0.05.
11. Consider a distribution with pdf f(x; θ) = θx^(θ−1) if 0 < x < 1, and zero otherwise.
(a) Based on a random sample of size n = 1, find the most powerful test of H0 : θ = 1 against Ha : θ = 2 with α = 0.05.
(b) Compute the power of the test in (a) for the alternative θ = 2.
(c) Derive the most powerful test for the hypotheses of (a) based on a random sample of size n.
12. Suppose that X ~ POI(μ).
(a) Derive the most powerful test of H0 : μ = μ0 versus Ha : μ = μ1 (μ1 > μ0) based on an observed value of X.
(b) Rework (a) based on a random sample of size n.
13. Let X ~ NB(r, 1/2).
(a) Derive the most powerful test of size 0.125 of H0 : r = 1 against Ha : r = 2, based on an observed value of X.
(b) Compute the power of this test for the alternative r = 2.
14. Assume that X is a discrete random variable. Based on an observed value of X, derive the most powerful test of H0 : X ~ GEO(0.05) versus Ha : X ~ POI(0.95) with α = 0.0975. Find the power of this test under the alternative.

15. Let X1, ..., Xn have joint pdf f(x1, ..., xn; θ), and let S be a sufficient statistic for θ. Show that a most powerful test of H0 : θ = θ0 versus Ha : θ = θa can be expressed in terms of S.
16. Consider a random sample of size n from a distribution with pdf f(x; θ) = (3x²/θ)e^(−x³/θ) if 0 < x, and zero otherwise. Derive the form of the critical region for a uniformly most powerful (UMP) test of size α of H0 : θ = θ0 against Ha : θ > θ0.
17. Suppose that X1, ..., Xn is a random sample from a normal distribution, Xi ~ N(0, σ²).
(a) Derive the UMP size α test of H0 : σ = σ0 against Ha : σ > σ0.
(b) Express the power function of this test in terms of a chi-square distribution.
(c) If n = 20, σ0 = 1, and α = 0.005, use Table 5 (Appendix C) to compute the power of the test in (a) when σ = 2.
18. Consider a random sample of size n from a uniform distribution, X ~ UNIF(0, θ). Find the UMP test of size α of H0 : θ ≥ θ0 versus Ha : θ < θ0 by first deriving a most powerful test of simple hypotheses and then extending it to composite hypotheses.
19. Let X1, ..., Xn be a random sample from a normal distribution, Xi ~ N(μ, 1). Find a UMP test of H0 : μ = μ0 against Ha : μ > μ0.
20. Suppose that X is a continuous random variable with pdf f(x; θ) = 1 − θ²(x − 1/2) if 0 ≤ x ≤ 1, and zero otherwise. Show that a UMP test of size α of H0 : θ = 0 against Ha : θ ≠ 0, based on an observed value x of X, is to reject H0 if x ≤ α.
21. Consider a random sample X1, ..., Xn from a discrete distribution with pdf f(x; θ) = [θ/(θ + 1)]^x /(θ + 1) if x = 0, 1, ..., where θ > 0. Find a UMP test of H0 : θ = θ0 against Ha : θ > θ0.
22. Let X1, ..., Xn denote a random sample from a gamma distribution, X ~ GAM(θ, κ).
(a) If κ is known, derive a UMP size α test of H0 : θ = θ0 against Ha : θ > θ0.
(b) Express the power function for the test in (a) in terms of a chi-square distribution.
(c) If n = 4, κ = 2, θ0 = 1, and α = 0.01, use Table 5 (Appendix C) to find the power of this test when θ = 2.
(d) If θ is known, derive a UMP size α test of H0 : κ = κ0 against Ha : κ > κ0.
23. Consider a random sample of size n from a Bernoulli distribution, Xi ~ BIN(1, p).
(a) Derive a UMP test of H0 : p ≤ p0 versus Ha : p > p0 using Theorem 12.7.1.
(b) Derive the test in (a) using Theorem 12.7.2.
24. Suppose that X1, ..., Xn is a random sample from a Weibull distribution, X ~ WEI(θ, 2). Derive a UMP test of H0 : θ ≥ θ0 versus Ha : θ < θ0 using Theorem 12.7.2.
25. Show that the test of Exercise 20 is unbiased.

26. Consider the hypotheses of Example 12.7.5, and consider a test with critical region of the form C = {(x1, ..., xn) : v ≤ c1 or v ≥ c2}, where v = Σ_{i=1}^n xi²/σ0² and where c1 and c2 are chosen to provide a test of size α.
(a) Show that the power function of such a test has the form π(σ²) = 1 − H(c2σ0²/σ²; n) + H(c1σ0²/σ²; n), where H(c; n) is the CDF of χ²(n).
(b) Show that for this test to be unbiased, it is necessary that c1 and c2 satisfy the equations H(c2; n) − H(c1; n) = 1 − α and c2h(c2; n) − c1h(c1; n) = 0, where h(c; n) = H′(c; n). Hint: For the test to be unbiased, the minimum of π(σ²) must occur at σ² = σ0². In Example 12.7.5, α1 = H(c1; n) and α2 = 1 − H(c2; n).
27. Let X1, ..., Xn be a random sample from an exponential distribution, Xi ~ EXP(θ).
(a) Derive the generalized likelihood ratio (GLR) test of H0 : θ = θ0 against Ha : θ ≠ θ0. Determine an approximate critical value for size α using the large-sample chi-square approximation.
(b) Derive the GLR test of H0 : θ ≤ θ0 against Ha : θ > θ0.

28. Consider independent random samples of size n1 and n2 from respective exponential distributions, Xi ~ EXP(θ1) and Yj ~ EXP(θ2). Derive the GLR test of H0 : θ1 = θ2 versus Ha : θ1 ≠ θ2.
29. Let X1, ..., Xn be a random sample from a distribution with pdf f(x; θ) = 1/θ if 0 ≤ x ≤ θ and zero otherwise. Derive the GLR test of H0 : θ = θ0 versus Ha : θ ≠ θ0.
30. Consider independent random samples of size n1 and n2 from respective normal distributions, Xi ~ N(μ1, σ1²) and Yj ~ N(μ2, σ2²).
(a) Derive the GLR test of H0 : σ1² = σ2² against Ha : σ1² ≠ σ2², assuming that μ1 and μ2 are known.
(b) Rework (a) assuming that μ1 and μ2 are unknown.
(c) Derive the GLR test of H0 : μ1 = μ2 and σ1² = σ2² against the alternative Ha : μ1 ≠ μ2 or σ1² ≠ σ2².
31. Suppose that X is a continuous random variable with pdf f(x; θ) = θx^(θ−1) if 0 < x < 1, and zero otherwise.
(a) Derive the GLR test of H0 : θ = θ0 against Ha : θ ≠ θ0 based on a random sample of size n.
(b) Determine an approximate critical value for a size α test based on a large-sample approximation.
32. To compare the effectiveness of three competing weight-loss systems, 10 dieters were assigned randomly to each system, and the results measured after six months. The following weight losses (in pounds) were reported: System 1:
4.3, 10.2, 4.4, 23.5, 54.0, 5.7, 10.6, 47.3, 9.9, 37.5
System 2:
10.7, 6.4, 33.5, 54.1, 25.7, 11.6, 17.3, 9.4, 7.5, 5.0
System 3:
51.0, 5.6, 10.3, 47.3, 2.9, 27.5, 14.3, 1.2, 3.4, 13.5
(a) Assuming that the data are normally distributed, use the result of Theorem 12.8.1 to test the hypothesis that all three systems are equally effective in reducing weight.
(b) Use the results of Theorems 12.8.2 and 12.8.3 to test the hypothesis that the variance in weight loss is the same for all three systems.
33. Consider a random sample of size n from a distribution with pdf f(x; θ, μ) = 1/(2θ) if |x − μ| ≤ θ, and zero otherwise. Test H0 : μ = 0 against Ha : μ ≠ 0. Show that the GLR λ = λ(x) is given by

    λ = [(X_{n:n} − X_{1:n}) / (2 max(−X_{1:n}, X_{n:n}))]^n

Note that in this case, approximately, −2 ln λ ~ χ²(2) because of equation (12.8.2).
34. Let X1, ..., Xn be a random sample from a continuous distribution.
(a) Show that the GLR for testing H0 : X ~ N(μ, σ²) against Ha : X ~ EXP(θ, η) is a function of θ̂/σ̂. Is the distribution of this statistic free of unknown parameters under H0?
(b) Show that the GLR for testing H0 : X ~ N(μ, σ²) against Ha : X ~ DE(θ, η) is a function of θ̂/σ̂.

35.
Consider a random sample of size n from a two-parameter exponential distribution, Xi ~ EXP(θ, η), and let η̂ and θ̂ be the MLEs.
(a) Show that η̂ and θ̂ are independent. Hint: Use the results of Exercise 30 of Chapter 10.
(b) Let V1 = 2n(X̄ − η)/θ, V2 = 2n(η̂ − η)/θ, and V3 = 2nθ̂/θ. Show that V1 ~ χ²(2n), V2 ~ χ²(2), and V3 ~ χ²(2n − 2). Hint: Note that V1 = V2 + V3 and that V2 and V3 are independent. Find the MGF of V3 by the approach used in Theorem 8.3.6.
(c) Show that (n − 1)(η̂ − η)/θ̂ ~ F(2, 2n − 2).
(d) Derive the GLR for a test of H0 : η = η0 versus Ha : η > η0.
(e) Show that the critical region for a size α GLR test is equivalent to (n − 1)(η̂ − η0)/θ̂ ≥ f_{1−α}(2, 2n − 2).
36. Suppose that X and Y are independent with X ~ POI(μ1) and Y ~ POI(μ2), and let S = X + Y.
(a) Show that the conditional distribution of X given S = s is BIN(s, p), where p = μ1/(μ1 + μ2).
(b) Use the result of (a) to construct a conditional test of H0 : μ1 = μ2 versus Ha : μ1 ≠ μ2.
(c) Construct a conditional test of H0 : μ1/μ2 = c0 versus Ha : μ1/μ2 ≠ c0 for some specified c0.
37. Consider the hypotheses H0 : σ = 1 versus Ha : σ = 3 when the distribution is normal with μ = 0. If xi denotes the ith sample value:
(a) Compute the approximate critical values k0* and k1* for a SPRT with P[Type I error] = 0.10 and P[Type II error] = 0.05.
(b) Derive the SPRT for testing these hypotheses.
(c) Find a sequential test procedure that is stated in terms of the sequence of sums sn = Σ_{i=1}^n xi² and is equivalent to the SPRT for testing H0 against Ha.
(d) Find the approximate expected sample size for the test in (a) if H0 is true. What is the approximate expected sample size if Ha is true?
(e) Suppose the first 10 values of xi are: 2.20, 0.50, 2.55, 1.85, 0.45, 1.15, 0.58, 5.65, 0.49, and −1.16. Would the test in (a) terminate before more data are needed?

38.
Suppose a population is Poisson distributed with mean μ. Consider a SPRT for testing H0 : μ = 1 versus Ha : μ = 2.
(a) Express the SPRT in terms of the sequence of sums sn = Σ_{i=1}^n xi.
(b) Find the approximate expected sample size if H0 is true when α = 0.01 and β = 0.02.

39.
Gross and Clark (1975, page 105) consider the following relief times (in hours) of 20 patients who received an analgesic:

1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7, 4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7, 2.3, 1.6, 2.0

(a) Assuming that the times were taken sequentially and that relief times are independent and exponentially distributed, X ~ EXP(θ), use an approximate SPRT to test the hypotheses H0 : θ = 2.0 versus Ha : θ = 4.0 with α = 0.10 and β = 0.05.
(b) Approximate the expected sample size for the test in (a) when H0 is true.
(c) Approximate the expected sample size for the test in (a) when Ha is true.
40. Prove the identity (12.8.5). Hint: Within the squared terms of the left side, add and subtract x̄i, and then use the binomial expansion.
CHAPTER 13

CONTINGENCY TABLES AND GOODNESS-OF-FIT

13.1 INTRODUCTION

Most of the models discussed to this point have been expressed in terms of a pdf f(x; θ) that has a known functional form. Moreover, most of the statistical methods discussed in the preceding chapters, such as maximum likelihood estimation, are derived relative to specific models. In many situations, it is not possible to identify precisely which model applies. Thus, general statistical methods for testing how well a given model "fits," relative to a set of data, are desirable. Another question of interest, which cannot always be answered without the aid of statistical methods, concerns whether random variables are independent. One possible answer to this question involves the notion of contingency tables.
13.2 ONE-SAMPLE BINOMIAL CASE

First let us consider a Bernoulli-trial type of situation with two possible outcomes, A1 and A2, with P(A1) = p1 and P(A2) = p2 = 1 − p1. A random sample of n trials is observed, and we let o1 = x and o2 = n − x denote the observed number of outcomes of type A1 and type A2, respectively. We wish to test H0 : p1 = p10 against Ha : p1 ≠ p10. Under H0, the expected number of outcomes of each type is e1 = np10 and e2 = np20 = n(1 − p10). This situation is illustrated in Table 13.1. We have discussed an exact small-sample binomial test, and we also have discussed an approximate test based on the normal approximation to the binomial. The approximate test also can be expressed in terms of a chi-square variable, and this form can be generalized for the case of more than two possible types of outcomes and more than one sample.
The square of the approximately normally distributed test statistic will be approximately distributed as χ²(1), and it can be expressed as

    z² = (x − np10)² / [np10(1 − p10)]
       = (x − np10)²/(np10) + (x − np10)²/[n(1 − p10)]
       = (x − np10)²/(np10) + [(n − x) − n(1 − p10)]²/[n(1 − p10)]    (13.2.1)

An approximate size α test of H0 is to reject H0 if χ² > χ²_{1−α}(1). The χ² statistic reflects the amount of disagreement between the observed outcomes and the expected outcomes under H0, and it is an intuitively appealing form. In this form, the differences in both cells are squared, but the differences are linearly dependent because Σ_{j=1}² (oj − ej) = 0, and the number of degrees of freedom is one less than the number of cells.
TABLE 13.1  Values of expected and observed outcomes for a binomial experiment

Possible outcomes     A1           A2               Total
Probabilities         p10          p20              1
Expected outcomes     e1 = np10    e2 = np20        n
Observed outcomes     o1 = x       o2 = n − x       n
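The algebraic identity behind (13.2.1) — that z² splits into the two-cell sum — can be verified directly; a sketch with illustrative counts of our own choosing (x = 30 successes in n = 100 trials, p10 = 0.25):

```python
# Check that z^2 = (x - n p)^2 / [n p (1-p)] equals the two-cell
# chi-square sum of equation (13.2.1). Counts are illustrative.
n, x, p10 = 100, 30, 0.25
e1, e2 = n * p10, n * (1 - p10)     # expected counts under H0
o1, o2 = x, n - x                   # observed counts

z_squared = (x - n * p10)**2 / (n * p10 * (1 - p10))
chi2 = (o1 - e1)**2 / e1 + (o2 - e2)**2 / e2
# the two forms agree because 1/[p(1-p)] = 1/p + 1/(1-p)
```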
We should note that the chi-square approximation can be improved in this case by introducing a correction for discontinuity. Thus, a somewhat more accurate result is obtained by using the statistic

    χ² = Σ_{j=1}² (|oj − ej| − 0.5)²/ej    (13.2.2)

The chi-square test statistic may be generalized in two ways. It can be extended to apply to a multinomial problem with k types of outcomes, and it can be generalized to an r-sample binomial problem. We will consider the r-sample binomial problem first.
13.3 r-SAMPLE BINOMIAL TEST (COMPLETELY SPECIFIED H0)

Suppose now that Xi ~ BIN(ni, pi) for i = 1, ..., r, and we wish to test H0 : pi = pi0, where the pi0 are known constants. Now let oi1 = xi and oi2 = ni − xi denote the observed outcomes in the ith sample, and let ei1 = ni pi0 and ei2 = ni(1 − pi0) denote the expected outcomes under H0. Because a sum of independent chi-square variables is chi-square distributed, we have approximately

    χ² = Σ_{i=1}^r Σ_{j=1}² (oij − eij)²/eij ~ χ²(r)    (13.3.1)

An approximate size α test is to reject H0 if χ² > χ²_{1−α}(r).
Example 13.3.1  A certain characteristic is believed to be present in 20% of the population. Random samples of size 50, 100, and 50 are tested from each of three different races, with the observed outcomes shown in Table 13.2. The expected numbers of outcomes under H0 : p1 = p2 = p3 = 0.20 then are e11 = 50(0.2) = 10, e21 = 100(0.2) = 20, and e31 = 50(0.2) = 10. The remaining eij may be obtained by subtraction and are shown in Table 13.3.

TABLE 13.2  Observed outcomes for a three-sample binomial test

           Present    Absent    Total
Race 1     20         30        50
Race 2     25         75        100
Race 3     15         35        50
TABLE 13.3  Expected outcomes for a three-sample binomial test

           Present    Absent    Total
Race 1     10         40        50
Race 2     20         80        100
Race 3     10         40        50
We have

    χ² = Σ_{i=1}³ Σ_{j=1}² (oij − eij)²/eij
       = (20 − 10)²/10 + (30 − 40)²/40 + (25 − 20)²/20 + (75 − 80)²/80 + (15 − 10)²/10 + (35 − 40)²/40
       = 17.19

Because χ²_{0.99}(3) = 11.3, we may reject H0 at the α = 0.01 significance level.
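The double sum in Example 13.3.1 can be reproduced directly; a sketch:

```python
# Chi-square statistic for the three-sample binomial test of Example 13.3.1:
# observed (present, absent) counts per race, expected under p = 0.20.
observed = [(20, 30), (25, 75), (15, 35)]
sizes = [50, 100, 50]
p0 = 0.20

chi2 = 0.0
for (o1, o2), n in zip(observed, sizes):
    e1, e2 = n * p0, n * (1 - p0)
    chi2 += (o1 - e1)**2 / e1 + (o2 - e2)**2 / e2
# chi2 is about 17.19, which exceeds 11.3, the 0.99 chi-square
# quantile with r = 3 degrees of freedom, so H0 is rejected
```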
TEST OF COMMON p

Perhaps a more common problem is to test whether the pi are all equal, H0 : p1 = p2 = ... = pr = p, where the common value p is not specified. We still have the same r × 2 table of observed outcomes, but the value p must be estimated in order to estimate the expected numbers under H0. Under H0 the MLE of p is the pooled estimate

    p̂ = Σ_{i=1}^r xi / N

where N = Σ_{i=1}^r ni, and êi1 = ni p̂, êi2 = ni(1 − p̂). The test statistic is

    χ² = Σ_{i=1}^r Σ_{j=1}² (oij − êij)²/êij    (13.3.2)

The limiting distribution of this statistic can be shown to be chi-square with r − 1 degrees of freedom, so an approximate size α test is to reject H0 if

    χ² > χ²_{1−α}(r − 1)    (13.3.3)

Quite generally in problems of this nature, one degree of freedom is lost for each unknown parameter estimated. This is quite similar to normal sampling, where

    Σ_{i=1}^n (Xi − μ)²/σ² ~ χ²(n)    and    Σ_{i=1}^n (Xi − X̄)²/σ² ~ χ²(n − 1)
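The pooled estimate and the statistic (13.3.2) fit naturally in a small routine; a sketch (the function name and the two-sample counts are our own illustration, not data from the text):

```python
# r-sample test of a common, unspecified p: pool to get p-hat, then
# form the chi-square statistic (13.3.2) with r - 1 degrees of freedom.
def common_p_chi2(xs, ns):
    N = sum(ns)
    p_hat = sum(xs) / N            # pooled MLE of the common p
    chi2 = 0.0
    for x, n in zip(xs, ns):
        e1, e2 = n * p_hat, n * (1 - p_hat)
        chi2 += (x - e1)**2 / e1 + ((n - x) - e2)**2 / e2
    return chi2, len(ns) - 1       # statistic and degrees of freedom

# Illustrative counts: 30/100 and 45/100 successes in two samples.
chi2, df = common_p_chi2([30, 45], [100, 100])
```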
TASLE 13.4
Table of r-sample binomial observations Sample
A2
°21
°12 °22
i 2
Total
A1
fi ¡12
n,
N(1 -ß)
N,5
N
We again have a linear relationship, because Σ_{i=1}^n (Xi − X̄) = 0, and the effect is that the latter sum is equivalent to a sum of n − 1 independent squared standard normal variables. The degrees of freedom also can be illustrated heuristically by considering the r × 2 table of observed outcomes in Table 13.4. Having a fixed value of the estimate p̂ corresponds to having all of the marginal totals in the table fixed. Thus r − 1 numbers in the interior of the table can be assigned, and all the remaining numbers then will be determined. In general, the number of degrees of freedom associated with an r × c table with fixed marginal totals is (r − 1)(c − 1).

Example 13.3.2  Consider again the data in Example 13.3.1, but suppose we had chosen to test the composite hypothesis that the proportion containing the characteristic is the same in the three races, H0 : p1 = p2 = p3 = p. In this case, p̂ = 60/200 = 0.3, ê11 = 50(60/200) = 15, ê21 = 100(60/200) = 30, and the remaining expected numbers may be obtained by subtraction. These expected values are included in Table 13.5 in parentheses. In this case
    χ² = Σ_{i=1}³ Σ_{j=1}² (oij − êij)²/êij
       = (20 − 15)²/15 + (30 − 35)²/35 + (25 − 30)²/30 + (75 − 70)²/70 + (15 − 15)²/15 + (35 − 35)²/35
       = 3.57

Because χ²_{0.99}(2) = 9.21, we cannot reject the hypothesis of common proportions at the α = 0.01 level.

TABLE 13.5  Observed and expected numbers under null hypothesis of equal proportions

           Present     Absent      Total
Race 1     20 (15)     30 (35)     50
Race 2     25 (30)     75 (70)     100
Race 3     15 (15)     35 (35)     50
Total      60          140         200
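The example's statistic follows directly from the observed and expected (parenthesized) entries of Table 13.5; a sketch:

```python
# (observed, expected) cell counts from Table 13.5.
cells = [(20, 15), (30, 35), (25, 30), (75, 70), (15, 15), (35, 35)]
chi2 = sum((o - e)**2 / e for o, e in cells)
# chi2 is about 3.57, below 9.21, the 0.99 chi-square quantile
# with (r - 1) = 2 degrees of freedom, so H0 is not rejected
```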
Note that reducing the degrees of freedom when a parameter is estimated results in a smaller critical value. This seems reasonable, because estimating p automatically forces some agreement between the observed and expected values, and a smaller computed chi-square value therefore is considered significant. In this example, the smaller critical value still is not exceeded, which suggests that the proportions may all be equal, although we had rejected the simple hypothesis that they were all equal to 0.2 in Example 13.3.1.

In Section 13.2 we mentioned that a correction for discontinuity may enhance the accuracy of the chi-square approximation for small sample sizes. The chi-square approximation generally is considered to be sufficiently accurate for practical purposes if the expected values in each cell are at least 2 and at least 80% of them are 5 or more.
13.4 ONE-SAMPLE MULTINOMIAL

Suppose now that there are c possible types of outcomes, A1, A2, ..., Ac, and in a sample of size n let o1, ..., oc denote the number of observed outcomes of each type. We assume probabilities P(Aj) = pj, j = 1, ..., c, where Σ_{j=1}^c pj = 1, and we wish to test the completely specified hypothesis H0 : pj = pj0, j = 1, ..., c. Under H0 the expected values for each type are given by ej = npj0. The chi-square statistic again provides an appealing and convenient test statistic, where approximately

    χ² = Σ_{j=1}^c (oj − ej)²/ej ~ χ²(c − 1)    (13.4.1)

It is possible to show that the limiting distribution of this statistic under H0 is χ²(c − 1), and this is consistent with earlier remarks concerning what the appropriate number of degrees of freedom turns out to be in these problems. Equation (13.2.1) illustrated that one degree of freedom was appropriate for the binomial case with c = 2. Also, for fixed sample size, c − 1 observed values determine the remaining observed value.
Example 13.4.1  A die is rolled 60 times, and we wish to test whether it is a fair die, H0 : pj = 1/6, j = 1, ..., 6. Under H0 the expected outcome in each case is ej = npj0 = 10, and the results are depicted in Table 13.6. In this case

    χ² = Σ_{j=1}⁶ (oj − ej)²/ej = 6.0

so we cannot reject H0 at the α = 0.10 level of significance.
CHAPTER 13 CONTINGENCY TABLES AND GOODNESS-OF-FIT

TABLE 13.6  Observed and expected frequencies for a die-rolling experiment

Face        1     2     3     4     5     6     Total
Observed    8     5     11    12    15    9     60
Expected    10    10    10    10    10    10    60
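The computation in Example 13.4.1 takes only a few lines; the following is a minimal sketch of the statistic in equation (13.4.1) applied to the die-rolling data of Table 13.6 (the same calculation is also available as scipy.stats.chisquare).

```python
# Chi-squared statistic of equation (13.4.1) for the fair-die data of
# Table 13.6; all counts are taken directly from the text.
observed = [8, 5, 11, 12, 15, 9]       # faces 1-6, n = 60 rolls
expected = [60 / 6.0] * 6              # e_j = n * p_j0 = 10 under H0

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                 # c - 1 = 5
print(chi_sq, df)   # 6.0 5, below the chi^2_0.90(5) critical value 9.24
```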
As suggested in Section 13.3, if the model is not specified completely under H0, then it is necessary to estimate the expected numbers, ê_j, and the number of degrees of freedom is reduced by one for each parameter estimated. This aspect is discussed further in the later sections.
13.5 r-SAMPLE MULTINOMIAL

We may wish to test whether r samples come from the same multinomial population, or that r multinomial populations are the same. Let A1, A2, ..., Ac denote c possible types of outcomes, and let the probability that an outcome of type Aj will occur for the ith population (or ith sample) be denoted by p_{j|i}. Note that Σ_{j=1}^{c} p_{j|i} = 1 for each i = 1, ..., r. Also let o_{ij} denote the observed number of outcomes of type Aj in sample i. For a completely specified H0: p_{j|i} = p_j, the expected values are e_{ij} = n_i p_j under H0, and it is clear that equation (13.3.1) can be extended to this case. Approximately, for each i,

Σ_{j=1}^{c} (o_{ij} − e_{ij})²/e_{ij} ~ χ²(c − 1)

and

χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (o_{ij} − e_{ij})²/e_{ij} ~ χ²(r(c − 1))    (13.5.1)

under H0. The more common problem is to test whether the r multinomial populations are the same without specifying the values of the p_j. Thus we consider

H0: p_{j|1} = p_{j|2} = ... = p_{j|r}    for j = 1, 2, ..., c

We must estimate c − 1 parameters p1, ..., p_{c−1}, which also will determine the estimate of p_c because Σ_{j=1}^{c} p_j = 1. Under H0 the MLE of p_j will be the pooled
estimate from the pooled sample of N = Σ_{i=1}^{r} n_i items, which gives

p̂_j = c_j/N

where c_j = Σ_{i=1}^{r} o_{ij} is the jth column total, and ê_{ij} = n_i p̂_j = n_i c_j/N. The number of degrees of freedom in this case is r(c − 1) − (c − 1) = (r − 1)(c − 1), and approximately

χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (o_{ij} − ê_{ij})²/ê_{ij} ~ χ²((r − 1)(c − 1))    (13.5.2)
Example 13.5.1  In Example 13.3.1, suppose that the characteristic of interest may occur at three different levels: absent, moderate, or severe. We are interested in knowing whether the proportion of each level is the same for different races. The notation is depicted in Table 13.7. We wish to test H0: p_{j|1} = p_{j|2} = p_{j|3} for j = 1, 2, 3. The observed outcomes are shown in Table 13.8. The estimated expected numbers under H0 are given in parentheses, where ê_{ij} = n_i p̂_j = n_i(c_j/N).

TABLE 13.7  Conditional probabilities for a three-sample multinomial test

                     A1 (Severe)    A2 (Moderate)    A3 (Absent)
Sample 1 (Race 1)    p_{1|1}        p_{2|1}          p_{3|1}
Sample 2 (Race 2)    p_{1|2}        p_{2|2}          p_{3|2}
Sample 3 (Race 3)    p_{1|3}        p_{2|3}          p_{3|3}

TABLE 13.8  Observed outcomes (expected outcomes in parentheses)

                     Severe     Moderate    Absent      Total
Sample 1 (Race 1)    10 (6)     10 (9)      30 (35)     50
Sample 2 (Race 2)    4 (12)     21 (18)     75 (70)     100
Sample 3 (Race 3)    10 (6)     5 (9)       35 (35)     50
Total                24         36          140         200
Thus ê11 = n1(c1/N) = 50(24)/200 = 6, ê12 = n1(c2/N) = 9, ê21 = n2(c1/N) = 12, ê22 = n2(c2/N) = 18, and the others may be obtained by subtraction. The number of degrees of freedom in this case is (r − 1)(c − 1) = 2(2) = 4, and

χ² = Σ_{i=1}^{3} Σ_{j=1}^{3} (o_{ij} − ê_{ij})²/ê_{ij} = 14.13 > 13.3 = χ²_{0.99}(4)

so H0 can be rejected at the α = 0.01 level.
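The pooled-estimate computation of Example 13.5.1 can be sketched as follows; the observed counts come from Table 13.8, and the expected counts are the n_i c_j / N values of equation (13.5.2).

```python
# Chi-squared test of homogeneity for the data of Table 13.8
# (Example 13.5.1); a pure-Python sketch of equation (13.5.2).
observed = [
    [10, 10, 30],   # Race 1: severe, moderate, absent
    [4, 21, 75],    # Race 2
    [10, 5, 35],    # Race 3
]
row_totals = [sum(row) for row in observed]          # n_i
col_totals = [sum(col) for col in zip(*observed)]    # c_j
N = sum(row_totals)

# Estimated expected counts under H0: e_ij = n_i * c_j / N
expected = [[ni * cj / N for cj in col_totals] for ni in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)    # (r-1)(c-1) = 4
print(round(chi_sq, 2), df)   # 14.13 4
```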
The question of whether the characteristic proportions change over samples (in this example, over races) is similar to the question of whether these two factors (characteristic and race) are independent. Note that in the r-sample multinomial case, the row totals (sample sizes) are fixed, and it appears natural to consider the test of common proportions over samples, sometimes referred to as a test of homogeneity. In the example, one could have selected 200 individuals at random and then counted the number that fall in each race category and each characteristic category. In this case, the row totals and column totals are both random. We can look at the conditional test in this case, given the row totals, and analyze the data in the same way as before, but it appears somewhat more natural to look
directly at a test of independence in this case. It turns out that the same test statistic is applicable under this interpretation, as discussed in the next section.
13.6 TEST FOR INDEPENDENCE, r × c CONTINGENCY TABLE

Suppose that one factor with c categories is associated with columns and a second factor with r categories is associated with rows in an r × c contingency table. Let p_{ij} denote the probability that a sampled item is classified in the ith row category and the jth column category. Let p_{i·} = Σ_{j=1}^{c} p_{ij} denote the marginal probability that an individual is classified in row i, and let p_{·j} = Σ_{i=1}^{r} p_{ij} denote the marginal probability that an individual is classified in the jth column, as illustrated in Table 13.9.

Note that the joint probabilities in this case add to 1, whereas in Table 13.7 the probabilities under consideration correspond to conditional probabilities and each row adds to 1. If the classification of an individual according to one factor is not affected by its classification relative to the other factor, then the two factors are independent. That is, they are independent if the joint classification probabilities are the products of the marginal classification probabilities, p_{ij} = p_{i·} p_{·j}. Thus, to test
TABLE 13.9  Contingency table of joint and marginal probabilities

            Columns
Rows        1       2       3
1           p11     p12     p13     p1·
2           p21     p22     p23     p2·
3           p31     p32     p33     p3·
            p·1     p·2     p·3     1
independence we test H0: p_{ij} = p_{i·} p_{·j}. Let n_i = Σ_{j=1}^{c} o_{ij} and c_j = Σ_{i=1}^{r} o_{ij} denote the row and column totals as before, although the n_i are not fixed before the sample in this case. Let N = Σ n_i denote the total number of outcomes. Then p̂_{i·} = n_i/N, p̂_{·j} = c_j/N, and under H0 the expected number of outcomes to fall in the (i, j) cell is estimated to be

ê_{ij} = N p̂_{i·} p̂_{·j} = n_i c_j/N

We note that ê_{ij} reduces to exactly the same values obtained in the previous problem of testing equal proportions over samples. Thus the chi-squared statistic for measuring the agreement between the observed outcomes o_{ij} and the expected numbers under H0, ê_{ij}, is computed exactly the same as before. Also, as before, asymptotic results show that approximately
χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (o_{ij} − ê_{ij})²/ê_{ij} ~ χ²((r − 1)(c − 1))    (13.6.1)
With regard to the number of degrees of freedom, estimating the marginal probabilities p̂_{i·} and p̂_{·j} amounts to fixing the marginal totals, which then leaves (r − 1)(c − 1) degrees of freedom. This test also is similar to the asymptotic GLR test based on −2 ln λ. In that case, the number of degrees of freedom in Ω is rc − 1 because Σ_{i=1}^{r} Σ_{j=1}^{c} p_{ij} = 1, and the degrees of freedom for Ω0 and H0 is (r − 1) + (c − 1) because Σ_{i=1}^{r} p_{i·} = 1 and Σ_{j=1}^{c} p_{·j} = 1; thus the dimension of the parameters specified by H0 is (rc − 1) − (r − 1) − (c − 1) = (r − 1)(c − 1). This result also is consistent with the interpretation discussed in Section 13.4. For a completely specified H0 the number of degrees of freedom is one less than the total number of cells, which in this problem becomes rc − 1. If H0 is not completely specified, then the number of degrees of freedom is reduced by one for
each parameter estimated, which in this case is (r − 1) + (c − 1), again resulting in (r − 1)(c − 1) degrees of freedom. The formal justification for the approximate distributions and choice of degrees of freedom is based on asymptotic results that are not derived here.
Example 13.6.1  A survey is taken to determine whether there is a relationship between political affiliation and strength of support for space exploration. We randomly select 100 individuals and ask their political affiliation and their support level to obtain the (artificial) data in Table 13.10.
TABLE 13.10  Contingency table for testing independence of two factors

                        Support
Affiliation     Increase    Same         Decrease     Total
Republican      8 (9)       12 (10.5)    10 (10.5)    30
Democrat        10 (12)     17 (14)      13 (14)      40
Independent     12 (9)      6 (10.5)     12 (10.5)    30
Total           30          35           35           100

Under the hypothesis of independence, H0: p_{ij} = p_{i·} p_{·j}, the expected values are computed and given in parentheses. We have

χ² = Σ_{i=1}^{3} Σ_{j=1}^{3} (o_{ij} − ê_{ij})²/ê_{ij} = 4.54 < 7.78 = χ²_{0.90}(4)
thus we do not have sufficient evidence to reject the hypothesis of independence at the α = 0.10 level. Of course, we would obtain the identical result if we considered a conditional test given fixed row totals and tested whether the support-level probabilities are the same over the political affiliation categories. This is reasonable because, for example, if the "independents" had a higher probability for increased support, then they would have to have a lower probability in some other category, which would represent a dependence between the two factors. Indeed, if we express the notation in Table 13.10 in terms of the joint probabilities, then p_{j|i} = p_{ij}/p_{i·} represents the conditional probability of being in column j given the ith row, and p_j = p_{·j} is the marginal probability of falling in the jth column classification; thus H0: p_{j|i} = p_j in the sampling setup of Section 13.5 is equivalent to the test of independence in this section, because p_{j|i} = p_{ij}/p_{i·} = p_{·j} implies p_{ij} = p_{i·} p_{·j}.
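The independence computation of Example 13.6.1 can be sketched the same way; the estimated expected counts are n_i c_j / N as in equation (13.6.1). (scipy.stats.chi2_contingency performs this calculation directly.)

```python
# Chi-squared test of independence for the survey data of Table 13.10;
# the expected counts are n_i * c_j / N, exactly as in equation (13.6.1).
table = [
    [8, 12, 10],    # Republican: increase, same, decrease
    [10, 17, 13],   # Democrat
    [12, 6, 12],    # Independent
]
row = [sum(r) for r in table]
col = [sum(c) for c in zip(*table)]
N = sum(row)

chi_sq = sum(
    (table[i][j] - row[i] * col[j] / N) ** 2 / (row[i] * col[j] / N)
    for i in range(len(row)) for j in range(len(col))
)
df = (len(row) - 1) * (len(col) - 1)
print(round(chi_sq, 2), df)   # 4.54 4, below chi^2_0.90(4) = 7.78
```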
13.7 CHI-SQUARED GOODNESS-OF-FIT TEST

The one-sample multinomial case discussed in Section 13.4 corresponds to testing whether a random sample comes from a completely specified multinomial distribution. This test can be adapted to test that a random sample comes from any completely specified distribution, H0: X ~ F(x). Simply divide the sample space into c cells, say A1, ..., Ac, and let p_{j0} = P[X ∈ A_j], where X ~ F(x). Then for a random sample of size n, let o_j denote the number of observations that fall into the jth cell; under H0 the expected number in the jth cell is e_j = np_{j0}. This is now back in the form of the multinomial problem, and H0: X ~ F(x) is rejected at the α significance level if

χ² = Σ_{j=1}^{c} (o_j − e_j)²/e_j ≥ χ²_{1−α}(c − 1)    (13.7.1)
In some cases there may be a natural choice for the cells, or the data may be grouped to begin with; otherwise, artificial cells may be chosen. As a general principle, as many cells as possible should be used to increase the number of degrees of freedom, as long as e_j ≥ 5 or so is maintained to ensure that the chi-squared approximation is fairly accurate.
Example 13.7.1  Let X denote the repair time in days required for a certain component in an airplane. We wish to test whether a Poisson model with a mean of three days appears to be a reasonable model for this variable. The repair times for 40 components were recorded, with the results shown in Table 13.11. In some cases the component could be repaired immediately on-site, which is interpreted as zero days.

Under H0: X ~ POI(3), we have f(x) = e^{−3}3^x/x!, and the cell probabilities are given by p_{10} = P[X = 0] = f(0) = e^{−3} = 0.050, p_{20} = f(1) = e^{−3}(3) = 0.149, p_{30} = f(2) = 0.224, and so on. The expected numbers are then e_j = np_{j0}.
TABLE 13.11  Observed and expected frequencies for chi-squared goodness-of-fit test of Poisson model with mean 3

Repair Time (Days)      0       1       2       3       4       5       6       ≥7
Observed (o_j)          1       3       7       6       10      7       6       0
Probabilities (p_j0)    0.050   0.149   0.224   0.224   0.168   0.101   0.050   0.034
Expected (e_j)          2.00    5.96    8.96    8.96    6.72    4.04    2.00    1.36
The right-hand tail cells are pooled to achieve ê_j ≥ 5, and the first two cells also are pooled. This leaves c = 5 cells and

χ² = (4 − 7.96)²/7.96 + (7 − 8.96)²/8.96 + ... + (13 − 7.40)²/7.40 = 9.22 > 7.78 = χ²_{0.90}(4)

so we can reject H0 at the α = 0.10 significance level.
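The cell probabilities, pooling, and statistic of Example 13.7.1 can be sketched in a few lines; because exact Poisson probabilities are used here rather than the three-decimal values of Table 13.11, the result agrees with the text's 9.22 only up to rounding.

```python
# Chi-squared goodness-of-fit test of H0: X ~ POI(3) for the repair-time
# data of Example 13.7.1, with cells pooled as in the text.
from math import exp, factorial

n = 40
observed = [1, 3, 7, 6, 10, 7, 6, 0]           # counts for x = 0..6 and x >= 7
p = [exp(-3) * 3 ** x / factorial(x) for x in range(7)]
p.append(1 - sum(p))                           # P[X >= 7]
expected = [n * pj for pj in p]

# Pool cells {0,1} and {5, 6, >=7} so each expected count is roughly 5 or more.
o = [observed[0] + observed[1], *observed[2:5], sum(observed[5:])]
e = [expected[0] + expected[1], *expected[2:5], sum(expected[5:])]

chi_sq = sum((oj - ej) ** 2 / ej for oj, ej in zip(o, e))
print(round(chi_sq, 1))   # 9.2, exceeding chi^2_0.90(4) = 7.78
```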
UNKNOWN PARAMETER CASE

When using goodness-of-fit procedures to help select an appropriate population model, we usually are interested in testing whether some family of distributions seems appropriate, such as H0: X ~ POI(μ) or H0: X ~ N(μ, σ²), where the parameter values are unspecified, rather than testing a completely specified hypothesis such as H0: X ~ POI(3). That is, we are not interested in the lack of fit because of a wrong parameter value at this point, but we are interested in whether the general model has an appropriate form and will be a suitable model when appropriate parameter values are used.

Suppose we wish to test H0: X ~ f(x; θ1, ..., θk), where there are k unknown parameters. To compute the χ² statistic, the expected numbers under H0 now must be estimated. If the original data are grouped into cells, then the joint density of the observed values, o_j, is multinomial, where the true but unknown p_j = P[X ∈ A_j] are functions of θ1, ..., θk. If maximum likelihood estimation is used to estimate θ1, ..., θk (based on the multinomial distribution of the grouped data values o_j), then the limiting distribution of the χ² statistic is chi-squared with c − 1 − k degrees of freedom, where c is the number of cells and k is the number of parameters estimated. That is, approximately,

χ² = Σ_{j=1}^{c} (o_j − ê_j)²/ê_j ~ χ²(c − 1 − k)    (13.7.2)

where ê_j = np̂_j.

In many cases, MLE estimation based on the grouped-data multinomial model is not convenient to carry out, and in practice the usual MLEs based on the ungrouped data, or grouped-data approximations of these, most often are used. If MLEs based on the individual observations are used, then the number of degrees of freedom may be greater than c − 1 − k, but the limiting distribution is bounded between chi-squared distributions with c − 1 and c − 1 − k degrees of freedom. Our policy here will be to use c − 1 − k degrees of freedom if k parameters are estimated by any ML procedure. A more conservative approach would be to bound the p-value of the test using c − 1 − k and c − 1 degrees of freedom if the MLEs are not based directly on the grouped data (Kendall and Stuart, 1967, page 430).
Example 13.7.2  Consider again the data given in Example 13.7.1, and suppose now that we wish to test H0: X ~ POI(μ). The usual MLE of μ is the average of the 40 repair times, which in this case is computed as

μ̂ = [0(1) + 1(3) + 2(7) + ... + 6(6)]/40 = 3.65

Under H0 the estimated probabilities are now computed from f(x; μ̂) = e^{−3.65}(3.65)^x/x!, and the estimated expected values, ê_j = np̂_{j0}, are computed using a Poisson distribution with μ = 3.65. Retaining the same five cells used before, the results are as shown in Table 13.12.
TABLE 13.12  Observed and expected frequencies for chi-squared test of Poisson model with estimated mean 3.65

Repair Time (Days)      (0, 1)    2       3       4       ≥5
Observed (o_j)          4         7       6       10      13
Probabilities (p̂_j0)   0.121     0.173   0.211   0.192   0.303
Expected (ê_j)          4.84      6.92    8.44    7.68    12.12
We have

χ² = (4 − 4.84)²/4.84 + ... + (13 − 12.12)²/12.12 = 1.62 < 6.25 = χ²_{0.90}(3)

so a Poisson model appears to be quite reasonable for these data, although the Poisson model with μ = 3 was found not to fit well. The number of degrees of freedom here is 3, because one parameter is estimated.

Note that the question of how to choose the cells is not quite so clear when H0 is not completely specified. For a completely specified H0 the best choice is to choose the cells so that all the e_j are approximately equal to 5. This makes the e_j large enough to ensure the accuracy of the chi-squared approximation and still gives the largest possible number of degrees of freedom. Of course, with discrete distributions this may not be completely achievable. If H0 is not completely specified, then the e_j (or ê_j) cannot be computed before taking the sample. The usual procedure is to choose some natural or reasonable cell division initially, and then pool adjacent cells after the data are taken to achieve ê_j ≥ 5. This
pooling should not be done in a capricious manner. In some cases the data already are grouped, and this provides an initial cell division. Indeed, one advantage of the chi-squared goodness-of-fit statistic is that it is applicable to grouped data. On the other hand, if the individual observations are available, then some information may be lost by using only grouped data. Some additional goodness-of-fit tests based more directly on the individual observations are mentioned in the next section.
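Before moving on, the estimated-parameter computation of Example 13.7.2 can be sketched as follows; exact Poisson probabilities are used, so the statistic matches the text's 1.62 only up to rounding.

```python
# Chi-squared test of H0: X ~ POI(mu) with mu estimated by the sample
# mean, as in Example 13.7.2; a pure-Python sketch.
from math import exp, factorial

counts = {0: 1, 1: 3, 2: 7, 3: 6, 4: 10, 5: 7, 6: 6}   # repair-time data
n = sum(counts.values())                                # 40
mu = sum(x * c for x, c in counts.items()) / n          # MLE: 3.65

def pois(x, m):
    return exp(-m) * m ** x / factorial(x)

# Five cells: {0,1}, {2}, {3}, {4}, {5 or more}
p_hat = [pois(0, mu) + pois(1, mu), pois(2, mu), pois(3, mu), pois(4, mu)]
p_hat.append(1 - sum(p_hat))
observed = [4, 7, 6, 10, 13]
expected = [n * p for p in p_hat]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1   # c - 1 - k with k = 1 parameter estimated
print(round(chi_sq, 1), df)  # 1.6 3, well below chi^2_0.90(3) = 6.25
```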
Example 13.7.3  Trumpler and Weaver (1953) provided data collected at the Lick Observatory on the radial velocities of 80 bright stars, as shown in Table 13.13.
TABLE 13.13  Observed and expected frequencies for chi-squared goodness-of-fit test with estimates μ̂ = −20.3 and σ̂ = 12.7

Interval of Velocities    o_j    p̂_j0     ê_j
(−80, −70)                 1     .000      0.00
(−70, −60)                 2     .001      0.08
(−60, −50)                 2     .009      0.72
(−50, −40)                 2     .051      4.08
(−40, −30)                 8     .162     12.96
(−30, −20)                24     .284     22.72
(−20, −10)                26     .283     22.64
(−10, 0)                  11     .154     12.32
(0, 10)                    2     .047      3.76
(10, 20)                   1     .008      0.64
(20, 30)                   1     .001      0.08
Total                     80    1.000     80.00
We wish to test for normality, H0: X ~ N(μ, σ²). We will use the chi-squared test with μ and σ estimated by the MLEs based on the grouped data of Table 13.13. If we denote the arbitrary jth cell by A_j = (a_j, a_{j+1}) and let z_j = (a_j − μ)/σ, then the likelihood function based on the multinomial data is

L = Π_{j=1}^{c} [Φ(z_{j+1}) − Φ(z_j)]^{o_j}

The likelihood equations are obtained by equating to zero the partial derivatives of the logarithm of L with respect to μ and σ. Specifically, we obtain

Σ_{j=1}^{c} o_j [φ(z_{j+1}) − φ(z_j)]/[Φ(z_{j+1}) − Φ(z_j)] = 0    (13.7.3)

Σ_{j=1}^{c} o_j [z_{j+1} φ(z_{j+1}) − z_j φ(z_j)]/[Φ(z_{j+1}) − Φ(z_j)] = 0    (13.7.4)

Equations (13.7.3) and (13.7.4) must be solved by an iterative numerical method, and for the data of Table 13.13 the estimates of μ and σ are μ̂ = −20.3 and σ̂ = 12.7. The estimated cell probabilities are of the form Φ(ẑ_{j+1}) − Φ(ẑ_j) with ẑ_j = (a_j − μ̂)/σ̂. Each must be at least 5/80 = .0625 to ensure that ê_j ≥ 5. To satisfy this requirement in the present example, it is necessary to pool the first five cells and, similarly, the last four cells must be pooled. This reduces the number of cells to c = 4, with the pooled results shown in Table 13.13. It follows that
χ² = Σ_{j=1}^{4} (o_j − ê_j)²/ê_j = 1.22 < 3.84 = χ²_{0.95}(1)

Thus, the normal model gives a reasonable fit in this example.
It might seem that a simpler method could be based on the following often-used grouped-data estimates:

μ̃ = Σ_{j} m_j o_j / n        σ̃² = [Σ_{j} m_j² o_j − n μ̃²]/(n − 1)

where m_j is the midpoint of the jth interval. However, for this example, these estimates are μ̃ = −21 and σ̃ = 16. The latter estimate of σ is somewhat larger than the grouped-data MLE. In fact, it is large enough that the chi-squared test based on this estimate rejects the normal model, contrary to our earlier conclusion. Another type of simple closed-form estimate for grouped data will be discussed in Chapter 15.
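Once the grouped-data MLEs are in hand, the pooled chi-squared computation of Example 13.7.3 can be sketched with the standard normal CDF; the pooled observed counts (15, 24, 26, 15) are read from Table 13.13, and exact normal probabilities are used, so the statistic matches the text's 1.22 only up to rounding.

```python
# Pooled chi-squared computation of Example 13.7.3, using the
# grouped-data MLEs mu = -20.3 and sigma = 12.7 from the text.
from math import erf, sqrt

def Phi(z):                      # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = -20.3, 12.7
n = 80
# Pooled cells: (-inf, -30), (-30, -20), (-20, -10), (-10, inf)
bounds = [-30, -20, -10]
z = [(b - mu) / sigma for b in bounds]
p = [Phi(z[0]), Phi(z[1]) - Phi(z[0]), Phi(z[2]) - Phi(z[1]), 1 - Phi(z[2])]
observed = [15, 24, 26, 15]      # pooled observed counts from Table 13.13
expected = [n * pj for pj in p]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 2       # c - 1 - k = 1
print(round(chi_sq, 1), df)      # about 1.2 with 1 degree of freedom
```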
13.8 OTHER GOODNESS-OF-FIT TESTS

Several goodness-of-fit tests have been developed in terms of the empirical distribution function (EDF). The basic principle is to see how closely the observed sample cumulative distribution agrees with the hypothesized theoretical cumulative distribution. Several methods for measuring the closeness of agreement have been proposed.

The EDF tests generally are considered to be more powerful than the chi-squared goodness-of-fit test, because they make more direct use of the individual observations. Of course, they then are not applicable if the data are available only as grouped data. Let x_{1:n}, ..., x_{n:n} denote an ordered random sample of size n. Then the EDF or sample CDF is denoted by F̂_n(x), and at the ordered values note that

F̂_n(x_{i:n}) = i/n
If we wish to test a completely specified hypothesis, H0: X ~ F(x), then the general approach is to measure how close the agreement is between F(x_{i:n}) and F̂_n(x_{i:n}) for i = 1, ..., n. Some slight modifications have been found helpful, such as using a half-point correction and comparing the hypothesized F(x_{i:n}) values to (i − 0.5)/n rather than to i/n. Because U = F(X) ~ UNIF(0, 1), the test of any completely specified H0 can be expressed equivalently as a test for uniformity, where u_i = F(x_{i:n}) are distributed as ordered uniform variables.
CRAMÉR-VON MISES TEST FOR COMPLETELY SPECIFIED H0

The Cramér-von Mises (CVM) test can be modified to apply to Type II censored samples. If the r smallest ordered observations from a sample of size n are available, then the CVM test statistic for testing H0: X ~ F(x) is given by

CM = r/(12n²) + Σ_{i=1}^{r} [F(x_{i:n}) − (i − 0.5)/n]²    (13.8.1)

The distribution of CM under H0 is the same as the distribution of

r/(12n²) + Σ_{i=1}^{r} [u_{i:n} − (i − 0.5)/n]²

where the u_{i:n} are ordered uniform variables. The asymptotic percentage points for CM have been obtained by Pettitt and Stephens (1976). They appear to be sufficiently accurate for practical purposes for samples as small as 10 or so.

An approximate size α test of H0: X ~ F is to reject H0 if CM ≥ CM_{1−α}, where the critical values CM_{1−α} are provided in Table 9 (Appendix C) for several values of α and censoring levels.
Example 13.8.1  We are given that 25 system failures have occurred in a 100-day period, and we wish to test whether the failure times are uniformly distributed, H0: F(x) = x/100, 0 < x < 100, where the 25 ordered observations are as follows:

5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5, 50.3, 56.4, 61.7, 62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1, 85.5, 90.8, 92.7, 95.5, 95.6

We have

CM = 25/(12(25)²) + Σ_{i=1}^{25} [x_{i:25}/100 − (i − 0.5)/25]² = 0.182

Because CM_{0.90} = 0.347, we cannot reject H0 at the α = 0.10 level of significance.
If a Poisson process is observed for a fixed time t, then given the number of occurrences, the successive failure times are conditionally distributed as ordered uniform variables This suggests that the above data could represent data from a Poisson process (see Chapter 16).
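The CM statistic for the uniform hypothesis above can be sketched as follows; the result agrees with the 0.182 reported in the text only up to rounding, since the additive r/(12n²) term contributes a small amount.

```python
# Cramér-von Mises statistic for the completely specified uniform
# hypothesis of Example 13.8.1, following equation (13.8.1) with r = n.
times = [5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5, 50.3,
         56.4, 61.7, 62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1, 85.5, 90.8,
         92.7, 95.5, 95.6]
n = len(times)
u = [t / 100 for t in times]               # F(x) = x/100 under H0

cm = n / (12 * n ** 2) + sum((ui - (i - 0.5) / n) ** 2
                             for i, ui in enumerate(u, start=1))
print(round(cm, 3))   # about 0.19, well below CM_0.90 = 0.347
```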
If the above data represent successive failure times from a Poisson process, then the times between failures should be independent exponential variables. The interarrival times are as follows:

5.2, 8.4, 0.9, 0.1, 5.9, 17.9, 3.6, 2.5, 1.2, 1.8, 1.8, 6.1, 5.3, 1.2, 1.2, 3.0, 3.5, 7.6, 3.4, 0.5, 2.4, 5.3, 1.9, 2.8, 0.1

If we wish to use the CM statistic to test these data for exponentiality, we must completely specify H0. Suppose that we test H0: Y ~ EXP(5). We have

CM = 25/(12(25)²) + Σ_{i=1}^{25} [(1 − e^{−y_{i:25}/5}) − (i − 0.5)/25]² = 0.173

Because CM_{0.99} = 0.743, we see that we cannot reject the hypothesis that the interarrival times follow an EXP(5) distribution at the α = 0.01 significance level.
CRAMÉR-VON MISES TEST, PARAMETERS ESTIMATED

As suggested in the previous section, we often are more interested in testing whether a certain family of distributions is applicable than in testing a completely specified hypothesis. To test H0: X ~ F(x; θ), where θ is unspecified, we may consider

CM = 1/(12n) + Σ_{i=1}^{n} [F(x_{i:n}; θ̂) − (i − 0.5)/n]²    (13.8.2)

where θ̂ denotes the MLE of θ. In general, the distribution of CM may or may not depend on unknown parameters; however, we know that if θ = (θ1, θ2) are location-scale parameters, then F(x_{1:n}; θ̂1, θ̂2), ..., F(x_{n:n}; θ̂1, θ̂2) are pivotal quantities whose joint distribution does not depend on the parameters. Thus, at least in the case of location-scale parameters, CM provides a suitable test statistic whose critical values can be determined. We have the disadvantage in this case that the critical values depend on the form of F being tested, whereas in the original situation the same critical values are applicable for testing any completely specified hypothesis. Some asymptotic and simulated critical values are available in the literature for certain models such as the exponential, normal, and Weibull distributions. Stephens considers slight modifications of the test statistic so that the asymptotic critical values are quite accurate even for small n for complete-sample tests of normality and exponentiality. Some of these results, along with the Weibull case, are included in Table 10 (Appendix C). Pettitt and Stephens (1976) and Stephens (1977) provide additional results, including the censored case.
Example 13.8.2  Now we may test whether the interarrival times in the previous example follow any exponential distribution, EXP(θ). In this case, θ̂ = ȳ = 3.7, and

CM = 1/(12(25)) + Σ_{i=1}^{25} [(1 − e^{−y_{i:25}/3.7}) − (i − 0.5)/25]² = 0.051

Because (1 + 0.16/n)CM = 0.051 < 0.177, we cannot reject H0 at the α = 0.10 level.
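The estimated-parameter CM computation of Example 13.8.2 can be sketched as follows; the unrounded sample mean is used for θ̂, so the result differs slightly from the text's 0.051, which uses θ̂ rounded to 3.7.

```python
# Cramér-von Mises test of exponentiality with theta estimated by the
# sample mean, as in Example 13.8.2; a sketch of equation (13.8.2).
from math import exp

gaps = [5.2, 8.4, 0.9, 0.1, 5.9, 17.9, 3.6, 2.5, 1.2, 1.8, 1.8, 6.1, 5.3,
        1.2, 1.2, 3.0, 3.5, 7.6, 3.4, 0.5, 2.4, 5.3, 1.9, 2.8, 0.1]
n = len(gaps)
theta = sum(gaps) / n                      # MLE of the exponential mean
u = sorted(1 - exp(-y / theta) for y in gaps)

cm = 1 / (12 * n) + sum((ui - (i - 0.5) / n) ** 2
                        for i, ui in enumerate(u, start=1))
print(round(cm, 3))   # about 0.05, in line with the 0.051 in the text
```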
KOLMOGOROV-SMIRNOV OR KUIPER TEST

The Kolmogorov-Smirnov (KS) test statistic is based on the maximum difference between the sample CDF and the hypothesized CDF. To test a completely specified H0: X ~ F(x), let

D⁺ = max_{i} (i/n − F(x_{i:n}))    (13.8.3)

D⁻ = max_{i} (F(x_{i:n}) − (i − 1)/n)    (13.8.4)

D = max(D⁺, D⁻)    (13.8.5)

V = D⁺ + D⁻    (13.8.6)

Carets will be added if unknown parameters must be estimated. The first three statistics are KS statistics, and V is Kuiper's test statistic. The distributions of these statistics do not depend on F. Also, as in the CVM case, if location-scale parameters are estimated, then the distributions will not depend on the parameters, but they will then depend on the form of F. The KS statistics allow for one-sided alternatives, and they also have been extended to two-sample problems.
Stephens (1974, 1977) has derived asymptotic critical values for these statistics, and he has considered modifications so that these critical values are good for small n. Some of these results are summarized in Table 11 (Appendix C). The Weibull results are provided by Chandra et al. (1981), who also provide more accurate small-sample results for this case, as well as percentage points for D⁺ and D⁻. These results were developed for the extreme-value distribution for maximums, which is related to the Weibull distribution by a monotonically decreasing transformation; thus the D⁺ and D⁻ critical values are interchanged when applying them directly to the Weibull distribution. Many other EDF-type test statistics, as well as tests devised specifically for a certain model, are available in the literature. Other references in this area include Aho et al. (1983), Dufour and Maag (1978), Koziol (1980), and Bain and Engelhardt (1983).
Example 13.8.3  Let us rework Example 13.8.2 using the Kolmogorov test statistic. A plot of the EDF F̂_n(y_{i:n}) = i/n and the estimated CDF F(y_{i:n}; θ̂) = 1 − e^{−y_{i:n}/3.7} is given in Figure 13.1. The value of D̂ occurs at the fifth order statistic,

D̂ = D̂⁻ = (1 − e^{−1.2/3.7}) − 4/25 = 0.117

The modified test statistic is

(√n + 0.26 + 0.5/√n)(D̂ − 0.2/n) = (5 + 0.26 + 0.1)(0.117 − 0.2/25) = 0.584 < 0.995

so again we cannot reject the hypothesis of exponentiality at the 0.10 significance level.

FIGURE 13.1  Comparison of the empirical CDF F̂_n(y) with the fitted exponential CDF F(y; 3.7) = 1 − e^{−y/3.7}; the maximum difference D̂ = 0.117 occurs at y_{5:25} = 1.2.
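The KS computation of Example 13.8.3 can be sketched as follows; again the unrounded sample mean is used for θ̂, so the values differ slightly from the text's, which use θ̂ = 3.7. The modification factor applied at the end follows the text's computation for the exponential case.

```python
# Kolmogorov-Smirnov statistic with an estimated exponential mean, as in
# Example 13.8.3; a pure-Python sketch of equations (13.8.3)-(13.8.5).
from math import exp, sqrt

gaps = [5.2, 8.4, 0.9, 0.1, 5.9, 17.9, 3.6, 2.5, 1.2, 1.8, 1.8, 6.1, 5.3,
        1.2, 1.2, 3.0, 3.5, 7.6, 3.4, 0.5, 2.4, 5.3, 1.9, 2.8, 0.1]
n = len(gaps)
theta = sum(gaps) / n
u = sorted(1 - exp(-y / theta) for y in gaps)   # fitted CDF at order stats

d_plus = max(i / n - ui for i, ui in enumerate(u, start=1))
d_minus = max(ui - (i - 1) / n for i, ui in enumerate(u, start=1))
d = max(d_plus, d_minus)

# Small-sample modification used in the text for the exponential case;
# compare against the tabled critical value 0.995 at alpha = 0.10.
modified = (sqrt(n) + 0.26 + 0.5 / sqrt(n)) * (d - 0.2 / n)
print(round(d, 3), round(modified, 3))
```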
SUMMARY

Our purpose in this chapter was to introduce several tests that are designed either to determine whether a hypothesized distribution provides an adequate model or to determine whether random variables are independent.
Chi-square tests are based on the relative sizes of the differences between observed frequencies and the theoretical frequencies predicted by the model. They
have the advantages of being fairly simple and also of having an approximate chi-square distribution. Tests for independence based on contingency tables use the differences between observed frequencies of joint occurrence and estimates of these frequencies made under the assumption of independence. Other goodness-of-fit tests, such as the
CVM and KS tests, are based on the differences between the empirical CDF, computed from a random sample, and the CDF of the hypothesized model. These tests generally are harder to work with numerically, but they tend to have good power relative to many commonly used alternatives.
All of the goodness-of-fit tests considered here are designed primarily for testing completely specified hypotheses, but all can be adapted for testing composite hypotheses by estimating unknown parameters of the model. When this is the case, the power of the test is less than for completely specified hypotheses, and the critical values needed to perform the test are changed. This is taken care of easily for chi-square tests by adjusting the number of degrees of freedom (one degree of freedom is subtracted for each parameter estimated). The situation is not as convenient for the CVM and KS tests, because new tables of critical values are required, and these must be obtained for the specific parametric form that is being tested (normal, Weibull, etc.).
EXERCISES

1. A baseball player is believed to be a .300 hitter. In his first 100 at bats in a season he gets 20 hits. Use equation (13.2.1) to test H0: p = 0.3 against Ha: p ≠ 0.3 at the α = 0.10 significance level. What would you do if you wanted a one-sided test?
2. You flip a coin 20 times and observe 7 heads. Test whether the coin is unbiased at the α = 0.10 significance level. Use equations (13.2.1) and (13.2.2).

3. Consider Example 13.3.1, but suppose that the following data are observed:
          Present    Absent    Total
Race 1    10         40        50
Race 2    50         50        100
Race 3    30         20        50
(a) Test H0: p1 = 0.25, p2 = 0.50, p3 = 0.50 at α = 0.10.
(b) Test H0: p1 = p2 = p3 = 0.50 at α = 0.10.
(c) Test H0: p1 = p2 = p3 at α = 0.10.

4. A system contains four components that operate independently. Let p_i denote the probability of successful operation of the ith component. Test H0: p1 = 0.90, p2 = 0.90,
p3 = 0.80, p4 = 0.80, if in 50 trials the components operated successfully as follows:

Component     1     2     3     4
Successful    40    48    45    40
5. In a certain genetic problem it is believed that brown will occur with probability 1/4, white with probability 1/4, and spotted with probability 1/2.
(a) Test the hypothesis that this model is correct at α = 0.10 if the following results were observed in 40 trials:

            Brown    White    Spotted
Observed    5        15       20

(b) Test the hypothesis that the probabilities are 1/9, 4/9, and 4/9, respectively, at α = 0.10.

6. A sample of 36 cards is drawn with replacement from a stack of 52 cards with the following results:
        spades    hearts    diamonds    clubs
        6         8         9           13

Test the hypothesis at α = 0.05 that equal numbers of each suit are in the stack of cards.

7. Three cards are drawn from a standard deck of 52 cards, and we are interested in the number of hearts obtained. The possible outcomes are x_i = 0, 1, 2, or 3, and if we assume the usual sampling-without-replacement scheme, we would hypothesize that these values
would occur with probabilities p_i = f(x_i), where f(x) = C(13, x)C(39, 3 − x)/C(52, 3). The following data are available from 100 trials of this experiment. Test H0: p_i = f(x_i), i = 1, ..., 4, at the α = 0.05 level based on these data:

No. of Hearts     0     1     2     3
Times Occurred    40    45    12    3

Note: Combine the last two cells.

8. A box contains five black marbles and 10 white marbles. Player A and Player B each are asked to draw three marbles from the box and record the number of black marbles obtained. They each do this 100 times with the following results:
            Observed Outcomes
            0     1     2     3
Player A    25    40    25    10
Player B    40    40    15    5
(a) Use equation (13.5.1) to test the hypothesis that Player A drew the marbles without replacement and that Player B drew the marbles with replacement. Let α = 0.10.
(b) Similarly, test the hypothesis that A drew with replacement and B drew without replacement.
(c) Use equation (13.5.2) to test the hypothesis that the two multinomial populations are the same at α = 0.10.

9. A question was raised as to whether the county distribution of farm tenancy over a given period of time in Audubon County, Iowa, was the same for three different levels of soil fertility. The following results are quoted from Snedecor (1956, page 225):
Soil
Owned
Rented
Mixed
36
67 60 87
49 49 80
Il
31
Ill
58
152 140 225
Test the hypothesis that the multinomial populations are the same for the three different soil fertility levels at α = 0.10.

10. Certain airplane component failures may be classified as mechanical, electrical, or otherwise. Two airplane designs are under consideration, and it is desired to test the hypothesis that the type of failure is independent of the airplane design. Test this hypothesis at α = 0.05 based on the following data:

             Mech.    Elect.    Other
Design I      50        30       60
Design II     40        30       40
11. A sample of 400 people was asked their degree of support of a balanced budget and their degree of support of public education, with the following results:

                              Supported Balanced Budget
Public Education       Strong    Undecided    Weak
Strong                  100         60          20
Undecided                80         50          50
Weak                     20         15           5

Test the hypothesis of independence at α = 0.05.
12. A sample of 750 people was selected and classified according to income and stature, with the following results:
                    Income
Stature      Poor    Middle    Rich
Thin          100      50      120
Average        50     200       50
Fat            70      60       50

Test the hypothesis that these two factors are independent at α = 0.10.

13. A fleet of 50 airplanes was observed for 1000 flying hours, and the number of planes m_x that suffered x component failures in that time is recorded below:
x      0    1    2    3    4    5
m_x    [the observed counts are not legible in this reproduction]

(a) Test H0 : X ~ POI(2) at α = 0.01.
(b) Test the hypothesis that X follows a form of the negative binomial distribution with k = 3 and μ = 1. Use α = 0.10.

14. Consider the data in Example 4.6.3.
(a) Test H0 : X ~ EXP(100) at α = 0.10.
(b) Test H0 : X ~ EXP(θ) at α = 0.10.

15. The following data concerning the number of thousands of miles traveled between bus motor failures were adapted from Davis (1952):

Observed Bus Motor Failures
Miles (1,000)    First    Second    Third    Fourth    Fifth
0-20                6        19        27       34        29
20-40              11        13        16       20        27
40-60              16        13        18       15        14
60-80              25        15        13       15         8
80-100             34        15        11        8         5
100-120            46        18        10        3         2
120-140            33         5         4        0         2
140-160            16         2         1        0
160-180             2         2
180-200             2         2
(a) Test the hypothesis that the data for the first bus motor failure follow an exponential distribution at α = 0.05.
(b) Test the hypothesis that the data for the fifth bus motor failure follow an exponential distribution at α = 0.10.
(c) Test the hypothesis that the data for the first bus motor failure follow a normal distribution at α = 0.10.
16. The number of areas m_y receiving y flying-bomb hits is given as follows:

y        0      1     2     3     4    >=5
m_y     229    211    93    35     7     1

Test H0 : Y ~ POI(μ) at α = 0.05.
17. In Problem 13 test H0 : X ~ POI(μ) at α = 0.05. Assume that μ̂ = 3.25.
18. The lifetimes in minutes of 100 flashlight cells were observed as follows [Davis (1952)]:

Number of Minutes      0-706    706-746    746-786    786-∞
Observed Frequency       13        36         38        13

Test H0 : X ~ N(μ, σ²) at α = 0.10. Note that x̄ = 746 and s = 40.
19. Consider the weights of 60 major league baseballs given in Exercise 24 of Chapter 4. Test H0 : X ~ N(μ, σ²).

20. Consider the data in Example 4.6.3.
(a) Use the CVM statistic to test H0 : X ~ EXP(100).
(b) Use the CVM statistic to test H0 : X ~ EXP(θ).
(c) Use the CVM statistic based on the first 20 observations to test H0 : X ~ EXP(100).
(d) Use the Kolmogorov-Smirnov statistic to test H0 : X ~ EXP(100).
Let α = 0.10 throughout.
21. Lieblein and Zelen (1956) provide the following data for the endurance, in millions of revolutions, of deep-groove ball bearings:

17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84, 51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12, 93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40

Test H0 : X ~ WEI(θ, β) at α = 0.10. Note: β̂ = 2.102 and θ̂ = 81.88.
(a) Use a chi-squared test.
(b) Use a CVM test.
(c) Use a KS test.
22. Gross and Clark (1975, page 105) consider the following relief times in hours of 20 patients receiving an analgesic:

1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7, 4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7, 2.3, 1.6, 2.0

(a) Test H0 : X ~ EXP(θ) at α = 0.10. Note that x̄ = 1.90.
(b) Test H0 : X ~ N(μ, σ²) at α = 0.10.
(c) Test H0 : X ~ LOGN(μ, σ²) at α = 0.10.
(d) Test H0 : X ~ WEI(θ, β) at α = 0.10. Note that β̂ = 2.79 and θ̂ = 2.14.
CHAPTER 14

NONPARAMETRIC METHODS
14.1 INTRODUCTION

Most of the statistical procedures discussed so far have been developed under the assumption that the population or random variable is distributed according to some specified family of distributions, such as normal, exponential, Weibull, or Poisson. In the previous chapter we considered goodness-of-fit tests that are helpful in deciding what model may be applicable in a given problem. Some types of questions can be answered and some inference procedures can be developed without assuming a specific model, and these results are referred to as nonparametric or distribution-free methods. The advantages of nonparametric methods are that fewer assumptions are required, and in many cases only nominal (categorized) data or ordinal (ranked) data are required, rather than numerical (interval) data. A disadvantage of nonparametric methods is that we usually prefer to have a well-defined model with important parameters such as means and variances included in the model for interpretation purposes. In any event,
many important questions can be answered by a nonparametric approach, and some of these results will be given here. The CVM goodness-of-fit test already discussed is an example of one type of distribution-free result. This type of result depends on the fact that U = F(X) ~ UNIF(0, 1) for a continuous variable, and distributional results can be obtained for functions of F(X) in terms of the uniform distribution, which hold for all F. For the more classical nonparametric tests, the probability structure is induced by the sampling or randomization procedures used, as in the counting-type probability problems considered in Chapter 1.
142 ONE-SAMPLE SIGN TEST Consider a continuous random variable X F(x), and let rn denote the median of the distribution. That is, P[X rn] = P[X m] = 1/2. We wish to test H0 : ni = rn0 against Ha : rn> m0. This is a test for location, and it is thought of as analogous to a test for means in a parametric case. Indeed, for a symmetric distribution, the mean and the median are equal. Now we take a random sample of n observations and let T be the number of
x1's that are less than rn0 That is, we could consider the sign of (x - rn0), 1, .. ., n, and let (14.2.1)
T = Number of negative signs
Note that we do not really need numerical interval scale data here; we need only to be able to rank the responses as less than rn0 01. greater than rn0. 0] 1/2, so the Under H0 : rn = rn0, we have P[X ( rn0] = P[X, > rn0, we alternative rn1 probability of a negative sign is Po = 1/2. Under the have Pi = P[X rnj
A test of H0 : m = rn0 against Ha : rn > m0 based on T is equivalent to the binomial test of H0 : P = Po = 1/2 against Ha : p < 1/2, where T represents the number of successes, and a success corresponds to a negative sign for (x - m0). That is, for the alternative rn > rn0, H0 is rejected if T is small, as described in Theorem 12.4.1 earlier.
Theorem 14.2.1 Let X ~ F(x) with F(m) = 1/2. A size α test of H0 : m = m0 against Ha : m > m0 is to reject H0 if

B(t; n, 1/2) ≤ α

where t = number of negative signs of (x_i − m0) for i = 1, ..., n.
The other one-sided or two-sided alternatives would be carried out similarly. The usual normal approximation could be used for moderate sample sizes. Also note that the sign test corresponds to a special case of the chi-square goodness-of-fit procedure considered earlier. In that case, k categories were used to reduce the problem to a multinomial problem; here we have just two categories, plus and minus, with each category equally likely under H0. Of course, the sign test is designed to interpret inferences specifically related to the median of the original population.

The sign test for the median has the advantage that it is valid whatever the distribution F. If the true F happens to be normal, then one may wonder how much is lost by using the sign test compared to using the usual t test for means, which was derived under the normality assumption. One way of comparing two tests is to consider the ratio of the sample sizes required to achieve a given power. To test H0 : θ = θ0, let n1(θ1) be the sample size required to achieve a specified power at θ1 for test one, and n2(θ1) the sample size required to achieve the same power for test two. Then the (Pitman) asymptotic relative efficiency (ARE) of test two compared to test one is given by

ARE = lim_{θ1 → θ0} n1(θ1)/n2(θ1)    (14.2.2)

If the test statistics are expressed in terms of point estimators of the parameter, then in many cases the ARE of the tests corresponds to the ratio of the variances of the corresponding point estimators. Thus there is often a connection between the relative efficiency of a test and the relative efficiency of point estimators as defined earlier. This aspect of the problem will not be developed further here, but it can be shown that under normality, the ARE of the sign test compared with the t test is 2/π ≈ 0.64, and the relative efficiency increases to approximately 95% for small n. That is, when normality holds, a t test based on 64 observations would give about the same power as a sign test based on 100 observations. Of course, if the normality assumption is not true, then the t test is not valid. Another restriction is that interval-scale data are needed for the t test.
Example 14.2.1 The median income in a certain profession is $24,500. The contention is that taller men earn higher wages than shorter men, so a random sample of 20 men who are six feet or taller is obtained. Their (ordered) incomes in thousands of dollars are as follows:

10.8, 12.7, 13.9, 18.1, 19.4, 21.3, 23.5, 24.0, 24.6, 25.0,
25.4, 27.7, 30.1, 30.6, 32.3, 33.3, 34.7, 38.8, 40.3, 55.5

To test H0 : m = 24,500 against Ha : m > 24,500 we compute T = 8 negative signs. The p-value for this test based on this statistic is B(8; 20, 0.5) = 0.2517. Thus we do not have strong evidence based on this statistic to reject H0 and support the claim that taller men have higher incomes.
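The arithmetic in this example is easy to reproduce with standard-library tools. The sketch below (the helper names are ours, not the text's) counts the negative signs and evaluates B(t; n, 1/2):

```python
from math import comb

def binom_cdf(t, n, p):
    """B(t; n, p) = P[T <= t] for T ~ BIN(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t + 1))

def sign_test(data, m0):
    """One-sample sign test of H0: m = m0 vs Ha: m > m0.
    Returns (t, p_value), where t counts observations below m0;
    observations exactly equal to m0 are discarded."""
    vals = [x for x in data if x != m0]
    t = sum(1 for x in vals if x < m0)      # number of negative signs
    return t, binom_cdf(t, len(vals), 0.5)

incomes = [10.8, 12.7, 13.9, 18.1, 19.4, 21.3, 23.5, 24.0, 24.6, 25.0,
           25.4, 27.7, 30.1, 30.6, 32.3, 33.3, 34.7, 38.8, 40.3, 55.5]
t, p = sign_test(incomes, 24.5)
print(t, round(p, 4))  # 8 0.2517
```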
Note that if any of the observed values are exactly m0, then these values should be discarded and the sample size reduced accordingly. Note also that the sign test would have been unaffected in this example if the workers had been unwilling to give their exact incomes but willing to indicate whether income was less than $24,500 or more than $24,500.
14.3 BINOMIAL TEST (TEST ON QUANTILES)

Clearly, a test for any quantile (or percentile) can be set up in the same manner as the sign test for medians. We may wish to test the hypothesis that x0 is the p0th percentile of a distribution F(x), for some specified value p0; that is, we wish to test

H0 : x_{p0} = x0    (H0 : P[X ≤ x0] = F(x0) = p0)

against

Ha : x_{p0} > x0    (Ha : F(x0) < p0)

Let t = number of negative signs of (x_i − x0) for i = 1, ..., n. Then when H0 is true, T ~ BIN(n, p0), and this test is equivalent to a binomial test of H0 : p = p0 against Ha : p < p0.
Theorem 14.3.1 Let X ~ F(x) with F(x_p) = p. For a specified p0, a size α test of H0 : x_{p0} = x0 against Ha : x_{p0} > x0 is to reject H0 if

B(t; n, p0) ≤ α
The other one-sided and two-sided alternative tests may be carried out in a similar manner, using the binomial test described in Theorem 12.4.2. These tests could be modified to apply to a discrete random variable X if care is taken with the details involved.
Example 14.3.1 In the study of Example 14.2.1, we wish to establish that the 25th percentile for tall men is less than $24,500. That is, we test H0 : x_{0.25} = $24,500 against Ha : x_{0.25} < $24,500. This is equivalent to testing H0 : F(24,500) = 0.25 against Ha : F(24,500) > 0.25. We find from the data that t = 8, and the corresponding p-value is

1 − B(t − 1; n, p0) = 1 − B(7; 20, 0.25) = 0.102
An alternate expression for the test on quantiles can be given in terms of order statistics. The outcome T = t is equivalent to having x_{t:n} < x0 < x_{t+1:n}. We know that confidence intervals for a parameter can be associated with the values of the parameter for which rejection of H0 would not occur in a test of hypothesis. In developing distribution-free confidence intervals for a quantile, the common practice is to express these directly in terms of the order statistics.

CONFIDENCE INTERVAL FOR A QUANTILE
Consider a continuous random variable X ~ F(x) and let F(x_p) = p. Let Z_k = F(X_{k:n}). The joint pdf of Z_1, ..., Z_n is

f(z_1, ..., z_n) = n!;    0 < z_1 < ··· < z_n < 1    (14.3.2)

and the marginal pdf of Z_k is

h_k(z_k) = [n!/((k − 1)!(n − k)!)] z_k^{k−1} (1 − z_k)^{n−k};    0 < z_k < 1    (14.3.3)
Now

P[X_{k:n} ≤ x_p] = P[F(X_{k:n}) ≤ F(x_p)]
               = P[Z_k ≤ p]
               = ∫_0^p h_k(z_k) dz_k    (14.3.4)

This integral represents an incomplete beta function, and for integer k it can be expressed in terms of the cumulative binomial distribution, where

P[X_{k:n} ≤ x_p] = Σ_{j=k}^{n} C(n, j) p^j (1 − p)^{n−j} = 1 − B(k − 1; n, p) = γ(k, n, p)    (14.3.5)
Thus for a given pth percentile, the kth order statistic provides a lower γ(k, n, p) level confidence limit for x_p, where k and n can be chosen to achieve a particular desired level. Binomial tables can be used for small n and the normal approximation for larger n. In a similar fashion,

P[X_{k:n} ≥ x_p] = 1 − P[X_{k:n} ≤ x_p] = B(k − 1; n, p)    (14.3.6)

and a desired upper confidence limit can be obtained by proper choice of k and n.
A two-sided confidence interval for a given percentile x_p also can be developed. Note that

P[X_{i:n} ≤ x_p] = P[X_{i:n} ≤ x_p < X_{j:n}] + P[X_{j:n} ≤ x_p]    (14.3.7)

because either X_{i:n} ≤ x_p < X_{j:n} or X_{j:n} ≤ x_p, so

P[X_{i:n} ≤ x_p < X_{j:n}] = P[X_{i:n} ≤ x_p] − P[X_{j:n} ≤ x_p]
                          = B(j − 1; n, p) − B(i − 1; n, p)    (14.3.8)

Again one would attempt to find combinations of i, j, and n to provide the desired confidence level. It can be shown that equation (14.3.8) provides a conservative confidence interval if F is discrete.
Example 14.3.2 We now wish to compute a confidence interval for the 25th percentile in the previous example. We note that

P[X_{2:20} ≤ x_{0.25}] = 1 − B(1; 20, 0.25) = 0.9757

and

P[X_{10:20} ≥ x_{0.25}] = B(9; 20, 0.25) = 0.9861

Thus, (x_{2:20}, x_{10:20}) = (12.7, 25.0) is a two-sided confidence interval for x_{0.25} with confidence coefficient 1 − 0.0243 − 0.0139 = 0.9618.
For large n, a normal approximation may be used. For example, for an upper limit, in equation (14.3.6) set B(k − 1; n, p) = Φ(z), where z = (k − 1 + 0.5 − np)/√(np(1 − p)). For a specified level 1 − α, setting z = z_{1−α} gives an approximate expression for k in terms of n,

k = 0.5 + np + z_{1−α}√(np(1 − p))

If k is rounded to the nearest integer, then X_{k:n} is the approximate upper 1 − α confidence limit for x_p. For the lower limit case, replace z_{1−α} with z_α.
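As a hypothetical numerical illustration of this approximation (the values n = 100 and p = 0.5 are ours, not the text's), with z_{0.95} = 1.645:

```python
from math import sqrt

def upper_limit_index(n, p, z):
    """Approximate k so that X_{k:n} is an upper confidence limit for x_p,
    using k = 0.5 + n*p + z*sqrt(n*p*(1-p)), rounded to the nearest integer."""
    return round(0.5 + n * p + z * sqrt(n * p * (1 - p)))

# Approximate upper 95% confidence limit for the median with n = 100:
k = upper_limit_index(100, 0.5, 1.645)
print(k)  # 0.5 + 50 + 1.645*5 = 58.725 rounds to 59, so X_{59:100} is used
```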
TOLERANCE LIMITS

A function of the sample L(X) is said to be a lower γ probability tolerance limit for proportion p* if

P[∫_{L(X)}^∞ f(x) dx ≥ p*] = P[1 − F(L(X)) ≥ p*]
                          = P[F(L(X)) ≤ 1 − p*]
                          = P[L(X) ≤ x_{1−p*}]
                          = γ    (14.3.9)

That is, we wish to have an interval that will contain a prescribed proportion p* of the population. It is not possible to determine such an interval exactly if F is
unknown, but the tolerance interval (L(X), ∞) will contain at least a proportion p* of the population with confidence level γ. The proportion p* is referred to as the content of the tolerance interval. It is clear that a lower γ probability tolerance limit for proportion p* is simply a lower γ level confidence limit for the 1 − p* percentile, x_{1−p*}. Thus L(X) = X_{k:n} is a distribution-free lower γ tolerance limit for proportion p* if k and n are chosen so that

γ(k, n, 1 − p*) = 1 − B(k − 1; n, 1 − p*) = γ    (14.3.10)
One also may wish to have a two-sided tolerance interval (L(X), U(X)) such that

P{F[U(X)] − F[L(X)] ≥ p*} = γ

A two-sided tolerance interval cannot be obtained from a two-sided confidence interval on a percentile, but a two-sided distribution-free tolerance interval can be obtained in the form (L(X), U(X)) = (X_{i:n}, X_{j:n}) by the proper choice of i, j, and n. We need to choose i, j, and n to satisfy

P[F(X_{j:n}) − F(X_{i:n}) ≥ p*] = P[Z_j − Z_i ≥ p*] = γ

where f(z_1, ..., z_n) = n!; 0 < z_1 < ··· < z_n < 1. The differences

W_i = Z_i − Z_{i−1},    i = 1, ..., n    (Z_0 = 0)    (14.3.11)

are known as coverages, and E(W_i) = 1/(n + 1). The expected content between two consecutive order statistics is 1/(n + 1). It follows that the expected content between any two order statistics X_{i:n} and X_{j:n} is

E(Z_j − Z_i) = E(Σ_{k=i+1}^{j} W_k) = (j − i)/(n + 1)    (14.3.12)

That is, the expected content depends only on the difference j − i and does not depend on which i and j are involved. It turns out in general that the density of Z_j − Z_i, or the sum of any j − i coverages, depends only on j − i and not on i and j separately. Consider the transformation

w_i = z_i − z_{i−1},    i = 1, ..., n    (z_0 = 0)

with inverse transformation z_i = w_1 + ··· + w_i. The Jacobian is 1, so

f(w_1, ..., w_n) = n!;    w_i > 0,  w_1 + ··· + w_n < 1    (14.3.13)
This density is symmetric with respect to the w_i. That is, the density of any function of the w_i will not depend on which of the w_i are involved. In particular, the density of the variable

U_{j−i} = F(X_{j:n}) − F(X_{i:n}) = Z_j − Z_i = Σ_{k=i+1}^{j} W_k    (14.3.14)

depends only on the number of coverages summed, j − i, and the density is the same as the density of the sum of the first j − i coverages,

Σ_{k=1}^{j−i} W_k = Z_{j−i}

The marginal density of Z_{j−i} is given by equation (14.3.3) with k = j − i, which is a beta density, so

U_{j−i} = Z_j − Z_i ~ BETA(j − i, n − j + i + 1)    (14.3.15)

Expressed in terms of the binomial CDF,

P[Z_j − Z_i ≥ p*] = Σ_{k=0}^{j−i−1} C(n, k) p*^k (1 − p*)^{n−k} = B(j − i − 1; n, p*)    (14.3.16)

Thus the interval (X_{i:n}, X_{j:n}) provides a two-sided γ probability tolerance interval for proportion p* if i and j are chosen to satisfy

B(j − i − 1; n, p*) = γ

Theorem 14.3.2 For a continuous random variable X ~ F(x), L(X) = X_{k:n} is a lower γ probability tolerance limit for proportion p*, where γ = 1 − B(k − 1; n, 1 − p*). Also, X_{k:n} is an upper γ probability tolerance limit for proportion p*, where γ = B(k − 1; n, p*). The interval (X_{i:n}, X_{j:n}) is a two-sided γ probability tolerance interval for proportion p*, where γ = B(j − i − 1; n, p*).
Example 14.3.3 In Example 14.3.2, we see that the lower confidence limit on x_p = x_{0.25}, given by L(X) = X_{2:20}, also may be interpreted as a γ = 0.9757 probability tolerance limit for proportion p* = 1 − 0.25 = 0.75. That is, we are 97.57% confident that at least 75% of the incomes of tall men in this profession will exceed x_{2:20} = 12.7 thousand dollars. If we were interested in a lower tolerance limit for proportion p* = 0.90, then Table 1 (Appendix C) at 1 − p* = 0.10 shows

P[X_{1:20} ≤ x_{0.10}] = 1 − B(0; 20, 0.10) = 0.8784

Thus, for example, if a 95% tolerance limit is desired for proportion 0.90, a larger sample size is required.
We see that (L(X), U(X)) = (X_{1:20}, X_{20:20}) provides a two-sided tolerance interval for proportion 0.80 with probability level γ = B(20 − 1 − 1; 20, 0.80) = 0.9308.
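The three probability levels appearing in this example come straight from Theorem 14.3.2 and the binomial CDF; a quick check (helper name ours):

```python
from math import comb

def binom_cdf(t, n, p):
    """B(t; n, p) = P[T <= t] for T ~ BIN(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t + 1))

# Lower tolerance limit X_{2:20} for proportion p* = 0.75:
gamma_lower = 1 - binom_cdf(2 - 1, 20, 1 - 0.75)
# Lower tolerance limit X_{1:20} for proportion p* = 0.90:
gamma_090 = 1 - binom_cdf(0, 20, 0.10)
# Two-sided interval (X_{1:20}, X_{20:20}) for proportion 0.80:
gamma_two = binom_cdf(20 - 1 - 1, 20, 0.80)
print(round(gamma_lower, 4), round(gamma_090, 4), round(gamma_two, 4))
```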
For specified γ and p*, it is of interest to know what sample size is required so that (X_{1:n}, X_{n:n}) will provide the desired tolerance interval. Setting

B(n − 2; n, p*) = γ

yields n as the solution to

n p*^{n−1} − (n − 1) p*^n = 1 − γ

For this n, we find P[F(X_{n:n}) − F(X_{1:n}) ≥ p*] = γ.
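The required n can be found by direct search over this equation. For example (the values γ = 0.95 and p* = 0.90 are chosen for illustration), the search below returns n = 46:

```python
def min_sample_size(p_star, gamma):
    """Smallest n with B(n-2; n, p*) >= gamma, i.e.
    n p*^(n-1) - (n-1) p*^n <= 1 - gamma,
    for the two-sided interval (X_{1:n}, X_{n:n})."""
    n = 2
    while n * p_star**(n - 1) - (n - 1) * p_star**n > 1 - gamma:
        n += 1
    return n

print(min_sample_size(0.90, 0.95))  # 46
```

That is, at least 46 observations are needed before (X_{1:n}, X_{n:n}) is a 95% probability tolerance interval for proportion 0.90.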
14.4 TWO-SAMPLE SIGN TEST

The one-sample sign test is modified easily for use in a paired-sample problem, and this approach can be used as an alternative to the paired-sample t test when normality cannot be assumed. Assume that n independent pairs of observations (x_i, y_i), i = 1, ..., n, are available, and let T equal the number of times x_i is less than y_i. In terms of the differences x_i − y_i, we say that T = number of negative signs of x_i − y_i, i = 1, ..., n. Again we will assume that X and Y are continuous random variables so that P[X = Y] = 0, but if an observed x_i − y_i = 0 because of roundoff or other reasons, then that outcome will be discarded and the number of pairs reduced by one. The sign test is sensitive to shifts in location, and it should be useful in detecting differences in means or differences in medians, although strictly speaking it is a test of whether the median of the differences is zero.
Theorem 14.4.1 Suppose that X and Y are continuous random variables and n independent pairs of observations (x_i, y_i) are available. Consider

H0 : P[X < Y] = P[X > Y] = 1/2    (H0 : median(X − Y) = 0)

against

Ha : P[X < Y] < 1/2    (Ha : median(X − Y) > 0)

Let t be the number of negative signs of (x_i − y_i), i = 1, ..., n. Then, under H0,

T ~ BIN(n, 1/2)

and a size α test of H0 against Ha is to reject H0 if B(t; n, 1/2) ≤ α.
Example 14.4.1 A campaign manager wishes to measure the effectiveness of a certain politician's speech. Eighteen people were selected at random and asked to rate the politician before and after the speech. Of these, 11 had a positive reaction, four had a negative reaction, and three had no reaction. To test H0 : no effect against Ha : positive effect, we have t = 4 negative reactions and n = 15, and the p-value for the test is

B(4; 15, 0.5) = 0.0592

Thus there is statistical evidence at this error level that the speech was effective for the sampled population.
14.5 WILCOXON PAIRED-SAMPLE SIGNED-RANK TEST

The two-sample sign test makes use only of the signs of the differences x_i − y_i. A test would be expected to be more powerful or more efficient if it also makes some use of the magnitude of the differences, which the Wilcoxon signed-rank test does.

Let d_i = x_i − y_i for i = 1, ..., n denote the differences of the matched pairs; then rank the differences without regard to sign. That is, rank the d_i according to magnitude, but keep track of the sign associated with each one. Now replace the d_i with their ranks, and let T be the sum of the ranks of the positive differences. Again this test statistic will be sensitive to differences in location between the two populations. In the sign test, the positive signs and negative signs were assumed to be equally likely to occur under H0. In this case, to determine a critical value for T we need to assume that the positive signs and negative signs are equally likely to be assigned to the ranks under H0. The signs will be equally likely if the joint density of X and Y is symmetric in the variables; that is, we could consider H0 : F(x_i, y_i) = F(y_i, x_i). This corresponds to the distribution of the differences being symmetric about zero, or H0 : F_D(d) = 1 − F_D(−d). Note that the probability of a negative sign is F_D(0) = 1/2, and the median is m_D = 0. Also, for a symmetric distribution, the mean and the median are the same. Thus, under the symmetry assumptions mentioned, this test may be considered a test for equality of means for the two populations. In general, the signed-rank test is considered a test of the equality of two populations and has good power against the alternative of a difference in location, but the specific assumption under H0 is that any sequence of signs is equally likely to be associated with the ranked differences. That is, the test may be set up with the null hypothesis E(X) = E(Y) against an alternative such as Ha : E(X) < E(Y). To
illustrate how a critical value for T may be computed, consider n = 8 pairs; then there are 2^8 = 256 equally likely possible sequences of pluses and minuses that can be associated with the eight ranks. If we are interested in small values of T, then we will order these 256 possible outcomes by putting the ones associated with the smallest values of T in the critical region. The outcomes associated with small values of T are illustrated in Table 14.1.
TABLE 14.1    Signs associated with the Wilcoxon paired-sample signed-rank test

Ranks:   1    2    3    4    5    6    7    8      T
         -    -    -    -    -    -    -    -      0
         +    -    -    -    -    -    -    -      1
         -    +    -    -    -    -    -    -      2
         -    -    +    -    -    -    -    -      3
         +    +    -    -    -    -    -    -      3
         -    -    -    +    -    -    -    -      4
         +    -    +    -    -    -    -    -      4
Placing the first five possible outcomes in the critical region corresponds to rejecting H0 if T ≤ 3, and this gives α = 5/256 = 0.0195. Rejecting H0 if T ≤ 4 results in a significance level of α = 7/256 = 0.027, and so on. Conservative critical values, t_α, are provided in Table 12 (Appendix C) for the usual prescribed levels for n ≤ 20. The true Type I error may be slightly less than α because of discreteness. A normal approximation is adequate for n > 20. The mean and variance of T may be determined as follows.
Without loss of generality, the subscripts of the original differences can be rearranged so that the absolute differences are in ascending order, |d_1| ≤ ··· ≤ |d_n|, in which case the rank of |d_i| is i, and the signed-rank statistic can be written as T = Σ_{i=1}^{n} i U_i, where U_i = 1 if the difference whose absolute value has rank i is positive, and U_i = 0 if it is negative. Under H0, the variables U_1, ..., U_n are independent identically distributed Bernoulli variables, U_i ~ BIN(1, 1/2). Thus,

E(T) = E(Σ_{i=1}^{n} i U_i) = Σ_{i=1}^{n} i E(U_i) = (1/2) · n(n + 1)/2 = n(n + 1)/4
and

Var(T) = Var(Σ_{i=1}^{n} i U_i) = Σ_{i=1}^{n} i² Var(U_i) = (1/4) Σ_{i=1}^{n} i² = n(n + 1)(2n + 1)/24

For large n,

Z = [T − E(T)]/√Var(T) ~ N(0, 1)
Theorem 14.5.1 Let d_i = x_i − y_i, i = 1, ..., n, denote the differences of n independent matched pairs. Rank the |d_i| without regard to sign, and let

T = Sum of ranks associated with positive signed differences

A (conservative) size α test of

H0 : F(x_i, y_i) = F(y_i, x_i)    (H0 : F_D(d) = 1 − F_D(−d))

against

Ha : X is stochastically smaller than Y    (P[X > a] ≤ P[Y > a] for all a)

is to reject H0 if t ≤ t_α. This test also may be used as a test of the hypothesis that f_D(d) is symmetric with μ_D = E(X) − E(Y) = 0 against the alternative Ha : E(X) − E(Y) < 0. For a two-sided alternative, let t be the smaller sum of like-signed ranks; then a (conservative) size α test is to reject H0 if t ≤ t_{α/2}. For n > 20, approximately,

[T − n(n + 1)/4] / √[n(n + 1)(2n + 1)/24] ~ N(0, 1)

Note that the signed-rank test also can be used as a one-sample test for the median of a symmetric population. Consider the null hypothesis H0 that X is a continuous random variable with a symmetric distribution about the median m0. Let d_i = x_i − m0 and T = sum of positive signed ranks as above. Then a size α test of H0 against the alternative Ha : m < m0 is to reject H0 if t ≤ t_α.
If the Wilcoxon signed-rank test is used in this case, its asymptotic relative efficiency relative to the t test under normality is 3/π ≈ 0.955.
Also note that any observed d_i = 0 should be discarded and the sample size reduced. If there is a tie in the absolute value of two or more d_i, then it is common practice to use the average of their ranks for all of the tied differences in the group.
Example 14.5.1 To illustrate the one-sample signed-rank test, consider again Example 14.2.1, where we wish to test the median income H0 : m = 24.5 thousand dollars against Ha : m > 24.5. If we assume that the distribution of incomes is symmetric, then the signs of x_i − m0 are equally likely to be positive or negative, and we can apply the Wilcoxon paired-sample signed-rank test. The test in this case also will be a test of H0 : μ = 24.5, because the mean and median are the same for symmetric distributions. Note that if the assumption of symmetry is not valid, then H0 could be rejected even though m = m0, because lack of symmetry could cause the signs of x_i − m0 not to be equally likely to be positive or negative. Indeed, if the median m = m0 can be assumed known, then the Wilcoxon signed-rank test can be used as a test of symmetry. That is, we really are testing both that the distribution is symmetric and that it has median m0. In our example we first determine the ranks of the d_i according to their absolute value |d_i| = |x_i − m0|, as follows:

d_i          -13.7  -11.8  -10.6   -6.4   -5.1   -3.2   -1.0   -0.5    0.1    0.5
rank(|d_i|)    17     16     15     11      8     6.5      5    2.5      1    2.5

d_i            0.9    3.2    5.6    6.1    7.8    8.8   10.2   14.3   15.8   31.0
rank(|d_i|)      4    6.5      9     10     12     13     14     18     19     20

For Ha : m > 24.5, we reject H0 for a small sum of negative signed ranks, where for this set of data

T = 2.5 + 5 + 6.5 + 8 + 11 + 15 + 16 + 17 = 81

From Table 12 (Appendix C) we see that T = 81 gives a p-value of 0.20, so we cannot reject H0 at the usual prescribed significance levels. However, this test does give some indication that the hypothesis that the incomes are symmetrically distributed about a median of $24,500 is false. The lack of symmetry may be the greater source for disagreement with H0 in this example.
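The ranking with midranks for tied |d_i| can be automated; the sketch below (function name ours) reproduces the sum of negative signed ranks, 81, for these data. Differences are rounded to one decimal so that tied absolute values compare equal in floating point:

```python
def signed_rank_sums(diffs):
    """Return (T_plus, T_minus) using midranks for tied |d| values;
    zero differences are assumed to have been discarded already."""
    pairs = sorted((abs(d), d) for d in diffs)
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        mid = (i + 1 + j) / 2          # average of ranks i+1, ..., j
        for k in range(i, j):
            ranks[k] = mid
        i = j
    t_plus = sum(r for r, (_, d) in zip(ranks, pairs) if d > 0)
    return t_plus, sum(ranks) - t_plus

incomes = [10.8, 12.7, 13.9, 18.1, 19.4, 21.3, 23.5, 24.0, 24.6, 25.0,
           25.4, 27.7, 30.1, 30.6, 32.3, 33.3, 34.7, 38.8, 40.3, 55.5]
diffs = [round(x - 24.5, 1) for x in incomes]   # rounding canonicalizes ties
t_plus, t_minus = signed_rank_sums(diffs)
print(t_minus)  # 81.0
```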
Example 14.5.2 To illustrate the paired-sample case, suppose that in the previous example a second sample, y_1, ..., y_20, of 20 men under six feet tall is available, and we wish to see if there is statistical evidence that the median income of tall men is greater than the median income of shorter men, H0 : x_{0.50} = y_{0.50} against Ha : x_{0.50} > y_{0.50}. In this two-sample case, if the medians are equal, then there would be little reason to suspect the assumption of symmetry. Note that if the two samples are independent, then the paired-sample test still can be applied, but a more powerful independent-samples test will be discussed later. If the samples are paired in a meaningful way, then the paired-sample test may be preferable to an independent-samples test. For example, the pairs could be of short and tall men who have the same level of education or the same age. Of course, the samples would not be independent in that case. Consider the following 20 observations, where it is assumed that the first observation was paired with the first (ordered) observation in the first sample, and so on. The differences d_i = x_i − y_i also are recorded.
x_i          10.8   12.7   13.9   18.1   19.4   21.3   23.5   24.0   24.6   25.0
y_i           9.8   13.0   10.7   19.2   18.0   20.1   20.0   21.2   21.3   25.5
d_i           1.0   -0.3    3.2   -1.1    1.4    1.2    3.5    2.8    3.3   -0.5
rank(|d_i|)     4      1     12      5      8      6     14     10     13      3

x_i          25.4   27.7   30.1   30.6   32.3   33.3   34.7   38.8   40.3   55.5
y_i          25.7   26.4   24.5   27.5   25.0   28.0   37.4   43.8   35.8   60.9
d_i          -0.3    1.3    5.6    3.1    7.3    5.3   -2.7   -5.0    4.5   -5.4
rank(|d_i|)     2      7     19     11     20     17      9     16     15     18
For the alternative hypothesis as stated, we reject H0 if the sum of the ranks of the negative differences is small. An alternative approach would have been to relabel, or to let d_i = y_i − x_i; then we would have used the sum of positive signed ranks. Note also that T_+ + T_− = n(n + 1)/2, which is useful for computing the smaller sum of like-signed ranks for a two-sided alternative. We have

T_− = 1 + 5 + 3 + 2 + 9 + 16 + 18 = 54

Because t_{0.05} = 60, according to this set of data we can reject H0 at the 0.05 level. The approximate large-sample 0.05 critical value for this case is given by

t_{0.05} ≈ −z_{0.05}√[20(21)(41)/24] + 20(21)/4 = 60.9
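The computations in this example can be sketched as follows (the function name is ours; z_{0.05} = 1.645 is taken from the standard normal table):

```python
from math import sqrt

def signed_rank_sums(diffs):
    """(T_plus, T_minus) with midranks for tied |d| values."""
    pairs = sorted((abs(d), d) for d in diffs)
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        mid = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = mid
        i = j
    t_plus = sum(r for r, (_, d) in zip(ranks, pairs) if d > 0)
    return t_plus, sum(ranks) - t_plus

d = [1.0, -0.3, 3.2, -1.1, 1.4, 1.2, 3.5, 2.8, 3.3, -0.5,
     -0.3, 1.3, 5.6, 3.1, 7.3, 5.3, -2.7, -5.0, 4.5, -5.4]
t_plus, t_minus = signed_rank_sums(d)

n, z05 = 20, 1.645
crit = n * (n + 1) / 4 - z05 * sqrt(n * (n + 1) * (2 * n + 1) / 24)
print(t_minus, round(crit, 1))  # 54.0 60.9
```

(Midranks give the tied pair |−0.3|, |−0.3| ranks 1.5 and 1.5 rather than 1 and 2, but since both differences are negative the total T_− = 54 is unchanged.)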
PAIRED-SAMPLE RANDOMIZATIOH TT In the Wilcoxon signed-rank test, the actual observations were replaced by ranks. The advantage of this is that predetermined critical values can be tabulated as a
function only of n and
. It is possible to retain the actual observations and
develop a probability structure for testing by assuming that all ordered outcomes of data are equally likely under H0. For example, if two samples are selected from the same population, then the assignment of which came from population i and which from population 2 could be made at random. There would be N = (n1 + n2) !/n1 ! n2! equally likely possible assignments to the given set of data. Thus a size = k/N size test of equality can be obtained by choosing k of these outcomes to be included in our critical region. Of course, we want to pick the k outcomes that are most likely to occur when the alternative hypothesis is true. Thus, we need some test statistic, T, that will identify what order we want to use in putting the possible outcomes into the critical region, and we need to know the critical value for T for a given In the signed-rank test, we used the sum of the positive signed ranks, and we were able to tabulate critical values. We now may use the sum of the positive differences as our test statistic, although we cannot determine the proper critical value until we know all the values of the d.. For example, the following eight differences are observed in a paired sample problem:
-20, -10, -8, -7, +5, -4, +2, -1

There are 2^8 = 256 possible ways of assigning pluses and minuses to eight numbers, and each outcome is equally likely under the hypothesis that the D_i are symmetrically distributed about m_D = 0. We may rank these possible outcomes according to T, the sum of the positive d_i; the first dozen outcomes are shown in Table 14.2.

TABLE 14.2
Differences d_i for the paired-sample randomization test
-20  -20  -20  -20  -20  -20  -20  -20  -20  -20  -20  -20
-10  -10  -10  -10  -10  -10  -10  -10  -10  -10  -10  -10
 -8   -8   -8   -8   -8   -8   -8   -8   -8   -8   -8   -8
 -7   -7   -7   -7   -7   -7   -7   -7   -7   -7   -7   +7
 -5   -5   -5   -5   -5   -5   +5   +5   -5   -5   +5   -5
 -4   -4   -4   -4   +4   +4   -4   -4   +4   +4   -4   -4
 -2   -2   +2   +2   -2   -2   -2   -2   +2   +2   +2   -2
 -1   +1   -1   +1   -1   +1   -1   +1   -1   +1   -1   -1

T:    0    1    2    3    4    5    5    6    6    7    7    7
Thus, to test H0 : m_D = 0 (and symmetry) against a one-sided alternative Ha : m_D < 0, we reject H0 for small T. Given these eight numerical values, a size alpha = 12/256 = 0.047 test is to reject H0 if T ≤ 7. Thus we can reject H0 at this alpha for the data as presented. That is, there were only 12 cases as extreme as the one observed (using the statistic T).

Tests such as the one described, based on the actual observations, have high efficiency, but the test is much more convenient if the observations are replaced by ranks so that fixed critical values can be tabulated. A normal approximation can be used for larger n, and quite generally the normal-theory test procedures can be considered as approximations to the corresponding "exact" randomization tests. For each normal test described, the same test statistic can be used to order the set of possible outcomes produced under the randomization concept that gives equally likely outcomes under H0. Approximating the distribution of the statistic then returns us to a normal-type test. For small n, exact critical values can be computed as described, but these are in general quite inconvenient to determine.
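For the eight observed differences, the randomization distribution can be enumerated directly. A minimal sketch (variable names are illustrative) that counts the outcomes with T no larger than the observed value 7:

```python
from itertools import product

d = [20, 10, 8, 7, 5, 4, 2, 1]  # magnitudes of the eight differences
t_obs = 7                        # observed sum of positive differences (+5, +2)

# All 2^8 = 256 sign assignments are equally likely under H0
count = sum(1 for signs in product([-1, 1], repeat=8)
            if sum(m for s, m in zip(signs, d) if s > 0) <= t_obs)
print(count, count / 256)   # 12 outcomes, alpha = 12/256
```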
14.7 WILCOXON AND MANN-WHITNEY (WMW) TESTS

We now will consider a nonparametric analog to the t test for independent samples. The Wilcoxon rank-sum test is designed to be sensitive to differences in location, but strictly speaking it is a test of the equality of two distributions. To illustrate the case of a one-sided alternative, consider H0 : F_X = F_Y against Ha : F_X > F_Y (X stochastically smaller than Y). Suppose that n1 observations x_1, ..., x_{n1} and n2 observations y_1, ..., y_{n2} are available from the two populations. Combine these two samples and then rank the combined samples in ascending order. Under H0 any arrangement of the x's and y's is equally likely to occur, and there are N = (n1 + n2)!/(n1! n2!) possible arrangements. Again we can produce a size alpha = k/N test by choosing k of the possible arrangements to include in a critical region. We wish to select arrangements to go into the critical region that are likely to occur when Ha is true, to minimize our Type II error. The Wilcoxon test says to replace the observations with their combined-sample ranks, and then reject H0 if the sum of the ranks of the x's is small. That is, the order of preference for including an arrangement in the critical region is based on W_X = sum of the ranks of the x's. For example, if n1 = 4 and n2 = 5, then there are C(9, 4) = 126 possible arrangements; the ones with the smallest values of W_X are shown in Table 14.3.

A size alpha = 7/126 = 0.056 test is achieved by rejecting H0 if the observed w_x ≤ 13. Note that W_X + W_Y = (n1 + n2)(n1 + n2 + 1)/2.
TABLE 14.3
Arrangements of x's and y's for the Wilcoxon-Mann-Whitney tests
Rank:  1  2  3  4  5  6  7  8  9    W_X
       x  x  x  x  y  y  y  y  y    10
       x  x  x  y  x  y  y  y  y    11
       x  x  x  y  y  x  y  y  y    12
       x  x  y  x  x  y  y  y  y    12
       x  x  x  y  y  y  x  y  y    13
       x  x  y  x  y  x  y  y  y    13
       x  y  x  x  x  y  y  y  y    13
Mann and Whitney suggested using the statistic

U_YX = number of times a y precedes an x

It turns out that U_YX and W_X are equivalent statistics. The minimum value of W_X is n1(n1 + 1)/2, and this corresponds to U_YX = 0. If one y precedes one x, then U_YX = 1, and this increases W_X by 1. Similarly, each time a y precedes an x, this increases W_X by one more, so that

W_X = n1(n1 + 1)/2 + U_YX

Similarly,

W_Y = n2(n2 + 1)/2 + U_XY

where U_XY is the number of times an x precedes a y. Note that U_XY + U_YX = n1 n2. For the alternative Ha : F_X < F_Y (X stochastically larger than Y), we would reject H0 if W_X is large, or if W_Y and U_XY are small. The seven sequences corresponding to the smallest values of W_Y in the example are the seven sequences in Table 14.3 ranked in the reverse order. In this case W_Y = 18 for the last sequence, for example, and U_XY = 18 - 5(6)/2 = 3 = U_YX for the original table. Indeed, for a given sequence, U_YX computed under the first order of ranking is the same as U_XY for the reverse ranking, because the same number of interchanges between the x's and y's occurs whichever direction the ranks are applied to the sequences. Thus, U_YX and U_XY are identically distributed, and the same critical values can be used with either U_YX or U_XY. Sometimes the subscript will be suppressed and the notation U will be used. The notations u_yx and u_xy will refer to the observed values of U_YX and U_XY, respectively, and u_alpha is the notation for a 100*alpha-th percentile of U (where U ~ U_YX ~ U_XY). Table 13A (Appendix C) gives P[U_YX ≤ u] = P[U_XY ≤ u] for values of m = min(n1, n2) and n = max(n1, n2) less than or equal to 8. Table 13B (Appendix C) gives critical values u_alpha such that P[U ≤ u_alpha] ≤ alpha for 9 ≤ n ≤ 14. A normal approximation may be used for larger sample sizes.
Theorem 14.7.1  Let X_1, ..., X_{n1} and Y_1, ..., Y_{n2} be independent random samples. Then for an observed value u_yx of U_YX, reject H0 : F_X = F_Y in favor of Ha : F_X > F_Y (X stochastically smaller than Y) if P[U ≤ u_yx] ≤ alpha, or if u_yx ≤ u_alpha. Reject H0 in favor of Ha : F_X < F_Y (X stochastically larger than Y) if P[U ≤ u_xy] ≤ alpha, or if u_xy ≤ u_alpha. Reject H0 in favor of a two-sided alternative Ha : F_X ≠ F_Y if P[U ≤ u_yx] ≤ alpha/2 or P[U ≤ u_xy] ≤ alpha/2. Alternatively, reject H0 against the two-sided alternative if min(u_yx, u_xy) = min(u_yx, n1 n2 - u_yx) ≤ u_{alpha/2}.
A normal approximation for larger sample sizes may be determined as follows. Let

Z_ij = 0 if X_i < Y_j,  Z_ij = 1 if X_i > Y_j    (14.7.1)

Then

U_YX = Σ_{i=1}^{n1} Σ_{j=1}^{n2} Z_ij    (14.7.2)

Under H0,

E(Z_ij) = P[Z_ij = 1] = 1/2    (14.7.3)

and

E(U) = n1 n2 / 2    (14.7.4)

The expected values of products of the Z_ij are required to determine the variance of U. For example, if j ≠ k, then

E(Z_ij Z_ik) = 1 * P[Z_ij = 1, Z_ik = 1] = P[X_i > Y_j, X_i > Y_k] = 2/3! = 1/3    (14.7.5)

There are two ways to have a success in the 3! arrangements of X_i, Y_j, and Y_k. It can be shown (see Exercise 26) that

Var(U) = n1 n2 (n1 + n2 + 1) / 12    (14.7.6)

Thus the normal approximation for the level alpha critical value is

u_alpha ≈ n1 n2 / 2 - z_{1-alpha} √[n1 n2 (n1 + n2 + 1)/12]    (14.7.7)
It is possible to express the exact distribution of U recursively, but that will not
be considered here. If ties occur and their number is not excessive, then it is
common practice to assign the average of the ranks to each tied observation. Other adjustments also have been studied for the case of ties. The asymptotic relative efficiency of the WMW test compared to the usual t test under normal assumptions is 3/π ≈ 0.955.
Example 14.7.1  The times to failure of airplane air conditioners for two different airplanes were recorded as follows:

x:  23  261   87    7  120   14   62   47  225   71  246   21
y:  55  320   56  104  220  239   47  246  176  182   33
We wish to test H0 : F_X = F_Y against Ha : X is stochastically smaller than Y. This alternative could be interpreted as Ha : m_X < m_Y. The combined sample, ranked in ascending order with tied observations assigned midranks, is:

        7   14   21   23   33   47   47   55   56   62   71   87
        x    x    x    x    y    x    y    y    y    x    x    x
Rank:   1    2    3    4    5  6.5  6.5    8    9   10   11   12

      104  120  176  182  220  225  239  246  246  261  320
        y    x    y    y    y    x    y    x    y    x    y
Rank:  13   14   15   16   17   18   19 20.5 20.5   22   23

We have n1 = 12, n2 = 11, and the sum of the ranks of the x's is w_x = 124, so u_yx = w_x - n1(n1 + 1)/2 = 124 - 78 = 46. For the given alternative, we wish to reject H0 if W_X or U_YX is small. From Table 13B (Appendix C), the alpha = 0.10 critical value is u_{0.10} = 44, so we cannot reject H0 at the alpha = 0.10 significance level. To illustrate the asymptotic normal approximation, E(U) = 66, Var(U) = 264, and the approximate p-value for this test is

P[U ≤ 46] ≈ Φ((46 - 66)/√264) = Φ(-1.23) = 0.1093
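The computations of this example can be reproduced with a short script. This sketch (the helper name is illustrative) assigns midranks to ties, forms W_X and U_YX, and applies the normal approximation:

```python
import math
from statistics import NormalDist

x = [23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21]
y = [55, 320, 56, 104, 220, 239, 47, 246, 176, 182, 33]

combined = sorted(x + y)

def midrank(v):
    # Average rank of a (possibly tied) value in the combined sample
    lo = combined.index(v) + 1
    hi = lo + combined.count(v) - 1
    return (lo + hi) / 2

n1, n2 = len(x), len(y)
w_x = sum(midrank(v) for v in x)          # sum of x ranks = 124
u = w_x - n1 * (n1 + 1) / 2               # U_YX = 46
mean_u = n1 * n2 / 2                      # 66
var_u = n1 * n2 * (n1 + n2 + 1) / 12      # 264
p = NormalDist().cdf((u - mean_u) / math.sqrt(var_u))
print(w_x, u, round(p, 4))
```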
14.8 CORRELATION TESTS-TESTS OF INDEPENDENCE

Suppose that we have n pairs of observations (x_i, y_i) from a continuous bivariate distribution function F(x, y) with continuous marginal distributions F1(x) and F2(y). We wish to test for independence of X and Y, H0 : F(x, y) = F1(x)F2(y), against, say, the alternative of a positive correlation.
For a given set of observations there are n! possible pairings, which are all equally likely under H0 that X and Y are independent. For example, we may consider a fixed ordering of the y's; then there are n! permutations of the x's that can be paired with the y's. Let us consider a test that is based on a measure of relationship known as the sample correlation coefficient,

r = (Σ x_i y_i - n x̄ ȳ) / √{[Σ x_i² - n x̄²][Σ y_i² - n ȳ²]}

That is, for a size alpha = k/n! level test of H0 against the alternative of a positive correlation, we will compute r for each of the n! possible permutations, and then place the k permutations with the largest values of r in the critical region. If the observed ordering in our sample is one of these permutations, then we reject H0. Note that x̄, ȳ, s_x, and s_y do not change under permutations of the observations, so we may equivalently consider

T = Σ x_i y_i

as our test statistic. Again it becomes too tedious for large n to compute t for all n! permutations to determine the critical value for T. We may use a normal approximation for large n, and for smaller n we may again consider replacing the observations with their ranks so that fixed critical values can be computed and tabulated once and for all.
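For very small n the permutation distribution of T = Σ x_i y_i can be enumerated in full. The following sketch uses a hypothetical data set (the numbers are illustrative, not from the text) with n = 5, so all 5! = 120 pairings are checked:

```python
from itertools import permutations

# Hypothetical small data set (n = 5, so 5! = 120 equally likely pairings)
x = [1.2, 0.7, 2.4, 1.9, 3.1]
y = [0.9, 0.4, 2.2, 1.5, 2.8]

t_obs = sum(a * b for a, b in zip(x, y))
# One-sided p-value: fraction of permutations with T at least as large
count = sum(1 for perm in permutations(x)
            if sum(a * b for a, b in zip(perm, y)) >= t_obs)
print(count / 120)
```

Here the observed pairing matches the x's and y's in the same rank order, so it uniquely maximizes T and the one-sided p-value is 1/120.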
NORMAL APPROXIMATION

For fixed y_i, let us consider the moments of the x_i relative to the n! equally likely permutations. The notation is somewhat ambiguous, but suppose we let X_i denote a random variable that takes on the n values x_1, ..., x_n, each with probability 1/n. Similarly, the product X_i X_j will take on the n(n - 1) values x_k x_l for k ≠ l, each with probability 1/[n(n - 1)], and so on. Now

E(X_i) = Σ x_k / n = x̄

Var(X_i) = E(X_i²) - x̄²

and

Cov(X_i, X_j) = E(X_i X_j) - x̄²,  i ≠ j,  where E(X_i X_j) = Σ_{k≠l} x_k x_l / [n(n - 1)]

Now

E(T) = E(Σ X_i y_i) = Σ y_i E(X_i) = x̄ Σ y_i

so that

E(r) = 0
Because the correlation coefficient is invariant over shifts in location, for convenience we will assume temporarily that the x's and y's are shifted so that x̄ = ȳ = 0; then

Var(Σ X_i y_i) = Σ y_j² Var(X_i) + Σ_{i≠j} y_i y_j Cov(X_i, X_j)
             = Σ y_j² (n - 1)s_x²/n + [(Σ y_j)² - Σ y_j²][(Σ x_k)² - Σ x_k²]/[n(n - 1)]
             = Σ x_k² Σ y_j² / (n - 1)

Thus

Var(r) = Var(Σ X_i y_i)/[(n - 1)² s_x² s_y²] = 1/(n - 1)

These moments were calculated conditionally, given fixed values of (x_i, y_i), but because the results do not depend on (x_i, y_i), the moments are also true unconditionally.

It can be shown that a good large-sample approximation is that r is approximately distributed as N(0, 1/(n - 1)). It is interesting that for large n, approximately,

E(r³) = 0   and   E(r⁴) = 3/(n² - 1)
These four moments are precisely the first four moments of the exact distribution of r based on random samples from a bivariate normal distribution. Thus a very close approximation for the "permutation" distribution of r, which is quite accurate even for small n, is obtained by using the exact distribution of r under normal theory. We will find in the next chapter that under the hypothesis of independence, the sample correlation coefficient can be transformed into a statistic that is t distributed. In particular,

√(n - 2) r / √(1 - r²) ~ t(n - 2)

Basically, the preceding results suggest that the test for independence developed under normal theory is very robust in this case, and one does not need to worry much about the validity of the normal assumptions for moderate sample
sizes. If one wishes to determine an exact nonparametric test for very small n, say 5 or 10, then it is again convenient to make use of ranks. The rank correlation coefficient also may be useful for testing randomness.
Example 14.8.1  Consider again the paired-sample data given in Example 14.5.2. The correlation coefficient for that set of paired data is r = 0.96. It is clear that the pairing was effective in this case and that the samples are highly correlated, even without performing tests of independence. The approximate t statistic in this case is

t = 0.96 √18 / √(1 - 0.96²) ≈ 14.5

Again, it appears safe to use Student's t distribution based on normal theory unless n is very small.
SPEARMAN'S RANK CORRELATION COEFFICIENT

Again consider n pairs of observations; this time, however, the pairs already are ordered according to the y's. Thus the pairs will be denoted by (x_i, y_(i)), i = 1, ..., n, where the y_(i) are the fixed ordered y observations, and x_i denotes the x value paired with the ith largest y value. We will replace the observed values with ranks. Let W_i = rank(y_(i)) = i, and let U_i = rank(x_i) denote the rank of the x value that is paired with the ith largest y value. The sample correlation coefficient based on these ranks is referred to as Spearman's rank correlation coefficient, R_s. It may be conveniently expressed in terms of the differences of the ranks, d_i = U_i - i. We have

W̄ = Ū = (n + 1)/2

(n - 1)s_W² = (n - 1)s_U² = Σ i² - n[(n + 1)/2]² = n(n + 1)(2n + 1)/6 - n(n + 1)²/4

Σ d_i² = Σ (U_i - i)² = 2 Σ i² - 2 Σ i U_i = n(n + 1)(2n + 1)/3 - 2 Σ i U_i
so

R_s = [Σ i U_i - n(n + 1)²/4] / [(n - 1) s_W s_U] = [Σ i U_i - n(n + 1)²/4] / [n(n² - 1)/12] = 1 - 6 Σ d_i² / [n(n² - 1)]

If there is total agreement in the rankings, then each d_i = 0 and R_s = 1. In the case of perfect disagreement, U_i = n - i + 1, so d_i = n - 2i + 1, and R_s = -1, as it should be. Of course, one would reject the hypothesis of independence in favor of a positive correlation alternative for large values of R_s. Alternatively, one could compute the p-value of the test and reject H0 if the p-value is less than or equal to alpha. Note also that the distribution of R_s is symmetric, so Table 14 (Appendix C) gives the p-values

p = P[R_s ≥ r] = P[R_s ≤ -r]

for possible observed values r of R_s for n ≤ 10. For n > 10, approximate p-values or approximate critical values may be obtained using the Student's t distribution approximation,

T = √(n - 2) R_s / √(1 - R_s²) ~ t(n - 2)

Theorem 14.8.1  For an observed value of R_s = r_s, a size alpha test of H0 : F(x, y) = F1(x)F2(y) against Ha : "positive correlation" is to reject H0 if p = P[R_s ≥ r_s] ≤ alpha, or approximately if t = √(n - 2) r_s / √(1 - r_s²) ≥ t_{1-alpha}(n - 2). For the alternative Ha : "negative correlation," reject H0 if p = P[R_s ≤ r_s] ≤ alpha, or approximately if t ≤ -t_{1-alpha}(n - 2). The ARE of R_s compared to r under normal assumptions is (3/π)² = 0.91.
Example 14.8.2  Now we will compute Spearman's rank correlation coefficient for the paired data considered in Examples 14.5.2 and 14.8.1. Replacing the observations with their ranks gives the following results:

Rank(x_i):  1   2   3   4   5   6   7   8   9  10
Rank(y_i):  1   3   2   5   4   7   6   8   9  12
d_i:        0  -1   1  -1   1  -1   1   0   0  -2

Rank(x_i): 11  12  13  14  15  16  17  18  19  20
Rank(y_i): 13  14  10  15  11  16  18  19  17  20
d_i:       -2  -2   3  -1   4   0  -1  -1   2   0
Any convenient procedure for computing the sample correlation coefficient may be applied to the ranks to obtain Spearman's rank correlation coefficient, but because the x_i already were ordered, it is convenient to use the simplified formula based on the d_i, which gives

R_s = 1 - 6 Σ d_i² / [n(n² - 1)] = 1 - 6(50)/[20(20² - 1)] = 0.96
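The simplified formula is easy to check directly from the rank table above; a minimal sketch (variable names are illustrative):

```python
rank_x = list(range(1, 21))
rank_y = [1, 3, 2, 5, 4, 7, 6, 8, 9, 12, 13, 14, 10, 15, 11, 16, 18, 19, 17, 20]

n = len(rank_x)
d2 = sum((u - v) ** 2 for u, v in zip(rank_x, rank_y))  # sum of d_i^2 = 50
r_s = 1 - 6 * d2 / (n * (n ** 2 - 1))                   # Spearman's coefficient
print(d2, round(r_s, 2))
```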
TEST OF RANDOMNESS

Consider n observations x_1, ..., x_n. The order of this sequence of observations may be determined by some other variable, Y, such as time. We may ask whether these observations represent a random sample, or whether there is some sort of trend associated with the order of the observations. A test of randomness against a trend alternative is accomplished by a test of independence of X and Y. The Y variable usually is not a random variable but is a labeling, such as a fixed sequence of times at which the observations are taken. In terms of ranks, the subscripts of the x's are the ranks of the y variable, and Spearman's rank correlation coefficient is computed as described earlier. A test of H0 : F_1(x) = ... = F_n(x) against a one-sided alternative of the type Ha : F_1(x) > F_2(x) > ... > F_n(x), for all x, would be carried out by rejecting H0 for large values of R_s. This alternative represents an upward-trend alternative.

Under normal assumptions, a particular type of trend alternative is one in which the mean of the variable X_i is a linear function of i,

X_i ~ N(β_0 + β_1 i, σ²),  i = 1, ..., n

In this framework, a test of H0 : β_1 = 0 corresponds to a test of randomness. The usual likelihood ratio test for this case is UMP for one-sided alternatives, and it turns out that the ARE of the nonparametric test based on R_s compared to the likelihood ratio test is (3/π)^{1/3} ≈ 0.98 when the normality assumptions hold. There are, of course, other types of nonrandomness besides upward or downward trends. For example, there could be a cyclic effect. Various tests have been developed based on runs, and one of these is discussed in the next section.
Example 14.8.3  In Example 14.7.1 the lifetimes between successive repairs of airplane air conditioners were considered as a random sample. If the air conditioners were not restored to like-new condition, one might suspect a downward trend in the lifetimes. The lifetimes from the first plane and their order of occurrence are shown below:

i:          1    2   3   4    5   6   7   8    9   10   11  12
x_i:       23  261  87   7  120  14  62  47  225   71  246  21
Rank(x_i):  4   12   8   1    9   2   6   5   10    7   11   3
d_i:        3   10   5  -3    4  -4  -1  -3    1   -3    0  -9
We find

R_s = 1 - 6(276)/[12(12² - 1)] = 0.035

Because R_s > 0, there is certainly no evidence of a downward trend. If we consider a two-sided alternative with alpha = 0.10, then we find r_{0.95} = 0.497, and there is still no evidence to reject randomness.
14.9 WALD-WOLFOWITZ RUNS TEST

Consider a sequence of observations listed in order of occurrence, which we wish to test for randomness. Suppose that the observations can be reduced to two types, say a and b. Let T be the total number of runs of like elements in the sequence. For example, the following numbers were obtained from a "random number generator" on a computer:

0.1, 0.4, 0.2, 0.8, 0.6, 0.9, 0.3, 0.4, 0.1, 0.2

Let a denote a number less than 0.5 and b denote a number greater than 0.5, which gives the sequence

a a a b b b a a a a

For this sequence, T = 3. A very small value of T suggests nonrandomness, and a very large value of T also may suggest nonrandomness because of a cyclic effect.
In this application the number of a's, say A, is a random variable, but given the numbers of a's and b's, A and B, there are N = (A + B)!/(A! B!) equally likely permutations of the A + B elements under H0. Thus the permutations associated with very small values of T or very large values of T are placed in the critical region. Again, for a specified value of alpha = k/N, it is necessary to know what critical values for T will result in k permutations being included in the critical region. It is possible to work out the probability distribution analytically for the number of runs under H0. Given A and B, the conditional probability distribution of the number of runs under H0 is

P[T = r] = 2 C(A - 1, r/2 - 1) C(B - 1, r/2 - 1) / C(A + B, A),  r even

P[T = r] = [C(A - 1, (r - 1)/2) C(B - 1, (r - 3)/2) + C(A - 1, (r - 3)/2) C(B - 1, (r - 1)/2)] / C(A + B, A),  r odd

where C(m, j) = m!/[j!(m - j)!] denotes the binomial coefficient.
For example, for even r there are exactly r/2 runs of a's and r/2 runs of b's. The sequence may start with either an a run or a b run, hence the factor 2. Now suppose that the sequence starts with an a. The number of ways of having r/2 runs of the A a's is the number of ways of putting r/2 - 1 slashes into the A - 1 spaces between the a's, which is C(A - 1, r/2 - 1). Similarly, the number of ways of dividing the B b's into r/2 runs is C(B - 1, r/2 - 1), which gives C(A - 1, r/2 - 1) C(B - 1, r/2 - 1) for the total number of ways of having r runs starting with a. The number starting with b would be the same, and this leads to the first equation. The odd case is similar, except that if the sequence begins and ends with an a, then there are (r + 1)/2 runs of a's and (r - 1)/2 runs of b's. In this case, the number of ways of placing [(r + 1)/2] - 1 = (r - 1)/2 slashes in the A - 1 spaces is C(A - 1, (r - 1)/2), and the number of ways of placing [(r - 1)/2] - 1 = (r - 3)/2 slashes in the B - 1 spaces is C(B - 1, (r - 3)/2). The total number of ways of having r runs beginning and ending with b is C(B - 1, (r - 1)/2) C(A - 1, (r - 3)/2).
In the above example A = 7, B = 3, r = 3, and

P[T ≤ 3] = P[T = 2] + P[T = 3] = [2 C(6, 0) C(2, 0) + C(6, 1) C(2, 0) + C(6, 0) C(2, 1)] / C(10, 7) = 10/120 = 0.083
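The exact runs distribution is easy to evaluate with binomial coefficients. A minimal sketch (the function name is illustrative) that reproduces P[T ≤ 3] = 0.083 for A = 7, B = 3:

```python
from math import comb

def runs_pmf(r, A, B):
    # P[T = r] for the total number of runs, given A a's and B b's
    denom = comb(A + B, A)
    if r % 2 == 0:
        k = r // 2
        return 2 * comb(A - 1, k - 1) * comb(B - 1, k - 1) / denom
    k = (r - 1) // 2
    return (comb(A - 1, k) * comb(B - 1, k - 1)
            + comb(A - 1, k - 1) * comb(B - 1, k)) / denom

# A = 7, B = 3: P[T <= 3] = P[T = 2] + P[T = 3] = 10/120
print(round(runs_pmf(2, 7, 3) + runs_pmf(3, 7, 3), 3))
```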
Thus for a one-sided alternative associated with small T, one could reject H0 in this example at the alpha = 0.083 level. Tabulated critical values for this test are available in the literature (see, for example, Walpole and Myers, 1985, Table A.18), as are large-sample normal approximations. The runs test is applicable to testing equality of distributions in two-sample problems by ranking the combined samples of x's and y's and then counting the number of runs. The runs test is not as powerful as the Wilcoxon-Mann-Whitney test in this case. It can be shown that

E(T) = 2AB/(A + B) + 1
and

Var(T) = 2AB(2AB - A - B) / [(A + B)²(A + B - 1)]

and for A and B greater than 10 or so the normal approximation is adequate, where

t_alpha = E(T) + z_alpha √Var(T)
Example 14.9.1  We wish to apply the runs test for randomness in Example 14.8.3. The median of the x_i is (62 + 71)/2 = 66.5, and we obtain the following sequence of a's and b's:

a b b a b a a a b b b a

We have r = 7, A = B = 6, and P[T ≤ 7] = 0.61, P[T = 7] = 0.22, and P[T ≥ 7] = 0.61, so as before we have no evidence at all of nonrandomness. In this example E(T) = 7, Var(T) = 2.72, and the normal approximation with correction for discontinuity gives

P[T ≤ 7] ≈ Φ((7.5 - 7)/√2.72) = 0.62
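The moments and the continuity-corrected approximation in this example can be verified numerically; a minimal sketch (variable names are illustrative):

```python
import math
from statistics import NormalDist

A = B = 6
mean_t = 2 * A * B / (A + B) + 1          # E(T) = 7
var_t = (2 * A * B * (2 * A * B - A - B)
         / ((A + B) ** 2 * (A + B - 1)))  # Var(T) = 2.72
# Continuity-corrected normal approximation to P[T <= 7]
p = NormalDist().cdf((7.5 - mean_t) / math.sqrt(var_t))
print(round(mean_t, 1), round(var_t, 2), round(p, 2))
```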
SUMMARY

Our purpose in this chapter was to develop tests of hypotheses, and in some cases confidence intervals, that do not require parametric assumptions about the model. In many cases, only nominal (categorized) data or ordinal (ranked) data are required, rather than numerical (interval) data. The one-sample sign test can be used to test a hypothesis about the median of a continuous distribution, using binomial tables. In the case of a normal distribution, this would provide an alternative to tests based on parametric assumptions, such as the t test. However, the sign test is less powerful than the t test. A similar test can be used to test hypotheses about a percentile of a continuous distribution. Nonparametric confidence intervals also are possible, and this is related to the problem of nonparametric tolerance limits, which also can be derived. It is also possible, by means of the two-sample sign test, to test for a difference in location of two continuous distributions. However, as one might suspect, if it is applied to test the difference of normal means, then the power is not as high as it would be with a two-sample t test. The power situation is somewhat better for a test based on ranks, such as the Wilcoxon-Mann-Whitney tests. It is also possible to test nonparametrically for independence. One possibility is to adapt the usual sample correlation coefficient by applying it to the ranks
rather than to the values of the variables. This yields Spearman's rank correlation coefficient.

Another question that arises frequently concerns whether the order of a sequence of observations has occurred at random or whether it was affected by some sort of trend associated with the order. A nonparametric test of correlation, such as the Spearman test, can be used in this situation, but another common choice is the Wald-Wolfowitz runs test. As noted earlier, this test can be used to test equality of two distributions, but it is not as powerful as the Wilcoxon-Mann-Whitney test in this application.
EXERCISES

1. The following 20 observations were obtained from a random number generator:

0.48, 0.10, 0.29, 0.31, 0.86, 0.91, 0.81, 0.92, 0.27, 0.21,
0.31, 0.39, 0.39, 0.47, 0.84, 0.81, 0.97, 0.51, 0.59, 0.70

(a) Test H0 : m = 0.5 against Ha : m > 0.5 at alpha = 0.10.
(b) Test H0 : m = 0.25 against Ha : m > 0.25 at alpha = 0.10.
2. The median U.S. family income in 1983 was $24,580. The following 20 family incomes were observed in a random sample from a certain city:

23,470, 48,160, 15,350, 13,670, 580, 20,130, 25,570,
20,410, 30,700, 19,340, 26,370, 25,630, 18,920, 21,310, 4,910,
24,840, 17,880, 27,620, 21,660, 12,110

For the median city family income, m, test H0 : m = 24,800 against Ha : m < 24,800 at alpha = 0.10.
3. The median number of hours of weekly TV viewing for children ages 6-11 in 1983 was 25 hours. In an honors class of 50 students, 22 students watched TV more than 25 hours per week and 28 students watched TV less than 25 hours per week. For this class, test H0 : m = 25 against Ha : m < 25 at alpha = 0.05.

4. For the data in Exercise 2, test the hypothesis that 10% of the families make less than $16,000 per year against the alternative that the tenth percentile is less than $16,000.

5. Using the first bus motor failure data in Exercise 15 of Chapter 13, test H0 : x_{0.25} = 40,000 miles against Ha : x_{0.25} > 40,000 miles at alpha = 0.01.

6. Use the data in Exercise 2.
(a) What level lower confidence limit for x_{0.25} can be obtained using the second order statistic, x_{(2)}?
(b) Obtain an upper confidence limit for x_{0.25}.
(c) Obtain an approximate 95% lower confidence limit on the median family income.
7. Consider the data in Exercise 24 of Chapter 4.
(a) Test H0 : x_{0.50} = 5.20 against Ha : x_{0.50} > 5.20 at alpha = 0.05. Use the normal approximation

B(x; n, 0.5) ≈ Φ((x + 0.5 - n/2)/(√n/2))

(b) Find an approximate 90% two-sided confidence interval on the median weight.
(c) Find an approximate 95% lower confidence limit on the 25th percentile, x_{0.25}.

8. Consider the data in Exercise 24 of Chapter 4.
(a) Set a 95% lower tolerance limit for proportion 0.60.
(b) Set a 90% two-sided tolerance interval for proportion 0.60.
9. Repeat Exercise 8 for the data in Example 4.6.3.

10. Consider the data in Exercise 24 of Chapter 4. Determine an interval within which one could expect 94.5% of the weights of such major league baseballs to fall.
11. Ten brand A tires and 10 brand B tires were selected at random, and one brand A tire and one brand B tire were placed on the back wheels of each of 10 cars. The following distances to wearout, in thousands of miles, were recorded:

Car:      1   2   3   4   5   6   7   8   9  10
Brand A: 23  20  26  25  48  26  25   8  15  20
Brand B: 20  30  16  33  23  24  24  21  13  18

(a) Assume that the differences are normally distributed, and use a paired-sample t test to test H0 : μ_A = μ_B against Ha : μ_A > μ_B at alpha = 0.10.
(b) Rework (a) using the two-sample sign test.
12. Suppose that 20 people are selected at random and asked to compare soda drink A against soda drink B. If 15 prefer A over B, then test the hypothesis that soda B is preferable against the alternative that more people prefer brand A at alpha = 0.05.

13. Twelve pairs of twin male lambs were selected; diet plan I was given to one twin and diet plan II to the other twin in each case. The weights at eight months were as follows:

Diet I:  111  102   90  110  108  125   99  121  133  115   90  101
Diet II:  97   90   96   95  110  107   85  104  119   98   97  104

(a) Use the sign test to test the hypothesis that there is no difference in the diets against the alternative that diet I is preferable to diet II at alpha = 0.10.
(b) Repeat (a) using the Wilcoxon paired-sample signed-rank test. Because n is only 12, use the table, but also work using the large-sample normal results for illustration purposes.
14. Siegel (1956, page 85) gives the following data on the number of nonsense syllables remembered under shock and nonshock conditions for 15 subjects:
Subject:  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
543524224 153 220332311 43 342240
Nonshock Shock
1
Test H0 that there is no difference in retention under the two conditions against the alternative that more syllables are remembered under nonshock conditions at alpha = 0.10. Use the Wilcoxon paired-sample signed-rank test.

15. Davis (1952) gives the lifetimes in hours of 40-watt incandescent lamps from a forced-life test on lamps produced in the two indicated weeks:
1-2-47:   1067   919  1196   785  1126   936   918  1156   920   948
10-2-47:  1105  1243  1204  1203  1310  1262  1234  1104  1303  1185

Test the hypothesis that the manufacturing process is unchanged for the two different periods at alpha = 0.10.
(a) Work out both the small-sample and large-sample tests based on U.
(b) Although these are not paired samples, work out the test based on the Wilcoxon paired-sample test.

16. The following fatigue failure times of ball bearings were obtained from two different testers. Test the hypothesis that there is no difference in testers, H0 : F(x) = F(y), against Ha : F(x) ≠ F(y). (Use alpha = 0.10.)
Tester 1:  140.3  158.0  183.9  117.8   98.7  152.7  132.7  164.8  136.6   93.4  116.6
Tester 2:  193.0  172.5  173.3  204.7  172.0  234.9  216.5  422.6
17. In Exercise 11, test the hypothesis that the brand A and brand B samples are independent at alpha = 0.10.
(a) Use Pearson's r with the approximate t distribution.
(b) Use Spearman's R_s. Compare the small-sample and large-sample approximations in this case.

18. Consider the data in Exercise 13.
(a) Estimate the correlation between the responses (x_i, y_i) on the twin lambs.
(b) Test H0 : F(x, y) = F1(x)F2(y) against the alternative of a positive correlation at level alpha = 0.05, using Pearson's r. Compare the asymptotic normal approximation with the approximate t result in this case.
(c) Repeat the test in (b) using R_s.
19. In a pig-judging contest, an official judge and a 4-H Club member each ranked 10 pigs as follows (Dixon and Massey, 1957, page 303): 9
Judge
4
3
7
2
1
5
8
4-H Member
10
6
10
Test the hypothesis of independence against the alternative of a positive correlation at alpha = 0.10.
20. Use R_s to test whether the data in Exercise 1 are random against the alternative of an upward trend at alpha = 0.10.
21. Proschan (1963) gives the times of successive failures of the air-conditioning system of Boeing 720 jet airplanes. The times between failures on Plane 7908 are given below:

413, 14, 58, 37, 100, 65, 9, 169, 447, 184, 36, 201, 118, 34, 31, 18, 18, 67, 57, 62, 7, 22, 34

If the failures occur according to a Poisson process, then the times between failures should be independent exponential variables. Otherwise, wearout or degradation may be occurring, and one might expect a downward trend in the times between failures. Test the hypothesis of randomness against the alternative of a downward trend at alpha = 0.10.

22. The following values represent the times between accidents in a large factory:

8.66, 11.28, 10.43, 10.89, 11.49, 11.44, 15.92, 12.50, 13.86, 13.32

Test the hypothesis of randomness against an upward trend at alpha = 0.05.
23. Use the runs test to test randomness of the numbers in Exercise 1.
24. Use the runs test to work Exercise 16.

25. Suppose that a runs test is based on the number of runs of a's rather than the total number of runs. (a) Show that the probability of k runs of a's is given by

p_k = C(A − 1, k − 1) C(B + 1, k) / C(A + B, A)

where C(n, r) denotes the binomial coefficient. (b) Rework Exercise 23 by using (a).
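The formula in Exercise 25 can be checked by brute-force enumeration. The following Python sketch (not part of the text) compares it with the exact probability obtained by listing every arrangement of A a's and B b's:

```python
from itertools import combinations
from math import comb

def runs_of_a_probability(A, B, k):
    """p_k = C(A-1, k-1) * C(B+1, k) / C(A+B, A), the formula of Exercise 25."""
    return comb(A - 1, k - 1) * comb(B + 1, k) / comb(A + B, A)

def empirical(A, B, k):
    """Exact probability of k runs of a's, by enumerating all arrangements."""
    n = A + B
    hits = total = 0
    for positions in combinations(range(n), A):  # sorted positions of the a's
        total += 1
        runs, prev = 0, -2
        for p in positions:
            if p != prev + 1:      # a new maximal run of a's begins
                runs += 1
            prev = p
        if runs == k:
            hits += 1
    return hits / total

# The two calculations agree for every feasible k
A, B = 4, 5
for k in range(1, A + 1):
    assert abs(runs_of_a_probability(A, B, k) - empirical(A, B, k)) < 1e-12
```

Summing p_k over k = 1, ..., A gives 1, as it must, by Vandermonde's identity.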
26. For the Mann-Whitney statistic U, show that Var (U) is given by equation (14.7.6).
CHAPTER 15

REGRESSION AND LINEAR MODELS
15.1 INTRODUCTION

Random variables that are observed in an experiment often are related to one or more other variables. For example, the yield of a chemical reaction will be affected by variables such as temperature or reaction time. We will consider a statistical method known as regression analysis that deals with such problems. The term regression was used by Francis Galton, a nineteenth-century scientist, to describe a phenomenon involving heights of fathers and sons. Specifically, the study considered paired data, (x₁, y₁), ..., (x_n, y_n), where x_i and y_i represent, respectively, the heights of the ith father and his son. One result of this work was the derivation of a linear relationship y = a + bx for use in predicting a son's height given the father's height. It was observed that if a father was taller than average, then the son tended also to be taller than average, but not by as much as the father. Similarly, sons of fathers who were shorter than average tended to be shorter than average, but not by as much as the father. This effect, which is
known as regression toward the mean, provides the origin of the term regression analysis, although the method is applicable to a wide variety of problems.
15.2 LINEAR REGRESSION

We will consider situations in which the result of an experiment is modeled as a random variable Y, whose distribution depends on another variable x or vector of variables x = (x₀, x₁, ..., x_p). Typically, the distribution of Y also will involve one or more unknown parameters. We will consider the situation in which the expectation is a linear function of the parameters,

E(Y) = β₀x₀ + β₁x₁ + ... + β_p x_p    (15.2.1)

with unknown parameters β₀, β₁, ..., β_p. Usually, it also is assumed that the variance does not depend on x, Var(Y) = σ². Other notations that are sometimes used for the expectation in equation (15.2.1) are μ_{Y|x} or E(Y | x), but these notations will not represent a conditional expectation in the usual sense unless x₀, x₁, ..., x_p are values of a set of random variables. Unless otherwise indicated, we will assume that the values x₀, x₁, ..., x_p are fixed or measured without error by the experimenter.
A model whose expectation is a linear function of the parameters, such as (15.2.1), will be called a linear regression model. This does not require that the model be linear in the x_j's. For example, one might wish to consider such models as E(Y) = β₀x₀ + β₁x₁ + β₂x₀x₁ or E(Y) = β₀e^x + β₁e^{2x}, which are both linear in the coefficients but not in the variables. Another important example is the polynomial regression model, in which the x_j's are integer powers of a common variable x. In particular, for some p = 1, 2, ...,

E(Y) = β₀ + β₁x + β₂x² + ... + β_p x^p    (15.2.2)
Some regression models involve functions that are not linear in the parameters, but we will not consider nonlinear regression models here. Another way to formulate a linear regression model is

Y = β₀x₀ + β₁x₁ + ... + β_p x_p + ε    (15.2.3)

in which ε is interpreted as a random error with E(ε) = 0 and Var(ε) = σ². It also is possible to have a constant term by taking the first component in x to be 1. That is, if x = (1, x₁, ..., x_p), then E(Y) = β₀ + β₁x₁ + ... + β_p x_p. In the next section we will study the important special case in which p = 1 and the model is linear in x₁ = x.
15.3 SIMPLE LINEAR REGRESSION

Consider a model of the form

Y_x = β₀ + β₁x + ε    (15.3.1)

with E(ε) = 0 and Var(ε) = σ². In this section we will develop the properties of such a model, called the simple linear model, under two different sets of assumptions. First we will consider the problem of estimation of the coefficients β₀ and β₁ under the assumption that the errors are uncorrelated.

LEAST-SQUARES APPROACH

Suppose x₁, ..., x_n are fixed real numbers and that experiments are performed at each of these values, yielding observed values of a set of n uncorrelated random variables of form (15.3.1). For convenience we will denote the subscripts by i rather than x. Thus, we will assume that for i = 1, ..., n,

E(Y_i) = β₀ + β₁x_i    Var(Y_i) = σ²    Cov(Y_i, Y_j) = 0,  i ≠ j

The resulting data will be represented as pairs (x₁, y₁), ..., (x_n, y_n). Suppose we write the observed value of each Y_i as y_i = β₀ + β₁x_i + e_i, so that e_i is the difference between what is actually observed on the ith trial and the theoretical value E(Y_i). The ideal situation would be for the pairs (x_i, y_i) to all fall on a straight line, with all the e_i = 0, in which case a linear function could be determined algebraically. However, this is not likely, because the y_i's are observed values of a set of random variables. The next best thing would be to fit a straight line through the points (x_i, y_i) in such a way as to minimize, in some sense, the resulting observed deviations of the y_i from the fitted line. That is, we choose a line that minimizes some function of the e_i = y_i − β₀ − β₁x_i. Different criteria for goodness-of-fit lead to different functions of the e_i, but we will use a standard approach called the Principle of Least Squares, which says to minimize the sum of the squared deviations from the fitted line. That is, we wish to find the values of β₀ and β₁, say β̂₀ and β̂₁, that minimize the sum

S = Σ_{i=1}^n (y_i − β₀ − β₁x_i)²

Taking derivatives of S with respect to β₀ and β₁ and setting them equal to zero gives the least-squares (LS) estimates β̂₀ and β̂₁ as solutions to the equations

Σ 2[y_i − β̂₀ − β̂₁x_i](−1) = 0

Σ 2[y_i − β̂₀ − β̂₁x_i](−x_i) = 0
Simultaneous solution gives

β̂₁ = [Σ x_i y_i − (Σx_i)(Σy_i)/n] / [Σx_i² − (Σx_i)²/n] = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²

and

β̂₀ = ȳ − β̂₁x̄

Thus, if one wishes to fit a straight line through a set of points, the equation ŷ = β̂₀ + β̂₁x provides a straight line that minimizes the sum of squares of the errors between observed values and the points on the line, say

SSE = Σ ê_i² = Σ [y_i − β̂₀ − β̂₁x_i]²

The quantities ê_i = y_i − β̂₀ − β̂₁x_i are known as residuals, and SSE is called the error sum of squares. The least-squares principle does not provide a direct estimate of σ², but the magnitude of the variance is reflected in the quantity SSE. It can be shown (see Exercise 8) that an unbiased estimate of σ² is given by

σ̃² = SSE / (n − 2)

The notation σ̃² will be used throughout the chapter for an unbiased estimator of σ². The notation σ̂² will represent the MLE, which is derived later in the section. The following convenient form also can be derived (see Exercise 5):

SSE = Σy_i² − β̂₀ Σy_i − β̂₁ Σx_i y_i

Also, ŷ = β̂₀ + β̂₁x may be used to predict the value of Y, and the same quantity would be used to estimate the expected value of Y. That is, an estimate of E(Y_x) = β₀ + β₁x is given by

Ê(Y_x) = β̂₀ + β̂₁x

Note also that ŷ = ȳ + β̂₁(x − x̄), which reflects the regression adjustment being made to the overall mean, ȳ.

Other linear combinations of β₀ and β₁ could be estimated in a similar manner. The LS estimators are linear functions of the Y_i's, and it can be shown that among all such linear unbiased estimators, the LS estimators have minimum variance. Thus, the LS estimators often are referred to as Best Linear Unbiased Estimators (BLUEs).
Theorem 15.3.1 If E(Y_i) = β₀ + β₁x_i, Var(Y_i) = σ², and Cov(Y_i, Y_j) = 0 for i ≠ j and i = 1, ..., n, then the LS estimators have the following properties:

1. E(β̂₁) = β₁ and Var(β̂₁) = σ² / Σ(x_i − x̄)²
2. E(β̂₀) = β₀ and Var(β̂₀) = σ² Σx_i² / [n Σ(x_i − x̄)²]
3. E(c₁β̂₀ + c₂β̂₁) = c₁β₀ + c₂β₁
4. c₁β̂₀ + c₂β̂₁ is the BLUE of c₁β₀ + c₂β₁

Proof
Part 1: Because Σ(x_i − x̄)ȳ = 0, we may write β̂₁ = Σ(x_i − x̄)Y_i / Σ(x_i − x̄)², so that

E(β̂₁) = Σ(x_i − x̄)E(Y_i) / Σ(x_i − x̄)²
= [Σ(x_i − x̄)β₀ + β₁ Σ(x_i − x̄)x_i] / Σ(x_i − x̄)²
= [0 · β₀ + β₁ Σ(x_i − x̄)²] / Σ(x_i − x̄)² = β₁

because Σ(x_i − x̄) = 0 and Σ(x_i − x̄)x_i = Σ(x_i − x̄)². It also follows that

Var(β̂₁) = Var[Σ(x_i − x̄)Y_i / Σ(x_i − x̄)²]
= Σ(x_i − x̄)² Var(Y_i) / [Σ(x_i − x̄)²]²
= σ² / Σ(x_i − x̄)²
Part 2:

E(β̂₀) = E(Ȳ − x̄β̂₁) = E(Ȳ) − x̄E(β̂₁) = (β₀ + β₁x̄) − x̄β₁ = β₀

Now Ȳ and β̂₁ are not uncorrelated, but β̂₀ can be expressed as a linear combination of the uncorrelated Y_i's, and the variance of this linear combination can be derived. If we let b_i = (x_i − x̄)/Σ(x_j − x̄)², then β̂₁ = Σ b_i Y_i and

β̂₀ = Ȳ − x̄β̂₁ = Σ (1/n)Y_i − x̄ Σ b_i Y_i = Σ d_i Y_i

where d_i = 1/n − x̄b_i, and

Var(β̂₀) = Σ d_i² σ² = σ² Σx_i² / [n Σ(x_i − x̄)²]

after some algebraic simplification. Part 3 follows from Parts 1-2 and the linearity of expected values.
Part 4:
Any linear function of the Y_i's can be expressed in the form c₁β̂₀ + c₂β̂₁ + Σ a_i Y_i for some set of constants a₁, ..., a_n. For this to be an unbiased estimator of c₁β₀ + c₂β₁ requires that Σ a_i(β₀ + β₁x_i) = 0 for all β₀ and β₁, because

E(c₁β̂₀ + c₂β̂₁ + Σ a_i Y_i) = c₁β₀ + c₂β₁ + Σ a_i(β₀ + β₁x_i)

But Σ a_i(β₀ + β₁x_i) = 0 for all β₀ and β₁ implies that Σ a_i = 0 and Σ a_i x_i = 0. Now,

c₁β̂₀ + c₂β̂₁ = Σ (c₁d_i + c₂b_i)Y_i

and

Cov(c₁β̂₀ + c₂β̂₁, Σ a_j Y_j) = Σ a_i(c₁d_i + c₂b_i)σ²
= [c₁ Σ a_i d_i + c₂ Σ a_i b_i]σ²
= [c₁ Σ a_i(1/n − x̄b_i) + c₂ Σ a_i b_i]σ²
= [c₁((1/n) Σ a_i − x̄ Σ a_i b_i) + c₂ Σ a_i b_i]σ²
= 0

The last step follows from the result that Σ a_i = 0 and Σ a_i x_i = 0 imply Σ a_i b_i = 0, which is left as an exercise. Thus,

Var(c₁β̂₀ + c₂β̂₁ + Σ a_i Y_i) = Var(c₁β̂₀ + c₂β̂₁) + Σ a_i² σ²

This variance is minimized by taking Σ a_i² = 0, which requires a_i = 0, i = 1, ..., n. Thus, c₁β̂₀ + c₂β̂₁ is the minimum variance linear unbiased estimator of c₁β₀ + c₂β₁, and this concludes the proof.
Example 15.3.1 In an article about automobile emissions, hydrocarbon emissions (grams per mile) were given by McDonald and Studden (1990) for several values of accumulated mileage (in 1000s of miles). The following paired data were reported on mileage (x) versus hydrocarbons (y):

x: 5.133, 10.124, 15.060, 19.946, 24.899, 29.792, 29.877, 35.011, 39.878, 44.862, 49.795
y: 0.265, 0.287, 0.282, 0.286, 0.310, 0.333, 0.343, 0.335, 0.311, 0.345, 0.319

To compute β̂₀ and β̂₁, we note that n = 11 and compute Σx_i = 304.377, Σx_i² = 10461.814, Σy_i = 3.407, Σy_i² = 1.063, and Σx_iy_i = 97.506. Thus, x̄ = 27.671, ȳ = 0.310,

β̂₁ = [(97.506) − (304.377)(3.407)/11] / [(10461.814) − (304.377)²/11] = 0.00158

and

β̂₀ = 0.310 − (0.00158)(27.671) = 0.266

Thus, if it is desired to predict the amount of hydrocarbons after 30,000 miles, we compute ŷ = 0.266 + 0.00158(30) = 0.313. Furthermore, SSE = 1.063 − (0.266)(3.407) − (0.00158)(97.506) = 0.00268, and

σ̃² = 0.00268/9 = 0.000298

The estimated linear regression function and the plotted data are shown in Figure 15.1.
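The computations of this example are easy to reproduce. The following Python sketch applies the least-squares formulas of this section to the data as transcribed above; results may differ from the quoted values in the last digit because of rounding in the transcribed data.

```python
# Least-squares fit for the emissions data of Example 15.3.1 (as transcribed).
x = [5.133, 10.124, 15.060, 19.946, 24.899, 29.792,
     29.877, 35.011, 39.878, 44.862, 49.795]   # mileage (1000s of miles)
y = [0.265, 0.287, 0.282, 0.286, 0.310, 0.333,
     0.343, 0.335, 0.311, 0.345, 0.319]        # hydrocarbons (g/mile)

n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

b1 = (sxy - sx * sy / n) / (sxx - sx * sx / n)    # slope estimate
b0 = sy / n - b1 * sx / n                         # intercept estimate
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
var_unbiased = sse / (n - 2)                      # unbiased variance estimate
# b1 and b0 should be close to the text's 0.00158 and 0.266
```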
FIGURE 15.1 Hydrocarbon emissions as a function of accumulated mileage (x 1000), with the fitted line ŷ = 0.266 + 0.00158x plotted over the data.
Example 15.3.2 Consider now another problem that was encountered in Example 13.7.3. Recall that the chi-square goodness-of-fit test requires estimates of the unknown parameters μ and σ. At that time it was discovered that the usual grouped sample estimate of σ was somewhat larger than the grouped sample MLE and that this adversely affected the outcome of the test. It also was noted that the grouped sample MLEs are difficult to compute and a simpler method would be desirable. The following method makes use of the least-squares approach and provides estimates that are simple to compute and that appear to be comparable.

Let the data be grouped into c cells, denote the ith cell by A_i = (a_{i-1}, a_i] for i = 1, ..., c, and let o_i be the number of observations in A_i. Let F̂_i = (o₁ + ... + o_i)/n. Because a₀ is chosen to be less than the smallest observation, the value F(a₀) will be negligible, particularly if n is large. Thus, F̂_i is an estimate of the CDF value F(a_i), which in the present example is Φ((a_i − μ)/σ). It follows that approximately

Φ⁻¹(F̂_i) ≈ −μ/σ + (1/σ)a_i

which suggests applying the simple linear regression method with x_i = a_i and y_i = Φ⁻¹(F̂_i), β₀ = −μ/σ, and β₁ = 1/σ. The last cell is not used because F̂_c = 1.
For the radial velocity data of Example 13.7.3, the estimates are β̂₀ = −1.606 and β̂₁ = 0.0778, which expressed in terms of μ and σ give estimates μ̂ = 20.64 and σ̂ = 12.85. These are remarkably close to the MLEs for grouped data, which are 20.32 and 12.70. Of course, another possibility would be to apply the simple linear method with x_i = Φ⁻¹(F̂_i) and y_i = a_i. In this case the parameterization is simpler because β₀ = μ and β₁ = σ, although in this situation the y_i is fixed and the x_i is variable. However, this approach also gives reasonable estimates in the present example. Specifically, with this modification, the estimates of μ and σ are 20.63 and 12.73, respectively.

This approach is appropriate for location-scale models in general. Specifically, if F(x) = G((x − η)/θ), then approximately,

G⁻¹(F̂_i) ≈ −η/θ + (1/θ)a_i

so the simple linear model can be used with β₀ = −η/θ, β₁ = 1/θ, y_i = G⁻¹(F̂_i), and x_i = a_i, and the resulting estimates of θ and η would be θ̂ = 1/β̂₁ and η̂ = −β̂₀/β̂₁.
MAXIMUM LIKELIHOOD APPROACH

Now we will derive the MLEs of β₀, β₁, and σ² under the assumption that the random errors are independent and normal, ε_i ~ N(0, σ²). Assume that x₁, ..., x_n are fixed real numbers and that experiments are performed at each of these values, yielding observed values of a set of n independent normal random variables of form (15.3.1). Thus, Y₁, ..., Y_n are independent, Y_i ~ N(β₀ + β₁x_i, σ²). The resulting data will be represented as pairs (x₁, y₁), ..., (x_n, y_n).
Theorem 15.3.2 If Y₁, ..., Y_n are independent with Y_i ~ N(β₀ + β₁x_i, σ²) for i = 1, ..., n, then the MLEs are

β̂₁ = Σ(x_i − x̄)(Y_i − Ȳ) / Σ(x_i − x̄)²,  β̂₀ = Ȳ − β̂₁x̄,  σ̂² = (1/n) Σ(Y_i − β̂₀ − β̂₁x_i)²
Proof The likelihood function is

L = L(β₀, β₁, σ²) = Π_{i=1}^n (2πσ²)^{−1/2} exp[−(y_i − β₀ − β₁x_i)²/(2σ²)]    (15.3.2)

Thus, the log-likelihood is

ln L = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ (y_i − β₀ − β₁x_i)²

If we set the partials with respect to β₀, β₁, and σ² to zero, then we have the ML equations

nβ̂₀ + (Σx_i)β̂₁ = Σy_i    (15.3.3)

(Σx_i)β̂₀ + (Σx_i²)β̂₁ = Σx_iy_i    (15.3.4)

nσ̂² = Σ_{i=1}^n (y_i − β̂₀ − β̂₁x_i)²    (15.3.5)

The MLEs of β₀ and β₁ are obtained by solving equations (15.3.3) and (15.3.4), which are linear equations in β̂₀ and β̂₁.
Notice that the MLEs of β₀ and β₁ are identical in form to the BLUEs, which were derived under a much less restrictive set of assumptions. However, it is possible to establish some useful properties under the present assumptions.
Theorem 15.3.3 If Y₁, ..., Y_n are independent with Y_i ~ N(β₀ + β₁x_i, σ²), and S₁ = ΣY_i, S₂ = Σx_iY_i, and S₃ = ΣY_i², then:

1. The statistics S₁, S₂, and S₃ are jointly complete and sufficient for β₀, β₁, and σ².
2. If σ² is fixed, then S₁ and S₂ are jointly complete and sufficient for β₀ and β₁.
Proof
Part 1: The joint pdf of Y₁, ..., Y_n, given by equation (15.3.2), can be written as

f(y₁, ..., y_n) = (2πσ²)^{−n/2} exp[−(1/(2σ²)) Σ y_i² + (1/σ²) Σ (β₀ + β₁x_i)y_i − (1/(2σ²)) Σ (β₀ + β₁x_i)²]
= C(θ)h(y₁, ..., y_n) exp[q₁(θ) Σy_i + q₂(θ) Σx_iy_i + q₃(θ) Σy_i²]

with θ = (β₀, β₁, σ²),

C(θ) = (2πσ²)^{−n/2} exp[−Σ(β₀ + β₁x_i)²/(2σ²)]

h(y₁, ..., y_n) = 1, q₁(θ) = β₀/σ², q₂(θ) = β₁/σ², and q₃(θ) = −1/(2σ²). This is the multivariate REC form of Chapter 10. Part 2 follows by rewriting the pdf, with σ² fixed, as a two-parameter exponential family in Σy_i and Σx_iy_i.

Notice that the MLEs β̂₀, β̂₁, and σ̂² are jointly complete and sufficient for β₀, β₁, and σ², because they can be expressed as functions of ΣY_i, Σx_iY_i, and ΣY_i², and conversely these sums are functions of the MLEs. Similarly, if σ² is fixed, then β̂₀ and β̂₁ are jointly complete and sufficient for β₀ and β₁.
We note at this point that some aspects of the analysis are simplified if we consider a related problem in which the x_i's are centered about x̄. Specifically, if we let x_i' = x_i − x̄ and β_c = β₀ + β₁x̄, then the variables of Theorem 15.3.2 can be represented as Y_i = β_c + β₁(x_i − x̄) + ε_i = β_c + β₁x_i' + ε_i, where Σ x_i' = 0. In this representation, the MLE of β_c has the form

β̂_c = Ȳ
and

σ̂² = (1/n) Σ_{i=1}^n [Y_i − β̂_c − β̂₁(x_i − x̄)]²

It also is easily verified that β̂_c, β̂₁, and σ̂² are jointly complete and sufficient for β_c, β₁, and σ², and if σ² is fixed, then β̂_c and β̂₁ are jointly complete and sufficient for β_c and β₁.
An interesting property of the centered version, which we will verify shortly, is that the MLEs of β_c, β₁, and σ² are independent. This property will be useful in proving the following distributional properties of the MLEs of β₀, β₁, and σ².

Theorem 15.3.4 If Y_i = β₀ + β₁x_i + ε_i with independent errors ε_i ~ N(0, σ²), then the MLEs of β₀ and β₁ have a bivariate normal distribution with E(β̂₀) = β₀, E(β̂₁) = β₁, and

Var(β̂₀) = σ² Σx_i² / [n Σ(x_i − x̄)²]

Var(β̂₁) = σ² / Σ(x_i − x̄)²

Cov(β̂₀, β̂₁) = −x̄σ² / Σ(x_i − x̄)²

Furthermore, (β̂₀, β̂₁) is independent of σ̂², and nσ̂²/σ² ~ χ²(n − 2).
Proof The proof is somewhat simpler for the centered problem, because in this case the estimators of the coefficients are uncorrelated and all three MLEs will be independent. Our approach will be to prove the result for the centered case and then extend it to the general case.

We define a set of n + 2 statistics, W_i = Y_i − β̂_c − β̂₁(x_i − x̄) for i = 1, ..., n, U₁ = β̂_c, and U₂ = β̂₁. Notice that each of these is a linear combination of the independent normal random variables Y₁, ..., Y_n. Specifically, with b_j = (x_j − x̄)/Σ(x_i − x̄)²,

U₁ = Σ_j (1/n)Y_j,   U₂ = Σ_j b_j Y_j,   W_i = Σ_j c_ij Y_j

where c_ij = −1/n − (x_i − x̄)b_j if j ≠ i, and c_ii = 1 − 1/n − (x_i − x̄)b_i. It is easily verified that

Σ_j c_ij = 0   and   Σ_j (x_j − x̄)c_ij = 0

for each i = 1, ..., n. Using these identities, we will first derive the joint MGF of the U_i's and then the joint MGF of the W_i's.
Let U = (U₁, U₂) and t = (t₁, t₂), and consider

M_U(t) = E[exp(t₁U₁ + t₂U₂)] = E[exp(Σ_j a_j Y_j)]    (15.3.6)

with a_j = t₁/n + t₂b_j. It can be verified that Σ a_j = t₁, Σ (x_j − x̄)a_j = t₂, and Σ a_j² = t₁²/n + t₂²/Σ(x_i − x̄)² (see Exercise 12).

Because each Y_j is normal, its MGF, evaluated at a_j, has the form

M_j(a_j) = exp[(β_c + β₁(x_j − x̄))a_j + σ²a_j²/2]

Thus, from equation (15.3.6) we have that

M_U(t) = Π_j M_j(a_j)
= exp[β_c Σ a_j + β₁ Σ (x_j − x̄)a_j + (σ²/2) Σ a_j²]
= exp[β_c t₁ + β₁t₂ + σ²t₁²/(2n) + σ²t₂²/(2 Σ(x_i − x̄)²)]
= exp[β_c t₁ + (σ²/n)t₁²/2] exp[β₁t₂ + σ²t₂²/(2 Σ(x_i − x̄)²)]

The first factor, a function of t₁ only, is the MGF of N(β_c, σ²/n), and the second factor, a function of t₂ only, is the MGF of N(β₁, σ²/Σ(x_i − x̄)²). Thus, β̂_c and β̂₁ are independent with β̂_c ~ N(β_c, σ²/n) and β̂₁ ~ N(β₁, σ²/Σ(x_i − x̄)²).

We know that for fixed σ², β̂_c and β̂₁ are complete and sufficient for β_c and β₁. We also know from Theorem 10.4.7 that any other statistic whose distribution does not depend on β_c and β₁ must be independent of β̂_c and β̂₁. Furthermore, if the other statistic is free of σ², then it is independent of β̂_c and β̂₁ even when σ² is not fixed.
The next part consists of showing that the joint distribution of the W_i's does not depend on β_c or β₁. If this can be established, then it follows from Theorem 10.4.7 that the W_i's are independent of the U_i's. Let W = (W₁, ..., W_n) and t = (t₁, ..., t_n), and consider

M_W(t) = E[exp(Σ_i t_i W_i)] = E[exp(Σ_j d_j Y_j)]    (15.3.7)

with d_j = Σ_i t_i c_ij. The following identities are useful in deriving the MGF (see Exercise 12):

Σ_j d_j = 0   and   Σ_j (x_j − x̄)d_j = 0

The rest of the derivation is similar to that of the MGF of U₁ and U₂, except that a_j is replaced with d_j for all j = 1, ..., n. We note from (15.3.7) that

M_W(t) = Π_j M_j(d_j)
= exp[β_c Σ_j d_j + β₁ Σ_j (x_j − x̄)d_j + (σ²/2) Σ_j d_j²]
= exp[(σ²/2) Σ_j d_j²]

This last function depends on neither β_c nor β₁, which as noted earlier means that W = (W₁, ..., W_n) is independent of β̂_c and β̂₁.
The rest of the proof relies on the following identity (see Exercise 13):

Σ_i [Y_i − β_c − β₁(x_i − x̄)]² = nσ̂² + n(β̂_c − β_c)² + Σ_i (x_i − x̄)² (β̂₁ − β₁)²    (15.3.8)

We define random variables Z_i = [Y_i − β_c − β₁(x_i − x̄)]/σ for i = 1, ..., n, Z_{n+1} = √n(β̂_c − β_c)/σ, and Z_{n+2} = √(Σ(x_i − x̄)²)(β̂₁ − β₁)/σ, which are all standard normal. From equation (15.3.8) note that

Σ_{i=1}^n Z_i² = nσ̂²/σ² + Z²_{n+1} + Z²_{n+2}    (15.3.9)

If we define V₁ = Σ_{i=1}^n Z_i², V₂ = nσ̂²/σ², and V₃ = Z²_{n+1} + Z²_{n+2}, then V₁ ~ χ²(n) and V₃ ~ χ²(2). Furthermore, because V₂ = nσ̂²/σ² is a function of W₁, ..., W_n, we know that V₂ and V₃ are independent. From equation (15.3.9), it follows that V₁ = V₂ + V₃, and the MGF of V₁ factors as follows:

M₁(t) = M₂(t)M₃(t)

(1 − 2t)^{−n/2} = M₂(t)(1 − 2t)^{−1}

Thus, M₂(t) = (1 − 2t)^{−(n−2)/2}, from which it follows that V₂ ~ χ²(n − 2). This proves the theorem for the centered problem. The extension to the general case can be accomplished by using the fact that β̂₀ = β̂_c − β̂₁x̄ is a linear function of β̂_c and β̂₁. Using this relationship, it is straightforward to derive the joint MGF of β̂₀ and β̂₁ based on the MGF of U = (β̂_c, β̂₁) (see Exercise 14). This concludes the proof.
According to Theorem 15.3.4, E(β̂₀) = β₀, E(β̂₁) = β₁, and E(nσ̂²/σ²) = n − 2, from which it follows that β̂₀, β̂₁, and σ̃² = Σ[Y_i − β̂₀ − β̂₁x_i]²/(n − 2) are unbiased estimators. We also know from Theorem 15.3.3 that these estimators are complete and sufficient, yielding the following theorem.

Theorem 15.3.5 If Y₁, ..., Y_n are independent, Y_i ~ N(β₀ + β₁x_i, σ²), then β̂₀, β̂₁, and σ̃² are UMVUEs of β₀, β₁, and σ².

It also is possible to derive confidence intervals for the parameters based on the above results.
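The conclusions E(β̂₀) = β₀, E(β̂₁) = β₁, and E(nσ̂²/σ²) = n − 2 can be illustrated by simulation. The design points and parameter values in this sketch are arbitrary illustrative choices, not taken from the text:

```python
# Monte Carlo check of unbiasedness and of E(n * sigma_hat^2 / sigma^2) = n - 2.
import random

random.seed(1)
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

reps = 20000
sum_b0 = sum_b1 = sum_q = 0.0
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sum_b0 += b0
    sum_b1 += b1
    sum_q += sse / sigma ** 2      # n * sigma_hat^2 / sigma^2

mean_b0 = sum_b0 / reps   # should be near beta0 = 2.0
mean_b1 = sum_b1 / reps   # should be near beta1 = 0.5
mean_q = sum_q / reps     # should be near n - 2 = 6
```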
Theorem 15.3.6 If Y₁, ..., Y_n are independent, Y_i ~ N(β₀ + β₁x_i, σ²), then γ × 100% confidence intervals for β₀, β₁, and σ² are given, respectively, by

1. β̂₀ ± t_{(1+γ)/2}(n − 2) σ̃ √(Σx_i² / [n Σ(x_i − x̄)²])

2. β̂₁ ± t_{(1+γ)/2}(n − 2) σ̃ / √(Σ(x_i − x̄)²)

3. ((n − 2)σ̃² / χ²_{(1+γ)/2}, (n − 2)σ̃² / χ²_{(1−γ)/2})

where the respective t and χ² percentiles have n − 2 degrees of freedom.

Proof It follows from Theorem 15.3.4 that

Z₀ = (β̂₀ − β₀) / (σ √(Σx_i² / [n Σ(x_i − x̄)²])) ~ N(0, 1)

Z₁ = √(Σ(x_i − x̄)²)(β̂₁ − β₁) / σ ~ N(0, 1)

V = (n − 2)σ̃² / σ² ~ χ²(n − 2)

Furthermore, each of Z₀ and Z₁ is independent of V. Thus,

T₀ = Z₀ / √(V/(n − 2)) = (β̂₀ − β₀) / (σ̃ √(Σx_i² / [n Σ(x_i − x̄)²])) ~ t(n − 2)

and

T₁ = Z₁ / √(V/(n − 2)) = √(Σ(x_i − x̄)²)(β̂₁ − β₁) / σ̃ ~ t(n − 2)

The confidence intervals are derived from the pivotal quantities T₀, T₁, and V.
It also is possible to derive tests of hypotheses based on these pivotal quantities.

Theorem 15.3.7 Assume that Y₁, ..., Y_n are independent, Y_i ~ N(β₀ + β₁x_i, σ²), and denote by t₀, t₁, and v computed values of T₀, T₁, and V with β₀, β₁, and σ² replaced by β₀₀, β₁₀, and σ₀², respectively.

1. A size α test of H₀ : β₀ = β₀₀ versus H_a : β₀ ≠ β₀₀ is to reject H₀ if |t₀| > t_{1−α/2}(n − 2).

2. A size α test of H₀ : β₁ = β₁₀ versus H_a : β₁ ≠ β₁₀ is to reject H₀ if |t₁| > t_{1−α/2}(n − 2).

3. A size α test of H₀ : σ² = σ₀² versus H_a : σ² ≠ σ₀² is to reject H₀ if v > χ²_{1−α/2}(n − 2) or v < χ²_{α/2}(n − 2).

One-sided tests can be obtained in a similar manner, but we will not state them here (see Exercise 16).
Example 15.3.3 Consider the auto emission data of Example 15.3.1. Recall that Σx_i = 304.377 and Σx_i² = 10461.814, from which we obtain Σ(x_i − x̄)² = 10461.814 − 11(27.671)² = 2039.287. We also have that β̂₀ = 0.266, β̂₁ = 0.00158, and σ̃² = 0.00030, so that σ̃ = 0.017. Note that t_{.975}(9) = 2.262, χ²_{.025}(9) = 2.70, and χ²_{.975}(9) = 19.02. If we apply Theorem 15.3.6, then 95% confidence limits for β₀ are given by 0.266 ± (2.262)(0.017)√((10461.814)/[(11)(2039.287)]) or 0.266 ± 0.038, and a 95% confidence interval is (0.228, 0.304). Similarly, 95% confidence limits for β₁ are given by 0.00158 ± (2.262)(0.017)/√2039.287 or 0.00158 ± 0.00085, and a 95% confidence interval is (0.0007, 0.0024). The 95% confidence limits for σ² are 9(0.00030)/19.02 = 0.00014 and 9(0.00030)/2.70 = 0.00100, and thus 95% confidence intervals for σ² and σ are (0.00014, 0.00100) and (0.012, 0.032), respectively.
15.4 GENERAL LINEAR MODEL

Many of the results derived for the simple linear model can be extended to the general linear case. It is not possible to develop the general model conveniently without introducing matrix notation. A few basic results will be stated in matrix notation for the purpose of illustration, but the topic will not be developed fully here. We will denote the transpose of an arbitrary matrix A by A'. That is, if A = {a_ij}, then A' = {a_ji}. Furthermore, if A is a square nonsingular matrix, then we denote its inverse by A⁻¹. We also will make no distinction between a 1 × k matrix and a k-dimensional row vector. Similarly, a k × 1 matrix will be regarded the same as a k-dimensional column vector. Thus, if c represents a k-dimensional column vector, then its transpose c' will represent the corresponding row vector. Consider the linear regression model (15.2.3) and assume that a response y_i is observed at the values x_i0, x_i1, ..., x_ip, i = 1, ..., n, with n ≥ p + 1. That is,
assume that

E(Y_i) = Σ_{j=0}^p β_j x_ij    Var(Y_i) = σ²    Cov(Y_i, Y_j) = 0,  i ≠ j

We will denote by V a matrix such that the ijth element is the covariance of the variables Y_i and Y_j, V = {Cov(Y_i, Y_j)}. The matrix V is called the covariance matrix of Y₁, ..., Y_n. We will define the expected value of a vector of random variables to be the corresponding vector of expected values. For example, if W = (W₁, ..., W_k) is a row vector whose components are random variables, then E(W) = (E(W₁), ..., E(W_k)), and similarly for column vectors of random variables. It is possible to reformulate the model in terms of matrices as follows:

V = σ²I    E(Y) = Xβ    (15.4.1)

where I is the n × n identity matrix, and Y, β, and X are

Y = (Y₁, ..., Y_n)'    β = (β₀, β₁, ..., β_p)'    X =
[ x_10  x_11  ...  x_1p ]
[  ...   ...        ... ]
[ x_n0  x_n1  ...  x_np ]
LEAST-SQUARES APPROACH

The least-squares estimates are the values β_j = β̂_j that minimize the quantity

S = Σ_{i=1}^n [Y_i − Σ_{j=0}^p β_j x_ij]² = (Y − Xβ)'(Y − Xβ)    (15.4.2)

The approach used with the simple linear model generalizes readily. In other words, if we set the partials of S with respect to the β_j's to zero and solve the resulting system of equations, then we obtain the LS estimates. Specifically, we solve

∂S/∂β_k = Σ_{i=1}^n 2[Y_i − Σ_{j=0}^p β_j x_ij](−x_ik) = 0,  k = 0, 1, ..., p

This system of equations is linear in the β_j's, and it is conveniently expressed in terms of the matrix equation

X'Y = X'Xβ̂    (15.4.3)

If the matrix X'X is nonsingular, then there exists a unique solution of the form

β̂ = (X'X)⁻¹X'Y    (15.4.4)

Unless indicated otherwise, we will assume that X'X is nonsingular. Of course, a more basic assumption would be that X has full rank.
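Equation (15.4.4) translates directly into matrix code. The following NumPy sketch uses an illustrative quadratic design, not an example from the text; with noise-free responses, solving the normal equations recovers the coefficients exactly:

```python
# Least squares via the normal equations (15.4.3)-(15.4.4).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x, x**2])   # columns x_i0, x_i1, x_i2
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true                                 # noise-free responses

# Solve X'X beta = X'Y rather than forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

In practice, `numpy.linalg.lstsq` is numerically preferable to forming X'X, but the normal-equation form mirrors the derivation above.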
The estimators β̂_j are linear functions of Y₁, ..., Y_n, and it can be shown that they are unbiased. Specifically, it follows from (15.4.1) and equation (15.4.4) that

E(β̂) = (X'X)⁻¹X'E(Y) = (X'X)⁻¹X'Xβ = β    (15.4.5)

As in the case of the simple linear model, the LS estimates of the β_j's for the general model are referred to as the BLUEs.

It also can be shown that the respective variances and covariances of the BLUEs are the elements of the matrix

C = {Cov(β̂_i, β̂_j)} = σ²(X'X)⁻¹    (15.4.6)

and that the BLUE of any linear combination of the β_j's, say r'β = Σ r_j β_j, is given by r'β̂ (see Scheffé, 1959).

Theorem 15.4.1 Gauss-Markov If E(Y) = Xβ and V = {Cov(Y_i, Y_j)} = σ²I, then r'β̂ is the BLUE of r'β, where β̂ = (X'X)⁻¹X'Y.

This theorem can be generalized to the case in which V = {Cov(Y_i, Y_j)} = σ²A, where A is a known matrix. That is, the Y_i's may be correlated and have unequal variances as long as A is known. It turns out that the BLUEs in this case are obtained by minimizing the weighted sum of squares (Y − Xβ)'A⁻¹(Y − Xβ). Note that σ² also may be a function of the unknown β_j's, say σ² = c(β).

Theorem 15.4.2 Generalized Gauss-Markov Let E(Y) = Xβ and V = {Cov(Y_i, Y_j)} = c(β)A, where A is a matrix of known constants. The generalized least-squares estimates of β are the values that minimize S = (Y − Xβ)'A⁻¹(Y − Xβ), and they are given by β̃ = (X'A⁻¹X)⁻¹X'A⁻¹Y. Also, r'β̃ is the BLUE of r'β.

Note that all of these results have been developed in terms of the means, variances, and covariances, and that no other distributional assumptions were made.
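The estimate of Theorem 15.4.2 is likewise a one-line matrix computation. The sketch below uses an illustrative diagonal A (uncorrelated observations with unequal variances) and made-up responses; the fitted β̃ satisfies the generalized normal equations X'A⁻¹(Y − Xβ̃) = 0:

```python
# Generalized least squares: beta = (X'A^{-1}X)^{-1} X'A^{-1} Y.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])    # straight-line design
A = np.diag([1.0, 1.0, 4.0, 4.0, 9.0])       # known variance pattern, up to sigma^2
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])      # illustrative responses

Ainv = np.linalg.inv(A)
beta_gls = np.linalg.solve(X.T @ Ainv @ X, X.T @ Ainv @ y)
```

With a diagonal A this reduces to ordinary weighted least squares, with weights proportional to the reciprocals of the diagonal entries.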
LINEAR FUNCTIONS OF ORDER STATISTICS

One interesting application of Theorem 15.4.2, which arises in a slightly different context, is that of finding estimators of location and scale parameters based on linear functions of the order statistics. Consider a pdf of the form

f(w; β₀, β₁) = (1/β₁) g((w − β₀)/β₁)

where g is a function free of β₀ and β₁, and let

Z = (W − β₀)/β₁ ~ g(z)

For an ordered random sample of size n, let W_{1:n} ≤ ... ≤ W_{n:n} denote the order statistics, and let Z_{i:n} = (W_{i:n} − β₀)/β₁. It follows that

E(W_{i:n}) = β₀ + β₁E[(W_{i:n} − β₀)/β₁] = β₀ + β₁E(Z_{i:n}) = β₀ + β₁k_i

where k_i = E(Z_{i:n}) does not depend on β₀ or β₁. That is, in this case let

β = (β₀, β₁)'    X =
[ 1  k_1 ]
[ ...    ]
[ 1  k_n ]

Also,

v_ij = Cov(W_{i:n}, W_{j:n}) = β₁² Cov(Z_{i:n}, Z_{j:n}) = β₁² a_ij

and

V = β₁²A

It then follows that the components of

β̃ = (β̃₀, β̃₁)' = (X'A⁻¹X)⁻¹X'A⁻¹W

are unbiased estimators of β₀ and β₁ that have minimum variance among all linear unbiased functions of the order statistics. The main drawback of this method is that the constants k_i and a_ij often are not convenient to compute. In some cases, asymptotic approximations of the constants have been useful.

It is interesting to note that if A is not used, then the ordinary LS estimates β̂ = (X'X)⁻¹X'W still are unbiased, although they will not be the BLUEs in this case.
MAXIMUM LIKELIHOOD APPROACH

In this section we will develop the MLEs and related properties of the general linear model under the assumption that the random errors from different experiments are independent and normal, ε_i ~ N(0, σ²). Suppose Y₁, ..., Y_n are independent normally distributed random variables, and they are the elements of a vector Y that satisfies model (15.4.1).

Theorem 15.4.3 If Y₁, ..., Y_n are independent with Y_i ~ N(Σ_{j=0}^p β_j x_ij, σ²) for i = 1, ..., n, then the MLEs of β₀, ..., β_p and σ² are given by

β̂ = (X'X)⁻¹X'Y    (15.4.7)

σ̂² = (Y − Xβ̂)'(Y − Xβ̂)/n    (15.4.8)

Proof The likelihood function is

L = L(β, σ²) = Π_{i=1}^n (2πσ²)^{−1/2} exp[−(y_i − Σ_{j=0}^p β_j x_ij)²/(2σ²)]    (15.4.9)

Thus, the log-likelihood is

ln L = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (y_i − Σ_j β_j x_ij)²
= −(n/2) ln(2πσ²) − S/(2σ²)    (15.4.10)

where S is the quantity (15.4.2). Clearly, the values of β₀, ..., β_p that maximize function (15.4.9) are the same ones that minimize S. Of course, this means that the MLEs of the β_j's under the present assumptions are the same as the BLUEs that were derived earlier. The MLE of σ² is obtained by setting the partial derivative of (15.4.10) with respect to σ² to zero and replacing the β_j's with the MLEs β̂_j. Another convenient form for the MLE of σ² (see Exercise 25) is

σ̂² = Y'(Y − Xβ̂)/n    (15.4.11)
As in the case of the simple linear model, the minimum value of S reflects the amount of variation of the data about the estimated regression function, and this defines the error sum of squares for the general linear regression model, denoted by SSE = (Y - X)'(Y - Xe). Many of the results that were derived for the MLEs in the case of the simple linear model have counterparts for the general linear model. However, to state some of these results, it is necessary to introduce a multivariate generalization of the normal distribution MULTI VARIA TE NORMAL DiSTRIBUTION In Chapter 5 a bivariate generalization of the normal distribution was presented Specifically, a pair of continuous random variables X1 and X2 are said to be bivariate normal BVN(u1, P2, cr cr p), if they have a joint pdf of the form
f(x1 x2)
,____.exp ---Q 2 j 2iwr2J1p
(15412)
with
21 (lP)L\
- p'\2 7i
I
J
(x1 - p1'\(x2 - P2'\
2P1
\
II
O
J\
C2
1+
/
(x2_2)2] C2
This pdf can be expressed more conveniently using matrix notation. In particular, we define the following vectors and matrices:

  x' = (x_1, x_2),  μ' = (μ_1, μ_2),  V = {Cov(X_i, X_j)}

It can be shown that Q = (x − μ)'V⁻¹(x − μ) and that the determinant of V is |V| = σ_1²σ_2²(1 − ρ²). Notice that we are assuming that V is nonsingular. We will restrict our attention to this case throughout this discussion. This provides a way to generalize the normal distribution to a k-dimensional version.
Definition 15.4.1
A set of continuous random variables X_1, ..., X_k are said to have a multivariate normal or k-variate normal distribution if the joint pdf has the form

  f(x_1, ..., x_k) = (2π)^(−k/2) |V|^(−1/2) exp[−(1/2)(x − μ)'V⁻¹(x − μ)]    (15.4.13)

with x' = (x_1, ..., x_k), μ' = (μ_1, ..., μ_k), and V = {Cov(X_i, X_j)}, and where μ_i = E(X_i) and V is a k × k nonsingular covariance matrix.
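The k-variate density (15.4.13) can be evaluated directly, which gives a useful numerical check. The sketch below assumes numpy only; with ρ = 0 the bivariate case (15.4.12) factors into two univariate normal densities, which provides an easy verification.

```python
import numpy as np

def mvn_pdf(x, mu, V):
    """k-variate normal density (15.4.13), assuming V is nonsingular."""
    k = len(mu)
    diff = x - mu
    Q = diff @ np.linalg.solve(V, diff)   # (x - mu)' V^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (-k / 2) * np.linalg.det(V) ** (-0.5)
    return norm_const * np.exp(-0.5 * Q)

# Check against the bivariate form (15.4.12) with rho = 0, where the
# density is the product of two univariate normal densities.
mu = np.array([0.0, 0.0])
V = np.diag([1.0, 4.0])                   # sigma1^2 = 1, sigma2^2 = 4, rho = 0
val = mvn_pdf(np.array([1.0, 2.0]), mu, V)
expected = (np.exp(-0.5) / np.sqrt(2 * np.pi)) * (np.exp(-0.5) / np.sqrt(8 * np.pi))
print(val, expected)
```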
Notice that for a set of multivariate normal random variables X_1, ..., X_k, the distribution is determined completely by the mean vector μ and covariance matrix V. An important property of multivariate normal random variables is that their marginals are normal, X_i ~ N(μ_i, σ_i²) (see Exercise 27).
Another quantity that was encountered in this development is the variable Q = (x − μ)'V⁻¹(x − μ). This is a special case of a function called a quadratic form. More generally, a quadratic form in the k variables x_1, ..., x_k is a function of the form

  Q = Σ_{i=1}^k Σ_{j=1}^k a_ij x_i x_j    (15.4.14)

where the a_ij's are constants. It often is more convenient to express Q in matrix notation. Specifically, if A = {a_ij} and x' = (x_1, ..., x_k), then Q = x'Ax. Strictly speaking, Q = (x − μ)'V⁻¹(x − μ) is a quadratic form in the differences x_i − μ_i rather than x_1, ..., x_k. An example of a quadratic form in the x_i's would be Q = x'Ix = Σ x_i². Quadratic forms in the y_i's that have been encountered in this section include nσ̂² = (Y − Xβ̂)'(Y − Xβ̂).
PROPERTIES OF THE ESTIMATORS

Most of the properties of the MLEs for the simple linear model can be extended using the approach of Section 15.3, but the details are more complicated for the higher-dimensional problem. We will state some of the properties of the MLEs in the following theorems.
Theorem 15.4.4
Under the assumptions of Theorem 15.4.3, the following properties hold:
1. The MLEs β̂_0, ..., β̂_p and σ̂² are jointly complete and sufficient.
2. β̂ has a multivariate normal distribution with mean vector β and covariance matrix σ²(X'X)⁻¹.
3. nσ̂²/σ² ~ χ²(n − p − 1).
4. β̂ and σ̂² are independent.
5. Each β̂_j is the UMVUE of β_j.
6. σ̃² = (Y − Xβ̂)'(Y − Xβ̂)/(n − p − 1) is the UMVUE of σ².

It also is possible to derive confidence intervals for the parameters. In the following theorem, let A = {a_ij} = (X'X)⁻¹, so that C = {Cov(β̂_i, β̂_j)} = σ²A.
Theorem 15.4.5
Under the assumptions of Theorem 15.4.3, the following intervals are γ × 100% confidence intervals for the β_j's and σ²:

  1.  β̂_j ± t_{(1+γ)/2}(n − p − 1) σ̃ √(a_jj)

  2.  ( (n − p − 1)σ̃²/χ²_{(1+γ)/2},  (n − p − 1)σ̃²/χ²_{(1−γ)/2} )

where the respective t and χ² percentiles have n − p − 1 degrees of freedom.
Proof
It follows from Theorem 15.4.4 that

  β̂_j ~ N(β_j, σ²a_jj)

so that

  Z = (β̂_j − β_j)/(σ√a_jj) ~ N(0, 1)

and

  V = (n − p − 1)σ̃²/σ² ~ χ²(n − p − 1)

Furthermore, because Z and V are independent,

  T = Z/√(V/(n − p − 1)) = (β̂_j − β_j)/(σ̃√a_jj) ~ t(n − p − 1)

The confidence intervals follow from these results.
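The interval in part 1 of Theorem 15.4.5 can be computed mechanically once A = (X'X)⁻¹ and σ̃² are available. The data below are hypothetical, used only to show the computation; `scipy.stats.t.ppf` supplies the t percentile.

```python
import numpy as np
from scipy import stats

# Hypothetical data for a simple linear model (p = 1).
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
y = np.array([0.9, 2.1, 2.9, 4.2, 4.8, 6.1])
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape
df = n - k                             # n - p - 1 degrees of freedom

A = np.linalg.inv(X.T @ X)             # A = (X'X)^{-1}, diagonal entries a_jj
beta_hat = A @ X.T @ y
resid = y - X @ beta_hat
sigma_tilde2 = resid @ resid / df      # UMVUE of sigma^2

gamma = 0.95
t_crit = stats.t.ppf((1 + gamma) / 2, df)
half = t_crit * np.sqrt(sigma_tilde2 * np.diag(A))
lower, upper = beta_hat - half, beta_hat + half
print(np.column_stack([lower, upper]))
```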
It also is possible to derive tests of hypotheses based on these pivotal quantities.
Theorem 15.4.6
Assume the conditions of Theorem 15.4.3 and denote by t and v the computed values of T and V with β_j = β_{j0} and σ² = σ_0².
1. A size α test of H_0: β_j = β_{j0} versus H_a: β_j ≠ β_{j0} is to reject H_0 if |t| ≥ t_{1−α/2}(n − p − 1).
2. A size α test of H_0: β_j = β_{j0} versus H_a: β_j > β_{j0} is to reject H_0 if t ≥ t_{1−α}(n − p − 1).
3. A size α test of H_0: σ² = σ_0² versus H_a: σ² ≠ σ_0² is to reject H_0 if v ≥ χ²_{1−α/2}(n − p − 1) or v ≤ χ²_{α/2}(n − p − 1).
4. A size α test of H_0: σ² = σ_0² versus H_a: σ² > σ_0² is to reject H_0 if v ≥ χ²_{1−α}(n − p − 1).
Lower one-sided tests can be obtained in a similar manner, but we will not state them here.
Example 15.4.1
Consider the auto emission data of Example 15.3.1. Recall from that example that we carried out the analysis under the assumption of a simple linear model. Although there is no theoretical reason for assuming that the relationship between mileage and hydrocarbon emissions is nonlinear in the variables, the plotted data in Figure 15.1 suggest such a possibility, particularly after approximately 40,000 miles. An obvious extension of the original analysis would be to consider a second-degree polynomial model. That is, each measurement y_i is an observation on a random variable Y_i = β_0 + β_1 x_i + β_2 x_i² + ε_i, with independent normal errors, ε_i ~ N(0, σ²). This can be formulated in terms of the general linear model (15.4.1) using the matrices Y' = (y_1, ..., y_11), β' = (β_0, β_1, β_2), and the n × 3 matrix X whose ith row is (1, x_i, x_i²), with n = 11. Recall from equation (15.4.4) that if the matrix X'X is nonsingular, then the LS estimates, which also are the ML estimates, are the components of the vector β̂ = (X'X)⁻¹X'Y. Although the recommended procedure is to use a statistical software package such as Minitab or SAS, it is possible to give explicit formulas for the estimates. In particular,
  β̂_1 = (s_1y s_22 − s_2y s_12)/(s_11 s_22 − s_12²)

  β̂_2 = (s_2y s_11 − s_1y s_12)/(s_11 s_22 − s_12²)

  β̂_0 = ȳ − β̂_1 x̄ − β̂_2 x̄_2

with x̄_2 = Σx_i²/n, s_11 = Σx_i² − n(x̄)², s_12 = s_21 = Σx_i³ − n x̄ x̄_2, s_22 = Σx_i⁴ − n(x̄_2)², s_1y = Σx_i y_i − n x̄ ȳ, and s_2y = Σx_i² y_i − n x̄_2 ȳ.
The LS estimates of the regression coefficients are β̂_0 = 0.2347, β̂_1 = 0.0046, and β̂_2 = −0.000055, yielding the regression function ŷ = 0.2347 + 0.0046x − 0.000055x². A graph of ŷ is provided in Figure 15.2 along with a plot of the data. For comparison, the graph of the linear regression function obtained in Example 15.3.1 also is shown as a dashed line.
An obvious question would be whether the second-degree term is necessary. One approach to answering this would be to test whether β_2 is significantly different from zero. According to Theorem 15.4.6, a size α = .05 test of H_0: β_2 = 0 versus H_a: β_2 ≠ 0 is to reject H_0 if |t| ≥ t_{.975}(8) = 2.306. Because in this example t = 2.235, β_2 does not differ significantly from zero, at least at the .05 level. It should be noted, however, that the test would reject at the .10 level.
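A second-degree fit of this kind can be obtained by least squares on the columns 1, x, x². The data below are illustrative, not the text's emissions measurements; the check at the end uses the defining property that residuals are orthogonal to the columns of X.

```python
import numpy as np

# Second-degree polynomial model Y_i = b0 + b1*x_i + b2*x_i^2 + e_i
# with hypothetical data resembling a curve that levels off.
x = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([0.26, 0.29, 0.31, 0.33, 0.34, 0.33])

X = np.column_stack([np.ones_like(x), x, x**2])   # columns 1, x, x^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_fit = X @ beta_hat
print(beta_hat)
```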
FIGURE 15.2  Hydrocarbon emissions as a function of accumulated mileage (second-degree polynomial fit). [Plot of the data with the fitted quadratic regression function over mileages 0.00 to 60.00 (× 1000); the linear fit from Example 15.3.1 appears as a dashed line.]
Another question of interest involves joint tests and confidence regions for two or more of the coefficients.
JOINT TESTS AND CONFIDENCE CONTOURS

Part 1 of Theorem 15.4.6 provides a means for testing individual regression coefficients with the other parameters of the model regarded as unknown nuisance parameters. It also is possible to develop methods for simultaneously testing two or more of the regression coefficients.
Example 15.4.2
Although the plots of the hydrocarbon data in Figures 15.1 and 15.2 strongly suggest that some sort of regression model is appropriate, it would be desirable in general to test whether terms beyond the constant term β_0 really are needed. Such a test can be constructed using the approach of the generalized likelihood
ratio (GLR) test of Chapter 12. In the auto emissions example, suppose it is desired to test jointly the hypothesis H_0: β_1 = β_2 = 0 versus the alternative that at least one of these coefficients is nonzero. The parameters β_0 and σ² would be unknown. Let Ω be the set of all quadruples (β_0, β_1, β_2, σ²) with −∞ < β_j < ∞ and σ² > 0, and let Ω_0 be the subset with β_1 = β_2 = 0. The GLR is

  λ(y) = f(y; β̂_00, 0, 0, σ̂_0²)/f(y; β̂_0, β̂_1, β̂_2, σ̂²)
       = [2πσ̂_0²]^(−n/2) exp[−(1/2σ̂_0²) Σ(y_i − ȳ)²] / {[2πσ̂²]^(−n/2) exp[−(1/2σ̂²) Σ(y_i − ŷ_i)²]}
       = (σ̂²/σ̂_0²)^(n/2)

because each exponent reduces to −n/2. Thus, the GLR test would reject H_0: β_1 = β_2 = 0 if λ(y) ≤ k, where k is chosen to provide a size α test. A simple way to proceed would be to use the approximate test given by equation (12.8.3). The test is based on the statistic −2 ln λ(y) = n ln(σ̂_0²/σ̂²) = 16.67. Because r = 2 parameters are being tested, the approximate critical value is χ²_{.95}(2) = 5.99, and the test rejects H_0. In fact, the test is highly significant because the p-value is .0002. We will see shortly that an exact test can be constructed, but first we will consider joint confidence regions.
Consider the quantity S = S(β) = (Y − Xβ)'(Y − Xβ), which is the sum of squares that we minimized to obtain the LS and ML estimates of the regression coefficients. In other words, the minimum value of S is S(β̂). Specifically, it can be shown (see Exercise 29) that
  S(β) = S(β̂) + (β − β̂)'(X'X)(β − β̂)    (15.4.15)

With the assumptions of Theorem 15.4.4 we have that S(β̂)/σ² = nσ̂²/σ² ~ χ²(n − p − 1), and, when β is the true coefficient vector, S(β)/σ² ~ χ²(n). Furthermore, from Part 4 of Theorem 15.4.4 we know that β̂ and σ̂² are independent; with equation (15.4.15) this implies that S(β) − S(β̂) = (β − β̂)'(X'X)(β − β̂) and S(β̂) = nσ̂² are independent. Based on the rationale of Theorem 15.3.4, it follows that [S(β) − S(β̂)]/σ² ~ χ²(n − (n − p − 1)) = χ²(p + 1). Thus, we can derive an F-variable

  {[S(β) − S(β̂)]/(p + 1)} / {S(β̂)/(n − p − 1)} ~ F(p + 1, n − p − 1)    (15.4.16)

It follows that a γ × 100% confidence region for β_0, β_1, ..., β_p is defined by the inequality

  S(β) ≤ S(β̂)[1 + ((p + 1)/(n − p − 1)) f_γ(p + 1, n − p − 1)]    (15.4.17)
The boundary of such a region is known as a confidence contour. Of course, another way to express such a confidence region is in terms of the quantity Q = (β̂ − β)'(X'X)(β̂ − β), which is a quadratic form in the differences β̂_j − β_j. In particular, we have

  (β̂ − β)'(X'X)(β̂ − β) ≤ (p + 1)σ̃² f_γ(p + 1, n − p − 1)    (15.4.18)

For the important special case of the simple linear model, the corresponding confidence contour is an ordinary two-dimensional ellipse with center at (β̂_0, β̂_1).
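The quadratic-form inequality (15.4.18) gives a direct membership test for the joint confidence region. A minimal sketch with hypothetical data follows; `scipy.stats.f.ppf` supplies f_γ(p + 1, n − p − 1).

```python
import numpy as np
from scipy import stats

# Hypothetical simple linear model data (p = 1, so k = p + 1 = 2).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.0])
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2_tilde = resid @ resid / (n - k)          # UMVUE of sigma^2

gamma = 0.95
f_crit = stats.f.ppf(gamma, k, n - k)

def in_region(beta):
    """True if beta satisfies inequality (15.4.18)."""
    d = beta_hat - beta
    return d @ (X.T @ X) @ d <= k * s2_tilde * f_crit

print(in_region(beta_hat))   # the center of the ellipse is always inside
```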
Example 15.4.3
Consider the auto emission data of Example 15.3.1. To obtain a confidence contour by quantity (15.4.18), it is necessary to find X'X. For the simple linear model, X' has first row (1, ..., 1) and second row (x_1, ..., x_n), and consequently

  X'X = [ n      Σx_i  ]
        [ Σx_i   Σx_i² ]

FIGURE 15.3  95% confidence contour for (β_0, β_1). [Ellipse centered at (β̂_0, β̂_1) = (.266, .00158), plotted against parameter β_0 on the horizontal axis.]
From this we obtain

  Q = n(β̂_0 − β_0)² + 2(Σx_i)(β̂_0 − β_0)(β̂_1 − β_1) + (Σx_i²)(β̂_1 − β_1)²

verifying the earlier comment that the confidence contour is an ellipse with center at (β̂_0, β̂_1). For this example, n = 11, x̄ = 27.67, and Σx_i²/n = 951.07. Using the percentile f_{.95}(2, 9), the graph of a 95% confidence contour for this data is shown in Figure 15.3. Of course, the same confidence region would be obtained using inequality (15.4.17).
The above results also suggest how to construct a joint test of hypotheses about all of the β_j's. For example, a size α test of the hypothesis H_0: β_0 = β_1 = ... = β_p = 0 versus the alternative that at least one of the β_j's is nonzero would be to reject if

  {[S(0) − S(β̂)]/(p + 1)} / {S(β̂)/(n − p − 1)} > f_{1−α}(p + 1, n − p − 1)    (15.4.19)

where 0 is the (p + 1)-dimensional column vector with all zero components.
This can be generalized to provide a test that the β_j's in some subset are all zero. Consider a null hypothesis of the form H_0: β_m = ... = β_p = 0, where 0 < m ≤ p. This is a generalization of the test of Example 15.4.2, and in fact the GLR approach extends immediately to the case of the general linear model with p + 1 coefficients. The parameters β_0, ..., β_{m−1} and σ² would be unknown. Let Ω be the set of all (p + 2)-tuples (β_0, β_1, ..., β_p, σ²) with −∞ < β_j < ∞ and σ² > 0, and let Ω_0 be the subset of Ω such that β_j = 0 for j = m, ..., p. We assume independent errors, ε_i ~ N(0, σ²). The MLEs over the subspace Ω_0 are the components of β̂_0 = (X_0'X_0)⁻¹X_0'Y, where X_0 is the n × m matrix consisting of the first m columns of X, and

  σ̂_0² = S(β̂_0)/n = Σ(y_i − ŷ_{0i})²/n,  with  ŷ_{0i} = Σ_{j=0}^{m−1} β̂_{0j} x_{ij}

On the other hand, over the unrestricted space Ω, the joint MLEs are the usual β̂ = (X'X)⁻¹X'Y and σ̂² = S(β̂)/n = Σ(y_i − ŷ_i)²/n with ŷ_i = Σ_{j=0}^{p} β̂_j x_{ij}.
The GLR derivation of Example 15.4.2 extends easily to yield a GLR of the form λ(y) = (σ̂²/σ̂_0²)^(n/2). Thus, the GLR test would reject H_0: β_m = ... = β_p = 0 if λ(y) ≤ k, where k is chosen to provide a size α test. As in the example, we could employ the chi-square approximation of −2 ln λ(y), but an exact test is possible. Note that

  ((n − p − 1)/(p − m + 1))[λ(y)^(−2/n) − 1] = {[S(β̂_0) − S(β̂)]/(p − m + 1)} / {S(β̂)/(n − p − 1)}    (15.4.20)
which is a decreasing function of λ(y). Furthermore, under the null hypothesis H_0: β_m = ... = β_p = 0, it can be shown that the ratio on the right of equation (15.4.20) is distributed as F(p − m + 1, n − p − 1). Consequently, a size α test, which is equivalent to the GLR test, would reject H_0 if

  {[S(β̂_0) − S(β̂)]/(p − m + 1)} / {S(β̂)/(n − p − 1)} > f_{1−α}(p − m + 1, n − p − 1)    (15.4.21)
For additional information on these distributional results, see Scheffe (1959). We summarize the above remarks in the following theorem.
Theorem 15.4.7
Assume the conditions of Theorem 15.4.4 and let S(β) = (Y − Xβ)'(Y − Xβ).
1. A γ × 100% confidence region for β_0, β_1, ..., β_p is given by the set of solutions to the inequality

  S(β) ≤ S(β̂)[1 + ((p + 1)/(n − p − 1)) f_γ(p + 1, n − p − 1)]

2. A size α test of H_0: β_m = ... = β_p = 0 is to reject H_0 if

  {[S(β̂_0) − S(β̂)]/(p − m + 1)} / {S(β̂)/(n − p − 1)} > f_{1−α}(p − m + 1, n − p − 1)

where β̂ = (X'X)⁻¹X'Y and σ̂² = S(β̂)/n are the MLEs over the full parameter space Ω, and β̂_0 and σ̂_0² are the MLEs over the subset Ω_0 of Ω such that β_j = 0 for j = m, ..., p. Furthermore, β̂_0 = (X_0'X_0)⁻¹X_0'Y, where X_0 is the n × m matrix consisting of the first m columns of X, σ̂_0² = S(β̂_0)/n, and the resulting test is equivalent to the GLR test.
Example 15.4.4
In Example 15.4.2 we considered a GLR test using the auto emissions data. There we used the chi-square approximation of −2 ln λ(y) to carry out the test. Now we can perform an exact test using the results of Theorem 15.4.7. Recall that it was desired to test the hypothesis H_0: β_1 = β_2 = 0 versus the alternative that at least one of these coefficients is nonzero. Over the subspace Ω_0 there is only one undetermined coefficient, β_0, with MLE β̂_00 = ȳ, and in this case S(β̂_0) = Σ(y_i − ȳ)² = 0.00796, while over the unrestricted space Ω the MLE is β̂ = (X'X)⁻¹X'Y and S(β̂) = 0.00175. Because n = 11, p = 2, and m = 1 in this
example, the value of the F statistic is

  {[0.00796 − 0.00175]/2} / {0.00175/8} = 14.194

which would reject H_0 at the α = .05 level because f_{.95}(2, 8) = 4.46. As in the test of Example 15.4.2, the evidence in favor of using a regression model is overwhelming because the p-value is .0023. It is interesting to note that both procedures led to a rejection, and in both cases the result was highly significant.
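Both tests in this example can be reproduced from the two sums of squares alone. Using S(β̂_0) = 0.00796 and S(β̂) = 0.00175 with n = 11, p = 2, m = 1, the sketch below recovers the approximate chi-square statistic of Example 15.4.2 and the exact F statistic of (15.4.21).

```python
import numpy as np
from scipy import stats

n, p, m = 11, 2, 1
S0, S = 0.00796, 0.00175       # S(beta0_hat) and S(beta_hat) from the example
r = p - m + 1                  # number of coefficients tested

# Approximate GLR test: -2 ln lambda = n * ln(sigma0_hat^2 / sigma_hat^2)
chi2_stat = n * np.log(S0 / S)
chi2_p = 1 - stats.chi2.cdf(chi2_stat, df=r)

# Exact F test, equation (15.4.21)
f_stat = ((S0 - S) / r) / (S / (n - p - 1))
f_p = 1 - stats.f.cdf(f_stat, r, n - p - 1)
print(chi2_stat, chi2_p, f_stat, f_p)
```

Both p-values are small, matching the text's conclusion that each procedure rejects decisively.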
15.5 ANALYSIS OF BIVARIATE DATA

To this point, we have assumed that the variable x can be fixed or measured without error by the experimenter. However, there are situations in which both X and Y are random variables. Assume (X_1, Y_1), ..., (X_n, Y_n) is a random sample from a bivariate population with pdf f(x, y). That is, each pair has the same joint pdf, and there is independence between pairs, but the variables within a pair may be dependent.
Our purpose in this section is to show how the results about the simple linear model can be used in the analysis of data from bivariate distributions and to develop methods for testing the hypothesis that X and Y are independent. We first define an estimator of the correlation coefficient ρ.
Definition 15.5.1
If (X_1, Y_1), ..., (X_n, Y_n) is a random sample from a bivariate population, then the sample correlation coefficient is

  R = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / √[Σ_{i=1}^n (X_i − X̄)² Σ_{i=1}^n (Y_i − Ȳ)²]    (15.5.1)

The corresponding quantity computed from paired data (x_1, y_1), ..., (x_n, y_n), denoted by r, is an estimate of ρ.
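Equation (15.5.1) translates directly into code. The data below are hypothetical; for perfectly linear paired data the estimate r equals 1.

```python
import numpy as np

def sample_corr(x, y):
    """Sample correlation coefficient r, equation (15.5.1)."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # exactly linear in x
print(sample_corr(x, y))
```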
We will consider the important special case in which the paired data are observations of a random sample from a bivariate normal population, (X, Y) ~ BVN(μ_1, μ_2, σ_1², σ_2², ρ). We know from Theorem 5.4.8 that the conditional distribution of Y given X = x is normal,

  Y | X = x ~ N(μ_2 + ρ(σ_2/σ_1)(x − μ_1), σ_2²(1 − ρ²))
This can be related to a simple linear model because E(Y | x) = β_0 + β_1 x with β_0 = μ_2 − ρ(σ_2/σ_1)μ_1, β_1 = ρσ_2/σ_1, and Var(Y | x) = σ_2²(1 − ρ²). Thus, we have the following theorem.

Theorem 15.5.1
Consider a random sample from a bivariate normal population with means μ_1 and μ_2, variances σ_1² and σ_2², and correlation coefficient ρ, and let X = (X_1, ..., X_n) and x = (x_1, ..., x_n). Then, conditional on X = x, the variables Y_1, ..., Y_n are distributed as independent variables

  Y_i ~ N(β_0 + β_1 x_i, σ²)

with β_0 = μ_2 − ρ(σ_2/σ_1)μ_1, β_1 = ρσ_2/σ_1, and σ² = σ_2²(1 − ρ²).
If it is desired to test the hypothesis H_0: ρ = 0, the fact that β_1 = ρσ_2/σ_1 suggests using the test of H_0: β_1 = 0 in Part 2 of Theorem 15.3.7. That is, reject H_0 at level α if |t_1| ≥ t_{1−α/2}(n − 2). Of course, the resulting test is a conditional test, but a conditional test of size α also gives an unconditional test of size α. The following derivation shows that an equivalent test can be stated in terms of the sample correlation coefficient r. Specifically, if

  t = β̂_1 √(Σ(x_i − x̄)²) / σ̃    (15.5.2)

then under H_0: ρ = 0 and conditional on X = x, t is the observed value of a random variable T that is distributed the same as T_1. In other words, under H_0: ρ = 0 and conditional on the observed x_i's, the variable T ~ t(n − 2). But the MLE of β_1 for the simple linear model is

  β̂_1 = r √[Σ(y_i − ȳ)² / Σ(x_i − x̄)²]    (15.5.3)
where r is the estimate (15.5.1). Recall that for the centered regression problem, β̂_0* = ȳ, and thus

  (n − 2)σ̃² = Σ[y_i − ȳ − β̂_1(x_i − x̄)]²
             = Σ(y_i − ȳ)² − β̂_1² Σ(x_i − x̄)²
             = Σ(y_i − ȳ)² [1 − r²]

It follows that

  σ̃² = Σ(y_i − ȳ)² [1 − r²] / (n − 2)    (15.5.4)

The last expression is based on substituting the right side of equation (15.5.3) for β̂_1 and some simplification. By substituting equations (15.5.3) and (15.5.4) into (15.5.2), we obtain

  t = r √[Σ(y_i − ȳ)²/Σ(x_i − x̄)²] √(Σ(x_i − x̄)²) / √(Σ(y_i − ȳ)²[1 − r²]/(n − 2))
    = √(n − 2) r / √(1 − r²)    (15.5.5)

It follows that a size α test of H_0: ρ = 0 versus H_a: ρ ≠ 0 is to reject H_0 if |t| ≥ t_{1−α/2}(n − 2), where t = √(n − 2) r/√(1 − r²). This provides a convenient test for independence of X and Y because bivariate normal random variables are independent if and only if they are uncorrelated. These results are summarized in the following theorem.
Theorem 15.5.2
Assume that (X_1, Y_1), ..., (X_n, Y_n) is a random sample from a bivariate normal distribution and let t = √(n − 2) r/√(1 − r²).
1. A size α test of H_0: ρ = 0 versus H_a: ρ ≠ 0 is to reject if |t| ≥ t_{1−α/2}(n − 2).
2. A size α test of H_0: ρ ≤ 0 versus H_a: ρ > 0 is to reject if t ≥ t_{1−α}(n − 2).
3. A size α test of H_0: ρ ≥ 0 versus H_a: ρ < 0 is to reject if t ≤ −t_{1−α}(n − 2).

It is interesting to note that for the above conditional distribution of T to hold, it is only necessary to assume that the 2n variables X_1, ..., X_n and Y_1, ..., Y_n are independent and that each Y_i ~ N(μ, σ²) for some μ and σ². Thus, the test described above is appropriate for testing independence of X and Y, even if the distribution of X is arbitrary. This is summarized by the following theorem.

Theorem 15.5.3
Assume that (X_1, Y_1), ..., (X_n, Y_n) is a random sample from a bivariate population with pdf f(x, y). If Y_i ~ N(μ, σ²) for each i = 1, ..., n, then a size α test of the null hypothesis that the variables X_i and Y_i are independent is to reject if |t| ≥ t_{1−α/2}(n − 2), where t = √(n − 2) r/√(1 − r²).
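The test of Theorem 15.5.2 needs only r and n. With r = 0.908 and n = 11 from the emissions example, the sketch below reproduces t ≈ 6.5, well beyond the two-sided .05 critical value.

```python
import numpy as np
from scipy import stats

r, n = 0.908, 11
# t = sqrt(n-2) * r / sqrt(1 - r^2), equation (15.5.5)
t = np.sqrt(n - 2) * r / np.sqrt(1 - r**2)
crit = stats.t.ppf(0.975, n - 2)     # two-sided critical value, alpha = .05
print(t, crit, abs(t) >= crit)
```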
The test defined in Theorem 15.5.2 does not extend readily to testing other values of ρ except ρ = 0. However, tests for nonzero values of ρ can be based on the following theorem, which is stated without proof.

Theorem 15.5.4
Assume that (X_1, Y_1), ..., (X_n, Y_n) is a random sample from a bivariate normal distribution, BVN(μ_1, μ_2, σ_1², σ_2², ρ), and define

  V = (1/2) ln[(1 + R)/(1 − R)]  and  m = (1/2) ln[(1 + ρ)/(1 − ρ)]

Then Z = √(n − 3)(V − m) → N(0, 1) as n → ∞.
That is, for large n, V is approximately normal with mean m and variance 1/(n − 3). This provides an obvious approach for testing hypotheses about ρ, and such tests are given in the following theorem.

Theorem 15.5.5
If (X_1, Y_1), ..., (X_n, Y_n) is a random sample from a bivariate normal distribution, BVN(μ_1, μ_2, σ_1², σ_2², ρ), and z_0 = √(n − 3)(v − m_0) with v = (1/2) ln[(1 + r)/(1 − r)] and m_0 = (1/2) ln[(1 + ρ_0)/(1 − ρ_0)], then:
1. An approximate size α test of H_0: ρ = ρ_0 versus H_a: ρ ≠ ρ_0 is to reject H_0 if |z_0| ≥ z_{1−α/2}.
2. An approximate size α test of H_0: ρ ≤ ρ_0 versus H_a: ρ > ρ_0 is to reject H_0 if z_0 ≥ z_{1−α}.
3. An approximate size α test of H_0: ρ ≥ ρ_0 versus H_a: ρ < ρ_0 is to reject H_0 if z_0 ≤ −z_{1−α}.
It also is possible to construct confidence intervals for ρ based on this approximation. For example, the approximate normal variable Z can be used to derive a confidence interval for m of the form

  (m_1, m_2) = (v − z_{1−α/2}/√(n − 3), v + z_{1−α/2}/√(n − 3))

Limits for a confidence interval for ρ are obtained by solving the equations m_i = (1/2) ln[(1 + ρ_i)/(1 − ρ_i)] for i = 1, 2. The resulting confidence interval is of the form (ρ_1, ρ_2), where ρ_i = [exp(2m_i) − 1]/[exp(2m_i) + 1] for i = 1, 2.
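This approximate confidence interval can be sketched as a small function; `scipy.stats.norm.ppf` supplies z_{1−α/2}, and the inverse transformation ρ = [exp(2m) − 1]/[exp(2m) + 1] maps the limits back. The function name is an assumption, not from the text.

```python
import numpy as np
from scipy import stats

def rho_ci(r, n, gamma=0.95):
    """Approximate CI for rho via v = (1/2) ln[(1+r)/(1-r)], Var(V) ~ 1/(n-3)."""
    v = 0.5 * np.log((1 + r) / (1 - r))
    z = stats.norm.ppf((1 + gamma) / 2)
    m1, m2 = v - z / np.sqrt(n - 3), v + z / np.sqrt(n - 3)
    inv = lambda m: (np.exp(2 * m) - 1) / (np.exp(2 * m) + 1)
    return inv(m1), inv(m2)

# Using r = 0.908, n = 11 from the emissions example.
lo, hi = rho_ci(r=0.908, n=11)
print(lo, hi)
```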
Example 15.5.1
Consider the auto emissions data of Example 15.3.1. Although, strictly speaking, the variable x is not random in this example, we will use it to illustrate the computation of r and a test of hypotheses. Recall that n = 11, Σx_i = 304.377, Σx_i² = 10461.814, Σy_i = 3.4075, Σy_i² = 1.063, Σx_i y_i = 97.506, x̄ = 27.671, and ȳ = 0.310. Thus,

  Σ(x_i − x̄)² = Σx_i² − n(x̄)² = 2039.287

  Σ(y_i − ȳ)² = Σy_i² − n(ȳ)² = 0.0059

and Σ(x_i − x̄)(y_i − ȳ) = Σx_i y_i − n x̄ ȳ = 3.1479. It follows that

  r = 3.1479/√[(2039.287)(0.0059)] = 0.908

A test of H_0: ρ = 0 versus H_a: ρ ≠ 0 is based on t = √9 (0.908)/√(1 − (0.908)²) = 6.502, which, of course, would reject at any practical level of significance, indicating a near-linear relationship among the variables.
In Chapter 12, a paired-sample t test was discussed for testing the difference of means for a bivariate normal distribution with the variances and the correlation coefficient unknown nuisance parameters. We now consider the problem of a simultaneous test of equality of the means and variances of a bivariate normal population with unknown correlation coefficient. This test was suggested by
Bradley and Blackwood (1989). Suppose X and Y are bivariate normal, (X, Y) ~ BVN(μ_1, μ_2, σ_1², σ_2², ρ). It follows from the results of Chapter 5 that the sum S = X + Y and difference D = X − Y also are bivariate normal with means

  μ_S = μ_1 + μ_2  and  μ_D = μ_1 − μ_2

variances

  σ_S² = σ_1² + σ_2² + 2ρσ_1σ_2  and  σ_D² = σ_1² + σ_2² − 2ρσ_1σ_2

and covariance

  Cov(S, D) = Cov(X + Y, X − Y)
            = Var(X) + Cov(X, Y) − Cov(X, Y) − Var(Y)
            = σ_1² − σ_2²

Thus, the correlation coefficient of S and D is

  ρ_SD = (σ_1² − σ_2²)/(σ_S σ_D)

As a consequence of Theorem 5.4.8, we know that the conditional distribution of D given S = s is normal,

  D | s ~ N(μ_D + ρ_SD (σ_D/σ_S)(s − μ_S), σ_D²(1 − ρ_SD²))

Notice that if we let

  β_0 = μ_D − ρ_SD (σ_D/σ_S) μ_S = (μ_1 − μ_2) − [(σ_1² − σ_2²)/σ_S²](μ_1 + μ_2)
and

  β_1 = ρ_SD σ_D/σ_S = (σ_1² − σ_2²)/σ_S²

then, conditional on S = s, D is normal with mean E(D | s) = β_0 + β_1 s and variance Var(D | s) = σ_D²(1 − ρ_SD²). Thus, conditionally we have a simple linear regression model. Because μ_1 = μ_2 and σ_1² = σ_2² if and only if β_0 = β_1 = 0, the joint null hypothesis H_0: μ_1 = μ_2, σ_1² = σ_2² is equivalent to the joint null hypothesis H_0: β_0 = β_1 = 0, which can be tested easily with the results of Theorem 15.4.7. In particular, if s_1, ..., s_n and d_1, ..., d_n are the sums and differences of n pairs (x_i, y_i) based on a random sample from a bivariate normal population, then a size α test of H_0 would reject if

  {[Σd_i² − SSE]/2} / {SSE/(n − 2)} = {[S(β̂_0) − S(β̂)]/2} / {S(β̂)/(n − 2)} > f_{1−α}(2, n − 2)

with β̂ = (X'X)⁻¹X'Y, where X' has first row (1, ..., 1) and second row (s_1, ..., s_n), and Y' = (d_1, ..., d_n).
It also is possible to test the equality of variances, because σ_1² = σ_2² if and only if β_1 = 0. Theorem 15.4.7 yields a test of H_0: σ_1² = σ_2² versus the alternative H_a: σ_1² ≠ σ_2², namely, we reject H_0 if

  [Σ(d_i − d̄)² − SSE] / {SSE/(n − 2)} = [S(β̂_0) − S(β̂)] / {S(β̂)/(n − 2)} > f_{1−α}(1, n − 2)

with β̂_00 = d̄ the ML solution over the parameter subspace with β_1 = 0, and β̂ the unrestricted ML solution of the previous test.
SUMMARY

Many problems in statistics involve modeling the relationship between a variable Y, which is observed in an experiment, and one or more variables x, which the experimenter assumes can be controlled or measured without error. We have considered the approach of linear regression analysis, which assumes that Y can be represented as a function that is linear in the coefficients, plus an error term whose expectation is zero. It was shown that estimates with minimum variance among the class of linear unbiased estimates could be obtained with the mild assumption that the errors are uncorrelated with equal variances. With the additional assumption that the errors are independent normal, it also was possible to obtain confidence limits and tests of hypotheses about the parameters of the model.
We have only scratched the surface of the general problem of regression analysis. For additional reading, the book by Draper and Smith (1981) is recommended.
EXERCISES

1. In a study of the effect of thermal pollution on fish, the proportion of a certain variety of sunfish surviving a fixed level of thermal pollution was determined by Matis and Wehrly (1979) for various exposure times. The following paired data were reported on scaled time (x) versus proportion surviving (y):

  x_i: 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55
  y_i: 1.00, 0.95, 0.95, 0.90, 0.85, 0.70, 0.65, 0.60, 0.55, 0.40

(a) Plot the paired data as points in an x-y coordinate system as in Figure 15.1.
(b) Assuming a simple linear model, compute the LS estimates of β_0 and β_1.
(c) Estimate E(Y) = β_0 + β_1 x if the exposure time is x = 0.325 units. Include the graph of ŷ = β̂_0 + β̂_1 x with the plotted data from (a).
(d) Compute the SSE and give an unbiased estimate of σ² = Var(Y).
2. Mullet (1977) considers the goals scored per game by the teams in the National Hockey League. The average number of goals scored per game at home and away by each team in the 1973-74 season was:

  Team             At Home   Away
  Boston             4.95    4.00
  Montreal           4.10    3.41
  N.Y. Rangers       4.26    3.44
  Toronto            3.69    3.33
  Buffalo            3.64    2.66
  Detroit            4.36    2.18
  Vancouver          3.08    2.67
  N.Y. Islanders     2.46    2.21
  Philadelphia       3.90    3.10
  Chicago            3.64    3.33
  Los Angeles        3.36    2.62
  Atlanta            3.10    2.38
  Pittsburgh         3.18    3.03
  St. Louis          3.08    2.21
  Minnesota          3.69    2.33
  California         2.87    2.13

(a) Plot the paired data in an x-y coordinate system with x the average number of goals at home and y the average number of goals away.
(b) Assuming a simple linear model, compute the LS estimates of β_0 and β_1.
(c) Predict the average number of away goals per game scored by a team that scored four goals per game at home. Include the graph of the estimated regression function with the plotted data from (a).
(d) Compute an unbiased estimate of σ².
3. Rework Exercise 1 after centering the x_i's about x̄. That is, first replace each x_i with x_i − x̄.

4. Rework Exercise 2 after centering the x_i's about x̄.
5. Show that the error sum of squares can be written as

  SSE = Σy_i² − β̂_0 Σy_i − β̂_1 Σx_i y_i
6. Show each of the following:
(a) The residuals can be written as ê_i = y_i − β̂_0 − β̂_1 x_i = y_i − ȳ − β̂_1(x_i − x̄).
(b) SSE = Σ(y_i − ȳ)² − β̂_1² Σ(x_i − x̄)².

7. For the constants a_1, ..., a_n in part 4 of Theorem 15.3.1, show that Σa_i = 0 and Σa_i x_i = 0 imply Σa_i b_i = 0.
8. Assume Y_1, ..., Y_n are uncorrelated random variables with E(Y_i) = β_0 + β_1 x_i and Var(Y_i) = σ², and that β̂_0 and β̂_1 are the LS estimates of β_0 and β_1. Show each of the following:
(a) E(β̂_0²) = β_0² + (σ²/n)[1 + n x̄²/Σ(x_i − x̄)²]. Hint: Use general properties of variance and the results of Theorem 15.3.1.
(b) E(β̂_1²) = β_1² + σ²/Σ(x_i − x̄)².
(c) E[Σ(Y_i − Ȳ)²] = (n − 1)σ² + β_1² Σ(x_i − x̄)². Hint: Use an argument similar to the proof of Theorem 8.2.2.
(d) σ̃² = Σ[Y_i − β̂_0 − β̂_1 x_i]²/(n − 2) is an unbiased estimate of σ². Hint: Use (b) and (c) together with Exercise 6.
9. Consider the bus motor failure data in Exercise 15 of Chapter 13.
(a) Assume that the mileages for the first set of data are normally distributed with mean μ and variance σ². Apply the method described in Example 15.3.2, with x_i the ith ordered mileage and y_i = Φ⁻¹(F_i), to estimate μ and σ².
(b) For the fifth set of bus motor failure data, assume that the mileages have a two-parameter exponential distribution with location parameter η and scale parameter θ. Apply the method described in Example 15.3.2 with x_i the ith ordered mileage and y_i = G⁻¹(F_i).

10. Rework (a) from Exercise 9, but use the method in which x_i = Φ⁻¹(F_i) and y_i is the ith ordered mileage.

11. Verify that the MLEs β̂_0*, β̂_1, and σ̂² for the centered regression model of Section 15.3 are jointly complete and sufficient for β_0*, β_1, and σ².

12. Verify the identities that follow equations (15.3.6) and (15.3.7). Hint: Note that Σ(x_i − x̄) = 0 and Σ(x_i − x̄)b_i = 1.

13. Derive equation (15.3.8). Hint: Add and subtract β̂_0* + β̂_1(x_i − x̄) within the squared brackets, then use the binomial expansion. The term involving (β̂_0* − β_0*)(β̂_1 − β_1) is zero because Σ(x_i − x̄) = 0.
14. Derive the joint MGF of β̂_0 and β̂_1 under the assumptions of Theorem 15.3.4. Hint: Use the fact that β̂_0* and β̂_1 are independent normally distributed random variables and that β̂_0 = β̂_0* − β̂_1 x̄. Recall that the MGF of bivariate normal random variables is given in Example 5.5.2.

15. For the thermal pollution data of Exercise 1, assume that the errors are independent and normally distributed, ε_i ~ N(0, σ²).
(a) Compute 95% confidence limits for β_0.
(b) Compute 95% confidence limits for β_1.
(c) Compute 95% confidence limits for σ².
(d) Perform a size α = .01 test of H_0: β_0 = 1 versus H_a: β_0 ≠ 1.
(e) Perform a size α = .10 test of H_0: β_1 = −1.5 versus H_a: β_1 ≠ −1.5.
(f) Perform a size α = .10 test of H_0: σ = .05 versus H_a: σ ≠ .05.

16. Under the assumptions of Theorem 15.3.7:
(a) Derive both upper and lower one-sided tests of size α for β_0.
(b) Redo (a) for β_1.
(c) Redo (a) for σ².
17. Let Y_1, ..., Y_n be independent with Y_i ~ N(βx_i, σ²), where both β and σ² are unknown.
(a) If y_1, ..., y_n are observed, derive the MLEs β̂ and σ̂² based on the pairs (x_1, y_1), ..., (x_n, y_n).
(b) Show that the estimator β̂ is normally distributed. What are E(β̂) and Var(β̂)?
(c) Show that the estimators β̂ and σ̂² are independent.
(d) Find an unbiased estimator σ̃² of σ² and constants, say c and ν, such that cσ̃² ~ χ²(ν).
(e) Find a pivotal quantity for β.
(f) Derive a (1 − α)100% confidence interval for β.
(g) Derive a (1 − α)100% confidence interval for σ².
18. In a test to determine the static stiffness of major league baseballs, each of six balls was subjected to a different amount of force x (in pounds) and the resulting displacement y (in inches) was measured. The data are given as follows:

  x_i: 10, 20, 30, 40, 50, 60
  y_i: .045, .071, .070, .112, .120, .131

Assuming the regression model of Exercise 17:
(a) Compute the MLEs β̂ and σ̂².
(b) Compute an unbiased estimate of σ².
(c) Compute a 95% confidence interval for β.
(d) Compute a 90% confidence interval for σ².

19. Assume independent Y_i ~ EXP(βx_i); i = 1, ..., n.
(a) Find the LS estimator of β.
(b) Find the MLE of β.
20. Apply the regression model of Exercise 19 to the baseball data of Exercise 18 and obtain the MLE of β.

21. Assume that Y_1, ..., Y_n are independent Poisson-distributed random variables, Y_i ~ POI(λx_i).
(a) Find the LS estimator of λ.
(b) Find the ML estimator of λ.
(c) Are both the LS and ML estimators unbiased for λ?
(d) Find the variances of both estimators.

22. Assume that Y_1, ..., Y_n are uncorrelated random variables with means E(Y_i) = β_0 + β_1 x_i, and let w_1, ..., w_n be known positive constants.
(a) Derive the solutions β_0 = β̃_0 and β_1 = β̃_1 that minimize the sum S_w = Σ w_i[y_i − β_0 − β_1 x_i]². The solutions β̃_0 and β̃_1 are called weighted least squares estimates.
(b) Show that if Var(Y_i) = σ²/w_i for each i = 1, ..., n, and some unknown σ² > 0, then β̃_0 and β̃_1 are the BLUEs of β_0 and β_1. Hint: Use Theorem 15.4.2.
23. Let X_1, ..., X_n be a random sample from EXP(θ, η) and let Z_{i:n} = (X_{i:n} − η)/θ.
(a) Show that E(Z_{k:n}) = Σ_{i=1}^k 1/(n − i + 1).
(b) Show that a_ij = Cov(Z_{i:n}, Z_{j:n}) = Σ_{h=1}^m 1/(n − h + 1)², where m = min(i, j).
(c) Show that, with A = {a_ij}, (A⁻¹)_ii = (n − i + 1)² + (n − i)², i = 1, ..., n; (A⁻¹)_{i,i+1} = (A⁻¹)_{i+1,i} = −(n − i)², i = 1, ..., n − 1; and all other elements equal zero.
(d) Show that the BLUEs, based on order statistics, are θ̂ = n(X̄ − X_{1:n})/(n − 1) and η̂ = (nX_{1:n} − X̄)/(n − 1).
(e) Compare the estimators in (d) to the MLEs.

24. Assume that the data of Example 4.6.3 are the observed values of order statistics for a random sample from EXP(θ, η). Use the results of Exercise 23 to compute the BLUEs of θ and η.

25. Under the assumptions of Theorem 15.4.3, verify equation (15.4.11). Hint: Note that (Y − Xβ̂)'(Y − Xβ̂) = Y'(Y − Xβ̂) − β̂'[X'Y − (X'X)β̂] and make use of equation (15.4.7).
26. Assume that (X_1, X_2) ~ BVN(μ_1, μ_2, σ_1², σ_2², ρ). Recall that the joint MGF M(t_1, t_2) is given in Example 5.5.2. Show that if μ′ = (μ_1, μ_2), t′ = (t_1, t_2), and V = {Cov(X_i, X_j)}, then the joint MGF can be written M(t_1, t_2) = exp[t′μ + (1/2)t′Vt].

27. Using advanced matrix theory, it can be shown that the joint MGF of a vector of k-variate normal random variables, X′ = (X_1, ..., X_k), has the form given in Exercise 26, namely M(t_1, ..., t_k) = exp[t′μ + (1/2)t′Vt] with μ′ = (μ_1, ..., μ_k) and t′ = (t_1, ..., t_k). Assuming this result, show the following:
(a) The marginal distribution of each component is normal, X_i ~ N(μ_i, σ_i²).
(b) If a′ = (a_1, ..., a_k) and b′ = (b_1, ..., b_k), and U = a′X and W = b′X, then U and W have a bivariate normal distribution.
(c) U and W are independent if and only if a′Vb = 0.

28. Using only the first eight pairs of hydrocarbon emission data of Example 15.3.1:
(a) Compute the LS estimates of β_0, β_1, and β_2 for the second-degree polynomial model; also compute the unbiased estimate σ̂².
(b) Express the regression function based on the smaller set of data and sketch its graph. Compare this to the function ŷ, which was based on all n = 11 pairs. Does it make sense in either case to predict hydrocarbon emissions past the range of the data, say for x = 60,000 miles?
(c) Assuming the conditions of Theorem 15.4.3, compute a 95% confidence interval for β_0 based on the estimate from (a).
(d) Repeat (c) for β_1.
(e) Repeat (c) for β_2.
(f) Compute a 95% confidence interval for σ².

29.
Verify equation (15.4.15).
Hint: In the formula for S(β), add and subtract Xβ̂ in the expression Y − Xβ and then simplify.
30. Sketch the confidence contour for the coefficients β_0 and β_1 in the simple linear model, using the baseball data of Exercise 18. Use γ = .95.

31. Test the hypothesis H_0: β_2 = 0 concerning the coefficients of the second-degree polynomial model in Exercise 28. Use α = .005.

32. Compute the sample correlation coefficient for the hockey goal data of Exercise 2, with x_i = average goals at home and y_i = average goals away for the ith team.
33. Under the assumptions of Theorem 15.5.2 and using the data of Exercise 2, perform a size α = .10 test of H_0: ρ = 0 versus H_a: ρ ≠ 0. Note: This assumes that the pairs (X_i, Y_i) are identically distributed from one team to the next, which is questionable, but we will assume this for the sake of the problem.

34. Using the hockey goal data, and assuming bivariate normality, construct 95% confidence limits for ρ.

35. For the hockey goal data, assuming bivariate normality, test each of the following hypotheses at level α = .05: H_0: μ_1 = μ_2 versus H_a: μ_1 ≠ μ_2, and H_0: σ_1² = σ_2² versus H_a: σ_1² ≠ σ_2².
CHAPTER 16

RELIABILITY AND SURVIVAL DISTRIBUTIONS

16.1
INTRODUCTION

Many important statistical applications occur in the area of reliability and life testing. If the random variable X represents the lifetime or time to failure of a unit, then X will assume only nonnegative values. Thus, distributions such as the Weibull, gamma, exponential, and lognormal distributions are of particular interest in this area. The Weibull distribution is a rather flexible two-parameter model, and it has become the most important model in this area. One possible theoretical justification for this in certain cases is that it is a limiting extreme value distribution.
One aspect of life testing that is not so common in other areas is that of censored sampling. If a random sample of n items is placed on life test, then the first observed failure time is automatically the smallest order statistic, x_{1:n}. Similarly, the second recorded failure time is x_{2:n}, and so on. If the experiment is terminated after the first r ordered observations are obtained, then this is referred to as Type II censored sampling on the right. If for some reason the first s
ordered observations are not available, then this is referred to as Type II censored sampling on the left. If the experiment is terminated after a fixed time, x_0, then this is known as Type I censored sampling, or sometimes truncated sampling. If all n ordered observations are obtained, then this is called complete sampling.
Because the observations are naturally ordered, not all the information is lost for the censored items. It is known that items censored on the right have survived at least until time x_0. Also, a great savings in time may result from censoring. If 100 light bulbs are placed on life test, then the first 50 may fail in one year, whereas it may take 20 years for the last one to fail. Similarly, if 50 light bulbs are placed on test, then it may take 10, 15, or 20 years for all 50 to fail; yet the first 50 failure times from a sample of size 100, obtained in one year, may contain as much information in some cases as the 50 failure times from a complete sample of size 50. The expected length of experiment required to obtain the first r ordered observations from a sample of size n is E(X_{r:n}). These values can be compared for different values of r and n for different distributions.
If a complete random sample is available, then statistical techniques can be expressed in terms of either the random sample or the associated order statistics. However, if a censored sample is used, then the statistical techniques and distributional results must be developed in terms of the order statistics.
16.2
RELIABILITY CONCEPTS

If a random variable X represents the lifetime or time to failure of a unit, then the reliability of the unit at time t is defined to be

R(t) = P[X > t] = 1 − F(t)   (16.2.1)

The same function, with the notation S(x) = 1 − F(x), is called the survivor function in biomedical applications.

Properties of a distribution that we previously studied, such as the mean and variance, remain important in the reliability area, but an additional property that is quite useful is the hazard function (HF) or failure-rate function. The hazard function, h(x), for a pdf f(x) is defined to be

h(x) = f(x)/[1 − F(x)] = −R′(x)/R(x) = −d[log R(x)]/dx   (16.2.2)
The HF may be interpreted as the instantaneous failure rate, or the conditional density of failure at time x, given that the unit has survived until time x:

f(x | X ≥ x) = F′(x | X ≥ x)
  = lim_{Δx→0} [F(x + Δx | X ≥ x) − F(x | X ≥ x)]/Δx
  = lim_{Δx→0} P[x ≤ X < x + Δx | X ≥ x]/Δx
  = lim_{Δx→0} P[x ≤ X < x + Δx, X ≥ x]/(Δx P[X ≥ x])
  = lim_{Δx→0} P[x ≤ X < x + Δx]/(Δx [1 − F(x)])
  = f(x)/[1 − F(x)] = h(x)   (16.2.3)
An increasing HF at time x indicates that the unit is more likely to fail in the next increment of time (x, x + Δx) than it would be in an earlier interval of the same length. That is, the unit is wearing out or deteriorating with age. Similarly, a decreasing HF means that the unit is improving with age. A constant hazard function occurs for the exponential distribution, and it reflects the no-memory property of that distribution mentioned earlier. If X ~ EXP(θ), then

h(x) = f(x)/[1 − F(x)] = [(1/θ)e^{−x/θ}]/e^{−x/θ} = 1/θ
In this case the failure rate is the reciprocal of the mean time to failure, and it does not depend on the age of the unit. This assumption may be reasonable for certain types of electrical components, but it would tend not to be true for mechanical components. However, the no-wearout assumption may be reasonable over some restricted time span. The exponential distribution has been an important model in the life-testing area, partly because of its simplicity. The Weibull distribution is a generalization of the exponential distribution, and it is much more flexible. If X ~ WEI(θ, β), then

h(x) = [(β/θ)(x/θ)^{β−1} e^{−(x/θ)^β}]/e^{−(x/θ)^β} = (β/θ)(x/θ)^{β−1}
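As a quick numerical sketch of the Weibull HF just derived (the particular θ and β values below are arbitrary illustrations, not from the text): β = 1 reproduces the constant exponential rate 1/θ, β > 1 gives an increasing HF, and β < 1 a decreasing one.

```python
import math

def weibull_hazard(x, theta, beta):
    """Weibull HF: h(x) = (beta/theta) * (x/theta)**(beta - 1)."""
    return (beta / theta) * (x / theta) ** (beta - 1)

# beta = 1 reduces to the constant exponential rate 1/theta
h_const = [weibull_hazard(x, 2.0, 1.0) for x in (0.5, 1.0, 5.0)]

# beta > 1 gives an increasing HF (wearout); beta < 1 a decreasing HF
h_up = [weibull_hazard(x, 2.0, 2.0) for x in (0.5, 1.0, 5.0)]
h_down = [weibull_hazard(x, 2.0, 0.5) for x in (0.5, 1.0, 5.0)]
```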
FIGURE 16.1   Weibull HFs, h(t) = (β/θ)(t/θ)^{β−1}
This reduces to the exponential case for β = 1. For β > 1, the Weibull HF is an increasing function of x; and for β < 1, the HF is a decreasing function of x. The Weibull HF is illustrated in Figure 16.1. The gamma distribution also is an important model in life testing. It is not easy to express its HF, but if X ~ GAM(θ, κ), then the HF is increasing for κ > 1 and decreasing for κ < 1. For κ > 1 the HF approaches 1/θ asymptotically from below, while for κ < 1 the HF approaches 1/θ asymptotically from above. This is substantially different from the Weibull distribution, where the HF approaches ∞ or 0 in these cases. The HF for a lognormal distribution is hump-shaped. Although the pdf's in these cases appear quite similar, they clearly have somewhat different characteristics as life-testing distributions; the HF is a very meaningful property for distinguishing between these densities. Indeed, specifying a HF completely determines the CDF, and vice versa.
Theorem 16.2.1 For any HF h(x), the associated CDF is determined by the relationship

F(x) = 1 − exp[−∫₀ˣ h(t) dt]   (16.2.4)

or

f(x) = h(x) exp[−∫₀ˣ h(t) dt]   (16.2.5)
Proof This result follows because

h(x) = −d[log R(x)]/dx

and

∫₀ˣ h(t) dt = −∫₀ˣ d log R(t) = −log R(x)

which gives

R(x) = 1 − F(x) = exp[−∫₀ˣ h(t) dt]   (16.2.6)
Note that a function must satisfy certain properties to be a HF.
Theorem 16.2.2 A function h(x) is a HF if and only if it satisfies the following properties:

h(x) ≥ 0, for all x   (16.2.7)

∫₀^∞ h(x) dx = ∞   (16.2.8)

Proof The properties are necessary because h(x) = f(x)/[1 − F(x)] ≥ 0 and

∫₀^∞ h(x) dx = −∫₀^∞ d[log R(x)] = −log R(x)|₀^∞ = ∞

The properties are sufficient because the resulting F(x) will be a valid CDF; that is, in terms of h(x),

F(0) = 1 − exp[−∫₀⁰ h(t) dt] = 0,   F(∞) = 1 − exp[−∫₀^∞ h(t) dt] = 1

and F(x) is an increasing function of x because ∫₀ˣ h(t) dt is an increasing function of x.
One typical life-testing form of HF is a U-shaped or bathtub shape. For example, a unit may have a fairly high failure rate when it is first put into operation, because of the presence of manufacturing defects. If the unit survives the early period, then a nearly constant HF may apply for some period, where the causes of failure occur "at random." Later on, the failure rate may begin to increase as wearout or old age becomes a factor. In life sciences, such early failures correspond to the "infant mortality" effect. Unfortunately, none of the common standard distributions will accommodate a U-shaped HF. Of course, following Theorem 16.2.1, an F(x) can be derived that has a specified U-shaped HF. Quite often it is possible to consider the analysis after some "burn-in" period has taken place, and then the more common distributions are suitable. The exponential distribution is used extensively in this area because of its simplicity. Although its applicability is somewhat limited by its constant HF, it may often be useful over a limited time span, as suggested above, and it is convenient for illustrating many of the concepts and techniques applicable to life testing. Also, the homogeneous Poisson process assumes that the times between occurrences of failures are independent exponential variables, as we shall see in Section 16.5. The preceding discussion refers to the time to failure of a nonrepairable system or the time to first failure of a repairable system. The times between failures of a
repairable system often would be related to more general stochastic processes. Note that the HF of a density should not be confused with the failure rate or failure intensity of a stochastic process, although there is a connection between the two for the case of a nonhomogeneous Poisson process. In that case the HF of the time to first failure is also the failure intensity of the process, although its interpretation is somewhat different depending on whether you are concerned with the continuing process or only the first failure.
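Theorem 16.2.1 makes the bathtub construction concrete: given any valid HF, equation (16.2.4) yields a CDF by integration. The sketch below uses a hypothetical U-shaped HF (the particular coefficients are invented for illustration) and checks numerically that the resulting F(x) behaves like a CDF.

```python
import math

def bathtub_hazard(t):
    # hypothetical U-shaped HF: early-failure term + constant part + wearout term
    return 0.5 * math.exp(-3.0 * t) + 0.1 + 0.02 * t ** 2

def cdf_from_hazard(h, x, steps=2000):
    """F(x) = 1 - exp(-integral_0^x h(t) dt), trapezoid rule (16.2.4)."""
    if x <= 0:
        return 0.0
    dt = x / steps
    integral = sum(0.5 * (h(i * dt) + h((i + 1) * dt)) * dt
                   for i in range(steps))
    return 1.0 - math.exp(-integral)

F_vals = [cdf_from_hazard(bathtub_hazard, x) for x in (0.0, 1.0, 2.0, 5.0, 10.0)]
```

With a constant hazard 1/θ the routine recovers the exponential CDF, which is a handy sanity check on the quadrature.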
PARALLEL AND SERIES SYSTEMS

Redundancies may be introduced into a system to increase its reliability. If X_i, i = 1, ..., m, denotes the lifetimes of m independent components connected in parallel, then the lifetime of the system, Y, is the maximum of the individual component lifetimes,

Y = max(X_i) = X_{m:m}

and

F_Y(y) = ∏_{i=1}^{m} F_i(y)

The distributions of maximums in general are not very convenient to work with, but the reliability of the system for a fixed time at least can be expressed. The reliability of the parallel system at time t, R_Y(t), in terms of the reliabilities of
the components, R_i(t), is given by

R_Y(t) = 1 − ∏_{i=1}^{m} [1 − R_i(t)]   (16.2.9)

If P_i is the probability that the ith component functions properly, then the probability that the parallel system of independent components functions properly is

P = 1 − ∏_{i=1}^{m} (1 − P_i)   (16.2.10)
For example, if X_i ~ EXP(θ_i), then the reliability of the system at time t is

R_Y(t) = 1 − ∏_{i=1}^{m} (1 − e^{−t/θ_i})   (16.2.11)
If the components all have a common mean, θ_i = θ, then it can be shown that the mean time to failure of the system is

E(Y) = E(X_{m:m}) = θ(1 + 1/2 + ··· + 1/m)   (16.2.12)

Thus the mean time to failure increases as each additional parallel component is added; however, the relative gain decreases as m increases.
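A small numerical sketch of equations (16.2.11) and (16.2.12), using arbitrary illustrative values of θ and m; the diminishing increments in mean time to failure as components are added are visible directly.

```python
import math

def parallel_reliability(t, thetas):
    """R_Y(t) = 1 - prod(1 - exp(-t/theta_i))   (16.2.11)."""
    prod = 1.0
    for th in thetas:
        prod *= 1.0 - math.exp(-t / th)
    return 1.0 - prod

def parallel_mttf(theta, m):
    """E(Y) = theta * (1 + 1/2 + ... + 1/m)   (16.2.12), identical components."""
    return theta * sum(1.0 / k for k in range(1, m + 1))

mttfs = [parallel_mttf(100.0, m) for m in (1, 2, 3, 4)]
gains = [b - a for a, b in zip(mttfs, mttfs[1:])]   # shrinking relative gain
```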
To illustrate the effect on the HF from using a parallel system, consider a parallel system of two independent identically distributed components, X_i ~ F_X(x). The HF for the system, h_Y(y), in terms of the HF, h_X(y), of each component is

h_Y(y) = f_Y(y)/[1 − F_Y(y)] = 2f_X(y)F_X(y)/[1 − F_X²(y)] = [2F_X(y)/(1 + F_X(y))] h_X(y)   (16.2.13)
For positive continuous variables, the term in brackets goes from 0 to 1 as y goes from 0 to ∞. The failure rate of the parallel system is always less than the failure rate of the individual components, but it approaches the failure rate of an individual component as y → ∞. The HF of a system connected in series is somewhat easier to express. If X_i denotes the failure times of m independent components connected in series, then the failure time of the system, W, is the minimum of those of the individual components,

W = X_{1:m}
In this case

F_W(w) = P[W ≤ w] = 1 − P[W > w] = 1 − P[all X_i > w] = 1 − ∏_{i=1}^{m} [1 − F_i(w)]   (16.2.14)

In terms of the reliability of the system at time w,

R_W(w) = ∏_{i=1}^{m} R_i(w)   (16.2.15)

If one is simply interested in proper functioning of some sort, then for the series system

P = ∏_{i=1}^{m} P_i   (16.2.16)
where P_i is the probability of proper functioning of each component. Clearly the reliability of the system decreases as more components are added. In terms of HFs, with h_i the HF for the ith component,

R_W(w) = 1 − F_W(w) = ∏_{i=1}^{m} R_i(w) = exp[−∫₀^w Σ_{i=1}^{m} h_i(z) dz] = exp[−∫₀^w h_W(z) dz]   (16.2.17)

thus

h_W(w) = Σ_{i=1}^{m} h_i(w)   (16.2.18)
That is, the HF of the system is the sum of the HFs of the individual components. If X_i ~ EXP(θ_i), then

h_W(w) = Σ_{i=1}^{m} h_i(w) = Σ_{i=1}^{m} 1/θ_i = c   (16.2.19)

Because h_W(w) is constant, this implies that W ~ EXP(1/c).
If θ_i = θ, then

E(W) = θ/m   (16.2.20)
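The claim that W ~ EXP(1/c) can be checked by simulation. The sketch below draws the minimum of m identical exponential components and compares the sample mean with θ/m from (16.2.20); the parameter values and trial count are arbitrary.

```python
import random

random.seed(1)

# m identical EXP(theta) components in series: W = min(X_1, ..., X_m)
theta, m, trials = 10.0, 4, 200_000

total = 0.0
for _ in range(trials):
    total += min(random.expovariate(1.0 / theta) for _ in range(m))
mean_w = total / trials          # should be near theta/m = 2.5

rate = m / theta                 # h_W = sum of the m component HFs (16.2.19)
```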
16.3
EXPONENTIAL DISTRIBUTION: COMPLETE SAMPLES

If X_i ~ EXP(θ), then we know that X̄ is the MLE and the UMVUE of θ, the mean of the random variable. The MLE of the reliability at time t is

R̂(t) = e^{−t/θ̂} = e^{−t/x̄}   (16.3.1)

The MLE of R(t) is not unbiased, and it can be verified that the UMVUE is given by

R̃(t) = [1 − t/(nx̄)]^{n−1},   nx̄ > t   (16.3.2)

and R̃(t) = 0 otherwise.
The MLE does have smaller MSE except when t/(nx̄) is relatively large or close to zero. Tests of hypotheses about θ, or monotonic functions of θ such as the reliability or the HF, are carried out easily based on the property that 2nX̄/θ ~ χ²(2n).

We already have seen that a one-sided confidence interval on a percentile is also a one-sided tolerance interval. There also is a close connection between tolerance limits and confidence limits on reliability. For a lower (γ, p) tolerance limit, L(X; γ, p), the content p is fixed and the lower limit is a random variable. For a lower confidence limit on reliability, R_L(t; γ), the lower limit t is fixed and the proportion R_L is a random variable. However, for a given set of sample data, if one determines the value p* that results in L(x; γ, p*) = t, then p* = R_L(t; γ).
If p* is a random variable as defined above, and if R(t) = p and t = x_{1−p}, then p ≥ p* if and only if L(x; γ, p) ≤ L(x; γ, p*) = t, because increasing the content decreases the lower limit. Thus

P[R(t) ≥ p*] = P[p ≥ p*]
  = P[L(X; γ, p) ≤ L(X; γ, p*)]
  = P[L(X; γ, p) ≤ t]
  = P[L(X; γ, p) ≤ x_{1−p}]
  = γ   (16.3.3)
For the exponential distribution

x_{1−p} = −θ ln p

and a γ probability tolerance limit for proportion p is

L(x; γ, p) = −θ_L ln p   (16.3.4)

where θ_L = 2nx̄/χ²_γ(2n) is the lower γ-level confidence limit for θ.

To obtain a lower confidence limit on the reliability at time t, we may set t = L(x; γ, p*) = −θ_L ln p*, and

R_L(t) = p* = e^{−t/θ_L} = exp[−tχ²_γ(2n)/(2nx̄)]   (16.3.5)
is a lower γ-level confidence limit for R(t). Of course, this could be obtained directly in this case, because

R(t) = e^{−t/θ}

which is a monotonically increasing function of θ, so

γ = P[θ_L ≤ θ] = P[e^{−t/θ_L} ≤ e^{−t/θ}] = P[R_L(t) ≤ R(t)]

as before.
Example 16.3.1 Consider the following 30 failure times (in flying hours) of airplane air conditioners (Proschan, 1963):

23, 261, 87, 7, 120, 14, 62, 47, 3, 95, 225, 71, 246, 21, 42, 20, 5, 12, 120, 11, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52

The MLE of θ is θ̂ = x̄ = 59.6. The MLE of the HF is ĥ(x) = 1/θ̂ = 0.017, and the MLE of the reliability at time t = 20 hours is R̂(20) = e^{−20/59.6} = 0.715. A lower
0.95 confidence limit for θ is

θ_L = 2nx̄/χ²_{0.95}(2n) = 60(59.6)/79.08 = 45.22

A 95% lower confidence limit for the reliability at t = 20 hours is

R_L = e^{−t/θ_L} = e^{−20/45.22} = 0.643
We are 95% confident that 64.3% of the air conditioners will last at least 20 hours. If we are interested in 90% reliability, then we may set p = 0.90 and determine the lower tolerance limit L(X; 0.95, 0.90) such that

P[R(L(X)) ≥ 0.90] = 0.95

We have

L(x; 0.95, 0.90) = −θ_L ln(0.90) = 45.22(0.105) = 4.75

We are 95% confident that 90% of the air conditioners will last at least 4.75 hours.
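The arithmetic of Example 16.3.1 is easy to reproduce; note that the chi-square percentile χ²_{0.95}(60) = 79.08 is taken from tables here (an assumed input, not computed):

```python
import math

# Example 16.3.1: n = 30 air conditioner failure times, xbar = 59.6
n, xbar, t = 30, 59.6, 20.0
chi2_95_60 = 79.08                         # chi2_{0.95}(60), from tables

theta_L = 2 * n * xbar / chi2_95_60        # lower 95% limit for theta
R_L = math.exp(-t / theta_L)               # lower 95% limit for R(20), (16.3.5)
tol_limit = -theta_L * math.log(0.90)      # lower (0.95, 0.90) tolerance limit
```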
Many of the results for complete samples can be extended to censored samples. We will make use of Theorems 6.5.4 and 6.5.5, with the notation X_{i:n} representing the ith order statistic for a random sample of size n.
TYPE II CENSORED SAMPLING

The special properties of the exponential distribution make it a particularly convenient model for analyzing censored data. The joint density function of the r smallest order statistics from a sample of size n is given by

g(x_{1:n}, ..., x_{r:n}; θ) = [n!/(n − r)!] θ^{−r} exp{−[Σ_{i=1}^{r} x_{i:n} + (n − r)x_{r:n}]/θ}   (16.3.6)
A useful property of the exponential distribution is that differences of consecutive order statistics are distributed as independent exponential variables.
Theorem 16.3.1 Let Y_i = X_i/θ ~ EXP(1), i = 1, ..., n, be n independent exponential variables, and let

W_1 = Y_{1:n},  W_2 = Y_{2:n} − Y_{1:n},  ...,  W_n = Y_{n:n} − Y_{n−1:n}

then
1. W_1, ..., W_n are independent.
2. W_i ~ EXP(1/(n − i + 1)).
3. The kth order statistic may be expressed as a linear combination of independent exponential variables, Y_{k:n} = Σ_{j=1}^{k} Z_j/(n − j + 1), where Z_j ~ EXP(1).
4. E(Y_{k:n}) = Σ_{j=1}^{k} (n − j + 1)^{−1}.
Proof

f(y_{1:n}, ..., y_{n:n}) = n! exp[−Σ_{i=1}^{n} y_{i:n}] = n! exp[−Σ_{i=1}^{n} (n − i + 1)(y_{i:n} − y_{i−1:n})]

where y_{0:n} = 0. Consider the joint transformation

w_i = y_{i:n} − y_{i−1:n},   i = 1, ..., n

with inverse transformation

y_{1:n} = w_1,  y_{2:n} = w_1 + w_2,  ...,  y_{n:n} = w_1 + ··· + w_n

with Jacobian J = 1. This gives

f(w_1, ..., w_n) = n! exp[−Σ_{i=1}^{n} (n − i + 1)w_i] = ∏_{i=1}^{n} (n − i + 1)e^{−(n−i+1)w_i},   w_i > 0

Thus we recognize that the W_i are independent exponential variables as stated in parts 1 and 2. Also, from the above transformation it follows that we may express Y_{k:n} as

Y_{k:n} = W_1 + ··· + W_k   (16.3.7)

but Z_i ~ EXP(1) implies Z_i/(n − i + 1) ~ EXP(1/(n − i + 1)), so part 3 follows, and part 4 follows immediately from part 3.
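Part 4 of the theorem is easy to check by simulation: the sample mean of Y_{k:n} over many replications should match Σ_{j=1}^{k} 1/(n − j + 1). The n, k, and replication count below are arbitrary choices for illustration.

```python
import random

random.seed(7)

n, k, trials = 5, 3, 100_000
total = 0.0
for _ in range(trials):
    y = sorted(random.expovariate(1.0) for _ in range(n))
    total += y[k - 1]                     # Y_{k:n}
sim_mean = total / trials

# part 4: E(Y_{k:n}) = sum_{j=1}^{k} 1/(n - j + 1)
exact = sum(1.0 / (n - j + 1) for j in range(1, k + 1))
```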
Let us now consider the problem of estimation and hypothesis testing for the Type II censored sampling case. The MLE of θ, based on the joint density g(x_{1:n}, ..., x_{r:n}; θ) of the first r order statistics, is easily seen to be

θ̂ = [Σ_{i=1}^{r} x_{i:n} + (n − r)x_{r:n}]/r   (16.3.8)
The properties of θ̂ for the censored sampling case are amazingly similar to those for the complete sample case.

Theorem 16.3.2 Let θ̂ denote the MLE based on the first r order statistics from a sample of size n from EXP(θ). Then
1. θ̂ is a complete, sufficient statistic for θ.
2. θ̂ is the UMVUE of θ.
3. 2rθ̂/θ ~ χ²(2r).

Proof Because g(x_{1:n}, ..., x_{r:n}; θ) is a member of the (multivariate) exponential class, it follows that θ̂ is a complete, sufficient statistic for θ. By uniqueness, θ̂ will be the UMVUE of θ if it is unbiased. The unbiasedness of θ̂ will follow easily from part 3. To verify part 3, note that θ̂ may be rearranged to obtain

2rθ̂/θ = Σ_{i=1}^{r} 2(n − i + 1)(X_{i:n} − X_{i−1:n})/θ = Σ_{i=1}^{r} 2(n − i + 1)W_i = Σ_{i=1}^{r} 2Z_i

where X_{0:n} = 0, the W_i are as defined in Theorem 16.3.1, and the Z_i = (n − i + 1)W_i ~ EXP(1) are independent. Thus part 3 follows.
Recall in the complete sample case that

2nθ̂/θ ~ χ²(2n)   (16.3.9)

This is a very unusual situation, in which the censored sampling results essentially are identical to the complete sample results if n is replaced by r. Indeed, all of the confidence intervals and tests described earlier also apply to the Type II censored sampling case by replacing n by r. The disadvantage of censored sampling is the extra cost involved with sampling the additional n − r items and placing them on test. The principal advantage is that it may take much less time for the first r failures of n items to occur than for all r items in a random sample of size r. Yet the efficiency and precision involved in the two cases are exactly the same. The relative expected experiment
time in the two cases may be expressed as

REET = E(X_{r:n})/E(X_{r:r}) = Σ_{i=1}^{r} (n − i + 1)^{−1} / Σ_{i=1}^{r} (r − i + 1)^{−1}   (16.3.10)

A few values of REET are given below for r = 10 to illustrate the substantial savings in time that may be realized:

n:      11    12    13    15    20    30
REET:  0.69  0.55  0.46  0.35  0.23  0.14

A reasonable approach is to choose r and n to minimize the expected value of some cost function involving the cost of the units and the cost related to the length of the experiment.
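Equation (16.3.10) depends on θ only through a common factor that cancels, so REET reduces to a ratio of two harmonic-type sums. The sketch below reproduces the r = 10 table:

```python
def reet(r, n):
    """(16.3.10): E(X_{r:n}) / E(X_{r:r}); the common factor theta cancels."""
    num = sum(1.0 / (n - i + 1) for i in range(1, r + 1))
    den = sum(1.0 / (r - i + 1) for i in range(1, r + 1))
    return num / den

table = {n: round(reet(10, n), 2) for n in (11, 12, 13, 15, 20, 30)}
```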
Example 16.3.2 Consider again the data in Example 16.3.1. As mentioned earlier, in many life-testing cases the data will occur naturally ordered. This was perhaps not the case for the air conditioner data, but let us consider an analysis based on the 20 smallest ordered observations for illustration purposes. These are

1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21, 23, 42, 47, 52, 62

The MLE and UMVUE of θ is θ̂ = 51.1, based on these 20 observations. The MLE for reliability at time t = 20 is now R̂(20) = 0.676. A lower 95% confidence limit for θ is

θ_L = 2rθ̂/χ²_{0.95}(2r) = 40(51.1)/55.76 = 36.66

A 95% lower confidence limit for reliability at t = 20 hours is

R_L = e^{−t/θ_L} = e^{−20/36.66} = 0.58

The lower (0.95, 0.90) tolerance limit becomes

L(x; 0.95, 0.90) = −θ_L ln(0.90) = 36.66(0.105) = 3.85

There are, of course, some differences between these numbers and the complete sample numbers because of random variation, but the censored values have the same accuracy as if they had been based on a complete sample of size 20.
TYPE I CENSORED SAMPLING

Statistical analyses based on Type I censored sampling generally are more complicated than for the Type II case. In this case the length of the experiment, say t, is fixed, but the number of values observed in time t is a random variable. The number of items failing before time t follows a binomial distribution,

R ~ BIN(n, p)

where p = F(t) = 1 − e^{−t/θ}, when sampling from an exponential distribution. Now the distribution of an exponential variable X, given that X ≤ t, is a truncated exponential distribution,

F_t(x) = P[X ≤ x | X ≤ t] = P[X ≤ x]/P[X ≤ t] = (1 − e^{−x/θ})/(1 − e^{−t/θ}),   0 < x ≤ t   (16.3.11)

Also

f_t(x) = f(x)/F(t) = e^{−x/θ}/[θ(1 − e^{−t/θ})],   0 < x ≤ t   (16.3.12)
Thus, given R = r, the conditional density of the first r failure times is equivalent to the joint density of an ordered random sample of size r from a truncated exponential distribution,

g(x_{1:n}, ..., x_{r:n} | R = r) = r! ∏_{i=1}^{r} f_t(x_{i:n}) = r! exp[−Σ_{i=1}^{r} x_{i:n}/θ] / [θ^r (1 − e^{−t/θ})^r]   (16.3.13)

The joint density of obtaining R = r ordered observations at the values x_{1:n}, ..., x_{r:n} before time t may be expressed as

f(x_{1:n}, ..., x_{r:n}, r) = g(x_{1:n}, ..., x_{r:n} | R = r) b(r; n, p)
  = [n!/(n − r)!] θ^{−r} exp{−[Σ_{i=1}^{r} x_{i:n} + (n − r)t]/θ}   (16.3.14)
Note that this joint density has exactly the same form as in the Type II case, with x_{r:n} replaced by t. In this case, the MLE of θ is

θ̂ = [Σ_{i=1}^{r} x_{i:n} + (n − r)t]/r   (16.3.15)

It is interesting to note that in both cases θ̂ is of the form T/r, where T represents the total surviving time of the n units on test until the termination of the experiment.

In this case θ̂ and r (or Σ x_{i:n} and r) are joint sufficient statistics for θ. Generally it is not clear how to develop optimal statistical procedures for a single parameter based on two sufficient statistics. Reasonably good inference procedures for θ can be based on R alone, even though this makes use of only the number of failures and not their values. In this case R ~ BIN(n, p), where p is a monotonic function of θ, so the usual binomial procedures can be adapted to apply to θ. Additional results are given by Bain and Engelhardt (1991).
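A minimal sketch of the Type I (time-truncated) MLE (16.3.15); the failure times, n, and truncation time below are hypothetical values chosen only to illustrate the arithmetic.

```python
def type1_mle(failures, n, t):
    """Type I censored MLE (16.3.15): [sum x_i + (n - r) t] / r, for r > 0."""
    r = len(failures)
    if r == 0:
        raise ValueError("no failures observed; the MLE is undefined")
    return (sum(failures) + (n - r) * t) / r

# hypothetical illustration: 4 of 10 units fail before termination at t = 50
theta_hat = type1_mle([12.0, 25.0, 31.0, 44.0], n=10, t=50.0)
```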
TYPE I CENSORED SAMPLING (WITH REPLACEMENT)

As we have seen, the manner in which the sampling is carried out affects the probability structure and statistical analysis. It may be of interest to consider the effect of sampling with replacement in these cases. Suppose that test equipment is available for testing n units simultaneously. It may make more economical use of the test equipment to replace failed items immediately after failure and continue testing. As the experiment continues, the failure times are measured from the start of the experiment, whether the failed item was an original or a replacement item. These failure times will be naturally ordered, but they are not the usual order statistics, so we denote these successive failure times by s_i, i = 1, 2, 3, .... Note that the number of observations may exceed n in this case.

A physical example of how such data may occur is as follows. Consider a system of n identical components in series, in which a component is replaced each time it fails. Each time a component fails, the system fails, so the system failure times would correspond to the type of observations described above. The special properties of the exponential distribution make the mathematics tractable for this type of sampling. In particular, this situation can be related to the Poisson process for the Type I censoring case. (See Section 16.5 for details.) Suppose that the successive failure times from n positions are recorded until a fixed time t. If the times between failures are independent exponential variables with mean θ, then the failure times for each position represent the occurrences of a Poisson process with λ = 1/θ. Now it follows from the properties of Poisson processes that if n independent Poisson processes with intensity parameters 1/θ
are occurring simultaneously, then the combined occurrences may be considered to come from a single Poisson process with intensity parameter ν = n/θ. Thus, for example, the number of failures in time t from the total experiment follows a Poisson distribution with parameter νt = nt/θ, R ~ POI(nt/θ), so

f_R(r) = e^{−nt/θ}(nt/θ)^r / r!,   r = 0, 1, ...   (16.3.16)
If we let T_1 = S_1 and T_i = S_i − S_{i−1}, i = 2, 3, ..., denote the interarrival times of occurrences from the n superimposed Poisson processes, then the joint density of T_1, ..., T_r is

f(t_1, ..., t_r) = ν^r exp[−ν Σ_{i=1}^{r} t_i]

and transforming gives the joint density of the first r successive failures,

f(s_1, ..., s_r) = ν^r exp{−ν[Σ_{i=2}^{r} (s_i − s_{i−1}) + s_1]} = ν^r e^{−νs_r}   (16.3.17)
where ν = n/θ. Now, for Type I censoring at time t, the likelihood function based on observing R = r failures at times s_1 < ··· < s_r in (0, t] is

f(s_1, ..., s_r, r) = P[S_{r+1} − S_r > t − s_r] f(s_1, ..., s_r) = e^{−ν(t−s_r)} ν^r e^{−νs_r} = ν^r e^{−νt},   0 < s_1 < ··· < s_r ≤ t,   r = 1, 2, ...   (16.3.18)
Also,

P[R = 0] = P[S_1 > t] = e^{−νt},   r = 0
It is interesting to note at this point that given R = r, the successive failure times are distributed as ordered uniform variables.
Theorem 16.3.3 Let events occur according to a Poisson process with intensity ν, and let S_1, S_2, ... denote the successive times of occurrence. Given that r events occurred in the interval (0, t), conditionally S_1, ..., S_r are distributed as ordered observations from the uniform distribution on (0, t).
Proof We have

f(s_1, ..., s_r | R = r) = f(s_1, ..., s_r, r)/f_R(r) = ν^r e^{−νt} / [e^{−νt}(νt)^r / r!] = r!/t^r,   0 < s_1 < ··· < s_r ≤ t   (16.3.19)

This is the joint density of the order statistics of a sample of size r from the uniform density f(x) = 1/t, 0 < x < t.

The MLE of θ is

θ̂ = nt/r,   if r > 0   (16.3.20)
in this case. The MLE is again in the form T/r where T = nt represents the total test time accrued by the items in the experiment before termination.
It is interesting that with replacement the statistic R now becomes a single sufficient statistic. Furthermore, R ~ POI(nt/θ), so that previously developed techniques for the Poisson distribution can be readily applied here to obtain tests or confidence intervals on θ. For example, a lower 1 − α level confidence limit for θ based on r is

θ_L = 2nt/χ²_{1−α}(2r + 2)   (16.3.21)

and an upper 1 − α level confidence limit for θ is

θ_U = 2nt/χ²_α(2r)   (16.3.22)

These again may be slightly conservative because of the discreteness of the Poisson distribution.
Example 16.3.3 Consider a chain with 20 links, with the failure time of each link distributed as EXP(θ). The chain is placed in service; each time a link breaks, it is replaced by a new one and the failure time is recorded. The experiment is conducted for 100 hours, and the following 25 successive failure times are recorded:

5.2, 13.6, 14.5, 14.6, 20.5, 38.4, 42.0, 44.5, 46.7, 48.5, 50.3, 56.4, 61.7, 62.9, 64.1, 67.1, 71.6, 79.2, 82.6, 83.1, 85.5, 90.8, 92.7, 95.5, 95.6
The MLE of θ is θ̂ = nt/r = 20(100)/25 = 80. Suppose that we wish to test H_0: θ ≥ 100 against H_a: θ < 100 at the α = 0.05 significance level. We have

2nt/θ_0 = 2(20)(100)/100 = 40 > χ²_{0.05}(2r) = 34.8

so H_0 cannot be rejected at the 0.05 level. A lower 95% confidence limit for θ is

θ_L = 2(20)(100)/χ²_{0.95}(52) = 57.3

Note that this set of data actually was generated from an exponential distribution with θ = 100.
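Example 16.3.3 reduces to two lines of arithmetic once a chi-square table value is supplied; χ²_{0.95}(52) ≈ 69.83 below is such a table value, assumed rather than computed:

```python
# Example 16.3.3: n = 20 links, t = 100 hours, r = 25 observed failures
n, t, r = 20, 100.0, 25
chi2_95_52 = 69.83                     # chi2_{0.95}(2r + 2), from tables

theta_hat = n * t / r                  # (16.3.20)
theta_L = 2 * n * t / chi2_95_52       # (16.3.21), lower 95% limit for theta
```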
TYPE II CENSORED SAMPLING (WITH REPLACEMENT)

Again suppose that n units are being tested with replacement, but that the experiment continues until r failures have occurred. As before, the ordered successive failure times s_i are measured from the beginning of the experiment, without regard to whether the failed units were original or replacement units.

If the failure time of an individual unit follows the exponential distribution EXP(θ), then the superimposed failures from the n positions may be considered to be the occurrences of a Poisson process with intensity parameter ν = n/θ, as discussed earlier for the time-truncated case. Thus, the interarrival times Y_1 = S_1, Y_i = S_i − S_{i−1}, i = 2, ..., r, are independent exponential variables with mean θ/n,

f(y_1, ..., y_r) = (n/θ)^r exp[−(n/θ) Σ_{i=1}^{r} y_i]

Transforming back to the s_i, with Jacobian J = 1,

f(s_1, ..., s_r) = (n/θ)^r e^{−ns_r/θ},   0 < s_1 < ··· < s_r   (16.3.23)
The MLE in this case is

   θ̂ = ns_r/r                                                       (16.3.24)

where ns_r is again the accrued survival time of all units involved in the test until the experiment ends. The likelihood function is a member of the multivariate exponential class, and S_r is a complete, sufficient statistic for θ. As noted above, in terms of the independent exponential variables Y_i,

   S_r = Σ_{i=1}^r Y_i
so

   2rθ̂/θ = 2nS_r/θ ~ χ²(2r)                                          (16.3.25)
This is again a somewhat astonishing result, because exactly the same result is obtained for Type II censored sampling without replacement, as well as for the complete sample case with n = r. It follows that all of the statistical analyses available for the complete sample case may be applied directly to the Type II censored sampling case with replacement by using the above θ̂ and replacing n by r in the previous formulas. It is clear that identical results may be achieved by placing r units on test and conducting the experiment with replacement until r units fail, or beginning the experiment with n units on test and conducting the experiment with replacement until r units fail. The expected experiment time in the latter case is

   E(S_{r,n}) = Σ_{i=1}^r E(Y_i) = rθ/n                               (16.3.26)

so the relative expected experiment time of the latter case to the first case is

   REET = E(S_{r,n})/E(S_{r,r}) = (rθ/n)/(rθ/r) = r/n                 (16.3.27)
Thus substantial savings in time may be achieved by beginning with additional units. The value of saving time must be weighed against the cost of testing additional units to decide on the appropriate censoring fraction to use. The relative experiment time gained by using replacement also can be determined by comparing the expected time required to obtain n failures from n units with replacement, E(S_{n,n}), with the expected time required to obtain n failures without replacement, E(X_{n:n}).
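The identity E(S_{r,n}) = rθ/n, and hence REET = r/n, can be checked by simulation. This is a sketch under the stated model (interarrival times that are independent EXP(θ/n) variables); the sample sizes and seed are arbitrary:

```python
import random

def simulate_experiment_time(n, r, theta, reps=20000, seed=1):
    """Monte Carlo estimate of E(S_{r,n}) for Type II censored sampling with
    replacement: S_{r,n} is a sum of r independent EXP(theta/n) interarrivals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += sum(rng.expovariate(n / theta) for _ in range(r))
    return total / reps

theta = 100.0
mean_n = simulate_experiment_time(n=40, r=20, theta=theta)   # about r*theta/n = 50
mean_r = simulate_experiment_time(n=20, r=20, theta=theta)   # about r*theta/r = 100
reet = mean_n / mean_r                                       # about r/n = 0.5
```

Starting with 40 units instead of 20 cuts the expected experiment time roughly in half, in agreement with (16.3.27).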
Example 16.3.4  Consider again the data given in Example 16.3.3, but suppose that Type II censored sampling with r = 20 was used. Based on the first 20 observations, the MLE of θ is

   θ̂ = 20(83.1)/20 = 83.1

To test H₀ : θ ≥ 100 against Hₐ : θ < 100 in this case, we consider

   2rθ̂/θ₀ = 2(20)(83.1)/100 = 33.2 > χ²₀.₀₅(40) = 26.5

so again H₀ cannot be rejected at the 0.05 level. Similarly, tests for reliability, tolerance limits, and so on can be carried out for this case.
16.4 WEIBULL DISTRIBUTION

We have seen that the exponential distribution is an important life-testing model that is very simple to analyze statistically. However, it is somewhat restrictive in that it is applicable only where a constant HF is reasonably appropriate. The Weibull distribution represents a generalization of the exponential distribution that is considerably more flexible, because it allows for either an increasing or a decreasing HF. The Weibull distribution has been shown empirically to provide a good model for a great many types of variables. Also recall that the Weibull distribution is one of the three limiting extreme-value distributions. This may provide some theoretical justification for its use in certain cases. For example, the strength of a long chain (or the failure time of a system in series) is equal to that of the weakest link or component, and the limiting distribution of the minimum is a Weibull distribution in many cases. Similarly, the breaking strength of a ceramic would be that of its weakest flaw.
Also recall that if X ~ WEI(θ, β), then Y = ln X ~ EV(1/β, ln θ). Thus, any statistical results developed for the Weibull distribution also can be applied easily to the Type I extreme-value model and vice versa. Indeed, in the Type I extreme-value notation, the parameters are location-scale parameters, so it often is more convenient to develop techniques in the extreme-value notation first. If Y ~ EV(δ, ξ), then

   F_Y(y) = 1 − exp[−exp((y − ξ)/δ)]    −∞ < y < ∞,  δ > 0

and X = e^Y ~ WEI(θ, β), where ξ = ln θ and δ = 1/β. For example, if x₁, ..., x_n represents a sample of size n from a Weibull distribution, then letting y_i = ln x_i, i = 1, ..., n, produces a sample from an extreme-value distribution. A test for ξ may be developed based on the y_i, and then this test could be restated in terms of θ = exp(ξ).
MAXIMUM LIKELIHOOD ESTIMATION  Let X ~ WEI(θ, β). Then the likelihood function for the first r ordered observations from a random sample of size n under Type II censored sampling is given by

   f(x_{1:n}, ..., x_{r:n}) = [n!/(n − r)!] [∏_{i=1}^r f_X(x_{i:n})][1 − F_X(x_{r:n})]^{n−r}

      = [n!/(n − r)!] (β/θ)^r [∏_{i=1}^r (x_{i:n}/θ)^{β−1}] exp{−[Σ_{i=1}^r (x_{i:n}/θ)^β + (n − r)(x_{r:n}/θ)^β]}      (16.4.1)
Setting the partial derivatives with respect to θ and β equal to zero gives the MLEs θ̂ and β̂ as solutions to the equations

   [Σ_{i=1}^r x_{i:n}^β̂ ln x_{i:n} + (n − r)x_{r:n}^β̂ ln x_{r:n}] / [Σ_{i=1}^r x_{i:n}^β̂ + (n − r)x_{r:n}^β̂] − 1/β̂ = (1/r) Σ_{i=1}^r ln x_{i:n}      (16.4.2)

and

   θ̂ = {[Σ_{i=1}^r x_{i:n}^β̂ + (n − r)x_{r:n}^β̂]/r}^{1/β̂}      (16.4.3)

For the special case of complete samples, where n = r, the equations reduce to

   Σ_{i=1}^n x_i^β̂ ln x_i / Σ_{i=1}^n x_i^β̂ − 1/β̂ = (1/n) Σ_{i=1}^n ln x_i      (16.4.4)

and

   θ̂ = [Σ_{i=1}^n x_i^β̂ / n]^{1/β̂}      (16.4.5)
In either case, the first equation cannot be solved in closed form. However, it has been shown that the MLEs are unique solutions of these equations. The Newton-Raphson procedure for solving an equation g(β) = 0 is to determine successive approximations β̂_{j+1} = β̂_j − g(β̂_j)/g′(β̂_j). Many other techniques also are available with a computer. Note that the MLEs for ξ and δ in the extreme-value notation are simply

   ξ̂ = ln θ̂        δ̂ = 1/β̂                                        (16.4.6)
It may initially seem unclear how to develop inference procedures based on the MLEs in this case. If the estimators cannot be expressed explicitly, then how can their distributions be determined? Two key factors are involved in determining distributional results in this case. These are the recognition of pivotal quantities and the ability to determine their distributions by Monte Carlo simulation. It follows from Theorem 11.3.2 for the extreme-value model with location-scale parameters that

   (ξ̂ − ξ)/δ̂    and    δ̂/δ                                          (16.4.7)
are pivotal quantities with distributions that do not depend on any unknown parameters. Thus, in the Weibull notation, it follows that

   (θ̂/θ)^β̂    and    β̂/β                                            (16.4.8)

are also pivotal quantities. For the Weibull case, the reliability at time t is given by R = R(t) = e^{−(t/θ)^β}. A pivotal quantity for R̂ is not available, but the distribution of R̂ depends only on R and not on t, θ, and β individually. This result is true in general for location-scale models (or related distributions such as the Weibull), but it is shown directly in this case, because

   −ln R̂ = (t/θ̂)^β̂ = [−ln R]^{β̂/β} / (θ̂/θ)^β̂                      (16.4.9)
which is a function only of R and the previous pivotal quantities. This result makes it possible to test hypotheses about R or set confidence intervals on R, if the distribution of R̂ can be obtained for various R values. Recognition of these pivotal quantity properties makes it quite feasible to determine percentiles for the necessary distributions by Monte Carlo simulation. For example, we may desire to know the percentile q_γ such that

   P[β̂/β ≤ q_γ] = γ

for some sample size n. Let us generate, say, 1000 random samples of size n from a standard Weibull distribution, WEI(1, 1), and compute the MLE of β for each, say β̂₁,₁. In particular, we could determine the number q̂_γ for which 100γ% of the calculated values of β̂₁,₁ were smaller than q̂_γ. Approximately, then,

   P[β̂₁,₁ ≤ q̂_γ] = γ

This approximation can be improved by increasing the number of simulated samples within the limits of the random number generator. Now, because the distribution of β̂/β does not depend on the values of the unknown parameters, the distribution of β̂/β is the same as the distribution of β̂₁,₁/1; thus, approximately,

   P[β̂/β ≤ q̂_γ] = γ

For example, within simulation error, β̂/q̂_γ is a lower 100γ% confidence limit for β.
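The Monte Carlo scheme just described can be sketched as follows for complete samples (500 replications instead of 1000 to keep the run short; the sample size and seed are arbitrary):

```python
import math
import random

def beta_mle(xs, iters=80):
    """Complete-sample Weibull shape MLE: solve (16.4.4) for beta by bisection."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)
    def g(b):
        w = [x**b for x in xs]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / b - mean_log
    lo, hi = 0.05, 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def mc_quantile(n, gamma, m=500, seed=2):
    """Monte Carlo percentile q_gamma of the pivotal quantity beta_hat/beta,
    estimated from m samples of size n drawn from WEI(1, 1)."""
    rng = random.Random(seed)
    stats = sorted(beta_mle([rng.weibullvariate(1.0, 1.0) for _ in range(n)])
                   for _ in range(m))
    return stats[int(gamma * m) - 1]

q90 = mc_quantile(n=20, gamma=0.90)
# A lower 90% confidence limit for beta from observed data xs is beta_mle(xs)/q90.
```

Because β̂/β is pivotal, the percentiles computed from WEI(1, 1) samples apply for any true (θ, β), which is exactly what makes the tabled Monte Carlo percentage points reusable.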
Tables of percentage points for the quantities √r(β̂/β − 1) and √r β̂ ln(θ̂/θ), and other tables for determining tolerance limits and confidence limits on reliability, are provided by Bain and Engelhardt (1991) for both complete and censored sampling cases.
ASYMPTOTIC RESULTS  Convergence is somewhat slow in the Weibull case, but for reasonably large n the asymptotic normality properties of the MLEs become useful. As n → ∞ and r/n → p, the following properties hold asymptotically:

   √r(β̂ − β)/β ~ N(0, a₂₂)                                           (16.4.10)

   √r β̂ ln(θ̂/θ) ~ N(0, a₁₁)                                          (16.4.11)

   √r[R̂(t) − R(t)] ~ N(0, V_R)                                        (16.4.12)

where a₁₁ = rβ² Var[ln(θ̂/θ)] and a₂₂ = r Var(β̂/β) are the asymptotic variances, a₁₂ is the corresponding asymptotic covariance (a₁₂ = r Cov[(ξ̂ − ξ)/δ, δ̂/δ] in the extreme-value notation), and

   V_R = R²(ln R)²[a₁₁ − 2a₁₂d + a₂₂d²]    with d = −ln(−ln R)        (16.4.13)
The a_ij are included in Table 15 (Appendix C) for censoring levels p = 0.1, 0.2, ..., 1.0. See also Harter (1969) for c_ij = (n/r)a_ij. Similar results hold for the extreme-value case where, asymptotically,

   √r(δ̂ − δ)/δ ~ N(0, a₂₂)                                           (16.4.14)

   √r(ξ̂ − ξ)/δ ~ N(0, a₁₁)                                           (16.4.15)

with a₁₁ = r Var[(ξ̂ − ξ)/δ], a₂₂ = r Var(δ̂/δ), and a₁₂ = r Cov[(ξ̂ − ξ)/δ, δ̂/δ]. For the Weibull distribution, it appears that convergence to normality occurs faster for an alternate pivotal quantity of the form

   W(d) = (ξ − ξ̂ + dδ̂)/δ                                             (16.4.16)
Confidence limits on θ, R(t), or percentiles, based on W(d), will be equivalent to those based directly on the earlier pivotal quantities. Johns and Lieberman (1966) consider confidence limits using W(d) based on simpler estimators, and Jones et al. (1984) develop limits essentially based on W(d) [see also Bain and Engelhardt (1986)]. Let w_γ(d) be the γ percentage point such that

   P[W(d) ≤ w_γ(d)] = γ

then the asymptotic normal approximation for w_γ(d) is

   w_γ(d) = d + z_γ σ_d                                               (16.4.17)

where rσ_d² = a₁₁ + d²a₂₂ − 2da₁₂ = A(d).
We now first consider approximate confidence limits on reliability. Note that in the extreme-value notation, the reliability at time y is related to the Weibull reliability as

   R_Y(y) = P[Y > y] = P[ln X > y] = P[X > e^y] = R_X(e^y)             (16.4.18)

and

   R_X(x) = R_Y(ln x)                                                  (16.4.19)
Now for a fixed time x, let y = ln x; then a lower γ confidence limit for R_X(x) is given by

   L(R_X(x)) = exp[−exp(ĉ + z_γ σ̂_ĉ)]                                 (16.4.20)

where ĉ = ln[−ln R̂_X(x)] = (y − ξ̂)/δ̂ = −d̂ and σ̂_ĉ = √(A(d̂)/r). This follows because

   P[L(R̂_X(x)) ≤ R_X(x)] = P{exp[−exp(ĉ + z_γ σ̂_ĉ)] ≤ exp[−(x/θ)^β]}
                          = P[ĉ + z_γ σ̂_ĉ ≥ β ln(x/θ)]
                          = P[(ξ − ξ̂ + d̂δ̂)/δ ≥ d̂ − z_γ σ̂_ĉ]
                          = P[W(d̂) ≥ w_{1−γ}(d̂)]
                          = E_d̂ P[W(d̂) ≥ w_{1−γ}(d̂) | d̂] = E_d̂(γ)
                          = γ

Approximate confidence limits for percentiles can be similarly determined. The 100α percentile for the Weibull distribution is given by
   x_α = θ[−ln(1 − α)]^{1/β} = exp(y_α)                                (16.4.21)

where

   y_α = ξ + δ ln[−ln(1 − α)] = ξ + δλ_α                               (16.4.22)

is the 100α percentile for the extreme-value distribution. For the special case α = 1 − 1/e, λ_α = 0, and these percentiles reduce to y_α = ξ and x_α = θ. Note that a lower γ level confidence limit for x_α, say L(x_α; γ), is also a γ-probability tolerance limit for proportion 1 − α for the Weibull distribution. For the extreme-value distribution,

   L(y_α; γ) = ln L(x_α; γ)                                            (16.4.23)
In terms of the pivotal quantity

   Q_α = (ŷ_α − y_α)/δ̂                                                (16.4.24)

a lower γ level confidence limit is

   L(y_α; γ) = ŷ_α − q_γ δ̂                                            (16.4.25)

where P[Q_α ≤ q_γ] = γ. The Monte Carlo value q_γ may be determined from Bain and Engelhardt (1991). In terms of W(d),

   P[Q_α ≤ q_γ] = γ

which gives w_{1−γ}(d) = −λ_α, where d = q_γ − λ_α. That is, q_γ = d + λ_α, where d is the solution to w_{1−γ}(d) = −λ_α. Using the asymptotic normal approximation for w_{1−γ}(d), we may solve for d to obtain

   d = {−a₁₂z_γ² − rλ_α + z_γ[(a₁₂² − a₁₁a₂₂)z_γ² + ra₁₁ + 2ra₁₂λ_α + ra₂₂λ_α²]^{1/2}} / (r − a₂₂z_γ²)      (16.4.26)
Then

   L(y_α; γ) = ŷ_α − (d + λ_α)δ̂ = ξ̂ − dδ̂                             (16.4.27)

and

   L(x_α; γ) = exp[L(y_α; γ)]                                           (16.4.28)

Lower confidence limits for ξ and θ are obtained by letting λ_α = 0 in computing d.
INFERENCES ON δ OR β  A chi-square approximation often is useful for positive variables. An approximate distribution for a variable U with two correct moments is achieved by considering

   cU ~ χ²(ν)                                                           (16.4.29)

where c and ν are chosen to satisfy cE(U) = ν and c² Var(U) = 2ν. Following along these lines, Bain and Engelhardt (1986) propose the simple approximate distributional result

   cr(δ̂/δ)^{1+p²} ~ χ²(c(r − 1))                                        (16.4.30)

where p = r/n, c = 2/[(1 + p²)²a₂₂], and a₂₂ is the asymptotic variance of √r(δ̂ − δ)/δ as n → ∞ and r/n → p. Values of a₂₂ are given in Table 15 (Appendix C), and
values of c also are included for convenience. The constant c makes this approximation become correct asymptotically. It is clear that inferences on β or δ can be carried out easily based on the above approximation. For example, a lower γ level confidence limit for δ is given by

   δ_L = δ̂[χ²_γ(c(r − 1))/cr]^{−1/(1+p²)}                               (16.4.31)

An upper limit is obtained by replacing γ with 1 − γ.
Example 16.4.1  The 20 smallest ordered observations from a simulated random sample of size 40 from a Weibull distribution with θ = 100 and β = 2 are given by Harter (1969) as follows:

   5, 10, 17, 32, 32, 33, 34, 36, 54, 55, 55, 58, 58, 61, 64, 65, 65, 66, 67, 68
It is possible to observe how the statistical results relate to the known model for this data. Also, Monte Carlo tables happen to be available for this particular sample size and censoring level, so the approximate results can be compared to the results that would be obtained from using the Monte Carlo tables in this case. For this set of data, r = 20, n = 40, p = 0.5, β̂ = 2.09, and θ̂ = 83.8. In the extreme-value notation, ξ̂ = ln θ̂ = 4.43 and δ̂ = 1/β̂ = 0.478. These also are the values that would be obtained if one directly computed the maximum likelihood estimates of ξ and δ in an extreme-value distribution based on the natural logarithms of the above data, y_i = ln x_i. Now an upper γ level confidence limit for β, such that P[β < β_U] = γ, is given by

   β_U = β̂ h₁(γ, p, r) = β̂[χ²_γ(c(r − 1))/cr]^{1/(1+p²)}               (16.4.32)

Similarly, a lower γ level confidence limit is given by

   β_L = β̂ h₁(1 − γ, p, r) = β̂[χ²_{1−γ}(c(r − 1))/cr]^{1/(1+p²)}       (16.4.33)

For γ = 0.95, based on the above data, c = 1.49 from Table 15 (Appendix C), and

   β_L = 2.09[χ²₀.₀₅(28.3)/29.8]^{1/1.25} = 1.34

and

   β_U = 2.09[χ²₀.₉₅(28.3)/29.8]^{1/1.25} = 2.73
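These limits can be scripted directly; a sketch using the Wilson-Hilferty approximation for the (fractional) chi-square degrees of freedom, with a₂₂ = 0.85809 taken from the Table 15 value quoted in this example:

```python
import math
from statistics import NormalDist

def chi2_ppf(p, df):
    # Wilson-Hilferty approximation (the text would use chi-square tables)
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3

def beta_limits(beta_hat, r, n, a22, gamma):
    """Approximate gamma-level lower/upper limits for the Weibull shape
    parameter via the chi-square approximation (16.4.30)-(16.4.33)."""
    p = r / n
    c = 2.0 / ((1.0 + p * p) ** 2 * a22)
    expo = 1.0 / (1.0 + p * p)
    lower = beta_hat * (chi2_ppf(1 - gamma, c * (r - 1)) / (c * r)) ** expo
    upper = beta_hat * (chi2_ppf(gamma, c * (r - 1)) / (c * r)) ** expo
    return lower, upper

# Example 16.4.1: r = 20, n = 40, beta_hat = 2.09, a22 = 0.85809 at p = 0.5
beta_L, beta_U = beta_limits(2.09, 20, 40, 0.85809, 0.95)
```

The computed interval agrees with the hand calculation (about 1.34 to 2.73) and with the Monte Carlo tables to two decimals.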
If the tables in Bain and Engelhardt (1991) based on Monte Carlo simulation are used, one obtains β_L = 1.34 and β_U = 2.72 in this case. Note that in the extreme-value notation, δ_L = 1/β_U and δ_U = 1/β_L. We now will illustrate the lower confidence limit for reliability at time t, R(t).
The true reliability at time t = 32.46 is 0.90 for a Weibull distribution with θ = 100 and β = 2. Thus, let us compute a lower confidence limit for R(32.46) based on the above data. The lower confidence limit for R_X(t) is given by

   L(R_X(t); γ) = exp[−exp(ĉ + z_γ σ̂_ĉ)]                               (16.4.34)

where ĉ = ln[−ln R̂_X(t)], d̂ = −ĉ, A = a₁₁ + d̂²a₂₂ − 2d̂a₁₂, σ̂_ĉ = √(A/r), and the a_ij are given in Table 15. We have

   R̂_X(32.46) = exp[−(32.46/83.8)^2.09] = 0.871

   ĉ = ln[−ln(0.871)] = −1.98,    d̂ = 1.98

   A = 1.25512 + (1.98)²(0.85809) − 2(1.98)(0.46788) = 2.766

and for, say, γ = 0.90, we have z_γ = 1.282 and

   L(R_X(32.46); 0.90) = exp[−exp(−1.98 + 1.282√(2.766/20))] = 0.801
Again, direct use of the Monte Carlo tables gives nearly the same result, 0.797. Considering the reliability at time t = 32.46 in the Weibull distribution is comparable to considering the reliability at y₀ = ln(32.46) = 3.48 in the analogous extreme-value model. Thus, for example, a 90% lower confidence limit for R_Y(3.48) is also

   L(R_Y(3.48); 0.90) = L(R_X(e^{3.48}); 0.90) = 0.801
We now will illustrate a tolerance limit or confidence limit on a percentile. The 100α percentiles, x_α for the Weibull distribution and y_α for the extreme-value distribution, are given in equations (16.4.21) and (16.4.22). If, for example, α = 0.10, then

   x̂₀.₁₀ = θ̂[−ln(1 − 0.10)]^{1/β̂} = 28.6

and

   ŷ₀.₁₀ = ln(x̂₀.₁₀) = 3.35

A lower γ level tolerance limit for proportion 1 − α for the extreme-value distribution is, from equation (16.4.27),

   L(y_α; γ) = ŷ_α − (d + λ_α)δ̂ = ξ̂ − dδ̂                              (16.4.35)

and for the Weibull distribution

   L(x_α; γ) = exp[L(y_α; γ)]                                            (16.4.36)

where λ_α = ln[−ln(1 − α)] and d is given by equation (16.4.26).

We have λ₀.₁₀ = ln[−ln(1 − 0.10)] = −2.25, and if we choose γ = 0.90 in our example, then z₀.₉₀ = 1.282, and

   d = {−0.46788(1.282)² − 20(−2.25) + 1.282[((0.46788)² − (1.25512)(0.85809))(1.282)²
        + 20(1.25512) + 2(20)(0.46788)(−2.25) + 20(0.85809)(−2.25)²]^{1/2}} / [20 − 0.85809(1.282)²] = 2.95

   L(y₀.₁₀; 0.90) = 3.35 − (2.95 − 2.25)(0.478) = 3.02
and

   L(x₀.₁₀; 0.90) = exp(3.02) = 20.4

Again, direct use of the Monte Carlo tables gives almost the same result, 20.3. A lower γ level confidence limit for the parameter ξ or θ is given by

   L(ξ; γ) = ξ̂ − dδ̂                                                    (16.4.37)

or

   L(θ; γ) = θ̂ e^{−dδ̂} = θ̂ e^{−d/β̂}                                   (16.4.38)

where α = 1 − 1/e and λ_α = 0 in computing d. Note that upper γ level confidence limits are given by replacing γ with 1 − γ:

   U(θ; γ) = L(θ; 1 − γ)                                                 (16.4.39)
Let us find a two-sided 90% confidence interval for θ. We must compute d with z₀.₉₅ = 1.645 and z₀.₀₅ = −1.645. This simply changes the sign of the term in brackets; thus the two values of d are given by

   d = {−0.46788(1.645)² ± 1.645[((0.46788)² − (1.25512)(0.85809))(1.645)² + 20(1.25512)]^{1/2}} / [20 − (0.85809)(1.645)²]
     = 0.372 or −0.515

This gives

   L(θ; 0.95) = 83.8e^{−0.372/2.09} = 70.1

and

   U(θ; 0.95) = 83.8e^{0.515/2.09} = 107.2

Thus a two-sided 90% confidence interval for θ is (70.1, 107.2), and a two-sided 90% confidence interval for ξ is (ln 70.1, ln 107.2) = (4.25, 4.67). The corresponding interval from the Monte Carlo tables is found to be (4.27, 4.71).
SIMPLE ESTIMATORS

Computation of the MLEs is relatively simple if a computer is available; however, it sometimes is more convenient to have simpler closed-form estimators available. A small set of sufficient statistics does not exist for the Weibull distribution, so it is not completely clear how to proceed with this model. We know that the MLEs are asymptotically efficient, and they are good estimators for small n except for their difficulty of computation. The MLEs also are somewhat biased for small n and heavy censoring.

Simpler unbiased estimators have been developed for the location-scale parameters of the extreme-value distribution (see Engelhardt and Bain, 1977b), and these can be applied to the Weibull parameters. These estimators are very similar to the MLEs, and for the most part can be used interchangeably with them. If an adjustment is made for the bias of the MLEs, then these two methods are essentially equivalent, particularly for the censored sampling case. The simple estimators still require some tabulated constants for their computation. Let x_{1:n}, ..., x_{r:n} denote the r smallest observations from a sample of size n from a Weibull distribution, and let y_i = ln x_{i:n} denote the corresponding ordered extreme-value observations. The simple estimators then are computed as follows:

1. Complete sample case, r = n:
   δ̃ = Σ_{i=s+1}^n (y_i − ȳ)/(nk_n)                                    (16.4.40)

   ξ̃ = ȳ + γδ̃                                                         (16.4.41)

where s = [0.84n] = largest integer ≤ 0.84n, ȳ is the mean of the y_i, and γ = 0.5772 is Euler's constant. Some values of k_n are provided in Table 16 (Appendix C).

2. Censored samples, r < n:
   δ̃ = [(r − 1)y_r − Σ_{i=1}^{r−1} y_i]/(nk_{r,n})                      (16.4.42)

   ξ̃ = ln θ̃ = y_r − c_{r,n} δ̃                                          (16.4.43)
Quadratic approximations for computing k_{r,n} and c_{r,n} are given by

   k_{r,n} ≈ k₀ + k₁/n + k₂/n²                                           (16.4.44)

   c_{r,n} = E(Y_r − ξ)/δ ≈ c₀ + c₁/n + c₂/n²                            (16.4.45)

where the coefficients are tabulated in Table 17 (Appendix C). These constants make δ̃ and ξ̃ unbiased estimators of δ and ξ. The values k₀ and c₀ are the asymptotic values as n → ∞ and r/n → p.
If one wishes to substitute simple estimators for the MLEs, then slightly improved results are obtained by using the following modified simple estimators:

   δ* = δ̃/[1 + Var(δ̃/δ)] = hδ̃/(h + 2)                                  (16.4.46)

   ξ* = ξ̃ − Cov(ξ̃/δ, δ̃/δ)δ*                                            (16.4.47)
where

   n Cov(ξ̃/δ, δ̃/δ) ≈ d₀ + d₁/n + d₂/n²

   h/n = 2/[n Var(δ̃/δ)] ≈ a₀ + a₁/n + a₂/n²
and the coefficients are included in Table 17 (Appendix C). Similarly, approximately debiased MLEs are given by

   β̂_U = (h − 2)β̂/(h + 2)        δ̂_U = (h + 2)δ̂/h                      (16.4.48)

   ξ̂_U = ξ̂ + Cov(ξ̂/δ, δ̂/δ)δ̂

Again, we have the approximations δ̂_U ≈ δ* and ξ̂_U ≈ ξ*.
16.5 REPAIRABLE SYSTEMS

Much of the theory of reliability deals with nonrepairable systems or devices, and it emphasizes the study of lifetime models. It is important to distinguish between models for repairable and nonrepairable systems. A nonrepairable system can fail only once, and a lifetime model such as the Weibull distribution provides the distribution of the time at which such a system fails. This was the situation in the earlier sections of this chapter. On the other hand, a repairable system can be repaired and placed back in service. Thus, a model for repairable systems must allow for a whole sequence of repeated failures.

One such model is the homogeneous Poisson process or HPP, which was introduced in Chapter 3. In this section we will consider additional properties of the HPP and discuss some more general processes that are capable of reflecting changes in the reliability of the system as it ages.

HOMOGENEOUS POISSON PROCESS  We denote by X(t) the number of occurrences (failures) in the time interval [0, t]. It was found in Theorem 3.2.4 that under the following conditions X(t) is an HPP:

1. X(0) = 0.
2. P[X(t + h) − X(t) = n | X(s) = m] = P[X(t + h) − X(t) = n] for all 0 ≤ s ≤ t and 0 < h.
3. P[X(t + Δt) − X(t) = 1] = λΔt + o(Δt) for some constant λ > 0.
4. P[X(t + Δt) − X(t) ≥ 2] = o(Δt).
In other words, if conditions 1 through 4 are satisfied, then

   P[X(t) = n] = e^{−λt}(λt)^n/n!

for all n = 0, 1, ..., and some λ > 0. Thus, X(t) ~ POI(μ), where μ = E[X(t)] = λt. The proportionality constant λ reflects the rate of occurrence or intensity of the Poisson process. Because λ is assumed constant over t, and the increments are independent, it turns out that one does not need to be concerned about the location of the interval under question, and the model X ~ POI(μ) is applicable for any interval of length t, [s, s + t], with μ = λt. The constant λ is the rate of occurrence per unit length, and the interval is t units long. This also is consistent with Theorem 3.2.4. In particular, the interval [0, t] can be represented as a union of n disjoint subintervals, each of length t_i. If Y = Σ_{i=1}^n X_i, where X_i is the number of occurrences in the ith subinterval, then μ = Σ μ_i = Σ λt_i = λt, and Y represents a Poisson variable with intensity rate λ relative to the interval of length t. That is, one can choose any interval, but the variable remains Poisson with the appropriate mean.
Example 16.5.1  Let X denote the number of alpha particles emitted from a bar of polonium in one second, and assume that the rate of emission is λ = 0.5 per second. Thus μ_X = 0.5(1) = 0.5, and the Poisson model for this variable would be

   f(x) = e^{−0.5}(0.5)^x/x!    x = 0, 1, ...

For example,

   P[X = 1] = f(1) ≈ 0.30

Let Y denote the number of emissions in an eight-second interval. One may consider Y = Σ_{i=1}^8 X_i with μ = Σ_{i=1}^8 (0.5) = 4, or one may consider the mean of Y as λt = 0.5(8) = 4. In any case,

   f(y) = e^{−4}4^y/y!    y = 0, 1, ...
In practice one may wish to estimate the value of μ from data. A frequency histogram also would be useful to help evaluate whether the Poisson model provides an appropriate distribution of probability. Rutherford and Geiger (1910) observed the number of emissions in 2608 intervals of 7.5 seconds each, with the results shown in Table 16.1. Note that y denotes the number of emissions in a 7.5-second interval in this example. The table indicates that no emissions were observed in 57 intervals, 1 emission was observed in 203 of the intervals, and so on. If we let Y denote the number of emissions in a 7.5-second period, and if we assume a Poisson model Y ~ POI(μ),
TABLE 16.1  Observed number of alpha particle emissions in 2608 intervals of 7.5 seconds each

   No. of Particles    No. of Intervals with    Estimated Expected
   Emitted, y          y Emissions, m_y         Numbers, e_y
    0                    57                       54.40
    1                   203                      210.52
    2                   383                      407.36
    3                   525                      525.50
    4                   532                      508.42
    5                   408                      393.52
    6                   273                      253.82
    7                   139                      140.32
    8                    45                       67.88
    9                    27                       29.19
   10                    10                       11.30
   11                     4                        3.97
   12                     2                        1.28
   13                     0                        0.52
                       2608
then it would be reasonable to estimate μ with the sample mean ȳ. In this example the data are grouped, so

   Σ_{y=0}^{13} y m_y = 0(57) + 1(203) + 2(383) + ··· = 10094

and μ̂ = ȳ = 10094/2608 = 3.870. Using the fitted model Y ~ POI(3.870), in 2608 observed intervals one would expect (2608)P[Y = 0] = (2608)e^{−3.870}(3.870)⁰/0! = 54.4 intervals with no emissions, (2608)P[Y = 1] = 210.52 intervals with 1 emission, and so on. These computed expected numbers are included in Table 16.1. The computed expected numbers appear to agree quite closely with the observed numbers, and the suggested Poisson model seems appropriate for this problem. More formal statistical tests for the goodness-of-fit of a model can be performed using the results of Section 13.7. If we combine the cases where y ≥ 11, the chi-square value is χ² = 12.97 and ν = 12 − 1 − 1 = 10. Because χ²₀.₉₀(10) = 15.99, we cannot reject the Poisson model at the α = 0.10 level.
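The fit and chi-square statistic can be reproduced directly from the observed counts in Table 16.1:

```python
import math

# Goodness-of-fit check for Table 16.1: fit the Poisson mean by the grouped
# sample mean, then pool the sparse cells y >= 11 before computing chi-square.
observed = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 2, 0]
total = sum(observed)                                          # 2608 intervals
mu_hat = sum(y * m for y, m in enumerate(observed)) / total    # about 3.870

def pois(y):
    return math.exp(-mu_hat) * mu_hat**y / math.factorial(y)

expected = [total * pois(y) for y in range(11)]
expected.append(total - sum(expected))                         # pooled cell y >= 11
obs = observed[:11] + [sum(observed[11:])]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
```

With 12 cells and one estimated parameter, ν = 12 − 1 − 1 = 10, and the computed χ² of about 12.97 falls below χ²₀.₉₀(10) = 15.99, matching the conclusion in the text.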
EXPONENTIAL WAITING TIMES With any Poisson process there is an associated sequence of continuous waiting times for successive occurrences.
Theorem 16.5.1  If events are occurring according to an HPP with intensity parameter λ, then the waiting time until the first occurrence, T₁, follows an exponential distribution, T₁ ~ EXP(θ) with θ = 1/λ. Furthermore, the waiting times between consecutive occurrences are independent exponential variables with the same mean time between occurrences, 1/λ.

Proof
The CDF of T₁ at time t is given by

   F₁(t) = P[T₁ ≤ t] = 1 − P[T₁ > t]

Now T₁ > t if and only if no events occur in the interval [0, t], that is, if X(t) = 0. Hence,

   F₁(t) = 1 − P[X(t) = 0] = 1 − e^{−λt}

which is an exponential CDF with mean 1/λ.

The proof of the second part is beyond the scope of this book (see Parzen, 1962, p. 135).
We see that the mean time to failure, θ, is inversely related to the failure intensity λ. The HPP assumptions are rather restrictive, but at least a very tractable and easily analyzed model is realized.
Theorem 16.5.2  If T_k denotes the waiting time until the kth occurrence in an HPP, then T_k ~ GAM(1/λ, k).

Proof
The CDF of T_k at time t is given by

   F_k(t) = 1 − P[T_k > t] = 1 − P[k − 1 or fewer occurrences in [0, t]]

          = 1 − Σ_{i=0}^{k−1} P_i(t) = 1 − Σ_{i=0}^{k−1} (λt)^i e^{−λt}/i!

which is the CDF of a gamma variable with parameters k and 1/λ.
This result also is consistent with the second part of Theorem 16.5.1; if we assume independent Y_i ~ EXP(1/λ), then

   T_k = Σ_{i=1}^k Y_i ~ GAM(1/λ, k)
It was observed earlier that the exponential distribution has a no-memory property. This is related to the assumption of a constant failure intensity λ, which implies that no wearout is occurring. It sometimes is said that the exponential distribution is applicable when the failures occur "at random" and are not affected by aging. We know that the "at random" terminology is related to the uniform distribution, but it is used in this framework for the following reason. If T₁, T₂, ... denote the successive times of occurrence of a Poisson process measured from time 0, then given that n events have occurred in the interval [0, t], the successive occurrence times T₁, ..., T_n are conditionally distributed as ordered observations from a uniform distribution on [0, t].
Example 16.5.2  Proschan (1963) gives the times of successive failures of the air conditioning system of each member of a fleet of Boeing 720 jet airplanes. The hours of flying time, y_i, between 30 failures on plane 7912 are listed below:

   23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21, 42, 20, 5, 12, 120, 11, 3, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52, 95

If we assume that the failures follow an HPP with intensity λ, then this set of data represents a random sample of size 30 from EXP(1/λ). Using the sample mean to estimate the population mean gives θ̂ = ȳ = 59.6 = 1/λ̂, and λ̂ = 1/59.6 = 0.0168.

Let X denote the number of failures for a 200-hour interval from this process. Then X follows a Poisson distribution with μ = λt = 200λ. If we wish to estimate λ using the Poisson count data, we first consider the successive failure times of the observed data, given by

   23, 284, 371, 378, 498, 512, 574, 621, 846, 917, 1163, 1184, 1226, 1246, 1251, 1263, 1383, 1394, 1397, 1411, 1482, 1493, 1507, 1518, 1534, 1624, 1625, 1641, 1693, 1788

Considering the first eight consecutive intervals of length 200 hours, the numbers of observed failures per interval are 1, 3, 3, 1, 2, 2, 7, 6. Thus, these eight values represent a sample of size eight from POI(200λ). Estimating the mean of the Poisson variable from these count data gives μ̂ = x̄ = 3.125 = 200λ̂, and λ̂ = 0.0156. The two estimates obtained for λ are quite consistent, although, of course, they are not identical.
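Both estimates of λ can be computed from the raw gaps:

```python
# Two estimates of the HPP intensity from the Proschan plane-7912 data
gaps = [23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21, 42, 20, 5,
        12, 120, 11, 3, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52, 95]

# (1) Exponential waiting times: lambda_hat = 1 / sample mean
theta_hat = sum(gaps) / len(gaps)                    # 59.6 hours
lam_exp = 1.0 / theta_hat                            # about 0.0168

# (2) Poisson counts in the first eight 200-hour intervals
times, s = [], 0
for g in gaps:
    s += g
    times.append(s)                                  # cumulative failure times
counts = [sum(1 for t in times if lo < t <= lo + 200)
          for lo in range(0, 1600, 200)]
lam_poi = (sum(counts) / len(counts)) / 200.0        # 3.125/200 = 0.015625
```

The waiting-time estimate uses all 30 observations while the count-based estimate discards some information by binning, which is one reason the two values differ slightly.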
NONHOMOGENEOUS POISSON PROCESS  The Poisson process is an important model for the failure times of a repairable system. In this terminology, the HPP assumptions imply that the time to first failure is a random variable that follows the exponential distribution, and also that the time between failures is an independent exponential variable. The assumption of a constant failure intensity parameter λ suggests that the system is being maintained and not wearing out or degrading. If the system is wearing out, then the model should be generalized to allow λ to be an increasing function of t. More generally, we might want to allow the intensity to be an arbitrary nonnegative function of t.

We can model this if, in part 3 of Theorem 3.2.4, we replace the constant λ with a function of t, denoted by λ(t). A similar derivation yields another type of Poisson process, known as a nonhomogeneous Poisson process (NHPP). If X(t) denotes the number of occurrences in a specified interval [0, t] for an NHPP, then it can be shown that

   X(t) ~ POI(μ(t))    where μ(t) = ∫₀ᵗ λ(s) ds

The CDF for the time to first occurrence, T₁, now becomes

   F₁(t) = 1 − exp[−μ(t)]

An important choice for a nonhomogeneous intensity function is

   λ(t) = (β/θ)(t/θ)^{β−1}

which gives

   μ(t) = (t/θ)^β

In this case the time to first occurrence follows a Weibull distribution, WEI(θ, β). This intensity function is an increasing function of t if β > 1 and a decreasing function of t if β < 1. The β < 1 case might apply to a developmental situation, in which the system is being improved over time. Note that the times between consecutive failures are not independent Weibull variables in this case.
COMPOUND POISSON PROCESS  It was noted earlier that one characteristic of the Poisson distribution is that the mean and variance have the same value. In some cases this property may not be valid, and a more flexible model is required. One type of generalization is to consider mixtures of distributions. For example, if a fraction p, called type 1, of the population follows POI(μ₁), and the fraction 1 − p, called type 2, follows POI(μ₂), then the CDF for the population distribution is given by

   F(x) = P[X ≤ x | type 1]P[type 1] + P[X ≤ x | type 2]P[type 2]
        = F₁(x; μ₁)p + F₂(x; μ₂)(1 − p)

The pdf would be a similar mixture of the two separate pdf's.
More generally, if the population is a mixture of k types, with fraction pᵢ following the pdf fᵢ(x), then

f(x) = Σᵢ₌₁ᵏ pᵢ fᵢ(x)

If the fᵢ(x) are all Poisson pdf's, but with differing means, then

f(x) = Σᵢ₌₁ᵏ pᵢ e^(−μᵢ) μᵢ^x / x!    x = 0, 1, ...    (16.5.1)
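Equation (16.5.1) can be checked numerically; in the sketch below the fractions pᵢ and means μᵢ are illustrative values, not data from the text. The mixture pmf sums to 1, its mean is Σ pᵢμᵢ, and its variance exceeds its mean — the overdispersion that motivates the compound model:

```python
import math

def poisson_pmf(x, mu):
    # log-space evaluation avoids overflow for large x
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def mixed_pmf(x, probs, means):
    # f(x) = sum_i p_i e^(-mu_i) mu_i^x / x!   (equation 16.5.1)
    return sum(p * poisson_pmf(x, m) for p, m in zip(probs, means))

probs = [0.5, 0.3, 0.2]   # illustrative fleet fractions p_i
means = [2.0, 5.0, 9.0]   # illustrative Poisson means mu_i

support = range(200)      # truncated support; the omitted tail is negligible here
total = sum(mixed_pmf(x, probs, means) for x in support)
mean = sum(x * mixed_pmf(x, probs, means) for x in support)
var = sum((x - mean) ** 2 * mixed_pmf(x, probs, means) for x in support)
```

Here the mean is 4.3 while the variance is about 11.5, so the single-Poisson property Var(X) = E(X) fails for the mixture.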
For example, suppose that a fleet of k types of airplanes is considered, with fraction pᵢ of type i. These also could be the same type of airplane used under k different conditions. Assume that the number of air conditioner failures in a specified interval [0, t] from an airplane of type i follows POI(μᵢ). Now, if an airplane is selected at random from this fleet of airplanes, then the number of air conditioner failures in time [0, t] for that airplane is a random variable that follows the mixed Poisson distribution given by equation (16.5.1). This situation is equivalent to assuming that, given μᵢ, the conditional density of the number of failures is POI(μᵢ), and that μ is a random variable that takes on the value μᵢ with probability pᵢ. In this example the μᵢ are fixed values, and the effect of drawing an airplane at random is to produce a random variable μ distributed over these values. Now we consider, at least conceptually, a large fleet of airplanes in which, for
any given airplane, the number of air conditioner failures in [0, t] follows POI(μ) with μ = λt; however, λ may be different from plane to plane. In particular, for this conceptually large population we assume that λ is a continuous random variable that follows a gamma distribution, GAM(γ, κ). That is,

f_Λ(λ) = λ^(κ−1) e^(−λ/γ) / [Γ(κ) γ^κ]    0 < λ

and

f_{X|Λ}(x|λ) = e^(−λt) (λt)^x / x!    x = 0, 1, ...
Note that in the context of a Bayesian analysis of a Poisson model, the density f_Λ(λ) corresponds to a prior density for the parameter, and the mathematics involved here is essentially equivalent to that involved in the associated Bayesian development. The differences between the two problems depend on the philosophy for introducing a density function for the parameter, and the interpretation of the results. In a Bayesian analysis the parameter may have been considered fixed, but the prior density reflects a degree of belief about the value of the parameter or some previous information about the value of the parameter.
The marginal density for the number of failures in time [0, t] for an airplane selected at random is given in this case by

f_X(x) = ∫₀^∞ f(x, λ) dλ = ∫₀^∞ f_{X|Λ}(x|λ) f_Λ(λ) dλ
       = ∫₀^∞ [e^(−λt) (λt)^x / x!] [λ^(κ−1) e^(−λ/γ) / (Γ(κ) γ^κ)] dλ
       = (x+κ−1 choose x) (γt)^x / (1 + γt)^(x+κ)    x = 0, 1, ...

where 0 < κ < ∞ and 0 < γ < ∞. This is a form of negative binomial distribution, with p = 1/(1 + γt), and it is referred to as a compound Poisson distribution with a gamma compounding density. Thus, the negative binomial distribution represents a generalization of the Poisson distribution, and it converges to the Poisson distribution when the gamma prior density becomes degenerate at a constant. The negative binomial model is used frequently as an alternative to the Poisson model in analyzing count data, particularly when the variance and the mean cannot be assumed equal.
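The closed-form negative binomial result can be verified against a direct numerical integration of f_{X|Λ}(x|λ) f_Λ(λ); the parameter values κ = 2, γ = 1.5, t = 1 below are illustrative:

```python
import math

kappa, gamma_, t = 2.0, 1.5, 1.0   # illustrative parameter values

def neg_binomial_pmf(x):
    # Closed form: (x+kappa-1 choose x) (gamma t)^x / (1 + gamma t)^(x+kappa)
    log_coef = math.lgamma(x + kappa) - math.lgamma(kappa) - math.lgamma(x + 1)
    return math.exp(log_coef + x * math.log(gamma_ * t)
                    - (x + kappa) * math.log(1.0 + gamma_ * t))

def marginal_by_integration(x, upper=60.0, n=200000):
    # Midpoint-rule approximation of the integral of f_{X|L}(x|lam) * f_L(lam)
    h = upper / n
    s = 0.0
    for i in range(n):
        lam = (i + 0.5) * h
        f_cond = math.exp(-lam * t) * (lam * t) ** x / math.factorial(x)
        f_prior = (lam ** (kappa - 1) * math.exp(-lam / gamma_)
                   / (math.gamma(kappa) * gamma_ ** kappa))
        s += f_cond * f_prior * h
    return s

closed = neg_binomial_pmf(3)
numeric = marginal_by_integration(3)
total = sum(neg_binomial_pmf(x) for x in range(400))
```

The closed form agrees with the integral, and the pmf sums to 1 over its support.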
The mean and variance for the negative binomial variable in the above notation are given by

E(X) = κγt = t E(Λ)
Var(X) = κγt(γt + 1) = t E(Λ) + t² Var(Λ)

We see that Var(X) ≥ E(X), and the Poisson case holds as Var(Λ) = κγ² → 0. Of course, other compound Poisson models can be obtained by considering compounding densities other than the gamma density; however, the gamma density is a very flexible two-parameter density, and it is mathematically convenient. The unknown parameters in this case are κ and γ, and techniques developed for the negative binomial model may be used to estimate these parameters based on observed values of x.
If one follows through Bayes' rule, then an expression for the conditional density of Λ given x may be obtained:

f_{Λ|x}(λ) = f(x, λ) / f_X(x) = f_{X|Λ}(x|λ) f_Λ(λ) / f_X(x)

Simplification shows that Λ | x ~ GAM(γ/(γt + 1), x + κ).
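The stated posterior can be spot-checked numerically: the product f_{X|Λ}(x|λ) f_Λ(λ) should be proportional to the GAM(γ/(γt + 1), x + κ) density, with the constant of proportionality equal to f_X(x) for every λ. The parameter values below are illustrative:

```python
import math

def gamma_pdf(lam, scale, shape):
    return lam ** (shape - 1) * math.exp(-lam / scale) / (math.gamma(shape) * scale ** shape)

kappa, gamma_, t, x = 2.0, 1.5, 1.0, 3     # illustrative values

def prior_times_likelihood(lam):
    likelihood = math.exp(-lam * t) * (lam * t) ** x / math.factorial(x)
    return likelihood * gamma_pdf(lam, gamma_, kappa)

post_scale = gamma_ / (gamma_ * t + 1)      # gamma/(gamma t + 1)
post_shape = x + kappa                      # x + kappa

# The ratio (prior * likelihood)/posterior should equal f_X(x) for every lambda
ratios = [prior_times_likelihood(lam) / gamma_pdf(lam, post_scale, post_shape)
          for lam in (0.5, 1.0, 2.0, 4.0)]
```

The ratio is constant in λ, and its common value is the negative binomial probability f_X(3) obtained above.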
COMPOUND EXPONENTIAL DISTRIBUTION
If air conditioner failure for a given airplane occurs according to a Poisson process with failure intensity λ, then the time to first failure, T, follows an exponential distribution. If we again assume that the intensity parameter varies from airplane to airplane according to a gamma distribution, then the time to first failure for an airplane selected at random follows a compound exponential distribution:
f_T(t) = ∫₀^∞ f_{T,Λ}(t, λ) dλ
       = ∫₀^∞ f_{T|Λ}(t|λ) f_Λ(λ) dλ
       = κγ(γt + 1)^(−(κ+1))    0 < t < ∞

This is a form of the Pareto distribution, T ~ PAR(1/γ, κ).
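As a numerical check on the compound exponential result (again with illustrative κ and γ), the CDF implied by the Pareto form, 1 − (γt + 1)^(−κ), should agree with ∫ P[T ≤ t | λ] f_Λ(λ) dλ:

```python
import math

kappa, gamma_ = 2.0, 1.5     # illustrative GAM(gamma, kappa) intensity distribution

def pareto_cdf(t):
    # CDF implied by f_T(t) = kappa * gamma * (gamma t + 1)^-(kappa+1)
    return 1.0 - (gamma_ * t + 1.0) ** (-kappa)

def mixture_cdf(t, upper=60.0, n=200000):
    # integral of P[T <= t | lambda] * f_Lambda(lambda), with T|lambda exponential, rate lambda
    h = upper / n
    s = 0.0
    for i in range(n):
        lam = (i + 0.5) * h
        f_prior = (lam ** (kappa - 1) * math.exp(-lam / gamma_)
                   / (math.gamma(kappa) * gamma_ ** kappa))
        s += (1.0 - math.exp(-lam * t)) * f_prior * h
    return s

closed = pareto_cdf(2.0)
numeric = mixture_cdf(2.0)
```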
Example 16.5.3  Proschan (1963) gave air conditioner failure data for several airplanes. Ten of these airplanes had at least 1000 flying hours. For these 10 planes the numbers of failures, x, in 1000 hours are recorded below:

Airplane:  7908  7909  7910  7911  7912  7913  7914  7915  8044  8045
x:            8    16     9     6    10    13    16     4     9    12
For this set of data, s² = 15.79 and x̄ = 10.30. These results suggest that the mean and variance of X may not be equal, and that the compound Poisson (negative binomial) model may be preferable to the Poisson model in this case; however, additional distributional results are needed to indicate whether this magnitude of difference between s² and x̄ reflects a true difference, or whether it could result from random variation. It has been shown in the literature that, for this case, approximately

(n − 1)S²/X̄ ~ χ²(n − 1)

when X does follow a Poisson model. In our problem (n − 1)s²/x̄ = 9(15.79)/10.30 = 13.80. Now P[χ²(9) ≥ 13.8] ≈ 0.13; thus, the observed ratio is larger than would be likely. However, there is an approximate 13% chance of getting such a result when the Poisson model is valid.
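The computations in this example can be reproduced directly; the chi-square tail probability is obtained here by numerical integration of the χ²(9) density rather than from a table:

```python
import math

x = [8, 16, 9, 6, 10, 13, 16, 4, 9, 12]    # failures per 1000 hours (Proschan data)
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
stat = (n - 1) * s2 / xbar                  # approximately chi-square(n-1) under Poisson

def chi2_sf(q, df, upper=200.0, steps=400000):
    # P[chi2(df) >= q] by midpoint integration of the chi-square density
    h = (upper - q) / steps
    s = 0.0
    for i in range(steps):
        t = q + (i + 0.5) * h
        s += t ** (df / 2 - 1) * math.exp(-t / 2) * h
    return s / (2 ** (df / 2) * math.gamma(df / 2))

p_value = chi2_sf(stat, n - 1)              # approximately 0.13, as in the text
```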
Greenwood and Yule (1920) studied the number of accidents during a five-week period for 647 women in a shell factory. It turned out that a Poisson
process appeared reasonable for each individual, but the intensity varied from individual to individual. That is, some workers were more accident-prone than others. They found that a gamma distribution provided a good compounding distribution for the accident intensities over workers, and that the negative binomial model provided a good model for the number of accidents of a worker selected at random.
SUMMARY  Our purpose in this chapter was to introduce some basic concepts of reliability and to develop the mathematical aspects of the statistical analyses of some common life-testing models.
Various characterizations of models in reliability theory can be given. The most basic is the reliability function (or survivor function), which corresponds to the probability of failure after time t, for each positive t. The hazard function (or failure-rate function) provides another way to characterize a reliability model. The hazard function gives a means of interpreting the model in terms of aging or wearout. If the hazard function is constant, then the model is exponential. An increasing hazard function is generally interpreted as reflecting aging or wearout of the unit under test. The gamma and Weibull distributions are two different models that include the exponential model, but also allow, by proper choice of the shape parameter, for an increasing hazard function. These models also admit the possibility of a decreasing hazard function, although this is less common in reliability applications.
Most of the statistical analyses for parametric life-testing models have been developed for the exponential and Weibull models. The exponential model is generally easier to analyze because of the simplicity of the functional form and some special mathematical properties that hold for the exponential model. However, the Weibull model is more flexible, and thus it provides a more realistic
model in many applications, particularly those involving wearout or aging. Although the Weibull distribution is not a location-scale model, it is related by means of a log transformation to the extreme-value model. This makes the derivation of confidence intervals and tests of hypotheses possible because pivotal quantities can be constructed from the MLEs.
EXERCISES

1. Consider a random sample of size 25 from f(x) = (1 + x)^(−2), 0 < x < ∞.
(a) Give the likelihood function for the first 10 ordered observations.
(b) Give the likelihood function for the data censored at time x = 9.
(c) What is the probability of getting zero observations by x = 9?
(d) What is the expected number of observations by time x = 9?
(e) What sample size would be needed so that the expected number of observations by x = 9 would be 20?
(f) Approximately what sample size would be needed to be 90% sure of observing 40 or more observations by x = 9?
(g) Given that R = 20 observations occurred by x = 9, what is the joint conditional density of these 20 observations?
2. Rework Exercise 1, assuming X ~ EXP(4).
3. Suppose that f(x) = 1/θ, 0 < x < θ. Find the MLE of θ based on the first r ordered observations from a sample of size n.
4. Suppose that X ~ PAR(θ, κ).
(a) Determine the hazard function.
(b) Express the 1 − p percentile, x₁₋ₚ.
5. Can h(x) = e^(−x) be a hazard function?

6. Find the pdf associated with the hazard function h(x) = e^x.
7. A component in a repairable system has a mean time to failure of 100 hours. Five spare components are available.
(a) What is the expected operation time to be obtained?
(b) If T ~ EXP(100) for each component and the five spares, what is the probability that the system will still be in operation after 300 hours?
(c) How many spares are needed to have a system reliability of 0.95 at 300 hours?
8. The six identical components considered in Exercise 7(b) are connected as a parallel system.
(a) What is the mean time to failure of this parallel system?
(b) How many of these components would be needed in a parallel system to achieve a mean time to failure of 300 hours?
(c) What is the reliability of this parallel system at 200 hours?
9. The six components considered in Exercise 7(b) now are connected in series.
(a) What is the mean time to failure of this series system?
(b) What is the reliability of the system at 10 hours?
(c) What mean time to failure would be required for each component for the series system to have a reliability of 0.90 at 20 hours?
(d) Give the hazard function for the series system.

10. Rework Exercise 9, assuming that Tᵢ ~ PAR(400, 5).

11. Rework Exercise 8(c), assuming that Tᵢ ~ PAR(400, 5).
12. The failure time of a certain electronic component follows an exponential distribution, X ~ EXP(θ). In a random sample of size n = 25, one observes x̄ = 75 days.
(a) Compute the MLE of the reliability at time 8 days, R(8).
(b) Compute an unbiased estimate of R(8).
(c) Compute a 95% lower confidence limit for R(8).
(d) Compute a 95% lower tolerance limit for proportion 0.90.
13. Grubbs (1971) gives the following mileages for the failure times of 19 personnel carriers: 162, 200, 271, 302, 393, 508, 539, 629, 706, 777, 884, 1008, 1101, 1182, 1463, 1603, 1984, 2355, 2880. Assume that these observations follow a two-parameter exponential distribution with known threshold parameter, η = 100; that is, X ~ EXP(θ, 100).
(a) Give the distribution of 2n(X̄ − 100)/θ.
(b) Compute a lower 90% confidence limit for θ.
(c) Compute a 95% lower tolerance limit for proportion 0.80.
14. Rework Exercise 13, assuming that only the first 15 failure times for the 19 carriers were recorded.
15. Wilk et al. (1962) give the first 31 failure times (in weeks) from an accelerated life test of 34 transistors as follows: 3, 4, 5, 6, 6, 7, 8, 8, 9, 9, 9, 10, 10, 11, 11, 11, 13, 13, 13, 13, 13, 17, 17, 19, 19, 25, 29, 33, 42, 42, 52. It may be that a threshold parameter is needed in this problem, but for illustration purposes suppose that X ~ EXP(θ).
(a) Estimate θ.
(b) Compute a 90% lower confidence limit for θ.
(c) Compute a 90% lower tolerance limit for proportion 0.95.
(d) Compute a 50% lower tolerance limit for proportion 0.95.
(e) Estimate x₀.₀₅.
16. Suppose in Exercise 15 that the experiment on the 34 transistors had been terminated after 50 weeks.
(a) Estimate θ.
(b) Set a lower 0.90 confidence limit on θ.
(c) Compute a lower (0.90, 0.95) tolerance limit.
17. One hundred light bulbs are placed on test, and the experiment is continued for one year. As light bulbs fail they are replaced with new bulbs, and at the end of one year a total of 85 bulbs have failed. Assume T ~ EXP(θ).
(a) Estimate θ.
(b) Test H₀ : θ ≥ 1.5 years against Hₐ : θ < 1.5 at α = 0.05.
(c) Compute a 90% two-sided confidence interval for θ.
(d) If bulbs are guaranteed for six months, estimate what percentage of the bulbs will have to be replaced.
(e) What warranty period should be offered if one wishes to be 90% confident that at least 95% of the bulbs will survive the warranty period?
18. Consider the ball bearing data of Exercise 21 in Chapter 13. If we assume that this set of data is a complete sample from a Weibull distribution, then the MLEs are β̂ = 2.102 and θ̂ = 81.88.
(a) Use equation (16.4.33) to compute an approximate lower 0.95 level confidence limit for β.
(b) Use equation (16.4.34) to compute an approximate lower 0.90 level confidence limit for R(75).
(c) Compute the MLE of the 10th percentile, x₀.₁₀.
(d) Use equation (16.4.36) to compute a lower 0.95 level tolerance limit for proportion 0.90.
(e) Use equation (16.4.38) to compute an approximate lower 0.90 level confidence limit for θ.
19. Consider the censored Weibull data of Example 16.4.1.
(a) Compute the simple estimates of δ and β = 1/δ, using equation (16.4.42).
(b) Compute the simple estimates of ξ and θ = exp(ξ), using equation (16.4.43).
20. Compute the simple estimates of δ, β, ξ, and θ, using equations (16.4.40) and (16.4.41), with the ball bearing data of Exercise 21 in Chapter 13.
21. Let X ~ POI(μ). Show that f(x − 1; μ) < f(x; μ) for x < μ, and f(x − 1; μ) > f(x; μ) for x > μ.
22. Verify equation (16.2.9).

23. Let X denote the number of people seeking a haircut during a one-hour period, and suppose that X ~ POI(4). If a barber will service three people in an hour:
(a) What is the probability that all customers arriving can be serviced?
(b) What is the probability that all but one potential customer can be serviced?
(c) How many people must the barber be able to service in an hour to be 90% likely to service everyone who arrives?
(d) What is the expected number of customers arriving per hour? Per 8-hour day?
(e) What is the expected number of customers serviced per hour?
(f) If two barbers are available, what is the expected number of customers serviced per hour?
24. Assume that the number of emissions of particles from a radioactive source is a homogeneous Poisson process with intensity λ = 2 per second.
(a) What is the probability of 0 emissions in 1 second?
(b) What is the probability of 0 emissions in 10 seconds?
(c) What is the probability of 3 emissions in 1 second?
(d) What is the probability of 30 emissions in 10 seconds?
(e) What is the probability of 20 emissions or less in 10 seconds?
25. In World War II, London was divided into n = 576 small areas of 1/4 square kilometer each. The number of areas, m_y, receiving y flying bomb hits is given by Clarke (1946), and these are listed below.

y:      0    1    2    3    4   ≥5
m_y:  229  211   93   35    7    1

The total number of hits was 537. Although clustering might be expected in this case, the Poisson model was found to provide a good fit to these data. Assume that Y ~ POI(μ).
(a) Estimate μ from the data.
(b) Under the Poisson assumption, compute the estimated expected number of areas receiving y hits, ê_y = n f(y; μ̂), for each y, and compare these values to the observed values m_y.
(c) What is the estimated probability of an area receiving more than one hit?
26. Mullet (1977) suggests that the goals scored per game by the teams in the National Hockey League follow independent Poisson variables. The average numbers of goals scored per game at home and away by each team in the 1973-74 season are given in Exercise 2 of Chapter 15. Assume a Poisson model with these means.
(a) What is the probability that Boston scores more than three goals in any away game?
(b) What is the probability that Boston scores more than six goals in two away games?
(c) What is the most likely number of goals scored by Boston in one away game?
(d) If the first eight teams play at home against the other eight teams, what is the distribution of S, the total number of home goals scored?
(e) What is the distribution of T, the total number of home and away goals scored in the eight games?
(f) What is the distribution of the total number of goals scored by Boston in a 78-game season?
(g) If Boston plays Atlanta in Atlanta, what is the probability that Boston wins? That is, P[X < Y], where Y represents the number of Boston goals and X represents the number of Atlanta goals.
27. The probability of a typographical error on a page is 0.005. Using a Poisson approximation:
(a) What is the expected number of errors in a 500-page book?
(b) What is the probability of having five or fewer errors in a 500-page book?
(c) What size sample (of pages) is needed to be 90% sure of finding at least one error?
28. A certain mutation occurs in one out of 1000 offspring. How many offspring must be examined to be 20% sure of observing at least one mutation?
29. Suppose that X ~ BIN(20, 0.1).
(a) Compute P[X ≤ 5].
(b) Approximate P[X ≤ 5] with a Poisson distribution.
(c) Approximate P[X ≤ 5] with a normal distribution.
30. In Exercise 23:
(a) What is the probability that the barber can finish a 10-minute coffee break before the first customer shows up?
(b) What is the probability that a person arriving after 30 minutes will not get serviced?
(c) What is the mean waiting time until the first customer arrives?
31. In Exercise 24:
(a) What is the probability of at least one emission within 0.5 seconds?
(b) What is the probability that the time until the third emission is less than 0.5 seconds?
(c) What is the mean time until the third emission?
32. Suppose that the breakdowns of a repairable system occur according to a nonhomogeneous Poisson process with failure intensity λ(t) = 2t/9, where time is measured in days.
(a) What is the mean number of breakdowns in one week?
(b) What is the probability of five or fewer breakdowns in a week?
(c) What is the probability that the first breakdown will occur in less than one day?
(d) What is the average time to the first breakdown?
(e) If 10 independent systems were in operation, what would be the mean time to the first breakdown from any of the 10 systems?
33. Let Xᵢ ~ fᵢ(xᵢ) with E(Xᵢ) = μᵢ and Var(Xᵢ) = σᵢ². Find the mean and variance of the mixed density

f(x) = Σᵢ pᵢ fᵢ(x)
34. In Exercise 26, Boston has two home games and one away game. If one of these games is selected at random, what is the probability that the number of goals scored in it will be less than or equal to 4? What is the expected number of goals scored in the game?
35. Assume that the 16 NHL teams given in Exercise 26 represent a random sample from a conceptually large population of teams, and that the number of goals scored per home game for any team selected at random follows a Poisson model for fixed μ, X | μ ~ POI(μ), where μ ~ GAM(γ, 2).
(a) Estimate γ by using the average of the 16 at-home values to estimate the mean of the gamma distribution.
(b) What is the marginal pdf of the number of at-home goals scored by a team selected at random?
(c) Find P[X ≤ 4] using the κ and γ values from (a).
(d) Estimate E(X) and Var(X).
36. For the airplane air conditioner time-to-failure example in Section 16.5, suppose that Λ ~ GAM(0.0005, 20).
(a) What is the probability of no failure in 100 hours for a plane selected at random?
(b) What is the mean time to first failure of an airplane selected at random?
(c) What is the probability that the time to first failure is less than 100 hours?
APPENDIX A

REVIEW OF SETS
The study of probability models requires a familiarity with some of the basic notions of set theory.
A set is a collection of distinct objects. Other terms that sometimes are used instead of set or collection are family and class. Sets usually are designated by capital letters, A, B, C, ..., or in some instances with subscripted letters A₁, A₂, A₃, .... In describing which objects are contained in a set A, two methods are available:
1. The objects can be listed. For example, A = {1, 2, 3} is the set consisting of the integers 1, 2, and 3.
2. A verbal description can be used. For example, the set A above consists of "the first three positive integers." A more formal way is to write A = {x | x is an integer and 1 ≤ x ≤ 3}.
More generally, if p(x) is a statement about the object x, then {x | p(x)} consists of all objects x such that p(x) is a true statement. Thus, if A = {x | p(x)}, then a is in A if and only if p(a) is true. This also can be related to the listing method: if p(x) is the statement "x = a₁ or x = a₂ or ... or x = aₙ," then A = {a₁, a₂, ..., aₙ}.
The individual objects in a set A are called elements. Other terms that sometimes are used instead of element are member and point. In the context of probability, the objects usually are called outcomes. When a is an element of A we write a ∈ A, and otherwise a ∉ A. For example, 3 ∈ {1, 2, 3}, but 4 ∉ {1, 2, 3}. In most problems we can restrict attention to a specific set of elements and no others. The universal set, which we will denote by S, is the set of all elements under consideration. In probability applications, such a set usually is called the sample space, and it consists of all outcomes of some experiment that is to be performed.
Another special set, called the empty set or null set, is denoted by Ø. It is the set that contains no elements. For example, {x | x is an integer and x² = 2} = Ø, because the solutions x = ±√2 are not integers. In some cases all of the elements in a set A also are contained in another set B.
If this is the case, then we say that A is a subset of B, denoted A ⊂ B. For example, if A = {1, 2, 3} and B = {1, 2, 3, 4}, then A ⊂ B. It is always the case that Ø ⊂ A ⊂ S, for any set A under consideration. There are standard ways to combine two or more sets into a new set:
1. The intersection of two sets A and B, denoted by A ∩ B, is
A ∩ B = {x | x ∈ A and x ∈ B}
For example, if A = {1, 2, 3} and B = {2, 3, 4}, then A ∩ B = {2, 3}.
2. The union of two sets A and B, denoted by A ∪ B, is
A ∪ B = {x | x ∈ A or x ∈ B}
For example, if A and B are the sets given in part 1, then A ∪ B = {1, 2, 3, 4}.
3. The complement of a set A, denoted by A′ (or Ā), is
A′ = {x | x ∈ S and x ∉ A}
For example, if A is the set given in part 1 and S = {1, 2, 3, 4, 5}, then A′ = {4, 5}.
4. The difference of A and B is A − B = A ∩ B′.
Sometimes it is convenient to use a graphical device known as a Venn diagram. Such diagrams for intersection, union, and complement are given in Figure A.1. The points inside the rectangles are associated with S, and the points inside the circles are associated with the sets A and B. The shaded regions correspond to the intersection, union, and complement, respectively. In some cases, two sets A and B have no elements in common. This can be expressed by writing A ∩ B = Ø, and saying that A and B are disjoint. In probability applications we say that A and B are mutually exclusive in this case. The Venn diagram of disjoint sets corresponds to nonoverlapping circles, as shown in Figure A.2.
FIGURE A.1  Venn diagrams for A ∩ B, A ∪ B, and A′

FIGURE A.2  Disjoint sets: A ∩ B = Ø
The notions of intersection and union can be extended to more than two sets. We can define the intersection and union of three sets A, B, and C to be, respectively,
A ∩ B ∩ C = {x | x ∈ A and x ∈ B and x ∈ C}
and
A ∪ B ∪ C = {x | x ∈ A or x ∈ B or x ∈ C}
Another way to accomplish this would be to use parentheses along with the definitions of intersection and union of two sets. For example, A ∩ B ∩ C = A ∩ (B ∩ C) = (A ∩ B) ∩ C. To avoid ambiguity, it would be desirable to establish that the way the sets are grouped with parentheses does not make a difference. This and several other properties of "set algebra" are stated in the following theorem.
Theorem A.1  For any subsets A, B, and C of S, the following equations are true:
1. A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C.
2. A ∪ B = B ∪ A and A ∩ B = B ∩ A.
3. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
4. A ∪ Ø = A and A ∩ S = A.
5. A ∪ A′ = S and A ∩ A′ = Ø.
Each assertion can be verified easily by Venn diagrams; however, a more formal way would involve showing that the set on either side of the equality is included in the set on the other side. Equations 1, 2, and 3 are referred to as associative, commutative, and distributive laws, respectively. Other useful identities are given in the following theorem.

Theorem A.2  For any subsets A and B of S, the following equations are true:
1. (A′)′ = A.
2. Ø′ = S and S′ = Ø.
3. A ∪ A = A and A ∩ A = A.
4. A ∪ S = S and A ∩ Ø = Ø.
5. A ∪ (A ∩ B) = A and A ∩ (A ∪ B) = A.
6. (A ∪ B)′ = A′ ∩ B′ and (A ∩ B)′ = A′ ∪ B′.
The identities given in part 6, known as De Morgan's laws, are particularly useful in many probability applications.
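Because finite sets behave exactly as the theorems describe, the identities can be spot-checked with Python's built-in set type. The sets S, A, and B below are those used in the examples above; the extra set C is an illustrative addition for the distributive law:

```python
# Universal set and subsets from the examples in this appendix
S = {1, 2, 3, 4, 5}
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 4, 5}   # illustrative extra set

def complement(X):
    return S - X

# De Morgan's laws (Theorem A.2, part 6)
demorgan1 = complement(A | B) == complement(A) & complement(B)
demorgan2 = complement(A & B) == complement(A) | complement(B)

# Distributive laws (Theorem A.1, part 3)
dist1 = A | (B & C) == (A | B) & (A | C)
dist2 = A & (B | C) == (A & B) | (A & C)
```

Here `|`, `&`, and `-` are Python's union, intersection, and difference operators on sets.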
A third theorem gives identities that are useful when one set is a subset of another.

Theorem A.3  The following statements about sets A and B are equivalent:
1. A ⊂ B.
2. A ∩ B = A.
3. A ∪ B = B.
Notice that property 4 of Theorem A.1 and properties 3, 4, and 5 of Theorem A.2 can be viewed as corollaries of Theorem A.3, because Ø ⊂ A ⊂ S, A ⊂ A, A ∩ B is a subset of both A and B, A and B are both subsets of A ∪ B, and A ∩ B ⊂ A ∪ B.
The notions of intersection and union are extended easily to more than three sets, but it is more convenient in this case to use subscripted set notation A₁, A₂, ..., Aₙ.
1. The intersection of A₁, A₂, ..., Aₙ is defined as
A₁ ∩ A₂ ∩ ··· ∩ Aₙ = {x | x ∈ Aᵢ for all i = 1, 2, ..., n}
2. The union of A₁, A₂, ..., Aₙ is defined as
A₁ ∪ A₂ ∪ ··· ∪ Aₙ = {x | x ∈ Aᵢ for at least one i = 1, 2, ..., n}
More concise notations for these expressions are, respectively, ∩ᵢ₌₁ⁿ Aᵢ and ∪ᵢ₌₁ⁿ Aᵢ, and the terms finite intersection and finite union, respectively, usually are applied to them.
There are counterparts, in the case of n sets, to many of the properties in Theorems A.1, A.2, and A.3, but they generally are harder to state. One property that is very useful in the area of probability is a generalization of the distributive law.
Theorem A.4  If A₁, A₂, ..., Aₙ and B are subsets of S, then the following equations are true:
1. B ∩ (A₁ ∪ A₂ ∪ ··· ∪ Aₙ) = (B ∩ A₁) ∪ (B ∩ A₂) ∪ ··· ∪ (B ∩ Aₙ).
2. B ∪ (A₁ ∩ A₂ ∩ ··· ∩ Aₙ) = (B ∪ A₁) ∩ (B ∪ A₂) ∩ ··· ∩ (B ∪ Aₙ).
Property 1 is the most frequently used of the two statements, because it provides a way to partition a set B into subsets. In particular, suppose that A₁, A₂, ..., Aₙ are pairwise disjoint sets (Aᵢ ∩ Aⱼ = Ø if i ≠ j) which also are exhaustive in the sense that A₁ ∪ A₂ ∪ ··· ∪ Aₙ = S. It can be established from the preceding theorems that
B = (B ∩ A₁) ∪ (B ∩ A₂) ∪ ··· ∪ (B ∩ Aₙ)
which partitions B into disjoint sets B ∩ A₁, B ∩ A₂, ..., B ∩ Aₙ. This partitioning also is seen easily by means of an appropriate Venn diagram, such as that in Figure A.3.
FIGURE A.3  Partition of B by pairwise disjoint, exhaustive sets A₁, A₂, ..., Aₙ
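The partition identity can also be spot-checked with Python sets; the sets S, Aᵢ, and B below are illustrative choices, not from the text:

```python
# Illustrative pairwise disjoint, exhaustive sets A1, A2, A3 partitioning S
S = {1, 2, 3, 4, 5, 6}
A = [{1, 2}, {3, 4}, {5, 6}]
B = {2, 3, 5}

pieces = [B & Ai for Ai in A]            # the sets B ∩ A_i
union_of_pieces = set().union(*pieces)   # should recover B

# the pieces are pairwise disjoint
pairwise_disjoint = all(pieces[i] & pieces[j] == set()
                        for i in range(len(pieces))
                        for j in range(i + 1, len(pieces)))
```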
In the development of probability it often is necessary to consider higher-dimensional or vector quantities. The notion of Cartesian products is useful in this regard.
If A and B are two sets, then the Cartesian product of A and B, denoted by A × B, is defined to be the following set of ordered pairs:
A × B = {(x, y) | x ∈ A and y ∈ B}
For example, if A and B are the closed intervals A = [1, 3] = {x | x is real and 1 ≤ x ≤ 3} and B = [1, 2] = {y | y is real and 1 ≤ y ≤ 2}, then A × B can be represented as a rectangle in the xy plane, as shown in Figure A.4. Notice that if we associate A and B with the corresponding Cartesian product sets A* = A × (−∞, ∞) and B* = (−∞, ∞) × B, then the Cartesian product set A × B is identical to the intersection A* ∩ B*. This correspondence is useful in certain probability problems in which an experiment consists of performing two successive steps, such as tossing a coin twice or drawing two cards from a deck.
Some problems also require higher-dimensional Cartesian product sets. If A₁, A₂, ..., Aₙ are sets, then the n-fold Cartesian product consists of the following set of n-tuples:
A₁ × A₂ × ··· × Aₙ = {(x₁, x₂, ..., xₙ) | xᵢ ∈ Aᵢ for each i = 1, 2, ..., n}
The question of how many elements are in a set often is of considerable importance in probability applications. A set A is said to be finite if its elements correspond in a one-to-one manner to the elements in a set of integers of the form {1, 2, ..., n} for some positive integer n. It is said to be countably infinite if its elements correspond in a one-to-one manner to the elements in the set of all positive integers {1, 2, ...}. For example, the set of all positive even integers
FIGURE A.4  The Cartesian product A × B represented as a rectangle in the xy plane
{2, 4, 6, ...} is countably infinite, because each positive even integer has the form 2i for some positive integer i. This establishes a one-to-one correspondence, i ↔ 2i. Although the correspondence is harder to describe, the set of all integers also can be put into one-to-one correspondence with the set of positive integers, and hence is countably infinite. A set is said to be countable if it is either finite or countably infinite.
The notions of intersection and union are extended easily to a countably infinite collection (or an infinite sequence) of sets.
1. The intersection of A₁, A₂, ... is defined as
A₁ ∩ A₂ ∩ ··· = {x | x ∈ Aᵢ for all i = 1, 2, ...}
2. The union of A₁, A₂, ... is defined as
A₁ ∪ A₂ ∪ ··· = {x | x ∈ Aᵢ for at least one i = 1, 2, ...}
More concise notations for these expressions are, respectively, ∩ᵢ₌₁^∞ Aᵢ and ∪ᵢ₌₁^∞ Aᵢ.
As an example, let A₁ = {1}, and let Aᵢ = {1, 2, ..., i} for i = 2, 3, .... Then ∩ᵢ₌₁^∞ Aᵢ = {1} and ∪ᵢ₌₁^∞ Aᵢ = {1, 2, ...}. These are called, respectively, countably infinite intersections and countably infinite unions. An intersection (or union) is called a countable intersection (or union) if it is either a finite intersection (or union) or a countably infinite intersection (or union).
Infinite sets that are not countably infinite are difficult to characterize in general. However, the only ones that will be of interest in this development are intervals, Cartesian products of intervals, and finite or countably infinite unions and intersections of such sets.
APPENDIX B SPECIAL DISTRIBUTIONS
TABLE B.1
Special discrete distributions

Binomial            X ~ BIN(n, p);  0 < p < 1,  q = 1 - p
                    f(x) = C(n, x) p^x q^(n-x),  x = 0, 1, ..., n
                    Mean: np    Variance: npq    MGF: (pe^t + q)^n

Bernoulli           X ~ BIN(1, p);  0 < p < 1,  q = 1 - p
                    f(x) = p^x q^(1-x),  x = 0, 1
                    Mean: p    Variance: pq    MGF: pe^t + q

Negative Binomial   X ~ NB(r, p);  0 < p < 1,  r = 1, 2, ...
                    f(x) = C(x-1, r-1) p^r q^(x-r),  x = r, r+1, ...
                    Mean: r/p    Variance: rq/p^2    MGF: [pe^t/(1 - qe^t)]^r

Geometric           X ~ GEO(p);  0 < p < 1,  q = 1 - p
                    f(x) = p q^(x-1),  x = 1, 2, ...
                    Mean: 1/p    Variance: q/p^2    MGF: pe^t/(1 - qe^t)

Hypergeometric      X ~ HYP(n, M, N);  n = 1, 2, ..., N,  M = 0, 1, ..., N
                    f(x) = C(M, x) C(N-M, n-x) / C(N, n),  x = 0, 1, ..., n
                    Mean: nM/N    Variance: n(M/N)(1 - M/N)(N - n)/(N - 1)    MGF: *

Poisson             X ~ POI(μ);  0 < μ
                    f(x) = e^(-μ) μ^x / x!,  x = 0, 1, ...
                    Mean: μ    Variance: μ    MGF: e^(μ(e^t - 1))

Discrete Uniform    X ~ DU(N);  N = 1, 2, ...
                    f(x) = 1/N,  x = 1, 2, ..., N
                    Mean: (N + 1)/2    Variance: (N^2 - 1)/12    MGF: e^t(1 - e^(Nt)) / [N(1 - e^t)]

* Not tractable.
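The mean and variance columns of Table B.1 can be spot-checked by direct summation against the pdf column; a Python sketch for the binomial case:

```python
from math import comb

def binomial_pdf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x), the BIN(n, p) pdf."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
mean = sum(x * binomial_pdf(x, n, p) for x in range(n + 1))
var = sum((x - mean)**2 * binomial_pdf(x, n, p) for x in range(n + 1))

print(round(mean, 6))  # 3.0, matching np
print(round(var, 6))   # 2.1, matching npq
```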
TABLE B.2
Special continuous distributions Distribution
Notation and Parameters
Uniform
X
Name of
Continuous pdf f(x)
UNIF(a, b)
Variance
a-4-b
(b-a)2
2
12
b-a
MGF M(t) ¿21_
(b-a)t
a
a
Normal
Mean
1
X .- N(p ¿2)
Il
p/I 12/2
o < a2
Gamma
X GAM(0 ,)
O
F(w)
0<9
x
e
°
K
KO
-Ot
0
0
X
L
EXP(0)
1 - Of
O
0<0 Two-Parameter Exponential
Double-Exponential
X
o
a"
EXP(O, n)
i - Ot
0<0
n
X--. OE(O, n)
20
e"' 1 - 02t2
0<0 Waibull
or(i
X-.. WEI(O,ß)
0<0 0<ß 0<ß Extreme value
X--. EV(0, n)
0<0
+)
o
1
) (1
O2[f(1
-r2(1 -/--
¿292
exp{((x-)/O]
-exp{(x - n)/O])
n-v0 y
6
e'T(l +Ot)
0.5772 (Eu 1ers
constant)
**
Cauchy
X - CAU(0, n)
**
**
+ [(x-)/O]2}
0<0 Pa rato
X-. PAR(O, IC)
0<0 0
K
0(1
**
o
+x/O)1
K1
(IC_2)(/C_1)2
0
1
2
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
APPENDIX B
SPECIAL DISTRIBUTIONS
597
TABLE 8.2 (continued) Name of
Distribution
Notation and Parameters
Lognormal
X- LOGN(p, a2)
Continuous pdf f(x)
Mean
1-y)/) 2/2
Variance
MGF M5(r)
e20+2(e2 - 1)
2xxu 0< 02
Logistic
X LOG(0 ?)
I
exnf(x-nIfì1
+exp[(x-)/0»2
e'ir0tcsc (irüt)
3
0<0
X'-'2(v)
Chi -Square
2'22F(v/2)
x"210-"2
(1 \v/2
2v
V
1 -2t)
o
fv+1\ Student's t
X
t(v)
2 ) i / x2\'"'2
ç(ï /"(1+-
o
v-2
i
2
V
**
\,2J
Z
+
Snedocor's F
X
\ F(v1, 02)
) (v1l2(,2)_
2
r(-'r(' \v2) \2J \2/
y1
=1,2,
V21 2,.. Beta
X - BETA(a, b)
O
0
/ y \ -('i 0(1+_x) \
V2
F(a+b)
r(a)r(b)
J
xol (1 x)5
y2
2 -2
2v(v1 + y2 -2) Vi (
2)2(v2
-4)
2
4
a+b
(a+b+i)(a+b)2
ab
o
* Not tractable. Does not exist.
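The moment and MGF entries in Table B.2 can likewise be spot-checked by numerical integration; a stdlib-only Python sketch for EXP(θ) (composite Simpson's rule, with the upper limit truncated where the tail is negligible):

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

theta = 2.0
pdf = lambda x: math.exp(-x / theta) / theta                          # EXP(theta) density
mean = simpson(lambda x: x * pdf(x), 0.0, 40 * theta)                 # ~ theta = 2
var = simpson(lambda x: (x - theta) ** 2 * pdf(x), 0.0, 40 * theta)   # ~ theta^2 = 4
mgf = simpson(lambda x: math.exp(0.1 * x) * pdf(x), 0.0, 40 * theta)  # M(0.1) ~ 1/(1 - 0.1*theta) = 1.25

print(round(mean, 4), round(var, 4), round(mgf, 4))
```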
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
A
p
p
E
N
D
I
TA3LES OF DISTRIBUTIONS
The authors thank the organizations mentioned below for granting permission to use certain materials in constructing some of the tables in Appendix C:
Portions of Table 8 are excerpted from Table A-12C of Dixon & Massey, Introduction to Statistical Analysis, 2nd ed., McGraw-Hill Book Co., New York.
Portions of Table 9 are excerpted from Table 1 of "Modified Cramer-Von Mises Statistics for Censored Data," Biometrika, 63, 1976, with the permission of the Biometrika Trustees.
Portions of Tables 10 and 11 are excerpted from Table 1 of "EDF Statistics for Goodness-of-Fit and Some Comparisons," Journal of the American Statistical Association, 69, 1974.
Portions of Tables 10 and 11 are excerpted from Table 1 of "Goodness-of-Fit for the Extreme-Value Distribution," Biometrika, 64, 1977, with the permission of the Biometrika Trustees.
TABLE 1  Binomial cumulative distribution function

B(x; n, p) = Σ_{k=0}^{x} C(n, k) p^k (1 - p)^(n-k)

(tabulated for selected values of n up to 20 and for p = 0.05, 0.10, ..., 0.95)
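Entries of Table 1 can be regenerated exactly from the defining sum; a minimal Python sketch:

```python
from math import comb

def binom_cdf(x, n, p):
    """B(x; n, p) = sum_{k=0}^{x} C(n, k) p^k (1 - p)^(n - k)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

print(round(binom_cdf(2, 5, 0.5), 4))   # 0.5
print(round(binom_cdf(4, 10, 0.3), 4))  # 0.8497
```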
TABLE 2  Poisson cumulative distribution function

F(x; μ) = Σ_{k=0}^{x} e^(-μ) μ^k / k!

(tabulated for μ = 0.1, 0.2, ..., 1.0 and μ = 2.0, 3.0, ..., 10.0, 15.0)
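Table 2's entries follow directly from the defining sum as well; a minimal Python sketch:

```python
import math

def poisson_cdf(x, mu):
    """F(x; mu) = sum_{k=0}^{x} e^(-mu) mu^k / k!."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(x + 1))

print(round(poisson_cdf(2, 1.0), 4))  # 0.9197
print(round(poisson_cdf(9, 5.0), 4))  # 0.9682
```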
TABLE 3  Standard normal cumulative distribution function Φ(z) and 100 × γth percentiles z_γ

Φ(z) = ∫_{-∞}^{z} (1/√(2π)) e^(-t^2/2) dt

(Φ(z) tabulated for z = 0.00 to 3.49 in steps of 0.01)

γ      0.90    0.95    0.975   0.99    0.995   0.999   0.9995   0.99995   0.999995
z_γ    1.282   1.645   1.960   2.326   2.576   3.090   3.291    3.891     4.417
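Values of Φ(z) need not be interpolated from the table when a math library is at hand, since Φ relates to the error function by Φ(z) = [1 + erf(z/√2)]/2; a Python sketch:

```python
import math

def std_normal_cdf(z):
    """Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt 2)) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(std_normal_cdf(1.960), 4))  # 0.975, consistent with z_0.975 = 1.960
print(round(std_normal_cdf(2.326), 4))  # ~0.99, consistent with z_0.99 = 2.326
```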
TABLE 4  100 × γth percentiles χ²_γ(ν) of the chi-square distribution with ν degrees of freedom

∫_0^{χ²_γ(ν)} h(y; ν) dy = γ

(tabulated for ν = 1, 2, ..., 30, 40, 50, ..., 100 and γ = 0.005, 0.010, 0.025, 0.050, 0.100, 0.250, 0.500, 0.750, 0.900, 0.950, 0.975, 0.990, 0.995, 0.999)

For large ν, χ²_γ(ν) ≈ ν[1 - 2/(9ν) + z_γ √(2/(9ν))]³.
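The large-ν approximation quoted with Table 4 reproduces the tabled percentiles closely even for moderate ν; a Python sketch (using z_0.95 = 1.645 from Table 3):

```python
import math

def chi2_percentile_approx(z_gamma, v):
    """Large-v approximation: chi2_gamma(v) ~ v * (1 - 2/(9v) + z_gamma * sqrt(2/(9v)))**3."""
    return v * (1 - 2 / (9 * v) + z_gamma * math.sqrt(2 / (9 * v))) ** 3

print(round(chi2_percentile_approx(1.645, 10), 2))  # ~18.29, vs the exact 18.31
print(round(chi2_percentile_approx(1.645, 20), 2))  # ~31.4, vs the exact 31.41
```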
TABLE 5  Cumulative distribution function H(c; ν) of the chi-square distribution with ν degrees of freedom

H(c; ν) = ∫_0^c h(y; ν) dy

(tabulated for ν = 1, 2, ..., 24 and c from 0.02 to 50.0)

For large ν, H(c; ν) ≈ Φ(z), where z = [(c/ν)^(1/3) - 1 + 2/(9ν)] / √(2/(9ν)).
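For even ν the chi-square CDF in Table 5 has a closed form, the standard identity H(c; ν) = 1 - e^(-c/2) Σ_{k=0}^{ν/2-1} (c/2)^k / k!, which makes spot checks easy; a Python sketch:

```python
import math

def chi2_cdf_even(c, v):
    """H(c; v) for even v: 1 - exp(-c/2) * sum_{k=0}^{v/2 - 1} (c/2)^k / k!."""
    assert v % 2 == 0 and v > 0
    return 1.0 - math.exp(-c / 2) * sum(
        (c / 2) ** k / math.factorial(k) for k in range(v // 2))

print(round(chi2_cdf_even(1.4, 2), 3))   # 0.503 (= 1 - e^(-0.7))
print(round(chi2_cdf_even(10.0, 4), 3))  # 0.96
```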
TABLE 6  100 × γth percentiles t_γ(ν) of Student's t distribution with ν degrees of freedom

∫_{-∞}^{t_γ(ν)} f(t; ν) dt = γ

(tabulated for ν = 1, 2, ..., 30, 40, 60, 120, ∞ and γ = 0.60, 0.70, 0.80, 0.90, 0.95, 0.975, 0.99, 0.995, 0.9995)
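Table 6's percentiles can be verified by integrating the t density (given in Table B.2) numerically; a stdlib-only Python sketch using Simpson's rule and the symmetry of f(t; ν) about zero:

```python
import math

def t_pdf(t, v):
    """Student's t density with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)

def t_cdf(x, v, n=4000):
    """CDF for x >= 0: Simpson's rule on [0, x] plus the symmetric lower half."""
    h = x / n
    s = t_pdf(0.0, v) + t_pdf(x, v) + sum(
        (4 if k % 2 else 2) * t_pdf(k * h, v) for k in range(1, n))
    return 0.5 + s * h / 3

# The table gives t_0.975(10) = 2.228; the integral recovers gamma ~ 0.975.
print(round(t_cdf(2.228, 10), 4))  # ~0.975
```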
TABLE 7
Percentiles f_γ(ν1, ν2) of Snedecor's F distribution with ν1 and ν2 degrees of freedom.
[This page of the table was scanned upside down; the entries are unrecoverable.]
TABLE 7 (continued)
Percentiles f_γ(ν1, ν2) of Snedecor's F distribution (γ = 0.90, 0.95, 0.975, 0.99, 0.995).
[The tabulated entries on these pages are scrambled beyond recovery in this scan. Lower percentiles follow from f_{1−γ}(ν1, ν2) = 1/f_γ(ν2, ν1).]
APPENDIX C  TABLES OF DISTRIBUTIONS

TABLE 8
Sample size for t test: sample size n needed to achieve power 1 − β for d = |μ − μ0|/σ in the one-sample case, and n = n1 = n2 for d′ = |μ1 − μ2|/σ in the two-sample case, for a one-sided test at significance level α. These are approximate n for a two-sided test at significance level 2α.
[The One-Sample Test and Two-Sample Test panels (α from 0.005 to 0.05; power 0.50 to 0.99; d from 0.1 to 3.0) are scrambled beyond recovery in this scan.]
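Entries like those in Table 8 can be reproduced approximately from normal percentiles. A hedged sketch (the printed table is built from the noncentral t distribution, so its exact entries run slightly larger; the function name is illustrative):

```python
# One-sided level-alpha test with power 1 - beta at effect size d = |mu - mu0|/sigma:
# the normal approximation gives n ~ ((z_{1-alpha} + z_{1-beta}) / d)^2 in the
# one-sample case, and twice that per group in the two-sample case.
import math
from statistics import NormalDist

def sample_size(alpha: float, power: float, d: float, two_sample: bool = False) -> int:
    z = NormalDist().inv_cdf
    n = ((z(1 - alpha) + z(power)) / d) ** 2
    if two_sample:
        n *= 2.0  # per-group size when comparing two means
    return math.ceil(n)

# Example: alpha = 0.05 one-sided, power 0.90, d = 1.0, one-sample case.
print(sample_size(0.05, 0.90, 1.0))
```

Because the t critical value exceeds the normal one for small n, a common refinement is to add 1 or 2 to the result, or to iterate with t percentiles at the current n.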
TABLE 9
Upper critical values for the Cramér-von Mises statistic, completely specified H0.
[The tabulated values are scrambled beyond recovery in this scan.]

TABLE 10
Critical values for the CvM test of H0: X ~ F(x) with parameters estimated.

    H0          Statistic           0.90    0.95    0.975   0.99
    EXP(θ)      (1 + 0.16/n)CM      0.177   0.224   0.273   0.337
    WEI(θ, β)   (1 + 0.2/√n)CM      0.102   0.124   0.146   0.175
    N(μ, σ²)    (1 + 0.5/n)CM       0.104   0.126   0.148   0.178

TABLE 11
Critical values for KS and Kuiper statistics.

    H0          Statistic                          0.90    0.95    0.975   0.99
    F(x)        (√n + 0.12 + 0.11/√n)D             1.224   1.358   1.480   1.628
    EXP(θ)      (√n + 0.26 + 0.5/√n)(D − 0.2/n)    0.995   1.094   1.184   1.298
    N(μ, σ²)    (√n − 0.01 + 0.85/√n)D             0.819   0.895   0.955   1.035
    WEI(θ, β)   (√n − 0.01 + 0.85/√n)D             0.803   0.874   0.939   1.007
    F(x)        (√n + 0.155 + 0.24/√n)V            1.620   1.747   1.862   2.001
    EXP(θ)      (√n + 0.24 + 0.35/√n)(V − 0.2/n)   1.527   1.655   1.774   1.910
    N(μ, σ²)    (√n + 0.05 + 0.82/√n)V             1.386   1.489   1.585   1.693
    WEI(θ, β)   (√n + 0.05 + 0.82/√n)V             1.372   1.477   1.557   1.671

Note: the D rows use the Kolmogorov-Smirnov statistic and the V rows the Kuiper statistic; in the printed table the N and WEI rows share a modified statistic.
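Table 11 is applied by modifying the raw KS statistic before comparing it with the tabled point. A sketch for the completely-specified-H0 row (the sample values below are hypothetical, and the function names are illustrative):

```python
# Compute D = sup_x |F_n(x) - F(x)|, then the modified statistic
# (sqrt(n) + 0.12 + 0.11/sqrt(n)) * D, compared with e.g. 1.358 at the 0.95 level.
import math

def ks_statistic(data, cdf):
    """Two-sided KS statistic for a continuous hypothesized CDF."""
    x = sorted(data)
    n = len(x)
    d_plus = max((i + 1) / n - cdf(v) for i, v in enumerate(x))
    d_minus = max(cdf(v) - i / n for i, v in enumerate(x))
    return max(d_plus, d_minus)

def modified_ks(data, cdf):
    n = len(data)
    return (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * ks_statistic(data, cdf)

# Example: test H0: X ~ UNIF(0, 1) with a hypothetical sample of size 5.
sample = [0.1, 0.2, 0.3, 0.4, 0.5]
stat = modified_ks(sample, lambda x: x)
print(stat < 1.358)  # True -> cannot reject H0 at the 0.05 level
```

The modification makes a single row of critical values serve all sample sizes, which is the point of the (√n + a + b/√n) factors in the table.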
TABLE 12
Critical values t for the Wilcoxon signed-rank test, P[T ≤ t] ≤ α.
[The tabulated values (n = 4 to 20; α = 0.005, 0.01, 0.025, 0.05, 0.10, 0.20, 0.50) are scrambled beyond recovery in this scan.]
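Critical values like those of Table 12 come from the exact null distribution of the signed-rank statistic: under H0 each rank 1, ..., n enters the positive set independently with probability 1/2, so the counts follow from a subset-sum recursion. A sketch (function name illustrative):

```python
# Exact null CDF of the Wilcoxon signed-rank statistic T by dynamic programming.
def signed_rank_cdf(n: int):
    """Return the list P[T <= t] for t = 0..n(n+1)/2 under H0."""
    tmax = n * (n + 1) // 2
    counts = [0] * (tmax + 1)
    counts[0] = 1
    for r in range(1, n + 1):            # fold rank r into the subset sums
        for t in range(tmax, r - 1, -1):
            counts[t] += counts[t - r]
    total = 2 ** n
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / total)
    return cdf

# Example: n = 4 gives P[T <= 0] = 1/16 = 0.0625.
print(signed_rank_cdf(4)[0])
```

A tabled critical value t at level α is the largest t with P[T ≤ t] ≤ α, which this CDF yields directly.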
TABLE 13A
[This page was scanned upside down; the table entries are unrecoverable.]
TABLE 13B
Critical values u such that P[U ≤ u] ≤ α for the Mann-Whitney statistic, with m = min(n1, n2) and n = max(n1, n2), 2 ≤ m ≤ n ≤ 20.
[The tabulated values are scrambled beyond recovery in this scan.]
For larger samples, u ≈ n1·n2/2 − z_{1−α} √(n1·n2·(n1 + n2 + 1)/12).
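The Table 13B entries come from the exact null distribution of U, while the formula at the foot of the table gives the large-sample approximation. A sketch of both (function names illustrative):

```python
# Exact null CDF of the Mann-Whitney statistic by counting orderings, plus the
# normal-approximation critical value u ~ n1*n2/2 - z_{1-alpha} sqrt(n1*n2*(n1+n2+1)/12).
import math
from functools import lru_cache
from statistics import NormalDist

def mann_whitney_cdf(m: int, n: int):
    """Return P[U <= u] for u = 0..m*n under H0 (all orderings equally likely)."""
    @lru_cache(maxsize=None)
    def ways(a: int, b: int, u: int) -> int:
        # orderings of a sample-1 and b sample-2 values with statistic exactly u
        if u < 0:
            return 0
        if a == 0 or b == 0:
            return 1 if u == 0 else 0
        # last position: a sample-1 value (all b sample-2 values precede it,
        # contributing b to U) or a sample-2 value (contributing nothing)
        return ways(a - 1, b, u - b) + ways(a, b - 1, u)
    total = math.comb(m + n, m)
    cdf, running = [], 0
    for u in range(m * n + 1):
        running += ways(m, n, u)
        cdf.append(running / total)
    return cdf

def u_critical_approx(n1: int, n2: int, alpha: float) -> float:
    z = NormalDist().inv_cdf(1 - alpha)
    return n1 * n2 / 2 - z * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

print(round(u_critical_approx(10, 10, 0.05), 1))
```

For m = n = 2 the exact CDF is 1/6, 2/6, 4/6, 5/6, 1, matching the six equally likely interleavings.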
TABLE 14
p-values for Spearman's rank correlation coefficient, P[R ≥ r] = P[R ≤ −r] = p.
[The tabulated (r, p) pairs (n = 3 to 10) are scrambled beyond recovery in this scan.]
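Entries of Table 14 arise because under H0 every ranking is equally likely, so exact p-values follow from enumerating all n! rank permutations — feasible for the small n the table covers. A sketch (function name illustrative):

```python
# Exact upper-tail p-value P[R >= r] for Spearman's coefficient under independence.
from itertools import permutations

def spearman_upper_tail(n: int, r: float) -> float:
    base = range(1, n + 1)
    count = total = 0
    for perm in permutations(base):
        d2 = sum((a - b) ** 2 for a, b in zip(base, perm))
        rs = 1 - 6 * d2 / (n * (n * n - 1))   # Spearman's r from rank differences
        total += 1
        if rs >= r - 1e-12:                   # tolerate floating-point ties
            count += 1
    return count / total

# Example: n = 3 reproduces the first Table 14 column, P[R >= 1.0] = 1/6 = 0.167.
print(round(spearman_upper_tail(3, 1.0), 3))
```

By the symmetry of the null distribution, P[R ≤ −r] equals the same value, which is why the table lists one p per r.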
TABLE 15
Asymptotic variances and covariances a11, a22, and a12 of the MLEs, with the constant c, as functions of p = r/n.
[The tabulated values are scrambled beyond recovery in this scan.]

TABLE 16
Values of k0, n Var(·), n Cov(·, ·), and n Var(·) for simple estimators, n = 2 to 60. [Estimator symbols are unreadable in this scan.]
[The tabulated values are scrambled beyond recovery in this scan.]

TABLE 17
Coefficients for the quadratic approximations k_{r,n} ≈ k0 + k1/n + k2/n², c_{r,n} ≈ c0 + c1/n + c2/n², d_{r,n} ≈ d0 + d1/n + d2/n², h_{r,n}/n ≈ a0 + a1/n + a2/n².
[The tabulated coefficients are scrambled beyond recovery in this scan.]
ANSWERS TO SELECTED EXERCISES
CHAPTER 1
1. a. S = {r, g, b}  b. {r}, {g}, {b}, {r, g}, {r, b}, {g, b}, S, ∅
2
a. S = {(r, r), (r, b), (r, g), (b, r), (b, b), (b, g), (g, r), (g, b), (g, g)}
b. C1 = {(r, r), (r, b), (r, g)} = {(r, r)} ∪ {(r, b)} ∪ {(r, g)}; C2 = {(r, r), (r, b), (r, g), (b, r), (g, r)}; C1 ∩ C2 = C1; C1′ ∩ C2 = {(b, r), (g, r)} = {(b, r)} ∪ {(g, r)}
3
a. (O, O), (O, A), (O, B), (O, AB), (A, O), (A, A), (A, B), (A, AB), (B, O), (B, A), (B, B), (B, AB), (AB, O), (AB, A), (AB, B), (AB, AB).
b. (O, O), (O, A), (O, B), (O, AB), (A, A), (B, B), (AB, AB), (A, AB), (B, AB).
c. (O, O), (A, A), (B, B), (AB, AB).
4. S = {r, br, gr, bbr, bgr, gbr, ggr, ...} = {x | x = r or x = c1c2 ⋯ ck r, where each ci = b or g}
5. a. S = {0, 1, 2, ...}  b. S = [0, ∞) = {t | t ≥ 0}
6. S = [0, 1] = {x | 0 ≤ x ≤ 1}
7. S = [0, ∞) = {t | t ≥ 0}
8. a. Yes. p1 + p2 + p3 + p4 = 1, each pi ≥ 0  b. No. p1 + p2 + p3 + p4 > 1
9. a. 1/9  b. 1/3  c. 5/9  d. 1/3  e. 2/9  f. 5/9
10. a. 9/16  b. 1/4  c. 1/8
13. a. 1/3, 1/3, 1/3  b. 1/4, 1/4, 1/2  c. 6/11, 3/11, 2/11
14. a. 1/4  b. 15/16  c. 3/8  d. 5/16
15. S = {(t, t), (t, a), (t, c), (a, t), (a, a), (a, c), (c, t), (c, a), (c, c)}; 2/3
19
a. 2/3 b. 23/30 c. 7/30 d. 9/10
20
a.7/8 b.1/8
22 23 24
a. 0.8 b. 0.3 c. 0.2 a. 0.2 b. 0.2 c. 0.3 d. 0.5
25 26 27 28 29 30
a. 3/5 a. 3/5 a. 5/14 a. 1/56
13
31
32 Ì
3 19/420
b. 1/2 c. 3/4 d. 3/10 f. 3/5 g. 1/2 b. 3/5 c. 3/5 d. 9/25 f. 3/5 g. 3/5 b. 15/28 c. 25/28 d. 3/28 b. 15/56 c. 5/28 d. 3/8
1/3
a. 12/51 b. 2/15 a. 1/5 b. 34/105 a. 0.66 b. 0.1212 a. 1/13 b. 53/715 c. 5/53
a 5/12 b 4/5 36 37 39
a. 29/1000 b. 10/29 a. 43/80 b. 28/43 a. 0.2 b. 1/3
(1P)3
1p3 097412
42 i7
7
27/50
a. 25/64 b. 15/32 c. 55/64 d. 9/64 a. 47/60 b. 3/5 No. P(A1 r A2 r A3) O 1/8 P(A1)P(A2)P(A3) a. 26! b. 7,893,600 c. 11,881,376 a. 6,760,000 b. 3,407,040 c. 1,514,240 72
51
40
52
a. 0.504  b. 0.496
53
8
54
(3o\ (17V13'\ (1V16V13 a.) b.)) c.1)2)2
55
a.
57
a. 8 b. 32 c.
58
a. () b. 1/7 c. 7!
59
24360
60
10! (2X9)!
61
a. 365
62
a. 27,720 b. 27,720
63
(26 01(9 !X11 !X6!)
64
Same as 63
65
a. 60 b. 13 c. 170
66
(60 !)/(1 5 !)(20 !)(25!)
67
a.
68
a. 11,550 b. (12!)(3!)/14!
69
a. 126
71
a 09722 b 06475 a No b Yes c Yes B, D and C, D
72
(4\ ii)
39
/'24V25'\ /149
5)6)/11
365
d. 576
d. 0.5073
365e
b (9!)/(3!) b. 0.0397 c. 3024
CHAPTER 2
1. a.
     y      2      3      4      5      6      7      8
     f(y)   1/16   1/8    3/16   1/4    3/16   1/8    1/16
     F(y)   1/16   3/16   6/16   10/16  13/16  15/16  1
c.
     w      0     1     4     9
     f(w)   1/4   3/8   1/4   1/8
2.
a., b.
     x      1      2     3      4      5     6      7
     f(x)   1/12   1/6   1/6    1/6    1/6   1/6    1/12
     F(x)   1/12   1/4   5/12   7/12   3/4   11/12  1
d. 7/12 8.
1/2
3
a.
4
1/12 1/4 1/4 5/12 f(x) f(x) = (x '- 1)/lO if x = 2, 3,4,or 5
5
a.k=8/7 b.No
7
a. c=1/33
X
O
b. F(x)=
8/33 5/11 7/11 26/33 10/11
i
8 9
x'
0zx<1
lx<2
2x<3 3x<4 4x<5 5
c. 4/11 a. f(x) = (1/2)x+1; x = 0, 1,2,... b. 0.000488 c. 2/3 a. Yes b. Yes c. No
11
$3.50
12
a. 12 b. 3/5
13
a.k>0 b. F(x)=1--x;1
14
a. No b. Yes c. No
15
a. f(x) = (x + 1)/8; 1
a.
17
f(x)
F(x)
c. 1/4 cl. 3/4 e. 3/4 f. 0 18
a. F(x) = x2/9; O
19
a. 5/4 a. 3/2 b. f(x) = 2(1
20
3
1/x2) if I
i +4e 1 + 4e' - 4e_x 1 +4e1
aa--e 3
32
a 30/8 b 172/64 c 21/2 a. 3/4 b. 3/80 c. 3/(r + 3) d. 25/4 a. No b. Yes C. k < I a. 1.9 b. 1.29 c. Y = 35X 40, 26.50, 1580.25 a 1/2 b it c 3iv1O a. Bound = 7/5; not useful b. Bound = 2/5 c. Exact probability = 7/8 a. 25/8 b. 55/64
34
a.
23 24 25 26 27 31
X
f(x)
1
2
5
1/8
1/4
5/8
b. 1/4
36 37 38
a. M,(t)
e2'/(l - t) b. 1, 2
6,13
a. Y35SlQc b. 3
CHAPTER 3 1
0.008
2 3
a. 0.000977  b. 0.0439
4
a. 1 − 4q³ + 3q⁴  b. 1 − q²
a. 0.0439 b. 0.1209 c. Two engines safer if q> 1/3; four engines safer if q < 1/3; equally safe if q = 1/3
5
a. P(he wins) = 0.5177 > 0.5 (good bet)  b. P(he wins) = 0.4914
P(at least one 6 in six rolls) = 0.6651; P(at least two 6's in twelve rolls) = 0.6187
a. 0.2182  b. 0.6292  c. E(X) = 4, Var(X) = 3.2
9
a. 3/10 b. 2/3
10
a. 0.03993  b. 0.03993  c. 0.9583  d. 0.0417
11
14
a. 0.1517  b. 0.1759  a. 0.0465  b. 0.0465  c. 0.9494  d. 0.0506  a. 0.09  b. 0.1(0.9)^(x−1)  c. 0.729  d. 10  a. 1/4  b. 15/16
15
a. 0.01722  b. 0.999  c. 30
16. a. C(x − 1, 3)(0.6)⁴(0.4)^(x−4); x = 4, 5, 6, 7  b. 0.2765
17
a. 0.00729 b. 0.99144 c. 0.271
20
a. 0.0335  b. 0.8385
21
a. 0.090  b. 0.220  c. 0.217
22 23 24 25 26 28 29
0.0242
c. p = 1/2 d. x = 6
a. 0.1234  b. 0.1247  0.677  0.1008
a. 0.8009  b. 0.1493
a. 3/5  b. 3/5  c. 1 − 1/(10k²)  a. f(y) = (41 − 2y)/400 if y = 1, 2, ..., 20  b. 19/8000  c. E(X) = 21/2, Var(X) = 133/4
32
a. F(x)=(x - 50)125, 50< x <75, zero if z b 2/5 c 125/2 d 625/12
33
1/3
34
a. UNIF(0, 5)  b. 2/5
36 37 39 39 40 41
42 43 44
45 46
a 0353 b 0214 c 20 days a. 0.122 b. 0.0028 02636 a. 0.861 b. 0,333 c. Sanieasa(nomemory) d. 10,000 m0=1 05768
O'T(l+k/ß) a. O/(c - 1) b. 2(12/[(K - 1)Ø - 2)] E(X)=50 VarX)= 10000
a 0362 b 0967 c 300/
d 3(100)2(32
a 00183 c E(X) = 5\/' Var(X) = 100(1 -
48
a x, 6[(1 - p)'1 b m=10(,J-1)
52
53 54 55
56 57 58 59
75
a24b3.J/4
47
51
50, one ifx
- 3)
1]
a. 0.846 b. 0.0377 c. 20 days a. 0.937 b. 0.688 c. 0.341 d. 0.20 e. 0.38 f. 1.96 a. 0,5 b. 0.227 c. 0.290 d. 3.822 e. 0.658 a. 0.816 b. 5.88 a. 0.8413 b. 0.9104 c. 0.8413 d. 16.58 a. 0.6366 b. 324.36
a.12 b.25 a. E(Y)=350p-100 b. p>2/7 c. No c. 1002.857 d. 1001.8
d.6 e.6
CHAPTER 4
1. a., b. [The joint pmf tables (entries 0 and 1/16 on integer grids) are scrambled beyond recovery in this scan.]
2. [The joint pmf table (entries in twelfths) is scrambled beyond recovery in this scan.]
a. 0.0399  b. Same as (a)  c. 0.000609
(4( 48
.
i
09583 h 00417 i
74V4\(4V
d. 0.0793
0 O O
1/12
e. 0.0153
1, 2, 3, 4
\x)\5xJ/ '\5
g
0 0 1/12 1/12
1/12 1/12
000005541
\ /752
40
-X
Y-
4
a. 0.0331 b. 0.0666
5
a. 0.0465 b. Same as a
x, Oy, 0iz, x+y+z5
5
c. 0.00089 d. 0.0922
e. 0.0191
f. (5)(1/13)x(12/13)5_x; x = 0, 1, 2, 34 5 6
a.
d.
0.000382
b. 0.00344
c. 0.00153
12!
x1!x3 !x5 !(12 - x1 - X3 - x5)!
7
1/18
8
a. e4 b. Both P01(2) c. Yes
9
a.
xi
f1(x) -X2
f2(x2)
(1/6y1 +x +X5(1/2)12 xj
1
2
1/4
14/45
79/180
1
2
3
5/36
19/36
1/3
X3
X
b No c 101/180 d 25/36 (13'\(26'\(
10
13
a.f(xY)_Â)_»;
x0, yO, x+y2
c. f(x) =
/13\( 39
'26"
26 "
x) 2x)/Ç ); f) y) 2y)/ 2
52
2
d. No e 0.668 (26'\( 13
y)iy (39 'i (26( y)2xy y2x (39
P[Y=yX=1] f(i,y) fi(i) -
y = 0, 1
13
P[Y=yIX=x
-x P[X + Y
z] =
0.059
if z <0 if O z <1
0.441
if i ( z <2
a. f(x,
2' - x!y!(2 - x - y)!
c. f1(x)
(2)(1/
d. No
. 2/3
f(i, y) fi(y)
14
a. Both EXP(1)
- (2/3)(1/3)';
- X)(2/i
P[Y=yX= No
(1/4)x(1/2)Y(1/4)2_x_Y
(2\
P{Y=yJX=
12
if2z
y = 0, 1 O
y 2x
b F(x y) = (1 - eX1 - e) if x >0 and y >0 0 otherwise
c e2 d 1/2 e 3e2 f Yes 15
a. f(x1, x2) = 2(1 + x1)2(1 + x2)3 if x > O and x2 >
16
1/3
17
a.f1(x1)=3(1x1)2; 0
O
b. 1/3
b. f2(x2)=6x2(1x2); 0
a.f(x,y)=x+y; 0
C. 1/2
d. 1/24 e. 19/24 z3/3
if0
i
if2z
f. P[X+Yz]= z2z3/3-1/3 ifiz<2
19
a.k=2 b.f1(x)=(1+3x1x)if0
xy2+x2yx3 if0xy1
ifx>yand0'
c.F(x,y)=y3
x+x2x3
ify>1 andx>1
i
2(x+y) ifxyi d.f(Ix)_(13x1) e.f(xly)= 2(x+y) if0xzy 3y2 21
a.f1(x)=(2/3Xx+1)ifø
b f2(y)=1 if0
c f(yIx)i if0
d. 4/9 e. 73/162 f. Yes a f(x1, x2, ..., = 3(x1x2 . ' ;)2 ifa!! 0< x1 < i b. 1/8 c. (1/8) a. f(x1, x2, ..., x) = 2(x1x2 x)exp(x -
- x)
ifall 0< x1
b. 1 - e114 r. (1 a.f(xj,x3)={63_X1) ifû
O
c.f(x2lx1,x3)__3_x1) if0
e. f(x1, X2 Ix3) =
27
a.
x2\x1
iO
otherwise 2
3
2 3
0.04 0.12 0.04
0.12 0,36 0,12
0.04 0.12 0.04
x2\x1
1
2
3
2
0.04 0.16 0.20
0.16 0.64
0.20 0.80
0.80
1.00
3
28 29
If û
1
1
b.
.
(2I
c. 0.72 a. Yes f(x, y) = g(x)h(y) over (0, 1) x (0, 1) b. 1/6
a.No f(x,y)=0
ifü
b. 7/27 C. f1(x) = 12x(1 - x)2 if O < x < i
30
a.f1(x)=30x2(1x)2 if0
b.f(ylx)=2y/(1x)2 if0'
3x2
a. 5/48 b. f2(x2) f(x1Jx2) =
2x1+x2 2
X2
<2< if O
1
CHAPTER 5 a. 15 b. 63
a. 720 b. 23,04 a. 2/15
b. 2/75 c. 2/3 d. 2/5
2/75 f. 2/15 g. 3/25 a. 4/3 b. 12/5 C. 16/5 d. O
5 7
a 7/12 b 7/6 a 1/3 d - 1/24
a7b116c16d20
e E(Y x) = (3x + 2)/(6x + 3) 0
11
a.f1(x)=6x(1x); O
12
a. 1/6 b. 1/3 a. 1/9 d. O e. 1/3
13
a. f(yjx) =
15
a. 12/5 b. 6/25
17
a.1 b.2
18
115/88
if O < y < 2x b. E(Yjx) = x c. 1/2
19
a. 1.44
20
a. 3/2 b. 25/12 c. 6
21
M(t1, t2) = 2/[(1 - t2)(2 - t1 - t2)]; t2 < 1, t1 + t2 <2 M(t1, t2) [1/(t2 - 1) 1/(t1 + t2 - 1)]/t1; t2 <1, t1 + t2 < i
22
b. 1.0944
CHAPTER 6 i a. f1(y)=l;
O
b. f(w) = 4(ln w)3/w;
i
c.fz)=4e4; co
2
a.f(y)=4y3; O
b.f(w)l/w; e
f(u) = 2(1 - 4u)'12; O < u < 1/4 a. X = 2irR; f(x) = 6x(2i - x)/(2ir)3; 0< x b. Y = nR2; f(y) 3(n112 - y'12)/n2; 0< y < n
y=x4ory=1x4
a. y=ln(1u) b. wx ifB(x-1;3, l/2)
10
a. f1(y)=exp(y); y>.O b. F(w) = 1/2 if O
il
w < 1, one if i
w, zero otherwise
Y'-BIN(n,lp)
(y ±r
12
p)Y; y = 0, 1
13
f'(y) = y112/24 if O
14
a.Fw(w)=1_(1+2w)e_2w;w>O
.
y<6
b. f0,1,(u, y) = 4(v/u2)exp(-2v(1 + 1/u)); u> 0, v>0 C. f(u) = 1/(1 + u)2 u > O 15
Y
16
a,f,(u,v)=1/(u2v); 1v
17
b. f(u) (in u)/u2; i u a. f(y) y exp(y2/2); y> O b. f(w) = w"2(1 + w)'/ir; w> O
18
a.fST(s,t)=es; 0
P01(2.1)
fT(t) = e'; O < t f5(s) = e(e2 - 1); s > O 22
0.8508
23
M(t)
24 25 27
a. M(t) = (1 - 2t)° b. Y a. X1
- (1
Pet)
P01(10)
a. L0GN(p
Y
NB(k p)
b. W
GAM(2, 10)
P01(15)
o) b.
L0GN1y2,+)
29
a. g(y1,»,y)=n!(y1»y,j2;
1
n
b. g1(y1) = n/yr'; y > i
: g(y) = n(y - 1)'1/y1;
1
(2r - l)!(y -
e. g(y) 31
'
- 1)!]2y,"
a. g1(y1) = ne"; 0< Yi
g(y) = ne"(1 - e')"'1; 0< y fR(r) = (n - 1)e(1 - e')2; 0< r
''
g(y1,
32
C
= 5e(1 - e)4; O
d
g1(y1) = (1/O)e
b.
33
g5(y5)
a G1(y1) = b Gk(yk)
1
i - (1 - p) Y (n)1 (1
i=k
G(y) = [1 (1
= I/(1/O + y = 123
- p)Yk](l y=
p)Yn]fl;
1] = G1(1) = i - (1
d. P[Y1
35
-
exp (= a. g(y1) = 5e5Y1; O
b. Y
a. M(t) = (1 -
y>O
+ 1/On)
p)(fl')Y
y
1, 2,
p)' DE(O, 0)
CHAPTER 7
i
a. G(y)
ify<1
Ço
= li - (l/y)
ify
i
Degenerate aty = i
a. G(y)=
2
Io
ify1
¿j-1/y ify>i
a. No b. Yes, G(y) = exp(exp(y)) a. Degenerate at y = i b. No limiting distribution
ify0
(0
o. G(y) = texp(_1/y2) if y> O
5
N(0, 1)
7
a. a
8
0.8508
B
a. 0.1587 b. 0.1359 a. 0.9394 b. 11.655 a. 90 b. 122 C. 92 a. 0.4364
11
12 15
0.733, b
1.193
b. a
0.634, b
1,032
if
d. G(y)=y
yo y1
ifO.
i if e. N(B/2"°, {B/(02'10)J2/n) f. B/21° g. WEI(1, O)
17
if yO fy>0
Gle1 23 24 25
N(0, c2), N(0, 1/c2)
a. 0.2206 b. 0.6044 a. Type I (for maximums), a,, = 1, b,,
In n
Type I (for minimums), a,, = 1, b,, = in n d. Type II (for maximums), a,, = On1(e'1' - 1), b,, = O Type III (for minimums), a,,
28
Type 1! (for maximums), a,,
O/(Kn), b,, = O
tan[x(l/2 - 1/n)], b,, = O
CHAPTER 8 1
0.987
2 3
a. 0.39 b. 0.0043 a. 2U/n - 5(W - U2/n)/(n - 1) b. W/n c. a. GAM(100, 10) b. 0.95 c. 12 spares
5 7
8
approx. 0.90
No
10
0.685
11
a. 0.95 b. 0.95
12
approx. 0.95
13
a. 0.9772 b. 0.921 c. 0.921 d. 0.988 e. 0.95 a. N(0, 2o2) b. N(3p, 5,T2) c. t(k - 1) d. 2(1)
15
f.
2(2)
g. unknown h. t(1) i. F(1, 1) j. Cauchy k. unknown
I. t(k) m. x2(n + k - i) n. 16
17
e. t(k - i)
N(4
+
2(1)
p.
F(ni,k-1)
a. 0.6898 b. 0.05 c. 0.75 a. 0.85 b. 27.14 c. 0.90 d. 0.19
18
e. 0.256 f. 2.50 g. 0.045 h. 0.975 a. 0.144 b. 0.95 c. 0.05 d. 0.90 8. 0.592
23 25
b. 0.924 C.
27
x.95(10) = 18.31; approx. = 18.29 x.05(10) = 3.94; approx. = 3.93
0.9929 13
CHAPTER 9 3
a.k/(1k) b. 1/(k-1) c. 2/k a. n/i in X b. 1 + n/ in X
5
X1.,,
7
a k b k2k c (1-1/k)"
i
[x,/(x + 0)] = n/3 b O = 1 404
11
a
15
a. c=n/(n-1) b. {n2/(n N
N
c 19 21
23 27
X,/(nN), i=1
1)](1 -)
[2/(, - 1)](X,/nXl - X/n)/N
¿=1
c=(n+1)/[2(n-1)] a. p(1 - p)/n b. p(1 - p)(l - 2p)2/n
a. Yes b. Yes
a. Estimator:
03
0
01
2 + 02
9
9
1/3
Risk
1/2
5/8
Max Risk: 1/2, 5/8, 5/9
Bayes Risk: 1/2, 5/8, 13/27
X+1
b.
04
4+0
a.
33 35
a. p/n
37
a. ARE = 1/2 b. ARE = 2/π b. (n + 1)/(β + Σ xᵢ) e. (β +
(& is minimax)
7/27
+2)
xt+1(
29
44
c.k
b. pe2/n c. k d. e
c. 1/[6(n+2)]
e. No
f. Yes
N(p, p(1 - p)/n)
d
0.64
2(ß + .x,) 0(2n + 2)
X.5o(2fl + 2) 2(ß + x,)
CHAPTER 10
i
s!/(nˢ Π xᵢ!) if s = Σ xᵢ; zero otherwise
3
F(n/2)/(i'2s12) ifs =
(iii x)r(2n)/s2_
x, zero otherwise
ifs =
f(x1.....x,,; 0) = 0_2n exp( 7
x,, zero otherwise
x,/0Xfl x,); ali x, > O
S=X, ¿=1
9
a. S = Σ Xᵢ b. Only when n = 1
13
S₁ = Π Xᵢ, S₂ = Π (1 - Xᵢ)
15
p
17
c. X1. - 1/n - ln(i - p)
21
a. [Σ Xᵢ² - (Σ Xᵢ)²/n]/(n - 1) b. [(Σ Xᵢ)² - Σ Xᵢ²]/[n(n - 1)] a. x̄ + 3(1.645) b. Φ([√n(c - x̄)]/[3√(n - 1)])
23 25 27
a. (-1/n)Σ ln Xᵢ b. (1 - n)/Σ ln Xᵢ a. S² = [Σ Xᵢ² - (Σ Xᵢ)²/n]/(n - 1) b.
29 31
n/s Same as usual MLE
x̄ + 1.645cS with c = √((n - 1)/2) Γ((n - 1)/2)/Γ(n/2)
a Use regular exponential class b i
c fl X =
a. n/Σ ln(1 + Xᵢ) b. Σ ln(1 + Xᵢ) c. 1/(nθ²) d. (1/n) Σ ln(1 + Xᵢ) e. N(θ, θ²/n) for θ̂, N(1/θ, 1/(nθ²)) for 1/θ f. (n - 1)/Σ ln(1 +
33
a. (1/r)[Σ Xᵢ + (n - r)Xᵣ] b. Σ Xᵢ + (n - r)Xᵣ
CHAPTER 11
1
a. (18.1, 20.5) b. lower 18.3, upper 20.3 c. n = 25 for length = 2 d. (17.9, 20.7) e. (468, 3339)
3
a. 14.40 b. exp(-t/14.40)
5
a. Q ~ EXP(1/n) b. (x1 + (1/n)ln(α/2), x1 + (1/n)ln(1 - α/2)) with α = 1 - γ c. (161.842, 161.997)
9
(0.81, 2.88)
11
(0.04, 0.21)
13
17
a. (2n/X_12(2nK), 2n/X2(2nK)) b. (5.62, 15.94) a. PL = 1 - y'' b. 0.0209
21
(1.35, 2.10)
23 29
2(2n + 2)/[2(n + /3)]) c. (0.119, 0.326) a. (2(2n + 2)/[2(n + ß)], e. (k - z1_11k//, k + z1_,2k/,/) d. (0.64, 2.73)
CHAPTER 12
i
a. a = 19.59, b = 20.41
b. For A: β = 1; For B: β = 0.0091
c. For A: β = 0.0091; For B: β = 1
d. α = 0.10
e. β = 0.0091
3
a. z0 = 2.236 < 2.326 Can't reject H0 b. β = 0.1515 c. n = 24 d. t0 = 1.12 < 2.539 Can't reject H0 e. 33.78 < 36.19 Can't reject H0 f. n = 32 from Table 8, Appendix C (d = 0.8, 1 - β = 0.95)
5
t.995(31) = 2.7454 (by interpolation in Table 6); C = {t0 : t0 ≤ -2.7454 or t0 ≥ 2.7454}; n = 50
7
a. z = 1.79 > 1.28 Reject H0
b. B(6; 20, 0.5) = 0.0577 < 0.10 Reject H0 c. π(0.2) = 0.9133 d. p-value = 0.0577
9
a. -2 < -1.746 Reject H0; -2 < -1.75 Reject H0; -2 < -1.86 Reject H0
11
0.29 <0.8 Can't reject H0 a. Reject H0 if x) 0.95 b. 0.0975
RejectH0iffl x 13 17
a. Reject H if x a. Reject H0 if
4
cwhereP[JJ
X1
dO = 1] = 0.05
b. power = 0.5
x?/cr
b. iz(o') = i - H{(o/r2)x_(n); n] c. 0.968 19
a. Reject H0 if
pO+zJ,%/
b. 21
23
Reject H if x Reject H if t = B(k - 1; n, Po)
c where P[
i-
cO
X.
k0 where k0 is the smallest k such that
27
a. Reject H0 if 2n[ln (/O) - /O + 1] Reject H0 if 2n./O0 fi .(2n)
29
Reject H0 ifx
31
37
39
Oc,] = cc
00'1" Reject H0 if 2n[ln(O0/)+ (00/ê - i)] a. k = 0.1053, k' 18 Reject H0 if
x
E(NJ'Y = 1)
4, E(NJo
2.47n + 5.07 Accept 110 if 3)
1
a. Accepts H with n = 14 b. 12
x
2.47n - 6.50
e. Yes, Rejects with n = 4
c. 6
CHAPTER 13 i
χ² = 4.76 (without continuity correction)
χ² = 4.30 (with continuity correction); 4.30 > 2.71 Reject H0. For a one-sided test use the binomial test
3
a. 2.67 < 6.25 Can't reject H0; 20 > 6.25 Reject H0; 8.18 > 4.61 Reject H0
5
a. 5 > 4.61 Reject H0 b. 0.78 < 4.61 Can't reject H0
7
0.090 < 5.99 Can't reject H0
9
1.54 < 7.78 Can't reject H0
11
17.93 > 9.49 Reject H0
13
a. 16.504 > 13.28 Reject H0 b. 0.24 < 9.24 Can't reject H0
15
a. µ̂ = 96.70, σ̂ = 35.65 b. µ̂ = 96.70, σ̂ = 37.46; 4.42 < 6.26
17
18.05 > 9.49 Reject H0
21
b. CM c. D
23536> 1551
0.05e
Can't reject H0
Can't reject H0
10.31 < 12.02
Can't reject H0
(1 + 0.2A/X0.058) = 0.060 < 0.102
Can't reject H0
/3(0.151) = 0.724 <0.803
0.151
CHAPTER 14
1
a. t = 10, B(10; 20, 0.5) = 0.588 > 0.05 Can't reject H0
b. t = 2, B(2; 30, 0.25) = 0.091 < 0.10 Reject H0
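The sign-test p-values in these answers are binomial CDF values B(t; n, 0.5); a minimal sketch, using the t = 10, n = 20 case quoted above:

```python
from math import comb

def binom_cdf(t, n, p):
    # B(t; n, p) = P(X <= t) for X ~ BIN(n, p)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t + 1))

print(round(binom_cdf(10, 20, 0.5), 3))  # -> 0.588
```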
3 5 7
Based on large sample binomial test z = 0 85
Can t reject H0 Reject H0
a. t = 22, B(22; 60, 0.5) ≈ Φ(-1.94) = 0.0262 < 0.05 Reject H0
b. (5.22, 5.25) (i = 24, j = 37)
c. 5.13 (k = 10)
9
< 1 96
Based on large-sample binomial test, Φ(-5.14) < 0.01
a. 35.39 (k = 11) b. (22.40, 112.26) based on i = 9 and j = 30; other intervals are possible provided j - i = 21
11
13
a
1
b
t = 2, B(2; 10, 0.5) = 0.055 < 0.10
a
t = 4, B(4; 12, 0.5) = 0.1938 > 0.10
b
t = 10 <21
0.349 < 1.383
Can t reject H0 Reject H0
Reject H0
Normal approx. z = 2.27 < 1.28 15
a. U =
5 < 27
Reject H0
Reject H0
Normal approx. U0,05 b. t =
Can't reject H0
28.2
sum of ranks of positive differences = 3
Two-sided test would reject at α = 0.01 (α/2 = 0.005)
17
a. r = 0.165; t = 0.472 < 1.860 (α/2 = 0.05) Can't reject H0: independence
b. Rs = 0.215; p-value between 0.280 and 0.268; t = 0.623 < 1.860 Can't reject H0: independence
21
Rs = -0.259; t = -1.229 > -1.383 Suggests a moderate decreasing trend, but can't reject at α = 0.10
23
t = 4 runs; P(T ≤ 4 | H0 true) = 0.001; p-value = 0.002 (for two-sided test)
CHAPTER 15 1.l82,.ß = 1.315 C. 0.76 d. SSE = 0.0184, &2 = 0.0023 i b. 2 b.0=0.628,ß1=0.608 c. =3.06 e. &2=o.194 3 9
b. ß0 = 1.182,fl = 0.755
a. ¡i=93.90,ô= 37.13 b.
10
û = 93.96, & = 36.77
18
a.
= 13.14, Ô = 24.73
= 0.00245, & = 0.00024 b
= 0.00028
c. (0.0020, 0.0029) d. (0.00013, 0.00123)
20
J = 0.00296
31
f = 22.17 > 12.4 Reject H0
32
r = 0.672
33
t = 3.391 > 1.761 Reject H0
35
a. f= 18.97 > 3.75
Reject H0 b.
= 0.252 <4.60
Can't reject H0
c. (1/10)25 d. 22.5 e. 23 (rounded up) f. 47 (by normal approx.)
CHAPTER 16
1
θ̂ = (n/r)xᵣ
No (convergent integral)
a. 600hours b. 0.916
c. n=7
a. 100/6 = 16.67 b. 0.5488
C. 1139 hours
d. 6/100
0.5714
a.
b. 688.63 c. 142.52 Reject H0 a. 1.177 b. 133.33 < 140.85 2(38)
c. (0.982, 1.420) d. 0.346 e. 0.05 year, or 2.6 weeks
19
a. β̂ = 0.482, θ̂ = 2.07 b. β̂ = 4.42, θ̂ = 83.1
23
a. 0.4335 b. 0.6288 c. k = 6; P[X ≤ 6] = 0.8893 d. 4.32 e. Y = min(X, 3), E(Y) = 2.652 f. W = X1 + X2 ~ POI(8), Z = min(W, 6), E(Z) = 5.650
25
a.
µ̂ = [1(211) + 2(93) + 3(35) + 4(7)]/575 = 0.92 (neglecting y ≥ 5)
Another approach: if p = P[Y = 0] = e^(-µ), then p̂ = 229/576 = 0.398 and µ̂ = -ln(0.398) = 0.92, which is the MLE based on binomial data for p
Observed: 229, 211, 93, 35, 7; Expected: 229.5, 211.2, 97.1, 29.8, 6.9
0.602
27
a. 2.5 b. 0.958
29
a. 0.9887 b. 0.9834 c. 0.9955
31
0.6321, 0.0803, 3/2
35
a.
C. 461
= 3.583/2 = 1.7925
J)
(x+1)x/(l+)2+x x=0 i
c 06959
2 = 3583, 2(1 + ) = 10.011
REFERENCES
Aho, M.; Bain, L.J.; and Engelhardt, M. 1983. "Goodness-of-Fit Tests for the Weibull Distribution with Unknown Parameters and Censored Sampling." J. Statist. Comput. Simul. 18, p. 59.
Antle, C.E., and Bain, L.J. 1969. "A Property of Maximum Likelihood Estimators of Location and Scale Parameters." SIAM Review. 11, p. 251.
Bain, L.J., and Engelhardt, M. 1991. Statistical Analysis of Reliability and Life-Testing Models. 2nd ed. New York: Marcel Dekker.
Bain, L.J., and Engelhardt, M. 1983. "A Review of Model Selection Procedures Relevant to the Weibull Distribution." Commun. Statist.-Theor. Meth. 12(5), p. 589.
Bain, L.J., and Engelhardt, M. 1986. "Approximate Distributional Results Based on the Maximum Likelihood Estimators for the Weibull Distribution." J. Qual. Tech. 18.
Barr, D.R., and Zehna, P.W. 1983. Probability: Modeling Uncertainty. Reading, Mass.: Addison-Wesley.
Basu, D. 1955. "On Statistics Independent of a Complete Sufficient Statistic." Sankhya. 15, p. 377.
Bickel, P.J., and Doksum, K.A. 1977. Mathematical Statistics: Basic Ideas and Selected Topics. San Francisco: Holden-Day.
Blackwell, D. 1947. "Conditional Expectation and Unbiased Sequential Estimation." Ann. Math. Statist. 18, p. 105.
Bradley, E.L., and Blackwood, L.G. 1989. "Comparing Paired Data: A Simultaneous Test for Means and Variances." Amer. Statist. 43, p. 234.
Brownlee, K.A. 1960. Statistical Theory and Methodology in Science and Engineering. New York: John Wiley & Sons.
Chandra, M.; Singpurwalla, N.D.; and Stephens, M.A. 1981. "Kolmogorov Statistics for Tests of Fit for the Extreme-Value and Weibull Distributions." J. Amer. Statist. Assoc. 76, p. 729.
Clark, R.D. 1946. "An Application of the Poisson Distribution." J. Instit. Actuar. 72.
Cramér, H. 1946. Mathematical Methods of Statistics. Princeton, N.J.: Princeton University Press.
Davis, D.J. 1952. "An Analysis of Some Failure Data." J. Amer. Statist. Assoc. 47, p. 113.
DeGroot, M.H. 1970. Optimal Statistical Decisions. New York: McGraw-Hill.
Dixon, W.J., and Massey, F.J. 1957. Introduction to Statistical Analysis. New York: McGraw-Hill.
Draper, N., and Smith, H. 1981. Applied Regression Analysis. 2nd ed. New York: John Wiley & Sons.
Dufour, R., and Maag, U.R. 1978. "Distribution Results for Modified Kolmogorov-Smirnov Statistics for Truncated or Censored Samples." Technometrics. 20, p. 29.
Eastman, J., and Bain, L.J. 1973. "A Property of Maximum Likelihood Estimators in the Presence of Location-Scale Nuisance Parameters." Commun. Statist. 2(1), p. 23.
Engelhardt, M., and Bain, L.J. 1976. "Tolerance Limits and Confidence Limits on Reliability for the Two-Parameter Exponential Distribution." Technometrics. 18, p. 37.
Engelhardt, M., and Bain, L.J. 1977a. "Uniformly Most Powerful Tests on the Scale Parameter of a Gamma Distribution with a Nuisance Shape Parameter." Technometrics. 19, p. 77.
Engelhardt, M., and Bain, L.J. 1977b. "Simplified Statistical Procedures for the Weibull or Extreme-Value Distribution." Technometrics. 19, p. 323.
Ferguson, T.S. 1967. Mathematical Statistics: A Decision Theoretic Approach. New York: Academic Press.
Fisher, R.A., and Tippett, L.H.C. 1928. "Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample." Proceedings of the Cambridge Philosophical Society. 24, p. 180.
Ghosh, B.K. 1970. Sequential Tests of Statistical Hypotheses. Reading, Mass.: Addison-Wesley.
Greenwood, J.A., and Durand, D. 1960. "Aids for Fitting the Gamma Distribution by Maximum Likelihood." Technometrics. 2, p. 55.
Greenwood, M., and Yule, G.U. 1920. "An Inquiry into the Nature of Frequency Distributions Representative of Multiple Happenings." J. Royal Statist. Soc. 83, p. 255.
Gross, A.J., and Clark, V.A. 1975. Survival Distributions: Reliability Applications in the Biomedical Sciences. New York: John Wiley & Sons.
Grubbs, F.E. 1971. "Fiducial Bounds on Reliability for the Two-Parameter Negative Exponential Distribution." Technometrics. 13, p. 873.
Harter, H.L. 1964. New Tables of the Incomplete Gamma Function Ratio and of Percentage Points of the Chi-Square and Beta Distributions. Washington, D.C.: U.S. Government Printing Office.
Harter, H.L. 1969. Order Statistics and Their Use in Testing and Estimation. Vol. 2: Estimates Based on Order Statistics from Various Populations. Washington, D.C.: U.S. Government Printing Office.
Hogg, R.V., and Craig, A.T. 1978. Introduction to Mathematical Statistics, 4th ed. New York: Macmillan.
Johns, M.V., and Lieberman, G.J. 1966. "An Exact Asymptotically Efficient Confidence Bound for Reliability in the Case of the Weibull Distribution." Technometrics. 8, p. 135.
Jones, R.A.; Scholz, F.W.; Ossiander, M.; and Shorack, G.R. 1985. "Tolerance Bounds for Log Gamma Regression Models." Technometrics. 27, p. 109.
Kendall, M.G., and Stuart, A. 1967. The Advanced Theory of Statistics, 2nd ed., Vol. 2. New York: Hafner.
Kitagawa, T. 1952. Tables of Poisson Distribution. Tokyo: Baifukan.
Koziol, J.A. 1980. "Percentage Points of the Asymptotic Distributions of One and Two Sample Kuiper Statistics for Truncated or Censored Data." Technometrics. 22, p. 437.
Lehmann, E.L. 1959. Testing Statistical Hypotheses. New York: John Wiley & Sons.
Lehmann, E.L., and Scheffé, H. 1955. "Completeness, Similar Regions and Unbiased Estimates." Sankhya. 10, p. 305.
Lieberman, G.J., and Owen, D.B. 1961. Tables of the Hypergeometric Distribution. Stanford, Ca.: Stanford University Press.
Lieblein, J., and Zelen, M. 1956. "Statistical Investigation of the Fatigue Life of Deep-Groove Ball Bearings." J. Res. Nat. Bur. Stand. 47, p. 273.
Mann, H.B., and Whitney, D.R. 1947. "On a Test Whether One of Two Random Variables is Stochastically Larger than the Other." Ann. Math. Statist. 18, p. 50.
Matis, J.H., and Wehrly, T.E. 1979. "Stochastic Models of Compartmental Systems." Biometrics. 35, p. 199.
McDonald, G.C., and Studden, W.J. 1990. "Design Aspects of Regression-Based Ratio Estimation." Technometrics. 32, p. 417.
Molina, E.C. 1942. Poisson's Exponential Binomial Limit. New York: D. Van Nostrand.
Mood, A.M.; Graybill, F.A.; and Boes, D.C. 1974. Introduction to the Theory of Statistics. New York: McGraw-Hill.
Mullet, G.M. 1977. "Simeon Poisson and the National Hockey League." The American Statistician. 31, p. 8.
National Bureau of Standards. 1950. Tables of the Binomial Probability Distribution, Applied Mathematics, Series 6. Washington, D.C.: U.S. Printing Office.
Parzen, E. 1962. Stochastic Processes. San Francisco: Holden-Day.
Patel, J.K., and Read, C.B. 1982. Handbook of the Normal Distribution. New York: Marcel Dekker.
Pearson, E.S., and Hartley, H.O. 1958. Biometrika Tables for Statisticians, Vol. 1, 2nd ed. London: Cambridge University Press.
Pearson, E.S., and Hartley, H.O. 1966. Biometrika Tables for Statisticians, Vol. 1, 3rd ed. London: Cambridge University Press.
Pearson, K. (Ed.) 1934. Tables of the Incomplete Beta Function. London: The Biometrika Office, University College.
Pearson, K. 1951. Tables of the Incomplete Gamma Function. Cambridge: Cambridge University Press.
Pettitt, A.N., and Stephens, M.A. 1976. "Modified Cramer-Von Mises Statistics for Censored Data." Biometrika. 63, p. 291.
Proschan, F. 1963. "Theoretical Explanation of Observed Decreasing Failure Rate." Technometrics. 5, p. 375.
Rao, C.R. 1949. "Sufficient Statistics and Minimum Variance Unbiased Estimates." Proc. Camb. Phil. Soc. 45, p. 213.
Romig, H.G. 1953. 50-100 Binomial Tables. New York: John Wiley & Sons.
Rutherford, E., and Geiger, H. 1910. "The Probability Variations in the Distribution of α Particles." Philosophical Magazine. 20, p. 698.
Scheffé, H. 1959. The Analysis of Variance. New York: John Wiley & Sons.
Snedecor, G.W. 1959. Statistical Methods. Ames, Iowa: Iowa State University Press.
Stephens, M.A. 1974. "EDF Statistics for Goodness-of-Fit and Some Comparisons." J. Amer. Statist.
Assoc. 69, p. 730.
Stephens, M.A. 1977. "Goodness-of-Fit for the Extreme-Value Distribution." Biometrika. 64, p. 583.
Thoman, D.R.; Bain, L.J.; and Antle, C.E. 1969. "Inferences on the Parameters of the Weibull Distribution." Technometrics. 11, p. 445.
Trumpler, R.J., and Weaver, H.F. 1953. Statistical Astronomy. Berkeley: University of California Press.
Wald, A., and Wolfowitz, J. 1940. "On a Test Whether Two Samples are From the Same Population." Ann. Math. Statist. 11, p. 147.
Walpole, R.E., and Myers, R.H. 1985. Probability and Statistics for Engineers and Scientists. New York: Macmillan.
Wang, Y. 1971. "Probabilities of the Type I Errors of the Welch Tests ...." J. Amer. Statist. Assoc. 66, p. 605.
Wasan, M.T. 1970. Parametric Estimation. New York: McGraw-Hill.
Weibull, W. 1939. "A Statistical Theory of the Strength of Materials." Ing. Vetenskaps Akad. Handl. 151, p. 1.
Welch, B. 1949. "Further Notes on Mrs. Aspin's Tables." Biometrika. 36, p. 243.
Wilcoxon, F. 1945. "Individual Comparisons by Ranking Methods." Biometrics. 1, p. 80.
Wilcoxon, F. 1947. "Probability Tables for Individual Comparisons by Ranking Methods," Biometrics. 3, p. 119.
Williamson, E., and Bretherton, M.K. 1963. Tables of the Negative Binomial Probability Distribution, New York: John Wiley & Sons. Zellner, A. 1971. An Introduction to Bayesian Inference in Econometrics. New York: John Wiley & Sons.
INDEX
Absolute error, 319 Absolutely continuous distribution, 64 Admissible estimator, 320 Alternative hypothesis, 390 Analysis of variance, 425 Asymptotic efficiency, 314 Asymptotic mean, 243 Asymptotic normal distribution, 243 Asymptotic relative efficiency 314 Asymptotic variance, 243 Asymptotically efficient estimators, 314
Asymptotically unbiased estimators, 312
BAN estimators, 315 Bayes estimator, 323 Bayes risk, 322 Bayes rule, 26 Bayesian confidence interval, 382 Behrens-Fisher problem, 380 Bernoulli distribution, 91 Bernoulli law of large numbers, 237 Bernoulli trial, 91 Bernoulli variable, 91 Best linear unbiased estimators, 502 Beta distribution, 278 Bias, 309 Biased estimator, 302 Biased test, 417
Binomial distribution, 92 normal approximation of, 240 Poisson approximation of, 105, 240 table of CDF, 599 Binomial expansion, 36 Bivariate normal distribution, 185 Bonferroni's inequality, 16 Boole's inequality, 16
Cartesian product, 6, 592 Cauchy distribution, 127 Cauchy type, 252 CDF Technique, 194 Censored sampling, 221 type 1,222 type II, 221 Central limit theorem, 238 Characteristic largest value, 253 Chebychev inequality, 76 Chi-square distribution, 268 table of CDF, 606 table of percentiles, 604 Chi-squaie goodness-of-fit test, 453 Classical probability, 12 Combinations, 35 Complement, 5, 588 Complete family, 345 Complete sufficient statistIc, 346 Completeness, 345 Composite hypothesis, 390
Compound Poisson distribution, 577 Conceptual model, 4 Conceptual sample space, 3 Conditional distribution, 153 Conditional expectation, 180 Conditional probability, 18 Conditional probabilíty density function, 153 Conditional tests, 426 Conditional variance, 182 Confidence coefficient, 360 Confidence Contour, 526 Confidence interval, 359, 360 conservative, 373 for binomial parameter p, 368, 374 for difference of binomial parameters, 382 for difference of means, 378 for mean of normal distribution, 362, 365
for ratio of variances, 378 for variance of normal distribution, 365
general method, 369 one-sided, 360 pivotal quantity method, 362 uniformly most accurate, 415 Confidence level, 360 Confidence limit, 360 Confidence region, 362 Conservative confidence interval, 373 641
Conservative test, 405 Consistent estimator, 311 Contingency table, 450 Continuity correction, 241 Continuous distribution, 64 Continuous from the right, 60 Continuous random variable, 64 Continuous sample space, 4 Convergence in distribution, 232 Convergence in probability, 245, 247 Converge stochastically, 233 Convolution formula, 210 Correlated random variables, 178 Correlation, 177 Correlation coefficient, 178 Countable, 4 Countable intersection, 5 Countable union, 5, 593 Countably infinite, 3, 592 Coverage, 474 Covariance, 174 Covariance matrix, 516 Cramer-Rao lower bound, 305 Cramer-Von Mises test, 458 table of critical values, 613 Critical region, 391 Cumulative distribution function, 58
Degenerate distribution, 77 Degrees of freedom, 268 De Morgan's laws, 5, 590 Dependent events, 27 Dependent random variables, 150 Derived distributions, 267 Deterministic model, 1 Difference of sets, 588 Dirichlet distribution, 212 Discrete probability density function, 56
Discrete random variable, 56 Discrete sample space, 3 Discrete uniform distribution, 107 Disjoint sets, 6, 588 Distinguishable, 33 Distribution function, 59 Double exponential distribution, 127
Efficiency, 308
Efficient estimator, 308 Elements of a set, 588 Empirical CDF, 160
Empty set, 5, 588 Error sum of squares, 502 Equal-tailed confidence interval, 361 Equally likely, 12 Equivalent events, 55 Estimate, 290 Estimator, 290 admissible, 320 BAN, 315 Bayes, 323
best asymptotically normal, 315 best linear unbiased, 502 consistent, 311 efficient, 308 least squares, 501
maximum likelihood, 294 moment, 290 minimax, 321 unbiased, 265, 302 uniformly minimum variance unbiased, 304 Event, 4 elementary, 6 null, 5 sure, 5 Exhaustive sets, 23 Expectation, 61, 67 conditional, 180 Expected value, 61, 67 Experiment, 2 Exponential class, 347 regular, 347 range dependent, 350 Exponential distribution, 115 Exponential type, 252 Extended hypergeometric distribution, 138
Extreme-value distribution, 560 for maximums, 252 for minimums, 256
F distribution, 275 tables of percentiles, 609 Factorial moment, 82 Factorial moment generating function, 82
Factorization criterion, 339 Failure rate function, 541 Fair game, 62 Finite collection, 5 Finite intersection, 591 Finite multiplier, 177
Finite population, 177 Finite sample space, 3 Finite set, 592 Finite union, 591
Gamma distribution, 111 Gamma function, 111 Gauss-Markov theorem, 517 Gaussian distribution, 118 Generalized likelihood ratio, 418 General linear model, 515 Geometric distribution, 99 Geometric series, 99 Goodness-of-fit tests, 442 chi-square, 453 Cramer-Von Mises, 458 Kolmogorov-Smirnov, 460 Kuiper, 460
Hazard function, 541 Histogram, 162 Homogeneous Poisson process, 106 Hypergeometric distribution, 95 Hypothesis testing, 389
Independent events, 27, 30 Independent random variables, 150 Indicator function, 341 Indistinguishable objects, 37 Intensity, 107, 571 Intersection, 5, 588 Interval estimate, 360 Interval estimator, 360 Invariance property, 296, 298 Inverse binomial sampling, 103 Inverse transformation, 198
Jacobian, 199, 205 Joint cumulative distribution function, 143
Joint distributions, 136 Joint moment generating function, 187 Joint probability density function, 137, 144
Joint transformations, 204 Jointly sufficient statistics, 337
Kolmogorov-Smirnov test, 460 table of critical values, 613 Kuiper test, 460 table of critical values, 613 K-variate normal, 520
Laplace distribution, 127 Law of large numbers, 246 Least squares estimates, 501 Lehmann-Scheffé theorem, 346 Likelihood function, 293 Likelihood ratio tests, 418 Limiting distribution, 232 Linear combination, 267 Location parameter, 124, 363 Location-scale parameter, 126, 363 Logistic distribution, 200 Lognormal distribution, 199 Loss function, 319 Lower one-sided confidence limit, 360 Lower one-sided test, 397
Mann-Whitney test, 483 table of p-values, 615 Marginal CDF, 148 Marginal pdf of continuous random variables, 147 Marginal pdf of discrete random variables, 141 Marginal probabilities, 21 Markov inequality, 76 Mass points, 56 Mathematical model, 1 Maximum likelihood equations, 294, 298
Maximum likelihood estimate, 294 Maximum likelihood estimator, 294 invariance property, 296, 298 large-sample properties, 316 Mean, 61, 67 Mean absolute deviation, 74 Mean square error, 309 Mean square error consistency, 312 Measurement error, 1 Median of a distribution, 69 Median of a sample, 218 Median unbiased, 333 Method of maximum likelihood, 292 Method of moments, 290 Minimal sufficient set of statistics, 337
Minimax estimator, 321 Minimum expected length criterion, 361
Mixed distribution, 70 Mode, 69 Modified relative frequency histogram, 163 Moment of a sample, 291 Moment of a random variable, 73 Moment generating function, 78 Monotone likelihood ratio, 413 More concentrated, 303 Most powerful critical region, 407 Most powerful test, 407 Multinomial distribution, 138 Multiplication principle, 32 Multiplication theorem, 19 Multivariate normal distribution, 520 Mutually exclusive events, 6, 7 Mutually independent events, 30
Negative binomial distribution, 101 Neyman-Pearson lemma, 407 No memory property, 100, 115 Noncentral t distribution, 400 table of sample sizes, 616 Noncentrality parameter, 400 Nonhomogeneous Poisson process, 574
Nonparametric methods, 468 binomial test, 471 confidence intervals, 472 correlation tests, 486 Mann-Whitney test, 483 rank correlation, 489 sign test, 469 tolerance limits, 473 Wald-Wolfowitz runs test, 492 Wilcoxon test, 477 Normal distribution, 118 table of CDF and percentiles, 603 Nuisance parameters, 362 Null event, 5 Null set, 588 Null hypothesis, 390
One-sided confidence limits, 360 One-sided tests, 397 One-to-one transformation, 197
Order statistics, 215 asymptotic normality of, 244 ¡oint pdf of, 215 marginal pdf of, 216 Outcome, 2
Pairwise mutually exclusive, 7 Parallel system, 29, 545 Parameter, 289 Parameter space, 289 Pareto distribution, 118 Partitioning, 39 Pascal distribution, 99 Percentile, 68 Permutation, 34 Pitman asymptotic relative efficiency, 470
Pivotal quantity, 363 Poisson distribution, 103 table of CDF, 602 Poisson process, 105 homogeneous, 106 nonhomogeneous, 574 Polynomial regression model, 500 Population, 158 Posterior density, 324 Power function, 394 Principle of least squares, 501 Prior density, 323 Probabilistic model, 2 Probability, 9 classical, 11
conditional, 16 density function, 64 generating function, 82 integral transformation, 201 mass function, 56 model, 2 set function, 9 subjective, 9 P-value of a test, 396
Quadratic form, 521 Quantile, 68 interval estimates, 472 test of hypotheses, 471
Random, 12 Random interval, 359 Random numbers, 202
Random sample, 164 Random selection, 12 Random variable, 53 continuous, 64 discrete, 56 Randomization test, 482 Randomized test, 411 Range, 218 Range dependent exponential class, 350
Rao-Blackwell theorem, 341 Rate of occurrence, 107, 571 Rayleigh distribution, 116 Redundant system, 29 Regression analysis, 499 Regression curve, 186 Regression function, 186 Regression toward the mean, 500 Regular exponential class, 347 Relative frequency, 7 Reliability function, 541 Residuals, 502 Risk function, 319 Runs test, 492
Sample correlation coefficient, 529 Sample mean, 161 Sample median, 218 Sample moments, 291 Sample proportion, 161 Sample range, 218 Sample space, 2 Sample variance, 161 Sampling distribution, 267 Sampling without replacement, 19 Sampling with replacement, 29 Scale parameter, 112, 363 Sequential probability ratio test, 429 Series system, 28, 546 Set, 2, 587 Set function, 9 Set theory, 587 Shape parameter, 111, 116 of gamma distribution, 111 of Pareto distribution, 118 of Weibull distribution, 116 Significance level, 391 Simple consistency, 311 Simple hypothesis, 390 Simple linear model, 501 Size of critical region, 391 Size of test, 391
Skewed distribution, 70 Slutsky's theorem, 248 Smallest characteristic value, 257 Snedecor's F distribution, 275 table of percentiles, 609 Spearman's rank correlation, 489 table of p-values, 617 Standard deviation, 73 Standard normal distribution, 119 table of CDF and percentiles, 603 Statistic, 264 Statistical hypothesis, 390 Statistical regularity, 8 Stochastic model, 2 Stochastic convergence, 233 Stochastically independent, 150 Student's t distribution, 274 table of percentiles, 608 Subset, 4, 588 Sufficient statistic, 337 complete, 346 factorization criterion, 339 jointly, 337 minimal, 337 Superefficient estimator, 315 Sure event, 5 Survivor function, 541 Symmetric distribution, 69
t distribution, 274 table of percentiles, 608 t test, 399 one-sample, 399 paired, 403 two-sample, 403 Test statistic, 391 Test of hypotheses, 389 critical region of, 391 for composite hypotheses, 395 for equality of means, 403 for equality of proportions, 406 for equality of variances 402 for independence, 450 for the mean of-a normal distribution, 398, 399 generalized likelihood ratio, 418 most powerful, 407 nonparametric, 468 of randomness, 492 power of, 394 randomized, 411
relationship to confidence intervals, 415 of simple hypotheses, 391 size of, 391 two-sample tests, 402 unbiased, 416 uniformly most powerful, 412 Three-parameter gamma distribution, 127
Three-parameter Weibull distribution, 127
Threshold parameter, 125 Tolerance limits, 473, 548 Total probability, 23 Transformations, 197 Tree diagram, 23 Trial, 2 Two-parameter exponential distribution, 127 Two-sided alternative, 397 Type I censored sample, 222 Type II censored sample, 221 Type I error, 391 Type I extreme-value distribution, 258 Type II error, 391 Unbiased estimator, 302 best linear, 502 uniformly minimum variance, 304 Unbiased test, 416 Uncorrelated random variables, 178 Uniform distribution, 109 Uniformly minimum variance, 304 Uniformly most accurate, 415 Uniformly most powerful test, 412 Union, 5 Uniqueness of MGF's, 82 Universal set, 588 Upper confidence limit, 360 Upper one-sided test, 397 Variance, 73 conditional, 182 of sample, 161, 266 stabilizing transformation, 388 Venn diagrams, 588
Wald-Wolfowitz runs test, 492 Weibull distribution, 116 Weighted least squares estimates, 538 Wilcoxon test, 477 table of critical values, 614 Wilson-Hilferty approximation, 281
PDF compression, OCR, web optimization using a watermarked evaluation copy of CVISION PDFCompressor
Special Discrete Distributions (Notation and Parameters; Discrete pdf f(x); Mean; Variance; MGF M(t))

Binomial, X ~ BIN(n, p); 0 < p < 1, q = 1 - p
  f(x) = (n choose x) p^x q^(n-x), x = 0, 1, ..., n
  Mean: np   Variance: npq   MGF: M(t) = (pe^t + q)^n

Bernoulli, X ~ BIN(1, p); 0 < p < 1, q = 1 - p
  f(x) = p^x q^(1-x), x = 0, 1
  Mean: p   Variance: pq   MGF: M(t) = pe^t + q

Negative Binomial, X ~ NB(r, p); 0 < p < 1, q = 1 - p, r = 1, 2, ...
  f(x) = (x-1 choose r-1) p^r q^(x-r), x = r, r + 1, ...
  Mean: r/p   Variance: rq/p^2   MGF: M(t) = [pe^t/(1 - qe^t)]^r

Geometric, X ~ GEO(p); 0 < p < 1, q = 1 - p
  f(x) = pq^(x-1), x = 1, 2, ...
  Mean: 1/p   Variance: q/p^2   MGF: M(t) = pe^t/(1 - qe^t)

Hypergeometric, X ~ HYP(n, M, N); n = 1, 2, ..., N; M = 0, 1, ..., N
  f(x) = (M choose x)(N-M choose n-x)/(N choose n), x = 0, 1, ..., n
  Mean: nM/N   Variance: n(M/N)(1 - M/N)(N - n)/(N - 1)   MGF: not tractable

Poisson, X ~ POI(µ); µ > 0
  f(x) = e^(-µ) µ^x / x!, x = 0, 1, ...
  Mean: µ   Variance: µ   MGF: M(t) = e^(µ(e^t - 1))

Discrete Uniform, X ~ DU(N); N = 1, 2, ...
  f(x) = 1/N, x = 1, 2, ..., N
  Mean: (N + 1)/2   Variance: (N^2 - 1)/12   MGF: M(t) = e^t(1 - e^(Nt))/[N(1 - e^t)]
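The closed-form means and variances in the endpaper table of discrete distributions can be spot-checked numerically from the pdfs; a minimal sketch (the parameter values chosen here are arbitrary):

```python
from math import comb

def moments(pmf, support):
    # Mean and variance computed directly from a pmf over the given support
    mean = sum(x * pmf(x) for x in support)
    var = sum((x - mean) ** 2 * pmf(x) for x in support)
    return mean, var

n, p = 12, 0.3
q = 1 - p

# Binomial BIN(n, p): mean np, variance npq
m, v = moments(lambda x: comb(n, x) * p**x * q**(n - x), range(n + 1))
assert abs(m - n * p) < 1e-9 and abs(v - n * p * q) < 1e-9

# Geometric GEO(p): mean 1/p, variance q/p^2 (support truncated; the tail is negligible)
m, v = moments(lambda x: p * q**(x - 1), range(1, 400))
assert abs(m - 1 / p) < 1e-6 and abs(v - q / p**2) < 1e-6

# Discrete uniform DU(N): mean (N + 1)/2, variance (N^2 - 1)/12
N = 10
m, v = moments(lambda x: 1 / N, range(1, N + 1))
assert abs(m - (N + 1) / 2) < 1e-9 and abs(v - (N**2 - 1) / 12) < 1e-9
```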