Φ(−z). This graph demonstrates that Φ(−z) = 1 − Φ(z). The probability of this event is

P[−1 < Z ≤ 1] = Φ(1) − Φ(−1) = 2Φ(1) − 1 = 0.683.   (4.53)

Values of z outside the interval (−3, 3) are in the tails of the PDF. When |z| > 3, Φ(z) is very close to one; for example, Φ(3) = 0.9987 and Φ(4) = 0.9999768. The properties of
100]. Does this model seem reasonable?

4.6.6 The temperature T in this thermostatically controlled lecture hall is a Gaussian random variable with expected value μ = 68 degrees Fahrenheit. In addition, P[T < 66] = 0.1587. What is the variance of T?

4.6.7 X is a Gaussian random variable with E[X] = 0 and P[|X| ≤ 10] = 0.1. What is the standard deviation σ_X?

4.6.8 A function commonly used in communications textbooks for the tail probabilities of Gaussian random variables is the complementary error function, defined as

erfc(z) = (2/√π) ∫_z^∞ e^{−x²} dx.

Show that

Q(z) = (1/2) erfc(z/√2).

... of n years filled with blackboard errors, the total amount in dollars paid can be approximated by a Gaussian random variable Y_n with expected value 40n and variance 100n. What is the probability that Y_20 exceeds 1000? How many years n must the professor teach in order that P[Y_n > 1000] > 0.99?

4.6.11 Suppose that out of 100 million men in the United States, 23,000 are at least 7 feet tall. Suppose that the heights of U.S. men are independent Gaussian random variables with an expected value of 5'10". Let N equal the number of men who are at least 7'6" tall.
(a) Calculate σ_X, the standard deviation of the height of U.S. men.
(b) In terms of the
6, no matter how many times it runs. By contrast, Q(6) = 9.9 × 10⁻¹⁰. This suggests that in a set of one billion independent samples of the Gaussian (0, 1) random variable, we can expect two samples with |X| > 6, one sample with X < −6, and one sample with X > 6.

function FX=uniform12(m);
x=sum(rand(12,m))-6;
T=(-3:3);FX=(count(x,T)/m)';
[T;phi(T);FX]
Quiz 9.6
X is the binomial (100, 0.5) random variable and Y is the discrete uniform (0, 100) random variable. Calculate and graph the PMF of W = X + Y.
Further Reading: [Dur94] contains a concise, rigorous presentation and proof of the central limit theorem.

Problems
Difficulty: Easy · Moderate · Difficult · Experts Only

9.1.1 X_1 and X_2 are iid random variables with variance Var[X].
(a) What is E[X_1 − X_2]?
(b) What is Var[X_1 − X_2]?
9.1.4 X_1, X_2, and X_3 are iid continuous uniform random variables. Random variable Y = X_1 + X_2 + X_3 has expected value E[Y] = 0 and variance σ_Y² = 4. What is the PDF f_{X_1}(x) of X_1?

9.1.2 Flip a biased coin 100 times. On each flip, P[H] = p. Let X_i denote the number of heads that occur on flip i. What is P_{X_33}(x)? Are X_1 and X_2 independent? Define Y = X_1 + X_2 + · · · + X_100. Describe Y in words. What is P_Y(y)? Find E[Y] and Var[Y].
9.1.3 A radio program gives concert tickets to the fourth caller with the right answer to a question. Of the people who call, 25% know the answer. Phone calls are independent of one another. The random variable N_r indicates the number of phone calls taken when the rth correct answer arrives. (If the fourth correct answer arrives on the eighth call, then N_4 = 8.)
(a) What is the PMF of N_1, the number of phone calls needed to obtain the first correct answer?
(b) What is E[N_1], the expected number of phone calls needed to obtain the first correct answer?
(c) What is the PMF of N_4, the number of phone calls needed to obtain the fourth correct answer? Hint: See Example 3.13.
(d) What is E[N_4]? Hint: N_4 can be written as the independent sum N_4 = K_1 + K_2 + K_3 + K_4, where each K_i is distributed identically to N_1.

9.1.5 Random variables X and Y have joint PDF

f_{X,Y}(x, y) = { 2, x ≥ 0, y ≥ 0, x + y ≤ 1;   0, otherwise. }

What is the variance of W = X + Y?

9.2.1 For a constant a > 0, a Laplace random variable X has PDF

f_X(x) = (a/2) e^{−a|x|},   −∞ < x < ∞.

Calculate the MGF φ_X(s).

9.2.2 Random variables J and K have the joint probability mass function P_{J,K}(j, k):

P_{J,K}(j, k)   j = −2   j = −1
k = −1           0.42     0.28
k = 0            0.12     0.08
k = 1            0.06     0.04

(a) What is the MGF of J?
(b) What is the MGF of K?
(c) Find the PMF of M = J + K.
(d) What is E[M⁴]?
9.2.3 X is the continuous uniform (a, b) random variable. Find the MGF φ_X(s). Use the MGF to calculate the first and second moments of X.
9.2.4 Let X be a Gaussian (0, σ) random variable. Use the moment generating function to show that

E[X] = 0,   E[X²] = σ²,   E[X³] = 0,   E[X⁴] = 3σ⁴.

Let Y be a Gaussian (μ, σ) random variable. Use the moments of X to show that

E[Y²] = σ² + μ²,   E[Y³] = 3μσ² + μ³,   E[Y⁴] = 3σ⁴ + 6μ²σ² + μ⁴.

9.2.5 Random variable K has a discrete uniform (1, n) PMF. Use the MGF φ_K(s) to find E[K] and E[K²]. Use the first and second moments of K to derive well-known expressions for Σ_{k=1}^n k and Σ_{k=1}^n k².

9.3.1 N is the binomial (100, 0.4) random variable. M is the binomial (50, 0.4) random variable. M and N are independent. What is the PMF of L = M + N?

9.3.2 Random variable Y has the moment generating function φ_Y(s).
(a) What are E[Y], E[Y²], and E[Y³]?
(b) What is E[W²]?

9.3.3 Let K_1, K_2, ... denote a sequence of iid Bernoulli (p) random variables. Let M = K_1 + · · · + K_n.
(a) Find the MGF φ_K(s).
(b) Find the MGF φ_M(s).
(c) Use the MGF φ_M(s) to find E[M] and Var[M].

9.3.4 Suppose you participate in a chess tournament in which you play n games. Since you are an average player, each game is equally likely to be a win, a loss, or a tie. You collect 2 points for each win, 1 point for each tie, and 0 points for each loss. The outcome of each game is independent of the outcome of every other game. Let X_i be the number of points you earn for game i and let Y equal the total number of points earned over the n games.
(a) Find the moment generating functions φ_{X_i}(s) and φ_Y(s).
(b) Find E[Y] and Var[Y].

9.3.5 At time t = 0, you begin counting the arrivals of buses at a depot. The number of buses K_i that arrive between time i − 1 minutes and time i minutes has the Poisson PMF

P_{K_i}(k) = { 2^k e^{−2}/k!, k = 0, 1, 2, ...;   0, otherwise. }

K_1, K_2, ... are an iid random sequence. Let R_i = K_1 + K_2 + · · · + K_i denote the number of buses arriving in the first i minutes.
(a) What is the moment generating function φ_{K_i}(s)?
(b) Find the MGF φ_{R_i}(s).
(d) Find E[R_i] and Var[R_i].

9.3.6 Suppose that during the ith day of December, the energy X_i stored by a solar collector is a Gaussian random variable with expected value 32 − i/4 kW-hr and standard deviation of 10 kW-hr. Assuming the energy stored each day is independent of any other day, what is the PDF of Y, the total energy stored in the 31 days of December?

9.3.7 K, K_1, K_2, ... are iid random variables. Use the MGF of M = K_1 + · · · + K_n to show that
(a) E[M] = n E[K].
(b) E[M²] = n(n − 1)(E[K])² + n E[K²].

9.4.1 X_1, X_2, ... is a sequence of iid random variables, each with exponential PDF

f_X(x) = { λe^{−λx}, x ≥ 0;   0, otherwise. }

(a) Find φ_X(s).
(b) Let K be a geometric random variable with PMF

P_K(k) = { (1 − q)q^{k−1}, k = 1, 2, ...;   0, otherwise. }

Find the MGF and PDF of V = X_1 + · · · + X_K.

9.4.2 In any game, the number of passes N that Donovan McNabb will throw is the Poisson (30) random variable. Each pass is completed with probability q = 2/3, independent of any other pass or the number of passes thrown. Let K equal the number of completed passes McNabb throws in a game. What are φ_K(s), E[K], and Var[K]? What is the PMF P_K(k)?

9.4.3 Suppose we flip a fair coin repeatedly. Let X_i equal 1 if flip i was heads (H) and 0 otherwise. Let N denote the number of flips needed until H has occurred 100 times. Is N independent of the random sequence X_1, X_2, ...? Define Y = X_1 + · · · + X_N. Is Y an ordinary random sum of random variables? What is the PMF of Y?

9.4.4 Let X_1, ..., X_n denote a sequence of iid Bernoulli (p) random variables and let K = X_1 + · · · + X_n. In addition, let M denote a binomial (n, p) random variable, independent of X_1, ..., X_n. Do the random variables U = X_1 + · · · + X_K and V = X_1 + · · · + X_M have the same expected value? Be careful: U is not an ordinary random sum of random variables.

9.4.5 This problem continues the lottery of Problem 3.7.10 in which each ticket has 6 randomly marked numbers out of 1, ..., 46. A ticket is a winner if the six marked numbers match 6 numbers drawn at random at the end of a week. Suppose that following a week in which the pot carried over was r dollars, the number of tickets sold in that week, K, is the Poisson (r) random variable. What is the PMF of the number of winning tickets? Hint: What is the probability q that an arbitrary ticket is a winner?

9.4.6 X is the Gaussian (1, 1) random variable and K is a discrete random variable, independent of X, with PMF

P_K(k) = { ..., k = 0, 1, ...;   0, otherwise. }

Let X_1, X_2, ... denote a sequence of iid random variables each with the same distribution as X.
(a) What is the MGF of K?
(b) What is the MGF of R = X_1 + · · · + X_K? Note that R = 0 if K = 0.
(c) Find E[R] and Var[R].

9.4.7 K, the number of passes that Donovan McNabb completes in any game, is the Poisson (20) random variable. If NFL yardage were measured with greater care (as opposed to always being rounded to the nearest yard), officials might discover that each completion results in a yardage gain Y that is the exponential random variable with expected value 15 yards. Let V equal McNabb's total passing yardage in a game. Find φ_V(s), E[V], Var[V], and (if possible) the PDF f_V(v).

9.4.8 Suppose you participate in a chess tournament in which you play until you lose a game. Since you are an average player, each game is equally likely to be a win, a loss, or a tie. You collect 2 points for each win, 1 point for each tie, and 0 points for each loss. The outcome of each game is independent of the outcome of every other game. Let X_i be the number of points you earn for game i and let Y equal the total number of points earned in the tournament.
(a) Find the moment generating function φ_Y(s).
(b) Find E[Y] and Var[Y].

9.5.1 The waiting time in milliseconds, W, for accessing one record from a computer database is the continuous uniform (0, 10) random variable. The read time R
(for moving the information from the disk to main memory) is 3 milliseconds. The random variable X milliseconds is the total access time (waiting time + read time) to get one block of information from the disk. Before performing a certain task, the computer must access 12 different blocks of information from the disk. (Access times for different blocks are independent of one another.) The total access time for all the information is a random variable A milliseconds.
(a) What is E[X]?
(b) What is Var[X]?
(c) What is E[A]?
(d) What is σ_A?
(e) Use the central limit theorem to estimate P[A > 116 ms].
(f) Use the central limit theorem to estimate P[A < 86 ms].

9.5.2 Internet packets can be classified as video (V) or as generic data (D). Based on a lot of observations taken by the Internet service provider, we have the following probability model: P[V] = 3/4, P[D] = 1/4. Data packets and video packets occur independently of one another. The random variable K_n is the number of video packets in a collection of n packets.
(a) What is E[K_100], the expected number of video packets in a set of 100 packets?
(b) What is σ_{K_100}?
(c) Use the central limit theorem to estimate P[K_100 > 18].
(d) Use the central limit theorem to estimate P[16 < K_100 < 24].

9.5.3 The duration of a cellular telephone call is an exponential random variable with expected value 150 seconds. A subscriber has a calling plan that includes 300 minutes per month at a cost of $30.00 plus $0.40 for each minute that the total calling time exceeds 300 minutes. In a certain month, the subscriber has 120 cellular calls.
(a) Use the central limit theorem to estimate the probability that the subscriber's bill is greater than $36. (Assume that the durations of all phone calls are mutually independent and that the telephone company measures call duration exactly and charges accordingly, without rounding up fractional minutes.)
(b) Suppose the telephone company does charge a full minute for each fractional minute used. Recalculate your estimate of the probability that the bill is greater than $36.

9.5.4 Let K_1, K_2, ... be an iid sequence of Poisson (1) random variables. Let W_n = K_1 + · · · + K_n.
(a) A Web server has a capacity of C requests per minute. If the number of requests in a one-minute interval is greater than C, the server is overloaded. Use the central limit theorem to estimate the smallest value of C for which the probability of overload is less than 0.05.
(b) Use MATLAB to calculate the actual probability of overload for the value of C derived from the central limit theorem.
(c) For the value of C derived from the central limit theorem, what is the probability of overload in a one-second interval?
(d) What is the smallest value of C for which the probability of overload in a one-second interval is less than 0.05?
(e) Comment on the application of the central limit theorem to estimate the overload probability in a one-second interval and in a one-minute interval.
9.5.6 Integrated circuits from a certain factory pass a certain quality test with probability 0.8. The outcomes of all tests are mutually independent.
(a) What is the expected number of tests necessary to find 500 acceptable circuits?
(b) Use the central limit theorem to estimate the probability of finding 500 acceptable circuits in a batch of 600 circuits.
(c) Use MATLAB to calculate the actual probability of finding 500 acceptable circuits in a batch of 600 circuits.
(d) Use the central limit theorem to calculate the minimum batch size for finding 500 acceptable circuits with probability 0.9 or greater.

9.5.7 Internet packets can be classified as video (V) or as generic data (D). Based on a lot of observations taken by the Internet service provider, we have the following probability model: P[V] = 0.8, P[D] = 0.2. Data packets and video packets occur independently of one another. The random variable K_n is the number of video packets in a collection of n packets.
(a) What is E[K_48], the expected number of video packets in a set of 48 packets?
(b) What is σ_{K_48}, the standard deviation of the number of video packets in a set of 48 packets?
(c) Use the central limit theorem to estimate P[30 ≤ K_48 ≤ 42], the probability of between 30 and 42 voice calls in a set of 48 calls.
(d) Use the De Moivre–Laplace formula to estimate P[30 ≤ K_48 ≤ 42].

9.5.8 In the presence of a headwind of normalized intensity W, your speed on your bike is V = 20 − 10W³ mi/hr. The wind intensity W is a continuous uniform (−1, 1) random variable. Moreover, the wind changes every ten minutes. Let W_i denote the headwind intensity in the ith ten-minute interval. In a five-hour bike ride, with 30 ten-minute intervals, the wind intensities W_1, ..., W_30 are independent and identical to W. The distance you travel is

X = (V_1 + V_2 + · · · + V_30)/6.

Use the CLT to estimate P[X > 95].

9.5.9 An amplifier circuit has power consumption Y that grows nonlinearly with the input signal voltage X. When the input signal is X volts, the instantaneous power consumed by the amplifier is Y = 20 + 15X² Watts. The input signal X is the continuous uniform (−1, 1) random variable. Sampling the input signal every millisecond over a 100-millisecond interval yields the iid signal samples X_1, X_2, ..., X_100. Over the 100 ms interval, you estimate the average power of the amplifier as

W = (1/100) Σ_{i=1}^{100} Y_i,

where Y_i = 20 + 15X_i². Use the central limit theorem to estimate P[W < 25.4].

9.5.10 In the face of perpetually varying headwinds, cyclists Lance and Ashwin are in a 3000 mile race across America. To maintain a speed of v miles/hour in the presence of a w mi/hr headwind, a cyclist must generate a power output y = 50 + (v + 3w − 15) Watts. During each mile of road, the wind speed W is the continuous uniform (0, 10) random variable, independent of the wind speed in any other mile.
(a) Lance rides at constant velocity v = 15 mi/hr mile after mile. Let Y denote Lance's power output over a randomly chosen mile. What is E[Y]?
(b) Ashwin is less powerful, but he is able to ride at constant power ŷ Watts in the presence of the same variable headwinds. Use the central limit theorem to find ŷ such that Ashwin wins the race with probability 1/2.

9.5.11 Suppose your grade in a probability course depends on 10 weekly quizzes. Each quiz has ten yes/no questions, each
worth 1 point. The scoring has no partial credit. Your performance is a model of consistency: on each one-point question, you get the right answer with probability p, independent of the outcome on any other question. Thus your score X_i on quiz i is between 0 and 10. Your average score, X = Σ_{i=1}^{10} X_i / 100, is used to determine your grade. The course grading has simple letter grades without any curve: A: X > 0.9, B: 0.8 < X < 0.9, C: 0.7 < X < 0.8, D: 0.6 < X < 0.7, and F: X < 0.6. As it happens, you are a borderline B/C student with p = 0.8.
(a) What is the PMF of X_i?
(b) Use the central limit theorem to estimate the probability P[A] that your grade is an A.
(c) Suppose now that the course has "attendance quizzes." If you attend a lecture with an attendance quiz, you get credit for a bonus quiz with a score of 10. If you are present for n bonus quizzes, your modified average

X' = (10n + Σ_{i=1}^{10} X_i) / (10n + 100)

is used to calculate your grade: A: X' > 0.9, B: 0.8 < X' < 0.9, and so on. Given you attend n attendance quizzes, use the central limit theorem to estimate P[A].
(d) Now suppose there are no attendance quizzes and your week 1 quiz is scored an 8. A few hours after the week 1 quiz, you notice that a question was marked incorrectly; your quiz score should have been 9. You appeal to the annoying prof who says "Sorry, all regrade requests must be submitted immediately after receiving your score. But don't worry, the probability it makes a difference is virtually nil." Let U denote the event that your letter grade is unchanged because of the scoring error. Find an exact expression for P[U].

9.6.1 W_n is the number of ones in 10^n independent transmitted bits, each equiprobably 0 or 1. For n = 3, 4, ..., use the binomialpmf function to calculate

P[0.499 ≤ W_n/10^n ≤ 0.501].

What is the largest n for which your MATLAB installation can perform the calculation? Can you perform the exact calculation of Example 9.14?

9.6.2 Use the MATLAB plot function to compare the Erlang (n, λ) PDF to a Gaussian PDF with the same expected value and variance for λ = 1 and n = 4, 20, 100. Why are your results not surprising?

9.6.3 Recreate the plots of Figure 9.3. On the same plots, superimpose the PDF of Y_n, a Gaussian random variable with the same expected value and variance. If X_n denotes the binomial (n, p) random variable, explain why for most integers k, P_{X_n}(k) ≈ f_Y(k).

9.6.4 Find the PMF of W = X_1 + X_2 in Example 9.17 using the conv function.

9.6.5 Use uniform12.m to estimate the probability of a storm surge greater than 7 feet in Example 10.4 based on:
(a) 1000 samples,
(b) 10000 samples.

9.6.6 X_1, X_2, and X_3 are independent random variables such that X_k has PMF

P_{X_k}(x) = { 1/(10k), x = 1, 2, ..., 10k;   0, otherwise. }

Find the PMF of W = X_1 + X_2 + X_3.

9.6.7 Let X and Y denote independent finite random variables described by the probability and range vectors px, sx and py, sy. Write a MATLAB function

[pw,sw]=sumfinitepmf(px,sx,py,sy)

such that finite random variable W = X + Y is described by pw and sw.
The Sample Mean
Earlier chapters of this book present the properties of probability models. In referring to applications of probability theory, we have assumed prior knowledge of the probability model that governs the outcomes of an experiment. In practice, however, we encounter many situations in which the probability model is not known in advance and experimenters collect data in order to learn about the model. In doing so, they apply principles of statistical inference, a body of knowledge that governs the use of measurements to discover the properties of a probability model. This chapter focuses on the properties of the sample mean of a set of data. We refer to independent trials of one experiment, with each trial producing one sample value of a random variable. The sample mean is simply the sum of the sample values divided by the number of trials. We begin by describing the relationship of the sample mean of the data to the expected value of the random variable. We then describe methods of using the sample mean to estimate the expected value.
10.1
Sample Mean: Expected Value and Variance
The sample mean M_n(X) = (X_1 + · · · + X_n)/n of n independent observations of random variable X is a random variable with expected value E[X] and variance Var[X]/n.

In this section, we define the sample mean of a random variable and identify its expected value and variance. Later sections of this chapter show mathematically how the sample mean converges to a constant as the number of repetitions of an experiment increases. This chapter, therefore, provides the mathematical basis for the statement that although the result of a single experiment is unpredictable, predictable patterns emerge as we collect more and more data. To define the sample mean, consider repeated independent trials of an experiment. Each trial results in one observation of a random variable, X. After n trials,
we have sample values of the n random variables X_1, ..., X_n, all with the same PDF as X. The sample mean is the numerical average of the observations.
Definition 10.1    Sample Mean
For iid random variables X_1, ..., X_n with PDF f_X(x), the sample mean of X is the random variable

M_n(X) = (X_1 + · · · + X_n)/n.
The first thing to notice is that M_n(X) is a function of the random variables X_1, ..., X_n and is therefore a random variable itself. It is important to distinguish the sample mean, M_n(X), from E[X], which we sometimes refer to as the mean value of random variable X. While M_n(X) is a random variable, E[X] is a number. To avoid confusion when studying the sample mean, it is advisable to refer to E[X] as the expected value of X, rather than the mean of X. The sample mean of X and the expected value of X are closely related. A major purpose of this chapter is to explore the fact that as n increases without bound, M_n(X) predictably approaches E[X]. In everyday conversation, this phenomenon is often called the law of averages. The expected value and variance of M_n(X) reveal the most important properties of the sample mean. From our earlier work with sums of random variables in Chapter 9, we have the following result.
Theorem 10.1
The sample mean M_n(X) has expected value and variance

E[M_n(X)] = E[X],      Var[M_n(X)] = Var[X]/n.
Proof From Definition 10.1, Theorem 9.1, and the fact that E[X_i] = E[X] for all i,

E[M_n(X)] = (1/n)(E[X_1] + · · · + E[X_n]) = (1/n)(E[X] + · · · + E[X]) = E[X].   (10.1)

Because Var[aY] = a² Var[Y] for any random variable Y (Theorem 3.15), Var[M_n(X)] = Var[X_1 + · · · + X_n]/n². Since the X_i are iid, we can use Theorem 9.3 to show

Var[X_1 + · · · + X_n] = Var[X_1] + · · · + Var[X_n] = n Var[X].   (10.2)

Thus Var[M_n(X)] = n Var[X]/n² = Var[X]/n.
Recall that in Section 3.5, we refer to the expected value of a random variable as a typical value. Theorem 10.1 demonstrates that E[X] is a typical value of M_n(X), regardless of n. Furthermore, Theorem 10.1 demonstrates that as n increases without bound, the variance of M_n(X) goes to zero. When we first met the variance, and its square root the standard deviation, we said that they indicate how far a
random variable is likely to be from its expected value. Theorem 10.1 suggests that as n approaches infinity, it becomes highly likely that M_n(X) is arbitrarily close to its expected value, E[X]. In other words, the sample mean M_n(X) converges to the expected value E[X] as the number of samples n goes to infinity. The rest of this chapter contains the mathematical analysis that describes the nature of this convergence.

Quiz 10.1
X is the exponential (1) random variable; M_n(X) is the sample mean of n independent samples of X. How many samples n are needed to guarantee that the variance of the sample mean M_n(X) is no more than 0.01?
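Theorem 10.1 is easy to check by simulation. The following lines are a minimal MATLAB sketch, not taken from the text: the exponential (1) model matches Quiz 10.1, while the sample sizes n and the number of repetitions m are arbitrary choices. The empirical variance of M_n(X) should track Var[X]/n = 1/n.

% Sketch: empirical check of Theorem 10.1 for the exponential (1) random
% variable, for which E[X] = 1 and Var[X] = 1 (n and m are arbitrary).
m = 10000;                      % independent repetitions of the experiment
for n = [10 100 1000]
    X = -log(rand(n, m));       % n-by-m iid exponential (1) samples
    Mn = sum(X, 1) / n;         % one sample mean per column
    fprintf('n=%4d  mean(Mn)=%6.4f  var(Mn)=%8.6f  Var[X]/n=%8.6f\n', ...
            n, mean(Mn), var(Mn), 1/n);
end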
10.2
Deviation of a Random Variable from the Expected Value
The Chebyshev inequality is an upper bound on the probability P[|X − μ_X| > c]. We use the Chebyshev inequality to derive the Laws of Large Numbers and the parameter-estimation techniques that we study in the next two sections. The Chebyshev inequality is derived from the Markov inequality, a looser upper bound. The Chernoff bound is a more accurate inequality calculated from the complete probability model of X.

The analysis of the convergence of M_n(X) to E[X] begins with a study of the random variable |X − μ_X|, the absolute difference between a random variable X and its expected value. This study leads to the Chebyshev inequality, which states that the probability of a large deviation from the expected value is inversely proportional to the square of the deviation. The derivation of the Chebyshev inequality begins with the Markov inequality, an upper bound on the probability that a sample value of a nonnegative random variable exceeds the expected value by any arbitrary factor. The Laws of Large Numbers and techniques for parameter estimation, the subject of the next two sections, are a consequence of the Chebyshev inequality. The Chernoff bound is a third inequality used to estimate the probability that a random sample differs substantially from its expected value. The Chernoff bound is more accurate than the Chebyshev and Markov inequalities because it takes into account more information about the probability model of X.

To understand the relationship of the Markov inequality, the Chebyshev inequality, and the Chernoff bound, we consider the example of a storm surge following a hurricane. We assume that the probability model for the random height in feet of storm surges is X, the Gaussian (5.5, 1) random variable, and consider the event [X ≥ 11] feet. The probability of this event is very close to zero:

P[X ≥ 11] = Q(11 − 5.5) = 1.90 × 10⁻⁸.
Theorem 10.2    Markov Inequality
For a random variable X such that P[X < 0] = 0 and a constant c,

P[X ≥ c²] ≤ E[X]/c².

Proof Since X is nonnegative, f_X(x) = 0 for x < 0 and

E[X] = ∫₀^{c²} x f_X(x) dx + ∫_{c²}^∞ x f_X(x) dx ≥ ∫_{c²}^∞ x f_X(x) dx.   (10.3)

Since x ≥ c² in the remaining integral,

E[X] ≥ c² ∫_{c²}^∞ f_X(x) dx = c² P[X ≥ c²].   (10.4)
Keep in mind that the Markov inequality is valid only for nonnegative random variables. As we see in the next example, the bound provided by the Markov inequality can be very loose.
Example 10.1
Let X represent the height (in feet) of a storm surge following a hurricane. If the expected height is E[X] = 5.5, then the Markov inequality states that an upper bound on the probability of a storm surge at least 11 feet high is

P[X ≥ 11] ≤ 5.5/11 = 1/2.   (10.5)

We say the Markov inequality is a loose bound because the probability that a storm surge is higher than 11 feet is essentially zero, while the inequality merely states that it is less than or equal to 1/2. Although the bound is extremely loose for many random variables, it is tight (in fact, an equation) with respect to some random variables.
Example 10.2
Suppose random variable Y takes on the value c² with probability p and the value 0 otherwise. In this case, E[Y] = pc², and the Markov inequality states

P[Y ≥ c²] ≤ E[Y]/c² = p.   (10.6)

Since P[Y ≥ c²] = p, we observe that the Markov inequality is in fact an equality in this instance.
The Chebyshev inequality applies the Markov inequality to the nonnegative random variable (Y − μ_Y)², derived from any random variable Y.
Theorem 10.3    Chebyshev Inequality
For an arbitrary random variable Y and constant c > 0,

P[|Y − μ_Y| ≥ c] ≤ Var[Y]/c².

Proof In the Markov inequality, Theorem 10.2, let X = (Y − μ_Y)². The inequality states

P[X ≥ c²] = P[(Y − μ_Y)² ≥ c²] ≤ E[X]/c² = Var[Y]/c².   (10.7)

The theorem follows from the fact that {(Y − μ_Y)² ≥ c²} = {|Y − μ_Y| ≥ c}.
Unlike the Markov inequality, the Chebyshev inequality is valid for all random variables. While the Markov inequality refers only to the expected value of a random variable, the Chebyshev inequality also refers to the variance. Because it uses more information about the random variable, the Chebyshev inequality generally provides a tighter bound than the Markov inequality. In particular, when the variance of Y is very small, the Chebyshev inequality says it is unlikely that Y is far away from E[Y].
Example 10.3
If the height X of a storm surge following a hurricane has expected value E[X] = 5.5 feet and standard deviation σ_X = 1 foot, use the Chebyshev inequality to find an upper bound on P[X ≥ 11].

Since a height X is nonnegative, the probability that X ≥ 11 can be written as

P[X ≥ 11] = P[X − μ_X ≥ 11 − μ_X] = P[|X − μ_X| ≥ 5.5].   (10.8)

Now we use the Chebyshev inequality to obtain

P[X ≥ 11] = P[|X − μ_X| ≥ 5.5] ≤ Var[X]/(5.5)² = 0.033 ≈ 1/30.   (10.9)

Although this bound is better than the Markov bound, it is also loose. P[X ≥ 11] is seven orders of magnitude lower than 1/30.
The Chernoff bound is an inequality derived from the moment generating function in Definition 9.1.
Theorem 10.4    Chernoff Bound
For an arbitrary random variable X and a constant c,

P[X ≥ c] ≤ min_{s≥0} e^{−sc} φ_X(s).
Proof In terms of the unit step function u(x), we observe that

P[X ≥ c] = ∫_c^∞ f_X(x) dx = ∫_{−∞}^∞ u(x − c) f_X(x) dx.   (10.10)

For all s ≥ 0, u(x − c) ≤ e^{s(x−c)}. This implies

P[X ≥ c] ≤ ∫_{−∞}^∞ e^{s(x−c)} f_X(x) dx = e^{−sc} ∫_{−∞}^∞ e^{sx} f_X(x) dx = e^{−sc} φ_X(s).   (10.11)

This inequality is true for any s ≥ 0. Hence the upper bound must hold when we choose s to minimize e^{−sc} φ_X(s).
The Chernoff bound can be applied to any random variable. However, for small values of c, e^{−sc} φ_X(s) will be minimized by a negative value of s. In this case, the minimizing nonnegative s is s = 0, and the Chernoff bound gives the trivial answer P[X ≥ c] ≤ 1.

Example 10.4
If the probability model of the height X, measured in feet, of a storm surge following a hurricane at a certain location is the Gaussian (5.5, 1) random variable, use the Chernoff bound to find an upper bound on P[X ≥ 11].

In Table 9.1 the MGF of X is

φ_X(s) = e^{(11s + s²)/2}.   (10.12)

Thus the Chernoff bound is

P[X ≥ 11] ≤ min_{s≥0} e^{−11s} e^{(11s + s²)/2} = min_{s≥0} e^{(s² − 11s)/2}.   (10.13)

To find the minimizing s, it is sufficient to choose s to minimize h(s) = s² − 11s. Setting the derivative dh(s)/ds = 2s − 11 = 0 yields s = 5.5. Applying s = 5.5 to the bound yields

P[X ≥ 11] ≤ e^{(s² − 11s)/2} |_{s=5.5} = e^{−(5.5)²/2} = 2.7 × 10⁻⁷.   (10.14)
Even though the Chernoff bound is 14 times higher than the actual probability, 1 − Φ(5.5) = 1.90 × 10⁻⁸, it still conveys the information that a storm surge higher than 11 feet is extremely unlikely. By contrast, the Markov and Chebyshev inequalities provide bounds that suggest that an 11-foot storm surge occurs relatively frequently. The information needed to calculate the three inequalities accounts for the differences in their accuracy. The Markov inequality uses only the expected value, the Chebyshev inequality uses the expected value and the variance, while the much more accurate Chernoff bound is based on knowledge of the complete probability model (expressed as φ_X(s)).
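A minimal MATLAB sketch of this comparison, assuming only MATLAB's built-in erfc for the exact value of Q(z) = (1/2)erfc(z/√2):

% Sketch: the three bounds of this section and the exact P[X >= 11] for
% the Gaussian (5.5, 1) storm surge model of Examples 10.1, 10.3, 10.4.
mu = 5.5; sigma = 1; c = 11;
markov    = mu / c;                                  % Theorem 10.2 with c^2 = 11
chebyshev = sigma^2 / (c - mu)^2;                    % Theorem 10.3 with c = 5.5
s = (c - mu) / sigma^2;                              % minimizing s = 5.5 from Example 10.4
chernoff  = exp(mu*s + sigma^2*s^2/2 - s*c);         % e^{-sc} * phi_X(s)
exact     = 0.5 * erfc((c - mu) / (sigma*sqrt(2)));  % Q(5.5)
[markov chebyshev chernoff exact]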
Quiz 10.2
In a subway station, there are exactly enough customers on the platform to fill three trains. The arrival time of the nth train is X_1 + · · · + X_n, where X_1, X_2, ... are iid exponential random variables with E[X_i] = 2 minutes. Let W equal the time required to serve the waiting customers. For P[W > 20], the probability that W is over twenty minutes,
(a) Use the central limit theorem to find an estimate.
(b) Use the Markov inequality to find an upper bound.
(c) Use the Chebyshev inequality to find an upper bound.
(d) Use the Chernoff bound to find an upper bound.
(e) Use Theorem 4.11 for an exact calculation.
10.3
Laws of Large Numbers

The sample mean M_n(X) converges to E[X] and the relative frequency of event A converges to P[A] as n, the number of independent trials of an experiment, increases without bound.
When we apply the Chebyshev inequality to Y = M_n(X), we obtain useful insights into the properties of independent samples of a random variable.
Theorem 10.5    Weak Law of Large Numbers (Finite Samples)
For any constant c > 0,

(a) P[|M_n(X) − μ_X| ≥ c] ≤ Var[X]/(nc²),
(b) P[|M_n(X) − μ_X| < c] ≥ 1 − Var[X]/(nc²).

Proof Let Y = M_n(X). Theorem 10.1 states that

E[Y] = E[M_n(X)] = μ_X,      Var[Y] = Var[M_n(X)] = Var[X]/n.   (10.15)

Theorem 10.5(a) follows by applying the Chebyshev inequality (Theorem 10.3) to Y = M_n(X). Theorem 10.5(b) is just a restatement of Theorem 10.5(a), since

P[|M_n(X) − μ_X| ≥ c] = 1 − P[|M_n(X) − μ_X| < c].   (10.16)
In words, Theorem 10.5(a) says that the probability that the sample mean is more than c units from E[X] can be made arbitrarily small by letting the number of samples n become large. By taking the limit as n → ∞, we obtain the infinite limit result in the next theorem.
Theorem 10.6    Weak Law of Large Numbers (Infinite Samples)
If X has finite variance, then for any constant c > 0,

(a) lim_{n→∞} P[|M_n(X) − μ_X| ≥ c] = 0,
(b) lim_{n→∞} P[|M_n(X) − μ_X| < c] = 1.
In parallel to Theorem 10.5, Theorems 10.6(a) and 10.6(b) are equivalent statements because

P[|M_n(X) − μ_X| ≥ c] = 1 − P[|M_n(X) − μ_X| < c].   (10.17)

In words, Theorem 10.6(b) says that the probability that the sample mean is within ±c units of E[X] goes to one as the number of samples approaches infinity. Since c can be arbitrarily small (e.g., 10⁻²⁰⁰⁰), both Theorem 10.5(a) and Theorem 10.6(b) can be interpreted as saying that the sample mean converges to E[X] as the number of samples increases without bound. The weak law of large numbers is a very general result because it holds for all random variables X with finite variance. Moreover, we do not need to know any of the parameters, such as the expected value or variance, of random variable X. The adjective weak in the weak law of large numbers suggests that there is also a strong law. They differ in the nature of the convergence of M_n(X) to μ_X. The convergence in Theorem 10.6 is an example of convergence in probability.

Definition 10.2    Convergence in Probability
The random sequence Y_n converges in probability to a constant y if, for any ε > 0,

lim_{n→∞} P[|Y_n − y| > ε] = 0.

The weak law of large numbers (Theorem 10.6) is an example of convergence in probability in which Y_n = M_n(X), y = E[X], and ε = c. The strong law of large numbers states that with probability 1, the sequence M_1, M_2, ... has the limit μ_X. Mathematicians use the terms convergence almost surely, convergence almost always, and convergence almost everywhere as synonyms for convergence with probability 1. The difference between the strong law and the weak law of large numbers is subtle and rarely arises in practical applications of probability theory. As we will see in the next theorem, the weak law of large numbers validates the relative frequency interpretation of probabilities. Consider an arbitrary event A from an experiment. To examine P[A], we define the indicator random variable

X_A = { 1  if event A occurs,   0  otherwise.   (10.18)

Since X_A is a Bernoulli random variable with success probability P[A], E[X_A] = P[A]. Since general properties of the expected value of a random variable apply to
E[X_A], we can apply the law of large numbers to samples of the indicator X_A:

P̂_n(A) = M_n(X_A) = (X_{A,1} + · · · + X_{A,n})/n.   (10.19)

Since X_{A,i} just counts whether event A occurred on trial i, P̂_n(A) is the relative frequency of event A in n trials. Since P̂_n(A) is the sample mean of X_A, we will see that the properties of the sample mean explain the mathematical connection between relative frequencies and probabilities.
Theorem 10.7
As n → ∞, the relative frequency P̂_n(A) converges to P[A]; for any constant c > 0,

lim_{n→∞} P[|P̂_n(A) − P[A]| ≥ c] = 0.
Proof The proof follows from Theorem 10.6 since P̂_n(A) = M_n(X_A) is the sample mean of the indicator X_A, which has expected value E[X_A] = P[A] and variance Var[X_A] = P[A](1 − P[A]).
Theorem 10.7 is a mathematical version of the statement that as the number of observations grows without limit, the relative frequency of any event approaches the probability of the event.

Quiz 10.3
X_1, ..., X_n are n iid samples of the Bernoulli (p = 0.8) random variable X.
(a) Find E[X] and Var[X].
(b) What is Var[M_100(X)]?
(c) Use Theorem 10.5 to find α such that P[|M_100(X) − p| ≥ 0.05] ≤ α.
(d) How many samples n are needed to guarantee P[|M_n(X) − p| ≥ 0.1] ≤ 0.05?

10.4
Point Estimates of Model Parameters

R̂, an estimate of a parameter r of a probability model, is unbiased if E[R̂] = r. A sequence of estimates R̂_1, R̂_2, ... is consistent if lim_{n→∞} R̂_n = r. The sample mean is an unbiased estimator of μ_X. The sequence of sample means is consistent. The sample variance is a biased estimator of Var[X].

In the remainder of this chapter, we consider experiments performed in order to obtain information about a probability model. To do so, investigators usually
derive probability models from practical measurements. Later, they use the models in ways described throughout this book. How to obtain a model in the first place is a major subject in statistical inference. In this section we briefly introduce the subject by studying estimates of the expected value and the variance of a random variable. The general problem is estimation of a parameter of a probability model. A parameter is any number that can be calculated from the probability model. For example, for an arbitrary event A, P[A] is a model parameter. The techniques we study in this chapter rely on the properties of the sample mean M_n(X). Depending on the definition of the random variable X, we can use the sample mean to describe any parameter of a probability model. We consider two types of estimates: a point estimate is a number that is as close as possible to the parameter to be estimated, while a confidence interval estimate is a range of numbers that contains the parameter to be estimated with high probability.
Properties of Point Estimates
Before presenting estimation methods based on the sample mean, we introduce three properties of point estimates: bias, consistency, and accuracy. We will see that the sample mean is an unbiased, consistent estimator of the expected value of a random variable. By contrast, we will find that the sample variance is a biased estimate of the variance of a random variable. One measure of the accuracy of an estimate is the mean square error, the expected squared difference between an estimate and the estimated parameter.

Consider an experiment that produces observations of sample values of the random variable X. We perform an indefinite number of independent trials of the experiment. The observations are sample values of the random variables X_1, X_2, ..., all with the same probability model as X. Assume that r is a parameter of the probability model. We use the observations X_1, X_2, ... to produce a sequence of estimates of r. The estimates R̂_1, R̂_2, ... are all random variables. R̂_1 is a function of X_1, R̂_2 is a function of X_1 and X_2, and in general R̂_n is a function of X_1, X_2, ..., X_n. When the sequence of estimates R̂_1, R̂_2, ... converges in probability to r, we say the estimator is consistent.
Definition 10.3    Consistent Estimator
The sequence of estimates R̂_1, R̂_2, ... of parameter r is consistent if, for any ε > 0,

lim_{n→∞} P[|R̂_n − r| ≥ ε] = 0.
Another property of an estimate, R̂, is bias. Remember that R̂ is a random variable. Of course, we would like R̂ to be close to the true parameter value r with high probability. In repeated experiments, however, sometimes R̂ < r and other times R̂ > r. Although R̂ is random, it would be undesirable if R̂ were either typically less than r or typically greater than r. To be precise, we would like R̂ to be unbiased.
Definition 10.4    Unbiased Estimator
An estimate, R̂, of parameter r is unbiased if E[R̂] = r; otherwise, R̂ is biased.
Unlike consistency, which is a property of a sequence of estimators, bias (or lack of bias) is a property of a single estimator R̂. The concept of asymptotic bias applies to a sequence of estimators R̂_1, R̂_2, ... such that each R̂_n is biased, with the bias diminishing toward zero for large n. This type of sequence is asymptotically unbiased.

Definition 10.5    Asymptotically Unbiased Estimator
The sequence of estimators R̂_n of parameter r is asymptotically unbiased if

lim_{n→∞} E[R̂_n] = r.

The mean square error is an important measure of the accuracy of a point estimate. We first encountered the mean square error in Section 3.8; however, in that chapter, we were estimating the value of a random variable. That is, we were guessing a deterministic number as a prediction of a random variable that we had yet to observe. Here we use the same mean square error metric, but we are using a random variable derived from experimental trials to estimate a deterministic but unknown parameter.

Definition 10.6    Mean Square Error
The mean square error of estimator R̂ of parameter r is

e = E[(R̂ − r)²].

Note that when R̂ is an unbiased estimate of r and E[R̂] = r, the mean square error is the variance of R̂. For a sequence of unbiased estimates, it is enough to show that the mean square error goes to zero to prove that the estimator is consistent.
Theorem 10.8
~
Proof Since E [Rn] = r, vv-e apply t he C hebyshev inequality to
P
Rn. For a ny constant E > 0,
Rn -r > EJ < \ far[2Rn] .
(10.20)
A
[
E
In t he limit of la rge n, we h ave li1n P
n-=
[Rn - r > EJ <
lim \!ar[f nJ
n- =
E
= 0.
(10.21)
Example 10.5
In any interval of k seconds, the number N_k of packets passing through an Internet router is a Poisson random variable with expected value E[N_k] = kr packets. Let R̂_k = N_k/k denote an estimate of the parameter r packets/second. Is each estimate R̂_k an unbiased estimate of r? What is the mean square error e_k of the estimate R̂_k? Is the sequence of estimates R̂_1, R̂_2, ... consistent?

First, we observe that R̂_k is an unbiased estimator since

E[R̂_k] = E[N_k/k] = E[N_k]/k = r.   (10.22)

Next, we recall that since N_k is Poisson, Var[N_k] = kr. This implies

Var[R̂_k] = Var[N_k]/k² = r/k.   (10.23)

Because R̂_k is unbiased, the mean square error of the estimate is the same as its variance: e_k = r/k. In addition, since lim_{k→∞} Var[R̂_k] = 0, the sequence of estimators R̂_k is consistent by Theorem 10.8.
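A minimal MATLAB sketch of the estimator in Example 10.5, assuming an arbitrary rate r = 10 packets/second and arbitrary observation lengths k; N_k is generated by counting exponential (r) interarrival times that fit in k seconds.

% Sketch: simulating Rk = Nk/k for a Poisson arrival process of rate r.
r = 10;
for k = [1 10 100 1000]
    t = 0; Nk = 0;
    while true
        t = t - log(rand)/r;         % add one exponential (r) interarrival time
        if t > k, break; end
        Nk = Nk + 1;
    end
    fprintf('k=%5d   Rk = Nk/k = %.3f\n', k, Nk/k);
end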
Point Estimates of the Expected Value

To estimate r = E[X], we use R̂_n = M_n(X), the sample mean. Since Theorem 10.1 tells us that E[M_n(X)] = E[X], the sample mean is unbiased.

Theorem 10.9
The sample mean M_n(X) is an unbiased estimate of E[X].
Because the sample mean is unbiased, the mean square difference between M_n(X) and E[X] is Var[M_n(X)], given in Theorem 10.1:

Theorem 10.10
The sample mean estimator M_n(X) has mean square error

e_n = E[(M_n(X) − E[X])²] = Var[M_n(X)] = Var[X]/n.
In the terminology of statistical inference, √e_n, the standard deviation of the sample mean, is referred to as the standard error of the estimate. The standard error gives an indication of how far we should expect the sample mean to deviate from the expected value. In particular, when X is a Gaussian random variable (and M_n(X) is also Gaussian), Problem 10.4.1 asks you to show that

P[E[X] − √e_n ≤ M_n(X) ≤ E[X] + √e_n] = 2Φ(1) − 1 ≈ 0.68.   (10.24)
In words, Equation (10.24) says there is roughly a two-thirds probability that the sample mean is within one standard error of the expected value. This same conclusion is approximately true when n is large and the central limit theorem says that M_n(X) is approximately Gaussian.
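A minimal MATLAB sketch of Equation (10.24), assuming the Gaussian (0, 1) model and n = 25 as arbitrary choices:

% Sketch: fraction of sample means within one standard error of E[X];
% it should be close to 2*Phi(1) - 1 = 0.6827 for Gaussian samples.
n = 25; m = 100000;
Mn = mean(randn(n, m), 1);          % m sample means of n Gaussian (0,1) samples
se = 1/sqrt(n);                     % standard error sigma_X/sqrt(n)
fprintf('P[|Mn - E[X]| <= se] ~ %.4f\n', mean(abs(Mn) <= se));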
Example 10.6
How many independent trials n are needed to guarantee that P̂_n(A), the relative frequency estimate of P[A], has standard error ≤ 0.1?

Since the indicator X_A has variance Var[X_A] = P[A](1 − P[A]), Theorem 10.10 implies that the mean square error of M_n(X_A) is

e_n = Var[X]/n = P[A](1 − P[A])/n.   (10.25)

We need to choose n large enough to guarantee √e_n ≤ 0.1 (e_n ≤ 0.01) even though we don't know P[A]. We use the fact that p(1 − p) ≤ 0.25 for all 0 ≤ p ≤ 1. Thus, e_n ≤ 0.25/n. To guarantee e_n ≤ 0.01, we choose n = 0.25/0.01 = 25 trials.
Theorem 10.10 demonstrates that the standard error of the estimate of E[X] converges to zero as n grows without bound. The following theorem states that this implies that the sequence of sample means is a consistent estimator of E[X].
Theorem 10.11
If X has finite variance, then the sample mean M_n(X) is a sequence of consistent estimates of E[X].

Proof By Theorem 10.10, the mean square error of M_n(X) satisfies

lim_{n→∞} Var[M_n(X)] = lim_{n→∞} Var[X]/n = 0.   (10.26)

By Theorem 10.8, the sequence M_n(X) is consistent.
Theorem 10.11 is a restatement of the weak law of large numbers (Theorem 10.6) in the language of parameter estimation.
Point Estimates of the Variance

When the unknown parameter is r = Var[X], we have two cases to consider. Because Var[X] = E[(X − μ_X)²] depends on the expected value, we consider separately the situation when E[X] is known and when E[X] is an unknown parameter estimated by M_n(X). Suppose we know that E[X] = 0. In this case, Var[X] = E[X²] and estimation of the variance is straightforward. If we define Y = X², we can view the estimation of E[X²] from the samples X_i as the estimation of E[Y] from the samples Y_i = X_i². That is, the sample mean of Y can be written as

M_n(Y) = (1/n)(X_1² + · · · + X_n²).   (10.27)
Assuming that Var[Y] exists, the weak law of large numbers implies that M_n(Y) is a consistent, unbiased estimator of E[X²] = Var[X]. When E[X] is a known quantity μ_X, we know Var[X] = E[(X − μ_X)²]. In this case, we can use the sample mean of W = (X − μ_X)² to estimate Var[X]:

M_n(W) = (1/n) Σ_{i=1}^n (X_i − μ_X)².   (10.28)

If Var[W] exists, M_n(W) is a consistent, unbiased estimate of Var[X]. When the expected value μ_X is unknown, the situation is more complicated because the variance of X depends on μ_X. We cannot use Equation (10.28) if μ_X is unknown. In this case, we replace the expected value μ_X by the sample mean M_n(X).
Definition 10.7    Sample Variance
The sample variance of n independent observations of random variable X is

V_n(X) = (1/n) Σ_{i=1}^n (X_i − M_n(X))².
In contrast to the sample mean, the sample variance is a biased estimate of Var[X].
Theorem 10.12

E[V_n(X)] = ((n − 1)/n) Var[X].
Proof Substituting Definition 10.1 of the sample mean M_n(X) into Definition 10.7 of sample variance and expanding the sums, we derive

V_n(X) = (1/n) Σ_{i=1}^n X_i² − (1/n²) Σ_{i=1}^n Σ_{j=1}^n X_i X_j.   (10.29)

Because the X_i are iid, E[X_i²] = E[X²] for all i, and E[X_i] E[X_j] = μ_X². By Theorem 5.16(a), E[X_i X_j] = Cov[X_i, X_j] + E[X_i] E[X_j]. Thus, E[X_i X_j] = Cov[X_i, X_j] + μ_X². Combining these facts, the expected value of V_n in Equation (10.29) is

E[V_n(X)] = Var[X] − (1/n²) Σ_{i=1}^n Σ_{j=1}^n Cov[X_i, X_j].   (10.30)

Since the double sum has n² terms, Σ_{i=1}^n Σ_{j=1}^n μ_X² = n² μ_X². Of the n² covariance terms, there are n terms of the form Cov[X_i, X_i] = Var[X], while the remaining covariance terms are all 0 because X_i and X_j are independent for i ≠ j. This implies

E[V_n(X)] = Var[X] − (1/n²)(n Var[X]) = ((n − 1)/n) Var[X].   (10.31)
However, by Definition 10.5, V_n(X) is asymptotically unbiased because

lim_{n→∞} E[V_n(X)] = lim_{n→∞} ((n − 1)/n) Var[X] = Var[X].   (10.32)
Although V_n(X) is a biased estimate, Theorem 10.12 suggests the derivation of an unbiased estimate.
Theorem 10.13
The estimate

V'_n(X) = (1/(n − 1)) Σ_{i=1}^n (X_i − M_n(X))²

is an unbiased estimate of Var[X].

Proof Using Definition 10.7, we have

V'_n(X) = (n/(n − 1)) V_n(X),   (10.33)

and

E[V'_n(X)] = (n/(n − 1)) E[V_n(X)] = Var[X].   (10.34)
Comparing the two estimates of Var[X], we observe that as n grows without limit, the two estimates converge to the same value. However, for n = 1, M_1(X) = X_1 and V_1(X) = 0. By contrast, V'_1(X) is undefined. Because the variance is a measure of the spread of a probability model, it is impossible to obtain an estimate of the spread from only one observation. Thus the estimate V_1(X) = 0 is completely illogical. On the other hand, the unbiased estimate of variance based on two observations can be written as V'_2 = (X_1 − X_2)²/2, which clearly reflects the spread (mean square difference) of the observations. To go further and evaluate the consistency of the sequence V'_2(X), V'_3(X), ... is a surprisingly difficult problem. It is explored in Problem 10.4.5.
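The bias of V_n(X) and the lack of bias of V'_n(X) are easy to observe numerically. The following is a minimal MATLAB sketch, assuming an arbitrary Gaussian (0, 1) model (so Var[X] = 1), n = 10, and m repetitions; note that MATLAB's own var function already divides by n − 1, i.e., it computes V'_n(X).

% Sketch: biased estimate Vn (Definition 10.7) versus unbiased Vn'
% (Theorem 10.13), averaged over m independent experiments.
n = 10; m = 100000;
X = randn(n, m);
Mn = mean(X, 1);
Vn      = sum((X - ones(n,1)*Mn).^2, 1) / n;        % E[Vn]  = (n-1)/n * Var[X]
Vnprime = sum((X - ones(n,1)*Mn).^2, 1) / (n - 1);  % E[Vn'] = Var[X]
fprintf('E[Vn] ~ %.4f   (n-1)/n = %.4f\n', mean(Vn), (n-1)/n);
fprintf('E[Vn''] ~ %.4f\n', mean(Vnprime));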
Quiz 10.4
X is the continuous uniform (−1, 1) random variable. Find the mean square error, E[(Var[X] − V_100(X))²], of the sample variance estimate of Var[X], based on 100 independent observations of X.
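For a numerical check of Quiz 10.4, the following minimal MATLAB sketch estimates the requested mean square error by simulation rather than analytically; the number of repetitions m is an arbitrary choice, and Var[X] = 1/3 for the uniform (−1, 1) model.

% Sketch: simulation estimate of E[(Var[X] - V100(X))^2] for uniform (-1,1).
n = 100; m = 100000;
X = 2*rand(n, m) - 1;                          % n-by-m uniform (-1,1) samples
Mn = mean(X, 1);
V100 = sum((X - ones(n,1)*Mn).^2, 1) / n;      % sample variance (Definition 10.7)
fprintf('estimated MSE ~ %.3e\n', mean((1/3 - V100).^2));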
10 .5
Confidence Intervals
A confidence interval estimate of a parameter of a probability model, such as the expected value or the probability of an event, consists of a range of numbers and the probability that the parameter is within that range.

Theorem 10.1 suggests that as the number of independent samples of a random variable increases, the sample mean gets closer and closer to the expected value. Similarly, a law of large numbers such as Theorem 10.6 refers to a limit as the number of observations grows without bound. In practice, however, we observe a finite set of measurements. In this section, we develop techniques to assess the accuracy of estimates based on a finite collection of observations. We introduce two closely related quantities: the confidence interval, related to the difference between a random variable and its expected value, and the confidence coefficient, related to the probability that a sample value of the random variable will be within the confidence interval. The basic mathematics of confidence intervals comes from Theorem 10.5(b), restated here with α = Var[X]/(nc²):

P[|M_n(X) − μ_X| < c] ≥ 1 − Var[X]/(nc²) = 1 − α.   (10.35)
Equation (10.35) contains two inequalities. One inequality,

|M_n(X) − μ_X| < c,   (10.36)

defines an event. This event states that the sample mean is within ±c units of the expected value. The length of the interval that defines this event, 2c units, is referred to as a confidence interval. The other inequality states that the probability that the sample mean is in the confidence interval is at least 1 − α. We refer to the quantity 1 − α as the confidence coefficient. If α is small, we are highly confident that M_n(X) is in the interval (μ_X − c, μ_X + c). In Equation (10.35) we observe that for any positive number c, no matter how small, we can make α as small as we like by choosing n large enough. In a practical application, c indicates the desired accuracy of an estimate of μ_X, α indicates our confidence that we have achieved this accuracy, and n tells us how many samples we need to achieve the desired α. Alternatively, given Var[X], n, and α, Equation (10.35) tells us the size c of the confidence interval.

Example 10.7
Suppose we perform n independent trials of an experiment and we use the relative frequency P̂_n(A) to estimate P[A]. Find the smallest n such that P̂_n(A) is in a confidence interval of length 0.02 with confidence 0.999.

Recall that P̂_n(A) is the sample mean of the indicator random variable X_A. Since X_A is Bernoulli with success probability P[A], E[X_A] = P[A] and Var[X_A] = P[A](1 − P[A]). Since E[P̂_n(A)] = P[A], Theorem 10.5(b) says

P[|P̂_n(A) − P[A]| < c] ≥ 1 − P[A](1 − P[A])/(nc²).   (10.37)
In Example 10.6, we observed that p(1 − p) ≤ 0.25 for 0 ≤ p ≤ 1. Thus P[A](1 − P[A]) ≤ 1/4 for any value of P[A] and

P[|P̂_n(A) − P[A]| < c] ≥ 1 − 1/(4nc²).   (10.38)

For a confidence interval of length 0.02, we choose c = 0.01. We are guaranteed to meet our constraint if

1 − 1/(4n(0.01)²) ≥ 0.999.   (10.39)

Thus we need n ≥ 2.5 × 10⁶ trials.
In the next example, we see that if we need a good estimate of the probability of a rare event A, then the number of trials will be large. For example, if event A has probability P[A] = 10⁻⁴, then estimating P[A] within ±0.01 is meaningless. Accurate estimates of rare events require significantly more trials.
Example 10.8
Suppose we perform n independent trials of an experiment. For an event A of the experiment, calculate the number of trials needed to guarantee that the probability the relative frequency of A differs from P[A] by more than 10% is less than 0.001.

In Example 10.7, we were asked to guarantee that the relative frequency P̂_n(A) was within c = 0.01 of P[A]. This problem is different only in that we require P̂_n(A) to be within 10% of P[A]. As in Example 10.7, we can apply Theorem 10.5(a) and write

P[|P̂_n(A) − P[A]| ≥ c] ≤ P[A](1 − P[A])/(nc²).   (10.40)

We can ensure that P̂_n(A) is within 10% of P[A] by choosing c = 0.1 P[A]. This yields

P[|P̂_n(A) − P[A]| ≥ 0.1 P[A]] ≤ (1 − P[A])/(n(0.1)² P[A]) ≤ 100/(n P[A]),   (10.41)

since P[A] ≤ 1. Thus the number of trials required for the relative frequency to be within a certain percentage of the true probability is inversely proportional to that probability.
In the following example, we obtain an estimate and a confidence interval, but we must determine the confidence coefficient associated with the estimate and the confidence interval.
Example 10.9
Theorem 10.5(b) gives rise to statements we hear in the news, such as, "Based on a sample of 1103 potential voters, the percentage of people supporting Candidate Jones is 58% with an accuracy of plus or minus 3 percentage points."
The experiment is to observe a voter at random and determine whether the voter supports Candidate Jones. We assign the value X = 1 if the voter supports Candidate Jones and X = 0 otherwise. The probability that a random voter supports Jones is E[X] = p. In this case, the data provides an estimate M_n(X) = 0.58 of p. What is the confidence coefficient 1 − α corresponding to this statement?

Since X is a Bernoulli (p) random variable, E[X] = p and Var[X] = p(1 − p). For c = 0.03, Theorem 10.5(b) says

P[|M_n(X) − p| < 0.03] ≥ 1 − p(1 − p)/(n(0.03)²) = 1 − α.    (10.42)

We see that

α = p(1 − p)/(n(0.03)²).    (10.43)
Keep in mind that we have great confidence in our result when α is small. However, since we don't know the actual value of p, we would like to have confidence in our results regardless of the actual value of p. Because Var[X] = p(1 − p) ≤ 0.25, we conclude that

α ≤ 0.25/(n(0.03)²) = 277.778/n.    (10.44)

Thus for n = 1103 samples, α ≤ 0.25, or in terms of the confidence coefficient, 1 − α ≥ 0.75. This says that our estimate of p is within 3 percentage points of p with a probability of at least 1 − α = 0.75.
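As a quick numerical check (a sketch, not part of the text's program library), the bound in Equation (10.44) can be evaluated in MATLAB for the poll in this example:

n=1103; c=0.03;
alpha_bound=0.25/(n*c^2)     % about 0.2518
confidence=1-alpha_bound     % at least 0.748, consistent with 1 - alpha >= 0.75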
Interval Estimates of Model Parameters
In Theorem 10.5 and Examples 10.7 and 10.8, the sample mean M_n(X) is a point estimate of the model parameter E[X]. We have examined how to guarantee that the sample mean is in a confidence interval of size 2c with a confidence coefficient of 1 − α. In this case, the point estimate M_n(X) is a random variable and the confidence interval is a deterministic interval.

In confidence interval estimation, we turn the confidence interval inside out. A confidence interval estimate of a parameter consists of a range of values and a probability that the parameter is in the stated range. If the parameter of interest is r, the estimate consists of random variables A and B, and a number α, with the property

P[A ≤ r ≤ B] ≥ 1 − α.    (10.45)

In this context, B − A is called the confidence interval and 1 − α is the confidence coefficient. Since A and B are random variables, the confidence interval is random. The confidence coefficient is now the probability that the deterministic model parameter r is in the random confidence interval. An accurate estimate is reflected in a low value of B − A and a high value of 1 − α.
In most practical applications of confidence-interval estimation, the unknown parameter r is the expected value E[X] of a random variable X and the confidence interval is derived from the sample mean, M_n(X), of data collected in n independent trials. In this context, Equation (10.35) can be rearranged to say that for any constant c > 0,

P[M_n(X) − c < E[X] < M_n(X) + c] ≥ 1 − Var[X]/(nc²).    (10.46)

In comparing Equations (10.45) and (10.46), we see that

A = M_n(X) − c,    B = M_n(X) + c,    (10.47)

and the confidence interval is the random interval [M_n(X) − c, M_n(X) + c]. Just as in Theorem 10.5, the confidence coefficient is still 1 − α, where α = Var[X]/(nc²).

Equation (10.46) indicates that every confidence interval estimate is a compromise between the goals of achieving a narrow confidence interval and a high confidence coefficient. Given any set of data, it is always possible simultaneously to increase both the confidence coefficient and the size of the confidence interval or to decrease them. It is also possible to collect more data (increase n in Equation (10.46)) and improve both accuracy measures. The number of trials necessary to achieve specified quality levels depends on prior knowledge of the probability model. In the following example, the prior knowledge consists of the expected value and standard deviation of the measurement error.
Example 10.10
Suppose X_i is the ith independent measurement of the length (in cm) of a board whose actual length is b cm. Each measurement X_i has the form

X_i = b + Z_i,    (10.48)

where the measurement error Z_i is a random variable with expected value zero and standard deviation σ_Z = 1 cm. Since each measurement is fairly inaccurate, we would like to use M_n(X) to get an accurate confidence interval estimate of the exact board length. How many measurements are needed for a confidence interval estimate of b of length 2c = 0.2 cm to have confidence coefficient 1 − α = 0.99?

Since E[X_i] = b and Var[X_i] = Var[Z] = 1, Equation (10.46) states

P[M_n(X) − 0.1 < b < M_n(X) + 0.1] ≥ 1 − 1/(n(0.1)²) = 1 − 100/n.    (10.49)
Therefore, P[M_n(X) − 0.1 < b < M_n(X) + 0.1] ≥ 0.99 if 100/n ≤ 0.01. This implies we need to make n ≥ 10,000 measurements. We note that it is quite possible that P[M_n(X) − 0.1 < b < M_n(X) + 0.1] is much greater than 0.99. However, without knowing more about the probability model of the random errors Z_i, we need 10,000 measurements to achieve the desired confidence.
It is often assumed that the sample mean M_n(X) is a Gaussian random variable, either because each trial produces a sample of a Gaussian random variable or because there is enough data to justify a central limit theorem approximation. In the simplest applications, the variance σ_X² of each data sample is known and the estimate is symmetric about the sample mean: A = M_n(X) − c and B = M_n(X) + c. This implies the following relationship among c, α, and n, the number of trials used to obtain the sample mean.
Theorem 10.14
Let X be a Gaussian (μ, σ) random variable. A confidence interval estimate of μ of the form

M_n(X) − c ≤ μ ≤ M_n(X) + c

has confidence coefficient 1 − α, where

α/2 = Q(c√n/σ) = 1 − Φ(c√n/σ).

Proof We observe that

P[M_n(X) − c ≤ μ_X ≤ M_n(X) + c] = P[μ_X − c ≤ M_n(X) ≤ μ_X + c] = P[−c ≤ M_n(X) − μ_X ≤ c].    (10.50)

Since M_n(X) − μ_X is the Gaussian (0, σ/√n) random variable,

P[M_n(X) − c ≤ μ_X ≤ M_n(X) + c] = Φ(c√n/σ) − Φ(−c√n/σ) = 1 − 2Q(c√n/σ).    (10.51)

Thus 1 − α = 1 − 2Q(c√n/σ).
Theorem 10.14 holds whenever M_n(X) is a Gaussian random variable. As stated in the theorem, this occurs whenever X is Gaussian. However, it is also a reasonable approximation when n is large enough to use the central limit theorem.

Example 10.11
In Example 10.10, suppose we know that the measurement errors Z_i are iid Gaussian random variables. How many measurements are needed to guarantee that our confidence interval estimate of length 2c = 0.2 has confidence coefficient 1 − α ≥ 0.99?

As in Example 10.10, we form the interval estimate

M_n(X) − 0.1 < b < M_n(X) + 0.1.    (10.52)

The problem statement requires this interval estimate to have confidence coefficient 1 − α ≥ 0.99, implying α ≤ 0.01. Since each measurement X_i is a Gaussian (b, 1) random variable, Theorem 10.14 says that α = 2Q(0.1√n) ≤ 0.01, or equivalently,

Q(√n/10) = 1 − Φ(√n/10) ≤ 0.005.    (10.53)

In Table 4.2, we observe that Φ(x) ≥ 0.995 when x ≥ 2.58. Therefore, our confidence coefficient condition is satisfied when √n/10 ≥ 2.58, or n ≥ 666.
In Example 10.10, with limited knowledge (only the expected value and variance) of the probability model of measurement errors, we find that 10,000 measurements are needed to guarantee an accuracy condition. When we learn the entire probability model (Example 10.11), we find that only 666 measurements are necessary.
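The two sample-size calculations can be compared side by side. The sketch below is an illustration only; it builds the inverse of the Q function from MATLAB's erfcinv rather than using any function from the text's program library, and the small difference from 666 comes from using the exact quantile 2.5758 instead of the table value 2.58.

Qinv=@(p) sqrt(2)*erfcinv(2*p);             % inverse of the standard normal tail function Q
c=0.1; alpha=0.01; sigma=1;                 % interval half-length, 1-confidence, error std dev
n_chebyshev=ceil(sigma^2/(alpha*c^2))       % 10,000 measurements (Equation (10.46))
n_gaussian=ceil((Qinv(alpha/2)*sigma/c)^2)  % about 664 measurements (Theorem 10.14)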
Example 10.12
Y is a Gaussian random variable with unknown expected value μ but known variance σ_Y². Use M_n(Y) to find a confidence interval estimate of μ_Y with confidence 0.99. If σ_Y² = 10 and M_100(Y) = 33.2, what is our interval estimate of μ formed from 100 independent samples?

With 1 − α = 0.99, Theorem 10.14 states that

P[M_n(Y) − c ≤ μ ≤ M_n(Y) + c] = 1 − α = 0.99,    (10.54)

where

α/2 = 0.005 = 1 − Φ(c√n/σ_Y).    (10.55)

This implies Φ(c√n/σ_Y) = 0.995. From Table 4.2, c√n/σ_Y = 2.58, so c = 2.58 σ_Y/√n. Thus we have the confidence interval estimate

M_n(Y) − 2.58 σ_Y/√n ≤ μ ≤ M_n(Y) + 2.58 σ_Y/√n.    (10.56)

If σ_Y² = 10 and M_100(Y) = 33.2, our interval estimate for the expected value μ is

32.384 < μ < 34.016.
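For reference, the interval in Example 10.12 can be reproduced with a few lines of MATLAB (a sketch assuming only base MATLAB functions):

sigY=sqrt(10); n=100; M=33.2;     % known std dev, sample size, observed sample mean
c=2.58*sigY/sqrt(n);              % half-length for confidence 0.99
interval=[M-c M+c]                % approximately [32.384 34.016]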
Example 10.12 demonstrates that for a fixed confidence coefficient, the width of the interval estimate shrinks as we increase the number n of independent samples. In particular, when the observations are Gaussian, the width of the interval estimate is inversely proportional to √n.

Quiz 10.5
X is a Bernoulli random variable with unknown success probability p. Using n independent samples of X and a central limit theorem approximation, find confidence interval estimates of p with confidence levels 0.9 and 0.99. If M_100(X) = 0.4, what is our interval estimate?
Figure 10.1  Two sample runs of bernoulliconf(n,p). Each graph plots five sequences: in the center is M_n(X) as a function of n, which is sandwiched by the 0.9 confidence interval (shown as the dotted line pair), which is in turn sandwiched by the outermost (dashed line) pair showing the 0.99 confidence interval.
10.6
MATLAB

MATLAB can help us visualize the mathematical techniques and estimation procedures presented in this chapter. One MATLAB program generates samples of M_n(X) as a function of n for specific random variables along with the limits of confidence intervals. Another program compares M_n(X) with the parameter value of the probability model used in the simulation.
The new ideas in this chapter, namely the convergence of the sample mean, the Chebyshev inequality, and the weak law of large numbers, are largely theoretical. The application of these ideas relies on mathematical techniques for discrete and continuous random variables and sums of random variables that were introduced in prior chapters. As a result, in terms of MATLAB, this chapter breaks little new ground. Nevertheless, it is instructive to use MATLAB to simulate the convergence of the sample mean M_n(X). In particular, for a random variable X, we can view a set of iid samples X_1, ..., X_n as a random vector X = [X_1 ··· X_n]'. This vector of iid samples yields a vector of sample mean values M(X) = [M_1(X) M_2(X) ··· M_n(X)]', where
M_k(X) = (X_1 + ··· + X_k)/k.    (10.57)

We call a graph of the sequence M_k(X) versus k a sample mean trace. By graphing the sample mean trace as a function of n, we can observe the convergence of the point estimate M_k(X) to E[X].
Example 10.13
Write a function bernoulliconf(n,p) that graphs a sample mean trace of length n as well as the 0.9 and 0.99 confidence interval estimates for a Bernoulli (p = 0.5)
random variable.
function MN=bernoulliconf(n,p);
x=bernoullirv(p,n);
MN=cumsum(x)./((1:n)');
nn=(10:n)';
MN=MN(nn);
std90=(0.41)./sqrt(nn);
std99=(0.645/0.41)*std90;
y=[MN MN-std90 MN+std90];
y=[y MN-std99 MN+std99];
plot(nn,y);
In the solution to Quiz 10.5, we found that the 0.9 and 0.99 confidence interval estimates could be expressed as

M_n(X) − γ/√n ≤ p ≤ M_n(X) + γ/√n,

where γ = 0.41 for confidence 0.9 and γ = 0.645 for confidence 0.99. In the MATLAB function bernoulliconf(n,p), x is an instance of a random vector X with iid Bernoulli (p) components. Similarly, MN is an instance of the vector M(X). The output graphs MN as well as the 0.9 and 0.99 confidence intervals as a function of the number of trials n. Each time bernoulliconf is run, a different graph is generated. Figure 10.1 shows two sample graphs. Qualitatively, both show that the sample mean is converging to p as expected. Further, as n increases, the confidence interval estimates shrink.
By graphing multiple sample mean traces, we can observe the convergence properties of the sample mean.
Example 10.14
Write a MATLAB function bernoullitraces(n,m,p) to generate m sample mean traces, each of length n, for the sample mean of a Bernoulli (p) random variable.
function MN=bernoullitraces(n,m,p);
x=reshape(bernoullirv(p,m*n),n,m);
nn=(1:n)'*ones(1,m);
MN=cumsum(x)./nn;
stderr=sqrt(p*(1-p))./sqrt((1:n)');
plot(1:n,p+stderr,1:n,p-stderr,1:n,MN);
In bernoullitraces, each column of x is an instance of a random vector X with iid Bernoulli (p) components. Similarly, each column of MN is an instance of the vector M(X).

The output graphs each column of MN as a function of the number of trials n. In addition, we calculate the standard error √(p(1−p)/k) and overlay graphs of p − √(p(1−p)/k) and p + √(p(1−p)/k). Equation (10.24) says that at each step k, we should expect to see roughly two-thirds of the sample mean traces in the range

p − √(p(1−p)/k) ≤ M_k(X) ≤ p + √(p(1−p)/k).    (10.58)

A sample graph of bernoullitraces(50,40,0.5) is shown in Figure 10.2. The figure shows how at any given step, approximately two-thirds of the sample mean traces are within one standard error of the expected value.
Quiz 10.6
Generate m = 1000 traces (each of length n = 100) of the sample mean of a Bernoulli (p) random variable. At each step k, calculate M_k and the number of
Figure 10.2  Sample output of bernoullitraces.m, including the deterministic standard error graphs. The graph shows how at any given step, about two-thirds of the sample means are within one standard error of the true mean.
traces, N_k, such that M_k is within one standard error of the expected value p. Graph T_k = N_k/m as a function of k. Explain your results.
Further Reading: [Dur94] contains concise, rigorous presentations and proofs of the laws of large numbers. [WS01] covers parameter estimation for both scalar and vector random variables and stochastic processes.

Problems

Difficulty:  Easy / Moderate / Difficult / Experts Only
10.1.1  X_1, ..., X_n is an iid sequence of exponential random variables, each with expected value 5.
(a) What is Var[M_9(X)], the variance of the sample mean based on nine trials?
(b) What is P[X_1 > 7], the probability that one outcome exceeds 7?
(c) Use the central limit theorem to estimate P[M_9(X) > 7], the probability that the sample mean of nine trials exceeds 7.

10.1.2  X_1, ..., X_n are independent uniform random variables with expected value μ_X = 7 and variance Var[X] = 3.
(a) What is the PDF of X_1?
(b) What is Var[M_16(X)], the variance of the sample mean based on 16 trials?
(c) What is P[X_1 > 9], the probability that one outcome exceeds 9?
(d) Would you expect P[M_16(X) > 9] to be bigger or smaller than P[X_1 > 9]?
To check your intuition, use the central limit theorem to estimate P[M_16(X) > 9].

10.1.3  X is a uniform (0, 1) random variable. Y = X². What is the standard error of the estimate of μ_Y based on 50 independent samples of X?

10.1.4  Let X_1, X_2, ... denote a sequence of independent samples of a random variable X with variance Var[X]. We define a new random sequence Y_1, Y_2, ... as Y_1 = X_1 − X_2 and Y_n = X_{2n−1} − X_{2n}.
(a) Find E[Y_n] and Var[Y_n].
(b) Find the expected value and variance of M_n(Y).
10.2.1
The weight of a randomly chosen Maine black bear has expected value E[W] = 500 pounds and standard deviation σ_W = 100 pounds. Use the Chebyshev inequality to upper bound the probability that the weight of a randomly chosen bear
is more than 200 pounds from the expected value of the weight.
10.2.2
For an arbitrary random variable X, use the Chebyshev inequality to show that the probability that X is more than k standard deviations from its expected value E[X] satisfies

P[|X − E[X]| ≥ kσ] ≤ 1/k².

For a Gaussian random variable Y, use the Φ(·) function to calculate the probability that Y is more than k standard deviations from its expected value E[Y]. Compare the result to the upper bound based on the Chebyshev inequality.
10.2.3
Elevators arrive randomly at the ground floor of an office building. Because of a large crowd, a person will wait for time W in order to board the third arriving elevator. Let X_1 denote the time (in seconds) until the first elevator arrives and let X_i denote the time between the arrival of elevator i − 1 and i. Suppose X_1, X_2, X_3 are independent uniform (0, 30) random variables. Find upper bounds to the probability W exceeds 75 seconds using
(a) the Markov inequality,
(b) the Chebyshev inequality,
(c) the Chernoff bound.

10.2.4  Let X equal the arrival time of the third elevator in Problem 10.2.3. Find the exact value of P[W ≥ 75]. Compare your answer to the upper bounds derived in Problem 10.2.3.

10.2.5  In a game with two dice, the event snake eyes refers to both six-sided dice showing one spot. Let R denote the number of dice rolls needed to observe the third occurrence of snake eyes. Find
(a) the upper bound to P[R ≥ 250] based on the Markov inequality,
(b) the upper bound to P[R ≥ 250] based on the Chebyshev inequality,
(c) the exact value of P[R ≥ 250].

10.2.6  Use the Chernoff bound to show that the Gaussian (0, 1) random variable Z satisfies

P[Z ≥ c] ≤ e^(−c²/2).

For c = 1, 2, 3, 4, 5, use Table 4.2 and Table 4.3 to compare the Chernoff bound to the true value: P[Z ≥ c] = Q(c).

10.2.7  Use the Chernoff bound to show for a Gaussian (μ, σ) random variable X that

P[X ≥ c] ≤ e^(−(c−μ)²/(2σ²)).

Hint: Apply the result of Problem 10.2.6.

10.2.8  Let K be a Poisson random variable with expected value α. Use the Chernoff bound to find an upper bound to P[K ≥ c]. For what values of c do we obtain the trivial upper bound P[K ≥ c] ≤ 1?

10.2.9  In a subway station, there are exactly enough customers on the platform to fill three trains. The arrival time of the nth train is X_1 + ··· + X_n, where X_1, X_2, ... are iid exponential random variables with E[X_i] = 2 minutes. Let W equal the time required to serve the waiting customers. Find P[W > 20].

10.2.10  Let X_1, ..., X_n be independent samples of a random variable X. Use the Chernoff bound to show that M_n(X) = (X_1 + ··· + X_n)/n satisfies

P[M_n(X) ≥ c] ≤ (min_{s≥0} φ_X(s) e^(−sc))^n.

10.3.1  Let X_1, X_2, ... denote an iid sequence of random variables, each with expected value 75 and standard deviation 15.
(a) How many samples n do we need to guarantee that the sample mean M_n(X) is between 74 and 76 with probability 0.99?
(b) If each X_i has a Gaussian distribution, how many samples n' would we need to guarantee M_n'(X) is between 74 and 76 with probability 0.99?
10.3.2  Let X_A be the indicator random variable for event A with probability P[A] = 0.8. Let P̂_n(A) denote the relative frequency of event A in n independent trials.
(a) Find E[X_A] and Var[X_A].
(b) What is Var[P̂_n(A)]?
(c) Use the Chebyshev inequality to find the confidence coefficient 1 − α such that P̂_100(A) is within 0.1 of P[A]. In other words, find α such that P[|P̂_100(A) − P[A]| < 0.1] ≥ 1 − α.
(d) Use the Chebyshev inequality to find out how many samples n are necessary to have P̂_n(A) within 0.1 of P[A] with confidence coefficient 0.95. In other words, find n such that P[|P̂_n(A) − P[A]| < 0.1] ≥ 0.95.

10.3.3  X_1, X_2, ... is a sequence of iid Bernoulli (1/2) random variables. Consider the random sequence Y_n = X_1 + ··· + X_n.
(a) What is lim_{n→∞} P[|Y_{2n} − n| ≤ √(n/2)]?
(b) What does the weak law of large numbers say about Y_{2n}?

10.3.4  In communication systems, the error probability P[E] may be difficult to calculate; however it may be easy to derive an upper bound of the form P[E] ≤ ε. In this case, we may still want to estimate P[E] using the relative frequency P̂_n(E) of E in n trials. In this case, show that

P[|P̂_n(E) − P[E]| ≥ c] ≤ ε/(nc²).

10.3.5  A factory manufactures chocolate chip cookies on an assembly line. Each cookie is sprinkled with K chips from a very large vat of chips, where K is Poisson with E[K] = 10, independent of the number on any other cookie. Imagine you are a chip in the vat and you are sprinkled onto a cookie. Let J denote the number of chips (including you) in your cookie. What is the PMF of J? Hint: Suppose n cookies have been made such that N_k cookies have k chips. You are just one of the Σ_{k≥0} k N_k chips used in the n cookies.

10.3.6  In this problem, we develop a weak law of large numbers for a correlated sequence X_1, X_2, ... of identical random variables. In particular, each X_i has expected value E[X_i] = μ, and the random sequence has covariance function

Cov[X_m, X_{m+k}] = σ² a^|k|,

where a is a constant such that |a| < 1. For this correlated random sequence, we can define the sample mean of n samples as

M_n = (X_1 + ··· + X_n)/n.

(a) Use Theorem 9.2 to show that

Var[X_1 + ··· + X_n] ≤ n σ² (1 + a)/(1 − a).

(b) Use the Chebyshev inequality to show that for any c > 0,

P[|M_n − μ| ≥ c] ≤ σ²(1 + a)/(n(1 − a)c²).

(c) Use part (b) to show that for any c > 0,

lim_{n→∞} P[|M_n − μ| ≥ c] = 0.

10.3.7  In the Gaussian Movie DataBase (GMDB), reviewers like you rate movies with Gaussian scores. In particular, the first person to rate a movie assigns a Gaussian (q, 1) review score X_1, where q is the true "quality" of the movie. After n reviews, a movie's rating is R_n = Σ_{i=1}^{n} X_i/n. Strangely enough, in the GMDB, reviewers are influenced by prior reviews; if after n − 1 reviews a movie is rated R_{n−1} = r, the nth reviewer will rate the movie X_n, a Gaussian (r, 1) random variable, conditionally independent of X_1, ..., X_{n−1} given R_{n−1} = r.
(a) Find E[R_n].
(b) Find the PDF f_{R_n}(r). Hint: You may have unresolved parameters in this answer.
(c) Find Var[R_n]. Hint: Find E[R_n² | R_{n−1}].
(d) Interpret your results as n → ∞. Does the law of large numbers apply here?
10.4.1  When X is Gaussian, verify Equation (10.24), which states that the sample mean is within one standard error of the expected value with probability 0.68.

10.4.2  Suppose the sequence of estimates R̂_n is biased but asymptotically unbiased. If lim_{n→∞} Var[R̂_n] = 0, is the sequence R̂_n consistent?

10.4.3  An experimental trial produces random variables X_1 and X_2 with correlation r = E[X_1 X_2]. To estimate r, we perform n independent trials and form the estimate

R̂_n = (1/n) Σ_{i=1}^{n} X_1(i) X_2(i),

where X_1(i) and X_2(i) are samples of X_1 and X_2 on trial i. Show that if Var[X_1 X_2] is finite, then R̂_1, R̂_2, ... is an unbiased, consistent sequence of estimates of r.

10.4.4  An experiment produces a random vector X = [X_1 ··· X_k]' with expected value μ_X = [μ_1 ··· μ_k]'. The ith component of X has variance Var[X_i] = σ_i². To estimate μ_X, we perform n independent trials such that X(i) is the sample of X on trial i, and we form the vector mean

M(n) = (1/n) Σ_{i=1}^{n} X(i).

(a) Show M(n) is unbiased by showing E[M(n)] = μ_X.
(b) Show that the sequence of estimates M(n) is consistent by showing that for any constant c > 0,

lim_{n→∞} P[ max_{j=1,...,k} |M_j(n) − μ_j| ≥ c ] = 0.

Hint: Let A_i = {|M_i(n) − μ_i| > c} and apply the union bound (see Problem 1.3.11) to upper bound P[A_1 ∪ A_2 ∪ ··· ∪ A_k]. Then apply the Chebyshev inequality.

10.4.5  Given the iid samples X_1, X_2, ... of X, define the sequence Y_1, Y_2, ... by

Y_k = (X_{2k−1} − (X_{2k−1} + X_{2k})/2)² + (X_{2k} − (X_{2k−1} + X_{2k})/2)².

Note that each Y_k is an example of V̂_2, an estimate of the variance of X using two samples, given in Theorem 10.13. Show that if E[X^k] < ∞ for k = 1, 2, 3, 4, then the sample mean M_n(Y) is a consistent, unbiased estimate of Var[X].

10.4.6  An experiment produces a Gaussian random vector X = [X_1 ··· X_k]' with E[X] = 0 and correlation matrix R = E[XX']. To estimate R, we perform n independent trials, yielding the iid sample vectors X(1), X(2), ..., X(n), and form the sample correlation matrix

R̂(n) = (1/n) Σ_{m=1}^{n} X(m) X'(m).

(a) Show R̂(n) is unbiased by showing E[R̂(n)] = R.
(b) Show that the sequence of estimates R̂(n) is consistent by showing that every element R̂_ij(n) of the matrix R̂(n) converges to R_ij. That is, show that for any c > 0,

lim_{n→∞} P[ max_{i,j} |R̂_ij(n) − R_ij| ≥ c ] = 0.

Hint: Extend the technique used in Problem 10.4.4. You will need to use the result of Problem 7.6.4 to show that Var[X_i X_j] is finite.
10.5.1  X_1, ..., X_n are n independent identically distributed samples of random variable X with PMF

P_X(x) = 0.1 for x = 0;  0.9 for x = 1;  0 otherwise.

(a) How is E[X] related to P_X(1)?
(b) Use Chebyshev's inequality to find the confidence level α such that M_90(X), the estimate based on 90 observations, is within 0.05 of P_X(1). In other words, find α such that

P[|M_90(X) − P_X(1)| ≥ 0.05] ≤ α.

(c) Use Chebyshev's inequality to find out how many samples n are necessary to have M_n(X) within 0.03 of P_X(1) with confidence level 0.1. In other words, find n such that

P[|M_n(X) − P_X(1)| ≥ 0.03] ≤ 0.1.

10.5.2  X is a Bernoulli random variable with unknown success probability p. Using 100 independent samples of X, find a confidence interval estimate of p with confidence coefficient 0.99. If M_100(X) = 0.06, what is our interval estimate?

10.5.3  In n independent experimental trials, the relative frequency of event A is P̂_n(A). How large should n be to ensure that the confidence interval estimate

P̂_n(A) − 0.05 ≤ P[A] ≤ P̂_n(A) + 0.05

has confidence coefficient 0.9?

10.5.4  When we perform an experiment, event A occurs with probability P[A] = 0.01. In this problem, we estimate P[A] using P̂_n(A), the relative frequency of A over n independent trials.
(a) How many trials n are needed so that the interval estimate

P̂_n(A) − 0.001 ≤ P[A] ≤ P̂_n(A) + 0.001

has confidence coefficient 1 − α = 0.99?
(b) How many trials n are needed so that the probability P̂_n(A) differs from P[A] by more than 0.1% is less than 0.01?

10.6.1  Graph one trace of the sample mean of the Poisson (1) random variable. Calculate (using a central limit theorem approximation) and graph the corresponding 0.9 confidence interval estimate.

10.6.2  X is the Bernoulli (1/2) random variable. The sample mean M_n(X) has standard error

e_n = √(Var[X]/n) = 1/(2√n).

The probability that M_n(X) is within one standard error of p is

p_n = P[ 1/2 − 1/(2√n) ≤ M_n(X) ≤ 1/2 + 1/(2√n) ].

Use the binomialcdf function to calculate the exact probability p_n as a function of n. What is the source of the unusual sawtooth pattern? Compare your results to the solution of Quiz 10.6.

10.6.3  Recall that an exponential (λ) random variable X has

E[X] = 1/λ,  Var[X] = 1/λ².

Thus, to estimate λ from n independent samples X_1, ..., X_n, either of the following techniques should work.
(a) Calculate the sample mean M_n(X) and form the estimate λ̂ = 1/M_n(X).
(b) Calculate the unbiased variance estimate V̂_n(X) of Theorem 10.13 and form the estimate λ̃ = 1/√(V̂_n(X)).

Use MATLAB to simulate the calculation of λ̂ and λ̃ for m = 1000 experimental trials to determine which estimate is better.
10.6.4  X is a 10-dimensional Gaussian (0, I) random vector. Since E[X] = 0, R_X = C_X = I. We will use the method of Problem 10.4.6 and estimate R_X using the sample correlation matrix

R̂(n) = (1/n) Σ_{m=1}^{n} X(m) X'(m).

For n ∈ {10, 100, 1000, 10,000}, construct a MATLAB simulation to estimate

P[ max_{i,j} |R̂_ij − I_ij| ≥ 0.05 ].

10.6.5  In terms of parameter a, random variable X has CDF

F_X(x) = 0 for x < a − 1;  1 − 1/[x − (a − 2)]² for x ≥ a − 1.

(a) Show that E[X] = a by showing that E[X − (a − 2)] = 2.
(b) Generate m = 100 traces of the sample mean M_n(X) of length n = 1000. Do you observe convergence of the sample mean to E[X] = a?
Hypothesis Testing
Sorne of the most irnporta.nt applications of probability theory invol-ve reasoning in t11e presence of uncertair1ty. In these applications, v.re analyze t 11e observatioris of an experirnent in order t o rr1ake a decision. W her1 t he decision is based on t he propert ies of rar1dorn variables, the reasoning is referred to as statistical inf eren,ce. In Chapter 10, -vve introduced two types of st atistical inference for model pararr1et ers: poir1t estimation and confidence-interval estirnation. In t11is ch apter , -vve introduce t-vvo rnore categories of inference: significance t estir1g and 11ypothesis t esting. Statistical inferer1ce is a broad , deep subject -vvith a very large body of theoretical kr10\vledge and pract ical techniques. It has it s own exter1si·ve lit erature and a ·vast collection of practical techniques, man:yr of t hern valuable secrets of cornpanies and governments. This chapt er, Chapter 10, and Chapter 12 provide a r1 int roductory view of t 11e st1bject of statist ica.1 inferen ce. Our aim is t o indicate to read ers ho-vv the fundamentals of probttbility theory presented in t11e earlier chapters can be used t o rr1ake accurat e decisior1s in the presen ce of uncertaint:yr. Like probability theory, t11e theory of st atistical ir1ference refers to an experirr1ent cor1Sistir1g of a procedtu·e and observations. In all statistical ir1ference rr1ethods, there is also a set of possible decisions and a rr1eans of measl1ring the acct1racy of a decision. A statistical ir1ference rr1ethod assigr1s a decision to each possible outcome of the experirr1er1t . Therefore, a statist ical inference rr1etl1od consist s of three steps : P erforrn an experiment , obser ve an outcorne, st at e a decisiori. T11e assignrr1ent of decisions t o out cornes is b ased on probabilit y theory . T 11e aim of t he assignrr1en t is t o achie·ve the highest possible accurac:yr. This c11apter cor1tains brief int roductions t o t\vo cat egories of st at istical inference. • S ignific ance T esting D e cis ion Accept or reject the 11ypothesis t hat the observatior1s result frorr1 a certain proba.bility rr1odel H.0 Accuracy Measure Probabilit y of reject ing the 11ypothesis when it is t rue 366
• Hypothesis Testing
  Decision: The observations result from one of M hypothetical probability models: H0, H1, ..., H_{M−1}.
  Accuracy Measure: Probability that the decision is H_i when the true model is H_j, for i, j = 0, 1, ..., M − 1.
In the following example, we see that for the same experiment, each testing method addresses a particular kind of question under particular assumptions.

Example 11.1
Suppose X_1, ..., X_n are iid samples of an exponential (λ) random variable X with unknown parameter λ. Using the observations X_1, ..., X_n, each of the statistical inference methods can answer questions regarding the unknown λ. For each of the methods, we state the underlying assumptions of the method and a question that can be addressed by the method.

• Significance Test  Assuming λ is a constant, should we accept or reject the hypothesis that λ = 3.5?
• Hypothesis Test  Assuming λ is a constant, does λ equal 2.5, 3.5, or 4.5?
To decide either of the questions in Example 11.1, we have to state in advance which values of X_1, ..., X_n produce each possible decision. For a significance test, the decision must be either accept or reject. For the hypothesis test, the decision must be one of the numbers 2.5, 3.5, or 4.5.
11.1
Significance Testing
A hypothesis is a candidate probability model. A significance test specifies a rejection set R consisting of low-probability outcomes of an experiment. If an observation is in the set of low-probability outcomes, the decision is "reject the hypothesis." The significance level, defined as the probability of an outcome in the rejection set, determines the rejection set.

A significance test begins with the hypothesis, H0, that a certain probability model describes the observations of an experiment. The question addressed by the test has two possible answers: accept the hypothesis or reject it. The significance level of the test is defined as the probability of rejecting the hypothesis if it is true. The test divides S, the sample space of the experiment, into a partition consisting of an acceptance set A and a rejection set R = A^c. If the observation s ∈ A, we accept H0. If s ∈ R, we reject the hypothesis. Therefore the significance level is

α = P[s ∈ R].    (11.1)

To design a significance test, we start with a value of α and then determine a set R that satisfies Equation (11.1).
In many applications, H0 is referred to as the null hypothesis. In these applications, there is a known probability model for an experiment. Then the conditions of the experiment change and a significance test is performed to determine whether the original probability model remains valid. The null hypothesis states that the changes in the experiment have no effect on the probability model. An example is the effect of a diet pill on the weight of people who test the pill. The following example applies to Internet tweeting.
Example 11.2
Suppose that on Thursdays between 9:00 and 9:30 at night, the number of tweets N is the Poisson (10⁷) random variable with expected value 10⁷. Next Thursday, the President will deliver a speech at 9:00 that will be broadcast by all radio and television networks. The null hypothesis, H0, is that the speech does not affect the probability model of tweets. In other words, H0 states that on the night of the speech, N is a Poisson random variable with expected value 10⁷. Design a significance test for hypothesis H0 at a significance level of α = 0.05.

The experiment involves counting the number of tweets, N, between 9:00 and 9:30 on the night of the speech. To design the test, we need to specify a rejection set, R, such that P[N ∈ R] = 0.05. There are many sets R that meet this condition. We do not know whether the President's speech will increase tweeting (by people deprived of their Thursday programs) or decrease tweeting (because many people who normally tweet listen to the speech). Therefore, we choose R to be a symmetrical set {n : |n − 10⁷| ≥ c}. The remaining task is to choose c to satisfy Equation (11.1). Under hypothesis H0, the probability model is the Poisson (10⁷) random variable, so E[N] = Var[N] = 10⁷. The significance level is

α = P[|N − 10⁷| ≥ c] = P[ |N − E[N]|/σ_N ≥ c/σ_N ].    (11.2)

Since E[N] is large, we can use the central limit theorem and approximate (N − E[N])/σ_N by the standard Gaussian random variable Z, so that

α ≈ P[|Z| ≥ c/σ_N] = 2[1 − Φ(c/σ_N)] = 0.05.    (11.3)

In this case, Φ(c/σ_N) = 0.975, which implies c/σ_N = 1.96 and c = 1.96 σ_N = 1.96 √10⁷ ≈ 6198 tweets.
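The threshold calculation is easy to verify numerically. The following sketch is an illustration only; it builds the inverse of the Q function from base MATLAB's erfcinv rather than using the text's tables:

Qinv=@(p) sqrt(2)*erfcinv(2*p);   % inverse standard normal tail function
sigmaN=sqrt(1e7);                 % Poisson model: Var[N] = E[N] = 1e7
c=Qinv(0.05/2)*sigmaN             % roughly 6198 tweets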
In a significance test, two kinds of errors are possible. Statisticians refer to them as Type I errors and Type II errors with the following definitions:
• Type I Error  False Rejection: Reject H0 when H0 is true.
• Type II Error  False Acceptance: Accept H0 when H0 is false.
The hypothesis specified in a significance test makes it possible to calculate the probability of a Type I error, α = P[s ∈ R]. In the absence of a probability model for the condition "H0 false," there is no way to calculate the probability of a Type II error. A binary hypothesis test, described in Section 11.2, includes an alternative hypothesis H1. Then it is possible to use the probability model given by H1 to calculate the probability of a Type II error, which is P[s ∈ A | H1].

Although a significance test does not specify a complete probability model as an alternative to the null hypothesis, the nature of the experiment influences the choice of the rejection set, R. In Example 11.2, we implicitly assume that the alternative to the null hypothesis is a probability model with an expected value that is either higher than 10⁷ or lower than 10⁷. In the following example, the alternative is a model with an expected value that is lower than the original expected value.
Example 11.3
Before releasing a diet pill to the public, a drug company runs a test on a group of 64 people. Before testing the pill, the probability model for the weight of the people, measured in pounds, is the Gaussian (190, 24) random variable W. Design a test based on the sample mean of the weight of the population to determine whether the pill has a significant effect. The significance level is α = 0.01.

Under the null hypothesis, H0, the probability model after the people take the diet pill is a Gaussian (190, 24), the same as before taking the pill. The sample mean, M_64(X), is a Gaussian random variable with expected value 190 and standard deviation 24/√64 = 3. To design the significance test, it is necessary to find R such that P[M_64(X) ∈ R] = 0.01. If we reject the null hypothesis, we will decide that the pill is effective and release it to the public.

In this example, we want to know whether the pill has caused people to lose weight. If they gain weight, we certainly do not want to declare the pill effective. Therefore, we choose the rejection set R to consist entirely of weights below the original expected value: R = {M_64(X) ≤ r_0}. We choose r_0 so that the probability that we reject the null hypothesis is 0.01:

P[M_64(X) ∈ R] = P[M_64(X) ≤ r_0] = Φ((r_0 − 190)/3) = 0.01.    (11.4)

Since Φ(−2.33) = Q(2.33) = 0.01, it follows that (r_0 − 190)/3 = −2.33, or r_0 = 183.01. Thus we will reject the null hypothesis and decide that the diet pill is effective at significance level 0.01 if the sample mean of the population weight drops to 183.01 pounds or less.
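A short MATLAB check of the one-tail threshold (a sketch; the value it returns differs from 183.01 only because it uses the exact quantile rather than the rounded table value 2.33):

Qinv=@(p) sqrt(2)*erfcinv(2*p);   % inverse standard normal tail function
mu=190; sigma=24; n=64;           % null-hypothesis model and sample size
r0=mu-Qinv(0.01)*sigma/sqrt(n)    % approximately 183 pounds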
Note the difference between the symmetrical rejection set in Example 11.2 and the one-sided rejection set in Example 11.3. We selected these sets on the basis of the application of the results of the test. In the language of statistical inference, the symmetrical set is part of a two-tail significance test, and the one-sided rejection set is part of a one-tail significance test.
Quiz 11.1
Under hypothesis H0, the interarrival times between phone calls are independent and identically distributed exponential (1) random variables. Given X, the maximum among 15 independent interarrival time samples X_1, ..., X_15, design a significance test for hypothesis H0 at a level of α = 0.01.
11.2
Binary Hypothesis Testing

A binary hypothesis test creates a partition {A0, A1} for an experiment. When an outcome is in A0, the decision is to accept hypothesis H0. Otherwise the decision is to accept H1. The quality measure of a test is related to the probability of a false alarm (decide H1 when H0 is true) and the probability of a miss (decide H0 when H1 is true).
In a binary hypothesis test, there are two hypothetical probability models, H0 and H1, and two possible decisions: accept H0 as the true model, and accept H1. There is also a probability model for H0 and H1, conveyed by the numbers P[H0] and P[H1] = 1 − P[H0]. These numbers are referred to as the a priori probabilities or prior probabilities of H0 and H1. They reflect the state of knowledge about the probability model before an outcome is observed. The complete experiment for a binary hypothesis test consists of two subexperiments. The first subexperiment chooses a probability model from sample space S' = {H0, H1}. The probability models H0 and H1 have the same sample space, S. The second subexperiment produces an observation corresponding to an outcome, s ∈ S. When the observation leads to a random vector X, we call X the decision statistic. Often, the decision statistic is simply a random variable X. When the decision statistic X is discrete, the probability models are conditional probability mass functions P_X|H0(x) and P_X|H1(x). When X is a continuous random vector, the probability models are conditional probability density functions f_X|H0(x) and f_X|H1(x). In the terminology of statistical inference, these functions are referred to as likelihood functions. For example, f_X|H0(x) is the likelihood of x given H0.

The test design divides S into two sets, A0 and A1 = A0^c. If the outcome s ∈ A0, the decision is accept H0. Otherwise, the decision is accept H1. The accuracy measure of the test consists of two error probabilities. P[A1|H0] corresponds to the probability of a Type I error. It is the probability of accepting H1 when H0 is the true probability model. Similarly, P[A0|H1] is the probability of accepting H0 when H1 is the true probability model. It corresponds to the probability of a Type II error.

One electrical engineering application of binary hypothesis testing relates to a radar system. The transmitter sends out a signal, and it is the job of the receiver to decide whether a target is present. To make this decision, the receiver examines the received signal to determine whether it contains a reflected version of the transmitted signal.
Figure 11.1  Continuous and discrete examples of a receiver operating curve (ROC). Left panel: ROC for continuous X. Right panel: ROC for discrete X.
The hypothesis H0 corresponds to the situation in which there is no target. H1 corresponds to the presence of a target. In the terminology of radar, a Type I error (decide target present when there is no target) is referred to as a false alarm, and a Type II error (decide no target when there is a target present) is referred to as a miss.

The design of a binary hypothesis test represents a trade-off between the two error probabilities, P_FA = P[A1|H0] and P_MISS = P[A0|H1]. To understand the trade-off, consider an extreme design in which A0 = S consists of the entire sample space and A1 = ∅ is the empty set. In this case, P_FA = 0 and P_MISS = 1. Now let A1 expand to include an increasing proportion of the outcomes in S. As A1 expands, P_FA increases and P_MISS decreases. At the other extreme, A0 = ∅, which implies P_MISS = 0. In this case, A1 = S and P_FA = 1. A graph representing the possible values of P_FA and P_MISS is referred to as a receiver operating curve (ROC). Examples appear in Figure 11.1. A receiver operating curve displays P_MISS as a function of P_FA for all possible A0 and A1. The graph on the left represents probability models with a continuous sample space S. In the graph on the right, S is a discrete set and the receiver operating curve consists of a collection of isolated points in the P_FA, P_MISS plane. At the top left corner of the graph, the point (0, 1) corresponds to A0 = S and A1 = ∅. When we move one outcome from A0 to A1, we move to the next point on the curve. Moving downward along the curve corresponds to taking more outcomes from A0 and putting them in A1 until we arrive at the lower right corner (1, 0), where all the outcomes are in A1.
Example 11.4
The noise voltage in a radar detection system is a Gaussian (0, 1) random variable, N. When a target is present, the received signal is X = v + N volts with v ≥ 0. Otherwise the received signal is X = N volts. Periodically, the detector performs a binary hypothesis test, with H0 as the hypothesis no target and H1 as the hypothesis target present. The acceptance sets for the test are A0 = {X ≤ x_0} and A1 = {X > x_0}. Draw the receiver operating curves of the radar system for the three target voltages v = 0, 1, 2 volts.
Figure 11.2  (a) The probability of a miss and the probability of a false alarm as a function of the threshold x_0 for Example 11.4. (b) The corresponding receiver operating curve for the system. We see that the ROC improves as v increases.
To derive a receiver operating curve, it is necessary to find P_MISS and P_FA as functions of x_0. To perform the calculations, we observe that under hypothesis H0, X = N is a Gaussian (0, σ) random variable. Under hypothesis H1, X = v + N is a Gaussian (v, σ) random variable. Therefore,

P_MISS = P[A0|H1] = P[X ≤ x_0 | H1] = Φ((x_0 − v)/σ),    (11.5)
P_FA = P[A1|H0] = P[X > x_0 | H0] = 1 − Φ(x_0/σ).    (11.6)

Figure 11.2(a) shows P_MISS and P_FA as functions of x_0 for v = 0, v = 1, and v = 2 volts. Note that there is a single curve for P_FA since the probability of a false alarm does not depend on v. The same data also appears in the corresponding receiver operating curves of Figure 11.2(b). When v = 0, the received signal is the same regardless of whether or not a target is present. In this case, P_MISS = 1 − P_FA. As v increases, it is easier for the detector to distinguish between the two hypotheses. We see that the ROC improves as v increases. That is, we can choose a value of x_0 such that both P_MISS and P_FA are lower for v = 2 than for v = 1.
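Curves like those in Figure 11.2(b) can be generated directly from Equations (11.5) and (11.6). The following MATLAB sketch (an illustration only, with σ = 1 as in the example) sweeps the threshold x_0 and plots one ROC per target voltage:

Phi=@(z) 0.5*erfc(-z/sqrt(2));      % standard normal CDF
x0=-2:0.01:6; sigma=1;
hold on;
for v=[0 1 2]
    PMISS=Phi((x0-v)/sigma);        % P[X <= x0 | H1], Equation (11.5)
    PFA=1-Phi(x0/sigma);            % P[X  > x0 | H0], Equation (11.6)
    plot(PFA,PMISS);
end
hold off; xlabel('P_{FA}'); ylabel('P_{MISS}');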
In a practical binary hypothesis test, it is necessary to adopt one test (a specific A0) and a corresponding trade-off between P_FA and P_MISS. There are many approaches to selecting A0. In the radar application, the cost of a miss (ignoring a threatening target) could be far higher than the cost of a false alarm (causing the operator to take an unnecessary precaution). This suggests that the radar system should operate with a low value of x_0 to produce a low P_MISS even though this will produce a relatively high P_FA. The remainder of this section describes four methods of choosing A0.
Maximum A Posteriori Probability (MAP) Test

Example 11.5
A modem transmits a binary signal to another modem. Based on a noisy measurement, the receiving modem must choose between hypothesis H0 (the transmitter sent a 0) and hypothesis H1 (the transmitter sent a 1). A false alarm occurs when a 0 is sent but a 1 is detected at the receiver. A miss occurs when a 1 is sent but a 0 is detected. For both types of error, the cost is the same; one bit is detected incorrectly.
The maximum a posteriori probability test minimizes P_ERR, the total probability of error of a binary hypothesis test. The law of total probability, Theorem 1.9, relates P_ERR to the a priori probabilities of H0 and H1 and to the two conditional error probabilities, P_FA = P[A1|H0] and P_MISS = P[A0|H1]:

P_ERR = P[A1|H0] P[H0] + P[A0|H1] P[H1].    (11.7)
When the two types of errors have the same cost, as in Example 11.5, minimizing P_ERR is a sensible strategy. The following theorem specifies the binary hypothesis test that produces the minimum possible P_ERR.
Theorem 11.1    Maximum A Posteriori Probability (MAP) Test
Given a binary hypothesis-testing experiment with outcome s, the following rule leads to the lowest possible value of P_ERR:

s ∈ A0 if P[H0|s] ≥ P[H1|s];    s ∈ A1 otherwise.
Proof To create the partition {A0, A1}, it is necessary to place every element s ∈ S in either A0 or A1. Consider the effect of a specific value of s on the sum in Equation (11.7). Either s will contribute to the first (A1) or second (A0) term in the sum. By placing each s in the term that has the lower value for the specific outcome s, we create a partition that minimizes the entire sum. Thus we have the rule

s ∈ A0 if P[s|H1] P[H1] ≤ P[s|H0] P[H0];    s ∈ A1 otherwise.    (11.8)

Applying Bayes' theorem (Theorem 1.11), we see that the left side of the inequality is P[H1|s] P[s] and the right side of the inequality is P[H0|s] P[s]. Therefore the inequality is identical to P[H0|s] P[s] ≥ P[H1|s] P[s], which is identical to the inequality in the theorem statement.
Note that P[H0|s] and P[H1|s] are referred to as the a posteriori probabilities of H0 and H1. Just as the a priori probabilities P[H0] and P[H1] reflect our knowledge of H0 and H1 prior to performing an experiment, P[H0|s] and P[H1|s] reflect our knowledge after observing s. Theorem 11.1 states that in order to minimize P_ERR it is necessary to accept the hypothesis with the higher a posteriori probability. A test that follows this rule is a maximum a posteriori probability (MAP) hypothesis
test. In such a test, A0 contains all outcomes s for which P[H0|s] > P[H1|s], and A1 contains all outcomes s for which P[H1|s] > P[H0|s]. If P[H0|s] = P[H1|s], the assignment of s to either A0 or A1 does not affect P_ERR. In Theorem 11.1, we arbitrarily assign s to A0 when the a posteriori probabilities are equal. We would have the same probability of error if we assign s to A1 for all outcomes that produce equal a posteriori probabilities, or if we assign some outcomes with equal a posteriori probabilities to A0 and others to A1. Equation (11.8) is another statement of the MAP decision rule. It contains the three probability models that are assumed to be known:

• The a priori probabilities of the hypotheses: P[H0] and P[H1],
• The likelihood function of H0: P[s|H0],
• The likelihood function of H1: P[s|H1].

When the outcomes of an experiment yield a random vector X as the decision statistic, we can express the MAP rule in terms of conditional PMFs or PDFs. If X is discrete, we take X = x_i to be the outcome of the experiment. If the sample space S of the experiment is continuous, we interpret the conditional probabilities by assuming that each outcome corresponds to the random vector X in the small volume x ≤ X ≤ x + dx with probability f_X(x)dx. Thus in terms of the random variable X, we have the following version of the MAP hypothesis test.
Theorem 11.2
For an experiment that produces a random vector X, the MAP hypothesis test is

Discrete:  x ∈ A0 if P_X|H0(x)/P_X|H1(x) ≥ P[H1]/P[H0];    x ∈ A1 otherwise;

Continuous:  x ∈ A0 if f_X|H0(x)/f_X|H1(x) ≥ P[H1]/P[H0];    x ∈ A1 otherwise.
In these formulas, the ratio of conditional probabilities is referred to as a likelihood ratio. The formulas state that in order to perform a binary hypothesis test, we observe the outcome of an experiment, calculate the likelihood ratio on the left side of the formula, and compare it with a constant on the right side of the formula. We can view the likelihood ratio as the evidence, based on an observation, in favor of H0. If the likelihood ratio is greater than 1, H0 is more likely than H1. The ratio of prior probabilities, on the right side, is the evidence, prior to performing the experiment, in favor of H1. Therefore, Theorem 11.2 states that accepting H0 is the better decision if the evidence in favor of H0, based on the experiment, outweighs the prior evidence in favor of accepting H1. In many practical hypothesis tests, including the following example, it is convenient to compare the logarithms of the two ratios.
Example 11.6
With probability p, a digital communications system transmits a 0. It transmits a 1 with probability 1 − p. The received signal is either X = −v + N volts, if the transmitted bit is 0, or X = v + N volts, if the transmitted bit is 1.
Figure 11.3  Decision regions for Example 11.6. The threshold x* separates A0 = {X ≤ x*} from A1 = {X > x*}; the shaded areas p P[X > x*|H0] and (1 − p) P[X < x*|H1] sum to the probability of error.
The voltage ±v is the information component of the received signal, and N, a Gaussian (0, σ) random variable, is the noise component. Given the received signal X, what is the minimum probability of error rule for deciding whether 0 or 1 was sent?

With 0 transmitted, X is the Gaussian (−v, σ) random variable. With 1 transmitted, X is the Gaussian (v, σ) random variable. With H_i denoting the hypothesis that bit i was sent, the likelihood functions are

f_X|H0(x) = (1/√(2πσ²)) e^(−(x+v)²/(2σ²)),    f_X|H1(x) = (1/√(2πσ²)) e^(−(x−v)²/(2σ²)).    (11.9)

Since P[H0] = p, the likelihood ratio test of Theorem 11.2 becomes

x ∈ A0 if e^(−(x+v)²/(2σ²)) / e^(−(x−v)²/(2σ²)) ≥ (1 − p)/p;    x ∈ A1 otherwise.    (11.10)

Taking the logarithm of both sides and simplifying yields

x ∈ A0 if x ≤ x* = (σ²/(2v)) ln(p/(1 − p));    x ∈ A1 otherwise.    (11.11)

When p = 1/2, the threshold x* = 0 and the decision depends only on whether the evidence in the received signal favors 0 or 1, as indicated by the sign of x. When p ≠ 1/2, the prior information shifts the decision threshold x*. The shift favors 1 (x* < 0) if p < 1/2. The shift favors 0 (x* > 0) if p > 1/2. The influence of the prior information also depends on the signal-to-noise voltage ratio, 2v/σ. When the ratio is relatively high, the information in the received signal is reliable and the received signal has relatively more influence than the prior information (x* closer to 0). When 2v/σ is relatively low, the prior information has relatively more influence.

In Figure 11.3, the threshold x* is the value of x for which the two likelihood functions, each multiplied by a prior probability, are equal. The probability of error is the sum of the shaded areas. Compared to all other decision rules, the threshold x* produces the minimum possible P_ERR.
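The threshold in Equation (11.11) is a one-line computation. In the MATLAB sketch below, the values of v, σ, and p are made-up illustrative numbers, not values taken from the example:

v=2; sigma=1; p=0.6;                  % hypothetical signal level, noise std dev, and P[H0]
xstar=(sigma^2/(2*v))*log(p/(1-p))    % decide 0 if x <= x*, decide 1 otherwise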
Example 11.7
Find the error probability of the communications system of Example 11.6.
Applying Equation (11.7), we can write the probability of an error as

P_ERR = p P[X > x*|H0] + (1 − p) P[X < x*|H1].

Given H0, X is Gaussian (−v, σ). Given H1, X is Gaussian (v, σ). Consequently,

P_ERR = p Q((x* + v)/σ) + (1 − p) Φ((x* − v)/σ)    (11.12)
      = p Q((σ/(2v)) ln(p/(1 − p)) + v/σ) + (1 − p) Φ((σ/(2v)) ln(p/(1 − p)) − v/σ).    (11.13)

This equation shows how the prior information, represented by ln[(1 − p)/p], and the power of the noise in the received signal, represented by σ, influence P_ERR.
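Equation (11.13) can be evaluated the same way. Continuing with the same hypothetical values of v, σ, and p used in the sketch above (assumptions for illustration only):

Q=@(z) 0.5*erfc(z/sqrt(2));             % standard normal tail probability
v=2; sigma=1; p=0.6;                    % hypothetical values, as above
xstar=(sigma^2/(2*v))*log(p/(1-p));
PERR=p*Q((xstar+v)/sigma)+(1-p)*(1-Q((xstar-v)/sigma))   % Equation (11.13)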
Example 11.8
At a computer disk drive factory, the manufacturing failure rate is the probability that a randomly chosen new drive fails the first time it is powered up. Normally, the production of drives is very reliable, with a failure rate q0 = 10⁻⁴. However, from time to time there is a production problem that causes the failure rate to jump to q1 = 10⁻¹. Let H_i denote the hypothesis that the failure rate is q_i. Every morning, an inspector chooses drives at random from the previous day's production and tests them. If a failure occurs too soon, the company stops production and checks the critical part of the process. Production problems occur at random once every ten days, so that P[H1] = 0.1 = 1 − P[H0]. Based on N, the number of drives tested up to and including the first failure, design a MAP hypothesis test. Calculate the conditional error probabilities P_FA and P_MISS and the total error probability P_ERR.

Given a failure rate of q_i, N is a geometric random variable (see Example 3.9) with expected value 1/q_i. That is, P_N|Hi(n) = q_i(1 − q_i)^(n−1) for n = 1, 2, ... and P_N|Hi(n) = 0 otherwise. Therefore, by Theorem 11.2, the MAP design states

n ∈ A0 if P_N|H0(n)/P_N|H1(n) ≥ P[H1]/P[H0];    n ∈ A1 otherwise.    (11.14)

With some algebra, we find that the MAP design is

n ∈ A0 if n ≥ n* = 1 + ln( q1 P[H1] / (q0 P[H0]) ) / ln( (1 − q0)/(1 − q1) );    n ∈ A1 otherwise.    (11.15)

Substituting q0 = 10⁻⁴, q1 = 10⁻¹, P[H0] = 0.9, and P[H1] = 0.1, we obtain n* = 45.8. Therefore, in the MAP hypothesis test, A0 = {n ≥ 46}. This implies that the inspector tests at most 45 drives in order to reach a decision about the failure rate. If the first failure occurs before test 46, the company assumes that the failure rate is 10⁻¹. If the first 45 drives pass the test, then N ≥ 46 and the company assumes that the failure rate is 10⁻⁴. The error probabilities are:

P_FA = P[N ≤ 45 | H0] = F_N|H0(45) = 1 − (1 − 10⁻⁴)⁴⁵ = 0.0045,    (11.16)
P_MISS = P[N > 45 | H1] = 1 − F_N|H1(45) = (1 − 10⁻¹)⁴⁵ = 0.0087.    (11.17)
The total probability of error is P_ERR = P[H0] P_FA + P[H1] P_MISS = 0.0049.
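The numbers in Example 11.8 can be checked with a few lines of MATLAB. This is only a sketch of the calculation (the variable names are ours), evaluating n* from Equation (11.15) and then the error probabilities.

q0=1e-4; q1=1e-1; PH0=0.9; PH1=0.1;
nstar=1+log(q1*PH1/(q0*PH0))/log((1-q0)/(1-q1));  % Equation (11.15), about 45.8
n0=ceil(nstar);                                   % A0 = {n >= 46}
PFA=1-(1-q0)^(n0-1);                              % P[N <= 45|H0] = 0.0045
PMISS=(1-q1)^(n0-1);                              % P[N >= 46|H1] = 0.0087
PERR=PH0*PFA+PH1*PMISS                            % 0.0049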
We will return to Example 11.8 when we examine other types of tests.
Minimum Cost Test

The MAP test implicitly assumes that both types of errors (miss and false alarm) are equally serious. As discussed in connection with the radar application earlier in this section, this is not the case in many important situations. Consider an application in which C = C10 units is the cost of a false alarm (decide H1 when H0 is correct) and C = C01 units is the cost of a miss (decide H0 when H1 is correct). In this situation the expected cost of test errors is

E[C] = P[A1|H0] P[H0] C10 + P[A0|H1] P[H1] C01.  (11.18)

Minimizing E[C] is the goal of the minimum cost hypothesis test. When the decision statistic is a random vector X, we have the following theorem.
Theorem 11.3  Minimum Cost Binary Hypothesis Test
For an experiment that produces a random vector X, the minimum cost hypothesis test is

Discrete:    x ∈ A0 if P_X|H0(x)/P_X|H1(x) ≥ P[H1]C01/(P[H0]C10);   x ∈ A1 otherwise;
Continuous:  x ∈ A0 if f_X|H0(x)/f_X|H1(x) ≥ P[H1]C01/(P[H0]C10);   x ∈ A1 otherwise.
Proof  The function to be minimized, Equation (11.18), is identical to the function to be minimized in the MAP hypothesis test, Equation (11.7), except that P[H1]C01 appears in place of P[H1] and P[H0]C10 appears in place of P[H0]. Thus the optimum hypothesis test is the test in Theorem 11.2, with P[H1]C01 replacing P[H1] and P[H0]C10 replacing P[H0].
In this test we note that only the relative cost C01/C10 influences the test, not the individual costs or the units in which cost is measured. A ratio > 1 implies that misses are more costly than false alarms. Therefore, a ratio > 1 expands A1, the acceptance set for H1, making it harder to miss H1 when it is correct. On the other hand, the same ratio contracts A0 and increases the false alarm probability, because a false alarm is less costly than a miss.
Example 11.9
Continuing the disk drive test of Example 11.8, the factory produces 1000 disk drives per hour and 10,000 disk drives per day. The manufacturer sells each drive for $100.
However, each defective drive is returned to the factory and replaced by a new drive. The cost of replacing a drive is $200, consisting of $100 for the replacement drive and an additional $100 for shipping, customer support, and claims processing. Further note that remedying a production problem results in 30 minutes of lost production. Based on the decision statistic N, the number of drives tested up to and including the first failure, what is the minimum cost test?

Based on the given facts, the cost C10 of a false alarm is 30 minutes (500 drives) of lost production, or roughly $50,000. On the other hand, the cost C01 of a miss is that 10% of the daily production will be returned for replacement. For 1000 drives returned at $200 per drive, the expected cost is $200,000. The minimum cost test is

n ∈ A0 if P_N|H0(n)/P_N|H1(n) ≥ P[H1]C01/(P[H0]C10);   n ∈ A1 otherwise.  (11.19)

Performing the same substitutions and simplifications as in Example 11.8 yields

n ∈ A0 if n ≥ n* = 1 + ln(q1 P[H1]C01/(q0 P[H0]C10)) / ln((1 − q0)/(1 − q1)) = 58.92;   n ∈ A1 otherwise.  (11.20)
Therefore, in the minimum cost hypothesis test, A0 = {n ≥ 59}. An inspector tests at most 58 disk drives to reach a decision regarding the state of the factory. If the first 58 drives pass the test, then N ≥ 59 and the failure rate is assumed to be 10^-4. The error probabilities are:

P_FA = P[N ≤ 58 | H0] = F_N|H0(58) = 1 − (1 − 10^-4)^58 = 0.0058,  (11.21)
P_MISS = P[N ≥ 59 | H1] = 1 − F_N|H1(58) = (1 − 10^-1)^58 = 0.0022.  (11.22)
The average cost (in dollars) of this rule is

E[C_MC] = P[H0] P_FA C10 + P[H1] P_MISS C01
        = (0.9)(0.0058)(50,000) + (0.1)(0.0022)(200,000) = 305.  (11.23)
By comparison, the MAP test, which minimizes the probability of an error rather than the expected cost, has error probabilities P_FA = 0.0045 and P_MISS = 0.0087 and the expected cost

E[C_MAP] = (0.9)(0.0045)(50,000) + (0.1)(0.0087)(200,000) = 376.50.  (11.24)
The effect of the high cost of a miss has been to reduce the miss probability from 0.0087 to 0.0022. However, the false alarm probability rises from 0.0045 in the MAP test to 0.0058 in the minimum cost test. A savings of $376.50 − $305 = $71.50 may not seem very large. The reason is that both the MAP test and the minimum cost test work very well. By comparison, for a "no test" policy that skips testing altogether, each day that the failure rate is q1 = 0.1 will result, on average, in 1000 returned drives at an expected cost of $200,000. Since such days occur with probability P[H1] = 0.1, the expected cost of a "no test" policy is $20,000 per day.
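The same calculation extends to the minimum cost test by weighting the prior ratio with the costs, as in Equation (11.20). A minimal sketch, again with our own variable names and the costs assumed in Example 11.9:

q0=1e-4; q1=1e-1; PH0=0.9; PH1=0.1; C10=5e4; C01=2e5;
nstar=1+log(q1*PH1*C01/(q0*PH0*C10))/log((1-q0)/(1-q1));  % Equation (11.20), 58.92
n0=ceil(nstar);                                           % A0 = {n >= 59}
PFA=1-(1-q0)^(n0-1); PMISS=(1-q1)^(n0-1);                 % 0.0058 and 0.0022
EC=PH0*PFA*C10+PH1*PMISS*C01                              % about 305 dollars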
Neyman-Pearson Test

Given an observation, the MAP test minimizes the probability of accepting the wrong hypothesis and the minimum cost test minimizes the cost of errors. However, the MAP test requires that we know the a priori probabilities P[Hi] of the competing hypotheses, and the minimum cost test requires that we know in addition the relative costs of the two types of errors. In many situations, these costs and a priori probabilities are difficult or even impossible to specify. In this case, an alternative approach would be to specify a tolerable level for either the false alarm or miss probability. This idea is the basis for the Neyman-Pearson test. The Neyman-Pearson test minimizes P_MISS subject to the false alarm probability constraint P_FA = α, where α is a constant that indicates our tolerance of false alarms. Because P_FA = P[A1|H0] and P_MISS = P[A0|H1] are conditional probabilities, the test does not require knowledge of the a priori probabilities P[H0] and P[H1]. We first describe the Neyman-Pearson test when the decision statistic is a continuous random vector X.
Theorem 11.4  Neyman-Pearson Binary Hypothesis Test
Based on the decision statistic X, a continuous random vector, the decision rule that minimizes P_MISS, subject to the constraint P_FA = α, is

x ∈ A0 if L(x) = f_X|H0(x)/f_X|H1(x) ≥ γ;   x ∈ A1 otherwise,

where γ is chosen so that ∫_{L(x)<γ} f_X|H0(x) dx = α.
Proof  Using the Lagrange multiplier method, we define the Lagrange multiplier λ and the function

G = P_MISS + λ(P_FA − α)
  = ∫_{A0} f_X|H1(x) dx + λ(1 − ∫_{A0} f_X|H0(x) dx − α)
  = ∫_{A0} (f_X|H1(x) − λ f_X|H0(x)) dx + λ(1 − α).  (11.25)

For a given λ and α, we see that G is minimized if A0 includes all x satisfying

f_X|H1(x) − λ f_X|H0(x) ≤ 0.  (11.26)

Note that λ is found from the constraint P_FA = α. Moreover, we observe that Equation (11.25) implies λ > 0; otherwise, f_X|H1(x) − λ f_X|H0(x) > 0 for all x and A0 = ∅, the empty set, would minimize G. In this case, P_FA = 1, which would violate the constraint that P_FA = α. Since λ > 0, we can rewrite the inequality (11.26) as L(x) ≥ 1/λ = γ.
In the radar system of Example 11.4, the decision statistic was a random variable X and the receiver operating curves (ROCs) of Figure 11.2 were generated by adjusting a threshold x0 that specified the sets A0 = {X ≤ x0} and A1 = {X > x0}. Example 11.4 did not question whether this rule finds the best ROC, that is, the
best trade-off between P_MISS and P_FA. The Neyman-Pearson test finds the best ROC. For each specified value of P_FA = α, the Neyman-Pearson test identifies the decision rule that minimizes P_MISS. In the Neyman-Pearson test, an increase in γ decreases P_MISS but increases P_FA. When the decision statistic X is a continuous random vector, we can choose γ so that the false alarm probability is exactly α. This may not be possible when X is discrete. In the discrete case, we have the following version of the Neyman-Pearson test.
Theorem 11.5  Discrete Neyman-Pearson Test
Based on the decision statistic X, a discrete random vector, the decision rule that minimizes P_MISS, subject to the constraint P_FA ≤ α, is

x ∈ A0 if L(x) = P_X|H0(x)/P_X|H1(x) ≥ γ;   x ∈ A1 otherwise,

where γ is the largest possible value such that Σ_{L(x)<γ} P_X|H0(x) ≤ α.
Example 11.10
Continuing the disk drive factory test of Example 11.8, design a Neyman-Pearson test such that the false alarm probability satisfies P_FA ≤ α = 0.01. Calculate the resulting miss and false alarm probabilities.

The Neyman-Pearson test is

n ∈ A0 if L(n) = P_N|H0(n)/P_N|H1(n) ≥ γ;   n ∈ A1 otherwise.  (11.27)

We see from Equation (11.14) that this is the same as the MAP test with P[H1]/P[H0] replaced by γ. Thus, just like the MAP test, the Neyman-Pearson test must be a threshold test of the form

n ∈ A0 if n ≥ n*;   n ∈ A1 otherwise.  (11.28)

Some algebra would allow us to find the threshold n* in terms of the parameter γ. However, this is unnecessary. It is simpler to choose n* directly so that the test meets the false alarm probability constraint

P_FA = P[N ≤ n* − 1 | H0] = F_N|H0(n* − 1) = 1 − (1 − q0)^(n*−1) ≤ α.  (11.29)

This implies

n* ≤ 1 + ln(1 − α)/ln(1 − q0) = 1 + ln(0.99)/ln(0.9999) = 101.49.  (11.30)

Thus, we can choose n* = 101 and still meet the false alarm probability constraint. The error probabilities are:

P_FA = P[N ≤ 100 | H0] = 1 − (1 − 10^-4)^100 = 0.00995,  (11.31)
P_MISS = P[N ≥ 101 | H1] = (1 − 10^-1)^100 = 2.66 · 10^-5.  (11.32)
We see that tolerating a one percent false alarm probability effectively reduces the probability of a miss to 0 (on the order of one miss per 100 years) but raises the expected cost to

E[C_NP] = (0.9)(0.01)(50,000) + (0.1)(2.66 · 10^-5)(200,000) = $450.53.

Although the Neyman-Pearson test minimizes neither the overall probability of a test error nor the expected cost E[C], it may seem preferable to both the MAP test and the minimum cost test because customers will judge the quality of the disk drives and the reputation of the factory based on the number of defective drives that are shipped. Compared to the other tests, the Neyman-Pearson test results in a much lower miss probability and far fewer defective drives being shipped. However, it seems far too conservative, performing 101 tests before deciding that the factory is functioning correctly.
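The Neyman-Pearson threshold of Example 11.10 follows directly from the constraint (11.29). A short sketch of the computation, under the same parameter values (variable names ours):

q0=1e-4; q1=1e-1; alpha=0.01;
nstar=floor(1+log(1-alpha)/log(1-q0));   % Equation (11.30) rounded down, n* = 101
PFA=1-(1-q0)^(nstar-1);                  % 0.00995, meets PFA <= alpha
PMISS=(1-q1)^(nstar-1);                  % 2.66e-5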
M aximum Likelihood Test
Similar to the Neyman-Pearson test, the maximum likelihood (ML) test is another method that avoids the need for a priori probabilities. Under the ML approach, for each outcome s we decide the hypothesis Hi for which P[s|Hi] is largest. The idea behind choosing a hypothesis to maximize the probability of the observation is to avoid making assumptions about costs and a priori probabilities P[Hi]. The resulting decision rule, called the maximum likelihood (ML) rule, can be written mathematically as:

Definition 11.1  Maximum Likelihood Decision Rule
For a binary hypothesis test based on the experimental outcome s ∈ S, the maximum likelihood (ML) decision rule is

s ∈ A0 if P[s|H0] ≥ P[s|H1];   s ∈ A1 otherwise.
Comparing Theorem 11.1 and Definition 11.1, we see that in the absence of information about the a priori probabilities P[Hi], we have adopted a maximum likelihood decision rule that is the same as the MAP rule under the assumption that hypotheses H0 and H1 occur with equal probability. In essence, in the absence of a priori information, the ML rule assumes that all hypotheses are equally likely. By comparing the likelihood ratio to a threshold equal to 1, the ML hypothesis test is neutral about whether H0 has a higher probability than H1 or vice versa. When the decision statistic of the experiment is a random vector X, we can express the ML rule in terms of conditional PMFs or PDFs, just as we did for the MAP rule.
Theorem 11.6
If an experiment produces a random vector X, the ML decision rule states

Discrete:    x ∈ A0 if P_X|H0(x)/P_X|H1(x) ≥ 1;   x ∈ A1 otherwise;
Continuous:  x ∈ A0 if f_X|H0(x)/f_X|H1(x) ≥ 1;   x ∈ A1 otherwise.
Comparing Theorem 11.6 to Theorem 11.4, when X is continuous, or Theorem 11.5, when X is discrete, we see that the maximum likelihood test is the same as the Neyman-Pearson test with parameter γ = 1. This guarantees that the maximum likelihood test is optimal in the limited sense that no other test can reduce P_MISS for the same P_FA. In practice, we use the ML hypothesis test in many applications. It is almost as effective as the MAP hypothesis test when the experiment that produces outcome s is reliable in the sense that P_ERR for the ML test is low. To see why this is true, examine the decision rule in Example 11.6. When the signal-to-noise ratio 2v/σ is high, the right side of Equation (11.11) is close to 0 unless one of the a priori probabilities p or 1 − p is close to zero (in which case the logarithm on the right side is a low negative number or a high positive number, indicating strong prior knowledge that the transmitted bit is 0 or 1). When the right side is nearly 0, usually the case in binary communication, the evidence produced by the received signal has much more influence on the decision than the a priori information and the result of the MAP hypothesis test is close to the result of the ML hypothesis test.

Example 11.11
Continuing the disk drive test of Example 11.8, design the maximum likelihood test for the factory status based on the decision statistic N, the number of drives tested up to and including the first failure.

The ML hypothesis test corresponds to the MAP test with P[H0] = P[H1] = 0.5. In this case, Equation (11.15) implies n* = 66.62 or A0 = {n ≥ 67}. The conditional error probabilities and the cost of the ML decision rule are

P_FA = P[N ≤ 66 | H0] = 1 − (1 − 10^-4)^66 = 0.0066,
P_MISS = P[N ≥ 67 | H1] = (1 − 10^-1)^66 = 9.55 · 10^-4,
E[C_ML] = (0.9)(0.0066)(50,000) + (0.1)(9.55 · 10^-4)(200,000) = $316.10.
For the ML test, P_ERR = 0.0060. Comparing the MAP rule with the ML rule, we see that the prior information used in the MAP rule makes it more difficult to reject the null hypothesis. We need only 46 good drives in the MAP test to accept H0, while in the ML test, the first 66 drives have to pass. The ML design, which does not take into account the fact that the failure rate is usually low, is more susceptible to false alarms than the MAP test. Even though the error probability is higher for the ML test, the cost is lower because a costly miss occurs very infrequently (only once every four months). The cost of the ML test is only $11.10 more than the minimum cost. This is because the a priori probabilities suggest avoiding false alarms because the factory functions correctly, while the costs suggest avoiding misses, because each one is very expensive. Because these two prior considerations balance each other, the ML test, which ignores both of them, is very similar to the minimum cost test.

Test  Objective                                                    #tests  P_FA        P_MISS      Cost
MAP   Minimize probability of incorrect decision                   45      4.5 x 10^-3  8.7 x 10^-3  $376
MC    Minimize expected cost                                        58      5.8 x 10^-3  2.2 x 10^-3  $305
ML    Maximize likelihood; ignore costs and a priori probabilities  67      6.6 x 10^-3  9.6 x 10^-4  $316
NP    Minimize P_MISS for given P_FA                                101     1.0 x 10^-2  2.7 x 10^-5  $451

Table 11.1  Comparison of the maximum a posteriori probability (MAP), minimum cost (MC), maximum likelihood (ML), and Neyman-Pearson (NP) tests at the disk drive factory. Tests are ordered by #tests, the maximum number of tests required by each method.
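The trade-off summarized in Table 11.1, and the receiver operating curve discussed below, can be traced by sweeping the threshold n* over a range of values; every threshold test A0 = {n ≥ n*} produces one (P_FA, P_MISS) point. The sweep below is our own illustration, not code from the text.

q0=1e-4; q1=1e-1;
nstar=1:200;                      % candidate thresholds A0 = {n >= nstar}
PFA=1-(1-q0).^(nstar-1);          % false alarm probability for each threshold
PMISS=(1-q1).^(nstar-1);          % miss probability for each threshold
loglog(PFA,PMISS);
xlabel('P_{FA}'); ylabel('P_{MISS}');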
Table 11.1 compares the four binary hypothesis tests (MAP, MC, ML, and NP) for the disk drive example. In addition, the receiver operating curve associated with the decision statistic N, the number of tests up to and including the first failure, shows the performance trade-off between these tests. [Figure: receiver operating curve, P_MISS versus P_FA, for the disk drive tests.] Even though it uses less prior information than the other tests, the ML test might be a good choice because the cost of testing is nearly minimum and the miss probability is very low. The consequence of a false alarm is likely to be an examination of the manufacturing process to find out if something is wrong. A miss, on the other hand (deciding the factory is functioning properly when 10% of the drives are defective), could be harmful to sales in the long run.

Quiz 11.2
In an optical communications system, the photodetector output is a Poisson random variable K, either with an expected value of 10,000 photons (hypothesis H0) or with an expected value of 1,000,000 photons (hypothesis H1). Given that both hypotheses are equally likely, design a MAP hypothesis test using observed values of random variable K.
11.3
Multiple Hypothesis Test
A multiple hypothesis test is a generalization of a binary hypothesis test from 2 to M hypotheses. As in the binary test, observing an outcome in Ai corresponds to accepting the hypothesis Hi. The accuracy of a multiple hypothesis test is embodied in a matrix of conditional probabilities of deciding Hi when Hj is the correct hypothesis. A maximum a posteriori (MAP) test takes into account a priori probabilities and observations to maximize the probability of a correct decision. A maximum likelihood (ML) test uses only observations. The two tests coincide when all hypotheses are equally likely a priori.

There are many applications in which an experiment can conform to more than two known probability models, all with the same sample space S. A multiple hypothesis test is a generalization of a binary hypothesis test. There are M hypothetical probability models: H0, H1, ..., H_{M−1}. We perform an experiment, and based on the outcome, we come to the decision that a certain Hm is the true probability model. The design of the test consists of dividing S into a partition A0, A1, ..., A_{M−1}, such that the decision is accept Hi if s ∈ Ai. The accuracy measure of the experiment consists of M² conditional probabilities, P[Ai|Hj], i, j = 0, 1, 2, ..., M−1. The M probabilities P[Ai|Hi], i = 0, 1, ..., M−1 are probabilities of correct decisions.

Example 11.12
A computer modem is capable of transmitting 16 different signals. Each signal represents a sequence of four bits in the digital bit stream at the input to the modem. The modem receiver examines the received signal and produces four bits in the bit stream at the output of the modem. The design of the modem considers the task of the receiver to be a test of 16 hypotheses H0, H1, ..., H15, where H0 represents 0000, H1 represents 0001, ..., and H15 represents 1111. The sample space of the experiment is an ensemble of possible received signals. The test design places each outcome s in a set Ai such that the event s ∈ Ai leads to the output of the four-bit sequence corresponding to Hi.
For a multiple hypothesis test, the MAP hypothesis test and the ML hypothesis test are generalizations of the tests in Theorem 11.1 and Definition 11.1. Minimizing the probability of error corresponds to maximizing the probability of a correct decision,

P_CORRECT = Σ_{i=0}^{M−1} P[Ai|Hi] P[Hi].  (11.33)
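Given the matrix of conditional probabilities P[Ai|Hj] and the a priori probabilities P[Hj], Equation (11.33) is a single weighted sum of the diagonal entries. A minimal sketch with hypothetical numbers of our own choosing:

PAH=[0.95 0.03 0.02; 0.03 0.90 0.05; 0.02 0.07 0.93];  % PAH(i+1,j+1)=P[Ai|Hj]; each column sums to 1
PH=[0.5; 0.3; 0.2];                                    % a priori probabilities P[Hj]
PCORRECT=diag(PAH)'*PH;                                % Equation (11.33)
PERR=1-PCORRECT;                                       % probability of an incorrect decision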
Theorem 11.7  MAP Multiple Hypothesis Test
Given a multiple hypothesis testing experiment with outcome s, the following rule leads to the highest possible value of P_CORRECT:

s ∈ Am if P[Hm|s] ≥ P[Hj|s] for all j = 0, 1, 2, ..., M − 1.
As in binary hypothesis testing, we can apply Bayes' theorem to derive a decision rule based on the probability models (likelihood functions) corresponding to the hypotheses and the a priori probabilities of the hypotheses. Therefore, corresponding to Theorem 11.2, we have the following generalization of the MAP binary hypothesis test.
Theorem 11.8
For an experiment that produces a random variable X, the MAP multiple hypothesis test is

Discrete:    x ∈ Am if P[Hm] P_X|Hm(x) ≥ P[Hj] P_X|Hj(x) for all j;
Continuous:  x ∈ Am if P[Hm] f_X|Hm(x) ≥ P[Hj] f_X|Hj(x) for all j.
If information about the a priori probabilities of the hypotheses is not available, a maximum likelihood hypothesis test is appropriate.

Definition 11.2  ML Multiple Hypothesis Test
A maximum likelihood test of multiple hypotheses has the decision rule

s ∈ Am if P[s|Hm] ≥ P[s|Hj] for all j.

The ML hypothesis test corresponds to the MAP hypothesis test when all hypotheses Hi have equal probability.
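In computation, the MAP rule of Theorem 11.8 (and the ML rule with equal priors) is simply an arg max over the hypotheses of the weighted likelihoods. Here is a minimal sketch for a scalar observation; the Gaussian likelihoods and the prior values are illustrative assumptions, not taken from the text.

g=@(x,mu) exp(-(x-mu).^2/2)/sqrt(2*pi);   % Gaussian (mu,1) likelihood (assumed model)
priors=[0.9 0.1];                          % P[H0], P[H1] (assumed values)
x=0.3;                                     % observed value
w=[priors(1)*g(x,-1), priors(2)*g(x,1)];   % P[Hi] f_{X|Hi}(x) for i = 0, 1
[~,imax]=max(w);
decision=imax-1                            % accept hypothesis H_decision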
Example 11.13
In a quaternary phase shift keying (QPSK) communications system, the transmitter sends one of four equally likely symbols {s0, s1, s2, s3}. Let Hi denote the hypothesis that the transmitted signal was si. When si is transmitted, a QPSK receiver produces the vector X = [X1 X2]' such that

X1 = √E cos θi + N1,   X2 = √E sin θi + N2,  (11.34)

where N1 and N2 are iid Gaussian (0, σ) random variables that characterize the receiver noise and E is the average energy per symbol. Based on the receiver output X, the receiver must decide which symbol was transmitted. Design a hypothesis test that maximizes the probability of correctly deciding which symbol was sent.

Since the four hypotheses are equally likely, both the MAP and ML tests maximize the probability of a correct decision. To derive the ML hypothesis test, we need to calculate the conditional joint PDFs f_X|Hi(x). Given Hi, N1 and N2 are independent and thus X1 and X2 are independent. That is, using θi = iπ/2 + π/4, we can write

f_X|Hi(x) = f_X1|Hi(x1) f_X2|Hi(x2)
          = (1/(2πσ²)) e^(−(x1 − √E cos θi)²/2σ²) e^(−(x2 − √E sin θi)²/2σ²)
          = (1/(2πσ²)) e^(−[(x1 − √E cos θi)² + (x2 − √E sin θi)²]/2σ²).  (11.35)

We must assign each possible outcome x to an acceptance set Ai. From Definition 11.2, the acceptance sets Ai for the ML multiple hypothesis test must satisfy

f_X|Hi(x) ≥ f_X|Hj(x) for all j.  (11.36)

Equivalently, the ML acceptance sets are given by the rule that x ∈ Ai if, for all j,

(x1 − √E cos θi)² + (x2 − √E sin θi)² ≤ (x1 − √E cos θj)² + (x2 − √E sin θj)².  (11.37)

Defining the signal vectors si = [√E cos θi  √E sin θi]', we can write the ML rule as

x ∈ Ai if ||x − si||² ≤ ||x − sj||² for all j,  (11.38)

where ||u||² = u1² + u2² denotes the square of the Euclidean length of the two-dimensional vector u. In Equation (11.38), the acceptance set Ai is the set of all vectors x that are closest to the vector si. These acceptance sets {A0, A1, A2, A3} are the four quadrants of the (x1, x2) plane. In communications textbooks, the space of vectors x is called the signal space, the set of vectors {s0, ..., s3} is called the signal constellation, and the acceptance sets Ai are called decision regions.
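The minimum-distance rule of Equation (11.38) is easy to simulate. The sketch below is ours (the trial count, energy, and noise level are arbitrary choices); it transmits random QPSK symbols, applies the nearest-signal-vector rule, and reports the relative frequency of symbol errors.

m=10000; E=1; sigma=0.8;                 % trials, symbol energy, noise std deviation
theta=(0:3)*pi/2+pi/4;                   % theta_i = i*pi/2 + pi/4
S=sqrt(E)*[cos(theta); sin(theta)];      % 2x4 matrix of signal vectors s_0,...,s_3
tx=randi(4,1,m);                         % indices of transmitted symbols
X=S(:,tx)+sigma*randn(2,m);              % received vectors X = s_i + N
d=zeros(4,m);
for j=1:4
  d(j,:)=sum((X-S(:,j)*ones(1,m)).^2,1); % squared distance ||x - s_j||^2
end
[~,rx]=min(d,[],1);                      % nearest-vector (ML) decisions
Pe=mean(rx~=tx)                          % relative frequency of symbol errors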
Quiz 11.3
For the QPSK communications system of Example 11.13, what is the probability that the receiver makes an error and decodes the wrong symbol?
11.4  MATLAB
MATLAB programs generate sample values of known probability models in order to compute sample values of derived random variables that appear in hypothesis tests. The programs use the derived sample values in simulations and calculate relative frequencies of events such as misses and false alarms.

In the examples of this chapter, we have chosen experiments with simple probability models in order to highlight the concepts and characteristic properties of hypothesis tests. MATLAB greatly extends our ability to design and evaluate hypothesis tests, especially in practical problems where exact analysis of the probability model becomes too complex. For example, MATLAB can easily perform probability of error calculations and graph receiver operating curves. In addition, there are many cases in which analysis can identify the acceptance sets of a hypothesis test but calculation of the error probabilities is overly complex. In this case, MATLAB can simulate repeated trials of the hypothesis test. The following example presents a situation frequently encountered by communications engineers. Details of a practical system create probability models that are hard to analyze mathematically. Instead, engineers use MATLAB and other software tools to simulate operation of the systems of interest. Simulation data provides estimates of system performance for each of several design alternatives. This example is similar to Example 11.6, with the added complication that an amplifier in the receiver produces a fraction of the square of the signal plus noise. In this example, there is a well-known probability model for the noise N, but the models for the derived random variables −v + N + d(−v + N)² and v + N + d(v + N)² are difficult to derive. To study this test, we write a MATLAB program that generates m sample values of N. For each sample of N, the program calculates the two functions of N, performs a binary hypothesis test, and determines whether the test results in a miss or false alarm. It reports the relative frequencies of misses and false alarms as estimates of P_MISS and P_FA.
Example 11.14
A digital communications system transmits either a bit B = 0 or B = 1 with probability 1/2. The internal circuitry of the receiver results in a "squared distortion" such that the received signal (measured in volts) is either

X = −v + N + d(−v + N)²  if B = 0,
X = v + N + d(v + N)²   if B = 1,  (11.39)

where N, the noise, is Gaussian (0, 1). For each bit transmitted, the receiver produces an output B̂ = 0 if X < T and an output B̂ = 1 otherwise. Simulate the transmission of 20,000 bits through this system with v = 1.5 volts, d = 0.5, and the following values of the decision threshold: T = −0.5, −0.2, 0, 0.2, 0.5 volts. Which choice of T produces the lowest probability of error? Can you find a value of T that does a better job?
>> T
T =
   -0.5000   -0.2000         0    0.2000    0.5000
>> Pe=sqdistor(1.5,0.5,10000,T)
Pe =
    0.5000    0.2733    0.2265    0.1978    0.1762

Figure 11.4  Average error rate for the squared distortion communications system of Example 11.14.

Since each bit is transmitted and received independently of the others, the program sqdistor transmits m = 10,000 zeroes to estimate P[B̂ = 1 | B = 0], the probability of 1 received given 0 transmitted, for each of the thresholds. It then transmits m = 10,000 ones to estimate P[B̂ = 0 | B = 1]. The average probability of error is

P_ERR = 0.5 P[B̂ = 1 | B = 0] + 0.5 P[B̂ = 0 | B = 1].  (11.40)

function y=sqdistor(v,d,m,T)
%P(error) for m bits tested
%transmit +-v, add N & d(v+N)^2
%receive 1 if x>T, otherwise 0
x=(v+randn(m,1));
[XX,TT]=ndgrid(x,T(:));
P01=sum((XX+d*(XX.^2)<TT),1)/m;
x=-v+randn(m,1);
[XX,TT]=ndgrid(x,T(:));
P10=sum((XX+d*(XX.^2)>TT),1)/m;
y=0.5*(P01+P10);

By defining the grid matrices XX and TT, we can test each candidate value of T for the same set of noise variables. We observe the output in Figure 11.4. Because of the bias induced by the squared distortion term, T = 0.5 is best among the candidate values of T. However, the data suggests that a value of T greater than 0.5 might work better. Problem 11.4.3 examines this possibility.
The problems for this section include a collection of hypothesis testing problems that can be solved using MATLAB but are too difficult to solve by hand. The solutions are built on the MATLAB methods developed in prior chapters; however, the necessary MATLAB calculations and simulations are typically problem specific.

Quiz 11.4
For the communications system of Example 11.14 with squared distortion, we can define the miss and false alarm probabilities as
P_MISS = P01 = P[B̂ = 0 | B = 1],   P_FA = P10 = P[B̂ = 1 | B = 0].  (11.41)

Modify the program sqdistor in Example 11.14 to produce receiver operating curves for the parameters v = 3 volts and d = 0.1, 0.2, and 0.3. Hint: The points on the ROC correspond to different values of the threshold T volts.
Further Reading: [Kay98] provides detailed, readable coverage of hypothesis testing. [Hay01] presents detection of digital communications signals as a hypothesis test. A collection of challenging homework problems for Sections 11.3 and 11.4 are based on bit detection for code division multiple access (CDMA) communications systems. The authoritative treatment of this subject can be found in [Ver98].
Problems
Difficulty:  Easy   Moderate   Difficult   Experts Only

11.1.1  Let L equal the number of flips of a coin up to and including the first flip of heads. Devise a significance test for L at level α = 0.05 to test the hypothesis H that the coin is fair. What are the limitations of the test?

11.1.2  A course has two recitation sections that meet at different times. On the midterm, the average for section 1 is 5 points higher than the average for section 2. A logical conclusion is that the TA for section 1 is better than the TA for section 2. Using words rather than math, give reasons why this might be the wrong conclusion.

11.1.3  Under the null hypothesis H0 that traffic is typical, the number of call attempts in a 1-second interval (during rush hour) at a mobile telephone switch is a Poisson random variable N with E[N] = 2.5. Over a T-second period, the measured call rate is M = (N1 + ··· + NT)/T, where N1, ..., NT are iid Poisson random variables identical to N. However, whenever there is unusually heavy traffic (resulting from an accident or bad weather or some other event), the measured call rate M is higher than usual. Based on the observation M, design a significance test to reject the null hypothesis H0 that traffic is typical at a significance level α = 0.05. Justify your choice of the rejection region R. Hint: You may use a Gaussian (central limit theorem) approximation for calculating probabilities with respect to M. How does your test depend on the observation period T? Explain your answer.

11.1.4  A cellular telephone company is upgrading its network to a new (N) transmission system one area at a time, but they do not announce where the upgrades take place. You have the task of determining whether certain areas have been upgraded. You have decided to use an application in your smartphone to measure the ping time (how long it takes to receive a response to a certain message) in each area. The new system is faster than the old (O) one. It has on average shorter ping times. The probability model for the ping time in milliseconds of the new system is the exponential (60) random variable. Perform a ping test and reject the null hypothesis that the area has the new system if the ping time is greater than t0 ms.
(a) Write a formula for α, the significance of the test as a function of t0.
(b) What is the value of t0 that produces a significance level α = 0.05?

11.1.5  When a pacemaker factory is operating normally (the null hypothesis H0), a randomly selected pacemaker fails a "drop test" with probability q0 = 10^-4. Each day, an inspector randomly tests pacemakers. Design a significance test for the null hypothesis with significance level α = 0.01. Note that drop testing of pacemakers is expensive because the pacemakers that are tested must be discarded. Thus the significance test should try to minimize the number of pacemakers tested.

11.1.6  Let K be the number of heads in n = 100 flips of a coin. Devise significance tests for the hypothesis H that the coin is fair such that
(a) The significance level α = 0.05 and the rejection set R has the form {|K − E[K]| ≥ c}.
(b) The significance level α = 0.01 and the rejection set R has the form {K > c'}.

11.1.7  When a chip fabrication facility is operating normally, the lifetime of a microchip operated at temperature T, measured in degrees Celsius, is given by an exponential (λ) random variable X with expected value E[X] = 1/λ = (200/T)² years. Occasionally, the chip fabrication plant has contamination problems and the chips tend to fail much more rapidly. To test for contamination problems, each day m chips are subjected to a one-day test at T = 100°C. Based on the number N of chips that fail in one day, design a significance test for the null hypothesis H0 that the plant is operating normally.
(a) Suppose the rejection set of the test is R = {N > 0}. Find the significance level of the test as a function of m, the number of chips tested.
(b) How many chips must be tested so that the significance level is α = 0.01?
(c) If we raise the temperature of the test, does the number of chips we need to test increase or decrease?

11.1.8  A group of n people form a football pool. The rules of this pool are simple: 16 football games are played each week. Each contestant must pick the winner of each game against a point spread. The contestant who picks the most games correctly over a 16-week season wins the pool. The spread is a point difference d such that picking the favored team is a winning pick only if that team wins by more than d points; otherwise, the pick of the opposing team is a winner. Each pool contestant can study the teams' past histories, performance trends, official injury reports, coaches' weekly press conferences, chat room gossip and any other wisdom that might help in placing a winning bet. After m weeks, contestant i will have picked wi games correctly out of 16m games. For example, suppose that after m = 14 weeks, 16(14) = 224 games have been played and that the leader (call him Narayan) has picked 119 games correctly. Does the pool leader Narayan have skills or is he just lucky?
(a) To address this question, design a significance test to determine whether the pool leader actually has any skill at picking games. Let H0 denote the null hypothesis that all players, including the leader, pick winners in each game with probability p = 1/2, independent of the outcome of any other game. Based on the observation of W, the number of winning picks by the pool leader after m weeks of the season, design a one-sided significance test for hypothesis H0 at significance level α = 0.05. You may use a central limit theorem approximation for binomial PMFs as needed.
(b) Given that Narayan is the leader with 119 winning picks in m = 14 weeks in a pool with n = 38 contestants, do you reject or accept hypothesis H0?
(c) How does the significance test depend on picks being made against the point spread?

11.1.9  A class has 2n (a large number) students. The students are separated into two groups A and B, each with n students. Group A students take exam A and earn iid scores X1, ..., Xn. Group B students take exam B, earning iid scores Y1, ..., Yn. The two exams are similar but different; however, the exams were designed so that a student's score X on exam A or Y on exam B have the same expected value and variance σ² = 100. For each exam, we form the sample mean statistic

M_A = (X1 + ··· + Xn)/n,   M_B = (Y1 + ··· + Yn)/n.

Based on the statistic D = M_A − M_B, use the central limit theorem to design a significance test at significance level α = 0.05 for the hypothesis H0 that a student's score
on t he two exams has t he same expected value JJ, and variance a 2 = 100. \ i\1hat is t he rejection region if n, = 100? Make sur e to specify any addit ional ass umptions t hat you need to make; however, t ry to make as few addit ional assumpt ions as possible.
11.2.1 I n a random hour , t he number of call attempts N at a telephone switch has a Poisson distrib ut ion 'vit h an expected value of eit her ao (hy pothesis Ho) or a i (hypothesis Hi ) . For a p riori p r obabilities P[Hi], find t he lVIAP a nd ML h yp othesis testing rules given t he observation of N . 11.2.2 The p ing t ime, in m illiseconds of a ne\v t r ansm ission system , d escr ibed in Problem 11.1.4 is t he exponen t ia l (60) rand om variable N . T he ping t ime of a n old syste1n is a n exponen t ia l ra ndom variable 0 wit h expected value µ,o > 60 m s. The null h ypothesis of a binary hy pothesis test is H o: The t ra ns mission syste1n is t he new system. The alternative hypothesis is H i : The t rans mission system is t he old system. The probabili ty of a new system is P[N] = 0.8. T he probability of a n old system is P [O] = 0.2. A binary hypothesis test measures T milliseconds, t h e result of one ping test . T he d ecision is H o if T < to ms. O t her,vise, t he decision is H 1 . (a) Write a formula for t he false ala rm pr obability as a function of t o and µ,o . (b) Write a fo rmula for t he miss probability as a function of t 0 and µ,o . (c) Calculate t he maximum likelihood d ecision t ime t 0 = t j\1 L for µ,o = 120 m s and µ,o = 200 ms. (d ) D o you t hink t hat trvrAP, t he maximum a posterior i d ecis ion t ime, is gr eater t h an or less t han t rv1L? Exp la in your answer. ( e) Calculate t he m aximum a posteriori probability decision t ime to= tMAP for µ,o = 120 ms a nd µ,o = 200 ms. (f) Dra'v t he r eceiver operating curves for µ,o = 120 ms a nd µo = 200 ms.
11.2.3 A n a utom at ic d oor bell system r ings a bell 'vhenever it detects someone at
t he d oor. The system uses a photodetector such t hat if a person is pr esent, h ypothesis H 1 , t he photodetector ou tput N is a P oisson random variable wit h an expected value of 1300 photons. Ot herwise, if no one is t here, hypothesis Ho, t he p hotod etector out put is a Poisson random variable wit h an expected value of 1000. Devise a Neym anP earso n test for t he presence of someone ou ts ide t he d oo r such t hat t he false alarm probability is a < 10 - 6 . \i\fhat is minimum value of P~111ss?
11.2.4 I n t he rad ar system of Example 11.4, P[H 1 ] = 0 .01 . In t he case of a false ala rm , t he system issues an u nnecessar y a ler t at t he cost of C10 = 1 unit . T h e cost of a miss is C 0 1 = 10 4 units because t he tar get could cause a lot of damage. \tV hen t he target is present, t he voltage is X = 4 + N, a G aussian (4, 1) random variable. \i\f hen t here is no target presen t, t he voltage is X = N, t he Gaussia n (0 , 1) r and om variable. In a binary hypothesis test, t he acceptance sets are A o= {X < xo} and Ai= {X > xo }. (a) l<""'or t he MAI> hypothesis test, fi nd t he d ecision t hreshold xo = XMAP, t he error p robabilit ies PFA and PlVnss, a nd t he aver age cost E[ CJ. (b) Compare t he MAP test perform ance against t he minim um cost hypothesis test.
11.2.5 In t he r adar system of Example 11.4, show t hat t he RO C in F igure 11.2 is t he r esult of a Ney ma n- Pearso n test. T hat is, s ho'v t hat t he Neyman- Pearson test is a t hreshold test wit h accep tance set A o = { X < xo }. H ow is ::eo related to t he false alarm p robability a? 11.2.6 _A. system administr ator (and part t ime spy) at a classified r esearch facility wishes to use a gate,vay rou ter for covert communication of resear ch secrets to a n ou tside accomplice. The sysadmin covertly communicates a bit W for every n, transmitted packets . To s ign al W = 0, t he rou ter does nothing while n, regular packets ar e sen t ou t t hrough t he gate\vay as a
Poisson process of rate .\o packets/sec. To sign al W = 1 t h e sysadmin injects addition al fake out bound packets so t h at ri ou tbound packets ar e sen t as a I=>oisson process of rate 2.\ 0 . The secret communication bits ar e equip r ob a b le in t h at P [W = 1] = P[W = O] = 1/ 2. The sysadmin 's accomp lice (outside t h e gateway) inonitors t he ou tbound packet t ra nsmission process by observ ing t he vector X = [X 1, X 2, ... , Xn ] of p acket inter arrival times and guessing t he b it W every n, packets. (a) F in d t h e condit ion al PDFs fx 1v\l =o(x ) and fx 1W=1(x).
(b) W h at are t h e MAP a nd l\/IL hypothesis tests for t he accomplice to gu ess eit her hypothesis Ho t hat vV = 0 or hy poth esis H 1 t hat vV = 1? ( c) Let W d en ote t h e d ecision of t h e ML Use t he C he rnoff hy poth esis test. bo und to upper bound t h e error probability P [W = OIW = 1]. 11.2. 7 T he p ing t ime, in milliseconds, of a n e'v t r ansmission system , d escr ib ed in Prob lem 11.1.4 is t he expon en t ia l (60) ra nd om variab le 1'l. T he p ing t ime of a n old system is t he exponen t ial (120) random varia ble 0. The null hypot hesis of a b inary hypothesis test is H 0 : The t r ansm ission system is t h e new system. T he alternative hypothesis is H 1 : T he t r ansm ission system is t h e old system. T h e probability of a n e'v system is P[N] = 0 .8. T h e probability of a n old system P[O] = 0.2. A bina ry h ypot hesis test p erforms k p ing tests a nd calcu lates Mn(T), t h e sample m ean of t h e p ing t i1ne. The d ecision is H o if Mn(T) < t o ms . O t her,vise, t h e d ecision is H 1. (a) Use t he cent r al limit t h eorem to 'vrite a formula for t he false a la r1n prob ability as a fun ction of to a nd k .
(b) Use t h e cen tr al limit t h eorem to vvr ite a formula for t h e miss prob a bility as a function of to an d k . (c) Calcu late t h e maximum likelihood d ecision t ime, to = t 1v1L, for k = 9 p in g tests.
(d) Ca lcu late t he m aximum a posterior i prob ability d ecision t ime, to = t MAP for k = 9 ping tests.
(e) Draw t h e receiver op er ating curves for k = 9 p ing tests a nd k = 16 p ing tests. 11.2.8 In t his proble1n , 've pe rform t he old/ new d etection test of Problem 11.2.7, except no'v we monitor k p ing tests a nd observe 'vh ether each p ing lasts longer t han t o ms . T h e ra ndom var iable M is t he number of pings t h at last lon ger t h an to ins . The d ecision is Ho if M < m,o . O t her,vise, t h e d ecision is H 1 .
(a) \tVr ite a formula for t he false a la rm prob ability as a function of t o, 1no, a nd 'n .
(b) F in d t he m aximum likelihood d ecision number 1no = ffiML for to = 4.5 ms and k = 16 p in g tests.
(c) F ind t h e maxim um a p oster iori pro b ability d ecision number m,o = ffiMAP for t 0 = 4.5 m s a n d k = 16 p ing tests. (d) Dra'v t he receiver op er ating curves for t 0 = 90 ms a n d to = 60 ms . In b ot h cases let k = 16 p ing tests . 11.2.9 A bina ry communication system h as t ransmitted sign a l X, t he Bernoul li (1/ 2) r a ndom varia b le. A t t h e receiver, 've observe Y = V X + W, w h ere V is a "fad ing factor " a n d W is addit ive n oise . Note t hat V a nd Ware exp on en t ial (1) ra ndom varia bles an d t h at X , V, a nd W a re mut u ally in dep endent . G iven t he observation Y, we inust gu ess wh eth er X = 0 or X = 1 was t rans mitted. ·u se a bina ry hy p oth esis test to d etermine t h e rule t h at minimizes t he probability Pe of a d ecoding error. For t h e optimum d ecision rule, calculate Pe . 11.2.10 In a BPSK amplify-a n d-for,vard relay system , a source t r ansmits a ra ndom b it ' / E {-1, 1} every T seconds to a destin ation receiver v ia a set of n, relay t ransmitters. V = 1 a n d ' / = -1 a re equa lly likely . In t h is communication system , t he source t r a nsm its during t h e t im e p er iod (0 , T / 2)
[ PROBLEMS
are iid G a ussian (0, 1) noises, independen t of X.
such t hat relay i receives i
=
1, 2, ... , n, ,
\Vhere t he wi are iid G a ussian (0' 1) random variables represent ing relay i receiver noise. In t he t ime interval (7,/ 2, T), each relay node amplifies and forwards t he received source signal. The d estinat ion receiver obtains t he vector Y = [Y1 Y'.;1 J' such t hat i= l , 2, ... , ri, \Vher e t he zi a re a lso iid Gaussian (0, 1) r andom variables. In t he follo wing, assume t hat t he parameters ai and f3i ar e all nonnegative. Also , let H 0 d enote t he hypot hesis t hat V = -1 a nd H1 t he hy pot hesis
v=
l.
(a) Suppose you build a sub opt imal d etec1 t or based on t he s um Y = 1 Yi . If Y > 0 , t he r eceiver g uesses H 1; o therwise t he receiver guesses H 0 . \i\fh at is t he probability of error Pe for t his receiver ?
I::
(b) Based on t he observation Y , now suppose t he destinat ion receiver detector performs a l\/IAI> test fo:r hypot heses H 0 or H1. \l\f hat is t he Mi\.P d etector rule? Simplify your answer as much as poss ible. Hint: First find t he likelihood functions fY IHi(y ). (c) \i\fhat is t he probabilitJr of bit error P; of t he MAP d etector? (d ) Compare t he t \vo detectors when n = 4 and
= (1, 1), ( 0'.3' (33) = (1, 10) '
(a1, f31)
( 0'.2 ) (32)
= ( 10' 1) '
( 0'.4) (34)
= (10, 10) .
In genera l, w hat's ba d about t he subopt imal d etector?
11.2.11
393
In a BPSK communica t ion system , a source wishes t o communicate a r andom bit XE {-1 , 1} t o a r eceiver. Inputs X = 1 a nd X = -1 ar e equa lly likely. In t his system , t he source t r ansmits X multiple t imes. In t he it h t ra nsmission, t he receiver observes Yi = X + 'Wi, where t he W i
(a) After n transmissions of X, you observe Y = [Y1 ··· Yn]'. Find P[X = 1 | Y = y]. Express your answer in terms of the likelihood ratio

L(y) = f_Y|X(y | −1) / f_Y|X(y | 1).

(b) Suppose after n transmissions, the receiver observes Y = y and decides

X* = 1 if P[X = 1 | Y = y] > 1/2;   X* = −1 otherwise.

Find the probability of error Pe = P[X* ≠ X] in terms of the Φ(·) function. Hint:
11.2.12
Suppose in t he disk drive factory of Example 11.8, \Ve can observe K , t he number of fa iled d evices out of n, devices tested. As in t he example, let Hi denote t he hypot hesis t hat t he failure rate is qi . (a) _Assuming qo < q1, 'vhat is t he lVIL hypot hesis test based on an observat ion of J{ ? (b) \tVhat are t he condit ional probabilities of error P FA = P [-41IH o] and PM1ss = P[Ao IH1]? Calculate t hese probabili.es for ri = o c:: oo , qo = 10 - 4 , q1 = 10 - 2 . t,1 (c) Compare t his test t o t hat considered in Example 11.8. \l\fhich t est is more
reliable? W hich test is easier to implemen t?
11.2.13  Consider a binary hypothesis test in which there is a cost associated with each type of decision. In addition to the cost C'10 for a false alarm and C'01 for a miss, we also have the costs C'00 for correctly deciding hypothesis H0 and C'11 for correctly deciding hypothesis H1. Based on the observation of a continuous random vector X, design the hypothesis test that minimizes the total expected cost

E[C'] = P[A1|H0] P[H0] C'10 + P[A0|H0] P[H0] C'00 + P[A0|H1] P[H1] C'01 + P[A1|H1] P[H1] C'11.

Show that the decision rule that minimizes the total cost is the same as the decision rule of the minimum cost test in Theorem 11.3, with the costs C01 and C10 replaced by the differential costs C'01 − C'11 and C'10 − C'00.

11.3.1  In a ternary amplitude shift keying (ASK) communications system, there are three equally likely transmitted signals {s0, s1, s2}. These signals are distinguished by their amplitudes such that if signal si is transmitted, the receiver output will be X = a(i − 1) + N, where a is a positive constant and N is a Gaussian (0, σ_N) random variable. Based on the output X, the receiver must decode which symbol si was transmitted.
(a) What are the acceptance sets Ai for the hypotheses Hi that si was transmitted?
(b) What is P[De], the probability that the receiver decodes the wrong symbol?

11.3.2  A multilevel QPSK communications system transmits three bits every unit of time. For each possible sequence ijk of three bits, one of eight symbols, {s000, s001, ..., s111}, is transmitted. When signal s_ijk is transmitted, the receiver output is

X = s_ijk + N,

where N is a Gaussian (0, σ²I) random vector. The two-dimensional signal vectors s000, ..., s111 are
[Figure: the eight two-dimensional signal vectors s000 through s111 arranged in the plane.]
Let H ijk denote t he hypothesis t hat Sijk was t r ansmitted. The receiver ou t put X = [X 1 X 2J ' is used to d ecide t he accep tance sets {-4 000, . . . , Ai 11}. If a ll eight symbols are equa lly likely, sketch t he accep tance sets.
11.3.3  An M-ary quadrature amplitude modulation (QAM) communications system can be viewed as a generalization of the QPSK system described in Example 11.13. In the QAM system, one of M equally likely symbols s0, ..., s_{M−1} is transmitted every unit of time. When symbol si is transmitted, the receiver produces the two-dimensional vector output X = si + N, where N has iid Gaussian (0, σ²) components. Based on the output X, the receiver must decide which symbol was transmitted. Design a hypothesis test that maximizes the probability of correctly deciding what symbol was sent. Hint: Following Example 11.13, describe the acceptance set in terms of the vectors

si = [si1  si2]'.
11.3.4  Suppose a user of the multilevel QPSK system needs to decode only the third bit k of the message ijk. For k = 0, 1, let Hk denote the hypothesis that the third bit was k. What are the acceptance sets A0 and A1? What is P[B3], the probability that the third bit is in error?

11.3.5  The QPSK system of Example 11.13 can be generalized to an M-ary phase shift keying (MPSK) system with M > 4 equally likely signals. The signal vectors are {s0, ..., s_{M−1}}, where

si = [√E cos θi  √E sin θi]'

and θi = 2πi/M. When the ith message is sent, the received signal is X = si + N, where N is a Gaussian (0, σ²I) noise vector.
(a) Sketch the acceptance set Ai for the hypothesis Hi that si was transmitted.
(b) Find the largest value of d such that {x : ||x − si|| ≤ d} ⊂ Ai.
(c) Use d to find an upper bound for the probability of error.
11.3.6  A modem uses QAM (see Problem 11.3.3) to transmit one of 16 symbols, s0, ..., s15, every 1/600 seconds. When signal si is transmitted, the receiver output is

X = si + N.

The signal vectors s0, ..., s15 are
[Figure: the 16 two-dimensional signal vectors s0 through s15 of the QAM constellation.]
(a) Sketch the acceptance sets based on the receiver outputs X1, X2. Hint: Apply the solution to Problem 11.3.3.
(b) Let Hi be the event that symbol si was transmitted and let C be the event that the correct symbol is decoded. What is P[C|H1]?
(c) Argue that P[C] ≥ P[C|H1].

11.3.7  For the QPSK communications system of Example 11.13, identify the acceptance sets for the MAP hypothesis test when the symbols are not equally likely. Sketch the acceptance sets when σ = 0.8, E = 1, P[H0] = 1/2, and P[H1] = P[H2] = P[H3] = 1/6.

11.3.8  In a code division multiple access (CDMA) communications system, k users share a radio channel using a set of n-dimensional code vectors {S1, ..., Sk} to distinguish their signals. The dimensionality factor n is known as the processing gain. Each user i transmits independent data bits Xi such that the vector X = [X1 ··· Xk]' has iid components with P_Xi(1) = P_Xi(−1) = 1/2. The received signal is

Y = Σ_{i=1}^{k} Xi √pi Si + N,

where N is a Gaussian (0, σ²I) noise vector. From the observation Y, the receiver performs a multiple hypothesis test to decode the data bit vector X.
(a) Show that in terms of vectors,

Y = S P^{1/2} X + N,

where S is an n × k matrix with ith column Si and P^{1/2} = diag[√p1, ..., √pk] is a k × k diagonal matrix.
(b) Given Y = y, show that the MAP and ML detectors for X are the same and are given by

x*(y) = arg min_{x ∈ Bk} ||y − S P^{1/2} x||,
where Bk is the set of all k-dimensional vectors with ±1 elements.
(c) How many hypotheses does the ML detector need to evaluate?

11.3.9  For the CDMA communications system of Problem 11.3.8, a detection strategy known as decorrelation applies a transformation to Y to generate

Ỹ = (S'S)⁻¹S'Y = P^{1/2}X + Ñ,

where Ñ = (S'S)⁻¹S'N is still a Gaussian noise vector with expected value E[Ñ] = 0. Decorrelation separates the signals in that the ith component of Ỹ is

Ỹi = √pi Xi + Ñi,

which is the same as a single-user receiver output of the binary communication system of Example 11.6. For equally likely inputs Xi = 1 and Xi = −1, Example 11.6 showed that the optimal (minimum probability of bit error) decision rule based on the receiver output Ỹi is

X̂i = sgn(Ỹi).

Although this technique requires the code vectors S1, ..., Sk to be linearly independent, the number of hypotheses that must be tested is greatly reduced in comparison to the optimal ML detector introduced in Problem 11.3.8. In the case of linearly independent code vectors, is the decorrelator optimal? That is, does it achieve the same bit error rate (BER) as the optimal ML detector?

11.4.1  A wireless pressure sensor (buried in the ground) reports a discrete random variable X with range Sx = {1, 2, ..., 20} to signal the presence of an object. Given an observation X and a threshold x0, we decide that an object is present (hypothesis H1) if X > x0; otherwise we decide that no object is present (hypothesis H0). Under hypothesis Hi, X has a conditional PMF P_X|Hi(x) that is positive for x = 1, 2, ..., 20 and zero otherwise, where p0 = 0.99 and p1 = 0.9. Calculate and plot the false alarm and miss probabilities as a function of the detection threshold x0. Calculate the discrete receiver operating curve (ROC) specified by x0.

11.4.2  For the binary communications system of Example 11.7, graph the error probability P_ERR as a function of p, the probability that the transmitted signal is 0. For the signal-to-noise voltage ratio, consider v/σ ∈ {0.1, 1, 10}. What values of p minimize P_ERR? Why are those values not practical?

11.4.3  For the squared distortion communications system of Example 11.14 with v = 1.5 and d = 0.5, find the value of T that minimizes Pe.
11.4.4  A poisonous gas sensor reports continuous random variable X. In the presence of toxic gases, hypothesis H1,

f_X|H1(x) = (x/8) e^(−x²/16) for x > 0;  0 otherwise.

In the absence of dangerous gases, X has conditional PDF

f_X|H0(x) = (1/2) e^(−x/2) for x > 0;  0 otherwise.

Devise a hypothesis test that determines the presence of poisonous gases. Plot the false alarm and miss probabilities for the test as a function of the decision threshold. Lastly, plot the corresponding receiver operating curve.
11.4.5  Simulate the M-ary PSK system in Problem 11.3.5 for M = 8 and M = 16. Let P̂_ERR denote the relative frequency of symbol errors in the simulated transmission of 10^5 symbols. For each value of M, graph P̂_ERR as a function of the signal-to-noise power ratio (SNR) γ = E/σ². Consider 10 log10 γ, the SNR in dB, ranging from 0 to 30 dB.
11.4.6  In this problem, we evaluate the bit error rate (BER) performance of the CDMA communications system introduced in Problem 11.3.8. In our experiments, we will make the following additional assumptions.
• In practical systems, code vectors are generated pseudorandomly. We will assume the code vectors are random. For each transmitted data vector X, the code vector of user i will be S_i = (1/√n)[S_i1 S_i2 ... S_in]', where the components S_ij are iid random variables such that P_{S_ij}(1) = P_{S_ij}(−1) = 1/2. Note that the factor 1/√n is used so that each code vector S_i has length 1: ||S_i||² = S_i'S_i = 1.
• Each user transmits at 6 dB SNR. For convenience, assume p_i = p = 4 and σ² = 1.
(a) Use MATLAB to simulate a CDMA system with processing gain n = 16. For each experimental trial, generate a random set of code vectors {S_i}, data vector X, and noise vector N. Find the ML estimate x* and count the number of bit errors; i.e., the number of positions in which x̂_i ≠ x_i. Use the relative frequency of bit errors as an estimate of the probability of bit error. Consider k = 2, 4, 8, 16 users. For each value of k, perform enough trials so that bit errors are generated on 100 independent trials. Explain why your simulations take so long.
(b) For a simpler detector known as the matched filter, when Y = y, the detector decision for user i is x̂_i = sgn(S_i'y), where sgn(x) = 1 if x > 0, sgn(x) = −1 if x < 0, and otherwise sgn(x) = 0. Compare the bit error rate of the matched filter and the maximum likelihood detectors. Note that the matched filter is also called a single-user detector since it can detect the bits of user i without knowledge of the code vectors of the other users.
11.4.7  For the CDMA system in Problem 11.3.8, we wish to use MATLAB to evaluate the bit error rate (BER) performance of the decorrelator introduced in Problem 11.3.9. In particular, we want to estimate P_e, the probability that for a set of randomly chosen code vectors, a randomly chosen user's bit is decoded incorrectly at the receiver.
(a) For a k-user system with a fixed set of code vectors S_1, ..., S_k, let S denote the matrix with S_i as its ith column. Assuming that the matrix inverse (S'S)^(−1) exists, write an expression for P_{e,i}(S), the probability of error for the transmitted bit of user i, in terms of S and the Q(·) function. For the same fixed set of code vectors S, write an expression for P_e, the probability of error for the bit of a randomly chosen user.
(b) In the event that (S'S)^(−1) does not exist, we assume the decorrelator flips a coin to guess the transmitted bit of each user. What are P_{e,i} and P_e in this case?
(c) For a CDMA system with processing gain n = 32 and k users, each with SNR 6 dB, write a MATLAB program that averages over randomly chosen matrices S to estimate P_e for the decorrelator. Note that unlike the case of Problem 11.4.6, simulating the transmission of bits is not necessary. Graph your estimate P̂_e as a function of k.
11.4.8  Simulate the multilevel QAM system of Problem 11.3.4. Estimate the probability of symbol error and the probability of bit error as a function of the noise variance σ².

11.4.9  In Problem 11.4.5, we used simulation to estimate the probability of symbol error. For transmitting a binary bit stream over an M-PSK system, we set M = 2^N so that each transmitted symbol corresponds to N bits. For example, for M = 16, we map each four-bit input b3b2b1b0 to one of 16 symbols. A simple way to do this is binary index mapping: transmit s_i when b3b2b1b0 is the binary representation of i. For example, the bit input 1100 is mapped to the transmitted signal s_12. Symbol errors in the communication system cause bit errors. For example, if s_1 is sent but noise causes s_2 to be decoded, the input bit sequence b3b2b1b0 = 0001 is decoded as b̂3b̂2b̂1b̂0 = 0010, resulting in 2 correct bits and 2 bit errors. In this problem, we use MATLAB to investigate how the mapping of bits to symbols affects the probability of bit error. For our preliminary investigation, it will be sufficient to map the three bits b2b1b0 to the M = 8 PSK system of Problem 11.3.5.
(a) Find the acceptance sets {A_0, ..., A_7}.
(b) Simulate m trials of the transmission of symbol s_0. Estimate the probabilities {P_0j | j = 0, 1, ..., 7} that the receiver output is s_j when s_0 was sent. By symmetry, use the set {P_0j} to determine P_ij for all i and j.
(c) Let b(i) = [b_2(i) b_1(i) b_0(i)] denote the input bit sequence that is mapped to s_i. Let d_ij denote the number of bit positions in which b(i) and b(j) differ. For a given mapping, the bit error rate (BER) is
BER = (1/M) Σ_i Σ_j P_ij d_ij.
(d) Estimate the BER for the binary index mapping.
(e) The Gray code is perhaps the most commonly used mapping of the bit patterns
b: 000 001 010 011 100 101 110 111
to the symbols s_i. Does the Gray code reduce the BER compared to the binary index mapping?

11.4.10  Continuing Problem 11.4.9, in the mapping of the bit sequence b2b1b0 to the symbols s_i, we wish to determine the probability of error for each input bit b_i. Let q_i denote the probability that bit b_i is decoded in error. Determine q_0, q_1, and q_2 for both the binary index mapping and the Gray code mapping.
Estimation of a Random Variable
The techniques in Chapters 10 and 11 use the outcomes of experiments to make inferences about probability models. In this chapter we use observations to calculate an approximate value of a sample value of a random variable that has not been observed. The random variable of interest may be unavailable because it is impractical to measure (for example, the temperature of the sun), or because it is obscured by distortion (a signal corrupted by noise), or because it is not available soon enough. We refer to the estimation of future observations as prediction. A predictor uses random variables observed in early subexperiments to estimate a random variable produced by a later subexperiment. If X is the random variable to be estimated, we adopt the notation X̂ (also a random variable) for the estimate. In most of the chapter, we use the mean square error

e = E[(X − X̂)²]   (12.1)

as a measure of the quality of the estimate. Signal estimation is a big subject. To introduce it in one chapter, we confine our attention to the following problems:
• Blind estimation of a random variable
• Estimation of a random variable given an event
• Estimation of a random variable given one other random variable
• Linear estimation of a random variable given a random vector
• Linear estimation of a random vector given another random vector
12.1
Minimum Mean Square Error Estimation
The estimate of X that minimizes the mean square error is the expected value of X given available information. The optimum blind estimate is E[X]. It uses only the probability model of X. The optimum estimate given X ∈ A is E[X|A]. The optimum estimate given Y = y is E[X|Y = y].

An experiment produces a random variable X. However, we are unable to observe X directly. Instead, we observe an event or a random variable that provides partial information about the sample value of X. X can be either discrete or continuous. If X is a discrete random variable, it is possible to use hypothesis testing to estimate X. For each x_i ∈ S_X, we could define hypothesis H_i as the probability model P_X(x_i) = 1, P_X(x) = 0, x ≠ x_i. A hypothesis test would then lead us to choose the most probable x_i given our observations. Although this procedure maximizes the probability of determining the correct value of x_i, it does not take into account the consequences of incorrect results. It treats all errors in the same manner, regardless of whether they produce answers that are close to or far from the correct value of X. Section 12.3 describes estimation techniques that adopt this approach.

By contrast, the aim of the estimation procedures presented in this section is to find an estimate X̂ that, on average, is close to the true value of X, even if the estimate never produces a correct answer. A popular example is an estimate of the number of children in a family. The best estimate, based on available information, might be 2.4 children. In an estimation procedure, we aim for a low probability that the estimate is far from the true value of X. An accuracy measure that helps us achieve this aim is the mean square error in Equation (12.1). The mean square error is one of many ways of defining the accuracy of an estimate. Two other accuracy measures, which might be appropriate to certain applications, are the expected value of the absolute estimation error E[|X − X̂|] and the maximum absolute estimation error, max |X − X̂|. In this section, we confine our attention to the mean square error, which is the most widely used accuracy measure because it lends itself to mathematical analysis and often leads to estimates that are convenient to compute. In particular, we use the mean square error accuracy measure to examine three different ways of estimating random variable X. They are distinguished by the information available. We consider three types of information:
• The probability model of X (blind estimation),
• The probability model of X and information that the sample value x ∈ A,
• The probability model of random variables X and Y and information that Y = y.
The estimation methods for these three situations are fundamentally the same. Each one implies a probability model for X, which may be a PDF, a conditional PDF, a PMF, or a conditional PMF. In all three cases, the estimate of X that produces the minimum mean square error is the expected value (or conditional expected value) of X calculated with the probability model that incorporates the available information. While the expected value is the best estimate of X, it may
be complicated to calculate in a practical application. Many applications derive an easily calculated linear estimate of X, the subject of Section 12.2.

Blind Estimation of X

An experiment produces a random variable X. Before the experiment is performed, what is the best estimate of X? This is the blind estimation problem because it requires us to make an inference about X in the absence of any observations. Although it is unlikely that we will guess the correct value of X, we can derive a number that comes as close as possible in the sense that it minimizes the mean square error. We encountered the blind estimate in Section 3.8, where Theorem 3.13 shows that X̂_B = E[X] is the minimum mean square error estimate in the absence of observations. The minimum error is e*_B = Var[X]. In introducing the idea of expected value, Chapter 3 describes E[X] as a "typical value" of X. Theorem 3.13 gives this description a mathematical meaning.
Example 12.1
Before a six-sided die is rolled, what is the minimum mean square error estimate of the number of spots X that will appear?
The probability model is P_X(x) = 1/6, x = 1, 2, ..., 6; otherwise P_X(x) = 0. For this model, E[X] = 3.5. Even though X̂_B = 3.5 is not in the range of X, it is the estimate that minimizes the mean square estimation error.
Estimation of X Given an Event

Suppose that in performing an experiment, instead of observing X directly, we learn only that X ∈ A. Given this information, what is the minimum mean square error estimate of X? Given A, X has a conditional PDF f_{X|A}(x) or a conditional PMF P_{X|A}(x). Our task is to minimize the conditional mean square error e_{X|A} = E[(X − x̂)² | A]. We see that this is essentially the same as the blind estimation problem with the conditional PDF f_{X|A}(x) or the conditional PMF P_{X|A}(x) replacing f_X(x) or P_X(x). Therefore, we have the following:
Theorem 12.1
Given the information X ∈ A, the minimum mean square error estimate of X is
x̂_A = E[X | A].
Example 12.2
The duration T minutes of a phone call is an exponential random variable with expected value E[T] = 3 minutes. If we observe that a call has already lasted 2 minutes, what is the minimum mean square error estimate of the call duration?
This probability model also appears in Example 7.10. The PDF of T is
f_T(t) = { (1/3)e^(−t/3),  t > 0;  0, otherwise. }   (12.2)

If the call is still in progress after 2 minutes, we have t ∈ A = {T > 2}. Therefore, the minimum mean square error estimate of T is

t̂_A = E[T | T > 2].   (12.3)

Referring to Example 7.10, we have the conditional PDF

f_{T|T>2}(t) = { (1/3)e^(−(t−2)/3),  t > 2;  0, otherwise. }   (12.4)

Therefore,

E[T | T > 2] = ∫_2^∞ t (1/3)e^(−(t−2)/3) dt = 2 + 3 = 5 minutes.   (12.5)

Prior to the phone call, the minimum mean square error (blind) estimate of T is E[T] = 3 minutes. After the call has been in progress for 2 minutes, the best estimate of the duration becomes E[T | T > 2] = 5 minutes. This result is an example of the memoryless property of an exponential random variable. At any time during a call, the expected time remaining is the expected value of the call duration, E[T].
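The conditional expected value in Equation (12.5) can be checked by simulation. The fragment below is a rough MATLAB sketch (the sample size is an arbitrary choice), not part of the example.

% Monte Carlo check of E[T | T > 2] for an exponential random variable
% with E[T] = 3 (illustrative sample size).
m = 1e6;
t = -3*log(rand(m,1));    % exponential samples with mean 3
mean(t(t > 2))            % should be close to 5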
Minimum Mean Square Estimation of X Given Y

Consider an experiment that produces two random variables, X and Y. We can observe Y but we really want to know X. Therefore, the estimation task is to assign to every y ∈ S_Y a number, x̂, that is near X. As in the other techniques presented in this section, our accuracy measure is the mean square error

e_M(y) = E[(X − x̂)² | Y = y].   (12.6)

Because each y ∈ S_Y produces a specific x̂_M(y), x̂_M(Y) is a sample value of a random variable X̂_M(Y). The fact that x̂_M(Y) is a sample value of a random variable is in contrast to blind estimation and estimation given an event. In those situations, x̂_B and x̂_A are parameters of the probability model of X. In common with x̂_B in Theorem 3.13 and x̂_A in Theorem 12.1, the estimate of X given Y is an expected value of X based on available information. In this case, the available information is the value of Y.
Theorem 12.2
The minimum mean square error estimate of X given the observation Y = y is
x̂_M(y) = E[X | Y = y].
Example 12.3
Suppose X and Y are independent random variables with PDFs f_X(x) and f_Y(y). What is the minimum mean square error estimate of X given Y?
In this case, f_{X|Y}(x|y) = f_X(x) and the minimum mean square error estimate is

x̂_M(y) = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx = ∫_{−∞}^{∞} x f_X(x) dx = E[X] = x̂_B.   (12.7)

That is, when X and Y are independent, the observation Y provides no information about X, and the best estimate of X is the blind estimate.

Example 12.4
Suppose that R has a uniform (0, 1) PDF and that given R = r, X is a uniform (0, r) random variable. Find x̂_M(r), the minimum mean square error estimate of X given R.
From Theorem 12.2, we know x̂_M(r) = E[X | R = r]. To calculate the estimator, we need the conditional PDF f_{X|R}(x|r). The problem statement implies that

f_{X|R}(x|r) = { 1/r,  0 ≤ x < r;  0, otherwise, }   (12.8)

permitting us to write

x̂_M(r) = ∫_0^r (x/r) dx = r/2.   (12.9)
Although the estimate of X given R = r is simply r/2, the estimate of R given X = x for the same probability model is more complicated.

Example 12.5
Suppose that R has a uniform (0, 1) PDF and that given R = r, X is a uniform (0, r) random variable. Find r̂_M(x), the minimum mean square error estimate of R given X = x.
From Theorem 12.2, we know r̂_M(x) = E[R | X = x]. To perform this calculation, we need to find the conditional PDF f_{R|X}(r|x). The derivation of f_{R|X}(r|x) appears in Example 7.18:

f_{R|X}(r|x) = { −1/(r ln x),  x ≤ r ≤ 1;  0, otherwise. }   (12.10)

The corresponding estimator is, therefore,

r̂_M(x) = ∫_x^1 r · (−1/(r ln x)) dr = (x − 1)/ln x.   (12.11)

The graph of this function appears at the end of Example 12.6.
While the solution of Example 12.4 is a simple function of r that can easily be obtained with a microprocessor or an analog electronic circuit, the solution of Example 12.5 is considerably more complex. In many applications, the cost of calculating this estimate could be significant. In these applications, engineers would look for a simpler estimate. Even though the simpler estimate produces a higher mean square error than the estimate in Example 12.5, the complexity savings might justify the simpler approach. For this reason, there are many applications of estimation theory that employ linear estimates, the subject of Section 12.2.

Quiz 12.1
The random variables X and Y have the joint probability density function

f_{X,Y}(x, y) = { 2(y + x),  0 < x < y < 1;  0, otherwise. }   (12.12)

(a) What is f_{X|Y}(x|y), the conditional PDF of X given Y = y?
(b) What is x̂_M(y), the MMSE estimate of X given Y = y?
(c) What is f_{Y|X}(y|x), the conditional PDF of Y given X = x?
(d) What is ŷ_M(x), the MMSE estimate of Y given X = x?

12.2
Linear Estimation of X given Y

The linear mean square error (LMSE) estimate of X given Y has the form aY + b. The optimum values of a and b depend on the expected values and variances of X and Y and the covariance of X and Y.

In this section we again use an observation, y, of random variable Y to produce an estimate, x̂, of random variable X. Again, our accuracy measure is the mean square error in Equation (12.1). Section 12.1 derives x̂_M(y), the optimum estimate for each possible observation Y = y. By contrast, in this section the estimate is a single function that applies for all Y. The notation for this function is

x̂_L(y) = ay + b,   (12.13)

where a and b are constants for all y ∈ S_Y. Because x̂_L(y) is a linear function of y, the procedure is referred to as linear estimation. Linear estimation appears in many electrical engineering applications of statistical inference for several reasons:
• Linear estimates are easy to compute. Analog filters using resistors, capacitors, and inductors, and digital signal processing microcomputers perform linear operations efficiently.
• For some probability models, the optimum estimator x̂_M(y) described in Section 12.1 is a linear function of y. (See Example 12.4.) In other probability models, the error produced by the optimum linear estimator is not much higher than the error produced by the optimum estimator.
• The values of a, b of the optimum linear estimator and the corresponding value of the error depend only on E[X], E[Y], Var[X], Var[Y], and Cov[X, Y]. Therefore, it is not necessary to know the complete probability model of X and Y in order to design and evaluate an optimum linear estimator.
To present the mathematics of minimum mean square error linear estimation, we introduce the subscript L to denote the mean square error of a linear estimate:

e_L = E[(X − X̂_L(Y))²].   (12.14)

In this formula, we use X̂_L(Y) and not x̂_L(y) because the expected value in the formula is an unconditional expected value, in contrast to the conditional expected value (Equation (12.6)) that is the quality measure for x̂_M(y). Minimum mean square error estimation in principle uses a different calculation for each y ∈ S_Y. By contrast, a linear estimator uses the same coefficients a and b for all y. The following theorem presents the important properties of optimum linear estimates in terms of the correlation coefficient ρ_{X,Y} of X and Y introduced in Definition 5.6.
Theorem 12.3
Random variables X and Y have expected values μ_X and μ_Y, standard deviations σ_X and σ_Y, and correlation coefficient ρ_{X,Y}. The optimum linear mean square error (LMSE) estimator of X given Y is

X̂_L(Y) = ρ_{X,Y} (σ_X/σ_Y)(Y − μ_Y) + μ_X.

This linear estimator has the following properties:
(a) The minimum mean square estimation error for a linear estimate is
e*_L = E[(X − X̂_L(Y))²] = σ_X²(1 − ρ_{X,Y}²).
(b) The estimation error X − X̂_L(Y) is uncorrelated with Y.
Proof  Replacing X̂_L(Y) by aY + b and expanding the square, we have

e_L = E[X²] − 2aE[XY] − 2bE[X] + a²E[Y²] + 2abE[Y] + b².   (12.15)

The values of a and b that produce the minimum e_L are found by computing the partial derivatives of e_L with respect to a and b and setting the derivatives to zero, yielding

∂e_L/∂a = −2E[XY] + 2aE[Y²] + 2bE[Y] = 0,   (12.16)
∂e_L/∂b = −2E[X] + 2aE[Y] + 2b = 0.   (12.17)
CHAPTER 12
ES TIMATION OF A RANDOM VARIABLE
2 ~
Figure 12.1  (a) ρ_{X,Y} = −0.95, (b) ρ_{X,Y} = 0, (c) ρ_{X,Y} = 0.60. Each graph contains 50 sample values of the random variable pair (X, Y), each marked by the symbol ×. In each graph, E[X] = E[Y] = 0, Var[X] = Var[Y] = 1. The solid line is the optimal linear estimator X̂_L(Y) = ρ_{X,Y} Y.
Solving the two equations for a and b, we find

a* = Cov[X, Y]/Var[Y] = ρ_{X,Y} σ_X/σ_Y,   b* = E[X] − a* E[Y].   (12.18)

Some algebra will verify that a*Y + b* is the optimum linear estimate X̂_L(Y). We confirm Theorem 12.3(a) by using X̂_L(Y) in Equation (12.14). To prove part (b) of the theorem, observe that the correlation of Y and the estimation error is

E[Y(X − X̂_L(Y))] = E[XY] − E[Y E[X]] − (Cov[X, Y]/Var[Y])(E[Y²] − E[Y E[Y]])
                 = Cov[X, Y] − (Cov[X, Y]/Var[Y]) Var[Y] = 0.   (12.19)

Theorem 12.3(b) is referred to as the orthogonality principle of the LMSE. It states that the estimation error is orthogonal to the data used in the estimate. A geometric explanation of linear estimation is that the optimum estimate of X is the projection of X into the plane of linear functions of Y. The correlation coefficient ρ_{X,Y} plays a key role in the optimum linear estimator. Recall from Section 5.7 that |ρ_{X,Y}| ≤ 1 and that ρ_{X,Y} = ±1 corresponds to a deterministic linear relationship between X and Y. This property is reflected in the fact that when ρ_{X,Y} = ±1, e*_L = 0. At the other extreme, when X and Y are uncorrelated, ρ_{X,Y} = 0 and X̂_L(Y) = E[X], the blind estimate. With X and Y uncorrelated, there is no linear function of Y that provides useful information about the value of X. The magnitude of the correlation coefficient indicates the extent to which observing Y improves our knowledge of X, and the sign of ρ_{X,Y} indicates whether the slope of the estimate is positive, negative, or zero. Figure 12.1 contains three different pairs of random variables X and Y. In each graph, the crosses are 50 outcomes x, y of the underlying experiment, and the line is the optimum linear estimate of X. In all three graphs, E[X] = E[Y] = 0 and Var[X] = Var[Y] = 1.
From Theorem 12.3, we know that the optimum linear estimator of X given Y is the line X̂_L(Y) = ρ_{X,Y} Y. For each pair (x, y), the estimation error equals the vertical distance to the estimator line. In the graph of Figure 12.1(a), ρ_{X,Y} = −0.95. Therefore, e*_L = 0.0975, and all the observations are close to the estimate, which has a slope of −0.95. By contrast, in graph (b), with X and Y uncorrelated, the points are scattered randomly in the x, y plane and e*_L = Var[X] = 1. Lastly, in graph (c), ρ_{X,Y} = 0.6, and the observations, on average, follow the estimator X̂_L(Y) = 0.6Y, although the estimates are less accurate than those in graph (a). At the beginning of this section, we state that for some probability models, the optimum estimator of X given Y is a linear estimator. The following theorem shows that this is always the case when X and Y are jointly Gaussian random variables, described in Section 5.9.
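A scatter plot in the style of Figure 12.1 can be generated with a few lines of MATLAB. This is an assumed sketch for one value of ρ_{X,Y}; it is not the code that produced the figure.

% 50 correlated Gaussian pairs and the optimal linear estimator line
% Xhat = rho*Y (E[X] = E[Y] = 0, Var[X] = Var[Y] = 1).
rho = 0.60;
y = randn(50,1);
x = rho*y + sqrt(1-rho^2)*randn(50,1);   % X has correlation rho with Y
plot(y, x, 'x', y, rho*y, '-');
xlabel('y'); ylabel('x');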
Theorem 12.4
If X and Y are the bivariate Gaussian random variables in Definition 5.10, the optimum estimator of X given Y is the optimum linear estimator in Theorem 12.3.
Proof  From Theorem 12.3, applying a* and b* to the optimal linear estimator X̂_L(Y) = a*Y + b* yields

X̂_L(Y) = ρ_{X,Y} (σ_X/σ_Y)(Y − μ_Y) + μ_X.   (12.20)

From Theorem 7.16, we observe that when X and Y are jointly Gaussian, X̂_M(Y) = E[X|Y] is identical to X̂_L(Y).

In the case of jointly Gaussian random variables, the optimum estimate of X given Y and the optimum estimate of Y given X are both linear. However, there are also probability models in which one of the optimum estimates is linear and the other one is not linear. This occurs in the probability model of Examples 12.4 and 12.5. Here x̂_M(r) (Example 12.4) is linear, and r̂_M(x) (Example 12.5) is nonlinear. In the following example, we derive the linear estimator r̂_L(x) for this probability model and compare it with the optimum estimator in Example 12.5.
Example 12.6
As in Examples 12.4 and 12.5, R is a uniform (0, 1) random variable and given R = r, X is a uniform (0, r) random variable. Derive the optimum linear estimator of R given X.
From the problem statement, we know f_{X|R}(x|r) and f_R(r), implying that the joint PDF of X and R is

f_{X,R}(x, r) = { 1/r,  0 ≤ x < r < 1;  0, otherwise. }   (12.21)

The estimate we have to derive is given by Theorem 12.3:

r̂_L(x) = ρ_{R,X} (σ_R/σ_X)(x − E[X]) + E[R].   (12.22)
Figure 12.2  The minimum mean square error (MMSE) estimate r̂_M(x) of Example 12.5 and the optimum linear (LMSE) estimate r̂_L(x) of Example 12.6 of R given X = x.
Since R is uniform on [0, 1], E[R] = 1/2 and σ_R = 1/√12. Using the formula for f_{X|R}(x|r) in Equation (12.8), we have

f_X(x) = ∫_{−∞}^{∞} f_{X,R}(x, r) dr = ∫_x^1 (1/r) dr = { −ln x,  0 < x ≤ 1;  0, otherwise. }   (12.23)

From this marginal PDF, we can calculate E[X] = 1/4 and σ_X = √7/12. Using the joint PDF, we obtain E[XR] = 1/6, so that Cov[X, R] = E[XR] − E[X]E[R] = 1/24. Thus ρ_{R,X} = √3/√7. Putting these values into Equation (12.22), the optimum linear estimator is

r̂_L(x) = (6/7)x + 2/7.   (12.24)

Figure 12.2 compares the optimum (MMSE) estimator and the optimum linear (LMSE) estimator. We see that the two estimators are reasonably close for all but extreme values of x (near 0 and 1). Note that for x > 5/6, the linear estimate is greater than 1, the largest possible value of R. By contrast, the optimum estimate r̂_M(x) is confined to the range of R for all x.
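The two curves in Figure 12.2 follow directly from Equations (12.11) and (12.24). The MATLAB fragment below is a sketch for regenerating the comparison; it is not the textbook's plotting code.

% MMSE estimate (x-1)/ln(x) from Example 12.5 and LMSE estimate 6x/7 + 2/7
% from Example 12.6, plotted on (0,1).
x = 0.01:0.01:0.99;
rM = (x - 1)./log(x);        % optimum (nonlinear) estimate of R given X = x
rL = 6*x/7 + 2/7;            % optimum linear estimate
plot(x, rM, '-', x, rL, '--');
xlabel('x'); legend('MMSE','LMSE');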
In this section, the examples apply to continuous random variables. For discrete random variables, the linear estimator is also described by Theorem 12.3. When X and Y are discrete, the parameters (expected value, variance, covariance) are sums containing the joint PMF P_{X,Y}(x, y). In Section 12.4, we use a linear combination of the random variables in a random vector to estimate another random variable.

Quiz 12.2
A telemetry signal, T, transmitted from a temperature sensor on a communications satellite is a Gaussian random variable with E[T] = 0 and Var[T] = 9. The receiver at mission control receives R = T + X, where X is a noise voltage independent of T with PDF

f_X(x) = { 1/6,  −3 < x < 3;  0, otherwise. }   (12.25)

The receiver uses R to calculate a linear estimate of the telemetry voltage:

T̂_L(R) = aR + b.   (12.26)

(a) What is E[R], the expected value of the received voltage?
(b) What is Var[R], the variance of the received voltage?
(c) What is Cov[T, R], the covariance of the transmitted voltage and the received voltage?
(d) What is the correlation coefficient ρ_{T,R} of T and R?
(e) What are a* and b*, the optimum mean square values of a and b in the linear estimator?
(f) What is e*_L, the minimum mean square error of the linear estimate?
12.3
MAP and ML Estimation

The maximum a posteriori probability (MAP) estimate of X given Y = y is the value of x that maximizes the conditional PDF f_{X|Y}(x|y). The maximum likelihood (ML) estimate is the value of x that maximizes the conditional PDF f_{Y|X}(y|x). The ML estimate is identical to the MAP estimate when X is a uniform random variable.
Sections 12.1 and 12.2 describe methods for minimizing the mean square error in estimating a random variable X given a sample value of another random variable Y. In this section, we present the maximum a posteriori probability (MAP) estimator and the maximum likelihood (ML) estimator. Although neither of these estimates produces the minimum mean square error, they are convenient to obtain in some applications, and they often produce estimates with errors that are not much higher than the minimum mean square error. As you might expect, MAP and ML estimation are closely related to MAP and ML hypothesis testing.
Definition 12.1  MAP Estimate
The maximum a posteriori probability (MAP) estimate of X given an observation of Y is
Discrete:  x̂_MAP(y) = arg max_x P_{X|Y}(x|y);   Continuous:  x̂_MAP(y) = arg max_x f_{X|Y}(x|y).
= a rg rr1:1~ f xi y ( xly) .
The notation a rgmaxx g(:_c) denotes a value of ;i; that rr1axirnizes g(:i;), where g(x) is ar1:y fur1ction of a varia,ble x . The properties of t he conditional P l\1F and the conditional PDF lead to forrr1ulas calculating the )\1IAP estirr1ator that are used in applications. Recall frorr1 Theorem 7 .10 that
f x ,Y(:i;,y) j'y (y)
(12.27)
Becat1se the denominator j.y(y) does not deper1d on ;i; , rnaximizing 1·x1y(;r:ly) o·ver all x is equivaler1t to maximizing t he nurnerator j'y1x(y l:i;) f'x( x) . Sirnilarly, rnaximizing Px1y(xly.i) is equi·valer1t to finding x ,;, that corresponds t o the m axirr111m ·value of Py 1x(Yl:i;)Y.i l:i;Px( x ):i;. This implies t he J\!{AP estimation procedure can be v.rritten ir1 the follovving way.
=== Theorem 12.5-==::::i The MAP estirnate of X g'iven, ·y = y is 1
Discrete:
XMAP(Y.i) = arg rnax Px1Y (xlY.i ); :r;E Sx
Con,tin/uo11,s: Xl\tIAP(Y) = argrr1axj'y 1x(yl:i;) f'x (;r:) . x
When X and Y are discrete random variables, the MAP estimate is similar to the result of a multiple hypothesis test in Chapter 11, where each outcome x_i in the sample space of X corresponds to a hypothesis H_i. The MAP estimate maximizes the probability of choosing the correct x_i. When X and Y are continuous random variables and we observe the event Y = y, we let H_x denote the hypothesis that x < X ≤ x + dx. Since x is a continuous parameter, we have a continuum of hypotheses H_x. Deciding hypothesis H_x corresponds to choosing x as an estimate for X. The MAP estimator x̂_MAP(y) = x maximizes the probability of H_x given the observation Y = y. Theorem 12.5 indicates that the MAP estimation procedure uses the PMF P_X(x) or the PDF f_X(x), the a priori probability model for random variable X. This is analogous to the requirement of the MAP hypothesis test that we know the a priori probabilities P[H_i]. In the absence of this a priori information, we can instead implement a maximum likelihood estimator.
Definition 12.2  Maximum Likelihood (ML) Estimate
The maximum likelihood (ML) estimate of X given the observation Y = y is
Discrete:  x̂_ML(y) = arg max_x P_{Y|X}(y|x);   Continuous:  x̂_ML(y) = arg max_x f_{Y|X}(y|x).

The primary difference between the MAP and ML estimates is that the maximum likelihood estimate does not use information about the a priori probability model of X. This is analogous to the situation in hypothesis testing in which the ML hypothesis-testing rule does not use information about the a priori probabilities of the hypotheses. The ML estimate is the same as the MAP estimate when all possible values of X are equally likely. In the following example, we observe relationships among five estimates studied in this chapter.
0
< q < 1)
otherwise.
(12.28)
To estimate q for a co in we fl ip the coin ri t im es a nd co unt t he number of heads, k . Because each flip is a Bernoull i tria l with probability of s uccess q, k is a sample va lue of the binomial( n,q) rando m variable K. Given J( = k , de rive the follow ing estimates of Q : (a) (b) (c) (d) (e)
The The The The The
blind estimate fi.B, maximum like lihood estimate fJ.1v1L(k), maximum a posteriori probability estimatefi.MAP(k), min imum mean square error est imate r/1v1(k) , optimum linear estimater/L(k) .
(a) To derive the bli nd estimate, we refer to Append ix A for the propert ies of the beta (i = 2,j = 2) random variab le and find fJ.B = E (Q) = . 'l
'/,
. = 1/ 2.
+J
(1 2.29)
(b) To find the other estimates , refe r to the conditio nal PMF of the binom ia l(n,q) random variable J( :
PI
(1 2.30)
The ML estimate is the value of q that maximizes P_{K|Q}(k|q). The derivative of P_{K|Q}(k|q) with respect to q is

dP_{K|Q}(k|q)/dq = (n choose k) q^(k−1)(1 − q)^(n−k−1) [k(1 − q) − (n − k)q].   (12.31)

Setting dP_{K|Q}(k|q)/dq = 0 and solving for q yields

q̂_ML(k) = k/n,   (12.32)

the relative frequency of heads in n coin flips.
(c) The MAP estimator is the value of q that maximizes

f_{Q|K}(q|k) = P_{K|Q}(k|q) f_Q(q) / P_K(k).   (12.33)

Since the denominator of Equation (12.33) is a constant with respect to q, we can obtain the maximum value by setting the derivative of the numerator to zero:

d[P_{K|Q}(k|q) f_Q(q)]/dq = 6 (n choose k) q^k (1 − q)^(n−k) [(k + 1)(1 − q) − (n − k + 1)q] = 0.   (12.34)

Solving for q yields

q̂_MAP(k) = (k + 1)/(n + 2).   (12.35)
(d) To compute the MMSE estimate fiM(k) = E [QII< = k], we have to ana lyze f QIK(q lk) in Equation (12 .33). The numerator terms, f'Q(q) and PI
PK(k) =
1:
PKIQ(klq) fQ (q) dq.
(12. 36)
Substituting f Q(q) and PK IQ(k lq) from Equations (12.28) and (12.30) , we obtain
PK (k) = 6
(~)
1'
qk+'(l - q)n-k+l dq.
(12. 37)
The function of q in the integrand appears in a beta (k +2 ,n, - k +2) PDF. lfwe mu ltiply the integrand by the constant jJ(k + 2, n, - k + 2), the resulting integra l is 1. That is ,
1'
{J(k + 2 ,n - k + 2)qk+ 1 (1 - q)n-k+l dq = 1.
(12. 38)
[ 12.3
413
MAP AND ML ESTIMATION
It fol lows from Equations (12 .37) and (12.38) that
PK (k) =
5 (~) f3 (k +2 ,n, - k +2 )
(12.39)
for k = 0, 1, ... , 'n and PK(k) = 0 otherwise. From Equation (12 .33) ,
/3(k + 2, T/,
-
k + 2)qk+l (1 - q)n-k+l
0
otherwise.
That is, given K = k, Q is a beta (i = k Thus , from Appendix A, qM k = E A
( )
0 < q < 1,
[
+ 2,j =
n, - k + 2) rando m variable.
QIJ( = k l = . i . = k+ 2 . 'l, + J
'n +4
(12.40)
(e) In Equation (12 .40), the mi nimum mean square error estimatorqA1 (k) is the linear funct ion of k: r/lvt(k) = a* k + b* where a* = l /('n + 4) and b* = 2/(n, + 4) . Therefore, ljL(k) = fiN1(k) . It is instruct ive to compare the differe nt estimates. The blind estimate, using only prior infor mation, is simply E(Q) = 1/ 2, regard leS'S of the resu lts of t he Bernoul li trials. By contrast, the maximu m likelihood estimate makes no use of prior infor mation . Therefore, it estimates Q as k/n,, the re lative freque ncy of heads in n, coin fli ps. When n, = 0, there are no observations, and the re is no maximum li ke li hood estimate. The other estimates use both prior information and data from the coin fli ps. In the abse nce of data (n, = 0), they produce li1vIAP(k) = 1iA1(k) = ljL(k) = 1/ 2 = E(Q) = fJB· As n, grows large , they a ll approac h k/n, = cJi\tIL(k), the re lative frequency of heads. Fo r low va lues of ri > 0, liM(k) = qL(k) is a little fart he r from 1/ 2 relative to cJNIAP(k) . Th is reduces the probabil ity of hig h errors that occu r when n, is smal l and q is near 0 or 1.
Quiz 12.3
A receiver at a radial distance R from a radio beacon measures the beacon power to be

X = Y − 40 − 40 log10 R dB,   (12.41)

where Y, called the shadow fading factor, is the Gaussian (0, 8) random variable that is independent of R. When the receiver is equally likely to be at any point within a 1000 m radius circle around the beacon, the distance R has PDF

f_R(r) = { 2r/10^6,  0 ≤ r ≤ 1000;  0, otherwise. }   (12.42)

Find the ML and MAP estimates of R given the observation X = x.
12.4
Linear Estimation of Random Variables from Random Vectors

Given an observation of a random vector, the coefficients of the optimum linear estimator of a random variable are the solution to a set of linear equations. The coefficients in the equations are elements of the autocorrelation matrix of the observed random vector. The right side is the cross-correlation matrix of the estimated random variable and the observed random vector. The estimation error of the optimum linear estimator is uncorrelated with the observed random variables.

There are many practical applications that use sample values of n random variables Y_0, ..., Y_{n−1} to calculate linear estimates of sample values of other random variables X_0, ..., X_{m−1}. This section represents the random variables Y_i and X_j as elements of the random vectors Y and X. We start with Theorem 12.6, a vector version of Theorem 12.3, in which we form a linear estimate of a random variable X based on the observation of a random vector Y. Theorem 12.6 applies to the special case in which X and all of the elements of Y have zero expected value. This is followed by Theorem 12.7, which applies to the general case including X and Y with nonzero expected value. Finally, Theorem 12.8 provides the vector version of Theorem 12.7, in which the random vector Y is used to form a linear estimate of the sample value of random vector X.
Theorem 12.6
X is a random variable with E[X] = 0, and Y is an n-dimensional random vector with E[Y] = 0. The minimum mean square error linear estimator is

X̂_L(Y) = R_XY R_Y^(−1) Y,

where R_Y is the n × n correlation matrix of Y (Definition 8.8) and R_XY is the 1 × n cross-correlation matrix of X and Y (Definition 8.10). This estimator has the following properties:
(a) The estimation error X − X̂_L(Y) is uncorrelated with the elements of Y.
(b) The minimum mean square estimation error is
e*_L = E[(X − X̂_L(Y))²] = Var[X] − R_XY R_Y^(−1) R'_XY.
1
~i - 1 ] ' and a = [ao an - 1] , \Ve represent the linear estimator as XL (Y ) = a'Y. T o derive t h e optimal a , we \vrite t h e m ean squ ar e estimation error as Proof In terms of Y =
[Yo
(12.43) The partial d erivative of eL wit h respect to
~:~ = -2E [Yi(X = -2 E [Yi(X -
ai
is
XL(Y)) ] aoYo - al Y1 - ... - an- 1~i - 1 ) ] .
(12.44)
T o minimize t he error, we set 8e L / 8ai = 0 fo r a ll i. \\ !e recognize t he first expected value in E quation (1 2.44) as t he correlation of Yi and t he estimation error . Setting t his cor relation to zer o for all Yi establishes Theorem 12.6 (a) . Expanding t he second expected value on t he righ t side a nd setting it to zero, we obtain (12.45) Recognizing t hat all t he exp ected values are cor relations , we write (12.46) Setting t he n, p art ial d erivatives to zero, v..;e ob tain a set of n linear equat ions in t he n, unkno,vn elemen ts of a. In m atrix form, t he equat ions ar e Rya = Ry x . Solving for a = Ry1 Ry x completes t he proof of t he first par t of t he t heorem. To verify t he minimum mean square error , 've 'vrite
el,= E [(X - a'Y) 2 ]
= E [(X 2
-
a'Y X )] - E [(X - a'Y)a'YJ .
The second term on t he right side is zero because E [(X - a' Y )}j ] = 0 for j The first term is iden t ical to t he error expr ession of Theore1n 12.6(b) .
(12.47)
= 0, 1, ... , n,-
1.
Example 12.8 Observe the random vector Y = X + W , where X and W are independent random vectors with expected values E [X ] = E [W] = 0 and correlat io n matrices
Rx=
[o.~5
0.75] 1
(12.48)
'
Find the coefficients 0, 1 and 0, 2 of the optimum linear estimator of the random variable X = X 1 given ·y 1 and ·v 2 . Find the mean square error, e£, of the optimum estimator. In terms of Theorem 12 .6 , n, = 2, and we wish to estimate X given the observation vector Y = [Y1 Y2]'. To apply Theorem 12.6, we need to find Ry and Rxy.
+ W )(X' + W')] = E [XX' + XW' + WX' + WW'] .
Ry= E [YY'] = E[ (X
(12.49)
Because X and Ware independent , E[XW'] = E[X] E[W'] = 0 . Similarly, E[WX'] = 0 . Th is implies
Ry= E [XX'] +E [WW'] =Rx+ Rw =
[o\
1 0.75] 5 1.1 .
To find Rxy, it is convenient to solve for the transpose R~y
(12.50)
= Ryx . (12.51)
Since X and W are independe nt vectors, E (W1X 1) = E(W1) E(X1) = 0. For th e s ame reaso n, E (W2X1] = 0. Thu s
R vx Therefo re, R xv = [1 X g iven Y1 a nd Y2 is
l [l
E(X[) [E [X2X1] -
=
1 0.75 .
(12 .52)
0 .75], an d by T heo re m 12.6 , the o ptimum linear est im ator of -1
.~
XL(Y ) = R xvR y Y = [1
0.75] [
1
~7 5
l l
0.75
l. l
-1 [ Y1
y
2
. 0.830Y1 + O.ll6Y2.
=
(12 .53)
T he mea n squ a re error is
Var[X] -
RxvRy1 R~y = 1 -
[0830 0. 116]
[0 ~ 5 ]
=
(12 .54) ==:...__-=--...1
00830
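The matrix arithmetic in Example 12.8 can be reproduced directly in MATLAB; the sketch below simply repeats the calculation of the coefficients and the mean square error.

% Optimum linear estimator of X = X1 given Y = X + W in Example 12.8.
RX  = [1 0.75; 0.75 1];
RY  = RX + 0.1*eye(2);        % R_Y = R_X + R_W
RYX = [1; 0.75];              % cross-correlation of Y and X1
a   = RY\RYX;                 % coefficients, approximately [0.830; 0.116]
eL  = 1 - a'*RYX              % minimum mean square error, approximately 0.083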
The r1ext theorern ger1eralizes T 11eorerr1 12.6 to randorn ·variables v.rith nor1zero expected values. In t11is case the optirr111rn estirnate contains a constant t errr1 b, and the coefficients of the linea.r eql1atior1s are co,rariances.
- - - Theorem 12.1 X is a randorn variable 'llJith expected val11,e E(X]. Y 'is an, 11, -dirnen,sion,al raridorn vector VJith ex;pected val11,e E[Y ) an,d ri x 77, covariarice rnatrix C y. C xv is the 1 x 77, cross-covarian,ce of X an,d Y . The rnin,irn?J,rn rneari sq11.are errvr (MMSE} lin,ear estirnator of X giveri Y 'is A
XL(Y )
1
= C xv C y (Y - E [Y )) + E[X) .
This est'irnator has the follo11Jir1,g pro1Jerties: (a,) The estirnation, error X -
XL (Y )
is ?J,n,correlated VJith the elernen,ts of Y .
{b) Th e rn'iriirn11,rn rneari sq11.are est'irnation, error is
Proof \A/e represent the optimum linear estimator as A
XL (Y )
I
= a Y + b.
For any a , 8eL / ab = 0 , implying 2E[ X - a ' Y - b] follo\:vs from Equation (12.55) that A
I
=
(12.55) 0. Hence b
XL(Y ) - E [X] = a (Y - E [Y ]).
=
E[X] - a ' E[Y ]. It
(12.56)
Defining U = X-E[X ) and V = Y -E[Y ), 've can write Equation (12.56) as UL(V ) = a ' V 'vher e E[U] = 0 a nd E[V) = 0 . Theorem 12.6 implies t hat t he optimum linear estimator of U given Vis UL(V ) = R uv· R -v 1 V. \Ne next observe t hat Definition 8.1 1 implies t hat R \,. = C y. Similarly R uv = C xy . Therefore, C xY Cy1 V is t he optimum estimator of U given V . That isi over all choices of a i
is minimized b y a ' = C x YCy 1 . Thus XL(Y ) = a ' Y + b is t he minimum mean square error estimate of X given Y. The proofs of Theor em 12.7 (a) a nd Theor em 12.7(b) use t he same logic as t he corresponding proofs in Theorem 12.6.
It is ofter1 convenient to r epresent the optimt1rr1 linear estirr1ator of Theorem 12.7 in t11e forrn
X L(Y ) = a'Y
+ b,
(12.58)
b = E [X] - a ' E [Y] .
(12.59)
vvith aI
= c XY c y-1 ,
This form rerninds us that a ' is a rovv ·vector that is the solut ion t o the set of linear equations a
'c-y 1
=
c XY ·
(12.60)
In rnany signal-processir1g applications, the vector Y is a collection of sarnples Y(t 0 ) , Y(t 1 ) , ... , ·Y (tn- 1 ) of a sigr1a l Y(t) . In this setting, a' is a vector r epresentation of a linear filter.
=== Example 12.91==::::::1 As in Exam pie 8 .10, consider the o utdoor t ern pe rature at a certain weathe r station . O n May 5, the tempe rature measurements in deg rees Fahre nh eit take n at 6 AM , 12 noon , and 6 PM are elements of the three-dimensiona l ra ndom vector X with E[X] = [50 62 58] '. Th e covariance matrix of the three measurements is
Cx
=
16.0 12.8 11.2
12.8 16.0 12.8
11.2 12.8 16.0
(12.61)
Use the tern pe ratu res at 6 AM and 12 noon to predict t he te m perature at 6 PM: X3 = a 'Y + b, where Y = [X1 X2]'. A
(a) (b) ( c) ( d)
What are the coefficients of the optimum estimator a and b? What is the mean square estimation error? What a re the coefficients a* and b* of the optimum estim ator of X 3 given X 2? What is the mean square estimation error based on the observation X2?
(a) Let X = X 3 . From T heorem 12.7, we know that
c X Y c y- 1>
I a=
(12.62)
b = E [X] - C x yCy1 E [Y ] = E [X] - a' E [Y ] .
(12.63)
Thus we need to find the expected value E (Y ], t he covaria nce mat ri x Cy , and the cross-cova riance matrix Cxy . Sin ce Y = [X1 X2]',
E [Y ] = [E [X1]
E [X2l] ' = [50 62] ' >
(12.64)
and we can find the covaria nce matrix of Y in Cx :
Cy = [Cx(l, ·1_ ) Cx(2 , 1)
Cx( l , 2)] = [16.0 Cx(2, 2) 12.8
12.8] 16.0 ·
(12.65)
Since X = X3, the elements of Cxy are a lso in Cx. In part icu lar, C x y = Cvx· whe re
(12.66) . S1nce a' =
c xY c-y 1 , a '
so 1ves a 'c y =
c xY , ·im p1y1ng ·
a'= [0.2745
0.6078] .
(12.67)
Furthermore, b = E [X 3] - a' E [Y ] = 58 - 50a 1 - 62a2 = 6.591. ( b) T he mean square estimat io n error is
e[, = Var[X] - a'C ~y = 16 - l l .2a 1 - 12.8a 2 = 5.145 degrees2. He re, we have fou nd Var(X ] = Var[X3] in Cx : ·v ar[X3] = Cov[X3, X 3] Cx(3 , 3) . ( c) Using only the observation Y = X 2 , we apply Theorem 12 .3 and find a,
*
Cov (X2 >X3] = Var(X2]
=
12.8 16
= 0.8,
b* = E [X] - a* E [Y] = 58 - 0.8(62) = 8.4.
(12.68) (12.69)
(d) T he mean square error of the estim ate based on ·y = X 2 is e[, = Var[X] - a* Cov ["Y, X ] = 16 - 0.8 (1 2. 8) = 5.76 degrees2.
(12.70)
Ir1 Example 12.9>we see t l1a.t the estirr1ator err1ploying both X 1 and X 2 can exploit the correlation of X 1 and X3 to offer a reduced mean square error corr1pa red to the estirnator t:hat uses just X 2.
If yol1 go to we a ther . c om) } ' OU vvill receive a cornprehensive prediction of t 11e future \Veat11er. If xi is t 11e terr1peratl1re i 11ours frorn now, the website v.rill rr1ake predictions XA = [XA1 XAn ] ' of t 11e vector X = [X1 X n ] ' of future t emp er atures . These predict ions a re based on a vector Y of a\railable observations. That is) a we ather. c om prediction is t he \rector ft1nct ion X = X (Y ) of observation Y . \i\Then usir1g vect or Y t o est im ate \rector X , the MSE becorr1es A
A
(12.71) vVe see in Eqt1ation (12. 71 ) t 11at the J\1ISE r edt1ces to the surr1 of t h e expect ed square errors in est im ating eac11 cornpor1ent X i . The MJ\!{SE solution is t o use t he observation Y to rnake an MNISE estirnate of each cornponent X ,i of \rect or X. Ir1 the context of linear estirnatior1) t 11e optirr1urn linear estirnate of each component X i is Xi(Y ) = a~ Y + b,i) vvith a~ and bi as specified b}' Theorem 12.7 wit h X = X ,i · The opt irn urn linear vect or estirr1at e is
X n(Y )] 1
vVriting
1 •
(12.72)
XL(Y ) in m atrix forrr1}' ields the \rect or gener alization of T11eorern 12.7.
- - - Theorem 12.8 X is an, rn -dirnen.si on,al ran,dorn vector 'tuith expect ed valv,e E (X ]. Y is an, n,dirnen,sion,al raridorn vector 'llJ'ith expect ed val'ue E (Y ] an,d n, x n, covaria/nce rnatriJ; C y. X arid Y have rn x n, cross-covarian,ce rnatr~ix C xy. T he rnin,irnv,rn rnean, s qv,are error lin,ear estim,ator of X g'i ven, the observati on, Y is 1
X L (Y ) = C xvC y (Y - E (Y ]) + E (X ] . A
Proof F rom Theorem 12.7,
(12.73)
(12.74)
It is oft er1 convenient t o r epresent the optimt1rr1 linear estirr1ator of Theorern 12.8 in t 11e forrn (12.75)
vvith -1
~
b=
A= C xy C y , W 11en E [X ) r edt1ces t o
=
0 a nd E [Y )
=
0, Cy
= Ry
E[X ) -A E[Y ) .
and C xy
(12.76)
= R xy , and T11eorerr1
12.8
(12.77) the gener alizatior1 of Theorem 12.6 t o t he est irnation of the ·vector X . In addition, frorr1 y because each cornpor1en t of L (Y ) is t 11e optirnurr1 lir1ear est irnate of as giver1 by T 11eorerr1 12.7, t he MSE a nd ort hogonality properties of X i(Y ) given in Theorern 12. 7 rerr1air1 t he sarr1e. T 11e exper iment in Exa mple 12.9 consists of a seqt1ence of n, + 1 s11bexperirr1er1ts that prod t1ce randorn ·v ariables X 1 , X 2 , ... X n+1 . T11e est im ator uses t he outcomes of t h e first ri experirnents t o form a linear estirr1at e of the ot1tcorr1e of ex perirr1en t n, + 1. vVe refer to t his estirnatior1 procedure as lin,ear predictiori because it llSes observations of earlier experiments t o predict the outcorr1e of a s11bsequer1t experirnent. vVhen the correlations of t 11e r ar1dom variables have the proper ty t hat T x i, X j depends only or1 the difference Ii - j I, the estirr1ation equat ions in Theorern 12.8 have a strt1ct ure that is exploited in m any practical applications. To exa rr1ine t he implications of t 11is proper ty, we adopt the r1otat ion
xi
x
xi
Rx (i, j )
= r l,i - .J I ·
(12.78)
In Chap ter 13 we observe that t 11is proper t}' is ch ar act erist ic of r andom ·vectors der i·ved frorn a 111ide sen,se stat'i on,ary ran,dorn seq11,en,ce. Ir1 t he nota t ior1 of t he linear est im ation model developed in Sect ion 12.4, X = X n+1 and Y = [X1 X2 X n] ' . T11e elem ents of t he correlation m atrix R y and t 11e cross-correlation m atrix R y x all 11ave t he forrn To
'f']_
r n -1
Tn
T1
ro
'f'n -2
Tn -1
Ry =
lrn-1
' r1
ro
R yx =
(12.79)
l
J
r1
J
Here R y and R y x togetr1er ha·ve a special structure. T 11ere are only ri + 1 different nt1rr1bers among the ri 2 + n, elem ents of t h e tv.ro m atr ices, and each diagonal of R y consist s of iden tical elem ents . T 11is matr ix is in a categor}' r eferred to as To eplitz fo rrns . The p r op erties of R y and R y x rr1ake it possible to solve for a' in Eq11ation (12.60) vvit h far fevver corr1putations than a re r equired in solving an arbit r ar}' set of ri linear equat ions. Many aud io compression t echniques use algorit hms for solving linear equations based on the proper t ies of Toep lit z forms .
=== Quiz 12.4=== X = [ X 1 X2J' is a r andom vector wit11 E [X ) = 0 and aut ocorrelat ion rr1atrix R x vvith elerr1ents Rx (i,j ) = (- 0.9) li - .71. Observe the vector Y = X + W , v.r11er e E[W ) = 0 , E[vV{ ) = E [vV?J = 0.1 , a rid E [W 1 W 2 )
= 0.
W and X are ir1dependent .
(a) F ind a* , t he coefficient of t11e optirr1 t1m linear estimator of X 2 given ·y 2 an d t he rnean sql1ar e error of t his estirr1ator . (b ) F ind t he coefficients a ' = [a 1 a2 J of the optirr1urr1 linear estirnator of X 2 giver1 ·y 1 and ·y2 , and t he rnean square error of t11is estimator.
12.5
MATLAB
The rr1a.trix orientation of lVI ATLAB m akes it possible t o vvrite concise progr a.rr1s for genera.ting the coefficients of a linear estirr1ator arid calcula.tir1g tl1e estirr1ation error. The follov.ring exa.rnple explores t he relatior1ship of t11e mean sqt1are error t o t h e n t1rr1ber of observations l1sed in a linear predict or of a r ar1dorn variable.
===- Example 12 .10--==:::::i The correlation matrix Rx of a 21-dimensional random vector X has i, j th element (1 2. 80) Wis a random vector, independent of X, with expected value E [W] = 0 and diagonal correlation matrix Rw = (0.1 )I . Use the first n, elements of Y = X + W to form a linear estimate of X 21 and plot the mean square error of the optimum linear estimate as a function of n, for
(a)
r
-
li-jl -
sin(O.l7rl 'i - JI) 0.17fli - jl
)
(b) r l,i-11 = cos(0. 57rli - j l).
In this prob lem , let W (n)• X (n) • and Y (n ) denote the vectors , consisting of the first n, components of W, X , and Y. Similar to Example 12.8, independence of X (n) and W (n ) implies that the correlation matrix of Y (n) is (1 2. 81) Note that Rx(n) and Rw(n) are the ri x n, upper-left submatrices of Rx and Rw. In addition ,
R'X
-Ry(n) x -- E
Y (n) -
(1 2. 82)
Thus the optimum linear estimator based on the first n, observations is (1 2. 83)
---
0.1
,...
~.....i
o .._~~~--~~~--~__,
0
20
10
0.05 o--~~~--~~~--~--
o
20
10
n
n
ra=[1 sin(0.1*pi*(1:20)) ... ./(0.1*pi*(1:20))]; mse(ra);
rb=cos(0.5*pi*(0:20)); mse(rb);
(a )
(b) Figure 12.3 ,.fv;,ro Runs of mse. m
and the mean square error is
(1 2. 84)
function e=mse(r) N=length(r); e=[] ; rr=fliplr(r(: ) '); for n=1 :N , RYX=rr(1 :n)'; RY=toeplitz(r(1 :n))+0.1*eye(n); a=RY\RYX; en=r(1)-(a')*RYX; e=[e;en]; end plot(1:N,e);
mse .m calculates th e mea n sq uare error us-
ing Equation (12 .84). The input r correr 20 ], which sponds to t he vector is t he first row of t he T oepl itz correlation matrix Rx. Note that Rx(n) is t he T oepl itz matrix whose first row conta ins t he first n, elements of r. To plot t he mean sq uare error as a f unction of the num be r of observat io ns, 'n, we generate the vector ra nd then run mse Cr). For t he corre lation functions (a) and (b) in t he problem statement , the necessary M.A.TLAB commands and co rresponding mean squa re estimation error out put as a f unction of 'n are shown in Figure 12.3.
[ro
In corriparing tlie rest1lt s of cases (a) arid (b) iri Exam ple 12.10 , we see t liat the rnean square est imation error deperids strongly on tlie correlation structure giveri by r l,i -.il . For case (a), sarnp les X n for n, < 10 liave very little correlation vvit h X2 1. T lius for n, < 10, the est imates of X21 are only slight ly better tlian the blind estirnate. On t he ot her ha.nd, for case (b), Xi and X21 are completely correlated; px 1 ,x 21 = 1. For n, = 1, ·y1 = X1 + W1 is sirnply a rioisy copy of X21, arid t h e estirnation error is due to t he varia nce of W1 . In th is case, as n, increases, tlie opt irrial linear estirnat or is able t o corribirie addit ional noisy cop ies of X 2 1 , :yielding further reductioris iri tlie rriean sqt1are estirnation error.
Quiz 12.5 Estimate the Gaussia n (0,1) r andorr1 variable X using t 11e observation vector Y = l X + W , "'' here 1 is the vector of 20 1 's . T 11e noise vector W = [vV0 vV19 is ir1deper1dent of X , has zero expected vall1e , and has a correlation rr1atrix vvith i, jth entry Rw (i,j) = c li-.i l- 1 . F ir1d XL(Y ), t 11e linear MMSE estirr1at e of X given Y. For c in the r ange 0 < c < 1, v.r11at ·v alue of c minirnizes the rr1ean square error of the estimat e?
J'
Further Readin,g: T he fin[tl c11apter of ('\i\TSOl] presents t 11e basic theor}' of estirr1at ion of randorr1 variables as well as extensions t o stochastic process estirr1ation in the t ime domair1 a nd freqil.1enC}' dornain.
Problems Difficulty:
Easy
12.1.1
Generalizing t he solution of Examp le 12.2, let t he call durat ion T be an exponential (.A) r andom variable. For to > 0, show t hat t he minimum mean square error estimate of T, given t hat T > to is
T = t 0 + E [T] . 12.1.2
X and Y have t he joint PDF
fx ,y(1;, y) =
6(y-::r) {0
0<1;
+
D ifficu lt
Moderate
Experts Only
(c) \t\fhat is t he ininimum mean square error estimate of X given X > 1 / 2? (d ) \iVhat isfy(y)? (e) \i\fhat is t he b lind estimate fJ B? (f) \i\fhat is t he minimum mean square error estimate of Y given X > 1 / 2?
12.1.4
X and Y have t he joint PDF
.
fx ,Y(x,y)
=
{6 (y-x)
0
O
(a) W hat is f x 1y(x ly)?
(a) W hat is f x(x)? (b) W hat is t he blind estimate
xB?
(c) \i\f hat is t he minimum m ean square error estimate of X given X < 0.5?
(b) \t\f hat is XM(y) , t he lVIlVISE estimate of X given Y = y?
( d) W hat is fy(y)?
(c) \t\f hat is fy 1x(ylx)? (d) \i\fhat is f; A1 (x), t he lVIlVISE estimate of Y given X = x?
( e) \i\f hat is t he blind estimate YB?
12.1.5 • X and Y have t he joint PDF
( f) \i\fhat is t he ininimum mean square error estimate of Y given Y > 0.5?
12.1.3
j .x
'
y
(
x, y
)
=
{2 0
O
X and Y have t he joint PDF
fx ,y(x , y) = {
~
0<1;
(b) \iVhat is XM·(y), t he MMSE estimate of X given Y = y? ( c) \iV hat is
(a) W hat is f x(x )? (b) W hat is t he blind estimate
(a) \t\fhat is f x 1y(x ly)?
xB?
e*(0.5)
=E
[(X - XA1 (0.5))
2
IY = 0 .5] ,
[ 424
CHAPTER 12
ES TIMATION OF A RANDOM VARIABLE
t he ininimum mean squ are error of the estimate of X given Y = 0.5?
12.1.6 A signal X and noise Z are independen t G aussian (0, 1) random var iables, and Y = X + Z is a noisy observat ion of t he signal X. Usually, we want t o use Y t o estimate of X ; however , in t his problem vve \Vill use Y t o estimate t he noise Z. (a) Find Z(Y ), t he lVIlVIS E estim ator of Z given Y. (b) Find t he m ean s quared error e
E[(Z - Z (Y ))
2
=
12.1.8 In a BP SK communication system, a source \vishes t o communicate a random bit X t o a receiver. The possib le inputs X = 1 a nd X = -1 ar e equa lly likely. In t his system , t he source t r ansm its X multip le t imes. In t he it h t ra nsm ission, t he receiver observes Yi = X + Wi . After ri t ransmissions of X , t he receiver has observed
[Y1
Yn]
1 •
(a) F ind Xn(y ), t he MlVIS E estimate of X given t he observation Y = y . E x press your a ns,ver in terms of t he likelihood ratio
L ( ) = fy 1x(YI - 1) Y fy 1x (y ll) . (b) Simplify your a nswer \Vhen t he wi are iid G a ussian (0, 1) random variables, independent of X.
12.2.1 R andom variables X and Y have joint P lVIF
Px .Y x, J X=
1;
-1
= 0
x= l
J
=
-3 J = -1
1/ 6 1/ 12 0
(b) Are X and Y independent? ( c) F ind E[X], Var[X], E [Y], Var [Y ], a nd Cov [X , Y ]. (d ) Let X (Y ) = aY + b be a linear esti1nat or of X. F ind a* and b*, t he values of a and b t hat minimize t he mean squar e error e L. (e) W hat is el,, t he m inimum m ean square error of t he opt imum linear estimate?
].
12.1.7 R andom variable Y = X - Z is a noisy observation of t he cont inuous ra ndom variable X. The noise Z has zero expected value a nd unit varia nce a nd is independen t of X. F ind t he condit ional expectation E [X IY].
Y = Y=
(a) F ind t he marginal proba bility m ass functions Px(:i;) and Py(y ).
1/8 1/ 12 1/ 24
1/ 24 0 1/ 12 1/ 12 1/8 1/ 6
(f) F ind Px 1y(x l - 3), t he conditional I>MF of X given Y = -3. (g) F ind X1w( -3 ), t he optimum (nonlinear ) mean squar e estimat or of X given y = -3. (h ) F ind t he mean square error e * (-3)
= E [ (X
- XAJ(-3)) 2 IY = -3]
of t his estimate.
12.2.2 A telem etry volt age ,I, t ransm itted fro1n a pos it ion sensor on a ship ,s rudd er , is a r andom variable wit h PDF j .\I (v ) = { 1/ 12 -6 < v < 6, 0 ot herwise. A r eceiver in t he s hip,s control room receives R = V + X , The r a ndom variab le X is a G aussian (0 , Vs) noise volt age t hat is independen t of V . The receiver uses R to calculate A a linear estim ate of t he telem etry voltage: V = aR + b. F ind (a) t he expected received voltage E [R ], (b ) t he var ia nce Var [R] of t he received voltage, (c) t he covariance Cov[V, R] of t he t r ansm itted and received voltages, (d) a* and b*, t he opt imum coefficients in t he linear estimate, (e) el,, t he minimum mean square error of t he estimate.
[ PROBLEMS
12.2.3 Random variables X and Y have joint P l\/IF given by t he follo wing table:
Px,Y (x, y) X= -1
y= -1
y=O
3/16 1/6
1/ 16 1/6
0
1/8
x= O x= l \Ji.,! e
estimate Y by YL (X)
y=l 0 1/ 6 1/8
= aX + b.
12.2.4 The random variables X and Y have t he joint probability density function
(
) {2(y+::r) ::e,y = 0
O<::r
otherwise.
\!\,!hat is XL (Y), t he linear ininim um mean square error estimate of X given Y? 12.2.5 For random variables X and Y from Problem 12.1.4, find XL(Y), t he linear minimum mean square error estimator of X given Y. 12.2.6 Random variable X has a secondorder Erlang PDF
. ( ) _ j xx -
\
/\ Xe
{
0
- .Ax ,
x > 0, otherwise.
= :i~ ,
YM(x), (b) t he Ml\IISE estimate of X given Y
(b) t he 1\1.IMSE estimate of X given R = r,
Xj\1(r), ( c) t he L l\/ISE estimate of R given X,
RL (X),
12.2.8 For random variables X and Y, vve wish to use Y to estimate X. However , our estimate must be of t he form X = aY.
(a) Find a* , t he value of a t hat m inimizes t he m ean square error e =
E[(X - aY) 2 ]. (b) For a= a * , what is t he minimum mean square error e * ? (c) ·u nder what conditions is estimate of X ? 12.2.9
= y,
X t he L MSE
Here are four different joint P lVIFs:
1/9 1/9 1/9
x= O x=l 1/9 1/9 1/9 1/9 1/9 1/9
=
1J,
X=
Y= -1
y=O y=l Pu, v ('IL, v)
11,
-1
-1
= 0
11,
=1
0
1/3
v= O
0 0
1/3
v= l
1/3
0
0 0
V=
Given X = x, Y is a uniform (0 , x) randoin variable. F ind (a) t he 1\/11\/ISE estimate of Y given X
f A1(x),
XL(R).
(b) What is t he minimum mean square error el,?
,Y
(a) t he MMSE estimate of R given X = x,
( d) t he LMSE estimate of X given R,
(a) Find a and b to minimize t he inean square estimat ion error.
. jX
425
-1
s= O s=l
Ps.r s, t) t = -1
s = -1
1/6
0
1/ 6
t=O
0
1/ 3
0
t= 1
1/ 6
0
1/ 6
Pc ,R q, r r= -1 r=O r=l
q= -1
q= 0
q= 1
1/12 1/12 1/ 6
1/ 12 1/6 1/12
1/ 6 1/12 1/12
XM(y), ( c) t he L MS E estimate of Y given X ,
YL(X), ( d) t he L MSE estimate of X given Y, XL(Y). 12.2. 7 Random variable R has an exponential PDF wit h expected value 1. Given R = r, X has an exponent ial PDF wit h expected value 1/r. F ind
(a) l:<"br each pair of random variables, indicate 'vhether t he two random variables are independent, and compute t he correlat ion coefficien t p.
[ 426
CHAPTER 12
ESTIMATION OF A RANDOM VARIABLE
(b) Co1npute t he least mean square linear estimator UL (V) of U given V. W hat is t he m ean square error? Do t he same for t he pairs X, Y, Q, R, and S, T . 12.2.10 Random variable Y = X - Z is a noisy observation of t he continuous random variable X. The noise Z has zero expected value and unit variance and is independent of X. Consider t he follo,ving argument: Since X = Y + Z 1 111e see that if Y = y 1 then X = y + Z. Th1ls 1 by Theorem 6.4 1 the conditional PDF of X given Y = y is fx 1y(xly) = fz(x -y). It follo1Ds that E [X IY = y] =
1_:
::r fx1y(x ly) d1;
1;fz (x - y) dx.
(b) F ind t he MAP estimate of R given N . (c) F ind t he l\!IL estimate of R given N .
12.3.4 F lip a coin n, t imes. For each flip , t he probability of heads is Q = q independent of all other flips. Q is a uniform (0, 1) random variable. K is t he number of heads in n flips.
(b) \i\fhat is t he Pl\!IF of J{? \i\fhat is E [I<]?
- oo
With the variable s1lbstit1ltion1 z =
(a) F ind t he l\!ll\!ISE estimate of R given N .
(a) W hat is t he ML estimator of Q given K?
00
= /_
12.3.3 Let R be a n exponent ial r a ndom variable wit h expected value 1/ µ,. If R = r , t hen over an interval of length T, t he number of phone calls N t hat arrive at a telephone switch has a Poisson PMF \Vit h expected value rT.
1; -
y1
E [X IY=y]= 1_:(z+y)fz(z)dz = E [Z] + y = y.
We concl1lde that E[X IY] = Y. Since E[X IY] is optim,al in the mean square sense 1 71Je conclude that the optim,al linear estim,at or X (Y) = aY must satisfy a = 1. Prove t hat t his conclusion is wrong. \iVhat is t he error in t he above argument? Hint : F ind t he LlVISE estimator XL (Y) = aY.
12.3.1 Suppose t hat in Quiz 12.3, R, measured in meters, has a uniform PDF over [O, 1000]. Find t he MA P esti1nate of R given X = 1; . In t his case, are t h e JVIAP and ML estimators t he same? 12.3.2 Let R be a n exponen t ia l random variable \Vi t h expected value 1/ JJ,. If R = r, t hen over an interval of length T, t he number of phone calls N t hat arrive at a telephone S\Vitch has a Poisso n P MF wit h expected value rT. (a) Find t he MlVISE estimate of N given R. (b) F ind t he MAP estimate of N given R. ( c) F ind t he l\!IL estimate of N given R.
(c) \iVhat is fQ IK(q lk)?
t he
condit ional
P DF
(d ) F ind t he Ml\!ISE estimator of Q given K = k.
12.4.1 ·Y ou \Vould like to know a sample value of X, a Gaussia n (0 , 4) r andom variable. Hovvever , you only can observe noisy observations of t he form Yi = X + Ni. In terms of a vector of noisy observations, you observe Y =
[i]
=
[~] X+ [~~],
where 1'l1 is a Gaussian (0 , 1) random variable and N2 is a Gaussian (0 , 2) random variable. ·u nder t he assumpt ion t hat X, N i , and N2 are mut ually independent, answer t he follo,ving questions: (a) Suppose you use Y1 as an estimate of X. The error in t he estimate is D 1 = Y1 - X. W hat a re t he expected error E[D1] and t he expected squar ed error E[Di]? (b) Suppose \Ve use Y.1 = (Y1 + Y2) /2 as an estimate of X. The error for t h is estimate is D 3 = Y3 - X. Find t he expected squar ed error E[ D §] . Is Y.1 or Y1 a better estimate of X?
[ PROBLEMS
427
( c) Let Y4 = A Y where A = [a 1 - a] is a 1x2 matrix. Let D 4 = Y4-X denote the error in using Y4 as an estimate for X. In terms of a, \¥hat is the expected squared error E [DiJ? vVhat value of a minimizes E [D~]?
Y is a tv:.ro-dimensional random vector with
12.4.2
(a) Find the optimum coefficients
X is a three-dimensional random vector w ith E[X ] = 0 and autocorrelation matrix R x with e1ements Tij = (-0.80) li- j l. ·u se X 1 and X 2 to form a linear estimate of X3: X3 = aiX2 + a2X1. (a) What are t h e optimum coefficients a,1 a nd
(b) Use X 2 to form a linear esti1nate of X 3: X3 = aX2 + b. \i\fhat are the optimum
Yi=Xi+X2+Wi, Y2 = X2 + X3 + W2. ·u se Y to form X1 = [ai estimate of X i .
a2] Y , a linear
(b) l!se Y1 to form a linear estimate of Xi: X1 = aYi + b. \tVhat are the optimum coefficients a* a nd b* ? \i\fh at is the mini1num mean square error el,?
12.4.5
Suppose
A
coefficients a* and b* and corresponding minimum mean sqt1are error el,?
12.4.3
X is a 3-dimensional random vector \vith E[X] = 0 and autocorrelation matrix R x with elements
Rx(i ,J) = 1 - 0.25li -
JI·
Y is a two-dimensional random vector \vith
where qo + q1 k + q2k 2 is an unkno,vn quadratic function of k a nd zk is a sequence of iid Gaussian (0, 1) noise random variables. \i\fe wish to estimate t h e unkno\¥11 parameters qo, q1, and q2 of the quadratic function. Suppose we ass ume qo, qi, and q2 are samples of iid Gaussian (0, 1) random variables. F ind the optimum linear estimator Q(Y ) of Q = [qo qi q2 J' given the observation Y = [Yi Yn] '.
12.4.6
X is a three-dimensional random vector with E [X ] = [-1 0 1 and correlation matrix R x \vith elements
Use Y to form X1 estimate of X l · (a) Find t h e opti1num coefficients ii1 and a,2 and the minimum mean square error e£. (b)
l!_se Y1 to form a linear estimate of X i : Xi =
+ b.
\i\fhat are t he optimum coefficients a* and b*? What is the minimum mean squ are error el,? aY1
12.4.4
X is a three-dimensional random vector wit h E[X ] = 0 and correlation matrix R x with elements
Rx(i ,J) = 1 - 0.25li -
JI·
W is a two-dimensional rando1n vector, independent of X , with E[W ] 0, E[W1 vV2] = 0 , and E [W12 ] = E[vV22 ] = 0.1.
J'
Rx(i,J)
=1-
0.25li -
JI·
W is a two-dimensional random vector, independent of X , \¥ith E[W ] 0, E [W1 W2] = 0, and
E
[Wt] = E [Wi] = 0.1.
Y is a t\vo-dimensional random vector with
Yi=Xi+X2+Wi, Y2 = X2 + X 3 + W2. ·u se Y to form a linear estimate of Xi:
(a) \t\fhat are the optimum coefficien ts ii1,
[ 428
CHAPTER 12
ESTIMATION OF A RANDOM VARIABLE
(b) F ind t he l\!IMSE el,. ( c)
t rix R x wit h i, jth element
l!_ se Y1 to form a linear estimate of X 1: X1 = aY1 + b. \i\fhat are the opt imum coefficients a* and b*? \tVhat is t he minimum mean squ are error e1?
12.4.7 W hen X and Y have expected values µx = µy = 0, Theorem 12.3 says that XL(Y) = px ,Y ~Y Y. Show that t his result is a special case of Theorem 12.8 when random vector Y is the one-dimensional random variable Y. 12.4.8 Prove t he follo,ving theorem: X is an n-dimensional rando1n vector with E[X ] = 0 and autocorrelation mat rix R x \Vith elements rij = cli- j l, \vhere lcl < 1. The optimum linear estimat or of X n, A
Xn = alXn - 1 + a2Xn- 2 +
. .. + Om,- 1X1,
is Xn = cXn- 1. The minim11m mean square estimation error is el, = 1 - c2. Hint : Consider then - 1 equations 8eL/8ai = 0.
12.4.9 In the CDMA mult iuser communicat ions system int roduced in Problem 11.3.8, each user i t ransmits an independent data bit Xi such that t he vector X = [X1 Xn]' has iid components \Vith Pxi(l) = Pxi(-1) = 1/ 2. The received signal is k
Y =
L
i=l
xiyp:si + N ,
'vhere N is a Gaussian (0 , cr 2 I ) noise. (a) Based on the observation Y , find t he LMSE estimate Xi(Y ) = a~ Y of Xi.
x
(b) Let = [X1 x k] I denote the vector of LMSE estimates of bits t ransmitted by users 1, ... , k. Sho'v t hat
12.5.1 Continuing Example 12.10, t he 21dimensional vector X has correlation ma-
. .) _ sin(¢o7r Ii R X (i,} 1· I
- j I)
?, -
}
.,
•
\?Ve use t he observation vector Y = Y (n) = [Y1 Yn to estimate X = X 21. F ind the Ll\!ISE estimate XL(Y (n)) = a (n)Y (n)· Graph the mean square error e'.L('n) as a func t ion of t he number of observations ri for ¢0 E {0.1, 0.5, 0.9}. Interpret your results. Does smaller ¢0 or larger ¢0 yield better estimates?
J'
12.5.2
Repeat Problem 12.5.l 'vhen
Rx(i ,j) = cos(¢o7r Ii -jl).
12.5.3 In a variation on Example 12.10, we use t he observation vector Y = Y (n) = [Y1 Yn J' to estimate X = X 1. The 21-dimensional vector X has correlation matrix R x with i, jth element Rx(i , j)
= r li - j l ·
Find t he Ll\!ISE estimate XL(Y (n )) a (n) Y (n) · Graph t he mean s quare error e'.L(n,) as a func t ion of the number of observations n, and interpret your results for t he cases sin(O.l7rli - j l) (a) r li - j l = 0.l7rli-jl ' (b) r li- j l = cos(0.57rli - jl).
12.5.4 In t he k user CDMA system employing LMSE receivers in Problem 12.4.9, t he re~eive r employs the L MSE bit estimate xi to implement t he bit decision rule Xi = sgn(Xi) for user i. Us ing the ap proach in Problem 11.4.6, construct a simulation to estimate the BER for a system with processing gain n, = 32, wit h each user operating at 6 dB SNR. Graph your results as a func t ion of k for k = 1, 2, 4, 8, 16, 32. l\!lake sure to average your results over the choice of code vectors S i .
[
Stochastic Processes
Our study of probability refers to ar1 experirnent consisting of a procedure and observations. \i\Then -vve study r andorn variables, eacr1 observatior1 corresponds t o one or rnore nurnbers . W hen \ve study stochast ic processes, each observation corresponds to a ft1r1ction of tirne . T he -vvord stochastic rneans randorri. Tr1e -vvord JJTocess ir1 this context mear1s function of tirne . Therefore, wher1 -vve study st och astic processes, -vve study r ar1dorn f\1nctions of t ime. Alrnost all practical applications of probability ir1volve rr1ult iple observations taken over a period of t irne. For exarr1ple, our earliest discl1ssion of prob abilit}' in this book refers t o t he not ion of the relative frequen cy of a n outcorne v.rhen ar1 experirnent is perforrned a large nurnber of t imes. In t r1at discussion and Sl1bsequent anal}rses of r ar1dorn variables, -vve ha;ve beer1 concerned only \A.Tith ho'tJJ f requeritly a.r1 event occt1rs . vVhen -vve stud}' stochastic processes , -vve also p a}' at ter1t ior1 t o t he tirne seqv.en,ce of the events. In t his ch apter , -vve a pply a nd extend t he t ools -vve 11ave develop ed for r andorn variables to introduce stochastic processes. vVe present a rnodel for t r1e randornness of a stoch astic process that is an alogot1s t o the rnodel of a randorn variable, and -vve describe sorne fa rr1ilies of stocr1astic processes (P oisson , Bro-vvnian , Ga.t1ssian ) t r1at ar ise ir1 practical applications. ·vve t h en defir1e the av,tocorrelation, f11,n,ction, and a'tJ,tocovarian,ce f/J,n,ction, of a st ochastic process . T hese t ime ft1nctions a re useft1l surr1maries of the t ime struct11re of a process, jl1st as t he expect ed \ra.lue and variance are useful Sl1mrnaries of the amplitl1de structl1re of a r andorn variable . Wide serise station,ary JJrocesses appear in many electrical and cornputer engir1eerir1g a pplications of stochastic processes . Ir1 addition t o descriptions of a single r andom process, \Ve define tr1e cross-cor·relation, t o describe t he relationship b et -vveen t -vvo v.ride sense stationary processes.
429
[ 430
CHAPTER 13
STOCHASTIC PROCESSES
SAJV!PLE SPACE Figure 13.1
13.1
SAMPLE FUNCTIONS Concept ua l represen tation of a random process.
Definitions and Examples
The stoc11astic process X (t) is a rnapping of ot1tcornes of an experirnent to functior1s of tirne. X (t ) is both the name of the process arid the narr1e of the randorn variable observed at t ime t. The definition of a stochastic process reserr1bles Definition 3. 1 of a randorr1 variable. Definition 13.1 Stochastic Process A stochastic process X (t) con,sists of an, experirnen,t v.1ith a probability rneasv,re P [·] defiri,ed on, a sarnple space S an,d a fv..'nct'io'n that assign,s a tirne f11,n,ction, x; (t , s) to each O'IJ,tcorne s in, the sarnple space of the experirnen,t.
Essentially, t he definit ion says that t 11e outcorr1es of the experirnent are all fur1ctions of t ime. Just as a rar1dorn ·variable assigns a nurnber t o each outcorne s in a sample space S , a stochastic process assigns a sarnple f'/J,ric tion, to each outcorr1e s . .---- Definition 13. 2 Sample Function A sample function x( t , s ) is the tirne f'/J,ric tion, assoc'i ated 'tuith O'tJ,tcorne s of an, exp erirnen,t.
A sarnple function corresponds to an outcome of a stochastic process experirr1er1t . It is one of t he possible t ime functions that can result frorn the experirr1ent . Figt1re 13.l s11ovvs the correspondence between the sample space of ar1 experirnent and the er1serr1ble of sarnple functior1s of a stoc11astic process. It also displays t 11e
[ 13.1
DEFINITIONS AND EXAMPLES
431
tvvo-dirnensional notation for sample f\1r1ctions x (t, s) . In this notation, X (t ) is t11e narne of t11e stoch astic process, s indicates the partic11lar outcorr1e of t11e experirnent, and t indicat es the t irr1e dependence. Corresponding to the sarr1ple space of an experirnent and to the range of a randorn variable, the en,sernble of a stochastic process is defined as follows .
Definition 13.3 - - Ensemble The ensemble of a stocliastic process is the set of all poss'ible tirne fv.'nction,s that can, res'ult frorn ari experirnen,t.
Example 13.1......::::::--mSta rt ing at launch time t = 0, let X(t) denote the temperature in Kelvins on t he surf ace of a space shuttle. With each launch s, we reco rd a tempe rature sequence x(t , s) . T he ensem ble of the experiment can be viewed as a cata log of the possib le tem perat ure sequences t hat we may record . For exa m ple,
3';(8073.68, 175) = 207
(13. l )
ind icates t hat in the 175t h entry in t he catalog of possible temperature sequences, the tempe rature at t = 8073.68 seconds after the launch is 207 K .
Just as wit h randorr1 ·va,riables , or1e of t11e m ain benefits of the stochastic process rnodel is t h at it lends itself to calculating aver ages . Corresponding to t 11e tv.rodimensional r1ature of a stoc11astic process , t 11er e are tv.ro kinds of aver ages . ·\i\Tith t fixed at t = to, X (to) is a r andorn variable, and we have the aver ages (for ex arr1ple, the expected value and t11e variance) that vve ha·ve studied already. In the terminology of stochastic processes, \rve refer to these averages as en,sernble a'/Jerages. The other type of aver age applies to a specific sarr1ple funct ior1 , J';(t, s 0 ), and prodt1ces a typical number for t11is sarnple function. This is a tirne a'/Jerage of t he sarnple function.
Example 13.2 In Example 13.1 of t he space sh uttle, over all possible launches, the average temperature after 8073.68 seconds is E[X (8073.68)] = 217 K . T h is is an ensem ble average ta ken over all possible temperature sequences. In t he 175th entry in the catalog of possible tempe rature sequences, t he average temperature over that space shu t tle m ission is
1
671 , 208.3 w here the integral lim it
1 671,208.3 0
x(t , 175) rlt
= 187.43 K,
(13.2)
671, 208.3 is t he du ration in seconds of t he shuttle m ission.
Before delving into t11e rr1atherr1atics of stochastic processes, it is instruct ive to examine the followir1g examples of processes that arise when we observe t im e functions.
[ 432
CHAPTER 13
STOCHASTIC PROCESSES
1n(t,s) 35
30 25
20 15
JO.___ __.__ ____.__ _ _.___ __.__ ____.__ ___..___ _....__ __.__ ____. 0
100
200
300
400
500
600
700
800
900
! (sec)
Figure 13.2
_A.. sam ple function
m(t , s) of t he ra ndon1 process M(t) described in E xam-
ple 13.4 .
-=
Example 13.3 Starting on January 1, we measure the noonti me temperature (in degrees Celsius) at Newark Airport every day for one year. This experiment generates a seque nce, 0(1 ), 0(2), ... , 0 (365) , of temperature measurements. With respect to t he two kinds of averages of stoc hastic processes, peop le make freque nt reference to both ensemble averages , s uch as "the average noonti me temperature for February 19," and time ave rages, such as the "average noontime temperature fo r 1986."
Example 13.4 Consider an experiment in whic h we record M(t ), t he numbe r of act ive calls at a telephone switch at t ime t , at eac h second over a n interval of 15 minutes . One t ria l of the experiment might yield t he sample function rn,(t , s) shown in Figure 13.2. Each t ime we perform the experiment , we would observe some other funct ion rn,(t , s) . The exact m (t , s) that we do observe wi ll depend on ma ny random variables inc lud ing the number of ca lls at the start of the observatio n pe riod , t he arrival t imes of the new cal ls, and the duration of each ca ll. An ensemb le average is the average nu m ber of calls in progress at t = 403 seconds. A time average is t he ave rage nu mber of ca lls in progress during a specific 15-m inute interva l.
The fur1damenta l diffeirer1ce bet ween Exarnples 13.3 a nd 13.4 and experirner1ts frorn earlier chapters is t hc:.-tt the rar1dornness of the experiment depends explicitly on tirne. 1!{oreo·ver , t he conclusior1s t r1at vie draw frorn our observations -vvill depend on tirne. For example, in the Newark t ernperature rneasl1rernents, v.re would expect t he t emperatures 0(1), ... , 0(30) during t he rnonth of J anuary to be lo-vv in corr1parison t o t r1e terr1perattlres 0 (18 1) , ... , 0(210) in t r1e rniddle of sumrr1er. In this case, t he r andornr1ess -vve observe v1ill depend or1 the absolute time of our obser vation. ·\'¥ e rnight also expect that for a d ay t that is within a fe-vv d ays oft' , t he terr1peratures 0( t ) and 0 (t') are likel}' to be similar. Ir1 this case, v.re see t r1at the randorr1ness v.re observe m a}' depend on the time differer1ce bet v.reer1 obser\rations. We will see that ch aracterizir1g t he effect s of the absolute tirr1e of an observation and t he relative
[ 13.1
DEFINITIONS AND EXAMPLES
433
t irr1e bet\veen observations \vill be a sigr1ificant step tovvard t1nderstanding stoch astic processes. ~-
Example
13.5~=~
Suppose that at time instants T = 0 , 1, 2, ... , we ro ll a die and record the outcome JVr where 1 < Nr < 6. We then define the random process X(t) such that fo r T < t < T + 1, X(t) = l'lr . In this case, the experiment consists of an infinite sequence of rolls and a sample function is just the waveform corresponding to the pa rticu la r sequence of rolls. Th is mapping is depicted on the right. c:::==
x(t,
s,)
s, 1,2,6,3,... 1
t
Example 13.6
In a quaternary phase shift keying (Q PSK) communications system , one of four equal ly probable symbols s 0 , ... , s 3 is transmitted in T seconds . If symbol Si is sent, a waveform x(t , si) = cos( 2111·0 t + 11/4 + i11/ 2) is transmitted during the interval [O, T]. In this examp le, the experiment is to transmit one symbol over [O , T] seconds and each sample function has duration T. In a rea l communications system , a symbol is transmitted every T seconds and an experiment is to transmit j symbols over [O ,jT ] seconds. In this case, an outcome corresponds to a sequence of j symbols, and a samp le function has duration jT seconds.
Alt11ougl1 the stochastic process rnodel ir1 F igt1re 13. l and Defir1ition 13. l refers to one experirr1ent producing an observation s , associated vvith a sarr1ple function x(t , s), our experience vvith practical applicatior1s of stochastic processes can better be described ir1 t erms of a. n or1going seqt1ence of observations of randorr1 events. In the experirnent of Example 13.4, if we observe rn( l 7, s) = 22 calls in progress after 17 seconds , then vve know that llnless in the r1ext second at least one of t he 22 calls ends or one or rnore new calls begin, m(18, s) would rernain at 22. We could say that each second we perforrr1 a n experiment to observe t11e nt1mber of calls beginning and the number of calls er1d ing . In this sense, the sarr1ple functior1 rn,(t , s) is the restllt of a seqt1er1ce of experimer1ts , \vith a r1ew experiment performed every secor1d. The obser vations of each experirnent produce sever al r andorr1 variables relat ed to the sample functior1s of t 11e stochastic process .
Ex a m p Ie 13 . ri-::::::::::11T he observations related to the waveform
rn,(t, s)
in Example 13.4 could be
• m,(O, s), the number of ongoing calls at the start of the experiment ,
• X 1 , ... , X 1n (O,s ) , the
remaining ti me in seconds of each of the
cal ls,
• N, the number of new ca lls that arrive during the experiment,
• S 1 , ... , SN,
the arrival times in seconds of the J\T new ca lls,
rn,(O, s)
ongoing
[ 434
CHAPTER 13
STOCHASTIC PROCESSES
• Y1 , ... , Y N , the ca 11 du rations in seconds of each of the fll new ca lls. Some thought wi ll show that samples of each of these random variables, by indicating w hen every cal l starts and ends , correspond to one sample funct ion m,(t, s) . Keep in mind that although these random variables completely specify m,(t, s) , t here are other sets of random variab les that also specify rn,(t, s). For example, instead of refe rring to the durat ion of each call, we could instead refer to the t im e at which each cal l ends. Th is yields a different but equ ivalent set of random variables corresponding to the sample function 777,(t, s) . Th is example emphasizes that stochastic processes can be quite complex in that each sample f unction 'JT1,(t, s) is related to a large number of rando m variables, each with its own probability model. A complete model of the entire process, NI(t) , is the model (joint probability mass funct ion or joint probability density function) of al l of the ind ividual random variables.
Just as v.re developed different v.rays of ar1a.lyzing discrete and continuous r andorn variables, \Ve can define ca.tegories of stoc11astic processes t11at can be an alyzed using different mathernatical techniques . To establish these categories, -vve characterize both the range of possible \ralues at any instant t as \vell as t11e tirne instants at v.rhich changes in the randorn process can occur.
Definition 13.4 Discrete-Value and Continuous-Value Processes X(t) is a discrete-value process if the set of all possible values of X(t) at all tirnes t is a co'un,table set S x ). other111ise X (t) is a continuous-value process.
Definition 13.5 Discrete- Time and Continuous-Time Processes The stochastic process X (t) is a discrete-time process if X (t) is defiri,ed on,ly fo r a set of t'irne in,stari,ts) tn = ri,T ) 'tnhere T is a cori,stari,t an,d ri, is ari, 'iri,teger; other'tuise X (t) is a continuous-time process. Ir1 F ig11r e 13.3 , v.re see t hat the corr1bir1ations of contirn1ous/ discrete t ime and cont in11ot1s/discrete va1t1e result in four categories. For a discrete-tirr1e process, the sarnple f11nction is cornpletely described b}r t11e ordered seqt1ence of r ar1dom variables X n = X ('nT).
=== Definition 13.6=== Random Sequence A ra/ndorn seqv,eri,ce Xn is an, ordered seqv,eri,ce of raridorn variables X 0 , X 1 , ... Quiz 13.1= = For the temperatt1re m eas11rerr1ents of Exa rnple 13.3 , construct exarr1ples of t he rneasurernent process st1ch t11at the process is (b) discr ete- tirr1e, cor1tir1t1ous-vaJt1e, (a) discrete-time , d iscr ete-vaJt1e, ( c) cor1tinuous-tirne , discrete-vaJt1e,
( d) continuot1s-tirr1e, cor1tinuous-value.
[ 13.2
435
RANDOM VARIABLES FROM RANDOM PROCESSES
Continuous-Time, Continuous-Value Discrete-Time, Continuous-Value
-
2
-""
.....
"-
>< ""
2 •
.....
'--·
0
><
"
-2 -1
• •
•
0
0
0.5
•
•
•
• •
0.5
0
1
Dis crete-Time, Discrete-Value
2
• • •
2
.....
>< "
•
{
Continuous-Time, Discrete-Value
~
•
•
I
-
• •
- 0.5
-1
1
•
•
•
-2 - 0.5
• •
~ '--·
0
><""""
-2
• •
•
•
•
•
0 •
-2
•
•
• • •
•
• • • -
- 1
- 0.5
0
0.5
1
-1
I
- 0.5
0
0.5
1
l
San1ple function s of four kinds of stoch astic processes. Xcc(t) is a con t inuoust ime, con t inuous-value process. Xdc(t) is discrete-t in1e, con t inuous-value p rocess obtained by san1pling X cc) every 0.1 seconds. R ounding X cc(t ) to t h e nearest integer yields X cct(t) , a cont inuous-t ime, d iscrete-value process. Lastly, Xctct(t ), a discret&time, d iscret&value process, can b e obtained eit h er by san1pling X ccL(t ) or by rounding X dc(t) . Figure 13.3
13.2
Random Variables from Random Processes The probability rr1odel for the ra r1dom process X (t ) s pecifies for all possible {t1 >... , t k} the joint PNIF P x(ti ),... ,X (tk)(x 1, ... , xk) or the joint PDF f'x (t 1 ) , ... ,X(t k)(x:1, ... ,xk )·
St1ppose we observe a st och astic process at a p art ic11lar time inst ant t 1 . In t his case, ea.cl1 t ime -vve perforrr1 t he experirnent , °'if.le obser ve a sample function x:(t , s) and t 11a.t sarnple ft1r1ction specifies the ·v alue of x( t 1 , s) . E a.ch time -vve perform t 11e experirr1en t, -vve 11a ve a ne~r s a.nd -vve observe a ne-vv x (t 1 > s) . Therefore, ea.ch x (t 1 , s) is a sarnple value of a random variable . SN'e use t 11e r1otation X (t 1 ) for this randorr1 variable. Like any other r andorr1 variable , it has either a. PDF f X (t 1 )(x) or a PMF Px(ti )(x) . Note that the notatior1 X (t ) can refer t o eit h er t11e r a.ndorn process or the r a.ndorn variable that corresponds t o the ·v alue of the randorr1 process at tirne t. As our an alysis progresses , wher1 -vve -vvrite X (t ) >it will be clear frorn the context -vvhether we a.re referring t o the ent ire process or t o or1e r a.r1dom variable.
- - - Example 13.8:- - 1nExample13 .5 of repeatedly rolling a die, what is t he PMF of X (3.5) ? The random variable X (3.5) is the value of the die ro ll at time 3. In this case,
[ 436
CHAPTER 13
ST OCHAS TIC PROCESSES
Px (3.5) (x) =
1/ 6
;i; =
1, ... , 6,
0
otherw ise.
(13.3)
Example 13.9 Let X (t) = R I cos27rf t i be a rectifi ed cosi ne signal having a rand o m am plit ude R w it h t he expo nenti al PDF
fR (r) =
> 0. -
..l e-r/ 10 10 _,
T
0
otherw ise.
(13.4)
I
W ha t is th e P D F f·x (t)(x )?
.. ... .. .. .. ... .. ... .. .. ... .. .. .. ... .. ... .. .. ... .. .. .. ... ... .. .. .. ... .. .. ... .. ... .
Since
X (t) > 0 f or all t, P [X(t) < ;i;) = 0 f or
;i;
< 0.
If
;i;
> 0,
and
cos27rf.t > 0,
P [X(t) < ;i;) = P [R < :i;/ lcos27rf.t1) = W hen
lo
:r:/lcos 21fftl
f R(r) dr
= 1 - e - :i:/ lOlca; 27rft1.
(13 .5)
cos27rft -=J 0, the co m plete CDF of X (t ) is 0 1 - e-:1;/10 lcos27rft 1
W hen cos 27r ft-# 0, t he PDF of
fx (t) ( :i;) =
=
(13.6)
X (t) is 1 e - x/10lcos 27rft1 10lca;27rft1 _,
x > 0,
0
otherwise.
dFx(t) (x)
,. G,X 1
x< O ' x > o.
W hen cos27rft = 0 co rrespond ing R m ay be. In thi s case, 1·x(t)(x) = vari able f o r each val ue of t.
(13. 7)
tot = 7r/2 + k7r, X (t) = 0 no m atter how la rge o(x). In thi s exa m ple, there is a different ra ndom
W ith respect to a single randorn ·var iable X , we four1d that all the properties of X are determined from t l1e P DF f x( x) . Sirr1ilarly, for a pair of randorr1 ·variables X 1 , X2 , v.re r1eeded the joint PDF fx 1 ,x2 (:i; 1 ,:i;2) . In p a rticltlar , for the pair of randorn ·var iables, v.re found that the marginal PDF 's f x 1 (x 1 ) and f x 2 (x 2 ) were not er1ol1gh to describe the pair of randorn variables. A sirnilar situatior1 exists for r ar1dorr1 processes . If-vve sarnple a process X(t) at k tirneir1stants t 1 , .. . ,tk, -vve obtain t11e k-dirnensional randorr1 vector X = [X (t 1 ) X (tk)] '. To ans-\ver questions about the random process X (t), -vve must be able to ans-vver questior1s abol1t any randorn ·vector X = [X (t 1 ) X(tk) ]' for ari,y value of k ari,d an,y set of tirne iristari,ts t 1 , ... , t k· In Section 8.1, t11e randorr1 ·vector is described by the joint P MF Px(x ) for a discrete-vall1e process X (t) orb}' the j oint PDF fx(x) for a cor1tinuous-value process.
[ 13.3
INDEPENDENT, IDENTICALLY DISTRIBUTED RANDOM SEQUENCES
437
F or a r andorr1variable X , v.re could describe X by its PDF f'x( x), v.rithout specifying t 11e exact underlyir1g experirr1er1t . Ir1 t he sam e way, kr1owledge of the joint P DF f 'x (t i ),. .. ,X(t k )(x; 1 , .. . , Xk) for a.11 k will a.llov.r llS t o descr ibe a r a.ndorn process -vvithot1t r eference t o an 11nderlying exper iment . This is conven ient because rr1a.ny experirnents lea.cl to t11e s~1rne stoch astic process. This is ar1alogous to t 11e situation -vve described earlier in w11ich more t han one experiment (for exarnple, flipping a. coin or transmitting one b it) produces the sarne r ar1dom variable. In Section 13.1 , there a.r e tv.ro exarr1ples of r a.ndorn processes b ased on rr1easurements. The rea.1--vvorld factors t hat infiuer1ce these rr1ea.surements can be ·very cornplicated. For exarr1ple, t he sequence of daily t ernper a.tures of Example 13.3 is the result of a. ·very large dynarnic weather syst em t11at is or1l}' p art ially understood. Just as vie developed r andom ·v ariab les frorr1 idealized models of experirner1ts , v..re v..rill construct r andom processes t hat a.re idealized models of real phenorr1er1a . The next t 11ree sections ex arr1ir1e the probab ility rr1odels of sp ecific t ypes of stoch astic processes. Quiz 13.2 In a production lir1e for 1000 resistors, t11e actu al r esistance in ohrns of each r esistor is a. l1r1iform (950 , 1050) r ar1dom variable R. The r esistan ces of different r esistors a.r e independent. The resistor cornpany ha.s a.n order for 1% resistors with a. r esistan ce betweer1 990 n a.nd 1010 n. An a utomatic tester t akes one resistor p er second and rneast1res its exact resistance. (T11is t est takes or1e second. ) The ra.ndorr1 process JV(t) denotes t he nt1rnber of 1% resistors found ir1 t seconds. The ra.ndorr1variable T r seconds is the elapsed t irne at v.r11ich r 1% resistors a.re fot1r1d.
n
(a) (b ) ( c) (d ) (e)
W 11a.t is '[J, the probabilit}' t hat any single resistor is a 1% resistor? '\i\That is the P MF of J\T(t )? W hat is E[T1 ] seconds, the expect ed tirne to fir1d t he first 1% resistor? W hat is t he probability t hat the first 1% resistor is found in exactly 5 seconds? Ifthe automatic tester finds the first 1% resistor ir110 seconds, v.rhat is E [T 2 IT 1 = 10], t he conditional expect ed ·v alue of the t irne of findir1g t he second 1% resistor?
13.3
Independent, Identically Distributed Random Sequences
The iid r andorn sequence X 1 , X 2 , . .. is a discret e-t im e stochastic process consisting of a seql1ence of independent, identically distributed random variables. An independent ident ically distribl1ted (iid) randorn sequence is a randorn seqt1ence X n in v.rhich ... , X_ 2 , X_ 1 , X 0 , X 1 , X 2 , . . . are iid randorn ·v ariables . An iid rar1dom sequen ce occurs \iVhenever \iVe p erforrn indep endent tria ls of an experirner1t at a
[ 438
CHAPTER 13
STOCHASTIC PROCESSES
constant rate. An iid r andorr1 seq11en ce car1 be either discret e-valt1e or cont inuot1svalue. In the discrete case, each r andom variable X i h as P 1!fF Pxi(x) = Px(x) , "'' hile in the continuot1s ca,se, each X i has PDF f"xi(x) = f'x(:i;) .
Example 13.10 In Q uiz 13.2, each independe nt resistor test required exactly 1 second . Let Rn equal the number of 1% resistors found during m in ute ri. The random variable Rn has the binom ial PMF
Pnn (r) =
(~O)pr(l -
p)60-r _
(13 .8)
Since each resistor is a 1% resistor independent of all other resistors, the number of 1% res isto rs found in each minute is independent of t he number found in other minutes. Thus R 1 , R 2, ... is an iid ra ndom sequence.
-----:= Example 13.11 = = In the absence of a transmitted signal, the o utput of a matched fi lter in a digita l communications system is an iid sequenceX1 , X2 , ... of Gaussian (0 , 0") random variables.
X n ] ' is
For an iid randorn seq11er1ce, the probability model of X = [X1 easy to "'' rite since it is the product of t11e individual P MFs or PDFs.
Theorem 13.1==----i Let X n de'note an, iid ran,dorn seq?J,e'nce. For a discrete -va[1J,e process) the sarnple / X nk J has join,t P MF vector X = [X n1 k
Px (x ) = Px (x;1) Px (x;2) · · · Px (xk) =
II Px (x;i ) . 'i = l
For a cor1,t'iri'1J,O'IJ,S-'ualv,e JJrocess) the joirit PDF of X = [ X n1
1
· · • ,
X nk] is
k
f"x (x) =
f x (x1) f"x (x2) · · · f"x (xk)
=
II f'x (x;i ) · 'i= l
Of all iid randorr1 sequences , perl1aps the B ernoulli rar1dom sequence is the sirnplest.
- - - Definition 13.1; - - -Bernoulli Process A B ern,011,lli (p) process X n is an, i'id ran,dorn seq'1J,en,ce in, 'tvhich each X n 1,s a B ern,O'/J,lli (IJ) raridorn variable.
[ 13.4
THE POISSON PROCESS
439
=== Example 13.12===ln a common model for communications , the output
X 1 , X 2 , ... of a binary source is
modeled as a Bernoulli (TJ = 1/ 2) process.
===- Example 13.13 Each day , we buy a ticket for the New York Pick 4 lottery . X n = 1 if our t icket on day n, is a winner; otherwise, X n = 0. The random sequence X n is a Bernoulli process. c:::==
Example 13.14
For the resistor process in Quiz 13.2 , let ·y n = 1 if, in the rith second, we find a 1 % resistor; otherwise ·y n = 0. The random sequence Yn is a Bernoulli process.
Example 13.15 For a Bernoulli (p ) process X n, find the joint PMF of X = [X1 For a single sample X ,;, , we can write the Bernoulli PMF in the following way :
When xi E
{O, 1}
for 'i
r/l:i (l - rJ)1-xi
x; ,;, E
0
otherwise.
= 1, ... ,ri, the joint PMF can
{O, 1} ,
(13.9)
be written as
n
Px (x ) = IJ r/i:i(l - rJ) 1- xi = JJk(l - rJ)n-k ,
(13. 10)
'l = l
where k = x 1
Px (x )
=
+ · · · + Xn . The complete expression for the joint PM F is p:r;1 +· ·· +~1;n (1 - '[J )n- (:r;1 +· ·· +xn)
0
Xi
E { 0, 1} , 'i =
1, ... , T/, ,
otherwise.
(13.11)
Quiz 13.3 For an iid randorr1 seql1en ce X n of Gaussian (0, 1) random variables, find the joint 1 PDF of X = [X1 X 1n] •
13.4
The Poisson Process The Poisson process is a rnemoryless count ing process in w11ich ar1 arrival at a partic11lar instant is independer1t of an arrival at any other ir1stant.
[ 440
CHAPTER 13
STOCHASTIC PROCESSES
N(J) 4 I
5
.
4 . 3 '
2 . J
.
S,
. S,
..
Si
i+X, •u x~~•..1•--~ Figure 13.4
.
'
S,
•1• X, •1•
Xs
S1 ••
.
I
~
Sa n1ple path of a count ing process.
A count ing process JV (t) starts at t ime 0 and counts t11e occurren ces of events . These events a re ger1erally called arrivals becal1se counting processes are most often tlsed to rnodel the a rrivals of custorners at a ser vice facility. However , since counting processes ha·ve m an:yr applications, \Ve v.rill speak about a rri·vals v.rithout saying v.rhat is a rr1 \rir1g. Since we st a rt at tirne t = 0 , 'n (t, s) = 0 for all t < 0. Also, t he nurr1ber of a rri\rals up t o a ny t > 0 is a n ir1teger that canr1ot d ecrease vvit h t irr1e.
- - - Definition 13.8:= --- Counting Process A stochastic process N (t) is a counting process if fo r e1;ery s arnple f v,n,cti ori, 'n( t , s) = 0 for t < 0 an,d r1,(t , s ) is in,teger-value d arid n,on,decr easin,g 'tuith tirne. We car1 thir1k of N (t ) as cour1t ing the nurnber of custorners t hat arrive at a systerr1 dl1ring t11e ir1terval (0 , t]. A t ypical sarr1ple p ath of J\T(t ) is sket ched in Figure 13.4. The jl1mps in the sample funct ior1 of a COl1nting process rr1a rk t he arrivals , a nd t 11e nl1rnber of arrivals in t 11e interval (t 0 , t 1 ] is just J\T(t 1 ) - N (t 0 ) . We car1 use a B err1oulli process X 1 , X 2 , ... t o d eri\re a sirnple countir1g process. In p a rticula r , consider a s m all t ime step of size ~ seconds su c11 that there is one arrival in t11e interval (ri,~ , (ri, + 1 ) ~] if and or1ly if X n = 1. For a n aver age a rrival r at e A > 0 a rrivals/second, we car1 choose ~ Sl1ch that ,A~ << 1. I n t his case, v.re let t he success proba bility of X n be A~. This irr1plies that the r1urnber of a rriva.ls lVrn b efore tirr1e T = rn, ~ h as t he binorr1ial PMF
PN (ri) = rn
(rn) 77,
(,AT/rn,)n (l - ,AT/m,)rn - n .
Ir1 Theorerr1 3.8, vve shov.red that as rn, --+ oo, or equivalentl:yr as Nrn becorr1es a Poisson r andorr1 variable N(T ) v.rit h PMF
(,AT)ne->-T /ri,!
'T/,
0
otherwise.
=
(13.12) ~
0, 1, 2, ... )
--+ 0 , t he P l\/IF of
(13.13)
[ 13.4
441
THE POISSON PROCESS
vVe can generalize t his argurr1ent to say that for an}' interval ( t 0 > t 1 ] , the r1urr1ber of arrivals "'' ould have a P oisson PMF w ith p aram eter >..T vvhere T = t 1 - to . ]\/Ioreo·ver , t11e nurr1ber of a rrivals in (to, t 1 ] d eper1ds or1 the independent Bernoulli trials corresponding to that interval. T hus the r1l1rnber of arriva ls in nonoverla pping intervals 'ivill b e independent. In the lirnit as ~ -+ 0, 'ive 11ave obtained a countir1g process in vv11ich the nurr1ber of arrivals in an}' interva l is a Poisson ra ndom variable indep e ndent of the arri'ir[tls in any other nonoverlapping ir1terval. '\"!Ve call this lirnitir1g process a Poisson, process.
Definition 13. 9 Poisson Process A cov,n,tin,g process 1'l(t) is a Poisson process of rate >.. if (a) Th e 'n'urnber of arrivals ir1, an,y in,ter·ual (to, t1L N(t1) - J\T(to), is a Poisson, rar1,dorn variable 111ith expected val'ue >.. (t 1 - t 0 ). {b) For ariy pair of rion,overlappirig in,terval,s (t 0 , t 1 ] ar1,d (t~ , t~ ], the n/u,rnber of arr'ivals in, each ir1,terval, JV(t 1 ) - N(t 0 ) an,d J\T (t~) - J\T (t~), respectively, are in,depen,den,t raridorri variables.
vVe call >.. t11e rate of the process b ecau se t11e expected nl1rnber of arrivals per llnit t ime is E[J\T(t) ]/t = .A. B y the d efir1ition of a Poissor1 ra ndorr1 variable, M = JV(t 1 ) - N(t 0 ) has the P NIF [>- (t1 -to) ]= e- >-(t1 -to) rn !
PM(rn,) =
7Tl, = 0) 1, ... ,
(13.14)
other'ivise.
0
For a set of tirne instants t 1 < t 2 < · · · < t k, we can u se the property that the nurr1ber of arrivals in nono'irerlapping intervals are indeper1dent to 'ivrite t11e joint PMF of N(t 1), ... , N(t k) as a product of probabilities .
=== Theorem 13.2'==::::::i For a Poissori process JV(t) of rate>.., th e joirit PMF of N for ordered tirne ir1,stan,ces t 1 < · · · < t k, is
= [N(t 1 )> ... , N(tk)]',
n 1 - °' 1 a . e a . n 2 - n1 e -O< ·;inil (n2 - n 1)!
0<
0
other'1nise,
Proof Let M l = N(t1) and for i
n,1
< · · · < 'nk,
> 1, let Mi = N(ti ) - N(ti- 1) . By t he definition of t he
Poisson process, JV! 1 , . . . , Mk is a collection of independen t Poisson random variables such t hat E[J\fi) = cxi . P::"J (n )
=
(n,1, n2 - n,1, ... , nk - n,k-1 ) jVJ 1 (n,1) PM2 (n,2 - nl) · · · P j\1k (n,k - n,k - 1)
P j\11
=P
(13.15)
,M 2 ,. . ., 1\ll k
The t heor em follo,vs by substit u t ing Equation (13 .14) for
.
P jVli (ni - n ,i- 1 ) .
(13.16)
[ 442
CHAPTER 13
STOCHASTIC PROCESSES
Keep in rr1ind that t he indep er1dent intervals property of the Poisson process rnust 11old even for very srn a.11 interva.ls . For exarnple, the r1urr1ber of arriva.ls in (t , t + <5] rnust be independer1t of the arri·val process over [O, t] no rnatter 11ow sm all V\re c11oose fJ > 0. Essentially, the probability of an arrival during any instant is independent of the past history of t11e process. In this sense, the Poisson process is rnernoryless. This m emoryless property can also be seen when \Ve exarnine t he t irnes betV\reen arri\rals. As depicted ir1Figure1 3.4, the r andom tirne Xn bet\veen arri\ral n, - 1 arid arri\ral ri, is called then,th in,terarrival tirne. Ir1 addition , \Ve call the tirr1e X 1 of t11e first arrival the first inter a,rrival t irne even though there is r10 pre\rious arri\ral.
- - - Theorem 13.3:- - For a Poisson, process of rate ,\ 7 the in,terarrival tirnes X 1 , X 2 , ... are ari iid ra/ndorn sequen,ce 'tJJith the ex;pon,en,tial PDF f x(x) =
Proof Given X 1 = x1, X2
x > - O other'tJJise. )
0
= : 1;2, ... , X n- 1 = Xn - 1, arrival n, t n- 1 = : 1;1 +
1 occurs at tim e
· · · + Xn - 1·
(13.17)
For x > 0, X n > x if a nd only if there are no arrivals in t he interval (tn- 1, tn - 1 + x]. The number of arrivals in (tn- 1, tn - 1 + x] is independent of the past history described by X1, ... ,Xn- 1· This implies
Thus X 11 is independent of X
1, . . . ,
Xn - 1 and has the exponential CDF l -
Fxn (::i;) = 1 - P [Xn > ::i;] = { O
e - Ax
::i; > 0 ,
otherv;.rise.
From t he derivative of t he CDF, we see t hat Xn has t he exponen t ial PDF fx n(x) in t he statem ent of t he t heorem.
(13.18)
=
f x( x)
Frorr1 a sample fur1ction of N(t) , V\re can identify the ir1terarrival tirr1es X 1 , X2 and so on. Sirrrilarly, frorn the inter arrival t irnes X 1 , X2, ... , we car1 construct the sarr1ple function of the Poisson process N(t) . This implies t:hat an eql1ivalent r epreser1tation of the Poisson process is the iid r andorn sequence X 1 , X 2 , ... of exponentially distributed interarrival tirnes .
.--------== Theorem 13.4 A cov.'ntirig process v1ith in,depen,den,t ex;1Jorien,tial (..\) in,terarrivals X 1 , X 2 , ... 1,s a Poisson, process of rate ..\.
[ 13.5
PROPERTIES OF THE POISSON PROCESS
443
==--Quiz 13.4--== Data packet s transmitted by a rnoderr1 o·ver a pl1one lir1e forrn a Poisson process of r at e 10 p acket s/sec. Using NJk t o denote t 11e r1urnber of packets transmitted in the kt h hour, find the joint P1!fF of Ji.III and M 2.
13.5
Prope rties of the Poisson Process The st1rr1 N(t) = N I(t) + N 2(t) of independent Poisson processes N I(t) a nd J\T2(t ) is a Poisson process . The Poisson process N(t) can be decorr1posed into two independer1t Poisson processes J\TI (t) and N2(t ).
The rnerr1oryless property of t he Poisson process can also be seer1 in t he exponent ial ir1ter arrival times. Since P (X n > x] = e - >-x, t 11e condit ional probability t 11at X n > t + ;r: , given X n > t , is
P [x n
> t + X Ix n > t ] =
P [Xn > t
+ X , X n > t] = e - .A:.e .
p (X
n
>t
]
(13.19)
The ir1terpretation of Eq11ation (13.19) is t h at if the arrival h as not occurred by tirr1e t , t h e addit ior1al t irne until t h e arrival, X n - t , h as t he sarr1e exponent ial distribution as X n. That is, no rnatter how long we h a;ve v.raited for the arri·val, the r em air1ing tirne until the a,rri·val rem ains an exponential ( ,\) randorn ·variable. The consequence is that if -vve st a rt t o \Vat ch a Poisson process at an:y t irr1e t , we see a stochastic process t11at is indistir1guishable frorr1 a Poisson process st arted at t ime
o. T11is interpretation is the basis for \va:ys of corr1posing and decorr1posing Poisson processes . First \Ve consider the sum N (t ) = J\TI(t ) + J\T2(t) of t wo ir1dep endent Poisson processes N I(t ) a nd N 2 (t). Clearl}', N (t ) is a co11r1t ing process since any sarnple functior1 of N (t ) is nondecreasing. Since interar1ival tirnes of each JVi (t ) are cont inuous expor1enti:1,l r ar1dom \rariables , t 11e probability t hat bot h processes have a rrivals at the same tirr1e is zero. Thus N (t ) ir1creases by one a rri\ra l at a time. Furt her , Theorerr1 9. 7 showed t 11at t11e surr1 of independent Poisson r andorr1 variables is also Poissori. Thus for any time t 0 , N (to ) = J\TI (to)+ J\T2(t 0 ) is a Poisson randorr1 variable. This suggests (bt1t does not prove) that N(t) is a Poisson process. Ir1 the follo-vving theorem a,nd proof, we verify t 11is conj ect t1re b}' shoV\ring that JV(t ) has ir1deper1dent exponent ial interarriva1 tirnes.
Theorem 13.5 L et N 1 (t ) arid N 2 (t) be t'1JJ0 irideper1,der1,tPoissor1, processes of rates ,\ 1 an,d ,\ 2 . Th e co'un,tin,g JJrocess JV(t ) = N I (t ) + N 2 (t ) is a, Poisson, process of rate ,\ 1 + ,\ 2 .
[ 444
CHAPTER 13
STOCHASTIC PROCESSES
Proof vVe sho'v t hat the interarrival t imes of t he N(t) process are iid exponen tial rando1n
variables. Suppose the N(t) process just had an arrival. \t\lhether t hat arrival \Vas from N1(t) or N2(t), Xi, the residual t ime until the next arrival of Ni(t), has an exponential PDF since Ni(t) is a memoryless process. Further, X, the next interarrival time of t he N(t) process, can be Vi'ritten as X = min(X1 , X 2). Since X 1 and X 2 ar e independent of the past interarrival t imes, X must be independent of t he past interarrival t imes. In addition, \Ve observe that X > x if and only if X1 > x and X2 > x. This implies P[X > x] = P[X1 > x, X2 > ::r]. Since N1(t) and N2(t) are independent processes, X1 and X 2 are independen t random variables so that
P [X
> x]
= P [X1
Thus X is an exponen t ial (..\1
> x] P [X2 > x]
+ .A2)
= {
~-(A,+A,)x
x x
< 0, > o.
(13.20)
random variable.
We derived t11e Poisson process of r ate ,\ as the lirniting case (as ~ ---+ 0) of a Bernoulli arrival process that has ar1 arrival ir1 an ir1terval of ler1gth ~ v.rith probabilit}' .A~. vV11en we consider the st1rr1 of two independen t Poisson processes JV1(t) + N2(t) O\rer an inter\ral of length~ ' eac11 process N;,(t) can h ave a n a rrival wit11 probability Ai ~· The probability that both processes 11ave an arrival is ,\ 1 ,\ 2 ~ 2 . As ~ ---+ 0, ~ 2 << ~ and the probabilit}' of tvvo arrivals becorr1es insignificant in comparison to the probability of a single arrival.
Example 13.16 Cars, trucks, and buses arrive at a to ll booth as independent Po isson p rocesses w ith rates Ac = 1.2 cars/ m inute, At = 0.9 trucks/ minute , and AtJ = 0.7 buses/ minute. In a 10- minute interva l, what is the PMF of JV, the number of vehicles (cars, trucks, or buses) tha t arrive?
By T heorem 13.5 , the a rriva I of vehicles is a Poisson process of rate,\ = 1.2+0 .9+0. 7 2.8 vehic les per m inute. In a 10-mi nute interval, .AT = 28 and /ll has PM F 28n e- 28 /rd
n, = 0, 1, 2, ... ,
0
otherwise.
=
(13.21)
Theorern 13.5 desc1ibes the cornposition of a Poissor1 process. No\v we exarnine the decornposition of a Poissor1 process into t wo separate processes. Suppose whenever a Poisson process N (t) has an arri\ral, \Ve flip a biased coin to decide Vi' hether to call this a t ype 1 or t ype 2 arrival. That is, each arrival of N (t) is ir1dependentl}' labeled eit11er type 1 vvit11 probability p or type 2 vvith probability 1 - r>. T 11is r esults in tViro countir1g processes, N 1 (t) and N 2 (t), \vl1ere Ni(t) denotes the nurnber of type i arrivals b}' tirne t. We Vi'ill call this procedlire of breaking dovvn the J\T(t) processes into tvvo countir1g processes a Bern,011,lli decornposition,.
- - - Theorem 13.6- - The couritirig processes N 1 ( t) an,d N 2 ( t) deri'ved frorn a B errioulli decornposition, of the Poissori process JV ( t) are ir1,deperider1,t Poissori 1>rocesses 111ith rates ,\p arid .A(l - r>) .
[ 13.5
PROPERTIES OF THE POISSON PROCESS
445
Proof Let Xii) , X~i), . .. denote t he inter arrival t imes of t h e process N i (t) . \fi.l e will verify
1 2 2 · in · d epen d en t ran · d om sequ ences, eac h 'v1t · h ext h at X 1(l) , x(2 ) , .. . an d x 1C) , x 2<) , ... a re ponential CDFs. \Ne first consider t he interarrival t imes of t h e N1(t) process. Suppose time t marked a rrival n, - 1 of t h e N 1 (t) process. The next inter arrival t ime X 1\ 1 ) depends only on future coin flips and future arrivals of t he rnernoryless N(t) process a nd t hus is indepe nden t of a ll past interarrival t imes of either the Ni(t) or N2(t) processes. This implies t h e N1 (t) process is independen t of t he N2(t) process. A ll t h at remains is to sho'v t h at X 1\ 1 ) is a n exponen tial r andom variable. We observe t hat X 1\ 1 ) > x if t here ar e no type 1 a rrivals in t he interval [t, t + x] . For t h e interval [t, t + x], let Ni a nd N denote t h e n tun ber of arrivals of t h e N 1 ( t) a nd N (t) processes. In terms of N i and N, ,;ve can 'vri te 00
1
P [ X1\
)
> ::r] = PN1 (0) = L PN1 IN (Oln) PN (ri) .
(13.22)
n =O
Given N = ri, total a rrivals, N1 = 0 if each of t hese arrivals is labeled typ e 2. This ,;vill occur wit h probability PN1 IN(Oln,) = (1 - p ) 11 • Thus
P
[x
(l )
n
·] _ >::i, -
~(
L.__,1
_
p
11 )n (>.x) e- >.x = e- p>.x ~ [(1 - p),\x)1ie- (l - p)>.x
n =O
f
Tl,.
n!
L..., n =O
(13.23)
1
Thus P[X~1 ) > x ] = e- p>.x ; each X 1\ .1 ) h as a n expone nt ial PDF wit h mea n l/(p>.). It follo,vs t h at l\T1 (t ) is a l=>oisson process of r ate .\1 = p,\. The same argume n t can be used 2 to sho'v t h at each X 1\ . ) h as an exponen t ial PDF wit h mean 1/[(1 - p).\], imply ing J\T2 (t) is a Poisson process of rate >. 2 = (1 - p )>..
Example 13.17=== A corporate Web server records hits (requests for HTML documents) as a Poisson process at a rate of 10 hits per second. Each page is e it her an internal request (with probability 0.7) from the corporate intranet or an externa l request (with probability 0.3) from the Internet . Over a 10-minute interva l, what is the joint PMF of I, the number of internal requests, and X, the number of external requests? By Theorem 13.6, the internal and externa l request arrivals are independent Poisson processes with rates of 7 and 3 hits per second. In a 10-m inute (600-second) interva l, I and X are independent Poisson random variables with parameters a 1 = 7(600) = 4200 and ax = 3(600) = 1800 hits. The joint PM F of I and X is P1,x (i,1';) = P1 (i) Px(x) (4200)ie-4200 (1800) xe- 1800 i!
'i , x E
0
otherwise.
{O, 1, ... } ,
(13.24)
The Berr1oulli decorr1position of two Poisson processes and the sum of two Poisson processes are closel:y relat ed. Theorern 13.6 sa}'S two independent Poisson processes
[ 446
CHAPTER 13
ST OCHAS TIC PROCESSES
JV1 (t ) and J\T2 (t ) with rates ,\ 1 and ,\ 2 can be constrl1ct ed from a Bernol1lli decorr1position of a Poisson process J\T(t) vvith r ate ,\ 1 + .A 2 by choosing t he success probability to be JJ = .A 1 / ( ,\ 1 + ,\ 2 ) . Furt 11errnore, gi·ven these tvvo independer1t Poissor1 processes N 1 ( t ) and J\T2 ( t) der ived frorr1 the Berr1ot1lli decornposit ion, the original N (t ) process is t he s um of t 11e tvvo processes . T 11at is, N(t ) = N 1 (t) + N 2 (t). Thus whenever we observe tv.ro indep endent Poisson processes, vve can t hir1k of those processes as beir1g derived frorn a Bernoulli decornposition of a single process. T11is vie\v leads to t 11e following conclusion.
Theorem 13.1 L et N (t ) = J\T1 ( t )+ J\T2 ( t) be the S'tJ,rn of tv.Jo in,dep en,den,t Poisson, processes V.J'ith rates ,\ 1 a'nd ,\ 2 . Gi'IJen, that tlie N (t) process has an, arri'IJal) the con,dition,al probability that the arri'IJal is frorn N 1(t) is .A 1/(.A 1 + .A2). Proof We can v ie'v Ni(t) a nd N2(t) as bein g d erived from a Bernou lli d ecomposit ion of N(t ) in which a n a rrival of ]'l(t ) is la b eled a type 1 arrival wit h prob a b ility A1 /(.A1 + .\2) .
By Theorem 13.6, Ni (t) a nd J\T2(t ) are independen t P oisson processes wit h rate A1 a nd A2, r espectively. l\/Ioreover, given a n arrival of t h e N(t ) process, t he condit iona l proba bility t h at an a rrival is a n a rrival of t he f\T1 ( t ) process is also A 1 / (A1 + A2) .
A second way to prove Tl1eorern 13.7 is outlir1ed ir1 Problem 13.5.5. Quiz 13.5
Let N(t ) be a Poisson process of r ate .A. Let N '(t ) be a process in which \ve count only even-r1t1rr1bered arri\r~tls; t 11at is, arriva.ls 2 , 4, 6, . .. , of t he process JV(t). Is N ' (t ) a Poisson process?
13.6
The Brownian Motion Process The Brownian motion process describes a or1e-dirr1ensional r andorr1 walk in \vh ich a t every instant , the position changes by a srnall increment t h at is ir1dependent of the current position and past history of t11e process. The positior1 change over any t irr1e interval is a Gaussian randorn variable with zero expected va.lt1e and variance proportional to t11e time interval.
The Poisson process is a.n example of a continuot1s-t irne, discret e-value stochastic process. Nov.r we \vill exa.rr1ine Brov.rnian rnotion, a continuous-tirne, cor1t inuot1svalue stoc11astic process.
===- Definition 13.10 Brownian Motion Process A Brownian motion p rocess Ml (t ) has the pr operty that W (O) = O; an,d fo r T > 0, W (t + T) - W (t) is a Ga'ussian, (0, j"(n'-) ra/ndorn '/Jariable that 'is in,deperiden,t of W(t') fo r all t' < t.
For Brownian motion, we can view W(t) as the position of a particle on a line. For a small time increment δ,

W(t + δ) = W(t) + [W(t + δ) − W(t)].   (13.25)
Although this expansion may seem trivial, by the definition of Brownian motion, the increment X = W(t + δ) − W(t) is independent of W(t) and is a Gaussian (0, √(αδ)) random variable. This property of the Brownian motion is called independent increments. Thus after a time step δ, the particle's position has moved by an amount X that is independent of the previous position W(t). The position change X may be positive or negative. Brownian motion was first described in 1827 by botanist Robert Brown when he was examining the movement of pollen grains in water. It was believed that the movement was the result of the internal processes of the living pollen. Brown found that the same movement could be observed for any finely ground mineral particles. In 1905, Albert Einstein identified the source of this movement as random collisions with water molecules in thermal motion. The Brownian motion process of Definition 13.10 describes this motion along one axis of motion. Brownian motion is another process for which we can derive the PDF of the sample vector W = [W(t_1), ..., W(t_k)]'.
Theorem 13.8
For the Brownian motion process W(t), the PDF of W = [W(t_1), ..., W(t_k)]' is

f_W(w) = ∏_{n=1}^{k} (1/√(2πα(t_n − t_{n−1}))) e^{−(w_n − w_{n−1})²/(2α(t_n − t_{n−1}))},

where t_0 = 0 and w_0 = 0.
Proof  Since W(0) = 0, W(t_1) = W(t_1) − W(0) is a Gaussian random variable. Given time instants t_1, ..., t_k, we define t_0 = 0 and, for n = 1, ..., k, we can define the increments X_n = W(t_n) − W(t_{n−1}). Note that X_1, ..., X_k are independent random variables such that X_n is Gaussian (0, √(α(t_n − t_{n−1}))), with PDF

f_{X_n}(x) = (1/√(2πα(t_n − t_{n−1}))) e^{−x²/(2α(t_n − t_{n−1}))}.   (13.26)
Note that W = w if and only if W_1 = w_1 and, for n = 2, ..., k, X_n = w_n − w_{n−1}. Although we omit some significant steps that can be found in Problem 13.6.5, this does imply

f_W(w) = ∏_{n=1}^{k} f_{X_n}(w_n − w_{n−1}).   (13.27)

The theorem follows from substitution of Equation (13.26) into Equation (13.27).
Quiz 13.6
Let W(t) be a Brownian motion process with variance Var[W(t)] = αt. Show that X(t) = W(t)/√α is a Brownian motion process with variance Var[X(t)] = t.
13.7
Expected Value and Correlation
The expected value of a stochastic process is a function of time. The autocovariance and autocorrelation are functions of two time variables. All three functions indicate the rate of change of the sample functions of a stochastic process.

In studying random variables, we often refer to properties of the probability model such as the expected value, the variance, the covariance, and the correlation. These parameters are a few numbers that summarize the complete probability model. In the case of stochastic processes, deterministic functions of time provide corresponding summaries of the properties of a complete model. For a stochastic process X(t), X(t_1), the value of a sample function at time instant t_1, is a random variable. Hence it has a PDF f_{X(t_1)}(x) and expected value E[X(t_1)]. Of course, once we know the PDF f_{X(t_1)}(x), everything we have learned about random variables and expected values can be applied to X(t_1) and E[X(t_1)]. Since E[X(t)] is simply a number for each value of t, the expected value E[X(t)] is a deterministic function of t. Since E[X(t)] is a somewhat cumbersome notation, the next definition is just a new notation that emphasizes that the expected value is a function of time.

Definition 13.11  The Expected Value of a Process
The expected value of a stochastic process X(t) is the deterministic function

μ_X(t) = E[X(t)].
Example 13.18
If R is a nonnegative random variable, find the expected value of X(t) = R|cos 2πft|.

The rectified cosine signal X(t) has expected value

μ_X(t) = E[R |cos 2πft|] = E[R] |cos 2πft|.   (13.28)
From the PDF f_{X(t)}(x), we can also calculate the variance of X(t). While the variance is of some interest, the covariance function of a stochastic process provides very important information about the time structure of the process. Recall that Cov[X, Y] is an indication of how much information random variable X provides about random variable Y. When the magnitude of the covariance is high, an observation of X provides an accurate indication of the value of Y. If the two random variables are observations of X(t) taken at two different times, t_1 seconds and t_2 = t_1 + τ seconds, the covariance indicates how much the process is likely to change in the τ seconds elapsed between t_1 and t_2. A high covariance indicates that the sample function is unlikely to change much in the τ-second interval. A covariance near zero suggests rapid change. This information is conveyed by the autocovariance function.
Definition 13.12  Autocovariance
The autocovariance function of the stochastic process X(t) is

C_X(t, τ) = Cov[X(t), X(t + τ)].

The autocovariance function of the random sequence X_n is

C_X[m, k] = Cov[X_m, X_{m+k}].
For random sequences, we have slightly modified the notation for autocovariance by placing the arguments in square brackets just as a reminder that the functions have integer arguments. For a continuous-time process X(t), the autocovariance definition at τ = 0 implies C_X(t, t) = Var[X(t)]. Equivalently, for k = 0, C_X[n, n] = Var[X_n]. The prefix auto of autocovariance emphasizes that C_X(t, τ) measures the covariance between two samples of the same process X(t). (There is also a cross-covariance function that describes the relationship between two different random processes.) The autocorrelation function of a stochastic process is closely related to the autocovariance function.
Definition 13.13  Autocorrelation Function
The autocorrelation function of the stochastic process X(t) is

R_X(t, τ) = E[X(t)X(t + τ)].

The autocorrelation function of the random sequence X_n is

R_X[m, k] = E[X_m X_{m+k}].
From Theorem 5.16(a), we have the following result.
Theorem 13.9
The autocorrelation and autocovariance functions of a process X(t) satisfy

C_X(t, τ) = R_X(t, τ) − μ_X(t)μ_X(t + τ).

The autocorrelation and autocovariance functions of a random sequence X_n satisfy

C_X[n, k] = R_X[n, k] − μ_X(n)μ_X(n + k).
Since the autocovariance and autocorrelation are so closely related, it is reasonable to ask why we need both of them. It would be possible to use only one or the other
in conjunction with the expected value μ_X(t). The answer is that each function has its uses. In particular, the autocovariance is more useful when we want to use X(t) to predict a future value X(t + τ). On the other hand, since R_X(t, 0) = E[X²(t)], the autocorrelation describes the average power of a random signal.
Example 13.19
Find the autocovariance C_X(t, τ) and autocorrelation R_X(t, τ) of the Brownian motion process X(t).

From the definition of the Brownian motion process, we know that μ_X(t) = 0. Thus the autocorrelation and autocovariance are equal: C_X(t, τ) = R_X(t, τ). To find the autocorrelation R_X(t, τ), we exploit the independent increments property of Brownian motion. For the moment, we assume τ ≥ 0 so we can write R_X(t, τ) = E[X(t)X(t + τ)]. Because the definition of Brownian motion refers to X(t + τ) − X(t), we introduce this quantity by substituting X(t + τ) = [X(t + τ) − X(t)] + X(t). The result is
R_X(t, τ) = E[X(t)[(X(t + τ) − X(t)) + X(t)]] = E[X(t)[X(t + τ) − X(t)]] + E[X²(t)].   (13.29)

By the definition of Brownian motion, X(t) and X(t + τ) − X(t) are independent, with zero expected value. This implies
E[X(t)[X(t + τ) − X(t)]] = E[X(t)] E[X(t + τ) − X(t)] = 0.   (13.30)
Furthermore, since E[X(t)] = 0, E[X²(t)] = Var[X(t)]. Therefore, Equation (13.29) implies
R_X(t, τ) = E[X²(t)] = αt.   (13.31)

When τ < 0, we can reverse the labels in the preceding argument to show that R_X(t, τ) = α(t + τ). For arbitrary t and τ we can combine these statements to write

R_X(t, τ) = α min{t, t + τ}.   (13.32)
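Equation (13.32) is easy to check by simulation. The MATLAB sketch below, with the assumed values α = 1, t = 2, and τ = 1, averages X(t)X(t + τ) over many independent sample paths built from Gaussian increments:

alpha = 1; dt = 0.01; tmax = 4; m = 10000;     % assumed parameters
n = round(tmax/dt);
X = cumsum(sqrt(alpha*dt)*randn(m,n),2);       % m Brownian paths sampled every dt
t = 2; tau = 1;                                % time and lag to test
i = round(t/dt); j = round((t+tau)/dt);
Rhat = mean(X(:,i).*X(:,j));                   % sample average of X(t)X(t+tau)
disp([Rhat alpha*min(t,t+tau)])                % compare with alpha*min{t, t+tau}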
Example 13.20
The input to a digital filter is an iid random sequence ..., X_{−1}, X_0, X_1, ... with E[X_i] = 0 and Var[X_i] = 1. The output ..., Y_{−1}, Y_0, Y_1, ... is related to the input by the formula

Y_n = X_n + X_{n−1}   for all integers n.   (13.33)

Find the expected value E[Y_n] and autocovariance function C_Y[m, k].
Because Y_i = X_i + X_{i−1}, we have from Theorem 5.10, E[Y_i] = E[X_i] + E[X_{i−1}] = 0. Before calculating C_Y[m, k], we observe that X_n being an iid random sequence with E[X_n] = 0 and Var[X_n] = 1 implies

C_X[m, k] = { 1,  k = 0,
              0,  otherwise.   (13.34)
For any integer k, we can write

C_Y[m, k] = E[Y_m Y_{m+k}]
          = E[(X_m + X_{m−1})(X_{m+k} + X_{m+k−1})]
          = E[X_m X_{m+k} + X_m X_{m+k−1} + X_{m−1} X_{m+k} + X_{m−1} X_{m+k−1}].   (13.35)

Since the expected value of a sum equals the sum of the expected values,

C_Y[m, k] = C_X[m, k] + C_X[m, k − 1] + C_X[m − 1, k + 1] + C_X[m − 1, k].   (13.36)
We still need to evaluate this expression for all k. For each value of k, some terms in Equation (13.36) will equal zero since C_X[m, k] = 0 for k ≠ 0. In particular, if |k| > 1, then k, k − 1, and k + 1 are nonzero, implying C_Y[m, k] = 0. When k = 0, we have

C_Y[m, 0] = C_X[m, 0] + C_X[m, −1] + C_X[m − 1, 1] + C_X[m − 1, 0] = 2.   (13.37)
For k = −1, we have

C_Y[m, −1] = C_X[m, −1] + C_X[m, −2] + C_X[m − 1, 0] + C_X[m − 1, −1] = 1.   (13.38)
The final case, k = 1, yields

C_Y[m, 1] = C_X[m, 1] + C_X[m, 0] + C_X[m − 1, 2] + C_X[m − 1, 1] = 1.   (13.39)
A complete expression for the autocovariance is

C_Y[m, k] = { 2 − |k|,  k = −1, 0, 1,
              0,        otherwise.   (13.40)
We see that since the filter output depends on the two previous inputs, the filter outputs Y_n and Y_{n+1} are correlated, whereas filter outputs that are two or more time instants apart are uncorrelated.
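A short simulation confirms Equation (13.40). The sketch below, with an assumed sequence length, generates the filter output for an iid Gaussian input and estimates C_Y[k] for k = 0, 1, 2, 3:

n = 100000;                                  % assumed sequence length
X = randn(n,1);                              % iid input with E[X]=0, Var[X]=1
Y = X(2:n) + X(1:n-1);                       % filter output Y_n = X_n + X_{n-1}
C = zeros(1,4);
for k = 0:3
   C(k+1) = mean(Y(1:end-k).*Y(1+k:end));    % sample autocovariance at lag k
end
disp(C)                                      % approximately [2 1 0 0]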
An interesting property of the autocovariance function found in Example 13.20 is that C_Y[m, k] depends only on k and not on m. In the next section, we learn that this is a property of a class of random sequences referred to as stationary random sequences.

Quiz 13.7
X(t) has expected value μ_X(t) and autocorrelation R_X(t, τ). We make the noisy observation Y(t) = X(t) + N(t), where N(t) is a random noise process independent of X(t) with μ_N(t) = 0 and autocorrelation R_N(t, τ). Find the expected value and autocorrelation of Y(t).
13.8
Stationary Processes  A stochastic process is stationary if the probability model does not vary with time.
Recall that in a stochastic process, X(t), there is a random variable X(t_1) at every time instant t_1 with PDF f_{X(t_1)}(x). For most random processes, the PDF f_{X(t_1)}(x) depends on t_1. For example, when we make daily temperature readings, we expect that readings taken in the winter will be lower than temperatures recorded in the summer. However, for a special class of random processes known as stationary processes, f_{X(t_1)}(x) does not depend on t_1. That is, for any two time instants t_1 and t_1 + τ,

f_{X(t_1)}(x) = f_{X(t_1+τ)}(x).   (13.41)

Therefore, in a stationary process, we observe the same random variable at all time instants. The key idea of stationarity is that the statistical properties of the process do not change with time. Equation (13.41) is a necessary condition but not a sufficient condition for a stationary process. Since the statistical properties of a random process are described by PDFs of random vectors [X(t_1), ..., X(t_m)], we have the following definition.
Definition 13.14  Stationary Process
A stochastic process X(t) is stationary if and only if for all sets of time instants t_1, ..., t_m, and any time difference τ,

f_{X(t_1),...,X(t_m)}(x_1, ..., x_m) = f_{X(t_1+τ),...,X(t_m+τ)}(x_1, ..., x_m).

A random sequence X_n is stationary if and only if for any set of integer time instants n_1, ..., n_m, and integer time difference k,

f_{X_{n_1},...,X_{n_m}}(x_1, ..., x_m) = f_{X_{n_1+k},...,X_{n_m+k}}(x_1, ..., x_m).
Generally it is not obvious whether a stochastic process is stationary. Usually a stochastic process is not stationary. However, proving or disproving stationarity can be tricky. Curious readers may wish to determine which of the processes in earlier examples are stationary.
Example 13.21
Is the Brownian motion process with parameter α introduced in Section 13.6 stationary?

For Brownian motion, X(t_1) is the Gaussian (0, √(αt_1)) random variable. Similarly, X(t_2) is Gaussian (0, √(αt_2)). Since X(t_1) and X(t_2) do not have the same variance, f_{X(t_1)}(x) ≠ f_{X(t_2)}(x), and the Brownian motion process is not stationary.
The following theorem applies to applications in which we modify one stochastic process to produce a new process. If the original process is stationary and the transformation is a linear operation, the new process is also stationary.
Theorem 13.10
Let X(t) be a stationary random process. For constants a > 0 and b, Y(t) = aX(t) + b is also a stationary process.

Proof  For an arbitrary set of time samples t_1, ..., t_n, we need to find the joint PDF of Y(t_1), ..., Y(t_n). We have solved this problem in Theorem 8.5 where we found that

f_{Y(t_1),...,Y(t_n)}(y_1, ..., y_n) = (1/|a|^n) f_{X(t_1),...,X(t_n)}((y_1 − b)/a, ..., (y_n − b)/a).   (13.42)
Since the process X(t) is stationary, we can write

f_{Y(t_1+τ),...,Y(t_n+τ)}(y_1, ..., y_n) = (1/a^n) f_{X(t_1+τ),...,X(t_n+τ)}((y_1 − b)/a, ..., (y_n − b)/a)
                                        = (1/a^n) f_{X(t_1),...,X(t_n)}((y_1 − b)/a, ..., (y_n − b)/a)
                                        = f_{Y(t_1),...,Y(t_n)}(y_1, ..., y_n).   (13.43)
Thus Y (t) is also a stationary random process.
There are many consequences of the time-invariant nature of a stationary random process. For example, setting m = 1 in Definition 13.14 leads immediately to Equation (13.41). Equation (13.41) implies, in turn, that the expected value function in Definition 13.11 is a constant. Furthermore, the autocovariance function and the autocorrelation function defined in Definition 13.12 and Definition 13.13 are independent of t and depend only on the time-difference variable τ. Therefore, we adopt the notation C_X(τ) and R_X(τ) for the autocovariance function and autocorrelation function of a stationary stochastic process.
Theorem 13.11
For a stationary process X(t), the expected value, the autocorrelation, and the autocovariance have the following properties for all t:
(a) μ_X(t) = μ_X,
(b) R_X(t, τ) = R_X(0, τ) = R_X(τ),
(c) C_X(t, τ) = R_X(τ) − μ_X² = C_X(τ).
For a stationary random sequence X_n the expected value, the autocorrelation, and the autocovariance satisfy for all n:
(a) E[X_n] = μ_X,
(b) R_X[n, k] = R_X[0, k] = R_X[k],
(c) C_X[n, k] = R_X[k] − μ_X² = C_X[k].
Proof  By Definition 13.14, stationarity of X(t) implies f_{X(t)}(x) = f_{X(0)}(x), so that

μ_X(t) = ∫_{−∞}^{∞} x f_{X(0)}(x) dx = μ_X(0).   (13.44)

Note that μ_X(0) is just a constant that we call μ_X. Also, by Definition 13.14,

f_{X(t),X(t+τ)}(x_1, x_2) = f_{X(0),X(τ)}(x_1, x_2),   (13.45)

so that

R_X(t, τ) = E[X(t)X(t + τ)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x_1 x_2 f_{X(0),X(τ)}(x_1, x_2) dx_1 dx_2   (13.46)
          = R_X(0, τ) = R_X(τ).   (13.47)
Lastly, by Theorem 13.9,

C_X(t, τ) = R_X(t, τ) − μ_X² = R_X(τ) − μ_X² = C_X(τ).   (13.48)

We obtain essentially the same relationships for random sequences by replacing X(t) and X(t + τ) with X_n and X_{n+k}.
Example 13.22
At the receiver of an AM radio, the received signal contains a cosine carrier signal at the carrier frequency f_c with a random phase Θ that is a sample value of the uniform (0, 2π) random variable. The received carrier signal is

X(t) = A cos(2πf_c t + Θ).   (13.49)

What are the expected value and autocorrelation of the process X(t)?

The phase has PDF

f_Θ(θ) = { 1/(2π),  0 ≤ θ ≤ 2π,
           0,       otherwise.   (13.50)
For any fixed angle α and nonzero integer k,

E[cos(α + kΘ)] = ∫_0^{2π} cos(α + kθ) (1/(2π)) dθ   (13.51)
              = sin(α + kθ)/(2πk) |_0^{2π} = [sin(α + 2kπ) − sin α]/(2πk) = 0.   (13.52)

Choosing α = 2πf_c t and k = 1, E[X(t)] is

μ_X(t) = E[A cos(2πf_c t + Θ)] = 0.   (13.53)
We will use the identity cos A cos B = [cos(A − B) + cos(A + B)]/2 to find the autocorrelation:
R_X(t, τ) = E[A cos(2πf_c t + Θ) A cos(2πf_c(t + τ) + Θ)]
          = (A²/2) E[cos(2πf_c τ) + cos(2πf_c(2t + τ) + 2Θ)].   (13.54)

For α = 2πf_c(t + τ) and k = 2,

E[cos(2πf_c(2t + τ) + 2Θ)] = E[cos(α + kΘ)] = 0.   (13.55)

Thus

R_X(t, τ) = (A²/2) cos(2πf_c τ).   (13.56)

Therefore, X(t) is stationary. It has the properties of a stationary stochastic process listed in Theorem 13.11.
Quiz 13.8
Let X_1, X_2, ... be an iid random sequence. Is X_1, X_2, ... a stationary random sequence?
13.9
Wide Sense Stationary Stochastic Processes  A stochastic process is wide sense stationary if the expected value is constant with time and the autocorrelation depends only on the time difference between two random variables. A wide sense stationary process is ergodic if expected values such as E[X(t)] and E[X²(t)] are equal to corresponding time averages.
There are many applications of probability theory in which investigators do not have a complete probability model of an experiment. Even so, much can be accomplished with partial information about the model. Often the partial information takes the form of expected values, variances, correlations, and covariances. In the context of stochastic processes, when these parameters satisfy the conditions of Theorem 13.11, we refer to the relevant process as wide sense stationary.

Definition 13.15  Wide Sense Stationary
X(t) is a wide sense stationary stochastic process if and only if for all t,

E[X(t)] = μ_X   and   R_X(t, τ) = R_X(0, τ) = R_X(τ).

X_n is a wide sense stationary random sequence if and only if for all n,

E[X_n] = μ_X   and   R_X[n, k] = R_X[0, k] = R_X[k].
Theorem 13.11 implies that every stationary process or sequence is also wide sense stationary. However, if X(t) or X_n is wide sense stationary, it may or may not be stationary. Thus wide sense stationary processes include stationary processes as a subset. Some texts use the term strict sense stationary for what we have simply called stationary.
Example 13.23
In Example 13.22, we observe that μ_X(t) = 0 and R_X(t, τ) = (A²/2) cos 2πf_c τ. Thus the random phase carrier X(t) is a wide sense stationary process.
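The wide sense stationarity of the random phase carrier can also be checked numerically. The MATLAB sketch below, with assumed values A = 2 and f_c = 5, averages X(t)X(t + τ) over many independent phases and compares the result with (A²/2) cos 2πf_c τ:

A = 2; fc = 5; m = 20000;                    % assumed amplitude, frequency, trials
theta = 2*pi*rand(m,1);                      % one uniform (0,2*pi) phase per trial
t = 0.3; tau = 0.05;                         % arbitrary time and lag
x1 = A*cos(2*pi*fc*t + theta);               % X(t) for each phase
x2 = A*cos(2*pi*fc*(t+tau) + theta);         % X(t+tau) for each phase
disp([mean(x1.*x2) (A^2/2)*cos(2*pi*fc*tau)])% estimate vs. theory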
The autocorrelation function of a wide sense stationary process has a number of important properties.
Theorem 13.12
For a wide sense stationary process X(t), the autocorrelation function R_X(τ) has the following properties:

R_X(0) ≥ 0,     R_X(τ) = R_X(−τ),     R_X(0) ≥ |R_X(τ)|.

If X_n is a wide sense stationary random sequence:

R_X[0] ≥ 0,     R_X[k] = R_X[−k],     R_X[0] ≥ |R_X[k]|.

Proof  For the first property, R_X(0) = R_X(t, 0) = E[X²(t)]. Since X²(t) ≥ 0, we must have E[X²(t)] ≥ 0. For the second property, we substitute u = t + τ in Definition 13.13 to obtain

R_X(t, τ) = E[X(u − τ)X(u)] = R_X(u, −τ).   (13.57)

Since X(t) is wide sense stationary,

R_X(t, τ) = R_X(τ) = R_X(u, −τ) = R_X(−τ).   (13.58)
The proof of the third property is a little more complex. First, we note that when X(t) is wide sense stationary, Var[X(t)] = C_X(0), a constant for all t. Second, Theorem 5.14 implies that

(C_X(t, τ))² ≤ (C_X(0))².   (13.59)

Now, for any numbers a, b, and c, if |a| ≤ b and c ≥ 0, then (a + c)² ≤ (b + c)². Choosing a = C_X(t, τ), b = C_X(0), and c = μ_X² yields

(C_X(t, τ) + μ_X²)² ≤ (C_X(0) + μ_X²)².   (13.60)
In this expression, the left side equals (R_X(τ))² and the right side is (R_X(0))², which proves the third part of the theorem. The proof for the random sequence X_n is essentially the same. Problem 13.9.10 asks the reader to confirm this fact.
R_X(0) has an important physical interpretation for electrical engineers.

Definition 13.16  Average Power
The average power of a wide sense stationary process X(t) is R_X(0) = E[X²(t)].
The average power of a wide sense stationary sequence X_n is R_X[0] = E[X_n²].
This definition relates to the fact that in an electrical circuit, a signal is measured as either a voltage v(t) or a current i(t). Across a resistor of R Ω, the instantaneous power dissipated is v²(t)/R = i²(t)R. When the resistance is R = 1 Ω, the instantaneous power is v²(t) when we measure the voltage, or i²(t) when we measure the current. When we use x(t), a sample function of a wide sense stationary stochastic process, to model a voltage or a current, the instantaneous power across a 1 Ω resistor is x²(t). We usually assume implicitly the presence of a 1 Ω resistor and refer to x²(t) as the instantaneous power of x(t). By extension, we refer to the random variable X²(t) as the instantaneous power of the process X(t). Definition 13.16 uses the terminology average power for the expected value of the instantaneous power of a process.

Recall that Section 13.1 describes ensemble averages and time averages of stochastic processes. In our presentation of stationary processes, we have encountered only ensemble averages including the expected value, the autocorrelation, the autocovariance, and the average power. Engineers, on the other hand, are accustomed to observing time averages. For example, if X(t) models a voltage, the time average of sample function x(t) over an interval of duration 2T is

X̄(T) = (1/(2T)) ∫_{−T}^{T} x(t) dt.   (13.61)

This is the DC voltage of x(t), which can be measured with a voltmeter. Similarly, a time average of the power of a sample function is

X̄²(T) = (1/(2T)) ∫_{−T}^{T} x²(t) dt.   (13.62)
The relationship of these time averages to the corresponding ensemble averages, μ_X and E[X²(t)], is a fascinating topic in the study of stochastic processes. When X(t) is a stationary process such that lim_{T→∞} X̄(T) = μ_X, the process is referred to as ergodic. In words, for an ergodic process, the time average of the sample function of a wide sense stationary stochastic process is equal to the corresponding ensemble average. For an electrical signal modeled as a sample function of an ergodic process, μ_X and E[X²(t)] and many other ensemble averages can be observed with familiar measuring equipment.
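For the random phase carrier of Example 13.22, this agreement between time averages and ensemble averages is easy to observe. The sketch below, with assumed parameter values, computes the time-averaged power of a single sample function and compares it with the average power A²/2:

A = 2; fc = 5; T = 200; dt = 1e-3;     % assumed amplitude, frequency, averaging window
t = 0:dt:T;
x = A*cos(2*pi*fc*t + 2*pi*rand);      % one sample function x(t)
disp([mean(x.^2) A^2/2])               % time-averaged power vs. ensemble average power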
Although the precise definition and analysis of ergodic processes are beyond the scope of this introductory text, we can use the tools of Chapter 10 to make some additional observations. For a stationary process X(t), we can view the time average X̄(T) as an estimate of the parameter μ_X, analogous to the sample mean M_n(X). The difference, however, is that the sample mean is an average of independent random variables, whereas sample values of the random process X(t) are correlated. However, if the autocovariance C_X(τ) approaches zero quickly, then as T becomes large, most of the sample values have little or no correlation, and we would expect the process X(t) to be ergodic. This idea is made more precise in the following theorem.
Theorem 13.13
Let X(t) be a stationary random process with expected value μ_X and autocovariance C_X(τ). If ∫_{−∞}^{∞} |C_X(τ)| dτ < ∞, then X̄(T), X̄(2T), ... is an unbiased, consistent sequence of estimates of μ_X.

Proof  First we verify that X̄(T) is unbiased:

E[X̄(T)] = (1/(2T)) E[∫_{−T}^{T} X(t) dt] = (1/(2T)) ∫_{−T}^{T} E[X(t)] dt = (1/(2T)) ∫_{−T}^{T} μ_X dt = μ_X.   (13.63)

To show consistency, it is sufficient to show that lim_{T→∞} Var[X̄(T)] = 0. First, we observe that X̄(T) − μ_X = (1/(2T)) ∫_{−T}^{T} (X(t) − μ_X) dt. This implies
Var[X̄(T)] = E[((1/(2T)) ∫_{−T}^{T} (X(t) − μ_X) dt)²]
           = (1/(2T))² E[(∫_{−T}^{T} (X(t) − μ_X) dt)(∫_{−T}^{T} (X(t') − μ_X) dt')]
           = (1/(2T))² ∫_{−T}^{T} ∫_{−T}^{T} E[(X(t) − μ_X)(X(t') − μ_X)] dt' dt
           = (1/(2T))² ∫_{−T}^{T} ∫_{−T}^{T} C_X(t' − t) dt' dt.   (13.64)
\ \! e note t hat
1_·:.Cx(t' -t )dt' < J_'~r < 1_:
ICx(t' - t) I dt' ICx(t' -t) I dt' =
1_:
ICx(T) I dT < oo.
(13.65)
Hence there exists a constant K such that

Var[X̄(T)] ≤ (1/(2T))² ∫_{−T}^{T} K dt = K/(2T).   (13.66)

Thus lim_{T→∞} Var[X̄(T)] ≤ lim_{T→∞} K/(2T) = 0.
Quiz 13.9
Which of the following functions are valid autocorrelation functions?
(a) R_1(τ) = e^{−|τ|}
(b) R_2(τ) = e^{−τ²}
(c) R_3(τ) = e^{−τ} cos τ
(d) R_4(τ) = e^{−τ²} sin τ

13.10
Cross-Correlation  The cross-covariance and cross-correlation functions partially describe the probability model of two wide sense stationary processes.
In many applications, it is necessary to consider the relationship of two stochastic processes X(t) and Y(t), or two random sequences X_n and Y_n. For certain experiments, it is appropriate to model X(t) and Y(t) as independent processes. In this simple case, any set of random variables X(t_1), ..., X(t_k) from the X(t) process is independent of any set of random variables Y(t'_1), ..., Y(t'_j) from the Y(t) process. In general, however, a complete probability model of two processes consists of a joint PMF or a joint PDF of all sets of random variables contained in the processes. Such a joint probability function completely expresses the relationship of the two processes. However, finding and working with such a joint probability function is usually prohibitively difficult. To obtain useful tools for analyzing a pair of processes, we recall that the covariance and the correlation of a pair of random variables provide valuable information about the relationship between the random variables. To use this information to understand a pair of stochastic processes, we work with the correlation and covariance of the random variables X(t) and Y(t + τ).
Definition 13.17  Cross-Correlation
The cross-correlation of continuous-time random processes X(t) and Y(t) is

R_XY(t, τ) = E[X(t)Y(t + τ)].

The cross-correlation of random sequences X_n and Y_n is

R_XY[m, k] = E[X_m Y_{m+k}].

Just as for the autocorrelation, there are many interesting practical applications in which the cross-correlation depends only on one time variable, the time difference τ or the index difference k.

Definition 13.18  Jointly Wide Sense Stationary Processes
Continuous-time random processes X(t) and Y(t) are jointly wide sense stationary if X(t) and Y(t) are both wide sense stationary, and the cross-correlation depends only on the time difference between the two random variables:

R_XY(t, τ) = R_XY(τ).
Random sequences X_n and Y_n are jointly wide sense stationary if X_n and Y_n are both wide sense stationary and the cross-correlation depends only on the index difference between the two random variables:

R_XY[m, k] = R_XY[k].
We encounter cross-correlations in experiments that involve noisy observations of a wide sense stationary random process X(t).
Example 13.24
Suppose we are interested in X(t) but we can observe only

Y(t) = X(t) + N(t),   (13.67)

where N(t) is a noise process that interferes with our observation of X(t). Assume X(t) and N(t) are independent wide sense stationary processes with E[X(t)] = μ_X and E[N(t)] = μ_N = 0. Is Y(t) wide sense stationary? Are X(t) and Y(t) jointly wide sense stationary? Are Y(t) and N(t) jointly wide sense stationary?

Since the expected value of a sum equals the sum of the expected values,

E[Y(t)] = E[X(t)] + E[N(t)] = μ_X.   (13.68)

Next, we must find the autocorrelation

R_Y(t, τ) = E[Y(t)Y(t + τ)]
          = E[(X(t) + N(t))(X(t + τ) + N(t + τ))]
          = R_X(τ) + R_XN(t, τ) + R_NX(t, τ) + R_N(τ).   (13.69)

Since X(t) and N(t) are independent, R_NX(t, τ) = E[N(t)] E[X(t + τ)] = 0. Similarly, R_XN(t, τ) = μ_X μ_N = 0. This implies

R_Y(t, τ) = R_X(τ) + R_N(τ).   (13.70)
The right side of this equation indicates that R_Y(t, τ) depends only on τ, which implies that Y(t) is wide sense stationary. To determine whether Y(t) and X(t) are jointly wide sense stationary, we calculate the cross-correlation
R_YX(t, τ) = E[Y(t)X(t + τ)] = E[(X(t) + N(t)) X(t + τ)]
           = R_X(τ) + R_NX(t, τ) = R_X(τ).   (13.71)
We can conclude that X(t) and Y(t) are jointly wide sense stationary. Similarly, we can verify that Y(t) and N(t) are jointly wide sense stationary by calculating
R_YN(t, τ) = E[Y(t)N(t + τ)] = E[(X(t) + N(t)) N(t + τ)]
           = R_XN(t, τ) + R_N(τ) = R_N(τ).   (13.72)
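A discrete-time analog of this example is easy to simulate. In the sketch below, X_n and N_n are assumed to be independent iid sequences with variances 4 and 1, so the lag-zero correlations of Y_n with Y_n, X_n, and N_n should be close to 5, 4, and 1:

n = 200000;                                % assumed number of samples
X = 2*randn(n,1);                          % signal: iid, zero mean, variance 4
N = randn(n,1);                            % noise:  iid, zero mean, variance 1
Y = X + N;                                 % noisy observation Y_n = X_n + N_n
disp([mean(Y.^2) mean(Y.*X) mean(Y.*N)])   % approx [R_Y[0] R_YX[0] R_YN[0]] = [5 4 1]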
In the following example, we observe that a random sequence Y_n derived from a wide sense stationary sequence X_n may also be wide sense stationary even though X_n and Y_n are not jointly wide sense stationary.
Example 13.25
X_n is a wide sense stationary random sequence with autocorrelation function R_X[k]. The random sequence Y_n is obtained from X_n by reversing the sign of every other random variable in X_n: Y_n = (−1)^n X_n.
(a) Express the autocorrelation function of Y_n in terms of R_X[k].
(b) Express the cross-correlation function of X_n and Y_n in terms of R_X[k].
(c) Is Y_n wide sense stationary?
(d) Are X_n and Y_n jointly wide sense stationary?
Ry [ri ,k]
=
E ['YnYn+k]
=
E [( - l)nXn( - l)n+k Xn+k]
=
(- 1) 2n+k E [XnXn+k]
=
(- l)k Rx [k].
(13 .73)
Y_n is wide sense stationary because the autocorrelation depends only on the index difference k. The cross-correlation of X_n and Y_n is

R_XY[n, k] = E[X_n Y_{n+k}] = E[X_n (−1)^{n+k} X_{n+k}]
           = (−1)^{n+k} E[X_n X_{n+k}] = (−1)^{n+k} R_X[k].   (13.74)
X_n and Y_n are not jointly wide sense stationary because the cross-correlation depends on both n and k. When n and k are both even or when n and k are both odd, R_XY[n, k] = R_X[k]; otherwise R_XY[n, k] = −R_X[k].
Theorem 13.12 indicates that the autocorrelation of a wide sense stationary process X(t) is symmetric about τ = 0 (continuous-time) or k = 0 (random sequence). The cross-correlation of jointly wide sense stationary processes has a corresponding symmetry.
Theorem 13.14
If X(t) and Y(t) are jointly wide sense stationary continuous-time processes, then

R_XY(τ) = R_YX(−τ).

If X_n and Y_n are jointly wide sense stationary random sequences, then

R_XY[k] = R_YX[−k].
Proof  From Definition 13.17, R_XY(τ) = E[X(t)Y(t + τ)]. Making the substitution u = t + τ yields

R_XY(τ) = E[X(u − τ)Y(u)] = E[Y(u)X(u − τ)] = R_YX(u, −τ).   (13.75)

Since X(t) and Y(t) are jointly wide sense stationary, R_YX(u, −τ) = R_YX(−τ). The proof is similar for random sequences.
Quiz 13.10
X(t) is a wide sense stationary stochastic process with autocorrelation function R_X(τ). Y(t) is identical to X(t), except that the time scale is reversed: Y(t) = X(−t).
(a) Express the autocorrelation function of Y(t) in terms of R_X(τ). Is Y(t) wide sense stationary?
(b) Express the cross-correlation function of X(t) and Y(t) in terms of R_X(τ). Are X(t) and Y(t) jointly wide sense stationary?
13.11
Gaussian Processes  For a Gaussian process X(t), every vector of sample values X = [X(t_1) ··· X(t_k)]' is a Gaussian random vector.
The central limit theorem (Theorem 9.12) helps explain the proliferation of Gaussian random variables in nature. The same insight extends to Gaussian stochastic processes. For electrical and computer engineers, the noise voltage in a resistor is a pervasive example of a phenomenon that is accurately modeled as a Gaussian stochastic process. In a Gaussian process, every collection of sample values is a Gaussian random vector (Definition 8.12).
Definition 13.19  Gaussian Process
X(t) is a Gaussian stochastic process if and only if X = [X(t_1) ··· X(t_k)]' is a Gaussian random vector for any integer k > 0 and any set of time instants t_1, t_2, ..., t_k.
X_n is a Gaussian random sequence if and only if X = [X_{n_1} ··· X_{n_k}]' is a Gaussian random vector for any integer k > 0 and any set of time instants n_1, n_2, ..., n_k.
In Problem 13.11.5, we ask you to show that the Brownian motion process in Section 13.6 is a special case of a Gaussian process. Although the Brownian motion process is not stationary (see Example 13.21), our primary interest will be in wide
sense stationary Gaussian processes. In this case, the probability model for the process is completely specified by the expected value μ_X and the autocorrelation function R_X(τ) or R_X[k]. As a consequence, a wide sense stationary Gaussian process is stationary.
Theorem 13.15
If X(t) is a wide sense stationary Gaussian process, then X(t) is a stationary Gaussian process. If X_n is a wide sense stationary Gaussian sequence, X_n is a stationary Gaussian sequence.

Proof  Let μ and C denote the expected value vector and the covariance matrix of the random vector X = [X(t_1) ··· X(t_k)]'. Let μ̄ and C̄ denote the same quantities for the time-shifted random vector X̄ = [X(t_1 + τ) ··· X(t_k + τ)]'. Since X(t) is wide sense stationary, E[X(t_i)] = E[X(t_i + τ)] = μ_X. The i, jth entry of C is

C_ij = C_X(t_i, t_j) = C_X(t_j − t_i) = C_X(t_j + τ − (t_i + τ)) = C_X(t_i + τ, t_j + τ) = C̄_ij.   (13.76)

Thus μ = μ̄ and C = C̄, implying that f_X(x) = f_X̄(x). Hence X(t) is a stationary process. The same reasoning applies to a Gaussian random sequence X_n.
The white Gaussian noise process is a convenient starting point for many studies in electrical and computer engineering.
Definition 13.20  White Gaussian Noise
W(t) is a white Gaussian noise process if and only if W(t) is a stationary Gaussian stochastic process with the properties μ_W = 0 and R_W(τ) = η_0 δ(τ).

A consequence of the definition is that for any collection of distinct time instants t_1, ..., t_k, W(t_1), ..., W(t_k) is a set of independent Gaussian random variables. In this case, the value of the noise at time t_i tells nothing about the value of the noise at time t_j. While the white Gaussian noise process is a useful mathematical model, it does not conform to any signal that can be observed physically. Note that the average noise power is

E[W²(t)] = R_W(0) = ∞.   (13.77)
That is, white noise has infinite average power, which is physically impossible. The model is useful, however, because any Gaussian noise signal observed in practice can be interpreted as a filtered white Gaussian noise signal with finite power.
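In discrete time the same idea is simple to demonstrate: an iid Gaussian sequence (the discrete-time counterpart of white noise) passed through a short filter yields a correlated Gaussian sequence with finite average power. The filter coefficients in this sketch are an arbitrary illustrative choice:

n = 50000;
W = randn(n,1);                 % iid Gaussian (0,1) "white" samples
h = [0.5 0.3 0.2];              % an arbitrary short FIR filter
X = filter(h, 1, W);            % correlated Gaussian output with finite power
disp([mean(X.^2) sum(h.^2)])    % average power is approximately sum(h.^2) = 0.38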
Quiz 13.11
X(t) is a stationary Gaussian random process with μ_X(t) = 0 and autocorrelation function R_X(τ) = 2^{−|τ|}. What is the joint PDF of X(t) and X(t + 1)?
13.12

MATLAB
Stochastic processes appear in models of many phenomena studied by engineers. When the phenomena are complicated, MATLAB simulations are valuable analysis tools. To produce MATLAB simulations we need to develop codes for stochastic processes. For example, to simulate the cellular telephone switch of Example 13.4, we need to model both the arrivals and departures of calls. A Poisson process N(t) is a conventional model for arrivals.
Example 13.26
Use MATLAB to generate the arrival times S_1, S_2, ... of a rate λ Poisson process over a time interval [0, T].

To generate Poisson arrivals at rate λ, we employ Theorem 13.4, which says that the interarrival times are independent exponential (λ) random variables. Given interarrival times X_i, the ith arrival time is the cumulative sum S_i = X_1 + X_2 + ··· + X_i. The function poissonarrivals generates cumulative sums of independent exponential random variables; it returns the vector s with s(i) corresponding to S_i, the ith arrival time. Note that the length of s is a Poisson (λT) random variable because the number of arrivals in [0, T] is random.

function s=poissonarrivals(lam,T)
%arrival times s=[s(1) ... s(n)]
% s(n)<= T < s(n+1)
n=ceil(1.1*lam*T);
s=cumsum(exponentialrv(lam,n));
while (s(length(s))<T),
   s_new=s(length(s))+ ...
      cumsum(exponentialrv(lam,n));
   s=[s; s_new];
end
s=s(s<=T);
When we wish to examine a Poisson arrival process graphically, the vector of arrival times is not so convenient. A direct representation of the process N(t) is often more useful.
Example 13.27
Generate a sample path of N(t), a rate λ = 5 arrivals/min Poisson process. Plot N(t) over a 10-minute interval.

Given t = [t_1 ··· t_m]', the function poissonprocess generates the sample path N = [N_1 ··· N_m]', where N_i = N(t_i) for a rate λ Poisson process N(t). The basic idea of poissonprocess.m is that given the arrival times S_1, S_2, ..., N(t) = max{n | S_n ≤ t} is the number of arrivals that occur by time t. In particular, in N=count(s,t), N(i) is the number of elements of s that are less than or equal to t(i). A sample path generated by poissonprocess.m appears in Figure 13.5.

function N=poissonprocess(lambda,t)
%N(i) = no. of arrivals by t(i)
s=poissonarrivals(lambda,max(t));
N=count(s,t);
t=0.01*(0:1000);
lambda=5;
N=poissonprocess(lambda,t);
plot(t,N)
xlabel('\it t');
ylabel('\it N(t)');

Figure 13.5  A Poisson process sample path N(t) generated by poissonprocess.m.
Note that the number of arrivals generated by poissonprocess depends only on T = max_i t_i but not on how finely we represent time. That is, t=0.1*(0:10*T) or t=0.001*(0:1000*T) both generate a Poisson number N, with E[N] = λT, of arrivals over the interval [0, T]. What changes is how finely we observe the output N(t). Now that MATLAB can generate a Poisson arrival process, we can simulate systems such as the telephone switch of Example 13.4.
Example 13.28
Simulate 60 minutes of activity of the telephone switch of Example 13.4 under the following assumptions.
(a) The switch starts with M(0) = 0 calls.
(b) Arrivals occur as a Poisson process of rate λ = 10 calls/min.
(c) The duration of each call (often called the holding time) in minutes is an exponential (1/10) random variable independent of the number of calls in the system and the duration of any other call.

In simswitch.m, the vectors s and x mark the arrival times and call durations. The ith call arrives at time s(i), stays for time x(i), and departs at time y(i)=s(i)+x(i). Thus the vector y=s+x denotes the call completion times, also known as departures. By counting the arrivals s and departures y, we produce the arrival and departure processes A and D. At any given time t, the number of calls in the system equals the number of arrivals minus the number of departures. Hence M=A-D is the number of calls in the system. One run of simswitch.m depicting sample functions of A(t), D(t), and M(t) = A(t) − D(t) appears in Figure 13.6.

function M=simswitch(lambda,mu,t)
%Poisson arrivals, rate lambda
%Exponential (mu) call duration
%For vector t of times
%M(i) = no. of calls at time t(i)
s=poissonarrivals(lambda,max(t));
y=s+exponentialrv(mu,length(s));
A=count(s,t);
D=count(y,t);
M=A-D;
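A run like the one just described might be produced with a few commands; the time grid below is an assumed choice:

t = 0:0.1:60;                  % observe the switch every 0.1 minutes for an hour
M = simswitch(10, 0.1, t);     % lambda = 10 calls/min, exponential (1/10) durations
plot(t, M);
xlabel('\it t'); ylabel('\it M(t)');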
Similar techniques can be used to produce a Brownian motion process Y(t).
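For instance, by the independent increments property of Definition 13.10, a sampled Brownian motion path is a cumulative sum of independent Gaussian increments. The sketch below is one such construction (the function name brownianpath is illustrative, not a function defined in the text):

function w = brownianpath(alpha, t)
%w(i) = W(t(i)) for a Brownian motion process with parameter alpha.
%Assumes t is an increasing vector of strictly positive sample times.
dt = diff([0; t(:)]);                       % time increments, starting from t = 0
dw = sqrt(alpha*dt).*randn(length(dt),1);   % independent Gaussian increments
w = cumsum(dw);                             % W(t(i)) = sum of the first i increments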
For an arbitrary Gaussian process X(t), we can use MATLAB to generate random sequences X_n = X(nT) that represent sampled versions of X(t). For the sampled process, the vector X = [X_0 ··· X_{n−1}]' is a Gaussian random vector with expected value μ_X = [E[X(0)] ··· E[X((n−1)T)]]' and covariance matrix C_X with i, jth element C_X(i, j) = C_X(iT, jT). We can generate m samples of X using x=gaussvector(mu,C,m). As described in Section 8.6, mu is a length n vector and C is the n × n covariance matrix. When X(t) is wide sense stationary, the sampled sequence is wide sense stationary with autocovariance C_X[k]. In this case, the vector X = [X_0 ··· X_{n−1}]' has covariance matrix C_X with C_X(i, j) = C_X[i − j]. Since C_X[k] = C_X[−k],
C_X = [ C_X[0]     C_X[1]     ···    C_X[n−1]
        C_X[1]     C_X[0]     ···      ·
          ·          ·         ·     C_X[1]
        C_X[n−1]    ···      C_X[1]  C_X[0]  ].   (13.78)
We see that C_X is constant along each diagonal. A matrix with constant diagonals is called a Toeplitz matrix. When the covariance matrix C_X is Toeplitz, it is completely specified by the vector c = [C_X[0] C_X[1] ··· C_X[n−1]]', whose elements are both the first column and first row of C_X. Thus the PDF of X is completely described by the expected value μ_X = E[X_i] and the vector c. In this case, a function that generates sample vectors X needs only the scalar μ_X and vector c as inputs. Since generating sample vectors X corresponding to a stationary Gaussian sequence is quite common, we extend the function gaussvector(mu,C,m) introduced in Section 8.6 to make this as simple as possible.

function x=gaussvector(mu,C,m)
%output: m Gaussian vectors,
%each with mean mu
%and covariance matrix C
if (min(size(C))==1)
   C=toeplitz(C);
end
n=size(C,2);
if (length(mu)==1)
   mu=mu*ones(n,1);
end
[U,D,V]=svd(C);
x=V*(D^(0.5))*randn(n,m) ...
   +(mu(:)*ones(1,m));
If C is a length n row or column vector, it is assumed to be the first row of an n × n Toeplitz covariance matrix that we create with the statement C=toeplitz(C). In addition, when mu is a scalar value, it is assumed to be the expected value E[X_n] of a stationary sequence. The program extends mu to a length n vector with identical elements. When mu is an n-element vector and C is an n × n covariance matrix, as was required in the original gaussvector.m, they are left unchanged. The real work of gaussvector still occurs in the last two lines, which are identical to the simpler version of gaussvector.m in Section 8.6.
Example 13.30
Write a MATLAB function x=gseq(a,n,m) that generates m sample vectors X = [X_0 ··· X_n]' of a stationary Gaussian sequence with

μ_X = 0,   C_X[k] = 1/(1 + a k²).   (13.79)
Figure 13.8  Two sample outputs for Example 13.30: (a) a = 1: gseq(1,50,5); (b) a = 0.01: gseq(0.01,50,5).
function x=gseq(a,n,m)
nn=0:n;
cx=1./(1+a*nn.^2);
x=gaussvector(0,cx,m);
plot(nn,x);

All we need to do is generate the vector cx corresponding to the covariance function. Figure 13.8 shows sample outputs for (a) a = 1: gseq(1,50,5) and (b) a = 0.01: gseq(0.01,50,5). We observe in Figure 13.8 that each graph shows m = 5 sample paths even though graph (a) may appear to have many more. The graphs look very different because for a = 1, samples just a few steps apart are nearly uncorrelated and the sequence varies quickly with time. That is, the sample paths in graph (a) zig-zag around. By contrast, when a = 0.01, samples have significant correlation and the sequence varies slowly; that is, in graph (b), the sample paths look relatively smooth.
Quiz 13.12
The switch simulation of Example 13.28 is unrealistic in the assumption that the switch can handle an arbitrarily large number of calls. Modify the simulation so that the switch blocks (i.e., discards) new calls when the switch has c = 120 calls in progress. Estimate P[B], the probability that a new call is blocked. Your simulation may need to be significantly longer than 60 minutes.
Further Reading: [Doo90] contains the original (1953) mathematical theory of stochastic processes. [HSP87] is a concise introduction to basic principles for readers familiar with probability and random variables. The second half of [PP02] is a comprehensive treatise on stochastic processes.
Problems

Difficulty: Easy / Moderate / Difficult / Experts Only
13.1.1  For Example 13.4, define a set of random variables that could produce the sample function m(t, s). Do not duplicate the set listed in Example 13.7.

13.1.2  For the random processes of Examples 13.3, 13.4, 13.5, and 13.6, identify whether the process is discrete-time or continuous-time, discrete-value or continuous-value.

13.1.3  Let Y(t) denote the random process corresponding to the transmission of one symbol over the QPSK communications system of Example 13.6. What is the sample space of the underlying experiment? Sketch the ensemble of sample functions.

13.1.4  In a binary phase shift keying (BPSK) communications system, one of two equally probable bits, 0 or 1, is transmitted every T seconds. If the kth bit is j ∈ {0, 1}, the waveform x_j(t) = cos(2πf_0 t + jπ) is transmitted over the interval [(k − 1)T, kT]. Let X(t) denote the random process in which three symbols are transmitted in the interval [0, 3T]. Assuming f_0 is an integer multiple of 1/T, sketch the sample space and corresponding sample functions of the process X(t).

13.1.5  Let X(t) = e^{−(t−T)} u(t − T) be an exponential pulse with a random delay T. The delay T has a PDF f_T(t). Find the PDF of X(t).

13.2.1  Let W be an exponential random variable with PDF

f_W(w) = { e^{−w},  w ≥ 0,
           0,       otherwise.

Find the CDF F_{X(t)}(x) of the time-delayed ramp process X(t) = t − W.

13.2.2  In a production line for 10 kHz oscillators, the output frequency of each oscillator is a random variable W uniformly distributed between 9980 Hz and 10,020 Hz. The frequencies of different oscillators are independent. The oscillator company has an order for one-part-in-10^4 oscillators with frequency between 9999 Hz and 10,001 Hz. A technician takes one oscillator per minute from the production line and measures its exact frequency. (This test takes one minute.) The random variable T_r minutes is the elapsed time at which the technician finds r acceptable oscillators.
(a) Find p, the probability that a single oscillator has one-part-in-10^4 accuracy.
(b) What is E[T_1] minutes, the expected time for the technician to find the first one-part-in-10^4 oscillator?
(c) What is the probability that the technician will find the first one-part-in-10^4 oscillator in exactly 20 minutes?
(d) What is E[T_5], the expected time of finding the fifth one-part-in-10^4 oscillator?

13.2.3  For the random process of Problem 13.2.2, what is the conditional PMF of T_2 given T_1? If the technician finds the first oscillator in 3 minutes, what is E[T_2 | T_1 = 3], the conditional expected value of the time of finding the second one-part-in-10^4 oscillator?

13.2.4  True or false: For a continuous-value random process X(t), the random variable X(t_0) is always a continuous random variable.

13.3.1  Suppose that at the equator, we can model the noontime temperature in degrees Celsius, X_n, on day n by a sequence of iid Gaussian random variables with expected value 30 degrees and standard deviation of 5 degrees. A new random process Y_k = [X_{2k−1} + X_{2k}]/2 is obtained by averaging the temperature over two days. Is Y_k an iid random sequence?

13.3.2  For the equatorial noontime temperature sequence X_n of Problem 13.3.1, a second sequence of averaged temperatures is W_n = [X_n + X_{n−1}]/2. Is W_n an iid random sequence?
13.3.3  Let Y_k denote the number of failures between successes k − 1 and k of a Bernoulli (p) random process. Also, let Y_1 denote the number of failures before the first success. What is the PMF P_{Y_k}(y)? Is Y_k an iid random sequence?

13.4.1  The arrivals of new telephone calls at a telephone switching office is a Poisson process N(t) with an arrival rate of λ = 4 calls per second. An experiment consists of monitoring the switching office and recording N(t) over a 10-second interval.
(a) What is P_{N(1)}(0), the probability of no phone calls in the first second of observation?
(b) What is P_{N(1)}(4), the probability of exactly four calls arriving in the first second of observation?
(c) What is P_{N(2)}(2), the probability of exactly two calls arriving in the first two seconds?

13.4.2  Queries presented to a computer database are a Poisson process of rate λ = 6 queries per minute. An experiment consists of monitoring the database for m minutes and recording N(m), the number of queries presented. The answer to each of the following questions can be expressed in terms of the PMF P_{N(m)}(k) = P[N(m) = k].
(a) What is the probability of no queries in a one-minute interval?
(b) What is the probability of exactly six queries arriving in a one-minute interval?
(c) What is the probability of exactly three queries arriving in a one-half-minute interval?

13.4.3  At a successful garage, there is always a backlog of cars waiting to be serviced. The service times of cars are iid exponential random variables with a mean service time of 30 minutes. Find the PMF of N(t), the number of cars serviced in the first t hours of the day.

13.4.4  The count of students dropping the course "Probability and Stochastic Processes" is known to be a Poisson process of rate 0.1 drops per day. Starting with day 0, the first day of the semester, let D(t) denote the number of students that have dropped after t days. What is P_{D(t)}(d)?
13.4.5  Customers arrive at the Veryfast Bank as a Poisson process of rate λ customers per minute. Each arriving customer is immediately served by a teller. After being served, each customer immediately leaves the bank. The time a customer spends with a teller is called the service time. If the service time of a customer is exactly two minutes, what is the PMF of the number of customers N(t) in service at the bank at time t?

13.4.6  Given a Poisson process N(t), identify which of the following are Poisson processes.
(a) N(2t)
(b) N(t/2)
(c) 2N(t)
(d) N(t)/2
(e) N(t + 2)
(f) N(t) − N(t − 1)
13.4.7  Starting at any time t, the number N_T of hamburgers sold at a White Castle in the T-minute interval (t, t + T) has the Poisson PMF

P_{N_T}(n) = { (10T)^n e^{−10T}/n!,  n = 0, 1, ...,
               0,                    otherwise.

(a) Find the expected number E[N_60] of hamburgers sold in one hour (60 minutes).
(b) What is the probability that no hamburgers are sold in the 10-minute interval starting at 12 noon?
(c) You arrive at the White Castle at 12 noon. You wait a random time W (minutes) until you see a hamburger sold. What is the PDF of W? Hint: Find P[W > w].
13.4.B• A sequence of queries are made to a database system. The response t ime of the system, T seconds, is t he exponential (1/8) random variable. As soon as the system responds to a query, the next query is
[ PROBLEMS
made. Assuming the first query is made at t ime zero, let N(t) denote t he number of queries made b y time t. (a) \tVhat is P[7"' > 4], the probability t hat a single query 'vill last at least four seconds? (b) If t he database user h as been 'vaiting five seconds for a response, what is P [7"' > 13 17"' > 5), the probability that the user will wait at least eight more seconds? ( c) \tVhat is the PlVIF of N (t)?
13.4.9 The proof of Theorem 13.3 neglected the first interarrival time X_1. Show that X_1 has an exponential (λ) PDF.
13.4.10 U_1, U_2, ... are independent identically distributed uniform random variables with parameters 0 and 1.
(a) Let X_i = −ln U_i. What is P[X_i > x]?
(b) What kind of random variable is X_i?
(c) Given a constant t > 0, let N denote the value of n such that
∏_{i=1}^{n} U_i ≥ e^{−t} > ∏_{i=1}^{n+1} U_i.
Note that we define ∏_{i=1}^{0} U_i = 1. What is the PMF of N?
13.5.1 Customers arrive at a casino as a Poisson process of rate 100 customers per hour. Upon arriving, each customer must flip a coin, and only those customers who flip heads actually enter the casino. Let N(t) denote the process of customers entering the casino. Find the PMF of N, the number of customers who arrive between 5 PM and 7 PM.
13.5.2 A subway station carries both blue (B) line and red (R) line trains. Red line trains and blue line trains arrive as independent Poisson processes with rates λ_R = 0.05 and λ_B = 0.15 trains/min, respectively. You arrive at a random time t and wait until a red train arrives. Let N denote the number of blue line trains that pass through the station while you are waiting. What is the PMF P_N(n)?
13.5.3 A subway station carries both blue (B) line and red (R) line trains. Red line trains and blue line trains arrive as independent Poisson processes with rates λ_R = 0.15 and λ_B = 0.30 trains/min, respectively. You arrive at the station at a random time t and watch the trains for one hour.
(a) What is the PMF of N, the number of trains that you count passing through the station?
(b) Given that you see N = 30 trains, what is the conditional PMF of R, the number of red trains that you see?
13.5.4 Buses arrive at a bus stop as a Poisson process of rate λ = 1 bus/minute. After a very long time t, you show up at the bus stop.
(a) Let X denote the interarrival time between two bus arrivals. What is the PDF f_X(x)?
(b) Let W equal the time you wait after time t until the next bus arrival. What is the PDF f_W(w)?
(c) Let V equal the time (in minutes) that has passed since the most recent bus arrival. What is the PDF f_V(v)?
(d) Let U equal the time gap between the most recent bus arrival and the next bus arrival. What is the PDF of U?
13.5.5 For a Poisson process of rate λ, the Bernoulli arrival approximation assumes that in any very small interval of length Δ, there are 0 arrivals with probability 1 − λΔ or 1 arrival with probability λΔ. Use this approximation to prove Theorem 13.7.
13.5.6 Continuing Problem 13.4.5, suppose each service time is either one minute or two minutes equiprobably, independent of the arrival process or the other service times. What is the PMF of the number of customers N(t) in service at the bank at time t?
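A simulation is a useful way to check an answer to Problem 13.5.6. The sketch below is illustrative only (lambda, t, and the number of trials are arbitrary choices); it counts, in each trial, the customers still in service at time t when each service time is 1 or 2 minutes with equal probability.

lambda = 3; t = 10; trials = 10000;
N = zeros(trials,1);
for k = 1:trials
    s = cumsum(-log(rand(ceil(3*lambda*t),1))/lambda);  % ample arrivals
    s = s(s <= t);                                      % arrival times in [0, t]
    d = s + 1 + (rand(size(s)) < 0.5);                  % departure = arrival + 1 or 2
    N(k) = sum(d > t);                                  % customers in service at t
end
n = 0:max(N);
bar(n, hist(N, n)/trials);                              % empirical PMF of N(t)
xlabel('n'); ylabel('estimated P[N(t) = n]');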
13.5.7 Ten runners compete in a race starting at time t = 0. The runners' finishing times R_1, ..., R_10 are iid exponential random variables with expected value 1/μ = 10 minutes.
(a) What is the probability that the last runner will finish in less than 20 minutes?
(b) What is the PDF of X_1, the finishing time of the winning runner?
(c) Find the PDF of Y = R_1 + ··· + R_10.
(d) Let X_1, ..., X_10 denote the runners' interarrival times at the finish line. Find the joint PDF f_{X_1,...,X_10}(x_1, ..., x_10).
13.5.8 Let N denote the number of arrivals of a Poisson process of rate λ over the interval (0, T). Given N = n, let S_1, ..., S_n denote the corresponding arrival times. Prove that
f_{S_1,...,S_n | N}(s_1, ..., s_n | n) = n!/T^n  for 0 < s_1 < ··· < s_n < T,  and 0 otherwise.
Conclude that, given N(T) = n, S_1, ..., S_n are the order statistics of a collection of n uniform (0, T) random variables. (See Problem 5.10.11.)
13.6.1 Over the course of a day, the stock price of a widely traded company can be modeled as a Brownian motion process where X(0) is the opening price at the morning bell. Suppose the unit of time t is an hour, the exchange is open for eight hours, and the standard deviation of the daily price change (the difference between the opening bell and closing bell prices) is 1/2 point. What is the Brownian motion parameter α?
13.6.2 X_0, X_1, ... is an iid Gaussian (0, 1) random sequence. The random sequence Y_n is defined by Y_0 = 0 and Y_{n+1} = X_{n+1} + Y_n. Find the autocorrelation function R_Y[n, k].
13.6.3 Let X(t) be a Brownian motion process with variance Var[X(t)] = αt. For a constant c > 0, determine whether Y(t) = X(ct) is a Brownian motion process.
13.6.4 For a Brownian motion process X(t), let X_0 = X(0), X_1 = X(1), ... represent samples of a Brownian motion process with variance αt. The discrete-time, continuous-value process Y_1, Y_2, ... defined by Y_n = X_n − X_{n−1} is called an increments process. Show that Y_n is an iid random sequence.
13.6.5 This problem works out the missing steps in the proof of Theorem 13.8. For W and X as defined in the proof of the theorem, show that W = AX. What is the matrix A? Use Theorem 8.11 to find f_W(w).
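A sample path often clarifies the increments process of Problem 13.6.4. The following sketch is illustrative only (alpha, the time step, and the horizon are arbitrary choices); it approximates Brownian motion by accumulating independent Gaussian increments.

alpha = 0.5; dt = 0.01; T = 8;
t = (dt:dt:T)';
dX = sqrt(alpha*dt)*randn(size(t));  % independent Gaussian (0, alpha*dt) increments
X = cumsum(dX);                      % Brownian motion samples X(dt), X(2*dt), ...
plot(t, X); xlabel('t'); ylabel('X(t)');
Y = diff([0; X]);                    % the increments process recovered from X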
13.7.1 X_n is an iid random sequence with expected value E[X_n] = μ_X and variance Var[X_n] = σ_X². What is the autocovariance C_X[m, k]?
13.7.2 For the time-delayed ramp process X(t) from Problem 13.2.1, find for any t > 0:
(a) The expected value function μ_X(t),
(b) The autocovariance function C_X(t, τ). Hint: E[W] = 1 and E[W²] = 2.
13.7.3 A simple model (in degrees Celsius) for the daily temperature process C(n) of Example 13.3 is
C_n = 16[1 − cos(2πn/365)] + 4X_n,
where X_1, X_2, ... is an iid random sequence of Gaussian (0, 1) random variables.
(a) What is E[C_n]?
(b) Find the autocovariance function C_C[m, k].
(c) Why is this model overly simple?
13.7.4 A different model for the daily temperature process C(n) of Example 13.3 is
C_n = (1/2)C_{n−1} + 4X_n,
where C_0, X_1, X_2, ... is an iid random sequence of Gaussian (0, 1) random variables.
(a) Find the mean and variance of C_n.
(b) Find the autocovariance C_C[m, k].
(c) Is this a plausible model for the daily temperature over the course of a year?
(d) Would C_1, ..., C_31 constitute a plausible model for the daily temperature for the month of January?
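To compare the two temperature models, it helps to simulate a year of each. This sketch is illustrative only; it simply plots one sample path of the models in Problems 13.7.3 and 13.7.4.

n = (1:365)';
X = randn(365,1);                        % iid Gaussian (0,1) sequence
C1 = 16*(1 - cos(2*pi*n/365)) + 4*X;     % model of Problem 13.7.3
C2 = zeros(365,1); c = randn;            % model of Problem 13.7.4 with Gaussian C0
for k = 1:365
    c = 0.5*c + 4*randn;
    C2(k) = c;
end
plot(n, C1, n, C2);
xlabel('day n'); legend('Problem 13.7.3', 'Problem 13.7.4');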
13.7.5 For a Poisson process N(t) of rate λ, show that for s < t the autocovariance is C_N(s, t) = λs. If s > t, what is C_N(s, t)? Is there a general expression for C_N(s, t)?
13.7.6 N(t) is a Poisson process of rate λ = 1 and X_0, X_1, X_2, ... is an iid sequence of Gaussian (0, σ) random variables that are independent of N(t). Consider the process {Y(t) | t ≥ 0} defined by Y(t) = X_{N(t)}. Find the expected value μ_Y(t) = E[Y(t)] and the covariance function C_Y(t, τ). (Assume |τ| < t.)
13.7.7 X_n is an iid random sequence with E[X_n] = 0 and Var[X_n] = 3. Find the autocorrelation function C_Y[n, k] of the process Y_n = X_{n−1}X_n.
13.8.1 For an arbitrary constant a, let Y(t) = X(t + a). If X(t) is a stationary random process, is Y(t) stationary?
13.8.2 X = [X_1 X_2]' has expected value E[X] = 0 and a given covariance matrix C_X. Does there exist a stationary process X(t) and time instances t_1 and t_2 such that X is actually a pair of observations [X(t_1) X(t_2)]' of the process X(t)?
13.8.3 For an arbitrary constant a, let Y(t) = X(at). If X(t) is a stationary random process, is Y(t) stationary?
13.8.4 Let X(t) be a stationary continuous-time random process. By sampling X(t) every Δ seconds, we obtain the discrete-time random sequence Y_n = X(nΔ). Is Y_n a stationary sequence?
13.8.5 Given a stationary random sequence X_n, we can subsample X_n by extracting every kth sample: Y_n = X_{kn}. Is Y_n a stationary random sequence?
13.8.6 Let A be a nonnegative random variable that is independent of any collection of samples X(t_1), ..., X(t_k) of a stationary random process X(t). Is Y(t) = AX(t) a stationary random process?
13.8.7 Let g(x) be a deterministic function. If X(t) is a stationary random process, is Y(t) = g(X(t)) a stationary process?
13.9.1 Which of the following are valid autocorrelation functions?
R_1(τ) = δ(τ),   R_2(τ) = δ(τ) + 10,   R_3(τ) = δ(τ − 10),   R_4(τ) = δ(τ) − 10.
13.9.2 Let A be a nonnegative random variable that is independent of any collection of samples X(t_1), ..., X(t_k) of a wide sense stationary random process X(t). Is Y(t) = A + X(t) a wide sense stationary process?
13.9.3 True or False: If X_n is a wide sense stationary random sequence with E[X_n] = 0, then Y_n = X_n − X_{n−1} is a wide sense stationary random sequence.
13.9.4 Let X_n denote an iid sequence of Bernoulli (p = 1/2) random variables. Find the autocorrelation function R_X[n, k] and the autocovariance function C_X[n, k].
13.9.5 X_n is an iid sequence with E[X_n] = μ and Var[X_n] = σ². Find the autocorrelation function R_X[n, k].
13.9.6 X(t) and Y(t) are independent wide sense stationary processes. Determine if these processes are wide sense stationary:
(a) V(t) = X(t) + Y(t),
(b) W(t) = X(t)Y(t).
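For sequences such as the one in Problem 13.9.4, autocorrelation values can be estimated by time averaging one long sample path. The sketch below is illustrative only; the sequence length is an arbitrary choice, and the printed values are estimates, not the answers to the problem.

p = 0.5; n = 1e5;
X = double(rand(n,1) < p);            % Bernoulli (1/2) sample path
R0 = mean(X.^2);                      % time-average estimate of E[Xn^2]
R1 = mean(X(1:end-1).*X(2:end));      % time-average estimate of E[Xn X(n+1)]
fprintf('estimated R_X[0] ~ %.3f   estimated R_X[1] ~ %.3f\n', R0, R1);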
13.9.7 True or False: If X_n is a wide sense stationary random sequence with E[X_n] = 0, then Y_n = X_n + (−1)^{n−1} X_{n−1} is a wide sense stationary random sequence.
13.9.8 Consider the random process
W(t) = X cos(2πf_0 t) + Y sin(2πf_0 t),
where X and Y are uncorrelated random variables, each with expected value 0 and variance σ². Find the autocorrelation R_W(t, τ). Is W(t) wide sense stationary?
13.9.9 X(t) is a wide sense stationary random process with average power equal to 1. Let Θ denote a random variable with uniform distribution over [0, 2π] such that X(t) and Θ are independent.
(a) What is E[X²(t)]?
(b) What is E[cos(2πf_c t + Θ)]?
(c) Let Y(t) = X(t) cos(2πf_c t + Θ). What is E[Y(t)]?
(d) What is the average power of Y(t)?
13.9.10 Prove the properties of R_X[n] given in Theorem 13.12.
13.9.11 Let X_n be a wide sense stationary random sequence with expected value μ_X and autocovariance C_X[k]. For m = 0, 1, ..., we define
X̄_m = (1/(2m + 1)) Σ_{n=−m}^{m} X_n
as the sample mean process. Prove that if Σ_{k=−∞}^{∞} C_X[k] < ∞, then X̄_0, X̄_1, ... is an unbiased, consistent sequence of estimates of μ_X.
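The behavior claimed in Problem 13.9.11 can be previewed numerically. The sketch below is illustrative only: it computes the sample mean process X̄_0, ..., X̄_m for one path of a simple wide sense stationary sequence (iid Gaussian samples with mean mu, an arbitrary choice) and plots the estimates.

mu = 3; m = 200;
X = mu + randn(2*m+1, 1);          % X_{-m}, ..., X_m stored as a column
Xbar = zeros(m+1,1);
for k = 0:m
    idx = (m+1-k):(m+1+k);         % positions of X_{-k}, ..., X_k
    Xbar(k+1) = mean(X(idx));
end
plot(0:m, Xbar); xlabel('m'); ylabel('sample mean');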
13.9.12 Determine whether each of these statements is true or false:
(a) If X_n and Y_n are independent stationary processes, then V_n = X_n/Y_n is wide sense stationary.
(b) If X_n and Y_n are independent wide sense stationary processes, then W_n = X_n/Y_n is wide sense stationary.
13.10.1 X(t) and Y(t) are independent wide sense stationary processes with expected values μ_X and μ_Y and autocorrelation functions R_X(τ) and R_Y(τ), respectively. Let W(t) = X(t)Y(t).
(a) Find μ_W and R_W(t, τ) and show that W(t) is wide sense stationary.
(b) Are W(t) and X(t) jointly wide sense stationary?
13.10.2 X(t) is a wide sense stationary random process. For each process X_i(t) defined below, determine whether X_i(t) and X(t) are jointly wide sense stationary.
(a) X_1(t) = X(t + a)
(b) X_2(t) = X(at)
13.10.3 X(t) is a wide sense stationary stochastic process with a given autocorrelation function R_X(τ). The process Y(t) is a version of X(t) delayed by 50 microseconds: Y(t) = X(t − t_0) where t_0 = 5 × 10^{−5} s.
(a) Derive the autocorrelation function of Y(t).
(b) Derive the cross-correlation function of X(t) and Y(t).
(c) Is Y(t) wide sense stationary?
(d) Are X(t) and Y(t) jointly wide sense stationary?
13.11.1 A stationary Gaussian process X(t) is observed at times t_1 and t_2 to form the random vector X = [X(t_1) X(t_2)]' with expected value E[X] = 0 and a given 2 × 2 covariance matrix C_X. What is the range of valid values (if any) of the entries of C_X?
13.11.2 Given a Gaussian process X(t), identify which of the following, if any, are Gaussian processes.
(a) 2X(t),
(b) X(t/2),
(c) X(t)/2,
(d) X(t) − X(t − 1),
(e) X(2t).
13.11.3 A white Gaussian noise process N(t) with autocorrelation R_N(τ) = ηδ(τ) is passed through an integrator, yielding the output
Y(t) = ∫_0^t N(u) du.
Find E[Y(t)] and the autocorrelation function R_Y(t, τ). Show that Y(t) is a nonstationary process.
13.11.4 Let X(t) be a Gaussian process with mean μ_X(t) and autocovariance C_X(t, τ). In this problem, we verify that for two samples X(t_1), X(t_2), the multivariate Gaussian density reduces to the bivariate Gaussian PDF. In the following steps, let σ_i² denote the variance of X(t_i) and let ρ = C_X(t_1, t_2 − t_1)/(σ_1σ_2) equal the correlation coefficient of X(t_1) and X(t_2).
(a) Find the covariance matrix C and show that its determinant is |C| = σ_1²σ_2²(1 − ρ²).
(b) Show that the inverse of the correlation matrix [1 ρ; ρ 1] is (1/(1 − ρ²))[1 −ρ; −ρ 1].
(c) Now show that the multivariate Gaussian density for X(t_1), X(t_2) is the bivariate Gaussian density.
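A quick numeric spot check of the algebra in Problem 13.11.4: for particular values of σ1, σ2, and ρ (arbitrary choices below), the matrix form of the two-sample Gaussian density and the bivariate Gaussian PDF should agree at any test point.

s1 = 1.5; s2 = 0.8; rho = 0.6; x = [0.7; -0.3];
C = [s1^2 rho*s1*s2; rho*s1*s2 s2^2];
f_matrix = exp(-0.5*x'*(C\x)) / (2*pi*sqrt(det(C)));          % multivariate form
z = (x(1)/s1)^2 - 2*rho*(x(1)/s1)*(x(2)/s2) + (x(2)/s2)^2;
f_bivar = exp(-z/(2*(1-rho^2))) / (2*pi*s1*s2*sqrt(1-rho^2)); % bivariate form
fprintf('%.6f  %.6f\n', f_matrix, f_bivar);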
13.11.5 Show that the Brownian motion process is a Gaussian random process. Hint: For W and X as defined in the proof of Theorem 13.8, find the matrix A such that W = AX and then apply Theorem 8.11.
13.11.6 Let X_1, X_2, ... denote a sequence of iid Gaussian (0, 1) random variables. Let N(t) denote a Poisson process of rate λ that is independent of the sequence X_n. Consider the random process
Y(t) = Σ_{n=0}^{N(t)} X_n.
(a) Find the conditional CDF
F_{Y(t)|N(t)}(y | n) = P[Y(t) ≤ y | N(t) = n].
Express your answer in terms of the Φ(·) function.
13.12.1 Write a MATLAB program that generates and graphs the noisy cosine sample paths X_cc(t), X_dc(t), X_cd(t), and X_dd(t) of Figure 13.3. Note that the mathematical definition of X_cc(t) is
X_cc(t) = 2cos(2πt) + N(t),   −1 ≤ t ≤ 1.
Note that N(t) is a white noise process with autocorrelation R_N(τ) = 0.01δ(τ). Practically, the graph of X_cc(t) in Figure 13.3 is a sampled version X_cc[n] = X_cc(nT_s), where the sampling period is T_s = 0.001 s. In addition, the discrete-time functions are obtained by subsampling X_cc[n]. In subsampling, we generate X_dc[n] by extracting every kth sample of X_cc[n]; see Problem 13.8.5. In terms of MATLAB, which starts indexing a vector x with first element x(1),
Xdc(n)=Xcc(1+(n-1)*k).
The discrete-time graphs of Figure 13.3 used k = 100.
13.12.2 For the telephone switch of Example 13.28, we can estimate the expected number of calls in the system, E[M(t)], after T minutes using the time average estimate
M̄_T = (1/T) Σ_{k=1}^{T} M(k).
Perform a 600-minute switch simulation and graph the sequence M̄_1, M̄_2, ..., M̄_600. Does it appear that your estimates are converging? Repeat your experiment ten times and interpret your results.
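As a starting point for Problem 13.12.1, the following sketch generates the sampled noisy cosine and one subsampled version. It is only a sketch: the white noise N(t) is represented here by iid Gaussian samples of variance 0.01, and only X_cc and X_dc are produced.

Ts = 0.001; k = 100;
t = (-1:Ts:1)';
Xcc = 2*cos(2*pi*t) + 0.1*randn(size(t));   % noise samples with variance 0.01
Xdc = Xcc(1:k:end);                         % every kth sample, starting at x(1)
td  = t(1:k:end);
plot(t, Xcc, td, Xdc, 'o'); xlabel('t');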
13.12.3 A particular telephone switch handles only automated junk voicemail calls that arrive as a Poisson process of rate λ = 100 calls per minute. Each automated voicemail call has a duration of exactly one minute. Use the method of Problem 13.12.2 to estimate the expected number of calls E[M(t)]. Do your results differ very much from those of Problem 13.12.2?
13.12.4 Recall that for a rate λ Poisson process, the expected number of arrivals in [0, T] is λT. Inspection of the code for poissonarrivals(lambda,T) will show that initially n = ⌈1.1λT⌉ arrivals are generated. If S_n > T, the program stops and returns {S_j | S_j ≤ T}. Otherwise, if S_n < T, then we generate an additional n arrivals and check whether S_2n > T. This process may be repeated an arbitrary number of times k until S_kn > T. Let K equal the number of times this process is repeated. What is P[K = 1]? What is the disadvantage of choosing larger n so as to increase P[K = 1]?
13.12.5 In this problem, we employ the result of Problem 13.5.8 as the basis for a function s=newarrivals(lambda,T) that generates a Poisson arrival process. The program newarrivals.m should do the following:
• Generate a sample value of N, a Poisson (λT) random variable.
• Given N = n, generate {U_1, ..., U_n}, a set of n uniform (0, T) random variables.
• Sort {U_1, ..., U_n} from smallest to largest and return the vector of sorted elements.
Write the program newarrivals.m and experiment to find out whether this program is any faster than poissonarrivals.m.
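One possible sketch of newarrivals.m follows. It assumes a generator poissonrv(alpha,m) that returns m samples of a Poisson (alpha) random variable, in the spirit of the text's poissonrv; if your version uses a different calling convention, adjust the first line accordingly.

function s = newarrivals(lambda, T)
% Generate a Poisson arrival process on (0, T) using Problem 13.5.8.
n = poissonrv(lambda*T, 1);   % number of arrivals in (0, T)  [assumed signature]
u = T*rand(n, 1);             % n uniform (0, T) arrival times, unordered
s = sort(u);                  % return the arrival times in increasing order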
13.12.6 Suppose the Brownian motion process with α = 1 is constrained by barriers. That is, we wish to generate a process Y(t) such that −b ≤ Y(t) ≤ b for a constant b > 0. Build a simulation of this system. Estimate P[Y(t) = b].
13.12.7 For the departure process D(t) of Example 13.28, let D_n denote the time of the nth departure. The nth interdeparture time is then V_n = D_n − D_{n−1}. From a sample path containing 1000 departures, estimate the PDF of V_n. Is it reasonable to model V_n as an exponential random variable? What is the mean interdeparture time?
Appendix A
Families of Random Variables

A.1 Discrete Random Variables

Bernoulli (p)
For 0 < p < 1,
P_X(x) = 1 − p for x = 0;  p for x = 1;  0 otherwise.
φ_X(s) = 1 − p + pe^s,   E[X] = p,   Var[X] = p(1 − p).
Binomial (n, p)
For a positive integer n and 0 < p < 1,
P_X(x) = (n choose x) p^x (1 − p)^{n−x}.
E[X] = np,   Var[X] = np(1 − p).
Discrete Uniform (k, l)
For integers k and l such that k < l,
P_X(x) = 1/(l − k + 1) for x = k, k + 1, ..., l;  0 otherwise.
φ_X(s) = (e^{sk} − e^{s(l+1)}) / ((l − k + 1)(1 − e^s)),
E[X] = (k + l)/2,   Var[X] = (l − k)(l − k + 2)/12.
Geometric (p)
For 0 < p < 1,
P_X(x) = p(1 − p)^{x−1} for x = 1, 2, ...;  0 otherwise.
φ_X(s) = pe^s / (1 − (1 − p)e^s),
E[X] = 1/p,   Var[X] = (1 − p)/p².
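The geometric formulas above are easy to check numerically. This sketch (illustrative only) draws samples by inverting the CDF and compares the sample mean and variance with 1/p and (1 − p)/p².

p = 0.25; m = 1e6;
X = ceil(log(1-rand(m,1))/log(1-p));   % geometric (p) samples on {1, 2, ...}
fprintf('mean %.3f vs %.3f   var %.3f vs %.3f\n', ...
        mean(X), 1/p, var(X), (1-p)/p^2);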
Multinomial (n, p_1, ..., p_n)
For integer n > 0, p_i > 0 for i = 1, ..., n, and p_1 + ··· + p_n = 1,
P_{X_1,...,X_n}(x_1, ..., x_n) = (n!/(x_1! ··· x_n!)) p_1^{x_1} ··· p_n^{x_n} for nonnegative integers with x_1 + ··· + x_n = n;  0 otherwise.
Pascal (k, p)
For positive integer k and 0 < p < 1,
P_X(x) = (x − 1 choose k − 1) p^k (1 − p)^{x−k}.
φ_X(s) = ( pe^s / (1 − (1 − p)e^s) )^k,
E[X] = k/p,   Var[X] = k(1 − p)/p².

Poisson (α)
For α > 0,
P_X(x) = α^x e^{−α}/x! for x = 0, 1, 2, ...;  0 otherwise.
E[X] = α,   Var[X] = α.
Zipf (n, α)
For positive integer n > 0 and constant α > 1,
P_X(x) = c(n, α)/x^α for x = 1, 2, ..., n;  0 otherwise,
where
c(n, α) = ( Σ_{k=1}^{n} 1/k^α )^{−1}.

A.2 Continuous Random Variables
Beta (i, j)
For positive integers i and j, the beta function is defined as
β(i, j) = (i + j − 1)! / ((i − 1)!(j − 1)!).
For a beta (i, j) random variable X,
f_X(x) = β(i, j) x^{i−1}(1 − x)^{j−1} for 0 < x < 1;  0 otherwise.
E[X] = i/(i + j),   Var[X] = ij / ((i + j)²(i + j + 1)).
Cauchy (a, b)
For constants a > 0 and −∞ < b < ∞,
f_X(x) = a / (π(a² + (x − b)²)),
φ_X(s) = e^{bs − a|s|}.
Note that E[X] is undefined since ∫_{−∞}^{∞} x f_X(x) dx is undefined. Since the PDF is symmetric about x = b, the mean can be defined, in the sense of a principal value, to be b.
E[X] = b (principal value),   Var[X] = ∞.
Erlang (n, λ)
For λ > 0 and a positive integer n,
f_X(x) = λ^n x^{n−1} e^{−λx} / (n − 1)! for x ≥ 0;  0 otherwise.
φ_X(s) = ( λ/(λ − s) )^n,
E[X] = n/λ,   Var[X] = n/λ².
Exponential (λ)
For λ > 0,
f_X(x) = λe^{−λx} for x ≥ 0;  0 otherwise.
φ_X(s) = λ/(λ − s),
E[X] = 1/λ,   Var[X] = 1/λ².
Gamma (a, b)
For a > −1 and b > 0,
f_X(x) = x^a e^{−x/b} / (a! b^{a+1}) for x > 0;  0 otherwise.
φ_X(s) = 1/(1 − bs)^{a+1},
E[X] = (a + 1)b,   Var[X] = (a + 1)b².

Gaussian (μ, σ)
For constants σ > 0, −∞ < μ < ∞,
f_X(x) = e^{−(x−μ)²/2σ²} / (σ√(2π)),
E[X] = μ,   Var[X] = σ².
Laplace (a, b)
For constants a > 0 and −∞ < b < ∞,
f_X(x) = (a/2) e^{−a|x−b|},
E[X] = b,   Var[X] = 2/a².
Log-normal (a, b, σ)
For constants −∞ < a < ∞, −∞ < b < ∞, and σ > 0,
f_X(x) = e^{−(ln(x−a)−b)²/2σ²} / ((x − a)σ√(2π)) for x > a;  0 otherwise.
E[X] = a + e^{b+σ²/2},   Var[X] = e^{2b+σ²}(e^{σ²} − 1).
Maxwell (a)
For a > 0,
f_X(x) = √(2/π) a³ x² e^{−a²x²/2} for x > 0;  0 otherwise.
E[X] = √(8/(πa²)),   Var[X] = (3π − 8)/(πa²).
Pareto (α, μ)
For α > 0 and μ > 0,
f_X(x) = (α/μ)(x/μ)^{−(α+1)} for x ≥ μ;  0 otherwise.
E[X] = αμ/(α − 1)  (α > 1),   Var[X] = αμ²/((α − 2)(α − 1)²)  (α > 2).
Rayleigh (a)
For a > 0,
f_X(x) = a² x e^{−a²x²/2} for x > 0;  0 otherwise.
E[X] = √(π/(2a²)),   Var[X] = (2 − π/2)/a².
Uniform (a, b)
For constants a < b,
f_X(x) = 1/(b − a) for a < x < b;  0 otherwise.
φ_X(s) = (e^{bs} − e^{as}) / (s(b − a)),
E[X] = (a + b)/2,   Var[X] = (b − a)²/12.
Appendix B
A Few Math Facts

This text assumes that the reader knows a variety of mathematical facts. Often these facts go unstated. For example, we use many properties of limits, derivatives, and integrals. Generally, we have omitted comment or reference to mathematical techniques typically employed by engineering students. However, when we employ math techniques that a student may have forgotten, the result can be confusion. It becomes difficult to separate the math facts from the probability facts. To decrease the likelihood of this event, we have summarized certain key mathematical facts. In the text, we have noted when we use these facts. If any of these facts are unfamiliar, we encourage the reader to consult a textbook in that area.
Trigonometric Identities

Math Fact B.1  Half Angle Formulas
cos(A + B) = cos A cos B − sin A sin B
sin(A + B) = sin A cos B + cos A sin B
cos 2A = cos²A − sin²A
sin 2A = 2 sin A cos A
Math Fact B.2  Products of Sinusoids
sin A sin B = (1/2)[cos(A − B) − cos(A + B)]
cos A cos B = (1/2)[cos(A − B) + cos(A + B)]
sin A cos B = (1/2)[sin(A + B) + sin(A − B)]
Math Fact B.3  The Euler Formula
The Euler formula e^{jθ} = cos θ + j sin θ is the source of the identities
cos θ = (e^{jθ} + e^{−jθ})/2,   sin θ = (e^{jθ} − e^{−jθ})/(2j).
Sequences and Series

Math Fact B.4  Finite Geometric Series
The finite geometric series is
Σ_{i=0}^{n} q^i = 1 + q + q² + ··· + q^n = (1 − q^{n+1})/(1 − q).
To see this, multiply the left and right sides by (1 − q) to obtain
(1 − q) Σ_{i=0}^{n} q^i = (1 − q)(1 + q + q² + ··· + q^n) = 1 − q^{n+1}.
Math Fact B.5  Infinite Geometric Series
When |q| < 1,
Σ_{i=0}^{∞} q^i = lim_{n→∞} Σ_{i=0}^{n} q^i = lim_{n→∞} (1 − q^{n+1})/(1 − q) = 1/(1 − q).
Math Fact B.6
Σ_{i=1}^{n} i q^i = q(1 − q^n[1 + n(1 − q)]) / (1 − q)².
Math Fact B.7
If |q| < 1,
Σ_{i=1}^{∞} i q^{i−1} = 1/(1 − q)².
Math Fact B.8
Σ_{j=1}^{n} j = n(n + 1)/2.

Math Fact B.9
Σ_{j=1}^{n} j² = n(n + 1)(2n + 1)/6.
Calculus

Math Fact B.10  Integration by Parts
The integration by parts formula is
∫_a^b u dv = uv|_a^b − ∫_a^b v du.
Math Fact B.11  Gamma Function
The gamma function is defined as
Γ(z) = ∫_0^∞ x^{z−1} e^{−x} dx.
If z = n, a positive integer, then Γ(n) = (n − 1)!. Also note that Γ(1/2) = √π, Γ(3/2) = √π/2, and Γ(5/2) = 3√π/4.
Math Fact B.12  Leibniz's Rule
The function
R(α) = ∫_{a(α)}^{b(α)} r(α, x) dx
has derivative
dR(α)/dα = −r(α, a(α)) da(α)/dα + r(α, b(α)) db(α)/dα + ∫_{a(α)}^{b(α)} ∂r(α, x)/∂α dx.
In the special case when a(α) = a and b(α) = b are constants,
R(α) = ∫_a^b r(α, x) dx,
and Leibniz's rule simplifies to
dR(α)/dα = ∫_a^b ∂r(α, x)/∂α dx.
Math Fact B.13  Change-of-Variable Theorem
Let x = T(y) be a continuously differentiable transformation from R^n to R^n. Let R be a set in R^n having a boundary consisting of finitely many smooth sets. Suppose that R and its boundary are contained in the interior of the domain of T, T is one-to-one on R, and det(T'), the Jacobian determinant of T, is nonzero on R. Then, if f(x) is bounded and continuous on T(R),
∫_{T(R)} f(x) dV_x = ∫_R f(T(y)) |det(T')| dV_y.
Vectors and Matrices

Math Fact B.14  Vector/Matrix Definitions
(a) Vectors x and y are orthogonal if x'y = 0.
(b) A number λ is an eigenvalue of a matrix A if there exists a vector x such that Ax = λx. The vector x is an eigenvector of matrix A.
(c) A matrix A is symmetric if A = A'.
(d) A square matrix A is unitary if A'A equals the identity matrix I.
(e) A real symmetric matrix A is positive definite if x'Ax > 0 for every nonzero vector x.
(f) A real symmetric matrix A is positive semidefinite if x'Ax ≥ 0 for every nonzero vector x.
(g) A set of vectors {x_1, ..., x_n} is orthonormal if x_i'x_j = 1 when i = j and otherwise equals zero.
(h) A matrix U is unitary if its columns {u_1, ..., u_n} are orthonormal.
Math Fact B.15  Real Symmetric Matrices
A real symmetric matrix A has the following properties:
(a) All eigenvalues of A are real.
(b) If x_1 and x_2 are eigenvectors of A corresponding to eigenvalues λ_1 ≠ λ_2, then x_1 and x_2 are orthogonal vectors.
(c) A can be written as A = UDU', where D is a diagonal matrix and U is a unitary matrix whose columns are n orthonormal eigenvectors of A.
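These properties are easy to confirm numerically with MATLAB's eig. The sketch below (illustrative only) builds a random real symmetric matrix and checks, up to roundoff, that its eigenvalues are real, that the eigenvector matrix is unitary, and that A = UDU'.

n = 4;
B = randn(n); A = (B + B')/2;     % a real symmetric matrix
[U, D] = eig(A);                  % columns of U are orthonormal eigenvectors
disp(max(abs(imag(diag(D)))));    % eigenvalues are real
disp(norm(U'*U - eye(n)));        % U'U = I
disp(norm(U*D*U' - A));           % A = U D U'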
Math Fact B.16  Positive Definite Matrices
For a real symmetric matrix A, the following statements are equivalent:
(a) A is a positive definite matrix.
(b) x'Ax > 0 for all nonzero vectors x.
(c) Each eigenvalue λ of A satisfies λ > 0.
(d) There exists a nonsingular matrix W such that A = WW'.
Math Fact B.17  Positive Semidefinite Matrices
For a real symmetric matrix A, the following statements are equivalent:
(a) A is a positive semidefinite matrix.
(b) x'Ax ≥ 0 for all vectors x.
(c) Each eigenvalue λ of A satisfies λ ≥ 0.
(d) There exists a matrix W such that A = WW'.
Index a priori probabiliLy , 15 acceptance set, 367 a lLernative hypothesis, 369 arrival, 74, 440 asyn1ptotically unbiased estimator, 347 a uLocorrelation function , 429, 449 random sequence, 449 wide sense sLationary process, 456 a u Locorrelation Brownian motion, 450 a uLocovariance function, 429 random sequence, 449 stochastic process, 449 a utocovaria nce Brownian motion, 450 average pO\·ver , 457 axioms of probability, 11 , 196 conditional probability, 16 consequences of, 13 a priori probabiliLy , 370 Bayes' t heorem, 22 bell curve, 321 Bernoulli decomposition , 444 Bernoulli process, 438 Bernoulli random variable expected value , 83 bernoullicdf.m, 103 bernoullipmf.m, 103 bernoullirv .m, 103 bias in estin1ators, 346 bigpoissonpmf.m, 117 binary hypoLhesis test, 369 maximum a posteriori probability, 374 minimum cost, 377 binary hypot hesis test, 370 binon1ial coefficient, 42 binomia l r a ndom variable, 113, 326-327 expected value, 84 binon1ialcdf.m, 103 binomia lpmf.m, 103, 336 binomialrv .n1, 103 blind estimation, 401
Brovvn . Robert , 447 Bro\·vnian motion, 446-447 joint PDF, 447 \·vit h a barrier, 476 brownian.1n, 466 central limit theorem, 322, 360 approximation, 322 confidence interval estimation , 356 significance tests, 368 Chebyshev inequality, 339, 341 Chernoff bound, 339, 341 chiptest.m, 56 circuits.m, 217 clipper, 237 clipping circuit, 226 collectively exhaustive, 6 combina tions, 42 communications system bina ry, 392 BPSK, 392, 469 CDlVIA, 395-397, 428 !v1PSK, 395, 398 QA!vI, 394-395 QPSK, 385-386, 394-395, 433 469 ' ternary amplitude shift keying, 394 compact disc, 325 complemen tary CDF standard normal, 142 components in parallel, 53 in series, 53 conditional expected value as a ra ndom variable , 263 given a random variable, 262 of a function given an event, 254 conditional probability, 16 conditional cumula tive distribution funcL ion , 243 expected value, 249, 263 n1ean square error, 401 probability density function, 258 properties, 249
probability mass function , 256 given an event, 243 joint, 252 properties, 249 variance, 250 confidence coefficient, 352, 354 confidence interval, 352, 354 Gaussian, 356 consistent estimator, 346 continuous random variable , 64, 118, 121 cu1nulative disLribution function, 1 21 expected value, 129 convergence almost a lways, 344 almost every\·v here, 344 almost surely, 344 in probability, 344 wit h probability 1, 344 convolution, 234 correlation coefficient, 185 in linear estimation, 405 correlation, 187 random vector, 287 random vectors , 286 counL.m, 103, 330, 464 counLequal. m , 61 counting process, 440 co unting fundamental principle of, 40 methods , 40 covariance matrix, 287, 289 random vectors , 287 covariance, 184 noisy observation, 189 of independent random variables, 189 of r andom vectors, 287 properties, 187 cross-correla tion, 289-290, 429 function, 459 of random vectors, 289 cross-covariance, 289-290 of rando1n vectors, 289 cumulative distribution function
conditional, 243 continuous random variable, 121 discrete random variable, 77 join t , 164 derived from joint PDF, 173 multivariate, 1 95 of a pair of randon1 vectors, 278 random vectors, 278 standard normal, 140 DC voltage, 457 decision regions, 386 decision stat istic, 370 decorrelation , 396 delta func tion, 145 deltarv .m, 241 depart ures, 465 De Moivre-Laplace formula , 327 De l\1organ's l a\~' , 6 d iabetes test , 270 discrete random variable conditional f>f\1 F , 256 variance, 94 discrete uniform random variable expected value, 84 dispersion, 93 distinguishable objects, 41 dtrianglerv .m, 269 duniforn1cdf.n1, 103 duniforn1pmf. m, 103 duniformrv .m, 103 eigenvalue, 297, 487 eigenvector, 487 E instein . J\lbert, 447 ensemble averages, 431 ensemble, 431 equally likely o utcomes , 14 erf. m, 152 ergodic, 457 erlangcdf.m, 153 erlangpdf.1n , 153 erlangrv .m , 153, 235 estimation blind, 401 linear, 404 from random vectors, 4 14 of parameters, 427 LMSE, 404 estimator asymptotically unbiased , 347 consis tent, 346 linear inean square error (LMSE), 405
maximum a posteriori probability, 409-410 m 1n1mum mean square error (MlVISE), 402 unbiased, 347 event, 10 expectation, 82 iterated , 264 expected value , 65, 8 1, 338 Bernoulli random variable, 83 binomial random variable , 84 conditional, 249, 263 continuous random variable, 129 derived random variable, 90 discrete random variable , 8 1 discrete uniform random variable, 84 exponential random variable , 135 function of t\·VO random variables, 182 geometric random variable, 83 of a function, 130 of a sum of funct ions , 182 ofs un1 , 182 Pascal random variable , 84 random mat rix , 286 random sum, 319 randon1 vectors, 286 stochastic process, 448 experiment, 8 exponential randon1 variable expected value, 135 variance, 135 exponentialcdf.m, 153 exponentialpdf.m, 153 exponentialrv.m, 153 fac torial.m, 100 false acceptance, 368 false alarm, 371 false reject ion , 368 find.m, 268 fines t-grain, 9 finitecdf.m, 103 finitepmf.m, 99, 103-105 , 203, 329 finiterv.m, 103, 105, 203 first moment , 96 floor.m, 10 1 football pool, 390 freqxy.m, 204 game shO\·V l\1onty Hall , 38
suitcases, 112 gausscdf.m, 153 Gaussian PDF bivaria te , 475 multivaria te, 291 , 4'75 process vvhite noise, 463 \·vide sense stationary, 463 random variables, 291, 480 bivariate, 191 ,, random vector, 304 random vectors , 291 stochastic process, 462 gausspdf.m, 153 gaussrv .m, 153 gaussvector.m, 299, 467 gaussvectorpdf.m, 299 genetics, 32, 34 geometric randon1 variable expected value, 83 geometriccdf.m, 103 geometricpmf.m , 100, 103 geometricrv.m, 103 , 117, 161 georv .1n , 161 Gray code, 398 gseq.m , 467-468 handoffs, 31 hard limiter, 226 headwind, 236, 335 high blood press ure , 247 his t .m, 56, 61, 104, 204 human granulocytic ehrlichiosis, 32 hypot hesis test , 367 binary, 370-374, 385 maximum a posteriori probabilit y, 374 maximum a posteriori probabilit y, 373-374 maximum likelihood , 381-382 m inimu1n cost , 377 Neyman-Pearson, 379-380 m ultiple, 384 maximum a posteriori probabilit y, 385 maximum likelihood, 385 icdf3spin.m, 235 icdfrv .m, 235, 241 icdfw.m, 241 identically dis t ributed, 200 iid-see independent and identically distrib uted, 200 imagepmf.m, 202 imagerv .m, 203 imagesize.m , 202-203
marginal probability mass function , 170 marginal probability mass funct ion, 169-170 Markov inequality, 339-340 MATLAB funct ion bernoullicdf, 103 bernoullipmf, 103 bernoullirv , 103 bigpoissonpmf, 117 binomialcdf, 103 binomialpmf, 103, 336 binomialrv, 103 bro,.v nian, 466 chiptest , 56 circuits, 217 count, 103, 330,464 co untequal , 61 deltarv , 241 d t ria.nglerv, 269 duniformcdf, 103 d uniformpmf, 103 d uniformrv, 103 erf, 152 erlangcdf, 153 erlangpdf, 153 erlangrv , 153, 235 exponentialcdf, 153 exponentialpdf, 153 exponentialrv, 153 factorial, 100 find , 268 finiLecdf, 103 finiLepmf, 99, 103-105, 203, 329 fin i terv , 103, 105, 203 fioor, 10 1 freqxy , 204 gausscdf, 153 gausspdf, 153 gaussrv, 153 gaussvecLor, 299,467 gaussvecLorpdf, 299 geometriccdf, 103 geometricpmf, 100, 103 geon1etricrv, 103, 117, 161 georv, 16 1 gseq, 467-468 hisL , 56, 61 , 104 , 204 icdf3spin, 235 icdfrv , 235, 241 icdfw , 241 imagepmf, 202 imagerv , 203 in1agesize, 202-203 imagesLem, 204 july temps, 300
493
modemrv, 276 mse, 422 ndgrid, 61 , 201, 329 newarrivals , 476 pasca.lcdf, 103 pascalpmf, 103 pascalrv, 103 phi, 152 p lot, 336 plot3, 205 pmfplo t , 104 poissonarrivals, 464, 476 poissoncdf, 103 poissonpmf, 101 , 103, 117 poissonprocess, 464-465 poissonrv, 103 q uiz3lrv , 160 rand , 27-28, 102-103, 119, 153, 225, 234 randn, 153, 298-299, 329-330 shipcostpmf, 116 shipcostrv, 105 shipweight8, 116 shipweight pmf, 100 simswitch, 465 stem3, 204 sumxlx2, 329 svd, 298-299 t 2rv, 154 Lhreesun1, 329 Loeplit z, 467 Lrianglecdfplot, 217 uniform l 2, 330 uniformcdf, 153 uniformpdf, 153 uniformrv , 153, 234 unique, 116, 202,204 urv, 241 volt po,.v er, 104 wrv l , 241 wrv2, 241 x5,298 xy t rianglerv, 269 matrix positive definite, 296, 487 maximum a posteriori probability binary hypothesis test , 373 estimator, 4 10 maximum likelihood estimate , 411 maxim um likelihood binary hypothesis test , 381-382 decision rule , 381 multiple hypot hesis Lest , 385 McNabb Donovan, 333
mean square error, 34 7 , 399 mean value, 82, 338 mean , 80, 338 median , 80-81 memory less property Poisson process, 443 memory less property, 402 m1n1mum inea.n square error (MMSE) estimator, 402 m iss, 371 m ixed random variable, 145 , 150 mode , 80-81 model of an experiment, 8 models , 8 modem, 124, 133, 276, 373, 384 mode1nrv .1n, 276 moment genera.ting function , 310 sums of random variables, 314 table of, 312 moments exponentia l random variable, 313 of a random variable, 95 Monty Hall , 38 mse.m, 422 multi1noda.l , 80 multinomial coefficient, 47-4 8 mutually exclusive events, 24 M&Ms, 107, 109, 114 ndgrid.m, 61, 201, 329 newa.rriva.ls.m, 476 Neyman-Pearson binary hypothesis test, 379-380 noisy observation, 424, 426 covariance, 189 j oint PDF, 180 probability density function, 194 normal, 139 null hypothesis, 368-369 null set, 4 observations , 8 one-tail significance test, 369 order statistics, 217 ordered sample, 45 ort hogonality principle, 406 ort honormal , 487 o utcome, 9 pacemaker factory , 389 partition, 6
Pascal random variable expected value, 84 pascalcdf.1n, 103 pascalpmf.m, 103 pascalrv.m, 103 permuta tions, 4 1 phi.m, 152 plot .m, 336 plot3.m, 205 pmfplot.m, 104 Poisson process, 138, 441 arrival rate, 441 Bernoulli decomposition, 444 competing, 446 in tera.rrival time, 442 memory less property, 442-443 sums of, 443 poissonarriva.ls.m, 464 , 476 poissoncdf.m, 103 poissonpmf.m, 101 , 103 , 117 poissonprocess.m, 464-465 poissonrv.m, 103 positive definite matrix , 487 positive semidefinite, 487 prediction, 399 prior probability , 15, 370 probability density funct ion, 119, 124 bivariate Gaussian conditional, 266 conditional joint, 253 conditional, 258 given an event, 244 properties, 249 joint, 171 of noisy observation, 180 properties, 172 marginal, 1 77 multivariate marginal, 198 multivariate, 196 of a pair of random vectors, 278 properties, 125 random vectors, 278 probability mass function , 66 conditional, 256 given an event, 243 properties, 249 joint, 166 n1arginal, 169-170 m ultivariate marginal, 198 multivariate, 195 of a pair of random vectors, 278 randon1 vectors, 278 probability axioms of, 11, 196
a priori, 370 prior, 370 procedure, 8 projection, 406 QPSK comm unications system, 394 quantization noise, 270 q ua.n tizer uniform, 2 4 6 , 270, 276 quiz3lrv .m, 160 radar system, 370-372, 379, 391 rand.m, 27-28, 102-103, 119, 153, 225 , 234 randn.m, 153, 298-299, 329-330 random matrix, 286 expected value, 286 random sequence, 434 autocorrelation function, 449 autocovariance function, 449 Bernoulli, 4 38 independent and identically distrib uted, 437 joint PMF /PDF, 438 stationary, 45 1- 4 52 wide sense stationary, 420 , 456 random sum, 317-318 random variables, 62, 64 nth moment, 31 1 Bernoulli, 69, 71 , 477 beta, 4 12-413, 479 binon1ia.l, 71 , 84, 111, 116, 477 Cauchy, 4 79 derived, 86 expected value, 90 inverse CDF method, 225 probability density [unction, 220 discrete uniform, 73, 84, 477 discrete, 65 Erlang, 136, 480 exponential, 1 3 4 , 480 generating samples, 234 funct ion of, 229 ga.1nma, 480 Gaussian, 139, 480 geometric , 7 1 , 4 78 independent, 179 indicator, 344 jointly Gaussian, 291 La.place, 481 log-normal, 481 maximum of, 282
stationary stochastic process, 458 trace, 358 unbiased estimator, 348 variance, 338 sample space grid , 201 sample space, 9 san1ple variance biased, 350 unbiased, 35 1 san1ple, 4 1 ordered, 45 sampling, 4 1 \·v ith replacement , 44-45 \·v ithout replacement, 4 1, 44 second inoment, 96 second-order statistics, 286, 289 sequential experiments, 35 sets , 3 collectively exhaustive, 6 complement , 5 disjoint, 5 elements of, 3 in tersection , 5 mutually excl usive, 5 partition, 6 union, 5 shipcost pmf.m, 116 shipcost rv .m , 105 shipvveight8.m, 116 shipvveightpmf.m, 100 sifting property of t he delta. funct ion, 147 signal constellation, 386 signal space, 386 signal- to-noise ratio , 190 significance level, 367 significance test, 366-367 central limit theorem, 390 sims\·vitch.m, 465 singular value decon1position , 296-297, 299 snake eyes, 361 standard deviation, 94 standard error, 348 standard normal complementary CDF, 142 cumula tive distribution function , 140 ra.ndon1 variables, 295 random vectors, 295 stationary random sequence, 451- 452 stochastic process, 452 properties, 453 sample mean, 458 statis tic , 82
495
statistical inference, 337, 366 stem3.m, 204 stochastic process, 430 autocorrelation func tion, 449 a.utocovaria.nce func tion, 449 continuous-time, 434 continuous-value, 434 discrete-time, 434 discrete-value, 434 expected value, 448 Gaussian, 462, 474 Poisson, 4 70 stationary, 452 wide sense stationary, 455 j oin tly , 459 strict sense stationary, 456 subexperiments, 35 subset, 4 sums of random variables, 306 expected value of, 307 exponent ial, 316 Gaussian, 316 moment genera.ting func tion, 314 PDF, 233 Poisson, 315 variance, 307-308 sumxlx2.m, 329 SYD, 297, 299 svd.m, 298-299 symmet ric, 487 t2rv .m, 154 tails, 142 three-sig1na. event , 145 threesum.m, 329 time average, 431 time sequence, 429 Toeplitz forms, 420 toeplit z.m, 467 tree diagrams, 35 trials , 12 independent , 44, 49 trianglecdfplot.m, 217 tvveet , 29-30, 368 tv.ritter, 368 tv.ro-tail significance test, 369 Type II errors , 368 Type I errors, 368 typical value, 338 unbiased estimator, 347 uniform l 2.m, 330 uniformcdf.m, 153 uniformpdf.m, 153 uniformrv .m, 153, 234 unique.m, 116, 202, 204 unit in1pulse function, 146
unit step function , 147 unitary, 48 7 universal set, 4 urv.m, 241 variance, 94 conditional, 250 discrete r andom variable, 94 estimation of, 350 of sum, 183
s ums of random variables , 307-308 vectors orthogonal, 487 Venn diagrams , 4 volt power.m, 104 white noise, 463 wide sense stationary Gaussia n process, 463 process
autocorrelation function, 456 average power, 457 random sequence, 420, 456 stochastic process, 455 \·vr vl.m, 241 \•Vr V2.m, 241
x5 .m, 298 xytrianglerv .m, 269