Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers
Roy D. Yates and David J. Goodman

Problem Solutions: Yates and Goodman, 9.1.2, 9.2.2, 9.3.2, 9.3.6, 9.3.7, 9.4.3, 9.4.4, 9.5.8, 9.5.9, 9.6.4, 9.7.4 and 9.7.6

Problem 9.1.2
(a) We wish to develop a hypothesis test of the form

P[ |K - E[K]| > c ] = 0.05

to determine if the coin we've been flipping is indeed a fair one. We would like to find the value of c, which will determine the upper and lower limits on how many heads we can get away from the expected number out of 100 flips and still accept our hypothesis. Under our fair coin hypothesis, the expected number of heads and the standard deviation of the process are

E[K] = 50,   σ_K = √(100 · (1/2) · (1/2)) = 5.

Now in order to find c we make use of the central limit theorem and divide the above inequality through by σ_K to arrive at

P[ |K - E[K]| / σ_K > c / σ_K ] = 0.05.

Taking the complement, we get

P[ -c/σ_K ≤ (K - E[K])/σ_K ≤ c/σ_K ] = 0.95.

Using the central limit theorem, we can write

Φ(c/σ_K) - Φ(-c/σ_K) = 2Φ(c/σ_K) - 1 = 0.95.
This implies Φ(c/σ_K) = 0.975, or c/5 = 1.96. That is, c = 9.8 flips. So we see that if we observe more than 50 + 10 = 60 or fewer than 50 - 10 = 40 heads, then with significance level α = 0.05 we should reject the hypothesis that the coin is fair.

(b) Now we wish to develop a test of the form

P[K > c] = 0.01.
Thus we need to find the value of c that makes the above probability true. This value will tell us that if we observe more than c heads, then with significance level α = 0.01, we should reject the hypothesis that the coin is fair. To find this value of c we look to evaluate the CDF

F_K(k) = 0 for k < 0,   Σ_{i=0}^{k} C(100, i) (1/2)^100 for 0 ≤ k ≤ 100,   1 for k > 100.
Computation reveals that c = 62 flips. So if we observe 62 or more heads, then with a significance level of 0.01 we should reject the fair coin hypothesis. Another way to obtain this result is to use a central limit theorem approximation. First, we express our rejection region in terms of a zero mean, unit variance random variable:

P[K > c] = 1 - P[K ≤ c] = 1 - P[ (K - E[K])/σ_K ≤ (c - E[K])/σ_K ] = 0.01.

Since E[K] = 50 and σ_K = 5, the CLT approximation is

P[K > c] ≈ 1 - Φ( (c - 50)/5 ) = 0.01.
From Table 4.1, we have (c - 50)/5 = 2.35, or c = 61.75. Once again, we see that we reject the hypothesis if we observe 62 or more heads.
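As a numerical cross-check, a minimal Python sketch (standard library only) reproduces both answers: the CLT value c = 9.8 from part (a) and the exact binomial threshold c = 62 from part (b).

import math

def phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 100, 0.5
sigma = math.sqrt(n * p * (1 - p))           # sigma_K = 5

# Part (a): c = 1.96 * sigma_K = 9.8 gives 2*Phi(c/sigma_K) - 1 = 0.95
c = 1.96 * sigma
print(c, 2 * phi(c / sigma) - 1)             # 9.8  ~0.95

# Part (b): smallest integer c with P[K > c] <= 0.01 (exact binomial tail)
def tail_above(c):
    return sum(math.comb(n, i) for i in range(c + 1, n + 1)) / 2**n

c = next(k for k in range(n + 1) if tail_above(k) <= 0.01)
print(c, tail_above(c))                      # 62  ~0.006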
Problem 9.2.2
Given hypothesis H_i, K has the binomial PMF

P_{K|H_i}(k) = C(n, k) q_i^k (1 - q_i)^(n-k) for k = 0, 1, ..., n (0 otherwise).
(a) The ML rule is

k ∈ A1 if P_{K|H1}(k) ≥ P_{K|H0}(k);   k ∈ A0 if P_{K|H1}(k) < P_{K|H0}(k).
When we observe K = k ∈ {0, 1, ..., n}, plugging in the conditional PMFs yields the rule

k ∈ A1 if C(n, k) q1^k (1 - q1)^(n-k) ≥ C(n, k) q0^k (1 - q0)^(n-k);
k ∈ A0 if C(n, k) q1^k (1 - q1)^(n-k) < C(n, k) q0^k (1 - q0)^(n-k).
Cancelling common factors, taking the logarithm of both sides, and rearranging yields

k ∈ A1 if k ln q1 + (n - k) ln(1 - q1) ≥ k ln q0 + (n - k) ln(1 - q0);
k ∈ A0 if k ln q1 + (n - k) ln(1 - q1) < k ln q0 + (n - k) ln(1 - q0).
By combining all terms with k, the rule can be simplified to

k ∈ A1 if k ln( (q1/(1 - q1)) / (q0/(1 - q0)) ) ≥ n ln( (1 - q0)/(1 - q1) );
k ∈ A0 if k ln( (q1/(1 - q1)) / (q0/(1 - q0)) ) < n ln( (1 - q0)/(1 - q1) ).
Note that q1 > q0 implies q1/(1 - q1) > q0/(1 - q0). Thus, we can rewrite our ML rule as

k ∈ A1 if k ≥ k* = n ln( (1 - q0)/(1 - q1) ) / ln( (q1/(1 - q1)) / (q0/(1 - q0)) );
k ∈ A0 if k < k*.
(b) Let k* denote the threshold given in part (a). Using n = 500, q0 = 10^-4, and q1 = 10^-2, we have

k* = 500 ln( (1 - 10^-4)/(1 - 10^-2) ) / ln( (10^-2/(1 - 10^-2)) / (10^-4/(1 - 10^-4)) ) ≈ 1.078.
Thus the ML rule is that if we observe K ≤ 1, then we choose hypothesis H0; otherwise, we choose H1. The conditional error probabilities are

P[E|H0] = P[A1|H0] = P[K > 1|H0] = 1 - P_{K|H0}(0) - P_{K|H0}(1)
= 1 - (1 - q0)^500 - 500 q0 (1 - q0)^499 = 0.0012
and

P[E|H1] = P[A0|H1] = P[K ≤ 1|H1] = P_{K|H1}(0) + P_{K|H1}(1)
= (1 - q1)^500 + 500 q1 (1 - q1)^499 = 0.0398.
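A short sketch confirms the threshold k* ≈ 1.078 and both conditional error probabilities; the parameter names mirror the solution above.

import math

n, q0, q1 = 500, 1e-4, 1e-2

# Threshold k* from part (a)
num = n * math.log((1 - q0) / (1 - q1))
den = math.log((q1 / (1 - q1)) / (q0 / (1 - q0)))
print(num / den)                                 # ~1.078, so decide H1 iff K >= 2

def binom_pmf(k, q):
    return math.comb(n, k) * q**k * (1 - q)**(n - k)

print(1 - binom_pmf(0, q0) - binom_pmf(1, q0))   # P[E|H0] ~ 0.0012
print(binom_pmf(0, q1) + binom_pmf(1, q1))       # P[E|H1] ~ 0.0398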
(c) In the test of Example 9.7, the geometric random variable N, the number of tests needed to find the first failure, was used. In this problem, the binomial random variable K, the number of failures in 500 tests, was used. We will call these two procedures the geometric and the binomial tests. Also, we will use P[E_N|H_i] and P[E_K|H_i] to denote the respective conditional error probabilities. From Example 9.7, we have the following comparison:

geometric test:  P[E_N|H0] = 0.045,   P[E_N|H1] = 0.0095
binomial test:   P[E_K|H0] = 0.0012,  P[E_K|H1] = 0.0398
When making comparisons between tests, we want to judge both the reliability of the test as well as the cost of the testing procedure. With respect to the cost of the test, the geometric test is guaranteed to never require more than 464 tests because if we observe 464 consecutive working devices, then we choose H0. Moreover, the geometric test may require far fewer than 464 tests, particularly if hypothesis H1 happens to be true. On the other hand, the binomial test always uses 500 tests. Consequently, the geometric test is likely to cost a little less, although the difference may not be very significant. When we examine the test reliability, we see that the conditional error probabilities appear to be comparable in that P[E_K|H0] < P[E_N|H0] while P[E_N|H1] < P[E_K|H1]. If we knew the a priori probabilities P[H_i] and also the relative costs of the two types of errors, then we could determine which test procedure was better. However, in the absence of that kind of information, we make the following observation. Given that the product is a pacemaker whose installation requires heart surgery, we would like the device to be very, very reliable. Hence, if H1 is true and the failure rate is q1 = 0.01, we would like very much to know this fact before the pacemaker is available for sale. On the other hand, if H0 is true and we make an error and guess that the failure rate is q1 = 0.01, then we may go back and redesign either the pacemaker or the assembly process to improve reliability. It seems likely that this cost may be significantly lower than the cost of installing many faulty pacemakers. These arguments suggest that the conditional error probabilities given H1 are far more important. On this basis, it would appear that the geometric test is a better test.

Problem 9.3.2
Let the components of s_ijk be denoted by s_ijk^(1) and s_ijk^(2) so that, given hypothesis H_ijk,
X1 = s_ijk^(1) + N1,   X2 = s_ijk^(2) + N2.
As in Example 9.9, we will assume N1 and N2 are iid zero mean Gaussian random variables with variance σ². Thus, given hypothesis H_ijk, X1 and X2 are independent and the conditional joint PDF of X1 and X2 is

f_{X1,X2|H_ijk}(x1, x2) = f_{X1|H_ijk}(x1) f_{X2|H_ijk}(x2)
= (1/(2πσ²)) e^{-[(x1 - s_ijk^(1))² + (x2 - s_ijk^(2))²]/(2σ²)}.
In terms of the distance ||x - s_ijk|| between the vectors x = (x1, x2)' and s_ijk = (s_ijk^(1), s_ijk^(2))', we can write

f_{X1,X2|H_ijk}(x1, x2) = (1/(2πσ²)) e^{-||x - s_ijk||²/(2σ²)}.
Since all eight symbols s_000, ..., s_111 are equally likely, the MAP and ML rules are

x ∈ A_ijk if f_{X1,X2|H_ijk}(x1, x2) ≥ f_{X1,X2|H_i'j'k'}(x1, x2) for all other hypotheses H_i'j'k'.

This rule simplifies to

x ∈ A_ijk if ||x - s_ijk|| ≤ ||x - s_i'j'k'|| for all other i'j'k'.

This means that A_ijk is the set of all vectors x that are closer to s_ijk than to any other signal. Graphically, to find the boundary between points closer to s_ijk than to s_i'j'k', we draw the line segment connecting s_ijk and s_i'j'k'. The boundary is then the perpendicular bisector. The resulting boundaries are shown in the following figure:
[Figure: the eight signal vectors s_000, ..., s_111 and the perpendicular-bisector boundaries of the acceptance regions A_ijk.]
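The ML rule is simply a nearest-neighbor search over the signal constellation. The sketch below illustrates this; the eight points on the unit circle are hypothetical placeholders, since the actual vectors s_000, ..., s_111 are given in the problem statement rather than repeated here.

import math

# Hypothetical constellation: eight equally spaced points on the unit
# circle, standing in for the actual signal vectors s_000, ..., s_111.
signals = [(math.cos(2 * math.pi * m / 8), math.sin(2 * math.pi * m / 8))
           for m in range(8)]

def decide(x1, x2):
    # ML decision: index of the signal closest to the observation (x1, x2)
    return min(range(8),
               key=lambda m: (x1 - signals[m][0])**2 + (x2 - signals[m][1])**2)

print(decide(0.9, 0.1))   # 0, since (0.9, 0.1) is nearest the signal at angle 0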
Problem 9.3.6
(a) In Problem 9.3.4, we found that in terms of the vectors x and s_i, the acceptance regions are defined by the rule

x ∈ A_i if ||x - s_i||² ≤ ||x - s_j||² for all j.
Just as in the case of QPSK, the acceptance region A_i is the set of vectors x that are closest to s_i. Graphically, these regions are easily found from the sketch of the signal constellation:

[Figure: sketch of the signal constellation; each acceptance region A_i is bounded by perpendicular bisectors between neighboring signal points.]
(b) For hypothesis H1, we see that the acceptance region is

A1 = {(X1, X2) : 0 < X1 ≤ 2, 0 < X2 ≤ 2}.

Given H1, a correct decision is made if (X1, X2) ∈ A1. Given H1, X1 = 1 + N1 and X2 = 1 + N2. Thus,

P[C|H1] = P[(X1, X2) ∈ A1 | H1]
= P[0 < 1 + N1 ≤ 2, 0 < 1 + N2 ≤ 2]
= (P[-1 < N1 ≤ 1])²
= (Φ(1/σ_N) - Φ(-1/σ_N))²
= (2Φ(1/σ_N) - 1)².
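Numerically, P[C|H1] = (2Φ(1/σ_N) - 1)² is easy to evaluate; in this sketch σ_N = 0.5 is an assumed value chosen only for illustration.

import math

def phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

sigma_N = 0.5                          # assumed noise level, for illustration
print((2 * phi(1 / sigma_N) - 1)**2)   # P[C|H1] ~ 0.911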
(c) Surrounding each signal s_i is an acceptance region A_i that is no smaller than the acceptance region A1. That is,

P[C|H_i] = P[(X1, X2) ∈ A_i | H_i] ≥ P[-1 < N1 ≤ 1, -1 < N2 ≤ 1] = P[C|H1].

This implies

P[C] = Σ_{i=0}^{15} P[C|H_i] P[H_i] ≥ Σ_{i=0}^{15} P[C|H1] P[H_i] = P[C|H1] Σ_{i=0}^{15} P[H_i] = P[C|H1].
Problem 9.3.7
By Theorem 9.7, the MAP multiple hypothesis test is

(x1, x2) ∈ A_i if p_i f_{X1,X2|H_i}(x1, x2) ≥ p_j f_{X1,X2|H_j}(x1, x2) for all j.
From Example 9.9, the conditional PDF of (X1, X2) given H_i is

f_{X1,X2|H_i}(x1, x2) = (1/(2πσ²)) e^{-[(x1 - √E cos θ_i)² + (x2 - √E sin θ_i)²]/(2σ²)}.
Using this conditional joint PDF, the MAP rule becomes

(x1, x2) ∈ A_i if, for all j,

(x1 - √E cos θ_i)² + (x2 - √E sin θ_i)² - (x1 - √E cos θ_j)² - (x2 - √E sin θ_j)² ≤ 2σ² ln(p_i/p_j).
Expanding the squares and using the identity cos²θ + sin²θ = 1 yields the simplified rule

(x1, x2) ∈ A_i if, for all j,   x1 (cos θ_i - cos θ_j) + x2 (sin θ_i - sin θ_j) ≥ (σ²/√E) ln(p_j/p_i).
Note that the MAP rules define linear constraints in x1 and x2. Since θ_i = π/4 + iπ/2, we use the following table to enumerate the constraints:

i    cos θ_i    sin θ_i
0     1/√2       1/√2
1    -1/√2       1/√2
2    -1/√2      -1/√2
3     1/√2      -1/√2
To be explicit, to determine whether (x1, x2) ∈ A_i, we need to check the MAP rule for each j ≠ i. Thus, each A_i is defined by three constraints. Using the above table, the acceptance regions are

(x1, x2) ∈ A0 if x1 ≥ (σ²/√(2E)) ln(p1/p0), x2 ≥ (σ²/√(2E)) ln(p3/p0), and x1 + x2 ≥ (σ²/√(2E)) ln(p2/p0);

(x1, x2) ∈ A1 if x1 ≤ (σ²/√(2E)) ln(p1/p0), x2 ≥ (σ²/√(2E)) ln(p2/p1), and x2 - x1 ≥ (σ²/√(2E)) ln(p3/p1);

(x1, x2) ∈ A2 if x1 ≤ (σ²/√(2E)) ln(p2/p3), x2 ≤ (σ²/√(2E)) ln(p2/p1), and x1 + x2 ≤ (σ²/√(2E)) ln(p2/p0);

(x1, x2) ∈ A3 if x1 ≥ (σ²/√(2E)) ln(p2/p3), x2 ≤ (σ²/√(2E)) ln(p3/p0), and x1 - x2 ≥ (σ²/√(2E)) ln(p1/p3).
Using the parameters

σ = 0.8,  E = 1,  p0 = 1/2,  p1 = p2 = p3 = 1/6,

the acceptance regions for the MAP rule are

A0 = {(x1, x2) : x1 ≥ -0.497, x2 ≥ -0.497, x1 + x2 ≥ -0.497},
A1 = {(x1, x2) : x1 ≤ -0.497, x2 ≥ 0, x2 - x1 ≥ 0},
A2 = {(x1, x2) : x1 ≤ 0, x2 ≤ 0, x1 + x2 ≤ -0.497},
A3 = {(x1, x2) : x1 ≥ 0, x2 ≤ -0.497, x1 - x2 ≥ 0}.

Here is a sketch of these acceptance regions:
[Figure: sketch of the MAP acceptance regions A0, A1, A2, A3 in the (x1, x2) plane.]
Note that the boundary between A1 and A3 defined by x1 - x2 = 0 plays no role because of the high value of p0.
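A small sketch evaluates the 0.497 threshold and implements the MAP rule directly by maximizing ln p_i minus the scaled squared distance, which is equivalent to the acceptance regions above.

import math

sigma, E = 0.8, 1.0
p = [1/2, 1/6, 1/6, 1/6]
theta = [math.pi / 4 + i * math.pi / 2 for i in range(4)]

# The threshold appearing in the acceptance regions:
print(sigma**2 / math.sqrt(2 * E) * math.log(p[1] / p[0]))   # ~ -0.497

def map_decide(x1, x2):
    # MAP rule: maximize ln p_i minus the scaled squared distance to s_i
    def metric(i):
        d2 = ((x1 - math.sqrt(E) * math.cos(theta[i]))**2
              + (x2 - math.sqrt(E) * math.sin(theta[i]))**2)
        return math.log(p[i]) - d2 / (2 * sigma**2)
    return max(range(4), key=metric)

print(map_decide(-0.1, -0.1))   # 0: in A0 despite x1, x2 < 0, since p0 is large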
Problem 9.4.3
(a) The marginal PMFs of X and Y are listed below:

P_X(x) = 1/3 for x = -1, 0, 1 (0 otherwise),   P_Y(y) = 1/4 for y = -3, -1, 1, 3 (0 otherwise).
(b) No, the random variables X and Y are not independent, since

P_{X,Y}(1, 3) = 0 ≠ P_X(1) P_Y(3).
(c) Direct evaluation leads to

E[X] = 0,  Var[X] = 2/3,  E[Y] = 0,  Var[Y] = 5.

This implies

Cov[X, Y] = E[XY] - E[X]E[Y] = E[XY] = 7/6.
(d) From Theorem 9.11, the optimal linear estimate of X given Y is

X̂_L(Y) = ρ_{X,Y} (σ_X/σ_Y)(Y - μ_Y) + μ_X = (7/30) Y + 0.
Therefore, a* = 7/30 and b* = 0.

(e) The conditional probability mass function is

P_{X|Y}(x|3) = P_{X,Y}(x, 3)/P_Y(3) = (1/6)/(1/4) = 2/3 for x = -1;  (1/12)/(1/4) = 1/3 for x = 0;  0 otherwise.
(f) The minimum mean square estimator of X given that Y = 3 is

x̂_M(3) = E[X|Y = 3] = Σ_x x P_{X|Y}(x|3) = (-1)(2/3) + (0)(1/3) = -2/3.
(g) The mean squared error of this estimator is

ê_M(3) = E[(X - x̂_M(3))² | Y = 3] = Σ_x (x + 2/3)² P_{X|Y}(x|3)
= (-1 + 2/3)²(2/3) + (0 + 2/3)²(1/3) = 2/9.
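The arithmetic in parts (d), (f), and (g) can be verified in a few lines, using only the moments and the conditional PMF derived above.

# Part (d): linear estimator coefficients from the moments in part (c)
cov_xy, var_y = 7/6, 5
print(cov_xy / var_y)          # a* = 7/30; b* = 0 since E[X] = E[Y] = 0

# Parts (f), (g): MMSE estimate and error from P_{X|Y}(.|3)
pmf = {-1: 2/3, 0: 1/3}
x_hat = sum(x * px for x, px in pmf.items())               # -2/3
e_hat = sum((x - x_hat)**2 * px for x, px in pmf.items())  # 2/9
print(x_hat, e_hat)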
Problem 9.4.4
It is instructive to compare the joint PMFs of the four pairs of random variables to each other. In particular, completing the row sums and column sums shows that each random variable has the same marginal PMF. That is,

P_X(x) = P_Y(x) = P_U(x) = P_V(x) = P_S(x) = P_T(x) = P_Q(x) = P_R(x) = 1/3 for x = -1, 0, 1 (0 otherwise).
This implies E[X] = E[Y] = E[U] = E[V] = E[S] = E[T] = E[Q] = E[R] = 0 and that

E[X²] = E[Y²] = E[U²] = E[V²] = E[S²] = E[T²] = E[Q²] = E[R²] = 2/3.
Since each random variable has zero mean, the second moment equals the variance. Also, the standard deviation of each random variable is √(2/3). These common properties will make it much easier to answer the questions.

(a) Random variables X and Y are independent since, for all x and y,

P_{X,Y}(x, y) = P_X(x) P_Y(y).
Since each other pair of random variables has the same marginal PMFs as X and Y but a different joint PMF, all of the other pairs of random variables must be dependent. Since X and Y are independent, ρ_{X,Y} = 0. For the other pairs, we must compute the covariances. Direct evaluation of the expected products yields

Cov[U, V] = E[UV] = -2/3,   Cov[S, T] = E[ST] = 0,   Cov[Q, R] = E[QR] = 1/6.

The correlation coefficient of U and V is

ρ_{U,V} = Cov[U, V] / (√Var[U] √Var[V]) = (-2/3) / (√(2/3) √(2/3)) = -1.

In fact, since the marginal PMFs are the same, the denominator of the correlation coefficient will be 2/3 in each case. The other correlation coefficients are

ρ_{S,T} = Cov[S, T]/(2/3) = 0,   ρ_{Q,R} = Cov[Q, R]/(2/3) = 1/4.
(b) From Theorem 9.11, the least mean square linear estimator of U given V is

Û_L(V) = ρ_{U,V} (σ_U/σ_V)(V - E[V]) + E[U] = ρ_{U,V} V = -V.

Similarly for the other pairs, all expected values are zero and the ratio of the standard deviations is always 1. Hence,

X̂_L(Y) = ρ_{X,Y} Y = 0,   Ŝ_L(T) = ρ_{S,T} T = 0,   Q̂_L(R) = ρ_{Q,R} R = R/4.
From Theorem 9.11, the mean square errors are

e*_L(X, Y) = Var[X](1 - ρ²_{X,Y}) = 2/3,
e*_L(U, V) = Var[U](1 - ρ²_{U,V}) = 0,
e*_L(S, T) = Var[S](1 - ρ²_{S,T}) = 2/3,
e*_L(Q, R) = Var[Q](1 - ρ²_{Q,R}) = 5/8.
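Since every pair has variance 2/3, the four mean square errors follow from e*_L = Var(1 - ρ²); a one-line check:

var = 2/3
for name, rho in [("X,Y", 0), ("U,V", -1), ("S,T", 0), ("Q,R", 1/4)]:
    print(name, var * (1 - rho**2))   # 2/3, 0, 2/3, 5/8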
Problem 9.5.8
The minimum mean square error linear estimator is given by Theorem 9.11, in which X_n and Y_{n-1} play the roles of X and Y in the theorem. That is, our estimate X̂_n of X_n is

X̂_n = X̂_L(Y_{n-1}) = ρ_{X_n,Y_{n-1}} (Var[X_n]/Var[Y_{n-1}])^(1/2) (Y_{n-1} - E[Y_{n-1}]) + E[X_n].

By recursive application of X_n = c X_{n-1} + Z_{n-1}, we obtain

X_n = c^n X_0 + Σ_{j=1}^{n} c^(j-1) Z_{n-j}.

The expected value of X_n is E[X_n] = c^n E[X_0] + Σ_{j=1}^{n} c^(j-1) E[Z_{n-j}] = 0. The variance of X_n is

Var[X_n] = c^(2n) Var[X_0] + Σ_{j=1}^{n} (c^(j-1))² Var[Z_{n-j}] = c^(2n) Var[X_0] + σ² Σ_{j=1}^{n} c^(2(j-1)).

Since Var[X_0] = σ²/(1 - c²), we obtain

Var[X_n] = c^(2n) σ²/(1 - c²) + σ² (1 - c^(2n))/(1 - c²) = σ²/(1 - c²).

Note that E[Y_{n-1}] = d E[X_{n-1}] + E[W_{n-1}] = 0. The variance of Y_{n-1} is

Var[Y_{n-1}] = d² Var[X_{n-1}] + Var[W_{n-1}] = d² σ²/(1 - c²) + η².

Since X_n and Y_{n-1} have zero mean, the covariance of X_n and Y_{n-1} is

Cov[X_n, Y_{n-1}] = E[X_n Y_{n-1}] = E[(c X_{n-1} + Z_{n-1})(d X_{n-1} + W_{n-1})].

From the problem statement, we learn that

E[X_{n-1} W_{n-1}] = 0,   E[Z_{n-1} X_{n-1}] = 0,   E[Z_{n-1} W_{n-1}] = 0.

Hence, the covariance of X_n and Y_{n-1} is

Cov[X_n, Y_{n-1}] = cd Var[X_{n-1}].

The correlation coefficient of X_n and Y_{n-1} is

ρ_{X_n,Y_{n-1}} = Cov[X_n, Y_{n-1}] / (Var[X_n] Var[Y_{n-1}])^(1/2).

Since E[Y_{n-1}] and E[X_n] are zero, the linear predictor for X_n becomes

X̂_n = ρ_{X_n,Y_{n-1}} (Var[X_n]/Var[Y_{n-1}])^(1/2) Y_{n-1} = (Cov[X_n, Y_{n-1}]/Var[Y_{n-1}]) Y_{n-1} = cd Var[X_{n-1}] Y_{n-1} / Var[Y_{n-1}].

Substituting the above result for Var[X_n], we obtain the optimal linear predictor of X_n given Y_{n-1}:

X̂_n = (c/d) · Y_{n-1} / (1 + β²(1 - c²)),

where β² = η²/(d² σ²). From Theorem 9.11, the mean square estimation error at step n is

e*_L(n) = E[(X_n - X̂_n)²] = Var[X_n](1 - ρ²_{X_n,Y_{n-1}}) = σ² (1 + β²)/(1 + β²(1 - c²)).   (1)

We see that the mean square estimation error is e*_L(n) = e*_L, a constant for all n. In addition, e*_L is an increasing function of β.
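A Monte Carlo sketch can validate the predictor gain and the error formula (1); the parameter values c = 0.9, d = 1, σ = 1, η = 0.5 are illustrative choices, not values from the problem.

import math, random

c, d, sigma, eta = 0.9, 1.0, 1.0, 0.5        # illustrative parameter values
beta2 = eta**2 / (d**2 * sigma**2)
gain = (c / d) / (1 + beta2 * (1 - c**2))    # coefficient of Y_{n-1}
mse_theory = sigma**2 * (1 + beta2) / (1 + beta2 * (1 - c**2))

rng = random.Random(1)
x = rng.gauss(0, sigma / math.sqrt(1 - c**2))   # stationary X_0
total, trials = 0.0, 200_000
for _ in range(trials):
    y = d * x + rng.gauss(0, eta)     # measurement Y_{n-1}
    x = c * x + rng.gauss(0, sigma)   # state update to X_n
    total += (x - gain * y)**2        # squared prediction error
print(mse_theory, total / trials)     # the two should agree closely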
Problem 9.5.9
The plots of the actual sequence X_n and the predicted values X̂_n for the requested parameters are shown below:

[Figure: six panels plotting X_n (actual) and X̂_n (predicted) versus n for 0 ≤ n ≤ 50: (a) c = 0.9, d = 10; (b) c = 0.9, d = 1; (c) c = 0.9, d = 0.1; (d) c = 0.6, d = 10; (e) c = 0.6, d = 1; (f) c = 0.6, d = 0.1.]
1, the solution to Problem 9.5.8 showed that the optimal linear predictor of Xn given
Xˆn
d 2 !
cd Yn 1 c2
1
The mean square estimation error at step n was found to be eL( n
eL( σ2
12
d2 1 d 2 ! 1 c2
We see that the mean square estimation error is eL( n eL( , a constant for all n. In addition, eL( is a decreasing function of d. In graphs (a) through (c), we see that the predictor tracks Xn less well as β increases. Decreasing d corresponds to decreasing the contribution of Xn 1 to the measurement Yn 1 . Effectively, the impact of measurement noise variance η2 is increased. As d decreases, the predictor places less emphasis on the measurement Yn and instead makes predictions closer to E X ¤ 0. That is, when d is small in graphs (c) and (f), the predictor stays close to zero. With respect to c, the performance of the predictor is less easy to understand. In Equation 3, the mean square error eL( is the product of σ2 1 c2
Var Xn
1 ρ2Xn 5 Yn¿
1
d 2 1 1 c2 d 2 ! 1 c2
As a function of increasing c², Var[X_n] increases while 1 - ρ²_{X_n,Y_{n-1}} decreases. Overall, the mean square error e*_L is an increasing function of c². However, Var[X] is the mean square error obtained using a blind estimator that always predicts E[X], while 1 - ρ²_{X_n,Y_{n-1}} characterizes the extent to which the optimal linear predictor is better than the blind predictor. When we compare graphs (a)-(c) with c = 0.9 to graphs (d)-(f) with c = 0.6, we see greater variation in X_n for larger c, but in both cases the predictor worked well when d was large. Note that the performance of our predictor is limited by the fact that it is based on a single observation Y_{n-1}. Generally, we can improve our predictor when we use all of the past observations Y_0, ..., Y_{n-1}.
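The trends visible in the six graphs can be reproduced numerically. This sketch prints the theoretical error (3) next to an empirical estimate for each (c, d) pair used in the figure:

import math, random

def empirical_mse(c, d, steps=100_000, seed=0):
    # Simulate X_n = c X_{n-1} + Z_{n-1}, Y_n = d X_n + W_n with sigma = eta = 1
    rng = random.Random(seed)
    gain = c * d / (d**2 + 1 - c**2)
    x = rng.gauss(0, 1 / math.sqrt(1 - c**2))
    total = 0.0
    for _ in range(steps):
        y = d * x + rng.gauss(0, 1)
        x = c * x + rng.gauss(0, 1)
        total += (x - gain * y)**2
    return total / steps

for c in (0.9, 0.6):
    for d in (10, 1, 0.1):
        theory = (d**2 + 1) / (d**2 + 1 - c**2)
        print(c, d, round(theory, 3), round(empirical_mse(c, d), 3))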
Problem 9.6.4
This problem continues Example 9.18.

(a) Given Q = q, the conditional PMF of K is

P_{K|Q}(k|q) = C(n, k) q^k (1 - q)^(n-k) for k = 0, 1, ..., n (0 otherwise).
The ML estimate of Q given K = k is

q̂_ML(k) = arg max_{0≤q≤1} P_{K|Q}(k|q).
Differentiating P_{K|Q}(k|q) with respect to q and setting the result equal to zero yields

d P_{K|Q}(k|q)/dq = C(n, k) [ k q^(k-1) (1 - q)^(n-k) - (n - k) q^k (1 - q)^(n-k-1) ] = 0.

The maximizing value is q = k/n, so that

Q̂_ML(K) = K/n.
(b) To find the PMF of K, we average over all q:

P_K(k) = ∫_{-∞}^{∞} P_{K|Q}(k|q) f_Q(q) dq = ∫_0^1 C(n, k) q^k (1 - q)^(n-k) dq.

We can evaluate this integral by expressing it in terms of the integral of a beta PDF. Since the beta(k+1, n-k+1) PDF is ((n+1)!/(k!(n-k)!)) q^k (1 - q)^(n-k), we can write

P_K(k) = (1/(n+1)) ∫_0^1 ((n+1)!/(k!(n-k)!)) q^k (1 - q)^(n-k) dq = 1/(n+1).

That is, K has the uniform PMF

P_K(k) = 1/(n+1) for k = 0, 1, ..., n (0 otherwise).
Hence, E[K] = n/2.

(c) The conditional PDF of Q given K is

f_{Q|K}(q|k) = P_{K|Q}(k|q) f_Q(q) / P_K(k) = ((n+1)!/(k!(n-k)!)) q^k (1 - q)^(n-k) for 0 ≤ q ≤ 1 (0 otherwise).

That is, given K = k, Q has a beta(k+1, n-k+1) PDF.
(d) The MMSE estimate of Q given K = k is the conditional expectation E[Q|K = k]. From the beta PDF described in Appendix A, E[Q|K = k] = (k + 1)/(n + 2). The MMSE estimator is

Q̂_M(K) = E[Q|K] = (K + 1)/(n + 2).
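A simulation sketch confirms that K is uniform on {0, 1, ..., n} and compares the ML and MMSE estimates; n = 10 is an arbitrary illustrative choice.

import random

n, trials = 10, 100_000
rng = random.Random(2)
counts = [0] * (n + 1)
for _ in range(trials):
    q = rng.random()                               # Q ~ uniform (0,1)
    k = sum(rng.random() < q for _ in range(n))    # K ~ binomial (n, q)
    counts[k] += 1
print([round(c / trials, 3) for c in counts])      # each ~ 1/(n+1) = 0.091

k = 7                                              # example observation
print(k / n, (k + 1) / (n + 2))                    # ML = 0.7, MMSE ~ 0.667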
Problem 9.7.4
(a) This part is just algebra and doesn’t require any probabilities. By expanding the square, the random variable V_n can be written as

V_n = (1/n) Σ_{i=1}^{n} ( X_i - (1/n) Σ_{j=1}^{n} X_j )²
= (1/n) Σ_{i=1}^{n} [ X_i² - (2/n) X_i Σ_{j=1}^{n} X_j + (1/n²)(Σ_{j=1}^{n} X_j)² ]
= (1/n) Σ_{i=1}^{n} X_i² - (2/n²) Σ_{i=1}^{n} X_i Σ_{j=1}^{n} X_j + (1/n²)(Σ_{j=1}^{n} X_j)²
= (1/n) Σ_{i=1}^{n} X_i² - (1/n²)(Σ_{j=1}^{n} X_j)²
= (1/n) Σ_{i=1}^{n} X_i² - (M_n(X))².
(b) From the previous part,

V_n = (1/n) Σ_{i=1}^{n} X_i² - (M_n(X))².

Taking expectations, we have

E[V_n] = (1/n) Σ_{i=1}^{n} E[X_i²] - E[(M_n(X))²] = E[X²] - E[(M_n(X))²].
Since we know the mean and variance of the sample mean M_n(X), we can calculate the second moment of the sample mean:

E[(M_n(X))²] = Var[M_n(X)] + (E[M_n(X)])² = Var[X]/n + (E[X])².
This implies

E[V_n] = E[X²] - Var[X]/n - (E[X])² = Var[X] - Var[X]/n = ((n - 1)/n) Var[X].
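A quick simulation check of E[V_n] = ((n - 1)/n) Var[X]; the uniform (0,1) samples (Var[X] = 1/12) are an illustrative choice, since the result holds for any distribution with finite variance.

import random

n, trials = 5, 200_000
rng = random.Random(3)
total = 0.0
for _ in range(trials):
    xs = [rng.random() for _ in range(n)]       # X_i ~ uniform (0,1), Var[X] = 1/12
    m = sum(xs) / n                             # sample mean M_n(X)
    total += sum((x - m)**2 for x in xs) / n    # V_n
print(total / trials, (n - 1) / n * (1 / 12))   # both ~ 0.0667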
Problem 9.7.6
Each X_i has the Gaussian PDF

f_{X_i}(x) = (1/√(2πσ²)) e^{-x²/(2σ²)}.

The joint PDF of X_1, ..., X_n is

f_{X_1,...,X_n}(x_1, ..., x_n) = Π_{i=1}^{n} f_{X_i}(x_i) = (1/((2π)^(n/2) σ^n)) e^{-(x_1² + ... + x_n²)/(2σ²)}.
(a) The ML estimate of σ maximizes the joint PDF f_{X_1,...,X_n}(x_1, ..., x_n). We find this estimate by taking the derivative of the joint PDF with respect to σ. Using w² = x_1² + ... + x_n² to simplify our expressions, we have

d f_{X_1,...,X_n}(x_1, ..., x_n)/dσ = (1/(2π)^(n/2)) · ( σ^n (w²/σ³) e^{-w²/(2σ²)} - n σ^(n-1) e^{-w²/(2σ²)} ) / σ^(2n) = 0.

Solving for σ yields

σ = √(w²/n) = √((x_1² + ... + x_n²)/n).

The ML estimator is

σ̂_ML(n) = √((X_1² + ... + X_n²)/n).
(b) First we observe that

V_n = σ̂²_ML(n) = (X_1² + ... + X_n²)/n.

Since E[X_i] = 0, we know that E[X_i²] = σ². In this case, we see that V_n is a sample mean estimate of E[X_i²] = σ². By Theorem 9.14, V_n is an unbiased, consistent sequence of estimates of σ².
(c) To determine whether σ̂_ML(n) is an unbiased estimator, we check to see if E[σ̂_ML(n)] = σ. For arbitrary n, this is difficult. For example, for n = 1, σ̂_ML(1) = √(X_1²) = |X_1|. The mean value of σ̂_ML(1) is

E[σ̂_ML(1)] = ∫_{-∞}^{∞} |x| (1/√(2πσ²)) e^{-x²/(2σ²)} dx = (2/√(2πσ²)) ∫_0^∞ x e^{-x²/(2σ²)} dx.

Making the variable substitution u = x²/(2σ²), we obtain

E[σ̂_ML(1)] = σ √(2/π) ∫_0^∞ e^{-u} du = σ √(2/π).

Since E[σ̂_ML(1)] ≠ σ, we see that the estimator is biased. For n > 1, this estimator of the standard deviation is also biased, since

E[ √((X_1² + ... + X_n²)/n) ] ≠ √( E[(X_1² + ... + X_n²)/n] ) = σ.
To determine consistency of the estimator, we use Definition 9.5 and check whether, for any ε > 0,

lim_{n→∞} P[ |σ̂_ML(n) - σ| > ε ] = 0.

Equivalently, we can check if

lim_{n→∞} P[ |σ̂_ML(n) - σ| ≤ ε ] = 1.
For our estimate of the standard deviation, we observe that

P[ |σ̂_ML(n) - σ| ≤ ε ] = P[ σ - ε ≤ σ̂_ML(n) ≤ σ + ε ]
= P[ (σ - ε)² ≤ V_n ≤ (σ + ε)² ]
= P[ -2εσ + ε² ≤ V_n - σ² ≤ 2εσ + ε² ].

For a nontrivial problem, σ > 0. Hence, we can assume that ε is sufficiently small to ensure that ε < σ. This implies

0 < 2σε - ε² < 2σε + ε²,

so that

P[ |σ̂_ML(n) - σ| ≤ ε ] ≥ P[ -(2εσ - ε²) ≤ V_n - σ² ≤ 2εσ - ε² ] = P[ |V_n - σ²| ≤ 2εσ - ε² ].
Let ε' = 2εσ - ε² > 0. Since V_n is a consistent estimator of σ², for any ε' > 0,

lim_{n→∞} P[ |σ̂_ML(n) - σ| ≤ ε ] ≥ lim_{n→∞} P[ |V_n - σ²| ≤ ε' ] = 1.

Hence, σ̂_ML(n) is a consistent sequence of estimates.