Hypothesis Testing of population mean
Given sample data, we want to test whether the population mean is $\mu_0$.
Before we start testing, we first need to define the following:

1) $H_0$: Null hypothesis. Our assumption about the population mean, e.g. $H_0: \mu = \mu_0$.
2) $H_1$: Alternative hypothesis. How the population mean is different, e.g. $H_1: \mu > \mu_0$, $H_1: \mu < \mu_0$ or $H_1: \mu \neq \mu_0$.

Let’s use an example to illustrate. The manufacturer of coke cans claims that the mean weight of coke cans (i.e. the population mean) is 100 g, that the weight of a randomly selected coke can is normally distributed, and that the variance is 10. You take a random sample of 20 coke cans and find that the mean weight is 102 g, hence you suspect that the population mean is more than 100 g.

In this example, we want to test whether the mean weight of a coke can is more than the 100 g claimed by the manufacturer. Hence $H_0$ and $H_1$ are written as

$H_0: \mu = 100$
$H_1: \mu > 100$
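Let $X$ be the random variable denoting the weight of a coke can. Under $H_0$, $X \sim N(100, 10)$, so the mean of a sample of 20 cans satisfies $\bar{X} \sim N\left(100, \frac{10}{20}\right)$ and $P(\bar{X} > 102) = 0.00234$.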
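As a quick numerical check of the 0.00234 figure quoted for this example, the sketch below computes the probability that the mean of 20 cans exceeds 102 g when $H_0$ is true. It assumes Python's standard `statistics.NormalDist` in place of the GC; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 100        # population mean claimed under H0 (g)
variance = 10    # population variance stated by the manufacturer
n = 20           # sample size
x_bar = 102      # observed sample mean (g)

# Under H0 the sample mean is N(mu0, variance / n).
sample_mean_dist = NormalDist(mu=mu0, sigma=sqrt(variance / n))

# Upper-tail probability of a sample mean larger than the one observed.
p_value = 1 - sample_mean_dist.cdf(x_bar)
print(f"P(sample mean > {x_bar}) = {p_value:.5f}")
```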
The question now is: how small is small enough for us to conclude that $\mu > 100$?

Any statistical test carries a risk of error. In the example above, the probability of obtaining a sample mean that exceeds our observed value of 102 is 0.00234 if $H_0$ is true. Suppose we decide that this probability is small and reject $H_0$. There is still a probability of 0.00234 that such a sample occurs when $H_0$ is true, in which case we will have drawn the wrong conclusion.

There are two types of error we can make:
1) Type I error: rejecting the null hypothesis when the null hypothesis is true.
2) Type II error: failing to reject the null hypothesis when the null hypothesis is false.

Level of significance of a test (denoted by $\alpha\%$): the probability of rejecting $H_0$ when $H_0$ is true (i.e. of making a Type I error). This probability is fixed before the test and is used to decide whether or not to reject $H_0$.

In hypothesis testing, we reject $H_0$ when p-value $< \frac{\alpha}{100}$.
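To see what the level of significance means in practice, the following simulation sketch repeatedly draws samples from a population for which $H_0$ really is true and counts how often the test rejects it. The 5% level, the seed and the number of trials are illustrative assumptions, and the setup reuses the coke can figures (mean 100, variance 10, samples of 20).

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

mu0, sigma, n = 100, sqrt(10), 20   # H0 is true: data really come from N(100, 10)
alpha = 0.05                        # 5% level of significance (illustrative choice)
trials = 20_000

sample_mean_dist = NormalDist(mu0, sigma / sqrt(n))
rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    p_value = 1 - sample_mean_dist.cdf(fmean(sample))  # testing H1: mu > mu0
    if p_value < alpha:
        rejections += 1  # H0 rejected although it is true: a Type I error

type_1_rate = rejections / trials
print(f"Observed Type I error rate: {type_1_rate:.3f}")
```

The observed rate should be close to the chosen level, which is exactly the sense in which the level of significance is the probability of a Type I error.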
Case 1: X is normally distributed and population variance $\sigma^2$ is known.

[N2001/P2/10a] A random variable $X$ is known to have a normal distribution with variance 36. The mean of the distribution of $X$ is denoted by $\mu$. A random sample of 50 observations of $X$ has mean 20.2. Test, at the 1% level of significance, the null hypothesis $\mu = 22$ against $\mu < 22$.

Since $X$ is normally distributed and the population variance $\sigma^2$ is known, we use the Z-Test, i.e. $\bar{X}$ is normally distributed.

Possible null and alternative hypotheses:
$H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$
$H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$
$H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$

For this question:
$H_0: \mu = 22$
$H_1: \mu < 22$

Under $H_0$: $X \sim N(22, 36)$.
Given that $X \sim N(\mu, \sigma^2)$, the mean of a sample of $n$ observations satisfies $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, so

$\bar{X} \sim N\left(22, \frac{36}{50}\right)$
We can also use the GC Z-Test to find the p-value. Press the STAT button and select the TESTS menu, then key in the data. Note that in the GC Z-Test, σ is the population standard deviation (not the variance).

Possible conclusions:
1) Since p-value $< \frac{\alpha}{100}$, we reject $H_0$ and conclude that we have sufficient evidence at the $\alpha\%$ level of significance to [write in context to question].
2) Since p-value $> \frac{\alpha}{100}$, we do not reject $H_0$ and conclude that we have insufficient evidence at the $\alpha\%$ level of significance to [write in context to question].

For this question: since p-value $= 0.0169 > 0.01$, we do not reject $H_0$ and conclude that we have insufficient evidence at the 1% level of significance to say that the mean is less than 22.
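The p-value of 0.0169 for [N2001/P2/10a] can be reproduced without a GC. This is a minimal sketch assuming Python's standard `statistics.NormalDist`; note that, like the GC Z-Test, it takes a standard deviation rather than a variance.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 22          # mean under H0
variance = 36     # known population variance
n = 50            # sample size
x_bar = 20.2      # observed sample mean
alpha = 0.01      # 1% level of significance

# Under H0, the sample mean is N(mu0, variance / n).
sample_mean_dist = NormalDist(mu=mu0, sigma=sqrt(variance / n))

# H1: mu < mu0, so the p-value is the lower-tail probability.
p_value = sample_mean_dist.cdf(x_bar)
reject_h0 = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```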
Case 2: X is normally distributed and population variance $\sigma^2$ is not known.

For this type of question, the question will usually give you either $\sum x$ and $\sum x^2$, or the sample variance. Given these data, we need to estimate the population variance.

Sample mean and variance
From $n$ sample observations $x_1, x_2, x_3, \ldots, x_n$:

Sample mean: $\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum x$

Sample variance: $s_x^2 = \frac{1}{n}\sum (x - \bar{x})^2 = \frac{1}{n}\sum x^2 - \bar{x}^2$

Unbiased estimator for the population variance $\sigma^2$ (written $s^2$ here; some texts write $\hat{\sigma}^2$):

$s^2 = \frac{n}{n-1}s_x^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{1}{n-1}\sum (x - \bar{x})^2$ [given in MF15]

When we estimate the population variance, we introduce additional error. Hence the standardised sample mean is no longer normally distributed; instead it follows the Student's t-distribution. Given that $X \sim N(\mu, \sigma^2)$ with $\sigma^2$ unknown,

$T = \frac{\bar{X} - \mu}{\sqrt{s^2/n}} \sim t(n-1)$,

where $s^2$ is the unbiased estimator for the population variance.
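The relationship between the sample variance (dividing by $n$) and the unbiased estimate (dividing by $n - 1$) can be checked numerically. The sketch below uses a small set of made-up observations, which are not from these notes, and compares the summary-statistic formulas with Python's `statistics.pvariance` and `statistics.variance`.

```python
from statistics import pvariance, variance

# Illustrative observations (hypothetical, not taken from the notes).
data = [4.0, 7.0, 5.0, 9.0, 10.0]
n = len(data)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)

x_bar = sum_x / n
sample_var = sum_x2 / n - x_bar ** 2               # sample variance, divides by n
unbiased_1 = n / (n - 1) * sample_var              # n/(n-1) times the sample variance
unbiased_2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # formula using sum x and sum x^2

# pvariance divides by n, variance divides by n - 1.
print(sample_var, pvariance(data))
print(unbiased_1, unbiased_2, variance(data))
```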
[N2008/P2/Q6] In a mineral water from a certain source, the mass of calcium, $X$ mg, in a one-litre bottle is a normally distributed random variable with mean $\mu$. Based on observations over a long period, it is known that $\mu = 78$. Following a period of extreme weather, 15 randomly chosen bottles of the water were analysed. The masses of calcium in the bottles are summarised by

$\sum x = 1026.0$, $\sum x^2 = 77265.90$

Test, at the 5% significance level, whether the mean mass of calcium in a bottle has changed.
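Since $X$ is normally distributed and the population variance $\sigma^2$ is unknown, we use the T-Test, i.e. the standardised sample mean follows the Student's t-distribution.

$\bar{x} = \frac{1026.0}{15} = 68.4$

$s^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{1}{14}\left[77265.90 - \frac{1026.0^2}{15}\right] = (22.5)^2$

$H_0: \mu = 78$
$H_1: \mu \neq 78$

Under $H_0$, with $\sigma^2$ estimated by $s^2 = 22.5^2$,

$T = \frac{\bar{X} - 78}{\sqrt{22.5^2/15}} \sim t(15 - 1)$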
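Calculation of p-value for T-Test

For $H_1: \mu < \mu_0$: p-value $= P\left(T < \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}}\right)$
For $H_1: \mu > \mu_0$: p-value $= P\left(T > \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}}\right)$
For $H_1: \mu \neq \mu_0$: p-value $= 2P\left(T < \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}}\right)$ if $\bar{x} < \mu_0$, or p-value $= 2P\left(T > \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}}\right)$ if $\bar{x} > \mu_0$
In each case, check whether p-value $< \frac{\alpha}{100}$.

Since $\bar{x} = 68.4 < 78$,

p-value $= 2P\left(T < \frac{68.4 - 78}{\sqrt{22.5^2/15}}\right) = 2P(T < -1.65247) = 2\,\text{tcdf}(-10^{99}, -1.65247, 15 - 1) = 0.121$

Since p-value $= 0.121 > 0.05$, we do not reject $H_0$ and conclude that we have insufficient evidence at the 5% level of significance to say that the mean mass of calcium in a bottle has changed.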
Using the GC to find the p-value: select T-Test from the TESTS menu (under STAT) and key in the data.
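Where no GC is available, the calcium test can be approximated in plain Python. The standard library has no t-distribution, so the sketch below assumes a hand-written `t_cdf` helper (a hypothetical name) that integrates the t density numerically with Simpson's rule. It stands in for the GC's `tcdf` and is not a substitute for a tested statistics library.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T < t), integrating the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 < T < |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

n, sum_x, sum_x2, mu0 = 15, 1026.0, 77265.90, 78
x_bar = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # unbiased estimate of the variance
t_test = (x_bar - mu0) / sqrt(s2 / n)

# H1: mu != mu0, so double the tail probability on the side where x_bar fell.
lower_tail = t_cdf(t_test, n - 1)
p_value = 2 * min(lower_tail, 1 - lower_tail)
print(f"t = {t_test:.5f}, p-value = {p_value:.3f}")
```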
Case 3: X is not normally distributed and number of observations is large.
We can use the Central Limit Theorem to approximate the distribution of $\bar{X}$ by a normal distribution.

[N2007/P2/Q7] A large number of students in a college have completed a geography project. The time, $x$ hours, taken by a student to complete the project is noted for a random sample of 150 students. The results are summarised by

$\sum x = 4626$, $\sum x^2 = 147691$

Find unbiased estimates of the population mean and variance. Test, at the 5% significance level, whether the population mean time for a student to complete the project exceeds 30 hours. State, giving a reason, whether any assumptions about the population are needed in order for the test to be valid.
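The unbiased estimate of the population mean is

$\bar{x} = \frac{\sum x}{n} = \frac{4626}{150} = 30.84$

The unbiased estimate of the population variance is

$s^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{150}{149}\left(\frac{147691}{150} - 30.84^2\right) = 33.7259\ldots \approx 33.7$

$H_0: \mu = 30$
$H_1: \mu > 30$

Since $n = 150$ is large, by the Central Limit Theorem $\bar{X} \sim N\left(30, \frac{33.7259}{150}\right)$ approximately under $H_0$, so no assumption about the distribution of the population is needed.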
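The remaining steps of the test for [N2007/P2/Q7] can be sketched numerically as follows. The block assumes the large-sample normal approximation given by the Central Limit Theorem, with the population variance replaced by its unbiased estimate, and uses Python's `statistics.NormalDist` in place of the GC.

```python
from math import sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 150, 4626, 147691
mu0, alpha = 30, 0.05

x_bar = sum_x / n                          # unbiased estimate of the mean
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # unbiased estimate of the variance

# n is large, so by the CLT the sample mean is approximately
# N(mu0, s2 / n) under H0, whatever the population distribution.
z_test = (x_bar - mu0) / sqrt(s2 / n)

# H1: mu > mu0, so the p-value is the upper-tail probability.
p_value = 1 - NormalDist().cdf(z_test)
reject_h0 = p_value < alpha
print(f"x_bar = {x_bar}, s2 = {s2:.4f}, z = {z_test:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```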
Case 4: X is not normally distributed and number of observations is small.
For these cases, we cannot use the Central Limit Theorem to approximate the distribution of $\bar{X}$ by a normal distribution. We need to assume that $X$ is normally distributed; the test is then either Case 1 ($\sigma^2$ known) or Case 2 ($\sigma^2$ unknown and the unbiased estimate $s^2$ used).
[FM 1999/P2/Q7i] The manufacturer of an electric water heater claims that their heater will heat 500 litres of water from a temperature of 10 °C to a temperature of 35 °C in, on average, no longer than 12 minutes. In order to test this claim, 14 randomly chosen heaters are bought and the times ($x$ minutes) to heat 500 litres of water from 10 °C to 35 °C are measured. Corrected to 1 decimal place, the results are as follows.

13.2  12.2  11.4  14.5  11.6  12.9  12.4
12.1  12.6  10.3  12.3  11.8  11.0  13.0

$\sum x = 171.3$, $\sum x^2 = 2109.81$
Stating any assumptions necessary for validity, test the manufacturer’s claim at the 10% significance level using a t-test.

Let $X$ be the random variable denoting the time taken to heat the water. Assuming that $X$ is normally distributed, we use the T-Test.

Note the special way the null and alternative hypotheses are defined. In this question, the manufacturer claims that the mean time is no longer than 12 minutes, so the claim is taken as the null hypothesis and we look for evidence that the mean time exceeds 12 minutes:

$H_0: \mu = 12$
$H_1: \mu > 12$
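The heater test can be carried through numerically as a rough sketch. It assumes the hypotheses $H_0: \mu = 12$ against $H_1: \mu > 12$ (the claim "no longer than 12 minutes" taken as the null hypothesis), that the times are normally distributed, and a hand-written `t_cdf` helper (a hypothetical name) that approximates the GC's `tcdf` by Simpson's rule.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T < t), integrating the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 < T < |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

times = [13.2, 12.2, 11.4, 14.5, 11.6, 12.9, 12.4,
         12.1, 12.6, 10.3, 12.3, 11.8, 11.0, 13.0]
n, mu0, alpha = len(times), 12, 0.10

sum_x = sum(times)
sum_x2 = sum(x * x for x in times)
x_bar = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # unbiased estimate of the variance
t_test = (x_bar - mu0) / sqrt(s2 / n)

# H1: mu > mu0, so the p-value is the upper-tail probability.
p_value = 1 - t_cdf(t_test, n - 1)
reject_h0 = p_value < alpha
print(f"t = {t_test:.4f}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the p-value comes out well above 0.10, so the sample does not give sufficient evidence at the 10% level to dispute the manufacturer's claim.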
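Summary

1) Determine $H_0: \mu = \mu_0$.
2) Determine $H_1$: $\mu < \mu_0$, $\mu > \mu_0$ or $\mu \neq \mu_0$.
3) Calculate $\bar{x} = \frac{1}{n}\sum x$.
4) Distribution of $X$? If normal, go to step 5. If not normal: is $n \geq 50$? If yes, use the CLT ($\bar{X}$ is approximately normal). If no, assume $X$ is normal.
5) Is $\sigma^2$ known?
If yes: use the Z-Test, with $\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$.
If no: estimate $\sigma^2$ by $s^2 = \frac{n}{n-1}s_x^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right] = \frac{1}{n-1}\sum (x - \bar{x})^2$. Then, is $n \geq 50$? If yes, the Z-Test may be used approximately (optional), with $\bar{X} \sim N\left(\mu_0, \frac{s^2}{n}\right)$. If no, use the T-Test, with $T = \frac{\bar{X} - \mu_0}{\sqrt{s^2/n}} \sim t(n-1)$.
6) Calculate the p-value and check whether p-value $< \frac{\alpha}{100}$.

Z-Test:
For $\mu < \mu_0$: p-value $= P(\bar{X} < \bar{x})$
For $\mu > \mu_0$: p-value $= P(\bar{X} > \bar{x})$
For $\mu \neq \mu_0$: p-value $= 2P(\bar{X} < \bar{x})$ or p-value $= 2P(\bar{X} > \bar{x})$

T-Test:
For $\mu < \mu_0$: p-value $= P\left(T < \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}}\right)$
For $\mu > \mu_0$: p-value $= P\left(T > \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}}\right)$
For $\mu \neq \mu_0$: p-value $= 2P\left(T < \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}}\right)$ or p-value $= 2P\left(T > \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}}\right)$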
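The decision steps in this summary can be written as a small function. This is only a sketch of one reading of the flowchart: the function name `choose_test`, its parameters and the wording of the returned strings are all hypothetical, and the threshold of 50 is the one quoted in the summary.

```python
def choose_test(is_normal: bool, variance_known: bool, n: int, large_n: int = 50) -> str:
    """Return which test the summary flowchart points to."""
    if is_normal:
        basis = ""
    elif n >= large_n:
        basis = "CLT; "                    # large sample: sample mean approx. normal
    else:
        basis = "assume X is normal; "     # small sample: normality must be assumed

    if variance_known:
        return basis + "Z-Test"
    if n >= large_n:
        # Variance estimated, but n is large: Z-Test is an acceptable approximation.
        return basis + "Z-Test with s^2 (approximate)"
    return basis + "T-Test"

print(choose_test(True, True, 50))      # Case 1 example
print(choose_test(True, False, 15))     # Case 2 example
print(choose_test(False, False, 150))   # Case 3 example
print(choose_test(False, False, 14))    # Case 4 example
```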
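Comparison between p-value and critical values

In the entries below, $z_{test} = \frac{\bar{x} - \mu_0}{\sqrt{\sigma^2/n}}$ for the Z-Test and $t_{test} = \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}}$ for the T-Test.

$H_1: \mu < \mu_0$
Z-Test, p-value: p-value $= P(\bar{X} < \bar{x}) = \text{normalcdf}\left(-10^{99}, \bar{x}, \mu_0, \sqrt{\frac{\sigma^2}{n}}\right)$. To reject: p-value $< \frac{\alpha}{100}$.
Z-Test, non-standardised: $x_{critical} = \text{invNorm}\left(\frac{\alpha}{100}, \mu_0, \sqrt{\frac{\sigma^2}{n}}\right)$. To reject: $\bar{x} < x_{critical}$.
Z-Test, standardised: $z_{critical} = \text{invNorm}\left(\frac{\alpha}{100}, 0, 1\right)$. To reject: $z_{test} < z_{critical}$.
T-Test, standardised: $t_{critical} = \text{invT}\left(\frac{\alpha}{100}, n - 1\right)$. To reject: $t_{test} < t_{critical}$.

$H_1: \mu > \mu_0$
Z-Test, p-value: p-value $= P(\bar{X} > \bar{x}) = \text{normalcdf}\left(\bar{x}, 10^{99}, \mu_0, \sqrt{\frac{\sigma^2}{n}}\right)$. To reject: p-value $< \frac{\alpha}{100}$.
Z-Test, non-standardised: $x_{critical} = \text{invNorm}\left(1 - \frac{\alpha}{100}, \mu_0, \sqrt{\frac{\sigma^2}{n}}\right)$. To reject: $\bar{x} > x_{critical}$.
Z-Test, standardised: $z_{critical} = \text{invNorm}\left(1 - \frac{\alpha}{100}, 0, 1\right)$. To reject: $z_{test} > z_{critical}$.
T-Test, standardised: $t_{critical} = \text{invT}\left(1 - \frac{\alpha}{100}, n - 1\right)$. To reject: $t_{test} > t_{critical}$.

$H_1: \mu \neq \mu_0$
Z-Test, p-value: p-value $= 2P(\bar{X} < \bar{x})$ or p-value $= 2P(\bar{X} > \bar{x})$. To reject: p-value $< \frac{\alpha}{100}$.
Z-Test, non-standardised: to reject, $\bar{x}$ must lie in the rejection region. This is messy to do because there are two critical values, hence the standardised Z-Test is usually easier.
Z-Test, standardised: $z_{critical} = \text{invNorm}\left(1 - \frac{0.5\alpha}{100}, 0, 1\right)$. To reject: $|z_{test}| > z_{critical}$.
T-Test, standardised: $t_{critical} = \text{invT}\left(1 - \frac{0.5\alpha}{100}, n - 1\right)$. To reject: $|t_{test}| > t_{critical}$.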
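The p-value and critical-value approaches always lead to the same decision. As an illustration, the sketch below applies all three Z-Test variants to the Case 1 question ($H_1: \mu < 22$, $n = 50$, $\sigma^2 = 36$, $\bar{x} = 20.2$, 1% level). It assumes Python's `statistics.NormalDist`, whose `inv_cdf` method plays the role of the GC's `invNorm`.

```python
from math import sqrt
from statistics import NormalDist

mu0, variance, n, x_bar, alpha = 22, 36, 50, 20.2, 0.01
std_error = sqrt(variance / n)             # standard deviation of the sample mean
sample_mean_dist = NormalDist(mu0, std_error)

# 1) p-value approach: reject when the lower-tail probability is below alpha.
p_value = sample_mean_dist.cdf(x_bar)
reject_by_p = p_value < alpha

# 2) Non-standardised critical value: invNorm(alpha, mu0, std_error).
x_critical = sample_mean_dist.inv_cdf(alpha)
reject_by_x = x_bar < x_critical

# 3) Standardised critical value: invNorm(alpha, 0, 1).
z_critical = NormalDist().inv_cdf(alpha)
z_test = (x_bar - mu0) / std_error
reject_by_z = z_test < z_critical

print(f"p-value = {p_value:.4f}, x_critical = {x_critical:.3f}, "
      f"z_test = {z_test:.4f}, z_critical = {z_critical:.4f}")
print(reject_by_p, reject_by_x, reject_by_z)
```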