NORMAL DISTRIBUTION
Applied Statistics and Computing Lab
Indian School of Business
Learning Goals
• To understand the concept of the normal distribution
• Useful properties of the normal distribution
• Finding normal probabilities
• Applications of the normal distribution
Normal Distribution
• One of the most important continuous distributions
• A number of real-life examples
• "If a random variable is affected by many independent causes, and the effect of each cause is not overwhelmingly large compared to other effects, then the random variable will closely follow a normal distribution. The lengths of pins made by an automatic machine, the times taken by an assembly worker to complete the assigned task repeatedly, the weights of baseballs, the tensile strengths of a batch of bolts, and the volumes of soup in a particular brand of canned soup are good examples of normally distributed random variables." - Aczel and Sounderpandian
• All of these are affected by a number of independent causes, where the effect of each cause is small
• For example, the length of a pin is affected by many independent causes such as vibrations, temperature, wear and tear on the machine, and raw material properties.
Bell Shaped Curve
• Data can be spread out in a number of ways. The following histograms (relative frequency on the y-axis) illustrate a few different shapes:
[Histograms: skewed to the left; skewed to the right; all jumbled up; the bell shaped or normal curve]
• The normal curve is symmetric, that is, neither right-skewed nor left-skewed.
Normal Curve and real life data
• Many real-life data sets, such as the weights of newborn babies and the heights of men and women, resemble the bell shaped normal curve:
All the above diagrams show the results of fitting normal density curves to
What is Normal Density?
• The density function of a normal distribution is given as:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$

• Here µ is the mean and σ is the standard deviation of the normal distribution (σ² is the variance). These are the parameters of the distribution.
• To check that it is a pdf: if we integrate f(x) over the entire range of x, we get a value of 1.
• Normal distributions are symmetric around their mean.
• The mean, median, and mode of a normal distribution are equal.
• Approximately 68% of the area of a normal distribution is within one standard deviation of the mean.
• Approximately 95% of the area of a normal distribution is within two standard deviations of the mean.
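The density and its area properties can be checked numerically. The sketch below is only an illustration: the function names `normal_pdf` and `area`, the midpoint-rule integration, and the values µ = 45, σ = 12 are assumptions chosen for the example, not part of the slides.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area(a, b, mu=0.0, sigma=1.0, steps=20_000):
    """Approximate the area under the density between a and b (midpoint rule)."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

mu, sigma = 45.0, 12.0  # illustrative parameter values (assumed)

# Integrating over mu +/- 10 sigma stands in for the entire range of x.
total = area(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)
within_one = area(mu - sigma, mu + sigma, mu, sigma)
within_two = area(mu - 2 * sigma, mu + 2 * sigma, mu, sigma)

print(round(total, 4))       # total area, close to 1
print(round(within_one, 4))  # roughly 0.68
print(round(within_two, 4))  # roughly 0.95
```

The same fractions come out for any choice of µ and σ, which is why the 68% and 95% statements hold for every normal distribution.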
Normal Distribution(s)
• Though "normal distribution" refers to bell shaped curves, the mean and variance will, in general, differ from one normal distribution to another, resulting in different shapes of the bell. The mean and variance are thus the parameters of the normal distribution.
• In the diagram on the left, all the shapes are bell shaped normal curves, but note how the shapes differ with different means and variances.
Standard Normal Distribution: Need for standardization
• How do we compare normal distributions with different µ and σ?
• We define the standard normal variable Z = (X - µ)/σ, where µ and σ are respectively the mean and standard deviation of the normal variable X.
• Z follows a normal distribution with mean = 0 and standard deviation = 1.
Why Standardize?
• By standardizing a normally distributed variable, we can find the area under its normal curve using a table.
• This is because the percentage of observations of the original normally distributed variable that lie between a and b is the same as the percentage of observations of the standard normal variable, Z, that lie between (a−µ)/σ and (b−µ)/σ.
• Also, it facilitates comparison and helps you make decisions about your data.
• Eg: Prof. Snape has given the following marks in an exam (out of 60, with 30 as the qualifying mark): 20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17. So, only two students have passed!
• The mean mark is 23 and the standard deviation is 6.6. Prof. Snape decides to set a new qualifying mark: only those students who score more than 1 standard deviation below the mean will not qualify.
• These are the standard scores: -0.45, -1.21, 0.45, 1.36, -0.76, 0.76, 1.82, -1.36, 0.45, -0.15, -0.91. So, now only two students fail.
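The standard scores in Prof. Snape's example can be reproduced as follows. This is a minimal sketch that assumes the population standard deviation (dividing by n) and rounds it to 6.6 before standardizing, since that appears to be how the slide's figures were obtained. The variable names are illustrative.

```python
import statistics

marks = [20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17]

mean = statistics.fmean(marks)          # 23.0
sd = statistics.pstdev(marks)           # population SD, about 6.63
sd_rounded = round(sd, 1)               # 6.6, the value used on the slide (assumed)

# Standard score of each mark: z = (x - mean) / sd
z_scores = [round((m - mean) / sd_rounded, 2) for m in marks]

# Old rule: qualify with at least 30 marks.
passed_old_rule = [m for m in marks if m >= 30]
# New rule: fail only if the mark is more than 1 SD below the mean (z < -1).
failed_new_rule = [m for m, z in zip(marks, z_scores) if z < -1]

print(z_scores)
print(passed_old_rule)   # marks that qualify under the old rule
print(failed_new_rule)   # marks that fail under the new rule
```

Using the unrounded standard deviation changes some scores in the second decimal place but does not change which students fail.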
Reading the standard normal table
• The standard normal table shows the area under the normal curve to the left of a given value of the standard normal variable.
• For example, from the table, what is P(Z <= 0.69)?
• Read it directly from the table: 8th row and 11th column, that is, 0.7549.
• Similarly, P(Z <= 0.33) = 0.6293.
• But this way, you are only able to find the area to the left of a value of the standard normal variable.
• What if you are asked to find the area to the right of a value? Or any area to the left or right of a negative value?
• For these, we will use various properties of the standard normal distribution.
[Figure: a snapshot of the table]
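The table lookups above can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the name `Z` is just an illustrative label for the standard normal variable.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# cdf(a) is the area to the left of a, the quantity the table lists.
p_069 = Z.cdf(0.69)
p_033 = Z.cdf(0.33)

print(round(p_069, 4))  # table entry for z = 0.69
print(round(p_033, 4))  # table entry for z = 0.33
```

Rounded to four decimals, these agree with the table entries 0.7549 and 0.6293.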
Properties of Standard Normal Distribution
• The most useful properties of the normal distribution are based on its symmetry.
• P(Z <= a) = F(a)
• P(Z >= a) = 1 - P(Z <= a) = 1 - F(a) = F(-a) (by the symmetry of the normal distribution)
• P(Z <= -a) = F(-a) = 1 - F(a)
• P(Z >= -a) = 1 - F(-a) = F(a) = P(Z <= a)
• P(b <= Z <= a) = P(Z <= a) - P(Z <= b) = F(a) - F(b)
• Check that with this set of results you can evaluate any probability for a standard normal variable!
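These identities are easy to verify numerically. In the sketch below, the helper names `prob_right` and `prob_between` and the value a = 1.75 are assumptions made for illustration. `F` is the standard normal cdf from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

F = NormalDist().cdf  # F(a) = P(Z <= a) for the standard normal variable

def prob_right(a):
    """P(Z >= a), computed as 1 - F(a)."""
    return 1 - F(a)

def prob_between(b, a):
    """P(b <= Z <= a), computed as F(a) - F(b)."""
    return F(a) - F(b)

a = 1.75  # illustrative value (assumed)

symmetry_right = abs(prob_right(a) - F(-a)) < 1e-12   # 1 - F(a) = F(-a)
symmetry_left = abs(prob_right(-a) - F(a)) < 1e-12    # P(Z >= -a) = F(a)
one_sd = prob_between(-1.0, 1.0)                      # area within 1 SD of the mean

print(symmetry_right, symmetry_left)
print(round(one_sd, 4))
```

A table that lists only F(a) for non-negative a is therefore enough: every other probability reduces to those entries.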
An example:
• A survey indicates that for each trip to the supermarket, a shopper spends an average of µ=45 minutes with a standard deviation of σ=12 minutes. The length of time spent in the store is normally distributed and is represented by the variable x. A shopper enters the store.
• (a) Find the probability that the shopper will be in the store for each interval of time listed below.
• (b) If 200 shoppers enter the store, how many shoppers would you expect to be in the store for each interval of time listed below?
1) Between 24 and 54 minutes
2) More than 39 minutes
Solution:
• The graph at the left shows a normal curve with µ=45 minutes and σ=12 minutes. The area for x between 24 and 54 minutes is shaded.
• a) The z-scores corresponding to x=24 and x=54 are: Z1 = (24-45)/12 = -1.75, Z2 = (54-45)/12 = 0.75
• So, the probability that a shopper will be in the store between 24 and 54 minutes is
P(-1.75 <= Z <= 0.75) = F(0.75) - F(-1.75) = F(0.75) - [1 - F(1.75)] = F(0.75) + F(1.75) - 1 = 0.7734 - 0.0401 = 0.7333 (from the standard normal table)
Solution Continued
• b) Another way of interpreting this probability is to say that 73.33% of shoppers will be in the store between 24 and 54 minutes after entering. So if 200 shoppers enter the store, we expect (200*0.7333) = 146.66, or about 147, shoppers to stay between 24 and 54 minutes.
• The graph below shows the normal curve with µ=45 minutes and σ=12 minutes; the area greater than 39 minutes is shaded.
• The z-score corresponding to 39 minutes is Z = (39-45)/12 = -0.5
• P(Z > -0.5) = 1 - P(Z <= -0.5) = 1 - 0.3085 = 0.6915
• If 200 shoppers enter the store, you would expect 200*(0.6915) = 138.3, or about 138, shoppers to stay in the store for more than 39 minutes.
Source:
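The supermarket example can also be worked without a table. This is a minimal sketch using `statistics.NormalDist`; the variable names are illustrative. Because it does not round z-scores or table entries, the results may differ from the slide's figures in the fourth decimal place.

```python
from statistics import NormalDist

time_in_store = NormalDist(mu=45, sigma=12)  # minutes spent in the store
shoppers = 200

# 1) Between 24 and 54 minutes: F(54) - F(24) on the original scale.
p_between = time_in_store.cdf(54) - time_in_store.cdf(24)

# 2) More than 39 minutes: 1 - F(39).
p_more = 1 - time_in_store.cdf(39)

print(round(p_between, 4), round(shoppers * p_between, 1))
print(round(p_more, 4), round(shoppers * p_more, 1))
```

Working on the original scale gives the same probabilities as standardizing first, since P(24 <= X <= 54) = P(-1.75 <= Z <= 0.75).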
Example: Given probability, finding the Z ordinate
• The amount of fuel consumed by the engines of a jetliner on a flight between two cities is a normally distributed random variable X with a mean of 5.7 tons and a standard deviation of 0.5 tons. Carrying too much fuel is inefficient as it slows the plane. If, however, too little fuel is loaded on the plane, an emergency landing may be necessary. The airline would like to determine the amount of fuel to load so that there will be a 0.99 probability that the plane will arrive at its destination.
• Solution: We first find the value of z such that P(Z <= z) = 0.99. From the standard normal table we see that the value of z corresponding to 0.99 is 2.33 (check slide 14, 25th row, 5th column). Transforming the z value to an x value, we get x = µ + σz = 5.7 + (0.5)*(2.33) = 6.865. Thus, the plane should be loaded with 6.865 tons of fuel to give a 0.99 probability that the fuel will last throughout the flight.
• Source: Complete Business Statistics by Aczel and Sounderpandian
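Going from a probability back to a value is an inverse cdf (quantile) lookup. The sketch below uses `NormalDist.inv_cdf` from the Python standard library; the variable names are illustrative. Note that the exact quantile is slightly below the table value of 2.33, so the computed fuel load differs from 6.865 in the third decimal place.

```python
from statistics import NormalDist

mu, sigma = 5.7, 0.5   # fuel consumption in tons, from the example
target = 0.99          # required probability that the fuel lasts

# Step 1: find z with P(Z <= z) = 0.99 on the standard normal scale.
z = NormalDist().inv_cdf(target)

# Step 2: transform back to the original scale, x = mu + sigma * z.
fuel = mu + sigma * z

# Equivalent one-step lookup directly on the distribution of X.
fuel_direct = NormalDist(mu, sigma).inv_cdf(target)

print(round(z, 2))     # about 2.33
print(round(fuel, 3))  # fuel load in tons
```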
Snapshot of Standard normal Table
Further Applications
• If the weekly wages of 20,00 workers in a factory follow a normal distribution with mean Rs 5,000 and standard deviation Rs 500, find the expected number of workers whose weekly wages are a) between Rs 4,000 and Rs 4,500, b) less than Rs 4,000, c) more than Rs 5,000.
• The marks obtained by a group of students for Mathematics are assumed to be normally distributed with mean 60 and standard deviation 8. If 5 students are taken at random from this set, what is the probability that exactly one of them will have marks above 70? (Hint: First find the probability that a student's marks are above 70 by using the normal distribution. Then, letting Y denote the number of students out of the 5 who have marks above 70, find the binomial probability that Y takes the value 1.)
• The normal distribution has many applications in business. For example, modern portfolio theory assumes that the return of a diversified asset portfolio follows a normal distribution.
• In HR management, the performance of employees is often assumed to be normally distributed.
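The two-step hint in the marks exercise (a normal probability, then a binomial probability) can be sketched as follows. This is one possible way to carry out the hint, assuming the 5 students are independent draws. The variable names are illustrative.

```python
import math
from statistics import NormalDist

marks = NormalDist(mu=60, sigma=8)
n_students = 5

# Step 1: probability that a single student's marks are above 70.
p_above = 1 - marks.cdf(70)

# Step 2: Y ~ Binomial(n = 5, p = p_above); probability that Y equals 1.
k = 1
p_exactly_one = (
    math.comb(n_students, k)
    * p_above**k
    * (1 - p_above) ** (n_students - k)
)

print(round(p_above, 4))
print(round(p_exactly_one, 3))
```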
Thank you