Measures of Dispersion, Skewness, and Kurtosis
•
Descriptive summary measure
•
Helps characterize data
•
Variation of observations
•
Determine degree of dispersion of observations about the center of the distribution
•
Absolute dispersion
•
Same unit as the observations
•
Relative dispersion
•
No unit
•
Measures of dispersion cannot be negative
•
Smallest possible value is zero
Absolute Dispersion
•
Simplest and easiest to use
•
Difference between the highest and the lowest observation
\[ \text{Range} = \text{highest observation} - \text{lowest observation} \]
Disadvantages
•
Description of data is not comprehensive
•
Affected by outliers
•
Smaller for small samples; larger for large samples
•
Cannot be computed when there is an open-ended class interval
Advantages
•
Simple
•
Easy to compute
•
Easy to understand
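As a quick illustration of the range and of its sensitivity to outliers, here is a minimal Python sketch. It uses the barangay survey data given later in this deck; the helper name `data_range` and the extra value 40 are made up for the illustration.

```python
# Number of underweight children per barangay (survey exercise in this deck)
underweight = [3, 5, 6, 4, 7, 8, 6, 9, 10, 4, 6, 7, 5, 8, 9, 8, 3, 4, 5, 5]


def data_range(values):
    """Return the highest observation minus the lowest observation."""
    return max(values) - min(values)


range_original = data_range(underweight)

# Hypothetical extreme value added to show how one outlier inflates the range
range_with_outlier = data_range(underweight + [40])

print(range_original, range_with_outlier)
```

Only two observations enter the computation, which is why the description of the data is not comprehensive.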
•
Describe variation of the measurements
•
Average squared difference of each observation from the mean
•
May also be used as a measure of how good the mean is as a measure of central tendency
•
Unit of the variance is the squared unit of the observations
•
People tend to use standard deviation for easier interpretation
Population Variance
•
Denoted by σ²
•
N elements
•
Parameter
•
Cannot be computed using sample data
Sample Variance
•
Denoted by s²
•
n elements
•
Statistic
•
Estimate value of the population variance
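The difference between the two variances can be sketched in Python. The data are the true-false exam scores from the exercises in this deck; treating the same 20 values first as a whole population and then as a sample is only for illustration, and the variable names are assumptions.

```python
import statistics

# Incorrect answers of 20 students (true-false exam exercise in this deck)
incorrect = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]

n = len(incorrect)
mean = sum(incorrect) / n

# Definitional form: average squared difference from the mean
sum_sq_dev = sum((x - mean) ** 2 for x in incorrect)
pop_var_def = sum_sq_dev / n           # divide by N (data treated as a population)
sample_var_def = sum_sq_dev / (n - 1)  # divide by n - 1 (data treated as a sample)

# Computational form for the sample variance:
# [n * sum(x^2) - (sum(x))^2] / [n * (n - 1)]
sum_x = sum(incorrect)
sum_x2 = sum(x * x for x in incorrect)
sample_var_comp = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# The standard library offers both versions
pop_var_lib = statistics.pvariance(incorrect)
sample_var_lib = statistics.variance(incorrect)

print(pop_var_def, sample_var_def, sample_var_comp)
```

The sample variance is slightly larger than the population variance computed on the same values because it divides by n − 1 instead of n.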
•
Utilizes every observation
•
Affected by outliers; extreme values make the standard deviation bloated
•
Cannot be computed when there are open-ended intervals
•
Addition or subtraction of a constant c to each observation would yield the same standard deviation as the original data set
•
Multiplication or division of each observation by a nonzero constant would result in a standard deviation multiplied or divided by the absolute value of that constant
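The last two properties can be checked numerically. This is a small sketch using the true-false exam data from this deck; the constant `c = 10` is an arbitrary positive value chosen for the illustration.

```python
import statistics

# Incorrect answers of 20 students (true-false exam exercise in this deck)
incorrect = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]

c = 10  # arbitrary positive constant, chosen only for this illustration

sd_original = statistics.stdev(incorrect)

# Adding a constant shifts every observation but not the spread
sd_shifted = statistics.stdev([x + c for x in incorrect])

# Multiplying by a constant stretches the spread by that constant
sd_scaled = statistics.stdev([x * c for x in incorrect])

print(sd_original, sd_shifted, sd_scaled)
```

Here `sd_shifted` agrees with `sd_original`, while `sd_scaled` is `c` times `sd_original`, up to floating-point rounding.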
Relative Dispersion
•
Compare variability of two or more data sets even if they have different means or different units of measurement
•
Ratio of the standard deviation to the mean, expressed as a percentage (denoted by CV)
•
Small CV means less variability; large CV means greater variability
•
Not to be used when mean is 0 or negative
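To see why a relative measure is useful, the following sketch compares the two raw data sets from the exercises in this deck. The function name `coefficient_of_variation` is an assumption, and both data sets are treated as samples.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV in percent: (standard deviation / mean) * 100."""
    mean = statistics.mean(values)
    if mean <= 0:
        # The CV is not to be used when the mean is 0 or negative
        raise ValueError("CV is not meaningful when the mean is 0 or negative")
    return statistics.stdev(values) / mean * 100


# Underweight children per barangay (survey exercise in this deck)
underweight = [3, 5, 6, 4, 7, 8, 6, 9, 10, 4, 6, 7, 5, 8, 9, 8, 3, 4, 5, 5]
# Incorrect answers of 20 students (true-false exam exercise in this deck)
incorrect = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]

cv_underweight = coefficient_of_variation(underweight)
cv_incorrect = coefficient_of_variation(incorrect)

print(round(cv_underweight, 2), round(cv_incorrect, 2))
```

The exam data have the smaller standard deviation but the larger CV, so relative to their mean they are the more variable of the two sets.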
•
A sample survey in a certain province showed the number of underweight children under five years of age in each barangay: 3, 5, 6, 4, 7, 8, 6, 9, 10, 4, 6, 7, 5, 8, 9, 8, 3, 4, 5, 5.
•
The number of incorrect answers on a true-false exam for a random sample of 20 students was recorded as follows: 2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, and 2.
•
Given the frequency distribution of scores of 200 students in an entrance exam in college:
Scores     Freq.   Cum. Freq.   LCB     UCB
59 – 62    2       2            58.5    62.5
63 – 66    12      14           62.5    66.5
67 – 70    24      38           66.5    70.5
71 – 74    46      84           70.5    74.5
75 – 78    62      146          74.5    78.5
79 – 82    36      182          78.5    82.5
83 – 86    16      198          82.5    86.5
87 – 90    2       200          86.5    90.5
LCB = lower class boundary; UCB = upper class boundary
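The cumulative frequencies and class boundaries in the table can be reproduced from the class limits and frequencies alone. The sketch below assumes integer scores, so each boundary sits half a unit beyond the class limit; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [
    (59, 62, 2), (63, 66, 12), (67, 70, 24), (71, 74, 46),
    (75, 78, 62), (79, 82, 36), (83, 86, 16), (87, 90, 2),
]

rows = []
cumulative = 0
for lower, upper, freq in classes:
    cumulative += freq  # running total of frequencies
    lcb = lower - 0.5   # lower class boundary (assumes integer scores)
    ucb = upper + 0.5   # upper class boundary (assumes integer scores)
    rows.append((lower, upper, freq, cumulative, lcb, ucb))

for row in rows:
    print(row)
```

The last cumulative frequency equals the total number of students, which is a simple check that no class was dropped.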
•
Relying solely on the mean and standard deviation may be misleading
•
Possible for two data sets to have the same mean and standard deviation, yet different shapes
•
If it is possible to divide the histogram at the center into two identical halves, where each half is a mirror image of the other, then the distribution is symmetric. Otherwise, it is skewed.
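A small made-up example shows how two data sets can share a mean and standard deviation and still differ in shape. The data set `a` is hypothetical; `b` is its mirror image about the mean. The sign of the average cubed deviation (the third central moment) is used here only as a rough indicator of the direction of the longer tail.

```python
import statistics

# Hypothetical data: most values are small, one value is far to the right
a = [0, 0, 1, 1, 8]
mean_a = statistics.mean(a)

# Mirror image of a about its mean: same distances from the mean, opposite sides
b = [2 * mean_a - x for x in a]


def third_central_moment(values):
    """Average cubed deviation from the mean; its sign shows the longer tail."""
    m = statistics.mean(values)
    return sum((x - m) ** 3 for x in values) / len(values)


print(statistics.mean(a), statistics.mean(b))
print(statistics.pstdev(a), statistics.pstdev(b))
print(third_central_moment(a), third_central_moment(b))
```

Both sets have the same mean and standard deviation, but `a` has its long tail on the right and `b` has it on the left.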
Positively Skewed
•
Skewed to the right
•
Values concentrated at the left
•
Upper tail stretches out more than the lower tail
Negatively Skewed
•
Skewed to the left
•
Values concentrated at the right
•
Lower tail stretches out more than the upper tail
•
Single value that indicates the degree and direction of asymmetry
•
Denoted by Sk
Sk = 0: Symmetric
Sk > 0: Positively skewed
Sk < 0: Negatively skewed
•
To determine the degree of skewness, use |Sk| (the magnitude of Sk)
•
If |Sk| is far from 0, the distribution is seriously skewed
•
Most commonly used measures
•
Pearson’s first and second coefficients of skewness
•
Coefficient of skewness based on the third moment
•
Coefficient of skewness based on the quartiles
•
Relationships among the mean, median, and mode as basis
•
Signs of the measures depend only on the sign of the numerator because S is never negative
•
Problems with Pearson’s first coefficient of skewness are associated with the problems of using the mode
•
Based on the definition of quartiles
•
Around 25 percent of the observations fall between Q1 and the median
•
Around 25 percent of the observations fall between the median and Q3
•
Symmetric distribution: distance between Q1 and Md = distance between Md and Q3
•
Skewed distribution
•
Positively skewed: Md is closer to Q1
•
Negatively skewed: Md is closer to Q3
•
Term coined by Karl Pearson
•
Greek word kurtos which means convex
•
Shape of a hump of a relative frequency distribution compared to the normal distribution
•
Three classifications
•
Mesokurtic
•
Leptokurtic
•
Platykurtic
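The deck does not state a formula for kurtosis here, so the sketch below assumes one common moment-based measure: the fourth central moment divided by the squared second central moment, which equals 3 for the normal distribution. The tolerance `tol` and both data sets are hypothetical choices made for the illustration.

```python
def moment_kurtosis(values):
    """Fourth central moment over squared second central moment (assumed measure)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2  # undefined if every value is identical (m2 = 0)


def classify_kurtosis(values, tol=0.1):
    """Compare against 3, the value of this measure for the normal distribution."""
    k = moment_kurtosis(values)
    if abs(k - 3) <= tol:  # tol is an arbitrary cutoff for "close to normal"
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


flat = [1, 2, 3, 4, 5]                 # hypothetical: evenly spread, flat hump
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]  # hypothetical: sharp hump, few far values

print(moment_kurtosis(flat), classify_kurtosis(flat))
print(moment_kurtosis(peaked), classify_kurtosis(peaked))
```

With real data the measure rarely equals 3 exactly, which is why a tolerance is used for the mesokurtic case.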
•
Graph
•
Displays the following
•
Location
•
Spread
•
Symmetry
•
Extremes
•
Outliers
1. Construct a rectangle with one end at the first quartile and the other end at the third quartile.
2. Put a vertical line at the median, across the interior of the rectangle.
3. Compute the interquartile range, lower fence, and upper fence.
4. Locate the smallest and largest values within the intervals [FL, Q1] and [Q3, FU], respectively, where FL and FU are the lower and upper fences. Draw a line from these values to the quartiles.
5. Values falling outside the fences are considered outliers, denoted by “x”.
•
Construct the boxplot for the following data set:
1    15   21   22   24
10   18   22   23   25
14   20   22   24   28
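The quantities needed for this boxplot can be sketched in Python under two assumptions that the deck does not state: the fences are placed 1.5 interquartile ranges beyond the quartiles (the usual convention), and the quartiles follow the default "exclusive" method of `statistics.quantiles`. Other quartile conventions can give slightly different values.

```python
import statistics

data = sorted([1, 15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28])

# Steps 1-2: quartiles and median (default 'exclusive' method, an assumption)
q1, median, q3 = statistics.quantiles(data, n=4)

# Step 3: interquartile range and fences (1.5 * IQR rule, an assumption)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Step 4: whisker ends are the extreme values still inside the fences
whisker_low = min(x for x in data if lower_fence <= x <= q1)
whisker_high = max(x for x in data if q3 <= x <= upper_fence)

# Step 5: anything outside the fences is an outlier
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr)
print(lower_fence, upper_fence, whisker_low, whisker_high, outliers)
```

Under these assumptions the box runs from 15 to 24 with the median at 22, the whiskers end at 10 and 28, and the value 1 is flagged as an outlier.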
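Definition
•
Population Variance
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
•
Sample Variance
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Computational Formula
•
Population Variance
\[ \sigma^2 = \frac{N \sum_{i=1}^{N} X_i^2 - \left( \sum_{i=1}^{N} X_i \right)^2}{N^2} \]
•
Sample Variance
\[ s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)} \]
The standard deviation is the positive square root of the variance: \( \sigma = \sqrt{\sigma^2} \) for a population and \( s = \sqrt{s^2} \) for a sample.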
Population CV
\[ CV = \frac{\sigma}{\mu} \times 100\% \]
Where
•
σ is the population standard deviation
•
μ is the population mean
Sample CV
\[ CV = \frac{s}{\bar{x}} \times 100\% \]
Where
•
s is the sample standard deviation
•
\( \bar{x} \) is the sample mean
First Coefficient of Skewness
\[ Sk_1 = \frac{\bar{x} - Mo}{S} \]
Second Coefficient of Skewness
\[ Sk_2 = \frac{3(\bar{x} - Md)}{S} \]
Where \( \bar{x} \) = sample mean; Md = sample median; Mo = sample mode; S = sample standard deviation
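Pearson's two coefficients can be computed with a short Python sketch on the true-false exam data from this deck. The function names are assumptions; `statistics.mode` returns the first mode it meets when several values tie, which is one of the practical problems of relying on the mode.

```python
import statistics


def pearson_first(values):
    """(mean - mode) / sample standard deviation."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.stdev(values)


def pearson_second(values):
    """3 * (mean - median) / sample standard deviation."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)


# Incorrect answers of 20 students (true-false exam exercise in this deck)
incorrect = [2, 1, 3, 2, 3, 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2]

sk1 = pearson_first(incorrect)
sk2 = pearson_second(incorrect)

print(round(sk1, 4), round(sk2, 4))
```

For these data the mean is 2.35, the median is 2, and the mode is 3, so the first coefficient comes out negative while the second comes out positive. The two measures can disagree, which illustrates why the first coefficient inherits the problems of the mode.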
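Coefficient of Skewness Based on the Quartiles
\[ Sk_4 = \frac{(Q_3 - Md) - (Md - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Md}{Q_3 - Q_1} \]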
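A quartile-based coefficient of skewness compares the distance from the median to each quartile, relative to the distance between the quartiles. The sketch below applies the form (Q3 + Q1 − 2Md) / (Q3 − Q1) to the boxplot data set from this deck, assuming the default "exclusive" quartile method of `statistics.quantiles`; the function name is illustrative.

```python
import statistics


def quartile_skewness(values):
    """(Q3 + Q1 - 2*Md) / (Q3 - Q1), using the default 'exclusive' quartiles."""
    q1, md, q3 = statistics.quantiles(values, n=4)
    return (q3 + q1 - 2 * md) / (q3 - q1)


# Data set from the boxplot exercise in this deck
data = [1, 15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28]

sk_quartile = quartile_skewness(data)
print(round(sk_quartile, 4))
```

With Q1 = 15, Md = 22, and Q3 = 24, the median is closer to Q3, and the coefficient is negative, which indicates a negatively skewed data set.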