Comparing the mean and the median
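1. If the median is less than the mean, the data set is skewed to the right.
[Figure: Rightward skewness. Relative frequency plotted against measurement units; the median lies to the left of the mean.]
2. The median will equal the mean when the data set is symmetric.
[Figure: Symmetry. The median and the mean coincide (horizontal axis: measurement units).]
3. If the median is greater than the mean, the data set is skewed to the left.
[Figure: Leftward skewness. The mean lies to the left of the median.]

The skewness of a data set can be measured by

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{standard deviation}} \approx \frac{3(\text{Mean} - \text{Median})}{\text{standard deviation}}$$

A positive value indicates skewness to the right, a negative value skewness to the left.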
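A minimal Python sketch of the mean/median comparison, applied to the earnings-per-share figures of Table 1.3. It uses the standard-library `statistics` module and the second form of the skewness measure, 3(Mean − Median)/standard deviation. Using the sample standard deviation (divisor n − 1) is an assumption; the function name `describe_skew` is illustrative only.

```python
import statistics

# Earnings per share (E/S) for the thirty companies in Table 1.3
es = [1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
      2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
      2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12]

def describe_skew(data):
    """Hypothetical helper: return (mean, median, 3(mean - median)/s)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)  # assumption: sample standard deviation (n - 1)
    return mean, median, 3 * (mean - median) / s

mean, median, skewness = describe_skew(es)
if median < mean:
    shape = "skewed to the right"
elif median > mean:
    shape = "skewed to the left"
else:
    shape = "symmetric"
print(f"mean={mean:.3f} median={median:.3f} skewness={skewness:.3f} ({shape})")
```

For these data the median (3.50) is below the mean (about 4.08), so the measure is positive and the data are skewed to the right, which matches the shape of the histogram built later in these notes.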
Measures of Variation
The range: A measure of variability
Definition: The range of a data set is equal to the largest measurement minus the smallest measurement. When dealing with grouped data, either of two procedures may be adopted for determining the range:
1. Range = class mark of highest class − class mark of lowest class.
2. Range = upper class boundary of highest class − lower class boundary of lowest class.
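A short Python sketch of the three range calculations. The class boundaries are those of Table 1.4 (earnings-per-share example); the ungrouped list is a few values from Table 1.3, including its smallest and largest. All function names are hypothetical, chosen for illustration.

```python
def range_ungrouped(data):
    """Largest measurement minus smallest measurement."""
    return max(data) - min(data)

def range_by_class_marks(boundaries):
    """Class mark of highest class minus class mark of lowest class.

    `boundaries` lists class boundaries in increasing order, so class k
    runs from boundaries[k] to boundaries[k + 1]; a class mark is the midpoint.
    """
    lowest_mark = (boundaries[0] + boundaries[1]) / 2
    highest_mark = (boundaries[-2] + boundaries[-1]) / 2
    return highest_mark - lowest_mark

def range_by_boundaries(boundaries):
    """Upper boundary of highest class minus lower boundary of lowest class."""
    return boundaries[-1] - boundaries[0]

boundaries = [0.715, 2.115, 3.515, 4.915, 6.315, 7.715, 9.115]
print(round(range_ungrouped([1.85, 9.11, 0.72, 3.42]), 3))  # 8.39
print(round(range_by_class_marks(boundaries), 3))           # 7.0
print(round(range_by_boundaries(boundaries), 3))            # 8.4
```

The class-mark procedure gives a smaller value than the boundary procedure because it measures between class midpoints rather than the outer edges of the grouped data.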
Variance and Standard Deviation
The sample variance for a sample of n measurements is equal to the sum of the squared distances from the mean divided by (n − 1). In symbols, using $S^2$ to represent the sample variance,
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
The second step in finding a meaningful measure of data variability is to calculate the standard deviation of the data set, which is the positive square root of the variance: $S = \sqrt{S^2}$.
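A minimal Python sketch of the sample variance formula and the standard deviation, on a small hypothetical data set. The helper names are illustrative; the standard-library `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and serve as a cross-check.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_std_dev(data):
    """Standard deviation: positive square root of the sample variance."""
    return math.sqrt(sample_variance(data))

data = [1, 2, 3, 4, 5]  # hypothetical measurements, mean = 3
print(sample_variance(data))           # 2.5  (squared deviations sum to 10, divided by 4)
print(round(sample_std_dev(data), 4))  # 1.5811
# Cross-check with the standard library
print(statistics.variance(data), round(statistics.stdev(data), 4))
```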
The second method of describing qualitative data sets is the pie chart. This is often used in newspaper and magazine articles to depict budgets and other economic information. A complete circle (the pie) represents the total number of measurements. This is partitioned into a number of slices, with one slice for each category. For example, since a complete circle spans 360°, if the relative frequency for a category is .30, the slice assigned to that category is 30% of 360°, or (.30)(360°) = 108°.
Figure 1.2 The portion of a pie chart corresponding to a relative frequency of .3 (a 108° slice).
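The slice-angle rule (relative frequency × 360°) can be sketched in Python as follows. The category names and counts are hypothetical, chosen so that one category has relative frequency .30.

```python
def pie_slice_degrees(counts):
    """Map each category to its slice angle: (count / total) x 360 degrees."""
    total = sum(counts.values())
    return {cat: count * 360 / total for cat, count in counts.items()}

# Hypothetical category counts (not from the notes)
counts = {"Category A": 30, "Category B": 50, "Category C": 20}
total = sum(counts.values())
for cat, deg in pie_slice_degrees(counts).items():
    print(f"{cat}: relative frequency {counts[cat] / total:.2f}, slice {deg:.0f} degrees")
# Category A: relative frequency 0.30, slice 108 degrees
# Category B: relative frequency 0.50, slice 180 degrees
# Category C: relative frequency 0.20, slice 72 degrees
```

Because every category gets a share of the same 360°, the slice angles always add up to a full circle.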
Graphical Methods for Describing Quantitative Data. The Frequency Histogram and Polygon.
The histogram (often called a frequency distribution) is the most popular graphical technique for depicting quantitative data. To introduce the histogram, we will use thirty companies selected randomly from the 1980 Financial Magazine (the top 500 companies in sales for calendar year 1979). The variable X we will be interested in is the earnings per share (E/S) for these thirty companies. The earnings per share is computed by dividing the year's net profit by the total number of shares of common stock outstanding. This figure is of interest to the economic community because it reflects the economic health of the company. The earnings per share figures for the thirty companies are shown (to the nearest ngwee) in Table 1.3.
Company  E/S     Company  E/S     Company  E/S
1        1.85    11       2.80    21       2.75
2        3.42    12       3.46    22       6.58
3        9.11    13       8.32    23       3.54
4        1.96    14       4.62    24       4.65
5        6.48    15       3.27    25       0.75
6        5.72    16       1.35    26       2.01
7        1.72    17       3.28    27       5.36
8        8.56    18       3.75    28       4.40
9        0.72    19       5.23    29       6.49
10       6.28    20       2.92    30       1.12
Table 1.3
How to construct a Histogram
1. Arrange the data in increasing order, from smallest to largest measurement.
2. Divide the interval from the smallest to the largest measurement into between five and twenty equal sub-intervals, making sure that:
   a) Each measurement falls into one and only one measurement class.
   b) No measurement falls on a measurement class boundary.
   Use a small number of measurement classes if you have a small amount of data; use a larger number of classes for a large amount of data.
3. Compute the frequency (or relative frequency) of measurements in each measurement class.
4. Using a vertical axis about three-fourths the length of the horizontal axis, plot each frequency (or relative frequency) as a rectangle over the corresponding measurement class.

Because the number of measurements, n = 30, is not large, we will use six classes to span the distance between the smallest measurement, 0.72, and the largest measurement, 9.11. This distance divided by 6 is

$$\frac{\text{Largest measurement} - \text{Smallest measurement}}{\text{Number of intervals}} = \frac{9.11 - 0.72}{6} = \frac{8.39}{6} \approx 1.4$$
By locating the lower boundary of the first class interval at 0.715 (slightly below the smallest measurement) and adding 1.4, we find the upper boundary to be 2.115. Adding 1.4 again, we find the upper boundary of the second class to be 3.515. Continuing this process, we obtain the six class intervals shown in the table below. Note that each boundary falls on a 0.005 value (one decimal place more than the measurements), which guarantees that no measurement will fall on a class boundary. The next step is to find the class frequencies and calculate the class relative frequencies.
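A Python sketch of the class-width and boundary computation for the E/S data. Rounding the raw width *up* to one decimal place is an assumption (the notes simply round 1.398 to 1.4); rounding up guarantees that the k classes cover the whole data range. The helper names are hypothetical.

```python
import math

def class_width(smallest, largest, k, decimals=1):
    """Raw width (largest - smallest) / k, rounded up to `decimals` places."""
    raw = (largest - smallest) / k
    factor = 10 ** decimals
    return math.ceil(raw * factor) / factor

def class_boundaries(start, width, k):
    """The k + 1 boundaries start, start + width, ..., rounded to 3 places."""
    return [round(start + i * width, 3) for i in range(k + 1)]

smallest, largest, k = 0.72, 9.11, 6
width = class_width(smallest, largest, k)  # 8.39 / 6 = 1.398... -> 1.4
start = smallest - 0.005                   # half a unit below the smallest value
print(width)                               # 1.4
print(class_boundaries(start, width, k))
# [0.715, 2.115, 3.515, 4.915, 6.315, 7.715, 9.115]
```

Starting half a measurement unit (0.005) below the smallest value is what places every boundary on a 0.005 value.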
Class  Measurement class  Class frequency  Class relative frequency
1      0.715 – 2.115      8                8/30 = .267
2      2.115 – 3.515      7                7/30 = .233
3      3.515 – 4.915      5                5/30 = .167
4      4.915 – 6.315      4                4/30 = .133
5      6.315 – 7.715      3                3/30 = .100
6      7.715 – 9.115      3                3/30 = .100
Total                     30               1.00
Table 1.4
Definition
The class frequency for a given class, say class i, is equal to the total number of measurements that fall in that class. The class frequency for class i is denoted by the symbol $f_i$.

Definition
The class relative frequency for a given class, say class i, is equal to the class frequency divided by the total number n of measurements, i.e.

$$\text{Relative frequency for class } i = \frac{f_i}{n}$$
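The two definitions can be checked against Table 1.4 with a short Python sketch. It uses the standard-library `bisect` module to find the class containing each measurement; the helper name `class_frequencies` is hypothetical.

```python
from bisect import bisect_left

# E/S data from Table 1.3 and class boundaries from Table 1.4
es = [1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
      2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
      2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12]
boundaries = [0.715, 2.115, 3.515, 4.915, 6.315, 7.715, 9.115]

def class_frequencies(data, boundaries):
    """f_i: number of measurements between boundaries[i] and boundaries[i + 1]."""
    freqs = [0] * (len(boundaries) - 1)
    for x in data:
        i = bisect_left(boundaries, x) - 1  # index of the class containing x
        if not 0 <= i < len(freqs):
            raise ValueError(f"{x} lies outside the class boundaries")
        freqs[i] += 1
    return freqs

n = len(es)
freqs = class_frequencies(es, boundaries)
for i, f in enumerate(freqs, start=1):
    lo, hi = boundaries[i - 1], boundaries[i]
    print(f"class {i}: {lo:.3f} - {hi:.3f}  f = {f}  f/n = {f / n:.3f}")
print("total:", sum(freqs))  # total: 30
```

Because no measurement lies exactly on a boundary, it does not matter whether `bisect_left` or `bisect_right` is used here.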
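[Figure (a): Frequency histogram of earnings per share. Vertical axis: frequency (0 to 10); horizontal axis: earnings per share, class boundaries 0.715, 2.115, 3.515, 4.915, 6.315, 7.715, 9.115.]
[Figure (b): Relative frequency histogram of earnings per share. Vertical axis: relative frequency (0 to .3); horizontal axis: the same class boundaries.]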
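Without plotting software, the frequency histogram can be roughed out as text. This Python sketch is rotated so that bars run horizontally (one `#` per measurement) and uses the frequencies of Table 1.4; it is an illustration only, and a plotting library such as matplotlib would normally be used to draw the actual figure. The helper name is hypothetical.

```python
# Class boundaries and frequencies from Table 1.4
boundaries = [0.715, 2.115, 3.515, 4.915, 6.315, 7.715, 9.115]
freqs = [8, 7, 5, 4, 3, 3]

def text_histogram(boundaries, freqs, symbol="#"):
    """One line per class: the interval, then a bar whose length equals f_i."""
    lines = []
    for lo, hi, f in zip(boundaries, boundaries[1:], freqs):
        lines.append(f"{lo:.3f}-{hi:.3f} | {symbol * f} ({f})")
    return lines

print("\n".join(text_histogram(boundaries, freqs)))
# 0.715-2.115 | ######## (8)
# 2.115-3.515 | ####### (7)
# 3.515-4.915 | ##### (5)
# 4.915-6.315 | #### (4)
# 6.315-7.715 | ### (3)
# 7.715-9.115 | ### (3)
```

The long bars at the low end and the tail toward high values show the rightward skewness of the earnings-per-share data.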
Cumulative Frequency Distribution