Topic
2
PRESENTATION OF DATA
INTRODUCTION: Before applying any statistical technique to raw data, we must arrange and classify the data in a systematic form so that the statistical work becomes simple and easy. This is called presentation of data. The following four methods are usually used for the presentation of data. (i) Classification (ii) Tabulation (iii) Diagrammatical (iv) Graphical
CLASSIFICATION: The process of arranging data into classes or categories according to some common characteristics present in the data is called classification.
The Basis of Classification: There are four important bases for classification of data. (i) Qualitative base (ii) Quantitative base (iii) Geographical base (iv) Chronological base
(i)
Qualitative Base:
The classification is called Qualitative when the data are classified by qualities or attributes such as gender, marital status, employment status, religion, beauty etc.
(ii)
Quantitative Base:
The classification is called Quantitative when the data are classified by quantitative characteristics such as heights, age, weight, distance, length, income etc.
(iii)
Geographical Base:
The classification is called Geographical when the data are classified by geographical regions or locations. For example, the population of a country may be classified by provinces, divisions, districts, tehsils or towns etc.
(iv)
Chronological Base:
The classification is called Chronological when the data are arranged by successive time periods. For example, the monthly sale of a departmental store, yearly enrollment of students in M.A.O. College, hourly temperature recorded by weather bureau etc.
Types of Classification: Some important types of classification are: (i) One way classification. (ii) Two way classification. (iii) Three way classification. (iv) Many way classification.
(i)
One Way Classification:
When the data are classified by one characteristic, then the classification is said to be one way. For example, the population of a country may be classified by religion as Muslims, Christians and Sikhs.
(ii)
Two Way Classification:
When the data are classified by two characteristics simultaneously (at a time), then the classification is said to be two way. For example, the students of Punjab University, Lahore may be classified by Age and Height.
(iii)
Three Way Classification:
When the data are classified by three characteristics simultaneously, then the classification is said to be three way. For example, the population of the city of Lahore may be classified by Religion, Sex and Literacy rate.
(iv)
Many Way Classification:
When the data are classified by many characteristics simultaneously, then the classification is said to be many way. For example, the population of the city of Lahore may be classified by Religion, Sex, Age, Height, Literacy rate etc.
TABULATION: The process of systematically arranging data into rows and columns is called tabulation. Classification is the first step of tabulation. Tabulation may be single, double or manifold, depending on the type of classification.
Main Parts of Table and its Construction: A statistical table has at least four major parts: (i) The title (ii) The box head (iii) The stub (iv) The body of the table. In addition, some tables have other minor parts: (v) Prefatory Note or Head Note (vi) Foot Note (vii) Source Note
[Figure: layout of a statistical table. The TITLE runs across the top, followed by the Prefatory Notes; the BOX HEAD holds the Column Captions; the STUB on the left holds the Row Captions; the BODY holds the entries; the Foot note and Source note appear at the bottom.]
Example 2.1 Represent the data given in the following paragraph in the form of a table, so as to bring out clearly all the facts, including the source and bearing a suitable title: “According to the census of Manufactures Report 1945, the John Smith Manufacturing Company employed 400 non-union and 1250 union employees in 1941. Of these 220 were females of which 140 were non-union. In 1942, the number of union employees increased to 1475 of which 1300 were males. Of the 250 non-union employees 200 were males. In 1943, 1700 employees were union members and 50 were non-union. Of all the employees in 1943, 250 were females of which 240 were union members. In 1944, the total number of employees was 2000 of which one percent was non-union. Of all the employees in 1944, 300 were females of which only 5 were non-union.”
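Solution:
DISTRIBUTION OF EMPLOYEES OF THE JOHN SMITH MANUFACTURING COMPANY BY SEX AND MEMBERSHIP DURING 1941 TO 1944

| Year | All: Total | All: Male | All: Female | Union: Total | Union: Male | Union: Female | Non-union: Total | Non-union: Male | Non-union: Female |
|------|-----------|-----------|-------------|--------------|-------------|---------------|------------------|-----------------|-------------------|
| 1941 | 1650 | 1430 | 220 | 1250 | 1170 | 80 | 400 | 260 | 140 |
| 1942 | 1725 | 1500 | 225 | 1475 | 1300 | 175 | 250 | 200 | 50 |
| 1943 | 1750 | 1500 | 250 | 1700 | 1460 | 240 | 50 | 40 | 10 |
| 1944 | 2000 | 1700 | 300 | 1980 | 1685 | 295 | 20 | 15 | 5 |

In this table the first line is the Title, the column captions (All, Union and Non-union, each split into Total, Male and Female) form the Box Head, the Year column is the Stub, and the figures form the Body.
Source note: Census of Manufacturers Report, 1945.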
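The totals in the table of Example 2.1 follow from four base cells per year by simple addition. The sketch below is only an illustration of that arithmetic; the function name `margins` and the dictionary layout are assumptions made for this example, and the base cells are the values stated in, or obtained by subtraction from, the paragraph.

```python
# Base cells per year: (union male, union female, non-union male, non-union female).
# Values are read from the paragraph of Example 2.1 or obtained by subtraction.
cells = {
    1941: (1170, 80, 260, 140),
    1942: (1300, 175, 200, 50),
    1943: (1460, 240, 40, 10),
    1944: (1685, 295, 15, 5),
}

def margins(um, uf, nm, nf):
    """Return the row totals of the three-way table for one year."""
    return {
        "union_total": um + uf,
        "non_union_total": nm + nf,
        "all_male": um + nm,
        "all_female": uf + nf,
        "all_total": um + uf + nm + nf,
    }

rows = {year: margins(*base) for year, base in cells.items()}

for year, row in rows.items():
    print(year, row)
```

Comparing the computed totals with the figures quoted in the paragraph (for example 2000 employees in 1944, of which 20 are non-union) is a quick check that the table has been filled in consistently.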
FREQUENCY DISTRIBUTION: A tabular arrangement of data into classes with the corresponding class frequencies is called a frequency distribution. Data that have been classified into categories or groups are called Grouped data, while data that have not been arranged in a systematic order are called Raw data or Ungrouped data.
DISCRETE FREQUENCY DISTRIBUTION: When the observations are discrete (counted) values, the same value is often repeated. If each distinct value is written once, in ascending order, together with the number of times it occurs, the resulting table is known as a discrete frequency distribution. The observed value is denoted by “X” and the number of times it occurs, i.e. its frequency, is denoted by “f”.
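As a minimal sketch of this tally, the Python below counts repeated observations and lists each distinct value X once with its frequency f. The raw data are hypothetical (they do not come from the text), and the function name is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical raw data: number of children in 12 households (not from the text).
raw = [2, 0, 1, 2, 3, 1, 2, 0, 2, 1, 3, 2]

def discrete_frequency_distribution(observations):
    """Return (X, f) pairs, one per distinct value, in ascending order of X."""
    counts = Counter(observations)
    return sorted(counts.items())

table = discrete_frequency_distribution(raw)

print("X  f")
for x, f in table:
    print(x, "", f)
```

The frequencies always add up to the number of raw observations, which is a useful check on the tally.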
RELATIVE FREQUENCY DISTRIBUTION: The relative frequency of a class is obtained by dividing the frequency of that class by the sum of the frequencies of all the classes. It is generally expressed as a percentage, so the sum of the relative frequencies of all the classes is equal to 1 (or 100 percent). If all the frequencies in a frequency distribution are changed into relative frequencies, the resulting distribution is called a “Relative frequency distribution” or “Percentage distribution”.
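The division described above can be sketched in a few lines of Python. The frequencies are those of the age distribution used in the examples of this topic; the function name `relative_frequencies` is an assumption made for illustration.

```python
classes = ["20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54"]
freqs = [1, 2, 26, 22, 20, 15, 14]

def relative_frequencies(frequencies):
    """Divide each class frequency by the sum of all class frequencies."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

rel = relative_frequencies(freqs)
pct = [100 * r for r in rel]  # the same values expressed as percentages

for label, r, p in zip(classes, rel, pct):
    print(f"{label}  {r:.2f}  {p:.0f}%")
```

Up to floating-point rounding, the relative frequencies sum to 1 and the percentages to 100.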
CUMULATIVE FREQUENCY DISTRIBUTION: The total frequency of all values less than the upper class boundary of a given class is called the cumulative frequency up to and including that class. A cumulative frequency distribution is obtained by adding the frequency of each class to the frequencies of the preceding classes; the cumulative frequency is denoted by “c.f” or “F”.
Types of Cumulative frequency distribution: There are two types of Cumulative frequency distribution; (i) Less than Cumulative frequency distribution (ii) More than Cumulative frequency distribution
(i)
Less than Cumulative frequency distribution:
When the frequencies are cumulated from the lowest value to the highest value, it is referred to as a “less than” type cumulative frequency distribution. It should be noted that a “less than” type cumulative frequency distribution starts with the lower class boundary of the first group, indicating that there is no frequency below it.
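Example Construct a less than type cumulative frequency distribution from the following data.

| Age | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 |
|-----|-------|-------|-------|-------|-------|-------|-------|
| f | 1 | 2 | 26 | 22 | 20 | 15 | 14 |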
Solution:
LESS THAN TYPE CUMULATIVE FREQUENCY DISTRIBUTION

| Age C – I | f | C – B | Less than Class Boundaries | c.f |
|-----------|---|-------|----------------------------|-----|
| - | - | - | Less than 19.5 | 0 |
| 20 – 24 | 1 | 19.5 – 24.5 | Less than 24.5 | 1 |
| 25 – 29 | 2 | 24.5 – 29.5 | Less than 29.5 | 3 |
| 30 – 34 | 26 | 29.5 – 34.5 | Less than 34.5 | 29 |
| 35 – 39 | 22 | 34.5 – 39.5 | Less than 39.5 | 51 |
| 40 – 44 | 20 | 39.5 – 44.5 | Less than 44.5 | 71 |
| 45 – 49 | 15 | 44.5 – 49.5 | Less than 49.5 | 86 |
| 50 – 54 | 14 | 49.5 – 54.5 | Less than 54.5 | 100 |
| TOTAL | 100 | - | - | - |

(ii)
More than Cumulative frequency distribution:
When the frequencies are cumulated from the highest value to the lowest value, it is referred to as a “more than” type cumulative frequency distribution.
Example Construct a more than type cumulative frequency distribution from the following data.

| Age | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 |
|-----|-------|-------|-------|-------|-------|-------|-------|
| f | 1 | 2 | 26 | 22 | 20 | 15 | 14 |
Solution:
MORE THAN TYPE CUMULATIVE FREQUENCY DISTRIBUTION

| Age C – I | f | C – B | More than Class Boundaries | c.f |
|-----------|---|-------|----------------------------|-----|
| - | - | - | More than 19.5 | 100 |
| 20 – 24 | 1 | 19.5 – 24.5 | More than 24.5 | 100-1=99 |
| 25 – 29 | 2 | 24.5 – 29.5 | More than 29.5 | 99-2=97 |
| 30 – 34 | 26 | 29.5 – 34.5 | More than 34.5 | 97-26=71 |
| 35 – 39 | 22 | 34.5 – 39.5 | More than 39.5 | 71-22=49 |
| 40 – 44 | 20 | 39.5 – 44.5 | More than 44.5 | 49-20=29 |
| 45 – 49 | 15 | 44.5 – 49.5 | More than 49.5 | 29-15=14 |
| 50 – 54 | 14 | 49.5 – 54.5 | More than 54.5 | 14-14=0 |
| TOTAL | 100 | - | - | - |
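Both cumulations in the two examples above can be sketched with running totals. This is a minimal illustration using the age data from the examples; the function names are assumptions made here, not notation from the text.

```python
from itertools import accumulate

freqs = [1, 2, 26, 22, 20, 15, 14]
# Class boundaries 19.5, 24.5, ..., 54.5 (one more boundary than classes).
boundaries = [19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]

def less_than_cf(frequencies):
    """Cumulate from the lowest class: 0 below the first boundary, then running totals."""
    return [0] + list(accumulate(frequencies))

def more_than_cf(frequencies):
    """Cumulate from the highest class: total minus everything below each boundary."""
    total = sum(frequencies)
    return [total - c for c in less_than_cf(frequencies)]

less = less_than_cf(freqs)
more = more_than_cf(freqs)

for b, lt, mt in zip(boundaries, less, more):
    print(f"{b:5}  less than: {lt:3}  more than: {mt:3}")
```

At every boundary the two cumulative frequencies add up to the total frequency, which is why the “more than” column can be obtained by successive subtraction as in the table.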
DIAGRAMS OR CHARTS: A diagram is any one, two or three-dimensional form of graphical representation. The commonly used diagrams or charts are: (i) Simple Bar Chart (ii) Multiple Bar Chart (iii) Component Bar Chart or Sub-divided Bar Chart (iv) Percentage Component Bar Chart (v) Rectangular Bar Chart (vi) Pie chart
Simple Bar Chart or Diagram: A Simple Bar Chart is used to represent data having a single variable. Vertical or horizontal bars are drawn to represent the data, usually when the difference between the quantities is small. The width of the bars is always uniform and has no significance. The length of each bar is proportional to the size of the quantity it represents. The space between the bars should not be more than the width of a bar and should not be less than half of its width. Vertical bars are used to represent time series or quantitative data, while horizontal bars are used to represent qualitative or geographical data. Data that do not relate to time should be arranged in ascending or descending order before the chart is drawn.
Example 2.10 The following table shows the production of wheat in Pakistan during the years 2001 to 2006. Represent the data by a Simple Bar Chart.

| Years | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 |
|-------|------|------|------|------|------|------|
| Production (Lakh tons) | 64 | 68 | 73 | 75 | 71 | 81 |
Solution: SIMPLE BAR CHART SHOWING PRODUCTION OF WHEAT IN PAKISTAN FOR THE YEARS 2001 TO 2006
[Figure: simple bar chart with Years (2001 to 2006) on the X-axis and Production (scale 0 to 100) on the Y-axis]
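The rule that bar length is proportional to the quantity can be sketched as a plain-text chart. This is only an illustration: the bars are drawn horizontally because that is easier in text output, whereas the text recommends vertical bars for time series, and the scale of one mark per 4 lakh tons is an arbitrary assumption.

```python
years = [2001, 2002, 2003, 2004, 2005, 2006]
production = [64, 68, 73, 75, 71, 81]  # lakh tons, from Example 2.10

def bar_lengths(values, units_per_mark=4):
    """Length of each bar in marks, proportional to the value (rounded)."""
    return [round(v / units_per_mark) for v in values]

def render(labels, values, units_per_mark=4):
    """Build one text line per bar: label, bar of uniform width, value."""
    lines = []
    for label, value, n in zip(labels, values, bar_lengths(values, units_per_mark)):
        lines.append(f"{label} | {'#' * n} {value}")
    return "\n".join(lines)

chart = render(years, production)
print(chart)
```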
Pie Chart: A Pie Chart has the same function as a sub-divided rectangular chart. The only difference between them is that in a Pie Chart circles are used instead of rectangles. A Pie Chart consists of a circle divided into sectors, or pie-shaped pieces, whose areas are proportional to the various parts into which the whole quantity is divided. The sectors are shaded differently to show the relationship of the parts to the whole. A Pie Chart is also known as a Sector Diagram. To construct a Pie Chart, draw a circle of any convenient radius. The whole quantity to be displayed corresponds to 360°, because the total angle at the centre of a circle is 360°. The angle for each component is calculated by the following formula and is used to mark out that component:
$$\text{Angle} = \frac{\text{Component part}}{\text{Whole quantity}} \times 360^\circ$$
Then divide the circle into sectors by constructing these angles at the centre with the help of a protractor.
Example 2.11 The following table gives the expenditures in rupees of a family on different commodities or items. Represent the data by a Pie Chart.

| Items | Food | Clothing | Rent | Medical Care | Other items |
|-------|------|----------|------|--------------|-------------|
| Expenditure in Rs. | 190 | 64 | 100 | 46 | 80 |
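Solution:

| Items | Expenditure in Rs. | Angles of the Sectors |
|-------|--------------------|-----------------------|
| Food | 190 | (190/480) × 360° = 142.5° |
| Clothing | 64 | (64/480) × 360° = 48° |
| Rent | 100 | (100/480) × 360° = 75° |
| Medical Care | 46 | (46/480) × 360° = 34.5° |
| Other items | 80 | (80/480) × 360° = 60° |
| TOTAL | 480 | 360° |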
PIE CHART SHOWING EXPENDITURES IN RUPEES OF DIFFERENT COMMODITIES OF A FAMILY
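The sector angles of Example 2.11 can be reproduced with a short sketch of the formula Angle = (Component part / Whole quantity) × 360°. The function name `sector_angles` is an assumption made for illustration.

```python
expenditure = {
    "Food": 190,
    "Clothing": 64,
    "Rent": 100,
    "Medical Care": 46,
    "Other items": 80,
}

def sector_angles(parts):
    """Angle in degrees for each component: part / whole * 360."""
    whole = sum(parts.values())
    return {name: value / whole * 360 for name, value in parts.items()}

angles = sector_angles(expenditure)

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles must add up to 360° (up to floating-point rounding), which is a convenient check before drawing the sectors with a protractor.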
GRAPHS: Diagrams fail to represent, in visual form, a statistical series spread over time, a frequency distribution, or two related variables. Graphs are therefore used for such representations. A Graph consists of a straight line or a curve and presents the data in a simple and effective manner. Graphs are used to compare two or more statistical series. Sometimes Graphs may also be used to make predictions and forecasts.
Types of Graphs: Graphs can be divided into two main categories: (a) Graph of time series (Historigram) (b) Graph of frequency distribution. Here we will only discuss the graph of frequency distribution.
GRAPHS OF FREQUENCY DISTRIBUTION: The important graphs of frequency distributions are: (i) Histogram (ii) Frequency Polygon (iii) Frequency Curve (iv) Cumulative frequency Curve or Ogive.
Histogram: A Histogram consists of a set of adjacent rectangles in which class boundaries are marked along X-axis and frequencies are taken on Y-axis. When the class intervals are equal then the rectangles all have the same width and the heights of rectangles are directly proportional to the respective class frequencies. If the class intervals are not equal, then the heights of the rectangles have to be adjusted accordingly. To adjust the heights of the rectangles in frequency distributions, each class frequency is divided by its class interval size.
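Example 2.12 Construct a Histogram for the following frequency distribution.

| Classes | 10-14 | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 |
|---------|-------|-------|-------|-------|-------|-------|-------|
| f | 4 | 12 | 25 | 30 | 25 | 15 | 6 |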
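Solution:

| C – B | 9.5-14.5 | 14.5-19.5 | 19.5-24.5 | 24.5-29.5 | 29.5-34.5 | 34.5-39.5 | 39.5-44.5 |
|-------|----------|-----------|-----------|-----------|-----------|-----------|-----------|
| f | 4 | 12 | 25 | 30 | 25 | 15 | 6 |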
HISTOGRAM
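The class boundaries (C – B) used for the Histogram of Example 2.12 can be obtained from the class limits as sketched below. The sketch assumes that the gap between the upper limit of one class and the lower limit of the next is the same throughout (here 1), so that half of that gap is subtracted from each lower limit and added to each upper limit; the function name is an assumption made for illustration.

```python
limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]

def class_boundaries(class_limits):
    """Convert class limits to class boundaries.

    Assumes a constant gap between consecutive classes; half of the gap
    is subtracted from each lower limit and added to each upper limit.
    """
    gap = class_limits[1][0] - class_limits[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in class_limits]

bounds = class_boundaries(limits)

for (lo, hi), (lb, ub) in zip(limits, bounds):
    print(f"{lo}-{hi}  ->  {lb}-{ub}")
```

With boundaries in place of limits, adjacent rectangles of the Histogram share an edge and leave no gaps.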
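Example 2.13 Construct a Histogram for the following frequency distribution.

| Classes | 10-11 | 12-14 | 15-19 | 20-29 | 30-34 | 35-39 | 40-42 |
|---------|-------|-------|-------|-------|-------|-------|-------|
| f | 4 | 12 | 25 | 60 | 25 | 15 | 6 |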
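Solution:

| C – I | frequency | C – B | Class Interval Size | Adjusted frequency |
|-------|-----------|-------|---------------------|--------------------|
| 10 – 11 | 4 | 9.5 – 11.5 | 2 | 4/2 = 2 |
| 12 – 14 | 12 | 11.5 – 14.5 | 3 | 12/3 = 4 |
| 15 – 19 | 25 | 14.5 – 19.5 | 5 | 25/5 = 5 |
| 20 – 29 | 60 | 19.5 – 29.5 | 10 | 60/10 = 6 |
| 30 – 34 | 25 | 29.5 – 34.5 | 5 | 25/5 = 5 |
| 35 – 39 | 15 | 34.5 – 39.5 | 5 | 15/5 = 3 |
| 40 – 42 | 6 | 39.5 – 42.5 | 3 | 6/3 = 2 |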
HISTOGRAM FOR UN-EQUAL CLASS INTERVALS
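The height adjustment for unequal class intervals in Example 2.13 can be sketched as follows: each class frequency is divided by its class interval size, taken here as the difference between the class boundaries. The function name `adjusted_frequencies` is an assumption made for illustration.

```python
bounds = [(9.5, 11.5), (11.5, 14.5), (14.5, 19.5), (19.5, 29.5),
          (29.5, 34.5), (34.5, 39.5), (39.5, 42.5)]
freqs = [4, 12, 25, 60, 25, 15, 6]

def adjusted_frequencies(class_bounds, frequencies):
    """Divide each frequency by its class interval size (upper - lower boundary)."""
    return [f / (upper - lower) for (lower, upper), f in zip(class_bounds, frequencies)]

heights = adjusted_frequencies(bounds, freqs)

for (lower, upper), h in zip(bounds, heights):
    print(f"{lower}-{upper}: height {h}")
```

With these heights, the area of each rectangle (height × width) equals the class frequency, so wide classes such as 20 – 29 are not visually exaggerated.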
Frequency Polygon: A second useful way of presenting a frequency distribution in graphic form is the frequency polygon. A frequency polygon is a line graph obtained by plotting class frequencies against class marks and then joining the consecutive points by straight lines. A frequency polygon can also be obtained by joining the mid points of the tops of the rectangles in the Histogram. Drawn this way, the ends of the graph do not meet the X-axis. Because a polygon is a many-sided closed figure, we add extra classes with zero frequencies at both ends of the frequency distribution. In this way we get the frequency polygon. We use a frequency polygon instead of a Histogram when two frequency distributions are to be compared. A frequency polygon gives a rough idea about the mode, skewness and kurtosis of the curve.
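Example 2.14 Draw a frequency polygon for the following frequency distribution.

| Classes | 60-62 | 63-65 | 66-68 | 69-71 | 72-74 | 75-77 | 78-80 |
|---------|-------|-------|-------|-------|-------|-------|-------|
| f | 4 | 9 | 14 | 18 | 12 | 7 | 3 |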
Solution:

| C – B | f | X |
|-------|---|---|
| 59.5-62.5 | 4 | 61 |
| 62.5-65.5 | 9 | 64 |
| 65.5-68.5 | 14 | 67 |
| 68.5-71.5 | 18 | 70 |
| 71.5-74.5 | 12 | 73 |
| 74.5-77.5 | 7 | 76 |
| 77.5-80.5 | 3 | 79 |
FREQUENCY POLYGON
Alternative Method
FREQUENCY POLYGON
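The points of the frequency polygon in Example 2.14 can be sketched as below: the class marks X are the midpoints of the class limits, and one extra class with zero frequency is added at each end so that the figure closes on the X-axis. The sketch assumes equal class widths, and the function names are assumptions made for illustration.

```python
limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74), (75, 77), (78, 80)]
freqs = [4, 9, 14, 18, 12, 7, 3]

def class_marks(class_limits):
    """Class mark X = midpoint of the lower and upper class limits."""
    return [(lower + upper) / 2 for lower, upper in class_limits]

def polygon_points(class_limits, frequencies):
    """(X, f) points of the polygon, with a zero-frequency class added at each end.

    Assumes all classes have the same width, so the extra class marks lie
    one class width before the first mark and one after the last.
    """
    marks = class_marks(class_limits)
    width = marks[1] - marks[0]
    xs = [marks[0] - width] + marks + [marks[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

points = polygon_points(limits, freqs)

for x, f in points:
    print(x, f)
```

Joining these points in order by straight lines gives the closed polygon described in the text.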
Frequency Curve: If the frequency polygon is smoothed, the result is called a frequency curve. Equivalently, if the plotted points of the frequency polygon are joined by a smooth freehand curve instead of straight lines, we get the frequency curve. A frequency curve should not touch the X-axis.
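Example 2.15 Draw a frequency curve for the following frequency distribution.

| Classes | 60-62 | 63-65 | 66-68 | 69-71 | 72-74 | 75-77 | 78-80 |
|---------|-------|-------|-------|-------|-------|-------|-------|
| f | 4 | 9 | 14 | 18 | 12 | 7 | 3 |

Solution:

| C – I | 60-62 | 63-65 | 66-68 | 69-71 | 72-74 | 75-77 | 78-80 |
|-------|-------|-------|-------|-------|-------|-------|-------|
| f | 4 | 9 | 14 | 18 | 12 | 7 | 3 |
| X | 61 | 64 | 67 | 70 | 73 | 76 | 79 |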
FREQUENCY CURVE
Cumulative frequency polygon or Ogive: A cumulative frequency polygon, also known as an Ogive, is a graph obtained by plotting the cumulative frequencies of a distribution against the upper or lower class boundaries and then joining the points by straight line segments. The graphs corresponding to “less than” and “more than” cumulative frequency distributions are called “less than” and “more than” Ogives respectively. A smoothed Ogive is called an Ogive Curve, which is often used to locate the values of the median, quartiles, deciles, percentiles etc. of a frequency distribution.
Example 2.16 Draw a “less than” cumulative frequency polygon from the following data.

| Age | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 |
|-----|-------|-------|-------|-------|-------|-------|-------|
| f | 1 | 2 | 26 | 22 | 20 | 15 | 14 |
Solution:

| Age C – I | f | C – B | Less than Class Boundaries | c.f |
|-----------|---|-------|----------------------------|-----|
| - | - | - | Less than 19.5 | 0 |
| 20 – 24 | 1 | 19.5 – 24.5 | Less than 24.5 | 1 |
| 25 – 29 | 2 | 24.5 – 29.5 | Less than 29.5 | 3 |
| 30 – 34 | 26 | 29.5 – 34.5 | Less than 34.5 | 29 |
| 35 – 39 | 22 | 34.5 – 39.5 | Less than 39.5 | 51 |
| 40 – 44 | 20 | 39.5 – 44.5 | Less than 44.5 | 71 |
| 45 – 49 | 15 | 44.5 – 49.5 | Less than 49.5 | 86 |
| 50 – 54 | 14 | 49.5 – 54.5 | Less than 54.5 | 100 |
| TOTAL | 100 | - | - | - |

“LESS THAN” TYPE CUMULATIVE FREQUENCY POLYGON
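Reading a value such as the median off a “less than” Ogive can be sketched as linear interpolation along its straight line segments. This is an illustration under stated assumptions: it uses the unsmoothed polygon rather than a freehand Ogive Curve, it takes the median as the point where the cumulative frequency reaches half the total, and the function name `read_ogive` is hypothetical.

```python
# Points of the "less than" Ogive: (class boundary, cumulative frequency).
boundaries = [19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
cum_freq = [0, 1, 3, 29, 51, 71, 86, 100]

def read_ogive(bounds, cf, target):
    """Return the X value at which the less-than Ogive reaches `target`.

    Finds the line segment whose cumulative frequencies bracket the target
    and interpolates linearly between its two end points.
    """
    for i in range(1, len(cf)):
        if cf[i - 1] <= target <= cf[i] and cf[i] > cf[i - 1]:
            fraction = (target - cf[i - 1]) / (cf[i] - cf[i - 1])
            return bounds[i - 1] + fraction * (bounds[i] - bounds[i - 1])
    raise ValueError("target lies outside the cumulative frequency range")

total = cum_freq[-1]
median = read_ogive(boundaries, cum_freq, total / 2)
print(round(median, 2))
```

The same function can be called with one quarter or three quarters of the total to read off the quartiles in the same way.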
Example 2.23 Draw a “more than” cumulative frequency polygon from the following data.

| Age | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 |
|-----|-------|-------|-------|-------|-------|-------|-------|
| f | 1 | 2 | 26 | 22 | 20 | 15 | 14 |
Solution:

| Age C – I | f | C – B | More than Class Boundaries | c.f |
|-----------|---|-------|----------------------------|-----|
| - | - | - | More than 19.5 | 100 |
| 20 – 24 | 1 | 19.5 – 24.5 | More than 24.5 | 100-1=99 |
| 25 – 29 | 2 | 24.5 – 29.5 | More than 29.5 | 99-2=97 |
| 30 – 34 | 26 | 29.5 – 34.5 | More than 34.5 | 97-26=71 |
| 35 – 39 | 22 | 34.5 – 39.5 | More than 39.5 | 71-22=49 |
| 40 – 44 | 20 | 39.5 – 44.5 | More than 44.5 | 49-20=29 |
| 45 – 49 | 15 | 44.5 – 49.5 | More than 49.5 | 29-15=14 |
| 50 – 54 | 14 | 49.5 – 54.5 | More than 54.5 | 14-14=0 |
| TOTAL | 100 | - | - | - |

“MORE THAN” TYPE CUMULATIVE FREQUENCY POLYGON