Definition: “The process of arranging data into classes or categories according to some common characteristics present in the data is called classification.” There are four important bases for classification: qualitative, quantitative, geographical and chronological.
Qualitative: “When data are classified by attributes, it is said to be qualitative.” Examples: religion, sex and marital status. Quantitative: “When data are classified by quantitative characteristics, it is called quantitative classification.” Examples: height, weight, income etc.
Geographical: “When data are classified by geographical regions or locations, it is called geographical classification.” Example: The population of a country may be classified by provinces, districts, divisions and towns. Chronological or temporal: “When data are arranged by their time of occurrence, it is called chronological or temporal classification.” (An arrangement of data by their time of occurrence is called a time series.)
Types of classification: Data may be classified by one, two, three or more characteristics at a time, giving one way, two way, three way and many way classification. One way classification: “When data are classified by one characteristic, classification is said to be one way.” Example: The population of a country may be classified as Muslims, Christians, Hindus etc.
Two way classification: “When data are classified by two characteristics at a time, classification is said to be two way.” Example: the population of a country classified by religion and sex. Three way classification: “When data are classified by three characteristics at a time, classification is said to be three way.” Example: the population of a country classified by religion, sex and marital status.
Many way classification: “When data are classified by many characteristics at a time, classification is said to be many way.” Example: the population of a country classified by age, sex, religion, marital status etc.
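As a rough illustration of one way and two way classification, the sketch below counts a handful of hypothetical records (the religion and sex values are invented for illustration, not census data), first by one characteristic and then by two characteristics at a time, using Python's collections.Counter.

```python
from collections import Counter

# Hypothetical (religion, sex) records, invented for illustration only.
people = [
    ("Muslim", "Male"), ("Muslim", "Female"), ("Christian", "Female"),
    ("Hindu", "Male"), ("Muslim", "Male"), ("Christian", "Male"),
]

# One way classification: one characteristic (religion).
one_way = Counter(religion for religion, _sex in people)

# Two way classification: two characteristics at a time (religion and sex).
two_way = Counter(people)

for religion, count in sorted(one_way.items()):
    print(religion, count)
for (religion, sex), count in sorted(two_way.items()):
    print(religion, sex, count)
```

Adding a third field to each record (for example marital status) and counting the triples in the same way would give a three way classification.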
Classification of qualitative data: “In classifying qualitative data, we may divide a characteristic into two, three or many subclasses.” Two fold division (dichotomy): If the characteristic is divided into two subclasses, one possessing the characteristic and the other not possessing it, this is called a two fold division or dichotomy. Example: If we are studying the literacy of a population, we may divide the population into two categories, that is literate and illiterate.
Three fold division (trichotomy): When we divide a characteristic into three subclasses, it is called a three fold division or trichotomy. Manifold division: When we divide a characteristic into many subclasses, it is called a manifold division. Example: division on the basis of religion, such as Muslims, Christians, Hindus and others.
Table: Def: “A table is a systematic arrangement of data into vertical columns and horizontal rows.” Tabulation: The process of arranging data into rows and columns is called tabulation. Types of tabulation: simple, double, treble and complex.
Simple tabulation: “When tabulation corresponds to one way classification, it is called simple tabulation.” Example: tabulation of data on the population of a country classified by one characteristic (religion or marital status). Double tabulation: When tabulation corresponds to two way classification, it is called double tabulation. Example: tabulation of data classified by religion and sex, or by religion and marital status, is an example of double tabulation.
Complex tabulation: When tabulation corresponds to many way classification, it is called complex tabulation. Example: tabulation of data on the population of a country classified by age, sex, religion, marital status etc.
Statistical table: A statistical table has at least four parts: the title, the stub, the box head and the body. In addition, some tables have a prefatory note, a foot note and a source note.
Example of the parts of a statistical table:
Title: POPULATION OF PUNJAB AND BALOCHISTAN PROVINCES BY SEX FOR 1961 AND 1972 CENSUS
Prefatory note: (Figures in thousands)
Foot note: All areas including Gawadar
Source note: Population census report, 1961 & 1972
Title: A title is a heading at the top of the table describing its contents. The title is usually in capitals throughout. If the title requires two or more lines, it is arranged to form an inverted pyramid. Box head: The headings for the various columns are called column captions. The portion of the table containing the column captions is called the box head. Stub: The headings for the various rows are called row captions. The portion of the table containing the row captions is called the stub.
Prefatory note: It is used to explain certain characteristics of the data. The prefatory note appears between the title and the body of the table; it throws light on the table as a whole. Foot note: A foot note appears immediately below the body of the table. It is used to explain a single fact or a part of the table. Source note: The source note is placed immediately below the table, after the foot note if any. Every table must have a source note unless the data are original.
Use of zeros: Zero should not be used in a table. When no cases have been found to exist, or when the value of an item is zero, this is indicated by means of dots (…..) or short dashes (-----). Raw data: Def: “Collected data which have not been organized numerically are called raw data.” Example: weights (kg) of 120 students:
67 63 57 85 67 60 75 55 67 68
51 54 45 57 64 68 67 86 63 60
98 83 76 70 56 50 74 74 67 77
61 85 66 66 60 61 58 56 56 57
60 60 63 64 85 80 75 75 57 58
59 58 58 61 62 91 74 72 57 73
61 86 64 91 64 64 61 62 69 57
81 66 65 81 82 76 77 81 76 66
62 63 62 63 60 60 60 72 72 79
70 70 58 78 58 71 76 60 60 65
60 73 73 71 73 66 73 76 68 69
68 73 73 68 74 68 67 76 52 79
Class limits and class boundaries: Def: “Each class is defined by two numbers; these numbers are called class limits. The smaller number is called the lower class limit and the larger number is called the upper class limit.” Example: In the class 45-49, 45 is the lower class limit and 49 is the upper class limit. As measurements are seldom exact, 45 kg is interpreted as a weight lying between 44.5 kg and 45.5 kg. Similarly, 49 kg is interpreted as a weight lying between 48.5 kg and 49.5 kg. The values 44.5 and 49.5 are called true class limits or class boundaries.
Open end classes: Sometimes frequency tables are formed in which a class has either no lower class limit or no upper class limit. Example: In the class “below 5” there is no lower class limit, and in the class “25 and above” there is no upper class limit. Such a class is called an open end class. Class mark or mid point: The class mark or mid point is the value which divides a class into two equal parts, i.e. (lower class limit + upper class limit)/2.
Size of class interval: “The size of class interval is the difference between the upper class boundary and the lower class boundary.” More about class intervals: class intervals are generally of equal width and are mutually exclusive. The ends of a class interval are called class limits, and the middle of an interval is called the class mark. Class intervals are used in drawing a histogram.
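In the frequency table of the weights of 120 students, the size of the class interval is 49.5 - 44.5 = 50 - 45 = 52 - 47 = 5, i.e. the difference between the class boundaries of a class, between successive lower class limits, or between successive class marks.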
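A small sketch of how class boundaries, class marks and the size of the class interval follow from the class limits, assuming the weights are recorded to the nearest whole kilogram so that successive classes are separated by a gap of 1. The function names are hypothetical.

```python
def class_boundaries(lower, upper, gap=1):
    """True class limits: move each limit outward by half the gap between classes."""
    return lower - gap / 2, upper + gap / 2

def class_mark(lower, upper):
    """Mid point of a class."""
    return (lower + upper) / 2

def interval_size(lower, upper, gap=1):
    """Upper class boundary minus lower class boundary."""
    lb, ub = class_boundaries(lower, upper, gap)
    return ub - lb

# First two classes of the weight table: 45-49 and 50-54.
print(class_boundaries(45, 49))                 # (44.5, 49.5)
print(class_mark(45, 49), class_mark(50, 54))   # 47.0 52.0
print(interval_size(45, 49))                    # 5.0
```

The same size, 5, also appears as the difference of successive lower limits (50 - 45) and of successive class marks (52.0 - 47.0).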
Formation of frequency distribution: “The organization of raw data in table form with classes and frequencies.” The steps are:
1. Determine the greatest and smallest values and find the range. E.g. for the weights of 120 students, greatest value = 98, smallest value = 45, range = 98 - 45 = 53.
2. Decide on the number of classes. It is usual to have 5 to 20 classes; there are no hard and fast rules.
3. Determine the approximate class interval size by dividing the range by the desired number of classes. E.g. with 11 classes, 53/11 = 4.8, or say 5.
4. Decide the lower class limit of the first class; it should cover the smallest value in the raw data. E.g. here the first class starts at 45, so its lower class boundary is 44.5.
5. Find the upper class boundary by adding the class interval size to the lower class boundary. E.g. lower class boundary = 44.5, class interval size = 5, upper class boundary = 44.5 + 5 = 49.5.
6. Distribute the raw data into classes and determine the number of cases falling in each class, i.e. the class frequencies. There are two methods: (a) by listing the actual values; (b) by using tally marks.
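The six steps above can be sketched in Python for the 120 weights. This is a minimal illustration, assuming 11 classes (as in the text) and weights recorded to whole kilograms; the class size is found by rounding range / number of classes up, and the tally is drawn crudely as one stroke per value rather than in groups of five.

```python
import math

# Weights (kg) of 120 students, copied from the raw data above.
weights = [
    67, 63, 57, 85, 67, 60, 75, 55, 67, 68, 51, 54, 45, 57, 64, 68, 67, 86, 63, 60,
    98, 83, 76, 70, 56, 50, 74, 74, 67, 77, 61, 85, 66, 66, 60, 61, 58, 56, 56, 57,
    60, 60, 63, 64, 85, 80, 75, 75, 57, 58, 59, 58, 58, 61, 62, 91, 74, 72, 57, 73,
    61, 86, 64, 91, 64, 64, 61, 62, 69, 57, 81, 66, 65, 81, 82, 76, 77, 81, 76, 66,
    62, 63, 62, 63, 60, 60, 60, 72, 72, 79, 70, 70, 58, 78, 58, 71, 76, 60, 60, 65,
    60, 73, 73, 71, 73, 66, 73, 76, 68, 69, 68, 73, 73, 68, 74, 68, 67, 76, 52, 79,
]

# Step 1: greatest and smallest values and the range.
largest, smallest = max(weights), min(weights)
data_range = largest - smallest                  # 98 - 45 = 53

# Steps 2-3: number of classes (an assumption) and class size rounded up.
num_classes = 11
width = math.ceil(data_range / num_classes)      # 53 / 11 = 4.8 -> 5

# Steps 4-5: first lower class boundary half a unit below the smallest value,
# then keep adding the class size to get the next boundaries.
first_boundary = smallest - 0.5                  # 44.5
boundaries = [first_boundary + k * width for k in range(num_classes + 1)]

# Step 6: count the values falling between successive boundaries.
frequencies = [
    sum(1 for w in weights if lower < w < upper)
    for lower, upper in zip(boundaries, boundaries[1:])
]

for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
    limits = f"{int(lower + 0.5)}-{int(upper - 0.5)}"
    print(f"{limits:>6}  {'|' * f:<30}  {f}")
```

On the values as printed, the first three frequencies come out as 1, 4 and 17, matching the cumulative frequencies quoted in the text. Some later counts (for example 19 values in the class 70-74) differ from figures quoted further on, so either the printed raw data or those figures may contain a transcription error.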
Cumulative frequency distribution: “The number of values less than the upper class boundary for the current class. This is a running total of the frequencies.” E.g. the cumulative frequency of class 50-54 is 1 + 4 = 5, and the cumulative frequency of class 55-59 is 1 + 4 + 17 = 22. This means 22 students have weight less than 59.5 kg and 5 students have weight less than 54.5 kg.
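A minimal sketch of a cumulative frequency column, using only the three class frequencies quoted above (1, 4 and 17); itertools.accumulate produces the running total.

```python
from itertools import accumulate

# First three weight classes and their frequencies, as quoted in the text.
classes = ["45-49", "50-54", "55-59"]
upper_boundaries = [49.5, 54.5, 59.5]
frequencies = [1, 4, 17]

# Cumulative frequency = running total of the frequencies.
cumulative = list(accumulate(frequencies))   # [1, 5, 22]

for c, ub, cf in zip(classes, upper_boundaries, cumulative):
    print(f"{c}: {cf} students weigh less than {ub} kg")
```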
Relative frequency distribution: “The frequency of a class divided by the total frequency is called relative frequency.” Multiplied by 100, this gives the percentage of values falling in that class. E.g. the relative frequency of class 70-74 is 18/120 = 0.15, i.e. 15%.
Relative cumulative frequency: “The running total of the relative frequencies, or the cumulative frequency divided by the total frequency, is called relative cumulative frequency.” It gives the percentage of values which are less than the upper class boundary. E.g. the relative cumulative frequency of weight less than 69.5 kg is (75/120) × 100 = 62.5%, which means 62.5% of students have weight less than 69.5 kg.
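A sketch of the two percentage formulas above, checked against the figures quoted in the text (18 out of 120, and 75 out of 120). The function names are hypothetical; multiplying by 100 before dividing keeps these particular results exact in floating point.

```python
def relative_frequency(freq, total):
    """Percentage of values falling in a class."""
    return 100 * freq / total

def relative_cumulative_frequency(cum_freq, total):
    """Percentage of values below a class's upper boundary."""
    return 100 * cum_freq / total

print(relative_frequency(18, 120))              # 15.0  (class 70-74)
print(relative_cumulative_frequency(75, 120))   # 62.5  (weight less than 69.5 kg)
```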
Bivariate frequency distribution: “Constructing a frequency distribution by taking two variables at a time is called a bivariate frequency distribution.” E.g. we have the heights in inches and weights in pounds of 50 students at a certain college.
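Since the height and weight data of the 50 students are not reproduced here, the sketch below uses ten hypothetical (height, weight) pairs and assumed class widths (2 inches and 10 pounds) to show how a bivariate frequency table is built: each pair is assigned to a height class and a weight class, and the pairs falling in each cell are counted. Following the rule on the use of zeros, empty cells are printed as dashes.

```python
from collections import Counter

# Hypothetical (height in inches, weight in pounds) pairs, NOT the 50-student data.
students = [(62, 118), (64, 131), (65, 127), (67, 142), (63, 120),
            (68, 151), (66, 138), (64, 126), (70, 160), (67, 145)]

def lower_limit(value, start, width):
    """Lower class limit of the class (start, start + width, ...) containing value."""
    return start + (value - start) // width * width

# Assumed classes: heights 62-63, 64-65, ...; weights 115-124, 125-134, ...
cells = Counter((lower_limit(h, 62, 2), lower_limit(w, 115, 10)) for h, w in students)

heights = sorted({h for h, _ in cells})
weights = sorted({w for _, w in cells})

# Rows: weight classes; columns: height classes; empty cells shown as "-".
print("weight/height " + " ".join(f"{h}-{h + 1}" for h in heights))
for w in weights:
    row = (str(cells[(h, w)]) if cells[(h, w)] else "-" for h in heights)
    print(f"{w}-{w + 9}".ljust(14) + " ".join(f"{c:>5}" for c in row))
```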
The X-axis and Y-axis should be taken as the horizontal axis and the vertical axis respectively.
Axes should be labeled correctly and the graph must not be overcrowded.
On this chart, the two coordinates of a point, measured along two perpendicular axes X and Y from a fixed point called the origin, are taken to represent the numerical values of two characteristics of an individual.
Usually the x-axis represents the independent variable, e.g. time, age etc.
Figure: Cigarettes (crore tons) plotted against year.
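Graphs of frequency distributions: The important graphs of frequency distributions are the histogram, the frequency polygon, the relative frequency histogram and polygon, and the cumulative frequency polygon or ogive.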
Histograms: A histogram consists of a set of adjacent rectangles having bases along the x-axis with centers at the class marks (i.e. bases marked off by class boundaries) and areas proportional to the class frequencies. To draw a histogram (for equal class intervals), class boundaries are marked along the x-axis and frequencies are marked along the y-axis.
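A sketch of the rectangles of a histogram for equal class intervals. The frequencies are those obtained by tallying the 120 weights as printed in the raw data (classes 45-49 up to 95-99); each rectangle runs from one class boundary to the next, and with equal widths its height can simply be the class frequency, so that every area is proportional to the frequency.

```python
# Frequencies from tallying the 120 weights as printed (classes 45-49, ..., 95-99).
frequencies = [1, 4, 17, 30, 21, 19, 14, 6, 5, 2, 1]
first_boundary, width = 44.5, 5

# Each rectangle: (left class boundary, right class boundary, height).
bars = [
    (first_boundary + i * width, first_boundary + (i + 1) * width, f)
    for i, f in enumerate(frequencies)
]

# Rough text rendering: one '#' per student, one row per class.
for left, right, height in bars:
    print(f"{left:5.1f}-{right:5.1f} | {'#' * height}")
```

If matplotlib is available, the same rectangles could be drawn with `plt.bar([b[0] for b in bars], frequencies, width=width, align="edge")`.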
Frequency polygon: A frequency polygon is a many-sided closed figure. It is constructed by plotting the class frequencies against the class marks (mid points) and then joining the resulting points by straight lines. A frequency polygon can also be obtained by joining the mid points of the tops of the rectangles in the histogram. To construct it, we mark the class marks along the x-axis and the class frequencies along the y-axis. After plotting the points, they are joined by straight lines. An extra class with zero frequency is taken at each end to make the frequency polygon a closed figure.
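The points of a frequency polygon, including the two extra zero-frequency classes at the ends, can be computed as in this sketch. It assumes the same classes and tallied frequencies as the weight data above (class marks 47, 52, ..., 97).

```python
# Tallied weight frequencies for classes 45-49, ..., 95-99.
frequencies = [1, 4, 17, 30, 21, 19, 14, 6, 5, 2, 1]
first_mark, width = 47, 5

marks = [first_mark + i * width for i in range(len(frequencies))]

# An extra class with zero frequency at each end closes the polygon on the x-axis.
polygon = (
    [(marks[0] - width, 0)]
    + list(zip(marks, frequencies))
    + [(marks[-1] + width, 0)]
)

for x, y in polygon:
    print(x, y)
```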
Figure: Frequency polygon for the frequency distribution of weights of 120 students.
Relative frequency histogram and relative frequency polygon: Graphic representations of a relative frequency distribution can be obtained from the histogram or frequency polygon simply by changing the y-axis from frequency to relative frequency. The resulting graphs are called the relative frequency histogram (or percentage histogram) and the relative frequency polygon (or percentage frequency polygon) respectively.
Cumulative frequency polygon or ogive: A graph showing the cumulative frequencies plotted against the upper class boundaries is called a cumulative frequency polygon or an ogive. If we use relative cumulative frequencies in place of cumulative frequencies, the resulting graph is called a relative cumulative frequency polygon or percentage ogive.
The graphs corresponding to “less than” and “or more” cumulative frequency distributions are called “less than” and “or more” ogives respectively.
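A sketch of the points plotted for the “less than” ogive, the “or more” ogive and the percentage ogive, again assuming the tallied weight frequencies. The “less than” points pair each upper class boundary with a cumulative frequency (starting from 0 at the lowest boundary); the “or more” points pair each boundary with the number of values at or above it.

```python
from itertools import accumulate

# Tallied weight frequencies for classes 45-49, ..., 95-99.
frequencies = [1, 4, 17, 30, 21, 19, 14, 6, 5, 2, 1]
first_boundary, width = 44.5, 5
boundaries = [first_boundary + i * width for i in range(len(frequencies) + 1)]
n = sum(frequencies)

# "Less than" ogive: (boundary, number of values below it).
less_than = [(boundaries[0], 0)] + list(zip(boundaries[1:], accumulate(frequencies)))

# "Or more" ogive: (boundary, number of values at or above it).
or_more = [(b, n - cf) for b, cf in less_than]

# Percentage ogive: relative cumulative frequencies in place of cumulative frequencies.
percentage = [(b, 100 * cf / n) for b, cf in less_than]

for (b, lt), (_, om), (_, pct) in zip(less_than, or_more, percentage):
    print(f"{b:5.1f}  less than: {lt:3d}  or more: {om:3d}  percent: {pct:5.1f}")
```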
Common shapes of frequency curves: The frequency curves arising in practice take on certain characteristic shapes and are generally classified as: symmetrical or bell-shaped curves; moderately asymmetrical curves; J-shaped and reverse J-shaped curves; U-shaped curves; bimodal and multimodal curves.
Symmetrical or bell-shaped curve (observations equidistant from the central maximum have the same frequency)
Moderately asymmetrical curve (in these curves, the tail of the curve on one side of the central maximum is longer than that on the other)
J-shaped and reverse J-shaped curves (a J-shaped curve starts at a low point on the left and goes higher and higher towards the extreme right; a reverse J-shaped curve starts at a high point on the left and goes lower and lower towards the extreme right)
U-shaped curve (a frequency curve with a low spot in the middle and high spots at both ends)
Bimodal and multimodal frequency curves (a bimodal curve has 2 maxima, while a multimodal frequency curve has more than 2 maxima)
Pie chart: A pie chart is used to compare the relation between the whole and its components. When circles are used, their radii are drawn proportional to the square root of the quantities to be represented, so that their areas are proportional to those quantities.
Construction of a pie chart: Draw a circle with some suitable radius (e.g. proportional to the square root of the total). To show the components by sectors, calculate the angle of each sector by the formula: angle of sector = (component part / total) × 360°.
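The sector-angle formula, and the square-root rule for the radius, can be sketched as below. The component figures are hypothetical, since the Punjab employee data are not reproduced here, and the scale factor for the radius is an arbitrary choice.

```python
import math

def sector_angles(parts):
    """Angle of each sector in degrees: component part / total x 360."""
    total = sum(parts)
    return [part * 360 / total for part in parts]

def pie_radius(total, scale=1.0):
    """Radius proportional to the square root of the total; scale is arbitrary."""
    return scale * math.sqrt(total)

# Hypothetical component figures, for illustration only.
parts = [50, 30, 20]
print(sector_angles(parts))     # [180.0, 108.0, 72.0]
print(pie_radius(sum(parts)))   # 10.0
```

Two pie charts drawn this way have radii in the ratio of the square roots of their totals, so their areas compare in the same ratio as the totals themselves.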
Example: Draw a pie chart to show the distribution of Punjab Govt. employees by their academic qualifications.
The stem-and-leaf plot: Suppose the data are represented by x1, x2, …, xn and that each number xi consists of at least two digits. To construct a stem-and-leaf plot, we divide each number xi into two parts: a stem, consisting of one or more of the leading digits, and a leaf, consisting of the remaining digits. The following example illustrates the construction of a stem-and-leaf plot.
Example: The following data represent weight measurements in kilograms of forty individuals in a locality. Construct a stem-and-leaf display of these measurements. 48 53 49 52 51 52 63 60 53 64 59 58 47 49 45 64 79 65 62 60 68 65 73 88 69 83 78 81 86 92 75 85 81 77 82 76 75 91 73 92.
First select the values 4, 5, 6, 7, 8 and 9 (the tens digits) as stems; the units digits are the leaves. The resulting table is given.
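A sketch of the stem-and-leaf construction for the forty weights above, taking the tens digit as the stem and the units digit as the leaf. The leaves are sorted within each stem here (an ordered display); a display that keeps the leaves in their order of appearance is equally valid. The function name is hypothetical.

```python
from collections import defaultdict

# Weights (kg) of forty individuals, from the example above.
weights = [48, 53, 49, 52, 51, 52, 63, 60, 53, 64,
           59, 58, 47, 49, 45, 64, 79, 65, 62, 60,
           68, 65, 73, 88, 69, 83, 78, 81, 86, 92,
           75, 85, 81, 77, 82, 76, 75, 91, 73, 92]

def stem_and_leaf(values):
    """Split each two-digit value into a stem (tens digit) and a leaf (units digit)."""
    plot = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    # Sort stems, and leaves within each stem.
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

for stem, leaves in stem_and_leaf(weights).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```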
Example: Construct a stretched stem-and-leaf plot for the forty weight measurements above. In a stretched display each stem is used twice, on two lines: one line for the leaves 0, 1, 2, 3, 4 and the other for the leaves 5, 6, 7, 8, 9. The stems are repeated, with * shown for the leaves 0, 1, 2, 3, 4 and • for the leaves 5, 6, 7, 8, 9.
Stretched stem-and-leaf plot for 40 measurements:
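A sketch of the stretched display for the same forty weights: each stem appears on two lines, marked * for leaves 0 to 4 and • for leaves 5 to 9. Lines with no leaves (4* and 9• here) are kept so that the scale stays even; the function name is hypothetical.

```python
from collections import defaultdict

# Weights (kg) of forty individuals, from the example above.
weights = [48, 53, 49, 52, 51, 52, 63, 60, 53, 64,
           59, 58, 47, 49, 45, 64, 79, 65, 62, 60,
           68, 65, 73, 88, 69, 83, 78, 81, 86, 92,
           75, 85, 81, 77, 82, 76, 75, 91, 73, 92]

def stretched_stem_and_leaf(values):
    """Each stem on two lines: '*' for leaves 0-4 and '•' for leaves 5-9."""
    lines = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        lines[(stem, "*" if leaf <= 4 else "•")].append(leaf)
    stems = range(min(values) // 10, max(values) // 10 + 1)
    return [(f"{stem}{mark}", sorted(lines[(stem, mark)]))
            for stem in stems for mark in ("*", "•")]

for label, leaves in stretched_stem_and_leaf(weights):
    print(f"{label} | {' '.join(map(str, leaves))}")
```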