3. Processing and Analysis of data –
Classification of data, types of classification,
Tabulation, Graphical presentation of data- Bar
diagram, pie-chart and curves.
Processing and analyzing data –
Descriptive Analysis (Mean, Mode, Median,
Standard Deviation, and Variance Analysis)
Inferential Analysis (‘t’ test, Chi-Square test).
5. Classification is a process by which
collected statistics (i.e. numerical
information) are put into different classes
on the basis of their values.
The classification process is not limited to
statistics.
The process of placing different
individuals, objects, etc. into different
classes on the basis of their characteristics
is also classification.
7. Classification has the following major
methods or types:
1. Geographical Classification:
When statistics are placed in different
classes on the basis of geographical location,
the classification is called geographical
classification.
In this type of classification, place is the
base of classification. This place
may be a country, a state, a district, etc.
8. Name of state    Food grain production in 2003 (million tonnes)
Punjab              42
Uttar Pradesh       34
Haryana             80
Maharashtra         18
Other States        20
Total               194
9. When statistics are classified according to
time, the classification is called chronological
classification.
In this type of classification,
time is the base of classification. This time
may be a decade, a year, a month, a day,
etc.
10. Year    Production of food grains in Maharashtra (million tonnes)
2004        34.1
2005        35.6
2006        38.4
2007        39.6
2008        37.2
11. In qualitative classification, the base of
classification is a characteristic or quality.
Facts are placed into various classes
according to the presence or absence of
that characteristic or quality. This type of
classification is used when direct
quantification of facts is not possible.
12. Qualitative classification is of two types:
a) Two-Fold Classification
In two-fold classification we study only
one quality or characteristic.
b) Sub-Two classification
These are constructed on the basis of the
presence or absence of that
characteristic.
14. Quantitative classification is done where direct
measurement of facts is possible, e.g. heights
of students, weights of students, daily wages of
workers, sales of a firm, etc.
Anything which can be measured directly
and can be expressed numerically is called a
variable.
15. Variables are classified with the help of
statistical series.
On the basis of construction of a series,
series are of three types:
(a) Individual series
(b) Discrete series
(c) Continuous series
16. (a) Individual series:
In an individual series, every item is presented
individually. This type is used when the
number of items is small. The
figures are not placed in different classes,
and there is no frequency column.
Generally such series are
presented in the order of Serial Number, Roll
Number, Alphabetical Order, Day, Month,
Place, etc.
17. Roll No. Marks of students
1 58
2 86
3 75
4 64
5 73
6 88
7 70
8 63
9 64
10 65
18. (b) Discrete Series:
In a discrete series, no value is written
repeatedly. Items with the same value are kept
together. Suppose three students have 50 marks
each; then 50 is not written three times but
once, and '3' is written in front of it. This '3' is
called the frequency. By frequency we mean how many
times a value is repeated. In this way, in a
discrete series the different values of a variable are
presented with their corresponding frequencies.
19. No. of children in family    No. of houses
0                            15
1                            20
2                            25
3                            17
4                            4
5                            1
Total                        N=82
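As an illustration of how a discrete series is built, the following minimal Python sketch counts how often each value occurs. The list of marks is hypothetical data made up for this example and is not taken from the slides.

```python
from collections import Counter

# Hypothetical marks of ten students (illustrative data only).
marks = [50, 62, 50, 71, 62, 50, 85, 71, 62, 90]

# Count how many times each value occurs: value -> frequency.
frequency = Counter(marks)

# Print the discrete series in ascending order of value.
print("Marks  Frequency")
for value in sorted(frequency):
    print(f"{value:<6} {frequency[value]}")

# The frequencies must add up to the number of items, N.
total = sum(frequency.values())
print(f"Total  N={total}")
```

Each distinct mark appears once in the output, with its frequency beside it, which is the layout used for a discrete series.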
20. (c) Continuous Series:
Continuous series are used to classify continuous
variables. If a variable can take all
the possible values within the specified limits, it is called
a continuous variable.
In a continuous series, the variable is presented in the
form of classes, e.g., 0-5 which means from 0 to 5, 5-10
which means from 5 to 10, and so on. In front of each
class is written the frequency of that class. If class 0-5
has frequency 3, it means the variable has taken a value
between 0 and 5 three times.
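A continuous series can be sketched in Python by grouping values into classes of equal width. The marks below are hypothetical, the class width of 10 is an assumption, and the sketch assumes the usual convention that a value equal to an upper limit is counted in the next class.

```python
# Hypothetical marks (illustrative data only).
marks = [3, 12, 18, 25, 25, 37, 40, 44, 59, 10]
width = 10  # assumed equal class interval

# Assumed convention: a value equal to an upper limit goes to the
# next class, so 10 is counted in 10-20 and not in 0-10.
classes = {}
for m in marks:
    lower = (m // width) * width
    label = f"{lower}-{lower + width}"
    classes[label] = classes.get(label, 0) + 1

# Print the classes in ascending order of their lower limits.
for label in sorted(classes, key=lambda s: int(s.split("-")[0])):
    print(label, classes[label])
```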
21. Marks (x) Number of students ( f)
0-10 5
10-20 3
20-30 4
30-40 7
40-50 5
50-60 3
60-70 1
70-80 6
80-90 7
90-100 2
Total Total=45
22. 1. Class Limits:
The two limits which, combined together, form a class are called class limits.
One limit is called the lower class limit and is denoted by 'L', and the other limit is
called the upper class limit and is denoted by 'U'. In our Example 1, class 0-10 has
two class limits: '0' is the lower class limit and '10' is the upper class limit.
2. Class Frequency: The number of items in a class is called the class frequency. In
our Example 1, there are 5 students with marks between 0 and 10. Therefore
'5' is the class frequency of this class.
3. Mid Value: The middle value of any class is called the Mid Value or Mid Point. It is
calculated by adding the lower class limit and the upper class limit
and dividing the sum by two.
Mid value = (Upper limit + Lower limit) / 2
If the class is 10-20, then mid value = (20 + 10) / 2 = 15
23. 4. Class Interval:
The difference between the upper class limit and the lower class
limit is called the class interval. It is generally denoted by the symbol 'i'.
i = U - L
Suppose the class is 600-800; then the class interval is 800 - 600
= 200. If the class interval is equal for all the classes, then it
is called an Equal Class Interval. If the class interval is not the
same for all the classes, then it is called an Unequal Class
Interval.
24. 5. Exclusive Method:
In this method, upper class limit of any
class is equal to lower class limit of the
next class.
e.g.
0-10
10-20
20-30
25. 6. Inclusive Method: In this method upper
class limit of any class is not equal to the
lower class limit of the next class. E.g.
0-9
10-19
20-29
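The terms defined above (mid value, class interval, and the exclusive and inclusive methods) can be checked with a short Python sketch. The function names are hypothetical helpers introduced only for this illustration; the class limits are the ones used in the examples above.

```python
def mid_value(lower, upper):
    """Mid value = (upper limit + lower limit) / 2."""
    return (upper + lower) / 2


def class_interval(lower, upper):
    """Class interval i = U - L."""
    return upper - lower


def is_exclusive(classes):
    """True when every upper limit equals the next class's lower limit."""
    return all(
        classes[k][1] == classes[k + 1][0] for k in range(len(classes) - 1)
    )


exclusive = [(0, 10), (10, 20), (20, 30)]
inclusive = [(0, 9), (10, 19), (20, 29)]

print(mid_value(10, 20))          # 15.0
print(class_interval(600, 800))   # 200
print(is_exclusive(exclusive))    # True
print(is_exclusive(inclusive))    # False
```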
27. In any research study, after collection and
classification of data, presentation of data
is very important.
Presentation of data helps in
understanding, condensing, and
interpreting the data.
28. Data can be presented by the following
methods:
1. Presentation of data through tables or
tabulation.
2. Presentation of data through diagrams
or diagrammatic presentation
3. Presentation of data through graphs or
graphic presentation.
29. "Tabulation in its broadest sense is an orderly arrangement
of data in columns and rows."
- M. M. Blair
"A statistical table is a systematic organisation of data in
columns and rows."
- Neiswanger
"A statistical table is a classification of related numerical
facts in vertical columns and horizontal rows."
- Spurr and Bonini
30. 1. Simplification of data
2. Economises space
3. Helpful in comparison
4. Helpful in presentation
5. Helpful in analysis
6. Helpful in interpretation
7. Helpful in clarifying the characteristics of data
8. Helpful in finding mistakes
9. Helpful in condensation
10. Attractive form
31. Tabulation performs some functions. Because of these functions, tabulation has
importance. To perform these functions is the objective of any tabulation. These
functions are as follows:
1. Simplification of data: Tabulation makes complex data simple and as a result it
becomes easy to understand the data.
2. Economise Space: Tabulation helps in giving maximum information in minimum
possible space.
3. Helpful in comparison: Comparison is one of the major objectives of
investigation. Tabulation helps in comparison.
4. Helpful in presentation: As written already, tabulation helps in presenting data.
5. Helpful in analysis: In a research enquiry, tabulation helps in analysing the
collected data.
6. Helpful in interpretation: Tabulation assists in interpreting the data.
7. Helpful in clarifying the characteristics of data: Tabulation helps in studying and
showing the various characteristics of data.
8. Helpful in finding mistakes: Tabulation is helpful in finding mistakes.
9. Helpful in condensation : Tabulation is useful in condensing the collected data.
10. Attractive form: With tabulation, numerical figures take an attractive form.
Because of these uses, tabulation has significant importance in research.
32. A good table has the following qualities.
1. Table should be attractive.
2. Table should be clear.
3. Table should be neither too large nor too small.
4. Table should be suitable to the purpose of enquiry.
5. Table should be comparable.
6. Table should be self-explanatory.
7. The different units used in the table should be clearly written.
8. Different columns and rows must be numbered in the table.
9. Different items must be presented systematically in the table.
10. Table must have a title.
11. Footnote and source of data must be given below the table.
34. 1. Table Number:
First of all, a table number is given to every table. It helps in referring
to the table.
2. Title of the table:
Below the table number, the title of the table is written. The title makes clear the
purpose of the table. The title must be given with caution. Generally, it should not be
too lengthy or too short.
3. Captions:
Every table has many columns. The title of any column in a table is called a
'caption'. Sometimes a title is given for all the columns collectively. It is called the
'master caption'. When required, columns are further divided into sub-columns.
Titles must also be given to these sub-columns.
4. Stubs:
Every table has many rows. All the rows are given titles separately. These
titles are called 'stubs'. Sometimes a title is given to all the rows collectively.
This title is called the 'stub head'. All the stubs are under this stub head.
35. 5. Body:
This is the most significant part of any table. All the statistics
presented in the table, taken together constitute the body of the
table. Different statistics are presented in the form of cells in the
table. These cells are formed by the intersection of rows and
columns. These cells constitute the body of the table.
6. Footnotes:
Below the table necessary information and explanations are given.
These are called footnotes.
7. Source: Below the footnotes, we write the source of the data.
36. Sr. No.  Basis                         Type-1                    Type-2
1        On the basis of purpose       1. General Purpose Table  2. Special Purpose Table
2        On the basis of originality   1. Original Table         2. Derivative Table
3        On the basis of construction  1. Simple Table           2. Complex Table:
                                                                    a) Two Way or Two Fold Table
                                                                    b) Three Way or Three Fold Table
                                                                    c) Manifold Table
37. On the basis of purpose
(i) General Purpose Table:
Those tables which are not constructed for some special
purpose but for general purpose are called as general
purpose tables. The chief objective of such tables is to
present data so that data may be used when required.
Such tables are also called 'Reference Tables' or
'Primary Tables'.
(ii) Special Purpose Table:
These tables are constructed for a special purpose. This
special purpose may be studying comparison or
association. These tables are smaller than
general purpose tables. These are also called 'Summary
Tables'.
38. On the Basis of Originality:
On the basis of originality, tables are
classified into two categories
(i) Original Table: These are the tables in
which statistics or numerical
data are presented in original form.
(ii) Derivative Table: These are the tables
in which statistics are not presented in the
original form but in the form of total, ratio,
percent, rate, etc.
39. 3. On the basis of construction:
On the basis of construction, table is of two
types
i) Simple Table :
When statistics are presented in the table
on the basis of one characteristic or quality
or attribute, the table is called a simple
table. This base may be age, sex, type of
work etc.
40. (ii) Complex Table:
When statistics are presented in the table on the
basis of two or more characteristics or
attributes, the table is called a complex table. Complex
tables are of the following types.
a) Two Way or Two Fold Table: In these tables,
numerical data are presented on the basis of two
characteristics. For example, if in the above
example sex is also to be presented along with
nature of work, we shall use a two way table.
41. b) Three Way or Three Fold Table:
In these tables, statistics are presented on
the basis of three characteristics. If in the
above example, along with sex and nature of
work, we are also to present whether the
worker is with training or without training,
we shall use a three fold table.
42. c) Manifold Table:
When numerical information is presented
on the basis of more than three
characteristics, the table is called a manifold
table. If in the above example, along with nature of
work, sex and training, the age of the worker is
also given, we shall use a manifold table for
presentation.
43. Classification vs. Tabulation
Similarities
Classification:
1. Classification helps in condensing data.
2. Classification helps in simplifying complex data.
3. Classification helps in keeping data in a systematic order.
Tabulation:
1. Tabulation also helps in condensation.
2. Tabulation also helps in simplifying complex data.
3. Tabulation also helps in keeping data in a systematic order.
Dissimilarities
Classification:
1. Classification is done after collection of data.
2. In classification, various statistics are kept in different classes.
3. Classification helps in analysis of data.
Tabulation:
1. Tabulation is done after classification.
2. In tabulation, various statistics are kept in different columns and rows.
3. Tabulation helps in presentation of data.
44. Diagrammatic presentation of data is one
of the methods with which a layman can
understand statistics or data.
Newspapers, magazines and advertisers
use this method.
In brief, diagrammatic presentation is the
method of presenting complex data in a
simple, attractive and comparable form so
that even a layman can understand it.
45. Prof. M. J. Moroney writes,
"Diagrams help us to visualise the
whole meaning of a numerical complex at
a single glance."
46. 1. Attractive Presentation of data
2. Impressive presentation of data
3. Simple presentation of data
4. Useful for Comparison
5. Useful for interpretation
6. Universal Utility
7. Condensation of data
47. 1. Attractive Presentation of data:
With the help of diagrams, data can be presented in an attractive form. Even those who have no interest in data
start showing interest when the data are presented with the help of diagrams. Because of this,
newspapers, magazines and advertisers use diagrams to catch people's attention.
2. Impressive presentation of data:
Diagrams are not only attractive but impressive too. Advertisements that use diagrams
leave deep impressions on our minds. That is why diagrams are used while teaching small children.
3. Simple presentation of data:
Diagrams help in presenting data in a simple way. Tabulation is a complex method of presenting data.
Diagrams remove the complexities of data and thus help in their easy understanding.
48. 4. Useful for Comparison:
Diagrams are also useful in making comparisons. In reality, the main objective of diagrams is to help
in comparison. For example, an ordinary person may be able to understand a price index only
when the prices are presented with the help of diagrams.
5. Useful for interpretation:
We can easily interpret data with the help of diagrams. This saves time and labour.
6. Universal Utility:
Nowadays diagrams are used in almost all spheres, such as trade, economics,
advertisement, etc.
7. Condensation of data:
Data can be condensed with the help of diagrams. It is an old saying that a picture is worth
10,000 words.
49. Different types of diagrams are listed below:
1. Bar Diagrams
(A) Simple Bar Diagram
(B) Sub-divided Bar Diagram
(C) Multiple Bar Diagrams
(D) Percentage Bar Diagram
(E) Bi-Lateral Bar Diagram
2. Pie-chart
3. Rectangular
4. Maps and picture graph
5. Line-graph
(A) Geometrical Straight Lines Or Curved Line
(B) Frequency Curve
(C) Histogram
(D) Frequency Polygon
(E) Cumulative Frequency Polygon
66. Descriptive statistics are brief informational coefficients
that summarize a given data set, which can be either a
representation of the entire population or a sample of a
population.
Descriptive statistics are broken down into measures of
central tendency and measures of variability (spread).
Measures of central tendency include the mean, median,
and mode.
Measures of variability include standard deviation,
variance, minimum and maximum values, kurtosis,
and skewness.
67. Measures of central tendency focus on the average or middle values of data sets,
whereas measures of variability focus on the dispersion of data. These two
measures use graphs, tables and general discussions to help people understand
the meaning of the analyzed data.
Measures of central tendency describe the center position of a distribution for a data
set. A person analyzes the frequency of each data point in the distribution and
describes it using the mean, median, or mode, which measures the most common
patterns of the analyzed data set.
68. Mean:
The mean is also known as the average of all the
numbers in the data set. It is calculated
by the equation below.
Mean = (Sum of all values) / (Number of values)
Median:
The median is the middle value of the ordered data
set.
Mode:
The mode is the number which occurs most
often in the data set.
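As a small illustration, the three measures of central tendency can be computed with Python's standard `statistics` module. The data are the ten marks from the individual series table shown earlier (Roll No. 1 to 10).

```python
import statistics

# Marks of the ten students from the individual series table.
marks = [58, 86, 75, 64, 73, 88, 70, 63, 64, 65]

mean = statistics.mean(marks)      # sum of all values / number of values
median = statistics.median(marks)  # middle value of the ordered data
mode = statistics.mode(marks)      # value that occurs most often

print("Mean:", mean)      # 70.6
print("Median:", median)  # 67.5
print("Mode:", mode)      # 64
```

Because the number of items is even, the median is the average of the two middle values of the ordered data (65 and 70).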
69. Measures of variability (or measures of spread) aid in analyzing how
dispersed the distribution is for a set of data. For example, while the
measures of central tendency may give a person the average of a data set,
they do not describe how the data is distributed within the set.
So, while the average of the data may be 65 out of 100, there can still be data
points at both 1 and 100. Measures of variability help communicate this by
describing the shape and spread of the data set. Range, quartiles, absolute
deviation, and variance are all examples of measures of variability.
Consider the following data set: 5, 19, 24, 62, 91, 100. The range of that
data set is 95, which is calculated by subtracting the lowest number (5) in
the data set from the highest (100).
70. Variance:
Variance is a numerical value that describes the variability of the
observations around their arithmetic mean.
Standard Deviation:
It is a measure of the dispersion of observations within a data set relative to
their mean. It is the square root of the variance and is denoted by sigma
(σ).
Standard deviation is expressed in the same unit as the values in the
data set, so it measures how much the observations of the data set differ
from their mean.
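The range, variance and standard deviation can be sketched in Python for the data set 5, 19, 24, 62, 91, 100 used in the range example. The slides do not say whether the data are a whole population or a sample, so this sketch assumes a population and uses `statistics.pvariance` and `statistics.pstdev`; for a sample, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.

```python
import math
import statistics

data = [5, 19, 24, 62, 91, 100]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Population variance: average squared deviation from the mean
# (assumption: the data set is the entire population).
variance = statistics.pvariance(data)

# Standard deviation: square root of the variance, in the same unit as the data.
std_dev = statistics.pstdev(data)

print("Range:", data_range)  # 95
print("Variance:", variance)
print("Standard deviation:", std_dev)
```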
72. Inferential statistics is a branch of statistics that makes
use of various analytical tools to draw inferences
about the population data from sample data.
Inferential statistics help to draw conclusions about the
population, while descriptive statistics summarize the
features of the data set.
73. Inferential Statistics vs. Descriptive Statistics
1. Inferential: Inferential statistics are used to draw conclusions about the population by using
analytical tools on the sample data.
Descriptive: Descriptive statistics are used to quantify the characteristics of the data.
2. Inferential: Hypothesis testing and regression analysis are the analytical tools used.
Descriptive: Measures of central tendency and measures of dispersion are the important tools used.
3. Inferential: It is used to make inferences about an unknown population.
Descriptive: It is used to describe the characteristics of a known sample or population.
4. Inferential: Measures of inferential statistics are t-test, z-test, linear regression, etc.
Descriptive: Measures of descriptive statistics are variance, range, mean, median, etc.
75. Hypothesis Testing           Regression Analysis
Z Test                       Linear Regression
T Test                       Nominal Regression
F Test                       Logistic Regression
Chi-Square Test              Ordinal Regression
ANOVA Test
Wilcoxon Signed Rank Test
Mann Whitney U Test
77. Hypothesis Testing also includes the use of
confidence intervals to test the parameters
of a population.
78. Hypothesis testing is a type of inferential statistics that is used to
test assumptions and draw conclusions about the population from
the available sample data.
It involves setting up a null hypothesis and an alternative hypothesis,
followed by conducting a statistical test of significance.
A conclusion is drawn based on the value of the test statistic,
the critical value, and the confidence intervals.
A hypothesis test can be left-tailed, right-tailed, or two-tailed. Given
below are certain important hypothesis tests that are used in
inferential statistics.
79. The null hypothesis proposes that no significant
difference exists between a set of given
observations.
Null: The two sample means are equal.
Alternate: The two sample means are not equal.
To reject a null hypothesis, one needs to
calculate the test statistic and then compare the result
with the critical value. If the test statistic is
more extreme than the critical value (for a two-tailed test,
if its absolute value is greater than the critical value),
we can reject the null hypothesis.
80. Z Test
T Test
F Test
Chi-Square Test
ANOVA Test
Wilcoxon Signed Rank Test
Mann Whitney U Test
81. F Test:
An F Test is used to check if there is a
difference between the variances of two
samples or populations.
82. Z Test: A z test is used on data that follows
a normal distribution and has a sample
size greater than or equal to 30. It is used
to test if the means of the sample and
population are equal when the population
variance is known.
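A one-sample z test can be sketched in Python as follows. The slides do not give the formula, so this sketch assumes the standard statistic z = (sample mean - population mean) / (σ / √n); the numbers and the function name `z_statistic` are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """Standard one-sample z statistic (population standard deviation known)."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


# Hypothetical values: a sample of 36 items with mean 52, tested against
# a population mean of 50 with known standard deviation 6.
z = z_statistic(sample_mean=52.0, pop_mean=50.0, pop_sd=6.0, n=36)

# Two-tailed test at the 5% significance level.
critical = NormalDist().inv_cdf(0.975)          # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = abs(z) > critical

print("z =", z)
print("critical value =", round(critical, 2))
print("p-value =", round(p_value, 4))
print("reject null hypothesis:", reject_null)
```

With these numbers the absolute value of z exceeds the critical value, so the null hypothesis of equal means would be rejected at the 5% level.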
83. A t-test is used to compare the means of two
given samples. Like a z-test, a t-test also
assumes a normal distribution of the
sample. When we don't know the
population parameters (mean and
standard deviation), we use a t-test.
There are multiple variations of the t-test.
84. THE THREE VERSIONS OF A T-TEST
Independent sample t-test: compares
means for two groups
Paired sample t-test: compares means
from the same group at different times
One sample t-test: tests the mean of a
single group against a known mean
85. The statistic for this hypothesis test is
called the t-statistic, which we
calculate as:
t = (x1 - x2) / (σ × √(1/n1 + 1/n2)), where
x1 = mean of sample 1
x2 = mean of sample 2
n1 = sample size 1
n2 = sample size 2
σ = pooled (common) standard deviation of the two samples
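A minimal sketch of a two-sample t-statistic in Python is given below. It assumes the pooled-variance (equal variance) form of the independent sample t-test, which is one of several possible variants; the function name `pooled_t_statistic` and the two groups of values are hypothetical.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2):
    """Independent two-sample t statistic, assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)

    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

    # Standard error of the difference between the two means.
    std_error = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) / std_error


# Hypothetical scores of two independent groups.
group_a = [12, 14, 16, 18, 20]
group_b = [10, 12, 14, 16, 18]

t = pooled_t_statistic(group_a, group_b)
print("t =", round(t, 4))  # 1.0
```

The resulting t value is then compared with the critical value for n1 + n2 - 2 degrees of freedom, taken from a t table.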
86. A confidence interval helps in estimating
the parameters of a population. For
example, a 95% confidence interval
indicates that if the sampling is repeated 100
times under the same
conditions, then about 95 of the resulting
intervals can be expected to contain the true
population parameter. Furthermore, the confidence level
is also used to determine the critical
value in hypothesis testing.
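As a rough illustration, a 95% confidence interval for a mean can be computed in Python as below. This sketch assumes the population standard deviation is known, so the normal (z) distribution is used; the sample values, the value of `pop_sd`, and the function name are hypothetical.

```python
import math
from statistics import NormalDist, mean


def z_confidence_interval(sample, pop_sd, level=0.95):
    """Confidence interval for the mean, assuming a known population SD."""
    n = len(sample)
    centre = mean(sample)
    # Critical value for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * pop_sd / math.sqrt(n)
    return centre - margin, centre + margin


# Hypothetical sample of nine measurements.
sample = [48, 52, 50, 49, 51, 53, 47, 50, 52]
low, high = z_confidence_interval(sample, pop_sd=2.0)
print("95% confidence interval:", round(low, 2), "to", round(high, 2))
```

When the population standard deviation is not known, the sample standard deviation and a critical value from the t distribution would be used instead.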
88. Regression analysis is used to quantify how one variable
will change with respect to another variable. There are
many types of regression available, such as simple
linear, multiple linear, nominal, logistic, and ordinal
regression.
The most commonly used regression in inferential
statistics is linear regression. Linear regression checks
the effect of a unit change of the independent variable on
the dependent variable.
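Simple linear regression can be sketched in Python with the ordinary least squares formulas. The function name `fit_line` and the data (hours studied and scores) are hypothetical; the slope is the estimated change in the dependent variable for a unit change in the independent variable.

```python
def fit_line(xs, ys):
    """Least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (independent) and score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [3, 5, 7, 9, 11]

slope, intercept = fit_line(hours, scores)
print("slope =", slope)          # 2.0
print("intercept =", intercept)  # 1.0
```

Here a slope of 2 means that each additional hour is associated with an increase of 2 in the score.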