Chapter 11: Data Analysis: Classification and Tabulation
Meaning of Data Analysis
• Data analysis has multiple facets and approaches. It encompasses diverse techniques under a variety of
names in different business, science and social science domains.
• In statistical applications, some researchers divide data analysis into descriptive statistics (DS), exploratory
data analysis (EDA) and confirmatory data analysis (CDA).
• Other approaches include predictive analytics, which focuses on applying statistical or structural models for forecasting or classification, and text analytics, which applies statistical, linguistic and structural techniques to extract and classify information from textual sources, a species of unstructured data.
• Whatever the researcher's analytical purpose, data analysis is the process of scanning, examining and interpreting data available in tabulated form. It is the procedure of evaluating data through analytical and logical reasoning in order to examine each component of the data provided.
Why Analyse Data?
• The underlying purpose of data analysis is to understand the nature of the data and reach a conclusion. In fact, data analysis provides answers to the research questions or research problems that you have formulated. Without analysing the data, you cannot draw any conclusions or inferences.
• The prime objective of analysing data is to obtain usable and useful information. Whether the data are qualitative or quantitative, analysis may help you to:
– describe and summarise the data
– identify relationships between variables
– compare variables
– identify the difference between variables
– forecast outcomes
Types of Data Analysis
Generally speaking, the two most widely used categories of data analysis are:
• Qualitative analysis: Qualitative analysis handles data that are categorical (non-numerical) in nature. According to Seidel (1998), qualitative analysis rests on three basic principles: (a) noticing things, (b) collecting things and (c) thinking about things.
• Quantitative analysis: Quantitative analysis is the process by which numerical data are analysed, and it often involves descriptive statistics (DS). The statistical methods widely used in quantitative data analysis include statistical models, analysis of variables, data dispersion, analysis of relationships between variables, contingency and correlation, regression analysis, statistical significance, precision and error limits.
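The descriptive side of quantitative analysis can be sketched in a few lines of Python. The snippet below summarises one variable with its mean, median, sample standard deviation and range (a simple measure of dispersion). It then measures the relationship between two variables with Pearson's correlation coefficient. The function names `describe` and `pearson_r` and the hours/marks figures are hypothetical and serve only as an illustration.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics for one numeric variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "range": max(values) - min(values),  # a simple measure of dispersion
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric variables."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two variables of equal length with at least two observations")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied and marks obtained by five students.
hours = [2, 4, 6, 8, 10]
marks = [52, 58, 71, 79, 90]
print(describe(marks))
print(round(pearson_r(hours, marks), 3))
```

A coefficient close to +1, as with these figures, indicates a strong positive linear relationship between the two variables.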
Benefits of Data Analysis
The following are the benefits of data analysis:
– yields meaningful insights from the data set
– highlights the critical decisions that follow from the findings
– allows the data to be viewed visually, leading to faster and better decisions
– offers better awareness of the habits of potential customers
– structures the findings from survey research or other means of data collection
– breaks a macro picture down into micro components
– reduces human bias through proper statistical treatment
Nature of Statistical Data: Variables and Attributes
• Statistics provides methods for the following:
– Design: planning and carrying out research studies
– Description: summarising and exploring data
– Inference: making predictions and generalising about phenomena represented by the data
• In social research, both variables and attributes represent social concepts.
– Variables: A variable is a data item whose value may vary between data units in a population and may change over time. When analysing your data, keep in mind that variables are not always ‘quantitative’ or numerical, and that variables are not the only things we measure in the traditional sense.
– Attributes: An ‘attribute’ is a characteristic or quality, that is, one of the values a variable can take (for example, ‘male’ and ‘female’ are attributes of the variable sex). A variable uses numerical values to measure an attribute: it expresses a quality in numbers to allow more precise measurement.
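The next example shows how a variable can use numbers to record attributes. The sketch codes survey responses under a coding scheme. The scheme names `SEX` and `SATISFACTION`, their codes and the helper `encode` are hypothetical. Note that the codes for a nominal variable such as sex are only labels. The codes for an ordinal variable, such as a satisfaction scale, also preserve rank order.

```python
# Hypothetical coding schemes: each maps the attributes of a variable to numeric codes.
SEX = {"male": 1, "female": 2}  # nominal: codes are labels only, their order is meaningless
SATISFACTION = {  # ordinal: codes preserve the order of the attributes
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}


def encode(responses, scheme):
    """Replace each attribute label with its numeric code, rejecting unknown labels."""
    unknown = sorted({r for r in responses if r not in scheme})
    if unknown:
        raise ValueError(f"attributes not in coding scheme: {unknown}")
    return [scheme[r] for r in responses]


print(encode(["female", "male", "female"], SEX))
print(encode(["neutral", "very satisfied"], SATISFACTION))
```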
Parametric and Non-parametric Data
• Non-parametric statistical procedures are generally less powerful than parametric ones because they use less information in their calculation (for example, the ranks of observations rather than their actual values).
• The basic distinction between parametric and non-parametric statistics is this: as a general rule, if the measurement scale is nominal or ordinal, we use non-parametric statistics; if it is interval or ratio, we use parametric statistics.
• A further consideration is the distribution of your data, which you should observe carefully. If your data appear suitable for parametric statistics, check that the distributions are approximately normal. If a distribution deviates markedly from normality, you run the risk that the statistic will be inaccurate.
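The rule of thumb above can be expressed as a small decision helper. The sketch first checks the measurement scale. For interval or ratio data it then applies a crude normality screen based on the moment coefficient of skewness. The names `skewness` and `suggest_approach` are hypothetical. The cut-off `skew_limit=1.0` is an assumed threshold and is not part of the source. In practice a histogram, a Q-Q plot or a formal normality test such as Shapiro-Wilk would usually accompany or replace such a screen.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def suggest_approach(scale, values, skew_limit=1.0):
    """Rule-of-thumb choice between parametric and non-parametric statistics.

    skew_limit is an assumed cut-off: |skewness| above it is treated as a
    marked departure from normality.
    """
    if scale in ("nominal", "ordinal"):
        return "non-parametric"
    if scale not in ("interval", "ratio"):
        raise ValueError(f"unknown measurement scale: {scale!r}")
    if abs(skewness(values)) > skew_limit:
        return "non-parametric"  # markedly non-normal: a parametric statistic may be inaccurate
    return "parametric"


print(suggest_approach("ordinal", [1, 2, 3]))
print(suggest_approach("interval", [1, 2, 3, 4, 5]))
print(suggest_approach("ratio", [1, 1, 1, 1, 2, 2, 3, 10, 50]))
```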
Classification of Data
• Classification is the process of arranging data in homogeneous groups or classes on the basis of
resemblances and common characteristics. Classification is the grouping of related facts into classes. It is
the first step in tabulation.
• Objectives of classification: The principal objectives of classifying data are to condense the mass of data, to
facilitate comparison and to allow a statistical treatment of the material collected, among others.
• Methods of classification of data: Data can be classified in either of two ways:
– Classification on the basis of attributes: Researchers classify data on the basis of qualitative attributes such as sex, religion and occupation.
– Classification on the basis of class intervals: In a frequency distribution, raw data are arranged in distinct groups termed ‘classes’. The main methods of such classification are geographical, chronological and variable classification.
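Classification on the basis of attributes is essentially a grouping operation. The sketch below groups records into homogeneous classes that share the same value of a chosen attribute. The function name `classify_by_attribute` and the respondent records are hypothetical.

```python
from collections import defaultdict


def classify_by_attribute(records, attribute):
    """Group records into homogeneous classes sharing the same value of `attribute`."""
    classes = defaultdict(list)
    for record in records:
        classes[record[attribute]].append(record)
    return dict(classes)


# Hypothetical respondents.
respondents = [
    {"id": 1, "occupation": "teacher"},
    {"id": 2, "occupation": "farmer"},
    {"id": 3, "occupation": "teacher"},
    {"id": 4, "occupation": "trader"},
]
by_occupation = classify_by_attribute(respondents, "occupation")
for occupation, members in sorted(by_occupation.items()):
    print(occupation, len(members))
```

Because every record lands in exactly one class, the class sizes always add up to the number of records.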
Classification of Data (Contd.)
• Constructing a continuous series: In a continuous series, measurements are only approximations, so they are expressed in class intervals, that is, within certain limits. In a continuous frequency distribution, the class intervals theoretically continue without a break from the beginning of the distribution to the end.
• Determining class intervals: Statisticians use the exclusive and inclusive methods to determine the class intervals in a continuous series. In the exclusive method, researchers include the lower limit and exclude the upper limit when counting observations, so a value of 20 falls in the class 20–30, not in 10–20. In the inclusive method, both limits are included when counting observations (for example, 10–19, 20–29).
• Rules of classification of data: A classification should be (a) exhaustive, (b) mutually exclusive, (c) homogeneous, (d) flexible and (e) appropriate to the purpose of the inquiry.
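The exclusive and inclusive methods differ only in how a value at a class limit is counted. The sketch below builds a frequency distribution both ways, using classes of equal width. The function names and the marks are hypothetical. The inclusive version assumes whole-number data, because a gap such as 19–20 would otherwise leave values like 19.5 unclassified. Both functions raise an error if any value falls outside every class, which enforces the "exhaustive" rule.

```python
def exclusive_distribution(values, start, width, n_classes):
    """Exclusive method: each class includes its lower limit and excludes its upper limit."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class [lower, upper)
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside every class")  # classes must be exhaustive
        counts[k] += 1
    return [(f"{start + i * width}-{start + (i + 1) * width}", c) for i, c in enumerate(counts)]


def inclusive_distribution(values, start, width, n_classes):
    """Inclusive method (whole-number data): both limits belong to the class."""
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1  # e.g. 10-19, 20-29 for a width of 10
        count = sum(1 for v in values if lower <= v <= upper)
        rows.append((f"{lower}-{upper}", count))
    if sum(c for _, c in rows) != len(values):
        raise ValueError("some values fall outside every class")  # classes must be exhaustive
    return rows


# Hypothetical marks of eight students.
marks = [10, 15, 19, 20, 25, 29, 30, 35]
print(exclusive_distribution(marks, 10, 10, 3))
print(inclusive_distribution(marks, 10, 10, 3))
```

In both versions the value 20 is counted in the second class. Under the exclusive method it is the lower limit of 20–30. Under the inclusive method it is the lower limit of 20–29.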
Tabulation
• Tabulation means summarising data by arranging them systematically in rows and columns. It presents the data in a concise and attractive form that is easy to comprehend and makes numerical figures easy to compare. Data are tabulated to support investigation and comparison, to identify errors and omissions, to study prevailing trends and to simplify raw data.
• The main objectives of tabulation are to simplify complex data, facilitate comparison, economise on space, depict trends and patterns in the data, aid reference, facilitate statistical analysis and detect errors.
• Components of a table: In general, a statistical table consists of a table number, title, caption (column headings), stub (row headings), body, headnote (a brief note below the title, often stating the units), footnote and source note.
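To show where each component sits, the sketch below renders a plain-text statistical table with a table number, title, headnote, stub, captions, body, footnote and source note. The function `render_table` and all figures in the example are hypothetical.

```python
def render_table(number, title, headnote, stub_heading, captions, rows, footnote, source):
    """Render a plain-text statistical table.

    rows is a list of (stub_label, cells) pairs, where cells has one value per caption.
    """
    stub_width = max([len(stub_heading)] + [len(label) for label, _ in rows])
    col_width = max([len(c) for c in captions] + [len(str(v)) for _, cells in rows for v in cells])
    header = stub_heading.ljust(stub_width) + "".join(" " + c.rjust(col_width) for c in captions)
    lines = [
        f"Table {number}: {title}",  # table number and title
        f"({headnote})",  # headnote, e.g. the unit of measurement
        header,  # stub heading followed by the captions (column headings)
        "-" * len(header),
    ]
    for label, cells in rows:  # body: one row per stub entry
        lines.append(label.ljust(stub_width) + "".join(" " + str(v).rjust(col_width) for v in cells))
    lines.append(f"Note: {footnote}")  # footnote
    lines.append(f"Source: {source}")  # source note
    return "\n".join(lines)


table = render_table(
    1,
    "Students by stream and sex (hypothetical)",
    "Figures are numbers of students",
    "Stream",
    ["Male", "Female", "Total"],
    [("Arts", [20, 30, 50]), ("Science", [25, 15, 40])],
    "Illustrative figures only",
    "Hypothetical class register",
)
print(table)
```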
Tabulation (Contd.)
Tables can be categorised as follows:
• Simple or one-way table: This is the simplest type of table; it shows data on only one characteristic.
• Two-way table: When data are tabulated according to two characteristics at a time, the process is called double or two-way tabulation. A two-way table therefore contains information on two variables.
• Multivariate table: This type of table contains information concerning more than two variables.
• Frequency distribution table: A frequency table lists items and uses tally marks to record and show the number of times each item occurs. Frequency tables are the standard tabular method of presenting the distribution of a single variable.
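A two-way table is a cross-tabulation of counts for two characteristics, with row and column totals. The sketch below builds one from individual records. The function name `two_way_table`, the key names and the records are hypothetical. It assumes that no category is itself labelled "Total", because that label is reserved for the margins.

```python
from collections import Counter


def two_way_table(records, row_key, col_key):
    """Cross-tabulate counts of two characteristics, adding row and column totals."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    row_labels = sorted({r[row_key] for r in records})
    col_labels = sorted({r[col_key] for r in records})
    table = {}
    for row in row_labels:
        cells = {col: counts[(row, col)] for col in col_labels}
        cells["Total"] = sum(cells.values())  # row total
        table[row] = cells
    totals = {col: sum(table[row][col] for row in row_labels) for col in col_labels}
    totals["Total"] = sum(totals.values())  # grand total
    table["Total"] = totals
    return table


# Hypothetical students classified by sex and stream.
students = [
    {"sex": "female", "stream": "arts"},
    {"sex": "male", "stream": "science"},
    {"sex": "female", "stream": "science"},
    {"sex": "female", "stream": "arts"},
]
for row, cells in two_way_table(students, "sex", "stream").items():
    print(row, cells)
```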
Tabulation (Contd.)
• Discrete or ungrouped frequency distribution: In this form of distribution, each frequency refers to a discrete value (for example, the number of children in a family). The data are presented so that the exact measurement of units is clearly indicated.
• Continuous frequency distribution: There are three methods of classifying data according to class intervals: the exclusive method, the inclusive method and open-ended classes (for example, ‘below 10’ or ‘60 and above’).
• Computation of rates and ratios: Ratios are frequently used for comparison. In educational research, the most commonly used rates are simple rates and growth rates.
• Percentages: The term percentage or symbol % is used frequently in everyday life. Percentages provide a
result in the form of parts per hundred that is usually more readily understandable and comparable than
raw values.
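The arithmetic behind percentages, ratios and growth rates is short enough to show directly. In this sketch a simple rate is treated as occurrences per hundred of the base, which is an assumption. The growth rate is the percentage change from an earlier value to a later one. The function names and all figures are hypothetical.

```python
from math import gcd


def percentage(part, whole):
    """Express `part` as parts per hundred of `whole` (also used here as a simple rate)."""
    return part / whole * 100


def simple_ratio(a, b):
    """Reduce the integer ratio a:b to its lowest terms."""
    d = gcd(a, b)
    return a // d, b // d


def growth_rate(previous, current):
    """Percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100


# Hypothetical figures.
print(percentage(45, 60))  # 45 of 60 students passed -> 75.0
print(simple_ratio(30, 20))  # 30 boys to 20 girls -> (3, 2)
print(growth_rate(200, 250))  # enrolment rose from 200 to 250 -> 25.0
```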
