Approaches to Data Analysis in Social
Research
Demola Akinyoade
Department of Peace and Conflict Studies
Afe Babalola University, Ado-Ekiti, Nigeria
demolaakinyoade@yahoo.com
Paper presented at the Research Methods Seminar of the College of Social and
Management Sciences, Afe Babalola University, Ado-Ekiti
Date: 19th and 20th March 2013
Abstract
Data analysis is a critical stage in social research. Considering its primary audience (project students at the undergraduate level), the paper covers the basic approaches to analyzing data from social research. Using simple terms as much as possible, it briefly traces the epistemological roots of qualitative and quantitative data to subjectivism and positivism respectively. The paper treats some crosscutting issues in the analysis of data from social research: the role of research questions in analyzing data, developing a data analysis algorithm, and the ethics of data analysis. Analyses of quantitative and qualitative data are treated separately. Under quantitative data analysis, it provides the basic information needed to understand the logic behind the main statistical tools and to appreciate how and when to use them in actual research situations. It covers certain foundational concepts germane to the field of numerical analysis, including scales of data, parametric and non-parametric data, descriptive and inferential statistics, kinds of variables, hypotheses, one-tailed and two-tailed tests, and statistical significance. Under qualitative data analysis, the paper provides a six-stage general procedure for analyzing qualitative data: organizing the data; finding and organizing ideas and concepts; building overarching themes in the data; ensuring reliability and validity in the data analysis and in the findings; finding possible and plausible explanations for findings; and the final steps. The paper also provides brief information on the use of computer technology, in the form of online services and computer software, for data analysis.
Keywords: algorithm, data analysis, ethics, quantitative data, qualitative data, statistics.
Introduction
Data analysis is a critical stage in scientific knowledge building. Prior to data analysis, a scientist will have accomplished several necessary steps and processes in the scientific approach: determining the research area, developing research objectives and questions within that area, selecting appropriate methods (of research, data collection and sampling), developing instruments, and collecting data. Scientists expend much effort and many resources on these activities. However, these efforts will be a sheer waste of resources if the data are not properly handled and manipulated to obtain the required information (in line with the quest of the study). This is why data analysis is critical. The difference between doing it well and doing it anyhow could mean the success or failure of a study, and will, at a minimum, determine its quality.
From the point the research questions are determined, they become the driver of the whole research process, in the sense that they select the research method(s) or strategy(ies), which could come from either or both of the quantitative and qualitative paradigms.
For scientific data to be worth the name, there must be consonance between research methods, instruments, and sampling (including sample size and procedure). This ensures the integrity of the scientific data and of the whole research process. Obviously, from the foregoing, arriving at scientific data involves following certain important rules of the scientific method. Considering the
amount of effort and resources (time, intellectual, physical, financial and emotional) that is invested in
producing scientific data, it is only sensible for a scientist to handle her data appropriately at the point
of data analysis. The subsequent sections describe the analyses of qualitative and quantitative
data. However, before that, it is important to consider certain issues that apply to both
qualitative and quantitative data analyses. These include the sources, roots and nature of data
in social research, the nature of data analysis, the role of research questions/hypothesis in data
analysis, data analysis algorithm, and the ethics of data analysis in social research.
Sources, Roots and Nature of Data in Social Research
Depending on the nature of the data, there are two basic sources of data in social research: qualitative and quantitative. Qualitative data are data not in the form of numbers (usually words, pictures, etc.), while quantitative data are data in the form of numbers. Each has its own paradigm (that is, a whole world of ideas: assumptions, theories, concepts, and traditions) and is rooted in its own epistemology. Just for the sake of knowing (at this stage), the qualitative approach is rooted in subjectivism or humanism, and the quantitative in positivism (an approach borrowed from the natural sciences). So we will say that subjectivism and positivism are the epistemologies of qualitative and quantitative research respectively. Epistemology is the philosophy of knowledge; that is, it refers to reasoning about ways of knowing things or realities. It is important to be aware of the sources of data and the whole traditions behind them, because these have significant implications for data analysis. For instance, while
qualitative data (in the form of words) seek to understand meaning, quantitative data (in the form of numbers) seek to establish causality. Moreover, the analysis of quantitative data proceeds by using statistics, tables or charts and discussing how what they show relates to the hypotheses or research questions, while the analysis of qualitative data proceeds by extracting themes or generalizations from evidence and organizing the data to present a coherent, consistent picture.
Data Analysis
To analyze is to examine or study something closely to understand it better or discover something
about it. It means to break it down into its components to study its structure and find out what it is
made up of by identifying its constituent parts and how they are put together. Data analysis involves
all these. It involves the examination of data in detail in order to first understand it better and be able
to draw conclusions from it. It involves breaking data into its constituents, combining and
recombining the constituents to discover patterns, assess, describe or explain a phenomenon after
careful consideration or investigation in line with the research questions or hypotheses. A data analyst
is therefore an expert with specialist knowledge or skill to manipulate data in consonance with
rigorous rules in order to assess, describe or explain a phenomenon. This idea of data analysis is
applicable to both qualitative and quantitative data.
Data analysis therefore requires a number of skills and attitudes, including critical thinking, problem-solving and patience. With adequate commitment, most people will be able to develop or improve these skills over time. Data analysis also involves an investment of time, so you may want to give adequate time to your analysis; it should not be hurried. Both quantitative and qualitative data can be analyzed with the aid of a computer (this is the idea behind Computer-Aided Quantitative/Qualitative Data Analysis, CAQDAS), which saves some time. Nevertheless, adequate time is still required when using computer applications to analyze data, whether qualitative or quantitative.
Research Questions or Hypotheses
As hinted earlier, research questions drive research in the sense that they select the methods, point to the data and prescribe the methods of data collection and analysis. But first, what is the difference between a research question and a hypothesis? A hypothesis is a predicted answer to a research question. While a research question states what we are trying to find out, a hypothesis predicts, a priori (up front), the answer to that question. But why would we predict answers to what we are trying to find out? There are two possible reasons to predict one answer and not another. The first is that another researcher has done similar research and this is what was found; this, however, does not explain the prediction, it only answers the question. The second is that there are propositions (a theory) put forward which explain why the
predicted answer can be expected. So in executing the research and testing the hypothesis, we are
actually testing the theory behind the hypothesis.
When it comes to their implications for design, data collection and data analysis, there is no logical difference between hypotheses and research questions. What then is the role of a hypothesis in empirical research, and when is it important to have one? It is important to have a hypothesis when we have an explanation (a theory) in mind behind it, since hypotheses are used to test theories. Determining whether a study requires hypotheses is simple. For each research question, ask what answer you expect. If you cannot predict with any confidence, then forget about hypotheses and go ahead with research questions. If you can predict, then ask why you predict this and not something else. If the only answer is that another researcher found it to be true, then do not propose the hypothesis. However, if you have some explanation for the prediction in mind, then propose the hypothesis, and expose and analyze the theory behind it.
Research questions or hypotheses guide the analysis of data, and this is how. First, the data collection instruments (questionnaires, interview guides, observation schedules, etc.) must have been developed as guided by the research questions. In other words, items on the data collection instrument must have been developed to collect facts or data to answer specific research questions. With this understanding, data analysis starts with identifying which items on the instrument provide answers to which research question. Usually two or more items on the instrument will provide data to answer a research question, depending on the complexity of the question. The data analyst therefore identifies the sections of the instrument providing appropriate data for each question. She then proceeds to "harvesting": counting the number of respondents that responded this way or that, or who said what.
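To make the "harvesting" idea concrete, here is a minimal Python sketch; the research questions, item numbers and responses are all invented for illustration:

```python
# Hypothetical mapping from research questions to the instrument items
# that answer them (all names and responses below are invented)
item_map = {
    "RQ1": ["Q1", "Q5"],
    "RQ2": ["Q2", "Q3", "Q4"],
}

# Invented responses: one dict per respondent
responses = [
    {"Q1": "yes", "Q2": 4, "Q3": "daily",  "Q4": 2, "Q5": "no"},
    {"Q1": "yes", "Q2": 5, "Q3": "weekly", "Q4": 3, "Q5": "yes"},
    {"Q1": "no",  "Q2": 3, "Q3": "daily",  "Q4": 1, "Q5": "yes"},
]

# "Harvest" for RQ1: count the answers to each of its items
for item in item_map["RQ1"]:
    yes_count = sum(1 for r in responses if r[item] == "yes")
    print(f"{item}: {yes_count} of {len(responses)} answered yes")
```

The mapping from questions to items is exactly the identification step described above; the counting loop is the harvesting.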
Data Analysis Algorithm
Algorithm is a word borrowed from mathematics and computer science. It refers to a written, logical, step-by-step procedure or sequence of steps for solving a problem. Since data are gathered to solve particular problems (in the form of research questions or hypotheses), it is sensible to write out a plan for carrying out these operations or activities. In this wise, a data analysis algorithm serves as a master plan or blueprint for data analysis. There are three important reasons to have a written algorithm for data analysis. The first is that "an unwritten plan is only a wish", as Trim says. So don't just have a vague idea in your head; put it down in writing. Writing it down helps you to clarify your thoughts about analyzing your data. It forces you to think about it. It gives you a sharp mental picture of how you are going to analyze those data. Second, data analysis can sometimes be overwhelming, such that one can forget or miss some important steps or analyses if they are not written down. A third reason is that a researcher may sometimes want to involve another person, or the services of a professional data analyst, at the data analysis stage. The algorithm will then be a guide or template to ensure that the analysis is done as it should be done. Furthermore, a form of algorithm is sometimes
required if you are writing a proposal for a research grant to fund your research. It shows that you have thought through your study, increasing your chances of securing the grant. So whether the data analysis is qualitative or quantitative, consider having a written algorithm for it.
Ethics of Data Analysis
Research ethics concerns issues of right and wrong in conducting research, and these moral issues extend to data analysis. The core point here is that a researcher has a moral obligation to do her best in analyzing her data, following appropriate techniques and procedures. This is to ensure the integrity of scientific knowledge production. It may be important to put things in the right perspective here. Every scientific study is supposedly a contribution to knowledge production. Every researcher is to make her modest effort in contributing to consolidating what is known and/or making the unknown known. This is how scientific knowledge grows. No one is too inexperienced to make a modest contribution, and nobody is too experienced to do it at her own whims and caprices. Everyone must follow the rules: the rules of the scientific method. Data analysis is a critical part of the scientific method. Therefore, in analyzing data, a researcher, irrespective of her level in research, has an obligation to herself (if she is worth the name), to the scientific community, and to the larger public to follow rigorous procedures in analyzing her data. It is better not to analyze data at all than to analyze them fraudulently (through laziness, ineptitude, outright fraud, or a combination of these); that is tantamount to producing fake drugs for public consumption. So rather than defraud yourself, the scientific community and the general public in your data analysis, it is better to acquire the requisite skills, or to employ the services of others who can help you analyze your data. Whatever decision you make, you are still responsible for your data analysis, be it quantitative or qualitative.
Statistics: The Analysis of Quantitative Data
Quantitative data analysis is a powerful tool; nevertheless, it is only as good as the original data, the data collection instrument, the operational definitions, and the research question. These must therefore be given due attention prior to the data analysis stage. After collecting quantitative data, there is a need to make sense of the responses collected. We do this by organizing, summarizing and doing exploratory analysis. We then communicate meanings to others using tables, graphical displays and summary statistics. Quantitative analysis helps us to see similarities, differences and relationships between the phenomena investigated, that is, the things we have collected data on. The analysis of quantitative data is generally called statistics. Unfortunately, the fear of numbers, calculations, arithmetic and mathematics makes many researchers shiver at the thought of STATISTICS. The good news, however, is that as a researcher you are not required to have full knowledge of how it works in order to use it effectively. What is required, as Punch (1998, p. 112) puts it, is to understand the logic behind the main statistical tools, and to appreciate how and when to use them in actual research situations.
Quantitative analysis is generally deductive in approach and independent of the researcher's value judgment. Since quantitative data are in the form of numbers, analyzing them requires statistical tools, and this in turn requires familiarity with certain foundational concepts germane to the field of numerical analysis. These include scales of data, parametric and non-parametric data, descriptive and inferential statistics, kinds of variables, hypotheses, one-tailed and two-tailed tests, and statistical significance.
Scales of Data
Scales or levels of data refer to the nature or kinds of numbers that researchers deal with. The nature of the data we are dealing with determines the kind of analysis that can be performed on them. There are four kinds of data: nominal, ordinal, interval and ratio, each subsuming (incorporating the characteristics of) its predecessor. It is erroneous to apply the statistics of a higher-order scale to data at a lower scale. The nominal scale is a naming quality. It involves determining the presence or absence of a characteristic and categorizing into mutually exclusive groups. Nominal measurement is also referred to as categorical measurement, since naming involves categorization: for instance, assigning females to category 1 and males to category 2. There cannot be a category 1.5 or 1.9. Nominal data denote discrete variables. The implication for data analysis is that the mode (the most frequent score in a set of scores) is the only appropriate measure of central tendency for nominal data. Chi-square analyses are the most appropriate statistics for nominal data (Dane, 1990). They are used to determine whether the frequencies of scores in the categories defined by the variable match the frequencies one would expect based on chance or on theoretical predictions.
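As a small illustration of the logic behind chi-square (the category counts are invented, and this sketch computes only the goodness-of-fit statistic, not its p-value):

```python
# Do observed counts in two nominal categories (e.g. 1 = female, 2 = male)
# depart from a 50/50 chance expectation? Frequencies are invented.
observed = [34, 26]
total = sum(observed)
expected = [total / len(observed)] * len(observed)  # chance expectation: 30, 30

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))  # 1.07 -- small, so little departure from chance
```

In practice the statistic is compared against a chi-square distribution (or computed by a statistics package) to decide whether the departure is significant.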
The ordinal scale classifies and introduces an order into the data, while keeping the features of the nominal scale. It involves ranking, or otherwise determining an order of intensity for a quality. Ordinal measurement identifies the relative intensity of a characteristic, but does not reflect any level of absolute intensity. For instance, on a five-point rating scale (1 = strongly disagree; 2 = disagree; 3 = undecided; 4 = agree; and 5 = strongly agree), a score of 5 cannot be said to represent five times the agreement of a score of 1; the scale only helps to place responses in order. Ordinal measurement is like the place an athlete finishes in a sprint: the winner may be ahead of the second-place finisher by a head or by several seconds. The positions (1st, 2nd, 3rd, ...) do not tell us how much faster one athlete is than another. Therefore, calculating a mean of ordinal scores is as meaningless as it is for nominal measurement. Generally, any statistical technique that involves comparisons on the basis of the median is appropriate for ordinal data. Ordinal scales are typified by the rating or Likert scales frequently used in asking for opinions or rating attitudes.
The interval scale has a metric, that is, a regular and equal interval between each data point, as well as keeping the features of the ordinal scale. It involves a continuum composed of equally spaced intervals. Examples of interval measurements are the Fahrenheit and Celsius temperature scales; this
is because a change of one degree anywhere along the continuum reflects an equal amount of change
in heat. To all intents and purposes, the statistics for the interval scale are the same as for the ratio scale. The ratio scale has all the features of the interval scale and adds a powerful feature: a true zero. "This enables the researcher to determine proportions easily: 'twice as many as', 'half as many as', 'three times the amount of'..." (Cohen, Manion, & Morrison, 2011, p. 605). The presence of a true zero makes all arithmetic operations (addition, subtraction, multiplication and division) possible. This, and the opportunity to use ratios, makes it the most powerful level of data.
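A short Python sketch may make the interval/ratio difference concrete (the temperature values are arbitrary):

```python
def c_to_f(c):
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32

# On an interval scale the zero point is arbitrary, so ratios are not
# meaningful: 20 degrees C looks like "twice" 10 degrees C numerically...
print(20 / 10)                    # 2.0

# ...but the same two temperatures expressed in Fahrenheit give a
# different ratio, because neither zero marks "no heat at all":
print(c_to_f(20) / c_to_f(10))    # 68 / 50 = 1.36
```

A true ratio scale (age, weight, income) has a real zero, so "twice as much" survives any change of unit; that is exactly the property the interval scales above lack.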
Table 1: Summary of Measurement Levels
S/N | Level | What's Measured | Example | Central Tendency
1 | Nominal | Distinction | Guilty/Not Guilty | Mode
2 | Ordinal | Relative position | Socioeconomic status | Median
3 | Interval | Arbitrary amounts | Intelligence Quotient | Mean
4 | Ratio | Actual amounts | Age | Mean
Adapted from Dane (1990, p. 253)
Parametric and Non-Parametric Data
Parametric data assume knowledge of the characteristics of the population; this assumption makes it safe to draw inferences. They often assume a normal (Gaussian) distribution (Cohen, Manion, & Morrison, 2011). Interval and ratio data are considered to be parametric. Non-parametric data, on the other hand, carry no assumptions about the population, usually because the characteristics of the population are not known. Nominal and ordinal data are considered to be non-parametric. The implication for data analysis is that, whilst non-parametric statistics can be applied to parametric data, parametric statistics cannot be applied to non-parametric data. Non-parametric data tend to be derived from questionnaires, whilst parametric data tend to be derived from experiments and tests.
Descriptive and Inferential Statistics
Descriptive statistics describe and present data. They make no attempt to infer or predict population parameters; they are only concerned with the enumeration and organization of data, that is, they simply report what has been found in a variety of ways (Cohen, Manion, & Morrison, 2011). Descriptive statistics are therefore for summarizing quantitative data. They include frequencies, percentages and cross-tabulations; measures of central tendency (means, medians, modes) and of dispersal (standard deviation); taking stock; correlations and measures of association; partial correlations; and reliability (Cohen, Manion, & Morrison, 2011; Punch, 1998). Simple frequency distributions and percentages are useful for summarizing and understanding data. Scores in the distribution are usually grouped in ranges and tabulated according to how many respondents responded this way or that way, or fell into this or that category. Absolute numbers and/or percentages are used in such tables. Frequencies and
percentages tell at a glance what the data look like. They also help the researcher to stay close to the
data at the initial stages of analysis. (Punch, 1998)
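The counting behind a simple frequency table can be sketched in a few lines of Python (the rating-scale responses below are invented):

```python
from collections import Counter

# Invented questionnaire responses on a five-point rating scale
responses = [4, 5, 3, 4, 2, 4, 5, 1, 3, 4]

freq = Counter(responses)        # absolute frequency of each score
n = len(responses)
for score in sorted(freq):
    pct = 100 * freq[score] / n
    print(f"score {score}: {freq[score]} respondents ({pct:.0f}%)")
# the modal line reads: score 4: 4 respondents (40%)
```

Absolute numbers and percentages side by side, as here, are exactly what a frequency table reports.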
Cross-tabulation is another presentational device, in which one variable is presented in relation to another, with the relevant data placed into corresponding cells. It helps us to compare across groups and draws attention to certain factors. It also helps us to show trends or tendencies in the data, and it is useful for presenting rating-scale data from agreement to disagreement. "The central tendency of a set of scores is the way in which they tend to cluster round the middle of a set of scores, or where the majority of the scores are located" (Cohen, Manion, & Morrison, 2011, p. 627). The three common measures of central tendency are the mean, the median and the mode. The arithmetic mean is the standard average, that is, the arithmetic average of a set of values or a distribution, and it is the most commonly used measure of central tendency. It is calculated by adding up all the data and dividing the total by the number of values. There are two important things to know about the mean. First, it is very useful where scores within a distribution do not vary too much, but not so effective where there is great variance. Second, the mean is very useful in estimating variance. The median is the middle score in a ranked (ordered) distribution, that is, the midpoint score of a range of data; it is useful for ordinal data. The mode is the score with the highest frequency, and there could be more than one. It is usable for all scales of data, but it is particularly useful for nominal and ordinal data, that is, discrete and categorical rather than continuous data.
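A brief sketch using Python's standard statistics module and invented scores shows the three measures, and why the mean suffers where there is great variance:

```python
import statistics

# Invented interval-level scores; note the extreme score of 90
scores = [12, 15, 15, 18, 20, 22, 90]

mean = statistics.mean(scores)      # pulled upward by the outlier
median = statistics.median(scores)  # middle score of the ranked data
mode = statistics.mode(scores)      # most frequent score
print(round(mean, 1), median, mode)  # 27.4 18 15
```

The outlier drags the mean above every other score in the set, while the median and mode stay with the cluster, which is why the median is often preferred for skewed or ordinal data.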
While descriptive statistics may be useful, inferential statistics are usually more valuable for researchers and are typically more powerful. Inferential statistics seek to make inferences and predictions based on the data collected. "They infer or predict population parameters or outcomes from simple measures, e.g. from sampling and from statistical techniques, and use information from a sample to reach conclusions about a population, based on probability" (Cohen, Manion, & Morrison, 2011, p. 606). Inferential statistics include hypothesis testing, correlations, regression and multiple regression, difference testing (t-tests, analysis of variance), factor analysis and structural equation modeling.
It is important to know exactly what you want to do right from the outset. This will help you
to choose the most appropriate data collection and analysis techniques for your study. If you intend to
describe what happens with your sample of participants, then descriptive statistics will be appropriate.
But if you want to be able to generalize your findings to a wider population you will find inferential
statistics more appropriate.
Hypotheses and Hypothesis Testing
Quantitative research is often concerned with finding evidence to support or contradict an idea or hypothesis. For example, one may propose that teaching undergraduates Peace and Conflict Resolution will improve their peace attitude, and then explain, with the aid of a theory, why that particular answer is expected. Testing a hypothesis generally involves two hypotheses: a null hypothesis and an alternative hypothesis. The null hypothesis states that there will be no change or effect, while the alternative hypothesis, which is usually the experimental hypothesis, claims there will be a change.
From the example above, the null hypothesis will be "teaching undergraduates Peace and Conflict Resolution will not improve their peace attitude". The alternative hypothesis will then be "teaching undergraduates Peace and Conflict Resolution will improve their peace attitude". The researcher then devises a research strategy to collect evidence supporting either the null or the alternative hypothesis. For this purpose a sample is taken from the population. In an experiment (a quantitative research method), two groups of students will be set up: one group will be taught Peace and Conflict Resolution and the other will not be taught the course. The peace attitude of each group will then be measured using appropriate methods, for example observation in a laboratory, or rating scales in interviews or questionnaires. We then determine which group does better on the peace attitude rating.
A quasi-experimental design takes advantage of pre-existing groups, rather than creating groups for the purpose of the research, and administers its tests on them. For instance, it could take samples from universities where the course is offered, compare them with samples from universities where the course is not offered, and determine whether there is a difference in the undergraduates' peace attitude. The researcher thereby determines which of the hypotheses (null or alternative) the collected data support. It is important to note that testing a hypothesis does not prove or disprove the hypothesis or the theory behind it; the evidence only supports or contradicts it. Hypotheses include concepts which need to be measured. In measuring concepts we need to do three basic things: first, translate the concepts into measurable factors; then take these measurable factors and treat them as variables; and finally identify measurement scales to quantify the variables. (These are mainly taken into consideration at the instrument development stage.)
To round off this section, it is important to address two further things about hypotheses: the four stages involved in hypothesis testing in quantitative research, and the distinction between two types of hypotheses. The first stage is to set the null hypothesis. The second stage is to set the level of significance, the alpha level. This is usually 0.05 for most social research, written as "Let α = 0.05". It indicates the level of risk the researcher is willing to take of wrongly rejecting the null hypothesis: a 5 per cent chance of rejecting it when it is in fact true. The third stage involves computing the data using statistics appropriate for the research question. The final stage is to support or not support the null hypothesis in light of the data analysis. Causal and associative hypotheses are the two types of hypotheses which must be distinguished. A causal hypothesis suggests that an input A will affect an outcome B. An associative hypothesis, on the other hand, describes how variables may relate to each other, but not necessarily in a causal manner. A researcher must be careful not to confuse the two during analysis.
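The four stages can be sketched in Python with invented scores. Note that a z-test is used here purely to keep the sketch self-contained; a t-test would be the usual choice for samples this small:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Invented peace-attitude scores for a taught and an untaught group
taught   = [72, 75, 70, 78, 74, 76, 73, 77, 71, 75]
untaught = [68, 66, 70, 65, 69, 67, 71, 64, 68, 66]

# Stage 1: null hypothesis -- teaching makes no difference to peace attitude
# Stage 2: set the level of significance
alpha = 0.05

# Stage 3: compute an approximate two-sample z statistic
n1, n2 = len(taught), len(untaught)
se = sqrt(stdev(taught) ** 2 / n1 + stdev(untaught) ** 2 / n2)
z = (mean(taught) - mean(untaught)) / se

# Stage 4: two-tailed p-value, then support or do not support the null
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(p < alpha)   # True: the data do not support the null hypothesis
```

The decision at stage 4 is exactly the "support or not support" step described above: if p falls below alpha, the observed difference is unlikely under the null hypothesis.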
Hypotheses are one of the four main issues in quantitative data analysis; the others are causality, reliability and generalizability. These are treated in the next section.
Causality, Generalizability and Reliability
Causality refers to cause and effect. It stems directly from the three basic assumptions of the scientific method: order, determinism and discoverability. It shows how things come to be the way they are. Establishing cause and effect involves identifying variables, which include independent, dependent and control variables. The independent variable is the one that is deliberately manipulated by the researcher. The dependent variable is the one measured to determine the effect of the manipulated (independent) variable. The control variables are those held constant during the experiment. It is assumed that independent variables have a causal effect on dependent variables. In our example above, the independent variable is the teaching of Peace and Conflict Studies, which we manipulate by varying the teaching given to different students. The dependent variable is the undergraduates' peace attitude, which we measure by scoring students' responses on an attitude rating scale. Other variables that could affect students' responses, such as age, religious affiliation, social status, ethnicity and educational level, may be controllable, whilst others may not be. We therefore have a cause (teaching Peace and Conflict Studies) and an effect (peace attitude). The alternative or experimental hypothesis is that teaching Peace and Conflict Studies will increase students' peace attitude, and that if we do not teach it, students will be low on peace attitude. The null hypothesis is that there will be no change or effect.
Generalizability, or external validity, is the degree to which the findings of a study can be applied, extended or extrapolated beyond the sample to the larger population. It presupposes that the study's findings can be generalized beyond the specific research. There are two aspects to generalizability: population validity and ecological validity. The first refers to the extent to which the findings can be applied to other people, while the latter concerns the applicability of the findings to other settings. For instance, a study of postgraduate Masters students in Peace and Conflict Studies at the University of Ibadan that found one method of teaching peace education better than another may not be applicable to undergraduate students (population) at Afe Babalola University (ecological). Quantitative researchers usually work towards producing generalizable findings. Reliability has to do with the consistency and dependability of a measure; it means, roughly, that a reliable test should produce the same results on successive trials.
Variables: Categorical, Discrete and Continuous
A variable is an operational definition of a concept. According to Dane (1990), a variable is a
measurable entity that exhibits more than one level or value. Spata (2003) sees it as any condition,
situation, object, event, or characteristic that may change in quantity and/or quality. Examples include
intelligence, motivation, heat, thermometer reading, and rating of 1 to 10 on a scale of physical
attractiveness. A variable is the opposite of a constant. A categorical variable is the one that has
categories of values, e.g. variable sex which has two valuesâfemale and male. This is a dichotomous
12. 12
variable. A variable may have more than two categories. Categorical variables match categorical
(nominal) data. A discrete variable has a finite number of values of the same item, without intervals or
fractions of the value (Cohen, Manion, & Morrison, 2011), for example, the number of illnesses a
person has had. A continuous variable varies in quantity; examples include age, monthly earnings, and
so on. Continuous variables match interval and ratio scales. The kind of variable will determine the
kinds of statistics that can be applied to the data.
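To make the distinction concrete, the sketch below (a hypothetical example in Python with pandas; the variable names and data are invented for illustration) shows how each kind of variable might be represented and which summaries suit it:

```python
import pandas as pd

# Hypothetical survey of five respondents (invented data).
df = pd.DataFrame({
    "sex": ["female", "male", "female", "female", "male"],  # categorical (dichotomous)
    "illness_count": [0, 2, 1, 3, 1],                       # discrete: whole numbers only
    "monthly_earnings": [45000.5, 52300.0, 39800.25, 61000.0, 47500.75],  # continuous
})

# A categorical variable supports frequency counts, not means.
print(df["sex"].value_counts())

# Discrete and continuous variables support numerical summaries.
print(df["illness_count"].sum())
print(df["monthly_earnings"].mean())
```

Computing a mean of the `sex` column would be meaningless, which is the sense in which the kind of variable constrains the statistics that can be applied.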
Kinds of Analysis
Statistical analysis could be univariate, bivariate or multivariate. Univariate analysis examines the
differences amongst cases within one variable. Bivariate analysis measures the relationship between
two variables, whilst multivariate analysis looks for relationships among more than two variables.
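As a rough illustration (a sketch with invented scores, not data from any study), the three kinds of analysis can be contrasted in a few lines of Python:

```python
import numpy as np

# Invented scores for five respondents.
peace_attitude = np.array([62, 70, 65, 80, 75])
study_hours = np.array([5, 8, 6, 12, 10])
age = np.array([19, 22, 20, 24, 23])

# Univariate: summarize one variable on its own.
print(peace_attitude.mean(), peace_attitude.std())

# Bivariate: the relationship between two variables.
r = np.corrcoef(peace_attitude, study_hours)[0, 1]
print(r)

# Multivariate: more than two variables considered jointly,
# here as a correlation matrix of all three.
print(np.corrcoef([peace_attitude, study_hours, age]))
```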
One-tailed and two-tailed tests
Statistical analysis involves deciding whether to do a one-tailed or a two-tailed analysis. The
kind of result one wants to predict will influence which of these is applicable. A one-tailed test
predicts that one group will score more highly than the other, whilst a two-tailed test does not involve
such a prediction. A one-tailed test is considered stronger than a two-tailed test because it makes
assumptions about the population and the direction of the outcome (Cohen, Manion, & Morrison,
2011). A one-tailed test is used with a directional hypothesis such as "undergraduates taught Peace and
Conflict Resolution will possess a higher peace attitude than those who are not taught." On the other
hand, a two-tailed test is used with a non-directional hypothesis such as "there will be a
difference in the peace attitude of students taught Peace and Conflict Resolution and those not taught."
While the former predicts "more" or "less", the latter only indicates a difference and not where the
difference lies (Cohen, Manion, & Morrison, 2011).
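The distinction can be sketched with an independent-samples t-test in Python using scipy (the scores below are invented; scipy's `ttest_ind` takes an `alternative` argument that selects a two-tailed or one-tailed test):

```python
from scipy import stats

# Invented peace-attitude scores for two hypothetical groups.
taught = [72, 75, 78, 80, 74, 77]
not_taught = [68, 70, 65, 72, 69, 71]

# Two-tailed: non-directional hypothesis ("there will be a difference").
t2, p2 = stats.ttest_ind(taught, not_taught, alternative="two-sided")

# One-tailed: directional hypothesis ("taught will score higher").
t1, p1 = stats.ttest_ind(taught, not_taught, alternative="greater")

# The t statistic is identical; only the p-value changes. When the
# observed difference lies in the predicted direction, the one-tailed
# p-value is half the two-tailed one.
print(p2, p1)
```

This is why the one-tailed test is described as stronger: the same data yield a smaller p-value when a direction has been predicted in advance.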
Computer in the Analysis of Quantitative Data
This section briefly introduces the use of computers in quantitative research and the analysis of
quantitative data. The use of online survey services is becoming increasingly popular in quantitative
research. Online survey services are used to create, distribute and analyze quantitative questionnaires.
A researcher can create her survey using a survey editor and select from different types of questions
such as multiple choice, rating scales, drop-down menus, and so on. The available options allow for
flexibility and the researcher's preferences. In addition, if a researcher decides not to distribute her
questionnaires online, she can still use the service as an analysis tool by entering the data herself from
the questionnaires she got from the respondents. Two of the most popular online survey services are
SurveyMonkey and Bristol Online Surveys (BOS). Their URLs are
http://www.surveymonkey.com and http://www.survey.bris.ac.uk/ respectively. Once a researcher
masters one service, she is likely to be able to use others without much difficulty.
Computer packages have also been useful in the analysis of quantitative data. Although we can
analyze data manually, data analysis computer packages or software can ease the work significantly.
The Statistical Package for the Social Sciences (SPSS) is the most widely used package. It is designed
to help data analysts manipulate data and generate statistics.
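SPSS is proprietary; as a rough sketch of the kind of descriptive output such a package produces, here is a hypothetical example in Python with pandas (the responses are invented for illustration):

```python
import pandas as pd

# Invented questionnaire responses (scores on a 1-5 Likert-type item).
responses = pd.Series([4, 5, 3, 4, 2, 5, 4, 3, 4, 5])

# The kind of descriptive output a statistics package produces:
print(responses.describe())               # count, mean, std, quartiles
print(responses.value_counts().sort_index())  # frequency table
```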
Analysis of Qualitative Data
The diversity of qualitative research is probably most apparent in its approaches to the analysis of
qualitative data. Unlike quantitative analysis, with its standardized approaches, there is no one
correct way of analyzing and presenting qualitative data. The multiplicity of
approaches, itself a characteristic feature of the qualitative paradigm, is both the glory and the pain of
qualitative analysis. In spite of this variety, one can identify common features of qualitative data
analysis, and these are what this section will focus upon. It is not uncommon to find proposals on a number
of steps in qualitative analysis. Miles and Huberman (1994) identified six, while Tesch (1990)
identified 10 procedures common across different types of qualitative analysis (Punch, 1998). The six
procedural steps by O'Connor and Gibson (n.d.) are simple enough for beginning qualitative data
analysts.
Before delving into these, it may be useful to state an important fact about qualitative data
analysis which makes it essentially different from quantitative analysis: data collection
and data analysis are likely to be iterative. That is, there is likely to be a back-and-forth movement
between collecting data and analyzing it until the information collected is adequate for the purpose of the
study. In quantitative research, the data collection phase is completed before the data analysis phase
begins. This is not so in qualitative research, where data analysis can begin as soon as a small amount
of data has been collected.
Sources of qualitative data include interviews (transcribed or not); observation; field notes; documents
and reports; memos; emails and online conversations; diaries; audio, and video film materials; website
data; advertisements, and print material; pictures and photographs; and artifacts. (Cohen, Manion, &
Morrison, 2011). The six stages involved in analyzing data from such sources are:
1. Organizing the data;
2. finding and organizing ideas and concepts;
3. building overarching themes in the data;
4. ensuring reliability and validity in the data analysis and in the findings;
5. finding possible and plausible explanations for findings; and
6. the final steps (O'Connor & Gibson, n.d).
First Stage: Organizing the Data
Qualitative data analysis is significantly aided by displaying data in a way that permits viewing a full
data set in one location, arranged systematically to answer the research questions at hand
(Huberman & Miles, 1994, cited in O'Connor & Gibson). The sheer volume of data generated by
qualitative research makes it possible for a researcher to get lost in the data; hence the need to
organize the data using the research questions as a guide. Identify the original research questions first and
try to locate which parts of the data should provide the required answers. In an interview, some
questions or items on the interview guide would specifically solicit information to answer certain
research questions, so go to the responses that correspond with each
research question to locate the appropriate data. However, do not close your eyes to the possibility of
locating useful data in unexpected places. (For instance, interview respondents may sometimes provide
information relevant to a question posed at the beginning while answering the question posed last, or
even after the interview session!) Having answered your original questions, consider other emerging
issues in the data.
Once you have identified the appropriate data for each research question, you need to
organize and display the data in a manner that allows you to see the responses to each topic and
specific question individually. You may need to put this in charts, networks or tables. This is what
Miles and Huberman refer to as data display in their approach to qualitative data analysis,
transcendental realism. Organizing and displaying data in this way makes it easier to pick out recurrent
words, ideas, concepts, and themes for the next step of the analysis, which involves picking out ideas
and organizing them into categories.
Second Stage: Finding and organizing ideas and concepts
According to Marshall and Ross (1995, cited in O'Connor & Gibson), "Identifying salient themes,
recurring ideas or language, and patterns of belief that link people and settings together is the most
intellectually challenging phase of the analysis and one that can integrate the entire endeavor." While
examining responses to a particular question, the researcher may notice that some words and phrases
are frequently used as she reads through respondents' responses to that question. She then
makes a list and/or notes of these words and phrases. At this point there are four important tasks the
analyst needs to perform, according to O'Connor & Gibson: finding meaning in language,
watching out for the unexpected, hearing stories, and coding and categorizing ideas and
concepts.
Finding meaning in language: the choice of words sometimes reveals people's perception of,
attitude and feeling about something, or behavior towards it. There may be a need to clarify the
meaning of words or expressions in an interview because the manner in which the interviewee uses them may be
different from that of the interviewer, perhaps because of cultural differences. It is important for the
researcher to understand the meaning of certain words and their underlying implications in the context
of the research. Moreover, the researcher must watch for, or be sensitive to, the unexpected. Sometimes
participants deviate from the expected and delve into a new or unexpected area. A researcher needs
to follow up on such "digressions" as they may lead to unexpected discoveries. Events, themes, and
meanings can come out of stories. Participants may use stories to communicate ideas or symbols
indirectly. The researcher must pay close attention to stories and their meanings (O'Connor & Gibson,
n.d.). The next task is to code and categorize words, phrases, ideas, and concepts. Codes are names,
tags, or labels. Coding is therefore the process of putting names, tags or labels against pieces of data
(Punch, 1998). The data may be individual words, phrases, a whole sentence or more, a part of a
picture, and so on. Most of us have coded text without knowing it. Highlighting part of a text and
tagging it with a label to represent what we consider the central idea is a form of coding. Coding
serves as an index for the data. The first labels also permit more advanced coding at a later stage of
the analysis. Hence coding is both the first part of the analysis and part of getting the data
ready for subsequent analysis (Punch, 1998). We may be able to group similar codes together to form
categories.
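A first mechanical step toward coding, counting recurrent words in responses, can be sketched in Python (the response excerpts and the stop-word list are invented for illustration; real coding is an interpretive act that such counting only supports):

```python
from collections import Counter
import re

# Invented interview excerpts.
responses = [
    "We need peace in the community before development can happen.",
    "Development without peace is impossible; the community knows this.",
    "Peace education helped our community resolve disputes.",
]

# Common function words to ignore (an illustrative, not exhaustive, list).
stop_words = {"we", "the", "in", "is", "can", "our", "this", "without", "before"}

words = []
for text in responses:
    for w in re.findall(r"[a-z]+", text.lower()):
        if w not in stop_words:
            words.append(w)

# Recurrent words are candidates for codes such as PEACE or COMMUNITY.
print(Counter(words).most_common(3))
```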
Memoing should probably be discussed here because of its close relationship with coding. It
begins at the start of the analysis, alongside coding. A memo is the write-up of ideas about codes
and their relationships as they occur to the analyst while coding (at whatever level). Memos can be a
sentence, a paragraph or a few pages. A memo expresses the analyst's momentary ideas, elaborated
using certain concepts. Memos can be as varied as the analyst's imagination permits; they may be
about any part of the data. Memos are useful throughout the stages of the analysis and may even
constitute a useful part of the report writing later. Coding and memoing are essential parts of the
style of qualitative data analysis described here.
Third Stage: Building overarching themes in the data
This is a higher level of abstraction involving grouping associated categories together to build
overarching themes. As the researcher works closely with her data, she will be able to recognize
patterns or relationships among her categories. This will help her to group related categories together
in thematic families. A theme may emerge for each section of the response categories. These are the
building blocks of theory development.
Fourth Stage: Ensuring reliability and validity in the data analysis and in the
findings
Validity and reliability are two essential issues in social research. Validity is the accuracy with which
a method measures what it is intended to measure (Schopper et al., 1993) and yields data that really
represent "reality" (Goodwin et al., 1987, cited in O'Connor & Gibson). Validation is an ongoing
principle throughout the entire qualitative research process. Reliability, on the other hand, is the
consistency of the research findings (Kvale, 1996). Ensuring reliability in qualitative research
requires diligent effort and a commitment to consistency throughout interviewing, transcribing and
analyzing the findings (O'Connor & Gibson, n.d.). These two ideas mean slightly different things in
quantitative and qualitative research, and there are many arguments and counter-arguments over
the appropriateness of validity and reliability in qualitative research. This is not, however, the concern
of this write-up.
Whether the research is conducted by one primary researcher or a team of researchers, there is
a need to develop a systematic and consistent way of carrying out and analyzing the research. There are
certain steps a researcher needs to take to ensure the validity and reliability of her research. These include
testing emergent findings and hypotheses, checking for researcher effects, validating and confirming
findings, obtaining feedback from participants, external validation of the coding strategy, and
acknowledging factors beyond the researcher's control (which might have influenced respondents'
responses).
Testing Emergent Findings and Hypotheses
Hypotheses in this sense refer to initial explanations about the phenomenon under investigation, not
hypotheses in the same sense as in quantitative analysis. Findings and hypotheses emerging from themes and
patterns identified in the data need to be tested. One important way to do this is to look for negative
instances of the patterns, that is, occurrences that do not conform to the patterns and themes in
the data. Miles and Huberman (1994) refer to them as "outliers". The analyst must carefully examine these
outliers and provide possible explanations for them.
Checking for researcher effects
The researcher is part of the data collection instrument in qualitative research. Her interactions with the
respondents will be influenced by the demographic, socio-economic and cultural characteristics of both
parties. These must be recognized and taken into consideration during data analysis.
Validating/confirming findings
One way of ensuring validity in qualitative research is triangulation. Triangulation applies to
social science research as a whole: the more independent sources confirm a finding, the more dependable
it is, that is, when more than one instrument measures the same thing.
There are different ways of triangulating in qualitative research: triangulating from different
sources, from different methods, and from different researchers. An example of the first is interviewing
different members of a community in order to get different perspectives on a specific question or
topic. When we triangulate with different methods, we use different research methods to answer the
same questions or topics. For instance, data from quantitative (say, survey) and qualitative (say, focus
group and individual interview) methods could be blended. Sometimes we may have two different researchers
conduct the same interview, or analyze the same data, in order to ensure validity. Triangulation is
useful in that it helps us to corroborate findings. Conversely, it may reveal inconsistent or
conflicting findings, calling for the collection of more data or an outright reorientation of the study.
Another way of ensuring the validity of the research findings and the researcher's interpretations is to
obtain feedback from participants. This may require the researcher going back to ask the previous
participants, or those who can speak on their behalf. Local informants can judge the major findings of
the study. Focus group discussions can also be conducted to get feedback from community
members to check the research findings for accuracy, validity and appropriateness. This involves
discussing the implications of the research findings and their dissemination.
External Validation of Coding Strategies
In addition to the validity of the research process and findings, there is also the need for validity of the data
analysis process. The analyst can check this by comparing how she has coded and categorized the data into
themes with how a colleague would have done it. She can randomly select a few passages from a
research question/topic that she has already coded and analyzed, provide the colleague with a list of
the codes and categories, and have the colleague code the responses. The researcher then compares her
coding with her colleague's. However, protecting the identity of the respondents is very important.
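A simple way to quantify that comparison is percent agreement between the two coders. The sketch below uses invented code labels for illustration; more robust measures, such as Cohen's kappa, additionally correct for chance agreement:

```python
# Codes assigned to the same ten passages by the researcher and a
# colleague (invented labels for illustration).
researcher = ["PEACE", "CONFLICT", "PEACE", "TRUST", "CONFLICT",
              "PEACE", "TRUST", "PEACE", "CONFLICT", "TRUST"]
colleague = ["PEACE", "CONFLICT", "TRUST", "TRUST", "CONFLICT",
             "PEACE", "TRUST", "PEACE", "PEACE", "TRUST"]

# Count passages where the two coders applied the same code.
matches = sum(a == b for a, b in zip(researcher, colleague))
agreement = matches / len(researcher)

# Low agreement signals that code definitions need sharpening.
print(f"Intercoder agreement: {agreement:.0%}")
```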
Acknowledging Factors which May have Influenced the Participant's Response
Sometimes, the situations under which the data was collected (which at the time may have been beyond
the interviewer's control) could influence a participant's response. For instance, the time of day of an
interview, or the presence of other people (family members, friends, co-workers, members of other factions,
etc.) in the vicinity of an interview, could influence responses. These factors should be taken into
consideration in explaining the findings. They should be included in the limitations of the study in the
final report.
Fifth Stage: Finding Possible and Plausible Explanations of the Findings
The analyst's work at this stage should start with summarizing her findings and themes. She then
compares these with what is prevalent in the literature from similar studies. How are they similar to or
different from extant knowledge on the issue? Are there surprises? She then needs to find possible
explanations for her findings. She turns to the literature, her memos or the personal journal she has kept
throughout the study, or key informants in the study area for answers. Memos or other personal notes
kept throughout the study could help her tie themes together to better make sense of the results
and the reasons for them (O'Connor & Gibson, n.d.). Key informants familiar with the topic and
context of the study might be able to assist the researcher in explaining her findings, because
they possess insider knowledge that could be useful in this regard. This also helps in knowing the
implications the findings have for the community, which is an important part of the final report.
Sixth Stage: The Final Steps
Having determined the implications of the findings, a researcher needs to decide how to communicate
them. It has become conventional to let participants know about the findings of the research,
especially in a qualitative study. Communicating findings this way is particularly important in
qualitative research because, by the nature of its inquiry, a researcher is likely to develop
informal relationships and sometimes close friendships with some of the respondents. It is only fair to let
them know the outcome of the study. Sometimes this may include not only individuals but also groups
such as communities and organizations. In communicating the findings a researcher has options
including newspapers, newsletters, mail, radio or video, council meetings, focus groups, community
workshops/seminars, and formal reports.
Computer-aided qualitative data analysis
Just as in quantitative analysis, computer technology has been found useful in the analysis of qualitative
data. However, the use of computers in the analysis of qualitative data is
relatively less developed. Some of the popular computer packages for qualitative data analysis include
ATLAS.ti 7.0 (http://atlasti.de/), Ethnograph (http://qualisresearch.com), NUD*IST
(http://www.qsrinternational.com) and NVivo (http://www.qsrinternational.com). Computer packages
for qualitative data analysis save the analyst much of the paperwork that manual analysis requires.
Conclusions
Investigating the social world generates both quantitative and qualitative data. Data from these two
paradigms are essentially different, with distinctive natures, epistemologies, and methodologies. These
differences have implications for analyzing data from each paradigm. No paradigm is superior to the
other; whichever paradigm a researcher adopts should be determined primarily by the nature of the
enquiry, not the researcher's preference. Many preceding stages, such as research design,
developing the research questions, sampling, and data collection, will shape the data analysis stage. The analysis
of quantitative data is generally called statistics. The quantitative paradigm has a well-developed, highly
standardized set of tools for the analysis of its data. However, approaching quantitative data analysis
requires foundational knowledge of concepts such as scales of data, parametric and non-parametric
data, descriptive and inferential statistics, hypotheses and hypothesis testing, causality,
generalizability, reliability, variables, kinds of analysis, and one-tailed and two-tailed tests. The
usual iterative nature of data collection and data analysis is a characteristic feature of qualitative data
analysis. Qualitative data analysis involves organizing the data; finding and organizing ideas and
concepts; building overarching themes in the data; ensuring reliability and validity in the data analysis
and in the findings; finding possible and plausible explanations for findings; and the final steps.
Bibliography
Alemika, E. (2002). Epistemological Foundations of the Scientific Method. In L. Erinosho, I. Obasi,
& A. Maduekwe (Eds.), Interdisciplinary Methodologies in the Social Sciences (pp. 1-31).
Abuja, Nigeria: UNESCO Abuja & Social Science Academy of Nigeria.
Asika, N. (1991). Research Methodology in the Behavioural Sciences. Ikeja: Longman Nigeria Plc.
Baker, T. (1999). Doing Social Research (Third ed.). Singapore: McGraw Hill Company Inc.
Bryman, A. (2004). Social Research Methods. Oxford: Oxford University Press.
Cohen, L., Manion, L., & Morrison, K. (2011). Research Methods in Education (7th ed.). Oxon:
Routledge.
Dane, F. C. (1990). Research Methods. Belmont: Brooks/Cole Publishing Company.
Friese, S. (2011). ATLAS.ti 6 User Guide and Reference. Berlin: ATLAS.ti Scientific Software
Development.
Gelles, R. J., & Levine, A. (1999). Sociology: An Introduction (6th ed.). USA: McGraw-Hill College.
Glaser, B. G. (1998). Doing Grounded Theory: Issues and Discussions. California: Sociology Press.
Introduction to Designing Qualitative Research. (n.d.). Retrieved March 12, 2012, from
http://trochim.human.cornell.edu/kb/gif/kbroad.gif
King, M. E., & Sall, E. (2007). Introduction: Research and Education Fundamental to Peace and
Security. In E. McCandless (Ed.), Peace Research for Africa: critical essays on methodologies
(pp. 9-28). Addis Ababa: University for Peace, Africa Programme.
Learn Higher and MMU. (2008). Analyse This!!! Learn to analyse qualitative data. Retrieved March
21, 2013, from Analyse This: http://www.learninghigher.ac.uk/analysethis
O'Connor, H., & Gibson, N. (n.d.). A Step-by-Step Guide to Qualitative Data Analysis. Pimatiziwin: A
Journal of Aboriginal and Indigenous Community Health, 1(1), 63-90.
Punch, K. F. (1998). Introduction to Social Research: Quantitative and Qualitative Approaches.
London: Sage Publications Ltd.
Spata, A. V. (2003). Research Methods: Science and Diversity. New York: John Wiley and Sons, Inc.
Trochim, W. M. (2006, October 20). Qualitative Methods. Retrieved March 22, 2012, from Research
Methods Knowledge Base: http://www.socialresearchmethods.net/kb/