Important Statistical Terminologies in Research Methodology
1. Statistics vs. Parameters: A statistic is a numerical measure computed from a sample and a
parameter is a numerical measure computed from a population. Thus, these terms are also
referred to as sample statistics and population parameters.
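As a minimal illustration, assume a hypothetical population made of the integers 1 to 100. The population mean is then a parameter. The mean of a random sample drawn from that population is a statistic, and it will usually differ somewhat from the parameter. The data and the seed are arbitrary choices made only for this sketch.

```python
import random
import statistics

# Hypothetical population: the integers 1..100 (illustrative only)
population = list(range(1, 101))

# Parameter: computed from every member of the population
population_mean = statistics.mean(population)

# Statistic: computed from a random sample of 10 members
rng = random.Random(42)              # fixed seed so the sample is reproducible
sample = rng.sample(population, 10)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean}")
print(f"statistic (sample mean):     {sample_mean}")
```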
2. Frequency Distribution: The frequency (f) is the number of times a variable takes on a
particular value. Note that any variable has a frequency distribution. For example, roll a pair
of dice several times and record the resulting totals (constrained to lie between 2 and 12),
count the number of times each value occurs (the frequency of that value), and take these
counts together to form a frequency distribution. Frequencies can be absolute, when the
frequency given is the actual count of occurrences, or relative, when they are normalized by
dividing the absolute frequency by the total number of observations so that they lie in the
range [0, 1]. Relative frequencies are particularly useful for comparing distributions drawn
from two different sources when the numbers of observations from each source differ.
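The dice example can be simulated to show absolute and relative frequencies side by side. This is a sketch under stated assumptions: the number of rolls (1000) and the random seed are arbitrary choices, not values from the text.

```python
import random
from collections import Counter

rng = random.Random(1)   # fixed seed for reproducibility (arbitrary)
n_rolls = 1000           # assumed number of rolls

# Roll a pair of dice n_rolls times and record each total (always 2..12)
totals = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]

# Absolute frequency: raw count of each total
absolute = Counter(totals)

# Relative frequency: count divided by the number of observations, so it lies in [0, 1]
relative = {value: count / n_rolls for value, count in absolute.items()}

for value in range(2, 13):
    print(f"{value:>2}: f = {absolute[value]:>3}, relative = {relative.get(value, 0.0):.3f}")
```

Because the relative frequencies always sum to 1, two such tables built from different numbers of rolls can be compared directly.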
3. Mean, Median, Mode and Range: The mean is the numerical average of the data set.
Ordinarily, the mean is computed by adding all the values in the set, then dividing the sum by
the number of values. The median is the number that is in the middle of a set of data.
Arrange the numbers in the set in order from least to greatest, then find the number that is in
the middle. What if the set has an even number of values? In this case, take the two central
numbers, add them and divide by 2 to obtain the median. For example, if a
student’s scores in eight different subjects are 45, 67, 74, 82, 88, 91, 92, 93, then his/her
median score will be (82+88)/2 = 170/2 = 85. One important point here is that the data need to be
sorted in ascending or descending order before computing the median.
So, what is the mode? The mode is the value that occurs most frequently in the data
set. A set of data can have one mode, more than one mode, or no mode at all. The range
is the difference between the highest and lowest values in a data set. For example, for the
student's marks above, Range = 93 – 45 = 48. It shows how widely the data are spread.
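The steps above can be sketched in Python using the student's scores. The helper names (`mean`, `median`, `modes`, `data_range`) are hypothetical and chosen for illustration. The standard library's `statistics` module does provide `mean`, `median` and `multimode`. However, `multimode` returns every value when none repeats, so the hand-written `modes` below adopts one common convention instead: a set in which no value repeats has no mode.

```python
from collections import Counter

scores = [45, 67, 74, 82, 88, 91, 92, 93]   # the student's eight subject scores

def mean(data):
    # Add all the values, then divide the sum by the number of values
    return sum(data) / len(data)

def median(data):
    ordered = sorted(data)            # arrange from least to greatest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two central values

def modes(data):
    # Most frequent value(s); an empty list means "no mode" under the
    # assumed convention that no value occurs more than once
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

def data_range(data):
    # Highest value minus lowest value
    return max(data) - min(data)

print(mean(scores))               # 79.0
print(median(scores))             # 85.0
print(modes(scores))              # [] (every score occurs once)
print(data_range(scores))         # 48
print(modes([3, 5, 5, 7, 7, 9]))  # [5, 7] (two modes)
```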
4. Variance and Standard Deviation: The variance is the average squared deviation from the
mean of a set of data. It is used to find the standard deviation. Process: 1. Find the mean of
the data. Hint: Mean is the average, so add up the values and divide by the number of items.
2. Subtract the mean from each value; the result is called the deviation from the mean. 3.
Square each deviation from the mean. 4. Find the sum of the squares. 5. Divide the total by the
number of items. The variance formula uses the Sigma notation, $\Sigma$, which represents the
sum of all the items to the right of Sigma:

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$

Here, the mean is represented by $\mu$, $x_i$ is the $i$-th value and $n$ is the number of items.
Standard deviation shows the variation in data. If the data are close together, the standard
deviation will be small. If the data are spread out, the standard deviation will be large.
Standard deviation is often denoted by the lowercase Greek letter sigma ($\sigma$):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

Notice that the standard deviation formula is the square root of the variance. As we have
seen, standard deviation measures the dispersion of data. The greater the value of the standard
deviation, the further the data tend to be dispersed from the mean. A z-score is the number
of standard deviations an observation lies away from the mean.
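Here is a minimal sketch of the five-step process and of z-scores, using a small illustrative data set. It divides by the number of items, as in step 5. This matches the standard library's population versions, `statistics.pvariance` and `statistics.pstdev`. By contrast, `statistics.variance` and `statistics.stdev` divide by n − 1, which is the usual choice when the data are a sample and the result is a statistic estimating a parameter.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data set

# Step 1: find the mean
mu = sum(data) / len(data)

# Step 2: subtract the mean from each value (deviations from the mean)
deviations = [x - mu for x in data]

# Step 3: square each deviation
squared = [d ** 2 for d in deviations]

# Steps 4 and 5: sum the squares and divide by the number of items
variance = sum(squared) / len(data)

# Standard deviation is the square root of the variance
sigma = math.sqrt(variance)

# z-score: how many standard deviations each value lies from the mean
z_scores = [(x - mu) / sigma for x in data]

print(mu, variance, sigma)   # 5.0 4.0 2.0
print(z_scores)              # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# Cross-check against the standard library's population versions (divide by n)
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(sigma, statistics.pstdev(data))
```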
5. Skewness and Kurtosis: A fundamental task in many statistical analyses is to characterize
the location and variability of a data set. A further characterization of the data includes the
analysis of skewness and kurtosis. A measure of dispersion tells us how much the data vary,
whereas skewness tells us about the direction of that variation, i.e., whether the values trail
off further to one side of the center than to the other. Skewness is a
measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is
symmetric if it looks the same to the left (negative) and right (positive) of the center point.
The histogram is an effective graphical technique for showing both the skewness and
kurtosis of a data set.
There are further statistics that describe the shape of the distribution, using formulae that
are similar to those of the mean and variance. 1st moment - Mean (describes central value);
2nd central moment - Variance (describes dispersion); 3rd standardized moment - Skewness
(describes asymmetry); and 4th standardized moment - Kurtosis (describes peakedness).
Kurtosis measures how peaked the histogram is.
The kurtosis of a normal distribution is 3, so its excess kurtosis (kurtosis minus 3) is 0; the
classification below uses excess kurtosis. Kurtosis characterizes the relative peakedness
or flatness of a distribution compared to the normal distribution. Platykurtic: When the
kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is
flatter and wider). Thus, negative kurtosis indicates a relatively flat distribution.
Leptokurtic: When the kurtosis > 0, there are high frequencies in only a small part of the
curve (i.e., the curve is more peaked). Thus, positive kurtosis indicates a relatively peaked
distribution.
Kurtosis is based on the size of a distribution's tails. Negative kurtosis (platykurtic):
distributions with short tails. Positive kurtosis (leptokurtic): distributions with relatively
long tails.
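The moment-based definitions can be sketched directly in Python. This sketch assumes the population ("biased") moment formulas. Skewness is the third central moment divided by $\sigma^3$. Excess kurtosis is the fourth central moment divided by $\sigma^4$, minus 3, so a normal distribution scores 0. SciPy's `scipy.stats.skew` and `scipy.stats.kurtosis` compute these same quantities with their default settings. Both also offer bias-corrected variants. The helper names and data sets here are illustrative only.

```python
import math

def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    # Third central moment standardized by sigma ** 3
    return central_moment(data, 3) / math.sqrt(central_moment(data, 2)) ** 3

def excess_kurtosis(data):
    # Fourth central moment standardized by sigma ** 4, minus 3 (normal distribution -> 0)
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

right_tailed = [2, 4, 4, 4, 5, 5, 7, 9]
flat = [1, 2, 3, 4, 5]

print(skewness(right_tailed))         # 0.65625 (positive: longer right tail)
print(excess_kurtosis(right_tailed))  # -0.21875
print(skewness(flat))                 # 0.0 (symmetric)
print(excess_kurtosis(flat))          # about -1.3 (platykurtic)
```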
6. Hypothesis: A hypothesis is a hunch, assumption, suspicion, assertion or idea about a
phenomenon, relationship or situation, the reality or truth of which one does not know. A
researcher calls these assumptions, assertions, statements or hunches hypotheses, and they
become the basis of an inquiry. In most cases, the hypothesis will be based upon either previous
studies or the researcher’s own or someone else’s observations. A hypothesis is a conjectural
statement of the relationship between two or more variables (Kerlinger, 1986). A hypothesis is a
proposition, condition or principle which is assumed, perhaps without belief, in order to draw
its logical consequences and by this method to test its accord with facts which are known or may
be determined (Webster’s New International Dictionary of English). According to Black and
Dean (1976), a tentative statement about something, the validity of which is usually unknown,
is known as a hypothesis. Similarly, Baily (1978) has defined it as a proposition that is
stated in a testable form and that predicts a particular relationship between two or more
variables. In other words, if we think that a relationship exists, we first state it as a hypothesis
and then test the hypothesis in the field. In fact, a hypothesis may be defined as a tentative theory
or supposition set up and adopted provisionally as a basis of explaining certain facts or
relationships and as a guide in the further investigation of other facts or relationships.
A hypothesis has these characteristics: i. it is a tentative proposition, ii. its validity is unknown,
and iii. it specifies a relation between two or more variables.
Functions of a hypothesis: It brings clarity to the research problem. It provides a study with
focus. It signifies which specific aspects of a research problem are to be investigated. It also
helps determine which data should be collected and which should not. It enhances the
objectivity of the study. It is highly instrumental in formulating theory and enables the
researcher to conclude what is true and what is false.
Types of hypotheses: The three types of hypotheses are the working hypothesis, the null
hypothesis and the alternate hypothesis.
A working hypothesis is provisionally adopted to explain the relationship between some
observed facts and to guide a researcher in the investigation of a problem. A statement that
constitutes a trial or working hypothesis is to be tested and confirmed, modified or
even abandoned as the investigation proceeds.
The null hypothesis is formulated against the working hypothesis and opposes its statement.
It is contrary to the positive statement made in the working hypothesis and is formulated so
that it can be tested and possibly rejected. When a researcher rejects a null hypothesis,
he/she obtains evidence in support of the working hypothesis. It is normally denoted by H0.
Normally, only the null hypothesis is written in research papers.
The alternate hypothesis is formulated when a researcher rejects the null hypothesis, and
he/she develops it with adequate reasons. It is normally denoted by H1. A researcher
formulates this hypothesis only after rejecting the null hypothesis.
Examples of different hypotheses:
Working hypothesis: Population influences the number of bank branches in a town.
Null hypothesis (H0): Population may not have any significant influence on the number of
bank branches in a town.
Alternate hypothesis (H1): Population might have a significant effect on the number of bank
branches in a town.
7. Statistical Tests: Different statistical tests have to be performed for different types of data.
For continuous data: If comparing 2 groups (treatment/control), t-test. If comparing > 2
groups, ANOVA (F-test). If measuring association between 2 variables, Pearson r
correlation. If trying to predict an outcome (crystal ball), regression or multiple regression.
For ordinal data: Likert-type scales are ordinal data. If comparing 2 groups, Mann-Whitney
U (treatment vs. control) or Wilcoxon signed-rank (matched pre vs. post). If comparing > 2
groups, Kruskal-Wallis (median test). If measuring association between 2 variables, Spearman
rho correlation.
For categorical data: Chi-square (χ²), called a test of frequency because it examines how often
something is observed (also known as the Goodness of Fit Test or Test of Homogeneity).
Examples of research questions of this kind: Do negative ads change how people vote? Is there
a relationship between marital status and health insurance coverage? Do blonds have more fun?
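The guidance above is essentially a decision table keyed on data type and research purpose. The sketch below transcribes it into a lookup. The purpose labels and the `choose_test` helper are hypothetical names introduced for illustration. The table covers only the cases the text lists and raises an error for anything else.

```python
# Lookup table transcribed from the guidance above: (data type, purpose) -> suggested test
TESTS = {
    ("continuous", "compare 2 groups"): "t-test",
    ("continuous", "compare >2 groups"): "ANOVA (F-test)",
    ("continuous", "association"): "Pearson r correlation",
    ("continuous", "prediction"): "regression or multiple regression",
    ("ordinal", "compare 2 independent groups"): "Mann-Whitney U",
    ("ordinal", "compare 2 matched groups"): "Wilcoxon signed-rank",
    ("ordinal", "compare >2 groups"): "Kruskal-Wallis",
    ("ordinal", "association"): "Spearman rho correlation",
    ("categorical", "frequency"): "chi-square",
}

def choose_test(data_type: str, purpose: str) -> str:
    """Return the suggested test, or raise KeyError if the table has no rule."""
    try:
        return TESTS[(data_type, purpose)]
    except KeyError:
        raise KeyError(f"no rule for {data_type!r} data with purpose {purpose!r}") from None

print(choose_test("ordinal", "compare 2 independent groups"))  # Mann-Whitney U
```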
8. Descriptive and Inferential Statistics: Descriptive statistics provide an overview of the
attributes of a data set. These include frequency distributions and histograms, measures of
central tendency (mean, median and mode) and measures of dispersion (range, variance and
standard deviation). Inferential statistics provide measures of how well your data support your
hypothesis and whether your data are generalizable beyond what was tested (significance tests).