Sampling: A compact study of different types of samples
KRISHNA RAJ NS
ASITH PAUL K
OVERVIEW OF DATA COLLECTION
Virtually every management research project will involve some type of data collection
Data collection must be well planned and managed
Data is the raw material of problem solving and decision making
The management researcher is interested in information rather than raw data
The researcher must know not only what data is required, but also the approaches and techniques
for collecting data
What are the principal types of data?
How may we classify data?
What are the data collection methods?
What are the principal ways of collecting data?
What are the techniques of collecting data?
VARIABLES
• The concept of a variable is basic but vitally important
• A variable is anything that varies and can be measured
• The values that the variable takes will vary when measurements are made on
different objects or at different times.
• Variables differ in how well they can be measured, i.e., in how much measurable
information their measurement scale can provide
In general, a variable may fall into one of two types, viz., quantitative variables
and qualitative variables.
• Quantitative variables are those for which the value has numerical meaning.
The value refers to a specific amount of some quantity. They are also called
metric variables or measurement variables.
• Measurement variables tell us “how much” of something each case has. You can
do mathematical operations on the values of quantitative variables (like taking an average).
• A good example would be a person's weight
• The quantitative variables can be broken down into two types, viz., discrete
variable and continuous variable.
• Discrete: A discrete variable is one which can only take certain
fixed numerical values; there are usually gaps between the
values. Discrete variables have numerical values that arise from a
counting process.
• Continuous: A continuous variable is one which, in principle at
least, can take any numerical value within a specific range.
Continuous variables produce numerical responses that arise
from a measuring process.
Qualitative variables are those for which the value indicates different
groupings. They are also called attributes or categorical variables.
For the purpose of analysis we assign an arbitrary numerical value to such a
variable. Objects that have the same value on the variable are the same with
regard to some characteristic, but you can't say that one group has “more” or
“less” of some feature.
Thus, the categorical variables tell us only “what kind” or category a case
belongs in. It doesn't really make sense to do math on categorical variables.
Dependent Variable: Any variable that depends upon other factors
It represents the output whose variation is being studied
Example: Exam score
It depends upon how much time you studied,
how much sleep you got last night,
your physical condition during the examination, etc.
Independent Variable: A variable that stands alone and isn’t changed by the other variables you are trying to measure
It represents inputs or causes for variation
Example: Someone's age; it doesn’t depend upon any other variable
• Antecedent Variables: Any variable that explains the relationship
between two variables by its prior impact on both of them
• Example: Social class affects the relationship between income and
political party support
• Intervening Variables: They are used to explain the relationship between observed
variables, such as independent and dependent variables; also called mediating variables
• Example: Income and longevity are not directly related; money can't make life longer.
But people with money get better medical care than others. Here, medical care is
an intervening variable
Dichotomous Variable: A variable that has only two possible categories, such as gender
Dummy Variable: A dichotomous qualitative variable coded as ‘1’ if the characteristic is
present and ‘0’ if the characteristic is absent
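A minimal Python sketch of dummy coding as described above, assuming the category that counts as "present" is passed in explicitly. The helper `dummy_code` and the sample responses are hypothetical, made up for illustration.

```python
def dummy_code(values, present):
    """Return 1 where the characteristic is present, 0 where it is absent."""
    return [1 if value == present else 0 for value in values]


# Hypothetical responses for a dichotomous variable.
genders = ["female", "male", "female", "female"]
is_female = dummy_code(genders, "female")

print(is_female)                        # [1, 0, 1, 1]
print(sum(is_female) / len(is_female))  # 0.75, the share of cases coded 1
```

Because the codes are 0 and 1, the mean of a dummy variable equals the proportion of cases that have the characteristic, which is one reason this coding is convenient for analysis.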
DATA AND DATA SET
• Data is a collection of facts, figures and statistics related to an object
• Each time that we record information about an object, we observe a case
• We might include several different variables in the same case. For
example, we might measure the age, sex, height, weight, and hair
color of a group of people in an experiment.
• We would have one case for each entity
• When the raw data are organized in a row-by-column
format, with each row representing one case and each
column representing one variable, it is called a data set.
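The row-by-column layout can be sketched in Python as a list of rows, assuming a plain list-of-lists representation rather than any particular statistics package. The variable names follow the example above; the values and the `column` helper are hypothetical.

```python
# Hypothetical data set: each row is one case (a person),
# each column is one variable. All values are made up.
variables = ["age", "sex", "height_cm", "weight_kg", "hair_color"]
data_set = [
    [34, "F", 165, 58, "black"],
    [29, "M", 178, 72, "brown"],
    [41, "F", 160, 63, "black"],
]


def column(rows, names, name):
    """Return every case's value for one variable (one column)."""
    index = names.index(name)
    return [row[index] for row in rows]


print(len(data_set))                              # 3 cases
print(column(data_set, variables, "hair_color"))  # ['black', 'brown', 'black']
```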
CLASSIFICATION OF DATA
Some common modes of classification are:
1) geographical, i.e., area wise or region-wise;
2) chronological, temporal, or historical, i.e., with respect to the
time of occurrence;
3) qualitative, i.e., by character or by attributes; and
4) numerical, quantitative or by magnitude.
• Before you collect data for a research study, consider carefully
which of the four types of data you are collecting and how you will
use them once you have them
• The four widely used classifications of measurement scales (types of data) are:
nominal, ordinal, interval and ratio.
• The nominal scale is the lowest measurement level you can use. In nominal
measurement, numerical values are assigned only to name the attribute uniquely.
• In this scale, the numbers or letters assigned to objects serve only as labels
or tags for identification and classification of objects.
• A nominal scale simply places data into categories, without any order or structure.
• The numbers do not reflect the amount of the characteristic possessed by the objects.
• These are scales in name only.
• It is the least powerful measurement scale
• The counting of members in each group is the only possible
arithmetic operation when a nominal scale is employed.
• No measure of dispersion can be used.
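A small Python sketch of the one arithmetic operation a nominal scale allows: counting the members of each group. The region codes are hypothetical; the numbers 1, 2 and 3 are labels only.

```python
from collections import Counter

# Hypothetical nominal data: the codes 1, 2 and 3 are only labels for
# three regions; their size says nothing about the regions.
region_codes = [1, 3, 2, 1, 1, 3]

counts = Counter(region_codes)          # number of members in each group
print(counts[1], counts[2], counts[3])  # 3 1 2
print(counts.most_common(1)[0][0])      # 1, the group with the most members
```

Averaging these codes would give 11/6, about 1.83, a number that names no region at all, which is why such arithmetic is not used on nominal data.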
• An ordinal scale is next up the list in terms of power of measurement
• Ordinal data include the characteristics of the nominal scale plus an
indicator of order
• In ordinal measurement the attributes can be rank-ordered
• Here, the distance between the attributes does not have any meaning
• In addition to the counting operation allowable for nominal scale
data, ordinal scales permit the use of statistics based on centiles,
e.g., percentile, quartile
• Median is the appropriate measure of central tendency
• A percentile or quartile measure reveals dispersion. Rank
correlation and a few nonparametric tests of significance can be used
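A minimal Python sketch of median and quartiles on ordinal data. It uses the nearest-rank percentile (one of several percentile definitions, chosen here as an assumption) because it always returns an observed category; interpolating methods can return values such as 2.5 that fall between categories. The ratings and the helper `nearest_rank_percentile` are hypothetical.

```python
import math
import statistics


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile: always returns one of the observed values,
    so the result is still a valid category on an ordinal scale."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical ratings: 1 = very dissatisfied ... 5 = very satisfied.
ratings = [2, 4, 3, 5, 4, 1, 4, 3, 5]

print(statistics.median_low(ratings))                               # 4
print([nearest_rank_percentile(ratings, p) for p in (25, 50, 75)])  # [3, 4, 4]
```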
• Interval data have the power of nominal and ordinal data plus one
additional strength: they incorporate the concept of equality of interval.
Thus, in interval measurement the distance between attributes does have meaning
• The zero point on an interval scale is arbitrary and is not a true zero.
• It permits comparison of the differences between objects. Example: when
we measure temperature (in Fahrenheit), the distance from 30 to 40 degrees is the same
as the distance from 70 to 80 degrees.
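The temperature example can be checked with a short Python sketch, assuming the standard Fahrenheit to Celsius conversion. Differences stay equal after the change of scale, but ratios do not, because the zero point is arbitrary.

```python
def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# Equal intervals on the Fahrenheit scale...
print(40 - 30 == 80 - 70)  # True
# ...remain equal after converting to Celsius (both are about 5.56 degrees C).
print(round(f_to_c(40) - f_to_c(30), 2), round(f_to_c(80) - f_to_c(70), 2))
# Ratios are not preserved, because the zero point is arbitrary:
print(80 / 40)                            # 2.0
print(round(f_to_c(80) / f_to_c(40), 2))  # 6.0
```

So 80 degrees F is not "twice as hot" as 40 degrees F; the ratio depends on where each scale happens to put its zero.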
• A ratio scale is the top level of measurement and is not often
available in social research.
• Ratio data incorporates all the powers of the interval data plus the
provision for absolute zero or origin. Ratio data represent the
actual amounts of a variable.
• Here, we can construct actual fractions (or ratios) with a ratio variable
• Examples: Height, weight, distance, etc.
• All statistical techniques can be applied to ratio data.
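By contrast with the temperature example, a ratio scale keeps its ratios under a change of units because it has a true zero. A short Python sketch with hypothetical weights, assuming the approximate conversion factor of 2.20462 pounds per kilogram:

```python
KG_TO_LB = 2.20462  # approximate number of pounds in one kilogram

weight_a_kg, weight_b_kg = 90.0, 45.0  # hypothetical weights

ratio_in_kg = weight_a_kg / weight_b_kg
ratio_in_lb = (weight_a_kg * KG_TO_LB) / (weight_b_kg * KG_TO_LB)

print(ratio_in_kg)            # 2.0
print(round(ratio_in_lb, 6))  # 2.0
print(0 * KG_TO_LB)           # 0.0: zero weight is zero in any unit
```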
LEVELS OF MEASUREMENT
RAW DATA
When data are collected, the information obtained from each member of a population or
sample is recorded in the sequence in which it becomes available. Such data, before they are
grouped or ranked, are called raw data.
CROSS-SECTIONAL AND TIME SERIES DATA
• Cross-sectional data are collected at the same or approximately the
same point in time.
• Example: data detailing the number of road accidents in 28 Indian states in a single year
• Time series data are collected over several time periods.
• Example: data detailing the number of road accidents in each of the 28 Indian
states in the last 36 months
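The two layouts can be sketched in Python, assuming simple dictionaries and lists. The state labels and accident counts below are hypothetical and made up; only three states and six months are shown for brevity.

```python
# Hypothetical, made-up accident counts for three unnamed states.
# Cross-sectional: one value per state, all at the same point in time.
cross_sectional = {"State A": 410, "State B": 35, "State C": 220}

# Time series: one value per month for a single state, oldest first.
state_a_monthly = [402, 398, 415, 421, 409, 410]


def period_changes(series):
    """Change from one period to the next; this needs a time series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


print(max(cross_sectional, key=cross_sectional.get))  # State A
print(period_changes(state_a_monthly))                # [-4, 17, 6, -12, 1]
```

Cross-sectional data supports comparisons between units at one moment; time series data supports comparisons of the same unit across periods. The 28-state, 36-month example in the text would hold one such series per state.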
PRIMARY AND SECONDARY SOURCES OF DATA
• The sources from which data can be collected are divided into
primary and secondary:
• Primary data is the data collected by an individual or organization
to use specifically for the purpose of the investigation at hand. The
primary data is collected by conducting experiments, investigations,
observation, interviews, and surveys and by using questionnaires.
• Secondary data is facts and information gathered not for the
immediate study at hand but for some other purpose
• Secondary data has been gathered by others for their own purposes,
but the data could be useful in the analysis of a wide range of other
problems. In general, secondary data exists in published sources, both
print and electronic.
POPULATION VERSUS SAMPLE
• A population consists of all elements (individuals, items, or
objects) that we are interested in studying.
• When researchers gather data from the whole population for a
given measurement of interest, they call it a census.
• A sample is a finite subset of the population; if properly taken, it is
representative of the population.
• Data can be collected from a sample to answer questions about the population.
SAMPLE SIZE SELECTION
• Large samples represent the population better than small samples
• Larger samples are needed when greater variation is expected
• Sample size also depends on:
• the statistical analysis planned
• the expected variability within subsets of the sample
• tradition in our research
SAMPLING DESIGN
• Sampling design is a design that specifies the population frame,
sample size, sample selection and estimation method in detail
• It is a definite plan for obtaining a sample from a population
SAMPLING FRAME
• The sampling frame is the record of the population from which a sample can be drawn
RANDOM SAMPLING (CHANCE SAMPLING)
• Random sampling is the purest form of probability sampling (often regarded as the best
sampling method available)
• Each member of the population has an equal chance of being selected
• The sample units are drawn without any regard to the characteristics of the units
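A minimal Python sketch of simple random sampling without replacement, using the standard library's `random.Random.sample`. The frame of employee IDs is hypothetical; the optional seed only makes the draw reproducible.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame of 200 employee IDs.
frame = [f"EMP-{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```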
SYSTEMATIC SAMPLING (QUASI RANDOM SAMPLING)
• Involves ‘ordering of the universe’
• Ordering may be alphabetical, numerical, geographical, etc.
• Every nth member is selected from the ordered list
• Sampling interval = Size of population / Size of the sample
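A minimal Python sketch of systematic sampling with interval k = N / n, rounded down. The random start within the first interval is a common practice assumed here, not something stated in the notes; the frame and the helper name are hypothetical.

```python
import random


def systematic_sample(ordered_frame, n, seed=None):
    """Take every k-th unit from an ordered frame, starting at a random
    position within the first interval, where k = N // n."""
    if not 0 < n <= len(ordered_frame):
        raise ValueError("n must be between 1 and the frame size")
    k = len(ordered_frame) // n               # sampling interval
    start = random.Random(seed).randrange(k)  # random start: 0 .. k-1
    return ordered_frame[start::k][:n]


frame = list(range(1, 101))  # hypothetical ordered frame, N = 100
sample = systematic_sample(frame, 10, seed=7)
print(sample)                # ten IDs exactly 10 apart
```

When N is not a multiple of n, rounding the interval down can leave more than n candidates, so the slice is cut back to n units.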
STRATIFIED SAMPLING
• All people in the sampling frame are divided into strata
• A stratum means a group or category
• Strata are developed on the basis of homogeneity
• From each stratum, random samples or systematic samples are drawn
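A minimal Python sketch of proportionate stratified sampling, assuming a simple random sample of the same fraction is drawn from every stratum (the notes also allow systematic samples within strata). The households, the urban/rural strata and the helper name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Proportionate stratified sampling: group units into strata, then draw
    a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        size = round(len(members) * fraction)  # stratum's share of the sample
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical frame: 60 urban and 40 rural households.
households = [(f"H{i:03d}", "urban" if i < 60 else "rural") for i in range(100)]
sample = stratified_sample(households, lambda h: h[1], fraction=0.1, seed=1)
print(sum(1 for h in sample if h[1] == "urban"),
      sum(1 for h in sample if h[1] == "rural"))  # 6 4
```

Rounding each stratum's size separately means the total can differ slightly from fraction × N when strata sizes do not divide evenly.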
OTHER SAMPLING METHODS
CLUSTER SAMPLING
• Used when no satisfactory sampling frame is available
• The population is divided into clusters or large groups
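A minimal Python sketch of one-stage cluster sampling, assuming whole clusters are chosen at random and every unit in a chosen cluster is included. Only a list of clusters is needed, not a frame of every unit. The villages, households and helper name are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: choose whole clusters at random and keep
    every unit inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical clusters: villages and the households in each.
villages = {
    "Village 1": ["h1", "h2", "h3"],
    "Village 2": ["h4", "h5"],
    "Village 3": ["h6", "h7", "h8", "h9"],
    "Village 4": ["h10"],
}
picked = cluster_sample(villages, 2, seed=3)
print(picked)
```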
RANDOM ROUTE SAMPLING
Used in market research surveys (example: sampling households)
An address is selected randomly from a register
Then every nth address is selected, taking alternate left and right turns
CONVENIENCE SAMPLING
• The researcher simply contacts and picks up whichever cases he or she comes across and continues the
process (e.g.: the first 100 willing persons)
QUOTA SAMPLING
• Quotas are set up according to certain characteristics
SNOWBALL SAMPLING
• We initially contact potential respondents and ask them to refer other respondents of similar characteristics
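A minimal Python sketch of quota sampling, assuming people are accepted in the order they are met until each quota is full. Convenience sampling is the same idea with a single overall limit (e.g. the first 100 willing persons). The respondents, quotas and helper name are hypothetical.

```python
def quota_sample(respondents, category_of, quotas):
    """Accept respondents in the order they are met until every quota is
    full. Selection is non-random: whoever turns up first is included."""
    filled = {category: [] for category in quotas}
    for person in respondents:
        category = category_of(person)
        if category in filled and len(filled[category]) < quotas[category]:
            filled[category].append(person)
        if all(len(filled[c]) == quotas[c] for c in quotas):
            break  # all quotas met; stop contacting people
    return filled


# Hypothetical stream of people met in the field: (name, sex).
met = [("R1", "M"), ("R2", "M"), ("R3", "F"), ("R4", "M"), ("R5", "F"), ("R6", "F")]
print(quota_sample(met, lambda p: p[1], {"M": 2, "F": 2}))
# {'M': [('R1', 'M'), ('R2', 'M')], 'F': [('R3', 'F'), ('R5', 'F')]}
```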
DATA COLLECTION METHODS
• A focus group is a form of qualitative research
• Respondents from the target population are typically put in a
single group and are asked about their perceptions, opinions,
beliefs, and attitudes towards a product, service, concept,
advertisement, idea, or packaging
• Questions are asked in an interactive group setting where
participants are free to talk with other group members
• Two-way focus group - one focus group watches another focus group
and discusses the observed interactions and conclusions
• Dual moderator focus group - one moderator ensures the session
progresses smoothly, while another ensures that all the topics are covered
• Mini focus groups - groups are composed of four or five members
rather than 6 to 12
• Teleconference focus groups - telephone network is used
• Online focus groups - computers connected via the internet
INTERVIEW
• A method for collecting primary data
• The person who conducts the interview is called the interviewer
• The person who gives the interview is called the interviewee or respondent
• The respondent is asked to provide information in the form of opinions,
facts, attitudes, etc.
SURVEY
• A non-experimental, descriptive research method
• Helps in collecting data on phenomena that can't be directly observed
• Industrial and consumer surveys: for industrial use
• Media studies: to know the effectiveness of advertising, etc.
• Multiple surveys: a survey is conducted among several groups of people
QUESTIONNAIRE
• It is a method for collecting primary data in which a sample of respondents is
asked a list of carefully structured questions chosen after considerable
testing, with a view to eliciting reliable responses
• It can translate the information needed into specific questions
• It must motivate the respondent to become involved in completing it
OPEN AND CLOSED QUESTIONS
• An open question is likely to receive a long answer
• They offer the advantage of letting respondents give their opinions as precisely as possible in their
own words
• E.g.: Do you support the hike in oil prices? Why?
• A closed question can be answered with either a single word or a short phrase.
• They are convenient and easy to answer, since the range of potential answers is limited
• E.g.: Do you support the hike in oil prices?
• Tick your opinion
• 1. Yes
• 2. No
MULTIPLE CHOICE QUESTIONS
• The participant is asked a closed question and selects his/her answer from a list
of predetermined responses
• E.g.: Who is the Governor of the Reserve Bank of India?
Tick your answer
• 1. Raghuram Rajan
• 2. D. Subbarao
• 3. Sen Gupta
• 4. L. K. Jha
• 5. Dr. C. Rangarajan
• Ranking scales are used to tap preferences between two objects or among more than two objects
• Rating scales are used quite frequently in research, especially in surveys
• Typically, an itemized rating scale asks subjects to choose one response
category from several arranged in hierarchical order.
E.g.: Please rate our service (out of 10)
• 4 or below - Poor
• 5 to 7 - Average
• 8 to 9 - Good
• 10 - Outstanding
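A minimal Python sketch of turning the itemized rating scale above into categories and tallying responses. It assumes whole-number scores from 0 to 10, with 4 and below counted as Poor so that the bands do not overlap. The helper `rating_category` and the ratings are hypothetical.

```python
from collections import Counter


def rating_category(score):
    """Map a whole-number rating out of 10 to the categories in the example.
    Assumes scores run from 0 to 10."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 4:
        return "Poor"
    if score <= 7:
        return "Average"
    if score <= 9:
        return "Good"
    return "Outstanding"


# Hypothetical ratings from eight customers.
ratings = [8, 5, 10, 3, 7, 9, 8, 6]
tally = Counter(rating_category(r) for r in ratings)
print(tally)
```

Because the categories are in hierarchical order but the gaps between them are not equal, the tallied result is ordinal data.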