Dr. K. Prabhakar
Professor
Sample Size Determination
Let us assume that you have chosen an organization or a
phenomenon in order to study human resources effectiveness or
organizational behavior.
Quantitative and qualitative research are the two research paths
you may choose between, based on your objectives and budget
availability.
Organizational behavior originated in fact-based
scientific experimentation, which requires manipulating
independent variables and studying their impact on the
dependent variables.
However, not all phenomena can be studied through scientific
experimentation. Therefore there is a need for qualitative research.
One of the most important tasks is the determination of
sample size.
Quantitative sample size determination is substantially
different from that of qualitative methods.
We will be discussing sample size determination under
quantitative study.
A review of research articles indicates that
inappropriate, inadequate, or excessive sample
sizes continue to affect the quality of
inferences and inflate the cost and effort of
doing research.
In this lecture, procedures for
determining sample size are
addressed to help researchers
select an appropriate sample size.
Sample size determination
for quantitative Research
Research starts with defining the population, which consists of sampling units. The
process of defining sampling units and the sampling frame will be discussed in another
lecture.
Our objective is to select a sample that is representative of
the population, make inferences from the sample, and generalize those
inferences to the population.
Since we are using samples to make inferences about the population, there will
be sampling error.
Thus the objective of sample size determination is to minimize sampling error.
Sampling Error
The sampling error is the difference between a sample statistic used to
estimate a population parameter and the actual but unknown value of the
parameter.
For example, suppose you want to estimate how happy the 1000 employees
of an organization are. You determine your
sample size as 118 employees and obtain 3.5 as the average score on
a scale of 1-5.
However, if you survey the entire population and perform the same analysis,
you obtain an average of 3.8. Thus there is an error of 3.8 - 3.5 = 0.3.
This is called sampling error.
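The example above can be sketched in a few lines of Python. This is only an illustration: the population scores are randomly generated (hypothetical data, not the organization in the example), and the function name `sampling_error` is an assumption introduced here.

```python
import random
import statistics

def sampling_error(population, sample_size, seed=0):
    """Draw one simple random sample and compare its mean with the population mean."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    sample_mean = statistics.mean(sample)
    population_mean = statistics.mean(population)
    # Sampling error = population parameter minus sample statistic
    return sample_mean, population_mean, population_mean - sample_mean

# Hypothetical population: 1000 happiness scores on a 1-5 scale
rng = random.Random(42)
population = [rng.randint(1, 5) for _ in range(1000)]

sample_mean, population_mean, error = sampling_error(population, 118)
print(f"sample mean = {sample_mean:.2f}, "
      f"population mean = {population_mean:.2f}, error = {error:+.2f}")
```

Changing `seed` draws a different sample of 118 employees and gives a different error, which is exactly why the error is attributed to sampling.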
The effects of sampling error and of response and nonresponse bias are usually
not considered by researchers when choosing the sample size.
What should be the sample size for the following instances?
• Studying voting preferences of Indians; the population for
study is approximately 914.5 million voters in India.
• Attitude of employees of an automobile manufacturing unit
with a workforce of 20,000 dispersed across different locations
in India.
• Measuring the happiness index for 600 employees working in an
IT organization.
• Measuring stress levels of eighty employees working night
shifts in an information technology enabled service firm.
What is your observation?
You will observe that the population ranges from about 914.5 million units to
80 units in the four examples.
What is the relationship of sample size to population size? Or,
should the researcher be concerned about the size of the population?
The sample size is not dependent on the population size. This is a crucial
statistical insight that is counterintuitive. Please do go through the first
self-instruction material given by Journal of Statistical Education.
Observation and Questions
Assertion and Discussion
Let us make a strong assertion:
the standard error of an
estimator depends on the size of
the sample, but not on the size of
the population.
Some management researchers
do talk about taking 5% to 10% of
the population as the sample size.
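A minimal sketch of this assertion follows, assuming a sample of 118 and a standard deviation of 1.167 (illustrative values only). The finite population correction used in the code is a standard textbook result that this lecture does not cover; it is included here only to show how little the population size matters once the population is much larger than the sample. The function name `standard_error` is hypothetical.

```python
import math

def standard_error(s, n, population_size=None):
    """Standard error of a sample mean.

    When population_size is given, the finite population correction
    sqrt((N - n) / (N - 1)) is applied (an addition not covered in the lecture).
    """
    se = s / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

s, n = 1.167, 118  # assumed standard deviation and sample size
for N in (914_500_000, 20_000, 600):
    print(f"N = {N:>11,}: SE = {standard_error(s, n, N):.4f}")
print(f"no correction  : SE = {standard_error(s, n):.4f}")
```

For the two large populations the standard error is practically the same as the uncorrected value; only when the sample becomes a sizeable fraction of the population, as with 600 employees, does the population size start to matter.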
Alpha Error and Beta Error
Survey designs try to minimize both
alpha error (finding a difference that does
not actually exist in the population) and
beta error (failing to find a difference
that exists in the population).
What is the way to go about calculating the
sample size?
Let us start with the variables that are
measured in a hypothetical job satisfaction
study that is to be conducted.
You know that job satisfaction is a continuous
variable and may be measured to the
precision that you wish to measure.
Primary variables of measurement
Job satisfaction is a continuous variable and you are going to use a 7-point
scale in the instrument. Thus we will consider a continuous
variable measured on a 7-point scale.
We know that job satisfaction is influenced by variables such as
gender and race (nominal variables) and number of years of service, age
group, and educational qualification (categorical variables).
The question is which variable should be used in the formula that we are going to
form.
Step one of our starting point
Recommendation given by Cochran (1977):
• “One method of determining
sample size is to specify margins of error for the
items that are regarded as most vital to the survey.
• An estimation of the sample size needed is first
made separately for each of these important items.”
Reference: Cochran, W. G. (1977). Sampling Techniques (3rd Ed.). New York: John Wiley &
If a categorical variable such as gender is treated as a primary variable alongside job
satisfaction measured on a 7-point scale, it is likely to call for a larger
sample size.
Thus the researcher will have a range of n's (n = sample size), one for each of the
variables. We should find the n for each variable and select the highest
n so that we obtain the lowest likely sampling error.
Error Estimation –
Your Decision….
The factors to be considered in error estimation are:
The margin of error: the risk the researcher is willing to
accept in the research, i.e., the error the
researcher is willing to tolerate in the
estimates made about the population.
The alpha level: the level of
risk the researcher is willing to accept
that the true margin of error exceeds the
acceptable margin of error; i.e., the
probability that differences revealed by
statistical analyses really do not exist.
This is also known as Type I error. A Type I
error is inferring a difference that does not
exist.
Type II error, which is not addressed in this
lecture
Another type of error is beta error, or Type II error.
A Type II error occurs when statistical procedures result
in a judgment of no significant difference when
a difference does indeed exist.
In other words, the analysis fails to detect a difference
that exists in the population.
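The two error types can be written compactly in symbols. The notation below is an assumption introduced for illustration (the slides do not use it): H_0 denotes the hypothesis that there is no difference in the population.

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})
\qquad
\beta = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})
```

The quantity 1 - \beta is usually called the power of the test.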
Sample Size Determination and the
Thinking Process
Think on what should be there!
$n = \frac{\text{numerator}}{\text{denominator}}$, where n is the sample size.
We will not go into the mathematical aspects but will try to
understand the complete process intuitively.
To determine n, the sample size, what factors should be in the
numerator and the denominator? That is, how should we design the formula?
Alpha Level in Practice
The alpha level used in determining the sample size in most social
science research is 0.05 or 0.01.
We will be using Cochran's formula, which we will discuss in later
slides, together with the Student's t-value.
You have learned to use the Student's t-distribution for small samples
and the normal distribution for large samples. Yet in
most articles and statistical output you will observe that SPSS or R
report a t-value even for larger samples.
I want you to reflect on this point for some time by reading the
self-instruction material.
I have discussed the history of the t-distribution
in another self-instruction manual.
The t-value adapts to the sample size:
for sample sizes smaller than 60 it
is larger than the normal value, and for more
than 120 it approaches the standard normal
value.
The t-value for an α level of 0.05
is 1.96 for sample sizes above
120 sample units.
You may be wondering what the
standard normal table is, what the value
1.96 means, and why it has no units.
For this you should go through the
self-instruction manual discussion
on the standard normal distribution.
t-value
Alpha Levels to be used for different studies
An alpha level of 0.05 is acceptable for most of your research reports
and for publication in journals. However, you may use an alpha level of 0.10
or even 0.20 to quickly identify possible relationships
and differences.
You will use this output for further study.
For critical studies you should use an alpha level of 0.01. This
alpha level is used if the decisions have large financial and
social implications.
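To see how the choice of alpha level changes the critical value that enters the sample size formula, here is a small sketch using Python's standard library. It assumes a two-sided test and the large-sample (standard normal) approximation to the t-value that the lecture uses for samples above 120; the function name `two_sided_critical_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_critical_value(alpha):
    """Standard normal value that leaves alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Alpha levels mentioned in the lecture, from exploratory to critical studies
for alpha in (0.20, 0.10, 0.05, 0.01):
    value = two_sided_critical_value(alpha)
    print(f"alpha = {alpha:.2f} -> critical value = {value:.3f}")
```

A smaller alpha gives a larger critical value, and because that value is squared in the numerator of the formula, a stricter alpha level calls for a larger sample.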
The acceptable margins of error for continuous data and categorical data are
different.
Suppose you want to detect variations in variables such as quickness of
grasping and ability to solve problems, based on the reading material prepared by
SWAYAM.
Education levels such as +2, graduation, and post-graduation are likely to
influence these variables.
In this case the education level influences the other variables, so it
will be our primary variable.
Acceptable Margin of Error
Continuous data
If you are measuring job satisfaction
on a 7-point scale and assuming it to
be a continuous variable, a 3%
margin of error is generally acceptable. That
means the researcher is confident
that the true mean of the 7-point
scale is within ±0.21 of the sample mean.
Reflect on why 3% gives us an error of 0.21.
The answer is that you multiply
0.03 by 7 to obtain 0.21.
Acceptable Margin for Continuous Variables
The other critical component of the sample size
formula is the estimation of variance in the
primary variables.
As a researcher you cannot control variance, but
you can incorporate a variance estimate in your
research design.
Four ways are suggested for estimating population
variance for sample size determination.
Variance Estimation
Four ways to estimate the variance
1. Take the sample in two stages, and use the results of the first stage to determine how many
samples are needed in the second stage, based on the
variance observed in the first stage.
2. Use pilot studies.
3. Use previous studies with a similar population.
4. Use an estimator based on some mathematical logic.
As a social science researcher you will be
estimating the variance of the scale and of
categorical variables.
The sample standard deviation S
To estimate the standard deviation of a scaled variable we use the following
formula:
$S = \frac{7\ (\text{number of points on the scale})}{6\ (\text{number of standard deviations})} = \frac{7}{6} = 1.167$
The range of the scale is taken to span six standard deviations (three on each
side of the mean), which cover almost all (approximately 98%) of the variation.
Basic Sample Size Determination
Let us assume that a continuous variable is likely to play a role in the measurement of job
satisfaction.
You and your organization have decided to set the alpha level at 0.05, plan to use a 7-point scale,
set the level of acceptable error at 3%, and have estimated the standard deviation of the scale as
1.167. Then the formula is
$n = \frac{t^2 \cdot S^2}{d^2} = \frac{(1.96)^2 (1.167)^2}{(7 \times 0.03)^2} \approx 118$
Therefore the required sample size is 118.
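The calculation above can be checked with a short Python sketch. The function name `cochran_continuous` and its parameter names are assumptions introduced for illustration; the formula and the input values are the ones given in this lecture.

```python
def cochran_continuous(t, scale_points, margin, sd_span=6):
    """Sample size for a continuous variable: n = t^2 * S^2 / d^2."""
    s = scale_points / sd_span   # estimated standard deviation: 7 / 6 = 1.167
    d = scale_points * margin    # acceptable error in scale units: 7 * 0.03 = 0.21
    return (t ** 2) * (s ** 2) / (d ** 2)

n = cochran_continuous(t=1.96, scale_points=7, margin=0.03)
print(f"n = {n:.1f}")
```

The unrounded result lies between 118 and 119, and the lecture reports it as 118. Trying `margin=0.05` or `t=2.576` shows how strongly the acceptable error and the alpha level drive the required sample size.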
Suitability of the sample size
This sample size will be suitable if
the participants are a captive
audience or if you have control
over selecting random samples from
the population.
Categorical Data
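The sample size formula for categorical
data is similar but not identical:
• $n = \frac{t^2 \cdot p \cdot q}{d^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} \approx 384$ units,
where p is the estimated proportion, q = 1 - p, and d is the acceptable margin of error (here 0.05).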
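As a sketch of the categorical calculation, and of the earlier advice to select the highest n across the primary variables, the following Python snippet may help. The function name `cochran_categorical` is hypothetical, and the continuous-variable value of 118 is taken from the lecture rather than recomputed.

```python
def cochran_categorical(t, p, margin):
    """Sample size for a proportion: n = t^2 * p * q / d^2, with q = 1 - p."""
    q = 1 - p
    return (t ** 2) * p * q / (margin ** 2)

n_categorical = cochran_categorical(t=1.96, p=0.5, margin=0.05)
n_continuous = 118  # value reported in the lecture for the 7-point scale

# Choose the largest n so that every primary variable meets its margin of error
required = max(n_categorical, n_continuous)
print(f"categorical n = {n_categorical:.2f}, required n = {required:.0f}")
```

Using p = 0.5 maximizes the product p * q, so it gives the largest (most cautious) sample size when the true proportion is unknown.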
Dr. K. Prabhakar
Professor
Sample Size Determination
Online Refresher Course on Research Methods & Data Analysis in HRM
