Sampling & Hypothesis
Introduction to Sampling, Concepts of Population, Sample, Sampling Frame,
Sampling Error, Sample Size, Characteristics of a good sample, Types of
Sampling-Probability and Non-Probability, Determining Size of the Sample,
Sample Vs. Census, Introduction to Hypothesis: Meaning, Concepts & Types,
Type I and Type II Errors, Level of Significance, Testing of Hypotheses: Concepts,
Steps in Testing of Hypothesis, P-Value Approach.
Introduction to Sampling
• The process of selecting a number of individuals for a
study in such a way that the individuals represent the
larger group from which they were selected.
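• Population: The population is the entire group of elements (people, objects or events) about which the researcher wants to draw conclusions.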
• For example, the number of full-time MBA students in a business school could form one
population. If there are 200 such students, the population size would be 200. We may be
interested in understanding their perceptions about business education. If there are 200
class IV employees in an organization and we are interested in measuring their job
satisfaction, all the 200 class IV employees would form the population of interest. If a TV
manufacturing company produces 150 TVs per week and we are interested in estimating
the proportion of defective TVs produced per week, all the 150 TVs would form our
population. If an organization has 1,000 engineers, of whom 350 are mechanical
engineers, and we are interested in examining the proportion of mechanical engineers who
intend to leave the organization within six months, all 350 mechanical engineers would
form the population of interest. If the interest is in studying how the patients in a hospital are
looked after, then all the patients of the hospital would form the population.
• Element: An element comprises a single member of the
population. Out of the 350 mechanical engineers
mentioned above, each mechanical engineer would form
an element of the population. In the example of MBA
students whose perceptions about management
education are of interest to us, each of the 200 MBA
students will be an element of the population. This means
that there will be 200 elements of the population.
• Sampling frame: A sampling frame is the list of all the elements of a
population, with proper identification, that is available to us for selection at
any stage of sampling.
• For example, the list of registered voters in a constituency, the telephone
directory, the list of students registered with a university, the attendance
sheet of a particular class and the payroll of an organization are all examples
of sampling frames. When the population size is very large, it can become
virtually impossible to form a sampling frame. For instance, there is a very
large number of soft drink consumers, so it is very difficult to form a
sampling frame for them.
• Sample: It is a subset of the population. It comprises only some elements of
the population. If out of the 350 mechanical engineers employed in an
organization, 30 are surveyed regarding their intention to leave the
organization in the next six months, these 30 members would constitute the
sample.
• Sampling unit: A sampling unit is a single member of the sample. If a
sample of 50 students is taken from a population of 200 MBA students in a
business school, then each of the 50 students is a sampling unit. Another
example could be that if a sample of 50 patients is taken from a hospital to
understand their perception about the services of the hospital, each of the 50
patients is a sampling unit.
• Sampling: It is a process of selecting an adequate number of elements
from the population so that the study of the sample will not only help in
understanding the characteristics of the population but will also enable
us to generalize the results. We will see later that there are two types of
sampling designs—probability sampling design and non-probability
sampling design.
• Census (or complete enumeration): An examination of each and
every element of the population is called census or complete
enumeration. Census is an alternative to sampling. We will discuss the
inherent advantages of sampling over a complete enumeration later.
Process of Sampling
• ➢ Defining the target population
• ➢ Specifying the sampling frame
• ➢ Specifying the sampling unit
• ➢ Selection of sampling method
• ➢ Determination of sample size
• ➢ Specifying the sampling plan
• ➢ Selecting the sample
• 1) Defining the target population:- Defining the target population is
the first step of the sampling process. Normally, the target population is
defined in terms of sampling units, elements and time frame. A
well-defined population reduces the probability of including respondents
who do not fit the research objectives of the company.
• 2) Specifying the sampling frame:- After defining the target
population, the next step is to specify the sampling frame. It is a list of
those elements from which the sample may be drawn. When the
sampling frame does not represent the total population accurately,
a sampling frame error occurs.
• 3) Specifying the sampling unit:- It is the basic unit, containing a
single element or a group of elements of the population, that is to be
sampled.
• 4) Selection of sampling method:- The next step in the process of
sampling is the selection of the sampling method, which outlines the
way in which the sampling units are to be selected. The choice of
sampling method is influenced by the objectives of the business
research, time constraints, availability of financial resources, etc.
Sampling methods can be grouped into two categories, i.e.,
probability and non-probability sampling.
• 5) Determining the sample size:- The sample size plays an important
role in the process of sampling. There are many ways of classifying the
techniques used in determining the sample size.
• 6) Specifying the sampling plan:- After determining the size of the
sample, the next step is to specify the sampling plan. Under this step, the
decisions regarding implementation of the research process are outlined.
These are the guidelines that help the researcher at every step of the
process.
• 7) Selecting the sample:- It is the final step of the sampling process in
which the actual selection of the sample elements is carried out. It
involves implementing the sampling plan required for the survey.
• The sample size for any research study depends upon four Ps:
• 1. Purpose: The required precision of study.
• 2. Population: The size and nature of the population under study.
• 3. Procedure: The time, budget and resources available.
• 4. Publishing: The importance of the study.
• Note: The more heterogeneous or diverse the population is, the
larger the sample size should be.
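The slides do not give a formula for sample size, so the sketch below uses one common textbook approach for simple random sampling from a large population (no finite-population correction): n = (zσ/E)² to estimate a mean and n = z²p(1 − p)/E² to estimate a proportion, where E is the acceptable margin of error. The function names and the figures used (σ = 15 or 30, E = 3) are illustrative assumptions; in practice σ would come from a pilot study or past data.

```python
import math
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided critical z for a confidence level (about 1.96 for 0.95)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """n = (z * sigma / E)^2, rounded up; sigma is an assumed or pilot-study estimate."""
    z = z_value(confidence)
    return math.ceil((z * sigma / margin) ** 2)


def sample_size_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """n = z^2 * p * (1 - p) / E^2, rounded up; p = 0.5 is the most conservative guess."""
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(sample_size_mean(sigma=15, margin=3))        # estimate a mean within +/- 3 units
print(sample_size_mean(sigma=30, margin=3))        # same margin, more heterogeneous population
print(sample_size_proportion(p=0.5, margin=0.05))  # estimate a proportion within +/- 5 points
```

This prints 97, 385 and 385. Doubling σ (a more diverse population) roughly quadruples the required sample, which is the point of the note above.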
SAMPLING & NON-SAMPLING ERRORS
• There are two types of error that may occur when we estimate
population parameters from a sample. These are called sampling
and non-sampling errors.
• The difference between the value derived from a sample of a
population and the true value of the population parameter is
called sampling error. This error arises when a sample is not
representative of the population. It can be reduced, though not
eliminated short of a census, by increasing the sample size.
• Sampling errors do not occur in a census, as the census values
are based on the entire population.
• For example, suppose our population comprises 200 MBA students in a business
school and we want to estimate the average height of these 200 students by
taking a sample of, say, 10. Let us assume for the sake of simplicity that the
true value of population mean (parameter) is known. When we estimate the
average height of the sampled students, we may find that the sample mean is
far away from the population mean. The difference between the sample mean
and the population mean is called sampling error, and this could arise because
the sample of 10 students may not be representative of the entire population.
Suppose we now increase the sample size from 10 to 15; we may find that the
sampling error reduces. If we keep doing so, we may note that the sampling
error tends to decrease as the sample size increases, because a larger sample
tends to be more representative of the population.
• Non-sampling error
• This error arises not because the sample is unrepresentative of the population but for other reasons,
some of which are listed below:
• The respondents, when asked for information on a particular variable, may not give correct answers. If a
person aged 48 is asked his age, he may give it as 36, which would result in an error in estimating the true
value of the variable of interest.
• Errors can arise while transferring the data from the questionnaire to a spreadsheet on the computer.
• There can be errors at the time of coding, tabulation and computation.
• If the population of the study is not properly defined, it could lead to errors.
• The chosen respondent may not be available to answer the questions or may refuse to be part of the study.
• There may be a sampling frame error. Suppose the population comprises low-income, middle-income and
high-income households. The researcher might decide to ignore the low-income respondents and take the
sample only from the middle- and high-income categories.
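The MBA-height example of sampling error can be simulated. The sketch below invents a population of 200 heights (normally distributed with mean 168 cm and standard deviation 9 cm, purely an assumption), repeatedly draws simple random samples of each size, and averages the absolute difference between the sample mean and the population mean. It models sampling error only; non-sampling errors such as misreported answers are not represented.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: heights (cm) of 200 MBA students
population = [rng.gauss(168, 9) for _ in range(200)]
true_mean = fmean(population)  # the population parameter, assumed known here


def average_sampling_error(n: int, trials: int = 1000) -> float:
    """Average |sample mean - population mean| over many simple random samples of size n."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(population, n)  # drawn without replacement
        total += abs(fmean(sample) - true_mean)
    return total / trials


for n in (10, 15, 50, 100, 200):
    print(f"n = {n:3d}: average sampling error = {average_sampling_error(n):.3f} cm")
```

The error shrinks as n grows, and at n = 200 (a census of this population) it is zero, matching the statement that sampling errors do not occur in a census.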
How do we measure error?
• Two common measures of sampling error are the standard error and the relative standard error.
• Standard Error (SE)
• The Standard Error (SE) measures how far an estimate of a population value that is based
on a sample, rather than on the whole population, is likely to be from the true population value.
• The SE of an estimate reflects the typical size of the difference between the sample
estimate and the population parameter, taken over all possible samples from the population.
• It is important to consider the SE because it affects the accuracy of the estimates and,
therefore, the importance that can be placed on the interpretations drawn from the data.
• Formally, the SE is the standard deviation of the sampling distribution of an estimate.
• The Standard Error of the Mean (SEM) can be expressed as:
$$\mathrm{SEM} = \frac{s}{\sqrt{n}}$$
where s is the standard deviation of the population (in practice usually estimated by the
sample standard deviation) and n is the size (number of observations) of the sample.
• Relative Standard Error (RSE)
• The Relative Standard Error (RSE) is the standard error
expressed as a proportion of the estimated value and is
usually displayed as a percentage:
$$\mathrm{RSE} = \frac{\mathrm{SE}}{\text{estimate}} \times 100\%$$
• Whereas the SE indicates, as a number in the units of the
estimate, the extent to which a survey estimate is likely to
deviate from the true population value, the RSE indicates the
relative size of the error likely to have occurred due to sampling.
• A high RSE indicates less confidence that an estimated
value is close to the true population value.
• Estimates with an RSE of 25% or greater are subject to
high sampling error and should be used with caution.
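The SEM and RSE formulas can be computed directly. In this sketch the spending figures are invented, and the sample standard deviation (`statistics.stdev`) stands in for s, as is usual when the population standard deviation is unknown. The 25% threshold is the rule of thumb stated above.

```python
import math
import statistics


def standard_error_of_mean(sample: list[float]) -> float:
    """SEM = s / sqrt(n), with the sample standard deviation as the estimate of s."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def relative_standard_error(sample: list[float]) -> float:
    """RSE = SE / estimate * 100, expressed as a percentage."""
    return standard_error_of_mean(sample) / statistics.mean(sample) * 100


# Hypothetical monthly entertainment spending (Rs.) of 8 sampled employees
spending = [1200, 950, 1800, 700, 2500, 1100, 400, 3000]
se = standard_error_of_mean(spending)
rse = relative_standard_error(spending)
print(f"mean = {statistics.mean(spending):.2f}, SE = {se:.1f}, RSE = {rse:.1f}%")
if rse >= 25:
    print("RSE >= 25%: high sampling error, use the estimate with caution")
else:
    print("RSE below 25%")
```

Here the mean is 1456.25, the SE is about 319.8 and the RSE is about 22%, just under the caution threshold.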
Probability Sampling
• A probability sample is one in which every unit in the
population has a known, non-zero chance of being selected in the sample.
Types of probability sampling
• 1) Simple random sample: Every member of the population has a
known and equal chance of being selected. This technique gives
each element an equal and independent chance, or probability, of
selection. For example, if one student is to be selected from a
population of 25 students enrolled in Master of Commerce studies at a
college, each student has a 1/25 chance of being selected (if 5 students
are selected, each has a 5/25 = 1/5 chance). The method can be used for
populations of any size but works best when the population is relatively
homogeneous. Three methods can be used to draw a sample in this approach:
• a) the lottery method, b) random number tables and c)
computers.
• 2) Systematic sampling / fixed interval sampling: All items in the population list are given
serial numbers. The first unit of the sample is selected randomly, and the remaining units are
selected at a fixed interval (every kth element) in the series, where k = population size / sample size.
For example, suppose the management is going to select 5 students out of a population of 25
Master of Commerce students at a college. Then k = 25/5 = 5, and the process is:
• Starting number: Select the starting number randomly from 1 to k (here, 1 to 5); the researcher
can use the lottery method for this purpose.
• Interval: Each subsequent number is obtained by adding k (= 5), which serves as the constant
difference between any two consecutive numbers in the progression until the sample size is reached.
• Selection of sample: If the first number selected by the lottery method is 3, the second is
3 + k = 3 + 5 = 8, and so on. The sample is therefore 3, 8, 13, 18 and 23, giving the required
5 students.
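The two procedures above can be sketched in a few lines. This is an illustration, not a prescribed method: the function names are hypothetical, `random.Random.sample` plays the role of the lottery or random-number-table method, and the systematic version rounds k down (k = N // n) so the last selected serial number never runs past the end of the list.

```python
import random


def simple_random_sample(frame: list, n: int, rng: random.Random) -> list:
    """Lottery method in code: each element has an equal chance; drawn without replacement."""
    return rng.sample(frame, n)


def systematic_sample(frame: list, n: int, rng: random.Random) -> list:
    """Random start between 1 and k, then every k-th element, with k = N // n."""
    k = len(frame) // n                 # sampling interval
    start = rng.randint(1, k)           # random starting serial number (1-based)
    return [frame[start - 1 + i * k] for i in range(n)]


students = list(range(1, 26))           # serial numbers 1..25 of the M.Com students
rng = random.Random(7)                  # seeded for reproducibility
print("Simple random sample:", sorted(simple_random_sample(students, 5, rng)))
print("Systematic sample:   ", systematic_sample(students, 5, rng))
```

If the random start happens to be 3, the systematic sample is [3, 8, 13, 18, 23], as in the worked example.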
Cluster Sampling
• Cluster Sampling: A sampling technique in which the entire population is divided into
groups, or clusters, and a random sample of these clusters is selected. All observations
in the selected clusters are included in the sample. The most common cluster used in
research is a geographical cluster; other groupings, such as households, can also serve as
clusters. For example, a researcher wants to survey the academic performance of high school
students in Ramanagaram district. The process is:
• – Divide into groups/clusters: First, the researcher divides the entire population (the high
schools of Ramanagaram) into different clusters (taluks).
• – Select clusters: Then the researcher selects a number of clusters (taluks) through simple or
systematic random sampling.
• – Include the elements of the selected clusters: Finally, from the selected clusters (randomly
selected taluks), the researcher can either include all the high school students as subjects
or select a number of subjects from each cluster through simple or systematic
random sampling.
• As another example of cluster sampling, assume that a company has its corporate office in
a multi-storey building. On the first floor is the marketing department, housing the offices of
everyone from the president (marketing) and vice president (marketing) down to the
management trainee (marketing). Naturally, there would be a lot of variation (heterogeneity)
in the salaries they draw and hence a high amount of variation in the amount of money spent
on entertainment. Similarly, if the finance department is housed on the second floor, we may
find almost the same pattern, and the same could be assumed for the third, fourth and other
floors. If each floor is treated as a cluster, there is homogeneity between the clusters but a
lot of heterogeneity within them. Now a sample of, say, 2 to 3 clusters is chosen at random,
and each selected cluster is enumerated completely to estimate the amount of money the
entire population spends on entertainment.
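Single-stage cluster sampling, where whole clusters are drawn at random and every element inside them is kept, can be sketched as follows. The taluk and school names are hypothetical placeholders, not real data for Ramanagaram, and the helper name is illustrative.

```python
import random

# Hypothetical frame: high schools grouped into clusters (taluks); all names are illustrative
clusters = {
    "Taluk A": ["School A1", "School A2", "School A3"],
    "Taluk B": ["School B1", "School B2"],
    "Taluk C": ["School C1", "School C2", "School C3", "School C4"],
    "Taluk D": ["School D1", "School D2"],
}


def cluster_sample(clusters: dict, n_clusters: int, rng: random.Random) -> dict:
    """Single-stage cluster sampling: pick whole clusters at random and keep every element."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of clusters
    return {name: clusters[name] for name in chosen}   # every element of each chosen cluster


sample = cluster_sample(clusters, 2, random.Random(3))
for taluk, schools in sample.items():
    print(taluk, "->", schools)
```

Note that only the list of clusters is needed to make the random draw; element lists matter only for the clusters that are chosen.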
Stratified Random Sampling
• Stratified Random Sampling: The researcher divides
the entire heterogeneous population into different
non-overlapping homogeneous subgroups, or strata, and
sample items are selected randomly from each stratum
(group); the units drawn from all the strata together form
the sample. The most common strata used in stratified
random sampling are age, gender, socioeconomic status,
religion, nationality and educational attainment.
• The process is: divide members of the population into
homogeneous subgroups (strata).
• – The strata should be mutually exclusive (i.e., every
element in the population must be assigned to only one
stratum).
• – Then simple or systematic sampling is applied within
each stratum.
• – The units drawn from all the strata together make up the sample.
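A minimal sketch of stratified random sampling with proportional allocation (each stratum's share of the sample equals its share of the population). The age-group strata and employee IDs are invented, and the largest-remainder rounding rule is an assumption chosen so the per-stratum sizes always add up to the requested total.

```python
import random


def proportional_allocation(strata_sizes: dict, n: int) -> dict:
    """Split the total sample n across strata in proportion to stratum size.
    Rounding uses the largest-remainder rule so the parts add up to n."""
    N = sum(strata_sizes.values())
    exact = {s: n * size / N for s, size in strata_sizes.items()}
    alloc = {s: int(q) for s, q in exact.items()}
    leftover = n - sum(alloc.values())
    # give the units lost to rounding down to the strata with the largest fractional parts
    for s in sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_sample(strata: dict, n: int, rng: random.Random) -> dict:
    """Simple random sample within each non-overlapping stratum; together they form the sample."""
    alloc = proportional_allocation({s: len(m) for s, m in strata.items()}, n)
    return {s: rng.sample(members, alloc[s]) for s, members in strata.items()}


# Hypothetical population of 200 employees stratified by age group
strata = {
    "18-30": [f"A{i}" for i in range(90)],
    "31-45": [f"B{i}" for i in range(70)],
    "46+": [f"C{i}" for i in range(40)],
}
sample = stratified_sample(strata, 25, random.Random(5))
print({s: len(units) for s, units in sample.items()})
```

The exact shares are 11.25, 8.75 and 5, so the allocation becomes 11, 9 and 5. A disproportional design would replace `proportional_allocation` with sizes chosen on analytical grounds.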
Types of stratified sampling
Multi-Stage sampling
• Multi-Stage sampling-Using all the sample elements in all the selected
clusters may be expensive or unnecessary. Under these circumstances,
multistage cluster/stage sampling becomes useful. Instead of using all the
elements contained in the selected clusters, the researcher randomly
selects elements from each cluster. The process is
• – Constructing the clusters is the first stage.
• – Deciding what elements within the cluster to use is the second stage.
• The technique is used frequently when a complete list of all members of
the population does not exist or is impractical to compile.
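The two stages described above (construct and sample the clusters, then sample elements within each chosen cluster) can be sketched as follows. The colleges and seniors are hypothetical, and the helper name is illustrative; a real survey would only need element lists for the clusters picked in stage 1.

```python
import random


def two_stage_sample(clusters: dict, n_clusters: int, n_per_cluster: int,
                     rng: random.Random) -> dict:
    """Stage 1: simple random sample of clusters.
    Stage 2: simple random sample of elements within each selected cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c]))) for c in chosen}


# Hypothetical frame: 6 colleges (clusters), each listing 40 seniors (elements)
colleges = {f"College {c}": [f"{c}-senior{i}" for i in range(40)] for c in "ABCDEF"}
sample = two_stage_sample(colleges, n_clusters=3, n_per_cluster=5, rng=random.Random(11))
for college, seniors in sample.items():
    print(college, seniors)
```

Compared with single-stage cluster sampling, this interviews 15 seniors instead of all 120 in the three chosen colleges.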
2. Non-probability sampling
• Accidental sampling: Also known as grab, convenience
or opportunity sampling, this method draws the sample
from the part of the population that is close to
hand.
• That is, the sample is selected because it is readily
available and convenient, as researchers draw on
relationships or networks to which they have easy access.
Quota Sampling
• Quota Sampling –Quota sampling is a non-probability sampling technique
wherein the assembled sample has the same proportions of individuals as the
entire population with respect to known characteristics, traits or focused
phenomenon. The process is
• – Population is first segmented into mutually exclusive subgroups,
• – Then judgment is used to select the subjects or units from each segment
based on a specified proportion.
• For example, an interviewer may be told to sample 200 females and 300
males between the ages of 25 and 40.
• The sample mirrors the population's proportions on the chosen characteristics. It also allows the
researcher to study traits and characteristics that are noted for each subgroup.
• A researcher wants to measure the job satisfaction level among the
employees of a large organization and believes that the job satisfaction
level varies across different types of employees. The organization has
10 per cent, 15 per cent, 35 per cent and 40 per cent class I,
class II, class III and class IV employees, respectively. If a sample of
200 employees is to be selected from the organization, then 20, 30, 70
and 80 employees from class I, class II, class III and class IV
respectively should be selected from the population. Now, various
investigators may be assigned quotas from each class in such a way
that a sample of 200 employees is selected from various classes in the
same proportion as mentioned in the population.
• For example, the first field worker may be assigned a quota of 10
employees from class I, 15 from class II, 20 from class III and 30 from
class IV. Similarly, a second investigator may be assigned a different
quota such that a total sample of 200 is selected in the same
proportion as the population is distributed. Please note that the
investigators may choose the employees from each class as
conveniently available to them. Therefore, the sample may not be
totally representative of the population, hence the findings of the
research cannot be generalized. However, the reason for choosing
this sampling design is the convenience it offers in terms of effort, cost
and time.
• In the example given above, it may be argued that job satisfaction is
also influenced by education level, categorized as higher secondary or
below, graduation, and postgraduation and above. By incorporating
this variable, the distribution of population may look as given in Table.
• We may note that there are 8 per cent class I employees who are
postgraduates and above, 35 per cent class IV employees
with higher secondary education and below, and so on.
• Now, suppose a sample of size 200 is again proposed. In this case,
the sample distribution that satisfies both conditions in the same
proportions as the population is given in Table.
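The quota arithmetic in the job-satisfaction example can be checked with a short sketch. It only computes targets: how investigators then pick people within each quota is left to their convenience, which is exactly why quota sampling is non-probability. The helper name is hypothetical and assumes the percentages sum to 100.

```python
def quota_targets(population_shares: dict, n: int) -> dict:
    """Interviews per subgroup so the sample mirrors the population's proportions.
    Assumes shares are percentages summing to 100 that give whole numbers for this n."""
    return {group: round(n * pct / 100) for group, pct in population_shares.items()}


shares = {"Class I": 10, "Class II": 15, "Class III": 35, "Class IV": 40}
targets = quota_targets(shares, 200)
print(targets)    # {'Class I': 20, 'Class II': 30, 'Class III': 70, 'Class IV': 80}

# First field worker's quota from the text; the rest goes to other investigators
first_worker = {"Class I": 10, "Class II": 15, "Class III": 20, "Class IV": 30}
remaining = {g: targets[g] - first_worker[g] for g in targets}
print(remaining)  # {'Class I': 10, 'Class II': 15, 'Class III': 50, 'Class IV': 50}
```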
Purposive Sampling/Judgmental sampling
• The researcher chooses the sample based on who they
think would be appropriate for the study. This is used
primarily when there is a limited number of people who
have expertise in the area being researched.
Snowball sampling
• Snowball sampling: Also known as chain sampling, chain-referral
sampling or referral sampling, it is used by researchers to identify
potential subjects in studies where subjects are hard to locate. After
observing the initial subject, the researcher asks the subject to help
identify people with a similar trait of interest, so the sample group
appears to grow like a rolling snowball. For example, a researcher is
studying environmental engineers but can find only five. The researcher
asks these engineers if they know any more. They give several further
referrals, which in turn provide additional contacts. In this way, the
researcher manages to contact enough engineers.
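Snowball sampling behaves like a breadth-first walk through a referral network: interview the initial subjects, add the people they refer, and continue wave by wave until enough subjects are found. The referral network below is entirely hypothetical and the helper name is illustrative.

```python
from collections import deque


def snowball_sample(seeds: list, referrals: dict, target: int) -> list:
    """Start from initial subjects and follow referrals wave by wave until target is reached."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)                      # interview this person
        for contact in referrals.get(person, []):  # ask for others with the same trait
            if contact not in seen:                # skip people already contacted
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network among environmental engineers
referrals = {
    "E1": ["E6", "E7"], "E2": ["E7", "E8"], "E3": ["E9"],
    "E6": ["E10"], "E8": ["E11", "E12"], "E9": ["E1"],
}
initial = ["E1", "E2", "E3", "E4", "E5"]  # the five engineers the researcher could find
sample = snowball_sample(initial, referrals, target=10)
print(sample)
```

The `seen` set stops the same engineer from being counted twice when several people refer them (E7 here), a practical concern in real referral chains.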
Probability Sampling
1. Simple Random Sampling
• Meaning: The sampling procedure that ensures each element in the population will have an equal chance of being included in the sample is called simple random sampling.
• Example: Drawing names from a hat and selecting the winning lottery ticket from a large drum; drawing names or numbers out of a fishbowl, using a spinner, rolling dice, or turning a roulette wheel.
2. Systematic Sampling
• Meaning: A starting point is selected by a random process; then every nth number on the list is selected.
• Example: A researcher wants to take a sample of 1,000 from a list of 200,000 names. With systematic sampling, every 200th name from the list would be drawn.
• Example: To take a sample of consumers from a rural telephone directory that does not separate business from residential listings, every 23rd name might be selected.
3. Stratified Sampling
• Meaning: Step 1 - Choose strata (subgroups) on the basis of existing information. Step 2 - Draw a subsample using simple random sampling within each stratum. Dividing the population into subgroups, or strata, whose members are more or less equal with respect to some characteristic ensures that the sample will accurately reflect the population on the basis of the criterion or criteria used for stratification.
• Example: Classify retail outlets based on annual sales volume, then draw a sample from each stratum.
4. Proportional versus Disproportional Sampling
• Meaning: If the number of sampling units drawn from each stratum is in proportion to the relative population size of the stratum, the sample is a proportional stratified sample. In a disproportional stratified sample, the sample size for each stratum is not allocated in proportion to the population size but is dictated by analytical considerations, such as variability in store sales volume.
5. Cluster Sampling
• Meaning: The population is divided into clusters of units, usually based on geographical area, and a sample of such clusters is selected. Either all units from the selected clusters are studied, or sampling proceeds in two stages: first a sample of areas is chosen, then a sample of respondents within those areas is selected.
• Example: A grocery store researcher may randomly choose several geographic areas as primary sampling units and then interview all or a sample of grocery stores within the geographic clusters. Interviews are confined to these clusters only; no interviews occur in other clusters.
• Other examples: population element - college seniors, clusters - colleges; population element - airline travelers, clusters - airports.
6. Multistage Area Sampling
• Meaning: Typically, geographic areas are randomly selected in progressively smaller (lower-population) units.
Figure: Geographic Hierarchy inside Urbanized Areas (from U.S. Bureau of the Census, "Geography—Concepts and Products," Washington, DC, August)
• Example: In the first stage, a random number of districts is chosen in all states. This is followed by a random selection of talukas and then villages. At the final stage, the units are houses. All ultimate units (houses, for instance) selected at the last step are surveyed.
Nonprobability Sampling Methods
1. Convenience Sampling
• Meaning: It refers to sampling by obtaining people or units that are conveniently available.
• Example: Set up an interviewing booth to intercept consumers at a shopping mall.
2. Judgment Sampling (or Purposive Sampling)
• Meaning: An experienced individual selects the sample based on his or her judgment and experience about some appropriate characteristic required of the sample members.
• Example: A fashion manufacturer regularly selects a sample of key accounts that it believes are capable of providing the information needed to predict what may sell in the summer. Other examples: forecasting election results, selecting test-market cities, etc.
3. Quota Sampling
• Meaning: The interviewer has a quota to achieve. Quotas ensure that the various subgroups in a population are represented on pertinent sample characteristics to the exact extent that the investigators desire.
• Example: An interviewer in a particular city may be assigned 100 interviews: 35 with owners of Sony TVs, 30 with owners of Samsung TVs, 18 with owners of Panasonic TVs, and the rest with owners of other brands. The interviewer is responsible for finding enough people to meet the quota.
4. Snowball Sampling
• Meaning: A variety of procedures known as snowball sampling involve using probability methods for an initial selection of respondents and then obtaining additional respondents through information provided by the initial respondents. This technique is used to locate members of rare populations by referrals.
• Example: Suppose a manufacturer of sports equipment is considering marketing a mahogany croquet set for serious adult players. This market is certainly small. An extremely large sample would be necessary to find 100 serious adult croquet players. It would be much more economical to survey, say, 300 people, find 15 croquet players, and ask them for the names of other players.
• Another example: psychologists or psychiatrists.
Hypotheses
• A hypothesis is an assumption that is tested to check
whether the inference drawn from a sample of data
holds true for the entire population.
• Examples include testing whether a new drug is more effective
than the existing drug based on sample data, and whether the
proportion of smokers in a class is different from 0.30.
Essential Characteristics of Good Hypotheses:
They are declarative statements, not questions.
They are tentative—conjectures that await empirical evidence.
They must be testable.
They are statements of the relation between variables.
They must have at least two variables.
They are neither too specific nor too general.
They are predictions of consequences.
• Null hypothesis: The hypotheses that are proposed with the intent of receiving a
rejection for them are called null hypotheses. This requires that we hypothesize the
opposite of what is desired to be proved. For example, if we want to show that
sales and advertisement expenditure are related, we formulate the null hypothesis
that they are not related. Similarly, if we want to conclude that the new sales
training programme is effective, we formulate the null hypothesis that the new
training programme is not effective, and if we want to prove that the average wages
of skilled workers in town 1 are greater than those of town 2, we formulate the null
hypothesis that there is no difference in the average wages of the skilled workers
in the two towns. Since we hypothesize that sales and advertisement are not
related, that the new training programme is not effective and that the average wages of
skilled workers in both towns are equal, we call such hypotheses null hypotheses and
denote them as H0.
• The null hypothesis, denoted by H0, asserts that there is
no true difference between the sample statistic and the
hypothesized population parameter, and that any observed
difference is accidental, caused by fluctuations in sampling.
• Thus, the null hypothesis H0 states that there is no
difference between the assumed and the actual value of the
parameter.
• Alternative hypotheses: Rejection of null hypotheses leads to the
acceptance of alternative hypotheses. The rejection of null hypothesis
indicates that the relationship between variables (e.g., sales and
advertisement expenditure) or the difference between means (e.g., wages
of skilled workers in town 1 and town 2) or the difference between
proportions have statistical significance, and the acceptance of the null
hypothesis indicates that these differences are due to chance. As already
mentioned, the alternative hypothesis specifies the values or relationship that
the researcher believes hold true. The alternative hypothesis can cover a
whole range of values rather than a single point. The alternative
hypothesis is denoted by H1.
One-tailed and two-tailed tests:
• A test is called one-sided (or one-tailed) if the null
hypothesis is rejected only when the value of the test
statistic falls in one specified tail of the distribution. The
test is called two-sided (or two-tailed) if the null hypothesis
is rejected when the value of the test statistic falls in
either of the two tails of its sampling distribution.
• In statistics, the p-value is the probability of obtaining
results at least as extreme as the observed results of a
statistical hypothesis test, assuming that the null
hypothesis is correct.
• For example, consider a soft drink bottling plant which
dispenses soft drinks in bottles of 300 ml capacity. The
bottling is done through an automatic plant. An overfilling
of bottle (liquid content more than 300 ml) means a huge
loss to the company given the large volume of sales. An
underfilling means the customers are getting less than
300 ml of the drink when they are paying for 300 ml. This
could bring bad reputation to the company. The company
wants to avoid both overfilling and underfilling.
• Therefore, it would prefer to test whether the mean content
of the bottles is different from 300 ml. With μ denoting the mean
content of the bottles, this hypothesis could be written as:
$$H_0: \mu = 300 \qquad H_1: \mu \neq 300$$
• The hypotheses stated above are called two-tailed or two-
sided hypotheses. However, if the concern is the
overfilling of bottles, they could be stated as:
$$H_0: \mu \le 300 \qquad H_1: \mu > 300$$
• Such hypotheses are called one-tailed or one-sided hypotheses and
the researcher would be interested in the upper tail (right hand tail) of
the distribution.
• If, however, the concern is loss of reputation of the company
(underfilling of the bottles), the hypotheses may be stated as:
$$H_0: \mu \ge 300 \qquad H_1: \mu < 300$$
• These hypotheses are also one-tailed, and the researcher
would be interested in the lower tail (left-hand tail) of the
distribution.
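To connect the p-value definition with the three bottling hypotheses, the sketch below computes a z statistic and the matching two-tailed, upper-tailed and lower-tailed p-values. It assumes the process standard deviation is known (6 ml) and uses invented sample figures (36 bottles with a mean of 302 ml); with an unknown standard deviation and a small sample, a t-test would normally be used instead.

```python
import math
from statistics import NormalDist


def z_test_p_values(sample_mean: float, mu0: float, sigma: float, n: int) -> dict:
    """z statistic and p-values for hypotheses about a mean, assuming sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    phi = NormalDist().cdf  # standard normal CDF
    return {
        "z": z,
        "two_tailed": 2 * (1 - phi(abs(z))),  # H1: mu != 300 (over- or underfilling)
        "upper_tailed": 1 - phi(z),           # H1: mu > 300 (overfilling)
        "lower_tailed": phi(z),               # H1: mu < 300 (underfilling)
    }


# Hypothetical bottling data: 36 bottles, mean 302 ml, known process sd 6 ml
result = z_test_p_values(sample_mean=302, mu0=300, sigma=6, n=36)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Here z = 2. At α = 0.05 the two-tailed p-value (about 0.046) and the overfilling p-value (about 0.023) lead to rejecting H0, while the underfilling p-value (about 0.977) does not: the same data can support different conclusions depending on which tail the hypothesis concerns.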
Type I and Type II errors:
• The acceptance or rejection of a hypothesis is based
upon sample results, and there is always a possibility that
the sample is not representative of the population. This
could result in errors, as a consequence of which the
inferences drawn could be wrong.
Example - In a clinical trial of a new drug
• If the null hypothesis H0 is true and is accepted, or is false and is
rejected, the decision is correct in either case. However, if H0 is
rejected when it is actually true, the researcher commits what is
called a Type I error. The probability of committing a Type I error
is denoted by alpha (α) and is termed the level of significance.
Similarly, if H0 is accepted when it is false, the researcher commits
a Type II error. The probability of committing a Type II error is
denoted by beta (β). The quantity 1 – β is called the power of the test.
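• For example, let H0: there is no difference between the two drugs on
average. A Type I error occurs if we conclude that the two drugs
produce different effects when actually there isn't a difference; a
Type II error occurs if we conclude that there is no difference when
the drugs actually differ.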
• The probability of making a Type I error when the null hypothesis
is true as an equality is called the level of significance.
• Applications of hypothesis testing that only control the Type I error
are often called significance tests.
• Prob(Type I error) = significance level = α
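Both error probabilities can be estimated by simulation. The sketch below repeats a two-tailed z-test on many simulated samples: when the true mean equals the hypothesized 300 ml, every rejection is a Type I error, so the rejection rate should be close to α; when the true mean is 302 ml, the rejection rate estimates the power 1 − β. The process figures (σ = 6 ml, n = 36, true mean 302 ml) are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mu: float, mu0: float = 300, sigma: float = 6, n: int = 36,
                   alpha: float = 0.05, trials: int = 10000, seed: int = 1) -> float:
    """Share of simulated samples in which a two-tailed z-test rejects H0: mu = mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


type_1 = rejection_rate(true_mu=300)  # H0 true: every rejection is a Type I error
power = rejection_rate(true_mu=302)   # H0 false: rejections are correct decisions
print(f"estimated alpha (Type I error rate): {type_1:.3f}")
print(f"estimated power (1 - beta): {power:.3f}, so beta is about {1 - power:.3f}")
```

The estimated Type I error rate lands near 0.05, and the power comes out near 0.52, so with these assumed figures a 2 ml shift would be missed (a Type II error) roughly half the time. Increasing n or α raises power; lowering α reduces Type I errors at the cost of more Type II errors.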
• Hypothesis Testing Procedure: The following steps are
followed in hypothesis testing:
STEPS IN TESTING OF HYPOTHESIS
• 1. Set up a Hypothesis:
• The first step is to establish the hypothesis to be tested.
• The statistical hypothesis is an assumption about the
value of some unknown parameter, and the hypothesis
provides some numerical value or range of values for the
parameter.
• Here, two hypotheses about the population are constructed: the null hypothesis (H0) and the alternative hypothesis (H1).
• The two hypotheses are constructed in such a way that if one is true, the other is false, and vice versa. There can also be situations where the researcher is interested in establishing a relationship between two variables. In such a case, the null hypothesis is set as the hypothesis of no relationship between the two variables, whereas the alternative hypothesis is the hypothesis of a relationship between them. The rejection of the null hypothesis indicates that the difference or relationship is statistically significant, and the acceptance of the null hypothesis means that any observed difference or relationship is due to chance.
• 2. Set up a Suitable Significance Level:
• The next step is to choose a suitable level of significance. The level of significance, denoted by α, is chosen before drawing any sample; it is the probability of rejecting the null hypothesis when it is true. The value of α varies from problem to problem, but it is usually taken as either 5 per cent or 1 per cent. A 5 per cent level of significance means that, if the null hypothesis is true, there are 5 chances out of 100 that it will be rejected when it should have been accepted. In other words, when H0 is true, the researcher has a 95 per cent chance of correctly accepting it. Therefore, the confidence with which a researcher rejects or accepts a null hypothesis depends upon the level of significance.
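The statement that a 5 per cent level of significance gives 5 chances in 100 of rejecting a true H0 can be checked by simulation. A minimal sketch, assuming normally distributed fill volumes with a known standard deviation and a two-tailed Z test. The parameter values (300 ml, σ = 6 ml, n = 36) and the function name `simulated_type_i_rate` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulated_type_i_rate(mu0=300.0, sigma=6.0, n=36, alpha=0.05, trials=10_000, seed=1):
    """Draw many samples while H0 is true (population mean exactly mu0) and return
    the fraction of samples in which a two-tailed Z test wrongly rejects H0."""
    rng = random.Random(seed)                      # seeded so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cut-off, about 1.96
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:                        # a Type I error: H0 is true but rejected
            rejections += 1
    return rejections / trials

print(simulated_type_i_rate())  # close to 0.05
```

The simulated rejection rate should land close to the chosen α; calling it with `alpha=0.01` should bring the rate close to 1 per cent.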
• 3. Determining a Suitable Test Statistic:
• After the hypotheses are constructed and the significance level is decided upon, the next step is to determine a suitable test statistic and its distribution.
• Most test statistics take the following form:
• Test statistic = (Sample statistic – Hypothesized value of the population parameter) / Standard error of the sample statistic
• Depending upon various assumptions, the test statistic could be t, Z, χ2 or F.
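For a mean with known population standard deviation, a test statistic of the form (sample statistic – hypothesized value) / standard error becomes the one-sample Z statistic. A minimal sketch, assuming σ is known. The bottling figures (301.5 ml sample mean, σ = 6 ml, n = 36) and the function name `z_statistic` are hypothetical.

```python
from math import sqrt

def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """One-sample Z statistic for a mean:
    (sample statistic - hypothesized parameter) / standard error of the sample mean."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error

# Hypothetical data: 36 bottles with mean 301.5 ml, sigma assumed known as 6 ml.
print(z_statistic(sample_mean=301.5, mu0=300, sigma=6, n=36))  # 1.5
```

If σ were unknown and estimated from the sample, the same form with the sample standard deviation would give a t statistic instead.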
• 4. Determining the Critical Region:
• Before a sample is drawn from the population, it must be decided which values of the test statistic will lead to the acceptance of H0 and which will lead to its rejection. The set of values that leads to the rejection of H0 is called the critical region. Given a level of significance α, the optimal critical region for a two-tailed test consists of an area of α/2 in the right-hand tail of the distribution plus an area of α/2 in the left-hand tail, where the null hypothesis is rejected. Therefore, establishing a critical region is similar to determining a 100(1 – α) per cent confidence interval.
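The link between a two-tailed critical region and a 100(1 – α) per cent confidence interval can be checked directly. A minimal sketch, assuming a Z test with known σ. The sample means, σ = 6 ml, n = 36 and the function name `two_tailed_test_and_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_test_and_ci(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z test of H0: mu = mu0, plus the 100(1 - alpha)% confidence interval."""
    z_half = NormalDist().inv_cdf(1 - alpha / 2)   # area alpha/2 in each tail
    se = sigma / sqrt(n)
    z = (sample_mean - mu0) / se
    reject = abs(z) > z_half                       # z falls in either tail of the critical region
    ci = (sample_mean - z_half * se, sample_mean + z_half * se)
    outside_ci = not (ci[0] <= mu0 <= ci[1])       # hypothesized value lies outside the interval
    return reject, ci, outside_ci

for xbar in (301.5, 302.5, 297.0):
    reject, ci, outside = two_tailed_test_and_ci(xbar, mu0=300, sigma=6, n=36)
    print(f"mean={xbar}: reject H0={reject}, CI=({ci[0]:.2f}, {ci[1]:.2f}), 300 outside CI={outside}")
```

In each case H0 is rejected exactly when the hypothesized value 300 ml falls outside the 95 per cent confidence interval built around the sample mean.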
• 5. Computing the Value of the Test Statistic:
• The next step is to compute the value of the test statistic based upon a random sample of size n. Once the value of the test statistic is computed, one needs to examine whether the sample result falls in the critical region or in the acceptance region.
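Computing the statistic from raw sample data and locating it relative to the critical region can be sketched as follows. This assumes an upper-tailed Z test (the overfilling concern) with a known σ. The ten fill volumes and σ = 2 ml are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Hypothetical fill volumes (ml) from a random sample of n = 10 bottles.
volumes = [301.2, 299.8, 302.5, 300.9, 301.7, 300.4, 302.1, 301.0, 299.5, 300.9]
mu0, sigma, alpha = 300.0, 2.0, 0.05   # sigma assumed known; upper-tailed test (overfilling)

n = len(volumes)
z = (fmean(volumes) - mu0) / (sigma / sqrt(n))   # sample mean is 301.0 ml
z_crit = NormalDist().inv_cdf(1 - alpha)         # right-hand tail cut-off, about 1.645
region = "critical" if z > z_crit else "acceptance"
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, falls in the {region} region")
```

Here the computed z is about 1.581, just short of the critical value, so the result falls in the acceptance region.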
• 6. Making a Decision:
• The hypothesis is rejected or accepted depending upon whether the value of the test statistic falls in the rejection (critical) region or in the acceptance region. Management decisions are based upon this statistical decision of either rejecting or accepting the null hypothesis. If the hypothesis is being tested at the 5 per cent level of significance, it would be rejected if the probability of obtaining results at least as extreme as those observed, assuming H0 is true, is less than 5 per cent. In such a case, the difference between the sample statistic and the hypothesized population parameter is considered significant. On the other hand, if the hypothesis is accepted, the difference between the sample statistic and the hypothesized population parameter is not regarded as significant and can be attributed to chance.
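The decision rule above compares the probability of results at least as extreme as those observed (the p-value) with α. A minimal sketch of this p-value approach, assuming a Z statistic whose null distribution is standard normal. The function names `p_value` and `decide` are hypothetical.

```python
from statistics import NormalDist

def p_value(z: float, tail: str = "two") -> float:
    """Probability, assuming H0 is true, of a Z statistic at least as extreme as z."""
    std_normal = NormalDist()
    if tail == "upper":
        return 1 - std_normal.cdf(z)
    if tail == "lower":
        return std_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

def decide(z: float, alpha: float = 0.05, tail: str = "two") -> str:
    """Reject H0 when the p-value is below the level of significance."""
    p = p_value(z, tail)
    verdict = "reject H0 (significant)" if p < alpha else "accept H0 (attributable to chance)"
    return f"p = {p:.4f}: {verdict}"

print(decide(1.5))                # two-tailed test
print(decide(2.5))                # two-tailed test
print(decide(1.5, tail="upper"))  # one-tailed test, right-hand tail
```

The same z = 1.5 gives a two-tailed p-value of about 0.134 but a one-tailed p-value of about 0.067, so the choice of tail, made before sampling, matters for the decision.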

ARM Module 4 advanced research methodology

  • 1.
    Sampling & Hypothesis Introductionto Sampling, Concepts of Population, Sample, Sampling Frame, Sampling Error, Sample Size, Characteristics of a good sample, Types of Sampling-Probability and Non-Probability, Determining Size of the Sample, Sample Vs. Census, Introduction to Hypothesis: Meaning, Concepts & Types, Type I and Type II Errors, Level of Significance, Testing of Hypotheses: Concepts, Steps in Testing of Hypothesis, P-Value Approach.
  • 3.
    Introduction to Sampling •The process of selecting a number of individuals for a study in such a way that the individuals represent the larger group from which they were selected.
  • 4.
    • Population • Forexample, the number of full-time MBA students in a business school could form one population. If there are 200 such students, the population size would be 200. We may be interested in understanding their perceptions about business education. If there are 200 class IV employees in an organization and we are interested in measuring their job satisfaction, all the 200 class IV employees would form the population of interest. If a TV manufacturing company produces 150 TVs per week and we are interested in estimating the proportion of defective TVs produced per week, all the 150 TVs would form our population. If, in an organization there are 1000 engineers, out of which 350 are mechanical engineers and we are interested in examining the proportion of mechanical engineers who intend to leave the organization within six months, all the 350 mechanical engineers would form the population of interest. If the interest is in studying how the patients in a hospital are looked after, then all the patients of the hospital would fall under the category of population.
  • 5.
    • Element: Anelement comprises a single member of the population. Out of the 350 mechanical engineers mentioned above, each mechanical engineer would form an element of the population. In the example of MBA students whose perception about the management education is of interest to us, each of the 200 MBA students will be an element of the population. This means that there will be 200 elements of the population.
  • 6.
    • Sampling frame:Sampling frame comprises all the elements of a population with proper identification that is available to us for selection at any stage of sampling. • For example, the list of registered voters in a constituency could form a sampling frame; the telephone directory; the number of students registered with a university; the attendance sheet of a particular class and the payroll of an organization are examples of sampling frames. When the population size is very large, it becomes virtually impossible to form a sampling frame. We know that there is a large number of consumers of soft drinks and, therefore, it becomes very difficult to form the sampling frame for the same.
  • 7.
    • Sample: Itis a subset of the population. It comprises only some elements of the population. If out of the 350 mechanical engineers employed in an organization, 30 are surveyed regarding their intention to leave the organization in the next six months, these 30 members would constitute the sample. • Sampling unit: A sampling unit is a single member of the sample. If a sample of 50 students is taken from a population of 200 MBA students in a business school, then each of the 50 students is a sampling unit. Another example could be that if a sample of 50 patients is taken from a hospital to understand their perception about the services of the hospital, each of the 50 patients is a sampling unit.
  • 8.
    • Sampling: Itis a process of selecting an adequate number of elements from the population so that the study of the sample will not only help in understanding the characteristics of the population but will also enable us to generalize the results. We will see later that there are two types of sampling designs—probability sampling design and non-probability sampling design. • Census (or complete enumeration): An examination of each and every element of the population is called census or complete enumeration. Census is an alternative to sampling. We will discuss the inherent advantages of sampling over a complete enumeration later.
  • 9.
    Process of Sampling •➢ Defining the target population • ➢ Specifying the sampling frame • ➢ Specifying the sampling unit • ➢ Selection of sampling method • ➢ Determination of sampling size • ➢ Specifying the sampling plan • ➢ Selecting the sample
  • 10.
    • 1) Definingthe target population:- Defining the target population is the first step of the sampling process. Normally, target population is defined in the terms of sampling unit, elements and time frame. A well-defined population reduces the probability of those respondents who do not fit for the research objectives of the company. • 2) Specifying the sampling frame:- After defining the targeted population, the next step is to specify the sampling frame. It is a list of those elements from which the sample may be drawn. When the sampling frame does not represent the total population accurately, the sampling frame error pops up.
  • 11.
    • 3) Specifyingthe sampling unit:- It isthe basic unit which contains a single or group of elements of the population to be sampled. • 4) Selection of sampling Method:- The next step under the process of sampling is the selection of sampling method, which outlines the way in which the sample unit is to be selected. The sampling method choice is influenced by the objectives of the business research, time constraints, availability of financial resources etc.sampling methods can be categorized under two categories i.e., probability and non-probability sampling.
  • 12.
    • 5) Determiningthe sample size:- The sample size plays an important role in the process of sampling. There are many ways of classifying the techniques used in determining the sample size. • 6) Specifying the sampling plan:- After determining the size of the sample, the next step is to specify the sampling plan. Under this step, the decision regarding implementation of the research process is outlined. There are the guidelines which would help the researcher in every step of the process. • 7) Selecting the sample:- It is the final step of the sampling process in which the actual selection of the sample elements is carried out. It involves implementing the sampling plan required for the survey.
  • 13.
    • The samplesize for any research study depends upon four Ps: • 1. Purpose: The required precision of study. • 2. Population: The size and nature of population under study • 3. Procedure: The time, budget and resources available. • 4. Publishing: The importance of the studies. • Note: The more heterogeneous or diverse the population is, the bigger should be the sample size.
  • 14.
    SAMPLING & NON-SAMPLINGERRORS • There are two types of error that may occur while we are trying to estimate the population parameters from the sample. These are called sampling and nonsampling errors • The difference between the values derived from the sample of a population and the true values of the population parameters is considered a sampling error. The errors can be eliminated by increasing the sample size or the number of samples. : This error arises when a sample is not representative of the population. • Sampling errors do not occur in a census, as the census values are based on the entire population.
  • 15.
    • For example,if our population comprises 200 MBA students in a business school and we want to estimate the average height of these 200 students by taking a sample of 10 (say). Let us assume for the sake of simplicity that the true value of population mean (parameter) is known. When we estimate the average height of the sampled students, we may find that the sample mean is far away from the population mean. The difference between the sample mean and the population mean is called sampling error, and this could arise because the sample of 10 students may not be representative of the entire population. Suppose now we increase the sample size from 10 to 15, we may find that the sampling error reduces. This way, if we keep doing so, we may note that the sampling error reduces with the increase in sample size as an increased sample may result in increasing the representativeness of the sample.
  • 16.
    • Non-sampling error •This error arises not because a sample is not a representative of the population but because of other reasons. Some of these reasons are listed below: • • The respondents when asked for information on a particular variable may not give the correct answers. If a person aged 48 is asked a question about his age, he may indicate the age to be 36, which may result in an error and in estimating the true value of the variable of interest. • • The error can arise while transferring the data from the questionnaire to the spreadsheet on the computer. • • There can be errors at the time of coding, tabulation and computation. • • If the population of the study is not properly defined, it could lead to errors. • • The chosen respondent may not be available to answer the questions or may refuse to be part of the study. • • There may be a sampling frame error. Suppose the population comprises households with low income, high income and middle class category. The researcher might decide to ignore the low-income category respondents and may take the sample only from the middle and the high-income category people.
  • 18.
    How do wemeasure error? • • Two common measures of error are: • standard error and the relative standard error.
  • 19.
    • Standard Error(SE) • • Standard Error (SE) is a measure of the variation between any estimated population value that is based on a sample rather than true value for the population. • • SE of any estimate for a measure of average magnitude of the difference between sample estimate and population parameters taken over the all sample estimate from the population. • • It is important to consider the Standard Error as it affects the accuracy of the estimates and, therefore, the importance that can be placed on the interpretations drawn from the data. • • SE is applied for standard deviation of sampling distribution of any estimate • • The Standard Error of the Mean (SEM) can be expressed as: where s is the standard deviation of the population. n is the size (number of observations) of the sample.
  • 20.
    • Relative StandardError (RSE) • Relative Standard Error (RSE) is the standard error expressed as a proportion of an estimated value. It is usually displayed as a percentage. RSEs are a useful measure as they provide an indication of the relative size of the error likely to have occurred due to sampling. • A high RSE indicates less confidence that an estimated value is close to the true population value.
  • 21.
    • • TheStandard Error measure indicates the extent to which a survey estimate is likely to deviate from the true population and is expressed as a number. • • The Relative Standard Error (RSE) is the standard error expressed as a fraction of the estimate and is usually expressed as a percentage. • • Estimates with a RSE of 25% or greater are subject to high sampling error and should be used with caution
  • 23.
    Probability Sampling… • Aprobability sampling is one in which every unit in the population has a chance of being selected in the sample.
  • 24.
    Types of probabilitysampling • 1) Simple random sample- Every member of the population has a known and equal chance of being selected. This sample technique gives each element an equal and independent chance or probability of selection. For example in a population of 25 students in a college under master of commerce studies each student has 1/25th chance of being selected. This method can be used for populations of any size with homogenous character. Three methods can be used to draw sample in this approach • a) lottery method, b) use of random table number and c) Computers.
  • 25.
    • 2) SystematicSampling/Fixed interval sampling-The entire list of items of the population are given serial numbers. Thereafter the sample items are selected with equal intervals, then the first unit of a sample is selected randomly and the remaining units at the fixed interval (K • th element) in a given series. In this case, k = (population size/sample size). For example in a population of 25 students in a college under master of commerce studies. The management is going to select 5 students out of 25 then the process is Starting number: Select the starting number randomly for this purpose researcher can use lottery method taking 1-3 number (k = population size/sample size 25/5=5 is the k th number) • • Interval: The researcher picks second number taking interval of k th (k=5) which will serve as the constant difference between any two consecutive numbers in the progression till the sample size. • • Selection of Sample- first sample number selected randomly by using lottery method is 3 the second sample is (3+kth i.e. 3+5=8) then and so on. E.g. sample is 3, 8, 13, 18 and so on till sample size of 5 students
  • 26.
    Cluster Sampling • ClusterSampling-is a sampling technique where the entire population is divided into groups, or clusters, and a random sample of these clusters are selected. All observations in the selected clusters are included in the sample.The most common cluster used in research is a geographical cluster. (E.g. household, income levels, etc) .For example, a researcher wants to survey academic performance of high school students in Ramanagaram district. The process is – Divide in to groups/cluster-First the Research can divide the entire population (high schools of Ramanagaram) into different clusters (taluk). • – Select cluster-Then the researcher selects a number of clusters (taluk) through simple or systematic random sampling. • – Selected cluster include all the element-Then, from the selected clusters (randomly selected Taluk) the researcher can either include all the high school students as subjects or he can select a number of subjects from each cluster through simple or systematic random sampling.
  • 27.
    • the exampleof a cluster sampling, one may assume that there is a company having its corporate office in a multi-storey building. In the first floor, we may assume that there is a marketing department where the offices of the president (marketing), vice president (marketing) and so on to the level of management trainee (marketing) are there. Naturally, there would be a lot of variation (heterogeneity) in the amount of salaries they draw and hence a high amount of variation in the amount of money spent on entertainment. Similarly, if the finance department is housed on the second floor, we may find almost a similar pattern. Same could be assumed for third, fourth and other floors. Now, if each of the floors could be treated as a cluster, we find that there is homogeneity between the clusters but there is a lot of heterogeneity within the clusters. Now, a sample of, say, 2 to 3 clusters is chosen at random and once having done so, each of the cluster is enumerated completely to be able to make an estimate of the amount of money the entire population spends on entertainment.
  • 28.
    Stratified Random Sampling •Stratified Random Sampling- The researcher divides the entire heterogeneous population into different nonoverlapping homogeneous subgroups or strata, and sample items are selected from each stratum (group) randomly, all the units drawn from each stratum is called sample size. The most common strata used in stratified random sampling are age, gender, socioeconomic status, religion, nationality and educational attainment.
  • 30.
    • The processis – divide members of the population into homogeneous subgroups (stratum) • – The strata should be mutually exclusive (i.e. every element in the population must be assigned to only one stratum) • – Then simple or systematic sampling is applied within each stratum • – The units drawn from each stratum is called sample size.
  • 31.
  • 32.
    Multi-Stage sampling • Multi-Stagesampling-Using all the sample elements in all the selected clusters may be expensive or unnecessary. Under these circumstances, multistage cluster/stage sampling becomes useful. Instead of using all the elements contained in the selected clusters, the researcher randomly selects elements from each cluster. The process is • – Constructing the clusters is the first stage. • – Deciding what elements within the cluster to use is the second stage. •  The technique is used frequently when a complete list of all members of the population does not exist and is inappropriate.
  • 33.
    2. Non-probability sampling •Accidental sampling-It is known as grab or convenience sampling or opportunity sampling. the sample being drawn from that part of the population that is close to hand. • That is, sample populations selected because it is readily available and convenient, as researchers are drawing on relationships or networks to which they have easy access.
  • 34.
    Quota Sampling • QuotaSampling –Quota sampling is a non-probability sampling technique wherein the assembled sample has the same proportions of individuals as the entire population with respect to known characteristics, traits or focused phenomenon. The process is • – Population is first segmented into mutually exclusive subgroups, • – Then judgment is used to select the subjects or units from each segment based on a specified proportion. • • For example, an interviewer may be told to sample 200 females and 300 males between the age of 25 and 40. • • The sample is representative of the entire population. It also allows the researcher to study traits and characteristics that are noted for each subgroup.
  • 35.
    • A researcherwants to measure the job satisfaction level among the employees of a large organization and believes that the job satisfaction level varies across different types of employees. The organization is having 10 per cent, 15 per cent, 35 per cent and 40 per cent, class I, class II, class III and class IV, employees, respectively. If a sample of 200 employees is to be selected from the organization, then 20, 30, 70 and 80 employees from class I, class II, class III and class IV respectively should be selected from the population. Now, various investigators may be assigned quotas from each class in such a way that a sample of 200 employees is selected from various classes in the same proportion as mentioned in the population.
  • 36.
    • For example,the first field worker may be assigned a quota of 10 employees from class I, 15 from class II, 20 from class III and 30 from class IV. Similarly, a second investigator may be assigned a different quota such that a total sample of 200 is selected in the same proportion as the population is distributed. Please note that the investigators may choose the employees from each class as conveniently available to them. Therefore, the sample may not be totally representative of the population, hence the findings of the research cannot be generalized. However, the reason for choosing this sampling design is the convenience it offers in terms of effort, cost and time.
  • 37.
    • In theexample given above, it may be argued that job satisfaction is also influenced by education level, categorized as higher secondary or below, graduation, and postgraduation and above. By incorporating this variable, the distribution of population may look as given in Table. • we may note that there are 8 per cent class I employees who are postgraduate and above, there are 35 per cent class IV employees with a higher secondary education and below and so on. • Now, suppose a sample of size 200 is again proposed. In this case, the distribution of sample satisfying these two conditions in the same proportion in the population is given in Table.
  • 40.
    Purposive Sampling/Judgmental sampling •the researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when there is a limited number of people that have expertise in the area being researched.
  • 41.
    Snowball sampling • Snowballsampling- it is also known as chain sampling, chain-referral sampling, referral sampling. It is used by researchers to identify potential subjects in studies where subjects are hard to locate. After observing the initial subject, the researcher asks for assistance from the subject to help identify people with a similar trait of interest. The sample group appears to grow like a rolling snowball. For example a researcher is studying environmental engineers but can only find five. The researcher asks these engineers if they know any more. They give several further referrals, which in turn provide additional contacts. In this way, researcher manages to contact sufficient engineers.
  • 42.
    Type of SamplingMeaning Example 1. Simple Random Sampling The sampling procedure that ensures each element in the population will have an equal chance of being included in the sample is called simple random sampling . Drawing names from a hat and selecting the winning lottery ticket from a large drum. Drawing names or numbers out of a fishbowl, using a spinner, rolling dice, or turning a roulette wheel 2. Systematic Sampling A starting point is selected by a random process; then every nth number on the list is selected. A researcher wants to take a sample of 1,000 from a list of 200,000 names. With systematic sampling , every 200th name from the list would be drawn. To take a sample of consumers from a rural telephone directory that does not separate business from residential listings, every 23rd name might be selected Probability Sampling
  • 43.
    Type of Sampling Meaning Example 3.Stratified Sampling Step 1 - Choosing strata (subgroups) on the basis of existing information Step 2 - A subsample is drawn using simple random sampling within each stratum. The usefulness of dividing the population into subgroups, or strata, whose members are more or less equal with respect to some characteristic A stratified sample is to ensure that the sample will accurately reflect the population on the basis of the criterion or criteria used for stratification. Classifying retail outlets based on annual sales volume. Draw a sample from each stratum Probability Sampling
  • 44.
    Type of Sampling Meaning Example 4.Proportional versus Disproportional Sampling If the number of sampling units drawn from each stratum is in proportion to the relative population size of the stratum, the sample is a proportional stratified sample . In a disproportional stratified sample the sample size for each stratum is not allocated in proportion to the population size but is dictated by analytical considerations, such as variability in store sales volume. Probability Sampling
  • 45.
    Type of Sampling Meaning Example 5.Cluster Sampling First stage - a sample of areas is chosen; Second stage - a sample of respondents within those areas is selected. Population is divided into clusters of homogeneous units, usually based on geographical area. A sample of such clusters is then selected. All units from the selected clusters are studied. A grocery store researcher, for example, may randomly choose several geographic areas as primary sampling units and then interview all or a sample of grocery stores within the geographic clusters. Interviews are confined to these clusters only. No interviews occur in other clusters. Population Element - College seniors Clusters - Colleges Population Element - Airline travelers Clusters - Airports Probability Sampling
  • 46.
    Type of Sampling Meaning Example 6.Multistage Area Sampling Typically, geographic areas are randomly selected in progressively smaller (lower-population) units. Geographic Hierarchy inside Urbanized Areas From U.S. Bureau of the Census, “Geography —Concepts and Products,” Washington, DC, August First stage, random number of districts chosen in all states. Followed by random number of talukas, villages. Then third stage units will be houses. All ultimate units (houses, for instance) selected at last step are surveyed. Probability Sampling
  • 47.
    Type of Sampling Meaning Example 1.Convenience Sampling It refers to sampling by obtaining people or units that are conveniently available. Set up an interviewing booth to intercept consumers at a shopping mall. 2. Judgment Sampling (or Purposive Sampling) An experienced individual selects the sample based on his or her judgment and experience. It will be about some appropriate characteristic required of the sample member. A fashion manufacturer regularly selects a sample of key accounts that it believes are capable of providing information needed to predict what may sell in the summer. Forecast election results, Test-market cities etc. Nonprobability Sampling Methods
  • 48.
    Type of SamplingMeaning Example 3. Quota Sampling The interviewer has a quota to achieve. ensure that the various subgroups in a population are represented on pertinent sample characteristics to the exact extent that the investigators desire. An interviewer in a particular city may be assigned 100 interviews, 35 with owners of Sony TVs, 30 with owners of Samsung TVs, 18 with owners of Panasonic TVs, and the rest with owners of other brands. The interviewer is responsible for finding enough people to meet the quota. 4. Snowball Sampling A variety of procedures known as snowball sampling involve using probability methods for an initial selection of respondents and then obtaining additional respondents through information provided by the initial respondents. This technique is used to locate members of rare populations by referrals. Suppose a manufacturer of sports equipment is considering marketing a mahogany croquet set for serious adult players. This market is certainly small. An extremely large sample would be necessary to find 100 serious adult croquet players. It would be much more economical to survey, say, 300 people, find 15 croquet players, and ask them for the names of other players. - Psychologists or Psychiatrists Nonprobability Sampling Methods
  • 52.
  • 53.
    • The Hypothesisis an assumption which is tested to check whether the inference drawn from the sample of data stands true for the entire population or not. • whether a new drug is more effective than the existing drug based on the sample data, and whether the proportion of smokers in a class is different from 0.30.
  • 54.
    Essential Characteristics ofGood Hypotheses: They are declarative statements, not questions. They are tentative—conjectures that await empirical evidence. They must be testable. They are statements of the relation between variables. They must have at least two variables. It is neither too specific nor to general It is a prediction of consequences
  • 55.
    • Null hypothesis:The hypotheses that are proposed with the intent of receiving a rejection for them are called null hypotheses. This requires that we hypothesize the opposite of what is desired to be proved. For example, if we want to show that sales and advertisement expenditure are related, we formulate the null hypothesis that they are not related. Similarly, if we want to conclude that the new sales training programme is effective, we formulate the null hypothesis that the new training programme is not effective, and if we want to prove that the average wages of skilled workers in town 1 is greater than that of town 2, we formulate the null hypotheses that there is no difference in the average wages of the skilled workers in both the towns. Since we hypothesize that sales and advertisement are not related, new training programme is not effective and the average wages of skilled workers in both the towns are equal, we call such hypotheses null hypotheses and denote them as H0.
• The null hypothesis, denoted H0, asserts that there is no true difference between the sample statistic and the population parameter, and that any observed difference is accidental, caused by fluctuations in sampling. • Thus, a null hypothesis states H0: there is no difference between the assumed and the actual value of the parameter.
• Alternative hypothesis: Rejection of the null hypothesis leads to acceptance of the alternative hypothesis. Rejecting the null hypothesis indicates that the relationship between variables (e.g., sales and advertisement expenditure), the difference between means (e.g., wages of skilled workers in town 1 and town 2), or the difference between proportions is statistically significant; accepting the null hypothesis indicates that these differences can be attributed to chance. As already mentioned, the alternative hypothesis specifies the values or relationship that the researcher believes to hold true. It can cover a whole range of values rather than a single point. The alternative hypothesis is denoted H1.
One-tailed and two-tailed tests: • A test is called one-sided (or one-tailed) if the null hypothesis is rejected only when the value of the test statistic falls in one specified tail of its sampling distribution. A test is called two-sided (or two-tailed) if the null hypothesis is rejected when the value of the test statistic falls in either of the two tails of its sampling distribution.
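A small sketch of how the rejection rule differs between the two kinds of test, assuming the test statistic follows a standard normal (Z) distribution under H0. The function names and the "two"/"upper"/"lower" labels are illustrative choices, not standard terminology from any library.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical z value(s) that bound the rejection region of a Z test.

    tail: "two" (reject in either tail), "upper" (right tail) or "lower" (left tail).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)   # alpha is split equally between the tails
        return (-c, c)
    if tail == "upper":
        return (z.inv_cdf(1 - alpha),)  # all of alpha in the right tail
    if tail == "lower":
        return (z.inv_cdf(alpha),)      # all of alpha in the left tail
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

def reject_h0(z_stat, alpha=0.05, tail="two"):
    """True if z_stat falls in the rejection region for the chosen tail(s)."""
    crit = critical_values(alpha, tail)
    if tail == "two":
        return z_stat <= crit[0] or z_stat >= crit[1]
    if tail == "upper":
        return z_stat >= crit[0]
    return z_stat <= crit[0]

print(critical_values(0.05, "two"))    # approximately (-1.96, 1.96)
print(critical_values(0.05, "upper"))  # approximately (1.645,)
print(reject_h0(1.8, 0.05, "upper"), reject_h0(1.8, 0.05, "two"))
```

The last line shows the practical consequence: an observed z of 1.8 lies beyond the one-tailed cut-off of about 1.645 but inside the two-tailed cut-offs of about ±1.96, so the same sample result is significant in a one-tailed test and not significant in a two-tailed test at α = 0.05.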
• In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct.
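The definition can be checked directly by simulation. The sketch below assumes a Z test whose statistic is standard normal under H0 and uses a two-tailed notion of "at least as extreme" (|Z| ≥ |z_obs|). It computes the p-value exactly from the normal distribution and then estimates the same probability by drawing many test statistics from the null distribution and counting how often they are at least as extreme as the observed one.

```python
import random
from statistics import NormalDist

def p_value_exact(z_obs):
    """Two-tailed p-value: P(|Z| >= |z_obs|) when Z ~ N(0, 1) under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

def p_value_simulated(z_obs, draws=200_000, seed=0):
    """Estimate the same probability by sampling statistics from the null distribution."""
    rng = random.Random(seed)
    extreme = sum(abs(rng.gauss(0, 1)) >= abs(z_obs) for _ in range(draws))
    return extreme / draws

print(p_value_exact(1.8), p_value_simulated(1.8))  # both close to 0.072
```

The two numbers agree up to simulation noise, which is the point: the p-value is nothing more than the share of results, generated under H0, that are at least as extreme as what was observed.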
• For example, consider a soft drink bottling plant that fills bottles of 300 ml capacity. Bottling is done by an automatic plant. Overfilling (liquid content of more than 300 ml) means a huge loss to the company, given its large sales volume. Underfilling means that customers get less than 300 ml of the drink while paying for 300 ml, which could damage the company's reputation. The company therefore wants to avoid both overfilling and underfilling.
• Therefore, it would test whether the mean content of the bottles, μ, is different from 300 ml. This hypothesis could be written as:
H0: μ = 300 ml
H1: μ ≠ 300 ml
• The hypotheses stated above are called two-tailed or two-sided hypotheses. However, if the concern is only overfilling of bottles, the hypotheses could be stated as:
H0: μ ≤ 300 ml
H1: μ > 300 ml
• Such hypotheses are called one-tailed or one-sided hypotheses, and the researcher would be interested in the upper (right-hand) tail of the distribution.
• If, however, the concern is the company's reputation (underfilling of bottles), the hypotheses may be stated as:
H0: μ ≥ 300 ml
H1: μ < 300 ml
• This is also a one-tailed test, and the researcher would be interested in the lower (left-hand) tail of the distribution.
Type I and Type II errors: • The acceptance or rejection of a hypothesis is based on sample results, and there is always a possibility that the sample is not representative of the population. As a consequence, the inferences drawn could be wrong.
Example: a clinical trial of a new drug
• If the null hypothesis H0 is true and is accepted, or is false and is rejected, the decision is correct in either case. However, if H0 is rejected when it is actually true, the researcher commits a Type I error. The probability of committing a Type I error is denoted by alpha (α) and is termed the level of significance. Similarly, if H0 is accepted when it is false, the researcher commits a Type II error. The probability of committing a Type II error is denoted by beta (β), and 1 – β is called the power of the test.
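To make β and the power 1 – β concrete, the sketch below computes them for an upper-tailed Z test of H0: μ = μ0 against H1: μ > μ0. It assumes the population standard deviation σ is known and that the true mean is some specific value μ1; the numbers in the usage line (μ0 = 300 ml, μ1 = 302 ml, σ = 5 ml, n = 25) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta_and_power(mu0, mu1, sigma, n, alpha=0.05):
    """Type II error probability (beta) and power of an upper-tailed Z test.

    H0: mu = mu0 vs H1: mu > mu0, with sigma assumed known; beta is
    evaluated at the (hypothetical) true mean mu1.
    """
    se = sigma / sqrt(n)  # standard error of the sample mean
    # H0 is rejected when the sample mean exceeds this cut-off.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = P(sample mean <= cutoff, i.e. H0 accepted | true mean is mu1)
    beta = NormalDist(mu1, se).cdf(cutoff)
    return beta, 1 - beta

beta, power = beta_and_power(300, 302, 5, 25)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Two properties follow from the construction: when μ1 equals μ0 the "power" collapses to α (every rejection is then a Type I error), and for a fixed α the power rises as n grows, because the standard error shrinks.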
• For example, let H0: there is no difference between the two drugs on average. A Type I error occurs if we conclude that the two drugs produce different effects when actually there is no difference. • The probability of making a Type I error when the null hypothesis is true as an equality is called the level of significance. • Applications of hypothesis testing that control only the Type I error are often called significance tests. • Prob(Type I error) = significance level = α
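The statement Prob(Type I error) = α can be checked by simulation. In this sketch both "drugs" are simulated with identical effects (normal, mean 0, standard deviation 1, assumed known), so H0 is true by construction and every rejection is a Type I error. A two-tailed Z test on the difference of means is repeated many times; the rejection rate should come out close to the chosen α. The sample size, number of trials and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulated_type1_rate(alpha=0.05, n=30, trials=10_000, seed=0):
    """Fraction of trials in which a true H0 (no difference) is rejected."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cut-off
    se = sqrt(2 / n)  # sd of the difference of two means of n obs, each with sd 1
    rejections = 0
    for _ in range(trials):
        a = fmean(rng.gauss(0, 1) for _ in range(n))  # patients on drug A
        b = fmean(rng.gauss(0, 1) for _ in range(n))  # patients on drug B
        if abs((a - b) / se) >= crit:
            rejections += 1  # H0 is true here, so this is a Type I error
    return rejections / trials

print(simulated_type1_rate(alpha=0.05), simulated_type1_rate(alpha=0.01))
```

The printed rates land near 0.05 and 0.01 respectively, showing that the researcher controls the Type I error rate directly through the choice of α.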
• Hypothesis Testing Procedure: The following steps are followed in hypothesis testing:
STEPS IN TESTING OF HYPOTHESIS • 1. Set up a Hypothesis: • The first step is to establish the hypothesis to be tested. • A statistical hypothesis is an assumption about the value of some unknown parameter; it provides a numerical value or range of values for the parameter. • Two hypotheses about the population are constructed: the null hypothesis and the alternative hypothesis.
• The two hypotheses are constructed so that if one is true, the other is false, and vice versa. There can also be situations where the researcher is interested in establishing a relationship between two variables. In such a case, the null hypothesis is the hypothesis of no relationship between the two variables, whereas the alternative hypothesis is the hypothesis of a relationship between them. Rejection of the null hypothesis indicates that the difference or relationship is statistically significant, and acceptance of the null hypothesis means that any difference or relationship is due to chance.
• 2. Set Up a Suitable Significance Level: • The next step in testing a hypothesis is to choose a suitable level of significance. The level of significance, denoted by α, is chosen before any sample is drawn. It is the probability of rejecting the null hypothesis when it is true. The value of α varies from problem to problem, but it is usually taken as either 5 per cent or 1 per cent. A 5 per cent level of significance means that there are 5 chances out of 100 that the null hypothesis will be rejected when it should be accepted; equivalently, when the null hypothesis is true, the researcher has a 95 per cent probability of correctly accepting it. Therefore, the confidence with which a researcher rejects or accepts a null hypothesis depends on the level of significance.
• 3. Determining a Suitable Test Statistic: • After the hypothesis is constructed and the significance level is decided upon, the next step is to determine a suitable test statistic and its distribution. • Most statistical tests take the following form:
Test statistic = (Sample statistic – Hypothesized parameter) / Standard error of the sample statistic
• The test statistic could be t, Z, χ² or F, depending on various assumptions.
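For a test about a population mean with a known population standard deviation σ, the general form above becomes the Z statistic, where the standard error of the sample mean is σ/√n. The sketch below assumes σ is known; when it has to be estimated from the sample, the same ratio is computed with the sample standard deviation and referred to the t distribution instead. The bottling figures in the usage line are hypothetical.

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """(sample statistic - hypothesized parameter) / standard error of the statistic."""
    standard_error = sigma / sqrt(n)  # standard error of the sample mean
    return (sample_mean - mu0) / standard_error

# Hypothetical: 36 bottles with mean content 301.5 ml, H0: mu = 300 ml, sigma = 5 ml.
print(round(z_statistic(301.5, 300, 5, 36), 2))  # 1.8
```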
• 4. Determining the Critical Region: • Before a sample is drawn from the population, it must be decided which values of the test statistic will lead to acceptance of H0 and which will lead to its rejection. The values that lead to rejection of H0 form the critical region. • Given a level of significance α, the optimal critical region for a two-tailed test consists of an area of α/2 in the right-hand tail of the distribution plus an area of α/2 in the left-hand tail, where the null hypothesis is rejected. Therefore, establishing a critical region is similar to determining a 100(1 – α) per cent confidence interval.
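The link between the critical region and a 100(1 – α) per cent confidence interval can be made explicit. In this sketch (a two-tailed Z test for a mean, σ assumed known, hypothetical bottling figures) the test rejects H0 exactly when the hypothesized mean μ0 falls outside the confidence interval built around the sample mean, because |x̄ – μ0| ≥ c·SE and "μ0 lies outside x̄ ± c·SE" are the same inequality.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_region_and_ci(sample_mean, mu0, sigma, n, alpha=0.05):
    """Compare the critical-region decision with the 100(1 - alpha)% CI for mu."""
    se = sigma / sqrt(n)
    c = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 area in each tail
    z = (sample_mean - mu0) / se
    reject = abs(z) >= c                      # z falls in the critical region
    ci = (sample_mean - c * se, sample_mean + c * se)
    mu0_outside_ci = not (ci[0] < mu0 < ci[1])
    return reject, ci, mu0_outside_ci

for xbar in (301.5, 302.0):  # hypothetical sample means, n = 36, sigma = 5 ml
    reject, ci, outside = two_tailed_region_and_ci(xbar, 300, 5, 36)
    print(xbar, reject, tuple(round(v, 2) for v in ci), outside)
```

For a sample mean of 301.5 ml the statistic does not reach the critical region and 300 ml lies inside the interval; for 302 ml both checks flip together.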
• 5. Computing the value of the test statistic: The next step is to compute the value of the test statistic from a random sample of size n. Once it is computed, one examines whether the sample result falls in the critical region or in the acceptance region.
• 6. Making a decision: The hypothesis is rejected or accepted depending on whether the value of the test statistic falls in the rejection or the acceptance region. Management decisions are based on the statistical decision of either rejecting or accepting the null hypothesis. If the hypothesis is tested at the 5 per cent level of significance, it is rejected if the probability of obtaining results at least as extreme as those observed, assuming H0 is true, is less than 5 per cent. In such a case, the difference between the sample statistic and the hypothesized population parameter is considered significant. On the other hand, if the hypothesis is accepted, the difference between the sample statistic and the hypothesized population parameter is not regarded as significant and can be attributed to chance.
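The last steps can be strung together for the bottling example using the p-value approach. This is a sketch under stated assumptions: a Z test for the mean with σ known, and hypothetical sample figures (36 bottles, mean 301.5 ml, σ = 5 ml). The decision rule follows the text: reject H0 when the p-value is below α.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Compute the test statistic and p-value, then decide (sigma assumed known)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))  # step 5: test statistic
    phi = NormalDist().cdf
    if tail == "two":        # H1: mu != mu0 (over- or underfilling)
        p = 2 * (1 - phi(abs(z)))
    elif tail == "upper":    # H1: mu > mu0 (overfilling)
        p = 1 - phi(z)
    elif tail == "lower":    # H1: mu < mu0 (underfilling)
        p = phi(z)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    decision = "reject H0" if p < alpha else "do not reject H0"  # step 6
    return z, p, decision

# Hypothetical sample: 36 bottles, mean 301.5 ml, sigma assumed known at 5 ml.
for tail in ("two", "upper"):
    z, p, decision = z_test_mean(301.5, 300, 5, 36, alpha=0.05, tail=tail)
    print(f"{tail:>5}-tailed: z = {z:.2f}, p = {p:.4f} -> {decision}")
```

With these figures z = 1.80; the two-tailed p-value is about 0.072, so H0: μ = 300 ml is not rejected at the 5 per cent level, while the upper-tailed p-value is about 0.036, so a test focused only on overfilling would reject H0. The choice between a one-tailed and a two-tailed test must therefore be made in step 1, before the data are seen.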