Wollega University Shambu Campus Faculty of
Agriculture
Dep't Fisheries, Wetlands and Wildlife
Research Methods and Experimental Design
Fantahun Dereje
Meaning and Concepts of Scientific Research
 The word research is composed of two syllables, re and search.
 re is a prefix meaning again, anew or over again
 search is a verb meaning to examine closely and carefully, to test and try, or to probe.
 Together they form a noun describing a careful, systematic, patient study and investigation in some field
of knowledge, undertaken to establish facts or principles.
 Research is a structured enquiry that utilizes acceptable scientific methodology to solve problems and
create new knowledge that is generally applicable.
 Research is a process of collecting, analyzing and interpreting information to answer questions
 Research refers to a search for knowledge or facts.
 It can also be defined as a scientific and systematic search for pertinent information on a specific topic. In
fact, research is an art of scientific investigation.
 Different scholars define research in different ways as:
 Redman and Mory define research as a “systematized effort to gain new knowledge.”
 Some people consider research as a movement:- a movement from the known to the
unknown. It is actually a voyage of discovery.
 We all possess the vital instinct of inquisitiveness: when the unknown
confronts us, we wonder, and our inquisitiveness makes us probe and attain a fuller and fuller
understanding of the unknown.
 This inquisitiveness is the mother of all knowledge and the method which man employs
for obtaining the knowledge of whatever the unknown can be termed as Research.
 Slesinger and M. Stephenson define research as “the manipulation of things, concepts or symbols for the
purpose of generalizing to extend, correct or verify knowledge, whether that knowledge aids in construction
of theory or in the practice of an art.”
 Research is, thus, an original contribution to the existing stock of knowledge, making for its advancement. It
is the pursuit of truth with the help of study, observation, comparison and experiment.
 Research is an academic activity and as such the term should be used in a technical sense. According to
Clifford Woody, research comprises defining and redefining problems; formulating hypotheses or suggested
solutions; collecting, organizing and evaluating data; making deductions and reaching conclusions; and at
last carefully testing the conclusions to determine whether they fit the formulated hypotheses or not.
 In short, the search for knowledge through an objective and systematic method of finding a solution to a
problem is research.
TYPES OF RESEARCH
Research can be classified from three perspectives:
 Application of the research study
 Objectives in undertaking the research
 Inquiry mode employed
1. Based on application of the research study:
 From the point of view of application, there are two broad categories of research:
 pure (basic or fundamental) research and
 applied (or action) research
 Pure research involves developing and testing theories and hypotheses that are intellectually
challenging to the researcher but may or may not have practical application at the present
time or in the future.
 The knowledge produced through pure research is sought in order to add to the existing body
of knowledge on research methods.
 Applied research is done to answer specific, practical questions, for example for policy formulation,
administration and understanding of a phenomenon.
 It aims at finding a solution to a concrete social or business problem.
 It can be exploratory, but is usually descriptive.
2. Based on objectives in undertaking the research
 From the viewpoint of objectives, research can be classified as:
-Descriptive
-Correlational
-Explanatory
-Exploratory
 Descriptive research attempts to describe systematically a situation, problem, phenomenon, service or
programme, or provides information about, say, living condition of a community, or describes attitudes
towards an issue.
 The main characteristic of this method is that the researcher has no control over the variables; he can
only report what has happened or what is happening.
 Correlational research attempts to discover or establish the existence of a relationship/
interdependence between two or more aspects of a situation.
 Explanatory research attempts to clarify why and how there is a relationship between two
or more aspects of a situation or phenomenon.
 Exploratory research is undertaken to explore an area where little is known or to
investigate the possibilities of undertaking a particular research study (feasibility study/
pilot study).
 In practice most studies are a combination of the first three categories.
3. Based on inquiry mode employed
 Based on the process adopted to find answers to research questions, the two approaches are:
- Structured approach
- Unstructured approach
 Structured approach: The structured approach to inquiry is usually classified as
quantitative research.
 Here everything that forms the research process- objectives, design, sample, and the
questions that you plan to ask of respondents- is predetermined.
 It is more appropriate to determine the extent of a problem, issue or phenomenon by
quantifying the variation.
 e.g. how many people consume fish? How many people have good attitude toward
fish rearing?
 Unstructured approach: The unstructured approach to inquiry is usually classified as
qualitative research.
 This approach allows flexibility in all aspects of the research process.
 It is more appropriate to explore the nature of a problem, issue or phenomenon without
quantifying it.
 Main objective is to describe the variation in a phenomenon, situation or attitude.
 e.g., a description of an observed situation, a historical enumeration of events, an
account of the different opinions people have about an issue, or a description of
working conditions in a particular industry.
Note: In many studies you have to combine both qualitative and quantitative approaches.
Steps in Research Process
1. Formulating the Research Problem
2. Extensive Literature Review
3. Developing the objectives
4. Preparing the Research Design including Sample Design
5. Collecting the Data
6. Analysis of Data
7. Generalisation and Interpretation
8. Preparation of the Report or Presentation of Results-Formal write
ups of conclusions reached.
Levels and principles of research planning
 After identifying and defining the problem, the researcher must arrange their ideas in order and write them in the form of an
experimental plan, or what can be described as a ‘Research Plan’.
 This is essential, especially for a new researcher, because:
a. It helps to organize ideas in a form whereby it will be possible to look for flaws and inadequacies, if any
b. It provides an inventory of what must be done and which materials have to be collected as a preliminary step
c. It is a document that can be given to others for comment.
 Research plan must contain the following items.
1. The research objective should be clearly stated: what it is that the researcher expects to do
2. The problem to be studied should be stated clearly
3. Each major concept that is to be measured should be defined in operational terms in the context of the research project
4. The method to be used in solving the problem should be described
5. The details of the techniques to be adopted must also be stated
Research problem
 It is the first and most crucial step in the research process
- Main function is to decide what you want to find out about.
• The way you formulate a problem determines almost every step that follows.
Steps in formulation of a research problem
Step 1 Identify a broad field or subject area of interest to you.
Step 2 Dissect the broad area into sub areas.
Step 3 Select what is of most interest to you.
Step 4 Raise research questions.
Step 5 Formulate objectives.
Step 6 Assess your objectives.
Step 7 Double check.
Research question/hypothesis
 Detail the problem statement
 Further describe and refine the issue under study
 Add focus to the problem statement
 Guide data collection and analysis
 Sets context
Selection of appropriate methodology
 Research methodology is a systematic way to solve a problem. It is a science of studying how
research is to be carried out. Essentially, the procedures by which researchers go about their
work of describing, explaining and predicting phenomena are called research methodology. It is
also defined as the study of methods by which knowledge is gained. Its aim is to give the work
plan of research.
 Research methodology tells you which method has to be used out of the various existing methods. More
precisely, research methods help us get a solution to a problem. On the other hand, research
methodology is concerned with the explanation of the following:
 (1) Why is a particular research study undertaken?
 (2) How did one formulate a research problem?
 (3) What types of data were collected?
 (4) What particular method has been used?
 (5) Why was a particular technique of analysis of data used?
RESEARCH APPROACHES (Quantitative And Qualitative) Method
1. Quantitative Approach
 Involves the generation of data in quantitative form which can be subjected to rigorous quantitative
analysis in a formal fashion. This approach can be further sub-classified into Inferential,
Experimental and Simulation Approaches to research.
 The purpose of inferential approach to research is to form a data base from which to infer
characteristics or relationships of population. This usually means survey research where a sample of
population is studied (questioned or observed) to determine its characteristics, and it is then inferred
that the population has the same characteristics.
 Experimental approach is characterized by much greater control over the research environment
and in this case some variables are manipulated to observe their effect on other variables.
 Simulation approach involves the construction of an artificial environment within
which relevant information and data can be generated. This permits an observation of
the dynamic behaviour of a system (or its sub-system) under controlled conditions.
2. Qualitative Approach
 The qualitative approach to research is concerned with subjective assessment of attitudes,
opinions, behaviour, etc. Research in such a situation is a function of the researcher’s
insights and impressions. Such an approach to research generates results either in
non-quantitative form or in a form which cannot be subjected to rigorous quantitative
analysis. Generally, the techniques of focus group interviews, projective techniques and
depth interviews are used.
Research strategy: experimental and survey research Design
Experimental Research Design
 Experiment:
 An experiment is a test or a series of tests.
 It is a planned inquiry to discover new facts or to confirm or deny the results of previous
experiments.
Experimentation is used to obtain
 New information or to improve the results of previous findings
 It helps to answer questions
An experimental design:
 is a planned interference in the natural order of events by the researcher.
Importance of Experimental design
To provide estimates of treatment effects or of differences among treatment effects.
To provide an efficient way of confirming or denying hypotheses about the response to
treatments.
To control experimental error and increase precision by reducing the external variation that
contributes to experimental error.
To facilitate the application of treatments, management operations and harvest of the plots.
Types of Experimental Designs
1) Complete block design
– Each block contains all the treatments
– The number of replications is equal to the number of blocks
e.g. completely randomized design, randomized complete block design, split plot
2) Incomplete block design
 A block does not contain all the treatments
 The number of blocks is not the same as the number of replications; a replication may
contain two or more blocks
e.g. lattice square and Latin rectangle, augmented designs
Survey research Design
The term survey is used for techniques of investigation by direct observation of a
phenomenon, or by systematic gathering of data from a population through personal contact and interviews,
when adequate information about a certain problem is not available in records, files and other
sources.
The survey is an important tool for gathering evidence relating to certain social problems.
The term social survey indicates the study of social phenomena through a survey of a small
sampled population, and also of broad segments of the population.
It is concerned with the present and attempts to determine the status of the phenomenon under
investigation.
 The method of survey research is non-experimental (that is, it does not involve any
observation under controlled conditions). It is a descriptive research method and one of the
quantitative methods used for studying large samples.
 In survey research, the researcher collects data with the help of standardised questionnaires
or interviews, which are administered to a sample of respondents from a population
(the population is sometimes referred to as the universe of a study, and can be defined as a
collection of people or objects that share at least one common characteristic).
 The method of survey research is one of the techniques of applied social research which can
be helpful in collection of data both through direct (such as a direct face to face interview)
and indirect observation (such as opinions on library services of an institute).
STEPS INVOLVED IN CONDUCTING SURVEY RESEARCH
Any type of survey research follows these systematic steps:
Step 1: Determine the aims and objectives of the study
Step 2: Define the population to be studied
Step 3: Design and construct a survey
Step 4: Select a representative sample
Step 5: Administer the survey
Step 6: Analyse and interpret the findings of the survey
Step 7: Prepare the report of the survey
Step 8: Communicate the findings of the survey
TYPES OF SURVEY RESEARCH
Basically there are two major types of survey: cross sectional surveys and longitudinal surveys
Cross sectional surveys: are used by the researcher when he or she wants to collect data from varied or
different types of groups ( that may be in terms of age, sex, group, nation, tribes and so on) at a single
time.
 An example of a cross-sectional survey is a study on the effect of socialization on children of different age
groups in a particular country. This type of survey is less time consuming and economical as well.
Longitudinal survey: is used when the researcher wants to study the same sample over a longer period
of time.
Such longitudinal studies may be used to study behavioural changes, attitude changes, religious effects
or any event or practice that may have a long time effect on the selected sample or population.
There are three main types of longitudinal studies which help the researcher to analyse the long term
effects on the selected sample. These include
 (i) Trend studies
 (ii) Cohort studies and
 (iii) Panel studies.
Unit Two
COMPONENTS OF EXPERIMENTAL METHODS
2.1 Introduction
 In agricultural research, the key question to be answered is generally expressed as a statement
of hypothesis. This hypothesis has to be verified or disproved through experimentation.
 Once a hypothesis is framed, the next step is to design a procedure for its verification.
 This is the Experimental Procedure or Research Methodology, which usually consists of four
phases:
 Selecting the appropriate materials to test the hypothesis
 Specifying the characters to measure
 Selecting the procedure/design to measure those characters
 Specifying the procedure/method of analyzing the characters to determine whether the
measurements made support the hypothesis or not.
 In general, the first two phases are fairly easy for a subject matter specialist to specify.
 On the other hand, the procedures/design regarding how measurements are to be made and how to
prove or disprove a hypothesis depend heavily on techniques developed by statisticians.
 Deciding on these procedures, and on how the measurements can prove or disprove the hypothesis,
generally requires the idea of experimentation.
 This is what we call the design of experiments.
 The design of experiments has 3 essential components
Estimate of error
Control of error
Proper interpretation of results obtained either verified/disproved
1. Estimate of Error
 Suppose we need to compare two cattle breeds in terms
of their milk yield: an exotic breed (A) and a local/indigenous breed (B).
 Breeds A and B have received the same management
and are housed side by side.
 Milk yield is measured and the higher yield is judged
better.
 The difference in milk yield of the two breeds could
be attributed to breed differences.
 But this is not necessarily true.
 Even if the same breed had been kept in
both houses, the milk yield could differ.
 Other factors, such as climatic factors (temperature) and
damage by disease and insects, affect milk yield.
 Therefore, a satisfactory evaluation of the two cattle breeds must involve a procedure that
can separate breed difference from other sources of variation.
 Therefore, the animal breeder must be able to design an experiment that allows them to
decide whether the milk yield difference observed is caused by breed difference or by
other factors.
 Such a design makes it possible to estimate the experimental error in livestock research.
 The difference among experimental plots/materials treated alike (similarly) is called
Experimental Error.
 This error is the primary basis for deciding whether an observed difference is real or just
due to chance.
 Clearly, every experiment must be designed to have a measure of the experimental error.
Experimental error is unavoidable, but it should be reduced as much as possible in the experiment.
Methods To Reduce Experimental Error
 Increase the size of experiment either through provision of more replicates or by
inclusion of additional treatments.
 Refine or improve the experimental techniques/procedures.
 Apply treatments uniformly, e.g. spread fertilizers evenly, record data on the same day,
and use similar housing and similar feeding.
 Control external influences so that all treatments produce their
effects under comparable conditions, e.g. by protecting against disease.
1.1. Replication
 It is the repetition of treatments in an experiment. At least two plots/experimental
materials of the same breed/variety are needed to determine the difference among
plots/experimental materials treated alike.
 Experimental error can be measured if there are at least two plots treated the same or
receiving the same treatment.
Thus, to obtain a measure of experimental error, replication is needed.
 The advantage of replication in an experiment is that it increases the precision of error
estimation: the error variance is reduced and more easily estimated.
Functions of Replication
 Provides an estimate of experimental error
 Because it provides several observations on experimental units receiving the same
treatment. For an experiment in which each treatment appears only once, no
estimate of experimental error is possible.
 Improves the precision of an experiment
 As the number of replicates increases, the observed treatment means become
more precise estimates of the population means.
 Increases the scope of inference/conclusion of the experiments
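A minimal Python sketch of why replication is needed to measure experimental error. The milk yields below are made-up illustration values (not course data), and the function name standard_error is hypothetical.

```python
from math import sqrt
from statistics import StatisticsError, mean, variance

# Hypothetical daily milk yields (litres) of four cows of the SAME breed
# under the same management: four replicates of one treatment.
yields_breed_a = [12.0, 14.0, 13.0, 15.0]

def standard_error(values):
    """Standard error of a treatment mean: sqrt(s^2 / r)."""
    r = len(values)            # number of replicates
    s2 = variance(values)      # variance among units treated alike
    return sqrt(s2 / r)

treatment_mean = mean(yields_breed_a)
error_variance = variance(yields_breed_a)
se_of_mean = standard_error(yields_breed_a)

# With a single unit per treatment, no error estimate is possible.
try:
    variance([12.0])
    single_unit_estimable = True
except StatisticsError:
    single_unit_estimable = False

print(treatment_mean, error_variance, se_of_mean, single_unit_estimable)
```

Because the standard error is sqrt(s^2 / r), adding replicates shrinks it, which is the sense in which replication improves precision.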
1.2. Randomization
 Randomization ensures that no treatment is consistently favored or disadvantaged by being
placed under the best or the least favorable conditions, thereby avoiding bias.
 It means that each variety/breed of animal will have an equal chance of being assigned to
any experimental plot.
 It also ensures independence among observations, which is a necessary condition for
validity of assumptions to provide significance tests and confidence intervals.
 Randomization can be done by using random number, lottery system, or coin system.
 Thus, the estimate of experimental error of the difference will be valid (unbiased) if treatments are
assigned randomly and independently.
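As an illustration of randomization with random numbers, here is a small Python sketch that assigns treatments to plots completely at random. The function name randomize_crd, the treatment labels and the seed are assumptions made for the example.

```python
import random

def randomize_crd(treatments, replications, seed=None):
    """Return one treatment label per plot, in random plot order."""
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    layout = [t for t in treatments for _ in range(replications)]
    rng.shuffle(layout)        # every plot order is equally likely
    return layout

# Three hypothetical breeds/varieties, four replicates each -> 12 plots.
layout = randomize_crd(["A", "B", "C"], replications=4, seed=1)
for plot_number, treatment in enumerate(layout, start=1):
    print(f"plot {plot_number}: treatment {treatment}")
```

Each treatment still appears exactly four times; only the plot it lands on is left to chance.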
2. Control of Error
 The ability of the experiment to detect existing differences among
treatments/experimental materials increases as the size of the experimental error
decreases.
 A good experiment should incorporate all possible means of minimizing the
experimental error.
 Three commonly used techniques for controlling experimental error in agricultural
research are as follows:
 Blocking
 Proper plot technique
 Proper data analysis
1. Blocking
 Putting experimental units that are as similar as possible together in the same group is generally referred to as
blocking; each such group is a block.
 By assigning all treatments to experimental plots within each block separately and independently, variation among
blocks can be measured and removed from experimental error.
 Reduction in experimental error is usually achieved with the use of proper blocking techniques in different
experimental designs.
2. Proper plot technique
 For all experiments it is absolutely essential that, except for the treatments, all other factors are maintained
uniformly for all experimental units.
 For example, in a forage variety trial where the treatments consist solely of the test varieties, it is required
that all other factors, such as soil nutrients, solar energy, temperature, plant population, pest incidence and the many
other environmental factors, are maintained as uniformly as possible for all plots in the experiment.
This is primarily a concern of good plot technique.
3. Proper data analysis
 In cases where blocking alone may not be able to achieve adequate control of experimental error, proper data
analysis can help greatly. Covariance analysis is most commonly used for this purpose.
3. Proper Interpretation of Results of an Experiment
 After estimating and controlling experimental error, the result
of experiment must be interpreted properly according to the
situation of the environment and conditions of the experiment
for practical application.
 For example, the dry matter yield (DMY) of a forage variety must be reported together with
the environmental conditions where the study was conducted,
including climatic data (temperature, rainfall, others), soil fertility
and type, topography, and others, in as much detail as possible.
Analysis of Variance (ANOVA)
 ANOVA is a procedure that can be used to analyze the results from both simple and
complex experiments.
 It reveals whether the observed difference between treatments is real or occurred by
chance.
 It partitions the total variation into different components and tests their significance.
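The partitioning of total variation can be sketched in Python for the simplest case, a one-way layout with equal or unequal replication. The data are invented for illustration and the function name one_way_anova is hypothetical; the F ratio would still have to be compared with a tabulated F value to judge significance.

```python
from statistics import mean

def one_way_anova(groups):
    """Partition the total sum of squares into treatment and error parts."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    ss_treatment = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    ss_error = ss_total - ss_treatment           # variation among units treated alike
    df_treatment = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total,
        "ss_treatment": ss_treatment,
        "ss_error": ss_error,
        "df_treatment": df_treatment,
        "df_error": df_error,
        "F": ms_treatment / ms_error,
    }

# Hypothetical yields for three treatments, three replicates each.
data = {"A": [10, 12, 14], "B": [15, 17, 19], "C": [20, 22, 24]}
result = one_way_anova(data)
print(result)
```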
Overview of some experimental designs
 The most common types of designs used in agricultural research:
 Completely randomized design (CRD)
 Randomized complete block design (RCBD)
 Latin square design (LSD)
 Lattice design
 Augmented designs
 Split plot design
Completely Randomized Design (CRD)
 The simplest and least restrictive design.
 The only restrictions:
 Experimental units are homogeneous.
 Treatments are assigned completely at random
 Advantages:
– Flexibility
– Statistical analysis simple
– provides maximum degrees of freedom for error
 Disadvantages:
– Low precision
When to Use CRD?
Use CRD when the experimental units/plots are more or less
homogeneous and where environmental effects are relatively easy
to control, e.g., in the laboratory and greenhouse.
CRD is flexible and the statistical analysis is simple even
when there are unequal replications or missing values. With t treatments each
replicated r times, the error degrees of freedom are t(r - 1).
Randomized Complete Block Design (RCBD)
It is the most frequently used experimental design in field
experiments.
It has three sources of variation: treatments, blocks and
experimental error. This is one more source of variation than in
CRD.
It can be used when the experimental units can be
meaningfully grouped.
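The three sources of variation in an RCBD can be illustrated with a short Python sketch. The table of yields is made up for the example, and the function name rcbd_anova is hypothetical; rows are treatments and columns are blocks.

```python
from statistics import mean

def rcbd_anova(table):
    """table[i][j] is the observation for treatment i in block j."""
    t = len(table)        # number of treatments
    b = len(table[0])     # number of blocks (= replications)
    grand_mean = mean([y for row in table for y in row])
    ss_total = sum((y - grand_mean) ** 2 for row in table for y in row)
    ss_treatment = b * sum((mean(row) - grand_mean) ** 2 for row in table)
    ss_block = t * sum((mean(col) - grand_mean) ** 2 for col in zip(*table))
    ss_error = ss_total - ss_treatment - ss_block   # what blocking leaves over
    df_error = (t - 1) * (b - 1)
    ms_treatment = ss_treatment / (t - 1)
    ms_error = ss_error / df_error
    return {
        "ss_treatment": ss_treatment,
        "ss_block": ss_block,
        "ss_error": ss_error,
        "df_error": df_error,
        "F_treatment": ms_treatment / ms_error,
    }

# Hypothetical yields: 3 treatments (rows) in 3 blocks (columns).
yields = [
    [10, 12, 14],
    [13, 16, 16],
    [16, 17, 21],
]
result = rcbd_anova(yields)
print(result)
```

In this invented data set the block sum of squares would have ended up in the error term under a CRD analysis, which is why blocking can raise precision.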
Characteristics of RCB design
The number of blocks is equal to the number of replications.
The number of plots in each replication (block) is equal to the number of treatments.
The treatments are randomized within each replication, subject to the restriction that each
treatment occurs once and only once in each block.
Blocking (grouping) is done based on the gradient: soil heterogeneity, slope, or the initial body
weight, age, sex and breed of the animals.
Blocking minimizes the variability within each block while the variability among blocks
is maximized.
 Block shape, size and orientation determination
– When the gradient is unidirectional, use narrow blocks whose length is perpendicular to the direction of the
gradient.
– When the gradient occurs in two directions, ignore the weaker gradient.
– Arrange your blocks perpendicular to the stronger gradient but reduce the length of the blocks.
– Blocking reduces experimental error by eliminating the contribution of known sources of
variation among experimental units.
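The rule that treatments are randomized separately within each block, each occurring once per block, can be sketched in Python as follows. The function name randomize_rcbd, the treatment labels and the seed are assumptions for illustration.

```python
import random

def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one independently shuffled list of treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)   # every treatment exactly once per block
        rng.shuffle(block)         # fresh randomization for each block
        layout.append(block)
    return layout

# Four hypothetical forage varieties in three blocks.
layout = randomize_rcbd(["V1", "V2", "V3", "V4"], n_blocks=3, seed=7)
for block_number, block in enumerate(layout, start=1):
    print(f"block {block_number}: {block}")
```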
Advantages of RCB design
Precision: More precision is obtained than with CRD because grouping experimental
units into blocks reduces the magnitude of experimental error.
Flexibility: Theoretically there is no restriction on the number of treatments or blocks.
Ease of Analysis: The statistical analysis of the data is simple
Disadvantage
When the number of treatments is large (>15), variation among experimental units
within a block becomes large, resulting in a large error. In such situations it is better to use
other designs.
Factorial experiment
The designs described above are applicable to any type of experiment, regardless of the
structure of the treatments.
Two types: single-factor and multi-factor experiments:
Single-factor experiment: An experiment that is concerned with testing
several levels of one factor, keeping all other factors constant.
Multi-factor experiments: experiments with two or more factors, where effects and
cross-effects (interactions) are tested simultaneously.
Characteristics of factorial experiments
Factorial experiments are those trials that can accommodate more than one factor, each of
which has two or more levels.
All possible combinations of factors and levels vary simultaneously.
They do not have their own designs.
Combining treatments makes it possible to find the differential effect of one factor at two or
more levels of the second factor.
Factorials will have one error term if the design is RCBD, CRD or Latin square.
However, if the design is split plot, two error terms will be used.
An interaction effect between two factors can be measured only if the two factors are
tested together in the same experiment.
If the interaction effect is significant, more attention should be given to the results of the
interaction than to the main effects.
Estimation of missing plot values is more complex in factorial experiments.
Disadvantages
As the number of factors increases, the size of the experiment becomes
very large and complex, e.g. with 8 factors each at 2 levels, there
are $2^8 = 256$ treatment combinations.
Large factorial experiments are difficult to interpret, especially
when there are interactions.
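How quickly the number of treatment combinations grows can be checked with a few lines of Python. The factor names and levels are invented for the example, and treatment_combinations is a hypothetical helper.

```python
from itertools import product

def treatment_combinations(levels_per_factor):
    """List every combination of one level from each factor."""
    return list(product(*levels_per_factor.values()))

# A small 2 x 3 factorial: two breeds, three feeding levels.
small = treatment_combinations({
    "breed": ("exotic", "local"),
    "feed": ("low", "medium", "high"),
})

# Eight factors, each at two levels.
eight_factors = {f"factor_{i}": ("low", "high") for i in range(1, 9)}
large = treatment_combinations(eight_factors)

print(len(small), len(large))
```

The count is simply the product of the number of levels of each factor, so every added two-level factor doubles the size of the experiment.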
Unit Three
Social Research Methods
Unit Four
Writing strategies
Structure/format for proposal writing
Definition of proposal
The research proposal is the document that finally establishes that there is a niche for your chosen
area of study and that the research design is feasible.
The research proposal:
- helps you to think out the research project you are about to undertake and predict any difficulties that might
arise.
For those who aren't quite sure what their focus will be, the research proposal can be a space to
explore options -- perhaps with one proposal for each potential topic (which can then be more easily
compared and evaluated than when they are still just ideas in one's head).
Research proposals can be effective starting places to discuss projects with your professors or
advisors, too.
 A professor who is initially skeptical about a project may be able to imagine it more
easily after reading a well written research proposal (this doesn't mean he or she will
approve the topic, especially if there are significant potential difficulties that you haven't
considered).
 Once you have begun your research project, a research proposal can help you to remain
on track -- and can also remind you why you started this project in the first place!
 Researchers very often begin to lose heart about two thirds of the way into a project when
their research hits a snag or when they are having problems developing a thesis,
organizing the ideas, or actually starting to write.
 Re-reading the initial research proposal, especially "Significance" can re-energize the
project or help the researcher to refocus in an effective manner.
General Elements or structures of research proposal:
 Cover page (include topic or title, institution name, your name, advisor name, time of
submission)
 Acknowledgements (optional)
 Abbreviations and Acronyms
 Table of contents
 List of tables (If any)
 List of figures (If any)
 Introduction
 Literature review
 Materials and Methods
 Plan of activities
 Logistics
 References
 Appendix
General form of the main research report
Put together structure of the paper:
Cover page (include topic or title, institution name, your name, advisor name, time of submission)
Acknowledgements (optional)
Abbreviations and Acronyms
Table of contents
List of tables (If any)
List of figures (If any)
Abstract
Introduction
Methods & Materials
Results and Discussion
Summary & Conclusions and Recommendation
References
Divide long sections into subsections
Reference and citations
Citation
Textual citation
Use name and date for published works
 Minale Simachew (1999) stated that … or (Minale Simachew 1999)
For co-authored published works
 Getachew Belay and Hailu Tefera (2006) …
For more than 2 authors
 Aster Bedaso et al. (2001) reported …
Handle second-hand citations in one of these ways
 Seid Ahemed (2005), cited in Hailu Tefera (2007), discussed … or Hailu Tefera
(2007), quoting Seid Ahemed (2005), discussed …
 References
 Common author/date system
 Arrange in alphabetical order of the surnames
 Example:
 Abebe W. 1991. Traditional husbandry practices and major health problems of camels in the
Ogaden, Ethiopia. Nomadic Peoples, 29:21-30.
 Alemayehu G. 2001. Breeding program and evaluation of semen characteristics of camels in the
Central Rift Valley of Ethiopia. MSc thesis presented to the School of Graduate Studies of
Alemaya University.
 CARE-Ethiopia. 2009. Value Chain Analysis of Milk and Milk Products in Borena Pastoralist
Area. Addis Ababa: CARE-Ethiopia.
 If two or more entries have the same author(s) and the same publication year, alphabetize the entries by title
and use lower case letters (a, b, c, etc.) after the year to tell them apart.
 Example:
 Sampath S. (2001a) Sampling Theory and Methods, Narosa Publishing House, New Delhi.
 Sampath S. (2001b) Statistical Theory and Methods, Narosa Publishing House, New Delhi.
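The ordering rule above can be sketched in Python: sort by author, then year, then title, and add a letter when author and year coincide. The function name order_references and the (author, year, title) tuple layout are assumptions for this illustration, not a prescribed tool.

```python
from itertools import groupby

def order_references(entries):
    """entries: list of (author, year, title) tuples."""
    ordered = sorted(
        entries,
        key=lambda e: (e[0].casefold(), e[1], e[2].casefold()),
    )
    labelled = []
    same_author_year = lambda e: (e[0].casefold(), e[1])
    for _, group in groupby(ordered, key=same_author_year):
        group = list(group)
        for i, (author, year, title) in enumerate(group):
            # Letters a, b, c ... only when author and year are shared.
            suffix = chr(ord("a") + i) if len(group) > 1 else ""
            labelled.append(f"{author} {year}{suffix}. {title}.")
    return labelled

references = order_references([
    ("Sampath S.", 2001, "Statistical Theory and Methods"),
    ("Abebe W.", 1991, "Traditional husbandry practices"),
    ("Sampath S.", 2001, "Sampling Theory and Methods"),
])
for line in references:
    print(line)
```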
Part II
Chapter I
Sampling Distributions
 Statistics is a Science of Inference
• Statistical Inference:
– Predict and forecast values of population parameters
– Test hypotheses about values of population parameters
– Make decisions
on the basis of sample statistics derived from limited and incomplete sample
information
– Make generalizations about the characteristics of a population
on the basis of observations of a sample, a part of the population
– The sample should be unbiased and representative, drawn at random from the entire population.
 A sample statistic is a numerical measure of a summary characteristic of a sample.
 A population parameter is a numerical measure of a summary characteristic of a
population.
Estimator
•The sample mean, $\bar{X}$, is the most common estimator of the population mean, $\mu$.
•The sample variance, $s^2$, is the most common estimator of the population variance, $\sigma^2$.
•The sample standard deviation, $s$, is the most common estimator of the population standard
deviation, $\sigma$.
•The sample proportion, $\hat{p}$, is the most common estimator of the population proportion, $p$.
Inferential Statistics involves three distributions:
A population distribution – variation in the larger group that we want to know about.
A distribution of sample observations – variation in the sample that we can observe.
A sampling distribution – a normal distribution whose mean and standard deviation are
unbiased estimates of the parameters and allows one to infer the parameters from the
statistics.
Sampling Distributions
The sampling distribution of a statistic is the probability distribution of all possible
values the statistic may assume, when computed from random samples of the same
size, drawn from a specified population.
The sampling distribution of X̄ is the probability distribution of all possible values the
random variable X̄ may assume when a sample of size n is taken from a specified
population.
 When sampling from a normal population with mean μ and standard deviation σ, the
sample mean, X̄, has a normal sampling distribution:
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Estimators and Their Properties
An estimator of a population parameter is a sample statistic used to estimate the
parameter. The most commonly used estimators are:
Population Parameter        Sample Statistic
Mean (μ)                    Mean (X̄)
Variance (σ²)               Variance (s²)
Standard Deviation (σ)      Standard Deviation (s)
Proportion (p)              Proportion (p̂)
 Desirable properties of estimators include:
Unbiasedness
Efficiency
Consistency
Sufficiency
Probability Distribution
Normal distribution
Many naturally occurring variables are assumed to be normally distributed, so that the distribution
curve is bell-shaped. E.g. height or body weight of people in Shambu
A normal distribution can be completely described by its mean and standard deviation.
N (µ, σ)
Binomial distribution
It is one of the most widely used discrete distributions.
A binomial distribution can be thought of as simply the probability of a SUCCESS or
FAILURE outcome in an experiment or survey that is repeated multiple times.
The binomial is a type of distribution in which each trial has only two possible
outcomes, for example pass or fail.
For the binomial model to be applied the following four criteria must be satisfied
1. the trial is carried out a fixed number of times n.
2. the outcomes of each trial can be classified into two ‘types’ success or failure.
3. the probability p of success remains constant for each trial, i.e. the probability of each outcome (tails or heads,
fail or pass) is exactly the same from one trial to another.
4. the individual trials are independent of each other. In other words, none of your trials have an effect on the
probability of the next trial.
 For example, if we consider throwing a coin 7 times what is the probability that exactly 4 heads occur?
 This problem can be modelled by the binomial distribution since the four basic criteria are assumed satisfied as
we see.
 here the trial is ‘throwing a coin’. This is carried out 7 times
 the occurrence of a head on any given trial (i.e. throw) may be called a success
 the probability of success is p = 1/2 and remains constant for each trial
 each throw of the coin is independent from the others.
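The coin question above can be checked numerically. The sketch below is only an illustration: the helper name binomial_pmf is an assumption (it does not come from these notes), and it applies the standard binomial formula P(X = k) = C(n, k) p^k (1 − p)^(n − k).

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with success probability p (hypothetical helper name)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Coin example from the notes: 7 throws, exactly 4 heads, p = 1/2.
prob_four_heads = binomial_pmf(4, 7, 0.5)
print(f"P(exactly 4 heads in 7 throws) = {prob_four_heads:.4f}")
```

For a fair coin this is 35/128, or about 0.273.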
Central Limit Theorem
 Is a statistical concept regarding the relationship between sample size and the
distribution of sample statistic (sample mean);
 It is a concept closely related to the law of large numbers (LoLN);
 The CLT states that for a sufficiently large sample size n, the sampling distribution of the
sample mean will be approximately normal regardless of what the initial distribution looks like.
Law of Large Numbers     As n grows, the probability that the mean of n samples is close to µ goes to 1
Central Limit Theorem    As n grows, the distribution of the mean of n samples converges to the normal distribution
 This Theorem tells us:
 Even if a population distribution is skewed, the sampling distribution of
the mean is approximately normal when the sample size is large
 The mean of the sampling distribution equals the population mean, and as the sample size
gets larger the sample means cluster more closely around it
 As the sample size gets larger, the standard error of the mean decreases in size (which
means that the variability in the sample estimates from sample to sample decreases as n
increases).
 It is important to remember that researchers do not typically conduct
repeated samples of the same population.
 Instead, they use the knowledge of theoretical sampling distributions to
construct confidence intervals around estimates.
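A small simulation can make these statements concrete. The sketch below is an illustration under stated assumptions, not part of the original notes: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed distribution), the function name sample_means and the seed are arbitrary choices, and 2000 repeated samples are drawn for each sample size.

```python
import random
import statistics


def sample_means(n: int, repeats: int, rng: random.Random) -> list[float]:
    """Means of `repeats` random samples of size n drawn from an
    exponential population (mean 1, sd 1). Hypothetical helper."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]


rng = random.Random(42)  # arbitrary seed, only for repeatability
for n in (2, 30, 200):
    means = sample_means(n, 2000, rng)
    centre = statistics.fmean(means)   # should stay near the population mean 1
    spread = statistics.stdev(means)   # should shrink roughly like 1 / sqrt(n)
    print(f"n={n:>3}  mean of means={centre:.3f}  "
          f"sd of means={spread:.3f}  sigma/sqrt(n)={1 / n ** 0.5:.3f}")
```

The mean of the sample means should stay close to 1 for every n, while their standard deviation should track σ/√n, which is the standard error of the mean.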
Chapter II
Estimation
 Estimation – A process whereby we select a random sample from a population and use a
sample statistic to estimate a population parameter.
Statistical inferences of estimation has two general areas:
 Point Estimate
 Interval Estimate
 Point Estimate – A sample statistic used to estimate the exact value of a population parameter
– Most common Point Estimators
 Sample mean estimates population mean μ: $\hat{\mu} = \bar{y} = \frac{\sum y_i}{n}$
 Sample std. dev. estimates population std. dev. σ: $\hat{\sigma} = s = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$
 Sample proportion estimates population proportion π: $\hat{\pi}$
 Interval estimate –an inferential statistical procedure used to estimate population
parameters from sample data through the building of confidence intervals
 Confidence Intervals: a range of values computed from sample data that has a known
probability of capturing some population parameter of interest
 A defined interval of values that includes the statistic of interest, by adding and subtracting
a specific amount from the computed statistic
 Confidence Level – The likelihood, expressed as a percentage or a probability, that a
specified interval will contain the population parameter.
 95% confidence level – there is a .95 probability that a specified interval DOES contain
the population mean. In other words, there are 5 chances out of 100 (or 1 chance out of
20) that the interval DOES NOT contain the population mean.
 99% confidence level – there is 1 chance out of 100 that the interval DOES NOT
contain the population mean.
Various Levels of Confidence
When population standard deviation is known use Z table values:
 For 95%CI: mean +/- 1.96 s.e. of mean
 For 99% CI: mean +/- 2.58 s.e. of mean
When population standard deviation is not known use “Critical
Value of t” table. The tabled value depends on the degrees of freedom (n − 1); for
example, with 30 degrees of freedom:
 For 95%CI: mean +/- 2.04 s.e. of mean
 For 99% CI: mean +/- 2.75 s.e. of mean
Process for Constructing Confidence Intervals
Compute the sample statistic (e.g. a mean)
Compute the standard error of the mean
Make a decision about level of confidence that is desired (usually
95% or 99%)
Find tabled value for 95% or 99% confidence interval
Multiply standard error of the mean by the tabled value
Form interval by adding and subtracting calculated value to and
from the mean
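The process above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name confidence_interval is an assumption, the data are the eight goat body weights used in the correlation example of these notes, and the critical value 2.365 is the tabled t value for 95% confidence with 7 degrees of freedom.

```python
import math
import statistics


def confidence_interval(data, critical_value):
    """Return (lower, upper) limits: mean -/+ critical_value * standard error.
    Hypothetical helper; the caller supplies the tabled z or t value."""
    mean = statistics.fmean(data)                           # sample statistic
    std_error = statistics.stdev(data) / math.sqrt(len(data))  # s.e. of mean
    margin = critical_value * std_error                     # tabled value * s.e.
    return mean - margin, mean + margin


weights = [25, 22, 32, 32, 20, 31, 31, 27]  # body weights from the goat example
T_95_DF7 = 2.365                            # t table, 95% confidence, n - 1 = 7
lower, upper = confidence_interval(weights, T_95_DF7)
print(f"95% CI for mean body weight: {lower:.2f} to {upper:.2f}")
```

With these eight values the interval is roughly 23.5 to 31.5 around the sample mean of 27.5.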
Chapter III
Tests of hypotheses based on a single
sample
 A hypothesis test is used to determine whether or not a treatment has an effect, while
estimation is used to determine how much effect.
 This complementary nature is demonstrated when estimation is used after a hypothesis test
that resulted in rejecting the null hypothesis.
 In this situation, the hypothesis test has established that a treatment effect exists and the
next logical step is to determine how much effect.
 You should keep in mind that even though estimation and hypothesis testing are inferential
procedures, these two techniques differ in terms of the type of question they address.
 A hypothesis test, for example, addresses the somewhat academic question concerning the
existence of a treatment effect.
 Estimation, on the other hand, is directed toward the more practical question of how much
effect.
 A hypothesis test is a process that uses sample statistics to test a claim about the value of
a population parameter.
 A verbal statement, or claim, about a population parameter is called a statistical
hypothesis.
 Hypothesis testing is designed to detect significant differences: differences that did not
occur by random chance.
 In the “one sample” case: we compare a random sample (from a large group) to a
population.
 We compare a sample statistic to a population parameter to see if there is a significant
difference.
The Null and Alternative Hypotheses:
1. Null Hypothesis (H0)
What is tested
Has serious outcome if incorrect decision made
Always has equality sign: =, ≤, or ≥
Designated H0 (pronounced H-oh)
Specified as H0: μ = some numeric value
Specified with = sign even if ≤ or ≥
• Example, H0: μ = 3
“The difference is by random chance”.
The H0 always states there is “no significant difference.” In this case, we mean that there is no
significant difference between the population mean and the sample mean.
2. Alternative Hypothesis (Ha, also written H1)
Opposite of null hypothesis
Always has inequality sign: ≠, <, or >
Designated Ha
Specified Ha: μ ≠, <, or > some value
• Example, Ha: μ < 3
“The difference is real”.
Ha always contradicts the H0.
Types of Errors
No matter which hypothesis represents the claim, always begin the hypothesis test
assuming that the null hypothesis is true.
At the end of the test, one of two decisions will be made:
1. reject the null hypothesis, or
2. fail to reject the null hypothesis.
A type I error occurs if the null hypothesis is rejected when it is true.
A type II error occurs if the null hypothesis is not rejected when it is false.
Level of Significance
In a hypothesis test, the level of significance is your maximum allowable probability of making a type I error. It is denoted by α, the lowercase Greek letter alpha.
The probability of making a type II error is denoted by β, the lowercase Greek letter beta.
By setting the level of significance at a small value, you are saying that you want the
probability of rejecting a true null hypothesis to be small.
Commonly used levels of significance:
α = 0.10    α = 0.05    α = 0.01
Hypothesis tests are based on α.
P-values
 If the null hypothesis is true, a P-value (or probability value) of a
hypothesis test is the probability of obtaining a sample statistic
with a value as extreme or more extreme than the one determined
from the sample data.
 The P-value of a hypothesis test depends on the nature of the test.
 There are three types of hypothesis tests – a left-, right-, or two-
tailed test. The type of test depends on the region of the sampling
distribution that favors a rejection of H0. This region is indicated
by the alternative hypothesis.
1. If the alternative hypothesis contains the less-than inequality symbol (<), the
hypothesis test is a left-tailed test.
 H0: μ ≥ k
 Ha: μ < k
P is the area to the left of the test statistic.
2. If the alternative hypothesis contains the greater-than symbol (>), the
hypothesis test is a right-tailed test.
 H0: μ ≤ k
 Ha: μ > k
P is the area to the right of the test statistic.
3. If the alternative hypothesis contains the not-equal-to symbol (≠), the hypothesis
test is a two-tailed test. In a two-tailed test, each tail has an area of ½P.
 H0: μ = k
 Ha: μ ≠ k
P is twice the area to the left of the negative test statistic, or twice the area
to the right of the positive test statistic.
[Figures: standard normal curves (z from -3 to 3) marking the test statistic and the tail area P for each type of test]
Making a Decision
Decision Rule Based on P-value
To use a P-value to make a conclusion in a hypothesis test, compare
the P-value with α.
1. If P ≤ α, then reject H0.
2. If P > α, then fail to reject H0.
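The tail areas and the decision rule can be sketched as follows. This is an illustration only, assuming the test statistic is a standardized z value; the function names p_value and decide are hypothetical, and the standard normal areas come from Python's statistics.NormalDist.

```python
from statistics import NormalDist


def p_value(z: float, tail: str) -> float:
    """P-value for a standardized test statistic z.
    tail is 'left', 'right' or 'two' (hypothetical argument names)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)                   # area to the left of z
    if tail == "right":
        return 1 - std_normal.cdf(z)               # area to the right of z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))    # twice the area beyond |z|
    raise ValueError("tail must be 'left', 'right' or 'two'")


def decide(p: float, alpha: float) -> str:
    """Decision rule: reject H0 when P is less than or equal to alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"


for z, tail in [(-1.75, "left"), (2.10, "right"), (1.50, "two")]:
    p = p_value(z, tail)
    print(f"z={z:+.2f} ({tail}-tailed): P={p:.4f} -> {decide(p, 0.05)}")
```

The z values in the loop are made-up inputs chosen only to exercise the three cases.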
Interpreting a Decision
                    Claim
Decision            Claim is H0                                          Claim is Ha
Reject H0           There is enough evidence to reject the claim.        There is enough evidence to support the claim.
Do not reject H0    There is not enough evidence to reject the claim.    There is not enough evidence to support the claim.
Interpreting a Decision
Example:
You perform a hypothesis test for the following claim. How should you interpret your decision
if you reject H0? If you fail to reject H0?
H0: (Claim) A cigarette manufacturer claims that one-eighth
of the US adult population smokes cigarettes.
If H0 is rejected, you should conclude “there is sufficient evidence to
indicate that the manufacturer’s claim is false.”
If you fail to reject H0, you should conclude “there is not sufficient
evidence to indicate that the manufacturer’s claim is false.”
Steps for Hypothesis Testing
1. State the claim mathematically and verbally. Identify the
null and alternative hypotheses.
H0: ? Ha: ?
2. Specify the level of significance.
α = ?
3. Determine the standardized sampling distribution and
draw its graph. This sampling distribution is
based on the assumption that H0 is true.
4. Calculate the test statistic
and its standardized value.
Add it to your sketch.
5. Find the P-value.
6. Use the following decision rule.
Is the P-value less than or equal to the level of significance?
Yes: Reject H0.
No: Fail to reject H0.
7. Write a statement to interpret the decision in the context of
the original claim.
These steps apply to left-tailed, right-tailed, and two-tailed
tests.
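As a worked illustration of these steps, the sketch below runs a two-tailed one-sample z-test. Everything specific in it is an assumption made for the example: the function name one_sample_z_test, the hypothesized mean of 50, the known population standard deviation of 10, and the sample of 100 units with mean 52. It uses the usual standardized statistic z = (x̄ − μ0) / (σ/√n).

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 against Ha: mu != mu0,
    assuming the population standard deviation sigma is known."""
    std_error = sigma / math.sqrt(n)                 # s.e. of the mean under H0
    z = (sample_mean - mu0) / std_error              # standardized test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))           # two-tailed P-value
    return z, p, p <= alpha                          # True means "reject H0"


# Hypothetical numbers: H0: mu = 50, Ha: mu != 50, alpha = 0.05
z, p, reject = one_sample_z_test(sample_mean=52, mu0=50, sigma=10, n=100)
decision = "reject H0" if reject else "fail to reject H0"
print(f"z = {z:.2f}, P = {p:.4f}, decision: {decision}")
```

Here z = 2.00 and P is about 0.0455, so at α = 0.05 the null hypothesis is rejected; at α = 0.01 it would not be.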
Chapter IV
Hypotheses test based on two samples
Two Sample Hypothesis Testing
In a two-sample hypothesis test, two parameters from two populations are
compared.
For a two-sample hypothesis test,
1.the null hypothesis H0 is a statistical hypothesis that usually states
there is no difference between the parameters of two populations.
The null hypothesis always contains the symbol ≤, =, or ≥.
2.the alternative hypothesis Ha is a statistical hypothesis that is true
when H0 is false. The alternative hypothesis always contains the
symbol >, ≠, or <.
To write a null and alternative hypothesis for a two-sample hypothesis test, translate
the claim made about the population parameters from a verbal statement to a
mathematical statement.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
H0: μ1 ≤ μ2
Ha: μ1 > μ2
H0: μ1 ≥ μ2
Ha: μ1 < μ2
Regardless of which hypotheses are used, μ1 = μ2 is always assumed to be true.
Two Sample z-Test
Three conditions are necessary to perform a z-test for the
difference between two population means μ1 and μ2.
1.The samples must be randomly selected.
2.The samples must be independent. Two samples are independent
if the sample selected from one population is not related to the
sample selected from the second population.
3.Each sample size must be at least 30, or, if not, each population
must have a normal distribution with a known standard deviation.
Two Sample t-Test
If samples of size less than 30 are taken from normally-distributed
populations, a t-test may be used to test the difference between the
population means μ1 and μ2.
Three conditions are necessary to use a t-test for small independent samples.
1.The samples must be randomly selected.
2.The samples must be independent. Two samples are independent if the
sample selected from one population is not related to the sample selected
from the second population.
3.Each population must have a normal distribution, and each sample size is less
than 30.
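For reference, the test statistics usually paired with these conditions can be written as below. These are the standard textbook forms rather than formulas taken from these notes, and the pooled t version additionally assumes that the two populations have equal variances. Under H0 the difference μ1 − μ2 is taken to be 0, and the t statistic has n1 + n2 − 2 degrees of freedom.

```latex
z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}
         {\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}
\qquad
t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}
         {s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\quad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
```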
Chapter V
Analysis of Variance
 ANOVA is a procedure that can be used to analyze the results
from both simple and complex experiments
 It reveals whether the observed difference between
treatments is real or occurred by chance.
 It partitions the total variation into different components and
tests their significance
Assumptions of ANOVA
 Most of the analyses are based on linear models (regression and ANOVA models)
1)Normality
 Errors or residuals must be normally distributed. This is helped by proper randomization
and blocking
 One way of assessing normality is to use probability plots (pplots) of the residuals. These
examine the frequency distribution of your data and compare the shape of that distribution to that
of the expected normal distribution
 For normal data, the pplot will be a straight line; departures from a straight line indicate various kinds of skewness, etc.
 If normality is violated, the F-test is invalid
 Can also be checked using box plots, etc.
2)Homogeneity of variances
The variance in the response variable is the same at each level, or combination of levels, of the
predictor variables.
Check this together with the normality check; unequal variances may occur if sample sizes are small.
3) Linearity
Parametric correlation and linear regression analyses are based on straight-line relationships between
variables.
This assumption is checked by examining a scatter plot of the two variables or more variables
4)Independence of errors
This assumption implies all the observations should be independent of each other
Any treatment should be assigned randomly to any of the experimental units through proper
randomization to avoid dependency .
This assumption is not met when the same experimental unit is measured under different treatments
If violated, the mean square of error will be inflated and
type II errors become more likely
Analysis of variance (ANOVA)
 Commonly used to determine differences between several
groups or treatments
 Partitioning of total variation into different components
 Is used when we have two or more treatment levels
 The simplest ANOVA -single factor – one-way ANOVA.
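The partitioning of total variation in a one-way ANOVA can be sketched as follows. This is a minimal illustration, not the notes' own procedure: the function name one_way_anova and the three small treatment groups are made up, and the F ratio is formed in the usual way as the between-treatment mean square divided by the within-treatment (error) mean square.

```python
import statistics


def one_way_anova(groups):
    """Partition total variation into between- and within-group sums of
    squares and return them with the F ratio (hypothetical helper)."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_ratio,
    }


# Made-up responses for three treatments with three replicates each.
treatments = [[10, 12, 11], [14, 15, 16], [20, 19, 21]]
result = one_way_anova(treatments)
print(result)
```

For this made-up data the total sum of squares (128) splits into 122 between treatments and 6 within treatments, giving F = 61 with 2 and 6 degrees of freedom. The F value is then compared with the tabled F value at the chosen level of significance.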
Chapter VI
Correlation and Regression Analysis
 Agricultural research focuses primarily on the behaviour of biological organisms in a
specified environment. The associations among treatments, environmental factors, and
responses that are usually evaluated in livestock research are the association between
response variables, the association between response and treatments, and the
association between response and environment.
 Both correlation and regression have a number of advantages:-
 To know the association/relationships between a number of variables that could
affect the response of the experimental units to treatment in an experiment.
 To understand the association of different variables with the response of animal
performances in an experiment
 To predict the effect of different variables on animal performance so that it
would be possible to adjust the amount of treatments used on the experimental
units.
Correlation analysis
 The discovery and measurement of the magnitude and direction of the relationship between
two or more variables is called Correlation.
 It is a measure of the degree to which variables vary together or a measure of the intensity
of the association between different variables in an experiment.
 Suppose you have two continuous variables X and Y. If a change in one variable is accompanied by
a change in the other variable, the variable X is said to be correlated with variable Y and
vice versa.
 In this case, correlation between two or more variables does not require that the variables be
designated as dependent or independent; either variable can play either role.
 The correlation procedures can be classified according to the number of variables involved
and the form of the functional relationship between variables involved in the experiment.
 The procedure is termed simple if only two variables are involved and multiple, otherwise.
 The procedure is termed linear if the form of the underlying relationship is linear and non-
linear, otherwise.
 Thus, correlation analysis can be classified into four types.
1. Simple linear correlation
2. Multiple linear correlation
3. Simple non linear correlation
4. Multiple non linear correlation
 Correlation is usually expressed using an index called the coefficient of correlation,
symbolized by “r” in the case of a sample and “ρ” in the case of a population.
 The values of coefficient of correlation range between –1 and 1, inclusively (−1 ≤ r ≤ 1). It
tells us only the magnitude, degree, and direction of association of the variables in an
experiment.
 For r > 0, the two variables have a positive correlation, and for r < 0, the two variables
have a negative correlation.
 The positive correlation means where the changes in both variables move in the same
direction (as values of one variable increase, increasing values of the other variable are
observed and as values of one variable decrease, decreasing values of the other variable are
observed).
 A negative correlation means that as values of one variable increase, decreasing values of
the other variable are observed or vice versa.
 The value r = 1 or r = –1 indicates an ideal or perfect linear relationship, and r = 0 means
that there is no linear association.
 Coefficient of correlation is unit free. It is not affected by change of the origin, scale or
both in an experiment.
 The coefficient of correlation (r) is used under certain assumptions: the variables are
continuous random variables and are normally distributed, the relationship between
variables is linear, and the pairs of observations are independent of each other.
 The strength of the correlation is summarized by the coefficient of determination (r²), which
shows the proportion of the variation in one variable that is accounted for by the second variable.
 Correlation can be used as a selection criterion in animal breeding if it is positive, to decide up to what level the
variables are used in an experiment.
Example: From a research which is conducted in Horro Guduru Wollega Goats, the following weight and
heart girth data are taken. Calculate linear correlation coefficient and coefficient of determination.
Heart girth (x)    Body weight (y)    XiYi
70                 25                 1750
67                 22                 1474
73                 32                 2336
73                 32                 2336
65                 20                 1300
74                 31                 2294
73                 31                 2263
68                 27                 1836
Total    ∑Xi = 563    ∑Yi = 220    ∑XiYi = 15,589
Mean     X̄ = 70.4     ȳ = 27.5
Solution
 $SS_X = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = [(70)^2 + (67)^2 + \dots + (68)^2] - \frac{(563)^2}{8} = 39701 - 39621.13 = 79.87$
 $SS_Y = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n} = [(25)^2 + (22)^2 + \dots + (27)^2] - \frac{(220)^2}{8} = 6208 - 6050 = 158$
 $\text{Cov}_{XY} = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = [1750 + 1474 + \dots + 1836] - \frac{563 \times 220}{8} = 15589 - 15482.5 = 106.5$
 $r_{XY}$ (correlation coefficient) $= \frac{\text{Cov}_{XY}}{\sqrt{SS_X \times SS_Y}} = \frac{106.5}{\sqrt{79.87 \times 158}} = \frac{106.5}{112.34} = 0.948 \approx 0.95$
 $r^2$ (coefficient of determination) $= 0.948^2 = 0.8987 = 89.87\%$. This shows that
about 89.87% of the variation in body weight (y) is accounted for by heart girth (x).
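The hand calculation above can be reproduced with a short script. This is a sketch for checking the arithmetic, with assumed names (corrected_sums, ssx, ssy, spxy); the data are the heart girth and body weight values from the table, and the quantity labelled Cov XY in the notes is computed here as the corrected sum of cross-products.

```python
import math

heart_girth = [70, 67, 73, 73, 65, 74, 73, 68]   # x, from the table
body_weight = [25, 22, 32, 32, 20, 31, 31, 27]   # y, from the table


def corrected_sums(x, y):
    """Return (SSX, SSY, SPXY): corrected sums of squares and cross-products."""
    n = len(x)
    ssx = sum(v * v for v in x) - sum(x) ** 2 / n
    ssy = sum(v * v for v in y) - sum(y) ** 2 / n
    spxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ssx, ssy, spxy


ssx, ssy, spxy = corrected_sums(heart_girth, body_weight)
r = spxy / math.sqrt(ssx * ssy)   # correlation coefficient
r_squared = r ** 2                # coefficient of determination
print(f"SSX={ssx:.3f}  SSY={ssy:.1f}  SPXY={spxy:.1f}")
print(f"r={r:.3f}  r^2={r_squared:.4f}")
```

Carrying full precision gives r of about 0.948 and r² of about 0.8987, so small differences from hand-rounded figures are only rounding effects.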
Regression analysis
 It is often of interest to determine how changes of values of some variables influence the change
of values of other variables.
 For example, how alteration of air temperature affects feed intake, or how increasing the protein
level in a feed affects daily gain.
 In both the first and the second example, the relationship between variables can be described
with a function, a function of temperature to describe feed intake, or a function of protein level
to describe daily gain.
 A function that explains such relationships is called a regression function and analysis of such
problems and estimation of the regression function is called regression analysis.
 Regression includes a set of procedures designed to study statistical relationships among
variables in a way in which one variable is defined as dependent upon others defined as
independent variables.
 By using regression, the cause-consequence relationship between
the independent and dependent variables can be determined.
 In the examples above, feed intake and daily gain are dependent
variables, and temperature and protein level are independent
variables.
 Regression analysis describes the effect of one or more variables
(designated as independent) on a single variable (designated as
the dependent variable) by expressing the latter as a function of
the former.
 For this analysis, it is important to clearly distinguish between the dependent and
independent variable.
 The regression analysis tells us the cause and effect or the magnitude of relationship
between variables in an experiment.
 The regression procedures can be classified according to the number of variables
involved and the form of the functional relationship between variables involved in the
experiment.
 The procedure is termed simple if only two variables are involved and multiple,
otherwise.
 The procedure is termed linear if the form of the underlying relationship is linear and
non-linear, otherwise.
Thus, regression analysis can be classified into four types.
1.Simple linear regression
2.Multiple linear regression
3.Simple non linear regression
4.Multiple non linear regression
 The functional form of the linear relationship between a dependent variable Y and an
independent variable X is represented by the equation: Y = α + βX, where α is the
intercept of the line on the Y axis and β, the linear regression coefficient, is the slope of
the line or the amount of change in Y for each unit change in X.
 Where there is more than one independent variable, say k independent variables (X1, X2,
X3, ..., Xk), the simple linear functional form of the equation Y = α + βX can be extended to
the multiple linear functional form Y = α + β1X1 + β2X2 + ... + βkXk, where α is the
intercept (the value of Y where all X’s are zero) and βi (i = 1, ..., k), the regression
coefficient associated with independent variable Xi, represents the amount of change in Y
for each unit change in Xi.
 The two main applications of regression analysis are:
1. Estimation of a function of dependency between variables
2. Prediction of future measurements or means of the dependent variable using new measurements of the
independent variable(s).
 Example: From a research which is conducted in Horro Guduru Wollega Goats, the
following weight and heart girth data are taken.
Heart girth (x)    Body weight (y)    XiYi
70                 25                 1750
67                 22                 1474
73                 32                 2336
73                 32                 2336
65                 20                 1300
74                 31                 2294
73                 31                 2263
68                 27                 1836
Total    ∑Xi = 563    ∑Yi = 220    ∑XiYi = 15,589
Mean     X̄ = 70.4     ȳ = 27.5
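Solution
$SS_X = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = [(70)^2 + (67)^2 + \dots + (68)^2] - \frac{(563)^2}{8} = 39701 - 39621.13 = 79.87$
$SS_Y = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n} = [(25)^2 + (22)^2 + \dots + (27)^2] - \frac{(220)^2}{8} = 6208 - 6050 = 158$
$\text{Cov}_{XY} = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = [1750 + 1474 + \dots + 1836] - \frac{563 \times 220}{8} = 15589 - 15482.5 = 106.5$
$b = \frac{\text{Cov}_{XY}}{SS_X} = \frac{106.5}{79.87} = 1.33$
Y = a + bx
$a = \bar{y} - b\bar{X} = 27.5 - (1.333 \times 70.375) = 27.5 - 93.83 = -66.33$ (using the unrounded mean $\bar{X} = 563/8 = 70.375$)
Y = -66.33 + 1.33x
Body weight = -66.33 + 1.33(heart girth)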
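The fitted line can be checked and used for prediction with the sketch below. The names fit_simple_linear and predict_weight are assumptions made for this example, and the data are the goat measurements from the table. The script carries full precision throughout, so its intercept (about -66.33) can differ slightly from a hand calculation that rounds the mean heart girth to 70.4.

```python
heart_girth = [70, 67, 73, 73, 65, 74, 73, 68]   # x, from the table
body_weight = [25, 22, 32, 32, 20, 31, 31, 27]   # y, from the table


def fit_simple_linear(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    ssx = sum(v * v for v in x) - sum(x) ** 2 / n
    spxy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
    b = spxy / ssx                      # slope: change in Y per unit of X
    a = sum(y) / n - b * sum(x) / n     # intercept: y-bar minus b times x-bar
    return a, b


a, b = fit_simple_linear(heart_girth, body_weight)


def predict_weight(girth: float) -> float:
    """Predicted body weight for a given heart girth (hypothetical helper)."""
    return a + b * girth


print(f"Body weight = {a:.2f} + {b:.2f} (heart girth)")
print(f"Predicted weight at heart girth 70: {predict_weight(70):.1f}")
```

Predictions are only reasonable inside the observed heart girth range (65 to 74 here); the negative intercept has no biological meaning on its own.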
CHAPTER 7
SAMPLING TECHNIQUES
 Many professions (business, government, engineering, science, social research, agriculture,
etc.) seek the broadest possible factual basis for decision-making. In the absence of data on
the subject, a decision taken is just like leaping into the dark.
 Sampling is a procedure wherein a fraction of the data is taken from a large set of data,
and the inference drawn from the sample is extended to the whole group. The surveyor’s (a
person or an establishment in charge of collecting and recording data) or researcher’s initial
task is to formulate a rational justification for the use of sampling in his research. If
sampling is found appropriate for a research, the researcher then:
(1) Identifies the target population as precisely as possible, and in a way that makes sense in
terms of the purpose of study.
(2) Puts together a list of the target population from which the sample will be selected. This
list is termed as a frame (more appropriately list frame) by many statisticians.
(3) Decides on a sampling technique and selects the sample; and
(4) Makes an inference about the population.
 All these four steps are interwoven and cannot be considered in isolation from one another. Simple random
sampling, systematic sampling and stratified sampling fall into the category of simple sampling techniques.
Complex sampling techniques are used only in the presence of large experimental data sets, when
efficiency is required, and when making precise estimates about relatively small groups within large
populations.
Characteristics of Good Samples
 Representative
 Accessible
 Low cost
SAMPLING TERMINOLOGY
 A population is a group of experimental data, persons, etc. A population is built up of elementary units,
which cannot be further decomposed.
 A group of elementary units is called a cluster.
 Population Total is the sum of all the elements in the sample frame.
 Population Mean is the average of all elements in a sample frame or population.
 The fraction of the population or data selected in a sample is called the Sampling Fraction.
 The reciprocal of the sampling fraction is called the Raising Factor.
 A sample, in which every unit has the same probability of selection, is called a random
sample.
 If no repetitions are allowed, it is termed as a simple random sample selected without
replacement. If repetitions are permitted, the sample is selected with replacement.
PROBABILITY AND NON-PROBABILITY SAMPLING
 Probability sampling, is a sampling process that utilizes some form of random selection.
In probability sampling, each unit is drawn with known probability or has a non-zero
chance of being selected in the sample. Such samples are usually selected with the help
of random numbers. With probability sampling, a measure of sampling variation can be
obtained objectively from the sample itself.
 Non-probability sampling or judgment sampling depends on subjective judgment.
 The non-probability method of sampling is a process where probabilities cannot be
assigned to the units objectively, and hence it becomes difficult to determine the
reliability of the sample results in terms of probability. Examples of non-probability
sampling used extensively in the 1920s and 1930s are the judgment
sample, quota sample, and the mail questionnaire.
 In non-probability sampling, the surveyor often selects a sample according to his
convenience, or generality in nature. Non-probability sampling is well suited for
exploratory research intended to generate new ideas that will be systematically
tested later. However, if the goal is to learn about a large population, it is
imperative to avoid judgment or other non-probabilistic samples in survey research.
 In contrast to probability sampling techniques, there is no way of knowing the
accuracy of a non-probabilistic sample estimate.
SAMPLING ERRORS
 Sampling errors occur as a result of calculating the estimate (estimated mean, total, proportion,
etc) based on a sample rather than the entire population. This is due to the fact that the estimated
figure obtained from the sample may not be exactly equal to the true value of the population. For
example, if a sample of blocks is used to estimate the total number of persons in the city, and the
blocks in the sample are larger than the average — then this sample will overstate the true
population of the city.
 When results from a sample survey are reported, they are often stated in the form “plus or minus”
of the respective units being used. This “plus or minus” reflects sampling errors. Salant and
Dillman describe that the statistics based on samples drawn from the same population always vary
from each other (and from the true population value) simply because of chance. This variation is
sampling error and the measure used to estimate the sampling error is the standard error.
 $se(p) = \sqrt{\frac{pq}{n}}$ where se (p) is the standard error of a proportion, p and q are the proportions of the sample
that do (p) and do not (q) have a particular characteristic, and n = the number of units in the sample.
 Standard errors are usually used to quantify the precision of the estimates.
 Sampling distribution theory points out that about 68 percent of the estimates lie within one standard
error or standard deviation of the mean, about 95 percent lie within two standard deviations and nearly all
(about 99.7 percent) lie within three standard deviations.
 Sampling errors can be minimized by proper selection of samples, and Salant and Dillman state: “Three
factors affect sampling errors with respect to the design of samples – the sampling procedure, the variation
within the sample with respect to the variate of interest, and the size of the sample.” A larger sample results in
less sampling error.
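The standard error of a proportion and the “plus or minus” it implies can be sketched as below. This is an illustration under assumptions: the function name se_proportion and the survey figures (60 of 150 sampled units having the characteristic) are made up, the standard error is taken as the square root of pq/n, and the interval uses the rough rule that about 95 percent of estimates lie within two standard errors.

```python
import math


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)


# Hypothetical survey: 60 of 150 sampled units have the characteristic.
n = 150
p = 60 / n                       # sample proportion, 0.40
se = se_proportion(p, n)         # sqrt(0.40 * 0.60 / 150) = 0.04
low, high = p - 2 * se, p + 2 * se
print(f"p = {p:.2f}, se = {se:.3f}, approx. 95% range: {low:.2f} to {high:.2f}")

# Quadrupling the sample size halves the standard error.
print(f"se with n = 600: {se_proportion(p, 600):.3f}")
```

The last line shows why a larger sample gives less sampling error: the standard error falls with the square root of n.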
NON-SAMPLING ERRORS
 The accuracy of an estimate is also affected by errors arising from causes such as incomplete coverage and
faulty procedures of estimation, and together with observational errors, these make up what are termed
non-sampling errors.
 The aim of a survey is always to obtain information on the true population value. The idea is to get as close
as possible to the latter within the resources available for survey. The discrepancy between the survey
value and the corresponding true value is called the observational error or response error.
 Non-sampling errors occur as a result of improper records on the variate of interest,
careless reporting of the data, or deliberate modification of the data by the data collectors
and recorders to suit their interests. Non-response error occurs when a significant
number of people in a survey sample are absent, do not respond to the
questionnaire, or are different from those who do respond in a way that is important to the study.
BIAS
 Although judgment sampling is quicker than probability sampling, it is prone to
systematic errors. For example, if 20 books are to be selected from a total of 200 to
estimate the average number of pages in a book, a surveyor might suggest picking out
those books which appear to be of average size.
 The difficulty with such a procedure is that, consciously or unconsciously, the sampler
will tend to make errors of judgment in the same direction by selecting mostly
books which are either bigger or smaller than the average. Such systematic errors lead
to what are called biases.
BASIC PROBABILISTIC SAMPLING TECHNIQUES
SIMPLE RANDOM SAMPLING
 Sample surveys deal with samples drawn from populations that contain a finite number N of units. If
these units can all be distinguished from one another, the number of distinct samples of size n that can be
drawn from N units is given by the combinatorial formula $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$.
 Objective: To select n units out of N such that each possible combination of n units has an equal chance of
being selected, i.e., each unit in the population has the same probability of being selected in the
sample.
 Procedure: Use a table of random numbers, a computer random number generator, or a mechanical device to
select the sample.
 Example: Suppose there are N = 850 students in a school from which a sample of n = 10 students is to
be taken. The students are numbered from 1 to 850. Since our population size runs to three digits, we use
random numbers that contain three digits. All numbers exceeding 850 are ignored because they do not
correspond to any serial number in the population. If the same number occurs again, the
repetition is ignored. Applying these rules, the following simple random sample of 10 students is
obtained when columns 31 and 32 of the random numbers given in Appendix 1 are used.
251 546 214 495 074 800 407 502 513 628
 Remark: If repetitions are included, the procedure is termed selecting a sample
with replacement. In the present example the sample is selected without
replacement.
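A computer random number generator can stand in for the random number table. The sketch below assumes Python's standard `random` module and an arbitrary seed (chosen only so that the run is repeatable); it draws the N = 850, n = 10 sample both without and with replacement.

```python
import math
import random

N = 850  # population size from the example
n = 10   # sample size from the example

# Number of distinct samples of size n that can be drawn from N units.
print(math.comb(N, n))

rng = random.Random(2024)  # arbitrary seed, assumed here for reproducibility

# Without replacement: each serial number can appear at most once.
without_replacement = rng.sample(range(1, N + 1), n)

# With replacement: the same serial number may be drawn again.
with_replacement = [rng.randint(1, N) for _ in range(n)]

print(sorted(without_replacement))
print(sorted(with_replacement))
```

The drawn serial numbers will differ from the ten listed in the example, because those came from a printed table rather than from this generator.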
SYSTEMATIC SAMPLING
 Systematic sampling differs slightly from simple random sampling.
Suppose that the N units of the population are numbered 1 to N in some order.
To select a sample of n units, we take a unit at random from the first k
units and every kth unit thereafter.
Procedure:
1. Number the units in population from 1 to N
2. Decide on the n (sample size) that is required
3. Select an interval size k = N/n
4. Randomly select an integer between 1 and k
5. Finally, starting from that unit, take every kth unit
 Let's assume that we have a population of only N = 100 people and that you
want to take a sample of n = 20. To use systematic sampling, the population must be listed
in a random order. The sampling fraction would be n/N = 20/100 = 20%. In this case, the
interval size, k, is equal to N/n = 100/20 = 5. Now, select a random integer from 1 to 5.
 In our example, imagine that you chose 4. Now, to select the sample, start with the 4th
unit in the list and take every k-th unit (every 5th, because k = 5). You would be sampling
units 4, 9, 14, 19, and so on up to 99, and you would wind up with 20 units in your sample.
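The procedure can be written out in a few lines. This is a minimal sketch, assuming N is an exact multiple of n as in the example; the function name `systematic_sample` is illustrative only.

```python
import random
from typing import List, Optional


def systematic_sample(N: int, n: int, start: Optional[int] = None) -> List[int]:
    """Return the 1-based unit numbers selected by systematic sampling.

    Assumes N is an exact multiple of n (as in the example: 100 / 20 = 5).
    """
    k = N // n  # interval size
    if start is None:
        start = random.randint(1, k)  # random start between 1 and k
    return list(range(start, N + 1, k))


sample = systematic_sample(N=100, n=20, start=4)
print(sample)       # begins 4, 9, 14, 19 and ends at 99
print(len(sample))  # 20
```

With a start of 4 and k = 5, the last selected unit is 99 and the sample contains exactly 20 units.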
 In order for systematic sampling to work, it is essential that the units in the population be
randomly ordered, at least with respect to the characteristics you are measuring.
 Systematic sampling is fairly easy to do and is widely used for its convenience and time
efficiency. In many surveys, it is found to provide more precise estimates than simple
random sampling. This happens when there is a trend present in the list with respect to
the characteristic of interest.
 Systematic sampling is at its worst when there is periodicity in the list and the
sampling interval falls in line with it. When this happens, most of the units in the
sample will be either too high or too low, which makes the estimate very variable.
STRATIFIED SAMPLING
 Stratified sampling involves dividing the population into homogeneous non-overlapping groups (i.e.,
strata) and then drawing a simple random sample from
each stratum.
 On the basis of information available from a frame, units are allocated to strata by
placing within the same stratum, those units which are more-or-less similar with respect
to the characteristics being measured. If this can be reasonably achieved, the strata will
become homogenous, i.e., the unit-to-unit variability within a stratum will be small.
 Surveyors use various different sample allocation techniques to distribute the samples in
the strata.
 In proportional allocation, the sample size in a stratum is made proportional to the
number of units in the stratum. In equal allocation, the same number of units is taken
from each stratum irrespective of the size of the stratum.
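The two allocation rules can be compared on a small case. The sketch below uses hypothetical strata (the names and sizes are invented for illustration), and the function names are not from any library.

```python
def proportional_allocation(stratum_sizes: dict, n: int) -> dict:
    """Sample size in each stratum proportional to the stratum's size.

    Rounding may make the total differ slightly from n for awkward sizes.
    """
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


def equal_allocation(stratum_sizes: dict, n: int) -> dict:
    """Same number of units from every stratum, regardless of stratum size."""
    per_stratum = n // len(stratum_sizes)
    return {name: per_stratum for name in stratum_sizes}


# Hypothetical frame: 1000 farms grouped into three strata.
strata = {"small farms": 600, "medium farms": 300, "large farms": 100}

print(proportional_allocation(strata, 100))  # 60, 30 and 10 units
print(equal_allocation(strata, 90))          # 30 units from each stratum
```

Under proportional allocation the small stratum of large farms contributes only 10 units, whereas equal allocation gives it the same weight in the sample as the others.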
CLUSTER SAMPLING
 The smallest units into which a population can be divided are called the elements of the
population, and groups of elements, the clusters.
 The problem with random sampling methods, when sampling a population that is
distributed across a wide geographic region, lies in covering a lot of ground
in order to reach each of the units sampled. This travel to
collect samples is expensive. But without taking samples from across the whole
geographic population, it may become difficult to conclude anything firmly about
the population. The difficulty is to determine the best size of the cluster for a specified cost
of the survey. This problem can be solved if the cost of the survey and the variance of
the estimate can be expressed as functions of the size of the cluster.
 Cluster sampling includes:
1. Divide population into clusters
2. Randomly sample clusters
3. Measure all units within sampled clusters
 Cluster sampling is ordinarily conducted in order to reduce costs. The variance of
the estimate of the mean in simple random sampling of clusters depends on the
sample size, the population variance and on the correlation of the variate of
interest between units within the same cluster.
 If the units within a cluster are more similar than units belonging to different
clusters, the estimator is subject to a larger variance; thus the smaller the intra-
cluster correlation, the better.
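The three steps of cluster sampling listed above can be sketched as follows. The population here is hypothetical (12 villages of 5 households each), and the seed is arbitrary.

```python
import random

rng = random.Random(7)  # arbitrary seed, assumed here for reproducibility

# Step 1: divide the population into clusters.
# Hypothetical population: 12 villages (clusters) of 5 households (elements) each.
clusters = {
    f"village_{i}": [f"village_{i}_household_{j}" for j in range(1, 6)]
    for i in range(1, 13)
}

# Step 2: randomly sample clusters.
chosen = rng.sample(sorted(clusters), 3)

# Step 3: measure all units within the sampled clusters.
units = [unit for name in chosen for unit in clusters[name]]

print(chosen)
print(len(units))  # 3 clusters x 5 households = 15 units
```

Only three villages need to be visited to obtain 15 households, which is where the cost saving comes from.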
MULTISTAGE SAMPLING
 Multistage sampling involves combining various probability techniques in the most
efficient and effective manner possible. The process of estimation is carried out stage
by stage, using the most appropriate methods of estimation at each stage.
 Quite often, auxiliary information is used to improve the precision of an estimate. But,
in the absence of auxiliary information, it may be advantageous to conduct the enquiry
in two phases. In the first phase, auxiliary information is collected on a
fairly large sample. Then a sub-sample is taken, and information is collected on the
variate of interest. The two samples are then used in the best possible manner to
produce an estimate for the variate of interest. This two-phase procedure is known as double sampling.
The procedure of first selecting clusters
and then choosing a specified number of elements from each selected cluster is known
as sub-sampling, or two-stage sampling.
 The clusters, which form the units of sampling at the first stage, are called first stage
units, and the elements or group of elements within clusters, which form the units of
sampling at the second stage, are called sub-units or second-stage units. The procedure
can be easily generalized to three or more stages and is hence known as multi-stage
sampling. Double sampling can be used to reduce the response bias in survey results.
NON-PROBABILISTIC SAMPLING
ACCIDENTAL, HAPHAZARD OR CONVENIENCE SAMPLING
 One of the most common methods of sampling goes under the various titles listed here. I
would include in this category the traditional "man on the street" (of course, now it's
probably the "person on the street") interviews conducted frequently by television news
programs to get a quick (although non representative) reading of public opinion. I would
also argue that the typical use of college students in much psychological research is
primarily a matter of convenience. (You don't really believe that psychologists use
college students because they believe they're representative of the population at large, do
you?).
 In clinical practice, we might use clients who are available to us as our sample.
 In many research contexts, we sample simply by asking for volunteers. Clearly, the
problem with all of these types of samples is that we have no evidence that they are
representative of the populations we're interested in generalizing to -- and in many cases
we would clearly suspect that they are not.
PURPOSIVE SAMPLING
 In purposive sampling, sampling is done with a purpose in mind. We usually would have one or more
specific predefined groups we are seeking. For instance, have you ever run into people in a mall or on
the street who are carrying a clipboard and who are stopping various people and asking if they could
interview them? Most likely they are conducting a purposive sample (and most likely they are
engaged in market research). They might be looking for Caucasian females between 30-40 years old.
They size up the people passing by and anyone who looks to be in that category they stop to ask if
they will participate. One of the first things they're likely to do is verify that the respondent does in
fact meet the criteria for being in the sample.
 Purposive sampling can be very useful for situations where you need to reach a targeted sample quickly and where
sampling for proportionality is not the primary concern. With a purposive sample, you are likely to get the opinions of
your target population, but you are also likely to overweight subgroups in your population that are more readily
accessible.
 All of the methods that follow can be considered as sub-categories of purposive
sampling methods. We might sample for specific groups or types of people as in modal
instance, expert, or quota sampling. We might sample for diversity as in heterogeneity
sampling. Or, we might capitalize on informal social networks to identify specific
respondents who are hard to locate otherwise, as in snowball sampling. In all of these
methods we know what we want -- we are sampling with a purpose.
1. MODAL INSTANCE SAMPLING
 In statistics, the mode is the most frequently occurring value in a distribution. In sampling, when we do a
modal instance sample, we are sampling the most frequent case, or the "typical" case. In a lot of informal
public opinion polls, for instance, they interview a "typical" voter. There are a number of problems with
this sampling approach. First, how do we know what the "typical" or "modal" case is? We could say that
the modal voter is a person who is of average age, educational level, and income in the population. But,
it's not clear that using the averages of these is the fairest (consider the skewed distribution of income, for
instance). And, how do you know that those three variables -- age, education, income -- are the only or
even the most relevant for classifying the typical voter? What if religion or ethnicity is an important
discriminator? Clearly, modal instance sampling is only sensible for informal sampling contexts.
2. EXPERT SAMPLING
 Expert sampling involves the assembling of a sample of persons with known or demonstrable experience
and expertise in some area. Often, we convene such a sample under the auspices of a "panel of experts."
There are actually two reasons you might do expert sampling. First, because it would be the best way to
elicit the views of persons who have specific expertise. In this case, expert sampling is essentially just a
specific sub case of purposive sampling. But the other reason you might use expert sampling is to provide
evidence for the validity of another sampling approach you've chosen.
3. QUOTA SAMPLING
 In quota sampling, you select people non-randomly according to some fixed quota. There are
two types of quota sampling: proportional and non-proportional. In proportional quota
sampling you want to represent the major characteristics of the population by sampling a
proportional amount of each. For instance, if you know the population has 40% women and
60% men, and that you want a total sample size of 100, you will continue sampling until you
get those percentages and then you will stop. So, if you've already got the 40 women for your
sample, but not the sixty men, you will continue to sample men but even if legitimate women
respondents come along, you will not sample them because you have already "met your
quota." The problem here (as in much purposive sampling) is that you have to decide the
specific characteristics on which you will base the quota. Will it be by gender, age, education,
race, religion, etc.?
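The stopping rule in proportional quota sampling can be simulated. The sketch below assumes a hypothetical stream of passers-by in which women and men arrive with equal chance; only the 40/60 quota comes from the example.

```python
import random

rng = random.Random(1)  # arbitrary seed, assumed here for reproducibility

quota = {"women": 40, "men": 60}  # quota from the example, total sample of 100
counts = {group: 0 for group in quota}
turned_away = 0

# Keep approaching people until every quota is filled.
while any(counts[group] < quota[group] for group in quota):
    person = rng.choice(["women", "men"])  # hypothetical 50/50 stream of passers-by
    if counts[person] < quota[person]:
        counts[person] += 1
    else:
        turned_away += 1  # legitimate respondent, but that quota is already met

print(counts)       # {'women': 40, 'men': 60}
print(turned_away)  # respondents skipped because their quota was full
```

The final counts always match the quota, but who ends up in each group depends on who happens to pass by, which is why the method is non-random.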
4. NON-PROPORTIONAL QUOTA SAMPLING
 It is a bit less restrictive. In this method, you specify the minimum number of sampled units
you want in each category. Here, you're not concerned with having numbers that match the
proportions in the population. Instead, you simply want to have enough to assure that you will
be able to talk about even small groups in the population.
 This method is the non-probabilistic analogue of stratified random sampling in that it is
typically used to assure that smaller groups are adequately represented in your sample.
5. HETEROGENEITY SAMPLING
 We sample for heterogeneity when we want to include all opinions or views, and we aren't
concerned about representing these views proportionately. Another term for this is sampling for
diversity. In many brainstorming or nominal group processes (including concept mapping), we
would use some form of heterogeneity sampling because our primary interest is in getting a
broad spectrum of ideas, not identifying the "average" or "modal instance" ones. In effect, what
we would like to be sampling is not people, but ideas. We imagine that there is a universe of all
possible ideas relevant to some topic and that we want to sample this population, not the
population of people who have the ideas. Clearly, in order to get all of the ideas, and
especially the "outlier" or unusual ones, we have to include a broad and diverse range of
participants. Heterogeneity sampling is, in this sense, almost the opposite of modal instance
sampling.
6. SNOWBALL SAMPLING
 In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in
your study. You then ask them to recommend others who they may know who also meet the
criteria. Although this method would hardly lead to representative samples, there are times
when it may be the best method available. Snowball sampling is especially useful when you are
trying to reach populations that are inaccessible or hard to find. For instance, if you are
studying the homeless, you are not likely to be able to find good lists of homeless people within
a specific geographical area. However, if you go to that area and identify one or two, you may
find that they know very well who the other homeless people in their vicinity are and how you
can find them.
Inference
 Two ways to make inference:
1. Estimation of parameters
* Point estimation (X̄ or p)
* Interval estimation
2. Hypothesis testing
Parameter and Statistic
Quantity              Parameter (from entire population)   Statistic (from sample)
Mean                  μ                                    X̄
Standard deviation    σ                                    s
Proportion            π                                    p
Each sample statistic estimates the corresponding population parameter.
Point estimate and interval estimate
 Population: the mean, μ, is unknown.
 Sample: the mean is X̄ = 50.
 Point estimate: X̄ = 50.
 Interval estimate: “I am 95% confident that μ is between 40 and 60.”
 Parameter = Statistic ± Its Error
[Figure: sampling distribution of X̄ or p]
Standard Error
 Quantitative variable: $SE(\text{mean}) = \dfrac{s}{\sqrt{n}}$
 Qualitative variable: $SE(p) = \sqrt{\dfrac{p(1-p)}{n}}$
Confidence Interval
 95% CI for a mean: from X̄ − 1.96 SE to X̄ + 1.96 SE
 95% CI for a proportion: from p − 1.96 SE to p + 1.96 SE
[Figure: sampling distribution on the Z-axis, with central area 1 − α (95% of samples) and area α/2 in each tail]
Interpretation of CI
 Probabilistic: In repeated sampling, 100(1 − α)% of all intervals around
sample means will in the long run include μ.
 Practical: We are 100(1 − α)% confident that the single
computed CI contains μ.
Example (Sample size ≥ 30)
An epidemiologist studied the blood glucose
level of a random sample of 100 patients. The
mean was 170, with an SD of 10.
$SE = s/\sqrt{n} = 10/\sqrt{100} = 10/10 = 1$
Then the 95% CI is:
$\mu = \bar{X} \pm Z \cdot SE = 170 \pm 1.96 \times 1$
$168.04 \le \mu \le 171.96$
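The blood glucose interval can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name `mean_ci` is illustrative, and Z = 1.96 is used because the sample is large (n ≥ 30).

```python
import math


def mean_ci(mean: float, sd: float, n: int, z: float = 1.96):
    """Confidence interval for a mean: mean ± z * SE, with SE = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


lower, upper = mean_ci(mean=170, sd=10, n=100)
print(f"{lower:.2f} to {upper:.2f}")  # 168.04 to 171.96
```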
Example (Proportion)
In a survey of 140 asthmatics, 35% had
allergy to house dust. Construct the 95% CI
for the population proportion.
$SE = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.35(1-0.35)}{140}} = 0.04$
$\pi = p \pm Z \cdot SE$
$0.35 - 1.96 \times 0.04 \le \pi \le 0.35 + 1.96 \times 0.04$
$0.27 \le \pi \le 0.43$
$27\% \le \pi \le 43\%$
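The same calculation for the asthma survey, as a minimal sketch (the function name `proportion_ci` is illustrative). The code keeps the unrounded standard error, whereas the worked example rounds it to 0.04 first; the limits agree to two decimal places.

```python
import math


def proportion_ci(p: float, n: int, z: float = 1.96):
    """Return (SE, lower, upper) for a proportion, with SE = sqrt(p(1 - p) / n)."""
    se = math.sqrt(p * (1 - p) / n)
    return se, p - z * se, p + z * se


se, lower, upper = proportion_ci(p=0.35, n=140)
print(f"SE = {se:.4f}")               # SE = 0.0403
print(f"{lower:.2f} to {upper:.2f}")  # 0.27 to 0.43
```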
Hypothesis testing
 A statistical method that uses sample data to evaluate a hypothesis
about a population parameter. It is intended to help researchers
differentiate between real and random patterns in the data.
What is a Hypothesis?
 An assumption about the population parameter.
 Example: I assume the mean SBP of participants is 120 mmHg.
Null & Alternative Hypotheses
 H0, the null hypothesis, states the assumption
to be tested, e.g. mean SBP of participants = 120
(H0: μ = 120).
 H1, the alternative hypothesis, is the opposite of
the null hypothesis, e.g. mean SBP of participants ≠ 120
(H1: μ ≠ 120). It may or may not be accepted,
and it is the hypothesis that is believed to
be true by the researcher.
Level of Significance, α
 Defines unlikely values of the sample statistic
if the null hypothesis is true. These values form the rejection
region of the sampling distribution
 Typical values are 0.01 and 0.05
 Selected by the researcher at the start
 Provides the critical value(s) of the test
[Figure: level of significance, α, and the rejection region: sampling distribution centered at 0, with the critical value(s) bounding the rejection regions]
Result Possibilities
Jury Trial (H0: Innocent)
Verdict      Actual situation: Innocent   Actual situation: Guilty
Innocent     Correct                      Error
Guilty       Error                        Correct
Hypothesis Test
Decision     Actual situation: H0 True          Actual situation: H0 False
Accept H0    Correct (1 − α)                    Type II Error (β), False Negative
Reject H0    Type I Error (α), False Positive   Power (1 − β)
Factors Increasing Type II Error
 True value of the population parameter: β increases when the difference between the hypothesized
parameter and the true value decreases
 Significance level α: β increases when α decreases
 Population standard deviation σ: β increases when σ increases
 Sample size n: β increases when n decreases
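These four effects can be checked numerically. The sketch below assumes a two-sided one-sample Z test with known σ (an assumption made for illustration; the slides do not fix the test), for which $\beta = \Phi(z_{\alpha/2} - \delta\sqrt{n}/\sigma) - \Phi(-z_{\alpha/2} - \delta\sqrt{n}/\sigma)$, where δ is the true mean minus the hypothesized mean. The baseline numbers are hypothetical.

```python
from statistics import NormalDist


def type_ii_error(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Beta for a two-sided one-sample Z test with known sigma (assumed setting).

    delta is the true mean minus the hypothesized mean.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma
    # H0 is not rejected when the test statistic falls between -z_crit and +z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


# Hypothetical baseline: true mean 3 units from the hypothesized mean.
base = type_ii_error(delta=3, sigma=10, n=100, alpha=0.05)
smaller_difference = type_ii_error(delta=1, sigma=10, n=100, alpha=0.05)
smaller_alpha = type_ii_error(delta=3, sigma=10, n=100, alpha=0.01)
larger_sigma = type_ii_error(delta=3, sigma=20, n=100, alpha=0.05)
smaller_n = type_ii_error(delta=3, sigma=10, n=25, alpha=0.05)

for label, beta in [("baseline", base), ("smaller difference", smaller_difference),
                    ("smaller alpha", smaller_alpha), ("larger sigma", larger_sigma),
                    ("smaller n", smaller_n)]:
    print(f"{label:20s} beta = {beta:.3f}")
```

Each of the four changes gives a β larger than the baseline, in line with the list above.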
p Value Test
 Probability of obtaining a test statistic
more extreme (≤ or ≥) than the actual
sample value, given H0 is true
 Called the observed level of significance
 Used to make the rejection decision
 If p value ≥ α, do not reject H0
 If p value < α, reject H0
Hypothesis Testing: Steps
Test the assumption that the true mean SBP of
participants is 120 mmHg.
1. State H0 (H0: μ = 120)
2. State H1 (H1: μ ≠ 120)
3. Choose α (α = 0.05)
4. Choose n (n = 100)
5. Choose the test: Z, t, χ² test (or p value)
6. Compute the test statistic (or compute the p value)
7. Search for the critical value
8. Make the statistical decision rule
9. Express the decision
One sample-mean Test
 Assumptions
 Population is normally distributed
 t test statistic
$t = \dfrac{\text{sample mean} - \text{null value}}{\text{standard error}} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Example Normal Body Temperature
What is normal body temperature? Is it actually
37.6 °C (on average)?
State the null and alternative hypotheses
H0: μ = 37.6 °C
Ha: μ ≠ 37.6 °C
Example Normal Body Temp (cont)
Data: random sample of n = 18 normal body temps
37.2 36.8 38.0 37.6 37.2 36.8 37.4 38.7 37.2
36.4 36.6 37.4 37.0 38.2 37.6 36.1 36.2 37.5
Summarize data with a test statistic
$t = \dfrac{\text{sample mean} - \text{null value}}{\text{standard error}} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Variable      n    Mean    SD     SE      t      P
Temperature   18   37.22   0.68   0.161   2.38   0.029
(t is reported in absolute value; the sample mean is below 37.6, so the signed statistic is −2.38.)
STUDENT’S t DISTRIBUTION TABLE
Degrees of freedom   Probability (p value)
                     0.10     0.05     0.01
1                    6.314    12.706   63.657
5                    2.015    2.571    4.032
10                   1.813    2.228    3.169
17                   1.740    2.110    2.898
20                   1.725    2.086    2.845
24                   1.711    2.064    2.797
25                   1.708    2.060    2.787
∞                    1.645    1.960    2.576
Example Normal Body Temp (cont)
Find the p-value
df = n − 1 = 18 − 1 = 17
From SPSS: p-value = 0.029
From t table: the p-value is
between 0.05 and 0.01.
The value t = 2.38 lies between
the table entries 2.110 and 2.898 for df = 17,
whose column headings give the p-values 0.05 and 0.01.
The area to the left of t = −2.11 equals the area
to the right of t = +2.11.
[Figure: t distribution marking t = −2.11 and t = +2.11]
Example Normal Body Temp (cont)
Decide whether or not the result is
statistically significant based on the p-value
Using α = 0.05 as the level of significance criterion,
the results are statistically significant because 0.029
is less than 0.05. In other words, we can reject the
null hypothesis.
Report the Conclusion
We can conclude, based on these data, that the mean
temperature in the human population does not equal
37.6.
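The summary table for this example can be recomputed from the 18 observations. The sketch below uses only the Python standard library, so it compares |t| with the critical value 2.110 from the t table (df = 17, α = 0.05) rather than computing the exact p-value that SPSS reports.

```python
import math
import statistics

temps = [37.2, 36.8, 38.0, 37.6, 37.2, 36.8, 37.4, 38.7, 37.2,
         36.4, 36.6, 37.4, 37.0, 38.2, 37.6, 36.1, 36.2, 37.5]
mu0 = 37.6  # null hypothesis value

n = len(temps)
mean = statistics.mean(temps)
sd = statistics.stdev(temps)  # sample standard deviation (divides by n - 1)
se = sd / math.sqrt(n)
t = (mean - mu0) / se         # negative because the sample mean is below mu0

t_crit = 2.110  # two-sided critical value for df = 17, alpha = 0.05 (from the t table)

print(f"n={n} mean={mean:.2f} SD={sd:.2f} SE={se:.3f} t={t:.2f}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```

The magnitude |t| = 2.38 exceeds 2.110, so the decision matches the one reached from the p-value.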
One-sample test for proportion
 Involves categorical variables
 Fraction or % of population in a category
 Sample proportion (p):
$p = \dfrac{X}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$
 Test is called Z test:
$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$
where:
 Z is computed value
 π is proportion in population
(null hypothesis value)
Critical Values: 1.96 at α = 0.05
2.58 at α = 0.01
Example
• In a survey of diabetics in a large city, it was
found that 100 out of 400 have diabetic foot.
Can we conclude that 20 percent of diabetics in
the sampled population have diabetic foot?
• Test at the α = 0.05 significance level.
Solution
H0: π = 0.20
H1: π ≠ 0.20
$p = 100/400 = 0.25$
$Z = \dfrac{0.25 - 0.20}{\sqrt{\dfrac{0.20(1-0.20)}{400}}} = 2.50$
Critical value: ±1.96 (area 0.025 in each rejection region)
Decision:
We have sufficient evidence to
reject the H0 value of 20%, because Z = 2.50 exceeds +1.96.
We conclude that in the
population of diabetics the
proportion who have diabetic foot
does not equal 0.20.
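The Z statistic for the diabetic foot example can be verified as follows. This is a minimal sketch; the function name is illustrative, and the two-sided p-value is obtained from the standard normal distribution through `math.erfc`.

```python
import math


def one_sample_proportion_z(successes: int, n: int, pi0: float):
    """Return (p, Z, two-sided p-value) for H0: pi = pi0."""
    p = successes / n
    se0 = math.sqrt(pi0 * (1 - pi0) / n)  # standard error under the null hypothesis
    z = (p - pi0) / se0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of N(0, 1)
    return p, z, p_value


p, z, p_value = one_sample_proportion_z(successes=100, n=400, pi0=0.20)
print(f"p = {p:.2f}, Z = {z:.2f}, two-sided p-value = {p_value:.4f}")
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```

Z = 2.50 lies beyond the critical value 1.96, so H0 is rejected at α = 0.05, as in the solution above.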
Research method for gggggggggggggggg.ppt

Research method for gggggggggggggggg.ppt

  • 1.
    Wollega University ShambuCampus Faculty of Agriculture Dep't Fisheries, Wetlands and Wildlife Research Methods and Experimental Design Fantahun Dereje
  • 2.
    Meaning and Conceptsof Scientific Research  The word research is composed of two syllables, re and search.  re is a prefix meaning again, anew or over again  search is a verb meaning to examine closely and carefully, to test and try, or to probe.  Together they form a noun describing a careful, systematic, patient study and investigation in some field of knowledge, undertaken to establish facts or principles.  Research is a structured enquiry that utilizes acceptable scientific methodology to solve problems and create new knowledge that is generally applicable.  Research is a process of collecting, analyzing and interpreting information to answer questions  Research refers to a search for knowledge or facts.  It can be also defined as a scientific and systematic search for pertinent information on a specific topic. In fact, Research is an art of scientific investigation.
  • 3.
    Meaning ……  Differentscholars define research in different ways as:  Redman and Mory define research as a “systematized effort to gain new knowledge.  Some people consider research as a movement:- a movement from the known to the unknown. It is actually a voyage of discovery.  We all possess the vital instinct of inquisitiveness or eagerness for, when the unknown confronts us, we wonder and our inquisitiveness makes us probe and attain full and fuller understanding of the unknown.  This inquisitiveness is the mother of all knowledge and the method which man employs for obtaining the knowledge of whatever the unknown can be termed as Research.
  • 4.
     Slesinger andM. Stephenson, define research as “the manipulation of things, concepts or symbols for the purpose of generalizing to extend, correct or verify knowledge, whether that knowledge aids in construction of theory or in the practice of an art.  Research is, thus, an original contributor to the existing stock of knowledge making for its advancement. It is the pursuit or hobbying of truth with the help of study, observation, comparison and experiment.  Research is an academic activity and as such the term should be used in a technical sense. According to Clifford Woody, research comprises defining and redefining problems, formulating hypothesis or suggested solutions; collecting, organizing and evaluating data; making deductions and reaching conclusions; and at last carefully testing the conclusions to determine whether they fit the formulated hypothesis or not.  In short, the search for knowledge through objectives and systematic method of finding solution to a problem is a Research. Meaning ……
  • 5.
    Research can beclassified from three perspectives:  Application of the research study  Objectives in undertaking the research  Inquiry mode employed 1.Based on Application of the research study:  From the point of view of application, there are two broad categories of research:  pure (basic or fundamental) research and  applied (or action) research TYPES OF RESEARCH
  • 6.
     Pure researchinvolves developing and testing theories and hypotheses that are intellectually challenging to the researcher but may or may not have practical application at the present time or in the future.  The knowledge produced through pure research is sought in order to add to the existing body of research methods.  Applied research is done to solve specific, practical questions; for policy formulation, administration and understanding of a phenomenon.  It is aimed at certain conclusions (solution) facing a concrete social or business problem  It can be exploratory, but is usually descriptive. TYPES OF RESEARCH…
  • 7.
    2. Based onobjectives in undertaking the research  From the viewpoint of objectives, a research can be classified as: -Descriptive -Correlational -Explanatory -Exploratory  Descriptive research attempts to describe systematically a situation, problem, phenomenon, service or programme, or provides information about, say, living condition of a community, or describes attitudes towards an issue.  The main characteristic of this method is that the researcher has no control over the variables; he can only report what has happened or what is happening. TYPES OF RESEARCH…
  • 8.
     Correlational researchattempts to discover or establish the existence of a relationship/ interdependence between two or more aspects of a situation.  Explanatory research attempts to clarify why and how there is a relationship between two or more aspects of a situation or phenomenon.  Exploratory research is undertaken to explore an area where little is known or to investigate the possibilities of undertaking a particular research study (feasibility study/ pilot study).  In practice most studies are a combination of the first three categories. TYPES OF RESEARCH…
  • 9.
    3. Based oninquiry mode employed  From the process adopted to find answer to research questions – the two approaches are: - Structured approach - Unstructured approach  Structured approach: The structured approach to inquiry is usually classified as quantitative research.  Here everything that forms the research process- objectives, design, sample, and the questions that you plan to ask of respondents- is predetermined.  It is more appropriate to determine the extent of a problem, issue or phenomenon by quantifying the variation.  e.g. how many people consume fish? How many people have good attitude toward fish rearing? TYPES OF RESEARCH…
  • 10.
     Unstructured approach:The unstructured approach to inquiry is usually classified as qualitative research.  This approach allows flexibility in all aspects of the research process.  It is more appropriate to explore the nature of a problem, issue or phenomenon without quantifying it.  Main objective is to describe the variation in a phenomenon, situation or attitude.  e.g., description of an observed situation, the historical enumeration of events, an account of different opinions different people have about an issue, description of  working condition in a particular industry. Note: In many studies you have to combine both qualitative and quantitative approaches. TYPES OF RESEARCH…
  • 11.
    Steps in ResearchProcess 1. Formulating the Research Problem 2. Extensive Literature Review 3. Developing the objectives 4. Preparing the Research Design including Sample Design 5. Collecting the Data 6. Analysis of Data 7. Generalisation and Interpretation 8. Preparation of the Report or Presentation of Results-Formal write ups of conclusions reached.
  • 12.
    Levels and principlesof research planning  After identifying and defining the problem, researcher must arrange his ideas in order and write them in the form of an experimental plan or what can be described as ‘Research Plan’.  This is essential specially for new researcher because: a. It helps to organize ideas in a form whereby it will be possible to look for flaws and inadequacies, if any b. It provides an inventory of what must be done and which materials have to be collected as a preliminary step c. It is a document that can be given to others for comment.  Research plan must contain the following items. 1. Research objective should be clearly stated, what it is that the researcher expects to do 2. The problem to be studied by researcher should be put clearly 3. Each major concept which to be measured should be defined in operational terms in context of the research project 4. The method to be used in solving the problem 5. It must also state the details of the techniques to be adopted
  • 13.
    Research problem  Itis the first and most crucial step in the research process - Main function is to decide what you want to find out about. • The way you formulate a problem determines almost every step that follows. Steps in formulation of a research problem Step 1 Identify a broad field or subject area of interest to you. Step 2 Dissect the broad area into sub areas. Step 3 Select what is of most interest to you. Step 4 Raise research questions. Step 5 Formulate objectives. Step 6 Assess your objectives. Step 7 Double check.
  • 14.
    Research question/hypothesis  Detailthe problem statement  Further describe and refine the issue under study  Add focus to the problem statement  Guide data collection and analysis  Sets context
  • 15.
    Selection of appropriatemethodology  Research methodology is a systematic way to solve a problem. It is a science of studying how research is to be carried out. Essentially, the procedures by which researchers go about their work of describing, explaining and predicting phenomena are called research methodology. It is also defined as the study of methods by which knowledge is gained. Its aim is to give the work plan of research.  Research methodology tells you which has to be used out of the various existing methods. More precisely, research methods help us get a solution to a problem. On the other hand, research methodology is concerned with the explanation of the following:  (1) Why is a particular research study undertaken?  (2) How did one formulate a research problem?  (3) What types of data were collected?  (4) What particular method has been used?  (5) Why was a particular technique of analysis of data used?
  • 16.
    RESEARCH APPROACHES (QuantitativeAnd Qualitative) Method 1. Quantitative Approach  Involves the generation of data in quantitative form which can be subjected to rigorous quantitative analysis in a formal fashion. This approach can be further sub-classified into Inferential, Experimental and Simulation Approaches to research.  The purpose of inferential approach to research is to form a data base from which to infer characteristics or relationships of population. This usually means survey research where a sample of population is studied (questioned or observed) to determine its characteristics, and it is then inferred that the population has the same characteristics.  Experimental approach is characterized by much greater control over the research environment and in this case some variables are manipulated to observe their effect on other variables.
  • 17.
    - The simulation approach involves the construction of an artificial environment within which relevant information and data can be generated. This permits observation of the dynamic behaviour of a system (or its sub-system) under controlled conditions.
    2. Qualitative approach
    - The qualitative approach is concerned with subjective assessment of attitudes, opinions, behaviour, etc. Research in such a situation is a function of the researcher's insights and impressions. Such an approach generates results either in non-quantitative form or in a form that cannot be subjected to rigorous quantitative analysis. Generally, focus group interviews, projective techniques and depth interviews are used.
  • 18.
    Research strategy: experimental and survey research design
    Experimental research design
    - An experiment is a test or a series of tests: a planned inquiry to investigate new facts, or to confirm or deny the results of previous experiments.
    - Experimentation is used to obtain new information or to improve on the results of previous findings. It helps to answer questions.
    - An experimental design is a planned interference in the natural order of events by the researcher.
  • 19.
    Importance of experimental design
    - To provide estimates of treatment effects or of differences among treatment effects.
    - To provide an efficient way of confirming or denying hypotheses about the response to treatments.
    - To control experimental error and increase precision by reducing external variation.
    - To facilitate the application of treatments, management operations and harvest of the plots.
  • 20.
    Types of experimental designs
    1) Complete block design
    - Each block contains all the treatments
    - The number of replications equals the number of blocks
    - e.g. completely randomized design, randomized complete block design, split plot
    2) Incomplete block design
    - A block does not contain all the treatments
    - The number of blocks is not the same as the number of replications; a replication may contain two or more blocks
    - e.g. lattice square and Latin rectangle, augmented designs
  • 21.
    Survey research Design Theterm survey is used for the techniques of investigation by a direct observation of a phenomenon or a systematic gathering of data from population by applying personal contact and interviews when adequate information about certain problem is not available in records, files and other sources. The survey is an important tool to gather evidences relating to certain social problems. The term social survey indicates the study of social phenomena through a survey of a small sampled population and also to broad segments of population. It is concerned with the present and attempts to determine the status of the phenomenon under investigation.
  • 22.
    - The method of survey research is a non-experimental (that is, it does not involve any observation under controlled conditions), descriptive research method; it is one of the quantitative methods used for studying large samples.
    - In survey research, the researcher collects data with the help of standardised questionnaires or interviews, which are administered to a sample of respondents from a population (the population is sometimes referred to as the universe of a study, and can be defined as a collection of people or objects that possess at least one common characteristic).
    - Survey research is one of the techniques of applied social research; it can help in collecting data through both direct observation (such as a face-to-face interview) and indirect observation (such as opinions on the library services of an institute).
  • 23.
    STEPS INVOLVED IN CONDUCTING SURVEY RESEARCH
    Any survey research follows these systematic steps:
    Step 1: Determine the aims and objectives of the study
    Step 2: Define the population to be studied
    Step 3: Design and construct the survey
    Step 4: Select a representative sample
    Step 5: Administer the survey
    Step 6: Analyse and interpret the findings of the survey
    Step 7: Prepare the report of the survey
    Step 8: Communicate the findings of the survey
  • 24.
    TYPES OF SURVEY RESEARCH
    There are two major types of survey: cross-sectional surveys and longitudinal surveys.
    Cross-sectional surveys are used when the researcher wants to collect data from varied groups (differing, for example, in age, sex, nation or tribe) at a single point in time.
    - An example is a study of the effect of socialization on children of different age groups in a particular country. This type of survey is less time consuming and more economical.
    Longitudinal surveys are used when the researcher wants to study the same sample over a longer period of time. Such studies may be used to examine behavioural changes, attitude changes, religious effects, or any event or practice that may have a long-term effect on the selected sample or population. There are three main types of longitudinal studies:
    (i) Trend studies
    (ii) Cohort studies
    (iii) Panel studies
  • 25.
  • 26.
    2.1 Introduction
    - In agricultural research, the key question to be answered is generally expressed as a statement of hypothesis. This hypothesis has to be verified or disproved through experimentation.
    - Once a hypothesis is framed, the next step is to design a procedure for its verification.
    - This is the experimental procedure or research methodology, which usually consists of four phases:
      1. Selecting the appropriate materials to test the hypothesis
      2. Specifying the characters to measure
      3. Selecting the procedure/design to measure those characters
      4. Specifying the procedure/method of analyzing the characters to determine whether the measurements support the hypothesis or not
    - In general, the first two phases are fairly easy for a subject matter specialist to specify.
    - On the other hand, the procedures/design for how measurements are to be made, and how they prove or disprove a hypothesis, depend heavily on techniques developed by statisticians.
  • 27.
    - Deciding on the procedures, and on how measurements can prove or disprove the hypothesis, generally requires the idea of experimentation.
    - This is what we call the design of experiments.
    - The design of experiments has three essential components:
      1. Estimate of error
      2. Control of error
      3. Proper interpretation of the results obtained, whether the hypothesis is verified or disproved
  • 28.
    1. Estimate of Error
    - Suppose we need to compare two cattle breeds in terms of their milk yield.
    - Breeds A and B have received the same management and are housed side by side.
    - Milk yield is measured, and the higher yield is judged better.
    - It is tempting to attribute the difference in milk yield between the two breeds to breed differences.
    - But this is not necessarily true.
    - Even if the same breed were kept in both houses, the milk yield could differ.
    - Other factors, such as climatic factors (temperature) and damage by disease and insects, also affect milk yield.
    [Figure: exotic breed (A) and local/indigenous breed (B)]
  • 29.
    - Therefore, a satisfactory evaluation of the two cattle breeds must involve a procedure that can separate breed difference from other sources of variation.
    - The animal breeder must be able to design an experiment that allows them to decide whether the observed difference in milk yield is caused by breed difference or by other factors.
    - To do this, we must be able to estimate the experimental error.
    - The difference among experimental plots/materials treated alike is called experimental error.
    - This error is the primary basis for deciding whether an observed difference is real or just due to chance.
    - Clearly, every experiment must be designed to have a measure of the experimental error. Experimental error is unavoidable, but it should be kept as small as possible.
  • 30.
    Methods to Reduce Experimental Error
    - Increase the size of the experiment, either by providing more replicates or by including additional treatments.
    - Refine or improve the experimental techniques/procedures.
    - Apply treatments uniformly, e.g. spreading fertilizer evenly, recording data on the same day, similar housing, similar feeding, etc.
    - Control external influences so that all treatments produce their effects under comparable conditions, e.g. protecting against disease.
  • 31.
    1.1. Replication
    - Replication is the repetition of treatments in an experiment. At least two plots/experimental materials of the same breed/variety are needed to determine the difference among plots/experimental materials treated alike.
    - Experimental error can be measured only if at least two plots receive the same treatment. Thus, replication is needed to obtain a measure of experimental error.
    - The advantage of replication is that it increases the precision of error estimation: the error variance is reduced and easily estimated.
  • 32.
    Functions of Replication
    - Provides an estimate of experimental error, because it provides several observations on experimental units receiving the same treatment. For an experiment in which each treatment appears only once, no estimate of experimental error is possible.
    - Improves the precision of an experiment: as the number of replicates increases, the observed treatment means become more precise estimates of the population means.
    - Increases the scope of inference/conclusion of the experiment.
  • 33.
    1.2. Randomization
    - Randomization ensures that no treatment is consistently favored or discriminated against by being placed under the best or the least favorable conditions, thereby avoiding bias.
    - It means that each variety/breed of animal has an equal chance of being assigned to any experimental plot.
    - It also ensures independence among observations, which is a necessary condition for the validity of the assumptions behind significance tests and confidence intervals.
    - Randomization can be done by using random numbers, a lottery system, or coin tossing.
    - Thus, when treatments are assigned randomly and independently, the estimate of experimental error for a treatment difference is valid and unbiased.
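The random-number method mentioned above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the treatment labels, the number of replications, the seed and the function name `randomize_crd` are hypothetical and do not come from the lecture.

```python
import random

def randomize_crd(treatments, replications, seed=None):
    """Assign treatments to experimental units completely at random.

    Position i in the returned list is the treatment for unit i + 1.
    Every treatment appears exactly `replications` times.
    """
    rng = random.Random(seed)  # a seed makes the layout reproducible
    layout = [t for t in treatments for _ in range(replications)]
    rng.shuffle(layout)        # every ordering is equally likely
    return layout

# Hypothetical example: 3 breeds, 4 animals (replicates) per breed
layout = randomize_crd(["A", "B", "C"], replications=4, seed=1)
for unit, treatment in enumerate(layout, start=1):
    print(f"unit {unit}: treatment {treatment}")
```

Shuffling the full list of treatment labels gives every unit the same chance of receiving any treatment while keeping the number of replicates per treatment fixed.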
  • 34.
    2. Control of Error
    - The ability of an experiment to detect existing differences among treatments/experimental materials increases as the size of the experimental error decreases.
    - A good experiment should incorporate all possible means of minimizing the experimental error.
    - Three commonly used techniques for controlling experimental error in agricultural research are:
      1. Blocking
      2. Proper plot technique
      3. Proper data analysis
  • 35.
    1. Blocking
    - A group of experimental units that are as similar as possible to one another is generally referred to as a block.
    - By assigning all treatments to each block separately and independently, variation among blocks can be measured and removed from the experimental error.
    - Reduction in experimental error is usually achieved with proper blocking techniques in different experimental designs.
    2. Proper plot technique
    - For all experiments it is absolutely essential that all factors other than the treatments are maintained uniformly for all experimental units.
    - For example, in a forage variety trial where the treatments consist solely of the test varieties, all other factors such as soil nutrients, solar energy, temperature, plant population, pest incidence and the many other environmental factors should be kept as uniform as possible across all plots in the experiment. This is primarily a concern of good plot technique.
    3. Proper data analysis
    - Where blocking alone may not achieve adequate control of experimental error, proper data analysis can help greatly. Covariance analysis is most commonly used for this purpose.
  • 36.
    3. Proper Interpretation of Results of an Experiment
    - After estimating and controlling experimental error, the results of the experiment must be interpreted properly, according to the environment and conditions of the experiment, for practical application.
    - For example, the dry matter yield (DMY) of a forage variety must be reported together with the environmental conditions where the study was conducted, including climatic data (temperature, rainfall, others), soil fertility and type, topography, and other details as far as possible.
  • 37.
    Analysis of Variance (ANOVA)
    - ANOVA is a procedure that can be used to analyze the results from both simple and complex experiments.
    - It reveals whether an observed difference between treatments is real or occurred by chance.
    - It partitions the total variation into different components and tests their significance.
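As a rough illustration of how the total variation is partitioned, the sketch below computes a one-way ANOVA (the single-factor, completely randomized case) by hand. The yields are made-up numbers and the function name `one_way_anova` is hypothetical; a real analysis would normally use a statistics package and would compare F with a tabled critical value.

```python
def one_way_anova(groups):
    """Partition total variation into treatment and error components.

    `groups` maps a treatment name to its list of observations.
    """
    all_obs = [y for obs in groups.values() for y in obs]
    n, t = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n

    # Total sum of squares: variation of every observation around the grand mean
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    # Treatment sum of squares: variation of treatment means around the grand mean
    ss_treat = sum(
        len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
        for obs in groups.values()
    )
    ss_error = ss_total - ss_treat  # variation among units treated alike

    df_treat, df_error = t - 1, n - t
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {
        "ss_total": ss_total, "ss_treat": ss_treat, "ss_error": ss_error,
        "df_treat": df_treat, "df_error": df_error,
        "F": ms_treat / ms_error,
    }

# Hypothetical yields for three treatments with three replicates each
yields = {"A": [10, 12, 11], "B": [14, 15, 16], "C": [9, 8, 10]}
result = one_way_anova(yields)
print(f"SS total = {result['ss_total']:.2f}, "
      f"SS treatment = {result['ss_treat']:.2f}, "
      f"SS error = {result['ss_error']:.2f}, F = {result['F']:.2f}")
```

A large F means that the variation among treatment means is large relative to the variation among units treated alike.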
  • 38.
    Overview of some experimental designs
    The most common types of designs used in agricultural research:
    - Completely randomized design (CRD)
    - Randomized complete block design (RCBD)
    - Latin square design (LSD)
    - Lattice design
    - Augmented designs
    - Split plot design
  • 39.
    Completely Randomized Design (CRD)
    - The simplest and least restrictive design.
    - The only restrictions:
      - Experimental units are homogeneous.
      - Treatments are assigned completely at random.
    - Advantages:
      - Flexibility
      - Simple statistical analysis
      - Provides maximum degrees of freedom for error
    - Disadvantages:
      - Low precision
  • 40.
    When to Use CRD?
    - When the experimental units/plots are more or less homogeneous and environmental effects are relatively easy to control, e.g. in the laboratory and greenhouse.
    - CRD is flexible, and the statistical analysis is simple even when there are unequal replications or missing values.
    - With equal replication, the error degrees of freedom are df = t(r − 1), where t is the number of treatments and r is the number of replications per treatment.
  • 41.
    Randomized Complete Block Design (RCBD)
    - It is the most frequently used experimental design in field experiments.
    - It has three sources of variation: treatments, blocks and experimental error. This is one more source of variation than in CRD.
    - It can be used when the experimental units can be meaningfully grouped.
  • 42.
    Characteristics of RCB design
    - The number of blocks is equal to the number of replications.
    - The number of plots in each replication (block) is equal to the number of treatments.
    - The treatments are randomized within each replication, subject to the restriction that each treatment occurs once and only once.
    - Blocking (grouping) is done based on the gradient: soil heterogeneity, slope, or the initial body weight, age, sex and breed of the animals.
    - Blocking minimizes the variability within each block while the variability among blocks is maximized.
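The restriction that each treatment occurs once and only once in every block can be sketched as a separate, independent shuffle per block. The treatment labels, block count, seed and the function name `randomize_rcbd` are assumptions made for this illustration.

```python
import random

def randomize_rcbd(treatments, n_blocks, seed=None):
    """Randomize treatments independently within each block.

    Returns a dict: block number -> treatment order for that block's plots.
    Each treatment appears exactly once per block.
    """
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)  # fresh copy so blocks are independent
        rng.shuffle(order)
        layout[block] = order
    return layout

# Hypothetical example: 4 forage varieties in 3 blocks
rcbd_layout = randomize_rcbd(["V1", "V2", "V3", "V4"], n_blocks=3, seed=7)
for block, order in rcbd_layout.items():
    print(f"block {block}: {order}")
```

Compared with a completely randomized layout, the only change is that the shuffle is restricted to the plots of one block at a time.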
  • 43.
    Block shape, size and orientation
    - When the gradient is unidirectional, use narrow blocks perpendicular to the direction of the gradient.
    - When the gradient occurs in two directions, ignore the weaker gradient: arrange the blocks perpendicular to the stronger gradient, but reduce the length of the blocks.
    - Blocking reduces experimental error by eliminating the contribution of known sources of variation among experimental units.
  • 44.
    Advantages of RCB design
    - Precision: more precision is obtained than with CRD, because grouping experimental units into blocks reduces the magnitude of experimental error.
    - Flexibility: theoretically there is no restriction on the number of treatments or blocks.
    - Ease of analysis: the statistical analysis of the data is simple.
    Disadvantage
    - When the number of treatments is large (>15), variation among experimental units within a block becomes large, resulting in a large error. In such situations it is better to use other designs.
  • 45.
    Factorial experiment The designsare applicable to any type of experiments, regardless of the structure of treatments. Two types፡ single & multi-factor experiment: Single factor experiment: An experiment that is concerned with testing several levels of one factor, keeping all other factors constant Multi - factor experiments: two or more factors where effects and cross- effects are tested simultaneously.
  • 46.
    Characteristics of factorial experiments
    - Factorial experiments are trials that accommodate more than one factor, each having two or more levels.
    - All possible combinations of factors and levels vary simultaneously.
    - They do not have their own designs.
    - Combining treatments makes it possible to find the differential effect of one factor at two or more levels of a second factor.
    - Factorials have one error term if the design is RCBD, CRD or Latin square. If the design is split plot, two error terms are used.
    - An interaction effect between two factors can be measured only if the two factors are tested together in the same experiment.
    - If the interaction effect is significant, more attention should be given to the results of the interaction than to the main effects.
    - Estimation of missing plot values is more complex in factorial experiments.
  • 47.
    Disadvantages As the numberof factors increase the size of experiment becomes very large and complex. e. g: with 8 factors each at 2-levels, there are 28, 256 treatment combinations. Large factorial experiments are difficult to interpret especially when there are interactions.
  • 48.
  • 49.
  • 50.
    Structure/format for proposal writing
    Definition of proposal
    - The research proposal is the document that establishes that there is a niche for your chosen area of study and that the research design is feasible.
    - The research proposal helps you to think through the research project you are about to undertake and to predict any difficulties that might arise.
    - For those who are not quite sure what their focus will be, the research proposal can be a space to explore options, perhaps with one proposal for each potential topic (these can then be compared and evaluated more easily than ideas that are still only in one's head).
    - Research proposals can also be effective starting places for discussing projects with your professors or advisors.
  • 51.
    - A professor who is initially skeptical about a project may be able to imagine it more easily after reading a well written research proposal (this doesn't mean he or she will approve the topic, especially if there are significant potential difficulties that you haven't considered).
    - Once you have begun your research project, a research proposal can help you to remain on track, and can also remind you why you started the project in the first place.
    - Researchers very often begin to lose heart about two thirds of the way into a project, when their research hits a snag or when they are having problems developing a thesis, organizing the ideas, or actually starting to write.
    - Re-reading the initial research proposal, especially the "Significance" section, can re-energize the project or help the researcher to refocus effectively.
  • 52.
    General elements or structure of a research proposal:
    - Cover page (include topic or title, institution name, your name, advisor name, time of submission)
    - Acknowledgements (optional)
    - Abbreviations and acronyms
    - Table of contents
    - List of tables (if any)
    - List of figures (if any)
    - Introduction
    - Literature review
    - Materials and methods
    - Plan of activities
    - Logistics
    - References
    - Appendix
  • 53.
    General form of the main research report
    Put together the structure of the paper:
    - Cover page (include topic or title, institution name, your name, advisor name, time of submission)
    - Acknowledgements (optional)
    - Abbreviations and acronyms
    - Table of contents
    - List of tables (if any)
    - List of figures (if any)
    - Abstract
    - Introduction
    - Materials and methods
    - Results and discussion
    - Summary, conclusions and recommendations
    - References
    Divide long sections into subsections.
  • 54.
    Reference and citations Citation Textualcitation Use name and date for published works  Minale Simachew (1999) stated that or (Minale Simachew 1999) For co-authored published works  Getachew Belay and Hailu Tefera (2006)…. For more than 2 authors  Aster Bedaso et al., (2001) reported Handel second hand citations in one of these ways  Seid Ahemed (2005) cited in Hailu Tefera (2007) dicussed…… or Hailu Tefera (2007) quoting Seid Ahemed (2005) discussed……..
  • 55.
    References
    - Use the common author/date system.
    - Arrange entries in alphabetical order of the surnames.
    - Example:
      Abebe W. 1991. Traditional husbandry practices and major health problems of camels in the Ogaden, Ethiopia. Nomadic Peoples, 29:21-30.
      Alemayehu G. 2001. Breeding program and evaluation of semen characteristics of Camels in the Central Rift Valley of Ethiopia, an MSc Thesis Presented to the School of Graduate Studies of Alemaya University.
      CARE-Ethiopia 2009. Value Chain Analysis of Milk and Milk Products in Borena Pastoralist Area. Addis Ababa: CARE Ethiopia.
    - If two or more entries have the same author(s) and the same publication year, alphabetize the entries by title and use lower case letters (a, b, c, etc.) to distinguish them.
    - Example:
      Sampath S. (2001)a Sampling Theory and Methods, Narosa Publishing House, New Delhi.
      Sampath S. (2001)b Statistical Theory and Methods, Narosa Publishing House, New Delhi.
  • 56.
  • 57.
    Statistics is a science of inference
    Statistical inference uses sample statistics, derived from limited and incomplete sample information, to:
    - Predict and forecast values of population parameters
    - Test hypotheses about values of population parameters
    - Make decisions
    It makes generalizations about the characteristics of a population on the basis of observations of a sample, which is a part of the population.
    - The sample should be unbiased and representative, drawn at random from the entire population.
    - A sample statistic is a numerical measure of a summary characteristic of a sample.
    - A population parameter is a numerical measure of a summary characteristic of a population.
  • 58.
    Estimator
    - The sample mean, X̄, is the most common estimator of the population mean, μ.
    - The sample variance, s², is the most common estimator of the population variance, σ².
    - The sample standard deviation, s, is the most common estimator of the population standard deviation, σ.
    - The sample proportion, p̂, is the most common estimator of the population proportion, p.
    Inferential statistics involves three distributions:
    - A population distribution: variation in the larger group that we want to know about.
    - A distribution of sample observations: variation in the sample that we can observe.
    - A sampling distribution: a normal distribution whose mean and standard deviation are unbiased estimates of the parameters, and which allows one to infer the parameters from the statistics.
  • 59.
    Sampling Distributions The samplingdistribution of a statistic is the probability distribution of all possible values the statistic may assume, when computed from random samples of the same size, drawn from a specified population. The sampling distribution of X is the probability distribution of all possible values the random variable may assume when a sample of size n is taken from a specified population.  When sampling from a normal population with mean  and standard deviation , the sample mean, X, has a normal sampling distribution: X N n ~ ( , )   2
  • 60.
    Estimators and Their Properties
    An estimator of a population parameter is a sample statistic used to estimate the parameter. The most commonly used estimators are:
      Population parameter       Sample statistic
      Mean (μ)                   Mean (X̄)
      Variance (σ²)              Variance (s²)
      Standard deviation (σ)     Standard deviation (s)
      Proportion (p)             Proportion (p̂)
    Desirable properties of estimators include:
    - Unbiasedness
    - Efficiency
    - Consistency
    - Sufficiency
  • 61.
    Probability Distribution Normal distribution Naturallymost variables are assumed to be distributed normally, where the distribution curve takes a bell-shape. E.g. height or body weight of people in Shambu A normal distribution can be completely described by its mean and standard deviation. N (µ, δ) Binomial distribution It is one of the most widely used discrete distributions. A binomial distribution can be thought of as simply the probability of a SUCCESS or FAILURE outcome in an experiment or survey that is repeated multiple times. The binomial is a type of distribution that has two possible outcomes. For two possible outcomes: pass or fail.
  • 62.
    For the binomial model to be applied, the following four criteria must be satisfied:
    1. The trial is carried out a fixed number of times, n.
    2. The outcome of each trial can be classified into one of two types: success or failure.
    3. The probability p of success remains constant for each trial; that is, the probability of each outcome (tails or heads, fail or pass) is exactly the same from one trial to another.
    4. The individual trials are independent of each other. In other words, no trial has an effect on the probability of the next trial.
    - For example, if we throw a coin 7 times, what is the probability that exactly 4 heads occur?
    - This problem can be modelled by the binomial distribution, since the four basic criteria can be assumed to be satisfied:
      - the trial is "throwing a coin", and it is carried out 7 times
      - the occurrence of a head on any given throw may be called a success
      - the probability of success is p = 1/2 and remains constant for each trial
      - each throw of the coin is independent of the others.
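The coin example can be worked through with the binomial probability formula, $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. The short sketch below evaluates it for n = 7, k = 4 and p = 1/2; the function name `binomial_pmf` is just a label chosen for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 4 heads in 7 throws of a fair coin: C(7,4) / 2^7 = 35 / 128
prob = binomial_pmf(4, 7, 0.5)
print(prob)  # 0.2734375
```

So there is roughly a 27% chance of getting exactly 4 heads in 7 throws.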
  • 63.
    Central Limit Theorem
    - It is a statistical concept regarding the relationship between sample size and the distribution of a sample statistic (the sample mean).
    - It is closely related to the law of large numbers (LoLN).
    - The CLT states that for a sufficiently large sample size n, the sampling distribution of the mean will be approximately normal, regardless of what the initial distribution looks like.
    Law of Large Numbers: as n grows, the probability that the mean of n samples is close to µ goes to 1.
    Central Limit Theorem: as n grows, the distribution of the mean of n samples converges to the normal distribution.
  • 64.
    This theorem tells us:
    - Even if a population distribution is skewed, the sampling distribution of the mean is approximately normal when the sample size is large.
    - The mean of the sampling distribution equals the population mean, and as the sample size gets larger the sample means cluster more tightly around it.
    - As the sample size gets larger, the standard error of the mean decreases (which means that the variability in the sample estimates from sample to sample decreases as n increases).
    - It is important to remember that researchers do not typically draw repeated samples from the same population.
    - Instead, they use the knowledge of theoretical sampling distributions to construct confidence intervals around estimates.
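A small simulation can make these statements concrete. The sketch below draws repeated samples from a skewed (exponential) population with mean 1 and compares the spread of the sample means for two sample sizes. The choice of population, the sample sizes, the number of repetitions and the seed are all assumptions made for illustration.

```python
import random
import statistics

def simulate_sample_means(n, n_samples=2000, seed=0):
    """Means of `n_samples` random samples of size n from a skewed population.

    The population is exponential with mean 1 (strongly right-skewed).
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(n_samples)
    ]

means_small = simulate_sample_means(n=5)
means_large = simulate_sample_means(n=50)

for label, means in (("n = 5", means_small), ("n = 50", means_large)):
    print(f"{label}: mean of sample means = {statistics.fmean(means):.3f}, "
          f"standard error = {statistics.stdev(means):.3f}")
```

Both sets of sample means centre near the population mean of 1, but the standard error is clearly smaller for the larger sample size, in line with σ/√n.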
  • 65.
  • 66.
    Estimation: a process whereby we select a random sample from a population and use a sample statistic to estimate a population parameter.
    Statistical inference by estimation has two general areas:
    - Point estimate
    - Interval estimate
    Point estimate: a sample statistic used to estimate the exact value of a population parameter.
    Most common point estimators:
    - The sample mean estimates the population mean μ: $\hat{\mu} = \bar{y} = \frac{\sum y_i}{n}$
    - The sample standard deviation estimates the population standard deviation σ: $\hat{\sigma} = s = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$
    - The sample proportion estimates the population proportion p: $\hat{p}$
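The three point estimators can be computed directly with Python's standard library. The data below are invented purely for illustration (they are not measurements from the lecture), and the variable names are arbitrary.

```python
from statistics import mean, stdev

# Hypothetical sample of 8 measurements (e.g. body weights in kg)
weights = [2, 4, 4, 4, 5, 5, 7, 9]
# Hypothetical yes/no outcomes coded 1/0 (e.g. animal infected or not)
infected = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

y_bar = mean(weights)                  # point estimate of the population mean
s = stdev(weights)                     # sample standard deviation (divides by n - 1)
p_hat = sum(infected) / len(infected)  # point estimate of the population proportion

print(f"mean = {y_bar}, s = {s:.3f}, p-hat = {p_hat}")
```

Note that `statistics.stdev` uses the n − 1 divisor, matching the formula for s above, whereas `statistics.pstdev` would divide by n.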
  • 67.
    - Interval estimate: an inferential statistical procedure used to estimate population parameters from sample data by building confidence intervals.
    - Confidence interval: a range of values computed from sample data that has a known probability of capturing some population parameter of interest. The interval is built around the computed statistic by adding and subtracting a specific amount.
    - Confidence level: the likelihood, expressed as a percentage or a probability, that a specified interval will contain the population parameter.
    - 95% confidence level: there is a .95 probability that a specified interval DOES contain the population mean. In other words, there are 5 chances out of 100 (or 1 chance out of 20) that the interval DOES NOT contain the population mean.
    - 99% confidence level: there is 1 chance out of 100 that the interval DOES NOT contain the population mean.
  • 68.
    Various Levels of Confidence
    When the population standard deviation is known, use Z table values:
    - For a 95% CI: mean ± 1.96 × s.e. of the mean
    - For a 99% CI: mean ± 2.58 × s.e. of the mean
    When the population standard deviation is not known, use the "critical values of t" table. The tabled value depends on the degrees of freedom; for example, with 30 degrees of freedom:
    - For a 95% CI: mean ± 2.04 × s.e. of the mean
    - For a 99% CI: mean ± 2.75 × s.e. of the mean
  • 69.
    Process for Constructing Confidence Intervals
    1. Compute the sample statistic (e.g. a mean).
    2. Compute the standard error of the mean.
    3. Decide on the level of confidence that is desired (usually 95% or 99%).
    4. Find the tabled value for a 95% or 99% confidence interval.
    5. Multiply the standard error of the mean by the tabled value.
    6. Form the interval by adding the calculated value to, and subtracting it from, the mean.
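The steps above can be sketched for the case where the population standard deviation is known, so that the tabled value comes from the standard normal (Z) distribution. The sample values and the assumed σ = 2 are hypothetical, and `z_confidence_interval` is a name chosen for this sketch; with an unknown σ the tabled value would come from the t distribution instead.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    n = len(sample)
    x_bar = mean(sample)                               # step 1: sample statistic
    se = sigma / sqrt(n)                               # step 2: standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)      # steps 3-4: tabled value (1.96 for 95%)
    margin = z * se                                    # step 5: tabled value x standard error
    return x_bar - margin, x_bar + margin              # step 6: mean -/+ margin

# Hypothetical sample of 10 observations, assumed population sigma = 2
sample = [10, 12, 11, 13, 14, 12, 11, 13, 12, 12]
low, high = z_confidence_interval(sample, sigma=2.0, level=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Raising the confidence level to 99% widens the interval, because the tabled value increases from about 1.96 to about 2.58.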
  • 70.
    Chapter III Tests of hypotheses based on a single sample
  • 71.
    - A hypothesis test is used to determine whether or not a treatment has an effect, while estimation is used to determine how much effect.
    - This complementary nature is demonstrated when estimation is used after a hypothesis test that resulted in rejecting the null hypothesis.
    - In this situation, the hypothesis test has established that a treatment effect exists, and the next logical step is to determine how much effect.
    - Keep in mind that even though estimation and hypothesis testing are both inferential procedures, the two techniques differ in the type of question they address.
    - A hypothesis test addresses the somewhat academic question of whether a treatment effect exists.
    - Estimation, on the other hand, is directed toward the more practical question of how much effect.
  • 72.
    - A hypothesis test is a process that uses sample statistics to test a claim about the value of a population parameter.
    - A verbal statement, or claim, about a population parameter is called a statistical hypothesis.
    - Hypothesis testing is designed to detect significant differences: differences that did not occur by random chance.
    - In the "one sample" case, we compare a random sample (from a large group) to a population.
    - We compare a sample statistic to a population parameter to see if there is a significant difference.
  • 73.
    The Null and Alternative Hypotheses
    1. Null hypothesis (H0)
    - What is tested
    - Has serious outcome if incorrect decision made
    - Always has an equality sign: =, ≤, or ≥
    - Designated H0 (pronounced H-oh)
    - Specified as H0: μ = some numeric value (written with the = sign even if the claim is ≤ or ≥)
    - Example: H0: μ = 3
    - Reads as "the difference is by random chance". H0 always states that there is "no significant difference"; in this case, we mean that there is no significant difference between the population mean and the sample mean.
  • 74.
    2. Alternative hypothesis (H1, also written Ha)
    - Opposite of the null hypothesis
    - Always has an inequality sign: ≠, <, or >
    - Specified as Ha: μ ≠, <, or > some value
    - Example: Ha: μ < 3
    - Reads as "the difference is real". H1 always contradicts H0.
  • 75.
    Types of Errors
    - No matter which hypothesis represents the claim, always begin the hypothesis test assuming that the null hypothesis is true.
    - At the end of the test, one of two decisions will be made:
      1. reject the null hypothesis, or
      2. fail to reject the null hypothesis.
    - A type I error occurs if the null hypothesis is rejected when it is true.
    - A type II error occurs if the null hypothesis is not rejected when it is false.
  • 76.
    Level of Significance
    - In a hypothesis test, the level of significance is your maximum allowable probability of making a type I error. It is denoted by α, the lowercase Greek letter alpha.
    - The probability of making a type II error is denoted by β, the lowercase Greek letter beta.
    - By setting the level of significance at a small value, you are saying that you want the probability of rejecting a true null hypothesis to be small.
    - Commonly used levels of significance: α = 0.10, α = 0.05, α = 0.01
    - Hypothesis tests are based on α.
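To see what "probability of a type I error" means in practice, the following simulation repeatedly tests a null hypothesis that is actually true and counts how often it is rejected. Everything here is an assumed setup for illustration: a normal population with μ = 0 and known σ = 1, samples of size 30, a two-tailed z-test, 2000 repetitions and a fixed seed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(alpha=0.05, n=30, n_trials=2000, seed=0):
    """Share of trials in which a true H0 (mu = 0, sigma = 1 known) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true here
        z = fmean(sample) / (1.0 / sqrt(n))               # (x-bar - 0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1                               # a type I error
    return rejections / n_trials

rate = type_one_error_rate()
print(f"observed type I error rate: {rate:.3f}")
```

The observed rejection rate should land close to the chosen α, since α is exactly the long-run share of true null hypotheses that the test rejects.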
  • 77.
    P-values
    If the null hypothesis is true, the P-value (or probability value) of a hypothesis test is the probability of obtaining a sample statistic with a value as extreme as, or more extreme than, the one determined from the sample data. The P-value of a hypothesis test depends on the nature of the test. There are three types of hypothesis tests: left-tailed, right-tailed, and two-tailed. The type of test depends on the region of the sampling distribution that favors a rejection of H0. This region is indicated by the alternative hypothesis.
  • 78.
    1. If the alternative hypothesis contains the less-than symbol (<), the hypothesis test is a left-tailed test: H0: μ ≥ k, Ha: μ < k. P is the area under the standard normal (z) curve to the left of the test statistic.
    2. If the alternative hypothesis contains the greater-than symbol (>), the hypothesis test is a right-tailed test: H0: μ ≤ k, Ha: μ > k. P is the area under the standard normal (z) curve to the right of the test statistic.
  • 79.
    3. If the alternative hypothesis contains the not-equal-to symbol (≠), the hypothesis test is a two-tailed test: H0: μ = k, Ha: μ ≠ k. In a two-tailed test, each tail has an area of ½P. P is twice the area to the left of a negative test statistic, or twice the area to the right of a positive test statistic.
  • 80.
    Making a Decision
    Decision rule based on the P-value: to use a P-value to draw a conclusion in a hypothesis test, compare the P-value with α.
    1. If P ≤ α, then reject H0.
    2. If P > α, then fail to reject H0.
    Interpreting the decision according to which hypothesis carries the claim:
    Reject H0: if the claim is H0, there is enough evidence to reject the claim; if the claim is Ha, there is enough evidence to support the claim.
    Do not reject H0: if the claim is H0, there is not enough evidence to reject the claim; if the claim is Ha, there is not enough evidence to support the claim.
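    The P-value rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test statistic is a standardized z value, the value z = -2.10 and α = 0.05 are hypothetical, and the helper names (normal_cdf, p_value, decide) are invented for this sketch.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(z, tail):
    """P-value of a z test statistic for a 'left', 'right' or 'two' tailed test."""
    if tail == "left":
        return normal_cdf(z)                        # area to the left
    if tail == "right":
        return 1.0 - normal_cdf(z)                  # area to the right
    if tail == "two":
        return 2.0 * (1.0 - normal_cdf(abs(z)))     # twice the area in one tail
    raise ValueError("tail must be 'left', 'right' or 'two'")

def decide(p, alpha):
    """Decision rule: reject H0 when P <= alpha, otherwise fail to reject."""
    return "reject H0" if p <= alpha else "fail to reject H0"

z = -2.10       # hypothetical standardized test statistic
alpha = 0.05    # chosen level of significance
for tail in ("left", "right", "two"):
    p = p_value(z, tail)
    print(tail, round(p, 4), decide(p, alpha))
```

    The same test statistic can lead to different decisions depending on the tail of the test, which is why the alternative hypothesis has to be fixed before the data are examined.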
  • 81.
    Interpreting a Decision
    Example: You perform a hypothesis test for the following claim. How should you interpret your decision if you reject H0? If you fail to reject H0?
    H0 (claim): A cigarette manufacturer claims that one-eighth of the US adult population smokes cigarettes (p = 0.125).
    If H0 is rejected, you should conclude "there is sufficient evidence to indicate that the manufacturer's claim is false." If you fail to reject H0, you should conclude "there is not sufficient evidence to indicate that the manufacturer's claim is false."
  • 82.
    Steps for Hypothesis Testing
    1. State the claim mathematically and verbally. Identify the null and alternative hypotheses (H0: ? Ha: ?).
    2. Specify the level of significance (α = ?).
    3. Determine the standardized sampling distribution and draw its graph. This sampling distribution is based on the assumption that H0 is true.
    4. Calculate the test statistic and its standardized value. Add it to your sketch.
    Continued.
  • 83.
    Steps for Hypothesis Testing (continued)
    5. Find the P-value.
    6. Use the following decision rule: Is the P-value less than or equal to the level of significance? If yes, reject H0. If no, fail to reject H0.
    7. Write a statement to interpret the decision in the context of the original claim.
    These steps apply to left-tailed, right-tailed, and two-tailed tests.
  • 84.
    Chapter IV Hypothesis Tests Based on Two Samples
  • 85.
    Two Sample Hypothesis Testing
    In a two-sample hypothesis test, two parameters from two populations are compared. For a two-sample hypothesis test:
    1. The null hypothesis H0 is a statistical hypothesis that usually states there is no difference between the parameters of two populations. The null hypothesis always contains the symbol ≤, =, or ≥.
    2. The alternative hypothesis Ha is a statistical hypothesis that is true when H0 is false. The alternative hypothesis always contains the symbol >, ≠, or <.
  • 86.
    Two Sample Hypothesis Testing
    To write the null and alternative hypotheses for a two-sample hypothesis test, translate the claim made about the population parameters from a verbal statement to a mathematical statement. The three possible pairs are:
    H0: μ1 = μ2, Ha: μ1 ≠ μ2
    H0: μ1 ≤ μ2, Ha: μ1 > μ2
    H0: μ1 ≥ μ2, Ha: μ1 < μ2
    Regardless of which pair of hypotheses is used, μ1 = μ2 is always assumed to be true.
  • 87.
    Two Sample z-Test Threeconditions are necessary to perform a z-test for the difference between two population means μ1 and μ2. 1.The samples must be randomly selected. 2.The samples must be independent. Two samples are independent if the sample selected from one population is not related to the sample selected from the second population. 3.Each sample size must be at least 30, or, if not, each population must have a normal distribution with a known standard deviation.
  • 88.
    Two Sample t-Test Ifsamples of size less than 30 are taken from normally-distributed populations, a t-test may be used to test the difference between the population means μ1 and μ2. Three conditions are necessary to use a t-test for small independent samples. 1.The samples must be randomly selected. 2.The samples must be independent. Two samples are independent if the sample selected from one population is not related to the sample selected from the second population. 3.Each population must have a normal distribution, and samples of size less than 30 .
  • 89.
  • 90.
    ANOVA is a procedure that can be used to analyze the results of both simple and complex experiments. It reveals whether the observed differences among treatments are real or occurred by chance. It partitions the total variation into different components and tests their significance.
  • 91.
    Assumptions of ANOVA Most of the analysis are based on linear models (regression and ANOVA models) 1)Normality  Errors or residuals must be normally distributed. This is ensured through proper randomization and blocking  Another way of assessing normality is to use probability plots (pplots) of the residuals-this examines frequency distribution of your data, and compare the shape of that distribution to that expected normal distribution  For normal, the pplot will be a straight line; various kinds of skewness..etc  If normality is violated the F-test is invalid  Can be checked using box plots,..etc
  • 92.
    2)Homogeneity of variances Thevariance in the response variable is the same at each level, or combination of levels, of the predictor variables. check the normal distribution, but unequal variances may occur if sample sizes are small. 3) Linearity Parametric correlation and linear regression analyses are based on straight-line relationships between variables. This assumption is checked by examining a scatter plot of the two variables or more variables 4)Independence of errors This assumption implies all the observations should be independent of each other Any treatment should be assigned randomly to any of the experimental units through proper randomization to avoid dependency . This assumption is not met when the same experimental unit is affected by under different treatments If violated mean square of error will be inflated and type II will occur
  • 93.
    Analysis of variance (ANOVA)
    Commonly used to determine differences between several groups or treatments. Partitions the total variation into different components. Is used when we have two or more treatment levels. The simplest ANOVA is the single-factor, or one-way, ANOVA.
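    The partitioning of total variation in a one-way ANOVA can be sketched as follows. The three treatments and their observations are hypothetical, and the variable names are chosen for this illustration only; the sketch stops at the F ratio and does not look up its P-value.

```python
# Hypothetical responses for three treatments (4 experimental units each).
groups = {
    "A": [20, 22, 19, 21],
    "B": [25, 27, 26, 24],
    "C": [30, 29, 31, 28],
}

all_values = [v for values in groups.values() for v in values]
n_total = len(all_values)          # total number of observations
k = len(groups)                    # number of treatments
grand_mean = sum(all_values) / n_total

# Total variation around the grand mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Variation between treatment means (treatment sum of squares).
ss_between = sum(
    len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
    for vals in groups.values()
)

# Variation within treatments (error sum of squares).
ss_within = sum(
    (v - sum(vals) / len(vals)) ** 2
    for vals in groups.values()
    for v in vals
)

ms_between = ss_between / (k - 1)        # treatment mean square
ms_within = ss_within / (n_total - k)    # error mean square
f_stat = ms_between / ms_within

print(round(ss_total, 3), round(ss_between, 3), round(ss_within, 3))
print(round(f_stat, 2))
```

    The treatment and error sums of squares add up to the total sum of squares, and a large F ratio indicates that the differences among treatment means are large relative to the variation within treatments.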
  • 94.
    Chapter VI Correlation and Regression Analysis
  • 95.
    Agricultural research focuses primarily on the behaviour of biological organisms in a specified environment. The associations usually evaluated in livestock research are therefore of three kinds: association between response variables, association between response and treatments, and association between response and environment. Both correlation and regression have a number of uses: to determine the association among the variables that could affect the response of the experimental units to a treatment; to understand how different variables are associated with animal performance in an experiment; and to predict animal performance from the associated variables, so that the amount of treatment applied to the experimental units can be adjusted.
  • 96.
    Correlation analysis
    Correlation is the discovery and measurement of the magnitude and direction of the relationship between two or more variables. It is a measure of the degree to which variables vary together, or of the intensity of the association between different variables in an experiment. Suppose you have two continuous variables X and Y: if a change in one variable is accompanied by a change in the other, X is said to be correlated with Y, and vice versa. Correlation does not require the variables to be designated as dependent or independent; either variable can play either role. Correlation procedures can be classified according to the number of variables involved and the form of the functional relationship between the variables in the experiment.
  • 97.
    The procedure is termed simple if only two variables are involved, and multiple otherwise. The procedure is termed linear if the form of the underlying relationship is linear, and non-linear otherwise. Thus, correlation analysis can be classified into four types:
    1. Simple linear correlation
    2. Multiple linear correlation
    3. Simple non-linear correlation
    4. Multiple non-linear correlation
    Correlation is usually expressed by an index called the coefficient of correlation, symbolized by "r" for a sample and "ρ" for a population. The coefficient of correlation ranges between –1 and 1 inclusive (−1 ≤ r ≤ 1). It tells us only the magnitude and direction of the association between the variables in an experiment.
  • 98.
    For r > 0 the two variables have a positive correlation, and for r < 0 they have a negative correlation. A positive correlation means that the two variables change in the same direction: as values of one variable increase, increasing values of the other are observed, and as values of one decrease, decreasing values of the other are observed. A negative correlation means that as values of one variable increase, decreasing values of the other are observed, or vice versa. The value r = 1 or r = –1 indicates a perfect linear relationship, and r = 0 means that there is no linear association. The coefficient of correlation is unit free. It is not affected by a change of origin, of scale, or of both. The coefficient of correlation (r) is used under certain assumptions: the variables are continuous random variables and are normally distributed, the relationship between the variables is linear, and the pairs of observations are independent of each other.
  • 99.
    The strength of the relationship is expressed by the coefficient of determination (r²), which shows the proportion of the variation in one variable that is accounted for by the other variable. A positive correlation can be used as a selection criterion in animal breeding, and it helps to decide the extent to which the variables are used in an experiment.
    Example: In a study of Horro Guduru Wollega goats, the following heart girth and body weight data were taken. Calculate the linear correlation coefficient and the coefficient of determination.
    Heart girth (X)   Body weight (Y)   XiYi
    70                25                1750
    67                22                1474
    73                32                2336
    73                32                2336
    65                20                1300
    74                31                2294
    73                31                2263
    68                27                1836
    Total: ∑Xi = 563, ∑Yi = 220, ∑XiYi = 15,589
    Mean: X̄ = 70.4, Ȳ = 27.5
  • 100.
    Solution (n = 8)
    SSX = ∑Xi² – (∑Xi)²/n = [(70)² + (67)² + ... + (68)²] – (563)²/8 = 39701 – 39621.13 = 79.87
    SSY = ∑Yi² – (∑Yi)²/n = [(25)² + (22)² + ... + (27)²] – (220)²/8 = 6208 – 6050 = 158
    Cov XY (sum of cross-products) = ∑XiYi – (∑Xi ∑Yi)/n = [1750 + 1474 + ... + 1836] – (563 × 220)/8 = 15589 – 15482.5 = 106.5
    rXY (correlation coefficient) = Cov XY / √(SSX × SSY) = 106.5 / √(79.87 × 158) = 106.5 / 112.34 = 0.948 ≈ 0.95
    r² (coefficient of determination) = (0.948)² ≈ 0.899, that is, about 89.9%. This shows that about 89.9% of the variation in body weight (Y) is accounted for by heart girth (X).
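    The hand calculation can be checked with a short Python sketch that uses the eight goat records from the example. The variable names (ssx, ssy, spxy) are chosen for this illustration; spxy is the quantity labelled Cov XY in the solution.

```python
import math

# Heart girth (X) and body weight (Y) of the eight goats in the example.
heart_girth = [70, 67, 73, 73, 65, 74, 73, 68]
body_weight = [25, 22, 32, 32, 20, 31, 31, 27]
n = len(heart_girth)

sum_x = sum(heart_girth)
sum_y = sum(body_weight)

# Corrected sums of squares and sum of cross-products.
ssx = sum(x * x for x in heart_girth) - sum_x ** 2 / n
ssy = sum(y * y for y in body_weight) - sum_y ** 2 / n
spxy = sum(x * y for x, y in zip(heart_girth, body_weight)) - sum_x * sum_y / n

r = spxy / math.sqrt(ssx * ssy)   # correlation coefficient
r_squared = r ** 2                # coefficient of determination

print(round(ssx, 3), round(ssy, 3), round(spxy, 3))
print(round(r, 3), round(r_squared, 3))
```

    Carrying full precision through the calculation avoids the small discrepancies that appear when intermediate results are rounded by hand.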
  • 101.
    Regression analysis
    It is often of interest to determine how changes in the values of some variables influence the values of other variables: for example, how a change in air temperature affects feed intake, or how increasing the protein level in a feed affects daily gain. In both examples the relationship between the variables can be described with a function: a function of temperature to describe feed intake, or a function of protein level to describe daily gain. A function that explains such a relationship is called a regression function, and the analysis of such problems and the estimation of the regression function is called regression analysis. Regression includes a set of procedures designed to study statistical relationships among variables in which one variable is defined as dependent on others, which are defined as independent variables.
  • 102.
    By using regression, the cause-consequence relationship between the independent and dependent variables can be determined. In the examples above, feed intake and daily gain are dependent variables, and temperature and protein level are independent variables. Regression analysis describes the effect of one or more variables (designated as independent) on a single variable (designated as the dependent variable) by expressing the latter as a function of the former.
  • 103.
    For this analysis, it is important to distinguish clearly between the dependent and independent variables. Regression analysis tells us the cause and effect, or the magnitude, of the relationship between variables in an experiment. Regression procedures can be classified according to the number of variables involved and the form of the functional relationship between the variables in the experiment. The procedure is termed simple if only two variables are involved, and multiple otherwise. The procedure is termed linear if the form of the underlying relationship is linear, and non-linear otherwise.
  • 104.
    Thus, regression analysis can be classified into four types:
    1. Simple linear regression
    2. Multiple linear regression
    3. Simple non-linear regression
    4. Multiple non-linear regression
    The functional form of the linear relationship between a dependent variable Y and an independent variable X is represented by the equation Y = α + βX, where α is the intercept of the line on the Y axis and β, the linear regression coefficient, is the slope of the line, that is, the amount of change in Y for each unit change in X. When there is more than one independent variable, say k independent variables (X1, X2, X3, ..., Xk), the simple linear form Y = α + βX can be extended to the multiple linear form Y = α + β1X1 + β2X2 + ... + βkXk, where α is the intercept (the value of Y when all X's are zero) and βi (i = 1, ..., k), the regression coefficient associated with independent variable Xi, represents the amount of change in Y for each unit change in Xi.
  • 105.
    The two main applications of regression analysis are:
    1. Estimation of a function of dependency between variables.
    2. Prediction of future measurements or means of the dependent variable using new measurements of the independent variable(s).
    Example: In a study of Horro Guduru Wollega goats, the following heart girth and body weight data were taken.
    Heart girth (X)   Body weight (Y)   XiYi
    70                25                1750
    67                22                1474
    73                32                2336
    73                32                2336
    65                20                1300
    74                31                2294
    73                31                2263
    68                27                1836
    Total: ∑Xi = 563, ∑Yi = 220, ∑XiYi = 15,589
    Mean: X̄ = 70.4, Ȳ = 27.5
  • 106.
    Solution (n = 8)
    SSX = ∑Xi² – (∑Xi)²/n = [(70)² + (67)² + ... + (68)²] – (563)²/8 = 39701 – 39621.13 = 79.87
    SSY = ∑Yi² – (∑Yi)²/n = [(25)² + (22)² + ... + (27)²] – (220)²/8 = 6208 – 6050 = 158
    Cov XY (sum of cross-products) = ∑XiYi – (∑Xi ∑Yi)/n = [1750 + 1474 + ... + 1836] – (563 × 220)/8 = 15589 – 15482.5 = 106.5
    b = Cov XY / SSX = 106.5 / 79.87 = 1.33
    Y = a + bX
    a = Ȳ – bX̄ = 27.5 – (1.3333 × 70.375) = 27.5 – 93.83 = –66.33
    Y = –66.33 + 1.33X
    Body weight = –66.33 + 1.33 (heart girth)
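    The least-squares slope and intercept for the goat data can be reproduced with the sketch below. The function name predict_weight and the heart girth of 70 used for the prediction are illustrative assumptions; the data are those of the example.

```python
# Heart girth (X) and body weight (Y) of the eight goats in the example.
heart_girth = [70, 67, 73, 73, 65, 74, 73, 68]
body_weight = [25, 22, 32, 32, 20, 31, 31, 27]
n = len(heart_girth)

mean_x = sum(heart_girth) / n
mean_y = sum(body_weight) / n

# Corrected sum of squares of X and sum of cross-products of X and Y.
ssx = sum((x - mean_x) ** 2 for x in heart_girth)
spxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heart_girth, body_weight))

slope = spxy / ssx                     # b: change in Y per unit change in X
intercept = mean_y - slope * mean_x    # a: value of Y where X is zero

def predict_weight(girth):
    """Predicted body weight for a given heart girth (same units as the data)."""
    return intercept + slope * girth

print(round(slope, 4), round(intercept, 4))
print(round(predict_weight(70), 2))
```

    Predictions are only trustworthy within the range of heart girths actually observed (65 to 74 here); the negative intercept has no biological meaning on its own.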
  • 107.
    CHAPTER 7 SAMPLING TECHNIQUES
    Many professions (business, government, engineering, science, social research, agriculture, etc.) seek the broadest possible factual basis for decision-making. In the absence of data on the subject, a decision taken is just like leaping into the dark. Sampling is a procedure in which a fraction of the data is taken from a large set of data, and the inference drawn from the sample is extended to the whole group. The initial task of the surveyor (a person or an establishment in charge of collecting and recording data) or researcher is to formulate a rational justification for the use of sampling in the research. If sampling is found appropriate for the research, the researcher then:
    (1) Identifies the target population as precisely as possible, and in a way that makes sense in terms of the purpose of the study.
    (2) Puts together a list of the target population from which the sample will be selected. This list is termed a frame (more appropriately, a list frame) by many statisticians.
  • 108.
    (3) Decides on a sampling technique and selects the sample; and
    (4) Makes an inference about the population.
    All four steps are interwoven and cannot be considered in isolation from one another. Simple random sampling, systematic sampling, and stratified sampling fall into the category of simple sampling techniques. Complex sampling techniques are used only for large experimental data sets, when efficiency is required, and when precise estimates are needed about relatively small groups within large populations.
    Characteristics of Good Samples: representative, accessible, low cost.
    SAMPLING TERMINOLOGY
    A population is a group of experimental data, persons, etc. A population is built up of elementary units, which cannot be further decomposed. A group of elementary units is called a cluster. The Population Total is the sum of all the elements in the sample frame. The Population Mean is the average of all elements in a sample frame or population. The fraction of the population or data selected in a sample is called the Sampling Fraction. The reciprocal of the sampling fraction is called the Raising Factor.
  • 109.
    A sample in which every unit has the same probability of selection is called a random sample. If no repetitions are allowed, it is termed a simple random sample selected without replacement. If repetitions are permitted, the sample is selected with replacement.
    PROBABILITY AND NON-PROBABILITY SAMPLING
    Probability sampling is a sampling process that utilizes some form of random selection. In probability sampling, each unit has a known, non-zero probability of being selected in the sample. Such samples are usually selected with the help of random numbers. With probability sampling, a measure of sampling variation can be obtained objectively from the sample itself. Non-probability sampling, or judgment sampling, depends on subjective judgment.
  • 110.
    The non-probability method of sampling is a process in which probabilities cannot be assigned to the units objectively, and hence it becomes difficult to determine the reliability of the sample results in terms of probability. Examples of non-probability sampling used extensively in the 1920s and 1930s are the judgment sample, the quota sample, and the mail questionnaire. In non-probability sampling, the surveyor often selects a sample according to convenience or general impression. Non-probability sampling is well suited to exploratory research intended to generate new ideas that will be systematically tested later. However, if the goal is to learn about a large population, it is imperative to avoid judgment or other non-probability samples in survey research. In contrast to probability sampling techniques, there is no way of knowing the accuracy of a non-probability sample estimate.
  • 111.
    SAMPLING ERRORS
    Sampling errors occur as a result of calculating the estimate (estimated mean, total, proportion, etc.) from a sample rather than from the entire population, because the estimate obtained from the sample may not be exactly equal to the true value in the population. For example, if a sample of blocks is used to estimate the total number of persons in a city, and the blocks in the sample are larger than average, then the sample will overstate the true population of the city. When results from a sample survey are reported, they are often stated in the form "plus or minus" so many of the units being used. This "plus or minus" reflects sampling error. Salant and Dillman explain that statistics based on samples drawn from the same population always vary from each other (and from the true population value) simply because of chance. This variation is sampling error, and the measure used to estimate it is the standard error.
    se(p) = √[(p q)/n]
    where se(p) is the standard error of a proportion, p and q are the proportions of the sample that do (p) and do not (q = 1 – p) have a particular characteristic, and n is the number of units in the sample.
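    As a brief numerical illustration of the standard error of a proportion, the sketch below uses hypothetical figures (40 of 100 sampled units have the characteristic). The function name and the "plus or minus two standard errors" margin are assumptions made for this example.

```python
import math

def standard_error_of_proportion(p, n):
    """se(p) = sqrt(p * q / n), with q = 1 - p."""
    q = 1.0 - p
    return math.sqrt(p * q / n)

p = 40 / 100     # hypothetical sample proportion with the characteristic
n = 100          # hypothetical number of units in the sample

se = standard_error_of_proportion(p, n)
margin = 2 * se  # rough "plus or minus" figure, about two standard errors

print(round(se, 4), round(margin, 4))
# Quadrupling the sample size halves the standard error.
print(round(standard_error_of_proportion(p, 4 * n), 4))
```

    Because n appears under the square root, halving the sampling error requires roughly four times as many units.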
  • 112.
    Standard errors are usually used to quantify the precision of the estimates. Sampling distribution theory indicates that about 68 percent of the estimates lie within one standard error of the mean, about 95 percent lie within two standard errors, and nearly all (about 99.7 percent) lie within three standard errors. Sampling errors can be minimized by proper selection of samples. Salant and Dillman state that three factors affect sampling error with respect to the design of samples: the sampling procedure, the variation within the sample with respect to the variate of interest, and the size of the sample. A larger sample results in a smaller sampling error.
    NON-SAMPLING ERRORS
    The accuracy of an estimate is also affected by errors arising from causes such as incomplete coverage and faulty procedures of estimation; together with observational errors, these make up what are termed non-sampling errors. The aim of a survey is always to obtain information on the true population value, and the idea is to get as close as possible to it within the resources available for the survey. The discrepancy between the survey value and the corresponding true value is called the observational error or response error.
  • 113.
    Non-sampling errors occur as a result of improper records on the variate of interest, careless reporting of the data, or deliberate modification of the data by the data collectors and recorders to suit their interests. Non-response error occurs when a significant number of people in a survey sample are absent, do not respond to the questionnaire, or differ from those who do respond in a way that is important to the study.
    BIAS
    Although judgment sampling is quicker than probability sampling, it is prone to systematic errors. For example, if 20 books are to be selected from a total of 200 to estimate the average number of pages in a book, a surveyor might suggest picking out those books which appear to be of average size. The difficulty with such a procedure is that, consciously or unconsciously, the sampler will tend to make errors of judgment in the same direction, for example by selecting mostly books that are bigger than average. Such systematic errors lead to what are called biases.
  • 114.
    BASIC PROBABILISTIC SAMPLING TECHNIQUES
    SIMPLE RANDOM SAMPLING
    Sample surveys deal with samples drawn from populations that contain a finite number, N, of units. If these units can all be distinguished from one another, the number of distinct samples of size n that can be drawn from N units is given by the combinatorial formula C(N, n) = N! / [n! (N – n)!].
    Objective: To select n units out of N such that each of the possible combinations has an equal chance of being selected, i.e., each unit in the population has the same probability of being selected in the sample.
    Procedure: Use a table of random numbers, a computer random number generator, or a mechanical device to select the sample.
    Example: Suppose there are N = 850 students in a school from which a sample of n = 10 students is to be taken. The students are numbered from 1 to 850. Since the population size runs into three digits, we use random numbers that contain three digits. All numbers exceeding 850 are ignored because they do not correspond to any serial number in the population. If the same number occurs again, the repetition is ignored. Following these rules, the following simple random sample of 10 students is obtained when columns 31 and 32 of the random numbers given in Appendix 1 are used: 251 546 214 495 074 800 407 502 513 628
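    With a computer random number generator in place of the random number table, the school example can be sketched as below. The seed value is arbitrary and is fixed only so that the run is repeatable; the serial numbers drawn will differ from those obtained from the table in the example.

```python
import math
import random

N = 850   # students in the school, numbered 1 to N
n = 10    # required sample size

# Number of distinct samples of size n that can be drawn from N units.
print(math.comb(N, n))

rng = random.Random(2024)  # arbitrary seed, for a repeatable illustration

# Sampling WITHOUT replacement: random.sample never repeats a serial number,
# which mirrors "the repetition is ignored" in the table procedure.
sample = sorted(rng.sample(range(1, N + 1), n))
print(sample)

# Sampling WITH replacement: the same serial number may appear more than once.
sample_with_replacement = [rng.randint(1, N) for _ in range(n)]
print(sample_with_replacement)
```

    Drawing from range(1, N + 1) also makes the "ignore numbers above 850" rule unnecessary, since only valid serial numbers can be generated.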
  • 115.
    Remark: If repetitions are included, the procedure is termed selecting a sample with replacement. In the present example the sample is selected without replacement.
    SYSTEMATIC SAMPLING
    Systematic sampling is a little different from simple random sampling. Suppose that the N units of the population are numbered 1 to N in some order. To select a sample of n units, we take a unit at random from the first k units and every kth unit thereafter.
    Procedure:
    1. Number the units in the population from 1 to N.
    2. Decide on the sample size n that is required.
    3. Select an interval size k = N/n.
    4. Randomly select an integer between 1 and k.
    5. Finally, take every kth unit.
  • 116.
    Let's assume that we have a population that has only N = 100 people in it and that you want to take a sample of n = 20. To use systematic sampling, the population must be listed in a random order. The sampling fraction would be n/N = 20/100 = 20%. In this case, the interval size, k, is equal to N/n = 100/20 = 5. Now, select a random integer from 1 to 5. In our example, imagine that you chose 4. To select the sample, start with the 4th unit in the list and take every kth unit (every 5th, because k = 5). You would be sampling units 4, 9, 14, 19, and so on up to 99, and you would wind up with 20 units in your sample. For systematic sampling to work, it is essential that the units in the population be randomly ordered, at least with respect to the characteristics you are measuring. Systematic sampling is fairly easy to do and is widely used for its convenience and time efficiency. In many surveys, it is found to provide more precise estimates than simple random sampling. This happens when there is a trend present in the list with respect to the characteristic of interest.
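    The procedure can be sketched in Python as follows. The function name systematic_sample is invented for this illustration, and the sketch assumes that N is an exact multiple of n so that the interval k is a whole number.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Return the serial numbers (1..N) chosen by systematic sampling.

    Assumes N is an exact multiple of n, so the interval k = N / n is an integer.
    """
    k = N // n                      # sampling interval
    if start is None:
        start = rng.randint(1, k)   # random start among the first k units
    return [start + i * k for i in range(n)]

# The worked example: N = 100, n = 20, so k = 5, with random start 4.
sample = systematic_sample(100, 20, start=4)
print(sample)

# A run with a randomly chosen start between 1 and k.
print(systematic_sample(100, 20, rng=random.Random(1)))
```

    Only the starting point is random; once it is fixed, the rest of the sample is completely determined, which is why the ordering of the list matters so much.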
  • 117.
    Systematic sampling is at its worst when there is periodicity in the sampled data and the sampling interval falls in line with it. When this happens, most of the units in the sample will be either too high or too low, which makes the estimate very variable.
    STRATIFIED SAMPLING
    Stratified sampling involves dividing the population into homogeneous, non-overlapping groups (i.e., strata) and drawing a simple random sample from each stratum. On the basis of information available from a frame, units are allocated to strata by placing within the same stratum those units which are more or less similar with respect to the characteristics being measured. If this can be reasonably achieved, the strata will be homogeneous, i.e., the unit-to-unit variability within a stratum will be small. Surveyors use various sample allocation techniques to distribute the sample among the strata.
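    A stratified sample can be sketched as below for one common allocation rule, proportional allocation, in which each stratum receives a share of the sample equal to its share of the population. The stratum names, their sizes, and the total sample size of 50 are hypothetical; with other figures the rounded allocations may not add up exactly to n and would need adjusting.

```python
import random

# Hypothetical frame: serial numbers of the units in each stratum.
strata = {
    "highland": list(range(1, 501)),     # 500 units
    "midland": list(range(501, 801)),    # 300 units
    "lowland": list(range(801, 1001)),   # 200 units
}
n = 50                                   # total sample size
N = sum(len(units) for units in strata.values())

# Proportional allocation: stratum sample size proportional to stratum size.
allocation = {name: round(n * len(units) / N) for name, units in strata.items()}
print(allocation)

# Simple random sample (without replacement) inside each stratum.
rng = random.Random(7)                   # arbitrary seed for repeatability
sample = {name: rng.sample(units, allocation[name]) for name, units in strata.items()}
for name, units in sample.items():
    print(name, len(units))
```

    Under equal allocation the dictionary comprehension for allocation would simply assign the same number of units to every stratum, whatever its size.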
  • 118.
    In proportional allocation, the sample size in a stratum is made proportional to the number of units in the stratum. In equal allocation, the same number of units is taken from each stratum irrespective of the size of the stratum.
    CLUSTER SAMPLING
    The smallest units into which a population can be divided are called the elements of the population, and groups of elements are called clusters. The problem with random sampling methods when the population is distributed across a wide geographic region is that a lot of ground has to be covered in order to reach each of the units sampled. This travelling to collect samples is an expensive affair. But without taking samples from across the whole geographic population, it may be difficult to conclude anything definite about the population. The difficulty is to determine the best cluster size for a specified survey cost. It can be resolved if the cost of the survey and the variance of the estimate can be expressed as functions of the size of the cluster.
  • 119.
    Cluster sampling includes:
    1. Divide the population into clusters.
    2. Randomly sample clusters.
    3. Measure all units within the sampled clusters.
    Cluster sampling is ordinarily conducted in order to reduce costs. The variance of the estimate of the mean in simple random sampling of clusters depends on the sample size, the population variance, and the correlation of the variate of interest between units within the same cluster. If the units within a cluster are more similar than units belonging to different clusters, the estimator is subject to a larger variance; thus the smaller the intra-cluster correlation, the better.
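    The three steps can be sketched as follows. Everything in the example is hypothetical: twelve villages act as clusters, each with five households, and the recorded value stands for any variate of interest measured on every household of a selected village.

```python
import random

rng = random.Random(11)   # arbitrary seed for a repeatable illustration

# Step 1: the population divided into clusters (hypothetical villages, each
# holding the variate of interest for its five households).
clusters = {
    f"village_{i}": [rng.randint(0, 30) for _ in range(5)]
    for i in range(1, 13)
}

# Step 2: randomly sample clusters (here 3 of the 12 villages).
chosen = rng.sample(sorted(clusters), 3)

# Step 3: measure ALL units within the sampled clusters.
measurements = [value for name in chosen for value in clusters[name]]
mean_estimate = sum(measurements) / len(measurements)

print(chosen)
print(len(measurements), round(mean_estimate, 2))
```

    Field work is confined to three villages instead of being spread over all twelve, which is the cost saving that motivates cluster sampling.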
  • 120.
    MULTISTAGE SAMPLING
    Multistage sampling involves combining various probability sampling techniques in the most efficient and effective manner possible. The process of estimation is carried out stage by stage, using the most appropriate method of estimation at each stage. Quite often, auxiliary information is used to improve the precision of an estimate. In the absence of auxiliary information, however, it may be advantageous to conduct the enquiry in two phases. In the first phase, auxiliary information is collected on a fairly large sample. Then a sub-sample is taken, and information is collected on the variate of interest. The two samples are then used in the best possible manner to produce an estimate for the variate of interest. The procedure of first selecting clusters and then choosing a specified number of elements from each selected cluster is known as sub-sampling. It is also known as two-stage sampling or double sampling. The clusters, which form the units of sampling at the first stage, are called first-stage units, and the elements or groups of elements within clusters, which form the units of sampling at the second stage, are called sub-units or second-stage units. The procedure can easily be generalized to three or more stages and is hence known as multistage sampling. Double sampling can be used to reduce the response bias in survey results.
  • 121.
    NON-PROBABILISTIC SAMPLING
    ACCIDENTAL, HAPHAZARD OR CONVENIENCE SAMPLING
    One of the most common methods of sampling goes under the various titles listed here. I would include in this category the traditional "man on the street" (of course, now it's probably the "person on the street") interviews conducted frequently by television news programs to get a quick (although non-representative) reading of public opinion. I would also argue that the typical use of college students in much psychological research is primarily a matter of convenience. (You don't really believe that psychologists use college students because they believe they're representative of the population at large, do you?) In clinical practice, we might use clients who are available to us as our sample. In many research contexts, we sample simply by asking for volunteers. Clearly, the problem with all of these types of samples is that we have no evidence that they are representative of the populations we're interested in generalizing to, and in many cases we would clearly suspect that they are not.
  • 122.
    PURPOSIVE SAMPLING  In purposive sampling, sampling is done with a purpose in mind. We usually would have one or more specific predefined groups we are seeking. For instance, have you ever run into people in a mall or on the street who are carrying a clipboard and who are stopping various people and asking if they could interview them? Most likely they are conducting a purposive sample (and most likely they are engaged in market research). They might be looking for Caucasian females between 30 and 40 years old. They size up the people passing by and stop anyone who looks to be in that category to ask if they will participate. One of the first things they're likely to do is verify that the respondent does in fact meet the criteria for being in the sample.  Purposive sampling can be very useful for situations where you need to reach a targeted sample quickly and where sampling for proportionality is not the primary concern. With a purposive sample, you are likely to get the opinions of your target population, but you are also likely to overweight subgroups in your population that are more readily accessible.  All of the methods that follow can be considered sub-categories of purposive sampling. We might sample for specific groups or types of people, as in modal instance, expert, or quota sampling. We might sample for diversity, as in heterogeneity sampling. Or, we might capitalize on informal social networks to identify specific respondents who are hard to locate otherwise, as in snowball sampling. In all of these methods we know what we want -- we are sampling with a purpose.
  • 123.
    1. MODAL INSTANCE SAMPLING  In statistics, the mode is the most frequently occurring value in a distribution. In sampling, when we do a modal instance sample, we are sampling the most frequent case, or the "typical" case. In a lot of informal public opinion polls, for instance, they interview a "typical" voter. There are a number of problems with this sampling approach. First, how do we know what the "typical" or "modal" case is? We could say that the modal voter is a person who is of average age, educational level, and income in the population. But it's not clear that using the averages of these is the fairest (consider the skewed distribution of income, for instance). And how do you know that those three variables -- age, education, income -- are the only or even the most relevant ones for classifying the typical voter? What if religion or ethnicity is an important discriminator? Clearly, modal instance sampling is only sensible for informal sampling contexts.  2. EXPERT SAMPLING  Expert sampling involves assembling a sample of persons with known or demonstrable experience and expertise in some area. Often, we convene such a sample under the auspices of a "panel of experts." There are actually two reasons you might do expert sampling. First, it may be the best way to elicit the views of persons who have specific expertise. In this case, expert sampling is essentially just a specific sub-case of purposive sampling. The other reason you might use expert sampling is to provide evidence for the validity of another sampling approach you've chosen.
  • 124.
    3. QUOTA SAMPLING  In quota sampling, you select people non-randomly according to some fixed quota. There are two types of quota sampling: proportional and non-proportional. In proportional quota sampling you want to represent the major characteristics of the population by sampling a proportional amount of each. For instance, if you know the population has 40% women and 60% men, and you want a total sample size of 100, you will continue sampling until you get those percentages and then you will stop. So, if you've already got the 40 women for your sample, but not the 60 men, you will continue to sample men, but even if legitimate women respondents come along, you will not sample them because you have already "met your quota." The problem here (as in much purposive sampling) is that you have to decide the specific characteristics on which you will base the quota. Will it be by gender, age, education, race, religion, etc.?  4. NON-PROPORTIONAL QUOTA SAMPLING  Non-proportional quota sampling is a bit less restrictive. In this method, you specify the minimum number of sampled units you want in each category. Here, you're not concerned with having numbers that match the proportions in the population. Instead, you simply want to have enough to assure that you will be able to talk about even small groups in the population.  This method is the non-probabilistic analogue of stratified random sampling in that it is typically used to assure that smaller groups are adequately represented in your sample.
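The 40 women / 60 men example can be illustrated with a short Python sketch. The function name `proportional_quota_sample` and the stream of passers-by are hypothetical; the stream is deliberately non-random to show that respondents are skipped once their group's quota is met.

```python
def proportional_quota_sample(respondents, quotas, group_of):
    """Accept respondents in arrival order until every group quota is filled."""
    remaining = dict(quotas)
    sample = []
    for person in respondents:
        if not any(remaining.values()):
            break  # every quota has been met, stop sampling
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        # Otherwise the person is skipped: the quota for that group is full.
    return sample

# Hypothetical stream of passers-by: two women for every man.
passers_by = [{"id": i, "sex": "M" if i % 3 == 0 else "F"} for i in range(500)]
sample = proportional_quota_sample(passers_by, {"F": 40, "M": 60}, lambda p: p["sex"])
counts = {
    "F": sum(1 for p in sample if p["sex"] == "F"),
    "M": sum(1 for p in sample if p["sex"] == "M"),
}
print(counts)
```

For the non-proportional variant, the same function could be called with minimum counts per category instead of population proportions.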
  • 125.
    5. HETEROGENEITY SAMPLING  We sample for heterogeneity when we want to include all opinions or views, and we aren't concerned about representing these views proportionately. Another term for this is sampling for diversity. In many brainstorming or nominal group processes (including concept mapping), we would use some form of heterogeneity sampling because our primary interest is in getting a broad spectrum of ideas, not identifying the "average" or "modal instance" ones. In effect, what we would like to be sampling is not people, but ideas. We imagine that there is a universe of all possible ideas relevant to some topic and that we want to sample this population, not the population of people who have the ideas. Clearly, in order to get all of the ideas, and especially the "outlier" or unusual ones, we have to include a broad and diverse range of participants. Heterogeneity sampling is, in this sense, almost the opposite of modal instance sampling.  6. SNOWBALL SAMPLING  In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others they know who also meet the criteria. Although this method would hardly lead to representative samples, there are times when it may be the best method available. Snowball sampling is especially useful when you are trying to reach populations that are inaccessible or hard to find. For instance, if you are studying the homeless, you are not likely to be able to find good lists of homeless people within a specific geographical area. However, if you go to that area and identify one or two, you may find that they know very well who the other homeless people in their vicinity are and how you can find them.
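The referral process in snowball sampling can be pictured as a walk over a network of contacts. The sketch below is a minimal illustration, assuming referrals are already known and stored in a dictionary; the names `snowball_sample` and `referrals` and the single-letter respondents are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from seed respondents and follow their referrals wave by wave."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        # Ask the respondent to recommend others who meet the criteria.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: each respondent names the people they know.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
wave = snowball_sample(["A"], referrals, max_size=5)
print(wave)
```

Because everyone in the sample is reachable from the seeds, the result depends heavily on who the first respondents are, which is why the method rarely gives representative samples.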
  • 126.
    Inference  Two ways to make inference:  (1) Estimation of parameters: point estimation (X̄ or p) and interval estimation.  (2) Hypothesis testing.
  • 127.
  • 128.
    Mean, , is unknown PopulationPoint estimate I am 95% confident that  is between 40 & 60 Mean X = 50 Sample Interval estimate
  • 129.
  • 130.
    Sampling Distribution  [Figure: sampling distribution of the sample statistic (X̄ or p) over repeated samples.]
  • 131.
    Standard Error  Quantitative variable: $SE(\bar{X}) = S / \sqrt{n}$.  Qualitative variable: $SE(p) = \sqrt{p(1-p)/n}$.
  • 132.
    Confidence Interval (mean)  [Figure: sampling distribution on the Z-axis; 95% of samples (area 1 − α) fall between X̄ − 1.96 SE and X̄ + 1.96 SE, with area α/2 in each tail.]
  • 133.
    Confidence Interval (proportion)  [Figure: sampling distribution on the Z-axis; 95% of samples (area 1 − α) fall between p − 1.96 SE and p + 1.96 SE, with area α/2 in each tail.]
  • 134.
    Interpretation of CI  Probabilistic: in repeated sampling, 100(1 − α)% of all intervals around sample means will in the long run include μ.  Practical: we are 100(1 − α)% confident that the single computed CI contains μ.
  • 135.
    Example (Sample size ≥ 30)  An epidemiologist studied the blood glucose level of a random sample of 100 patients. The mean was 170, with an SD of 10.  $SE = 10 / \sqrt{100} = 10 / 10 = 1$.  95% CI: $\mu = \bar{X} \pm Z \cdot SE = 170 \pm 1.96 \times 1$, so $168.04 \le \mu \le 171.96$.
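The blood glucose example can be checked with a few lines of Python. This is a sketch using the normal approximation from the slide; the function name `mean_ci` is hypothetical, and `statistics.NormalDist` supplies the Z value (about 1.96 for 95%).

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for a mean: mean +/- Z * SE."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = sd / math.sqrt(n)                              # standard error of the mean
    return mean - z * se, mean + z * se

low, high = mean_ci(mean=170, sd=10, n=100)
print(f"95% CI: {low:.2f} to {high:.2f}")
```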
  • 136.
    Example (Proportion)  In a survey of 140 asthmatics, 35% had allergy to house dust. Construct the 95% CI for the population proportion.  $SE = \sqrt{p(1-p)/n} = \sqrt{0.35(1-0.35)/140} = 0.04$.  95% CI: $\pi = p \pm Z \cdot SE$, so $0.35 - 1.96 \times 0.04 \le \pi \le 0.35 + 1.96 \times 0.04$, that is $0.27 \le \pi \le 0.43$ (27% to 43%).
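The same calculation for the asthma survey, as a hedged sketch: the function name `proportion_ci` is hypothetical, and the interval is the simple normal-approximation interval used on the slide, not any of the more refined alternatives.

```python
import math
from statistics import NormalDist

def proportion_ci(p, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion: p +/- Z * SE."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p - z * se, p + z * se

low, high = proportion_ci(p=0.35, n=140)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

The slide rounds SE to 0.04 before multiplying; the unrounded value gives the same limits to two decimal places.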
  • 137.
    Hypothesis testing  A statistical method that uses sample data to evaluate a hypothesis about a population parameter. It is intended to help researchers differentiate between real and random patterns in the data.
  • 138.
    What is a Hypothesis?  An assumption about the population parameter. Example: I assume the mean SBP of participants is 120 mmHg.
  • 139.
    Null & Alternative Hypotheses  H0, the null hypothesis, states the assumption to be tested, e.g. SBP of participants = 120 (H0: μ = 120).  H1, the alternative hypothesis, is the opposite of the null hypothesis, e.g. SBP of participants ≠ 120 (H1: μ ≠ 120). It may or may not be accepted, and it is the hypothesis that the researcher believes to be true.
  • 140.
    Level of Significance, α  Defines the unlikely values of the sample statistic if the null hypothesis is true; these values form the rejection region of the sampling distribution.  Typical values are 0.01 and 0.05.  Selected by the researcher at the start.  Provides the critical value(s) of the test.
  • 141.
    Level of Significance, α, and the Rejection Region  [Figure: sampling distribution centered at 0, with the critical value(s) marking off rejection regions of total area α.]
  • 142.
    Result Possibilities  H0: Innocent
    Jury Trial
      Verdict   | Actual situation: Innocent | Actual situation: Guilty
      Innocent  | Correct                    | Error
      Guilty    | Error                      | Correct
    Hypothesis Test
      Decision  | Actual situation: H0 true        | Actual situation: H0 false
      Accept H0 | Correct (1 − α)                  | Type II error (β), false negative
      Reject H0 | Type I error (α), false positive | Correct, power (1 − β)
  • 143.
    Factors Increasing Type II Error (β)  True value of the population parameter: β increases when the difference (d) between the hypothesized parameter and the true value decreases.  Significance level α: β increases when α decreases.  Population standard deviation σ: β increases when σ increases.  Sample size n: β increases when n decreases.
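These four relationships can be seen numerically. The sketch below assumes a two-sided Z test with known σ, for which β = Φ(z − d√n/σ) − Φ(−z − d√n/σ), where z is the critical value for α/2. That formula, the function name `type_ii_error`, and all the numbers are illustrative assumptions, not taken from the slides.

```python
import math
from statistics import NormalDist

def type_ii_error(d, sigma, n, alpha=0.05):
    """Beta for a two-sided Z test when the true mean differs from H0 by d."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n) / sigma  # true difference in standard-error units
    # Probability that the test statistic still lands in the acceptance region.
    return std.cdf(z_crit - shift) - std.cdf(-z_crit - shift)

base = type_ii_error(d=5, sigma=20, n=100)            # hypothetical baseline
smaller_d = type_ii_error(d=2, sigma=20, n=100)       # smaller true difference
smaller_alpha = type_ii_error(d=5, sigma=20, n=100, alpha=0.01)
larger_sigma = type_ii_error(d=5, sigma=40, n=100)
smaller_n = type_ii_error(d=5, sigma=20, n=25)

for label, beta in [("baseline", base), ("smaller d", smaller_d),
                    ("smaller alpha", smaller_alpha),
                    ("larger sigma", larger_sigma), ("smaller n", smaller_n)]:
    print(f"{label:14s} beta = {beta:.3f}  power = {1 - beta:.3f}")
```

Each change raises β above the baseline, and power (1 − β) falls by the same amount.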
  • 144.
    p Value Test  The probability of obtaining a test statistic more extreme (≤ or ≥) than the actual sample value, given that H0 is true.  Called the observed level of significance.  Used to make the rejection decision:  if p value ≥ α, do not reject H0;  if p value < α, reject H0.
  • 145.
    State H0 H0: 120 State H1 H1 :  Choose  = 0.05 Choose n n = 100 Choose Test: Z, t, X2 Test (or p Value) Hypothesis Testing: Steps Test the Assumption that the true mean SBP of participants is 120 mmHg.
  • 146.
    Hypothesis Testing: Steps (cont.)  Compute the test statistic (or compute the p value).  Search for the critical value.  Make the statistical decision rule.  Express the decision.
  • 147.
    One-sample mean test  Assumption: the population is normally distributed.  t test statistic: $t = \dfrac{\text{sample mean} - \text{null value}}{\text{standard error}} = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
  • 148.
    Example Normal Body Temperature  What is normal body temperature? Is it actually 37.6 °C (on average)?  State the null and alternative hypotheses:  H0: μ = 37.6 °C  Ha: μ ≠ 37.6 °C
  • 149.
    Example Normal Body Temp (cont)  Data: random sample of n = 18 normal body temps: 37.2 36.8 38.0 37.6 37.2 36.8 37.4 38.7 37.2 36.4 36.6 37.4 37.0 38.2 37.6 36.1 36.2 37.5
    Summarize data with a test statistic: $t = \dfrac{\text{sample mean} - \text{null value}}{\text{standard error}} = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
      Variable    | n  | Mean  | SD   | SE    | t    | P
      Temperature | 18 | 37.22 | 0.68 | 0.161 | 2.38 | 0.029
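The summary row can be reproduced from the 18 readings with Python's standard library. This is a sketch only; the variable names are hypothetical. Because the standard library has no t distribution, the p-value itself is not computed here.

```python
import math
import statistics

temps = [37.2, 36.8, 38.0, 37.6, 37.2, 36.8, 37.4, 38.7, 37.2,
         36.4, 36.6, 37.4, 37.0, 38.2, 37.6, 36.1, 36.2, 37.5]
mu0 = 37.6  # null value from H0

n = len(temps)
mean = statistics.mean(temps)
sd = statistics.stdev(temps)   # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)         # standard error of the mean
t = (mean - mu0) / se          # one-sample t statistic

print(f"n={n} mean={mean:.2f} SD={sd:.2f} SE={se:.3f} t={t:.2f}")
```

The computed t is negative because the sample mean lies below the null value; the slide reports its magnitude, 2.38.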
  • 150.
    STUDENT'S t DISTRIBUTION TABLE
      Degrees of freedom | Probability (p value)
                         | 0.10  | 0.05   | 0.01
      1                  | 6.314 | 12.706 | 63.657
      5                  | 2.015 | 2.571  | 4.032
      10                 | 1.813 | 2.228  | 3.169
      17                 | 1.740 | 2.110  | 2.898
      20                 | 1.725 | 2.086  | 2.845
      24                 | 1.711 | 2.064  | 2.797
      25                 | 1.708 | 2.060  | 2.787
      ∞                  | 1.645 | 1.960  | 2.576
  • 151.
    Example Normal Body Temp (cont)  Find the p-value.  df = n − 1 = 18 − 1 = 17.  From SPSS: p-value = 0.029.  From the t table: the p-value is between 0.05 and 0.01, because the value t = 2.38 lies between the column entries 2.110 and 2.898 for df = 17, whose p-values are 0.05 and 0.01.  [Figure: t distribution with cut-offs at −2.11 and +2.11; the area to the left of t = −2.11 equals the area to the right of t = +2.11.]
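Reading a p-value range off the table can be written as a small lookup. The sketch below hard-codes only the df = 17 row of the table; the names `T_TABLE_DF17` and `bracket_p_value` are hypothetical.

```python
# Row of the t table for df = 17: two-tailed p value -> critical t.
T_TABLE_DF17 = {0.10: 1.740, 0.05: 2.110, 0.01: 2.898}

def bracket_p_value(t_stat, row):
    """Return (lower, upper) bounds on the p-value using one table row."""
    t_abs = abs(t_stat)
    lower, upper = 0.0, 1.0
    # Walk the columns from the largest p (smallest critical value) downwards.
    for p, critical in sorted(row.items(), reverse=True):
        if t_abs >= critical:
            upper = p   # t exceeds this column, so the p-value is at most p
        else:
            lower = p   # t falls short of this column, so the p-value exceeds p
            break
    return lower, upper

lower, upper = bracket_p_value(2.38, T_TABLE_DF17)
print(f"{lower} < p-value <= {upper}")
```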
  • 152.
    Example Normal Body Temp (cont)  Decide whether or not the result is statistically significant based on the p-value.  Using α = 0.05 as the level of significance criterion, the results are statistically significant because 0.029 is less than 0.05. In other words, we can reject the null hypothesis.  Report the conclusion: we can conclude, based on these data, that the mean temperature in the human population does not equal 37.6 °C.
  • 153.
    One-sample test for proportion  Involves categorical variables.  Fraction or % of population in a category.  Sample proportion: $p = \dfrac{X}{n} = \dfrac{\text{number of successes}}{\text{sample size}}$.  The test is called the Z test: $Z = \dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$, where Z is the computed value and π is the proportion in the population (null hypothesis value).  Critical values: 1.96 at α = 0.05, 2.58 at α = 0.01.
  • 154.
    Example • In a survey of diabetics in a large city, it was found that 100 out of 400 have diabetic foot. Can we conclude that 20 percent of diabetics in the sampled population have diabetic foot? • Test at the α = 0.05 significance level.
  • 155.
    Solution  H0: π = 0.20  H1: π ≠ 0.20  $Z = \dfrac{0.25 - 0.20}{\sqrt{0.20(1-0.20)/400}} = 2.50$  Critical value: ±1.96 (rejection regions of area 0.025 in each tail).  Decision: since 2.50 > 1.96, we have sufficient evidence to reject the H0 value of 20%.  We conclude that in the population of diabetics the proportion who have diabetic foot does not equal 0.20.
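The diabetic foot calculation can be verified with a short sketch. The function name `one_sample_proportion_z` is hypothetical; the two-sided p-value comes from the standard normal distribution and is an addition to what the slide shows.

```python
import math
from statistics import NormalDist

def one_sample_proportion_z(successes, n, pi0):
    """Z test of H0: population proportion equals pi0 (two-sided)."""
    p = successes / n
    se = math.sqrt(pi0 * (1 - pi0) / n)  # standard error under H0
    z = (p - pi0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_sample_proportion_z(successes=100, n=400, pi0=0.20)
reject_h0 = abs(z) > 1.96  # critical value at alpha = 0.05
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```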