Researchmethods2012

Experimental methodology and
statistics
42
5
1
0011 0010 1010 1101 0001 0100 1011
Pedro Aguiar Pinto
papinto@isa.utl.pt
January 2012
Instituto Superior de Agronomia
Universidade Técnica de Lisboa
Portugal

Research
What for?
What are we talking about?
How to?

Bernard de Clairvaux (1090-1153)
• There are five stimuli that push man towards Science:
– There are men that want to know for the simple pleasure of knowing
• It is low curiosity
– There are others that want to know to be known:
• It is vanity
– Others want to possess science in order to sell it and make profit and get honours
• It is a selfish motivation
– But there are some who want to know in order to edify
• and this is charity
– Others to be edified
• and this is wisdom

What is it all about?
It is a matter of knowing

Observation and experiment
• Experimental method
– Experimental “pathway”
– Trial and error
– Logical deduction (deductive method)
– Test

ISO 3591, Sensory analysis - Wine tasting glass

Sensory analysis

Environment / reality
• Setting where the activity takes place
• The environment is a reality
– that is external to the activity, but has a marked effect on it
– that can only be partially modified and in a limited manner

Environmental characterization
• This reality, foreign to the will or action of man, is given; it is there, it is not made by him.
• Environmental characterization deals with knowing reality as such, as it is given.
– It deals with knowing its characteristics (most remarkable details)

Observation
• One gets to know reality (environment) by
observation
• Observation must avoid the observer’s influence as well as the influence of observation tools; otherwise, what is being observed differs from what is given.

Quantitative observation
• The characteristics of reality that matter to
our purpose are known by measurement
(observation) of physical dimensions or
quantities
• Observation error / measurement error

Variability
• Measuring the same characteristic results in different values as a function of:
– observer (observation error)
– measurement tool (instrumental error)
– location (space and time) of the observation
• Environmental variability

Example: Soil characterization
• Soil physical characteristics have a variability
that is mainly spatial
– Visible in soil maps
– Giving meaning to
Precision Agriculture
• Time variability
(%H2O, %OM,…)

Climate characterization I
• Characterization of the aerial environment starts from observations that are, by nature, instantaneous
• Weather characteristics (temperature, atmospheric pressure, radiation,…) are at each instant the focus of observation

Climate characterization II
• The nature of the factors that determine weather (radiation, general atmospheric circulation and rainfall mechanics) introduces a large temporal variability that adds to the spatial variability

Climate characterization III
• Note that,
– soil characterization is attained by a set of observations (in different locations and along the soil profile) that are not repeated in time
– weather characterization demands observation time series that incorporate time variation

Climate
• Results from the aggregation of a series of observations of instantaneous weather measurements
– for example, the daily average temperature is the arithmetic average of the maximum and minimum daily temperatures: 2 instantaneous observations used as estimates of the daily “thermal climate”

Climate
• The concept we call climate results from a further aggregation of climate data already aggregated (averages and arithmetic sums over monthly periods of observation), integrated in indices that allow the differentiation of spatial and geographical units
– Climate classification

Climate
• Each CLIMATE corresponds to a set of climatological normals.
– Instantaneous observations
– Daily sums or averages
– Monthly sums or averages
– Averages of monthly averages or averages of monthly sums for a standard period (30 years)
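
The aggregation chain above (instantaneous readings, daily values, monthly values, normal) can be sketched in a few lines of Python. This is only an illustration: the function names and all temperature values are hypothetical and do not come from the slides.

```python
from statistics import mean

def daily_mean_temperature(t_max, t_min):
    """Daily mean estimated from two instantaneous readings (max and min)."""
    return (t_max + t_min) / 2

def monthly_mean(daily_values):
    """Monthly average of the daily averages."""
    return mean(daily_values)

def climatological_normal(monthly_means_by_year):
    """Average of one month's means over the standard period (e.g. 30 years)."""
    return mean(monthly_means_by_year)

# Hypothetical (t_max, t_min) readings in degrees Celsius for three days
readings = [(14.0, 6.0), (15.0, 7.0), (13.0, 5.0)]
daily = [daily_mean_temperature(t_max, t_min) for t_max, t_min in readings]
month = monthly_mean(daily)

# Hypothetical February means for three years (a real normal would use 30)
normal = climatological_normal([10.0, 11.0, 9.0])

print(daily, month, normal)
```

Each step discards part of the variability of the level below it, which is the point made in the slides on climate normality.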

Annual rainfall variability
[Figure: annual rainfall (mm) at Mora, Portugal, 1955 to 1985, with the 30 year average of 617 mm marked]
30 years is a long time, but not long enough to find a pattern

Climate normality
• The successive data aggregation that leads to what one may call a “normal climate” is useful for several purposes, but has the cost of cancelling the natural variability of instantaneous observations
• As a consequence, “the normal climate” is not data anymore, but rather some sort of data manipulation.
– Therefore, it is only by coincidence that one can run into a “normal year”

Uncertainty
• Variability, and most evidently and perceptibly time variability, illustrates the question of uncertainty in the knowledge of reality
• In any case, it is with this “uncertain” knowledge that we set out to make decisions
• Indeed, one only needs to make a decision when one is not sure about the outcome

Uncertainty
• In many cases it is useful to know the degree of uncertainty linked to a decision; and sometimes one can use predictive models to make predictions/forecasts
• As opposed to observations, predictions are not data: they did not happen…
• Assumption: prediction supposes that the pattern that was verified in the past holds in the future (or changes in a given hypothetical manner).

Induction and deduction
• Data, observation, data analysis, descriptive statistics, polls, samples: deductive method
• Experiments, results, observations, conclusions, generalization, statistical inference: inductive method

Deductive reasoning
• Given some general principle, what happens in a specific set of conditions:
– Given the formula for the area of a circle, what is the area of a circle whose radius is 5?
– Given a key and description of herbaceous species in Southern France, to what species does a certain plant belong?
– Given a coin whose probability of coming up heads when tossed is ½, what will happen when the coin is tossed 10 times?
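
Two of the deductive questions above can be answered directly from the general principle. The sketch below, with hypothetical function names, applies the area formula to a radius of 5 and the binomial law to 10 tosses of a fair coin.

```python
import math

def circle_area(radius):
    """General principle: area = pi * r^2."""
    return math.pi * radius ** 2

def heads_distribution(n_tosses, p_heads=0.5):
    """Probability of k heads in n tosses, for k = 0..n (binomial law)."""
    return [
        math.comb(n_tosses, k) * p_heads ** k * (1 - p_heads) ** (n_tosses - k)
        for k in range(n_tosses + 1)
    ]

area = circle_area(5)
probs = heads_distribution(10)

print(f"Area of a circle of radius 5: {area:.2f}")
for k, p in enumerate(probs):
    print(f"P({k} heads in 10 tosses) = {p:.4f}")
```

Deduction does not say which outcome will occur, only how probable each outcome is under the stated principle.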

Inductive reasoning
• Given some specific cases, arrive at some general principles that will apply to all:
– Given the areas and radii of several circles, what general formula can we give to express the relation between the areas and the radii?
– Given several specimens of an undescribed weed species, how would you describe the species as a whole and express its relation to other species in a key?
– Given the results of tossing a coin 10 times, what conclusions can we draw regarding the bias or lack of bias of the coin?
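
The first inductive question can be illustrated by estimating the constant in an assumed model area = c · radius² from a few measured circles. The measurements below are hypothetical, and the least-squares estimator is one possible choice, not necessarily the one intended by the author.

```python
import math

def fit_area_constant(radii, areas):
    """Least-squares estimate of c in the model area = c * radius**2."""
    numerator = sum(a * r ** 2 for r, a in zip(radii, areas))
    denominator = sum(r ** 4 for r in radii)
    return numerator / denominator

# Hypothetical measurements of four circles (radius, area), with rounding error
radii = [1.0, 2.0, 3.0, 4.0]
areas = [3.1, 12.6, 28.3, 50.3]

c = fit_area_constant(radii, areas)
print(f"Estimated constant: {c:.4f} (pi = {math.pi:.4f})")
```

The estimate comes close to pi without ever reaching it exactly: a generalization from specific cases carries the uncertainty of the observations.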

Prediction / induction
• What happened 1, 2, …, k times can be generalized to the next, future, times (up to n)
• This implies a statistical description of “what happened”

Three questions
• What is a sample?
• What is the meaning of random?
• What is a variance?

Statistics
Population census
Demographics
Taxes

Statistical data analysis
• A data series (e.g. average February temperature for the years 1960-90) can be described synthetically by
– Measures of central tendency
• Average, median, mode
– Measures of dispersion: the way values are distributed around a central tendency
• variance
• range (amplitude)
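
As a minimal sketch, these measures can be computed with Python's standard `statistics` module. The data are the % sucrose values of roots 1 to 10 from the table given later in the deck; the variable names are illustrative.

```python
import statistics

# % sucrose of roots 1 to 10 (from the % sucrose table in this deck)
sucrose = [11.8, 13.1, 9.2, 8.7, 12.9, 13.7, 9.6, 13.7, 8.5, 15.7]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(sucrose),
    "median": statistics.median(sucrose),
    "mode": statistics.mode(sucrose),
    # Measures of dispersion
    "variance": statistics.variance(sucrose),  # sample variance (n - 1)
    "std_dev": statistics.stdev(sucrose),      # sample standard deviation
    "range": max(sucrose) - min(sucrose),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```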

Sample standard deviation
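
The slide image is not available in this transcript. The usual definition of the sample standard deviation, given here as an assumption about what the slide showed, for n observations x_1, …, x_n with sample mean x̄ is:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}},
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

The divisor n - 1 (degrees of freedom) is used instead of n because the mean itself is estimated from the same sample.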

Calculation in Excel

Root no. % sucrose Root no. % sucrose Root no. % sucrose Root no. % sucrose
1 11,8 26 13,5 51 10,1 76 9,0
2 13,1 27 11,9 52 12,4 77 14,0
3 9,2 28 16,7 53 10,8 78 13,2
4 8,7 29 9,6 54 11,3 79 15,0
5 12,9 30 15,1 55 6,3 80 13,8
6 13,7 31 14,6 56 15,7 81 15,1
7 9,6 32 10,4 57 14,3 82 14,9
8 13,7 33 13,4 58 15,0 83 12,6
9 8,5 34 14,6 59 12,5 84 14,1
10 15,7 35 10,5 60 11,8 85 11,4
11 14,1 36 8,6 61 11,6 86 9,4
12 11,9 37 15,2 62 11,2 87 12,4
13 16,7 38 11,1 63 7,5 88 15,0
14 7,4 39 14,5 64 13,4 89 9,4
15 10,0 40 12,1 65 14,7 90 12,9
16 4,4 41 14,9 66 14,2 91 13,4
17 13,2 42 15,0 67 14,0 92 10,6
18 13,8 43 12,1 68 15,1 93 6,5
19 9,1 44 12,6 69 6,5 94 11,0
20 11,9 45 13,0 70 8,7 95 11,9
21 12,8 46 14,1 71 11,0 96 11,8
22 15,3 47 14,4 72 13,0 97 12,6
23 12,6 48 13,1 73 9,2 98 9,5
24 16,1 49 13,3 74 7,0 99 12,2
25 17,2 50 15,0 75 13,2 100 8,2

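[Figure: a population X1, X2, X3, …, Xn with parameters σ and µ, and three samples drawn from it: a1, a2, a3, …, ak (statistics sa and mean a); b1, b2, b3, …, bm (statistics sb and mean b); c1, c2, c3, …, cp (statistics sc and mean c)]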

A two-way table
Rows (i)     Columns (j)                  Totals Yi.   Means Ȳi.
             1      2      …      r
1            Y11    Y12    …      Y1r     Y1.          Ȳ1.
2            Y21    Y22    …      Y2r     Y2.          Ȳ2.
…                   Yij
n            Yn1           …      Ynr
Totals Y.j   Y.1    Y.2    …      Y.r     Y..
Means        Ȳ.1    Ȳ.2    …                           Ȳ..

Frequency distribution
• Maximum and minimum values
• Range (amplitude)
• Number of classes: Sturges’ rule: k = 1 + 3.3 log10 N
• Class interval: range / k
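
A short sketch of these steps for the % sucrose table (N = 100, minimum 4.4, maximum 17.2). Rounding k up to the next integer is an assumption made here; the slides do not state a rounding convention.

```python
import math

def sturges_classes(n_observations):
    """Sturges' rule, k = 1 + 3.3 log10(N), rounded up (rounding is assumed)."""
    return math.ceil(1 + 3.3 * math.log10(n_observations))

def class_interval(minimum, maximum, n_classes):
    """Class width: range (amplitude) divided by the number of classes."""
    return (maximum - minimum) / n_classes

k = sturges_classes(100)
width = class_interval(4.4, 17.2, k)
print(f"classes: {k}, class interval: {width:.2f} % sucrose")
```

In practice the class interval is often rounded to a convenient value so that class limits are easy to read.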

Distributions
• The way a series of data is distributed as a function of its relative frequency (frequency curve or polygon)
• The normal distribution has interesting and useful properties
– symmetry
– average, mode and median coincide
– we can know the probability of occurrence of any value
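
The last property can be illustrated with `statistics.NormalDist` from the Python standard library. The mean and standard deviation below are hypothetical values chosen for the example, not estimates taken from the slides.

```python
from statistics import NormalDist

# Hypothetical normal distribution of % sucrose: mean 12.0, std. deviation 2.5
dist = NormalDist(mu=12.0, sigma=2.5)

# Probability of observing a value below 10 %
p_below_10 = dist.cdf(10.0)

# Probability of a value within one standard deviation of the mean
p_within_1sd = dist.cdf(14.5) - dist.cdf(9.5)

print(f"P(X < 10) = {p_below_10:.4f}")
print(f"P(9.5 < X < 14.5) = {p_within_1sd:.4f}")
```

For a continuous variable the probability refers to an interval of values; the probability of any single exact value is zero.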

Frequency polygon
[Figure: histogram and frequency polygon of % sucrose; x-axis: % sucrose, from 4.80 to 16.81; y-axis: frequency, from 0 to 24]

Normal distribution
[Figure: normal distribution plotted over values from 0 to 100, with one percentage axis from 0.0% to 3.0% and a second percentage axis from 0% to 100%]

Standard deviations

Normal distribution and scales

Different normal distributions
Differences in position; differences in dispersion

Normal deviations

Normal distribution table

Student’s t-distribution
• Student's t-distribution (or simply the t-distribution) is a continuous probability distribution that arises in the problem of estimating the mean of a normally distributed population when the sample size is small

t distribution
[Figure: t-distribution curves for df=1 and df=30]

Student’s t-distribution

Chi-square distribution

Finite vs. infinite
• The observation of reality is always finite
– 20 years vs. all years….
• The data I have are a sample of all data
possible about this subject
• Frequency distribution is only an
approximation (estimate) of the true
distribution

Sample size
• The larger the sample size, the closer the frequency distribution is to the “theoretical distribution”.
• When the sample size tends to infinity, the frequency distribution of many measured characteristics tends to be well represented by the normal distribution

Finite samples
• Is the distribution normal?
• Chi-square test (χ2)
χ2: a quotient of variances

Hypothesis testing
• The chi-square test, as well as other statistical tests (Student's t, Fisher’s F, etc.), are tests that use the instruments of classical logic
• Null hypothesis: H0 (no difference)
• Alternative hypothesis: H1 (significant difference)

Null Hypothesis
• Represents the attitude of observer’s independence, i.e., the realistic attitude that accepts reality as given, as data, as opposed to manipulating it as a result of a prejudice: the idea we make of it.

Probability and significance
• When the test value is larger than the table value for the same degrees of freedom and a chosen probability level (the significance level of the test), the null hypothesis is rejected.

Type I error
• In 100 cases where the null hypothesis is true, I wrongly reject it about 5 times if the confidence level of the test is 95% (5% significance)
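
This rate can be checked with a small simulation. The sketch below is an illustration under stated assumptions: samples are drawn from a standard normal population where the null hypothesis (mean = 0) is true, the standard deviation is treated as known, and a two-sided z test is used. The function name, sample size, number of experiments and seed are all arbitrary choices.

```python
import random
from statistics import NormalDist

def type_i_error_rate(n_experiments=2000, sample_size=20, alpha=0.05, seed=1):
    """Fraction of experiments in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 5 %
    rejections = 0
    for _ in range(n_experiments):
        # The null hypothesis is true: the population mean really is 0
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        sample_mean = sum(sample) / sample_size
        z = sample_mean * sample_size ** 0.5  # known sigma = 1
        if abs(z) > z_critical:
            rejections += 1
    return rejections / n_experiments

rate = type_i_error_rate()
print(f"Observed type I error rate: {rate:.3f}")
```

The observed rate should lie close to 0.05, the chosen significance level.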

Analysis of variance involves
• Partition of the sum of squares by origins of variation
• Estimation of variance for each origin of variation
• Comparison of variances by F tests

ANOVA
Yields of 2 wheat varieties from plots to which the varieties (A and B) were randomly assigned (values in 100 kg)
Var.   Replications          Total (Yi.)   Mean (Ȳi.)
A      19  14  15  17  20    85            17
B      23  19  19  21  18    100           20
100 kg is an old unit of mass: quintal or centner in English, quintal in French. Its counterpart in the pound system is the hundredweight.
http://en.wikipedia.org/wiki/Quintal

ANOVA – Analysis of variance
Origin of variation                       Degrees of freedom   Sum of Squares   Mean Sum of Squares
Total                                     kr-1                 SS               MS
Treatments                                k-1                  SST              MST
Within treatments (Experimental Error)    k(r-1)               SSError          MSE

ANOVA (Wheat yield varieties)
• Step 1 – Outline the ANOVA table and list the sources of
variation and degrees of freedom
• Two sources of variation:
• Between treatments (Varieties)
• Within treatments (replications)

ANOVA (wheat example) (cont.)
• ANOVA table for the wheat example
Origin of variation    Degrees of freedom        Sum of Squares     Mean Squares
Total                  kr-1 = 2x5-1 = 9          SS                 MS
Treatments             k-1 = 2-1 = 1             SST (treatments)   MST (treatments)
Within treatments      k(r-1) = 2x(5-1) = 8      SSError            MSE

• Step 2 – Calculate the total sum of squares
• SS = Σ (Yij – overall mean)² = 64.5
• Step 3 – Calculate the sum of squares for treatments
• SST = r Σ (Ȳi. – overall mean)² = 5 × 4.5 = 22.5
• Step 4 – Calculate the sum of squares for error
• SSE = SS – SST = 64.5 – 22.5 = 42
• Step 5 – Calculate the mean squares
• MST = SST/(k-1) = 22.5/1 = 22.5; MSE = SSE/(k(r-1)) = 42/8 = 5.25
• Step 6 – Calculate the F value
• F = MST / MSE = 22.5 / 5.25 ≈ 4.29
• With 1 and 8 degrees of freedom the 5% table value of F is 5.32, so F = 4.29 does not lead to rejecting the null hypothesis

Origin of variation    df    Sum of Squares    Mean Squares    F value
Total                  9     64.5
Treatments             1     22.5              22.5            F* = MST/MSE = 22.5/5.25 ≈ 4.29
Within treatments      8     42.0              5.25
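
The six steps can be reproduced in a few lines of Python for the wheat yields given earlier (varieties A and B, 5 replications each). This is a sketch with hypothetical function and key names; note that the treatment sum of squares weights each squared deviation of a variety mean by its number of replications.

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA for a dict mapping treatment name -> list of observations."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)

    # Partition of the total sum of squares
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    ss_treat = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    ss_error = ss_total - ss_treat

    # Degrees of freedom: k - 1 for treatments, N - k for error
    df_treat = len(groups) - 1
    df_error = len(all_values) - len(groups)

    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total, "ss_treat": ss_treat, "ss_error": ss_error,
        "ms_treat": ms_treat, "ms_error": ms_error, "F": ms_treat / ms_error,
    }

yields = {"A": [19, 14, 15, 17, 20], "B": [23, 19, 19, 21, 18]}
table = one_way_anova(yields)
for name, value in table.items():
    print(f"{name}: {value:.2f}")
```

For these data the sketch gives a total sum of squares of 64.5, a treatment sum of squares of 22.5, an error sum of squares of 42 and F of about 4.29.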

F-table

F-distribution (5;20)

Partitioning of the sum of squares
\[ SS = \sum_i \sum_j (Y_{ij} - \bar{Y}_{..})^2 = r \sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.})^2 \]

Glossary so far
• Reality
• Data
• Sample
• Chance
– Randomness
• Frequency
– Relative
– Absolute
• Average
• Mean
• Frequency classes
• Frequency polygon
• Distribution functions
• Median
• Mode
• Deviation
• Variance
• Standard deviation
• Coefficient of variation
• Hypothesis testing
• Confidence intervals
• ANOVA

Research, scientific method and the experiment
• Research
– A systematic inquiry into a subject to discover new facts or principles. The procedure for research is generally known as the scientific method

Scientific method
1. Formulation of a hypothesis
2. Planning an experiment to test the hypothesis
3. Careful observation and collection of data from the experiment
4. Interpretation of the experimental results

Characteristics of a well planned experiment
• Simplicity
• Degree of precision
– Appropriate design and sufficient replication
• Absence of systematic error
– No bias
• Range of validity of conclusions
– Replication in time and space
• Calculation of the degree of uncertainty
– Probability of obtaining the observed results by chance alone

Steps in experimentation
1. Definition of the problem
Clearly and concisely; if you can’t define it, there is little chance you can solve it
2. Statement of objectives
Write down in precise terms; establish a hierarchy
3. Selection of treatments
4. Selection of experimental material
Material used should be representative of the population

5. Selection of experimental design
Parsimony: the simplest possible
6. Selection of the unit for observation and the number of replications
7. Control of the “border effect”
8. Consideration of data to be collected
9. Outlining statistical analysis and summarization of results
Sources of variation in ANOVA
Which means to compare?

10. Conducting the experiment
Procedures free from personal biases (fatigue, double-checking, careful note-taking)
11. Analysing data and interpreting results
Don’t jump to conclusions even if statistically significant
12. Preparation of a complete, readable and correct report of the research
There is no such thing as a negative result

The three R’s of experimentation
I. Replicate
II. Randomize
III. Request help
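
The first two R's can be sketched as a random assignment of replicated treatments to plots, as in the wheat example (2 varieties, 5 replications). The function name and the seed are hypothetical; the seed only makes the layout reproducible.

```python
import random

def randomize_plots(treatments, replications, seed=None):
    """Return a random order of treatments over treatments x replications plots."""
    rng = random.Random(seed)
    # Replicate: each treatment appears `replications` times
    layout = [t for t in treatments for _ in range(replications)]
    # Randomize: shuffle so that plot position does not favour any treatment
    rng.shuffle(layout)
    return layout

layout = randomize_plots(["A", "B"], replications=5, seed=2012)
for plot_number, variety in enumerate(layout, start=1):
    print(f"plot {plot_number}: variety {variety}")
```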

Linear correlation and regression
• The idea
– The more, the merrier
– The bigger they are, the harder they fall
– Easy come, easy go
– Much haste, little speed
– The best gifts come in small packages
• 2 variables: dependent, independent
• Direct or inverse correlation;
• Measuring correlation:
– correlation coefficient ( r )
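
A minimal sketch of the correlation coefficient r (Pearson's product-moment coefficient, assumed here to be the one meant), computed from its definition. The two small data sets are hypothetical and chosen to show a perfect direct and a perfect inverse correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

direct = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # "the more, the merrier"
inverse = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])  # "much haste, little speed"
print(direct, inverse)
```

Values of r close to +1 or -1 indicate a close linear relationship; values near 0 indicate little or none.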

Regression
• The amount of change in one variable
associated with a unit change in the other
variable
• Correlation – refers to the fact that two
variables are related and to the closeness
of the relationship
• Regression – refers to the nature of the
relationship

Regression examples
• A penny saved is a penny earned
• A bird in hand is worth two in the bush
• A stitch in time saves nine
• One picture is worth a thousand words

Sayings in math terms
Independent var. X    Dependent var. Y    Regression eq.    Regression coeff.
Pennies saved         Pennies earned      Y = X             1
Hand birds            Bush birds          Y = 2X            2
Stitches in time      Stitches saved      Y = 9X            9
Pictures              Words               Y = 1000X         1000
Y = mx + b
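
The slope m (regression coefficient) and intercept b can be estimated by ordinary least squares. The sketch below uses hypothetical data that follow the "stitch in time" saying exactly, so the fit should return m = 9 and b = 0.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept b in y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: each stitch in time saves nine
stitches_in_time = [1, 2, 3, 4]
stitches_saved = [9, 18, 27, 36]

m, b = fit_line(stitches_in_time, stitches_saved)
print(f"Y = {m:.1f}X + {b:.1f}")
```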

Y = mx+b

Regression in Excel
[Figure: Excel chart of three series (enrolments; withdrawals and inactivations; members) from April 2001 to March 2006, with fitted linear trend lines y = 2.856x - 104908 (R² = 0.9633), y = 0.972x - 35146 (R² = 0.7851) and y = -1.884x + 69762 (R² = 0.9846)]

[Figure: Excel chart with a fitted linear trend line y = 0.246x - 9540.3 (R² = 0.1352)]

Linear regression and Excel
• On-line tutorial
– http://phoenix.phys.clemson.edu/tutorials/excel/regression.html
