Experimental methodology and
statistics
Pedro Aguiar Pinto
papinto@isa.utl.pt
January 2012
Instituto Superior de Agronomia
Universidade Técnica de Lisboa
Portugal
Research
What for?
What are we talking about?
How to?
Bernard de Clairvaux (1090-1153)
• There are five stimuli that push man towards Science:
– There are men who want to know for the simple pleasure of
knowing
• It is low curiosity
– There are others who want to know in order to be known
• It is vanity
– Others want to possess science in order to sell it, make a profit
and gain honours
• It is a selfish motivation
– But there are some who want to know in order to edify
• and this is charity
– Others to be edified
• and this is wisdom
What is it all about?
It is a matter of knowing
Observation and experiment
• Experimental method
– Experimental “pathway”
– Trial and error
– Logical deduction (deductive method)
– Test
ISO 3591, Sensory analysis - Wine tasting glass
Sensorial analysis
Environment / reality
• Scenery (setting) where the activity takes place
• The environment is a reality
– that is external to the activity, but has a marked
effect on it
– that can only be partially modified and in a limited
manner
Environmental characterization
• This reality, foreign to the will or action of
man, is given; it is there, it is not made by
him.
• Environmental characterization
deals with knowing reality as such, as it is
given.
– Deals with knowing its characteristics
(most remarkable details)
Observation
• One gets to know reality (environment) by
observation
• Observation must prevent the observer’s
influence as well as the influence of
observation tools, otherwise, what is being
observed differs from what is given.
Quantitative observation
• The characteristics of reality that matter to
our purpose are known by measurement
(observation) of physical dimensions or
quantities
• Observation error / measurement error
Variability
• Measuring the same characteristic results
in different values as a function of:
– observer (observation error)
– measurement tool (instrumental error)
– location (space and time) of the observation
• Environmental variability
Example: Soil characterization
• Soil physical characteristics have a variability
that is mainly spatial
– Visible in soil maps
– Giving meaning to
Precision Agriculture
• Time variability
(%H2O, %OM,…)
Climate characterization I
• Aerial environment characterization
starts from observations that are, by
nature, instantaneous
• Weather characteristics (temperature,
atmospheric pressure, radiation,…) are in
each instant the focus of observation
Climate characterization II
• The nature of the factors that determine
weather
(radiation, general atmosphere circulation and rainfall mechanics)
introduces a large temporal variability that
adds up to spatial variability
Climate characterization III
• Note that,
– soil characterization is attained by a set of
observations (in different locations and along
soil profile) that are not repeated in time
– weather characterization demands observation
time series that incorporate time variation
Climate
• Results from the aggregation of a series of
observations of instantaneous weather
measurements
– for example, daily average temperature results
from the arithmetic average of the maximum
and minimum daily temperatures: 2
instantaneous observations used as estimates
of the daily “thermal climate”
Climate
• The concept we call climate
results from a greater aggregation of
climate data already aggregated
(averages and arithmetic sums over monthly periods of observation),
integrated in indices that allow the
differentiation of spatial and geographical
units
– Climate classification
Climate
• Each CLIMATE corresponds to a set of
climatological normals.
– Instantaneous observations
– Daily sums or averages
– Monthly sums or averages
– Averages of monthly averages or averages of monthly
sums for a standard period (30 years)
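The aggregation chain above (daily values, monthly sums, averages of monthly sums over a standard period) can be sketched in Python. This is only a minimal illustration: the function names and the toy rainfall records are hypothetical, and a real normal would use a full 30-year period rather than three years.

```python
from collections import defaultdict
from statistics import mean

def monthly_sums(daily_records):
    """Aggregate (year, month, value) daily records into monthly sums."""
    totals = defaultdict(float)
    for year, month, value in daily_records:
        totals[(year, month)] += value
    return dict(totals)

def climatological_normal(monthly, month):
    """Average the monthly sums of one calendar month over all years present."""
    values = [v for (y, m), v in monthly.items() if m == month]
    return mean(values)

# Hypothetical toy data: two rainy days in each of three Januaries (mm).
daily = [
    (1961, 1, 10.0), (1961, 1, 20.0),
    (1962, 1, 5.0), (1962, 1, 15.0),
    (1963, 1, 30.0), (1963, 1, 10.0),
]
monthly = monthly_sums(daily)
january_normal = climatological_normal(monthly, 1)
print(january_normal)
```

Each aggregation step discards part of the variability of the level below it, which is the point made in the slides that follow.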
Annual rainfall variability
Mora, Portugal
[Figure: annual rainfall (mm), 1955 to 1985; 30 year average = 617 mm]
30 years is a long time, but
not long enough to find a
pattern
Climate normality
• The successive data aggregation that leads to
what one may call a “normal climate” is useful for
several purposes, but has the cost of cancelling
the natural variability of instantaneous
observations
• As a consequence, “the normal climate” is not
data anymore, but rather some sort of data
manipulation.
– Therefore, it is only by coincidence that one can run
into a “normal year”
Uncertainty
• Variability, and most evidently and
perceptibly time variability, illustrates
the question of uncertainty in the
knowledge of reality
• In any case, it is with this “uncertain”
knowledge that we set out to make
decisions
• After all, one only needs to make a
decision when one is not sure about
the outcome
Uncertainty
• In many cases it is useful to know the degree of
uncertainty linked to a decision;
and sometimes one can use predictive models to
make predictions/forecasts
• As opposed to observations, predictions are not
data: they did not happen…
• Assumption: prediction supposes that the pattern
that was observed in the past holds in the future (or
changes in a given hypothetical manner).
Induction and deduction
• Data, observation, data analysis,
descriptive statistics, polls, samples
deductive method
• Experiments, results, observations,
conclusions, generalization, statistical
inference
inductive method
Deductive reasoning
• Given some general principle, what happens
in a specific set of conditions:
– Given the formula for the area of a circle, what is the area of a circle
whose radius is 5?
– Given a key and description of herbaceous species in Southern France, to
what species does a certain plant belong?
– Given a coin whose probability of coming up heads when tossed is ½, what
will happen when the coin is tossed 10 times?
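The coin question is a deduction from a known principle, so it can be answered by direct calculation. The sketch below assumes independent tosses and uses the binomial formula; the function name `prob_heads` is illustrative and not part of the original slides.

```python
from math import comb

def prob_heads(k, n=10, p=0.5):
    """Binomial probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of each possible number of heads in 10 tosses of a fair coin.
distribution = [prob_heads(k) for k in range(11)]
for k, pr in enumerate(distribution):
    print(k, round(pr, 4))
```

The deduction does not say which outcome will occur, only how likely each one is: 5 heads is the most probable result, yet it happens in fewer than one in four series of 10 tosses.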
Inductive reasoning
• Given some specific cases, arrive at some
general principles that will apply to all:
– Given the areas and radii of several circles, what general formula can we
give to express the relation between the areas and the radii?
– Given several specimens of an undescribed weed species, how would you
describe the species as a whole and express its relation to other species
in a key?
– Given the results of tossing a coin 10 times, what conclusions can we draw
regarding the bias or lack of bias of the coin?
Prediction / induction
• What happened 1, 2, …, k times can be
generalized to the next (future) times,
up to n times
• This implies a statistical description of
“what happened”
Three questions
• What is a sample?
• What is the meaning of random?
• What is a variance?
Statistics
Population census
Demographics
Taxes
Statistical data analysis
• A data series
(e.g. average February temperature for the years 1960-90)
can be summarized by
– Measures of central tendency
• Average, median, mode
– Measures of dispersion
(the way values are distributed around a central tendency)
• variance
• amplitude (range)
Sample standard deviation
Calculation in Excel
Root no. % sucrose Root no. % sucrose Root no. % sucrose Root no. % sucrose
1 11,8 26 13,5 51 10,1 76 9,0
2 13,1 27 11,9 52 12,4 77 14,0
3 9,2 28 16,7 53 10,8 78 13,2
4 8,7 29 9,6 54 11,3 79 15,0
5 12,9 30 15,1 55 6,3 80 13,8
6 13,7 31 14,6 56 15,7 81 15,1
7 9,6 32 10,4 57 14,3 82 14,9
8 13,7 33 13,4 58 15,0 83 12,6
9 8,5 34 14,6 59 12,5 84 14,1
10 15,7 35 10,5 60 11,8 85 11,4
11 14,1 36 8,6 61 11,6 86 9,4
12 11,9 37 15,2 62 11,2 87 12,4
13 16,7 38 11,1 63 7,5 88 15,0
14 7,4 39 14,5 64 13,4 89 9,4
15 10,0 40 12,1 65 14,7 90 12,9
16 4,4 41 14,9 66 14,2 91 13,4
17 13,2 42 15,0 67 14,0 92 10,6
18 13,8 43 12,1 68 15,1 93 6,5
19 9,1 44 12,6 69 6,5 94 11,0
20 11,9 45 13,0 70 8,7 95 11,9
21 12,8 46 14,1 71 11,0 96 11,8
22 15,3 47 14,4 72 13,0 97 12,6
23 12,6 48 13,1 73 9,2 98 9,5
24 16,1 49 13,3 74 7,0 99 12,2
25 17,2 50 15,0 75 13,2 100 8,2
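As a hedged illustration of the measures of central tendency and dispersion, the sketch below summarizes a small sample taken from the table: the % sucrose of roots 1 to 12, with decimal commas written as points. It uses Python's standard `statistics` module; the choice of 12 roots and the dictionary keys are assumptions made only for this example.

```python
import statistics

# % sucrose of roots 1 to 12, copied from the table above.
sucrose = [11.8, 13.1, 9.2, 8.7, 12.9, 13.7, 9.6, 13.7, 8.5, 15.7, 14.1, 11.9]

summary = {
    "mean": statistics.mean(sucrose),
    "median": statistics.median(sucrose),
    "mode": statistics.mode(sucrose),
    "range": max(sucrose) - min(sucrose),       # amplitude
    "variance": statistics.variance(sucrose),   # sample variance, divides by n - 1
    "std_dev": statistics.stdev(sucrose),       # sample standard deviation
}
# Coefficient of variation: standard deviation as a percentage of the mean.
summary["cv_percent"] = 100 * summary["std_dev"] / summary["mean"]

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

`statistics.variance` and `statistics.stdev` are the sample versions (divisor n - 1), which is what applies when the 12 roots are treated as a sample of a larger population.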
[Figure: a population X1, X2, X3, …, Xn with parameters σ, µ, and three samples drawn from it: a1, a2, a3, …, ak (sa, ā); b1, b2, b3, …, bm (sb, b̄); c1, c2, c3, …, cp (sc, c̄)]
A two-way table
Rows (i)     Columns (j)                  Totals Yi.   Means Ȳi.
             1      2      …      r
1            Y11    Y12    …      Y1r     Y1.          Ȳ1.
2            Y21    Y22    …      Y2r     Y2.          Ȳ2.
…                   Yij
n            Yn1           …      Ynr
Totals Y.j   Y.1    Y.2    …      Y.r     Y..
Means        Ȳ.1    Ȳ.2    …                           Ȳ..
Frequency distribution
• Maximum and minimum values
• Amplitude
• Number of classes:
Sturges’ rule: k= 1 + 3.3 log N
• Class interval:
amplitude/k
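A small sketch of these two formulas, applied to the 100 sucrose values of the earlier table (minimum 4,4 for root 16 and maximum 17,2 for root 25). The function names are illustrative, the logarithm is taken as base 10, and rounding k up to a whole number is an assumption; other rounding choices are equally defensible.

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule, k = 1 + 3.3 log10(N)."""
    return 1 + 3.3 * math.log10(n)

def class_interval(minimum, maximum, k):
    """Class width = amplitude / k."""
    return (maximum - minimum) / k

k = sturges_classes(100)    # 100 roots in the sucrose table
k_rounded = math.ceil(k)    # assumed rounding rule: round up
width = class_interval(4.4, 17.2, k_rounded)
print(k, k_rounded, width)
```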
Distributions
• The way a series of data is distributed as a
function of its relative frequency (frequency curve
or polygon)
• The normal distribution has interesting and useful
properties
– symmetry
– average, mode and median coincide
– we can know the probability of occurrence of any value
Frequency polygon
[Figure: frequency polygon of % sucrose; class values from 4,80 to 16,81 on the horizontal axis, frequencies from 0 to 24 on the vertical axis]
Histogram
Normal distribution
[Figure: histogram and normal distribution; horizontal axis from 0 to 100, vertical axes from 0,0% to 3,0% and from 0% to 100%]
Standard deviations
Normal distribution and scales
Different normal distributions
Differences in position Differences in dispersion
Normal deviations
Normal distribution table
Student's t-distribution
• Student's t-distribution (or simply the t-distribution)
is a continuous probability
distribution that arises in the problem of
estimating the mean of a normally
distributed population when the sample
size is small
t distribution
[Figure: t distribution curves for df=1 and df=30]
t-Student’s distribution
Chi-square distribution
Finite vs. infinite
• The observation of reality is always finite
– 20 years vs. all years….
• The data I have are a sample of all data
possible about this subject
• Frequency distribution is only an
approximation (estimate) of the true
distribution
Sample dimensions
• The larger the sample size, the closer the
frequency distribution is to the “theoretical
distribution”.
• When sample size tends to infinity, the
distribution tends to be well represented by the
normal distribution
Finite samples
• Is the distribution normal?
• Chi-square test (χ2)
χ2: a quotient of variances
Hypothesis testing
• The chi-square test, as well as other statistical
tests
(Student's t, Fisher's F, etc.),
are tests that use the instruments of
classical logic
• Null hypothesis: H0 (no difference)
• Alternative hypothesis: H1 (significant difference)
Null Hypothesis
• Represents the attitude of observer’s
independence, i. e., the real attitude that
accepts reality as given, as data, as
opposed to manipulating it as a result of a
prejudice – the idea we make of it.
Probability and significance
• When the test value is larger than a table
value for the same degrees of freedom and
a chosen probability level (the significance level of the
test), the null hypothesis is rejected.
Type I error
• In 100 cases where the null hypothesis is true, I wrongly reject it
about 5 times if the confidence level of
the test is 95% (5% significance)
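The "5 times in 100" statement can be checked by simulation. The sketch below is an illustration under stated assumptions, not a procedure from the slides: samples are drawn from a standard normal population, so the null hypothesis (mean = 0) is true by construction, and a two-sided z test with known σ = 1 and the 5% critical value 1.96 is applied. The function name, the number of trials and the seed are arbitrary choices.

```python
import random
import statistics

def type_i_error_rate(trials=2000, n=20, z_crit=1.96, seed=1):
    """Fraction of true-null experiments in which a two-sided z test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the sample comes from N(0, 1).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = statistics.mean(sample) * n ** 0.5  # z statistic with known sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(rate)
```

The printed rate should be close to 0.05, which is the type I error rate fixed by the choice of significance level.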
Analysis of variance involves
• Partition of the sum of squares by origins
of variation
• Estimation of variance for each origin of
variation
• Comparison of variances by F tests
ANOVA
Yields of 2 wheat varieties from plots to which the
varieties (A and B) were randomly assigned (values in 100 kg)
Var.   Replications          Total Yi.   Mean Ȳi.
A      19  14  15  17  20    85          17 (Ȳ1.)
B      23  19  19  21  18    100         20 (Ȳ2.)
100 kg is an old unit of mass: quintal or centner in English, quintal in French.
It is equivalent in the pound system to the unit hundredweight
http://en.wikipedia.org/wiki/Quintal
ANOVA – Analysis of variance
Origin of variation                       Degrees of freedom   Sum of Squares   Mean Sum of Squares
Total                                     kr-1                 SS               MS
Treatments                                k-1                  SST              MST
Within treatments (Experimental Error)    k(r-1)               SSError          MSE
ANOVA (Wheat yield varieties)
• Step 1 – Outline the ANOVA table and list the sources of
variation and degrees of freedom
• Two sources of variation:
• Between treatments (Varieties)
• Within treatments (replications)
• Anova table for the wheat example
ANOVA (wheat example) (cont.)
Origin of variation                       Degrees of freedom       Sum of Squares   Mean Squares
Total                                     kr-1 = 2x5-1 = 9         SS               MS
Treatments                                k-1 = 2-1 = 1            SST              MST
Within treatments (Experimental Error)    k(r-1) = 2x(5-1) = 8     SSE              MSE
ANOVA (wheat example) (cont.)
• Step 2 – Calculate the total sum of squares
• SS = Σ (Yij – overall mean)² = 64,5
• Step 3 – Calculate the sum of squares for treatments
• SST = r Σ (Ȳi. – overall mean)² = 5 × 4,5 = 22,5
• Step 4 – Calculate the sum of squares for error
• SSE = SS – SST = 64,5 – 22,5 = 42
• Step 5 – Calculate the mean squares
• MST = SST/(k-1) = 22,5/1 = 22,5; MSE = SSE/(k(r-1)) = 42/8 = 5,25
• Step 6 – Calculate the F value
• F = MST / MSE = 22,5 / 5,25 = 4,29
ANOVA (wheat example) (cont.)
Origin of variation                       df   Sum of Squares   Mean Squares   F value
Total                                     9    64,5
Treatments                                1    22,5             22,5           MST/MSE = 22,5/5,25 = 4,29
Within treatments (Experimental Error)    8    42,0             5,25
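A minimal Python sketch of steps 2 to 6, computed directly from the raw yields of varieties A and B. The function name `one_way_anova` and the dictionary keys are illustrative, and the code assumes a one-way layout. Each treatment mean's squared deviation from the overall mean is weighted by its number of replications, as required for the treatment and error sums of squares to add up to the total.

```python
def one_way_anova(groups):
    """One-way ANOVA table quantities for a list of treatment groups."""
    all_values = [y for group in groups for y in group]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Total sum of squares: deviations of every observation from the overall mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    # Treatment sum of squares: squared deviation of each treatment mean,
    # weighted by the number of replications in that treatment.
    ss_treat = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    ss_error = ss_total - ss_treat

    df_treat = k - 1
    df_error = n_total - k
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total, "ss_treat": ss_treat, "ss_error": ss_error,
        "df_treat": df_treat, "df_error": df_error,
        "ms_treat": ms_treat, "ms_error": ms_error,
        "f": ms_treat / ms_error,
    }

variety_a = [19, 14, 15, 17, 20]
variety_b = [23, 19, 19, 21, 18]
table = one_way_anova([variety_a, variety_b])
for name, value in table.items():
    print(name, round(value, 2))
```

The resulting F value is then compared with the tabulated F for 1 and 8 degrees of freedom at the chosen significance level.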
F- table
F-distribution (5;20)
Partitioning of the sum of squares
SS = ∑ᵢ ∑ⱼ (Yij − Ȳ..)² = r ∑ᵢ (Ȳi. − Ȳ..)² + ∑ᵢ ∑ⱼ (Yij − Ȳi.)²
(total = treatments + experimental error)
Glossary so far
• Reality
• Data
• Sample
• Chance
– Randomness
• Frequency
– Relative
– Absolute
• Average
• Mean
• Frequency classes
• Frequency polygon
• Distribution functions
• Median
• Mode
• Deviation
• Variance
• Standard deviation
• Coefficient of variation
• Hypothesis testing
• Confidence intervals
• ANOVA
Research, scientific method and
the experiment
• Research
– A systematic inquiry into a subject to discover
new facts or principles. The procedure for
research is generally known as the scientific
method
Scientific method
1. Formulation of a hypothesis
2. Planning an experiment to test the
hypothesis
3. Careful observation and collection of data
from the experiment
4. Interpretation of the experimental results
Characteristics of a well planned experiment
• Simplicity
• Degree of precision
– Appropriate design and sufficient replication
• Absence of systematic error
– No bias
• Range of validity of conclusions
– Replication in time and space
• Calculation of the degree of uncertainty
– Probability of obtaining the observed results by chance
alone
Steps in experimentation
1. Definition of the problem
Clearly and concisely;
if you can’t define it, there is little chance you can solve it
2. Statement of objectives
Write down in precise terms; hierarchy
3. Selection of treatments
4. Selection of experimental material
Material used should be representative of the population
Steps in experimentation
5. Selection of experimental design
Parsimony – the simplest possible
6. Selection of the unit for observation and
the number of replications
7. Control of the “border effect”
8. Consideration of data to be collected
9. Outlining statistical analysis and
summarization of results
Sources of variation in ANOVA
Which means to compare?
Steps in experimentation
10. Conducting the experiment
Procedures free from personal biases (fatigue, double-
checking, careful note-taking)
11. Analysing data and interpreting results
Don’t jump to conclusions even if statistically
significant
12. Preparation of a complete, readable and
correct report of the research
There is no such thing as a negative result
The three R’s of experimentation
I. Replicate
II. Randomize
III. Request help
Linear correlation and regression
• The idea
– The more, the merrier
– The bigger they are, the harder they fall
– Easy come, easy go
– Much haste, little speed
– The best gifts come in small packages
• 2 variables: dependent, independent
• Direct or inverse correlation;
• Measuring correlation:
– correlation coefficient ( r )
Regression
• The amount of change in one variable
associated with a unit change in the other
variable
• Correlation – refers to the fact that two
variables are related and to the closeness
of the relationship
• Regression – refers to the nature of the
relationship
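To make the distinction concrete, the sketch below computes both quantities by ordinary least squares: the regression coefficient (the slope, the change in Y per unit change in X) and the correlation coefficient r (the closeness of the relationship). The function name `linear_fit` and the five data points are hypothetical, chosen only for illustration.

```python
def linear_fit(xs, ys):
    """Least-squares slope, intercept and correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # regression coefficient
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5           # correlation coefficient
    return slope, intercept, r

# Hypothetical observations, roughly following Y = 2X.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r = linear_fit(xs, ys)
print(slope, intercept, r, r ** 2)
```

The squared value r² corresponds to the R² that spreadsheet trend lines report.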
Regression examples
• A penny saved is a penny earned
• A bird in hand is worth two in the bush
• A stitch in time saves nine
• One picture is worth a thousand words
Sayings in math terms
Independent var. X   Dependent var. Y   Regression eq.   Regression coeff.
Pennies saved        Pennies earned     Y=X              1
Hand birds           Bush birds         Y=2X             2
Stitches in time     Stitches saved     Y=9X             9
Pictures             Words              Y=1000X          1000
Y = mx + b
Regression in Excel
[Figure: Excel chart of enrolments (Inscrições), withdrawals and inactivations (Desistências e inactivações) and members (Membros), April 2001 to March 2006, with fitted trend lines:
y = 2,856x - 104908, R² = 0,9633
y = 0,972x - 35146, R² = 0,7851
y = -1,884x + 69762, R² = 0,9846]
[Figure: second Excel chart with fitted trend line y = 0,246x - 9540,3, R² = 0,1352]
Linear regression and Excel
• On-line tutorial
– http://phoenix.phys.clemson.edu/tutorials/excel/regression.html

More Related Content

Viewers also liked (9)

Incerteza
IncertezaIncerteza
Incerteza
 
Tapada da ajuda
Tapada da ajudaTapada da ajuda
Tapada da ajuda
 
Uncertaintyclimatechangeandmodeling
UncertaintyclimatechangeandmodelingUncertaintyclimatechangeandmodeling
Uncertaintyclimatechangeandmodeling
 
Producaovegetal2015
Producaovegetal2015Producaovegetal2015
Producaovegetal2015
 
Agricultura biologicaintro
Agricultura biologicaintroAgricultura biologicaintro
Agricultura biologicaintro
 
Agricultura precisao
Agricultura precisaoAgricultura precisao
Agricultura precisao
 
O movimento slow e a Agricultura Biológica
O movimento slow e a Agricultura BiológicaO movimento slow e a Agricultura Biológica
O movimento slow e a Agricultura Biológica
 
Agricultura: monótona e diversa
Agricultura: monótona e diversaAgricultura: monótona e diversa
Agricultura: monótona e diversa
 
Princípios agronómicos da Producao vegetal 2014
Princípios agronómicos da Producao vegetal 2014Princípios agronómicos da Producao vegetal 2014
Princípios agronómicos da Producao vegetal 2014
 

Similar to Researchmethods2012

Lecture 3 (handout)
Lecture 3 (handout)Lecture 3 (handout)
Lecture 3 (handout)
ibased
 
Biostatistics CH Lecture Pack
Biostatistics CH Lecture PackBiostatistics CH Lecture Pack
Biostatistics CH Lecture Pack
Shaun Cochrane
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
DrZahid Khan
 

Similar to Researchmethods2012 (20)

Perimetry
PerimetryPerimetry
Perimetry
 
Scientific Method.ppt
Scientific Method.pptScientific Method.ppt
Scientific Method.ppt
 
Lecture 3 (handout)
Lecture 3 (handout)Lecture 3 (handout)
Lecture 3 (handout)
 
Science and Hypothesis.ppt
Science and Hypothesis.pptScience and Hypothesis.ppt
Science and Hypothesis.ppt
 
Statistics for the Health Scientist: Basic Statistics II
Statistics for the Health Scientist: Basic Statistics IIStatistics for the Health Scientist: Basic Statistics II
Statistics for the Health Scientist: Basic Statistics II
 
Brain death n drowning
Brain death n drowningBrain death n drowning
Brain death n drowning
 
Physiology of Aging3
Physiology of Aging3Physiology of Aging3
Physiology of Aging3
 
Scientific Method
Scientific MethodScientific Method
Scientific Method
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Lecture 1.pptx
 
Presentation mopb.pptx
Presentation mopb.pptxPresentation mopb.pptx
Presentation mopb.pptx
 
Unit 1 - Introduction to Chemistry (2017/2018)
Unit 1 - Introduction to Chemistry (2017/2018)Unit 1 - Introduction to Chemistry (2017/2018)
Unit 1 - Introduction to Chemistry (2017/2018)
 
Biostatistics CH Lecture Pack
Biostatistics CH Lecture PackBiostatistics CH Lecture Pack
Biostatistics CH Lecture Pack
 
Cranial nerves examination ih
Cranial nerves examination ihCranial nerves examination ih
Cranial nerves examination ih
 
Tonometry
TonometryTonometry
Tonometry
 
Applied Statistics Chapter 2 Time series (1).ppt
Applied Statistics Chapter 2 Time series (1).pptApplied Statistics Chapter 2 Time series (1).ppt
Applied Statistics Chapter 2 Time series (1).ppt
 
GROUNDWATER FLOW SIMULATION IN GUIMARAS ISLAND, PHILIPPINE
GROUNDWATER FLOW SIMULATION IN GUIMARAS ISLAND, PHILIPPINEGROUNDWATER FLOW SIMULATION IN GUIMARAS ISLAND, PHILIPPINE
GROUNDWATER FLOW SIMULATION IN GUIMARAS ISLAND, PHILIPPINE
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
 
8134485.ppt
8134485.ppt8134485.ppt
8134485.ppt
 
Visual field testing and interpretation
Visual field testing and interpretationVisual field testing and interpretation
Visual field testing and interpretation
 
Introduction to Biostatistics
Introduction to BiostatisticsIntroduction to Biostatistics
Introduction to Biostatistics
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Researchmethods2012

  • 1. Experimental methodology and statistics 42 5 1 0011 0010 1010 1101 0001 0100 1011 Pedro Aguiar Pinto papinto@isa.utl.pt January 2012 Instituto Superior de Agronomia Universidade Técnica de Lisboa Portugal
  • 2. Research • What for? • What are we talking about? • How to?
  • 3. Bernard de Clairvaux (1090-1153) • There are five stimuli that push man towards science: – There are men who want to know for the simple pleasure of knowing • It is low curiosity – There are others who want to know in order to be known • It is vanity – Others want to possess science in order to sell it and to gain profit and honours • It is a selfish motivation – But there are some who want to know in order to edify • And this is charity – And others in order to be edified • And this is wisdom
  • 4. What is it all about? It is a matter of knowing
  • 8. Observation and experiment • Experimental method – Experimental “pathway” – Trial and error – Logical deduction (deductive method) – Test
  • 18. ISO 3591, Sensory analysis - Wine tasting glass
  • 19. Sensory analysis
  • 20. Environment / reality • The setting where the activity takes place • The environment is a reality – that is external to the activity, but has a marked effect on it – that can only be partially modified and in a limited manner
  • 21. Environmental characterization • This reality, foreign to the will or action of man, is given; it is there, it is not made by him. • Environmental characterization deals with knowing reality as such, as it is given. – It deals with knowing its characteristics (most remarkable details)
  • 22. Observation • One gets to know reality (the environment) by observation • Observation must prevent the observer’s influence, as well as the influence of the observation tools; otherwise, what is observed differs from what is given.
  • 23. Quantitative observation • The characteristics of reality that matter to our purpose are known by measurement (observation) of physical dimensions or quantities • Observation error / measurement error
  • 24. Variability • Measuring the same characteristic results in different values as a function of: – the observer (observation error) – the measurement tool (instrumental error) – the location (in space and time) of the observation • Environmental variability
  • 25. Example: Soil characterization • Soil physical characteristics have a variability that is mainly spatial – Visible in soil maps – Giving meaning to Precision Agriculture • Time variability (%H2O, %OM,…)
  • 26. Climate characterization I • Characterization of the aerial environment starts from observations that are, by nature, instantaneous • Weather characteristics (temperature, atmospheric pressure, radiation,…) are, at each instant, the focus of observation
  • 27. Climate characterization II • The nature of the factors that determine weather (radiation, general atmospheric circulation and rainfall mechanics) introduces a large temporal variability that adds to the spatial variability
  • 28. Climate characterization III • Note that: – soil characterization is attained by a set of observations (in different locations and along the soil profile) that are not repeated in time – weather characterization demands observation time series that incorporate time variation
  • 29. Climate • Results from the aggregation of a series of instantaneous weather observations – for example, the daily average temperature results from the arithmetic average of the maximum and minimum daily temperatures: 2 instantaneous observations used as estimates of the daily “thermal climate”
  • 30. Climate • The concept we call climate results from a further aggregation of already aggregated data (averages and arithmetic sums over monthly periods of observation), integrated into indices that allow spatial and geographical units to be differentiated – Climate classification
  • 31. Climate • Each CLIMATE corresponds to a set of climatological normals. – Instantaneous observations – Daily sums or averages – Monthly sums or averages – Averages of monthly averages or averages of monthly sums for a standard period (30 years)
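The aggregation chain described on the climate slides (instantaneous observations, daily averages, monthly averages, 30-year normals) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the temperature values are synthetic random numbers, and the function names are hypothetical, not part of the original slides.

```python
import random
from statistics import mean

random.seed(0)  # reproducible synthetic data


def daily_mean_temperature(t_max, t_min):
    """Daily average as the arithmetic mean of the two daily extremes."""
    return (t_max + t_min) / 2


def monthly_average(daily_values):
    """Aggregate the daily values of one month into a single average."""
    return mean(daily_values)


def climatological_normal(monthly_values_by_year):
    """Average of one month's values over the standard period (30 years)."""
    return mean(monthly_values_by_year)


# Hypothetical data: 30 Februaries of 28 days each, (t_max, t_min) in deg C.
februaries = [
    [(random.uniform(12, 18), random.uniform(3, 9)) for _ in range(28)]
    for _ in range(30)
]

monthly_means = [
    monthly_average([daily_mean_temperature(tx, tn) for tx, tn in month])
    for month in februaries
]
february_normal = climatological_normal(monthly_means)

print(f"Range of monthly means: {min(monthly_means):.2f} to {max(monthly_means):.2f}")
print(f"February normal: {february_normal:.2f}")
```

Each aggregation step discards variability: the normal is a single number, while the 30 monthly means it summarizes still differ from year to year.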
  • 32. Annual rainfall variability, Mora, Portugal [Figure: annual rainfall (mm), 1955-1985, plotted against the 30-year average of 617 mm] • 30 years is a long time, but not long enough to find a pattern
  • 33. Climate normality • The successive data aggregation that leads to what one may call a “normal climate” is useful for several purposes, but has the cost of cancelling the natural variability of instantaneous observations • As a consequence, “the normal climate” is no longer data, but rather a kind of data manipulation. – Therefore, it is only by coincidence that one can run into a “normal year”
  • 34. Uncertainty • Variability, and most evidently time variability, illustrates the question of uncertainty in the knowledge of reality • In any case, it is with this “uncertain” knowledge that we set out to make decisions • Indeed, one only needs to make a decision when one is not sure about the outcome
  • 35. Uncertainty • In many cases it is useful to know the degree of uncertainty linked to a decision, and sometimes one can use predictive models to make predictions/forecasts • As opposed to observations, predictions are not data: they did not happen… • Assumption: prediction supposes that the pattern observed in the past holds in the future (or changes in a given hypothetical manner).
  • 36. Induction and deduction • Data, observation, data analysis, descriptive statistics, polls, samples: deductive method • Experiments, results, observations, conclusions, generalization, statistical inference: inductive method
  • 37. Deductive reasoning • Given some general principle, what happens in a specific set of conditions? – Given the formula for the area of a circle, what is the area of a circle whose radius is 5? – Given a key and description of herbaceous species in Southern France, to what species does a certain plant belong? – Given a coin whose probability of coming up heads when tossed is ½, what will happen when the coin is tossed 10 times?
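Two of the deductive questions above have a direct numerical answer once the general principle is taken as given. The sketch below assumes the usual area formula (pi times the radius squared) and, for the coin, the binomial distribution with independent tosses; the function names are illustrative only.

```python
import math


def circle_area(radius):
    """General principle: area = pi * r**2."""
    return math.pi * radius ** 2


def heads_probability(k, n=10, p=0.5):
    """Binomial probability of exactly k heads in n independent tosses."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


area = circle_area(5)
distribution = [heads_probability(k) for k in range(11)]

print(f"Area of a circle of radius 5: {area:.2f}")
for k, prob in enumerate(distribution):
    print(f"P({k:2d} heads in 10 tosses) = {prob:.4f}")
```

Deduction does not say which outcome will occur in a particular run of 10 tosses, only how probable each outcome is under the stated principle.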
  • 38. Inductive reasoning • Given some specific cases, arrive at some general principles that will apply to all: – Given the areas and radii of several circles, what general formula can we give to express the relation between the areas and the radii? – Given several specimens of an undescribed weed species, how would you describe the species as a whole and express its relation to other species in a key? – Given the results of tossing a coin 10 times, what conclusions can we draw regarding the bias or lack of bias of the coin?
  • 39. Prediction / induction • What happened 1, 2, …, k times can be generalized to the next, future times (up to n) • This implies a statistical description of “what happened”
  • 40. Three questions • What is a sample? • What is the meaning of random? • What is a variance?
  • 41. Statistics • Population census • Demographics • Taxes
  • 42. Statistical data analysis • A data series (e.g. the average February temperature for the years 1960-90) can be summarized by: – Measures of central tendency • Average, median, mode – Measures of dispersion: the way values are distributed around a central tendency • Variance • Range (amplitude)
  • 43. Sample standard deviation
  • 44. Calculation in Excel
  • 45. Percentage of sucrose in 100 roots
      Root no.  % sucrose   Root no.  % sucrose   Root no.  % sucrose   Root no.  % sucrose
         1        11.8         26       13.5         51       10.1         76        9.0
         2        13.1         27       11.9         52       12.4         77       14.0
         3         9.2         28       16.7         53       10.8         78       13.2
         4         8.7         29        9.6         54       11.3         79       15.0
         5        12.9         30       15.1         55        6.3         80       13.8
         6        13.7         31       14.6         56       15.7         81       15.1
         7         9.6         32       10.4         57       14.3         82       14.9
         8        13.7         33       13.4         58       15.0         83       12.6
         9         8.5         34       14.6         59       12.5         84       14.1
        10        15.7         35       10.5         60       11.8         85       11.4
        11        14.1         36        8.6         61       11.6         86        9.4
        12        11.9         37       15.2         62       11.2         87       12.4
        13        16.7         38       11.1         63        7.5         88       15.0
        14         7.4         39       14.5         64       13.4         89        9.4
        15        10.0         40       12.1         65       14.7         90       12.9
        16         4.4         41       14.9         66       14.2         91       13.4
        17        13.2         42       15.0         67       14.0         92       10.6
        18        13.8         43       12.1         68       15.1         93        6.5
        19         9.1         44       12.6         69        6.5         94       11.0
        20        11.9         45       13.0         70        8.7         95       11.9
        21        12.8         46       14.1         71       11.0         96       11.8
        22        15.3         47       14.4         72       13.0         97       12.6
        23        12.6         48       13.1         73        9.2         98        9.5
        24        16.1         49       13.3         74        7.0         99       12.2
        25        17.2         50       15.0         75       13.2        100        8.2
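As a hedged illustration of the measures of central tendency and dispersion listed earlier, the sketch below applies Python's standard `statistics` module to the % sucrose values of roots 1 to 25 from the table above. Using only the first 25 roots is a choice made here to keep the example short; the variable names are illustrative.

```python
import statistics

# % sucrose of roots 1 to 25 (first column of the table).
sucrose = [
    11.8, 13.1, 9.2, 8.7, 12.9, 13.7, 9.6, 13.7, 8.5, 15.7,
    14.1, 11.9, 16.7, 7.4, 10.0, 4.4, 13.2, 13.8, 9.1, 11.9,
    12.8, 15.3, 12.6, 16.1, 17.2,
]

n = len(sucrose)

# Measures of central tendency
mean = statistics.mean(sucrose)
median = statistics.median(sucrose)
modes = statistics.multimode(sucrose)  # all most-frequent values

# Measures of dispersion
value_range = max(sucrose) - min(sucrose)   # "amplitude" on the slides
variance = statistics.variance(sucrose)     # sample variance, divisor n - 1
std_dev = statistics.stdev(sucrose)         # sample standard deviation
cv = 100 * std_dev / mean                   # coefficient of variation, in %

print(f"n = {n}, mean = {mean:.3f}, median = {median}, modes = {modes}")
print(f"range = {value_range:.1f}, variance = {variance:.3f}, "
      f"sd = {std_dev:.3f}, CV = {cv:.1f}%")
```

`statistics.variance` and `statistics.stdev` use the n - 1 divisor, which is the sample (not population) form; `pvariance` and `pstdev` would give the population form.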
  • 46. [Diagram: a population X1, X2, X3, …, Xn with parameters σ, µ, and three samples drawn from it: a1, a2, a3, …, ak (statistics sa, ā); b1, b2, b3, …, bm (sb, b̄); c1, c2, c3, …, cp (sc, c̄)]
  • 47. A two-way table
      Rows (i)      Columns (j)                 Totals Yi.   Means
                    1      2      …      r
      1             Y11    Y12    …      Y1r    Y1.          Ȳ1.
      2             Y21    Y22    …      Y2r    Y2.          Ȳ2.
      …                    Yij
      n             Yn1           …      Ynr
      Totals Y.j    Y.1    Y.2    …      Y.r    Y..
      Means         Ȳ.1    Ȳ.2    …                          Ȳ..
  • 48. Frequency distribution • Maximum and minimum values • Range (amplitude) • Number of classes: Sturges’ rule: k = 1 + 3.3 log10 N • Class interval: amplitude/k
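A minimal sketch of the class-count calculation, assuming the logarithm in Sturges' rule is base 10 and using the extremes of the 100-root sucrose table (minimum 4.4, maximum 17.2). Rounding k to the nearest integer is an assumption made here; the slides do not say how k is rounded.

```python
import math


def sturges_classes(n):
    """Sturges' rule with a base-10 logarithm: k = 1 + 3.3 * log10(N)."""
    return 1 + 3.3 * math.log10(n)


def class_interval(value_min, value_max, k):
    """Class width: amplitude (range) divided by the number of classes."""
    return (value_max - value_min) / k


k = sturges_classes(100)      # N = 100 roots
k_rounded = round(k)          # assumption: round to the nearest integer
width = class_interval(4.4, 17.2, k_rounded)

print(f"k = {k:.1f}, rounded to {k_rounded} classes, class interval = {width:.2f}")
```

In practice the class width is often adjusted to a convenient value after this first estimate.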
  • 49. Distributions • The way a series of data is distributed as a function of its relative frequency (frequency curve or polygon) • The normal distribution has interesting and useful properties – symmetry – average, mode and median coincide – we can know the probability of occurrence of any interval of values
  • 50. Frequency polygon [Figure: histogram and frequency polygon; x-axis: % sucrose, 4.80 to 16.81; y-axis: frequency, 0 to 24]
  • 51. Normal distribution [Figure: axes from 0.0% to 3.0% and from 0% to 100%, over values from 0 to 100]
  • 52. Standard deviations
  • 53. Normal distribution and scales
  • 54. Different normal distributions • Differences in position • Differences in dispersion
  • 55. Normal deviations
  • 56. Normal distribution table
  • 57. Student’s t-distribution • Student's t-distribution (or simply the t-distribution) is a continuous probability distribution that arises in the problem of estimating the mean of a normally distributed population when the sample size is small
  • 58. t distribution [Figure: curves for df=1 and df=30]
  • 60. Chi-square distribution
  • 61. Finite vs. infinite • The observation of reality is always finite – 20 years vs. all years… • The data I have are a sample of all possible data about this subject • The frequency distribution is only an approximation (estimate) of the true distribution
  • 62. Sample dimensions • The larger the sample size, the closer the frequency distribution is to the “theoretical distribution”. • When sample size tends to infinity, the distribution tends to be well represented by the normal distribution
  • 63. Finite samples • Is the distribution normal? • Chi-square test (χ2) • χ2: a quotient of variances
  • 64. Hypothesis testing • The chi-square test, as well as other statistical tests (Student’s t, Fisher’s F, etc.), are tests that use the instruments of classical logic • Null hypothesis: H0 (no difference) • Alternative hypothesis: H1 (significant difference)
  • 65. Null Hypothesis • Represents the attitude of the observer’s independence, i.e., the realistic attitude that accepts reality as given, as data, as opposed to manipulating it as a result of a prejudice: the idea we make of it.
  • 66. Probability and significance • When the test value is larger than the table value for the same degrees of freedom and a chosen probability level (the significance level of the test), the null hypothesis is rejected.
  • 67. Type I error • In 100 cases in which the null hypothesis is true, I wrongly reject it about 5 times if the confidence level of the test is 95% (5% significance)
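The meaning of a 5% significance level can be checked with a small simulation. The sketch below is an illustration under explicit assumptions: the null hypothesis is true by construction (samples come from a normal population with mean 0 and known standard deviation 1), the test is a two-sided z-test, and 1.96 is the familiar critical value for 5% significance. The function names are hypothetical.

```python
import random

random.seed(42)  # reproducible simulation


def z_test_rejects(sample, mu0=0.0, sigma=1.0, z_critical=1.96):
    """Two-sided z-test with known sigma; True means H0 is rejected."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return abs(z) > z_critical


def type_i_error_rate(trials=10_000, n=20):
    """Fraction of experiments that reject H0 although H0 is true."""
    rejections = sum(
        z_test_rejects([random.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return rejections / trials


rate = type_i_error_rate()
print(f"Observed type I error rate: {rate:.3f}")
```

The observed rate should lie close to 0.05: roughly 5 wrong rejections in every 100 experiments in which there is truly no difference.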
  • 68. Analysis of variance involves • Partition of the sum of squares by origins of variation • Estimation of the variance for each origin of variation • Comparison of variances by F tests
  • 69. ANOVA • Yields of 2 wheat varieties from plots to which the varieties (A and B) were randomly assigned (values in 100 kg)
      Var.   Replications         Yi.    Ȳi. (aver.)
      A      19  14  15  17  20    85    17
      B      23  19  19  21  18   100    20
    • 100 kg is an old unit of mass: quintal or centner in English, quintal in French. Its counterpart in the pound system is the hundredweight. http://en.wikipedia.org/wiki/Quintal
  • 70. ANOVA – Analysis of variance
      Origin of variation                       Degrees of freedom   Sum of Squares   Mean Sum of Squares
      Total                                     kr-1                 SS               MS
      Treatments                                k-1                  SST              MST
      Within treatments (Experimental Error)    k(r-1)               SSE              MSE
  • 71. ANOVA (wheat yield varieties) • Step 1 – Outline the ANOVA table and list the sources of variation and degrees of freedom • Two sources of variation: – Between treatments (varieties) – Within treatments (replications)
  • 72. ANOVA (wheat example) (cont.) • ANOVA table for the wheat example
      Origin of variation                       Degrees of freedom       Sum of Squares   Mean Squares
      Total                                     kr-1 = 2×5-1 = 9         SS               MS
      Treatments                                k-1 = 2-1 = 1            SST              MST
      Within treatments (Experimental Error)    k(r-1) = 2×(5-1) = 8     SSE              MSE
  • 73. ANOVA (wheat example) (cont.) • Step 2 – Calculate the total sum of squares • SS = Σ (Yij – overall mean)² = 64.5 • Step 3 – Calculate the sum of squares for treatments • SST = r Σ (Ȳi. – overall mean)² = 5 × (1.5² + 1.5²) = 22.5 • Step 4 – Calculate the sum of squares for error • SSE = SS – SST = 64.5 – 22.5 = 42 • Step 5 – Calculate the mean squares • MST = SST/(k-1) = 22.5/1 = 22.5 • MSE = SSE/(k(r-1)) = 42/8 = 5.25 • Step 6 – Calculate the F value • F = MST/MSE = 22.5/5.25 ≈ 4.29
  • 74. ANOVA (wheat example) (cont.)
      Origin of variation                       df   Sum of Squares   Mean Squares   F value
      Total                                     9    64.5
      Treatments                                1    22.5             22.5           F* = MST/MSE = 22.5/5.25 ≈ 4.29
      Within treatments (Experimental Error)    8    42.0             5.25
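The six ANOVA steps can be reproduced in plain Python on the wheat yields of varieties A and B. This is a sketch, not a general-purpose routine: it assumes equal replication in every treatment and weights each squared deviation of a treatment mean by the number of replications r; the function and key names are illustrative.

```python
# Yields (in units of 100 kg) of the two wheat varieties, 5 replications each.
yields = {
    "A": [19, 14, 15, 17, 20],
    "B": [23, 19, 19, 21, 18],
}


def one_way_anova(groups):
    """One-way ANOVA for k treatments with r replications each."""
    k = len(groups)
    r = len(next(iter(groups.values())))  # assumes equal replication
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)

    # Step 2: total sum of squares
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    # Step 3: treatment sum of squares, weighted by the r replications
    ss_treat = sum(r * (sum(ys) / r - grand_mean) ** 2 for ys in groups.values())
    # Step 4: error sum of squares by difference
    ss_error = ss_total - ss_treat
    # Step 5: mean squares
    ms_treat = ss_treat / (k - 1)
    ms_error = ss_error / (k * (r - 1))
    # Step 6: F value
    return {
        "df_total": k * r - 1,
        "df_treat": k - 1,
        "df_error": k * (r - 1),
        "ss_total": ss_total,
        "ss_treat": ss_treat,
        "ss_error": ss_error,
        "ms_treat": ms_treat,
        "ms_error": ms_error,
        "f_value": ms_treat / ms_error,
    }


result = one_way_anova(yields)
for name, value in result.items():
    print(f"{name:9s} = {value:.4g}")
```

The F value is then compared with the tabulated F for 1 and 8 degrees of freedom at the chosen significance level.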
  • 75. F-table
  • 76. F-distribution (5;20)
  • 77. Partitioning of the sum of squares • SS = ∑∑ (Yij – Ȳ..)² = r ∑ (Ȳi. – Ȳ..)² + ∑∑ (Yij – Ȳi.)², i.e. total = treatments + error
  • 78. Glossary so far • Reality • Data • Sample • Chance – Randomness • Frequency – Relative – Absolute • Average • Mean • Frequency classes • Frequency polygon • Distribution functions • Median • Mode • Deviation • Variance • Standard deviation • Coefficient of variation • Hypothesis testing • Confidence intervals • ANOVA
  • 79. Research, scientific method and the experiment • Research – A systematic inquiry into a subject to discover new facts or principles. The procedure for research is generally known as the scientific method
  • 80. Scientific method 1. Formulation of a hypothesis 2. Planning an experiment to test the hypothesis 3. Careful observation and collection of data from the experiment 4. Interpretation of the experimental results
  • 81. Characteristics of a well planned experiment • Simplicity • Degree of precision – Appropriate design and sufficient replication • Absence of systematic error – No bias • Range of validity of conclusions – Replication in time and space • Calculation of the degree of uncertainty – Probability of obtaining the observed results by chance alone
  • 82. Steps in experimentation 1. Definition of the problem – Clearly and concisely; if you can’t define it, there is little chance you can solve it 2. Statement of objectives – Write them down in precise terms; hierarchy 3. Selection of treatments 4. Selection of experimental material – Material used should be representative of the population
  • 83. Steps in experimentation 5. Selection of experimental design – Parsimony: the simplest possible 6. Selection of the unit for observation and the number of replications 7. Control of the “border effect” 8. Consideration of data to be collected 9. Outlining statistical analysis and summarization of results – Sources of variation in ANOVA – What means to compare?
  • 84. Steps in experimentation 10. Conducting the experiment – Procedures free from personal biases (fatigue, double-checking, careful note-taking) 11. Analysing data and interpreting results – Don’t jump to conclusions, even if results are statistically significant 12. Preparation of a complete, readable and correct report of the research – There is no such thing as a negative result
  • 85. The three R’s of experimentation I. Replicate II. Randomize III. Request help
  • 86. Linear correlation and regression • The idea – The more, the merrier – The bigger they are, the harder they fall – Easy come, easy go – Much haste, little speed – The best gifts come in small packages • 2 variables: dependent, independent • Direct or inverse correlation • Measuring correlation: – correlation coefficient (r)
  • 87. Regression • The amount of change in one variable associated with a unit change in the other variable • Correlation – refers to the fact that two variables are related and to the closeness of the relationship • Regression – refers to the nature of the relationship
  • 88. Regression examples • A penny saved is a penny earned • A bird in hand is worth two in the bush • A stitch in time saves nine • One picture is worth a thousand words
  • 89. Sayings in math terms
      Independent var. X    Dependent var. Y    Regression eq.    Regression coeff.
      Pennies saved         Pennies earned      Y=X               1
      Hand birds            Bush birds          Y=2X              2
      Stitches in time      Stitches saved      Y=9X              9
      Pictures              Words               Y=1000X           1000
    • Y = mx + b
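The regression coefficient m and intercept b of Y = mx + b, together with the correlation coefficient r, can be computed by ordinary least squares. The sketch below uses plain Python and a made-up, perfectly linear data set built from the "a stitch in time saves nine" saying, so the fit should recover a slope of 9; the function name and data are illustrative, not from the original slides.

```python
def linear_regression(xs, ys):
    """Least-squares slope m, intercept b and correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                   # regression coefficient (slope)
    b = mean_y - m * mean_x         # intercept
    r = sxy / (sxx * syy) ** 0.5    # correlation coefficient
    return m, b, r


# Hypothetical data following the saying exactly: Y = 9X.
stitches_in_time = [1, 2, 3, 4, 5]
stitches_saved = [9 * x for x in stitches_in_time]

m, b, r = linear_regression(stitches_in_time, stitches_saved)
print(f"Y = {m:.2f} X + {b:.2f}, r = {r:.3f}, R² = {r ** 2:.3f}")
```

With real, noisy data r falls between -1 and 1, and R² is the value a spreadsheet trend line reports.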
  • 90. Y = mx + b
  • 91. Regression in Excel [Figure: three monthly series (Enrolments; Withdrawals and inactivations; Members), Apr-01 to Mar-06, with linear trend lines y = 2.856x - 104908 (R² = 0.9633), y = 0.972x - 35146 (R² = 0.7851) and y = -1.884x + 69762 (R² = 0.9846)]
  • 92. [Figure: series with linear trend line y = 0.246x - 9540.3 (R² = 0.1352); x-axis: dates from 14-Nov to 10-Aug; y-axis: 0 to 1200]
  • 93. Linear regression and Excel • On-line tutorial – http://phoenix.phys.clemson.edu/tutorials/excel/regression.html