3. Bernard de Clairvaux (1090-1153)
• There are five stimuli that push man towards Science:
– There are men that want to know for the simple pleasure of
knowing
• It is low curiosity
– There are others who want to know in order to be known:
• It is vanity
– Others want to possess science in order to sell it and make profit
and get honours
• It is a selfish motivation
– But there are some who want to know in order to edify
– and this is charity
• Others to be edified
– and this is wisdom
4. What is it all about?
It is a matter of knowing
20. Environment / reality
• Scenery where the activity takes place
• The environment is a reality
– that is external to the activity, but has a marked
effect on it
– that can only be partially modified and in a limited
manner
21. Environmental characterization
• This reality, foreign to the will or action of
man, is given: it is there, it is not made by
him.
• Environmental characterization
deals with knowing reality as such, as it is
given.
– Deals with knowing its characteristics
(most remarkable details)
22. Observation
• One gets to know reality (environment) by
observation
• Observation must prevent the observer’s
influence as well as the influence of
observation tools, otherwise, what is being
observed differs from what is given.
23. Quantitative observation
• The characteristics of reality that matter to
our purpose are known by measurement
(observation) of physical dimensions or
quantities
• Observation error / measurement error
24. Variability
• Measuring the same characteristic results
in different values as a function of:
– observer (observation error)
– measurement tool (instrumental error)
– location (space and time) of the observation
• Environmental variability
25. Example: Soil characterization
• Soil physical characteristics have a variability
that is mainly spatial
– Visible in soil maps
– Giving meaning to
Precision Agriculture
• Time variability
(%H2O, %OM,…)
26. Climate characterization I
• Aerial environment characterization
departs from observations that are, by
nature, instantaneous
• Weather characteristics (temperature,
atmospheric pressure, radiation,…) are in
each instant the focus of observation
27. Climate characterization II
• The nature of the factors that determine
weather
(radiation, general atmosphere circulation and rainfall mechanics)
introduces a large temporal variability that
adds up to spatial variability
28. Climate characterization III
• Note that,
– soil characterization is attained by a set of
observations (in different locations and along
soil profile) that are not repeated in time
– weather characterization demands observation
time series that incorporate time variation
29. Climate
• Results from the aggregation of a series of
observations of instantaneous weather
measurements
– for example, daily average temperature results
from the arithmetic average of the maximum
and minimum daily temperatures: 2
instantaneous observations used as estimates
of the daily “thermal climate”
30. Climate
• The concept we call climate
results from a greater aggregation of
climate data already aggregated
(averages and arithmetic sums over monthly periods of observation),
integrated in indices that allow the
differentiation of spatial and geographical
units
– Climate classification
31. Climate
• Each CLIMATE corresponds to a set of
climatological normals.
– Instantaneous observations
– Daily sums or averages
– Monthly sums or averages
– Averages of monthly averages or averages of monthly
sums for a standard period (30 years)
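The chain of aggregations described above (instantaneous observations, daily averages, monthly averages, normals) can be sketched in a few lines of Python. This is only an illustration: all temperature values and variable names are invented, and the "normal" is computed over 3 hypothetical years instead of the standard 30.

```python
from statistics import mean

# Hypothetical daily (maximum, minimum) temperatures in degrees C.
daily_extremes = [(14.0, 6.0), (15.5, 7.5), (13.0, 5.0), (16.0, 8.0)]

# Daily average: arithmetic average of the two instantaneous observations.
daily_means = [(t_max + t_min) / 2 for t_max, t_min in daily_extremes]

# Monthly average: aggregation of the daily averages.
monthly_mean = mean(daily_means)

# Climatological normal: average of the monthly averages of the same month
# over a standard period (only 3 invented years here, not 30).
february_means_by_year = {1961: 10.5, 1962: 9.0, 1963: 11.1}
february_normal = mean(list(february_means_by_year.values()))

print(daily_means, monthly_mean, february_normal)
```

Each step discards part of the variability of the level below it, which is the point taken up later under "Climate normality".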
32. Annual rainfall variability
[Figure: Annual rainfall variability, Mora, Portugal. Annual rainfall (axis from 325 to 925 mm) by year, 1955 to 1985; 30 year average = 617 mm]
30 years is a long time, but
not long enough to find a
pattern
33. Climate normality
• The successive data aggregation that leads to
what one may call a “normal climate” is useful for
several purposes, but has the cost of cancelling
the natural variability of instantaneous
observations
• As a consequence, “the normal climate” is not
data anymore, but rather some sort of data
manipulation.
– Therefore, it is only by coincidence that one can run
into a “normal year”
34. Uncertainty
• Variability, and in a more evident and
sensible fashion, time variability, illustrates
the question of uncertainty in the
knowledge of reality
• In any case, it is with this “uncertain”
knowledge that we depart to make
decisions
• By the way, one needs to decide, to make a
decision, when he/she is not sure about
the outcome
35. Uncertainty
• In many cases it is useful to know the degree of
uncertainty linked to a decision;
and sometimes one can use predictive models to
make predictions/forecasts
• As opposed to observations, predictions are not
data: they did not happen…
• Assumption: prediction supposes that the pattern
that was verified in the past holds in the future (or
changes in a given hypothetical manner).
37. Deductive reasoning
• Given some general principle what happens
in a specific set of conditions:
– Given the formula for the area of a circle, what is the area of a circle
whose radius is 5?
– Given a key and description of herbaceous species in Southern France, to
what species does a certain plant belong?
– Given a coin whose probability of coming up heads when tossed is ½, what
will happen when the coin is tossed 10 times?
38. Inductive reasoning
• Given some specific cases, arrive at some
general principles that will apply to all:
– Given the areas and radii of several circles, what general formula can we
give to express the relation between the areas and the radii?
– Given several specimens of an undescribed weed species, how would you
describe the species as a whole and express its relation to other species
in a key?
– Given the results of tossing a coin 10 times, what conclusions can we draw
regarding the bias or lack of bias of the coin?
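The circle example can be used to contrast the two modes of reasoning. The sketch below is an illustration only: the four (radius, area) measurements are invented, and the form area = c × radius² is an assumed hypothesis whose constant c is estimated by least squares through the origin.

```python
import math

# Deduction: from the general formula A = pi * r^2 to one specific case.
def circle_area(radius):
    return math.pi * radius ** 2

area_r5 = circle_area(5)  # the circle whose radius is 5

# Induction: from specific cases to a general rule.
# Invented (radius, area) measurements, rounded as real measurements would be.
cases = [(1.0, 3.1), (2.0, 12.6), (3.0, 28.3), (4.0, 50.3)]

# Hypothesis: area = c * radius^2. Least-squares estimate of c:
# c = sum(A * r^2) / sum(r^4).
c_hat = sum(a * r ** 2 for r, a in cases) / sum(r ** 4 for r, a in cases)

print(area_r5, c_hat)
```

The deduced area is exact given the formula, while the induced constant only approximates π and depends on the cases that happened to be observed.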
39. Prediction / induction
• What happened 1, 2, …, k times can be
generalized to the next (future) times, up to
n times
• This implies a statistical description of
“what happened”
40. Three questions
• What is a sample?
• What is the meaning of random?
• What is a variance?
42. Statistical data analysis
• A data series
(for ex. average February temperature for the years 1960-90)
might be synthetically described by
– Measures of central tendency
• Average, median, mode
– Measures of dispersion
the way values are distributed around a central tendency
• variance
• amplitude (range)
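As an illustration of these descriptive measures, the following sketch uses Python's standard `statistics` module on an invented series of ten average February temperatures (the values are hypothetical, not observations from the course).

```python
from statistics import mean, median, mode, variance

# Invented average February temperatures (degrees C) for ten years.
temps = [9.8, 10.4, 11.0, 10.4, 9.1, 12.3, 10.9, 10.4, 11.6, 9.6]

# Measures of central tendency
average = mean(temps)
middle = median(temps)
most_frequent = mode(temps)

# Measures of dispersion
sample_variance = variance(temps)      # divides by n - 1 (sample variance)
amplitude = max(temps) - min(temps)    # range of the series

print(average, middle, most_frequent, sample_variance, amplitude)
```

`variance` treats the series as a sample of a larger population; `statistics.pvariance` would divide by n instead.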
48. Frequency distribution
• Maximum and minimum values
• Amplitude
• Number of classes:
Sturges’ rule: k = 1 + 3.3 log10 N
• Class interval:
amplitude/k
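A minimal sketch of how a frequency distribution could be built with these rules. The function names are hypothetical, the logarithm is taken in base 10, k is rounded to the nearest integer (a choice made here, not stated in the slides), and the values are assumed not to be all equal.

```python
import math

def sturges_classes(n):
    """Number of classes k = 1 + 3.3 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.3 * math.log10(n))

def frequency_table(values):
    """Return (k, class interval, absolute frequency of each class)."""
    k = sturges_classes(len(values))
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / k            # class interval = amplitude / k
    counts = [0] * k
    for v in values:
        # The maximum value is placed in the last class.
        index = min(int((v - lowest) / width), k - 1)
        counts[index] += 1
    return k, width, counts

# Usage with 30 invented observations (simply the integers 1 to 30).
k, width, counts = frequency_table(list(range(1, 31)))
print(k, width, counts)
```

Dividing each count by N gives the relative frequencies used to draw the frequency polygon.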
49. Distributions
• The way a series of data is distributed as a
function of its relative frequency (frequency curve
or polygon)
• Normal distribution has interesting and useful
properties
– symmetry
– average, mode and median coincide
– we can know the probability of occurrence of any range of values
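The last property can be illustrated with `statistics.NormalDist` from the Python standard library. The model below is an assumption made only for the example: annual rainfall is treated as normal with a mean of 617 mm and an invented standard deviation of 150 mm.

```python
from statistics import NormalDist

# Assumed normal model for annual rainfall (mm); sigma = 150 is invented.
rain = NormalDist(mu=617, sigma=150)

# Symmetry: half of the probability lies below the mean.
p_below_mean = rain.cdf(617)

# Probability of a year with less than 400 mm.
p_below_400 = rain.cdf(400)

# Probability of a year within one standard deviation of the mean.
p_within_1sd = rain.cdf(617 + 150) - rain.cdf(617 - 150)

print(p_below_mean, p_below_400, p_within_1sd)
```

Whether real rainfall data follow this model is a separate question, which is what a test of normality addresses.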
57. t-Student’s distribution
• Student's t-distribution (or simply the t-distribution) is a continuous probability
distribution that arises in the problem of
estimating the mean of a normally
distributed population when the sample
size is small
61. Finite vs. infinite
• The observation of reality is always finite
– 20 years vs. all years….
• The data I have are a sample of all data
possible about this subject
• Frequency distribution is only an
approximation (estimate) of the true
distribution
62. Sample dimensions
• The larger the sample size, the closer the
frequency distribution is to the “theoretical
distribution”.
• When sample size tends to infinity, the
distribution of the sample mean tends to be well represented by the
normal distribution
63. Finite samples
• Is the distribution normal?
• Chi-square test (χ2)
χ2: a quotient of variances
64. Hypothesis testing
• Chi-square test as well as other statistical
tests
(Student’s t, Fisher’s F, etc.)
are tests that use the instruments of
classical logic
• Null hypothesis: H0 (no difference)
• Alternative hypothesis: H1 (significant difference)
65. Null Hypothesis
• Represents the attitude of the observer’s
independence, i.e., the real attitude that
accepts reality as given, as data, as
opposed to manipulating it as a result of a
prejudice – the idea we make of it.
66. Probability and significance
• When the test value is larger than a table
value for the same degrees of freedom and
a chosen probability level (the significance level of the
test), the null hypothesis is rejected.
67. Type I error
• When the null hypothesis is true, in 100 cases I wrongly reject it
about 5 times if the confidence level of the test is 95% (5% significance)
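The meaning of the 5% figure can be checked by simulation. The sketch below is an assumption-laden illustration, not a procedure from the course: it draws many samples from a population where the null hypothesis is true by construction (mean 0, known standard deviation 1), applies a two-sided z-test at the 5% level, and counts how often the null hypothesis is rejected. The function name, sample size, number of trials and seed are arbitrary.

```python
import random
from statistics import NormalDist

def type_i_error_rate(trials=2000, n=20, alpha=0.05, seed=1):
    """Share of true null hypotheses rejected by a two-sided z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: population mean 0, known sigma 1.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(rate)
```

The rejection rate should come out close to 0.05: about 5 wrong rejections in every 100 cases where there is in fact no difference.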
68. Analysis of variance involves
• Partition of the sum of squares by origins
of variation
• Estimation of variance for each origin of
variation
• Comparison of variances by F tests
69. ANOVA
Yields of 2 wheat varieties from plots to which the
varieties (A and B) were randomly assigned (values in 100 kg)
Var.   Replications       Yi. (total)   Yi. (aver.)
A      19 14 15 17 20     85            17
B      23 19 19 21 18     100           20
100 kg is an old unit of mass: quintal or centner in English, quintal in French.
Its counterpart in the pound system is the hundredweight
http://en.wikipedia.org/wiki/Quintal
70. ANOVA – Analysis of variance
Origin of variation                      Degrees of freedom   Sum of Squares   Mean Sum of Squares
Total                                    kr-1                 SS               MS
Treatments                               k-1                  SST              MST
Within treatments (Experimental Error)   k(r-1)               SSError          MSE
71. ANOVA (Wheat yield varieties)
• Step 1 – Outline the ANOVA table and list the sources of
variation and degrees of freedom
• Two sources of variation:
• Between treatments (Varieties)
• Within treatments (replications)
72. ANOVA (wheat example) (cont.)
• ANOVA table for the wheat example
Origin of variation                      Degrees of freedom       Sum of Squares   Mean Squares
Total                                    kr-1 = 2x5-1 = 9         SS               MS
Treatments                               k-1 = 2-1 = 1            SST              MST
Within treatments (Experimental Error)   k(r-1) = 2x(5-1) = 8     SSError          MSE
73. ANOVA (wheat example) (cont.)
• Step 2 – Calculate the total sum of squares
• SS = Σ (Yij – overall mean)² = 64.5
• Step 3 – Calculate the sum of squares for treatments
• SST = r Σ (Yi. – overall mean)² = 5 × 4.5 = 22.5
• Step 4 – Calculate the sum of squares for error
• SSE = SS – SST = 64.5 – 22.5 = 42
• Step 5 – Calculate the mean squares
• MST = SST/(k-1) = 22.5 / 1 = 22.5; MSE = SSE/(k(r-1)) = 42 / 8 = 5.25
• Step 6 – Calculate the F value
• F = MST / MSE = 22.5 / 5.25 ≈ 4.29
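The six steps can be reproduced with a short Python sketch using the wheat yields of varieties A and B given earlier. Variable names are chosen for this illustration only. Note that the treatment sum of squares weights each squared deviation of a variety mean from the overall mean by the number of replications r, so that it is on the same scale as the total sum of squares.

```python
# Yields (in units of 100 kg) of the two wheat varieties, 5 replications each.
yields = {"A": [19, 14, 15, 17, 20], "B": [23, 19, 19, 21, 18]}

k = len(yields)                      # number of treatments (varieties)
r = len(yields["A"])                 # replications per treatment
all_values = [y for plots in yields.values() for y in plots]
overall_mean = sum(all_values) / (k * r)

# Step 2: total sum of squares
ss_total = sum((y - overall_mean) ** 2 for y in all_values)

# Step 3: sum of squares for treatments (weighted by r)
ss_treatments = sum(r * (sum(v) / r - overall_mean) ** 2 for v in yields.values())

# Step 4: sum of squares for error
ss_error = ss_total - ss_treatments

# Step 5: mean squares
ms_treatments = ss_treatments / (k - 1)
ms_error = ss_error / (k * (r - 1))

# Step 6: F value
f_value = ms_treatments / ms_error

print(ss_total, ss_treatments, ss_error, ms_treatments, ms_error, f_value)
```

The resulting F is then compared with the tabulated F for 1 and 8 degrees of freedom; at the 5% level that table value is about 5.32, so with these data the difference between the two varieties would not be declared significant.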
77. Partitioning of the sum of squares
SS = ∑i ∑j (Yij – Y..)² = r ∑i (Yi. – Y..)² + ∑i ∑j (Yij – Yi.)²
(total = between treatments + within treatments)
78. Glossary so far
• Reality
• Data
• Sample
• Chance
– Randomness
• Frequency
– Relative
– Absolute
• Frequency classes
• Frequency polygon
• Distribution functions
• Average
• Mean
• Median
• Mode
• Deviation
• Variance
• Standard deviation
• Coefficient of variation
• Hypothesis testing
• Confidence intervals
• ANOVA
79. Research, scientific method and
the experiment
• Research
– A systematic inquiry into a subject to discover
new facts or principles. The procedure for
research is generally known as the scientific
method
80. Scientific method
1. Formulation of a hypothesis
2. Planning an experiment to test the
hypothesis
3. Careful observation and collection of data
from the experiment
4. Interpretation of the experimental results
81. Characteristics of a well planned experiment
• Simplicity
• Degree of precision
– Appropriate design and sufficient replication
• Absence of systematic error
– No bias
• Range of validity of conclusions
– Replication in time and space
• Calculation of the degree of uncertainty
– Probability of obtaining the observed results by chance
alone
82. Steps in experimentation
1. Definition of the problem
Clearly and concisely;
if you can’t define it, there is little chance you can solve it
2. Statement of objectives
Write down in precise terms; hierarchy
3. Selection of treatments
4. Selection of experimental material
Material used should be representative of the population
83. Steps in experimentation
5. Selection of experimental design
Parsimony – the simplest possible
6. Selection of the unit for observation and
the number of replications
7. Control of the “border effect”
8. Consideration of data to be collected
9. Outlining statistical analysis and
summarization of results
Sources of variation in ANOVA
Which means to compare?
84. Steps in experimentation
10. Conducting the experiment
Procedures free from personal biases (fatigue, double-
checking, careful note-taking)
11. Analysing data and interpreting results
Don’t jump to conclusions even if statistically
significant
12. Preparation of a complete, readable and
correct report of the research
There is no such thing as a negative result
85. The three R’s of experimentation
I. Replicate
II. Randomize
III. Request help
86. Linear correlation and regression
• The idea
– The more, the merrier
– The bigger they are, the harder they fall
– Easy come, easy go
– Much haste, little speed
– The best gifts come in small packages
• 2 variables: dependent, independent
• Direct or inverse correlation;
• Measuring correlation:
– correlation coefficient (r)
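A minimal sketch of how the correlation coefficient r could be computed, assuming the usual Pearson definition (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name and the two small data sets are invented for the illustration.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r between two series of equal length."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Direct correlation ("the more, the merrier"): y grows with x.
direct = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])

# Inverse correlation ("much haste, little speed"): y falls as x grows.
inverse = correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2])

print(direct, inverse)
```

r ranges from -1 (perfect inverse correlation) to +1 (perfect direct correlation); values near 0 indicate no linear relationship.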
87. Regression
• The amount of change in one variable
associated with a unit change in the other
variable
• Correlation – refers to the fact that two
variables are related and to the closeness
of the relationship
• Regression – refers to the nature of the
relationship
88. Regression examples
• A penny saved is a penny earned
• A bird in hand is worth two in the bush
• A stitch in time saves nine
• One picture is worth a thousand words
89. Sayings in math terms
Independent var. X   Dependent var. Y   Regression eq.   Regression coeff.
Pennies saved        Pennies earned     Y = X            1
Hand birds           Bush birds         Y = 2X           2
Stitches in time     Stitches saved     Y = 9X           9
Pictures             Words              Y = 1000X        1000
Y = mx + b
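The regression coefficient m and the intercept b of Y = mx + b can be estimated from data by least squares. The sketch below assumes the ordinary least-squares formulas (slope = sum of cross-products of deviations divided by the sum of squared deviations of X); the function name is hypothetical and the data simply encode the saying "a stitch in time saves nine".

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b of Y = m*X + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

stitches_in_time = [1, 2, 3, 4]
stitches_saved = [9 * x for x in stitches_in_time]   # the saying, taken literally

m, b = fit_line(stitches_in_time, stitches_saved)
print(m, b)
```

With data that follow the saying exactly, the fitted regression coefficient is 9 and the intercept is 0; real data would scatter around the line and the fit would give the best compromise.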