1.
Measurement and Representation of
Hydrological Quantities
Leonardo da Vinci - Vitruvian Man, ca 1487
photo by Luc Viatour, www.lucnix.be
Riccardo Rigon
Sunday, September 12, 2010
2.
Measurement and Representation of Hydrological Quantities
Objectives:
• In these pages, the spatio-temporal variability of measurements of
hydrological quantities is discussed by means of examples.
• It follows that statistical tools must be used to describe these
quantities.
3.
Measurement and Representation of Hydrological Quantities
Frickenhausen, on the River Main
Hydrometric Height
5.
Measurement and Representation of Hydrological Quantities
Hydrological Data have Complex Trends 1/2
The hydrological cycle is controlled by innumerable factors, hence it depends
on innumerable degrees of freedom. Only a small portion of these factors can be
taken into consideration, while the rest must be modelled as a
boundary condition or as “background noise” (this noise is either modelled or
eliminated with statistical tools).
The dynamics of the hydrological cycle are non-linear. Both the hydrodynamics
and the thermodynamics of the processes, which involve numerous phase
changes, are non-linear. Another non-linear characteristic is that many of these
processes are activated when some controlling quantity exceeds a
threshold value. For example, the condensation of water vapour into raindrops
is triggered when air humidity exceeds saturation; landslides are triggered when
the internal friction forces of the material are overcome by the pressure of water
in the soil pores; the channels of a hydrographic network begin
to form when the force per unit area exerted by running water reaches a certain value.
6.
Measurement and Representation of Hydrological Quantities
Hydrological Data have Complex Trends 2/2
The dynamics include processes which are linearly unstable: for example, the
baroclinic instability that drives meteorological processes at the middle
latitudes.
The dynamics of climate and hydrology are dissipative. That is to say, they
transfer and transform mechanical energy into thermal energy. The
hydrodynamic process of turbulence transports energy from the larger
spatial scales to the smaller ones, where the energy is dissipated through
friction. Wave phenomena of various kinds (e.g. gravity waves) transport the
energy contained in water and in air.
7.
Measurement and Representation of Hydrological Quantities
Some Typical Problems
precipitation
8.
Measurement and Representation of Hydrological Quantities
Some Typical Problems
incident solar radiation
9.
Measurement and Representation of Hydrological Quantities
Some Typical Problems
Flow of the River Adige at San Lorenzo Bridge
[Figure: time series of discharge (m^3/s, axis from 0 to 1400) against year (1990 to 2005).]
10.
Measurement and Representation of Hydrological Quantities
Some Typical Problems
Distribution of monthly river flows in Trento
11.
Measurement and Representation of Hydrological Quantities
Some Typical Problems
Annual water budget for the Lake of Serraia catchment
[Figure: annual budget of the catchment (2000). Series: P, precipitation; ET, evapotranspiration; Inv, stored volume (storage); R, release. Vertical axis: value (m^3/s), from -0.3 to 1; horizontal axis: time (month-year), January to December 2000. Values labelled in the chart: 0.8675, 0.797, 0.343, -0.184.]
12.
Measurement and Representation of Hydrological Quantities
Some Typical Problems
Water content of the soil in the Little Washita catchment (Oklahoma)
14.
Measurement and Representation of Hydrological Quantities
Some Typical Problems
Spatial distribution of precipitation
15.
Measurement and Representation of Hydrological Quantities
Some Typical Problems
Spatial pattern of the hydrographic network
16.
Statistical Inference
and Descriptive Statistics
Lucio Fontana - Expectations (MoMA), 1959
17.
Measurement and Representation of Hydrological Quantities
Objectives:
• In these pages the fundamental elements of statistical analysis will be
recalled.
• Population, sample and various elementary statistics, such as mean,
variance and covariance, will be defined.
• The existence of statistics and their usefulness will be discussed.
• The concept of random sampling will be introduced.
18.
Statistics
Population and Sample
Statistical inference assumes that a dataset is a subset of cases, called the
sample, drawn from among all the possible cases. All the possible
cases make up the population from which the dataset has been extracted.
While the sample is known, the population generally is not. Hypotheses are
implicitly made about the population.
19.
Statistics
Exploratory Data Analysis
Temporal representation and histogram
A set of n data constitutes, therefore, a sample of data.
[Figure a: "Bergen: September temperature", time series of temperature (°C, 8 to 15) against time (1860 to 2000).]
[Figure b: "Bergen: September temperature distribution (1861-1997)", histogram of frequency (0 to 30) against temperature (°C, 5 to 15).]
These data can be represented in various forms. Each form of representation
emphasises certain characteristics.
20.
Statistics
Sample Means
Given a sample, various statistics can be calculated. For example:
$$\bar{x} := \frac{1}{n} \sum_{t=1}^{n} x_t \qquad \text{(temporal mean)}$$
$$\bar{x} := \frac{1}{n} \sum_{i=1}^{n} x_i \qquad \text{(spatial mean)}$$
The mean is an indicator of position.
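As a minimal sketch in R (the language used for the simulation later in these slides), the sample mean can be computed directly from its definition and compared with the built-in `mean()`. The vector `x` holds made-up values, used only for illustration.

```r
# Hypothetical sample of n = 5 values (illustrative only, not real data)
x <- c(12.1, 11.4, 13.0, 10.8, 12.7)
n <- length(x)

# Mean from the definition: (1/n) times the sum of the values
x_bar <- sum(x) / n

# The built-in function should give the same result
stopifnot(isTRUE(all.equal(x_bar, mean(x))))
print(x_bar)
```

The same expression applies whether the index runs over time instants or over measurement sites; only the meaning of the index changes.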
21.
Statistical Inference and Descriptive Statistics
Statistical Inference
Corrado Caudek
• Statistical inference is the process that allows one to draw
conclusions about a population on the basis of a sample of
observations drawn at random from the population.
• Central to classical statistical inference is the notion of sample distribution
(also known as the sampling distribution), that is to say how the statistics of
the samples vary if random samples of the same size n are repeatedly drawn
from the population.
• Even though, in each practical application of statistical inference, the
researcher has only one random sample of size n, the possibility that the
sampling can be repeated provides the conceptual foundation for deciding
how informative the observed sample is about the population as a whole.
25.
Statistics
Exploratory Data Analysis
The mean is not the only indicator of position
Mode
26.
Statistics
Median and Mode
The mode represents the most frequent value.
If the histogram shows several distinct maxima, the dataset is said to be
multimodal, although deciding which maxima are distinct can be contentious.
The median represents the value for which 50% of the data have a lower
value and (obviously!) the other 50% have a greater value.
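The mode and median described above can be sketched in R as follows. The data are invented for illustration. Base R has no built-in function for the mode of a sample, so the sketch counts occurrences with `table()`; note that `which.max()` returns only the first maximum, so it does not detect a multimodal dataset.

```r
# Hypothetical sample (illustrative values only)
x <- c(3, 7, 7, 2, 9, 7, 4, 2, 5)

# Median: the value that splits the ordered data in half
med <- median(x)

# Mode: count the occurrences of each value and take the most frequent one
counts <- table(x)
mode_value <- as.numeric(names(counts)[which.max(counts)])

print(c(median = med, mode = mode_value))
```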
27.
Statistics
Empirical Distribution Function
Given the dataset
$$h_i = \{h_1, \cdots, h_n\}$$
and having derived from it the set sorted in ascending order
$$\hat{h}_j = (\hat{h}_1, \cdots, \hat{h}_n), \qquad \hat{h}_1 \le \hat{h}_2 \le \cdots \le \hat{h}_n$$
the empirical cumulative distribution function is defined as
$$ECDF_i(\hat{h}) := \frac{1}{n} \sum_{j=1}^{i} 1 = \frac{i}{n}$$
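A minimal sketch of this construction in R, assuming the usual definition in which the i-th value of the sorted sample is assigned the frequency i/n. The precipitation depths in `h` are invented for illustration; `ecdf()` is the base R function that builds the same step function.

```r
# Hypothetical precipitation depths in mm (illustrative only)
h <- c(35, 12, 58, 41, 27)
n <- length(h)

# Sort in ascending order and assign i/n to the i-th ordered value
h_sorted <- sort(h)
ecdf_values <- seq_len(n) / n

# Base R builds the same step function with ecdf()
F_hat <- ecdf(h)
stopifnot(isTRUE(all.equal(F_hat(h_sorted), ecdf_values)))

print(data.frame(h = h_sorted, frequency = ecdf_values))
```

Calling `plot(F_hat)` would draw the curve shown in the slides that follow.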
28.
Statistics
ECDF
The empirical cumulative distribution function can be represented as illustrated.
The ordinate of the curve is the frequency of non-exceedance; the value of h
that corresponds to a given frequency is the quantile for that frequency.
[Figure: ECDF plotted as points. Vertical axis: frequency of non-exceedance, P[H < h], from 0.0 to 1.0; horizontal axis: h [mm], from 20 to 80.]
29.
Statistics
ECDF
The 0.5 quantile divides the data distribution in half along the ordinate.
[Figure: the same ECDF, with the 0.5 quantile marked at P[H < h] = 0.5.]
31.
Statistics
ECDF
And so the median is identified.
[Figure: the same ECDF; the value of h [mm] corresponding to the 0.5 quantile is marked on the horizontal axis as the median.]
32.
Statistics
Box and Whisker Diagrams
The procedure can be generalised and represented with a box and whisker diagram.
[Figure: the same ECDF with the 0.25, 0.5 and 0.75 quantiles marked; below the h [mm] axis, the corresponding box and whisker diagram, with one “whisker” labelled.]
The box and whisker diagram is another way of representing the data distribution.
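As a hedged illustration of the quantiles behind the box and whisker diagram, the sketch below uses invented data. `quantile()` supports several estimation rules; the sketch relies on R's default, which may differ slightly from the graphical reading of the ECDF used in the slides.

```r
# Hypothetical precipitation depths in mm (illustrative only)
h <- c(12, 18, 23, 27, 31, 35, 41, 47, 58, 72)

# 0.25, 0.5 and 0.75 quantiles; the 0.5 quantile is the median
q <- quantile(h, probs = c(0.25, 0.5, 0.75))

# Five numbers used to draw the box and the whiskers:
# lower whisker, lower hinge, median, upper hinge, upper whisker
five <- boxplot.stats(h)$stats

print(q)
print(five)
```

Running `boxplot(h, horizontal = TRUE)` in an interactive session would draw the diagram itself.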
33.
Statistics
Parameters and Statistics
A parameter is a number that describes a certain aspect of the population.
• For example, the (true) mean annual precipitation at a weather station
is a parameter. Let us suppose that this mean is
$$\mu_h = 980 \text{ mm}$$
• In any concrete situation the parameters are unknown.
34.
Statistics
Parameters and Statistics
A statistic is a number that can be calculated on the basis of the data
given by a sample, without any knowledge of the parameters of the
population.
• Let us suppose, for example, that the random sample of precipitation
data covers 30 years of measurement and that the mean annual
precipitation, on the basis of the sample, is
$$\bar{h} = 1002 \text{ mm}$$
• This mean is a statistic.
35.
Statistics
Other Statistics: the Range
$$R_x := \max(x) - \min(x)$$
The range is the simplest indicator of the spread of the data. It is an indicator of the
scale of the data. However, it considers only two data points and ignores
the other n-2 that make up the sample.
36.
Statistics
Other Statistics: Variance and
Standard Deviation
$$Var(x) := \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$\sigma_x := \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
The variance is an indicator of “scale” that considers all the data of the sample.
37.
Statistics
Other Statistics: Variance and
Standard Deviation
“Corrected” (unbiased) version
$$Var(x) := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$\sigma_x := \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
The unbiased version of the variance takes into account that only n-1 of the
deviations are independent, since their mean is fixed.
38.
Statistics
Coefficient of Variation
• The coefficient of variation (CV) of a data sample is defined as the
ratio between the standard deviation and the mean:
$$CV_x := \frac{\sigma_x}{\bar{x}}$$
• The greater the coefficient of variation, the less informative and
indicative the mean is of the future behaviour of the
population.
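The variance, standard deviation and coefficient of variation defined above can be sketched in R as follows, with invented data. One point worth knowing is that R's built-in `var()` and `sd()` use the divisor n - 1, so the version with divisor n has to be computed explicitly.

```r
# Hypothetical sample (illustrative values only)
x <- c(12.1, 11.4, 13.0, 10.8, 12.7)
n <- length(x)
x_bar <- mean(x)

# Variance with divisor n
var_n <- sum((x - x_bar)^2) / n

# "Corrected" variance with divisor n - 1; this is what var() returns
var_n1 <- sum((x - x_bar)^2) / (n - 1)
stopifnot(isTRUE(all.equal(var_n1, var(x))))

# Standard deviation (corrected version) and coefficient of variation
sd_n1 <- sqrt(var_n1)
cv <- sd_n1 / x_bar

print(c(var_n = var_n, var_n1 = var_n1, sd = sd_n1, cv = cv))
```

The two variances differ by the factor (n - 1)/n, which becomes negligible as the sample grows.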
39.
Statistics
Other Statistics: Skewness and Kurtosis
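$$sk_x := \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{\sigma_x} \right)^3$$
Skewness is a measure of the asymmetry of the data distribution.
$$k_x := -3 + \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{\sigma_x} \right)^4$$
Kurtosis is a measure of the “peakedness” of the data distribution. With the
constant -3, it is close to zero for normally distributed data.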
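A minimal sketch of these two statistics in R, computed by hand because base R has no built-in functions for them. Two choices here are assumptions rather than something the slides fix: the standard deviation uses the divisor n, and the kurtosis is the excess kurtosis (3 is subtracted, so that normally distributed data give a value near zero). The data are invented.

```r
# Hypothetical right-skewed sample (illustrative values only)
x <- c(2, 3, 3, 4, 4, 4, 5, 9)
n <- length(x)
x_bar <- mean(x)

# Standard deviation with divisor n (an assumption of this sketch)
sigma_x <- sqrt(sum((x - x_bar)^2) / n)

# Standardised deviations
z <- (x - x_bar) / sigma_x

# Skewness: mean of the cubed standardised deviations
sk <- sum(z^3) / n

# Excess kurtosis: mean of the fourth powers, minus 3
k_excess <- sum(z^4) / n - 3

print(c(skewness = sk, excess_kurtosis = k_excess))
```

The long right tail (the value 9) makes the skewness positive.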
40.
Statistics
Estimation and Hypothesis Testing
Usually, we are not interested in the statistics for their own sake, but in
what the statistics tell us about the population of interest.
• We could, for example, use the mean annual precipitation, measured
at all hydro-meteorological stations, to estimate the mean annual
precipitation for the Italian Peninsula.
• Or, we could use the mean of the sample to establish whether the
mean annual precipitation has changed over the period covered by the
sample.
41.
Statistics
Estimation and Hypothesis Testing
These two questions belong to the two main branches of classical
statistical inference:
• The estimation of parameters
• Statistical hypothesis testing
42.
Statistics
Sample Variability
A fundamental aspect of sample statistics is that they vary from one
sample to the next. In the case of annual precipitation, it is very
improbable that the mean of the sample, 1002 mm, will coincide
with the mean of the population.
• The variability of a sample statistic from sample to sample is called
sample variability.
– When sample variability is very high, the sample is
a poor guide to the population parameter.
– When sample variability is small, the statistic is informative,
even though it is practically impossible for the statistic of a
sample to be exactly equal to the population parameter.
43.
Statistical Inference and Descriptive Statistics
Sample Variability
Simulation
Sample variability will be illustrated as follows:
1. we will consider a discrete variable that can assume only a small
number of possible values (N = 4);
2. all possible samples of size n = 2 will be listed;
3. the mean will be calculated for each possible sample of size n = 2;
4. the distribution of the means of the samples of size n = 2 will be
examined.
The mean $\mu$ and the variance $\sigma^2$ of the population will be calculated. Note
that $\mu$ and $\sigma^2$ are parameters, while the mean $\bar{x}_i$ and the
variance $s_i^2$ of each sample are statistics.
44.
Statistical Inference and Descriptive Statistics
Sample Variability
• The experiment in this example consists of n = 2 draws with
replacement of a marble $x_i$ from an urn that contains N = 4 marbles.
• The marbles are numbered as follows: {2, 3, 5, 9}
• Drawing with replacement corresponds to a population of
infinite size (it is in fact always possible to draw a marble from the urn).
45.
Statistical Inference and Descriptive Statistics
Sample Variability
• For each sample of size n = 2, the mean of the values of the marbles
drawn is calculated:
$$\bar{x} = \frac{\sum_{i=1}^{2} x_i}{2}$$
• For example, if the marbles drawn are $x_1 = 2$ and $x_2 = 3$, then:
$$\bar{x} = \frac{2 + 3}{2} = \frac{5}{2} = 2.5$$
46.
Statistical Inference and Descriptive Statistics
Sample Variability
Three Distributions
We must distinguish between three distributions:
1. the population distribution
2. the distribution of a sample
3. the sample distribution of the means of all possible samples
47.
Statistical Inference and Descriptive Statistics
Sample Variability
1. The Population Distribution
The population distribution is the distribution of X (the value of the
marble drawn) in the population. In this specific case the population
is of infinite size and has the following probability distribution:
x_i      p_i
2        1/4
3        1/4
5        1/4
9        1/4
Total    1
48.
Statistical Inference and Descriptive Statistics
Sample Variability
• The mean of the population is:
$$\mu = \sum x_i p_i = 4.75$$
• The variance of the population is:
$$\sigma^2 = \sum (x_i - \mu)^2 p_i = 7.1875$$
49.
Statistical Inference and Descriptive Statistics
Sample Variability
2. The Distribution of a Sample
The distribution of a sample is the distribution of X in a specific sample.
• If, for example, $x_1 = 2$ and $x_2 = 3$, then the mean of this
sample is $\bar{x} = 2.5$ and the variance is $s^2 = 0.5$.
50.
Statistical Inference and Descriptive Statistics
Sample Variability
3. The Sample Distribution of the Means
The sample distribution of the means is the distribution of the means
of all the possible samples.
• If the size of the samples is n = 2, then there are 4 × 4 = 16 possible
samples. We can therefore list their means.
sample    mean    sample    mean
{3, 2}    2.5     {2, 3}    2.5
{5, 2}    3.5     {2, 5}    3.5
{9, 2}    5.5     {2, 9}    5.5
{5, 3}    4.0     {3, 5}    4.0
{9, 3}    6.0     {3, 9}    6.0
{9, 5}    7.0     {5, 9}    7.0
{2, 2}    2.0     {3, 3}    3.0
{5, 5}    5.0     {9, 9}    9.0
51.
Statistical Inference and Descriptive Statistics
Sample Variability
• The sample distribution of the means has the following probability
distribution:
mean     p_i
2.0      1/16
2.5      2/16
3.0      1/16
3.5      2/16
4.0      2/16
5.0      1/16
5.5      2/16
6.0      2/16
7.0      2/16
9.0      1/16
Total    1
52.
Statistical Inference and Descriptive Statistics
Sample Variability
• The mean of the sample distribution of the means is:
$$\mu_{\bar{x}} = \sum \bar{x}_i p_i = 4.75$$
• The variance of the sample distribution of the means is:
$$\sigma_{\bar{x}}^2 = \sum (\bar{x}_i - \mu_{\bar{x}})^2 p_i = 3.59375$$
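The mean and variance of the distribution of the sample means can be checked by brute-force enumeration. The sketch below, which is an illustration and not part of the original slides, lists all 16 ordered samples of size 2 drawn with replacement from {2, 3, 5, 9} using `expand.grid()`, then computes the two quantities with each sample weighted 1/16.

```r
# Population of marble values from the example
X <- c(2, 3, 5, 9)

# All ordered samples of size 2 drawn with replacement: 4 x 4 = 16 rows
samples <- expand.grid(first = X, second = X)

# Mean of each sample
means <- rowMeans(samples)

# Probability distribution of the sample means
distr <- table(means) / nrow(samples)

# Mean and variance of the distribution of the means (each sample has weight 1/16)
mu_xbar <- mean(means)
var_xbar <- mean((means - mu_xbar)^2)

print(distr)
print(c(mu_xbar = mu_xbar, var_xbar = var_xbar))
```

Note the use of `mean((means - mu_xbar)^2)` rather than `var(means)`: the 16 samples are the whole set of outcomes, so the divisor is 16 and not 15.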
53.
Statistical Inference and Descriptive Statistics
Sample Variability
The example we have seen is very particular in that the
population is known. In practice the population distribution is never
known.
However, we can take note of two important properties of the sample
distribution of the means:
• The mean of the sample distribution of the means, $\mu_{\bar{x}}$, is the same as the
population mean $\mu$.
• The variance of the sample distribution of the means, $\sigma_{\bar{x}}^2$, is equal to
the ratio of the variance of the population, $\sigma^2$, to the size n of
the sample:
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{7.1875}{2} = 3.59375$$
54.
Statistical Inference and Descriptive Statistics
Sample Variability
The two things to note can be summarised as follows:
• The mean and variance of the sample distribution of the means are
determined by the mean and variance of the population:
$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$
• For n > 1, the variance of the sample distribution of the means is smaller than
the variance of the population.
55.
Statistical Inference and Descriptive Statistics
Sample Variability
In what follows, we will use the properties of the sample distribution to
make inferences about the parameters of the population even when
the population distribution is not known.
56.
Statistical Inference and Descriptive Statistics
Sample Variability
Three Distributions
Therefore, we have distinguished between three distributions:
1. the population distribution
$$\Omega = \{2, 3, 5, 9\}, \qquad \mu = 4.75, \qquad \sigma^2 = 7.1875$$
2. the distribution of a sample
$$\Omega_i = \{2, 3\}, \qquad \bar{x} = 2.5, \qquad s^2 = 0.5$$
3. the sample distribution of the means of all possible samples
$$\Omega_{\bar{x}} = \{2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 5.5, 6.0, 7.0, 9.0\}, \qquad \mu_{\bar{x}} = 4.75, \qquad \sigma_{\bar{x}}^2 = 3.59375$$
57.
Statistical Inference and Descriptive Statistics
Sample Variability
1. The population distribution: this is the distribution that contains all
possible observations. The mean and variance of this distribution are
denoted by $\mu$ and $\sigma^2$.
2. The distribution of a sample: this is the distribution of the values of
the population that make up a particular random sample of size n. The
individual values are denoted by $x_1, \ldots, x_n$, and the mean and variance
by $\bar{x}$ and $s^2$.
3. The sample distribution of the means of the samples: this is the
distribution of the $\bar{x}_i$ for all the possible samples of size n that can be
drawn from the population being considered. The mean and variance
of the sample distribution of the means are denoted by $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}^2$.
58.
Statistical Inference and Descriptive Statistics
Sample Variability
The distribution that is the basis of statistical inference is the sample
distribution.
Definition: the sample distribution of a statistic is the distribution of
the values that the statistic assumes over all samples of size n that
can be drawn from the population.
Note that if a simulation considers fewer samples than all
those theoretically possible, then the resulting distribution will be only
an approximation of the true sample distribution.
59.
Statistical Inference and Descriptive Statistics
Estimation and Hypothesis Testing
Having computed various statistics, we can now formulate some hypotheses. For
example:
• Do the samples all have the same mean and the same variance?
• Does the mean depend on the size of the sample?
• Does the variance depend on the size of the sample?
60.
Statistical Inference and Descriptive Statistics
Estimation and Hypothesis Testing
If the samples do not have the same mean, a trend may be present.
61.
Statistical Inference and Descriptive Statistics
Estimation and Hypothesis Testing
The variance can vary with the size of the sample!
If it does not stabilise as the number of data in the sample increases, then the data
are said to have “Infinite Variance Syndrome”.
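The behaviour described above can be explored with a small numerical experiment. The sketch below is an illustration under stated assumptions, not something taken from the slides: it compares normally distributed data, whose variance is finite, with data from a Cauchy distribution, chosen here as a textbook case of a distribution without a finite variance. The seed and the sample sizes are arbitrary.

```r
set.seed(1)                      # arbitrary seed, for reproducibility
n_max <- 10000

x_normal <- rnorm(n_max)         # standard normal: variance equal to 1
x_cauchy <- rcauchy(n_max)       # Cauchy: the variance is not finite

# Sample variance computed on growing portions of each series
sizes <- c(10, 100, 1000, 10000)
var_normal <- sapply(sizes, function(k) var(x_normal[1:k]))
var_cauchy <- sapply(sizes, function(k) var(x_cauchy[1:k]))

print(data.frame(size = sizes, normal = var_normal, cauchy = var_cauchy))
```

For the normal data the sample variance settles near 1 as the size grows, whereas for the Cauchy data it is typically dominated by a few extreme values and does not settle.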
62.
Statistical Inference and Descriptive Statistics
Null Hypothesis
We will have a chance to look at hypothesis testing in detail in future
lectures. However, it is worth remembering the following:
• Generally, it is not possible to prove anything definitively. One can
only attempt to show that a hypothesis is not true.
• Let H0 be the (null) hypothesis to be tested. If H0 cannot be rejected,
then one can affirm that “it is true” only with a certain degree of confidence.
63.
Statistical Inference and Descriptive Statistics
Other Statistics: Covariance
Given two datasets, for example:
$$h_i = \{h_1, \cdots, h_n\} \quad \text{and} \quad l_i = \{l_1, \cdots, l_n\}$$
the covariance between these two datasets is defined as:
$$Cov(h_i, l_i) := \frac{1}{n-1} \sum_{i=1}^{n} (l_i - \bar{l})(h_i - \bar{h})$$
64.
Statistical Inference and Descriptive Statistics
Other Statistics: Correlation
Given two datasets, for example:
$$h_i = \{h_1, \cdots, h_n\} \quad \text{and} \quad l_i = \{l_1, \cdots, l_n\}$$
the correlation between these two datasets is defined as:
$$\rho_{lh} := \frac{Cov(l, h)}{\sigma_h \sigma_l}$$
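As a minimal sketch, the covariance and correlation can be computed from their definitions in R and checked against the built-in `cov()` and `cor()`, both of which use the divisor n - 1. The two series are invented for illustration.

```r
# Two hypothetical paired series of equal length (illustrative only)
h <- c(10, 14, 9, 20, 17)
l <- c(3.1, 4.0, 2.8, 6.2, 5.1)
n <- length(h)

# Covariance from the definition, with divisor n - 1
cov_hl <- sum((l - mean(l)) * (h - mean(h))) / (n - 1)

# Correlation: covariance divided by the product of the standard deviations
rho_hl <- cov_hl / (sd(h) * sd(l))

# The built-in functions should agree
stopifnot(isTRUE(all.equal(cov_hl, cov(h, l))))
stopifnot(isTRUE(all.equal(rho_hl, cor(h, l))))

print(c(covariance = cov_hl, correlation = rho_hl))
```

The correlation is dimensionless and lies between -1 and 1, which makes it easier to compare across pairs of series than the covariance.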
65.
Statistical Inference and Descriptive Statistics
Other Statistics: Correlation
Please observe that one can consider the correlation between two sample
series of equal length, the second being the first shifted by one instant:
$$h_i = \{h_1, \cdots, h_{n-1}\} \quad \text{and} \quad h_{i+1} = \{h_2, \cdots, h_n\}$$
Resulting in:
$$Cov(h_i, h_{i+1}) := \frac{1}{N-1} \sum_{i=1}^{n-1} (h_i - \bar{h}_i)(h_{i+1} - \bar{h}_{i+1})$$
where N denotes the number of pairs in the sum.
66.
Statistical Inference and Descriptive Statistics
Other Statistics: Correlation
Repeating this operation for series which are gradually reduced in
length and separated by r instants, the resulting series are:
$$h_i^r = \{h_1, \cdots, h_{n-r}\} \quad \text{and} \quad h_{i+r} = \{h_{1+r}, \cdots, h_n\}$$
Hence:
$$Cov(h_i^r, h_{i+r}) := \frac{1}{N-1} \sum_{i=1}^{n-r} (h_i^r - \bar{h}_i^r)(h_{i+r} - \bar{h}_{i+r})$$
$$\rho(h_i^r, h_{i+r}) := \frac{Cov(h_i^r, h_{i+r})}{\sigma_i^r \, \sigma_{i+r}}$$
where N denotes the number of pairs in the sum.
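The lagged correlation above can be sketched in R by pairing the series with a copy of itself shifted by r instants. The function name `lag_correlation` and the data are hypothetical, introduced only for this illustration. R's built-in `acf()` uses a slightly different estimator (a single overall mean and variance for the whole series), so its values will generally not match this sketch exactly.

```r
# Correlation between a series and itself shifted by r instants
# (lag_correlation is a hypothetical helper, not a base R function)
lag_correlation <- function(h, r) {
  n <- length(h)
  stopifnot(r >= 0, r < n - 1)
  head_series <- h[1:(n - r)]      # h_1, ..., h_{n-r}
  tail_series <- h[(1 + r):n]      # h_{1+r}, ..., h_n
  cor(head_series, tail_series)
}

# Hypothetical time series (illustrative values only)
h <- c(5, 7, 6, 9, 12, 10, 8, 11, 13, 9)

# Correlation at lags 0 to 3
rho <- sapply(0:3, function(r) lag_correlation(h, r))
print(data.frame(lag = 0:3, rho = rho))
```

Plotting `rho` against the lag gives an empirical autocorrelation function of the kind shown in the next slide.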
67.
Statistical Inference and Descriptive Statistics
Other Statistics: Autocorrelation
68.
Statistical Inference and Descriptive Statistics
Random Sampling
In the strategy of creating and analysing data samples, the selection (or,
sometimes, the generation) of random samples plays an important role.
A sample of n events, selected from a population, is random if the probability
of that sample being selected is the same as that of any other sample of the same size.
If the data are generated, then one is carrying out a random experiment. Some
examples of this are:
• tossing a coin;
• counting the rainy days in a year; and
• counting the days when the river flow at the San Lorenzo Bridge, Trento, is
greater than a predetermined value.
69.
Statistical Inference and Descriptive Statistics
Sample Variability
Simulation 2
Let us consider another example where sample variability is illustrated as
follows:
1. the same population as in the previous example will be used (N = 4);
2. by means of the computer programme R, 50,000 samples of size n = 2 will be
drawn, with replacement, from the population;
3. the mean will be calculated for each of these samples of size n = 2;
4. the mean and variance of the distribution of the means of the 50,000
samples of size n = 2 will be calculated.
73.
Statistical Inference and Descriptive Statistics
Sample Variability
Simulation 2
N <- 4
n <- 2
nSamples <- 50000
X <- c(2, 3, 5, 9)
# Mean and variance of the population
# (var() divides by N - 1, so it is rescaled by (N - 1)/N)
Mean <- mean(X)
Var <- var(X) * (N - 1) / N
SampDistr <- rep(0, nSamples)
# 50,000 samples are drawn with replacement
for (i in 1:nSamples) {
  samp <- sample(X, n, replace = TRUE)
  SampDistr[i] <- mean(samp)
}
# Mean and variance of the simulated distribution of the means
MeanSampDistr <- mean(SampDistr)
VarSampDistr <- var(SampDistr) * (nSamples - 1) / nSamples
74.
Statistical Inference and Descriptive Statistics
Sample Variability
Simulation 2
Results of the analysis with R:
Mean
[1] 4.75
Var
[1] 7.1875
MeanSampDistr
[1] 4.73943
VarSampDistr
[1] 3.578548
Var/n
[1] 3.59375
75.
Statistical Inference and Descriptive Statistics
Sample Variability
• Population:
$$\mu = 4.75, \qquad \sigma^2 = 7.1875$$
• Sample distribution of the means:
$$\mu_{\bar{x}} = 4.75, \qquad \sigma_{\bar{x}}^2 = 3.59375$$
• Results of the R simulation:
$$\hat{\mu}_{\bar{x}} = 4.73943, \qquad \hat{\sigma}_{\bar{x}}^2 = 3.578548$$
76.
Measurement and Representation of Hydrological Quantities
Thank you for your attention!
G. Ulrici - Man after having worked on the slides, 2000?