Environmental Data Analysis:
Probability Distribution and
Statistical Inference
Vitor Vieira Vasconcelos
João Marcelo Borovina Josko
Federal University of ABC (UFABC)
São Bernardo do Campo-SP, Brazil
October 2024
What We Will Study in Today's Class
• Populations and Samples
• Normal Curve
• Frequency and Probability Distributions
• Standard scores
• Calculating Probability Under the Normal Curve
• Central Limit Theorem and Standard Error
• Confidence Interval
• Practice in R
As researchers, we are interested in
investigating questions that apply to a
whole population of people or things
• The population can be general (all humans) or small
(all the rabbits in a field)
• We rarely have access to data from the entire
population, but only from a subset, a sample,
which we use to infer things about the entire
population
Populations & Samples
• The larger the sample, the more likely it is to reflect
the entire population
• Random samples from the same population may
give slightly different results
• On average, results from large samples should be
quite similar
Populations & Samples
• POPULATION: the set of all the elements or
results under investigation
• SAMPLE: any subset of the population
• PARAMETER: a numerical measure that
describes a population
• STATISTIC: a numerical measure that
describes a sample
• ESTIMATOR: a sample statistic used to
approximate a population parameter
Concepts
Source: Slides Prof. Marcos Pó, UFABC
Scientific method for drawing conclusions about
population parameters from the collection,
treatment, and analysis of data from a sample
collected from that population
Statistical Inference
Statistical Inference boils
down to an equation...
Output_i = Model_i + error_i
That is, the data we observe can be predicted by
the model we choose to fit to the data, plus an error term
Frequency Distributions
HISTOGRAM: Graph with the values observed on the
horizontal axis, with bars showing how many times
each value occurred in the dataset
Useful for evaluating the properties of a set of values
MODE: the most frequently occurring score in the dataset
(Figure: histogram, Frequency on the vertical axis, Values on the horizontal axis)
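A minimal sketch in R of how such a histogram can be built; the variable name and the simulated values are illustrative assumptions, not data from the slides.

```r
# Illustrative only: simulate 200 hypothetical soil pH measurements
set.seed(42)
soil_ph <- rnorm(200, mean = 6.5, sd = 0.5)

# Histogram: observed values on the horizontal axis,
# bar heights show how often each range of values occurred
hist(soil_ph,
     breaks = 20,
     main = "Histogram of soil pH (simulated)",
     xlab = "Values (pH)",
     ylab = "Frequency")

# The mode is the most frequent value; for binned data we can take
# the midpoint of the tallest bar
h <- hist(soil_ph, breaks = 20, plot = FALSE)
h$mids[which.max(h$counts)]
```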
Cumulative Distribution Function
https://statisticsbyjim.com/probability/cumulative-distribution-function-cdf/
Median
Cumulative Frequency Distribution
https://statisticsbyjim.com/probability/cumulative-distribution-function-cdf/
Cumulative Frequency Distributions
Wang, J., Lai, S., Ke, Z., Zhang, Y., Yin, S.,
& Zheng, J. (2014). Exposure assessment,
chemical characterization and source
identification of PM 2.5 for school children
and industrial downwind residents in
Guangzhou, China. Environmental
geochemistry and health, 36, 385-397.
Histograms and cumulative
frequency curves of the daily
average potential exposure
doses of PM 2.5 for the school
children at the school (a) and
the industrial downwind
residents in the community (b)
Normal Curve
Most of the scores are around the center of the distribution.
As we move away from the center (average), the frequency of
the scores decreases.
(Figure: normal curve, Frequency vs. Values)
Examples of normal distribution in
Environmental Sciences
• Daily air, surface or water temperature in a location
• Noise in environmental sensors or measurements
• Soil pH levels (or another environmental characteristic) in a homogeneous region
• Species body size
• Noise level in urban environments
• Daily or monthly averaged wind speeds
• Canopy height in mature forests
Properties of Frequency Distributions
A distribution can deviate from a normal distribution in 2 main ways:
(1) Lack of symmetry: ASYMMETRY (SKEWNESS)
(2) Flattening: KURTOSIS
(Figures: positively and negatively asymmetrical distributions; leptokurtic and platykurtic distributions, with lower and greater standard deviation; axes show Frequency vs. Values)
Pearson's coefficient of skewness
Sk = 3 (X̄ - Md) / s
where
X̄ = the mean,
Md = the median, and
s = the standard deviation for the sample
Fisher's moment coefficient of skewness
γ1 = E[((X - μ) / σ)^3]   (the third standardized central moment)
where
X = the random variable
μ = the mean
σ = the standard deviation for the sample
E = the expectation operator
Central Tendency Measures vs. Frequency Distributions
MODE (Mo): most frequent value in a distribution
MEDIAN (Me): measure that separates the distribution into two equal parts
MEAN (X̄): sum of a set of scores divided by the total number of scores in the set
(Figure: negatively asymmetrical and positively asymmetrical distributions)
Kurtosis
Leptokurtic Mesokurtic Platykurtic
Tomy, L., Chesneau, C., & Madhav, A. K. (2021). Statistical Techniques for Environmental
Sciences: A Review. Mathematical and Computational Applications, 26(4), 74.
Pearson's measure of kurtosis
• Fourth standardized moment (it is never smaller than the squared skewness plus 1)
• Calculated based on the extremity of the deviations (outliers), not on data near the mean
• Kurtosis is not "peakedness"
• Kurtosis is "tailedness"
Kurt(X) = E[((X - μ) / σ)^4]
where
X = the random variable
μ = the mean
σ = the standard deviation for the sample
E = the expectation operator
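A small R sketch (illustrative simulated data, not from the slides) computing these quantities directly from their definitions, so the formulas above can be checked numerically.

```r
set.seed(1)
x <- rlnorm(1000)          # a right-skewed example variable

m <- mean(x)
s <- sd(x)

# Pearson's (second) skewness coefficient: 3 * (mean - median) / s
pearson_skew <- 3 * (m - median(x)) / s

# Fisher's moment coefficient of skewness: E[((X - mu)/sigma)^3]
fisher_skew <- mean(((x - m) / s)^3)

# Pearson's kurtosis (fourth standardized moment): E[((X - mu)/sigma)^4]
kurt <- mean(((x - m) / s)^4)

c(pearson_skew, fisher_skew, kurt)
```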
Normal Curve
Symmetric: mean, median and mode coincide!
Neither leptokurtic nor platykurtic: mesokurtic
From the central peak, the curve gradually drops at both ends, getting
closer and closer to the baseline (horizontal axis) without ever touching it
(Figure: normal curve, Frequency vs. Values)
C’mon guys! Can’t
we behave normally?
Probability Distributions
Frequency distributions can be used to get a rough idea of the
probability of a score (or interval of scores) occurring
Considering that the number of fish offspring per birth of a
certain species follows a normal distribution, what would be the
probability of having, in a birth, 4 offspring or fewer?
PROBABILITY:
AN IMPORTANT NOTION FOR DECISION MAKING!!
Probability Distributions
To facilitate our work, statisticians
have devised a mathematical form
that specifies idealized versions of
the distributions:
PROBABILITY DISTRIBUTIONS
Probability Distributions
The probability distribution associates a probability
with each numerical outcome of an experiment, that
is, it gives the probability of each value (or range of
values) of a random variable.
• It is analogous to a frequency distribution, except that it is
based on theory rather than empirical data (real-world
observations)
• The probabilities represent the chance of each score
occurring, directly analogous to the percentages in a
frequency distribution.
You cannot prove anything from
samples or by using statistical
inference
You always estimate a probability
The normal curve as a probability
distribution
• The normal curve is a theoretical ideal
• However, there are many actual data distributions that
approximate the shape of the normal curve
It is always important to check!!!
Building a histogram is a good start!
Normal distribution
μ = the mean or expectation of the distribution
(and also its median and mode)
σ² = variance
σ (sigma) = standard deviation
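For reference, the normal probability density function defined by these parameters is:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$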
The normal curve as a probability
distribution
The normal curve as a probability
distribution
(Figure: normal curve of offspring per birth, mean = 2.6, s = 1.14, with marks at 1.46 and 3.74, one standard deviation below and above the mean)
Interactive Practice
https://distribution-explorer.github.io/continuous/normal.html
Returning to our question:
Considering that the number of fish offspring per birth of a
certain species follows a normal distribution, what would be the
probability of having, in a birth, 4 offspring or fewer?
(Figure: normal curve of offspring per birth, mean = 2.6, s = 1.14, with the value 4 marked toward the upper tail)
Standard Normal Distribution
• Statisticians have already calculated the probability of certain scores
occurring in a normal distribution with
Mean = 0 & Standard Deviation = 1:
the STANDARD NORMAL DISTRIBUTION
Standard Normal Distribution
BUT... The distribution of my data does not show
mean = zero and standard deviation = 1!
So????
ANY DATASET CAN BE CONVERTED TO A DATASET
THAT HAS ZERO MEAN AND STANDARD DEVIATION 1!
YEAH!!!!
How to do it:
(1) To center the data at zero, we take each score and
subtract the mean of all the scores from it.
(2) We divide the resulting score by the standard
deviation, so that the results have SD = 1.
Z score: z = (X - X̄) / s
Standard Normal Distribution
Returning to our question:
Considering that the number of fish offspring per birth of a
certain species follows a normal distribution, what would be the
probability of having, in a birth, 4 offspring or fewer?
Considering that the distribution of the data can be described as a normal
distribution, with mean = 2.6 and standard deviation = 1.14:
Z score: convert the value 4 into a z-score
z = (4 - 2.6) / 1.14 = 1.23
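In R, the cumulative normal distribution function pnorm() gives this probability directly; a minimal sketch using the values from the slide:

```r
# P(X <= 4) for X ~ Normal(mean = 2.6, sd = 1.14)

# Option 1: standardize first, then use the standard normal CDF
z <- (4 - 2.6) / 1.14           # ~ 1.23
pnorm(z)                        # ~ 0.89

# Option 2: pass the mean and sd directly
pnorm(4, mean = 2.6, sd = 1.14) # same result, ~ 0.89
```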
Standard Normal Distribution
(Figure: standard normal curve with the z-score 1.23 marked)
Some z-scores are cutoff points that highlight important points of the distribution.
z = 1.96 and z = -1.96 separate the top and bottom 2.5% of scores.
That is, 95% of the scores are between -1.96 and 1.96.
Some z-scores are cutoff points that highlight important points of the distribution:
• 99% of the scores are between z = -2.58 and z = +2.58
• 99.9% of the scores are between z = -3.29 and z = +3.29
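These cutoffs can be verified in R with qnorm(), the inverse of the standard normal CDF (a quick sketch):

```r
# Two-tailed cutoffs of the standard normal distribution:
# for 95%, 2.5% remains in each tail, so we ask for the 97.5% quantile
qnorm(0.975)   # ~ 1.96  -> 95% of scores between -1.96 and 1.96
qnorm(0.995)   # ~ 2.58  -> 99% of scores between -2.58 and 2.58
qnorm(0.9995)  # ~ 3.29  -> 99.9% of scores between -3.29 and 3.29

# And the reverse check with pnorm():
pnorm(1.96) - pnorm(-1.96)   # ~ 0.95
```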
Practice in R
Is my sample
representative of my
population?
Conventions:
μ = population mean
X̄ = sample mean
We use samples to estimate the
behavior/characteristics of a population.
For example, we use the sample mean
(X̄) to estimate the population mean (μ).
If we take many samples from the same
population, each sample will have its
own mean, and in several of these
samples the means will be different.
Is my sample
representative of my
population?
We can construct a frequency
distribution with the means of these
samples:
the SAMPLING DISTRIBUTION
Frequency distribution of the means of
all samples of the same population. It is
centered on the same value as the
population mean.
SAMPLING DISTRIBUTION
Conventions:
μ = population mean
X̄ = sample mean
σ = standard deviation of the population
s = standard deviation of the sample
Characteristics of a sampling distribution
SAMPLING DISTRIBUTION
1. It approximates a normal curve
(provided the sample size is reasonably large, N > 30).
2. The mean of a sampling distribution (the
mean of the means) is equal to the true
population mean (μ).
3. The standard deviation of a sampling
distribution (σX̄) is smaller than that of the
population (σ). The sample mean is more
stable than the scores that compose it.
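A short R simulation illustrating these three characteristics; the population here is a hypothetical, skewed one, chosen only for illustration.

```r
set.seed(123)
population <- rexp(100000, rate = 1/5)   # skewed population, mean ~ 5, sd ~ 5

# Draw 1000 samples of size N = 50 and store each sample mean
sample_means <- replicate(1000, mean(sample(population, size = 50)))

hist(sample_means, breaks = 30,
     main = "Sampling distribution of the mean",
     xlab = "Sample mean")   # 1. roughly normal, even though the population is skewed

mean(sample_means)           # 2. close to the population mean (~ 5)
sd(sample_means)             # 3. much smaller than the population sd (~ 5 / sqrt(50))
```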
DURATION OF INTER-MUNICIPAL PHONE CALLS
(Figure panels: distribution of the population; distribution of one sample (N = 200); sampling distribution of the mean for 100 samples; theoretical sampling distribution for infinite samples, a normal curve)
Standard Error of the
Mean
SAMPLING
DISTRIBUTION
STANDARD ERROR
It measures the variability between the means
of different samples.
Strictly, it is the standard deviation of the population
divided by the square root of the sample size; however, for large
samples, estimating it from the sample is a reasonable approximation.
STANDARD ERROR OF THE MEAN (σX̄)
Standard deviation of the sample means.
A measure of how representative the
sample may be of the population.
In reality, we cannot select hundreds of
samples to construct a sampling
distribution.
Technique for estimating the standard error
from the standard deviation of the sample (s):
divide s by the square root of the sample size (N):
SE = σX̄ ≈ s / √N
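A minimal R sketch of this estimate; the sample values are simulated for illustration, not taken from the slides.

```r
# Hypothetical sample of 36 measurements
set.seed(7)
x <- rnorm(36, mean = 20, sd = 6)

n  <- length(x)
s  <- sd(x)            # standard deviation of the sample
se <- s / sqrt(n)      # estimated standard error of the mean

c(sample_sd = s, standard_error = se)
```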
Oh! How much did I miss?
Standard Error of the Mean
RECAP:
• We use the sample mean as an estimate of the value of the
population mean.
• Different samples give values that differ from the population
mean.
• The Standard Error can be used to get an idea of the
difference between the sample mean and the population
mean.
• The Standard Error can be estimated; it is higher when the
standard deviation of the population is greater (in the
absence of the population standard deviation, we use
the standard deviation of the sample), and smaller when the
sample size is larger.
Interactive Practice
https://istats.shinyapps.io/sampdist_cont/
Try with different number of samples, sample size, and
different datasets (real and simulated examples).
Interactive practice
https://stats.libretexts.org/Learning_Objects/02%3A_Interactive_Statistics/15%3A_Discover_the_Central_Limit_Theorem_Activity
• Visualize the sample means with different sample sizes.
• What happens when you increase the sample size?
• Click on the "New Dist" button and try the same
procedure with other distribution functions.
Standard Error of the Mean
In addition to providing us with an idea of the
difference between the sample mean (X̄) and the
population mean (μ)...
• With the help of the Standard Error of the Mean, we
can estimate the probability that our population
mean lies within a given range of mean values.
Concept of CONFIDENCE INTERVAL
Confidence Intervals
An approach to determining the accuracy of
the Sample Mean:
Calculate the thresholds between which we believe
the value of the true mean will be
CONFIDENCE INTERVAL
Range of values (thresholds) between which we
think the population value (parameter) will be
(in this case, the value of the true average)
Point Estimate Confidence Interval
https://moderndive.com/8-confidence-intervals.html
Ismay, C.; Kim, A. Y. 2019. Statistical Inference via Data Science: A ModernDive into
R and the Tidyverse. Chapman & Hall / CRC, The R Series.
Which one has a higher chance of getting the fish?
Confidence Intervals
A 95% confidence interval (CI)
How do I interpret it???
If we select 100 samples, calculate the mean of each one, and
then determine the confidence interval for each of those
means, about 95% of those confidence intervals will
contain the actual value of the population mean.
OK! Now let's
see how the CI
is calculated...
SAMPLING
DISTRIBUTION
OF MEANS
THE MEAN OF OUR
SAMPLE IS
SOMEWHERE IN THE
DISTRIBUTION
Confidence Intervals
Remember why the value 1.96 is an important z-value?
Also remember how we can convert scores into z-scores:
z = (X - X̄) / s
And 2.58? And 3.29?
Because 95% of z-scores are between -1.96 and 1.96!
(Figure: standard normal curve with cutoffs at -1.96 and 1.96)
Confidence Intervals
If we know that our limits will be -1.96 and 1.96, in z-
scores, what are the corresponding scores in values
of our data?
[It is the inverse of how to calculate the Z score]
To find this, let's put z back into the z-score equation and solve for the raw value:
Lower threshold of the confidence interval = X̄ - (1.96 * SE)
Upper threshold of the confidence interval = X̄ + (1.96 * SE)
We use the Standard Error rather than the Standard Deviation because we are interested in the variability of sample means rather than the variability of within-sample observations.
Confidence Intervals
In general: confidence interval = X̄ ± (z * SE), where z depends on the confidence level (1.96 for 95%, 2.58 for 99%, 3.29 for 99.9%)
Example – 95% CI
Let's say we have collected data on the
emissions of CO2 (million tons per year) per
country in the world. We have a sample of
100 countries (N=100), with mean = 3800 and
standard deviation (s) = 1500.
Standard Error (SE) calculation:
SE = s / √N = 1500 / √100 = 150
Example – 95% CI
Let's say we have collected data on the emissions of CO2 (million
tons per year) per country in the world. We have a random sample
of 100 countries (N = 100), with mean = 3800 and
standard deviation (s) = 1500.
Lower threshold of the confidence interval =
3800 - (1.96 * 150) = 3506
Upper threshold of the confidence interval =
3800 + (1.96 * 150) = 4094
Example – 95% CI
Let's say we have collected data on the emissions of CO2 (million
tons per year) per country in the world. We have a random sample
of 100 countries (N=100), with mean = 3800 and
standard deviation (s) = 1500.
Lower threshold of the confidence interval = 3800 - (1.96 * 150) = 3506
Upper threshold of the confidence interval = 3800 + (1.96 * 150) = 4094
Considering that 95% of the confidence intervals contain the
population mean, we can say that this interval between 3506 and
4094 has a 95% chance of containing the real average emission of
CO2 per country.
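The same calculation written as a short R sketch, using only the summary numbers given in this example (mean, s and N as reported on the slide):

```r
x_bar <- 3800   # sample mean (million tons of CO2 per year)
s     <- 1500   # sample standard deviation
n     <- 100    # sample size

se <- s / sqrt(n)            # 150

lower <- x_bar - 1.96 * se   # 3506
upper <- x_bar + 1.96 * se   # 4094
c(lower, upper)
```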
Interactive Practice
https://stats.libretexts.org/Learning_Objects/02%3A_Interactive_Statistics/22%3A_Intera
ctively_Observe_the_Effect_of_Changing_the_Confidence_Level_and_the_Sample_Size
• Try different sample sizes and confidence levels.
• What happens when you increase or decrease the sample size?
• What happens when you increase or decrease the confidence level?
More Accurate Confidence Intervals
For small samples, where s is a less reliable estimate of σ, we
should build our confidence interval a little differently.
Instead of using 1.96 (z-score), we use a slightly higher value to
reflect our reduction in confidence. This value is based on the t
distribution.
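A brief R sketch of this adjustment: qt() gives the t quantile for the appropriate degrees of freedom. The sample values below are hypothetical, used only to illustrate the calculation.

```r
# Hypothetical small sample (N = 12), e.g., stream nitrate concentrations
x <- c(2.1, 1.8, 2.5, 2.0, 1.6, 2.3, 2.7, 1.9, 2.2, 2.4, 1.7, 2.0)

n     <- length(x)
se    <- sd(x) / sqrt(n)
tcrit <- qt(0.975, df = n - 1)   # > 1.96 because the sample is small

c(lower = mean(x) - tcrit * se,
  upper = mean(x) + tcrit * se)

# For comparison, t.test() computes the same 95% interval
t.test(x)$conf.int
```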
Degrees of freedom
“The number of values in a final calculation of a statistic that are free to
vary”
“The number of independent pieces of information that go into the
estimate of a parameter”.
“The number of independent scores that go into the estimate minus the
number of parameters used as intermediate steps (restrictions) in the
estimation of the parameter itself”
“The number of unique pieces of information that are used as input into
the analysis, called knowns, minus the number of parameters that are
uniquely estimated, called unknowns.”
“The number of dimensions of the domain of a random vector, or
essentially the number of "free" components (how many components
need to be known before the vector is fully determined)”
“Summarizes the flexibility of a model”
“The denominator of a variance estimate”
Degrees of freedom
VARIANCE
"mean of the square of deviations"
Estimation of the population variance using n random samples x_i, where i = 1, 2, ..., n:
s² = Σ (x_i - X̄)² / (n - 1)
The denominator (n - 1) is the number of degrees of freedom: one piece of information is used up by estimating the mean X̄.
T Distribution
Interactive practice
https://probstats.org/studentt.html
Interactive practice
https://distribution-explorer.github.io/continuous/student_t.html
Comparison of Confidence Intervals
We measured heavy metal pollution concentration in soils
of sites in a group of industrial areas and in another group
of agricultural areas. We can construct a 95% confidence
interval for the mean for each of the groups, and then
construct a graph with these intervals against a common
axis to see if there is an intersection (i.e. if there are any
values in common). If the ranges do not overlap, then we
have (at least) 95% confidence that the true means are not
equal.
Comparison of Confidence Intervals
Margins of Error in Elections
https://www.surveypractice.org/article/11736-sample-size-and-uncertainty-when-
predicting-with-polls-the-shortcomings-of-confidence-intervals
Margins of Error in Elections
(Figure: poll results with margins of error for Candidate S, Candidate M, and the share answering "I don't know / refuse to answer")
https://www.surveypractice.org/article/11736-sample-size-and-uncertainty-when-
predicting-with-polls-the-shortcomings-of-confidence-intervals
Margins of Error
https://www.ft.com/content/710b8677-2ed2-4a68-9600-4ecb4b22d0d6
Error Margins
https://ig.ft.com/us-elections/2024/polls/
Symmetrical vs Asymmetrical
Confidence Intervals
Laufer, I. (2013). Statistical analysis of CPT tip resistances. Periodica Polytechnica
Civil Engineering, 57(1), 45-61.
Distribution by Bootstrapping
https://towardsdatascience.com/bootstrapping-statistics-what-it-is-and-why-its-used-e2fa29577307
Practice in R
https://www.khstats.com/blog/tmle/tutorial
Poisson distribution
Probability of a given
number of events occurring
in a fixed interval of time
or space, under the
assumption that these
events happen independently
of one another.
Poisson distribution
Examples in environmental sciences
• Rare event counts (e.g., disasters)
• Rainfall events in a time period
• Number of parasites on a host species
• Occurrences of endangered species sightings
• Number of species in a quadrat
Poisson distribution
P(X = k) = (λ^k * e^(-λ)) / k!
where
k is the number of occurrences (k = 0, 1, 2, ...)
e is Euler's number (e = 2.71828...)
k! = k(k-1) ··· (3)(2)(1) is the factorial.
The positive real number λ (lambda) is equal to the
expected value of X and also to its variance:
λ = E(X) = Var(X)
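A brief R sketch with dpois() and ppois(); the rate λ = 3 is an assumed, illustrative value (say, an average of 3 rainfall events per week).

```r
lambda <- 3   # assumed mean number of events per interval

dpois(0, lambda)    # P(exactly 0 events)
dpois(5, lambda)    # P(exactly 5 events)
ppois(5, lambda)    # P(5 events or fewer)

# For a Poisson variable, mean and variance are both lambda:
x <- rpois(10000, lambda)
c(mean(x), var(x))  # both close to 3
```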
Poisson distribution
Interactive Practice
https://probstats.org/poisson.html
Interactive Practice
https://distribution-explorer.github.io/discrete/poisson.html
Practice in R
Binomial Distribution
Discrete probability distribution
of the number of successes in a
sequence of n independent
experiments, each asking a yes–
no question
Binomial Distribution
What is the probability that a toast with jelly will fall with
topping side down?
Binomial probability tree: 50% vs 50%
First fall: Up (0.5) or Down (0.5); second fall: Up (0.5) or Down (0.5)
2x Up: 0.5 * 0.5 = 0.25
Up + Down: 2 * (0.5 * 0.5) = 0.5
2x Down: 0.5 * 0.5 = 0.25
Binomial probability tree: 10% vs 90%
First fall: Up (0.1) or Down (0.9); second fall: Up (0.1) or Down (0.9)
2x Up: 0.1 * 0.1 = 0.01
Up + Down: 2 * (0.1 * 0.9) = 0.18
2x Down: 0.9 * 0.9 = 0.81
Examples of Binomial Distributions
in Environmental Sciences
(yes/no questions)
• Disaster occurrence / absence
• Species distribution modeling
(presence / absence)
• Survival of seedlings / animal / humans
• Contaminant presence / absence
• Probability of a certain environmental condition
(e.g., wet days in a month) being met
Binomial Distribution
The probability of getting exactly k successes in n
independent Bernoulli trials (all with the same success probability p) is given
by the probability mass function:
P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), for k = 0, 1, 2, ..., n,
where C(n, k) = n! / (k! (n - k)!) is the binomial coefficient.
https://commons.wikimedia.org/wiki/File:Binomial_distribution_pmf.svg
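The toast example and the probability trees above can be reproduced with dbinom() in R; the wet-day example at the end uses assumed, illustrative values.

```r
# Two independent falls, P(topping side down) = 0.5 each
dbinom(0:2, size = 2, prob = 0.5)   # 0.25, 0.50, 0.25 (0, 1, 2 "down" outcomes)

# Same two trials with P(down) = 0.9
dbinom(0:2, size = 2, prob = 0.9)   # 0.01, 0.18, 0.81

# Environmental example (assumed values): probability of observing
# 10 or more wet days in a 30-day month if P(wet day) = 0.2
1 - pbinom(9, size = 30, prob = 0.2)
```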
Binomial Distribution
Interactive Practice
https://probstats.org/binomial.html
Interactive Practice
https://stats.libretexts.org/Learning_Objects/02%3A_Interactive_Statistics/17%3A_
Observe_the_Relationship_Between_the_Binomial_and_Normal_Distributions
Practice in R
Negative binomial
Discrete probability distribution that
models the number of failures in a
sequence of independent and identically
distributed Bernoulli trials before a
specified (non-random) number of
successes (denoted r) occurs
Example of negative binomial distribution
How many lottery tickets would I probably
need to buy before I win two times?
Negative binomial
P(X = k) = C(k + r - 1, k) * p^r * (1 - p)^k
where
r is the number of successes,
k is the number of failures, and
p is the probability of success on each trial.
Negative binomial
where
r is the number of successes,
k is the number of failures,
and
p is the probability of
success on each trial.
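In R, dnbinom() and pnbinom() use exactly this parameterization (x = number of failures, size = r, prob = p); a small sketch with an assumed success probability for the lottery example.

```r
p <- 0.1   # assumed probability of "success" on each trial
r <- 2     # we want 2 successes (e.g., 2 winning tickets)

# P(exactly k failures before the 2nd success), for k = 0, 1, ..., 5
dnbinom(0:5, size = r, prob = p)

# Expected number of failures before the r-th success: r * (1 - p) / p
r * (1 - p) / p                     # 18 failed tickets on average

# P(30 failures or fewer before the 2nd success)
pnbinom(30, size = r, prob = p)
```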
Interactive practice
https://distribution-explorer.github.io/discrete/negative_binomial.html
Examples of Negative Binomial distribution
in Environmental Sciences
• Number of failed seedlings until you reach a certain
number of mature plants in a restoration site
• Number of failed attacks before a predator acquires
the amount of prey it needs
• Number of events when the mean is different from
the variance (so the Poisson distribution cannot apply)
• Species abundance per unit of area
• Infectious disease spread
• Microbial counts in water samples
Beta Binomial Distribution
Used to model binomial distributions
(probability of success of yes/no events)
when the probability of success (p) is unknown.
https://commons.wikimedia.org/wiki/File:Beta-binomial_distribution_pmf.png
Beta Binomial Distribution
Used to model number of events when the mean is
different from the variance
(Poisson distribution is not valid in this case)
https://commons.wikimedia.org/wiki/File:Beta-binomial_distribution_pmf.png
Examples of Beta Binomial Distribution in
Environmental Sciences
• Species abundance in habitat patches with
different environmental conditions (food
availability, light, temperature, etc.)
• Germination rates (success of seeds) in regions
with heterogeneous soil types and moisture
• Survival rates of animal populations
• Number of samples testing positive for
bacterial contamination (e.g., E. coli), where the
source of contamination changes from site to site
Multinomial Distribution
• Generalization of the binomial distribution, used
to model outcomes that fall into multiple
categories
https://jramkiss.github.io/2020/05/15/beta-and-dirichlet-distributions/
Examples of multinomial distribution in
Environmental Sciences
• Classification of land use types in remote sensing or
ecological studies
• Proportions of different species in ecological
communities
• Pollutant source attribution (industrial, agricultural,
vehicle traffic, natural background)
• Soil classification
• Waste composition analysis (organic, plastic, paper,
metal, and glass)
• Species functional groups (herbivores, carnivores,
decomposers, pollinators, etc.)
Normalized difference vegetation index
vs
Land cover
Azizah, S. N. N., June, T., Salmayenti, R., Ma'rufah, U., & Koesmaryono, Y. (2022).
Land use change impact on normalized difference vegetation index, surface albedo, and
heat fluxes in Jambi province: Implications to rainfall. Agromet, 36(1), 51-59.
Continuous Uniform Distribution
https://commons.wikimedia.org/wiki/File:Uniform_Distribution_PDF_SVG.svg
Practice in R
Log-Normal Distribution
• The logarithm of the variable is normally distributed
• Many small common events, but sometimes a very large
one happens
https://commons.wikimedia.org/wiki/File:Log-normal-pdfs.png
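A short R sketch of this defining property; the meanlog and sdlog parameters are illustrative assumptions, as is the pollutant-concentration interpretation.

```r
set.seed(3)
# Simulated pollutant concentrations: log-normal with assumed parameters
conc <- rlnorm(5000, meanlog = 1, sdlog = 0.8)

hist(conc, breaks = 50,
     main = "Log-normal: many small values, a few very large ones")
hist(log(conc), breaks = 50,
     main = "log(values) are normally distributed")

# P(concentration exceeds 10 units)
1 - plnorm(10, meanlog = 1, sdlog = 0.8)
```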
Interactive practice
https://probstats.org/lognormal.html
Examples of Log-Normal Distribution in
Environmental Sciences
• Concentrations of pollutants, such as particulate matter
(PM2.5 or PM10) or heavy metals (lead, mercury) in air,
water or soil
• Distribution of species abundance, where a few species
are highly abundant, but many species have low counts
• Rainfall intensity of storm events
(mm of rain / storm time)
• River discharge
• Wildfire size
• Economic losses from environmental disasters
Exponential distribution
Time between
independent
events occurring
at a constant
average rate
https://commons.wikimedia.org/wiki/File:Exponential_distribution_pdf_-_public_domain.svg
Interactive practice
https://probstats.org/exponential.html
Examples of Exponential distribution in
Environmental Sciences
• Time intervals between natural events, like
earthquakes, extreme weather events (e.g.,
storms, droughts, wildfires), or lightning strikes
• The decay rate of pollutants or radioactive
materials in ecosystems
• The recovery time of ecosystems following a
disturbance (e.g., deforestation, flooding, or a
wildfire)
Gamma distribution
Often used for
non-negative
variables
https://commons.wikimedia.org/wiki/File:Gamma_distribution_pdf.svg
Interactive Practice
https://probstats.org/gamma.html
Examples of Gamma distribution
in Environmental Sciences
• Precipitation data
• Hydrological processes
• Drought duration
• Time to ecological recovery
• Time to disaster events
Gumbel distribution
Used for modeling extreme values, such as the
maximum or minimum of sample groups
https://commons.wikimedia.org/wiki/File:Gumbel-Density.svg
Interactive practice
https://rdzudzar-distributionanalyser-main-45cc69.streamlit.app/
Try also with gumbel_l
Examples of Gumbel distribution in
Environmental Sciences
• Modeling extreme weather events, such as
maximum daily rainfall or highest river
discharge during floods
• Hailstorm frequency
• Landslide events caused by extreme rainfall
• Analysis of extreme temperatures
(heatwaves or cold spells)
• Peaks of severe air pollution
Weibull
Distribution
https://upload.wikimedia.org/wikipedia/commons/5/58/Weibull_PDF.svg
Interactive practice
https://probstats.org/weibull.html
Examples of Weibull distribution in
environmental sciences
• Wind speed modelling
• Disaster recurrence times, such as
wildfires and landslides
• Species survival rates under changing
environmental conditions (pollution,
temperature, etc.)
Beta Distribution
https://commons.wikimedia.org/wiki/File:Beta_distribution_pdf.svg
X is a continuous variable bounded between 0 (zero) and 1, such as a proportion or a probability
Interactive practice
https://probstats.org/beta.html
Examples of Beta distribution in
Environmental Sciences
• Proportion of vegetation types in an
ecosystem
• Soil moisture content
• Proportion of land use cover in regions
(e.g., municipalities)
• Proportion of water samples meeting
regulatory standards
• Probability of adverse effects from
pollutants
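A brief R sketch for a proportion-type variable; the Beta(2, 5) shape parameters and the forest-cover interpretation are assumed for illustration.

```r
# Assumed Beta(2, 5) model for the proportion of forest cover in municipalities
a <- 2
b <- 5

# Mean proportion under this model: a / (a + b)
a / (a + b)                          # ~ 0.29

# Probability that forest cover exceeds 50% of the territory
1 - pbeta(0.5, shape1 = a, shape2 = b)

# Density over the 0-1 interval
p <- seq(0, 1, by = 0.01)
plot(p, dbeta(p, a, b), type = "l", xlab = "Proportion", ylab = "Density")
```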
Cauchy Distribution
https://commons.wikimedia.org/wiki/File:Cauchy_pdf.svg
Ratio of two independent normally distributed random
variables with mean zero
Cauchy Distribution
https://commons.wikimedia.org/wiki/File:Cauchy_pdf.svg
f(x; x0, γ) = 1 / (π γ [1 + ((x - x0) / γ)²])
where:
x0 is the location parameter, specifying the location of the peak of the distribution
γ (gamma) is the scale parameter, which specifies the half-width at half-maximum (HWHM)
Cauchy Distribution
https://commons.wikimedia.org/wiki/File:Cauchy_pdf.svg
• Mode and median = x0
• Symmetrical distribution (skewness = 0)
• Heavy tails (many extreme values on both sides)
Interactive practice
https://distribution-explorer.github.io/continuous/cauchy.html
Examples of Cauchy distribution in
Environmental Sciences
• Extreme weather events including scarcity vs excess
  • rainstorms vs droughts
  • heatwaves vs cold spells
• Remote sensing reflectance data
Uses of distribution functions in Statistics
• Calculate
  • Probabilities
  • Uncertainty and variability
  • Recurrence times
• Simulate data
• Plan sampling designs and error margins
• Generalized models
  • Extend models based on the normal distribution to other distributions
  • Allow more precise estimates and less influence of outliers