1/28
Semiparametric Models for Analyzing Extremes
Surya T Tokdar and Erika Cunningham
Duke University
Thanks to: Whitney Huang, Michael Stein, Michael Wehner and others in the Extremes Semiparametric Subgroup
2/28
Thresholding for Extreme Analysis
3/28
Analyzing extremes
How to predict 1000-year flood from limited data?
[Figure: daily rainfall for non-dry days (mm), plotted against observation index.]
Common practice: model values over a threshold ("Peaks Over Thresholds", or "POT"), throwing away the bulk of the data to improve tail estimation
4/28
Motivation behind POT
The main motivation is to obtain a low-dimensional parametric estimate that focuses primarily on the tail decay rate, which may be quantified by the tail index ξ (see footnote)
By the Pickands-Balkema-de Haan theorem, the truncated CDF above a large threshold is well approximated by a generalized Pareto distribution with matching tail index ξ
Footnote: for polynomially decaying tails, $\xi^{-1} = \lim_{y \to \infty} \frac{-\log(1 - F(y))}{\log y}$
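As a quick numerical illustration (a sketch, not part of the original slides): for a Student t distribution with ν degrees of freedom the survival function decays polynomially like y^{-ν}, so −log(1 − F(y)) / log y approaches ν, the reciprocal of the tail index ξ = 1/ν reported in the numerical study. The choice ν = 4 and the evaluation points are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

nu = 4  # degrees of freedom, chosen for illustration
ys = np.array([1e2, 1e4, 1e8])

# -log(1 - F(y)) / log(y); logsf avoids cancellation in 1 - F(y)
decay = -stats.t.logsf(ys, df=nu) / np.log(ys)
xi_approx = 1.0 / decay

for y, d, xi in zip(ys, decay, xi_approx):
    print(f"y = {y:.0e}: decay rate = {d:.3f}, 1/decay = {xi:.3f}")
```

The convergence is slow (the error shrinks like 1/log y), which hints at why tail-index estimation from limited data is hard.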
5/28
Issues with POT
Setting the correct/optimal threshold is extremely challenging
POT is difficult to extend as a model for more complex data
with spatio-temporal dependence or other structures
Our goal: develop a semiparametric model for heavy tailed
data where the tails are estimated under parametric
assumptions whereas the center is estimated
nonparametrically!
6/28
Transformation to separate tail from the bulk
7/28
Transformation
Setting.
Data range = (a, b), with a = −∞ and/or b = ∞
{gθ : θ ∈ Θ} a parametric family of pdfs on (a, b)
Gθ denotes the CDF of gθ
Lemma
For any pdf f on (a, b) and any θ ∈ Θ there exists a unique pdf $h = h_{\theta,f}$ on (0, 1) such that
$f(y) = g_\theta(y)\, h(G_\theta(y))$, y ∈ (a, b).
Proof. Take Y ∼ f and take h to be the pdf of $U = G_\theta(Y)$.
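The proof can be checked numerically. In the sketch below, f is taken to be the standard normal pdf and g_θ the t_3 pdf (both choices are assumptions for illustration). The function h is evaluated as the density of U = G_θ(Y), namely h(u) = f(y)/g_θ(y) at y = G_θ^{-1}(u), and the code confirms that h integrates to one and that g_θ(y) h(G_θ(y)) recovers f(y).

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

f = stats.norm(0, 1)   # "true" density f (illustrative choice)
g = stats.t(df=3)      # parametric family member g_theta (illustrative choice)

def h(u):
    """Density of U = G(Y) when Y ~ f: h(u) = f(y) / g(y) at y = G^{-1}(u)."""
    y = g.ppf(u)
    return f.pdf(y) / g.pdf(y)

total, _ = quad(h, 0.0, 1.0)   # integral of h over (0, 1)

y_grid = np.linspace(-3, 3, 13)
reconstructed = g.pdf(y_grid) * h(g.cdf(y_grid))   # g(y) h(G(y))
max_err = np.max(np.abs(reconstructed - f.pdf(y_grid)))

print(f"integral of h over (0, 1): {total:.6f}")
print(f"max |g(y) h(G(y)) - f(y)| on grid: {max_err:.2e}")
```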
8/28
Tail matching
Suppose $g_\theta$ and f are continuous densities
Then $h = h_{\theta,f}$ is continuous and the two limits
$\lim_{y \downarrow a} \frac{f(y)}{g_\theta(y)} = \lim_{u \downarrow 0} h(u) =: h(0)$, $\lim_{y \uparrow b} \frac{f(y)}{g_\theta(y)} = \lim_{u \uparrow 1} h(u) =: h(1)$
exist but could equal 0 or ∞.
Corollary
f and $g_\theta$ have the same right and/or left tail index if and only if
0 < h(1) < ∞ and/or 0 < h(0) < ∞
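A small numerical check of the corollary, under assumed choices: take f to be an equal-weight mixture of a standard normal and a t_4 (the weight 0.5 is an assumption) and compare it against t_ν candidates for g_θ. The ratio f(y)/g_θ(y) far in the right tail approximates h(1).

```python
import numpy as np
from scipy import stats

def f_pdf(y):
    # Equal-weight mixture of N(0, 1) and t_4; the weight 0.5 is an assumption
    return 0.5 * stats.norm.pdf(y) + 0.5 * stats.t.pdf(y, df=4)

ys = np.array([1e2, 1e3, 1e4])
ratios = {nu: f_pdf(ys) / stats.t.pdf(ys, df=nu) for nu in (3, 4, 5)}

for nu, r in ratios.items():
    print(f"g = t_{nu}: f/g at y = 1e2, 1e3, 1e4 ->", np.array2string(r, precision=4))
```

With g = t_4 the ratio stays at the mixture weight 0.5, so 0 < h(1) < ∞. With the heavier-tailed t_3 it decays toward 0, and with the lighter-tailed t_5 it grows without bound.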
9/28
Tail-identified transformation
Definition
The family {gθ : θ ∈ Θ} is tail-identified if $\theta \neq \theta'$ implies $g_\theta$ and $g_{\theta'}$ have distinct right and/or left tail indices.
Lemma
If {gθ : θ ∈ Θ} is tail-identified then for any pdf f on (a, b) there is at most one $\theta_f \in \Theta$ with $h = h_{\theta_f, f}$ satisfying 0 < h(0), h(1) < ∞.
10/28
Semiparametric density model for bulk + tail
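{gθ : θ ∈ Θ} a tail-identified family
$\mathcal{H} := \{h(\cdot) \text{ a continuous pdf on } [0, 1] : 0 < h(0), h(1) < \infty\}$
$\mathcal{F} := \{f(\cdot) = g_\theta(\cdot)\, h(G_\theta(\cdot)) : \theta \in \Theta,\ h \in \mathcal{H}\}$
Model: $Y_1, Y_2, \ldots \overset{\text{IID}}{\sim} f$, $f \in \mathcal{F}$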
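To make the model concrete, the sketch below builds one member of F from assumed ingredients: g_θ is a t_3 density and h(u) = 1 + 0.5 cos(2πu), a continuous pdf on [0, 1] with h(0) = h(1) = 1.5. Sampling uses the fact that if U ∼ h then Y = G_θ^{-1}(U) has density g_θ(y) h(G_θ(y)). Both ingredients and the rejection sampler are illustrative choices, not taken from the slides.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

g = stats.t(df=3)  # assumed parametric component g_theta

def h(u):
    # Assumed bulk-adjusting density on [0, 1]; h(0) = h(1) = 1.5
    return 1.0 + 0.5 * np.cos(2.0 * np.pi * u)

def f_pdf(y):
    # Model density f(y) = g(y) h(G(y))
    return g.pdf(y) * h(g.cdf(y))

def sample_f(n, rng):
    # Rejection sampling of U ~ h with a uniform proposal (h <= 1.5),
    # followed by the quantile transform Y = G^{-1}(U)
    out = np.empty(0)
    while out.size < n:
        u = rng.uniform(size=2 * n)
        keep = rng.uniform(size=2 * n) * 1.5 < h(u)
        out = np.concatenate([out, u[keep]])
    return g.ppf(out[:n])

total, _ = quad(f_pdf, -np.inf, np.inf)
y = sample_f(5000, np.random.default_rng(0))
print(f"integral of f: {total:.6f}, sample median: {np.median(y):.3f}")
```

Because h is bounded away from 0 and ∞ at both ends, this f keeps the tail index of the t_3 while its center differs from it.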
11/28
Bayesian estimation
12/28
Logistic GP prior on H
Definition (The logistic transform)
$L : C([0, 1]) \to \mathcal{H}$ given by
$(Lw)(u) = \frac{e^{w(u)}}{\int_0^1 e^{w(t)}\,dt}$, $u \in [0, 1]$.
Definition (The logistic GP)
LGP(µ, σ) = L∗GP(µ, σ)
I.e., h ∼ LGP(µ, σ) ⇐⇒ h = Lw with w ∼ GP(µ, σ)
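A minimal sketch of drawing h from a logistic GP on a grid. The zero mean, the squared-exponential covariance exp(−(s − t)²/(2λ²)), the grid size, and the jitter term are assumptions made for illustration. The integral in the logistic transform is approximated by the trapezoid rule.

```python
import numpy as np

def trapezoid(values, u):
    # Trapezoid-rule integral of values over grid u
    return np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(u))

def logistic_transform(w, u):
    """Map function values w(u) on grid u to a pdf on [0, 1]."""
    ew = np.exp(w - np.max(w))          # shift by max(w) for numerical stability
    return ew / trapezoid(ew, u)

def draw_gp(u, kappa, lam, rng, jitter=1e-6):
    """Draw w ~ GP(0, kappa^2 * SE_lam) on grid u (assumed SE parametrisation)."""
    d = u[:, None] - u[None, :]
    cov = kappa**2 * np.exp(-0.5 * (d / lam) ** 2)
    chol = np.linalg.cholesky(cov + jitter * np.eye(u.size))
    return chol @ rng.standard_normal(u.size)

rng = np.random.default_rng(1)
u = np.linspace(0.0, 1.0, 101)
w = draw_gp(u, kappa=1.0, lam=0.3, rng=rng)
h = logistic_transform(w, u)
print(f"h(0) = {h[0]:.3f}, h(1) = {h[-1]:.3f}, integral = {trapezoid(h, u):.6f}")
```

Since w is continuous and finite on [0, 1], the resulting h is strictly positive and finite at both endpoints, as membership in H requires.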
13/28
sLGP heavy-tailed density estimation on R
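Model: $Y_1, \ldots, Y_n \sim f(\cdot) = g_\theta(\cdot)\, h(G_\theta(\cdot))$
$h \sim \mathrm{LGP}(0, \kappa^2 C^{\mathrm{SE}}_\lambda)$, $(\kappa^2, \lambda^2) \sim \mathrm{Ga}^{-1} \times \mathrm{Ga}^{-1}$
$g_\theta = t_\nu(\mu, \tau^2)$, $\theta = (\mu, \tau^2, \nu) \sim \frac{1}{\tau^2} \times \pi_\nu(\nu)$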
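Under this model the log density is log f(y) = log g_θ(y) + w(G_θ(y)) − log ∫₀¹ e^{w(t)} dt, where h = Lw. A minimal sketch of evaluating the resulting log-likelihood follows, assuming w is stored on a grid over [0, 1] and linearly interpolated. The function name and the grid representation are hypothetical, not the authors' implementation.

```python
import numpy as np
from scipy import stats

def slgp_loglik(y, mu, tau, nu, u_grid, w_grid):
    """Log-likelihood of y under f = g_theta * h(G_theta), with h = L(w).

    g_theta is a t_nu density with location mu and scale tau (tau^2 in the
    model); w is given on u_grid and linearly interpolated (an assumption).
    """
    g = stats.t(df=nu, loc=mu, scale=tau)
    # Log normalising constant of h via the trapezoid rule, shifted for stability
    shift = np.max(w_grid)
    ew = np.exp(w_grid - shift)
    log_norm = shift + np.log(np.sum(0.5 * (ew[1:] + ew[:-1]) * np.diff(u_grid)))
    w_at_y = np.interp(g.cdf(y), u_grid, w_grid)
    return np.sum(g.logpdf(y) + w_at_y - log_norm)

rng = np.random.default_rng(2)
y = stats.t.rvs(df=4, size=200, random_state=rng)
u_grid = np.linspace(0.0, 1.0, 51)
w_grid = np.sin(2.0 * np.pi * u_grid)

flat = slgp_loglik(y, 0.0, 1.0, 4.0, u_grid, np.zeros_like(u_grid))
wavy = slgp_loglik(y, 0.0, 1.0, 4.0, u_grid, w_grid)
shifted = slgp_loglik(y, 0.0, 1.0, 4.0, u_grid, w_grid + 3.0)
print(f"flat w: {flat:.3f}, wavy w: {wavy:.3f}, shifted w: {shifted:.3f}")
```

With w ≡ 0 the model reduces to the purely parametric t fit, and adding a constant to w leaves the likelihood unchanged because the logistic transform normalises it away.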
14/28
$g_\theta = t_3(0, 1)$, λ = 0.3 (top) and 0.08 (bottom)
[Figure: for each λ, three panels showing w(u) against u, h(u) against u, and f(y) against y.]
15/28
Model fitting
Low-rank approximation to GP w
Discretization of length-scale λ over a dense grid
Precomputed covariance matrix + Cholesky factors
Adaptive Metropolis MCMC
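The first three ingredients can be sketched as follows. The slides do not specify the form of the low-rank approximation, so this example assumes a knot-based (predictive-process style) construction in which w is determined by its values at a few knots. The knot locations, the λ grid, the squared-exponential parametrisation, and the jitter are likewise assumptions.

```python
import numpy as np

def se_cov(a, b, lam):
    # Squared-exponential covariance (assumed parametrisation)
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lam) ** 2)

knots = np.linspace(0.0, 1.0, 11)        # low-rank: 11 knots (assumed)
lam_grid = np.linspace(0.05, 0.5, 10)    # discretised length-scales (assumed)
jitter = 1e-6

# Precompute one Cholesky factor per candidate length-scale
chol = {
    lam: np.linalg.cholesky(se_cov(knots, knots, lam) + jitter * np.eye(knots.size))
    for lam in lam_grid
}

def w_from_knots(u, z, lam):
    """Low-rank GP path w(u) = K(u, knots) K^{-1} w_knots, where K is the
    jittered knot covariance, w_knots = chol[lam] @ z and z is standard normal."""
    L = chol[lam]
    # K^{-1} w_knots = L^{-T} L^{-1} (L z) = L^{-T} z
    alpha = np.linalg.solve(L.T, z)
    return se_cov(u, knots, lam) @ alpha

rng = np.random.default_rng(3)
u = np.linspace(0.0, 1.0, 201)
w = w_from_knots(u, rng.standard_normal(knots.size), lam_grid[3])
print(f"w evaluated at {w.size} points, range [{w.min():.2f}, {w.max():.2f}]")
```

Restricting λ to a fixed grid means no factorisation is recomputed inside the sampler: a proposed change of λ only switches which stored factor is used.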
16/28
Software
17/28
Numerical study
18/28
Simulation
Simulation setup
Mixture of a standard normal and a centered $t_4$.
100 data sets, n = 2000
[Figure: "Simulation 2", density of the mixture of a centered normal and a t with 4 df, plotted against x.]
Check
1. Parameters of $g_\theta$, specifically the tail index $\nu^{-1}$
2. Estimation of high (and low) quantiles
19/28
Results
LGP Tail Index Estimation
Parameter   True   Mean estimate   Coverage of 95% CI
ν           4      3.76            88%
ξ           0.25   0.27            88%
Comparison to Generalized Pareto Distribution (GPD)
Fit GPD to absolute values over $Q_{0.975}$; expect about 100 observations above the threshold
Fit using maximum likelihood
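A sketch of this comparison fit on one simulated data set. The equal mixture weight, the random seed, and the use of the empirical 0.975 quantile as the threshold are assumptions; the slides state only that the GPD is fitted by maximum likelihood to absolute values above Q_{0.975}. Here `scipy.stats.genpareto.fit` with the location fixed at zero performs the maximum likelihood fit to the excesses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000

# Mixture of N(0, 1) and centered t_4; the weight 0.5 is an assumption
from_t = rng.uniform(size=n) < 0.5
y = np.where(from_t,
             stats.t.rvs(df=4, size=n, random_state=rng),
             rng.standard_normal(n))

threshold = np.quantile(y, 0.975)   # empirical Q_0.975
abs_y = np.abs(y)
excess = abs_y[abs_y > threshold] - threshold

# Maximum likelihood GPD fit to the excesses, location fixed at 0
xi_hat, _, scale_hat = stats.genpareto.fit(excess, floc=0)
print(f"{excess.size} exceedances, xi_hat = {xi_hat:.3f}, scale_hat = {scale_hat:.3f}")
```

Because the distribution is symmetric, taking absolute values pools both tails, which is why roughly 5% of the 2000 observations (about 100) exceed the threshold.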
20/28
Extreme quantiles
Upper-tail Bias and Ratio of RMSE
[Figure: two panels plotted against p from 0.99 to 0.99999: quantile bias for the GPD and LGP estimators, and quantile RMSE ratio (GPD/LGP).]
GPD has lower quantile RMSE just beyond threshold (p=0.98, 0.99)
Otherwise, LGP has lower RMSE in extrapolated tails
21/28
Transition from parametric to non-parametric
$F^{-1}(p) \approx G_\theta^{-1}\left(\frac{p}{h(0)}\right)$, and $F^{-1}(1 - p) \approx G_\theta^{-1}\left(1 - \frac{p}{h(1)}\right)$, $p \approx 0$
[Figure: Dataset 1, lower quantiles (p from 0.001 to 0.1) and upper quantiles (p from 0.9 to 0.999), each panel showing the LGP CI, the parametric CI, and the true quantile.]
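The upper-tail approximation follows from 1 − F(y) ≈ h(1)(1 − G_θ(y)) for large y. A numerical check under assumed choices: f is an equal-weight mixture of N(0, 1) and t_4 (weight assumed) and g_θ = t_4, so that h(1) = 0.5. The exact upper quantile is found by root-finding on the mixture survival function and compared with G_θ^{-1}(1 − p/h(1)).

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

def mix_sf(y):
    # Survival function of 0.5 N(0, 1) + 0.5 t_4 (weight 0.5 is an assumption)
    return 0.5 * stats.norm.sf(y) + 0.5 * stats.t.sf(y, df=4)

h1 = 0.5  # limit of f / g_theta in the right tail when g_theta = t_4

results = {}
for p in (1e-2, 1e-3, 1e-4):
    exact = brentq(lambda y: mix_sf(y) - p, 0.0, 1e4)
    approx = stats.t.isf(p / h1, df=4)   # G^{-1}(1 - p / h(1))
    results[p] = (exact, approx)
    print(f"p = {p:.0e}: exact = {exact:.4f}, approx = {approx:.4f}")
```

The two values differ while the normal component still contributes to the tail probability and agree closely once p is small enough for the t_4 component to dominate.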
22/28
Conclusions and ongoing work
Summary
sLGP approach provides a promising alternative to POT
Despite bias, sLGP reduces variance sufficiently in quantile
estimation to provide lower RMSE than GPD
Ongoing work
Change gθ for one with different indices in each tail
Get theory results for estimation of θ
Extend sLGP to multivariate, time-series, spatio-temporal, regression, etc.
Application to wind speed vs direction analysis
23/28
Ongoing work
24/28
Asymmetric t: approach 1
25/28
Asymmetric t: approach 2
Could use $g_{\alpha,\beta}(y) = g(y)\, h_{\alpha,\beta}(G(y))$ with
1. $g = t_{\nu_0}$
2. $h_{\alpha,\beta}$ is the Be(α, β) pdf
Tail-index:
Left: $(\alpha\nu_0)^{-1}$
Right: $(\beta\nu_0)^{-1}$
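These tail indices can be verified numerically. If U ∼ Be(α, β) and Y = G^{-1}(U), then Y has density g(y) h_{α,β}(G(y)), with P(Y > y) = P(U > G(y)) and P(Y < −y) = P(U < G(−y)). The sketch below estimates the log-log slope of each tail probability far out in the tail. The values ν₀ = 3, α = 1.5, β = 0.8 are assumptions for illustration.

```python
import numpy as np
from scipy import stats

nu0, a, b = 3.0, 1.5, 0.8   # illustrative values

def right_tail(y):
    # P(Y > y) = P(U > G(y)) = P(1 - U < 1 - G(y)), and 1 - U ~ Be(b, a)
    return stats.beta.cdf(stats.t.sf(y, df=nu0), b, a)

def left_tail(y):
    # P(Y < -y) = P(U < G(-y)), with G(-y) = 1 - G(y) by symmetry of the t
    return stats.beta.cdf(stats.t.sf(y, df=nu0), a, b)

def decay_rate(tail, y1=1e3, y2=1e4):
    # Negative log-log slope of the tail probability between y1 and y2
    return -(np.log(tail(y2)) - np.log(tail(y1))) / (np.log(y2) - np.log(y1))

right_rate, left_rate = decay_rate(right_tail), decay_rate(left_tail)
print(f"right: decay {right_rate:.3f} vs beta * nu0 = {b * nu0}")
print(f"left:  decay {left_rate:.3f} vs alpha * nu0 = {a * nu0}")
```

The decay rates are the reciprocals of the tail indices, so rates near βν₀ and αν₀ correspond to indices (βν₀)^{-1} on the right and (αν₀)^{-1} on the left.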
26/28
Theory for sLGP
Theorem
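If the true $f^*$ matches the model: $f^*(y) = g_{\theta^*}(y)\, h^*(G_{\theta^*}(y))$ with
1. $\log h^* \in C^\alpha([0, 1])$ and
2. {gθ : θ ∈ Θ} is regular at $\theta^*$ (e.g., Cramér conditions)
then
$\operatorname{plim}_{n \to \infty} \Pi\left( \|f - f^*\|_1 \ge M_n\, n^{-\frac{\alpha}{2\alpha+1}} (\log n)^q \,\middle|\, Y_1, \ldots, Y_n \right) = 0$
for any $M_n \to \infty$ with $q = (4\alpha + 1)/(4\alpha + 2)$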
What about estimation of θ?
Is uncertainty suppressed for extreme quantiles?
27/28
Extension to dependent data
The primary idea is to use a copula, which offers a conceptually simple and sound extension and keeps model fitting relatively simple.
There are concerns about the appropriate choice of copula, particularly in terms of what tail dependence models are derived from it
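A minimal sketch of the copula idea for a bivariate case, assuming a Gaussian copula with correlation 0.7 and t_4 marginals (all illustrative choices, with the t_4 standing in for a fitted marginal model). Dependence is generated on the uniform scale and the marginals are applied afterwards through their quantile functions. The Gaussian copula also illustrates the concern about tail dependence, since it has no asymptotic tail dependence for correlations below one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
rho = 0.7                                   # assumed copula correlation
cov = np.array([[1.0, rho], [rho, 1.0]])

z = rng.multivariate_normal(np.zeros(2), cov, size=5000)
u = stats.norm.cdf(z)                       # uniform margins, Gaussian copula
y = stats.t.ppf(u, df=4)                    # heavy-tailed marginals (assumed t_4)

rank_corr, _ = stats.spearmanr(y[:, 0], y[:, 1])
print(f"Spearman correlation of the simulated pair: {rank_corr:.3f}")
```

The rank correlation is unaffected by the marginal transform, which is what lets the marginal model and the dependence model be specified separately.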
28/28
Wind analysis
Use a polar transformation to represent the data as wind direction and wind speed, and model the latter as possibly heavy-tailed data
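A minimal sketch of the polar transformation, assuming the wind is recorded as two horizontal components (named u and v here, which is an assumption). The angle returned is the mathematical angle of the wind vector, measured counter-clockwise from the positive u axis; meteorological direction conventions differ and would need an additional conversion.

```python
import numpy as np

def to_polar(u, v):
    """Convert horizontal wind components to (speed, direction in degrees)."""
    speed = np.hypot(u, v)
    direction = np.degrees(np.arctan2(v, u)) % 360.0
    return speed, direction

u_comp = np.array([3.0, 0.0, -2.0])   # hypothetical component values
v_comp = np.array([4.0, 5.0, 0.0])
speed, direction = to_polar(u_comp, v_comp)
print("speed:", speed)
print("direction (degrees):", np.round(direction, 2))
```

The speed series is nonnegative and can then be treated as the possibly heavy-tailed response, with direction available as a covariate.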

CLIM Transition Workshop - Semiparametric Models for Extremes - Surya Tokdar, May 16, 2018

  • 1. 1/28 Semiparametric Models for Analyzing Extremes Surya T Tokdar and Erika Cunningham Duke University Thanks to: Whitney Huang, Michael Stein, Michael Wehner and others in the Extremes Semiparametric Subgroup
  • 3. 3/28 Analyzing extremes How to predict 1000-year flood from limited data? 0 2000 4000 6000 8000 020406080 Index Dailyrainfallfornon−drydays(mm) ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ●● ● ●●● ● ● ● ●● ● ● ● ●● ● Common practice: Model values over threshold (“Peaks Over Thresholds” or ”POT”), by throwing away bulk of data to improve tail estimation
  • 4. 4/28 Motivation behind POT The main motivation is to obtain low-dimensional parametric estimation that focuses primarily on the tail decay rate which may be quantified by the tail index1 ξ By Pickands-Balkema-de Haan Theorem, the truncated CDF above a large threshold is well approximated by a generalized Pareto distribution with matching tail index ξ 1 For polynomially decaying tails, ξ = limy→∞ − log(1−F(y)) log y
  • 5. 5/28 Issues with POT Setting the correct/optimal threshold is extremely challenging POT is difficult to extend as a model for more complex data with spatio-temporal dependence or other structures Our goal: develop a semiparametric model for heavy tailed data where the tails are estimated under parametric assumptions whereas the center is estimated nonparametrically!
  • 6. 6/28 Transformation to separate tail from the bulk
  • 7. 7/28 Transformation Setting. Data range = (a, b), with a = −∞ and/or b = ∞; $\{g_\theta : \theta \in \Theta\}$ a parametric family of pdfs on (a, b); $G_\theta$ denotes the CDF of $g_\theta$. Lemma. For any pdf f on (a, b) and any θ ∈ Θ there exists a unique pdf $h = h_{\theta,f}$ on (0, 1) such that $f(y) = g_\theta(y)\, h(G_\theta(y))$, y ∈ (a, b). Proof. Take Y ∼ f and take h to be the pdf of $U = G_\theta(Y)$.
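A minimal numerical sketch of the lemma may help. It assumes, purely for illustration, that f is the standard normal density and g_θ is a t density with 3 degrees of freedom (neither choice comes from the slides), and it uses the fact that the pdf of U = G_θ(Y) is h(u) = f(G_θ^{-1}(u)) / g_θ(G_θ^{-1}(u)). The helper name `h_from_f` is hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad


def h_from_f(f_dist, g_dist):
    """Return h(u) = f(G^{-1}(u)) / g(G^{-1}(u)), the pdf of U = G(Y) for Y ~ f."""
    def h(u):
        y = g_dist.ppf(u)              # G^{-1}(u)
        return f_dist.pdf(y) / g_dist.pdf(y)
    return h


# Illustrative choices only (assumptions, not taken from the slides)
f_dist = stats.norm(0, 1)   # "true" density f
g_dist = stats.t(df=3)      # parametric density g_theta
h = h_from_f(f_dist, g_dist)

# Check the identity f(y) = g(y) * h(G(y)) on a few points
y = np.linspace(-4, 4, 9)
recon = g_dist.pdf(y) * h(g_dist.cdf(y))
max_err = np.max(np.abs(recon - f_dist.pdf(y)))

# Check that h is a pdf on (0, 1)
total_mass, _ = quad(h, 0, 1)
```

In this example h(u) goes to 0 at both ends of (0, 1), because the normal density has lighter tails than the t density.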
  • 8. 8/28 Tail matching Suppose $g_\theta$ and f are continuous densities. Then $h = h_{\theta,f}$ is continuous and the two limits $\lim_{y \downarrow a} \frac{f(y)}{g_\theta(y)} = \lim_{u \downarrow 0} h(u) =: h(0)$ and $\lim_{y \uparrow b} \frac{f(y)}{g_\theta(y)} = \lim_{u \uparrow 1} h(u) =: h(1)$ exist but could equal 0 or ∞. Corollary. f and $g_\theta$ have the same right and/or left tail index if and only if 0 < h(1) < ∞ and/or 0 < h(0) < ∞
  • 9. 9/28 Tail-identified transformation Definition. The family $\{g_\theta : \theta \in \Theta\}$ is tail-identified if $\theta \neq \theta'$ implies $g_\theta$ and $g_{\theta'}$ have distinct right and/or left tail indices. Lemma. If $\{g_\theta : \theta \in \Theta\}$ is tail-identified then for any pdf f on (a, b) there is at most one $\theta_f \in \Theta$ with $h = h_{\theta_f, f}$ satisfying 0 < h(0), h(1) < ∞.
  • 10. 10/28 Semiparametric density model for bulk + tail {gθ : θ ∈ Θ} a tail-identified family; H := {h(·) a continuous pdf on [0, 1] : 0 < h(0), h(1) < ∞}; F := {f (·) = gθ(·)h(Gθ(·)) : θ ∈ Θ, h ∈ H}. Model: Y1, Y2, . . . IID ∼ f , f ∈ F
  • 12. 12/28 Logistic GP prior on H Definition (The logistic transform). $L : C([0, 1]) \to H$ given by $(Lw)(u) = \frac{e^{w(u)}}{\int_0^1 e^{w(t)}\,dt}$, u ∈ [0, 1]. Definition (The logistic GP). $LGP(\mu, \sigma) = L_{*}\, GP(\mu, \sigma)$, i.e., $h \sim LGP(\mu, \sigma) \iff h = Lw$ with $w \sim GP(\mu, \sigma)$
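The logistic transform can be sketched on a grid as follows. This is only an illustration under stated assumptions: the squared-exponential kernel is parameterised here as κ² exp(−(u − u′)² / (2λ²)), which the slides do not specify; the integral is approximated by the trapezoid rule; and the GP draw uses a jittered Cholesky factor. The function names are hypothetical.

```python
import numpy as np


def se_kernel(u, kappa2, lam):
    """Squared-exponential covariance on grid u (parameterisation is an assumption)."""
    d = u[:, None] - u[None, :]
    return kappa2 * np.exp(-0.5 * (d / lam) ** 2)


def trapezoid(values, grid):
    """Trapezoid-rule approximation to the integral of values over grid."""
    return np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid))


def logistic_transform(w, u):
    """(Lw)(u) = exp(w(u)) / integral_0^1 exp(w(t)) dt, evaluated on the grid."""
    e = np.exp(w - np.max(w))      # subtracting the max leaves the ratio unchanged
    return e / trapezoid(e, u)


rng = np.random.default_rng(0)
u = np.linspace(0.0, 1.0, 101)

# One draw w ~ GP(0, kappa^2 * C_SE) restricted to the grid
K = se_kernel(u, kappa2=1.0, lam=0.3) + 1e-6 * np.eye(u.size)  # jitter for stability
w = np.linalg.cholesky(K) @ rng.standard_normal(u.size)

# Corresponding draw h = Lw from the logistic GP
h = logistic_transform(w, u)
```

Because w is continuous and bounded on [0, 1], the resulting h is strictly positive and finite at both endpoints, so it satisfies the endpoint condition 0 < h(0), h(1) < ∞.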
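  • 13. 13/28 sLGP heavy-tailed density estimation on R Model: $Y_1, \ldots, Y_n \sim f(\cdot) = g_\theta(\cdot)\, h(G_\theta(\cdot))$; $h \sim LGP(0, \kappa^2 C^{SE}_{\lambda})$, $(\kappa^2, \lambda^2) \sim Ga^{-1} \times Ga^{-1}$; $g_\theta = t_\nu(\mu, \tau^2)$, $\theta = (\mu, \tau^2, \nu) \sim \frac{1}{\tau^2} \times \pi_\nu(\nu)$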
  • 14. 14/28 $g_\theta = t_3(0, 1)$, λ = 0.3 (top) and 0.08 (bottom) [Figure: for each λ, three panels showing w(u) against u, h(u) against u, and f(y) against y]
  • 15. 15/28 Model fitting Low-rank approximation to GP w Discretization of length-scale λ over a dense grid Precomputed covariance matrix + Cholesky factors Adaptive Metropolis MCMC
  • 18. 18/28 Simulation Simulation setup: mixture of standard normal and centered t4; 100 data sets, n=2000. [Figure: "Simulation 2", density against x for the mixture of centered normal and t with 4 df] Check: 1. Parameters of gθ, specifically the tail index $\nu^{-1}$ 2. Estimation of high (and low) quantiles
  • 19. 19/28 Results LGP Tail Index Estimation
      Parameter | True | Mean estimate | Coverage of 95% CI
      ν         | 4    | 3.76          | 88%
      ξ         | 0.25 | 0.27          | 88%
    Comparison to Generalized Pareto Distribution (GPD): fit GPD to absolute values over Q0.975 (expect n=100 above); fit using maximum likelihood
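The GPD comparison could look roughly like the sketch below. It rests on several assumptions: the threshold is read as the 0.975 sample quantile of the raw data; exceedances are absolute values above that threshold; and the data are simulated from a plain t4 distribution rather than the normal/t4 mixture, whose mixing weights are not given in the slides. The function name `fit_gpd_abs_tail` is hypothetical.

```python
import numpy as np
from scipy import stats


def fit_gpd_abs_tail(y, q=0.975):
    """Fit a GPD by maximum likelihood to excesses of |y| over the q-quantile of y.

    Returns (xi, sigma, threshold, number of exceedances).
    """
    y = np.asarray(y, dtype=float)
    threshold = np.quantile(y, q)
    abs_y = np.abs(y)
    excess = abs_y[abs_y > threshold] - threshold
    # floc=0 fixes the GPD location at zero, since excesses start at the threshold
    xi, _, sigma = stats.genpareto.fit(excess, floc=0.0)
    return xi, sigma, threshold, excess.size


# Illustrative data: t with 4 df (an assumption, not the mixture from the slides)
rng = np.random.default_rng(1)
y = stats.t.rvs(df=4, size=2000, random_state=rng)

xi_hat, sigma_hat, threshold, n_exceed = fit_gpd_abs_tail(y)
```

For a symmetric distribution, roughly 5% of the absolute values exceed the 0.975 quantile, which is in line with about 100 exceedances when n = 2000.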
  • 20. 20/28 Extreme quantiles Upper-tail Bias and Ratio of RMSE [Figure: two panels against p = 0.99, 0.999, 0.9999, 0.99999: quantile bias for the GPD and LGP estimators, and quantile RMSE ratio (GPD/LGP)] GPD has lower quantile RMSE just beyond threshold (p=0.98, 0.99). Otherwise, LGP has lower RMSE in extrapolated tails
  • 21. 21/28 Transition from parametric to non-parametric $F^{-1}(p) \approx G_\theta^{-1}\left(\frac{p}{h(0)}\right)$ and $F^{-1}(1 - p) \approx G_\theta^{-1}\left(1 - \frac{p}{h(1)}\right)$, p ≈ 0 [Figure: two panels, "Dataset 1, Lower Quantiles" and "Dataset 1, Upper Quantiles", plotting quantile against p with LGP, parametric and true curves and CIs]
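The quantile approximation follows from F(y) = H(G_θ(y)), where H is the CDF of h, together with H(u) ≈ h(0)u near 0 and 1 − H(u) ≈ h(1)(1 − u) near 1. The sketch below checks this numerically for a toy case chosen only for illustration: g_θ = t3 and h(u) = 0.5 + u, so that h(0) = 0.5, h(1) = 1.5 and H can be inverted in closed form. None of these choices come from the slides.

```python
import numpy as np
from scipy import stats

g = stats.t(df=3)        # illustrative g_theta
h0, h1 = 0.5, 1.5        # endpoint values of the toy density h(u) = 0.5 + u


def H_inv(p):
    """Inverse of H(u) = 0.5*u + 0.5*u**2, the CDF of h(u) = 0.5 + u on [0, 1]."""
    return (-1.0 + np.sqrt(1.0 + 8.0 * p)) / 2.0


def exact_quantile(p):
    """Exact F^{-1}(p) = G^{-1}(H^{-1}(p)) for f(y) = g(y) h(G(y))."""
    return g.ppf(H_inv(p))


def lower_approx(p):
    """Tail approximation F^{-1}(p) ~ G^{-1}(p / h(0)) for p near 0."""
    return g.ppf(p / h0)


def upper_approx(p):
    """Tail approximation F^{-1}(1 - p) ~ G^{-1}(1 - p / h(1)) for p near 0."""
    return g.isf(p / h1)   # isf(x) = G^{-1}(1 - x), more accurate in the far tail


def rel_err(approx, exact):
    return abs(approx - exact) / abs(exact)


p_small, p_large = 1e-4, 1e-1
err_lower_small = rel_err(lower_approx(p_small), exact_quantile(p_small))
err_upper_small = rel_err(upper_approx(p_small), exact_quantile(1 - p_small))
err_lower_large = rel_err(lower_approx(p_large), exact_quantile(p_large))
```

The relative error is tiny at p = 1e-4 and noticeably larger at p = 0.1, which matches the slide's point that quantiles are driven by the parametric part only far out in the tails.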
  • 22. 22/28 Conclusions and ongoing work Summary: the sLGP approach provides a promising alternative to POT. Despite bias, sLGP reduces variance sufficiently in quantile estimation to provide lower RMSE than GPD. Ongoing work: change gθ for one with different indices in each tail; get theory results for estimation of θ; extend sLGP to multivariate, time-series, spatio-temporal, regression etc.; application to wind speed vs direction analysis
  • 25. 25/28 Asymmetric t: approach 2 Could use $g_{\alpha,\nu}(y) = g(y)\, h_{\alpha,\beta}(G(y))$ with 1. $g(y) = t_{\nu_0}$ 2. $h_{\alpha,\beta}$ is the Be(α, β) pdf. Tail index: Left: $(\alpha \nu_0)^{-1}$, Right: $(\beta \nu_0)^{-1}$
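The stated tail indices can be checked numerically. The sketch below uses illustrative values ν0 = 3, α = 2, β = 0.5 (assumptions, not from the slides) and estimates each tail's polynomial decay exponent from the slope of the log tail probability against log y. That exponent should be close to αν0 on the left and βν0 on the right, i.e. the reciprocals of the tail indices. The function names are hypothetical.

```python
import numpy as np
from scipy import stats

nu0 = 3.0                    # df of the base t density (illustrative)
a_shape, b_shape = 2.0, 0.5  # Beta parameters alpha, beta (illustrative)


def asym_t_pdf(y):
    """Density g(y) * h(G(y)) with g = t_{nu0} and h the Be(alpha, beta) pdf."""
    u = stats.t.cdf(y, df=nu0)
    return stats.t.pdf(y, df=nu0) * stats.beta.pdf(u, a_shape, b_shape)


def left_tail_prob(y):
    """P(Y <= -y) for y > 0, using F(y) = BetaCDF(G(y); alpha, beta)."""
    return stats.beta.cdf(stats.t.cdf(-y, df=nu0), a_shape, b_shape)


def right_tail_prob(y):
    """P(Y > y), using 1 - BetaCDF(u; alpha, beta) = BetaCDF(1 - u; beta, alpha)."""
    return stats.beta.cdf(stats.t.sf(y, df=nu0), b_shape, a_shape)


def decay_exponent(tail_prob, y1=1e2, y2=1e3):
    """Negative log-log slope of the tail probability between y1 and y2."""
    return -(np.log(tail_prob(y2)) - np.log(tail_prob(y1))) / (np.log(y2) - np.log(y1))


left_exponent = decay_exponent(left_tail_prob)    # should be near alpha * nu0
right_exponent = decay_exponent(right_tail_prob)  # should be near beta * nu0
left_tail_index = 1.0 / left_exponent
right_tail_index = 1.0 / right_exponent
```

The right tail is evaluated through the survival function of the t distribution rather than 1 − G(y), which avoids loss of precision when G(y) is very close to 1.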
  • 26. 26/28 Theory for sLGP Theorem. If the true $f^*$ matches the model, $f^*(y) = g_{\theta^*}(y)\, h^*(G_{\theta^*}(y))$, with 1. $\log h^* \in C^{\alpha}([0, 1])$ and 2. $\{g_\theta : \theta \in \Theta\}$ is regular at $\theta^*$ (e.g., Cramér conditions), then $\operatorname{plim}_{n \to \infty} \Pi\left( \|f - f^*\|_1 \ge M_n\, n^{-\alpha/(2\alpha+1)} (\log n)^q \mid Y_1, \ldots, Y_n \right) = 0$ for any $M_n \to \infty$, with $q = (4\alpha + 1)/(4\alpha + 2)$. What about estimation of θ? Is uncertainty suppressed for extreme quantiles?
  • 27. 27/28 Extension to dependent data The primary idea is to use a copula, which offers a conceptually simple and sound extension and keeps model fitting relatively simple. There are concerns about the appropriate choice of copula, particularly in terms of what tail dependence models are derived from it
  • 28. 28/28 Wind analysis Use a polar transformation to represent the data as wind direction and wind speed, and model the latter as possibly heavy-tailed data