The document presents a methodology for decomposing dependence for high-dimensional extremes using a tail pairwise dependence matrix (TPDM). The authors apply the method to analyze spatial precipitation extremes across 409 stations in the United States. They estimate the TPDM from the station data and perform an eigendecomposition to identify principal components that explain extremal dependence structures. The first few eigenvectors and eigenvalues account for a large proportion of the total dependence. The extremal principal components may provide insight into climate drivers of extreme precipitation patterns.
Decompositions of Dependence for High-Dimensional Extremes Applied to Spatial Precipitation
1. Decompositions of Dependence for High-Dimensional Extremes Applied to Spatial Precipitation Extremes
Yujing Jiang (1), Dan Cooley (1), Michael Wehner (2)
(1) Department of Statistics, Colorado State University
(2) Lawrence Berkeley National Laboratory
May 16, 2018
Yujing Jiang, Dan Cooley, Michael Wehner (Universities of Somewhere and Elsewhere)Decompositions of Dependence for High-Dimensional ExtremesMay 16, 2018 1 / 26
2. Outline
1 Prelude
2 Methodology: Tail Pairwise Dependence Matrix (TPDM) and Extreme Principal Component
3 Data Analysis
3. Extreme Precipitation Over US
[Figure: Distribution of the Selected Stations. Map of station locations; axes: long, lat.]
4. Time Series of Scores for a Specific Principal Component
[Figure: ENSO vs score 2. Time series of scores; axes: order, score; legend: low, medium, high.]
Above is a time series of scores for a specific principal component.
In standard PCA of the climate mean state, such a series shows how the pattern changes over time.
Standard PCA uses the covariance matrix, which is not suitable for extremes.
Is there an analogue for extremes? Yes: the tail pairwise dependence matrix (TPDM).
5. Motivation
The covariance matrix is not suitable for describing dependence in extremes.
The dependence structure of multivariate extremes can be formidable to describe, e.g., with a max-stable process.
Tail pairwise dependence matrix (TPDM): uses pairwise extremal dependence to provide a profile of multivariate extremal dependence.
6. Regularly Varying Random Vector
Multivariate regular variation is a framework used to describe tail dependence.
There is a relationship between regularly varying random vectors and heavy-tailed multivariate extreme value distributions.
Suppose X is a random vector in $\mathbb{R}^p$ whose components are all positive.
X is regularly varying if $n \Pr(b_n^{-1} X \in \cdot) \xrightarrow{v} \nu_X(\cdot)$, where $b_n \to \infty$, $\nu_X$ is a measure, and $\xrightarrow{v}$ denotes vague convergence.
7. Regularly Varying Random Vector
$\nu_X(tB) = t^{-\alpha} \nu_X(B)$ for any $t > 0$.
X is regularly varying with index $\alpha > 0$.
$\alpha$ is the index of regular variation; it describes the power-law behavior of the tail of the distribution.
The larger $\alpha$ is, the less likely "extremely large" observations are.
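The role of $\alpha$ can be illustrated with a small numerical check. The sketch below assumes a unit-scale Fréchet variable with $F(x) = \exp(-x^{-\alpha})$ (the slides use this family with $\alpha = 2$ later on); the function names are illustrative only. It shows that far in the tail, $\Pr(X > tx) / \Pr(X > x)$ is close to $t^{-\alpha}$, so a larger $\alpha$ makes very large values rarer.

```python
import math

def frechet_survival(x, alpha):
    """Pr(X > x) for a unit-scale Frechet variable, F(x) = exp(-x**(-alpha))."""
    # -expm1(-y) computes 1 - exp(-y) accurately when y is tiny.
    return -math.expm1(-x ** (-alpha))

def tail_ratio(x, t, alpha):
    """Pr(X > t*x) / Pr(X > x); approaches t**(-alpha) as x grows."""
    return frechet_survival(t * x, alpha) / frechet_survival(x, alpha)

# Doubling a large threshold divides the tail probability by about 2**alpha.
ratio_alpha2 = tail_ratio(1000.0, 2.0, alpha=2.0)
ratio_alpha4 = tail_ratio(1000.0, 2.0, alpha=4.0)
print(ratio_alpha2, ratio_alpha4)
```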
8. Angular Measure
We can do the following polar decomposition: $X = R \times W$
($R = \|X\|$: radial component, $W = \|X\|^{-1} X$: angular component).
An equivalent definition of regular variation: there exists a normalizing sequence $b_n \to \infty$ such that
$n \Pr(b_n^{-1} R > r, W \in A) \xrightarrow{v} r^{-\alpha} H(A)$,
where $p$ is the dimension of $X$ and $H$ is some probability measure on the unit sphere $S_p = \{x \in \mathbb{R}^p \mid \|x\| = 1\}$.
The angular measure $H$ describes the distribution of directions and completely describes the dependence.
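As a rough illustration of the polar decomposition, the sketch below splits each observation into a radius and a direction and keeps the directions of the largest observations. It assumes the L2 norm (the norm used later for the TPDM); the toy Pareto data, the 95% threshold, and the function name are assumptions for illustration, not part of the analysis.

```python
import numpy as np

def polar_decompose(x):
    """Split rows of x (n, p) into radii r (n,) and unit-norm directions w (n, p)."""
    r = np.linalg.norm(x, axis=1)   # radial component R = ||X||_2
    w = x / r[:, None]              # angular component W = X / ||X||_2
    return r, w

rng = np.random.default_rng(0)
x = rng.pareto(2.0, size=(1000, 3)) + 1.0   # positive, heavy-tailed toy data

r, w = polar_decompose(x)
r0 = np.quantile(r, 0.95)                   # illustrative high threshold
w_extreme = w[r > r0]                       # directions of the largest observations
```

The empirical distribution of `w_extreme` is what an estimate of the angular measure would be built from.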
9. Polar Decomposition in a Picture
$n \Pr(b_n^{-1} R > r, W \in A) \xrightarrow{v} r^{-\alpha} H(A)$
H is hard to estimate in high dimensions. Therefore, the TPDM can be used to summarize dependence.
10. Note
The definition of regular variation implies that each marginal random variable is univariate regularly varying with index $\alpha$, which is the same for all margins.
If this is not the case for the data, apply the probability integral transform, which is common practice for multivariate extremes.
In our case, we transform the marginal distributions to Fréchet with $\alpha = 2$, i.e., $F(x) = \exp(-x^{-2})$.
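A minimal sketch of the marginal transformation, assuming a simple rank-based empirical CDF for each station (the slides do not specify the marginal estimator, and the function name is hypothetical). Each value is mapped to $u = \text{rank}/(n+1)$ and then through the inverse of $F(z) = \exp(-z^{-2})$.

```python
import numpy as np

def to_frechet_alpha2(x):
    """Rank-transform one series to approximately Frechet(alpha = 2) margins."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1   # ranks 1..n; ties are broken by position
    u = ranks / (n + 1.0)                   # empirical CDF values strictly inside (0, 1)
    return (-np.log(u)) ** (-0.5)           # solves exp(-z**-2) = u for z

z = to_frechet_alpha2([3.0, 1.0, 2.0])
```

Daily precipitation has many tied zeros, so a real analysis would need a deliberate treatment of ties; this sketch only breaks them by position.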
11. Definition of TPDM
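Assume $X \in \mathbb{R}^p$, $X > 0$, and regularly varying with $\alpha = 2$.
Then $n \Pr(n^{-1/2} X \in \cdot) \xrightarrow{v} \nu_X(\cdot)$, where $\nu_X(dr \times dw) = 2 r^{-3} \, dr \, dH_X(w)$.
$H_X$ is a Radon measure on the positive part of the $L_2$ unit sphere, $\Theta_{p-1} = \{w \in \mathbb{R}^p, w > 0 : \|w\|_2 = 1\}$.
Pairwise dependency: $\sigma_{X_{ik}} = \int_{\Theta_{p-1}} w_i w_k \, dH_X(w)$.
TPDM: $\Sigma_X = (\sigma_{X_{ik}})_{i,k=1,\dots,p}$.
The TPDM summarizes the multivariate extremal dependence through a pairwise matrix.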
12. Analogous Properties between TPDM and Covariance Matrix
The diagonal elements indicate the scale of the corresponding random
variable.
If an off-diagonal element is 0, that indicates asymptotic
independence of the corresponding pair.
TPDM is non-negative definite.
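The non-negative definiteness follows directly from the definition of the TPDM as an integral of $w_i w_k$ against $H_X$. A short derivation, using only the symbols defined on the previous slide and an arbitrary vector $a \in \mathbb{R}^p$:

```latex
a^{\top} \Sigma_X a
  = \sum_{i=1}^{p} \sum_{k=1}^{p} a_i a_k \int_{\Theta_{p-1}} w_i w_k \, dH_X(w)
  = \int_{\Theta_{p-1}} (a^{\top} w)^2 \, dH_X(w) \;\ge\; 0 .
```

The same argument with $a$ equal to a unit coordinate vector gives $\sigma_{X_{ii}} = \int_{\Theta_{p-1}} w_i^2 \, dH_X(w)$, which is why the diagonal reflects the scale of each component.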
13. Estimation of TPDM
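$\Sigma_X$ is estimated as follows:
$\hat{\sigma}_{ij} = 2 \, n_{exc,ij}^{-1} \sum_{t=1}^{n_{samp}} w_{ti} w_{tj} \, I(r_{tij} > r_0)$,
where $r_{tij} = \|(x_{ti}, x_{tj})\|_2$, $(w_{ti}, w_{tj}) = (x_{ti}, x_{tj}) / r_{tij}$, and $n_{exc,ij} = \sum_{t=1}^{n_{samp}} I(r_{tij} > r_0)$.
The factor 2 is the total mass of the pairwise angular measure when $\alpha = 2$:
$\int_{w_i^2 + w_j^2 = 1, \, w_i \ge 0, \, w_j \ge 0} dH_{ij}(w_{ij}) = 2$.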
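The estimator above can be sketched in a few lines. This is a minimal version under stated assumptions: the input already has Fréchet ($\alpha = 2$) margins, the threshold $r_0$ is taken as the `q`-quantile of each pair's radius (an interpretation of the "q = 0.98" used later in the slides), and no declustering is applied. The function name is hypothetical.

```python
import numpy as np

def estimate_tpdm(x, q=0.98):
    """Pairwise TPDM estimate for x of shape (n_samp, p) with Frechet(2) margins."""
    n_samp, p = x.shape
    sigma = np.zeros((p, p))
    for i in range(p):
        for j in range(i, p):
            r = np.hypot(x[:, i], x[:, j])   # pairwise radius r_tij
            r0 = np.quantile(r, q)           # assumed threshold: q-quantile of r
            exc = r > r0                     # indicator I(r_tij > r0)
            w_i = x[exc, i] / r[exc]         # pairwise angular components
            w_j = x[exc, j] / r[exc]
            # 2 * (1 / n_exc) * sum of w_ti * w_tj over exceedances
            sigma[i, j] = sigma[j, i] = 2.0 * np.mean(w_i * w_j)
    return sigma
```

With this normalization the diagonal entries equal 1 exactly, since $w_{ti} = w_{tj} = 1/\sqrt{2}$ when $i = j$. Because each pair uses its own set of exceedances, the resulting matrix is not guaranteed to be non-negative definite in finite samples.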
14. Eigen decomposition
Non-extreme PCA:
Eigendecomposition of $\Sigma_{cov} = P D P^T$.
P: a matrix whose columns are the eigenvectors $p_i$, which form a basis for $\mathbb{R}^p$ and allow for reconstruction.
D: a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_i$.
Extreme PCA:
Eigendecomposition of $\Sigma_{TPDM} = P D P^T$.
P: a matrix whose columns are the eigenvectors $p_i$; the vectors $e_i = t(p_i)$ form a basis in the positive orthant (Cooley and Emeric, 2018).
D: a diagonal matrix of eigenvalues $\lambda_i$.
The $\lambda_i$ relate to the scale of the extreme principal components.
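The decomposition itself is a standard symmetric eigenproblem. The sketch below uses `numpy.linalg.eigh`, sorts the eigenpairs from largest to smallest eigenvalue, and reports each eigenvalue's share of the total; the function name and the 2 x 2 example matrix are illustrative only.

```python
import numpy as np

def eigen_tpdm(sigma):
    """Eigendecomposition of a symmetric matrix, largest eigenvalue first."""
    evals, evecs = np.linalg.eigh(sigma)   # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]        # reorder to descending
    evals = evals[order]
    evecs = evecs[:, order]                # columns are the eigenvectors p_i
    share = evals / evals.sum()            # proportion of the sum of eigenvalues
    return evals, evecs, share

sigma_toy = np.array([[1.0, 0.5],
                      [0.5, 1.0]])
evals, P, share = eigen_tpdm(sigma_toy)
```

For the toy matrix the eigenvalues are 1.5 and 0.5, so the first component accounts for 75% of the total.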
15. Data
Dataset: Global Historical Climatology Network (GHCN).
Total daily precipitation over the continental US from 1950 to 2016.
Stations with less than 1% missing values during this period were selected (409 stations in total).
Further, we focus on the hurricane season (August to October): 6162 days.
The data have been pre-processed.
The ENSO index (Oceanic Niño Index, ONI) is also collected for August to October (ASO) from 1950 to 2016 to explore possible links with the extremal principal components.
16. Data Preprocessing
Aspects of the data that are inconvenient for analysis:
1 Extremes that happen on consecutive days need to be aligned.
Solution: 3-day moving average.
2 The 3-day moving average may count the same jump repeatedly and induce clustering in the data.
Solution: ECDF smoothing to alleviate the repeated counting of extremes. Further declustering is done in the estimation of the TPDM (Coles, 2001).
3 To apply our approach, the data need to be transformed so that they are regularly varying with $\alpha = 2$.
Solution: probability integral transformation.
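The first step in the list above can be sketched as follows. The slides do not say whether the window is centered or trailing or how season boundaries are handled, so this version simply averages each day with its two neighbors over one contiguous block and drops the two end days; the function name is hypothetical.

```python
import numpy as np

def moving_average_3day(precip):
    """3-day moving average per station.

    precip: array of shape (n_days, n_stations) for one contiguous block of days.
    Returns an array of shape (n_days - 2, n_stations).
    """
    precip = np.asarray(precip, dtype=float)
    # Average day t-1, day t, and day t+1 for every interior day t.
    return (precip[:-2] + precip[1:-1] + precip[2:]) / 3.0

smoothed = moving_average_3day([[0.0], [3.0], [6.0], [9.0]])
```

A single large daily total contributes to three consecutive averages, which is the repeated counting that the second item in the list addresses.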
17. Estimate the TPDM $\Sigma$ (q = 0.98) and carry out the eigendecomposition of the matrix: $\Sigma = P D P^T$.
The first 10 eigenvalues ($\lambda_i$) are:
105.98, 19.63, 15.02, 11.29, 10.32, 8.51, 6.80, 6.44, 5.70, 5.06.
The corresponding proportions of these eigenvalues relative to the sum of all eigenvalues are:
25.9%, 4.80%, 3.67%, 2.76%, 2.52%, 2.08%, 1.66%, 1.57%, 1.37%, 1.23%.
See the following pages for the eigenvectors.
We also calculate the extremal principal components $V = P^T t^{-1}(X) = (v_1, \dots, v_p)$.
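A sketch of how the scores $V = P^T t^{-1}(X)$ and a rank-k reconstruction could be computed. The slides do not define the transform $t$; this example assumes the softplus map $t(y) = \log(1 + e^y)$ from $\mathbb{R}$ to $(0, \infty)$, and it assumes that reconstruction with the first k eigenvectors means applying $t$ to the truncated linear combination. All function names are hypothetical.

```python
import numpy as np

def t(y):
    """Assumed transform: softplus, log(1 + exp(y)), mapping R to (0, inf)."""
    return np.logaddexp(0.0, y)

def t_inv(x):
    """Inverse softplus for x > 0: log(exp(x) - 1), written to avoid overflow."""
    return x + np.log(-np.expm1(-x))

def extremal_scores(x, P):
    """Scores for x of shape (n, p); row s holds V_s = P^T t^{-1}(x_s)."""
    return t_inv(x) @ P

def reconstruct(v, P, k):
    """Map the first k scores back to the positive orthant."""
    return t(v[:, :k] @ P[:, :k].T)
```

With all p eigenvectors, `reconstruct` returns the original data up to rounding, because P is orthogonal; smaller k keeps only the leading patterns.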
21. Reconstruction using Eigen Vectors
[Figure: four maps; axes: long, lat; color scale: vec (0 to 60). Panels: Observation Values; Reconstruction of the Observation Using the First 409 Eigenvectors; Reconstruction of the Observation Using the First 2 Eigenvectors; Reconstruction of the Observation Using the First 10 Eigenvectors.]
22. Likelihood Ratio Test for Score 2
[Figure: ENSO vs score 2. Time series of scores; axes: order, score; legend: low, medium, high.]
Large scores in the negative direction indicate precipitation on the east coast.
Likelihood ratio test: H0: there is no difference in the largest scores (large in magnitude) in the negative direction between the La Niña phase and the rest.
The p-value of the likelihood ratio test, obtained by fitting a generalized Pareto distribution (GPD) to the values above the 95th percentile after declustering, is 0.01368.
Power may be low.
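The slides do not spell out the model behind the test, so the following is only one plausible sketch. It assumes the null hypothesis is a single GPD shared by both groups of threshold exceedances, the alternative is a separate GPD (scale and shape) per group, and the statistic is compared with a chi-square distribution with 2 degrees of freedom. It uses `scipy.stats.genpareto`; the function names are hypothetical, and the inputs are assumed to be declustered exceedances over the threshold.

```python
import numpy as np
from scipy import stats

def gpd_loglik(exceedances):
    """Maximized GPD log-likelihood with the location fixed at 0."""
    shape, loc, scale = stats.genpareto.fit(exceedances, floc=0.0)
    return np.sum(stats.genpareto.logpdf(exceedances, shape, loc=loc, scale=scale))

def gpd_lr_test(exc_a, exc_b):
    """Likelihood ratio test of a common GPD versus one GPD per group."""
    ll_separate = gpd_loglik(exc_a) + gpd_loglik(exc_b)
    ll_pooled = gpd_loglik(np.concatenate([exc_a, exc_b]))
    # Clamp at 0 in case the numerical fits leave a tiny negative difference.
    stat = max(2.0 * (ll_separate - ll_pooled), 0.0)
    p_value = stats.chi2.sf(stat, df=2)   # two extra parameters under the alternative
    return stat, p_value
```

Here `exc_a` would hold the exceedances from La Niña periods and `exc_b` those from the remaining periods. With few exceedances per group the chi-square approximation is rough, which is consistent with the remark about low power.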
25. Summary
Used the tail pairwise dependence matrix (TPDM) to describe the extremal dependence among precipitation at stations over the continental US.
Obtained the eigenvectors of the TPDM to look for patterns of possible interest.
Compared the time series of scores of the eigenvectors with the ENSO phase. The likelihood ratio test for the scores of eigenvector 2 shows a statistically significant difference between the La Niña phase and the rest.
Any input would be more than welcome.
Thanks!
26. Reference
Coles, S. G. (2001), An Introduction to Statistical Modeling of Extreme Values, London: Springer.
Cooley, D. and Emeric, T. (2018), "Decompositions of Dependence for High-Dimensional Extremes."