Generalized probabilistic principal component analysis
of correlated data
Mengyang Gu and Weining Shen
Department of Applied Mathematics and Statistics
Johns Hopkins University
Department of Statistics
University of California, Irvine
SAMSI BFF Conference
Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
Correctly specified models
Misspecified models
5 Real examples
Humanity computer model with multiple outputs
Global gridded temperature anomalies
6 Future directions
NOAA monthly gridded temperature anomalies
Figure 1: NOAA monthly gridded temperature anomalies (°C), plotted over longitude and latitude, for February 2017 and December 2018.
Ground deformation by radar interferograms
Figure 2: Five interferometric synthetic aperture radar (InSAR) interferograms spanning the following time periods: 1) 17 Oct 2011 - 04 May 2012; 2) 21 Oct 2011 - 16 May 2012; 3) 20 Oct 2011 - 15 May 2012; 4) 28 Oct 2011 - 11 May 2012; 5) 12 Oct 2011 - 07 May 2012. The black curves show cliffs and other important topographic features at Kīlauea; the large elliptical feature is Kīlauea Caldera. The color indicates the ground deformation rate per year (m/yr). The figures are from [Gu and Anderson, 2018].
Emulation of computer models with multiple outputs
Figure 3: Median (truncated at 20 meters at the volcanic center region) and interquartile range of the GaSP emulator of 'maximum flow height over time' for TITAN2D, at 23,040 spatial locations over Montserrat Island and for the new input values V* = 10^6.9984, φ* = 3.3487, δ*_bed = 10.8790, and δ*_int = 31.0300. The figures are from [Gu and Berger, 2016].
Multiple sequences/time series
Figure 4: Empirical correlation of methylation levels across sites (left panel: correlation against CpG site distance) and across samples (right panel), based on 24 samples and one million methylation levels in chromosome 1 of each sample. The figures are from [Gu and Xu, 2017].
Similar data include
multiple time series
health records
A latent factor model
Let y(x) = (y_1(x), ..., y_k(x))^T be a k-dimensional real-valued output vector at a p-dimensional input vector x. Assume y_j(x) has zero mean for now.

Consider the following latent factor model:

y(x) = A z(x) + ε.    (1)

The k × d factor loading matrix A = [a_1, ..., a_d] relates the k-dimensional outputs to the d-dimensional factor processes z(x) = (z_1(x), ..., z_d(x))^T, where d ≤ k. Any two factor processes are assumed independent.

Assume Z_l = (z_l(x_1), ..., z_l(x_n)) follows a multivariate normal distribution,

Z_l^T ~ MN(0, Σ_l),    (2)

where Σ_l can be parameterized by a covariance function such that the (i, j) entry of Σ_l is σ_l^2 K_l(x_i, x_j), with K_l(·, ·) a kernel function, for l = 1, ..., d and 1 ≤ i, j ≤ n.

This model is often referred to as the semiparametric latent factor model [Seeger et al., 2005, Alvarez et al., 2012]; it is a special case of the linear model of coregionalization (LMC) [Gelfand et al., 2004].
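As a quick illustration, here is a minimal numpy sketch that simulates data from model (1)-(2) on a one-dimensional input, assuming the Matérn kernel with roughness parameter 2.5 that is used later in (14); all names and parameter values are illustrative.

```python
import numpy as np

def matern_25(x, gamma):
    """Matern kernel with roughness parameter 2.5, as in (14), on 1-d inputs x."""
    d = np.abs(x[:, None] - x[None, :])
    return (1 + np.sqrt(5) * d / gamma + 5 * d**2 / (3 * gamma**2)) * np.exp(-np.sqrt(5) * d / gamma)

rng = np.random.default_rng(0)
n, k, d_fac = 100, 8, 2                      # observations, outputs, latent factors
x = np.arange(1, n + 1, dtype=float)         # one-dimensional inputs
sigma2, sigma2_0, gamma = 1.0, 0.25, 100.0   # factor variance, noise variance, range

# Orthonormal loading matrix A (this anticipates Assumption 1: A^T A = I_d)
A, _ = np.linalg.qr(rng.standard_normal((k, d_fac)))

# Each factor process Z_l ~ MN(0, sigma2 * K), sampled via a Cholesky factor
K = sigma2 * matern_25(x, gamma) + 1e-10 * np.eye(n)   # small jitter for stability
L = np.linalg.cholesky(K)
Z = (L @ rng.standard_normal((n, d_fac))).T            # d x n latent factor matrix

# Observations: Y = A Z + noise, a k x n matrix
Y = A @ Z + np.sqrt(sigma2_0) * rng.standard_normal((k, n))
```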
Estimation of the factor loading matrix
Let Y = [y(x_1), ..., y(x_n)] be the k × n matrix of observations and Z = [z(x_1), ..., z(x_n)] the d × n latent factor matrix.

It is popular to estimate A by PCA: [Higdon et al., 2008, Paulo et al., 2012] estimate A by the first d columns of √n U_0 D_0^{1/2}, where U_0 D_0 U_0^T is the eigendecomposition of Y Y^T / n.

[Tipping and Bishop, 1999] study the latent factor model Y = A Z + ε with independent standard normal factors. Assuming each row of Y has zero mean, the maximum marginal likelihood estimator (MMLE) of A is the first d columns of U_0 (D_0 − σ_0^2 I_k)^{1/2} R, where R is an arbitrary d × d orthogonal rotation matrix.

Note that model (1) is unchanged if one replaces the pair (A, z(x)) by (A E, E^{-1} z(x)) for any invertible matrix E. So only the linear subspace of A, denoted M(A), can be uniquely determined.

The linear subspace given by the PCA above for the LMC model and the one given by the MMLE of the (independent) latent factor model are the same.
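Continuing the simulation sketch above, both subspace estimates take a few lines; using the average of the discarded eigenvalues as the plug-in noise variance is the standard PPCA closed form.

```python
# PCA loadings (first d columns of sqrt(n) U_0 D_0^{1/2}) and the PPCA MMLE
# of [Tipping and Bishop, 1999], both spanning the same estimated subspace.
eig_val, eig_vec = np.linalg.eigh(Y @ Y.T / n)
order = np.argsort(eig_val)[::-1]
U0, D0 = eig_vec[:, order[:d_fac]], eig_val[order[:d_fac]]

A_pca = np.sqrt(n) * U0 * np.sqrt(D0)                    # scales column l by sqrt(D0[l])
sig2_0_ppca = eig_val[order[d_fac:]].mean()              # average of the discarded eigenvalues
A_ppca = U0 * np.sqrt(np.maximum(D0 - sig2_0_ppca, 0))   # MMLE, up to a rotation R
```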
Research goals
What is the maximum marginal likelihood estimator of the factor
loadings (and other parameters) in the latent factor model (1) (where
the factors are dependent)?
What are the predictive distributions of the new data?
Are they computationally feasible?
If we have additional regressors (covariates), can we also combine
them in the model in a coherent way?
Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
5 Real examples
6 Future directions
Orthogonal assumption
Since only the linear subspace M(A) of the factor loading matrix is identifiable, we assume the columns of A in model (1) are orthonormal.

Assumption 1:

A^T A = I_d.    (3)

One may instead assume A^T A = c I_d for a positive constant c that can depend on k, e.g. c = k. But since the variance parameters of the factor processes are estimated from the data, we focus on Assumption 1.

This assumption is also the key to some other estimators of the factor loading matrix [Lam et al., 2011, Lam and Yao, 2012].

The MLE of A under Assumption 1 (without marginalizing out Z) is U_0 R, where U_0 contains the first d ordered eigenvectors of Y Y^T / n and R is an orthogonal rotation matrix (the same subspace as the PCA). For example, [Bai and Ng, 2002] and [Bai, 2003] assume A^T A = k I_d and estimate A by √k U_0 in modeling high-dimensional time series.
Marginal likelihood
(Known expression of the marginal likelihood.) Denote the vectorization of the output by Y_v = vec(Y), and let Z = (z(x_1), ..., z(x_n)) be the d × n latent factor matrix at the inputs {x_1, ..., x_n}. After marginalizing out Z, Y_v follows a multivariate normal distribution ([Banerjee et al., 2014]):

Y_v | A, σ_0^2, Σ_1, ..., Σ_d ~ MN( 0, Σ_{l=1}^d Σ_l ⊗ (a_l a_l^T) + σ_0^2 I_{nk} ).

Lemma 1 (Marginal likelihood)

Under Assumption 1, the marginal distribution of Y_v in model (1) can equivalently be written as

Y_v | A, σ_0^2, Σ_1, ..., Σ_d ~ MN( 0, σ_0^2 [ I_{nk} − Σ_{l=1}^d (σ_0^2 Σ_l^{-1} + I_n)^{-1} ⊗ (a_l a_l^T) ]^{-1} ).
Theorem 1 (Maximum marginal likelihood estimator)
For model (1), under Assumption 1, after marginalizing out Z:

1. If Σ_1 = ... = Σ_d = Σ, the marginal likelihood is maximized at

Â = U R,    (4)

where U is the k × d matrix of the first d principal eigenvectors of

G = Y (σ_0^2 Σ^{-1} + I_n)^{-1} Y^T,    (5)

and R is an arbitrary d × d orthogonal rotation matrix.

2. If the covariances of the factor processes differ, denote G_l = Y (σ_0^2 Σ_l^{-1} + I_n)^{-1} Y^T; the maximum marginal likelihood estimator of A is

Â = argmax_A Σ_{l=1}^d a_l^T G_l a_l,   s.t. A^T A = I_d.    (6)

A numerical optimization algorithm that preserves the orthogonality constraint in (6) is introduced in [Wen and Yin, 2013].
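For the shared-covariance case, (4)-(5) can be sketched as follows, reusing Y, x and the kernel from the simulation above and assuming (σ_0^2, σ^2, γ) known for illustration; in practice they are estimated by maximum marginal likelihood, as on the next slide.

```python
# GPPCA estimate of the loading subspace under a shared covariance (Theorem 1, part 1).
Sigma = sigma2 * matern_25(x, gamma)
# Identity (sigma_0^2 Sigma^{-1} + I_n)^{-1} = Sigma (Sigma + sigma_0^2 I_n)^{-1}
# avoids inverting Sigma directly; one linear solve suffices.
G = Y @ (Sigma @ np.linalg.solve(Sigma + sigma2_0 * np.eye(n), Y.T))
w, V = np.linalg.eigh(G)
A_gppca = V[:, np.argsort(w)[::-1][:d_fac]]   # \hat A = U R, taking R = I_d
```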
Generalized probabilistic principal component analysis
The estimator in Theorem 1 is called the generalized probabilistic principal component analysis (GPPCA); it is a direct extension of the PPCA of [Tipping and Bishop, 1999] to correlated factors.

For demonstration purposes, let the (i, j) entry of Σ_l be σ_l^2 K_l(x_i, x_j), where K_l(·, ·) is a kernel function with parameters γ_l.

Denote the signal-to-noise ratio (SNR) τ_l = σ_l^2 / σ_0^2, and let τ = (τ_1, ..., τ_d) and γ = (γ_1, ..., γ_d). The maximum marginal likelihood estimator of σ_0^2 becomes a function of Â, τ and γ: σ̂_0^2 = Ŝ^2 / (nk), where

Ŝ^2 = tr(Y^T Y) − Σ_{l=1}^d â_l^T Y (τ_l^{-1} K_l^{-1} + I_n)^{-1} Y^T â_l.

Plugging in Â and σ̂_0^2, the marginal likelihood satisfies

L(τ, γ | Y, Â, σ̂_0^2) ∝ [ Π_{l=1}^d |τ_l K_l + I_n|^{-1/2} ] (Ŝ^2)^{-nk/2}.    (7)

After obtaining (τ̂, γ̂) by maximizing this marginal likelihood, one gets Â, σ̂_0^2, and σ̂_l^2 for l = 1, ..., d.
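A sketch of evaluating and maximizing the profile likelihood (7) in the shared-kernel case, so that Â has the closed form of Theorem 1; optimizing over (log τ, log γ) with a generic optimizer is one reasonable choice, and the starting values here are arbitrary.

```python
from scipy.optimize import minimize

def neg_log_profile_lik(params):
    """Negative log of the profile likelihood (7), kernel shared by all factors."""
    tau, gam = np.exp(params)                                      # optimize on the log scale
    K = matern_25(x, gam)
    W = tau * K @ np.linalg.solve(tau * K + np.eye(n), np.eye(n))  # (tau^{-1} K^{-1} + I_n)^{-1}
    G = Y @ W @ Y.T
    w, V = np.linalg.eigh(G)
    A_hat = V[:, np.argsort(w)[::-1][:d_fac]]                      # closed-form A-hat (Theorem 1)
    S2 = np.trace(Y.T @ Y) - sum(a @ G @ a for a in A_hat.T)
    return 0.5 * d_fac * np.linalg.slogdet(tau * K + np.eye(n))[1] + 0.5 * n * k * np.log(S2)

opt = minimize(neg_log_profile_lik, x0=np.log([4.0, 50.0]), method="Nelder-Mead")
tau_hat, gamma_hat = np.exp(opt.x)   # then sigma_0^2 = S^2/(nk) and sigma_l^2 = tau_l * sigma_0^2
```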
Computational complexity
Each evaluation of the likelihood in (7) costs max(O(d n^3), O(d k n)) operations in general.

Each evaluation of the optimization objective for estimating A in Theorem 1 also costs max(O(d n^3), O(d k n)) in general; solving the eigenproblem when the covariance is shared costs min(O(k n^2), O(k^2 n)).

When the input is one-dimensional and a Matérn kernel is used, computing the likelihood in (7) takes only O(d k n) operations, without any approximation (see e.g. [Whittle, 1954, Hartikainen and Sarkka, 2010]). The R package FastGaSP on CRAN implements this fast algorithm for Gaussian processes with the Matérn kernel [Gu, 2019].

Directly solving the eigenproblem still costs min(O(k n^2), O(k^2 n)), but the iterative algorithm costs O(d k n).
Let Σ̂_l be the estimated covariance matrix of the l-th factor, whose (i, j) element is σ̂_l^2 K̂_l(x_i, x_j), obtained by plugging in σ̂_l^2 and γ̂_l.

Theorem 2 (Predictive distribution)

Under Assumption 1, for any x*,

Y(x*) | Y, Â, γ̂, σ̂^2, σ̂_0^2 ~ MN( μ̂*(x*), Σ̂*(x*) ),

where

μ̂*(x*) = Â ẑ(x*),    (8)

with ẑ(x*) = (ẑ_1(x*), ..., ẑ_d(x*))^T, ẑ_l(x*) = Σ̂_l^T(x*) (Σ̂_l + σ̂_0^2 I_n)^{-1} Y^T â_l, and Σ̂_l(x*) = σ̂_l^2 (K̂_l(x_1, x*), ..., K̂_l(x_n, x*))^T for l = 1, ..., d, and

Σ̂*(x*) = Â D̂(x*) Â^T + σ̂_0^2 (I_k − Â Â^T),    (9)

with D̂(x*) a diagonal matrix whose l-th diagonal term is

D̂_l(x*) = σ̂_l^2 K̂_l(x*, x*) + σ̂_0^2 − Σ̂_l^T(x*) (Σ̂_l + σ̂_0^2 I_n)^{-1} Σ̂_l(x*).
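A sketch of the predictive formulas (8)-(9) for the shared-kernel running example, where D̂(x*) reduces to a common scalar times I_d; sig2, sig2_0 and gam stand for the plug-in estimates from the fits above.

```python
def predict(x_star, A_hat, sig2, sig2_0, gam):
    """Predictive mean (8) and covariance (9) at a new input x_star."""
    dd = np.abs(x - x_star)
    r = sig2 * (1 + np.sqrt(5)*dd/gam + 5*dd**2/(3*gam**2)) * np.exp(-np.sqrt(5)*dd/gam)
    Sigma_hat = sig2 * matern_25(x, gam)
    w = np.linalg.solve(Sigma_hat + sig2_0 * np.eye(n), r)   # (Sigma_l + sigma_0^2 I_n)^{-1} Sigma_l(x*)
    z_hat = A_hat.T @ (Y @ w)                                # \hat z_l(x*), stacked over l
    mu = A_hat @ z_hat                                       # predictive mean (8)
    D = sig2 + sig2_0 - r @ w                                # \hat D_l(x*), identical for all l here
    cov = D * (A_hat @ A_hat.T) + sig2_0 * (np.eye(k) - A_hat @ A_hat.T)  # covariance (9)
    return mu, cov
```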
Illustrative example
Example 1

The data are sampled from the latent factor model (1) with the shared covariance matrix Σ_1 = Σ_2 = Σ, where x is equally spaced from 1 to n and the kernel function follows (14) with γ = 100 and σ^2 = 1. We choose k = 2, d = 1 and n = 100. Two scenarios are implemented, with σ_0^2 = 0.01 and σ_0^2 = 1, respectively. The parameters (σ_0^2, σ^2, γ) are treated as unknown and estimated from the data.
Figure 5: Estimation of the factor loading matrix by the PCA and GPPCA for Example 1, with the noise variance σ_0^2 = 0.01 and σ_0^2 = 1 in the upper and lower panels, respectively. The circles and dots are the first and second rows of Y in the left panels, and of Ỹ = Y L in the middle panels, where L = U D^{1/2}, with U the eigenvectors and the diagonal entries of D the eigenvalues of (σ̂_0^2 Σ̂^{-1} + I_n)^{-1}. In the right panels, the black, red and blue lines are the subspace of A, the first eigenvector of Y Y^T, and the first eigenvector of Y (σ̂_0^2 Σ̂^{-1} + I_n)^{-1} Y^T, respectively, with the black triangles being the outputs.
Estimation of the mean
Figure 6: Estimation of AZ for Example 1, with the noise variance σ_0^2 = 0.01 and σ_0^2 = 1 in the upper and lower panels, respectively. The first and second rows of Y are graphed as the black curves in the left and right panels, respectively. The red dotted curves and blue dashed curves are the predictions by the PCA and GPPCA, respectively. The grey region is the 95% posterior credible interval from the GPPCA.
Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
5 Real examples
6 Future directions
Latent factor model with covariates
Consider the latent factor model with a mean structure for the k-dimensional output vector at the input x:

y(x) = (h(x) B)^T + A z(x) + ε,    (10)

where h(x) is a 1 × q vector of known mean basis functions of the input x and possibly other covariates, and B = (β_1, ..., β_k) is a q × k matrix of the mean (or trend) parameters. Denote M = I_n − H (H^T H)^{-1} H^T. We have the following lemma for the maximum marginal likelihood estimator of the variance.

Lemma 2

Assume the objective prior π(B) ∝ 1. Under Assumption 1, after marginalizing out B and Z, the maximum likelihood estimator of σ_0^2 is σ̂_0^2 = S_M^2 / (k(n − q)), where

S_M^2 = tr(Y M Y^T) − Σ_{l=1}^d a_l^T Y M (M + τ_l^{-1} K_l^{-1})^{-1} M Y^T a_l.

Moreover, the marginal density of the data satisfies

p(Y | A, τ, γ, σ̂_0^2) ∝ [ Π_{l=1}^d |τ_l K_l + I_n|^{-1/2} |H^T (τ_l K_l + I_n)^{-1} H|^{-1/2} ] (S_M^2)^{-k(n−q)/2}.
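A small sketch of the mean-adjusted quantities in Lemma 2, assuming a hypothetical trend basis h(x) = (1, x); K_list and tau stand for plug-in kernel matrices and SNRs and are illustrative names.

```python
# Hypothetical trend basis h(x) = (1, x); H is the n x q design matrix.
H = np.column_stack([np.ones(n), x])
q = H.shape[1]
M = np.eye(n) - H @ np.linalg.solve(H.T @ H, H.T)   # M = I_n - H (H^T H)^{-1} H^T

def S2_M(A_hat, tau, K_list):
    """S_M^2 = tr(Y M Y^T) - sum_l a_l^T Y M (M + tau_l^{-1} K_l^{-1})^{-1} M Y^T a_l."""
    total = np.trace(Y @ M @ Y.T)
    for l, a in enumerate(A_hat.T):
        # (M + tau_l^{-1} K_l^{-1})^{-1} = (K_l M + tau_l^{-1} I_n)^{-1} K_l, via one solve
        inner = np.linalg.solve(K_list[l] @ M + np.eye(n) / tau[l], K_list[l])
        total -= a @ Y @ M @ inner @ M @ Y.T @ a
    return total
```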
GPPCA with the mean structure
Since there is no closed-form expression for the kernel parameters (τ, γ), one can estimate A and the remaining parameters by numerically maximizing

Â = argmax_A Σ_{l=1}^d a_l^T G_{l,M} a_l,   s.t. A^T A = I_d,    (11)

(τ̂, γ̂) = argmax_{(τ, γ)} p(Y | Â, τ, γ).    (12)

When Σ_1 = ... = Σ_d, a closed-form expression for Â can be obtained as in Theorem 1. In general, we use the approach in [Wen and Yin, 2013] to solve the constrained optimization problem in (11); a schematic version is sketched below. After obtaining τ̂ and σ̂_0^2, we transform them to get σ̂_l^2 = τ̂_l σ̂_0^2 for l = 1, ..., d.
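A schematic, fixed-step version of the Cayley-transform update that [Wen and Yin, 2013] use to preserve A^T A = I_d; their method adds a curvilinear line search with adaptive step sizes, which this sketch omits for clarity. G_list stands for the matrices G_{l,M} and is an illustrative name.

```python
def cayley_ascent(G_list, A0, step=1e-3, iters=500):
    """Maximize sum_l a_l^T G_l a_l over A^T A = I_d via Cayley retractions."""
    A = A0.copy()
    m = A.shape[0]
    for _ in range(iters):
        grad = np.column_stack([2 * G_list[l] @ A[:, l] for l in range(A.shape[1])])
        W = grad @ A.T - A @ grad.T              # skew-symmetric ascent direction
        A = np.linalg.solve(np.eye(m) - 0.5 * step * W,
                            (np.eye(m) + 0.5 * step * W) @ A)  # Cayley step keeps A^T A = I_d
    return A
```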
Theorem 3 (Predictive distribution)

Under Assumption 1, after marginalizing out Z and B with the objective prior π(B) ∝ 1, the predictive distribution of model (10) at any x* is

Y(x*) | Y, Â, γ̂, σ̂^2, σ̂_0^2 ~ MN( μ̂_M^*(x*), Σ̂_M^*(x*) ).

Here

μ̂_M^*(x*) = (h(x*) B̂)^T + Â ẑ_M(x*),

where B̂ = (H^T H)^{-1} H^T (Y − Â Ẑ_M)^T, Ẑ_M = (Ẑ_{1,M}^T, ..., Ẑ_{d,M}^T)^T with Ẑ_{l,M} = â_l^T Y M (Σ̂_l M + σ̂_0^2 I_n)^{-1} Σ̂_l, and ẑ_M(x*) = (ẑ_{1,M}(x*), ..., ẑ_{d,M}(x*))^T with ẑ_{l,M}(x*) = Σ̂_l^T(x*) (Σ̂_l M + σ̂_0^2 I_n)^{-1} M Y^T â_l, for l = 1, ..., d. Moreover,

Σ̂_M^*(x*) = Â D̂_M(x*) Â^T + σ̂_0^2 (1 + h(x*) (H^T H)^{-1} h^T(x*)) (I_k − Â Â^T),

where D̂_M(x*) is a diagonal matrix with the l-th diagonal term

D̂_{l,M}(x*) = σ̂_l^2 K̂_l(x*, x*) + σ̂_0^2 − Σ̂_l^T(x*) Σ̃_l^{-1} Σ̂_l(x*)
+ (h^T(x*) − H^T Σ̃_l^{-1} Σ̂_l(x*))^T (H^T Σ̃_l^{-1} H)^{-1} (h^T(x*) − H^T Σ̃_l^{-1} Σ̂_l(x*)),

with Σ̃_l = Σ̂_l + σ̂_0^2 I_n for l = 1, ..., d.
Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
Correctly specified models
Misspecified models
5 Real examples
6 Future directions
Evaluation criteria
(Largest principal angle.) Let 0 ≤ φ_1 ≤ ... ≤ φ_d ≤ π/2 be the principal angles between M(A) and M(Â), recursively defined by

φ_i = arccos( max_{a ∈ M(A), â ∈ M(Â)} |a^T â| ) = arccos(|a_i^T â_i|),

subject to

||a|| = ||â|| = 1,   a^T a_j = 0,   â^T â_j = 0,   j = 1, ..., i − 1,

where ||·|| denotes the L_2 norm. The largest principal angle is φ_d. When the columns of A and Â are orthogonal bases of M(A) and M(Â), cos(φ_d) equals the smallest singular value of A^T Â [Björck and Golub, 1973, Absil et al., 2006].

(Average mean squared error (AvgMSE)) of the output over N experiments:

AvgMSE = (1 / (knN)) Σ_{l=1}^N Σ_{j=1}^k Σ_{i=1}^n ( Ŷ_{j,i}^{(l)} − E[Y_{j,i}^{(l)}] )^2,    (13)

where E[Y_{j,i}^{(l)}] is the (j, i) entry of the mean of the output matrix in the l-th experiment, and Ŷ_{j,i}^{(l)} is its estimate.
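With orthonormal bases, the largest principal angle reduces to one small singular value decomposition; a minimal sketch (orthonormalizing both inputs first, as the definition assumes orthogonal bases):

```python
def largest_principal_angle(A, A_hat):
    """phi_d between M(A) and M(A_hat): cos(phi_d) = smallest singular value of A^T A_hat."""
    Q1, _ = np.linalg.qr(A)           # orthonormalize, in case the bases are not
    Q2, _ = np.linalg.qr(A_hat)
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(s.min(), 0.0, 1.0))
```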
Approaches
In the GPPCA, for demonstration purposes we let the covariance function of the l-th factor be a product kernel, σ_l^2 K_l(x_a, x_b) = σ_l^2 Π_{m=1}^p K_{lm}(x_{am}, x_{bm}), where K_{lm}(·, ·) is the Matérn kernel with roughness parameter 2.5,

K_{lm}(x_{am}, x_{bm}) = ( 1 + √5 d / γ_{lm} + 5 d^2 / (3 γ_{lm}^2) ) exp( −√5 d / γ_{lm} ),    (14)

with d = |x_{am} − x_{bm}| and unknown range parameters γ_l = (γ_{l1}, ..., γ_{lp}). The MMLE is used to estimate the factor loading matrix and the parameters, and the predictive mean of the data is used for prediction.

In PCA, Â_pca = U_0, where U_0 contains the first d eigenvectors of Y Y^T / n.

In [Lam et al., 2011, Lam and Yao, 2012], A is estimated from Σ_{q=1}^{q_0} Σ̂_y(q) Σ̂_y^T(q) with q_0 = 1 and q_0 = 5, where Σ̂_y(q) is the sample covariance of the output at lag q.

Independent GPs and parallel partial GPs are also included for the last simulated example.
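For p-dimensional inputs, the product kernel in (14) can be computed as follows (X is an n × p input matrix; names are illustrative):

```python
def matern_25_product(X, gamma_l):
    """Product Matern-2.5 kernel (14); gamma_l holds one range parameter per input dimension."""
    n, p = X.shape
    K = np.ones((n, n))
    for m in range(p):
        dm = np.abs(X[:, m, None] - X[None, :, m])
        K *= (1 + np.sqrt(5)*dm/gamma_l[m] + 5*dm**2/(3*gamma_l[m]**2)) * np.exp(-np.sqrt(5)*dm/gamma_l[m])
    return K
```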
Example 2 (Factors with the same covariance matrix)
The data are sampled from model (1) with Σ_1 = ... = Σ_d = Σ, where x_i = i for 1 ≤ i ≤ n, and the kernel function in (14) is used with γ = 100 and σ^2 = 1. In each scenario, we simulate data from 16 different combinations of σ_0^2, k, d and n, and repeat N = 100 times per scenario. The parameters (σ_0^2, σ^2, γ) are treated as unknown and estimated from the data.
Figure 7: The largest principal angle for PCA, GPPCA, LY1 and LY5 in Example 2. Panel settings: k = 8, d = 4 and τ = 100; k = 40, d = 4 and τ = 100; k = 16, d = 8 and τ = 100; k = 80, d = 8 and τ = 100; and the same (k, d) combinations with τ = 4. n = 200 and n = 400 for the left four and right four boxplots in the first row, respectively; n = 500 and n = 1000 in the second row.
d = 4, τ = 100      k = 8                      k = 40
                    n = 200      n = 400       n = 200      n = 400
PCA                 5.3 × 10^-3  5.1 × 10^-3   1.4 × 10^-3  1.1 × 10^-3
GPPCA               3.3 × 10^-4  2.6 × 10^-4   2.2 × 10^-4  1.3 × 10^-4
LY1                 4.6 × 10^-2  5.8 × 10^-3   1.5 × 10^-2  2.1 × 10^-3
LY5                 3.2 × 10^-2  5.5 × 10^-3   1.1 × 10^-2  1.8 × 10^-3

d = 8, τ = 100      k = 16                     k = 80
                    n = 500      n = 1000      n = 500      n = 1000
PCA                 5.2 × 10^-3  5.0 × 10^-3   1.3 × 10^-3  1.1 × 10^-3
GPPCA               2.9 × 10^-4  2.4 × 10^-4   1.9 × 10^-4  1.1 × 10^-4
LY1                 1.4 × 10^-2  5.1 × 10^-3   5.4 × 10^-3  1.2 × 10^-3
LY5                 8.8 × 10^-3  5.1 × 10^-3   3.9 × 10^-3  1.2 × 10^-3

d = 4, τ = 4        k = 8                      k = 40
                    n = 200      n = 400       n = 200      n = 400
PCA                 1.4 × 10^-1  1.3 × 10^-1   4.2 × 10^-2  3.4 × 10^-2
GPPCA               5.8 × 10^-3  4.4 × 10^-3   5.3 × 10^-3  3.0 × 10^-3
LY1                 2.2 × 10^-1  1.7 × 10^-1   7.2 × 10^-2  6.4 × 10^-2
LY5                 2.2 × 10^-1  1.5 × 10^-1   4.8 × 10^-2  4.1 × 10^-2

d = 8, τ = 4        k = 16                     k = 80
                    n = 500      n = 1000      n = 500      n = 1000
PCA                 1.4 × 10^-1  1.3 × 10^-1   3.9 × 10^-2  3.2 × 10^-2
GPPCA               5.1 × 10^-3  3.9 × 10^-3   4.3 × 10^-3  2.4 × 10^-3
LY1                 1.8 × 10^-1  1.4 × 10^-1   5.1 × 10^-2  3.4 × 10^-2
LY5                 1.7 × 10^-1  1.3 × 10^-1   4.6 × 10^-2  3.1 × 10^-2

Table 1: AvgMSE for Example 2.
Prediction of the mean by PCA and GPPCA
Figure 8: Prediction of the mean (AZ) of the first two output variables in one experiment with k = 8, d = 4, n = 400 and τ = 4. The observations are plotted as black circles and the truth is graphed as the black curves. The estimates by the PCA and GPPCA are graphed as the red dotted curves and blue dashed curves, respectively. The shaded area is the 95% posterior credible interval from the GPPCA.
Example 3 (Factors with different covariance matrices)
The data are sampled from model (1), where x_i = i for 1 ≤ i ≤ n. The variance of the noise is σ_0^2 = 0.25, and the kernel function follows (14) with σ^2 = 1. The range parameter γ of each factor is uniformly sampled from [10, 10^3] in each experiment. In each scenario, we simulate data from 8 different combinations of k, d and n, and repeat N = 100 times per scenario. The parameters in the kernels and the variance of the noise are treated as unknown and estimated from the data.
Largest Principal angles for Example 3
Figure 9: The largest principal angle between the true subspace and the estimated subspace of the four approaches for Example 3, in panels with k = 8, d = 4; k = 40, d = 4; k = 16, d = 8; and k = 80, d = 8. The number of observations per output variable is n = 200 and n = 400 for the left four and right four boxplots in the two left panels, respectively, and n = 500 and n = 1000 for the left four and right four boxplots in the two right panels, respectively.
AvgMSE for Example 3
d = 4, τ = 4        k = 8                      k = 40
                    n = 200      n = 400       n = 200      n = 400
PCA                 1.3 × 10^-1  1.3 × 10^-1   3.8 × 10^-2  3.0 × 10^-2
GPPCA               1.4 × 10^-2  4.0 × 10^-2   7.1 × 10^-3  1.1 × 10^-2
LY1                 1.6 × 10^-1  1.4 × 10^-1   4.9 × 10^-2  3.4 × 10^-2
LY5                 1.5 × 10^-1  1.3 × 10^-1   4.4 × 10^-2  3.2 × 10^-2

d = 8, τ = 4        k = 16                     k = 80
                    n = 500      n = 1000      n = 500      n = 1000
PCA                 1.3 × 10^-1  1.3 × 10^-1   3.5 × 10^-2  2.9 × 10^-2
GPPCA               1.3 × 10^-2  3.3 × 10^-2   6.0 × 10^-3  8.0 × 10^-3
LY1                 1.4 × 10^-1  1.3 × 10^-1   3.7 × 10^-2  2.9 × 10^-2
LY5                 1.4 × 10^-1  1.3 × 10^-1   3.4 × 10^-2  2.8 × 10^-2

Table 2: AvgMSE for Example 3.
Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
Correctly specified models
Misspecified models
5 Real examples
Humanity computer model with multiple outputs
Global gridded temperature anomalies
6 Future directions
Example 4 (Unconstrained factor loadings and misspecified kernel functions)

The data are sampled from model (1) with Σ_1 = ... = Σ_d = Σ and x_i = i for 1 ≤ i ≤ n. Each entry of the factor loading matrix is sampled independently and uniformly from [0, 1] (without the orthogonal constraint in (3)). The data are generated with either the exponential kernel or the Gaussian kernel, under different combinations of σ_0^2 and n, while the GPPCA still uses the Matérn kernel in (14) for the estimation. We set k = 20, d = 4, γ = 100 and σ^2 = 1 in sampling the data, and repeat N = 100 times per scenario. All kernel parameters and the noise variance are treated as unknown and estimated from the data.
Largest Principal angle for Example 4
Figure 10: The largest principal angle between the estimated subspace of the four approaches and the true subspace for Example 4 (k = 20, d = 4). The number of observations is n = 100, n = 200 and n = 400 for the left, middle and right four boxplots in each panel, respectively. The data are simulated with the exponential kernel in the left panel and with the Gaussian kernel in the right panel.
AvgMSE for Example 4
Exponential kernel, τ = 4
         n = 100      n = 200      n = 400
PCA      7.4 × 10^-2  6.1 × 10^-2  5.4 × 10^-2
GPPCA    3.1 × 10^-2  2.6 × 10^-2  2.4 × 10^-2
LY1      1.5 × 10^-1  8.2 × 10^-1  5.7 × 10^-2
LY5      1.3 × 10^-1  7.3 × 10^-1  5.6 × 10^-2

Gaussian kernel, τ = 1/4
         n = 100      n = 200      n = 400
PCA      1.1 × 10^0   8.9 × 10^-1  8.4 × 10^-1
GPPCA    7.2 × 10^-1  6.6 × 10^-1  6.2 × 10^-1
LY1      1.3 × 10^0   1.0 × 10^0   8.6 × 10^-1
LY5      1.3 × 10^0   1.0 × 10^0   8.6 × 10^-1

Table 3: AvgMSE for Example 4.
Example 5 (Unconstrained factor loadings and deterministic factors)

The data are sampled from model (1) with each latent factor being a deterministic function,

Z_l(x_i) = cos(0.05 π θ_l x_i),

where θ_l ~ i.i.d. Unif(0, 1) for l = 1, ..., d, with x_i = i for 1 ≤ i ≤ n, σ_0^2 = 0.25, k = 20 and d = 4. Four scenarios are considered, with sample sizes n = 100, n = 200, n = 400 and n = 800.
Largest Principal angle for Example 5
Figure 11: The largest principal angle between the estimated subspace of the loading matrix and the true subspace for Example 5 (deterministic factors; k = 20, d = 4 and τ = 4). From left to right, the number of observations is n = 100, n = 200, n = 400 and n = 800 for each group of four boxplots, respectively.
AvgMSE for Example 5
         n = 100      n = 200      n = 400      n = 800
PCA      7.0 × 10^-2  6.0 × 10^-2  5.4 × 10^-2  5.2 × 10^-2
GPPCA    1.4 × 10^-2  9.2 × 10^-3  6.7 × 10^-3  5.5 × 10^-3
LY1      9.8 × 10^-1  7.6 × 10^-1  6.3 × 10^-2  5.7 × 10^-2
LY5      9.3 × 10^-2  7.3 × 10^-2  6.2 × 10^-2  5.6 × 10^-2
Ind GP   2.0 × 10^-2  1.9 × 10^-2  1.7 × 10^-2  1.7 × 10^-2
PP GP    2.0 × 10^-2  1.9 × 10^-2  1.8 × 10^-2  1.8 × 10^-2

Table 4: AvgMSE for Example 5.
The Ind GP approach treats each output variable independently; the mean of each output is estimated by the predictive mean of Gaussian process regression.

The PP GP approach also models each output variable independently by a Gaussian process, but the covariance function is shared across the k independent Gaussian processes and is estimated from all of the data.
Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
5 Real examples
Humanity computer model with multiple outputs
Global gridded temperature anomalies
6 Future directions
We first consider the testbed called the 'diplomatic and military operations in a non-warfighting domain' (DIAMOND) simulator, which models the number of casualties during the second to sixth day after an earthquake and volcanic eruption in Giarre and Catania. The input variables are 13-dimensional, including the helicopter cruise speed, the engineer ground speed, and the hospital, shelter and food supply capacities in the two places.

We use the same n = 120 training and n* = 120 test outputs as [Overstall and Woods, 2016] to compare the different approaches. The criteria for out-of-sample prediction are

RMSE = sqrt( Σ_{j=1}^k Σ_{i=1}^{n*} ( Ŷ_j^*(x_i^*) − Y_j^*(x_i^*) )^2 / (k n*) ),

PCI(95%) = (1 / (k n*)) Σ_{j=1}^k Σ_{i=1}^{n*} 1{ Y_j^*(x_i^*) ∈ CI_{ij}(95%) },

LCI(95%) = (1 / (k n*)) Σ_{j=1}^k Σ_{i=1}^{n*} length{ CI_{ij}(95%) }.
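All three criteria are simple functions of a method's predictive means and 95% intervals; a minimal sketch with illustrative array names (Y_test, pred_mean, lower and upper, each a k × n* array):

```python
def prediction_metrics(Y_test, pred_mean, lower, upper):
    """RMSE, coverage PCI(95%), and average interval length LCI(95%)."""
    rmse = np.sqrt(np.mean((pred_mean - Y_test) ** 2))
    pci = np.mean((Y_test >= lower) & (Y_test <= upper))   # empirical coverage
    lci = np.mean(upper - lower)                           # average interval length
    return rmse, pci, lci
```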
Method     Mean function        Kernel           RMSE         PCI (95%)  LCI (95%)
GPPCA      Intercept            Gaussian kernel  3.33 × 10^2  0.948      1.52 × 10^3
GPPCA      Selected covariates  Gaussian kernel  3.18 × 10^2  0.957      1.31 × 10^3
GPPCA      Intercept            Matérn kernel    2.82 × 10^2  0.962      1.22 × 10^3
GPPCA      Selected covariates  Matérn kernel    2.74 × 10^2  0.957      1.18 × 10^3
Ind GP     Intercept            Gaussian kernel  3.64 × 10^2  0.918      1.18 × 10^3
Ind GP     Selected covariates  Gaussian kernel  4.04 × 10^2  0.918      1.17 × 10^3
Ind GP     Intercept            Matérn kernel    3.40 × 10^2  0.930      0.984 × 10^3
Ind GP     Selected covariates  Matérn kernel    3.31 × 10^2  0.927      0.967 × 10^3
Multi GP   Intercept            Gaussian kernel  3.63 × 10^2  0.975      1.67 × 10^3
Multi GP   Selected covariates  Gaussian kernel  3.34 × 10^2  0.963      1.54 × 10^3
Multi GP   Intercept            Matérn kernel    3.01 × 10^2  0.962      1.34 × 10^3
Multi GP   Selected covariates  Matérn kernel    3.05 × 10^2  0.970      1.50 × 10^3

Table 5: The GPPCA and Ind GP with the same mean structures and kernels are given in the first 8 rows. The 9th and 10th rows show the emulation results of the two best models in [Overstall and Woods, 2016] using the Gaussian kernel on the same held-out test output, while the last two rows give the results of the same model with the Matérn kernel in (14). For reference, the RMSE is 1.08 × 10^5 when using the mean of the training output to predict.
Estimated covariance and prediction
Figure 12: Left panel: the estimated covariance of the casualties by the GPPCA across the different days after the catastrophe. Right panel: the held-out test output and the predictions by the GPPCA and the independent GPs, with the mean basis h(x) = (1, x_11) and the Matérn kernel, for the fifth and sixth days.
Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
Correctly specified models
Misspecified models
5 Real examples
Humanity computer model with multiple outputs
Global gridded temperature anomalies
6 Future directions
NOAA global gridded temperature anomalies
The dataset from the U.S. National Oceanic and Atmospheric Administration (NOAA) contains global gridded monthly anomalies of the combined air and marine temperature from January 1880 to near present, at a 5° × 5° latitude-longitude spatial resolution. The recorded variance of the measurement error is around 0.1.

We compare different approaches on interpolation. We use the monthly temperature anomalies at 1,639 spatial grid boxes over the past 20 years, and hold out 24,000 randomly sampled measurements on 1,200 spatial grid boxes in 20 months as the test data set.

For GPPCA, the mean basis function is h(x) = (1, x), where x is an integer from 1 to 240 indexing the month. We also assume the covariance is shared across all factor processes.

We also compare with PPCA and with spatial smoothing and temporal smoothing approaches; the temporal smoothing approach also uses h(x) = (1, x).

Random forest regression is included as well, assuming independence either across space or across time.
Method                      Measurement error  RMSE   PCI (95%)  LCI (95%)
GPPCA, d = 50               estimated          0.392  0.877      1.03
GPPCA, d = 100              estimated          0.330  0.774      0.564
GPPCA, d = 50               fixed              0.392  0.938      1.34
GPPCA, d = 100              fixed              0.335  0.976      1.44
PPCA, d = 50                estimated          0.644  0.674      1.09
PPCA, d = 100               estimated          0.644  0.520      1.40
PPCA, d = 50                fixed              0.641  0.760      1.33
PPCA, d = 100               fixed              0.622  0.801      1.40
Temporal smoothing by GP    estimated          1.02   0.940      2.36
Spatial smoothing by GP     estimated          0.623  0.917      1.95
Temporal regression by RF   estimated          0.497  /          /
Spatial regression by RF    estimated          0.444  /          /

Table 6: Out-of-sample prediction of the temperature anomalies by the different approaches. The predictive performances of the GPPCA and PPCA are given in the first four and next four rows, respectively. The performances of the temporal and spatial smoothing methods are given in the 9th and 10th rows. The last two rows give the predictive RMSE of regression using the random forest (RF) algorithm.
Comparison between the GPPCA and spatial smoothing
Figure 13: The interpolated and observed temperature anomalies (°C) in April 2013. The observed temperature anomalies are graphed in the middle panel; the interpolations by the GPPCA and by the spatial smoothing method are graphed in the left and right panels, respectively. The numbers of training and test observations are 439 and 1,200, respectively. The out-of-sample RMSEs of the GPPCA and the spatial smoothing method are 0.335 and 0.779, respectively.
Estimated Intercept and trend by the GPPCA
Figure 14: Estimated intercept (left panel) and monthly temperature change rate (right panel) of the temperature anomalies (°C) by the GPPCA, using the monthly temperature anomalies between January 1999 and December 2018.
A spatial orthonormal basis for A could also be used; GPPCA is more general, as it does not require a distance between functions.

The GPPCA can be extended to irregularly missing data by an EM algorithm, or by an MCMC algorithm if one can specify the full posterior distributions.
Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
5 Real examples
6 Future directions
Future directions
A full Bayesian approach for the factor loading matrix and the parameters (based on the computationally feasible marginal likelihood).
Estimating the number of factors.
Convergence rates of the GPPCA.
Extensions when the observations do not form a matrix.
Optimization algorithms on the Stiefel manifold.
Other orthonormal bases for the factor loading matrix.
Other ways to model the factor processes.
Heteroscedastic noise.
Reference
Gu, M. and Shen, W. (2018) Generalized probabilistic principal component
analysis (GPPCA) for correlated data. arXiv:1808.10868.
Thanks!
Related literature: frequentist approaches
The MLE of the factor loading matrix A under Assumption 1 (without marginalizing out Z) is U_0 R, where U_0 contains the first d ordered eigenvectors of Y Y^T and R is an orthogonal rotation matrix (the same subspace as the PCA).

PCA is widely used in factor models, particularly for modeling multiple time series. E.g., [Bai and Ng, 2002] and [Bai, 2003] assume A^T A = k I_d and estimate A by √k U_0 in modeling high-dimensional time series.

PCA is also widely used to estimate the basis in the linear model of coregionalization [Higdon et al., 2008, Paulo et al., 2012].

In [Tipping and Bishop, 1999], the linear subspace given by the PCA is the MMLE of the factor model with independent factors.

[Lam et al., 2011, Lam and Yao, 2012] estimate the factor loading matrix of model (1) by Â_LY := Σ_{q=1}^{q_0} Σ̂_y(q) Σ̂_y^T(q), where Σ̂_y(q) is the k × k sample covariance of the output at lag q and q_0 is fixed at a positive integer.

Kernel PCA was introduced in machine learning; it maps the outputs into a feature space via kernels [Schölkopf et al., 1998, Mika et al., 1999, Hoffmann, 2007].
Related literature: Bayesian approaches
[West, 2003] points out the connection between PCA and a class of generalized singular g-priors, and introduces a spike-and-slab prior that induces sparse factors in the latent factor model, assuming the factors are independently distributed.

Another sparsity-inducing prior is introduced by [Bhattacharya and Dunson, 2011] under the independence assumption on the factors, and its asymptotic behavior is also discussed.

[Nakajima and West, 2013, Zhou et al., 2014] introduce a method to directly threshold the time-varying factor loading matrix in Bayesian dynamic linear models.

When modeling spatially correlated data, priors have also been proposed for the spatially varying factor loading matrices in the LMC [Gelfand et al., 2004, Banerjee et al., 2014].

[Higdon et al., 2008, Paulo et al., 2012, Fricker et al., 2013] use the LMC for emulating computer models with multiple outputs; they estimate the factor loading matrix and rely on MCMC algorithms for the inference.
Mengyang Gu (Johns Hopkins University) GPPCA SAMSI BFF Conference 54 / 54
References
P-A Absil, Alan Edelman, and Plamen Koev. On the largest principal
angle between random subspaces. Linear Algebra and its applications,
414(1):288–294, 2006.
Mauricio A Alvarez, Lorenzo Rosasco, Neil D Lawrence, et al. Kernels for
vector-valued functions: A review. Foundations and Trends R in
Machine Learning, 4(3):195–266, 2012.
Jushan Bai. Inferential theory for factor models of large dimensions.
Econometrica, 71(1):135–171, 2003.
Jushan Bai and Serena Ng. Determining the number of factors in
approximate factor models. Econometrica, 70(1):191–221, 2002.
Sudipto Banerjee, Bradley P Carlin, and Alan E Gelfand. Hierarchical
modeling and analysis for spatial data. Crc Press, 2014.
Anirban Bhattacharya and David B Dunson. Sparse Bayesian infinite
factor models. Biometrika, pages 291–306, 2011.
ke Bj¨orck and Gene H Golub. Numerical methods for computing angles
between linear subspaces. Mathematics of computation, 27(123):
579–594, 1973.
Mengyang Gu (Johns Hopkins University) GPPCA SAMSI BFF Conference 54 / 54
References
Thomas E Fricker, Jeremy E Oakley, and Nathan M Urban. Multivariate
Gaussian process emulators with nonseparable covariance structures.
Technometrics, 55(1):47–56, 2013.
Alan E Gelfand, Alexandra M Schmidt, Sudipto Banerjee, and CF Sirmans.
Nonstationary multivariate process modeling through spatially varying
coregionalization. Test, 13(2):263–312, 2004.
Mengyang Gu. FastGaSP: Fast and Exact Computation of Gaussian
Stochastic Process, 2019. URL
https://CRAN.R-project.org/package=FastGaSP. R package
version 0.5.1.
Mengyang Gu and Kyle Anderson. Calibration of imperfect mathematical
models by multiple sources of data with measurement bias. arXiv
preprint arXiv:1810.11664, 2018.
Mengyang Gu and James O Berger. Parallel partial Gaussian process
emulation for computer models with massive output. Annals of Applied
Statistics, 10(3):1317–1347, 2016.
Mengyang Gu and Yanxun Xu. Nonseparable Gaussian stochastic process:
Mengyang Gu (Johns Hopkins University) GPPCA SAMSI BFF Conference 54 / 54
References
A unified view and computational strategy. arXiv preprint
arXiv:1711.11501, 2017.
Jouni Hartikainen and Simo Sarkka. Kalman filtering and smoothing
solutions to temporal gaussian process regression models. In Machine
Learning for Signal Processing (MLSP), 2010 IEEE International
Workshop on, pages 379–384. IEEE, 2010.
Dave Higdon, James Gattiker, Brian Williams, and Maria Rightley.
Computer model calibration using high-dimensional output. Journal of
the American Statistical Association, 103(482):570–583, 2008.
Heiko Hoffmann. Kernel PCA for novelty detection. Pattern recognition,
40(3):863–874, 2007.
Clifford Lam and Qiwei Yao. Factor modeling for high-dimensional time
series: inference for the number of factors. The Annals of Statistics, 40
(2):694–726, 2012.
Clifford Lam, Qiwei Yao, and Neil Bathia. Estimation of latent factors for
high-dimensional time series. Biometrika, 98(4):901–918, 2011.
Sebastian Mika, Bernhard Schölkopf, Alex J Smola, Klaus-Robert Müller,
Matthias Scholz, and Gunnar Rätsch. Kernel PCA and de-noising in
feature spaces. In Advances in Neural Information Processing Systems,
pages 536–542, 1999.
Jouchi Nakajima and Mike West. Bayesian analysis of latent threshold
dynamic models. Journal of Business & Economic Statistics, 31(2):
151–164, 2013.
Antony M Overstall and David C Woods. Multivariate emulation of
computer simulators: model selection and diagnostics with application
to a humanitarian relief model. Journal of the Royal Statistical Society:
Series C (Applied Statistics), 65(4):483–505, 2016.
Rui Paulo, Gonzalo García-Donato, and Jesús Palomo. Calibration of
computer models with multivariate output. Computational Statistics
and Data Analysis, 56(12):3959–3974, 2012.
Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller.
Nonlinear component analysis as a kernel eigenvalue problem. Neural
Computation, 10(5):1299–1319, 1998.
Matthias Seeger, Yee-Whye Teh, and Michael Jordan. Semiparametric
latent factor models. Technical report, 2005.
Michael E Tipping and Christopher M Bishop. Probabilistic principal
component analysis. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 61(3):611–622, 1999.
Zaiwen Wen and Wotao Yin. A feasible method for optimization with
orthogonality constraints. Mathematical Programming, 142(1-2):
397–434, 2013.
M. West. Bayesian factor regression models in the "large p, small n"
paradigm. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid,
D. Heckerman, A. F. M. Smith, and M. West, editors, Bayesian
Statistics 7, pages 723–732. Oxford University Press, 2003. URL
http://ftp.isds.duke.edu/WorkingPapers/02-12.html.
Peter Whittle. On stationary processes in the plane. Biometrika, pages
434–449, 1954.
Xiaocong Zhou, Jouchi Nakajima, and Mike West. Bayesian forecasting
and portfolio decisions using dynamic dependent sparse factor models.
International Journal of Forecasting, 30(4):963–980, 2014.
• 7. Introduction: Multiple sequences/time series
[Figure 4; scatter-plot residue removed. Caption: Empirical correlation of methylation levels across sites (left panel) and across samples (right panel), based on 24 samples and one million methylation levels in chromosome 1 of each sample. The figures are from [Gu and Xu, 2017].] Similar data include multiple time series of health records.
• 8. Introduction: A latent factor model
Let $\mathbf{y}(\mathbf{x}) = (y_1(\mathbf{x}), \ldots, y_k(\mathbf{x}))^T$ be a $k$-dimensional real-valued output vector at a $p$-dimensional input vector $\mathbf{x}$. Assume $y_j(\mathbf{x})$ has zero mean for now. Consider the latent factor model
$$\mathbf{y}(\mathbf{x}) = \mathbf{A}\mathbf{z}(\mathbf{x}) + \boldsymbol{\epsilon}, \qquad (1)$$
where the $k \times d$ factor loading matrix $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_d]$ relates the $k$-dimensional outputs to the $d$-dimensional factor processes $\mathbf{z}(\mathbf{x}) = (z_1(\mathbf{x}), \ldots, z_d(\mathbf{x}))^T$, with $d \leq k$, and $\boldsymbol{\epsilon}$ is a zero-mean Gaussian noise vector with variance $\sigma^2_0$. Assume any two factor processes are independent, and that $\mathbf{Z}_l = (z_l(\mathbf{x}_1), \ldots, z_l(\mathbf{x}_n))$ follows a multivariate normal distribution,
$$\mathbf{Z}_l^T \sim \mathcal{MN}(\mathbf{0}, \boldsymbol{\Sigma}_l), \qquad (2)$$
where $\boldsymbol{\Sigma}_l$ can be parameterized by a covariance function such that the $(i,j)$ entry of $\boldsymbol{\Sigma}_l$ is $\sigma^2_l K_l(\mathbf{x}_i, \mathbf{x}_j)$, with $K_l(\cdot, \cdot)$ a kernel function, for $l = 1, \ldots, d$ and $1 \leq i, j \leq n$. This model is often referred to as the semiparametric latent factor model [Seeger et al., 2005, Alvarez et al., 2012] and is a special case of the linear model of coregionalization (LMC) [Gelfand et al., 2004].
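To make the setup concrete, here is a minimal simulation sketch of model (1) with independent Matérn-2.5 GP factors on a one-dimensional input grid. This is illustrative code, not the authors' implementation; the function name matern_2_5 and all parameter values are assumptions chosen to echo the later examples.

```python
import numpy as np

def matern_2_5(x, gamma):
    """Matern kernel with roughness parameter 2.5 and range gamma, as in Eq. (14)."""
    d = np.abs(x[:, None] - x[None, :])
    c = np.sqrt(5.0) * d / gamma
    return (1.0 + c + c**2 / 3.0) * np.exp(-c)

rng = np.random.default_rng(0)
n, k, d = 100, 8, 2                          # n inputs, k outputs, d factors
x = np.arange(1.0, n + 1.0)                  # equally spaced 1-D inputs

# Orthonormal factor loading matrix A (Assumption 1: A^T A = I_d)
A, _ = np.linalg.qr(rng.standard_normal((k, d)))

# Independent GP factors Z_l ~ MN(0, sigma2 * K); one shared kernel for simplicity
sigma2, gamma, sigma2_0 = 1.0, 100.0, 0.25
K = matern_2_5(x, gamma)
L = np.linalg.cholesky(sigma2 * K + 1e-8 * np.eye(n))
Z = (L @ rng.standard_normal((n, d))).T      # d x n latent factor matrix

# Observations Y = A Z + noise, a k x n matrix
Y = A @ Z + np.sqrt(sigma2_0) * rng.standard_normal((k, n))
```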
• 9. Introduction: Estimation of the factor loading matrix
Let $\mathbf{Y} = [\mathbf{y}(\mathbf{x}_1), \ldots, \mathbf{y}(\mathbf{x}_n)]$ be the $k \times n$ matrix of observations and $\mathbf{Z} = [\mathbf{z}(\mathbf{x}_1), \ldots, \mathbf{z}(\mathbf{x}_n)]$ the $d \times n$ latent factor matrix. It is popular to estimate $\mathbf{A}$ by PCA: [Higdon et al., 2008, Paulo et al., 2012] estimate $\mathbf{A}$ by the first $d$ columns of $\sqrt{n}\,\mathbf{U}_0\mathbf{D}_0^{1/2}$, where $\mathbf{U}_0\mathbf{D}_0\mathbf{U}_0^T$ is the eigendecomposition of $\mathbf{Y}\mathbf{Y}^T/n$.
• 10. Introduction: Estimation of the factor loading matrix (continued)
[Tipping and Bishop, 1999] study the latent factor model $\mathbf{Y} = \mathbf{A}\mathbf{Z} + \boldsymbol{\epsilon}$ with independent standard normal factors. Assuming each row of $\mathbf{Y}$ has zero mean, the maximum marginal likelihood estimator (MMLE) of $\mathbf{A}$ is the first $d$ columns of $\mathbf{U}_0(\mathbf{D}_0 - \sigma^2_0\mathbf{I}_k)^{1/2}\mathbf{R}$, where $\mathbf{R}$ is an arbitrary $d \times d$ orthogonal rotation matrix. Note that model (1) is unchanged if one replaces the pair $(\mathbf{A}, \mathbf{z}(\mathbf{x}))$ by $(\mathbf{A}\mathbf{E}, \mathbf{E}^{-1}\mathbf{z}(\mathbf{x}))$ for any invertible matrix $\mathbf{E}$, so only the subspace of $\mathbf{A}$, denoted $\mathcal{M}(\mathbf{A})$, can be uniquely determined. The linear subspaces given by the PCA above for the LMC model and by the MMLE of the (independent) latent factor model are the same.
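Continuing the simulation sketch above, the PCA estimate of the loading subspace is simply the matrix of the first $d$ principal eigenvectors of $\mathbf{Y}\mathbf{Y}^T/n$:

```python
# PCA estimate of M(A): first d eigenvectors of Y Y^T / n (illustrative sketch)
S = Y @ Y.T / n
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
A_pca = eigvecs[:, ::-1][:, :d]        # top-d principal eigenvectors, up to rotation
```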
• 11. Introduction: Research goals
- What is the maximum marginal likelihood estimator of the factor loadings (and of the other parameters) in the latent factor model (1), where the factors are dependent?
- What are the predictive distributions for new data? Are they computationally feasible?
- If we have additional regressors (covariates), can we also combine them in the model in a coherent way?
• 12. Outline: Section 2, Generalized probabilistic principal component analysis (GPPCA).
• 13. GPPCA: Orthogonality assumption
Since only the linear subspace $\mathcal{M}(\mathbf{A})$ of the factor loading matrix is identifiable, we assume the columns of $\mathbf{A}$ in model (1) are orthonormal.
Assumption 1: $\mathbf{A}^T\mathbf{A} = \mathbf{I}_d$. (3)
One may instead assume $\mathbf{A}^T\mathbf{A} = c\mathbf{I}_d$ for a positive constant $c$ that can depend on $k$, e.g. $c = k$; but since the variance parameters of the factor processes are estimated from the data, we focus on Assumption 1. This assumption is also key to some other estimators of the factor loading matrix [Lam et al., 2011, Lam and Yao, 2012]. Under Assumption 1, the MLE of $\mathbf{A}$ (without marginalizing out $\mathbf{Z}$) is $\mathbf{U}_0\mathbf{R}$, where $\mathbf{U}_0$ is the matrix of the first $d$ ordered eigenvectors of $\mathbf{Y}\mathbf{Y}^T/n$ and $\mathbf{R}$ is an orthogonal rotation matrix (the same subspace as the PCA). E.g., [Bai and Ng, 2002] and [Bai, 2003] assume $\mathbf{A}^T\mathbf{A} = k\mathbf{I}_d$ and estimate $\mathbf{A}$ by $\sqrt{k}\,\mathbf{U}_0$ in modeling high-dimensional time series.
• 14. GPPCA: Marginal likelihood
Denote the vectorized output $\mathbf{Y}_v = \mathrm{vec}(\mathbf{Y})$ and the $d \times n$ latent factor matrix $\mathbf{Z} = (\mathbf{z}(\mathbf{x}_1), \ldots, \mathbf{z}(\mathbf{x}_n))$ at inputs $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$. After marginalizing out $\mathbf{Z}$, $\mathbf{Y}_v$ follows a multivariate normal distribution ([Banerjee et al., 2014]):
$$\mathbf{Y}_v \mid \mathbf{A}, \sigma^2_0, \boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\Sigma}_d \sim \mathcal{MN}\Big(\mathbf{0}, \; \sum_{l=1}^{d}\boldsymbol{\Sigma}_l \otimes (\mathbf{a}_l\mathbf{a}_l^T) + \sigma^2_0\mathbf{I}_{nk}\Big).$$
Lemma 1 (Marginal likelihood). Under Assumption 1, the marginal distribution of $\mathbf{Y}_v$ in model (1) is
$$\mathbf{Y}_v \mid \mathbf{A}, \sigma^2_0, \boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\Sigma}_d \sim \mathcal{MN}\bigg(\mathbf{0}, \; \sigma^2_0\Big[\mathbf{I}_{nk} - \sum_{l=1}^{d}(\sigma^2_0\boldsymbol{\Sigma}_l^{-1} + \mathbf{I}_n)^{-1} \otimes (\mathbf{a}_l\mathbf{a}_l^T)\Big]^{-1}\bigg).$$
• 15. GPPCA: Theorem 1 (Maximum marginal likelihood estimator)
For model (1), under Assumption 1, after marginalizing out $\mathbf{Z}$:
1. If $\boldsymbol{\Sigma}_1 = \cdots = \boldsymbol{\Sigma}_d = \boldsymbol{\Sigma}$, the marginal likelihood is maximized at
$$\hat{\mathbf{A}} = \mathbf{U}\mathbf{R}, \qquad (4)$$
where $\mathbf{U}$ is the $k \times d$ matrix of the first $d$ principal eigenvectors of
$$\mathbf{G} = \mathbf{Y}(\sigma^2_0\boldsymbol{\Sigma}^{-1} + \mathbf{I}_n)^{-1}\mathbf{Y}^T, \qquad (5)$$
and $\mathbf{R}$ is an arbitrary $d \times d$ orthogonal rotation matrix;
• 16. GPPCA: Theorem 1 (continued)
2. If the covariances of the factor processes differ, denoting $\mathbf{G}_l = \mathbf{Y}(\sigma^2_0\boldsymbol{\Sigma}_l^{-1} + \mathbf{I}_n)^{-1}\mathbf{Y}^T$, the maximum marginal likelihood estimator of $\mathbf{A}$ is
$$\hat{\mathbf{A}} = \operatorname*{argmax}_{\mathbf{A}} \sum_{l=1}^{d}\mathbf{a}_l^T\mathbf{G}_l\mathbf{a}_l, \quad \text{s.t. } \mathbf{A}^T\mathbf{A} = \mathbf{I}_d. \qquad (6)$$
A numerical optimization algorithm that preserves the orthogonality constraint in (6) is introduced in [Wen and Yin, 2013].
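A minimal sketch of the closed-form case (part 1 of the theorem), continuing the simulation above; the kernel parameters are taken as known here, whereas GPPCA estimates them by maximizing the profile marginal likelihood in (7). It uses the identity $(\sigma^2_0\boldsymbol{\Sigma}^{-1} + \mathbf{I}_n)^{-1} = (\boldsymbol{\Sigma} + \sigma^2_0\mathbf{I}_n)^{-1}\boldsymbol{\Sigma}$ to avoid inverting $\boldsymbol{\Sigma}$ directly.

```python
# GPPCA under a shared covariance: top-d eigenvectors of G in Eq. (5)
Sigma = sigma2 * K
inner = np.linalg.solve(Sigma + sigma2_0 * np.eye(n), Sigma)  # = (sigma0^2 Sigma^{-1} + I_n)^{-1}
G = Y @ inner @ Y.T
G = (G + G.T) / 2.0                                           # symmetrize for numerical safety
A_gppca = np.linalg.eigh(G)[1][:, ::-1][:, :d]
```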
• 17. Generalized probabilistic principal component analysis
The estimator in Theorem 1 is called the generalized probabilistic principal component analysis (GPPCA); it is a direct extension of the PPCA of [Tipping and Bishop, 1999] to the case of correlated factors.
• 18. GPPCA (continued)
For demonstration purposes, let the $(i,j)$ term of $\boldsymbol{\Sigma}_l$ be $\sigma^2_l K_l(\mathbf{x}_i, \mathbf{x}_j)$, where $K_l(\cdot, \cdot)$ is a kernel function with parameters $\boldsymbol{\gamma}_l$. Denote the signal-to-noise ratio (SNR) $\tau_l = \sigma^2_l/\sigma^2_0$, and let $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_d)$ and $\boldsymbol{\gamma} = (\boldsymbol{\gamma}_1, \ldots, \boldsymbol{\gamma}_d)$. The maximum marginal likelihood estimator of $\sigma^2_0$ becomes a function of $\hat{\mathbf{A}}$, $\boldsymbol{\tau}$ and $\boldsymbol{\gamma}$: $\hat{\sigma}^2_0 = \hat{S}^2/(nk)$, where
$$\hat{S}^2 = \operatorname{tr}(\mathbf{Y}^T\mathbf{Y}) - \sum_{l=1}^{d}\hat{\mathbf{a}}_l^T\mathbf{Y}(\tau_l^{-1}\mathbf{K}_l^{-1} + \mathbf{I}_n)^{-1}\mathbf{Y}^T\hat{\mathbf{a}}_l.$$
Plugging in $\hat{\mathbf{A}}$ and $\hat{\sigma}^2_0$, the marginal likelihood satisfies
$$L(\boldsymbol{\tau}, \boldsymbol{\gamma} \mid \mathbf{Y}, \hat{\mathbf{A}}, \hat{\sigma}^2_0) \propto \Big[\prod_{l=1}^{d}|\tau_l\mathbf{K}_l + \mathbf{I}_n|^{-1/2}\Big]\,(\hat{S}^2)^{-nk/2}. \qquad (7)$$
After obtaining $(\hat{\boldsymbol{\tau}}, \hat{\boldsymbol{\gamma}})$ by maximizing this marginal likelihood, one gets $\hat{\mathbf{A}}$, $\hat{\sigma}^2_0$, and $\hat{\sigma}^2_l$ for $l = 1, \ldots, d$.
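A sketch of maximizing this profile likelihood in the shared-kernel case (all names are assumptions; a practical implementation would exploit the O(dkn) filtering algorithm mentioned on the next slide). The negative log of (7) is minimized over $(\tau, \gamma)$ on the log scale, re-solving the eigenproblem for $\hat{\mathbf{A}}$ inside each evaluation and using $(\tau^{-1}\mathbf{K}^{-1} + \mathbf{I}_n)^{-1} = (\mathbf{I}_n + \tau\mathbf{K})^{-1}\tau\mathbf{K}$.

```python
from scipy.optimize import minimize

def neg_profile_loglik(log_params):
    """Negative log of Eq. (7) with one kernel shared across the d factors."""
    tau, gam = np.exp(log_params)
    Kg = matern_2_5(x, gam)
    B = np.linalg.solve(np.eye(n) + tau * Kg, tau * Kg)  # (tau^{-1} Kg^{-1} + I_n)^{-1}
    G = Y @ B @ Y.T
    w = np.linalg.eigvalsh((G + G.T) / 2.0)[::-1]        # descending eigenvalues
    S2 = np.trace(Y @ Y.T) - w[:d].sum()                 # S-hat^2 at the optimal A
    logdet = np.linalg.slogdet(np.eye(n) + tau * Kg)[1]
    return 0.5 * d * logdet + 0.5 * n * k * np.log(S2)

res = minimize(neg_profile_loglik, np.log([4.0, 50.0]), method="Nelder-Mead")
tau_hat, gamma_hat = np.exp(res.x)
```

With $(\hat{\tau}, \hat{\gamma})$ in hand, $\hat{\sigma}^2_0 = \hat{S}^2/(nk)$ and $\hat{\sigma}^2_l = \hat{\tau}_l\hat{\sigma}^2_0$ follow by plug-in, as on the slide.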
• 19. GPPCA: Computational complexity
Each evaluation of the likelihood in (7) costs $\max(O(dn^3), O(dkn))$ in general. Each evaluation of the objective for estimating $\mathbf{A}$ in Theorem 1 also costs $\max(O(dn^3), O(dkn))$ in general; solving the eigenproblem when the covariance is shared costs $\min(O(kn^2), O(k^2n))$. When the input is one-dimensional and a Matérn kernel is used, the likelihood in (7) can be computed exactly in only $O(dkn)$ operations, without any approximation (see e.g. [Whittle, 1954, Hartikainen and Särkkä, 2010]). The R package FastGaSP on CRAN implements this fast algorithm for Gaussian processes with Matérn kernels [Gu, 2019]. Directly solving the eigenproblem still costs $\min(O(kn^2), O(k^2n))$, but the iterative algorithm has rate $O(dkn)$.
• 20. GPPCA: Theorem 2 (Predictive distribution)
Let $\hat{\boldsymbol{\Sigma}}_l$ be the estimator of the covariance matrix of the $l$-th factor, whose $(i,j)$ element is $\hat{\sigma}^2_l\hat{K}_l(\mathbf{x}_i, \mathbf{x}_j)$, obtained by plugging in $\hat{\sigma}^2_l$ and $\hat{\boldsymbol{\gamma}}_l$. Under Assumption 1, for any $\mathbf{x}^*$, one has
$$\mathbf{Y}(\mathbf{x}^*) \mid \mathbf{Y}, \hat{\mathbf{A}}, \hat{\boldsymbol{\gamma}}, \hat{\boldsymbol{\sigma}}^2, \hat{\sigma}^2_0 \sim \mathcal{MN}\big(\hat{\boldsymbol{\mu}}^*(\mathbf{x}^*), \hat{\boldsymbol{\Sigma}}^*(\mathbf{x}^*)\big),$$
where
$$\hat{\boldsymbol{\mu}}^*(\mathbf{x}^*) = \hat{\mathbf{A}}\hat{\mathbf{z}}(\mathbf{x}^*), \qquad (8)$$
with $\hat{\mathbf{z}}(\mathbf{x}^*) = (\hat{z}_1(\mathbf{x}^*), \ldots, \hat{z}_d(\mathbf{x}^*))^T$, $\hat{z}_l(\mathbf{x}^*) = \hat{\boldsymbol{\Sigma}}_l^T(\mathbf{x}^*)(\hat{\boldsymbol{\Sigma}}_l + \hat{\sigma}^2_0\mathbf{I}_n)^{-1}\mathbf{Y}^T\hat{\mathbf{a}}_l$, and $\hat{\boldsymbol{\Sigma}}_l(\mathbf{x}^*) = \hat{\sigma}^2_l(\hat{K}_l(\mathbf{x}_1, \mathbf{x}^*), \ldots, \hat{K}_l(\mathbf{x}_n, \mathbf{x}^*))^T$ for $l = 1, \ldots, d$, and
$$\hat{\boldsymbol{\Sigma}}^*(\mathbf{x}^*) = \hat{\mathbf{A}}\hat{\mathbf{D}}(\mathbf{x}^*)\hat{\mathbf{A}}^T + \hat{\sigma}^2_0(\mathbf{I}_k - \hat{\mathbf{A}}\hat{\mathbf{A}}^T), \qquad (9)$$
with $\hat{\mathbf{D}}(\mathbf{x}^*)$ a diagonal matrix whose $l$-th diagonal term is
$$\hat{D}_l(\mathbf{x}^*) = \hat{\sigma}^2_l\hat{K}_l(\mathbf{x}^*, \mathbf{x}^*) + \hat{\sigma}^2_0 - \hat{\boldsymbol{\Sigma}}_l^T(\mathbf{x}^*)\big(\hat{\boldsymbol{\Sigma}}_l + \hat{\sigma}^2_0\mathbf{I}_n\big)^{-1}\hat{\boldsymbol{\Sigma}}_l(\mathbf{x}^*).$$
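A minimal sketch of the predictive mean (8) in the shared-kernel case (function name assumed); the predictive variance in (9) would be assembled analogously from $\hat{\mathbf{D}}(\mathbf{x}^*)$.

```python
def predict_mean(Y, A_hat, x, x_star, sigma2_hat, gamma_hat, sigma2_0_hat):
    """Predictive mean A_hat z_hat(x*) of Eq. (8) at new 1-D inputs x_star."""
    dist = np.abs(x[:, None] - x_star[None, :])
    c = np.sqrt(5.0) * dist / gamma_hat
    Sig_star = sigma2_hat * (1.0 + c + c**2 / 3.0) * np.exp(-c)  # n x n* cross-covariances
    Sig = sigma2_hat * matern_2_5(x, gamma_hat)
    rhs = np.linalg.solve(Sig + sigma2_0_hat * np.eye(len(x)), Y.T @ A_hat)  # n x d
    z_hat = Sig_star.T @ rhs                                     # n* x d: z_hat_l(x*)
    return A_hat @ z_hat.T                                       # k x n* predictive mean

Y_hat = predict_mean(Y, A_gppca, x, x, sigma2, gamma, sigma2_0)  # in-sample fit
```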
• 21. GPPCA: Illustrative example
Example 1. The data are sampled from the latent factor model (1) with the shared covariance matrix $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \boldsymbol{\Sigma}$, where $x$ is equally spaced from 1 to $n$ and the kernel function follows (14) with $\gamma = 100$ and $\sigma^2 = 1$. We choose $k = 2$, $d = 1$ and $n = 100$. Two scenarios are implemented, with $\sigma^2_0 = 0.01$ and $\sigma^2_0 = 1$, respectively. The parameters $(\sigma^2_0, \sigma^2, \gamma)$ are treated as unknown and estimated from the data.
• 22. [Figure 5; scatter-plot residue removed. Caption: Estimation of the factor loading matrix by the PCA and GPPCA for Example 1 with noise variance $\sigma^2_0 = 0.01$ (upper panels) and $\sigma^2_0 = 1$ (lower panels). The circles and dots are the first and second rows of $\mathbf{Y}$ in the left panels and of $\tilde{\mathbf{Y}} = \mathbf{Y}\mathbf{L}$ in the middle panels, where $\mathbf{L} = \mathbf{U}\mathbf{D}^{1/2}$ with $\mathbf{U}$ the eigenvectors and the diagonal of $\mathbf{D}$ the eigenvalues of $(\hat{\sigma}^2_0\hat{\boldsymbol{\Sigma}}^{-1} + \mathbf{I}_n)^{-1}$. In the right panels, the black, red and blue lines are the subspace of $\mathbf{A}$, the first eigenvector of $\mathbf{U}_0$, and that of $\mathbf{Y}(\hat{\sigma}^2_0\hat{\boldsymbol{\Sigma}}^{-1} + \mathbf{I}_n)^{-1}\mathbf{Y}^T$, respectively, with the black triangles being the outputs.]
• 23. GPPCA: Estimation of the mean
[Figure 6; plot residue removed. Caption: Estimation of $\mathbf{A}\mathbf{Z}$ for Example 1 with noise variance $\sigma^2_0 = 0.01$ (upper panels) and $\sigma^2_0 = 1$ (lower panels). The first and second rows of $\mathbf{Y}$ are graphed as black curves in the left and right panels, respectively. The red dotted and blue dashed curves are the predictions by the PCA and GPPCA, respectively; the grey region is the 95% posterior credible interval from the GPPCA.]
• 24. Outline: Section 3, GPPCA with a mean structure.
• 25. GPPCA with a mean structure: Latent factor model with covariates
Consider the latent factor model with a mean structure for a $k$-dimensional output vector at input $\mathbf{x}$,
$$\mathbf{y}(\mathbf{x}) = (\mathbf{h}(\mathbf{x})\mathbf{B})^T + \mathbf{A}\mathbf{z}(\mathbf{x}) + \boldsymbol{\epsilon}, \qquad (10)$$
where $\mathbf{h}(\mathbf{x})$ is a $1 \times q$ vector of known mean basis functions of the input $\mathbf{x}$ and possibly other covariates, and $\mathbf{B} = (\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_k)$ is a $q \times k$ matrix of mean (or trend) parameters. Let $\mathbf{H}$ be the $n \times q$ matrix whose $i$-th row is $\mathbf{h}(\mathbf{x}_i)$, and denote $\mathbf{M} = \mathbf{I}_n - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$. We have the following lemma for the marginal likelihood estimator of the variance (see the sketch after the lemma).
Lemma 2. Consider the objective prior $\pi(\mathbf{B}) \propto 1$. Under Assumption 1, after marginalizing out $\mathbf{B}$ and $\mathbf{Z}$, the maximum likelihood estimator of $\sigma^2_0$ is $\hat{\sigma}^2_0 = S^2_M/(k(n-q))$, where
$$S^2_M = \operatorname{tr}(\mathbf{Y}\mathbf{M}\mathbf{Y}^T) - \sum_{l=1}^{d}\mathbf{a}_l^T\mathbf{Y}\mathbf{M}(\mathbf{M} + \tau_l^{-1}\mathbf{K}_l^{-1})^{-1}\mathbf{M}\mathbf{Y}^T\mathbf{a}_l.$$
Moreover, the marginal density of the data satisfies
$$p(\mathbf{Y} \mid \mathbf{A}, \boldsymbol{\tau}, \boldsymbol{\gamma}, \hat{\sigma}^2_0) \propto \Big[\prod_{l=1}^{d}|\tau_l\mathbf{K}_l + \mathbf{I}_n|^{-1/2}\,\big|\mathbf{H}^T(\tau_l\mathbf{K}_l + \mathbf{I}_n)^{-1}\mathbf{H}\big|^{-1/2}\Big]\big(S^2_M\big)^{-\frac{k(n-q)}{2}}.$$
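A short sketch of $\mathbf{M}$ and $S^2_M$, assuming an intercept-only basis $\mathbf{h}(\mathbf{x}) = 1$ (so $\mathbf{H}$ is an $n \times 1$ column of ones) and one kernel shared across factors; all names are illustrative.

```python
q = 1
H = np.ones((n, q))                                # design matrix of the mean basis
M = np.eye(n) - H @ np.linalg.solve(H.T @ H, H.T)  # projection orthogonal to the mean basis

def S2_M(A_hat, tau, Kl):
    """S_M^2 of Lemma 2; 'inner' equals (M + tau^{-1} K_l^{-1})^{-1} M."""
    inner = np.linalg.solve(M + np.linalg.inv(tau * Kl + 1e-8 * np.eye(n)), M)
    quad = sum(a @ Y @ M @ inner @ Y.T @ a for a in A_hat.T)
    return np.trace(Y @ M @ Y.T) - quad
```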
• 26. GPPCA with the mean structure
Since there is no closed-form expression for the kernel parameters $(\boldsymbol{\tau}, \boldsymbol{\gamma})$, one can numerically maximize the marginal likelihood to estimate $\mathbf{A}$ and the other parameters:
$$\hat{\mathbf{A}} = \operatorname*{argmax}_{\mathbf{A}} \sum_{l=1}^{d}\mathbf{a}_l^T\mathbf{G}_{l,M}\mathbf{a}_l, \quad \text{s.t. } \mathbf{A}^T\mathbf{A} = \mathbf{I}_d, \qquad (11)$$
$$(\hat{\boldsymbol{\tau}}, \hat{\boldsymbol{\gamma}}) = \operatorname*{argmax}_{(\boldsymbol{\tau}, \boldsymbol{\gamma})}\, p(\mathbf{Y} \mid \hat{\mathbf{A}}, \boldsymbol{\tau}, \boldsymbol{\gamma}), \qquad (12)$$
where, by Lemma 2, $\mathbf{G}_{l,M} = \mathbf{Y}\mathbf{M}(\mathbf{M} + \tau_l^{-1}\mathbf{K}_l^{-1})^{-1}\mathbf{M}\mathbf{Y}^T$. When $\boldsymbol{\Sigma}_1 = \cdots = \boldsymbol{\Sigma}_d$, a closed-form expression for $\hat{\mathbf{A}}$ can be obtained as in Theorem 1. In general, the approach of [Wen and Yin, 2013] can be used to solve the orthogonality-constrained optimization problem in (11). After obtaining $\hat{\boldsymbol{\tau}}$ and $\hat{\sigma}^2_0$, we transform them to get $\hat{\sigma}^2_l = \hat{\tau}_l\hat{\sigma}^2_0$ for $l = 1, \ldots, d$.
• 27. GPPCA with a mean structure
Theorem 3 (Predictive distribution). Under Assumption 1, after marginalizing out Z and B with the objective prior π(B) ∝ 1, the predictive distribution of model (10) at any x* is

$$Y(\mathbf{x}^*) \mid Y, \hat{A}, \hat{\boldsymbol\gamma}, \hat{\boldsymbol\sigma}^2, \hat\sigma_0^2 \sim \mathcal{MN}\left(\hat{\boldsymbol\mu}^*_M(\mathbf{x}^*),\, \hat\Sigma^*_M(\mathbf{x}^*)\right).$$

Here $\hat{\boldsymbol\mu}^*_M(\mathbf{x}^*) = \mathbf{h}(\mathbf{x}^*)\hat{B}^T + \hat{A}\hat{\mathbf{z}}_M(\mathbf{x}^*)$, where $\hat{B} = (H^T H)^{-1} H^T (Y - \hat{A}\hat{Z}_M)^T$, $\hat{Z}_M = (\hat{Z}^T_{1,M}, ..., \hat{Z}^T_{d,M})^T$ with $\hat{Z}_{l,M} = a_l^T Y M (\hat\Sigma_l M + \hat\sigma_0^2 I_n)^{-1} \hat\Sigma_l$, and $\hat{\mathbf{z}}_M(\mathbf{x}^*) = (\hat z_{1,M}(\mathbf{x}^*), ..., \hat z_{d,M}(\mathbf{x}^*))^T$ with

$$\hat z_{l,M}(\mathbf{x}^*) = \hat\Sigma_l^T(\mathbf{x}^*)\left(\hat\Sigma_l M + \hat\sigma_0^2 I_n\right)^{-1} M Y^T a_l, \quad l = 1, ..., d.$$

Moreover, $\hat\Sigma^*_M(\mathbf{x}^*) = \hat{A}\hat{D}_M(\mathbf{x}^*)\hat{A}^T + \hat\sigma_0^2\left(1 + \mathbf{h}(\mathbf{x}^*)(H^T H)^{-1}\mathbf{h}^T(\mathbf{x}^*)\right)(I_k - \hat{A}\hat{A}^T)$, where $\hat{D}_M(\mathbf{x}^*)$ is a diagonal matrix with l-th diagonal term

$$\hat D_{l,M}(\mathbf{x}^*) = \hat\sigma_l^2 \hat K_l(\mathbf{x}^*, \mathbf{x}^*) + \hat\sigma_0^2 - \hat\Sigma_l^T(\mathbf{x}^*)\tilde\Sigma_l^{-1}\hat\Sigma_l(\mathbf{x}^*) + \left(\mathbf{h}^T(\mathbf{x}^*) - H^T\tilde\Sigma_l^{-1}\hat\Sigma_l(\mathbf{x}^*)\right)^T \left(H^T\tilde\Sigma_l^{-1}H\right)^{-1} \left(\mathbf{h}^T(\mathbf{x}^*) - H^T\tilde\Sigma_l^{-1}\hat\Sigma_l(\mathbf{x}^*)\right),$$

with $\tilde\Sigma_l = \hat\Sigma_l + \hat\sigma_0^2 I_n$ for l = 1, ..., d.
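A compact sketch of the predictive mean of one factor process from Theorem 3; the names are illustrative, and the predictive variance terms follow the same linear-solve pattern:

```python
import numpy as np

def z_hat_star(cov_star, Sigma_l, M, Y, a_l, sigma2_0):
    """Predictive mean z_l(x*) of factor l (a sketch of Theorem 3).

    cov_star: length-n vector Sigma_l(x*) of covariances between x* and
    the training inputs; Sigma_l: n x n factor covariance; M: the
    projection I_n - H (H^T H)^{-1} H^T; a_l: l-th column of A-hat.
    """
    n = M.shape[0]
    lhs = Sigma_l @ M + sigma2_0 * np.eye(n)    # Sigma_l M + sigma2_0 I_n
    # z_l(x*) = Sigma_l(x*)^T (Sigma_l M + sigma2_0 I_n)^{-1} M Y^T a_l
    return cov_star @ np.linalg.solve(lhs, M @ Y.T @ a_l)
```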
• 28. Simulated examples: Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
  Correctly specified models
  Misspecified models
5 Real examples
6 Future directions
• 29. Simulated examples, Correctly specified models: Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
  Correctly specified models
  Misspecified models
5 Real examples
  Humanity computer model with multiple outputs
  Global gridded temperature anomalies
6 Future directions
• 31. Simulated examples, Correctly specified models: Evaluation criteria
(Largest principal angle). Let 0 ≤ φ₁ ≤ ... ≤ φ_d ≤ π/2 be the principal angles between M(A) and M(Â), recursively defined by

$$\phi_i = \arccos \max_{\mathbf{a} \in \mathcal{M}(A),\, \hat{\mathbf{a}} \in \mathcal{M}(\hat{A})} |\mathbf{a}^T \hat{\mathbf{a}}| = \arccos\left(|\mathbf{a}_i^T \hat{\mathbf{a}}_i|\right),$$

subject to ||a|| = ||â|| = 1 and a^T a_j = 0, â^T â_j = 0 for j = 1, ..., i − 1, where ||·|| denotes the L₂ norm. The largest principal angle is φ_d. When the columns of A and Â are orthonormal bases of M(A) and M(Â), cos(φ_d) equals the smallest singular value of A^T Â [Björck and Golub, 1973, Absil et al., 2006].

(Average mean squared error (AvgMSE)) of the output over N experiments:

$$\mathrm{AvgMSE} = \frac{\sum_{l=1}^{N}\sum_{j=1}^{k}\sum_{i=1}^{n}\left(\hat Y^{(l)}_{j,i} - \mathrm{E}[Y^{(l)}_{j,i}]\right)^2}{knN}, \qquad (13)$$

where E[Y^{(l)}_{j,i}] is the (j, i) entry of the mean of the output matrix in the l-th experiment, and Ŷ^{(l)}_{j,i} is its estimate.
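Both criteria are simple to compute; a short numpy sketch with names of our own choosing. The angle uses the singular-value characterization above, after orthonormalizing each basis by QR.

```python
import numpy as np

def largest_principal_angle(A, A_hat):
    """Largest principal angle between span(A) and span(A_hat).

    With orthonormal bases, cos(phi_d) is the smallest singular value
    of A^T A_hat; QR makes the columns orthonormal first (a sketch).
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(A_hat)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s.min(), 0.0, 1.0))

def avg_mse(Y_hat_list, Y_mean_list):
    """AvgMSE of Eq. (13): each list entry is a k x n array from one
    experiment; equal sizes make the mean of means equal to Eq. (13)."""
    return np.mean([np.mean((Yh - Ym) ** 2)
                    for Yh, Ym in zip(Y_hat_list, Y_mean_list)])
```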
• 32. Simulated examples, Correctly specified models: Approaches
In the GPPCA, we let the covariance function of the l-th factor be a product kernel, $\sigma_l^2 K_l(\mathbf{x}_a, \mathbf{x}_b) = \sigma_l^2 \prod_{m=1}^{p} K_{lm}(x_{am}, x_{bm})$, for demonstration purposes, where K_{lm}(·, ·) is the Matérn kernel with roughness parameter 2.5 (sketched in code below),

$$K_{lm}(x_{am}, x_{bm}) = \left(1 + \frac{\sqrt{5}\,d}{\gamma_{lm}} + \frac{5d^2}{3\gamma_{lm}^2}\right)\exp\left(-\frac{\sqrt{5}\,d}{\gamma_{lm}}\right), \qquad (14)$$

with d = |x_{am} − x_{bm}| and unknown range parameters γ_l = (γ_{l1}, ..., γ_{lp}). The MMLE is used to estimate the factor loading matrix and the kernel parameters, and the predictive mean of the data is used for prediction.
In PCA, Â_pca = U₀, where U₀ holds the first d eigenvectors of YY^T/n.
In [Lam et al., 2011, Lam and Yao, 2012], A is estimated by the first d eigenvectors of $\sum_{q=1}^{q_0} \hat\Sigma_y(q)\hat\Sigma_y^T(q)$ with q₀ = 1 (LY1) and q₀ = 5 (LY5), where Σ̂_y(q) is the sample covariance of the output at lag q.
Independent GPs and parallel partial GPs are also included for the last simulated example.
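A direct transcription of the Matérn-2.5 kernel in (14) and its product form, vectorized over a distance matrix; the helper names are ours.

```python
import numpy as np

def matern_2_5(d, gamma):
    """Matern correlation with roughness parameter 2.5, as in Eq. (14):
    (1 + sqrt(5)d/gamma + 5d^2/(3 gamma^2)) exp(-sqrt(5)d/gamma)."""
    r = np.sqrt(5.0) * d / gamma
    return (1.0 + r + r ** 2 / 3.0) * np.exp(-r)

def product_kernel(X, gamma_vec, sigma2=1.0):
    """sigma_l^2 * prod_m K_lm over the p input dimensions.

    X: n x p input matrix; gamma_vec: length-p range parameters.
    """
    n, p = X.shape
    K = np.ones((n, n))
    for m in range(p):
        D = np.abs(X[:, m][:, None] - X[:, m][None, :])
        K *= matern_2_5(D, gamma_vec[m])
    return sigma2 * K
```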
• 33. Simulated examples, Correctly specified models
Example 2 (Factors with the same covariance matrix). The data are sampled from model (1) with Σ₁ = ... = Σ_d = Σ, where x_i = i for 1 ≤ i ≤ n, using the kernel in (14) with γ = 100 and σ² = 1. We simulate the data from 16 different combinations of σ²₀, k, d and n, and repeat each scenario N = 100 times. The parameters (σ²₀, σ², γ) are treated as unknown and estimated from the data.
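One simulated draw for Example 2 can be sketched as follows, reusing the matern_2_5 helper above. The Cholesky jitter and the QR construction of orthonormal loadings are our assumptions; σ²₀ = 0.25 corresponds to τ = σ²/σ²₀ = 4, one of the settings reported below.

```python
import numpy as np

n, k, d = 200, 8, 4
sigma2, sigma2_0, gamma = 1.0, 0.25, 100.0

x = np.arange(1, n + 1, dtype=float)
D = np.abs(x[:, None] - x[None, :])
K = matern_2_5(D, gamma)                            # shared Matern-2.5 kernel
L = np.linalg.cholesky(sigma2 * K + 1e-8 * np.eye(n))  # jitter for stability

rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((k, d)))    # orthonormal loadings
Z = (L @ rng.standard_normal((n, d))).T             # d x n GP factor paths
Y = A @ Z + np.sqrt(sigma2_0) * rng.standard_normal((k, n))
```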
• 34. Simulated examples, Correctly specified models
[Boxplot panels omitted: eight panels comparing PCA, GPPCA, LY1 and LY5, with settings (k = 8, d = 4), (k = 40, d = 4), (k = 16, d = 8) and (k = 80, d = 8) for τ = 100 in the first row and τ = 4 in the second row.]
Figure 7: The largest principal angle. Within each panel of the first row, the left four and right four boxplots correspond to n = 200 and n = 400, respectively; in the second row, to n = 500 and n = 1000.
• 35. Simulated examples, Correctly specified models

d = 4, τ = 100         k = 8                   k = 40
                n = 200     n = 400      n = 200     n = 400
PCA             5.3e-3      5.1e-3       1.4e-3      1.1e-3
GPPCA           3.3e-4      2.6e-4       2.2e-4      1.3e-4
LY1             4.6e-2      5.8e-3       1.5e-2      2.1e-3
LY5             3.2e-2      5.5e-3       1.1e-2      1.8e-3

d = 8, τ = 100         k = 16                  k = 80
                n = 500     n = 1000     n = 500     n = 1000
PCA             5.2e-3      5.0e-3       1.3e-3      1.1e-3
GPPCA           2.9e-4      2.4e-4       1.9e-4      1.1e-4
LY1             1.4e-2      5.1e-3       5.4e-3      1.2e-3
LY5             8.8e-3      5.1e-3       3.9e-3      1.2e-3

d = 4, τ = 4           k = 8                   k = 40
                n = 200     n = 400      n = 200     n = 400
PCA             1.4e-1      1.3e-1       4.2e-2      3.4e-2
GPPCA           5.8e-3      4.4e-3       5.3e-3      3.0e-3
LY1             2.2e-1      1.7e-1       7.2e-2      6.4e-2
LY5             2.2e-1      1.5e-1       4.8e-2      4.1e-2

d = 8, τ = 4           k = 16                  k = 80
                n = 500     n = 1000     n = 500     n = 1000
PCA             1.4e-1      1.3e-1       3.9e-2      3.2e-2
GPPCA           5.1e-3      3.9e-3       4.3e-3      2.4e-3
LY1             1.8e-1      1.4e-1       5.1e-2      3.4e-2
LY5             1.7e-1      1.3e-1       4.6e-2      3.1e-2

Table 1: AvgMSE for Example 2.
• 36. Simulated examples, Correctly specified models: Prediction of the mean by PCA and GPPCA
[Plot panels omitted: Ŷ₁ and Ŷ₂ versus x, comparing PCA, GPPCA and the truth.]
Figure 8: Prediction of the mean (AZ) of the first two output variables in one experiment with k = 8, d = 4, n = 400 and τ = 4. The observations are plotted as black circles and the truth as black curves. The estimates by the PCA and GPPCA are graphed as red dotted and blue dashed curves, respectively. The shaded area is the 95% posterior credible interval from the GPPCA.
• 37. Simulated examples, Correctly specified models
Example 3 (Factors with different covariance matrices). The data are sampled from model (1), where x_i = i for 1 ≤ i ≤ n. The variance of the noise is σ²₀ = 0.25 and the kernel follows (14) with σ² = 1. The range parameter γ of each factor is sampled uniformly from [10, 10³] in each experiment. We simulate the data from 8 different combinations of k, d and n, and repeat each scenario N = 100 times. The kernel parameters and the variance of the noise are treated as unknown and estimated from the data.
• 38. Simulated examples, Correctly specified models: Largest principal angles for Example 3
[Boxplot panels omitted: four panels comparing PCA, GPPCA, LY1 and LY5, with settings (k = 8, d = 4), (k = 40, d = 4), (k = 16, d = 8) and (k = 80, d = 8).]
Figure 9: The largest principal angle between the true subspace and the estimated subspaces of the four approaches for Example 3. In the two left panels, the left four and right four boxplots correspond to n = 200 and n = 400 observations per output variable, respectively; in the two right panels, to n = 500 and n = 1000.
• 39. Simulated examples, Correctly specified models: AvgMSE for Example 3

d = 4, τ = 4           k = 8                   k = 40
                n = 200     n = 400      n = 200     n = 400
PCA             1.3e-1      1.3e-1       3.8e-2      3.0e-2
GPPCA           1.4e-2      4.0e-2       7.1e-3      1.1e-2
LY1             1.6e-1      1.4e-1       4.9e-2      3.4e-2
LY5             1.5e-1      1.3e-1       4.4e-2      3.2e-2

d = 8, τ = 4           k = 16                  k = 80
                n = 500     n = 1000     n = 500     n = 1000
PCA             1.3e-1      1.3e-1       3.5e-2      2.9e-2
GPPCA           1.3e-2      3.3e-2       6.0e-3      8.0e-3
LY1             1.4e-1      1.3e-1       3.7e-2      2.9e-2
LY5             1.4e-1      1.3e-1       3.4e-2      2.8e-2

Table 2: AvgMSE for Example 3.
• 40. Simulated examples, Misspecified models: Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
  Correctly specified models
  Misspecified models
5 Real examples
  Humanity computer model with multiple outputs
  Global gridded temperature anomalies
6 Future directions
• 41. Simulated examples, Misspecified models
Example 4 (Unconstrained factor loadings and misspecified kernel functions). The data are sampled from model (1) with Σ₁ = ... = Σ_d = Σ and x_i = i for 1 ≤ i ≤ n. Each entry of the factor loading matrix is sampled uniformly and independently from [0, 1] (without the orthogonality constraint in (3)). The data are generated with either the exponential kernel or the Gaussian kernel under different combinations of σ²₀ and n (both kernels are sketched below), while the GPPCA still uses the Matérn kernel in (14) for estimation. We set k = 20, d = 4, γ = 100 and σ² = 1 when sampling the data, and repeat each scenario N = 100 times. All kernel parameters and the variance of the noise are treated as unknown and estimated from the data.
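For reference, minimal sketches of the two generating kernels; the slides do not print their formulas, so the standard parameterizations below are an assumption on our part.

```python
import numpy as np

def exponential_kernel(d, gamma):
    """Exponential (Matern 1/2) correlation: exp(-d / gamma) (assumed form)."""
    return np.exp(-d / gamma)

def gaussian_kernel(d, gamma):
    """Gaussian (squared exponential) correlation: exp(-(d / gamma)^2)
    (assumed form)."""
    return np.exp(-(d / gamma) ** 2)
```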
• 42. Simulated examples, Misspecified models: Largest principal angle for Example 4
[Boxplot panels omitted: exponential kernel (left) and Gaussian kernel (right), each with k = 20, d = 4, τ = 0.25 and comparing PCA, GPPCA, LY1 and LY5.]
Figure 10: The largest principal angle between the estimated subspaces of the four approaches and the true subspace for Example 4. In each panel, the left four, middle four and right four boxplots correspond to n = 100, n = 200 and n = 400, respectively. The data are generated with the exponential kernel in the left panel and with the Gaussian kernel in the right panel.
• 43. Simulated examples, Misspecified models: AvgMSE for Example 4

Exponential kernel, τ = 4
          n = 100     n = 200     n = 400
PCA       7.4e-2      6.1e-2      5.4e-2
GPPCA     3.1e-2      2.6e-2      2.4e-2
LY1       1.5e-1      8.2e-1      5.7e-2
LY5       1.3e-1      7.3e-1      5.6e-2

Gaussian kernel, τ = 1/4
          n = 100     n = 200     n = 400
PCA       1.1e0       8.9e-1      8.4e-1
GPPCA     7.2e-1      6.6e-1      6.2e-1
LY1       1.3e0       1.0e0       8.6e-1
LY5       1.3e0       1.0e0       8.6e-1

Table 3: AvgMSE for Example 4.
• 44. Simulated examples, Misspecified models
Example 5 (Unconstrained factor loadings and deterministic factors). The data are sampled from model (1) with each latent factor being a deterministic function, Z_l(x_i) = cos(0.05πθ_l x_i) with θ_l ~ i.i.d. Unif(0, 1) for l = 1, ..., d, where x_i = i for 1 ≤ i ≤ n, σ²₀ = 0.25, k = 20 and d = 4. Four scenarios are considered, with sample sizes n = 100, n = 200, n = 400 and n = 800.
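The deterministic factors of Example 5 are cheap to reproduce; a minimal sketch following the slide's specification (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, d, sigma2_0 = 200, 20, 4, 0.25

x = np.arange(1, n + 1, dtype=float)
theta = rng.uniform(0, 1, size=d)
# Z_l(x_i) = cos(0.05 * pi * theta_l * x_i), stacked as a d x n matrix
Z = np.cos(0.05 * np.pi * theta[:, None] * x[None, :])
A = rng.uniform(0, 1, size=(k, d))          # unconstrained loadings
Y = A @ Z + np.sqrt(sigma2_0) * rng.standard_normal((k, n))
```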
• 45. Simulated examples, Misspecified models: Largest principal angle for Example 5
[Boxplot panel omitted: deterministic factors with k = 20, d = 4 and τ = 4, comparing PCA, GPPCA, LY1 and LY5.]
Figure 11: The largest principal angle between the estimated subspace of the loading matrix and the true subspace for Example 5. From left to right, each group of four boxplots corresponds to n = 100, n = 200, n = 400 and n = 800, respectively.
• 46. Simulated examples, Misspecified models: AvgMSE for Example 5

          n = 100     n = 200     n = 400     n = 800
PCA       7.0e-2      6.0e-2      5.4e-2      5.2e-2
GPPCA     1.4e-2      9.2e-3      6.7e-3      5.5e-3
LY1       9.8e-1      7.6e-1      6.3e-2      5.7e-2
LY5       9.3e-2      7.3e-2      6.2e-2      5.6e-2
Ind GP    2.0e-2      1.9e-2      1.7e-2      1.7e-2
PP GP     2.0e-2      1.9e-2      1.8e-2      1.8e-2

Table 4: AvgMSE for Example 5. The Ind GP approach treats each output variable independently, and the mean of the output is estimated by the predictive mean of Gaussian process regression. The PP GP approach also models each output variable independently by a Gaussian process, but the covariance function is shared across the k independent Gaussian processes and estimated from all the data.
• 47. Real examples: Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
5 Real examples
  Humanity computer model with multiple outputs
  Global gridded temperature anomalies
6 Future directions
• 48. Real examples, Humanity computer model with multiple outputs: Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
  Correctly specified models
  Misspecified models
5 Real examples
  Humanity computer model with multiple outputs
  Global gridded temperature anomalies
6 Future directions
• 49. Real examples, Humanity computer model with multiple outputs
We first consider a testbed called the 'diplomatic and military operations in a non-warfighting domain' (DIAMOND) simulator, which models the number of casualties from the second to the sixth day after an earthquake and volcanic eruption in Giarre and Catania. The input is 13-dimensional and includes, for example, the helicopter cruise speed, the engineer ground speed, and the hospital, shelter and food supply capacities in the two places. We use the same n = 120 training outputs and n* = 120 testing outputs as in [Overstall and Woods, 2016] to compare the different approaches. The criteria for out-of-sample prediction are

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{j=1}^{k}\sum_{i=1}^{n^*}\left(\hat Y^*_j(\mathbf{x}^*_i) - Y^*_j(\mathbf{x}^*_i)\right)^2}{kn^*}},$$
$$P_{CI}(95\%) = \frac{1}{kn^*}\sum_{j=1}^{k}\sum_{i=1}^{n^*} 1\!\left\{Y^*_j(\mathbf{x}^*_i) \in CI_{ij}(95\%)\right\},$$
$$L_{CI}(95\%) = \frac{1}{kn^*}\sum_{j=1}^{k}\sum_{i=1}^{n^*} \mathrm{length}\!\left\{CI_{ij}(95\%)\right\}.$$
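All three criteria are straightforward to compute from the held-out predictions and 95% intervals; a short numpy sketch with illustrative names:

```python
import numpy as np

def prediction_metrics(Y_true, Y_pred, ci_lower, ci_upper):
    """RMSE, empirical 95% coverage (PCI) and average interval
    length (LCI); all arguments are k x n_star arrays (a sketch).
    """
    rmse = np.sqrt(np.mean((Y_pred - Y_true) ** 2))
    covered = (Y_true >= ci_lower) & (Y_true <= ci_upper)
    pci = covered.mean()                 # fraction inside the interval
    lci = (ci_upper - ci_lower).mean()   # average interval length
    return rmse, pci, lci
```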
• 50. Real examples, Humanity computer model with multiple outputs

Method     Mean function        Kernel           RMSE      PCI (95%)   LCI (95%)
GPPCA      Intercept            Gaussian kernel  3.33e2    0.948       1.52e3
GPPCA      Selected covariates  Gaussian kernel  3.18e2    0.957       1.31e3
GPPCA      Intercept            Matérn kernel    2.82e2    0.962       1.22e3
GPPCA      Selected covariates  Matérn kernel    2.74e2    0.957       1.18e3
Ind GP     Intercept            Gaussian kernel  3.64e2    0.918       1.18e3
Ind GP     Selected covariates  Gaussian kernel  4.04e2    0.918       1.17e3
Ind GP     Intercept            Matérn kernel    3.40e2    0.930       0.984e3
Ind GP     Selected covariates  Matérn kernel    3.31e2    0.927       0.967e3
Multi GP   Intercept            Gaussian kernel  3.63e2    0.975       1.67e3
Multi GP   Selected covariates  Gaussian kernel  3.34e2    0.963       1.54e3
Multi GP   Intercept            Matérn kernel    3.01e2    0.962       1.34e3
Multi GP   Selected covariates  Matérn kernel    3.05e2    0.970       1.50e3

Table 5: The GPPCA and Ind GP with the same mean structures and kernels are given in the first 8 rows. The 9th and 10th rows show the emulation results of the two best models in [Overstall and Woods, 2016] with the Gaussian kernel on the same held-out testing output, and the last two rows give the results of the same models with the Matérn kernel in (14). For reference, the RMSE is 1.08e5 when predicting with the mean of the training output.
• 51. Real examples, Humanity computer model with multiple outputs: Estimated covariance and prediction
[Plot panels omitted: a day-by-day covariance heat map over days 2 to 6 (left) and a held-out prediction panel (right) with legend entries for the casualties on days 5 and 6 and the corresponding GPPCA and Ind GP predictions.]
Figure 12: The estimated covariance of the casualties by the GPPCA across the different days after the catastrophe is graphed in the left panel. The held-out testing output and the predictions by the GPPCA and independent GPs, with mean basis h(x) = (1, x₁₁) and the Matérn kernel, for the fifth and sixth days, are graphed in the right panel.
• 52. Real examples, Global gridded temperature anomalies: Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
  Correctly specified models
  Misspecified models
5 Real examples
  Humanity computer model with multiple outputs
  Global gridded temperature anomalies
6 Future directions
• 53. Real examples, Global gridded temperature anomalies: NOAA global gridded temperature anomalies
The dataset from the U.S. National Oceanic and Atmospheric Administration (NOAA) contains the global gridded monthly anomalies of the combined air and marine temperature from January 1880 to near present, at a 5° × 5° latitude-longitude spatial resolution. The recorded variance of the measurement error is around 0.1.
We compare the different approaches on interpolation, using the monthly temperature anomalies at 1,639 spatial grid boxes over the past 20 years. We hold out 24,000 randomly sampled measurements on 1,200 spatial grid boxes in 20 months as the test data set.
For the GPPCA, the mean basis function is h(x) = (1, x), where x is an integer from 1 to 240 indexing the month. We also assume the covariance is the same for all factor processes.
We also compare with PPCA and with spatial and temporal smoothing approaches; the temporal smoothing approach likewise uses h(x) = (1, x). Finally, we include random forest regression, assuming independence either across space or across time.
• 54. Real examples, Global gridded temperature anomalies

Method                      Measurement error   RMSE    PCI (95%)   LCI (95%)
GPPCA, d = 50               estimated           0.392   0.877       1.03
GPPCA, d = 100              estimated           0.330   0.774       0.564
GPPCA, d = 50               fixed               0.392   0.938       1.34
GPPCA, d = 100              fixed               0.335   0.976       1.44
PPCA, d = 50                estimated           0.644   0.674       1.09
PPCA, d = 100               estimated           0.644   0.520       1.40
PPCA, d = 50                fixed               0.641   0.760       1.33
PPCA, d = 100               fixed               0.622   0.801       1.40
Temporal smoothing by GP    estimated           1.02    0.940       2.36
Spatial smoothing by GP     estimated           0.623   0.917       1.95
Temporal regression by RF   estimated           0.497   /           /
Spatial regression by RF    estimated           0.444   /           /

Table 6: Out-of-sample prediction of the temperature anomalies by the different approaches. The predictive performance of the GPPCA and PPCA is given in the first four and the next four rows, respectively. The temporal and spatial smoothing methods are given in the 9th and 10th rows. The last two rows give the predictive RMSE of regression with the random forest (RF) algorithm.
• 55. Real examples, Global gridded temperature anomalies: Comparison between the GPPCA and spatial smoothing
[Map panels omitted: interpolation by the GPPCA (left), observed temperature anomalies (middle) and interpolation by the spatial smoothing method (right), each on a longitude-latitude grid in °C.]
Figure 13: The interpolated and observed temperature anomalies in April 2013. The observed temperature anomalies are graphed in the middle panel; the interpolated anomalies by the GPPCA and by the spatial smoothing method are graphed in the left and right panels, respectively. The numbers of training and test observations are 439 and 1,200, respectively. The out-of-sample RMSEs of the GPPCA and the spatial smoothing method are 0.335 and 0.779, respectively.
• 56. Real examples, Global gridded temperature anomalies: Estimated intercept and trend by the GPPCA
[Map panels omitted: the estimated intercept (left) and the estimated monthly temperature change rate (right), each on a longitude-latitude grid in °C.]
Figure 14: Estimated intercept and monthly change rate of the temperature anomalies by the GPPCA, using the monthly temperature anomalies between January 1999 and December 2018.
A spatial orthonormal basis could also be used for A; the GPPCA is more general, as it does not require a distance between the output functions.
The GPPCA can be extended to irregularly missing data by an EM algorithm, or by an MCMC algorithm if one can specify the full conditional posteriors.
• 57. Future directions: Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
5 Real examples
6 Future directions
• 58. Future directions
- A full Bayesian approach for the factor loading matrix and the parameters (based on the computationally feasible marginal likelihood).
- Estimating the number of factors.
- Convergence rates of the GPPCA.
- Extensions when the observations do not form a matrix.
- Optimization algorithms on the Stiefel manifold.
- Other orthonormal bases for the factor loading matrix.
- Other ways to model the factor processes.
- Heteroscedastic noise.
• 59. Reference
Gu, M. and Shen, W. (2018). Generalized probabilistic principal component analysis (GPPCA) for correlated data. arXiv:1808.10868.
• 60. Thanks!
• 61. Related literature: frequentist approaches
- The MLE of the factor loading matrix A under Assumption 1 (without marginalizing out Z) is Ũ₀R, where Ũ₀ holds the first d ordered eigenvectors of YY^T and R is an orthogonal rotation matrix (the same subspace as the PCA).
- PCA is widely used in factor models, particularly for modeling multiple time series. For example, Bai and Ng [2002] and Bai [2003] assume A^T A = kI_d and estimate A by √k U₀ in modeling high-dimensional time series.
- PCA is also widely used to estimate the basis in the linear model of coregionalization [Higdon et al., 2008, Paulo et al., 2012].
- In [Tipping and Bishop, 1999], the linear subspace found by the PCA is the MMLE of the factor model with independent factors.
- [Lam et al., 2011, Lam and Yao, 2012] estimate the factor loading matrix of model (1) by the first d eigenvectors of $\sum_{q=1}^{q_0} \hat\Sigma_y(q)\hat\Sigma_y^T(q)$, where Σ̂_y(q) is the k × k sample covariance of the output at lag q and q₀ is a fixed positive integer.
- Kernel PCA was introduced in machine learning; it maps the output into a feature space via kernels [Schölkopf et al., 1998, Mika et al., 1999, Hoffmann, 2007].
• 62. Related literature: Bayesian approaches
- [West, 2003] points out the connection between PCA and a class of generalized singular g-priors, and introduces a spike-and-slab prior that induces sparse factors in the latent factor model, assuming the factors are independently distributed.
- Another sparsity-inducing prior is introduced by [Bhattacharya and Dunson, 2011] under the same independence assumption on the factors, and its asymptotic behavior is also discussed.
- [Nakajima and West, 2013, Zhou et al., 2014] introduce methods to directly threshold the time-varying factor loading matrix in Bayesian dynamic linear models.
- For spatially correlated data, priors have also been discussed for the spatially varying factor loading matrices in the LMC [Gelfand et al., 2004, Banerjee et al., 2014].
- [Higdon et al., 2008, Paulo et al., 2012, Fricker et al., 2013] use the LMC for emulating computer models with multiple outputs, estimate the factor loading matrix, and rely on MCMC algorithms for inference.
• 63. References
P.-A. Absil, Alan Edelman, and Plamen Koev. On the largest principal angle between random subspaces. Linear Algebra and its Applications, 414(1):288–294, 2006.
Mauricio A. Alvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3):195–266, 2012.
Jushan Bai. Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171, 2003.
Jushan Bai and Serena Ng. Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221, 2002.
Sudipto Banerjee, Bradley P. Carlin, and Alan E. Gelfand. Hierarchical Modeling and Analysis for Spatial Data. CRC Press, 2014.
Anirban Bhattacharya and David B. Dunson. Sparse Bayesian infinite factor models. Biometrika, pages 291–306, 2011.
Åke Björck and Gene H. Golub. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27(123):579–594, 1973.
Thomas E. Fricker, Jeremy E. Oakley, and Nathan M. Urban. Multivariate Gaussian process emulators with nonseparable covariance structures. Technometrics, 55(1):47–56, 2013.
Alan E. Gelfand, Alexandra M. Schmidt, Sudipto Banerjee, and C. F. Sirmans. Nonstationary multivariate process modeling through spatially varying coregionalization. Test, 13(2):263–312, 2004.
Mengyang Gu. FastGaSP: Fast and Exact Computation of Gaussian Stochastic Process, 2019. URL https://CRAN.R-project.org/package=FastGaSP. R package version 0.5.1.
Mengyang Gu and Kyle Anderson. Calibration of imperfect mathematical models by multiple sources of data with measurement bias. arXiv preprint arXiv:1810.11664, 2018.
Mengyang Gu and James O. Berger. Parallel partial Gaussian process emulation for computer models with massive output. Annals of Applied Statistics, 10(3):1317–1347, 2016.
Mengyang Gu and Yanxun Xu. Nonseparable Gaussian stochastic process: A unified view and computational strategy. arXiv preprint arXiv:1711.11501, 2017.
Jouni Hartikainen and Simo Särkkä. Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In Machine Learning for Signal Processing (MLSP), 2010 IEEE International Workshop on, pages 379–384. IEEE, 2010.
Dave Higdon, James Gattiker, Brian Williams, and Maria Rightley. Computer model calibration using high-dimensional output. Journal of the American Statistical Association, 103(482):570–583, 2008.
Heiko Hoffmann. Kernel PCA for novelty detection. Pattern Recognition, 40(3):863–874, 2007.
Clifford Lam and Qiwei Yao. Factor modeling for high-dimensional time series: inference for the number of factors. The Annals of Statistics, 40(2):694–726, 2012.
Clifford Lam, Qiwei Yao, and Neil Bathia. Estimation of latent factors for high-dimensional time series. Biometrika, 98(4):901–918, 2011.
Sebastian Mika, Bernhard Schölkopf, Alex J. Smola, Klaus-Robert Müller, Matthias Scholz, and Gunnar Rätsch. Kernel PCA and de-noising in feature spaces. In Advances in Neural Information Processing Systems, pages 536–542, 1999.
Jouchi Nakajima and Mike West. Bayesian analysis of latent threshold dynamic models. Journal of Business & Economic Statistics, 31(2):151–164, 2013.
Antony M. Overstall and David C. Woods. Multivariate emulation of computer simulators: model selection and diagnostics with application to a humanitarian relief model. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65(4):483–505, 2016.
Rui Paulo, Gonzalo García-Donato, and Jesús Palomo. Calibration of computer models with multivariate output. Computational Statistics and Data Analysis, 56(12):3959–3974, 2012.
Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
Matthias Seeger, Yee-Whye Teh, and Michael Jordan. Semiparametric latent factor models. Technical report, 2005.
Michael E. Tipping and Christopher M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611–622, 1999.
Zaiwen Wen and Wotao Yin. A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142(1-2):397–434, 2013.
M. West. Bayesian factor regression models in the "large p, small n" paradigm. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. David, D. Heckerman, A. F. M. Smith, and M. West, editors, Bayesian Statistics 7, pages 723–732. Oxford University Press, 2003. URL http://ftp.isds.duke.edu/WorkingPapers/02-12.html.
Peter Whittle. On stationary processes in the plane. Biometrika, pages 434–449, 1954.
Xiaocong Zhou, Jouchi Nakajima, and Mike West. Bayesian forecasting and portfolio decisions using dynamic dependent sparse factor models. International Journal of Forecasting, 30(4):963–980, 2014.