MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilistic Principal Component Analysis of Correlated Mortality, Mengyang Gu, April 30, 2019
1. Generalized probabilistic principal component analysis of correlated data
Mengyang Gu and Weining Shen
Department of Applied Mathematics and Statistics, Johns Hopkins University
Department of Statistics, University of California, Irvine
SAMSI BFF Conference
2. Outline
1 Introduction
2 Generalized probabilistic principal component analysis (GPPCA)
3 GPPCA with a mean structure
4 Simulated examples
  Correctly specified models
  Misspecified models
5 Real examples
  Humanity computer model with multiple outputs
  Global gridded temperature anomalies
6 Future directions
4. Introduction
NOAA monthly gridded temperature anomalies
Figure 1: NOAA monthly gridded temperature anomalies (°C) in Feb 2017 and Dec 2018, plotted over longitude and latitude.
5. Introduction
Ground deformation by radar interferograms
Figure 2: Five interferometric synthetic aperture radar (InSAR) interferograms spanning the following time periods: 1) 17 Oct 2011 to 04 May 2012; 2) 21 Oct 2011 to 16 May 2012; 3) 20 Oct 2011 to 15 May 2012; 4) 28 Oct 2011 to 11 May 2012; 5) 12 Oct 2011 to 07 May 2012. The black curves show cliffs and other important topographic features at Kīlauea; the large elliptical feature is Kīlauea Caldera. The color indicates the ground deformation rate in m/yr. The figures are from [Gu and Anderson, 2018].
6. Introduction
Emulation of computer models with multiple outputs
Figure 3: Median (truncated at 20 meters at the volcanic center region) and interquartile range of the GaSP emulator of 'maximum flow height over time' for TITAN2D, at 23,040 spatial locations over Montserrat Island, for the new input values $V^* = 10^{6.9984}$, $\varphi^* = 3.3487$, $\delta^*_{\mathrm{bed}} = 10.8790$, and $\delta^*_{\mathrm{int}} = 31.0300$. The figures are from [Gu and Berger, 2016].
8. Introduction
A latent factor model
Let $y(x) = (y_1(x), \ldots, y_k(x))^T$ be a $k$-dimensional real-valued output vector at a $p$-dimensional input vector $x$. Assume $y_j(x)$ has zero mean for now.
Consider the following latent factor model:
$$y(x) = A z(x) + \epsilon, \quad (1)$$
where $\epsilon$ is a vector of independent Gaussian noise terms with variance $\sigma_0^2$. The $k \times d$ factor loading matrix $A = [a_1, \ldots, a_d]$ relates the $k$-dimensional outputs to the $d$-dimensional vector of factor processes $z(x) = (z_1(x), \ldots, z_d(x))^T$, where $d \leq k$. Any two factor processes are assumed independent.
Assume $Z_l = (z_l(x_1), \ldots, z_l(x_n))$ follows a multivariate normal distribution,
$$Z_l^T \sim \mathcal{MN}(0, \Sigma_l), \quad (2)$$
where $\Sigma_l$ can be parameterized by a covariance function such that the $(i,j)$ entry of $\Sigma_l$ is $\sigma_l^2 K_l(x_i, x_j)$, with $K_l(\cdot,\cdot)$ a kernel function, for $l = 1, \ldots, d$ and $1 \leq i, j \leq n$.
This model is often referred to as the semiparametric latent factor model [Seeger et al., 2005, Alvarez et al., 2012], and it is a special case of the linear model of coregionalization (LMC) [Gelfand et al., 2004].
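As a concrete reference point, here is a minimal simulation from model (1) with independent Gaussian-process factors and a Matérn kernel. The deck itself points to the R package FastGaSP; this Python sketch, and the sizes k, d, n and parameter values in it, are illustrative assumptions only.

```python
import numpy as np

def matern_2_5(x, gamma):
    """Matern correlation with roughness parameter 2.5; see Eq. (14) later in the deck."""
    dist = np.abs(x[:, None] - x[None, :])
    s = np.sqrt(5.0) * dist / gamma
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

rng = np.random.default_rng(0)
k, d, n = 8, 2, 100                      # illustrative sizes: outputs, factors, inputs
x = np.arange(1, n + 1, dtype=float)

# Orthonormal loading matrix A (Assumption 1 below: A^T A = I_d)
A, _ = np.linalg.qr(rng.standard_normal((k, d)))

# Independent GP factors z_l ~ N(0, sigma^2 K), here with a shared kernel
sigma2, gamma, sigma2_0 = 1.0, 100.0, 0.25
K = sigma2 * matern_2_5(x, gamma)        # Sigma_l for every l
Z = rng.multivariate_normal(np.zeros(n), K, size=d)          # d x n factor matrix

# Model (1): Y = A Z + noise with variance sigma_0^2
Y = A @ Z + np.sqrt(sigma2_0) * rng.standard_normal((k, n))
```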
10. Introduction
Estimation of the factor loading matrix
Let $Y = [y(x_1), \ldots, y(x_n)]$ be the $k \times n$ matrix of observations and let $Z = [z(x_1), \ldots, z(x_n)]$ be the $d \times n$ latent factor matrix.
It is popular to estimate A by PCA. [Higdon et al., 2008, Paulo et al., 2012] estimate A by the first d columns of $\sqrt{n}\, U_0 D_0^{1/2}$, where $U_0 D_0 U_0^T$ is the eigendecomposition of $YY^T/n$.
[Tipping and Bishop, 1999] study the latent factor model $Y = AZ + \epsilon$ with independent standard normal factors. Assuming each row of Y has zero mean, the maximum marginal likelihood estimator (MMLE) of A is the first d columns of $U_0 (D_0 - \sigma_0^2 I_k)^{1/2} R$, where R is an arbitrary $d \times d$ orthogonal rotation matrix.
Note that model (1) is unchanged if one replaces the pair $(A, z(x))$ by $(AE, E^{-1} z(x))$ for any invertible matrix E. So only the subspace of A, denoted $\mathcal{M}(A)$, can be uniquely determined.
The linear subspaces given by the above PCA for the LMC model and by the MMLE of the (independent) latent factor model are the same.
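Both subspace estimators are a few lines of linear algebra. A sketch continuing the simulation above; the PPCA noise estimate as the mean of the discarded eigenvalues follows Tipping and Bishop:

```python
# PCA: eigendecomposition of Y Y^T / n
evals, evecs = np.linalg.eigh(Y @ Y.T / n)
order = np.argsort(evals)[::-1]
U0, D0 = evecs[:, order[:d]], evals[order[:d]]

A_pca = np.sqrt(n) * U0 * np.sqrt(D0)       # Higdon et al. scaling of the first d columns
sigma2_0_ppca = evals[order[d:]].mean()     # PPCA estimate of the noise variance
A_ppca = U0 * np.sqrt(np.maximum(D0 - sigma2_0_ppca, 0.0))   # Tipping-Bishop MMLE, R = I

# A_pca, A_ppca and U0 all span the same estimated subspace M(A_hat).
```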
11. Introduction
Research goals
What is the maximum marginal likelihood estimator of the factor
loadings (and other parameters) in the latent factor model (1) (where
the factors are dependent)?
What are the predictive distributions of the new data?
Are they computationally feasible?
If we have additional regressors (covariates), can we also combine
them in the model in a coherent way?
13. Generalized probabilistic principal component analysis (GPPCA)
Orthogonal assumption
Since only the linear subspace of the factor loading matrix, $\mathcal{M}(A)$, is identifiable, we assume the columns of A in model (1) are orthonormal:
Assumption 1.
$$A^T A = I_d. \quad (3)$$
Note one may instead assume $A^T A = c I_d$, where c is a positive constant that can potentially depend on k, e.g. $c = k$. But the variance parameters of the factor processes are estimated from the data, so we focus on Assumption 1.
This assumption is also key for some other estimators of the factor loading matrix [Lam et al., 2011, Lam and Yao, 2012].
The MLE of the factor loading matrix A under Assumption 1 (without marginalizing out Z) is $U_0 R$, where $U_0$ is the matrix of the first d ordered eigenvectors of $YY^T/n$ and R is an orthogonal rotation matrix (the same subspace as the PCA). E.g., [Bai and Ng, 2002] and [Bai, 2003] assume $A^T A = k I_d$ and estimate A by $\sqrt{k}\, U_0$ in modeling high-dimensional time series.
14. Generalized probabilistic principal component analysis (GPPCA)
Marginal likelihood
(Known expression of the marginal likelihood.) Denote the vectorization of the output $Y_v = \mathrm{vec}(Y)$ and the $d \times n$ latent factor matrix $Z = (z(x_1), \ldots, z(x_n))$ at the inputs $\{x_1, \ldots, x_n\}$. After marginalizing out Z, $Y_v$ follows a multivariate normal distribution ([Banerjee et al., 2014]):
$$Y_v \mid A, \sigma_0^2, \Sigma_1, \ldots, \Sigma_d \sim \mathcal{MN}\Big(0, \ \sum_{l=1}^{d} \Sigma_l \otimes (a_l a_l^T) + \sigma_0^2 I_{nk}\Big).$$
Lemma 1 (Marginal likelihood).
Under Assumption 1, the marginal distribution of $Y_v$ in model (1) is the multivariate normal distribution
$$Y_v \mid A, \sigma_0^2, \Sigma_1, \ldots, \Sigma_d \sim \mathcal{MN}\bigg(0, \ \sigma_0^2 \Big[I_{nk} - \sum_{l=1}^{d} (\sigma_0^2 \Sigma_l^{-1} + I_n)^{-1} \otimes (a_l a_l^T)\Big]^{-1}\bigg).$$
16. Generalized probabilistic principal component analysis (GPPCA)
Theorem 1 (Maximum marginal likelihood estimator).
For model (1), under Assumption 1, after marginalizing out Z:
1. if $\Sigma_1 = \ldots = \Sigma_d = \Sigma$, the marginal likelihood is maximized at
$$\hat{A} = U R, \quad (4)$$
where U is the $k \times d$ matrix of the first d principal eigenvectors of
$$G = Y (\sigma_0^2 \Sigma^{-1} + I_n)^{-1} Y^T, \quad (5)$$
and R is an arbitrary $d \times d$ orthogonal rotation matrix;
2. if the covariances of the factor processes are different, denoting $G_l = Y (\sigma_0^2 \Sigma_l^{-1} + I_n)^{-1} Y^T$, the maximum marginal likelihood estimator of A is
$$\hat{A} = \mathrm{argmax}_A \sum_{l=1}^{d} a_l^T G_l a_l, \quad \text{s.t.} \ A^T A = I_d. \quad (6)$$
A numerical optimization algorithm that preserves the orthogonality constraint in (6) is introduced in [Wen and Yin, 2013].
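In the shared-covariance case the estimator is a single eigendecomposition. A sketch continuing the simulation, with the true kernel parameters plugged in purely for illustration:

```python
# G = Y (sigma_0^2 Sigma^{-1} + I_n)^{-1} Y^T, using the equivalent and stabler form
# (sigma_0^2 Sigma^{-1} + I_n)^{-1} = (Sigma + sigma_0^2 I_n)^{-1} Sigma
W = np.linalg.solve(K + sigma2_0 * np.eye(n), K)
G = Y @ W @ Y.T

evals_G, evecs_G = np.linalg.eigh(G)
A_gppca = evecs_G[:, np.argsort(evals_G)[::-1][:d]]   # A_hat = U R with R = I_d
```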
18. Generalized probabilistic principal component analysis (GPPCA)
Generalized probabilistic principal component analysis
The estimator in Theorem 1 is called the generalized probabilistic principal component analysis (GPPCA), which directly extends the PPCA of [Tipping and Bishop, 1999] to correlated factors.
For demonstration purposes, let the $(i,j)$ term of $\Sigma_l$ be $\sigma_l^2 K_l(x_i, x_j)$, where $K_l(\cdot,\cdot)$ is a kernel function with parameters $\gamma_l$.
Denote the signal-to-noise ratio (SNR) $\tau_l = \sigma_l^2 / \sigma_0^2$. Let $\tau = (\tau_1, \ldots, \tau_d)$ and $\gamma = (\gamma_1, \ldots, \gamma_d)$. The maximum marginal likelihood estimator of $\sigma_0^2$ becomes a function of $\hat{A}$, $\tau$ and $\gamma$, namely $\hat\sigma_0^2 = \hat{S}^2/(nk)$, where
$$\hat{S}^2 = \mathrm{tr}(Y^T Y) - \sum_{l=1}^{d} \hat{a}_l^T Y (\tau_l^{-1} K_l^{-1} + I_n)^{-1} Y^T \hat{a}_l.$$
Plugging in $\hat{A}$ and $\hat\sigma_0^2$, the marginal likelihood satisfies
$$L(\tau, \gamma \mid Y, \hat{A}, \hat\sigma_0^2) \propto \prod_{l=1}^{d} |\tau_l K_l + I_n|^{-1/2} \, (\hat{S}^2)^{-nk/2}. \quad (7)$$
After obtaining $(\hat\tau, \hat\gamma)$ by maximizing the marginal likelihood, one gets $\hat{A}$, $\hat\sigma_0^2$, and $\hat\sigma_l^2$ for $l = 1, \ldots, d$.
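A minimal implementation of this profiled likelihood for the shared-covariance case, continuing the sketch, optimized over (τ, γ) on the log scale. It reuses matern_2_5, Y and x from the earlier blocks, and plugs the closed-form estimator of Theorem 1 into Ŝ²; the starting values are arbitrary assumptions.

```python
from scipy.optimize import minimize

def neg_log_profile_lik(log_params, Y, x, d):
    """Negative log of the profile marginal likelihood (7), shared-covariance case."""
    tau, gam = np.exp(log_params)                   # SNR tau and range gamma
    n = len(x)
    Kl = matern_2_5(x, gam)
    # (tau^{-1} K^{-1} + I_n)^{-1} = (K + I_n / tau)^{-1} K
    W = np.linalg.solve(Kl + np.eye(n) / tau, Kl)
    G = Y @ W @ Y.T
    evals = np.linalg.eigvalsh(G)[::-1]
    # At the optimal A_hat, sum_l a_l^T G a_l equals the sum of the top-d eigenvalues
    S2 = np.trace(Y @ Y.T) - evals[:d].sum()
    _, logdet = np.linalg.slogdet(tau * Kl + np.eye(n))
    k = Y.shape[0]
    return 0.5 * d * logdet + 0.5 * n * k * np.log(S2)

res = minimize(neg_log_profile_lik, np.log([4.0, 50.0]), args=(Y, x, d),
               method="Nelder-Mead")
tau_hat, gamma_hat = np.exp(res.x)
```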
19. Generalized probabilistic principal component analysis (GPPCA)
Computational complexity
Each evaluation of the likelihood in (7) needs max(O(dn³), O(dkn)) operations in general.
Each evaluation of the objective function for estimating A in Theorem 1 needs max(O(dn³), O(dkn)) in general; solving the eigenproblem when the covariance is shared needs min(O(kn²), O(k²n)).
When the input is one-dimensional and a Matérn kernel is used, computing the likelihood in (7) takes only O(dkn) operations, without any approximation (see e.g. [Whittle, 1954, Hartikainen and Sarkka, 2010]). The R package FastGaSP on CRAN implements this fast algorithm for Gaussian processes with Matérn kernels [Gu, 2019].
Directly solving the eigenproblem still has rate min(O(kn²), O(k²n)), but the iterative algorithm has rate O(dkn).
20. Generalized probabilistic principal component analysis (GPPCA)
Let $\hat\Sigma_l$ be the estimated covariance matrix of the lth factor, whose $(i,j)$ element is $\hat\sigma_l^2 \hat{K}_l(x_i, x_j)$, obtained by plugging in $\hat\sigma_l^2$ and $\hat\gamma_l$.
Theorem 2 (Predictive distribution).
Under Assumption 1, for any $x^*$ one has
$$Y(x^*) \mid Y, \hat{A}, \hat\gamma, \hat\sigma^2, \hat\sigma_0^2 \sim \mathcal{MN}\big(\hat\mu^*(x^*),\, \hat\Sigma^*(x^*)\big),$$
where
$$\hat\mu^*(x^*) = \hat{A}\, \hat{z}(x^*), \quad (8)$$
with $\hat{z}(x^*) = (\hat{z}_1(x^*), \ldots, \hat{z}_d(x^*))^T$, $\hat{z}_l(x^*) = \hat\Sigma_l^T(x^*) (\hat\Sigma_l + \hat\sigma_0^2 I_n)^{-1} Y^T \hat{a}_l$, and $\hat\Sigma_l(x^*) = \hat\sigma_l^2 (\hat{K}_l(x_1, x^*), \ldots, \hat{K}_l(x_n, x^*))^T$ for $l = 1, \ldots, d$; and
$$\hat\Sigma^*(x^*) = \hat{A}\, \hat{D}(x^*)\, \hat{A}^T + \hat\sigma_0^2 (I_k - \hat{A}\hat{A}^T), \quad (9)$$
with $\hat{D}(x^*)$ a diagonal matrix whose lth diagonal term is
$$\hat{D}_l(x^*) = \hat\sigma_l^2 \hat{K}_l(x^*, x^*) + \hat\sigma_0^2 - \hat\Sigma_l^T(x^*) \big(\hat\Sigma_l + \hat\sigma_0^2 I_n\big)^{-1} \hat\Sigma_l(x^*).$$
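A sketch of the predictive mean (8), continuing the shared-covariance simulation with the true parameters plugged in; x_star and matern_cross are illustrative names introduced here, not part of the deck:

```python
def matern_cross(x, x_star, gamma):
    """Cross-correlations K(x_i, x_star_j) for the Matern-2.5 kernel."""
    dist = np.abs(x[:, None] - x_star[None, :])
    s = np.sqrt(5.0) * dist / gamma
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

x_star = np.linspace(1.0, float(n), 500)
Sig_star = sigma2 * matern_cross(x, x_star, gamma)    # columns are Sigma_l(x*)

# z_hat_l(x*) = Sigma_l(x*)^T (Sigma_l + sigma_0^2 I_n)^{-1} Y^T a_l, stacked over l
z_hat = Sig_star.T @ np.linalg.solve(K + sigma2_0 * np.eye(n), Y.T @ A_gppca)
mu_star = A_gppca @ z_hat.T                           # Eq. (8): k x n_star predictive mean
```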
21. Generalized probabilistic principal component analysis (GPPCA)
Illustrative example
Example 1.
The data are sampled from the latent factor model (1) with the shared covariance matrix $\Sigma_1 = \Sigma_2 = \Sigma$, where x is equally spaced from 1 to n and the kernel function follows (14) with $\gamma = 100$ and $\sigma^2 = 1$. We choose k = 2, d = 1 and n = 100. Two scenarios are implemented, with $\sigma_0^2 = 0.01$ and $\sigma_0^2 = 1$, respectively. The parameters $(\sigma_0^2, \sigma^2, \gamma)$ are assumed to be unknown and estimated from the data.
22. Generalized probabilistic principal component analysis (GPPCA)
Figure 5: Estimation of the factor loading matrix by the PCA and GPPCA for Example 1, with noise variance $\sigma_0^2 = 0.01$ and $\sigma_0^2 = 1$ in the upper and lower panels, respectively. The circles and dots are the first and second rows of Y in the left panels, and of $\tilde{Y} = YL$ in the middle panels, where $L = U D^{1/2}$ with U the eigenvectors and the diagonal of D the eigenvalues of $(\hat\sigma_0^2 \hat\Sigma^{-1} + I_n)^{-1}$. In the right panels, the black, red and blue lines are the subspace of A, the first eigenvector $U_0$, and the first eigenvector of $Y(\hat\sigma_0^2 \hat\Sigma^{-1} + I_n)^{-1} Y^T$, respectively, with the black triangles being the outputs.
23. Generalized probabilistic principal component analysis (GPPCA)
Estimation of the mean
Figure 6: Estimation of AZ for Example 1, with noise variance $\sigma_0^2 = 0.01$ and $\sigma_0^2 = 1$ in the upper and lower panels, respectively. The first and second rows of Y are graphed as the black curves in the left and right panels, respectively. The red dotted curves and the blue dashed curves are the predictions by the PCA and GPPCA, respectively. The grey region is the 95% posterior credible interval from GPPCA.
25. GPPCA with a mean structure
Latent factor model with covariates
Consider the latent factor model with a mean structure for the k-dimensional output vector at the input x,
$$y(x) = (h(x) B)^T + A z(x) + \epsilon, \quad (10)$$
where $h(x)$ is a $1 \times q$ vector of known mean basis functions of the input x and possibly other covariates, and $B = (\beta_1, \ldots, \beta_k)$ is a $q \times k$ matrix of mean (or trend) parameters. Let H be the $n \times q$ matrix whose ith row is $h(x_i)$, and denote $M = I_n - H (H^T H)^{-1} H^T$. We have the following lemma for the marginal likelihood estimator of the variance.
Lemma 2.
Consider an objective prior $\pi(B) \propto 1$. Under Assumption 1, after marginalizing out B and Z, the maximum likelihood estimator of $\sigma_0^2$ is $\hat\sigma_0^2 = S_M^2/(k(n-q))$, where
$$S_M^2 = \mathrm{tr}(Y M Y^T) - \sum_{l=1}^{d} a_l^T Y M (M + \tau_l^{-1} K_l^{-1})^{-1} M Y^T a_l.$$
Moreover, the marginal density of the data satisfies
$$p(Y \mid A, \tau, \gamma, \hat\sigma_0^2) \propto \bigg[\prod_{l=1}^{d} |\tau_l K_l + I_n|^{-1/2} \, \big|H^T (\tau_l K_l + I_n)^{-1} H\big|^{-1/2}\bigg] (S_M^2)^{-k(n-q)/2}.$$
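A sketch of the Lemma 2 quantities, continuing the earlier simulation with a hypothetical intercept-plus-trend basis h(x) = (1, x). The simulated data above have zero mean, so this is purely illustrative, and the true parameters are plugged in:

```python
H = np.column_stack([np.ones(n), x])               # n x q design matrix, q = 2
M = np.eye(n) - H @ np.linalg.solve(H.T @ H, H.T)  # M = I_n - H (H^T H)^{-1} H^T

tau = sigma2 / sigma2_0                            # SNR tau_l, shared across factors here
Kmat = matern_2_5(x, gamma)                        # correlation matrix K_l
inner = M @ np.linalg.solve(M + np.linalg.inv(Kmat) / tau, M)
S2_M = np.trace(Y @ M @ Y.T) - sum(
    A_gppca[:, l] @ Y @ inner @ Y.T @ A_gppca[:, l] for l in range(d))
sigma2_0_hat = S2_M / (k * (n - H.shape[1]))       # MLE of the noise variance
```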
26. GPPCA with a mean structure
GPPCA with the mean structure
Since there is no closed-form expression for the kernel parameters $(\tau, \gamma)$, one can numerically maximize the marginal likelihood to estimate A and the other parameters:
$$\hat{A} = \mathrm{argmax}_A \sum_{l=1}^{d} a_l^T G_{l,M}\, a_l, \quad \text{s.t.} \ A^T A = I_d, \quad (11)$$
$$(\hat\tau, \hat\gamma) = \mathrm{argmax}_{(\tau,\gamma)}\, p(Y \mid \hat{A}, \tau, \gamma). \quad (12)$$
When $\Sigma_1 = \ldots = \Sigma_d$, a closed-form expression for $\hat{A}$ can be obtained as in Theorem 1. In general, we can use the approach of [Wen and Yin, 2013] to solve the constrained optimization problem in (11). After obtaining $\hat\tau$ and $\hat\sigma_0^2$, we transform them to get $\hat\sigma_l^2 = \hat\tau_l \hat\sigma_0^2$ for $l = 1, \ldots, d$.
27. GPPCA with a mean structure
Theorem 3 (Predictive distribution).
Under Assumption 1, after marginalizing out Z and B with the objective prior $\pi(B) \propto 1$, the predictive distribution of model (10) at any $x^*$ is
$$Y(x^*) \mid Y, \hat{A}, \hat\gamma, \hat\sigma^2, \hat\sigma_0^2 \sim \mathcal{MN}\big(\hat\mu_M^*(x^*),\, \hat\Sigma_M^*(x^*)\big).$$
Here
$$\hat\mu_M^*(x^*) = \big(h(x^*) \hat{B}\big)^T + \hat{A}\, \hat{z}_M(x^*),$$
where $\hat{B} = (H^T H)^{-1} H^T (Y - \hat{A} \hat{Z}_M)^T$, $\hat{Z}_M = (\hat{Z}_{1,M}^T, \ldots, \hat{Z}_{d,M}^T)^T$ with $\hat{Z}_{l,M} = \hat{a}_l^T Y M (\hat\Sigma_l M + \hat\sigma_0^2 I_n)^{-1} \hat\Sigma_l$, and $\hat{z}_M(x^*) = (\hat{z}_{1,M}(x^*), \ldots, \hat{z}_{d,M}(x^*))^T$ with $\hat{z}_{l,M}(x^*) = \hat\Sigma_l^T(x^*) (\hat\Sigma_l M + \hat\sigma_0^2 I_n)^{-1} M Y^T \hat{a}_l$, for $l = 1, \ldots, d$. Moreover,
$$\hat\Sigma_M^*(x^*) = \hat{A}\, \hat{D}_M(x^*)\, \hat{A}^T + \hat\sigma_0^2 \big(1 + h(x^*) (H^T H)^{-1} h^T(x^*)\big) (I_k - \hat{A}\hat{A}^T),$$
where $\hat{D}_M(x^*)$ is a diagonal matrix with the lth diagonal term
$$\hat{D}_{l,M}(x^*) = \hat\sigma_l^2 \hat{K}_l(x^*, x^*) + \hat\sigma_0^2 - \hat\Sigma_l^T(x^*) \tilde\Sigma_l^{-1} \hat\Sigma_l(x^*) + \big(h^T(x^*) - H^T \tilde\Sigma_l^{-1} \hat\Sigma_l(x^*)\big)^T \big(H^T \tilde\Sigma_l^{-1} H\big)^{-1} \big(h^T(x^*) - H^T \tilde\Sigma_l^{-1} \hat\Sigma_l(x^*)\big),$$
with $\tilde\Sigma_l = \hat\Sigma_l + \hat\sigma_0^2 I_n$ for $l = 1, \ldots, d$.
31. Simulated examples: Correctly specified models
Evaluation criteria
(Largest principal angle.) Let $0 \leq \phi_1 \leq \ldots \leq \phi_d \leq \pi/2$ be the principal angles between $\mathcal{M}(A)$ and $\mathcal{M}(\hat{A})$, recursively defined by
$$\phi_i = \arccos \max_{a \in \mathcal{M}(A),\, \hat{a} \in \mathcal{M}(\hat{A})} |a^T \hat{a}| = \arccos(|a_i^T \hat{a}_i|),$$
subject to
$$\|a\| = \|\hat{a}\| = 1, \quad a^T a_i = 0, \quad \hat{a}^T \hat{a}_i = 0, \quad i = 1, \ldots, d-1,$$
where $\|\cdot\|$ denotes the $L_2$ norm. The largest principal angle is $\phi_d$. When the columns of A and $\hat{A}$ are orthogonal bases of $\mathcal{M}(A)$ and $\mathcal{M}(\hat{A})$, $\cos(\phi_d)$ equals the smallest singular value of $A^T \hat{A}$ [Björck and Golub, 1973, Absil et al., 2006].
(Average mean squared error (AvgMSE)) of the output over N experiments:
$$\mathrm{AvgMSE} = \frac{\sum_{l=1}^{N} \sum_{j=1}^{k} \sum_{i=1}^{n} \big(\hat{Y}_{j,i}^{(l)} - \mathrm{E}[Y_{j,i}^{(l)}]\big)^2}{knN}, \quad (13)$$
where $\mathrm{E}[Y_{j,i}^{(l)}]$ is the $(j,i)$ term of the mean of the output matrix in the lth experiment, and $\hat{Y}_{j,i}^{(l)}$ is its estimate.
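The largest principal angle can be computed from the smallest singular value of the product of orthonormal bases, per [Björck and Golub, 1973]. A small helper continuing the earlier sketch:

```python
def largest_principal_angle(A1, A2):
    """Largest principal angle between the column spaces of A1 and A2."""
    Q1, _ = np.linalg.qr(A1)                      # orthonormal bases of the subspaces
    Q2, _ = np.linalg.qr(A2)
    s_min = np.linalg.svd(Q1.T @ Q2, compute_uv=False).min()
    return np.arccos(np.clip(s_min, -1.0, 1.0))   # cos(phi_d) = smallest singular value

print(largest_principal_angle(A, A_pca), largest_principal_angle(A, A_gppca))
```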
32. Simulated examples: Correctly specified models
Approaches
In the GPPCA, we let the covariance function of the lth factor be a product kernel, $\sigma_l^2 K_l(x_a, x_b) = \sigma_l^2 \prod_{m=1}^{p} K_{lm}(x_{am}, x_{bm})$, for demonstration purposes, where $K_{lm}(\cdot,\cdot)$ is the Matérn kernel with roughness parameter 2.5,
$$K_{lm}(x_{am}, x_{bm}) = \Big(1 + \frac{\sqrt{5}\,d}{\gamma_{lm}} + \frac{5 d^2}{3 \gamma_{lm}^2}\Big) \exp\Big(-\frac{\sqrt{5}\,d}{\gamma_{lm}}\Big), \quad (14)$$
with $d = |x_{am} - x_{bm}|$ and unknown range parameters $\gamma_l = (\gamma_{l1}, \ldots, \gamma_{lp})$. The MMLE is used to estimate the factor loading matrix and the parameters, and the predictive mean of the data is used for prediction.
In the PCA, $\hat{A}_{pca} = U_0$, where $U_0$ holds the first d eigenvectors of $YY^T/n$.
In [Lam et al., 2011, Lam and Yao, 2012], A is estimated by the first d eigenvectors of $\sum_{q=1}^{q_0} \hat\Sigma_y(q) \hat\Sigma_y^T(q)$, with $q_0 = 1$ (LY1) and $q_0 = 5$ (LY5), where $\hat\Sigma_y(q)$ is the sample covariance of the output at lag q.
Independent GPs and parallel partial GPs are also included for the last simulated example.
33. Simulated examples: Correctly specified models
Example 2 (Factors with the same covariance matrix).
The data are sampled from model (1) with $\Sigma_1 = \ldots = \Sigma_d = \Sigma$, where $x_i = i$ for $1 \leq i \leq n$, and the kernel function in (14) is used with $\gamma = 100$ and $\sigma^2 = 1$. In each scenario, we simulate the data from 16 different combinations of $\sigma_0^2$, k, d and n. We repeat N = 100 times for each scenario. The parameters $(\sigma_0^2, \sigma^2, \gamma)$ are treated as unknown and estimated from the data.
34. Simulated examples: Correctly specified models
Figure 7: The largest principal angle for Example 2, shown as boxplots for the PCA, GPPCA, LY1 and LY5. The panels correspond to (k = 8, d = 4), (k = 40, d = 4), (k = 16, d = 8) and (k = 80, d = 8), with τ = 100 in the first row and τ = 4 in the second row. n = 200 and n = 400 for the left four boxplots and right four boxplots in the first row, respectively; n = 500 and n = 1000 in the second row.
37. Simulated examples: Correctly specified models
Example 3 (Factors with different covariance matrices).
The data are sampled from model (1) where $x_i = i$ for $1 \leq i \leq n$. The noise variance is $\sigma_0^2 = 0.25$ and the kernel function follows (14) with $\sigma^2 = 1$. The range parameter $\gamma$ of each factor is uniformly sampled from $[10, 10^3]$ in each experiment. In each scenario, we simulate the data from 8 different combinations of k, d and n. We repeat N = 100 times for each scenario. The parameters in the kernels and the noise variance are treated as unknown and estimated from the data.
38. Simulated examples: Correctly specified models
Largest principal angles for Example 3
Figure 9: The largest principal angle between the true subspace and the estimated subspace for the four approaches (PCA, GPPCA, LY1, LY5) in Example 3, with panels for (k = 8, d = 4), (k = 40, d = 4), (k = 16, d = 8) and (k = 80, d = 8). The number of observations of each output variable is n = 200 and n = 400 for the left 4 boxplots and right 4 boxplots in the 2 left panels, respectively, and n = 500 and n = 1000 for the left 4 boxplots and right 4 boxplots in the 2 right panels, respectively.
41. Simulated examples: Misspecified models
Example 4 (Unconstrained factor loadings and misspecified kernel functions).
The data are sampled from model (1) with $\Sigma_1 = \ldots = \Sigma_d = \Sigma$ and $x_i = i$ for $1 \leq i \leq n$. Each entry of the factor loading matrix is uniformly sampled from [0, 1] independently (without the orthogonality constraint in (3)). The exponential kernel and the Gaussian kernel are used to generate the data, with different combinations of $\sigma_0^2$ and n, while in the GPPCA we still use the Matérn kernel in (14) for the estimation. We set k = 20, d = 4, $\gamma = 100$ and $\sigma^2 = 1$ in sampling the data. We repeat N = 100 times for each scenario. All the kernel parameters and the noise variance are treated as unknown and estimated from the data.
42. Simulated examples: Misspecified models
Largest principal angle for Example 4
Figure 10: The largest principal angle between the estimated subspace of the four approaches (PCA, GPPCA, LY1, LY5) and the true subspace for Example 4, with k = 20, d = 4 and τ = 0.25. The number of observations is n = 100, n = 200 and n = 400 for the left 4, middle 4 and right 4 boxplots in both panels, respectively. The data are simulated with the exponential kernel in the left panel and with the Gaussian kernel in the right panel.
44. Simulated examples: Misspecified models
Example 5 (Unconstrained factor loadings and deterministic factors).
The data are sampled from model (1) with each latent factor being a deterministic function,
$$z_l(x_i) = \cos(0.05 \pi \theta_l x_i),$$
where $\theta_l \overset{i.i.d.}{\sim} \mathrm{unif}(0, 1)$ for $l = 1, \ldots, d$, with $x_i = i$ for $1 \leq i \leq n$, $\sigma_0^2 = 0.25$, k = 20 and d = 4. Four scenarios are considered, with the sample sizes n = 100, n = 200, n = 400 and n = 800.
45. Simulated examples: Misspecified models
Largest principal angle for Example 5
Figure 11: The largest principal angle between the estimated subspace of the loading matrix and the true subspace for Example 5 (deterministic factors, k = 20, d = 4 and τ = 4). From left to right, the number of observations is n = 100, n = 200, n = 400 and n = 800 for each group of 4 boxplots (PCA, GPPCA, LY1, LY5), respectively.
46. Simulated examples: Misspecified models
AvgMSE for Example 5

         n = 100      n = 200      n = 400      n = 800
PCA      7.0 × 10⁻²   6.0 × 10⁻²   5.4 × 10⁻²   5.2 × 10⁻²
GPPCA    1.4 × 10⁻²   9.2 × 10⁻³   6.7 × 10⁻³   5.5 × 10⁻³
LY1      9.8 × 10⁻¹   7.6 × 10⁻¹   6.3 × 10⁻²   5.7 × 10⁻²
LY5      9.3 × 10⁻²   7.3 × 10⁻²   6.2 × 10⁻²   5.6 × 10⁻²
Ind GP   2.0 × 10⁻²   1.9 × 10⁻²   1.7 × 10⁻²   1.7 × 10⁻²
PP GP    2.0 × 10⁻²   1.9 × 10⁻²   1.8 × 10⁻²   1.8 × 10⁻²

Table 4: AvgMSE for Example 5.

The Ind GP approach treats each output variable independently, and the mean of the output is estimated by the predictive mean in Gaussian process regression.
The PP GP approach also models each output variable independently by a Gaussian process, whereas the covariance function is shared across the k independent Gaussian processes and estimated based on all the data.
49. Real examples: Humanity computer model with multiple outputs
We first consider a testbed called the 'diplomatic and military operations in a non-warfighting domain' (DIAMOND) simulator, which models the number of casualties during the second to sixth day after an earthquake and volcanic eruption in Giarre and Catania. The input variables are 13-dimensional, including the helicopter cruise speed, the engineer ground speed, and the hospital, shelter and food supply capacities in the two places.
We use the same n = 120 training and $n^* = 120$ testing outputs as [Overstall and Woods, 2016] to compare different approaches. The criteria for out-of-sample prediction are
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{j=1}^{k} \sum_{i=1}^{n^*} \big(\hat{Y}_j^*(x_i^*) - Y_j^*(x_i^*)\big)^2}{k n^*}},$$
$$P_{CI}(95\%) = \frac{1}{k n^*} \sum_{j=1}^{k} \sum_{i=1}^{n^*} 1\{Y_j^*(x_i^*) \in CI_{ij}(95\%)\},$$
$$L_{CI}(95\%) = \frac{1}{k n^*} \sum_{j=1}^{k} \sum_{i=1}^{n^*} \mathrm{length}\{CI_{ij}(95\%)\}.$$
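A generic sketch of these three criteria, assuming the held-out outputs, predictive means, and 95% interval bounds are available as k × n* arrays; the function and argument names are illustrative, not part of the deck:

```python
import numpy as np

def prediction_criteria(Y_test, mean_pred, lower95, upper95):
    """RMSE, empirical 95% coverage P_CI, and average 95% interval length L_CI."""
    rmse = np.sqrt(np.mean((mean_pred - Y_test) ** 2))
    p_ci = np.mean((Y_test >= lower95) & (Y_test <= upper95))
    l_ci = np.mean(upper95 - lower95)
    return rmse, p_ci, l_ci
```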
50. Real examples: Humanity computer model with multiple outputs

Method    Mean function        Kernel           RMSE        PCI (95%)  LCI (95%)
GPPCA     Intercept            Gaussian kernel  3.33 × 10²  0.948      1.52 × 10³
GPPCA     Selected covariates  Gaussian kernel  3.18 × 10²  0.957      1.31 × 10³
GPPCA     Intercept            Matérn kernel    2.82 × 10²  0.962      1.22 × 10³
GPPCA     Selected covariates  Matérn kernel    2.74 × 10²  0.957      1.18 × 10³
Ind GP    Intercept            Gaussian kernel  3.64 × 10²  0.918      1.18 × 10³
Ind GP    Selected covariates  Gaussian kernel  4.04 × 10²  0.918      1.17 × 10³
Ind GP    Intercept            Matérn kernel    3.40 × 10²  0.930      0.984 × 10³
Ind GP    Selected covariates  Matérn kernel    3.31 × 10²  0.927      0.967 × 10³
Multi GP  Intercept            Gaussian kernel  3.63 × 10²  0.975      1.67 × 10³
Multi GP  Selected covariates  Gaussian kernel  3.34 × 10²  0.963      1.54 × 10³
Multi GP  Intercept            Matérn kernel    3.01 × 10²  0.962      1.34 × 10³
Multi GP  Selected covariates  Matérn kernel    3.05 × 10²  0.970      1.50 × 10³

Table 5: The GPPCA and Ind GP with the same mean structure and kernels are given in the first 8 rows. The 9th and 10th rows show the emulation results of the two best models in [Overstall and Woods, 2016] using the Gaussian kernel on the same held-out testing output, whereas the last two rows give the results of the same models with the Matérn kernel in (14). For reference, the RMSE is 1.08 × 10⁵ when using the mean of the training output as the prediction.
51. Real examples: Humanity computer model with multiple outputs
Estimated covariance and prediction
Figure 12: The estimated covariance of the casualties by the GPPCA across the different days after the catastrophe is graphed in the left panel. The held-out testing output and the predictions by the GPPCA and the independent GPs, with mean basis $h(x) = (1, x_{11})$ and the Matérn kernel, for the fifth and sixth days, are graphed in the right panel.
53. Real examples: Global gridded temperature anomalies
NOAA global gridded temperature anomalies
The dataset, from the U.S. National Oceanic and Atmospheric Administration (NOAA), contains the global gridded monthly anomalies of the combined air and marine temperature from Jan 1880 to near present, at a 5° × 5° latitude-longitude spatial resolution. The recorded variance of the measurement error is around 0.1.
We compare different approaches on interpolation. We use the monthly temperature anomalies at 1,639 spatial grid boxes over the past 20 years, and hold out 24,000 randomly sampled measurements on 1,200 spatial grid boxes in 20 months as the test data set.
For the GPPCA, the mean basis function is h(x) = (1, x), where x is an integer from 1 to 240 indexing the month. We also assume the covariance is the same for all factor processes.
We also compare with the PPCA, and with spatial smoothing and temporal smoothing approaches. For the temporal smoothing approach, we also assume h(x) = (1, x).
Random forest regression is included as well, making an independence assumption either across space or across time.
54. Real examples: Global gridded temperature anomalies

Method                     Measurement error  RMSE   PCI (95%)  LCI (95%)
GPPCA, d = 50              estimated          0.392  0.877      1.03
GPPCA, d = 100             estimated          0.330  0.774      0.564
GPPCA, d = 50              fixed              0.392  0.938      1.34
GPPCA, d = 100             fixed              0.335  0.976      1.44
PPCA, d = 50               estimated          0.644  0.674      1.09
PPCA, d = 100              estimated          0.644  0.520      1.40
PPCA, d = 50               fixed              0.641  0.760      1.33
PPCA, d = 100              fixed              0.622  0.801      1.40
Temporal smoothing by GP   estimated          1.02   0.940      2.36
Spatial smoothing by GP    estimated          0.623  0.917      1.95
Temporal regression by RF  estimated          0.497  /          /
Spatial regression by RF   estimated          0.444  /          /

Table 6: Out-of-sample prediction of the temperature anomalies by different approaches. The predictive performance of the GPPCA and PPCA is given in the first four and the following four rows, respectively. The performance of the temporal smoothing and spatial smoothing methods is given in the 9th and 10th rows. The last two rows give the predictive RMSE of regression using the random forest (RF) algorithm.
55. Real examples: Global gridded temperature anomalies
Comparison between the GPPCA and spatial smoothing
Figure 13: The interpolated and observed temperature anomalies in April 2013 (°C). The observed temperature anomalies are graphed in the middle panel; the interpolated anomalies by the GPPCA and by the spatial smoothing method are graphed in the left and right panels, respectively. The numbers of training and test observations are 439 and 1,200, respectively. The out-of-sample RMSEs of the GPPCA and the spatial smoothing method are 0.335 and 0.779, respectively.
56. Real examples: Global gridded temperature anomalies
Estimated intercept and trend by the GPPCA
Figure 14: Estimated intercept and monthly change rate of the temperature anomalies (°C) by the GPPCA, using the monthly temperature anomalies between January 1999 and December 2018.
A spatial orthonormal basis for A could also be used; the GPPCA is more general, as it does not require a distance between the functions.
The GPPCA can be extended to the irregular-missingness case by the EM algorithm, or by an MCMC algorithm if one can specify the full posteriors.
58. Future directions
Future directions
A full Bayesian approach for the factor loading matrix and the parameters (based on the computationally feasible marginal likelihood).
Estimating the number of factors.
Convergence rate of the GPPCA.
Extensions when the observations do not form a matrix.
Optimization algorithms on the Stiefel manifold.
Other orthonormal bases for the factor loading matrix.
Other ways to model the factor processes.
Heteroscedastic noise.
59. Future directions
Reference
Gu, M. and Shen, W. (2018) Generalized probabilistic principal component
analysis (GPPCA) for correlated data. arXiv:1808.10868.
61. Future directions
Related literature: frequentist approaches
The MLE of the factor loading matrix A under Assumption 1 (without marginalizing out Z) is $U_0 R$, where $U_0$ is the matrix of the first d ordered eigenvectors of $YY^T$ and R is an orthogonal rotation matrix (the same subspace as the PCA).
PCA is widely used in factor models, particularly in modeling multiple time series. E.g., Bai and Ng [2002] and Bai [2003] assume $A^T A = k I_d$ and estimate A by $\sqrt{k}\, U_0$ in modeling high-dimensional time series.
PCA is also widely used to estimate the basis in the linear model of coregionalization [Higdon et al., 2008, Paulo et al., 2012].
In [Tipping and Bishop, 1999], the linear subspace given by the PCA is the MMLE of the factor model with independent factors.
[Lam et al., 2011, Lam and Yao, 2012] estimate the factor loading matrix of model (1) by $\hat{A}_{LY}$, the first d eigenvectors of $\sum_{q=1}^{q_0} \hat\Sigma_y(q) \hat\Sigma_y^T(q)$, where $\hat\Sigma_y(q)$ is the $k \times k$ sample covariance at lag q of the output and $q_0$ is a fixed positive integer.
Kernel PCA was introduced in machine learning; it maps the output onto a feature space via kernels [Schölkopf et al., 1998, Mika et al., 1999, Hoffmann, 2007].
62. Future directions
Related literature: Bayesian approaches
[West, 2003] points out the connection between PCA and a class of generalized singular g-priors, and introduces a spike-and-slab prior that induces sparse factors in the latent factor model, assuming the factors are independently distributed.
Another sparsity-inducing prior is introduced by [Bhattacharya and Dunson, 2011] under the independence assumption on the factors, and its asymptotic behavior is also discussed.
[Nakajima and West, 2013, Zhou et al., 2014] introduce methods to directly threshold the time-varying factor loading matrix in Bayesian dynamic linear models.
When modeling spatially correlated data, priors have also been discussed for the spatially varying factor loading matrices in the LMC [Gelfand et al., 2004, Banerjee et al., 2014].
[Higdon et al., 2008, Paulo et al., 2012, Fricker et al., 2013] use the LMC for emulating computer models with multiple outputs, estimate the factor loading matrix, and rely on MCMC algorithms for the inference.
63. References
P.-A. Absil, Alan Edelman, and Plamen Koev. On the largest principal angle between random subspaces. Linear Algebra and its Applications, 414(1):288–294, 2006.
Mauricio A. Alvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3):195–266, 2012.
Jushan Bai. Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171, 2003.
Jushan Bai and Serena Ng. Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221, 2002.
Sudipto Banerjee, Bradley P. Carlin, and Alan E. Gelfand. Hierarchical Modeling and Analysis for Spatial Data. CRC Press, 2014.
Anirban Bhattacharya and David B. Dunson. Sparse Bayesian infinite factor models. Biometrika, pages 291–306, 2011.
Åke Björck and Gene H. Golub. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27(123):579–594, 1973.
Thomas E. Fricker, Jeremy E. Oakley, and Nathan M. Urban. Multivariate Gaussian process emulators with nonseparable covariance structures. Technometrics, 55(1):47–56, 2013.
Alan E. Gelfand, Alexandra M. Schmidt, Sudipto Banerjee, and C.F. Sirmans. Nonstationary multivariate process modeling through spatially varying coregionalization. Test, 13(2):263–312, 2004.
Mengyang Gu. FastGaSP: Fast and Exact Computation of Gaussian Stochastic Process, 2019. URL https://CRAN.R-project.org/package=FastGaSP. R package version 0.5.1.
Mengyang Gu and Kyle Anderson. Calibration of imperfect mathematical models by multiple sources of data with measurement bias. arXiv preprint arXiv:1810.11664, 2018.
Mengyang Gu and James O. Berger. Parallel partial Gaussian process emulation for computer models with massive output. Annals of Applied Statistics, 10(3):1317–1347, 2016.
Mengyang Gu and Yanxun Xu. Nonseparable Gaussian stochastic process: A unified view and computational strategy. arXiv preprint arXiv:1711.11501, 2017.
Jouni Hartikainen and Simo Sarkka. Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In Machine Learning for Signal Processing (MLSP), 2010 IEEE International Workshop on, pages 379–384. IEEE, 2010.
Dave Higdon, James Gattiker, Brian Williams, and Maria Rightley. Computer model calibration using high-dimensional output. Journal of the American Statistical Association, 103(482):570–583, 2008.
Heiko Hoffmann. Kernel PCA for novelty detection. Pattern Recognition, 40(3):863–874, 2007.
Clifford Lam and Qiwei Yao. Factor modeling for high-dimensional time series: inference for the number of factors. The Annals of Statistics, 40(2):694–726, 2012.
Clifford Lam, Qiwei Yao, and Neil Bathia. Estimation of latent factors for high-dimensional time series. Biometrika, 98(4):901–918, 2011.
Sebastian Mika, Bernhard Schölkopf, Alex J. Smola, Klaus-Robert Müller, Matthias Scholz, and Gunnar Rätsch. Kernel PCA and de-noising in feature spaces. In Advances in Neural Information Processing Systems, pages 536–542, 1999.
Jouchi Nakajima and Mike West. Bayesian analysis of latent threshold dynamic models. Journal of Business & Economic Statistics, 31(2):151–164, 2013.
Antony M. Overstall and David C. Woods. Multivariate emulation of computer simulators: model selection and diagnostics with application to a humanitarian relief model. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65(4):483–505, 2016.
Rui Paulo, Gonzalo García-Donato, and Jesús Palomo. Calibration of computer models with multivariate output. Computational Statistics and Data Analysis, 56(12):3959–3974, 2012.
Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
Matthias Seeger, Yee-Whye Teh, and Michael Jordan. Semiparametric latent factor models. Technical report, 2005.
Michael E. Tipping and Christopher M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611–622, 1999.
Zaiwen Wen and Wotao Yin. A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142(1-2):397–434, 2013.
Mike West. Bayesian factor regression models in the "large p, small n" paradigm. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West, editors, Bayesian Statistics 7, pages 723–732. Oxford University Press, 2003. URL http://ftp.isds.duke.edu/WorkingPapers/02-12.html.
Peter Whittle. On stationary processes in the plane. Biometrika, pages 434–449, 1954.
Xiaocong Zhou, Jouchi Nakajima, and Mike West. Bayesian forecasting and portfolio decisions using dynamic dependent sparse factor models. International Journal of Forecasting, 30(4):963–980, 2014.