QMC: Operator Splitting Workshop, Estimation of Inverse Covariance Matrix in Compositional Data - Aditya Mishra, Mar 21, 2018
1. Estimation of Inverse Covariance Matrix in
Compositional Data
Aditya Mishra
Flatiron Institute, Simons Foundation
Operator Splitting Methods in Data Analysis, SAMSI
Raleigh, NC
March 22, 2018
Aditya Mishra Flatiron Institute, Simons Foundation (Operator Splitting Methods in Data Analysis, SAMSI RPrecision Matrix Estimation March 22, 2018 1 / 16
2. Motivation: Human Microbiome Project
3. Microbial Ecology and Human
4. Generation: Compositional Data
5. Generation: Compositional Data
8. Compositional Data of OTU
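OTUs are given by index sets: $g_i$ = {index set of the $i$th OTU}, with cardinality $p_i$.
The absolute abundances of the components are unknown:
$$
W = \begin{bmatrix}
w_{1g_1} & w_{1g_2} & w_{1g_3} & \cdots & w_{1g_k} \\
w_{2g_1} & w_{2g_2} & w_{2g_3} & \cdots & w_{2g_k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
w_{ng_1} & w_{ng_2} & w_{ng_3} & \cdots & w_{ng_k}
\end{bmatrix}_{n \times p}, \qquad p = p_1 + \dots + p_k,
$$
where each operational taxonomic unit (OTU) block is
$w_{ig_j} = [w_{ig_j}(1), w_{ig_j}(2), \dots, w_{ig_j}(p_j)]$.
Let $W$ be the observations of the random variable $w = [w_{g_1}, \dots, w_{g_k}]$.
Define $y_{ig_j} = \log w_{ig_j}$ and the matrix $Y = (y_{ig_j})_{ij}$. Corresponding to $Y$, we have the random variable $y = [y_{g_1}, \dots, y_{g_k}]$.
We are interested in finding the inverse of the covariance matrix ($\Sigma_y$) of the random variable $y$ [Aitchison, 1982].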
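The setup on this slide can be sketched numerically. The block below is only an illustration under stated assumptions: the group sizes, the seed, the log-normal model for the counts, and the names `group_sizes`, `groups`, `Sigma_y_hat`, and `Omega_hat` are all hypothetical and not from the talk. It also presumes that the absolute abundances W are observed, which is exactly what the slide says is unavailable in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical group sizes p_1, ..., p_k (k = 3 OTU groups), so p = 9.
group_sizes = [2, 3, 4]
p = sum(group_sizes)
n = 200  # number of samples; n > p so the sample covariance is invertible

# Index sets g_j: the columns of W that belong to group j.
starts = np.cumsum([0] + group_sizes[:-1])
groups = [np.arange(s, s + pj) for s, pj in zip(starts, group_sizes)]

# Synthetic absolute abundances: W = exp(Gaussian), an assumption of this sketch.
A = rng.normal(size=(p, p))
Sigma_y = A @ A.T / p + np.eye(p)            # a positive-definite covariance
W = np.exp(rng.multivariate_normal(np.zeros(p), Sigma_y, size=n))

Y = np.log(W)                                # y_{ig_j} = log w_{ig_j}
Sigma_y_hat = np.cov(Y, rowvar=False)        # p x p sample covariance of y
Omega_hat = np.linalg.inv(Sigma_y_hat)       # plug-in estimate of the inverse
```

Direct inversion like this is only a baseline; it needs n > p and gives no sparsity.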
9. Desirable Properties of CDA Methods [Aitchison, 1982]
Scale invariance
Permutation invariance
Subcompositional coherence: Same results in a subcomposition,
regardless of whether we analyze only that subcomposition or a
larger composition containing other parts.
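The first two properties can be illustrated on the simplest compositional operation, rescaling a vector of positive parts to unit sum. This is a minimal sketch, not a CDA method from the talk: the helper `closure` and the example numbers are hypothetical.

```python
import numpy as np

def closure(w):
    """Rescale a positive vector so that its parts sum to one."""
    w = np.asarray(w, dtype=float)
    return w / w.sum()

w = np.array([10.0, 30.0, 60.0])      # hypothetical absolute abundances
perm = np.array([2, 0, 1])            # an arbitrary reordering of the parts

# Scale invariance: multiplying all parts by a constant changes nothing.
scale_ok = np.allclose(closure(5.0 * w), closure(w))
# Permutation invariance: reordering before or after gives the same result.
perm_ok = np.allclose(closure(w[perm]), closure(w)[perm])
```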
10. Compositional Data of OTU
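The absolute abundance of each component is unknown:
$$
W = \begin{bmatrix}
w_{1g_1} & w_{1g_2} & w_{1g_3} & \cdots & w_{1g_k} \\
w_{2g_1} & w_{2g_2} & w_{2g_3} & \cdots & w_{2g_k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
w_{ng_1} & w_{ng_2} & w_{ng_3} & \cdots & w_{ng_k}
\end{bmatrix}_{n \times p}, \qquad p = p_1 + \dots + p_k.
$$
Define the sub-composition matrix:
$$
C^T = [c_1 \; c_2 \; c_3 \; \cdots \; c_k]^T =
\begin{bmatrix}
1_{p_1}^T & 0 & \cdots & 0 \\
0 & 1_{p_2}^T & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1_{p_k}^T
\end{bmatrix}_{k \times p} \tag{1}
$$
where $1_{p_j}$ is the all-ones vector of size $p_j$.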
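A minimal sketch of how the matrix in Eq. (1) could be built from the group sizes. The function name `subcomposition_matrix` and the sizes used are hypothetical.

```python
import numpy as np

def subcomposition_matrix(group_sizes):
    """Return the p x k indicator matrix C whose transpose has the form of Eq. (1)."""
    p, k = sum(group_sizes), len(group_sizes)
    C = np.zeros((p, k))
    start = 0
    for j, pj in enumerate(group_sizes):
        C[start:start + pj, j] = 1.0   # column c_j marks the members of g_j
        start += pj
    return C

C = subcomposition_matrix([2, 3, 4])   # hypothetical sizes p_1, p_2, p_3
```

Since the columns have disjoint supports, `C.T @ C` is the diagonal matrix of group sizes, which makes the inverse in any projection built from C trivial to compute.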
13. Compositional Data of OTU
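Based on the count data $W$, define $\bar{x}_{ig_j} = w_{ig_j} / \bar{w}_{ij}$, where $\bar{w}_{ij} = \sum_{m=1}^{p_j} w_{ig_j}(m)$.
Unknown relative abundance data:
$$
\bar{X} = \begin{bmatrix}
\bar{x}_{1g_1} & \bar{x}_{1g_2} & \bar{x}_{1g_3} & \cdots & \bar{x}_{1g_k} \\
\bar{x}_{2g_1} & \bar{x}_{2g_2} & \bar{x}_{2g_3} & \cdots & \bar{x}_{2g_k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\bar{x}_{ng_1} & \bar{x}_{ng_2} & \bar{x}_{ng_3} & \cdots & \bar{x}_{ng_k}
\end{bmatrix}_{n \times p}, \qquad p = p_1 + \dots + p_k,
$$
where each OTU block is $\bar{x}_{ig_j} = [\bar{x}_{ig_j}(1), \bar{x}_{ig_j}(2), \dots, \bar{x}_{ig_j}(p_j)]$.
Corresponding to $\bar{X}$, we have the random variable $\bar{x} = [\bar{x}_{g_1}, \dots, \bar{x}_{g_k}]$
with $\bar{x}_{g_j} = [\bar{x}_{g_j}(1), \dots, \bar{x}_{g_j}(p_j)]$.
Then $C^T \bar{x}^T = 1_k$, since each sub-composition sums to one.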
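The product $C^T \bar{x}^T$ collects the within-group sums of $\bar{x}$, which can be evaluated numerically. The sketch below uses synthetic log-normal counts; the sizes, seed, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
group_sizes = [2, 3, 4]                      # hypothetical p_1, p_2, p_3
p, k, n = sum(group_sizes), len(group_sizes), 5

# C is the p x k group-indicator matrix (C^T is block diagonal in 1^T_{p_j}).
C = np.zeros((p, k))
C[np.arange(p), np.repeat(np.arange(k), group_sizes)] = 1.0

W = rng.lognormal(size=(n, p))               # synthetic positive abundances
W_bar = W @ C                                # n x k matrix of group totals
X_bar = W / (W_bar @ C.T)                    # divide each entry by its group total

group_sums = X_bar @ C                       # row i holds the k within-group sums
```

Every entry of `group_sums` equals one, so the relative abundances carry k linear constraints per sample.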
15. Covariance in Relative Abundance Data
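In terms of the absolute count random variable $w = [w_{g_1}, \dots, w_{g_k}]$, we
can write an element of the relative count random variable $\bar{x}$ as
$\bar{x}_{g_i}(k) = w_{g_i}(k) / \bar{w}_i$, where $\bar{w}_i = \sum_{m=1}^{p_i} w_{g_i}(m)$.
Let $\bar{w} = [\bar{w}_1, \dots, \bar{w}_k]$ (the sums over each subgroup of the random variable $w$).
For any $(i, j, k, l)$, we get:
$$
\begin{aligned}
\mathrm{cov}(\log \bar{x}_{g_i}(k), \log \bar{x}_{g_j}(l)) = {} & \mathrm{cov}(\log w_{g_i}(k), \log w_{g_j}(l)) \\
& - \mathrm{cov}(\log w_{g_i}(k), \log \bar{w}_j) \\
& - \mathrm{cov}(\log w_{g_j}(l), \log \bar{w}_i) \\
& + \mathrm{cov}(\log \bar{w}_i, \log \bar{w}_j)
\end{aligned}
$$
Writing out the covariance matrix of the random variable $\log \bar{x}$, with $\log \bar{w}$ taken elementwise, we get
$$
\mathrm{cov}(\log \bar{x}, \log \bar{x}) = \mathrm{cov}(y, y) - \mathrm{cov}(C \log \bar{w}, y) - [\mathrm{cov}(C \log \bar{w}, y)]^T + \mathrm{cov}(C \log \bar{w}, C \log \bar{w}) \tag{2}
$$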
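The decomposition follows from $\log \bar{x} = y - C \log \bar{w}$ and the bilinearity of covariance, so it also holds exactly for sample covariances. The following sketch checks this on synthetic data; the helper `cross_cov`, the sizes, and the seed are hypothetical.

```python
import numpy as np

def cross_cov(A, B):
    """Sample cross-covariance between the columns of A and the columns of B."""
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    return A0.T @ B0 / (A.shape[0] - 1)

rng = np.random.default_rng(2)
group_sizes = [2, 3, 4]
p, k, n = sum(group_sizes), len(group_sizes), 50
C = np.zeros((p, k))
C[np.arange(p), np.repeat(np.arange(k), group_sizes)] = 1.0

W = rng.lognormal(size=(n, p))   # synthetic absolute abundances
Y = np.log(W)                    # rows are samples of y
U = np.log(W @ C) @ C.T          # rows are samples of C log(w_bar), n x p
log_X_bar = Y - U                # log relative abundances

lhs = cross_cov(log_X_bar, log_X_bar)
rhs = (cross_cov(Y, Y) - cross_cov(U, Y) - cross_cov(U, Y).T
       + cross_cov(U, U))
```

`lhs` and `rhs` agree to floating-point precision, which is a quick way to confirm the signs and transposes in the matrix form.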
17. Observed Relative Abundance Data of OTU
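Based on the observed abundance data available:
$$
\tilde{X} = \begin{bmatrix}
\tilde{x}_{1g_1} & \tilde{x}_{1g_2} & \tilde{x}_{1g_3} & \cdots & \tilde{x}_{1g_k} \\
\tilde{x}_{2g_1} & \tilde{x}_{2g_2} & \tilde{x}_{2g_3} & \cdots & \tilde{x}_{2g_k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\tilde{x}_{ng_1} & \tilde{x}_{ng_2} & \tilde{x}_{ng_3} & \cdots & \tilde{x}_{ng_k}
\end{bmatrix}_{n \times p}, \qquad p = p_1 + \dots + p_k,
$$
where each operational taxonomic unit (OTU) block is
$\tilde{x}_{ig_j} = [\tilde{x}_{ig_j}(1), \tilde{x}_{ig_j}(2), \dots, \tilde{x}_{ig_j}(p_j)]$.
Define $x_{ig_j} = \tilde{x}_{ig_j} / \sum_{m=1}^{p_j} \tilde{x}_{ig_j}(m)$.
Using $x_{ig_j}$, we have the matrix of observed relative abundances
$X = (x_{ig_j})_{ij}$.
Let $X$ be the observations of the random variable $x$.
Using $X$, we can estimate
$$
\mathrm{cov}(\log x, \log x) = \Sigma_x = \mathrm{cov}(\log \bar{x}, \log \bar{x}).
$$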
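A minimal sketch of this estimation step, assuming synthetic Poisson counts. The pseudo-count of one that keeps the logarithm finite is an assumption of this example and is not specified on the slides; the names `X_tilde` and `Sigma_x_hat` are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
group_sizes = [2, 3, 4]                      # hypothetical p_1, p_2, p_3
p, k, n = sum(group_sizes), len(group_sizes), 100
C = np.zeros((p, k))
C[np.arange(p), np.repeat(np.arange(k), group_sizes)] = 1.0

# Synthetic observed counts; adding 1 keeps every entry positive so that the
# log is finite (a pseudo-count is an assumption, not part of the talk).
X_tilde = rng.poisson(lam=20.0, size=(n, p)) + 1.0

X = X_tilde / (X_tilde @ C @ C.T)            # within-group relative abundances
log_X = np.log(X)
Sigma_x_hat = np.cov(log_X, rowvar=False)    # p x p sample estimate of Sigma_x
```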
18. Covariance Estimation in CDA
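From the result in equation (2):
$$
\Sigma_x = \Sigma_y - \mathrm{cov}(C \log \bar{w}, y) - [\mathrm{cov}(C \log \bar{w}, y)]^T + \mathrm{cov}(C \log \bar{w}, C \log \bar{w})
$$
Consider the transformation matrix $F = I - P_c$, where
$P_c = C(C^T C)^{-1} C^T$.
Because $FC = 0$ and every correction term has $C$ as its left factor or $C^T$ as its right factor, applying the transformation matrix on both sides gives
$$
F \Sigma_x F = F \Sigma_y F
$$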
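The projection identity can be checked numerically. In the sketch below, `B` and `D` are arbitrary stand-ins for $\mathrm{cov}(\log \bar{w}, y)$ and $\mathrm{cov}(\log \bar{w}, \log \bar{w})$; they, the sizes, and the seed are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(4)
group_sizes = [2, 3, 4]
p, k = sum(group_sizes), len(group_sizes)
C = np.zeros((p, k))
C[np.arange(p), np.repeat(np.arange(k), group_sizes)] = 1.0

P_c = C @ np.linalg.inv(C.T @ C) @ C.T   # projection onto the column space of C
F = np.eye(p) - P_c                      # subtracts each group's mean coordinate

# Any Sigma_x of the form Sigma_y - C B - (C B)^T + C D C^T is projected to
# the same matrix as Sigma_y; B and D below are arbitrary stand-ins.
A = rng.normal(size=(p, p))
Sigma_y = A @ A.T + np.eye(p)
B = rng.normal(size=(k, p))
D_half = rng.normal(size=(k, k))
D = D_half @ D_half.T
Sigma_x = Sigma_y - C @ B - (C @ B).T + C @ D @ C.T

same_projection = np.allclose(F @ Sigma_x @ F, F @ Sigma_y @ F)
```

Because F is a symmetric projection with `F @ C == 0`, the terms carrying C are annihilated regardless of what `B` and `D` are.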
20. Existing Approach in Unconstrained Setting
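Graphical lasso formulation [Friedman et al., 2008]:
$$
\min_{\Omega} \; -\log\det\Omega + \mathrm{tr}(\Sigma_y \Omega) + \lambda_n \|\Omega\|_1
$$
Consider that $\Sigma_y$ is known. The CLIME estimator [Cai et al., 2011]
for its inverse $\Omega$ is given by
$$
\min \|\Omega\|_1 \quad \text{s.t.} \quad \|\Omega \Sigma_y - I\|_\infty \le \lambda_n
$$
It can also be formulated as:
$$
\min \|\Omega\|_1 \quad \text{s.t.} \quad \|\Omega^{-1} - \Sigma_y\|_\infty \le \lambda_n
$$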
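To make the two criteria concrete, the sketch below only evaluates them for a given candidate $\Omega$; it does not solve either optimization problem. The function names, the 3 x 3 example, and the reading of both norms as elementwise (sum of absolute entries, maximum absolute entry) are assumptions of this example.

```python
import numpy as np

def glasso_objective(Omega, Sigma, lam):
    """-log det(Omega) + tr(Sigma Omega) + lam * ||Omega||_1 (elementwise l1)."""
    sign, logdet = np.linalg.slogdet(Omega)
    if sign <= 0:
        return np.inf                     # determinant not positive: infeasible
    return -logdet + np.trace(Sigma @ Omega) + lam * np.abs(Omega).sum()

def clime_residual(Omega, Sigma):
    """Elementwise max-norm ||Omega Sigma - I||_inf from the CLIME constraint."""
    return np.abs(Omega @ Sigma - np.eye(Sigma.shape[0])).max()

# Hypothetical 3 x 3 example with a sparse (tridiagonal) precision matrix.
Omega_true = np.array([[2.0, -1.0, 0.0],
                       [-1.0, 2.0, -1.0],
                       [0.0, -1.0, 2.0]])
Sigma_y = np.linalg.inv(Omega_true)

obj_true = glasso_objective(Omega_true, Sigma_y, lam=0.1)
obj_identity = glasso_objective(np.eye(3), Sigma_y, lam=0.1)
```

Here the true precision matrix satisfies the CLIME constraint with residual zero and gives a lower graphical lasso objective than the identity, as expected when $\Sigma_y$ is known exactly.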
21. Open Problem
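On relaxing the nearness condition $\|\Omega^{-1} - \Sigma_y\|_\infty \le \lambda_n$ for the
case of compositional data, we have
$$
\|F \Omega^{-1} F - F \Sigma_y F\|_\infty \le \lambda_n.
$$
Given that $F \Sigma_x F = F \Sigma_y F$, can we formulate the estimation of a
sparse precision matrix as
$$
\min \|\Omega\|_1 \quad \text{s.t.} \quad \|F \Omega^{-1} F - F \Sigma_x F\|_\infty \le \lambda_n \, ?
$$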
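The proposed constraint can at least be evaluated for a candidate $\Omega$. The sketch below does only that, on a made-up 4 x 4 example; it does not solve the open problem or say whether the program is well posed. The function `projected_residual`, the matrices, and the elementwise reading of the max-norm are assumptions.

```python
import numpy as np

def projected_residual(Omega, Sigma_x, F):
    """Elementwise max-norm ||F Omega^{-1} F - F Sigma_x F||_inf."""
    return np.abs(F @ np.linalg.inv(Omega) @ F - F @ Sigma_x @ F).max()

group_sizes = [2, 2]                          # hypothetical p_1 = p_2 = 2
p, k = sum(group_sizes), len(group_sizes)
C = np.zeros((p, k))
C[np.arange(p), np.repeat(np.arange(k), group_sizes)] = 1.0
F = np.eye(p) - C @ np.linalg.inv(C.T @ C) @ C.T

Omega_y = np.array([[2.0, -1.0, 0.0, 0.0],
                    [-1.0, 2.0, -1.0, 0.0],
                    [0.0, -1.0, 2.0, -1.0],
                    [0.0, 0.0, -1.0, 2.0]])   # sparse precision of y (made up)
Sigma_y = np.linalg.inv(Omega_y)

# A Sigma_x that differs from Sigma_y only by a term carrying factors of C.
Sigma_x = Sigma_y + C @ np.diag([0.5, 1.5]) @ C.T

res_projected = projected_residual(Omega_y, Sigma_x, F)
res_unprojected = np.abs(np.linalg.inv(Omega_y) - Sigma_x).max()
```

In this example the true precision matrix of y meets the projected constraint with residual near zero, while the unprojected nearness condition against $\Sigma_x$ is off by 1.5.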
22. Reference
John Aitchison. The statistical analysis of compositional data. Journal
of the Royal Statistical Society. Series B (Methodological), pages
139–177, 1982.
Tony Cai, Weidong Liu, and Xi Luo. A constrained $\ell_1$ minimization
approach to sparse precision matrix estimation. Journal of the
American Statistical Association, 106(494):594–607, 2011.
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Sparse
inverse covariance estimation with the graphical lasso. Biostatistics,
9(3):432–441, 2008.
23. Thank You