Manjari Narayan, Postdoctoral Scholar, Stanford University (School of Medicine)
(PI: Amit Etkin, M.D., Ph.D.)
Tutorial presented at the Junior Scientist Workshop at HHMI, Janelia Farms
Sparse Inverse Covariance Estimation
using skggm
skggm: Collaboration with Dr. Jason Laska, ML R&D at Clara Labs.
Explosion of Functional Imaging Tools
fMRI, fNIRS, EEG, MEG
Intracranial EEG, micro-ECoG
Molecular fMRI
Credit: Marie Suver, Ph.D. and Ainul Huda, University of Washington, and Michael H. Dickinson, Ph.D., California Institute of Technology
http://newsroom.cumc.columbia.edu/blog/2014/11/11/researchers-receive-nih-brain-initiative-funding/
Calcium imaging
Credit: Misha Ahrens, Ph.D., Janelia Farms
https://www.simonsfoundation.org/features/foundation-news/how-do-different-brain-regions-interact-to-enhance-function/
Light sheet microscopy
Photo Credit: Tang, 2015, Scientific Reports.
Voltage-sensitive dye imaging
Light field microscopy
Credit: Raju Tomer, Ph.D. & Deisseroth Lab, Stanford University
http://techfinder.stanford.edu/technology_detail.php?ID=36402
Application: Functional Connectomics
The network as a unit of interest: unobserved stochastic dependence/interaction between neurons, circuits, regions, …
Ahrens et al., Nature (2012)
A shared goal across modalities & resolutions (macroscale, mesoscale)
Data matrix: T (or n) observations of p variables
Probabilistic Graphical Models
Many probabilistic models available, both directed and undirected
Graph G = (V, E); Vertices V = {1, . . . , p}, Edges E ⊂ V × V
X = (X_1, . . . , X_p) ∼ P_X
A probabilistic graphical model relates P_X to G:
(j, k) ∉ E ⇔ independence or conditional independence between X_j and X_k
Examples
Directed Acyclic Graphs (DAGs/Bayes nets)
State-space models, including linear/nonlinear VAR
Undirected graphical models (Markov networks)
Bivariate associations (correlation, Granger causality, transfer entropy)
More informative than correlations: a measure of “direct” interactions that eliminates “indirect” interactions due to observed common causes.
Benefits:
Studying cognitive mechanisms
Designing interventional targets
Science-wide efficient use of data
Models for Connectivity: Conditional Dependence & Markov Networks
Conditional dependence (“partial correlations”) vs. marginal dependence (“marginal correlations”)
Introduction to Markov Networks
Markov Properties
• Graph G = (V, E)
• Vertices V = {1, 2, . . . , p} and Edges E ⊂ V × V
• Multivariate Normal x_1, . . . , x_T i.i.d. ∼ N_p(0, Σ)
• Inverse Covariance Σ^{-1} = Θ
Pairwise Markov Property (P): two variables are conditionally independent given all other nodes, e.g. X_5 ⊥ X_1 | X_{V ∖ {1,5}}
Lauritzen (1996)
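These Markov properties can be checked numerically from the precision matrix. Below is a small NumPy sketch (illustrative, not from the slides; the chain-structured example is my choice) computing partial correlations ρ_{jk·rest} = −Θ_{jk} / √(Θ_{jj} Θ_{kk}):

```python
import numpy as np

# Chain-structured precision matrix (tridiagonal): only adjacent
# variables are conditionally dependent. Positive definite since
# the off-diagonal weight 0.4 is small enough.
p = 5
Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Theta)

# Partial correlation given all other variables:
# rho_{jk.rest} = -Theta_jk / sqrt(Theta_jj * Theta_kk)
d = np.sqrt(np.diag(Theta))
partial_corr = -Theta / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

# Non-neighbors such as X_1 and X_5 (indices 0 and 4) have zero
# partial correlation, yet nonzero marginal covariance.
print(partial_corr[0, 4] == 0)   # True
print(Sigma[0, 4] != 0)          # True
```

This is exactly the pairwise Markov property: X_5 ⊥ X_1 given the rest, even though X_5 and X_1 are marginally dependent.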
Local Markov Property (L): a variable is conditionally independent of all others given its neighbors, e.g. X_5 ⊥ X_{V ∖ ne(5)} | X_{ne(5)}
Lauritzen (1996)
Global Markov Property (G): given three disjoint sets A, B and C such that all paths between A and B go through C, then A is conditionally independent of B given C:
X_A ⊥ X_B | X_C, where X_A = {X_a}_{a ∈ A}
Lauritzen (1996)
Benefits of Global Markov Properties
Intersection property: holds for positive densities (e.g. Gaussian) and has been extended to some non-positive densities!
If A ⊥ B | (C, D) and A ⊥ C | (B, D), then A ⊥ (B ∪ C) | D
Factorizes the probability distribution: P(X) = P(X_A | X_C) P(X_B | X_C) P(X_C)
Computational tractability & statistical power to identify all conditional independences
Lauritzen (1996)
Generality of Markov Networks
For many types of pairwise associations there are Markov networks that satisfy the Global Markov Property:
Correlation: zero partial correlation = conditional independence
Coherence or coherency: zero partial coherence = conditional independence
Directed information (including transfer entropy, Sims/Granger prediction, …): dynamic extensions to standard Markov properties, local independence (Didelez 2008)
Pairwise ordering between variables: DAGs, CPDAGs, MAGs, PAGs, …
This is not an exhaustive list!
Generality of Markov Networks
For many probability distributions there are Markov networks that satisfy at least the Local, if not the Global, Markov Property:
Exponential families (binary, Poisson, circular, …): exponential-family MRFs, including binary Ising models and Poisson graphical models (P. Ravikumar, G.I. Allen, and others)
Nonparametric distributions: nonparanormal (copula) graphical models, kernel graphical models (H. Liu, E. Xing, B. Scholkopf, and others)
Separable covariance structure (spatio-temporal): separable Markov networks (G.I. Allen, S. Zhou, A. Hero, P. Hoff, and many others)
From now on: Gaussian Graphical Model
• Graph G = (V, E)
• Vertices V = {1, 2, . . . , p} and Edges E ⊂ V × V
• Multivariate Normal x_1, . . . , x_T i.i.d. ∼ N_p(0, Σ)
• Inverse Covariance Σ^{-1} = Θ
Zero in inverse covariance = conditional independence:
X_k ⊥ X_l | X_{V ∖ {k,l}} ⇔ (Σ^{-1})_{kl} = 0 ⇔ (k, l) ∉ E
Lauritzen (1996)
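To see why a dedicated estimator is needed, the sketch below (NumPy, illustrative; the chain graph and sample size are my choices) shows that while the true precision matrix has exact zeros, the inverse of a finite-sample covariance is dense:

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 5, 200

# True chain-graph precision: 12 of the 25 entries are exactly zero.
Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Theta)

# Draw T i.i.d. samples and form the sample covariance (1/T) X^T X.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=T)
S = X.T @ X / T

# Naive plug-in inverse: no entry is exactly zero, so the graph
# cannot be read off without sparse estimation or thresholding.
Theta_hat = np.linalg.inv(S)
print(np.sum(Theta == 0))                  # 12
print(np.sum(np.abs(Theta_hat) < 1e-8))    # dense: no (near-)exact zeros
```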
Note: this zero-pattern characterization is also important for nonparametric distributions and exponential-family models.
Estimation in High Dimensions
Likelihood for Inverse Covariance
Gaussian log-likelihood: the input to the log-likelihood is effectively the sample covariance.
L(Σ̂; Θ) ≡ log det Θ − ⟨Σ̂, Θ⟩, where Σ̂ = (1/T) XᵀX and the data matrix X ∈ R^{T×p} is centered
“Covariance Selection”, Dempster, 1972; Banerjee et al. (2006); Yuan (2006)
Likelihood for Inverse Covariance
Variance-correlation decomposition: put all variables on the same scale.
R(Σ̂) = D^{-1/2} Σ̂ D^{-1/2}, where D = diag(Σ̂) and Σ̂ = (1/T) XᵀX
L(Σ̂; Θ) ≡ log det Θ − ⟨R(Σ̂), Θ⟩
“Covariance Selection”, Dempster, 1972; Banerjee et al. (2006); Yuan (2006)
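As a quick numerical check of the decomposition (NumPy sketch, illustrative; the variable scales are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100

# Four variables on wildly different scales.
X = rng.normal(size=(T, 4)) * np.array([1.0, 10.0, 0.1, 5.0])
X = X - X.mean(axis=0)        # center the data matrix
S = X.T @ X / T               # sample covariance (1/T) X^T X

# Variance-correlation decomposition: R = D^{-1/2} S D^{-1/2}, D = diag(S).
d_inv_sqrt = 1.0 / np.sqrt(np.diag(S))
R = S * np.outer(d_inv_sqrt, d_inv_sqrt)

print(np.allclose(np.diag(R), 1.0))   # True: all variables on the same scale
```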
Degeneracy of Likelihood in High Dimensions
Given X ∈ R^{T×p} with T ≈ p: high curvature of the likelihood makes graphs easy to distinguish; low curvature makes them hard.
Credit: Negahban, Ravikumar, Wainwright & Yu, Statistical Science, 2012; “A Unified Framework for High-Dimensional Analysis of M-estimators with Decomposable Regularizers”
Sparse Inverse Covariance
Sparse penalized maximum likelihood: encourage sparsity with a Lasso penalty.
Θ̂(λ) = argmax_{Θ ≻ 0} L(Σ̂; Θ) − λ ‖Θ‖_{1,off}, where ‖Θ‖_{1,off} = Σ_{j≠k} |θ_{j,k}|
Convex problem: many optimization solutions available.
Popular alternative when (L) ⇒ (G): neighborhood selection (Meinshausen & Buhlmann 2006).
“Covariance Selection”, Dempster, 1972; Banerjee et al. (2006); Yuan (2006); Friedman et al. (2008); “QUIC”, Hsieh et al. (2011 & 2013); Buhlmann & Van De Geer (2011)
Model Identifiability of Sparse MLE
When is perfect edge recovery possible?
The Fisher information (F) of the inverse covariance needs to be well conditioned and satisfy incoherence conditions.
The signal strength of edges needs to be sufficiently larger than the noise.
Caveat: these conditions might always hold at infinite sample size, but only probabilistically in finite samples.
Meinshausen et al. 2006; Ravikumar et al. (2010, 2011); Van De Geer & Buhlmann (2013); and others

Model Identifiability: Network Structure Matters
Theoretical assumptions are often violated for many networks at finite samples.
Narayan et al. (2015a)
Do two unconnected nodes share “mutual friends”? The chance of errors increases with degree and depends on structure (Meinshausen et al. 2006; Ravikumar et al. 2010, 2011; Cai & Zhou 2015).
The more correlated the nodes, the more errors in distinguishing edges from non-edges.
We will only look at the Lasso and its improved variants; different estimators (pseudolikelihood, least squares, Dantzig-type, …) have slightly different limitations, and other regularizers behave differently as well.
See the review on graphical models by Drton & Maathuis (2016).
skggm: Inverse covariance estimation
By @jasonlaska and @mnarayan
Features:
scikit-learn interface
Comprehensive range of estimators, model selection procedures, metrics, Monte Carlo benchmarks of statistical error control, …
For the researcher: benchmark a new estimator/algorithm against others
For the data analyst: best practices for estimation & structure learning
Github repo: http://github.com/jasonlaska/skggm
Tutorial notebooks: http://neurostats.org/jf2016-skggm/
Binder instructions: http://mybinder.org/repo/neuroquant/jf2016-skggm

skggm: Tutorial Setup
Alternative/backup: install in a local anaconda environment
Install skggm: pip install skggm
Download notebooks: git clone git@github.com:neuroquant/jf2016-skggm.git
Tutorial: http://neurostats.org/jf2016-skggm/
Toy Example: Simple Banded or Chain Network Structure (Ground Truth)
Saturated Precision Matrices
Saturation: estimate all entries of the inverse covariance (precision).
Recall: high curvature of the likelihood makes it easy to distinguish different graphs.
At low sample sizes the likelihood is degenerate (using the pseudo-inverse for the degenerate sample covariance).
Recall: low curvature of the likelihood makes it hard to distinguish different graphs.
Standard Graphical Lasso
Sparse penalized maximum likelihood: Θ̂(λ) = argmin_{Θ ≻ 0} −L(Σ̂; Θ) + λ Pen(Θ)
Model selection: how do we choose the regularization/sparsity/non-zero support?
Friedman et al. 2007; Meinshausen and Buhlmann 2006; Banerjee et al. 2006; Rothman 2008; Hsieh et al.; Cai et al. 2011; and many more.
Cross Validation: Minimizes Type II Error
Split X → (X^{train}, X^{test}); fit {Θ̂(λ)}^{train} on the training folds and evaluate Loss({Θ̂(λ)}^{train}; {Σ̂}^{test}) on the hold-out folds.
E.g. Kullback-Leibler divergence; log-likelihood
Yuan and Lin (2007); Bickel and Levina (2008)
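A held-out loss of this form can be sketched in a few lines of NumPy (illustrative; the split, names, and use of the unregularized MLE are my choices). The score below is the Gaussian negative log-likelihood of a trained precision matrix evaluated on a test-set covariance, −log det Θ + ⟨Σ̂_test, Θ⟩:

```python
import numpy as np

def heldout_neg_loglik(Theta_train, S_test):
    """Gaussian negative log-likelihood (up to constants) of a fitted
    precision matrix evaluated on a held-out sample covariance."""
    sign, logdet = np.linalg.slogdet(Theta_train)
    assert sign > 0, "precision estimate must be positive definite"
    return -logdet + np.trace(S_test @ Theta_train)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X_train, X_test = X[:100], X[100:]
S_train = X_train.T @ X_train / 100
S_test = X_test.T @ X_test / 100

# Score the training-half MLE on the test half; in cross-validated
# model selection this is computed per lambda and averaged over folds.
Theta_hat = np.linalg.inv(S_train)
print(heldout_neg_loglik(Theta_hat, S_test))
```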
Extended BIC: Minimizes Type I Error
Privileges sparser models than BIC:
min_λ EBIC(Ê(λ)) = min_λ −2 L_n(Σ̂; Θ̂(λ)) + |Ê| log(n) + 4 γ |Ê| log(p),
where |Ê| = number of non-zero edges in Θ̂(λ)
Foygel & Drton (2010); alternatives (StARS, Liu et al.) coming soon.
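The EBIC criterion can be sketched directly (NumPy, illustrative; the function name, likelihood scaling, and γ = 0.5 default are my assumptions, following Foygel & Drton's definition):

```python
import numpy as np

def ebic(Theta_hat, S, n, gamma=0.5):
    """Extended BIC: -2 L_n + |E| log(n) + 4 gamma |E| log(p),
    where |E| counts nonzero off-diagonal edges (each pair once)."""
    p = S.shape[0]
    sign, logdet = np.linalg.slogdet(Theta_hat)
    assert sign > 0, "precision estimate must be positive definite"
    loglik = (n / 2.0) * (logdet - np.trace(S @ Theta_hat))
    mask = ~np.eye(p, dtype=bool)
    n_edges = np.count_nonzero(Theta_hat[mask]) // 2
    return -2.0 * loglik + n_edges * np.log(n) + 4.0 * gamma * n_edges * np.log(p)

# Toy usage: score a sparse chain precision vs. the dense plug-in MLE.
rng = np.random.default_rng(3)
p, n = 5, 500
Theta_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta_true), size=n)
S = X.T @ X / n
print(ebic(Theta_true, S, n), ebic(np.linalg.inv(S), S, n))
```

In practice the criterion is minimized over the regularization path, i.e. over λ, with larger γ privileging sparser models.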
One-Stage vs. Two-Stage Estimators
Use initial estimates to reduce bias in estimation.
Standard Graphical Lasso:
Θ̂(λ) = argmax_{Θ ≻ 0} L(Σ̂; Θ) − λ ‖Θ‖_{1,off}, ‖Θ‖_{1,off} = Σ_{j≠k} |θ_{j,k}|
Weighted Graphical Lasso (Stage 1: form an initial estimate Θ̂^{init}; Stage 2):
Θ̂(λ) = argmax_{Θ ≻ 0} L(Σ̂; Θ) − λ ‖W ∘ Θ‖_{1,off}, ‖W ∘ Θ‖_{1,off} = Σ_{j≠k} |w_{j,k} θ_{j,k}|
E.g. adaptive weights w_{jk} = 1 / |θ̂^{init}_{jk}|: strong edges shrink less.
Zou 2006; Zhou et al. 2011; Buhlmann & Van De Geer 2011; Cai & Zhou 2015
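The adaptive weighting rule can be sketched as follows (NumPy, illustrative; the function name and the eps guard against division by zero are my additions):

```python
import numpy as np

def adaptive_weights(Theta_init, eps=1e-6):
    """Adaptive-lasso style weights w_jk = 1 / |theta_init_jk|:
    strong edges in the initial estimate get small weights (shrink less),
    near-zero entries get large weights (shrink more)."""
    W = 1.0 / (np.abs(Theta_init) + eps)
    np.fill_diagonal(W, 0.0)   # penalize off-diagonal entries only
    return W

# A strong edge (0,1) and a weak edge (1,2) in a toy initial estimate.
Theta_init = np.array([[1.0, 0.5, 0.0],
                       [0.5, 1.0, 0.01],
                       [0.0, 0.01, 1.0]])
W = adaptive_weights(Theta_init)
print(W[0, 1] < W[1, 2])   # True: the strong edge is penalized less
```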
Shrinkage of Edges: Lasso vs. Adaptive
[Plot: coefficient paths (entries of the inverse covariance) vs. the regularization parameter λ]
Lasso shrinks all edges by the same value; adaptive performance is very dependent on the weights, i.e. it needs good separation between strong and weak edges.
Zou (2006)
Variety of Two-Stage Estimators
Weights can be specified in many ways:
Data dependent/adaptive weights (Stage I: any estimator, not just the MLE; Stage II: adaptive MLE)
Randomized weights for model averaging
Locally linear approximations to non-convex penalties (coming soon to skggm)
Adaptive estimation: Zhou, Van De Geer, Buhlmann (2009); Breheny and Huang (2011); Cai et al. (2011); and others
High Sparsity Case
True parameters: n/p = 75, degree = 0.15p
High Sample Size, High Sparsity: the adaptive estimator improves on the initial estimator (n/p = 75, degree = 0.15p).
Difference in sparsity: 69, 77; support error: 4.0, false pos: 4.0, false neg: 0.0
Difference in sparsity: 69, 141; support error: 36.0, false pos: 36.0, false neg: 0.0
Low Sample Size, High Sparsity: adaptivity is less useful without a good initial estimate (n/p = 15, degree = 0.15p).
Difference in sparsity: 69, 85; support error: 8.0, false pos: 8.0, false neg: 0.0
Difference in sparsity: 69, 149; support error: 40.0, false pos: 40.0, false neg: 0.0
Moderate Sparsity Case
True parameters: n/p = 75, degree = 0.4p
High Sample Size, Moderate Sparsity: nodes are more correlated with each other, but adaptivity still does well (n/p = 75, degree = 0.4p).
Difference in sparsity: 115, 129; support error: 7.0, false pos: 7.0, false neg: 0.0
Difference in sparsity: 115, 169; support error: 27.0, false pos: 27.0, false neg: 0.0
Low Sample Size, Moderate Sparsity: nodes are more correlated with each other, more false negatives (n/p = 15, degree = 0.4p).
Difference in sparsity: 115, 135; support error: 22.0, false pos: 16.0, false neg: 6.0
Difference in sparsity: 115, 111; support error: 18.0, false pos: 8.0, false neg: 10.0
Model Averaging & Stability Selection
For any initial estimator, build an ensemble of estimators and aggregate:
Θ̂^{*b}(λ) = argmax_{Θ ≻ 0} L(Σ̂^{*b}; Θ) − Pen(W^{*b}(λ) ∘ Θ),
w^{*b}_{jk} = w^{*b}_{kj} ∈ {λ/a, aλ}, drawn with Ber(ρ), for j ≠ k
Aggregate 1(Θ̂^{*b}(λ) ≠ 0) across resamples b.
Thresholding stability scores yields familywise error control over edges.
Meinshausen & Buhlmann (2010)
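The aggregation step can be sketched with a stand-in base estimator (NumPy, illustrative; in skggm the base estimator would be a randomized graphical lasso as in the formula above, while here a simple thresholded partial-correlation proxy, with hypothetical names and thresholds, illustrates only the bootstrap-and-aggregate logic):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, B = 5, 120, 50
Theta_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta_true), size=n)

def edge_support(Xb, thresh=0.2):
    """Stand-in base estimator: edges where the absolute partial
    correlation of a resample exceeds a threshold."""
    S = Xb.T @ Xb / len(Xb)
    Theta = np.linalg.inv(S)
    d = np.sqrt(np.diag(Theta))
    return np.abs(Theta / np.outer(d, d)) > thresh

# Aggregate the indicator 1(edge selected) over B bootstrap resamples.
scores = np.zeros((p, p))
for _ in range(B):
    idx = rng.integers(0, n, size=n)
    scores += edge_support(X[idx])
scores /= B
np.fill_diagonal(scores, 0.0)

# True edges should get stability scores near 1, non-edges near 0;
# thresholding the scores yields the selected graph.
print(scores[0, 1], scores[0, 4])
```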
skggm: Inverse covariance estimation, version 0.1
Future plans include:
Computational scalability (BigQUIC, support for Apache Spark)
Monte Carlo “unit-testing” of statistical error control
Novel case studies and more examples
Other estimator classes (pseudo-likelihood, non-convex, …)
Regularizers beyond sparsity: mixtures of regularizers, …
Other Markov network models for time series
Directed graphical models

More Related Content

What's hot

Introduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningIntroduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningNguyen Giang
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learningDaiki Tanaka
 
Relational machine-learning
Relational machine-learningRelational machine-learning
Relational machine-learningBhushan Kotnis
 
Low rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationLow rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationEvgeny Frolov
 
$$ Formulating semantic image annotation as a supervised learning problem
$$ Formulating semantic image annotation as a supervised learning problem$$ Formulating semantic image annotation as a supervised learning problem
$$ Formulating semantic image annotation as a supervised learning problemmhmt82
 
Network Crossover Performance on NK Landscapes and Deceptive Problems
Network Crossover Performance on NK Landscapes and Deceptive ProblemsNetwork Crossover Performance on NK Landscapes and Deceptive Problems
Network Crossover Performance on NK Landscapes and Deceptive Problemshauschildm
 
Artificial neural networks and its application
Artificial neural networks and its applicationArtificial neural networks and its application
Artificial neural networks and its applicationHưng Đặng
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
 
Study of Different Multi-instance Learning kNN Algorithms
Study of Different Multi-instance Learning kNN AlgorithmsStudy of Different Multi-instance Learning kNN Algorithms
Study of Different Multi-instance Learning kNN AlgorithmsEditor IJCATR
 
Current Approaches in Search Result Diversification
Current Approaches in Search Result DiversificationCurrent Approaches in Search Result Diversification
Current Approaches in Search Result DiversificationMario Sangiorgio
 
Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...tuxette
 
Joint gene network inference with multiple samples: a bootstrapped consensual...
Joint gene network inference with multiple samples: a bootstrapped consensual...Joint gene network inference with multiple samples: a bootstrapped consensual...
Joint gene network inference with multiple samples: a bootstrapped consensual...tuxette
 
Artificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementArtificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementIOSR Journals
 
Nature Inspired Reasoning Applied in Semantic Web
Nature Inspired Reasoning Applied in Semantic WebNature Inspired Reasoning Applied in Semantic Web
Nature Inspired Reasoning Applied in Semantic Webguestecf0af
 
08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)dnac
 
Show observe and tell giang nguyen
Show observe and tell   giang nguyenShow observe and tell   giang nguyen
Show observe and tell giang nguyenNguyen Giang
 

What's hot (20)

Introduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningIntroduction to Interpretable Machine Learning
Introduction to Interpretable Machine Learning
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learning
 
Relational machine-learning
Relational machine-learningRelational machine-learning
Relational machine-learning
 
Temporal networks - Alain Barrat
Temporal networks - Alain BarratTemporal networks - Alain Barrat
Temporal networks - Alain Barrat
 
Low rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference informationLow rank models for recommender systems with limited preference information
Low rank models for recommender systems with limited preference information
 
$$ Formulating semantic image annotation as a supervised learning problem
$$ Formulating semantic image annotation as a supervised learning problem$$ Formulating semantic image annotation as a supervised learning problem
$$ Formulating semantic image annotation as a supervised learning problem
 
Network Crossover Performance on NK Landscapes and Deceptive Problems
Network Crossover Performance on NK Landscapes and Deceptive ProblemsNetwork Crossover Performance on NK Landscapes and Deceptive Problems
Network Crossover Performance on NK Landscapes and Deceptive Problems
 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
 
Artificial neural networks and its application
Artificial neural networks and its applicationArtificial neural networks and its application
Artificial neural networks and its application
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
Jack
JackJack
Jack
 
Study of Different Multi-instance Learning kNN Algorithms
Study of Different Multi-instance Learning kNN AlgorithmsStudy of Different Multi-instance Learning kNN Algorithms
Study of Different Multi-instance Learning kNN Algorithms
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Spatially Informed Var...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Spatially Informed Var...MUMS: Bayesian, Fiducial, and Frequentist Conference - Spatially Informed Var...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Spatially Informed Var...
 
Current Approaches in Search Result Diversification
Current Approaches in Search Result DiversificationCurrent Approaches in Search Result Diversification
Current Approaches in Search Result Diversification
 
Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...
 
Joint gene network inference with multiple samples: a bootstrapped consensual...
Joint gene network inference with multiple samples: a bootstrapped consensual...Joint gene network inference with multiple samples: a bootstrapped consensual...
Joint gene network inference with multiple samples: a bootstrapped consensual...
 
Artificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementArtificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In Management
 
Nature Inspired Reasoning Applied in Semantic Web
Nature Inspired Reasoning Applied in Semantic WebNature Inspired Reasoning Applied in Semantic Web
Nature Inspired Reasoning Applied in Semantic Web
 
08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)
 
Show observe and tell giang nguyen
Show observe and tell   giang nguyenShow observe and tell   giang nguyen
Show observe and tell giang nguyen
 

Similar to Sparse inverse covariance estimation using skggm

Higher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsHigher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsAustin Benson
 
Graphical Models 4dummies
Graphical Models 4dummiesGraphical Models 4dummies
Graphical Models 4dummiesxamdam
 
As pi re2015_abstracts
As pi re2015_abstractsAs pi re2015_abstracts
As pi re2015_abstractsJoseph Park
 
Mahoney mlconf-nov13
Mahoney mlconf-nov13Mahoney mlconf-nov13
Mahoney mlconf-nov13MLconf
 
SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text IJERA Editor
 
Presentation2 2000
Presentation2 2000Presentation2 2000
Presentation2 2000suvobgd
 
Differential privacy (개인정보 차등보호)
Differential privacy (개인정보 차등보호)Differential privacy (개인정보 차등보호)
Differential privacy (개인정보 차등보호)Young-Geun Choi
 
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...EmadfHABIB2
 
Network Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systemsNetwork Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systemsGanesh Bagler
 
High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationDmitry Grapov
 
Representation Learning & Generative Modeling with Variational Autoencoder(VA...
Representation Learning & Generative Modeling with Variational Autoencoder(VA...Representation Learning & Generative Modeling with Variational Autoencoder(VA...
Representation Learning & Generative Modeling with Variational Autoencoder(VA...changedaeoh
 
nonlinear_rmt.pdf
nonlinear_rmt.pdfnonlinear_rmt.pdf
nonlinear_rmt.pdfGieTe
 
Gecco 2011 - Effects of Topology on the diversity of spatially-structured evo...
Gecco 2011 - Effects of Topology on the diversity of spatially-structured evo...Gecco 2011 - Effects of Topology on the diversity of spatially-structured evo...
Gecco 2011 - Effects of Topology on the diversity of spatially-structured evo...matteodefelice
 
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...Förderverein Technische Fakultät
 
A Diffusion Wavelet Approach For 3 D Model Matching
A Diffusion Wavelet Approach For 3 D Model MatchingA Diffusion Wavelet Approach For 3 D Model Matching
A Diffusion Wavelet Approach For 3 D Model Matchingrafi
 
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...Association for Computational Linguistics
 
UTS workshop talk
UTS workshop talkUTS workshop talk
UTS workshop talkLei Wang
 

Similar to Sparse inverse covariance estimation using skggm (20)

Higher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifsHigher-order spectral graph clustering with motifs
Higher-order spectral graph clustering with motifs
 
Graphical Models 4dummies
Graphical Models 4dummiesGraphical Models 4dummies
Graphical Models 4dummies
 
CLIM Program: Remote Sensing Workshop, Multilayer Modeling and Analysis of Co...
CLIM Program: Remote Sensing Workshop, Multilayer Modeling and Analysis of Co...CLIM Program: Remote Sensing Workshop, Multilayer Modeling and Analysis of Co...
CLIM Program: Remote Sensing Workshop, Multilayer Modeling and Analysis of Co...
 
As pi re2015_abstracts
As pi re2015_abstractsAs pi re2015_abstracts
As pi re2015_abstracts
 
Mahoney mlconf-nov13
Mahoney mlconf-nov13Mahoney mlconf-nov13
Mahoney mlconf-nov13
 
SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text
 
Presentation2 2000
Presentation2 2000Presentation2 2000
Presentation2 2000
 
Differential privacy (개인정보 차등보호)
Differential privacy (개인정보 차등보호)Differential privacy (개인정보 차등보호)
Differential privacy (개인정보 차등보호)
 
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
 
Network Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systemsNetwork Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systems
 
High Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and VisualizationHigh Dimensional Biological Data Analysis and Visualization
High Dimensional Biological Data Analysis and Visualization
 
Representation Learning & Generative Modeling with Variational Autoencoder(VA...
Representation Learning & Generative Modeling with Variational Autoencoder(VA...Representation Learning & Generative Modeling with Variational Autoencoder(VA...
Representation Learning & Generative Modeling with Variational Autoencoder(VA...
 
nonlinear_rmt.pdf
nonlinear_rmt.pdfnonlinear_rmt.pdf
nonlinear_rmt.pdf
 
Gecco 2011 - Effects of Topology on the diversity of spatially-structured evo...
Gecco 2011 - Effects of Topology on the diversity of spatially-structured evo...Gecco 2011 - Effects of Topology on the diversity of spatially-structured evo...
Gecco 2011 - Effects of Topology on the diversity of spatially-structured evo...
 
mlss
mlssmlss
mlss
 
2018 Modern Math Workshop - Nonparametric Regression and Classification for M...
2018 Modern Math Workshop - Nonparametric Regression and Classification for M...2018 Modern Math Workshop - Nonparametric Regression and Classification for M...
2018 Modern Math Workshop - Nonparametric Regression and Classification for M...
 
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
Nonequilibrium Network Dynamics_Inference, Fluctuation-Respones & Tipping Poi...
 
A Diffusion Wavelet Approach For 3 D Model Matching
A Diffusion Wavelet Approach For 3 D Model MatchingA Diffusion Wavelet Approach For 3 D Model Matching
A Diffusion Wavelet Approach For 3 D Model Matching
 
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
Chris Dyer - 2017 - Neural MT Workshop Invited Talk: The Neural Noisy Channel...
 
UTS workshop talk
UTS workshop talkUTS workshop talk
UTS workshop talk
 

Recently uploaded

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionPriyansha Singh
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 

Recently uploaded (20)

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 

Sparse inverse covariance estimation using skggm

  • 1. Sparse Inverse Covariance Estimation using skggm. Manjari Narayan, Postdoctoral Scholar, Stanford University (School of Medicine) (PI: Amit Etkin, M.D., Ph.D.). Tutorial presented at the Junior Scientist Workshop at HHMI, Janelia Farms. skggm: collaboration with Dr. Jason Laska, ML R&D at Clara Labs.
  • 2. Explosion of Functional Imaging Tools: fMRI, fNIRS; EEG, MEG; intracranial EEG, micro-ECoG; molecular fMRI; calcium imaging; light sheet microscopy; voltage-sensitive dye imaging; light field microscopy. Credits: Marie Suver, Ph.D. and Ainul Huda, University of Washington, and Michael H. Dickinson, Ph.D., California Institute of Technology (http://newsroom.cumc.columbia.edu/blog/2014/11/11/researchers-receive-nih-brain-initiative-funding/); Misha Ahrens, Ph.D., Janelia Farms (https://www.simonsfoundation.org/features/foundation-news/how-do-different-brain-regions-interact-to-enhance-function/); Tang, 2015, Scientific Reports; Raju Tomer, Ph.D. & Deisseroth Lab, Stanford University (http://techfinder.stanford.edu/technology_detail.php?ID=36402).
  • 3. Application: Functional Connectomics. The network as the unit of interest: unobserved stochastic dependence/interaction between neurons, circuits, regions, … A shared goal across modalities & resolutions (macroscale, mesoscale). Ahrens et al., Nature (2012). (Data matrix: T or n samples × p variables.)
  • 4. Probabilistic Graphical Models. Many probabilistic models are available, both directed and undirected. Graph G = (V, E) with vertices V = (1, …, p) and edges E ⊂ V × V; X = (X₁, …, X_p) ∼ P_X. A probabilistic graphical model relates P_X to G: (j, k) ∉ E ⟺ independence or conditional independence between X_j and X_k. Observed: X. Unobserved: G. Examples: directed acyclic graphs (DAGs/Bayes nets); state-space models, including linear/nonlinear VAR; undirected graphical models (Markov networks); bivariate associations (correlation, Granger causality, transfer entropy).
  • 5. Models for Connectivity: Conditional Dependence & Markov Networks. More informative than correlations: a measure of “direct” interactions that eliminates “indirect” interactions due to observed common causes. Benefits: studying cognitive mechanisms; designing interventional targets; science-wide efficient use of data. Contrast conditional dependence (“partial correlations”) with marginal dependence (“marginal correlations”).
  • 7. Markov Properties. Graph G = (V, E) with vertices V = {1, 2, …, p} and edges E ⊂ V × V. Multivariate normal: x₁, …, x_T i.i.d. ∼ N_p(0, Σ); inverse covariance Σ⁻¹ = Θ. Pairwise Markov property (P): two non-adjacent variables are conditionally independent given all other nodes, e.g. X₅ ⊥ X₁ | X_{V∖{1,5}}. Lauritzen (1996)
  • 8. Markov Properties (continued). Local Markov property (L): a variable is conditionally independent of all others given its neighbors: X₅ ⊥ X_{V∖ne(5)} | X_{ne(5)}, with ne(5) = {2, 5} in the example graph. Lauritzen (1996)
  • 9. Markov Properties (continued). Global Markov property (G): given three disjoint sets A, B, and C such that all paths from A to B go through C, A is conditionally independent of B given C: X_A ⊥ X_B | X_C, where X_A = {X_a}_{a∈A}. Lauritzen (1996)
  • 10. Benefits of Global Markov Properties. Intersection property (holds for positive densities, e.g. Gaussian; extended to some non-positive densities): if A ⊥ B | (C, D) and A ⊥ C | (B, D), then A ⊥ (B ∪ C) | D. Factorizes the probability distribution: P(X) = P(X_A | X_C) P(X_B | X_C) P(X_C). Gives computational tractability & statistical power to identify all conditional independences. Lauritzen (1996)
  • 11. Generality of Markov Networks. For many types of pairwise associations, there are Markov networks that satisfy the global Markov property:
  - Correlation → zero partial correlation = conditional independence
  - Coherence or coherency → zero partial coherence = conditional independence
  - Directed information (including transfer entropy, Sims/Granger prediction, …) → dynamic extensions of standard Markov properties; local independence (Didelez 2008)
  - Pairwise ordering between variables → DAGs, CPDAGs, MAGs, PAGs, …
  This is not an exhaustive list!
  • 12. Generality of Markov Networks. For many probability distributions, there are Markov networks that satisfy at least the local, if not the global, Markov property:
  - Exponential families (binary, Poisson, circular, …) → exponential-family MRFs, including binary Ising models and Poisson graphical models (P. Ravikumar, G.I. Allen, and others)
  - Nonparametric distributions → nonparanormal (copula) graphical models, kernel graphical models (H. Liu, E. Xing, B. Scholkopf, and others)
  - Separable covariance structure (spatio-temporal) → separable Markov networks (G.I. Allen, S. Zhou, A. Hero, P. Hoff, and many others)
  • 13. From now on: the Gaussian Graphical Model. Multivariate normal x₁, …, x_T i.i.d. ∼ N_p(0, Σ) with inverse covariance Σ⁻¹ = Θ. A zero in the inverse covariance equals conditional independence: X_k ⊥ X_l | X_{V∖{k,l}} ⟺ (Σ⁻¹)_{kl} = 0 ⟺ (k, l) ∉ E. Lauritzen (1996)
  • 14. From now on: the Gaussian Graphical Model (continued). The same zero/edge correspondence is also important for nonparametric distributions and the exponential family.
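To make the zero/edge correspondence concrete, here is a minimal pure-Python sketch (the correlation value r and the 3-node chain are hypothetical illustrations, not data from the tutorial): for a chain X1 — X2 — X3 with AR(1)-style covariance Σ_jk = r^|j−k|, the precision matrix Σ⁻¹ is tridiagonal, so the (1,3) entry is zero, encoding X1 ⊥ X3 | X2.

```python
def invert(A):
    """Invert a small matrix via Gauss-Jordan elimination (no dependencies)."""
    n = len(A)
    # Augment A with the identity matrix.
    M = [list(row) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(c, n), key=lambda rr: abs(M[rr][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for rr in range(n):
            if rr != c:
                f = M[rr][c]
                M[rr] = [x - f * y for x, y in zip(M[rr], M[c])]
    return [row[n:] for row in M]

r = 0.6  # hypothetical chain correlation
sigma = [[r ** abs(j - k) for k in range(3)] for j in range(3)]
theta = invert(sigma)  # inverse covariance (precision matrix)
print(abs(theta[0][2]) < 1e-9)  # True: no (1,3) edge, X1 and X3 conditionally independent
```

The non-zero pattern of `theta` is exactly the edge set of the chain: entries (1,2) and (2,3) are negative (positive partial correlations), while (1,3) vanishes.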
  • 15. Estimation in High Dimensions
  • 16.–17. Gaussian Log-Likelihood: Likelihood for the Inverse Covariance. The input to the log-likelihood is effectively the sample covariance: Σ̂ = (1/T) XᵀX, where the data matrix X ∈ ℝ^{T×p} is centered. L(Σ̂; Θ) ≡ log det Θ − ⟨Σ̂, Θ⟩. “Covariance Selection”, Dempster, 1972; Banerjee et al. (2006); Yuan (2006)
  • 18. Gaussian Log-Likelihood: Variance–Correlation Decomposition. Put all variables on the same scale: R(Σ̂) = D^{−1/2} Σ̂ D^{−1/2} with D = diag(Σ̂), and L(Σ̂; Θ) ≡ log det Θ − ⟨R(Σ̂), Θ⟩. “Covariance Selection”, Dempster, 1972; Banerjee et al. (2006); Yuan (2006)
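Both pieces above can be written out explicitly for small matrices. A self-contained sketch (2×2 case with hypothetical values) of the log-likelihood L(S; Θ) = log det Θ − ⟨S, Θ⟩ and the variance–correlation rescaling:

```python
import math

def loglik(S, Theta):
    """Gaussian log-likelihood log det(Theta) - <S, Theta>, 2x2 case."""
    det = Theta[0][0] * Theta[1][1] - Theta[0][1] * Theta[1][0]
    inner = sum(S[i][j] * Theta[i][j] for i in range(2) for j in range(2))
    return math.log(det) - inner

def to_correlation(S):
    """R = D^{-1/2} S D^{-1/2}, D = diag(S): puts variables on one scale."""
    d = [math.sqrt(S[0][0]), math.sqrt(S[1][1])]
    return [[S[i][j] / (d[i] * d[j]) for j in range(2)] for i in range(2)]

S = [[4.0, 1.0], [1.0, 1.0]]  # hypothetical sample covariance
R = to_correlation(S)
print(R[0][1])  # 0.5 -- the correlation; R has unit diagonal

# The unpenalized likelihood is maximized at Theta = S^{-1}:
detS = S[0][0] * S[1][1] - S[0][1] * S[1][0]
mle = [[S[1][1] / detS, -S[0][1] / detS],
       [-S[1][0] / detS, S[0][0] / detS]]
print(loglik(S, mle) > loglik(S, [[1.0, 0.0], [0.0, 1.0]]))  # True
```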
  • 19. Degeneracy of the Likelihood in High Dimensions. Given X ∈ ℝ^{T×p} with T ≈ p, the likelihood surface goes from high curvature (easy) to low curvature (hard). Credit: Negahban, Ravikumar, Wainwright & Yu, Statistical Science, 2012; “A Unified Framework for High-Dimensional Analysis of M-estimators with Decomposable Regularizers”
  • 20. Sparse Inverse Covariance: Sparse Penalized Maximum Likelihood. Encourage sparsity with a lasso penalty: Θ̂(λ) = argmax_{Θ ≻ 0} L(Σ̂; Θ) − λ‖Θ‖₁,off, where ‖Θ‖₁,off = Σ_{j≠k} |θ_{jk}|. A convex problem: many optimization solutions are available. A popular alternative when (L) ⇒ (G): neighborhood selection. “Covariance Selection”, Dempster, 1972; Banerjee et al. (2006); Yuan (2006); Friedman et al. (2008); “QUIC”, Hsieh et al. (2011 & 2013); Buhlmann & Van De Geer (2011); Meinshausen & Buhlmann (2006)
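A sketch of the penalized objective, written as the equivalent minimization of the negative log-likelihood plus an off-diagonal ℓ₁ penalty (2×2 case, hypothetical values; a real solver such as graphical lasso or QUIC minimizes this over all positive-definite Θ rather than comparing two candidates):

```python
import math

def objective(S, Theta, lam):
    """Penalized negative log-likelihood, 2x2 sketch:
    -(log det Theta - <S, Theta>) + lam * sum_{j != k} |theta_jk|."""
    det = Theta[0][0] * Theta[1][1] - Theta[0][1] * Theta[1][0]
    inner = sum(S[i][j] * Theta[i][j] for i in range(2) for j in range(2))
    l1_off = abs(Theta[0][1]) + abs(Theta[1][0])
    return -(math.log(det) - inner) + lam * l1_off

S = [[1.0, 0.3], [0.3, 1.0]]  # hypothetical sample correlation
detS = 1.0 - 0.3 * 0.3
mle = [[1.0 / detS, -0.3 / detS], [-0.3 / detS, 1.0 / detS]]  # S^{-1}
diag = [[1.0, 0.0], [0.0, 1.0]]  # fully sparse candidate

# With lam = 0 the unpenalized MLE wins; with a large enough lam,
# the sparser (diagonal) candidate achieves a lower objective value:
print(objective(S, mle, 0.0) < objective(S, diag, 0.0))   # True
print(objective(S, diag, 0.5) < objective(S, mle, 0.5))   # True
```

This is exactly the trade-off λ controls: fit (log-likelihood) against the number and magnitude of off-diagonal entries.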
  • 21. Model Identifiability of the Sparse MLE: when is perfect edge recovery possible? The Fisher information (F) of the inverse covariance needs to be well conditioned and satisfy incoherence conditions. The signal strength of edges needs to be sufficiently larger than the noise. Caveat: these might always hold at infinite sample size but only probabilistically in finite samples. Meinshausen et al. 2006; Ravikumar et al. (2010, 2011); Van De Geer & Buhlmann (2013); and others
  • 22. Model Identifiability: Network Structure Matters. Theoretical assumptions are often violated for many networks at finite samples (Narayan et al. 2015a). Do two unconnected nodes share “mutual friends”? Errors increase with degree and depend on structure: the more correlated the nodes, the more errors in distinguishing edges from non-edges. Meinshausen et al. 2006; Ravikumar et al. (2010, 2011); Cai & Zhou (2015)
  • 23. Model Identifiability of the Sparse MLE: when is perfect edge recovery possible? Different estimators have slightly different limitations (pseudolikelihood, least squares, Dantzig-type, …), and other regularizers behave differently as well. We will only look at the lasso and its improved variants. See the review of graphical models by Drton & Maathuis (2016).
  • 24. skggm: inverse covariance estimation, by @jasonlaska and @mnarayan. Features: a scikit-learn interface; a comprehensive range of estimators, model selection procedures, metrics, Monte Carlo benchmarks of statistical error control, … For the researcher: benchmark new estimators/algorithms against others. For the data analyst: best practices for estimation & structure learning. GitHub repo: http://github.com/jasonlaska/skggm — Tutorial notebooks: http://neurostats.org/jf2016-skggm/
  • 25. skggm: Tutorial Setup. Binder instructions: http://mybinder.org/repo/neuroquant/jf2016-skggm — Alternative/backup: install in a local anaconda environment. Install skggm: pip install skggm. Download the notebooks: git clone git@github.com:neuroquant/jf2016-skggm.git. Tutorial: http://neurostats.org/jf2016-skggm/
  • 26. Ground Truth Toy Example: Simple Banded or Chain Network Structure
  • 27. Saturated Precision Matrices. Saturation: estimate all entries of the inverse covariance (precision). Recall: high curvature of the likelihood makes it easy to distinguish different graphs.
  • 28. Saturated Precision Matrices. Degeneracy at low sample sizes (using the pseudo-inverse for a degenerate sample covariance). Recall: low curvature of the likelihood makes it hard to distinguish different graphs.
  • 29. Standard Graphical Lasso: Sparse Penalized Maximum Likelihood. Θ̂(λ) = argmin_{Θ ≻ 0} −L(Σ̂; Θ) + λ·Pen(Θ). Model selection: how do we choose the regularization/sparsity/non-zero support? Friedman et al. 2007; Meinshausen and Buhlmann 2006; Banerjee et al. 2006; Rothman 2008; Hsieh et al.; Cai et al. 2011; and many more.
  • 30. Cross-Validation: Minimizes Type II Error. Split X → (X^train, X^test); fit {Θ̂(λ)} on the training split; score each model on the hold-out split via Loss({Θ̂(λ)}^train; Σ̂^test), e.g. Kullback–Leibler divergence or log-likelihood. Yuan and Lin (2007); Bickel and Levina (2008)
  • 31. Extended BIC: Minimizes Type I Error. Choose λ to minimize EBIC(Ê(λ)) = −2 L_n(Σ̂; Θ̂(λ)) + |Ê| log(n) + 4 |Ê| γ log(p), where |Ê| is the number of non-zero off-diagonal entries in Θ̂(λ). Privileges sparser models than BIC. Foygel & Drton (2010). Alternatives (StARS, Liu et al.): coming soon.
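The EBIC trade-off can be sketched numerically (the likelihood values, edge counts, and n, p below are hypothetical, not from the tutorial); setting γ = 0 recovers ordinary BIC, and larger γ privileges sparser graphs:

```python
import math

def ebic(loglik, n_edges, n, p, gamma=0.5):
    """Extended BIC score (Foygel & Drton, 2010); smaller is better.
    gamma = 0 gives ordinary BIC; larger gamma penalizes edges more."""
    return (-2.0 * loglik
            + n_edges * math.log(n)
            + 4.0 * gamma * n_edges * math.log(p))

# A sparser graph with a slightly worse likelihood can still win:
sparse = ebic(loglik=-100.0, n_edges=10, n=50, p=20)
dense = ebic(loglik=-98.0, n_edges=20, n=50, p=20)
print(sparse < dense)  # True
```

In practice one evaluates this score for each Θ̂(λ) along the regularization path and keeps the minimizer.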
  • 32. One-Stage vs. Two-Stage Estimators. Use initial estimates to reduce bias in estimation. Stage I, standard graphical lasso: Θ̂(λ) = argmax_{Θ ≻ 0} L(Σ̂; Θ) − λ‖Θ‖₁,off, with ‖Θ‖₁,off = Σ_{j≠k} |θ_{jk}|. Stage II, weighted graphical lasso: Θ̂(λ) = argmax_{Θ ≻ 0} L(Σ̂; Θ) − λ‖W ∘ Θ‖₁,off, with ‖W ∘ Θ‖₁,off = Σ_{j≠k} |w_{jk} θ_{jk}|. E.g. adaptive weights w_{jk} = 1/|θ̂^init_{jk}|. Zou 2006; Zhou et al. 2011; Buhlmann & Van De Geer 2011; Cai & Zhou 2015
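A sketch of how adaptive weights are typically formed from a Stage-I estimate (the eps guard and the toy matrix below are illustrative assumptions, not skggm's exact implementation):

```python
def adaptive_weights(theta_init, eps=1e-8):
    """w_jk = 1 / |theta_init_jk| for j != k: strong initial edges get
    small penalties (shrink less); weak edges get large penalties.
    eps guards against division by zero for exactly-zero entries."""
    p = len(theta_init)
    return [[0.0 if j == k else 1.0 / (abs(theta_init[j][k]) + eps)
             for k in range(p)] for j in range(p)]

# Hypothetical Stage-I estimate: one strong edge (1,2), one weak edge (1,3)
theta1 = [[2.0, -0.8, 0.05],
          [-0.8, 2.0, 0.0],
          [0.05, 0.0, 2.0]]
W = adaptive_weights(theta1)
print(W[0][1] < W[0][2])  # True: the strong edge is penalized less
```

Stage II then solves the weighted graphical lasso with λ·W in place of the scalar λ, so well-supported edges survive while noise edges are shrunk away.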
  • 33. Shrinkage of Edges: Lasso vs. Adaptive. With the lasso, all edges shrink by the same value; with adaptive weights, strong edges shrink less. Performance is very dependent on the weights, i.e. it needs good separation between strong and weak edges. (Plot: coefficient (entry of the inverse covariance) vs. regularization parameter (lambda).) Zou (2006)
  • 34. Variety of Two-Stage Estimators. Weights can be specified in many ways: data-dependent/adaptive weights; Stage I can be any estimator, not just the MLE; Stage II is an adaptive MLE. Weights can also be used to create randomized model averaging, and locally linear approximations to non-convex penalties (coming soon to skggm). Adaptive estimation: Zhou, Van De Geer, Buhlmann (2009); Breheny and Huang (2011); Cai et al. (2011); and others
  • 35. High Sparsity Case. True parameters: n/p = 75, degree = 0.15p.
  • 36. High Sample Size, High Sparsity. The adaptive estimator improves on the initial estimator (n/p = 75, degree = 0.15p). Adaptive: difference in sparsity 69 vs. 77; support error 4.0 (false pos: 4.0, false neg: 0.0). Initial: difference in sparsity 69 vs. 141; support error 36.0 (false pos: 36.0, false neg: 0.0).
  • 37. Low Sample Size, High Sparsity. Adaptivity is less useful without a good initial estimate (n/p = 15, degree = 0.15p). Adaptive: difference in sparsity 69 vs. 85; support error 8.0 (false pos: 8.0, false neg: 0.0). Initial: difference in sparsity 69 vs. 149; support error 40.0 (false pos: 40.0, false neg: 0.0).
  • 38. Moderate Sparsity Case. True parameters: n/p = 75, degree = 0.4p.
  • 39. High Sample Size, Moderate Sparsity. Nodes are more correlated with each other, but adaptivity still does well (n/p = 75, degree = 0.4p). Adaptive: difference in sparsity 115 vs. 129; support error 7.0 (false pos: 7.0, false neg: 0.0). Initial: difference in sparsity 115 vs. 169; support error 27.0 (false pos: 27.0, false neg: 0.0).
  • 40. Low Sample Size, Moderate Sparsity. Nodes are more correlated with each other; more false negatives (n/p = 15, degree = 0.4p). Adaptive: difference in sparsity 115 vs. 135; support error 22.0 (false pos: 16.0, false neg: 6.0). Initial: difference in sparsity 115 vs. 111; support error 18.0 (false pos: 8.0, false neg: 10.0).
  • 41. Model Averaging & Stability Selection. For any initial estimator, build an ensemble of estimators and aggregate: Θ̂^{*b}(λ) = argmax_{Θ ≻ 0} L(Σ̂^{*b}; Θ) − Pen(W^{*b}(λ) ∘ Θ), with randomized weights w^{*b}_{jk} = w^{*b}_{kj} ∈ {λ/a, λa} drawn with Ber(ρ) for j ≠ k; then aggregate 𝟙(Θ̂^{*b}(λ) ≠ 0) across resamples. Thresholding the resulting stability scores gives familywise error control over edges. Meinshausen & Buhlmann (2010). (Example: n/p = 15.)
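The aggregation step can be sketched in a few lines (the boolean supports below are hypothetical; a full pipeline would also perform the subsampling and penalty randomization described above):

```python
def stability_scores(supports):
    """Edge-wise selection frequency across B resampled/randomized fits.
    `supports` is a list of B symmetric 0/1 adjacency matrices."""
    B = len(supports)
    p = len(supports[0])
    return [[sum(s[j][k] for s in supports) / B for k in range(p)]
            for j in range(p)]

# Three hypothetical bootstrap fits of a 3-node graph:
fits = [[[1, 1, 0], [1, 1, 0], [0, 0, 1]],
        [[1, 1, 1], [1, 1, 0], [1, 0, 1]],
        [[1, 1, 0], [1, 1, 0], [0, 0, 1]]]
scores = stability_scores(fits)
print(scores[0][1])        # edge (1,2): selected in every fit
print(scores[0][2] < 1.0)  # True: edge (1,3) is unstable
```

Thresholding `scores` (e.g. keeping edges selected in a large fraction of fits) is the stability-selection step that controls familywise error over edges.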
  • 42. skggm: Inverse Covariance Estimation, Version 0.1. Future plans include: computational scalability (BigQUIC, support for Apache Spark); Monte Carlo “unit-testing” of statistical error control; novel case studies and more examples; other estimator classes (pseudo-likelihood, non-convex, …); regularizers beyond sparsity (mixtures of regularizers, …); other Markov network models for time series; and directed graphical models.