Methods for Robust High Dimensional Graphical Model Selection
Kshitij Khare and Syed Rahman
Department of Statistics
University of Florida
Motivation
• Availability of high-dimensional data or “big data” from various applications
• Number of variables (p) much larger than (or sometimes comparable to) the
sample size (n)
• Examples:
Biology: gene expression data
Environmental science: climate data on spatial grid
Finance: returns on thousands of stocks
Motivation
[Figure: schematic data matrix with samples as rows and variables/features as columns]
Goal: Understanding relationships between variables
• Common goal in many applications: Understand complex network of relationships
between variables
• Covariance matrix: a fundamental quantity to help understand multivariate
relationships
• Even if estimating the covariance matrix is not the end goal, it is a crucial first
step before further analysis
Quick recap: What is a covariance matrix?
• The covariance of two variables/features (say two stock prices) is a measure of
linear dependence between these variables
• Positive covariance indicates similar behavior, negative covariance indicates opposite behavior, and zero covariance indicates a lack of linear dependence
Let's say we have five stock prices S1, S2, S3, S4, S5. The covariance matrix of these five stocks looks like:
[Figure: 5 × 5 covariance matrix with rows and columns labeled S1, S2, S3, S4, S5]
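As an illustration of how such a matrix is computed in practice (this sketch is not from the original slides; the simulated returns and variable names are made up), NumPy's `np.cov` produces the 5 × 5 sample covariance matrix directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily returns for five hypothetical stocks S1..S5
# (250 trading days, 5 variables); real return data would replace this.
n_days, n_stocks = 250, 5
returns = rng.normal(size=(n_days, n_stocks))

# Sample covariance matrix: rows/columns correspond to S1..S5.
# rowvar=False tells NumPy that columns (not rows) are the variables.
S = np.cov(returns, rowvar=False)

print(S.shape)         # (5, 5)
print(np.round(S, 3))  # symmetric; the diagonal holds the sample variances
```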
Challenges in high-dimensional estimation
• Covariance matrix (often denoted by Σ) has O(p²) unknown parameters
• If p = 1000, we need to estimate roughly 1 million parameters
• If the sample size n is much smaller than (or even of the same order as) p, this is not viable
• The sample covariance matrix (the classical estimator) can perform very poorly in high-dimensional situations (it is not even invertible when n < p)
Is there a way out?
• Reliably estimate a small number of parameters in Σ or Ω = Σ⁻¹
• Set insignificant parameters to zero
• Gives rise to sparse estimates of Σ or Ω
• Sparsity pattern can be represented by graphs/networks
Concentration Graphical Models: Sparsity in Ω
• Assume Ω (inverse covariance matrix) is sparse: corresponds to assuming
conditional independences
• Sparsity pattern in Ω can be represented by an undirected graph G = (V, E)
• Build a graph from sparse Ω:

Ω =
        A     B     C
  A    1.0   0.2   0.3
  B    0.2   2.0   0.0
  C    0.3   0.0   1.2

[Graph: nodes A, B, C with edges A–B and A–C; no edge between B and C since ω_BC = 0]
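To make the correspondence concrete, here is a small Python sketch (illustrative, not from the slides) that reads the undirected edges off the nonzero off-diagonal entries of the 3 × 3 Ω above:

```python
import numpy as np

# The sparse inverse covariance matrix from the slide (variables A, B, C)
labels = ["A", "B", "C"]
Omega = np.array([[1.0, 0.2, 0.3],
                  [0.2, 2.0, 0.0],
                  [0.3, 0.0, 1.2]])

# An undirected edge {i, j} is present whenever omega_ij is nonzero (i < j)
edges = [(labels[i], labels[j])
         for i in range(len(labels))
         for j in range(i + 1, len(labels))
         if Omega[i, j] != 0]

print(edges)  # [('A', 'B'), ('A', 'C')]: B and C are conditionally independent
```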
Are these models useful? Appropriate?
• Many physical networks are assumed to be sparse
• Complex networks (internet, citation networks, social networks) tend to be sparse
[Newman, 2003]
• Genetic networks are sparse [Gardner et al, 2003, Jeong et al, 2001]
Model selection problem: How do we infer the underlying network/graph from data?
CONvex CORrelation selection methoD (CONCORD)
Obtain estimate of Ω by minimizing the objective function:
$$Q_{\mathrm{con}}(\Omega) = -n \sum_{i=1}^{p} \log \omega_{ii} \;+\; \frac{1}{2} \sum_{i=1}^{p} \Big\| \omega_{ii} Y_i + \sum_{j \neq i} \omega_{ij} Y_j \Big\|_2^2 \;+\; \lambda \sum_{1 \le i < j \le p} |\omega_{ij}|$$
• The penalty term $\lambda \sum_{1 \le i < j \le p} |\omega_{ij}|$ ensures that the minimizer is sparse
• λ (chosen by the user) controls the level of sparsity in the estimator
• The larger the λ, the sparser the estimator (a numerical sketch of the objective follows)
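The objective is easy to evaluate numerically. Below is a minimal NumPy sketch (illustrative only; `Y` is the n × p data matrix whose columns are Y_1, …, Y_p, and the names are made up for the example):

```python
import numpy as np

def concord_objective(Omega, Y, lam):
    """Q_con(Omega) as written on this slide.

    Omega : (p, p) symmetric candidate inverse covariance matrix
    Y     : (n, p) data matrix with columns Y_1, ..., Y_p
    lam   : penalty parameter lambda >= 0
    """
    n, p = Y.shape
    # -n * sum_i log(omega_ii)
    log_term = -n * np.sum(np.log(np.diag(Omega)))
    # (1/2) * sum_i || omega_ii Y_i + sum_{j != i} omega_ij Y_j ||_2^2
    # (for symmetric Omega this equals (1/2) * ||Y @ Omega||_F^2)
    quad_term = 0.5 * np.sum((Y @ Omega) ** 2)
    # lambda * sum_{1 <= i < j <= p} |omega_ij|
    penalty = lam * np.sum(np.abs(Omega[np.triu_indices(p, k=1)]))
    return log_term + quad_term + penalty

# Example: evaluate at the identity matrix for random data
rng = np.random.default_rng(1)
Y = rng.normal(size=(60, 5))
print(concord_objective(np.eye(5), Y, lam=0.5))
```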
Minimization algorithm
• Direct minimization of Q_con is not feasible
• Cyclic coordinate-wise minimization algorithm:

1. Minimize over ω_ij (other coefficients held constant):

$$\omega_{ij} \leftarrow \frac{S_{\lambda/n}\!\left( -\Big( \sum_{j' \neq j} \omega_{ij'} S_{j'j} + \sum_{i' \neq i} \omega_{i'j} S_{i'i} \Big) \right)}{S_{ii} + S_{jj}}$$

2. Minimize over ω_ii (other coefficients held constant):

$$\omega_{ii} \leftarrow \frac{ -\sum_{j \neq i} \omega_{ij} S_{ij} + \sqrt{\Big( \sum_{j \neq i} \omega_{ij} S_{ij} \Big)^2 + 4 S_{ii}} }{2 S_{ii}}$$

Repeat until convergence.

Soft-thresholding operator: $S_\lambda(x) = \mathrm{sign}(x)\,(|x| - \lambda)_+$ (a code version of this operator follows)
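The soft-thresholding operator is the only non-standard ingredient in the updates; a minimal NumPy version (illustrative) is:

```python
import numpy as np

def soft_threshold(x, lam):
    """S_lambda(x) = sign(x) * (|x| - lambda)_+ , applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

print(soft_threshold(np.array([-2.0, -0.3, 0.0, 0.3, 2.0]), lam=0.5))
# [-1.5 -0.   0.   0.   1.5]  -- values within [-0.5, 0.5] are shrunk to zero
```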
CONCORD method: summary
Property                              | NS  | SPACE | SYMLASSO | SPLICE | CONCORD
Symmetry                              |     |   +   |    +     |   +    |    +
Convergence guarantee (fixed n)       | N/A |       |          |        |    +
Asymptotic consistency (n, p → ∞)     |  +  |   +   |          |        |    +

CONCORD retains all good properties of previous methods, and adds several attractive features
Comparison with Sample Covariance Matrix
• When the sample size (n) is smaller than the number of variables (p), the sample covariance matrix (S) is not even positive definite, and hence not invertible (see the quick numerical check below).
• In such a case, we HAVE to use CONCORD (or a comparable method) to get an estimate.
• If n > p, we can consider S⁻¹ as an estimate of Ω. However, S⁻¹ will generally not be sparse and is usually a poor estimate, especially if Ω is sparse.
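A quick NumPy check of the rank-deficiency claim (illustrative): with n < p the sample covariance matrix cannot have full rank, so it is singular and S⁻¹ does not exist.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 50                        # fewer samples than variables
X = rng.normal(size=(n, p))

S = np.cov(X, rowvar=False)          # p x p sample covariance matrix
print(np.linalg.matrix_rank(S))      # at most n - 1 = 29
print(np.linalg.matrix_rank(S) < p)  # True: S is singular, hence not invertible
```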
Comparison with Sample Covariance Matrix continued ...
• For our numerical experiments, the true covariance matrix (Ω⁻¹) was a randomly generated 50 × 50 positive definite matrix.
• Using this covariance matrix, we then generated a sample of n = 60 observations (slightly larger than p = 50).
• We compared the accuracy of CONCORD and of the inverse sample covariance matrix as estimators of Ω using the Frobenius norm. The experiment was repeated 100 times (a sketch of the error metric follows).
• The average Frobenius error for CONCORD is 0.4125151, while for the inverse sample covariance matrix it is 46.9759999.
Message: CONCORD is far superior to simply inverting S.
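For reference, the error metric used above can be sketched as follows (illustration only; `concord_estimate` is a hypothetical stand-in for whichever CONCORD implementation is used, such as the ECL routine described later):

```python
import numpy as np

def frobenius_error(Omega_hat, Omega_true):
    # Frobenius norm of the estimation error, ||Omega_hat - Omega_true||_F
    return np.linalg.norm(Omega_hat - Omega_true, ord="fro")

# Sketch of one replicate of the experiment (Omega_true is the known 50 x 50
# inverse covariance used to simulate the n = 60 observations in Y):
# Omega_hat_concord = concord_estimate(Y, lam)                 # hypothetical solver
# Omega_hat_inverse = np.linalg.inv(np.cov(Y, rowvar=False))   # inverse sample covariance
# print(frobenius_error(Omega_hat_concord, Omega_true))
# print(frobenius_error(Omega_hat_inverse, Omega_true))
```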
Leveraging strength of ECL
• One of the biggest advantages of ECL is distributed computing.
• However, CONCORD as it exists doesn't lend itself easily to parallel computing.
• Even if we can run it on several nodes, the nodes need to communicate amongst themselves due to the dependence structure of covariance matrices.
• How can we adapt CONCORD to leverage parallel computation?
Improvisation: Divide and Conquer
• Run CONCORD for just a few iterations (around 10), until the corresponding graph breaks into five to ten disjoint components.
• Run CONCORD afresh for each of these components on separate nodes (see the sketch after this slide).
• These runs are completely independent, so there is no need for any data movement between the nodes.
• The overall performance in terms of run time is greatly enhanced.
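A Python sketch of the divide-and-conquer idea (an illustration of the strategy, not the ECL implementation; `run_concord` is a hypothetical stand-in for a CONCORD solver): after a few iterations, split the graph of the current estimate into connected components and solve each block independently, which is what allows each block to be sent to a different node.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def split_into_components(Omega_partial):
    """Group variables by the connected components of the graph implied by
    the nonzero off-diagonal pattern of a partially converged estimate."""
    adjacency = (Omega_partial != 0).astype(int)
    np.fill_diagonal(adjacency, 0)
    n_comp, labels = connected_components(csr_matrix(adjacency), directed=False)
    return [np.flatnonzero(labels == c) for c in range(n_comp)]

# Sketch of the overall strategy:
# Omega_partial = run_concord(Y, lam, max_iter=10)     # a few iterations only
# blocks = split_into_components(Omega_partial)
# for idx in blocks:                                   # independent: no data movement needed
#     Omega_block = run_concord(Y[:, idx], lam)        # fresh CONCORD on each component
```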
Illustrative example
• In our example with p = 50, we run CONCORD for 10 iterations.
• Depending on the value of the penalization parameter λ, the graph breaks up into 5 or 6 components.
• Running CONCORD on the full dataset until convergence takes 15 minutes and 30 seconds.
• Improvised method: CONCORD for 10 iterations takes 2 minutes 32 seconds, and running a fresh CONCORD for each component takes less than 3 minutes.
The improvised method reduces the overall run time by about 66%.
Illustrative example
[Figure: bar chart "Comparison of Concord Implementations" showing time in seconds (0–800) for the full run and for component runs 1–5]
Algorithm 1: CONCORD pseudocode

Input: Compute the sample covariance matrix S
Input: Fix maximum number of iterations: r_max
Input: Fix initial estimate: $\hat\Omega^{(0)}$
Input: Fix convergence threshold: ε
Set r ← 1
Set converged = FALSE
repeat
    $\hat\Omega^{\mathrm{old}} \leftarrow \hat\Omega^{\mathrm{current}}$
    updates to partial covariances ω_ij:
    for i = 1, ..., p − 1 do
        for j = i + 1, ..., p do
            $$\hat\omega_{ij}^{\mathrm{current}} = \frac{S_{\lambda/n}\!\left( -\Big( \sum_{j' \neq j} \hat\omega_{ij'}^{\mathrm{current}} s_{j'j} + \sum_{i' \neq i} \hat\omega_{i'j}^{\mathrm{current}} s_{i'i} \Big) \right)}{s_{ii} + s_{jj}} \qquad (1)$$
            where $S_\lambda(x) := \mathrm{sign}(x)(|x| - \lambda)_+$
        end for
    end for
    updates to partial variances ω_ii:
    for i = 1, ..., p do
        $$\hat\omega_{ii}^{\mathrm{current}} = \frac{ -\sum_{k \neq i} s_{ik} \hat\omega_{ki}^{\mathrm{current}} + \sqrt{\Big( \sum_{k \neq i} s_{ik} \hat\omega_{ki}^{\mathrm{current}} \Big)^2 + 4 s_{ii}} }{2 s_{ii}} \qquad (2)$$
    end for
    convergence checking:
    if $\|\hat\Omega^{\mathrm{old}} - \hat\Omega^{\mathrm{current}}\|_{\max} < \varepsilon$ then
        converged = TRUE
    else
        r ← r + 1
    end if
until converged = TRUE or r > r_max
return $\hat\Omega^{(r)}$
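For readers who want a runnable reference, here is a compact NumPy translation of Algorithm 1 (an illustrative sketch following the pseudocode above, not the ECL production code; it assumes the data have already been reduced to the sample covariance matrix S):

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def concord(S, n, lam, max_iter=100, tol=1e-5):
    """Cyclic coordinate-wise CONCORD updates following Algorithm 1.

    S: p x p sample covariance matrix, n: sample size, lam: penalty lambda."""
    p = S.shape[0]
    Omega = np.eye(p)                          # initial estimate Omega_hat^(0)
    for _ in range(max_iter):
        Omega_old = Omega.copy()
        # Updates to partial covariances omega_ij, equation (1)
        for i in range(p - 1):
            for j in range(i + 1, p):
                num = -(Omega[i, :] @ S[:, j] - Omega[i, j] * S[j, j]
                        + S[i, :] @ Omega[:, j] - S[i, i] * Omega[i, j])
                Omega[i, j] = Omega[j, i] = (
                    soft_threshold(num, lam / n) / (S[i, i] + S[j, j]))
        # Updates to partial variances omega_ii, equation (2)
        for i in range(p):
            b = S[i, :] @ Omega[:, i] - S[i, i] * Omega[i, i]
            Omega[i, i] = (-b + np.sqrt(b ** 2 + 4.0 * S[i, i])) / (2.0 * S[i, i])
        # Convergence check in the elementwise max norm
        if np.max(np.abs(Omega_old - Omega)) < tol:
            break
    return Omega

# Example usage: Omega_hat = concord(np.cov(Y, rowvar=False), n=Y.shape[0], lam=0.5)
```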
ECL implementation of CONCORD
• CONCORD has been implemented in ECL as part of the machine learning library.
• If n > p, use ML.PopulationEstimate.ConcordV1. If n < p, use ML.PopulationEstimate.ConcordV2. Or simply use ML.PopulationEstimate.InverseCovariance.
• Example call: ML.PopulationEstimate.ConcordV2(Y:=data, lambda:=10, maxiter:=100, tol:=0.00001)
• Help/documentation is available at https://concordinecl.wordpress.com/guide-to-using-concord-in-ecl/