Francesco Gadaleta, University of Liege - Montefiore Institute
Select and Connect
the benefits of building networks in genetics
Francesco Gadaleta PhD - Montefiore Institute ULg
Francesco Gadaleta, Select and Connect: the benefits of building networks in genetics
OUTLINE
• Genetics and Networks
• Variable selection with penalised regression
• Application to Gene Expression Profiles
• Demo
Given a set of microarray experiments, HOW do we select covariates and build networks?
EXPLAIN PREDICTOR
gene interaction
best SNP selection
phenotype
survival
ASSOCIATION, CORRELATION, CAUSALITY or REGULATION?
[Diagram: nodes A and B]
PENALISED REGRESSION
[Word cloud: least squares, elastic net, ridge regression, fused, group, quadratic programming, hierarchical, LASSO, nonlinear, penalised multivariate linear regression, gradient descent]
OPTIMISATION PROBLEMS
IN MACHINE LEARNING
Def. convex set X
Def. convex function
Optimisation problem
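The formulas for these three items did not survive extraction. As an assumption about what the slide showed, the standard textbook definitions are sketched below; the symbols x, y, θ and f are illustrative choices, not taken from the deck.

```latex
% Convex set X
X \subseteq \mathbb{R}^n \text{ is convex if }
\theta x + (1-\theta)\, y \in X
\quad \forall\, x, y \in X,\ \theta \in [0, 1]

% Convex function
f : X \to \mathbb{R} \text{ is convex if }
f\big(\theta x + (1-\theta)\, y\big) \le \theta f(x) + (1-\theta) f(y)
\quad \forall\, x, y \in X,\ \theta \in [0, 1]

% Optimisation problem
\min_{x \in X} f(x)
```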
Where’s the min?
follow the GRADIENT
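A minimal sketch of what following the gradient means in practice: repeatedly step against the gradient until it vanishes. The function name `gradient_descent`, the step size and the toy objective are all hypothetical choices for illustration, not part of the talk.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimise a differentiable convex function of one variable, given its gradient."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:   # gradient (almost) zero: we are at the minimum
            break
        x = x - step * g   # step against the gradient, i.e. downhill
    return x


# Hypothetical toy objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

Because the toy objective is convex, the point where the gradient vanishes is the global minimum, here x = 3.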
WHY CONVEXITY?
• local minima are also global minima
• fast convergence
• gradient/coordinate descent
Introduce gradients, subgradients and epigraphs … some proofs here
OPTIMISATION METHODS FOR
[objective: a convex AND differentiable term plus a convex term]
GRADIENT DESCENT
• works fine but can be slow
COORDINATE DESCENT
• cycle through each predictor in turn
• compute residuals
PATHWISE COORDINATE DESCENT
• start with large λ (sparse model)
• apply COORDINATE DESCENT
• decrease λ (zero coordinates stay zero!)
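The coordinate descent and pathwise steps above can be sketched as follows. This is an illustrative pure-Python version for the LASSO, assuming the objective (1/2n)·||y − Xβ||² + λ·||β||₁. The function names, the scaling of the loss and the toy data are assumptions, not the implementation used in the talk.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; exactly zero inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, beta=None, n_sweeps=200, tol=1e-10):
    """Coordinate descent for (1/2n) * ||y - X beta||^2 + lam * ||beta||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    # Residuals r = y - X beta, kept up to date after every coordinate update
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):  # cycle through each predictor in turn
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                continue
            # Inner product of predictor j with the partial residual
            rho = sum(X[i][j] * (r[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):  # update the residuals
                    r[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Pathwise coordinate descent: largest lambda first, warm starts afterwards."""
    beta, path = None, []
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_cd(X, y, lam, beta)
        path.append((lam, beta))
    return path


# Hypothetical toy data: the response depends on the first predictor only
X = [[1.0, 0.5], [-1.0, 0.5], [1.0, -0.5], [-1.0, -0.5]]
y = [2.0, -2.0, 2.0, -2.0]
for lam, beta in lasso_path(X, y, [0.5, 3.0]):
    print(lam, beta)
```

With the large penalty both coefficients are zero; with the smaller one only the first predictor enters the model, and the warm start means the second solve begins from the previous solution rather than from scratch.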
[Equation: min of an objective whose annotated parts are: response, gene matrix, covariance matrix (association), penalty (sparsity)]
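The equation itself was lost in extraction. One plausible reading of the annotations, offered as an assumption rather than the author's exact formula, is the LASSO objective with response y, gene matrix X, coefficient vector β and penalty weight λ:

```latex
\hat{\beta}
  = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
  = \arg\min_{\beta} \; \frac{1}{2}\, \beta^\top \Big( \tfrac{1}{n} X^\top X \Big) \beta
      - \tfrac{1}{n}\, y^\top X \beta + \lambda \lVert \beta \rVert_1
```

The second form drops the constant term in y, which does not change the minimiser. When the columns of X are centred, (1/n)·XᵀX is the sample covariance matrix of the genes (the association term), and λ·||β||₁ is the penalty that induces sparsity.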
MULTICOLLINEARITY
[Diagram: response y and predictors B, C, D]
• lack of independence
• presence of interdependence
• least squares regression fails: XᵀX approaches singularity and (XᵀX)⁻¹ explodes
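A small numerical illustration of why least squares breaks down under multicollinearity: as two predictors become nearly identical, the determinant of XᵀX shrinks towards zero and the entries of its inverse blow up. The helper name `largest_inverse_entry`, the toy vectors and the optional `ridge` term are hypothetical choices for this sketch.

```python
def largest_inverse_entry(eps, ridge=0.0):
    """Largest absolute entry of (X'X + ridge * I)^-1 for two near-collinear predictors.

    The second predictor equals the first plus a perturbation of size eps,
    so eps -> 0 means the two columns become collinear.
    """
    x1 = [1.0, 2.0, 3.0, 4.0]
    x2 = [v + eps * s for v, s in zip(x1, (1.0, -1.0, -1.0, 1.0))]
    a = sum(v * v for v in x1) + ridge         # (X'X)[0][0]
    b = sum(u * v for u, v in zip(x1, x2))     # (X'X)[0][1]
    d = sum(v * v for v in x2) + ridge         # (X'X)[1][1]
    det = a * d - b * b                        # approaches 0 as eps -> 0 when ridge == 0
    # The inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det
    return max(abs(a), abs(b), abs(d)) / det


for eps in (1.0, 0.1, 0.001):
    print(eps, largest_inverse_entry(eps), largest_inverse_entry(eps, ridge=1.0))
```

Adding a small ridge term to the diagonal keeps the determinant away from zero, which is one reason penalised regression remains usable where plain least squares does not.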
NETWORK
CONSTRUCTION
matrix of regression coefficients
• symmetric
• not symmetric
[Diagrams: pairs of nodes A and B illustrating the symmetric and not symmetric cases]
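One way to turn a matrix of regression coefficients into a network, sketched under assumptions: row i holds the coefficients obtained by regressing gene i on all the others, a non-zero entry is read as an edge, and an asymmetric matrix can be kept as a directed graph or symmetrised with an OR/AND rule. The function names, the edge orientation and the matrix B are hypothetical, and a non-zero coefficient indicates association, not causality.

```python
def directed_edges(B, tol=1e-12):
    """Edge (j, i) whenever gene j has a non-zero coefficient in the regression of gene i."""
    p = len(B)
    return {(j, i) for i in range(p) for j in range(p)
            if i != j and abs(B[i][j]) > tol}


def undirected_edges(B, rule="or", tol=1e-12):
    """Symmetrise: keep (i, j) if either ("or") or both ("and") coefficients are non-zero."""
    p = len(B)
    edges = set()
    for i in range(p):
        for j in range(i + 1, p):
            ij = abs(B[i][j]) > tol
            ji = abs(B[j][i]) > tol
            keep = (ij or ji) if rule == "or" else (ij and ji)
            if keep:
                edges.add((i, j))
    return edges


# Hypothetical coefficients for three genes (row i: regression of gene i on the others)
B = [
    [0.0, 0.8, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0],
]
print(directed_edges(B))
print(undirected_edges(B, rule="or"))
print(undirected_edges(B, rule="and"))
```

The OR rule gives a denser network, the AND rule a more conservative one; which is preferable depends on how costly false positives are for the analysis at hand.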
NETWORK
VALIDATION
GeneNetWeaver
‣ generates synthetic microarray (μA) data from a regulatory network
‣ several conditions (simulation of μA noise, network perturbations, time series, generation of samples using multifactorial equations)
‣ Gold Standard (GS): directed, unweighted, signed network, based on the transcription factor network (TFN) of E. coli [1]
SYNTHETIC DATA
GENERATION
‣ Used noise-free, multifactorial-based GS networks:
‣ 50 nodes with 3 regulators (TFs)
‣ 200 nodes with 10 regulators (TFs)
SELECT AND CONNECT
IN ACTION
REAL
NETWORK
200 nodes
212 connections
PREDICTED
NETWORK
200 nodes
360 connections
DEGREE CORRELATION
86%
[Figure annotations: false positives, predicted hubs, correctly detected]
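A sketch of how a degree correlation between a real and a predicted network could be computed, assuming it means the Pearson correlation between the per-node degrees of the two networks. The deck does not state which correlation measure was used, and the edge lists below are hypothetical toy data.

```python
from math import sqrt


def degrees(edges, n_nodes):
    """Degree of every node in an undirected network given as a list of edges."""
    deg = [0] * n_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical toy networks on 5 nodes; the prediction has one false positive edge
real = [(0, 1), (0, 2), (0, 3), (3, 4)]
predicted = [(0, 1), (0, 2), (0, 3), (3, 4), (1, 2)]

degree_correlation = pearson(degrees(real, 5), degrees(predicted, 5))
print(degree_correlation)
```

A high value indicates that nodes with many connections in the real network (hubs) also have many connections in the predicted one, even when individual edges are wrong.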
BETWEENNESS
CORRELATION
83%
COMPUTATIONAL COMPLEXITY
• LASSO is quadratic programming (polynomial) :-)
• Time-complexity of iterative convex optimisation is tricky to analyse (it depends on a convergence criterion) :-(
• Coordinate descent requires O(np) operations per full cycle through the predictors :-|
PARALLEL
COMPUTATIONS
(permutation tests)
200 genes, 400 permutations, 5 cpus
100 100 100 100
as fast as 100 permutations
200 genes, 400 permutations, 40 cpus
100 100 100 100 100 100 100 100 100 100 100 100
1:25 25:50 175:200
as fast as 25 genes and 100 permutations
…
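The idea of splitting permutations into equal blocks, one per worker, can be sketched as below. This is an illustrative version only: the test statistic (absolute Pearson correlation), the function names and the toy data are assumptions. Threads are used to keep the sketch portable; for CPU-bound work in CPython, `ProcessPoolExecutor` from the same module would normally be swapped in.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def corr(a, b):
    """Pearson correlation between two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def permutation_block(x, y, n_perm, seed):
    """Count permutations of y whose |correlation| with x reaches the observed one."""
    rng = random.Random(seed)  # one independent, reproducible stream per block
    observed = abs(corr(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(corr(x, y_perm)) >= observed:
            hits += 1
    return hits


def parallel_p_value(x, y, n_perm=400, n_workers=4):
    """Split n_perm permutations into equal blocks, one block per worker."""
    block = n_perm // n_workers
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(permutation_block, x, y, block, seed)
                   for seed in range(n_workers)]
        hits = sum(f.result() for f in futures)
    return (hits + 1) / (block * n_workers + 1)


# Hypothetical toy data: y is an exact linear function of x
x = [float(i) for i in range(10)]
y = [2.0 * v + 1.0 for v in x]
p_value = parallel_p_value(x, y)
print(p_value)
```

With 400 permutations in 4 blocks, each worker handles 100 permutations, so the wall-clock time is roughly that of 100 permutations when the blocks really run in parallel. Splitting the genes into ranges as well adds a second level of the same scheme.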
NETWORK
INTEGRATION
NETWORK
INTEGRATION
how do we connect networks?
how do we deal with diverse datasets?
“Prediction is very difficult, especially about the future.”
– Niels Bohr
genes
www.worldofpiggy.com @worldofpiggy francesco.gadaleta@gmail.com
thank you.
www.worldofpiggy.com @worldofpiggy francesco.gadaleta@gmail.com
