Comprehensive understanding of gene regulatory
networks (GRNs) is a major challenge in systems biology. Most
methods for modeling and inferring the dynamics of GRNs,
such as those based on state space models, vector autoregressive
models and the G1DBN algorithm, assume linear dependencies
among genes. However, this strong assumption does not
faithfully represent the time-course relationships among
genes, which are inherently nonlinear. Nonlinear modeling
methods such as the S-systems and causal structure
identification (CSI) have been proposed, but are known to be
statistically inefficient and analytically intractable in high
dimensions. To overcome these limitations, we propose an
optimized ensemble approach based on support vector
regression (SVR) and dynamic Bayesian networks (DBNs). The
method, called SVR-DBN, uses nonlinear kernels of the SVR to
infer the temporal relationships among genes within the DBN
framework. The two-stage ensemble is further improved by
SVR parameter optimization using Particle Swarm
Optimization. Results on eight in silico-generated datasets and
two real-world datasets, of Drosophila melanogaster and
Escherichia coli, show that our method outperformed the
G1DBN algorithm by a total average accuracy of 12%. We
further applied our method to model the time-course
relationships of ovarian carcinoma. From our results, four hub
genes were discovered. Stratified analysis further showed that
the expression levels of the Prostate differentiation factor and BTG
family member 2 genes were significantly increased by the
platinum drugs cisplatin and oxaliplatin, while the expression levels
of the Polo-like kinase and Cyclin B1 genes were both decreased by
the platinum drugs. These hub genes might be potential
biomarkers for ovarian carcinoma.
Inference of Nonlinear Gene Regulatory Networks through Optimized Ensemble of Support Vector Regression and Dynamic Bayesian Networks
1. Arinze Akutekwe
arinze.akutekwe@northumbria.ac.uk
Improved Inference of Gene Regulatory
Networks via Optimized Ensemble of
Nonlinear Predictors and Dynamic Bayesian
Networks
Bio-Health Informatics Research Group, Department of
Computer Science and Digital Technologies,
University of Northumbria at Newcastle, United Kingdom
IEEE Engineering in Medicine and Biology
Conference, Milano
29th August 2015
Arinze Akutekwe and Huseyin Seker
2. Problem Description
Aims of the study
Methodology
Support Vector Regression
Dynamic Bayesian Network
Proposed Algorithm
Results and Discussion
Further Application (Ovarian Carcinoma)
Conclusions
Further work
Questions
Outline
4. Estimation of GRNs from high-throughput time-course
data is a major challenge.
“Transposable elements (jumping genes), composing
about half of the human genome, are reverse
transcribed.” Wendy Ashlock
Reverse engineering of GRNs is therefore important.
The problem is further compounded by the p >> n curse of
dimensionality (far more genes p than samples n).
Accurate and statistically efficient methods are needed.
Results are important for disease treatment, which
relies on identifying the dynamics of genes.
Applications: personalized medicine, drug discovery.
Problem Description
6. To develop an improved method of modeling
the dynamics of GRNs based on Support Vector
Regression and Dynamic Bayesian Networks.
To address the modeling inefficiencies caused by the
linearity assumption of existing methods
(G1DBN, VAR models, etc.).
To compare the developed method with those in the
literature.
To explore its relevance on both simulated and
real-world time-course datasets.
Aims of the study
8. Support Vector Regression (SVR)
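For training data (x_1, y_1), …, (x_n, y_n), the goal of the SVR is to
find a function f(x) that has at most ε deviation from the
actually obtained targets y_i. This can be represented as:
\[
\min_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)
\]
subject to
\[
y_i - (w^T x_i + b) \le \varepsilon + \xi_i, \qquad
(w^T x_i + b) - y_i \le \varepsilon + \xi_i^*, \qquad
\xi_i,\ \xi_i^* \ge 0 .
\]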
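As a small worked illustration of the SVR objective, the sketch below evaluates it for a fixed linear function f(x) = w^T x + b. The slack variables are computed as the amounts by which each target falls outside the ε-tube. The function name, the toy data and the parameter values are assumptions made for illustration; they are not taken from the slides.

```python
def svr_primal_objective(w, b, X, y, C=1.0, epsilon=0.1):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi_i + xi_i*) for a linear f(x) = w^T x + b."""
    total_slack = 0.0
    for x_i, y_i in zip(X, y):
        f_i = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        xi = max(0.0, y_i - f_i - epsilon)       # target lies above the epsilon-tube
        xi_star = max(0.0, f_i - y_i - epsilon)  # target lies below the epsilon-tube
        total_slack += xi + xi_star
    return 0.5 * sum(w_k * w_k for w_k in w) + C * total_slack


# Toy data (hypothetical): the first two points lie inside the tube,
# the third is 1.0 above the fit, so its slack is 1.0 - epsilon = 0.9.
X = [[0.0], [1.0], [2.0]]
y = [0.0, 1.0, 3.0]
objective = svr_primal_objective(w=[1.0], b=0.0, X=X, y=y, C=10.0, epsilon=0.1)
print(objective)
```

Only points outside the tube contribute to the penalty term, which is why C controls the trade-off between a flat function and tolerance to deviations larger than ε.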
10. Dynamic Bayesian networks (DBNs) are Bayesian networks
over time series that represent temporal dependencies
among variables.
Microarrays measure the simultaneous expression of thousands of genes.
The DBN models the likelihood of an edge $(X_i^{t-1}, X_j^t)$ by measuring the
conditional dependence between the variables $X_i^{t-1}$ and $X_j^t$ given $X_k^{t-1}$.
Modeling Assumptions:
1st order Markov process: the future state
is independent of the past given the present.
Any variable at time t depends on the past
variables only through the variables observed at the
previous time t-1.
Dynamic Bayesian Network
11. A DBN can model biological motifs (cycles) without
graphical cycles, because edges only run from time t-1 to
time t; it therefore still retains the Bayesian
Network framework.
Dynamic Bayesian Network
(Lèbre 2009; Friedman et al. 1998; Murphy and Mian 1999; Opgen-Rhein and Strimmer 2007)
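Under the first-order Markov assumption, a time course can be rearranged into pairs of (expression at t-1, expression at t), and each gene at time t is then modeled from the previous time point only. The sketch below shows that rearrangement. The function name and the toy series are hypothetical and serve only to illustrate the data layout.

```python
def lagged_pairs(series):
    """Turn {gene: [values over time]} into (state at t-1, state at t) pairs.

    Returns the gene order used and one pair per transition t-1 -> t.
    """
    genes = sorted(series)
    n_time = len(series[genes[0]])
    pairs = []
    for t in range(1, n_time):
        previous = [series[g][t - 1] for g in genes]  # predictors X^{t-1}
        current = [series[g][t] for g in genes]       # responses X^{t}
        pairs.append((previous, current))
    return genes, pairs


# Hypothetical toy time course: two genes, four time points.
toy_series = {"geneA": [0.1, 0.4, 0.9, 0.7], "geneB": [1.0, 0.8, 0.3, 0.2]}
genes, pairs = lagged_pairs(toy_series)
print(genes, pairs[0])
```

Because every predictor is at t-1 and every response at t, a feedback loop such as geneA -> geneB -> geneA unrolls over time and never forms a cycle in the graph.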
13. Involves the use of the RBF kernel of SVR within a DBN
framework.
Improves on existing DBN modeling methods, in which linear
models (based on the assumption of linearity) have been used
to infer the temporal relationships.
The two-step ensemble approach is further improved by
optimizing the SVR hyperparameters C and γ.
Proposed improved algorithm
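For reference, the RBF kernel mentioned above has the standard form K(x, z) = exp(-γ ||x - z||^2), where γ is one of the two tuned hyperparameters. The sketch below computes a kernel matrix for a few expression vectors. The helper names and the toy vectors are assumptions for illustration and are not part of the presented algorithm.

```python
import math


def rbf_kernel(x, z, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def kernel_matrix(X, gamma):
    """Pairwise kernel values for a list of vectors."""
    return [[rbf_kernel(x, z, gamma) for z in X] for x in X]


# Hypothetical expression vectors at three time points.
X = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
K = kernel_matrix(X, gamma=0.5)
print(K[0][1])
```

A larger γ makes the kernel decay faster with distance, so the fitted function becomes more local; this is one reason γ is tuned together with C.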
15. The algorithm was implemented in the R language and its
effectiveness tested on standard simulated and
real gene expression datasets with known ground truths.
The simulated data are the Dialogue for Reverse
Engineering Assessments and Methods (DREAM)
datasets. Eight different kinds were used.
The real datasets are the life cycle of Drosophila
melanogaster and the SOS DNA repair network of
Escherichia coli.
CLPSO (comprehensive learning particle swarm optimizer) was
used to tune the hyperparameters of the SVR
model of Algorithm 1.
Results and Discussions
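To give a feel for swarm-based hyperparameter tuning, the sketch below implements a plain global-best particle swarm optimizer. This is the basic PSO variant, not CLPSO, and the objective is a made-up smooth function standing in for a validation error over (log C, log γ). All names, bounds and coefficient values are assumptions for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.729, c1=1.494, c2=1.494, seed=0):
    """Basic global-best PSO (not CLPSO) over box-constrained parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical stand-in for a cross-validated SVR error surface."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = pso_minimize(
    toy_validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best_params, best_error)
```

In a real setting the toy objective would be replaced by the prediction error of an SVR trained with the candidate C and γ, which makes each evaluation far more expensive than here.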
16. The performance of the two methods on DREAM datasets.

Dataset       G1DBN              Optimized SVR-DBN
              AUPR     AUROC     AUPR     AUROC
DREAM4_1      0.1648   0.5537    0.2287   0.6416
DREAM4_2      0.1760   0.5476    0.2303   0.6607
DREAM4_3      0.1402   0.5153    0.1663   0.5867
DREAM4_4      0.1431   0.5561    0.2607   0.7126
DREAM4_5      0.1333   0.5483    0.2175   0.6922
DREAM3_10     0.1955   0.4862    0.3323   0.6925
DREAM3_50     0.0555   0.4831    0.0856   0.5684
DREAM3_100    0.0355   0.5353    0.0402   0.5525

Precision = TP/(TP+FP), Recall = TP/(TP+FN)
TPRate = TP/(TP+FN), FPRate = FP/(FP+TN)
The table shows that the optimized SVR-DBN method
consistently outperformed G1DBN at modeling
nonlinear relationships between genes.
Comparison of Results
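The four quantities defined above can be computed directly from a predicted edge set and a ground-truth edge set. AUPR and AUROC are then obtained by sweeping a score threshold and integrating the resulting curves. The sketch below shows the single-threshold case. The function name and the three-gene example are hypothetical.

```python
def edge_metrics(predicted, truth, candidates):
    """Precision, recall (= TP rate) and FP rate for a set of predicted edges."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(candidates - predicted - truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall,
            "tp_rate": recall, "fp_rate": fp_rate}


# Hypothetical three-gene network; every ordered pair (including
# self-loops) is a candidate edge from time t-1 to time t.
gene_names = ["A", "B", "C"]
candidates = {(a, b) for a in gene_names for b in gene_names}
truth = {("A", "B"), ("B", "C"), ("C", "A")}
predicted = {("A", "B"), ("B", "C"), ("A", "C")}
metrics = edge_metrics(predicted, truth, candidates)
print(metrics)
```

True networks are sparse, so negatives far outnumber positives; this is why precision-based AUPR values tend to be much lower than AUROC values on the same predictions.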
17. Simulated data are generated using mathematical
models such as Ordinary Differential Equations
(ODEs) or Stochastic Differential Equations (SDEs).
However, single time series data with 21 time
points (sequence of 0-1000 by 50) such as the
DREAM4 dataset, are not very common in
practice.
Real datasets are further used to test the
efficiency of the algorithm.
Results on real dataset
20. Time-course A2780 human ovarian carcinoma
cell data (GSE8057) were downloaded and
analyzed.
The dataset contains 12625 genes and 51 time-series
arrays of Affymetrix HG-U95Av2 GeneChips.
It consists of gene expression profiles obtained
prior to treatment and at five time points up
to 24 h after treatment with IC90 growth-inhibitory
concentrations of the platinum drugs cisplatin and
oxaliplatin.
Further Application (Ovarian Carcinoma)
21. Further Application (Ovarian Carcinoma)
[Figure: inferred network. Node labels (probesets): 39742_at, 2031_s_at, 33334_at, 38287_at, 40117_at, 36079_at, 37842_at, 37228_at, 1536_at, 527_at, 40619_at, 36634_at, 1173_g_at, 1890_at, 39708_at, 41191_at, 1945_at, 33322_i_at, 39633_at]
34 genes were selected based on three main criteria:
genes whose expression was increased or decreased
by both the oxaliplatin and cisplatin platinum drugs.
The result shows temporal relationships across 19 of the
34 genes, with a total of 42 arcs.
Four highly regulated genes were discovered, one of
which has a significant self-loop.
22. From the results, the expression levels of the Prostate
differentiation factor and BTG family, member 2 genes
are among those increased by the platinum drugs,
while the expression levels of Polo-like kinase and Cyclin
B1 are both decreased by the platinum drugs.
These genes might therefore be potential drug targets
for ovarian cancer.
Results

Probeset    Drug category    Gene symbol   Gene title
1890_at     both increase    PLAB          Prostate differentiation factor
36634_at    both increase    BTG2          BTG family, member 2
37228_at    both decrease    PLK           Polo-like kinase (Drosophila)
1945_at     both decrease    CCNB1         Cyclin B1

Ongoing verification of interactions from
databases (GeneMANIA etc.).
Inference results.
Further lab-based clinical validation
required.
24. Developed an optimized ensemble SVR-DBN algorithm
for more accurate inference of the nonlinearities
involved in modeling gene expression networks from
time-course data.
Algorithm allows for flexibility and improvement in
prediction accuracy by parameter optimization.
Tested on standard simulated and real gene expression
data with results that outperform existing ones.
Inferred temporal relationships of time-course ovarian
cancer dataset; highly regulated genes were found.
Could be applied for inference on other time-course
gene expression data.
Conclusions
26. More studies will be carried out to improve on
the predictive power of the new algorithm.
Comparison with other time-course inference
algorithms such as ARACNE, CLR and MRNET.
Other population-based optimization algorithms
such as Differential Evolution will be tried.
Results will also be compared with other linear
and nonlinear dynamic prediction methods.
Further work
27. [1] S. Lèbre, "Inferring dynamic genetic networks with low order independencies", Statistical
Applications in Genetics and Molecular Biology, 8(1), pp. 1-38, 2009.
[2] J. Liang, A. Qin, P. Suganthan, S. Baskar, "Comprehensive learning particle swarm optimizer for
global optimization of multimodal functions", IEEE Transactions on Evolutionary Computation,
10(3), pp. 281-295, June 2006.
[3] C.-C. Chang, C.-J. Lin, "LIBSVM: a library for support vector machines", ACM Transactions on
Intelligent Systems and Technology (TIST), 2(3), pp. 1-27, 2011.
[4] Y. Brun, R. Varma, S. Hector, L. Pendyala, R. Tummala, W. Greco, "Simultaneous modeling of
concentration-effect and time-course patterns in gene expression data from microarrays", Cancer
Genomics-Proteomics, 5(1), pp. 43-53, 2008.
[5] J. Kennedy, R. C. Eberhart, "Particle swarm optimization", in Proc. of IEEE International
Conference on Neural Networks, Piscataway, NJ, pp. 1942-1948, 1995.
[6] B. Godsey, "Improved Inference of Gene Regulatory Networks through Integrated Bayesian
Clustering and Dynamic Modeling of Time-Course Expression Data", PLoS ONE, 8(7), e68358, 2013.
[7] T. Hasegawa, R. Yamaguchi, M. Nagasaki, S. Miyano, S. Imoto, "Inference of Gene Regulatory
Networks Incorporating Multi-Source Biological Knowledge via a State Space Model with L1
Regularization", PLoS ONE, 9(8), 2014.
[8] S. Lèbre, "Stochastic process analysis for Genomics and Dynamic Bayesian Networks inference",
PhD dissertation, Université d'Evry-Val d'Essonne, 2007.
References