Falk Schreiber - From big data to smart knowledge ‐ integrating multimodal biological data and modelling metabolism

Modern data acquisition methods in the life sciences yield different types of data in increasing quantity, facilitating a comprehensive view of biological systems. Because data is usually gathered and interpreted by separate domain scientists, multi-domain properties and structures are hard to grasp. Consequently, there is a need for the integration, analysis, modelling, simulation, and visualisation of life science data from different sources and of different types.
This talk focuses on two aspects. Firstly, methods for the integration and visualisation of multimodal biological data are presented. They are based on two graphs, one representing the meta-relations between biological data and the other the measurement combinations. Both graphs are linked and serve as different views of the integrated data, with navigation and exploration possibilities. Data can be combined and visualised in many ways, resulting in views of the integrated biological data. Secondly, methods to reconstruct, simulate, and analyse detailed metabolic models are presented. We will focus on stoichiometric models and see how different types of data are used to gain new insights into metabolic processes, illustrated by an example of metabolism in plants.

First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/


  1. Falk Schreiber (Martin Luther University Halle-Wittenberg; Leibniz Institute IPK Gatersleben): From Big Data to Smart Knowledge. Integrating Multimodal Biological Data and Modelling Metabolism. 14/07/2014
  2. Observations
      1. A tidal wave of scientific data
  3. Observations
      1. A tidal wave of scientific data
         Year     Time            Costs (million US$)
         2003     13 years        2700
         2007     a few months    1
         2009     a few weeks     0.05
         2014     a few days      0.001
         ~2017    cheaper to reproduce data than to store it
  4. Observations
      1. A tidal wave of scientific data
      2. From building blocks to complex systems
         genes, transcripts, proteins, metabolites: reductionist approach
  5. Observations
      1. A tidal wave of scientific data
      2. From building blocks to complex systems
         genes, transcripts, proteins, metabolites: reductionist approach, integrative approach
  6. Observations
      1. A tidal wave of scientific data
      2. From building blocks to complex systems
      3. Multi-domain data
  8. From Data to Knowledge – Outline of the Talk
      - Understanding metabolism via modelling
  9. From Data to Knowledge – Outline of the Talk
      - Understanding metabolism via modelling
      - Integrating and exploring multimodal biological data
  10. Metabolism
      - Network of thousands of biochemical reactions
          - Enzyme-catalysed
          - Transporter-mediated
      - Supports all biological activity
      - Metabolic model = list of reactions + associated information
      Sources: http://www.genome.jp/kegg/; Michael 1993
  11. Metabolic Models (figure; axes: size of model, level of detail)
      - Network structure: topological analysis, Petri net (P/T) analysis
      - + stoichiometry, + thermodynamics, + mass balance, + capacity constraints: flux balance analysis (FBA)
      - + kinetic rate laws, + kinetic parameters, + metabolite concentrations, + stochastic rate laws: kinetic modelling, Petri net (SPN) analysis
  13. Flux Balance Analysis
      - Constraint-based stoichiometric modelling approach to predict and analyse the metabolic steady-state conversion rates (fluxes)
      - Advantages
          - No kinetic parameters required
          - Quantitative predictions
          - Applicable to large systems
      - Applications
          - Prediction of optimal metabolic yields and flux distributions
          - Prediction of phenotype/viability of knockout mutants
          - Prediction of pathway redundancies
          - And more
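The approach on slide 13 can be sketched as a small linear programme: steady state means S v = 0 for the stoichiometric matrix S and the flux vector v, capacity constraints are bounds on v, and one flux is maximised. The four-reaction network, its reaction names, and all bounds below are hypothetical and chosen only for illustration; they are not taken from the talk. The sketch assumes NumPy and SciPy are available and is not the tooling used by the author.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not from the talk):
#   R_uptake:  -> A      (capacity: at most 10 flux units)
#   R_convert: A -> B
#   R_biomass: B ->      (flux to be maximised)
#   R_waste:   A ->
reactions = ["R_uptake", "R_convert", "R_biomass", "R_waste"]
S = np.array([
    [1, -1,  0, -1],   # mass balance of metabolite A
    [0,  1, -1,  0],   # mass balance of metabolite B
])
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]


def fba(S, bounds, objective_index):
    """Maximise one flux subject to S v = 0 and the flux bounds."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimises, so negate the objective
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


biomass = reactions.index("R_biomass")
wild_type = fba(S, bounds, biomass)

# In-silico knockout: force the flux through R_convert to zero.
ko_bounds = list(bounds)
ko_bounds[reactions.index("R_convert")] = (0, 0)
knockout = fba(S, ko_bounds, biomass)

print("wild-type biomass flux:", round(wild_type[biomass], 3))
print("knockout biomass flux: ", round(knockout[biomass], 3))
```

In this toy network the knockout removes the only route to B, so the predicted biomass flux drops to zero. No kinetic parameters appear anywhere in the model, which is the first advantage listed on the slide.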
  14. Principles of Flux Balance Analysis
  15. Simulation
      Oxygen level
  16. Objective Function
      How to identify plausible physiological states?
      - Question: What are the biochemical production capabilities?
        Objective: Maximise metabolite product
      - Question: What is the maximal growth rate and biomass yield?
        Objective: Maximise growth rate
      - Question: How efficiently can metabolism channel metabolites through the network?
        Objective: Minimise the Euclidean norm
      - Question: What is the tradeoff between biomass production and metabolite overproduction?
        Objective: Maximise biomass production for a given metabolite production
      - Question: How energetically efficient can metabolism operate?
        Objective: Minimise ATP production or minimise nutrient uptake
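Two of the objectives from slide 16 can be combined in a two-step sketch: first maximise the biomass flux, then pick, among all flux vectors that reach this optimum, the one with the smallest Euclidean norm. The network with two parallel routes is hypothetical. The second step uses a pseudoinverse shortcut that is only valid because the resulting fluxes happen to satisfy the bounds (the code checks this); in general a quadratic programming solver would be needed. NumPy and SciPy are assumed.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network with two parallel routes from A to B:
#   R_uptake: -> A (at most 10), R_route1: A -> B, R_route2: A -> B, R_biomass: B ->
S = np.array([
    [1, -1, -1,  0],   # mass balance of metabolite A
    [0,  1,  1, -1],   # mass balance of metabolite B
])
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]
biomass = 3  # column index of R_biomass

# Step 1: maximise the biomass flux (linprog minimises, hence the minus sign).
c = np.zeros(S.shape[1])
c[biomass] = -1.0
lp = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
max_biomass = lp.x[biomass]

# Step 2: minimum Euclidean norm among all v with S v = 0 and v_biomass = max_biomass.
# The pseudoinverse yields the minimum-norm solution of this linear system,
# ignoring the bounds, so the bounds are verified afterwards.
A = np.vstack([S, np.eye(S.shape[1])[biomass]])
b = np.array([0.0, 0.0, max_biomass])
flux = np.linalg.pinv(A) @ b

lower, upper = np.array(bounds, dtype=float).T
if not (np.all(flux >= lower - 1e-6) and np.all(flux <= upper + 1e-6)):
    raise ValueError("bounds are violated; use a QP solver instead")

print("maximal biomass flux:", round(max_biomass, 3))
print("minimum-norm fluxes: ", flux.round(3))
```

The linear programme alone leaves the split between the two routes undetermined. The norm criterion resolves this by distributing the flux evenly, which is one way to read "channel metabolites efficiently through the network".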
  17. History of FBA
  18. Software Tools and Pipelines for FBA
      - CellNetAnalyzer (CNA) http://www.mpi-magdeburg.mpg.de/projects/cna/cna.html
      - COBRA Toolbox http://gcrg.ucsd.edu/downloads/COBRAToolbox
      - FBA-SimVis http://fbasimvis.ipk-gatersleben.de
      - Thiele et al. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature Protocols, 5(1): 93–121, 2010.
      - Grafahrend-Belau et al. Plant metabolic pathways: databases and pipeline for stoichiometric analysis. In Agrawal and Rakwal (Eds.), Seed development: omics technologies toward improvement of seed quality and crop yield, Springer, 345–366, 2012.
  19. FBA Model of Seed Metabolism in Hordeum vulgare
      Grafahrend-Belau et al. Plant Physiology, 2009
  20. FBA Model of Seed Metabolism in Hordeum vulgare
      Grafahrend-Belau et al. Plant Physiology, 2009
      - Size: 257 reactions, 234 metabolites
      - Pathways: Glyc, TCA, PPP, oxP, Ferm, Rubisco, AA, Starch, CW, and others
  21. Example of Model Application
      - Imaging uncovers metabolic compartmentation
      - Alanine synthesis mainly in central endosperm, alanine gradient reflects the local oxygen state
      - Modelling purpose: elucidate the role of alanine metabolism
      Source of images: L. Borisjuk and H. Rolletschek, IPK
      Melkus et al. Plant Biotechnology Journal, 2011
      Rolletschek et al. Plant Cell, 2011
  22. Simulation of Region-specific Metabolism
      - A: Central endosperm (hypoxic)
      - B: Peripheral endosperm (aerobic)
      Melkus et al. Plant Biotechnology Journal, 2011
      Rolletschek et al. Plant Cell, 2011
  24. Obtaining Parameters
      - Influx
          - Quantification from video data
      - Relation of substances in the same area
          - Multimodal alignment
            Scharfe et al. BMC Bioinformatics, 2010
            Fester et al. GCB, 2009
      - Biomass accumulation
          - Quantification from image series
            Hartmann et al. BMC Bioinformatics, 2011
  25. Scaling up - Multi* and High Throughput Modelling
  26. Coupling of Organ-specific FBA Models
  27. Coupling of FBA and FSA Models
      Müller et al. IEEE PMA, 2012
      Grafahrend-Belau et al. Plant Physiology, 2013
  28. High Throughput Modelling
      - Path2Models: A pipeline to compute draft models
      - >140,000 kinetic, logical and constraint-based models
      Le Novère et al. BMC Systems Biology, 2013
  30. From Data to Knowledge – Outline of the Talk
      - Understanding metabolism via modelling
      - Integrating and exploring multimodal biological data
  31. Multi-domain Biological Data
  32. Data Domains
  33. Available Tools
  34. Data Integration – A Major Problem (Example: Networks)
  35. Data Integration – A Major Problem (Example: Networks)
      - Bridge the abyss!
  36. IDMappers
      - Many information resources can be utilized as IDMappers:
          - Web services, web sites (e.g. PICR, CRONOS, …)
          - Relational databases (e.g. STRING, PDD, …)
          - Flat files (e.g. KEGG, UniProt, …)
        Overview: Mehlhorn et al. TransID – the flexible identifier mapping service 112-121 (Internat. Symp. Integrative Bioinformatics), 2013.
      - Unified using the BridgeDb framework
  37. The Data Linkage Graph
      - Comprises a set of identifiers (nodes) and a set of identifier mappings (edges)
      - Used to explore identifier interconnections
      - Basis of the integration of biological networks
      - Example (figure labels): (TAIR), (UniProt), (EC number)
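The data linkage graph of slide 37 can be sketched as an undirected graph whose nodes are identifiers and whose edges are identifier mappings; exploring interconnections then amounts to searching for a chain of mappings between two identifiers. The identifiers below are placeholders, not real database entries, and the function names `build_linkage_graph` and `mapping_path` are hypothetical, not part of any tool mentioned in the talk.

```python
from collections import defaultdict, deque

# Placeholder mappings between (namespace, identifier) pairs; all values are made up.
mappings = [
    (("TAIR", "gene_1"), ("UniProt", "prot_1")),
    (("TAIR", "gene_2"), ("UniProt", "prot_2")),
    (("UniProt", "prot_1"), ("EC", "ec_1")),
    (("UniProt", "prot_2"), ("EC", "ec_1")),
]


def build_linkage_graph(mappings):
    """Identifiers become nodes, each mapping becomes an undirected edge."""
    graph = defaultdict(set)
    for a, b in mappings:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def mapping_path(graph, start, goal):
    """Breadth-first search for the shortest chain of mappings, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_linkage_graph(mappings)
path = mapping_path(graph, ("TAIR", "gene_1"), ("EC", "ec_1"))
print(" -> ".join(f"{ns}:{ident}" for ns, ident in path))
```

Here the gene identifier reaches the EC number via one intermediate protein identifier. In practice the mappings would come from IDMapper resources such as those listed on slide 36.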
  38. The Integrated Graph
      - Composed of biological networks and the inferred identifier mappings as mapping edges
      - Mapping edges represent identifier connections in the data linkage graph
      - Example (figure panels): data linkage graph, integrated graph
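One possible reading of slide 38, sketched under stated assumptions: the integrated graph keeps the edges of every input network and adds a mapping edge between nodes of different networks whenever their identifiers are connected in the data linkage graph. Connectivity is computed here with a small union-find. All identifiers, the two tiny networks, and the function names are hypothetical; the actual inference rules used by the author may differ.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder identifier mappings (edges of a data linkage graph).
id_mappings = [("tair:gene_1", "uniprot:prot_1"), ("uniprot:prot_1", "ec:ec_1")]

# Two small placeholder networks, given as edge lists.
networks = {
    "regulatory": [("tair:gene_2", "tair:gene_1")],
    "metabolic": [("cpd:met_1", "ec:ec_1"), ("ec:ec_1", "cpd:met_2")],
}


def components(pairs):
    """Map each identifier to a representative of its connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    return {x: find(x) for x in list(parent)}


def integrate(networks, id_mappings):
    """Return labelled edges: original network edges plus inferred mapping edges."""
    label = components(id_mappings)
    edges = [(u, v, name) for name, net in networks.items() for u, v in net]
    members = defaultdict(set)  # linkage component -> {(network, node)}
    for name, net in networks.items():
        for edge in net:
            for node in edge:
                if node in label:
                    members[label[node]].add((name, node))
    for group in members.values():
        for (net_a, a), (net_b, b) in combinations(sorted(group), 2):
            if net_a != net_b:  # only link nodes from different networks
                edges.append((a, b, "mapping"))
    return edges


integrated = integrate(networks, id_mappings)
for edge in integrated:
    print(edge)
```

In this example the gene node of the regulatory network and the enzyme node of the metabolic network end up connected by a single mapping edge, because their identifiers are linked via the intermediate protein identifier.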
  39. Example
      - Metabolic pathways: Glycolysis, Pyruvate metabolism from KEGG
      - Gene regulatory network: Arabidopsis thaliana from Regulogs
  40. Example
  41. Available Tools
  42. Data, Mappings and Mapping Function
      - Set of measurements
      - Mappings with object path functions, which derive the relevant metadata and any set of graph element attributes
      - Basis: IDMappers
      Rohn et al. Bioinformatics, 2011
  43. Example of Integrated Data http://www.vanted.org
      - The ABC(DE)-model of Arabidopsis thaliana floral organ specification
      - Determination of floral organ identity depends on the combinatorial expression of floral homeotic genes from different classes
      - Integration of color-coded images, representing floral homeotic gene expression patterns, into the context of a regulatory network
      Junker et al. Frontiers in Plant Science, 2012.
  44. Standards for Modelling and Simulation in SysBio
  46. Can You Understand This?
  47. Can You Understand This?
      - Stimulates? But ... what exactly?
      - Associates into?
      - Translocates?
      - Reciprocal stimulation?
      - Is degraded?
      - Stimulates gene transcription?
  48. Ambiguity in Conventional Representation
  49. Standardised Symbols are Important
      Image labels: Most English speaking country, Quebec, Iran, China, Israel, Singapore, Norway, Poland, USA and Canada
  50. What is SBGN?
      - A way to unambiguously describe biochemical and cellular events in graphs
      - Limited number of symbols (~30) → smooth learning curve
      - Can graphically represent quantitative models and biochemical pathways at different levels of granularity
      - Developed since 2006 by an interdisciplinary community, part of COMBINE
      - Three languages
          - Process Descriptions → one state = one glyph
          - Entity Relationships → one entity = one glyph
          - Activity Flow → conceptual level
  51. Graph Trinity: Three Languages in One http://sbgn.org
      - Process Description maps: unambiguous, mechanistic, sequential, combinatorial explosion
      - Entity Relationships maps: unambiguous, mechanistic, non-sequential
      - Activity Flow maps: ambiguous, conceptual, sequential
      Le Novère et al. Nature Biotechnology, 2009
  52. Graph Trinity: Three Languages in One
      Process Description, Entity Relationships, Activity Flow
  53. Systems Biology Graphical Notation (SBGN)
  54. Working with SBGN http://www.sbgn-ed.org
      - Verification: Czauderna et al. Bioinformatics, 2010
      - Synthesis / bricks: Junker et al. Trends in Biotechnology, 2012
      - Translation: Czauderna et al. BMC Bioinformatics, 2013
      - Layout: Schreiber et al. BMC Bioinformatics, 2009; Dwyer et al. IEEE Transactions Visualization & Computer Graphics, 2008
      - Data integration: Junker et al. Nature Protocols, 2012
  55. Modelling, Visual Analytics, Standards, Network Analysis
      - Optimise
      - Predict
      - visualise, explore, integrate, analyse, model
      - present, understand
      - simulate, predict
  56. Thank You
      “We now have unprecedented ability to collect data about nature but there is now a crisis developing in biology, in that completely unstructured information does not enhance understanding. We need a framework to put all of this knowledge and data into - that is going to be the problem in biology. […] Driving toward that framework is really the big challenge.” Sydney Brenner
