18. Data Analysis and Knowledge
Management using BioXM
MeDALL - AirPROM - Synergy-COPD
EISBM Workshop 13-15.6.12
Dr. Dieter Maier
Biomax Informatics AG
www.biomax.com
19. Biomax – Connecting unrelated
information for efficient decision
support
Biomax Vision
• Master scientific complexity
• Ensure ease of use
• Increase speed of development
• Reduce cost and time
Biomax Profile
• Headquartered in Martinsried, Germany
• In business for more than 12 years
• Worldwide customer base
• Enable centers of excellence for personalized medicine
• Support for Systems Biology
BioXM is a configurable knowledge management platform to flexibly interconnect
isolated silos of information in biomedical research
20. Why “Knowledge Management”?
Knowledge: “the realisation and
understanding of patterns and their
implications existing in information”
Need to mine information for
patterns
A pattern often only emerges when
information from different silos is
combined
e.g. Expression with gene function,
SNPs with clinical history of
patients, ...
Need semantically integrated
information
e.g. information about identical or
“equivalent” objects and “meaning”
becomes integrated
This requires
a framework for integration
methods to find “equivalent”
“meaning”
21. Knowledge Management aspects
• Data integration
• Knowledge representation
• Knowledge extraction
• Collaboration and project management
• Multivariate data analysis
26. Working with semantic networks
• Connected data,
meta-data and
knowledge
• Query, view, report
• Integrate with
analysis
27. Knowledge Network Representation
Dynamic network representation in BioXM
Each node or edge of the network may serve
as entry point for further exploration!
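The idea that any node or edge can serve as an entry point can be illustrated with a small typed graph. This is a minimal sketch and not BioXM's data model or API: the class `SemanticNetwork`, the relation names and the example objects (`GENE_A`, `P001`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy typed graph: nodes are (type, name) pairs, edges carry a relation label."""

    def __init__(self):
        self._edges = defaultdict(set)

    def connect(self, source, relation, target):
        # Store the edge in both directions so either end can be an entry point.
        self._edges[source].add((relation, target))
        self._edges[target].add((relation, source))

    def neighbours(self, node):
        """Return the (relation, node) pairs directly attached to a node."""
        return sorted(self._edges.get(node, ()))

    def explore(self, start, max_depth):
        """Breadth-first walk: every node within max_depth edges of start."""
        seen = {start}
        frontier = [start]
        for _ in range(max_depth):
            next_frontier = []
            for node in frontier:
                for _relation, other in self._edges.get(node, ()):
                    if other not in seen:
                        seen.add(other)
                        next_frontier.append(other)
            frontier = next_frontier
        return seen


# Hypothetical example content
net = SemanticNetwork()
net.connect(("Gene", "GENE_A"), "annotated_with", ("Pathway", "Glycolysis"))
net.connect(("Gene", "GENE_A"), "associated_with", ("Disease", "COPD"))
net.connect(("Patient", "P001"), "diagnosed_with", ("Disease", "COPD"))

# Start from a patient and walk two edges into the network.
reachable = net.explore(("Patient", "P001"), max_depth=2)
```

Starting from the patient, two steps reach the disease and the associated gene; the pathway would need a third step. Starting from the pathway instead works the same way, which is the point of the slide.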
29. Concept - Agile Solution Building
Step 1: Specification
• Designing the data model
Define the domain-specific data model
Step 2: Implementation
• Importing information
Instantiate the knowledge network with data and information from external
resources
Step 3: Use
• Query building and information retrieval
Query the knowledge network, explore the graph and report query results
30. Building Blocks
• Experiment repository
• Text mining
• Graphs
• Public databases
• R statistics
• Network search
32. Solution deployment
Step 1: Specification
Designing the data model
Step 2: Implementation
Importing information
Step 3: Use
Query building and information retrieval
Step 4: WebApps for Information, Retrieval, Reporting and Annotation
Web applications framework fueled by BioXM for quick access
64. Airway Disease PRedicting Outcomes through Patient Specific
Computational Modelling (AirPROM)
- 60 months, 1.3.11-28.2.16
- Partners: 34
- call: ICT-2010.5.3 VPH call
Image analysis and omics based computational models of the airways to
unravel the pathophysiological mechanisms in asthma and COPD
70. AirPROM automated data flows
[Figure: automated data flows between the AirPROM work packages]
WP1: clinical data (CT)
WP2: omics data
WP3: micro scale
WP4: computational tools
WP5: macro scale, large airway
WP6: macro scale, small airway
WP7: KM
WP8: patient specific multi scale model
Flow labels: morphology, patient anatomy; model, simulation result; inform,
constrain; model validate
71. AirPROM Knowledge
• Collaboration network (partners, tasks, data/models)
• Semantic descriptions of the I/O parameters of 8 computational models
Cell model
Tissue model
Perfusion model …
• AirPROM clinical data
15 control, 57 asthma
Anthropometrics, Spirometry
• Link to image data
• Full text document search
• Public knowledge
Gene function (EntrezGene, UniProt, MGI)
Gene - disease association (OMIM, CTD, PubMed)
Gene - compound association (CTD, PubChem, PubMed)
Pathways (KEGG, Reactome)
Protein-protein interactions (MINT, DIP, IntAct)
~100 data sources
Network of ~2 million connections
• Omics data
72. AirPROM KM tasks
• Ensure data flow: provide a secure federated data retrieval, exchange, processing
and warehousing infrastructure
• Semantically integrate the clinical, biobanking, physiological, genetic, experimental
and imaging data
• Enable data analysis by providing data matrices and integration with algorithms and
tools for network inference
• Formats to support e.g. ANSYS, ISA-TAB, CGNS, MAGE, SBML, CDISC
• Ontologies to support SNOMED, FMA anatomy ontology, Bio-Physical Ontology
• MIBBI meta-information definitions
• Expected data volume: low terabyte range
74. Simple data mapping
Example: Tabular Data Import
• Data to be imported (e.g. from an Excel spreadsheet)
• Define import script or select existing script
• Set rules for import
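The rule-based tabular import can be sketched as follows. This is an assumption-laden illustration, not the BioXM import script format: it reads a CSV export instead of an Excel file, and the rule table `IMPORT_RULES`, the column names and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical import rules: source column -> (target attribute, converter).
IMPORT_RULES = {
    "Patient ID": ("patient_id", str),
    "Age (years)": ("age", int),
    "FEV1 (L)": ("fev1_litres", float),
}


def import_table(text, rules):
    """Apply the column rules to every row of a comma-separated table."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {}
        for column, (attribute, convert) in rules.items():
            value = row[column].strip()
            # Empty cells become None instead of failing the type conversion.
            record[attribute] = convert(value) if value else None
        records.append(record)
    return records


# Hypothetical data to be imported (e.g. exported from a spreadsheet).
SAMPLE = "Patient ID,Age (years),FEV1 (L)\nP001,54,2.31\nP002,,1.87\n"
records = import_table(SAMPLE, IMPORT_RULES)
```

Keeping the rules in a table separate from the reading loop mirrors the slide's "define import script or select existing script": the same rule set can be re-applied to the next delivery of the same spreadsheet.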
75. Study cohort variable
harmonisation
Aim:
To provide a template to facilitate
harmonization between pre-existing
cohorts and support the design of
emerging ones.
[Figure: Data Banks 1 to 5 feeding one common pool]
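A harmonisation template of this kind can be pictured as a table that maps each bank's local variable onto a common variable, with a conversion where units or codings differ. The sketch below is only an illustration under assumptions: the variable names, bank names and codings are hypothetical and do not come from the cohorts discussed here.

```python
# Hypothetical template: common variable -> per-bank (local name, converter).
TEMPLATE = {
    "height_cm": {
        "bank1": ("height_cm", lambda v: float(v)),
        "bank2": ("height_m", lambda v: v * 100),
    },
    "smoker": {
        "bank1": ("smoking", lambda v: v == "yes"),
        "bank2": ("smoker_flag", lambda v: v == 1),
    },
}


def harmonise(bank, record, template=TEMPLATE):
    """Translate one bank-specific record into the common pool format."""
    pooled = {"source_bank": bank}
    for common_name, per_bank in template.items():
        if bank not in per_bank:
            # The bank does not collect this variable at all.
            pooled[common_name] = None
            continue
        local_name, convert = per_bank[bank]
        value = record.get(local_name)
        pooled[common_name] = None if value is None else convert(value)
    return pooled


# Two hypothetical records from different banks end up in one common pool.
pool = [
    harmonise("bank1", {"height_cm": 172, "smoking": "no"}),
    harmonise("bank2", {"height_m": 1.75, "smoker_flag": 1}),
]
```

A new cohort is added by extending the template, not by changing the pooled data, which is what makes the template useful for designing emerging cohorts as well.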
77. Data Schema
• Factor analysis carried out on BTS severe
asthma data set to determine underlying
structure/characteristics of the dataset
• The underlying structure/factors were then
used to inform the domains and themes used to
order the data.
84. High-performance storage system
Tera- to Petabyte data storage for image and image analysis data
• AN1-PZ1.storage.pionier.net.pl
• Certificate based
• access with e.g. sFTP, SSHFS, GridFTP, WebDAV
• CT-image data for 35 subjects
• Initial image analysis data
97. Synergy-COPD
“Modelling and simulation environment for systems medicine
(Chronic obstructive pulmonary disease -COPD- as a use case)”
- Start: 1.2.11
- Duration: 3 years
- Partners: 9
- call: ICT-2010.5.3 VPH call
- see: www.Synergy-COPD.org
Integration of models at metabolic (muscle TCA, respiratory chain, ATP diffusion),
cellular (immune system) and organ (lung biophysics, gas diffusion, blood flow) level
Clinical decision support
Software with translation into clinical practice
99. SYNERGY, KM tasks
Clinical data from BioBridge, PAC-COPD + ECLIPSE
Experimental methods:
– Phenotypes: (respiratory symptoms (wheezing, asthma), rhinitis, dermatitis, IgE? to
common inhaled allergens, and their longitudinal changes)
– transcriptome
– proteomics (targeted)
– metabolomics
Data matrices for, and integration with, algorithms and tools for network inference
Integration of models (SBML, CellML)
100. Knowledge model
for semantic mapping
Aims:
Find model and experimental data parameters which
are similarly described
Use experimental data to validate theoretical models
Connect Models which share similar Model
Parameters
101. Model and data parameter
concept
Model parameter
• instantiates a certain parameter in a model
• has ontological description
Data parameter
• instantiates a certain parameter in the Life Science World
• occurs as descriptor or measurable in experimental or anthropometric data
• has ontological description
102. Parameter semantic annotation
Generic
• general parameter information, true in any context
• assigned to parameter only
• e.g. semantic description
Context specific
• context specific parameter information, true for a given model/study only
• shared assignment to parameter + model/study
• e.g. unit, Input/Output
103. Parameter semantic annotation
Molecular level model (SBML import)
MIRIAM mapping based reference entity association
check for identical MIRIAM mapping before re-using existing
model parameter by name
create model specific name “parameter_model” if non-identical
Supra-molecular level model (manual
parameter generation)
create semantic annotation based on search result for free text -
ontology mapping
search for existing parameter with same semantics i.e. on-the-fly
network similarity search between search result + Model
parameter context
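The re-use rule for molecular level models (re-use a parameter by name only if its MIRIAM mapping is identical, otherwise create a model specific name) can be sketched as below. This is a minimal sketch under assumptions: the registry is a plain dictionary, the function `register_parameter` is hypothetical, and the MIRIAM URIs are placeholders, not real identifiers.

```python
def register_parameter(registry, name, miriam_uris, model_id):
    """Return the registry key under which an imported parameter is stored.

    registry maps parameter name -> frozenset of MIRIAM URIs (assumed layout).
    """
    annotation = frozenset(miriam_uris)
    existing = registry.get(name)
    if existing is None:
        # First time this name is seen: create the parameter.
        registry[name] = annotation
        return name
    if existing == annotation:
        # Identical MIRIAM mapping: safe to re-use the existing parameter.
        return name
    # Same name but different semantics: create "parameter_model".
    specific_name = f"{name}_{model_id}"
    registry[specific_name] = annotation
    return specific_name


# Placeholder URIs for illustration only.
URI_A = "urn:miriam:example:A"
URI_B = "urn:miriam:example:B"

registry = {}
key1 = register_parameter(registry, "ATP", [URI_A], "model1")
key2 = register_parameter(registry, "ATP", [URI_A], "model2")
key3 = register_parameter(registry, "ATP", [URI_B], "model3")
```

Here the second model re-uses `ATP`, while the third model gets its own `ATP_model3` because its mapping differs.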
105. Mapping concept
Use experimental data to validate theoretical models
Connect Element:Model Parameter:Instances
with
Element:Parameter:Instances
Connect models which share similar Model
Parameters
Connect Element:Model Parameter:Instances
with
Element:Model Parameter:Instances
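One way to picture the mapping is to connect two parameters whenever their ontological descriptions overlap sufficiently. The sketch below uses Jaccard overlap of ontology term sets with a fixed threshold; both are assumptions made for illustration and not the network similarity search used in BioXM. All parameter names and terms are hypothetical.

```python
from itertools import product


def jaccard(terms_a, terms_b):
    """Overlap of two term sets: |intersection| / |union| (0.0 if both empty)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def connect_parameters(left, right, threshold=0.5):
    """Link every left/right parameter pair whose descriptions overlap enough."""
    links = []
    for (l_name, l_terms), (r_name, r_terms) in product(left.items(), right.items()):
        score = jaccard(l_terms, r_terms)
        if score >= threshold:
            links.append((l_name, r_name, score))
    return sorted(links)


# Hypothetical ontological descriptions.
model_parameters = {
    "muscle_model.O2_uptake": {"oxygen", "consumption", "skeletal muscle"},
}
data_parameters = {
    "VO2_peak": {"oxygen", "consumption", "exercise"},
    "FEV1": {"forced expiratory volume", "lung"},
}

links = connect_parameters(model_parameters, data_parameters)
```

Passing model parameters on both sides gives the second use on the slide, connecting models which share similar model parameters, although self-matches would then have to be filtered out.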
115. Interpreting semantically integrated
information
generate new probabilistic networks from the KB
explore the connection between probabilistic network(s) and deterministic
models based on concepts (genes, physiology) with direct but especially indirect
connections e.g. via Pathways, PPI, ..
explore the connections between data analysis results and nitroso-redox related
knowledge
116. Concept
For details see WP4 & 5 presentations
[Figure: panels labelled Clinical data; Data clinical/experimental; Deterministic
models (Glycolysis, TCA cycle, Electron chain, O2 transport, ATP diffusion,
mechanic work); COPD knowledge base; Selection of hubs; COPD KB network search;
Probabilistic network; Physiological measurements; Resulting connecting network]
120. Searching connecting networks by PPI
Glycolysis 11 model proteins, 27 candidates
- PPI with good experimental evidence (78 456)
- 428 protein net, 7 Glycolysis model - 10 candidates
- PPI Two Hybrid (3 743), 55 protein net
- all PPI (>1.5 million), 757 protein net
Electron Chain 18 model proteins, 23 candidates, 193 protein net
TCA cycle 16 model proteins, 21 candidates, 199 protein net
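A connecting network between model proteins and candidates can be illustrated with a breadth-first search over protein-protein interactions. This is a sketch of the general idea, not the search implemented in BioXM: it assumes an unweighted interaction list, keeps one shortest path per model protein, and uses hypothetical protein names.

```python
from collections import deque


def shortest_path(adjacency, start, goals):
    """Breadth-first search: first path from start to any goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in goals and node != start:
            return path
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def connecting_network(interactions, model_proteins, candidates):
    """Union of the proteins on one shortest path per model protein."""
    adjacency = {}
    for a, b in interactions:
        # Interactions are treated as undirected.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    network = set()
    for protein in model_proteins:
        path = shortest_path(adjacency, protein, set(candidates))
        if path:
            network.update(path)
    return network


# Hypothetical interaction set: M* are model proteins, C* are candidates.
PPI = [("M1", "X"), ("X", "C1"), ("M2", "C2"), ("M3", "Y")]
ppi_net = connecting_network(PPI, ["M1", "M2", "M3"], ["C1", "C2"])
```

Restricting `PPI` to a subset of interactions, for example only those with good experimental evidence, changes the size of the resulting net, which is the effect the figures on this slide compare.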
122. Summary
• Flexible knowledge modelling
• Different levels of access
• Exchange within, between and outside of projects
• Knowledge network “background” for data
analysis and mining