18. Data Analysis and Knowledge
Management using BioXM
MeDALL - AirPROM - Synergy-COPD
EISBM Workshop 13-15.6.12
Dr. Dieter Maier
Biomax Informatics AG
www.biomax.com
19. Biomax: Connecting unrelated information for efficient decision support
Biomax Vision
• Master scientific complexity
• Ensure ease of use
• Increase speed of development
• Reduce cost and time
Biomax Profile
• Headquartered in Martinsried, Germany
• In business for more than 12 years
• Worldwide customer base
• Enable centers of excellence for personalized medicine
• Support for Systems Biology
BioXM is a configurable knowledge management platform to flexibly
interconnect isolated silos of information in biomedical research
20. Why "Knowledge Management"?
Knowledge: "the realisation and understanding of patterns and their
implications existing in information"
• Need to mine information for patterns
  ▪ A pattern often only emerges when information from different silos is
    combined
  ▪ e.g. expression with gene function, SNPs with clinical history of
    patients, ...
• Need semantically integrated information
  ▪ i.e. information about identical or "equivalent" objects and "meaning"
    becomes integrated
  ▪ requires a framework for integration
  ▪ requires methods to find "equivalent" "meaning"
21. Knowledge Management aspects
• Data integration
• Knowledge representation
• Knowledge extraction
• Collaboration and project management
• Multivariate data analysis
26. Working with semantic networks
• Connected data, meta-data and knowledge
• Query, view, report
• Integrate with analysis
27. Knowledge Network Representation
Dynamic network representation in BioXM
Each node or edge of the network may serve
as entry point for further exploration!
29. Concept - Agile Solution Building
Step 1: Specification
• Designing the data model
  Define the domain-specific data model
Step 2: Implementation
• Importing information
  Instantiate the knowledge network with data and information from
  external resources
Step 3: Use
• Query building and information retrieval
  Query the knowledge network, explore the graph and report query results
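The three steps above can be illustrated with a toy in-memory network. This is a minimal sketch in plain Python and not the BioXM API: the class `KnowledgeNetwork`, the element types, the relation names and the records are all hypothetical.

```python
from collections import defaultdict


class KnowledgeNetwork:
    """Toy semantic network (hypothetical; not the BioXM API)."""

    def __init__(self, allowed_relations):
        # Step 1: the data model lists which
        # (source type, relation, target type) triples are permitted.
        self.allowed = set(allowed_relations)
        self.types = {}                # node name -> element type
        self.edges = defaultdict(set)  # node name -> {(relation, target)}

    def add_node(self, name, element_type):
        self.types[name] = element_type

    def connect(self, source, relation, target):
        triple = (self.types[source], relation, self.types[target])
        if triple not in self.allowed:
            raise ValueError(f"relation not in data model: {triple}")
        self.edges[source].add((relation, target))

    def neighbours(self, source, relation):
        # Step 3: a simple query that follows one relation type.
        return sorted(t for r, t in self.edges[source] if r == relation)


# Step 1: specification of a domain-specific data model.
model = [("Gene", "associated_with", "Disease"),
         ("Gene", "member_of", "Pathway")]
net = KnowledgeNetwork(model)

# Step 2: implementation, i.e. importing records from an external resource
# (made-up placeholder records, for illustration only).
records = [("GENE_A", "Gene", "associated_with", "DISEASE_X", "Disease"),
           ("GENE_A", "Gene", "member_of", "PATHWAY_1", "Pathway"),
           ("GENE_B", "Gene", "associated_with", "DISEASE_X", "Disease")]
for src, src_type, rel, dst, dst_type in records:
    net.add_node(src, src_type)
    net.add_node(dst, dst_type)
    net.connect(src, rel, dst)

# Step 3: use, i.e. query the network.
print(net.neighbours("GENE_A", "associated_with"))
```

Keeping the permitted relations in the data model means an import that does not fit the specification fails early instead of silently adding unmodelled edges.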
30. Building Blocks
• Experiment repository
• Text mining
• Graphs
• Public databases
• R statistics
• Network search
32. Solution deployment
Step 1: Specification
Designing the data model
Step 2: Implementation
Importing information
Step 3: Use
Query building and information retrieval
Step 4: WebApps for Information, Retrieval, Reporting and Annotation
Web applications framework fueled by BioXM for quick access
64. Airway Disease PRedicting Outcomes through Patient Specific
Computational Modelling (AirPROM)
- 50 months, 1.3.11-28.2.16
- Partners: 34
- Call: ICT-2010.5.3 VPH call
Image analysis and omics-based computational models of the airways to
unravel the pathophysiological mechanisms in asthma and COPD
70. AirPROM automated data flows
[Figure: data flow between work packages]
• WP1: clinical data (CT)
• WP2: omics data
• WP3: micro scale
• WP4: computational tools (morphology, patient anatomy)
• WP5: macro scale, large airway
• WP6: macro scale, small airway
• WP7: KM
• WP8: patient specific multi scale model
Flow labels: model, simulation result; inform, constrain; model, validate
71. AirPROM Knowledge
• Collaboration network (partners, tasks, data/models)
• 8 computational models I/O parameter semantic descriptions
  ▪ Cell model
  ▪ Tissue model
  ▪ Perfusion model ...
• AirPROM clinical data
  ▪ 15 control, 57 asthma
  ▪ Anthropometrics, Spirometry
• Link to image data
• Full text document search
• Public knowledge
  ▪ Gene function (EntrezGene, UniProt, MGI)
  ▪ Gene - disease association (OMIM, CTD, PubMed)
  ▪ Gene - compound association (CTD, PubChem, PubMed)
  ▪ Pathways (KEGG, Reactome)
  ▪ Protein-protein interactions (MINT, DIP, IntAct)
  ▪ ~100 data sources
  ▪ Network of ~2 million connections
• Omics data
72. AirPROM KM tasks
• Ensure data flow: provide a secure federated data retrieval, exchange, processing
  and warehousing infrastructure
• Semantically integrate the clinical, biobanking, physiological, genetic, experimental
  and imaging data
• Enable data analysis by providing data matrices and integration with algorithms and
  tools for network inference
• Formats to support: e.g. ANSYS, ISA-TAB, CGNS, MAGE, SBML, CDISC
• Ontologies to support: SNOMED, FMA anatomy ontology, Bio-Physical Ontology
• MIBBI meta-information definitions
• Expected data volume: lower Terabyte region
74. Simple data mapping
Example: Tabular Data Import
• Data to be imported (e.g. from an Excel spreadsheet)
• Set rules for import
• Define import script or select existing script
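Rule-based tabular import of this kind can be sketched as follows. This is an illustration only and not BioXM's import scripting: the rule table `IMPORT_RULES`, the column names and the sample rows are assumptions, and a CSV export stands in for the Excel spreadsheet.

```python
import csv
import io

# Hypothetical import rules: spreadsheet column -> (target attribute, converter).
IMPORT_RULES = {
    "Patient ID": ("patient_id", str),
    "Age (years)": ("age", int),
    "FEV1 (L)": ("fev1_litres", float),
}


def import_table(text, rules):
    """Apply the mapping rules to every row of a CSV export."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for column, (attribute, convert) in rules.items():
            value = (raw.get(column) or "").strip()
            # Empty or missing cells become None instead of failing the conversion.
            row[attribute] = convert(value) if value else None
        rows.append(row)
    return rows


# Made-up example data standing in for a spreadsheet saved as CSV.
SAMPLE = "Patient ID,Age (years),FEV1 (L)\nP001,54,2.8\nP002,,3.1\n"
imported = import_table(SAMPLE, IMPORT_RULES)
print(imported)
```

Separating the rules from the import function mirrors the "define import script or select existing script" choice: the same function can be re-run with a stored rule set.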
75. Study cohort variable
harmonisation
Aim:
To provide a template to facilitate
harmonization between pre-existing
cohorts and support the design of
emerging ones.
[Figure: Data Banks 1 to 5 feeding a common Pool]
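A harmonisation template of this kind can be sketched as a per-bank mapping from local variables to common ones. Everything below is hypothetical (bank names, variable names, conversions, records); the slides do not specify the template format.

```python
# Hypothetical harmonisation template: for each bank, map its local variable
# name to a common variable name and a conversion function.
TEMPLATE = {
    "bank1": {"height_cm": ("height_m", lambda v: v / 100.0),
              "sex": ("sex", str.lower)},
    "bank2": {"Height": ("height_m", float),
              "Gender": ("sex", lambda v: {"M": "male", "F": "female"}[v])},
}


def harmonise(bank, record, template):
    """Translate one bank-specific record into the common schema."""
    common = {"source_bank": bank}
    for local_name, value in record.items():
        if local_name not in template[bank]:
            continue  # variables without a template entry are not pooled
        common_name, convert = template[bank][local_name]
        common[common_name] = convert(value)
    return common


# Made-up records from two banks, merged into one common pool.
pool = [
    harmonise("bank1", {"height_cm": 172, "sex": "Male"}, TEMPLATE),
    harmonise("bank2", {"Height": 1.65, "Gender": "F", "notes": "n/a"}, TEMPLATE),
]
print(pool)
```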
77. Data Schema
• Factor analysis carried out on BTS severe asthma data set to determine
  underlying structure/characteristics of the dataset
• The underlying structure/factors were then used to inform the domains
  and themes to order the data.
84. High-performance storage system
Tera- to Petabyte data storage for image and image analysis data
• AN1-PZ1.storage.pionier.net.pl
• Certificate based
• Access with e.g. sFTP, SSHFS, GridFTP, WebDAV
• CT-image data for 35 subjects
• Initial image analysis data
97. Synergy-COPD
"Modelling and simulation environment for systems medicine
(Chronic obstructive pulmonary disease -COPD- as a use case)"
- Start: 1.2.11
- Duration: 3 years
- Partners: 9
- Call: ICT-2010.5.3 VPH call
- See: www.Synergy-COPD.org
Integration of models at metabolic (muscle TCA, respiratory chain, ATP diffusion),
cellular (immune system) and organ (lung biophysics, gas diffusion, blood flow) level
Clinical decision support
Software with translation into clinical practice
99. SYNERGY, KM tasks
▪ Clinical data from BioBridge, PAC-COPD + ECLIPSE
▪ Experimental methods:
  - Phenotypes: (respiratory symptoms (wheezing, asthma), rhinitis, dermatitis, IgE? to
    common inhaled allergens, and their longitudinal changes)
  - transcriptome
  - proteomics (targeted)
  - metabolomics
▪ Data matrices for, and integration with, algorithms and tools for network inference
▪ Integration of models (SBML, CellML)
100. Knowledge model for semantic mapping
Aims:
▪ Find model and experimental data parameters which are similarly described
▪ Use experimental data to validate theoretical models
▪ Connect models which share similar model parameters
101. Model and data parameter concept
Model parameter
▪ instantiates a certain parameter in a model
▪ has ontological description
Data parameter
• instantiates a certain parameter in the Life Science World
• occurs as descriptor or measurable in experimental or anthropometric data
• has ontological description
102. Parameter semantic annotation
Generic
▪ general parameter information, true in any context
▪ assigned to parameter only
▪ e.g. semantic description
Context specific
• context-specific parameter information, true for a given model/study only
• shared assignment to parameter + model/study
• e.g. unit, Input/Output
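The split between generic and context-specific annotation can be pictured as two record types. This is a minimal sketch under assumptions: the class names, field names, ontology term ID and units are placeholders and do not describe BioXM's internal model.

```python
from dataclasses import dataclass, field


@dataclass
class Parameter:
    name: str
    # Generic annotation: attached to the parameter only, valid in any context.
    semantic_description: set = field(default_factory=set)


@dataclass
class ContextAnnotation:
    # Context-specific annotation: attached to parameter + model/study together.
    parameter: Parameter
    context: str  # model or study identifier
    unit: str
    role: str     # "input" or "output"


atp = Parameter("ATP concentration", {"ONTOLOGY:0001"})  # placeholder term ID
in_model = ContextAnnotation(atp, "model_A", "mmol/L", "output")
in_study = ContextAnnotation(atp, "study_B", "umol/g", "input")
print(in_model.parameter is in_study.parameter)
```

Both context annotations point at the same `Parameter` object, so the semantic description is stored once while unit and role vary per model or study.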
103. Parameter semantic annotation
Molecular level model (SBML import)
▪ MIRIAM mapping based reference entity association
▪ check for identical MIRIAM mapping before re-using existing
  model parameter by name
▪ create model specific name "parameter_model" if non-identical
Supra-molecular level model (manual parameter generation)
▪ create semantic annotation based on search result for free text -
  ontology mapping
▪ search for existing parameter with same semantics, i.e. on-the-fly
  network similarity search between search result + Model
  parameter context
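The re-use rule for SBML imports (same name and identical MIRIAM mapping: re-use; same name but different mapping: create "parameter_model") can be sketched as below. The function `register_parameter`, the registry layout and the identifiers are hypothetical; the sketch also ignores the case where the model-specific name itself is already taken.

```python
def register_parameter(registry, name, miriam_ids, model):
    """Re-use a parameter by name only if its MIRIAM mapping is identical.

    registry maps parameter name -> frozenset of MIRIAM-style identifiers.
    Returns the name under which the parameter is stored.
    """
    ids = frozenset(miriam_ids)
    if name not in registry:
        registry[name] = ids
        return name
    if registry[name] == ids:
        return name               # same name, same mapping: re-use
    specific = f"{name}_{model}"  # same name, different mapping
    registry[specific] = ids
    return specific


registry = {}
# Identifiers below are placeholders, not real database entries.
a = register_parameter(registry, "ATP", ["urn:miriam:example:1"], "glycolysis")
b = register_parameter(registry, "ATP", ["urn:miriam:example:1"], "tca")
c = register_parameter(registry, "ATP", ["urn:miriam:example:2"], "transport")
print(a, b, c)
```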
105. Mapping concept
Use experimental data to validate theoretical models
▪ Connect Element:Model Parameter:Instances
  with
  Element:Parameter:Instances
Connect models which share similar Model Parameters
▪ Connect Element:Model Parameter:Instances
  with
  Element:Model Parameter:Instances
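One way to picture connecting instances that are "similarly described" is to compare their ontology annotations. The slides do not state how similarity is computed; Jaccard overlap is used here purely as an assumption, and the instance names and term labels are placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two annotation sets, between 0 and 1."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def connect_similar(instances, threshold=0.5):
    """Return pairs of parameter instances whose annotations overlap enough."""
    links = []
    for (n1, t1), (n2, t2) in combinations(sorted(instances.items()), 2):
        score = jaccard(t1, t2)
        if score >= threshold:
            links.append((n1, n2, round(score, 2)))
    return links


# Hypothetical instances; the ontology term labels are placeholders.
instances = {
    "modelA:ATP": {"term:ATP", "term:cytosol"},
    "modelB:ATP_c": {"term:ATP", "term:cytosol", "term:muscle"},
    "study:lactate": {"term:lactate", "term:blood"},
}
links = connect_similar(instances)
print(links)
```

The threshold is a tuning choice: a lower value connects more loosely related parameters, a higher value keeps only near-identical descriptions.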
115. Interpreting semantically integrated information
▪ generate new probabilistic networks from the KB
▪ explore the connection between probabilistic network(s) and deterministic
  models based on concepts (genes, physiology) with direct but especially indirect
  connections, e.g. via Pathways, PPI, ...
▪ explore the connections between data analysis results and nitroso-redox related
  knowledge
116. Concept (for details see WP4 & 5 presentations)
[Figure. Recoverable labels:
• Deterministic models: Glycolysis, TCA cycle, Electron chain, O2 transport,
  ATP diffusion, Myofibrils (mechanic work)
• Probabilistic network: Data clinical/experimental, Clinical data,
  Physiological measurements
• COPD knowledge base (COPD KB): network search
• Selection of hubs: Oxidative phosphorylation, TCA Cycle, Glycolysis
• Resulting connecting network]
120. Searching connecting networks by PPI
▪ Glycolysis: 11 model proteins, 27 candidates
  - PPI with good experimental evidence (78 456):
    428 protein net, 7 Glycolysis model proteins, 10 candidates
  - PPI Two Hybrid (3 743): 55 protein net
  - all PPI (>1.5 million): 757 protein net
▪ Electron Chain: 18 model proteins, 23 candidates, 193 protein net
▪ TCA cycle: 16 model proteins, 21 candidates, 199 protein net
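A connecting network between model proteins and candidates can be sketched as the union of shortest paths in a PPI graph. The slides do not describe the search algorithm that was used; breadth-first search is an assumption here, and the toy graph and protein names are placeholders.

```python
from collections import deque


def shortest_path(graph, start, goals):
    """Breadth-first search from start to the nearest node in goals."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in goals and node != start:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def connecting_network(graph, model_proteins, candidates):
    """Union of shortest paths linking each candidate to a model protein."""
    net = set()
    for candidate in candidates:
        path = shortest_path(graph, candidate, set(model_proteins))
        if path:
            net.update(path)
    return net


# Toy undirected PPI graph with placeholder protein names.
ppi = {
    "M1": {"X"}, "X": {"M1", "C1"}, "C1": {"X"},
    "M2": {"C2"}, "C2": {"M2"}, "C3": set(),
}
net = connecting_network(ppi, ["M1", "M2"], ["C1", "C2", "C3"])
print(sorted(net))
```

Restricting the graph to a subset of interactions before searching (for example only well-supported PPI) changes which intermediate proteins appear, which is consistent with the different net sizes reported per evidence level.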
122. Summary
• Flexible knowledge modelling
• Different levels of access
• Exchange within, between and outside of projects
• Knowledge network "background" for data analysis and mining