The slide deck of the presentation "AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health" from the 2017 BMBF All Hands Meeting in Karlsruhe is now available online.
2. What is the Hasso Plattner Institute, Potsdam, Germany?
Dr. Schapranow, BMBF
All Hands, Oct 11, 2017
Analyze Genomes: A Federated In-Memory Database for Digital Health
3. ■ Founded as a public-private partnership
in 1998 in Potsdam near Berlin, Germany
■ Institute belongs to the
University of Potsdam
■ Ranked 1st in CHE since 2009
■ 500 B.Sc. and M.Sc. students
■ 12 professors/chairs, 150 PhD students
■ Apr 2017: Digital Engineering Faculty
■ Oct 2017: Opening of Digital Health Center
Hasso Plattner Institute
Key Facts
4. ■ Can we enable clinicians to make their therapy decisions:
□ Incorporating all available patient specifics,
□ Referencing latest lab results and worldwide medical knowledge, and
□ In an interactive manner during their ward round?
Our Motivation
Turn Precision Medicine Into Clinical Routine
7. Our Vision
Medical Board Incorporating Latest Medical Knowledge
8. The Challenge
Distributed Heterogeneous Data Sources
■ Human genome/biological data: 600GB per full genome, 15PB+ in databases of leading institutes
■ Prescription data: 1.5B records from 10,000 doctors and 10M patients (100 GB)
■ Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov
■ Human proteome: 160M data points (2.4GB) per sample, >3TB raw proteome data in ProteomicsDB
■ PubMed database: >23M articles
■ Hospital information systems: often more than 50GB
■ Medical sensor data: scan of a single organ in 1s creates 10GB of raw data
■ Cancer patient records: >160k records at NCT
9. ■ Combined column and row store
■ Map/Reduce
■ Single and multi-tenancy
■ Lightweight compression
■ Insert only for time travel
■ Real-time replication
■ Working on integers
■ SQL interface on columns and rows
■ Active/passive data store
■ Minimal projections
■ Group key
■ Reduction of software layers
■ Dynamic multi-threading
■ Bulk load of data
■ Object-relational mapping
■ Text retrieval and extraction engine
■ No aggregate tables
■ Data partitioning
■ Any attribute as index
■ No disk
■ On-the-fly extensibility
■ Analytics on historical data
■ Multi-core/parallelization
Our Technology
In-Memory Database Technology
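Two of the features listed on this slide, lightweight compression and working on integers, are commonly realized in column stores through dictionary encoding. The following is a minimal Python sketch under that assumption. The slides do not say which encoding the platform uses, and the function names and the sample column are hypothetical.

```python
from typing import List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value by its integer position in a sorted dictionary."""
    dictionary = sorted(set(column))
    position = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [position[value] for value in column]


def scan_equals(dictionary: List[str], encoded: List[int], needle: str) -> List[int]:
    """Return the row ids whose value equals `needle`, comparing integers only."""
    if needle not in dictionary:
        return []
    code = dictionary.index(needle)
    return [row for row, value in enumerate(encoded) if value == code]


# Hypothetical sample column (not taken from the slides).
diagnoses = ["I50", "C34", "I50", "I10", "C34", "I50"]
dictionary, encoded = dictionary_encode(diagnoses)
matching_rows = scan_equals(dictionary, encoded, "I50")
```

Each distinct string is stored once, and the scan touches only small integers, which is the property the two features rely on.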
13. Our Approach: AnalyzeGenomes.com
In-Memory Computing Platform for Big Medical Data
■ Drug Response Analysis, Pathway Topology Analysis, Medical Knowledge Cockpit, Oncolyzer, Clinical Trial Recruitment, Cohort Analysis, ...
■ Extensions for Life Sciences
□ Data Exchange, App Store
□ Access Control, Data Protection
□ Fair Use
□ Statistical Tools
□ Real-time Analysis
□ App-spanning User Profiles
■ Combined and Linked Data
□ Genome Data
□ Cellular Pathways
□ Genome Metadata
□ Research Publications
□ Pipeline and Analysis Models
□ Drugs and Interactions
■ Indexed Sources
■ In-Memory Database
14. Reproducibility
Modeling of Data Analysis Pipelines
1. Design time (researcher, process expert)
□ Definition of parameterized process model
□ Uses graphical editor and jobs from repository
2. Configuration time (researcher, lab assistant)
□ Select model and specify parameters, e.g. aln opts
□ Results in model instance stored in repository
3. Execution time (researcher)
□ Select model instance
□ Specify execution parameters, e.g. input files
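The three phases above can be sketched as plain data structures. This is a minimal Python illustration, not the platform's implementation: `ProcessModel`, `ModelInstance`, `configure`, `execute`, and the job and parameter names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProcessModel:
    """Design time: an ordered list of jobs plus the parameters they expose."""
    name: str
    jobs: List[str]
    parameters: List[str]


@dataclass(frozen=True)
class ModelInstance:
    """Configuration time: a model with every declared parameter bound."""
    model: ProcessModel
    settings: Dict[str, str]


def configure(model: ProcessModel, settings: Dict[str, str]) -> ModelInstance:
    """Bind parameters and reject instances that leave any of them unset."""
    missing = [p for p in model.parameters if p not in settings]
    if missing:
        raise ValueError(f"unset parameters: {missing}")
    return ModelInstance(model, dict(settings))


def execute(instance: ModelInstance, input_files: List[str]) -> List[str]:
    """Execution time: return the planned job invocations as a log."""
    opts = " ".join(f"{k}={v}" for k, v in sorted(instance.settings.items()))
    return [f"{job} [{opts}] <- {path}"
            for path in input_files
            for job in instance.model.jobs]


# Hypothetical model, settings, and input file.
model = ProcessModel("alignment", jobs=["align", "call_variants"], parameters=["aln_opts"])
instance = configure(model, {"aln_opts": "-t 8"})
log = execute(instance, ["sample1.fastq"])
```

Because the stored instance fixes every parameter, running it again on the same input files yields the same plan, which is the reproducibility argument of the slide.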
16. ■ Patient: 63 years, male, smoker, chronic heart insufficiency, stage III-IV
1. Appointment I (pre-surgery): Acquire systemic patient details, e.g.
physiological and blood markers
2. Predict outcome using clinical model with patient specifics
3. Select adequate option and conduct valve replacement
4. Equip patient with sensors to allow regular monitoring
5. Appointment II, 6 weeks after surgery, to validate outcome
Establish Systems Medicine Model for
Improved Treatment of Heart Failure
17. ■ Joint process definition
■ Identification of long running steps
■ Aims
□ Sharing of data
□ Improved communication
□ Reproducible data processing
□ Analysis applications for interactive
hypothesis validation
Requirements Engineering for Systems Medicine
Computer-aided Systems Medicine Process
18. ■ Structured data acquisition, e.g. IMDB as data integration platform
■ Improved communication, e.g. event-driven user notifications
■ Reproducible data processing, e.g. IMDB as processing platform for DNA
and RNA data
■ Enables real-time data analysis
Contributions
Figure: RNA Seq Analysis_V2 pipeline.
Tools: FASTQC, Trimmomatic, FASTQC 2, TopHat, STAR, featureCounts.
Data: FASTQ - Reads, Pre-Trimming QC-Report, FASTQ - Trimmed Reads, BAM-File Aligned Reads, Post-Alignment QC-Report, Counts Matrix.
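A pipeline like the one in this figure can be modeled as a dependency graph, from which a valid execution order follows. The Python sketch below uses the tool names from the figure, but the wiring between them is an assumption, since the arrows did not survive in the transcript. TopHat is left out because its place in the flow cannot be recovered.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Step -> steps it depends on. ASSUMED wiring, inferred from the labels only.
dependencies: Dict[str, Set[str]] = {
    "FASTQC": set(),               # QC on the raw FASTQ reads
    "Trimmomatic": set(),          # trims the raw FASTQ reads
    "STAR": {"Trimmomatic"},       # aligns the trimmed reads
    "FASTQC 2": {"STAR"},          # QC after alignment
    "featureCounts": {"STAR"},     # builds the counts matrix
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every step runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

Several orders are valid for this graph, so the result should be read as one admissible schedule rather than the only one.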
20. ■ Interdisciplinary partners collaborate on enabling interactive health research
■ Current funding period: Aug 2015 – July 2018
■ Funded consortium partners:
□ AOK
German healthcare insurance company
□ data experts group
Technology operations
□ Hasso Plattner Institute
Real-time data analysis, in-memory database technology
□ Technology, Methods, and Infrastructure for Networked Medical Research
Legal and data protection
Smart Analysis Health Research Access (SAHRA)
21. ■ Analysis dashboard combining
functions per use case
■ Provides expert-facing entry
point to individual apps
■ Provides application-wide
authentication / single sign-on
Interactive Analysis Dashboard
22. ■ Stratification of patient cohorts using patient specifics
■ Automatic matching of similar patients and patient anamnesis
■ Interactive graphical exploration of longitudinal patient data
Stratification of Hypertension Patients
and Longitudinal Data Analysis
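The slide does not say how similar patients are matched. As one possible illustration, the Python sketch below ranks patients by Jaccard similarity over sets of patient specifics. The technique is a stand-in chosen for brevity, and the function names, patient ids, and features are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of features that two patients have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query: Set[str], cohort: Dict[str, Set[str]],
                 top: int = 2) -> List[Tuple[str, float]]:
    """Return the `top` patients most similar to `query`, best first."""
    scored = [(pid, jaccard(query, features)) for pid, features in cohort.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top]


# Made-up cohort for illustration only.
cohort = {
    "P1": {"male", "smoker", "hypertension"},
    "P2": {"female", "hypertension", "diabetes"},
    "P3": {"male", "smoker", "hypertension", "diabetes"},
}
matches = most_similar({"male", "smoker", "hypertension"}, cohort)
```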
23. ■ Query-oriented search interface
■ Seamless integration of patient specifics, e.g. from EMR
■ Parallel search in international knowledge bases, e.g. for biomarkers, literature,
cellular pathways, and clinical trials
App Example:
Medical Knowledge Cockpit for Patients and Clinicians
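A parallel search across several knowledge bases can be sketched with a thread pool that sends the same query to every source at once. This Python example is an assumption-laden illustration: the sources are small in-memory lists with made-up entries, and `search_source` and `parallel_search` are hypothetical names, not the app's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical stand-ins for remote knowledge bases; the entries are made up.
SOURCES: Dict[str, List[str]] = {
    "literature": ["BRAF inhibitors in melanoma", "Heart failure biomarkers"],
    "pathways": ["MAPK signaling (BRAF, KRAS)"],
    "clinical_trials": ["Trial of a BRAF inhibitor", "Hypertension cohort study"],
}


def search_source(name: str, term: str) -> List[str]:
    """Case-insensitive substring search within a single source."""
    return [entry for entry in SOURCES[name] if term.lower() in entry.lower()]


def parallel_search(term: str) -> Dict[str, List[str]]:
    """Query all sources concurrently and collect the hits per source."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = {name: pool.submit(search_source, name, term) for name in SOURCES}
        return {name: future.result() for name, future in futures.items()}


hits = parallel_search("braf")
```

With real remote sources, the total latency of such a search approaches that of the slowest source instead of the sum of all of them.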
24. Medical Knowledge Cockpit for Patients and Clinicians
Pathway Topology Analysis
■ Search in pathways is limited to “is a certain
element contained” today
■ Integrated >1.5k pathways from international
sources, e.g. KEGG, HumanCyc, and WikiPathways,
into HANA
■ Implemented graph-based topology exploration and
ranking based on patient specifics
■ Enables interactive identification of possible
dysfunctions affecting the course of a therapy
before its start
Unified access to multiple formerly
disjoint data sources
Pathway analysis of genetic
variants with graph engine
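To illustrate how a topology search goes beyond asking whether a certain element is contained, the Python sketch below treats each pathway as a directed graph and ranks pathways by how many elements lie downstream of a patient's variant genes. It is a toy model under stated assumptions: the ranking criterion, the function names, and the pathway data are hypothetical and do not describe the HANA graph engine.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # element -> elements it acts on


def downstream(graph: Graph, start: str) -> Set[str]:
    """All elements reachable from `start`, found by breadth-first search."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def rank_pathways(pathways: Dict[str, Graph],
                  variant_genes: Set[str]) -> List[Tuple[str, int]]:
    """Order pathways by the number of elements downstream of a variant gene."""
    scores = []
    for name, graph in pathways.items():
        affected: Set[str] = set()
        for gene in variant_genes:
            if gene in graph:
                affected |= downstream(graph, gene)
        scores.append((name, len(affected)))
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy pathways with made-up topology, for illustration only.
pathways = {
    "pathway_a": {"G1": ["G2"], "G2": ["G3", "G4"], "G3": [], "G4": []},
    "pathway_b": {"G5": ["G1"], "G1": ["G6"], "G6": []},
}
ranking = rank_pathways(pathways, {"G1"})
```

A containment check would report both toy pathways as equal hits for G1, while the traversal shows that far more of pathway_a depends on it.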
25. ■ Interactively explore relevant publications, e.g. PDFs
■ Improved ease of exploration, e.g. by highlighted medical terms and relevant
concepts
Medical Knowledge Cockpit for Patients and Clinicians
Publications
26. ■ For patients
□ Identify relevant clinical trials and medical experts
□ Become an informed patient
■ For clinicians
□ Identify pharmacokinetic correlations
□ Scan for similar patient cases, e.g. to evaluate therapy efficiency
■ For researchers
□ Enable real-time analysis of medical data, e.g. assess pathways
to identify impact of detected variants
□ Combined mining in structured and unstructured data, e.g. publications,
diagnosis, and EMR data
What to Take Home?
Learn more and test-drive it yourself: AnalyzeGenomes.com
27. Keep in contact with us!
Dr.-Ing. Matthieu-P. Schapranow
Program Manager E-Health & Life Sciences
Hasso Plattner Institute
August-Bebel-Str. 88
14482 Potsdam, Germany
schapranow@hpi.de
http://we.analyzegenomes.com/