In-Memory Data Management for Systems Medicine

Dr. Matthieu-P. Schapranow
e:Med Focus Workshop Data Management in Systems Medicine, Berlin
June 10, 2016

App Example:
Systems Medicine Model of Heart Failure (SMART)
[Figure: heart failure and related factors: sleeping disorder, fibrosis, blood pressure, blood volume, gene expression, hypertrophy, calcium metabolism, energy metabolism, iron deficiency, vitamin D deficiency, gender, epigenetics]
■  Integrated systems medicine based on real-time analysis of healthcare data
■  Initial funding period: Mar ’15 – Feb ’18
■  Funded consortium partners:
Schapranow, e:Med
Workshop, Jun 10, 2016
In-Memory Data
Management for
Systems Medicine
2

Actors in Systems Medicine
■  Patients
□  Individual anamnesis, family history, and background
□  Require fast access to individualized therapy
■  Clinicians
□  Identify root and extent of disease using laboratory tests
□  Evaluate therapy alternatives, adapt existing therapy
■  Researchers
□  Conduct laboratory work, e.g. analyze patient samples
□  Create new research findings and come up with treatment alternatives

IT Challenges
Distributed Heterogeneous Data Sources
Human genome/biological data
600GB per full genome
15PB+ in databases of leading institutes
Prescription data
1.5B records from 10,000 doctors and
10M Patients (100 GB)
Clinical trials
Currently more than 30k
recruiting on ClinicalTrials.gov
Human proteome
160M data points (2.4GB) per sample
>3TB raw proteome data in ProteomicsDB
PubMed database
>23M articles
Hospital information systems
Often more than 50GB
Medical sensor data
Scan of a single organ in 1s creates 10GB of raw data
Cancer patient records
>160k records at NCT

Our Methodology
Design Thinking

Requirements Engineering for Systems Medicine
Computer-aided Systems Medicine Process
■  Joint process definition
■  Identification of long-running steps
■  Aims
□  Improved communication
□  Sharing of data
□  Reproducible data processing
[Figure: process model "20160407_eCardiohealth_Whole_Process"]
Participants: Heart Center, Study Assessor, Radiologist, Cardiologist, Surgeon, IT platform, Wet Lab, Bioinformatician, Proteomics Lab, Proteome Analyzer, Cardiomyocyte Modeler, Multi-scale modeller
Tasks: Study Assessment, MRI, Hemodynamic Evaluation, Surgery, Update Notification, Data processing, Wet Lab Experiments Validation, RNA Sequencing, Proteome Experiments, Cardiomyocyte Modeling, Multi-Scale Modeling
Data: MR Images; Patient Meta Data, Hemodynamic Parameters, and Clinical Data; SMART Data Storage; Wet Lab Results, e.g. Expression Data; FASTQ Files; Protein Expressions; Cardiomyocyte Electromechanical Model; Model output; Hemodynamic Parameters; Protein Expression Levels
Events and conditions: Eligible Patient Available; Surgery Performed?; Message: Biopsy Sample; Condition: 20 Biopsy Samples for batch processing; Message: Post-surgery visit completed with data entry

Data Processing Pipelines
From Model to Execution
1.  Design time (researcher, process expert)
□  Definition of parameterized process model
□  Uses graphical editor and jobs from repository
2.  Configuration time (researcher, lab assistant)
□  Select model and specify parameters, e.g. aln opts
□  Results in model instance stored in repository
3.  Execution time (researcher)
□  Select model instance
□  Specify execution parameters, e.g. input files
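The three phases above can be sketched in a few lines of Python. This is only an illustration of the separation between design, configuration, and execution time, not the platform's actual implementation: the class names, the job names, and the parameter value `-t 8` are hypothetical, and only the parameter name `aln_opts` echoes the "aln opts" example from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessModel:
    """Design time: a parameterized pipeline assembled from repository jobs."""
    name: str
    jobs: tuple        # ordered job names (hypothetical) from the job repository
    parameters: tuple  # names of parameters that must be set at configuration time


@dataclass(frozen=True)
class ModelInstance:
    """Configuration time: a process model with all parameters bound."""
    model: ProcessModel
    settings: dict


def configure(model, **settings):
    """Bind parameters to a model; the result would be stored in the repository."""
    missing = set(model.parameters) - settings.keys()
    if missing:
        raise ValueError(f"unset parameters: {sorted(missing)}")
    return ModelInstance(model, dict(settings))


def execute(instance, input_files):
    """Execution time: combine a model instance with concrete input files.

    Returns the planned job invocations instead of running real tools.
    """
    return [
        {"job": job, "settings": instance.settings, "inputs": list(input_files)}
        for job in instance.model.jobs
    ]


# 1. Design time
model = ProcessModel("alignment", ("align", "sort", "call_variants"), ("aln_opts",))
# 2. Configuration time
instance = configure(model, aln_opts="-t 8")
# 3. Execution time
plan = execute(instance, ["sample_1.fastq"])
```

Keeping the configured instance separate from the execution parameters means the same instance can be re-run on new input files, which supports the aim of reproducible data processing.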

Software Requirements in Systems Medicine
■  Requirements
□  Real-time data analysis
□  Maintained software
■  Restrictions
□  Data privacy
□  Data locality
□  Volume of “big medical data”
■  Solution?
□  Federated In-Memory Database System vs. Cloud Computing

Where are all those Clouds going?
Gartner's 2014 Hype Cycle for Emerging Technologies

Multiple Cloud Service Providers
[Figure: Site A and Site B each operate a local system with local storage and a local synchronization service. Each site has its own cloud provider (Cloud Provider Site A, Cloud Provider Site B) with a cloud synchronization service and shared cloud storage.]

A Single Service Provider
[Figure: Site A and Site B both access one cloud system at a single cloud provider, which offers a cloud synchronization service and shared cloud storage.]

Multiple Sites Forming the
Federated In-Memory Database System
[Figure: Federated In-Memory Database System spanning Site A, Site B, and a cloud provider. The cloud IMDB instance holds master data and shared algorithms; each site runs a local IMDB instance that holds sensitive data, e.g. patient data.]
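One way to read this setup is that a shared algorithm is sent to each site and executed next to the sensitive data. The following Python sketch illustrates that reading under explicit assumptions: the class `LocalSite`, the functions, and the sample records are hypothetical, and the rule that only aggregated results leave a site is an assumption made for this example, not a statement from the slides.

```python
class LocalSite:
    """A site with its own local database instance (hypothetical stand-in)."""

    def __init__(self, name, patient_records):
        self.name = name
        self._records = patient_records  # sensitive data stays at the site

    def run(self, algorithm):
        # The shared algorithm runs locally; only its result is returned.
        return algorithm(self._records)


def count_by_diagnosis(records):
    """Shared algorithm: aggregate counts per diagnosis code."""
    counts = {}
    for record in records:
        code = record["diagnosis"]
        counts[code] = counts.get(code, 0) + 1
    return counts


def merge_counts(left, right):
    """Combine two partial results without touching patient-level data."""
    merged = dict(left)
    for code, n in right.items():
        merged[code] = merged.get(code, 0) + n
    return merged


def federated_run(sites, algorithm, merge):
    """Run the same algorithm at every site and merge the partial results."""
    result = {}
    for site in sites:
        result = merge(result, site.run(algorithm))
    return result


site_a = LocalSite("Site A", [{"diagnosis": "C18.0"}, {"diagnosis": "C20.0"}])
site_b = LocalSite("Site B", [{"diagnosis": "C18.0"}, {"diagnosis": "C50.9"}])
totals = federated_run([site_a, site_b], count_by_diagnosis, merge_counts)
```

The design choice illustrated here is to move the algorithm to the data instead of moving the data to a central store, which addresses the data privacy and data locality restrictions listed earlier.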

we.analyzegenomes.com
Real-time Analysis of Big Medical Data
[Figure: platform architecture]
Applications: Drug Response Analysis, Pathway Topology Analysis, Medical Knowledge Cockpit, Oncolyzer, Clinical Trial Recruitment, Cohort Analysis, ...
In-Memory Database
Extensions for Life Sciences: Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles
Combined and Linked Data: Genome Data, Cellular Pathways, Genome Metadata, Research Publications, Pipeline and Analysis Models, Drugs and Interactions
Indexed Sources

Our Technology
In-Memory Database Technology
■  Combined column and row store
■  Map/Reduce
■  Single and multi-tenancy
■  Lightweight compression
■  Insert only for time travel
■  Real-time replication
■  Working on integers
■  SQL interface on columns and rows
■  Active/passive data store
■  Minimal projections
■  Group key
■  Reduction of software layers
■  Dynamic multi-threading
■  Bulk load of data
■  Object-relational mapping
■  Text retrieval and extraction engine
■  No aggregate tables
■  Data partitioning
■  Any attribute as index
■  No disk
■  On-the-fly extensibility
■  Analytics on historical data
■  Multi-core/parallelization

Insert-Only / Append-Only
■  Traditional databases allow four data operations:
□  INSERT, SELECT and
□  DELETE, UPDATE
■  Insert-only requires only INSERT, SELECT to maintain a complete history (bookkeeping systems)
■  Insert-only enables time travelling, e.g. to
□  Trace changes and reconstruct decisions
□  Document complete history of changes, therapies, etc.
□  Enable statistical observations
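A minimal sketch of the insert-only idea in Python, assuming a simple key/value table: every change is appended with a timestamp, nothing is overwritten or deleted, and a read can ask for the state at any earlier point in time. The class `InsertOnlyTable`, the keys, and the therapy values are hypothetical and only serve as an example.

```python
from datetime import datetime


class InsertOnlyTable:
    """Append-only table: supports INSERT and SELECT, never UPDATE or DELETE."""

    def __init__(self):
        self._rows = []  # (key, valid_from, value), in insertion order

    def insert(self, key, value, valid_from):
        # A changed value is a new row; older versions remain untouched.
        self._rows.append((key, valid_from, value))

    def select(self, key, as_of=None):
        """Return the value that was valid for `key` at time `as_of`.

        With as_of=None the most recent version is returned.
        """
        versions = [
            (valid_from, position, value)
            for position, (k, valid_from, value) in enumerate(self._rows)
            if k == key and (as_of is None or valid_from <= as_of)
        ]
        if not versions:
            return None
        # Latest timestamp wins; insertion order breaks ties.
        return max(versions, key=lambda v: (v[0], v[1]))[2]

    def history(self, key):
        """Return the complete, ordered change history for `key`."""
        return sorted((ts, value) for k, ts, value in self._rows if k == key)


table = InsertOnlyTable()
table.insert("patient-1", "therapy A", datetime(2016, 1, 10))
table.insert("patient-1", "therapy B", datetime(2016, 3, 5))

current = table.select("patient-1")
earlier = table.select("patient-1", as_of=datetime(2016, 2, 1))
```

Here `current` is the latest therapy, while `earlier` reconstructs what was valid on February 1, 2016, which is the "time travel" use case named above.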

Lightweight Compression
■  Main memory access is the new bottleneck
■  Lightweight compression can reduce this bottleneck, i.e.
□  Lossless
□  Improved usage of data bus capacity
□  Work directly on compressed data
Table
RecId  …      …       …
1      C18.0  Colon   091487
2      C32.0  Larynx  357982
3      C00.9  Lip     123489
4      C18.0  Colon   998711
5      C20.0  Rectum  215678
6      C20.0  Rectum  647912
7      C50.9  Mama    167898
8      C18.0  Colon   646470
…

Attribute Vector
RecId  ValueId
1      C18.0
2      C32.0
3      C00.9
4      C18.0
5      C20.0
6      C20.0
7      C50.9
8      C18.0

Data Dictionary
ValueId  Value
1        Larynx
2        Lip
3        Rectum
4        Colon
5        Mama

Inverted Index
ValueId  RecIdList
1        2
2        3
3        5,6
4        1,4,8
5        7
•  Typical compression factor of 10:1 for enterprise software
•  In financial applications up to 50:1
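The dictionary, attribute vector, and inverted index from the example can be reproduced with a short Python sketch. It uses the value IDs and the eight records exactly as listed on the slide; the function names are hypothetical, and a real column store would pack the integer IDs into a compact bit representation instead of a Python list.

```python
from collections import defaultdict


def encode_column(values, dictionary):
    """Replace each value by its integer value ID (the attribute vector)."""
    value_ids = {value: value_id for value_id, value in dictionary.items()}
    return [value_ids[value] for value in values]


def build_inverted_index(attribute_vector):
    """Map each value ID to the list of record IDs (1-based) containing it."""
    index = defaultdict(list)
    for rec_id, value_id in enumerate(attribute_vector, start=1):
        index[value_id].append(rec_id)
    return dict(index)


def count_matches(attribute_vector, dictionary, value):
    """Scan the compressed column: one dictionary lookup, then integer compares."""
    value_ids = {v: value_id for value_id, v in dictionary.items()}
    target = value_ids[value]
    return sum(1 for value_id in attribute_vector if value_id == target)


# Data dictionary and column values as shown on the slide (RecId 1..8).
dictionary = {1: "Larynx", 2: "Lip", 3: "Rectum", 4: "Colon", 5: "Mama"}
column = ["Colon", "Larynx", "Lip", "Colon", "Rectum", "Rectum", "Mama", "Colon"]

attribute_vector = encode_column(column, dictionary)
inverted_index = build_inverted_index(attribute_vector)
colon_records = count_matches(attribute_vector, dictionary, "Colon")
```

The scan in `count_matches` never decodes the column back to strings, which is what "work directly on compressed data" refers to.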

Data Partitioning
■  Horizontal Partitioning
□  Cut long tables into shorter segments
□  E.g. to group samples with same relevance
■  Vertical Partitioning
□  Split off columns to individual resources
□  E.g. to separate personalized data from experiment data
■  Partitioning is the basis for
□  Parallel execution of database queries
□  Implementation of data aging and data retention management
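Both partitioning schemes can be illustrated on a small in-memory table. The sketch below is a simplified Python example, not the database's mechanism: the column names, the sample rows, and the helper functions are hypothetical.

```python
def horizontal_partition(rows, key):
    """Cut a table into row segments, one per distinct partitioning key."""
    partitions = {}
    for row in rows:
        partitions.setdefault(key(row), []).append(row)
    return partitions


def vertical_partition(rows, columns, id_column="rec_id"):
    """Split off the given columns; both parts keep the record ID for re-joining."""
    split_off, remainder = [], []
    for row in rows:
        split_off.append(
            {c: v for c, v in row.items() if c == id_column or c in columns}
        )
        remainder.append(
            {c: v for c, v in row.items() if c == id_column or c not in columns}
        )
    return split_off, remainder


# Hypothetical sample table mixing personal and experiment data.
samples = [
    {"rec_id": 1, "patient": "P-001", "relevance": "high", "expression": 0.82},
    {"rec_id": 2, "patient": "P-002", "relevance": "low", "expression": 0.11},
    {"rec_id": 3, "patient": "P-003", "relevance": "high", "expression": 0.67},
]

# Horizontal: group samples with the same relevance.
by_relevance = horizontal_partition(samples, key=lambda row: row["relevance"])

# Vertical: separate personalized data from experiment data.
personal, experiment = vertical_partition(samples, columns={"patient"})
```

Each horizontal segment can be scanned by a separate worker, and the vertical split allows the personal columns to be stored on a different resource than the experiment columns.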

Multi-core and Parallelization
■  Modern server systems consist of x CPUs, e.g. 6
■  Each CPU consists of y CPU cores, e.g. 12
■  Consider each of the x*y CPU cores as an individual worker, e.g. 6x12 = 72 workers
■  Each worker performs one task at a time; all workers run in parallel
■  A full table scan of a database table with 1M entries takes 1/(x*y) of the sequential search time when traversed in parallel (ideal case)
□  Reduced response time
□  No need for pre-aggregated totals and redundant data
□  Improved usage of hardware
□  Instant analysis of data
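The partition, scan, and merge pattern behind this estimate can be sketched in Python. With 6x12 = 72 workers and 1M entries, each worker scans at most 13,889 entries. The function names and the synthetic column are hypothetical; note that Python threads share one interpreter lock, so this sketch shows the structure of a parallel scan and is not meant to demonstrate the speedup itself.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n_rows, n_workers):
    """Return (start, stop) ranges that cover n_rows in at most n_workers chunks."""
    if n_rows == 0:
        return []
    chunk = -(-n_rows // n_workers)  # ceiling division
    return [(start, min(start + chunk, n_rows)) for start in range(0, n_rows, chunk)]


def parallel_count(column, predicate, n_workers):
    """Count matching entries by scanning disjoint ranges with separate workers."""

    def scan(bounds):
        start, stop = bounds
        return sum(1 for i in range(start, stop) if predicate(column[i]))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker produces a partial count; the partial counts are summed.
        return sum(pool.map(scan, split(len(column), n_workers)))


column = [i % 100 for i in range(1_000_000)]  # synthetic column with 1M entries
n_workers = 6 * 12                            # x CPUs times y cores per CPU
ranges = split(len(column), n_workers)
matches = parallel_count(column, lambda value: value == 42, n_workers)
```

Because the result is computed by scanning the raw column on demand, no pre-aggregated total has to be stored and kept in sync.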

Where to find additional information?
■  Online: Visit we.analyzegenomes.com for latest research results, slides, videos, tools, and publications
■  Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014
■  In Person: Join us for the Symposium “Diagnostics in the Era of Big Data and Systems Medicine”, Oct 5-6, 2016 in Potsdam

Keep in contact with us!
Dr. Matthieu-P. Schapranow
Program Manager E-Health & Life Sciences
Hasso Plattner Institute
August-Bebel-Str. 88
14482 Potsdam, Germany
schapranow@hpi.de
http://we.analyzegenomes.com/