
In-Memory Data Management for Systems Medicine


This presentation provides a brief overview of how in-memory database technology can be applied to support systems medicine approaches. For that, it shares real-world experiences, e.g. from the SMART project consortium funded by the German Federal Ministry of Education and Research.

Published in: Health & Medicine


  1. In-Memory Data Management for Systems Medicine
     Dr. Matthieu-P. Schapranow
     e:Med Focus Workshop Data Management in Systems Medicine, Berlin, June 10, 2016
  2. App Example: Systems Medicine Model of Heart Failure (SMART)
     [Figure labels: Heart Failure; Sleeping disorder; Fibrosis; Blood pressure; Blood volume; Gene expression; Hypertrophy; Calcium metabolism; Energy metabolism; Iron deficiency; Vitamin-D deficiency; Gender; Epigenetics]
     ■ Integrated systems medicine based on real-time analysis of healthcare data
     ■ Initial funding period: Mar 2015 to Feb 2018
     ■ Funded consortium partners:
     Schapranow, e:Med Workshop, Jun 10, 2016, In-Memory Data Management for Systems Medicine
  3. Actors in Systems Medicine
     ■ Patients
       □ Individual anamnesis, family history, and background
       □ Require fast access to individualized therapy
     ■ Clinicians
       □ Identify root and extent of disease using laboratory tests
       □ Evaluate therapy alternatives, adapt existing therapy
     ■ Researchers
       □ Conduct laboratory work, e.g. analyze patient samples
       □ Create new research findings and come up with treatment alternatives
  6. IT Challenges: Distributed Heterogeneous Data Sources
     ■ Human genome/biological data: 600 GB per full genome; 15 PB+ in databases of leading institutes
     ■ Prescription data: 1.5B records from 10,000 doctors and 10M patients (100 GB)
     ■ Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov
     ■ Human proteome: 160M data points (2.4 GB) per sample; >3 TB raw proteome data in ProteomicsDB
     ■ PubMed database: >23M articles
     ■ Hospital information systems: often more than 50 GB
     ■ Medical sensor data: scan of a single organ in 1 s creates 10 GB of raw data
     ■ Cancer patient records: >160k records at NCT
  7. Our Methodology: Design Thinking
  8. Requirements Engineering for Systems Medicine: Computer-aided Systems Medicine Process
     ■ Joint process definition
     ■ Identification of long running steps
     ■ Aims
       □ Improved communication
       □ Sharing of data
       □ Reproducible data processing
     [Figure: process diagram "20160407_eCardiohealth_Whole_Process".
      Lanes: Heart Center (Study Assessor, Radiologist, Cardiologist, Surgeon); IT platform; Wet Lab; Bioinformatician; Proteomics Lab (Proteome Analyzer); Cardiomyocyte Modeler; Multi-scale modeller.
      Steps: Study Assessment; MRI; Hemodynamic Evaluation; Surgery; Update Notification; Data processing; Wet Lab Experiments; Validation; RNA Sequencing; Proteome Experiments; Cardiomyocyte Modeling; Multi-Scale Modeling.
      Data: MR Images; Patient Meta Data, Hemodynamic Parameters, and Clinical Data; SMART Data Storage; Wet Lab Results, e.g. Expression Data; FASTQ Files; Protein Expressions; Protein Expression Levels; Cardiomyocyte Electromechanical Model; Model output.
      Events and messages: Eligible Patient Available; Surgery Performed?; Message: Biopsy Sample; Condition: 20 Biopsy Samples for batch processing; Message: Post-surgery visit completed with data entry]
  9. Data Processing Pipelines: From Model to Execution
     1. Design time (researcher, process expert)
       □ Definition of parameterized process model
       □ Uses graphical editor and jobs from repository
     2. Configuration time (researcher, lab assistant)
       □ Select model and specify parameters, e.g. aln opts
       □ Results in model instance stored in repository
     3. Execution time (researcher)
       □ Select model instance
       □ Specify execution parameters, e.g. input files
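
The three phases on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: `ProcessModel`, `ModelInstance`, `configure`, `execute`, the job names, and the parameter value are all hypothetical and do not come from the presented platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessModel:
    """Design time: a parameterized model built from repository jobs (hypothetical)."""
    name: str
    jobs: tuple        # ordered job names
    parameters: tuple  # parameter names that must be set at configuration time


@dataclass(frozen=True)
class ModelInstance:
    """Configuration time: a model plus concrete parameter values."""
    model: ProcessModel
    settings: dict


def configure(model, **settings):
    # Reject instances that leave a declared parameter unset.
    missing = set(model.parameters) - set(settings)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return ModelInstance(model, dict(settings))


def execute(instance, input_files):
    # Execution time: one planned job run per (input file, job) pair.
    return [(job, instance.settings, path)
            for path in input_files
            for job in instance.model.jobs]


model = ProcessModel("alignment", ("align", "call_variants"), ("aln_opts",))
instance = configure(model, aln_opts="-t 8")
runs = execute(instance, ["sample1.fastq"])
```

Keeping the model, the instance, and the execution parameters separate means the same configured instance can be reused for new input files without touching the model.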
  10. Software Requirements in Systems Medicine
      ■ Requirements
        □ Real-time data analysis
        □ Maintained software
      ■ Restrictions
        □ Data privacy
        □ Data locality
        □ Volume of “big medical data”
      ■ Solution?
        □ Federated In-Memory Database System vs. Cloud Computing
  11. Where Are All Those Clouds Going?
      [Figure: Gartner's 2014 Hype Cycle for Emerging Technologies]
  12. Multiple Cloud Service Providers
      [Figure: Site A and Site B each run a Local System with Local Storage and a Local Synchronization Service. Each site connects to its own provider (Cloud Provider Site A, Cloud Provider Site B), which runs a Cloud Synchronization Service with Shared Cloud Storage.]
  13. A Single Service Provider
      [Figure: Site A and Site B both connect to one Cloud Provider, which runs the Cloud System with a Cloud Synchronization Service and Shared Cloud Storage.]
  14. Multiple Sites Forming the Federated In-Memory Database System
      [Figure: The Federated In-Memory Database System spans a Cloud IMDB Instance at the Cloud Provider (Master Data and Shared Algorithms) and a Local IMDB Instance at Site A and at Site B (Sensitive Data, e.g. Patient Data).]
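
One way to read this slide is that shared algorithms travel to the sites while sensitive data stays in the local instance. The sketch below illustrates that reading only; `Site`, `count_matching`, `federated_count`, and the sample records are hypothetical, and nothing here reflects the actual system's interfaces.

```python
def count_matching(records, diagnosis):
    """A 'shared algorithm': runs on local records, returns only an aggregate."""
    return sum(1 for record in records if record["diagnosis"] == diagnosis)


class Site:
    """A site holding sensitive records that never leave this object (assumption)."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def run(self, algorithm, *args):
        # Execute the shared algorithm locally and hand back its result only.
        return algorithm(self._records, *args)


def federated_count(sites, diagnosis):
    # Collect one aggregate per site; no patient-level rows are exchanged.
    return {site.name: site.run(count_matching, diagnosis) for site in sites}


sites = [
    Site("Site A", [{"diagnosis": "Colon"}, {"diagnosis": "Rectum"}, {"diagnosis": "Colon"}]),
    Site("Site B", [{"diagnosis": "Colon"}, {"diagnosis": "Lip"}]),
]
per_site = federated_count(sites, "Colon")
total = sum(per_site.values())
```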
  15. Real-time Analysis of Big Medical Data (we.analyzegenomes.com)
      ■ Apps: Drug Response Analysis, Pathway Topology Analysis, Medical Knowledge Cockpit, Oncolyzer, Clinical Trial Recruitment, Cohort Analysis, ...
      ■ In-Memory Database
        □ Extensions for Life Sciences
        □ Data Exchange, App Store
        □ Access Control, Data Protection
        □ Fair Use
        □ Statistical Tools
        □ Real-time Analysis
        □ App-spanning User Profiles
      ■ Combined and Linked Data: Genome Data, Cellular Pathways, Genome Metadata, Research Publications, Pipeline and Analysis Models, Drugs and Interactions
      ■ Indexed Sources
  16. Our Technology: In-Memory Database Technology
      ■ Combined column and row store
      ■ Map/Reduce
      ■ Single and multi-tenancy
      ■ Lightweight compression
      ■ Insert only for time travel
      ■ Real-time replication
      ■ Working on integers
      ■ SQL interface on columns and rows
      ■ Active/passive data store
      ■ Minimal projections
      ■ Group key
      ■ Reduction of software layers
      ■ Dynamic multi-threading
      ■ Bulk load of data
      ■ Object-relational mapping
      ■ Text retrieval and extraction engine
      ■ No aggregate tables
      ■ Data partitioning
      ■ Any attribute as index
      ■ No disk
      ■ On-the-fly extensibility
      ■ Analytics on historical data
      ■ Multi-core/parallelization
  17. Insert-Only / Append-Only
      ■ Traditional databases allow four data operations:
        □ INSERT, SELECT and
        □ DELETE, UPDATE
      ■ Insert-only requires only INSERT, SELECT to maintain a complete history (bookkeeping systems)
      ■ Insert-only enables time travelling, e.g. to
        □ Trace changes and reconstruct decisions
        □ Document complete history of changes, therapies, etc.
        □ Enable statistical observations
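
The insert-only idea can be made concrete with a minimal sketch. It assumes a simple logical clock and an in-memory list; `InsertOnlyTable`, `Version`, and the therapy values are hypothetical and are not taken from the presented database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    key: str
    value: Optional[str]  # None would mark a logical delete
    valid_from: int       # logical timestamp of the insert


class InsertOnlyTable:
    """Only INSERT and SELECT: every change appends a new version."""

    def __init__(self):
        self._rows = []
        self._clock = 0

    def insert(self, key, value):
        self._clock += 1
        self._rows.append(Version(key, value, self._clock))
        return self._clock

    def select(self, key, as_of=None):
        # The newest version at or before `as_of` wins (time travel).
        latest = None
        for row in self._rows:
            if row.key == key and (as_of is None or row.valid_from <= as_of):
                latest = row
        return latest.value if latest else None

    def history(self, key):
        return [(row.valid_from, row.value) for row in self._rows if row.key == key]


therapy = InsertOnlyTable()
t1 = therapy.insert("patient-1", "beta blocker")
t2 = therapy.insert("patient-1", "ACE inhibitor")
current = therapy.select("patient-1")
earlier = therapy.select("patient-1", as_of=t1)
```

Because nothing is overwritten, `history("patient-1")` still returns both therapy entries, which is what makes tracing changes and reconstructing decisions possible.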
  18. Lightweight Compression
      ■ Main memory access is the new bottleneck
      ■ Lightweight compression can reduce this bottleneck, i.e.
        □ Lossless
        □ Improved usage of data bus capacity
        □ Work directly on compressed data
      ■ Typical compression factor of 10:1 for enterprise software
      ■ In financial applications up to 50:1

      Table
      RecId   …       …        …
      1       C18.0   Colon    091487
      2       C32.0   Larynx   357982
      3       C00.9   Lip      123489
      4       C18.0   Colon    998711
      5       C20.0   Rectum   215678
      6       C20.0   Rectum   647912
      7       C50.9   Mamma    167898
      8       C18.0   Colon    646470
      …

      Attribute Vector
      RecId   ValueId
      1       C18.0
      2       C32.0
      3       C00.9
      4       C18.0
      5       C20.0
      6       C20.0
      7       C50.9
      8       C18.0

      Data Dictionary
      ValueId   Value
      1         Larynx
      2         Lip
      3         Rectum
      4         Colon
      5         Mamma

      Inverted Index
      ValueId   RecIdList
      1         2
      2         3
      3         5,6
      4         1,4,8
      5         7
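
Dictionary encoding is one common form of lightweight compression, and the structures named on this slide can be built in a few lines. The sketch below is an assumption-laden illustration: it assigns value IDs by sorted order starting at 0, so the IDs differ from the numbering on the slide, and the function names are hypothetical.

```python
from bisect import bisect_left


def dictionary_encode(column):
    """Return (dictionary, attribute_vector, inverted_index) for one column."""
    dictionary = sorted(set(column))                      # value_id -> value
    value_id = {value: i for i, value in enumerate(dictionary)}
    attribute_vector = [value_id[value] for value in column]
    inverted_index = {i: [] for i in range(len(dictionary))}
    for rec_id, vid in enumerate(attribute_vector, start=1):
        inverted_index[vid].append(rec_id)                # value_id -> record IDs
    return dictionary, attribute_vector, inverted_index


def find_records(dictionary, inverted_index, value):
    # Binary search in the sorted dictionary, then one index lookup;
    # the column itself is never decompressed.
    pos = bisect_left(dictionary, value)
    if pos < len(dictionary) and dictionary[pos] == value:
        return inverted_index[pos]
    return []


diagnoses = ["Colon", "Larynx", "Lip", "Colon", "Rectum", "Rectum", "Mamma", "Colon"]
dictionary, attribute_vector, inverted_index = dictionary_encode(diagnoses)
colon_records = find_records(dictionary, inverted_index, "Colon")
decoded = [dictionary[vid] for vid in attribute_vector]   # lossless round trip
```

The attribute vector stores small integers instead of repeated strings, which is where the reduced memory traffic comes from.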
  19. Data Partitioning
      ■ Horizontal Partitioning
        □ Cut long tables into shorter segments
        □ E.g. to group samples with same relevance
      ■ Vertical Partitioning
        □ Split off columns to individual resources
        □ E.g. to separate personalized data from experiment data
      ■ Partitioning is the basis for
        □ Parallel execution of database queries
        □ Implementation of data aging and data retention management
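
Both kinds of partitioning can be illustrated on a small list of rows. This is a sketch under assumptions: the column names, the shared `sample_id` key, and the helper functions are hypothetical and only serve to show the two cuts.

```python
def horizontal_partition(rows, num_partitions):
    """Cut a long table into contiguous segments of roughly equal length."""
    size = max(1, -(-len(rows) // num_partitions))  # ceiling division
    return [rows[start:start + size] for start in range(0, len(rows), size)]


def vertical_partition(rows, key, personal_columns):
    """Split columns into personal data and experiment data, both keeping the key."""
    personal, experiment = [], []
    for row in rows:
        personal.append({c: v for c, v in row.items()
                         if c == key or c in personal_columns})
        experiment.append({c: v for c, v in row.items()
                           if c == key or c not in personal_columns})
    return personal, experiment


rows = [
    {"sample_id": i, "patient_name": f"P{i}", "expression": 0.1 * i}
    for i in range(1, 6)
]
segments = horizontal_partition(rows, 2)
personal, experiment = vertical_partition(rows, "sample_id", {"patient_name"})
```

After the vertical split, the experiment partition carries no patient names, so it could be placed on a different resource than the personal partition and rejoined through `sample_id` when needed.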
  20. Multi-core and Parallelization
      ■ Modern server systems consist of x CPUs, e.g. 6
      ■ Each CPU consists of y CPU cores, e.g. 12
      ■ Consider each of the x*y CPU cores as an individual worker, e.g. 6x12 = 72
      ■ Each worker performs one task at a time, so x*y tasks run in parallel
      ■ A full table scan of a database table with 1M entries takes about 1/(x*y) of the sequential search time when traversed in parallel
        □ Reduced response time
        □ No need for pre-aggregated totals and redundant data
        □ Improved usage of hardware
        □ Instant analysis of data
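
The division of a full scan across x*y workers can be sketched as follows. The code is an illustration under assumptions: it uses Python threads from the standard library, which show how the work is split but do not deliver the ideal speedup for pure-Python predicates in CPython; `parallel_scan` and `scan_chunk` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(chunk, offset, predicate):
    """Scan one segment and return absolute positions of matching entries."""
    return [offset + i for i, value in enumerate(chunk) if predicate(value)]


def parallel_scan(column, predicate, workers):
    # Give each worker one contiguous segment of the column.
    size = max(1, -(-len(column) // workers))  # ceiling division
    chunks = [(column[start:start + size], start)
              for start in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: scan_chunk(c[0], c[1], predicate), chunks))
    return [position for part in parts for position in part]


cpus, cores_per_cpu = 6, 12
workers = cpus * cores_per_cpu                 # 72 workers
column = [i % 1000 for i in range(1_000_000)]  # 1M entries
rows_per_worker = -(-len(column) // workers)   # entries each worker scans
hits = parallel_scan(column, lambda value: value == 7, workers)
```

With 72 workers, each one scans at most 13,889 of the 1,000,000 entries, which is the 1/(x*y) share of work the slide refers to.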
  21. Where to find additional information?
      ■ Online: Visit we.analyzegenomes.com for latest research results, slides, videos, tools, and publications
      ■ Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014
      ■ In Person: Join us for the Symposium “Diagnostics in the Era of Big Data and Systems Medicine”, Oct 5-6, 2016 in Potsdam
  22. Keep in contact with us!
      Dr. Matthieu-P. Schapranow
      Program Manager E-Health & Life Sciences
      Hasso Plattner Institute
      August-Bebel-Str. 88
      14482 Potsdam, Germany
      schapranow@hpi.de
      http://we.analyzegenomes.com/
