A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis of Big Medical Data

The slide deck "A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data", presented on May 17, 2017 at the Intel Tech Talks hosted at SAPPHIRE 2017 in Orlando, FL, is now available online.

Published in: Technology

  1. A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data
     Dr.-Ing. Matthieu-P. Schapranow, Hasso Plattner Institute, Potsdam, Germany
     May 17, 2017
  2. Our Motivation: Turn Precision Medicine Into Clinical Routine
     ■ Can we enable clinicians to make their therapy decisions:
       □ Incorporating all available patient specifics,
       □ Referencing the latest lab results and worldwide medical knowledge, and
       □ In an interactive manner during their ward round?
     Slide footer (repeated on the following slides): Dr. Schapranow, Intel Tech Talk at SAPPHIRE, May 17, 2017. Analyze Genomes: A Federated In-Memory Database Computing Platform.
  3. [No text on this slide apart from the footer.]
  4. Our Vision: Medical Board Incorporating Latest Medical Knowledge
  5. Project Time Line
     [Timeline figure spanning 2009 to 2017. Labels: SAP HANA launched, Oncolyzer, SORMAS, Drug Response Analysis, Enterprise Software, Medical Knowledge Cockpit, Analyze Genomes Platform, IMDB Research.]
  6. The Challenge: Distributed Heterogeneous Data Sources
     ■ Human genome/biological data: 600 GB per full genome; 15 PB+ in databases of leading institutes
     ■ Prescription data: 1.5B records from 10,000 doctors and 10M patients (100 GB)
     ■ Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov
     ■ Human proteome: 160M data points (2.4 GB) per sample; >3 TB raw proteome data in ProteomicsDB
     ■ PubMed database: >23M articles
     ■ Hospital information systems: often more than 50 GB
     ■ Medical sensor data: a scan of a single organ in 1 s creates 10 GB of raw data
     ■ Cancer patient records: >160k records at NCT
  7. Software Requirements in Life Sciences
     ■ Requirements
       □ Managed services
       □ Reproducibility
       □ Real-time data analysis
     ■ Restrictions
       □ Data privacy
       □ Data locality
       □ Volume of big medical data
     Image source: http://stevedempsen.blogspot.de/2013/08/agile-software-requirements-comic.html
  8. Our Approach: AnalyzeGenomes.com, an In-Memory Computing Platform for Big Medical Data
     ■ In-Memory Database
  9. Our Approach: AnalyzeGenomes.com, an In-Memory Computing Platform for Big Medical Data
     ■ In-Memory Database
     ■ Combined and Linked Data (Indexed Sources)
       □ Genome Data
       □ Cellular Pathways
       □ Genome Metadata
       □ Research Publications
       □ Pipeline and Analysis Models
       □ Drugs and Interactions
  10. Our Approach: AnalyzeGenomes.com, an In-Memory Computing Platform for Big Medical Data
      ■ In-Memory Database
      ■ Extensions for Life Sciences
        □ Data Exchange, App Store
        □ Access Control, Data Protection
        □ Fair Use
        □ Statistical Tools
        □ Real-time Analysis
        □ App-spanning User Profiles
      ■ Combined and Linked Data (Indexed Sources)
        □ Genome Data
        □ Cellular Pathways
        □ Genome Metadata
        □ Research Publications
        □ Pipeline and Analysis Models
        □ Drugs and Interactions
  11. Our Approach: AnalyzeGenomes.com, an In-Memory Computing Platform for Big Medical Data
      ■ In-Memory Database
      ■ Extensions for Life Sciences
        □ Data Exchange, App Store
        □ Access Control, Data Protection
        □ Fair Use
        □ Statistical Tools
        □ Real-time Analysis
        □ App-spanning User Profiles
      ■ Combined and Linked Data (Indexed Sources)
        □ Genome Data
        □ Cellular Pathways
        □ Genome Metadata
        □ Research Publications
        □ Pipeline and Analysis Models
        □ Drugs and Interactions
      ■ Apps
        □ Drug Response Analysis
        □ Pathway Topology Analysis
        □ Medical Knowledge Cockpit
        □ Oncolyzer
        □ Clinical Trial Recruitment
        □ Cohort Analysis
        □ ...
  12. Our Technology: In-Memory Database Technology
      ■ Combined column and row store
      ■ Map/Reduce
      ■ Single and multi-tenancy
      ■ Lightweight compression
      ■ Insert only for time travel
      ■ Real-time replication
      ■ Working on integers
      ■ SQL interface on columns and rows
      ■ Active/passive data store
      ■ Minimal projections
      ■ Group key
      ■ Reduction of software layers
      ■ Dynamic multi-threading
      ■ Bulk load of data
      ■ Object-relational mapping
      ■ Text retrieval and extraction engine
      ■ No aggregate tables
      ■ Data partitioning
      ■ Any attribute as index
      ■ No disk
      ■ On-the-fly extensibility
      ■ Analytics on historical data
      ■ Multi-core/parallelization
  13. Scheduling and Execution of Genome Data Processing Pipelines
      Components shown: Webservice, Scheduler, Workers, In-Memory Database
      Steps:
        1. Trigger task execution
        2. Schedule subtasks
        3. Execute subtasks
      Tasks table (in the In-Memory Database):
        ID  Pipeline  Params
        12  BWA       xyz.fastq
        13  Stanford  A_1.fastq
        14  Bowtie    xyz.fastq
      Subtasks table (in the In-Memory Database):
        Task  ID  Job     Status  Params
        12    97  Split   done    xyz.fastq
        12    98  Import  todo    abc.vcf
        12    98  Import  done    abc.vcf
        ...
  14. Managed Services Provided by the Federated In-Memory Database System (FIMDB)
      [Architecture figure. Labels:
       sites: Cloud Service Provider (Shared Algorithms and Public Reference Data); Hospital or Research Department (Sensitive/Patient Data);
       nodes: Node i, Node j, Node k, Node m, Node n, ..., each with several Workers and an IMDB;
       roles: Scheduler, Relay;
       connections: VPN, UDP, TCP;
       storage: Shared File System (Pool), Shared File System (Pool), ..., Shared File System (Global).]
  15. Genome Data Processing Pipelines: State of the Art
      ■ Not standardized
      ■ Not exchangeable
      ■ Concatenation of bash scripts reading from and writing to files
      ■ Requires IT expertise for
        □ Setup,
        □ Error handling, and
        □ Efficient processing and parallelization
      ■ Objective: Model, configure, and execute pipelines without involving IT experts
      Example: bwa aln ref.fa sample.fastq | bwa samse ref.fa - sample.fastq | samtools view -Su - | samtools sort …
  16. Genome Data Processing Pipelines: Standardized Modeling
      ■ Graphical modeling notation
      ■ Compliant with BPMN 2.0, extended by
        □ Modular structure
        □ Degree of parallelization
        □ Parameters and variables
      ■ Model descriptions (XPDL) are stored in the IMDB
      ■ Model instances are transformed into a graph structure that is executed by our worker framework
  17. Genome Data Processing Pipelines: XML Process Definition Language
  18. Database Structure
      [Figure. Labels: PIPELINES.MODELS, PIPELINES.PIPELINES.]
  19. Genome Data Processing Pipelines: Traditional vs. Optimized Approach
      ■ Results are imported into the IMDB
      ■ Optimization reduced execution time by >50%
  20. Reproducibility: Modeling of Data Analysis Pipelines
      1. Design time (researcher, process expert)
         □ Definition of a parameterized process model
         □ Uses a graphical editor and jobs from the repository
      2. Configuration time (researcher, lab assistant)
         □ Select a model and specify parameters, e.g. aln opts
         □ Results in a model instance stored in the repository
      3. Execution time (researcher)
         □ Select a model instance
         □ Specify execution parameters, e.g. input files
  21. App Example: Medical Knowledge Cockpit for Patients and Clinicians
      ■ Query-oriented search interface
      ■ Seamless integration of patient specifics, e.g. from the EMR
      ■ Parallel search in international knowledge bases, e.g. for biomarkers, literature, cellular pathways, and clinical trials
  22. Medical Knowledge Cockpit for Patients and Clinicians: Pathway Topology Analysis
      ■ Today, search in pathways is limited to "is a certain element contained?"
      ■ Integrated >1.5k pathways from international sources, e.g. KEGG, HumanCyc, and WikiPathways, into HANA
      ■ Implemented graph-based topology exploration and ranking based on patient specifics
      ■ Enables interactive identification of possible dysfunctions affecting the course of a therapy before its start
      Figure captions: Unified access to multiple formerly disjoint data sources; Pathway analysis of genetic variants with graph engine
  23. Medical Knowledge Cockpit for Patients and Clinicians: Publications
      ■ Interactively explore relevant publications, e.g. PDFs
      ■ Improved ease of exploration, e.g. by highlighted medical terms and relevant concepts
  24. App Example: Real-time Assessment of Clinical Trial Candidates
      ■ Supports the trial design and recruitment process through statistical data analysis
      ■ Real-time matching and clustering of patients and clinical trial inclusion/exclusion criteria
      ■ Reassessment of already screened or participating citizens to reduce recruitment costs
      ■ Integrates smoothly with the [remainder missing in transcript]
      Figure caption: Real-time assessment of clinical trial candidates
  25. Where to find additional information?
      ■ Online: Visit we.analyzegenomes.com for the latest research results, slides, videos, tools, and publications
      ■ Offline: High-Performance In-Memory Genome Data Analysis: In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014
      ■ In person: Visit us at the HPI booth 200!
      ■ Join us for the Intel Tech Talks at SAPPHIRE, booth 669!
        □ May 17, 1:00 pm: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data
        □ May 18, 3:00 pm: In-Memory Apps For Precision Medicine
  26. Keep in contact with us!
      Dr. Matthieu-P. Schapranow
      Program Manager E-Health & Life Sciences
      Hasso Plattner Institute
      August-Bebel-Str. 88
      14482 Potsdam, Germany
      schapranow@hpi.de
      http://we.analyzegenomes.com/
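
Slide 13 describes three steps: a web service triggers a task, a scheduler splits it into subtasks, and workers execute them, with both tables held in the database. The following is a minimal sketch of that flow, not the platform's implementation. It assumes an in-memory SQLite database as a stand-in for the IMDB; the function names, the `JOBS` mapping, and the exact table layout are hypothetical, and only the column names and the job names `Split` and `Import` come from the slide.

```python
import sqlite3

# Assumption: each pipeline expands into a fixed list of jobs.
# "Split" and "Import" appear on the slide; the mapping itself is hypothetical.
JOBS = {"BWA": ["Split", "Import"]}


def setup():
    """Create the two tables from the slide in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, pipeline TEXT, params TEXT)"
    )
    db.execute(
        "CREATE TABLE subtasks (id INTEGER PRIMARY KEY, task INTEGER, "
        "job TEXT, status TEXT, params TEXT)"
    )
    return db


def trigger_task(db, pipeline, params):
    """Step 1: record a new task, as the web service would."""
    cursor = db.execute(
        "INSERT INTO tasks (pipeline, params) VALUES (?, ?)", (pipeline, params)
    )
    return cursor.lastrowid


def schedule_subtasks(db, task_id):
    """Step 2: expand a task into subtasks with status 'todo'."""
    pipeline, params = db.execute(
        "SELECT pipeline, params FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    for job in JOBS[pipeline]:
        db.execute(
            "INSERT INTO subtasks (task, job, status, params) "
            "VALUES (?, ?, 'todo', ?)",
            (task_id, job, params),
        )


def execute_next_subtask(db):
    """Step 3: take the oldest 'todo' subtask and mark it 'done'."""
    row = db.execute(
        "SELECT id, job FROM subtasks WHERE status = 'todo' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    subtask_id, job = row
    # A real worker would run the job here; this sketch only updates the status.
    db.execute("UPDATE subtasks SET status = 'done' WHERE id = ?", (subtask_id,))
    return job


db = setup()
task_id = trigger_task(db, "BWA", "xyz.fastq")
schedule_subtasks(db, task_id)

executed = []
while True:
    job = execute_next_subtask(db)
    if job is None:
        break
    executed.append(job)
```

Keeping the task state in database tables rather than in worker memory means any worker can pick up the next open subtask, which is one plausible reason for the design shown on the slide.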
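
Slide 16 states that model instances are transformed into a graph structure that the worker framework executes, and that the notation captures a degree of parallelization. A rough illustration of that idea, assuming a pipeline reduced to a plain dependency dictionary (the slide's actual format is XPDL, which is not parsed here) and using hypothetical job names:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline model: each job maps to the set of jobs it depends on.
model = {
    "split": set(),
    "align_1": {"split"},
    "align_2": {"split"},
    "merge": {"align_1", "align_2"},
    "import": {"merge"},
}


def execution_waves(graph):
    """Group jobs into waves; jobs in one wave have no open dependencies
    and could therefore run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(model)
```

Here `waves` places the two alignment jobs in the same wave, so a scheduler could hand them to two workers at once.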
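
Slide 20 separates design time, configuration time, and execution time. One way to picture the separation is as successive parameter binding, sketched below under stated assumptions: the class and function names are hypothetical, the command template is only formatted as a string and never run, and `-t 4` is an illustrative value for the "aln opts" mentioned on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessModel:
    """Design time: a parameterized process model (command templates)."""
    name: str
    steps: tuple


@dataclass(frozen=True)
class ModelInstance:
    """Configuration time: a model plus fixed parameter values."""
    model: ProcessModel
    parameters: dict


def configure(model, **parameters):
    """Bind configuration parameters, yielding a reusable model instance."""
    return ModelInstance(model, dict(parameters))


def execution_plan(instance, **execution_parameters):
    """Execution time: bind run-specific values such as input files."""
    values = {**instance.parameters, **execution_parameters}
    return [step.format(**values) for step in instance.model.steps]


# Design time
alignment = ProcessModel("alignment", ("bwa aln {aln_opts} ref.fa {input}",))
# Configuration time
instance = configure(alignment, aln_opts="-t 4")
# Execution time: the same instance applied to two different inputs
plan_a = execution_plan(instance, input="sample_a.fastq")
plan_b = execution_plan(instance, input="sample_b.fastq")
```

Because the configured instance is stored once and reused, both runs share identical options, which is the reproducibility property the slide is after.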
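
Slide 22 mentions graph-based topology exploration and ranking of pathways based on patient specifics. The slides do not say how the ranking is computed, so the sketch below uses an assumed heuristic: a pathway scores higher the more of its elements lie at or downstream of a gene affected in the patient. The pathway data, gene names, and scoring rule are all hypothetical.

```python
from collections import deque

# Hypothetical pathways as directed graphs: gene -> genes it acts on.
pathways = {
    "pathway_A": {"G1": ["G2"], "G2": ["G3"], "G3": []},
    "pathway_B": {"G4": ["G5"], "G5": [], "G1": []},
}


def downstream(graph, start):
    """Breadth-first search for all genes reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def rank_pathways(pathways, affected_genes):
    """Rank pathways by how many elements an affected gene can influence."""
    scores = {}
    for name, graph in pathways.items():
        reached = set()
        for gene in affected_genes:
            if gene in graph:
                reached |= {gene} | downstream(graph, gene)
        scores[name] = len(reached)
    # Highest score first; ties broken by name for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_pathways(pathways, {"G1"})
```

This goes one step beyond the containment check described on the slide: both pathways contain `G1`, but only the topology shows that a variant there affects more of `pathway_A`.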
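
Slide 24 describes matching patients against clinical trial inclusion and exclusion criteria. A minimal sketch of such a match, assuming criteria can be expressed as simple predicates over a patient record; the record fields, trial identifier, and criteria are invented for illustration and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    age: int
    diagnoses: frozenset


@dataclass
class Trial:
    trial_id: str
    inclusion: list = field(default_factory=list)  # all must hold
    exclusion: list = field(default_factory=list)  # none may hold


def is_candidate(patient, trial):
    """A patient qualifies if every inclusion rule and no exclusion rule holds."""
    included = all(rule(patient) for rule in trial.inclusion)
    excluded = any(rule(patient) for rule in trial.exclusion)
    return included and not excluded


def match(patients, trials):
    """Map each trial to the identifiers of its candidate patients."""
    return {
        trial.trial_id: [p.patient_id for p in patients if is_candidate(p, trial)]
        for trial in trials
    }


# Hypothetical example data
patients = [
    Patient("p1", 54, frozenset({"lung cancer"})),
    Patient("p2", 17, frozenset({"lung cancer"})),
    Patient("p3", 61, frozenset({"lung cancer", "renal failure"})),
]
trial = Trial(
    "trial_X",
    inclusion=[lambda p: p.age >= 18, lambda p: "lung cancer" in p.diagnoses],
    exclusion=[lambda p: "renal failure" in p.diagnoses],
)

candidates = match(patients, [trial])
```

Re-running `match` after new lab results or diagnoses arrive corresponds to the reassessment of already screened patients mentioned on the slide.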
