Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

This slide deck was presented on May 17 at the Intel Tech Talks hosted at SAPPHIRE 2016 in Orlando, FL.

Published in: Healthcare

  1. 1. Analyze Genomes: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data. Dr. Matthieu-P. Schapranow, SAPPHIRE, Orlando, USA, May 17, 2016
  2. 2. Where to find additional information? ■  Online: Visit we.analyzegenomes.com for the latest research results, slides, videos, tools, and publications ■  Offline: High-Performance In-Memory Genome Data Analysis: In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014 ■  In Person: Join us for Intel Tech Talks at SAPPHIRE booth 625 daily! □  May 17 12.30pm: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data □  May 18 12.30pm: In-Memory Apps for Next Generation Life Sciences Research □  May 19 11.30am: In-Memory Apps Supporting Precision Medicine (Slide footer: Schapranow, SAPPHIRE, May 17, 2016, A Federated In-Memory Database Computing Platform for Big Medical Data)
  3. 3. Intelligent Healthcare Networks in the 21st Century? (Diagram labels: Clinician, Patient, Researcher, Pharmaceutical Company, Healthcare Providers, Hospital, Research Center, Laboratory, Patient Advocacy Group; legend: Indirect Interaction, Direct Interaction)
  4. 4. Intelligent Healthcare Networks in the 21st Century? (Diagram labels: Clinician, Patient, Researcher, Pharmaceutical Company, Healthcare Providers, Hospital, Research Center, Laboratory, Patient Advocacy Group; legend: Indirect Interaction, Direct Interaction)
  5. 5. Intelligent Healthcare Networks in the 21st Century! (Diagram labels: Clinician, Patient, Researcher, Pharmaceutical Company, Healthcare Providers, Hospital, Research Center, Laboratory, Patient Advocacy Group; legend: Indirect Interaction, Direct Interaction)
  6. 6. The Setting: Actors in Oncology ■  Patients □  Individual anamnesis, family history, and background □  Require fast access to individualized therapy ■  Clinicians □  Identify root and extent of disease using laboratory tests □  Evaluate therapy alternatives, adapt existing therapy ■  Researchers □  Conduct laboratory work, e.g. analyze patient samples □  Create new research findings and come up with treatment alternatives
  7. 7. IT Challenges: Distributed Heterogeneous Data Sources ■  Human genome/biological data: 600GB per full genome, 15PB+ in databases of leading institutes ■  Prescription data: 1.5B records from 10,000 doctors and 10M patients (100GB) ■  Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov ■  Human proteome: 160M data points (2.4GB) per sample, >3TB raw proteome data in ProteomicsDB ■  PubMed database: >23M articles ■  Hospital information systems: often more than 50GB ■  Medical sensor data: scan of a single organ in 1s creates 10GB of raw data ■  Cancer patient records: >160k records at NCT
  8. 8. Our Approach: Analyze Genomes: Real-time Analysis of Big Medical Data (Diagram labels: Drug Response Analysis, Pathway Topology Analysis, Medical Knowledge Cockpit, Oncolyzer, Clinical Trial Recruitment, Cohort Analysis, ...; In-Memory Database; Extensions for Life Sciences; Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles; Combined and Linked Data; Genome Data; Cellular Pathways; Genome Metadata; Research Publications; Pipeline and Analysis Models; Drugs and Interactions; Indexed Sources)
  9. 9. Our Technology: In-Memory Database Technology ■  Combined column and row store ■  Map/Reduce ■  Single and multi-tenancy ■  Lightweight compression ■  Insert only for time travel ■  Real-time replication ■  Working on integers ■  SQL interface on columns and rows ■  Active/passive data store ■  Minimal projections ■  Group key ■  Reduction of software layers ■  Dynamic multi-threading ■  Bulk load of data ■  Object-relational mapping ■  Text retrieval and extraction engine ■  No aggregate tables ■  Data partitioning ■  Any attribute as index ■  No disk ■  On-the-fly extensibility ■  Analytics on historical data ■  Multi-core/parallelization
  10. 10. Where Are All Those Clouds Going? (Figure: Gartner's 2014 Hype Cycle for Emerging Technologies)
  11. 11. Software Requirements in Life Sciences ■  Requirements □  Real-time data analysis □  Maintained software ■  Restrictions □  Data privacy □  Data locality □  Volume of “big medical data” ■  Solution? □  Federated In-Memory Database System vs. Cloud Computing
  12. 12. Approach I: Multiple Cloud Service Providers (Diagram: Site A and Site B, each with Local System, Local Storage, and Local Synchronization Service; Cloud Provider Site A and Cloud Provider Site B, each with Cloud Synchronization Service and Shared Cloud Storage)
  13. 13. Approach II: A Single Service Provider (Diagram: Site A, Site B; Cloud Provider with Cloud System, Cloud Synchronization Service, and Shared Cloud Storage)
  14. 14. Multiple Sites Forming the Federated In-Memory Database System (FIMDB) (Diagram: Federated In-Memory Database System; Cloud Provider with Cloud IMDB Instance and Master Data and Shared Algorithms; Site A and Site B, each with Local IMDB Instance and Sensitive Data, e.g. Patient Data)
  15. 15. FIMDB: Cloud Service Provider ■  Change of cloud computing paradigm: Transfer (small) algorithms to (big) data ■  In-Memory Database (IMDB) □  Landscape of IMDB nodes □  Stored IMDB procedures and algorithms □  Master data for applications ■  In-Memory File System (IMDBfs) □  Integration of file-based tools □  Managed services directory □  OS binaries compiled and statically linked for individual platforms (Diagram labels: Cloud Service Provider; Site A; Site B; Federated In-Memory Database Instances FIMDB A.1 to A.5, FIMDB B.1 to B.3, FIMDB C.1; Federated In-Memory Database Instance, Algorithms, and Applications Managed by Service Provider; Master Data Managed by Service Provider; Sensitive Data reside at Site)
  16. 16. FIMDB: Setup of a New Client 1.  Establish site-to-site VPN connection between site and cloud service provider 2.  Mount remote services directory 3.  Install and configure local IMDB instance from services directory 4.  Subscribe to and configure selected managed services
  17. 17. FIMDB: Incorporating Local Compute Resources ■  Data partitioning protects sensitive data by storing it on local hardware resources only ■  Supports parallel query execution, i.e. reduced processing time ■  Efficient use of existing hardware resources
  18. 18. What to Take Home? Test it Yourself: AnalyzeGenomes.com ■  Brings algorithms to data ■  Forms a single database across individual sites and locations ■  Master data managed by service provider whilst sensitive data resides locally ■  Pros: Single database license; Easy to consume services; Query propagation by IMDB ■  Cons: Complex operation; Time-consuming infrastructure setup; Only a single source of truth
  19. 19. Where to find additional information? ■  Online: Visit we.analyzegenomes.com for the latest research results, slides, videos, tools, and publications ■  Offline: High-Performance In-Memory Genome Data Analysis: In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014 ■  In Person: Join us for Intel Tech Talks at SAPPHIRE booth 625 daily! □  May 17 12.30pm: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data □  May 18 12.30pm: In-Memory Apps for Next Generation Life Sciences Research □  May 19 11.30am: In-Memory Apps Supporting Precision Medicine
  20. 20. Keep in contact with us! Dr. Matthieu-P. Schapranow, Program Manager E-Health & Life Sciences, Hasso Plattner Institute, August-Bebel-Str. 88, 14482 Potsdam, Germany, schapranow@hpi.de, http://we.analyzegenomes.com/
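
The idea on slides 14 to 17 (transfer small algorithms to big data, keep sensitive data on local hardware, and run queries in parallel across sites) can be illustrated with a small Python sketch. This is not the Analyze Genomes implementation: the names `Site`, `count_variant`, and `federated_count` are hypothetical, the sample records are made up, and a thread pool stands in for whatever query propagation the real in-memory database performs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]
Algorithm = Callable[[List[Record]], int]


@dataclass
class Site:
    """Hypothetical stand-in for a local IMDB instance holding sensitive data."""
    name: str
    records: List[Record] = field(default_factory=list, repr=False)

    def run(self, algorithm: Algorithm) -> int:
        # The algorithm travels to the data; only its result leaves the site.
        return algorithm(self.records)


def count_variant(gene: str) -> Algorithm:
    """Build a small algorithm that counts records whose 'gene' equals `gene`."""
    def algorithm(records: List[Record]) -> int:
        return sum(1 for r in records if r.get("gene") == gene)
    return algorithm


def federated_count(sites: List[Site], gene: str) -> Dict[str, int]:
    """Run the same algorithm at every site in parallel; return per-site counts."""
    algorithm = count_variant(gene)
    # max(1, ...) keeps the pool valid even when no sites are registered.
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        results = list(pool.map(lambda site: site.run(algorithm), sites))
    return {site.name: n for site, n in zip(sites, results)}


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    site_a = Site("Site A", [{"patient": "a1", "gene": "BRCA1"},
                             {"patient": "a2", "gene": "TP53"}])
    site_b = Site("Site B", [{"patient": "b1", "gene": "BRCA1"}])
    per_site = federated_count([site_a, site_b], "BRCA1")
    print(per_site, "total:", sum(per_site.values()))
```

In this sketch the caller only ever sees per-site counts, never the patient records themselves, which mirrors the deck's point that sensitive data resides locally while shared algorithms are distributed to the sites.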
