A Federated In-Memory Database System for Life Sciences

Details about our Federated In-Memory Database (FIMDB) system supporting life sciences, as presented at BIRTE/VLDB 2015.

Published in: Technology

  1. A Federated In-Memory Database System for Life Sciences
     Dr. Matthieu-P. Schapranow, BIRTE/VLDB 2015, Kohala Coast, Hawai’i, HI, Aug 31, 2015
  2. Important things first: Where do you find additional information?
     ■ Online: Visit we.analyzegenomes.com for latest research results, tools, and news
     ■ Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014
     ■ In Person: Join us for “Bio Data World Congress”, Oct 21-22, 2015 in Cambridge, U.K.
     Schapranow, BIRTE/VLDB 2015, Aug 31, 2015. A Federated In-Memory Database System For Life Sciences
  3. The Setting: Actors in Oncology
     ■ Patients
       □ Individual anamnesis, family history, and background
       □ Require fast access to individualized therapy
     ■ Clinicians
       □ Identify root and extent of disease using laboratory tests
       □ Evaluate therapy alternatives, adapt existing therapy
     ■ Researchers
       □ Conduct laboratory work, e.g. analyze patient samples
       □ Create new research findings and come up with treatment alternatives
  4. Our Motivation: Enable Doctors to Use Precision Medicine
     ■ Can we enable doctors to:
       □ Select the best treatment options for their patients,
       □ Analyze the latest diagnostic data about a patient’s status, and
       □ Exchange knowledge with patients to improve their quality of life?
  5. Use Case: Identification of Best Treatment Option for Cancer Patient
     ■ Patient: 48 years, female, non-smoker, smoke-free environment
     ■ Diagnosis: Non-Small Cell Lung Cancer (NSCLC), stage IV
       1. Surgery to remove tumor
       2. Tumor sample is sent to laboratory to extract DNA
       3. DNA is sequenced, resulting in up to 750 GB of raw data per sample
       4. Processing of raw data to perform analysis
       5. Identification of relevant driver mutations using international medical knowledge
       6. Informed decision making
     Schapranow, Trends and Concepts Lecture, July 2, 2015. Turning Big Data into Precision Medicine
  6. Analyze Genomes: Real-time Analysis of Big Medical Data
     [Architecture diagram; labels as extracted]
     ■ Drug Response Analysis, Pathway Topology Analysis, Medical Knowledge Cockpit, Oncolyzer, Clinical Trial Recruitment, Cohort Analysis, ...
     ■ In-Memory Database; Extensions for Life Sciences; Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles; Combined and Linked Data
     ■ Genome Data; Cellular Pathways; Genome Metadata; Research Publications; Pipeline and Analysis Models; Drugs and Interactions; Indexed Sources
  7. Our Technology: In-Memory Database Technology
     [Feature overview diagram; labels as extracted]
     ■ Combined column and row store
     ■ Map/Reduce
     ■ Single and multi-tenancy
     ■ Lightweight compression
     ■ Insert only for time travel
     ■ Real-time replication
     ■ Working on integers
     ■ SQL interface on columns and rows
     ■ Active/passive data store
     ■ Minimal projections
     ■ Group key
     ■ Reduction of software layers
     ■ Dynamic multi-threading
     ■ Bulk load of data
     ■ Object-relational mapping
     ■ Text retrieval and extraction engine
     ■ No aggregate tables
     ■ Data partitioning
     ■ Any attribute as index
     ■ No disk
     ■ On-the-fly extensibility
     ■ Analytics on historical data
     ■ Multi-core/parallelization
  8. Software Requirements in Life Sciences
     ■ Requirements
       □ Real-time data analysis
       □ Maintained software
     ■ Restrictions
       □ Data privacy
       □ Data locality
       □ Volume of “big medical data”
     ■ Solution?
       □ Federated In-Memory Database System vs. Cloud Computing
  9. Federated In-Memory Database (FIMDB): Incorporating Local Compute Resources
     [Diagram; labels as extracted]
     ■ Locations: Site A, Site B, Cloud Service Provider
     ■ Instances: FIMDB A.1, A.2, A.3, A.4, A.5; FIMDB B.1, B.2, B.3; FIMDB C.1
     ■ Federated In-Memory Database Instance, Algorithms, and Applications Managed by Service Provider
     ■ Federated In-Memory Database Instances: Master Data Managed by Service Provider, Sensitive Data Reside at Site
  10. Where Are All Those Clouds Going?
     [Figure: Gartner's 2014 Hype Cycle for Emerging Technologies]
  11. Where Are All Those Clouds Going? (Excursus)
     ■ Three Cat IV hurricanes in the Pacific at the same time:
       □ Ignacio,
       □ Jimena, and
       □ Kilo.
     ■ Kilo (leftmost) and Ignacio (center) classified as Cat III by Aug 30, 2015
     ■ Ignacio will have passed the Hawai’i Big Island by Sep 2, 2015 (last updated Aug 30, 10pm)
     Source: http://www.weather.com/storms/hurricane/news/three-category-4-hurricanes-pacific-kilo-ignacio-jimena
  12. Multiple Cloud Service Providers
     [Diagram; labels as extracted]
     ■ Site A: Local System, Local Storage, Local Synchronization Service
     ■ Cloud Provider Site A: Cloud Synchronization Service, Shared Cloud Storage
     ■ Site B: Local System, Local Storage, Local Synchronization Service
     ■ Cloud Provider Site B: Cloud Synchronization Service, Shared Cloud Storage
  13. A Single Service Provider
     [Diagram; labels as extracted]
     ■ Site A, Site B
     ■ Cloud Provider: Cloud System, Cloud Synchronization Service, Shared Cloud Storage
  14. Multiple Sites Forming the Federated In-Memory Database System
     [Diagram; labels as extracted]
     ■ Federated In-Memory Database System
     ■ Cloud Provider: Cloud IMDB Instance, Master Data and Shared Algorithms
     ■ Site A: Local IMDB Instance, Sensitive Data, e.g. Patient Data
     ■ Site B: Local IMDB Instance, Sensitive Data, e.g. Patient Data
  15. Provided by the Cloud Service Provider
     ■ File System
       □ Managed services directory
       □ OS binaries statically compiled for individual platforms
     ■ Database
       □ In-memory database landscape
       □ Stored procedures and database algorithms
       □ Master application data
  16. Setup of a New Client
     1. Establish site-to-site VPN connection between site and cloud service provider
     2. Mount remote services directory
     3. Install and configure local IMDB instance from services directory
     4. Subscribe to and configure selected managed service
  17. Data Partitioning
     ■ Supports parallel query execution
     ■ Protects sensitive data
     ■ Brings algorithms to data
  18. Summary and Outlook
     ■ Test our services at we.analyzegenomes.com
     ■ FIMDB brings algorithms to data
     ■ Forms a single virtual database across sites and locations
     ■ Master data managed by service provider whilst sensitive data resides locally
     ■ Pros
       □ Single database license
       □ Easy to consume services
       □ Query propagation by IMDB
     ■ Cons
       □ Complex operation
       □ Complex one-time setup required
  19. Keep in contact with us!
     Dr. Matthieu-P. Schapranow
     Hasso Plattner Institute, August-Bebel-Str. 88, 14482 Potsdam, Germany
     schapranow@hpi.de
     http://we.analyzegenomes.com/
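
The "brings algorithms to data" idea from the Data Partitioning and Summary slides could be sketched roughly as follows. This is only an illustration under assumptions, not the FIMDB interface: the `Site` class, `run_local`, `federated_query`, `count_by_marker`, and the sample records are all hypothetical names and made-up values. The point it tries to show is that the algorithm is shipped to each site, runs next to the sensitive rows, and only aggregated results leave the site.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Aggregate = Dict[str, int]


@dataclass
class Site:
    """Hypothetical stand-in for one local in-memory database instance."""
    name: str
    # Sensitive rows stay inside the site object; callers never read them directly.
    _rows: List[Row] = field(default_factory=list, repr=False)

    def run_local(self, algorithm: Callable[[List[Row]], Aggregate]) -> Aggregate:
        # The algorithm executes where the data lives; only its result is returned.
        return algorithm(self._rows)


def count_by_marker(rows: List[Row]) -> Aggregate:
    """Shared algorithm: count records per marker without exposing patient ids."""
    counts: Aggregate = {}
    for row in rows:
        marker = str(row["marker"])
        counts[marker] = counts.get(marker, 0) + 1
    return counts


def federated_query(sites: List[Site],
                    algorithm: Callable[[List[Row]], Aggregate]) -> Aggregate:
    """Propagate the algorithm to every site and merge the partial aggregates."""
    merged: Aggregate = {}
    for site in sites:
        for key, value in site.run_local(algorithm).items():
            merged[key] = merged.get(key, 0) + value
    return merged


# Made-up example data: two sites, three records, placeholder marker labels.
site_a = Site("Site A", [{"patient_id": 1, "marker": "M1"},
                         {"patient_id": 2, "marker": "M2"}])
site_b = Site("Site B", [{"patient_id": 3, "marker": "M1"}])

result = federated_query([site_a, site_b], count_by_marker)
print(result)
```

In this sketch the partial results are merged sequentially; since each site holds its own partition, the per-site calls are independent and could equally be issued in parallel, which is the property the Data Partitioning slide points to.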
