Analyze Genomes: A Federated In-Memory Database System For Life Sciences

This presentation shows our latest research results on building a federated in-memory computing platform for life sciences: AnalyzeGenomes.com

Published in: Technology

  1. Analyze Genomes: A Federated In-Memory Database System For Life Sciences. Dr. Matthieu-P. Schapranow, HPI Future SOC Lab Day, Potsdam, Germany, Nov 4, 2015. Generously supported by [sponsor logos]
  2. Important things first: Where do you find additional information? ■ Online: Visit we.analyzegenomes.com for latest research results, tools, and news ■ Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014 ■ In Person: Join us for “Festival of Genomics”, Jan 19-21, 2016, in London, UK (Schapranow/Perscheid, FSOC Lab Day, Nov 4, 2015, A Federated In-Memory Database System For Life Sciences)
  3. The Setting: Actors in Oncology ■ Patients □ Individual anamnesis, family history, and background □ Require fast access to individualized therapy ■ Clinicians □ Identify root and extent of disease using laboratory tests □ Evaluate therapy alternatives, adapt existing therapy ■ Researchers □ Conduct laboratory work, e.g. analyze patient samples □ Create new research findings and come up with treatment alternatives
  4. IT Challenges: Distributed Heterogeneous Data Sources ■ Human genome/biological data: 600 GB per full genome, 15 PB+ in databases of leading institutes ■ Prescription data: 1.5B records from 10,000 doctors and 10M patients (100 GB) ■ Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov ■ Human proteome: 160M data points (2.4 GB) per sample, >3 TB raw proteome data in ProteomicsDB ■ PubMed database: >24M articles ■ Hospital information systems: often more than 50 GB ■ Medical sensor data: scan of a single organ in 1 s creates 10 GB of raw data ■ Cancer patient records: >160k records at NCT
  5. Software Requirements in Life Sciences ■ Requirements □ Real-time data analysis □ Maintained software ■ Restrictions □ Data privacy □ Data locality □ Volume of “big medical data” ■ Solution? □ Federated In-Memory Database System vs. Cloud Computing
  6. Where are all those clouds going? [Figure: Gartner's 2014 Hype Cycle for Emerging Technologies]
  7. Multiple Cloud Service Providers [Diagram: Site A and Site B each show a Local System, Local Storage, and a Local Synchronization Service; Cloud Provider Site A and Cloud Provider Site B each show a Cloud Synchronization Service and Shared Cloud Storage] (Schapranow, BIRTE/VLDB 2015, Aug 31, 2015)
  8. Federated In-Memory Database (FIMDB): Incorporating Local Compute Resources ■ Aim: Provision of managed Analyze Genomes services while sensitive data remains local ■ Process steps □ Connect existing resources to join federated database landscape □ Install workers on local nodes to process sensitive data and store results in local DB instances [Diagram: FIMDB instances A.1 to A.5, B.1 to B.3, and C.1 across Site A, Site B, and the Cloud Service Provider; labels: “Federated In-Memory Database Instance, Algorithms, and Applications Managed by Service Provider”, “Master Data Managed by Service Provider”, “Sensitive Data reside at Site”]
  9. Analyze Genomes: Real-time Analysis of Big Medical Data [Architecture diagram] ■ Applications: Drug Response Analysis, Pathway Topology Analysis, Medical Knowledge Cockpit, Oncolyzer, Clinical Trial Recruitment, Cohort Analysis, ... ■ In-Memory Database: Extensions for Life Sciences; Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles ■ Data (Combined and Linked Data; Indexed Sources): Genome Data, Cellular Pathways, Genome Metadata, Research Publications, Pipeline and Analysis Models, Drugs and Interactions
  10. Use Case: Identification of Best Treatment Option for Cancer Patient ■ Patient: 48 years, female, non-smoker, smoke-free environment ■ Diagnosis: Non-Small Cell Lung Cancer (NSCLC), stage IV 1. Surgery to remove tumor 2. Tumor sample is sent to laboratory to extract DNA 3. DNA is sequenced, resulting in up to 750 GB of raw data per sample 4. Processing of raw data to perform analysis 5. Identification of relevant driver mutations using international medical knowledge 6. Informed decision making
  11. From Raw Genome Data to Analysis ■ Sequencing: Acquire digital DNA data ■ Alignment: Reconstruction of complete genome from snippets ■ Variant Calling: Identification of genetic variants ■ Data Annotation: Linking genetic variants with research findings
  12. Standardized Modeling of Genome Data Analysis Pipelines ■ Graphical modeling of analysis pipelines □ Supports reproducible research □ BPMN-2.0-compliant ■ Extension of modeling notation by □ Modular structure □ Degree of parallelization □ Parameters/variables ■ Pipelines stored in IMDB and executed through our worker framework
  13. Execution of Genome Data Analysis Pipelines ■ Dedicated scheduler for optimized pipeline execution □ Assigns tasks to workers □ Recovery of pipeline status ■ Scheduler uses IMDB logs for workload estimation ■ Different scheduling algorithms available, e.g. □ High Throughput □ Priority First □ User-/Group-based [Diagram: IMDB, Scheduler, and four Workers, with labels Pipeline Tasks, Pipeline Subtasks, Events, and Data]
  14. Real-time Analysis of Genetic Variants ■ Genome Browser enables detailed exploration of genome loci and their associations ■ Ranks variants according to known diseases ■ Integrates latest international medical knowledge, annotations, and literature ■ Provides links back to primary data sources, e.g. EBI, NCBI, dbSNP, and UCSC
  15. Medical Knowledge Cockpit ■ Uses patient specifics to provide more adequate results ■ Immediate exploration of relevant information, e.g. □ Gene descriptions □ Molecular impact and related pathways □ Scientific publications □ Suitable clinical trials ■ Translates manual searching for hours or days into finding
  16. Drug Response Analysis ■ Incorporate knowledge about historic cases to optimize treatment of current cases ■ Enables real-time exploration of xenograft experiments ■ Configurable medical model to predict drug response
  17. Join us for upcoming projects! ■ Global Medical Knowledge (Master’s project) ■ Detect cardiovascular diseases and evaluate treatment options (DHZB) ■ Use health insurance data to improve health care research (AOK) ■ Pharmacogenetics (Bayer) ■ Generously supported by [sponsor logos] Interdisciplinary Design Thinking Teams: You?
  18. What to Take Home? Test it Yourself: AnalyzeGenomes.com ■ For patients □ Identify relevant clinical trials and medical experts □ Become an informed patient ■ For clinicians □ Identify pharmacokinetic correlations □ Scan for similar patient cases, e.g. to evaluate therapy efficiency ■ For researchers □ Enable real-time analysis of medical data, e.g. assess pathways to identify impact of detected variants □ Combined mining in structured and unstructured data, e.g. publications, diagnosis, and EMR data
  19. Keep in contact with us! Hasso Plattner Institute, Enterprise Platform & Integration Concepts (EPIC), August-Bebel-Str. 88, 14482 Potsdam, Germany. Dr. Matthieu-P. Schapranow, Program Manager E-Health, schapranow@hpi.de. Cindy Perscheid, Research Assistant, cindy.perscheid@hpi.de
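
Slide 13 mentions a scheduler that assigns pipeline tasks to workers and offers a "Priority First" policy. The slides do not show how this is implemented, so the following is only a minimal sketch of what such a policy could look like. The class name `PriorityFirstScheduler`, the round-robin worker choice, the rule that a lower number means higher urgency, and the task and worker names are all assumptions made for illustration, not part of Analyze Genomes.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Task:
    priority: int                      # assumption: lower value = more urgent
    seq: int                           # tie-breaker: submission order
    name: str = field(compare=False)   # not used for ordering


class PriorityFirstScheduler:
    """Hypothetical scheduler: most urgent task first, workers in rotation."""

    def __init__(self, workers):
        self.workers = list(workers)
        self._queue = []               # min-heap of Task objects
        self._seq = count()

    def submit(self, name, priority):
        heapq.heappush(self._queue, Task(priority, next(self._seq), name))

    def dispatch(self):
        """Drain the queue and return (worker, task name) pairs in dispatch order."""
        assignments = []
        slot = 0
        while self._queue:
            task = heapq.heappop(self._queue)
            worker = self.workers[slot % len(self.workers)]
            assignments.append((worker, task.name))
            slot += 1
        return assignments


# Illustrative task names loosely following the pipeline stages on slide 11.
scheduler = PriorityFirstScheduler(["worker-1", "worker-2"])
scheduler.submit("alignment", priority=2)
scheduler.submit("variant_calling", priority=3)
scheduler.submit("urgent_alignment", priority=1)
plan = scheduler.dispatch()
```

Here `plan` lists the urgent task first even though it was submitted last. A real scheduler would also need the status recovery and log-based workload estimation named on the slide, which this sketch leaves out.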

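Slide 8 describes the federated idea: workers run on local nodes so that sensitive data stays at its site. One possible reading, sketched below under stated assumptions, is that each site computes an aggregate locally and only that aggregate leaves the site. The function names `local_worker` and `federated_counts`, the record layout, and the sample records are invented for illustration and do not come from the presentation.

```python
from collections import Counter

# Made-up example records; in a federated setup these would never leave the site.
SITE_DATA = {
    "site_a": [
        {"patient": "a1", "gene": "EGFR"},
        {"patient": "a2", "gene": "KRAS"},
    ],
    "site_b": [
        {"patient": "b1", "gene": "EGFR"},
    ],
}


def local_worker(records):
    """Runs at the site: returns per-gene counts only, no patient-level rows."""
    return Counter(record["gene"] for record in records)


def federated_counts(sites):
    """Coordinator: merges the aggregates returned by each site's worker."""
    total = Counter()
    for records in sites.values():
        total += local_worker(records)
    return total


result = federated_counts(SITE_DATA)
```

In this sketch the coordinator sees only the merged counts in `result`, never the `patient` field. In a real deployment the call to `local_worker` would be a remote invocation on the site's own node rather than an in-process function call.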