Big Data in Genomics: Opportunities and Challenges

Slides from the 2015 Bio Data World Congress showing how our analyzegenomes.com services combine to support precision medicine in modern oncology treatment.

Published in: Technology

  1. Big Data in Genomics: Opportunities and Challenges. Dr. Matthieu-P. Schapranow, Bio Data World Congress, Cambridge, UK, Oct 22, 2015
  2. Important things first: Where do you find additional information?
     ■ Online: Visit we.analyzegenomes.com for latest research results, tools, and news
     ■ Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014
     ■ In Person: Join us for “Festival of Genomics” Jan 19-21, 2016 in London, UK
     Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015. Big Data in Genomics: Opportunities and Challenges
  3. What is the Hasso Plattner Institute, Potsdam, Germany?
     Schapranow, HPI, Oct 13, 2015. Analyze Genomes: A Federated In-Memory Database Computing Platform
  4. Hasso Plattner Institute: Enterprise Platform and Integration Concepts Group
     Prof. Dr. h.c. Hasso Plattner
     ■ Research focuses on the technical aspects of enterprise software and design of complex applications
       □ In-Memory Data Management for Enterprise Applications
       □ Enterprise Application Programming Model
       □ Scientific Data Management
       □ Human-Centered Software Design and Engineering
     ■ Industry cooperations, e.g. SAP, Siemens, Audi, and EADS
     ■ Research cooperations, e.g. Stanford, MIT, and Berkeley
     Partner of Stanford Center for Design Research; Partner of MIT in Supply Chain Innovation and CSAIL; Partner at UC Berkeley RAD / AMP Lab; Partner of SAP AG
  5. With whom are you dealing?
     ■ Since 2009 Program Manager E-Health & Life Sciences
     ■ 2006-2014 Strategic Projects SAP HANA
     ■ Visiting Scientist at V.A., Boston, MA and Charité, Berlin
     ■ Software Engineer by training (PhD, M.Sc., B.Sc.)
  6. The Setting: Actors in Oncology
     ■ Patients
       □ Individual anamnesis, family history, and background
       □ Require fast access to individualized therapy
     ■ Clinicians
       □ Identify root and extent of disease using laboratory tests
       □ Evaluate therapy alternatives, adapt existing therapy
     ■ Researchers
       □ Conduct laboratory work, e.g. analyze patient samples
       □ Create new research findings and come up with treatment alternatives
  7. IT Challenges: Distributed Heterogeneous Data Sources
     ■ Human genome/biological data: 600GB per full genome, 15PB+ in databases of leading institutes
     ■ Prescription data: 1.5B records from 10,000 doctors and 10M patients (100GB)
     ■ Clinical trials: Currently more than 30k recruiting on ClinicalTrials.gov
     ■ Human proteome: 160M data points (2.4GB) per sample, >3TB raw proteome data in ProteomicsDB
     ■ PubMed database: >23M articles
     ■ Hospital information systems: Often more than 50GB
     ■ Medical sensor data: Scan of a single organ in 1s creates 10GB of raw data
     ■ Cancer patient records: >160k records at NCT
  8. Our Approach. Analyze Genomes: Real-time Analysis of Big Medical Data
     [Architecture diagram; labels: In-Memory Database; Extensions for Life Sciences; Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles; Combined and Linked Data; Genome Data; Cellular Pathways; Genome Metadata; Research Publications; Pipeline and Analysis Models; Drugs and Interactions; Drug Response Analysis; Pathway Topology Analysis; Medical Knowledge Cockpit; Oncolyzer; Clinical Trial Recruitment; Cohort Analysis; ...; Indexed Sources]
  9. Case Vignette
     ■ Patient: 48 years, female, non-smoker, smoke-free environment
     ■ Diagnosis: Non-Small Cell Lung Cancer (NSCLC), stage IV
     ■ Markers: KRAS, EGFR, BRAF, NRAS, (ERBB2)
     ■ Initial treatment: Surgery
     ■ Therapy: Palliative chemotherapy
  10. Cloud-based Services for Processing of DNA Data
      ■ Control center for processing of raw DNA data, such as FASTQ, SAM, and VCF
      ■ Personal user profile guarantees privacy of uploaded and processed data
      ■ Supports reproducible research process by storing all relevant process parameters
      ■ Implements prioritized data processing and fair use, e.g. per department or per institute
      ■ Supports additional services, such as data annotations, billing, and sharing for all Analyze Genomes services
      ■ Honored by the 2014 European Life Science Award
      Standardized modeling and runtime environment for analysis pipelines
  11. Medical Knowledge Cockpit for Patients and Clinicians: Linking Patient Specifics with International Knowledge
      ■ Query-oriented search interface
      ■ Seamless integration of patient specifics, e.g. from EMR
      ■ Parallel search in international knowledge bases, e.g. for biomarkers, literature, cellular pathways, and clinical trials
  12. Medical Knowledge Cockpit for Patients and Clinicians
      ■ Search for affected genes in distributed and heterogeneous data sources
      ■ Immediate exploration of relevant information, such as
        □ Gene descriptions,
        □ Molecular impact and related pathways,
        □ Scientific publications, and
        □ Suitable clinical trials.
      ■ No manual searching for hours or days: In-memory technology translates searching into interactive finding!
      Automatic clinical trial matching built on text analysis features
      Unified access to structured and unstructured data sources
  13. Medical Knowledge Cockpit for Patients and Clinicians: Pathway Topology Analysis
      ■ Search in pathways is limited to “is a certain element contained” today
      ■ Integrated >1.5k pathways from international sources, e.g. KEGG, HumanCyc, and WikiPathways, into HANA
      ■ Implemented graph-based topology exploration and ranking based on patient specifics
      ■ Enables interactive identification of possible dysfunctions affecting the course of a therapy before its start
      Unified access to multiple formerly disjoint data sources
      Pathway analysis of genetic variants with graph engine
  14. Real-time Data Analysis and Interactive Exploration: Drug Response Analysis Data Sources
      ■ Patient-specific Data: Smoking status, tumor classification and age (1MB - 100MB)
      ■ Tumor-specific Data: Raw DNA data and genetic variants (100MB - 1TB)
      ■ Compound Interaction Data: Medication efficiency and wet lab results (10MB - 1GB)
  15. [Slide without text content]
  16. Showcase [screenshot labels: “Calculating Drug Response…”, “Predict Drug Response”]
  17. Cetuximab might be more beneficial for the current case
  18. Our Methodology: Design Thinking
  19. Our Methodology: Design Thinking
      Desirability
      ■ Portfolio of integrated services for clinicians, researchers, and patients
      ■ Include latest treatment options, e.g. most effective therapies
      Viability
      ■ Enable precision medicine also in far-off regions and developing countries
      ■ Involve world-wide experts (cost-saving)
      ■ Combine latest international data (publications, annotations, genome data)
      Feasibility
      ■ HiSeq 2500 enables high-coverage whole genome sequencing in 20h
      ■ IMDB enables allele frequency determination of 12B records within <1s
      ■ Cloud-based data processing services reduce TCO
  20. Our Technology: In-Memory Database Technology
      Combined column and row store; Map/Reduce; Single and multi-tenancy; Lightweight compression; Insert only for time travel; Real-time replication; Working on integers; SQL interface on columns and rows; Active/passive data store; Minimal projections; Group key; Reduction of software layers; Dynamic multi-threading; Bulk load of data; Object-relational mapping; Text retrieval and extraction engine; No aggregate tables; Data partitioning; Any attribute as index; No disk; On-the-fly extensibility; Analytics on historical data; Multi-core/parallelization
  21. What to Take Home? Test it Yourself: AnalyzeGenomes.com
      ■ For patients
        □ Identify relevant clinical trials and medical experts
        □ Become an informed patient
      ■ For clinicians
        □ Identify pharmacokinetic correlations
        □ Scan for similar patient cases, e.g. to evaluate therapy efficiency
      ■ For researchers
        □ Enable real-time analysis of medical data, e.g. assess pathways to identify impact of detected variants
        □ Combined mining in structured and unstructured data, e.g. publications, diagnosis, and EMR data
  22. Keep in contact with us!
      Dr. Matthieu-P. Schapranow, Program Manager E-Health
      Hasso Plattner Institute, Enterprise Platform & Integration Concepts (EPIC)
      August-Bebel-Str. 88, 14482 Potsdam, Germany
      schapranow@hpi.de
      http://we.analyzegenomes.com/
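
Slide 13 contrasts a plain containment search ("is a certain element contained") with graph-based topology exploration and ranking based on patient specifics. The slides do not describe the actual ranking used in Analyze Genomes or HANA, so the following is only a minimal sketch of the general idea under stated assumptions: the pathways, their edges, and the scoring rule (fraction of pathway elements that are mutated or downstream of a mutated gene) are all hypothetical and chosen for illustration.

```python
from collections import deque

# Hypothetical toy pathways as directed graphs: "A": ["B"] means A signals to B.
# Names and edges are illustrative only, not taken from KEGG, HumanCyc, or WikiPathways.
PATHWAYS = {
    "toy_pathway_A": {"EGFR": ["KRAS"], "KRAS": ["BRAF"], "BRAF": ["MEK"], "MEK": ["ERK"], "ERK": []},
    "toy_pathway_B": {"EGFR": ["PI3K"], "PI3K": ["AKT"], "AKT": [], "KRAS": []},
    "toy_pathway_C": {"TP53": ["MDM2"], "MDM2": []},
}


def downstream(graph, start):
    """Return all elements reachable from start via breadth-first search (start excluded)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def rank_pathways(pathways, patient_genes):
    """Assumed scoring rule: share of pathway elements that are mutated or downstream of a mutation."""
    scores = []
    for name, graph in pathways.items():
        nodes = set(graph) | {n for targets in graph.values() for n in targets}
        hits = patient_genes & nodes
        if not hits:
            continue  # containment test alone: the pathway is not touched at all
        affected = set(hits)
        for gene in hits:
            affected |= downstream(graph, gene)
        scores.append((name, len(affected) / len(nodes)))
    # Highest share first; ties broken by name for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


ranking = rank_pathways(PATHWAYS, {"KRAS"})
print(ranking)
```

A containment search would report both toy pathways A and B as equally relevant for a KRAS variant, whereas the topology-aware score separates them by how much of each pathway lies downstream of the variant.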

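
Slides 19 and 20 mention allele frequency determination on an in-memory database together with "Lightweight compression" and "Working on integers". How the platform implements this is not stated in the slides; the sketch below only illustrates the general technique of dictionary-encoding a column and aggregating over the resulting integers. The genotype values, the helper names `dictionary_encode` and `alt_allele_frequency`, and the diploid assumption are all hypothetical.

```python
from collections import Counter


def dictionary_encode(values):
    """Replace each distinct value by a small integer ID (a simple dictionary encoding)."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[v] for v in values]


# Hypothetical genotype calls at one variant site, one entry per sample.
# The "0/1" strings follow VCF genotype notation: 0 = reference allele, 1 = alternate allele.
genotypes = ["0/0", "0/1", "1/1", "0/1", "0/0", "0/0", "0/1", "1/1"]

dictionary, column = dictionary_encode(genotypes)


def alt_allele_frequency(dictionary, column):
    """Compute the alternate allele frequency by aggregating over the integer column."""
    # Alternate-allele count per dictionary entry, computed once per distinct value.
    alt_per_id = [entry.split("/").count("1") for entry in dictionary]
    counts = Counter(column)  # the scan itself touches only integers
    alt = sum(alt_per_id[i] * n for i, n in counts.items())
    total = 2 * len(column)  # assumption: diploid samples, two alleles each
    return alt / total


freq = alt_allele_frequency(dictionary, column)
print(dictionary, column, freq)
```

The string parsing happens once per distinct genotype rather than once per record, which is the point of operating on the encoded column.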