"When time matters..."

This slide deck was presented at the 2017 Festival of Genomics in London, UK. It shows how the latest in-memory database technology supports clinicians in finding the best treatment options by incorporating genetic data.

Published in: Health & Medicine

  1. 1. AnalyzeGenomes.com: When time matters… Dr.-Ing. Matthieu-P. Schapranow, Festival of Genomics, London, UK, Jan 31, 2017
  2. 2. What is the Hasso Plattner Institute, Potsdam, Germany? Dr. Schapranow, Festival of Genomics, Jan 31, 2017 When time matters... 2
  3. 3. From Raw Genome Data to Analysis ■ DNA Sequencing: Transformation of analog DNA into digital format ■ Alignment: Reconstruction of the complete genome from snippets ■ Variant Calling: Identification of genetic variants ■ Data Annotation: Linking genetic variants with research findings
  4. 4. 1. DNA Sequencing ■ Purpose: Transformation of analog DNA into digital format (A/D converter) ■ Input: Chunks of DNA ■ Output: DNA reads in digital form, e.g. in FASTQ format 4peaks.app
  5. 5. 1. Output of Sequencing ■ FASTQ format used for further processing ■ One read is a 4-tuple of: 1. Sequence identifier / description 2. Raw sequence 3. Strand / direction 4. Quality values per sequenced base
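The four fields per read listed above can be illustrated with a small parser. This is a minimal sketch, not the platform's code: the names `FastqRead` and `parse_fastq` are made up for illustration, and it assumes the common four-line FASTQ layout, in which the third line is a "+" separator (the slide labels this field strand / direction) and qualities are Phred scores encoded with an ASCII offset of 33.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FastqRead:
    identifier: str        # line 1: sequence identifier / description, without "@"
    sequence: str          # line 2: raw sequence
    separator: str         # line 3: "+" line (assumed layout, see lead-in)
    qualities: List[int]   # line 4: one quality value per sequenced base


def parse_fastq(text: str, phred_offset: int = 33) -> Iterator[FastqRead]:
    """Yield one FastqRead per four-line record (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per read")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRead(
            identifier=header[1:],
            sequence=sequence,
            separator=separator,
            qualities=[ord(ch) - phred_offset for ch in quality],
        )
```

Calling `list(parse_fastq(text))` on the contents of a small FASTQ file returns one `FastqRead` per read; real files are usually large and compressed, so a production reader would stream them instead of loading the text at once.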
  6. 6. 2. Alignment Overview ■ Purpose: Mapping of DNA reads to a reference ■ Input: □ DNA reads := Sequence of nucleotides with a length of 100 bp up to some 1 kbp □ Reference genome := Blueprint for alignment of DNA reads ■ Output: Mapped DNA reads ■ Bear in mind: □ Less fragmentation of DNA reads, i.e. longer reads, allows more precise alignment □ A reference of the same origin improves mapping quality
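Why longer reads map more precisely can be shown with a toy example. The sketch below is an assumption-laden illustration, not a real aligner: `find_positions` is a hypothetical helper that only finds exact matches, whereas production aligners use indexed references and tolerate mismatches and gaps.

```python
from typing import List


def find_positions(reference: str, read: str) -> List[int]:
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


reference = "ACGTACGTTT"
short_hits = find_positions(reference, "ACG")    # short read: ambiguous placement
long_hits = find_positions(reference, "ACGTT")   # longer read: unique placement
```

In this toy reference the three-base read fits at two positions while the five-base read fits at only one, which is the effect the "Bear in mind" note describes.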
  7. 7. 3. Variant Calling Overview ■ Purpose: Variant Calling := Detect variations within a genome ■ Input: □ Mapped DNA reads, i.e. output of alignment process □ Reference genome ■ Output: List of variants ■ Bear in mind: □ Read depth at pos i := Number of nucleotides storing information about pos i
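Read depth and a very naive form of variant calling can be sketched as follows. This is an illustration under simplifying assumptions, not a real caller: reads are given as hypothetical `(start, sequence)` pairs with 0-based start positions, they map without gaps and lie fully inside the reference, and the thresholds `min_depth` and `min_fraction` are arbitrary example values.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

MappedRead = Tuple[int, str]  # (0-based start position, read sequence)


def pileup(mapped_reads: Iterable[MappedRead]) -> Dict[int, Counter]:
    """Count the observed bases at every covered reference position."""
    columns: Dict[int, Counter] = defaultdict(Counter)
    for start, sequence in mapped_reads:
        for offset, base in enumerate(sequence):
            columns[start + offset][base] += 1
    return columns


def read_depth(mapped_reads: Iterable[MappedRead]) -> Dict[int, int]:
    """Read depth at position i: number of read bases covering i."""
    return {pos: sum(c.values()) for pos, c in pileup(mapped_reads).items()}


def call_variants(reference: str, mapped_reads: Iterable[MappedRead],
                  min_depth: int = 2, min_fraction: float = 0.5
                  ) -> List[Tuple[int, str, str, int]]:
    """Report (pos, ref, alt, depth) where a non-reference base dominates."""
    variants = []
    columns = pileup(mapped_reads)
    for pos in sorted(columns):
        counts = columns[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]
        alt_base, alt_count = max(
            ((b, n) for b, n in counts.items() if b != ref_base),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if alt_base and depth >= min_depth and alt_count / depth >= min_fraction:
            variants.append((pos, ref_base, alt_base, depth))
    return variants
```

Low read depth makes a call less trustworthy, which is why the sketch refuses to report a variant below `min_depth`; real callers replace the fixed thresholds with statistical models over base and mapping qualities.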
  8. 8. 4. Genetic Annotations ■ Purpose: □ Assess impact of genetic changes □ Understand gene function and possible medical therapy options ■ Input: List of genetic variants ■ Output: Details about certain genetic locus ■ Example variant record:
      CHROM  POS        ID           REF  ALT  QUAL  FILTER  INFO  FORMAT  SAMPLE
      chr7   140753336  rs113488022  T    A    61    PASS    NS=1  GT      0/1
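The variant record shown on this slide follows the column layout of the VCF format. The following sketch parses such a line into named fields; `VcfRecord`, `parse_vcf_line`, and `is_heterozygous` are hypothetical names, and the parser assumes exactly one sample column and splits on any whitespace (real VCF files are tab-separated and carry header lines that this sketch ignores).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: str
    qual: Optional[float]
    filter_status: str
    info: Dict[str, str]
    sample: Dict[str, str]   # FORMAT keys mapped to the sample's values


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one data line with the ten columns shown on the slide."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("expected 10 columns (single-sample record)")
    chrom, pos, vid, ref, alt, qual, flt, info, fmt, sample = fields
    info_map = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_map[key] = value
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        variant_id=vid,
        ref=ref,
        alt=alt,
        qual=None if qual == "." else float(qual),
        filter_status=flt,
        info=info_map,
        sample=dict(zip(fmt.split(":"), sample.split(":"))),
    )


def is_heterozygous(record: VcfRecord) -> bool:
    """True if the GT field names two different, known alleles."""
    alleles = record.sample.get("GT", "").replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


record = parse_vcf_line("chr7 140753336 rs113488022 T A 61 PASS NS=1 GT 0/1")
```

For the slide's record, the genotype `0/1` means one reference allele (T) and one alternate allele (A) at position 140753336 on chr7, so `is_heterozygous(record)` is true.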
  9. 9. 4. Challenges Today ■ Manual, time-consuming Internet search, e.g. for publications, annotations, guidelines ■ International consortia provide fragmented information ■ Missing links across individual data sources ■ Annotations vary in completeness and correctness
  10. 10. 4. Interpretation of Annotations: BRAF Gene (dbSNP) ■ https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs113488022
  11. 11. 4. Interpretation of Annotations: BRAF Gene (KEGG)
  12. 12. 4. Interpretation of Annotations: BRAF Gene (GeneCards)
  13. 13. 4. Interpretation of Annotations: BRAF Gene (PubMed)
  14. 14. Simplified Clinical Oncology Process (1/2)
  15. 15. Simplified Clinical Oncology Process (2/2)
  16. 16. Our Motivation: Turn Precision Medicine Into Clinical Routine ■ Can we enable clinicians to make their therapy decisions: □ Incorporating all available patient specifics, □ Referencing latest lab results and worldwide medical knowledge, and □ In an interactive manner during their ward round?
  17. 17. Use Case: Precision Oncology. Identification of Best Treatment Option for Cancer Patient ■ Patient: 48 years, female, non-smoker, smoke-free environment ■ Diagnosis: Non-Small Cell Lung Cancer (NSCLC), stage IV ■ Markers: KRAS, EGFR, BRAF, NRAS, (ERBB2) 1. Remove tumor through surgery 2. Send tumor sample to laboratory for DNA extraction 3. Sequence complete DNA of sample, resulting in 750 GB of raw genome data 4. Process raw genome data, e.g. alignment, variant calling, and annotation 5. Identify relevant variants using international medical knowledge 6. Support decision making, e.g. link to de-identified historic cases
  18. 18. Dr. Schapranow, Festival of Genomics, Jan 31, 2017 When time matters... 18
  19. 19. Our Vision: Medical Board Incorporating Latest Medical Knowledge
  20. 20. The Challenge: Distributed Heterogeneous Data Sources ■ Human genome/biological data: 600 GB per full genome, 15 PB+ in databases of leading institutes ■ Prescription data: 1.5B records from 10,000 doctors and 10M patients (100 GB) ■ Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov ■ Human proteome: 160M data points (2.4 GB) per sample, >3 TB raw proteome data in ProteomicsDB ■ PubMed database: >23M articles ■ Hospital information systems: often more than 50 GB ■ Medical sensor data: scan of a single organ in 1 s creates 10 GB of raw data ■ Cancer patient records: >160k records at NCT
  21. 21. Software Requirements in Life Sciences ■ Requirements □ Managed services □ Reproducibility □ Real-time data analysis ■ Restrictions □ Data privacy □ Data locality □ Volume of big medical data http://stevedempsen.blogspot.de/2013/08/agile-software-requirements-comic.html
  22. 22. Project Time Line (figure: timeline from 2009 to 2017 with the labels SAP HANA launched, Oncolyzer, SORMAS, Drug Response Analysis, Medical Knowledge Cockpit, Analyze Genomes Platform, IMDB Research, Enterprise Software)
  23. 23. Our Approach: AnalyzeGenomes.com. In-Memory Computing Platform for Big Medical Data ■ In-Memory Database
  24. 24. Our Approach: AnalyzeGenomes.com. In-Memory Computing Platform for Big Medical Data ■ In-Memory Database ■ Combined and Linked Data: Genome Data, Cellular Pathways, Genome Metadata, Research Publications, Pipeline and Analysis Models, Drugs and Interactions ■ Indexed Sources
  25. 25. Our Approach: AnalyzeGenomes.com. In-Memory Computing Platform for Big Medical Data ■ In-Memory Database ■ Extensions for Life Sciences: Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles ■ Combined and Linked Data: Genome Data, Cellular Pathways, Genome Metadata, Research Publications, Pipeline and Analysis Models, Drugs and Interactions ■ Indexed Sources
  26. 26. Our Approach: AnalyzeGenomes.com. In-Memory Computing Platform for Big Medical Data ■ In-Memory Database ■ Extensions for Life Sciences: Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles ■ Combined and Linked Data: Genome Data, Cellular Pathways, Genome Metadata, Research Publications, Pipeline and Analysis Models, Drugs and Interactions ■ Apps: Drug Response Analysis, Pathway Topology Analysis, Medical Knowledge Cockpit, Oncolyzer, Clinical Trial Recruitment, Cohort Analysis, ... ■ Indexed Sources
  27. 27. Real-Time Data Analysis: In-Memory Database Technology ■ Combined column and row store ■ Map/Reduce ■ Single and multi-tenancy ■ Lightweight compression ■ Insert only for time travel ■ Real-time replication ■ Working on integers ■ SQL interface on columns and rows ■ Active/passive data store ■ Minimal projections ■ Group key ■ Reduction of software layers ■ Dynamic multi-threading ■ Bulk load of data ■ Object-relational mapping ■ Text retrieval and extraction engine ■ No aggregate tables ■ Data partitioning ■ Any attribute as index ■ No disk ■ On-the-fly extensibility ■ Analytics on historical data ■ Multi-core/parallelization
  28. 28. Managed Services provided by Federated In-Memory Database System (FIMDB) (figure labels: Nodes i, j, k, m, n, each with several Workers and an IMDB; Scheduler; Relay; Cloud Service Provider (Shared Algorithms and Public Reference Data); Hospital or Research Department (Sensitive/Patient Data); VPN, UDP, TCP; Shared File System (Pool); Shared File System (Global))
  29. 29. From Raw Genome Data to Analysis ■ DNA Sequencing: Transformation of analog DNA into digital format ■ Alignment: Reconstruction of the complete genome from snippets ■ Variant Calling: Identification of genetic variants ■ Data Annotation: Linking genetic variants with research findings
  30. 30. Reproducibility: Modeling of Data Analysis Pipelines 1. Design time (researcher, process expert) □ Definition of parameterized process model □ Uses graphical editor and jobs from repository 2. Configuration time (researcher, lab assistant) □ Select model and specify parameters, e.g. aln opts □ Results in model instance stored in repository 3. Execution time (researcher) □ Select model instance □ Specify execution parameters, e.g. input files
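The three phases on this slide can be sketched as plain data structures. This is a hypothetical illustration of the idea, not the Analyze Genomes implementation: the class and function names, the job names, and the parameter name `aln_opts` are assumptions, and `execute` only builds a run record instead of starting any job.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProcessModel:
    """Design time: a parameterized pipeline assembled from repository jobs."""
    name: str
    jobs: Tuple[str, ...]
    parameter_names: Tuple[str, ...]


@dataclass(frozen=True)
class ModelInstance:
    """Configuration time: a model with every parameter bound to a value."""
    model: ProcessModel
    parameters: Tuple[Tuple[str, str], ...]


def configure(model: ProcessModel, **values: str) -> ModelInstance:
    """Bind parameters; reject missing or unknown names."""
    missing = set(model.parameter_names) - set(values)
    unknown = set(values) - set(model.parameter_names)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    return ModelInstance(model, tuple(sorted(values.items())))


def execute(instance: ModelInstance, input_files: List[str]) -> Dict[str, object]:
    """Execution time: return a record of everything needed to repeat the run."""
    return {
        "model": instance.model.name,
        "jobs": list(instance.model.jobs),
        "parameters": dict(instance.parameters),
        "input_files": list(input_files),
    }


model = ProcessModel("toy_pipeline", ("alignment", "variant_calling"), ("aln_opts",))
instance = configure(model, aln_opts="placeholder-options")
run = execute(instance, ["sample.fastq"])
```

Keeping the model, its bound parameters, and the input files in one stored record is what makes a run repeatable later, which is the reproducibility argument of the slide.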
  31. 31. App Example: Cloud-based Services for Processing of DNA Data. Standardized modeling and runtime environment for analysis pipelines ■ Control center for processing of raw DNA data, such as FASTQ, SAM, and VCF ■ Personal user profile guarantees privacy of uploaded and processed data ■ Supports reproducible research process by storing all relevant process parameters ■ Implements prioritized data processing and fair use, e.g. per department or per institute ■ Supports additional services, such as data annotations, billing, and sharing for all Analyze Genomes services ■ Honored by the 2014 European Life Science Award
  32. 32. Real-time Data Analysis and Interactive Exploration. App Example: Identification of Optimal Chemotherapy ■ Patient-specific data: smoking status, tumor classification, and age (1 MB - 100 MB) ■ Tumor-specific data: raw DNA data and genetic variants (100 MB - 1 TB) ■ Compound interaction data: medication efficiency and wet lab results (10 MB - 1 GB) ■ Honored by the 2015 PerMediCon Award
  33. 33. Dr. Schapranow, Festival of Genomics, Jan 31, 2017 When time matters... 33
  34. 34. Showcase: Predict Drug Response (Calculating Drug Response…)
  35. 35. Showcase result: cetuximab might be more beneficial for the current case
  36. 36. App Example: Medical Knowledge Cockpit for Patients and Clinicians ■ Query-oriented search interface ■ Seamless integration of patient specifics, e.g. from EMR ■ Parallel search in international knowledge bases, e.g. for biomarkers, literature, cellular pathways, and clinical trials
  37. 37. Medical Knowledge Cockpit for Patients and Clinicians: Pathway Topology Analysis ■ Search in pathways is limited today to “is a certain element contained” ■ Integrated >1.5k pathways from international sources, e.g. KEGG, HumanCyc, and WikiPathways, into HANA ■ Implemented graph-based topology exploration and ranking based on patient specifics ■ Enables interactive identification of possible dysfunctions affecting the course of a therapy before its start ■ Unified access to multiple formerly disjoint data sources ■ Pathway analysis of genetic variants with graph engine
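Graph-based topology exploration goes beyond a containment check by following the edges of a pathway. The sketch below is a simplified stand-in, not the HANA graph engine: pathways are hypothetical adjacency dictionaries, the two toy pathways are invented for illustration, and the ranking simply counts how many elements a patient's variant genes can reach.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Pathway = Dict[str, List[str]]  # element -> directly downstream elements


def downstream(pathway: Pathway, start: str) -> Set[str]:
    """Breadth-first search for all elements reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in pathway.get(node, ()):
            if target not in seen and target != start:
                seen.add(target)
                queue.append(target)
    return seen


def rank_pathways(pathways: Dict[str, Pathway],
                  variant_genes: Iterable[str]) -> List[Tuple[str, int]]:
    """Rank pathways by the number of elements affected by the variant genes."""
    genes = list(variant_genes)
    scores = []
    for name, pathway in pathways.items():
        members = set(pathway) | {t for targets in pathway.values() for t in targets}
        affected: Set[str] = set()
        for gene in genes:
            if gene in members:
                affected.add(gene)
                affected |= downstream(pathway, gene)
        scores.append((name, len(affected)))
    return sorted(scores, key=lambda item: (-item[1], item[0]))


toy_pathways = {
    "toy_signaling": {"KRAS": ["BRAF"], "BRAF": ["MAP2K1"], "MAP2K1": ["MAPK1"]},
    "toy_other": {"TP53": ["CDKN1A"]},
}
ranking = rank_pathways(toy_pathways, ["BRAF"])
```

A containment search would only report that BRAF occurs in the first toy pathway; the traversal additionally shows which downstream elements a BRAF variant could affect, and the ranking puts that pathway first.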
  38. 38. Medical Knowledge Cockpit for Patients and Clinicians: Publications ■ Interactively explore relevant publications, e.g. PDFs ■ Improved ease of exploration, e.g. by highlighted medical terms and relevant concepts
  39. 39. Real-time Assessment of Clinical Trial Candidates. Assessment of patients' preconditions for clinical trials ■ Switch from trial-centric to patient-centric clinical trials ■ Real-time matching and clustering of patients and clinical trial inclusion/exclusion criteria ■ No manual pre-screening of patients for months: in-memory technology enables an interactive pre-screening process ■ Reassessment of already screened or already participating patients reduces recruitment costs
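Matching patients against inclusion and exclusion criteria can be sketched as a simple filter. Everything here is hypothetical: the `Patient` and `Trial` fields, the criteria, and the sample data are invented for illustration, and real trial criteria are far richer and often written as free text.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Patient:
    patient_id: str
    age: int
    diagnosis: str
    markers: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Trial:
    trial_id: str
    diagnosis: str
    min_age: int = 0
    max_age: int = 200
    required_markers: FrozenSet[str] = frozenset()   # inclusion criteria
    excluded_markers: FrozenSet[str] = frozenset()   # exclusion criteria


def is_candidate(patient: Patient, trial: Trial) -> bool:
    """True if the patient meets all inclusion and no exclusion criteria."""
    return (
        patient.diagnosis == trial.diagnosis
        and trial.min_age <= patient.age <= trial.max_age
        and trial.required_markers <= patient.markers
        and not (trial.excluded_markers & patient.markers)
    )


def match_trials(patients: List[Patient], trials: List[Trial]) -> Dict[str, List[str]]:
    """Patient-centric view: for each patient, the trials they qualify for."""
    return {
        p.patient_id: [t.trial_id for t in trials if is_candidate(p, t)]
        for p in patients
    }


patients = [Patient("P1", 48, "NSCLC", frozenset({"EGFR"}))]
trials = [
    Trial("T1", "NSCLC", min_age=18, required_markers=frozenset({"EGFR"})),
    Trial("T2", "NSCLC", excluded_markers=frozenset({"EGFR"})),
    Trial("T3", "Melanoma"),
]
matches = match_trials(patients, trials)
```

Because the result is keyed by patient rather than by trial, rerunning `match_trials` whenever a patient's data or a trial's criteria change gives the reassessment described on the slide.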
  40. 40. What to Take Home? ■ For patients □ Identify relevant clinical trials and medical experts □ Become an informed patient ■ For clinicians □ Identify pharmacokinetic correlations □ Scan for similar patient cases, e.g. to evaluate therapy efficiency ■ For researchers □ Enable real-time analysis of medical data, e.g. assess pathways to identify impact of detected variants □ Combined mining in structured and unstructured data, e.g. publications, diagnosis, and EMR data ■ Learn more and test-drive it yourself: AnalyzeGenomes.com
  41. 41. Keep in contact with us! Dr.-Ing. Matthieu-P. Schapranow, Program Manager E-Health & Life Sciences, Hasso Plattner Institute, August-Bebel-Str. 88, 14482 Potsdam, Germany, schapranow@hpi.de, http://we.analyzegenomes.com/
