Real-time Analysis of Next Generation Sequencing Data

  1. Real-time Analysis of Next Generation Sequencing Data. World Health Summit, Oct 24, 2012. Prof. Dr. Christoph Meinel, Matthieu Schapranow, Hasso Plattner Institute
  2. Genome Sequencing: Do you have enough time? (Image taken from http://portal.ccg.uni-koeln.de/ccg/assets/images/3730.jpg) Real-time Analysis of NGS Data, Prof. Dr. Meinel, Schapranow, World Health Summit, Oct 24, 2012
  3. The Archon Genomics X Prize: “$10 million will be awarded to the first team to rapidly, accurately and economically sequence 100 whole human genomes to an unprecedented level of accuracy.” (Archon Genomics X Prize, 2012)
  4. Agenda
     ■  Conventional Medicine
     ■  Personalized Medicine
     ■  Challenges of Genome Data Analysis
     ■  High-Performance In-Memory Genome Project
  5. Conventional Medicine
     [Bar chart, 0% to 100%: women and men who will develop cancer versus those who will never develop cancer (American Cancer Society, Surveillance Research, 2012); chemotherapies that fail versus those that work.]
  6. Personalized Medicine: “Personalized medicine aims at treating patients specifically based on their individual dispositions, e.g. genetic or environmental factors” (K. Jain, Textbook of Personalized Medicine. Springer, 2009)
     Enhanced by: World-wide medical research activities
     Limiting Factor: Research results heterogeneously formatted in distributed databases
  7. Personalized Medicine
     Diagram labels: Patient suffering from Cancer; Conventional Therapy; DNA Sequencing; Analysis of Genomic Data; Treatment Decision.
     DNA Sequencing
     •  Quantity: 3.2 Billion Base Pairs
     •  Data Size: 1-20 GB
     Analysis of Genomic Data
     •  Quantity:
        •  Known Mutations: 80M
        •  Distinct Genes: 20k-25k
        •  Proteins: 50k-300k
     •  Data Sizes:
        •  Alignment: 5-10 GB
        •  Variants: 10-100 GB
     [Bar chart, Duration [Days], axis 0 to 40: Personalized Medicine "As of Today" versus "Supported by HPI".]
  8. Challenges of Genome Data Analysis: Analysis of Genomic Data
     Step: Alignment and Variant Calling | Analysis of Annotations in World-wide DBs
     Bound To: CPU Performance | Memory Capacity
     Duration: Hours | Weeks
     HPI: Minutes | Real-time
     In-Memory Technology: Multi-Core | Partitioning & Compression
  10. High-Performance In-Memory Genome Project: Real-time Analysis of Genome Data
  11. High-Performance In-Memory Genome Project: It is your time
     ■  ~10G of FASTQ files, i.e. ~45M reads, from the 1000 Genomes Project
     ■  ~400k-700k variants detected
     ■  ~45 min for alignment and variant calling
     ■  Analysis of result: Interactive exploration in real-time
     Aligners named on the slide: BWA, Bowtie, Bowtie2, TMAP
  12. High-Performance In-Memory Genome Project: Architecture
  13. High-Performance In-Memory Genome Project: In-Memory Technology
     [Diagram: Read Event Repositories (up to 8,000 read event notifications per second), Verification Services (up to 2,000 requests per second), Discovery Service, SAP HANA.]
     Feature tiles shown on the slide:
     ■  Combined column and row store
     ■  Minimal projections
     ■  Any attribute as index
     ■  Insert only for time travel
     ■  Bulk load
     ■  Multi-core/parallelization
     ■  Active/passive data store
     ■  Partitioning
     ■  Lightweight Compression
     ■  Dynamic multi-threading within nodes
     ■  Analytics on historical data
     ■  SQL interface on columns & rows
     ■  No aggregate tables
     ■  Single and multi-tenancy
     ■  Reduction of layers
     ■  Object to relational mapping
     ■  On-the-fly extensibility
     ■  Text Retrieval and Extraction
     ■  Map reduce
     ■  Group Key
     ■  No disk
  14. High-Performance In-Memory Genome Project: Hardware Characteristics at FSOC-Lab
     ■  1,000-core cluster at Hasso Plattner Institute with 25 TB main memory
     ■  Consists of 25 nodes, each:
        □  40 cores
        □  1 TB main memory
        □  Intel® Xeon® E7-4870
        □  2.40 GHz
        □  30 MB Cache
  15. What to take home?
     ■  Sequencing machines become faster, smaller, and cheaper, and generate immense data sets in heterogeneous formats
     ■  IT technology is the key to exploring and analyzing these big data sets
     ■  Parallelization reduces the processing time for genome data
     ■  In-memory technology enables real-time analysis and interactive exploration of genome data
     ■  We integrate research results from international research databases in a single knowledge base
     “Let’s identify genomic roots and optimal treatments before the patient wakes up from anaesthesia”
  16. Thank you for your interest! Keep in contact with us.
     Prof. Dr. Christoph Meinel, office-meinel@hpi.uni-potsdam.de, http://www.hpi.uni-potsdam.de/meinel/team/christoph_meinel.html
     Matthieu-P. Schapranow, M.Sc., schapranow@hpi.uni-potsdam.de, http://j.mp/schapranow
     Hasso Plattner Institute, Enterprise Platform & Integration Concepts, Matthieu-P. Schapranow, August-Bebel-Str. 88, 14482 Potsdam, Germany
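
The figures quoted on slides 7, 11 and 14 can be put side by side with a little arithmetic. The sketch below is only a back-of-the-envelope check, not part of the HPI pipeline. Two inputs are assumptions that the slides do not state: "~10G" is read as 10 decimal gigabytes, and the packed genome size assumes 2 bits per base (A, C, G, T only, with no quality scores). The function names are hypothetical.

```python
from typing import Tuple

# Figures quoted on the slides.
BASE_PAIRS = 3_200_000_000      # slide 7: 3.2 billion base pairs
READS = 45_000_000              # slide 11: ~45M reads
PIPELINE_SECONDS = 45 * 60      # slide 11: ~45 min for alignment and variant calling
NODES = 25                      # slide 14: 25 nodes
CORES_PER_NODE = 40             # slide 14: 40 cores per node
TB_PER_NODE = 1                 # slide 14: 1 TB main memory per node

# Assumptions that are NOT stated on the slides.
FASTQ_BYTES = 10 * 10**9        # "~10G" read as 10 decimal gigabytes
BITS_PER_BASE = 2               # A, C, G, T only; no N, no quality scores


def packed_genome_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store the bases alone at a fixed number of bits each."""
    return (base_pairs * bits_per_base + 7) // 8


def throughput(reads: int, fastq_bytes: int, seconds: int) -> Tuple[float, float]:
    """Average reads per second and bytes per second over the whole run."""
    return reads / seconds, fastq_bytes / seconds


def cluster_totals(nodes: int, cores_per_node: int, tb_per_node: int) -> Tuple[int, int]:
    """Total core count and total main memory (TB) across all nodes."""
    return nodes * cores_per_node, nodes * tb_per_node


if __name__ == "__main__":
    reads_per_s, bytes_per_s = throughput(READS, FASTQ_BYTES, PIPELINE_SECONDS)
    cores, memory_tb = cluster_totals(NODES, CORES_PER_NODE, TB_PER_NODE)
    print(f"packed genome: {packed_genome_bytes(BASE_PAIRS) / 1e9:.1f} GB")
    print(f"throughput: {reads_per_s:,.0f} reads/s, {bytes_per_s / 1e6:.1f} MB/s")
    print(f"cluster: {cores} cores, {memory_tb} TB main memory")
```

Under these assumptions the bare base sequence packs into 0.8 GB, below the 1-20 GB range on slide 7, which suggests the slide's data sizes include more than the packed bases. The per-node figures on slide 14 multiply out to the stated 1,000 cores and 25 TB.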

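Slide 13 lists a column store and "Lightweight Compression" among the in-memory features. One common lightweight scheme for columns is dictionary encoding, where each distinct value is stored once and every row holds only a small integer code. The sketch below illustrates that general idea in plain Python; it is not SAP HANA's implementation, and the class name `DictionaryColumn` and the chromosome example values are hypothetical.

```python
from typing import Dict, List


class DictionaryColumn:
    """A single in-memory column stored as integer codes plus a value dictionary."""

    def __init__(self) -> None:
        self._values: List[str] = []      # code -> distinct value
        self._codes: Dict[str, int] = {}  # distinct value -> code
        self._rows: List[int] = []        # one code per row, in insertion order

    def append(self, value: str) -> None:
        """Append a row (rows are only ever added, never updated in place)."""
        code = self._codes.get(value)
        if code is None:
            # First time this value is seen: assign the next free code.
            code = len(self._values)
            self._codes[value] = code
            self._values.append(value)
        self._rows.append(code)

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, row: int) -> str:
        """Decode the value stored at a row position."""
        return self._values[self._rows[row]]

    def distinct(self) -> List[str]:
        """Distinct values in first-seen order."""
        return list(self._values)

    def count(self, value: str) -> int:
        """Count matching rows by comparing integer codes, not strings."""
        code = self._codes.get(value)
        if code is None:
            return 0
        return sum(1 for c in self._rows if c == code)


if __name__ == "__main__":
    chromosome = DictionaryColumn()
    for name in ["chr1", "chr2", "chr1", "chrX", "chr1"]:
        chromosome.append(name)
    print(len(chromosome), chromosome.distinct(), chromosome.count("chr1"))
```

Columns with few distinct values, such as a chromosome name repeated across hundreds of thousands of variant rows, benefit most from this layout, because a scan compares integers and each string is stored only once.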