
High-Performance In-Memory Genome (HIG) Project


  1. HIG Project Overview. August 31, 2012. Matthieu-P. Schapranow, Hasso Plattner Institute, Chair of Prof. Hasso Plattner
  2. Vision: Real-time Analysis of Genomic Data to Improve Medical Treatment
     HIG Project Overview, M. Schapranow, Aug 31, 2012
  3. Build up the Whole Picture out of Layers
     ■ Data:
       □ Combine research findings from international scientific databases in a single system at HPI
     ■ Platform:
       □ Expose information as a service to be consumed by special-purpose applications
     ■ Applications:
       □ Support genome alignment pipeline processing through massively parallel execution of:
         □ Alignment algorithms, e.g. BWA, BT2
         □ Variant calling
       □ Analyze individual patient results (real-time annotations with combined data)
       □ Analyze patient cohorts using individual filters
  4. How the Vision Becomes Real
     ■ Platform:
       □ Worker Framework: Enables parallel execution of tasks (alignment, variant calling) across node boundaries
       □ Updating Framework: Retrieves periodic updates from international databases and automatically integrates them into the local store
     ■ Applications:
       □ Alignment Coordinator: Submit alignment tasks and retrieve mutation lists, e.g. as CSV
       □ Genome Browser: Interactive browsing of the reference genome and specific patient genomes
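The Worker Framework idea above (independent alignment tasks executed in parallel, results merged afterwards) can be illustrated with a minimal sketch. This is not the HPI implementation: `align_chunk` and `run_parallel` are hypothetical names, the "aligner" is plain exact substring search standing in for a real tool such as BWA, and threads within one process stand in for workers that would be spread across cluster nodes in the real system.

```python
from concurrent.futures import ThreadPoolExecutor


def align_chunk(reference, reads):
    """Toy stand-in for an aligner: exact-match position of each read, or -1."""
    return [(read, reference.find(read)) for read in reads]


def run_parallel(reference, reads, workers=4, chunk_size=2):
    """Split the reads into chunks, align each chunk in a worker, merge in order."""
    chunks = [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so merged hits follow the input order.
        results = list(pool.map(lambda chunk: align_chunk(reference, chunk), chunks))
    return [hit for chunk_hits in results for hit in chunk_hits]


reference = "ACGTACGTTAGC"
reads = ["ACGT", "GTTA", "TAGC", "GGGG", "CGTA"]
hits = run_parallel(reference, reads)
for read, position in hits:
    print(read, position)
```

Because each chunk is aligned independently of the others, the same split-and-merge pattern would carry over to workers on different machines; only the transport of tasks and results changes.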
  5. Alignment Coordinator
     ■ Available Alignment Algorithms (and growing)
       □ Bowtie2
       □ Bowtie
       □ BWA
       □ TMAP
       □ SNAP
       □ MAQ
       □ SOAP
  6. Numbers you should know: Alignment Execution Time
     ■ One cell line: ~600k reads / 110 MB
     ■ Pipeline: Alignment and variant calling

     Property       Traditional    HPI
     Full Genome    No             Yes
     Cores          2 * 6 cores    25 * 40 cores
     Main Memory    48 GB          25 TB
     Runtime        ~720           ~40s
  7. Numbers you should know: History of the Human Genome Project
     ■ 1984: Idea of a global Human Genome (HG) project discussed at the Alta Summit: “DNA available on the Internet”
     ■ 1990: 15-year HG project started in the US (3 billion USD funding)
     ■ 2000: Rough draft of the HG announced
     ■ 2003: Complete genome sequenced
     ■ 2006: Last and longest chromosome (chr1) sequenced
     ■ … what’s next?
  8. Numbers you should know: Human Genome

     Entity             Cardinality
     Different Bases    4 (A, C, G, T)
     Base Pairs         3.137 Bbp
     Chromosomes        23
     Distinct Genes     20k-25k
     Amino Acids        21 (coded as triplets)
     Proteins           50k-300k

     Taken from http://de.wikipedia.org/wiki/Code-Sonne
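Two pieces of arithmetic follow from the numbers in the table above: how many distinct base triplets exist, and roughly how much memory one copy of the sequence would need. The sketch below is only a back-of-the-envelope illustration. The 2-bit-per-base packing is an assumption made here, not something the slides state; it counts a single strand and ignores ambiguity codes such as N.

```python
import math

BASES = "ACGT"                # 4 different bases, from the table
BASE_PAIRS = 3_137_000_000    # 3.137 Bbp, from the table

# Assumption: fixed-width packing, so 4 symbols need ceil(log2(4)) = 2 bits.
bits_per_base = math.ceil(math.log2(len(BASES)))

# Amino acids are coded as triplets of bases, so there are 4^3 possible triplets.
codons = len(BASES) ** 3

# Memory for one packed copy of the sequence, in bytes.
packed_bytes = BASE_PAIRS * bits_per_base // 8

print(f"bits per base: {bits_per_base}")
print(f"possible triplets: {codons}")
print(f"packed genome size: {packed_bytes / 1e9:.2f} GB")
```

Under these assumptions one packed genome is well under a gigabyte, and the 64 possible triplets are more than the 21 amino acids listed, so several triplets must map to the same amino acid.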
  9. Numbers you should know: Comparison of Costs for Main Memory and Genome Analysis
     [Chart: costs in USD on a logarithmic axis from 0.01 to 10,000, plotted from 01.01.01 to 01.01.12. Series: costs per megabyte RAM; costs per megabase sequencing.]
  10. Hardware Characteristics
      ■ 1,000 core cluster, 25 TB main memory
      ■ Consists of 25 identical nodes:
        □ 80 cores
        □ 1 TB main memory
        □ Intel® Xeon® E7-4870
        □ 2.40 GHz
        □ 30 MB cache
  11. Customer Process as of Today
      ■ Tissue sequencing in context of cancer treatment
      ■ Complex, time-consuming, media breaks, manual steps
  12. Project Objectives
      ■ Alignment of DNA reads (FASTQ) against reference genome (FASTA) → mapped reads
      ■ Real-time analysis of mapped reads
        □ Detection of mutations (SNPs, INDELs)
        □ Comparison of multiple tissues
        □ Detection of similar clusters to identify correlations
      ■ Analysis of mutations
        □ Identify mutations with scientific references (existing knowledge)
        □ Detection of similar clusters to identify correlations
        □ Identify genes and regulators for certain phenotypic characteristics, e.g. “fast running horses”
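The first two objectives above (reading FASTQ reads and spotting single-base differences against a reference) can be sketched in a few lines. This is a simplified illustration, not the project's pipeline: `parse_fastq` and `mismatches` are hypothetical helper names, the parser assumes the common four-line FASTQ record layout with single-line sequences, and the mapping position is supplied by hand instead of by an aligner. A single mismatch is only a candidate; real variant callers also weigh coverage and base quality.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of input
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        yield header[1:], sequence, quality  # drop the leading '@'


def mismatches(reference, read, position):
    """Return (ref_pos, ref_base, read_base) wherever a mapped read differs."""
    return [
        (position + i, reference[position + i], base)
        for i, base in enumerate(read)
        if reference[position + i] != base
    ]


reference = "ACGTACGTTAGC"
fastq_text = "@read1\nACGTAC\n+\nIIIIII\n@read2\nGTAAGC\n+\nIIIIII\n"

reads = list(parse_fastq(io.StringIO(fastq_text)))
# Assumption: read2 is known to map at reference position 6.
candidates = mismatches(reference, reads[1][1], 6)
print(candidates)
```

Here `read2` differs from the reference at position 8 (T in the reference, A in the read), which is the kind of candidate a downstream SNP-detection step would evaluate.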
  13. Thank you for your interest! Keep in contact with us.
      Matthieu-P. Schapranow, M.Sc.
      schapranow@hpi.uni-potsdam.de
      http://j.mp/schapranow
      Hasso Plattner Institute
      Enterprise Platform & Integration Concepts
      August-Bebel-Str. 88
      14482 Potsdam, Germany