SAP HANA For Genome Data Processing: A Deep Dive

This content was presented at the Personalized Medicine World Conference 2013 in Mountain View.

  1. SAP HANA For Genome Data Processing: A Deep Dive
     Dr. Matthieu-P. Schapranow, PI In-Memory Technology for Life Sciences, Hasso Plattner Institute
     Emanuel Ziegler, HANA In-Memory Platform, Genomics and Proteomics, SAP AG
  2. Comparison of Costs
     [Figure: Comparison of Costs for Main Memory and Genome Analysis. Two series, costs per megabyte of RAM and costs per megabase of sequencing, plotted in USD on a logarithmic axis (0.001 to 10,000) from January 2001 to January 2012.]
     SAP HANA For Genome Data Processing: A Deep Dive, E. Ziegler and Dr. M.-P. Schapranow
  3. HANA technology for alignment
     ■ Efficient streaming of large amounts of data, using experience with high throughput of big data
     ■ Cache-efficient index structures for seed lookups, using knowledge from text search
     ■ Rating of seed matches, based on search engine practices
     ■ Hardware-accelerated gapped alignment, using vectorization and bit parallelism
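The slide names the seed-lookup and seed-rating ideas without showing how they fit together. The sketch below is a minimal illustration under stated assumptions: a plain sorted array of (k-mer, position) pairs stands in for the "cache-efficient index", and simple diagonal voting stands in for the "rating of seed matches". The function names are hypothetical, and this is not SAP HANA's aligner.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical sketch: a sorted k-mer array stands in for the slide's
# "cache efficient index structures"; it is NOT SAP HANA's index.

def build_kmer_index(reference: str, k: int) -> list[tuple[str, int]]:
    """Return all (k-mer, position) pairs of the reference, sorted.

    One contiguous sorted array keeps a lookup to a binary search plus a
    short sequential scan, which is friendly to CPU caches.
    """
    return sorted((reference[i:i + k], i) for i in range(len(reference) - k + 1))


def lookup(index: list[tuple[str, int]], kmer: str) -> list[int]:
    """Return every reference position where `kmer` occurs exactly."""
    i = bisect_left(index, (kmer, -1))  # positions are >= 0, so -1 sorts first
    positions = []
    while i < len(index) and index[i][0] == kmer:
        positions.append(index[i][1])
        i += 1
    return positions


def rate_seed_matches(index: list[tuple[str, int]], read: str, k: int) -> list[tuple[int, int]]:
    """Rate candidate alignment starts by counting supporting seeds.

    A seed at read offset o that hits reference position p votes for the
    start p - o (its diagonal); candidates are returned best first.
    This voting scheme is an assumption, not the talk's method.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in lookup(index, read[offset:offset + k]):
            votes[pos - offset] += 1
    return votes.most_common()


reference = "ACGTACGTTAGCCGATAGGCT"
index = build_kmer_index(reference, k=4)
print(rate_seed_matches(index, "GTTAGCCGA", k=4))  # [(6, 6)]
```

All six 4-mers of the toy read vote for reference start 6, so that candidate is rated highest; a real aligner would then run gapped alignment only on the top-rated candidates.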
  4. Alignment on SAP HANA
     [Figure: two charts comparing BWA-SW and SAP HANA by percentage of misaligned and unaligned reads (axis 0 to 1.0). Left panel: simulated full genome, 100 bases per read, single ended; misalignment w.r.t. the Smith-Waterman score of the reference alignment from the simulation. Right panel: Illumina HiSeq sequenced exome, 100 bases per read, single ended; misalignment w.r.t. the Smith-Waterman score of the other alignment algorithm's result.]
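Both panels judge misalignment against a Smith-Waterman score. As a rough illustration, the sketch below computes a Smith-Waterman local alignment score with a linear gap penalty and applies one plausible criterion: a read counts as misaligned if its reported placement scores lower than the reference placement. The scoring parameters, the `is_misaligned` helper, and the criterion itself are assumptions; the slide does not define them.

```python
# Hypothetical sketch of a Smith-Waterman local alignment score with a
# linear gap penalty; scoring parameters are illustrative assumptions.

def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # DP row i-1
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # DP row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def is_misaligned(read: str, reported_window: str, true_window: str) -> bool:
    """Assumed criterion: the reported placement scores lower than the true one."""
    return smith_waterman_score(read, reported_window) < smith_waterman_score(read, true_window)


print(smith_waterman_score("ACGT", "ACGT"))   # 8
print(is_misaligned("ACGT", "ACGA", "ACGT"))  # True
```

The quadratic dynamic program is what the slide's "hardware accelerated gapped alignment" would speed up with vectorization; this scalar version only shows what is being scored.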
  5. Genome Data Processing Integrated in SAP HANA
     1,000-core cluster
     ■ 25 identical nodes
     ■ 80 cores
     ■ 1 TB main memory
     ■ 2.40 GHz, 30 MB cache
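The slide lists the cluster hardware but not how work is spread across it. One simple way to use 25 identical nodes is to hand each node a contiguous, equally sized share of the reads; the sketch below assumes exactly that. The `partition_reads` helper is hypothetical and not taken from the talk.

```python
# Hypothetical sketch: split a read set into contiguous, balanced chunks,
# one per node, so each node can align its share independently.

NUM_NODES = 25  # node count from the slide


def partition_reads(num_reads: int, num_nodes: int = NUM_NODES) -> list[range]:
    """Return one range of read indices per node; sizes differ by at most one."""
    base, extra = divmod(num_reads, num_nodes)
    chunks = []
    start = 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)  # first `extra` nodes take one more
        chunks.append(range(start, start + size))
        start += size
    return chunks


chunks = partition_reads(1003)
print(len(chunks), chunks[0], chunks[-1])  # 25 range(0, 41) range(963, 1003)
```

Because alignment of one read does not depend on any other, static balanced chunks need no coordination between nodes beyond collecting results.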
  6. Real-time Combination of Latest Research Results
     Genome Browser
     ■ Comparison of multiple mapped genomes with the reference
     ■ Exploration of individual genome locations combined with the latest relevant annotations and literature, e.g., NCBI, dbSNP, UCSC, Sanger
     Interpretation of Variants
     ■ Variants are sorted, e.g., according to known associated diseases
     ■ All variants are linked to the genome browser
     ■ Multiple patients can be compared to identify individual dispositions
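The two variant-interpretation steps, sorting by known associated diseases and comparing patients, can be sketched in a few lines. Everything here is a hypothetical placeholder: the `Variant` record, the disease annotation dictionary, and the patient sets are illustrative, and "private variants" is one assumed way of surfacing individual dispositions.

```python
from dataclasses import dataclass

# Hypothetical sketch: variant records, disease annotations, and patients
# below are illustrative placeholders, not data from the talk.

@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


def rank_variants(variants: list[Variant],
                  diseases: dict[Variant, list[str]]) -> list[Variant]:
    """Sort variants by number of known associated diseases, most first."""
    return sorted(variants, key=lambda v: (-len(diseases.get(v, [])), v.chrom, v.pos))


def private_variants(patients: dict[str, list[Variant]], name: str) -> set[Variant]:
    """Variants of one patient that no other compared patient carries."""
    others = set()
    for other, variants in patients.items():
        if other != name:
            others.update(variants)
    return set(patients[name]) - others


v1 = Variant("chr1", 100, "A", "G")
v2 = Variant("chr2", 200, "C", "T")
v3 = Variant("chr1", 300, "G", "A")
diseases = {v2: ["disease_A", "disease_B"], v3: ["disease_A"]}
print(rank_variants([v1, v2, v3], diseases) == [v2, v3, v1])  # True
print(private_variants({"p1": [v1, v2], "p2": [v2, v3]}, "p1") == {v1})  # True
```

In an in-memory database the same two operations would be a join plus an aggregate over annotation tables; the Python version only makes the logic explicit.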
  7. Hardware Advances Support Analysis of Genome Data
     ■ Alignment and variant calling: bound to CPU performance; duration: hours (SAP & HPI: minutes); enabled by multi-core
     ■ Combination with latest research annotations: bound to memory capacity; duration: weeks (SAP & HPI: real-time); enabled by partitioning & compression
     ■ Foundation for both: in-memory technology
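The "partitioning & compression" entry refers to fitting annotation data into main memory. SAP HANA's column store is generally described as compressing columns with dictionary encoding; the sketch below is a simplified illustration of that idea, not HANA's implementation. Each value is replaced by its index in a sorted dictionary of distinct values, and the IDs can then be bit-packed. The function names are hypothetical.

```python
# Simplified sketch of dictionary encoding, the column-store compression
# idea; this is an illustration, not SAP HANA's implementation.

def dictionary_encode(column: list[str]) -> tuple[list[str], list[int]]:
    """Replace each value by its index in a sorted dictionary of distinct values."""
    dictionary = sorted(set(column))
    value_id = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [value_id[v] for v in column]


def bits_per_value_id(dictionary_size: int) -> int:
    """Bits needed to store one value ID when IDs are bit-packed."""
    return max(1, (dictionary_size - 1).bit_length())


def decode(dictionary: list[str], ids: list[int]) -> list[str]:
    """Restore the original column from dictionary and value IDs."""
    return [dictionary[i] for i in ids]


chromosomes = ["chr1", "chr2", "chr1", "chrX", "chr1"]
dictionary, ids = dictionary_encode(chromosomes)
print(dictionary, ids, bits_per_value_id(len(dictionary)))
# ['chr1', 'chr2', 'chrX'] [0, 1, 0, 2, 0] 2
```

Columns with few distinct values, such as chromosome names, shrink to a few bits per row, which is what lets large annotation sets stay resident in main memory.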
  8. What to take home?
     Sequencing machines are becoming faster, smaller, and cheaper, and they generate immense data sets in heterogeneous formats.
     ■ In-memory technology is the key to exploring and analyzing these big data sets
     ■ Efficient parallelization reduces processing time
     ■ In-memory technology enables real-time analysis and interactive exploration of genome data
     “Let’s identify genomic roots and optimal treatments before the patient wakes up from anaesthesia!”
  9. Thank you for your interest!
     Keep in contact with us.
     Dr. Matthieu-P. Schapranow
     Hasso Plattner Institute
     August-Bebel-Str. 88
     14482 Potsdam, Germany
     schapranow@hpi.uni-potsdam.de
     http://j.mp/schapranow
     Emanuel Ziegler
     SAP AG, TREX Enterprise Platform & Integration Concepts
     Dietmar-Hopp-Allee 16
     69190 Walldorf, Germany
     emanuel.ziegler@SAP.com
