Introduction to SeqAn, an Open-source C++ Template Library


SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA) and an FM-index, as well as algorithms for fast and accurate alignment and read mapping. Based on those data types and fast I/O routines, users can develop tools that are efficient and easy to maintain. Besides multi-core support, the research team at Freie Universität Berlin has started to add generic support for accelerators such as NVIDIA GPUs. Go through the slides to learn more. For your own bioinformatics development you can try GPUs for free here: www.Nvidia.com/GPUTestDrive

Transcript of "Introduction to SeqAn, an Open-source C++ Template Library"

  1. Test Drive NVIDIA GPUs! Experience The Acceleration. Develop your codes on latest GPUs today. Sign up for FREE GPU Test Drive on remotely hosted clusters: www.nvidia.com/GPUTestDrive
  2. Prof. Dr. Knut Reinert, Algorithmische Bioinformatik, FB Mathematik und Informatik. Intro to SeqAn: An Open-Source C++ template library for biological sequence analysis. Knut Reinert, David Weese, Freie Universität Berlin, Berlin Institute for Computer Science
  3. This talk: Why SeqAn?; SeqAn as SDK; SeqAn concept/content; Generic Parallelization
  4. ~ 15 years ago... Data volume and cost: In 2000 the 3 billion base pairs of the human genome were sequenced for about 3 billion US dollars. 100 million bp per day. Nvidia Webinar, 22.10.2013
  5. Sequencing today... Illumina HiSeq: 100 billion bp per day. Within roughly ten years sequencing has become about 10 million times cheaper.
  6. Future of NGS data analysis
  7. Software libraries bridge gap
     [Diagram with three groups of labels]
     Experimentalists: Structural variants, RNA-Seq, ChIP-Seq, Metagenomics abundance, Sequence assembly, Cancer genomics
     Levels in between: Analysis pipelines, Maintainable tool, Prototype implementation, Algorithm libraries, Algorithm design, Theoretical considerations
     Computer scientists: FM-index, Multicore, Suffix arrays, Secondary memory, Fast I/O, K-mer filter, Hardware acceleration
  8. SeqAn
     Now SeqAn/SeqAn tools have been cited more than 360 times.
     Is under BSD license and hence free for academic AND commercial use.
     Among the institutions are (omitting German institutes): Department of Genetics, Harvard Medical School, Boston; European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton; J. Craig Venter Institute, Rockville MD, USA; Department of Molecular Biology, Princeton University; Applied Mathematics Program, Yale University, New Haven; IBM T.J. Watson Research Center, Yorktown Heights; The Ohio State University, Columbus; University of Minnesota; Australian National University, Canberra; Department of Statistics, University of Oxford; Swedish University of Agricultural Sciences (SLU), Uppsala; Graduate School of Life Sciences, University of Cambridge; Broad Institute, Cambridge, USA; EMBL-EBI; University of California; University of Chicago; Iowa State University, Ames; The Pennsylvania State University; Peking University, Beijing; University of Science and Technology of China; BGI-Shenzhen, China; Beijing Institute of Genomics ...
  9. SeqAn developers
     [Figure: chart of the number of SeqAn developers per year, 2003 to 2012 (y-axis 0 to 16), broken down by funding source: FU, IMPRS, DFG, BMBF, CSC, External]
  10. SeqAn main concepts
  11. length(str)
      Value<T>::Type
      String<Subclass>
  12. void swap(string & str)
      {
          char help = str[1];
          str[1] = str[0];
          str[0] = help;
      }
  13. template <typename T>
      void swap(T & str)
      {
          char help = str[1];
          str[1] = str[0];
          str[0] = help;
      }
  15. template <typename T>
      void swap(String<T> & str)
      {
          T help = str[1];
          str[1] = str[0];
          str[0] = help;
      }
  16. template <typename T>
      void swap(T & str)
      {
          typename T::value_type help = str[1];
          str[1] = str[0];
          str[0] = help;
      }
  17. template <typename T>
      void swap(T & str)
      {
          typename Value<T>::Type help = str[1];
          str[1] = str[0];
          str[0] = help;
      }
  18. Metafunction
      template <typename T>
      struct Value
      {
          typedef T Type;
      };
  19. template <typename T>
      struct Value
      {
          typedef T Type;
      };

      template <typename T>
      struct Value< String<T> >
      {
          typedef T Type;
      };
  20. template <typename T>
      struct Value
      {
          typedef T Type;
      };

      template <typename T>
      struct Value< String<T> >
      {
          typedef T Type;
      };

      template < >
      struct Value< char * >
      {
          typedef char Type;
      };
  21. template <typename T>
      struct Value< String<T> >
      {
          typedef T Type;
      };

      template < >
      struct Value< char * >
      {
          typedef char Type;
      };

      template <size_t N>
      struct Value< char[N] >
      {
          typedef char Type;
      };
  22. template <typename T>
      void swap(T & str)
      {
          typename Value<T>::Type help = str[1];
          str[1] = str[0];
          str[0] = help;
      }
  24. template <typename T>
      void swap(T & str)
      {
          typename Value<T>::Type help = value(str,1);
          value(str,1) = value(str,0);
          value(str,0) = help;
      }
  25. Shim Function
      template <typename T>
      typename Value<T>::Type & value(T & str, int i)
      {
          return str[i];
      }
  26. Generic Algorithm
      template <typename T>
      void swap(T & str)
      {
          typename Value<T>::Type help = value(str,1);
          value(str,1) = value(str,0);
          value(str,0) = help;
      }
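The metafunction, shim function, and generic algorithm from slides 17 to 26 can be combined into one standalone sketch. The version below is an illustration using only the C++ standard library, not SeqAn's own code: the namespace `sketch`, the name `swapFirstTwo`, and the specializations for `std::basic_string` and `std::vector` are assumptions added so the example compiles without SeqAn.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of SeqAn

// Metafunction: maps a container type to its element type.
// Fallback as on the slide; useful only for types that are their own value.
template <typename T>
struct Value { typedef T Type; };

// Specializations for standard containers (assumption: stand-ins for String<T>).
template <typename TChar, typename TTraits, typename TAlloc>
struct Value<std::basic_string<TChar, TTraits, TAlloc> > { typedef TChar Type; };

template <typename T, typename TAlloc>
struct Value<std::vector<T, TAlloc> > { typedef T Type; };

// Specializations for built-in types, as on slides 20 and 21.
template <>
struct Value<char *> { typedef char Type; };

template <std::size_t N>
struct Value<char[N]> { typedef char Type; };

// Shim function: uniform element access for every supported type.
template <typename T>
typename Value<T>::Type & value(T & str, std::size_t i)
{
    return str[i];
}

// Generic algorithm: swaps the first two elements of any supported sequence.
template <typename T>
void swapFirstTwo(T & str)
{
    typename Value<T>::Type help = value(str, 1);
    value(str, 1) = value(str, 0);
    value(str, 0) = help;
}

}  // namespace sketch
```

Calling `sketch::swapFirstTwo` on a `std::string`, a `std::vector<int>`, or a `char` array uses the same function body; only the `Value` specialization that is selected at compile time differs.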
  27. SeqAn Content - SDK
  28. SeqAn SDK Components - Tutorials
  29. SeqAn SDK Components: Reference Manual
  30. SeqAn SDK Components
      •  Review Board to ensure code quality
      •  CDash/CTest to automatically compile and test across platforms
      •  Code coverage reports
  31. SeqAn Content: algorithms & data structures
  32. Unified Alignment Algorithms
      Versatile & Extensible DP-Interface. For example ...
      Standard DP-Algorithms: Global & Semi Global Alignments, Local Alignments
      Modified DP-Algorithms: Split Breakpoint Detection, Banded Chain Alignment
  33. Unified Alignment Algorithms
      For example ...
      Needleman-Wunsch with Traceback:
          DPProfile<GlobalAlignment<>, LinearGaps, TracebackOn<> >
      Semi-Global Gotoh without Traceback:
          DPProfile<GlobalAlignment<FreeEndGaps<True, False, True, False> >, AffineGaps, TracebackOff>
      Banded Smith-Waterman with Affine Gap Costs:
          DPBand<BandOn>(lowerDiag, upperDiag), DPProfile<LocalAlignment<>, AffineGaps, TracebackOn<> >
      Split-Breakpoint Detection for Right Anchor:
          DPProfile<SplitAlignment<>, AffineGaps, TracebackOn<GapsRight> >
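To make the first configuration above concrete, here is a minimal standard-library sketch of what a global alignment with linear gap costs computes (Needleman-Wunsch, score only, no traceback). It does not use SeqAn's DP interface; the function name `needlemanWunschScore` and the default scores (match 1, mismatch -1, gap -1) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Global alignment score with linear gap costs, using two DP rows.
// Hypothetical helper for illustration; not SeqAn's implementation.
inline int needlemanWunschScore(std::string const & a, std::string const & b,
                                int match = 1, int mismatch = -1, int gap = -1)
{
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);

    // First row: aligning the empty prefix of a against prefixes of b.
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = static_cast<int>(j) * gap;

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        // First column: aligning a prefix of a against the empty prefix of b.
        curr[0] = static_cast<int>(i) * gap;
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up   = prev[j] + gap;      // gap in b
            int left = curr[j - 1] + gap;  // gap in a
            curr[j] = std::max(diag, std::max(up, left));
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

With the default scores, aligning "ACGT" against "AGT" yields three matches and one gap, so the score is 2. Affine gaps, free end gaps, banding, and traceback, which the slide selects through template parameters, are left out of this sketch.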
  34. Support for Common File Formats
      Important file formats for HTS analysis:
      •  Sequences: FASTA, FASTQ; Indexed FASTA (FAI) for random access
      •  Genomic Features: GFF 2, GFF 3, GTF, BED
      •  Read Mapping: SAM, BAM (plus BAM indices)
      •  Variants: VCF
      •  ... or write your own parser: Tutorials and helper routines for writing your own parsers.

          SequenceStream ss("file.fa.gz");
          while (!atEnd(ss))
          {
              readRecord(id, seq, ss);
              cout << id << '\t' << seq << '\n';
          }

          BamStream bs("file.bam");
          while (!atEnd(bs))
          {
              readRecord(record, bs);
              cout << record.qName << '\t' << record.pos << '\n';
          }
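As a rough idea of what "write your own parser" involves, the following sketch reads uncompressed FASTA records from any `std::istream`. It is a standard-library illustration only and does not use SeqAn's helper routines; `FastaRecord` and `readFasta` are hypothetical names, and compressed input, FASTQ, and error reporting are deliberately left out.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One FASTA entry: header line (without '>') and concatenated sequence lines.
struct FastaRecord
{
    std::string id;
    std::string seq;
};

// Reads all records from a plain-text FASTA stream.
// Sequence lines that appear before the first header are ignored.
inline std::vector<FastaRecord> readFasta(std::istream & in)
{
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line))
    {
        // Tolerate Windows line endings.
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);
        if (line.empty())
            continue;

        if (line[0] == '>')
        {
            FastaRecord record;
            record.id = line.substr(1);
            records.push_back(record);
        }
        else if (!records.empty())
        {
            records.back().seq += line;
        }
    }
    return records;
}
```

A `std::istringstream` or `std::ifstream` can be passed directly, which mirrors the loop on the slide that prints one id and sequence per record.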
  35. Journaled Sequences
      Store Multiple Genomes, Save Storage Capacities

          StringSet<TJournaled, Owner<JournalSet> > set;
          setGlobalReference(set, refSeq);
          appendValue(set, seq1);
          join(set, idx, JoinConfig<>());

      Type label shown on the slide: String<Dna, Journaled<Alloc<> > >
      [Figure: rows labelled Ref, G1, G2, ..., GN]
  36. Fragment Store
      (Multi) Read Alignments
      Read alignments can be easily imported:

          std::ifstream file("ex1.sam");
          read(file, store, Sam());

      ... and accessed as a multiple alignment, e.g. for visualization:

          AlignedReadLayout layout;
          layoutAlignment(layout, store);
          printAlignment(svgFile, Raw(), layout, store, 1, 0, 150, 0, 36);
  37. Unified Full-Text Indexing Framework
      Available Indices
      Suffix Trees:
      •  suffix array
      •  enhanced suffix array
      •  lazy suffix tree
      Prefix Trie:
      •  FM-index
      q-Gram Indices:
      •  direct addressing
      •  open addressing
      •  gapped

          Index<TSeq, IndexEsa<> >
          Index<StringSet<TSeq>, FMIndex<> >

      All indices support multiple strings and external memory construction/usage.
      Index Lookup Interface
      All indices support the (sequential) find interface:

          Finder<TIndex> finder(index);
          while (find(finder, "TATAA"))
              cout << "Hit at position" << position(finder) << endl;
  38. SeqAn Performance
  39. Masai read mapper
  40. Masai read mapper
      [Diagram: Reads (ACGCTTCATCGCCCT…) with an index of reads (radix tree of seeds); Genome (Chr. 1, Chr. 2, Chr. X) with an index of genome (e.g. FM-index)]
      Algorithm is based on the simultaneous traversal of two string indices (e.g., FM-index, Enhanced suffix array, Lazy suffix tree)
  41. Read Mapping: Masai
      Faster and more accurate than BWA and Bowtie2
      Timings on a single core
  42. Easily exchange index ...
  43. Collaboration to parallelize indices and verification algorithms in SeqAn, to speed up any applications making use of indices
      What about multi-core implementation?
  44. SeqAn going parallel
      GOAL: Parallelize the finder interface of SeqAn so it works on CPU and accelerators like GPU
      Will be replaced by hg18 and 10 million 20-mers
  45. SeqAn going parallel
      •  Construct FM-index on reverse genome
      •  Set # OMP threads
      •  Call generic count function
  46. SeqAn going parallel: NVIDIA GPUs
      •  Copy needles and index to GPU
      •  SAME count function as on CPU!
  47. SeqAn going parallel
      Count occurrences of 10 million 20-mers in the human genome using an FM-index

          Platform                            Time      Speedup
          I7, 3.2 GHz                         18.6 s    1 X
          …12...                              2.66 s    7 X
          Intel Xeon Phi 7120, 244 threads    2.18 s    8.5 X
          NVIDIA Tesla K20                    0.4 s     47 X
  48. SeqAn going parallel
      Approx. count occurrences of 1.2 million 33-mers in the human genome using an FM-index

          Platform                            Time      Speedup
          I7, 3.2 GHz                         66.1 s    1 X
          …12...                              9.0 s     7.3 X
          Intel Xeon Phi 7120, 244 threads    3.9 s     16.9 X
          NVIDIA Tesla K20                    3.2 s     20.7 X
  49. Part II: The details
  50. Parallelization on the GPU
  51. CUDA preliminaries
      In order to use CUDA we first had to adapt some parts of SeqAn:
      •  CUDA requires each function to be prefixed with domain qualifiers __host__ or __device__ in order to generate CPU/GPU code
      •  We prefixed all basic template functions with a SEQAN_HOST_DEVICE macro

          #ifdef __CUDACC__
          #define SEQAN_HOST_DEVICE inline __device__ __host__
          #else
          #define SEQAN_HOST_DEVICE inline
          #endif

      •  Static const arrays are not allowed in the way SeqAn defines them
      •  We replaced alphabet conversion lookup tables (e.g. Dna <--> char) by conversion functions
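The two adaptations on slide 51 can be pictured with a small sketch: a macro that expands to the CUDA qualifiers only under `nvcc`, and conversion functions that replace a static lookup table. The macro name `SKETCH_HOST_DEVICE`, the function names, and the choice to map unknown characters to rank 0 are assumptions made for this illustration; SeqAn's actual conversion code may differ.

```cpp
// Expands to host/device qualifiers when compiled by nvcc, plain inline otherwise.
// Hypothetical macro mirroring the SEQAN_HOST_DEVICE definition on the slide.
#ifdef __CUDACC__
#define SKETCH_HOST_DEVICE inline __device__ __host__
#else
#define SKETCH_HOST_DEVICE inline
#endif

// Converts a DNA rank (0..3) to its character without a static table,
// so the same code can be compiled for CPU and GPU.
SKETCH_HOST_DEVICE char dnaToChar(unsigned rank)
{
    switch (rank & 3u)
    {
        case 0:  return 'A';
        case 1:  return 'C';
        case 2:  return 'G';
        default: return 'T';
    }
}

// Converts a character to its DNA rank; anything that is not C, G, or T
// (in either case) is mapped to rank 0 in this sketch.
SKETCH_HOST_DEVICE unsigned charToDna(char c)
{
    switch (c)
    {
        case 'C': case 'c': return 1u;
        case 'G': case 'g': return 2u;
        case 'T': case 't': return 3u;
        default:            return 0u;
    }
}
```

Compiled with an ordinary C++ compiler the macro reduces to `inline`, so the functions can be tested on the host before being used inside a kernel.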
  52. Strings
      •  Instead of defining a new CUDA string we simply use the Thrust library:
         •  Provides host_vector and device_vector classes, which are vectors with buffers in host or device memory
         •  However, Thrust functions are callable only from host-side
      •  We made both vectors accessible from SeqAn
         •  SeqAn strings have to provide a set of global (meta-)functions, e.g. Value<>, resize(), …
         •  We simply defined the required wrapper functions for these two vectors
  53. Standard Strings
      •  Up to here, all strings can only be used on the side of their scope
      [Diagram: thrust::host_vector, thrust::device_vector, and seqan::String objects with their buffers, arranged by host memory and device memory]
  54. Host-Device String
      •  How to access a device_vector from device-side?
         •  We could pass (POD) iterators to the kernel
         •  However, many SeqAn algorithms work on more complex containers
         •  We need the same interface of the container on the device side
      •  For strings we developed a so-called ContainerView (POD type)
         •  Provides a container interface given the begin/end pointers of vector buffer
         •  The view() function creates the ContainerView object for a given device_vector
  55. Host-Device String
      •  How to use a device_vector on the device
      [Diagram: a thrust::device_vector whose buffer lies in device memory; view() yields a seqan::ContainerView in host memory, which is handed to the device at kernel launch]
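The idea behind the ContainerView on slides 54 and 55 can be sketched on the host alone: a trivially copyable struct holding begin/end pointers that still offers a container-like interface. The code below is an illustration over `std::vector`, not SeqAn's ContainerView; the namespace `sketch` and the type `ViewSketch` are hypothetical, and in a real CUDA setting the pointers would refer to device memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <type_traits>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of SeqAn

// POD-like view: two raw pointers into a buffer owned by someone else.
template <typename T>
struct ViewSketch
{
    T * begin_;
    T * end_;

    T & operator[](std::size_t i) const { return begin_[i]; }
};

// Container interface as free functions, in the style used on the slides.
template <typename T>
std::size_t length(ViewSketch<T> const & v)
{
    return static_cast<std::size_t>(v.end_ - v.begin_);
}

template <typename T>
void reverse(ViewSketch<T> v)  // passed by value, like a kernel argument
{
    std::reverse(v.begin_, v.end_);
}

// Creates a view of a vector's buffer; the vector keeps ownership.
template <typename T>
ViewSketch<T> view(std::vector<T> & vec)
{
    ViewSketch<T> v = { vec.data(), vec.data() + vec.size() };
    return v;
}

// A view must be copyable bit by bit to be passed to a kernel by value.
static_assert(std::is_trivially_copyable<ViewSketch<char> >::value,
              "ViewSketch is expected to be trivially copyable");

}  // namespace sketch
```

Because the view is copied by value but still points at the original buffer, reversing through the view changes the vector it was created from, which is the behaviour a kernel relies on.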
  56. Device and View metafunctions
      •  For generic GPU programming:
      •  The Device metafunction returns the device-memory equivalent of a class

          // Replaces String with thrust::device_vector.
          template <typename TValue, typename TSpec>
          struct Device<String<TValue, TSpec> >
          {
              typedef thrust::device_vector<TValue> Type;
          };

      •  The View metafunction returns the (POD) view type of a class

          // Returns a view type that can be passed to a CUDA kernel.
          template <typename TValue, typename TAlloc>
          struct View<thrust::device_vector<TValue, TAlloc> >
          {
              typedef ContainerView<thrust::device_vector<TValue, TAlloc> > Type;
          };
  57. Hello world
      •  A simple example to reverse a string on the GPU

          // A standard SeqAn string over the Dna alphabet.
          String<Dna> myString = "ACGT";

          // A Dna string on device global memory.
          typename Device<String<Dna> >::Type myDeviceString;

          // Copy the string to global memory.
          assign(myDeviceString, myString);

          // Pass a view of the device string to the CUDA kernel.
          myKernel<<<1,1>>>(view(myDeviceString));

          // TString is ContainerView<device_vector<Dna> >.
          template <typename TString>
          __global__ void myKernel(TString string)
          {
              printf("length(string) = %d\n", length(string));
              reverse(string);
          }
  58. Porting complex data structures
      •  More complex structures (e.g. Index, Graph) can only be ported to the GPU if they …
         •  don't use pointers
         •  use only strings of POD types (String<Dna>, but not String<String<…> >)
         •  use only 1-dimensional StringSets (ConcatDirect)
      •  Nested classes are no problem
         •  View metafunction converts all member types into their view types
         •  view() function is called recursively on all members
  59. Example: FM Index
  60. The FM-index (BWT, LF-mapping)
  61. The FM-index (search ssi)
      a3 = C('i') + Occ('i',0) + 1 = 1 + 0 + 1
      b3 = C('i') + Occ('i',12) = 1 + 4
  62. The FM-index (backwards search)
      a1 = C('s') + Occ('s',8) + 1 = 8 + 2 + 1
      b1 = C('s') + Occ('s',10) = 8 + 4
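The interval updates on slides 61 and 62 can be reproduced with a deliberately naive FM-index. The sketch below assumes the example text is "mississippi$" (the slides do not name it, but the values C('i') = 1, C('s') = 8, and Occ('i',12) = 4 are consistent with it). It builds the BWT by sorting suffixes and answers Occ by scanning, so it illustrates only the backward-search recurrence and is not how SeqAn stores its index. `NaiveFmIndex` is a hypothetical name, and intervals are 0-based and half-open, whereas the slides use 1-based inclusive bounds.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Naive FM-index for illustration only (quadratic construction, linear Occ).
struct NaiveFmIndex
{
    std::string bwt;                    // Burrows-Wheeler transform of the text
    std::array<std::size_t, 256> C;     // C[c] = number of text characters smaller than c

    // The text must end with a unique sentinel that is smaller than all other characters.
    explicit NaiveFmIndex(std::string const & text)
    {
        std::size_t const n = text.size();

        // Suffix array by direct suffix comparison.
        std::vector<std::size_t> sa(n);
        std::iota(sa.begin(), sa.end(), std::size_t(0));
        std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
            return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
        });

        // BWT: the character preceding each sorted suffix (cyclically).
        bwt.resize(n);
        for (std::size_t i = 0; i < n; ++i)
            bwt[i] = text[(sa[i] + n - 1) % n];

        // Cumulative character counts.
        std::array<std::size_t, 256> counts;
        counts.fill(0);
        for (std::size_t i = 0; i < n; ++i)
            ++counts[static_cast<unsigned char>(text[i])];
        std::size_t sum = 0;
        for (std::size_t c = 0; c < 256; ++c)
        {
            C[c] = sum;
            sum += counts[c];
        }
    }

    // Occ(c, i): number of occurrences of c in bwt[0, i).
    std::size_t occ(char c, std::size_t i) const
    {
        return static_cast<std::size_t>(
            std::count(bwt.begin(), bwt.begin() + static_cast<std::ptrdiff_t>(i), c));
    }

    // Backward search: returns the half-open suffix array interval of the pattern.
    std::pair<std::size_t, std::size_t> search(std::string const & pattern) const
    {
        std::size_t lo = 0, hi = bwt.size();
        for (std::string::const_reverse_iterator it = pattern.rbegin();
             it != pattern.rend() && lo < hi; ++it)
        {
            std::size_t const base = C[static_cast<unsigned char>(*it)];
            lo = base + occ(*it, lo);
            hi = base + occ(*it, hi);
        }
        return std::make_pair(lo, hi);
    }
};
```

For the pattern "ssi" on "mississippi$" the search visits the half-open intervals [1,5), [8,10), and [10,12). The last one corresponds to the 1-based bounds a1 = 11 and b1 = 12 from slide 62, so the pattern occurs twice.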
  63. The FM-index in SeqAn
      •  The FM-index can be implemented using a number of string-based lookup tables
      •  ... as well as other indices, e.g. enhanced suffix array, q-gram index
      •  There is a space-time tradeoff between all these indices
      •  The FM index has the minimal memory requirements
  64. A generic FM-index
      •  SeqAn's FM-index consists of some nested classes storing Strings
      [Diagram: FM-index (host-only)]
  65. A generic FM-index
      •  The Device type of the FM index uses device_vector instead of String
      [Diagram: GPU FM-index (host-part)]
      •  The view of this object (= device-part) is the same tree, where leaves are replaced by ContainerViews of device_vectors
  66. CPU vs. GPU
      •  Invoking an FM-index based search on CPU and GPU:

          // Select the index type.
          typedef Index<DnaString, FMIndex<> > TIndex;

          // Type is Index<device_vector<Dna>, FMIndex<> >.
          typedef typename Device<TIndex>::Type TDeviceIndex;

          // ========== On CPU ==========
          // Create an index.
          TIndex index("ACGTTGCAA");

          // Use the FM-index on CPU.
          findCPU(index,…);

          template <typename TIndex>
          void findCPU(TIndex & index,…);

          // ========== On GPU ===========
          // Create a device index.
          TIndex index("ACGTTGCAA");
          TDeviceIndex deviceIndex;
          assign(deviceIndex, index);

          // Use the FM-index in a CUDA kernel.
          findGPU<<<...>>>(view(deviceIndex),…);

          template <typename TIndex>
          __global__ void
          findGPU(TIndex index,…);

      The findGPU kernel AND the findCPU function will invoke many instances of the SAME generic function which will perform a backtracking algorithm on our generic index interface.
  67. Approximate search via backtracking

          do {
              if (finder.score == finder.scoreThreshold)
              {
                  if (goDown(textIt, suffix(pattern, patternIt))) delegate(finder);
                  goUp(textIt);
                  if (isRoot(textIt)) break;
              }
              else if (finder.score < finder.scoreThreshold)
              {
                  if (atEnd(patternIt)) delegate(finder);
                  else if (goDown(textIt))
                  {
                      finder.score += parentEdgeLabel(textIt) != value(patternIt);
                      goNext(patternIt);
                      continue;
                  }
              }

              do {
                  goPrevious(patternIt);
                  finder.score -= parentEdgeLabel(textIt) != value(patternIt);
              } while (!goRight(textIt) && goUp(textIt));
              if (isRoot(textIt)) break;
              finder.score += parentEdgeLabel(textIt) != value(patternIt);
              goNext(patternIt);
          }
          while (true);
  68. Outlook for GPU support
      •  Our next steps are:
         •  Provide parallelFor() to hide CUDA kernel call/OpenMP for-loop
         •  Develop classes for concurrent access (String, job queues)
         •  Port more indices and index iterators to be used with CUDA
         •  Port SeqAn's alignment module
         •  Develop a CPU/GPU version of the FM-index based read mapper Masai
         •  ...
      •  Follow our development:
         •  Sources: https://github.com/seqan/seqan/tree/develop
         •  Code examples: http://trac.seqan.de/wiki/HowTo/DevelopCUDA
  69. Generic Parallelization
  70. Multicore parallelization
      •  We first introduced Tags to switch between serial and parallel algorithms:

          struct Serial_;
          typedef Tag<Serial_> Serial;

          struct Parallel_;
          typedef Tag<Parallel_> Parallel;

      •  Then we defined basic atomic operations required for thread safety:

          template <typename T>
          inline T atomicInc(T &x, Serial)
          {
              return ++x;
          }

          template <typename T>
          inline T atomicInc(volatile T &x, Parallel)
          {
              return __sync_add_and_fetch(&x, 1);
          }
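The tag-dispatch pattern on slide 70 can be tried out without SeqAn or compiler intrinsics. The sketch below swaps the `__sync_add_and_fetch` builtin for `std::atomic`, which is a substitution made for portability and not what the slide shows; `countValue` is a hypothetical generic algorithm added to show how one function body serves both tags.

```cpp
#include <atomic>
#include <cstddef>
#include <string>

// Empty tag types select an implementation at compile time.
template <typename T>
struct Tag {};

struct Serial_;
typedef Tag<Serial_> Serial;

struct Parallel_;
typedef Tag<Parallel_> Parallel;

// Serial version: a plain increment is sufficient.
template <typename T>
inline T atomicInc(T & x, Serial)
{
    return ++x;
}

// Parallel version: uses std::atomic instead of the GCC builtin from the slide.
template <typename T>
inline T atomicInc(std::atomic<T> & x, Parallel)
{
    return x.fetch_add(1) + 1;
}

// Hypothetical generic algorithm: counts elements equal to v.
// The tag decides which atomicInc overload updates the counter.
template <typename TIter, typename TValue, typename TCounter, typename TTag>
void countValue(TIter first, TIter last, TValue const & v, TCounter & counter, TTag tag)
{
    for (; first != last; ++first)
        if (*first == v)
            atomicInc(counter, tag);
}
```

With the `Serial` tag the counter can be a plain integer; with the `Parallel` tag it must be a `std::atomic`, so that the same loop body stays correct if several threads process disjoint parts of the range.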
  71. Splitter
      •  To this end, we developed the Splitter<TValue, TSpec> to compute a partition into subintervals of (almost) equal length …

          Splitter<unsigned> splitter(10, 20, 3);
          for (unsigned i = 0; i < length(splitter); ++i)
              cout << '[' << splitter[i] << ',' << splitter[i+1] << ')' << endl;

          // [10,14)
          // [14,17)
          // [17,20)
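The partition shown on slide 71 can be reproduced with a few lines of standard C++. The function below is a sketch, not SeqAn's Splitter: `splitInterval` is a hypothetical name, and giving the leftover elements to the first subintervals is an assumption chosen because it matches the output on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits [first, last) into `parts` subintervals of (almost) equal length.
// Returns parts + 1 boundaries; subinterval i is [bounds[i], bounds[i + 1]).
inline std::vector<std::size_t> splitInterval(std::size_t first, std::size_t last,
                                              std::size_t parts)
{
    assert(first <= last && parts > 0);

    std::size_t const len   = last - first;
    std::size_t const base  = len / parts;  // minimum length of each subinterval
    std::size_t const extra = len % parts;  // this many subintervals get one more element

    std::vector<std::size_t> bounds;
    bounds.reserve(parts + 1);
    bounds.push_back(first);
    for (std::size_t i = 0; i < parts; ++i)
        bounds.push_back(bounds.back() + base + (i < extra ? 1 : 0));
    return bounds;
}
```

For `splitInterval(10, 20, 3)` the boundaries are 10, 14, 17, 20, which are the intervals [10,14), [14,17), and [17,20) printed on the slide.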
  72. Splitter
      •  The Splitter can also be used with iterators directly
      •  The Serial / Parallel tag divides an interval range into 1 / #thread_num many intervals

          template <typename TIter, typename TVal, typename TParallelTag>
          inline void arrayFill(TIter begin_, TIter end_,
                                TVal const &value, Tag<TParallelTag> parallelTag)
          {
              Splitter<TIter> splitter(begin_, end_, parallelTag);

              SEQAN_OMP_PRAGMA(parallel for)
              for (int job = 0; job < (int)length(splitter); ++job)
                  arrayFill(splitter[job], splitter[job + 1], value, Serial());
          }

      •  The parallel tag can be used to switch off the parallel behaviour
  73. SeqAn going parallel
      Count occurrences of 10 million 20-mers in the human genome using an FM-index

          Platform                            Time      Speedup
          I7, 3.2 GHz                         18.6 s    1 X
          …12...                              2.66 s    7 X
          Intel Xeon Phi 7120, 244 threads    2.18 s    8.5 X
          NVIDIA Tesla K20                    0.4 s     47 X

      Thank you for your attention
  74. Upcoming GTC Express Webinars
      •  October 23 - Revolutionize Virtual Desktops with the One Missing Piece: A Scalable GPU
      •  October 30 - OpenACC 2.0 Enhancements for Cray Supercomputers
      •  October 31 - Getting the Most out of NVIDIA GRID vGPU with Citrix XenServer
      •  November 5 - Accelerating Face-in-the-Crowd Recognition with GPU Technology
      •  November 6 - Bright Cluster Manager: A CUDA-ready Management Solution for GPU-based HPC
      Register at www.gputechconf.com/gtcexpress
  75. GTC 2014 Call for Posters
      Posters should describe novel or interesting topics in
      •  Science and research
      •  Professional graphics
      •  Mobile computing
      •  Automotive applications
      •  Game development
      •  Cloud computing
      Call opens October 29. www.gputechconf.com