SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA), and an FM-index, as well as algorithms for fast and accurate alignment and read mapping. Based on these data types and fast I/O routines, users can easily develop tools that are both highly efficient and easy to maintain. Beyond multi-core support, the research team at Freie Universität Berlin has begun adding generic support for accelerators such as NVIDIA GPUs. Go through the slides to learn more. For your own bioinformatics development you can try GPUs for free here: www.nvidia.com/GPUTestDrive
Introduction to SeqAn, an Open-source C++ Template Library
1. Test Drive NVIDIA GPUs!
Experience the acceleration: develop your codes on the latest GPUs today. Sign up for a free GPU Test Drive on remotely hosted clusters: www.nvidia.com/GPUTestDrive
2. Prof. Dr. Knut Reinert
Algorithmische Bioinformatik, FB Mathematik und Informatik
Intro to SeqAn: an open-source C++ template library for biological sequence analysis
Knut Reinert, David Weese
Freie Universität Berlin, Institute for Computer Science
4. ~15 years ago...
Data volume and cost: in 2000, the 3 billion base pairs of the human genome were sequenced for about 3 billion US dollars, at roughly 100 million bp per day.
Nvidia Webinar, 22.10.2013
5. Sequencing today...
An Illumina HiSeq produces 100 billion bps per DAY. Within roughly ten years, sequencing has become about 10 million times cheaper.
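A rough sanity check of that factor, combining it with the cost figures from the previous slide (my arithmetic, not the slide's):

```latex
\underbrace{\frac{\$3\times 10^{9}}{3\times 10^{9}\ \text{bp}}}_{2000} = \$1/\text{bp}
\qquad
\$1/\text{bp}\times 10^{-7} = \$10^{-7}/\text{bp}
\;\Rightarrow\;
3\times 10^{9}\ \text{bp}\times \$10^{-7}/\text{bp} \approx \$300
```

That is, a factor of ten million puts the raw sequencing cost of a human-genome-sized data set in the low hundreds of dollars, consistent with early-2010s pricing.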
6. Future of NGS data analysis
8. SeqAn
SeqAn and the SeqAn tools have now been cited more than 360 times. SeqAn is under BSD license and hence free for academic AND commercial use.
Among the citing institutions are (omitting German institutes): Department of Genetics, Harvard Medical School, Boston; European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton; J. Craig Venter Institute, Rockville MD, USA; Department of Molecular Biology, Princeton University; Applied Mathematics Program, Yale University, New Haven; IBM T.J. Watson Research Center, Yorktown Heights; The Ohio State University, Columbus; University of Minnesota; Australian National University, Canberra; Department of Statistics, University of Oxford; Swedish University of Agricultural Sciences (SLU), Uppsala; Graduate School of Life Sciences, University of Cambridge; Broad Institute, Cambridge, USA; EMBL-EBI; University of California; University of Chicago; Iowa State University, Ames; The Pennsylvania State University; Peking University, Beijing; University of Science and Technology of China; BGI-Shenzhen, China; Beijing Institute of Genomics……
30. SeqAn SDK Components
Review Board to ensure code quality
CDash/CTest to automatically compile and test across platforms
Code coverage reports
32. Unified Alignment Algorithms
Versatile & Extensible DP-Interface
For example...
Standard DP-Algorithms
Global & Semi-Global Alignments
Local Alignments
Modified DP-Algorithms
Split Breakpoint Detection
Banded Chain Alignment
33. Unified Alignment Algorithms
For example...
Needleman-Wunsch with Traceback:
DPProfile<GlobalAlignment<>, LinearGaps, TracebackOn<> >
Semi-Global Gotoh without Traceback:
DPProfile<GlobalAlignment<FreeEndGaps<True, False, True, False> >, AffineGaps, TracebackOff>
Banded Smith-Waterman with Affine Gap Costs:
DPBand<BandOn>(lowerDiag, upperDiag), DPProfile<LocalAlignment<>, AffineGaps, TracebackOn<> >
Split-Breakpoint Detection for Right Anchor:
DPProfile<SplitAlignment<>, AffineGaps, TracebackOn<GapsRight> >
34. Support for Common File Formats
Important file formats for HTS analysis:
Sequences: FASTA, FASTQ; Indexed FASTA (FAI) for random access
Genomic Features: GFF 2, GFF 3, GTF, BED
Read Mapping: SAM, BAM (plus BAM indices)
Variants: VCF

SequenceStream ss("file.fa.gz");
while (!atEnd(ss))
{
    readRecord(id, seq, ss);
    cout << id << '\t' << seq << '\n';
}

BamStream bs("file.bam");
while (!atEnd(bs))
{
    readRecord(record, bs);
    cout << record.qName << '\t' << record.pos << '\n';
}

… or write your own parser: there are tutorials and helper routines for writing your own parsers.
36. Fragment Store
(Multi) Read Alignments
Read alignments can be easily imported:

std::ifstream file("ex1.sam");
read(file, store, Sam());

… and accessed as a multiple alignment, e.g. for visualization:

AlignedReadLayout layout;
layoutAlignment(layout, store);
printAlignment(svgFile, Raw(), layout, store, 1, 0, 150, 0, 36);
37. Unified Full-Text Indexing Framework
Available Indices
Suffix Trees:
• suffix array
• enhanced suffix array
• lazy suffix tree
Prefix Trie:
• FM-index
q-Gram Indices:
• direct addressing
• open addressing
• gapped

Index<TSeq, IndexEsa<> >
Index<StringSet<TSeq>, FMIndex<> >

All indices support multiple strings and external memory construction/usage.
Index Lookup Interface
All indices support the (sequential) find interface:

Finder<TIndex> finder(index);
while (find(finder, "TATAA"))
    cout << "Hit at position " << position(finder) << endl;
40. Masai read mapper
[Figure: reads vs. genome (Chr. 1, Chr. 2, …, Chr. X, "ACGCTTCATCGCCCT…"); an index of the reads (radix tree of seeds) is matched against an index of the genome (e.g. FM-index).]
The algorithm is based on the simultaneous traversal of two string indices (e.g., FM-index, enhanced suffix array, lazy suffix tree).
41. Read Mapping: Masai
Faster and more accurate than BWA and Bowtie2
Timings on a single core
43. What about multi-core implementation?
Collaboration to parallelize indices and verification algorithms in SeqAn, to speed up any applications making use of indices
44. SeqAn going parallel
GOAL
Parallelize the finder interface of SeqAn so it works on CPU and accelerators like GPU
[Slide note: will be replaced by hg18 and 10 million 20-mers]
46. SeqAn going parallel: NVIDIA GPUs
Copy needles and index to GPU
SAME count function as on CPU!
47. SeqAn going parallel
Count occurrences of 10 million 20-mers in the human genome using an FM-index:
Intel i7, 3.2 GHz:                 18.6 sec  (1x)
Intel i7, 3.2 GHz, 12 threads:      2.66 sec (7x)
Intel Xeon Phi 7120, 244 threads:   2.18 sec (8.5x)
NVIDIA Tesla K20:                   0.4 sec  (47x)
48. SeqAn going parallel
Approximate count of occurrences of 1.2 million 33-mers in the human genome using an FM-index:
Intel i7, 3.2 GHz:                 66.1 s  (1x)
Intel i7, 3.2 GHz, 12 threads:      9.0 s  (7.3x)
Intel Xeon Phi 7120, 244 threads:   3.9 s  (16.9x)
NVIDIA Tesla K20:                   3.2 s  (20.7x)
49. Part II: The details
51. CUDA preliminaries
In order to use CUDA we first had to adapt some parts of SeqAn:
• CUDA requires each function to be prefixed with the domain qualifiers __host__ or __device__ in order to generate CPU/GPU code
• We prefixed all basic template functions with a SEQAN_HOST_DEVICE macro

#ifdef __CUDACC__
#define SEQAN_HOST_DEVICE inline __device__ __host__
#else
#define SEQAN_HOST_DEVICE inline
#endif

• Static const arrays are not allowed in the way SeqAn defines them
• We replaced alphabet conversion lookup tables (e.g. Dna <--> char) by conversion functions
52. Strings
• Instead of defining a new CUDA string we simply use the Thrust library:
• It provides host_vector and device_vector classes, which are vectors with buffers in host or device memory
• However, Thrust functions are callable only from the host side
• We made both vectors accessible from SeqAn:
• SeqAn strings have to provide a set of global (meta-)functions, e.g. Value<>, resize(), …
• We simply defined the required wrapper functions for these two vectors
53. Standard Strings
• Up to here, all strings can only be used on the side of their scope
[Figure: the buffers of thrust::host_vector and seqan::String live in host memory; the buffers of thrust::device_vector and its device-side seqan::String counterpart live in device memory.]
54. Host-Device String
• How to access a device_vector from the device side?
• We could pass (POD) iterators to the kernel
• However, many SeqAn algorithms work on more complex containers
• We need the same interface of the container on the device side
• For strings we developed a so-called ContainerView (POD type)
• It provides a container interface given the begin/end pointers of a vector buffer
• The view() function creates the ContainerView object for a given device_vector
55. Host-Device String
• How to use a device_vector on the device:
[Figure: on the host side, view() creates a seqan::ContainerView from the thrust::device_vector buffer; the kernel launch copies the view to the device side.]
56. Device and View metafunctions
• For generic GPU programming:
• The Device metafunction returns the device-memory equivalent of a class

// Replaces String with thrust::device_vector.
template <typename TValue, typename TSpec>
struct Device<String<TValue, TSpec> >
{
    typedef thrust::device_vector<TValue> Type;
};

• The View metafunction returns the (POD) view type of a class

// Returns a view type that can be passed to a CUDA kernel.
template <typename TValue, typename TAlloc>
struct View<thrust::device_vector<TValue, TAlloc> >
{
    typedef ContainerView<thrust::device_vector<TValue, TAlloc> > Type;
};
57. Hello world
• A simple example to reverse a string on the GPU:

// A standard SeqAn string over the Dna alphabet.
String<Dna> myString = "ACGT";

// A Dna string in device global memory.
typename Device<String<Dna> >::Type myDeviceString;

// Copy the string to global memory.
assign(myDeviceString, myString);

// Pass a view of the device string to the CUDA kernel.
myKernel<<<1,1>>>(view(myDeviceString));

// TString is ContainerView<device_vector<Dna> >.
template <typename TString>
__global__ void myKernel(TString string)
{
    printf("length(string) = %d\n", (int)length(string));
    reverse(string);
}
58. Porting complex data structures
• More complex structures (e.g. Index, Graph) can only be ported to the GPU if they …
• don't use pointers
• use only strings of POD types (String<Dna>, but not String<String<…> >)
• use only 1-dimensional StringSets (ConcatDirect)
• Nested classes are no problem:
• The View metafunction converts all member types into their view types
• The view() function is called recursively on all members
63. The FM-index in SeqAn
• The FM-index can be implemented using a number of string-based lookup tables
• ... as well as other indices, e.g. enhanced suffix array, q-gram index
• There is a space-time tradeoff between all these indices
• The FM-index has the minimal memory requirements
64. A generic FM-index
• SeqAn's FM-index consists of some nested classes storing Strings
[Figure: the FM-index (host-only) as a tree of nested classes with Strings at the leaves.]
65. A generic FM-index
• The Device type of the FM-index uses device_vector instead of String
[Figure: GPU FM-index (host part).]
• The view of this object (= device part) is the same tree, where leaves are replaced by ContainerViews of device_vectors
66. CPU vs. GPU
• Invoking an FM-index based search on CPU and GPU:
• The findGPU kernel AND the findCPU function will invoke many instances of the SAME generic function, which will perform a backtracking algorithm on our generic index interface

// Select the index type.
typedef Index<DnaString, FMIndex<> > TIndex;

// Type is Index<device_vector<Dna>, FMIndex<> >.
typedef typename Device<TIndex>::Type TDeviceIndex;

// ========== On CPU ==========

// Create an index.
TIndex index("ACGTTGCAA");

// Use the FM-index on CPU.
findCPU(index, …);

template <typename TIndex>
void findCPU(TIndex & index, …);

// ========== On GPU ==========

// Create a device index.
TIndex index("ACGTTGCAA");
TDeviceIndex deviceIndex;
assign(deviceIndex, index);

// Use the FM-index in a CUDA kernel.
findGPU<<<...>>>(view(deviceIndex), …);

template <typename TIndex>
__global__ void findGPU(TIndex index, …);
67. Approximate search via backtracking

do {
    if (finder.score == finder.scoreThreshold)
    {
        if (goDown(textIt, suffix(pattern, patternIt))) delegate(finder);
        goUp(textIt);
        if (isRoot(textIt)) break;
    }
    else if (finder.score < finder.scoreThreshold)
    {
        if (atEnd(patternIt)) delegate(finder);
        else if (goDown(textIt))
        {
            finder.score += parentEdgeLabel(textIt) != value(patternIt);
            goNext(patternIt);
            continue;
        }
    }

    do {
        goPrevious(patternIt);
        finder.score -= parentEdgeLabel(textIt) != value(patternIt);
    } while (!goRight(textIt) && goUp(textIt));
    if (isRoot(textIt)) break;
    finder.score += parentEdgeLabel(textIt) != value(patternIt);
    goNext(patternIt);
} while (true);
68. Outlook for GPU support
• Our next steps are:
• Provide parallelFor() to hide the CUDA kernel call/OpenMP for-loop
• Develop classes for concurrent access (String, job queues)
• Port more indices and index iterators to be used with CUDA
• Port SeqAn's alignment module
• Develop a CPU/GPU version of the FM-index based read mapper Masai
• ...
• Follow our development:
• Sources: https://github.com/seqan/seqan/tree/develop
• Code examples: http://trac.seqan.de/wiki/HowTo/DevelopCUDA
70. Multicore parallelization
• We first introduced tags to switch between serial and parallel algorithms:

struct Serial_;
typedef Tag<Serial_> Serial;

struct Parallel_;
typedef Tag<Parallel_> Parallel;

• Then we defined basic atomic operations required for thread safety:

template <typename T>
inline T atomicInc(T &x, Serial)
{
    return ++x;
}

template <typename T>
inline T atomicInc(volatile T &x, Parallel)
{
    return __sync_add_and_fetch(&x, 1);
}
71. Splitter
• To this end, we developed the Splitter<TValue, TSpec> to compute a partition into subintervals of (almost) equal length …

Splitter<unsigned> splitter(10, 20, 3);
for (unsigned i = 0; i < length(splitter); ++i)
    cout << '[' << splitter[i] << ',' << splitter[i+1] << ')' << endl;

// [10,14)
// [14,17)
// [17,20)
72. Splitter
• The Splitter can also be used with iterators directly
• The Serial / Parallel tag divides an interval range into 1 / #thread_num many intervals

template <typename TIter, typename TVal, typename TParallelTag>
inline void arrayFill(TIter begin_, TIter end_,
                      TVal const &value, Tag<TParallelTag> parallelTag)
{
    Splitter<TIter> splitter(begin_, end_, parallelTag);

    SEQAN_OMP_PRAGMA(parallel for)
    for (int job = 0; job < (int)length(splitter); ++job)
        arrayFill(splitter[job], splitter[job + 1], value, Serial());
}

• Passing the Serial tag can be used to switch off the parallel behaviour
73. SeqAn going parallel
Count occurrences of 10 million 20-mers in the human genome using an FM-index:
Intel i7, 3.2 GHz:                 18.6 sec  (1x)
Intel i7, 3.2 GHz, 12 threads:      2.66 sec (7x)
Intel Xeon Phi 7120, 244 threads:   2.18 sec (8.5x)
NVIDIA Tesla K20:                   0.4 sec  (47x)
Thank you for your attention!
74. Upcoming GTC Express Webinars
October 23 - Revolutionize Virtual Desktops with the One Missing Piece: A Scalable GPU
October 30 - OpenACC 2.0 Enhancements for Cray Supercomputers
October 31 - Getting the Most out of NVIDIA GRID vGPU with Citrix XenServer
November 5 - Accelerating Face-in-the-Crowd Recognition with GPU Technology
November 6 - Bright Cluster Manager: A CUDA-ready Management Solution for GPU-based HPC
Register at www.gputechconf.com/gtcexpress
75. GTC 2014 Call for Posters
Posters should describe novel or interesting topics in
§ Science and research
§ Professional graphics
§ Mobile computing
§ Automotive applications
§ Game development
§ Cloud computing
Call opens October 29
www.gputechconf.com
76. Test Drive NVIDIA GPUs!
Experience the Acceleration
Develop your codes on the latest GPUs today
Sign up for a FREE GPU Test Drive on remotely hosted clusters
www.nvidia.com/GPUTestDrive