Intel Labs
Vasimuddin Md.
Sanchit Misra
Efficient Architecture-Aware Acceleration
of BWA-MEM for Multicore Systems
Heng Li Srinivas Aluru
May 21, 2019
BIGstack: Broad Intel Genomics stack
Optimized Broad Software on Top of Reference Architecture Design
Primer on Human Genome
• 3 billion base pairs over 23 chromosome pairs
• 23 sequences over Σ = {A, C, G, T}
• Exactly the same DNA across the cells of a body
• Human ~ Human: 99.5% similarity
Obtaining Genome of an Individual
1 human genome: get reads (30X coverage)
1.2 billion paired-end reads of length 151
Map to the reference sequence
Illumina HiSeq X 10: 28 min
BWA-MEM*: 164 min
BWA-MEM2*: 64 min
Among the most popular tools: ~70K users
*On a single-socket Intel® Xeon® Platinum 8180 Processor
Genome Data Will Dwarf Everything Else
Population Genomics, Approaching Worldwide Scale
Source: Frost & Sullivan, “Global Precision Medicine Growth Opportunities, Forecast to 2025”, January 2017
100 million to 2 billion human genomes are expected to be sequenced by 2025!
(That’s ~10-200 exabytes!)
Stephens et al., “Big Data: Astronomical or Genomical?”, PLOS Biology (2015)
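The two ranges on this slide imply a storage footprint of roughly 100 GB per genome. The sketch below only reproduces that arithmetic; the 100 GB figure is an assumption derived from the slide's own ratio (10 exabytes / 100 million genomes), not a number stated in the talk, and the function name is hypothetical.

```python
GB = 10**9
EB = 10**18

# Assumption: per-genome footprint implied by the slide's ratio, not a quoted value.
BYTES_PER_GENOME = 100 * GB


def storage_exabytes(num_genomes, bytes_per_genome=BYTES_PER_GENOME):
    """Total storage in exabytes for a given number of sequenced genomes."""
    return num_genomes * bytes_per_genome / EB


low = storage_exabytes(100 * 10**6)   # 100 million genomes
high = storage_exabytes(2 * 10**9)    # 2 billion genomes
print(low, high)
```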
Accelerating BWA-MEM has Proven Difficult
3 key kernels (each quite complex) consuming 15-45% of time
– SMEM (Super Maximal Exact Match), SAL (Suffix Array Lookup), BSW (Banded Smith Waterman) with several heuristics
– Different kernels can be the most time consuming depending on data
– Time not covered by the kernels (Misc) is also significant
Majority of other approaches target 1-2 of the 3 kernels on GPGPU/FPGA
– pipeline the rest on the host CPU
– Performance bound by the non-optimized kernels running on CPU
Approach              SMEM         SAL              BSW                    Overall
Multiple approaches   - (CPU)      - (CPU)          1.6x-3x (GPGPU/FPGA)   1.45x-2x
Chang et al. 2016     4x (FPGA)    - (CPU)          - (CPU)                1.26x
Ahmed et al. 2015     1.7x (CPU)   2.8x (4 FPGAs)   5.7x (4 FPGAs)         2.6x
Bypasses some of the heuristics – Get different output – Strict No No
No published work contains a holistic architecture-aware optimization of BWA-MEM software on multicore systems.
System Configuration
                                            Intel® Xeon® Platinum 8180 Processor
Name used in the rest of the presentation   SKX
Sockets x Cores x Threads                   2 x 28 x 2
VPUs/Core x AVX register width              2 x {512, 256, 128}
Base clock frequency                        2.5 GHz
L1D/L2 cache / Core                         32/1024 KB
L3 cache / Socket                           38.5 MB
DRAM size / Socket, BW                      96 GB, 114 GB/s
Compiler version                            ICC v. 17.0.2
Performance on multiple sockets can be achieved by just distributing the reads equally, and load imbalance is usually not an issue.
Therefore, our efforts are focused on single socket performance.
Datasets
Reference Sequence
Half of Human Genome (version HG38) - 1.5 billion nucleotides
Read Datasets
Dataset   # Reads       Read Length   Dataset Source
D1        5 x 10^5      151           Broad Institute
D2        5 x 10^5      151           Broad Institute
D3        1.25 x 10^6   76            NCBI SRA: SRX020470
D4        1.25 x 10^6   101           NCBI SRA: SRX207170
D5        1.25 x 10^6   101           NCBI SRA: SRX206890
End to End Performance Gains On SKX – Compute Only
Our output is identical to original BWA-MEM
[Figure: two panels, single thread of SKX and single socket (56 threads/28 cores) of SKX]
Optimization Details
The Problem – Mapping to the Reference Sequence
[Figure: reference R (sequences S1, S2, S3, S4, ..., Sm) and query Q; the example string shown is CCCTCCTATTTAAC]
Find the best matches of Q in R
FM-Index of the Reference Sequence
FM-index of a sample reference sequence: AGTGGA.
It consists of the Suffix Array (SA), the Burrows Wheeler Transform (BWT), and the O and D arrays.
Since the BW-Matrix is lexicographically sorted, all the occurrences of a query appear contiguously in the suffix array. These contiguous locations are called the SA interval.
Sizes for human genome (as labeled in the figure): 30 GB, 1.5 GB, 96 GB
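As a rough illustration of the slide above, the following sketch builds the suffix array, BWT, and the two auxiliary arrays for the sample reference AGTGGA and runs a backward search to obtain an SA interval. It assumes that D holds, for each symbol, the number of characters in the text that are lexicographically smaller, and that O(c, i) counts occurrences of c in the first i BWT characters; both are assumptions about the slide's notation. The function names are hypothetical, and the construction is deliberately naive (sorting whole suffixes), not how BWA-MEM builds its index.

```python
def build_fm_index(ref):
    """Naive FM-index construction for a short reference string."""
    text = ref + "$"                      # '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # D[c]: number of characters in the text strictly smaller than c (assumed meaning)
    D = {c: sum(1 for x in text if x < c) for c in alphabet}
    # O[c][i]: occurrences of c in bwt[0:i]
    O = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            O[c].append(O[c][-1] + (1 if ch == c else 0))
    return sa, bwt, D, O


def sa_interval(query, bwt, D, O):
    """Backward search; returns a half-open SA interval [lo, hi) or None."""
    lo, hi = 0, len(bwt)
    for c in reversed(query):
        if c not in D:
            return None
        lo = D[c] + O[c][lo]
        hi = D[c] + O[c][hi]
        if lo >= hi:
            return None
    return lo, hi


sa, bwt, D, O = build_fm_index("AGTGGA")
lo, hi = sa_interval("G", bwt, D, O)
print(bwt, sorted(sa[lo:hi]))   # all occurrences of "G" sit in one contiguous SA range
```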
Compressed FM-Index in BWA-MEM
• To reduce memory footprint, the O array is divided into buckets of size η
• For each bucket
– nucleotide counts are stored for all the previous buckets
– the corresponding BWT string of size η is stored in a 2-bit per nucleotide format
[Figure: example with η = 128. Buckets shown: (A:0, C:0, G:0, T:0 | GGAAC…..AGCT), (A:35, C:30, G:31, T:32 | TGAGC…..AGCT), (A:266, C:250, G:256, T:252 | CGCCA…..TGAT). For the t-th index in the BWT string, O(G, t) = 256 + 1 = 257]
Fig. based on Jing Zhang et al., CCGrid 2013
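The bucketed occurrence lookup described above can be sketched as follows: each bucket keeps the nucleotide counts accumulated over all previous buckets plus its own slice of the BWT, and O(c, t) is the stored count plus a scan inside one bucket. This is a simplified model under stated assumptions: the slice is kept as a plain string rather than 2 bits per nucleotide, t is treated as inclusive (matching the slide's 256 + 1 example), and the function names are hypothetical.

```python
def build_buckets(bwt, eta):
    """Split the BWT into buckets of size eta, each with counts of all earlier buckets."""
    buckets = []
    running = {c: 0 for c in "ACGT"}
    for start in range(0, len(bwt), eta):
        chunk = bwt[start:start + eta]
        buckets.append((dict(running), chunk))   # counts BEFORE this bucket
        for ch in chunk:
            if ch in running:
                running[ch] += 1
    return buckets


def occ(buckets, eta, c, t):
    """Occurrences of nucleotide c in bwt[0..t], inclusive of position t."""
    counts_before, chunk = buckets[t // eta]
    return counts_before[c] + chunk[: t % eta + 1].count(c)


bwt = "GGAACTGAGCCGCCATGAT"
buckets = build_buckets(bwt, eta=4)
print(occ(buckets, 4, "G", 8))
```

A smaller η means a shorter scan per lookup at the cost of more stored counts, which is the trade-off the later optimization slide revisits.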
BWA-MEM Algorithm
Seeding – Look for exact matches (regions) in the reference sequence for the substrings (seeds) of the query using compressed FM-Index
– Super Maximal Exact Match (SMEM)
– Suffix Array Lookup (SAL)
– Chaining
Extension – Extend the matches on either side to get end-to-end matches. Select matches with high similarity
– Banded Smith Waterman (BSW)
SAM-Form – Format the output in the SAM format
– Reorganization
SMEM Algorithm from BWA-MEM - For One Position
Reference: ATTCTTATGTA
Read: GTTAC
1. Find maximal length query substrings with matches
2. Output the matches
Forward extension phase
1. GTTAC
Find T - <T, 7, 12>
2. GTTAC
Find TA - <TA, 7, 8>
<T, 7, 12>
3. GTTAC
Find TAC – Not found
<TA, 7, 8>
<T, 7, 12>
Backward extension phase
1. GTTAC
<TA, 7, 8> - Find TTA - <TTA, 11, 11>
<T, 7, 12> - Find TT - <TT, 11, 12>
2. GTTAC
<TTA, 11, 11> - Find GTTA – Not found
Add TTA to list of SMEMs
<TT, 11, 12> - Find GTT – Not found
Output SMEMs:
<TTA, 11, 11>
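The walk-through above can be reproduced with a naive sketch that uses sorted suffixes and prefix tests in place of the FM-index. It assumes the start position is the second T of the read (0-based index 2) and that the intervals on the slide are 1-based and inclusive over the suffixes of the reference followed by a `$` terminator; both assumptions are consistent with the numbers shown. The function names are hypothetical, and the containment check is a simplification of the bookkeeping BWA-MEM does during backward extension.

```python
def sa_interval(ref, pattern):
    """1-based, inclusive suffix array interval of pattern in ref + '$', or None."""
    text = ref + "$"
    suffixes = sorted(text[i:] for i in range(len(text)))
    hits = [k + 1 for k, s in enumerate(suffixes) if s.startswith(pattern)]
    return (hits[0], hits[-1]) if hits else None


def smems_for_position(ref, read, pos):
    """SMEMs covering read[pos], found by forward then backward extension."""
    # Forward extension: grow read[pos:end] to the right while it still matches.
    ends = []
    end = pos + 1
    while end <= len(read) and sa_interval(ref, read[pos:end]) is not None:
        ends.append(end)
        end += 1
    # Backward extension: take the longest forward match first and grow it leftwards.
    found = []
    for end in reversed(ends):
        start = pos
        while start > 0 and sa_interval(ref, read[start - 1:end]) is not None:
            start -= 1
        # Report a match only if it is not contained in one already reported.
        if not any(s <= start and end <= e for s, e in found):
            found.append((start, end))
    return [(read[s:e],) + sa_interval(ref, read[s:e]) for s, e in found]


print(smems_for_position("ATTCTTATGTA", "GTTAC", 2))
```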
SMEM Algorithm from BWA-MEM: For One Position
[Figure: starting at query position m, forward extension uses the FM-Index to produce the tuples (m, m+1, p1, q1), (m, m+2, p2, q2), ..., (m, m+k, pk, qk). Backward extension processes them in reverse order, producing (m-1, m+k, pk’, qk’), ..., (m-1, m+2, p2’, q2’), (m-1, m+1, p1’, q1’), and then (m-2, m+k, pk’’, qk’’), ..., (m-2, m+2, p2’’, q2’’)]
SMEM Algorithm
– No spatial locality
– Large # instructions for η = 128
– New values in the tuple depend on current values and the current nucleotide
SMEM Algorithm – Key Optimizations
• Software Prefetching
– For any tuple that is added to the backward search buffer, we know the memory locations that will be accessed when the corresponding backward search occurs
– So, we software prefetch them and hide the prefetch latency with computation
• Reducing η and vectorization
– Reduced the value of η to 32
– Store the BWT string using a 1-byte per nucleotide format – 32 bytes total
– Process the 32-byte BWT using byte-level AVX2 intrinsics to get the number of occurrences of a nucleotide
– The four counts consume 4 bytes per letter – 16 bytes total
– Added 16 bytes of padding to make 64 bytes, exactly one cache line, so that a bucket aligned to a cache line boundary can be prefetched using one instruction
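The 64-byte bucket layout described above (16 bytes of counts, 32 bytes of BWT at one byte per nucleotide, 16 bytes of padding) can be modeled with Python's `struct` module. This is only a layout sketch: the field order, the nucleotide codes, and the filler byte are assumptions, and `bytes.count` stands in for the byte-level AVX2 compare that the slide refers to. Software prefetching has no Python equivalent and is not shown.

```python
import struct

ETA = 32
# Four 32-bit counts (16 bytes) + 32 BWT bytes + 16 pad bytes = one 64-byte cache line.
BUCKET = struct.Struct("<4I32s16x")
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed encoding, one byte per nucleotide


def pack_bucket(counts_before, chunk):
    """Pack counts of all previous buckets and up to ETA nucleotides into 64 bytes."""
    encoded = bytes(CODE[ch] for ch in chunk).ljust(ETA, b"\xff")  # 0xff = unused slot
    return BUCKET.pack(*counts_before, encoded)


def occ_in_bucket(raw, nucleotide, offset):
    """Occurrences of nucleotide up to and including `offset` within this bucket."""
    *counts_before, encoded = BUCKET.unpack(raw)
    code = CODE[nucleotide]
    return counts_before[code] + encoded[: offset + 1].count(bytes([code]))


raw = pack_bucket((266, 250, 256, 252), "CGCCA")
print(len(raw), occ_in_bucket(raw, "G", 1))
```

The counts reuse the numbers from the earlier η = 128 figure purely for illustration.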
SMEM Algorithm – Results
System: SKX, #Threads = 1
Read dataset: 60000 reads from D2
2x speedup
Suffix Array Lookup - SAL
SMEM outputs the suffix array interval
Each suffix array index in the interval is looked up to get the reference sequence coordinate like this:
[expression shown as a figure on the slide]
Optimization:
– Original BWA-MEM uses a compressed suffix array to reduce memory footprint, but there is sufficient memory on current systems
– So, we simply use an uncompressed suffix array and look it up using the above expression
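To make the trade-off concrete, the sketch below contrasts a direct lookup in an uncompressed suffix array with a lookup in a sampled (compressed) one, where unsampled entries are recovered by walking the BWT with LF-mapping until a sampled row is reached. The sampling scheme (every k-th SA row) and all names are assumptions for illustration; this is not the code of either tool.

```python
def build_index(ref, k):
    """Naive suffix array, BWT, and a sampled SA that keeps every k-th row."""
    text = ref + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]
    alphabet = sorted(set(text))
    first = {c: sum(ch < c for ch in text) for c in alphabet}   # chars smaller than c
    occ = {c: [0] for c in alphabet}                            # occ[c][i]: c in bwt[0:i]
    for ch in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return sa, bwt, first, occ, sa[::k]


def lookup_compressed(i, bwt, first, occ, sampled, k):
    """Recover SA[i] from the sampled SA; each LF step moves one position left in the text."""
    steps = 0
    while i % k != 0:
        c = bwt[i]
        i = first[c] + occ[c][i]
        steps += 1
    return (sampled[i // k] + steps) % len(bwt)


sa, bwt, first, occ, sampled = build_index("ATTCTTATGTA", k=4)
direct = sa[7]                                              # uncompressed: one memory access
walked = lookup_compressed(7, bwt, first, occ, sampled, 4)  # compressed: several LF steps
print(direct, walked)
```

The uncompressed variant trades memory for a single array access per lookup, which is the substitution the optimization above describes.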
SAL - Results
System: SKX, #Threads = 1
Input data created by intercepting the data to the SAL stage from an actual run using 600,000 reads from D2
183x speedup
Banded Smith Waterman - BSW
• Only a diagonal band is computed
• Size of the band can dynamically change from top to bottom
• Various conditions of early exit
• Low parallelism within one matrix computation
[Figure: regular Smith Waterman and banded Smith Waterman from BWA-MEM, with the recurrence parameterized by a gap open penalty and a gap extension penalty]
$$f(a, b) = \begin{cases} \text{match parameter}, & \text{if } a = b \\ \text{mismatch parameter}, & \text{otherwise} \end{cases}$$
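A minimal sketch of banded Smith Waterman with affine gaps is given below, assuming a fixed band half-width w and illustrative scoring parameters; the dynamically changing band and the early-exit conditions of BWA-MEM are not modeled. Cells with |i - j| > w are never computed, and f(a, b) is the match/mismatch function defined above.

```python
def banded_sw(query, target, w, match=1, mismatch=-4, gap_open=6, gap_ext=1):
    """Best local alignment score using only cells within w of the main diagonal."""
    neg = float("-inf")
    n, m = len(query), len(target)
    H = [[0] * (m + 1) for _ in range(n + 1)]     # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]   # alignment ends with a gap along the row
    F = [[neg] * (m + 1) for _ in range(n + 1)]   # alignment ends with a gap along the column
    best = 0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):   # the diagonal band
            E[i][j] = max(E[i][j - 1] - gap_ext, H[i][j - 1] - gap_open - gap_ext)
            F[i][j] = max(F[i - 1][j] - gap_ext, H[i - 1][j] - gap_open - gap_ext)
            f = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + f, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(banded_sw("AAAACCCC", "CCCC", w=0), banded_sw("AAAACCCC", "CCCC", w=4))
```

The two calls show why the band width matters: with w = 0 only the main diagonal is scored and the shifted match is missed, while w = 4 reaches it.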
BSW – Optimizations – Inter-task Vectorization
We hand vectorized using AVX512 SIMD intrinsics
Challenges
– Variable and dynamically changing band size
– Early exits
– Overhead of dynamic band computation
Sort the sequences according to band sizes to make the computation across pairs being vectorized more uniform
Convert the sequences from AoS to SoA format to prevent gather/scatter cost
SIMD Operations used
– cmp, blend, max, mov, add, sub, and mask
Precision
– Lower precision provides more performance
– The precision required depends on the maximum score, which depends on the sequence lengths
– We choose 8-bit or 16-bit precision based on sequence lengths
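The data preparation steps listed above can be sketched in plain Python: sort the sequence pairs by band size, cut them into batches of one pair per SIMD lane, transpose each batch from array-of-structures to structure-of-arrays, and pick a precision from the sequence lengths. The dictionary keys, the padding character, and the 8-bit limit of 127 are assumptions for illustration, not values taken from the talk.

```python
def make_batches(pairs, lanes):
    """Sort pairs by band size so that each batch of `lanes` pairs does similar work."""
    ordered = sorted(pairs, key=lambda p: p["band"])
    return [ordered[k:k + lanes] for k in range(0, len(ordered), lanes)]


def to_soa(sequences, pad="N"):
    """AoS to SoA: row r holds character r of every sequence, so one load feeds all lanes."""
    width = max(len(s) for s in sequences)
    padded = [s.ljust(width, pad) for s in sequences]
    return ["".join(column) for column in zip(*padded)]


def choose_precision(query_len, target_len, match_score=1, limit_8bit=127):
    """Pick 8-bit lanes when the largest possible score fits, else 16-bit (assumed limit)."""
    max_score = match_score * min(query_len, target_len)
    return 8 if max_score <= limit_8bit else 16


pairs = [{"query": "ACGT", "band": 5}, {"query": "AC", "band": 1},
         {"query": "ACG", "band": 3}, {"query": "A", "band": 2}]
for batch in make_batches(pairs, lanes=2):
    print([p["band"] for p in batch], to_soa([p["query"] for p in batch]))
```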
BSW - Results
System: SKX, #Threads = 1
Input: 48 million sequence pairs obtained by intercepting the input to this stage from a full application run. Read dataset used for full run: D3.
[Figure: speedups of 11.6x and 6.7x]
~14x reduction in # instructions
IPC is reduced because the majority of instructions in the optimized code are SIMD instructions
There are 2 ports for SIMD (VPUs), but 4 for scalar
Why not 512/8 = 64x speedup?
Only 43% of the time is spent on cell computation using SIMD
In which ~50% of lanes are idle – so, effectively ~21.5% for cell computation
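The numbers on this slide follow from two small calculations, reproduced here as a sketch. The 64x figure is the lane count of a 512-bit register at 8-bit precision, and the ~21.5% figure is the share of time spent in SIMD cell computation scaled by the fraction of busy lanes. The function names are hypothetical.

```python
def simd_lanes(register_bits, lane_bits):
    """Number of values processed per SIMD instruction."""
    return register_bits // lane_bits


def effective_cell_fraction(simd_time_fraction, idle_lane_fraction):
    """Share of run time doing useful cell computation once idle lanes are discounted."""
    return simd_time_fraction * (1.0 - idle_lane_fraction)


print(simd_lanes(512, 8))                    # upper bound on speedup from lane count alone
print(effective_cell_fraction(0.43, 0.50))   # roughly 0.215
```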
Multithread Scaling
Scaling of the three kernels and the entire application from 1 to 28 cores on SKX
We demonstrate nearly equal or better scaling on all kernels
Application scaling is worse due to bad scaling of “Misc” section
End to End Performance Results – Compute only
All kernels retain their speedup in the end-to-end run
SAL barely contributes to the run time due to 183x speedup
[Figure: two panels, single thread of SKX and single socket (56 threads/28 cores) of SKX]
BWA-MEM2 Open Sourcing
Drop-In Replacement
Supported executions: AVX512, AVX2, SSE4.1, scalar
Supported functionality: All the functionality of BWA-MEM, including single-end and paired-end alignments
Output: Identical to BWA-MEM
Command line interface: Exactly the same as BWA-MEM
Future Steps
Algorithmic, implementation level (Misc) and architectural improvements
https://github.com/bwa-mem2/bwa-mem2
Intel Legal Disclaimers
• Intel, Xeon and Intel Xeon Phi are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. Other names and brands may be claimed as the property of others. © Intel Corporation
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks.
• Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.
Thank You!
Vasimuddin Md
vasimuddin.md@intel.com
@wasim_galaxy
Sanchit Misra
sanchit.misra@intel.com
sanchit-misra@github.io
@sanchit_misra
Heng Li
hli@jimmy.harvard.edu
http://www.liheng.org/
@lh3lh3
Srinivas Aluru
aluru@cc.gatech.edu
https://www.cc.gatech.edu/~saluru/
67

More Related Content

What's hot

RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewSean Davis
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Manikhandan Mudaliar
 
BITS: Basics of Sequence similarity
BITS: Basics of Sequence similarityBITS: Basics of Sequence similarity
BITS: Basics of Sequence similarityBITS
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
qPCR Design Strategies for Specific Applications
qPCR Design Strategies for Specific ApplicationsqPCR Design Strategies for Specific Applications
qPCR Design Strategies for Specific ApplicationsIntegrated DNA Technologies
 
Differential expression in RNA-Seq
Differential expression in RNA-SeqDifferential expression in RNA-Seq
Differential expression in RNA-SeqcursoNGS
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshitaHarshita Bhawsar
 
RNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential ExpressionRNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential ExpressionJatinder Singh
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisUniversity of California, Davis
 
RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1BITS
 
RNA-seq: A High-resolution View of the Transcriptome
RNA-seq: A High-resolution View of the TranscriptomeRNA-seq: A High-resolution View of the Transcriptome
RNA-seq: A High-resolution View of the TranscriptomeSean Davis
 
Dna sequencing methods
Dna sequencing methodsDna sequencing methods
Dna sequencing methodshephz
 
De bruijn graphs
De bruijn graphsDe bruijn graphs
De bruijn graphsmarium02
 
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...Torsten Seemann
 

What's hot (20)

RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis Overview
 
Rna seq pipeline
Rna seq pipelineRna seq pipeline
Rna seq pipeline
 
Biological networks - building and visualizing
Biological networks - building and visualizingBiological networks - building and visualizing
Biological networks - building and visualizing
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
 
Variant analysis and whole exome sequencing
Variant analysis and whole exome sequencingVariant analysis and whole exome sequencing
Variant analysis and whole exome sequencing
 
BITS: Basics of Sequence similarity
BITS: Basics of Sequence similarityBITS: Basics of Sequence similarity
BITS: Basics of Sequence similarity
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
ChipSeq Data Analysis
ChipSeq Data AnalysisChipSeq Data Analysis
ChipSeq Data Analysis
 
qPCR Design Strategies for Specific Applications
qPCR Design Strategies for Specific ApplicationsqPCR Design Strategies for Specific Applications
qPCR Design Strategies for Specific Applications
 
Differential expression in RNA-Seq
Differential expression in RNA-SeqDifferential expression in RNA-Seq
Differential expression in RNA-Seq
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshita
 
RNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential ExpressionRNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential Expression
 
Data analysis pipelines for NGS applications
Data analysis pipelines for NGS applicationsData analysis pipelines for NGS applications
Data analysis pipelines for NGS applications
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1
 
RNA-seq: A High-resolution View of the Transcriptome
RNA-seq: A High-resolution View of the TranscriptomeRNA-seq: A High-resolution View of the Transcriptome
RNA-seq: A High-resolution View of the Transcriptome
 
Genome Assembly
Genome AssemblyGenome Assembly
Genome Assembly
 
Dna sequencing methods
Dna sequencing methodsDna sequencing methods
Dna sequencing methods
 
De bruijn graphs
De bruijn graphsDe bruijn graphs
De bruijn graphs
 
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
 

Similar to BWA-MEM2-IPDPS 2019

Workshop NGS data analysis - 2
Workshop NGS data analysis - 2Workshop NGS data analysis - 2
Workshop NGS data analysis - 2Maté Ongenaert
 
AAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and OptimizationAAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and OptimizationAnh Dung NGUYEN
 
Thomas+Niewel+ +Oracletuning
Thomas+Niewel+ +OracletuningThomas+Niewel+ +Oracletuning
Thomas+Niewel+ +Oracletuningafa reg
 
Cisco crs1
Cisco crs1Cisco crs1
Cisco crs1wjunjmt
 
design-compiler.pdf
design-compiler.pdfdesign-compiler.pdf
design-compiler.pdfFrangoCamila
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
ExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent AcceleratorExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent AcceleratorJinho Lee
 
unit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptxunit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptxKandavelEee
 
BFSK RT In FPGA Thesis Pres Jps
BFSK RT In FPGA Thesis Pres JpsBFSK RT In FPGA Thesis Pres Jps
BFSK RT In FPGA Thesis Pres Jpsjpsvenn
 
Algorithm Selection for Preferred Extensions Enumeration
Algorithm Selection for Preferred Extensions EnumerationAlgorithm Selection for Preferred Extensions Enumeration
Algorithm Selection for Preferred Extensions EnumerationFederico Cerutti
 
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...Hideyuki Tanaka
 
Nilesh ranpura systemmodelling
Nilesh ranpura systemmodellingNilesh ranpura systemmodelling
Nilesh ranpura systemmodellingObsidian Software
 
Nodes and Networks for HPC computing
Nodes and Networks for HPC computingNodes and Networks for HPC computing
Nodes and Networks for HPC computingrinnocente
 

Similar to BWA-MEM2-IPDPS 2019 (20)

Workshop NGS data analysis - 2
Workshop NGS data analysis - 2Workshop NGS data analysis - 2
Workshop NGS data analysis - 2
 
ASCIC.ppt
ASCIC.pptASCIC.ppt
ASCIC.ppt
 
AAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and OptimizationAAME ARM Techcon2013 004v02 Debug and Optimization
AAME ARM Techcon2013 004v02 Debug and Optimization
 
TiReX: Tiled Regular eXpression matching architecture

BWA-MEM2-IPDPS 2019

  • 1. Intel Labs. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. Vasimuddin Md., Sanchit Misra, Heng Li, Srinivas Aluru. May 21, 2019
  • 2. BIGstack: Broad Intel Genomics stack. Optimized Broad Software on Top of Reference Architecture Design
  • 3. Primer on Human Genome
      – 3 billion base-pairs over 23 chromosome-pairs
      – 23 sequences over Σ = {A, C, G, T}
      – Exactly the same DNA across the cells of a body
      – Human ~ Human: 99.5% similarity
  • 6. Obtaining Genome of an Individual
      – 1 human genome, then get reads (30X coverage): 1.2 billion paired-end reads of length 151, then map to the reference sequence
      – Time per step: Illumina HiSeq X 10: 28 min; BWA-MEM*: 164 min; BWA-MEM2*: 64 min
      – Among the most popular tools, ~70K users
      – *On single socket Intel® Xeon® Platinum 8180 Processor
  • 7. Genome Data Will Dwarf Everything Else
  • 9. Population Genomics, Approaching Worldwide Scale
      – 100 million - 2 billion human genomes expected to be sequenced by 2025! (That’s ~10-200 exabytes!) Stephens et al., "Big Data: Astronomical or Genomical?", PLOS Biology (2015)
      – Source: Frost & Sullivan, “Global Precision Medicine Growth Opportunities, Forecast to 2025”, January 2017
  • 10. Intel Labs 3 key kernels (each quite complex) consuming 15-45% of time – SMEM (Super Maximal Exact Match), SAL (Suffix Array Lookup), BSW (Banded Smith Waterman) with several heuristics – Different kernels can be the most time consuming depending on data – Time not covered by the kernels (Misc) is also significant Majority of other approaches target 1-2 of the 3 kernels on GPGPU/ FPGA – pipeline the rest on the host CPU – Performance bound by the non-optimized kernels running on CPU Accelerating BWA-MEM has Proven Difficult Approach SMEM SAL BSW Overall Multiple approaches - (CPU) - (CPU) 1.6x-3x (GPGPU/FPGA) 1.45x-2x Chang et. al. 2016 4x (FPGA) - (CPU) - (CPU) 1.26x Ahmed et. al. 2015 1.7x (CPU) 2.8x (4 FPGAs) 5.7x (4 FPGAs) 2.6x 10
  • 11. Intel Labs 3 key kernels (each quite complex) consuming 15-45% of time – SMEM (Super Maximal Exact Match), SAL (Suffix Array Lookup), BSW (Banded Smith Waterman) with several heuristics – Different kernels can be the most time consuming depending on data – Time not covered by the kernels (Misc) is also significant Majority of other approaches target 1-2 of the 3 kernels on GPGPU/ FPGA – pipeline the rest on the host CPU – Performance bound by the non-optimized kernels running on CPU Accelerating BWA-MEM has Proven Difficult Approach SMEM SAL BSW Overall Multiple approaches - (CPU) - (CPU) 1.6x-3x (GPGPU/FPGA) 1.45x-2x Chang et. al. 2016 4x (FPGA) - (CPU) - (CPU) 1.26x Ahmed et. al. 2015 1.7x (CPU) 2.8x (4 FPGAs) 5.7x (4 FPGAs) 2.6x Bypasses some of the heuristics – Get different output – Strict No No 11
  • 12. Intel Labs 3 key kernels (each quite complex) consuming 15-45% of time – SMEM (Super Maximal Exact Match), SAL (Suffix Array Lookup), BSW (Banded Smith Waterman) with several heuristics – Different kernels can be the most time consuming depending on data – Time not covered by the kernels (Misc) is also significant Majority of other approaches target 1-2 of the 3 kernels on GPGPU/ FPGA – pipeline the rest on the host CPU – Performance bound by the non-optimized kernels running on CPU Accelerating BWA-MEM has Proven Difficult No published work contains a holistic architecture-aware optimization of BWA-MEM software on multicore systems. Approach SMEM SAL BSW Overall Multiple approaches - (CPU) - (CPU) 1.6x-3x (GPGPU/FPGA) 1.45x-2x Chang et. al. 2016 4x (FPGA) - (CPU) - (CPU) 1.26x Ahmed et. al. 2015 1.7x (CPU) 2.8x (4 FPGAs) 5.7x (4 FPGAs) 2.6x Bypasses some of the heuristics – Get different output – Strict No No 12
  • 13. System Configuration: Intel® Xeon® Platinum 8180 Processor
      – Name used in the rest of the presentation: SKX
      – Sockets x Cores x Threads: 2 x 28 x 2
      – VPUs/Core x AVX register width: 2 x {512, 256, 128}
      – Base clock frequency: 2.5 GHz
      – L1D/L2 cache / Core: 32/1024 KB
      – L3 cache / Socket: 38.5 MB
      – DRAM size / Socket, BW: 96 GB, 114 GB/s
      – Compiler version: ICC v. 17.0.2
      Performance on multiple sockets can be achieved by just distributing the reads equally, and load imbalance is usually not an issue. Therefore, our efforts are focused on single socket performance.
  • 14. Datasets
      Reference sequence: half of the human genome (version HG38), 1.5 billion nucleotides
      Read datasets:
      Dataset | # Reads    | Read Length | Dataset Source
      D1      | 5 x 10^5    | 151         | Broad Institute
      D2      | 5 x 10^5    | 151         | Broad Institute
      D3      | 1.25 x 10^6 | 76          | NCBI SRA: SRX020470
      D4      | 1.25 x 10^6 | 101         | NCBI SRA: SRX207170
      D5      | 1.25 x 10^6 | 101         | NCBI SRA: SRX206890
  • 15. End to End Performance Gains on SKX – Compute Only. Our output is identical to original BWA-MEM. [Charts: single thread of SKX; single socket (56 threads/28 cores) of SKX]
  • 17. The Problem – Mapping to the Reference Sequence. Find the best matches of Q in R. [Figure: reference R with locations S1, S2, S3, S4, ..., Sm; query Q; sequence shown: CCCTCCTATTTAAC]
  • 19. FM-Index of the Reference Sequence
      – FM-index of a sample reference sequence: AGTGGA. It consists of the Suffix Array, the Burrows Wheeler Transform (BWT), and the O and D arrays.
      – Since the BW-Matrix is lexicographically sorted, all the occurrences of a query appear contiguously in the suffix array (SA). These contiguous locations are called the SA interval.
      – Sizes for the human genome shown on the figure: 30 GB, 1.5 GB, 96 GB
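The FM-index described on this slide can be sketched in a few lines for the sample reference AGTGGA. This is a minimal illustration in Python (chosen only for readability), not the BWA-MEM data layout. It assumes that D(c) counts the characters of the text that are lexicographically smaller than c and that O(c, i) counts occurrences of c in the first i characters of the BWT; the function names `build_fm_index` and `backward_search` are hypothetical.

```python
def build_fm_index(reference):
    """Naive FM-index construction; fine for toy inputs only."""
    text = reference + "$"                      # '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)      # character preceding each suffix
    alphabet = sorted(set(text))
    # Assumed definition: D[c] = number of characters in text smaller than c.
    d = {c: sum(1 for ch in text if ch < c) for c in alphabet}
    # Assumed definition: O[c][i] = occurrences of c in bwt[0:i].
    o = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            o[c].append(o[c][-1] + (ch == c))
    return sa, bwt, d, o


def backward_search(pattern, bwt, d, o):
    """Return the half-open SA interval [lo, hi) of pattern, or None."""
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in d:
            return None
        lo = d[c] + o[c][lo]
        hi = d[c] + o[c][hi]
        if lo >= hi:
            return None
    return lo, hi


sa, bwt, d, o = build_fm_index("AGTGGA")
print(sa)    # [6, 5, 0, 4, 3, 1, 2]
print(bwt)   # AG$GTAG
lo, hi = backward_search("G", bwt, d, o)
print(sorted(sa[lo:hi]))   # [1, 3, 4]
```

All occurrences of "G" land in one contiguous range of the suffix array, which is the SA interval property the slide refers to.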
  • 20. Compressed FM-Index in BWA-MEM
      – To reduce memory footprint, the O array is divided into buckets of size η
      – For each bucket, nucleotide counts are stored for all the previous buckets, and the corresponding BWT string of size η is stored in a 2-bit per nucleotide format
      – Example with η = 128 (figure based on Jing Zhang et al., CCGrid 2013):
          counts A:0 C:0 G:0 T:0, BWT string GGAAC…..AGCT
          counts A:35 C:30 G:31 T:32, BWT string TGAGC…..AGCT
          counts A:266 C:250 G:256 T:252, BWT string CGCCA…..TGAT
        For the t-th index in the BWT string, which falls in the last bucket shown, O(G, t) = 256 + 1 = 257
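The bucketed lookup above can be sketched as follows: each bucket keeps the running counts up to its start plus its own slice of the BWT, and O(c, t) is the stored count plus a scan of the slice. This is an illustrative Python sketch; it stores the slice as a plain string rather than in the 2-bit packing, treats t as inclusive (an assumption that matches the 256 + 1 example), and the names `build_buckets` and `occ` are hypothetical.

```python
def build_buckets(bwt, eta):
    """Split bwt into buckets of eta characters with checkpoint counts."""
    buckets = []
    counts = {c: 0 for c in "ACGT"}
    for start in range(0, len(bwt), eta):
        chunk = bwt[start:start + eta]
        buckets.append((dict(counts), chunk))   # counts BEFORE this bucket
        for ch in chunk:
            if ch in counts:
                counts[ch] += 1
    return buckets


def occ(buckets, eta, c, t):
    """Occurrences of c in bwt[0..t], t inclusive (assumed convention)."""
    counts, chunk = buckets[t // eta]
    return counts[c] + chunk[: t % eta + 1].count(c)


bwt = "ACGT" * 100
buckets = build_buckets(bwt, eta=128)
print(occ(buckets, 128, "G", 258))
```

A larger η means fewer stored counts but a longer scan per lookup, which is the trade-off revisited in the SMEM optimizations later in the deck.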
  • 22. BWA-MEM Algorithm
      – Seeding: look for exact matches (regions) in the reference sequence for the substrings (seeds) of the query using the compressed FM-Index. Steps: Super Maximal Exact Match (SMEM), Suffix Array Lookup (SAL), Chaining
      – Extension: extend the matches on either side to get end-to-end matches, and select matches with high similarity. Step: Banded Smith Waterman (BSW)
      – SAM-Form: format the output in the SAM format
      – Reorganization
  • 29. SMEM Algorithm from BWA-MEM - For One Position
      Goal: 1. Find maximal length query substrings with matches. 2. Output the matches.
      Reference: ATTCTTATGTA. Read: GTTAC. Each tuple is <substring, start of SA interval, end of SA interval>.
      Forward extension phase
        1. Find T: <T, 7, 12>
        2. Find TA: <TA, 7, 8>, <T, 7, 12>
        3. Find TAC (no match, so the list is unchanged): <TA, 7, 8>, <T, 7, 12>
      Backward extension phase
        1. From <TA, 7, 8>, find TTA: <TTA, 11, 11>. From <T, 7, 12>, find TT: <TT, 11, 12>
        2. From <TTA, 11, 11>, find GTTA: not found, so add TTA to the list of SMEMs. From <TT, 11, 12>, find GTT: not found
      Output SMEMs: <TTA, 11, 11>
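The worked example above can be checked with a brute-force sketch. It does not use the FM-index or the forward/backward extension order of BWA-MEM; it sorts all suffixes of the reference (plus a `$` terminator) to obtain 1-based SA intervals and then tests every read substring covering the chosen position for maximality. The helper names `sa_interval` and `smems_covering` are hypothetical, and Python is used only for readability.

```python
def sa_interval(reference, pattern):
    """1-based inclusive suffix array interval of pattern in reference + '$'."""
    text = reference + "$"
    suffixes = sorted(text[i:] for i in range(len(text)))
    ranks = [r for r, s in enumerate(suffixes, start=1) if s.startswith(pattern)]
    return (ranks[0], ranks[-1]) if ranks else None


def smems_covering(reference, read, pos):
    """Matching substrings of read that cover read[pos] and cannot be
    extended by one character on either side (brute force)."""
    found = []
    for start in range(pos + 1):
        for end in range(pos + 1, len(read) + 1):
            sub = read[start:end]
            if sub not in reference:
                continue
            extends_left = start > 0 and read[start - 1:end] in reference
            extends_right = end < len(read) and read[start:end + 1] in reference
            if not extends_left and not extends_right:
                found.append((sub, sa_interval(reference, sub)))
    return found


reference, read = "ATTCTTATGTA", "GTTAC"
print(sa_interval(reference, "T"))         # (7, 12)
print(sa_interval(reference, "TA"))        # (7, 8)
print(sa_interval(reference, "TT"))        # (11, 12)
print(smems_covering(reference, read, 2))  # [('TTA', (11, 11))]
```

Position 2 is the second T of the read (0-based), which is where the slide starts its forward extension.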
  • 30-35. SMEM Algorithm from BWA-MEM: For One Position. [Figure, built up over six slides: an FM-Index and a query. Forward extension from position m produces the tuples (m, m+1, p1, q1), (m, m+2, p2, q2), ..., (m, m+k, pk, qk). Backward extension processes them in reverse order, producing (m-1, m+k, pk’, qk’), ..., (m-1, m+2, p2’, q2’), (m-1, m+1, p1’, q1’), and then (m-2, m+k, pk’’, qk’’), ..., (m-2, m+2, p2’’, q2’’).]
  • 43. SMEM Algorithm
      – No spatial locality
      – Large # instructions for η = 128
      – New values in the tuple depend on the current values and the current nucleotide
  • 44. SMEM Algorithm – Key Optimizations: Software Prefetching
      – For any tuple that is added to the backward search buffer, we know the memory locations that will be accessed when the corresponding backward search occurs
      – So, we software prefetch them and hide the prefetch latency with computation
  • 45. SMEM Algorithm – Key Optimizations: Reducing η and vectorization
      – Reduced the value of η to 32
      – Store the BWT string using a 1-byte per nucleotide format: 32 bytes total
      – Process the 32-byte BWT using byte-level AVX2 intrinsics to get the number of occurrences of a nucleotide
      – The four counts consume 4 bytes per letter: 16 bytes total
      – Added 16 bytes of padding to make 64 bytes, aligned to a cache line boundary, so that the whole bucket occupies one cache line and can be prefetched using one instruction
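The 64-byte bucket arithmetic (16 bytes of counts + 32 bytes of BWT + 16 bytes of padding) can be made concrete with Python's `struct` module. This is only a layout sketch under stated assumptions: the field order (counts first, then BWT, then padding), the little-endian 32-bit counts, the nucleotide codes 0 to 3, and the names `pack_buckets` and `occ` are all assumptions for illustration. The `bytes.count` call stands in for the byte-level AVX2 compare-and-count the slide describes.

```python
import struct

ETA = 32
# Assumed layout: 4 x uint32 counts, 32 one-byte nucleotides, 16 pad bytes.
BUCKET = struct.Struct("<4I32s16x")
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_buckets(bwt):
    """Pack a BWT over {A, C, G, T} into consecutive 64-byte buckets."""
    counts = [0, 0, 0, 0]
    out = bytearray()
    for start in range(0, len(bwt), ETA):
        chunk = bytes(CODE[ch] for ch in bwt[start:start + ETA])
        # Fill a short final bucket with 0xFF so the filler is never counted.
        out += BUCKET.pack(*counts, chunk.ljust(ETA, b"\xff"))
        for code in chunk:
            counts[code] += 1
    return bytes(out)


def occ(packed, base, t):
    """Occurrences of base in bwt[0..t], t inclusive, touching one bucket."""
    *counts, chunk = BUCKET.unpack_from(packed, (t // ETA) * BUCKET.size)
    code = CODE[base]
    return counts[code] + chunk[: t % ETA + 1].count(code)


packed = pack_buckets("GATTACA" * 20)
print(BUCKET.size, len(packed))
print(occ(packed, "A", 139))
```

Because every lookup reads exactly one 64-byte record, a single prefetch of that record's address covers everything the lookup needs.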
  • 46. SMEM Algorithm – Results. System: SKX, #Threads = 1. Read dataset: 60000 reads from D2. Result: 2x speedup
  • 47. Suffix Array Lookup - SAL
      – SMEM outputs the suffix array interval
      – Each suffix array index in the interval is looked up to get the reference sequence coordinate (the lookup expression shown on the slide is not preserved in this transcript)
      – Optimization: the original BWA-MEM uses a compressed suffix array to reduce memory footprint, but there is sufficient memory on current systems. So, we simply use an uncompressed suffix array and look it up directly
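The trade-off behind this optimization can be sketched by comparing a sampled suffix array with a full one. The sampling scheme below (keep every `step`-th suffix array row and walk the LF-mapping until a kept row is reached) is one common way to compress a suffix array and is assumed here for illustration; the slide does not spell out the exact scheme. The names `build`, `lookup_compressed`, and `step` are hypothetical.

```python
def build(reference):
    """Return the full suffix array and the LF-mapping of reference + '$'."""
    text = reference + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]
    smaller = {c: sum(ch < c for ch in text) for c in set(text)}
    seen, rank = {}, []
    for ch in bwt:                       # rank[i] = count of bwt[i] in bwt[0:i]
        rank.append(seen.get(ch, 0))
        seen[ch] = rank[-1] + 1
    lf = [smaller[bwt[i]] + rank[i] for i in range(len(bwt))]
    return sa, lf


def lookup_compressed(i, sampled, lf, step, n):
    """Recover SA[i] from a sampled SA; may need several dependent hops."""
    hops = 0
    while i % step != 0:
        i = lf[i]                        # moves to the suffix one position earlier
        hops += 1
    return (sampled[i // step] + hops) % n


sa, lf = build("ATTCTTATGTA")
step = 4
sampled = sa[::step]                     # compressed form: 1/step of the entries
print([lookup_compressed(i, sampled, lf, step, len(sa)) for i in range(len(sa))])
print(sa)                                # uncompressed form: one array read per lookup
```

Each hop in the compressed lookup is a dependent memory access into the index, whereas the uncompressed lookup is a single `sa[i]` read, at the cost of storing the whole array.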
  • 48. SAL - Results. System: SKX, #Threads = 1. Input data created by intercepting the data to the SAL stage from an actual run using 600,000 reads from D2. Result: 183x speedup
  • 53. Banded Smith Waterman - BSW
      – Regular Smith Waterman [recurrence figure]: the recurrence uses a gap open penalty, a gap extension penalty, and f(a, b) = match parameter if a = b, mismatch parameter otherwise
      – Banded Smith Waterman from BWA-MEM:
          – Only a diagonal band is computed
          – Size of the band can dynamically change from top to bottom
          – Various conditions of early exit
          – Low parallelism within one matrix computation
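A minimal sketch of the banded idea follows: a local alignment with affine gaps in which only cells within `band` of the main diagonal are computed. It is a textbook-style formulation, not the BWA-MEM routine. It uses a fixed band and omits the dynamic band resizing and early exits listed above; the scoring values are illustrative defaults, and the name `banded_sw` is hypothetical.

```python
NEG_INF = float("-inf")


def banded_sw(query, target, band, match=1, mismatch=-4, gap_open=6, gap_extend=1):
    """Best local alignment score, computing only cells with |i - j| <= band."""
    n, m = len(query), len(target)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ending with a gap along the row
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ending with a gap along the column
    best = 0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            E[i][j] = max(H[i][j - 1] - gap_open - gap_extend, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open - gap_extend, F[i - 1][j] - gap_extend)
            f_ab = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + f_ab, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(banded_sw("ACGTACGT", "ACGTTACGT", band=0))  # 4
print(banded_sw("ACGTACGT", "ACGTTACGT", band=1))  # 5
```

With `band=0` only the main diagonal is filled, so the match that sits one diagonal over is missed; widening the band by one recovers it. Each cell depends on its left, upper, and upper-left neighbours, which is the source of the low parallelism within a single matrix.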
  • 58. BSW – Optimizations – Inter-task Vectorization
      – We hand vectorized using AVX512 SIMD intrinsics
      – Challenges: variable and dynamically changing band size; early exits; overhead of dynamic band computation
      – Sort the sequences according to band sizes to make the computation across the pairs being vectorized more uniform
      – Convert the sequences from AoS to SoA format to prevent gather/scatter cost
      – SIMD operations used: cmp, blend, max, mov, add, sub, and mask
      – Precision: lower precision provides more performance. The precision required depends on the maximum score, which in turn depends on the sequence lengths. We choose 8-bit or 16-bit precision based on sequence lengths
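The data preparation steps above (precision choice, sorting by band size, AoS to SoA conversion) can be sketched without any SIMD code. The sketch assumes that the maximum score is bounded by the match score times the shorter sequence length and that 8-bit lanes can hold scores up to 255; both are simplifying assumptions, and the names `lane_bits`, `make_batches`, and `to_soa` are hypothetical.

```python
def lane_bits(query_len, target_len, match=1, limit_8bit=255):
    """Pick 8-bit lanes when the assumed score bound fits, else 16-bit."""
    return 8 if match * min(query_len, target_len) <= limit_8bit else 16


def make_batches(pairs, vector_bits=512):
    """pairs: list of (query, target, band). Group by precision, sort each
    group by band size, and cut it into batches of one pair per SIMD lane."""
    batches = []
    for bits in (8, 16):
        group = sorted(
            (p for p in pairs if lane_bits(len(p[0]), len(p[1])) == bits),
            key=lambda p: p[2],
        )
        lanes = vector_bits // bits          # 64 lanes at 8-bit, 32 at 16-bit
        batches += [(bits, group[i:i + lanes]) for i in range(0, len(group), lanes)]
    return batches


def to_soa(sequences, pad="N"):
    """Array-of-sequences to structure-of-arrays: row k holds character k of
    every sequence, so one contiguous load fills all lanes."""
    width = max(len(s) for s in sequences)
    padded = [s.ljust(width, pad) for s in sequences]
    return ["".join(column) for column in zip(*padded)]


print(to_soa(["ACG", "TT"]))   # ['AT', 'CT', 'GN']
pairs = [("A" * 10, "C" * 10, b % 7) for b in range(70)]
print([(bits, len(group)) for bits, group in make_batches(pairs)])
```

Padding shorter sequences is what leaves some lanes idle, which relates to the lane utilization figure reported in the BSW results.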
  • 62. BSW - Results. System: SKX, #Threads = 1. Input: 48 million sequence pairs obtained by intercepting the input to this stage from a full application run. Read dataset used for the full run: D3.
      – Speedup labels on the chart: 11.6x and 6.7x
      – ~14x reduction in # instructions
      – IPC is reduced because the majority of instructions in the optimized code are SIMD instructions. There are 2 ports for SIMD (VPUs), but 4 for scalar
      – Why not 512/8 = 64x speedup? Only 43% of the time is spent on cell computation using SIMD, in which ~50% of lanes are idle; so, effectively ~21.5% for cell computation
  • 63. Multithread Scaling. Scaling of the three kernels and the entire application from 1 to 28 cores on SKX. We demonstrate nearly equal or better scaling on all kernels. Application scaling is worse due to bad scaling of the “Misc” section
  • 64. End to End Performance Results – Compute only. All kernels retain their speedup in the end-to-end run. SAL barely contributes to the run time due to the 183x speedup. [Charts: single thread of SKX; single socket (56 threads/28 cores) of SKX]
  • 65. Intel Confidential – Internal Only. BWA-MEM2 Open Sourcing
      – Drop-in replacement
      – Supported executions: AVX512, AVX2, SSE4.1, scalar
      – Supported functionality: all the functionality of BWA-MEM, including single-end and paired-end alignments
      – Output: identical to BWA-MEM
      – Command line interface: exactly the same as BWA-MEM
      – Future steps: algorithmic, implementation level (Misc) and architectural improvements
      – https://github.com/bwa-mem2/bwa-mem2
  • 66. Intel Legal Disclaimers
      – Intel, Xeon and Intel Xeon Phi are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. Other names and brands may be claimed as the property of others. © Intel Corporation
      – Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks.
      – Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.
  • 67. Thank You!
      – Vasimuddin Md: vasimuddin.md@intel.com, @wasim_galaxy
      – Sanchit Misra: sanchit.misra@intel.com, sanchit-misra@github.io, @sanchit_misra
      – Heng Li: hli@jimmy.harvard.edu, http://www.liheng.org/, @lh3lh3
      – Srinivas Aluru: aluru@cc.gatech.edu, https://www.cc.gatech.edu/~saluru/