This document summarizes methods for improved structural variant (SV) detection and interpretation from long-read sequencing data. It describes:
1. A breast cancer study using 75x PacBio coverage that detected SVs through alignment, copy number analysis, and assembly-based variant calling.
2. Tools the author has developed or improved for long-read SV analysis, including NextGenMap-LR for alignment, Sniffles for SV detection, and SplitThreader for SV interpretation.
3. How the author's approaches achieve more accurate SV detection than existing methods by improving alignment and detection algorithms, enabling assembly-guided analysis, and reconstructing complex cancer genome rearrangements.
3. Breast Cancer Pipeline
• PacBio sequencing
• Alignment with BWA-MEM
• Copy number analysis
• SV calling with Lumpy
• Graphical genome threading analysis
• Assembly with Falcon on DNAnexus
• Alignment with MUMmer (nucmer)
• Call variants between consecutive alignments
• Call variants within alignments
• IsoSeq transcriptome analysis
• ...
• Detailed analysis of Her2 amplifications
• Validation using PCR
• Illumina sequencing
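The step "Call variants between consecutive alignments" can be illustrated with a minimal Python sketch. It assumes each contig's nucmer alignments have already been reduced to forward-strand blocks on one chromosome, with 1-based inclusive coordinates (assumed to match MUMmer's coordinate convention). The `Alignment` class, `call_between`, `call_variants`, and the 50 bp minimum size are hypothetical choices for illustration, not the pipeline's actual code. A deletion appears as more reference than contig sequence between two adjacent blocks; an insertion appears as the reverse.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    """One alignment block of a contig against the reference (1-based, inclusive)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int

def call_between(a: Alignment, b: Alignment, min_size: int = 50):
    """Classify the gap between two consecutive forward-strand blocks of one contig."""
    ref_gap = b.ref_start - a.ref_end - 1  # reference bases skipped between the blocks
    qry_gap = b.qry_start - a.qry_end - 1  # contig bases skipped between the blocks
    diff = ref_gap - qry_gap
    if diff >= min_size:
        return ("DEL", a.ref_end + 1, diff)   # reference sequence missing from the contig
    if -diff >= min_size:
        return ("INS", a.ref_end + 1, -diff)  # extra sequence present in the contig
    return None  # difference too small to call an SV

def call_variants(blocks, min_size: int = 50):
    """Walk the blocks in contig order and call SVs between each adjacent pair."""
    ordered = sorted(blocks, key=lambda blk: blk.qry_start)
    calls = []
    for a, b in zip(ordered, ordered[1:]):
        call = call_between(a, b, min_size)
        if call is not None:
            calls.append(call)
    return calls

blocks = [Alignment(1, 1000, 1, 1000),
          Alignment(1501, 3000, 1001, 2500),
          Alignment(3001, 4000, 2801, 3800)]
print(call_variants(blocks))
```

For these toy blocks the sketch prints `[('DEL', 1001, 500), ('INS', 3001, 300)]`. Inversions and translocations (strand or chromosome switches) would need extra cases, and blocks that overlap on the reference are reported here as insertions.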
4. What we learned
Illumina:
– Nested SVs are hard to detect
– Problems with repetitive regions
PacBio:
– Tools are not yet as accurate
– Erroneous alignments hide SVs
Assembly vs. mapping
– More SVs are detected via assembly
– Assembly misses low-frequency SVs
– Comparing the assembly to the reference genome is still challenging
5. What we can offer
• Improve alignments:
– NextGenMap-LR*
• Improve SV detection:
– SURVIVOR (Illumina)
– Sniffles (PacBio)*
– Assembly guided detection (PacBio)
• Interpretation of the SV:
– SplitThreader*
6. NextGenMap-LR
• Novel algorithms that make the mapper aware of SV regions
– Builds on the fast and robust NextGenMap algorithms (Sedlazeck et al., Bioinformatics, 2013)
• Major benefits come from pairwise alignment with improved scoring functions
– Accounts for long (SV) and smaller (error) gaps
– SVs are better represented, with cleaner breakpoints
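To see why a length-aware gap score helps, the Python sketch below compares a standard affine gap penalty with a logarithmic one, in which each additional gap base costs less. The logarithmic form, the parameter values, and the function names are assumptions chosen for illustration; the slide does not give NextGenMap-LR's actual scoring function. The check `extends_through_gap` is a simplified local-alignment argument: if a gap costs more than the matching bases beyond it are worth, a local aligner prefers to clip the read at that point, which blurs the breakpoint.

```python
import math

def affine_gap_penalty(length: int, gap_open: float = 5.0, gap_extend: float = 1.0) -> float:
    # Every gap base costs the same, so the penalty grows linearly with gap length.
    return gap_open + gap_extend * length

def log_gap_penalty(length: int, gap_open: float = 5.0, gap_extend: float = 1.0) -> float:
    # Hypothetical length-aware penalty: cost grows with log2(length), so long (SV)
    # gaps stay affordable while short (error) gaps cost about as much as affine ones.
    return gap_open + gap_extend * math.log2(length)

def extends_through_gap(flank_matches: int, gap_len: int, penalty, match_score: float = 2.0) -> bool:
    """Return True if aligning flank_matches bases past the gap beats clipping the read there."""
    return flank_matches * match_score - penalty(gap_len) > 0

for name, pen in [("affine", affine_gap_penalty), ("log", log_gap_penalty)]:
    # 500 bp deletion followed by 200 matching bases (worth 400 with these assumed scores)
    print(name, round(pen(500), 1), extends_through_gap(200, 500, pen))
```

With these assumed values the loop prints `affine 505.0 False` and `log 14.0 True`: a short error gap is cheap under both models, but only the logarithmic penalty keeps the 500 bp deletion inside one alignment instead of clipping it.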
9. Sniffles
• Predicts deletions, duplications, insertions, inversions, translocations, and nested SVs
• Designed to overcome the problems of current SV callers
• Scans the alignments for erroneous regions and predicts whether they overlap
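The idea of scanning alignments for error-dense regions and testing whether they overlap can be sketched in Python as below. This is not Sniffles' implementation: the CIGAR parsing assumes extended `=`/`X` operations (so mismatches are explicit), and the window size, event threshold, and all function names are hypothetical. Each read's mismatches and indels are collected as reference positions, windows with many events are merged into noisy regions, and regions from different reads are tested for overlap. The intuition is that noise recurring at the same locus across reads is more likely to reflect an SV than random sequencing error.

```python
import re

# Matches one CIGAR operation, e.g. "50=" -> ("50", "=")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def event_positions(ref_start, cigar):
    """Reference positions of mismatches and indels in one alignment.

    Assumes '=' (match) and 'X' (mismatch) operations; plain 'M' cannot
    separate the two, so it is treated as error-free here.
    """
    pos = ref_start
    events = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "X":
            events.extend(range(pos, pos + n))  # one event per mismatched base
            pos += n
        elif op == "I":
            events.append(pos)  # insertion: one event, consumes no reference
        elif op == "D":
            events.append(pos)  # deletion: one event anchored at its start
            pos += n
        elif op in "M=N":
            pos += n
        # S, H and P consume no reference bases
    return events

def noisy_regions(events, window=100, min_events=5):
    """Merge windows holding at least min_events events into (start, end) regions."""
    events = sorted(events)
    regions = []
    j = 0
    for i, start in enumerate(events):
        while j < len(events) and events[j] < start + window:
            j += 1
        if j - i >= min_events:  # events[i:j] all fall in [start, start + window)
            end = events[j - 1]
            if regions and start <= regions[-1][1]:
                regions[-1][1] = max(regions[-1][1], end)
            else:
                regions.append([start, end])
    return [tuple(r) for r in regions]

def overlapping(regions_a, regions_b):
    """Pairs of noisy regions from two reads that share at least one reference position."""
    return [(a, b) for a in regions_a for b in regions_b if a[0] <= b[1] and b[0] <= a[1]]

read1 = noisy_regions(event_positions(1000, "50=1X2=1X2=1X2=1X2=1X100="))
read2 = noisy_regions(event_positions(1040, "15=3I5=4D1X1=1X1=1X50="))
print(read1, read2, overlapping(read1, read2))
```

Here the two toy reads yield noisy regions `(1050, 1062)` and `(1055, 1068)`, which overlap and would therefore be flagged as a candidate SV locus.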
15. How we could contribute
1. Calling SVs using PacBio
2. Calling SVs using Illumina
3. Interpretation of Structural Variants
Thanks to:
Schatz group + McCombie group + Hicks group + OICR + PacBio + DNAnexus
Editor's Notes
Cancer in the tube vs. Genome in a Bottle
* Things I will talk about today.
NOTE: All results use only PacBio read alignments with the highest mapping quality (MAPQ 60).
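The MAPQ 60 restriction can be expressed as a simple filter. The Python sketch below works on plain SAM text lines, where MAPQ is the fifth tab-separated column; the function name is hypothetical. It tests for equality with 60 rather than `>= 60` because the SAM specification reserves 255 for "mapping quality not available", and such records should not pass. BWA-MEM, used in the pipeline, reports at most MAPQ 60.

```python
def filter_by_mapq(sam_lines, required_mapq=60):
    """Yield SAM alignment records whose MAPQ equals required_mapq.

    Header lines (starting with '@') are skipped; MAPQ is column 5 of a record.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid SAM record has at least 11 mandatory fields
        if len(fields) >= 11 and int(fields[4]) == required_mapq:
            yield line
```

In practice the same filter would usually be applied to a BAM file with a SAM/BAM library; the text-based version keeps the sketch dependency-free.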