SV Detection via Anchored Assembly
How can we best call structural variants?
Becky Drees, Jeremy Bruestle, Cheinan Marks

Aug2014 spiral genetics anchored assembly

  1. SV Detection via Anchored Assembly
     How can we best call structural variants?
     Becky Drees, Jeremy Bruestle, Cheinan Marks
  2. SV Detection via Anchored Assembly
     • Brief Description of Anchored Assembly Method
     • Testing vs GIAB Variant Set & Validated SV Sets
     • How Do We Describe SVs from Detected Breakpoints?
     Please do not distribute without permission.
  3. Input data
     • Any species with a draft genome
     • Existing NGS data
     • No special library prep
     • ~20x per ploidy
  4. Step 1: Read Correction
     A* error correction
     • Similar to Euler or Quake
     • Corrects the read without using reference information
     • Reduces error from 1% to 0.01%
     [Figure: K-mer Quality Score Distribution; axis labels: K-mer Count, Total K-mer Quality Score]
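The slide likens the correction step to Euler and Quake, which both work from the k-mer spectrum of the reads. The sketch below is not the A* correction itself (the slides give no detail on it). It only illustrates the shared starting point of that family: count k-mers across all reads, then flag read positions that no well-supported k-mer covers. The function names, the value of `k`, and the `min_count` threshold are assumptions chosen for illustration.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(read, counts, k, min_count):
    """Return positions not covered by any k-mer seen at least min_count times."""
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] >= min_count:
            for j in range(i, i + k):
                covered[j] = True
    return [i for i, ok in enumerate(covered) if not ok]


# Toy data (hypothetical): three identical reads and one with a single base change.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
counts = kmer_counts(reads, k=4)
suspect = weak_positions(reads[3], counts, k=4, min_count=2)
print(suspect)  # positions in the last read that lack k-mer support
```

A corrector would then search for the cheapest base changes that make every k-mer in the read well supported. That search is where an A*-style method would come in, and it is not shown here.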
  5. Step 2: Remove Reference Matches
     • Remove reads that are an exact match to reference
     • Significantly reduces the complexity of the graph
     • Reduces required memory usage (40 GB for whole human genome)
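A minimal sketch of the filtering idea on this slide, assuming the reference fits in a single string and that a read counts as a match if it occurs verbatim on either strand. The reverse-complement check and the function names are assumptions, not details from the slides. A production tool would use an index rather than substring search.

```python
def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def remove_reference_matches(reads, reference):
    """Keep only reads that do not occur verbatim in the reference on either strand."""
    kept = []
    for read in reads:
        if read in reference or reverse_complement(read) in reference:
            continue  # exact match: carries no evidence of variation
        kept.append(read)
    return kept


# Toy data (hypothetical).
reference = "ACGTTGCAAGGCTTAC"
reads = [
    "GTTGCA",  # exact forward match
    "GCCTTG",  # exact match on the reverse strand
    "GTTGAA",  # differs from the reference by one base
]
remaining = remove_reference_matches(reads, reference)
print(remaining)
```

Only the reads that survive this filter go into the overlap graph, which is why the graph and its memory footprint shrink.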
  6. Step 3: Read Overlap Graph
     Read overlap assembly
     • Construct a read overlap graph with the remaining reads
     • Provides more context than a k-mer-based de Bruijn graph
     [Figure: read overlap graph; nodes R1, R2, R3, R5, R6, R7, R8, R9; numeric labels 7, 8 and 9]
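To make the overlap graph concrete, here is a small sketch that links two reads whenever a suffix of one equals a prefix of the other. It uses an all-pairs comparison and a hypothetical `min_overlap` parameter. Real assemblers use indexed data structures and tolerate mismatches, and none of the names below come from the slides.

```python
def overlap_length(a, b, min_overlap):
    """Longest suffix of a that equals a prefix of b, or 0 if shorter than min_overlap."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def build_overlap_graph(reads, min_overlap):
    """Return {read_name: {successor_name: overlap_length}} for all ordered read pairs."""
    graph = {name: {} for name in reads}
    for a_name, a_seq in reads.items():
        for b_name, b_seq in reads.items():
            if a_name == b_name:
                continue
            n = overlap_length(a_seq, b_seq, min_overlap)
            if n:
                graph[a_name][b_name] = n
    return graph


# Toy reads (hypothetical) that tile a short sequence.
reads = {"R1": "ACGTTGCA", "R2": "TTGCAAGG", "R3": "CAAGGCTT"}
graph = build_overlap_graph(reads, min_overlap=3)
print(graph)
```

Each node here is a whole read rather than a fixed-length k-mer, so a path through the graph keeps the read-length context the slide refers to.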
  7. Step 4: Anchoring
     • Anchor assemblies to reference coordinates
     • Provide breakpoint information while keeping reference bias low
  8. Step 5: Variant Validation
     • Assemble variant sequence from read overlap graph
     • Compute the minimal-cost variation (similar to Smith-Waterman)
     • Call variants and apply QC to remove likely false positives
     [Figure: reference sequence aligned against the assembled sequence and supporting reads R2 to R6]
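The slide describes computing a minimal-cost variation "similar to Smith-Waterman" without giving the scoring scheme. As a stand-in, the sketch below uses a unit-cost global alignment (plain edit distance with traceback), which is a simpler relative of Smith-Waterman and not necessarily what the tool does. The function name, the unit costs, and the example sequences are assumptions.

```python
def min_cost_edits(ref, alt):
    """Return (cost, edits) turning ref into alt with unit-cost SUB/INS/DEL operations."""
    rows, cols = len(ref) + 1, len(alt) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i
    for j in range(cols):
        cost[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != alt[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal edit list.
    edits = []
    i, j = len(ref), len(alt)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != alt[j - 1]):
            if ref[i - 1] != alt[j - 1]:
                edits.append(("SUB", i - 1, ref[i - 1], alt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            edits.append(("DEL", i - 1, ref[i - 1], ""))
            i -= 1
        else:
            edits.append(("INS", i, "", alt[j - 1]))
            j -= 1
    edits.reverse()
    return cost[-1][-1], edits


# Illustrative sequences only; positions are 0-based offsets into ref.
cost, edits = min_cost_edits("TTAGATAACA", "TTGGATAACA")
print(cost, edits)
```

Each tuple is `(operation, reference offset, reference base, alternate base)`, which is close to what a variant caller needs before it adds the anchor coordinate.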
  9. NA12878 SNP Detection vs GIAB
     • Anchored Assembly only: 13,307
     • Genome in a Bottle only: 144,463
     • Called by both: 2,596,897
     • Sensitivity: 95%
     • Precision: 99.5%
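The sensitivity and precision on this slide can be reproduced from the three counts, assuming the Genome in a Bottle set is treated as truth and the unlabeled figure of 2,596,897 is the number of SNPs called by both. The variable names are illustrative.

```python
both = 2_596_897      # SNPs in both call sets (true positives under the assumption above)
aa_only = 13_307      # Anchored Assembly only (counted as false positives)
giab_only = 144_463   # Genome in a Bottle only (counted as false negatives)

sensitivity = both / (both + giab_only)
precision = both / (both + aa_only)

print(f"sensitivity = {sensitivity:.1%}")
print(f"precision   = {precision:.1%}")
```

Under that reading the counts round to the 95% and 99.5% quoted on the slide.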
  10. NA12878 Indel Detection vs GIAB
  11. NA12878 SV Insertions
      [Table: column headers on the slide are Chr., Mills, Pindel 50x, AA 50x, AA 200x; each "n" is a mark in one of the caller columns, listed here without column alignment]
      Chr.  Position    Marks
      1     247579917
      2     2576951     n n
      2     78558069    n n n
      2     187143096   n
      2     191002548   n n n
      3     43972635    n n n
      3     100737223   n n n
      3     100868475   n n n
      3     195823764   n n n
      5     78035993    n n n
      7     1528948     n n n
      7     2089876
      8     22717662    n n n
      9     97387403    n
      9     137361862   n
      12    103954170   n n
      13    76345722    n n n
      13    113760939
      13    114103496   n n
      15    26060663    n n
      15    92686723    n
      17    39240782
      17    77134774    n
      18    74794821    n n
      18    76182038    n n n
      19    1278240     n n n
      19    2247173     n n n
      20    55992535    n n
      21    39080014    n n
      X     94894756    n n
      Mills et al.; Eichler Lab, U. Washington; Sanger validated
  12. NA12878 SV Deletions
  13. How to describe SVs from breakpoints?
      As breakend records:
      #CHROM  POS      ID     REF  ALT           QUAL  FILTER  INFO                                         FORMAT    SAMPLE
      1       1500000  bnd_A  T    T[1:1501108[  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_B;AID=1234  DP:ED:OV  26:72:89
      1       1501108  bnd_B  G    ]1:1500000]G  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_A;AID=1234  DP:ED:OV  26:72:89
      As SV events:
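The two breakend records on this slide point at each other through their ALT fields. The sketch below parses the VCF breakend ALT notation with a regular expression and checks that the two records are reciprocal mates. The function name and returned keys are hypothetical, and the pattern is deliberately simple: it does not handle contig names that contain colons or the angle-bracket form for assembly contigs.

```python
import re

# Bases on exactly one side of a bracketed "chrom:pos", with matching brackets.
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn.]*)"
    r"(?P<bracket>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)(?P=bracket)"
    r"(?P<trail>[ACGTNacgtn.]*)$"
)


def parse_bnd_alt(alt):
    """Split a breakend ALT string into mate location, bracket type, and local bases."""
    m = BND_ALT.match(alt)
    if m is None or bool(m["lead"]) == bool(m["trail"]):
        raise ValueError(f"not a breakend ALT: {alt!r}")
    return {
        "mate_chrom": m["chrom"],
        "mate_pos": int(m["pos"]),
        "bracket": m["bracket"],           # '[' or ']' encodes the mate's orientation
        "bases": m["lead"] or m["trail"],  # local sequence replacing REF
        "bases_first": bool(m["lead"]),    # True when the bases precede the brackets
    }


# The two records from the slide: (CHROM, POS, ALT).
bnd_a = ("1", 1500000, "T[1:1501108[")
bnd_b = ("1", 1501108, "]1:1500000]G")

a = parse_bnd_alt(bnd_a[2])
b = parse_bnd_alt(bnd_b[2])
reciprocal = (a["mate_chrom"], a["mate_pos"]) == bnd_b[:2] and \
             (b["mate_chrom"], b["mate_pos"]) == bnd_a[:2]
print(a, b, reciprocal)
```

Pairing records this way (or through `MATEID`) is the first step in turning breakend records into a single SV event such as a deletion.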
  14. How to describe SVs from breakpoints?
      Assembled breakpoints can reveal variation that is hard to categorize
      • Different events can produce similar breakpoints
      • Multiple breakpoints can represent a single rearrangement event
      [Figure: CHR 1 with breakends bnd_K, bnd_L, bnd_M, bnd_N; coordinates shown: 200000, 190000, 197000, 200231]
  15. How to describe SVs from breakpoints?
      A single breakpoint can contain multiple sequence changes:
      • Inserted sequence at deletion breakpoints
      • Deleted or duplicated sequence at insertion breakpoints
      • Deleted or duplicated sequence at inversion breakpoints
      [Figure: CHR 1; coordinates shown: 1700000, 1704100, 1700100, 1704250; labels: deleted sequence, duplicated sequence, inverted sequence]
  16. How to describe SVs from breakpoints?
      Many assemblies anchor to multiple genome locations
      • Variation in duplicated genome regions
      • Variation in repetitive elements
      • Transposons anchor to multiple places
      [Figure: CHR 1 with an Alu element and a unique anchor]
  17. Contact
      • More information
      • Trial on own data
      becky@spiralgenetics.com
      niranjan@spiralgenetics.com
      info@spiralgenetics.com
  18. Questions?
  19. Anchored Assembly SNP Distribution
  20. Anchored Assembly SV Distribution
