Ion Torrent Sequencing Applications: Variant Calling, Barcoding, and Long Range Mate Pairs

EdgeBio discusses three applications for Ion Torrent sequencing that we have been exploring lately. We discuss the robustness of the included Germ Line Variant Caller, the barcoding capability on the Ion Torrent, and a new dataset of Long Range (10kb) Mate Pairs.

Published in: Technology, Business

  1. Ion Torrent Sequencing Applications: Variant Calling, Barcoding, and Long Range Mate Pairs
     David Jenkins, Bioinformatics Engineer, EdgeBio
  2. Contract Research Division
     • Five SOLiD4 sequencing platforms
     • One Life Technologies 5500XL
     • Two Ion Torrent PGMs
     • Automation through Caliper Sciclone & Biomek FX
     • Life Technologies Preferred Service Provider
     • Agilent Certified Service Provider
     • Commercial partnerships with companies such as CLCBio, DNANexus and Genologics
     • MD/PhD & Masters Level Scientists and Bioinformaticians
     • IT Infrastructure of >100 CPUs and >100TB storage
  3. Agenda
     • Germ Line Variant Caller
     • Barcoding
     • Long Range Mate Pair Data
  4. Variant Calling
  5. Variant Calling
     • Goal: identify SNPs and INDELs
       – High sensitivity
         • Few false negatives
       – High positive predictive value
         • Few false positives
     • Challenge: distinguish between homopolymer sequencing error and true INDELs
  6. Variant Calling
     • DH10B
     • All identified variants are false positives
     • PPV and sensitivity
     • maq fakemut used to insert artificial mutations
       – 220 SNPs and 239 INDELs
     • EdgeBio 316 Chip Run
       – 11.00x AQ17 coverage of genome
     • Goal: identify most sensitive (true pos./[true pos. + false neg.]) settings that don't lose PPV (true pos./[true pos. + false pos.])
       – Identify the most variants while avoiding calling any non-variants
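The two ratios named in the goal above can be computed directly from counts of true positives, false positives, and false negatives. The Python sketch below is illustrative only: the function names and the example counts are assumptions, not values from the slides, and it does not reproduce the "corrected sensitivity" adjustment reported later in the deck.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: true pos. / (true pos. + false pos.)."""
    called = true_pos + false_pos
    if called == 0:
        raise ValueError("no variants were called")
    return true_pos / called


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Sensitivity: true pos. / (true pos. + false neg.)."""
    real = true_pos + false_neg
    if real == 0:
        raise ValueError("no true variants exist")
    return true_pos / real


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration, not taken from the slides.
    tp, fp, fn = 90, 10, 30
    print(f"PPV         = {ppv(tp, fp):.3f}")
    print(f"Sensitivity = {sensitivity(tp, fn):.3f}")
```

Tightening the caller's thresholds usually lowers `false_pos` at the cost of raising `false_neg`, which is the trade-off the settings comparison explores.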
  7. Samtools Defaults vs. Variant Calling Settings
     • Default samtools setting not optimized for Ion Torrent error model
       – Lower base quality of candidates
       – Coverage from both strands
       – Strict requirements for homopolymers
         • two sequences from both strands
  8. PPV and Corrected Sensitivity by Settings
                                    PPV                              Corrected Sensitivity
     Settings                       Total      SNPs       INDELs     Total      SNPs       INDELs
     Samtools Default Settings      6.014%     96.682%    3.203%     100%       100%       100%
     Q4, h100, o20, e27, m1, H1     39.672%    100%       25.060%    98.690%    99.550%    97.910%
     Q14, h100, o20, e21, m1, H2    79.565%    100%       64.259%    92.810%    98.180%    89.870%
     Q7, h50, o10, e17, m4, H1      93.523%    100%       86.486%    91.720%    99.090%    84.940%
     Q14, h50, o10, e17, m4, H1     95.148%    100%       89.655%    90.850%    98.180%    84.100%
     Variant Calling Settings       95.676%    100%       90.533%    90.650%    99.550%    83.260%
     Q14, h50, o10, e17, m4, H2     97.175%    100%       93.631%    89.540%    96.360%    83.260%
  9. PPV and Sensitivity of Samtools Analyses
     [Chart, 0% to 100%: Total, SNPs and INDELs PPV alongside Total, SNPs and INDELs Corrected Sensitivity, one group per samtools settings combination, from the default samtools settings through the Variant Calling settings.]
  10. Similar Results with Public DH10B Runs
     [Chart, 0% to 100%: "PPV and Sensitivity of Public DH10B Runs". Series: Total, SNP and INDEL PPV; Total, SNP and INDEL Sensitivity. Runs shown: Life Ion Torrent public DH10B runs (314, 316 and 318 chips, including the 314LR, 316LR and 100MB 316LR long-read runs) and the Edge Bio Ion Torrent 316 DH10B run. One label reads ">99% accuracy".]
  11. Homopolymer Mutated Reference Genome
     [Chart, 0% to 100%: "Homopolymer PPV and Sensitivity". Series: Homopolymer PPV; Homopolymer Sensitivity. Same set of Life Ion Torrent and Edge Bio DH10B runs as the previous slide.]
     • >99% per base accuracy with long reads.
  12. Conclusions
     • Variant Calling plugin able to identify >80% well-covered INDELs and >99% well-covered SNPs
     • Improves on performance of default samtools settings by avoiding false positive SNPs and INDELs
     • Important to remember Variant Calling is Application Specific
     • Easy to re-run Germ Line Variant Caller with custom settings.
     • More information at http://www.edgebio.com/blog/
  13. Barcoding
  14. Barcoding
     • HuRef gDNA
     • Compared read quality statistics with non-barcoded run
     • IonSet barcodes 5-8
     • 11bp barcodes at beginning of the read
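Because the 11 bp barcode sits at the start of each read, assigning a read to a sample amounts to matching its first 11 bases against the known barcodes and trimming them off. The Python sketch below shows that idea with exact matching only. The barcode names and sequences are made up for illustration (they are not the IonSet sequences), and this is not the Torrent Suite implementation, which may tolerate sequencing errors in the barcode.

```python
BARCODE_LENGTH = 11  # barcode length stated on the slide

# Hypothetical barcodes for illustration; NOT the real IonSet sequences.
BARCODES = {
    "barcode_A": "ACGTACGTACG",
    "barcode_B": "TGCATGCATGC",
}


def demultiplex(read: str):
    """Return (barcode_name, trimmed_read); name is None if nothing matches."""
    prefix = read[:BARCODE_LENGTH]
    for name, sequence in BARCODES.items():
        if prefix == sequence:
            return name, read[BARCODE_LENGTH:]
    return None, read


def assigned_fraction(reads) -> float:
    """Fraction of reads whose prefix matches one of the known barcodes."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if demultiplex(read)[0] is not None)
    return hits / len(reads)


if __name__ == "__main__":
    example = ["ACGTACGTACGTTTTGGGG", "TGCATGCATGCAAAACCCC", "GGGGGGGGGGGAAAATTTT"]
    for read in example:
        print(demultiplex(read))
    print(f"assigned: {assigned_fraction(example):.2%}")
```

Trimming the barcode is also why a barcoded read gives up its first 11 bases, which are typically among the highest-quality positions.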
  15. Barcoding
     • 94.51% reads mapped to barcodes used.
     • Variant Calling Report for Each Barcode
       – New feature in 1.5.1
     • Ion Community Feature Requests
       – Aligning barcodes to different references
       – Find out what community wants
  16. Quality Comparison
     [Figure panels: Barcoded hg19 Run (TS 1.5.1); Non-barcoded hg19 Run (TS 1.5.1)]
  17. Mapping Comparison
  18. Conclusions
     • Similar quality between barcoded and non-barcoded runs
     • Robust set of barcodes
     • Losing first 11 high quality bases to the barcode
       – Explains lower initial quality
     • 318 Chip and Barcoding
     • Ion Torrent Community
       – Technical details
       – Desired Features
       – Troubleshooting
  19. Long Range Mate Pairs
  20. Long Range Mate Pairs
     • Data provided by Ion Torrent
     • Average 10KB inserts
     • Split sff files with sff_extract utility
       • >IA_A
       • CTGCTGTACGGCCAAGGCGGATGTACGGTACAGCAG
       • >IA_B
       • CTGCTGTACCGTACATCCGCCTTGGCCGTACAGCAG
     • Can reads map successfully with average 10KB inserts?
       – Increasing homopolymers farther into read
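The two linker sequences on this slide are reverse complements of each other, and splitting a mate-pair read means cutting it at the linker to recover the two mates. The Python sketch below illustrates both points. It is an assumption-laden toy, not the algorithm sff_extract uses: it requires an exact, full-length linker match, and the `min_length` cutoff and the category names are hypothetical.

```python
# Linker sequences as given on the slide.
IA_A = "CTGCTGTACGGCCAAGGCGGATGTACGGTACAGCAG"
IA_B = "CTGCTGTACCGTACATCCGCCTTGGCCGTACAGCAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse the sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def split_at_linker(read: str, linkers=(IA_A, IA_B), min_length: int = 20):
    """Split a read at the first exact linker match.

    Returns (category, mate_1, mate_2). `min_length` is a hypothetical
    cutoff used only to illustrate a "too short" outcome.
    """
    for linker in linkers:
        position = read.find(linker)
        if position != -1:
            mate_1 = read[:position]
            mate_2 = read[position + len(linker):]
            if len(mate_1) < min_length or len(mate_2) < min_length:
                return "too_short", mate_1, mate_2
            return "split", mate_1, mate_2
    # No linker found: the read cannot be paired.
    return "orphan", read, ""


if __name__ == "__main__":
    print(reverse_complement(IA_A) == IA_B)
    print(split_at_linker("A" * 25 + IA_A + "C" * 30)[0])
```

A real splitter also has to handle partial linkers and sequencing errors inside the linker, which this exact-match sketch deliberately ignores.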
  21. Unsplit Reads
     Metric                         Value
     Total Number of Bases [Mbp]    404.65
     Q17 Bases [Mbp]                207.67
     Q20 Bases [Mbp]                150.07
     Total Number of Reads          2,308,396
     Mean length [bp]               175
     Longest Read [bp]              365
     From: http://flxlexblog.wordpress.com
  22. Split Reads Metrics
     Type               Count        Percent
     Total Reads        2,308,396
     Orphan Reads       220,707      9.561%
     Partial Linker     106,913      4.631%
     Multiple Linker    29           0.001%
     Too Short          1,757        0.076%
     Correctly Split    1,978,990    85.730%
     [Bar chart of the same counts. Categories: Orphan Reads (1 seq), Partial Linker Found, Multiple Linker Occurrences, Too Short, Correctly Split Reads.]
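As a quick consistency check on the split metrics, the five categories sum to the total read count and each percentage is the category count divided by that total. The Python sketch below redoes that arithmetic with the counts from the slide; the dictionary keys and the helper name are illustrative choices.

```python
# Counts as reported on the slide.
TOTAL_READS = 2_308_396
CATEGORY_COUNTS = {
    "Orphan Reads": 220_707,
    "Partial Linker": 106_913,
    "Multiple Linker": 29,
    "Too Short": 1_757,
    "Correctly Split": 1_978_990,
}


def percent_of_total(count: int, total: int = TOTAL_READS) -> float:
    """Share of all reads, as a percentage rounded to three decimals."""
    return round(100.0 * count / total, 3)


if __name__ == "__main__":
    print("categories sum to total:", sum(CATEGORY_COUNTS.values()) == TOTAL_READS)
    for name, count in CATEGORY_COUNTS.items():
        print(f"{name:<16} {count:>10,} {percent_of_total(count):>8.3f}%")
```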
  23. Reads 1
     • Per base sequence quality below Q20 after base 20
     • Analysis performed pre TS 1.5 release
       • Predicted base quality has improved
     • Homopolymer enrichment relatively consistent across the read
  24. Reads 2
     • Per base sequence quality below Q20
     • Second part of read in lower quality region of unsplit read
     • Homopolymer enrichment still fairly uniform
  25. Insert Size
     Aligner    μ           σ
     bwa        10189.78    1282.43
     tmap       9751.20     2016.62
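Summary statistics like the μ and σ above can be derived from the mapped coordinates of each mate pair. The Python sketch below is one plausible way to do it, under stated assumptions: the insert size is taken as the outer distance spanned by the two mates on the same reference, the population standard deviation is used, and the coordinates are invented. The slides do not say how bwa or tmap insert sizes were actually measured.

```python
from statistics import mean, pstdev


def insert_size(start_a: int, end_a: int, start_b: int, end_b: int) -> int:
    """Outer distance spanned by two mates mapped to the same reference."""
    return max(end_a, end_b) - min(start_a, start_b)


def summarize(insert_sizes):
    """Return (mean, population standard deviation) of the insert sizes."""
    sizes = list(insert_sizes)
    return mean(sizes), pstdev(sizes)


if __name__ == "__main__":
    # Hypothetical mate coordinates: (start_a, end_a, start_b, end_b).
    pairs = [
        (100, 170, 9_100, 9_160),
        (500, 560, 10_500, 10_570),
        (900, 960, 11_900, 11_965),
    ]
    sizes = [insert_size(*pair) for pair in pairs]
    mu, sigma = summarize(sizes)
    print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
```

A smaller σ relative to μ indicates a tighter insert size distribution, which is the comparison the bwa and tmap figures invite.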
  26. Mapping
                                      AQ17      AQ20      Perfect
     Total Number of Bases [Mbp]      218.55    179.37    170.28
     Mean Length [bp]                 70        63        60
     Longest Alignment [bp]           173       171       167
     Mean Coverage Depth              46.6x     38.3x     36.3x
     Percentage of Library Covered    99.99%    99.99%    99.99%

     Read Length [bp]   Reads       Unmapped   Excluded   Clipped   Perfect     1 mismatch   >= 2 mismatches
     50                 3,240,310   15,981     20         0         1,810,959   744,229      669,121
     100                349,925     1,340      5          49,717    104,928     72,110       121,825
     150                1,944       73         0          851       127         172          721
  27. Conclusion
     • Long reads capable of producing Mate Pair reads
       – Quality mapping
       – Tight distribution around insert size
     • Human Application
       – With longer insert sizes (40kb) could be used to resolve structural variation
     • Blog post coming soon:
       – http://www.edgebio.com/blog/
  28. Thanks
     Edge Bio Team
     • Lab
       – Joy Adigun
       – Jennifer Sheffield
       – Ryan Mease
       – Rossio Kersey
     • Informatics
       – Anju Varadarajan
       – Phil Dagosto
     • Justin Johnson
     • John Seed
     • Dean Galaas
     Follow Us:
     • EdgeBio Twitter: @EdgeBio
     • David Jenkins Twitter: @dfjenkins3
     • Justin Johnson Twitter: @BioInfo
     • djenkins@edgebio.com
     • http://www.edgebio.com/blog/
