BioSB meeting 2015

  1. 1. Scaffolding using long nanopore reads and more. Hans Jansen; Christiaan Henkel, senior scientist
  2. 2. ZF-screens B.V.: Dutch SME at the Bioscience Park in Leiden, the Netherlands • High-throughput drug screens and toxicity assays in zebrafish larvae • Fish fertility (eel, pike perch, sole) to aid sustainable aquaculture • Sequencing (genomes, transcriptomes) • Bioinformatics
  3. 3. Genome projects • Common carp (Cyprinus carpio): high-throughput screening model; genome and transcriptomes • European and Japanese eel (Anguilla anguilla and Anguilla japonica): completing the life cycle in aquaculture; genome and transcriptomes • King cobra (Ophiophagus hannah): evolution and toxins; genome and transcriptomes. But the quality of these genomes can be improved.
  4. 4. MinION Access Program. But MAP is much more: it is about being a community and a playground to test new applications. As Gordon Sanghera (CEO of ONT) said: "MAP will never end. There will always be a MAP." So if you think your application can benefit from nanopore sensing, come join MAP and play with us. MAP is visible as a web portal with information from ONT and a social-media-like system with blogs, comments, likes, and a forum for asking advice.
  5. 5. MinION Access Program and ZF-genomics. We entered when MAP started. Our first MinION arrived in April 2014 and the first kits in June. Since then we have run 30 flow cells. MAPpers competition: we topped the leaderboard on read length and yield, so we now have three MinIONs.
  6. 6. Longest 2D read: 93.5 Kbp • Longest template read: 120 Kbp (231 Kbp) • Highest yield: 1.32 Gevents • [Figure: template and 2D yield over the past year; x-axis: runs 1 to 30; y-axis: base pairs sequenced (Mbp), 0 to 350; series: template and 2D; chemistries: R6, R7, R7.3]
  7. 7. Scaffolding genomes using long reads, or: how to untangle the assembly graph
  8. 8. Cheap short-read sequencing technology has been used to generate many draft genomes. Draft genomes made with short-read data suffer from a fundamental problem: reads that are shorter than a repeat cannot connect the unique sequence going into the repeat with the unique sequence coming out of it. [Diagram: genomic sequences and short reads; labels: unique sequence in, repeat, unique sequence out]
  9. 9. Untangle. Long reads can help to resolve repeat areas in the assembly graph, and the resulting contigs will now look like this: [Diagram: unique sequence in, repeat, unique sequence out]
  10. 10. Assembly and scaffolding strategy (task: software)
      1. Short read correction: Quake (not for small genomes)
      2. Short read assembly: Velvet
      3. MinION read alignment to Velvet contigs: LAST
      4. Link filtering and contig tiling: Untangle script
      5. Path detachment around repeats: Untangle script
      6. Bubble popping: Untangle script
      7. Delete unconfirmed connections: Untangle script
      8. Contig extraction: Untangle script
  11. 11. Agrobacterium strain NCPPB 1771. Agrobacteria are the cause of crown gall disease, a tumorous growth of plant tissue. Agrobacteria transfer part of their (plasmid) DNA to their host, and this feature is widely used in plant research to genetically modify plants. Agrobacteria have two chromosomes and carry several plasmids. This strain also carries active transposons.
  12. 12. NCPPB 1771 assembly graph • 271 nodes, 311 connections • 154 contigs • N50 = 198 Kb • Sum = 5.87 Mb • Graph annotations: 25× transposon (1160 bp), 8× transposon (873 bp), 4× rRNA (6.4 Kb)
  13. 13. MinION sequencing and scaffolding • 13158 reads on R6 and R7 chemistry • 73.8 Mb total yield (template and 2D) • Read length 5–85970 nt, typically ~12 Kb • Alignment: LAST with optimized settings • Links: alignment filtering and contig tiling • 7328 reads aligned to contigs • 438 reads aligned to multiple contigs • 585 links between contigs
  14. 14. Links between nodes are specific. [Figure legend: marked links are confirmed by PCR]
  15. 15. Final assembly graph after scaffolding • 271 nodes + 312 connections → 49 nodes + 5 connections • 154 contigs → ~8 contigs • Complete chromosome 2 (1.2 Mb), pTi (190 Kb), cryptic megaplasmid (746 Kb) • Slight residual fragmentation of chromosome 1
  16. 16. MinION Analysis and Reference Consortium (MARC). MARC is a consortium within MAP that seeks to establish sources of variation and to optimize protocols and analysis. It is open science: data is shared within the consortium and will be made available through ENA. About 100 people have signed up; ~7 experimental groups and ~4 analysis groups are actively working. Managed through a weekly teleconference (TC).
  17. 17. MinION Analysis and Reference Consortium: different phases in MARC • Phase 1 is about being as standard as possible and establishing the variation in the system and between sites. This is done by 5 labs: one in the Netherlands, two in the UK, and two in the USA (east and west coast). • Phase 2 is all about tweaking the protocol: DNA isolation, shearing (or not), running scripts, and DNA modifications will be addressed in this phase. • Phase 3 is about examples of applications.
  18. 18. MinION Analysis and Reference Consortium. In phase 1 the 5 participating labs received Escherichia coli str. K-12 substr. MG1655 and performed DNA isolation, library prep, and sequencing according to a detailed protocol. Per lab, a total of 4 libraries were prepared with 2 different kits and run. This provides an excellent data set for understanding sources of variance in ONT data.
  19. 19. Read Counts. [Figure: four panels (Total Traces, Template Reads, Complement Reads, 2D Reads); axes: Run1 and Run2; legend: Sample (CSH, UCSC, UEA, WTCHG, ZF)]
  20. 20. Read Length Statistics. [Figure: nine panels (mean, median, and STDEV of read length for template, complement, and 2D reads); axes: Run1 and Run2; legend: Sample (CSH, UCSC, UEA, WTCHG, ZF)]
  21. 21. Read Alignments. [Figure: six panels (% aligned for template, complement, and 2D reads, once for all five sites and once for 4 sites: CSH, UCSC, UEA, ZF); axes: Run1 and Run2; legend: Sample]
  22. 22. Conclusions and perspectives. With the data of the first 10 runs analyzed, we can already see that read length has a stronger lab effect than base pair identity to the reference. Another set of 10 phase 1 runs is currently being analyzed and will give a clearer picture of the variability. Experiments for phase 2 will start shortly, while phase 3 experiments and analysis are being done in parallel.
  23. 23. The king cobra genome: rapid expansion of the 3FTx gene family in the king cobra
  24. 24. London Calling 2015: highlights from Clive Brown’s talk • Improvements to the basecaller. There’s still room for improvement. • Read until (and barcoding). • Fast mode on the MinION MkI (500 bp/sec instead of 30). • New 3000-channel ASIC with crumpet chip design to separate the ASIC and fluidics parts. • MinION MkII and PromethION will have this new ASIC. • Library prep on beads to reduce the amount of DNA needed (low ng to pg). • Direct RNA sequencing. • Simplified sample preparation and VolTRAX. • Pricing will be “pay as you go”. The initial payment for hardware includes some hours of sequencing. • MkI: $270 and 3 hrs sequencing (~3 Gbp in fast mode).
  25. 25. Acknowledgements • Prof. Dr. Paul Hooykaas, Leiden University • Christiaan Henkel, senior scientist, Leiden University • Ron Dirks (CEO of ZF-screens B.V.) • All members of the MARC consortium • Ewan Birney, EMBL-EBI • Justin O’Grady, UEA • Sara Goodwin, CSHL • David Buck, WTCHG Oxford • Vadim Zalunin, EMBL-EBI • Miten Jain, UCSC • Matt Loose, Nottingham • Jared Simpson, OICR, Toronto
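
Slide 12 summarizes the draft assembly with N50 = 198 Kb over 154 contigs. As a reminder of what that statistic measures, here is a minimal Python sketch of the usual N50 definition. The function name `n50` and the toy contig lengths are illustrative assumptions; they are not taken from the slides or from the Untangle script.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths in bp (made up, not the NCPPB 1771 contigs).
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))  # prints 400
```

In the toy example the total is 1500 bp; the two longest contigs (500 + 400) already cover more than half, so the N50 is 400.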

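Slide 13 reports that 438 reads aligned to multiple contigs, yielding 585 links between contigs. The sketch below shows one plausible way to derive such links: contigs are tiled along each read by alignment start position, and neighbouring contigs on a read form a link. This is not the Untangle script; the function name `contig_links`, the tuple layout, and the `min_reads` threshold are assumptions for illustration, and the input is toy data rather than LAST output.

```python
from collections import Counter, defaultdict


def contig_links(alignments, min_reads=2):
    """Count reads supporting each contig-to-contig link.

    alignments: iterable of (read_id, contig_id, read_start) tuples,
    where read_start is the alignment start position on the read.
    Returns {(contig_a, contig_b): n_reads} for links seen in at least
    min_reads reads. Each pair is stored in sorted order so that the
    strand of the read does not matter.
    """
    by_read = defaultdict(list)
    for read_id, contig_id, read_start in alignments:
        by_read[read_id].append((read_start, contig_id))

    support = Counter()
    for hits in by_read.values():
        # Tile the contigs along the read in order of appearance.
        tiling = [contig for _, contig in sorted(hits)]
        # Each read votes at most once per link.
        pairs = {
            tuple(sorted((a, b)))
            for a, b in zip(tiling, tiling[1:])
            if a != b
        }
        support.update(pairs)

    return {pair: n for pair, n in support.items() if n >= min_reads}


# Toy alignments (hypothetical): two unique contigs joined through a repeat.
toy = [
    ("r1", "c1", 0), ("r1", "rep", 5000), ("r1", "c2", 6200),
    ("r2", "c1", 100), ("r2", "rep", 3000),
    ("r3", "rep", 0), ("r3", "c2", 1300),
    ("r4", "c3", 0),  # aligns to a single contig, so it adds no link
]
links = contig_links(toy)
print(links)
```

Requiring more than one supporting read per link is a simple guard against chimeric reads or spurious alignments; the threshold of 2 is arbitrary here.
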
Editor's Notes

  • Excuse me if I sound like an ONT salesperson, but the truth is that nanopore sensing is a very powerful method for measuring many different things, and it will show up in many different places in your life over the next decade or two.
