Long read sequencing - LSCC lab talk - fri 5 june 2015

Long read sequencing
Torsten Seemann
VLSCI LSCC Lab Talk - Melbourne, AU - Fri 5 June 2015
The good, the bad, and the really cool.

Structural variation
The missing heritability - not just SNPs & indels

Pacific Biosciences RSII
2015 ARC LIEF
w/ Tim Stinear
Installed this week.
Passed testing!

Oxford Nanopore MinION MkI
Successor to Mk0
MinION Access Program, Round 2
The up & comer!

PacBio
It’s already here and it works.

PacBio - the device
∷ It’s big!
∷ Three chunks
: compute (left)
: robotics (top)
: sequencing (bottom)
∷ A cushion of N₂ gas

PacBio - technology
∷ Polymerase bound to the bottom of a ZMW (zero-mode waveguide) μ-well
∷ Fluorescent nucleotide incorporation measured in real time
∷ 3-hour “movies”

PacBio: read lengths
Needs careful library prep to ensure DNA is not overly fragmented!

PacBio: error rate
Single read: 86% accuracy
30x consensus: 99.999% accuracy
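Why can a 30x consensus be so much more accurate than a single read? A rough sketch, assuming each read's error at a given position is independent and the consensus is a simple strict-majority vote, is a binomial tail sum. This is an idealization: real PacBio errors are mostly indels and real consensus callers are model-based.

```python
from math import comb


def majority_consensus_accuracy(depth: int, read_accuracy: float) -> float:
    """Probability that a strict majority of `depth` reads call the right base.

    Idealized model (assumption): errors are independent across reads, and a
    tie or a minority of correct calls counts as a consensus error.
    """
    need = depth // 2 + 1  # smallest strict majority
    p = read_accuracy
    q = 1.0 - p
    return sum(comb(depth, k) * p**k * q**(depth - k)
               for k in range(need, depth + 1))


for depth in (1, 5, 10, 30):
    print(depth, f"{majority_consensus_accuracy(depth, 0.86):.6f}")
```

Under these assumptions, 30 reads at 86% accuracy leave a consensus error of only a few per million positions, so in practice it is systematic, not random, error that limits consensus accuracy.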

PacBio: main applications
∷ Finished microbial genomes
∷ Full length cDNA (mRNA isoforms)
∷ Extreme GC sequence
∷ HLA / MHC / KIR haplotyping
∷ Base modifications (methylation)

PacBio: bioinformatics
∷ All on GitHub
∷ SMRT Portal
: Nice GUI
: Cloud ready
: Linux backend
: Cluster ready
∷ Cmdline too!

Oxford Nanopore
The new kid on the block.

PromethION - large scale
∷ 48 separate flow cells
∷ On-board ASIC
∷ Runs Python

Nanopore - types of reads
“1D reads”
∷ Template 1D
: only fwd strand
∷ Complement 1D
: only rev strand
“2D reads”
∷ Normal 2D
: mostly fwd, some rev
∷ Full 2D
: most of fwd & rev
: these are high quality
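The template and complement reads come from opposite strands of the same molecule. Before the two can be compared base by base, the complement read has to be reverse-complemented into template orientation. Below is a minimal sketch of that step only. The real 2D basecaller combines the two strands in event space, so this illustrates the strand relationship rather than the basecaller, and the example read is made up.

```python
# Map each base to its Watson-Crick partner; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


template = "ACGGTTAC"                       # hypothetical template-strand read
complement = reverse_complement(template)   # the opposite strand, read 5'->3'

# Reverse-complementing the complement read restores template orientation.
print(complement)                                   # GTAACCGT
print(reverse_complement(complement) == template)   # True
```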

Nanopore - read lengths
Read length is not limited by the technology but by library preparation.
Can get >100 kbp reads.
[Figure: read lengths]

Nanopore - error rate
∷ 5-mer (context-dependent) errors
∷ Not modelling base modifications yet
∷ Basically where PacBio was a few years ago!
[Figure: percent identity of aligned reads]
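Percent identity of an aligned read can be defined in several ways. This sketch assumes one common choice: matches divided by all alignment columns (matches, mismatches, insertions and deletions). The counts come from an extended CIGAR string that uses `=` for matches and `X` for mismatches, and clipping is ignored.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar: str) -> float:
    """Percent identity from an extended (=/X) CIGAR string.

    identity = matches / (matches + mismatches + insertions + deletions)
    Clipping (S, H), padding (P) and skips (N) are not counted as columns.
    Plain 'M' is rejected because it does not distinguish match from mismatch.
    """
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in _CIGAR_OP.findall(cigar):
        if op == "M":
            raise ValueError("need =/X operations, not M")
        if op in counts:
            counts[op] += int(length)
    columns = sum(counts.values())
    return 100.0 * counts["="] / columns if columns else 0.0


print(percent_identity("5S8=1X1I2=1D3="))  # 13 matches / 16 columns = 81.25
```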

MinION - applications
∷ Same as PacBio, plus...
∷ Portable sequencing
: in the field, e.g. Josh Quick in Guinea for Ebola
: in hospitals - infection control
: monitoring - water/food supply, production facilities
: at the GP - pathogen test in 10 min from blood prick?
: spit in a home device every morning?

MinION - bioinformatics
∷ Event space vs. base space
: MinION MkI - base calling in cloud (Metrichor)
: MinION MkII - on device?
: PromethION - can choose on-device add-on
∷ Mostly 3rd-party tools - lots of activity
: poretools, poRe
: minoTour, nanopolish

Disruptive technology
Just another sequencer?

“Run until”
Dynamically adjust sequencing yield
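Below is a hypothetical sketch of the “run until” idea: keep sequencing until the accumulated yield reaches a target depth for an assumed genome size, then stop. The function name, genome size and target depth are illustrative only and are not part of any Oxford Nanopore API.

```python
from typing import Iterable


def reads_needed_until(read_lengths: Iterable[int], genome_size: int,
                       target_depth: float) -> int:
    """Return how many reads arrive before yield >= genome_size * target_depth.

    Hypothetical helper: `read_lengths` stands in for reads as they come off
    the device. Returns -1 if the stream ends before the target is reached.
    """
    target_bases = genome_size * target_depth
    total = 0
    for n, length in enumerate(read_lengths, start=1):
        total += length
        if total >= target_bases:
            return n  # enough data: this is where the run would stop
    return -1


# e.g. a 5 Mbp bacterial genome at 30x needs 150 Mbp of reads
print(reads_needed_until([10_000] * 20_000, genome_size=5_000_000, target_depth=30))
```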

“Read until”
∷ Can access events/bases during reading
: remember reads are long, e.g. 40 kbp
: examine, say, the first 100 bp
: can decide to stop reading and eject molecule!
∷ This is a killer app!
: only want pathogens? eject if human DNA
: only want exome? eject if not exonic looking
: controlled with Python code
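The “read until” decision can be sketched as follows: look at the first ~100 bases of a read and eject it if it looks like host DNA. Everything in this sketch is hypothetical. The `decide` function, the toy k-mer set standing in for a human reference index, the k-mer size and the threshold are illustrative assumptions, and the real device API is not shown.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def decide(read_prefix: str, host_kmers: Set[str], k: int = 11,
           max_host_fraction: float = 0.5) -> str:
    """Return 'eject' if too many prefix k-mers are found in the host set.

    Hypothetical policy: `host_kmers` stands in for an index of the human
    genome; a real system would use a proper aligner or classifier.
    """
    prefix_kmers = kmers(read_prefix[:100], k)  # only the first 100 bp
    if not prefix_kmers:
        return "continue"                       # too short to judge yet
    host_hits = len(prefix_kmers & host_kmers)
    if host_hits / len(prefix_kmers) > max_host_fraction:
        return "eject"
    return "continue"


host = kmers("ACGT" * 40, 11)       # toy stand-in for "human" sequence
print(decide("ACGT" * 25, host))    # prefix matches host k-mers -> eject
print(decide("TTGCA" * 20, host))   # unrelated prefix -> continue
```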

A new business model
∷ No capital or reagent costs
: Instrument will be free
: Flow cells will be free
: Only pay for what you want to sequence
: Min. $20 and ~$1000 for a 100x human genome
∷ But I’ll scam the system!
: Flowcell stats sent back to base
: Won’t send you new flow cells if they look unused

Some things never change
∷ Don’t worry!
: 50% of our job will always be converting file formats ☺
∷ But things are improving
: PacBio: HDF5
: MinION: HDF5 / FAST5
∷ Can convert .h5/.hd5 to .fastq easily
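Whatever the container, the end product of the conversion is a plain FASTQ record: a header, the bases, a `+` separator and the per-base Phred qualities encoded as ASCII characters offset by 33. Below is a minimal stdlib-only sketch of that last step. The read name, bases and qualities are made up, and reading the HDF5 file itself would need a library such as h5py, which is not shown.

```python
from typing import Sequence


def to_fastq(name: str, bases: str, quals: Sequence[int]) -> str:
    """Format one read as a 4-line FASTQ record with Phred+33 qualities."""
    if len(bases) != len(quals):
        raise ValueError("bases and qualities must have the same length")
    # Phred+33: quality q is stored as the character with code q + 33.
    qual_string = "".join(chr(q + 33) for q in quals)
    return f"@{name}\n{bases}\n+\n{qual_string}\n"


print(to_fastq("read_1", "ACGT", [40, 30, 20, 10]), end="")
```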

Read alignment
∷ PacBio
: BLASR - Basic Local Alignment + Successive Refinement
: BWA MEM - bwa mem -x pacbio
∷ MinION
: MarginAlign - sum over possible alignments, HMMs
: BWA MEM - bwa mem -x ont2d
∷ Need to modify variant caller parameters

De novo assembly
∷ PacBio
: HGAP, HGAP2, Falcon, SPAdes, Celera Assembler
∷ MinION
: SPAdes, Celera Assembler, nanopolish
∷ Lots of convergence
: Similar error models (indels)
: Long reads, lower coverage - back to the future!

Streaming analysis
∷ We are not going to keep all this data
∷ Extract info we need and discard
∷ Cheaper to resequence?
∷ Need to think in terms of streaming analyses
∷ Lots of new applications
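The following is a minimal sketch of the streaming idea in Python. It reads FASTQ records one at a time, keeps only the summary numbers needed and never holds the whole file in memory. It assumes well-formed 4-line FASTQ records, and the statistics chosen (read count, total bases, longest read) are illustrative.

```python
import io
from typing import Iterator, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, bases, quals) from a 4-line FASTQ stream, one at a time."""
    while True:
        header = handle.readline()
        if not header:
            return                      # end of stream
        bases = handle.readline().rstrip("\n")
        handle.readline()               # '+' separator line, discarded
        quals = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), bases, quals


def summarise(handle: TextIO) -> dict:
    """Keep running totals only; each record is discarded after use."""
    n_reads = total_bases = longest = 0
    for _name, bases, _quals in fastq_records(handle):
        n_reads += 1
        total_bases += len(bases)
        longest = max(longest, len(bases))
    return {"reads": n_reads, "bases": total_bases, "longest": longest}


demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGTAC\n+\nIIIIIIIIII\n")
print(summarise(demo))
```

The same `summarise` call works unchanged on `sys.stdin`, so data can be piped through and discarded rather than stored.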

Exciting times!
∷ Genomics is changing all the time
: new technologies
: changing attributes/properties of current technology
∷ Bioinformaticians need to be able to adapt
: focus on key skills, not specific apps
∷ Pipelines are often short lived
: except maybe clinical / accredited ones

Contact
∷ tseemann.github.io
∷ t.seemann@unimelb.edu.au
∷ @torstenseemann
