London Calling 2015

Published in:
Science

- 1. Jared Simpson, Ontario Institute for Cancer Research & Department of Computer Science, University of Toronto. Error correction, assembly and consensus algorithms for MinION data. London Calling, May 14th, 2015
- 2. Our collaboration
- 3. An overview of NGS assembly • Illumina data: short reads, very accurate, very deep • nearly all Illumina assembly is based on exact matching algorithms • fragmented assemblies • Algorithms for Illumina data do not work for long, noisy reads • PacBio developed a pipeline (“HGAP”) to assemble their data • We used this recipe as a starting point but with custom components
- 4. Long read assembly pipeline: Input reads → Error correction → Celera Assembler → Consensus → Genome Assembly
- 5. Input Data • First challenge is finding overlaps for reads with 15-20% errors
- 6. Overlap Detection • we use github.com/thegenemyers/daligner to compute overlaps
- 7. Partial Order Graphs • add read GCTACGAT that we want to correct to graph
- 8. Partial Order Graphs • add sequence GCTCGAT to graph
- 9. Partial Order Graphs • add sequence GCTCGATT to graph
- 10. Partial Order Graphs • maximum weight path GCTCGAT is the corrected read
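
The graph walk-through on slides 7-10 can be sketched in a few lines of Python. This is a simplification and not the nanocorrect implementation: it assumes the three sequences are already aligned as gapped rows (a real partial order aligner builds that alignment itself), and the edge score `2 * weight - n` is an illustrative choice that rewards edges used by most sequences and penalises the rest. The function name `consensus_from_alignment` is hypothetical.

```python
from collections import defaultdict


def consensus_from_alignment(rows):
    """Return the highest-scoring path through a graph built from gapped rows."""
    n = len(rows)
    nodes = set()
    weights = defaultdict(int)  # (node_u, node_v) -> number of rows using the edge
    for row in rows:
        prev = None
        for col, base in enumerate(row):
            if base == "-":
                continue
            node = (col, base)
            nodes.add(node)
            if prev is not None:
                weights[(prev, node)] += 1
            prev = node

    # Assumed scoring: +1 per supporting row, -1 per non-supporting row.
    incoming = defaultdict(list)
    for (u, v), w in weights.items():
        incoming[v].append((u, 2 * w - n))

    # Edges only run from a lower column to a higher one, so sorting the
    # nodes by column gives a topological order for the dynamic programme.
    order = sorted(nodes)
    best = {v: 0 for v in order}
    back = {v: None for v in order}
    for v in order:
        for u, score in incoming[v]:
            if best[u] + score > best[v]:
                best[v] = best[u] + score
                back[v] = u

    # Trace back from the best-scoring end node.
    node = max(order, key=lambda v: best[v])
    path = []
    while node is not None:
        path.append(node[1])
        node = back[node]
    return "".join(reversed(path))


# The three sequences from the slides, aligned by hand for this sketch.
rows = ["GCTACGAT-", "GCT-CGAT-", "GCT-CGATT"]
print(consensus_from_alignment(rows))  # prints GCTCGAT
```

With a plain sum of edge weights the path would also pick up the trailing T that only one sequence supports, which is why the sketch subtracts the non-supporting rows.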
- 11. Error Correction
- 12. Contig Assembly • Celera Assembler produces one contig at 98.5% identity
- 13. Assembly Polishing • Consensus problem is viewed as choosing a sequence C’ that maximizes the probability of the event data: $C' = \arg\max_{S \in \mathcal{C}} P(D \mid S)$, where $P(D \mid S) = \prod_{k=1}^{r} P(e_{i,k}, e_{i+1,k}, \ldots, e_{j,k} \mid S, \Theta)$
- 14. Selecting a Consensus • Mutate the current consensus GCTACGATCGACTTACGA and compute $P(D \mid S)$ for each candidate:

  | Candidate | $P(D \mid S)$ |
  |---|---|
  | ACTACGATCGACTTACGA | -190 |
  | CCTACGATCGACTTACGA | -187 |
  | TCTACGATCGACTTACGA | -192 |
  | ... | |
  | -CTACGATCGACTTACGA | -176 |
  | G-TACGATCGACTTACGA | -191 |
  | GC-ACGATCGACTTACGA | -193 |
  | GCT-CGATCGACTTACGA | -168 |
  | ... | |
  | GACTACGATCGACTTACGA | -198 |
  | GCCTACGATCGACTTACGA | -191 |
  | GGCTACGATCGACTTACGA | -195 |
  | GTCTACGATCGACTTACGA | -181 |

- 15. Selecting a Consensus • Select new consensus: the highest-scoring candidate, GCT-CGATCGACTTACGA (-168)
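
The mutate-and-select step on slides 14-15 can be sketched as a greedy search. This is only an illustration under stated assumptions: `single_edits` and `polish` are hypothetical names, the loop presumably repeats until no edit improves the score, and `score` stands in for whatever computes log P(D|S) from the event data (here replaced by a toy function so the example runs on its own).

```python
def single_edits(seq, alphabet="ACGT"):
    """All distinct sequences one substitution, deletion or insertion away."""
    edits = set()
    for i in range(len(seq)):
        edits.add(seq[:i] + seq[i + 1:])  # delete base i
        for b in alphabet:
            if b != seq[i]:
                edits.add(seq[:i] + b + seq[i + 1:])  # substitute base i
    for i in range(len(seq) + 1):
        for b in alphabet:
            edits.add(seq[:i] + b + seq[i:])  # insert b before position i
    return sorted(edits)  # sorted so that ties are broken deterministically


def polish(seq, score, max_rounds=20):
    """Greedy hill-climbing: accept the best single edit while it improves."""
    best, best_score = seq, score(seq)
    for _ in range(max_rounds):
        candidate = max(single_edits(best), key=score)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break
        best, best_score = candidate, candidate_score
    return best


# Toy stand-in for log P(D|S): it simply prefers one known target sequence.
target = "GCTCGATCGACTTACGA"
toy_score = lambda s: 0.0 if s == target else -1.0
print(polish("GCTACGATCGACTTACGA", toy_score))  # prints GCTCGATCGACTTACGA
```

Scoring every candidate against the full data set is the expensive part, so a real implementation would want to restrict each evaluation to the events near the edited position.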
- 16. Pore Models
- 17-21. Generating Events (one slide built up over five steps) • What do we expect events from a given sequence to look like? [Figure: sampled current (pA) against time (s) for the sequence GCTACGATT; each step shows more of the trace]
- 22. Event Detection [Figure: the same current (pA) against time (s) trace, segmented into events]

  | Event | Mean current (pA) | Current stdv | Duration (s) |
  |---|---|---|---|
  | 1 | 60.3 | 0.7 | 0.521 |
  | 2 | 40.6 | 1.0 | 0.112 |
  | 3 | 52.2 | 2.0 | 0.356 |
  | 4 | 54.1 | 1.2 | 0.291 |
  | 5 | 49.5 | 1.5 | 0.141 |
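
A minimal sketch of how the three columns of the event table could be computed once the segment boundaries are known. Finding the boundaries is the hard part and is not shown; the function name, the `boundaries` argument and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def summarise_events(samples, boundaries, sample_rate_hz):
    """Summarise each segment of a current trace as (mean, stdv, duration).

    samples        -- current measurements in pA
    boundaries     -- sorted sample indices at which a new event starts
    sample_rate_hz -- samples per second
    """
    edges = [0, *boundaries, len(samples)]
    events = []
    for start, end in zip(edges, edges[1:]):
        segment = samples[start:end]
        events.append({
            "mean_pa": mean(segment),
            "stdv_pa": pstdev(segment),
            "duration_s": len(segment) / sample_rate_hz,
        })
    return events


# Made-up samples: two current levels with one boundary at index 4.
trace = [60.0, 60.0, 61.0, 59.0, 40.0, 41.0, 40.0]
for event in summarise_events(trace, boundaries=[4], sample_rate_hz=10):
    print(event)
```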
- 23. A simple model • What is the probability of observing events E given a sequence S? • Assuming for the moment there are no missing or extra events: $P(e_1, e_2, \ldots, e_n \mid s_1, s_2, \ldots, s_n, \Theta) = \prod_{i=1}^{n} P(e_i \mid s_i, \mu_{s_i}, \sigma_{s_i})$, with $P(e_i \mid k, \mu_k, \sigma_k) = \mathcal{N}(\mu_k, \sigma_k^2)$
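
Under the no-missing, no-extra-events assumption of slide 23, the likelihood is a product of one Gaussian per event, which is easiest to evaluate as a sum of logs. The sketch below assumes a pore model stored as a dictionary from 5-mer to (mean, standard deviation); the function names and the toy model values are made up for illustration.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_likelihood(event_means, seq, pore_model, k=5):
    """log P(e_1..e_n | s_1..s_n) with exactly one event per k-mer of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if len(event_means) != len(kmers):
        raise ValueError("the simple model needs exactly one event per k-mer")
    return sum(log_normal_pdf(e, *pore_model[kmer])
               for e, kmer in zip(event_means, kmers))


# Toy pore model with made-up (mean pA, stdv pA) values for two 5-mers.
toy_model = {"GCTAC": (60.0, 1.0), "CTACG": (40.0, 1.5)}
print(log_likelihood([60.3, 40.6], "GCTACG", toy_model))
```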
- 24. Complications [Figure: current (pA) against time (s) for a short stretch of signal]
- 25. Complications [Same figure, annotated] Is this an event?
- 26. Complications [Same figure, annotated] One event or two?
- 27. Nanopore HMM • must consider: over-segmentation, under-segmentation, missed short events • HMM: • M states: match event to 5-mers • E states: extra observation of an event • K states: no event observed for 5-mer • $P(D \mid S)$: $P(\pi, e_1, e_2, \ldots, e_n \mid S, \Theta) = \prod_{i=1}^{n} P(e_i \mid \pi_i, \mu_{s_i}, \sigma_{s_i}) \, P(\pi_i \mid \pi_{i-1}, S)$ and $P(e_1, e_2, \ldots, e_n \mid S, \Theta) = \sum_{\pi} P(\pi, e_1, e_2, \ldots, e_n \mid S, \Theta)$
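
The sum over paths on slide 27 is what the forward algorithm computes. The sketch below is a deliberately reduced version and not the nanopolish HMM: it assumes fixed transition probabilities, lets an event stay on the same 5-mer (an extra observation), step to the next one (a match) or jump over exactly one 5-mer (a missed event), and requires the first and last events to come from the first and last 5-mers. All names and the toy model values are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log_normal_pdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def logsumexp(values):
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_prob(events, seq, model, k=5, p_stay=0.1, p_step=0.8, p_skip=0.1):
    """log P(events | seq), summed over all stay/step/skip paths."""
    states = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not events or not states:
        raise ValueError("need at least one event and one k-mer")
    # (how many k-mers the path advances, log transition probability)
    moves = ((0, math.log(p_stay)), (1, math.log(p_step)), (2, math.log(p_skip)))

    # prev[j] = log P(events so far, latest event emitted by k-mer j)
    prev = [NEG_INF] * len(states)
    prev[0] = log_normal_pdf(events[0], *model[states[0]])
    for e in events[1:]:
        cur = []
        for j, kmer in enumerate(states):
            incoming = [prev[j - d] + lp for d, lp in moves if j - d >= 0]
            cur.append(logsumexp(incoming) + log_normal_pdf(e, *model[kmer]))
        prev = cur
    return prev[-1]


# Made-up pore model; four events for three 5-mers, so one event is "extra".
toy_model = {"GCTAC": (60.0, 1.0), "CTACG": (40.0, 1.0), "TACGA": (52.0, 2.0)}
print(forward_log_prob([60.3, 40.6, 41.0, 52.2], "GCTACGA", toy_model))
```

Working in log space with `logsumexp` avoids underflow, since the product of many small densities quickly leaves floating-point range.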
- 28-29. Transition Probabilities • Probability of not observing an event is a function of the absolute difference between the (expected) current levels of neighbouring 5-mers [Figure: current (pA) against time (s)]
- 30. Transition Probabilities [Figure: skip probability against absolute difference (pA)]
- 31. Assembly Accuracy [Figure, panels A-D: 5-mer count in reference against 5-mer count in draft assembly and in polished assembly; k-mer counts for TTTTT, AAAAA, TTTTG, CAAAA, CTTTT, AAAAG, CCCCC, GGGGG in draft and reference] • Draft: 98.5% accuracy • Polished: 99.5% accuracy
- 32. Assembly Accuracy [Figure, enlarged panels: k-mer counts for TTTTT, AAAAA, TTTTG, CAAAA, CTTTT, AAAAG, CCCCC, GGGGG in draft against reference and in polished against reference; 5-mer count in reference against 5-mer count in polished assembly]
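
The comparison plotted on slides 31-32 pairs each 5-mer's count in the reference with its count in an assembly. A small sketch of that bookkeeping follows, assuming counts are taken on a single strand (whether the slides also counted the reverse strand is not stated); the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k=5):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def count_pairs(reference, assembly, k=5):
    """Map each k-mer to (count in reference, count in assembly)."""
    ref, asm = kmer_counts(reference, k), kmer_counts(assembly, k)
    return {kmer: (ref[kmer], asm[kmer]) for kmer in sorted(set(ref) | set(asm))}


# Toy example: the "assembly" has lost one A from a homopolymer run.
for kmer, (in_ref, in_asm) in count_pairs("GCAAAAAAT", "GCAAAAAT").items():
    print(kmer, in_ref, in_asm)
```

Points far from the diagonal in such a comparison flag k-mers that the assembly systematically over- or under-calls, which is how the homopolymer 5-mers stand out on the slides.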
- 33. Aligning Events to a Reference • HMM can also align events to a reference genome • Read about it here: http://simpsonlab.github.io/2015/04/08/eventalign/
- 34. Planned Improvements • Model dwell duration to better call homopolymers, e.g. CTAAAAAAAAAAAAGTACA • SNP calling/genotyping: $P(g_i \mid D) = \frac{P(D \mid g_i) \, P(g_i)}{P(D)}$ • Improve scalability to handle larger genomes • Use signal data during error correction
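
The genotyping formula on slide 34 is Bayes' rule, with P(D) obtained by summing the numerator over all candidate genotypes. A minimal sketch, assuming the per-genotype log-likelihoods log P(D|g) are already available (for instance from a signal-level model) and that priors are supplied by the caller; the function name and the numbers in the example are made up.

```python
import math


def genotype_posteriors(log_likelihoods, priors):
    """Return {genotype: P(g | D)} from log P(D | g) and P(g)."""
    joint = {g: ll + math.log(priors[g]) for g, ll in log_likelihoods.items()}
    m = max(joint.values())  # subtract the maximum for numerical stability
    evidence = sum(math.exp(v - m) for v in joint.values())  # P(D) / exp(m)
    return {g: math.exp(v - m) / evidence for g, v in joint.items()}


# Made-up log-likelihoods for two candidate alleles with a flat prior.
print(genotype_posteriors({"A": -168.0, "C": -176.0}, {"A": 0.5, "C": 0.5}))
```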
- 35. Acknowledgements & Code • Collaborators: Nick Loman, Josh Quick (Birmingham); Jonathan Dursi (OICR) • Code: github.com/jts/nanocorrect (error correction); github.com/jts/nanopolish (signal-level algorithms); github.com/jts/nanopore-paper-analysis (reproduce our paper)
