May. 14, 2015•0 likes•7,080 views

Download to read offline

Report

Science

London Calling 2015

Jared SimpsonFollow

Selection analysis using HyPhyBioinformatics and Computational Biosciences Branch

Industrial plant optimization in reduced dimensional spacesCapstone

Hybrid of Ant Colony Optimization and Gravitational Emulation Based Load Bala...IRJET Journal

A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA...Daniel H. Stolfi

170216 jts agbt_finalJared Simpson

2014 khmer protocolsc.titus.brown

- 1. Jared Simpson ! Ontario Institute for Cancer Research & Department of Computer Science University ofToronto Error correction, assembly and consensus algorithms for MinION data London Calling, May 14th, 2015
- 2. Our collaboration 2
- 3. An overview of NGS assembly • Illumina data: short reads, very accurate, very deep • nearly all Illumina assembly is based on exact matching algorithms • fragmented assemblies ! • Algorithms for Illumina data do not work for long, noisy reads • PacBio developed a pipeline (“HGAP”) to assemble their data • We used this recipe as a starting point but with custom components 3
- 4. Long read assembly pipeline 4 Error correction Celera Assembler Consensus Input reads Genome Assembly
- 5. Input Data • First challenge is ﬁnding overlaps for reads with 15-20% errors 5
- 6. Overlap Detection 6 we use github.com/thegenemyers/daligner to compute overlaps
- 7. Partial Order Graphs 7 add read GCTACGAT that we want to correct to graph
- 8. Partial Order Graphs 8 add sequence GCTCGAT to graph
- 9. Partial Order Graphs 9 add sequence GCTCGATT to graph
- 10. Partial Order Graphs 10 maximum weight path GCTCGAT is the corrected read
- 11. Error Correction 11
- 12. Contig Assembly 12 Celera Assembler produces one contig at 98.5% identity
- 13. Assembly Polishing • Consensus problem is viewed as choosing a sequence C’ that maximizes the probability of the event data 13 C0 = arg max S2C P(D|S) P(D|S) = rY k=1 P(ei,k, ei+1,k, ..., ej,k|S, ⇥) where
- 14. Selecting a Consensus 14 Mutate ACTACGATCGACTTACGA CCTACGATCGACTTACGA TCTACGATCGACTTACGA ... -CTACGATCGACTTACGA G-TACGATCGACTTACGA GC-ACGATCGACTTACGA GCT-CGATCGACTTACGA ... GACTACGATCGACTTACGA GCCTACGATCGACTTACGA GGCTACGATCGACTTACGA GTCTACGATCGACTTACGA P(D|S) -190 -187 -192 -176 -191 -193 -168 -198 -191 -195 -181 GCTACGATCGACTTACGA
- 15. Selecting a Consensus 15 GCT-CGATCGACTTACGA Mutate A C T ... - G GC GCT ... G G G G P(D|S) -190 -187 -192 -176 -191 -193 -168 -198 -191 -195 -181 Select new consensus GCT-CGATCGACTTACGA -168
- 16. Pore Models 16
- 17. Generating Events • What do we expect events from a given sequence to look like? 17 GCTACGATT Sample Current ●●● ● ●●●●● ●● ● ● ●● ● ●●● ● ●● ● ●●●●● ● ● ● ●●●● ● ●● ●● ●●●●● ● ● ●●●● ● ●● ●● ● ● ● ●●● ●●● ●●●● ●● ● ● ●●●● ● ● ● ● ●●● ●● ●●●●●●● ● ● ● ● ● ●●● ● ● ●●●●●●●● ● ● ● ●●●● ● ●●● ●●● ● ●● ●●● ●● ●● ●●● ● ●● ●●●●●●●● ● ●●●● ● ●●●●●● ●●●● ●●● ● ●●●● ● ● ●● ● ● ●●●●●●● ●●● ●●● ●●● ● ●●● ●● ●● ● ● ●●● ●● ●●● ●●●●●●● ● ● ● ●●●● ● ● ●●●●● ● ●●●● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●●● ●● ●● ● ● ●● ● ● ●●●●● ● ● ●● ●●●● ● ● ●●● ● ● ●● ●● ●● ● ●● ● ●●●●●●●●●●● ●●● ● ●● ● ●●● ● ●●● ● ● ●●●●●●● ● ●● ● ●● ●●●●● ● ● ●● ● ● ●● ●● ● ● ●● ● ●●● ●●● ● ● ● ● ● ● ● ● ● ●●● ● ●●●●●●●●●●●●●● ● ● ●●●●●● ●● ●● ● ●● ●●● ●●● ●●● ● ●● ● ● ●● ●● ●●●●● ●● ●●● ●●●● ● ● ●●● ● ●● ●●● ● ●●●●●● ● ● ●● ●●● ●●●● ● ● ●●● ● ● ● ●●●● ● ●●●●● ●● ● ● ●● ● ●●●● ●●● ● ●● ●● ●● ● ●●●●● ●●●●● ●● ● 0 25 50 75 100 0.00 0.25 0.50 0.75 1.00 1.25 time (s)Current(pA)
- 18. Generating Events • What do we expect events from a given sequence to look like? 18 GCTACGATT Sample Current ●●● ● ●●●●● ●● ● ● ●● ● ●●● ● ●● ● ●●●●● ● ● ● ●●●● ● ●● ●● ●●●●● ● ● ●●●● ● ●● ●● ● ● ● ●●● ●●● ●●●● ●● ● ● ●●●● ● ● ● ● ●●● ●● ●●●●●●● ● ● ● ● ● ●●● ● ● ●●●●●●●● ● ● ● ●●●● ● ●●● ●●● ● ●● ●●● ●● ●● ●●● ● ●● ●●●●●●●● ● ●●●● ● ●●●●●● ●●●● ●●● ● ●●●● ● ● ●● ● ● ●●●●●●● ●●● ●●● ●●● ● ●●● ●● ●● ● ● ●●● ●● ●●● ●●●●●●● ● ● ● ●●●● ● ● ●●●●● ● ●●●● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●●● ●● ●● ● ● ●● ● ● ●●●●● ● ● ●● ●●●● ● ● ●●● ● ● ●● ●● ●● ● ●● ● ●●●●●●●●●●● ●●● ● ●● ● ●●● ● ●●● ● ● ●●●●●●● ● ●● ● ●● ●●●●● ● ● ●● ● ● ●● ●● ● ● ●● ● ●●● ●●● ● ● ● ● ● ● ● ● ● ●●● ● ●●●●●●●●●●●●●● ● ● ●●●●●● ●● ●● ● ●● ●●● ●●● ●●● ● ●● ● ● ●● ●● ●●●●● ●● ●●● ●●●● ● ● ●●● ● ●● ●●● ● ●●●●●● ● ● ●● ●●● ●●●● ● ● ●●● ● ● ● ●●●● ● ●●●●● ●● ● ● ●● ● ●●●● ●●● ● ●● ●● ●● ● ●●●●● ●●●●● ●● ● ●● ●● ●●●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●●●●●●● ● ●● ● ●●●● ●● ● ● ● ● ● ●● ● ● ● ●●●●● ● ●●● ● ● ● ●● ● ● ●●●●●● ●● ● ●●●● ●● ● ● ●●● ●●●● ● ● ●●●●● ●● ●● ● ● ●●●●●●● 0 25 50 75 100 0.00 0.25 0.50 0.75 1.00 1.25 time (s)Current(pA)
- 19. Generating Events • What do we expect events from a given sequence to look like? 19 GCTACGATT Sample Current ●●● ● ●●●●● ●● ● ● ●● ● ●●● ● ●● ● ●●●●● ● ● ● ●●●● ● ●● ●● ●●●●● ● ● ●●●● ● ●● ●● ● ● ● ●●● ●●● ●●●● ●● ● ● ●●●● ● ● ● ● ●●● ●● ●●●●●●● ● ● ● ● ● ●●● ● ● ●●●●●●●● ● ● ● ●●●● ● ●●● ●●● ● ●● ●●● ●● ●● ●●● ● ●● ●●●●●●●● ● ●●●● ● ●●●●●● ●●●● ●●● ● ●●●● ● ● ●● ● ● ●●●●●●● ●●● ●●● ●●● ● ●●● ●● ●● ● ● ●●● ●● ●●● ●●●●●●● ● ● ● ●●●● ● ● ●●●●● ● ●●●● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●●● ●● ●● ● ● ●● ● ● ●●●●● ● ● ●● ●●●● ● ● ●●● ● ● ●● ●● ●● ● ●● ● ●●●●●●●●●●● ●●● ● ●● ● ●●● ● ●●● ● ● ●●●●●●● ● ●● ● ●● ●●●●● ● ● ●● ● ● ●● ●● ● ● ●● ● ●●● ●●● ● ● ● ● ● ● ● ● ● ●●● ● ●●●●●●●●●●●●●● ● ● ●●●●●● ●● ●● ● ●● ●●● ●●● ●●● ● ●● ● ● ●● ●● ●●●●● ●● ●●● ●●●● ● ● ●●● ● ●● ●●● ● ●●●●●● ● ● ●● ●●● ●●●● ● ● ●●● ● ● ● ●●●● ● ●●●●● ●● ● ● ●● ● ●●●● ●●● ● ●● ●● ●● ● ●●●●● ●●●●● ●● ● ●● ●● ●●●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●●●●●●● ● ●● ● ●●●● ●● ● ● ● ● ● ●● ● ● ● ●●●●● ● ●●● ● ● ● ●● ● ● ●●●●●● ●● ● ●●●● ●● ● ● ●●● ●●●● ● ● ●●●●● ●● ●● ● ● ●●●●●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ●● ●● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ●● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● 0 25 50 75 100 0.00 0.25 0.50 0.75 1.00 1.25 time (s)Current(pA)
- 20. Generating Events • What do we expect events from a given sequence to look like? 20 GCTACGATT Sample Current ●●● ● ●●●●● ●● ● ● ●● ● ●●● ● ●● ● ●●●●● ● ● ● ●●●● ● ●● ●● ●●●●● ● ● ●●●● ● ●● ●● ● ● ● ●●● ●●● ●●●● ●● ● ● ●●●● ● ● ● ● ●●● ●● ●●●●●●● ● ● ● ● ● ●●● ● ● ●●●●●●●● ● ● ● ●●●● ● ●●● ●●● ● ●● ●●● ●● ●● ●●● ● ●● ●●●●●●●● ● ●●●● ● ●●●●●● ●●●● ●●● ● ●●●● ● ● ●● ● ● ●●●●●●● ●●● ●●● ●●● ● ●●● ●● ●● ● ● ●●● ●● ●●● ●●●●●●● ● ● ● ●●●● ● ● ●●●●● ● ●●●● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●●● ●● ●● ● ● ●● ● ● ●●●●● ● ● ●● ●●●● ● ● ●●● ● ● ●● ●● ●● ● ●● ● ●●●●●●●●●●● ●●● ● ●● ● ●●● ● ●●● ● ● ●●●●●●● ● ●● ● ●● ●●●●● ● ● ●● ● ● ●● ●● ● ● ●● ● ●●● ●●● ● ● ● ● ● ● ● ● ● ●●● ● ●●●●●●●●●●●●●● ● ● ●●●●●● ●● ●● ● ●● ●●● ●●● ●●● ● ●● ● ● ●● ●● ●●●●● ●● ●●● ●●●● ● ● ●●● ● ●● ●●● ● ●●●●●● ● ● ●● ●●● ●●●● ● ● ●●● ● ● ● ●●●● ● ●●●●● ●● ● ● ●● ● ●●●● ●●● ● ●● ●● ●● ● ●●●●● ●●●●● ●● ● ●● ●● ●●●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●●●●●●● ● ●● ● ●●●● ●● ● ● ● ● ● ●● ● ● ● ●●●●● ● ●●● ● ● ● ●● ● ● ●●●●●● ●● ● ●●●● ●● ● ● ●●● ●●●● ● ● ●●●●● ●● ●● ● ● ●●●●●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ●● ●● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ●● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ●●● ● ● ●●● ●●● ● ●●● ● ● ● ●●● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ●● 0 25 50 75 100 0.00 0.25 0.50 0.75 1.00 1.25 time (s)Current(pA)
- 21. Generating Events • What do we expect events from a given sequence to look like? 21 CTACGATT Sample Current ●●● ● ●●●●● ●● ● ● ●● ● ●●● ● ●● ● ●●●●● ● ● ● ●●●● ● ●● ●● ●●●●● ● ● ●●●● ● ●● ●● ● ● ● ●●● ●●● ●●●● ●● ● ● ●●●● ● ● ● ● ●●● ●● ●●●●●●● ● ● ● ● ● ●●● ● ● ●●●●●●●● ● ● ● ●●●● ● ●●● ●●● ● ●● ●●● ●● ●● ●●● ● ●● ●●●●●●●● ● ●●●● ● ●●●●●● ●●●● ●●● ● ●●●● ● ● ●● ● ● ●●●●●●● ●●● ●●● ●●● ● ●●● ●● ●● ● ● ●●● ●● ●●● ●●●●●●● ● ● ● ●●●● ● ● ●●●●● ● ●●●● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●●● ●● ●● ● ● ●● ● ● ●●●●● ● ● ●● ●●●● ● ● ●●● ● ● ●● ●● ●● ● ●● ● ●●●●●●●●●●● ●●● ● ●● ● ●●● ● ●●● ● ● ●●●●●●● ● ●● ● ●● ●●●●● ● ● ●● ● ● ●● ●● ● ● ●● ● ●●● ●●● ● ● ● ● ● ● ● ● ● ●●● ● ●●●●●●●●●●●●●● ● ● ●●●●●● ●● ●● ● ●● ●●● ●●● ●●● ● ●● ● ● ●● ●● ●●●●● ●● ●●● ●●●● ● ● ●●● ● ●● ●●● ● ●●●●●● ● ● ●● ●●● ●●●● ● ● ●●● ● ● ● ●●●● ● ●●●●● ●● ● ● ●● ● ●●●● ●●● ● ●● ●● ●● ● ●●●●● ●●●●● ●● ● ●● ●● ●●●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●●●●●●● ● ●● ● ●●●● ●● ● ● ● ● ● ●● ● ● ● ●●●●● ● ●●● ● ● ● ●● ● ● ●●●●●● ●● ● ●●●● ●● ● ● ●●● ●●●● ● ● ●●●●● ●● ●● ● ● ●●●●●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ●● ●● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ●● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ●●● ● ● ●●● ●●● ● ●●● ● ● ● ●●● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ●●●● ●●● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ●●●●●● ● ● ● ● ●● ●●●● ● ●●● ●● ● ● ● ● ●●● ●● ● ● ● ●●● ● ● ● ●● ●● ● ● ● ● ●● ● ●● ●●●● ●●● ● ● ● ● ●● ●● ●●● ● ● ●● ●●● ● ● ● ●●● ●●●●● ● ● ● ● ● ● 0 25 50 75 100 0.00 0.25 0.50 0.75 1.00 1.25 time (s)Current(pA)
- 22. Event Detection 22 ●●● ● ●●●●● ●● ● ● ●● ● ●●● ● ●● ● ●●●●● ● ● ● ●●●● ● ●● ●● ●●●●● ● ● ●●●● ● ●● ●● ● ● ● ●●● ●●● ●●●● ●● ● ● ●●●● ● ● ● ● ●●● ●● ●●●●●●● ● ● ● ● ● ●●● ● ● ●●●●●●●● ● ● ● ●●●● ● ●●● ●●● ● ●● ●●● ●● ●● ●●● ● ●● ●●●●●●●● ● ●●●● ● ●●●●●● ●●●● ●●● ● ●●●● ● ● ●● ● ● ●●●●●●● ●●● ●●● ●●● ● ●●● ●● ●● ● ● ●●● ●● ●●● ●●●●●●● ● ● ● ●●●● ● ● ●●●●● ● ●●●● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●●● ●● ●● ● ● ●● ● ● ●●●●● ● ● ●● ●●●● ● ● ●●● ● ● ●● ●● ●● ● ●● ● ●●●●●●●●●●● ●●● ● ●● ● ●●● ● ●●● ● ● ●●●●●●● ● ●● ● ●● ●●●●● ● ● ●● ● ● ●● ●● ● ● ●● ● ●●● ●●● ● ● ● ● ● ● ● ● ● ●●● ● ●●●●●●●●●●●●●● ● ● ●●●●●● ●● ●● ● ●● ●●● ●●● ●●● ● ●● ● ● ●● ●● ●●●●● ●● ●●● ●●●● ● ● ●●● ● ●● ●●● ● ●●●●●● ● ● ●● ●●● ●●●● ● ● ●●● ● ● ● ●●●● ● ●●●●● ●● ● ● ●● ● ●●●● ●●● ● ●● ●● ●● ● ●●●●● ●●●●● ●● ● ●● ●● ●●●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●●●●●●● ● ●● ● ●●●● ●● ● ● ● ● ● ●● ● ● ● ●●●●● ● ●●● ● ● ● ●● ● ● ●●●●●● ●● ● ●●●● ●● ● ● ●●● ●●●● ● ● ●●●●● ●● ●● ● ● ●●●●●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ●● ●● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ●● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ●●● ● ● ●●● ●●● ● ●●● ● ● ● ●●● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ●●●● ●●● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ●●●●●● ● ● ● ● ●● ●●●● ● ●●● ●● ● ● ● ● ●●● ●● ● ● ● ●●● ● ● ● ●● ●● ● ● ● ● ●● ● ●● ●●●● ●●● ● ● ● ● ●● ●● ●●● ● ● ●● ●●● ● ● ● ●●● ●●●●● ● ● ● ● ● ● 0 25 50 75 100 0.00 0.25 0.50 0.75 1.00 1.25 time (s)Current(pA) Event mean current (pA) current stdv duration (s) 1 60.3 0.7 0.521 2 40.6 1.0 0.112 3 52.2 2.0 0.356 4 54.1 1.2 0.291 5 49.5 1.5 0.141
- 23. A simple model • What is the probability of observing events E given a sequence S? • Assuming for the moment there are no missing or extra events: 23 P(e1, e2, ..., en|s1, s2, ..., sn, ⇥) = nY i=1 P(ei|si, µsi , si ) P(ei|k, µk, k) = N(µk, 2 k)
- 24. Complications 24 ● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ●● ● ●●●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●●● ● ●● ●● ● ● ●● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ●● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ●● ● ● ●●● ● ● ●●●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ●●● ● ● ● ● ● 30 40 50 60 70 0.0 0.2 0.4 0.6 time (s) Current(pA)
- 25. Complications 25 ● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ●● ● ●●●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●●● ● ●● ●● ● ● ●● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ●● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ●● ● ● ●●● ● ● ●●●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ●●● ● ● ● ● ● 30 40 50 60 70 0.0 0.2 0.4 0.6 time (s) Current(pA) Is this an event ?
- 26. Complications 26 ● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ●● ● ●●●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●●● ● ●● ●● ● ● ●● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ●● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ●● ● ● ●●● ● ● ●●●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ●●● ● ● ● ● ● 30 40 50 60 70 0.0 0.2 0.4 0.6 time (s) Current(pA) One event or two ?
- 27. Nanopore HMM • must consider: • over segmentation • under segmentation • missed short events • HMM: • M states: match event to 5-mers • E states: extra obs. of an event • K states: no event obs. for 5-mer 27 P(D|S) P(⇡, e1, e2, ..., en|S, ⇥) = nY i=1 P(ei|⇡i, µsi , si )P(⇡i|⇡i 1, S) P(e1, e2, ..., en|S, ⇥) = X ⇡ P(⇡, e1, e2, ..., en|S, ⇥)
- 28. Transition Probabilities • Probability of not observing an event is a function of absolute difference between (expected) current 28 ● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ●● ● ●●●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●●● ● ●● ●● ● ● ●● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ●● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ●● ● ● ●●● ● ● ●●●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ●●● ● ● ● ● ● 30 40 50 60 70 0.0 0.2 0.4 0.6 time (s) Current(pA)
- 29. Transition Probabilities • Probability of not observing an event is a function of absolute difference between (expected) current 29 ● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ●● ● ●●●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●●● ● ●● ●● ● ● ●● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ●● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ●● ● ● ●●● ● ● ●●●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ●●● ● ● ● ● ● 30 40 50 60 70 0.0 0.2 0.4 0.6 time (s) Current(pA)
- 30. Transition Probabilities 30 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.1 0.2 0.3 0.4 0.5 0 5 10 15 20 absolute difference (pA) SkipProbability
- 31. Assembly Accuracy 31 0 5000 10000 0 5000 10000 5 mer count in reference 5mercountindraftassembly 0 3000 6000 9000 12000 TTTTT AAAAA TTTTG CAAAA CTTTT AAAAG CCCCC GGGGG kmer count draft reference y 12000 A C B D 0 5000 0 5000 10000 5 mer count in reference 5mercou 0 5000 10000 0 5000 10000 5 mer count in reference 5mercountinpolishedassembly C D Draft: 98.5% accuracy Polished: 99.5% accuracy
- 32. Assembly Accuracy 32 0 3000 6000 9000 12000 TTTTT AAAAA TTTTG CAAAA CTTTT AAAAG CCCCC GGGGG kmer count draft reference 9000 12000 B D 0 0 5000 10000 5 mer count in reference 5mer 0 3000 TTTTT AAAAA TTTTG CAAAA CTTTT AAAAG CCCCC GGGGG kmer 0 5000 10000 0 5000 10000 5 mer count in reference 5mercountinpolishedassembly 0 3000 6000 9000 12000 TTTTT AAAAA TTTTG CAAAA CTTTT AAAAG CCCCC GGGGG kmer count polished reference C D
- 33. Aligning Events to a Reference • HMM can also align events to a reference genome ! ! ! ! ! • Read about it here: • http://simpsonlab.github.io/2015/04/08/eventalign/ 33
- 34. Planned Improvements • Model dwell duration to better call homopolymers ! ! ! • SNP calling/genotyping ! ! ! • Improve scalability to handle larger genomes • Use signal data during error correction 34 CTAAAAAAAAAAAAGTACA P(gi|D) = P(D|gi)P(gi) P(D)
- 35. Acknowledgements & Code • Collaborators: • Nick Loman, Josh Quick (Birmingham) • Jonathan Dursi (OICR) ! ! • Code: • github.com/jts/nanocorrect (error correction) • github.com/jts/nanopolish (signal-level algorithms) • github.com/jts/nanopore-paper-analysis (reproduce our paper) 35