Genome Assembly from Three Sequencing Platforms: MinION, MiSeq and PacBio
Francesca Giordano, Louise Aigrain, Michael Quail, James Bonfield,
Robert Davies, David Jackson, Thomas Keane, Zemin Ning and Richard Durbin
Yeast strains: S288c, SK1, N44, CBS
S288c Reference: 12 million bases, 17 chromosomes
Sequenced at the Wellcome Trust Sanger Institute
MinION Reads
Strain  Bases (Mb)  Reads  Mean Length  Longest Read  Coverage  Identity  Number of Runs  Flowcell
S288c   323         32770  9843         56477         27X       93%       3               R7
N44     130         15654  8292         37837         11X       N/A       4               R7
CBS     109         12211  8952         46481         9X        N/A       4               R7
SK1     51          5938   8589         36791         4X        N/A       2               R7
MiSeq reads, for each strain: ~120X coverage, Identity ~93%
[Figures: ONT read lengths; PacBio read lengths]
PacBio Reads
Strain  Bases (Mb)  Reads   Mean Length  Longest Read  Coverage  Identity  Number of Runs
S288c   1463        239408  6109         35196         120X      93%       3
N44     1794        371025  4834         33906         148X      N/A       3
CBS     1639        324414  5052         34173         134X      N/A       2
SK1     3019        697989  4325         34080         248X      N/A       5
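The coverage figures quoted for the MinION data are consistent with the usual definition of mean depth: total sequenced bases divided by genome size. The short sketch below illustrates that arithmetic. It assumes the rounded genome size of 12 Mb given on the slide rather than the exact reference length, and the variable and function names are illustrative only.

```python
# Approximate genome size taken from the slide ("12 million bases").
# This is an assumption for illustration, not the exact reference length.
GENOME_SIZE_MB = 12.0

# Total MinION bases per strain in Mb, copied from the MinION reads table.
minion_bases_mb = {"S288c": 323, "N44": 130, "CBS": 109, "SK1": 51}


def coverage(bases_mb, genome_mb=GENOME_SIZE_MB):
    """Mean depth of coverage: sequenced bases divided by genome size."""
    return bases_mb / genome_mb


# Round to the nearest integer, as the slide reports whole-number coverage.
minion_coverage = {s: round(coverage(b)) for s, b in minion_bases_mb.items()}

for strain, cov in minion_coverage.items():
    print(f"{strain}: ~{cov}X")
```

With the rounded 12 Mb genome size this gives 27X, 11X, 9X and 4X, matching the MinION table. The PacBio values come out within a few percent of the listed ones under the same assumption.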
Assemblers and other analysis tools
De Novo Assembly with Long Reads (ONT & PacBio)
Canu https://github.com/marbl/canu
Falcon https://github.com/PacificBiosciences/falcon
MiniAsm https://github.com/lh3/miniasm
PBcR http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR
Nanopolish https://github.com/jts/nanopolish
Analysis I, S288c: De novo Assembly with Long Reads only -- ONT or PacBio
Analysis II, all strains: De novo Assembly with MiSeq reads, Scaffolding using Long Reads -- ONT or PacBio
More tools used:
Poretools (https://github.com/arq5x/poretools),
dnadiff (https://github.com/garviz/MUMmer)
De Novo Assembly with MiSeq Reads
SOAP denovo http://soap.genomics.org.cn/soapdenovo.html
Fermi https://github.com/lh3/fermi
SPAdes http://bioinf.spbau.ru/spades
Masurca http://www.genome.umd.edu/masurca.html
Scaffolding pipelines
Hybrid-SPAdes http://bioinf.spbau.ru/spades
SMIS https://sourceforge.net/projects/phusion2/files/smis/
Read Length distribution
Analysis I: S288c, de novo assembly with Long Reads only
Nanopore vs. PacBio Platforms
S288c Reads  Bases (Mb)  Reads  Mean Read Length  Longest Read  Coverage  Identity
Nanopore     323         32770  9843              56477         27X       93%
PacBio       328         34248  9584              32921         27X       92%
Platform  Assembler  Bases (Mb)  Contigs  N50 (Kb)  Reference coverage  Mismatches (Kb)  Indels (Kb)  Identity  CPU Time (h)
Nanopore  Falcon     1.15        42       522       96.9%               7                279          97.5%     14
Nanopore  Canu       1.19        27       769       98.8%               2                229          98.0%     172
Nanopore  PBcR       1.35        101      528       99.2%               3                233          98.0%     496
Nanopore  MiniAsm    1.18        33       540       95.2%               431              820          89.2%     0.07
PacBio    Falcon     1.18        70       272       95.6%               4                295          99.7%     8
PacBio    Canu       1.24        32       616       99.9%               1                10           99.9%     25
PacBio    PBcR       1.25        37       751       99.2%               2                10           99.9%     52
PacBio    MiniAsm    1.25        40       467       95.8%               232              1060         89.4%     0.05
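N50 is the contiguity metric reported for every assembly here. As a reminder of the standard definition, the sketch below computes it from a list of contig lengths. The contig lengths are made-up toy values, not data from the slides, and the slides do not say which tool produced the reported N50 figures.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units) for illustration only.
toy_contigs = [800, 600, 400, 200, 100]
print(n50(toy_contigs))
```

For the toy input the total is 2100, and the two longest contigs (800 + 600 = 1400) are the first to cover at least half of it, so the N50 is 600.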
Additional tests performed
Analysis I: S288c, de novo assembly with Long Reads only
De novo assembly with varied coverage: 10X, 20X, 27X for both ONT and PacBio data
De novo assembly using 2D reads from Pass folder versus using 2D reads from Pass+Fail folders
Polish assemblies with Nanopolish to improve accuracy
De novo assembly with the full PacBio data samples: > 120X per strain
Analysis II: Scaffolding draft assemblies from MiSeq data
De novo assembly with MiSeq reads and compare results of scaffolding by Hybrid-SPAdes and by the SMIS pipeline
De novo assembly using 2D reads from Pass folder versus using 2D reads from Pass+Fail folders
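The slides do not describe how the reduced-coverage subsets (10X, 20X) were drawn. One common approach is to pick reads at random until their combined length reaches the target depth. The sketch below shows that idea under stated assumptions: the function name, its parameters and the toy read lengths are all hypothetical and are not taken from the authors' pipeline.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, target_cov, seed=0):
    """Randomly pick reads until their total length reaches the target depth.

    read_lengths: list of read lengths in bases
    genome_size:  genome size in bases
    target_cov:   desired mean coverage (e.g. 10 for 10X)
    Returns (indices of chosen reads, total bases chosen).
    If the input holds fewer bases than requested, all reads are returned.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    order = list(range(len(read_lengths)))
    rng.shuffle(order)

    target_bases = target_cov * genome_size
    chosen, total = [], 0
    for i in order:
        if total >= target_bases:
            break
        chosen.append(i)
        total += read_lengths[i]
    return chosen, total


# Toy example: 100 reads of 1,000 bases over a 10,000-base "genome".
toy_reads = [1000] * 100
picked, picked_bases = subsample_to_coverage(toy_reads, 10_000, 5)
print(len(picked), picked_bases)
```

Drawing whole reads at random keeps the read length distribution of the subset close to that of the full data set, which matters when the comparison of interest is assembly contiguity at equal depth.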
Thank you!
Acknowledgments:
Louise Aigrain, Michael Quail, James Bonfield, Robert Davies, David Jackson, Thomas Keane, Zemin Ning, Richard Durbin and Gianni Liti
Backup slides
Run details
Date        Flowcell  2D 'Pass'          2D 'Pass' + 2D 'Fail'  1D 'Pass' + 1D 'Fail'
                      Bases (Mb)  Reads  Bases (Mb)  Reads      Bases (Mb)  Reads
S288c
18/02/2016  R7        53          4722   95          11631      252         32290
09/03/2016  R7        18          1937   30          3455       71          8535
11/03/2016  R7        111         11854  355         34586      676         87457
11/03/2016  R7        141         14257  252         27971      635         75977
N44
14/03/2015  R7        6           3101   14          4373       107         40519
20/04/2015  R7        1           154    1           233        12          1745
24/05/2015  R7        5           576    10          1166       39          6507
22/10/2015  R7        118         9026   186         22739      425         57893
CBS
24/05/2015  R7        6           700    9           1125       29          4497
24/05/2015  R7        15          1509   24          2725       80          11085
09/09/2015  R7        32          4066   42          5763       99          14531
22/10/2015  R7        56          5868   73          8159       169         21782
SK1
18/02/2016  R7        27          3166   143         19017      391         53137
10/03/2016  R7        24          2772   69          8990       176         24291
Run with the modified scripts by John Tyson (https://wiki.nanoporetech.com/pages/viewpage.action?pageId=35523684)
Analysis I: S288c, de novo assembly with Long Reads only
[Figure panels: Nanopore Reads, 27X Coverage: Falcon, Canu, PBcR, MiniAsm; PacBio Reads, 27X Coverage: Falcon, Canu, PBcR, MiniAsm]
Analysis I: S288c, de novo assembly with Long Reads only
Canu assembly from Nanopore reads, polished with Nanopolish
Assembler        Bases (Mb)  Contigs  N50 (Kb)  Mismatches (Kb)  Indels (Kb)  Identity  CPU Time (h)
Canu             1.19        27       769       1.8              229          98.0%     172
Canu+Nanopolish  1.21        27       783       1.6              49.3         99.5%     873
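The effect of polishing can be summarised as a relative change in indels and in listed CPU time. The sketch below only redoes arithmetic on the two table rows. The dictionary keys and the helper name are illustrative, and the slide does not say whether the 873 h figure includes the original Canu run.

```python
# Values copied from the Canu and Canu+Nanopolish rows of the table.
before = {"indels_kb": 229.0, "identity_pct": 98.0, "cpu_h": 172.0}
after = {"indels_kb": 49.3, "identity_pct": 99.5, "cpu_h": 873.0}


def relative_change(old, new):
    """Signed fractional change from old to new."""
    return (new - old) / old


# Fraction of indel bases removed by polishing.
indel_reduction = -relative_change(before["indels_kb"], after["indels_kb"])
# Ratio of the listed CPU times.
cpu_factor = after["cpu_h"] / before["cpu_h"]
# Change in identity, in percentage points.
identity_gain = after["identity_pct"] - before["identity_pct"]

print(f"indels reduced by {indel_reduction:.1%}")
print(f"listed CPU time ratio: {cpu_factor:.1f}x")
print(f"identity gain: {identity_gain:.1f} percentage points")
```

On these numbers polishing removes roughly 78% of the indel bases and raises identity by 1.5 percentage points, while the listed CPU time is about five times that of the Canu row.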
Analysis I: S288c, de novo assembly with Long Reads only: PacBio Reads
Strain  Coverage  Assembler  Bases (Mb)  Contigs  N50 (bp)  Identity
S288c   120X      Falcon     11.88       24       804938    99.99%
S288c   120X      Canu       12.33       21       783441    99.99%
S288c   120X      PBcR       12.19       17       809591    99.99%
S288c   120X      MiniAsm    12.47       29       668702    89.20%
N44     148X      Falcon     11.71       22       788656    N/A
N44     148X      Canu       11.94       20       800107    N/A
N44     148X      PBcR       11.91       22       799764    N/A
N44     148X      MiniAsm    12.06       17       829559    N/A
SK1     326X      Falcon     10.44       104      272457    N/A
SK1     326X      Canu       12.32       22       798830    N/A
SK1     326X      PBcR       12.45       33       829345    N/A
SK1     326X      MiniAsm    12.34       22       839990    N/A
CBS     134X      Falcon     11.78       24       1475018   N/A
CBS     134X      Canu       12.14       19       829217    N/A
CBS     134X      PBcR       12.24       33       815875    N/A
CBS     134X      MiniAsm    12.37       20       1525406   N/A
Analysis II: De novo Assembly with MiSeq reads, Scaffolding using Long Reads
Nanopore Reads
Hybrid-SPAdes versus SPAdes + SMIS
Strain  Coverage  Pipeline       Bases     Scaffolds  N50     Identity
S288c   27X       SPAdes + SMIS  11727537  97         494844  99.98%
S288c   27X       Hybrid-SPAdes  11770749  66         444120  99.97%
N44     11X       SPAdes + SMIS  11695790  66         653753  N/A
N44     11X       Hybrid-SPAdes  11703568  58         348695  N/A
CBS     9X        SPAdes + SMIS  11695531  73         502933  N/A
CBS     9X        Hybrid-SPAdes  11709420  45         548274  N/A
SK1     4X        SPAdes + SMIS  11671735  147        343333  N/A
SK1     4X        Hybrid-SPAdes  11684067  112        226622  N/A
Best Assemblies: SOAP denovo + SMIS
Strain  Coverage  Pipeline    Bases     Scaffolds  N50     Identity
S288c   27X       SOAP k=81   11744509  83         818222  99.97%
N44     11X       SOAP k=101  11663034  48         663565  N/A
CBS     9X        SOAP k=101  11659507  69         793208  N/A
SK1     4X        SOAP k=81   11623050  114        452917  N/A

Genome assembly from three sequencing platforms: minION, MiSeq and PacBio

  • 1. Genome Assembly from Three Sequencing Platforms: 
 MinION, MiSeq and PacBio Francesca Giordano, Louise Aigrain, Michael Quail, James Bonfield, Robert Davies, David Jackson, Thomas Keane, Zemin Ning and Richard Durbin
  • 2. 2 Yeast strains: S288c, SK1, N44, CBS S288c Reference: 12 Million bases,17 chromosomes Sequenced at the Wellcome Trust Sanger Institute MinION Reads Strain Bases (Mb) Reads Mean Length Longest Read Coverag e Identity Numbe r of Runs Flowcell S288c 323 32770 9843 56477 27X 93% 3 R7 N44 130 15654 8292 37837 11X N/A 4 R7 CBS 109 12211 8952 46481 9X N/A 4 R7 SK1 51 5938 8589 36791 4X N/A 2 R7 MiSeq reads, for each strain: ~120X coverage, Identity ~93% ONT Read Lengths PacBio Read Lengths PacBio Reads Strain Bases (Mb) Reads Mean Length Longest Read Coverag e Identity Number of Runs S288c 1463 239408 6109 35196 120X 93% 3 N44 1794 371025 4834 33906 148X N/A 3 CBS 1639 324414 5052 34173 134X N/A 2 SK1 3019 697989 4325 34080 248X N/A 5
  • 3. Assemblers and other analysis tools
       Analysis I, S288c: De novo Assembly with Long Reads only (ONT or PacBio)
       Analysis II, all strains: De novo Assembly with MiSeq reads, Scaffolding using Long Reads (ONT or PacBio)

       De Novo Assembly with Long Reads (ONT & PacBio)
         • Canu https://github.com/marbl/canu
         • Falcon https://github.com/PacificBiosciences/falcon
         • MiniAsm https://github.com/lh3/miniasm
         • PBcR http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR
         • Nanopolish https://github.com/jts/nanopolish

       De Novo Assembly with MiSeq Reads
         • SOAP denovo http://soap.genomics.org.cn/soapdenovo.html
         • Fermi https://github.com/lh3/fermi
         • SPAdes http://bioinf.spbau.ru/spades
         • Masurca http://www.genome.umd.edu/masurca.html

       Scaffolding pipelines
         • Hybrid-SPAdes http://bioinf.spbau.ru/spades
         • SMIS https://sourceforge.net/projects/phusion2/files/smis/

       More tools used
         • Poretools https://github.com/arq5x/poretools
         • dnadiff https://github.com/garviz/MUMmer
  • 4. Read Length distribution
       Analysis I: S288c, de novo assembly with Long Reads only

       Nanopore vs. PacBio Platforms
       S288c Reads  Bases (Mb)  Reads  Mean Read Length  Longest Read  Coverage  Identity
       Nanopore     323         32770  9843              56477         27X       93%
       PacBio       328         34248  9584              32921         27X       92%
  • 5. Analysis I: S288c, de novo assembly with Long Reads only

       Nanopore vs. PacBio Platforms
       S288c Reads  Bases (Mb)  Reads  Mean Read Length  Longest Read  Coverage  Identity
       Nanopore     323         32770  9843              56477         27X       93%
       PacBio       328         34248  9584              32921         27X       92%

       Platform  Assembler  Bases (Mb)  Contigs  N50 (Kb)  Reference coverage  Mismatches (Kb)  Indels (Kb)  Identity  CPU Time (h)
       Nanopore  Falcon     1.15        42       522       96.9%               7                279          97.5%     14
                 Canu       1.19        27       769       98.8%               2                229          98.0%     172
                 PBcR       1.35        101      528       99.2%               3                233          98.0%     496
                 MiniAsm    1.18        33       540       95.2%               431              820          89.2%     0.07
       PacBio    Falcon     1.18        70       272       95.6%               4                295          99.7%     8
                 Canu       1.24        32       616       99.9%               1                10           99.9%     25
                 PBcR       1.25        37       751       99.2%               2                10           99.9%     52
                 MiniAsm    1.25        40       467       95.8%               232              1060         89.4%     0.05
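The assembly tables report N50 as their contiguity measure. As a reminder of what that number means, here is a minimal sketch of the standard definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative and not taken from the slides, and the assemblers' own reporting tools may handle edge cases differently.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


# Toy example (made-up lengths): total is 32, so half is 16,
# which is reached after the two longest contigs (8 + 8).
toy_contigs = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(toy_contigs))
```

A higher N50 at a similar total assembly size indicates fewer, longer contigs, which is why it is shown next to the contig count.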
  • 6. Additional tests performed
       Analysis I: S288c, de novo assembly with Long Reads only
         • De novo assembly with varied coverage: 10X, 20X, 27X for both ONT and PacBio data
         • De novo assembly using 2D reads from Pass folder versus using 2D reads from Pass+Fail folders
         • Polish assemblies with Nanopolish to improve accuracy
         • De novo assembly with the full PacBio data samples: >120X per strain
       Analysis II: Scaffolding draft assemblies from MiSeq data
         • De novo assembly with MiSeq reads, comparing the results of scaffolding by Hybrid-SPAdes and by the SMIS pipeline
         • De novo assembly using 2D reads from Pass folder versus using 2D reads from Pass+Fail folders
  • 7. Thank you!
       Acknowledgments: Louise Aigrain, Michael Quail, James Bonfield, Robert Davies, David Jackson, Thomas Keane, Zemin Ning, Richard Durbin and Gianni Liti
  • 8. Run details
                                     2D 'Pass'              2D 'Pass' + 2D 'Fail'  1D 'Pass' + 1D 'Fail'
       Strain  Date        Flowcell  Bases (Mb)  Reads      Bases (Mb)  Reads      Bases (Mb)  Reads
       S288c   18/02/2016  R7        53          4722       95          11631      252         32290
               09/03/2016  R7        18          1937       30          3455       71          8535
               11/03/2016  R7        111         11854      355         34586      676         87457
               11/03/2016  R7        141         14257      252         27971      635         75977
       N44     14/03/2015  R7        6           3101       14          4373       107         40519
               20/04/2015  R7        1           154        1           233        12          1745
               24/05/2015  R7        5           576        10          1166       39          6507
               22/10/2015  R7        118         9026       186         22739      425         57893
       CBS     24/05/2015  R7        6           700        9           1125       29          4497
               24/05/2015  R7        15          1509       24          2725       80          11085
               09/09/2015  R7        32          4066       42          5763       99          14531
               22/10/2015  R7        56          5868       73          8159       169         21782
       SK1     18/02/2016  R7        27          3166       143         19017      391         53137
               10/03/2016  R7        24          2772       69          8990       176         24291

       Run with the modified scripts by John Tyson (https://wiki.nanoporetech.com/pages/viewpage.action?pageId=35523684)
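The per-run yields can be cross-checked against the MinION totals on slide 2. The sketch below sums the 2D 'Pass' columns for S288c and SK1 and compares them with the slide-2 figures (323 Mb and 32770 reads for S288c, 51 Mb and 5938 reads for SK1). It assumes the slide-2 totals are plain sums over these runs, which the slides do not state explicitly; the names `runs_2d_pass` and `totals` are illustrative. Only these two strains are included.

```python
# (2D 'Pass' bases in Mb, 2D 'Pass' reads) for each run, copied from the
# run details table. Only S288c and SK1 are included in this sketch.
runs_2d_pass = {
    "S288c": [(53, 4722), (18, 1937), (111, 11854), (141, 14257)],
    "SK1": [(27, 3166), (24, 2772)],
}


def totals(runs):
    """Sum bases (Mb) and read counts over a list of (bases, reads) runs."""
    total_bases = sum(bases for bases, _ in runs)
    total_reads = sum(reads for _, reads in runs)
    return total_bases, total_reads


strain_totals = {strain: totals(runs) for strain, runs in runs_2d_pass.items()}

for strain, (bases_mb, reads) in strain_totals.items():
    print(f"{strain}: {bases_mb} Mb in {reads} reads")
```

For both strains the sums equal the totals listed in the MinION Reads table, which suggests that table was built from the 2D 'Pass' reads.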
  • 9. Analysis I: S288c, de novo assembly with Long Reads only
       [Figure panels: Nanopore Reads, 27X Coverage (Canu, PBcR, MiniAsm, Falcon); PacBio Reads, 27X Coverage (Canu, PBcR, MiniAsm, Falcon)]
  • 10. Analysis I: S288c, de novo assembly with Long Reads only
       Canu assembly from Nanopore reads, polished

       Assembler          Bases (Mb)  Contigs  N50 (Kb)  Mismatches (Kb)  Indels (Kb)  Identity  CPU Time (h)
       Canu               1.19        27       769       1.8              229          98.0%     172
       Canu + Nanopolish  1.21        27       783       1.6              49.3         99.5%     873
  • 11. Analysis I: S288c, de novo assembly with Long Reads only: PacBio Reads

       Strain      Assembler  Bases (Mb)  Contigs  N50      Identity
       S288c 120X  Falcon     11.88       24       804938   99.99%
                   Canu       12.33       21       783441   99.99%
                   PBcR       12.19       17       809591   99.99%
                   MiniAsm    12.47       29       668702   89.20%
       N44 148X    Falcon     11.71       22       788656   N/A
                   Canu       11.94       20       800107   N/A
                   PBcR       11.91       22       799764   N/A
                   MiniAsm    12.06       17       829559   N/A
       SK1 326X    Falcon     10.44       104      272457   N/A
                   Canu       12.32       22       798830   N/A
                   PBcR       12.45       33       829345   N/A
                   MiniAsm    12.34       22       839990   N/A
       CBS 134X    Falcon     11.78       24       1475018  N/A
                   Canu       12.14       19       829217   N/A
                   PBcR       12.24       33       815875   N/A
                   MiniAsm    12.37       20       1525406  N/A
  • 12. Analysis II: De novo Assembly with MiSeq reads, Scaffolding using Long Reads (Nanopore Reads)

       Hybrid-SPAdes versus SPAdes + SMIS
       Strain     Pipeline       Bases     Scaffolds  N50     Identity
       S288c 27X  SPAdes + SMIS  11727537  97         494844  99.98%
                  Hybrid-SPAdes  11770749  66         444120  99.97%
       N44 11X    SPAdes + SMIS  11695790  66         653753  N/A
                  Hybrid-SPAdes  11703568  58         348695  N/A
       CBS 9X     SPAdes + SMIS  11695531  73         502933  N/A
                  Hybrid-SPAdes  11709420  45         548274  N/A
       SK1 4X     SPAdes + SMIS  11671735  147        343333  N/A
                  Hybrid-SPAdes  11684067  112        226622  N/A

       Best Assemblies: SOAP denovo + SMIS
       Strain     Pipeline       Bases     Scaffolds  N50     Identity
       S288c 27X  SOAP k=81      11744509  83         818222  99.97%
       N44 11X    SOAP k=101     11663034  48         663565  N/A
       CBS 9X     SOAP k=101     11659507  69         793208  N/A
       SK1 4X     SOAP k=81      11623050  114        452917  N/A
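The "Best Assemblies" label on slide 12 can be checked against the scaffold N50 values. The sketch below picks, for each strain, the pipeline with the largest scaffold N50 among the three shown. Ranking by N50 alone is an assumption made here for illustration; the slide does not say which criterion was used, and the names `scaffold_n50` and `best_by_n50` are not from the slides.

```python
# Scaffold N50 per strain and pipeline, copied from the slide 12 tables.
scaffold_n50 = {
    "S288c": {"SPAdes + SMIS": 494844, "Hybrid-SPAdes": 444120, "SOAP denovo + SMIS": 818222},
    "N44": {"SPAdes + SMIS": 653753, "Hybrid-SPAdes": 348695, "SOAP denovo + SMIS": 663565},
    "CBS": {"SPAdes + SMIS": 502933, "Hybrid-SPAdes": 548274, "SOAP denovo + SMIS": 793208},
    "SK1": {"SPAdes + SMIS": 343333, "Hybrid-SPAdes": 226622, "SOAP denovo + SMIS": 452917},
}

# Assumption: "best" means largest scaffold N50; other criteria are ignored.
best_by_n50 = {
    strain: max(pipelines, key=pipelines.get)
    for strain, pipelines in scaffold_n50.items()
}

for strain, pipeline in best_by_n50.items():
    print(f"{strain}: {pipeline} (N50 = {scaffold_n50[strain][pipeline]})")
```

Under this criterion SOAP denovo + SMIS comes out on top for all four strains, although Hybrid-SPAdes produces fewer scaffolds in every case, so a ranking by scaffold count would differ.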