De novo assembly, a
multi-technology approach:
Illumina, PacBio, and OpGen
PhD. Francesco Vezzi
Senior Bioinformatician, NGI-Stockholm
Both Stockholm and Uppsala nodes
Illumina HiSeq 2000/2500 16
Illumina MiSeq 3
Life Technologies SOLiD 5500xl 4
Life Technologies SOLiD 5500wildfire 2
Life Technologies Ion Torrent 2
Life Technologies Ion Proton 6
Life Technologies Sanger ABI3730 2
Pacific Biosciences RSII 1
Argus Whole Genome Mapping System 1
One of 3 best-equipped sequencing sites in Europe
In this talk
Illumina (Stockholm):
• 100/150 bp paired reads (low error rate)
• 900/200 Gbp in 6/2 day(s)
PacBio (Uppsala):
• 8.5 Kbp reads (max 30 Kbp, high error rate)
• 375 Mbp (1 SMRT Cell) in 10 hours
OpGen Argus System (Stockholm):
• ~300 Kbp maps
• 10 Gbp in ~1 day
Optical Maps
• Restriction Map
◦ Representation of the cut sites on a
given DNA molecule to provide spatial
information of genetic loci
• An enzyme is selected and used
to cut the molecules. This
provides a 2D representation of
the molecule structure
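A restriction map can be sketched in silico as follows. This is a minimal illustration, not the Argus software: it treats the molecule as linear, places each cut at the start of the recognition site (a simplification), and uses the BamHI recognition site GGATCC only as an example enzyme. The function name `restriction_map` and the toy sequence are hypothetical.

```python
def restriction_map(sequence, site):
    """Return (cut_positions, fragment_lengths) for a linear molecule.

    Simplification: each cut is placed at the first base of the
    recognition site rather than at the enzyme's real cut offset.
    """
    sequence = sequence.upper()
    site = site.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)
    # Fragment lengths are the distances between consecutive cuts,
    # with both ends of the molecule included as boundaries.
    bounds = [0] + cuts + [len(sequence)]
    fragments = [b - a for a, b in zip(bounds, bounds[1:])]
    return cuts, fragments


# Toy molecule (hypothetical) with two GGATCC sites.
toy = "AAAGGATCCTTTTTGGATCCAA"
cuts, fragments = restriction_map(toy, "GGATCC")
print(cuts, fragments)
```

The ordered list of fragment lengths is the information an optical map carries for one molecule; the sequence itself is not read.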
Optical Maps: workflow
Step                                   Time
DNA extraction directly from culture   3-8h
Quality control of extracted material  1h
Prepare a chip                         1.5h
Run Argus System                       1h
Data assembly                          2-8h
Closing genomes with Optical Maps
De novo reconstructs parts
missing in the reference strain
Correctly assembles long tandem
repeats
De Novo assembly
(Illumina, PacBio)
Set of un-ordered and
not oriented contigs
Optical Map
Contigs
Case Study: Combining all the technologies
~15 Mbp genome sequenced at High Coverage with:
• Illumina HiSeq:
• 500X PE libraries (180bp and 650bp insert)
• 150X MP library (3Kbp)
• 150X MP library (7Kbp)
• PacBio
• 50/60X with reads longer than 2Kbp
• OpGen
• 3 chips (only one worked really well)
• 300X coverage
• Average map length 320Kbp
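The coverage figures above translate into raw data volumes through coverage = total bases / genome size. The sketch below works through that arithmetic, assuming a genome of exactly 15 Mbp and the per-SMRT-cell yield of 375 Mbp quoted earlier in the talk; the helper names `bases_needed` and `units_needed` are hypothetical, and real yields vary from run to run.

```python
GENOME_SIZE_BP = 15_000_000  # ~15 Mbp genome from the case study


def bases_needed(coverage, genome_size=GENOME_SIZE_BP):
    """Total bases required to reach the given fold coverage."""
    return coverage * genome_size


def units_needed(coverage, unit_yield_bp, genome_size=GENOME_SIZE_BP):
    """Number of units (SMRT cells, maps, ...) needed, rounded up."""
    total = bases_needed(coverage, genome_size)
    return -(-total // unit_yield_bp)  # integer ceiling division


# 500X Illumina paired-end data
print(bases_needed(500))               # 7_500_000_000 bp = 7.5 Gbp
# 50X PacBio at 375 Mbp per SMRT cell
print(units_needed(50, 375_000_000))   # 2 SMRT cells
# 300X optical-map coverage at an average map length of 320 Kbp
print(units_needed(300, 320_000))      # 14063 maps
```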
Assembly Strategy
https://github.com/vezzi/de_novo_scilife
Semi-automated pipeline for de novo assembly:
• Global configuration file: tools and system configuration
• Sample configuration file: samples description
3 modules:
1. QC-module (Illumina only):
• Adaptor removal, kmer-analysis, fastqc, (insert size estimation)
2. Assemble-module (Illumina only):
• Runs specified assemblers and outputs executed commands
3. Validation-module:
• FRCbam, coverage analysis, GC-analysis, (N50)
I NEED USERS/FEEDBACK/CONTRIBUTIONS
QC-Module
Kmer analysis:
• Samples complexity
• Error rate
• Heterozygosity
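A k-mer analysis starts from a k-mer count table and its histogram (the k-mer spectrum). The sketch below is a toy version, not the tool used in the pipeline: it does not merge reverse-complement k-mers and keeps everything in memory, so it only suits small inputs. The function names and example reads are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer in the reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


reads = ["ACGTACGT", "ACGTACGA"]  # toy reads
counts = kmer_counts(reads, 4)
spectrum = kmer_spectrum(counts)
print(sorted(spectrum.items()))
```

In a real spectrum, the peak of k-mers seen only once or twice is usually dominated by sequencing errors, the main peak tracks coverage, and a second peak at about half the main depth is a common sign of heterozygosity.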
[Figure: Insert Size Histogram for All_Reads in file lib_3000.bam. x-axis: Insert Size; y-axis: Count; series: FR, RF, TANDEM]
FASTQC
Adaptor removal
Alignment (partial assembly)
Assemble-Module
Illumina only:
• SOAPdenovo
• MaSuRCA
• Allpaths-LG
PacBio only:
• HGAP
• CABOG
Hybrid:
• PB-jelly (HAH)
>5000
Assembler    #scaffolds  totalLength  maxContigLength  N50     N80     percentageNs
Allpaths-LG  227         14513103     596012           139364  57619   15%
MASURCA      163         18549484     1188669          526519  282507  2%
HGAP         290         14399273     763592           142483  37117   0%
PB-Jelly     179         14718213     747750           195225  85127   13%
• Trial-and-error process
• Automated pipeline developed in order to
streamline these analyses
• MaSuRCA is, surprisingly, the “best” assembler
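N50 and N80 are contiguity statistics: Nx is the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the assembly. A minimal sketch follows, with a hypothetical function name `nx` and toy lengths; it uses integer arithmetic to avoid floating-point rounding at the threshold.

```python
def nx(lengths, percent):
    """Return the Nx statistic (e.g. percent=50 for N50) of a list of lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that pushes the running sum
        # to at least percent% of the total assembly length.
        if running * 100 >= percent * total:
            return length
    return 0  # empty input


toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # toy scaffold lengths, total 300
print(nx(toy_lengths, 50))  # 70
print(nx(toy_lengths, 80))  # 40
```

A higher N50 only says the pieces are longer, not that they are correct, which is why the Validation-Module looks at other evidence as well.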
MaSuRCA HGAP PB-Jelly (HAH)
Validation-Module
FRCbam
Validation-Module
PacBio-only assembly is
clearly outperforming
the others
Optical Maps
PacBio produces the best assembly; however, 290 contigs are produced.
Optical Maps made it possible to obtain
the 2D representation of the 7
chromosomes.
N.B. chromosome number was
one of the biological questions of
this project!!!
But much more can be done!!!
Incredible tool to finish (or almost finish) genomes
Assemblies                      % contigs placed  Total size of placed contigs  % size placed contigs  % genome covered
PacBio+OpGen                    94.12             11578995                      97%                    77.05
Allpaths+OpGen                  71.88             10692027                      84%                    52.88
Allpaths+MaSuRCA+OpGen          80.65             27506424                      92%                    69.64
Allpaths+PacBio+OpGen           82.32             22271022                      91%                    83.05
MaSuRCA+PacBio+OpGen            94.44             28393392                      98%                    83.79
Allpaths+MaSuRCA+PacBio+OpGen   85.42             39085419                      94%                    87.39
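The first three placement columns can be derived from a list of contig lengths and the set of contigs that were anchored to the optical map. The sketch below shows one plausible way to compute them; it is an assumption about how such figures are obtained, not the author's script, and the contig names and lengths are hypothetical.

```python
def placement_stats(contig_lengths, placed):
    """Summarise how much of an assembly was placed on the optical map.

    contig_lengths: dict mapping contig name -> length in bp
    placed: set of contig names anchored to the map
    """
    total_size = sum(contig_lengths.values())
    placed_size = sum(contig_lengths[name] for name in placed)
    return {
        "pct_contigs_placed": 100 * len(placed) / len(contig_lengths),
        "placed_size": placed_size,
        "pct_size_placed": 100 * placed_size / total_size,
    }


# Toy assembly: the smallest contig is left unplaced.
toy_contigs = {"c1": 500_000, "c2": 300_000, "c3": 150_000, "c4": 50_000}
stats = placement_stats(toy_contigs, {"c1", "c2", "c3"})
print(stats)
```

In this toy case 75% of the contigs carry 95% of the bases, which illustrates how "% size placed" can exceed "% contigs placed" when the unplaced contigs are the short ones.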
Combining all the technologies
Conclusions – Take home message
Attempt to automate de novo assembly process:
• https://github.com/vezzi/de_novo_scilife
• Not 100% automated
Illumina, PacBio, Hybrid assemblies:
• PacBio alone seems to produce the best assemblies
• Hybrid assembly does not seem able to correct merged-assembly
problems
Mixing technologies is always a good idea:
• Possibility to compensate technological biases
• Makes it possible to produce better assemblies
Thanks
https://github.com/vezzi/de_novo_scilife

SeRC: de novo assembly workshop. Francesco Vezzi
