The best of both worlds
Combining PacBio with short read technology
  for improved de novo genome assembly

          Lex Nederbragt, NSC and CEES
           lex.nederbragt@bio.uio.no
This talk
Why does everybody want longer reads?


        … for genome assemblies
What is a genome assembly


    Hierarchical structure

reads

 contigs

   scaffolds
Sequence data

Reads

original DNA

 fragments

 Sequenced ends




               http://www.cbcb.umd.edu/research/assembly_primer.shtml
Contigs

Building contigs

Aligned reads
ACGCGATTCAGGTTACCACG
  GCGATTCAGGTTACCACGCG
    GATTCAGGTTACCACGCGTA
      TTCAGGTTACCACGCGTAGC
        CAGGTTACCACGCGTAGCGC
          GGTTACCACGCGTAGCGCAT
            TTACCACGCGTAGCGCATTA
              ACCACGCGTAGCGCATTACA
                CACGCGTAGCGCATTACACA
                  CGCGTAGCGCATTACACAGA
                    CGTAGCGCATTACACAGATT
                      TAGCGCATTACACAGATTAG
Consensus contig
ACGCGATTCAGGTTACCACGCGTAGCGCATTACACAGATTAG
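The overlap picture on this slide can be reproduced with a small sketch. It assumes the reads are error-free and already listed in left-to-right order; a real assembler has to discover that order and tolerate errors. The names `longest_overlap` and `merge_reads` are illustrative and not taken from any assembler.

```python
def longest_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(reads: list[str]) -> str:
    """Merge reads, given in left-to-right order, into a single contig."""
    contig = reads[0]
    for read in reads[1:]:
        k = longest_overlap(contig, read)
        contig += read[k:]  # append only the part that extends the contig
    return contig


# The twelve reads shown on the slide
reads = [
    "ACGCGATTCAGGTTACCACG",
    "GCGATTCAGGTTACCACGCG",
    "GATTCAGGTTACCACGCGTA",
    "TTCAGGTTACCACGCGTAGC",
    "CAGGTTACCACGCGTAGCGC",
    "GGTTACCACGCGTAGCGCAT",
    "TTACCACGCGTAGCGCATTA",
    "ACCACGCGTAGCGCATTACA",
    "CACGCGTAGCGCATTACACA",
    "CGCGTAGCGCATTACACAGA",
    "CGTAGCGCATTACACAGATT",
    "TAGCGCATTACACAGATTAG",
]
contig = merge_reads(reads)
print(contig)
```

Each read overlaps the previous one by 18 bases, so every merge extends the contig by two bases.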
Contigs

Building contigs

Repeat copy 1          Repeat copy 2

Contig orientation?
Contig order?

Collapsed repeat consensus
                     http://www.cbcb.umd.edu/research/assembly_primer.shtml
Mate pairs

Other read type

Repeat copy 1          Repeat copy 2

(much) longer fragments
mate pair reads
Scaffolds

Ordered, oriented contigs




    mate pairs
contigs



                           gap size estimate
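How a gap size estimate can follow from mate pairs is sketched below. The assumptions are that the library fragment length (insert size) is known, and that for each pair linking two contigs we know how many bases of the fragment fall inside each contig. The function name and all numbers are hypothetical.

```python
from statistics import median


def estimate_gap(insert_size: int, links: list[tuple[int, int]]) -> float:
    """Estimate the gap between two contigs from mate pairs that link them.

    Each link is (bases_in_left_contig, bases_in_right_contig): the part of
    the fragment that lies inside each contig. Whatever remains of the
    fragment length must lie in the gap.
    """
    estimates = [insert_size - in_left - in_right for in_left, in_right in links]
    return median(estimates)


# Hypothetical 3 kbp mate-pair library with three linking pairs
links = [(1200, 1500), (900, 1850), (2000, 650)]
gap = estimate_gap(3000, links)
print(gap)
```

The median is used because real insert sizes vary around the nominal library size, so single pairs give noisy estimates.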
What is a genome assembly


    Hierarchical structure

reads
    Aligned reads
    ACGCGATTCAGGTTACCACG
      GCGATTCAGGTTACCACGCG
        GATTCAGGTTACCACGCGTA
          TTCAGGTTACCACGCGTAGC
            CAGGTTACCACGCGTAGCGC
              GGTTACCACGCGTAGCGCAT
                TTACCACGCGTAGCGCATTA
                  ACCACGCGTAGCGCATTACA
                    CACGCGTAGCGCATTACACA
                      CGCGTAGCGCATTACACAGA
                        CGTAGCGCATTACACAGATT
                          TAGCGCATTACACAGATTAG

 contigs
    Consensus contig
    ACGCGATTCAGGTTACCACGCGTAGCGCATTACACAGATTAG

   scaffolds
Genome assembly




So, what’s so hard about it?
1) Repeats

Repeat copy 1          Repeat copy 2

Repeats break up contigs

Collapsed repeat consensus
                     http://www.cbcb.umd.edu/research/assembly_primer.shtml
2) Heterozygosity



Differences between homologous chromosomes




http://commons.wikimedia.org/wiki/File:Chromosome_1.svg
2) Heterozygosity




             Polymorphic contig 2

Contig 1                            Contig 4
             Polymorphic contig 3
2) Heterozygosity




http://www.astraean.com/borderwars/wp-content/uploads/2012/04/heterozygoats.jpg
and many other sites
3) Many programs to choose from




Zhang et al. (2011) doi:10.1371/journal.pone.0017915.g001
Assembly: challenges
         Repeat copy 1                               Repeat copy 2




                         Knowing how to use the programs



Heterozygosity
                              Polymorphic contig 2

          Contig 1                                            Contig 4
                              Polymorphic contig 3
So, why does everybody want longer reads?




http://www.autobizz.com.my/forum/forum/General-Chat/944-The-worlds-longest-car.html
Longer reads?
Repeat copy 1                                 Repeat copy 2




    Long reads can span repeats and heterozygous regions




                       Polymorphic contig 2

 Contig 1                                              Contig 4
                       Polymorphic contig 3
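Why read length matters for repeats can be put in numbers. In this sketch a read "spans" a repeat when it covers the whole repeat plus some unique flanking sequence on both sides. The 50 bp anchor and the example lengths are assumptions chosen for illustration.

```python
def spans(read_start: int, read_len: int,
          repeat_start: int, repeat_len: int, anchor: int = 50) -> bool:
    """True if the read covers the repeat plus `anchor` unique bases on each side."""
    read_end = read_start + read_len
    repeat_end = repeat_start + repeat_len
    return read_start <= repeat_start - anchor and read_end >= repeat_end + anchor


def spanning_starts(read_len: int, repeat_len: int, anchor: int = 50) -> int:
    """Number of start positions from which a read of this length spans the repeat."""
    return max(0, read_len - repeat_len - 2 * anchor + 1)


# Hypothetical 1 kbp repeat: a 100 bp read can never span it,
# while a 3 kbp read can from many start positions.
print(spanning_starts(100, 1000))
print(spanning_starts(3000, 1000))
```

A read shorter than the repeat plus both anchors has no spanning position at all, which is why the repeat ends up collapsed or breaks the contig.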
PacBio to the rescue?
High-throughput sequencing

                           Library preparation

SMRTbell template

Standard Sequencing
  Large insert sizes
  Generates one pass on each molecule sequenced
  Single pass

Circular Consensus Sequencing
  Small insert sizes
  Generates multiple passes on each molecule sequenced
  Multiple passes
  Continued generations of reads
High-throughput sequencing

      Raw read length
High-throughput sequencing
Raw reads and subreads

SMRTbell template

Standard Sequencing
  Large insert sizes
  Generates one pass on each molecule sequenced
  Single pass

‘Subreads’

Circular Consensus Sequencing
  Small insert sizes
  Generates multiple passes on each molecule sequenced
  Multiple passes
PacBio: uses
Long reads, low quality

SMRTbell template

Standard Sequencing
  Large insert sizes
  Generates one pass on each molecule sequenced
  Single pass
  85-87% accuracy

Useful for assembly?

Circular Consensus Sequencing
  Small insert sizes
  Generates multiple passes on each molecule sequenced
Solutions for assembly
Solutions for assembly (1)




   Designed by Pacific Biosciences




http://www.clker.com/clipart-4245.html
Solutions for assembly (2)
   Broad Institute




Need a special recipe
  for sequencing
Solutions for assembly (3)

                 PacBioToCA
        Error correct with short reads




Celera assembler


   http://schatzlab.cshl.edu/presentations/2012-01-17.PAG.SMRTassembly.pdf
PacBioToCA




             Koren et al, 2012
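The idea of correcting a long, noisy read with accurate short reads can be illustrated with a toy example. This is not the PacBioToCA algorithm: it assumes substitution errors only (PacBio errors are mostly insertions and deletions), and it assumes the short reads are already placed on the long read at known offsets. All sequences and names are made up for illustration.

```python
from collections import Counter


def correct_long_read(long_read: str,
                      short_alignments: list[tuple[int, str]]) -> str:
    """Majority vote per position over short reads placed at known offsets.

    Positions without short-read coverage keep the long-read base.
    """
    columns = [Counter() for _ in long_read]
    for offset, short in short_alignments:
        for i, base in enumerate(short):
            if 0 <= offset + i < len(long_read):
                columns[offset + i][base] += 1
    corrected = []
    for original, votes in zip(long_read, columns):
        corrected.append(votes.most_common(1)[0][0] if votes else original)
    return "".join(corrected)


truth = "ACGCGATTCAGGTTACCACG"
noisy = "ACGCTATTCAGCTTACCACG"  # two substitution errors, at positions 4 and 11
short_alignments = [
    (0, "ACGCGATT"),
    (2, "GCGATTCA"),
    (6, "TTCAGGTT"),
    (8, "CAGGTTAC"),
    (12, "TTACCACG"),
]
corrected = correct_long_read(noisy, short_alignments)
print(corrected == truth)
```

In the real setting the hard parts are mapping short reads onto an error-rich long read and handling indels, both of which this sketch skips.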
Shameless self-promotion

flxlexblog.wordpress.com
Shameless self-promotion




            @lexnederbragt
The Atlantic cod genome project
First draft




Fragmented assembly
    - short contigs
    - many gap bases
                                http://en.wikipedia.org
First draft



6467 scaffolds




                   35% gap bases
The causes




Short Tandem Repeats (>20% of gaps)
The causes


           Heterozygosity?



            Polymorphic contig 2

Contig 1                           Contig 4
            Polymorphic contig 3
The goal



 23 pseudochromosomes




       Longer contigs




                        Below 5% gap bases



PacBio to the rescue?
The approach
Libraries

SMRTbell template

Standard Sequencing
  Large insert sizes
  Generates one pass on each molecule sequenced

Aim for looooong insert sizes

Circular Consensus Sequencing
  Small insert sizes
  Generates multiple passes on each molecule sequenced
The approach

Sequencing

SMRTbell template

Standard Sequencing
  Large insert sizes
  Generates one pass on each molecule sequenced
  Single pass

Sequence with 90 minute movies
10 x coverage in reads of at least 3000 bp

Circular Consensus Sequencing
  Small insert sizes
  Generates multiple passes on each molecule sequenced




                No, we don’t throw this away…
The approach

Error-correction
PacBio results
[Figure: Relative throughput at different minimum length cutoffs.
 x-axis: length cutoff, longest subread (0 to 15 kbp);
 y-axis: percentage of total sequence (fraction of bases at minimum length);
 series: 10kb lib 1, 10kb lib 2, 4kb lib]

Large library insert size important!
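The quantity plotted on this slide can be computed as follows: for each length cutoff, the share of all sequenced bases that sits in subreads at least that long. The subread lengths below are hypothetical and are not the cod data.

```python
def percent_bases_at_least(lengths: list[int], cutoff: int) -> float:
    """Percentage of all bases found in reads of length >= cutoff."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    kept = sum(n for n in lengths if n >= cutoff)
    return 100.0 * kept / total


# Hypothetical longest-subread lengths (bp) from one library
lengths = [1000, 2000, 3000, 4000, 10000]
for cutoff in (0, 3000, 5000, 10000, 15000):
    print(cutoff, percent_bases_at_least(lengths, cutoff))
```

A library with larger inserts keeps a higher percentage at every cutoff, so its curve sits above that of a shorter-insert library.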
PacBio results

64 SMRT Cells
3.2 Gbp in raw reads of at least 3 kb
3.8 x coverage

2.2 Gbp in longest subreads
Largest 15 kbp

SMRTbell template

Standard Sequencing
  Large insert sizes
  Generates one pass on each molecule sequenced

Circular Consensus Sequencing
  Small insert sizes
  Generates multiple passes on each molecule sequenced
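The coverage figure on this slide is simple arithmetic: sequenced bases divided by genome size. The sketch below assumes the slide's 3.2 figure counts bases and that 3.8 x was computed this way. The genome size is not stated on the slide, so it is back-calculated here rather than taken from the source.

```python
def implied_genome_size(bases: float, coverage: float) -> float:
    """Genome size implied by a base count and the coverage it gives."""
    return bases / coverage


def bases_needed(genome_size: float, coverage: float) -> float:
    """Bases required to reach a target coverage."""
    return genome_size * coverage


raw_bases = 3.2e9                                 # raw reads of at least 3 kb, from the slide
genome = implied_genome_size(raw_bases, 3.8)      # back-calculated, an assumption
target = bases_needed(genome, 10)                 # the 10 x goal stated in the approach
print(f"implied genome size: {genome / 1e6:.0f} Mbp")
print(f"bases needed for 10 x: {target / 1e9:.1f} Gbp")
```

Under these assumptions, reaching 10 x in reads of at least 3 kb would take roughly 2.6 times the data reported here.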
PacBio results

Mapping to the cod genome
      11.4 kbp subread




       10.6 kbp subread




      10.9 kbp subread
Example 1


ACACAC repeat




232 bp Gap




TGTGTG repeat
Example 1
Scaffold               ...ACACAC     TGTGTG...

PacBio reads
               Unplaced contig
Example 2


TGTGTG repeat




     344 bp Gap
Example 2

Scaffold       ...TGTGTG

PacBio reads

                     Heterozygosity?
Example 3

Scaffold


   PacBio reads
                  300 bp misassembly?
Error-correction




                          Work In Progress
http://openclipart.org/
Outlook




  Will PacBio solve our problems?
Outlook




  Or
Outlook



                Polymorphic contig 2

Contig 1                               Contig 4
                Polymorphic contig 3




   Will we find the heterozygous regions?
Outlook




 http://www.pasteur.fr/recherche/unites/Bbi/
 en.wikipedia.org
 and Martin Malmstrøm

More Related Content

What's hot

Protein computational analysis
Protein computational analysisProtein computational analysis
Protein computational analysis
Kinza Irshad
 
Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)
Sebastian Schmeier
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
Brendan Gregg
 
New generation sequencing equipments
New generation sequencing equipmentsNew generation sequencing equipments
New generation sequencing equipmentsKalaivani P
 
15 molecular markers techniques
15 molecular markers techniques15 molecular markers techniques
15 molecular markers techniques
AVINASH KUSHWAHA
 
Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishingNikolay Vyahhi
 
Physical maps and their use in annotations
Physical maps and their use in annotationsPhysical maps and their use in annotations
Physical maps and their use in annotations
Sheetal Mehla
 
Linkage analysis
Linkage analysisLinkage analysis
Linkage analysis
UshaYadav24
 
Molecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breedingMolecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breeding
FOODCROPS
 
Data retreival system
Data retreival systemData retreival system
Data retreival system
Shikha Thakur
 
Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysis
tuxette
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
David Cook
 
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
Edge AI and Vision Alliance
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
Uzma Jabeen
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
Kew Sama
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
Swathi Prabakar
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmics
Denys Haryachyy
 
Long non coding RNA or lncRNA
Long non coding RNA or lncRNALong non coding RNA or lncRNA
Long non coding RNA or lncRNA
MOHIT GOSWAMI
 

What's hot (20)

Protein computational analysis
Protein computational analysisProtein computational analysis
Protein computational analysis
 
Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
New generation sequencing equipments
New generation sequencing equipmentsNew generation sequencing equipments
New generation sequencing equipments
 
15 molecular markers techniques
15 molecular markers techniques15 molecular markers techniques
15 molecular markers techniques
 
Small rna
Small rnaSmall rna
Small rna
 
Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishing
 
Physical maps and their use in annotations
Physical maps and their use in annotationsPhysical maps and their use in annotations
Physical maps and their use in annotations
 
Linkage analysis
Linkage analysisLinkage analysis
Linkage analysis
 
Molecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breedingMolecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breeding
 
Data retreival system
Data retreival systemData retreival system
Data retreival system
 
Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysis
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
 
RNA-Seq
RNA-SeqRNA-Seq
RNA-Seq
 
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
The uni prot knowledgebase
The uni prot knowledgebaseThe uni prot knowledgebase
The uni prot knowledgebase
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmics
 
Long non coding RNA or lncRNA
Long non coding RNA or lncRNALong non coding RNA or lncRNA
Long non coding RNA or lncRNA
 

Viewers also liked

IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent DataIonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
Adrian Baez-Ortega
 
Jan2016 pac bio giab
Jan2016 pac bio giabJan2016 pac bio giab
Jan2016 pac bio giab
GenomeInABottle
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applications
AGRF_Ltd
 
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015
Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015
Torsten Seemann
 
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Lex Nederbragt
 
Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryJan Aerts
 
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
Anne Deslattes Mays
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_course
hansjansen9999
 
Improving and validating the Atlantic Cod genome assembly using PacBio
Improving and validating the Atlantic Cod genome assembly using PacBioImproving and validating the Atlantic Cod genome assembly using PacBio
Improving and validating the Atlantic Cod genome assembly using PacBio
Lex Nederbragt
 
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
Torsten Seemann
 
Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...
Keith Bradnam
 
De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015Torsten Seemann
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
AGRF_Ltd
 
Next-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plotNext-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plot
Li Shen
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
VHIR Vall d’Hebron Institut de Recerca
 
[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomicsMads Albertsen
 
Semiconductor Sequencing Applications for Plant Sciences
Semiconductor Sequencing Applications for Plant SciencesSemiconductor Sequencing Applications for Plant Sciences
Semiconductor Sequencing Applications for Plant Sciences
Thermo Fisher Scientific
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challenges
Scott Edmunds
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
QIAGEN
 

Viewers also liked (20)

IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent DataIonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
 
Jan2016 pac bio giab
Jan2016 pac bio giabJan2016 pac bio giab
Jan2016 pac bio giab
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applications
 
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015
Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015
 
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
 
Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discovery
 
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_course
 
Improving and validating the Atlantic Cod genome assembly using PacBio
Improving and validating the Atlantic Cod genome assembly using PacBioImproving and validating the Atlantic Cod genome assembly using PacBio
Improving and validating the Atlantic Cod genome assembly using PacBio
 
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
 
Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...
 
20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop
 
De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
Next-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plotNext-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plot
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics
 
Semiconductor Sequencing Applications for Plant Sciences
Semiconductor Sequencing Applications for Plant SciencesSemiconductor Sequencing Applications for Plant Sciences
Semiconductor Sequencing Applications for Plant Sciences
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challenges
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
 

Combining PacBio with short read technology for improved de novo genome assembly

  • 1. The best of both worlds Combining PacBio with short read technology for improved de novo genome assembly Lex Nederbragt, NSC and CEES lex.nederbragt@bio.uio.no
  • 3. Why does everybody want longer reads? … for genome assemblies
  • 4. What is a genome assembly Hierarchical structure reads contigs scaffolds
  • 5. Sequence data Reads reads contigs scaffolds original DNA fragments original DNA fragments Sequenced ends http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 6. Contigs Building contigs reads contigs scaffolds ACGCGATTCAGGTTACCACG GCGATTCAGGTTACCACGCG GATTCAGGTTACCACGCGTA TTCAGGTTACCACGCGTAGC CAGGTTACCACGCGTAGCGC Aligned reads GGTTACCACGCGTAGCGCAT TTACCACGCGTAGCGCATTA ACCACGCGTAGCGCATTACA CACGCGTAGCGCATTACACA CGCGTAGCGCATTACACAGA CGTAGCGCATTACACAGATT TAGCGCATTACACAGATTAG Consensus contig ACGCGATTCAGGTTACCACGCGTAGCGCATTACACAGATTAG
  • 7. Contigs Building contigs reads contigs scaffolds Repeat copy 1 Repeat copy 2 Contig orientation? Contig order? Collapsed repeat consensus http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 8. Mate pairs Other read type reads contigs scaffolds Repeat copy 1 Repeat copy 2 (much) longer fragments mate pair reads
  • 9. Scaffolds Ordered, oriented contigs reads contigs scaffolds mate pairs contigs gap size estimate
  • 10. What is a genome assembly Hierarchical structure reads ACGCGATTCAGGTTACCACG GCGATTCAGGTTACCACGCG GATTCAGGTTACCACGCGTA TTCAGGTTACCACGCGTAGC CAGGTTACCACGCGTAGCGC Aligned reads GGTTACCACGCGTAGCGCAT TTACCACGCGTAGCGCATTA ACCACGCGTAGCGCATTACA CACGCGTAGCGCATTACACA CGCGTAGCGCATTACACAGA contigs CGTAGCGCATTACACAGATT TAGCGCATTACACAGATTAG Consensus contig ACGCGATTCAGGTTACCACGCGTAGCGCATTACACAGATTAG scaffolds
  • 11. Genome assembly So, what’s so hard about it?
  • 12. 1) Repeats reads contigs scaffolds Repeat copy 1 Repeat copy 2 Repeats break up contigs Collapsed repeat consensus http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 13. 2) Heterozygosity Differences between sister chromosomes http://commons.wikimedia.org/wiki/File:Chromosome_1.svg
  • 14. 2) Heterozygosity Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3
  • 16. 3) Many programs to choose from Zhang et al. (2011) doi:10.1371/journal.pone.0017915.g001
  • 17. Assembly: challenges Repeat copy 1 Repeat copy 2 Knowing how to use the programs Heterozygosity Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3
  • 18. So, why does everybody want longer reads? http://www.autobizz.com.my/forum/forum/General-Chat/944-The-worlds-longest-car.html
  • 19. Longer reads? Repeat copy 1 Repeat copy 2 Long reads can span repeats and heterozygous regions Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3
  • 20. PacBio to the rescue?
  • 21. High-throughput sequencing Library preparation [Figure: SMRTBell template. Standard Sequencing: large insert sizes, single pass, generates one pass on each molecule sequenced. Circular Consensus Sequencing: small insert sizes, multiple passes, generates multiple passes on each molecule sequenced. Continued generations of reads]
  • 22. High-throughput sequencing Raw read length
  • 23. High-throughput sequencing Raw reads and subreads ‘Subreads’ [Figure: SMRTBell template. Standard Sequencing: large insert sizes, single pass, generates one pass on each molecule sequenced. Circular Consensus Sequencing: small insert sizes, multiple passes, generates multiple passes on each molecule sequenced]
  • 24. PacBio: uses Long reads → low quality 85-87% accuracy Useful for assembly? [Figure: SMRTBell template. Standard Sequencing: large insert sizes, single pass. Circular Consensus Sequencing: small insert sizes, multiple passes]
  • 26. Solutions for assembly (1) Designed by Pacific Biosciences http://www.clker.com/clipart-4245.html
  • 27. Solutions for assembly (2) Broad Institute Need a special recipe for sequencing
  • 28. Solutions for assembly (3) PacBioToCA Error correct with short reads Celera assembler http://schatzlab.cshl.edu/presentations/2012-01-17.PAG.SMRTassembly.pdf
  • 29. PacBioToCA Koren et al, 2012
  • 31. Shameless self-promotion @lexnederbragt
  • 32. The Atlantic cod genome project
  • 33. First draft Fragmented assembly - short contigs - many gap bases http://en.wikipedia.org
  • 34. First draft 6467 scaffolds 35% gap bases
  • 35. The causes Short Tandem Repeats (>20% of gaps)
  • 36. The causes Heterozygosity? Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3
  • 37. The goal 23 pseudochromosomes Longer contigs Below 5% gap bases PacBio to the rescue?
  • 38. The approach Libraries Aim for looooong insert sizes [Figure: SMRTBell template. Standard Sequencing: large insert sizes. Circular Consensus Sequencing: small insert sizes, multiple passes]
  • 39. The approach Sequencing Sequence with 90 minute movies 10 x coverage in reads of at least 3000 bp No, we don’t throw this away… [Figure: SMRTBell template. Standard Sequencing: large insert sizes, single pass. Circular Consensus Sequencing: small insert sizes, multiple passes]
  • 41. PacBio results Relative throughput at different minimum length cutoffs Large library insert size important! [Figure: fraction of bases at minimum length, as percentage of total sequence (0 to 100), against length cutoff of the longest subread (0, 3, 5, 10, 15 kbp), for 10kb lib 1, 10kb lib 2 and 4kb lib]
  • 42. PacBio results 64 SMRT Cells 3.2 Gigabytes in raw reads at least 3kb 3.8 x coverage 2.2 Gigabytes in longest subreads Largest 15 kbp [Figure: SMRTBell template. Standard Sequencing: large insert sizes. Circular Consensus Sequencing: small insert sizes, multiple passes]
  • 43. PacBio results Mapping to the cod genome 11.4 kbp subread 10.6 kbp subread 10.9 kbp subread
  • 44. Example 1 ACACAC repeat 232 bp Gap TGTGTG repeat
  • 47. Example 1 Scaffold ...ACACAC TGTGTG... PacBio reads Unplaced contig
  • 50. Example 2 Scaffold ...TGTGTG PacBio reads Heterozygosity?
  • 51. Example 3 Scaffold PacBio reads 300 bp misassembly?
  • 52. Error-correction Work In Progress http://openclipart.org/
  • 53. Outlook Will PacBio solve our problems?
  • 55. Outlook Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3 Will we find the heterozygous regions?
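
The contig-building step on slide 6 can be illustrated with a minimal Python sketch. It is not the method of any assembler named in the talk: it assumes error-free reads that are already in left-to-right order, and it simply extends the contig with each read's non-overlapping tail. The function names `overlap` and `merge_reads` and the `min_len` parameter are hypothetical, introduced only for this illustration; the reads and the expected consensus are copied from the slide.

```python
# Toy contig building from the reads on slide 6.
# Assumptions (not from the talk): reads are error-free and listed in
# left-to-right order; real assemblers use overlap or de Bruijn graphs.

READS = [
    "ACGCGATTCAGGTTACCACG",
    "GCGATTCAGGTTACCACGCG",
    "GATTCAGGTTACCACGCGTA",
    "TTCAGGTTACCACGCGTAGC",
    "CAGGTTACCACGCGTAGCGC",
    "GGTTACCACGCGTAGCGCAT",
    "TTACCACGCGTAGCGCATTA",
    "ACCACGCGTAGCGCATTACA",
    "CACGCGTAGCGCATTACACA",
    "CGCGTAGCGCATTACACAGA",
    "CGTAGCGCATTACACAGATT",
    "TAGCGCATTACACAGATTAG",
]

# Consensus contig as printed on the slide.
SLIDE_CONSENSUS = "ACGCGATTCAGGTTACCACGCGTAGCGCATTACACAGATTAG"


def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(reads: list[str], min_len: int = 3) -> str:
    """Extend a contig read by read, appending only the part past the overlap."""
    contig = reads[0]
    for read in reads[1:]:
        size = overlap(contig, read, min_len)
        if size == 0:
            raise ValueError(f"read {read!r} does not overlap the contig end")
        contig += read[size:]
    return contig


if __name__ == "__main__":
    contig = merge_reads(READS)
    print(contig)
    print("matches slide consensus:", contig == SLIDE_CONSENSUS)
```

Because each read is matched only against the end of the growing contig, two identical repeat copies longer than a read would look the same to this sketch, which is the collapsed-repeat problem described on slides 7 and 12.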