Next-Generation Sequencing of Microbial
Genomes and Metagenomes

Christine King
Farncombe Metagenomics Facility

Human Microbiome Journal Club
July 13, 2012
Overview
   Next-generation sequencing
     Applications

     Instruments

     Library prep and sequencing chemistry
     Sequence quality



   Project overview
     Microbial genomes
     Microbial communities
DNA Sequencing
                1st generation
                  Sanger chain
                   termination
                  Capillary
                   electrophoresis
                2nd generation (NGS)
                  High throughput,
                   “massively parallel”
                  Shorter reads
                  Sequencing-by-
                   synthesis
                3rd generation
                    Single molecule
Applications
   DNA sequencing
       De novo genomes
       Resequencing
           Shotgun (e.g. mutant
            strains)
           Amplicon (e.g. HLA,
            cancer)
           Sequence capture (e.g.
            exome)
       Metagenome
           Amplicon (e.g. 16S, COI,
            viral)
           Shotgun
       ChIP
   RNA sequencing
       Gene expression
       Gene annotation, splice
        variants
Instruments
Instrument      # of reads   Read length (bp)   Total output (Gb)   Cost per base   Run time   Technology
GS FLX          1M           450                0.5                 $$$$            ++         emPCR, SBS, light detection
GS FLX+         1M           650                0.6                 $$$$            ++         emPCR, SBS, light detection
GS Jr           100K         450                0.05                $$$$            ++         emPCR, SBS, light detection
GAIIx           640M         2x 150             90                  $$              +++        Bridge PCR, SBS, fluorophore
HiSeq 2000      6B           2x 100             600                 $               +++        Bridge PCR, SBS, fluorophore
MiSeq           12M          2x 150             2                   $$              ++         Bridge PCR, SBS, fluorophore
PacBio RS       >10K         >1000              0.01                $$$$            +          Single-molecule seq, fluorophore
SOLiD 5500xl    1.4B         75 + 35            155                 $               +++        emPCR, probe ligation, fluorophore
Ion PGM - 316   1M           >100               0.1                 $$$             +          emPCR, SBS, pH change
Ion PGM - 318   6M           >100               1                   $$              +          emPCR, SBS, pH change
Which instrument(s) to use?
     Read length vs number of reads
     Cost per base, per sample, per project (multiplexing?)
     Accuracy
     Run time, wait time
Application       Length   # Reads   Accuracy   Instruments            Considerations
De novo (small)   +++      ++        ++         MiSeq, 454, Ion        Mix lengths
De novo (large)   +++      +++       ++         HiSeq, 454, SOLiD      Mix lengths, MP
Re-seq (small)    ++       ++        ++         MiSeq, Ion             Multiplex?
Re-seq (large)    ++       +++       ++         HiSeq, SOLiD           Enrichment?
RNA-seq (count)   +        +++       +          Illumina, SOLiD, Ion   Ref? Size? Rare?
Library Preparation
   Goal: fragments of DNA, each end flanked by adaptor
    sequences

   Adaptors contain amplification- and sequencing primer
    binding sites; platform- and chemistry-specific

   Optional: sample-specific barcodes/indexes/MIDs/tags
    allow multiplexing during sequencing

   Library QC: quantity, size
Library Preparation
   Library types:
       Shotgun (DNA)
         May begin with ChIP
         May follow with sequence capture
     Mate pair (DNA)
     Amplicon (DNA)
     Total RNA
         May enrich for mRNA (poly-A enrichment, rRNA depletion)
         Convert to cDNA (then similar to DNA protocols)
       Small RNA
           RNA ligations, convert to cDNA after
Library Preparation: Shotgun
                   Fragmentation
                       Sonication
                       Nebulization
                       Enzymatic


                   End repair
                       3’ overhangs digested
                       5’ overhangs filled
                       5’ phosphate added
Library Preparation: Shotgun
                   Adapter ligation
                       T-overhangs
                       Forked structure controls
                        orientation

                   Library amplification
                       Few cycles
                       Enrich for correctly-adapted
                        fragments
                       Required to complete
                        adapter structure in some
                        protocols

                   Size selection
                       Gel excision, AMPure beads
                       Limit insert size as needed,
                        remove artifacts
Library Preparation: Amplicon
   Amplify region of       Primers contain
    interest using PCR       adapter sequences
Library Preparation: Mate Pair
   Begin with large
    fragments (e.g. 3kb,
    20kb)

   Circularize and
    fragment again
       Illumina: direct ligation
       454: Cre/Lox
        recombination

   Enrich for fragments
    containing the junction

   Proceed with shotgun
    library prep
Library Preparation: Mate Pair
   Why? Paired
    sequences are a known
    distance apart;
    improves genome
    assembly

   Note: 454 calls these
    “paired end libraries”,
    not to be confused with
    Illumina’s “paired end
    sequencing”!
Sequencing: Illumina
                   Cluster generation
                       Library fragments hybridize
                        to oligos on the flow cell
                       New strand synthesized,
                        original denatured,
                        removed
                       Free end binds to adjacent
                        oligos (bridge formation)
                       Complementary strand
                        synthesized, denatured
                        (both tethered to flow cell)
                       Repeat to form clonal
                        cluster
                       Cleave one oligo, denature
                        to leave ssDNA clusters
                   ~800K clusters/mm^2
Sequencing: Illumina
   Variety of workflows:
     Single- or paired-end reads
     0, 1, or 2 index reads
Sequencing: Illumina
   At each cycle, all 4 fluorescently-labeled
    nucleotides pass over the flow cell
   Each cluster incorporates one nt (terminator) per
    cycle
   Fluor is imaged, then cleaved
   De-block and repeat
Sequencing: Illumina
   Other terminology:
       cBot – accessory instrument that performs cluster
        generation
       Lanes – divisions (8) of HiSeq and GAIIx flow cells
       PhiX – bacteriophage with small, balanced genome; PhiX
        library spiked in with samples for QC
       Phasing/pre-phasing – nt incorporation falls behind or
        jumps ahead on a portion of strands in the cluster and
        contributes to noise
       Chastity filter – measures signal purity (after intensity
        corrections); if the background signal is high, cluster will be
        discarded
       BaseSpace – cloud computing site for processing MiSeq
        data

   File format: fastq
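
A minimal sketch of reading fastq records, for illustration only. It assumes four-line records (header, sequence, "+" separator, quality string) and Phred+33 quality encoding; older Illumina pipelines used a different offset, so check the encoding of your own files. The function name `read_fastq` is hypothetical.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read name, sequence, per-base Q scores) from a four-line fastq."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding, so Q = ASCII code - 33
        yield header[1:], seq, [ord(c) - 33 for c in qual]


example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for name, seq, quals in read_fastq(example):
    print(name, seq, quals)
```

Each quality character maps to one Q score for the base at the same position in the read.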
Sequencing: 454
   emPCR: clonal
    amplification of
    bead-bound library
    in microdroplets

   Library input
    amounts critical!
     One molecule per
      bead
     Titration procedure
Sequencing: 454
   Library capture:
    beads coated with
    complementary oligo
   Amplification:
    droplet contains
    PCR reagents and
    the other oligo
   Post-PCR: millions
    of identical
    fragments attached
    to the bead
Sequencing: 454
   Bead recovery: physical and chemical disruption
   Enrichment: capture successfully amplified beads
    using biotinylated primers + magnetic,
    streptavidin beads
Sequencing: 454
   Deposit bead layers
    onto PicoTiterPlate:
     Enzyme beads
     Enriched DNA beads
     More enzyme beads
     PPiase beads
Sequencing: 454
   Pyrosequencing

       4 nucleotides flow
        separately
       If a nt is incorporated, PPi is
        released and converted to light
       APS + PPi → ATP (sulfurylase)
       Luciferin + ATP → light +
        oxyluciferin (luciferase)
       Amount of light
        proportional to # nt
        incorporated
       Rinse and repeat with next
        nt
Sequencing: 454
                     Camera captures light
                      emitted from every well
                      during every nucleotide flow
Sequencing: 454
   Flowgram: representation of a sequence, based on the
    pattern of light emitted from a single well
Sequencing: 454
   Other terminology:
     Lib-L/Lib-A: adapter variants, “ligated” or “annealed”
     Titanium chemistry: ~450 bp reads on all instruments
     XL+ chemistry: ~700 bp reads on the FLX+ instrument
     Flow: one of the four nucleotides flows over the PTP
     Cycle: a set of four flows, in order
     Valley flow: if number of bases incorporated in a given
      read during that flow is uncertain, e.g. 1.5 units of light
      (background signal, homopolymers)

   File format: sff (standard flowgram format)
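
The flow, cycle, and valley-flow terms above can be illustrated with a short sketch that turns a flowgram into a base sequence. It assumes a flow order of T, A, C, G and already-normalized signal values (1.0 = one base incorporated); the function names and the 0.15 tolerance are hypothetical, and real base callers apply signal corrections that are not shown here.

```python
import math
from typing import List, Sequence


def flowgram_to_sequence(flow_values: Sequence[float], flow_order: str = "TACG") -> str:
    """Round each flow signal to a base count and emit that many bases."""
    bases = []
    for i, signal in enumerate(flow_values):
        nt = flow_order[i % len(flow_order)]    # one cycle = four flows, in order
        count = max(0, int(signal + 0.5))       # round to the nearest whole number
        bases.append(nt * count)
    return "".join(bases)


def valley_flows(flow_values: Sequence[float], tol: float = 0.15) -> List[int]:
    """Indices of flows whose signal sits near x.5, so the base count is uncertain."""
    uncertain = []
    for i, signal in enumerate(flow_values):
        frac = signal - math.floor(signal)
        if abs(frac - 0.5) < tol:
            uncertain.append(i)
    return uncertain


flows = [1.02, 0.05, 2.10, 0.98, 0.03, 1.48, 0.0, 3.05]
print(flowgram_to_sequence(flows))  # TCCGAGGG
print(valley_flows(flows))          # [5]
```

The 3.05 signal in the last flow is a homopolymer (GGG); longer homopolymers give proportionally noisier signals, which is why they are a common source of valley flows.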
Sequencing: Ion Torrent
   Procedures and
    chemistry similar to 454
   Instead of PPi, measure
    H+ release (pH change)
    via semiconductor chip
   No expensive camera or
    laser required, no
    modified nucleotides
Sequence Quality

Phred (Q) score   Probability of error (P)   Base call accuracy
10                1 in 10                    90%
20                1 in 100                   99%
30                1 in 1K                    99.9%
40                1 in 10K                   99.99%
50                1 in 100K                  99.999%

   Error probabilities determined using training sets,
    platform-specific biases
   Expressed as a quality value (QV or Q score) per base
   Similar to PHRED scores:
     Q = -10 log10(P)
     P = 10^(-Q/10)
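
As a quick check of the two formulas, the conversion between Q and P can be sketched as follows. The function names are hypothetical; the arithmetic is just the definition given on the slide.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40, 50):
    p = phred_to_error_prob(q)
    print(f"Q{q}: P = {p:g}, accuracy = {100 * (1 - p):g}%")
```

Each step of 10 in Q corresponds to a tenfold drop in error probability, which is the pattern shown in the table.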
Project 1: Microbial Genome
   Considerations:
     Reference genome?
     How much coverage do I want?
     How big is the genome?
     How much data do I need?
       bp needed = genome size x coverage
     Which instrument/chemistry configuration to use?
   Coverage
     Depth (number of times a particular base is
      “covered” by a read, e.g. 25X)
     Breadth (% of genome with at least 1X coverage)
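
A worked example of "bp needed = genome size x coverage", as a rough planning sketch. The 5 Mb genome is a hypothetical value chosen for illustration; the 25X depth, the 150 bp read length, and the 2 Gb run output are taken from figures quoted elsewhere in these slides. The estimate ignores reads lost to filtering, uneven pooling, and any PhiX spike-in.

```python
import math


def bases_needed(genome_size_bp: int, depth: float) -> float:
    """bp needed = genome size x coverage depth."""
    return genome_size_bp * depth


def reads_needed(genome_size_bp: int, depth: float, read_length_bp: int) -> int:
    """Number of reads of a given length that supply the required bases."""
    return math.ceil(bases_needed(genome_size_bp, depth) / read_length_bp)


def samples_per_run(run_output_bp: int, genome_size_bp: int, depth: float) -> int:
    """How many equally sized genomes fit in one run at the target depth."""
    return int(run_output_bp // bases_needed(genome_size_bp, depth))


genome = 5_000_000  # hypothetical 5 Mb bacterial genome
print(bases_needed(genome, 25))                    # 125000000
print(reads_needed(genome, 25, 150))               # 833334
print(samples_per_run(2_000_000_000, genome, 25))  # 16
```

In practice it is safer to plan for fewer samples per run than this upper bound suggests.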
Project 1: Microbial Genome
   Sample preparation
     Isolate high quality (not
      degraded) and high purity (no
      RNA) gDNA
     Verify on a gel
     Quantify using dsDNA-specific
      dye

   Library preparation
     Can do this yourself if you like
     ~ $200 per sample for Nextera
         Cheaper protocols
         Cheaper in bulk
       Barcode compatibility
Project 1: Microbial Genome
   Library QC
     Insert size confirmed on BioAnalyzer (within
      range, no artifacts)
     Pool barcoded libraries (normalize based on
      PicoGreen quantification)
     Absolute quantification of library pools using
      qPCR
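
One way to sketch the normalization step when pooling barcoded libraries: convert each mass concentration to molarity, then pool equal moles of each library. This assumes an average mass of 660 g/mol per base pair of dsDNA and uses the mean fragment size from the BioAnalyzer trace; the library names and values are made up for illustration, and the final pool would still be quantified by qPCR as noted above.

```python
from typing import Dict, Tuple


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert ng/uL to nM, assuming 660 g/mol per bp of dsDNA."""
    return conc_ng_per_ul / (660.0 * mean_fragment_bp) * 1e6


def equimolar_volumes_ul(
    libraries: Dict[str, Tuple[float, float]], fmol_per_library: float
) -> Dict[str, float]:
    """Volume of each library that contributes the same number of fmol.

    libraries maps name -> (concentration in ng/uL, mean fragment size in bp).
    1 nM equals 1 fmol/uL, so volume = fmol wanted / molarity in nM.
    """
    return {
        name: fmol_per_library / library_molarity_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }


# Hypothetical PicoGreen and BioAnalyzer results for two libraries
libraries = {"lib_A": (6.6, 500), "lib_B": (3.3, 500)}
for name, volume in equimolar_volumes_ul(libraries, fmol_per_library=100).items():
    print(f"{name}: {volume:.1f} uL")
```

The more dilute library contributes a proportionally larger volume, so each barcode is represented by the same number of molecules in the pool.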
Project 1: Microbial Genome
   MiSeq sequencing
     Dilute and denature library pool (optimal
      concentration requires titration...)
     Spike in PhiX library as needed (e.g. 1%)

     Prepare and load reagents, flow cell

     Basic filtering and de-multiplexing performed
      automatically
     Download fastq files from BaseSpace
Project 1: Microbial Genome
   Data processing
     Additional filtering
     Trim the ends
     Remove PCR duplicates
   Assembly: overlapping reads are assembled to
    each other based on sequence similarity
    = contigs
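
The idea of assembling overlapping reads into contigs can be shown with a toy greedy merge. This is a teaching sketch, not how production assemblers work: it requires exact suffix-to-prefix matches and ignores sequencing errors, reverse complements, and repeats. The function names and the minimum overlap of 3 are hypothetical.

```python
from typing import List


def overlap_length(a: str, b: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of a that exactly matches a prefix of b."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTTG"]))  # ['ATGCGTACCTTG']
```

Reads that share no sufficient overlap stay as separate contigs, which is why gaps in coverage lead to fragmented assemblies.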
Project 1: Microbial Genome
   What’s next?
     Polish the genome
      (hybrid assemblies,
      mate pair libraries)
     Annotate (ORFs,
      RNA-seq)
     Compare
Project 2: Microbial Community
   Shotgun metagenomics
     Unbiased survey of community content
     Random library fragments may provide very little
      taxonomic resolution (e.g. conserved, unknown)
     Identify genes, classify by function
   Targeted metagenomics
     Limited survey of community content
     Targeted loci provide excellent taxonomic
      resolution, but may exclude certain taxa
     Identify OTUs, classify by taxonomy
Project 2: Microbial Community
   16S rRNA
   Multi-copy gene (1.5
    kb)
   Conserved and
    hypervariable regions
   Extensive databases
    from known species
Project 2: Microbial Community
   Considerations:
     Biases in sampling methods, culturing,
      DNA isolation, PCR...replicate
     Available SOPs
     How many reads per sample?
     Read length matters!
   Sample preparation:
     Isolate DNA
     PCR amplify, purify
       High-fidelity polymerase
       Barcoded primers
       No primer dimers!
     Normalize PCR products and pool
Project 2: Microbial Community
   454 Sequencing
     emPCR titrations with different library input
     Bulk emPCR
     Sequence
     Basic filtering
     Collect sff files
   Data processing
     De-multiplexing
     Additional filtering
     Trim the barcodes, primers
     Check for chimeras
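
De-multiplexing and barcode/primer trimming can be sketched as below. This simplified version assumes each read starts with the barcode followed directly by the primer, requires exact matches, and assumes no barcode is a prefix of another; real pipelines usually tolerate mismatches. The barcodes, primer, and sample names are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def demultiplex(
    reads: Iterable[str], barcodes: Dict[str, str], primer: str = ""
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign reads to samples by leading barcode, trimming barcode and primer.

    barcodes maps barcode sequence -> sample name.
    Returns (trimmed reads per sample, reads that matched no barcode).
    """
    by_sample: Dict[str, List[str]] = {sample: [] for sample in barcodes.values()}
    unassigned: List[str] = []
    for seq in reads:
        for barcode, sample in barcodes.items():
            prefix = barcode + primer
            if seq.startswith(prefix):
                by_sample[sample].append(seq[len(prefix):])
                break
        else:
            unassigned.append(seq)  # no barcode matched
    return by_sample, unassigned


barcodes = {"ACGT": "sampleA", "TGCA": "sampleB"}  # hypothetical barcodes
reads = ["ACGTGGAAAT", "TGCAGGCCCT", "NNNNGGAAAT"]
assigned, leftover = demultiplex(reads, barcodes, primer="GG")
print(assigned)
print(leftover)
```

Keeping the unassigned reads separate makes it easy to see how much data was lost to barcode errors.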
Project 2: Microbial Community
   Clustering
     Sequences grouped
      by similarity = OTUs
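
Grouping sequences by similarity can be illustrated with a greedy, centroid-based sketch. Several assumptions are made here that are not in the slides: sequences are pre-aligned and of equal length, identity is the fraction of matching positions, and the default threshold of 0.97 is a commonly used convention rather than a fixed rule. Dedicated clustering tools use proper alignment and abundance sorting.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Assign each sequence to the first OTU whose centroid is similar enough."""
    centroids: List[str] = []
    otus: List[List[str]] = []
    for s in seqs:
        for idx, centroid in enumerate(centroids):
            if identity(s, centroid) >= threshold:
                otus[idx].append(s)
                break
        else:
            centroids.append(s)  # s becomes the centroid of a new OTU
            otus.append([s])
    return otus


# Toy 10-nt sequences, so a looser threshold is used for the demonstration
seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
print(greedy_otus(seqs, threshold=0.9))
```

The result depends on input order because the first sequence of each group becomes its centroid.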
Project 2: Microbial Community
   Taxonomic
    identification
     OTUs are classified by
      comparing to known
      16S sequences
     Level of classification
      (e.g. family vs
      genus)?


   Diversity
     Within sample
     Between samples
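
The slides do not name specific diversity measures. As one possible illustration, the sketch below uses the Shannon index for within-sample diversity and Bray-Curtis dissimilarity for between-sample comparison; both are common choices, picked here as assumptions. Inputs are OTU count vectors in the same OTU order, each with at least one non-zero count.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Within-sample diversity: H = -sum(p_i * ln(p_i)) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Between-sample dissimilarity: 0 = identical counts, 1 = no shared OTUs."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    return numerator / (sum(a) + sum(b))


sample_1 = [10, 10, 0]  # hypothetical OTU counts
sample_2 = [0, 10, 10]
print(round(shannon_index(sample_1), 3))
print(bray_curtis(sample_1, sample_2))
```

Because both measures depend on sequencing depth, samples are usually compared after normalizing to a common number of reads.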

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 

Ngs microbiome

  • 1. Next-Generation Sequencing of Microbial Genomes and Metagenomes. Christine King, Farncombe Metagenomics Facility. Human Microbiome Journal Club, July 13, 2012
  • 2. Overview
      - Next-generation sequencing
          - Applications
          - Instruments
          - Library prep and sequencing chemistry
          - Sequence quality
      - Project overview
          - Microbial genomes
          - Microbial communities
  • 3. DNA Sequencing
      - 1st generation
          - Sanger chain termination
          - Capillary electrophoresis
      - 2nd generation (NGS)
          - High throughput, “massively parallel”
          - Shorter reads
          - Sequencing-by-synthesis
      - 3rd generation
          - Single molecule
  • 4. Applications
      - DNA sequencing
          - De novo genomes
          - Resequencing
              - Shotgun (e.g. mutant strains)
              - Amplicon (e.g. HLA, cancer)
              - Sequence capture (e.g. exome)
          - Metagenome
              - Amplicon (e.g. 16S, COI, viral)
              - Shotgun
          - ChIP
      - RNA sequencing
          - Gene expression
          - Gene annotation, splice variants
  • 6. Instruments
      Instrument     # of reads  Read length (bp)  Total output (Gb)  Cost per base  Run time  Technology
      GS FLX         1M          450               0.5                $$$$           ++        emPCR, SBS, light detection
      GS FLX+        1M          650               0.6                $$$$           ++        emPCR, SBS, light detection
      GS Jr          100K        450               0.05               $$$$           ++        emPCR, SBS, light detection
      GAIIx          640M        2x 150            90                 $$             +++       Bridge PCR, SBS, fluorophore
      HiSeq 2000     6B          2x 100            600                $              +++       Bridge PCR, SBS, fluorophore
      MiSeq          12M         2x 150            2                  $$             ++        Bridge PCR, SBS, fluorophore
      PacBio RS      >10K        >1000             0.01               $$$$           +         Single-molecule seq, fluorophore
      SOLiD 5500xl   1.4B        75 + 35           155                $              +++       emPCR, probe ligation, fluorophore
      Ion PGM - 316  1M          >100              0.1                $$$            +         emPCR, SBS, pH change
      Ion PGM - 318  6M          >100              1                  $$             +         emPCR, SBS, pH change
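The "Total output" column is roughly the number of reads multiplied by the read length. A minimal sketch of that arithmetic follows, assuming the read count already counts each end of a paired run separately; the function name is illustrative and not from the slides.

```python
def total_output_gb(n_reads, read_length_bp):
    """Approximate run output in gigabases: reads x read length / 1e9."""
    return n_reads * read_length_bp / 1e9


# Values taken from the instrument table (rounded figures, so results are approximate).
print(total_output_gb(6e9, 100))   # HiSeq 2000: table lists 600 Gb
print(total_output_gb(12e6, 150))  # MiSeq: table lists 2 Gb
print(total_output_gb(1e6, 450))   # GS FLX: table lists 0.5 Gb
```

The small gaps between these products and the table values come from rounding in the slide.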
  • 7. Which instrument(s) to use?
      - Read length vs number of reads
      - Cost per base, per sample, per project (multiplexing?)
      - Accuracy
      - Run time, wait time
      Application      Length  # Reads  Accuracy  Instruments           Considerations
      De novo (small)  +++     ++       ++        MiSeq, 454, Ion       Mix lengths
      De novo (large)  +++     +++      ++        HiSeq, 454, SOLiD     Mix lengths, MP
      Re-seq (small)   ++      ++       ++        MiSeq, Ion            Multiplex?
      Re-seq (large)   ++      +++      ++        HiSeq, SOLiD          Enrichment?
      RNA-seq (count)  +       +++      +         Illumina, SOLiD, Ion  Ref? Size? Rare?
  • 8. Library Preparation
      - Goal: fragments of DNA, each end flanked by adaptor sequences
      - Adaptors contain amplification- and sequencing-primer binding sites; platform- and chemistry-specific
      - Optional: sample-specific barcodes/indexes/MIDs/tags allow multiplexing during sequencing
      - Library QC: quantity, size
  • 9. Library Preparation
      - Library types:
          - Shotgun (DNA)
              - May begin with ChIP
              - May follow with sequence capture
          - Mate pair (DNA)
          - Amplicon (DNA)
          - Total RNA
              - May enrich for mRNA (poly-A enrichment, rRNA depletion)
              - Convert to cDNA (then similar to DNA protocols)
          - Small RNA
              - RNA ligations, convert to cDNA after
  • 10. Library Preparation: Shotgun
      - Fragmentation
          - Sonication
          - Nebulization
          - Enzymatic
      - End repair
          - 3’ overhangs digested
          - 5’ overhangs filled
          - 5’ phosphate added
  • 11. Library Preparation: Shotgun
      - Adapter ligation
          - T-overhangs
          - Forked structure controls orientation
      - Library amplification
          - Few cycles
          - Enrich for correctly-adapted fragments
          - Required to complete adapter structure in some protocols
      - Size selection
          - Gel excision, AMPure beads
          - Limit insert size as needed, remove artifacts
  • 12. Library Preparation: Amplicon
      - Amplify region of interest using PCR
      - Primers contain adapter sequences
  • 13. Library Preparation: Mate Pair
      - Begin with large fragments (e.g. 3kb, 20kb)
      - Circularize and fragment again
          - Illumina: direct ligation
          - 454: Cre/Lox recombination
      - Enrich for fragments containing the junction
      - Proceed with shotgun library prep
  • 14. Library Preparation: Mate Pair
      - Why? Paired sequences are a known distance apart, which improves genome assembly
      - Note: 454 calls these “paired end libraries”, not to be confused with Illumina’s “paired end sequencing”!
  • 15. Sequencing: Illumina
      - Cluster generation
          - Library fragments hybridize to oligos on the flow cell
          - New strand synthesized, original denatured, removed
          - Free end binds to adjacent oligos (bridge formation)
          - Complementary strand synthesized, denatured (both tethered to flow cell)
          - Repeat to form clonal cluster
          - Cleave one oligo, denature to leave ssDNA clusters
          - ~800K clusters/mm^2
  • 16. Sequencing: Illumina
      - Variety of workflows:
          - Single- or paired-end reads
          - 0, 1, or 2 index reads
  • 17. Sequencing: Illumina
      - At each cycle, all 4 fluorescently-labeled nucleotides pass over the flow cell
      - Each cluster incorporates one nt (terminator) per cycle
      - Fluor is imaged, then cleaved
      - De-block and repeat
  • 18. Sequencing: Illumina
      - Other terminology:
          - cBot: accessory instrument that performs cluster generation
          - Lanes: divisions (8) of HiSeq and GAIIx flow cells
          - PhiX: bacteriophage with small, balanced genome; PhiX library spiked in with samples for QC
          - Phasing/pre-phasing: nt incorporation falls behind or jumps ahead on a portion of strands in the cluster and contributes to noise
          - Chastity filter: measures signal purity (after intensity corrections); if the background signal is high, the cluster will be discarded
          - BaseSpace: cloud computing site for processing MiSeq data
          - File format: fastq
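For readers who have not opened a fastq file, here is a minimal reading sketch. It assumes the common four-line, unwrapped record layout and a Phred+33 quality encoding; neither detail is stated on the slide, and older Illumina pipelines used a different offset. The function names are illustrative.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def decode_qualities(qual, offset=33):
    """Convert an ASCII quality string to Phred scores (offset 33 is an assumption)."""
    return [ord(ch) - offset for ch in qual]


example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for read_id, seq, qual in read_fastq(example):
    print(read_id, seq, decode_qualities(qual))
```

For real data, an established parser such as Biopython's `SeqIO` is a safer choice than a hand-written reader.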
  • 19. Sequencing: 454
      - emPCR: clonal amplification of bead-bound library in microdroplets
      - Library input amounts critical!
          - One molecule per bead
          - Titration procedure
  • 20. Sequencing: 454
      - Library capture: beads coated with complementary oligo
      - Amplification: droplet contains PCR reagents and the other oligo
      - Post-PCR: millions of identical fragments attached to the bead
  • 21. Sequencing: 454
      - Bead recovery: physical and chemical disruption
      - Enrichment: capture successfully amplified beads using biotinylated primers + magnetic streptavidin beads
  • 22. Sequencing: 454
      - Deposit bead layers onto PicoTiterPlate:
          - Enzyme beads
          - Enriched DNA beads
          - More enzyme beads
          - PPiase beads
  • 24. Sequencing: 454
      - Pyrosequencing
          - 4 nucleotides flow separately
          - If nt incorporation → PPi → light
              - APS + PPi → ATP (sulfurylase)
              - Luciferin + ATP → light + oxyluciferin (luciferase)
          - Amount of light proportional to # nt incorporated
          - Rinse and repeat with next nt
  • 25. Sequencing: 454
      - Camera captures light emitted from every well during every nucleotide flow
  • 26. Sequencing: 454
      - Flowgram: representation of a sequence, based on the pattern of light emitted from a single well
  • 27. Sequencing: 454
      - Other terminology:
          - Lib-L/Lib-A: adapter variants, “ligated” or “annealed”
          - Titanium chemistry: ~450 bp reads on all instruments
          - XL+ chemistry: ~700 bp reads on the FLX+ instrument
          - Flow: one of the four nucleotides flows over the PTP
          - Cycle: a set of four flows, in order
          - Valley flow: the number of bases incorporated in a given read during that flow is uncertain, e.g. 1.5 units of light (background signal, homopolymers)
          - File format: sff (standard flowgram format)
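The link between flows, cycles, light units and valley flows can be sketched as a toy base caller. This is an illustration only: the flow order "TACG", the rounding rule and the 0.25 valley threshold are assumptions, not values from the slides, and real 454 software applies signal corrections that are omitted here.

```python
def flowgram_to_sequence(flow_values, flow_order="TACG"):
    """Call bases from per-flow light units: n units of light -> n copies of that flow's nucleotide."""
    bases = []
    for i, signal in enumerate(flow_values):
        nucleotide = flow_order[i % len(flow_order)]  # one cycle = one pass through flow_order
        n_incorporated = int(signal + 0.5)            # round to the nearest whole homopolymer length
        bases.append(nucleotide * n_incorporated)
    return "".join(bases)


def valley_flows(flow_values, threshold=0.25):
    """Return indices of flows whose signal is far from any integer (ambiguous base count)."""
    return [i for i, s in enumerate(flow_values) if abs(s - round(s)) > threshold]


# Two cycles of four flows each.
flows = [1.02, 0.05, 2.1, 0.9, 0.0, 1.1, 0.1, 3.0]
print(flowgram_to_sequence(flows))   # TCCGAGGG
print(valley_flows([1.02, 1.5, 0.1]))  # [1]
```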
  • 28. Sequencing: Ion Torrent
      - Procedures and chemistry similar to 454
      - Instead of PPi, measure H+ release (pH change) via semiconductor chip
      - No expensive camera or laser required, no modified nucleotides
  • 29. Sequence Quality
      - Error probabilities determined using training sets, platform-specific biases
      - Expressed as a quality value (QV or Q score) per base
      - Similar to PHRED scores:
          - $Q = -10 \log_{10} P$
          - $P = 10^{-Q/10}$
      Phred (Q) Score  Probability of Error (P)  Base Call Accuracy
      10               1 in 10                   90%
      20               1 in 100                  99%
      30               1 in 1K                   99.9%
      40               1 in 10K                  99.99%
      50               1 in 100K                 99.999%
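The two quality-score formulas on this slide translate directly into code. The sketch below reproduces the slide's table; the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """P = 10^(-Q/10): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Q = -10 * log10(P): quality score for a given error probability."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40, 50):
    p = phred_to_error_prob(q)
    print(f"Q{q}: 1 error in {round(1 / p):,} bases, accuracy {100 * (1 - p):.3f}%")
```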
  • 30. Project 1: Microbial Genome
      - Considerations:
          - Reference genome?
          - How much coverage do I want?
          - How big is the genome?
          - How much data do I need?
              - bp needed = genome size × coverage
          - Which instrument/chemistry configuration to use?
      - Coverage
          - Depth: number of times a particular base is “covered” by a read (e.g. 25X)
          - Breadth: % of genome with at least 1X coverage
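The rule "bp needed = genome size × coverage" can be worked through for a concrete case. The sketch below uses a hypothetical 5 Mb genome at 25X with 150 bp reads, plus the ~2 Gb MiSeq output from the instrument table. It ignores reads lost to filtering and uneven pooling, so treat the results as lower bounds on the data required.

```python
def bases_needed(genome_size_bp, depth):
    """bp needed = genome size x coverage depth."""
    return genome_size_bp * depth


def reads_needed(genome_size_bp, depth, read_length_bp):
    """Number of reads needed to reach the target depth, rounded up."""
    total = bases_needed(genome_size_bp, depth)
    return -(-total // read_length_bp)  # integer ceiling division


def samples_per_run(run_output_bp, genome_size_bp, depth):
    """How many equally sized genomes fit in one run at the target depth."""
    return run_output_bp // bases_needed(genome_size_bp, depth)


genome = 5_000_000  # hypothetical 5 Mb bacterial genome
print(bases_needed(genome, 25))                    # 125000000
print(reads_needed(genome, 25, 150))               # 833334
print(samples_per_run(2_000_000_000, genome, 25))  # 16
```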
  • 31. Project 1: Microbial Genome
      - Sample preparation
          - Isolate high quality (not degraded) and high purity (no RNA) gDNA
              - Verify on a gel
          - Quantify using dsDNA-specific dye
      - Library preparation
          - Can do this yourself if you like
          - ~$200 per sample for Nextera
              - Cheaper protocols
              - Cheaper in bulk
              - Barcode compatibility
  • 32. Project 1: Microbial Genome
      - Library QC
          - Insert size confirmed on BioAnalyzer (within range, no artifacts)
          - Pool barcoded libraries (normalize based on PicoGreen quantification)
          - Absolute quantification of library pools using qPCR
  • 33. Project 1: Microbial Genome
      - MiSeq sequencing
          - Dilute and denature library pool (optimal concentration requires titration...)
          - Spike in PhiX library as needed (e.g. 1%)
          - Prepare and load reagents, flow cell
          - Basic filtering and de-multiplexing performed automatically
          - Download fastq files from BaseSpace
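One way to normalize barcoded libraries before pooling is to convert each dye-based concentration to molarity and pipette equal molar amounts. The sketch below is an illustration, not the facility's protocol: it uses the standard approximation of 660 g/mol per base pair of dsDNA, and the library names, concentrations and 40 fmol target are hypothetical.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration to nM, assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660 * mean_fragment_bp)


def pooling_volumes(libraries, fmol_each):
    """Volume (uL) of each library that contributes fmol_each to the pool (1 nM = 1 fmol/uL)."""
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, length)
        for name, (conc, length) in libraries.items()
    }


# Hypothetical libraries: name -> (ng/uL from dye-based quantification, mean size in bp)
libraries = {"lib1": (6.6, 500), "lib2": (3.3, 500)}
for name, volume in pooling_volumes(libraries, fmol_each=40).items():
    print(f"{name}: {volume:.2f} uL")
```

The final pool would still need absolute quantification by qPCR, as the slide notes.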
  • 34. Project 1: Microbial Genome
      - Data processing
          - Additional filtering
          - Trim the ends
          - Remove PCR duplicates
      - Assembly: overlapping reads are assembled to each other based on sequence similarity = contigs
  • 35. Project 1: Microbial Genome
      - What’s next?
          - Polish the genome (hybrid assemblies, mate pair libraries)
          - Annotate (ORFs, RNA-seq)
          - Compare
  • 36. Project 2: Microbial Community
      - Shotgun metagenomics
          - Unbiased survey of community content
          - Random library fragments may provide very little taxonomic resolution (e.g. conserved, unknown)
          - Identify genes, classify by function
      - Targeted metagenomics
          - Limited survey of community content
          - Targeted loci provide excellent taxonomic resolution, but may exclude certain taxa
          - Identify OTUs, classify by taxonomy
  • 37. Project 2: Microbial Community
      - 16S rRNA
          - Multi-copy gene (1.5 kb)
          - Conserved and hypervariable regions
          - Extensive databases from known species
  • 38. Project 2: Microbial Community
      - Considerations:
          - Biases in sampling methods, culturing, DNA isolation, PCR...replicate
          - Available SOPs
          - How many reads per sample?
              - Read length matters!
      - Sample preparation:
          - Isolate DNA
          - PCR amplify, purify
              - High-fidelity polymerase
              - Barcoded primers
              - No primer dimers!
          - Normalize PCR products and pool
  • 39. Project 2: Microbial Community
      - 454 Sequencing
          - emPCR titrations with different library input
          - Bulk emPCR
          - Sequence
          - Basic filtering
          - Collect sff files
      - Data processing
          - De-multiplexing
          - Additional filtering
          - Trim the barcodes, primers
          - Check for chimeras
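De-multiplexing and barcode/primer trimming can be illustrated with a short sketch. It assumes each read begins with its barcode followed directly by the primer and uses exact matching only; the barcodes, primer and reads are made up, and production tools also tolerate sequencing errors in the barcode.

```python
def demultiplex(reads, barcodes, primer=""):
    """Assign reads to samples by leading barcode, then trim the barcode and primer.

    reads    -- iterable of read sequences
    barcodes -- dict mapping barcode sequence -> sample name
    primer   -- primer sequence expected right after the barcode
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcodes.items():
            prefix = barcode + primer
            if read.startswith(prefix):
                by_sample[sample].append(read[len(prefix):])
                break
        else:
            unassigned.append(read)  # no barcode matched exactly
    return by_sample, unassigned


barcodes = {"ACGT": "sampleA", "TGCA": "sampleB"}  # hypothetical 4-nt barcodes
reads = ["ACGTGGTTTT", "TGCAGGCCCC", "NNNNGGAAAA"]
assigned, leftover = demultiplex(reads, barcodes, primer="GG")
print(assigned)  # {'sampleA': ['TTTT'], 'sampleB': ['CCCC']}
print(leftover)  # ['NNNNGGAAAA']
```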
  • 40. Project 2: Microbial Community
      - Clustering
          - Sequences grouped by similarity = OTUs
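Grouping sequences by similarity can be sketched as greedy, centroid-based clustering. This is a simplified illustration, not the method used in the project: it assumes the reads are already aligned to equal length, measures identity position by position, and uses a 97% threshold, which is a common convention rather than a value from the slides.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold=0.97):
    """Assign each sequence to the first OTU whose centroid it matches, else start a new OTU."""
    otus = []  # list of (centroid, members)
    for seq in sequences:
        for centroid, members in otus:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            otus.append((seq, [seq]))
    return otus


# Toy 10-nt "reads" with a relaxed threshold so the example stays short.
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
for centroid, members in greedy_otus(toy, threshold=0.9):
    print(centroid, len(members))
```

Because assignment is greedy, the result depends on input order; clustering tools often sort reads by abundance or length first.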
  • 41. Project 2: Microbial Community
      - Taxonomic identification
          - OTUs are classified by comparing to known 16S sequences
          - Level of classification (e.g. family vs genus)?
      - Diversity
          - Within sample
          - Between samples
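Within-sample and between-sample diversity can each be summarized with a single number computed from OTU counts. The slides do not name specific metrics; the sketch below picks the Shannon index and Bray-Curtis dissimilarity as two widely used examples, with made-up counts.

```python
import math


def shannon_index(counts):
    """Within-sample diversity: H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Between-sample dissimilarity: 0 = identical composition, 1 = no shared OTUs.

    Both lists must give counts for the same OTUs in the same order.
    """
    differences = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return differences / (sum(counts_a) + sum(counts_b))


sample_1 = [10, 10, 0]  # hypothetical OTU counts
sample_2 = [0, 10, 10]
print(round(shannon_index(sample_1), 4))  # 0.6931 (ln 2: two equally abundant OTUs)
print(bray_curtis(sample_1, sample_2))    # 0.5
```

Read depth differs between samples, so counts are usually rarefied or converted to proportions before comparing.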