Next-Generation Sequencing of Microbial
Genomes and Metagenomes

Christine King
Farncombe Metagenomics Facility

Human Microbiome Journal Club
July 13, 2012
Overview
   Next-generation sequencing
     Applications
     Instruments
     Library prep and sequencing chemistry
     Sequence quality
   Project overview
     Microbial genomes
     Microbial communities
DNA Sequencing
   1st generation
     Sanger chain termination
     Capillary electrophoresis
   2nd generation (NGS)
     High throughput, “massively parallel”
     Shorter reads
     Sequencing-by-synthesis
   3rd generation
     Single molecule
Applications
   DNA sequencing
     De novo genomes
     Resequencing
       Shotgun (e.g. mutant strains)
       Amplicon (e.g. HLA, cancer)
       Sequence capture (e.g. exome)
     Metagenome
       Amplicon (e.g. 16S, COI, viral)
       Shotgun
     ChIP
   RNA sequencing
     Gene expression
     Gene annotation, splice variants
Instruments
Instrument     # of reads  Read length (bp)  Total output (Gb)  Cost per base  Run time  Technology
GS FLX         1M          450               0.5                $$$$           ++        emPCR, SBS, light detection
GS FLX+        1M          650               0.6                $$$$           ++        emPCR, SBS, light detection
GS Jr          100K        450               0.05               $$$$           ++        emPCR, SBS, light detection
GAIIx          640M        2 x 150           90                 $$             +++       Bridge PCR, SBS, fluorophore
HiSeq 2000     6B          2 x 100           600                $              +++       Bridge PCR, SBS, fluorophore
MiSeq          12M         2 x 150           2                  $$             ++        Bridge PCR, SBS, fluorophore
PacBio RS      >10K        >1000             0.01               $$$$           +         Single-molecule seq, fluorophore
SOLiD 5500xl   1.4B        75 + 35           155                $              +++       emPCR, probe ligation, fluorophore
Ion PGM (316)  1M          >100              0.1                $$$            +         emPCR, SBS, pH change
Ion PGM (318)  6M          >100              1                  $$             +         emPCR, SBS, pH change

emPCR = emulsion PCR; SBS = sequencing by synthesis. Cost per base: $ (lowest) to $$$$ (highest). Run time: + (shortest) to +++ (longest).
Which instrument(s) to use?
     Read length vs. number of reads
     Cost per base, per sample, per project (multiplexing?)
     Accuracy
     Run time, wait time

Application       Length  # Reads  Accuracy  Instruments           Considerations
De novo (small)   +++     ++       ++        MiSeq, 454, Ion       Mix lengths
De novo (large)   +++     +++      ++        HiSeq, 454, SOLiD     Mix lengths, MP (mate pair)
Re-seq (small)    ++      ++       ++        MiSeq, Ion            Multiplex?
Re-seq (large)    ++      +++      ++        HiSeq, SOLiD          Enrichment?
RNA-seq (count)   +       +++      +         Illumina, SOLiD, Ion  Ref? Size? Rare?

Length, # Reads, Accuracy: + to +++ indicates how much the application depends on that property.
Library Preparation
   Goal: fragments of DNA, each end flanked by adapter sequences
   Adapters contain amplification and sequencing primer binding sites; they are platform- and chemistry-specific
   Optional: sample-specific barcodes (also called indexes, MIDs, or tags) allow multiplexing during sequencing
   Library QC: quantity, size
Library Preparation
   Library types:
     Shotgun (DNA)
       May begin with ChIP
       May be followed by sequence capture
     Mate pair (DNA)
     Amplicon (DNA)
     Total RNA
       May enrich for mRNA (poly-A enrichment, rRNA depletion)
       Convert to cDNA (then similar to DNA protocols)
     Small RNA
       Adapters ligated directly to the RNA, then converted to cDNA
Library Preparation: Shotgun
   Fragmentation
     Sonication
     Nebulization
     Enzymatic
   End repair
     3’ overhangs digested
     5’ overhangs filled in
     5’ phosphates added
Library Preparation: Shotgun
   Adapter ligation
     T-overhangs on adapters pair with A-tailed fragment ends
     Forked structure controls orientation
   Library amplification
     Few cycles
     Enrich for correctly adapted fragments
     Required to complete adapter structure in some protocols
   Size selection
     Gel excision, AMPure beads
     Limit insert size as needed, remove artifacts
Library Preparation: Amplicon
   Amplify region of interest using PCR
   Primers contain adapter sequences
Library Preparation: Mate Pair
   Begin with large fragments (e.g. 3 kb, 20 kb)
   Circularize and fragment again
     Illumina: direct ligation
     454: Cre/Lox recombination
   Enrich for fragments containing the junction
   Proceed with shotgun library prep
Library Preparation: Mate Pair
   Why? Paired sequences are an approximately known distance apart (the size of the original large fragment), which improves genome assembly
   Note: 454 calls these “paired end libraries”, not to be confused with Illumina’s “paired end sequencing”!
Sequencing: Illumina
   Cluster generation
     Library fragments hybridize to oligos on the flow cell
     New strand synthesized; original denatured and removed
     Free end binds to adjacent oligos (bridge formation)
     Complementary strand synthesized, then denatured (both strands tethered to flow cell)
     Repeat to form clonal cluster
     Cleave one oligo, denature to leave ssDNA clusters
   ~800K clusters/mm²
Sequencing: Illumina
   Variety of workflows:
     Single-end or paired-end reads
     0, 1, or 2 index reads
Sequencing: Illumina
   At each cycle, all 4 fluorescently labeled nucleotides pass over the flow cell
   Each cluster incorporates one nucleotide (reversible terminator) per cycle
   Fluor is imaged, then cleaved
   De-block and repeat
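The cycle above can be sketched as a toy Python model. It is not Illumina's base-calling software: it assumes one already-corrected intensity per channel per cycle, in A, C, G, T order, and simply calls the brightest channel. `call_bases` is a hypothetical name.

```python
# Toy model of four-colour sequencing-by-synthesis base calling.
# Assumption (not Illumina's algorithm): each cycle gives one corrected
# intensity per channel, ordered A, C, G, T.
CHANNELS = "ACGT"


def call_bases(intensities):
    """Call one base per cycle by picking the brightest of the 4 channels.

    intensities: list of cycles; each cycle is a sequence of 4 floats (A, C, G, T).
    """
    read = []
    for cycle in intensities:
        if len(cycle) != len(CHANNELS):
            raise ValueError("expected 4 channel intensities per cycle")
        # The reversible terminator allows only one incorporation per cycle,
        # so a clonal cluster should show one dominant channel.
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


if __name__ == "__main__":
    cycles = [
        (0.9, 0.1, 0.0, 0.1),  # A brightest
        (0.1, 0.2, 0.8, 0.1),  # G brightest
        (0.0, 0.7, 0.1, 0.2),  # C brightest
        (0.1, 0.0, 0.1, 0.9),  # T brightest
    ]
    print(call_bases(cycles))  # prints AGCT
```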
Sequencing: Illumina
   Other terminology:
     cBot – accessory instrument that performs cluster generation
     Lanes – divisions (8) of HiSeq and GAIIx flow cells
     PhiX – bacteriophage with a small genome of balanced base composition; a PhiX library is spiked in with samples for QC
     Phasing/pre-phasing – nucleotide incorporation falls behind or jumps ahead on a portion of the strands in a cluster, contributing to noise
     Chastity filter – measures signal purity (after intensity corrections); if the signal in competing channels is high relative to the brightest channel, the cluster is discarded
     BaseSpace – Illumina cloud computing site for processing MiSeq data
   File format: FASTQ
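The chastity filter can be sketched as follows. Illumina describes chastity as the brightest intensity divided by the sum of the brightest and second-brightest intensities; the pass-filter rule here (no more than one cycle below 0.6 among the first 25) follows that description, but treat the thresholds and the function names as assumptions to check against your instrument software.

```python
# Chastity and a pass-filter rule. The 0.6 threshold and the 25-cycle
# window are assumptions to verify for your software version.


def chastity(cycle):
    """Brightest intensity / (brightest + second-brightest intensity)."""
    top, second = sorted(cycle, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(intensities, threshold=0.6, first_n=25, max_failures=1):
    """Pass if at most `max_failures` of the first `first_n` cycles
    have chastity below `threshold`."""
    failures = sum(1 for cycle in intensities[:first_n] if chastity(cycle) < threshold)
    return failures <= max_failures


if __name__ == "__main__":
    clean = (3.0, 1.0, 0.0, 0.0)  # chastity 0.75
    mixed = (1.0, 1.0, 0.0, 0.0)  # chastity 0.5, e.g. a polyclonal cluster
    print(passes_filter([clean] * 24 + [mixed]))      # True: one low cycle
    print(passes_filter([clean] * 23 + [mixed] * 2))  # False: two low cycles
```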
Sequencing: 454
   emPCR: clonal amplification of bead-bound library in microdroplets
   Library input amounts critical!
     One molecule per bead
     Titration procedure
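A common way to reason about titration: if templates bind beads at random, the number of molecules per bead is roughly Poisson-distributed with mean λ (molecules added per bead). This is a modelling assumption, not part of the 454 protocol, and `bead_loading` is a hypothetical helper.

```python
import math


def bead_loading(copies_per_bead):
    """Poisson fractions of beads with 0, exactly 1, and more than 1 template."""
    lam = copies_per_bead
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return {"empty": p0, "clonal": p1, "mixed": 1.0 - p0 - p1}


if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0):
        f = bead_loading(lam)
        print(f"copies/bead={lam}: empty={f['empty']:.2f} "
              f"clonal={f['clonal']:.2f} mixed={f['mixed']:.2f}")
```

At λ = 1 only about 37% of beads are clonal, while about 26% already carry more than one template and give mixed signals; lowering the input trades those mixed beads for more empty ones, which is why input amounts are titrated.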
Sequencing: 454
   Library capture: beads coated with complementary oligo
   Amplification: droplet contains PCR reagents and the other oligo
   Post-PCR: millions of identical fragments attached to the bead
Sequencing: 454
   Bead recovery: physical and chemical disruption
   Enrichment: capture successfully amplified beads using biotinylated primers + magnetic streptavidin beads
Sequencing: 454
   Deposit bead layers onto PicoTiterPlate:
     Enzyme beads
     Enriched DNA beads
     More enzyme beads
     PPiase beads
Sequencing: 454
   Pyrosequencing
     4 nucleotides flow separately
     If a nucleotide is incorporated → PPi is released → light
     APS + PPi → ATP (ATP sulfurylase)
     Luciferin + ATP → light + oxyluciferin (luciferase)
     Amount of light proportional to # nucleotides incorporated
     Rinse and repeat with next nucleotide
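To illustrate why light intensity encodes homopolymer length, here is a noise-free sketch that converts a sequence into the number of bases incorporated per flow. The repeating flow order TACG is an assumption (passed as a parameter), and `sequence_to_flowgram` is hypothetical.

```python
def sequence_to_flowgram(seq, flow_order="TACG"):
    """Idealized, noise-free flowgram: bases incorporated in each nucleotide flow.

    Assumption: nucleotides are flowed in a fixed repeating order.
    """
    seq = seq.upper()
    unknown = set(seq) - set(flow_order)
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    values = []
    pos = 0
    flow = 0
    while pos < len(seq):
        nt = flow_order[flow % len(flow_order)]
        count = 0
        # A whole homopolymer run is incorporated during a single flow,
        # and the light emitted is proportional to its length.
        while pos < len(seq) and seq[pos] == nt:
            count += 1
            pos += 1
        values.append(count)
        flow += 1
    return values


if __name__ == "__main__":
    print(sequence_to_flowgram("TTAGC"))  # [2, 1, 0, 1, 0, 0, 1]
```

Because a run such as TT is read from the intensity of a single flow, long homopolymers are hard to count exactly, which is the main error mode of flow-based platforms.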
Sequencing: 454
   Camera captures light emitted from every well during every nucleotide flow
Sequencing: 454
   Flowgram: representation of a sequence, based on the pattern of light emitted from a single well
Sequencing: 454
   Other terminology:
     Lib-L/Lib-A: adapter variants, “ligated” or “annealed”
     Titanium chemistry: ~450 bp reads on all instruments
     XL+ chemistry: ~700 bp reads on the FLX+ instrument
     Flow: one of the four nucleotides flows over the PicoTiterPlate (PTP)
     Cycle: a set of four flows, in order
     Valley flow: a flow in which the number of bases incorporated in a given read is uncertain, e.g. 1.5 units of light (background signal, homopolymers)
   File format: sff (standard flowgram format)
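Going the other way, a flowgram can be decoded by rounding each flow to the nearest homopolymer length. The sketch below also flags valley flows whose value sits far from a whole number; the 0.25 margin is a hypothetical cut-off, not the threshold used by the 454 software, and TACG is an assumed flow order.

```python
def flowgram_to_sequence(values, flow_order="TACG", valley_margin=0.25):
    """Decode flow values into bases and report ambiguous (valley) flows.

    valley_margin is a hypothetical cut-off: a value further than this from
    the nearest integer (e.g. 1.5) is treated as uncertain.
    """
    bases = []
    valleys = []
    for i, value in enumerate(values):
        n = max(0, round(value))  # nearest homopolymer length (ties round to even)
        if abs(value - round(value)) > valley_margin:
            valleys.append(i)
        bases.append(flow_order[i % len(flow_order)] * n)
    return "".join(bases), valleys


if __name__ == "__main__":
    seq, valleys = flowgram_to_sequence([1.02, 0.05, 1.48, 2.1])
    print(seq, valleys)  # TCGG [2]
```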
Sequencing: Ion Torrent
   Procedures and chemistry similar to 454
   Instead of PPi, measure H+ release (pH change) via semiconductor chip
   No expensive camera or laser required, no modified nucleotides
Sequence Quality
Phred (Q) score  Probability of error (P)  Base call accuracy
10               1 in 10                   90%
20               1 in 100                  99%
30               1 in 1,000                99.9%
40               1 in 10,000               99.99%
50               1 in 100,000              99.999%

   Error probabilities are determined using training sets and reflect platform-specific biases
   Expressed as a quality value (QV or Q score) per base
   Similar to Phred scores:
     $Q = -10 \log_{10} P$
     $P = 10^{-Q/10}$
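The two conversions above, plus decoding of the per-base quality string stored in FASTQ files, in Python. The Phred+33 offset is an assumption that holds for Sanger-format and Illumina 1.8+ FASTQ; older Illumina pipelines used an offset of 64.

```python
import math


def phred_to_error(q):
    """P = 10^(-Q/10)"""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Q = -10 log10(P)"""
    return -10 * math.log10(p)


def decode_quality(qual_string, offset=33):
    """Convert an ASCII-encoded quality string to Q scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_string]


if __name__ == "__main__":
    for q in (10, 20, 30, 40, 50):
        p = phred_to_error(q)
        print(q, p, f"{1 - p:.3%}")
    print(decode_quality("I5+"))  # [40, 20, 10]
```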
Project 1: Microbial Genome
   Considerations:
     Reference genome?
     How much coverage do I want?
     How big is the genome?
     How much data do I need?
       bp needed = genome size × coverage
     Which instrument/chemistry configuration to use?
   Coverage
     Depth: number of times a particular base is “covered” by a read (e.g. 25X)
     Breadth: % of genome with at least 1X coverage
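The coverage arithmetic can be sketched as below. The 5 Mb genome is a hypothetical example, the run output uses the ~2 Gb MiSeq figure from the instrument table, and breadth is estimated with the Lander-Waterman (Poisson) approximation, which assumes reads land uniformly at random.

```python
import math


def bases_needed(genome_size_bp, coverage):
    """bp needed = genome size x coverage (depth)."""
    return genome_size_bp * coverage


def read_pairs_needed(genome_size_bp, coverage, read_length_bp, reads_per_pair=2):
    """Read pairs needed to reach the target depth, before filtering losses."""
    return math.ceil(bases_needed(genome_size_bp, coverage) / (read_length_bp * reads_per_pair))


def samples_per_run(genome_size_bp, coverage, run_output_bp):
    """How many genomes of this size fit into one run at this depth."""
    return run_output_bp // bases_needed(genome_size_bp, coverage)


def expected_breadth(coverage):
    """Poisson estimate of the fraction of bases covered at least once."""
    return 1 - math.exp(-coverage)


if __name__ == "__main__":
    genome = 5_000_000  # hypothetical 5 Mb bacterial genome
    print(bases_needed(genome, 25))                    # 125000000
    print(read_pairs_needed(genome, 25, 150))          # 416667
    print(samples_per_run(genome, 25, 2_000_000_000))  # 16
    print(f"{expected_breadth(1):.3f}")                # 0.632
```

Real runs lose some output to the PhiX spike-in, filtering, and uneven barcode representation, and real coverage is less uniform than the Poisson model, so it is safer to plan with some margin.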
Project 1: Microbial Genome
   Sample preparation
     Isolate high-quality (not degraded) and high-purity (no RNA) gDNA
     Verify on a gel
     Quantify using a dsDNA-specific dye
   Library preparation
     Can do this yourself if you like
     ~$200 per sample for Nextera
       Cheaper protocols exist
       Cheaper in bulk
     Check barcode compatibility
Project 1: Microbial Genome
   Library QC
     Insert size confirmed on BioAnalyzer (within range, no artifacts)
     Pool barcoded libraries (normalize based on PicoGreen quantification)
     Absolute quantification of library pools using qPCR
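Normalizing a pool from PicoGreen (mass) readings requires converting ng/µL to molarity using the mean fragment size from the BioAnalyzer. The sketch below uses the common approximation of 660 g/mol per base pair of dsDNA; the library names, concentrations, and helper names are hypothetical.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Mass concentration -> molarity for dsDNA, using ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660 * mean_fragment_bp)


def equimolar_volumes(libraries, fmol_each):
    """uL of each library so that every library contributes `fmol_each`.

    libraries: dict of name -> (conc_ng_per_ul, mean_fragment_bp)
    Uses 1 nM == 1 fmol/uL, so volume (uL) = fmol / nM.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }


if __name__ == "__main__":
    # Hypothetical PicoGreen concentrations and BioAnalyzer mean sizes.
    libs = {"lib1": (6.6, 1000), "lib2": (3.3, 500), "lib3": (2.0, 600)}
    for name, ul in equimolar_volumes(libs, fmol_each=20).items():
        print(f"{name}: {ul:.2f} uL")
```

qPCR on the final pool is still the better guide for loading, since it counts only molecules that carry amplifiable adapters.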
Project 1: Microbial Genome
   MiSeq sequencing
     Dilute and denature library pool (optimal concentration requires titration...)
     Spike in PhiX library as needed (e.g. 1%)
     Prepare and load reagents, flow cell
     Basic filtering and de-multiplexing performed automatically
     Download FASTQ files from BaseSpace
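FASTQ stores four lines per read: an `@` header, the sequence, a `+` separator, and an ASCII-encoded quality string. A minimal standard-library reader, assuming unwrapped records (one sequence line per read); `read_fastq` and `FastqRecord` are hypothetical names.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ file (4 lines per record)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII5+\n"
    for rec in read_fastq(io.StringIO(example)):
        print(rec.name, rec.seq, rec.qual)
```

BaseSpace downloads are typically gzip-compressed; `gzip.open(path, "rt")` returns a text handle that can be passed to `read_fastq` directly.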
Project 1: Microbial Genome
   Data processing
     Additional filtering
     Trim the ends
     Remove PCR duplicates
   Assembly: overlapping reads are joined to each other based on sequence similarity into longer contiguous sequences (contigs)
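Two of these steps in their simplest form: trimming low-quality bases from the 3' end and dropping reads with identical sequences. Both are simplifications chosen for illustration (dedicated tools use sliding windows, adapter matching, or mapping coordinates for duplicates); the function names and the Q20 cut-off are hypothetical, and qualities are assumed to be Phred+33.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def remove_exact_duplicates(reads):
    """Keep the first copy of each identical sequence.

    A crude stand-in for PCR-duplicate removal; it ignores sequencing errors
    and, for paired-end data, the mate sequence.
    """
    seen = set()
    unique = []
    for seq in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append(seq)
    return unique


if __name__ == "__main__":
    print(trim_3prime("ACGTAC", "IIII##"))  # ('ACGT', 'IIII')
    print(remove_exact_duplicates(["AAT", "CCG", "AAT"]))  # ['AAT', 'CCG']
```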
Project 1: Microbial Genome
   What’s next?
     Polish the genome (hybrid assemblies, mate pair libraries)
     Annotate (ORFs, RNA-seq)
     Compare
Project 2: Microbial Community
   Shotgun metagenomics
     Unbiased survey of community content
     Random library fragments may provide very little taxonomic resolution (e.g. conserved or unknown sequences)
     Identify genes, classify by function
   Targeted metagenomics
     Limited survey of community content
     Targeted loci provide excellent taxonomic resolution, but may exclude certain taxa
     Identify OTUs, classify by taxonomy
Project 2: Microbial Community
   16S rRNA
     Multi-copy gene (~1.5 kb)
     Conserved and hypervariable regions
     Extensive databases from known species
Project 2: Microbial Community
   Considerations:
     Biases in sampling methods, culturing, DNA isolation, PCR... replicate!
     Available SOPs
     How many reads per sample?
     Read length matters!
   Sample preparation:
     Isolate DNA
     PCR amplify, purify
       High-fidelity polymerase
       Barcoded primers
       No primer dimers!
     Normalize PCR products and pool
Project 2: Microbial Community
   454 sequencing
     emPCR titrations with different library input
     Bulk emPCR
     Sequence
     Basic filtering
     Collect sff files
   Data processing
     De-multiplexing
     Additional filtering
     Trim the barcodes, primers
     Check for chimeras
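De-multiplexing and barcode/primer trimming can be sketched as an exact prefix match. The barcodes and primer below are toy sequences, the barcodes are assumed to be equal-length, and real pipelines usually tolerate some mismatches; `demultiplex` is a hypothetical helper.

```python
def demultiplex(reads, barcodes, primer):
    """Assign reads to samples by exact 5' barcode + primer match, then strip both.

    barcodes: dict mapping barcode sequence -> sample name (equal-length barcodes assumed)
    Reads that do not match any barcode followed by the primer go to "unassigned".
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    by_sample["unassigned"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            rest = read[len(barcode):]
            if read.startswith(barcode) and rest.startswith(primer):
                by_sample[sample].append(rest[len(primer):])
                break
        else:  # no barcode matched
            by_sample["unassigned"].append(read)
    return by_sample


if __name__ == "__main__":
    barcodes = {"ACGT": "sample1", "TGCA": "sample2"}  # toy barcodes
    primer = "GG"  # toy primer
    reads = ["ACGTGGAAAA", "TGCAGGCCCC", "AAAAGGTTTT"]
    print(demultiplex(reads, barcodes, primer))
    # {'sample1': ['AAAA'], 'sample2': ['CCCC'], 'unassigned': ['AAAAGGTTTT']}
```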
Project 2: Microbial Community
   Clustering
     Sequences grouped by similarity into OTUs (operational taxonomic units)
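OTU clustering can be sketched with a greedy centroid approach, similar in spirit to tools that sort sequences by abundance and assign each to the first centroid within a similarity threshold. To stay short, the sketch assumes reads are already aligned to equal length and measures identity position by position; the 97% default is a commonly used OTU threshold, not necessarily the one used in this project, and the function names are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering: the most abundant unique sequences seed OTUs first.

    Returns a dict mapping each centroid sequence to the number of reads in its OTU.
    """
    counts = Counter(reads)
    centroids = []
    otus = {}
    for seq, n in counts.most_common():
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n
                break
        else:  # no centroid close enough: start a new OTU
            centroids.append(seq)
            otus[seq] = n
    return otus


if __name__ == "__main__":
    reads = ["AAAAAAAAAA"] * 3 + ["AAAAAAAAAT"] + ["CCCCCCCCCC"] * 2
    print(greedy_otus(reads, threshold=0.9))  # {'AAAAAAAAAA': 4, 'CCCCCCCCCC': 2}
```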
Project 2: Microbial Community
   Taxonomic identification
     OTUs are classified by comparing them to known 16S sequences
     Level of classification (e.g. family vs. genus)?
   Diversity
     Within sample (alpha diversity)
     Between samples (beta diversity)
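Two widely used measures, shown as an illustration rather than the specific metrics used in this project: the Shannon index for within-sample (alpha) diversity and Bray-Curtis dissimilarity for between-sample (beta) diversity, both computed from OTU count vectors listed in the same OTU order. The sample counts are hypothetical.

```python
import math


def shannon(counts):
    """Within-sample (alpha) diversity: H' = -sum(p_i * ln(p_i)) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Between-sample (beta) dissimilarity: 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    if len(a) != len(b):
        raise ValueError("count vectors must list the same OTUs in the same order")
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


if __name__ == "__main__":
    sample_a = [50, 30, 20, 0]  # hypothetical OTU counts
    sample_b = [10, 10, 40, 40]
    print(f"Shannon A = {shannon(sample_a):.3f}, B = {shannon(sample_b):.3f}")
    print(f"Bray-Curtis(A, B) = {bray_curtis(sample_a, sample_b):.3f}")
```

Counts are often rarefied to a common depth first, because both measures are sensitive to the number of reads per sample.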

More Related Content

What's hot

Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
Dayananda Salam
 

What's hot (20)

Metagenomics analysis
Metagenomics  analysisMetagenomics  analysis
Metagenomics analysis
 
Introduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqIntroduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seq
 
Rna seq and chip seq
Rna seq and chip seqRna seq and chip seq
Rna seq and chip seq
 
How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)
 
Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013Prokka - rapid bacterial genome annotation - ABPHM 2013
Prokka - rapid bacterial genome annotation - ABPHM 2013
 
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun SequencesTools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
 
The Era of the Microbiome - Talk by Jonathan Eisen
The Era of the Microbiome - Talk by Jonathan Eisen The Era of the Microbiome - Talk by Jonathan Eisen
The Era of the Microbiome - Talk by Jonathan Eisen
 
Genome analysis2
Genome analysis2Genome analysis2
Genome analysis2
 
BLAST and sequence alignment
BLAST and sequence alignmentBLAST and sequence alignment
BLAST and sequence alignment
 
Assembly and gene_prediction
Assembly and gene_predictionAssembly and gene_prediction
Assembly and gene_prediction
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
HMP PPT
HMP  PPTHMP  PPT
HMP PPT
 
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
 
Human Microbiome
Human Microbiome Human Microbiome
Human Microbiome
 
Impacts of genomics, proteomics, and metabolomics ppt
Impacts of genomics, proteomics, and metabolomics pptImpacts of genomics, proteomics, and metabolomics ppt
Impacts of genomics, proteomics, and metabolomics ppt
 
Introduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsIntroduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR Genomics
 
What is n50 quality measure of genome assembly
What is n50 quality measure of genome assemblyWhat is n50 quality measure of genome assembly
What is n50 quality measure of genome assembly
 
Bacterial taxonomy & classification
Bacterial taxonomy & classificationBacterial taxonomy & classification
Bacterial taxonomy & classification
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Explaining the assembly model
Explaining the assembly modelExplaining the assembly model
Explaining the assembly model
 

Similar to Ngs microbiome

A Comparison of NGS Platforms.
A Comparison of NGS Platforms.A Comparison of NGS Platforms.
A Comparison of NGS Platforms.
mkim8
 
Bioinformatics workshop Sept 2014
Bioinformatics workshop Sept 2014Bioinformatics workshop Sept 2014
Bioinformatics workshop Sept 2014
LutzFr
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
Dongyan Zhao
 

Similar to Ngs microbiome (20)

A Comparison of NGS Platforms.
A Comparison of NGS Platforms.A Comparison of NGS Platforms.
A Comparison of NGS Platforms.
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_course
 
Making powerful science: an introduction to NGS and beyond
Making powerful science: an introduction to NGS and beyondMaking powerful science: an introduction to NGS and beyond
Making powerful science: an introduction to NGS and beyond
 
Bio305 genome analysis and annotation 2012
Bio305 genome analysis and annotation 2012Bio305 genome analysis and annotation 2012
Bio305 genome analysis and annotation 2012
 
RNA-seq Analysis
RNA-seq AnalysisRNA-seq Analysis
RNA-seq Analysis
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
nextgenerationsequencing-170606100132.pdf
nextgenerationsequencing-170606100132.pdfnextgenerationsequencing-170606100132.pdf
nextgenerationsequencing-170606100132.pdf
 
Lecture 2 , mbbs students. pcr, rt pcr,
Lecture 2 , mbbs students. pcr, rt pcr,  Lecture 2 , mbbs students. pcr, rt pcr,
Lecture 2 , mbbs students. pcr, rt pcr,
 
Bioinformatics workshop Sept 2014
Bioinformatics workshop Sept 2014Bioinformatics workshop Sept 2014
Bioinformatics workshop Sept 2014
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
 
20140711 4 e_tseng_ercc2.0_workshop
20140711 4 e_tseng_ercc2.0_workshop20140711 4 e_tseng_ercc2.0_workshop
20140711 4 e_tseng_ercc2.0_workshop
 
NGx Sequencing 101-platforms
NGx Sequencing 101-platformsNGx Sequencing 101-platforms
NGx Sequencing 101-platforms
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
 
Xin Zhou - Saturday Closing Plenary
Xin Zhou - Saturday Closing PlenaryXin Zhou - Saturday Closing Plenary
Xin Zhou - Saturday Closing Plenary
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
Introduction to NGS
Introduction to NGSIntroduction to NGS
Introduction to NGS
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycleRNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle
 
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
 
New High Throughput Sequencing technologies at the Norwegian Sequencing Centr...
New High Throughput Sequencing technologies at the Norwegian Sequencing Centr...New High Throughput Sequencing technologies at the Norwegian Sequencing Centr...
New High Throughput Sequencing technologies at the Norwegian Sequencing Centr...
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Ngs microbiome

  • 1. Next-Generation Sequencing of Microbial Genomes and Metagenomes Christine King Farncombe Metagenomics Facility Human Microbiome Journal Club July 13, 2012
  • 2. Overview  Next-generation sequencing  Applications  Instruments  Library prep and sequencing chemistry  Sequence quality  Project overview  Microbial genomes  Microbial communities
  • 3. DNA Sequencing  1st generation  Sanger chain termination  Capillary electrophoresis  2nd generation (NGS)  High throughput, “massively parallel”  Shorter reads  Sequencing-by- synthesis  3rd generation  Single molecule
  • 4. Applications  DNA sequencing  De novo genomes  Resequencing  Shotgun (e.g. mutant strains)  Amplicon (e.g. HLA, cancer)  Sequence capture (e.g. exome)  Metagenome  Amplicon (e.g. 16S, COI, viral)  Shotgun  ChIP  RNA sequencing  Gene expression  Gene annotation, splice variants
  • 6. Instruments Total # of Read Cost outp Run Instrument read length per Technology ut Time s (bp) base (Gb) GS FLX 1M 450 0.5 $$$$ ++ GS FLX+ 1M 650 0.6 $$$$ ++ emPCR, SBS, light detection GS Jr 100K 450 0.05 $$$$ ++ GAIIx 640M 2x 150 90 $$ +++ HiSeq 2000 6B 2x 100 600 $ +++ Bridge PCR, SBS, fluororphore MiSeq 12M 2x 150 2 $$ ++ PacBio RS >10K >1000 0.01 $$$$ + Single-molecule seq, fluorophore SOLiD 5500xl 1.4B 75 + 35 155 $ +++ emPCR, probe ligation, fluorophore Ion PGM - 1M >100 0.1 $$$ + 316 emPCR, SBS, pH change Ion PGM - 6M >100 1 $$ + 318
  • 7. Which instrument(s) to use?  Read length vs number of reads  Cost per base, per sample, per project (multiplexing?)  Accuracy  Run time, wait time Application Lengt # Accura Instruments Considerations h Reads cy De novo +++ ++ ++ MiSeq, 454, Ion Mix lengths (small) De novo +++ +++ ++ HiSeq, 454, Mix lengths, MP (large) SOLiD Re-seq ++ ++ ++ MiSeq, Ion Multiplex? (small) Re-seq (large) ++ +++ ++ HiSeq, SOLiD Enrichment? RNA-seq + +++ + Illumina, SOLiD, Ref? Size? (count) Ion Rare?
  • 8. Library Preparation  Goal: fragments of DNA, each end flanked by adaptor sequences  Adaptors contain amplification- and sequencing primer binding sites; platform- and chemistry-specific  Optional: sample-specific barcodes/indexes/MIDs/tags allow multiplexing during sequencing  Library QC: quantity, size
  • 9. Library Preparation  Library types:  Shotgun (DNA)  May begin with ChIP  May follow with sequence capture  Mate pair (DNA)  Amplicon (DNA)  Total RNA  May enrich for mRNA (poly-A enrichment, rRNA depletion)  Convert to cDNA (then similar to DNA protocols)  Small RNA  RNA ligations, convert to cDNA after
  • 10. Library Preparation: Shotgun  Fragmentation  Sonication  Nebulization  Enzymatic  End repair  3’ overhangs digested  5’ overhangs filled  5’ phosphate added
  • 11. Library Preparation: Shotgun  Adapter ligation  T-overhangs  Forked structure controls orientation  Library amplification  Few cycles  Enrich for correctly-adapted fragments  Required to complete adapter structure in some protocols  Size selection  Gel excision, AMPure beads  Limit insert size as needed, remove artifacts
  • 12. Library Preparation: Amplicon  Amplify region of  Primers contain interest using PCR adapter sequences
  • 13. Library Preparation: Mate Pair  Begin with large fragments (e.g. 3kb, 20kb)  Circularize and fragment again  Illumina: direct ligation  454: Cre/Lox recombination  Enrich for fragments containing the junction  Proceed with shotgun library prep
  • 14. Library Preparation: Mate Pair  Why? Paired sequences are a known distance apart; improves genome assembly  Note: 454 calls these “paired end libraries”, not to be confused with Illumina’s “paired end sequencing”!
  • 15. Sequencing: Illumina  Cluster generation  Library fragments hybridize to oligos on the flow cell  New strand synthesized, original denatured, removed  Free end binds to adjacent oligos (bridge formation)  Complimentary strand synthesized, denatured (both tethered to flow cell)  Repeat to form clonal cluster  Cleave one oligo, denature to leave ssDNA clusters  ~800K clusters/mm^2
  • 16. Sequencing: Illumina  Variety of workflows:  Single- or paired end reads  0, 1, or 2 index reads
  • 17. Sequencing: Illumina  At each cycle, all 4 fluorescently-labeled nucleotides pass over the flow cell  Each cluster incorporates one nt (terminator) per cycle  Fluor is imaged, then cleaved  De-block and repeat
  • 18. Sequencing: Illumina  Other terminology:  cBot – accessory instrument that performs cluster generation  Lanes – divisions (8) of HiSeq and GAIIx flow cells  PhiX – bacteriophage with small, balanced genome; PhiX library spiked in with samples for QC  Phasing/pre-phasing – nt incorporation falls behind or jumps ahead on a portion of strands in the cluster and contributes to noise  Chastity filter – measures signal purity (after intensity corrections); if the background signal is high, cluster will be discarded  BaseSpace – cloud computing site for processing MiSeq data  File format: fastq
  • 19. Sequencing: 454  emPCR: clonal amplification of bead-bound library in microdroplets  Library input amounts critical!  One molecule per bead  Titration procedure
  • 20. Sequencing: 454  Library capture: beads coated with complimentary oligo  Amplification: droplet contains PCR reagents and the other oligo  Post-PCR: millions of identical fragments attached to the bead
  • 21. Sequencing: 454  Bead Recovery:  Enrichment: capture physical and successfully chemical disruption amplified beads using biotinylated primers + magnetic, streptavidin beads
  • 22. Sequencing: 454  Deposit bead layers onto PicoTiterPlate:  Enzyme beads  Enriched DNA beads  More enzyme beads  PPiase beads
  • 24. Sequencing: 454  Pyrosequencing  4 nucleotides flow separately  If nt incorporation…PPi...light  APS + PPi (sulfurylase) ATP  Luciferin + ATP (luciferase) light + oxyluciferin  Amount of light proportional to #nt incorporated  Rinse and repeat with next nt
  • 25. Sequencing: 454  Camera captures light emitted from every well during every nucleotide flow
  • 26. Sequencing: 454  Flowgram: representation of a sequence, based on the pattern of light emitted from a single well
  • 27. Sequencing: 454  Other terminology:  Lib-L/Lib-A: adapter variants, “ligated” or “annealed”  Titanium chemistry: ~450 bp reads on all instruments  XL+ chemistry: ~700 bp reads on the FLX+ instrument  Flow: one of the four nucleotides flows over the PTP  Cycle: a set of four flows, in order  Valley flow: if number of bases incorporated in a given read during that flow is uncertain, e.g. 1.5 units of light (background signal, homopolymers)  File format: sff (standard flowgram format)
  • 28. Sequencing: Ion Torrent. Procedures and chemistry similar to 454. Instead of detecting PPi via light, it measures H+ release (a pH change) on a semiconductor chip. No expensive camera or laser is required, and no modified nucleotides.
  • 29. Sequence Quality. Error probabilities are determined using training sets and reflect platform-specific biases. They are expressed as a quality value (QV or Q score) per base, similar to PHRED scores: $Q = -10 \log_{10} P$, equivalently $P = 10^{-Q/10}$.

      Phred (Q) score   Probability of error (P)   Base call accuracy
      10                1 in 10                    90%
      20                1 in 100                   99%
      30                1 in 1K                    99.9%
      40                1 in 10K                   99.99%
      50                1 in 100K                  99.999%
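The two quality formulas, plus decoding of per-base qualities from a FASTQ quality string, can be written as a short Python sketch. The ASCII offset of 33 (the Sanger / Illumina 1.8+ encoding) is an assumption; some older Illumina pipelines used an offset of 64.

```python
import math


def phred_to_error(q: float) -> float:
    """P = 10^(-Q/10)"""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Q = -10 log10(P)"""
    return -10 * math.log10(p)


def decode_qualities(qual: str, offset: int = 33) -> list:
    """Convert a FASTQ quality string to integer Q scores (offset 33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def expected_errors(qual: str, offset: int = 33) -> float:
    """Expected number of miscalled bases in a read: the sum of per-base error probabilities."""
    return sum(phred_to_error(q) for q in decode_qualities(qual, offset))


print(decode_qualities("I5+"))           # [40, 20, 10]
print(round(expected_errors("I5+"), 4))  # 0.1101
```

Summing error probabilities gives a per-read "expected errors" value, which is one way to summarize quality before filtering.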
  • 30. Project 1: Microbial Genome  Considerations:  Coverage  Reference genome?  Depth (number of  How much coverage times a particular do I want? base is “covered” by a read (e.g. 25X)  How big is the genome  Breadth (% of genome with at least 1X  How much data do I coverage) need?  bp needed = genome size X coverage  Which instrument/chemistry configuration to use?
  • 31. Project 1: Microbial Genome  Sample preparation  Isolate high quality (not degraded) and high purity (no RNA) gDNA  Verify on a gel  Quantify using dsDNA-specific dye  Library preparation  Can do this yourself if you like  ~ $200 per sample for Nextera  Cheaper protocols  Cheaper in bulk  Barcode compatibility
  • 32. Project 1: Microbial Genome  Library QC  Insertsize confirmed on BioAnalyzer (within range, no artifacts)  Pool barcoded libraries (normalize based on PicoGreen quantification)  Absolute quantification of library pools using qPCR
  • 33. Project 1: Microbial Genome  MiSeq sequencing  Diluteand denature library pool (optimal concentration requires titration...)  Spike in PhiX library as needed (e.g. 1%)  Prepare and load reagents, flow cell  Basic filtering and de-multiplexing performed automatically  Download fastq files from BaseSpace
  • 34. Project 1: Microbial Genome  Data processing  Assembly:  Additional filtering overlapping reads  Trim the ends are assembled to  Remove PCR eachother based on duplicates sequence similarity = contigs
  • 35. Project 1: Microbial Genome  What’s next?  Polish the genome (hybrid assemblies, mate pair libraries)  Annotate (ORFs, RNA-seq)  Compare
  • 36. Project 2: Microbial Community  Shotgun  Targeted metagenomics metagenomics  Unbiased survey of  Limited survey of community content community content  Random library  Targeted loci provide fragments may excellent taxonomic provide very little resolution, but may taxonomic resolution exclude certain taxa (e.g. conserved, unknown)  Identify OTUs, classify  Identify genes, by taxonomy classify by function
  • 37. Project 2: Microbial Community  16S rRNA  Multi-copy gene (1.5 kb)  Conserved and hypervariable regions  Extensive databases from known species
  • 38. Project 2: Microbial Community  Considerations:  Sample preparation:  Biases in sampling  Isolate DNA methods, culturing,  PCR amplify, purify DNA isolation,  High-fidelity PCR...replicate polymerase  Available SOPs  Barcoded primers  How many reads per  No primer dimers! sample?  NormalizePCR  Read length products and pool matters!
  • 39. Project 2: Microbial Community  454 Sequencing  Data processing  emPCR titrations  De-multiplexing with different library  Additionalfiltering input  Trim the barcodes,  Bulk emPCR primers  Sequence  Check for chimeras  Basic filtering  Collect sff files
  • 40. Project 2: Microbial Community  Clustering  Sequences grouped by similarity = OTUs
  • 41. Project 2: Microbial Community  Taxonomic identification  OTUs are classifed by comparing to known 16S sequences  Level of classification (e.g. family vs genus)?  Diversity  Within sample  Between samples