Bio153 Microbial Genomics

               Professor Mark Pallen
            University of Birmingham
Microbial Genomics
   General features of microbial genomes
   Historical overview
   Genome sequencing, annotation and analysis
   Genome evolution
   What can we learn from a genome sequence?
General features of genomes
Microbial genomes
   Small "WYSIWYG" (what you see is what you get) genomes (Mbp)
   High gene density (>90% coding)
       Short intergenic regions
       Very little repetitive or non-coding DNA
       Introns very rare
   Protein-coding genes (CDSs) short (~1 kbp)
   Operons, with promoters just upstream
   Fewer non-coding RNAs
Human genome
   Very large genome (Gbp)
   Low gene density
       Only 25% is genes
       Because of introns, only ~1% codes for protein
   Genes can span ≥30 kbp
   Genes have ~3 transcripts
       Splicing and splice variants
Bacterial genome organisation

Chromosomes
   Most commonly a single circular chromosome (always DNA)
       BUT some species have linear chromosome(s) (e.g. Borrelia, Streptomyces, Rhodococcus)
       BUT a few species have two chromosomes (e.g. Vibrio cholerae)
   Can be a mix of circular and linear (e.g. Agrobacterium tumefaciens, B. burgdorferi)
Plasmids
   Independent, autonomous replicons; can be circular or linear
   May integrate into the chromosome
   Copy number varies from 1 to tens
   Often carry non-essential genes that confer an adaptive advantage in certain conditions
Bacterial Genome Size
   Species that occupy restricted ecological niches (e.g. obligate intracellular parasites and
     endosymbionts) tend to have smaller genomes (<1.5 Mb) than generalist bacteria
       Smallest known bacterial genome: Carsonella ruddii, 160 kb! (Nakabachi et al. 2006)
       BUT mitochondrial genomes are smaller
   The largest genomes are found in bacteria with complex developmental cycles, e.g. Streptomyces
       Largest bacterial genome: Sorangium cellulosum, 13 Mb
Bacterial genomes are made from DNA
   In 1944, Oswald Avery, Colin MacLeod, and Maclyn
    McCarty showed that DNA (not proteins) was the genetic
    material responsible for inheritance.
       Identified DNA as the "transforming principle" while studying
        Streptococcus pneumoniae
       Avery OT, MacLeod CM, and McCarty M. Studies on the chemical nature
        of the substance inducing transformation of pneumococcal types.
        Journal of Experimental Medicine. 1944 Feb 1; 79(2): 137-158.
   In 1952, this work was supported by Alfred Hershey and
    Martha Chase who showed that only the DNA of a virus
    needs to enter a bacterium to infect it.
       Used radioactively labelled bacteriophage
       Hershey AD and Chase M. Independent functions of viral
        protein and nucleic acid in growth of bacteriophage. Journal of
        General Physiology. 1952. 36: 39-56.
Viral genomes are variable
   Use either RNA or DNA in their genome, but not both
       Some have RNA genomes!
   Grouped into families depending on type of genome: DNA or RNA, single- or double-stranded
   Genome size
       Typically dozens of genes or fewer
       Large genomes in poxviruses (~200 kb)
       Massive genomes in megaviruses (~1 Mbp!)
Microbial Genomics Timeline

Year   Milestone
1977   Invention of dideoxy chain terminator sequencing (“Sanger sequencing”)
1977   Sequencing of the 5.4-kilobase genome of bacteriophage phiX174
1981   First human mitochondrial genome sequence*
1982   Determination of the 48.5-kilobase genome sequence of bacteriophage lambda through first use
       of shotgun sequencing
1986   Development of automated fluorescent sequencing
1995   First complete genome sequences of free-living bacteria obtained (Haemophilus influenzae and
       Mycoplasma genitalium)
1996   Mycoplasma becomes first bacterial genus that has completely sequenced genomes from two
       different species (M. genitalium and M. pneumoniae)
1997   First genome sequences from Escherichia coli and Bacillus subtilis
1998   First genome sequence from Mycobacterium tuberculosis; genome sequence from
       Rickettsia prowazekii provides first evidence of reductive evolution
Microbial Genomics Timeline
Year    Milestone
1999    Helicobacter pylori becomes the first species with completely sequenced genomes from two
        isolates
2000    Meningococcal genome sequence primes first application of reverse vaccinology
2001    Second E. coli genome sequences reveal unexpected level of horizontal gene transfer;
        genome sequence of M. leprae provides compelling evidence of bacterial pseudogenes and
        reductive evolution; first paper reporting genome sequences of two strains from one species
        (Staphylococcus aureus) in a single publication.
2002    Genome sequencing of multiple strains of Bacillus anthracis to provide markers for forensic
        epidemiology
2003    Genome sequencing of uncultivable Tropheryma whipplei leads to design of axenic growth
        medium
2004    Genome sequence of mimivirus blurs distinctions between bacteria and viruses;
        bacterial metagenomics survey of the Sargasso Sea yields >1 million new genes
2005    Whole-genome sequencing used to identify target of new anti-tuberculosis drug;
        Mycoplasma genitalium genome sequenced using pyrosequencing
2006-2011  Rise of next-generation or high-throughput sequencing
The first genome sequences
   The first sequenced gene was from bacteriophage MS2
       The gene encoding the coat protein
       1972
       Min Jou W, Haegeman G, Ysebaert M, and Fiers W. Nucleotide
        sequence of the gene coding for the bacteriophage MS2 coat
        protein. Nature. 1972 May 12; 237(5350): 82-88.
   The first sequenced genome was bacteriophage MS2
       1976
       RNA genome is 3,569 nucleotides
       Fiers W, Contreras R, Duerinck F, Haegeman G, Iserentant
        D, Merregaert J, Min Jou W, Molemans F, Raeymaekers A, Van
        den Berghe A, Volckaert G, and Ysebaert M. Complete
        nucleotide sequence of bacteriophage MS2 RNA: primary and
        secondary structure of the replicase gene. Nature. 1976 Apr 8;
        260(5551): 500-507.
The first genome sequences
   The first sequenced DNA genome was bacteriophage ΦX174
       1977
       5,386 nucleotides (single-stranded DNA genome)
       Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes
        CA, Hutchison CA, Slocombe PM, and Smith M. Nucleotide
        sequence of bacteriophage phi X174 DNA. Nature. 1977 265
        (5596): 687-695.
   The first sequenced bacterial genome was Haemophilus
    influenzae
       1995
       1,830,140 base pairs
       Fleischmann R, Adams M, White O, Clayton R, Kirkness
        E, Kerlavage A, Bult C, Tomb J, Dougherty B, and Merrick J.
        Whole-genome random sequencing and assembly of
        Haemophilus influenzae Rd. Science. 1995 269(5223): 496-512.
Overview of a genome project
   Choose strain
       Fresh isolate or tractable lab strain?
   Choose strategy
       Shotgun sequencing
       Paired-end sequencing
       Draft or complete?
   Choose chemistry
       Sanger; 454; Illumina; Ion Torrent
   Assembly
       Automated
   Closure and finishing
       Manually intensive
       Difficulty depends on how repetitive the genome is
   Data release
       Immediate or delayed?
   Annotation
       Manually intensive bottleneck
   Publication
Methods for genome sequencing – historic
Sanger method sequencing
   Sanger F and Coulson AR. A rapid method for
    determining sequences in DNA by primed synthesis
    with DNA polymerase. Journal of Molecular Biology.
    1975 94: 441-448.
   Step 1, a sequence-specific DNA primer is radiolabeled
   Step 2, the primer is annealed to the template DNA
   Step 3, the primer is extended by DNA polymerase
       Incorporation of a deoxynucleotide - further extension possible
       Incorporation of a dideoxynucleotide – chain termination
   Four reactions set up
       ddATP, dATP, dCTP, dGTP, dTTP
       ddCTP, dATP, dCTP, dGTP, dTTP
       ddGTP, dATP, dCTP, dGTP, dTTP
       ddTTP, dATP, dCTP, dGTP, dTTP
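The chain-termination logic of the four reactions can be sketched as a toy model. It works on the strand being synthesised (complementary to the template, read 5′ to 3′ from the primer), assumes every possible terminated fragment is produced, and ignores the enzyme, primer and gel; `dideoxy_reactions` and `read_gel` are hypothetical helper names.

```python
from collections import defaultdict


def dideoxy_reactions(new_strand: str) -> dict[str, list[int]]:
    """For each ddNTP reaction, return the lengths of the chain-terminated fragments.

    In the reaction containing ddXTP, synthesis can stop at every position
    where base X is added, so a fragment of that length appears.
    """
    fragments: dict[str, list[int]] = defaultdict(list)
    for position, base in enumerate(new_strand, start=1):
        fragments[base].append(position)  # fragment length = position of the dd base
    return {dd: fragments[dd] for dd in "ACGT"}


def read_gel(reactions: dict[str, list[int]]) -> str:
    """Read the ladder from the shortest fragment to the longest, one band per length."""
    band_to_base = {
        length: base for base, lengths in reactions.items() for length in lengths
    }
    return "".join(band_to_base[length] for length in sorted(band_to_base))


reactions = dideoxy_reactions("GATTACA")
print(reactions)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(reactions))  # GATTACA
```

Reading the bands from the shortest fragment upwards, noting which of the four reactions each band came from, recovers the sequence of the new strand.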
Methods for genome sequencing – automated Sanger sequencing
   Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C,
    Kent SBH, and Hood LE. Fluorescence detection in automated DNA
    sequence analysis. Nature. 1986 321: 674-679.
   Replaced radioisotopes with fluorescent dyes
       Safer for the researchers
       Each of the four base-specific reactions could be labelled with a different-coloured dye
       Eliminated the need to run the four reactions in separate lanes
       Fluorescence allowed the labelled fragments to be detected as they migrated
       This allowed automatic reading of the gel
   Further improvements were made
       Improved dye chemistry using fluorescent dideoxy-terminators (DuPont): Prober
        JM, Trainor GL, Dam RJ, Hobbs FW, Robertson CW, Zagursky RJ, Cocuzza AJ,
        Jensen MA, and Baumeister K. A system for rapid DNA sequencing with
        fluorescent chain-terminating dideoxynucleotides. Science. 1987 238: 336-341.
       Replacing slab gels with re-useable capillary tubes: Ruiz-Martinez MC, Berka J,
        Belenkii A, Foret F, Miller AW, and Karger BL. DNA sequencing by capillary
        electrophoresis with replaceable linear polyacrylamide and laser-induced
        fluorescence detection. Analytical Chemistry 1993 65: 2851-2858.
Whole-Genome Shotgun Sanger Sequencing
[Figure: bacterial chromosome → random shearing → size selection → cloning into plasmid vector → pick colonies to create shotgun library → plasmid preps → sequence each insert with two primers]
High-throughput Sequencing
   100x faster, 100x cheaper!
       A disruptive technology
   Several technologies in the marketplace from 2007
    onwards
       454 (Roche)
       Illumina
       Ion Torrent
       PacBio
   Fundamentally new approaches
       Solid-phase amplification of clonal templates in “molecular
        colonies”
           Massive increase in number of “clones” compensates for shorter
            read length
       New chemistries for sequence reading
         454: detection of the pyrophosphate released on base addition
         Illumina: reversible terminators (fluorescent bases whose blocking groups are removed after each cycle)
High-Throughput Shotgun Sequencing
[Figure: bacterial chromosome → random shearing → size selection → add adapters → amplify → sequence]
454 sequencing
Emulsion-based clonal amplification
   Anneal sstDNA to an excess of DNA capture beads
   Emulsify beads and PCR reagents in water-in-oil microreactors
   Clonal amplification occurs inside microreactors
   Break microreactors, enrich for DNA-positive beads
Pyrosequencing
    DNA template with primer is mixed with the enzymes (DNA polymerase, ATP sulfurylase,
     luciferase and apyrase) along with the two substrates adenosine 5′-phosphosulfate (APS)
     and luciferin
1.     One of the four nucleotides is added to the reaction.
2.     If it is complementary to the next base in the template strand, DNA polymerase
       incorporates it.
3.     Pyrophosphate (PPi) is released and converted to ATP by ATP sulfurylase in the
       presence of APS.
4.     ATP serves as a substrate for luciferase, producing light.
5.     Excess nucleotides are degraded by apyrase before the next nucleotide is added.
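Because every base in a homopolymer run is incorporated during a single nucleotide addition, the light signal scales with the run length. The idealised simulation below assumes a noise-free signal, a fixed flow order of TACG (an assumption for illustration) and hypothetical helper names; it shows how a flowgram is generated and decoded.

```python
def flowgram(sequence: str, flow_order: str = "TACG", max_cycles: int = 100) -> list[tuple[str, int]]:
    """Idealised pyrosequencing: one (nucleotide, signal) pair per nucleotide flow.

    If the flowed nucleotide matches the next base(s) to be synthesised, the whole
    homopolymer run is incorporated, so the signal equals the run length.
    Non-matching flows give a signal of 0.
    """
    signals: list[tuple[str, int]] = []
    pos = 0
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            run = 0
            while pos < len(sequence) and sequence[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append((nucleotide, run))
            if pos == len(sequence):
                return signals
    return signals


def call_bases(signals: list[tuple[str, int]]) -> str:
    """Decode a flowgram: a signal of n means n copies of that nucleotide."""
    return "".join(nucleotide * count for nucleotide, count in signals)


flows = flowgram("TTAGGC")
print(flows)              # [('T', 2), ('A', 1), ('C', 0), ('G', 2), ('T', 0), ('A', 0), ('C', 1)]
print(call_bases(flows))  # TTAGGC
```

In real data the signal for a run of n bases is noisy, so the exact length of long homopolymers is hard to call; this is a well-known source of insertion and deletion errors in 454 reads.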
Illumina Sequencing
The Sequence Assembly Problem
   Sequencing technologies generate reads of <1000
    bp
   These reads must be assembled into a single
    continuous genomic sequence.
   Shotgun sequencing exploits many overlapping
    sequences (high coverage) to infer ordering directly
    from the sequences themselves
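A minimal greedy overlap assembler illustrates how ordering can be inferred from the reads themselves. It is a teaching sketch that assumes error-free reads from one strand, no read contained inside another and a hypothetical minimum overlap of 3 bases; real assemblers use overlap graphs or de Bruijn graphs rather than this greedy loop.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap until none overlap."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs  # whatever is left could not be joined
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))  # ['ATGCGTACGTTAGC']
```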
The Repeat Problem
   Repeats at read ends can be assembled in multiple
    ways
    Correct
    ATTTATGTGTGTGTGGTGTG
                 GTGTGGTGTGCACTACTGCT
                           ACTACTGCTGACTACTGTGTGGTGTG
                                         GTGTGGTGTGATATCCCT

    Incorrect
     ATTTATGTGTGTGTGGTGTG
                   GTGTGGTGTGATATCCCT


                ACTACTGCTGACTACTGTGTGGTGTG
                              GTGTGGTGTGCACTACTGCT
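Using the four reads shown above, a short sketch makes the ambiguity explicit: the GTGTGGTGTG repeat lets read 1 (and read 3) overlap equally well with two different reads, so overlaps alone cannot choose the correct path. The read names and the 8-base minimum overlap are assumptions for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


reads = {
    "r1": "ATTTATGTGTGTGTGGTGTG",
    "r2": "GTGTGGTGTGCACTACTGCT",
    "r3": "ACTACTGCTGACTACTGTGTGGTGTG",
    "r4": "GTGTGGTGTGATATCCCT",
}


def candidate_successors(reads: dict[str, str], min_len: int = 8) -> dict[str, list[str]]:
    """For each read, list every other read whose prefix overlaps its suffix."""
    return {
        name: sorted(
            other for other, seq in reads.items()
            if other != name and overlap(reads[name], seq, min_len) > 0
        )
        for name in reads
    }


print(candidate_successors(reads))
# {'r1': ['r2', 'r4'], 'r2': ['r3'], 'r3': ['r2', 'r4'], 'r4': []}
```

Both r1 → r2 → r3 → r4 (correct) and r1 → r4 (incorrect, collapsing the repeat) are consistent with these overlaps; extra information such as read pairs a known distance apart is needed to break the tie.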
Paired-end sequencing
[Figure: bacterial chromosome → random shearing → size selection for 3 kb or 8 kb etc. → add linkers → circularise → shear and select on size and presence of linkers → add adapters → sequencing → obtain sequences from either side of the linker, a known distance apart in the genome]
   Create long fragments of known length
   Obtain sequence from paired ends a known distance apart
   Allows assembly of contigs across repeats into scaffolds
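Because both reads of a pair come from a fragment of approximately known length, a pair whose reads land in different contigs both orders the contigs and estimates the gap between them. The sketch assumes a hypothetical 3 kb library, assumes contig A precedes contig B in the orientation implied by the pair, and uses invented read coordinates.

```python
from statistics import mean


def estimate_gap(insert_size: int, contig_a_len: int, pairs: list[tuple[int, int]]) -> float:
    """Estimate the gap (bp) between contig A and contig B from read pairs spanning it.

    Each pair is (start of read 1 on contig A, end of read 2 on contig B), both
    0-based with the end exclusive. The part of the fragment not accounted for
    by the two contigs is the gap.
    """
    estimates = [
        insert_size - (contig_a_len - start_a) - end_b
        for start_a, end_b in pairs
    ]
    return mean(estimates)


pairs = [(8_500, 1_200), (8_400, 1_250), (8_600, 1_000)]
print(round(estimate_gap(3_000, 10_000, pairs)))  # 350
```

A negative estimate would suggest that the two contigs actually overlap; real scaffolders combine many pairs and model the spread of fragment sizes rather than taking a plain mean.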
Genome Assembly
[Figure: contigs (Contig 1, Contig 2, Contig 3) linked into a scaffold, showing a sequence gap and a physical gap]
Re-sequencing
   Short reads (<200 bp) make de novo assembly inefficient
   Instead, they are mapped against a reference genome
   Re-sequencing is like assembling a jigsaw puzzle using the image on the lid
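A toy version of re-sequencing: index the reference by k-mers, place each read using its first k-mer as a seed, then pile up the bases observed at each reference position and report where the majority disagrees with the reference. Everything here (k = 5, the mismatch limit, the minimum read support, the sequences) is an illustrative assumption; real mappers such as BWA or Bowtie use compressed Burrows-Wheeler indexes and account for indels, sequencing errors and base qualities.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int,
             max_mismatches: int = 2) -> Optional[int]:
    """Seed with the read's first k-mer, then keep the hit with the fewest mismatches."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(r != g for r, g in zip(read, window))
        if mismatches < best_mm:
            best_pos, best_mm = pos, mismatches
    return best_pos


def call_snps(reads: list[str], reference: str, k: int = 5,
              min_support: int = 2) -> dict[int, tuple[str, str]]:
    """Return {position: (reference base, majority read base)} where they differ."""
    index = build_index(reference, k)
    pileup: dict[int, Counter] = defaultdict(Counter)
    for read in reads:
        pos = map_read(read, reference, index, k)
        if pos is None:
            continue  # unmapped read
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    snps = {}
    for pos, counts in sorted(pileup.items()):
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support >= min_support:
            snps[pos] = (reference[pos], base)
    return snps


reference = "ACGTTGCATGACCTGAAGTC"
reads = ["ACGTTGCA", "TTGCATGG", "TGCATGGC", "GCATGGCC"]  # sample carries A->G at position 10
print(call_snps(reads, reference))  # {10: ('A', 'G')}
```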
Genome annotation
   Annotation is the addition of information about the
    predicted sequence features to the flat file of DNA code
   Identification of potential coding sequences - CDS
   Homology searches to predict function
   Other features can be annotated as well
       rRNAs
       Potential promoters
       tRNAs
       Small non-coding RNAs
       Repeat sequences
       Insertion sequences (ISs), transposons, gene fragments
   Location of the origin of replication
   Determination of the number of bases, genes, and
    G+C%.
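Two of these quantities are simple to compute directly from the sequence. The sketch calculates G+C% and a cumulative GC skew, (G − C)/(G + C) summed over consecutive windows. In many bacteria the leading strand is G-rich, so the minimum of the cumulative skew is a common first estimate of the origin of replication; the window size, the toy sequence and `predicted_origin` are assumptions for illustration.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def cumulative_gc_skew(seq: str, window: int) -> list[float]:
    """Running total of (G - C) / (G + C) over consecutive, non-overlapping windows."""
    total, values = 0.0, []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if g + c else 0.0
        values.append(total)
    return values


def predicted_origin(seq: str, window: int) -> int:
    """End coordinate of the window where the cumulative skew is lowest (hypothetical helper)."""
    skew = cumulative_gc_skew(seq, window)
    lowest = min(range(len(skew)), key=skew.__getitem__)
    return (lowest + 1) * window


toy = "ACCA" * 2 + "AGGA" * 2      # C-rich half followed by a G-rich half
print(gc_content(toy))             # 50.0
print(cumulative_gc_skew(toy, 4))  # [-1.0, -2.0, -1.0, 0.0]
print(predicted_origin(toy, 4))    # 8
```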
How to go from this...?
>Escherichia coli K-12 MG1655_3870656-3890655
      TGCTGCTGCCTGCTGCGCGGTGCGCTCTACGGATTGCCCGGCGCGATAGAGATCGCTGCCTAAGCCCGCCCCTGCACAACCTGCGTCTATCCACTGCGCCAGGTTTTCTGCGTCACGCCGCAAC
      GGCAAAGACTGCGATGTCCGATGGCAATACCGCTTTTAACGCTTTGATGTATTGCGGACCAAAAGCCGATGACGGAAATATTTTCAGCGCCTGCGGCGCCCGCTTCGAGCGCGGTAAAGGCTTCG
      GTCGCCGTCGCGCAGCCGGGGCAGACGTCATGCCGTAGCCCACCGCACGGCGGATCACTTCACTATGGATATTGGGCGTAACGATGAGCTGACAGCCCATCCTGGCGAGCGCATCGACCTGTT
      CAGGTTTCAGTACCGTACCTGCGCCAATCAACGCCTTGTCGCCGTACGCATCAACGATGCGGGAATGCTTTGCTCCCATTGTGGGGAATTCAGCGGGATTTCAACCGCGTCGAACCCGGCGTCAA
      TCACCGCGCCAACATGCGCCAGCGCCTCGTCGGGCGTAATACCGCGCAAAATGGCGATCAGCGGGAGTTTAGTTTGCCACTGCATGAGGATGCTCCTTATACCAGCCTGAAATGCCGTGTCGCC
      CGCCACCGCCGTCACGTCGCAACCCATCGCCTGAAAGGCTTGCTGGTAGCGCGCGGTCAGCGATGTTCCGGCGACAAGGGTGATGGCGTGTTGATGGGCCACATAGTCGCGCATACTGGGACC
      TCTGCGCCAATCAACAAACCAGAGAGAAATTCGCTGACCTGTTCGCGGGGAAGTGTTCCCAGCACATGCGAGGCGCGAACTTCAAAAAGCTGCGGCAATATGGCGGGCGTATTAAGACCACGCT
      CAAGGCCAGCTGTGAAGGCATCGGCAGGTTTTCCTGCGGCGGCAAACCTGCGCCAATCAATGAGTGATTTAACAGTAAATGATGTAATTCACCGGTCATCACGGTGCGAAAATCGTTGATTTGCTG
      GCTATCGGCCTGCACCCATTTGCAATGGGTTCCGGGCATGACATAAAGAGAGGAAGAGCCAGAGCTCGCGCGCCGATCAATTGTGTTTCTTCGCCGCGCATCACATTGTGGTTATCGTCATGAGA
      GACACATAATCCGGGAATAATCCAGATATTGTCGCCAACTGACGTTAATTGTTCGCCAATAGACGAAAAACAGGCAGGAACAGATAATACGGTGCAACTTTCCAGCCGACGTTGCTGCCAACCATT
      CCTGCCATTACCACTGGCGTTTTCTCTTCACGCCAGTCGGTCGTGACTTCTGCTAACACCGCAGCCGGAGATTTTCCGTTCAGGCGCGTGACGCCTGCTTCTGATTGCCTGCTCTCAGGCAGTGG
      TCGCCCTGATAAAGCCAGGCGCGCAGATTGGTCGATCCCCAGTCAATTGCGATGTAGCGAGCTGTCATGTGATTTCCTTTAACCTTCGTGTCGAGCTGGCGATCATGGTAAGCGCCGCCTGCTCT
      GCCGCATCGCCGTCCTGATGCGTATCGCATCGAACAGCGCCTTATGTTCCTGGAGCGTTTGCGGCATGTTGGCCTCATCGCCCATCCAGGTTCGTTCAAAAACCGCCCGCTGCAGCGAACTGATC
      GCAATGCTAAGTTGCTGTAACACCGGGTTATGCACCGACTGCAGCACCGCTCGTGGTAGCGAATATCCGCTTCGTTAAACGCTTCGCGGTCCTGATTGTTGGCAATCATCTCGTTCAGCGCCGATT
      CAATCTGCGCCAGATCGCTGGAAGTCGCGCGCTCTGCTCCCAACGGGCAATCGCCGGTTCCACCAGATTTCGCACTTCGTCATGGCACTGATAAGCCGTGGGTCGTAGTCATTTTCCAGCACCCA
      TTGCAGTACGTCAGTGTCGAGGTAATTCCACTGGTTACGCGGTGCCACAAACGCCCCGCGATAACGTTTCATTTCAATCAGCCGCTTCGCCATCAGCGAACGGAACACCCACGGATGATGTTGCG
      CGAGGTTGCAAACTCCTCACAGAGTTCCGCCTCAGCCGGAAGCGGCGAGCCTGGCACGTATTTGCCGTGAACGATCTGTTTACCCAGCGTAATGACAATGCGATCGGTTTTATTGAGAGTCATGG
      AGAGTCCTTGTGCTTGTATGTTCTTCTCTACTTTACCCCGATCGATGCATAACGCGGCAACTTTGTAGTACCAGCGTGATGACGTTCGCGTTTGCCGTGCGTGTAATGTAGTACAAACTTATATTGTT
      GTACTACAATTTAGATCACAAAAAGAACAATGCATAAAAAATGACATGCGTCGGGCAGAAATCTGAAAAGGGATATCAGGCGCTAAACAGGAGGGAAAGAAGAGTATGCTTTCAACGGCTTAGCTA
      CTCGTTTAAAGGATTAATCATGAAGTTGAATTTTAAGGGATTTTTTAAGGCTGCCGGTTTATTCCCACTGCGCTGATGCTTTCAGGCTGTATCTCGTATGCTCTGGTTTCCCATACCGCAAAGGGTAG
      TTCAGGAAAGTATCAATCGCAGTCAGACACCATCACTGGGCTATCGCAGGCAAAAGATAGTAATGGAACAAAAGGCTATGTTTTTGTAGGGGAATCGTGGATTACCTTATCACTGATGGTGCCGAT
      GACATCGTTAAGATGCTCAATGATCCAGCACTTAACCGGCACAATATTCAGGTTGCCGATGACGCAAGATTTGTTTTAAATGCGGGGAAAAAGAAATTTACCGGCACAATATCGCTTTACTACTACG
      GAATAACGAAGAAGAAAAGGCACTGGCAACGCATTATGGTTTTGCCTGTGGTGTTCAACACTGTACCAGGTCACTGGAAAACCTAAAAGGCACAATCCATGAGAAAAATAAAAACATGGATTACTCA
      AAGGTGATGGCGTTCTACCATCCATTTAAGTGCGATTTTATGAATACTATTCACCCAGAGGCATTCCGGGATGGTGTTTCCGCAGCATTACTGCCAGTGACTGTTACGCTGGACATCATTACTGCAC
      CGCTGCAATTTCTGGTTGTATATGCAGTAAACCAATAATCAGTAAGCGGGCAAACCGTTTATGCTGTTTGCCCGCCCACAGATTAATTCAGCACATACTTCTCAATAGCAAACGCCACGCCATCTTCA
      AGGTTAGATTTGGTGACAAAGTTCGCCACTTCTTTCACTGAAGGAATAGCGTTATCCATCGCCACACCGACGCCTGCATATTAATCATTGCGATATCGTTTTCCTGATCGCCAATCGCCATGATTTCT
      TCCGGTTTAATACCTAACACGTCGGCCAGTGATTTCACCCCCGTACCTTTGTTAACGCGTTTATCGAGGATTTCGAGGAAGTACGGCGCACTTTTCAGCACGGTATATTCTCTTTCACTTCCTGCGG
      AATACGCGCGATAGCCTGGTCGAGGATGGCGGGTTCATCAATCATCATCACTTTCAGGAACTGGGTATTGGGGTCCATTTTCTCCGCTTCGCAGAACACCAGCGGAATGGTGGCAACGAAGGATT
      CATGCACCGTGTGTAGCTGATATCACGGTTGGCGGTGTACAGCGTGGTGCGGTCCAGGGCGTGGAAATGAGAACCGACTTCGCGAGAGAGTTTTTCCAGGAAACGATAGTCGTCATAGCTGAGA
      GCAGTTTGCGCCACGGTGCTACCATCAGCGGCCTTCTGTACCACGCGCCGTTATAAGTAATGCAGTAGTCGCCCGGCTGTTCCATATGCAGCTCTTTCAGGTAGTTGTGCACACCTGCATACGGG
      CGACCCGTCGTTAGCACGACATTCACGCCACGGGCGCGAGCTGCGGCAATCGCATTTTTAACGGCGGGTGAAAGGTGTGATCGGGCAGCAGAAGGGTGCCATCCATATCGATAGCAATGAGTTT
      AATAGCCATGAGTTCCCCAGGTAGATTGGTTCCTGACCCATGCTAACGCGATTCCGCTCAAAAATCAGTACAACACCCGAGGGAAAAGGGGGATGCAACGCGCGTGCGTGCTCCCTTTTTGCTTA
      GCGGAAGAGTTTCCCTTTCAGCAGTTCCATGCCTGCGGAAAGCAGATCGTTATTGGCTTGTGGTGACACTTCACCTTGCGGTGAGAGCGCATCAATAATCTTCGGCAATTGTTCTGCCAGTAAACT
      GGAAGCTGACTGGTATCCACGCCAAGTTTTTGCCCGAGATCGGACACCGCATTTGTGCCGAGCGCCGATTCCAGTTGCTCGCCACTAACCGATTGATTGCCCTGTTGATTACTCAGCCAGGTTGA
      GAGAATGGCCCCTAAGCCGCCACTTTGCAGTTTTTCCACAGCACCTGAATGCCGCCCTGCTCCTCAACCCAACTTAAAATAGCCTGATATTTCCCCGCATCGCCTTTCAGAAAGGCACCGACAACTT
      CATCAAAAAGCCCCATGATAATCACCTGTAAAGCGTTACGTGTTGACCCAAAAAGTATAGATTTGCGGATGATAATTGCGGATTGCAGAAATAAAAAGGGCGGAGATGATCTCCGCCCTTTTCTTAT
      AGCTTCTTGCCGGATGCGGCGTGAACGCCTTATCCGGCCTACAAAATCATGAAAATTCAATACATTGCAAGATTTTCGTAGGCCTGATAAGCGTGCGCATCAGGCACGCTCGCATGGTTAGCGCCA
      TTAAATATCGATATTCGCCGCTTTCAGGGCGTTCTCTTCAATAAACGCACGGCGCGGTTCAACGGCGTCGCCCATCAGCGTGGTGAACAACTGGTCGGCAGCAATCGCATCTTTAACGGTAACCG
      CAGCATACGACGACTTTCCGGGTCCATAGTGGTTTCCCACAGCTGTTCCGGGTTCATCTCGCCCAGACCTTTATAACGCTGGATGGAGAGGCCGCGACGGGACTCTTTCACCAGCCAGTCCAGCG
      CCTGCTCGAAGCTGGCTACCGGCTGACGCGCTCGCCACGTTCGATAAACGCATCTTCTTCCAGCAAGCCACGCAGTTTCTCACCCAGCGTGCAGATACGACGATATTCGCCACCGGTGATAAACT
      CGTGATCCAGCGGATAGTCAGTATCCACACCGTGGGTACGCACGCGAACAATCGGCTCAACAGGTTTTGCTCAGCATTGGTGTGAACATCAAACTTCCACTGGCTGCCGTGCTGTTCTTTGTCGTT
      CAGTTCGCTGACCAGCGCGTTCACCCAGCGGGTAACGGTCTGCTCATCAGAAAGGTCAGCTTCCGTCAACGTCGGCTGATAGATAAGTCTTTCAGCATTGCTTTCGGATAACGACGCTCCATACG
      ATTGATCATTTTCTGCGTCGCGTTGTACTCAGATACCAGTTTCTCTAACGCTTCGCCAGCCAATGCCGGTGCACTGGCGTTGGTGTGCAGCGTTGCGCCGTCCAGCGCGATAGAGATTGGTACTG
      ATCCATCGCTTCGTCGTCTTTAATGTACTGTTCCTGCTTGCCTTTCTTCACTTTGTACAGCGGCGGCTGAGCGATGTAGACGTGACCGCGTTCAACGATTTCCGGCATCTGACGATAGAAGAAGGT
      CAACAGCAGCGTACGAATGTGGAGCCGTCGACGTCCGCATCGGTCATGATGATGATGCTGTGATAACGCAGTTTGTCCGGGTTGTACTCGTCACGACCGATACCACAGCCAAGCGCGGTGATAA
      GCGTCGCCACTTCCTGAGAAGAGAGCATCTTATCGAAGCGCGCTTTCTCGACTTGAGGATTTTACCCTTCAGCGGCAGAATCGCCTGGTTCTTGCGGTTACGCCCCTGCTTCGCAGAGCCGCCCG
      CGGAGTCCCCTTCCACCAGGTACAGTTCGGAAAGCGCCGGATCGCGTTCCTGGCAGTCTGCCAGTTTGCCCGGCAGGCCCGCAAGTCGAGCGCACCTTTACGGCGGGTCATTTCACGCGCGCG
      ACGCGGCGCTTCACGGGCACGGGCAGCATCGATAATTTTGCCAACCACGATTTTCGCGTCGGTTGGGTTTTCCAGCAGGTATTCTGCCAGCAGTTCGTTCATCTGCTGTTCAACGCCGATTTCACC
      TCAGAAGAAACCAGTTTGTCTTTGGTCTGGGAGGAGAATTTCGGGTCCGGCACTTTCACGGAAACGACCGCAATCAGGCCTTCACGCGCATCGTCACCGGTGGCGCTGACTTTGGCTTTTTTGCT
      GTAGCCTTCTTTGTCCATTAGGCGTTCAGGGTACGGGTCATCGCCGCACGGAAGCCTGCCAGGTGAGTACCGCCGTCACGCTGCGGAATGTTGTTGGTAAAGCAGTAGATGTTTTCCTGGAAGCC
      ATCGTTCCACTGCAACGCCACTTCGACGCCAATACCGTCTTTTTCAGTGAGAAGTAGAAGATATTCGGGTGGATCGGCGTTTTGTTCTTGTTCAGATATTCAACGAACGCCTTGATGCCGCCTTCAT
      AGTGGAAGTGGTCTTCTTTGCCGTCGCGCTTGTCGCGCAGACGAATGGAAACGCCGGAGTTGAGGAACGACAACTCCGCAGACGTTTCGCCAGAATTTCATATTCGAACTCGGTCACATTGGTGA
      AGGTTTCGAGGCTGGGCCAGAAACGCACCATGGTGCCGGTTTTTTCAGTCTCGCCGGTAACCGCCAGCGGGGCCTGCGGTACACCGTGTTCGTAGATCTGACGGTGATTTTACCCTCGCGCTGG
      ATAACCAGCTCCAGTTTTTGCGACAGGGCGTTTACTACCGAAACACCAACGCCGTGCAGACCGCCGGACACTTTATAGGAGTTATCGTCAAATTTACCGCCTGCGTGCAGAACGGTCATGATCACT
      TCCGCCGCCGA
…to this?
FT   gene            complement(9299..10702)
FT                   /db_xref="GenBank:2367266"
FT                   /gene="dnaA"
FT                   /note="b3702"
FT   CDS             complement(9299..10702)
FT                   /db_xref="GI:2367267"
FT                   /db_xref="PID:g2367267"
FT                   /function="putative regulator; DNA - replication, repair,
FT                   restriction/modification"
FT                   /codon_start=1
FT                   /protein_id="AAC76725.1"
FT                   /gene="dnaA"
FT                   /translation="MSLSLWQQCLARLQDELPATEFSMWIRPLQAELSDNTLALYAPNR
FT                   FVLDWVRDKYLNNINGLLTSFCGADAPQLRFEVGTKPVTQTPQAAVTSNVAAPAQVAQT
FT                   QPQRAAPSTRSGWDNVPAPAEPTYRSNVNVKHTFDNFVEGKSNQLARAAARQVADNPGG
FT                   AYNPLFLYGGTGLGKTHLLHAVGNGIMARKPNAKVVYMHSERFVQDMVKALQNNAIEEF
FT                   KRYYRSVDALLIDDIQFFANKERSQEEFFHTFNALLEGNQQIILTSDRYPKEINGVEDR
FT                   LKSRFGWGLTVAIEPPELETRVAILMKKADENDIRLPGEVAFFIAKRLRSNVRELEGAL
FT                   NRVIANANFTGRAITIDFVREALRDLLALQEKLVTIDNIQKTVAEYYKIKVADLLSKRR
FT                   SRSVARPRQMAMALAKELTNHSLPEIGDAFGGRDHTTVLHACRKIEQLREESHDIKEDF
FT                   SNLIRTLSS"
FT                   /product="DNA biosynthesis; initiation of chromosome
FT                   replication; can be transcription regulator"
FT                   /transl_table=11
FT                   /note="f467; 100 pct identical to DNAA_ECOLI SW: P03004;
FT                   CG Site No. 851"
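The location complement(9299..10702) means the gene lies on the reverse strand between positions 9299 and 10702 (1-based, inclusive), so extracting it needs a reverse complement. The sketch handles only simple ranges, not join(...) or fuzzy (<, >) locations, and the toy genome is invented. The arithmetic also checks the annotation: 10702 − 9299 + 1 = 1,404 nt, i.e. 468 codons, which is 467 amino acids plus a stop codon, matching the "f467" note.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_feature(genome: str, location: str) -> str:
    """Extract a simple EMBL/GenBank location: 'start..end' or 'complement(start..end)'."""
    match = re.fullmatch(r"(complement\()?(\d+)\.\.(\d+)(\))?", location)
    if match is None or bool(match.group(1)) != bool(match.group(4)):
        raise ValueError(f"unsupported location: {location}")
    start, end = int(match.group(2)), int(match.group(3))
    segment = genome[start - 1:end]  # 1-based inclusive coordinates -> Python slice
    return reverse_complement(segment) if match.group(1) else segment


toy_genome = "CCCTTACATCATGGG"                           # made-up sequence
print(extract_feature(toy_genome, "complement(4..12)"))  # ATGATGTAA
print((10702 - 9299 + 1) // 3 - 1)                       # 467 amino acids (468 codons minus the stop)
```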

Or this?
An ORF is not a CDS!
An ORF is just an open reading frame
There are many more ORFs than protein-coding genes (CDSs) in a genome
[Figure: non-coding ORFs and CDSs; note that an ORF can extend upstream of the start codon]
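Because an ORF is simply a stretch without stop codons, ORFs can be listed mechanically in all six reading frames; deciding which ones are real CDSs needs further evidence (prokaryotic gene finders such as Glimmer or Prodigal use statistical models of coding sequence). This sketch defines an ORF as the open stretch between two stop codons, which is why it can extend upstream of the first ATG; the minimum length and helper names are assumptions.

```python
from typing import Optional

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def open_stretches(seq: str, frame: int, min_codons: int) -> list[tuple[int, int]]:
    """Stop-to-stop open stretches in one frame as (start, index of the next stop codon)."""
    found, start = [], frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if (i - start) // 3 >= min_codons:
                found.append((start, i))
            start = i + 3
    return found  # stretches that run off the end without a stop are ignored


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """ORFs in all six frames as (strand, frame, start, end), coordinates on that strand."""
    orfs = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            orfs.extend((strand, frame, a, b) for a, b in open_stretches(s, frame, min_codons))
    return orfs


def first_start_codon(seq: str, start: int, end: int) -> Optional[int]:
    """Index of the first in-frame ATG in seq[start:end], or None if there is none."""
    for i in range(start, end - 2, 3):
        if seq[i:i + 3] == "ATG":
            return i
    return None


toy = "TAACCCATGGCTGCTGCTTAA"
print(find_orfs(toy, min_codons=3))  # [('+', 0, 3, 18)]
print(first_start_codon(toy, 3, 18))  # 6: the ORF begins 3 bp upstream of the ATG
```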
The Problem of Frameshift Errors
      Actual sequence

 ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
 M  S  T  A  K  L  V  K  S  K  A  T  N  L  L  Y  T  R  N  D  V  S  D  S  E  K
  •  V  P  L  N  •  L  N  Q  K  R  P  I  C  F  I  P  A  T  M  S  P  T  A  R  K
   E  Y  R  •  I  S  •  I  K  S  D  Q  S  A  L  Y  P  Q  R  C  L  R  Q  R  E  K

      Frameshifted sequence after a single-base error (one extra A)

 ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
 M  S  T  A  K  L  V  K  S  K  S  D  Q  S  A  L  Y  P  Q  R  C  L  R  Q  R  E
  •  V  P  L  N  •  L  N  Q  K  A  T  N  L  L  Y  T  R  N  D  V  S  D  S  E  K
   E  Y  R  •  I  S  •  I  K  K  R  P  I  C  F  I  P  A  T  M  S  P  T  A  R  K

      (• = stop codon)
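The effect of one extra base can be reproduced by translating the actual sequence above and a copy with one A inserted at position 31 (turning the run of four As into five). The codon table is the standard genetic code; bacterial translation table 11 assigns the same amino acids and differs only in which alternative start codons it allows. Here `*` marks a stop codon (shown as • above); the helper names are illustrative.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon b1 b2 b3 -> AMINO_ACIDS[16*i + 4*j + k] in TCAG order
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward reading frame; '*' = stop, 'X' = ambiguous codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


actual = "ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA"
shifted = actual[:30] + "A" + actual[30:]  # the single inserted base

print(translate(actual))   # MSTAKLVKSKATNLLYTRNDVSDSEK
print(translate(shifted))  # MSTAKLVKSKSDQSALYPQRCLRQRE
```

Downstream of the insertion, frame 0 of the shifted sequence reads exactly what was the third frame of the actual sequence, so everything after the error is mistranslated.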
Homology
   Similarities in form (sequence) allow us to infer similarities in "meaning" (structure and function)
       e.g. "the cat sat on the mat" and "die Katze sass auf der Matte"
   Homology is not just sequence similarity
       Two sequences can be similar without any common ancestry, particularly if they are of low complexity

       vge|GBant88-2      ITLITCVSVKDNSKRYVVAG
       vge|GEfae9-178     LTLITCDQATKTTGRIIVIA
       vge|GSpne1-403     MTLITCDPIPTFNKRLLVNF
       sortase_staur      LTLITCDDYNEKTGVWEKRK
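One simple way to quantify "low complexity" is the Shannon entropy of a sequence's residue composition: a run of a single residue scores 0 bits, while a sequence using many residues equally scores high. This is a simplified stand-in, not the actual algorithm of the SEG (protein) or DUST (DNA) filters that BLAST uses to mask such regions; the window size and threshold are arbitrary assumptions.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits per residue; low values mean low compositional complexity."""
    counts = Counter(seq)
    n = len(seq)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.0) -> list[int]:
    """Start positions of windows whose entropy falls below a (hypothetical) threshold."""
    return [
        i for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) < threshold
    ]


print(shannon_entropy("QQQQQQQQQQQQ"))                       # 0.0
print(shannon_entropy("ACGT"))                               # 2.0
print(low_complexity_windows("Q" * 12 + "ACDEFGHIKLMN"))     # [0, 1, 2, 3, 4, 5]
```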
Types of Homology
   Homologues can be divided into
       Orthologues: result of speciation; lines of descent congruent with the whole genome
       Paralogues: result of gene duplication
       Xenologues: result of horizontal gene transfer (HGT)
Homology Searches
   The aim of homology searches is to identify sequences in sequence databases that are
     homologous to your sequence.
   This involves comparing your sequence with all the database sequences
       looking for stretches of sequence that appear to be similar
       then scoring the matches and ranking them
       a measure of the significance of each match is given
   The most common program used for homology searches is BLAST
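At the heart of a homology search is a local alignment score. The sketch computes the Smith-Waterman score with simple, assumed match/mismatch/gap values and ranks a toy "database" by it. BLAST itself uses heuristics to avoid full dynamic programming, substitution matrices such as BLOSUM62 for proteins, and reports E-values as its significance measure; none of that is reproduced here.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # a local alignment may restart anywhere, hence the floor of 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every database entry and sort by descending score."""
    scores = {name: smith_waterman(query, seq) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


query = "ITLITCVSVKDNSKRYVVAG"  # vge|GBant88-2 fragment shown above
database = {
    "vge|GEfae9-178": "LTLITCDQATKTTGRIIVIA",
    "vge|GSpne1-403": "MTLITCDPIPTFNKRLLVNF",
    "sortase_staur": "LTLITCDDYNEKTGVWEKRK",
}
for name, score in rank_hits(query, database):
    print(name, score)
```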
Bacterial Genome Dynamics
   Gene loss
       Drastic downsizing in isolated intracellular niches
   Gene duplication
   Gene gain
       Horizontal gene transfer by phage, plasmids, pathogenicity islands
   Gene change
       Recombination and rearrangements
       Single nucleotide polymorphisms (SNPs)
   Accumulation of pseudogenes and IS elements after shift to a new niche
   Rapid emergence of genetically uniform pathogens from variable ancestral populations
Horizontal gene transfer
   Horizontal (or lateral) gene transfer denotes any
    transfer, exchange or acquisition of genetic material that
    differs from the normal mode of transmission from
    parents to offspring (vertical transmission).

[Figure: vertical gene transfer vs horizontal gene transfer]
Bacterial mobile genetic elements
   Transposons
       pieces of DNA that act as 'jumping genes', changing location within a chromosome
        or plasmid
       encode transposase that catalyses the transposition
        event
       can carry resistance or virulence genes
   Insertion sequences (IS elements)
       transposable elements that encode only the transposase
       multiple copies of same IS within genome provide targets
        for homologous recombination, rearrangements and
        replicon fusions
   Conjugative transposons
       normally integrated into the chromosome
       excise and are then transferred to recipient cells by conjugation
Bacterial mobile genetic elements
   Plasmids
       self-replicating extrachromosomal replicons
       usually circular but can be linear
       Can carry resistance or virulence genes
   Bacteriophages
       bacterial viruses
       can carry virulence genes
       can insert into bacterial chromosome as prophages
        (lysogeny)
   Integrons
       complex natural cloning and gene expression systems
        able to capture promoterless gene cassettes by site-
        specific recombination
       allow formation of large arrays of gene cassettes
        transferred as a whole between different replicons.
Genomic islands
   large chromosomal regions, part of the flexible gene
    pool
   previously transferred by other mobile genetic
    elements
   present in some bacteria but absent in close
    relatives
   carry multiple genes that increase phenotypic
    versatility
   contribute to dynamic character of bacterial
    chromosomes and can be excised from the
    chromosome and transferred to other recipients
   pathogenicity islands contain dozens of genes that
     allow a quantum leap to complex new virulence properties
Core genomes and Pangenomes
   Core genome
       pool of genes shared by all members of a bacterial
        species
   Accessory or dispensable genome
       pool of genes present in some but not all genomes within
        the same bacterial species
   Pangenome
       global gene repertoire of a bacterial species, comprising
        core genome + accessory genome
   Metagenome
       global gene repertoire of mixed microbial population
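Once genes from different genomes have been grouped into families (usually by homology searches), the core genome, accessory genome and pangenome reduce to set operations. In this sketch the strain names and gene labels are hypothetical, and each label stands for an already-clustered gene family.

```python
def partition_genomes(genomes: dict[str, set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Return (core, accessory, pangenome) for a non-empty collection of gene-family sets."""
    gene_sets = list(genomes.values())
    pangenome = set().union(*gene_sets)   # every family seen in any genome
    core = set.intersection(*gene_sets)   # families present in all genomes
    accessory = pangenome - core          # families present in some but not all
    return core, accessory, pangenome


strains = {  # hypothetical strains and gene families
    "strain_A": {"dnaA", "gyrB", "recA", "stx2"},
    "strain_B": {"dnaA", "gyrB", "recA", "aggR"},
    "strain_C": {"dnaA", "gyrB", "recA", "stx2", "aggR"},
}
core, accessory, pangenome = partition_genomes(strains)
print(sorted(core))       # ['dnaA', 'gyrB', 'recA']
print(sorted(accessory))  # ['aggR', 'stx2']
print(len(pangenome))     # 5
```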
Escherichia coli Core and Pan-genomes




                         Welch et al. Proc Natl Acad Sci U S A. 2002 Dec 24;99(26):17020-4
Metagenomics
   Environmental shotgun
    sequencing
       DNA extracted from
        mixed microbial
        communities sequenced
        en masse
   Assembled into contigs
       Typically only small
        contigs can be obtained
Uses of a genome sequence
   Gene discovery
       Fuelling hypothesis driven research on pathogen biology
   Comparative genomics
       SNP discovery and genomic epidemiology
   Functional genomics
       Transcriptomics
       Proteomics
       Interactome
       Structural Genomics
       Mass Mutagenesis
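For genomic epidemiology, a common first step is counting pairwise SNP differences between isolates across an alignment of their shared (core) genome. A minimal sketch; the isolate names and sequences are invented, and positions with an ambiguous base (N) or a gap are simply skipped. Small distances are consistent with, but do not by themselves prove, recent transmission.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances for every pair of isolates."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


isolates = {  # invented core-genome alignment
    "patient_1": "ACGTACGTAC",
    "patient_2": "ACGTACGTAT",
    "patient_3": "ACGAACCTNT",
}
print(distance_matrix(isolates))
# {('patient_1', 'patient_2'): 1, ('patient_1', 'patient_3'): 3, ('patient_2', 'patient_3'): 2}
```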
Haemolytic-uraemic syndrome
   Shiga-toxin-producing E. coli (STEC)
       bloody diarrhoea; damage to kidneys and brain
       anaemia; loss of platelets
German E. coli O104:H4 outbreak

   May-July 2011
   >4000 cases
   >40 deaths
   Link to sprouting seeds
   High risk of haemolytic-
    uraemic syndrome
   Females particularly at risk


        Frank et al DOI: 10.1056/NEJMoa1106483
Take-away messages from the genome
   Pathogens don't bother with passports!
       Not a new strain: something similar seen in Germany ten
        years ago and in Korea
       closest genome-sequenced strain was isolated from Central
        African Republic in late 1990s, belongs to an
        enteroaggregative lineage
   German STEC probably comes from a lineage
    circulating in human populations rather than from an
    animal source (unlike E. coli O157)
Take-away messages
   Bacteria evolve
    quickly
       Virulence factors in E.
        coli can jump from one
        lineage to another on
        mobile genetic
        elements
       Pathotypes can
        overlap and evolve
       Antibiotic resistance
        seen where no
        obvious prior use of
        antibiotics
Take-away messages from genome sequence
   Genome sequencing brings the advantages of
       open-endedness (revealing the “unknown unknowns”)
       universal applicability
       the ultimate in resolution
   Bench-top sequencing platforms now generate data sufficiently quickly and cheaply to have an impact on real-world clinical and epidemiological problems
Comprehensive coverage of the human microbiome
Comprehensive coverage of the tree of life
What will you do when you can sequence everything?

Bio153 microbial genomics 2012

  • 1. Bio153 Microbial Genomics Professor Mark Pallen University of Birmingham
  • 2. Microbial Genomics  General features of microbial genomes  Historical overview  Genome sequencing, annotation and analysis  Genome evolution  What we can learn from a genome sequence?
  • 3. General features of genomes Microbial Human  Small WSIWYG genomes  Very large genomes (Mbp) (Gbp)  Gene density high (>90%)  intergenic regions short  Gene density low  very little repetitiveor non-  Only 25% is genes coding DNA  Introns mean only1%  Introns very rare codes  Protein-coding genes  Genes can span ≥30 (CDS) short (~1kbp) kbp  Operons with promoters just upstream  Genes have ~3  Fewer non-coding RNAs transcripts  Splicing and splice variants
  • 4. Bacterial genome organisation Chromosomes Plasmids  Most commonly single  Independent autonomous replicon, can be circular or circular chromosome linear (always DNA)  may integrate into chromosome  BUT many species have  copy number varies 1 to 10s linear chromosome(s) (e.g.  often carry non-essential genes Borrelia, Streptomyces, Rh that confer an adaptive odoccus) advantage in certain conditions  BUT a few species with two chromosomes (e.g. Vibriocholerae)  Can be mix of circular and linear (e.g. Agrobacteriumtumefacien s, B. burgdoferi)
  • 5. Bacterial Genome Size  species which occupy restricted ecological niches, (e.g. obligate intracellular parasites and endosymbionts) tend to have smaller genomes (<1.5 Mb) than generalist bacteria  smallest known bacterial genome: Carsonellaruddii, 160 kb! (Nakabachi et al. 2006)  BUT mitochondrial genomes are smaller  largest genomes found in bacteria with complex developmental cycles, e.g. Streptomyces  largest bacterial genome: Sorangiumcellulosum, 13 Mb
  • 6. Bacterial genomes are made from DNA  In 1944, Oswald Avery, Colin MacLeod, and Maclyn McCarty showed that DNA (not proteins) was the genetic material responsible for inheritance.  Identified DNA as the "transforming principle" while studying Streptococcus pneumoniae  Avery, Oswald T., Colin M. MacLeod, and Maclyn McCarty. Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Journal of Experimental Medicine. 1944 Feb 1; 79(2): 137-158.  In 1952, this work was supported by Alfred Hershey and Martha Chase who showed that only the DNA of a virus needs to enter a bacterium to infect it.  Used radioactively labelled bacteriophage  Hershey AD and Chase M. Independent functions of viral protein and nucleic acid in growth of bacteriophage. Journal of General Physiology. 1952. 36: 39-56.
  • 7. Viral genomes are variable  Use RNA or DNA but not both in genome  Some have RNA genomes!  Grouped into families depending on  type of genome: DNA or RNA, single- or double- stranded  Typically dozens of genes or fewer  Large genomes in pox viruses (~200 kb)  Massive genomes in megaviruses (1Mbp!)
  • 8. Microbial Genomics Timeline Year Milestone 1977 Invention of dideoxy chain terminator sequencing (“Sanger sequencing”) 1979 Sequencing of the 5.3-kilobase genome of bacteriophage phiX174 1981 First human mitochondrial genome sequence* 1982 Determination of the 48.5-kilobase genome sequence of bacteriophage lambda through first use of shotgun sequencing 1986 Development of automated fluorescent sequencing 1995 First complete genome sequences obtained of free-living bacteria (Haemophilus influenzae and Mycoplasma genitalium) 1996 Mycoplasma becomes first bacterial genus that has completely sequenced genomes from two different species (M. genitalium and M. pneumoniae) 1997 First genome sequences from Escherichia coli and Bacillus subtilis 1998 First genome sequence from Mycobacterium tuberculosis; genome sequence from Rickettsiaprowazekii provides first evidence of reductive evolution
  • 9. Microbial Genomics Timeline Year Milestone 1999 Helicobacter pylori becomes the first species with completely sequenced genomes from two isolates 2000 Meningococcal genome sequence primes first application of reverse vaccinology 2001 Second E. coli genome sequences reveal unexpected level of horizontal gene transfer; genome sequence of M. leprae provides compelling evidence of bacterial pseudogenes and reductive evolution; first paper reporting genome sequences of two strains from one species (Staphylococcus aureus) in a single publication. 2002 Genome sequencing of multiple strains of Bacillus anthracis to provide markers for forensic epidemiology 2003 Genome sequencing of uncultivable Tropherymawhippleileads to design of axenic growth medium 2004 Genome sequence of mimivirus blurs distinctions between bacteria and viruses 2005 Use of whole-genome sequencing used to identify target of new anti-tuberculosis drug Mycoplasma genitalium genome sequenced using pyrosequencing 2006- Bacterial metagenomics survey of the Sargasso sea yields >1 million new genes 2011 Rise of next-generation or high-throughput sequencing
  • 10. The first genome sequences  The first sequenced gene was from bacteriophage MS2  The gene encoding the coat protein  1972  Min Jou W, Haegeman G, Ysebaert M, and Fiers W. Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature. 1972 May 12; 237(5350): 82-88.  The first sequenced genome was bacteriophage MS2  1976  RNA genome is 3,569 nucleotides  Fiers W, Contreras R, Duerinck F, Haegeman G, Iserentant D, Merregaert J, Min Jou W, Molemans F, Raeymaekers A, Van den Berghe A, Volckaert G, and Ysebaert M. Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature. 1976 Apr 8; 260(5551): 500-507.
  • 11. The first genome sequences  The first sequenced DNA genome was bacteriophage Φ- X174  1977  5368 base pairs  Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, Hutchison CA, Slocombe PM, and Smith M. Nucleotide sequence of bacteriophage phi X174 DNA. Nature. 1977 265 (5596): 687-695.  The first sequenced bacterial genome was Haemophilus influenzae  1995  1,830,140 base pairs  Fleischmann R, Adams M, White O, Clayton R, Kirkness E, Kerlavage A, Bult C, Tomb J, Dougherty B, and Merrick J. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 1995. 269 (5223): 496- 512.
  • 12. Overview of a genome project  Choose strain  Closure and finishing  Fresh isolate or tractable  Manually intensive lab strain?  Difficulty depends on  Choose strategy how repetitive  Shotgun sequencing  Data Release  Paired-end sequencing  Immediate or delayed?  Draft or complete?  Annotation  Choose chemistry  Manually intensive bottle  Sanger; 454; Illumina; neck Ion Torrent  Publication  Assembly  Automated
  • 13. Methods for genome sequencing – historic Sanger method sequencing  Sanger F and Coulson AR. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. Journal of Molecular Biology. 1975 94: 441-448.  Step 1, a sequence-specific DNA primer is radiolabeled  Step 2, the primer is annealed to the template DNA  Step 3, the primer is extended by DNA polymerase  Incorporation of a deoxynucleotide - further extension possible  Incorporation of a dideoxynucleotide – chain termination  Four reactions set up  ddATP, dATP, dCTP, dGTP, dTTP  ddCTP, dATP, dCTP, dGTP, dTTP  ddGTP, dATP, dCTP, dGTP, dTTP  ddTTP, dATP, dCTP, dGTP, dTTP
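The logic of the four reactions can be sketched in a few lines of Python. This is a toy model under stated assumptions: every possible termination product is formed, fragment length counts only the bases added after the primer, the template is written 5′ to 3′ with the primer annealing at its 3′ end, and the function names are hypothetical rather than part of any sequencing software.

```python
# Toy model of dideoxy chain-termination sequencing (illustrative only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def ddntp_reactions(template: str) -> dict[str, list[int]]:
    """Simulate the four ddNTP reactions.

    The new strand (5'->3') is the reverse complement of the template.
    In each reaction a fragment ends wherever that reaction's ddNTP is
    incorporated, so each lane records the lengths at which its base occurs.
    """
    new_strand = reverse_complement(template)
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest across all four lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = ddntp_reactions("GATTACA")
    print(lanes)
    print(read_gel(lanes))  # TGTAATC
```

Reading the bands from shortest to longest gives the newly synthesised strand 5′ to 3′, which is the reverse complement of the template.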
  • 14. Methods for genome sequencing – historic Sanger method sequencing
  • 15. Methods for genome sequencing – automated Sanger sequencing  Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SBH, and Hood LE. Fluorescence detection in automated DNA sequence analysis. Nature. 1986 321: 674-679.  Replaced radioisotopes with fluorescent dyes  Safer for the researchers  Each of the four DNA bases could be dyed a different colour  Eliminated the need to run separate reactions in separate lanes  The migration of the dye could be read because of the fluorescence  This information allowed automatic gel reading  Further improvements were made  Improved dye chemistry using fluorescent dideoxy-terminators (DuPont): Prober JM, Trainor GL, Dam RJ, Hobbs FW, Robertson CW, Zagursky RJ, Cocuzza AJ, Jensen MA, and Baumeister K. A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238: 336-341.  Replacing slab gels with re-useable capillary tubes: Ruiz-Martinez MC, Berka J, Belenkii A, Foret F, Miller AW, and Karger BL. DNA sequencing by capillary electrophoresis with replaceable linear polyacrylamide and laser-induced fluorescence detection. Analytical Chemistry 1993 65: 2851-2858.
  • 16. Whole-Genome Shotgun Sanger Sequencing
      Bacterial chromosome → random shearing → size selection → cloning into plasmid vector → pick colonies to create shotgun library → plasmid preps → sequence each insert with two primers
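How many shotgun reads does such a library need? A common back-of-the-envelope answer, not given on the slide, is the Lander-Waterman model, which assumes reads start uniformly at random: coverage is c = N·L/G for N reads of length L on a genome of size G, the expected uncovered fraction is e^(-c), and the expected number of contigs is roughly N·e^(-c). The sketch below ignores the minimum overlap needed to join reads; the 500-base read length is an assumption for illustration.

```python
import math


def shotgun_stats(n_reads: int, read_len: int, genome_len: int) -> dict[str, float]:
    """Lander-Waterman estimates for a random shotgun project (no minimum overlap)."""
    coverage = n_reads * read_len / genome_len
    return {
        "coverage": coverage,
        "fraction_uncovered": math.exp(-coverage),
        "expected_contigs": n_reads * math.exp(-coverage),
    }


if __name__ == "__main__":
    genome_len = 1_830_140  # H. influenzae Rd genome size quoted on slide 11
    read_len = 500          # assumed Sanger read length
    for target in (1, 4, 8):
        n_reads = math.ceil(target * genome_len / read_len)
        stats = shotgun_stats(n_reads, read_len, genome_len)
        print(n_reads, f"{stats['fraction_uncovered']:.3%}", round(stats["expected_contigs"]))
```

The exponential term is why doubling coverage shrinks the number of gaps far more than twofold, and why finishing the last gaps usually needs directed work rather than more random reads.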
  • 17. High-throughput Sequencing  100x faster, 100x cheaper!  A disruptive technology  Several technologies in the marketplace from 2007 onwards  454 (Roche)  Illumina  Ion Torrent  PacBio  Fundamentally new approaches  Solid-phase amplification of clonal templates in “molecular colonies”  Massive increase in number of “clones” compensates for shorter read length  New chemistries for sequence reading  454: pyrophosphate detection on base addition  Illumina: reversible de-protection of fluorescent bases
  • 18. High-Throughput Shotgun Sequencing
      Bacterial chromosome → random shearing → size selection → add adapters → amplify → sequence
  • 19. 454 sequencing: emulsion-based clonal amplification
      Anneal sstDNA to an excess of DNA capture beads → emulsify beads and PCR reagents in water-in-oil microreactors → clonal amplification occurs inside microreactors → break microreactors and enrich for DNA-positive beads
  • 20. Pyrosequencing  DNA template with primer mixed with the enzymes along with the two substrates adenosine 5‟-phosphosulfate (APS) and luciferin 1. one of the four nucleotides added to reaction 2. If complementary to base in template strand then DNA polymerase incorporates it 3. Pyrophosphate (Ppi) released then converted to ATP by sulfurylase in the presence of APS. 4. ATP serves as a substrate to luciferase, causing a light reaction. 5. Excess nucleotides degraded by apyrase.
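An idealised flowgram can be simulated to show why the light signal tracks homopolymer length. The sketch assumes perfect chemistry (every complementary base incorporated, no noise) and describes the newly synthesised strand rather than the template; the cyclic flow order TACG is an assumption chosen for illustration, and the function names are hypothetical.

```python
def flowgram(new_strand: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Simulate idealised pyrosequencing signals.

    Nucleotides are flowed one at a time in `flow_order` (cycled). Each flow
    extends the strand through any homopolymer run of that base, so the
    signal equals the run length (0 when nothing is incorporated).
    """
    if not set(new_strand) <= set(flow_order):
        raise ValueError("every base in new_strand must appear in flow_order")
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def base_call(signals: list[tuple[str, int]]) -> str:
    """Turn each flow signal back into a homopolymer run of that base."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    signals = flowgram("TTAGGC")
    print(signals)  # [('T', 2), ('A', 1), ('C', 0), ('G', 2), ('T', 0), ('A', 0), ('C', 1)]
    print(base_call(signals))  # TTAGGC
```

Because a run of identical bases is read in a single flow, its length has to be inferred from light intensity, so long homopolymers are the classic source of insertion and deletion errors in 454 data.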
  • 22. The Sequence Assembly Problem  Sequencing technologies generate reads of <1000 bp  These reads must be assembled into a single continuous genomic sequence.  Shotgun sequencing exploits many overlapping sequences (high coverage) to infer ordering directly from the sequences themselves
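Inferring order from overlaps can be illustrated with a greedy overlap assembler, one simple textbook strategy (not necessarily what any particular assembler does). This sketch assumes error-free reads from a single strand, with no read contained inside another; the reads in the usage example are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the contigs left when no overlap of at least `min_len` remains.
    """
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    reads = ["ATGGCGTGCA", "GTGCAATGCC", "ATGCCTAGCT"]
    print(greedy_assemble(reads))  # ['ATGGCGTGCAATGCCTAGCT']
```

Greedy merging works here because every overlap is unique; repeats break that assumption.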
  • 23. The Repeat Problem  Repeats at read ends can be assembled in multiple ways Correct ATTTATGTGTGTGTGGTGTG GTGTGGTGTGCACTACTGCT ACTACTGCTGACTACTGTGTGGTGTG GTGTGGTGTGATATCCCT Incorrect ATTTATGTGTGTGTGGTGTG GTGTGGTGTGATATCCCT ACTACTGCTGACTACTGTGTGGTGTG GTGTGGTGTGCACTACTGCT
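The ambiguity can be made concrete by listing every suffix-prefix overlap between the four reads on the slide. A minimal sketch, assuming exact overlaps of at least 8 bases (an arbitrary threshold) and hypothetical read names R1 to R4 in the slide's correct order. Both R1 and R3 end in the repeat GTGTGGTGTG, so each has two equally good successors, R2 and R4, and overlap evidence alone cannot choose between them.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def successor_graph(reads: dict[str, str], min_len: int = 8) -> dict[str, dict[str, int]]:
    """For each read, map every read that can follow it to the overlap length."""
    graph: dict[str, dict[str, int]] = {}
    for name_a, a in reads.items():
        graph[name_a] = {}
        for name_b, b in reads.items():
            if name_a != name_b:
                k = overlap(a, b, min_len)
                if k:
                    graph[name_a][name_b] = k
    return graph


READS = {
    "R1": "ATTTATGTGTGTGTGGTGTG",
    "R2": "GTGTGGTGTGCACTACTGCT",
    "R3": "ACTACTGCTGACTACTGTGTGGTGTG",
    "R4": "GTGTGGTGTGATATCCCT",
}

if __name__ == "__main__":
    for read, successors in successor_graph(READS).items():
        print(read, successors)
```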
  • 24. Paired-end Sequencing
      Bacterial chromosome → random shearing → size selection for 3 kb or 8 kb etc. → add linkers → circularise → shear and select on size and presence of linkers → add adapters → sequencing: obtain sequences from either side of the linker
      - Create long fragments of known length
      - Obtain sequence from paired ends a known distance apart in the genome
      - Allows assembly of contigs across repeats into scaffolds
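Because the two ends of each pair lie a known distance apart, a pair whose reads fall in different contigs also estimates the gap between them. The sketch assumes read orientation has already been resolved so that read 1 sits near the end of contig A and read 2 near the start of contig B, both facing the gap; coordinates are 0-based and the function names are hypothetical. Combining several pairs with the median is one simple choice that damps outliers.

```python
from statistics import median


def estimate_gap(insert_size: int, contig_a_len: int, read1_start: int, read2_end: int) -> int:
    """Gap between contig A and contig B implied by one read pair.

    `read1_start` is where read 1 begins in contig A; `read2_end` is the
    (exclusive) position where read 2 ends in contig B.
    """
    bases_in_a = contig_a_len - read1_start
    bases_in_b = read2_end
    return insert_size - bases_in_a - bases_in_b


def scaffold_gap(pairs: list[tuple[int, int]], insert_size: int, contig_a_len: int) -> float:
    """Median gap estimate over several pairs spanning the same gap."""
    return median(estimate_gap(insert_size, contig_a_len, start, end) for start, end in pairs)


if __name__ == "__main__":
    # Hypothetical 3 kb library, 1,000 bp contig A, three spanning pairs.
    pairs = [(900, 150), (850, 200), (700, 290)]
    print(scaffold_gap(pairs, 3000, 1000))  # 2650
```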
  • 25. Genome Assembly
      (Figure: Contig 1, Contig 2 and Contig 3 ordered into a scaffold, with a sequence gap between contigs and a physical gap)
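A scaffold is often written out as its ordered contigs joined by runs of N whose length reflects the estimated sequence gap. A minimal sketch, assuming contig order, orientation and gap sizes have already been inferred from read pairs; padding non-positive gap estimates with a single N is a hypothetical convention chosen here, not a standard.

```python
def build_scaffold(contigs: list[str], gap_sizes: list[int], min_gap: int = 1) -> str:
    """Join ordered contigs, filling each sequence gap with Ns."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * max(gap, min_gap))  # unknown bases of estimated length
        parts.append(contig)
    return "".join(parts)


if __name__ == "__main__":
    print(build_scaffold(["ACGT", "GGCC", "TTAA"], [3, 2]))  # ACGTNNNGGCCNNTTAA
```

A physical gap, where no read pair links two pieces, cannot be bridged this way, so the pieces on either side remain separate scaffolds.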
  • 26. Re-sequencing  Short reads (<200bp) inefficient de novo assembly  Instead they are mapped against a reference genome  Re-sequencing is like assembling a jigsaw puzzle using the image on the lid
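Mapping against a reference can be sketched with a simple seed-and-extend approach: index every k-mer of the reference, look up each read's first k bases, then count mismatches across the whole read. This is a toy illustration under stated assumptions (forward strand only, no indels, the seed itself error-free); production mappers use compressed indexes such as the FM-index instead. The reference and reads are hypothetical.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to its 0-based start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int,
             max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (position, mismatches) for each acceptable placement of the read."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


if __name__ == "__main__":
    reference = "TTGACCGATGCATGCAAGTC"
    index = build_index(reference, 4)
    print(map_read("GATGCATGC", reference, index, 4))  # [(6, 0)]
    print(map_read("GATGCTTGC", reference, index, 4))  # [(6, 1)]: a candidate SNP
```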
  • 27. Genome annotation  Annotation is the addition of information about the predicted sequence features to the flat file of DNA code  Identification of potential coding sequences - CDS  Homology searches to predict function  Other features can be annotated as well  rRNAs  Potential promoters  tRNAs  Small non-coding RNAs  Repeat sequences  Insertion sequences (ISs), transposons, gene fragments  Location of the origin of replication  Determination of the number of bases, genes, and G+C%.
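Two of the summary statistics on this slide, the number of bases and the G+C%, are simple to compute; counting genes needs CDS prediction first. A minimal sketch, assuming ambiguous bases such as N should be excluded from the G+C denominator (a choice some tools make differently).

```python
def genome_summary(seq: str) -> dict[str, float]:
    """Length and G+C% of a DNA sequence; non-ACGT characters are excluded from G+C%."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return {"length": len(seq), "gc_percent": 100.0 * gc / acgt if acgt else 0.0}


if __name__ == "__main__":
    # First ten bases of the E. coli K-12 fragment shown on the next slide.
    print(genome_summary("TGCTGCTGCC"))  # {'length': 10, 'gc_percent': 70.0}
```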
  • 28. How to go from this….? >Escherichia coli K-12 MG1655_3870656-3890655 TGCTGCTGCCTGCTGCGCGGTGCGCTCTACGGATTGCCCGGCGCGATAGAGATCGCTGCCTAAGCCCGCCCCTGCACAACCTGCGTCTATCCACTGCGCCAGGTTTTCTGCGTCACGCCGCAAC GGCAAAGACTGCGATGTCCGATGGCAATACCGCTTTTAACGCTTTGATGTATTGCGGACCAAAAGCCGATGACGGAAATATTTTCAGCGCCTGCGGCGCCCGCTTCGAGCGCGGTAAAGGCTTCG GTCGCCGTCGCGCAGCCGGGGCAGACGTCATGCCGTAGCCCACCGCACGGCGGATCACTTCACTATGGATATTGGGCGTAACGATGAGCTGACAGCCCATCCTGGCGAGCGCATCGACCTGTT CAGGTTTCAGTACCGTACCTGCGCCAATCAACGCCTTGTCGCCGTACGCATCAACGATGCGGGAATGCTTTGCTCCCATTGTGGGGAATTCAGCGGGATTTCAACCGCGTCGAACCCGGCGTCAA TCACCGCGCCAACATGCGCCAGCGCCTCGTCGGGCGTAATACCGCGCAAAATGGCGATCAGCGGGAGTTTAGTTTGCCACTGCATGAGGATGCTCCTTATACCAGCCTGAAATGCCGTGTCGCC CGCCACCGCCGTCACGTCGCAACCCATCGCCTGAAAGGCTTGCTGGTAGCGCGCGGTCAGCGATGTTCCGGCGACAAGGGTGATGGCGTGTTGATGGGCCACATAGTCGCGCATACTGGGACC TCTGCGCCAATCAACAAACCAGAGAGAAATTCGCTGACCTGTTCGCGGGGAAGTGTTCCCAGCACATGCGAGGCGCGAACTTCAAAAAGCTGCGGCAATATGGCGGGCGTATTAAGACCACGCT CAAGGCCAGCTGTGAAGGCATCGGCAGGTTTTCCTGCGGCGGCAAACCTGCGCCAATCAATGAGTGATTTAACAGTAAATGATGTAATTCACCGGTCATCACGGTGCGAAAATCGTTGATTTGCTG GCTATCGGCCTGCACCCATTTGCAATGGGTTCCGGGCATGACATAAAGAGAGGAAGAGCCAGAGCTCGCGCGCCGATCAATTGTGTTTCTTCGCCGCGCATCACATTGTGGTTATCGTCATGAGA GACACATAATCCGGGAATAATCCAGATATTGTCGCCAACTGACGTTAATTGTTCGCCAATAGACGAAAAACAGGCAGGAACAGATAATACGGTGCAACTTTCCAGCCGACGTTGCTGCCAACCATT CCTGCCATTACCACTGGCGTTTTCTCTTCACGCCAGTCGGTCGTGACTTCTGCTAACACCGCAGCCGGAGATTTTCCGTTCAGGCGCGTGACGCCTGCTTCTGATTGCCTGCTCTCAGGCAGTGG TCGCCCTGATAAAGCCAGGCGCGCAGATTGGTCGATCCCCAGTCAATTGCGATGTAGCGAGCTGTCATGTGATTTCCTTTAACCTTCGTGTCGAGCTGGCGATCATGGTAAGCGCCGCCTGCTCT GCCGCATCGCCGTCCTGATGCGTATCGCATCGAACAGCGCCTTATGTTCCTGGAGCGTTTGCGGCATGTTGGCCTCATCGCCCATCCAGGTTCGTTCAAAAACCGCCCGCTGCAGCGAACTGATC GCAATGCTAAGTTGCTGTAACACCGGGTTATGCACCGACTGCAGCACCGCTCGTGGTAGCGAATATCCGCTTCGTTAAACGCTTCGCGGTCCTGATTGTTGGCAATCATCTCGTTCAGCGCCGATT CAATCTGCGCCAGATCGCTGGAAGTCGCGCGCTCTGCTCCCAACGGGCAATCGCCGGTTCCACCAGATTTCGCACTTCGTCATGGCACTGATAAGCCGTGGGTCGTAGTCATTTTCCAGCACCCA 
TTGCAGTACGTCAGTGTCGAGGTAATTCCACTGGTTACGCGGTGCCACAAACGCCCCGCGATAACGTTTCATTTCAATCAGCCGCTTCGCCATCAGCGAACGGAACACCCACGGATGATGTTGCG CGAGGTTGCAAACTCCTCACAGAGTTCCGCCTCAGCCGGAAGCGGCGAGCCTGGCACGTATTTGCCGTGAACGATCTGTTTACCCAGCGTAATGACAATGCGATCGGTTTTATTGAGAGTCATGG AGAGTCCTTGTGCTTGTATGTTCTTCTCTACTTTACCCCGATCGATGCATAACGCGGCAACTTTGTAGTACCAGCGTGATGACGTTCGCGTTTGCCGTGCGTGTAATGTAGTACAAACTTATATTGTT GTACTACAATTTAGATCACAAAAAGAACAATGCATAAAAAATGACATGCGTCGGGCAGAAATCTGAAAAGGGATATCAGGCGCTAAACAGGAGGGAAAGAAGAGTATGCTTTCAACGGCTTAGCTA CTCGTTTAAAGGATTAATCATGAAGTTGAATTTTAAGGGATTTTTTAAGGCTGCCGGTTTATTCCCACTGCGCTGATGCTTTCAGGCTGTATCTCGTATGCTCTGGTTTCCCATACCGCAAAGGGTAG TTCAGGAAAGTATCAATCGCAGTCAGACACCATCACTGGGCTATCGCAGGCAAAAGATAGTAATGGAACAAAAGGCTATGTTTTTGTAGGGGAATCGTGGATTACCTTATCACTGATGGTGCCGAT GACATCGTTAAGATGCTCAATGATCCAGCACTTAACCGGCACAATATTCAGGTTGCCGATGACGCAAGATTTGTTTTAAATGCGGGGAAAAAGAAATTTACCGGCACAATATCGCTTTACTACTACG GAATAACGAAGAAGAAAAGGCACTGGCAACGCATTATGGTTTTGCCTGTGGTGTTCAACACTGTACCAGGTCACTGGAAAACCTAAAAGGCACAATCCATGAGAAAAATAAAAACATGGATTACTCA AAGGTGATGGCGTTCTACCATCCATTTAAGTGCGATTTTATGAATACTATTCACCCAGAGGCATTCCGGGATGGTGTTTCCGCAGCATTACTGCCAGTGACTGTTACGCTGGACATCATTACTGCAC CGCTGCAATTTCTGGTTGTATATGCAGTAAACCAATAATCAGTAAGCGGGCAAACCGTTTATGCTGTTTGCCCGCCCACAGATTAATTCAGCACATACTTCTCAATAGCAAACGCCACGCCATCTTCA AGGTTAGATTTGGTGACAAAGTTCGCCACTTCTTTCACTGAAGGAATAGCGTTATCCATCGCCACACCGACGCCTGCATATTAATCATTGCGATATCGTTTTCCTGATCGCCAATCGCCATGATTTCT TCCGGTTTAATACCTAACACGTCGGCCAGTGATTTCACCCCCGTACCTTTGTTAACGCGTTTATCGAGGATTTCGAGGAAGTACGGCGCACTTTTCAGCACGGTATATTCTCTTTCACTTCCTGCGG AATACGCGCGATAGCCTGGTCGAGGATGGCGGGTTCATCAATCATCATCACTTTCAGGAACTGGGTATTGGGGTCCATTTTCTCCGCTTCGCAGAACACCAGCGGAATGGTGGCAACGAAGGATT CATGCACCGTGTGTAGCTGATATCACGGTTGGCGGTGTACAGCGTGGTGCGGTCCAGGGCGTGGAAATGAGAACCGACTTCGCGAGAGAGTTTTTCCAGGAAACGATAGTCGTCATAGCTGAGA GCAGTTTGCGCCACGGTGCTACCATCAGCGGCCTTCTGTACCACGCGCCGTTATAAGTAATGCAGTAGTCGCCCGGCTGTTCCATATGCAGCTCTTTCAGGTAGTTGTGCACACCTGCATACGGG 
CGACCCGTCGTTAGCACGACATTCACGCCACGGGCGCGAGCTGCGGCAATCGCATTTTTAACGGCGGGTGAAAGGTGTGATCGGGCAGCAGAAGGGTGCCATCCATATCGATAGCAATGAGTTT AATAGCCATGAGTTCCCCAGGTAGATTGGTTCCTGACCCATGCTAACGCGATTCCGCTCAAAAATCAGTACAACACCCGAGGGAAAAGGGGGATGCAACGCGCGTGCGTGCTCCCTTTTTGCTTA GCGGAAGAGTTTCCCTTTCAGCAGTTCCATGCCTGCGGAAAGCAGATCGTTATTGGCTTGTGGTGACACTTCACCTTGCGGTGAGAGCGCATCAATAATCTTCGGCAATTGTTCTGCCAGTAAACT GGAAGCTGACTGGTATCCACGCCAAGTTTTTGCCCGAGATCGGACACCGCATTTGTGCCGAGCGCCGATTCCAGTTGCTCGCCACTAACCGATTGATTGCCCTGTTGATTACTCAGCCAGGTTGA GAGAATGGCCCCTAAGCCGCCACTTTGCAGTTTTTCCACAGCACCTGAATGCCGCCCTGCTCCTCAACCCAACTTAAAATAGCCTGATATTTCCCCGCATCGCCTTTCAGAAAGGCACCGACAACTT CATCAAAAAGCCCCATGATAATCACCTGTAAAGCGTTACGTGTTGACCCAAAAAGTATAGATTTGCGGATGATAATTGCGGATTGCAGAAATAAAAAGGGCGGAGATGATCTCCGCCCTTTTCTTAT AGCTTCTTGCCGGATGCGGCGTGAACGCCTTATCCGGCCTACAAAATCATGAAAATTCAATACATTGCAAGATTTTCGTAGGCCTGATAAGCGTGCGCATCAGGCACGCTCGCATGGTTAGCGCCA TTAAATATCGATATTCGCCGCTTTCAGGGCGTTCTCTTCAATAAACGCACGGCGCGGTTCAACGGCGTCGCCCATCAGCGTGGTGAACAACTGGTCGGCAGCAATCGCATCTTTAACGGTAACCG CAGCATACGACGACTTTCCGGGTCCATAGTGGTTTCCCACAGCTGTTCCGGGTTCATCTCGCCCAGACCTTTATAACGCTGGATGGAGAGGCCGCGACGGGACTCTTTCACCAGCCAGTCCAGCG CCTGCTCGAAGCTGGCTACCGGCTGACGCGCTCGCCACGTTCGATAAACGCATCTTCTTCCAGCAAGCCACGCAGTTTCTCACCCAGCGTGCAGATACGACGATATTCGCCACCGGTGATAAACT CGTGATCCAGCGGATAGTCAGTATCCACACCGTGGGTACGCACGCGAACAATCGGCTCAACAGGTTTTGCTCAGCATTGGTGTGAACATCAAACTTCCACTGGCTGCCGTGCTGTTCTTTGTCGTT CAGTTCGCTGACCAGCGCGTTCACCCAGCGGGTAACGGTCTGCTCATCAGAAAGGTCAGCTTCCGTCAACGTCGGCTGATAGATAAGTCTTTCAGCATTGCTTTCGGATAACGACGCTCCATACG ATTGATCATTTTCTGCGTCGCGTTGTACTCAGATACCAGTTTCTCTAACGCTTCGCCAGCCAATGCCGGTGCACTGGCGTTGGTGTGCAGCGTTGCGCCGTCCAGCGCGATAGAGATTGGTACTG ATCCATCGCTTCGTCGTCTTTAATGTACTGTTCCTGCTTGCCTTTCTTCACTTTGTACAGCGGCGGCTGAGCGATGTAGACGTGACCGCGTTCAACGATTTCCGGCATCTGACGATAGAAGAAGGT CAACAGCAGCGTACGAATGTGGAGCCGTCGACGTCCGCATCGGTCATGATGATGATGCTGTGATAACGCAGTTTGTCCGGGTTGTACTCGTCACGACCGATACCACAGCCAAGCGCGGTGATAA 
GCGTCGCCACTTCCTGAGAAGAGAGCATCTTATCGAAGCGCGCTTTCTCGACTTGAGGATTTTACCCTTCAGCGGCAGAATCGCCTGGTTCTTGCGGTTACGCCCCTGCTTCGCAGAGCCGCCCG CGGAGTCCCCTTCCACCAGGTACAGTTCGGAAAGCGCCGGATCGCGTTCCTGGCAGTCTGCCAGTTTGCCCGGCAGGCCCGCAAGTCGAGCGCACCTTTACGGCGGGTCATTTCACGCGCGCG ACGCGGCGCTTCACGGGCACGGGCAGCATCGATAATTTTGCCAACCACGATTTTCGCGTCGGTTGGGTTTTCCAGCAGGTATTCTGCCAGCAGTTCGTTCATCTGCTGTTCAACGCCGATTTCACC TCAGAAGAAACCAGTTTGTCTTTGGTCTGGGAGGAGAATTTCGGGTCCGGCACTTTCACGGAAACGACCGCAATCAGGCCTTCACGCGCATCGTCACCGGTGGCGCTGACTTTGGCTTTTTTGCT GTAGCCTTCTTTGTCCATTAGGCGTTCAGGGTACGGGTCATCGCCGCACGGAAGCCTGCCAGGTGAGTACCGCCGTCACGCTGCGGAATGTTGTTGGTAAAGCAGTAGATGTTTTCCTGGAAGCC ATCGTTCCACTGCAACGCCACTTCGACGCCAATACCGTCTTTTTCAGTGAGAAGTAGAAGATATTCGGGTGGATCGGCGTTTTGTTCTTGTTCAGATATTCAACGAACGCCTTGATGCCGCCTTCAT AGTGGAAGTGGTCTTCTTTGCCGTCGCGCTTGTCGCGCAGACGAATGGAAACGCCGGAGTTGAGGAACGACAACTCCGCAGACGTTTCGCCAGAATTTCATATTCGAACTCGGTCACATTGGTGA AGGTTTCGAGGCTGGGCCAGAAACGCACCATGGTGCCGGTTTTTTCAGTCTCGCCGGTAACCGCCAGCGGGGCCTGCGGTACACCGTGTTCGTAGATCTGACGGTGATTTTACCCTCGCGCTGG ATAACCAGCTCCAGTTTTTGCGACAGGGCGTTTACTACCGAAACACCAACGCCGTGCAGACCGCCGGACACTTTATAGGAGTTATCGTCAAATTTACCGCCTGCGTGCAGAACGGTCATGATCACT TCCGCCGCCGA
  • 29. …to this?  FT gene complement(9299..10702)  FT /db_xref="GenBank:2367266”  FT /gene="dnaA”  FT /note="b3702”  FT CDS complement(9299..10702)  FT /db_xref="GI:2367267”  FT /db_xref="PID:g2367267”  FT /function="putative regulator; DNA - replication, repair,  FT restriction/modification”  FT /codon_start=1  FT /protein_id="AAC76725.1”  FT /gene="dnaA”  FT /translation="MSLSLWQQCLARLQDELPATEFSMWIRPLQAELSDNTLALYAPNR  FT FVLDWVRDKYLNNINGLLTSFCGADAPQLRFEVGTKPVTQTPQAAVTSNVAAPAQVAQT  FT QPQRAAPSTRSGWDNVPAPAEPTYRSNVNVKHTFDNFVEGKSNQLARAAARQVADNPGG  FT AYNPLFLYGGTGLGKTHLLHAVGNGIMARKPNAKVVYMHSERFVQDMVKALQNNAIEEF  FT KRYYRSVDALLIDDIQFFANKERSQEEFFHTFNALLEGNQQIILTSDRYPKEINGVEDR  FT LKSRFGWGLTVAIEPPELETRVAILMKKADENDIRLPGEVAFFIAKRLRSNVRELEGAL  FT NRVIANANFTGRAITIDFVREALRDLLALQEKLVTIDNIQKTVAEYYKIKVADLLSKRR  FT SRSVARPRQMAMALAKELTNHSLPEIGDAFGGRDHTTVLHACRKIEQLREESHDIKEDF  FT SNLIRTLSS”  FT /product="DNA biosynthesis; initiation of chromosome  FT replication; can be transcription regulator”  FT /transl_table=11  FT /note="f467; 100 pct identical to DNAA_ECOLI SW: P03004;  FT CG Site No. 851” 
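The feature location can be read programmatically. The sketch below handles only the simple span and complement(span) forms seen above, with 1-based inclusive coordinates; joins, fuzzy ends and remote references are deliberately unsupported, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_location(location: str) -> tuple[int, int, int]:
    """Parse '9299..10702' or 'complement(9299..10702)' into (start, end, strand)."""
    strand = 1
    if location.startswith("complement(") and location.endswith(")"):
        strand = -1
        location = location[len("complement("):-1]
    start_text, end_text = location.split("..")
    return int(start_text), int(end_text), strand


def extract_feature(genome: str, location: str) -> str:
    """Return the feature's sequence read 5'->3' on its own strand."""
    start, end, strand = parse_location(location)
    seq = genome[start - 1:end]
    return seq.translate(COMPLEMENT)[::-1] if strand == -1 else seq


if __name__ == "__main__":
    start, end, strand = parse_location("complement(9299..10702)")
    length = end - start + 1
    print(length, length % 3, length // 3 - 1)  # 1404 0 467
```

The span is 1,404 bp, i.e. 468 codons: 467 sense codons plus a stop codon, which agrees with the "f467" in the /note qualifier.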
  • 31. An ORF is not a CDS!
      - An ORF is just an open reading frame: a run of codons uninterrupted by a stop codon
      - There are many more ORFs than protein-coding genes (CDSs) in a genome
      (Figure: non-coding ORFs vs CDSs; note that an ORF can extend upstream of the start codon)
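Following the slide's note that an ORF can extend upstream of the start codon, this sketch uses a stop-to-stop definition: in each forward frame, report every stretch between stop codons that reaches a minimum length. It is illustrative only; it scans the forward strand alone (a full search covers six frames), ignores start codons, and the 30-codon default is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Stop-to-stop ORFs on the forward strand.

    Returns (frame, start, end) with 0-based, end-exclusive coordinates that
    include the terminating stop codon. A stretch that runs off the end of
    the sequence without a stop is not reported.
    """
    orfs = []
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    orfs.append((frame, run_start, i + 3))
                run_start = i + 3
    return orfs


if __name__ == "__main__":
    print(open_reading_frames("ATGAAATAGCCCTGA", min_codons=2))  # [(0, 0, 9)]
```

Every ORF this returns is only a candidate; gene-prediction programs use additional evidence to decide which ORFs are real CDSs.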
  • 32. The Problem of Frameshift Errors
      Actual sequence:
      ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
        Frame 1: M S T A K L V K S K A T N L L Y T R N D V S D S E K
        Frame 2: • V P L N • L N Q K R P I C F I P A T M S P T A R K
        Frame 3: E Y R • I S • I K S D Q S A L Y P Q R C L R Q R E K
      Frameshifted sequence after a single-base error (one extra A):
      ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
        Frame 1: M S T A K L V K S K S D Q S A L Y P Q R C L R Q R E
        Frame 2: • V P L N • L N Q K A T N L L Y T R N D V S D S E K
        Frame 3: E Y R • I S • I K K R P I C F I P A T M S P T A R K
      (• = stop codon)
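The effect of the extra base can be reproduced by translating both sequences with the standard genetic code. A minimal sketch: the codon table is built from the usual TCAG-ordered string for NCBI translation table 1 (which gives the same amino acids as table 11), and placing the inserted A after base 30 is an assumption, since any position in that run of As gives the same result.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate the complete codons of `seq` in the given frame (0, 1 or 2)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


ACTUAL = "ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA"
# Hypothetical sequencing error: one extra A inserted after base 30.
SHIFTED = ACTUAL[:30] + "A" + ACTUAL[30:]

if __name__ == "__main__":
    print(translate(ACTUAL))   # MSTAKLVKSKATNLLYTRNDVSDSEK
    print(translate(SHIFTED))  # MSTAKLVKSKSDQSALYPQRCLRQRE
```

After the insertion the true protein (ATNLLYTR...) is still present, but only in the second frame of the shifted sequence, which is how a single sequencing error can split one gene into two apparent CDSs.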
  • 33. Homology  Similarities in form the cat sat on the mat (sequence) allow us die Katze sass auf der Matte to infer similarities in “meaning” (structure and function)  Homology is not just sequence similarity  Two sequences can be similar without any common ancestry, particularly if low complexity vge|GBant88-2 ITLITCVSVKDNSKRYVVAG vge|GEfae9-178 LTLITCDQATKTTGRIIVIA vge|GSpne1-403 MTLITCDPIPTFNKRLLVNF sortase_staur LTLITCDDYNEKTGVWEKRK
  • 34. Types of Homology  Homologues can be divided into  Orthologues: lines of descent congruent with whole genome  Paralogues: result of gene duplication  Xenologues: result of HGT
  • 35. Homology Searches  The aim of homology searches is to identify sequences within these databases that are homologous to your sequence.  This involves comparing your sequence with all the database sequences  looking for stretches of sequence that appear to be similar  then scoring the matches and ranking them  a measure of the significance of the match is given  Most common program used for homology searches is BLAST
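The "score and rank" step can be illustrated with exact Smith-Waterman local alignment, which BLAST approximates with a much faster seed-and-extend heuristic. This sketch is a stand-in, not BLAST: it uses toy scores (match +2, mismatch -1, linear gap -2) instead of a substitution matrix such as BLOSUM62, and it reports raw scores rather than significance values. The database reuses the four sequences from slide 33.

```python
def local_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def rank_hits(query: str, database: dict[str, str]) -> list[tuple[int, str]]:
    """Score the query against every entry and rank best-first."""
    scores = ((local_alignment_score(query, seq), name) for name, seq in database.items())
    return sorted(scores, reverse=True)


DATABASE = {
    "vge|GBant88-2": "ITLITCVSVKDNSKRYVVAG",
    "vge|GEfae9-178": "LTLITCDQATKTTGRIIVIA",
    "vge|GSpne1-403": "MTLITCDPIPTFNKRLLVNF",
    "sortase_staur": "LTLITCDDYNEKTGVWEKRK",
}

if __name__ == "__main__":
    for score, name in rank_hits("LTLITCDDYNEKTGVWEKRK", DATABASE):
        print(score, name)
```

All four sequences share the short motif TLITC, so every entry scores above zero; a raw score alone cannot tell homology from chance similarity, which is why search tools also report significance.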
  • 36. Bacterial Genome Dynamics
      (Figure: gene loss, gene duplication, gene gain and gene change)
      - Drastic downsizing in isolated intracellular niches
      - Accumulation of pseudogenes and IS elements after shift to a new niche
      - Rapid emergence of genetically uniform pathogens from variable ancestral populations
      - Horizontal gene transfer by phage, plasmids, pathogenicity islands
      - Recombination and rearrangements
      - Single nucleotide polymorphisms (SNPs)
  • 37. Horizontal gene transfer  Horizontal (or lateral) gene transfer denotes any transfer, exchange or acquisition of genetic material that differs from the normal mode of transmission from parents to offspring (vertical transmission). Vertical gene transfer Horizontal gene
  • 38. Bacterial mobile genetic elements  Transposons  pieces of DNA that act as „jumping genes‟ that change location on chromosome or plasmid chromosomal localization.  encode transposase that catalyses the transposition event  can carry resistance or virulence genes  Insertion sequences (IS elements)  transposable elements that encode only the transposase  multiple copies of same IS within genome provide targets for homologous recombination, rearrangements and replicon fusions  Conjugative transposons  normally integrated into the chromosome  excise then transferred to recipient cells by conjugation
  • 39. Bacterial mobile genetic elements  Plasmids  self-replicating extrachromosomalreplicons  usually circular but can be linear  Can carry resistance or virulence genes  Bacteriophages  bacterial virusescan carry virulence genes  can insert into bacterial chromosome as prophages (lysogeny)  Integrons  complex natural cloning and gene expression systems able to capture promoterless gene cassettes by site- specific recombination  allow formation of large arrays of gene cassettes transferred as a whole between different replicons.
  • 40. Genomic islands  large chromosomal regions, part of the flexible gene pool  previously transferred by other mobile genetic elements  present in some bacteria but absent in close relatives  carry multiple genes that increase phenotypic versatility  contribute to dynamic character of bacterial chromosomes and can be excised from the chromosome and transferred to other recipients  pathogenicity islands contain dozens of genes that allow quantum leap to complex new virulence
  • 41. Core genomes and Pangenomes  Core genome  pool of genes shared by all members of a bacterial species  Accessory or dispensable genome  pool of genes present in some but not all genomes within the same bacterial species  Pangenome  global gene repertoire of a bacterial species, comprised of core genome + accessory genome  Metagenome  global gene repertoire of mixed microbial population
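These definitions translate directly into set operations. The sketch assumes each genome's genes have already been clustered into named gene families (in practice that clustering needs homology searches); the strains and gene contents below are hypothetical toy data.

```python
def partition_genes(genomes: dict[str, set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Return (core, accessory, pan) gene-family sets for a group of genomes."""
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # present in every genome
    pan = set.union(*gene_sets)          # present in at least one genome
    return core, pan - core, pan


if __name__ == "__main__":
    genomes = {  # hypothetical strains
        "strain_A": {"dnaA", "gyrB", "stx2"},
        "strain_B": {"dnaA", "gyrB", "aggR"},
        "strain_C": {"dnaA", "gyrB"},
    }
    core, accessory, pan = partition_genes(genomes)
    print(sorted(core), sorted(accessory), len(pan))  # ['dnaA', 'gyrB'] ['aggR', 'stx2'] 4
```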
  • 42. Escherichia coli Core and Pan-genomes Welch et al. Proc Natl Acad Sci U S A. 2002 Dec 24;99(26):17020-4
  • 43. Metagenomics  Environmental shotgun sequencing  DNA extracted from mixed microbial communities sequenced en masse  Assembled into contigs  Typically only small contigs can be obtained
  • 44. Uses of a genome sequence  Gene discovery  Fuelling hypothesis driven research on pathogen biology  Comparative genomics  SNP discovery and genomic epiemiology  Functional genomics  Transcriptomics  Proteomics  Interactome  Structural Genomics  Mass Mutagenesis
  • 45. Haemolytic-uraemic syndrome  Shiga-toxin-producing E. coli (STEC)  bloody diarrhoea; damage to kidneys and brain  anaemia; loss of platelets
  • 46. German E. coli O104:H4 outbreak  May-July 2011  >4000 cases  >40 deaths  Link to sprouting seeds  High risk of haemolytic- uraemic syndrome  Females particularly at risk Frank et al DOI: 10.1056/NEJMoa1106483
  • 48. Take-away messages from the genome  Pathogens don‟t bother with passports!  Not a new strain: something similar seen in Germany ten years ago and in Korea  closest genome-sequenced strain was isolated from Central African Republic in late 1990s, belongs to an enteroaggregative lineage  German STEC probably comes from a lineage circulating in human populations rather than from an animal source (unlike E. coli O157)
  • 49. Take-away messages  Bacteria evolve quickly  Virulence factors in E. coli can jump from one lineage to another on mobile genetic elements  Pathotypes can overlap and evolve  Antibiotic resistance seen where no obvious prior use of antibiotics
  • 51. Take-away messages from genome sequence  Genome sequencing brings the advantages of  open-endedness (revealing the “unknown unknowns”),  universal applicability  ultimate in resolution  Bench-top sequencing platforms now generate data sufficiently quickly and cheaply to have an impact on real-world clinical and epidemiological problems
  • 52. Comprehensive Coverage of Human Microbiome
  • 54. What will you do when you can sequence everything?