Bio153 microbial genomics 2012

Introductory lecture on microbial genomics for University of Birmingham Biosciences first-year module Bio153

Published in: Education, Technology

Transcript

  • 1. Bio153 Microbial Genomics. Professor Mark Pallen, University of Birmingham
  • 2. Microbial Genomics
      • General features of microbial genomes
      • Historical overview
      • Genome sequencing, annotation and analysis
      • Genome evolution
      • What can we learn from a genome sequence?
  • 3. General features of genomes
      Microbial:
      • Small WYSIWYG genomes (Mbp)
      • Gene density high (>90%)
          • intergenic regions short
          • very little repetitive or non-coding DNA
          • introns very rare
      • Protein-coding genes (CDS) short (~1 kbp)
      • Operons with promoters just upstream
      • Fewer non-coding RNAs
      Human:
      • Very large genomes (Gbp)
      • Gene density low
          • only 25% is genes
          • introns mean only 1% codes
      • Genes can span ≥30 kbp
      • Genes have ~3 transcripts
      • Splicing and splice variants
  • 4. Bacterial genome organisation
      Chromosomes:
      • Most commonly single circular chromosome
          • BUT many species have linear chromosome(s) (e.g. Borrelia, Streptomyces, Rhodococcus)
          • BUT a few species with two chromosomes (e.g. Vibrio cholerae)
      • Can be mix of circular and linear (e.g. Agrobacterium tumefaciens, B. burgdorferi)
      Plasmids:
      • Independent autonomous replicon, can be circular or linear (always DNA)
          • may integrate into chromosome
          • copy number varies from 1 to 10s
          • often carry non-essential genes that confer an adaptive advantage in certain conditions
  • 5. Bacterial Genome Size
      • Species that occupy restricted ecological niches (e.g. obligate intracellular parasites and endosymbionts) tend to have smaller genomes (<1.5 Mb) than generalist bacteria
          • smallest known bacterial genome: Carsonella ruddii, 160 kb! (Nakabachi et al. 2006)
          • BUT mitochondrial genomes are smaller
      • Largest genomes found in bacteria with complex developmental cycles, e.g. Streptomyces
          • largest bacterial genome: Sorangium cellulosum, 13 Mb
  • 6. Bacterial genomes are made from DNA
      • In 1944, Oswald Avery, Colin MacLeod, and Maclyn McCarty showed that DNA (not proteins) was the genetic material responsible for inheritance.
          • Identified DNA as the "transforming principle" while studying Streptococcus pneumoniae
          • Avery, Oswald T., Colin M. MacLeod, and Maclyn McCarty. Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Journal of Experimental Medicine. 1944 Feb 1; 79(2): 137-158.
      • In 1952, this work was supported by Alfred Hershey and Martha Chase, who showed that only the DNA of a virus needs to enter a bacterium to infect it.
          • Used radioactively labelled bacteriophage
          • Hershey AD and Chase M. Independent functions of viral protein and nucleic acid in growth of bacteriophage. Journal of General Physiology. 1952. 36: 39-56.
  • 7. Viral genomes are variable
      • Use RNA or DNA, but not both, in genome
          • Some have RNA genomes!
      • Grouped into families depending on type of genome: DNA or RNA, single- or double-stranded
      • Typically dozens of genes or fewer
          • Large genomes in pox viruses (~200 kb)
          • Massive genomes in megaviruses (1 Mbp!)
  • 8. Microbial Genomics Timeline (Year: Milestone)
      • 1977: Invention of dideoxy chain terminator sequencing ("Sanger sequencing")
      • 1977: Sequencing of the 5.4-kilobase genome of bacteriophage phiX174
      • 1981: First human mitochondrial genome sequence*
      • 1982: Determination of the 48.5-kilobase genome sequence of bacteriophage lambda through first use of shotgun sequencing
      • 1986: Development of automated fluorescent sequencing
      • 1995: First complete genome sequences obtained of free-living bacteria (Haemophilus influenzae and Mycoplasma genitalium)
      • 1996: Mycoplasma becomes first bacterial genus that has completely sequenced genomes from two different species (M. genitalium and M. pneumoniae)
      • 1997: First genome sequences from Escherichia coli and Bacillus subtilis
      • 1998: First genome sequence from Mycobacterium tuberculosis; genome sequence from Rickettsia prowazekii provides first evidence of reductive evolution
  • 9. Microbial Genomics Timeline (Year: Milestone)
      • 1999: Helicobacter pylori becomes the first species with completely sequenced genomes from two isolates
      • 2000: Meningococcal genome sequence primes first application of reverse vaccinology
      • 2001: Second E. coli genome sequences reveal unexpected level of horizontal gene transfer; genome sequence of M. leprae provides compelling evidence of bacterial pseudogenes and reductive evolution; first paper reporting genome sequences of two strains from one species (Staphylococcus aureus) in a single publication
      • 2002: Genome sequencing of multiple strains of Bacillus anthracis to provide markers for forensic epidemiology
      • 2003: Genome sequencing of uncultivable Tropheryma whipplei leads to design of axenic growth medium
      • 2004: Genome sequence of mimivirus blurs distinctions between bacteria and viruses
      • 2005: Whole-genome sequencing used to identify target of new anti-tuberculosis drug; Mycoplasma genitalium genome sequenced using pyrosequencing
      • 2006-: Bacterial metagenomics survey of the Sargasso Sea yields >1 million new genes
      • 2011: Rise of next-generation or high-throughput sequencing
  • 10. The first genome sequences
      • The first sequenced gene was from bacteriophage MS2
          • The gene encoding the coat protein
          • 1972
          • Min Jou W, Haegeman G, Ysebaert M, and Fiers W. Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature. 1972 May 12; 237(5350): 82-88.
      • The first sequenced genome was bacteriophage MS2
          • 1976
          • RNA genome is 3,569 nucleotides
          • Fiers W, Contreras R, Duerinck F, Haegeman G, Iserentant D, Merregaert J, Min Jou W, Molemans F, Raeymaekers A, Van den Berghe A, Volckaert G, and Ysebaert M. Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature. 1976 Apr 8; 260(5551): 500-507.
  • 11. The first genome sequences
      • The first sequenced DNA genome was bacteriophage ΦX174
          • 1977
          • 5,386 base pairs
          • Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, Hutchison CA, Slocombe PM, and Smith M. Nucleotide sequence of bacteriophage phi X174 DNA. Nature. 1977 265 (5596): 687-695.
      • The first sequenced bacterial genome was Haemophilus influenzae
          • 1995
          • 1,830,140 base pairs
          • Fleischmann R, Adams M, White O, Clayton R, Kirkness E, Kerlavage A, Bult C, Tomb J, Dougherty B, and Merrick J. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 1995. 269 (5223): 496-512.
  • 12. Overview of a genome project
      • Choose strain
          • Fresh isolate or tractable lab strain?
      • Choose strategy
          • Shotgun sequencing
          • Paired-end sequencing
          • Draft or complete?
      • Choose chemistry
          • Sanger; 454; Illumina; Ion Torrent
      • Assembly
          • Automated
      • Closure and finishing
          • Manually intensive
          • Difficulty depends on how repetitive the genome is
      • Data release
          • Immediate or delayed?
      • Annotation
          • Manually intensive bottleneck
      • Publication
  • 13. Methods for genome sequencing – historic: Sanger method sequencing
      • Sanger F and Coulson AR. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. Journal of Molecular Biology. 1975 94: 441-448.
      • Step 1: a sequence-specific DNA primer is radiolabelled
      • Step 2: the primer is annealed to the template DNA
      • Step 3: the primer is extended by DNA polymerase
          • Incorporation of a deoxynucleotide: further extension possible
          • Incorporation of a dideoxynucleotide: chain termination
      • Four reactions set up
          • ddATP, dATP, dCTP, dGTP, dTTP
          • ddCTP, dATP, dCTP, dGTP, dTTP
          • ddGTP, dATP, dCTP, dGTP, dTTP
          • ddTTP, dATP, dCTP, dGTP, dTTP
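The logic of the four chain-termination reactions can be sketched in a few lines of Python. This is a toy model, not a description of the laboratory protocol: it assumes every possible termination product is present, ignores the primer, and works directly on the newly synthesised strand. The function names `run_reactions` and `read_gel` are invented for this illustration.

```python
def run_reactions(new_strand):
    """For each dideoxy base, list the fragment lengths produced in that reaction.

    A fragment of length n appears in the ddX reaction whenever base n of the
    newly synthesised strand is X (the chain terminates there).
    """
    return {
        dd_base: [i + 1 for i, base in enumerate(new_strand) if base == dd_base]
        for dd_base in "ACGT"
    }


def read_gel(reactions):
    """Read bands from shortest to longest fragment to recover the sequence."""
    bands = sorted(
        (length, dd_base)
        for dd_base, lengths in reactions.items()
        for length in lengths
    )
    return "".join(dd_base for _, dd_base in bands)


reactions = run_reactions("GATTACA")
print(reactions)
print(read_gel(reactions))
```

Each fragment length occurs in exactly one of the four reactions, which is why reading the lanes from the shortest band upwards spells out the sequence.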
  • 14. Methods for genome sequencing – historic: Sanger method sequencing
  • 15. Methods for genome sequencing – automated Sanger sequencing
      • Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SBH, and Hood LE. Fluorescence detection in automated DNA sequence analysis. Nature. 1986 321: 674-679.
      • Replaced radioisotopes with fluorescent dyes
          • Safer for the researchers
          • Each of the four DNA bases could be dyed a different colour
          • Eliminated the need to run separate reactions in separate lanes
          • The migration of the dye could be read because of the fluorescence
          • This information allowed automatic gel reading
      • Further improvements were made
          • Improved dye chemistry using fluorescent dideoxy-terminators (DuPont): Prober JM, Trainor GL, Dam RJ, Hobbs FW, Robertson CW, Zagursky RJ, Cocuzza AJ, Jensen MA, and Baumeister K. A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238: 336-341.
          • Replacing slab gels with re-useable capillary tubes: Ruiz-Martinez MC, Berka J, Belenkii A, Foret F, Miller AW, and Karger BL. DNA sequencing by capillary electrophoresis with replaceable linear polyacrylamide and laser-induced fluorescence detection. Analytical Chemistry 1993 65: 2851-2858.
  • 16. Whole-Genome Shotgun Sanger Sequencing (diagram labels: bacterial chromosome; random shearing; size selection; cloning into plasmid vector; pick colonies to create shotgun library; plasmid preps; sequence each insert with two primers)
  • 17. High-throughput Sequencing
      • 100x faster, 100x cheaper!
          • A disruptive technology
      • Several technologies in the marketplace from 2007 onwards
          • 454 (Roche)
          • Illumina
          • Ion Torrent
          • PacBio
      • Fundamentally new approaches
          • Solid-phase amplification of clonal templates in "molecular colonies"
          • Massive increase in number of "clones" compensates for shorter read length
      • New chemistries for sequence reading
          • 454: pyrophosphate detection on base addition
          • Illumina: reversible de-protection of fluorescent bases
  • 18. High-Throughput Shotgun Sequencing (diagram labels: bacterial chromosome; random shearing; size selection; add adapters; amplify; sequence)
  • 19. 454 sequencing: emulsion-based clonal amplification
      • Anneal sstDNA to an excess of DNA Capture Beads
      • Emulsify beads and PCR reagents in water-in-oil microreactors
      • Clonal amplification occurs inside microreactors
      • Break microreactors, enrich for DNA-positive beads
  • 20. Pyrosequencing
      • DNA template with primer is mixed with the enzymes, along with the two substrates adenosine 5′-phosphosulfate (APS) and luciferin
      1. One of the four nucleotides is added to the reaction
      2. If complementary to the base in the template strand, DNA polymerase incorporates it
      3. Pyrophosphate (PPi) is released, then converted to ATP by sulfurylase in the presence of APS
      4. ATP serves as a substrate for luciferase, causing a light reaction
      5. Excess nucleotides are degraded by apyrase
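The cycle of nucleotide additions in pyrosequencing can be pictured as a "flowgram": one light signal per nucleotide flow, with intensity roughly proportional to the number of bases incorporated. The sketch below is a simplified model under stated assumptions: the signal is perfectly proportional to homopolymer length, the flow order `TACG` is an arbitrary choice for illustration, and the input is the strand being synthesised. The names `flowgram` and `decode` are hypothetical.

```python
def flowgram(new_strand, flow_order="TACG"):
    """Return (nucleotide, signal) pairs for successive nucleotide flows.

    Signal 0 means the flowed nucleotide was not incorporated; signal n means
    a run of n identical bases was incorporated in one flow.
    """
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals


def decode(signals):
    """Rebuild the sequence from the flowgram."""
    return "".join(nucleotide * run for nucleotide, run in signals)


print(flowgram("TTAGGGC"))
print(decode(flowgram("TTAGGGC")))
```

In real data the signal for long homopolymer runs is not cleanly proportional, which is one reason this toy model should not be read as a base-calling method.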
  • 21. Illumina Sequencing
  • 22. The Sequence Assembly Problem
      • Sequencing technologies generate reads of <1000 bp
      • These reads must be assembled into a single continuous genomic sequence
      • Shotgun sequencing exploits many overlapping sequences (high coverage) to infer ordering directly from the sequences themselves
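The idea of inferring order from overlaps alone can be sketched with a greedy overlap-merge procedure. This is a deliberately naive illustration, not how production assemblers work: it assumes error-free reads from one strand, and the reads and the 15-base "genome" are made up for the example. The function names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)] + [merged]
    return contigs


# Three made-up overlapping reads, deliberately given out of order.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))
```

If no pair overlaps by at least `min_len` bases, the loop stops and several contigs are returned, which mirrors how gaps in coverage leave an assembly in pieces.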
  • 23. The Repeat Problem
      • Repeats at read ends can be assembled in multiple ways
      • Correct: ATTTATGTGTGTGTGGTGTG joined to GTGTGGTGTGCACTACTGCT; ACTACTGCTGACTACTGTGTGGTGTG joined to GTGTGGTGTGATATCCCT
      • Incorrect: ATTTATGTGTGTGTGGTGTG joined to GTGTGGTGTGATATCCCT; ACTACTGCTGACTACTGTGTGGTGTG joined to GTGTGGTGTGCACTACTGCT
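The ambiguity on this slide can be checked directly with the four reads shown. The sketch below computes suffix-prefix overlaps and lists which reads could follow each read; the labels `r1` to `r4`, the helper names and the minimum overlap of 8 bases are assumptions made for the illustration.

```python
READS = {
    "r1": "ATTTATGTGTGTGTGGTGTG",
    "r2": "GTGTGGTGTGCACTACTGCT",
    "r3": "ACTACTGCTGACTACTGTGTGGTGTG",
    "r4": "GTGTGGTGTGATATCCCT",
}


def overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def successors(name, min_len=8):
    """Reads that could follow the named read, judged by overlap alone."""
    return sorted(
        other
        for other in READS
        if other != name and overlap(READS[name], READS[other]) >= min_len
    )


for name in READS:
    print(name, "->", successors(name))
```

Both `r1` and `r3` end in the 10-base repeat GTGTGGTGTG, and both `r2` and `r4` begin with it, so overlap information alone cannot say which join is right. Extra information, such as paired reads a known distance apart, is needed to resolve it.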
  • 24. Paired-end Sequencing
      • Diagram labels: bacterial chromosome; random shearing; size selection for 3 kb or 8 kb etc.; add linkers; circularise; add adapters; shear and select on size and presence of linkers; obtain sequences from either side of linker, a known distance apart in genome
      • Create long fragments of known length
      • Obtain sequence from paired ends a known distance apart
      • Allows assembly of contigs across repeats into scaffolds
  • 25. Genome Assembly (diagram labels: Contig 1, Contig 2, Contig 3, Sequence Gap, Scaffold, Physical Gap)
  • 26. Re-sequencing
      • Short reads (<200 bp) are inefficient for de novo assembly
      • Instead they are mapped against a reference genome
      • Re-sequencing is like assembling a jigsaw puzzle using the image on the lid
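Mapping a read against a reference can be illustrated with a brute-force search that slides the read along the reference and counts mismatches. Real read mappers use indexed data structures and handle insertions and deletions; this sketch does neither. The reference, the reads and the function name `map_read` are invented for the example.

```python
def map_read(read, reference, max_mismatches=1):
    """Return (position, mismatches) of the best placement, or None.

    Ties are resolved in favour of the leftmost position.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for r, w in zip(read, window) if r != w)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


reference = "ATGGCGTACGTTAGCCTGAA"   # made-up reference sequence
print(map_read("GTACGTTA", reference))   # read identical to the reference
print(map_read("GTACCTTA", reference))   # read carrying a single-base difference
```

A read that maps with one mismatch is a candidate single nucleotide polymorphism, although in practice many reads covering the same position would be needed before calling a variant.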
  • 27. Genome annotation
      • Annotation is the addition of information about the predicted sequence features to the flat file of DNA code
      • Identification of potential coding sequences (CDS)
      • Homology searches to predict function
      • Other features can be annotated as well
          • rRNAs
          • Potential promoters
          • tRNAs
          • Small non-coding RNAs
          • Repeat sequences
          • Insertion sequences (ISs), transposons, gene fragments
      • Location of the origin of replication
      • Determination of the number of bases, genes, and G+C%
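The simplest of these summary statistics, the number of bases and the G+C%, can be computed directly from the sequence. A minimal sketch, with a hypothetical function name and a made-up input sequence:

```python
from collections import Counter


def gc_percent(seq):
    """Percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


sequence = "ATGGCGTACGTTAGCCTGAA"   # made-up example sequence
print(len(sequence), "bases")
print(Counter(sequence))
print(gc_percent(sequence))
```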
  • 28. How to go from this….?>Escherichia coli K-12 MG1655_3870656-3890655 TGCTGCTGCCTGCTGCGCGGTGCGCTCTACGGATTGCCCGGCGCGATAGAGATCGCTGCCTAAGCCCGCCCCTGCACAACCTGCGTCTATCCACTGCGCCAGGTTTTCTGCGTCACGCCGCAAC GGCAAAGACTGCGATGTCCGATGGCAATACCGCTTTTAACGCTTTGATGTATTGCGGACCAAAAGCCGATGACGGAAATATTTTCAGCGCCTGCGGCGCCCGCTTCGAGCGCGGTAAAGGCTTCG GTCGCCGTCGCGCAGCCGGGGCAGACGTCATGCCGTAGCCCACCGCACGGCGGATCACTTCACTATGGATATTGGGCGTAACGATGAGCTGACAGCCCATCCTGGCGAGCGCATCGACCTGTT CAGGTTTCAGTACCGTACCTGCGCCAATCAACGCCTTGTCGCCGTACGCATCAACGATGCGGGAATGCTTTGCTCCCATTGTGGGGAATTCAGCGGGATTTCAACCGCGTCGAACCCGGCGTCAA TCACCGCGCCAACATGCGCCAGCGCCTCGTCGGGCGTAATACCGCGCAAAATGGCGATCAGCGGGAGTTTAGTTTGCCACTGCATGAGGATGCTCCTTATACCAGCCTGAAATGCCGTGTCGCC CGCCACCGCCGTCACGTCGCAACCCATCGCCTGAAAGGCTTGCTGGTAGCGCGCGGTCAGCGATGTTCCGGCGACAAGGGTGATGGCGTGTTGATGGGCCACATAGTCGCGCATACTGGGACC TCTGCGCCAATCAACAAACCAGAGAGAAATTCGCTGACCTGTTCGCGGGGAAGTGTTCCCAGCACATGCGAGGCGCGAACTTCAAAAAGCTGCGGCAATATGGCGGGCGTATTAAGACCACGCT CAAGGCCAGCTGTGAAGGCATCGGCAGGTTTTCCTGCGGCGGCAAACCTGCGCCAATCAATGAGTGATTTAACAGTAAATGATGTAATTCACCGGTCATCACGGTGCGAAAATCGTTGATTTGCTG GCTATCGGCCTGCACCCATTTGCAATGGGTTCCGGGCATGACATAAAGAGAGGAAGAGCCAGAGCTCGCGCGCCGATCAATTGTGTTTCTTCGCCGCGCATCACATTGTGGTTATCGTCATGAGA GACACATAATCCGGGAATAATCCAGATATTGTCGCCAACTGACGTTAATTGTTCGCCAATAGACGAAAAACAGGCAGGAACAGATAATACGGTGCAACTTTCCAGCCGACGTTGCTGCCAACCATT CCTGCCATTACCACTGGCGTTTTCTCTTCACGCCAGTCGGTCGTGACTTCTGCTAACACCGCAGCCGGAGATTTTCCGTTCAGGCGCGTGACGCCTGCTTCTGATTGCCTGCTCTCAGGCAGTGG TCGCCCTGATAAAGCCAGGCGCGCAGATTGGTCGATCCCCAGTCAATTGCGATGTAGCGAGCTGTCATGTGATTTCCTTTAACCTTCGTGTCGAGCTGGCGATCATGGTAAGCGCCGCCTGCTCT GCCGCATCGCCGTCCTGATGCGTATCGCATCGAACAGCGCCTTATGTTCCTGGAGCGTTTGCGGCATGTTGGCCTCATCGCCCATCCAGGTTCGTTCAAAAACCGCCCGCTGCAGCGAACTGATC GCAATGCTAAGTTGCTGTAACACCGGGTTATGCACCGACTGCAGCACCGCTCGTGGTAGCGAATATCCGCTTCGTTAAACGCTTCGCGGTCCTGATTGTTGGCAATCATCTCGTTCAGCGCCGATT CAATCTGCGCCAGATCGCTGGAAGTCGCGCGCTCTGCTCCCAACGGGCAATCGCCGGTTCCACCAGATTTCGCACTTCGTCATGGCACTGATAAGCCGTGGGTCGTAGTCATTTTCCAGCACCCA 
TTGCAGTACGTCAGTGTCGAGGTAATTCCACTGGTTACGCGGTGCCACAAACGCCCCGCGATAACGTTTCATTTCAATCAGCCGCTTCGCCATCAGCGAACGGAACACCCACGGATGATGTTGCG CGAGGTTGCAAACTCCTCACAGAGTTCCGCCTCAGCCGGAAGCGGCGAGCCTGGCACGTATTTGCCGTGAACGATCTGTTTACCCAGCGTAATGACAATGCGATCGGTTTTATTGAGAGTCATGG AGAGTCCTTGTGCTTGTATGTTCTTCTCTACTTTACCCCGATCGATGCATAACGCGGCAACTTTGTAGTACCAGCGTGATGACGTTCGCGTTTGCCGTGCGTGTAATGTAGTACAAACTTATATTGTT GTACTACAATTTAGATCACAAAAAGAACAATGCATAAAAAATGACATGCGTCGGGCAGAAATCTGAAAAGGGATATCAGGCGCTAAACAGGAGGGAAAGAAGAGTATGCTTTCAACGGCTTAGCTA CTCGTTTAAAGGATTAATCATGAAGTTGAATTTTAAGGGATTTTTTAAGGCTGCCGGTTTATTCCCACTGCGCTGATGCTTTCAGGCTGTATCTCGTATGCTCTGGTTTCCCATACCGCAAAGGGTAG TTCAGGAAAGTATCAATCGCAGTCAGACACCATCACTGGGCTATCGCAGGCAAAAGATAGTAATGGAACAAAAGGCTATGTTTTTGTAGGGGAATCGTGGATTACCTTATCACTGATGGTGCCGAT GACATCGTTAAGATGCTCAATGATCCAGCACTTAACCGGCACAATATTCAGGTTGCCGATGACGCAAGATTTGTTTTAAATGCGGGGAAAAAGAAATTTACCGGCACAATATCGCTTTACTACTACG GAATAACGAAGAAGAAAAGGCACTGGCAACGCATTATGGTTTTGCCTGTGGTGTTCAACACTGTACCAGGTCACTGGAAAACCTAAAAGGCACAATCCATGAGAAAAATAAAAACATGGATTACTCA AAGGTGATGGCGTTCTACCATCCATTTAAGTGCGATTTTATGAATACTATTCACCCAGAGGCATTCCGGGATGGTGTTTCCGCAGCATTACTGCCAGTGACTGTTACGCTGGACATCATTACTGCAC CGCTGCAATTTCTGGTTGTATATGCAGTAAACCAATAATCAGTAAGCGGGCAAACCGTTTATGCTGTTTGCCCGCCCACAGATTAATTCAGCACATACTTCTCAATAGCAAACGCCACGCCATCTTCA AGGTTAGATTTGGTGACAAAGTTCGCCACTTCTTTCACTGAAGGAATAGCGTTATCCATCGCCACACCGACGCCTGCATATTAATCATTGCGATATCGTTTTCCTGATCGCCAATCGCCATGATTTCT TCCGGTTTAATACCTAACACGTCGGCCAGTGATTTCACCCCCGTACCTTTGTTAACGCGTTTATCGAGGATTTCGAGGAAGTACGGCGCACTTTTCAGCACGGTATATTCTCTTTCACTTCCTGCGG AATACGCGCGATAGCCTGGTCGAGGATGGCGGGTTCATCAATCATCATCACTTTCAGGAACTGGGTATTGGGGTCCATTTTCTCCGCTTCGCAGAACACCAGCGGAATGGTGGCAACGAAGGATT CATGCACCGTGTGTAGCTGATATCACGGTTGGCGGTGTACAGCGTGGTGCGGTCCAGGGCGTGGAAATGAGAACCGACTTCGCGAGAGAGTTTTTCCAGGAAACGATAGTCGTCATAGCTGAGA GCAGTTTGCGCCACGGTGCTACCATCAGCGGCCTTCTGTACCACGCGCCGTTATAAGTAATGCAGTAGTCGCCCGGCTGTTCCATATGCAGCTCTTTCAGGTAGTTGTGCACACCTGCATACGGG 
CGACCCGTCGTTAGCACGACATTCACGCCACGGGCGCGAGCTGCGGCAATCGCATTTTTAACGGCGGGTGAAAGGTGTGATCGGGCAGCAGAAGGGTGCCATCCATATCGATAGCAATGAGTTT AATAGCCATGAGTTCCCCAGGTAGATTGGTTCCTGACCCATGCTAACGCGATTCCGCTCAAAAATCAGTACAACACCCGAGGGAAAAGGGGGATGCAACGCGCGTGCGTGCTCCCTTTTTGCTTA GCGGAAGAGTTTCCCTTTCAGCAGTTCCATGCCTGCGGAAAGCAGATCGTTATTGGCTTGTGGTGACACTTCACCTTGCGGTGAGAGCGCATCAATAATCTTCGGCAATTGTTCTGCCAGTAAACT GGAAGCTGACTGGTATCCACGCCAAGTTTTTGCCCGAGATCGGACACCGCATTTGTGCCGAGCGCCGATTCCAGTTGCTCGCCACTAACCGATTGATTGCCCTGTTGATTACTCAGCCAGGTTGA GAGAATGGCCCCTAAGCCGCCACTTTGCAGTTTTTCCACAGCACCTGAATGCCGCCCTGCTCCTCAACCCAACTTAAAATAGCCTGATATTTCCCCGCATCGCCTTTCAGAAAGGCACCGACAACTT CATCAAAAAGCCCCATGATAATCACCTGTAAAGCGTTACGTGTTGACCCAAAAAGTATAGATTTGCGGATGATAATTGCGGATTGCAGAAATAAAAAGGGCGGAGATGATCTCCGCCCTTTTCTTAT AGCTTCTTGCCGGATGCGGCGTGAACGCCTTATCCGGCCTACAAAATCATGAAAATTCAATACATTGCAAGATTTTCGTAGGCCTGATAAGCGTGCGCATCAGGCACGCTCGCATGGTTAGCGCCA TTAAATATCGATATTCGCCGCTTTCAGGGCGTTCTCTTCAATAAACGCACGGCGCGGTTCAACGGCGTCGCCCATCAGCGTGGTGAACAACTGGTCGGCAGCAATCGCATCTTTAACGGTAACCG CAGCATACGACGACTTTCCGGGTCCATAGTGGTTTCCCACAGCTGTTCCGGGTTCATCTCGCCCAGACCTTTATAACGCTGGATGGAGAGGCCGCGACGGGACTCTTTCACCAGCCAGTCCAGCG CCTGCTCGAAGCTGGCTACCGGCTGACGCGCTCGCCACGTTCGATAAACGCATCTTCTTCCAGCAAGCCACGCAGTTTCTCACCCAGCGTGCAGATACGACGATATTCGCCACCGGTGATAAACT CGTGATCCAGCGGATAGTCAGTATCCACACCGTGGGTACGCACGCGAACAATCGGCTCAACAGGTTTTGCTCAGCATTGGTGTGAACATCAAACTTCCACTGGCTGCCGTGCTGTTCTTTGTCGTT CAGTTCGCTGACCAGCGCGTTCACCCAGCGGGTAACGGTCTGCTCATCAGAAAGGTCAGCTTCCGTCAACGTCGGCTGATAGATAAGTCTTTCAGCATTGCTTTCGGATAACGACGCTCCATACG ATTGATCATTTTCTGCGTCGCGTTGTACTCAGATACCAGTTTCTCTAACGCTTCGCCAGCCAATGCCGGTGCACTGGCGTTGGTGTGCAGCGTTGCGCCGTCCAGCGCGATAGAGATTGGTACTG ATCCATCGCTTCGTCGTCTTTAATGTACTGTTCCTGCTTGCCTTTCTTCACTTTGTACAGCGGCGGCTGAGCGATGTAGACGTGACCGCGTTCAACGATTTCCGGCATCTGACGATAGAAGAAGGT CAACAGCAGCGTACGAATGTGGAGCCGTCGACGTCCGCATCGGTCATGATGATGATGCTGTGATAACGCAGTTTGTCCGGGTTGTACTCGTCACGACCGATACCACAGCCAAGCGCGGTGATAA 
GCGTCGCCACTTCCTGAGAAGAGAGCATCTTATCGAAGCGCGCTTTCTCGACTTGAGGATTTTACCCTTCAGCGGCAGAATCGCCTGGTTCTTGCGGTTACGCCCCTGCTTCGCAGAGCCGCCCG CGGAGTCCCCTTCCACCAGGTACAGTTCGGAAAGCGCCGGATCGCGTTCCTGGCAGTCTGCCAGTTTGCCCGGCAGGCCCGCAAGTCGAGCGCACCTTTACGGCGGGTCATTTCACGCGCGCG ACGCGGCGCTTCACGGGCACGGGCAGCATCGATAATTTTGCCAACCACGATTTTCGCGTCGGTTGGGTTTTCCAGCAGGTATTCTGCCAGCAGTTCGTTCATCTGCTGTTCAACGCCGATTTCACC TCAGAAGAAACCAGTTTGTCTTTGGTCTGGGAGGAGAATTTCGGGTCCGGCACTTTCACGGAAACGACCGCAATCAGGCCTTCACGCGCATCGTCACCGGTGGCGCTGACTTTGGCTTTTTTGCT GTAGCCTTCTTTGTCCATTAGGCGTTCAGGGTACGGGTCATCGCCGCACGGAAGCCTGCCAGGTGAGTACCGCCGTCACGCTGCGGAATGTTGTTGGTAAAGCAGTAGATGTTTTCCTGGAAGCC ATCGTTCCACTGCAACGCCACTTCGACGCCAATACCGTCTTTTTCAGTGAGAAGTAGAAGATATTCGGGTGGATCGGCGTTTTGTTCTTGTTCAGATATTCAACGAACGCCTTGATGCCGCCTTCAT AGTGGAAGTGGTCTTCTTTGCCGTCGCGCTTGTCGCGCAGACGAATGGAAACGCCGGAGTTGAGGAACGACAACTCCGCAGACGTTTCGCCAGAATTTCATATTCGAACTCGGTCACATTGGTGA AGGTTTCGAGGCTGGGCCAGAAACGCACCATGGTGCCGGTTTTTTCAGTCTCGCCGGTAACCGCCAGCGGGGCCTGCGGTACACCGTGTTCGTAGATCTGACGGTGATTTTACCCTCGCGCTGG ATAACCAGCTCCAGTTTTTGCGACAGGGCGTTTACTACCGAAACACCAACGCCGTGCAGACCGCCGGACACTTTATAGGAGTTATCGTCAAATTTACCGCCTGCGTGCAGAACGGTCATGATCACT TCCGCCGCCGA
  • 29. …to this?
      FT gene complement(9299..10702)
      FT /db_xref="GenBank:2367266"
      FT /gene="dnaA"
      FT /note="b3702"
      FT CDS complement(9299..10702)
      FT /db_xref="GI:2367267"
      FT /db_xref="PID:g2367267"
      FT /function="putative regulator; DNA - replication, repair,
      FT restriction/modification"
      FT /codon_start=1
      FT /protein_id="AAC76725.1"
      FT /gene="dnaA"
      FT /translation="MSLSLWQQCLARLQDELPATEFSMWIRPLQAELSDNTLALYAPNR
      FT FVLDWVRDKYLNNINGLLTSFCGADAPQLRFEVGTKPVTQTPQAAVTSNVAAPAQVAQT
      FT QPQRAAPSTRSGWDNVPAPAEPTYRSNVNVKHTFDNFVEGKSNQLARAAARQVADNPGG
      FT AYNPLFLYGGTGLGKTHLLHAVGNGIMARKPNAKVVYMHSERFVQDMVKALQNNAIEEF
      FT KRYYRSVDALLIDDIQFFANKERSQEEFFHTFNALLEGNQQIILTSDRYPKEINGVEDR
      FT LKSRFGWGLTVAIEPPELETRVAILMKKADENDIRLPGEVAFFIAKRLRSNVRELEGAL
      FT NRVIANANFTGRAITIDFVREALRDLLALQEKLVTIDNIQKTVAEYYKIKVADLLSKRR
      FT SRSVARPRQMAMALAKELTNHSLPEIGDAFGGRDHTTVLHACRKIEQLREESHDIKEDF
      FT SNLIRTLSS"
      FT /product="DNA biosynthesis; initiation of chromosome
      FT replication; can be transcription regulator"
      FT /transl_table=11
      FT /note="f467; 100 pct identical to DNAA_ECOLI SW: P03004;
      FT CG Site No. 851"
  • 30. Or this?
  • 31. An ORF is not a CDS!
      • An ORF is just an open reading frame
      • There are many more ORFs than protein-coding genes (CDSs) in a genome
      • Diagram labels: non-coding ORFs; CDSs (note that an ORF can extend upstream of the start codon)
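The distinction can be made concrete with a small sketch that defines an ORF as a stop-free stretch of codons ending at a stop codon, then looks for the first ATG inside it as a candidate CDS start. This is an illustration under simplifying assumptions: forward strand only, ATG as the only start codon, and open stretches that run off the end of the sequence are ignored. The function names and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, min_codons=3):
    """Return (start, stop_position) pairs for stop-free stretches on the forward strand.

    start is the first base of the stretch; stop_position is the first base of
    the stop codon that closes it.
    """
    orfs = []
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    orfs.append((run_start, i))
                run_start = i + 3
    return orfs


def first_start_codon(seq, start, stop_position):
    """Position of the first in-frame ATG within the ORF, or None."""
    for i in range(start, stop_position, 3):
        if seq[i:i + 3] == "ATG":
            return i
    return None


seq = "CCCATGAAATTTGGGTAACC"   # made-up sequence
for start, stop_position in open_reading_frames(seq):
    print("ORF", start, stop_position, "first ATG at", first_start_codon(seq, start, stop_position))
```

In this example the ORF begins at position 0 but the first ATG is at position 3, so the open reading frame extends upstream of the candidate start codon, as the slide notes. Finding an ORF says nothing about whether it is actually translated.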
  • 32. The Problem of Frameshift Errors (three-frame translations; * = stop codon)
      Actual sequence:
      ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA
      Frame 1: M S T A K L V K S K A T N L L Y T R N D V S D S E K
      Frame 2: * V P L N * L N Q K R P I C F I P A T M S P T A R K
      Frame 3: E Y R * I S * I K S D Q S A L Y P Q R C L R Q R E K
      Frameshifted sequence after single base error:
      ATGAGTACCGCTAAATTAGTTAAATCAAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAA
      Frame 1: M S T A K L V K S K S D Q S A L Y P Q R C L R Q R E
      Frame 2: * V P L N * L N Q K A T N L L Y T R N D V S D S E K
      Frame 3: E Y R * I S * I K K R P I C F I P A T M S P T A R K
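The effect shown on this slide can be reproduced by translating the sequence before and after inserting one extra base. The sketch below uses the standard genetic code and the slide's own sequence; the position of the inserted A (after base 30) is inferred from comparing the two sequences on the slide, and the function name `translate` is hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate seq in the given forward frame (0, 1 or 2); * marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


ACTUAL = "ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCCGACAGCGAGAAA"
# Simulate a single-base sequencing error: one extra A after position 30.
SHIFTED = ACTUAL[:30] + "A" + ACTUAL[30:]

for label, seq in (("actual", ACTUAL), ("shifted", SHIFTED)):
    for frame in range(3):
        print(label, "frame", frame + 1, translate(seq, frame))
```

After the insertion, frame 1 carries the correct protein only up to the error, and the remainder of the true protein appears in frame 2. This is why a single-base error can split or truncate a predicted gene.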
  • 33. Homology
      • Similarities in form (sequence) allow us to infer similarities in "meaning" (structure and function)
          • the cat sat on the mat
          • die Katze sass auf der Matte
      • Homology is not just sequence similarity
          • Two sequences can be similar without any common ancestry, particularly if low complexity
      • Example alignment:
          vge|GBant88-2   ITLITCVSVKDNSKRYVVAG
          vge|GEfae9-178  LTLITCDQATKTTGRIIVIA
          vge|GSpne1-403  MTLITCDPIPTFNKRLLVNF
          sortase_staur   LTLITCDDYNEKTGVWEKRK
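Similarity in an alignment like the one on this slide can be quantified by counting identical positions and finding columns conserved in every sequence. The sketch below does this for the four aligned fragments shown; the function names are hypothetical, and the calculation measures similarity only. As the slide stresses, similarity on its own does not establish common ancestry.

```python
ALIGNED = {
    "vge|GBant88-2": "ITLITCVSVKDNSKRYVVAG",
    "vge|GEfae9-178": "LTLITCDQATKTTGRIIVIA",
    "vge|GSpne1-403": "MTLITCDPIPTFNKRLLVNF",
    "sortase_staur": "LTLITCDDYNEKTGVWEKRK",
}


def identical_positions(a, b):
    """Number of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b))


def conserved_columns(sequences):
    """Show residues conserved in every sequence; '-' marks variable columns."""
    return "".join(
        column[0] if len(set(column)) == 1 else "-" for column in zip(*sequences)
    )


first, second = ALIGNED["vge|GBant88-2"], ALIGNED["vge|GEfae9-178"]
print(identical_positions(first, second), "of", len(first), "positions identical")
print(conserved_columns(ALIGNED.values()))
```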
  • 34. Types of Homology
      • Homologues can be divided into
          • Orthologues: lines of descent congruent with whole genome
          • Paralogues: result of gene duplication
          • Xenologues: result of HGT
  • 35. Homology Searches
      • The aim of homology searches is to identify sequences within databases that are homologous to your sequence
      • This involves comparing your sequence with all the database sequences
          • looking for stretches of sequence that appear to be similar
          • then scoring the matches and ranking them
          • a measure of the significance of the match is given
      • The most common program used for homology searches is BLAST
  • 36. Bacterial Genome Dynamics (diagram)
      • Gene Loss: drastic downsizing in isolated intracellular niches; accumulation of pseudogenes and IS elements after shift to new niche
      • Gene Duplication
      • Gene Gain: horizontal gene transfer by phage, plasmids, pathogenicity islands
      • Gene Change: recombination and rearrangements; single nucleotide polymorphisms (SNPs)
      • Rapid emergence of genetically uniform pathogens from variable ancestral populations
  • 37. Horizontal gene transfer
      • Horizontal (or lateral) gene transfer denotes any transfer, exchange or acquisition of genetic material that differs from the normal mode of transmission from parents to offspring (vertical transmission).
      • Diagram labels: Vertical gene transfer; Horizontal gene
  • 38. Bacterial mobile genetic elements
      • Transposons
          • pieces of DNA that act as "jumping genes", changing location on a chromosome or plasmid
          • encode transposase that catalyses the transposition event
          • can carry resistance or virulence genes
      • Insertion sequences (IS elements)
          • transposable elements that encode only the transposase
          • multiple copies of same IS within genome provide targets for homologous recombination, rearrangements and replicon fusions
      • Conjugative transposons
          • normally integrated into the chromosome
          • excise, then are transferred to recipient cells by conjugation
  • 39. Bacterial mobile genetic elements
      • Plasmids
          • self-replicating extrachromosomal replicons
          • usually circular but can be linear
          • can carry resistance or virulence genes
      • Bacteriophages
          • bacterial viruses
          • can carry virulence genes
          • can insert into bacterial chromosome as prophages (lysogeny)
      • Integrons
          • complex natural cloning and gene expression systems able to capture promoterless gene cassettes by site-specific recombination
          • allow formation of large arrays of gene cassettes transferred as a whole between different replicons
  • 40. Genomic islands
      • large chromosomal regions, part of the flexible gene pool
      • previously transferred by other mobile genetic elements
      • present in some bacteria but absent in close relatives
      • carry multiple genes that increase phenotypic versatility
      • contribute to dynamic character of bacterial chromosomes and can be excised from the chromosome and transferred to other recipients
      • pathogenicity islands contain dozens of genes that allow quantum leap to complex new virulence
  • 41. Core genomes and Pangenomes
      • Core genome
          • pool of genes shared by all members of a bacterial species
      • Accessory or dispensable genome
          • pool of genes present in some but not all genomes within the same bacterial species
      • Pangenome
          • global gene repertoire of a bacterial species, comprising core genome + accessory genome
      • Metagenome
          • global gene repertoire of mixed microbial population
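These definitions map directly onto set operations. In the sketch below each genome is represented as a set of gene names; the strain names and gene contents are toy data invented for the illustration, not real gene inventories.

```python
# Toy gene content for three hypothetical strains of one species.
genomes = {
    "strain_A": {"dnaA", "recA", "gyrB", "stx2"},
    "strain_B": {"dnaA", "recA", "gyrB", "aggR"},
    "strain_C": {"dnaA", "recA", "stx2", "aggR"},
}

# Core genome: genes shared by all members.
core = set.intersection(*genomes.values())
# Pangenome: every gene seen in any member.
pan = set.union(*genomes.values())
# Accessory genome: genes present in some but not all members.
accessory = pan - core

print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("pangenome size:", len(pan))
```

In practice, deciding whether two genes from different strains count as "the same gene" requires homology searches and clustering, which this sketch sidesteps by using shared names.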
  • 42. Escherichia coli Core and Pan-genomes. Welch et al. Proc Natl Acad Sci U S A. 2002 Dec 24;99(26):17020-4
  • 43. Metagenomics
      • Environmental shotgun sequencing
          • DNA extracted from mixed microbial communities is sequenced en masse
      • Assembled into contigs
          • Typically only small contigs can be obtained
  • 44. Uses of a genome sequence
      • Gene discovery
          • Fuelling hypothesis-driven research on pathogen biology
      • Comparative genomics
          • SNP discovery and genomic epidemiology
      • Functional genomics
          • Transcriptomics
          • Proteomics
          • Interactome
          • Structural genomics
          • Mass mutagenesis
  • 45. Haemolytic-uraemic syndrome
      • Shiga-toxin-producing E. coli (STEC)
          • bloody diarrhoea; damage to kidneys and brain
          • anaemia; loss of platelets
  • 46. German E. coli O104:H4 outbreak
      • May-July 2011
      • >4000 cases
      • >40 deaths
      • Link to sprouting seeds
      • High risk of haemolytic-uraemic syndrome
      • Females particularly at risk
      • Frank et al. DOI: 10.1056/NEJMoa1106483
  • 47. Take-away messages from the genome
      • Pathogens don't bother with passports!
          • Not a new strain: something similar seen in Germany ten years ago and in Korea
          • closest genome-sequenced strain was isolated from Central African Republic in late 1990s, belongs to an enteroaggregative lineage
      • German STEC probably comes from a lineage circulating in human populations rather than from an animal source (unlike E. coli O157)
  • 48. Take-away messages
      • Bacteria evolve quickly
          • Virulence factors in E. coli can jump from one lineage to another on mobile genetic elements
          • Pathotypes can overlap and evolve
          • Antibiotic resistance seen where no obvious prior use of antibiotics
  • 49. Take-away messages from genome sequence
      • Genome sequencing brings the advantages of
          • open-endedness (revealing the "unknown unknowns")
          • universal applicability
          • ultimate in resolution
      • Bench-top sequencing platforms now generate data sufficiently quickly and cheaply to have an impact on real-world clinical and epidemiological problems
  • 50. Comprehensive Coverage of Human Microbiome
  • 51. Comprehensive coverage of tree of life
  • 52. What will you do when you can sequence everything?
