Whole genome sequencing of bacteria & analysis



  2. INTRODUCTION
     - 1977: the first complete genome to be sequenced was that of bacteriophage φX174 (5,386 bp)
     - 1995: the first complete genome sequence of a free-living organism, Haemophilus influenzae (1.83 Mb), was obtained by the whole-genome shotgun approach
     - Sanger & Coulson (1977): sequencing with chain-terminating dideoxynucleotide analogues
     - Maxam & Gilbert (1977): chemical degradation sequencing, in which terminally labeled DNA fragments are chemically cleaved at specific bases and separated by gel electrophoresis
  3. Genomes OnLine Database (GOLD): sequencing status distribution
     http://www.genomesonline.org/cgi-bin/GOLD/sequencing_status_distribution.cgi
  4. ARCHON X PRIZE
     - The X PRIZE Foundation (Santa Monica, CA) has introduced the Archon X PRIZE for Genomics, which will award $10 million to the first team that can design a system capable of sequencing 100 human genomes in 10 days
  5. SEQUENCING TECHNOLOGY
     - First generation
       - Sanger's dideoxy chain-termination method
       - Maxam & Gilbert chemical degradation method
     - Next-generation sequencing (NGS): second-generation HT-NGS, sequencing after amplification
       - 454/Roche: pyrosequencing
       - Illumina/Solexa: reversible dye terminators
       - SOLiD/ABI: sequential ligation of oligonucleotide probes
  6. Third-generation HT-NGS: single-molecule sequencing
     - HeliScope
     - SMRT (Pacific Biosciences)
     - Single-molecule real-time (RNAP) sequencer
     - Nanopore DNA sequencer
     - Ion Torrent sequencing technology (PostLight)
     - VisiGen Biotechnologies: FRET
     - Advantages of third-generation over second-generation HT-NGS
       - higher throughput
       - faster turnaround time
       - longer read lengths
       - higher consensus accuracy
       - small amounts of starting material
       - low cost
  7. ADVANTAGES OF HT-NGS
     - Massively parallel sequencing of hundreds of thousands or millions of templates
     - The preliminary and tedious cloning work is eliminated and replaced by PCR amplification
     - In the most recent technologies even PCR is eliminated, because single DNA molecules are sequenced directly
     - Economical
     - Reduced time
  8. DISADVANTAGES OF HT-NGS
     - Most NGS technologies produce short reads
     - Construction of fragment libraries remains tricky and involves several steps: fragmentation, adaptor ligation and PCR amplification
     - Homopolymer runs are a source of errors with the 454 technology
     - Modified nucleotides cause misincorporation, or block further incorporation if the fluorescent moiety cannot be completely removed
     - Short reads must be assembled into longer sequences
  9. Illumina/Solexa technology
  10. Zero-mode waveguides (ZMWs)
  11. Selection of a technology for an experiment
  12. GENOME ASSEMBLY
     - Assemblers join sequences together based on overlapping regions between them
     - An assembly is composed of contigs and scaffolds
     - Contigs: contiguous consensus sequences derived from collections of overlapping reads
     - Scaffolds: ordered and oriented sets of contigs that are linked to one another by mate pairs of sequencing reads
     - N50: a basic statistic describing the contiguity of a genome assembly. It is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases. The larger the N50, the more contiguous the assembly
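
The N50 statistic named on this slide can be worked through in a few lines. The sketch below is illustrative only: the function name `n50` and the contig lengths are made up for the example and do not come from any particular assembler.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical assembly: six contigs, 290 bases in total.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))  # 80 + 70 = 150 >= 145, so this prints 70
```

Note that N50 measures contiguity only; it says nothing about whether the contigs were joined correctly.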
  13. ASSEMBLY APPROACHES
     - Alignment against a reference genome sequence
     - De novo assembly: construction of longer sequences, such as contigs or genomes, from shorter sequences, such as sequence reads, without prior knowledge of the order of the reads or reference to a closely related sequence
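
To make the idea of de novo assembly concrete, here is a toy greedy overlap-merge sketch. It is a teaching simplification under strong assumptions (error-free reads, a single strand, no repeats) and is not how production assemblers work; the function names and the three reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and sequences fully contained in another one."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = drop_contained(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


# Three hypothetical reads sampled from the sequence ATGGCGTGCAAT.
toy_reads = ["ATGGCGT", "GCGTGCA", "TGCAAT"]
print(greedy_assemble(toy_reads))  # ['ATGGCGTGCAAT']
```

No reference sequence and no read order are supplied; the contig is reconstructed from the overlaps alone, which is the defining property of de novo assembly.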
  14. GENE PREDICTION
     - Ab initio gene prediction: uses mathematical models rather than external evidence (such as EST and protein alignments) to identify genes and to determine their intron-exon structures
     - Evidence-driven gene prediction: uses external evidence such as ESTs, which can identify exon boundaries unambiguously. It has great potential to improve the quality of gene prediction in newly sequenced genomes. ESTs and proteins must first be aligned to the genome
     - Commonly used tools for gene prediction in prokaryotes: Glimmer, GeneMark
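
As a rough illustration of what the simplest sequence-only gene finder looks like, the sketch below scans all six reading frames for open reading frames (ATG to an in-frame stop codon). This is a naive stand-in: Glimmer and GeneMark rely on statistical models of coding sequence, which this example does not attempt. The function names, the `min_codons` threshold and the test sequence are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf_sequence) for each ORF found.

    Coordinates are 0-based, end-exclusive, and refer to the strand that
    was scanned ("-" means the reverse complement of the input).
    `min_codons` counts the codons before the stop, ATG included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


# Hypothetical 19-base sequence containing one short ORF.
print(find_orfs("CCATGAAATTTGGGTAACC"))
# [('+', 2, 17, 'ATGAAATTTGGGTAA')]
```

On real genomes a scan like this over-predicts heavily, which is one reason model-based tools are used instead.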
  15. GENOME ANNOTATION
     - The extraction of biological knowledge from raw nucleotide sequences
     - Seeks to identify every potential protein-coding gene (open reading frames, ORFs)
     - Predicted proteins are compared against available databases with tools such as BLASTP
     - 'Structural' genome annotation is the process of identifying genes and their intron-exon structures
     - 'Functional' genome annotation is the process of attaching metadata, such as Gene Ontology terms, to structural annotations
  16. APPLICATIONS
     - A very large number of short reads helps to identify single nucleotide polymorphisms (SNPs) when they are compared with a reference genome
     - Identification of rearrangements, deletions, insertions and inversions
     - Generation of expressed sequence tags (ESTs) from RNA sequencing
     - Detection of small regulatory RNAs
     - Illumina technology: ChIP-Seq to study protein-DNA interactions
     - Metagenomics
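
The SNP application above can be sketched as a simple pileup: count the bases that aligned reads place at each reference position and report positions where the consensus differs from the reference. The example assumes the reads have already been aligned without gaps and that each read comes with its 0-based start position; the function name, thresholds and data are hypothetical and far simpler than a real variant caller.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Return (position, ref_base, alt_base, depth) for each candidate SNP.

    `aligned_reads` is an iterable of (start, sequence) pairs, where
    `start` is the 0-based reference position of the read's first base.
    """
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust this position
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


# Hypothetical reference and four reads that all carry T at position 4.
toy_reference = "ACGTACGTAC"
toy_alignments = [(0, "ACGTTCG"), (1, "CGTTCGT"), (2, "GTTCGTA"), (3, "TTCGTAC")]
print(call_snps(toy_reference, toy_alignments))  # [(4, 'A', 'T', 4)]
```

The depth and fraction thresholds illustrate why many short reads help: a single mismatching read is more likely a sequencing error than a true polymorphism.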
  17. LEADS TO THE DEVELOPMENT OF
     - Functional genomics
     - Comparative genomics
     - Environmental genomics (metagenomics)
  18. FUNCTIONAL GENOMICS
     - Reveals genome structure and its relation to function
     - Orthologs: homologs derived from a common ancestral gene that diverged because the organisms diverged (speciation); they tend to have similar functions
     - Paralogs: homologs produced by gene duplication; they derive from a common ancestral gene that duplicated within an organism and then diverged, and they tend to have different functions
     - Xenologs: homologs resulting from the horizontal transfer of a gene between two organisms. The function of xenologs can vary, depending on how significant the change in context was for the horizontally transferred gene. In general, though, the function tends to be similar
  19. PHYLOGENETIC ANALYSIS
     - Phylogenetic trees are used to classify the evolutionary relationships between homologous genes represented in the genomes of divergent species
     - [Figure: rooted tree with terminal nodes A to E, labeled with the ancestral node (root of the tree), internal nodes (divergence points) and branches (lineages)]
  20. COMPARATIVE GENOMICS
     - Comparison of genome sequences reveals much information about genome structure and evolution, including the importance of lateral gene transfer
     - A tool to discover how microbes have adapted to particular ecological niches, and to support the development of new therapeutic agents
  21. METAGENOMICS
     - Genomics-based study of genetic material recovered directly from environmental samples without laboratory culture, and compared with all previously sequenced genes
     - Reveals how microbes adapt to extreme environments, which helps to discover new metabolic pathways and protective mechanisms
  22. IMPACT OF GENOME SEQUENCING
     - Revealed genome reduction in intracellular (I/C) bacteria
     - Genome plasticity (rearrangements, mobile elements)
     - Gene duplication and diversification of protein function
     - Lateral gene transfer and acquisition of new functions
     - Adaptation to environments, virulence
     - Industrial processes: fermentation technology
     - Bioremediation
     - Biotransformation
     - Development of vaccines
     - Bacterial diversity
     - Synthetic biology
     - Epigenetics
  23. REVERSE VACCINOLOGY
     - Use of genomic sequence information to identify novel and better-suited protein candidates for vaccines
     - Serogroup B Neisseria meningitidis: genomic data were used to predict all surface-exposed proteins, which are therefore accessible to antibodies
     - Suitable candidates are selected after sequencing various strains
     - Streptococcus agalactiae
       - The pan-genome is composed of the core genome (genes present in all sequenced strains) and the dispensable genome (genes present in only a subset of strains)
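
The pan-genome partition described on this slide maps directly onto set operations. The sketch below assumes that genes have already been grouped into families shared across strains, which is the hard part in practice; the strain names and gene lists are placeholders, not real annotation data.

```python
def pan_genome(strain_genes):
    """Split gene families into pan, core and dispensable sets.

    `strain_genes` maps a strain name to the gene families found in it.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    if not gene_sets:
        raise ValueError("need at least one strain")
    pan = set().union(*gene_sets)            # present in any strain
    core = set.intersection(*gene_sets)      # present in every strain
    dispensable = pan - core                 # present in only some strains
    return pan, core, dispensable


# Placeholder gene content for three hypothetical strains.
toy_strains = {
    "strain_A": ["gene1", "gene2", "gene3"],
    "strain_B": ["gene1", "gene2", "gene4"],
    "strain_C": ["gene1", "gene2", "gene3", "gene5"],
}
pan, core, dispensable = pan_genome(toy_strains)
print(sorted(core))         # ['gene1', 'gene2']
print(sorted(dispensable))  # ['gene3', 'gene4', 'gene5']
```

Sequencing additional strains can only shrink the core set or leave it unchanged, and can only grow the pan set or leave it unchanged, which is why several strains are sequenced before candidates are chosen.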
  24. SYNTHETIC BIOLOGY
     - Synthetic biology: from the sequence of an entire genome to the de novo synthesis of genes
     - Identification of the minimal genome, the smallest set of genes that enables life: Mycoplasma genitalium
  25. DATABASES AND TOOLS RELATED TO BACTERIAL GENOMIC DATA
     - NCBI Entrez Genome Project database: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj
       A searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms
     - NCBI Bacteria Genome Database: http://www.ncbi.nlm.nih.gov/genomes/static/eub.html
       The Genome database provides views for a variety of genomes, complete chromosomes, sequence maps with contigs, and integrated genetic and physical maps
     - Bacterial Genomes at the Sanger Institute: http://www.sanger.ac.uk/Projects/Microbes/
       This website contains a list of funded, ongoing, or completed sequencing projects for pathogens at this institute
     - TIGR Comprehensive Microbial Resource (CMR): http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi
       A free website displaying information on all the publicly available, complete prokaryotic genomes
  26. DATABASES AND TOOLS (continued)
     - GOLD: Genomes OnLine Database: http://www.genomesonline.org/
       A genome database containing information about which genomes have been sequenced or are in progress
     - Microbial Genome Database for Comparative Analysis (MBGD): http://mbgd.genome.ad.jp/
       A database for comparative analysis of completely sequenced microbial genomes
     - Virulence Factors of Bacterial Pathogens (VFDB): http://zdsys.chgb.org.cn/VFs/main.htm
       VFDB is an integrated and comprehensive database of virulence factors for bacterial pathogens
     - Genome Information Broker: http://gib.genes.nig.ac.jp/
       A comprehensive data repository of complete microbial genomes in the public domain. Many microbial genomes can be explored graphically
     - Islander, a Database of Genomic Islands: http://www.indiana.edu/~islander
       This database contains genomic islands discovered in completely sequenced bacterial genomes
  27. DATABASES AND TOOLS (continued)
     - GenoList genome browser at the Institut Pasteur: http://genolist.pasteur.fr/
       Provides access to diverse genome browsers of pathogenic bacteria
     - IslandPath: http://www.pathogenomics.sfu.ca/islandpath/update/IPindex.pl
       An aid to the identification of genomic islands, including pathogenicity islands, of potentially horizontally transferred genes
     - HGT-DB: http://www.tinet.org/~debb/HGT/
       A database containing predictions of horizontally transferred genes in several complete prokaryotic genomes
     - E. coli genome project: http://www.genome.wisc.edu
       A site devoted to the E. coli genome project, with an updated annotation of the genome