Comparative genomics


Published in: Education, Technology

  2. 2. Introduction
     • Comparative genomics is a large-scale, holistic approach that compares two or more genomes to discover their similarities and differences and to study the biology of the individual genomes
     • The subject of comparative genomics bears on
       – Evolutionary biology and phylogenetic reconstruction of the tree of life
       – Drug discovery programs
       – Function prediction for hypothetical proteins
       – Identification of genes, regulatory motifs, and other non-coding DNA motifs
       – Genome flux and dynamics
  3. 3. Computational tools for genome-scale sequence alignment
     • The first step in comparative genomics analysis is often the alignment of two genome sequences
     • It is a technically challenging problem
     • Algorithms/tools (L. Wei et al. 2002):
       – BLASTN and MEGABLAST
       – GLASS
       – MUMmer
       – PatternHunter (products/ph.php)
       – PipMaker
       – VISTA
       – WABA
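To make the alignment problem concrete, here is a minimal Python sketch of exact-match seeding, the kind of first step that many genome aligners build on before extending matches into full alignments. It is only an illustration: the function name `shared_kmer_seeds`, the toy sequences, and the value of `k` are assumptions, and the tools listed above use far more elaborate indexing and scoring.

```python
from collections import defaultdict


def shared_kmer_seeds(seq_a, seq_b, k=5):
    """Return sorted (pos_a, pos_b) pairs where the same k-mer occurs in both sequences."""
    # Index every k-mer of the first sequence by its start position.
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)

    # Look up every k-mer of the second sequence in that index.
    seeds = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], []):
            seeds.append((i, j))
    return sorted(seeds)


# Toy sequences (hypothetical): the shared stretch is offset by two positions.
seeds = shared_kmer_seeds("ACGTTGCA", "TTACGTTG", k=5)
print(seeds)
```

Seeds that lie on the same diagonal (constant `pos_b - pos_a`) are the natural candidates to chain and extend into a longer alignment.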
  4. 4. Related terms
     • Homology
     • Homologous
     • Orthologous
     • Paralogous
     • Xenologous
     • Analogous
     • Horizontal gene transfer
  5. 5.
     • Homology is the relationship between any two characters (such as two proteins with similar sequences) that have descended, usually through divergence, from a common ancestral character
     • Homologues are thus components or characters (such as genes or proteins with similar sequences) that can be attributed to a common ancestor of the two organisms during evolution. Homologues can be orthologues, paralogues, or xenologues
     • Orthologues are homologues that have evolved from a common ancestral gene by speciation. They usually have similar functions
     • Paralogues are homologues that are related or produced by duplication within a genome. They have often evolved to perform different functions
     • Xenologues are homologues that are related by an interspecies (horizontal) transfer of the genetic material for one of the homologues
     • Horizontal (lateral) gene transfer is the movement of genetic material between species (or genera) other than by vertical descent
  8. 8. Methods for comparative genomics
     • Comparative analysis of genome structure
     • Comparative analysis of coding regions
     • Comparative analysis of non-coding regions
  9. 9. Comparative analysis of genome structure
     • Analysis of the global structure of genomes, such as nucleotide composition, syntenic relationships, and gene ordering, offers insight into the similarities and differences between genomes
     • This provides information on the organization and evolution of the genomes, and highlights the unique features of individual genomes
     • The structure of different genomes can be compared at three levels:
       – Overall nucleotide statistics
       – Genome structure at the DNA level
       – Genome structure at the gene level
  10. 10. Comparison of overall nucleotide statistics
      • Overall nucleotide statistics include
        – Genome size
        – Overall (G+C) content
        – Regions of different (G+C) content
        – Genome signatures such as codon usage biases, amino acid usage biases, and the ratio of the observed dinucleotide frequency to the frequency expected under a random nucleotide distribution
      • Together these present a global view of the similarities and differences between the genomes
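Two of these statistics are simple enough to sketch directly. The Python below computes overall (G+C) content and an observed/expected dinucleotide ratio, where the expected frequency is taken as the product of the two single-base frequencies (one common way to model a random nucleotide distribution). The function names and the toy sequence are assumptions for illustration only.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_odds_ratio(seq, dinuc):
    """Observed frequency of dinuc divided by the product of its base frequencies."""
    seq = seq.upper()
    n = len(seq)
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))

    observed = pair_counts[dinuc] / (n - 1)
    expected = (base_counts[dinuc[0]] / n) * (base_counts[dinuc[1]] / n)
    if expected == 0:
        raise ValueError("a base of the dinucleotide does not occur in the sequence")
    return observed / expected


toy = "ACGCGT"  # hypothetical sequence, far too short for real use
print(gc_content(toy))
print(dinucleotide_odds_ratio(toy, "CG"))
```

A ratio well above or below 1 indicates that a dinucleotide is over- or under-represented relative to base composition alone; on real genomes the same calculation would be run over whole chromosomes or sliding windows.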
  11. 11. Comparison of genome structure at the DNA level
      • Chromosomal breakage and exchange of chromosomal fragments are a common mode of gene evolution. They can be studied by comparing genome structures at the DNA level:
        – Identification of conserved synteny and genome rearrangement events
        – Analysis of breakpoints
        – Analysis of the content and distribution of DNA repeats
  12. 12. Comparison of genome structure at the gene level
      • Chromosomal breakage and exchange of chromosomal fragments disrupt gene order
      • Conservation of gene order therefore reflects the evolutionary distance between genomes: the more distant two genomes are, the less gene order they tend to share
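One simple way to quantify disrupted gene order is to count breakpoints: adjacent gene pairs in one genome that are no longer adjacent in the other. The sketch below assumes both genomes carry the same genes on a single linear chromosome and ignores strand orientation; the function name and gene labels are hypothetical.

```python
def count_breakpoints(order_a, order_b):
    """Count adjacencies of order_a that are not adjacencies of order_b (unsigned)."""
    # Store each neighbouring pair of order_b as an unordered pair.
    adjacencies_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(
        1
        for pair in zip(order_a, order_a[1:])
        if frozenset(pair) not in adjacencies_b
    )


reference = ["A", "B", "C", "D", "E", "F"]
inverted = ["A", "B", "E", "D", "C", "F"]  # segment C-D-E inverted
print(count_breakpoints(reference, inverted))
```

A single inversion creates two breakpoints, one at each end of the inverted segment, so a higher count suggests more rearrangement events separating the two genomes.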
  13. 13. Comparative analysis of coding regions
      • A number of algorithms have been used in comparative genomics to aid function prediction of genes:
        – Identification of gene-coding regions
        – Comparison of gene content
        – Comparison of protein content
        – Comparative genomics-based function prediction
  14. 14. Identification of gene-coding regions
      • The analysis and comparison of coding regions starts with a gene identification algorithm, which is used to infer which portions of the genomic sequence code for genes
      • There are four basic approaches to gene identification (L. Wei et al. 2002)
  15. 15. Comparison of gene content
      • Once the predicted gene set has been generated, the gene content can be compared across genomes
      • The first statistic to compare is the estimated total number of genes in a genome. Other statistics that elucidate the similarities and differences between the genomes include the percentage of the genome that codes for genes, the distribution of coding regions across the genome (a.k.a. gene density), average gene length, and codon usage
      • Gene sets are often compared using a pairwise sequence comparison tool such as BLASTN or TBLASTX
  16. 16. Comparison of protein content
      • A second level of analysis is to compare the sets of gene products (proteins) between the genomes, which has been termed "comparative proteomics"
      • It is important to compare the protein contents in critical pathways and important functional categories across genomes
      • Two widely used resources for pathways and functional categories are the KEGG pathway database and the Gene Ontology (GO) hierarchy (L. Wei et al. 2002)
  17. 17. Comparison of protein content (continued)
      • Interesting statistics to compare include
        – Level of sequence identity between orthologous pairs across genomes
        – Paralogous pairs within a genome
        – Number of replicated copies in corresponding paralog families
        – Functions of the paralogs
        – Locations of members of paralog families across the genome
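The first statistic, sequence identity between an orthologous pair, can be sketched as follows. The code assumes the two sequences have already been aligned (for example by a pairwise alignment tool) and counts identity over ungapped columns only; other conventions for the denominator exist, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned protein fragments from two genomes.
print(percent_identity("MKT-AYLG", "MKSQAYIG"))
```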
  18. 18. Comparative genomics-based function prediction
      • Functional assignment of genes in a non-similarity-based manner
      • This relies on the basic premise that functionally related genes are closely associated across genomes in some form
      • This includes three methods:
        – Co-conservation across genomes
        – Conservation of gene clusters and genomic context across species
        – Physical fusion of functionally linked genes across species (domain fusion analysis)
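Co-conservation across genomes is often framed as comparing presence/absence patterns: genes that are present in the same set of genomes, and absent from the same set, are candidates for a functional link. The sketch below scores that with a Jaccard index, which is one possible similarity measure rather than the method of any particular tool; the gene names, species labels, and threshold are hypothetical.

```python
from itertools import combinations


def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def co_conserved_pairs(presence, threshold=0.8):
    """Return (gene1, gene2, score) for gene pairs with similar presence patterns."""
    pairs = []
    for g1, g2 in combinations(sorted(presence), 2):
        score = jaccard(presence[g1], presence[g2])
        if score >= threshold:
            pairs.append((g1, g2, score))
    return pairs


# Hypothetical data: for each gene, the genomes in which a homologue was found.
presence = {
    "geneA": {"sp1", "sp2", "sp4"},
    "geneB": {"sp1", "sp2", "sp4"},
    "geneC": {"sp3"},
}
print(co_conserved_pairs(presence))
```

Here `geneA` and `geneB` share an identical pattern and are reported as a candidate pair, while `geneC` is not linked to either.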
  19. 19. Comparative analysis of noncoding regions
      • Noncoding regions of the genome have gained considerable attention in recent years because of their predicted role in the regulation of transcription, DNA replication, and other biological functions
      • This approach is based on the presumption that selective pressure causes regulatory elements to evolve at a slower rate than non-regulatory sequences in the noncoding regions
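If regulatory elements evolve more slowly, they should show up as islands of high identity within an alignment of otherwise diverged noncoding sequence. A minimal sliding-window sketch of that idea follows; it assumes the two sequences are already aligned to equal length, and the function name, window size, threshold, and sequences are all illustrative assumptions.

```python
def conserved_windows(aligned_a, aligned_b, window=5, min_identity=0.8):
    """Return (start, identity) for each window whose identity meets the threshold."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    hits = []
    for start in range(len(aligned_a) - window + 1):
        chunk_a = aligned_a[start:start + window]
        chunk_b = aligned_b[start:start + window]
        # Gap columns never count as matches.
        matches = sum(1 for a, b in zip(chunk_a, chunk_b) if a == b and a != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Hypothetical aligned noncoding sequences sharing a short conserved core.
seq_1 = "GGGGTATAAACCC"
seq_2 = "ACACTATAAAGTG"
print(conserved_windows(seq_1, seq_2))
```

Overlapping hits can then be merged into candidate conserved regions for follow-up, for example by checking them against known regulatory motifs.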
  20. 20. Insights into Genome Fluxes and the Processes of Evolution
      • From an evolutionary biology perspective, whole-genome comparisons provide molecular insights into the processes of evolution, including the molecular events responsible for the variations and fluxes that occur throughout a genome. These include processes such as inversions, translocations, deletions, duplications, and insertions
  21. 21. The Impact of Comparative Genomics in Phylogenetic Analysis
      Schematic depiction of the phylogenetic position of Microsporidia based on small subunit ribosomal RNA (SSU rRNA), as an early-branching eukaryote that evolved prior to the acquisition of mitochondria, and its subsequent placement closer to fungi based on a composite gene phylogeny. The latter placement has been confirmed by the complete genome sequence of the microsporidian Encephalitozoon cuniculi, in which several mitochondrial genes are present despite the absence of mitochondria.
  22. 22. Comparative Genomics in Drug Discovery
      • Comparative genomic studies shed light on the pathogenesis of organisms, opening up opportunities for therapeutic intervention and helping to understand and identify disease genes
      • One of the most important outcomes of genome-wide comparative analyses is the ability to identify and develop novel drug targets
      • When looking for antibacterial, antifungal, or antiprotozoal targets, comparative genome analysis can reveal virulence genes, uncharacterized essential genes, species-specific genes, and organism-specific genes, while ensuring that the chosen genes have no homologues in humans
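At its simplest, the target-selection logic described above is a set operation: collect the pathogen genes of interest and discard any with a human homologue. The sketch below shows only that filtering step, under the assumption that essentiality, virulence, and homology calls have already been made by upstream analyses; all gene identifiers and the function name are hypothetical.

```python
def candidate_targets(essential, virulence, has_human_homologue):
    """Pathogen genes that are essential or virulence-related and lack a human homologue."""
    return sorted((essential | virulence) - has_human_homologue)


# Hypothetical gene sets produced by earlier comparative analyses.
essential = {"g1", "g2"}
virulence = {"g3"}
has_human_homologue = {"g2", "g4"}

print(candidate_targets(essential, virulence, has_human_homologue))
```

In practice each input set would come from its own comparison (for example, sequence searches of pathogen proteins against the human proteome), and the surviving candidates would still need experimental validation.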
  23. 23. Comparative genomics in drug discovery programs
      Flow chart explaining how comparative genomics can facilitate drug discovery programs for the discovery of new antimicrobials.
  24. 24. Looking beyond…
      • As comparative genomics moves from comparisons between kingdoms to comparisons between genera and then between species, the next step is to carry out comparisons between individuals or strains that are members of a particular species
      • This would allow us to investigate variation at the individual level and to determine the propensity of an individual to respond to a drug or to develop a disease or infection