BITS - Comparative genomics: gene family analysis

This is the second presentation of the BITS training on 'Comparative genomics'.

It reviews the different methods for investigating sequence homology at the gene family level.

Thanks to Klaas Vandepoele of the PSB department.

Published in: Technology

Transcript

  • 1. Comparative genomics in eukaryotes: gene family analysis. Klaas Vandepoele, PhD, Professor, Ghent University; Comparative & Integrative Genomics, VIB – Ghent University, Belgium. http://www.bits.vib.be
  • 2. Workflow
  • 3. Applications of clustering the proteome(s): gene families form the basis for the evolutionary (or phylogenetic) analysis of
    • the detection of orthologs and paralogs
    • gene duplication, family expansions, pseudogene formation and gene loss
    • species taxonomies
    • horizontal gene transfer (HGT)
    • the evolution of gene structure (introns; protein domain organisation and (re)arrangements)
    • base composition and codon usage
  • 4. I. Structural annotation: genome-wide versus family-wise
    • Rationale for family-wise annotation: since every gene has different (sequence) characteristics and different genes evolve at different rates, using these characteristics to determine homologous gene models will improve the overall quality of the structural annotation.
    • Properties: a slow, nearly manual procedure that yields high-quality gene models, which can reveal novel biological findings.
  • 5. Workflow of the family-wise annotation procedure (flowchart): collect experimental representatives → build a multiple sequence alignment (MSA) of the experimental representatives → build a family HMM profile (HMMbuild) → search the species X proteome for putative homologs (HMMsearch) → correct gene models using EST/cDNA evidence (BLAST), protein motifs and ab initio gene prediction → classify using phylogenetic trees → detailed characterization. http://hmmer.janelia.org/
  • 6. Experimental representatives (figure panels: InterProScan, PFAM HMM logo, ClustalW + Jalview)
  • 7. BLAST / HMMsearch
    1. Use the multiple sequence alignment to create an HMM profile.
    2. Use the HMM profile to search for similar proteins.
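HMMER's hmmbuild and hmmsearch build and search profile hidden Markov models, which also model insertions and deletions. As a much simpler stand-in that shows the same two steps (build a profile from an alignment, then use it to score other proteins), the sketch below builds a gap-free position-specific scoring matrix (PSSM). It is not an HMM: the uniform background frequencies, the pseudocount of 1 and the function names build_pssm and best_window_score are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds scoring matrix from an ungapped alignment.

    Simplification (assumption): no gaps, uniform background frequencies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = len(alignment) + pseudocount * len(AMINO_ACIDS)
        # Pseudocounts keep unseen residues from getting a log(0) score.
        pssm.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window_score(pssm, sequence):
    """Slide the profile along a protein; return (best score, 0-based start)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Unknown residues (e.g. 'X') get the lowest score of that column.
        score = sum(pssm[i].get(aa, min(pssm[i].values()))
                    for i, aa in enumerate(window))
        best = max(best, (score, start))
    return best


# Toy usage with a hypothetical three-sequence alignment.
profile = build_pssm(["HLKF", "HMKF", "HLKY"])
score, start = best_window_score(profile, "MSTHLKFAQ")
print(start, round(score, 2))  # start is 3: the HLKF window
```

A real profile search would also need a significance estimate (HMMER reports E-values); this sketch only ranks windows by raw log-odds score.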
  • 8. Representatives + putative homologs (screenshot: BioEdit Sequence Editor). The suffix finalcds indicates a gene model that was corrected relative to the original gene model generated by ab initio gene prediction.
    • Multiple sequence alignments assist in the detection and correction of errors in the structural annotation (here, a missed exon).
  • 9. Representatives + putative homologs. The suffix finalcds indicates a gene model that was corrected relative to the original gene model generated by ab initio gene prediction.
    • Multiple sequence alignments assist in the detection of errors in the structural annotation (here, a false first exon).
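The checks on slides 8 and 9 are done by eye in an alignment viewer. A crude automated version, assuming the aligned proteins are stored as equal-length strings with '-' for gaps (and at least two sequences), could flag (a) long internal gap runs where most other family members have residues, a possible missed exon, and (b) N-terminal stretches that other members do not share, a possible false first exon. The thresholds min_run, min_support and max_support, and all function names, are hypothetical and would need tuning on real data.

```python
def residue_fraction(alignment, column, exclude):
    """Fraction of sequences other than `exclude` with a residue in `column`."""
    others = [seq for name, seq in alignment.items() if name != exclude]
    if not others:
        return 0.0
    return sum(seq[column] != "-" for seq in others) / len(others)


def gap_runs(seq):
    """Yield half-open (start, end) intervals of consecutive '-' characters."""
    start = None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(seq)


def flag_missed_exons(alignment, min_run=5, min_support=0.5):
    """Internal gap runs (>= min_run columns) that most other members fill."""
    flags = {}
    for name, seq in alignment.items():
        residues = [i for i, ch in enumerate(seq) if ch != "-"]
        if not residues:
            continue
        first, last = residues[0], residues[-1]
        for start, end in gap_runs(seq):
            internal = start > first and end <= last
            if not internal or end - start < min_run:
                continue
            support = sum(residue_fraction(alignment, c, name)
                          for c in range(start, end)) / (end - start)
            if support >= min_support:
                flags.setdefault(name, []).append((start, end))
    return flags


def flag_false_first_exons(alignment, min_run=5, max_support=0.2):
    """Leading residues (>= min_run) that few other members align to."""
    flags = {}
    for name, seq in alignment.items():
        residues = [i for i, ch in enumerate(seq) if ch != "-"]
        if not residues:
            continue
        start = end = residues[0]
        while (end < len(seq) and seq[end] != "-"
               and residue_fraction(alignment, end, name) <= max_support):
            end += 1
        if end - start >= min_run:
            flags[name] = (start, end)
    return flags


# Hypothetical toy alignments.
missed = {"ref1": "MKTAYIAKQRQISFVKSHFSRQ",
          "ref2": "MKTAYIAKQRQISFVKSHFSRQ",
          "cand": "MKTAY------ISFVKSHFSRQ"}
print(flag_missed_exons(missed))  # {'cand': [(5, 11)]}
```

A flagged region is only a hint; confirming a missed or false exon still requires EST/cDNA evidence or manual inspection of the gene model.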
  • 10. Examples of family-specific protein motifs
    • B-type cyclins have an HxKF signature.
    • Cyclin destruction boxes (B1-type cyclins: R-[AV]LGDIGN).
  • 11. Examples of family-specific protein motifs (Arabidopsis and rice sequences shown)
    • D-type cyclins contain the LxCxE Rb-binding motif.
    • Low conservation of the phylogenetic signal at the primary sequence level.
    • General rules are rarely general: exceptions (i.e. missing protein motifs) are frequent and might indicate functional divergence.
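Family-specific motifs like those on slides 10 and 11 can be scanned with regular expressions. The sketch below hand-translates the slide notation into Python re patterns ('x' becomes '.', and the hyphen in R-[AV]LGDIGN is dropped) and reports which expected motifs are missing, since a missing motif may point to functional divergence. The motif names and toy sequences are illustrative only; real scans would normally rely on curated PROSITE or Pfam models.

```python
import re

# Slide motifs, translated by hand into regular expressions (assumption:
# 'x' means any single residue).
CYCLIN_MOTIFS = {
    "B-type HxKF signature": re.compile(r"H.KF"),
    "B1-type destruction box": re.compile(r"R[AV]LGDIGN"),
    "D-type LxCxE Rb-binding motif": re.compile(r"L.C.E"),
}


def scan_motifs(sequence, motifs=CYCLIN_MOTIFS):
    """Return {motif name: 0-based start positions of non-overlapping matches}."""
    return {name: [m.start() for m in pattern.finditer(sequence)]
            for name, pattern in motifs.items()}


def missing_motifs(sequence, expected, motifs=CYCLIN_MOTIFS):
    """Names of expected motifs with no match in the sequence."""
    hits = scan_motifs(sequence, motifs)
    return [name for name in expected if not hits[name]]


# Hypothetical toy protein containing a destruction box and an HxKF signature.
toy = "MARVLGDIGNSTHLKFPE"
print(scan_motifs(toy))
print(missing_motifs(toy, ["D-type LxCxE Rb-binding motif"]))
```

finditer reports non-overlapping matches; a lookahead pattern such as (?=H.KF) would be needed to report overlapping ones.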
  • 12. Classification using phylogenetic tree construction
    • A- and B-type cyclins are mitotic cyclins.
    • D-type cyclins are G1-specific.
    • H-type cyclins regulate the activity of CDK-activating kinases.
    • The complexity of the cyclin gene family appears to be higher in plants than in mammals.
    • Whether there is functional redundancy within the A- and B-type cyclins, or different regulation (and expression) of some cyclin subclasses, remains to be analyzed.
  • 13. Unraveling functional divergence using large-scale expression compendia (figure: genes versus plant tissues)
  • 14. Unraveling functional divergence using large-scale expression compendia (figure: A-, B- and D-type cyclin genes versus plant tissues; Genevestigator)
  • 15. II. Orthology & paralogy
    • A major goal of sequence analysis is evolutionary reconstruction. It is critical to distinguish between two principal types of homologous relationships, which differ in their evolutionary history and functional implications:
    • Orthologs: homologous genes that evolved through speciation (~ evolutionary counterparts derived from a single ancestral gene in the last common ancestor of the two given species).
    • Paralogs: homologous genes that evolved through duplication within the same (perhaps ancestral) genome.
    • These definitions were first introduced by Fitch (1970).
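  • 16. Orthology & paralogy inference (figure: organism phylogeny (species tree) for species A, B and C beside gene phylogenies a) and b) containing genes a1, a2, b1, b2, c1 and c2; internal nodes are marked as gene duplication or speciation, and inparalogs and outparalogs are indicated)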
  • 17. In- and outparalogy (figure from Sonnhammer & Koonin, "Orthology, paralogy and proposed classification for paralog subtypes")
  • 18. Tree reconciliation: the automatic detection of speciation and duplication events using a species tree and a gene family tree
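A minimal sketch of tree reconciliation by LCA (last common ancestor) mapping, assuming both trees are rooted, binary and written as nested Python tuples, and that a dictionary maps every gene to its species. Each gene-tree node is mapped to the smallest species-tree clade containing all of its species; a node is labelled a duplication when it maps to the same clade as one of its children, and a speciation otherwise. Genes whose last common ancestor is a speciation node are orthologs; genes joined at a duplication node are paralogs. The tuple format and function names are assumptions, and production pipelines must also cope with unresolved or incorrect trees, which this sketch does not.

```python
def leaves(tree):
    """Set of leaf names under a nested-tuple tree (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Every clade of the tree, each as a frozenset of leaf names."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = [frozenset(leaves(tree))]
    for child in tree:
        result.extend(clades(child))
    return result


def lca(species, species_clades):
    """Smallest species-tree clade containing all the given species."""
    return min((c for c in species_clades if species <= c), key=len)


def reconcile(gene_tree, species_of, species_tree):
    """Return [(genes under node, 'duplication' | 'speciation')] in post-order."""
    species_clades = clades(species_tree)
    events = []

    def visit(node):
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]), species_clades)
        child_maps = [visit(child) for child in node]
        genes = frozenset(leaves(node))
        mapped = lca(frozenset(species_of[g] for g in genes), species_clades)
        # Same species clade as a child: the split cannot be a speciation.
        label = "duplication" if mapped in child_maps else "speciation"
        events.append((genes, label))
        return mapped

    visit(gene_tree)
    return events


# Hypothetical example: a duplication before the A/B split.
species_tree = (("A", "B"), "C")
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
for genes, label in reconcile(gene_tree, species_of, species_tree):
    print(sorted(genes), label)
```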
  • 19. III. Types of proteome analysis
  • 20. The evolution of multi-domain proteins
  • 21. Interpreting the output of an all-against-all similarity search
    • Metrics for sequence similarity:
      • E-value, bit score or percent identity
      • alignment coverage
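Assuming the all-against-all search was run with BLAST+ and written in its default tabular format (-outfmt 6, twelve columns), the metrics listed above can be read and filtered as below. Alignment coverage is not one of the default columns, so it is computed here from qstart/qend and a dictionary of query lengths supplied separately. The thresholds (E-value ≤ 1e-5, query coverage ≥ 50%) are illustrative choices, not values from the course, and each line (one HSP) is treated independently.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_outfmt6(handle):
    """Yield one dict per tabular BLAST line, with numeric columns converted."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        yield hit


def query_coverage(hit, query_lengths):
    """Fraction of the query sequence covered by this alignment."""
    return (abs(hit["qend"] - hit["qstart"]) + 1) / query_lengths[hit["qseqid"]]


def filter_hits(hits, query_lengths, max_evalue=1e-5, min_coverage=0.5, min_identity=0.0):
    """Keep hits passing E-value, coverage and percent-identity cut-offs."""
    return [h for h in hits
            if h["evalue"] <= max_evalue
            and h["pident"] >= min_identity
            and query_coverage(h, query_lengths) >= min_coverage]


# Hypothetical two-line BLAST output for a 200-residue query p1.
data = ("p1\tp2\t45.0\t180\t90\t5\t11\t190\t1\t180\t1e-40\t150.0\n"
        "p1\tp3\t30.0\t40\t25\t2\t1\t40\t5\t44\t0.01\t30.5\n")
kept = filter_hits(parse_outfmt6(io.StringIO(data)), {"p1": 200})
print([h["sseqid"] for h in kept])  # ['p2']
```

Bit scores are usually preferred over raw E-values for comparing hits across searches, because E-values depend on database size.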
  • 22. Clustering of similar sequences: proteins = vertices (nodes); sequence similarity relationships = edges
  • 23. Clustering of similar sequences
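With proteins as vertices and filtered similarity relationships as edges, the simplest clustering takes the connected components of the graph (single-linkage clustering). A union-find sketch follows; the protein names are hypothetical. Connected components tend to chain unrelated families together, for example through multi-domain proteins, which is one motivation for graph-partitioning methods such as the MCL-based tools listed on the next slide.

```python
def connected_components(nodes, edges):
    """Single-linkage clusters of a similarity graph, largest first."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in edges:
        union(a, b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    # Sort members, then clusters by decreasing size for stable output.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


proteins = ["p1", "p2", "p3", "p4", "p5", "p6"]
similar_pairs = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
print(connected_components(proteins, similar_pairs))
# [['p1', 'p2', 'p3'], ['p4', 'p5'], ['p6']]
```

Note that p1 and p3 end up together although they were never directly similar; that transitive chaining is exactly what single linkage does.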
  • 24. Advanced methods for protein (orthology) clustering
    • Sequence similarity-based: COG (RBH) [Tatusov et al., 1997]; InParanoid [Remm et al., 2001]; Tribe-MCL, based on the MCL algorithm [Van Dongen, 2000]; OrthoMCL [Li et al., 2003]
    • Phylogenetic tree-based: PhylomeDB [Huerta-Cepas et al., 2007]; Ensembl Compara [Vilella et al., 2008]
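The reciprocal best hit (RBH) criterion behind COG-style orthology can be sketched as follows, assuming the hits between two proteomes A and B are given as (query, subject, bit score) tuples: a pair (a, b) is kept when b is a's best hit in B and a is b's best hit in A. Ties are resolved by keeping the first hit seen, self-hits are ignored, and the gene names are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject (first seen wins ties)."""
    best = {}
    for query, subject, bitscore in hits:
        if query == subject:
            continue  # ignore self-hits from all-against-all searches
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_a_vs_b, hits_b_vs_a):
    """Pairs (a, b) where each gene is the other's best hit."""
    best_ab = best_hits(hits_a_vs_b)
    best_ba = best_hits(hits_b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


a_vs_b = [("A_g1", "B_g1", 300.0), ("A_g1", "B_g2", 120.0),
          ("A_g2", "B_g1", 250.0), ("A_g2", "B_g2", 200.0)]
b_vs_a = [("B_g1", "A_g1", 310.0), ("B_g1", "A_g2", 240.0),
          ("B_g2", "A_g2", 205.0), ("B_g2", "A_g1", 110.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('A_g1', 'B_g1')]
```

RBH reports one-to-one pairs only, so in the toy data A_g2 is left without a partner; InParanoid extends RBH seed pairs with inparalogs to recover such relationships.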
  • 25. Overview of methodologies (figure after Gabaldón, 2008: BBH, InParanoid, COG, species overlap, reconciliation)
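The species-overlap approach in this overview labels gene-tree nodes without needing a species tree: a node is a duplication when the species sets of its child subtrees share at least one species, and a speciation otherwise. A minimal sketch, assuming a rooted gene tree written as nested Python tuples and a dictionary mapping genes to species (both hypothetical formats):

```python
def species_overlap_events(gene_tree, species_of):
    """Return [(genes under node, 'duplication' | 'speciation')] in post-order."""
    events = []

    def visit(node):
        if isinstance(node, str):
            return {species_of[node]}, frozenset([node])
        results = [visit(child) for child in node]
        species_sets = [species for species, _ in results]
        genes = frozenset().union(*(g for _, g in results))
        # Any species present in two child subtrees implies a duplication.
        overlap = any(species_sets[i] & species_sets[j]
                      for i in range(len(species_sets))
                      for j in range(i + 1, len(species_sets)))
        events.append((genes, "duplication" if overlap else "speciation"))
        return set().union(*species_sets), genes

    visit(gene_tree)
    return events


species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
for genes, label in species_overlap_events(gene_tree, species_of):
    print(sorted(genes), label)
```

Because it ignores the species tree, this method tolerates some gene-tree topology errors: for the tree ((a1, c1), b1) it infers only speciations, whereas LCA-based reconciliation against the species tree ((A, B), C) would infer a duplication at the root.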
  • 26. IV. Resources
  • 27. Resources (continued)
    • Ensembl (vertebrates)
    • Ensembl Genomes (Metazoa, protists, fungi, plants and bacteria)
    • OrthoMCL DB version 5 (150 genomes)
    • YGOB (>15 fungi)
  • 28. Hands-on
    • Goal: identify and characterize the gene family members encoding talin 2 (TLN2).
    1. Select a query gene.
    2. Retrieve homologs/orthologs.
    3. Create a multiple sequence alignment.
    4. Identify conserved positions.
    5. Create a phylogenetic tree and identify orthologous/paralogous genes.
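For step 4 of the hands-on, one common way to score conservation is the Shannon entropy of each alignment column, which is 0 bits when every sequence carries the same residue. The sketch assumes the alignment is a list of equal-length strings with '-' for gaps; the cut-offs max_entropy and max_gap_fraction are hypothetical, and positions are reported 1-based, as alignment viewers usually do.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return None
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return entropy if entropy > 0 else 0.0  # avoid returning -0.0


def conserved_positions(alignment, max_entropy=0.5, max_gap_fraction=0.2):
    """List (1-based position, most common residue, entropy) of conserved columns."""
    result = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        gap_fraction = column.count("-") / len(column)
        entropy = column_entropy(column)
        if entropy is None or gap_fraction > max_gap_fraction or entropy > max_entropy:
            continue
        top = Counter(c for c in column if c != "-").most_common(1)[0][0]
        result.append((col + 1, top, round(entropy, 3)))
    return result


# Hypothetical toy alignment of three homologs.
alignment = ["MKV-LA", "MKVQLS", "MRVQLT"]
print(conserved_positions(alignment))  # [(1, 'M', 0.0), (3, 'V', 0.0), (5, 'L', 0.0)]
```

Plain entropy treats all substitutions alike; scores that account for amino-acid similarity (for example, K versus R) would rate column 2 as more conserved.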
