Ortholog assignment
Ortholog assignment: Presentation Transcript

  • Computational Prediction of Orthologs. Melvin Zhang, School of Computing, National University of Singapore. May 4, 2011.
  • A gene is a unit of heredity in a living organism
  • One gene may encode multiple proteins.
  • Two genes are homologous if they descended from a common ancestral gene.[1] In practice, homology is determined using sequence alignment. Figure: A sequence alignment of two proteins. Have you seen phrases like “high homology”, “significant homology”, or “35% homology”? [1] With respect to a specific speciation event.
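  • Orthologs are due to speciation; paralogs are due to duplication. Figure: gene tree from the MRCA of G and H with a speciation and a duplication event; gene g in G and two copies of h in H; relationships labeled main orthologs, orthologs, and paralogs.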
  • Orthologs maintain their function. Uses: annotate genes with unknown functions; infer protein-protein interactions.
  • Orthologs are not one-to-one due to lineage-specific gene duplications. Main orthologs are orthologs that have retained their ancestral position.[2] Figure: gene tree from the MRCA of G and H with a speciation and a duplication event; gene g in G and two copies of h in H; relationships labeled main orthologs, orthologs, and paralogs. [2] Burgetz et al., Evolutionary Bioinformatics 2006
  • Problem of identifying main orthologs. Input: position and sequences of genes in 2 genomes. Output: for each gene in their common ancestor, find its direct descendant in G and H. Complications: gene duplication; gene loss; horizontal gene transfer; gene fusion and fission.
  • Three main approaches for finding orthologs: graph based, tree based, rearrangement based.
  • Bidirectional Best Hit (BBH) and variants. Most popular approach. High level of functional relatedness.[a] Variants: reciprocal smallest distance (uses an evolutionary distance estimate instead of BLAST scores); OMA stable pairs (introduces a tolerance interval and stable matching). [a] Altenhoff et al., PLoS CB 2009
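The basic bidirectional best hit rule on this slide can be sketched in a few lines. This is a minimal illustration, not any published tool's implementation: the gene names and scores are made up, the score table is assumed to come from some pairwise comparison such as BLAST, and ties are broken by whichever pair is seen first.

```python
def best_hits(scores, by_first=True):
    """Map each gene to its highest-scoring partner in the other genome."""
    best = {}
    for (g, h), s in scores.items():
        key, partner = (g, h) if by_first else (h, g)
        # Strict ">" means the first pair seen wins a tie (an arbitrary choice).
        if key not in best or s > best[key][1]:
            best[key] = (partner, s)
    return {k: v[0] for k, v in best.items()}


def bidirectional_best_hits(scores):
    """Return pairs (g, h) where h is g's best hit and g is h's best hit."""
    best_g = best_hits(scores, by_first=True)
    best_h = best_hits(scores, by_first=False)
    return {(g, h) for g, h in best_g.items() if best_h.get(h) == g}


# Hypothetical similarity scores between genes of genome G and genome H.
scores = {
    ("g1", "h1"): 900, ("g1", "h2"): 400,
    ("g2", "h1"): 950, ("g2", "h2"): 300,
    ("g3", "h3"): 500,
}
pairs = bidirectional_best_hits(scores)
```

Here g1 prefers h1, but h1 prefers g2, so g1 is left unpaired; only mutual best hits are reported.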
  • EnsemblCompara GeneTrees.[3] Figure: Species tree for 4 species on top gene tree for gene A. Based on reconciliation of gene trees with the species tree. 1. Partition genes into families and construct gene trees. 2. Reconcile each gene tree with the species tree. [3] Vilella et al., Genome Res 2009
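To make the reconciliation step concrete, here is a small sketch of the textbook LCA mapping: each gene tree node is mapped to the smallest species clade containing all species below it, and a node is called a duplication when it maps to the same clade as one of its children. This is not claimed to be the EnsemblCompara pipeline; the tree encoding (nested tuples), the three-species tree, and the gene names are assumptions for illustration.

```python
def clades(species_tree):
    """Return the leaf set of every node in a nested-tuple species tree."""
    result = []

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        result.append(leaves)
        return leaves

    visit(species_tree)
    return result


def lca(clade_list, species):
    """Smallest species clade that contains every species in the set."""
    return min((c for c in clade_list if species <= c), key=len)


def reconcile(node, species_of, clade_list, events):
    """Map a gene tree node to a species clade; record one event per internal node."""
    if isinstance(node, str):
        return lca(clade_list, frozenset([species_of[node]])), [node]
    mapped_children, genes = [], []
    for child in node:
        mapped, below = reconcile(child, species_of, clade_list, events)
        mapped_children.append(mapped)
        genes.extend(below)
    mapped = lca(clade_list, frozenset().union(*mapped_children))
    # Duplication: the node maps to the same clade as at least one child.
    kind = "duplication" if mapped in mapped_children else "speciation"
    events.append((sorted(genes), kind))
    return mapped, genes


# Hypothetical species tree and gene tree (leaves are gene names).
species_tree = (("human", "mouse"), "fish")
gene_tree = (("A_human", ("A_mouse1", "A_mouse2")), "A_fish")
species_of = {"A_human": "human", "A_mouse1": "mouse",
              "A_mouse2": "mouse", "A_fish": "fish"}

events = []
reconcile(gene_tree, species_of, clades(species_tree), events)
kinds = [kind for _, kind in events]
```

Two genes would then be called orthologs when their most recent common node in the gene tree is a speciation, and paralogs when it is a duplication.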
  • MSOAR2.[4] Figure: Rearrangement scenario between human and mouse. 1. Partition genes into families and assign a unique symbol. 2. Reconstruct the most parsimonious rearrangement (inversion, translocation, fusion, fission, duplication). 3. Extract the corresponding orthologs. [4] Fu et al., JCB 2007
  • Can conserved gene neighborhood improve ortholog predictions?
  • Human-mouse synteny blocks. Conserved synteny blocks between the human and mouse genomes, generated by the Cinteny web server.[5] [5] Sinha and Meller, BMC Bioinformatics 2007
  • Local synteny criteria.[6] Figure: Local synteny: more than one unique match within +/- 3 genes. Homology is defined as BLASTP E-value < 1e-5. 94% of sampled inter-species pairs are identified as orthologs by Inparanoid (based on BBH) and local synteny criteria. [6] Jin Jun et al., BMC Genomics 2009
  • Local synteny score (LC). Figure: gene g in genome G and gene h in genome H. The local synteny score of g and h is 4 since there are 4 edges in the maximum matching.
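The maximum matching behind the local synteny score can be sketched as follows. The slides do not spell out the details, so several things here are assumptions: the window is the 3 genes on either side of g and of h (borrowed from the local synteny criteria slide), g and h themselves are excluded, each genome is a single ordered list of unique gene names, and homology is supplied as a set of pairs. The matching uses the standard augmenting-path method for bipartite graphs.

```python
def neighbors(genome, gene, window=3):
    """Genes within `window` positions of `gene`, excluding the gene itself."""
    i = genome.index(gene)
    return genome[max(0, i - window):i] + genome[i + 1:i + 1 + window]


def max_matching(left, right, homologous):
    """Size of a maximum bipartite matching (augmenting-path method)."""
    match_of_right = {}

    def try_assign(u, seen):
        for v in right:
            if (u, v) in homologous and v not in seen:
                seen.add(v)
                # Take v if it is free, or if its current partner can move elsewhere.
                if v not in match_of_right or try_assign(match_of_right[v], seen):
                    match_of_right[v] = u
                    return True
        return False

    return sum(try_assign(u, set()) for u in left)


def local_synteny_score(genome_g, g, genome_h, h, homologous, window=3):
    """LC(g, h): maximum matching between homologous neighbors of g and h."""
    return max_matching(neighbors(genome_g, g, window),
                        neighbors(genome_h, h, window), homologous)


# Hypothetical gene orders and homology pairs.
genome_g = ["a1", "a2", "a3", "g", "a4", "a5", "a6"]
genome_h = ["b3", "b2", "b1", "h", "b6", "b5", "b4", "b7"]
homologous = {("a1", "b1"), ("a1", "b2"), ("a2", "b2"),
              ("a4", "b5"), ("a6", "b6")}

lc = local_synteny_score(genome_g, "g", genome_h, "h", homologous)
```

In this toy input a1 could pair with b1 or b2 while a2 can only pair with b2; the matching resolves that conflict, so all four homologous neighbor groups are counted and the score is 4.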
  • Smith-Waterman alignment score (SW).
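For readers unfamiliar with it, the Smith-Waterman score is the best score of any local alignment between two sequences. The sketch below uses a simplified scoring scheme chosen for illustration (fixed match and mismatch values and a linear gap penalty); protein comparisons normally use a substitution matrix such as BLOSUM62 with affine gap penalties, and the slides do not state which parameters were used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
embedded = smith_waterman_score("XXACGTYY", "ZZACGTWW")
unrelated = smith_waterman_score("AAAA", "TTTT")
```

Only two rows of the table are kept because the score, not the alignment itself, is all that is needed here.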
  • BBH-LS: bidirectional best hits based on a linear combination of SW and LC. Figure: gene g in genome G and gene h in genome H. sim(g, h) = (1 − f) × SW(g, h) + f × LC(g, h)
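A rough sketch of how the combined score changes the pairing. The slides give only the formula, so this assumes that SW and LC have already been scaled to comparable ranges (here 0 to 1); how the original method scales them is not stated. All gene names and values are hypothetical.

```python
def combined_similarity(sw, lc, f):
    """sim(g, h) = (1 - f) * SW(g, h) + f * LC(g, h) for every scored pair."""
    return {pair: (1 - f) * sw[pair] + f * lc.get(pair, 0.0) for pair in sw}


def bidirectional_best_hits(sim):
    """Pairs (g, h) that are each other's highest-scoring partner."""
    best_g, best_h = {}, {}
    for (g, h), s in sim.items():
        if g not in best_g or s > sim[(g, best_g[g])]:
            best_g[g] = h
        if h not in best_h or s > sim[(best_h[h], h)]:
            best_h[h] = g
    return {(g, h) for g, h in best_g.items() if best_h[h] == g}


# Hypothetical, pre-scaled scores: hB aligns best with mA by sequence,
# but its gene neighborhood matches mB.
sw = {("hA", "mA"): 0.45, ("hB", "mB"): 0.40, ("hB", "mA"): 1.0}
lc = {("hA", "mA"): 1.0, ("hB", "mB"): 1.0, ("hB", "mA"): 0.2}

bbh_only = bidirectional_best_hits(combined_similarity(sw, lc, f=0.0))
bbh_ls = bidirectional_best_hits(combined_similarity(sw, lc, f=0.5))
```

With f = 0 the method reduces to plain BBH on SW and pairs hB with mA; with f = 0.5 the synteny term outweighs the sequence score and both genes pair with their positional counterparts.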
  • Human-Mouse-Rat dataset. Input: human, mouse, and rat genes downloaded from Ensembl. Benchmark: there is no “golden” benchmark for true orthology, so assume that orthologs are assigned the same gene symbol.
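Under the gene-symbol assumption, scoring a set of predicted pairs is a simple count: a pair with the same symbol is a true positive, a pair with different symbols is a false positive. The sketch below is illustrative only; the gene identifiers are hypothetical, and comparing symbols case-insensitively is an added assumption (mouse symbols are often written in mixed case).

```python
def count_tp_fp(pairs, symbol_g, symbol_h):
    """Count predicted pairs with matching (TP) and differing (FP) gene symbols."""
    tp = 0
    for g, h in pairs:
        # Assumption: symbols are compared case-insensitively.
        if symbol_g[g].upper() == symbol_h[h].upper():
            tp += 1
    return tp, len(pairs) - tp


# Hypothetical gene identifiers mapped to gene symbols.
human_symbol = {"h1": "RASGRF1", "h2": "RASGRF2", "h3": "CTSH"}
mouse_symbol = {"m1": "Rasgrf1", "m2": "Rasgrf2", "m3": "Ctsh"}

predicted = [("h1", "m1"), ("h2", "m3"), ("h3", "m2")]
tp, fp = count_tp_fp(predicted, human_symbol, mouse_symbol)
```

Repeating this count for several values of f is one straightforward way to compare settings of the BBH-LS weight.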
  • Tuning the BBH-LS method: sim(g, h) = (1 − f) × SW(g, h) + f × LC(g, h)
  • Results for various methods on Human-Mouse. Figure: TP: same gene symbols, FP: different gene symbols. More true positives and fewer false positives than MSOAR2.
  • Results for various methods on Human-Rat. Figure: TP: same gene symbols, FP: different gene symbols.
  • Results for various methods on Mouse-Rat. Figure: TP: same gene symbols, FP: different gene symbols.
  • How local synteny helps. Figure: human chr 15 (CTSH, RASGRF1, ANKRD34C) and chr 5 (MSH3, RASGRF2, CKMT2) against mouse chr 9 (ANKRD34C, RASGRF1, CTSH) and chr 13 (CKMT2, RASGRF2, MSH3); edge labels: sw = 2466, ls = 5; sw = 2003, ls = 5; sw = 5265, ls = 1. Bold edges are the pairing from BBH-LS, thin edges are the pairing from BBH. BBH paired RASGRF2 (human) with RASGRF1 (mouse) because of the high SW score; BBH-LS corrected this using LC.
  • Summary: Identifying main orthologs. Figure: gene tree from the MRCA of G and H with a speciation and a duplication event; relationships labeled main orthologs, orthologs, and paralogs. For each gene in their common ancestor, find its direct descendant in G and H.
  • Summary: Three approaches: graph based, tree based, rearrangement based.
  • BBH-LS: bidirectional best hits based on a linear combination of SW and LC.