Computational Prediction of Orthologs

  Melvin Zhang
  School of Computing, National University of Singapore
  May 4, 2011
A gene is a unit of heredity in a living organism
One gene may encode multiple proteins
Two genes are homologous if they descended from
a common ancestral gene [1]

  In practice, homology is determined using sequence alignment.

  Figure: A sequence alignment of two proteins

  Have you seen phrases like “high homology”, “significant
  homology”, or “35% homology”? Homology is a yes-or-no relation;
  a percentage describes sequence identity or similarity.

  [1] with respect to a specific speciation event
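Orthologs are due to speciation, paralogs are due
to duplication

  Figure: Gene tree rooted at the MRCA of genomes G and H. A
  speciation separates gene g (in G) from the lineage in H, where a
  duplication produces two copies of h. The figure labels the
  relations main orthologs, orthologs, and paralogs.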
Orthologs maintain their function

  Annotate genes with unknown functions.
  Infer protein-protein interactions.
Orthologs are not one-to-one due to lineage-specific
gene duplications

  Main orthologs are orthologs that have retained their ancestral
  position. [2]

  Figure: Gene tree rooted at the MRCA of G and H (speciation, then
  a duplication in H), with main orthologs, orthologs, and paralogs
  labelled.

  [2] Burgetz et al., Evolutionary Bioinformatics 2006
Problem of identifying main orthologs

  Input: positions and sequences of genes in two genomes, G and H
  Output: for each gene in their common ancestor, its direct
          descendant in G and in H

  Complications
      gene duplication
      gene loss
      horizontal gene transfer
      gene fusion and fission
Three main approaches for finding orthologs

  Graph-based
  Tree-based
  Rearrangement-based
Bidirectional Best Hit and variants

  Bidirectional best hit (BBH)
      Most popular approach. High level of functional relatedness. [a]
  Reciprocal smallest distance
      Uses an evolutionary distance estimate instead of BLAST scores.
  OMA stable pairs
      Introduces a tolerance interval and stable matching.

  [a] Altenhoff et al., PLoS CB 2009
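
  The basic bidirectional best hit rule can be sketched in a few lines.
  This is a minimal illustration, not the implementation of any tool
  named on the slide: the function names, the gene names, and the toy
  scores are hypothetical, and ties are broken by whichever hit is
  seen first.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target.

    scores: dict mapping (query, target) -> similarity score
    (for example a BLAST bit score; higher means more similar).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(g_vs_h, h_vs_g):
    """Return pairs (g, h) where h is g's best hit and g is h's best hit."""
    best_gh = best_hits(g_vs_h)
    best_hg = best_hits(h_vs_g)
    return sorted((g, h) for g, h in best_gh.items() if best_hg.get(h) == g)


# Hypothetical toy scores, searching genome G against H and H against G.
g_vs_h = {("g1", "h1"): 900, ("g1", "h2"): 400,
          ("g2", "h2"): 700, ("g3", "h2"): 750}
h_vs_g = {("h1", "g1"): 890, ("h2", "g1"): 410,
          ("h2", "g3"): 760, ("h2", "g2"): 690}

pairs = bidirectional_best_hits(g_vs_h, h_vs_g)
print(pairs)  # [('g1', 'h1'), ('g3', 'h2')]
```

  Gene g2 also has h2 as its best hit, but h2 prefers g3, so g2 is
  left unpaired. This is how BBH stays one-to-one.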
EnsemblCompara GeneTrees [3]

  Figure: Species tree for 4 species (top) and gene tree for gene A

  Based on reconciliation of gene trees with the species tree.
   1. Partition genes into families and construct gene trees
   2. Reconcile each gene tree with the species tree

  [3] Vilella et al., Genome Res 2009
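
  Step 2 can be illustrated with the textbook LCA-mapping form of
  reconciliation. This is a sketch of the general technique, not the
  EnsemblCompara pipeline: trees are written as nested tuples, and the
  species tree, gene tree, and gene names below are hypothetical
  (they mirror the g / h / h example from the earlier slides).

```python
def clades_of(species_tree):
    """List the set of species below every node of the species tree."""
    clades = []

    def visit(node):
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(visit(child) for child in node))
        clades.append(clade)
        return clade

    visit(species_tree)
    return clades


def leaves(tree):
    """Sorted tuple of the leaf names below a gene-tree node."""
    if isinstance(tree, str):
        return (tree,)
    return tuple(sorted(x for child in tree for x in leaves(child)))


def reconcile(gene_tree, gene_to_species, clades, events):
    """Map a gene-tree node to a species clade and label internal nodes.

    A node is a duplication if it maps to the same clade as one of its
    children, otherwise it is a speciation.
    """
    if isinstance(gene_tree, str):
        return frozenset([gene_to_species[gene_tree]])
    child_maps = [reconcile(c, gene_to_species, clades, events)
                  for c in gene_tree]
    needed = frozenset().union(*child_maps)
    # The LCA is the smallest species clade containing all needed species.
    here = min((c for c in clades if needed <= c), key=len)
    kind = "duplication" if here in child_maps else "speciation"
    events.append((leaves(gene_tree), kind))
    return here


species_tree = ("G", "H")              # hypothetical two-species tree
gene_tree = ("g", ("h1", "h2"))        # hypothetical gene family
gene_to_species = {"g": "G", "h1": "H", "h2": "H"}

events = []
reconcile(gene_tree, gene_to_species, clades_of(species_tree), events)
for genes, kind in events:
    print(genes, kind)
```

  Two genes are then orthologs if their last common ancestor in the
  gene tree is a speciation node, and paralogs if it is a duplication
  node.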
MSOAR2 [4]

  Figure: Rearrangement scenario between human and mouse

  1. Partition genes into families and assign each family a unique symbol
  2. Reconstruct the most parsimonious rearrangement scenario
     (inversion, translocation, fusion, fission, duplication)
  3. Extract the corresponding orthologs

  [4] Fu et al., JCB 2007
Can conserved gene neighborhood improve
ortholog predictions?
Human-mouse synteny blocks

  Conserved synteny blocks between the human and mouse genomes,
  generated by the Cinteny web server [5]

  [5] Sinha and Meller, BMC Bioinformatics 2007
Local synteny criteria [6]

  Figure: Local synteny: more than one unique match within +/- 3
  genes. Homology defined as BLASTP E-value < 1e-5

  94% of sampled inter-species pairs are identified as orthologs
  by Inparanoid (based on BBH) and local synteny criteria.

  [6] Jin Jun et al., BMC Genomics 2009
Local synteny score (LC)

  Figure: Neighbourhood of gene g in genome G matched against the
  neighbourhood of gene h in genome H

  The local synteny score of g and h is 4 since there are 4 edges
  in the maximum matching.
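
  One way to compute such a score is a maximum bipartite matching
  between the two gene neighbourhoods. The sketch below makes several
  assumptions the slide does not state: the window size, whether g and
  h themselves count as part of their neighbourhoods (here they do),
  and all gene names and homology edges, which are hypothetical.

```python
def max_matching(left, right, homologous):
    """Size of a maximum bipartite matching, via augmenting paths."""
    match_of_right = {}

    def try_assign(u, seen):
        for v in right:
            if (u, v) in homologous and v not in seen:
                seen.add(v)
                # Take v if it is free, or if its partner can move elsewhere.
                if v not in match_of_right or try_assign(match_of_right[v], seen):
                    match_of_right[v] = u
                    return True
        return False

    return sum(try_assign(u, set()) for u in left)


def neighbourhood(genome, gene, window):
    """Genes within `window` positions of `gene`, including the gene itself."""
    i = genome.index(gene)
    return genome[max(0, i - window): i + window + 1]


def local_synteny_score(genome_g, genome_h, g, h, homologous, window):
    return max_matching(neighbourhood(genome_g, g, window),
                        neighbourhood(genome_h, h, window),
                        homologous)


# Hypothetical gene orders and homology edges.
genome_g = ["a1", "a2", "g", "a3", "a4"]
genome_h = ["b2", "b1", "h", "b4", "b3"]
homologous = {("a1", "b1"), ("a1", "b2"), ("a2", "b2"),
              ("g", "h"), ("a3", "b3")}

score = local_synteny_score(genome_g, genome_h, "g", "h", homologous, window=2)
print(score)  # 4
```

  Gene a1 is homologous to both b1 and b2, but a matching may use each
  gene only once, so the score counts a1-b1, a2-b2, g-h, and a3-b3.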
Smith-Waterman alignment score (SW)
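
  For reference, the Smith-Waterman score is the best local alignment
  score between two sequences. The sketch below uses a simple
  match/mismatch scheme with a linear gap penalty; these parameter
  values are assumptions for illustration only. Protein comparisons
  normally use a substitution matrix and affine gap penalties, and the
  slides do not say which settings were used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))        # 8
print(smith_waterman_score("TTACGTT", "GGACGGG"))  # 6
print(smith_waterman_score("AAAA", "TTTT"))        # 0
```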
BBH-LS: bidirectional best hits based on linear
combination of SW and LC

  Figure: Gene g in genome G and gene h in genome H

  sim(g, h) = (1 − f) × SW(g, h) + f × LC(g, h)
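
  The combined similarity plugs directly into the bidirectional best
  hit rule. The following is a minimal sketch, assuming that SW and LC
  have already been scaled to the range [0, 1] (the slide does not say
  how the two scores are made comparable). The gene names and scores
  are hypothetical.

```python
def sim(sw, lc, f):
    """Linear combination from the slide: (1 - f) * SW + f * LC."""
    return (1 - f) * sw + f * lc


def bbh_ls(sw, lc, f):
    """Bidirectional best hits under the combined similarity.

    sw, lc: dicts mapping (g, h) -> score, assumed pre-scaled to [0, 1].
    Pairs missing from lc are given a local synteny score of 0.
    """
    best_for_g, best_for_h = {}, {}
    for (g, h), sw_score in sw.items():
        s = sim(sw_score, lc.get((g, h), 0.0), f)
        if s > best_for_g.get(g, (None, -1.0))[1]:
            best_for_g[g] = (h, s)
        if s > best_for_h.get(h, (None, -1.0))[1]:
            best_for_h[h] = (g, s)
    return sorted((g, h) for g, (h, _) in best_for_g.items()
                  if best_for_h[h][0] == g)


# Hypothetical scaled scores: g1-h2 has the highest sequence similarity,
# but g1-h1 and g2-h2 have the stronger conserved neighbourhood.
sw = {("g1", "h1"): 0.5, ("g1", "h2"): 0.9, ("g2", "h2"): 0.4}
lc = {("g1", "h1"): 1.0, ("g1", "h2"): 0.2, ("g2", "h2"): 1.0}

print(bbh_ls(sw, lc, f=0.0))  # [('g1', 'h2')]
print(bbh_ls(sw, lc, f=0.5))  # [('g1', 'h1'), ('g2', 'h2')]
```

  With f = 0 the method reduces to plain BBH on SW; raising f lets the
  neighbourhood evidence override a high but misleading sequence score.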
Human-Mouse-Rat dataset

  Input
      Human, mouse, and rat genes downloaded from Ensembl.
  Benchmark
      No “golden” benchmark for true orthology.
      Assume that orthologs are assigned the same gene symbol.
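
  Under this assumption, scoring a set of predicted pairs is a simple
  count. The sketch below is illustrative only: the gene identifiers
  are hypothetical placeholders, and comparing symbols
  case-insensitively is an added assumption, not something the slides
  specify.

```python
def count_tp_fp(pairs, symbol_g, symbol_h):
    """Count predicted pairs with the same (TP) or different (FP) gene symbol.

    pairs: iterable of (gene_in_G, gene_in_H) predictions.
    symbol_g, symbol_h: dicts mapping gene identifier -> gene symbol.
    """
    pairs = list(pairs)
    tp = sum(1 for g, h in pairs
             if symbol_g[g].upper() == symbol_h[h].upper())
    return tp, len(pairs) - tp


# Hypothetical identifiers; the symbols are taken from a later slide.
symbol_g = {"human_1": "RASGRF1", "human_2": "RASGRF2", "human_3": "CTSH"}
symbol_h = {"mouse_1": "Rasgrf1", "mouse_2": "Rasgrf2", "mouse_3": "Ctsh"}
predicted = [("human_1", "mouse_1"), ("human_2", "mouse_1"),
             ("human_3", "mouse_3")]

tp, fp = count_tp_fp(predicted, symbol_g, symbol_h)
print(tp, fp)  # 2 1
```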
Tuning the BBH-LS method
  sim(g, h) = (1 − f) × SW(g, h) + f × LC(g, h)
Results for various methods on Human-Mouse

  Figure: TP: same gene symbols, FP: different gene symbols

  More true positives and fewer false positives than MSOAR2.
Results for various methods on Human-Rat




     Figure: TP: same gene symbols, FP: different gene symbols
Results for various methods on Mouse-Rat




     Figure: TP: same gene symbols, FP: different gene symbols
How local synteny helps

  Human chr 15:  CTSH      RASGRF1   ANKRD34C
  Mouse chr 9:   ANKRD34C  RASGRF1   CTSH
  Human chr 5:   MSH3      RASGRF2   CKMT2
  Mouse chr 13:  CKMT2     RASGRF2   MSH3

  Scores on the edges:
      RASGRF1 (human) - RASGRF1 (mouse):  sw = 2466, ls = 5
      RASGRF2 (human) - RASGRF2 (mouse):  sw = 2003, ls = 5
      RASGRF2 (human) - RASGRF1 (mouse):  sw = 5265, ls = 1

  Bold edges are the pairing from BBH-LS, thin edges are the
  pairing from BBH.
  BBH paired RASGRF2 (human) to RASGRF1 (mouse) due to
  high SW, corrected by BBH-LS with LC.
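
  The correction can be checked by hand. The worked sketch below
  rests on assumptions the slide does not state: SW is divided by the
  largest score shown (5265) and the local synteny score by the
  largest value shown (5), so that both lie in [0, 1], and f = 0.5 is
  chosen purely for illustration. It also reads the score labels
  sw = 2466 and sw = 2003 as belonging to the two same-symbol pairs
  and sw = 5265 as belonging to the RASGRF2 (human) - RASGRF1 (mouse)
  pair.

```latex
\begin{align*}
\mathrm{sim}(\mathrm{RASGRF1}_{\mathrm{human}}, \mathrm{RASGRF1}_{\mathrm{mouse}})
  &= 0.5 \times \tfrac{2466}{5265} + 0.5 \times \tfrac{5}{5} \approx 0.734 \\
\mathrm{sim}(\mathrm{RASGRF2}_{\mathrm{human}}, \mathrm{RASGRF2}_{\mathrm{mouse}})
  &= 0.5 \times \tfrac{2003}{5265} + 0.5 \times \tfrac{5}{5} \approx 0.690 \\
\mathrm{sim}(\mathrm{RASGRF2}_{\mathrm{human}}, \mathrm{RASGRF1}_{\mathrm{mouse}})
  &= 0.5 \times \tfrac{5265}{5265} + 0.5 \times \tfrac{1}{5} = 0.600
\end{align*}
```

  Under these assumptions each same-symbol pair scores higher than the
  cross pair, so both become bidirectional best hits. With f = 0 the
  cross pair scores 1.0 against roughly 0.468 and 0.380, which
  reproduces the plain BBH pairing.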
Summary: Identifying main orthologs

  Figure: Gene tree rooted at the MRCA of G and H (speciation, then
  a duplication in H), with main orthologs, orthologs, and paralogs
  labelled.

  For each gene in their common ancestor, find its direct
  descendant in G and H
Summary: Three approaches

  Graph-based
  Tree-based
  Rearrangement-based
BBH-LS: bidirectional best hits based on linear
combination of SW and LC