
Detection of genomic homology in eukaryotic genomes


i-ADHoRe 3.0--fast and sensitive detection of genomic homology in extremely large data sets.
Proost S, Fostier J, De Witte D, Dhoedt B, Demeester P, Van de Peer Y, Vandepoele K.
Nucleic Acids Res. 2012 Jan;40(2):e11.

Comparative genomics is a powerful means to gain insight into the evolutionary processes that shape the genomes of related species. As the number of sequenced genomes increases, the development of software to perform accurate cross-species analyses becomes indispensable. However, many implementations that have the ability to compare multiple genomes exhibit unfavorable computational and memory requirements, limiting the number of genomes that can be analyzed in one run. Here, we present a software package to unveil genomic homology based on the identification of conservation of gene content and gene order (collinearity), i-ADHoRe 3.0, and its application to eukaryotic genomes. The use of efficient algorithms and support for parallel computing enable the analysis of large-scale data sets. Unlike other tools, i-ADHoRe can process the Ensembl data set, containing 49 species, in 1 h. Furthermore, the profile search is more sensitive to detect degenerate genomic homology than chaining pairwise collinearity information based on transitive homology. From ultra-conserved collinear regions between mammals and birds, by integrating coexpression information and protein-protein interactions, we identified more than 400 regions in the human genome showing significant functional coherence. The different algorithmical improvements ensure that i-ADHoRe 3.0 will remain a powerful tool to study genome evolution.
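
The distinction the abstract draws between conservation of gene content and conservation of gene order (collinearity) can be illustrated with a small sketch. This is not i-ADHoRe code: the function names and the toy gene-family labels (`F1`, `F2`, ...) are hypothetical, and the longest common subsequence is used here only as a simple stand-in for an order-aware score. It also ignores inverted segments.

```python
def shared_content(region_a, region_b):
    """Fraction of gene families in region_a that also occur in region_b."""
    fam_a, fam_b = set(region_a), set(region_b)
    return len(fam_a & fam_b) / len(fam_a)


def longest_collinear_run(region_a, region_b):
    """Length of the longest common subsequence of gene families.

    Unlike shared_content, this only counts families that appear
    in the same relative order in both regions.
    """
    m, n = len(region_a), len(region_b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if region_a[i] == region_b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]


# Toy regions, written as ordered lists of (hypothetical) gene-family labels.
ordered = ["F1", "F2", "F3", "F4", "F5"]
same_order = ["F1", "F2", "F9", "F3", "F4", "F5"]  # one inserted gene
shuffled = ["F4", "F1", "F5", "F3", "F2"]          # same content, order lost

print(shared_content(ordered, same_order), longest_collinear_run(ordered, same_order))  # 1.0 5
print(shared_content(ordered, shuffled), longest_collinear_run(ordered, shuffled))      # 1.0 2
```

Both comparison regions share all five families with the reference, but only the first preserves their order, which is the property collinearity detection relies on.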

Published in: Education


  1. 1. DETECTION OF GENOMIC HOMOLOGY IN EUKARYOTIC GENOMES: Challenges & applications. Klaas Vandepoele, Strasbourg, October 16th 2012. Department of Plant Biotechnology and Bioinformatics, Ghent University; Department of Plant Systems Biology, VIB. http://twitter.com/plaza_genomics
  2. 2. http://bioinformatics.psb.ugent.be/cig/
  3. 3. KLAAS VANDEPOELE Klaas Vandepoele was appointed Tenure Track Professor at Ghent University in 2011 (MRP Nucleotides 2 Networks) within the Department of Plant Biotechnology and Bioinformatics (Ghent University - VIB). He is currently (co-)promoter of 3 PhD students. His scientific objectives are to extract biological knowledge from large-scale experimental data sets using data integration and comparative genomics. Through the development and application of various bioinformatics tools, including comparative sequence analysis, cross-species gene expression analysis, ChIP-Seq and cis-regulatory element analysis, he tries to identify new aspects of genome biology, especially in the areas of gene function prediction and gene regulation. Recently developed tools include ATCOECIS, a toolbox for co-expression and cis-regulatory element analysis, and PLAZA, a resource for plant comparative genomics (>1,800 visits per month coming from >85 different countries). During the last 10 years, Klaas Vandepoele has published >50 papers in international peer-reviewed scientific journals, 80% of them in journals with an impact factor (IF) >5 and 33% in journals with an IF >10. His H-index is 25.
  4. 4. OVERVIEW • Cross-species genome analysis • Detection of genomic homology using i-ADHoRe 3.0 • Applications: plant genomes & WGD; vertebrate genomes • Conclusions
  5. 5. 1. CROSS-SPECIES GENOME ANALYSIS Alignment of homologous regions • Inter-genomic: aligning genomic sequences from different species • Intra-genomic: aligning genomic sequences from the same species Different levels of resolution • Comparative mapping (markers) • Synteny (~ gene content) • Colinearity (gene content + order conservation) • DNA-based alignments (base-to-base mapping)
  6. 6. COMPARATIVE SEQUENCE ANALYSIS • Genome conservation: transfer knowledge gained from model organisms to crops • Genome variation: understand how genomes change over time in order to identify evolutionary processes and constraints • Detection of new functional elements, both coding and non-coding. Figure labels: Ancestral genome; Contemporary species. Hardison, PLoS Biology 2003
  7. 7. HUMAN – MOUSE - RAT resolution
  8. 8. HUMAN – MOUSE ORTHOLOGOUS REGIONS (resolution: comparative mapping) Genome translocations associated with human-mouse speciation. Figure labels: Human; Mouse chr IV. Source: www.ensembl.org
  9. 9. HUMAN GENOME BROWSER (resolution: conserved gene content & order) Human chr I and Mouse chr IV: gene loss and insertions in orthologous segments since human-mouse speciation. Browser tracks: EST/cDNA similarities; genome similarities; human gene model
  10. 10. HUMAN – MOUSE BASE-TO-BASE MAPPING (resolution: base-to-base) • Functional sequences (e.g. exons) evolve more slowly than non-functional ones (e.g. introns) due to natural selection against mutations in these regions • Consequently, functional elements, both coding and non-coding, are unusually well conserved in orthologous regions. Figure labels: coding exons (blue); GT donor; AG acceptor
  11. 11. 2. DETECTION OF GENOMIC HOMOLOGY Figure sources: International Chicken Genome Sequencing Consortium; A. thaliana – A. lyrata, Proost et al., 2011; Poplar, Tuskan et al., 2006
  12. 12. GENE COLINEARITY
  13. 13. MATRIX REPRESENTATION
  14. 14. MAP-BASED APPROACH • Represent chromosomes as sorted gene lists • Identify all homologous gene pairs between chromosomes (all-against-all BLASTP) • Score pairs of homologues in a matrix • Statistical filtering. Figure: Gene Homology Matrix (GHM) of Chromosome 1 against Chromosome 2. Vandepoele et al. (2002) Genome Research
  15. 15. In an actual genome this becomes complex. A good statistical model is needed to find biologically relevant regions
  16. 16. GENOMIC PROFILES: pairwise; multiple. Simillion et al. (2004) Genome Research
  17. 17. GRAPH-BASED ALIGNMENT INCL. CONFLICT RESOLUTION Alignment methods shown: Needleman-Wunsch; greedy graph-based. Fostier, … & Vandepoele, Bioinformatics 2011
  18. 18. I-ADHORE ALGORITHM Proost, Fostier … & Vandepoele, NAR 2011
  19. 19. OUTPUT: MULTIPLE HOMOLOGOUS SEGMENTS - MULTIPLICON Species labels: HS, MM, GG, TN. Segments: Mm2, Gg20, Hs20, Gg2, Mm18, Hs18, Tn15. Within- and between-species gene colinearity!
  20. 20. PROFILES OFFER IMPROVED SENSITIVITY TO DETECT DEGENERATE GENOMIC HOMOLOGY Plot axis: sensitivity (# homologous segments). Proost, Fostier … & Vandepoele, NAR 2011
  21. 21. I-ADHORE 3.0 Speed & memory footprint. Compared tools: MCScan (Tang et al. 2008); Cyntenator (Rödelsperger et al. 2010). Proost, Fostier … & Vandepoele, NAR 2011
  22. 22. 3. APPLICATIONS IN PLANTS: INTEGRATION IN PLAZA 2.5 http://bioinformatics.psb.ugent.be/plaza • Plant comparative genomics platform • 25 plant species • BLAST pairs, gene families & i-ADHoRe pre-computed • Connected to several tools to visualize the i-ADHoRe data
  23. 23. Gene family analysis; genome analysis; >20 tools available. Proost et al., Plant Cell 2009; Van Bel et al., 2012
  24. 24. GENOME-WIDE COLINEARITY WGDotplot: Z. mays against O. sativa
  25. 25. MULTI-SPECIES COLINEARITY profile
  26. 26. Source: Y. Van de Peer
  27. 27. TRIPLICATED GENOME STRUCTURE VITIS
  28. 28. TRACES OF AN ANCIENT HEXAPLOIDIZATION IN VITIS
  29. 29. 1:4 COLINEARITY BETWEEN VITIS AND ARABIDOPSIS
  30. 30. RESOLVING A SERIES OF ANCIENT AND RECENT WGDS IN DICOTS Figure labels: Arabidopsis a, Arabidopsis b, Arabidopsis a, Arabidopsis b, Papaya, Poplar a, Poplar b, Vitis
  31. 31. INTERMEZZO – WGDS & THE QUEST FOR PLANT ORTHOLOGS • Tree-based orthologs (TROG) inferred using tree reconciliation • Orthologous gene families (ORTHO) inferred using OrthoMCL • Anchor points refer to gene-based colinearity between species • Best hit families (BHIF) inferred from BLAST hits including inparalogs. Van Bel et al., Plant Physiology 2012
  32. 32. COMPLEX GENE ORTHOLOGY RELATIONSHIPS IN PLANTS Figure labels: query species A. thaliana; target species
  33. 33. SORTING OUT PLANT (CO-)ORTHOLOGS USING EXPRESSION CONTEXT CONSERVATION Figure labels: protein integrative orthology; Expression Context Conservation scores (p-value < 0.05); inparalogs (species-specific duplicates)
  34. 34. 3. APPLICATIONS IN VERTEBRATE GENOME EVOLUTION: OVERVIEW OF THE ENSEMBL DATASET (RELEASE 57) • Resource for animal genomes (& outgroups) • Contains 49 species • 832 666 protein-coding genes • 70 161 chromosomes/scaffolds. http://www.ensembl.org/ Hubbard et al., 2009
  35. 35. RESULTS • Runtime on 32 CPUs (4 nodes with 8 cores): ~4.5 hours (several months with the previous version) • Memory usage: ~4 GByte / core • Search results: 237 292 multiplicons and 5 204 391 anchor points • Up to 46 colinear regions could be grouped into one large multiplicon. Unsurprisingly, the largest cluster in these animal genomes was the well-known Hox cluster, whose gene order is highly conserved because it is strongly linked with the development of the body plan
  36. 36. ENRICHED COEXPRESSION VS CONSERVED COLINEARITY Conserved colinear regions: is there a link with functional clusters? Figure: Human Chromosome 4. Bars indicate the number of species the region is conserved in; dark regions indicate significant co-expression
  37. 37. BIOLOGICAL SIGNIFICANCE OF HIGHLY CONSERVED COLLINEAR REGIONS
  38. 38. 4. CONCLUSIONS (1) i-ADHoRe 3.0: algorithmic and technical improvements now allow the analysis of extremely large datasets (2) Application to plant species and integration in the PLAZA comparative genomics resource (3) It is now possible to analyze all Ensembl genomes in a single run
  39. 39. ACKNOWLEDGEMENTS Sebastian Proost, Jan Fostier, Michiel Van Bel, Yves Van de Peer, Piet Demeester
  40. 40. FURTHER READING • Fostier J*, Proost S*, Dhoedt B, Saeys Y, Demeester P, Van de Peer Y, Vandepoele K (2011) A greedy, graph-based algorithm for the alignment of multiple homologous gene lists. Bioinformatics 27: 749-756. • Proost S*, Fostier J*, De Witte D, Dhoedt B, Demeester P, Van de Peer Y, Vandepoele K (2012) i-ADHoRe 3.0--fast and sensitive detection of genomic homology in extremely large data sets. Nucleic Acids Res 40: e11. • Van Bel M*, Proost S*, Wischnitzki E, Movahedi S, Scheerlinck C, Van de Peer Y, Vandepoele K (2012) Dissecting plant genomes with the PLAZA comparative genomics platform. Plant Physiol 158: 590-600.
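
The map-based approach on slide 14 (sorted gene lists, homologous pairs, a gene homology matrix) can be sketched roughly as follows. This is a minimal illustration, not the i-ADHoRe implementation: homology is assumed to be given as a precomputed gene-to-family mapping rather than derived from all-against-all BLASTP, the function names and the `max_gap` and `min_anchors` parameters are hypothetical, only same-orientation diagonals are chained, and the statistical filtering step is replaced by a simple minimum anchor count.

```python
from collections import defaultdict


def gene_homology_matrix(chrom_x, chrom_y, family_of):
    """Return the set of (i, j) cells where gene i of chrom_x and
    gene j of chrom_y belong to the same (assumed) gene family."""
    positions_by_family = defaultdict(list)
    for j, gene in enumerate(chrom_y):
        positions_by_family[family_of[gene]].append(j)
    return {
        (i, j)
        for i, gene in enumerate(chrom_x)
        for j in positions_by_family[family_of[gene]]
    }


def diagonal_clusters(cells, max_gap=2, min_anchors=3):
    """Greedily chain matrix cells into runs along the positive diagonal.

    A cell may extend a chain if it lies at most max_gap intervening
    genes further along on both chromosomes.
    """
    ordered_cells = sorted(cells)
    used = set()
    clusters = []
    for start in ordered_cells:
        if start in used:
            continue
        chain = [start]
        used.add(start)
        while True:
            i, j = chain[-1]
            candidates = [
                (a, b)
                for (a, b) in ordered_cells
                if (a, b) not in used
                and 0 < a - i <= max_gap + 1
                and 0 < b - j <= max_gap + 1
            ]
            if not candidates:
                break
            # Pick the closest candidate (smallest combined step).
            nxt = min(candidates, key=lambda c: (c[0] - i) + (c[1] - j))
            chain.append(nxt)
            used.add(nxt)
        if len(chain) >= min_anchors:
            clusters.append(chain)
    return clusters


# Toy chromosomes as sorted gene lists, with hypothetical family labels.
chrom_x = ["x1", "x2", "x3", "x4", "x5"]
chrom_y = ["y1", "y2", "y3", "y4", "y5", "y6"]
family_of = {
    "x1": "A", "x2": "B", "x3": "C", "x4": "D", "x5": "E",
    "y1": "Z", "y2": "A", "y3": "B", "y4": "Q", "y5": "C", "y6": "D",
}

cells = gene_homology_matrix(chrom_x, chrom_y, family_of)
clusters = diagonal_clusters(cells)
print(clusters)  # [[(0, 1), (1, 2), (2, 4), (3, 5)]]
```

In the toy data, four anchor points form one diagonal despite an unrelated gene (`y4`) interrupting the second chromosome, which is the kind of degenerate colinearity the gap tolerance is meant to absorb.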
