How to make a monkey: functional adaptation in the primate genome


Presentation to the "Workshop on Parallel and Distributed Processing of Large Genome Data", 22 February 2011, DBCLS, Tokyo. The presentation describes the methodological issues surrounding the design of a workflow for assigning orthology among primate genomes, testing the orthologous genes for evidence of selection, and interpreting the results using the Gene Ontology.

Published in: Technology, Business

  1. How to make a monkey: functional adaptation in the primate genome
     Rutger Vos, Marie Curie Research Fellow
  2. Outline
     • Introduction
       • The question
       • Primate genomes
       • Homology across genomes
       • Finding evidence for natural selection
       • Characterizing gene function
     • Methods
       • Computational infrastructure
       • Basic workflow steps
       • Workflow design
     • Results
       • Preliminary findings
     • Conclusions
     • Acknowledgements
  3. The question
     Which gene functions were under directional selection in primate evolutionary history?
  4. Primate genomes
     • Homo sapiens (human)
     • Pan troglodytes (chimpanzee)
     • Gorilla gorilla (gorilla)
     • Pongo pygmaeus (orangutan)
     • Macaca mulatta (rhesus monkey)
     • Callithrix jacchus (common marmoset)
     • Tarsius syrichta (Philippine tarsier)
     • Otolemur garnettii (greater galago)
     • Microcebus murinus (gray mouse lemur)
  5. Primate genomes
     [Figure not reproduced. Clade labels: bush babies, lemurs, tarsiers, New World monkeys, apes, Old World monkeys. Time marker: ~65 MYA (K/T boundary).]
  6. Homology: Orthologs and paralogs
  7. Evidence of selection: dN/dS ratio
  8. Evidence of selection: dN/dS ratio
     Also written Ka/Ks or ω: the ratio of non-synonymous to synonymous substitutions.
     • dN/dS > 1: positive selection
     • dN/dS ≈ 1: neutral evolution?
     • dN/dS < 1: stabilizing selection
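
The three cases on this slide can be written as a small decision rule. The sketch below is only an illustration: the function name and the `tolerance` band around 1 are assumptions introduced here, and a real analysis would rely on a statistical test rather than a fixed cutoff.

```python
def classify_omega(dn: float, ds: float, tolerance: float = 0.1) -> str:
    """Label a dN/dS ratio (omega) using the three cases from the slide.

    `tolerance` is a hypothetical band around 1 that is treated as
    "neutral"; it is an illustration device, not a significance test.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega > 1 + tolerance:
        return "positive selection"
    if omega < 1 - tolerance:
        return "stabilizing selection"
    return "neutral evolution"
```

For example, `classify_omega(0.01, 0.1)` falls in the dN/dS < 1 case, while equal rates fall in the neutral band.
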
  9. Gene function: the Gene Ontology
     • GO is a hierarchical database of terms for describing genes
     • Terms are structured as a directed acyclic graph
     • Terms are organized in three domains: biological process, cellular component and molecular function
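
Because the terms form a directed acyclic graph, a term can have several parents, and collecting all of its ancestors means walking every upward path while visiting each term once. A minimal sketch follows; the term names and edges in `TOY_PARENTS` are a made-up toy graph for illustration, not relationships copied from the actual Gene Ontology.

```python
# Toy child -> parents map. Names and edges are illustrative assumptions,
# not taken from the real Gene Ontology.
TOY_PARENTS = {
    "forebrain development": ["brain development"],
    "brain development": ["nervous system development", "organ development"],
    "nervous system development": ["developmental process"],
    "organ development": ["developmental process"],
    "developmental process": ["biological process"],
    "biological process": [],
}


def ancestors(term, parents=TOY_PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # a DAG can reach the same term twice
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen
```

In this toy graph, "developmental process" is reached along two paths but appears once in the result, which is the property that distinguishes a DAG walk from a simple tree walk.
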
  10. Gene function: the Gene Ontology
  11. Methods: Basic workflow steps
      • Protein BLAST all vs. all
      • Find Reciprocal Best protein Hit (RBH) clusters
      • Protein-align the RBH clusters
      • Back-translate the protein alignments to cDNAs
      • Perform dN/dS ratio tests on all branches
      • Look up GO terms for sequence GIs
      • Interpret results
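
The reciprocal-best-hit step can be sketched for a single pair of species as follows. This is a simplified illustration rather than the scripts used in the talk: it assumes the BLAST results have already been reduced to `(query, subject, bit score)` tuples, and all function names and sequence IDs are hypothetical.

```python
def best_hits(hits):
    """Map each query ID to its highest-scoring subject ID.

    `hits` is an iterable of (query_id, subject_id, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical IDs and scores, for illustration only.
human_vs_chimp = [
    ("humA", "chimpA", 950.0),
    ("humA", "chimpB", 400.0),
    ("humB", "chimpB", 700.0),
    ("humC", "chimpB", 650.0),
]
chimp_vs_human = [
    ("chimpA", "humA", 940.0),
    ("chimpB", "humB", 705.0),
    ("chimpB", "humC", 640.0),
]
pairs = reciprocal_best_hits(human_vs_chimp, chimp_vs_human)
```

Here `humC` is dropped because its best hit, `chimpB`, prefers `humB` in the reverse direction. Clusters across all species could then be assembled by repeating the pairwise step against a reference genome.
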
  12. Methods: Basic workflow design
      Build a single BLAST database of all genomes; then, to parallelize the analysis:
      • Split the data into nine sets (one per species)
      • Split each of the nine genomes into one file per gene (~20k files per species)
      • Process the files in parallel
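
Splitting a genome-wide FASTA file into one file per gene might look like the sketch below. It assumes that the first whitespace-delimited token of each header line is a unique, filesystem-safe ID; the function name, file naming scheme and demo records are hypothetical.

```python
import tempfile
from pathlib import Path


def split_fasta(fasta_text, out_dir):
    """Write each FASTA record to its own file, named after the record ID."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = {}  # record ID -> lines of that record, in input order
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = [line]
        elif current is not None and line.strip():
            records[current].append(line)
    paths = []
    for record_id, lines in records.items():
        path = out_dir / f"{record_id}.fasta"
        path.write_text("\n".join(lines) + "\n")
        paths.append(path)
    return paths


# Demo with two made-up records written to a temporary directory.
DEMO = ">gene1 example protein\nMKT\nAAL\n>gene2\nMVV\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = split_fasta(DEMO, tmp)
    demo_names = [p.name for p in demo_paths]
    demo_first = demo_paths[0].read_text()
```

With roughly 20k files per species, a real layout would probably spread the files over subdirectories, but that detail is left out here.
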
  13. Methods: File processing
      [Diagram not reproduced. Recoverable labels: setenv, qsub, make -j 4 all, Makefile.]
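
The `make -j 4 all` label on this slide indicates that up to four targets were built at a time. The same idea of a bounded pool of independent per-gene jobs can be sketched in Python; this is an analogy and not the author's setup, and `run_parallel` and `count_residues` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(tasks, worker, jobs=4):
    """Apply `worker` to each task, at most `jobs` at a time, keeping input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(worker, tasks))


def count_residues(record):
    """Stand-in for a real per-gene step such as an alignment run."""
    gene_id, sequence = record
    return gene_id, len(sequence)


# Made-up records, for illustration only.
results = run_parallel([("g1", "MKT"), ("g2", "MVVAL")], count_residues)
```

Threads are a reasonable fit when each task mostly waits on an external program launched as a subprocess; for CPU-bound pure-Python work, `ProcessPoolExecutor` from the same module would be the usual substitute.
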
  14. Methods: Software used
      • NCBI standalone BLAST (formatdb, blastp, fastacmd)
      • Muscle
      • GeneWise
      • HyPhy
      • BioPerl/Bio::Phylo (for parsing, logging and wrapping; all scripts under svn)
  15. Methods: Project organization
      From: Noble, W.S., 2009. A Quick Guide to Organizing Computational Biology Projects. PLoS Comput. Biol. 5(7).
  16. Methods: ThamesBlue hardware
      • One of the 100 fastest supercomputers in the world
      • IBM BladeCenter cluster
      • JS21 and JS20 Blade servers with 60TB of storage connected via a Myrinet 2G network
      • SuSE Linux Enterprise Server
      • General Parallel File System
      • Batch jobs managed with Torque
  17. Results
      • 5952 loci with >= 2 RBHs relative to humans
      • 2346 loci with a dN/dS deviation somewhere in the tree (p < 0.05)
      [Figure not reproduced. Tip labels: Homo sapiens, Pan troglodytes, Gorilla gorilla, Pongo pygmaeus, Macaca mulatta, Callithrix jacchus, Tarsius syrichta, Microcebus murinus, Otolemur garnettii.]
  18. Results: some interesting terms
      • Forebrain development, lifespan (and apoptosis), learning and social behavior in apes, including “deep” nodes
      • Eye development in “higher” monkeys
      • Terms to do with pregnancy
      • Terms to do with male-male competition
      • Etc. (…lots of hard-to-interpret molecular processes, of course…)
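
The slides do not show how terms like these were picked out; the summary only names "Gene Ontology term enrichment". One common way to score enrichment is a one-sided hypergeometric test, sketched below under that assumption. The function and parameter names are hypothetical, and this is not necessarily the test used in the talk.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric tail probability P(X >= hits).

    population: all loci tested
    annotated:  loci in the population carrying the GO term
    sample:     loci flagged as interesting (e.g. significant dN/dS deviation)
    hits:       flagged loci that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

With the counts from the results slide, `population` would be 5952 and `sample` 2346, while `annotated` and `hits` depend on the term in question. Testing many terms at once would also call for a multiple-testing correction.
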
  19. “Brain genes”
  20. Visual system
      Primates have a highly variable visual system:
      • Old World monkeys: three types of cones (unique among mammals)
      • New World monkeys: females trichromatic, males dichromatic
  21. Biological conclusions
      Very, very, very, very preliminary: highest dN/dS ratios in functions for which there are multiple “optima” among primates:
      • Different placentation systems
      • Different mating systems
      • Different visual systems
      • Different life histories and brain mass investments
  22. Methodological conclusions
      • Nine genomes is not that much data. As FASTA files, it’s a 14Gb zipped archive (AA+cDNA).
      • The problem was trivially parallelizable, so I didn’t use MPI versions of any software.
      • Simple, consistent workflow and project design conventions are a lifesaver.
      • Make each step small enough that you can rerun it, because you will.
  23. Summary
      I discussed:
      • Primate evolution and adaptation
      • Ortholog-finding
      • Alignment (multiple proteins, cDNA to protein)
      • Tree-based dN/dS ratio tests
      • Gene Ontology term enrichment
      • Methodological challenges
  24. Acknowledgements
      • Funding: FP7-PEOPLE-IEF-2008/N°237046
      • DBCLS for their kind invitation
      • Mark Pagel, Andrew Meade for discussion and help designing the workflow