
How to make a monkey: functional adaptation in the primate genome



Presentation to the "Workshop on Parallel and Distributed Processing of Large Genome Data", 22 February 2011, DBCLS, Tokyo. The presentation describes the methodological issues surrounding the design of a workflow that assigns orthology among primate genomes, tests orthologous genes for evidence of selection, and interprets the results using the Gene Ontology.


  1. How to make a monkey: functional adaptation in the primate genome
     Rutger Vos
     Marie Curie Research Fellow
  2. Outline
     • Introduction: the question; primate genomes; homology across genomes; finding evidence for natural selection; characterizing gene function
     • Methods: computational infrastructure; basic workflow steps; workflow design
     • Results: preliminary findings
     • Conclusions
     • Acknowledgements
  3. The question
     Which gene functions were under directional selection in primate evolutionary history?
  4. Primate genomes
     • Homo sapiens (human)
     • Pan troglodytes (chimpanzee)
     • Gorilla gorilla (gorilla)
     • Pongo pygmaeus (orangutan)
     • Macaca mulatta (rhesus monkey)
     • Callithrix jacchus (common marmoset)
     • Tarsius syrichta (Philippine tarsier)
     • Otolemur garnettii (greater galago)
     • Microcebus murinus (gray mouse lemur)
  5. Primate genomes
     [Figure: primate phylogeny with clades labelled bush babies, lemurs, tarsiers, New World monkeys, Old World monkeys and apes; time scale marked at ~65 MYA (K/T boundary)]
  6. Homology: orthologs and paralogs
  7. Evidence of selection: dN/dS ratio
  8. Evidence of selection: dN/dS ratio
     • Also written Ka/Ks or ω: the rate of non-synonymous substitutions per non-synonymous site divided by the rate of synonymous substitutions per synonymous site
     • dN/dS > 1: positive selection
     • dN/dS ≈ 1: neutral evolution?
     • dN/dS < 1: purifying (stabilizing) selection
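To make the ratio concrete, here is a minimal Python sketch of the classic Nei and Gojobori (1986) pairwise counting estimate with a Jukes-Cantor correction. It is not the branch-wise likelihood test that the workflow ran in HyPhy; it only illustrates what dN and dS measure for two aligned coding sequences. Assumptions: the standard genetic code, ACGT-only sequences that are already in frame and codon-aligned, gapped codons skipped, no stop codons in the input, and point mutations to a stop codon counted as non-synonymous when counting sites (implementations differ on this). All function names are hypothetical.

```python
import itertools
import math

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites in one codon (0..3): at each position, the fraction of
    the three possible point mutations that keep the amino acid unchanged.
    Mutations to a stop codon count as non-synonymous (a simplification)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """(synonymous, non-synonymous) differences between two codons, averaged
    over every order of point mutations whose intermediates avoid stops."""
    diffs = [i for i in range(3) if c1[i] != c2[i]]
    if not diffs:
        return 0.0, 0.0
    sd_sum = nd_sum = 0.0
    n_paths = 0
    for order in itertools.permutations(diffs):
        current, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                valid = False  # path passes through a stop codon: discard it
                break
            if CODE[step] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = step
        if valid:
            sd_sum, nd_sum, n_paths = sd_sum + sd, nd_sum + nd, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no stop-free path between {c1} and {c2}")
    return sd_sum / n_paths, nd_sum / n_paths


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits."""
    if p >= 0.75:
        return math.inf  # correction undefined: sequences too divergent
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two codon-aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if "-" in c1 or "-" in c2:
            continue  # skip codons containing alignment gaps
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError("remove stop codons before testing")
        s = (syn_sites(c1) + syn_sites(c2)) / 2  # sites averaged over both
        S, N = S + s, N + (3 - s)
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or no non-synonymous sites to compare")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


dN, dS = nei_gojobori("ATGTTTCTG", "ATATTCCTG")
print(f"dN={dN:.3f} dS={dS:.3f} dN/dS={dN / dS:.2f}")
```

In this toy pair one difference is synonymous (TTT to TTC) and one non-synonymous (ATG to ATA), yet dN/dS comes out well below 1, because the three codons offer far fewer synonymous sites than non-synonymous ones. That per-site normalization is the whole point of the ratio.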
  9. Gene function: the Gene Ontology
     • GO is a hierarchical database of terms describing gene function
     • Terms are structured as a directed acyclic graph (a term can have more than one parent)
     • Terms are organized in three domains: biological process, cellular component and molecular function
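Because GO terms form a directed acyclic graph, a gene annotated to a specific term is implicitly annotated to all of that term's ancestors, so annotations are usually propagated upward before looking for over-represented terms. A minimal Python sketch of that propagation, using a toy, hypothetical graph: the term names and edges below are illustrative only and are not GO's actual structure.

```python
from collections import deque

# Toy, hypothetical DAG: term -> list of parent terms. Real GO terms carry
# IDs such as "GO:0008150" (biological_process) and typed edges (is_a, part_of).
PARENTS = {
    "forebrain development": ["brain development"],
    "brain development": ["nervous system development"],
    "eye development": ["sensory organ development", "nervous system development"],
    "sensory organ development": ["developmental process"],
    "nervous system development": ["developmental process"],
    "developmental process": [],
}


def ancestors(term, parents=PARENTS):
    """All terms reachable from `term` via parent edges, excluding the term itself."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:  # breadth-first walk; `seen` stops revisits where paths merge
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents.get(t, []))
    return seen


def propagate(annotations, parents=PARENTS):
    """Map each gene to its direct terms plus all of their ancestors."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in annotations.items()}


print(propagate({"gene_1": ["eye development"]}))
```

The breadth-first walk handles terms with several parents ("eye development" here) without visiting shared ancestors twice.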
  10. Gene function: the Gene Ontology
  11. Methods: basic workflow steps
     • All-vs-all protein BLAST
     • Find reciprocal best hit (RBH) clusters
     • Align RBH clusters at the protein level
     • Back-translate protein alignments to cDNA
     • Perform dN/dS ratio tests on all branches
     • Look up GO terms for sequence GIs
     • Interpret results
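The reciprocal-best-hit step can be sketched in Python, assuming BLAST tabular output (legacy `blastall -m 8` or BLAST+ `-outfmt 6`, whose 12 default columns end with the bit score) and a hypothetical naming convention in which every sequence ID starts with a species code, e.g. `HSA|...`. The sketch finds one-to-one RBH pairs between two genomes; assembling those pairs into multi-species clusters relative to human would be a further step.

```python
def best_hits(lines, species_of):
    """Best subject (highest bit score) per (query, subject species) pair.

    Hits within the same species, including self hits, are ignored.
    Ties keep the first hit seen.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if species_of(query) == species_of(subject):
            continue
        key = (query, species_of(subject))
        if key not in best or bitscore > best[key][1]:
            best[key] = (subject, bitscore)
    return {key: subject for key, (subject, _) in best.items()}


def reciprocal_best_hits(lines, species_of, sp_a, sp_b):
    """One-to-one RBH pairs (a, b): b is a's best hit in sp_b and vice versa."""
    best = best_hits(lines, species_of)
    pairs = []
    for (query, target_sp), subject in best.items():
        if species_of(query) == sp_a and target_sp == sp_b:
            if best.get((subject, sp_a)) == query:
                pairs.append((query, subject))
    return sorted(pairs)


# Hypothetical convention: species code before the first "|".
species = lambda seq_id: seq_id.split("|")[0]
```

A real pipeline would probably also apply an E-value or coverage cutoff before ranking hits; that is left out here to keep the reciprocity logic visible.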
  12. Methods: basic workflow design
     • Build a single BLAST database of all genomes
     • Then, to parallelize the analysis:
        - split the data into nine sets (one per species)
        - split each of the nine genomes into one file per gene (~20k files per species)
        - process the files in parallel
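The split-then-parallelize design can be sketched in Python. This assumes one multi-FASTA file per species and uses a hypothetical per-gene job, `process_gene`, standing in for the real per-file steps. It uses a thread pool on the assumption that the heavy lifting would be done by external programs (BLAST, MUSCLE, HyPhy) launched as subprocesses, so Python threads mostly wait; on the cluster itself, the slides suggest parallelism came from batch jobs instead.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file; return the new paths.

    The first word of each header becomes the file name ('|' and '/' replaced);
    records with duplicate names would overwrite each other.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths, out = [], None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if out is not None:
                    out.close()
                name = line[1:].split()[0].replace("|", "_").replace("/", "_")
                paths.append(out_dir / f"{name}.fa")
                out = open(paths[-1], "w")
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return paths


def process_gene(path):
    """Hypothetical per-gene job; here it only counts residues."""
    with open(path) as handle:
        seq = "".join(l.strip() for l in handle if not l.startswith(">"))
    return path.stem, len(seq)


def process_all(paths, workers=4):
    """Run process_gene over all files concurrently; results keyed by gene name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_gene, paths))
```

Because every gene file is independent, no job needs to talk to another, which is what makes the problem trivially parallel.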
  13. Methods: file processing
     [Diagram: file processing with setenv and qsub job submission, and make -j 4 all driven by a Makefile]
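Driving each step with make means a target is rebuilt only when it is missing or older than one of its inputs, so a failed or changed step can be rerun without redoing everything else. A minimal Python sketch of that timestamp rule, with hypothetical function and file names (this is not the project's Makefile):

```python
from pathlib import Path


def needs_rebuild(target, sources):
    """Make's rule: rebuild if the target is missing or older than any source."""
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(Path(s).stat().st_mtime > target_mtime for s in sources)


def run_step(target, sources, build):
    """Call build(target, sources) only when the target is out of date.

    Returns True if the step ran, False if it was skipped as up to date.
    """
    if needs_rebuild(target, sources):
        build(target, sources)
        return True
    return False
```

For example, `run_step("g1.aln", ["g1.fa"], align)` would rerun a hypothetical `align` function only after `g1.fa` changes; `make -j 4` adds the same check plus running up to four independent targets at once.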
  14. Methods: software used
     • NCBI standalone BLAST (formatdb, blastp, fastacmd)
     • MUSCLE
     • GeneWise
     • HyPhy
     • BioPerl/Bio::Phylo (for parsing, logging and wrapping; all scripts under svn)
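GeneWise handles the cDNA-to-protein step robustly, including frameshifts and partial matches. In the simplest case, where each cDNA is exactly the coding sequence of its protein (no UTRs, no frameshifts; an assumption real data often violates), back-translation reduces to threading codons onto the gapped protein alignment. A minimal Python sketch, with a hypothetical function name:

```python
def backtranslate(aligned_protein, cds):
    """Map a gapped protein sequence onto its coding DNA, codon by codon.

    Assumes `cds` encodes the ungapped protein exactly, optionally followed by
    one stop codon. Only lengths are checked, not the translation itself.
    Each '-' in the protein becomes '---' in the output.
    """
    residues = aligned_protein.replace("-", "")
    if len(cds) not in (3 * len(residues), 3 * len(residues) + 3):
        raise ValueError("cDNA length does not match the protein")
    codons = iter(cds[i:i + 3] for i in range(0, 3 * len(residues), 3))
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)
```

Applied to every row of a MUSCLE protein alignment, this yields a codon alignment in which gaps fall on codon boundaries, which is the input a codon-based dN/dS analysis expects.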
  15. Methods: project organization
     From: Noble, W.S., 2009. A Quick Guide to Organizing Computational Biology Projects. PLoS Comput. Biol. 5(7).
  16. Methods: ThamesBlue hardware
     • One of the 100 fastest supercomputers in the world
     • IBM BladeCenter cluster
     • JS21 and JS20 blade servers with 60 TB of storage, connected via a Myrinet 2G network
     • SUSE Linux Enterprise Server
     • General Parallel File System (GPFS)
     • Batch jobs managed with Torque
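Under Torque, each batch of work is submitted with qsub as a job script whose `#PBS` directives request resources. A sketch that only generates such a script, mirroring the setenv / `make -j 4 all` pattern on the file-processing slide; the job name, directory, resource values and environment variables are hypothetical, and values are assumed to contain no spaces. Submission on the cluster would then be `qsub <script file>`.

```python
def pbs_script(job_name, work_dir, cores=4, walltime="02:00:00", env=None):
    """Return the text of a Torque/PBS job script that runs `make -j <cores> all`.

    Uses csh so that environment variables are set with `setenv`.
    """
    lines = [
        "#!/bin/csh",
        "#PBS -S /bin/csh",                  # shell Torque should use
        f"#PBS -N {job_name}",               # job name shown by qstat
        f"#PBS -l nodes=1:ppn={cores}",      # one node, `cores` processors
        f"#PBS -l walltime={walltime}",      # maximum run time
    ]
    for key, value in (env or {}).items():
        lines.append(f"setenv {key} {value}")
    lines += [f"cd {work_dir}", f"make -j {cores} all"]
    return "\n".join(lines) + "\n"


print(pbs_script("batch_01", "/scratch/primates/batch_01", env={"DATA_DIR": "/data"}))
```

Requesting as many processors per node as the `-j` value given to make keeps the parallel make within the job's allocation.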
  17. Results
     • 5952 loci with ≥ 2 RBHs relative to human
     • 2346 loci with a significant dN/dS deviation on at least one branch (p < 0.05)
     [Figure: tree of the nine species: Homo sapiens, Pan troglodytes, Gorilla gorilla, Pongo pygmaeus, Macaca mulatta, Callithrix jacchus, Tarsius syrichta, Microcebus murinus, Otolemur garnettii]
  18. Results: some interesting terms
     • Forebrain development, lifespan (and apoptosis), learning and social behavior in apes, including “deep” nodes
     • Eye development in “higher” monkeys
     • Terms related to pregnancy
     • Terms related to male-male competition
     • Etc. (…lots of hard-to-interpret molecular processes, of course…)
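The slides do not say which enrichment statistic was used. One common choice is a one-sided hypergeometric test: given N loci tested, n of them with a significant dN/dS deviation, and K loci annotated to a GO term, how surprising is it to see k or more annotated loci among the n? A standard-library sketch; treating N = 5952 and n = 2346 from the results slide as the population and selected set is an assumption, and the term counts K and k are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: n loci drawn from N, K of them annotated."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division


# Hypothetical term annotated to 40 of the 5952 loci, 30 of them among the 2346.
print(hypergeom_sf(30, N=5952, K=40, n=2346))
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for these large counts, so the only rounding happens in the final division. In practice the test would be run for every term after annotations have been propagated up the GO graph.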
  19. “Brain genes”
  20. Visual system
     Primates have a highly variable visual system:
     • Old World monkeys: three types of cones (rare among mammals)
     • New World monkeys: many females trichromatic, males dichromatic
  21. Biological conclusions
     Very, very, very, very preliminary: the highest dN/dS ratios are in functions for which there are multiple “optima” among primates:
     • Different placentation systems
     • Different mating systems
     • Different visual systems
     • Different life histories and brain mass investments
  22. Methodological conclusions
     • Nine genomes is not that much data: as FASTA files (amino acid + cDNA), it’s a 14 GB zipped archive
     • The problem was trivially parallelizable, so I didn’t use MPI versions of any software
     • Simple, consistent workflow and project design conventions are a lifesaver
     • Make each step small enough that you can rerun it, because you will
  23. Summary
     I discussed:
     • Primate evolution and adaptation
     • Ortholog finding
     • Alignment (multiple protein alignment; cDNA-to-protein alignment)
     • Tree-based dN/dS ratio tests
     • Gene Ontology term enrichment
     • Methodological challenges
  24. Acknowledgements
     • Funding: FP7-PEOPLE-IEF-2008/N°237046
     • DBCLS, for their kind invitation
     • Mark Pagel and Andrew Meade, for discussion and help designing the workflow