Prosdocimi ucb cdao

CDAO presentation.
The idea of the Comparative Data Analysis Ontology (CDAO) has been presented worldwide, including at NESCent (USA), IGBMC (France) and UFRJ (Brazil). A semantic framework for high-throughput evolutionary analysis, in the era of next- and third-generation sequencing, is the way to bring evolution-based studies into genome-wide analysis. Its Darwinian core of reasoning also allows CDAO to be used with other entities.

Published in: Education, Technology
  1. Framework for a Comparative Data Analysis Ontology (CDAO): Linking Evolution and Integrative Biology
     Francisco Prosdocimi, Brandon Chisham, Enrico Pontelli, Arlin Stoltzfus, Julie Thompson
     IGBMC Department Seminar, February 2009, Strasbourg
  2. Background
     Outline:
     • Introduction/Motivation
     • Development
     • Features
     • Evaluation
     • Application
     • Concluding remarks
     An explosion in the amount and quality of data to be analyzed (Nature, 4 September 2008):
     • The Petabyte era (10^15 bytes): a new generation of DNA sequencers is up and running, feeding genome annotation, protein function and structure prediction, homolog searches, SNP prediction, etc.
     • New tools are needed for the emerging individual-based genomic sciences and medicine: population genomics, pharmacogenomics, evolutionary genomics
     • So much new data demands large-scale automated analysis: interactome, gene expression, microRNA evolution, etc.
     • Integrative biology: data mining, analysis and integration
  3. Other Challenges
     • Powerful tools for evolutionary analysis remain under-utilized and difficult to apply
     • Current tools are mainly used in an expert-supervised way, which is time-consuming, difficult to document, error-prone and not scalable
     • The whole pipeline used for an evolutionary analysis needs better documentation
     [Figure: a typical evolutionary-analysis pipeline (DNA extraction, sequencing and base-calling, ortholog searches, multiple alignment, alignment refinement, phylogenetic reconstruction, statistical analysis), in which each step involves its own choice of kits, conditions and tools, e.g. PHRED, BLAST, BBH, COGnitor, PSI-BLAST, Clustal, T-Coffee, MAFFT, MultAlign, Leon, REFINER, HMMs, parsimony, maximum likelihood, PAUP, Phylip, bootstrap, jackknife, Bayesian MCMC]
     New tools are needed for the automated processing of high-throughput data.
  4. NESCent: What to do?
     • [email_address]: a dozen scientific experts in phylogenetic software development got together to discuss these problems
     • We need to lower the technology barrier so that the full force of evolutionary analysis can be applied to emerging problem areas (e.g. systems biology)
     • An integrated solution would combine several technologies, including:
       - Clear workflow schemas
       - User-friendly software and web services
       - Promotion of new databases and data standards
       - Development of a standard vocabulary to represent evolutionary data → C-DAO
     http:// /
  5. Developing Standards
     • Standards for standards: formally approved standards are defined by a number of international bodies, such as the W3C
     • The modern way to standardize knowledge is to create ontologies, which have already been applied successfully in a number of other biomedical areas
     • Standardizing knowledge is a crucial step toward easy communication and data interoperability
     • Standardization does not remove diversity, but it does improve connection, documentation, annotation and scalability
     Connecting data, connecting people, connecting algorithms
  6. What is an ontology?
     • Ontology in philosophy: the study of the nature of being, existence and reality
     • Ontology and language: concepts (nouns) that describe events and entities in the real world, and relations (actions or verbs) that link these entities
     • Biomedical ontologies as a positive heuristic: a fertile research programme
     “The positive heuristic of the programme saves the scientist from becoming confused by the ocean of anomalies.” (Imre Lakatos, 1922-1974)
     “The mathematician is said to speak not about numbers, functions and infinite classes but merely about meaningless symbols and formulas manipulated according to given formal rules.” (Rudolf Carnap, 1891-1970)
  7. Huh? What is it, again?
     • A set of terms, and of relations between terms, that must be used to describe some natural phenomenon
     • The pizza ontology: defining terms
       - (Verbal) relations between terms: temMassa (hasDough), temBorda (hasCrust), temIngrediente (hasIngredient), temTopo (hasTopping), éMassaDe (isDoughOf), éTopoDe (isToppingOf)
       - Terms: Pan, Italiana (Italian), recheioCatupiry (Catupiry-filled crust), recheioQueijo (cheese-filled crust), molhoDeTomate (tomato sauce), Calabresa (calabresa sausage), Presunto (ham), QuatroQueijos (four cheeses), Pimentão (bell pepper), Cebola (onion), Ovo (egg), Frango (chicken)...
     • Instantiating the ontology
       - MinhaPizza (MyPizza) temMassa Pan
         MinhaPizza temBorda recheioQueijo
         MinhaPizza temIngrediente molhoDeTomate
         MinhaPizza temIngrediente Frango
         MinhaPizza temTopo Catupiry
       - Generating new information: nutritional value, price
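To make the pizza example concrete, here is a minimal Python sketch that stores the MinhaPizza facts as (subject, relation, object) triples, derives inverse facts, and computes a price as "new information". The price table is invented for illustration. The helper names `with_inverses` and `price` are hypothetical. The inverse pairs (temMassa/éMassaDe, temTopo/éTopoDe) are an assumption read off the relation names on the slide.

```python
# Instance facts from the pizza example, as (subject, relation, object) triples.
MY_PIZZA = [
    ("MinhaPizza", "temMassa", "Pan"),
    ("MinhaPizza", "temBorda", "recheioQueijo"),
    ("MinhaPizza", "temIngrediente", "molhoDeTomate"),
    ("MinhaPizza", "temIngrediente", "Frango"),
    ("MinhaPizza", "temTopo", "Catupiry"),
]

# Assumed inverse pairs, inferred from the relation names.
INVERSES = {"temMassa": "éMassaDe", "temTopo": "éTopoDe"}

# Hypothetical unit prices, invented for this example only.
PRICES = {"Pan": 8.0, "recheioQueijo": 3.0, "molhoDeTomate": 1.0,
          "Frango": 4.0, "Catupiry": 2.5}


def with_inverses(triples):
    """Return the triples plus every inverse triple implied by INVERSES."""
    derived = [(o, INVERSES[r], s) for s, r, o in triples if r in INVERSES]
    return triples + derived


def price(triples, pizza, prices):
    """Derive new information: sum the prices of everything linked to the pizza."""
    return sum(prices[o] for s, _, o in triples if s == pizza and o in prices)
```

With these made-up prices, `price(MY_PIZZA, "MinhaPizza", PRICES)` returns 18.5. `with_inverses(MY_PIZZA)` additionally contains `("Pan", "éMassaDe", "MinhaPizza")`. The point is only that once facts follow a fixed vocabulary, simple programs can derive facts nobody wrote down explicitly.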
  8. An ontology is the creation of a formal language, made of terms and relations between terms, that can be instantiated to formally describe real-world (natural) events.
  9. Gene Ontology
     • The first ontology created in molecular biology (2000)
     • A consortium for standardizing gene annotation
     • A standard vocabulary for describing genes in three categories:
       - Biological process
       - Molecular function
       - Cellular component (location)
  10. The GO sub-ontologies: annotating genomes with the same terms allows effective comparison
  12. Beyond the Gene Ontology
      • OBO Foundry: The Open Biomedical Ontologies
      • Anatomy ontologies
  13. GO × CDAO
      • Pre-CDAO ontologies (GO, anatomy, etc.)
        - Simple semantic relations (is_a, part_of) between the concepts; descriptive ontologies
        - Relation Ontology: limits the number of relations (verbs) to be used in descriptions
      • CDAO
        - Complex semantic relations
        - An attempt to create a true formal-logic language for describing events
        - Makes new inferences possible
      • Knowledge discovery
        - Once data have been annotated with fixed terms and relations, programs known as reasoners can read the vocabulary and make automatic inferences, which is what the Petabyte era requires
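As a rough illustration of the simplest kind of inference a reasoner draws from is_a relations, the Python sketch below computes the transitive closure of a small is_a hierarchy. It then uses that closure to expand each gene's directly asserted annotation with every implied ancestor term. The term names form a simplified, illustrative hierarchy and are not copied from a GO release. `ancestors` and `expand` are hypothetical helpers. Real reasoners work over OWL axioms and support much richer semantics than this.

```python
# A simplified, illustrative is_a hierarchy (child -> list of parents).
IS_A = {
    "DNA replication": ["DNA metabolic process"],
    "DNA metabolic process": ["nucleic acid metabolic process"],
    "nucleic acid metabolic process": ["metabolic process"],
}


def ancestors(term, is_a=IS_A):
    """All terms reachable from `term` via is_a edges (transitive closure)."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand(annotations, is_a=IS_A):
    """Add every inferred ancestor term to each gene's directly asserted terms."""
    return {
        gene: set(terms).union(*(ancestors(t, is_a) for t in terms))
        for gene, terms in annotations.items()
    }
```

For example, `expand({"geneA": ["DNA replication"]})` gives geneA the three ancestor terms as well. A query for "metabolic process" would therefore find geneA even though nobody annotated it with that term directly.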
  14. MIAPA integration
      • MIAME: Minimum Information About a Microarray Experiment (Nat Genet, 2001)
        - Formal documentation of the minimum information needed to reproduce the experiment
      • MIAPA: Minimum Information About a Phylogenetic Analysis (OMICS, 2006)
  15. Algorithm for CDAO
      IF
        Petabyte era, big data AND
        Non-scalability of modern evolutionary analysis AND
        Science as language creation AND
        We know the standards for creating standards AND
        The biomedical community knows how to use ontologies (GO)
      THEN
        We are going to create this evolutionary ontology and help people use and talk about evolution!
      However...
  16. Evolution as the core
      “Nothing in biology makes sense except in the light of evolution.” (T. Dobzhansky, 1900-1975)
      • The central role of evolutionary biology
        - Every data collection made in biology can be viewed from an evolutionary perspective
        - CDAO must be able to represent virtually any data collection in biology from an evolutionary perspective! From biochemistry to zoology, genetics to botany, genomics to ecology, microbiology to development, physiology, medicine and so on...
      • And... there are controversies among scholars...
        - What is a species? What is an OTU? Should evolutionary characters be homologous? Darwin's selectionism or Kimura's neutralism? Gradualism or punctuated equilibrium? Phenetics or cladistics? Parsimony or likelihood?
      Both phenetic and cladistic data are supported in C-DAO.
  17. CDAO is aimed at formalizing the structure of knowledge in evolutionary analysis:
      • To represent both the data and the objective classification (tree) of the compared entities, the methods used in the analysis and other relevant information
      • To map the stepwise history of evolution, including a chronicle of character-modification events
      • To make biological inferences about the present (propagating knowledge)
      • To cope with the different views and paradigms applied in modern evolutionary biology
  18. Ontology development steps:
      1. Specification: use cases, e.g. protein family alignment, modelling character evolution, functional inference, human variation, Bayesian supertrees, determining concordance between two or more phylogenies, estimating divergence times, determining the genome-wide distribution of Ks (silent-site substitutions), tree reconciliation (orthology analysis), etc.
      2. Representation
      3. Conceptualization: define the concepts; define the relations between concepts (semantics); define numeric restrictions
      4. Implementation
      5. Evaluation, then back to step 3 (reconceptualization)
  20. [Figure: data integration and data representation]
  22. Evaluation
      • Translation of real test cases represented in NEXUS files into C-DAO instances
      • C-DAO internal format (RDF/XML excerpt):

```xml
<cdao:Node rdf:ID="inode15">
  <cdao:part_of rdf:resource="#Tree_con_50_majrule"/>
  <cdao:belongs_to_Edge rdf:resource="#edge_inode15_inode14"/>
  <cdao:belongs_to_Edge rdf:resource="#edge_Athaliana_CAB79970_inode15"/>
  <cdao:belongs_to_Edge rdf:resource="#edge_Athaliana_AAD31363_inode15"/>
  <cdao:belongs_to_Edge_as_Child rdf:resource="#edge_inode15_inode14"/>
  <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Athaliana_CAB79970_inode15"/>
  <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Athaliana_AAD31363_inode15"/>
  <cdao:nca_node_of rdf:resource="#set_nca_44"/>
</cdao:Node>
<cdao:Directed_Edge rdf:ID="edge_Athaliana_CAB79970_1_inode15">
  <cdao:part_of rdf:resource="#Tree"/>
  <cdao:has_Parent_Node rdf:resource="#node_inode15"/>
  <cdao:has_Child_Node rdf:resource="#node_Athaliana_CAB79970_1"/>
  <cdao:has_Annotation rdf:resource="#edge_Athaliana_CAB79970_1_inode15_length"/>
</cdao:Directed_Edge>
<cdao:Edge_Length rdf:ID="edge_Athaliana_CAB79970_1_inode15_length">
  <cdao:has_Value rdf:datatype="&xsd;float">0.009539</cdao:has_Value>
</cdao:Edge_Length>
```

      http:// /
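A minimal Python sketch of reading such instances back, using only the standard library's `xml.etree.ElementTree`, lists each `cdao:Directed_Edge` as a (parent, child, length) tuple. The excerpt on the Evaluation slide has no namespace declarations, so the sample wraps an abridged copy of the edge and its length in an `rdf:RDF` root. The `cdao` namespace URI here is a placeholder assumption, not the official one. `&xsd;float` is spelled out as the XML Schema URI so the fragment parses without a DTD. `list_edges` and `ref` are hypothetical names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CDAO = "http://example.org/cdao#"  # placeholder namespace (assumption)

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:cdao="{CDAO}">
  <cdao:Directed_Edge rdf:ID="edge_Athaliana_CAB79970_1_inode15">
    <cdao:part_of rdf:resource="#Tree"/>
    <cdao:has_Parent_Node rdf:resource="#node_inode15"/>
    <cdao:has_Child_Node rdf:resource="#node_Athaliana_CAB79970_1"/>
    <cdao:has_Annotation rdf:resource="#edge_Athaliana_CAB79970_1_inode15_length"/>
  </cdao:Directed_Edge>
  <cdao:Edge_Length rdf:ID="edge_Athaliana_CAB79970_1_inode15_length">
    <cdao:has_Value rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.009539</cdao:has_Value>
  </cdao:Edge_Length>
</rdf:RDF>"""


def q(ns, local):
    """Build an ElementTree '{namespace}local' qualified name."""
    return f"{{{ns}}}{local}"


def ref(element, prop):
    """Return the local ID (without '#') that a property element points to, or None."""
    child = element.find(q(CDAO, prop))
    return None if child is None else child.get(q(RDF, "resource")).lstrip("#")


def list_edges(xml_text):
    """Return (parent, child, length) for every cdao:Directed_Edge in the document."""
    root = ET.fromstring(xml_text)
    # Index Edge_Length values by their rdf:ID so annotations can be resolved.
    lengths = {
        el.get(q(RDF, "ID")): float(el.find(q(CDAO, "has_Value")).text)
        for el in root.iter(q(CDAO, "Edge_Length"))
    }
    return [
        (ref(e, "has_Parent_Node"), ref(e, "has_Child_Node"),
         lengths.get(ref(e, "has_Annotation")))
        for e in root.iter(q(CDAO, "Directed_Edge"))
    ]
```

Here `list_edges(SAMPLE)` yields one edge from `node_inode15` to `node_Athaliana_CAB79970_1` with length 0.009539. In the excerpt, the node is declared as `inode15` but referenced as `#node_inode15`. A full converter would therefore need consistent IDs before references between instances could be resolved.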
  23. EvolHHuPro (Evolutionary Histories of the Human Proteome): Application
      • Ongoing ANR project between IGBMC, Strasbourg (O. Poch) and Univ. Provence, Marseille (P. Pontarotti)
      • Aim: to describe the evolutionary history of modifications of each human protein since the rise of vertebrates
      • Data will be stored and represented as CDAO entities
      [Figure: human, mouse, frog and fish proteins annotated with evolutionary events such as active-site changes, duplication, domain loss, recombination and mutation]
  24. EvolHHuPro (Evolutionary Histories of the Human Proteome): Application
      [Figure: GO annotations (GOAnno) and evolutionary events linked to gene1, gene2 and gene3]
  25. And so far, CDAO...
      • Allows large datasets to be represented (syntactics, data representation)
      • Allows heterogeneous datasets to be combined (data integration)
      • Provides strict concepts, so that researchers speak a standard vocabulary (avoiding a Tower of Babel problem)
      • Allows logical inferences and knowledge propagation to be made automatically (semantics), e.g.:
          IF TU1 has_annotation GO:0006260
          AND TU2 has no annotation
          AND TU3 has_annotation GO:0006260
          AND TU1, TU2 and TU3 form a monophyletic clade
          THEN TU2 has_annotation GO:0006260
      [Figure: tree with terminal units TU1, TU2, TU3 and ancestral nodes AN1, AN2]
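The annotation-propagation rule on the "And so far, CDAO..." slide can be sketched in Python as follows. The slide's tree figure did not survive, so the topology used here is an assumption: AN1 is the root, and AN2 groups TU2 and TU3. The helper names `leaves` and `propagate` are hypothetical. The sketch also generalizes the rule in a conservative way that the slide does not state. It propagates only when at least two members of the clade are annotated and all of them agree.

```python
# Assumed topology (the original figure is lost): AN1 is the root, AN2 groups TU2 and TU3.
CHILDREN = {
    "AN1": ["TU1", "AN2"],
    "AN2": ["TU2", "TU3"],
}


def leaves(node):
    """Terminal units (TUs) below `node`, in left-to-right order."""
    kids = CHILDREN.get(node)
    if not kids:
        return [node]
    out = []
    for kid in kids:
        out.extend(leaves(kid))
    return out


def propagate(clade_root, annotations):
    """Copy a shared annotation onto unannotated TUs of the clade rooted at clade_root.

    An empty string or a missing key means "no annotation". The input is not modified.
    """
    members = leaves(clade_root)
    annotated = [annotations[tu] for tu in members if annotations.get(tu)]
    result = dict(annotations)
    if len(annotated) >= 2 and len(set(annotated)) == 1:
        term = annotated[0]
        for tu in members:
            if not result.get(tu):
                result[tu] = term
    return result
```

With `{"TU1": "GO:0006260", "TU2": "", "TU3": "GO:0006260"}`, calling `propagate("AN1", ...)` assigns GO:0006260 to TU2, as on the slide. If TU1 and TU3 disagree, nothing is propagated. Requiring agreement keeps the inference from inventing annotations when the clade's evidence conflicts.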
  26. Future Challenges
      • Verify the usability of the ontology for evolutionary biologists
      • Develop new tools for data-format conversion
      • Integrate C-DAO into a generic workflow of evolutionary biology software (Arlin Stoltzfus)
      • Integrate CDAO with other ontologies (MAO, SO, AA, anatomy) for specific applications
      • Expand terms and concepts to allow a broader representation of evolutionary and comparative data
  27. Conclusions
      • C-DAO is a prototype of a well-annotated ontology that represents key concepts in evolutionary analysis, such as:
        - Phylogenetic trees of the entities being compared
        - Character-state data representing the attributes of those entities
        - Methodological annotation of the procedures used in the analysis (integration with MIAPA)
        - Evolutionary changes in characters over time
      • It aims to facilitate communication, annotation, program interoperability, data integration and automated analysis of large-scale evolutionary datasets
      http://
  28. Publications
  29. Acknowledgements
      https:// /
      • Evo-info working group: Jonathan Eisen, Joe Felsenstein, Mark Holder, John Huelsenbeck, Sergei L. Kosakovsky Pond, Sudhir Kumar, Paul O. Lewis, Aaron Mackey, David Maddison, Wayne Maddison, Weigang Qiu, Andrew Rambaut, Arlin Stoltzfus, David L. Swofford, Rutger Vos, Xuhua Xia, Christian Zmasek
        Affiliations listed: UC Davis Genome Center, UC Davis, CA; Department of Genome Sciences/Biology, Seattle, WA; School of Computational Science, FSU, Tallahassee, FL; University of California, San Diego, CA; Antiviral Research Center, UC, San Diego, CA; Center for Evolutionary Functional Genomics, Tempe, AZ; University of Connecticut, Storrs, CT; GlaxoSmithKline, King of Prussia, PA; Department of Entomology, UA, Tucson, AZ; Departments of Zoology and Botany, UBC, Vancouver, BC; Department of Biological Sciences, HCCUNY, New York, NY; Zoology Department, University of Oxford, Oxford, UK; Institute of Evolutionary Biology, UE, Edinburgh, UK; School of Computational Science, FSU, Tallahassee, FL; University of British Columbia, Vancouver, BC (Canada); Biology Department, University of Ottawa, Ottawa, ON; Burnham Institute for Medical Research, La Jolla, CA
      • EvolHHuPro/LBGI working group: Pierre Pontarotti, Elodie Darbo, Philippe Gouret, Olivier Poch and LBGI members
      • Graduate Program in Genomic Sciences and Biotechnology, UCB
  30. Julie Thompson, Enrico Pontelli, Brandon Chisham, Arlin Stoltzfus
      Visit our web page at
      Dr. Francisco Prosdocimi
  31. CDAO meeting, August 2009, Las Cruces, New Mexico