
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update


The Evidence and Conclusion Ontology (ECO) describes types of evidence relevant to biological investigations. First developed in the early 2000s, ECO now consists of over 1700 defined classes and is used by a large, and growing, list of resources. ECO imports close to 1000 classes from the Ontology for Biomedical Investigations and the Gene Ontology for use in logical definitions. Historically, ECO terms have generally been categorized by either the biological context of the evidence (e.g. gene expression) or the technique used to generate the evidence (e.g. PCR-based evidence). The result is that sometimes terms that have related biological context are found under different unrelated nodes. To address this, we have been performing a rigorous review of the structure and logic of the branches of ECO. Working with additional input from collaborators through the issue tracker on GitHub, term labels, definitions, and relationships are being evaluated and updated. The goal of these changes is to increase the logical consistency of ECO, make it easier for users to find and understand terms, and allow for ECO to continue to grow and support its users. In addition to the structural review, we have been working with CollecTF to utilize ECO for automated text mining. To generate a curated corpus for this effort, we have been annotating ECO terms to sentences which contain evidence-based assertions about gene products, taxonomic entities, and sequence features. From this effort we have developed clearly-defined annotation guidelines that have been passed on to a team of undergraduates who are continuing the curation effort.
Annotations are limited to single sentences, or to two consecutive sentences, containing the evidence instance and assertion clause. The quality of the mapping to ECO and the strength of the author's assertion are also captured. ECO is freely available at http://evidenceontology.org/ and https://github.com/evidenceontology.
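
The annotation constraints described above (one sentence or two consecutive sentences, plus a mapping quality and an assertion strength) can be illustrated with a minimal record type. This is only a sketch: the class name, the field names, and the use of sentence positions are hypothetical, and the abstract does not specify the scales used for mapping quality or assertion strength, so they are left as free-form strings here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvidenceAnnotation:
    """Hypothetical record for one ECO annotation on article text."""
    eco_term: str                      # ECO class the sentence(s) are mapped to
    sentence_indices: Tuple[int, ...]  # positions of the annotated sentences in the article
    mapping_quality: str               # quality of the mapping to ECO (scale not given in the abstract)
    assertion_strength: str            # strength of the author's assertion (scale not given in the abstract)

    def __post_init__(self):
        n = len(self.sentence_indices)
        # An annotation covers a single sentence or two sentences.
        if n not in (1, 2):
            raise ValueError("an annotation covers one or two sentences")
        # When two sentences are annotated, they must be adjacent.
        if n == 2 and self.sentence_indices[1] - self.sentence_indices[0] != 1:
            raise ValueError("two annotated sentences must be consecutive")
```

Validating at construction time keeps out-of-guideline annotations from entering the corpus, which is one plausible way to enforce written guidelines across a team of curators.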

Published in: Education


The Evidence and Conclusion Ontology systematically describes scientific evidence types that support biological assertions. ECO is structured around two root classes: 'evidence' and 'assertion method'. Terms describing types of evidence are grouped under 'evidence', while 'assertion method' provides a mechanism for recording whether a particular assertion was made by a human or in an automated fashion. ECO supports >20 user groups with their annotation efforts; e.g. UniProt-Gene Ontology Annotation [1] (UniProt-GOA) has >628 million evidence-linked GO annotations [2]. ECO is released into the public domain under the CC0 1.0 Universal (CC0 1.0) license.

James B. Munro (1), Elizabeth T. Hobbs (2), Suvarna Nadendla (1)*, Rebecca C. Tauber (1)*, Stephen Goralski (2), Ivan Erill (2), Marcus C. Chibucos (1), & Michelle Giglio (1)
(1) Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
(2) Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD
*Contact: email - rctauber@gmail.com; snadendla@som.umaryland.edu

Collaborations: Thank you to our collaborators and various user groups for supporting the growth of ECO.
ECO is supported by the National Science Foundation (NSF) Division of Biological Infrastructure (DBI) under Award Number 1458400.
Find us at http://evidenceontology.org/

1. E.C. Dimmer, R.P. Huntley, Y. Alam-Faruque, T. Sawford, C. O'Donovan, M.J. Martinet, … R. Apweiler. (2012). The UniProt-GO Annotation database in 2011. Nucleic Acids Res., 40, D565–D570.
2. M.C. Chibucos, D.A. Siegele, J.C. Hu, M. Giglio. (2017). The Evidence and Conclusion Ontology (ECO): Supporting GO Annotations. Methods in Mol. Biol., 1446, 245–259.
3. S. Kilic, E.R. White, D.M. Sagitova, J.P. Cornish, & I. Erill. (2014). CollecTF: A database of experimentally validated transcription factor-binding sites in bacteria. Nucleic Acids Res., 42, D156–D160.
4. The Gene Ontology Consortium. (2015). Gene Ontology Consortium: going forward. Nucleic Acids Res., 43, D1049–D1056.
5. A. Bandrowski, R. Brinkman, M. Brochhausen, M.H. Brush, B. Bug, M.C. Chibucos, et al. (2016). The Ontology for Biomedical Investigations. PLoS One, 11(4), e0154556.
6. L.M. Schriml, E. Mitraka, J. Munro, B. Tauber, M. Schor, L. Nickle, V. Felix, L. Jeng, C. Bearer, et al. (2019). Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res., 47(D1), D955–D962.
7. M.C. Chibucos, A.E. Zweifel, J.C. Herrera, W. Meza, S. Eslamfam, P. Uetz, … M.G. Giglio. (2014). An ontology for microbial phenotypes. BMC Microbiology, 14, 294.
8. Wikidata. https://www.wikidata.org/wiki/Wikidata:Main_Page

  • Currently, there are 1760 terms in ECO. All the terms have textual definitions.
  • 1339 ECO terms have logical definitions. Of these, 186 have logical definitions that link out to other vocabularies such as the GO [4] and the OBI [5], 1147 have logical definitions linking the class to an ECO assertion method, and 6 have logical definitions linking to other internal classes.

Future direction
  • Continue to work with our collaborators.
  • Collaboration with:
      • the Confidence Information Ontology, to expand the model for capturing confidence information;
      • the Human Disease Ontology [6], to incorporate classes representing definition sources;
      • the Ontology for Microbial Phenotypes [7], to expand classes for phenotype annotations;
      • the Ontology for Biomedical Investigations [5], to complete the harmonization project;
      • the Gene Ontology [4], to continue supporting the representation of evidence in gene product annotations;
      • Wikidata [8], to support annotations of genes, proteins and diseases in its structured data storage repository.

[Figure panels: Increased Logical Consistency; Node Expansion]

We have been working with CollecTF [3] to utilize ECO for an automated text mining effort.
As a part of this project, a curated corpus of high-quality experimental evidence annotations covering gene products, sequence features, phenotypes, taxonomy/phylogeny, etc. is generated from sentences in scientific articles. This corpus is used as an annotated training set for building an automated text mining model.

[Figure panels: Guidelines for annotation; Annotation process; Interactive Text Mining (Future plan); Before; After; deprecated]

Inter-Annotator Agreement
Kappa equation: $\kappa = (A_o - A_e) / (1 - A_e)$, where $A_o$ = observed agreement and $A_e$ = expected agreement.
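
The kappa statistic named above, with Ao as observed agreement and Ae as expected agreement, takes the standard form kappa = (Ao - Ae) / (1 - Ae). The sketch below computes it for two annotators, assuming Cohen's variant, in which Ae is derived from each annotator's own label frequencies. The poster does not say which kappa variant was used, and the function name and toy labels are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Ao: fraction of items on which both annotators chose the same label.
    a_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Ae: agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    a_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if a_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined (0/0), so report full agreement by convention.
        return 1.0
    return (a_o - a_e) / (1 - a_e)


# Toy example with placeholder labels standing in for ECO terms.
annotator_1 = ["X", "X", "Y", "Y"]
annotator_2 = ["X", "Y", "Y", "Y"]
print(cohen_kappa(annotator_1, annotator_2))  # Ao = 0.75, Ae = 0.5, kappa = 0.5
```

Because kappa subtracts chance agreement, it gives a more conservative picture than raw percent agreement when a few ECO terms dominate the annotations.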
