Ontology development and use for efficient information input and retrieval
1 Alice Clara Augustine, Vijayalakshmi K, Shobha Char, Naveen Sylvester, Mittur N Jagadish; 2 Mike Edgerton
1 Monsanto Research Centre, Divn. of Monsanto Holdings Pvt. Ltd., #44/2A, “Vasants Business Park”, Bellary Road, NH-7, Hebbal, Bangalore 560 092, India
2 Monsanto Company, 800 North Lindbergh Blvd., Creve Coeur, Missouri 63167, United States
Seventh Agricultural Ontology Services Workshop, November 9-10, 2006, Bangalore (India)
What is an ontology; desiderata
What we have done at Monsanto
Tools used, methods, etc.
Example of information retrieval
a common vocabulary
a shared understanding for people and machines
an established list of standardized terminology for use in indexing and retrieval of information.
What is an Ontology?
For our purposes, an ontology is a set of terms encompassing a domain of biology that is organized according to biological relationships.
Why do we need an ontology?
There is tremendous variation in the way in which phenotypes, traits, gene expression and protein localization are described. In addition, the nomenclature used to describe anatomy and development varies across taxa. For example:
1. A plant that flowers late can be described in many ways (late flowering, delayed flowering, flowers at 36 days after sowing).
2. Panicle, ear and tassel are all words used to describe an inflorescence.
To make meaningful comparisons within and across different databases, we need a shared descriptive language that is uniformly applied to the data.
Slide courtesy of the Plant Ontology Consortium (POC)
Ontology Desiderata
Precision: the degree of mutual agreement, or strict conformity to a rule or a standard (e.g. Gene Ontology terms)
Systematic: characterized by order and planning (e.g. hierarchy)
Explicitness: unambiguous, indicating a single clearly defined meaning (e.g. ‘Flower’)
Flexibility: the quality of being adaptable or variable (e.g. thesaurus)
To establish a company standard Plant Ontology to curate / edit data points in varied databases in a consistent manner
To incorporate a hierarchy indicating relationships amongst terms
To associate terms with very succinct definitions (glossary) for
To build in a thesaurus for variability / adaptability
To ensure the use of this standardized terminology for indexing, entry and retrieval of plant-relevant biological information
To be a part of the public Ontology effort (Plant Ontology Consortium; GO and TO - Gramene) in an attempt to improve both our ontology and the public ontology
Objectives of the Ontology team at Monsanto
Terms sourced from literature (dicot and monocot), Plant Biology books, public databases (Gene Ontology, Plant
Covers plant morphology/anatomy and developmental stages, trait terms & phenotypes
Aligned with terms in the Plant Ontology (PO) and Trait Ontology (TO) (shared with POC and Gramene).
Facilitates Thesaurus building for enhanced information retrieval
Includes glossary of terms
Salient features: Monsanto ontology
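The thesaurus idea can be pictured with a minimal sketch: every synonym points back to one preferred term, so different wordings retrieve the same records. The dictionary contents, the function names and the choice of "late flowering" as the preferred term are assumptions made for illustration; they are not the actual Monsanto thesaurus.

```python
# Hypothetical thesaurus: preferred term -> list of synonyms.
# The entry reuses the flowering example from the slides; the choice of
# preferred term is an assumption.
THESAURUS = {
    "late flowering": ["delayed flowering"],
}


def normalize(text):
    """Lower-case the text and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def build_index(thesaurus):
    """Map every label (preferred term or synonym) to its preferred term."""
    index = {}
    for preferred, synonyms in thesaurus.items():
        for label in [preferred, *synonyms]:
            index[normalize(label)] = preferred
    return index


def lookup(index, text):
    """Return the preferred term for a free-text label, or None if unknown."""
    return index.get(normalize(text))


INDEX = build_index(THESAURUS)
print(lookup(INDEX, "Delayed  Flowering"))
print(lookup(INDEX, "late flowering"))
```

Both calls resolve to the same preferred term, which is the property that lets a curator enter either wording and a user retrieve both.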
The structure: Directed Acyclic Graph (DAG).
The tool: DAG-Edit tool
Biological concepts are represented as a tree.
Branches represent broader terms
Leaves are more specific terms.
Like a simple hierarchy, child terms are not allowed to be their own ancestors; hence cycles are forbidden.
However, unlike a simple hierarchy, child terms are allowed to have more than one parent, so a single child can take part in several child-to-parent relationships.
Term organization & tools
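The two DAG rules (several parents allowed, cycles forbidden) can be sketched in a few lines. This is not how DAG-Edit is implemented; the class name, method names and the sample parent links are assumptions chosen only to illustrate the rules.

```python
class TermGraph:
    """Minimal directed acyclic graph of ontology terms (illustrative only)."""

    def __init__(self):
        # term -> set of its direct parent terms
        self.parents = {}

    def add_term(self, term):
        self.parents.setdefault(term, set())

    def ancestors(self, term):
        """Return every term reachable by following parent links upward."""
        seen = set()
        stack = list(self.parents.get(term, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def add_parent(self, child, parent):
        """Link child to parent, refusing any link that would close a cycle."""
        self.add_term(child)
        self.add_term(parent)
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} -> {parent} would create a cycle")
        self.parents[child].add(parent)


graph = TermGraph()
# Assumed links: one child term with two parents.
graph.add_parent("stamen", "floral organ")
graph.add_parent("stamen", "flower")
graph.add_parent("floral organ", "organ")
print(sorted(graph.ancestors("stamen")))

try:
    graph.add_parent("organ", "stamen")  # would make stamen its own ancestor
except ValueError as error:
    print("rejected:", error)
```

Checking ancestors before each insertion keeps the graph acyclic at all times, at the cost of one upward traversal per new link.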
Instance of (is a, type of): Used to describe the relationship between a child term that represents a specific type of a more general parent term. For example: a silique is a type of fruit; a panicle is an inflorescence.
Part of: Used to indicate the relationship between a child term that is a part of the parent term. For example: the ectocarp is a part of the pericarp, which in turn is part of the fruit.
Develops from: Used to describe the relationship between a child term that develops from its parent term. For example: a seed coat (testa) develops from the integuments; a leaf develops from a leaf primordium.
Each 'child term' has a unique relationship to its 'parent term'
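One simple way to hold the three relationship types is as (child, relation, parent) triples. The sketch below uses only the examples given on the slide; the identifiers `is_a`, `part_of` and `develops_from` and the helper names are assumptions, not the notation of any particular tool.

```python
# (child term, relationship type, parent term), taken from the slide examples.
RELATIONS = [
    ("silique", "is_a", "fruit"),
    ("panicle", "is_a", "inflorescence"),
    ("ectocarp", "part_of", "pericarp"),
    ("pericarp", "part_of", "fruit"),
    ("seed coat", "develops_from", "integuments"),
    ("leaf", "develops_from", "leaf primordium"),
]


def parents_of(term, relation):
    """Direct parents of a term under one relationship type."""
    return [p for c, r, p in RELATIONS if c == term and r == relation]


def follow(term, relation):
    """All terms reached by repeatedly following one relationship type."""
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for parent in parents_of(current, relation):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


print(follow("ectocarp", "part_of"))    # ['pericarp', 'fruit']
print(parents_of("silique", "is_a"))    # ['fruit']
```

Following `part_of` repeatedly reproduces the chain from the slide: the ectocarp is part of the pericarp, which in turn is part of the fruit.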
A graphical view of some terms and their relationships (vehicle example)
Terms shown: 4 wheelers, Two wheelers, Car, Bus, Truck, SUV, Safari, SUMO, Sierra, Indica V2, Indigo, Indiva, TATA truck, Leyland, Tyre, Carburetor, Clutch plate, Shock absorber
Relationship types shown: Part of, Instance of
The terms can be children of other parents too
A graphical view of some terms and their relationships: Plant Ontology (PO)
Terms shown: anatomy, whole plant, shoot, inflorescence, flower, organ, floral organ, tissue, stamen, anther, filament, tapetum, petal, petal primordium
Gene shown: Yfg1
Relationship types shown: Part of, Instance of, Develops from
Examples of queries that will be possible using annotations with POC terms:
What genes are expressed in the same tissues or organs as my gene of interest?
When and where are homologous genes expressed in different organisms?
What genes are expressed in both monocot and dicot flowers?
What genes are expressed in maize leaves but NOT in Arabidopsis or rice?
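Once expression data are annotated with shared terms, queries of this kind reduce to set operations. The sketch below is an assumption-laden illustration: the gene names (gene_A, gene_B, gene_C) and every annotation row are invented placeholders, not real expression data, and the function name is hypothetical.

```python
# Invented placeholder annotations: (gene family, species, PO term).
ANNOTATIONS = [
    ("gene_A", "maize", "leaf"),
    ("gene_A", "rice", "leaf"),
    ("gene_B", "maize", "leaf"),
    ("gene_B", "maize", "flower"),
    ("gene_C", "Arabidopsis", "flower"),
    ("gene_C", "maize", "flower"),
]


def genes_expressed(species, term):
    """Set of genes annotated to a term in a given species."""
    return {g for g, s, t in ANNOTATIONS if s == species and t == term}


# Expressed in maize leaves but NOT in Arabidopsis or rice leaves.
maize_leaf_only = (
    genes_expressed("maize", "leaf")
    - genes_expressed("Arabidopsis", "leaf")
    - genes_expressed("rice", "leaf")
)

# Expressed in both monocot (maize, rice) and dicot (Arabidopsis) flowers.
monocot_flower = genes_expressed("maize", "flower") | genes_expressed("rice", "flower")
both_flowers = monocot_flower & genes_expressed("Arabidopsis", "flower")

print(maize_leaf_only)  # {'gene_B'}
print(both_flowers)     # {'gene_C'}
```

The queries only work because "leaf" and "flower" mean the same thing in every species' annotations, which is exactly what the shared vocabulary provides.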
Terms are then used to annotate genes
Trait ontology terms shown: Stress related, abiotic, WUE, NUE, Cold, Yield related, Plant anatomy related, Root related, Shoot related, Fruit related, Pod related, Root mass, Root volume, Root weight, Pod length, Pod shatter, Pod weight
Genes shown: ALC, P5CS, AGL8
The terms can be children of other parents too
Example queries:
What are the genes that are associated with conferring a pod shatter related trait?
What are the genes that are associated with WUE AND increased root mass phenotype?
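A hierarchy lets a query on a broad trait term also pick up genes annotated to its more specific children. The sketch below assumes one plausible reading of the figure for the parent links, and the gene-to-trait assignments are purely illustrative guesses; neither is taken from the actual Monsanto annotations.

```python
# Assumed child -> parents links (one possible reading of the figure).
PARENTS = {
    "pod length": ["pod related"],
    "pod shatter": ["pod related"],
    "pod weight": ["pod related"],
    "root mass": ["root related"],
    "root volume": ["root related"],
    "root weight": ["root related"],
    "WUE": ["abiotic"],
    "NUE": ["abiotic"],
    "cold": ["abiotic"],
    "abiotic": ["stress related"],
}

# Illustrative gene -> trait term annotations (assumed, not real data).
GENE_TRAITS = {
    "ALC": {"pod shatter"},
    "AGL8": {"pod shatter"},
    "P5CS": {"WUE", "root mass"},
}


def with_descendants(term):
    """Return the term together with all terms below it in the hierarchy."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parents in PARENTS.items():
            if child not in result and any(p in result for p in parents):
                result.add(child)
                changed = True
    return result


def genes_for(term):
    """Genes annotated to the term or to any of its descendants."""
    terms = with_descendants(term)
    return {gene for gene, traits in GENE_TRAITS.items() if traits & terms}


print(sorted(genes_for("pod related")))               # ['AGL8', 'ALC']
print(genes_for("WUE") & genes_for("root mass"))     # {'P5CS'}
print(genes_for("stress related"))                    # {'P5CS'}
```

The AND query from the slide becomes a set intersection, and the query on "stress related" succeeds even though no gene is annotated to that term directly.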
Using linguistics for database interoperability
Cross-database querying and reporting
Common understanding amongst all
Reuse of domain knowledge
Standards for interoperability
A community reference.
A common framework for integration
Support for knowledge
Hypothesis generation.
Querying, indexing & data capture.
CV-Mediated Benefits of Data Integration, Consistency and Accuracy
Common Platform, User Interface & Format of Unified CV
Alignment with external public databases (GO / TO / PO)
Heterogeneous sources / heterogeneous IDs.
Inconsistency in annotation (tackling legacy ontologies)
Implementation of a Company Standard Plant Ontology that would ultimately cater to the integration of varied kinds of databases with diverse data types.
Acknowledgements
Jagadish Mittur
Shobha Char
Vijay Paranjape
BSIB team@MRC, Bangalore
The organizers of this conference, in particular Dr. Gauri Salokhe (Information Management Officer) and Dr. V. C. Patil (The Organizing Secretary)