Ontology development and use for efficient information input and retrieval
 

Authors: Alice Clara Augustine, Vijayalakshmi K, Shobha Char, Naveen Sylvester, Mittur N Jagadish; Mike Edgerton

Organizations: Monsanto Research Centre, Monsanto Company

Usage Rights: CC Attribution-NonCommercial-ShareAlike License

  • Title slide with Author names
  • Why do we need controlled vocabularies?
  • 1 & 2 are completed; 3 is more than 50% done; 4 to 7 are proposed.
  • Annotation with POC terms will make queries like these easier to perform.
  • This is an extra slide as Acknowledgements are already mentioned at the bottom of the poster.

Presentation Transcript

  • Ontology development and use for efficient information input and retrieval
    Alice Clara Augustine, Vijayalakshmi K, Shobha Char, Naveen Sylvester, Mittur N Jagadish (1); Mike Edgerton (2)
    (1) Monsanto Research Centre, Division of Monsanto Holdings Pvt. Ltd., #44/2A, "Vasants Business Park", Bellary Road, NH-7, Hebbal, Bangalore 560 092, India
    (2) Monsanto Company, 800 North Lindbergh Blvd., Creve Coeur, Missouri 63167, United States
    Seventh Agricultural Ontology Services Workshop, November 9-10, 2006, Bangalore (India)
  • Outline
    • What is an ontology; ontology desiderata
    • What we have done at Monsanto
    • Tools and methods used
    • Example of information retrieval
    • Challenges
  • What is an Ontology?
    For our purposes, an ontology is a set of terms encompassing a domain of biology, organized according to biological relationships. It provides:
    • a common vocabulary
    • a shared understanding for people and machines
    • an established list of standardized terminology for use in indexing and retrieval of information.
  • Why do we need an ontology?
    There is tremendous variation in the way phenotypes, traits, gene expression and protein localization are described. In addition, the nomenclature used to describe anatomy and development varies across taxa. For example:
    1. A plant that flowers late can be described in many ways (late flowering, delayed flowering, flowers at 36 days after sowing).
    2. Panicle, ear and tassel are all words used to describe an inflorescence.
    To make meaningful comparisons within and across different databases, we need a shared descriptive language that is applied uniformly to the data.
    (Slide courtesy of the Plant Ontology Consortium, POC)
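The curation problem above can be illustrated with a minimal controlled-vocabulary lookup: free-text descriptions entered by different curators are mapped to one shared term before they are stored. This is a sketch under stated assumptions; the synonym table, the function name `normalize` and the choice of preferred labels are hypothetical and are not taken from the Monsanto ontology or the POC files. Taxon-specific words (panicle, ear, tassel) are mapped to the broader shared term used for cross-taxa comparison.

```python
from typing import Optional

# Hypothetical synonym table (illustrative only, not from the Monsanto or POC files):
# every variant a curator might type maps to a single shared term.
SYNONYMS = {
    "late flowering": "late flowering",
    "delayed flowering": "late flowering",
    "panicle": "inflorescence",
    "ear": "inflorescence",
    "tassel": "inflorescence",
}


def normalize(description: str) -> Optional[str]:
    """Return the shared term for a free-text description, or None if it is unmapped."""
    key = " ".join(description.lower().split())  # ignore case and extra whitespace
    return SYNONYMS.get(key)


records = ["Delayed  flowering", "Late flowering", "Tassel", "Panicle"]
print([normalize(r) for r in records])
# ['late flowering', 'late flowering', 'inflorescence', 'inflorescence']
```

Under this sketch a description such as "flowers at 36 days after sowing" returns None, so it would be routed to a curator for a judgement rather than stored under an arbitrary label.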
  • Ontology Desiderata
    • Precision: the degree of mutual agreement, or strict conformity to a rule or standard (e.g. Gene Ontology terms)
    • Systematic: characterized by order and planning (e.g. a hierarchy)
    • Explicitness: unambiguous, indicating a single clearly defined meaning (e.g. 'Flower')
    • Flexibility: the quality of being adaptable or variable (e.g. a thesaurus)
  • Objectives of the Ontology team at Monsanto
    • To establish a company-standard Plant Ontology to curate/edit data points in varied databases in a consistent manner
    • To incorporate a hierarchy indicating relationships amongst terms
    • To associate terms with very succinct definitions (a glossary) for uniform understanding
    • To build in a thesaurus for variability/adaptability
    • To ensure the use of this standardized terminology for indexing, entry and retrieval of plant-relevant biological information
    • To be part of the public ontology effort (Plant Ontology Consortium; GO; and Gramene's TO) in order to improve both our ontology and the public ontology
  • Salient features: Monsanto ontology
    • Terms sourced from literature (dicot and monocot), plant biology books and public databases (Gene Ontology, Plant Ontology, Trait Ontology, RiceGenes, MaizeDB, Gramene)
    • Covers plant morphology/anatomy, developmental stages, trait terms and phenotypes
    • Aligned with terms in the Plant Ontology (PO) and Trait Ontology (TO), shared with POC and Gramene
    • Hierarchical organization
    • Facilitates thesaurus building for enhanced information retrieval
    • Includes a glossary of terms
  • Term organization & tools
    • The structure: Directed Acyclic Graph (DAG)
    • The tool: DAG-Edit
    • How the structure works:
      • Biological concepts are represented in a tree-like hierarchy:
        • branches represent broader terms;
        • leaves are more specific terms.
      • As in a simple hierarchy, a term cannot be its own ancestor, so cycles are forbidden.
      • Unlike a simple hierarchy, a child term may have more than one parent, allowing multiple child-to-parent relationships.
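The two DAG rules above (several parents allowed, cycles forbidden) can be sketched as a small graph class that rejects any edge that would make a term its own ancestor. This is a minimal sketch, assuming untyped edges and term labels borrowed from the PO slide; the class `TermDAG` and its methods are hypothetical and do not describe how DAG-Edit works internally.

```python
from collections import defaultdict


class TermDAG:
    """Minimal term graph: a child may have several parents, but cycles are rejected."""

    def __init__(self) -> None:
        self.parents = defaultdict(set)  # child term -> set of parent terms

    def ancestors(self, term: str) -> set:
        """Every term reachable by following parent links upward from `term`."""
        seen = set()
        stack = list(self.parents.get(term, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def add_edge(self, child: str, parent: str) -> None:
        """Link child to parent, refusing any edge that would close a cycle."""
        # A cycle appears if the child is the parent itself or already one of its ancestors.
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} -> {parent} would create a cycle")
        self.parents[child].add(parent)


dag = TermDAG()
dag.add_edge("stamen", "floral organ")
dag.add_edge("floral organ", "organ")
dag.add_edge("stamen", "flower")  # a second parent is allowed
try:
    dag.add_edge("organ", "stamen")  # organ is already an ancestor of stamen
except ValueError as err:
    print(err)  # organ -> stamen would create a cycle
```

Checking ancestors on every insertion is simple but costs a graph walk per edge; for an ontology of a few thousand terms that is usually acceptable.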
  • Each child term is linked to its parent term by one specific type of relationship:
    • Instance of (is a, type of): the child term is a specific type of the more general parent term. For example, a silique is a type of fruit; a panicle is an inflorescence.
    • Part of: the child term is a part of the parent term. For example, the ectocarp is part of the pericarp, which in turn is part of the fruit.
    • Develops from: the child term develops from the parent term. For example, a seed coat (testa) develops from the integuments; a leaf develops from a leaf primordium.
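Typed relationships let a query follow one kind of link at a time; for example, the part-of chain on this slide implies that the ectocarp is also part of the fruit. The sketch below stores each edge as a (child, relation, parent) triple using the slide's own examples and follows a single relation transitively. The relation names `is_a`, `part_of` and `develops_from` and the function `related` are illustrative assumptions; whether a given relation should be chained this way is a modelling decision, and the function simply follows whichever relation it is given.

```python
# Typed edges (child, relation, parent) copied from the examples on this slide;
# "is_a" stands for the slide's "Instance of (is a, type of)".
EDGES = [
    ("silique", "is_a", "fruit"),
    ("panicle", "is_a", "inflorescence"),
    ("ectocarp", "part_of", "pericarp"),
    ("pericarp", "part_of", "fruit"),
    ("seed coat", "develops_from", "integuments"),
    ("leaf", "develops_from", "leaf primordium"),
]


def related(term, relation, edges=EDGES):
    """Follow one relation type transitively upward from `term`; return every term reached."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, rel, parent in edges:
            if child == current and rel == relation and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


print(sorted(related("ectocarp", "part_of")))  # ['fruit', 'pericarp']
print(sorted(related("ectocarp", "is_a")))     # []
```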
  • [Figure: A graphical view of some terms and their relationships, illustrated with vehicles. Labels include 4 wheelers, Two wheelers, Car, Bus, Truck, SUV, Safari, SUMO, Sierra, Indica V2, Indigo, Indiva, TATA truck, Leyland, Tyre, Carburetor, Clutch plate and Shock absorber; edge types: Part of, Instance of. The terms can be children of other parents too.]
  • [Figure: Plant Ontology (PO). A graphical view of some terms and their relationships. Labels include anatomy, whole plant, shoot, inflorescence, flower, floral organ, organ, tissue, stamen, anther, filament, tapetum, petal, petal primordium and the gene Yfg1; edge types: Part of, Instance of, Develops from.]
  • Examples of queries that will be possible using annotations with POC terms:
    • What genes are expressed in the same tissues or organs as my gene of interest?
    • When and where are homologous genes expressed in different organisms?
    • What genes are expressed in both monocot and dicot flowers?
    • What genes are expressed in maize leaves but NOT in Arabidopsis or rice?
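Two of the queries above can be sketched once genes are annotated to shared anatomy terms: an annotation to a specific structure (tapetum, petal) also counts for every structure it is part of (flower), and species comparisons become set operations. This is a minimal sketch in which all genes, annotations and part-of links are hypothetical, and "NOT in Arabidopsis or rice" is read as "not in their leaves".

```python
# Hypothetical part_of links (child -> parents), using labels from the PO slide.
PART_OF = {
    "petal": {"flower"},
    "stamen": {"flower"},
    "anther": {"stamen"},
    "tapetum": {"anther"},
}

# Hypothetical annotations: (gene, species, most specific anatomy term observed).
ANNOTATIONS = [
    ("geneA", "maize", "tapetum"),
    ("geneA", "Arabidopsis", "petal"),
    ("geneB", "rice", "anther"),
    ("geneC", "maize", "leaf"),
    ("geneC", "rice", "leaf"),
    ("geneD", "maize", "leaf"),
]
MONOCOTS = {"maize", "rice"}
DICOTS = {"Arabidopsis"}


def closure(term):
    """The term plus every term it is (transitively) part of."""
    result, stack = {term}, [term]
    while stack:
        for parent in PART_OF.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def genes_in(term, species):
    """Genes annotated to `term`, or to any part of it, in any of the given species."""
    return {g for g, sp, t in ANNOTATIONS if sp in species and term in closure(t)}


# What genes are expressed in both monocot and dicot flowers?
print(sorted(genes_in("flower", MONOCOTS) & genes_in("flower", DICOTS)))  # ['geneA']
# What genes are expressed in maize leaves but not in Arabidopsis or rice leaves?
print(sorted(genes_in("leaf", {"maize"}) - genes_in("leaf", {"Arabidopsis", "rice"})))  # ['geneD']
```

The queries only work because maize, rice and Arabidopsis structures are annotated with the same terms; with taxon-specific labels, the intersection and difference would have nothing to match on.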
  • Terms are then used to annotate genes
    [Figure: Trait ontology. Term labels include Stress related, abiotic, Cold, WUE, NUE, ALC, Yield related, Pod related, Pod length, Pod shatter, Pod weight, Root related, Root mass, Root volume, Root weight, Shoot related, Fruit related and Plant anatomy related, with the genes P5CS and AGL8 shown as annotations. The terms can be children of other parents too.]
    • What are the genes that are associated with conferring a pod-shatter-related trait?
    • What are the genes that are associated with WUE AND an increased root mass phenotype?
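Trait questions like these can be answered from an inverted index (trait term to genes) combined with the trait hierarchy, so that a query on a broad term also returns genes annotated to its narrower terms, and an AND query is a set intersection. This is a minimal sketch: the hierarchy below only loosely follows the figure's labels, and the gene identifiers and their annotations are hypothetical rather than the P5CS/AGL8 assignments shown on the slide.

```python
from collections import defaultdict

# Hypothetical trait hierarchy (parent -> narrower terms), loosely following the figure's labels.
CHILDREN = {
    "yield related": {"pod related"},
    "pod related": {"pod length", "pod shatter", "pod weight"},
    "root related": {"root mass", "root volume", "root weight"},
    "abiotic": {"WUE", "NUE", "cold"},
}

# Hypothetical gene -> trait-term annotations (not real data).
GENE_TRAITS = {
    "gene1": {"pod shatter"},
    "gene2": {"WUE", "root mass"},
    "gene3": {"WUE"},
    "gene4": {"root volume", "cold"},
}

# Inverted index: trait term -> genes annotated directly to it.
INDEX = defaultdict(set)
for gene, traits in GENE_TRAITS.items():
    for trait in traits:
        INDEX[trait].add(gene)


def descendants(term):
    """The term itself plus every narrower term below it."""
    out, stack = {term}, [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), ()):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out


def genes_for(term):
    """Genes annotated to `term` or to any narrower term."""
    return set().union(*(INDEX.get(t, set()) for t in descendants(term)))


print(sorted(genes_for("pod related")))                   # ['gene1']
print(sorted(genes_for("WUE") & genes_for("root mass")))  # ['gene2']
```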
  • Using linguistics for database interoperability: cross-database querying and reporting
  • CV-mediated benefits of data integration, consistency and accuracy
    • Information sharing
      • Common understanding amongst all
      • Reuse of domain knowledge
    • Standards for interoperability
      • A community reference
      • Database interoperability
      • A common framework for integration
    • Support for knowledge-intensive applications
      • Text extraction
      • Decision support
      • Knowledge discovery
      • Hypothesis generation
    • Intelligent interface
      • Querying, indexing & data capture
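Cross-database querying works when each source's local identifiers are mapped to one shared controlled-vocabulary (CV) term, so rows can be grouped or joined on that term instead of on incompatible local labels. The sketch below assumes two hypothetical databases (one storing a free-text tissue label, the other an internal code) and a hand-written mapping table; the `CV:` prefix is a placeholder ID scheme, not a real Plant Ontology identifier.

```python
# Hypothetical rows from two source databases that identify tissues differently:
# one stores a free-text label, the other an internal code.
MAIZE_DB = [{"id": "MZ-001", "gene": "geneA", "tissue": "Leaf"}]
RICE_DB = [{"acc": "R/77", "gene": "geneB", "organ_code": "T-12"}]

# Hypothetical mapping from (source, local value) to one shared CV term ID.
# "CV:leaf" is a placeholder, not a real Plant Ontology identifier.
TO_CV = {
    ("maize_db", "Leaf"): "CV:leaf",
    ("rice_db", "T-12"): "CV:leaf",
}


def harmonize():
    """Yield (cv_term, source, local_id, gene), re-keying every row on the shared term."""
    for row in MAIZE_DB:
        yield TO_CV[("maize_db", row["tissue"])], "maize_db", row["id"], row["gene"]
    for row in RICE_DB:
        yield TO_CV[("rice_db", row["organ_code"])], "rice_db", row["acc"], row["gene"]


by_term = {}
for cv_term, source, local_id, gene in harmonize():
    by_term.setdefault(cv_term, []).append((source, local_id, gene))

print(by_term["CV:leaf"])
# [('maize_db', 'MZ-001', 'geneA'), ('rice_db', 'R/77', 'geneB')]
```

In this sketch a local value with no entry in the mapping raises a KeyError, which makes gaps in the alignment visible instead of silently dropping rows.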
  • Challenges
    • A common platform, user interface and format for the unified CV
    • Alignment with external public databases (GO / TO / PO)
    • Heterogeneous sources / heterogeneous IDs
    • Inconsistency in annotation (tackling legacy ontologies across databases)
    • Implementation of a company-standard Plant Ontology that would ultimately support the integration of varied kinds of databases with diverse data types
  • Acknowledgements
    • Jagadish Mittur, Shobha Char, Vijay Paranjape
    • BSIB team @ MRC, Bangalore
    • The organizers of this conference, in particular Dr. Gauri Salokhe (Information Management Officer) and Dr. V. C. Patil (Organizing Secretary)