Design and creation of ontologies for environmental information retrieval

  1. Design and Creation of Ontologies for Environmental Information Retrieval
     Vipul Kashyap, AOS Workshop, Rome, November 2001 [email_address]
  2. Outline
     • Ontologies for Information Retrieval: The InfoSleuth System
     • Sources for Ontology Construction
     • The Ontology Design Process:
       ◦ "Reverse engineering" from a database schema
       ◦ Ontology refinement based on user queries
     • Enhancing the Ontology:
       ◦ Using a data dictionary
       ◦ Using a thesaurus
     • Conclusions and Future Work
  3. Ontologies for Information Retrieval: The InfoSleuth System
     [Figure: InfoSleuth architecture. Labels: KQML/KIF agents; Domain Ontology; three Resource Agents; three User Agents]
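The slide labels the InfoSleuth agents as KQML/KIF agents. As a rough illustration of what such an agent message looks like, the sketch below renders a KQML performative as an s-expression: a performative name followed by `:keyword value` pairs such as `:sender`, `:receiver`, `:language`, `:ontology` and `:content`. The agent names, the choice of `ask-all`, and the KIF content `(site ?s)` are hypothetical; this is not InfoSleuth's actual message traffic.

```python
def kqml_message(performative: str, **params: str) -> str:
    """Render a KQML message as an s-expression: (performative :key value ...).

    Keyword arguments become KQML parameters in the order given; underscores
    are turned into hyphens so that reply_with is emitted as :reply-with.
    Values are inserted verbatim, so KIF content must already be well formed.
    """
    parts = [performative]
    for key, value in params.items():
        parts.append(f":{key.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"


# Hypothetical query from a user agent to a resource agent, phrased in KIF
# against the EDEN domain ontology.
message = kqml_message(
    "ask-all",
    sender="user-agent-1",
    receiver="resource-agent-1",
    language="KIF",
    ontology="EDEN",
    reply_with="q1",
    content="(site ?s)",
)
print(message)
# (ask-all :sender user-agent-1 :receiver resource-agent-1 :language KIF :ontology EDEN :reply-with q1 :content (site ?s))
```

The `:ontology` parameter is what lets user agents and resource agents agree on vocabulary: both sides interpret the content against the shared domain ontology rather than against any one database schema.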
  4. Ontologies for Information Retrieval
     • Provide a concise, uniform, declarative description of semantic information
     • Are independent of the syntactic representations and conceptual models of the underlying information bases
     • Domain models provide wider access by supporting multiple world views on the same underlying data
     • The EDEN ontology was defined in the context of the InfoSleuth system:
       ◦ it is crucial that it capture the essential elements of environmental information
  5. Sources for Ontology Construction
     • Pre-existing database schemas
       ◦ data-directed component
     • A representative set of queries, possibly parameterized according to the application's user interface
       ◦ application-directed component
     • Thesauri and vocabularies (e.g., the EEA Thesaurus)
       ◦ knowledge-directed component
     • Ontology = a knowledge-based middle ground between applications and data
  6. The Ontology Design Process
     [Flowchart with two paths, reconstructed as steps]
     • Ontology from database schema: choose a new database schema; abstract details from the database schema; determine entities and attributes; group information, analyzing foreign keys and dependencies; determine relationships
     • Ontology from queries: choose a new query; drop entities and attributes, add new entities and attributes, or add new subclasses and superclasses; stop when there are no more queries
     • Evaluate the ontology; implement and test
  7. Reverse Engineering from a Database Schema
     • Abstracting details related to:
       ◦ data organization
       ◦ local keys
     • Grouping information spread across multiple tables
     • Identifying relationships
     • Incorporating new concepts suggested by a new schema
  8. Environmental Databases
     • CERCLIS 3
       ◦ http://www.epa.gov/enviro/html/cerclis/cerclis_overview.html
     • ITT
     • HAZDAT
       ◦ http://www.atsdr.cdc.gov/hazdat.html
     • ERPIMS
       ◦ http://www.resdyn.com/erpims
     • Basel Convention Database
       ◦ http://www.unep.ch/basel
  9. Abstracting Out Details Related to Local Keys
     Database schema:
       Site: site_id (PK), site_name, site_ifms_ssid_code, site_rcra_id, site_epa_id
       Site_Characteristic: site_id (PK, FK to Site), rsic_code (PK, FK to Ref_Sic), sc_date
     Ontology:
       Site: Id, date, name, code
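A minimal sketch of this step, under stated assumptions: when a detail table such as Site_Characteristic is folded into the Site concept, the foreign key that points back at Site (its site_id) only records which site a row belongs to, so it is dropped; the other columns become attributes. We assume that the slide's `code` comes from rsic_code and `date` from sc_date; the renaming dictionary is a hypothetical, hand-supplied mapping, not something derived automatically.

```python
from typing import Dict, List, Optional, Tuple

# (column name, table it references as a foreign key, or None)
ColumnDef = Tuple[str, Optional[str]]


def fold_into_parent(parent: str, child_columns: List[ColumnDef],
                     rename: Dict[str, str]) -> List[str]:
    """Fold a child table into the ontology concept `parent`.

    The foreign key pointing back at the parent is a local join key; in the
    ontology that link is implicit, so the column is dropped. Remaining
    columns are renamed via `rename` (columns without an entry keep their
    database name).
    """
    attributes = []
    for name, references in child_columns:
        if references == parent:
            continue  # local key back to the parent: abstracted away
        attributes.append(rename.get(name, name))
    return attributes


site_characteristic = [("site_id", "Site"), ("rsic_code", "Ref_Sic"), ("sc_date", None)]
print(fold_into_parent("Site", site_characteristic,
                       {"rsic_code": "code", "sc_date": "date"}))
# ['code', 'date']
```

Note that rsic_code is kept even though it is part of the composite primary key: its composite-key role is a storage detail, but the value it carries is meaningful, which is why only the back-reference to the parent is treated as disposable.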
  10. Grouping Information in Multiple Tables
     Database schema:
       Site: site_id (PK), site_name, site_ifms_ssid_code, site_rcra_id, site_epa_id
       Site_Characteristic: site_id (PK, FK to Site), rsic_code (PK, FK to Ref_Sic), sc_date
       Ref_Sic: rsic_code (PK), rsic_code_desc
       Site_Alias: site_id (PK, FK to Site), site_alias_id (PK), sa_name
     Ontology:
       Site: date, name, code, alias_name, description
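The grouping on this slide can be sketched as a traversal of foreign keys: starting from Site, pull in detail tables that hold a foreign key into the group (Site_Characteristic, Site_Alias) and lookup tables referenced from the group (Ref_Sic), then collect their descriptive columns as attributes of one concept. This is a hypothetical heuristic, not the author's procedure; the column-to-attribute mapping in `RENAME` is our assumption about how the slide's attribute names arise.

```python
from typing import Dict, List, Optional, Set, Tuple

# Slide-10 tables encoded as: table -> [(column, FK target table or None)]
Schema = Dict[str, List[Tuple[str, Optional[str]]]]

SCHEMA: Schema = {
    "Site": [("site_id", None), ("site_name", None), ("site_ifms_ssid_code", None),
             ("site_rcra_id", None), ("site_epa_id", None)],
    "Site_Characteristic": [("site_id", "Site"), ("rsic_code", "Ref_Sic"), ("sc_date", None)],
    "Ref_Sic": [("rsic_code", None), ("rsic_code_desc", None)],
    "Site_Alias": [("site_id", "Site"), ("site_alias_id", None), ("sa_name", None)],
}


def fk_targets(schema: Schema, table: str) -> Set[str]:
    """Tables that `table` references through foreign keys."""
    return {target for _, target in schema[table] if target is not None}


def group_tables(schema: Schema, root: str) -> Set[str]:
    """Collect the tables describing `root`: detail tables with a foreign key
    into the group, and lookup tables referenced from the group. Repeats
    until a full pass adds nothing."""
    group = {root}
    changed = True
    while changed:
        changed = False
        for table in schema:
            if table in group:
                continue
            is_detail = bool(fk_targets(schema, table) & group)
            is_lookup = any(table in fk_targets(schema, member) for member in group)
            if is_detail or is_lookup:
                group.add(table)
                changed = True
    return group


def concept_attributes(schema: Schema, tables: Set[str], rename: Dict[str, str]) -> List[str]:
    """Attribute names for the grouped tables. Only columns listed in
    `rename` are kept; everything else is treated as a storage detail."""
    found = {rename[col] for t in tables for col, _ in schema[t] if col in rename}
    return sorted(found)


# Hypothetical mapping from database columns to the slide's attribute names.
RENAME = {"site_name": "name", "sc_date": "date", "rsic_code": "code",
          "sa_name": "alias_name", "rsic_code_desc": "description"}

site_tables = group_tables(SCHEMA, "Site")
print(sorted(site_tables))
# ['Ref_Sic', 'Site', 'Site_Alias', 'Site_Characteristic']
print(concept_attributes(SCHEMA, site_tables, RENAME))
# ['alias_name', 'code', 'date', 'description', 'name']
```

On a full schema this greedy traversal would also absorb tables such as Action, which references Site but is better modeled as a separate concept linked by a relationship, so in practice the designer decides where a group stops.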
  11. Identifying Relationships
     Database schema:
       Site: site_id (PK), site_name, site_ifms_ssid_code, site_rcra_id, site_epa_id
       Action: site_id (PK, FK to Site), rat_code (PK, FK to ref_action_type), act_code_id (PK)
       Ref_action_type: rat_code (PK), rat_name, rat_def
       Waste_Src_Media_Contaminated: wsmrc_nmbr (PK), site_id (PK, FK to Action), rat_code (FK to Action), act_code_id (FK to Action)
       Remedial_Response: site_id, act_code_id, rat_code
     Ontology: Site, Contaminant, RemedialResponse, PerformedAt, actionName
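One way to read this slide: once tables are assigned to concepts, a foreign key between two tables of the same concept is internal (it was handled by grouping), while a foreign key that crosses concepts becomes a relationship. The sketch below assumes a hypothetical table-to-concept assignment (Action and Ref_action_type as RemedialResponse, Waste_Src_Media_Contaminated as Contaminant). The slide names only PerformedAt, so the default name `relatedTo` for the unnamed Contaminant link is our placeholder.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical assignment of the slide-11 tables to ontology concepts.
TABLE_TO_CONCEPT: Dict[str, str] = {
    "Site": "Site",
    "Action": "RemedialResponse",
    "Ref_action_type": "RemedialResponse",  # lookup table folded in (rat_name -> actionName)
    "Waste_Src_Media_Contaminated": "Contaminant",
}

# Foreign keys from the slide, as (referencing table, referenced table).
FOREIGN_KEYS: List[Tuple[str, str]] = [
    ("Action", "Site"),
    ("Action", "Ref_action_type"),
    ("Waste_Src_Media_Contaminated", "Action"),
]

# Relationship names chosen by the ontology designer.
RELATIONSHIP_NAMES: Dict[Tuple[str, str], str] = {("RemedialResponse", "Site"): "PerformedAt"}


def derive_relationships(table_to_concept: Dict[str, str],
                         foreign_keys: List[Tuple[str, str]],
                         names: Dict[Tuple[str, str], str],
                         default: str = "relatedTo") -> List[Triple]:
    """Turn foreign keys that cross concept boundaries into relationships.

    A foreign key between two tables mapped to the same concept yields no
    relationship; duplicates (e.g. from composite keys) are emitted once.
    """
    relationships: List[Triple] = []
    for source_table, target_table in foreign_keys:
        source = table_to_concept[source_table]
        target = table_to_concept[target_table]
        if source == target:
            continue
        triple = (source, names.get((source, target), default), target)
        if triple not in relationships:
            relationships.append(triple)
    return relationships


print(derive_relationships(TABLE_TO_CONCEPT, FOREIGN_KEYS, RELATIONSHIP_NAMES))
# [('RemedialResponse', 'PerformedAt', 'Site'), ('Contaminant', 'relatedTo', 'RemedialResponse')]
```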
  12. Incorporation of New Concepts from a Different Database Schema
     Database schema:
       Sends_Export: uncode, toxics_code, toxics_description, exporter, importer
       Receives_Export: uncode, toxics_code, toxics_description, exporter, importer
     Ontology: Site, Contaminant, RemedialResponse, PerformedAt, ExportsTo, ImportsFrom, Country
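Incorporating a second schema amounts to merging the concepts and relationships it suggests into the existing ontology while keeping the ontology consistent. A minimal sketch, assuming the ontology is held as plain sets of concept names and (subject, relationship, object) triples. The slide shows only the names ExportsTo, ImportsFrom and Country; reading both relationships as linking Country to Country (from the exporter and importer columns) is our assumption.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def incorporate(concepts: Set[str], relationships: Set[Triple],
                new_concepts: Set[str],
                new_relationships: Set[Triple]) -> Tuple[Set[str], Set[Triple]]:
    """Add concepts and relationships suggested by a new schema to an existing
    ontology (mutating the given sets) and return only what was actually new.

    Every relationship must connect concepts that exist after the merge, so a
    new schema cannot introduce dangling references.
    """
    known = concepts | new_concepts
    dangling = {c for s, _, o in new_relationships for c in (s, o)} - known
    if dangling:
        raise ValueError(f"relationships refer to unknown concepts: {sorted(dangling)}")
    added_concepts = new_concepts - concepts
    added_relationships = new_relationships - relationships
    concepts |= added_concepts
    relationships |= added_relationships
    return added_concepts, added_relationships


concepts = {"Site", "Contaminant", "RemedialResponse"}
relationships = {("RemedialResponse", "PerformedAt", "Site")}

# Hypothetical reading of Sends_Export / Receives_Export: exporter and
# importer both denote a Country.
added = incorporate(
    concepts, relationships,
    {"Country"},
    {("Country", "ExportsTo", "Country"), ("Country", "ImportsFrom", "Country")},
)
print(sorted(concepts))
# ['Contaminant', 'Country', 'RemedialResponse', 'Site']
```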
  13. Ontology Refinement Based on User Queries
     • Addition of new attributes
       ◦ Query: "At NPL sites with a land use category of INDUSTRIAL, what is the cleanup level range for LEAD …"
       ◦ Add an attribute landUseCategory to the entity Site in the ontology
     • Addition of new relationships
       ◦ Query: "What is the range of concentrations for ARSENIC where it is a contaminant of concern in the SURFACE SOIL at NPL sites?"
       ◦ Add a relationship HasContaminant between the entities Site and Contaminant in the ontology
     • Addition of class-subclass relationships and new entities
       ◦ Query: "How many Superfund sites are in Edison County, New Jersey?"
       ◦ Add an entity SuperFundSite as a subclass of Site in the ontology
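The three kinds of query-driven refinement map onto three operations on an ontology: add an attribute, add a relationship, add a subclass. The sketch below uses a small, hypothetical in-memory `Ontology` class (not InfoSleuth's representation) and applies the three refinements listed above. Modeling SuperFundSite as a subclass means it inherits landUseCategory from Site without restating it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Ontology:
    attributes: Dict[str, Set[str]] = field(default_factory=dict)  # entity -> own attributes
    superclass: Dict[str, str] = field(default_factory=dict)       # entity -> parent entity
    relationships: Set[Tuple[str, str, str]] = field(default_factory=set)

    def _require(self, entity: str) -> None:
        if entity not in self.attributes:
            raise KeyError(f"unknown entity: {entity}")

    def add_entity(self, entity: str, parent: Optional[str] = None) -> None:
        if parent is not None:
            self._require(parent)
            # Refuse to create a cycle in the class hierarchy.
            ancestor: Optional[str] = parent
            while ancestor is not None:
                if ancestor == entity:
                    raise ValueError(f"{entity} cannot be its own ancestor")
                ancestor = self.superclass.get(ancestor)
            self.superclass[entity] = parent
        self.attributes.setdefault(entity, set())

    def add_attribute(self, entity: str, attribute: str) -> None:
        self._require(entity)
        self.attributes[entity].add(attribute)

    def add_relationship(self, subject: str, name: str, obj: str) -> None:
        self._require(subject)
        self._require(obj)
        self.relationships.add((subject, name, obj))

    def all_attributes(self, entity: str) -> Set[str]:
        """Own attributes plus those inherited from every superclass."""
        self._require(entity)
        result: Set[str] = set()
        current: Optional[str] = entity
        while current is not None:
            result |= self.attributes[current]
            current = self.superclass.get(current)
        return result


onto = Ontology()
onto.add_entity("Site")
onto.add_entity("Contaminant")
onto.add_attribute("Site", "landUseCategory")                    # from query 1
onto.add_relationship("Site", "HasContaminant", "Contaminant")   # from query 2
onto.add_entity("SuperFundSite", parent="Site")                  # from query 3
print(onto.all_attributes("SuperFundSite"))  # {'landUseCategory'}
```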
  14. Using a Data Dictionary (EDR) to Enhance the Ontology
     • select * from Site where state = 'TX' or state = 'California'
     • select coding_scheme1 from Map where coding_scheme3 = 'TX'
     [Figure labels: value sets { "Texas", "California" } and { "TX", "CA" }; Site with attribute state; StateName, StateCode, StateAbbr; data dictionary table Map with columns coding_scheme1, coding_scheme2, coding_scheme3]
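The first query mixes an abbreviation ('TX') with a full name ('California'); the data dictionary's Map table lets the system translate both into one coding scheme before querying. A minimal sketch using an in-memory SQLite stand-in for Map. We assume coding_scheme1 = StateName, coding_scheme2 = StateCode (taken here to be the two-digit FIPS code) and coding_scheme3 = StateAbbr, and that Site.state stores abbreviations; these mappings and the two loaded rows are illustrative, not the EDR's actual contents.

```python
import sqlite3
from typing import List, Tuple

# In-memory stand-in for the data dictionary's Map table (two states only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Map (coding_scheme1 TEXT, coding_scheme2 TEXT, coding_scheme3 TEXT)")
conn.executemany(
    "INSERT INTO Map VALUES (?, ?, ?)",
    [("Texas", "48", "TX"), ("California", "06", "CA")],
)


def to_state_abbr(conn: sqlite3.Connection, value: str) -> str:
    """Translate a state written in any known coding scheme into StateAbbr."""
    row = conn.execute(
        "SELECT coding_scheme3 FROM Map"
        " WHERE ? IN (coding_scheme1, coding_scheme2, coding_scheme3)",
        (value,),
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown state: {value!r}")
    return row[0]


def rewrite_state_query(conn: sqlite3.Connection, states: List[str]) -> Tuple[str, List[str]]:
    """Rewrite a mixed-scheme predicate (state = 'TX' or state = 'California')
    into one parameterized query over abbreviations."""
    abbrs = [to_state_abbr(conn, s) for s in states]
    placeholders = ", ".join("?" for _ in abbrs)
    return f"SELECT * FROM Site WHERE state IN ({placeholders})", abbrs


print(rewrite_state_query(conn, ["TX", "California"]))
# ('SELECT * FROM Site WHERE state IN (?, ?)', ['TX', 'CA'])
```

Without the translation, the literal predicate `state = 'California'` would silently match nothing if Site stores abbreviations, which is exactly the mismatch the data dictionary resolves.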
  15. Enhancing the Ontology by Using a Thesaurus
     Thesaurus entry: abandoned site
       THEME: POLLUTION
       BT (broader term): land setup
       NT (narrower term): disused military site
     Ontology: LandSetup, Site, AbandonedSite, DisusedMilitarySite, SuperfundSite
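Broader-term (BT) and narrower-term (NT) links in a thesaurus entry suggest class-subclass edges: a BT becomes a superclass, an NT a subclass. The sketch below applies that rule to the entry on this slide, converting terms to CamelCase class names. It is a simplifying assumption: the THEME is treated as a grouping rather than a class and is ignored, and where existing classes such as Site and SuperfundSite fit into the resulting hierarchy is left to the designer.

```python
from typing import Dict, List, Tuple


def to_class_name(term: str) -> str:
    """'disused military site' -> 'DisusedMilitarySite'."""
    return "".join(word.capitalize() for word in term.split())


def thesaurus_to_subclass_edges(
        entries: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Return (subclass, superclass) pairs: BT -> superclass, NT -> subclass.
    Other relation types (e.g. THEME) are ignored; duplicates are dropped."""
    edges: List[Tuple[str, str]] = []
    for term, relations in entries.items():
        cls = to_class_name(term)
        candidates = [(cls, to_class_name(b)) for b in relations.get("BT", [])]
        candidates += [(to_class_name(n), cls) for n in relations.get("NT", [])]
        for edge in candidates:
            if edge not in edges:
                edges.append(edge)
    return edges


entry = {"abandoned site": {"THEME": ["POLLUTION"],
                            "BT": ["land setup"],
                            "NT": ["disused military site"]}}
print(thesaurus_to_subclass_edges(entry))
# [('AbandonedSite', 'LandSetup'), ('DisusedMilitarySite', 'AbandonedSite')]
```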
  16. Conclusions and Future Work
     • The role of semantic content in handling data and information overload
       ◦ Domain-specific ontologies: an approach for capturing semantic content
     • Design and construction of domain ontologies
       ◦ a labor-intensive, time-consuming, and difficult endeavor
     • Reuse readily available information: schemas, queries, data dictionaries, thesauri
       ◦ to minimize the involvement of the domain expert
     • Extend this technique to other domains:
       ◦ telecommunications
       ◦ IP networks (using the CIM information model from the DMTF)
     • Apply these techniques to knowledge management and acquisition
