Ekaw ontology learning for cost effective large-scale semantic annotation

Published in: Education, Technology

  1. Ontology Learning for Large-scale Semantic Annotation of Web Service Interfaces. Shahab Mokarizadeh (Royal Institute of Technology), Peep Kungas (University of Tartu), Mihhail Matskin (Royal Institute of Technology)
  2. Motivation
     • Motivation: analysis of public Web services to identify missing but valuable Web services (to be implemented)
     • Materials:
       - Corpus of circa 15,000 WSDL documents (http://www.soatrader.com/web-services)
     • Challenges:
       - Absence of any kind of semantic information (e.g. documentation) in around 95% of the WSDLs
       - Frequent misspellings, abbreviations, technical words, etc.
  3. Initial Step: Knowledge Acquisition
     • Knowledge about the Web services themselves:
       - Functionality of the service
       - Attributes of the service (e.g. QoS, rating, etc.)
       - Structural relations with other services
       - …
       » Ontology of Services
     • Knowledge about the Web service domain:
       - Domain concepts and relations
       » Domain Ontology √
     • Knowledge acquisition → ontology learning
  4. Domain Ontology Learning Granularity
     Granularity of term extraction from WSDL:
       - Coarse-grained:
         • Service names
         • Operation names
         • …
       - Fine-grained: √
         • Part names of input/output parameters
         • XML Schema leaf element names
  5. Ontology Learning Method
     • Pattern-based method: the input text is scanned for predefined "lexico-syntactic" patterns, where each pattern indicates a relation of interest, either "taxonomic" or "non-taxonomic".
     • The pattern-based method is applicable because the extracted terms so often follow specific patterns.
  6. Ontology Learning Steps
     • Information Elicitation: Term Extraction, Syntactic Refinement
     • Ontology Discovery: Pattern-based Semantic Analysis, Term Disambiguation, Class and Relation Determination
     • Ontology Enrichment: Adding Relations
     » Ontology
     Information extraction:
     • Start with fine-grained granularity
     • If a term is ambiguous, terms from the coarse granularity are incorporated
  7. Lexico-Syntactic Term Analysis (1)
     Pattern 1: (Noun_1) + … + (Noun_n), e.g. PictureIdentifier
     Term: (N|Word_n) [(nn)(N|Word_1) + … + (nn)(N|Word_(n-1))]
     • Header: (N|Word_n), e.g. Identifier
     • Modifier: the bracketed part, e.g. Picture
     Concept and relation identification, with examples:
       - Modifier isA Concept (example: Picture isA Concept)
       - Header isA Concept (example: Identifier isA Concept)
       - Term subConceptOf Header (example: PictureIdentifier subConceptOf Identifier)
       - Modifier hasProperty Term (example: Picture hasProperty PictureIdentifier)
       - Example instance: "PictureIdentifier" isInstanceOf PictureIdentifier
  8. Lexico-Syntactic Term Analysis (2)
     Pattern 2: (Adj_1) + … + (Noun_n), e.g. SupportedImage
     Term: (N|Word_n) [(mod)(A|Word_1) + … + (nn)(N|Word_(n-1))]
     • Header: (N|Word_n), e.g. Image
     • Modifier: the bracketed part, e.g. Supported
     Concept and relation identification, with examples:
       - Header isA Concept (example: Image isA Concept)
       - Term subConceptOf Header (example: SupportedImage subConceptOf Image)
       - Example instance: "SupportedImage" isInstanceOf SupportedImage
  9. Adding Other Non-Taxonomic Relations
     Exploiting WordNet to find the following relations:
     • SynonymOf (having a common synset)
     • SimilarTo (based on taxonomy and corpus statistics of words)
     • More …
     Example:
     • Image isSynonymOf Picture
  10. Evaluation
     • Comparing the automatically generated ontology with the golden ontology:
       • Common concepts: 862 (out of 1391) ≈ 62%
       • Common instances: 1313 (out of 1853) ≈ 71%
       - Instance level:
         • Precision: 85%
         • Recall: 78%
  11. Conclusion
     • Pattern-based ontology building is promising but not enough!
     • The result is not really an ontology (e.g. upper-level concepts are missing).
     • More non-taxonomic relations need to be discovered.
  12. Questions? Thanks!
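
The two lexico-syntactic patterns on slides 7 and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `split_term` and `derive_triples` are hypothetical, terms are assumed to be two-word CamelCase identifiers, and the small `ADJECTIVES` set stands in for a real part-of-speech tagger.

```python
import re

# Hypothetical stand-in for a real part-of-speech tagger (assumption).
ADJECTIVES = {"Supported", "Available", "Valid"}


def split_term(term):
    """Split a CamelCase identifier such as 'PictureIdentifier' into words."""
    return re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", term)


def derive_triples(term):
    """Apply the head/modifier patterns from slides 7 and 8 to a two-word term."""
    words = split_term(term)
    if len(words) != 2:
        return []  # longer compounds are out of scope for this sketch
    modifier, header = words
    # Both patterns: the last word is the header, and the term specializes it.
    triples = [
        (header, "isA", "Concept"),
        (term, "subConceptOf", header),
    ]
    if modifier not in ADJECTIVES:
        # Pattern 1 (noun + noun): the modifier is a concept in its own right.
        triples.insert(0, (modifier, "isA", "Concept"))
        triples.append((modifier, "hasProperty", term))
    # The original string is recorded as an instance of the new concept.
    triples.append((f'"{term}"', "isInstanceOf", term))
    return triples


for subject, relation, obj in derive_triples("PictureIdentifier"):
    print(subject, relation, obj)
```

Calling `derive_triples("SupportedImage")` takes the adjective branch and yields only the header, sub-concept, and instance triples, as on slide 8. The simple regular expression splits acronyms letter by letter (for example `URLPath`), so a real term splitter would need extra handling for them.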

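The overlap percentages on the Evaluation slide can be re-derived from the reported counts, and instance-level precision and recall can be computed from two sets of names. The sketch below assumes the standard set-based definitions, which the slides do not spell out; the function names and the toy `learned` and `gold` sets are hypothetical and are not data from the study.

```python
def overlap_ratio(common, total):
    """Share of items that two ontologies have in common."""
    return common / total


def precision_recall(learned, gold):
    """Set-based precision and recall of learned items against a gold set."""
    true_positives = len(learned & gold)
    precision = true_positives / len(learned) if learned else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Counts reported on the Evaluation slide.
print(round(overlap_ratio(862, 1391) * 100))   # common concepts, in percent
print(round(overlap_ratio(1313, 1853) * 100))  # common instances, in percent

# Toy data (hypothetical), including one misspelled learned term.
learned = {"Picture", "Image", "Identifier", "Adress"}
gold = {"Picture", "Image", "Identifier", "Address", "Rating"}
precision, recall = precision_recall(learned, gold)
print(precision, recall)
```

With the toy sets, three of the four learned items appear in the gold set, and three of the five gold items are recovered.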