Semantic Interoperability Methodologies



  1. Semantic Interoperability Methodologies
     Johann Höchtl, Danube University Krems, Center for E-Government, Austria
  2. Bridges
     • Bridging Concepts
     • Corn vs. Rice
     • Red Wine vs. Sake
     • Surströmming vs. thousand-year egg
     • Lederhosen vs. Sari
     • Bachblüten vs. Reiki
     • …
     [Figure: Istanbul bridge between Europe and Asia]
  3. Super- and Sub-Concepts
     • Corn vs. Rice, Red Wine vs. Sake, Surströmming vs. thousand-year egg: Food, Protein, Alcohol, Nat. Preservative
     • Lederhosen vs. Sari: Clothing, Natural Materials
     • Bachblüten vs. Reiki: Medicine, Alternative Medicine
     • Know Everything vs. Domain Specific Knowledge?
     • Superconcept / Higher Ontology
     • Sub-Concept / Lower Ontology
     • Finance, Buy; Logistic, Store
  4. Who We Are and What We Do
     • Danube University Krems, Austria
         • Only state-owned postgraduate university
     • Center for E-Government
         • E-Inclusion and E-Participation and their impacts on electronic society
         • http://
         • Journal of E-Democracy and Open Government http://
     • About the Presenter
         • E-Participative Processes and Models of Incorporation
         • Doctoral Thesis, University of Vienna and Technical University of Vienna, Business Informatics, Vienna
  5. The Problem: State of Computer-Automated Semantic Understanding
     • “The tragi-comic failure of Netbase can teach a lot to every company in the Semantic space.”
     • Lesson 1: Don’t even try to boil the ocean of the WWW with these technologies. [The] Internet is full of valuable information but crap (or opinions) is 90% [of it], the cost of getting rid of this crap and save only the good stuff is very high.
     • Lesson 2: Linguistic approaches are likely going to fail because search engines (and machines) can’t distinguish joke/seriousness, sarcasm/shame and sentiments in general. The semantic meaning is right there not in the words of a text.
     • Lesson 3: If you choose to apply such approaches to one specific topic like Medicine (good choice) then stick to that topic, that means accept as INPUT only medical terms and provide as OUTPUTS only medical terms.
     • This last point requires human intervention and predefined taxonomies/ontologies but Netbase claims that they don’t need them both, [i.e., that] their engine is fully automatic, the failure too.”
     Reddit: Source:
  6. Notions of Similarity
     • How can a computer system declare two data fragments similar, and to what extent?
     • Starting point: data transformation into a computable dimension
         • Canonical data structure is a matrix: the X and Y dimensions hold the identified terms, the cells their respective similarity
         • Visualization as tree or directed graph
     • Required computational effort is very high
         • Brute-force approach: compare every identified term of document instance A with every identified term of document instance B
         • Recent approaches: genetic algorithms, with a “90:9:1 syndrome”: 90% of results are very good, 9% are acceptable and 1% degenerates
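The brute-force comparison on slide 6 can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the term lists are invented, and `term_similarity` uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever measure a real mapper would use.

```python
from difflib import SequenceMatcher


def term_similarity(a: str, b: str) -> float:
    """Placeholder measure in [0, 1]; any string measure could be swapped in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(terms_a, terms_b):
    """Brute force: compare every term of A with every term of B."""
    return [[term_similarity(a, b) for b in terms_b] for a in terms_a]


def best_matches(terms_a, terms_b):
    """For each term of A, pick the most similar term of B and its score."""
    matrix = similarity_matrix(terms_a, terms_b)
    result = {}
    for a, row in zip(terms_a, matrix):
        j = max(range(len(row)), key=row.__getitem__)
        result[a] = (terms_b[j], row[j])
    return result


# Hypothetical element names, chosen only for illustration.
doc_a = ["BuyerName", "InvoiceDate", "TotalAmount"]
doc_b = ["Amount", "Buyer", "IssueDate"]
for term, (match, score) in best_matches(doc_a, doc_b).items():
    print(f"{term} -> {match} ({score:.2f})")
```

The matrix has len(A) × len(B) cells, which is why the effort grows quickly once documents contain hundreds of elements.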
  7. Similarities
     • Structural similarity
         • Number of edit operations to transform tree A (document artifact A) into tree B
         • Maximum common subgraph, minimum common supergraph
         • Similarity flooding
             • Two graphs are similar if the neighborhoods of every node are similar.
     • Element-based similarity
         • Element names
         • Data types
     • Similarity algorithms
         • Strings: Levenshtein distance, linguistic similarity (Soundex)
         • Logical structure: Jaccard index, Dice coefficient, cosine similarity
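The string and set measures named on slide 7 have short textbook definitions. The sketch below shows them in plain Python; applying the set measures to token sets of element names is an assumption made here for illustration, not something the slides specify.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def jaccard(x: set, y: set) -> float:
    """|X ∩ Y| / |X ∪ Y|; two empty sets count as identical."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def dice(x: set, y: set) -> float:
    """2 |X ∩ Y| / (|X| + |Y|); two empty sets count as identical."""
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 1.0


def cosine(x: list, y: list) -> float:
    """Cosine of the angle between the term-frequency vectors of two token lists."""
    cx, cy = Counter(x), Counter(y)
    dot = sum(cx[t] * cy[t] for t in cx)
    norm = sqrt(sum(v * v for v in cx.values())) * sqrt(sum(v * v for v in cy.values()))
    return dot / norm if norm else 0.0


print(levenshtein("kitten", "sitting"))
print(jaccard({"buyer", "name"}, {"buyer", "party", "name"}))
```

Levenshtein works on raw strings, while Jaccard, Dice and cosine need the names split into tokens first, so the tokenization step strongly influences their results.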
  8. Ontology Similarity: Approaches
     • What is the most specific common ancestor of a pair of concepts in an ontology, i.e. what is the distance between the concepts?
         • Assign different similarity weights according to relationship (synonyms, hypernyms, antonyms, meronyms, …)
         • Measure through an ontology collection (OpenCyc, WordNet, Wikipedia, DMOZ) or a knowledge base
         • Measures: edge counting, Leacock-Chodorow, Resnik, Lin, Jiang-Conrath
     • BUT: abbreviations, associated words with delimiters (ArrivalAirportIn), suffixes/prefixes (hasName), misspellings, freely invented words, …
         • Human interaction (interactive mapping, enriched logic) is necessary
         • Solution: knowledge base, but high computational overhead!
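The "most specific common ancestor" question on slide 8 can be illustrated with a toy taxonomy. Everything in the `PARENT` table below is hypothetical (it loosely follows the food and clothing examples of slide 3, with an invented `Grain` node), and the `1 / (1 + distance)` score is just one simple edge-counting variant, not the specific measure used in the presented work.

```python
# Hypothetical toy taxonomy, stored as child -> parent.
PARENT = {
    "Food": "Thing",
    "Clothing": "Thing",
    "Alcohol": "Food",
    "Grain": "Food",
    "Red Wine": "Alcohol",
    "Sake": "Alcohol",
    "Corn": "Grain",
    "Rice": "Grain",
    "Lederhosen": "Clothing",
    "Sari": "Clothing",
}


def ancestors(concept):
    """Path from the concept up to the root, starting with the concept itself."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def most_specific_common_ancestor(a, b):
    """First node on b's upward path that also lies on a's upward path."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


def edge_distance(a, b):
    """Number of edges on the path a -> common ancestor -> b."""
    common = most_specific_common_ancestor(a, b)
    if common is None:
        return None
    return ancestors(a).index(common) + ancestors(b).index(common)


def edge_similarity(a, b):
    """Simple edge-counting score: 1.0 for identical concepts, lower with distance."""
    distance = edge_distance(a, b)
    return 0.0 if distance is None else 1.0 / (1.0 + distance)


for pair in [("Red Wine", "Sake"), ("Corn", "Sake"), ("Corn", "Sari")]:
    print(pair, most_specific_common_ancestor(*pair), edge_similarity(*pair))
```

Pairs that meet low in the tree (Red Wine and Sake under Alcohol) score higher than pairs that only meet at the root, which is the intuition behind the edge-based measures listed above.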
  9. Research Approach
     • Domain focus: mapping of document standards derived from the UN/CEFACT Core Components Technical Specification (CCTS)
     • Support OAGIS 9.1, GS1 XML, UBL 2.0 and UN/CEFACT CCL 07B as common ancestor
     • Specialist approach: reuse semantic knowledge instead of re-creating knowledge that already exists
         • Automatically inferable semantic knowledge lies in data types, structural similarity and element names.
     • Feed an inference engine to create the upper ontology
     • The “harmonized” upper ontology contains the relationships between document artifacts
  10. Results
     • Explicit rules incorporated in the inference process
     • Heuristics to discover structurally different
         • Association Document Component and Basic Document Component pairs
         • Different Basic Document Components
         • Association Document Components
     • The recall rate of a domain-specific mapper is higher than that of one relying on automatic inference:
         • Success rate in mapping UBL ABIEs to GS1 XML ABIEs: 88.1%
         • False positive hits: ~10%
     • Repository of XSLT mappings as cache
     • GUI: Semantic Interoperability Service Utility (“ISU”)
     • Tryout at
     • OASIS SET TC at http:// www. oasis set /
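For readers unfamiliar with the metrics on slide 10, the sketch below shows how recall and a false-positive share are commonly computed for a schema mapper. The component names and pairs are placeholders invented for this example, and reading the slide's "success rate" as recall is an assumption; the numbers produced here have no relation to the reported 88.1% and ~10%.

```python
def evaluate(found: set, expected: set):
    """Return (recall, precision) of proposed mappings against a reference set."""
    correct = found & expected
    recall = len(correct) / len(expected) if expected else 0.0
    precision = len(correct) / len(found) if found else 0.0
    return recall, precision


# Hypothetical reference mappings (source component, target component).
expected = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")}
# Hypothetical mapper output: three correct pairs and one false positive.
found = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B9")}

recall, precision = evaluate(found, expected)
print(f"recall={recall:.2f} precision={precision:.2f} "
      f"false-positive share={1 - precision:.2f}")
```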
  11. Conclusion and Outlook
     • Targeted towards SMEs to overcome different communication standards in different domains
         • RosettaNet vs. OAGIS vs. HL7 vs. …
     • Current implementation focuses on CCL 07B derivatives
         • But expandable model!
     • Applicability beyond SMEs and Supply Chain / Invoicing
         • Northern European Subset of UBL (NES)
         • Cooperation on e-commerce and e-procurement
         • Its purpose is to facilitate harmonization of different types of e-procurement documents in countries that are already using UBL
         • Consequence: data exchange between NES and OAGIS, UBL, HL7, … is a use case for SET!
  12. Thank You! Questions?
     • Links:
         • http:// www. oasis set /
     • Gerti Kappel, Horst Kargl, Gerhard Kramler, Andrea Schauerhuber, Martina Seidl, Michael Strommer, and Manuel Wimmer, “Matching Metamodels with Semantic Systems - An Experience Report,” Mainz, 2007, pp. 38-52.
     • Fabien Duchateau and Zohra Bellahsène, “Designing a Benchmark for the Assessment of XML Schema Matching Tools,” Vienna, Austria: ACM, 2007.
     • Hong-Hai Do and Erhard Rahm, “Matching large schemas: Approaches and evaluation,” Science Direct, 2007, pp. 857-885.