E Challenges 2009 Workshop 10b Semantic Interoperability Methodologies

Published in: Education, Technology

  1. 1. Semantic Interoperability Methodologies. Johann Höchtl, Danube University Krems, Center for E-Government, Austria
  2. 2. Bridges
       • Bridging Concepts
       • Corn vs. Rice
       • Red Wine vs. Sake
       • Surströmming vs. thousand-year egg
       • Lederhosen vs. Sari
       • Bach flower remedies (Bachblüten) vs. Reiki
       • ...
     [Figure: Istanbul bridge between Europe and Asia. Map by Openstreetmap.org]
  3. 3. Super – Sub - Concepts
     Know Everything vs. Domain Specific Knowledge?
     [Diagram: Superconcept / Higher Ontology above Sub-Concept / Lower Ontology. Labels as they appear on the slide:]
       • Corn vs. Rice, Red Wine vs. Sake, Surströmming vs. thousand-year egg: Food, Protein, Alcohol, Nat. Preservative
       • Lederhosen vs. Sari: Clothing, Natural Materials
       • Bach flower remedies (Bachblüten) vs. Reiki: Medicine, Alternative Medicine
       • Finance → Buy
       • Logistic → Store
  4. 4. Who we are and What we do
       • Danube University Krems, Austria
           • Only state-owned postgraduate university
       • Center for E-Government
           • E-Inclusion and E-Participation and their impacts on electronic society
           • http://digitalgovernment.wordpress.com
           • Journal of E-Democracy and Open Government, http://www.jedem.org
       • About the Presenter
           • E-Participative Processes and Models of Incorporation
           • Doctoral thesis at the University of Vienna and the Technical University of Vienna, Business Informatics
  5. 5. The Problem – State of Computer Automated Semantic Understanding
       • “The tragi-comic failure of Netbase can teach a lot to every company in the Semantic space.
       • Lesson 1: Don’t even try to boil the ocean of the WWW with these technologies. [The] Internet is full of valuable information but crap (or opinions) is 90% [of it], the cost of getting rid of this crap and save only the good stuff is very high
       • Lesson 2: Linguistic approaches are likely going to fail because search engines (and machines) can’t distinguish joke/seriousness, sarcasm/shame and sentiments in general. The semantic meaning is right there not in the words of a text.
       • Lesson 3: If you choose to apply such approaches to one specific topic like Medicine (good choice) then stick to that topic, that means accept as INPUT only medical terms and provide as OUTPUTS only medical terms.
       • This last point requires human intervention and predefined taxonomies/ontologies but Netbase claims that they don’t need them both, [i.e., that] their engine is fully automatic; the failure too.”
     Reddit: Source: http://marklogic.blogspot.com/2009/09/netbase-tragicomedy-perils-of-magic-and.html
  6. 6. Notions of Similarity
       • How can a computer system declare two data fragments similar, and to what extent?
       • Starting point: transform the data into a computable dimension
           • The canonical data structure is a matrix whose X and Y dimensions contain the identified terms, with each cell holding their respective similarity
           • Visualization as a tree or directed graph
       • The required computational effort is very high
           • Brute-force approach: compare every identified term of document instance A with every identified term of document instance B
           • Recent approaches: genetic algorithms, with a “90:9:1 syndrome”: 90% of results are very good, 9% are acceptable and 1% degenerates
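The brute-force approach on this slide can be sketched in a few lines of Python. This is only an illustration: the element names are made up, and `difflib.SequenceMatcher` stands in for whatever similarity measure a real matcher would use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Placeholder string similarity in [0, 1]; any other measure could be swapped in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(terms_a, terms_b):
    """Brute force: one comparison per (term of A, term of B) pair, so len(A) * len(B) calls."""
    return [[similarity(a, b) for b in terms_b] for a in terms_a]


def best_matches(terms_a, terms_b):
    """For every term of A, pick the term of B with the highest similarity."""
    matrix = similarity_matrix(terms_a, terms_b)
    result = {}
    for a, row in zip(terms_a, matrix):
        j = max(range(len(row)), key=row.__getitem__)
        result[a] = (terms_b[j], row[j])
    return result


# Hypothetical element names from two document instances.
terms_a = ["BuyerName", "InvoiceDate", "TotalAmount"]
terms_b = ["InvoiceDate", "AmountTotal", "NameOfBuyer"]

for term, (match, score) in best_matches(terms_a, terms_b).items():
    print(f"{term:12s} -> {match:12s} ({score:.2f})")
```

The quadratic number of comparisons is what makes the effort "very high" for large schemas; the matrix rows are terms of A and the columns are terms of B.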
  7. 7. Similarities
       • Structural similarity
           • Number of edit operations needed to transform tree A (document artifact A) into tree B
           • Maximum common subgraph, minimum common supergraph
           • Similarity flooding
               • Two graphs are similar if the neighborhoods of every node are similar.
       • Element-based similarity
           • Element names
           • Data types
       • Similarity algorithms
           • Strings: Levenshtein distance, linguistic similarity (Soundex)
           • Logical structure: Jaccard index, Dice coefficient, cosine similarity
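As a minimal sketch, the string and set measures named on this slide can be written with the standard library alone. The token sets at the bottom are hypothetical, and the behaviour for empty inputs is a convention chosen here, not something the slides specify.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical (assumption)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def dice(a: set, b: set) -> float:
    """2 |A ∩ B| / (|A| + |B|); two empty sets are treated as identical (assumption)."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0


def cosine(tokens_a, tokens_b) -> float:
    """Cosine of the angle between the two token-count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


# Hypothetical token sets taken from two element names.
a = {"buyer", "name", "party"}
b = {"buyer", "party", "id"}
print(levenshtein("kitten", "sitting"))  # 3
print(jaccard(a, b))                     # 0.5
```

Levenshtein distance works on raw strings, while Jaccard, Dice and cosine need the names split into tokens first; the choice of tokenizer therefore affects the result as much as the choice of measure.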
  8. 8. Ontology Similarity - Approaches
       • What is the most specific common ancestor of a pair of concepts in an ontology, i.e. what is the distance between the concepts?
           • Assign different similarity weights according to relationship (synonyms, hypernyms, antonyms, meronyms, …)
           • Measure through an ontology collection (OpenCyc, WordNet, Wikipedia, DMOZ) or a knowledge base
           • Measures: edge counting, Leacock-Chodorow, Resnik, Lin, Jiang-Conrath
       • BUT: abbreviations, AssocWords with delimiters (ArrivalAirportIn), suffixes/prefixes (hasName), misspellings, freely invented words, …
           • Human interaction (interactive mapping, enriched logic) is necessary
           • Solution: knowledge base, but high computational overhead!
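The "most specific common ancestor" question can be illustrated with a toy taxonomy. The hierarchy below is hypothetical: it borrows concept names from the earlier slides, but the parent links (for example, Alcohol under Food) are assumptions made for the example, and the similarity is plain edge counting rather than any of the weighted measures listed above.

```python
# Hypothetical child -> parent links; "Thing" is the root.
PARENT = {
    "Corn": "Food",
    "Rice": "Food",
    "Red Wine": "Alcohol",
    "Sake": "Alcohol",
    "Alcohol": "Food",
    "Lederhosen": "Clothing",
    "Sari": "Clothing",
    "Food": "Thing",
    "Clothing": "Thing",
}


def ancestors(concept: str) -> list:
    """Path from the concept itself up to the root."""
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def most_specific_common_ancestor(a: str, b: str):
    """First concept on a's path to the root that also lies on b's path."""
    on_b_path = set(ancestors(b))
    for concept in ancestors(a):
        if concept in on_b_path:
            return concept
    return None


def edge_distance(a: str, b: str):
    """Number of edges from a up to the common ancestor and down to b."""
    common = most_specific_common_ancestor(a, b)
    if common is None:
        return None
    return ancestors(a).index(common) + ancestors(b).index(common)


def edge_similarity(a: str, b: str):
    """Simple edge-counting similarity: 1 for identical concepts, smaller when further apart."""
    distance = edge_distance(a, b)
    return None if distance is None else 1.0 / (1.0 + distance)


print(most_specific_common_ancestor("Red Wine", "Sake"))  # Alcohol
print(edge_distance("Corn", "Sake"))                      # 3
```

Measures such as Resnik or Lin replace the raw edge count with information content derived from a corpus, which is where the knowledge base and its computational overhead come in.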
  9. 9. Research Approach
       • Domain focus: mapping of document standards derived from the UN/CEFACT Core Components Technical Specification (CCTS)
       • Support for OAGIS 9.1, GS1 XML and UBL 2.0, with UN/CEFACT CCL 07B as common ancestor
       • Specialist approach: reuse semantic knowledge instead of re-creating already existing knowledge
           • Automatically inferable semantic knowledge is in data types, structural similarity and element names.
       • Feed an inference engine to create the upper ontology
       • The “harmonized” upper ontology contains the relationships between document artifacts
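To make the three sources of automatically inferable knowledge concrete, the sketch below scores a candidate pair of elements by element name, data type and child structure. It is not the inference engine described on the slide: the `Element` class, the weights and the example elements are all hypothetical, and a weighted sum is swapped in for rule-based inference purely for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Element:
    """Hypothetical view of a schema element: name, data type, names of child elements."""
    name: str
    data_type: str
    children: frozenset = frozenset()


def name_score(a: Element, b: Element) -> float:
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def type_score(a: Element, b: Element) -> float:
    return 1.0 if a.data_type == b.data_type else 0.0


def structure_score(a: Element, b: Element) -> float:
    """Jaccard index over child names; two leaf elements count as structurally equal."""
    if not a.children and not b.children:
        return 1.0
    return len(a.children & b.children) / len(a.children | b.children)


def combined_score(a: Element, b: Element, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted sum of the three signals; the weights are arbitrary assumptions."""
    w_name, w_type, w_struct = weights
    return (w_name * name_score(a, b)
            + w_type * type_score(a, b)
            + w_struct * structure_score(a, b))


# Hypothetical elements from two document standards.
amount_a = Element("TotalAmount", "Amount")
amount_b = Element("AmountTotal", "Amount")
date_b = Element("IssueDate", "Date")

print(round(combined_score(amount_a, amount_b), 2))
print(round(combined_score(amount_a, date_b), 2))
```

Because the standards share CCL 07B as a common ancestor, matching data types is a comparatively strong signal here, which is one reading of why a domain-specific mapper can reuse existing knowledge instead of rediscovering it.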
  10. 10. Results
       • Explicit rules incorporated in the inference process
       • Heuristics to discover structurally different
           • Association Document Component and Basic Document Component pairs
           • Different Basic Document Components
           • Association Document Components
       • The recall rate of a domain-specific mapper is higher than that of one relying on automatic inference:
           • Success rate in mapping UBL ABIEs to GS1 XML ABIEs: 88.1%
           • False positive hits: ~10%
       • Repository of XSLT mappings as cache
       • GUI: Semantic Interoperability Service Utility (“ISU”)
       • Tryout at http://144.122.230.66:9090/ISU/web
       • OASIS SET TC at http://www.oasis-open.org/committees/set/
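The slide reports a success rate and a share of false positive hits without defining them. One common reading, assumed here, is recall over a hand-made reference mapping and the fraction of proposed mappings that are wrong. The sketch below computes both for placeholder pairs; the identifiers are invented and the resulting numbers have nothing to do with the 88.1% and ~10% reported above.

```python
def evaluate(proposed: set, reference: set) -> dict:
    """Compare proposed (source, target) mappings with a hand-made reference mapping."""
    true_pos = proposed & reference
    false_pos = proposed - reference
    return {
        # Share of reference mappings that the mapper found.
        "recall": len(true_pos) / len(reference) if reference else 0.0,
        # Share of proposed mappings that are correct.
        "precision": len(true_pos) / len(proposed) if proposed else 0.0,
        # Share of proposed mappings that are wrong (1 - precision when anything was proposed).
        "false_positive_share": len(false_pos) / len(proposed) if proposed else 0.0,
    }


# Placeholder identifiers, not real UBL or GS1 XML ABIE names.
reference = {("ubl:A", "gs1:A"), ("ubl:B", "gs1:B"), ("ubl:C", "gs1:C"), ("ubl:D", "gs1:D")}
proposed = {("ubl:A", "gs1:A"), ("ubl:B", "gs1:B"), ("ubl:C", "gs1:C"), ("ubl:D", "gs1:X")}

print(evaluate(proposed, reference))
# {'recall': 0.75, 'precision': 0.75, 'false_positive_share': 0.25}
```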
  11. 11. Conclusion and outlook
       • Targeted towards SMEs to overcome different communication standards in different domains
           • RosettaNet vs. OAGIS vs. HL7 vs. …
       • Current implementation focuses on CCL 07B derivatives
           • But expandable model!
       • Applicability beyond SMEs and Supply Chain / Invoicing
           • Northern European Subset of UBL (NES)
           • Cooperation on e-commerce and e-procurement
           • Purpose is to facilitate harmonization of different types of e-procurement documents in countries that are already using UBL
           • Consequence: data exchange between NES and OAGIS, UBL, HL7 … a use case for SET!
  12. 12. THANK YOU! – Questions?
       • Links:
       • http://www.srdc.metu.edu.tr/iSURF/OASIS-SET-TC/tools/ISU-latest.zip
       • http://144.122.230.66:9090/ISU/web
       • http://www.oasis-open.org/committees/set/
       • http://www.oasis-open.org/committees/download.php/32369/20090504SemanticRepresentationOfDocumentArtifacts.pdf
       • http://www.oasis-open.org/committees/download.php/33577/SET-TC.odp
       • Gerti Kappel, Horst Kargl, Gerhard Kramler, Andrea Schauerhuber, Martina Seidl, Michael Strommer, and Manuel Wimmer, “Matching Metamodels with Semantic Systems - An Experience Report,” Mainz, 2007, pp. 38-52.
       • Fabien Duchateau and Zohra Bellahsène, “Designing a Benchmark for the Assessment of XML Schema Matching Tools,” Vienna, Austria: ACM, 2007.
       • Hong-Hai Do and Erhard Rahm, “Matching large schemas: Approaches and evaluation,” Science Direct, 2007, pp. 857-885.
