
Approach to leverage Websites to APIs through Semantics

PhD thesis defense.

This manuscript describes a methodology designed and implemented to recommend vocabularies based on the content of a given website. The goal of the proposed approach is to generate vocabularies by reusing existing schemas. The automatic recommendation helps to leverage websites into self-described web entities in the Web of Data, understandable by both humans and machines. In this direction, the implemented approach is part of a broader methodology for turning a website into a machine-understandable node by using technologies developed within the scope of the Semantic Web vision. Transforming a website into a machine-understandable entity is the first step required on the website side to narrow the gap with web agents and to enable structured content consumption without implementing an Application Programming Interface (API) that provides read-write functionality. The motivation of the thesis stems from the fact that, in most cases, the data provided via an API is already presented on the corresponding website.

  1. Approach to leverage websites to APIs through Semantics
     Dipl.-Ing. Ioannis Stavrakantonakis, University of Innsbruck, December 12, 2017
     Supervisor: Univ.-Prof. Dr. Dieter Fensel, Universität Innsbruck
     Co-supervisor: Ass.-Prof. Dr. Anna Fensel, Universität Innsbruck
     External supervisor: Univ.-Prof. Dr. Sören Auer, Leibniz Universität Hannover
  2. Outline • Problem & Research Thesis • Contributions • Related work • Approach • Implementation • Results • Outlook • Acknowledgements • References
  3. Problem
     Leveraging Web content to machine-understandable Web data is still not trivial.
     [Figure: Data, Humans, Machines, API]
  4. Problem
     Leveraging Web content to machine-understandable Web data is still not trivial.
     [Figure: user accessibility vs. machine accessibility of a website, an annotated website, and an API]
     Semantic annotation: a piece of metadata for an informational element that appears in a document; it is machine interpretable and gives explicit meaning to the data of the element.
  5. Problem
     [Figure: a hotel website's data flowing to partners (travel search engines, reservation websites, search engines) for results presentation and query answering]
     Alternatives:
     • Manual entry in the partners' systems.
     • API implementation by the website and integration by the partner.
     • Annotations as API:
       + Save the cost of building an API.
       + Vocabularies are reused, allowing third-party integrations to rely on the same schema.
       + Make Web content machine understandable with explicit semantics.
       - The learning curve could be a barrier.
       - Vocabulary term discovery, and the limitations of existing vocabularies.
  6. Problem: adding annotations to websites
     [Figure: The Semantic Web]
     • Which format? RDFa, Microdata, or JSON-LD.
     • Which vocabulary is the most suitable?
     • Which vocabulary terms are relevant to the website?
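To make "annotations as API" concrete, here is a minimal sketch that builds a JSON-LD description of the hotel example with schema.org terms and wraps it in the `<script type="application/ld+json">` element used to embed JSON-LD in an HTML page. The choice of JSON-LD and schema.org, the selected properties, and all values are assumptions for illustration; the same data could equally be written as RDFa or Microdata.

```python
import json
from typing import Any, Dict


def hotel_jsonld(name: str, telephone: str, email: str,
                 street: str, city: str, country: str) -> Dict[str, Any]:
    """Build a schema.org Hotel description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Hotel",
        "name": name,
        "telephone": telephone,
        "email": email,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressCountry": country,
        },
    }


def as_script_tag(data: Dict[str, Any]) -> str:
    """Wrap JSON-LD in the <script> element that embeds it in HTML."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical hotel data, for illustration only.
    doc = hotel_jsonld("Hotel Example", "+43 512 000000", "info@hotel.example",
                       "Example Street 1", "Innsbruck", "AT")
    print(as_script_tag(doc))
```

A partner that understands schema.org can read these fields without a dedicated hotel API, which is the cost saving listed among the alternatives above.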
  7. Research Thesis
     Vocabulary term recommendations can be semi-automatically generated for a given webpage with a recall above 80%. Moreover, the same semi-automatic approach can outperform the manual selection of vocabulary terms.
  9. Contributions
     • Ranking of the vocabularies in the Semantic Web space.
     • Ranking of vocabulary terms using Linked Open Data.
     • Recommendation of a set of vocabulary terms based on a keyword set.
     • Design and development of an approach to discover vocabulary terms for a given webpage.
     • Definition of a new vocabulary that facilitates the description of search queries and results.
  10. Contributions: related publications and implementations
     • Linked Open Vocabulary Ranking and Terms Discovery. I. Stavrakantonakis, A. Fensel, D. Fensel. Proceedings of the 12th International Conference on Semantic Systems (SEMANTiCS 2016); shortlisted among the best papers.
     • Towards a Vocabulary Terms Discovery Assistant. I. Stavrakantonakis, A. Fensel, D. Fensel. Proceedings of the 12th International Conference on Semantic Systems (SEMANTiCS 2016); poster accompanying the paper.
     • Linked Open Vocabulary Recommendation Based on Ranking and Linked Open Data. I. Stavrakantonakis, A. Fensel, D. Fensel. Proceedings of the Joint International Semantic Technology Conference (JIST 2015).
     • Matching Web Entities with Potential Actions. I. Stavrakantonakis, A. Fensel, D. Fensel. Poster Paper track of the 10th International Conference on Semantic Systems (SEMANTiCS 2014).
     Implementations:
     • Vocab-recommender framework implementation: https://github.com/istavrak/vocab-recommender (MIT License)
     • vSearch vocabulary: https://purl.org/vsearch
  12. Related work
     • Vocabulary discovery: approaches that enable the exploration of the vocabulary space (Linked Open Vocabularies, schema.org, vocab.cc, LODStats).
     • Vocabulary ranking: assignment of a score to the vocabularies based on popularity (e.g. IC, LOV, TermPicker, LOVER, DWRank).
     • Semantic annotation tools: vocabulary-based generators, validators, and annotation editors (Google Structured Data Testing Tool, Microformats, schema.org editors, RDFaCE).
     • Vocabulary development collaboration: ontology building, taxonomy editing, and version control systems (DILIGENT, Collaborative Protégé; SOBOLEO, PoolParty; Git4Voc, VoCol).
     • Manual semantic annotations: a survey measuring the quality of the end result and the time required.
  13. Related work: Linked Open Vocabularies (LOV)
     • LOV provides a comprehensive directory of existing vocabularies.
     • It is an online search engine for vocabularies, with a user interface, an API, and a SPARQL endpoint.
     • It provides metadata about each listed vocabulary (versions, creators, links to other vocabularies, etc.).
  14. Related work: Linked Open Data (LOD)
     • Linked Data: exposing, sharing, and connecting pieces of data using URIs.
     • LOD Cloud: datasets published as Linked Data, interlinked with other datasets in the cloud.
     [Figure: the LOD cloud, with domains such as Linguistics, Government, Life Science, Social Networking, and Publications, and the DBpedia dataset]
  16. Approach
     [Figure: manual term discovery compared with the proposed approach, vocab-recommender, given a webpage (or keywords)]
  17. Approach
     • Goal: discover vocabulary terms for a given webpage.
     • Dimensions:
       • Popularity of the vocabularies and vocabulary terms.
       • Success history of the vocabulary authors.
       • Similarity between the input webpage content and the vocabulary terms.
       • Content patterns and type of the webpage.
  18. Approach: discovery of vocabulary terms
     1. Linked Open Vocabularies Recommender (LOVR): suggests terms based on the various ranking dimensions of LOV and LOD.
     2. Patterns Knowledge Base: suggests terms based on a set of predefined datatype patterns, e.g. email, phone.
     3. Static recommendations: suggest terms for the structural parts of a webpage without considering the textual content, e.g. images, videos.
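How the outputs of the three components are combined is not spelled out on this slide. A minimal sketch, assuming each component simply returns a list of candidate terms, is to take their union while recording which component suggested each term. The `Recommendation` class, the function name, and the example terms are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Recommendation:
    term: str                                       # e.g. "schema:email"
    sources: Set[str] = field(default_factory=set)  # components that suggested it


def merge_recommendations(lovr: Iterable[str], patterns: Iterable[str],
                          static: Iterable[str]) -> List[Recommendation]:
    """Union the suggestions of the three components, keeping provenance and first-seen order."""
    merged: Dict[str, Recommendation] = {}
    for source, terms in (("LOVR", lovr), ("patterns", patterns), ("static", static)):
        for term in terms:
            merged.setdefault(term, Recommendation(term)).sources.add(source)
    return list(merged.values())


if __name__ == "__main__":
    recs = merge_recommendations(["schema:Hotel", "schema:name"],
                                 ["schema:email", "schema:telephone"],
                                 ["schema:image", "schema:name"])
    for r in recs:
        print(r.term, sorted(r.sources))
```

Keeping the provenance lets a user interface show why a term was proposed; any ranking across components would be an additional step not covered here.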
  19. Approach: Linked Open Vocabularies Recommender (LOVR), vocabulary ranking
     • Reflects the position of the vocabulary within the space of vocabularies, not in conjunction with a keyword search.
     • Uses the authority score of the vocabulary creators, based on the ranking of their previous vocabularies.
     [Formula: vocabulary ranking score]
     Notation: $B_v$ = number of backlinks to $v$; [symbol missing] = number of vocabularies in LOV; $S_v$ = authority score for $v$; $A_v$ = set of authors of $v$; $V_a$ = set of vocabularies of author $a$; $VR(u_k)$ = score of vocabulary $k$ based on incoming links.
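The ranking formula on this slide did not survive extraction. As a generic illustration of a "score based on incoming links" computed over backlink sets like $B_v$, the sketch below runs a PageRank-style iteration. This is a swapped-in textbook technique, not the thesis formula; the damping factor 0.85, the iteration count, and the toy link graph are assumptions, and the authority score $S_v$ is not modelled.

```python
from typing import Dict, Set


def link_scores(backlinks: Dict[str, Set[str]], damping: float = 0.85,
                iterations: int = 50) -> Dict[str, float]:
    """PageRank-style score for each vocabulary from its incoming links.

    backlinks maps a vocabulary to the set of vocabularies linking to it.
    Assumption: a generic PageRank iteration, not the thesis formula.
    """
    vocabs = set(backlinks) | {u for srcs in backlinks.values() for u in srcs}
    n = len(vocabs)  # plays the role of "number of vocabularies in LOV"
    out_degree = {v: 0 for v in vocabs}
    for sources in backlinks.values():
        for u in sources:
            out_degree[u] += 1
    score = {v: 1.0 / n for v in vocabs}
    for _ in range(iterations):
        # Each vocabulary u passes score[u] / out_degree[u] to every vocabulary it links to.
        # Vocabularies without outgoing links do not redistribute their score,
        # so the totals need not sum to 1; only the resulting order matters here.
        score = {
            v: (1 - damping) / n
            + damping * sum(score[u] / out_degree[u] for u in backlinks.get(v, set()))
            for v in vocabs
        }
    return score


if __name__ == "__main__":
    # Hypothetical link graph between three well-known vocabulary prefixes.
    graph = {"schema": {"foaf", "dc"}, "foaf": {"dc"}, "dc": set()}
    for vocab, s in sorted(link_scores(graph).items(), key=lambda kv: -kv[1]):
        print(vocab, round(s, 4))
```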
  20. Approach: LOVR, vocabulary term LOD ranking
     • Reflects the occurrences of the terms in the LOD space.
     • Biased by the LOD snapshots used (vocab.cc, LODStats).
     [Formula: overall term ranking]
     Notation: $OR(t)$ = overall ranking of term $t$; $DR(t)$ = document ranking of term $t$; $OC(t)$ = occurrences of term $t$.
  21. Approach: LOVR, vocabulary term ranking
     • Combines the vocabulary LOV ranking and the term LOD ranking.
     • A higher score denotes a higher position in the ranking.
     [Formula: combined term ranking score]
     Notation: $TR_{LOD,t}$ = ranking of term $t$ based on LODStats; $TR_{BTCD,t}$ = ranking of term $t$ based on BTCD; $VSR_{LOV,V}$ = LOV ranking of the vocabulary $V$ that term $t$ belongs to; $re_{t,k}$ = relevance of term $t$ to keyword $k$; $a$ = constant (e.g. 100).
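The combined formula on this slide did not survive extraction either. The sketch below only illustrates the general idea under stated assumptions. Occurrence counts $OC(t)$ are first turned into rank positions (1 = most frequent). The positions $TR_{LOD,t}$, $TR_{BTCD,t}$, and $VSR_{LOV,V}$ are then folded into one score through a hypothetical reciprocal form $re_{t,k} \cdot \sum a/\text{position}$, so that better positions yield higher scores, as the slide requires. This is not the thesis formula, and all example data is made up.

```python
from typing import Dict, List, Mapping, Optional, Tuple


def ranks_from_counts(counts: Mapping[str, int]) -> Dict[str, int]:
    """Turn occurrence counts OC(t) into rank positions, 1 = most frequent (ties by name)."""
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {t: pos for pos, t in enumerate(ordered, start=1)}


def combined_score(term: str, keyword: str,
                   tr_lod: Mapping[str, int], tr_btcd: Mapping[str, int],
                   vsr_lov: Mapping[str, int], vocab_of: Mapping[str, str],
                   relevance: Mapping[Tuple[str, str], float],
                   a: float = 100.0) -> float:
    """Hypothetical score re_{t,k} * (a/TR_LOD + a/TR_BTCD + a/VSR_LOV); higher is better."""
    positions: List[Optional[int]] = [tr_lod.get(term), tr_btcd.get(term),
                                      vsr_lov.get(vocab_of.get(term, ""))]
    # A term (or vocabulary) missing from a ranking contributes nothing.
    rank_part = sum(a / p for p in positions if p is not None)
    return relevance.get((term, keyword), 0.0) * rank_part


if __name__ == "__main__":
    lod = ranks_from_counts({"schema:name": 500, "foaf:name": 300, "dc:title": 300})
    btcd = {"schema:name": 2, "foaf:name": 1}
    vsr = {"schema": 1, "foaf": 3}
    vocab_of = {"schema:name": "schema", "foaf:name": "foaf", "dc:title": "dc"}
    rel = {("schema:name", "name"): 1.0, ("foaf:name", "name"): 1.0}
    for t in ("schema:name", "foaf:name"):
        print(t, combined_score(t, "name", lod, btcd, vsr, vocab_of, rel))
```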
  22. Approach: Patterns Knowledge Base
     • Extraction of various datatypes that appear in the content of the webpage.
     • E.g. phone, date, time, price, email, person, organisation, …
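A minimal sketch of the pattern idea, assuming regular expressions and a hypothetical pattern-to-term table (the slide does not give the actual patterns or mappings). Only a few datatypes are covered. Person and organisation names would need named-entity recognition rather than a regular expression. The phone pattern deliberately requires an international prefix so that ISO dates are not mistaken for phone numbers.

```python
import re
from typing import List, Tuple

# Hypothetical pattern knowledge base: datatype -> (regex, suggested schema.org term).
PATTERNS = {
    "email": (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "schema:email"),
    # Requires an international prefix, so "2017-12-12" is not read as a phone number.
    "phone": (re.compile(r"\+\d{1,3}[\d\s/-]{6,}\d"), "schema:telephone"),
    "date": (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "schema:Date"),
    "price": (re.compile(r"(?:€|EUR|\$)\s?\d+(?:[.,]\d{2})?"), "schema:price"),
}


def extract(text: str) -> List[Tuple[str, str, str]]:
    """Return (datatype, matched text, suggested term) for every pattern match in text."""
    found = []
    for datatype, (pattern, term) in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((datatype, match.group(), term))
    return found


if __name__ == "__main__":
    page_text = ("Contact info@hotel.example or +43 512 123456. "
                 "Rooms from €89.00, opening 2017-12-12.")
    for hit in extract(page_text):
        print(hit)
```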
  23. Approach: static recommendations
     • Derived from the structural elements of a webpage, e.g. img, video, audio, h1, a.
     • Recommend the use of schema.org terms.
     [Figure: webpage elements labelled link, image, h1]
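The static recommendations can be sketched with Python's standard `html.parser`: walk the start tags and map the structural elements named on the slide to schema.org terms, ignoring the text. The tag-to-term table below is a hypothetical mapping, not the one used by vocab-recommender.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical mapping from structural HTML elements to schema.org terms.
STATIC_TERMS = {
    "img": "schema:ImageObject",
    "video": "schema:VideoObject",
    "audio": "schema:AudioObject",
    "h1": "schema:headline",
    "a": "schema:url",
}


class StaticRecommender(HTMLParser):
    """Collects schema.org term suggestions from the structural tags of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.terms: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser reports tag names in lower case, so <IMG> and <img> match alike.
        term = STATIC_TERMS.get(tag)
        if term is not None and term not in self.terms:
            self.terms.append(term)


def static_recommendations(html: str) -> List[str]:
    """Return the suggested terms in the order their tags first appear."""
    parser = StaticRecommender()
    parser.feed(html)
    parser.close()
    return parser.terms


if __name__ == "__main__":
    page = '<h1>Hotel Example</h1><img src="lobby.jpg"><a href="/rooms">Rooms</a>'
    print(static_recommendations(page))
```

For the sample page this prints `['schema:headline', 'schema:ImageObject', 'schema:url']`; the textual content ("Hotel Example", "Rooms") plays no role, matching the slide's description.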
  25. Implementation
     • vSearch vocabulary: https://purl.org/vsearch, http://lov.okfn.org/dataset/lov/vocabs/vsearch
       • A vocabulary that facilitates the description of the term discovery output.
       • Can be used to describe any search and its results.
     • Vocab-recommender framework (MIT License): https://github.com/istavrak/vocab-recommender
       • Web Service (endpoints for keywords and for a webpage).
       • User Interface that consumes the Web Service.
  26. Implementation: describing the generated vocabulary (vSearch), https://purl.org/vSearch
  27. Implementation: vocab-recommender architecture
     [Figure: components User, LOV, Static, Pattern KB]
  29. Results: evaluation criteria
     • Precision: the share of retrieved results that are relevant, even if some relevant results are missing.
     • Recall: the share of all relevant results that are retrieved, regardless of whether non-relevant ones are retrieved as well.
     • F-measure: combines precision and recall as their weighted harmonic mean.
     • Speed: the time needed to generate a set of result vocabulary terms.
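For reference, with $R$ the set of recommended terms and $G$ the set of relevant terms (notation introduced here), the standard definitions read as follows. Setting $\beta = 1$ gives the balanced F1 score; $\beta > 1$ weights recall more heavily.

```latex
\[
P = \frac{|R \cap G|}{|R|}, \qquad
\mathit{Rec} = \frac{|R \cap G|}{|G|}, \qquad
F_\beta = \frac{(1+\beta^2)\, P \cdot \mathit{Rec}}{\beta^2 P + \mathit{Rec}}
\]
```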
  30. Results: human-based evaluation
     64 evaluators (computer science students), 4 use cases (article, exhibition, hotel, recipe).
     • Speed: the elapsed time of the discovery process, as reported by the participants (manual: 51 min on average; approach: 1.26 min on average).
     • Precision: measured by reviewing the relevance of the proposed terms.
     • Recall: measured by reviewing whether the proposed terms reflect all the information on the webpage.
     • The approach is compared with the aggregated set of terms proposed by the evaluators.
     [Figure: results for the Article and Hotel use cases]
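Taking the reported averages at face value, the speed-up works out as follows. This is a rough figure, since the times were self-reported:

```latex
\[
\frac{51\ \text{min}}{1.26\ \text{min}} \approx 40.5
\]
```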
  31. Results: machine-based evaluation
     • Precision: measured by evaluating the relevance of the proposed terms against the existing terms.
     • Recall: measured by comparing the set of proposed terms with the set of existing terms.
     • F2: an F-measure weighted towards recall.
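The machine-based metrics can be computed directly on term sets. A minimal sketch, assuming the proposed and existing terms are given as collections of term identifiers (the identifiers in the usage example are hypothetical):

```python
from typing import Iterable, Tuple


def precision_recall_fbeta(proposed: Iterable[str], existing: Iterable[str],
                           beta: float = 2.0) -> Tuple[float, float, float]:
    """Precision, recall and F-beta of proposed terms against existing terms (beta=2 gives F2)."""
    proposed_set, existing_set = set(proposed), set(existing)
    hits = len(proposed_set & existing_set)
    precision = hits / len(proposed_set) if proposed_set else 0.0
    recall = hits / len(existing_set) if existing_set else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0  # avoid 0/0 when nothing matches
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


if __name__ == "__main__":
    proposed = ["schema:name", "schema:email", "schema:image", "schema:telephone", "schema:url"]
    existing = ["schema:name", "schema:email", "schema:image", "schema:telephone"]
    p, r, f2 = precision_recall_fbeta(proposed, existing)
    _, _, f1 = precision_recall_fbeta(proposed, existing, beta=1.0)
    print(f"precision={p:.3f} recall={r:.3f} F2={f2:.3f} F1={f1:.3f}")
```

This prints `precision=0.800 recall=1.000 F2=0.952 F1=0.889`. With recall higher than precision, F2 exceeds F1, which is why F2 suits a recall-oriented evaluation.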
  33. Summary
     • Aim: generation of a result vocabulary for a given set of keywords.
     • Results: the approach outperformed manual discovery, not only compared with each participant individually but also compared with the aggregated set of terms across all participants' result sets.
     • Thesis proved: semi-automatic generation of vocabulary term recommendations for a given webpage can reach a recall of 80%.
  34. Future
     • Assumption: any machine-understandable webpage can be considered eligible for consumption by web-agent applications.
     • Vision: seamless integration of websites with web-agent applications that can perform various tasks.
     • Requirements:
       A. Leveraging websites to machine-understandable entities (the proposed approach).
       B. Web-agent applications should start leveraging website content for their tasks, similarly to how they use APIs.
  35. Acknowledgements
  36. References
     1. Ioannis Stavrakantonakis, Andreas Thalhammer, Alex Oberhauser, Corneliu-Valentin Stanciu, Ioan Toma. D2.4/D2.5: e-Freight Semantic Registry and Repository / e-Freight SESA platform. Technical report, European e-Freight capabilities for Co-modal transport, April 2013.
     2. Ioannis Stavrakantonakis, Andreas Thalhammer, Alex Oberhauser, Corneliu-Valentin Stanciu, Ioan Toma, Audun Vennesland, Thomas Cane. Introduction of the Semantically Enabled Service Architecture to the freight domain. 2nd International Conference on Applied Paperless Freight Transport and Logistics, 2012.
     3. Ioannis Stavrakantonakis, Ioan Toma, Anna Fensel, Dieter Fensel. Hotel websites, Web 2.0, Web 3.0 and online direct marketing: The case of Austria. In Information and Communication Technologies in Tourism 2014, pages 665–677. Springer International Publishing, 2014.
     4. Nikos Bikakis, Chrisa Tsinaraki, Ioannis Stavrakantonakis, Nektarios Gioldasis, Stavros Christodoulakis. The SPARQL2XQuery interoperability framework. World Wide Web, pages 1–88, 2014.
     5. Ioannis Stavrakantonakis. Personal data and user modelling in tourism. In Information and Communication Technologies in Tourism 2013, pages 507–518. Springer, 2013.
     6. Ioannis Stavrakantonakis, Andreea-Elena Gagiu, Harriet Kasper, Ioan Toma, Andreas Thalhammer. An approach for evaluation of social media monitoring tools. Common Value Management, 52, 2012.
     7. Andreas Thalhammer, Ioannis Stavrakantonakis, Ioan Toma. Diversity-aware clustering of SIOC posts. In I-SEMANTICS (Posters & Demos) 2013. Citeseer, 2013.
     8. Anna Fensel, Ioan Toma, José María García, Ioannis Stavrakantonakis, Dieter Fensel. Enabling customers engagement and collaboration for small and medium-sized enterprises in ubiquitous multi-channel ecosystems. Computers in Industry, 65(5):891–904, 2014.
