Approach to leverage Websites to APIs through Semantics

Dipl.-Ing. Ioannis Stavrakantonakis
University of Innsbruck
December 12, 2017
Supervisor: Univ.-Prof. Dr. Dieter Fensel - Universität Innsbruck
Co-supervisor: Ass.-Prof. Dr. Anna Fensel - Universität Innsbruck
External supervisor: Univ.-Prof. Dr. Sören Auer - Leibniz Universität Hannover

Outline
• Problem & Research thesis

• Contributions

• Related work

• Approach

• Implementation

• Results

• Outlook

• Acknowledgements

• References

Problem
Leveraging Web content to machine-understandable Web
data is still not trivial.
[Figure: data accessed by humans and by machines (API)]

[Figure: user accessibility vs. machine accessibility for a website, an annotated website, and an API]

Semantic annotation: a piece of metadata for an informational element that appears in a document; it is machine-interpretable and gives explicit meaning to the data of the element.
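
To make the definition concrete, the following minimal sketch builds a JSON-LD annotation for a hotel using schema.org terms. The hotel name, phone number, and URL are made-up placeholder values, and the helper names `hotel_annotation` and `to_script_tag` are hypothetical; this is an illustration, not part of the presented framework.

```python
import json


def hotel_annotation(name: str, telephone: str, url: str) -> dict:
    """Build a schema.org Hotel description as a JSON-LD document."""
    return {
        "@context": "https://schema.org",
        "@type": "Hotel",
        "name": name,
        "telephone": telephone,
        "url": url,
    }


def to_script_tag(annotation: dict) -> str:
    """Wrap the annotation in the script element used to embed JSON-LD in HTML."""
    payload = json.dumps(annotation, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Placeholder values, not real contact data.
    example = hotel_annotation("Hotel Example", "+43 512 000000", "https://hotel.example")
    print(to_script_tag(example))
```

The same data could equally be expressed in RDFa or Microdata; JSON-LD is used here only because it keeps the annotation separate from the visible markup.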

Problem
Example: a hotel website has data that its partners need.
• Travel search engines
• Reservation websites
• Search engines
Partners use the data for results presentation and query answering. How does the data reach them?
Alternatives
- Manual entry in partners’ systems.
- API implementation by the website and integration by the partner.
- Annotations as API:
  + Saves the cost of building an API.
  + Vocabularies are reused, so third-party integrations can rely on the same schema.
  + Makes Web content machine-understandable with explicit semantics.
  - The learning curve could be a barrier.
  - Vocabulary terms must be discovered, and vocabularies have limitations.
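
The "annotations as API" option can be sketched from the partner's side: instead of calling a dedicated API, the partner reads the structured data embedded in the page. The snippet below is a minimal illustration using only the Python standard library; the page content is made up, and `JsonLdExtractor` and `extract_annotations` are hypothetical names. It handles JSON-LD only, not RDFa or Microdata.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_annotations(html: str) -> list:
    """Return every JSON-LD document found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.documents


# Made-up example page; the hotel and its phone number are placeholders.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Hotel", "name": "Hotel Example", "telephone": "+43 512 000000"}
</script>
</head><body><h1>Hotel Example</h1></body></html>"""

if __name__ == "__main__":
    for doc in extract_annotations(PAGE):
        print(doc["@type"], doc["name"])
```

Because the vocabulary is shared, the same extraction code works for any website that uses the same schema.org terms, which is the reuse benefit listed above.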

Problem
Adding annotations to websites
[Figure: The Semantic Web]
• Which format? RDFa, Microdata, JSON-LD.
• Which vocabulary is more suitable?
• Which vocabulary terms are relevant to the website?

Research Thesis
Vocabulary term recommendations can be generated semi-automatically for a given webpage with a recall of over 80%.
The same semi-automatic approach can also outperform the manual selection of vocabulary terms.

Contributions
• Ranking of the vocabularies in the Semantic Web space.
• Ranking of vocabulary terms using Linked Open Data.
• Recommendation of a set of vocabulary terms based on a keyword set.
• Design and development of an approach to discover vocabulary terms for a given webpage.
• Definition of a new vocabulary that facilitates the description of search queries and results.

Contributions
Related publications
• Linked Open Vocabulary Ranking and Terms Discovery
I. Stavrakantonakis, A. Fensel, D. Fensel - Proceedings of the 12th International Conference on Semantic Systems, SEMANTiCS 2016 (shortlisted among the best papers)

• Towards a Vocabulary Terms Discovery Assistant
I. Stavrakantonakis, A. Fensel, D. Fensel - Proceedings of the 12th International Conference on Semantic Systems, SEMANTiCS 2016 (poster accompanying the paper)

• Linked Open Vocabulary Recommendation Based on Ranking and Linked Open Data
I. Stavrakantonakis, A. Fensel, D. Fensel - Proceedings of the Joint International Semantic Technology Conference, JIST 2015

• Matching Web Entities with Potential Actions
I. Stavrakantonakis, A. Fensel, D. Fensel - Poster Paper track of the 10th International Conference on Semantic Systems, SEMANTiCS 2014

Implementations
• Vocab-recommender framework implementation
https://github.com/istavrak/vocab-recommender (MIT License)

• vSearch vocabulary, https://purl.org/vsearch

Related work
• Vocabulary discovery
Approaches that enable the exploration of the vocabulary space (Linked Open Vocabularies, schema.org, vocab.cc, LODStats).
• Vocabulary ranking
Assignment of a score to the vocabularies based on popularity (e.g. IC, LOV, TermPicker, LOVER, DWRank).
• Semantic annotation tools
Vocabulary-based generators, validators, and annotation editors (Google Structured Data Testing Tool, Microformats, schema.org editors, RDFaCE).
• Vocabulary development collaboration
Ontology building; taxonomy editing; version control systems (DILIGENT, Collaborative Protégé; SOBOLEO, PoolParty; Git4Voc, VoCol).
• Manual semantic annotations
Survey to measure the quality of the end result and the time required.

Related work - LOV
• Linked Open Vocabularies (LOV) provides a comprehensive directory of existing vocabularies.
• Online search engine for vocabularies with a user interface, an API, and a SPARQL endpoint.
• Provides metadata about each listed vocabulary (versions, creators, links to other vocabularies, etc.).
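
As a rough illustration of using the LOV API for term lookup, the sketch below builds a term-search request. The base URL and the `q` and `type` parameter names are assumptions based on the publicly documented LOV API and should be verified before use; `build_term_search_url` and `search_terms` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the LOV term-search API; verify against the LOV documentation.
LOV_TERM_SEARCH = "https://lov.linkeddata.es/dataset/lov/api/v2/term/search"


def build_term_search_url(keyword: str, term_type: str = "class") -> str:
    """Compose the search URL for a keyword and a term type (assumed parameters)."""
    return f"{LOV_TERM_SEARCH}?{urlencode({'q': keyword, 'type': term_type})}"


def search_terms(keyword: str, term_type: str = "class") -> dict:
    """Perform the HTTP request (needs network access) and decode the JSON reply."""
    with urlopen(build_term_search_url(keyword, term_type), timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here; call search_terms() to query the service.
    print(build_term_search_url("hotel"))
```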

Related work - Linked Open Data (LOD)
• Linked Data: exposing, sharing, and connecting pieces of data using URIs.
• LOD Cloud: datasets published as Linked Data, interlinked with other datasets in the cloud.
[Figure: LOD cloud; labels: Linguistics, DBpedia, Government, Life Science, Social Networking, Publications]

Approach
[Figure: manual term discovery compared with the approach (vocab-recommender), starting from a webpage (or keywords)]

Approach
• Goal: Discover vocabulary terms for a given webpage.
• Dimensions:
  • Popularity of vocabularies and vocabulary terms.
  • Success history of the vocabulary authors.
  • Similarity of the input webpage content to vocabulary terms.
  • Webpage content patterns and type.

Approach
Discovery of vocabulary terms
1. Linked Open Vocabularies Recommender (LOVR)
Suggests terms based on the various ranking dimensions of LOV and LOD.

2. Patterns Knowledge Base
Suggests terms based on a set of predefined datatype patterns, e.g. email, phone.

3. Static recommendations
Suggests terms for the structural parts of a webpage without considering the textual content, e.g. images, videos.

Approach
Linked Open Vocabularies Recommender (LOVR)
Vocabulary ranking
• Reflects the position of the vocabulary within the space of vocabularies, independently of any keyword search.
• Uses the authority score of the vocabulary creators, based on the ranking of their previous vocabularies.
Notation:
• B_v = number of backlinks to v
• Number of vocabularies in LOV
• S_v = authority score for v
• A_v = set of authors of v
• V_a = set of vocabularies of author a
• VR(u_k) = score of vocabulary k based on incoming links
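
The exact ranking formulas are not reproduced in this text, so the following is only a sketch of the two ingredients named above, under stated assumptions: a PageRank-style score over the links between vocabularies, and an authority score taken as the mean score of the other vocabularies by the same authors. The damping factor, the averaging, and all function names are assumptions, not the presented method.

```python
def vocabulary_rank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """PageRank-style score; links[v] is the set of vocabularies that v links to."""
    vocabs = set(links) | {t for targets in links.values() for t in targets}
    n = len(vocabs)
    rank = {v: 1.0 / n for v in vocabs}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in vocabs}
        for source in vocabs:
            targets = links.get(source, set())
            if targets:
                # Pass the source's score on to the vocabularies it links to.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A vocabulary without outgoing links spreads its score evenly.
                share = damping * rank[source] / n
                for v in vocabs:
                    new_rank[v] += share
        rank = new_rank
    return rank


def authority_score(vocab: str, authors_of: dict, vocabs_of: dict, rank: dict) -> float:
    """Mean rank of the other vocabularies published by the authors of `vocab`."""
    previous = {
        other
        for author in authors_of.get(vocab, set())
        for other in vocabs_of.get(author, set())
        if other != vocab
    }
    if not previous:
        return 0.0
    return sum(rank[other] for other in previous) / len(previous)


if __name__ == "__main__":
    # Made-up link graph: two vocabularies both link to "core".
    links = {"a": {"core"}, "b": {"core"}, "core": set()}
    rank = vocabulary_rank(links)
    print(sorted(rank, key=rank.get, reverse=True))
    print(authority_score("new", {"new": {"alice"}}, {"alice": {"new", "core"}}, rank))
```

Backlinks raise a vocabulary's score, and a new vocabulary with no backlinks yet can still inherit credibility from its authors' earlier work, which is the intuition behind the second bullet.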

Approach
Vocabulary term LOD ranking
• Reflects the occurrences of the terms in the LOD space.
• Biased by the LOD snapshots used.
Notation:
• OR(t) = overall ranking of term t
• DR(t) = document ranking of term t
• OC(t) = occurrences of term t
Data sources:
• vocab.cc
• LODStats
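
One way to turn occurrence counts into a ranking score, and to soften the bias of any single snapshot, is to normalise the counts per snapshot and average them. This is an assumed scheme for illustration only, not the formula from the slides; the counts below are made up and the function names are hypothetical.

```python
def normalised_occurrences(counts: dict) -> dict:
    """Scale the raw occurrence counts of one snapshot to [0, 1] by the largest count."""
    top = max(counts.values(), default=0)
    if top == 0:
        return {term: 0.0 for term in counts}
    return {term: count / top for term, count in counts.items()}


def overall_term_ranking(snapshots: dict) -> dict:
    """Average the normalised score of each term over all snapshots."""
    totals = {}
    for counts in snapshots.values():
        for term, score in normalised_occurrences(counts).items():
            totals[term] = totals.get(term, 0.0) + score
    return {term: total / len(snapshots) for term, total in totals.items()}


if __name__ == "__main__":
    # Made-up occurrence counts for two snapshots.
    snapshots = {
        "vocab.cc": {"foaf:name": 100, "schema:name": 50},
        "LODStats": {"foaf:name": 10, "schema:name": 10},
    }
    print(overall_term_ranking(snapshots))
```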

Approach
Vocabulary term ranking
• Combines the vocabulary LOV ranking and the term LOD ranking.
• A higher score denotes a higher position in the ranking.
Notation:
• TR_LOD,t = ranking of term t based on LODStats
• TR_BTCD,t = ranking of term t based on BTCD
• VSR_LOV,V = ranking in LOV of the vocabulary V that term t belongs to
• re_t,k = relevance of term t to keyword k
• a = constant (e.g. 100)
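
The combining formula itself is not preserved in this text. The sketch below shows one plausible way to combine the listed quantities, purely as an assumption: the keyword relevance scales the sum of the vocabulary ranking (amplified by the constant a) and the two LOD-based term rankings. The input values are made up.

```python
def term_score(re_tk: float, vsr_lov: float, tr_lod: float, tr_btcd: float, a: float = 100.0) -> float:
    """Hypothetical combination: relevance * (a * vocabulary rank + LOD term ranks)."""
    return re_tk * (a * vsr_lov + tr_lod + tr_btcd)


def rank_terms(candidates: dict, a: float = 100.0) -> list:
    """Sort candidate terms by descending score; a higher score means a higher position."""
    scored = {term: term_score(*values, a=a) for term, values in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


if __name__ == "__main__":
    # Made-up inputs per term: (re_tk, vsr_lov, tr_lod, tr_btcd).
    candidates = {
        "schema:Hotel": (0.9, 0.02, 0.3, 0.1),
        "acco:Hotel": (0.9, 0.005, 0.1, 0.0),
    }
    print(rank_terms(candidates))
```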

Approach
Patterns Knowledge Base
• Extraction of various datatypes that appear in the content of the webpage.
• E.g. Phone, Date, Time, Price, Email, Person, Organisation.
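
A pattern knowledge base of this kind can be sketched as a list of regular expressions, each mapped to a vocabulary term. The patterns and the mapping to schema.org properties below are simplified assumptions for illustration, not the patterns shipped with the framework; entity types such as Person or Organisation would need more than regular expressions.

```python
import re

# Hypothetical pattern knowledge base: (schema.org property, regular expression).
PATTERNS = [
    ("schema:email", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("schema:telephone", re.compile(r"\+\d{1,3}(?:[ -]?\d{2,}){2,}")),
    ("schema:price", re.compile(r"(?:€|\$|£)\s?\d+(?:[.,]\d{2})?")),
]


def recommend_from_patterns(text: str) -> dict:
    """Return each recommended term together with the text fragments that triggered it."""
    found = {}
    for term, pattern in PATTERNS:
        matches = pattern.findall(text)
        if matches:
            found[term] = matches
    return found


if __name__ == "__main__":
    # Made-up page text with placeholder contact details.
    text = "Contact us at info@hotel.example or +43 512 123456. Rooms from €89.00 per night."
    for term, matches in recommend_from_patterns(text).items():
        print(term, matches)
```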

Approach
Static recommendations
• Derived from the structural elements of a webpage.
• E.g.: img, video, audio, h1, a.
• Recommend the usage of schema.org terms.
[Figure: webpage with link, image, and h1 elements highlighted]
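
Static recommendations can be sketched as a fixed lookup from HTML element names to schema.org terms, applied while scanning the page structure. The specific mapping below is an assumption chosen for illustration; the framework's actual mapping may differ.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML elements to schema.org terms; illustrative only.
STATIC_TERMS = {
    "img": "schema:image",
    "video": "schema:VideoObject",
    "audio": "schema:AudioObject",
    "h1": "schema:headline",
    "a": "schema:url",
}


class StructureScanner(HTMLParser):
    """Records one recommended term per structural element type, in order of appearance."""

    def __init__(self) -> None:
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        term = STATIC_TERMS.get(tag)
        if term and term not in self.terms:
            self.terms.append(term)


def recommend_static_terms(html: str) -> list:
    scanner = StructureScanner()
    scanner.feed(html)
    return scanner.terms


if __name__ == "__main__":
    page = '<h1>Hotel Example</h1><img src="room.jpg"><a href="/book">Book</a>'
    print(recommend_static_terms(page))
```

The textual content is never inspected here, which matches the idea that these recommendations depend only on the page structure.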

Implementation
vSearch vocabulary
https://purl.org/vsearch, http://lov.okfn.org/dataset/lov/vocabs/vsearch
• Vocabulary that facilitates the description of the terms discovery output.
• Can be used to describe any search and its results.

Vocab-recommender framework (MIT License)
https://github.com/istavrak/vocab-recommender
• Web service (endpoints for keywords and webpage).
• User interface that consumes the web service.

Implementation
Describing the generated vocabulary (vSearch)
https://purl.org/vSearch

Implementation
Vocab-recommender Architecture
[Figure: architecture diagram; labels: User, LOV, Static, Pattern KB]

Results
Evaluation Criteria
• Precision
The share of retrieved results that are relevant, regardless of whether some relevant results are missing.
• Recall
The share of all relevant results that are retrieved, regardless of whether non-relevant ones are retrieved as well.
• F-Measure
Combines precision and recall as their weighted harmonic mean.
• Speed
Time needed to generate a set of result vocabulary terms.
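
The first three criteria can be computed directly from two sets of terms: the proposed terms and the relevant ones. The sketch below uses the standard definitions; the term sets are made-up examples and the function names are hypothetical.

```python
def precision_recall(proposed: set, relevant: set) -> tuple:
    """Precision = hits / proposed terms, recall = hits / relevant terms."""
    hits = len(proposed & relevant)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Made-up term sets for illustration.
    proposed = {"schema:name", "schema:telephone", "schema:email", "schema:image"}
    relevant = {"schema:name", "schema:telephone", "schema:email",
                "schema:address", "schema:priceRange"}
    p, r = precision_recall(proposed, relevant)
    print(p, r, f_measure(p, r), f_measure(p, r, beta=2.0))
```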

Results
Human-based evaluation
64 evaluators (Computer Science students), 4 use cases (article, exhibition, hotel, recipe)
• Speed: The elapsed time of the discovery process was reported by the participants (manual avg. 51 min vs. approach avg. 1.26 min).
• Precision: Measured by reviewing the relevance of the proposed terms.
• Recall: Measured by reviewing whether the proposed terms reflect all the information.
• The approach is compared to the aggregated set of terms proposed by the evaluators.
[Figure: results for the Article and Hotel use cases]

Results
Machine-based evaluation
• Precision: Measured by evaluating the relevance of the proposed terms against the existing terms.
• Recall: Measured by comparing the set of proposed terms with the set of existing terms.
• F2: F-measure weighted towards recall.
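
For reference, F2 is the β = 2 case of the general F-measure. With P for precision and R for recall, the standard definition (not specific to this work) is:

```latex
F_\beta = (1+\beta^2)\,\frac{P \cdot R}{\beta^2 P + R},
\qquad
F_2 = \frac{5\,P\,R}{4P + R}
```

With β = 2, recall counts more heavily than precision, which fits a recommender whose main risk is missing relevant terms.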

Summary
• Aim
Generation of a result vocabulary for a given set of keywords.

• Results
The approach outperformed manual discovery, not only against each participant individually but also against the aggregated set of terms across all the participants’ result sets.
• Proved thesis
Semi-automatic generation of vocabulary term recommendations for a given webpage can reach a recall of 80%.

Future
Assumption: Any machine-understandable webpage can be considered eligible for consumption by web-agent applications.
Vision: Seamless integration of websites with web-agent applications that can perform a variety of tasks.
Requirements:
A. Leveraging websites to machine-understandable entities (the proposed approach).
B. Web-agent applications should start leveraging website content for their tasks, much as they use APIs today.
