In recent years, several important content providers such as Amazon, Musicbrainz, IMDb, Geonames, Google, and Twitter have chosen to export their data through Web services. To unleash the potential of these sources for new intelligent applications, the data has to be combined across different APIs.
To this end, we have developed ANGIE, a framework that maps the knowledge provided by Web services dynamically into a local knowledge base. ANGIE represents Web services as views with binding patterns over the schema of the knowledge base. In this talk, I will focus on two problems related to our framework.
In the first part, the focus will be on the automatic integration of new Web services. I will present a novel algorithm for inferring the view definition of a given Web service in terms of the schema of the global knowledge base. The algorithm also generates a declarative script that can transform the call results into results of the view. Our experiments on real Web services show the viability of our approach.
The second part will address the evaluation of conjunctive queries under a budget of calls. Conjunctive queries may require an unbounded number of calls in order to compute the maximal answers. However, Web services typically allow only a fixed number of calls per session. Therefore, we have to prioritize query evaluation plans. We are working on distinguishing, among all plans that could return answers, those plans that actually will. Finally, I will show an application for this new notion of plans.
2. Motivating example
Long-term goal: new intelligent applications, such as
applications that automatically compute vacation plans
Example:
• I would like to travel for 3 weeks in South America
• Visit UNESCO sites
• Old palaces
3. Automatic computation of vacation plans
• Personal calendar (Web service API)
• Travel-related books (Web service API)
• Flights (Web service API)
• Countries, cities, airports (Web service API)
4. Web Service APIs available on the Web
ProgrammableWeb.com counts >12000 APIs from various domains:
• Search (3200 APIs)
• Social (3000 APIs)
• Traveling (1200 APIs)
• Music (1000 APIs)
• Financial (1200 APIs), Science (600 APIs), Weather (300 APIs)
5. Query examples
• Places in Peru listed as UNESCO heritage
• Books written by South American Nobel Prize Winners
• Memorial houses of Brazilian Kings
6. Our research
• Query Evaluation using Web Service APIs
• Mapping Web Services to Knowledge Bases
[Figure: three systems. SUSIE combines Web services with the WWW; ANGIE combines Web services with the KB; DORIS maps Web services to the knowledge base.]
8. Problem Description
Given a query Q against
• a knowledge base (KB)
• a set of Web services F
• a bound Max for the number of Web service calls
compute answers for Q using at most Max calls
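The problem statement can be read as a loop that stops as soon as the call budget is used up. The following is only a minimal sketch under stated assumptions: `evaluate_with_budget`, the `planned_calls` list and the toy answers are hypothetical names and data, not part of ANGIE.

```python
from typing import Callable, Iterable, List, Set, Tuple

Answer = Tuple[str, ...]


def evaluate_with_budget(
    kb_answers: Set[Answer],
    planned_calls: Iterable[Callable[[], List[Answer]]],
    max_calls: int,
) -> Tuple[Set[Answer], int]:
    """Start from the answers already in the KB, then execute at most
    max_calls Web service calls, in the order given by the plan."""
    answers = set(kb_answers)
    used = 0
    for call in planned_calls:
        if used >= max_calls:
            break  # budget exhausted: remaining calls are never executed
        answers.update(call())
        used += 1
    return answers, used


# Toy stand-ins for Web service calls (hypothetical data).
calls = [
    lambda: [("Pedro I",)],
    lambda: [("Pedro II",)],
    lambda: [("Lula",)],
]
answers, used = evaluate_with_budget({("Juan VI",)}, calls, max_calls=2)
```

With a budget of two calls, the third call is skipped, which is why the order of the calls in the plan matters.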
9. Representing functions of Web Service APIs
A function is a named parameterized conjunctive query where
• Inputs must be bound to entities before the call execution
• Outputs are bound as the result of the call
• Relations are from a global schema (knowledge base schema)
[Figure: the function as a graph. Inputs: parent, p_place; outputs: ?child, ?c_place; edges: birthplace(parent, p_place), hasChild(parent, ?child), birthplace(?child, ?c_place).]
getChildren(parent, p_place, ?child, ?c_place) :- birthplace(parent, p_place),
    hasChild(parent, ?child),
    birthplace(?child, ?c_place)
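A function with a binding pattern can be sketched as a small data structure. This is an illustrative sketch, not ANGIE's internal representation: the class `WebFunction` and its method are hypothetical, and the third body atom (the child's birthplace) is an assumption read off the figure labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Atom = Tuple[str, str, str]  # (relation, subject, object)


@dataclass(frozen=True)
class WebFunction:
    """A named parameterized conjunctive query with a binding pattern."""
    name: str
    inputs: Tuple[str, ...]   # must be bound before the call
    outputs: Tuple[str, ...]  # bound by the call result
    body: Tuple[Atom, ...]    # atoms over the global (KB) schema

    def missing_inputs(self, bindings: Dict[str, str]) -> List[str]:
        """Inputs that are still unbound; the call is executable iff empty."""
        return [v for v in self.inputs if v not in bindings]


get_children = WebFunction(
    name="getChildren",
    inputs=("parent", "p_place"),
    outputs=("?child", "?c_place"),
    body=(
        ("birthplace", "parent", "p_place"),
        ("hasChild", "parent", "?child"),
        ("birthplace", "?child", "?c_place"),  # assumption, see lead-in
    ),
)

ready = get_children.missing_inputs(
    {"parent": "Pedro I of Brazil", "p_place": "Queluz Palace, Lisbon"}
)
not_ready = get_children.missing_inputs({"parent": "Pedro I of Brazil"})
```

The second lookup reports `p_place` as missing, so the call could not be executed with that binding alone.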
11. Baseline Solution (aiming at completeness)
[Figure: to be complete, the baseline calls getChildren (birthplace, hasChild) for every known entity, e.g. Isabella of Austria (Brussels), Pedro II of Brazil (Palace of São Cristóvão, Rio de Janeiro), Queen Victoria of the UK (Kensington Palace, London), …]
But I only have a small budget of calls!
12. ANGIE Algorithm: the bang for the buck
[Figure: the query birthplace(Pedro II of Brazil, ?place) is answered by a chain of getChildren(parent, p_place, ?child, ?c_place) calls over the entities Juan VI of Portugal, Pedro I of Brazil and Pedro II of Brazil, with the places Ajuda (Lisbon), Queluz Palace (Lisbon) and Palace of São Cristóvão (Rio de Janeiro).]
13. Property
For a pipeline of calls
$W_1 < W_2 < \dots < W_i < \dots < W_n < Q$
where the inputs are extracted using the local queries
$Q_1^{KB}, Q_2^{KB}, \dots, Q_i^{KB}, \dots, Q_n^{KB}$:
if the knowledge base has answers for $Q_i^{KB}$, then
execute only $W_i, \dots, W_n$
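The property can be sketched as choosing a suffix of the pipeline. The code below is a hedged illustration: `calls_to_execute` is a hypothetical helper, the booleans stand in for "the local query for call i has answers in the KB", and picking the largest such i (to spend the fewest calls) is an assumption of this sketch.

```python
from typing import List, Sequence


def calls_to_execute(
    calls: Sequence[str], kb_has_answers: Sequence[bool]
) -> List[str]:
    """Return the suffix W_i..W_n for the largest i whose local query
    already has answers in the KB; return the whole pipeline otherwise."""
    start = 0
    for i, has_answers in enumerate(kb_has_answers):
        if has_answers:
            start = i  # a later hit lets us skip even more calls
    return list(calls[start:])


pipeline = ["W1", "W2", "W3", "W4"]
suffix = calls_to_execute(pipeline, [False, True, True, False])
full = calls_to_execute(pipeline, [False, False, False, False])
```

In the first case the inputs of W3 are already available locally, so W1 and W2 need not be called at all.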
14. Web call composition graph
[Figure: composition graph for the query birthplace(?place) over YAGO. Calls shown: GetChildren (Juan VI of Portugal, Ajuda; Pedro I of Brazil; Pedro II of Brazil) and GetPersonId followed by GetInfoByPersonId (id_Pedro-II), connected through the hasId relation.]
16. ANGIE: Active Knowledge & Interaction Exploration
Query Mediator
Dynamically computes the Web calls that answer the query
RDF Warehouse
• The local KB stores the results of all executed Web calls
• Stored call results may speed up the evaluation of related queries
Active Knowledge: Dynamically Enriching RDF Knowledge Bases by Web Services.
with F. M. Suchanek, G. Kasneci, T. Neumann, W. Yuan, G. Weikum, SIGMOD 2010
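The warehouse idea (store every call result locally so that related queries do not repeat the call) can be sketched as a cache in front of the service. This is a minimal illustration only: `CallCache` and `fake_get_children` are hypothetical, and a plain dictionary stands in for the RDF store.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


class CallCache:
    """Keeps the results of executed calls so that each distinct call
    is sent to the Web service at most once."""

    def __init__(self, service: Callable[[str], List[Triple]]) -> None:
        self._service = service
        self._results: Dict[str, List[Triple]] = {}
        self.calls_made = 0  # number of calls that reached the service

    def call(self, arg: str) -> List[Triple]:
        if arg not in self._results:
            self.calls_made += 1
            self._results[arg] = self._service(arg)
        return self._results[arg]


def fake_get_children(parent: str) -> List[Triple]:
    # Stand-in for a real Web service (hypothetical data).
    data = {
        "Pedro I of Brazil": [
            ("Pedro I of Brazil", "hasChild", "Pedro II of Brazil")
        ]
    }
    return data.get(parent, [])


cache = CallCache(fake_get_children)
first = cache.call("Pedro I of Brazil")
second = cache.call("Pedro I of Brazil")  # served locally, no new call
```

Only the first lookup counts against the call budget; the second is answered from the stored result.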
18. Problem: Asymmetric accesses
• Consider a source publishing only the Web service:
getLeaderInfo(leader, type, country)
• And the queries:
Q1: getLeaderInfo(Pablo II, ?, ?): easy
Q2: getLeaderInfo(?, ?, Brazil): impossible
Q3: getLeaderInfo(?, king, Brazil): impossible
Brute force over a DB of leaders: 1 million calls, of which only two will succeed
19. Our Approach: Use the Web as an Oracle
Example: implement “get head by country and type”
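[Figure: the input (King, Brazil) is turned into the keyword query “King of Brazil?”; information extraction (IE) over the returned HTML yields the candidates Lula, Pedro I and Pedro II; each candidate is checked with getLeaderInfo, which returns (King, Brazil) twice and (President, Brazil) once, so that candidate is rejected (X).]
3 calls and 2 will succeed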
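The oracle approach can be sketched as "guess candidates, then verify each one with the Web service". The code is an illustration under assumptions: `answer_with_oracle` and `fake_get_leader_info` are hypothetical, the candidate list stands in for the output of information extraction, and the stub data mirrors the slide's example.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

LeaderInfo = Tuple[str, str, str]  # (leader, type, country)


def answer_with_oracle(
    candidates: Sequence[str],
    get_leader_info: Callable[[str], Optional[LeaderInfo]],
    wanted_type: str,
    country: str,
) -> Tuple[List[str], int]:
    """Verify each oracle candidate with one Web service call and keep
    only those whose returned type and country match the query."""
    answers: List[str] = []
    calls = 0
    for candidate in candidates:
        calls += 1
        info = get_leader_info(candidate)
        if info is not None and info[1] == wanted_type and info[2] == country:
            answers.append(candidate)
    return answers, calls


# Stub for getLeaderInfo with the values shown on the slide.
_FAKE_DB: Dict[str, LeaderInfo] = {
    "Lula": ("Lula", "President", "Brazil"),
    "Pedro I": ("Pedro I", "King", "Brazil"),
    "Pedro II": ("Pedro II", "King", "Brazil"),
}


def fake_get_leader_info(leader: str) -> Optional[LeaderInfo]:
    return _FAKE_DB.get(leader)


kings, calls = answer_with_oracle(
    ["Lula", "Pedro I", "Pedro II"], fake_get_leader_info, "King", "Brazil"
)
```

Because the service verifies every candidate, imprecise extraction costs extra calls but does not introduce wrong answers.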
20. Model oracles as functions
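[Figure: the inputs [country, head-type] are turned into the keyword query “[type] of [country]”; information extraction (IE) over the returned HTML produces the outputs, which are verified by the Web service.]
oracleGetCandidates(person, type, country)
[Graph: ?person is linked to country by headOf and to type by type.]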
23. Consider the additional Web services
getCurrentLeader(country, leader)
[Graph: leader is linked to country by headOf.]
getPredecessor(leader, pLeader, pType, pDate, pCountry)
[Graph: edges predecessor, headOf, type and inauguration (date) over leader, country, type and date.]
25. Smart calls vs. relevant but “guess” plans
[Figure: two plans for the query asking for the head of Brazil of type King and the inauguration date, drawn over the relations headOf, type, inauguration and predecessor. Calls shown: getCurrentLeader(Brazil), getPredecessor(leader), oracleGetCandidates(Brazil, King), getInaugurationDay(leader).]
26. Smart calls
Given a call $W_i$ that belongs to a plan $W_1, \dots, W_i, \dots, W_n$, we say $W_i$
is a smart call if its consequences are:
• either included in the union of the consequences of the
previous functions $W_1, \dots, W_{i-1}$
• or atoms of the query
Property:
If a plan consists only of smart calls, and if every call has
results, then the plan will deliver an answer for the query.
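The definition can be turned into a simple check once the consequences of each call are given as sets of atoms. This is a hedged sketch: the function names are hypothetical, atoms are compared purely syntactically, and reading the definition atom by atom (each consequence is either already produced or a query atom) is an assumption of this illustration.

```python
from typing import Sequence, Set, Tuple

Atom = Tuple[str, str, str]  # (relation, subject, object)


def is_smart_call(
    index: int, consequences: Sequence[Set[Atom]], query_atoms: Set[Atom]
) -> bool:
    """Every consequence of call `index` is either among the consequences
    of earlier calls or an atom of the query."""
    previous: Set[Atom] = set().union(*consequences[:index])
    return all(
        atom in previous or atom in query_atoms
        for atom in consequences[index]
    )


def is_smart_plan(
    consequences: Sequence[Set[Atom]], query_atoms: Set[Atom]
) -> bool:
    return all(
        is_smart_call(i, consequences, query_atoms)
        for i in range(len(consequences))
    )


query = {
    ("headOf", "?x", "Brazil"),
    ("type", "?x", "King"),
    ("inauguration", "?x", "?d"),
}
smart_plan = [
    {("headOf", "?x", "Brazil"), ("type", "?x", "King")},
    {("inauguration", "?x", "?d")},
]
guess_plan = [
    {("headOf", "?y", "Brazil")},      # current leader, not the queried one
    {("predecessor", "?y", "?x")},     # not a query atom
]
smart_ok = is_smart_plan(smart_plan, query)
guess_ok = is_smart_plan(guess_plan, query)
```

The second plan may still reach an answer, but nothing guarantees it, which is the sense in which it is a "guess" plan here.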
27. Experiments
50 Web services from three domains:
• Books
• isbndb.org
• librarything.com
• abebooks.com
• Movies
• internetvideoarchive.com (IVA)
• Music
• musicbrainz.org
• last.fm
• discogs.com
• lyricWiki.org
28. Evaluation results
Get prize winners                           TD   ANGIE   SUSIE
Nobel Prize in Literature                    0       0      14
Golden Pen Award                             0       0      11
Franz Kafka Prize                            0       0       5
American Book Medal                          0       0      16
Jerusalem Prize                              0       0      11

Get books of winners of prize               TD   ANGIE   SUSIE
Nobel Prize in Literature                    0       0     198
Golden Pen Award                             0       0     228
Franz Kafka Prize                            0       0     132
Jerusalem Prize                              0       0     220

Get books of winners by prize and country   TD   ANGIE   SUSIE
Nobel Prize in Literature, France            0       0     144
Franz Kafka Prize, UK                        0       0      79
29. Related Work: Answering Queries using Views
• Maximal contained rewritings (MCR)
• Plans computing the largest number of answers
• Approaches based on reducing the number of irrelevant calls
• Benedikt et al., PODS 2011, VLDB 2012
• S. Kambhampati, JIIS 2004
• SUSIE does not target maximal contained rewritings
• Relevant calls for MCR include all calls that might return results
• Smart calls are a subset of relevant calls
30. SUSIE
• Addressed the problem of asymmetric accesses
• A novel approach to answer such queries where the inputs
for the Web service call are extracted on the fly, from the Web
• New evaluation algorithm that prioritizes smart calls
• An experimental evaluation using a representative set of
queries and real data sources
SUSIE: Search Using Services and Information Extraction.
with F. M. Suchanek, W. Yuan, G. Weikum, ICDE 2013
31. Ongoing work
Given a query Q and a set of functions F, compute all smart plans
(those for which it can be proven that they return answers)
33. Web Service API
• Web Services for applications ≅ Web forms for humans
• An API = a collection of Web services
• A Web Service
• expects bindings for input parameters
• returns structured data: XML or JSON
<geonames>
<country>
<ccode> AR </ccode>
<cname> Argentina </cname>
<isonumeric>032</isonumeric>
<fipscode> ARG </fipscode>
<continent> SA </continent>
<continentName> Argentina
</continentName>
<capital> Buenos Aires </capital>
<cities>
<city>
<name>Buenos Aires</name>
34. Goals
For every Web service:
1) Compute a parameterized query (relations are from the KB)
2) Compute an XSLT transformation script
to be applied to every call result:
XML result → results for the parameterized query
35. 1) Parameterized query for getCountryByName
getCountryByName(country, name, time-zone, capital, type, lat, lng,
city, c_lat, c_lng)
[Figure: the view graph over the KB relations label, hasCapital, time-zone, hasCity, type, lat and lng, aligned with the XML tree of the call result for getCountryByName(Argentina), whose text nodes include “Republic”, “ARS”, “Argentina”, “Buenos Aires” (“-34”, “-64”) and “Córdoba” (“-31.40833”, “-64.18388”).]
36. 2) An XSLT transformation for all call results
[Figure: XML tree of the call result for Romania, with text nodes “Republic”, “GMT+2”, “Romania”, “Bucharest” (“44.4”, “26.1”) and “Rm Valcea” (“45.1”, “24”).]
getCountryByName(Romania, GMT+2, Bucharest, Republic,
44.4, 26.1, Bucharest, 44.4, 26.1)
getCountryByName(Romania, GMT+2, Bucharest, Republic,
44.4, 26.1, Rm Valcea, 45.1, 24)
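The effect of such a transformation (one nested XML result becomes several flat tuples of the view, one per city) can be illustrated in Python. Note the swap: this sketch uses `xml.etree.ElementTree` rather than XSLT, and it is not the script that the algorithm generates. The sample document, the element names `lat` and `lng` inside `<city>`, and the reduced set of columns are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, well-formed call result modeled on the slide's excerpt.
SAMPLE = """
<geonames>
  <country>
    <cname> Romania </cname>
    <capital> Bucharest </capital>
    <cities>
      <city><name>Bucharest</name><lat>44.4</lat><lng>26.1</lng></city>
      <city><name>Rm Valcea</name><lat>45.1</lat><lng>24</lng></city>
    </cities>
  </country>
</geonames>
"""

Row = Tuple[str, str, str, str, str]  # (name, capital, city, c_lat, c_lng)


def flatten_country(xml_text: str) -> List[Row]:
    """Emit one tuple per (country, city) pair found in the call result."""
    root = ET.fromstring(xml_text)
    rows: List[Row] = []
    for country in root.findall("country"):
        name = country.findtext("cname", "").strip()
        capital = country.findtext("capital", "").strip()
        for city in country.findall("cities/city"):
            rows.append((
                name,
                capital,
                city.findtext("name", "").strip(),
                city.findtext("lat", "").strip(),
                city.findtext("lng", "").strip(),
            ))
    return rows


rows = flatten_country(SAMPLE)
```

The country-level values are repeated in every row, which matches the two getCountryByName tuples shown for Romania.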
37. General Challenges
• Heterogeneity: Every Web service has its own schema for outputs
• Schemas are unknown
• >85% of Web services implemented using REST
• REST Web services do not expose schema descriptions
Our approach: use the overlap between
Web services & Knowledge Bases
39. Three-step algorithm
1) Align root-to-text-node paths with paths from the input in the KB
2) Compute class and relation alignment candidates satisfying
functional constraints
3) For each candidate, compute transformation functions and check
inclusion and equivalence for the non-functional relations
Observation:
The first 2 steps alone lead to a precision and recall of around 90%
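Step 1 relies on instance overlap: the text values found under an XML path are compared with the literals of each KB relation. The sketch below is a simplified illustration, not the DORIS algorithm: `align_paths`, the overlap score (fraction of observed values found among the relation's literals) and the toy data are assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple


def align_paths(
    xml_values: Dict[str, List[str]],
    kb_values: Dict[str, Set[str]],
) -> Dict[str, Tuple[str, float]]:
    """For each root-to-text-node path, propose the KB relation whose
    literals overlap most with the values observed under that path."""
    alignment: Dict[str, Tuple[str, float]] = {}
    for path, values in xml_values.items():
        observed = {v.strip().lower() for v in values}
        best: Optional[str] = None
        best_score = 0.0
        for relation, literals in kb_values.items():
            normalized = {lit.strip().lower() for lit in literals}
            score = len(observed & normalized) / len(observed) if observed else 0.0
            if score > best_score:
                best, best_score = relation, score
        if best is not None:
            alignment[path] = (best, best_score)
    return alignment


# Toy data (hypothetical): values collected over several call results.
xml_values = {
    "/geonames/country/cname": [" Argentina ", " Romania "],
    "/geonames/country/capital": [" Buenos Aires ", " Bucharest "],
}
kb_values = {
    "label": {"Argentina", "Romania", "Peru"},
    "hasCapital": {"Buenos Aires", "Bucharest", "Lima"},
}
alignment = align_paths(xml_values, kb_values)
```

A value such as "Buenos Aires" can match several relations (a country's capital and a city label), which is why the later steps filter candidates with functional constraints.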
40. DORIS: Some experimental results
More than 50 Web services from 4 domains
• Books
• Movies
• Music
• Geo data
KB        Precision             Recall
          Classes   Relations   Classes   Relations
YAGO      0.92      0.91        0.96      0.93
DBpedia   0.89      0.88        0.98      0.95
BNF       1         1           1         1
41. Summary
• Addressed the problem of inferring views
• An instance-based approach to the schema matching problem
• An experimental evaluation using real Web sources
DORIS: Discovering ontological relations in sources.
with Mary Koutraki, Dan Vodislav, in preparation
42. Our work
• Query Evaluation using Web Service APIs
• Mapping Web Services to Knowledge Bases
[Figure: three systems. SUSIE combines Web services with the WWW; ANGIE combines Web services with the KB; DORIS maps Web services to the knowledge base.]
43. Same plan as a graph
[Figure: the plan as a graph for the query asking for the head of state of Brazil of type King. getCurrentHeadOfState(Brazil) returns Dilma Rousseff (President, 1 January 2011); successive getPredecessor calls return Lula da Silva (President, 1 January 2003) and Henrique Cardoso (President, 1 January 1995).]
44. IE: Authors who won prize X
Precision   Recall   Prize
38%         59%      National Book
62%         44%      Phoenix
23%         52%      Jerusalem
78%         79%      Pulitzer
25%         73%      Franz Kafka
31%         13%      Prix Femina
28%         6%       Prix Décembre
41%         29%      Nobel Prize
25%         73%      Golden Pen
45. Challenges of an instance-based approach
• XML elements do not correspond to entities in KB
• Entities in KB are URIs and are not to be found in call results
• What is an entity in the XML call result?
• Spurious matches (Argentina is a capital and also a person)
Idea: align properties expressed as text or literals first
Editor's Notes
A method is for applications what a Web form is for Internet users
Every method is a predefined but unknown parameterized query
Heterogeneity of schemas
Every Web service method has its own schema
We use a general-purpose knowledge base as the global schema
We rely on the data rather than on the schema to infer the mappings: REST Web services give us no schema information and no constraints that we could take into account. Related work relies on schema information
Compute the overlap between sources at the instance level (leaf nodes of the XML, literal nodes of the RDF)