Towards Virtual Knowledge Graphs over Web APIs
Francesco Corcoglioniti
2022-11-09
postdoc @ KRDB, Free University of Bolzano,
supported by HIVE Fusion Grant project (2021-2022), OntoCRM project (2022-2024), and Ontopic s.r.l
slides available online at https://bit.ly/3WOoldB
1. Introduction
2. The VKG Framework
3. The Ontop VKG System
4. VKGs over Web APIs
5. Conclusions
Big Data Context
Towards Virtual Knowledge Graphs over Web APIs – slides available at https://bit.ly/3WOoldB 1/34
Variety Drives Data Management Initiatives
Relative importance: Variety 69%, Volume 25%, Velocity 6%
Source: http://sloanreview.mit.edu/article/variety-not-volume-is-driving-big-data-initiatives/ (2016)
Data model heterogeneity
relational data, graph data, XML, JSON, CSV,
text files, ...
System heterogeneity
even when systems adopt the same data
model, they are not always fully compatible
Schema heterogeneity
different people see things differently, and
design schemas differently
Data-level heterogeneity
e.g., ‘IBM’ vs. ‘Int. Business Machines’ vs.
‘International Business Machines’
Querying Data Takes Time and IT Expertise (besides Domain Knowledge)
Query from Statoil (now Equinor) use case
EU FP7 Optique project
Natural language: In a given area, return
pressure data tagged with stratigraphy and
quality control attributes
SQL: huge query joining 9 tables, the main one
with 38 columns with cryptic names
Query from Sloan Digital Sky Survey use case
EU H2020 INODE project
Natural language: Get all white dwarf stars
SQL: unintelligible query defining ‘white dwarf’
SELECT objID
FROM skyserverv3_correct.star
WHERE u - g < .4 AND g - r < .7 AND
r - i > .4 AND i - z > .4
Virtual Knowledge Graphs (VKG) – a Data Access / Integration Solution
Three key ideas:
1. use a global (or integrated) schema and map the data sources to the global schema
2. adopt a very flexible data model for the global schema
→ Knowledge Graph (KG) whose vocabulary is expressed in an ontology.
3. exploit virtualization, i.e., the KG is not materialized, but kept virtual
This gives rise to the Virtual Knowledge Graph (VKG) approach to data access / integration, also
called Ontology-Based Data Access / Integration (OBDA)
Virtual Knowledge Graphs (VKG) – Core Components
Ontology
conceptualizes a domain of interest in terms of
classes and (binary) properties, overall defining
the terminological knowledge (TBox) of the VKG
Data sources
provide the data forming the RDF triples, i.e., the
assertional knowledge (ABox), of the VKG
Mapping
defines how to generate the RDF triples from the raw
data (e.g., relational), via mapping assertions that
populate each class/property of the ontology
Queries
formulated against the VKG (which is virtual) and
rewritten into native queries evaluated over the sources
[Figure: VKG architecture. A query is posed over the ontology O, which is linked by the mapping M to the data sources D; results are returned to the user.]
2. The VKG Framework
VKG Framework – Which Languages to Use?
Need to balance
• expressive power
of adopted languages for O, M, q
• query answering efficiency
with respect to data size
W3C has standardized languages that are suitable for VKGs:
• Knowledge graph: expressed in RDF (W3C Rec. 2014)
• Ontology O: expressed in OWL 2 QL (W3C Rec. 2012)
• Mapping M: expressed in R2RML (W3C Rec. 2012)
• Query q: expressed in SPARQL (W3C Rec. 2013)
RDF – Data Represented as a Graph
The graph consists of a set of ⟨subject, predicate, object⟩ triples, over IRI, literal and blank nodes
• IRI nodes (formerly URI):
<http://example.org/M-25>,
<M-25>, ex:M-25 or :M-25
• Literal nodes:
"2008-02-12", "The Matrix"@en,
"511"^^xsd:integer
• class membership triples:
<A-1> rdf:type :Actor .
• object property triples:
<A-1> :playsIn <M-25> .
• data property triples:
<M-25> :releaseDate "2008-02-12" .
OWL 2 QL – Lightweight Ontology Language for Accessing Large Amounts of Data
Standard sub-language of OWL 2 [W3C Rec. 2012]
Its assertions encode a logical theory in the
DL-Lite fragment of description logics that
enables reasoning by query rewriting
Close correspondence with UML class diagrams
and ER schemas used in conceptual modeling
:actsIn rdfs:range :Movie
:actsIn rdfs:subPropertyOf :playsIn
. . . owl:someValuesFrom . . .
[Figure: UML class diagram with classes Actor (name: String), SeriesActor, MovieActor, Play (title: String) and Movie; associations actsIn (multiplicity 1..⋆) and playsIn; a {disjoint} constraint]
(table from: Diego Calvanese (unibz + umu + ontopic), Ontology-based Data Access and Integration)
Assertion type          | DL syntax             | OWL syntax
Subclass assertion      | MovieActor ⊑ Actor    | :MovieActor rdfs:subClassOf :Actor .
Class disjointness      | Actor ⊑ ¬Movie        | :Actor owl:disjointWith :Movie .
Domain of a property    | ∃actsIn ⊑ MovieActor  | :actsIn rdfs:domain :MovieActor .
Range of a property     | ∃actsIn⁻ ⊑ Movie      | :actsIn rdfs:range :Movie .
Subproperty assertion   | actsIn ⊑ playsIn      | :actsIn rdfs:subPropertyOf :playsIn .
Inverse properties      | actsIn ≡ hasActor⁻    | :actsIn owl:inverseOf :hasActor .
Mandatory participation | MovieActor ⊑ ∃actsIn  | owl:someValuesFrom in superclass expression
Mappings
Define how to populate classes & properties via assertions of the form:
Qsql(x⃗) ⇝ iri(x⃗) rdf:type C
Qsql(x⃗) ⇝ iri1(x⃗) P iri2(x⃗)
Ontology O:
:actsIn rdfs:domain :MovieActor .
:actsIn rdfs:range :Movie .
:Movie rdfs:subClassOf :Play .
:title rdfs:domain :Play .
:title rdfs:range xsd:string .
...
Mapping M:
m1: SELECT mcode, mtitle FROM MOVIE WHERE type = 'm'
⇝ :m-{mcode} rdf:type :Movie . :m-{mcode} :title {mtitle} .
m2: SELECT M.mcode, A.acode FROM MOVIE M, ACTOR A
WHERE M.mcode = A.pcode AND M.type = 'm'
⇝ :a-{acode} :actsIn :m-{mcode} .
Database D:
MOVIE
mcode mtitle myear type · · ·
511 The Matrix 1999 m · · ·
227 Blade Runner 1982 m · · ·
ACTOR
pcode acode aname · · ·
511 43 K. Reeves · · ·
511 57 C.A. Moss · · ·
VKG V from O, M, D:
:m-511 rdf:type :Movie .
:m-227 rdf:type :Movie .
:m-511 :title "The Matrix" .
:m-227 :title "Blade Runner" .
:a-43 :actsIn :m-511 .
:a-57 :actsIn :m-511 .
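As a rough illustration of how the mapping assertions m1 and m2 populate the VKG, the sketch below loads the two example tables into an in-memory SQLite database and instantiates the IRI templates row by row. The use of Python and SQLite, the helper name `materialize`, and the tuple encoding of triples are assumptions made for illustration only; a VKG system keeps these triples virtual instead of producing them eagerly.

```python
import sqlite3

# Example database D from the slide (only the columns shown there).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MOVIE (mcode INTEGER, mtitle TEXT, myear INTEGER, type TEXT);
    CREATE TABLE ACTOR (pcode INTEGER, acode INTEGER, aname TEXT);
    INSERT INTO MOVIE VALUES (511, 'The Matrix', 1999, 'm');
    INSERT INTO MOVIE VALUES (227, 'Blade Runner', 1982, 'm');
    INSERT INTO ACTOR VALUES (511, 43, 'K. Reeves');
    INSERT INTO ACTOR VALUES (511, 57, 'C.A. Moss');
""")

# Each mapping assertion pairs a SQL source query with triple templates.
# Placeholders in braces are filled from the columns of each result row.
MAPPINGS = [
    ("SELECT mcode, mtitle FROM MOVIE WHERE type = 'm'",
     [(":m-{mcode}", "rdf:type", ":Movie"),
      (":m-{mcode}", ":title", '"{mtitle}"')]),
    ("SELECT M.mcode AS mcode, A.acode AS acode FROM MOVIE M, ACTOR A "
     "WHERE M.mcode = A.pcode AND M.type = 'm'",
     [(":a-{acode}", ":actsIn", ":m-{mcode}")]),
]

def materialize(connection, mappings):
    """Run every source query and instantiate its triple templates."""
    triples = set()
    for sql, templates in mappings:
        cursor = connection.execute(sql)
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            binding = dict(zip(columns, row))
            for s, p, o in templates:
                triples.add((s.format(**binding), p, o.format(**binding)))
    return triples

vkg = materialize(conn, MAPPINGS)
for triple in sorted(vkg):
    print(*triple, ".")
```

The six triples produced correspond to the VKG V listed above.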
SPARQL Query Language
Standard query language for RDF data [W3C Rec. 2008, 2013], based on graph matching
SELECT ?a ?t WHERE {
?a rdf:type :Actor .
?a :playsIn ?m .
?m rdf:type :Movie .
?m :title ?t .
}
[Figure: the query as a graph pattern: ?a --rdf:type--> Actor, ?a --playsIn--> ?m, ?m --rdf:type--> Movie, ?m --title--> ?t]
Additional language features (SPARQL 1.1):
• UNION: matches one of alternative graph patterns
• OPTIONAL: produces a match even when part of the pattern is missing
• complex FILTER conditions
• GROUP BY, to express aggregations
• MINUS, to remove possible solutions
• property paths (regular expressions)
• · · ·
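To make "graph matching" concrete, here is a minimal sketch of how the basic graph pattern of the example query could be evaluated over a small set of triples. The graph content, the function names `match_triple` and `evaluate`, and the string encoding of terms are assumptions for illustration; the sketch covers only plain triple patterns, with no ontology reasoning and none of the SPARQL 1.1 features listed above.

```python
# A tiny RDF graph as a set of (subject, predicate, object) triples (hypothetical data).
GRAPH = {
    (":a-43", "rdf:type", ":Actor"),
    (":a-43", ":playsIn", ":m-511"),
    (":m-511", "rdf:type", ":Movie"),
    (":m-511", ":title", "The Matrix"),
    (":m-227", "rdf:type", ":Movie"),
    (":m-227", ":title", "Blade Runner"),
}

# The basic graph pattern of the example query; terms starting with '?' are variables.
PATTERN = [
    ("?a", "rdf:type", ":Actor"),
    ("?a", ":playsIn", "?m"),
    ("?m", "rdf:type", ":Movie"),
    ("?m", ":title", "?t"),
]

def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def evaluate(patterns, graph):
    """Return all variable bindings that match every triple pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return solutions

# Projection on ?a and ?t, as in SELECT ?a ?t.
answers = [(s["?a"], s["?t"]) for s in evaluate(PATTERN, GRAPH)]
print(answers)
```

Only :a-43 satisfies all four patterns here, because :m-227 has no actor linked to it in this hand-written graph.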
Query Answering in VKGs
Goal: answer a query q over a VKG V by jointly considering:
• the data provided by the data source D
• the mapping M encoding how such data translates to the vocabulary of the ontology
• the ontology O encoding domain knowledge that can be used to enrich answers.
Example:
• suppose that an entity :m-511 of class Movie can be obtained from the data D using some
mapping assertion in M (e.g., m1 about table MOVIE)
• suppose the ontology O states that each Movie is a Play, i.e., :Movie rdfs:subClassOf :Play
• if query q asks for all Plays, we should also return :m-511, which is a Movie and thus also a Play
Solution: query answering by query reformulation
Query Answering in VKGs – Query Reformulation
Reformulation pipeline (using ontology O, mappings M, and data sources D):
1. Rewriting (uses O): ontological query q → rewritten query
2. Unfolding (uses M): rewritten query → SQL
3. Evaluation (over D): SQL → relational answer
4. Result translation: relational answer → ontological answer
Example setting:
D: MOVIE (mcode, mtitle, …)
O: :Movie rdfs:subClassOf :Play
M: SELECT mcode FROM MOVIE WHERE type = 'm' → :m-{mcode} a :Movie
Ontological query q:
SELECT ?p {
  ?p rdf:type :Play
}
Rewritten query:
SELECT ?p {
  { ?p rdf:type :Play }
  UNION
  { ?p rdf:type :Movie }
}
SQL:
SELECT mcode
FROM MOVIE
WHERE type = 'm'
Relational answer: mcode = 511
Ontological answer: ?p = :m-511
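The four reformulation steps can be sketched end to end for the :Play example. This is a deliberately simplified illustration, not how Ontop implements reformulation: it handles only class queries and subclass axioms, and the names `rewrite`, `unfold`, `answer`, `SUBCLASS_OF` and `CLASS_MAPPINGS`, as well as the use of SQLite, are assumptions.

```python
import sqlite3

# Ontology O: subclass axioms as (subclass, superclass) pairs.
SUBCLASS_OF = {(":Movie", ":Play")}

# Mapping M: class -> (SQL source query, IRI template for its instances).
CLASS_MAPPINGS = {
    ":Movie": ("SELECT mcode FROM MOVIE WHERE type = 'm'", ":m-{mcode}"),
}

def rewrite(cls):
    """Rewriting: the queried class plus all its (transitive) subclasses."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sup == current and sub not in classes:
                classes.add(sub)
                frontier.append(sub)
    return classes

def unfold(classes):
    """Unfolding: replace each mapped class by its SQL source query."""
    return [CLASS_MAPPINGS[c] for c in sorted(classes) if c in CLASS_MAPPINGS]

def answer(connection, cls):
    """Evaluation and result translation: run the SQL, turn rows into IRIs."""
    results = set()
    for sql, template in unfold(rewrite(cls)):
        cursor = connection.execute(sql)
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            results.add(template.format(**dict(zip(columns, row))))
    return results

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MOVIE (mcode INTEGER, mtitle TEXT, type TEXT);
    INSERT INTO MOVIE VALUES (511, 'The Matrix', 'm');
""")
plays = answer(conn, ":Play")
print(plays)
```

Because :Play has no mapping of its own, the answer comes entirely from the subclass :Movie, which is the point of the rewriting step.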
3. The Ontop VKG System
The Ontop VKG System
https://ontop-vkg.org/
• state-of-the-art VKG system born at UNIBZ (2009, first research in 2004)
• compliant with all relevant Semantic Web standards:
RDF, RDFS, OWL 2 QL, R2RML, SPARQL, and GeoSPARQL
• implemented in Java (v1.8+) and also available as Docker image
• supports all major relational DBMSs:
Oracle, DB2, MS SQL Server, Postgres, MySQL, Teiid, Dremio, Denodo, etc.
• open-source (Apache 2) project with a solid community
200+ mailing list members, 9000+ downloads in last 2 years
• commercial services (open-core model) by Ontopic, a UNIBZ spin-off founded in 2019
Ontop Usage Scenarios
[Figure: from ontology, mapping and data, Ontop either virtualizes a Virtual Knowledge Graph, which answers queries directly, or materializes a Materialized Knowledge Graph that is loaded into a triple store and queried there]
VKG query answering
• supports most of SPARQL 1.1 under
OWL 2 QL inference regime
• standard-compliant SPARQL endpoint
• over one relational source, or
• over multiple heterogeneous sources,
together with a data federation system
(e.g., Teiid, Dremio) providing an
integrated relational view of sources
VKG materialization
• use ontology and mappings to efficiently
& scalably materialize all the VKG triples
• the produced RDF file can be loaded in
any triplestore
Ontop Developer Community
Ontop in Research and Industrial Projects
Research projects
• Optique (EU FP7, 11/2012-10/2016)
Ontop-based scalable end-user access to
big data, 10 partners incl. Statoil, Siemens
• EPNet (ERC Advanced Grant)
cultural heritage project on food production
and distribution in the Roman Empire
• KAOS (Euregio, 06/2016-05/2019)
preparing standardized log files from
timestamped log data for process mining
• INODE (EU H2020, 11/2019-10/2022)
intelligent open data exploration
• IDEE (ERDF 2014-2020)
building & energy consumption data VKG
Industrial projects
• NOI Techpark
development of the South Tyrol tourism KG
• SIRIS Academic (Barcelona)
open data integration and dashboards
• Siemens Corporate Technologies (Munich)
access to temporal and streaming data
• Robert Bosch GmbH (Stuttgart)
analysis of manufacturing log data
• Metaphacts (Germany)
inclusion of Ontop in their platform
• Fluxicon (Milano)
• Isagog (Rome)
Ontop in Action – Optique project, Statoil use case
From SQL query over the data source ...
SELECT wellbore.identifier, stratigraphic_zone.strat_column_identifier,
pty_pressure.pty_pressure_s, stratigraphic_zone.strat_unit_identifier
FROM wellbore, pty_pressure, activity fp_depth_data LEFT JOIN (
pty_location_1d AS fp_depth_pt1_loc
JOIN picked_stratigraphic_zones AS zs
ON zs.strat_zone_entry_md <= fp_depth_pt1_loc.Data_value_1_o AND
zs.strat_zone_exit_md >= fp_depth_pt1_loc.Data_value_1_o AND
zs.strat_zone_depth_uom = fp_depth_pt1_loc.Data_value_1_ou
JOIN stratigraphic_zone
ON zs.wellbore = stratigraphic_zone.wellbore AND
zs.strat_column_identifier = stratigraphic_zone.strat_column_identifier AND
zs.strat_interp_version = stratigraphic_zone.strat_interp_version AND
zs.strat_zone_identifier = stratigraphic_zone.strat_zone_identifier
) ON fp_depth_data.facility_s = zs.wellbore AND
fp_depth_data.activity_s = fp_depth_pt1_loc.activity_s,
activity_class AS form_pressure_class
WHERE wellbore.wellbore_s = fp_depth_data.Facility_s AND
fp_depth_data.activity_s = pty_pressure.activity_s AND
fp_depth_data.kind_s = form_pressure_class.activity_class_s AND
wellbore.ref_existence_kind = 'actual' AND
form_pressure_class.name = 'formation pressure depth data'
... to VKG SPARQL query
SELECT ?wellbore ?chronostrat_unit
?top_md_m ?lithostrat_unit
{
?w a :Wellbore ;
:name ?wellbore ;
:hasWellboreInterval ?intv .
?intv a :StratigraphicZone ;
:hasUnit ?cu ;
:hasTopDepth ?top .
?cu :name ?chronostrat_unit ;
:ofStratigraphicColumn
[ a :ChronoStratigraphicColumn ] .
?top a :MeasuredDepth ;
:valueInStandardUnit ?top_md_m .
?intv :overlapsWellboreInterval
?litho_intv .
?litho_intv :hasUnit ?lu .
?lu :name ?lithostrat_unit ;
:ofStratigraphicColumn
[ a :LithoStratigraphicColumn ] .
}
Ongoing Research & Development Directions
Mapping patterns
• bootstrapping (semi-automated generation) of mappings & possibly ontology for a data source
• reduces VKG deployment costs, which are mostly related to mapping authoring
Provenance & explanations
• report which sources/tuples, mappings and ontology axioms contributed to a query answer
• prototype Ontop extension based on provenance approaches (semi-rings) in DB community
Geospatial queries
• support GeoSPARQL to manipulate & query for geometries, leveraging DB support (e.g., PostGIS)
Temporal/streaming extensions
• support SQL-enabled stream processors like Flink and pattern matching over streaming data
Non-relational sources
• support non-relational data sources such as MongoDB, Neo4J and Web APIs
4. VKGs over Web APIs
Accessing Web APIs
Data is increasingly available via Web APIs
• access to 3rd-party and/or dynamically-computed data
• access to data-related services, e.g., text search
Some API statistics (a)
• 83% of all Internet traffic belongs to API-based services
• 2M+ API repositories on GitHub
• 90% of developers use APIs
• 30% of development time spent on coding APIs
Complex data access problem for applications operating on
data from both databases and APIs
(a) https://nordicapis.com/20-impressive-api-economy-statistics/
[Figure: an application accesses RDB sources via SQL and API sources via calls, facing a complex data access problem]
Accessing Web APIs – Open Data Hub (ODH) RDB + Semantic Search API Example
Answer hybrid queries like:
• get (plot) IRI, description, rating &
location of accommodations ...
• whose rating is 3 stars or more
(structured constraint) and ...
• whose EN description matches the
search string “horse riding” (text
constraint)
Semantic search: improved text search
that aims at capturing and leveraging
text meaning (vs term matching only)
• e.g., via BERT-based model from
Sentence Transformers library
Accessing Web APIs – Unified Access using a VKG
• applications operate on a unified VKG spanning APIs and
other involved sources
→ each API operation as an independent source
→ data federation setting due to multiple sources
• VKG built (e.g., via Ontop) over a Virtual Database (VDB)
federating all sources
→ VDB produced by a data federation system (e.g., Teiid)
→ the VDB offers a relational view of API data
→ VKG query reformulation may be tuned to this setting
• delegate the complex orchestration of source sub-queries
and API calls to a VKG + data federation system
• exploit existing database techniques to cope with API access
pattern restrictions during query answering
[Figure: User / Application → SPARQL → VKG (Ontop extension) → SQL → Virtual DB (VDB) (Teiid extension) → SQL to RDB sources, calls to API sources]
VDB – SQL/MED Specification
SQL/MED allows federating multiple sources in a virtual database (VDB)
• standardized SQL extension supported by some data federation systems like Teiid
• VDB as a set of schemas mapped to foreign data sources accessed via wrappers/translators
• we extend Teiid with a new service translator for accessing APIs
Example using Teiid with our extensions:
CREATE DATABASE vdb_example OPTIONS ( "... connection options for federated sources ..." );
USE DATABASE vdb_example;
CREATE SERVER db_source FOREIGN DATA WRAPPER postgresql; -- define RDB source with schema 'db'
CREATE SCHEMA db SERVER db_source; -- using 'postgresql' translator to access it
CREATE SERVER srv_source FOREIGN DATA WRAPPER service; -- define API source with schema 'srv'
CREATE SCHEMA srv SERVER srv_source; -- using 'service' translator to access it
IMPORT FOREIGN SCHEMA public FROM SERVER db_source INTO db OPTIONS ( importer.catalog 'public' );
SET SCHEMA srv;
-- CREATE FOREIGN TABLE / PROCEDURE statements mapped to API operations (API bindings)
VDB – API Bindings
API operations as SQL/MED procedures
• input tuple → 0..n output tuples
• URL, method, request/response templates
CREATE FOREIGN PROCEDURE api_semsearch_query (
query VARCHAR
) RETURNS TABLE (
query VARCHAR,
id VARCHAR,
score DOUBLE,
excerpt VARCHAR
) OPTIONS (
"method" 'post',
"url" 'http://semsearch:8080/query',
"requestBody" '{"query": "{query}", "n": 100}',
"responseBody" '{"matches": [{
"id": "{id}",
"score": "{score}",
"excerpt": "{excerpt}" }] }'
);
API data as SQL/MED virtual tables
• linked to API operations/procedures
• each procedure defines an access pattern
CREATE FOREIGN TABLE vt_semsearch_match (
query VARCHAR NOT NULL,
id VARCHAR NOT NULL,
score DOUBLE NOT NULL,
excerpt VARCHAR NOT NULL,
PRIMARY KEY (query, id)
) OPTIONS ( "select" 'api_semsearch_query' );
CREATE FOREIGN TABLE vt_semsearch_index (
id VARCHAR PRIMARY KEY,
text VARCHAR NOT NULL
) OPTIONS (
"UPDATABLE" 'true',
"upsert" 'api_semsearch_store',
"delete" 'api_semsearch_clear'
);
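One way to read the "requestBody" and "responseBody" templates of api_semsearch_query is sketched below: input values are substituted into the request template, and the response template is walked in parallel with the JSON response to produce one output tuple per list item. This is only a guess at the intended semantics, not the actual behavior of the service translator; the helper names `build_request` and `extract_rows` and the sample response are hypothetical.

```python
import json

# Templates copied from the api_semsearch_query binding above.
REQUEST_TEMPLATE = '{"query": "{query}", "n": 100}'
RESPONSE_TEMPLATE = ('{"matches": [{"id": "{id}", "score": "{score}", '
                     '"excerpt": "{excerpt}"}]}')

def build_request(template, inputs):
    """Substitute each {name} placeholder with a JSON-escaped input value."""
    body = template
    for name, value in inputs.items():
        escaped = json.dumps(str(value))[1:-1]  # escape, then drop outer quotes
        body = body.replace("{" + name + "}", escaped)
    return body

def extract_rows(template, response):
    """Walk template and response in parallel; each list item yields a row."""
    def walk(t, r, row):
        if isinstance(t, str) and t.startswith("{") and t.endswith("}"):
            yield {**row, t[1:-1]: r}           # placeholder: bind a column
        elif isinstance(t, dict):
            rows = [row]
            for key, sub in t.items():
                rows = [out for partial in rows
                        for out in walk(sub, r[key], partial)]
            yield from rows
        elif isinstance(t, list):
            for item in r:                      # one branch per array element
                yield from walk(t[0], item, row)
        else:
            yield row                           # constant: nothing to bind
    return list(walk(json.loads(template), json.loads(response), {}))

# Hypothetical response of the semantic search API.
SAMPLE_RESPONSE = json.dumps({"matches": [
    {"id": "A1", "score": 0.91, "excerpt": "horse riding nearby"},
    {"id": "B2", "score": 0.75, "excerpt": "riding school"},
]})

inputs = {"query": "horse riding"}
request_body = build_request(REQUEST_TEMPLATE, inputs)
# Output tuples repeat the input column, as in RETURNS TABLE (query, id, ...).
rows = [{**inputs, **row}
        for row in extract_rows(RESPONSE_TEMPLATE, SAMPLE_RESPONSE)]
print(request_body)
for row in rows:
    print(row)
```

This mirrors the "input tuple → 0..n output tuples" contract: an empty "matches" array simply yields no rows.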
VDB – Query Translation & Execution
Given a VDB defined using SQL/MED + API Bindings and an input query over the VDB
• Teiid splits the query into sub-queries based on translator capabilities and cost heuristics
• sub-queries are sent to translators & Teiid handles remaining operations (e.g., federated joins)
Example SQL query
SELECT s.score,
s.excerpt,
a."AccoCategoryId",
a."AccoDetail-en-Name",
a."AccoDetail-en-City"
FROM srv.vt_semsearch_match AS s
JOIN db.v_accommodationsopen AS a
ON s.id = a."Id"
WHERE s.query = 'horse riding'
ORDER BY s.score DESC
LIMIT 10
Execution plan
LimitNode (limit = 10)
  SortNode (s.score DESC)
    ProjectNode (s.score, ... a."AccoDetail-en-City")
      JoinNode (s.id = a."Id", merge join strategy)
        AccessNode (API)
          SELECT id, excerpt, score
          FROM vt_semsearch_match
          WHERE query = 'horse riding'
        AccessNode (RDB)
          SELECT "Id", "AccoCategoryId", "AccoDetail-en-Name",
                 "AccoDetail-en-City"
          FROM v_accommodationsopen
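The part of the plan that the federation layer evaluates itself (join, sort, limit) can be mimicked in a few lines. The sketch below takes the two access-node results as plain lists and applies a merge join followed by sorting and slicing; all row values are made up, and `merge_join` is an illustrative helper, not Teiid code.

```python
# AccessNode (API): hypothetical matches returned for query = 'horse riding'.
api_rows = [
    {"id": "B2", "score": 0.75, "excerpt": "riding school"},
    {"id": "A1", "score": 0.91, "excerpt": "horse riding nearby"},
    {"id": "Z9", "score": 0.60, "excerpt": "stables"},
]
# AccessNode (RDB): hypothetical rows of v_accommodationsopen.
db_rows = [
    {"Id": "A1", "AccoDetail-en-Name": "Alpine Farm", "AccoDetail-en-City": "Bolzano"},
    {"Id": "B2", "AccoDetail-en-Name": "Riders Inn", "AccoDetail-en-City": "Merano"},
    {"Id": "C3", "AccoDetail-en-Name": "City Hotel", "AccoDetail-en-City": "Bolzano"},
]

def merge_join(left, right, left_key, right_key):
    """Sort both inputs on the join key, then merge them in one pass."""
    left = sorted(left, key=lambda row: row[left_key])
    right = sorted(right, key=lambda row: row[right_key])
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit every right row sharing this key, then advance the left side.
            k = j
            while k < len(right) and right[k][right_key] == lk:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1
    return out

joined = merge_join(api_rows, db_rows, "id", "Id")       # JoinNode
ranked = sorted(joined, key=lambda row: row["score"], reverse=True)  # SortNode
result = ranked[:10]                                     # LimitNode
for row in result:
    print(row["score"], row["excerpt"], row["AccoDetail-en-Name"])
```

Rows without a partner on the other side (Z9 and C3 here) drop out of the join, as expected for an inner join.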
VDB – Push-down of Projection, Filtering, Sorting, Slicing
Special input attributes map API capabilities related to standard relational operators
• filtering: return/process only objects matching some criteria (e.g., attribute = or ≥ constant)
• projection: include/exclude certain attributes in returned results
• sorting: sort results according to a certain attribute and direction (ascending/descending)
• slicing: return only a given page of all possible results
CREATE FOREIGN PROCEDURE api_station_data_from_to (
  stype VARCHAR NOT NULL,
  sname VARCHAR NOT NULL,
  tname VARCHAR NOT NULL,
  __min_inclusive__mvaliddate DATE NOT NULL, -- filter push down (conditions min <= mvaliddate <= max)
  __max_inclusive__mvaliddate DATE NOT NULL,
  __limit__ INTEGER -- slicing push down
) RETURNS TABLE ( ... )
OPTIONS ( ... );
Partial/complete push down of these operators whenever possible
• allows offloading computation to the API (e.g., sorting)
• allows reducing costs by manipulating & transferring less data
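A planner could decide what to push down by matching query conditions against the special input attributes of a procedure. The sketch below does this for the parameter names of api_station_data_from_to; the function `plan_call`, the condition encoding, the residual column `mvalue` and all constants are assumptions for illustration, and the actual planning logic in the Teiid extension may differ.

```python
# Input attributes of api_station_data_from_to, as declared above.
PARAMS = ["stype", "sname", "tname",
          "__min_inclusive__mvaliddate", "__max_inclusive__mvaliddate",
          "__limit__"]

# Operator -> prefix of the special input attribute that implements it.
PREFIXES = {"=": "", ">=": "__min_inclusive__", "<=": "__max_inclusive__"}

def plan_call(params, conditions, limit=None):
    """Split conditions into API inputs (pushed down) and residual filters."""
    inputs, residual = {}, []
    for column, op, value in conditions:
        prefix = PREFIXES.get(op)
        name = None if prefix is None else prefix + column
        if name in params:
            inputs[name] = value
        else:
            residual.append((column, op, value))
    # Slicing is delegated only when no filter is left for local evaluation;
    # otherwise the API could cut off rows that the residual filter would keep.
    if limit is not None and "__limit__" in params and not residual:
        inputs["__limit__"] = limit
    return inputs, residual

# Hypothetical query conditions.
conditions = [
    ("stype", "=", "MeteoStation"),
    ("sname", "=", "Bolzano"),
    ("tname", "=", "air-temperature"),
    ("mvaliddate", ">=", "2022-01-01"),
    ("mvaliddate", "<=", "2022-01-31"),
    ("mvalue", ">", 30),          # no matching input attribute: stays local
]
inputs, residual = plan_call(PARAMS, conditions, limit=10)
print(inputs)
print(residual)
```

In this example the date range is pushed down completely, while the condition on `mvalue` remains with the federation layer and therefore also blocks the push-down of the limit (a partial push-down).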
VDB – Exploiting Bulk API Operations
Bulk API operations operate on multiple input tuples, such as lookup by set of IDs or bulk store
• their use enables better performance thanks to fewer API calls
• useful to speed up dependent joins (using the IN operator) between RDBMS and API data
[Figure: dependent join R ⨝ S on R.A = S.A, between an RDBMS table R and a virtual table S backed by a bulk API operation with input attribute A]
1. SELECT A, … FROM R WHERE …
2. Extract values of join attribute A: a1, a2, …
3. SELECT A, … FROM S WHERE A IN (a1, a2, …) AND …
4. API bindings: bulk API calls with multiple input tuples for different values of A: a1, a2, …
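The dependent-join steps can be sketched as follows, with a stand-in function playing the role of the bulk API operation and a call counter showing the saving. The data, the batch size, and the names `bulk_lookup` and `dependent_join` are hypothetical.

```python
# Hypothetical data behind the API, keyed by the input attribute A.
API_DATA = {1: "one", 2: "two", 3: "three", 5: "five"}

def bulk_lookup(ids):
    """Stand-in for a bulk API operation: one call serves many ids."""
    bulk_lookup.calls += 1
    return [{"A": i, "info": API_DATA[i]} for i in ids if i in API_DATA]

bulk_lookup.calls = 0

# Step 1: rows obtained from the RDBMS table R (hypothetical).
R = [{"A": 1, "name": "r1"}, {"A": 2, "name": "r2"},
     {"A": 3, "name": "r3"}, {"A": 4, "name": "r4"}]

def dependent_join(r_rows, lookup, batch_size):
    # Step 2: extract the distinct values of the join attribute A.
    values = sorted({row["A"] for row in r_rows})
    # Steps 3-4: one bulk call per batch of values (the IN list).
    s_rows = []
    for start in range(0, len(values), batch_size):
        s_rows.extend(lookup(values[start:start + batch_size]))
    # Final join of R with the fetched S rows on A.
    by_key = {}
    for s in s_rows:
        by_key.setdefault(s["A"], []).append(s)
    return [{**r, **s} for r in r_rows for s in by_key.get(r["A"], [])]

joined = dependent_join(R, bulk_lookup, batch_size=2)
print(joined)
print("API calls:", bulk_lookup.calls)
```

With four distinct values and a batch size of 2, two bulk calls replace the four calls a one-value-at-a-time lookup would need.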
VDB – Data Materialization
Data materialization: required by API operations that cannot be invoked at query time
• operations too expensive to call at query time (e.g., align API and DB identifiers)
• operations instrumental to the use of external APIs (e.g., text indexing in a search engine)
Solution #1: materialized views in Teiid (or other data federation system used)
Solution #2: dedicated materialization engine for
flexibly executing arbitrary materialization rules:
• identifier – for documentation & diagnostics
• target – the system-managed computed table
(possibly virtual) where data is stored
• source – arbitrary SQL query (over any tables)
that produces the data to store
rules:
  - id: index_accommodation_texts
    target: vt_semsearch_index
    source: |-
      SELECT "Id" AS id,
             "AccoDetail-en-Longdesc" AS text
      FROM v_accommodationsopen
      WHERE "AccoDetail-en-Longdesc" IS NOT NULL
  - ... other rules ...
VDB – Data Materialization (cont’d)
Rules (their SQL source queries) are analyzed to derive a rule dependency graph, which is mapped
to an execution plan using fixpoint rule evaluation for strongly connected components
[Figure: two dependency graphs over rules R1 to R5, captioned "Rule / Table Dependencies" and "Rule Dependencies"; edges not recoverable]
Execution Plan:
sequence (
  parallel (
    R1,
    sequence (
      R2,
      fixpoint (
        parallel (
          R3,
          R4
        )
      )
    )
  ),
  R5
)
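Deriving such a plan can be sketched with a standard strongly-connected-components computation (Tarjan's algorithm): rules in a cyclic component are evaluated to a fixpoint, the others run once, and components are ordered so that dependencies come first. The read/write sets below are invented so that the result resembles the plan above; the simplified output is a flat sequence and does not reproduce the parallel grouping.

```python
# Hypothetical rules: tables read by the source query, and the target table.
RULES = {
    "R1": {"reads": {"src"}, "writes": "t1"},
    "R2": {"reads": {"src"}, "writes": "t2"},
    "R3": {"reads": {"t2", "t4"}, "writes": "t3"},
    "R4": {"reads": {"t3"}, "writes": "t4"},
    "R5": {"reads": {"t1", "t4"}, "writes": "t5"},
}

def rule_dependencies(rules):
    """Rule X depends on rule Y if X reads the table that Y writes."""
    return {x: {y for y, ry in rules.items() if ry["writes"] in rx["reads"]}
            for x, rx in rules.items()}

def strongly_connected_components(graph):
    """Tarjan's algorithm; components are emitted dependencies-first."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in sorted(graph[node]):
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            result.append(sorted(component))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return result

def execution_plan(rules):
    deps = rule_dependencies(rules)
    plan = []
    for component in strongly_connected_components(deps):
        cyclic = len(component) > 1 or component[0] in deps[component[0]]
        plan.append(("fixpoint", component) if cyclic else ("run", component[0]))
    return plan

plan = execution_plan(RULES)
print(plan)
```

Here R3 and R4 read each other's targets, so they form the only cyclic component and are the only rules placed under a fixpoint.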
VKG – Example of Ontology & Mappings over the VDB
Ontology
schema:Accommodation a owl:Class ;
rdfs:subClassOf schema:Place ;
rdfs:label "Accommodation"@en ;
...
schema:name a owl:DatatypeProperty ;
...
hive:Match a owl:Class ...
Current ontology formalism (OWL 2 QL) reused
as is, but now also models data from APIs
Mappings
mappingId Semantic Search
target data:match/accommodation/{id}/{query}
a hive:Match;
hive:query {query}^^xsd:string;
hive:resource data:accommodation/{id};
hive:excerpt {excerpt}@en;
hive:score {score}^^xsd:decimal.
source SELECT *
FROM hiveodh.srv.vt_semsearch_match
Current VKG mapping formalism reused as is, but
data may now come from API virtual tables
VKG – Query Rewriting & Evaluation Example
User-supplied SPARQL query
SELECT ?h ?posLabel ?rating ?pos {
[] a hive:Match ;
hive:query "horse riding"^^xsd:string ;
hive:resource ?h ;
hive:excerpt ?excerpt ;
hive:score ?score .
?h a schema:LodgingBusiness ;
geo:defaultGeometry/geo:asWKT ?pos ;
schema:name ?name ;
schema:description ?description ;
schema:starRating/schema:ratingValue ?rating.
FILTER (?rating >= 3 && lang(?name) = 'en' &&
lang(?description) = 'en')
BIND (CONCAT(?name, " <br><br>...", ?excerpt,
"...<br><br>", ?description) AS ?posLabel)
}
ORDER BY DESC(?score) LIMIT 10
SQL query rewritten by Ontop
SELECT
v1.id,
v1.excerpt, -- fields used
v2."AccoDetail-en-Name", -- for deriving
v2."AccoDetail-en-Longdesc", -- ?posLabel
... complex expression computing rating ...,
ST_ASTEXT(v2."Geometry")
FROM
hiveodh.srv.vt_semsearch_match v1,
hiveodh.db.v_accommodationsopen v2
WHERE
v1."id" = v2."Id" AND
CAST(v1."query" AS TEXT) = 'horse riding' AND
... complex condition on rating >= 3 ... AND
... nonnull conditions for output columns ...
ORDER BY CAST(v1."score" AS DECIMAL) DESC
LIMIT 10
SQL query evaluated on the VDB by Teiid
VKG – ODH with Semantic Search Demo
Data sources
DB with ODH tourism data +
Semantic search API to index &
query accommodations texts
System
Ontop embedding Teiid +
materialization engine
Demo
https://hive.inf.unibz.it/odh/vkg/
Overall Framework for VKGs over APIs
Virtual Knowledge Graph (VKG): Ontop
• VKG Ontology: formalizes the classes/properties (the “schema”) of the VKG, enabling reasoning
• VKG Mappings: including virtual tables, used for query rewriting
Virtual DB (VDB): Teiid + service translator
• API Bindings: define how to query/update a virtual table via API calls, if possible
→ limited access patterns
• Materialization Rules: pre-compute results of expensive API calls
→ VDB/VKG no longer fully “virtual”
[Figure: Application (VKG-based) → SPARQL → VKG → SQL → VDB; Application (VDB-based) → SQL → VDB; VDB → SQL to RDB sources, calls to API sources]
5. Conclusions
Takeaway Messages
Virtual Knowledge Graphs (VKG): flexible technology for building KGs over existing data source(s)
• useful for inherently relational data where a VKG engine + RDBMS may outperform a triplestore
• useful for existing data RDF-ification via VKG materialization to an RDF file
Ontop: mature, open-source VKG system with a solid user & developer community
• allows a VKG over a single RDB, with support for multiple database engines
• allows a VKG over multiple heterogeneous sources, in combination with an intermediate data
federation system such as the open-source Teiid & Dremio
• active research & development for adding new features and new data sources
VKGs over Web APIs: ongoing research & development effort
• enables transparent access to dynamically-computed API data via declarative queries
• API operations mapped to virtual relations, accessed through a Teiid extension
• optimizations for better using API features, such as bulk operations and operators’ push-down
• expensive API operations supported via pre-computation and data materialization
Thanks for attending!
these slides: https://bit.ly/3WOoldB
Ontop: https://ontop-vkg.org/

More Related Content

Similar to Towards Virtual Knowledge Graphs over Web APIs

Site Interoperability Projects at DERI Galway's SW Cluster
Site Interoperability Projects at DERI Galway's SW ClusterSite Interoperability Projects at DERI Galway's SW Cluster
Site Interoperability Projects at DERI Galway's SW Cluster
John Breslin
 
From SMW to Rules
From SMW to RulesFrom SMW to Rules
From SMW to Rules
Jie Bao
 
WWW09 - Triplify Light-Weight Linked Data Publication from Relational Databases
WWW09 - Triplify Light-Weight Linked Data Publication from Relational DatabasesWWW09 - Triplify Light-Weight Linked Data Publication from Relational Databases
WWW09 - Triplify Light-Weight Linked Data Publication from Relational Databases
Sören Auer
 
DBpedia Mobile Explorer
DBpedia Mobile ExplorerDBpedia Mobile Explorer
DBpedia Mobile Explorer
Iacopo Vagliano
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Laura Po
 
Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...
Takeshi Morita
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
LOD2 Creating Knowledge out of Interlinked Data
 
Knowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectKnowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything Project
Enrico Daga
 
Architecture Patterns for Semantic Web Applications
Towards Virtual Knowledge Graphs over Web APIs

  • 1. Towards Virtual Knowledge Graphs over Web APIs. Francesco Corcoglioniti, 2022-11-09. Postdoc @ KRDB, Free University of Bolzano, supported by HIVE Fusion Grant project (2021-2022), OntoCRM project (2022-2024), and Ontopic s.r.l. Slides available online at https://bit.ly/3WOoldB
  • 2. 1. Introduction 2. The VKG Framework 3. The Ontop VKG System 4. VKGs over Web APIs 5. Conclusions
  • 3. Big Data Context Towards Virtual Knowledge Graphs over Web APIs – slides available at https://bit.ly/3WOoldB 1/34
  • 4. Variety Drives Data Management Initiatives. Relative importance: Variety 69%, Volume 25%, Velocity 6% (source: http://sloanreview.mit.edu/article/variety-not-volume-is-driving-big-data-initiatives/, 2016). Data model heterogeneity: relational data, graph data, XML, JSON, CSV, text files, ... System heterogeneity: even when systems adopt the same data model, they are not always fully compatible. Schema heterogeneity: different people see things differently, and design schemas differently. Data-level heterogeneity: e.g., ‘IBM’ vs. ‘Int. Business Machines’ vs. ‘International Business Machines’.
  • 5. Querying Data Takes Time and IT Expertise (besides Domain Knowledge). Query from the Statoil (now Equinor) use case, EU FP7 Optique project. Natural language: in a given area, return pressure data tagged with stratigraphy and quality control attributes. SQL: huge query joining 9 tables, the main one with 38 columns with cryptic names. Query from the Sloan Digital Sky Survey use case, EU H2020 INODE project. Natural language: get all white dwarf stars. SQL: unintelligible query defining ‘white dwarf’: SELECT objID FROM skyserverv3_correct.star WHERE u - g < .4 AND g - r < .7 AND r - i > .4 AND i - z > .4
  • 6. Virtual Knowledge Graphs (VKG) – a Data Access / Integration Solution. Three key ideas: 1. use a global (or integrated) schema and map the data sources to the global schema; 2. adopt a very flexible data model for the global schema → Knowledge Graph (KG) whose vocabulary is expressed in an ontology; 3. exploit virtualization, i.e., the KG is not materialized, but kept virtual. This gives rise to the Virtual Knowledge Graph (VKG) approach to data access / integration, also called Ontology-Based Data Access / Integration (OBDA).
  • 7. Virtual Knowledge Graphs (VKG) – Core Components. Ontology: conceptualizes a domain of interest in terms of classes and (binary) properties, overall defining the terminological knowledge (TBox) of the VKG. Data sources: provide the data forming the RDF triples, i.e., the assertional knowledge (ABox), of the VKG. Mapping: defines how to generate the RDF triples from the raw data (e.g., relational), via mapping assertions that populate each class/property of the ontology. Queries: formulated against the VKG (which is virtual) and rewritten into native queries evaluated over the sources. [Figure: a query posed over Ontology O, linked through Mapping M to Data sources D, with results returned to the user]
  • 8. 1. Introduction 2. The VKG Framework 3. The Ontop VKG System 4. VKGs over Web APIs 5. Conclusions
  • 9. VKG Framework – Which Languages to Use? Need to balance • expressive power of adopted languages for O, M, q • query answering efficiency with respect to data size. [Figure: a query posed over Ontology O, linked through Mapping M to Data sources D, with results returned to the user] W3C has standardized languages that are suitable for VKGs: • Knowledge graph: expressed in RDF (W3C Rec. 2014) • Ontology O: expressed in OWL 2 QL (W3C Rec. 2012) • Mapping M: expressed in R2RML (W3C Rec. 2012) • Query q: expressed in SPARQL (W3C Rec. 2013)
  • 10. RDF – Data Represented as a Graph. The graph consists of a set of ⟨subject, predicate, object⟩ triples, over IRI, literal and blank nodes • IRI nodes (formerly URI): <http://example.org/M-25>, <M-25>, ex:M-25 or :M-25 • Literal nodes: "2008-02-12", "The Matrix"@en, "511"^^xsd:integer • class membership triples: <A-1> rdf:type :Actor . • object property triples: <A-1> :playsIn <M-25> . • data property triples: <M-25> :releaseDate "2008-02-12" .
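
The three kinds of triples listed on this slide can be told apart mechanically. The following is a minimal sketch, assuming triples are stored as plain Python tuples of strings and that a leading double quote marks a literal; both conventions are simplifications chosen for illustration and are not part of RDF or of any RDF library.

```python
# Minimal sketch: RDF triples as (subject, predicate, object) tuples of strings.
# Assumption: literals are written with a leading double quote, IRIs without.
RDF_TYPE = "rdf:type"

triples = [
    ("<A-1>", RDF_TYPE, ":Actor"),
    ("<A-1>", ":playsIn", "<M-25>"),
    ("<M-25>", ":releaseDate", '"2008-02-12"'),
]


def kind(triple):
    """Classify a triple as class membership, object property or data property."""
    _subject, predicate, obj = triple
    if predicate == RDF_TYPE:
        return "class membership"
    if obj.startswith('"'):
        return "data property"      # object is a literal
    return "object property"        # object is another node


kinds = [kind(t) for t in triples]
for t, k in zip(triples, kinds):
    print(k, "->", " ".join(t), ".")
```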
  • 11. OWL 2 QL – Lightweight Ontology Language for Accessing Large Amounts of Data. Standard sub-language of OWL 2 [W3C Rec. 2012]. Its assertions encode a logical theory in the DL-Lite fragment of description logics that enables reasoning by query rewriting. Close correspondence with UML class diagrams and ER schemas used in conceptual modeling. [Figure: UML class diagram with classes Actor (name: String), SeriesActor, MovieActor, Play (title: String) and Movie, associations actsIn (1..⋆) and playsIn, and a {disjoint} constraint, annotated with the corresponding OWL axioms]
      Assertion type | DL syntax | OWL syntax
      Subclass assertion | MovieActor ⊑ Actor | :MovieActor rdfs:subClassOf :Actor .
      Class disjointness | Actor ⊑ ¬Movie | :Actor owl:disjointWith :Movie .
      Domain of a property | ∃actsIn ⊑ MovieActor | :actsIn rdfs:domain :MovieActor .
      Range of a property | ∃actsIn⁻ ⊑ Movie | :actsIn rdfs:range :Movie .
      Subproperty assertion | actsIn ⊑ playsIn | :actsIn rdfs:subPropertyOf :playsIn .
      Inverse properties | actsIn ≡ hasActor⁻ | :actsIn owl:inverseOf :hasActor .
      Mandatory participation | MovieActor ⊑ ∃actsIn | owl:someValuesFrom in superclass expression
  • 12. Mappings. Define how to populate classes & properties via assertions of the form: Qsql(x⃗) ⇝ iri(x⃗) rdf:type C and Qsql(x⃗) ⇝ iri1(x⃗) P iri2(x⃗)
      Ontology O: :actsIn rdfs:domain :MovieActor . :actsIn rdfs:range :Movie . :Movie rdfs:subClassOf :Play . :title rdfs:domain :Play . :title rdfs:range xsd:string . ...
      Mapping M:
        m1: SELECT mcode, mtitle FROM MOVIE WHERE type = "m" ⇝ :m-{mcode} rdf:type :Movie . :m-{mcode} :title {mtitle} .
        m2: SELECT M.mcode, A.acode FROM MOVIE M, ACTOR A WHERE M.mcode = A.pcode AND M.type = "m" ⇝ :a-{acode} :actsIn :m-{mcode} .
      Database D:
        MOVIE (mcode, mtitle, myear, type, ···): (511, The Matrix, 1999, m, ···), (227, Blade Runner, 1982, m, ···)
        ACTOR (pcode, acode, aname, ···): (511, 43, K. Reeves, ···), (511, 57, C.A. Moss, ···)
      VKG V from O, M, D: :m-511 rdf:type :Movie . :m-227 rdf:type :Movie . :m-511 :title "The Matrix" . :m-227 :title "Blade Runner" . :a-43 :actsIn :m-511 . :a-57 :actsIn :m-511 .
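
To make the mapping slide concrete, the sketch below applies m1 and m2 to the example database and collects the resulting triples. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the relational source, the `MAPPINGS` structure and the `materialize` helper are hypothetical names, string constants use single quotes for portability, column aliases are added so result column names are predictable, and literals are rendered as quoted strings. This is not how Ontop or R2RML represent mappings.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the data source D.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MOVIE (mcode INTEGER, mtitle TEXT, myear INTEGER, type TEXT);
    CREATE TABLE ACTOR (pcode INTEGER, acode INTEGER, aname TEXT);
    INSERT INTO MOVIE VALUES (511, 'The Matrix', 1999, 'm');
    INSERT INTO MOVIE VALUES (227, 'Blade Runner', 1982, 'm');
    INSERT INTO ACTOR VALUES (511, 43, 'K. Reeves');
    INSERT INTO ACTOR VALUES (511, 57, 'C.A. Moss');
""")

# Hypothetical encoding of mapping assertions: (source SQL, triple templates).
MAPPINGS = [
    # m1
    ("SELECT mcode AS mcode, mtitle AS mtitle FROM MOVIE WHERE type = 'm'",
     [(":m-{mcode}", "rdf:type", ":Movie"),
      (":m-{mcode}", ":title", '"{mtitle}"')]),
    # m2
    ("SELECT M.mcode AS mcode, A.acode AS acode FROM MOVIE M, ACTOR A "
     "WHERE M.mcode = A.pcode AND M.type = 'm'",
     [(":a-{acode}", ":actsIn", ":m-{mcode}")]),
]


def materialize(connection, mappings):
    """Run each source query and instantiate its templates for every row."""
    triples = set()
    for sql, templates in mappings:
        cursor = connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            binding = dict(zip(columns, row))
            for template in templates:
                triples.add(tuple(part.format(**binding) for part in template))
    return triples


vkg = materialize(conn, MAPPINGS)
for triple in sorted(vkg):
    print(" ".join(triple), ".")
```

Materializing the triples is done here only to show what the mapping assertions produce; in the VKG approach described in these slides the graph normally stays virtual.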
  • 13. SPARQL Query Language. Standard query language for RDF data [W3C Rec. 2008, 2013], based on graph matching: SELECT ?a ?t WHERE { ?a rdf:type :Actor . ?a :playsIn ?m . ?m rdf:type :Movie . ?m :title ?t . } [Figure: the query drawn as a graph pattern, with ?a of type Actor, ?m of type Movie, an edge playsIn from ?a to ?m and an edge title from ?m to ?t] Additional language features (SPARQL 1.1): • UNION: matches one of alternative graph patterns • OPTIONAL: produces a match even when part of the pattern is missing • complex FILTER conditions • GROUP BY, to express aggregations • MINUS, to remove possible solutions • property paths (regular expressions) • · · ·
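
The graph-matching idea behind the SELECT query on this slide can be sketched in a few lines. The code below is a naive matcher for basic graph patterns only, written for illustration: the `match_bgp` function, the tuple encoding of triples and the sample graph are assumptions, and it does not cover UNION, OPTIONAL, FILTER or any other SPARQL feature.

```python
def match_bgp(graph, patterns, binding=None):
    """Yield every variable binding under which all triple patterns occur in graph.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check consistency with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield from match_bgp(graph, rest, candidate)


# Assumed sample graph, loosely based on the movie example in these slides.
graph = [
    (":a-43", "rdf:type", ":Actor"),
    (":a-43", ":playsIn", ":m-511"),
    (":m-511", "rdf:type", ":Movie"),
    (":m-511", ":title", '"The Matrix"'),
    (":m-227", "rdf:type", ":Movie"),
    (":m-227", ":title", '"Blade Runner"'),
]

# SELECT ?a ?t WHERE { ?a rdf:type :Actor . ?a :playsIn ?m .
#                      ?m rdf:type :Movie . ?m :title ?t . }
query = [
    ("?a", "rdf:type", ":Actor"),
    ("?a", ":playsIn", "?m"),
    ("?m", "rdf:type", ":Movie"),
    ("?m", ":title", "?t"),
]

results = [(b["?a"], b["?t"]) for b in match_bgp(graph, query)]
print(results)
```

Blade Runner does not appear in the result because no actor in the sample graph is connected to :m-227 through :playsIn.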
  • 14. Query Answering in VKGs. Goal: answer a query q over a VKG V by jointly considering: • the data provided by the data source D • the mapping M encoding how such data translates to the ontology • the ontology O encoding domain knowledge that can be used to enrich answers. Example: • suppose that an entity :m-511 of class Movie can be obtained from the data D using some mapping assertion in M (e.g., m1 about table MOVIE) • suppose the ontology O states that each Movie is a Play, i.e., :Movie rdfs:subClassOf :Play • if query q asks for all Plays, we should also return :m-511, which is a Movie and thus also a Play. Solution: query answering by query reformulation.
  • 15. Query Answering in VKGs – Query Reformulation
    [Figure: reformulation pipeline over Ontology O, Mappings M and Data Sources D]
      Ontological Query q → Rewriting → Rewritten Query → Unfolding → SQL → Evaluation → Relational Answer → Result Translation → Ontological Answer
    Example:
      D: MOVIE (mcode, mtitle, …)
      O: :Movie rdfs:subClassOf :Play
      M: SELECT mcode FROM MOVIE → :m-{mcode} a :Movie
      Ontological query q:  SELECT ?p { ?p rdf:type :Play }
      Rewritten query:      SELECT ?p { { ?p rdf:type :Play } UNION { ?p rdf:type :Movie } }
      SQL:                  SELECT mcode FROM MOVIE WHERE type = 'm'
      Relational answer:    mcode = 511
      Ontological answer:   ?p = :m-511
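    The four reformulation steps (rewriting, unfolding, evaluation, result translation) can be sketched for the simple case of a class query such as "all Plays". The sketch below is an assumption-laden toy: it handles only rdfs:subClassOf axioms and class mappings, uses SQLite as the source, and the function names `rewrite`, `unfold` and `answer` are hypothetical. Real VKG systems support far richer queries, ontologies and optimizations.

```python
import sqlite3

# O: subclass axioms as (subclass, superclass) pairs.
SUBCLASS_OF = [(":Movie", ":Play")]

# M: per-class mapping, as (source SQL, IRI template).
CLASS_MAPPINGS = {
    ":Movie": ("SELECT mcode FROM MOVIE WHERE type = 'm'", ":m-{mcode}"),
}


def rewrite(cls, subclass_of):
    """Rewriting: expand a class into itself plus all transitive subclasses."""
    classes, frontier = [cls], [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_of:
            if sup == current and sub not in classes:
                classes.append(sub)
                frontier.append(sub)
    return classes


def unfold(classes, class_mappings):
    """Unfolding: replace each class by the SQL of its mapping, if it has one."""
    return [class_mappings[c] for c in classes if c in class_mappings]


def answer(connection, cls):
    """Evaluation and result translation: run the SQL and build IRIs from rows."""
    answers = set()
    for sql, iri_template in unfold(rewrite(cls, SUBCLASS_OF), CLASS_MAPPINGS):
        cursor = connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            answers.add(iri_template.format(**dict(zip(columns, row))))
    return sorted(answers)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MOVIE (mcode INTEGER, mtitle TEXT, myear INTEGER, type TEXT);
    INSERT INTO MOVIE VALUES (511, 'The Matrix', 1999, 'm');
    INSERT INTO MOVIE VALUES (227, 'Blade Runner', 1982, 'm');
""")
plays = answer(conn, ":Play")
print(plays)
```

    Note that :Play itself has no mapping, so every answer comes from the :Movie branch introduced by rewriting; without the ontology the query would return nothing.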
  • 16. 1. Introduction 2. The VKG Framework 3. The Ontop VKG System 4. VKGs over Web APIs 5. Conclusions
  • 17. The Ontop VKG System
    https://ontop-vkg.org/
    • state-of-the-art VKG system born at UNIBZ (2009, first research in 2004)
    • compliant with all relevant Semantic Web standards: RDF, RDFS, OWL 2 QL, R2RML, SPARQL, and GeoSPARQL
    • implemented in Java (v1.8+) and also available as Docker image
    • supports all major relational DBMSs: Oracle, DB2, MS SQL Server, Postgres, MySQL, Teiid, Dremio, Denodo, etc.
    • open-source (Apache 2) project with a solid community: 200+ mailing list members, 9000+ downloads in last 2 years
    • commercial services (open-core model) by Ontopic, a UNIBZ spin-off founded in 2019
  • 18. Ontop Usage Scenarios
    [Figure: ontology, mapping and data are either virtualized into a Virtual Knowledge Graph that answers queries directly, or materialized into a Materialized Knowledge Graph loaded in a triple store]
    VKG query answering
    • supports most of SPARQL 1.1 under OWL 2 QL inference regime
    • standard-compliant SPARQL endpoint
    • over one relational source, or
    • over multiple heterogeneous sources, together with a data federation system (e.g., Teiid, Dremio) providing an integrated relational view of sources
    VKG materialization
    • use ontology and mappings to efficiently & scalably materialize all the VKG triples
    • the produced RDF file can be loaded in any triplestore
  • 19. Ontop Developer Community
  • 20. Ontop in Research and Industrial Projects
    Research projects
    • Optique (EU FP7, 11/2012-10/2016): Ontop-based scalable end-user access to big data, 10 partners incl. Statoil, Siemens
    • EPNet (ERC Advanced Grant): cultural heritage project on food production and distribution in the Roman Empire
    • KAOS (Euregio, 06/2016-05/2019): preparing standardized log files from timestamped log data for process mining
    • INODE (EU H2020, 11/2019-10/2022): intelligent open data exploration
    • IDEE (ERDF 2014-2020): building & energy consumption data VKG
    Industrial projects
    • NOI Techpark: development of the South Tyrol tourism KG
    • SIRIS Academic (Barcelona): open data integration and dashboards
    • Siemens Corporate Technologies (Munich): access to temporal and streaming data
    • Robert Bosch GmbH (Stuttgart): analysis of manufacturing log data
    • Metaphacts (Germany): inclusion of Ontop in their platform
    • Fluxicon (Milano)
    • Isagog (Rome)
  • 21. Ontop in Action
    Optique project, Statoil use case
    From SQL query over the data source ...
      SELECT wellbore.identifier,
             stratigraphic_zone.strat_column_identifier,
             pty_pressure.pty_pressure_s,
             stratigraphic_zone.strat_unit_identifier
      FROM wellbore,
           pty_pressure,
           activity fp_depth_data
           LEFT JOIN (
             pty_location_1d AS fp_depth_pt1_loc
             JOIN picked_stratigraphic_zones AS zs
               ON zs.strat_zone_entry_md <= fp_depth_pt1_loc.Data_value_1_o
              AND zs.strat_zone_exit_md >= fp_depth_pt1_loc.Data_value_1_o
              AND zs.strat_zone_depth_uom = fp_depth_pt1_loc.Data_value_1_ou
             JOIN stratigraphic_zone
               ON zs.wellbore = stratigraphic_zone.wellbore
              AND zs.strat_column_identifier = stratigraphic_zone.strat_column_identifier
              AND zs.strat_interp_version = stratigraphic_zone.strat_interp_version
              AND zs.strat_zone_identifier = stratigraphic_zone.strat_zone_identifier
           ) ON fp_depth_data.facility_s = zs.wellbore
            AND fp_depth_data.activity_s = fp_depth_pt1_loc.activity_s,
           activity_class AS form_pressure_class
      WHERE wellbore.wellbore_s = fp_depth_data.Facility_s
        AND fp_depth_data.activity_s = pty_pressure.activity_s
        AND fp_depth_data.kind_s = form_pressure_class.activity_class_s
        AND wellbore.ref_existence_kind = 'actual'
        AND form_pressure_class.name = 'formation pressure depth data'
    ... to VKG SPARQL query
      SELECT ?wellbore ?chronostrat_unit ?top_md_m ?lithostrat_unit {
        ?w a :Wellbore ;
           :name ?wellbore ;
           :hasWellboreInterval ?intv .
        ?intv a :StratigraphicZone ;
              :hasUnit ?cu ;
              :hasTopDepth ?top .
        ?cu :name ?chronostrat_unit ;
            :ofStratigraphicColumn [ a :ChronoStratigraphicColumn ] .
        ?top a :MeasuredDepth ;
             :valueInStandardUnit ?top_md_m .
        ?intv :overlapsWellboreInterval ?litho_intv .
        ?litho_intv :hasUnit ?lu .
        ?lu :name ?lithostrat_unit ;
            :ofStratigraphicColumn [ a :LithoStratigraphicColumn ] .
      }
  • 22. Ongoing Research & Development Directions
    Mapping patterns
    • bootstrapping (semi-automated generation) of mappings & possibly ontology for a data source
    • reduces VKG deployment costs, mostly related to mapping authoring
    Provenance & explanations
    • report which sources/tuples, mappings and ontology axioms contributed to a query answer
    • prototype Ontop extension based on provenance approaches (semi-rings) from the DB community
    Geospatial queries
    • support GeoSPARQL to manipulate & query for geometries, leveraging DB support (e.g., PostGIS)
    Temporal/streaming extensions
    • support SQL-enabled stream processors like Flink and pattern matching over streaming data
    Non-relational sources
    • support non-relational data sources such as MongoDB, Neo4j and Web APIs
  • 23. 1. Introduction 2. The VKG Framework 3. The Ontop VKG System 4. VKGs over Web APIs 5. Conclusions
  • 24. Accessing Web APIs
    Data is increasingly available via Web APIs
    • access to 3rd-party and/or dynamically-computed data
    • access to data-related services, e.g., text search
    Some API statistics (source: https://nordicapis.com/20-impressive-api-economy-statistics/)
    • 83% of all Internet traffic belongs to API-based services
    • 2M+ API repositories on GitHub
    • 90% of developers use APIs
    • 30% of development time spent on coding APIs
    Complex data access problem for applications operating on data from both databases and APIs
    [Figure: an application issuing SQL to RDB sources and calls to API sources, labelled "complex data access problem"]
  • 25. Accessing Web APIs – Open Data Hub (ODH) RDB + Semantic Search API Example
    Answer hybrid queries like:
    • get (plot) IRI, description, rating & location of accommodations ...
    • whose rating is 3 stars or more (structured constraint) and ...
    • whose EN description matches the search string “horse riding” (text constraint)
    Semantic search: improved text search that aims at capturing and leveraging text meaning (vs term matching only)
    • e.g., via BERT-based model from Sentence Transformers library
  • 26. Accessing Web APIs – Unified Access using a VKG
    • applications operate on a unified VKG spanning APIs and other involved sources
      → each API operation as an independent source
      → data federation setting due to multiple sources
    • VKG built (e.g., via Ontop) over a Virtual Database (VDB) federating all sources
      → VDB produced by a data federation system (e.g., Teiid)
      → the VDB offers a relational view of API data
      → VKG query reformulation may be tuned to this setting
    • delegate the complex orchestration of source sub-queries and API calls to a VKG + data federation system
    • exploit existing database techniques to cope with API access pattern restrictions during query answering
    [Figure: User / Application sends SPARQL to the VKG (Ontop extension), which sends SQL to the Virtual DB (VDB) (Teiid extension), which in turn issues SQL to RDB sources and calls to API sources]
  • 27. VDB – SQL/MED Specification
    SQL/MED allows federating multiple sources in a virtual database (VDB)
    • standardized SQL extension supported by some data federation systems like Teiid
    • VDB as a set of schemas mapped to foreign data sources accessed via wrappers/translators
    • we extend Teiid with a new service translator for accessing APIs
    Example using Teiid with our extensions:
      CREATE DATABASE vdb_example OPTIONS ( "... connection options for federated sources ..." );
      USE DATABASE vdb_example;

      CREATE SERVER db_source FOREIGN DATA WRAPPER postgresql;  -- define RDB source with schema 'db'
      CREATE SCHEMA db SERVER db_source;                        -- using 'postgresql' translator to access it

      CREATE SERVER srv_source FOREIGN DATA WRAPPER service;    -- define API source with schema 'srv'
      CREATE SCHEMA srv SERVER srv_source;                      -- using 'service' translator to access it

      IMPORT FOREIGN SCHEMA public FROM SERVER db_source INTO db OPTIONS ( importer.catalog 'public' );

      SET SCHEMA srv;
      -- CREATE FOREIGN TABLE / PROCEDURE statements mapped to API operations (API bindings)
  • 28. VDB – API Bindings
    API operations as SQL/MED procedures
    • input tuple → 0..n output tuples
    • URL, method, request/response templates
      CREATE FOREIGN PROCEDURE api_semsearch_query ( query VARCHAR )
      RETURNS TABLE ( query VARCHAR, id VARCHAR, score DOUBLE, excerpt VARCHAR )
      OPTIONS (
        "method" 'post',
        "url" 'http://semsearch:8080/query',
        "requestBody" '{"query": "{query}", "n": 100}',
        "responseBody" '{"matches": [{ "id": "{id}", "score": "{score}", "excerpt": "{excerpt}" }] }'
      );
    API data as SQL/MED virtual tables
    • linked to API operations/procedures
    • each procedure defines an access pattern
      CREATE FOREIGN TABLE vt_semsearch_match (
        query VARCHAR NOT NULL,
        id VARCHAR NOT NULL,
        score DOUBLE NOT NULL,
        excerpt VARCHAR NOT NULL,
        PRIMARY KEY (query, id)
      ) OPTIONS ( "select" 'api_semsearch_query' );

      CREATE FOREIGN TABLE vt_semsearch_index (
        id VARCHAR PRIMARY KEY,
        text VARCHAR NOT NULL
      ) OPTIONS (
        "UPDATABLE" 'true',
        "upsert" 'api_semsearch_store',
        "delete" 'api_semsearch_clear'
      );
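    One plausible reading of the request/response templates in the binding above is sketched below. The slides do not spell out the template semantics, so everything here is an assumption: placeholders of the form "{name}" are taken to stand for whole JSON values, a single-element list in the response template is taken to mean "one output tuple per array item", and the helpers `fill` and `extract` as well as the fake response are hypothetical. No HTTP call is made.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")

# Options copied from the api_semsearch_query binding on the slide.
BINDING = {
    "method": "post",
    "url": "http://semsearch:8080/query",
    "requestBody": '{"query": "{query}", "n": 100}',
    "responseBody": '{"matches": [{"id": "{id}", "score": "{score}", "excerpt": "{excerpt}"}]}',
}


def fill(template, inputs):
    """Build a request: replace each "{name}" string by the input called name."""
    if isinstance(template, dict):
        return {key: fill(value, inputs) for key, value in template.items()}
    if isinstance(template, list):
        return [fill(item, inputs) for item in template]
    if isinstance(template, str):
        match = PLACEHOLDER.fullmatch(template)
        if match:
            return inputs[match.group(1)]
    return template


def extract(template, data, binding):
    """Parse a response: yield one output tuple per match of the template."""
    if isinstance(template, str) and PLACEHOLDER.fullmatch(template):
        name = PLACEHOLDER.fullmatch(template).group(1)
        yield {**binding, name: data}
    elif isinstance(template, dict):
        bindings = [binding]
        for key, sub_template in template.items():
            bindings = [extended for partial in bindings
                        for extended in extract(sub_template, data[key], partial)]
        yield from bindings
    elif isinstance(template, list):
        for item in data:  # one branch per array element, hence 0..n tuples
            yield from extract(template[0], item, binding)
    else:
        yield binding  # constants in the template bind nothing


inputs = {"query": "horse riding"}
request = fill(json.loads(BINDING["requestBody"]), inputs)

# Hypothetical response, standing in for what the API would return.
fake_response = {"matches": [
    {"id": "A1", "score": 0.91, "excerpt": "guided horse riding tours"},
    {"id": "B2", "score": 0.75, "excerpt": "stables nearby"},
]}
rows = list(extract(json.loads(BINDING["responseBody"]), fake_response, inputs))
print(request)
print(rows)
```

    Carrying the input tuple into every output row mirrors the procedure signature on the slide, where `query` appears both as the input parameter and as a column of the returned table.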
  • 29. VDB – Query Translation & Execution
    Given a VDB defined using SQL/MED + API Bindings and an input query over the VDB
    • Teiid splits the query into sub-queries based on translator capabilities and cost heuristics
    • sub-queries are sent to translators & Teiid handles remaining operations (e.g., federated joins)
    Example SQL query
      SELECT s.score, s.excerpt, a."AccoCategoryId", a."AccoDetail-en-Name", a."AccoDetail-en-City"
      FROM srv.vt_semsearch_match AS s
      JOIN db.v_accommodationsopen AS a ON s.id = a."Id"
      WHERE s.query = 'horse riding'
      ORDER BY s.score DESC
      LIMIT 10
    Execution plan
      LimitNode (limit = 10)
        SortNode (s.score DESC)
          ProjectNode (s.score, ... a."AccoDetail-en-City")
            JoinNode (s.id = a."Id", merge join strategy)
              AccessNode (API)
                SELECT id, excerpt, score FROM vt_semsearch_match WHERE query = 'horse riding'
              AccessNode (RDB)
                SELECT "Id", "AccoDetail-en-Name", "AccoDetail-en-City" FROM v_accommodationsopen
  • 30. VDB – Push-down of Projection, Filtering, Sorting, Slicing
    Special input attributes map API capabilities related to standard relational operators
    • filtering: return/process only objects matching some criteria (e.g., attribute = or ≥ constant)
    • projection: include/exclude certain attributes in returned results
    • sorting: sort results according to a certain attribute and direction (ascending/descending)
    • slicing: return only a given page of all possible results
      CREATE FOREIGN PROCEDURE api_station_data_from_to (
        stype VARCHAR NOT NULL,
        sname VARCHAR NOT NULL,
        tname VARCHAR NOT NULL,
        __min_inclusive__mvaliddate DATE NOT NULL,  -- filter push down (conditions min <= mvaliddate <= max)
        __max_inclusive__mvaliddate DATE NOT NULL,
        __limit__ INTEGER                           -- slicing push down
      ) RETURNS TABLE ( ... ) OPTIONS ( ... );
    Partial/complete push down of these operators whenever possible
    • allows offloading computation to the API (e.g., sorting)
    • allows reducing costs by manipulating & transferring less data
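    The split between pushed-down and residual operators can be sketched as follows. The parameter naming convention (`__min_inclusive__`, `__max_inclusive__`, `__limit__`) is taken from the slide; the decision logic, the function name `plan_push_down`, and the example values are assumptions made for illustration. The sketch also assumes the query has no ORDER BY, since a limit may only be delegated when any required ordering is delegated too.

```python
# Prefixes of special input attributes, following the slide's naming convention.
PREFIXES = {">=": "__min_inclusive__", "<=": "__max_inclusive__"}


def plan_push_down(conditions, limit, parameters):
    """Split a query into API-side inputs and work left to the federation engine.

    conditions: list of (column, operator, constant) filters from the WHERE clause
    limit:      requested LIMIT, or None
    parameters: names of the input attributes declared by the API procedure
    Returns (pushed inputs, residual conditions, residual limit).
    """
    pushed, residual = {}, []
    for column, operator, value in conditions:
        if operator == "=":
            name = column
        elif operator in PREFIXES:
            name = PREFIXES[operator] + column
        else:
            name = None  # no special attribute for this operator in this sketch
        if name in parameters:
            pushed[name] = value
        else:
            residual.append((column, operator, value))
    # Slicing is safe to delegate only if no filter remains to be applied afterwards.
    if limit is not None and "__limit__" in parameters and not residual:
        pushed["__limit__"] = limit
        limit = None
    return pushed, residual, limit


PARAMETERS = {"stype", "sname", "tname",
              "__min_inclusive__mvaliddate", "__max_inclusive__mvaliddate", "__limit__"}

complete = plan_push_down(
    [("stype", "=", "Meteo"),
     ("mvaliddate", ">=", "2022-01-01"),
     ("mvaliddate", "<=", "2022-01-31")],
    10, PARAMETERS)

partial = plan_push_down(
    [("stype", "=", "Meteo"), ("mvalue", ">", 5)],
    10, PARAMETERS)

print(complete)
print(partial)
```

    In the second call the filter on `mvalue` cannot be expressed through the procedure's inputs, so both that filter and the LIMIT stay with the federation engine, which is the "partial push down" case.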
  • 31. VDB – Exploiting Bulk API Operations
    Bulk API operations operate on multiple input tuples, such as lookup by set of IDs or bulk store
    • their use enables better performance due to fewer API calls
    • useful to speed up dependent joins (using IN operator) between RDBMS and API data
    [Figure: dependent join R ⨝ S on R.A = S.A between RDBMS table R and virtual table S, backed by a bulk API operation with input attribute A]
      1. SELECT A, … FROM R WHERE …
      2. Extract values of join attribute A: a1, a2, …
      3. SELECT A, … FROM S WHERE A IN (a1, a2, …) AND …
      4. Via the API bindings, bulk API calls with multiple input tuples for different values of A: a1, a2, …
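    The four steps of the dependent join can be sketched in Python. This is an illustrative sketch, not Teiid's implementation: the function `dependent_join`, the batch size, and the stand-in `fake_bulk_api` (which records how often it is called) are all hypothetical.

```python
def dependent_join(r_rows, bulk_lookup, key, batch_size):
    """Join RDBMS rows with API rows on `key`, calling the API once per batch."""
    # Step 2: extract the distinct values of the join attribute from R's rows.
    values = sorted({row[key] for row in r_rows})
    # Steps 3-4: one bulk call per batch of values (the IN list), indexing results by key.
    s_by_key = {}
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        for s_row in bulk_lookup(batch):
            s_by_key.setdefault(s_row[key], []).append(s_row)
    # Combine each R row with its matching S rows (inner join semantics).
    return [{**r_row, **s_row}
            for r_row in r_rows
            for s_row in s_by_key.get(r_row[key], [])]


calls = []  # records the argument of every simulated API call


def fake_bulk_api(ids):
    """Hypothetical bulk lookup: returns a row for each known id in `ids`."""
    calls.append(list(ids))
    known = {1: "one", 2: "two", 4: "four"}
    return [{"A": i, "label": known[i]} for i in ids if i in known]


# Step 1: rows already fetched from RDBMS table R.
R = [{"A": 1, "name": "r1"}, {"A": 2, "name": "r2"},
     {"A": 3, "name": "r3"}, {"A": 2, "name": "r2b"}]

joined = dependent_join(R, fake_bulk_api, "A", batch_size=2)
print(joined)
print(len(calls), "API calls")
```

    With four R rows and three distinct join values, a per-row strategy would need four calls; batching by two needs two, and the saving grows with the batch size the API accepts.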
  • 32. VDB – Data Materialization
    Data materialization: required by API operations that cannot be invoked at query time
    • operations too expensive to call at query time (e.g., align API and DB identifiers)
    • operations instrumental to the use of external APIs (e.g., text indexing in a search engine)
    Solution #1: materialized views in Teiid (or other data federation system used)
    Solution #2: dedicated materialization engine for flexibly executing arbitrary materialization rules:
    • identifier – for documentation & diagnostics
    • target – the system-managed computed table (possibly virtual) where data is stored
    • source – arbitrary SQL query (over any tables) that produces the data to store
      rules:
        - id: index_accommodation_texts
          target: vt_semsearch_index
          source: |-
            SELECT "Id" AS id, "AccoDetail-en-Longdesc" AS text
            FROM v_accommodationsopen
            WHERE "AccoDetail-en-Longdesc" IS NOT NULL
        - ... other rules ...
  • 33. VDB – Data Materialization (cont’d)
    Rules (their SQL source queries) are analyzed to derive a rule dependency graph, which is mapped to an execution plan using fixpoint rule evaluation for strongly connected components
    [Figure: rule/table dependencies and the derived rule dependencies among rules R1–R5]
    Execution plan:
      sequence (
        parallel (
          R1,
          sequence ( R2, fixpoint ( parallel ( R3, R4 ) ) )
        ),
        R5
      )
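    Deriving a plan from a rule dependency graph can be sketched with Tarjan's strongly connected components algorithm. The sketch below is a simplification under stated assumptions: the dependency edges among R1–R5 are guessed so as to be consistent with the plan on the slide (the figure's actual edges are not recoverable from the text), and the output is a purely sequential plan that wraps cyclic components in a fixpoint, without detecting the independent branches that the slide's plan runs in parallel.

```python
def strongly_connected_components(depends_on):
    """Tarjan's algorithm; components are emitted dependencies-first."""
    index, low, stack, on_stack, components = {}, {}, [], set(), []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dependency in depends_on[node]:
            if dependency not in index:
                visit(dependency)
                low[node] = min(low[node], low[dependency])
            elif dependency in on_stack:
                low[node] = min(low[node], index[dependency])
        if low[node] == index[node]:  # node is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in depends_on:
        if node not in index:
            visit(node)
    return components


def execution_plan(depends_on):
    """Render a sequential plan; cyclic components are evaluated to fixpoint."""
    steps = []
    for component in strongly_connected_components(depends_on):
        cyclic = len(component) > 1 or component[0] in depends_on[component[0]]
        if cyclic:
            steps.append("fixpoint ( parallel ( " + ", ".join(component) + " ) )")
        else:
            steps.append(component[0])
    return "sequence ( " + ", ".join(steps) + " )"


# Assumed edges: each rule maps to the rules whose target tables its source query reads.
DEPENDS_ON = {
    "R1": [],
    "R2": [],
    "R3": ["R2", "R4"],
    "R4": ["R3"],
    "R5": ["R1", "R4"],
}

plan = execution_plan(DEPENDS_ON)
print(plan)
```

    Because Tarjan's algorithm completes a component only after everything it depends on, the emission order is already a valid execution order, so no separate topological sort is needed.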
  • 34. VKG – Example of Ontology & Mappings over the VDB
    Ontology
      schema:Accommodation a owl:Class ;
          rdfs:subClassOf schema:Place ;
          rdfs:label "Accommodation"@en ; ...
      schema:name a owl:DatatypeProperty ; ...
      hive:Match a owl:Class ...
    Current ontology formalism (OWL 2 QL) reused as is, but now also models data from APIs
    Mappings
      mappingId  Semantic Search
      target     data:match/accommodation/{id}/{query} a hive:Match;
                     hive:query {query}^^xsd:string;
                     hive:resource data:accommodation/{id};
                     hive:excerpt {excerpt}@en;
                     hive:score {score}^^xsd:decimal.
      source     SELECT * FROM hiveodh.srv.vt_semsearch_match
    Current VKG mapping formalism reused as is, but data may now come from API virtual tables
  • 35. VKG – Query Rewriting & Evaluation Example
    User-supplied SPARQL query
      SELECT ?h ?posLabel ?rating ?pos {
        [] a hive:Match ;
           hive:query "horse riding"^^xsd:string ;
           hive:resource ?h ;
           hive:excerpt ?excerpt ;
           hive:score ?score .
        ?h a schema:LodgingBusiness ;
           geo:defaultGeometry/geo:asWKT ?pos ;
           schema:name ?name ;
           schema:description ?description ;
           schema:starRating/schema:ratingValue ?rating.
        FILTER (?rating >= 3 && lang(?name) = 'en' && lang(?description) = 'en')
        BIND (CONCAT(?name, " <br><br>...", ?excerpt, "...<br><br>", ?description) AS ?posLabel)
      }
      ORDER BY DESC(?score)
      LIMIT 10
    SQL query rewritten by Ontop
      SELECT v1.id, v1.excerpt,              -- fields used
             v2."AccoDetail-en-Name",        -- for deriving
             v2."AccoDetail-en-Longdesc",    -- ?posLabel
             ... complex expression computing rating ...,
             ST_ASTEXT(v2."Geometry")
      FROM hiveodh.srv.vt_semsearch_match v1,
           hiveodh.db.v_accommodationsopen v2
      WHERE v1."id" = v2."Id"
        AND CAST(v1."query" AS TEXT) = 'horse riding'
        AND ... complex condition on rating >= 3 ...
        AND ... nonnull conditions for output columns ...
      ORDER BY CAST(v1."score" AS DECIMAL) DESC
      LIMIT 10
    SQL query evaluated on the VDB by Teiid
  • 36. VKG – ODH with Semantic Search Demo
    Data sources: DB with ODH tourism data + Semantic search API to index & query accommodation texts
    System: Ontop embedding Teiid + materialization engine
    Demo: https://hive.inf.unibz.it/odh/vkg/ (reformulate example)
  • 37. Overall Framework for VKGs over APIs
    [Figure: a VKG-based application sends SPARQL to the Virtual Knowledge Graph (VKG, Ontop); Ontop and VDB-based applications send SQL to the Virtual DB (VDB, Teiid + service translator), which issues SQL to RDB sources and calls to API sources]
    • VKG Ontology: formalizes the classes/properties (the “schema”) of the VKG, enabling reasoning
    • VKG Mappings: including virtual tables, used for query rewriting
    • Materialization Rules: pre-compute results of expensive API calls → VDB/VKG no longer fully “virtual”
    • API Bindings: define how to query/update a virtual table via API calls, if possible → limited access patterns
  • 38. 1. Introduction 2. The VKG Framework 3. The Ontop VKG System 4. VKGs over Web APIs 5. Conclusions
  • 39. Takeaway Messages
    Virtual Knowledge Graphs (VKG): flexible technology for building KGs over existing data source(s)
    • useful for inherently relational data where a VKG engine + RDBMS may outperform a triplestore
    • useful for existing data RDF-ification via VKG materialization to an RDF file
    Ontop: mature, open-source VKG system with a solid user & developer community
    • allows a VKG over a single RDB, with support for multiple database engines
    • allows a VKG over multiple heterogeneous sources, in combination with an intermediate data federation system such as the open-source Teiid & Dremio
    • active research & development for adding new features and new data sources
    VKGs over Web APIs: ongoing research & development effort
    • enables transparent access to dynamically-computed API data via declarative queries
    • API operations mapped to virtual relations, accessed through a Teiid extension
    • optimizations for better using API features, such as bulk operations and operators’ push-down
    • expensive API operations supported via pre-computation and data materialization
  • 40. Thanks for attending! these slides: https://bit.ly/3WOoldB Ontop: https://ontop-vkg.org/