Ephedra: efficiently combining RDF data and services
using SPARQL federation
Andriy Nikolov
Peter Haase
Johannes Trame
Artem Kozlov
2
• Motivation: traditional vs hybrid federation
• Background: metaphactory platform
• Ephedra SPARQL federation engine:
• Architecture
• Optimization techniques
• Evaluation
Table of Contents
3
• Multiple physically distributed data repositories
• Each data repository represents a SPARQL endpoint
• sometimes, a relational database exposed using R2RML
"Canonical" SPARQL 1.1 federation scenario
[Diagram: four graph databases and a relational database exposed via OBDA / ETL]
4
• Not only SPARQL endpoints!
• RDF data
• Custom indices
• Compute services
• Machine learning models
• Enterprise APIs
• …
Hybrid federation scenario
[Diagram: graph database, machine learning model, custom API, and specialized indices, linked by a "?"]
5
• Variety of data sources
  • Local/remote RDF stores, virtual RDF (Ontop)
• Variety of data modalities
  • Graph, text, temporal, geospatial
• Variety of data processing techniques
  • Graph analytics
  • Statistical analysis/machine learning
  • Domain-specific (BLAST genome sequence similarity)
Challenges: Hybrid federation queries
[Diagram: graph database; relational database (OBDA / ETL); NoSQL Data / Elastic Search Engines; Deep Learning Service; BLAST Service]
6
Background: metaphactory platform
[Architecture diagram]
• Clients and tools: Alexa, Tableau, Semantic Search, Visualization, Authoring
• metaphactory Frontend – Modular W3C Web Components
• metaphactory Backend – Java 8
• Backend services: Ontology Mgmt. Service, Tableau Connector Service, NLP Intent Service, Hybrid SPARQL Service, SPARQL Service, RDF Graph Store - Data Ingestion, Query Catalog Service, Query as a Service
• metaphactory - Java servlet, Jetty Deployment; REST via JAVAX/Jersey, ACL/Security via Apache Shiro
• 3rd Party Tools / Services; Runtime / Configuration Data
• Data sources: Relational Database (OBDA / ETL), Graph database (main repository)
7
Solution: Ephedra architecture
[Architecture diagram]
• Clients and tools: Alexa, Tableau, Semantic Search, Visualization, Authoring
• metaphactory Frontend – Modular W3C Web Components
• metaphactory Backend – Java 8
• Backend services: Ontology Mgmt. Service, Tableau Connector Service, NLP Intent Service, Hybrid SPARQL Service, SPARQL Service, RDF Graph Store - Data Ingestion, Query Catalog Service, Query as a Service
• Ephedra: RDF4J Federation SAIL API, Query plan optimizer, Runtime query execution engine, Service Registry
• metaphactory - Java servlet, Jetty Deployment; REST via JAVAX/Jersey, ACL/Security via Apache Shiro
• 3rd Party Tools / Services; Runtime / Configuration Data
• Federated content: RDF Data, Structured Data / Indices (Text / Spatial…), Analytics Results
• Sources: Graph Databases, Blazegraph Triplestore, NoSQL Data / Elastic Search Engines, Relational Database (OBDA / ETL), Deep Learning Service, R Service
8
• Extends RDF4J (ex-Sesame) API
• Compute services are wrapped into "virtual" RDF4J repositories
• SPARQL graph patterns are transformed into API calls
• Service wrapper repositories are explicitly described in the service registry
  • Input/output parameters
  • Expected graph patterns
• SPARQL 1.1 federation using the SERVICE keyword
  • No automatic source selection
Main principles
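The wrapping principle can be illustrated with a minimal Python sketch. It is not the Ephedra implementation (which extends the RDF4J Java API); the names `ServiceDescriptor`, `REGISTRY`, `evaluate_service` and the stubbed `fake_word2vec` function are assumptions made for illustration, and the returned similarity data is a stand-in for a real API response.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ServiceDescriptor:
    """Hypothetical registry entry: the predicate a service answers and how to call it."""
    predicate: str
    call: Callable[[str], List[str]]  # input URI -> list of output URIs


def fake_word2vec(uri: str) -> List[str]:
    # Stand-in for a remote similarity API; the data is hard-coded for the demo.
    return {"wd:Q5598": ["wd:Q5597"]}.get(uri, [])


# Explicit registry: the query names the service, nothing is selected automatically.
REGISTRY: Dict[str, ServiceDescriptor] = {
    "eph:word2vec": ServiceDescriptor("word2vec:hasSimilar", fake_word2vec),
}


def evaluate_service(service: str, pattern: Tuple[str, str, str]) -> List[Dict[str, str]]:
    """Turn one triple pattern from SERVICE <service> { ... } into an API call."""
    subject, predicate, obj = pattern
    descriptor = REGISTRY[service]
    if predicate != descriptor.predicate:
        raise ValueError(f"{service} does not answer {predicate}")
    if subject.startswith("?") or not obj.startswith("?"):
        raise ValueError("expected a bound subject (input) and a variable object (output)")
    # Each API result becomes one solution binding for the output variable.
    return [{obj: value} for value in descriptor.call(subject)]


bindings = evaluate_service(
    "eph:word2vec", ("wd:Q5598", "word2vec:hasSimilar", "?painter")
)
```

From the caller's point of view the service behaves like any other repository: a graph pattern goes in, solution bindings come out.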
9
# Select a painter similar to Rembrandt
SELECT ?painter WHERE {
  SERVICE eph:word2vec {
    wd:Q5598 word2vec:hasSimilar ?painter .
  }
  ?painter wdt:P106 wd:Q1028181 . # occupation: painter
}
Describing services
[Diagram: word2vec embeddings service; in: URI, out: URI[]]
# Service type descriptor (extended SPIN)
eph:word2vec a eph:Service ;
  eph:hasSPARQLPattern [
    sp:subject :_inputURI ;
    sp:predicate word2vec:hasSimilar ;
    sp:object :_outputURI
  ] ;
  spin:constraint [ spl:predicate :_inputURI ] ;
  spin:column [ spl:predicate :_outputURI ] .
Matching service inputs/outputs to SPARQL patterns (Service Registry)
  input: wd:Q5598
  output: ?painter = wd:Q5597 # Raphael
# Service instance descriptor (RDF4J SAIL)
serviceURL = http://wikidatatest.metaphacts.com/word2vec
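How a descriptor pattern might be matched against a query pattern can be sketched in Python as follows. This is an illustrative assumption, not the registry code itself: the `TEMPLATE` tuple mirrors the `:_inputURI` / `:_outputURI` placeholders of the descriptor on this slide, while `match`, `is_callable` and `INPUTS` are hypothetical names.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Descriptor pattern mirroring the slide; placeholders start with ":_".
TEMPLATE: Triple = (":_inputURI", "word2vec:hasSimilar", ":_outputURI")
INPUTS = {":_inputURI"}  # parameters the service needs before it can be called


def match(template: Triple, pattern: Triple) -> Optional[Dict[str, str]]:
    """Unify a query triple pattern with the descriptor template.

    Returns a mapping placeholder -> query term, or None if a fixed term differs.
    """
    mapping: Dict[str, str] = {}
    for t_term, q_term in zip(template, pattern):
        if t_term.startswith(":_"):
            mapping[t_term] = q_term
        elif t_term != q_term:
            return None  # fixed term (here: the predicate) does not match
    return mapping


def is_callable(mapping: Dict[str, str], bound: Set[str]) -> bool:
    """The service can be called once every input is a constant or a bound variable."""
    return all(
        not mapping[p].startswith("?") or mapping[p] in bound for p in INPUTS
    )


m = match(TEMPLATE, ("wd:Q5598", "word2vec:hasSimilar", "?painter"))
m_unbound = match(TEMPLATE, ("?uri", "word2vec:hasSimilar", "?painter"))
```

Here `m` maps the input placeholder to the constant `wd:Q5598` and the output placeholder to `?painter`, so the call can be issued immediately; `m_unbound` only becomes callable after another clause has bound `?uri`.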
10
Static optimizations
• Reordering of clauses based on input/output constraints
• Rank-aware optimizations
• Pushing LIMIT and ORDER BY operations down the tree
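# Original clause order: eph:word2vec takes ?uri as input, but ?uri is not yet bound
SELECT * WHERE {
  SERVICE eph:word2vec {
    ?uri word2vec:hasSimilar ?painter .
  }
  SERVICE eph:wikidataText {
    ?uri wikidata:search "rembrandt" .
  }
}
# Reordered: eph:wikidataText binds ?uri first, then eph:word2vec consumes it
SELECT * WHERE {
  SERVICE eph:wikidataText {
    ?uri wikidata:search "rembrandt" .
  }
  SERVICE eph:word2vec {
    ?uri word2vec:hasSimilar ?painter .
  }
}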
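Reordering based on input/output constraints can be approximated with a simple greedy pass, sketched below in Python. The `Clause` type and `reorder` function are hypothetical and only model which variables a clause requires and produces; the actual optimizer also applies rank-aware rules such as pushing LIMIT and ORDER BY down the tree, which this sketch does not cover.

```python
from typing import List, NamedTuple, Set


class Clause(NamedTuple):
    name: str
    requires: frozenset  # variables that must be bound before the call
    produces: frozenset  # variables bound by the call


def reorder(clauses: List[Clause]) -> List[Clause]:
    """Greedy ordering: repeatedly pick the first clause whose inputs are already bound."""
    bound: Set[str] = set()
    remaining = list(clauses)
    ordered: List[Clause] = []
    while remaining:
        ready = next((c for c in remaining if c.requires <= bound), None)
        if ready is None:
            raise ValueError("no clause is executable: unsatisfied input constraints")
        ordered.append(ready)
        bound |= ready.produces
        remaining.remove(ready)
    return ordered


# The two SERVICE clauses from the slide, in their original order.
word2vec = Clause("eph:word2vec", frozenset({"?uri"}), frozenset({"?painter"}))
text_search = Clause("eph:wikidataText", frozenset(), frozenset({"?uri"}))

plan = [c.name for c in reorder([word2vec, text_search])]
```

The text search has no input requirement, so it is scheduled first and supplies `?uri` to the similarity service.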
11
• Synchronizing loop join requests
• Synchronous vs asynchronous
• Separate requests vs batch
• “Breadth-first” vs “depth-first”
Runtime optimizations
?id1 wikidata:search "hokusai" .
?id1BM owl:sameAs ?id1 .
?id1BM :collaboratedWith ?id2BM .
?id2BM owl:sameAs ?id2 .
?id2 word2vec:hasSimilar ?id3 .
Σ1: Wikidata text search API; Σ2: British Museum; Σ3: word2vec
[Diagram, shown twice: tree of intermediate bindings μ1; μ11, μ12; μ111, μ112]
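The difference between separate and batched loop join requests can be shown with a small Python sketch. The function `batched_loop_join` and the stub service `fake_similarity` are assumptions for illustration, as are the `ex:` identifiers; the sketch is synchronous and does not model the asynchronous or depth-first variants.

```python
from typing import Callable, Dict, Iterable, List

Binding = Dict[str, str]


def batched_loop_join(
    left: Iterable[Binding],
    join_var: str,
    service: Callable[[List[str]], Dict[str, List[Binding]]],
    batch_size: int,
) -> List[Binding]:
    """Nested-loop join that sends left-hand bindings to the service in batches."""
    left = list(left)
    results: List[Binding] = []
    for start in range(0, len(left), batch_size):
        batch = left[start:start + batch_size]
        # One request for the whole batch instead of one request per binding.
        answers = service([b[join_var] for b in batch])
        for b in batch:
            for extra in answers.get(b[join_var], []):
                results.append({**b, **extra})
    return results


calls: List[List[str]] = []


def fake_similarity(keys: List[str]) -> Dict[str, List[Binding]]:
    # Stand-in for a remote service; records each request so calls can be counted.
    calls.append(list(keys))
    return {k: [{"?id3": k + "_similar"}] for k in keys}


left_bindings = [{"?id2": f"ex:artist{i}"} for i in range(5)]
joined = batched_loop_join(left_bindings, "?id2", fake_similarity, batch_size=2)
```

With five left-hand bindings and a batch size of 2, the service receives three requests instead of five; setting `batch_size=1` reproduces the separate-requests strategy.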
12
• Multiple alternative plans possible:
• Selectivity hard to estimate in advance
• Alternative plans executed in parallel (competing)
• Plans revised during execution
Parallel competing join
SELECT * WHERE {
  SERVICE eph:wikidataText {
    ?home wikidata:search "florence" .
  }
  SERVICE eph:wikidataText {
    ?character wikidata:search "mary" .
  }
  ?image wdt:P180 ?character . # depicts
  ?image wdt:P170 ?creator . # creator
  ?creator wdt:P19 ?home . # place of birth
}
[Diagram: alternative join plans over Σ1, Σ2 and Σ3 executed in parallel; the obtained results of Σ2 and Σ3 are combined with a hash join]
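The idea of competing plans can be sketched with Python threads: all alternatives start together and the first to finish supplies the answer. This is a simplified illustration under stated assumptions, not the Ephedra engine: `run_competing`, `hash_join`, both plan functions, the sleep-based latencies and the `ex:` data are hypothetical, and the revision of plans during execution is not modeled.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, List

Binding = Dict[str, str]


def hash_join(left: List[Binding], right: List[Binding], var: str) -> List[Binding]:
    """Join two binding lists on one shared variable using a hash index."""
    index: Dict[str, List[Binding]] = {}
    for r in right:
        index.setdefault(r[var], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[var], [])]


def run_competing(plans: List[Callable[[], List[Binding]]]) -> List[Binding]:
    """Start all plans in parallel and return the result of the first to finish."""
    pool = ThreadPoolExecutor(max_workers=len(plans))
    futures = [pool.submit(plan) for plan in plans]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for f in pending:
        f.cancel()  # only prevents plans that have not started yet
    pool.shutdown(wait=False)
    return next(iter(done)).result()


# Made-up intermediate results for the demo.
homes = [{"?home": "ex:Florence"}]
creators = [
    {"?creator": "ex:c1", "?home": "ex:Florence"},
    {"?creator": "ex:c2", "?home": "ex:Rome"},
]


def plan_home_first() -> List[Binding]:
    time.sleep(0.01)  # simulated latency of a selective starting point
    return hash_join(homes, creators, "?home")


def plan_creator_first() -> List[Binding]:
    time.sleep(0.2)  # simulated latency of a less selective starting point
    return hash_join(creators, homes, "?home")


result = run_competing([plan_home_first, plan_creator_first])
```

Both plans compute the same join, so the answer does not depend on which one wins; only the response time does.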
13
• Cultural heritage
  • RDF stores
    • British Museum
    • Wikidata
  • Services
    • Wikidata text search
    • word2vec similarity service
• Life sciences
  • RDF stores
    • Wikidata, Nextprot, Uniprot
    • Proprietary data
  • Services
    • Wikidata text search
    • BLAST sequence similarity
Applications
14
• Benchmark queries from 2 use case domains:
• 4 from cultural heritage, 3 from life sciences
• Compared runtime with and without the Ephedra optimization techniques
Evaluation
[Bar chart: Query runtime (axis 0 to 14) for CH1 (2+2), CH2 (1+1), CH3 (2+0), CH4 (2+2), PH1 (1+2), PH2 (1+1), PH3 (2+1); series: No runtime optimization vs. Runtime optimization]
15
• Architecture for integrating SPARQL endpoints and compute services using SPARQL 1.1 federation
• Explicit mappings between SPARQL nodes and service input/output parameters
• Static and runtime optimizations for hybrid queries
• Future work:
  • Backend-aware optimizations
    • Special treatment of relational databases exposed via Ontop
  • Optimizations based on service call statistics
• Applications
  • Integrated query results as input for machine learning
  • Browsing integrated data + services as a virtual knowledge graph
Summary
16
• Questions?


  • 1. Ephedra: efficiently combining RDF data and services using SPARQL federation. Andriy Nikolov, Peter Haase, Johannes Trame, Artem Kozlov
  • 2. Table of Contents
      • Motivation: traditional vs hybrid federation
      • Background: metaphactory platform
      • Ephedra SPARQL federation engine:
          • Architecture
          • Optimization techniques
          • Evaluation
  • 3. "Canonical" SPARQL 1.1 federation scenario
      • Multiple physically distributed data repositories
      • Each data repository represents a SPARQL endpoint
          • sometimes, a relational database exposed using R2RML
      [Diagram labels: Graph database (×4); Relational Database; OBDA / ETL]
  • 4. Hybrid federation scenario
      • Not only SPARQL endpoints!
          • RDF data
          • Custom indices
          • Compute services
          • Machine learning models
          • Enterprise APIs
          • …
      [Diagram labels: Graph database; Machine learning model; Custom API; Specialized indices; ?]
  • 5. Challenges: Hybrid federation queries
      • Variety of data sources
          • Local/remote RDF stores, virtual RDF (Ontop)
      • Variety of data modalities
          • Graph, text, temporal, geospatial
      • Variety of data processing techniques
          • Graph analytics
          • Statistical analysis/machine learning
          • Domain-specific (BLAST genome sequence similarity)
      [Diagram labels: Graph database; Relational Database; OBDA / ETL; NoSQL Data / Elastic Search Engines; Deep Learning Service; BLAST Service]
  • 6. Background: metaphactory platform
      [Architecture diagram labels]
      • Alexa, Tableau
      • Semantic Search, Visualization, Authoring
      • metaphactory Frontend – Modular W3C Web Components
      • metaphactory Backend – Java 8
          • Ontology Mgmt. Service
          • Tableau Connector Service
          • NLP Intent Service
          • Hybrid SPARQL Service
          • SPARQL Service
          • RDF Graph Store - Data Ingestion
          • Query Catalog Service
          • Query as a Service
      • metaphactory - Java servlet, Jetty Deployment; REST via JAVAX/Jersey, ACL/Security via Apache Shiro
      • 3rd Party Tools / Services
      • Runtime / Configuration Data
      • Relational Database; OBDA / ETL
      • Graph database; Main repository
  • 7. Solution: Ephedra architecture
      [Architecture diagram labels]
      • Alexa, Tableau
      • Semantic Search, Visualization, Authoring
      • metaphactory Frontend – Modular W3C Web Components
      • metaphactory Backend – Java 8
          • Ontology Mgmt. Service
          • Tableau Connector Service
          • NLP Intent Service
          • Hybrid SPARQL Service
          • SPARQL Service
          • RDF Graph Store - Data Ingestion
          • Query Catalog Service
          • Query as a Service
      • metaphactory - Java servlet, Jetty Deployment; REST via JAVAX/Jersey, ACL/Security via Apache Shiro
      • Ephedra
          • RDF4J Federation SAIL API
          • Query plan optimizer
          • Runtime query execution engine
          • Service Registry
      • 3rd Party Tools / Services
      • Graph Databases; Blazegraph Triplestore
      • NoSQL Data / Elastic Search Engines
      • Relational Database; OBDA / ETL
      • Deep Learning Service; R Service
      • RDF Data; Structured Data / Indices (Text / Spatial…); Analytics Results
      • Runtime / Configuration Data
  • 8. Main principles
      • Extends RDF4J (ex-Sesame) API
      • Compute services are wrapped into "virtual" RDF4J repositories
          • SPARQL graph patterns are transformed into API calls
      • Service wrapper repositories are explicitly described in the service registry
          • Input/output parameters
          • Expected graph patterns
      • SPARQL 1.1 federation using the SERVICE keyword
          • No automatic source selection
  • 9. Describing services
      # Select a painter similar to Rembrandt
      SELECT ?painter WHERE {
        SERVICE eph:word2vec {
          wd:Q5598 word2vec:hasSimilar ?painter .
        }
        ?painter wdt:P106 wd:Q1028181 .  # occupation: painter
      }

      Service: word2vec embeddings (in: URI, out: URI[])

      Service Registry
      # Service type descriptor (extended SPIN)
      eph:word2vec a eph:Service ;
        eph:hasSPARQLPattern [
          sp:subject :_inputURI ;
          sp:predicate word2vec:hasSimilar ;
          sp:object :_outputURI
        ] ;
        spin:constraint [ spl:predicate :_inputURI ] ;
        spin:column [ spl:predicate :_outputURI ] .

      Matching service inputs/outputs to SPARQL patterns
      input: wd:Q5598
      output: ?painter = wd:Q5597  # Raphael

      # Service instance descriptor (RDF4J SAIL)
      serviceURL = http://wikidatatest.metaphacts.com/word2vec
  • 10. Static optimizations
      • Reordering of clauses based on input/output constraints
      • Rank-aware optimizations
          • Pushing LIMIT and ORDER BY operations down the tree

      # Before reordering: eph:word2vec needs ?uri, which is not yet bound
      SELECT * WHERE {
        SERVICE eph:word2vec {
          ?uri word2vec:hasSimilar ?painter .
        }
        SERVICE eph:wikidataText {
          ?uri wikidata:search "rembrandt"
        }
      }

      # After reordering: the text search binds ?uri first
      SELECT * WHERE {
        SERVICE eph:wikidataText {
          ?uri wikidata:search "rembrandt"
        }
        SERVICE eph:word2vec {
          ?uri word2vec:hasSimilar ?painter .
        }
      }
  • 11. Runtime optimizations
      • Synchronizing loop join requests
          • Synchronous vs asynchronous
          • Separate requests vs batch
          • "Breadth-first" vs "depth-first"

      ?id1 wikidata:search "hokusai" .
      ?id1BM owl:sameAs ?id1 .
      ?id1BM :collaboratedWith ?id2BM .
      ?id2BM owl:sameAs ?id2 .
      ?id2 word2vec:hasSimilar ?id3 .

      Σ₁: Wikidata text search API
      Σ₂: British Museum
      Σ₃: word2vec
      [Diagram, shown twice: tree of intermediate bindings μ₁, μ₁₁, μ₁₂, μ₁₁₁, μ₁₁₂]
  • 12. Parallel competing join
      • Multiple alternative plans possible:
          • Selectivity hard to estimate in advance
      • Alternative plans executed in parallel (competing)
      • Plans revised during execution

      SELECT * WHERE {
        SERVICE eph:wikidataText {
          ?home wikidata:search "florence" .
        }
        SERVICE eph:wikidataText {
          ?character wikidata:search "mary" .
        }
        ?image wdt:P180 ?character .  # depicts
        ?image wdt:P170 ?creator .    # creator
        ?creator wdt:P19 ?home .      # place of birth
      }

      [Diagram labels: Σ₁, Σ₂, Σ₃; obtained results from Σ₂; obtained results from Σ₃; hash join; Σ₁]
  • 13. Applications
      • Cultural heritage
          • RDF stores
              • British Museum
              • Wikidata
          • Services
              • Wikidata text search
              • word2vec similarity service
      • Life sciences
          • RDF stores
              • Wikidata, Nextprot, Uniprot
              • Proprietary data
          • Services
              • Wikidata text search
              • BLAST sequence similarity
  • 14. Evaluation
      • Benchmark queries from 2 use case domains:
          • 4 from cultural heritage, 3 from life sciences
      • Compared runtime with and without the Ephedra optimization techniques
      [Chart: query runtime (axis 0 to 14) for CH1 (2+2), CH2 (1+1), CH3 (2+0), CH4 (2+2), PH1 (1+2), PH2 (1+1), PH3 (2+1); series: no runtime optimization vs runtime optimization]
  • 15. Summary
      • Architecture for integrating SPARQL endpoints and compute services using SPARQL 1.1 federation
      • Explicit mappings between SPARQL nodes and service input/output parameters
      • Static and runtime optimizations for hybrid queries
      • Future work:
          • Backend-aware optimizations
              • Special treatment of relational databases exposed via Ontop
          • Optimizations based on service call statistics
          • Applications
              • Integrated query results as input for machine learning
              • Browsing integrated data + services as a virtual knowledge graph
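
The static optimization from slide 10, reordering clauses based on input/output constraints, can be illustrated with a small sketch. This is not Ephedra's optimizer (which the slides describe as part of an RDF4J-based Java engine); it is a minimal greedy ordering written under the assumption that each SERVICE clause declares which variables it needs bound (inputs) and which it binds (outputs). The names `ServiceClause` and `reorder` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceClause:
    """Hypothetical stand-in for a SERVICE clause in a query plan."""
    name: str
    inputs: frozenset   # variables that must be bound before the call
    outputs: frozenset  # variables the call binds


def reorder(clauses, bound=frozenset()):
    """Greedy ordering: repeatedly pick the first clause whose inputs are all bound."""
    remaining = list(clauses)
    ordered = []
    bound = set(bound)
    while remaining:
        runnable = next((c for c in remaining if c.inputs <= bound), None)
        if runnable is None:
            # No clause can run: some input variable is never produced.
            raise ValueError("no clause has all of its inputs bound")
        ordered.append(runnable)
        remaining.remove(runnable)
        bound |= runnable.outputs
    return ordered


# The two clauses from slide 10: word2vec needs ?uri as input,
# while the text search takes a constant string and binds ?uri.
word2vec = ServiceClause("eph:word2vec", frozenset({"?uri"}), frozenset({"?painter"}))
text_search = ServiceClause("eph:wikidataText", frozenset(), frozenset({"?uri"}))

plan = reorder([word2vec, text_search])
print([c.name for c in plan])  # ['eph:wikidataText', 'eph:word2vec']
```

The sketch only captures the input/output constraint; it ignores selectivity and the rank-aware optimizations (pushing LIMIT and ORDER BY down the tree) that the slides list separately.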