SlideShare a Scribd company logo
1 of 23
FedX:
Optimization Techniques for Federated Query
         Processing on Linked Data
                    ISWC 2011 – October 26th

       Andreas Schwarte1, Peter Haase1, Katja Hose2,
           Ralf Schenkel2, and Michael Schmidt1
                   1fluidOperations AG, Walldorf
          2Max-Planck Institute for Informatics, Saarbrücken
Outline



•   Motivation
•   Federated Query Processing
•   FedX Query Processing Model
•   Optimization Techniques
•   Evaluation
•   Conclusion & Outlook




                                  2
Motivation

   Query processing involving multiple distributed data
          sources, e.g. Linked Open Data cloud



                New York Times


                          DBpedia

     Query both data collections in an integrated way


                                                          3
Federated Query Processing


  Federation mediator at the server
 Virtual integration of (remote) data sources
 Communication via SPARQL protocol




                                                 SPARQL
                                                          Data Source




                                                 SPARQL
                        Federation
  Query                                                   Data Source
                         Mediator




                                                 SPARQL
                                                          Data Source


                                                                        4
Federated Query Processing


                           Example Query
             Find US presidents and associated news articles
 SELECT ?President ?Party ?TopicPage WHERE {
    ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates .
    ?nytPresident owl:sameAs ?President .
    ?President dbpedia:party ?Party .
    ?nytPresident nytimes:topicPage ?TopicPage .
 }




                                                                     5
Federated Query Processing

Query:
SELECT ?President ?Party ?TopicPage WHERE {
  ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates .
  ?nytPresident owl:sameAs ?President .




                                                                   SPARQL
  ...
}
                                                                              DBpedia



        Federation                                                  “Barack Obama”
         Mediator                                                   “George W. Bush”
                                                                    ...

  ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates .




                                                                   SPARQL
     “Barack Obama”
     “George W. Bush”                                                         NYTimes
     ...




                                                                                        6
Federated Query Processing

    Query:
         SELECT ?President ?Party ?TopicPage WHERE {
           ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates .
           ?nytPresident owl:sameAs ?President .




                                                                            SPARQL
           ...
         }
                                                                                         DBpedia



                 Federation                                                  yago:Obama
                  Mediator

          ?nytPresident owl:sameAs “Barack Obama” .




                                                                            SPARQL
              “Barack Obama”
Input:        “George W. Bush”                                                           NYTimes
              ...



              “Barack Obama”, yago:Obama                                     nyt:Obama
Output:
              “Barack Obama”, nyt:Obama


                                                                                                   7
Federated Query Processing

    Query:
         SELECT ?President ?Party ?TopicPage WHERE {
           ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates .




                                                                             SPARQL
             ...
         }
                                                                                         DBpedia



                     Federation
                      Mediator

             ?nytPresident owl:sameAs “George W. Bush” .




                                                                             SPARQL
                   “Barack Obama”
Input:             “George W. Bush”                                                      NYTimes
                   ...



                   “Barack Obama”, yago:Obama                                 nyt:Bush
Output:
                   “Barack Obama”, nyt:Obama
                   “George W. Bush”, nyt:Bush                   ... and so on for the other intermediate
                                                                mappings and triple patterns ...           8
FedX Query Processing Model

Scenario:
 Efficient SPARQL query processing on multiple distributed sources
 Data sources are known and accessible as SPARQL endpoints
   • FedX is designed to be fully compatible with SPARQL 1.0
 No a-priori knowledge about data sources
   • No local preprocessing of the data sources required
   • No need for pre-computed statistics
 On-demand federation setup




                                                                      9
Challenges to Federated Query Processing

1) Involve only relevant sources in the evaluation
       Avoid: Subqueries are sent to all sources, although potentially irrelevant
2) Compute joins close to the data
       Avoid: All joins are executed locally in a nested loop fashion
3) Reduce remote communication
       Avoid: Nested loop join that causes many remote requests




                                                                                    10
Optimization Techniques

1. Source Selection:
Idea:
  Triple patterns are annotated with relevant sources
    • Sources that can contribute information for a particular triple pattern
    • SPARQL ASK requests in conjunction with a local cache
       • After a warm-up period the cache learns the capabilities of the data sources
           During Source Selection remote requests can be avoided


2. Exclusive Groups:
Idea:
  Group triple patterns with the same single relevant source
    • Evaluation in a single (remote) subquery
    • Push join to the endpoint


                                                                                        11
Optimization Techniques (2)

Example: Source Selection + Exclusive Groups


SELECT ?President ?Party ?TopicPage WHERE {
                                                                Source Selection
 ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates . @ DBpedia
 ?President dbpedia:party ?Party .                              @ DBpedia
                                                                             Exclusive Group
 ?nytPresident owl:sameAs ?President .                          @ DBpedia, NYTimes
 ?nytPresident nytimes:topicPage ?TopicPage .                   @ NYTimes
}




   Advantages:

    Avoid sending subqueries to sources that are not relevant
    Delegate joins to the endpoint by forming exclusive groups (i.e. executing
     the respective patterns in a single subquery)

                                                                                       12
Optimization Techniques (3)

3. Join Order:
Idea:
  Iteratively determine the join order based on count-heuristic:
    • Count free variables of triple patterns and groups
    • Consider "resolved" variable mappings from earlier iteration



4. Bound Joins:
Idea:
  Compute joins in a block nested loop fashion:
    • Reduce the number of requests by "vectored" evaluation of a set of input bindings
    • Renaming and Post-Processing technique for SPARQL 1.0



                                                                                          13
Optimization Techniques (4)

Example: Bound Joins
SELECT ?President ?Party ?TopicPage WHERE {
   ?President rdf:type dbpedia:PresidentsOfTheUnitedStates .
   ?President dbpedia:party ?Party .
   ?nytPresident owl:sameAs ?President .
   ?nytPresident nytimes:topicPage ?TopicPage .
}



  Assume that the following intermediate results have been computed as input for the last triple pattern
     Block Input               Before (NLJ)
     “Barack Obama”            SELECT ?TopicPage WHERE { “Barack Obama” nytimes:topicPage ?TopicPage }
     “George W. Bush”          SELECT ?TopicPage WHERE { “George W. Bush” nytimes:topicPage ?TopicPage }
     …                         …



     Now: Evaluation in a single remote request using a SPARQL UNION
     construct + local post processing (SPARQL 1.0)

                                                                                                           14
Optimization Example
    1   SPARQL Query                                     2   Unoptimized Internal Representation


    Compute Micronutrients using Drugbank and KEGG
                                                             4x Local Join
    SELECT ?drug ?title WHERE {                                     =
       ?drug drugbank:drugCategory drugbank-cat:micronutrient .
       ?drug drugbank:casRegistryNumber ?id .                     4x NLJ
       ?keggDrug rdf:type kegg:Drug .
       ?keggDrug bio2rdf:xRef ?id .
       ?keggDrug dc:title ?title .
    }




3    Optimized Internal Representation


            Exlusive Group
             Remote Join




                                                                                                   15
Evaluation

Based on FedBench benchmark suite
   14 queries from the Cross Domain (CD) and Life Science (LS) collections
   Real-World Data from the Linked Open Data cloud
   Federation with 5 (CD) and 4 (LS) data sources
   Queries vary in complexity, size, structure, and sources involved


Benchmark environment
 HP Proliant 2GHz 4Core, 32GB RAM
 20GB RAM for server (federation mediator)
 Local copies of the SPARQL endpoint to ensure reproducibility and
  reliability of the service
    • Provided by the FedBench Framework



                                                                              16
Evaluation (2)

  a) Evaluation times of Cross Domain (CD) and Life Science (LS) queries




                                                                           17
Evaluation (3)

          b) Number of requests sent to the endpoints



                                                          Runtimes
                                                        AliBaba: >600s
                                                         DARQ: >600s
                                                         FedX: 0.109s



                                                          Runtimes
                                                        AliBaba: >600s
                                                         DARQ: 133s
                                                           FedX: 1.4s




                                                                     18
FedX – The Bigger Picture


                                         Information Workbench:
                            Integration of Virtualized Data Sources as a Service

Application Layer

                                                                                  Semantic Wiki
                                                                                  Collaboration
                                                                               Reporting & Analytics
                                                                                Visual Exploration


Virtualization Layer                                                         Transparent & On-Demand
                                                                             Integration of Data Sources
Data Layer              SPARQL        SPARQL        SPARQL
                        Endpoint      Endpoint      Endpoint                       Data Registries
                                                                 Metadata
                                                                                 CKAN, data.gov, etc.
                                                                  Registry
                       Data Source   Data Source
                                                                                  + Enterprise Data
                                                   Data Source


                                                                                                        19
Conclusion and Outlook

FedX: Efficient SPARQL query processing in federated setting
 Available as open source software: http://www.fluidops.com/fedx
   • Implementation as a Sesame SAIL
 Novel join processing strategies, grouping techniques, source selection
   • Significant improvement of query performance compared to state-of-the-art systems
 On-demand federation setup without any preprocessing of data sources
 Compatible with existing SPARQL 1.0 endpoints

Outlook
 SPARQL 1.1 Federation Extension in Sesame 2.6
   • Ongoing integration of FedX’ optimization techniques into Sesame
 Federation Extensions in FedX
   • SERVICE keyword for optional user input to improve source selection
   • BINDINGS clauses for bound joins (for SPARQL 1.1 endpoints)
 (Remote) statistics to improve join order (e.g. VoID)


                                                                                         20
Visit our exhibitor booth
or arrange a private live demo!
Further information on FedX
http://www.fluidops.com/fedx



   THANK YOU
      CONTACT:
      fluid Operations
      Altrottstr. 31
      Walldorf, Germany
      Email: info@fluidOps.com
      website: www.fluidOps.com
      Tel.: +49 6227 3846-527
Query Cross Domain 3 (CD3)

          Find US presidents, their party and associated news


SELECT ?president ?party ?page WHERE {
  ?president <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/President> .
  ?president <http://dbpedia.org/ontology/nationality> <http://dbpedia.org/resource/United_States> .
  ?president <http://dbpedia.org/ontology/party> ?party .
  ?x <http://data.nytimes.com/elements/topicPage> ?page .
  ?x <http://www.w3.org/2002/07/owl#sameAs> ?president .
}
Query Life Science 3 (LS3)

               Find drugs and interaction drugs (+ the effect)


SELECT ?Drug ?IntDrug ?IntEffect WHERE {
  ?Drug <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Drug> .
  ?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug .
  ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1> ?y .
  ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2> ?IntDrug .
  ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text> ?IntEffect .
}

More Related Content

What's hot

GDG Meets U event - Big data & Wikidata - no lies codelab
GDG Meets U event - Big data & Wikidata -  no lies codelabGDG Meets U event - Big data & Wikidata -  no lies codelab
GDG Meets U event - Big data & Wikidata - no lies codelabCAMELIA BOBAN
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsMuhammad Saleem
 
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIsJosef Petrák
 
Rdf Overview Presentation
Rdf Overview PresentationRdf Overview Presentation
Rdf Overview PresentationKen Varnum
 
Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02eswcsummerschool
 
SPARQL in the Semantic Web
SPARQL in the Semantic WebSPARQL in the Semantic Web
SPARQL in the Semantic WebJan Beeck
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDFNarni Rajesh
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011Peter Mika
 
Resource description framework
Resource description frameworkResource description framework
Resource description frameworkhozifa1010
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFSNilesh Wagmare
 
Semantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorialSemantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorialAdonisDamian
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsNeo4j
 
The Semantic Web #10 - SPARQL
The Semantic Web #10 - SPARQLThe Semantic Web #10 - SPARQL
The Semantic Web #10 - SPARQLMyungjin Lee
 

What's hot (20)

Introduction to RDF Data Model
Introduction to RDF Data ModelIntroduction to RDF Data Model
Introduction to RDF Data Model
 
GDG Meets U event - Big data & Wikidata - no lies codelab
GDG Meets U event - Big data & Wikidata -  no lies codelabGDG Meets U event - Big data & Wikidata -  no lies codelab
GDG Meets U event - Big data & Wikidata - no lies codelab
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
 
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
2011 4IZ440 Semantic Web – RDF, SPARQL, and software APIs
 
Data in RDF
Data in RDFData in RDF
Data in RDF
 
Rdf Overview Presentation
Rdf Overview PresentationRdf Overview Presentation
Rdf Overview Presentation
 
Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02
 
Rdf
RdfRdf
Rdf
 
SPARQL in the Semantic Web
SPARQL in the Semantic WebSPARQL in the Semantic Web
SPARQL in the Semantic Web
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011
 
Resource description framework
Resource description frameworkResource description framework
Resource description framework
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
Introduction To RDF and RDFS
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
 
Semantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorialSemantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorial
 
SWT Lecture Session 2 - RDF
SWT Lecture Session 2 - RDFSWT Lecture Session 2 - RDF
SWT Lecture Session 2 - RDF
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
The Semantic Web #10 - SPARQL
The Semantic Web #10 - SPARQLThe Semantic Web #10 - SPARQL
The Semantic Web #10 - SPARQL
 
Introduction to SPARQL
Introduction to SPARQLIntroduction to SPARQL
Introduction to SPARQL
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 

Viewers also liked

10-16-12 DuraSpace Hot Topics, Slides, A Case Study
10-16-12 DuraSpace Hot Topics, Slides, A Case Study10-16-12 DuraSpace Hot Topics, Slides, A Case Study
10-16-12 DuraSpace Hot Topics, Slides, A Case StudyDuraSpace
 
Whats A Data Warehouse
Whats A Data WarehouseWhats A Data Warehouse
Whats A Data WarehouseNone None
 
Data Warehouse and OLAP - Lear-Fabini
Data Warehouse and OLAP - Lear-FabiniData Warehouse and OLAP - Lear-Fabini
Data Warehouse and OLAP - Lear-FabiniScott Fabini
 
Oracle Optimizer: 12c New Capabilities
Oracle Optimizer: 12c New CapabilitiesOracle Optimizer: 12c New Capabilities
Oracle Optimizer: 12c New CapabilitiesGuatemala User Group
 
Materialized views in PostgreSQL
Materialized views in PostgreSQLMaterialized views in PostgreSQL
Materialized views in PostgreSQLAshutosh Bapat
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized ViewsCarl Yeksigian
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW
 
Olap Cube Design
Olap Cube DesignOlap Cube Design
Olap Cube Designh1m
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingPrithwis Mukerjee
 
Crm evolution- crm phases
Crm  evolution- crm phasesCrm  evolution- crm phases
Crm evolution- crm phaseshemchandmba14
 
Cassandra and materialized views
Cassandra and materialized viewsCassandra and materialized views
Cassandra and materialized viewsGrzegorz Duda
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 

Viewers also liked (14)

10-16-12 DuraSpace Hot Topics, Slides, A Case Study
10-16-12 DuraSpace Hot Topics, Slides, A Case Study10-16-12 DuraSpace Hot Topics, Slides, A Case Study
10-16-12 DuraSpace Hot Topics, Slides, A Case Study
 
05 OLAP v6 weekend
05 OLAP  v6 weekend05 OLAP  v6 weekend
05 OLAP v6 weekend
 
Whats A Data Warehouse
Whats A Data WarehouseWhats A Data Warehouse
Whats A Data Warehouse
 
Data Warehouse and OLAP - Lear-Fabini
Data Warehouse and OLAP - Lear-FabiniData Warehouse and OLAP - Lear-Fabini
Data Warehouse and OLAP - Lear-Fabini
 
Oracle Optimizer: 12c New Capabilities
Oracle Optimizer: 12c New CapabilitiesOracle Optimizer: 12c New Capabilities
Oracle Optimizer: 12c New Capabilities
 
Materialized views in PostgreSQL
Materialized views in PostgreSQLMaterialized views in PostgreSQL
Materialized views in PostgreSQL
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized Views
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
Olap Cube Design
Olap Cube DesignOlap Cube Design
Olap Cube Design
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
 
Crm evolution- crm phases
Crm  evolution- crm phasesCrm  evolution- crm phases
Crm evolution- crm phases
 
Data cubes
Data cubesData cubes
Data cubes
 
Cassandra and materialized views
Cassandra and materialized viewsCassandra and materialized views
Cassandra and materialized views
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 

Similar to FedX - Optimization Techniques for Federated Query Processing on Linked Data

Importing life science at a into Neo4j
Importing life science at a into Neo4jImporting life science at a into Neo4j
Importing life science at a into Neo4jSimon Jupp
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD VivaAidan Hogan
 
It19 20140721 linked data personal perspective
It19 20140721 linked data personal perspectiveIt19 20140721 linked data personal perspective
It19 20140721 linked data personal perspectiveJanifer Gatenby
 
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...Laura Akerman
 
seevl: Data-driven music discovery
seevl: Data-driven music discoveryseevl: Data-driven music discovery
seevl: Data-driven music discoveryAlexandre Passant
 
Linked data 101: Getting Caught in the Semantic Web
Linked data 101: Getting Caught in the Semantic Web Linked data 101: Getting Caught in the Semantic Web
Linked data 101: Getting Caught in the Semantic Web Morgan Briles
 
Lecture linked data cloud & sparql
Lecture linked data cloud & sparqlLecture linked data cloud & sparql
Lecture linked data cloud & sparqlDhavalkumar Thakker
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic WebPeter Mika
 
Dynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity RankingDynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity RankingDavid Graus
 
Search challenges for collections of book records
Search challenges for collections of book recordsSearch challenges for collections of book records
Search challenges for collections of book recordsArjen de Vries
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedJakob .
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedSören Auer
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologiesProf. Wim Van Criekinge
 
WTF is Semantic Web?
WTF is Semantic Web?WTF is Semantic Web?
WTF is Semantic Web?milesw
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolLaura Po
 

Similar to FedX - Optimization Techniques for Federated Query Processing on Linked Data (20)

Importing life science at a into Neo4j
Importing life science at a into Neo4jImporting life science at a into Neo4j
Importing life science at a into Neo4j
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 
It19 20140721 linked data personal perspective
It19 20140721 linked data personal perspectiveIt19 20140721 linked data personal perspective
It19 20140721 linked data personal perspective
 
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
Piloting Linked Data to Connect Library and Archive Resources to the New Worl...
 
seevl: Data-driven music discovery
seevl: Data-driven music discoveryseevl: Data-driven music discovery
seevl: Data-driven music discovery
 
Linked data 101: Getting Caught in the Semantic Web
Linked data 101: Getting Caught in the Semantic Web Linked data 101: Getting Caught in the Semantic Web
Linked data 101: Getting Caught in the Semantic Web
 
Lecture linked data cloud & sparql
Lecture linked data cloud & sparqlLecture linked data cloud & sparql
Lecture linked data cloud & sparql
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic Web
 
Dynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity RankingDynamic Collective Entity Representations for Entity Ranking
Dynamic Collective Entity Representations for Entity Ranking
 
Search challenges for collections of book records
Search challenges for collections of book recordsSearch challenges for collections of book records
Search challenges for collections of book records
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystified
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
 
West coastrollout
West coastrolloutWest coastrollout
West coastrollout
 
WTF is Semantic Web?
WTF is Semantic Web?WTF is Semantic Web?
WTF is Semantic Web?
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
 
DBpedia InsideOut
DBpedia InsideOutDBpedia InsideOut
DBpedia InsideOut
 

Recently uploaded

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Recently uploaded (20)

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

FedX - Optimization Techniques for Federated Query Processing on Linked Data

  • 1. FedX: Optimization Techniques for Federated Query Processing on Linked Data ISWC 2011 – October 26th Andreas Schwarte1, Peter Haase1, Katja Hose2, Ralf Schenkel2, and Michael Schmidt1 1fluidOperations AG, Walldorf 2Max-Planck Institute for Informatics, Saarbrücken
  • 2. Outline • Motivation • Federated Query Processing • FedX Query Processing Model • Optimization Techniques • Evaluation • Conclusion & Outlook 2
  • 3. Motivation Query processing involving multiple distributed data sources, e.g. Linked Open Data cloud New York Times DBpedia Query both data collections in an integrated way 3
  • 4. Federated Query Processing Federation mediator at the server  Virtual integration of (remote) data sources  Communication via SPARQL protocol SPARQL Data Source SPARQL Federation Query Data Source Mediator SPARQL Data Source 4
  • 5. Federated Query Processing Example Query Find US presidents and associated news articles SELECT ?President ?Party ?TopicPage WHERE { ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates . ?nytPresident owl:sameAs ?President . ?President dbpedia:party ?Party . ?nytPresident nytimes:topicPage ?TopicPage . } 5
  • 6. Federated Query Processing Query: SELECT ?President ?Party ?TopicPage WHERE { ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates . ?nytPresident owl:sameAs ?President . SPARQL ... } DBpedia Federation “Barack Obama” Mediator “George W. Bush” ... ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates . SPARQL “Barack Obama” “George W. Bush” NYTimes ... 6
  • 7. Federated Query Processing Query: SELECT ?President ?Party ?TopicPage WHERE { ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates . ?nytPresident owl:sameAs ?President . SPARQL ... } DBpedia Federation yago:Obama Mediator ?nytPresident owl:sameAs “Barack Obama” . SPARQL “Barack Obama” Input: “George W. Bush” NYTimes ... “Barack Obama”, yago:Obama nyt:Obama Output: “Barack Obama”, nyt:Obama 7
  • 8. Federated Query Processing Query: SELECT ?President ?Party ?TopicPage WHERE { ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates . SPARQL ... } DBpedia Federation Mediator ?nytPresident owl:sameAs “George W. Bush” . SPARQL “Barack Obama” Input: “George W. Bush” NYTimes ... “Barack Obama”, yago:Obama nyt:Bush Output: “Barack Obama”, nyt:Obama “George W. Bush”, nyt:Bush ... and so on for the other intermediate mappings and triple patterns ... 8
  • 9. FedX Query Processing Model Scenario:  Efficient SPARQL query processing on multiple distributed sources  Data sources are known and accessible as SPARQL endpoints • FedX is designed to be fully compatible with SPARQL 1.0  No a-priori knowledge about data sources • No local preprocessing of the data sources required • No need for pre-computed statistics  On-demand federation setup 9
  • 10. Challenges to Federated Query Processing 1) Involve only relevant sources in the evaluation Avoid: Subqueries are sent to all sources, although potentially irrelevant 2) Compute joins close to the data Avoid: All joins are executed locally in a nested loop fashion 3) Reduce remote communication Avoid: Nested loop join that causes many remote requests 10
  • 11. Optimization Techniques 1. Source Selection: Idea: Triple patterns are annotated with relevant sources • Sources that can contribute information for a particular triple pattern • SPARQL ASK requests in conjunction with a local cache • After a warm-up period the cache learns the capabilities of the data sources  During Source Selection remote requests can be avoided 2. Exclusive Groups: Idea: Group triple patterns with the same single relevant source • Evaluation in a single (remote) subquery • Push join to the endpoint 11
  • 12. Optimization Techniques (2) Example: Source Selection + Exclusive Groups SELECT ?President ?Party ?TopicPage WHERE { Source Selection ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates . @ DBpedia ?President dbpedia:party ?Party . @ DBpedia Exclusive Group ?nytPresident owl:sameAs ?President . @ DBpedia, NYTimes ?nytPresident nytimes:topicPage ?TopicPage . @ NYTimes } Advantages:  Avoid sending subqueries to sources that are not relevant  Delegate joins to the endpoint by forming exclusive groups (i.e. executing the respective patterns in a single subquery) 12
  • 13. Optimization Techniques (3) 3. Join Order: Idea: Iteratively determine the join order based on count-heuristic: • Count free variables of triple patterns and groups • Consider "resolved" variable mappings from earlier iteration 4. Bound Joins: Idea: Compute joins in a block nested loop fashion: • Reduce the number of requests by "vectored" evaluation of a set of input bindings • Renaming and Post-Processing technique for SPARQL 1.0 13
  • 14. Optimization Techniques (4) Example: Bound Joins SELECT ?President ?Party ?TopicPage WHERE { ?President rdf:type dbpedia:PresidentsOfTheUnitedStates . ?President dbpedia:party ?Party . ?nytPresident owl:sameAs ?President . ?nytPresident nytimes:topicPage ?TopicPage . } Assume that the following intermediate results have been computed as input for the last triple pattern Block Input Before (NLJ) “Barack Obama” SELECT ?TopicPage WHERE { “Barack Obama” nytimes:topicPage ?TopicPage } “George W. Bush” SELECT ?TopicPage WHERE { “George W. Bush” nytimes:topicPage ?TopicPage } … … Now: Evaluation in a single remote request using a SPARQL UNION construct + local post processing (SPARQL 1.0) 14
  • 15. Optimization Example 1 SPARQL Query 2 Unoptimized Internal Representation Compute Micronutrients using Drugbank and KEGG 4x Local Join SELECT ?drug ?title WHERE { = ?drug drugbank:drugCategory drugbank-cat:micronutrient . ?drug drugbank:casRegistryNumber ?id . 4x NLJ ?keggDrug rdf:type kegg:Drug . ?keggDrug bio2rdf:xRef ?id . ?keggDrug dc:title ?title . } 3 Optimized Internal Representation Exlusive Group  Remote Join 15
  • 16. Evaluation Based on FedBench benchmark suite  14 queries from the Cross Domain (CD) and Life Science (LS) collections  Real-World Data from the Linked Open Data cloud  Federation with 5 (CD) and 4 (LS) data sources  Queries vary in complexity, size, structure, and sources involved Benchmark environment  HP Proliant 2GHz 4Core, 32GB RAM  20GB RAM for server (federation mediator)  Local copies of the SPARQL endpoint to ensure reproducibility and reliability of the service • Provided by the FedBench Framework 16
  • 17. Evaluation (2) a) Evaluation times of Cross Domain (CD) and Life Science (LS) queries 17
  • 18. Evaluation (3) b) Number of requests sent to the endpoints Runtimes AliBaba: >600s DARQ: >600s FedX: 0.109s Runtimes AliBaba: >600s DARQ: 133s FedX: 1.4s 18
  • 19. FedX – The Bigger Picture Information Workbench: Integration of Virtualized Data Sources as a Service Application Layer Semantic Wiki Collaboration Reporting & Analytics Visual Exploration Virtualization Layer Transparent & On-Demand Integration of Data Sources Data Layer SPARQL SPARQL SPARQL Endpoint Endpoint Endpoint Data Registries Metadata CKAN, data.gov, etc. Registry Data Source Data Source + Enterprise Data Data Source 19
  • 20. Conclusion and Outlook FedX: Efficient SPARQL query processing in federated setting  Available as open source software: http://www.fluidops.com/fedx • Implementation as a Sesame SAIL  Novel join processing strategies, grouping techniques, source selection • Significant improvement of query performance compared to state-of-the-art systems  On-demand federation setup without any preprocessing of data sources  Compatible with existing SPARQL 1.0 endpoints Outlook  SPARQL 1.1 Federation Extension in Sesame 2.6 • Ongoing integration of FedX’ optimization techniques into Sesame  Federation Extensions in FedX • SERVICE keyword for optional user input to improve source selection • BINDINGS clauses for bound joins (for SPARQL 1.1 endpoints)  (Remote) statistics to improve join order (e.g. VoID) 20
  • 21. Visit our exhibitor booth or arrange a private live demo! Further information on FedX http://www.fluidops.com/fedx THANK YOU CONTACT: fluid Operations Altrottstr. 31 Walldorf, Germany Email: info@fluidOps.com website: www.fluidOps.com Tel.: +49 6227 3846-527
  • 22. Query Cross Domain 3 (CD3) Find US presidents, their party and associated news SELECT ?president ?party ?page WHERE { ?president <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/President> . ?president <http://dbpedia.org/ontology/nationality> <http://dbpedia.org/resource/United_States> . ?president <http://dbpedia.org/ontology/party> ?party . ?x <http://data.nytimes.com/elements/topicPage> ?page . ?x <http://www.w3.org/2002/07/owl#sameAs> ?president . }
  • 23. Query Life Science 3 (LS3) Find drugs and interaction drugs (+ the effect) SELECT ?Drug ?IntDrug ?IntEffect WHERE { ?Drug <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Drug> . ?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug . ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1> ?y . ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2> ?IntDrug . ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text> ?IntEffect . }

Editor's Notes

  1. Example scenarioDBpedia and New York Times collectionsDBpedia as structured knowledge baseNew York Times as a news provider
  2. Minimize the number of remote requests due to intelligent algebraic query rewritings
  3. Query Performance improved significantly for most queriesImprovements by more than an order of magnitudeNo timeouts, no evaluation errorsQuery CD3SELECT ?president ?party ?page WHERE { ?president &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://dbpedia.org/ontology/President&gt; . ?president &lt;http://dbpedia.org/ontology/nationality&gt; &lt;http://dbpedia.org/resource/United_States&gt; . ?president &lt;http://dbpedia.org/ontology/party&gt; ?party . ?x &lt;http://data.nytimes.com/elements/topicPage&gt; ?page . ?x &lt;http://www.w3.org/2002/07/owl#sameAs&gt; ?president .}Query LS3SELECT ?Drug ?IntDrug ?IntEffect WHERE { ?Drug &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; &lt;http://dbpedia.org/ontology/Drug&gt; . ?y &lt;http://www.w3.org/2002/07/owl#sameAs&gt; ?Drug . ?Int &lt;http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1&gt; ?y . ?Int &lt;http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2&gt; ?IntDrug . ?Int &lt;http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text&gt; ?IntEffect . }