Zero-Knowledge
          Query Planning
   for an Iterator Implementation of
Link Traversal Based Query Execution

                                          Olaf Hartig
                          http://olafhartig.de/foaf.rdf#olaf
                                                @olafhartig

    Database and Information Systems Research Group
                       Humboldt-Universität zu Berlin
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1




                                          tp2 = ( ?p , ex:interested_in , ?b )                                       I2




                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3
  Query

  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution        2
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                                       Next?
   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset



                                                                                       Next?
                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3


                                                                                       Next?


Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution        3
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                                       Next?
   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset



                                                                                       Next?
                                       Descriptor object <http://.../Book> )
                                        tp3 = ( ?b , rdf:type ,                                                      I3


                                               :                                       Next?
     <http://.../alice> ex:affiliated_with <http://.../orgaX>
                                               :
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution        4
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                                       Next?
   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset



                                                                                       Next?
                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3


                                               :                                       Next?
     <http://.../alice> ex:affiliated_with <http://.../orgaX>
                                               :
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution        5
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                 { ?p = <http://.../alice> }

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset



                                                                                       Next?
                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3


                                               :                                       Next?
     <http://.../alice> ex:affiliated_with <http://.../orgaX>
                                               :
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution        6
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                 { ?p = <http://.../alice> }

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset
                                          tp2' = ( <http://.../alice> , ex:interested_in , ?b )

                                                                                       Next?
                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3


                                                                                       Next?


Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution        7
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                 { ?p = <http://.../alice> }

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset
                                          tp2' = ( <http://.../alice> , ex:interested_in , ?b )

                                                                                       Next?
                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3


                                               :                                       Next?
        <http://.../alice> ex:interested_in <http://.../b1>
                                               :
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution        8
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                 { ?p = <http://.../alice> }

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset
                                          tp2' = ( <http://.../alice> , ex:interested_in , ?b )

                                                   { ?p = <http://.../alice> , ?b = <http://.../b1> }

                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3


                                               :                                       Next?
        <http://.../alice> ex:interested_in <http://.../b1>
                                               :
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution        9
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                 { ?p = <http://.../alice> }

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset
                                          tp2' = ( <http://.../alice> , ex:interested_in , ?b )

                                                 { ?p = <http://.../alice> , ?b = <http://.../b1> }

                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3
                                          tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

                                                                                       Next?


Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    10
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                 { ?p = <http://.../alice> }

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset
                                          tp2' = ( <http://.../alice> , ex:interested_in , ?b )

                                                 { ?p = <http://.../alice> , ?b = <http://.../b1> }

                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3
                                          tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

                                                 :                                     Next?
 <http://.../Book> rdfs:subClassOf <http://.../CreativeWork>
                                                 :
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    11
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                 { ?p = <http://.../alice> }

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset
                                          tp2' = ( <http://.../alice> , ex:interested_in , ?b )

                                                 { ?p = <http://.../alice> , ?b = <http://.../b1> }

                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3
                                          tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

                                     :                                                 Next?
     <http://.../b1> rdf:type <http://.../Book>
                                     :
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    12
Iterator Based Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )                     I1


                                                                 { ?p = <http://.../alice> }

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset
                                          tp2' = ( <http://.../alice> , ex:interested_in , ?b )

                                                 { ?p = <http://.../alice> , ?b = <http://.../b1> }

                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3
                                          tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

                                                 { ?p = <http://.../alice> , ?b = <http://.../b1> }



Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    13
Example Query Execution Plan
                                          tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>)                      I1




                                          tp2 = ( ?p , ex:interested_in , ?b )                                       I2




                                          tp3 = ( ?b , rdf:type , <http://.../Book> )                                I3
  Query

  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    14
An Alternative Execution Plan
                                          tp1 = ( ?b , rdf:type , <http://.../Book> )                                I1




                                          tp2 = ( ?p , ex:interested_in , ?b )                                       I2




                                          tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>)                      I3
  Query

  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    15
An Alternative Execution Plan
                                          tp1 = ( ?b , rdf:type , <http://.../Book> )                                I1


                                                                                       Next?
   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset



                                                                                       Next?
                                          tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>)                      I3


                                                                                       Next?


Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    16
An Alternative Execution Plan
                                          tp1 = ( ?b , rdf:type , <http://.../Book> )                                I1


                                                                                       Next?
   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset



                                                                                       Next?
                                          tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>)                      I3


                                                 :                                     Next?
 <http://.../Book> rdfs:subClassOf <http://.../CreativeWork>
                                                 :
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    17
An Alternative Execution Plan
                                          tp1 = ( ?b , rdf:type , <http://.../Book> )                                I1


                                                                                END!

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset



                                                                                       Next?
                                          tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>)                      I3


                                                 :                                     Next?
 <http://.../Book> rdfs:subClassOf <http://.../CreativeWork>
                                                 :
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    18
An Alternative Execution Plan
                                          tp1 = ( ?b , rdf:type , <http://.../Book> )                                I1


                                                                                END!

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset



                                                                                END!

                                          tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>)                      I3


                                                                                END!



Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    19
An Alternative Execution Plan
                                          tp1 = ( ?b , rdf:type , <http://.../Book> )                                I1


                                                                                END!

   query-local                            tp2 = ( ?p , ex:interested_in , ?b )                                       I2
    dataset



                                                                                END!

                                          tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>)                      I3


 Number of results may depend                                                   END!
 on the order of triple patterns
         = logical query execution plan
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    20
Query Plan Selection
 ●   Assessment criteria:
     ●   Cost (query execution time)
     ●   Benefit (number of results)
 ●   Cost and benefit must be estimated without plan execution
 ●   Estimation impossible due to “zero knowledge”
 ●   Heuristic Based Plan Selection
     ●   DEPENDENCY RESPECT RULE
     ●   SEED TP RULE
     ●   NO VOCAB SEED RULE
     ●   FILTERING TP RULE

Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   21
SEED TP RULE

                       Use a plan with a seed triple pattern

 ●   Potential seed triple pattern
       … is a triple pattern that contains at least one HTTP URI
 ●   Seed triple pattern of a plan
       … is the first triple pattern in the plan and
       … is a potential seed triple pattern

  Query                                                                                ●   Rationale: good
                                                                                             starting point
  ?p ex:affiliated_with <http://.../orgaX> √
  ?p ex:interested_in ?b √
  ?b rdf:type <http://.../Book> √
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   22
NO VOCAB SEED RULE

         Avoid a seed triple pattern with vocabulary terms

 ●   Not only vocabulary term URIs in the seed triple pattern
 ●   Patterns to avoid:                       ?s ex:any_property ?o
                                              ?s rdf:type ex:any_class
 ●   Rationale: URIs for vocabulary term usually resolve to
                vocabulary definitions with little instance data
  Query

  ?p ex:affiliated_with <http://.../orgaX> √
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   23
FILTERING TP RULE

           Use a plan where all filtering triple patterns are
            as close to the seed triple pattern as possible

 ●   Filtering triple pattern: each variable already occurs in one
                               of the preceding triple patterns
 ●   For each result                               tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>)             I1
     consumed as input
     a filtering TP can                                                 { ?p = <http://.../alice> }
     only report 1 or 0
     results as output                             tp2 = ( ?p , ex:interested_in , ?b )                              I2
                                                   tp2' = ( <http://.../alice> , ex:interested_in , ?b )
 ●   Rationale: Reduce                                   { ?p = <http://.../alice> , ?b = <http://.../b1> }
                cost
                                                   tp3 = ( ?b , rdf:type , <http://.../Book> )                       I3
                                                   tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    24
Evaluation Procedure
 ●   Generate all possible plans
 ●   Execute each plan:
     ●   5 runs (+ 1 initial warm-up run)
     ●   Use an initially empty query-local dataset for each run
 ●   Measure for each plan:
     ●   Avg. execution time
     ●   Avg. number of descriptor objects retrieved during execution
     ●   Avg. number of query results




Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   25
Evaluation Query (Example)

 SELECT ?spec ?genus WHERE {                                                          Of what genus are
                                                                                      the species that are
  geospecies:4qyn7 gs:inFamily ?fam .                                                 ● classified in the

  ?fam skos:narrowerTransitive ?spec .                                                  same family as the
  ?spec skos:closeMatch ?sp2 .                                                          American Badger,
                                                                                      ● and expected in the
  ?sp2 rdfs:subClassOf     ?genus .
                                                                                        same states as the
     ?spec             gs:isExpectedIn ?loc .                                           American Badger ?
     geospecies:4qyn7 gs:isExpectedIn ?loc
     ?loc rdf:type gs:State . }

 ●   2 potential seed triple patterns that
     satisfy our NO SEED VOCAB RULE
 ●   56 different plans, each contains
     2 filtering triple patterns                                                                        Picture source: Wikipedia
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution            26
Measurements
                30                                                                      400




                                                             retrieved descr. objects
                                                                                        300
                20
query results




                                                                                        200
                10
                                                                                        100

                0                                                                        0
                     0 30 60 90 120 150 180                                                0 30 60 90 120 150 180
                     query exec. times (in seconds)                                       query exec. times (in seconds)


                Percentage of plans in each group with a filtering TP in specific positions
                             1st Filtering TP                                                    2nd Filtering TP
  100                                                       100
       50                                                       50
            0                                                            0
             1      2     3      4       5       6   7               1       2        3      4       5      6  7
     Olaf Hartig position in the ordered an Iterator Implementation of Link Traversal Based Query Execution BGP 27
          TP - Zero-Knowledge Query Planning for BGP              TP position in the ordered
Conclusions
                 ●   Approach that uses iterators to implement
                     Link Traversal Based Query Execution
                         … is sound
                         … guarantees termination
                         … cannot guarantee (reachability-) completeness
                 ●   Degree of completeness depends on
                     the query plans (i.e. orders of the BGP)
                 ●   Heuristic based plan selection

                 ●   Next steps:
                     ●   Algorithm to generate most suitable plans only
                     ●   Integrate adaptive query processing techniques
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   28
Backup Slides




Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   29
Outline


                             1. Link Traversal Based
                                Query Execution
                             2. Characteristics of the
                                Iterator Based
                                Implementation Approach
                             3. Query Plan Selection
                             4. Evaluation


Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   30
Link Traversal Based Querying
 ●   Semantics defined
     in two phases:
     1. Reachability                                                                 Descriptor
     2. Query Results                                                                  object




  Query

  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   31
Link Traversal Based Querying
 ●   Semantics defined
     in two phases:
     1. Reachability
     2. Query Results




  Query

  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   32
Link Traversal Based Querying
 ●   Semantics defined
     in two phases:
     1. Reachability
     2. Query Results




  Query

  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   33
Link Traversal Based Querying
 ●   Semantics defined                                                                           Reachable
     in two phases:                                                                              descriptor
     1. Reachability                                                                               object
     2. Query Results


                                        The mapping μ : V → U  B  L
                                        is a result to BGP bgp iff:
                                              Condition 1: dom(μ) = vars(bgp)
                                              Condition 2: μ[bgp]  Dreachable

 ●   2 phases do not reflect idea of actual execution strategy
     ●   Intertwine query evaluation with the traversal of data links
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   34
Characteristics




 ●   No need to know all data sources in advance
 ●   Never complete w.r.t. all data on the Web
     ●   Reason: reachability based on links that match query patterns
     ●   New concept: reachability-completeness
 ●   No guarantee for termination
     ●   Reason: Web of Data is infinite (at any point in time)
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   35
Outline



                             1. Characteristics of the
                                Implementation Approach
                             2. Query Plan Selection
                             3. Evaluation




Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   36
Characteristics
 ●   Sound
 ●   Number of results may depend on order of triple patterns
 ●   Reachability-completeness not guaranteed
     ●   Main reason: inflexibility due to fixed order
 ●   Termination is guaranteed
     ●   No such guarantee for link traversal based
         query execution in general because the Web
         of Data is infinite (at any point in time)
 ●   Efficient
 ●   Easy to apply in existing query engines


Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   37
Plan Selection Problem

 ●   Number of results may depend on order of triple patterns

                                                                = logical query execution plan


 ➔   Problem: select a suitable plan




Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   38
Outline


                             1. Link Traversal Based
                                Query Execution
                             2. Characteristics of the
                                Iterator Based
                                Implementation Approach
                             3. Query Plan Selection
                             4. Evaluation


Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   39
DEPENDENCY RESPECT RULE

                  Use a dependency respecting query plan

 ●   Dependency respect: a variable from each triple pattern
     already occurs in one of the preceding triple patterns

                                            tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I1




  Query
                                            tp2 = ( ?p , ex:interested_in , ?b )                                 √   I2


  ?p ex:affiliated_with tp3 = ( ?b , rdf:type , <http://.../Book> )
                         <http://.../orgaX>                                                                          I3
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    40
DEPENDENCY RESPECT RULE

                  Use a dependency respecting query plan

 ●   Dependency respect: a variable from each triple pattern
     already occurs in one of the preceding triple patterns

                                            tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I1



                                            tp2 = ( ?p , ex:interested_in , ?b )                                     I2
  Query

  ?p ex:affiliated_with tp3 = ( ?b , rdf:type , <http://.../Book> )
                         <http://.../orgaX>                                                                          I3
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    41
DEPENDENCY RESPECT RULE

                  Use a dependency respecting query plan

 ●   Dependency respect: a variable from each triple pattern
     already occurs in one of the preceding triple patterns
 ●   Rationale:                             tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I1
       Avoid
       cartesian
       products
                                            tp2 = ( ?b , rdf:type , <http://.../Book> )                              I2
  Query

  ?p ex:affiliated_with tp3 = ( ?p , ex:interested_in , ?b )
                         <http://.../orgaX>                                                                          I3
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>
Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution    42
These slides have been created by
                                       Olaf Hartig

                                              http://olafhartig.de


                     This work is licensed under a
       Creative Commons Attribution-Share Alike 3.0 License
           (http://creativecommons.org/licenses/by-sa/3.0/)




Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution   43

Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution

  • 1.
    Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution Olaf Hartig http://olafhartig.de/foaf.rdf#olaf @olafhartig Database and Information Systems Research Group Humboldt-Universität zu Berlin
  • 2.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 tp2 = ( ?p , ex:interested_in , ?b ) I2 tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 Query ?p ex:affiliated_with <http://.../orgaX> ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 2
  • 3.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 Next? query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset Next? tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 Next? Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 3
  • 4.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 Next? query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset Next? Descriptor object <http://.../Book> ) tp3 = ( ?b , rdf:type , I3 : Next? <http://.../alice> ex:affiliated_with <http://.../orgaX> : Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 4
  • 5.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 Next? query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset Next? tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 : Next? <http://.../alice> ex:affiliated_with <http://.../orgaX> : Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 5
  • 6.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 { ?p = <http://.../alice> } query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset Next? tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 : Next? <http://.../alice> ex:affiliated_with <http://.../orgaX> : Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 6
  • 7.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 { ?p = <http://.../alice> } query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset tp2' = ( <http://.../alice> , ex:interested_in , ?b ) Next? tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 Next? Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 7
  • 8.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 { ?p = <http://.../alice> } query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset tp2' = ( <http://.../alice> , ex:interested_in , ?b ) Next? tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 : Next? <http://.../alice> ex:interested_in <http://.../b1> : Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 8
  • 9.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 { ?p = <http://.../alice> } query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset tp2' = ( <http://.../alice> , ex:interested_in , ?b ) { ?p = <http://.../alice> , ?b = <http://.../b1> } tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 : Next? <http://.../alice> ex:interested_in <http://.../b1> : Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 9
  • 10.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 { ?p = <http://.../alice> } query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset tp2' = ( <http://.../alice> , ex:interested_in , ?b ) { ?p = <http://.../alice> , ?b = <http://.../b1> } tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> ) Next? Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 10
  • 11.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 { ?p = <http://.../alice> } query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset tp2' = ( <http://.../alice> , ex:interested_in , ?b ) { ?p = <http://.../alice> , ?b = <http://.../b1> } tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> ) : Next? <http://.../Book> rdfs:subClassOf <http://.../CreativeWork> : Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 11
  • 12.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 { ?p = <http://.../alice> } query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset tp2' = ( <http://.../alice> , ex:interested_in , ?b ) { ?p = <http://.../alice> , ?b = <http://.../b1> } tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> ) : Next? <http://.../b1> rdf:type <http://.../Book> : Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 12
  • 13.
    Iterator Based ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1 { ?p = <http://.../alice> } query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset tp2' = ( <http://.../alice> , ex:interested_in , ?b ) { ?p = <http://.../alice> , ?b = <http://.../b1> } tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> ) { ?p = <http://.../alice> , ?b = <http://.../b1> } Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 13
  • 14.
    Example Query ExecutionPlan tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I1 tp2 = ( ?p , ex:interested_in , ?b ) I2 tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 Query ?p ex:affiliated_with <http://.../orgaX> ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 14
  • 15.
    An Alternative ExecutionPlan tp1 = ( ?b , rdf:type , <http://.../Book> ) I1 tp2 = ( ?p , ex:interested_in , ?b ) I2 tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I3 Query ?p ex:affiliated_with <http://.../orgaX> ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 15
  • 16.
    An Alternative ExecutionPlan tp1 = ( ?b , rdf:type , <http://.../Book> ) I1 Next? query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset Next? tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I3 Next? Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 16
  • 17.
    An Alternative ExecutionPlan tp1 = ( ?b , rdf:type , <http://.../Book> ) I1 Next? query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset Next? tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I3 : Next? <http://.../Book> rdfs:subClassOf <http://.../CreativeWork> : Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 17
  • 18.
    An Alternative ExecutionPlan tp1 = ( ?b , rdf:type , <http://.../Book> ) I1 END! query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset Next? tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I3 : Next? <http://.../Book> rdfs:subClassOf <http://.../CreativeWork> : Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 18
  • 19.
    An Alternative ExecutionPlan tp1 = ( ?b , rdf:type , <http://.../Book> ) I1 END! query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset END! tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I3 END! Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 19
  • 20.
    An Alternative ExecutionPlan tp1 = ( ?b , rdf:type , <http://.../Book> ) I1 END! query-local tp2 = ( ?p , ex:interested_in , ?b ) I2 dataset END! tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I3 Number of results may depend END! on the order of triple patterns = logical query execution plan Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 20
  • 21.
    Query Plan Selection ● Assessment criteria: ● Cost (query execution time) ● Benefit (number of results) ● Cost and benefit must be estimated without plan execution ● Estimation impossible due to “zero knowledge” ● Heuristic Based Plan Selection ● DEPENDENCY RESPECT RULE ● SEED TP RULE ● NO VOCAB SEED RULE ● FILTERING TP RULE Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 21
  • 22.
    SEED TP RULE Use a plan with a seed triple pattern ● Potential seed triple pattern … is a triple pattern that contains at least one HTTP URI ● Seed triple pattern of a plan … is the first triple pattern in the plan and … is a potential seed triple pattern Query ● Rationale: good starting point ?p ex:affiliated_with <http://.../orgaX> √ ?p ex:interested_in ?b √ ?b rdf:type <http://.../Book> √ Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 22
  • 23.
    NO VOCAB SEEDRULE Avoid a seed triple pattern with vocabulary terms ● Not only vocabulary term URIs in the seed triple pattern ● Patterns to avoid: ?s ex:any_property ?o ?s rdf:type ex:any_class ● Rationale: URIs for vocabulary term usually resolve to vocabulary definitions with little instance data Query ?p ex:affiliated_with <http://.../orgaX> √ ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 23
  • 24.
    FILTERING TP RULE Use a plan where all filtering triple patterns are as close to the seed triple pattern as possible ● Filtering triple pattern: each variable already occurs in one of the preceding triple patterns ● For each result tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I1 consumed as input a filtering TP can { ?p = <http://.../alice> } only report 1 or 0 results as output tp2 = ( ?p , ex:interested_in , ?b ) I2 tp2' = ( <http://.../alice> , ex:interested_in , ?b ) ● Rationale: Reduce { ?p = <http://.../alice> , ?b = <http://.../b1> } cost tp3 = ( ?b , rdf:type , <http://.../Book> ) I3 tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> ) Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 24
  • 25.
    Evaluation Procedure ● Generate all possible plans ● Execute each plan: ● 5 runs (+ 1 initial warm-up run) ● Use an initially empty query-local dataset for each run ● Measure for each plan: ● Avg. execution time ● Avg. number of descriptor objects retrieved during execution ● Avg. number of query results Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 25
  • 26.
    Evaluation Query (Example) SELECT ?spec ?genus WHERE { Of what genus are the species that are geospecies:4qyn7 gs:inFamily ?fam . ● classified in the ?fam skos:narrowerTransitive ?spec . same family as the ?spec skos:closeMatch ?sp2 . American Badger, ● and expected in the ?sp2 rdfs:subClassOf ?genus . same states as the ?spec gs:isExpectedIn ?loc . American Badger ? geospecies:4qyn7 gs:isExpectedIn ?loc ?loc rdf:type gs:State . } ● 2 potential seed triple patterns that satisfy our NO SEED VOCAB RULE ● 56 different plans, each contains 2 filtering triple patterns Picture source: Wikipedia Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 26
  • 27.
    Measurements 30 400 retrieved descr. objects 300 20 query results 200 10 100 0 0 0 30 60 90 120 150 180 0 30 60 90 120 150 180 query exec. times (in seconds) query exec. times (in seconds) Percentage of plans in each group with a filtering TP in specific positions 1st Filtering TP 2nd Filtering TP 100 100 50 50 0 0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Olaf Hartig position in the ordered an Iterator Implementation of Link Traversal Based Query Execution BGP 27 TP - Zero-Knowledge Query Planning for BGP TP position in the ordered
  • 28.
    Conclusions ● Approach that uses iterators to implement Link Traversal Based Query Execution … is sound … guarantees termination … cannot guarantee (reachability-) completeness ● Degree of completeness depends on the query plans (i.e. orders of the BGP) ● Heuristic based plan selection ● Next steps: ● Algorithm to generate most suitable plans only ● Integrate adaptive query processing techniques Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 28
  • 29.
    Backup Slides Olaf Hartig- Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 29
  • 30.
    Outline 1. Link Traversal Based Query Execution 2. Characteristics of the Iterator Based Implementation Approach 3. Query Plan Selection 4. Evaluation Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 30
  • 31.
    Link Traversal BasedQuerying ● Semantics defined in two phases: 1. Reachability Descriptor 2. Query Results object Query ?p ex:affiliated_with <http://.../orgaX> ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 31
  • 32.
    Link Traversal BasedQuerying ● Semantics defined in two phases: 1. Reachability 2. Query Results Query ?p ex:affiliated_with <http://.../orgaX> ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 32
  • 33.
    Link Traversal BasedQuerying ● Semantics defined in two phases: 1. Reachability 2. Query Results Query ?p ex:affiliated_with <http://.../orgaX> ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 33
  • 34.
    Link Traversal BasedQuerying ● Semantics defined Reachable in two phases: descriptor 1. Reachability object 2. Query Results The mapping μ : V → U  B  L is a result to BGP bgp iff: Condition 1: dom(μ) = vars(bgp) Condition 2: μ[bgp]  Dreachable ● 2 phases do not reflect idea of actual execution strategy ● Intertwine query evaluation with the traversal of data links Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 34
  • 35.
    Characteristics ● No need to know all data sources in advance ● Never complete w.r.t. all data on the Web ● Reason: reachability based on links that match query patterns ● New concept: reachability-completeness ● No guarantee for termination ● Reason: Web of Data is infinite (at any point in time) Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 35
  • 36.
    Outline 1. Characteristics of the Implementation Approach 2. Query Plan Selection 3. Evaluation Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 36
  • 37.
    Characteristics ● Sound ● Number of results may depend on order of triple patterns ● Reachability-completeness not guaranteed ● Main reason: inflexibility due to fixed order ● Termination is guaranteed ● No such guarantee for link traversal based query execution in general because the Web of Data is infinite (at any point in time) ● Efficient ● Easy to apply in existing query engines Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 37
  • 38.
    Plan Selection Problem ● Number of results may depend on order of triple patterns = logical query execution plan ➔ Problem: select a suitable plan Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 38
  • 39.
    Outline 1. Link Traversal Based Query Execution 2. Characteristics of the Iterator Based Implementation Approach 3. Query Plan Selection 4. Evaluation Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 39
  • 40.
    DEPENDENCY RESPECT RULE Use a dependency respecting query plan ● Dependency respect: a variable from each triple pattern already occurs in one of the preceding triple patterns tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I1 Query tp2 = ( ?p , ex:interested_in , ?b ) √ I2 ?p ex:affiliated_with tp3 = ( ?b , rdf:type , <http://.../Book> ) <http://.../orgaX> I3 ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 40
  • 41.
    DEPENDENCY RESPECT RULE Use a dependency respecting query plan ● Dependency respect: a variable from each triple pattern already occurs in one of the preceding triple patterns tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I1 tp2 = ( ?p , ex:interested_in , ?b ) I2 Query ?p ex:affiliated_with tp3 = ( ?b , rdf:type , <http://.../Book> ) <http://.../orgaX> I3 ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 41
  • 42.
    DEPENDENCY RESPECT RULE Use a dependency respecting query plan ● Dependency respect: a variable from each triple pattern already occurs in one of the preceding triple patterns ● Rationale: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX>) I1 Avoid cartesian products tp2 = ( ?b , rdf:type , <http://.../Book> ) I2 Query ?p ex:affiliated_with tp3 = ( ?p , ex:interested_in , ?b ) <http://.../orgaX> I3 ?p ex:interested_in ?b ?b rdf:type <http://.../Book> Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 42
  • 43.
    These slides havebeen created by Olaf Hartig http://olafhartig.de This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/) Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 43