Gagg: A Graph Aggregation Operator
June 2nd 2015
Fadi Maali*, Stephane Campinas, Stefan Decker
ESWC2015
* Funded by the Irish Research Council
The Famous LOD Cloud
1/20
http://lod-cloud.net/
The Famous LOD Cloud - COLOURED
2/20
http://lod-cloud.net/
The Famous LOD Cloud
(from a different Angel)
3/20
The Famous LOD Cloud
(from a different Angel)
4/20
The Famous LOD Cloud
(from a different Angel)
5/20
The Famous LOD Cloud
(from a different Angel)
6/20
Graph Aggregation
Condenses a large graph into a structurally
similar but smaller graph by collapsing vertices
and edges
Graph Aggregation - Schema Discovery
8/20
Introducing RDF Graph Summary with application to Assisted SPARQL Formulation
Graph Aggregation - Requirements
9/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:open-license
:media
:cc-by-sa
:closed-license
:bbc-terms
subject
subject
licenselicense
subjectsTarget objectsTarget
Graph Aggregation Methods
1. Custom Code
error prone, time, efficiency…
2. SPARQL
error prone, time, efficiency…
3. Graph Databases
expressivity, optimisation…
4. Gagg, a first-class operator
Graph Aggregation Methods
1. Custom Code
error prone, time, efficiency…
2. SPARQL
error prone, time, efficiency…
3. Graph Databases
expressivity, optimisation…
4. Gagg, a first-class operator
Operational Semantics
In-memory evaluation algorithm
Experimental evaluation
Gagg: Two-steps Aggregation
11/20
Gagg: Two-steps Aggregation
11/20
● Relation & measure
● Subject dimension(s)
& measure
● Object dimension(s) &
measure
Uses aggregation
functions and a template
similar to CONSTRUCT
queries
Graph Aggregation - Requirements
12/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:open-license
:media
:cc-by-sa
:closed-license
:bbc-terms
subject
subject
licenselicense
subjectsTarget objectsTarget
measure
relation
Graph Aggregation - Requirements
12/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:open-license
:media
:cc-by-sa
:closed-license
:bbc-terms
subject
subject
licenselicense
subjectsTarget objectsTarget
measure
relation
?l a void:LinkSet ;
void:subjectsTarget ?s ;
void:objectsTarget ?o ;
void:triples ?m .
Graph Aggregation - Requirements
13/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:open-license
:media
:cc-by-sa
:closed-license
:bbc-terms
subject
subject
licenselicense
subjectsTarget objectsTarget
Graph Aggregation - Requirements
13/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:open-license
:media
:cc-by-sa
:closed-license
:bbc-terms
subject
subject
licenselicense
subjectsTarget objectsTarget
?s dct:subject ?sd ;
void:triple ?sm .
Graph Aggregation - Requirements
14/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:open-license
:media
:cc-by-sa
:closed-license
:bbc-terms
subject
subject
licenselicense
subjectsTarget objectsTarget
Graph Aggregation - Requirements
14/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:open-license
:media
:cc-by-sa
:closed-license
:bbc-terms
subject
subject
licenselicense
subjectsTarget objectsTarget
?o dct:subject ?od ;
void:triple ?om .
Graph Aggregation - Definition
15/20
Q=(D,M,E,N,R,f)
D: subject dimensions
M: subject measure
E: object dimensions
N: object measure
R: relation query
f: reduce function
?x ?sd
?x ?sm
?y ?od
?y ?om
?x ?m?p ?y
Graph Aggregation - Grouped Graph
16/20
crossdomain media
10k 33k 3k 1.2M ......
linksTo
Graph Aggregation - Evaluation
17/20
● Build a binding table
● Build the Grouped Graph
O(|B|) algorithm where B is the size of the binding
table
● Apply the reduction function
?x ?m?p ?y ?sd ?sm ?od ?om
Graph Aggregation - Experiment Setup
18/20
● Extended In-memory Apache Jena
using SSE
● BSBM and SP2B
● Type Summary and Bibliometrics
Graph Aggregation - Experiment Results
19/20
Data size
(#triples)
fullSPARQL 3SPARQLs reduced Gagg
5k 0.08 0.06 0.01 0.03
190k 9.84 1.25 0.42 0.55
370k 31.88 2.82 1.00 1.13
1.8M 454.07 13.48 4.37 5.61
Type Summary on BSBM data
Graph aggregation as a first-class operator
- Easier for users to express
- Easier for engines to support and optimise
- Easier for further research and study
Further Questions:
- Syntax and effect on SPARQL
- Distributed implementation
Conclusion
20/20
PREFIX : <http://example.org/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-
ns#>
CONSTRUCT {
_:b0 a ?t1; :count COUNT(?sub.s) .
_:b1 a ?t2; :count COUNT(?obj.o).
_:b2 a rdf:Statement; rdf:predicate ?p; rdf:subject
_:b0; rdf:object _:b1; :count ?prop_count
} WHERE {
GRAPH_AGGREGATION {
?s ?p ?o
{?s a ?t1} GROUP BY ?t1 AS ?sub
{?o a ?t2} GROUP BY ?t2 AS ?obj
}
}
SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p (COUNT(*) AS
?rel_count){
{
SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p {
?s a ?t1 . ?s ?p ?o . ?o a ?t2 .
{
SELECT ?t1 ?subId (COUNT(DISTINCT ?s) AS ?count_s){
?s a ?t1 . ?s ?p ?o .?o a ?t2 .
BIND (iri(CONCAT(str(?t1), "_s")) AS ?subId)
} GROUP BY ?t1 ?subId
}
{
SELECT ?t2 ?objId (COUNT(DISTINCT ?t2) AS ?count_o){
?s a ?t1 .
?s ?p ?o .
?o a ?t2 .
BIND (iri(CONCAT(str(?t2), "_o")) AS ?objId)
} GROUP BY ?t2 ?objId
}
}
}
} GROUP BY ?t1 ?count_s ?t2 ?count_o ?subId ?objId
}

Gagg: A graph Aggregation Operator

  • 1.
    Gagg: A GraphAggregation Operator June 2nd 2015 Fadi Maali*, Stephane Campinas, Stefan Decker ESWC2015 * Funded by the Irish Research Council
  • 2.
    The Famous LODCloud 1/20 http://lod-cloud.net/
  • 3.
    The Famous LODCloud - COLOURED 2/20 http://lod-cloud.net/
  • 4.
    The Famous LODCloud (from a different Angel) 3/20
  • 5.
    The Famous LODCloud (from a different Angel) 4/20
  • 6.
    The Famous LODCloud (from a different Angel) 5/20
  • 7.
    The Famous LODCloud (from a different Angel) 6/20
  • 8.
    Graph Aggregation Condenses alarge graph into a structurally similar but smaller graph by collapsing vertices and edges
  • 9.
    Graph Aggregation -Schema Discovery 8/20 Introducing RDF Graph Summary with application to Assisted SPARQL Formulation
  • 10.
    Graph Aggregation -Requirements 9/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget
  • 11.
    Graph Aggregation Methods 1.Custom Code error prone, time, efficiency… 2. SPARQL error prone, time, efficiency… 3. Graph Databases expressivity, optimisation… 4. Gagg, a first-class operator
  • 12.
    Graph Aggregation Methods 1.Custom Code error prone, time, efficiency… 2. SPARQL error prone, time, efficiency… 3. Graph Databases expressivity, optimisation… 4. Gagg, a first-class operator Operational Semantics In-memory evaluation algorithm Experimental evaluation
  • 13.
  • 14.
    Gagg: Two-steps Aggregation 11/20 ●Relation & measure ● Subject dimension(s) & measure ● Object dimension(s) & measure Uses aggregation functions and a template similar to CONSTRUCT queries
  • 15.
    Graph Aggregation -Requirements 12/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget measure relation
  • 16.
    Graph Aggregation -Requirements 12/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget measure relation ?l a void:LinkSet ; void:subjectsTarget ?s ; void:objectsTarget ?o ; void:triples ?m .
  • 17.
    Graph Aggregation -Requirements 13/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget
  • 18.
    Graph Aggregation -Requirements 13/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget ?s dct:subject ?sd ; void:triple ?sm .
  • 19.
    Graph Aggregation -Requirements 14/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget
  • 20.
    Graph Aggregation -Requirements 14/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget ?o dct:subject ?od ; void:triple ?om .
  • 21.
    Graph Aggregation -Definition 15/20 Q=(D,M,E,N,R,f) D: subject dimensions M: subject measure E: object dimensions N: object measure R: relation query f: reduce function ?x ?sd ?x ?sm ?y ?od ?y ?om ?x ?m?p ?y
  • 22.
    Graph Aggregation -Grouped Graph 16/20 crossdomain media 10k 33k 3k 1.2M ...... linksTo
  • 23.
    Graph Aggregation -Evaluation 17/20 ● Build a binding table ● Build the Grouped Graph O(|B|) algorithm where B is the size of the binding table ● Apply the reduction function ?x ?m?p ?y ?sd ?sm ?od ?om
  • 24.
    Graph Aggregation -Experiment Setup 18/20 ● Extended In-memory Apache Jena using SSE ● BSBM and SP2B ● Type Summary and Bibliometrics
  • 25.
    Graph Aggregation -Experiment Results 19/20 Data size (#triples) fullSPARQL 3SPARQLs reduced Gagg 5k 0.08 0.06 0.01 0.03 190k 9.84 1.25 0.42 0.55 370k 31.88 2.82 1.00 1.13 1.8M 454.07 13.48 4.37 5.61 Type Summary on BSBM data
  • 26.
    Graph aggregation asa first-class operator - Easier for users to express - Easier for engines to support and optimise - Easier for further research and study Further Questions: - Syntax and effect on SPARQL - Distributed implementation Conclusion 20/20
  • 27.
    PREFIX : <http://example.org/> PREFIXrdf:<http://www.w3.org/1999/02/22-rdf-syntax- ns#> CONSTRUCT { _:b0 a ?t1; :count COUNT(?sub.s) . _:b1 a ?t2; :count COUNT(?obj.o). _:b2 a rdf:Statement; rdf:predicate ?p; rdf:subject _:b0; rdf:object _:b1; :count ?prop_count } WHERE { GRAPH_AGGREGATION { ?s ?p ?o {?s a ?t1} GROUP BY ?t1 AS ?sub {?o a ?t2} GROUP BY ?t2 AS ?obj } }
  • 28.
    SELECT ?t1 ?count_s?subId ?t2 ?count_o ?objId ?p (COUNT(*) AS ?rel_count){ { SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p { ?s a ?t1 . ?s ?p ?o . ?o a ?t2 . { SELECT ?t1 ?subId (COUNT(DISTINCT ?s) AS ?count_s){ ?s a ?t1 . ?s ?p ?o .?o a ?t2 . BIND (iri(CONCAT(str(?t1), "_s")) AS ?subId) } GROUP BY ?t1 ?subId } { SELECT ?t2 ?objId (COUNT(DISTINCT ?t2) AS ?count_o){ ?s a ?t1 . ?s ?p ?o . ?o a ?t2 . BIND (iri(CONCAT(str(?t2), "_o")) AS ?objId) } GROUP BY ?t2 ?objId } } } } GROUP BY ?t1 ?count_s ?t2 ?count_o ?subId ?objId }