Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Gagg: A Graph Aggregation Operator
June 2nd 2015
Fadi Maali*, Stephane Campinas, Stefan Decker
ESWC2015
* Funded by the Ir...
The Famous LOD Cloud
1/20
http://lod-cloud.net/
The Famous LOD Cloud - COLOURED
2/20
http://lod-cloud.net/
The Famous LOD Cloud
(from a different Angel)
3/20
The Famous LOD Cloud
(from a different Angel)
4/20
The Famous LOD Cloud
(from a different Angel)
5/20
The Famous LOD Cloud
(from a different Angel)
6/20
Graph Aggregation
Condenses a large graph into a structurally
similar but smaller graph by collapsing vertices
and edges
Graph Aggregation - Schema Discovery
8/20
Introducing RDF Graph Summary with application to Assisted SPARQL Formulation
Graph Aggregation - Requirements
9/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:open...
Graph Aggregation Methods
1. Custom Code
error prone, time, efficiency…
2. SPARQL
error prone, time, efficiency…
3. Graph ...
Graph Aggregation Methods
1. Custom Code
error prone, time, efficiency…
2. SPARQL
error prone, time, efficiency…
3. Graph ...
Gagg: Two-steps Aggregation
11/20
Gagg: Two-steps Aggregation
11/20
● Relation & measure
● Subject dimension(s)
& measure
● Object dimension(s) &
measure
Us...
Graph Aggregation - Requirements
12/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:ope...
Graph Aggregation - Requirements
12/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:ope...
Graph Aggregation - Requirements
13/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:ope...
Graph Aggregation - Requirements
13/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:ope...
Graph Aggregation - Requirements
14/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:ope...
Graph Aggregation - Requirements
14/20
:linkset
:dbpedia :bbc-music
:crossdomain
23k
1.2b
20m
triples
triples
triples
:ope...
Graph Aggregation - Definition
15/20
Q=(D,M,E,N,R,f)
D: subject dimensions
M: subject measure
E: object dimensions
N: obje...
Graph Aggregation - Grouped Graph
16/20
crossdomain media
10k 33k 3k 1.2M ......
linksTo
Graph Aggregation - Evaluation
17/20
● Build a binding table
● Build the Grouped Graph
O(|B|) algorithm where B is the siz...
Graph Aggregation - Experiment Setup
18/20
● Extended In-memory Apache Jena
using SSE
● BSBM and SP2B
● Type Summary and B...
Graph Aggregation - Experiment Results
19/20
Data size
(#triples)
fullSPARQL 3SPARQLs reduced Gagg
5k 0.08 0.06 0.01 0.03
...
Graph aggregation as a first-class operator
- Easier for users to express
- Easier for engines to support and optimise
- E...
PREFIX : <http://example.org/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-
ns#>
CONSTRUCT {
_:b0 a ?t1; :count CO...
SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p (COUNT(*) AS
?rel_count){
{
SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?obj...
Upcoming SlideShare
Loading in …5
×

Gagg: A graph Aggregation Operator

703 views

Published on

ESWC2015 presentation

Published in: Data & Analytics
  • Be the first to comment

Gagg: A graph Aggregation Operator

  1. 1. Gagg: A Graph Aggregation Operator June 2nd 2015 Fadi Maali*, Stephane Campinas, Stefan Decker ESWC2015 * Funded by the Irish Research Council
  2. 2. The Famous LOD Cloud 1/20 http://lod-cloud.net/
  3. 3. The Famous LOD Cloud - COLOURED 2/20 http://lod-cloud.net/
  4. 4. The Famous LOD Cloud (from a different Angel) 3/20
  5. 5. The Famous LOD Cloud (from a different Angel) 4/20
  6. 6. The Famous LOD Cloud (from a different Angel) 5/20
  7. 7. The Famous LOD Cloud (from a different Angel) 6/20
  8. 8. Graph Aggregation Condenses a large graph into a structurally similar but smaller graph by collapsing vertices and edges
  9. 9. Graph Aggregation - Schema Discovery 8/20 Introducing RDF Graph Summary with application to Assisted SPARQL Formulation
  10. 10. Graph Aggregation - Requirements 9/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget
  11. 11. Graph Aggregation Methods 1. Custom Code error prone, time, efficiency… 2. SPARQL error prone, time, efficiency… 3. Graph Databases expressivity, optimisation… 4. Gagg, a first-class operator
  12. 12. Graph Aggregation Methods 1. Custom Code error prone, time, efficiency… 2. SPARQL error prone, time, efficiency… 3. Graph Databases expressivity, optimisation… 4. Gagg, a first-class operator Operational Semantics In-memory evaluation algorithm Experimental evaluation
  13. 13. Gagg: Two-steps Aggregation 11/20
  14. 14. Gagg: Two-steps Aggregation 11/20 ● Relation & measure ● Subject dimension(s) & measure ● Object dimension(s) & measure Uses aggregation functions and a template similar to CONSTRUCT queries
  15. 15. Graph Aggregation - Requirements 12/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget measure relation
  16. 16. Graph Aggregation - Requirements 12/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget measure relation ?l a void:LinkSet ; void:subjectsTarget ?s ; void:objectsTarget ?o ; void:triples ?m .
  17. 17. Graph Aggregation - Requirements 13/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget
  18. 18. Graph Aggregation - Requirements 13/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget ?s dct:subject ?sd ; void:triple ?sm .
  19. 19. Graph Aggregation - Requirements 14/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget
  20. 20. Graph Aggregation - Requirements 14/20 :linkset :dbpedia :bbc-music :crossdomain 23k 1.2b 20m triples triples triples :open-license :media :cc-by-sa :closed-license :bbc-terms subject subject licenselicense subjectsTarget objectsTarget ?o dct:subject ?od ; void:triple ?om .
  21. 21. Graph Aggregation - Definition 15/20 Q=(D,M,E,N,R,f) D: subject dimensions M: subject measure E: object dimensions N: object measure R: relation query f: reduce function ?x ?sd ?x ?sm ?y ?od ?y ?om ?x ?m?p ?y
  22. 22. Graph Aggregation - Grouped Graph 16/20 crossdomain media 10k 33k 3k 1.2M ...... linksTo
  23. 23. Graph Aggregation - Evaluation 17/20 ● Build a binding table ● Build the Grouped Graph O(|B|) algorithm where B is the size of the binding table ● Apply the reduction function ?x ?m?p ?y ?sd ?sm ?od ?om
  24. 24. Graph Aggregation - Experiment Setup 18/20 ● Extended In-memory Apache Jena using SSE ● BSBM and SP2B ● Type Summary and Bibliometrics
  25. 25. Graph Aggregation - Experiment Results 19/20 Data size (#triples) fullSPARQL 3SPARQLs reduced Gagg 5k 0.08 0.06 0.01 0.03 190k 9.84 1.25 0.42 0.55 370k 31.88 2.82 1.00 1.13 1.8M 454.07 13.48 4.37 5.61 Type Summary on BSBM data
  26. 26. Graph aggregation as a first-class operator - Easier for users to express - Easier for engines to support and optimise - Easier for further research and study Further Questions: - Syntax and effect on SPARQL - Distributed implementation Conclusion 20/20
  27. 27. PREFIX : <http://example.org/> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax- ns#> CONSTRUCT { _:b0 a ?t1; :count COUNT(?sub.s) . _:b1 a ?t2; :count COUNT(?obj.o). _:b2 a rdf:Statement; rdf:predicate ?p; rdf:subject _:b0; rdf:object _:b1; :count ?prop_count } WHERE { GRAPH_AGGREGATION { ?s ?p ?o {?s a ?t1} GROUP BY ?t1 AS ?sub {?o a ?t2} GROUP BY ?t2 AS ?obj } }
  28. 28. SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p (COUNT(*) AS ?rel_count){ { SELECT ?t1 ?count_s ?subId ?t2 ?count_o ?objId ?p { ?s a ?t1 . ?s ?p ?o . ?o a ?t2 . { SELECT ?t1 ?subId (COUNT(DISTINCT ?s) AS ?count_s){ ?s a ?t1 . ?s ?p ?o .?o a ?t2 . BIND (iri(CONCAT(str(?t1), "_s")) AS ?subId) } GROUP BY ?t1 ?subId } { SELECT ?t2 ?objId (COUNT(DISTINCT ?t2) AS ?count_o){ ?s a ?t1 . ?s ?p ?o . ?o a ?t2 . BIND (iri(CONCAT(str(?t2), "_o")) AS ?objId) } GROUP BY ?t2 ?objId } } } } GROUP BY ?t1 ?count_s ?t2 ?count_o ?subId ?objId }

×