The document describes an algorithm called THoSP for nesting property graphs. THoSP combines graph pattern matching and grouping to efficiently nest subgraphs within a larger graph, representing the nested graph as an adjacency list enriched with a nesting index. Experimental results show THoSP outperforms equivalent nested-graph queries written in SQL, SPARQL, AQL and Cypher.
In the graph database literature the term "join" does not refer to an operator combining two graphs, but to path traversal queries over a single graph. Current languages express binary joins by combining path traversal queries with graph creation operations, a solution that proves inefficient. In this paper we introduce a binary graph join operator and a corresponding algorithm that outperforms the solutions offered by query languages for both graph (Cypher, SPARQL) and relational (SQL) databases. This is achieved by using a specific graph data structure in secondary memory that shows better performance than state-of-the-art graph libraries (Boost Graph Library, SNAP) and database systems (Sparksee).
The document discusses how hash maps work and the process of rehashing. It explains that inserting a key-value pair into a hash map involves: 1) Hashing the key to get an index, 2) Searching the linked list at that index for an existing key, updating its value if found or adding a new node. Rehashing is done when the load factor increases above a threshold, as that increases lookup time. Rehashing doubles the size of the array and rehashes all existing entries to maintain a low load factor and constant time lookups.
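The insert-and-rehash procedure described above can be sketched as follows. This is a minimal illustration, not the document's implementation; the names `SimpleHashMap` and `MAX_LOAD` are invented for the example.

```python
MAX_LOAD = 0.75  # illustrative threshold: rehash once entries / buckets exceeds this

class SimpleHashMap:
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]  # each bucket is a chain of [key, value] pairs
        self.size = 0

    def put(self, key, value):
        # 1) hash the key to get a bucket index
        idx = hash(key) % len(self.buckets)
        # 2) search the chain at that index; update in place if the key exists
        for pair in self.buckets[idx]:
            if pair[0] == key:
                pair[1] = value
                return
        # otherwise add a new node to the chain
        self.buckets[idx].append([key, value])
        self.size += 1
        if self.size / len(self.buckets) > MAX_LOAD:
            self._rehash()

    def get(self, key):
        idx = hash(key) % len(self.buckets)
        for k, v in self.buckets[idx]:
            if k == key:
                return v
        raise KeyError(key)

    def _rehash(self):
        # double the array and re-insert every entry under the new modulus
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:
                self.buckets[hash(k) % len(self.buckets)].append([k, v])
```

Doubling on rehash keeps the load factor bounded, so chains stay short and lookups stay constant time on average.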
This document discusses different searching methods like sequential, binary, and hashing. It defines searching as finding an element within a list. Sequential search searches lists sequentially until the element is found or the end is reached, with efficiency of O(n) in worst case. Binary search works on sorted arrays by eliminating half of remaining elements at each step, with efficiency of O(log n). Hashing maps keys to table positions using a hash function, allowing searches, inserts and deletes in O(1) time on average. Good hash functions uniformly distribute keys and generate different hashes for similar keys.
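The halving step that gives binary search its O(log n) bound can be sketched in a few lines (a standard textbook version, not code from the document):

```python
def binary_search(sorted_list, target):
    """Return the index of target in sorted_list, or -1 if absent. O(log n)."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return -1
```

Each iteration halves the remaining range, which is exactly why the worst case is logarithmic rather than linear.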
This document discusses hashing and provides details about various hashing concepts. It defines hashing as a process of indexing and retrieving elements in a data structure to provide faster retrieval using a hash key. A hash function maps a key to an integer hash value that represents the index in a hash table. Characteristics of a good hash function include being easy to compute and achieving an even distribution of keys. Static hashing uses a fixed number of primary buckets and overflow buckets to handle collisions. Examples of hash functions include the division method, which computes the modulus of the key over the number of buckets.
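The division method mentioned above is simple enough to state directly; this sketch assumes integer keys:

```python
def division_hash(key, num_buckets):
    """Division method: the bucket index is key mod the number of buckets.

    Choosing num_buckets to be a prime not too close to a power of two
    tends to spread typical key sets more evenly across the table.
    """
    return key % num_buckets
```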
This document discusses hashing and different techniques for implementing dictionaries using hashing. It begins by explaining that dictionaries store elements using keys to allow for quick lookups. It then discusses different data structures that can be used, focusing on hash tables. The document explains that hashing allows for constant-time lookups on average by using a hash function to map keys to table positions. It discusses collision resolution techniques like chaining, linear probing, and double hashing to handle collisions when the hash function maps multiple keys to the same position.
Extendible hashing allows a hash table to grow dynamically by using an extendible index table. The index table directs lookups to buckets, each holding a fixed number of items. When a bucket fills, it splits into two buckets and the index expands accordingly. This lets the table grow indefinitely as items are added while avoiding full-table rehashing and maintaining fast access through the adjustable index.
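The bucket-split and directory-doubling mechanics can be sketched as below. This is a simplified model (directory indexed by the low-order bits of the hash, tiny fixed bucket capacity) rather than a production implementation; `BUCKET_SIZE` and the class names are assumptions of the sketch.

```python
BUCKET_SIZE = 2  # fixed per-bucket capacity, kept tiny to force splits

class Bucket:
    def __init__(self, depth):
        self.depth = depth   # local depth: number of hash bits this bucket distinguishes
        self.items = {}

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        b0, b1 = Bucket(1), Bucket(1)
        self.directory = [b0, b1]   # slot i serves keys whose low bits equal i

    def _dir_index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._dir_index(key)].items[key]

    def put(self, key, value):
        bucket = self.directory[self._dir_index(key)]
        if key in bucket.items or len(bucket.items) < BUCKET_SIZE:
            bucket.items[key] = value
            return
        self._split(bucket)
        self.put(key, value)   # retry after the split

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # directory must double: each new slot initially aliases its old half
            self.directory += self.directory[:]
            self.global_depth += 1
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        # redistribute: keys whose newly examined bit is 1 move to the new bucket
        high_bit = 1 << (bucket.depth - 1)
        moved = {k: v for k, v in bucket.items.items() if hash(k) & high_bit}
        for k in moved:
            del bucket.items[k]
        new_bucket.items = moved
        # repoint the directory slots that now address the new bucket
        for i in range(len(self.directory)):
            if self.directory[i] is bucket and (i & high_bit):
                self.directory[i] = new_bucket
```

Note that a split only redistributes one bucket's items; the rest of the table is untouched, which is the sense in which extendible hashing avoids full-table rehashing.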
Hashing is a common technique for implementing dictionaries that provides constant-time operations by mapping keys to table positions using a hash function, though collisions require resolution strategies like separate chaining or open addressing. Popular hash functions include division and cyclic shift hashing to better distribute keys across buckets. Both open hashing using linked lists and closed hashing using linear probing can provide average constant-time performance for dictionary operations depending on load factor.
The document discusses hashing techniques for data structures. It describes how hashing is used to store and retrieve records from a hash table using a key and hash function. When two keys hash to the same location (collision), different collision resolution strategies can be used like open addressing, separate chaining, and bucket hashing. Open addressing methods like linear probing and quadratic probing search for the next empty location to store collided records. Separate chaining stores collided records in linked lists at hash table locations.
This document discusses extendible hashing, which is a hashing technique for dynamic files that allows efficient insertion and deletion of records. It works by using a directory to map hash values to buckets, and dynamically expanding the directory size and number of buckets as needed to accommodate new records. When a bucket overflows, it is split into two buckets, and the directory is expanded to distinguish them. The directory size can also be contracted when buckets can be combined due to deletions. Alternative approaches like dynamic hashing and linear hashing that address the same problem of dynamic files are also overviewed.
The document discusses hashing and hash tables. It defines hashing as a technique where the location of an element in a collection is determined by a hashing function of the element's value. Collisions can occur if multiple elements map to the same location. Common techniques for resolving collisions include chaining and open addressing. The Java Collections API provides several implementations of hash tables like HashMap and HashSet.
This document proposes a new approach to speeding up combinatorial search strategies using stack and hash table data structures. The method uses a temporary array to help generate combinations in each iteration: a stack is created, the first parameter is pushed onto it, and the algorithm iterates, popping values until the stack is empty; the indexes of a combination array are set from the stack length and the popped values. Hashing provides a more reliable and flexible method of data retrieval than the alternative structures, and is faster than searching arrays or lists. This approach could speed up the generation and search processes of combinatorial methods.
This document discusses hashing techniques for storing data in a hash table. It describes hash collisions that can occur when multiple keys map to the same hash value. Two primary techniques for dealing with collisions are chaining and open addressing. Open addressing resolves collisions by probing to subsequent table indices, but this can cause clustering issues. The document proposes various rehashing functions that incorporate secondary hash values or quadratic probing to reduce clustering in open addressing schemes.
This document provides an overview of hashing techniques. It defines hashing as transforming a string into a shorter fixed-length value to represent the original string. Collisions occur when two different keys map to the same address. The document then describes a simple hashing algorithm involving three steps: representing the key numerically, folding and adding the numerical values, and dividing by the address space size. It also discusses predicting the distribution of records among addresses and estimating collisions for a full hash table.
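The three-step fold-and-add scheme described above can be sketched as follows; the function name, the three-digit ASCII encoding, and the chunk size are assumptions of this illustration, not details from the document:

```python
def fold_and_add_hash(key, address_space, chunk=2):
    """Fold-and-add hashing in the three steps described above.

    1) represent the key numerically (three-digit ASCII codes here),
    2) fold the digit string into fixed-size chunks and add them,
    3) divide by the address-space size and keep the remainder.
    """
    digits = "".join(str(ord(c)).zfill(3) for c in key)   # step 1
    total = sum(int(digits[i:i + chunk])                  # step 2
                for i in range(0, len(digits), chunk))
    return total % address_space                          # step 3
```

For example, "AB" becomes "065066", which folds into 06 + 50 + 66 = 122, giving address 22 in a 100-slot space.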
The document discusses hashing techniques for storing and retrieving data from memory. It covers hash functions, hash tables, open addressing techniques like linear probing and quadratic probing, and closed hashing using separate chaining. Hashing maps keys to memory addresses using a hash function to store and find data independently of the number of items. Collisions may occur and different collision resolution methods are used like open addressing that resolves collisions by probing in the table or closed hashing that uses separate chaining with linked lists. The efficiency of hashing depends on factors like load factor and average number of probes.
Hash Tables
The memory available to maintain the symbol table is assumed to be sequential. This memory is referred to as the hash table, HT. The term bucket denotes a unit of storage that can store one or more records. A bucket is typically one disk block size but could be chosen to be smaller or larger than a disk block.
If the number of buckets in a Hash table HT is b, then the buckets are designated HT(0), ... HT(b-1). Each bucket is capable of holding one or more records. The number of records a bucket can store is known as its slot-size. Thus, a bucket is said to consist of s slots, if it can hold s number of records in it.
A function that is used to compute the address of a record in the hash table, is known as a hash function. Usually, s = 1 and in this case each bucket can hold exactly 1 record.
This document discusses different techniques for handling collisions in open addressing hash tables: linear probing, quadratic probing, and double hashing. Linear probing searches sequentially through the hash table for the next empty slot when a collision occurs, which can lead to clustering as the table fills. Quadratic probing uses a quadratic function to determine the next slot to search, reducing primary clustering. Double hashing uses a second hash function to determine the probe step, further reducing clustering. The document provides examples and explanations of how each technique resolves collisions in open addressing hash tables.
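The three probe sequences can be written down directly. The sketch below (illustrative names, table size m = 11, an assumed second hash value of 5) shows the first four slots each strategy would try after a collision at slot 3:

```python
def linear_probe(h, i, m):
    # i-th probe for initial hash h in a table of size m
    return (h + i) % m

def quadratic_probe(h, i, m):
    return (h + i * i) % m

def double_hash_probe(h1, h2, i, m):
    # h2 comes from a second hash function; it must be nonzero and,
    # ideally, relatively prime to m (a prime m guarantees this)
    return (h1 + i * h2) % m

# First four probe positions after a collision at slot 3 in a table of size 11:
m = 11
linear = [linear_probe(3, i, m) for i in range(4)]          # 3, 4, 5, 6
quadratic = [quadratic_probe(3, i, m) for i in range(4)]    # 3, 4, 7, 1
double = [double_hash_probe(3, 5, i, m) for i in range(4)]  # 3, 8, 2, 7
```

Notice how linear probing walks consecutive slots (the source of clustering), while quadratic and double hashing jump around the table.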
The document provides an introduction to hashing techniques and their applications. It discusses hashing as a technique to distribute dataset entries across an array of buckets using a hash function. It then describes various hashing techniques like separate chaining and open addressing to resolve collisions. Some applications discussed include how Dropbox uses hashing to check for copyrighted content sharing and how subtree caching is used in symbolic regression.
The document discusses C programming concepts like strcpy() function implementation, data types, operators, functions, pointers, arrays, strings and more. It provides code snippets to demonstrate various C programming techniques like implementing string copy functions, converting numbers to different bases, evaluating polynomials, swapping variables, reversing strings, matrix multiplication and more. It also answers questions about common C programming topics to test understanding.
Hashing notes data structures (HASHING AND HASH FUNCTIONS), by Kuntal Bhowmick
A Hash table is a data structure used for storing and retrieving data very quickly. Insertion of data in the hash table is based on the key value. Hence every entry in the hash table is associated with some key.
HASHING AND HASH FUNCTIONS, HASH TABLE REPRESENTATION, HASH FUNCTION, TYPES OF HASH FUNCTIONS, COLLISION, COLLISION RESOLUTION, CHAINING, OPEN ADDRESSING – LINEAR PROBING, QUADRATIC PROBING, DOUBLE HASHING
This document provides an introduction to hashing and hash tables. It defines hashing as a technique that uses a hash function to map keys to array indices for fast retrieval, and gives an example of mapping list values to array indices using modulo. The document discusses hash tables and their search, insert and delete operations in O(1) average time. It describes collisions that occur during hash function mapping and resolution techniques like separate chaining and linear probing.
The document discusses different techniques for storing and searching data, including sequential search, binary search, and hashing. It provides details on open hashing and closed hashing, describing that closed hashing stores elements within buckets and can cause collisions when multiple elements are mapped to the same bucket. The document also outlines characteristics of good hash functions and different hashing methods like division, mid-square, folding, digit analysis, length dependent, algebraic coding, and multiplicative hashing.
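Of the hashing methods listed above, the mid-square method is easy to illustrate: square the key and take the middle bits of the result. The function below is a hedged sketch (the name and the choice of an 8-bit table are assumptions, not from the document):

```python
def mid_square_hash(key, table_bits=8):
    """Mid-square method: square the key and extract the middle bits."""
    sq = key * key
    n = sq.bit_length()
    shift = max((n - table_bits) // 2, 0)   # drop the low bits flanking the middle
    return (sq >> shift) & ((1 << table_bits) - 1)
```

The middle bits depend on every digit of the key, which is why the mid-square method mixes similar keys better than simply truncating them.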
This document discusses hashing and related concepts:
- Hashing involves using a hash function to map keys to table indices in an array. Collisions occur when distinct keys hash to the same index.
- Collision resolution techniques include separate chaining, which stores keys that collide in a linked list at the index, and open addressing, which probes for the next empty slot when a collision occurs.
- Good hash functions aim to distribute keys uniformly among indices and avoid collisions. Properties like using the whole key and producing random-looking results are desirable.
- Hashing has applications beyond symbol tables, including data mining text and genomes by representing documents as vectors based on hashed subsequences.
This document discusses hashing techniques for data storage and retrieval. Static hashing stores data in buckets accessed via a hash function, with solutions for bucket overflow. Dynamic hashing uses extendable hashing to adjust the hash table size as the database grows or shrinks. Queries and updates in extendable hashing follow the hash value to a bucket. The structure allows splitting and merging buckets efficiently. Compared to ordered indexing, hashing is more efficient for lookups by specific values rather than ranges.
Hashing Techniques in Data Structures Part 2, by SHAKOOR AB
The document discusses different approaches to handling collisions in hash tables: chaining and open addressing such as linear probing. Chaining involves storing collided keys in linked lists at each array index, while linear probing resolves collisions by probing subsequent indices in the array. The example demonstrates linear probing by inserting several keys into a hash table and showing the array indices where each key is stored.
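A worked linear-probing insertion like the one the summary mentions can be reproduced in a few lines. The keys 89, 18, 49, 58, 69 and the size-10 table are a common textbook example, assumed here for illustration:

```python
def insert_linear_probing(table, key):
    """Insert key into an open-addressed table (None marks an empty slot)
    by linear probing; return the index where the key landed."""
    m = len(table)
    home = key % m
    for step in range(m):
        probe = (home + step) % m
        if table[probe] is None:
            table[probe] = key
            return probe
    raise RuntimeError("hash table is full")

# Worked example: insert 89, 18, 49, 58, 69 into a table of size 10.
# 49, 58 and 69 all collide and wrap around to slots 0, 1 and 2.
table = [None] * 10
positions = [insert_linear_probing(table, k) for k in (89, 18, 49, 58, 69)]
```

After the five inserts, `positions` is `[9, 8, 0, 1, 2]`: the collisions at slots 9 and 8 push the later keys around the end of the array, which is exactly the clustering behavior linear probing is known for.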
Contents:
1. Direct Address Table
2. Hashing
3. Characteristics of a good hash function
4. Collision Resolution using Chaining and Probing
5. Static vs Dynamic Hashing
6. Extendible Hashing
7. B+ tree vs Hashing
Bloom filters are a space-efficient probabilistic data structure for representing a set in order to support membership queries. They allow for false positives but not false negatives. The document discusses how bloom filters work using hash functions to set bits in a bit vector, allowing for fast set membership checks. It also covers extensions like counting bloom filters that can support deletions by incrementing and decrementing counters, and variations like distance-sensitive bloom filters and bloomier filters.
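The bit-vector mechanics described above can be sketched compactly. This version derives its k hash functions by salting SHA-256, which is one convenient choice among many; the class name and parameters are assumptions of the sketch:

```python
import hashlib

class BloomFilter:
    """Bit-vector Bloom filter with k hash functions derived from SHA-256."""

    def __init__(self, m_bits=1024, k=4):
        self.m = m_bits
        self.k = k
        self.bits = 0   # a Python int used as an m-bit vector

    def _positions(self, item):
        # derive k independent positions by salting the item with an index
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

Because `add` only ever sets bits, a member's bits are always found set, which is why false negatives cannot occur; a non-member can still hit k set bits by chance, which is the false-positive case.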
Hash join is a type of join operation that uses a hash table to perform the join. There are three types of hash joins - optimal, onepass, and multipass. Optimal hash join performs the join entirely in memory, while onepass and multipass hash joins spill data to temporary storage due to insufficient memory. The size of the build table can impact the performance and memory requirements of the hash join, with smaller build tables generally requiring less memory but potentially more disk reads. The best build table depends on the relative sizes of the tables and available memory.
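The in-memory ("optimal") case described above reduces to a build phase over the smaller input followed by a probe phase over the larger one. A minimal sketch, with illustrative names and rows as dictionaries:

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """In-memory hash join: build a hash table on the (smaller) build input,
    then probe it once per row of the (larger) probe input."""
    table = {}
    for row in build_rows:                       # build phase
        table.setdefault(row[build_key], []).append(row)
    for row in probe_rows:                       # probe phase
        for match in table.get(row[probe_key], []):
            yield {**match, **row}
```

Only the build input must fit in memory, which is why picking the smaller table as the build side reduces the memory requirement; the onepass and multipass variants handle the case where even that side must spill to temporary storage.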
Hashing is a technique used to map data of arbitrary size to data of fixed size. It works by using a hash function to compute an index into an array/table from the search key. This allows for very fast average case performance of O(1) time for operations like insertion, deletion and searching. However, collisions can occur when two keys map to the same index, degrading performance. Common techniques to handle collisions include linear probing, quadratic probing and chaining via linked lists. Hashing is widely used to implement symbol tables in compilers and data structures like sets and maps.
Neo4j MeetUp - Graph Exploration with MetaExp, by Adrian Ziegler
This document discusses graph exploration using Neo4j and describes:
1. Computing meta-paths from graph schemas to efficiently represent knowledge in graphs.
2. Embedding meta-paths to learn vector representations for active learning and preference prediction.
3. An active learning strategy to label informative meta-paths and explore the space of all meta-paths.
The document discusses finding commonalities between RDF graphs by computing their least general generalization (lgg). It defines the lgg of RDF graphs as a generalization that entails all input graphs based on RDF entailment rules, and is entailed by any other generalization. The document focuses on computing the lgg of two RDF graphs, which can be used to iteratively find the lgg of multiple graphs. An example is provided to illustrate defining the lgg of two sample RDF graphs.
aRangodb, un package per l'utilizzo di ArangoDB con RGraphRM
Lingua talk: Italiano.
Descrizione:
In questo talk parleremo di come integrare e utilizzare ArangoDB, un database multi-modello con supporto nativo ai grafi, con R. Presenteremo quindi aRangodb, il package che abbiamo sviluppato per interfacciarsi in modo più semplice e intuitivo al database. Nel corso del talk mostreremo come il package possa essere utilizzato in ambito data science usando alcuni case studies concreti.
Speaker:
Gabriele Galatolo - Data Scientist - Kode srl
The document discusses the planted clique problem in graph theory. It introduces the problem and describes how previous research has found polynomial-time algorithms to solve the problem when the size of the planted clique k is O(√n). The document then summarizes two algorithms - Kucera's algorithm and the Low Degree Removal (LDR) algorithm - that have been used to approach the problem. It describes implementing the algorithms in a C++ program to simulate random graphs with planted cliques and test the ability of the algorithms to recover the planted clique.
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...ijnlc
The tremendous increase in the amount of available research documents impels researchers to propose topic models to extract the latent semantic themes of a documents collection. However, how to extract the hidden topics of the documents collection has become a crucial task for many topic model applications. Moreover, conventional topic modeling approaches suffer from the scalability problem when the size of documents collection increases. In this paper, the Correlated Topic Model with variational ExpectationMaximization algorithm is implemented in MapReduce framework to solve the scalability problem. The proposed approach utilizes the dataset crawled from the public digital library. In addition, the full-texts of the crawled documents are analysed to enhance the accuracy of MapReduce CTM. The experiments are conducted to demonstrate the performance of the proposed algorithm. From the evaluation, the proposed approach has a comparable performance in terms of topic coherences with LDA implemented in MapReduce framework.
This document proposes HDT, a format for compactly representing large RDF datasets for publication and exchange. HDT consists of three main components: a header containing metadata, a dictionary mapping URIs to IDs, and compact triples using bitmap indices. It addresses issues with existing RDF formats like lack of structure, metadata and efficient operations. The HDT format provides publication metadata, compact representation through compression and indexing, and basic SPARQL query capabilities through efficient lookup algorithms. Evaluation on large datasets like Uniprot shows HDT outperforms universal compression formats in terms of compression ratio and query performance.
This document discusses IBM Research's work on knowledge graph creation and analytics for cognitive systems. Key points include:
1. IBM Research is developing novel graph analytics tools like algorithms for computing node centrality in O(N) time instead of O(N3), allowing analysis of much larger graphs.
2. These tools are being applied to strategic projects on materials analytics and knowledge graphs to accelerate discovery.
3. One example is creating a knowledge graph for metallurgy that links alloys, processes, and documents to enable new types of queries.
This document discusses IBM Research's work on knowledge graph creation and analytics for cognitive systems. Key points include:
1. IBM Research is developing novel graph analytics tools like algorithms for computing node centrality in O(N) time instead of O(N3), allowing analysis of much larger graphs.
2. These tools are being applied to strategic projects on materials analytics and knowledge graphs to accelerate discovery.
3. One example is creating a knowledge graph for metallurgy that links alloys, processes, and documents to enable new types of queries.
This document summarizes an article from the International Journal of Computer Engineering and Technology (IJCET) that proposes a method for divisive hierarchical clustering using partitioning methods. It begins with an abstract that introduces hierarchical clustering and partitioning methods, and how the paper uses partitioning with hierarchical clustering to form improved clusters. The document then provides background on hierarchical clustering and partitioning clustering methods. It summarizes related work on hierarchical clustering for data mining and automatically labeling hierarchical clusters. It concludes by summarizing the paper's proposal to use dynamic closest pair data structures to perform fast hierarchical clustering with insertions and deletions in logarithmic time.
This document discusses using Apache Spark to analyze large OpenStreetMap datasets. It introduces Parallelpbf, an open-source library that can read OSM data in parallel, and Spark-osm-datasource, which loads OSM data directly into Spark DataFrames in a distributed manner. It also describes Spark-osm-tools, a collection of Spark code snippets for processing OSM data, including extracting data within boundaries and converting ways to geometries. As an example, it shows how to analyze public transport coverage in a city using these tools to load OSM data, find buildings and transport stops, and color code buildings by distance to the nearest stop.
Scalable and Adaptive Graph Querying with MapReduceKyong-Ha Lee
This document summarizes a research paper that proposes a distributed graph querying algorithm called MR-Graph that employs MapReduce. MR-Graph uses a filter-and-verify scheme to first filter graphs based on contained features before verifying subgraph isomorphism. It also adaptively tunes the feature size at runtime by sampling data graphs to determine the most appropriate size. The experiments showed MR-Graph outperforms conventional algorithms in scalability and efficiency for processing multiple graph queries over massive datasets.
The document discusses defining and computing the least general generalization (lgg) of RDF graphs and SPARQL queries. It introduces the concepts of RDF graphs, entailment between graphs, and materializing implicit triples using RDFS and RDF entailment rules. The document outlines contributions in defining and computing the lgg in RDF and SPARQL, and reporting on experiments using datasets like DBpedia and LUBM.
1. Represents text documents as graph-of-words and extracts subgraph features through frequent subgraph mining to classify texts as a graph classification problem.
2. Uses gSpan algorithm to efficiently mine frequent subgraphs from the graph-of-words and selects the optimal minimum support threshold using the elbow method.
3. Evaluates the approach on four datasets, achieving improved accuracy over bag-of-words models by extracting long-distance n-gram features through subgraph mining.
1. Represents text documents as graph-of-words and extracts subgraph features through frequent subgraph mining to classify texts as a graph classification problem.
2. Uses gSpan algorithm to efficiently mine frequent subgraphs from the graph-of-words and selects the best minimum support threshold using the elbow method.
3. Evaluates on four datasets showing improved accuracy over bag-of-words models by capturing long-distance n-grams through subgraph features.
1. The document proposes representing text documents as graphs (graph-of-words) instead of bag-of-words and using frequent subgraph mining to extract features for text categorization.
2. It describes using the gSpan algorithm to efficiently mine frequent subgraphs from the graph-of-words representations to generate features.
3. An elbow method is used to select an optimal minimum support threshold that balances feature set size and accuracy. Representing documents as graphs and mining subgraph features is shown to improve accuracy over traditional bag-of-words on four text categorization datasets.
1. The document proposes representing text documents as graphs (graph-of-words) instead of bag-of-words and using frequent subgraph mining to extract features for text categorization.
2. It describes using the gSpan algorithm to efficiently mine frequent subgraphs from the graph-of-words representations to generate features.
3. An elbow method is used to select an optimal minimum support threshold that balances feature set size and accuracy. Representing documents as graphs and mining subgraph features is shown to improve accuracy over traditional bag-of-words on four text categorization datasets.
1. Represents text documents as graph-of-words and extracts subgraph features through frequent subgraph mining to classify texts as a graph classification problem.
2. Uses gSpan algorithm to efficiently mine frequent subgraphs from the graph-of-words and selects the best minimum support threshold using the elbow method.
3. Evaluates on four datasets showing improved accuracy over bag-of-words models by capturing long-distance n-gram dependencies through subgraph features.
1. The document proposes representing text documents as graphs (graph-of-words) instead of bag-of-words and using frequent subgraph mining to extract features for text categorization.
2. It describes using the gSpan algorithm to efficiently mine frequent subgraphs from the graph-of-words representations to generate features.
3. An elbow method is used to select an optimal minimum support threshold that balances feature set size and accuracy. Representing documents as graphs and mining subgraph features is shown to improve accuracy over traditional bag-of-words on four text categorization datasets.
Similar to THoSP: an Algorithm for Nesting Property Graphs (20)
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
THoSP: an Algorithm for Nesting Property Graphs
1. THoSP: an Algorithm for Nesting Property Graphs
Giacomo Bergami¹, André Petermann², Danilo Montesi¹
1st Joint GRADES-NDA International Workshop, 2018
10th June 2018
¹Università di Bologna, ²Universität Leipzig
3. Key Ideas – Research Problem
1. An operator generalizing the current “grouping” and “nesting” operations is missing. Current (G)DBMSs can express nesting operations, but their query languages’ plans cannot optimize the whole process by combining the following tasks:
• path joins, performed separately for both patterns;
• grouping, to create an id collection over the matched elements.
2. A general nesting algorithm could lead to exponential evaluation time.
1/16
4. Key Ideas – Use Case
Vertex pattern: Author →authorOf Paper∗
Edge pattern: Author_src →authorOf Paper∗ ←authorOf Author_dst, with Author_src ≠ Author_dst
[Figure: input bibliography network with authors Abigail Conner (0), Baldwin Oliver (1), Cassie Norman (2); papers On Joining Graphs (3), Object Databases (4), On Nesting Graphs (5); AuthorOf edges 6–10.]
2/16
5. Key Ideas – Desired Result
[Figure: the expected nested graph. Each author becomes a nested vertex nesting its own match — Abigail Conner (0), Baldwin Oliver (1), Cassie Norman (2). Coauthorship edges link authors sharing a paper, each nesting the traversed paths: (0 → 2), (2 → 0) via On Joining Graphs (3) and (0 → 1), (1 → 0) via Object Databases (4); On Nesting Graphs (5) yields no coauthorship.]
3/16
6. Key Ideas – Research Goals
1. As for graph joins, the data model must enhance the serialization of both operands and the graph result.
2. The logical graph nesting operator must be general enough to support both the THoSP algorithm and other graph summarization tasks.
3. Grouping can be avoided by defining a nesting index, through which each contained element is associated to its container. This can be achieved by extending the Graph Join’s data structures with such an index.
4/16
10. Logical Model – Design (1)
The nested (property) graph data model is an extension of the logical model for graph joins. Therefore, we want to preserve the same assumptions:
• The resulting nested graph is not a materialized view (as in SQL’s SELECT).
• The nested graph is serialized by using only the ID information: attributes, values and labels can be completely reconstructed from this information and the pattern rewriting information.
5/16
11. Logical Model – Design (2)
The following modelling choices allow the reconstruction of the required pieces of information:
• Vertices and edges are distinctly identified by ids (ℕ²).
• A nested graph database is a property graph where each vertex and edge may contain (nest) another property graph (ν, ε).
• Each vertex or edge within the graph can be considered as a possible graph operand.
6/16
12. Logical Model – Definition
Graph Nesting
A nested graph database is a nested graph where each vertex and edge may represent a graph. Given a nested graph G = (V, E), a vertex pattern gV and an edge pattern gE containing grouping references to the vertex pattern:

η^keep_ι(G) = ⟨ { v ∈ V | gV(v) = ∅ ∧ keep } ∪ ι(gV(G)),
               { e ∈ E | gE(e) = ∅ ∧ keep } ∪ ι(gE(G)) ⟩

where ι is an indexing function associating to each matched graph one new identifier not appearing in G, and keep is set to true when the non-traversed vertices and edges must be preserved in the final graph. The newly generated nested graph is inserted into the graph database, which also contains G. Values associated to nested vertices and edges are determined by user-defined functions.
7/16
14. THoSP Algorithm – Physical Model
Motivations:
1. Reduce the number of graph visits by visiting the subpattern first, and then extending the visit to the remaining patterns.
2. Represent the nested graph as an adjacency list enriched with an external nesting index.
The algorithm uses the same principles that were adopted for implementing graph joins:
• Use memory mapping (OS buffering).
• Serialized graphs represent vertices associated to both ingoing and outgoing edges.
• No additional indexing structures are exploited.
8/16
15–23. THoSP Algorithm – Example
[Figure build-up over the input bibliography network: authors Abigail Conner (0), Baldwin Oliver (1), Cassie Norman (2); papers On Joining Graphs (3), Object Databases (4), On Nesting Graphs (5); AuthorOf edges 6–10. The nested graph is constructed incrementally:]
• Visit paper On Joining Graphs (3) first, matching the vertex pattern on its author Abigail Conner (0) and nesting (0).
• Extend the visit to the co-author Cassie Norman (2): the edge pattern matches, producing a coauthorship edge nesting the traversed paths (0 → 2), (2 → 0), together with the nested vertex (2).
• Visit Object Databases (4): author Baldwin Oliver (1) yields the nested vertex (1) and a second coauthorship edge between 0 and 1 nesting (0 → 1), (1 → 0).
• On Nesting Graphs (5) has a single author, so it adds no coauthorship.
9/16
25. Experimental Evaluation – Dataset
We want to show that the combination of THoSP with the proposed physical data model outperforms the query plans of other query languages (Cypher, SPARQL, SQL, AQL).
We performed our tests on both synthetic and real-world data, using n = 1, …, 8 operands with vertex size 10ⁿ:
• GMark graph generator.
• Random samples of the Microsoft Academic Graph.
Our tests’ source code is available at:
https://bitbucket.org/unibogb/graphnestingc/src
10/16
26. Experimental Evaluation – Competing Databases
Given that the only graph database using Java was the worst performing one, we implemented our solution only in C++. The graph nesting operator was implemented in each DB language by returning ID collections.
• PostgreSQL was used to evaluate SQL queries. We ran the queries directly in psql.
• SPARQL queries were evaluated over Virtuoso and sent via ODBC (C++).
• Cypher queries were evaluated over Neo4J and sent via the execute method.
• AQL queries were evaluated over ArangoDB. We ran the queries directly in arangosh.
11/16
29. Experimental Evaluation – Results
• These further benchmarks show that none of the current data models supporting nested representations offer query plans for this specific case of (graph) nesting.
• The proposed approach extended the secondary-memory property graph representation by adding associations to nested vertices and edges.
• The serialized data structure provides a graph with an external containment data structure.
• This data model achieves structural aggregation for graph data, where aggregated data may preserve the original vertices and edges.
14/16
30. Experimental Evaluation – Further Results
GROQ: THoSP can be generalized into a more general
algorithm.
Generalized Semistructured Model: This data structure can be
generalized into a broader data representation.
15/16
31. Experimental Evaluation – Future Work
• GROQ: further benchmarks have to be carried out over this more general nesting algorithm.
• General Nesting: provide a query plan where either grouping or GROQ is used.
16/16
33. Backup Slides – Nested Graph Database
Nested Graph DataBase
Given a set Σ∗ of strings, a nested (property) graph database G is a tuple G = ⟨V, E, λ, ℓ, ω, ν, ε⟩, where:
• V, E ⊆ ℕ² s.t. V ∩ E = ∅
• source and target λ : E → V²
• labelling ℓ : V ∪ E → ℘(Σ∗)
• object mapping ω : V ∪ E → Ω
• vertices’ containment ν : (V ∪ E) → ℘(V)
• edges’ containment ε : (V ∪ E) → ℘(E)
Each vertex or edge o ∈ V ∪ E induces a nested (property) graph as the following pair:
G_o = ⟨ ν(o), { e ∈ ε(o) | λ(e) ∈ (⋃_{n≥0} ν⁽ⁿ⁾({o}))² } ⟩
34. THoSP Pseudocode

nest(Cont, patt, u, S):
    for each s in S s.t. patt.doSerialize(s):
        Cont.write(⟨u, s⟩)

Input: G, gV, gE
Cont ← ∅
NestedGraph ← ∅
α ← V∩E(γV ∪ γ^src_E ∪ γ^dst_E)
for each vertex v in G s.t. α(v):
    for each V(u →e v):
        u′ := ⌈dtl(u)⌉;  nest(Cont, V, u′, {u, e, v})
        NGraph(V) ← NGraph(V) ∪ {u′}
        for each V(w →e′ v) s.t. E(u →e v ←e′ w):
            w′ := ⌈dtl(w)⌉
            e″ := ⌈dtl(u, w)⌉
            nest(Cont, E, e″, {u, e, v, e′, w})
            NGraph(E) ← NGraph(E) ∪ {u′ →e″ w′}