Data Day Texas 2013

Graph Databases
Analyzing Relationships at Scale
#DDTX13

Matthias Broecheler, CTO
@mbroecheler AURELIUS
March XXX, MMXIII THINKAURELIUS.COM

Graph

[Figure: property graph of Roman mythology, with callouts marking a Vertex, a vertex Property, an Edge, an Edge Label, and an Edge Property.]

Vertices:
  name: Neptune   type: god      age: 4500
  name: Alcmene   type: human    age: 45
  name: Saturn    type: titan    age: 10000
  name: Jupiter   type: god
  name: Hercules  type: demigod
  name: Hydra     type: monster
  name: Pluto     type: god      age: 4000
  name: Cerberus  type: monster

Edges (label, with properties where shown):
  brother (2 edges)
  mother
  father (2 edges)
  battled  time: 2
  battled  time: 12
  pet
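The property graph model in the figure can be sketched in plain Python as follows. This is an illustrative model only, not the Blueprints or Titan API: the class names Vertex and Edge, the helper add_edge, and the choice of Hercules and Hydra as the endpoints of the battled edge are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    props: dict                                    # key/value properties
    out_edges: list = field(default_factory=list)  # edges leaving this vertex
    in_edges: list = field(default_factory=list)   # edges arriving at this vertex


@dataclass(eq=False)
class Edge:
    label: str                                     # edge label, e.g. "battled"
    out_v: Vertex                                  # tail vertex
    in_v: Vertex                                   # head vertex
    props: dict = field(default_factory=dict)      # edge properties


def add_edge(out_v, label, in_v, **props):
    """Create a labeled, directed edge and register it on both endpoints."""
    edge = Edge(label, out_v, in_v, props)
    out_v.out_edges.append(edge)
    in_v.in_edges.append(edge)
    return edge


hercules = Vertex({"name": "Hercules", "type": "demigod"})
hydra = Vertex({"name": "Hydra", "type": "monster"})
battle = add_edge(hercules, "battled", hydra, time=2)

print(battle.label, battle.props, battle.in_v.props["name"])
```

Keeping the edge lists on the vertices themselves is what makes a traversal a local operation: following an edge never requires a global lookup.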


Path


Degree
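A minimal sketch of vertex degree, assuming the edge endpoints below. The slide text lists only edge labels, so the endpoints are an assumption based on the traversals shown later in the deck; EDGES is a hypothetical name.

```python
from collections import Counter

# (tail, label, head); endpoints are assumed, the slide lists only the labels
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Hydra"),
    ("Hercules", "battled", "Cerberus"),
    ("Jupiter", "father", "Saturn"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Pluto", "pet", "Cerberus"),
]

out_degree = Counter(src for src, _, _ in EDGES)   # edges leaving each vertex
in_degree = Counter(dst for _, _, dst in EDGES)    # edges arriving at each vertex
degree = out_degree + in_degree                    # total degree

for name, value in degree.most_common():
    print(name, value)
```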

Shortest Paths
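One common way to find a shortest path in an unweighted graph is breadth-first search. The sketch below ignores edge direction and uses assumed edge endpoints (the slide text lists only labels); shortest_path and EDGES are hypothetical names.

```python
from collections import deque

# (tail, label, head); endpoints are assumed for illustration
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Hydra"),
    ("Hercules", "battled", "Cerberus"),
    ("Jupiter", "father", "Saturn"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Pluto", "pet", "Cerberus"),
]

# Undirected adjacency list
adjacency = {}
for src, _, dst in EDGES:
    adjacency.setdefault(src, []).append(dst)
    adjacency.setdefault(dst, []).append(src)


def shortest_path(start, goal):
    """Return one shortest path as a list of vertex names, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk parent links back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


path = shortest_path("Saturn", "Cerberus")
print(path)
```

Several shortest paths may exist between two vertices; the search returns whichever one it reaches first.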


Centrality

TinkerPop Graph Stack
  Graph Server
  Graph Algorithms
  Object-Graph Mapper
  Traversal Language
  Dataflow Processing
  Generic Graph API

g.V
g.E

v = g.V('name','Hercules')

v.out('father','mother')

v.out('father').out('brother').name
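The chained out steps can be pictured as repeatedly following labeled edges from the current set of vertices. The following is a plain-Python model of that idea, not Gremlin itself; the out function and the edge endpoints in EDGES are assumptions for illustration.

```python
# (tail, label, head); endpoints are assumed, the slide lists only the labels
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Hydra"),
    ("Hercules", "battled", "Cerberus"),
    ("Jupiter", "father", "Saturn"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Pluto", "pet", "Cerberus"),
]


def out(vertices, *labels):
    """Follow outgoing edges with any of the given labels (all labels if none given)."""
    return [
        dst
        for v in vertices
        for src, label, dst in EDGES
        if src == v and (not labels or label in labels)
    ]


parents = out(["Hercules"], "father", "mother")        # models v.out('father','mother')
uncles = out(out(["Hercules"], "father"), "brother")   # models v.out('father').out('brother')
print(parents, uncles)
```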

v.outE('battled').has('time',T.gt,5).inV.name

v.out('father').out('brother').has('age',T.lt,4200).name

g.query().has('age',T.gt,4200).vertices()

g.query().has('time',T.lt,5).edges()

saturn.as('x').in('father').loop('x'){it.loops < 3}.next()
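A loop step repeats a section of the traversal while a condition holds. The sketch below models it in plain Python under the assumption that the condition above results in two applications of the in('father') step, so that the traversal lands on Saturn's grandchild; in_step, repeat, and the edge endpoints are hypothetical.

```python
# (tail, label, head); endpoints are assumed for illustration
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Jupiter", "father", "Saturn"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
]


def in_step(vertices, label):
    """Follow incoming edges with the given label back to their tail vertices."""
    return [src for v in vertices for src, l, dst in EDGES if dst == v and l == label]


def repeat(step, start, times):
    """Apply one traversal step a fixed number of times."""
    frontier = list(start)
    for _ in range(times):
        frontier = step(frontier)
    return frontier


# Two hops against the direction of the father edges: Saturn's grandchildren
grandchildren = repeat(lambda vs: in_step(vs, "father"), ["Saturn"], 2)
print(grandchildren)
```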

g.V.sideEffect{
  it.rank = it.both.both.both.count()
}
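The side effect above assigns each vertex a rank equal to the number of three-step walks that start there, ignoring edge direction. A plain-Python sketch of that computation follows; the edge endpoints and the names both and walks are assumptions for illustration.

```python
# (tail, label, head); endpoints are assumed for illustration
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Hydra"),
    ("Hercules", "battled", "Cerberus"),
    ("Jupiter", "father", "Saturn"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Pluto", "pet", "Cerberus"),
]


def both(v):
    """Neighbors of v over outgoing and incoming edges, with multiplicity."""
    return [d for s, _, d in EDGES if s == v] + [s for s, _, d in EDGES if d == v]


def walks(v, steps):
    """Count walks of the given length starting at v (vertices may repeat)."""
    frontier = [v]
    for _ in range(steps):
        frontier = [n for u in frontier for n in both(u)]
    return len(frontier)


vertices = sorted({s for s, _, _ in EDGES} | {d for _, _, d in EDGES})
rank = {v: walks(v, 3) for v in vertices}

for name in sorted(rank, key=rank.get, reverse=True):
    print(name, rank[name])
```

Well-connected vertices accumulate more walks, which is why this count can serve as a crude centrality score.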

The Graph Landscape

[Figure: graph systems plotted by Size of Graph against Speed of Traversal/Process. Illustration only, not to scale.]

Aurelius Graph Cluster (Apache 2)

  TITAN: Stores a massive-scale property graph allowing real-time traversals and updates
  FAUNUS: Batch processing of large graphs with Hadoop
  FULGORA: Runs global graph algorithms on large, compressed, in-memory graphs

  Data flows between the components: Map/Reduce Load, Bulk Load, Analysis results back into Titan

Titan Features
  Numerous Concurrent Users
  Many Short Transactions
    read/write
  Real-time Traversals (OLTP)
  High Availability
  Dynamic Scalability
  Variable Consistency Model
    ACID or eventual consistency
  Real-time Big Graph Data

Storage Backends
Partitionability

Consistency
Availability

$ ./titan-0.2.0/bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = TitanFactory.open('/tmp/titan')
==>titangraph[local:/tmp/titan]
gremlin> v = g.V('name','Hercules')
==>v[4]
gremlin> v.out('father').out('brother').name

Vertex-Centric Indices
  Sort and index edges per vertex by primary key
  Primary key can be composite
  Enables efficient focused traversals
  Only retrieve edges that matter
  Uses push-down predicates for quick, index-driven retrieval

[Figure: vertex v with outgoing battled edges (time: 1, 3, 5, 9), mother and father edges, and two fought edges. Each step below narrows the set of edges retrieved.]

v.query()
  all edges of v
v.query().direction(OUT)
  drops the two fought edges
v.query().direction(OUT).labels('battled')
  keeps only the battled edges (time: 1, 3, 5, 9)
v.query().direction(OUT).labels('battled').has('time',T.lt,5)
  keeps only the battled edges with time: 1 and time: 3
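The idea behind a vertex-centric index can be sketched as keeping each vertex's edges sorted by a composite key and answering a query with a binary search over that key. This is a simplified model, not Titan's storage format; the key layout (direction, label, time) and the name query are assumptions.

```python
from bisect import bisect_left

# Edges of one vertex, sorted by the composite key (direction, label, time).
# Edges without a time property use a shorter key.
edges = sorted([
    ("OUT", "battled", 1),
    ("OUT", "battled", 3),
    ("OUT", "battled", 5),
    ("OUT", "battled", 9),
    ("OUT", "father"),
    ("OUT", "mother"),
    ("IN", "fought"),
    ("IN", "fought"),
])


def query(direction, label, time_lt):
    """Return edges with the given direction and label whose time is below time_lt.

    Both bounds are found by binary search, so only the matching slice is
    touched instead of scanning every edge of the vertex.
    """
    lo = bisect_left(edges, (direction, label, float("-inf")))
    hi = bisect_left(edges, (direction, label, time_lt))
    return edges[lo:hi]


matches = query("OUT", "battled", 5)
print(matches)
```

Because the predicate is evaluated against the sort order, a vertex with a very large number of edges still answers a narrow query by reading a small contiguous range.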

Titan Server

REST
REXPRO

$ wget http://s3.thinkaurelius.com/downloads/titan/titan-cassandra-0.3.0.zip
$ unzip titan-cassandra-0.3.0.zip
$ cd titan-cassandra-0.3.0
$ sudo bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties

Graph Indexing
  Vertex and Edge indexing
  Pluggable index provider
  ElasticSearch
  Lucene
  Full-text search
  Numeric range search
  Geographic search

[Figure: the same graph extended with title properties on vertices and location properties on the battled edges.]

Vertices:
  name: Neptune   age: 4500    title: God of the earth and ocean
  name: Alcmene   type: human  age: 45
  name: Jupiter   age: 4800    title: God of the heaven and skies
  name: Saturn    type: titan  age: 10000
  name: Hercules  title: Divine hero
  name: Hydra     type: monster
  name: Pluto     age: 4000    title: God of the underworld
  name: Cerberus  title: Ugly beast of the underworld

Edges:
  brother (2 edges)
  mother
  father (2 edges)
  battled  time: 2   location: [37.7,23.9]
  battled  time: 12  location: [39,22]
  pet

g.query().has('title',Txt.CONTAINS,'god').vertices()

g.query().has('age',GREATER_THAN,4500).has('title',CONTAINS,'god').vertices()
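A query that combines a numeric range with a full-text condition can be pictured as intersecting the results of two index lookups. The sketch below uses an inverted index for title tokens and a sorted list for ages. It is a toy model rather than ElasticSearch or Lucene, the helper names are hypothetical, and the vertex properties follow one reading of the figure.

```python
import bisect

VERTICES = {
    "Neptune": {"age": 4500, "title": "God of the earth and ocean"},
    "Jupiter": {"age": 4800, "title": "God of the heaven and skies"},
    "Pluto": {"age": 4000, "title": "God of the underworld"},
    "Saturn": {"age": 10000},
    "Alcmene": {"age": 45},
    "Hercules": {"title": "Divine hero"},
    "Cerberus": {"title": "Ugly beast of the underworld"},
}

# Inverted index: lower-cased title token -> set of vertex names
text_index = {}
for name, props in VERTICES.items():
    for token in props.get("title", "").lower().split():
        text_index.setdefault(token, set()).add(name)

# Range index: (age, name) pairs sorted by age
age_index = sorted((props["age"], name) for name, props in VERTICES.items() if "age" in props)
ages = [age for age, _ in age_index]


def title_contains(token):
    """Vertices whose title contains the token (case-insensitive)."""
    return text_index.get(token.lower(), set())


def age_greater_than(limit):
    """Vertices whose age is strictly greater than limit, via binary search."""
    start = bisect.bisect_right(ages, limit)
    return {name for _, name in age_index[start:]}


result = age_greater_than(4500) & title_contains("god")
print(result)
```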

g.query().has('location',WITHIN,Geoshape.circle(38,24,50)).edges()
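A circle query of this kind amounts to keeping the edges whose location lies within a given distance of a center point. The sketch below uses the haversine great-circle distance and assumes the coordinates are (latitude, longitude) and the radius is in kilometers; both are assumptions, as are the names haversine_km and within_circle.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# The two battled edges from the figure, with (latitude, longitude) locations
BATTLES = [
    {"label": "battled", "time": 2, "location": (37.7, 23.9)},
    {"label": "battled", "time": 12, "location": (39.0, 22.0)},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_circle(edges, lat, lon, radius_km):
    """Keep the edges whose location lies inside the circle."""
    return [e for e in edges if haversine_km(lat, lon, *e["location"]) <= radius_km]


hits = within_circle(BATTLES, 38, 24, 50)
print(hits)
```

A real geographic index avoids computing the distance to every edge by first narrowing the candidates spatially; the linear scan here only illustrates the predicate.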

Faunus Features
  Hadoop-based Graph Computing Framework
  Graph Analytics
  Breadth-first Traversals
  Global Graph Computations
  Batch Big Graph Data

Faunus Architecture

g._()

Faunus Work Flow

g.V.out.out.count()

hdfs://user/ubuntu/
  output/job-0/
  output/job-1/
  output/job-2/  { graph*, sideeffect* }

Compressed HDFS Graphs
  stored in sequence files
  variable length encoding
  prefix compression
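One way to picture how a breadth-first traversal such as g.V.out.out.count() runs as batch jobs is to treat each out step as a round in which every vertex sends its traverser count along its outgoing edges and the counts are summed per destination. The following is a plain-Python model of that idea. Whether Faunus splits the work into jobs exactly this way is an assumption, as are the edge endpoints and the name out_step.

```python
from collections import Counter

# (tail, label, head); endpoints are assumed for illustration
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Hydra"),
    ("Hercules", "battled", "Cerberus"),
    ("Jupiter", "father", "Saturn"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Pluto", "pet", "Cerberus"),
]


def out_step(counts):
    """One round: emit each vertex's count along its out-edges, then sum per head vertex."""
    next_counts = Counter()
    for src, _, dst in EDGES:
        if counts[src]:
            next_counts[dst] += counts[src]
    return next_counts


vertices = {s for s, _, _ in EDGES} | {d for _, _, d in EDGES}
counts = Counter({v: 1 for v in vertices})   # g.V: one traverser per vertex
counts = out_step(counts)                    # first .out
counts = out_step(counts)                    # second .out
total = sum(counts.values())                 # .count()
print(total)
```

Only a count per vertex moves between rounds, never the individual paths, which keeps the intermediate data proportional to the number of vertices.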

Faunus Setup

$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat]
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=dbpedia
==>faunus.output.location.overwrite=true
gremlin> g._()
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003

Build a Knowledge Graph
  Based on DBPedia
  Graph version of Wikipedia
  ~290 million edges (~1B triples)
1.  Bulk load RDF into Faunus
  6 m1.xlarge
2.  Convert to property graph
3.  Bulk load into Titan
  3 m1.xlarge with Cassandra
4.  OLTP+OLAP
  Total Time: ~ 2 hours

Graph OLTP

gremlin> g = TitanFactory.open('bin/cassandra.local')
==>titangraph[cassandrathrift:10.176.213.110]

gremlin> g.V('name','Random_walker_algorithm').both.name
==>Random_walk
==>Segmentation_(image_processing)
==>Graph_(mathematics)
==>Laplacian_matrix
==>Graph
==>Laplacian_matrix
==>Electrical_network
==>Resistor
==>Electrical_resistance_and_conductance
==>Ground_(electricity)
==>Direct_current
==>Voltage_source
==>Precomputation
==>Category:Computer_vision
==>Random_Walker_(Computer_Vision)
==>List_of_algorithms
==>Segmentation_(image_processing)
==>Watershed_(image_processing)
==>Random_walker_(computer_vision)
==>Random_Walker_(computer_vision)

Graph OLAP

gremlin> g.V('name','Learning').out.out.out.out[0..10].name
==>Latium
==>Roman_Kingdom
==>Roman_Republic
==>Roman_Empire
==>Middle_Ages
==>Early_modern_Europe
==>Armenian_Kingdom_of_Cilicia
==>Lingua_franca
==>Vatican_City
==>Vulgar_Latin
==>Romance_languages

Complex Problem

1.  Identify Entities
2.  Identify Relationships
3.  Apply Graph Analysis

Aurelius Graph Cluster
TITAN FAUNUS FULGORA
aureliusgraphs@googlegroups.com

Thanks!

Vadas Gintautas (@vadasg)
Marko Rodriguez (@twarko)
Stephen Mallette (@spmallette)
Daniel LaRocque

AURELIUS
THINKAURELIUS.COM

We are Hiring

AURELIUS
THINKAURELIUS.COM
@AURELIUSGRAPHS
