THE PATH FORWARD 
TITAN 1.0 + TINKERPOP3 
MARKO A. RODRIGUEZ 
http://THINKAURELIUS.COM 
http://TINKERPOP.COM
PART 1: 
INTRODUCTION TO GRAPHS
VERTEX
software
software 
LABEL
name:gremlin 
software
PROPERTY 
KEY VALUE 
name:gremlin 
software
name:gremlin 
use:queries 
software
name:gremlin 
use:queries 
software 
name:titan 
use:database 
software
name:gremlin 
use:queries 
software 
name:titan 
use:database 
software 
dependsOn
name:gremlin 
use:queries 
software 
name:titan 
use:database 
software 
EDGE dependsOn
name:gremlin 
use:queries 
software 
name:titan 
use:database 
software 
dependsOn 
since:0.1
GRAPH
COMPUTING 
PROCESS STRUCTURE
COMPUTING 
PROCESS STRUCTURE 
TRAVERSAL GRAPH
TRAVERSAL GRAPH 
PROCESS STRUCTURE 
COMPUTING GRAPH-BASED 
COMPUTING
WhY GRAPH-BASED COMPUTING?
WhY GRAPH-BASED COMPUTING? 
INTUITIVE MODELING
WhY GRAPH-BASED COMPUTING? 
INTUITIVE MODELING 
EXPRESSIVE QUERYING
WhY GRAPH-BASED COMPUTING? 
INTUITIVE MODELING 
EXPRESSIVE QUERYING 
NUMEROUS ANALYSES 
Mixing Patterns 
Centrality 
Ranking 
Path Expressions 
Geodesics 
Inference 
Motifs 
Scoring
ANALYSES ARE THE 
EPIPHENOMENA OF TRAVERSAL 
f( )→
WHAT IS THE SIGNIFICANCE OF 
GRAPH ANALYSIS?
ANALYSES YIELD 
INSIGHTS ABOUT THE MODEL 
= 
DATA 
PRODUCTS 
DATA-DRIVEN 
DECISION SUPPORT
RECOMMENDATION 
People you may know. 
Products you might like. 
Movies you should watch and 
the friends you should watch them with. 
SOCIAL GRAPH 
RATINGS GRAPH 
SOCIAL+RATINGS 
GRAPH
WHO ELSE MIGHT HERCULES KNOW? 
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
gremlin> hercules 
==>v[0]
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
gremlin> hercules.out('knows') 
==>v[1] 
==>v[2] 
==>v[3]
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
gremlin> hercules.out('knows').out('knows') 
==>v[4] 
==>v[5] 
==>v[5] 
==>v[6] 
==>v[5]
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
gremlin> hercules.out('knows').out('knows').groupCount() 
==>v[4]=1 
==>v[5]=3 
==>v[6]=1
HERCULES PROBABLY KNOWS NEPTUNE 
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
knows
HERCULES PROBABLY KNOWS NEPTUNE 
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother 
...PROBABLY MORE SO WHEN OTHER 
TYPES OF EDGES ARE ANALYZED
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother 
likes
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother 
likes 
SOCIAL GRAPH
human flesh 
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother 
7 
likes 
SOCIAL GRAPH
human flesh 
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother 
7 
likes 
likes 
likes 
SOCIAL GRAPH
human flesh 
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother 
7 
likes 
likes 
likes 
tartarus 
8 
SOCIAL GRAPH
human flesh 
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother 
7 
likes 
likes 
likes 
tartarus 
8 
likes likes 
dislikes 
SOCIAL GRAPH
human flesh 
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother 
7 
likes 
likes 
likes 
tartarus 
8 
likes likes 
dislikes 
SOCIAL GRAPH 
RATINGS GRAPH
human flesh 
hercules 
1 
0 2 
3 
4 
5 
6 
cerberus 
nemean 
hydra 
knows 
knows 
knows 
pluto 
neptune 
jupiter 
knows 
knows 
knows 
knows 
knows 
father 
brother 
7 
likes 
likes 
likes 
composedOf 
tartarus 
8 
likes likes 
dislikes 
NEMEAN MIGHT LIKE TARTARUS 
smellsOf 
SOCIAL GRAPH 
RATINGS GRAPH 
PRODUCT GRAPH 
* Collaborative Filtering + Content-Based Recommendation
PATH FINDING 
How is this person related to this film? 
Which authors of this book also 
wrote a New York Times bestseller? 
Which movies are based on a book by a 
New York Times bestseller? 
MOVIE GRAPH 
BOOK GRAPH 
MOVIE+BOOK 
GRAPH
WHO PLAYED HERCULES 
IN WHAT MOVIE? 
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
WHO PLAYED HERCULES 
IN WHAT MOVIE? 
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
gremlin> hercules.match('hercules', 
actor 
role 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
WHO PLAYED HERCULES 
IN WHAT MOVIE? 
hercules 
0 
arnold 
schwarzenegger 
role 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
gremlin> hercules.match('hercules', 
g.of().as('hercules').out('depictedIn').as('movie'), 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
WHO PLAYED HERCULES 
IN WHAT MOVIE? 
hercules 
0 
arnold 
schwarzenegger 
role 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
gremlin> hercules.match('hercules', 
g.of().as('hercules').out('depictedIn').as('movie'), 
g.of().as('movie').out('hasActor').as('actor'), 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
WHO PLAYED HERCULES 
IN WHAT MOVIE? 
hercules 
0 
arnold 
schwarzenegger 
role 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
gremlin> hercules.match('hercules', 
g.of().as('hercules').out('depictedIn').as('movie'), 
g.of().as('movie').out('hasActor').as('actor'), 
g.of().as('actor').out('role').as('hercules'), 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
WHO PLAYED HERCULES 
IN WHAT MOVIE? 
hercules 
0 
arnold 
schwarzenegger 
role 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
gremlin> hercules.match('hercules', 
g.of().as('hercules').out('depictedIn').as('movie'), 
g.of().as('movie').out('hasActor').as('actor'), 
g.of().as('actor').out('role').as('hercules'), 
g.of().as('actor').out('actor').as('star')) 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
WHO PLAYED HERCULES 
IN WHAT MOVIE? 
hercules 
0 
arnold 
schwarzenegger 
role 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
gremlin> hercules.match('hercules', 
g.of().as('hercules').out('depictedIn').as('movie'), 
g.of().as('movie').out('hasActor').as('actor'), 
g.of().as('actor').out('role').as('hercules'), 
g.of().as('actor').out('actor').as('star')) 
.select(['movie','star']) 
==>[movie:v[7], star:v[9]] 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
WHO PLAYED HERCULES 
IN WHAT MOVIE? 
hercules 
0 
arnold 
schwarzenegger 
role 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
gremlin> hercules.match('hercules', 
g.of().as('hercules').out('depictedIn').as('movie') 
g.of().as('movie').out('hasActor').as('actor') 
g.of().as('actor').out('role').as('hercules') 
g.of().as('actor').out('actor').as('star')) 
.select(['movie','star']){it.name} 
==>[movie:hercules in new york, star:arnold schwarzenegger] 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn 
12 the arms of 
hercules 
depictedIn
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn 
12 the arms of 
hercules 
fred 
saberhagen 
13 
depictedIn 
writtenBy
albuquerque 
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn 
12 the arms of 
hercules 
fred 
saberhagen 
13 
depictedIn 
writtenBy 
14 
livesIn
albuquerque 
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
ernest 
graves 
actor 
11 
depictedIn 
12 the arms of 
hercules 
fred 
saberhagen 
13 
depictedIn 
writtenBy 
14 
livesIn 
15 
25-North 
santa fe
albuquerque 
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
marko 
rodriguez 
ernest 
graves 
actor 
11 
depictedIn 
12 the arms of 
hercules 
fred 
saberhagen 
13 
depictedIn 
writtenBy 
14 
livesIn 
15 
25-North 
santa fe 
16 
livesIn
albuquerque 
hercules 
0 
arnold 
schwarzenegger 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
marko 
rodriguez 
ernest 
graves 
actor 
11 
depictedIn 
12 the arms of 
hercules 
fred 
saberhagen 
13 
depictedIn 
writtenBy 
14 
livesIn 
15 
25-North 
santa fe 
16 
livesIn 
thinksHeIs
albuquerque 
hercules 
0 
arnold 
schwarzenegger 
BOOK GRAPH 
hasActor 
jupiter 
10 7 
8 
depictedIn 
hercules in 
new york 
actor 
role 
9 
hasActor 
6 
role 
marko 
rodriguez 
ernest 
graves 
actor 
11 
depictedIn 
12 the arms of 
hercules 
fred 
saberhagen 
13 
depictedIn 
writtenBy 
14 
livesIn 
15 
25-North 
santa fe 
16 
livesIn 
thinksHeIs 
MOVIE GRAPH 
TRANSPORTATION GRAPH 
PROFILE 
GRAPH
SOCIAL INFLUENCE 
Who are the most influential people in 
java, mathematics, art, surreal art, politics, ...? 
Which region of the social graph will propagate this 
advertisement this furthest? 
Which 3 experts should review this submitted article? 
Which people should I talk to at the upcoming 
conference and what topics should 
I talk to them about? 
SOCIAL + COMMUNICATION + EXPERTISE + EVENT GRAPH
PATTERN IDENTIFICATION 
This connectivity pattern is a sign of financial fraud. 
When this motif is found, a red flag will be raised. 
TRANSACTION GRAPH 
Healthy discourse is typified by a discussion board 
with a branch factor in this range and a concept 
clique score in this range. 
DISCUSSION GRAPH
KNOWLEDGE DISCOVERY 
The terms "ice", "fans", "stanley cup," 
are classified as "sports" 
Given that all identified birds fly, 
it can be deduced that all birds fly. 
If contrary evidence is provided, 
then this "fact" can be retracted. 
WIKIPEDIA GRAPH 
EVIDENTIAL LOGIC GRAPH
WORLD MODEL
WORLD PROCESSES 
WORLD MODEL
WORLD PROCESSES 
WORLD MODEL 
A single world model and various types of traversers 
moving through that model to solve problems.
TRAVERSAL GRAPH 
PROCESS STRUCTURE 
COMPUTING GRAPH-BASED 
COMPUTING
PART 2: 
THE PATH TO TITAN 1.0
WhY CREATE TITAN? 
A number of Aurelius' clients... 
...need to represent and process 
graphs at the 100+ billion edge 
scale w/ thousands of concurrent 
transactions. 
...need both local graph traversals 
(OLTP) and batch graph 
processing (OLAP). 
...desire a free, open source 
distributed graph database.
TITAN's KEY FEATURES 
Titan provides... 
..."infinite size" graphs and 
"unlimited" users by means of a 
distributed storage engine. 
...real-time local traversals (OLTP) 
and support for global batch 
processing via Hadoop (OLAP). 
...distribution via the liberal, free, 
open source Apache2 license.
-OR-BACKEND 
AGNOSTIC
TITAN DISTRIBUTED 
VIA CASSANDRA 
titan$ bin/gremlin.sh 
,,,/ 
(o o) 
-----oOOo-(_)-oOOo----- 
gremlin> conf = new BaseConfiguration(); 
==>org.apache.commons.configuration.BaseConfiguration@763861e6 
gremlin> conf.setProperty("storage.backend","cassandra"); 
gremlin> conf.setProperty("storage.hostname","77.77.77.77"); 
gremlin> g = TitanFactory.open(conf); 
==>titangraph[cassandra:77.77.77.77] 
gremlin>
INHERITED FEATURES 
Continuously available with no single point of failure. 
No write bottlenecks to the graph as there is no master/slave architecture. 
Built-in replication ensures data is available during machine failure. 
Caching layer ensures that continuously accessed data is available in memory. 
Elastic scalability allows for the introduction and removal of machines. 
Cassandra available at http://cassandra.apache.org/
TITAN DISTRIBUTED 
titan$ bin/gremlin.sh 
,,,/ 
(o o) 
VIA HBASE 
-----oOOo-(_)-oOOo----- 
gremlin> conf = new BaseConfiguration(); 
==>org.apache.commons.configuration.BaseConfiguration@763861e6 
gremlin> conf.setProperty("storage.backend","hbase"); 
gremlin> conf.setProperty("storage.hostname","77.77.77.77"); 
gremlin> g = TitanFactory.open(conf); 
==>titangraph[hbase:77.77.77.77] 
gremlin>
INHERITED FEATURES 
Strictly consistent reads and writes. 
Linear scalability with the addition of machines. 
Base classes for backing Hadoop MapReduce jobs with HBase tables. 
HDFS-based data replication. 
Generally good integration with the tools in the Hadoop ecosystem. 
HBase available at http://hbase.apache.org/
Titan is all about ...
Titan is all about numerous concurrent users...
Titan is all about numerous concurrent users... 
high availability....
Titan is all about numerous concurrent users... 
high availability.... 
dynamic scalability...
VERTEX-CENTRIC INDICES 
THE SUPER NODE PROBLEM 
Natural, real-world graphs contain 
vertices of high degree. 
Even if rare, their degree ensures that 
they exist on many paths. 
Traversing a high degree vertex 
means touching numerous incident 
edges and potentially touching most 
of the graph in only a few steps.
VERTEX-CENTRIC INDICES 
A SUPER NODE SOLUTION 
A "super node" only exists from the 
vantage point of classic "textbook 
style" graphs. 
In the world of property graphs, 
intelligent disk-level filtering can 
interpret a "super node" as a more 
manageable low-degree vertex. 
Vertex-centric querying utilizes sort 
orders for speedy lookup of incident 
edges with particular qualities.
VERTEX-CENTRIC INDICES 
PUSHDOWN PREDICATES 
stars:5 
vertex.bothE() 
likes likes 
stars:2 
stars:2 
knows knows 
likes likes 
knows 
likes 
stars:3 stars:3 
8 edges
VERTEX-CENTRIC INDICES 
PUSHDOWN PREDICATES 
stars:5 
likes likes 
likes 
knows knows 
stars:3 stars:3 
likes likes 
stars:2 
stars:2 
vertex.outE() 
7 edges
VERTEX-CENTRIC INDICES 
PUSHDOWN PREDICATES 
stars:5 
likes likes 
likes 
stars:3 stars:3 
likes likes 
stars:2 
stars:2 
vertex.outE("likes") 
5 edges
VERTEX-CENTRIC INDICES 
PUSHDOWN PREDICATES 
stars:5 
likes 
1 edge 
vertex.outE("likes").has("stars",5)
VERTEX-CENTRIC INDICES 
DISK-LEVEL SORTING/INDEXING 
likes 
likes 
likes 
knows 
knows 
stars:1 
stars:2 
stars:5
VERTEX-CENTRIC INDICES 
DISK-LEVEL SORTING/INDEXING 
likes 
likes 
likes 
knows 
knows 
likes 
knows 
stars:1 
stars:2 
stars:5
VERTEX-CENTRIC INDICES 
DISK-LEVEL SORTING/INDEXING 
likes 
likes 
likes 
knows 
knows 
likes w/ time 1-3 
knows 
stars:1 
stars:2 
stars:5 
likes w/ time 4-5
VERTEX-CENTRIC INDICES 
DISK-LEVEL SORTING/INDEXING 
likes 
likes 
likes 
knows 
knows 
likes w/ time 1-3 
knows 
schema = g.openManagementSystem() 
stars = schema 
.makePropertyKey("stars") 
.dataType(Integer.class) 
.make() 
likes = schema 
.makeEdgeLabel('likes') 
.make() 
schema.buildEdgeIndex(likes, 
'timeIdx',BOTH,DESC,stars) 
stars:1 
stars:2 
stars:5 
likes w/ time 4-5
THE TITAN OLAP STORY 
0 
1 
2 
3 
A 
B 
A 
C 
a:b 
c:d 
e:f 
g:h 
i:j 
4 
5 
6 
7 
vertex 
id props 
incoming edges outgoing edges 
edge 
id props label id 
edge 
id props label id 
edge 
id label id 
edge 
id label id 
1 e:f 4 c:d A 2 5 B 0 6 g:h A 3 7 C 3
0 
1 
3 
4 
5 
6 
7 
8 
9 
10 
11 
AN ADJACENCY LIST
AN ADJACENCY LIST 
+ 
CLUSTER 
127.0.0.2 127.0.0.3 127.0.0.4 
0 
1 
3 
4 
5 
6 
7 
8 
9 
10 
11
A DISTRIBUTED ADJACENCY LIST 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
127.0.0.2 127.0.0.3 127.0.0.4
HADOOP 
Hadoop is a distributed computing platform composed of two key components: 
HDFS: 
A distributed file system that stores arbitrarily large files within a cluster. 
MapReduce: 
A parallel functional computing model for key/value pair data. 
http://hadoop.apache.org
FAUNUS AND HADOOP 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
Process 
Structure 
127.0.0.2 127.0.0.3 127.0.0.4 
Faunus provides graph input/output formats (structure) 
and a traversal language for graphs (process).
THE TITAN OLAP STORY 
Serial Key/Value Data Structure Indexed Key/Indexed Value Data Structure 
0 
1 
3 
4 
5 
6 
7 
8 
9 
10 
11 
0 
1 
3 
4 
5 
6 
7 
8 
9 
10 
11
Adam Jacobs. 2009. The Pathologies of Big Data. Communications of the ACM 52, 8 (August 2009), 36-44. 
doi:10.1145/1536616.1536632 http://doi.acm.org/10.1145/1536616.1536632
INTRA-CLUSTER CONFIGURATION 
Titan/Cassandra Faunus/Hadoop 
Data is processed on the machine where it is located. 
Limited network communication.
INTER-CLUSTER CONFIGURATION 
Graph data is offloaded to another cluster. 
Repeated analysis does not interfere with production graph database.
THE TITAN OLAP STORY 
Titan currently provides a Hadoop OLAP engine 
Titan-Hadoop (nicknamed Faunus) 
Titan 1.0 will also provide an in-memory, off head, single-machine OLAP engine. 
Titan-Memory (nicknamed Fulgora)
PART 3: 
THE PATH TO TINKERPOP3
TINKERPOP3 FEATURES 
A standard, vendor-agnostic graph query language. 
Both OLTP and OLAP engines for evaluating queries. 
A way for developers to create custom, domain specific variants. 
A server system for pushing queries to the server for evaluation. 
A vertex-centric computing framework for distributed graph algorithms.
gremlin>
gremlin> gremlin = g.addVertex(label,'software','name','gremlin') 
==>v[0] 
name:gremlin 
0 
software
gremlin> gremlin = g.addVertex(label,'software','name','gremlin') 
==>v[0] 
gremlin> titan = g.addVertex(label,'software','name','titan') 
==>v[2] 
name:titan name:gremlin 
0 
software 
2 
software
gremlin> gremlin = g.addVertex(label,'software','name','gremlin') 
==>v[0] 
gremlin> titan = g.addVertex(label,'software','name','titan') 
==>v[2] 
gremlin> titan.addEdge('dependsOn',gremlin,'since','0.1') 
==>e[4][2-dependsOn->0] 
name:titan name:gremlin 
0 
software 
2 
software 
dependsOn 
since:0.1
Gremlin is a style of graph traversal that is 
heavily influenced by functional programming. 
gremlin> gremlin = g.addVertex(label,'software','name','gremlin') 
==>v[0] 
gremlin> titan = g.addVertex(label,'software','name','titan') 
==>v[2] 
gremlin> titan.addEdge('dependsOn',gremlin,'since','0.1') 
==>e[4][2-dependsOn->0] 
////////////////////////////////////////////////////////////////// 
gremlin> g.V 
==>v[0] 
==>v[2] 
gremlin> g.V.has('name','titan') 
==>v[2] 
gremlin> g.V.has('name','titan').outE 
==>e[4][2-dependsOn->0] 
gremlin> g.V.has('name','titan').outE.since 
==>0.1 
gremlin> g.V.has('name','titan').out.name 
==>gremlin
g.V.has('name','titan').out.name 
What are the names of the vertices outgoing adjacent to Titan?
g.V.has('name','titan').out.name 
name=gremlin 
0 
2 
name=titan 
For the graph 'g'... 
Traversals are defined using fluent, function composition.
g.V.has('name','titan').out.name 
name=gremlin 
0 
2 
name=titan 
…get all the vertices... 
Objects flow through the traversal in a lazy evaluation manner.
g.V.has('name','titan').out.name 
2 
name=titan 
…filter all vertices that don't have the name 'titan'… 
The general functions are map(), flatMap(), filter(), sideEffect(), and branch().
g.V.has('name','titan').out.name 
name=gremlin 
0 
…get all outgoing adjacent vertices… 
Any programming language can provide a Gremlin implementation.
g.V.has('name','titan').out.name 
gremlin 
…get the names of those vertices... 
Any graph system vendor can provide a binding to Gremlin.
THE GENERIC STEPS 
? 
S filter S S map E S 
flatMap 
sideEffect branch 
A 
E 
E E 
E 
E 
E 
S S S 
S 
... 
S 
? 
? 
has() 
retain() 
except() 
interval() 
simplePath() 
dedup() 
timeLimit() 
... 
back() 
fold() 
valueMap() 
match() 
orderBy() 
shuffle() 
... 
out() 
outE() 
inV() 
both() 
values() 
unfold() 
... 
aggregate() 
store() 
groupCount() 
groupBy() 
tree() 
count() 
subgraph() 
... 
jump() 
until() 
choose() 
union() 
...
TINKERPOP'S OLAP STORY 
Execute parallel, distributed Gremlin traversals. 
Execute vertex-centric message passing algorithms.
TINKERPOP'S OLAP STORY 
Execute parallel, distributed Gremlin traversals. 
Execute vertex-centric message passing algorithms. 
traversers are the messages!
OLAP 
1 
1 
1 
1 
2 
2 
2 
3 
1 
2 
3 
OLTP 
✓ Real-time queries. 
✓ Touching a subset of the graph. 
✓ Numerous concurrent queries. 
✓ Good for local graph mutations. 
✓ Long running queries. 
✓ Touching the entire of the graph. 
✓ Single user. 
✓ Good for bulk loading/mutations.
g.compute() 
GraphComputer 
Vertex 
Program 
program 
mapReduce 
distributed 
to all workers 
MapReduce 
MapReduce 
MapReduce 
worker 1 
worker 2 
worker 3 
1. Logically distribute the vertex program to all the vertices in the graph. 
2. The vertices execute the vertex program sending and receiving messages on each iteration. 
3. Any number of MapReduce jobs can follow to aggregate the computed data in the graph.
gremlin> result = g.compute().program(PageRankVertexProgram).submit() 
==>result[tinkergraph[vertices:6 edges:6],memory[size:0]] 
gremlin> result.memory.iteration 
==>30 
gremlin> result.graph 
==>tinkergraph[vertices:6 edges:6] 
gremlin> result.graph.V.map{ [it.id,it.pageRank]} 
==>[1, 0.15000000000000002] 
==>[2, 0.19250000000000003] 
==>[3, 0.4018125] 
==>[4, 0.19250000000000003] 
==>[5, 0.23181250000000003] 
==>[6, 0.15000000000000002] 
* Minor tweaks to example to make it fit nicely on a single line.
gremlin> g.V.both.both.name.groupCount() 
==>[ripple:3, peter:3, vadas:3, josh:7, lop:7, marko:7] 
gremlin> g.V.both.both.name.groupCount().submit(g.compute()) 
==>[ripple:3, peter:3, vadas:3, josh:7, lop:7, marko:7] 
OLTP 
OLAP
MULTIPLE GRAPH COMPUTERS 
FOR THE SAME GRAPH 
gremlin> g = TitanFactory.open(conf) 
==>titangraph[66.77.88.99] 
gremlin> g.compute(FaunusGraphComputer) 
.program(PageRankVertexProgram) 
.submit() 
==>result[faunusgraph,memory[size:0]] 
gremlin> g = TitanFactory.open(conf) 
==>titangraph[66.77.88.99] 
gremlin> g.compute(FulgoraGraphComputer) 
.program(PageRankVertexProgram) 
.submit() 
==>result[fulgoragraph,memory[size:0]]
GREMLIN SERVER
GREMLIN SERVER 
gremlin> g = TitanFactory.open(conf) 
==>titangraph[66.77.88.99] 
localhost
GREMLIN SERVER 
gremlin> g = TitanFactory.open(conf) 
==>titangraph[66.77.88.99] 
gremlin> g.V.has('name','titan') 
==>v[2] 
localhost
GREMLIN SERVER 
gremlin> g = TitanFactory.open(conf) 
==>titangraph[66.77.88.99] 
gremlin> g.V.has('name','titan') 
==>v[2] 
gremlin> g.V.has('name','titan').out 
localhost 
1. get the titan vertex 
2. get titan's adjacents
GREMLIN SERVER 
gremlin> g = TitanFactory.open(conf) 
==>titangraph[66.77.88.99] 
gremlin> g.V.has('name','titan') 
==>v[2] 
gremlin> g.V.has('name','titan').out 
localhost 
1. get the titan vertex 
2. get titan's adjacents 
CHATTY 
Each step in the traversal is a round trip
GREMLIN SERVER
GREMLIN SERVER
GREMLIN SERVER 
gremlin> :remote connect tinkerpop.server titan.yaml 
==>connected - 66.77.88.99:8182,11.22.33.44:8182,... 
localhost
GREMLIN SERVER 
gremlin> :remote connect tinkerpop.server titan.yaml 
==>connected - 66.77.88.99:8182,11.22.33.44:8182,... 
gremlin> :> g.V.has('name','titan') 
==>v[2] 
localhost 
g.V.has('name','titan')
GREMLIN SERVER 
gremlin> :remote connect tinkerpop.server titan.yaml 
==>connected - 66.77.88.99:8182,11.22.33.44:8182,... 
gremlin> :> g.V.has('name','titan') 
==>v[2] 
gremlin> :> g.V.has('name','titan').out 
localhost 
g.V.has('name','titan').out 
LESS CHATTY 
1. Gremlin Server client knows the hash function 
to query the machine with the Titan vertex 
2. Push the traversal to the server and 
get a WebSockets iterator back
GREMLIN SERVER 
Allows non-JVM languages to interact with a remote graph. 
Supports stored procedures -- MyAlgorithms.class 
Supports WebSockets, REST, and various serialization mechanisms.
CREDITS 
PRESENTED BY 
MARKO A. RODRIGUEZ 
MANY THANKS TO 
MATTHIAS BRöCHELER 
STEPHEN MALLETTE 
DAN LAROCQUE 
DANIEL KUPPITZ 
BOB BRIODY 
BRYN COOKE 
JOSH SHINAVIER 
PAVEL YASKEVICH 
AURELIUS COMMUNITY 
TINKERPOP COMMUNITY 
KETRINA YIM

The Path Forward

  • 1.
    THE PATH FORWARD TITAN 1.0 + TINKERPOP3 MARKO A. RODRIGUEZ http://THINKAURELIUS.COM http://TINKERPOP.COM
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
    PROPERTY KEY VALUE name:gremlin software
  • 8.
  • 9.
    name:gremlin use:queries software name:titan use:database software
  • 10.
    name:gremlin use:queries software name:titan use:database software dependsOn
  • 11.
    name:gremlin use:queries software name:titan use:database software EDGE dependsOn
  • 12.
    name:gremlin use:queries software name:titan use:database software dependsOn since:0.1
  • 13.
  • 14.
  • 15.
  • 16.
    TRAVERSAL GRAPH PROCESSSTRUCTURE COMPUTING GRAPH-BASED COMPUTING
  • 17.
  • 18.
    WhY GRAPH-BASED COMPUTING? INTUITIVE MODELING
  • 19.
    WhY GRAPH-BASED COMPUTING? INTUITIVE MODELING EXPRESSIVE QUERYING
  • 20.
    WhY GRAPH-BASED COMPUTING? INTUITIVE MODELING EXPRESSIVE QUERYING NUMEROUS ANALYSES Mixing Patterns Centrality Ranking Path Expressions Geodesics Inference Motifs Scoring
  • 21.
    ANALYSES ARE THE EPIPHENOMENA OF TRAVERSAL f( )→
  • 22.
    WHAT IS THESIGNIFICANCE OF GRAPH ANALYSIS?
  • 23.
    ANALYSES YIELD INSIGHTSABOUT THE MODEL = DATA PRODUCTS DATA-DRIVEN DECISION SUPPORT
  • 24.
    RECOMMENDATION People youmay know. Products you might like. Movies you should watch and the friends you should watch them with. SOCIAL GRAPH RATINGS GRAPH SOCIAL+RATINGS GRAPH
  • 25.
    WHO ELSE MIGHTHERCULES KNOW? hercules 1 0 2 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows
  • 26.
    hercules 1 02 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules ==>v[0]
  • 27.
    hercules 1 02 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules.out('knows') ==>v[1] ==>v[2] ==>v[3]
  • 28.
    hercules 1 02 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules.out('knows').out('knows') ==>v[4] ==>v[5] ==>v[5] ==>v[6] ==>v[5]
  • 29.
    hercules 1 02 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows gremlin> hercules.out('knows').out('knows').groupCount() ==>v[4]=1 ==>v[5]=3 ==>v[6]=1
  • 30.
    HERCULES PROBABLY KNOWSNEPTUNE hercules 1 0 2 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows knows
  • 31.
    HERCULES PROBABLY KNOWSNEPTUNE hercules 1 0 2 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother ...PROBABLY MORE SO WHEN OTHER TYPES OF EDGES ARE ANALYZED
  • 32.
    hercules 1 02 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother
  • 33.
    hercules 1 02 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother likes
  • 34.
    hercules 1 02 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother likes SOCIAL GRAPH
  • 35.
    human flesh hercules 1 0 2 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 likes SOCIAL GRAPH
  • 36.
    human flesh hercules 1 0 2 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 likes likes likes SOCIAL GRAPH
  • 37.
    human flesh hercules 1 0 2 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 likes likes likes tartarus 8 SOCIAL GRAPH
  • 38.
    human flesh hercules 1 0 2 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 likes likes likes tartarus 8 likes likes dislikes SOCIAL GRAPH
  • 39.
    human flesh hercules 1 0 2 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 likes likes likes tartarus 8 likes likes dislikes SOCIAL GRAPH RATINGS GRAPH
  • 40.
    human flesh hercules 1 0 2 3 4 5 6 cerberus nemean hydra knows knows knows pluto neptune jupiter knows knows knows knows knows father brother 7 likes likes likes composedOf tartarus 8 likes likes dislikes NEMEAN MIGHT LIKE TARTARUS smellsOf SOCIAL GRAPH RATINGS GRAPH PRODUCT GRAPH * Collaborative Filtering + Content-Based Recommendation
  • 41.
    PATH FINDING Howis this person related to this film? Which authors of this book also wrote a New York Times bestseller? Which movies are based on a book by a New York Times bestseller? MOVIE GRAPH BOOK GRAPH MOVIE+BOOK GRAPH
  • 42.
    WHO PLAYED HERCULES IN WHAT MOVIE? hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 43.
    WHO PLAYED HERCULES IN WHAT MOVIE? hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york gremlin> hercules.match('hercules', actor role 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 44.
    WHO PLAYED HERCULES IN WHAT MOVIE? hercules 0 arnold schwarzenegger role hasActor jupiter 10 7 8 depictedIn hercules in new york actor gremlin> hercules.match('hercules', g.of().as('hercules').out('depictedIn').as('movie'), 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 45.
    WHO PLAYED HERCULES IN WHAT MOVIE? hercules 0 arnold schwarzenegger role hasActor jupiter 10 7 8 depictedIn hercules in new york actor gremlin> hercules.match('hercules', g.of().as('hercules').out('depictedIn').as('movie'), g.of().as('movie').out('hasActor').as('actor'), 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 46.
    WHO PLAYED HERCULES IN WHAT MOVIE? hercules 0 arnold schwarzenegger role hasActor jupiter 10 7 8 depictedIn hercules in new york actor gremlin> hercules.match('hercules', g.of().as('hercules').out('depictedIn').as('movie'), g.of().as('movie').out('hasActor').as('actor'), g.of().as('actor').out('role').as('hercules'), 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 47.
    WHO PLAYED HERCULES IN WHAT MOVIE? hercules 0 arnold schwarzenegger role hasActor jupiter 10 7 8 depictedIn hercules in new york actor gremlin> hercules.match('hercules', g.of().as('hercules').out('depictedIn').as('movie'), g.of().as('movie').out('hasActor').as('actor'), g.of().as('actor').out('role').as('hercules'), g.of().as('actor').out('actor').as('star')) 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 48.
    WHO PLAYED HERCULES IN WHAT MOVIE? hercules 0 arnold schwarzenegger role hasActor jupiter 10 7 8 depictedIn hercules in new york actor gremlin> hercules.match('hercules', g.of().as('hercules').out('depictedIn').as('movie'), g.of().as('movie').out('hasActor').as('actor'), g.of().as('actor').out('role').as('hercules'), g.of().as('actor').out('actor').as('star')) .select(['movie','star']) ==>[movie:v[7], star:v[9]] 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 49.
    WHO PLAYED HERCULES IN WHAT MOVIE? hercules 0 arnold schwarzenegger role hasActor jupiter 10 7 8 depictedIn hercules in new york actor gremlin> hercules.match('hercules', g.of().as('hercules').out('depictedIn').as('movie') g.of().as('movie').out('hasActor').as('actor') g.of().as('actor').out('role').as('hercules') g.of().as('actor').out('actor').as('star')) .select(['movie','star']){it.name} ==>[movie:hercules in new york, star:arnold schwarzenegger] 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 50.
    hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 51.
    hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role ernest graves actor 11 depictedIn
  • 52.
    hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role ernest graves actor 11 depictedIn 12 the arms of hercules depictedIn
  • 53.
    hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy
  • 54.
    albuquerque hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 livesIn
  • 55.
    albuquerque hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 livesIn 15 25-North santa fe
  • 56.
    albuquerque hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role marko rodriguez ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 livesIn 15 25-North santa fe 16 livesIn
  • 57.
    albuquerque hercules 0 arnold schwarzenegger hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role marko rodriguez ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 livesIn 15 25-North santa fe 16 livesIn thinksHeIs
  • 58.
    albuquerque hercules 0 arnold schwarzenegger BOOK GRAPH hasActor jupiter 10 7 8 depictedIn hercules in new york actor role 9 hasActor 6 role marko rodriguez ernest graves actor 11 depictedIn 12 the arms of hercules fred saberhagen 13 depictedIn writtenBy 14 livesIn 15 25-North santa fe 16 livesIn thinksHeIs MOVIE GRAPH TRANSPORTATION GRAPH PROFILE GRAPH
  • 59.
    SOCIAL INFLUENCE Whoare the most influential people in java, mathematics, art, surreal art, politics, ...? Which region of the social graph will propagate this advertisement this furthest? Which 3 experts should review this submitted article? Which people should I talk to at the upcoming conference and what topics should I talk to them about? SOCIAL + COMMUNICATION + EXPERTISE + EVENT GRAPH
  • 60.
    PATTERN IDENTIFICATION Thisconnectivity pattern is a sign of financial fraud. When this motif is found, a red flag will be raised. TRANSACTION GRAPH Healthy discourse is typified by a discussion board with a branch factor in this range and a concept clique score in this range. DISCUSSION GRAPH
  • 61.
    KNOWLEDGE DISCOVERY Theterms "ice", "fans", "stanley cup," are classified as "sports" Given that all identified birds fly, it can be deduced that all birds fly. If contrary evidence is provided, then this "fact" can be retracted. WIKIPEDIA GRAPH EVIDENTIAL LOGIC GRAPH
  • 62.
  • 63.
  • 64.
    WORLD PROCESSES WORLDMODEL A single world model and various types of traversers moving through that model to solve problems.
  • 65.
    TRAVERSAL GRAPH PROCESSSTRUCTURE COMPUTING GRAPH-BASED COMPUTING
  • 66.
    PART 2: THEPATH TO TITAN 1.0
  • 67.
    WhY CREATE TITAN? A number of Aurelius' clients... ...need to represent and process graphs at the 100+ billion edge scale w/ thousands of concurrent transactions. ...need both local graph traversals (OLTP) and batch graph processing (OLAP). ...desire a free, open source distributed graph database.
  • 68.
    TITAN's KEY FEATURES Titan provides... ..."infinite size" graphs and "unlimited" users by means of a distributed storage engine. ...real-time local traversals (OLTP) and support for global batch processing via Hadoop (OLAP). ...distribution via the liberal, free, open source Apache2 license.
  • 69.
  • 70.
    TITAN DISTRIBUTED VIACASSANDRA titan$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(_)-oOOo----- gremlin> conf = new BaseConfiguration(); ==>org.apache.commons.configuration.BaseConfiguration@763861e6 gremlin> conf.setProperty("storage.backend","cassandra"); gremlin> conf.setProperty("storage.hostname","77.77.77.77"); gremlin> g = TitanFactory.open(conf); ==>titangraph[cassandra:77.77.77.77] gremlin>
  • 71.
    INHERITED FEATURES Continuouslyavailable with no single point of failure. No write bottlenecks to the graph as there is no master/slave architecture. Built-in replication ensures data is available during machine failure. Caching layer ensures that continuously accessed data is available in memory. Elastic scalability allows for the introduction and removal of machines. Cassandra available at http://cassandra.apache.org/
  • 72.
    TITAN DISTRIBUTED titan$bin/gremlin.sh ,,,/ (o o) VIA HBASE -----oOOo-(_)-oOOo----- gremlin> conf = new BaseConfiguration(); ==>org.apache.commons.configuration.BaseConfiguration@763861e6 gremlin> conf.setProperty("storage.backend","hbase"); gremlin> conf.setProperty("storage.hostname","77.77.77.77"); gremlin> g = TitanFactory.open(conf); ==>titangraph[hbase:77.77.77.77] gremlin>
  • 73.
    INHERITED FEATURES Strictlyconsistent reads and writes. Linear scalability with the addition of machines. Base classes for backing Hadoop MapReduce jobs with HBase tables. HDFS-based data replication. Generally good integration with the tools in the Hadoop ecosystem. HBase available at http://hbase.apache.org/
  • 74.
    Titan is allabout ...
  • 75.
    Titan is allabout numerous concurrent users...
  • 76.
    Titan is allabout numerous concurrent users... high availability....
  • 77.
    Titan is allabout numerous concurrent users... high availability.... dynamic scalability...
  • 78.
    VERTEX-CENTRIC INDICES THESUPER NODE PROBLEM Natural, real-world graphs contain vertices of high degree. Even if rare, their degree ensures that they exist on many paths. Traversing a high degree vertex means touching numerous incident edges and potentially touching most of the graph in only a few steps.
  • 79.
    VERTEX-CENTRIC INDICES ASUPER NODE SOLUTION A "super node" only exists from the vantage point of classic "textbook style" graphs. In the world of property graphs, intelligent disk-level filtering can interpret a "super node" as a more manageable low-degree vertex. Vertex-centric querying utilizes sort orders for speedy lookup of incident edges with particular qualities.
  • 80.
    VERTEX-CENTRIC INDICES PUSHDOWNPREDICATES stars:5 vertex.bothE() likes likes stars:2 stars:2 knows knows likes likes knows likes stars:3 stars:3 8 edges
  • 81.
    VERTEX-CENTRIC INDICES PUSHDOWNPREDICATES stars:5 likes likes likes knows knows stars:3 stars:3 likes likes stars:2 stars:2 vertex.outE() 7 edges
  • 82.
    VERTEX-CENTRIC INDICES PUSHDOWNPREDICATES stars:5 likes likes likes stars:3 stars:3 likes likes stars:2 stars:2 vertex.outE("likes") 5 edges
  • 83.
    VERTEX-CENTRIC INDICES PUSHDOWNPREDICATES stars:5 likes 1 edge vertex.outE("likes").has("stars",5)
  • 84.
    VERTEX-CENTRIC INDICES DISK-LEVELSORTING/INDEXING likes likes likes knows knows stars:1 stars:2 stars:5
  • 85.
    VERTEX-CENTRIC INDICES DISK-LEVELSORTING/INDEXING likes likes likes knows knows likes knows stars:1 stars:2 stars:5
  • 86.
    VERTEX-CENTRIC INDICES DISK-LEVELSORTING/INDEXING likes likes likes knows knows likes w/ time 1-3 knows stars:1 stars:2 stars:5 likes w/ time 4-5
  • 87.
    VERTEX-CENTRIC INDICES DISK-LEVELSORTING/INDEXING likes likes likes knows knows likes w/ time 1-3 knows schema = g.openManagementSystem() stars = schema .makePropertyKey("stars") .dataType(Integer.class) .make() likes = schema .makeEdgeLabel('likes') .make() schema.buildEdgeIndex(likes, 'timeIdx',BOTH,DESC,stars) stars:1 stars:2 stars:5 likes w/ time 4-5
  • 88.
    THE TITAN OLAPSTORY 0 1 2 3 A B A C a:b c:d e:f g:h i:j 4 5 6 7 vertex id props incoming edges outgoing edges edge id props label id edge id props label id edge id label id edge id label id 1 e:f 4 c:d A 2 5 B 0 6 g:h A 3 7 C 3
  • 89.
    0 1 3 4 5 6 7 8 9 10 11 AN ADJACENCY LIST
  • 90.
    AN ADJACENCY LIST + CLUSTER 127.0.0.2 127.0.0.3 127.0.0.4 0 1 3 4 5 6 7 8 9 10 11
  • 91.
    A DISTRIBUTED ADJACENCYLIST 0 1 2 3 4 5 6 7 8 9 10 11 127.0.0.2 127.0.0.3 127.0.0.4
  • 92.
    HADOOP Hadoop isa distributed computing platform composed of two key components: HDFS: A distributed file system that stores arbitrarily large files within a cluster. MapReduce: A parallel functional computing model for key/value pair data. http://hadoop.apache.org
  • 93.
    FAUNUS AND HADOOP 0 1 2 3 4 5 6 7 8 9 10 11 Process Structure 127.0.0.2 127.0.0.3 127.0.0.4 Faunus provides graph input/output formats (structure) and a traversal language for graphs (process).
  • 94.
    THE TITAN OLAPSTORY Serial Key/Value Data Structure Indexed Key/Indexed Value Data Structure 0 1 3 4 5 6 7 8 9 10 11 0 1 3 4 5 6 7 8 9 10 11
  • 95.
    Adam Jacobs. 2009.The Pathologies of Big Data. Communications of the ACM 52, 8 (August 2009), 36-44. doi:10.1145/1536616.1536632 http://doi.acm.org/10.1145/1536616.1536632
  • 96.
    INTRA-CLUSTER CONFIGURATION Titan/CassandraFaunus/Hadoop Data is processed on the machine where it is located. Limited network communication.
  • 97.
    INTER-CLUSTER CONFIGURATION Graphdata is offloaded to another cluster. Repeated analysis does not interfere with production graph database.
  • 98.
    THE TITAN OLAPSTORY Titan currently provides a Hadoop OLAP engine Titan-Hadoop (nicknamed Faunus) Titan 1.0 will also provide an in-memory, off head, single-machine OLAP engine. Titan-Memory (nicknamed Fulgora)
  • 99.
    PART 3: THEPATH TO TINKERPOP3
  • 100.
    TINKERPOP3 FEATURES Astandard, vendor-agnostic graph query language. Both OLTP and OLAP engines for evaluating queries. A way for developers to create custom, domain specific variants. A server system for pushing queries to the server for evaluation. A vertex-centric computing framework for distributed graph algorithms.
  • 101.
  • 102.
    gremlin> gremlin =g.addVertex(label,'software','name','gremlin') ==>v[0] name:gremlin 0 software
  • 103.
    gremlin> gremlin =g.addVertex(label,'software','name','gremlin') ==>v[0] gremlin> titan = g.addVertex(label,'software','name','titan') ==>v[2] name:titan name:gremlin 0 software 2 software
  • 104.
    gremlin> gremlin =g.addVertex(label,'software','name','gremlin') ==>v[0] gremlin> titan = g.addVertex(label,'software','name','titan') ==>v[2] gremlin> titan.addEdge('dependsOn',gremlin,'since','0.1') ==>e[4][2-dependsOn->0] name:titan name:gremlin 0 software 2 software dependsOn since:0.1
  • 105.
    Gremlin is astyle of graph traversal that is heavily influenced by functional programming. gremlin> gremlin = g.addVertex(label,'software','name','gremlin') ==>v[0] gremlin> titan = g.addVertex(label,'software','name','titan') ==>v[2] gremlin> titan.addEdge('dependsOn',gremlin,'since','0.1') ==>e[4][2-dependsOn->0] ////////////////////////////////////////////////////////////////// gremlin> g.V ==>v[0] ==>v[2] gremlin> g.V.has('name','titan') ==>v[2] gremlin> g.V.has('name','titan').outE ==>e[4][2-dependsOn->0] gremlin> g.V.has('name','titan').outE.since ==>0.1 gremlin> g.V.has('name','titan').out.name ==>gremlin
  • 106.
    g.V.has('name','titan').out.name What arethe names of the vertices outgoing adjacent to Titan?
  • 107.
    g.V.has('name','titan').out.name name=gremlin 0 2 name=titan For the graph 'g'... Traversals are defined using fluent, function composition.
  • 108.
    g.V.has('name','titan').out.name name=gremlin 0 2 name=titan …get all the vertices... Objects flow through the traversal in a lazy evaluation manner.
  • 109.
    g.V.has('name','titan').out.name 2 name=titan …filter all vertices that don't have the name 'titan'… The general functions are map(), flatMap(), filter(), sideEffect(), and branch().
  • 110.
    g.V.has('name','titan').out.name name=gremlin 0 …get all outgoing adjacent vertices… Any programming language can provide a Gremlin implementation.
  • 111.
    g.V.has('name','titan').out.name gremlin …getthe names of those vertices... Any graph system vendor can provide a binding to Gremlin.
  • 112.
    THE GENERIC STEPS ? S filter S S map E S flatMap sideEffect branch A E E E E E E S S S S ... S ? ? has() retain() except() interval() simplePath() dedup() timeLimit() ... back() fold() valueMap() match() orderBy() shuffle() ... out() outE() inV() both() values() unfold() ... aggregate() store() groupCount() groupBy() tree() count() subgraph() ... jump() until() choose() union() ...
  • 113.
    TINKERPOP'S OLAP STORY Execute parallel, distributed Gremlin traversals. Execute vertex-centric message passing algorithms.
  • 114.
    TINKERPOP'S OLAP STORY Execute parallel, distributed Gremlin traversals. Execute vertex-centric message passing algorithms. traversers are the messages!
  • 115.
    OLAP 1 1 1 1 2 2 2 3 1 2 3 OLTP ✓ Real-time queries. ✓ Touching a subset of the graph. ✓ Numerous concurrent queries. ✓ Good for local graph mutations. ✓ Long running queries. ✓ Touching the entire of the graph. ✓ Single user. ✓ Good for bulk loading/mutations.
  • 116.
    g.compute() GraphComputer Vertex Program program mapReduce distributed to all workers MapReduce MapReduce MapReduce worker 1 worker 2 worker 3 1. Logically distribute the vertex program to all the vertices in the graph. 2. The vertices execute the vertex program sending and receiving messages on each iteration. 3. Any number of MapReduce jobs can follow to aggregate the computed data in the graph.
  • 117.
    gremlin> result =g.compute().program(PageRankVertexProgram).submit() ==>result[tinkergraph[vertices:6 edges:6],memory[size:0]] gremlin> result.memory.iteration ==>30 gremlin> result.graph ==>tinkergraph[vertices:6 edges:6] gremlin> result.graph.V.map{ [it.id,it.pageRank]} ==>[1, 0.15000000000000002] ==>[2, 0.19250000000000003] ==>[3, 0.4018125] ==>[4, 0.19250000000000003] ==>[5, 0.23181250000000003] ==>[6, 0.15000000000000002] * Minor tweaks to example to make it fit nicely on a single line.
  • 118.
    gremlin> g.V.both.both.name.groupCount() ==>[ripple:3,peter:3, vadas:3, josh:7, lop:7, marko:7] gremlin> g.V.both.both.name.groupCount().submit(g.compute()) ==>[ripple:3, peter:3, vadas:3, josh:7, lop:7, marko:7] OLTP OLAP
  • 119.
    MULTIPLE GRAPH COMPUTERS FOR THE SAME GRAPH gremlin> g = TitanFactory.open(conf) ==>titangraph[66.77.88.99] gremlin> g.compute(FaunusGraphComputer) .program(PageRankVertexProgram) .submit() ==>result[faunusgraph,memory[size:0]] gremlin> g = TitanFactory.open(conf) ==>titangraph[66.77.88.99] gremlin> g.compute(FulgoraGraphComputer) .program(PageRankVertexProgram) .submit() ==>result[fulgoragraph,memory[size:0]]
  • 120.
  • 121.
    GREMLIN SERVER gremlin>g = TitanFactory.open(conf) ==>titangraph[66.77.88.99] localhost
  • 122.
    GREMLIN SERVER gremlin>g = TitanFactory.open(conf) ==>titangraph[66.77.88.99] gremlin> g.V.has('name','titan') ==>v[2] localhost
  • 123.
    GREMLIN SERVER gremlin>g = TitanFactory.open(conf) ==>titangraph[66.77.88.99] gremlin> g.V.has('name','titan') ==>v[2] gremlin> g.V.has('name','titan').out localhost 1. get the titan vertex 2. get titan's adjacents
  • 124.
    GREMLIN SERVER gremlin>g = TitanFactory.open(conf) ==>titangraph[66.77.88.99] gremlin> g.V.has('name','titan') ==>v[2] gremlin> g.V.has('name','titan').out localhost 1. get the titan vertex 2. get titan's adjacents CHATTY Each step in the traversal is a round trip
  • 125.
  • 126.
  • 127.
    GREMLIN SERVER gremlin>:remote connect tinkerpop.server titan.yaml ==>connected - 66.77.88.99:8182,11.22.33.44:8182,... localhost
  • 128.
    GREMLIN SERVER gremlin>:remote connect tinkerpop.server titan.yaml ==>connected - 66.77.88.99:8182,11.22.33.44:8182,... gremlin> :> g.V.has('name','titan') ==>v[2] localhost g.V.has('name','titan')
  • 129.
    GREMLIN SERVER gremlin>:remote connect tinkerpop.server titan.yaml ==>connected - 66.77.88.99:8182,11.22.33.44:8182,... gremlin> :> g.V.has('name','titan') ==>v[2] gremlin> :> g.V.has('name','titan').out localhost g.V.has('name','titan').out LESS CHATTY 1. Gremlin Server client knows the hash function to query the machine with the Titan vertex 2. Push the traversal to the server and get a WebSockets iterator back
  • 130.
    GREMLIN SERVER Allowsnon-JVM languages to interact with a remote graph. Supports stored procedures -- MyAlgorithms.class Supports WebSockets, REST, and various serialization mechanisms.
  • 131.
    CREDITS PRESENTED BY MARKO A. RODRIGUEZ MANY THANKS TO MATTHIAS BRöCHELER STEPHEN MALLETTE DAN LAROCQUE DANIEL KUPPITZ BOB BRIODY BRYN COOKE JOSH SHINAVIER PAVEL YASKEVICH AURELIUS COMMUNITY TINKERPOP COMMUNITY KETRINA YIM