Graph databases: Tinkerpop and Titan DB

GRAPH DATABASES: THE SOLUTION FOR STORING SEMI-STRUCTURED BIG DATA
Mohamed Taher Alrefaie

DATA IS GETTING BIGGER
“Every two days, we create as much information as we did up to 2003.” Eric Schmidt, former Google CEO, 2010.

DATA IS MORE CONNECTED
The following examples illustrate it:
- Facebook Graph
- LinkedIn Graph
- Linked Data
- Blogs/Tagging

DATA IS LESS STRUCTURED
How would you model the Facebook graph? Persons, friendships, photos, locations, apps, pages, ads, interests, age ranges, etc.

NOSQL DATABASES
Four types of databases that alleviate the performance issues of relational databases

KEY VALUE STORES
Data Model:
Global key-value mapping
Big scalable HashMap
Highly fault tolerant (typically)
Examples:
Redis, Riak, Voldemort, Dynamo

KEY VALUE STORES: PROS AND CONS
Pros:
Simple data model
Scalable
Cons:
Create your own “foreign keys”
Poor for complex data
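The "create your own foreign keys" drawback can be made concrete with a minimal Java sketch. It assumes a plain `java.util.HashMap` as a stand-in for a real key-value store such as Redis or Riak; the key scheme (`user:1`, `post:1`) and the `author=...;title=...` value encoding are hypothetical. The store sees only opaque strings, so the link from a post to its author lives entirely in application code and costs a second lookup.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyValueSketch {
    // Stand-in for a key-value store: opaque string values addressed by string keys.
    private final Map<String, String> store = new HashMap<>();

    public void put(String key, String value) {
        store.put(key, value);
    }

    public String get(String key) {
        return store.get(key);
    }

    // The "foreign key" exists only in application code: the post value embeds
    // the author's key (hypothetical encoding), and resolving it needs a second lookup.
    public String authorNameOf(String postKey) {
        String post = get(postKey);                 // e.g. "author=user:1;title=Hello"
        if (post == null) {
            return null;
        }
        for (String field : post.split(";")) {
            String[] kv = field.split("=", 2);
            if (kv.length == 2 && kv[0].equals("author")) {
                return get(kv[1]);                  // second lookup by the stored key
            }
        }
        return null;
    }

    public static void main(String[] args) {
        KeyValueSketch kv = new KeyValueSketch();
        kv.put("user:1", "Ada");
        kv.put("post:1", "author=user:1;title=Hello");
        System.out.println(kv.authorNameOf("post:1")); // prints Ada
    }
}
```

Nothing in the store prevents `user:1` from being deleted while `post:1` still points at it; keeping such references consistent is the application's job.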

COLUMN FAMILY
Based on BigTable, Google’s distributed storage system for structured data
Data Model:
A big table, with column families
MapReduce for querying/processing
Examples:
HBase, Hypertable, Cassandra

COLUMN FAMILY: PROS AND CONS
Pros:
Supports Semi-Structured Data
Naturally Indexed (columns)
Scalable
Cons:
Poor for interconnected data
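A rough Java sketch of the column-family data model, assuming nested sorted maps (row key → column family → column → value) as a simplification of BigTable-style storage: there are no timestamps or versions, no persistence, and families are created on first write rather than declared in a table schema as HBase requires. It shows why the model suits semi-structured data (rows need not share columns) and why rows and columns come back sorted. The class and method names are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class ColumnFamilySketch {
    // row key -> column family -> column qualifier -> value, all kept sorted.
    // Loosely mirrors the BigTable layout; timestamps/versions are left out.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, String>>> table =
            new TreeMap<>();

    public void put(String row, String family, String qualifier, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .put(qualifier, value);
    }

    public String get(String row, String family, String qualifier) {
        return family(row, family).get(qualifier);
    }

    // A copy of all columns of one family in one row; rows may have different columns.
    public NavigableMap<String, String> family(String row, String family) {
        NavigableMap<String, NavigableMap<String, String>> families = table.get(row);
        if (families == null || !families.containsKey(family)) {
            return new TreeMap<>();
        }
        return new TreeMap<>(families.get(family));
    }

    // Range scan over row keys in sorted order: [startRow, endRow).
    public NavigableMap<String, NavigableMap<String, NavigableMap<String, String>>> scan(
            String startRow, String endRow) {
        return table.subMap(startRow, true, endRow, false);
    }

    public static void main(String[] args) {
        ColumnFamilySketch t = new ColumnFamilySketch();
        t.put("user:1", "info", "name", "Ada");
        t.put("user:1", "info", "email", "ada@example.com");
        t.put("user:2", "info", "name", "Alan");            // no email column
        t.put("user:2", "likes", "page:42", "yes");         // only user:2 has cells in "likes"
        System.out.println(t.family("user:1", "info"));     // {email=ada@example.com, name=Ada}
        System.out.println(t.scan("user:1", "user:2").keySet()); // [user:1]
    }
}
```

Asking which users like a given page would need either a scan of every row or a second, reverse-keyed table, which is the "poor for interconnected data" drawback in practice.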

DOCUMENT DATABASES
Data Model:
A collection of documents
A document is a collection of key-value pairs
Index-centric, uses MapReduce extensively
Examples:
CouchDB, MongoDB

DOCUMENT DATABASES: PROS AND CONS
Pros:
Simple, powerful data model
Scalable
Cons:
Poor for interconnected data
Query model limited to keys and indexes
MapReduce for larger queries
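The following Java sketch (Java 9+ for `Map.of`) illustrates the document model and its key/index-centric querying. It assumes documents are `Map<String, Object>` values stored by id, with hand-maintained secondary indexes; the class and method names are hypothetical and do not mirror the CouchDB or MongoDB APIs. Documents need not share a schema, and a query on a field works only if that field is indexed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DocumentStoreSketch {
    // document id -> document; a document is itself a collection of key-value pairs
    private final Map<String, Map<String, Object>> docs = new HashMap<>();
    // field name -> field value -> ids of documents holding that value
    private final Map<String, Map<Object, Set<String>>> indexes = new HashMap<>();

    public void createIndex(String field) {
        Map<Object, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, Map<String, Object>> e : docs.entrySet()) {
            Object value = e.getValue().get(field);
            if (value != null) {
                index.computeIfAbsent(value, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        indexes.put(field, index);
    }

    // Insert-only for simplicity: re-inserting an id would leave stale index entries.
    public void insert(String id, Map<String, Object> doc) {
        docs.put(id, doc);
        for (Map.Entry<String, Map<Object, Set<String>>> idx : indexes.entrySet()) {
            Object value = doc.get(idx.getKey());
            if (value != null) {
                idx.getValue().computeIfAbsent(value, k -> new TreeSet<>()).add(id);
            }
        }
    }

    public Map<String, Object> byId(String id) {
        return docs.get(id);
    }

    // Queries go through a key or an index; anything else needs a full scan
    // or, as the slide notes for larger queries, a MapReduce job.
    public Set<String> findBy(String field, Object value) {
        Map<Object, Set<String>> index = indexes.get(field);
        if (index == null) {
            throw new IllegalArgumentException("no index on field: " + field);
        }
        return index.getOrDefault(value, Collections.emptySet());
    }

    public static void main(String[] args) {
        DocumentStoreSketch db = new DocumentStoreSketch();
        db.createIndex("city");
        db.insert("u1", Map.of("name", "Ada", "city", "London"));
        db.insert("u2", Map.of("name", "Alan", "city", "London", "age", 41));
        db.insert("u3", Map.of("name", "Grace"));        // no "city" field: no fixed schema
        System.out.println(db.findBy("city", "London")); // [u1, u2]
    }
}
```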

GRAPH DATABASES
Data Model:
Nodes and Relationships
Examples:
 Titan, Neo4j, OrientDB, etc.

GRAPH DATABASES: PROS AND CONS
Pros:
Powerful data model, as general as RDBMS
Connected data locally indexed
Easy to query
Cons:
Sharding (partitioning a graph across machines is hard)
Requires different data modelling

LIVING IN A NOSQL WORLD
[Figure: key-value stores, BigTable clones, document databases, graph databases and relational databases (RDBMS) placed by data size and data complexity; relational databases are marked as covering 90% of use cases.]

WHAT IS A GRAPH?
An abstract representation of a set of objects where
some pairs are connected by links.
Object (Vertex, Node)
Link (Edge, Arc, Relationship)

WHAT IS A GRAPH DATABASE?
A database with an explicit graph structure
Each node knows its adjacent nodes through edges
As the number of nodes increases, the cost of a local step (or hop) remains the same; an index is needed only to look up the starting nodes.
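The local-hop argument can be sketched in Java, assuming a toy in-memory graph (hypothetical class and method names, not a real database API) in which every node stores direct references to its neighbours and a `HashMap` serves as the index for finding start nodes. After the single index lookup, each hop only follows stored references, so its cost depends on the adjacency lists being walked rather than on the total number of nodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AdjacencySketch {
    // Each node keeps direct references to its neighbours ("index-free adjacency").
    public static final class Node {
        final String name;
        final List<Node> out = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // The index is used only to find where a traversal starts.
    private final Map<String, Node> nameIndex = new HashMap<>();

    public Node addNode(String name) {
        Node node = new Node(name);
        nameIndex.put(name, node);
        return node;
    }

    public void addEdge(Node from, Node to) {
        from.out.add(to);
    }

    // One index lookup, then exactly `hops` steps that follow stored references.
    // The work per hop depends on the current nodes' adjacency lists,
    // not on how many nodes the whole graph contains.
    public Set<String> namesAfterHops(String start, int hops) {
        Node first = nameIndex.get(start);
        if (first == null) {
            return Collections.emptySet();
        }
        Set<Node> frontier = Collections.singleton(first);
        for (int i = 0; i < hops; i++) {
            Set<Node> next = new LinkedHashSet<>();
            for (Node node : frontier) {
                next.addAll(node.out);
            }
            frontier = next;
        }
        Set<String> names = new TreeSet<>();   // sorted for stable output
        for (Node node : frontier) {
            names.add(node.name);
        }
        return names;
    }

    public static void main(String[] args) {
        AdjacencySketch graph = new AdjacencySketch();
        Node marko = graph.addNode("marko");
        Node vadas = graph.addNode("vadas");
        Node josh = graph.addNode("josh");
        Node lop = graph.addNode("lop");
        Node ripple = graph.addNode("ripple");
        graph.addEdge(marko, vadas);
        graph.addEdge(marko, josh);
        graph.addEdge(josh, ripple);
        graph.addEdge(josh, lop);
        System.out.println(graph.namesAfterHops("marko", 1)); // [josh, vadas]
        System.out.println(graph.namesAfterHops("marko", 2)); // [lop, ripple]
    }
}
```

In a relational schema, the same hop would typically be resolved through a join, i.e. an index lookup whose cost grows with the size of the tables involved.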

APACHE TINKERPOP: A UNIFIED API
Dealing with such complex databases requires a well-implemented API from the vendor. But coding against a vendor-specific API makes migrating to another database very difficult.
The solution is provided by Apache TinkerPop.

WHAT IS APACHE TINKERPOP?
● A graph computing framework
● An Apache Incubator project at the time of writing (2015)
● TinkerPop3 Structure API: Graph, Element, Property
● TinkerPop3 Process API: TraversalSource, GraphComputer
● Gremlin query language: a scripting language for graph traversal and mutation
● REST API

WHY APACHE TINKERPOP?
TinkerPop is a generic API for graph databases
Think ODBC, JDBC or Hibernate for relational databases
Integrates with:
Titan DB
Neo4j
OrientDB
and many more
Uses the Gremlin graph scripting language
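The JDBC analogy can be illustrated without TinkerPop itself. The Java sketch below assumes a hypothetical `GraphStore` interface and two hypothetical backends that store edges differently; the business logic (`friendsOf`) is written once against the interface and runs unchanged on both. TinkerPop's real Structure and Process APIs are much richer, so this shows only the decoupling idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VendorNeutralSketch {
    // Hypothetical vendor-neutral interface (NOT TinkerPop's API): application code
    // depends only on this, the way JDBC code depends on java.sql interfaces.
    interface GraphStore {
        void addEdge(String from, String label, String to);
        List<String> out(String from, String label);
    }

    // Hypothetical backend #1: adjacency lists keyed by source vertex.
    static final class AdjacencyListStore implements GraphStore {
        private final Map<String, List<String[]>> adjacency = new HashMap<>();

        public void addEdge(String from, String label, String to) {
            adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(new String[] {label, to});
        }

        public List<String> out(String from, String label) {
            List<String> result = new ArrayList<>();
            for (String[] edge : adjacency.getOrDefault(from, Collections.emptyList())) {
                if (edge[0].equals(label)) {
                    result.add(edge[1]);
                }
            }
            return result;
        }
    }

    // Hypothetical backend #2: one flat table of (from, label, to) rows.
    static final class EdgeTableStore implements GraphStore {
        private final List<String[]> edges = new ArrayList<>();

        public void addEdge(String from, String label, String to) {
            edges.add(new String[] {from, label, to});
        }

        public List<String> out(String from, String label) {
            List<String> result = new ArrayList<>();
            for (String[] edge : edges) {
                if (edge[0].equals(from) && edge[1].equals(label)) {
                    result.add(edge[2]);
                }
            }
            return result;
        }
    }

    // Business logic, written once against the interface.
    static List<String> friendsOf(GraphStore store, String person) {
        return store.out(person, "knows");
    }

    public static void main(String[] args) {
        List<GraphStore> stores = Arrays.asList(new AdjacencyListStore(), new EdgeTableStore());
        for (GraphStore store : stores) {
            store.addEdge("marko", "knows", "vadas");
            store.addEdge("marko", "knows", "josh");
            store.addEdge("marko", "created", "lop");
            System.out.println(friendsOf(store, "marko")); // [vadas, josh]
        }
    }
}
```

Swapping backends touches only the line that constructs the store; `friendsOf` never changes.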

TITAN DATABASE
Titan is a scalable graph database, accessed through the TinkerPop APIs, optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
Supports Apache Spark and Hadoop (through TinkerPop) for MapReduce-style batch processing.
Integrates with (indexing backends):
Elasticsearch, Solr, Lucene
Storage backends:
Apache Cassandra
Apache HBase
Oracle BerkeleyDB Java Edition

PUTTING IT ALL TOGETHER
Apache TinkerPop API (Gremlin Server, graph traversal, Gremlin client, monitoring)
Titan DB
Storage backend (Cassandra, HBase, BerkeleyDB)

TITAN: EXAMPLE
Download the Titan server and console from:
https://github.com/thinkaurelius/titan/wiki/Downloads
$ cd titan-1.0.0-hadoop1
$ bin/gremlin.sh
gremlin> graph = TitanFactory.open('conf/titan-berkeleyje-es.properties')
gremlin> GraphOfTheGodsFactory.load(graph)
gremlin> g = graph.traversal()

TINKERPOP: EXAMPLE
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;
// Create an empty in-memory TinkerGraph
Graph graph = TinkerGraph.open();
// Add vertices: T.id sets the element id, the remaining arguments are key/value properties
Vertex marko = graph.addVertex(T.id, 1, "name", "marko", "age", 29);
Vertex vadas = graph.addVertex(T.id, 2, "name", "vadas", "age", 27);
Vertex lop = graph.addVertex(T.id, 3, "name", "lop", "lang", "java");
Vertex josh = graph.addVertex(T.id, 4, "name", "josh", "age", 32);
Vertex ripple = graph.addVertex(T.id, 5, "name", "ripple", "lang", "java");
Vertex peter = graph.addVertex(T.id, 6, "name", "peter", "age", 35);
// Add labelled, weighted edges: outVertex.addEdge(label, inVertex, keyValues...)
marko.addEdge("knows", vadas, T.id, 7, "weight", 0.5f);
marko.addEdge("knows", josh, T.id, 8, "weight", 1.0f);
marko.addEdge("created", lop, T.id, 9, "weight", 0.4f);
josh.addEdge("created", ripple, T.id, 10, "weight", 1.0f);
josh.addEdge("created", lop, T.id, 11, "weight", 0.4f);
peter.addEdge("created", lop, T.id, 12, "weight", 0.2f);

TINKERPOP: EXAMPLE (CONT.)
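The same graph ships with TinkerGraph as TinkerFactory.createClassic(), so in the Gremlin Console:
gremlin> graph = TinkerFactory.createClassic()
gremlin> g = graph.traversal()
gremlin> g.V().has('name','marko').out('knows').values('name')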
==>vadas
==>josh
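To spell out what each Gremlin step does, here is the same query written by hand with Java streams over a hypothetical in-memory copy of the example graph (ids, names and edge labels only; properties such as `age` and `weight` are left out). It mirrors the step semantics only and is not how TinkerPop executes traversals: `has` filters vertices by a property, `out` follows outgoing edges with the given label, and `values` extracts a property from each vertex reached.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TraversalSketch {
    // Hypothetical minimal copies of the example's vertices and edges.
    static final class Vert {
        final int id;
        final String name;
        Vert(int id, String name) { this.id = id; this.name = name; }
    }

    static final class Edge {
        final int outV;
        final String label;
        final int inV;
        Edge(int outV, String label, int inV) { this.outV = outV; this.label = label; this.inV = inV; }
    }

    static final List<Vert> VERTICES = Arrays.asList(
            new Vert(1, "marko"), new Vert(2, "vadas"), new Vert(3, "lop"),
            new Vert(4, "josh"), new Vert(5, "ripple"), new Vert(6, "peter"));

    static final List<Edge> EDGES = Arrays.asList(
            new Edge(1, "knows", 2), new Edge(1, "knows", 4), new Edge(1, "created", 3),
            new Edge(4, "created", 5), new Edge(4, "created", 3), new Edge(6, "created", 3));

    // Equivalent of g.V().has('name', name).out(label).values('name')
    static List<String> outNames(String name, String label) {
        Map<Integer, Vert> byId = VERTICES.stream()
                .collect(Collectors.toMap(v -> v.id, Function.identity()));
        return VERTICES.stream()
                .filter(v -> v.name.equals(name))                      // has('name', name)
                .flatMap(v -> EDGES.stream()
                        .filter(e -> e.outV == v.id && e.label.equals(label))
                        .map(e -> byId.get(e.inV)))                    // out(label)
                .map(v -> v.name)                                      // values('name')
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(outNames("marko", "knows"));  // [vadas, josh]
        System.out.println(outNames("josh", "created")); // [ripple, lop]
    }
}
```

Gremlin expresses the same pipeline declaratively, which leaves the underlying database free to choose how to evaluate each step, for example by answering `has` from an index.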

SUMMARY
Graph databases are the solution for highly scalable, semi-structured, connected data.
Apache TinkerPop is a generic API for graph databases that keeps business logic free of vendor-specific code.
Titan DB is a scalable, distributed graph database built on top of other storage systems: it uses Cassandra, HBase or BerkeleyDB as its storage backend, so you can pick the backend that matches the scale you need.

http://www.titandb.io

MOHAMED TAHER ALREFAIE
07/12/2015
