This document introduces TitanDB, a scalable graph database, and Apache TinkerPop, an open-source graph computing framework. It defines what a graph database is and explains why graph databases, and TitanDB in particular, are needed. It describes key features of TitanDB, such as support for various storage backends and integration with tools like Spark and Giraph. It also summarizes the CAP theorem, the TitanDB architecture, the acquisition of Aurelius (the company behind Titan) by DataStax, and what Apache TinkerPop is and why it is needed when dealing with complex graph databases.
2. Agenda
● Graph Database
– What is a Graph Database
– Need for Graph Database
● Titan DB
– Why Titan DB
– CAP theorem
– Architecture overview
– Future of TitanDB
● Apache TinkerPop
– What is Apache TinkerPop
– Need for Apache TinkerPop
3. What is a Graph Database
● A database that uses graph
structures for semantic queries
with nodes, edges and properties
to represent and store data.
● Most graph databases are NoSQL
in nature
● Many store their data on top of a
key-value store or document-oriented
database.
● Relationships between values are
stored as first-class citizens.
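The property-graph model described above can be sketched in a few lines of Python. This is only an illustration, not how any particular graph database stores data: the names `Vertex`, `Edge`, `PropertyGraph`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    # An edge is stored as its own object, so a relationship can carry
    # a label and properties just like a vertex can.
    label: str
    out_v: int
    in_v: int
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices = {}   # vertex id -> Vertex
        self.out_edges = {}  # vertex id -> list of outgoing Edge objects

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)
        self.out_edges.setdefault(vid, [])
        return self.vertices[vid]

    def add_edge(self, out_v, label, in_v, **props):
        edge = Edge(label, out_v, in_v, props)
        self.out_edges[out_v].append(edge)
        return edge

    def neighbours(self, vid, label=None):
        # Follow outgoing edges, optionally filtered by edge label.
        return [self.vertices[e.in_v]
                for e in self.out_edges[vid]
                if label is None or e.label == label]


graph = PropertyGraph()
graph.add_vertex(1, "person", name="alice")
graph.add_vertex(2, "person", name="bob")
graph.add_edge(1, "knows", 2, since=2014)

names = [v.properties["name"] for v in graph.neighbours(1, "knows")]
```

Here the "knows" relationship is a stored object with its own `since` property, which is what "first-class" means in this context.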
4. Need for Graph Database
● Data is increasingly connected and
is shared across multiple applications
on the web
● It is easier to query data stored in a
graph structure where nodes are
highly connected
● It removes the need to perform
multiple join operations between
adjacent neighbours
● It allows the use of many graph
algorithms that help in optimization
● It allows data to be visualized, so
hidden relationships can be inferred
and predictions derived from the data.
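As a rough illustration of the point about joins: a two-hop question such as "friends of friends" becomes a walk over adjacency lists instead of a self-join on a relationship table. The sketch below uses a plain dictionary as the graph; the `knows` data and the function name are made up for this example.

```python
# Hypothetical "knows" relationships, stored as adjacency lists.
knows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
}


def friends_of_friends(adjacency, start):
    """People exactly two hops from start, excluding start and direct friends."""
    direct = set(adjacency.get(start, []))
    result = set()
    for friend in direct:
        # Each hop is a lookup of the neighbour list, not a table join.
        for candidate in adjacency.get(friend, []):
            if candidate != start and candidate not in direct:
                result.add(candidate)
    return sorted(result)


fof = friends_of_friends(knows, "alice")
```

In a relational schema the same question would typically need the relationship table joined with itself once per hop.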
5. Why Titan DB
● Support for very large graphs. Titan graphs scale with
the number of machines in the cluster.
● Support for ACID properties and eventual consistency.
● Support for a very large number of concurrent
transactions and operational graph processing.
● Titan’s transactional capacity scales with the number of
machines in the cluster and answers complex traversal
queries on huge graphs in milliseconds.
● Vertex-centric indices provide vertex-level querying to
solve the infamous supernode problem.
● Provides an optimized disk representation to allow for
efficient use of storage and speed of access.
● Open source with the liberal Apache 2 license.
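The idea behind a vertex-centric index can be sketched as follows: the edges of a single vertex are kept sorted by some key, so a query against a supernode (a vertex with a huge number of edges) can jump to the relevant slice instead of scanning every edge. This is a conceptual sketch only, not Titan's actual implementation; the class name, the timestamp sort key, and the follower data are assumptions.

```python
import bisect


class VertexCentricIndex:
    """Edges of ONE vertex, kept sorted by a sort key (here a timestamp)."""

    def __init__(self):
        self._keys = []     # sort keys, always in ascending order
        self._targets = []  # target vertex for the edge at the same position

    def add_edge(self, sort_key, target):
        pos = bisect.bisect_right(self._keys, sort_key)
        self._keys.insert(pos, sort_key)
        self._targets.insert(pos, target)

    def range(self, low, high):
        """Targets of edges with low <= sort_key < high."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return self._targets[start:end]


# A supernode with 100,000 incoming "follows" edges.
index = VertexCentricIndex()
for t in range(100_000):
    index.add_edge(t, f"follower-{t}")

# Two binary searches locate the slice; the other edges are never touched.
recent = index.range(99_997, 100_000)
```

Without such an index, answering "the three most recent followers" would mean loading and filtering all 100,000 edges of the vertex.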
6. Features of Titan DB
● Support for various storage backends:
– Apache Cassandra
– Apache HBase
– Oracle BerkeleyDB
● Support for global graph data analytics, reporting, and ETL
through integration with big data platforms:
– Apache Spark
– Apache Giraph
– Apache Hadoop
● Support for geo, numeric range, and full-text search via:
– ElasticSearch
– Solr
– Lucene
● Native integration with the TinkerPop graph stack:
– Gremlin graph query language
– Gremlin graph server
– Gremlin applications
7. CAP Theorem
● CAP Theorem
– C=Consistency
– A=Availability
– P=Partition tolerance
● HBase favours consistency
– At the expense of yield
– i.e. the fraction of requests that complete
● Cassandra favours availability
– At the expense of harvest
– i.e. the completeness of the answer
● Berkeley DB is non-distributed
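The yield and harvest trade-off above can be made concrete with a small calculation. Yield is taken here as the fraction of requests that complete, and harvest as the fraction of the full data reflected in an answer. All numbers below are invented for illustration and are not measurements of HBase or Cassandra.

```python
def request_yield(completed_requests, total_requests):
    """Fraction of requests that were answered at all."""
    return completed_requests / total_requests


def harvest(data_in_answer, data_in_full_answer):
    """Fraction of the complete data that an answer actually reflects."""
    return data_in_answer / data_in_full_answer


# Hypothetical network partition, consistency-favouring store:
# 20 of 100 requests are refused, but every answer given is complete.
cp_yield = request_yield(80, 100)
cp_harvest = harvest(4, 4)

# Same partition, availability-favouring store:
# every request is answered, but only 3 of 4 shards are reachable.
ap_yield = request_yield(100, 100)
ap_harvest = harvest(3, 4)
```

With these numbers the consistency-favouring store has a yield of 0.8 and a harvest of 1.0, while the availability-favouring store has a yield of 1.0 and a harvest of 0.75.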
9. Future of TitanDB
● Aurelius is the startup behind
Titan, an open source graph
database
● DataStax, the company that
delivers Apache Cassandra™ to
the enterprise, acquired Aurelius
on February 3, 2015
● The Aurelius team will join
DataStax to build DataStax
Enterprise (DSE) Graph, adding
graph database capabilities into
DSE alongside Apache
Cassandra
10. What is Apache TinkerPop
● A graph processing system,
currently under Apache
incubation
● Has the TinkerPop3 Structure
API
– Graph, Element, Property
● Has the TinkerPop3 Process API
– TraversalSource, GraphComputer
● Gremlin query language
– A scripting language for graph traversal
and mutation
● REST API
11. Need for Apache TinkerPop
Dealing with such complex databases requires a
well-implemented API from the vendor. But using a
vendor-specific API ties the application to that vendor
and makes migrating to another database very difficult.
Apache TinkerPop addresses this by providing a common,
vendor-neutral API.
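The benefit of a common API can be sketched in plain Python. The code below is not the TinkerPop API; `GraphBackend`, `AdjacencyBackend`, `EdgeListBackend`, and `load_and_query` are hypothetical names used only to show application code that runs unchanged against two different storage implementations.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical vendor-neutral interface the application codes against."""

    @abstractmethod
    def add_edge(self, src, dst):
        ...

    @abstractmethod
    def out_neighbours(self, src):
        ...


class AdjacencyBackend(GraphBackend):
    """Stand-in for 'vendor A': stores adjacency lists."""

    def __init__(self):
        self._adj = {}

    def add_edge(self, src, dst):
        self._adj.setdefault(src, []).append(dst)

    def out_neighbours(self, src):
        return list(self._adj.get(src, []))


class EdgeListBackend(GraphBackend):
    """Stand-in for 'vendor B': stores a flat list of edges."""

    def __init__(self):
        self._edges = []

    def add_edge(self, src, dst):
        self._edges.append((src, dst))

    def out_neighbours(self, src):
        return [d for s, d in self._edges if s == src]


def load_and_query(backend):
    # Application code: it only knows the GraphBackend interface.
    backend.add_edge("titan", "cassandra")
    backend.add_edge("titan", "hbase")
    return sorted(backend.out_neighbours("titan"))


results = [load_and_query(b) for b in (AdjacencyBackend(), EdgeListBackend())]
```

Because `load_and_query` depends only on the shared interface, swapping one backend for the other requires no change to the application code.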