Introduction to TitanDB
Bharat Singh
Software Consultant
Knoldus Software LLP.
Agenda
● Graph Database
– What is Graph Database
– Need for Graph Database
● Titan DB
– Why Titan DB
– CAP theorem
– Architecture overview
– Future of TitanDB
● Apache TinkerPop
– What is Apache TinkerPop
– Need for Apache TinkerPop
What is Graph Database
● A database that uses graph
structures for semantic queries
with nodes, edges and properties
to represent and store data.
● Most graph databases are NoSQL
in nature.
● They often store their data in an underlying
key-value store or document-oriented database.
● They store relationships between values
as first-class citizens.
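To make the node, edge and property vocabulary concrete, here is a minimal in-memory sketch. The class and method names (PropertyGraph, add_node, add_edge, out_edges) are invented for illustration and are not taken from Titan or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}   # node id -> Node
        self.edges = []   # every relationship is stored as its own record

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label, **properties):
        self.edges.append(Edge(source, target, label, properties))

    def out_edges(self, node_id, label=None):
        """Return edges leaving node_id, optionally filtered by label."""
        return [e for e in self.edges
                if e.source == node_id and (label is None or e.label == label)]


# Hypothetical sample data.
graph = PropertyGraph()
graph.add_node(1, name="alice")
graph.add_node(2, name="bob")
graph.add_edge(1, 2, "knows", since=2015)
knows = graph.out_edges(1, "knows")
```

The "knows" relationship is a stored record with its own property (since), which is what treating relationships as first-class citizens means in practice.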
Need for Graph Database
● Data is increasingly connected and
shared across multiple applications
on the web.
● Data stored in a graph structure is
easier to query when nodes are
highly connected.
● It removes the need to perform
multiple join operations to reach
adjacent neighbours.
● It allows the use of many graph
algorithms that help in optimization.
● It allows visualization of data and
helps infer hidden relationships or
derive predictions from data.
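As a rough illustration of querying without joins, the sketch below finds friends of friends by following adjacency lists with a breadth-first search. The data and the helper name within_hops are hypothetical.

```python
from collections import deque

# Hypothetical sample data: who knows whom.
KNOWS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
}


def within_hops(adjacency, start, max_hops):
    """Map each vertex reachable in at most max_hops steps to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


# Friends of friends are the vertices exactly two hops away.
friends_of_friends = sorted(
    name for name, hops in within_hops(KNOWS, "alice", 2).items() if hops == 2
)
```

Each hop is a lookup of a vertex's neighbours rather than a join between tables, so the query shape stays the same as the hop count grows.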
Why Titan DB
● Support for very large graphs. Titan graphs scale with
the number of machines in the cluster.
● Support for ACID properties and eventual consistency.
● Support for very many concurrent transactions and
operational graph processing.
● Titan’s transactional capacity scales with the number of
machines in the cluster and answers complex traversal
queries on huge graphs in milliseconds.
● Vertex-centric indices provide vertex-level querying to
solve the infamous super node problem.
● Provides an optimized disk representation to allow for
efficient use of storage and speed of access.
● Open source with the liberal Apache 2 license.
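The idea behind a vertex-centric index can be sketched as follows: each vertex keeps its incident edges sorted by label and a sort key, so a query on a super node reads only the matching slice instead of scanning every edge. This is a simplified illustration and not Titan's actual storage layout; the Vertex class and its methods are invented for the example.

```python
import bisect
from collections import defaultdict


class Vertex:
    """Vertex whose incident edges are indexed locally by (label, sort key)."""

    def __init__(self, name):
        self.name = name
        # label -> list of (sort_key, neighbour) tuples kept in sorted order
        self._edges = defaultdict(list)

    def add_edge(self, label, sort_key, neighbour):
        bisect.insort(self._edges[label], (sort_key, neighbour))

    def edges_in_range(self, label, low, high):
        """Return neighbours over edges with low <= sort_key < high."""
        entries = self._edges.get(label, [])
        # A one-element tuple sorts before any (sort_key, neighbour) pair
        # with the same sort_key, so these are the slice boundaries.
        start = bisect.bisect_left(entries, (low,))
        end = bisect.bisect_left(entries, (high,))
        return [neighbour for _, neighbour in entries[start:end]]


# Hypothetical super node with 10,000 incident edges keyed by timestamp.
celebrity = Vertex("celebrity")
for timestamp in range(10_000):
    celebrity.add_edge("followed_by", timestamp, f"user{timestamp}")

recent = celebrity.edges_in_range("followed_by", 9_995, 10_000)
```

The range query uses two binary searches and touches five edges, however many followers the vertex has.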
Features of Titan DB
● Support for various storage backends:
– Apache Cassandra
– Apache HBase
– Oracle BerkeleyDB
● Support for global graph data analytics, reporting, and ETL
through integration with big data platforms:
– Apache Spark
– Apache Giraph
– Apache Hadoop
● Support for geo, numeric range, and full-text search via:
– ElasticSearch
– Solr
– Lucene
● Native integration with the TinkerPop graph stack:
– Gremlin graph query language
– Gremlin graph server
– Gremlin applications
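Storage and index backends are selected through configuration rather than code. The sketch below renders such a configuration as properties text. The option keys are recalled from the Titan documentation and should be checked against the version in use; the hostnames are placeholders, and render_properties is a hypothetical helper.

```python
def render_properties(settings):
    """Render a flat dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Assumed example: Cassandra for storage, Elasticsearch for search.
# The hostnames are placeholders, not values from the slides.
cassandra_with_search = {
    "storage.backend": "cassandra",
    "storage.hostname": "127.0.0.1",
    "index.search.backend": "elasticsearch",
    "index.search.hostname": "127.0.0.1",
}

properties_text = render_properties(cassandra_with_search)
```

Under this assumption, switching the storage backend would mean changing the storage.backend value while the graph code stays the same.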
CAP Theorem
● CAP Theorem
– C = Consistency
– A = Availability
– P = Partition tolerance
● HBase favours consistency
– At the expense of yield
– i.e. some requests do not complete
● Cassandra favours availability
– At the expense of harvest
– i.e. answers may be incomplete
● Berkeley DB is not distributed
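Yield and harvest can be read as two simple ratios: yield is the fraction of requests that complete, and harvest is the fraction of the data reflected in an answer. The numbers below are made up to contrast the two trade-offs during a network partition; they are not measurements of HBase or Cassandra.

```python
def request_yield(completed_requests, total_requests):
    """Fraction of requests that completed."""
    return completed_requests / total_requests


def harvest(data_reflected, data_total):
    """Fraction of the full data set reflected in an answer."""
    return data_reflected / data_total


# Hypothetical partition scenario with 1000 requests and 3 data shards.

# Consistency-favouring store: rejects 200 requests, but every answer is complete.
cp_yield = request_yield(800, 1000)
cp_harvest = harvest(3, 3)

# Availability-favouring store: answers everything, but from 2 of the 3 shards.
ap_yield = request_yield(1000, 1000)
ap_harvest = harvest(2, 3)
```

In this made-up scenario the first store gives up yield to keep harvest at 1.0, and the second keeps yield at 1.0 while harvest drops to two thirds.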
Architecture overview of Titan DB
Future of TitanDB
● Aurelius is the startup behind
Titan, an open source graph
database
● DataStax, the company that
delivers Apache Cassandra™ to
the enterprise, acquired
Aurelius on Feb 3rd, 2015
● The Aurelius team will join
DataStax to build DataStax
Enterprise (DSE) Graph, adding
graph database capabilities into
DSE alongside Apache
Cassandra
What is Apache TinkerPop
● A Graph processing system,
currently under Apache
incubation
● Has TinkerPop3 Structure
API
– Graph, Element, Property
● Has TinkerPop3 Process API
– TraversalSource, GraphComputer
● Gremlin query language
– A scripting language for graph traversal
and mutation
● REST API
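To give a feel for how a Gremlin traversal reads, the toy below imitates a few Gremlin step names (V, has, out, values) in plain Python. It is not the TinkerPop API; ToyGraph and Traversal are invented stand-ins, and the data is a hypothetical sample. The Gremlin query it mimics would be g.V().has('name','marko').out('knows').values('name').

```python
class Traversal:
    """Toy fluent traversal that imitates a few Gremlin step names."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = list(current)  # vertex ids the traversal is currently at

    def has(self, key, value):
        """Keep only vertices whose property key equals value."""
        kept = [v for v in self.current
                if self.graph.vertices[v].get(key) == value]
        return Traversal(self.graph, kept)

    def out(self, label):
        """Move along outgoing edges with the given label."""
        reached = [target
                   for v in self.current
                   for (source, edge_label, target) in self.graph.edges
                   if source == v and edge_label == label]
        return Traversal(self.graph, reached)

    def values(self, key):
        """Return the property value of each current vertex."""
        return [self.graph.vertices[v][key] for v in self.current]


class ToyGraph:
    def __init__(self):
        self.vertices = {}  # vertex id -> property dict
        self.edges = []     # (source id, label, target id)

    def V(self):
        return Traversal(self, self.vertices.keys())


# Hypothetical sample data.
g = ToyGraph()
g.vertices = {1: {"name": "marko"}, 2: {"name": "vadas"}, 4: {"name": "josh"}}
g.edges = [(1, "knows", 2), (1, "knows", 4)]

names = g.V().has("name", "marko").out("knows").values("name")
```

Each step takes the current set of vertices and produces the next one, which is why traversals compose as a chain.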
Need for Apache TinkerPop
Dealing with such complex databases requires a
well-implemented API from the vendor. But using a
vendor-specific API makes migrating to another
database difficult.
Apache TinkerPop provides the solution: a common API across graph databases.
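The value of a common API can be sketched with a small interface and two interchangeable backends. GraphStore and both implementations are hypothetical and much simpler than the real TinkerPop interfaces; the point is only that the application function is written once.

```python
from abc import ABC, abstractmethod


class GraphStore(ABC):
    """Hypothetical vendor-neutral interface the application depends on."""

    @abstractmethod
    def add_edge(self, source, target):
        ...

    @abstractmethod
    def neighbours(self, vertex):
        ...


class AdjacencyListStore(GraphStore):
    """Stand-in for one vendor: keeps a set of neighbours per vertex."""

    def __init__(self):
        self._adjacency = {}

    def add_edge(self, source, target):
        self._adjacency.setdefault(source, set()).add(target)

    def neighbours(self, vertex):
        return sorted(self._adjacency.get(vertex, set()))


class EdgeListStore(GraphStore):
    """Stand-in for another vendor: keeps a flat list of edges."""

    def __init__(self):
        self._edges = []

    def add_edge(self, source, target):
        self._edges.append((source, target))

    def neighbours(self, vertex):
        return sorted({t for s, t in self._edges if s == vertex})


def load_and_query(store):
    """Application logic written once against the shared interface."""
    store.add_edge("a", "b")
    store.add_edge("a", "c")
    return store.neighbours("a")


result_one = load_and_query(AdjacencyListStore())
result_two = load_and_query(EdgeListStore())
```

Swapping the backend changes one constructor call and leaves load_and_query untouched.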
References
● https://en.wikipedia.org/wiki/Graph_database
● http://thinkaurelius.github.io/titan/
● http://tinkerpop.apache.org/docs/3.2.0-incubating/reference/
● http://www.datastax.com/2015/02/datastax-acquires-aurelius-the-experts-behind-titandb
Thank You
