SlideShare a Scribd company logo
GRAPH DATABASES: THE
SOLUTION FOR STORING
SEMI-STRUCTURED BIG DATA
Mohamed
Taher
Alrefaie
DATA IS
GETTING
BIGGER“Every two days, we
create as much
information as we
did us to 2003”. Eric
Schmidt, former
Google CEO, 2010.
DATA IS
MORE
CONNECTEDHaving a look at the
following proves it:
- Facebook Graph
- LinkedIn Graph
- Linked Data
- Blogs/Tagging
DATA IS LESS STRUCTURED
Modelling FB
Graph?
Persons,
friendships,
photos, locations,
apps, pages, ads,
interests, age
range, etc.
NOSQL DATABASES
Four types of
databases that
alleviate the
performance
issues of
relational
databases
KEY VALUE STORES
Data Model:
 Global key-value mapping
 Big scalable HashMap
 Highly fault tolerant (typically)
Examples:
 Redis, Riak, Voldemort. Dynamo
KEY VALUE STORES: PROS AND
CONS
Pros:
Simple data model
Scalable
Cons
Create your own “foreign keys”
Poor for complex data
COLUMN FAMILY
Main idea is based on BigTable: Google’s
distributed storage model for Structured Data
Data Model:
A big table, with column families
Map Reduce for querying/processing
Examples:
 HBase, HyperTable, Cassandra
COLUMN FAMILY: PROS AND CONS
Pros:
Supports Semi-Structured Data
Naturally Indexed (columns)
Scalable
Cons
Poor for interconnected data
DOCUMENT DATABASES
Data Model:
A collection of documents
A document is a key value collection
Index-centric, uses map-reduce extensively
Examples:
 CouchDB, MongoDB
DOCUMENT DATABASES: PROS AND
CONS
Pros:
Simple, powerful data model
Scalable
Cons
Poor for interconnected data
Query model limited to keys and indexes
Map reduce for larger queries
GRAPH DATABASES
Data Model:
Nodes and Relationships
Examples:
 Titan, Neo4j, OrientDB, etc.
GRAPH DATABASES: PROS AND
CONS
Pros:
Powerful data model, as general as RDBMS
Connected data locally indexed
Easy to query
Cons
Sharding
Requires different data modelling
RDBMS
LIVING IN A NOSQL WORLD
Complexity
BigTable
Clones
Size
Key-Value
Store
Document
Databases
Graph
Databases
90% of
Use Cases
Relational
Databases
9,223,372,036,854,775,807
WHAT IS A GRAPH?
An abstract representation of a set of objects where
some pairs are connected by links.
Object (Vertex, Node)
Link (Edge, Arc,
Relationship)
WHAT IS A GRAPH DATABASE?
A database with an explicit graph structure
Each node knows its adjacent nodes through edges
As the number of nodes increases, the cost of a local
step (or hop) remains the same plus an Index for
lookups
APACHE TINKERPOP: A UNIFIED API
Dealing with such
complex databases,
requires a well-
implemented API by the
vendor. But using a
vendor specific API,
makes migrating to
another database
impossible.
The solution is provided
by Apache Tinkerpop.
WHAT IS APACHE TINKERPOP?
● A Graph processing system
● Currently under Apache incubation ( 2015 )
● Has Tinkerpop3 Structure API
● Graph, Element, Property
● Has Tinkerpop3 Process API
● TraversalSource, GraphComputer
● Gremlin query language
● A scripting language for graph traversal and mutation
● REST API
WHY APACHE TINKERPOP?
Tinkerpop is a generic API for graph databases
Think ODBC, JDBC or Hibernate for relational
databases
Integrates with:
Titan DB
Neo4j
Orient DB
And many more.
Uses Gremlin graph scripting language
TITAN DATABASE
Titan is a scalable graph database using Tinkerpop
APIs optimized for storing and querying graphs
containing hundreds of billions of vertices and edges
distributed across a multi-machine cluster.
Supports Apache Spark and Hadoop (implicitly) for
map-reduce operations.
Integrates with:
 Elasticsearch, Solr, Lucene
Uses as a backend storage:
 Apache Cassandra
 Apache Hbase
PUTTING IT ALL TOGETHER
Apache Tinkerpop API
Gremlin server Graph traversal Gremlin client Monitoring
Titan DB
Storage specific (Cassandra, HBase, BerkeleyDB)
TITAN: EXAMPLE
Download titan server and console here
 https://github.com/thinkaurelius/titan/wiki/Downloads
$ cd titan-1.0.0-hadoop1
$ bin/gremlin.sh
gremlin> graph=TitanFactory.open(“conf/titan-berkely-
es.properties”)
gremlin> g=GraphOfGodsFactory.load(graph).traversal()
TINKERPOP: EXAMPLE
Graph g = TinkerGraph.open(); (1)
Vertex marko = g.addVertex(Element.ID, 1, "name", "marko", "age", 29); (2)
Vertex vadas = g.addVertex(Element.ID, 2, "name", "vadas", "age", 27);
Vertex lop = g.addVertex(Element.ID, 3, "name", "lop", "lang", "java");
Vertex josh = g.addVertex(Element.ID, 4, "name", "josh", "age", 32);
Vertex ripple = g.addVertex(Element.ID, 5, "name", "ripple", "lang", "java");
Vertex peter = g.addVertex(Element.ID, 6, "name", "peter", "age", 35);
marko.addEdge("knows", vadas, Element.ID, 7, "weight", 0.5f); (3)
marko.addEdge("knows", josh, Element.ID, 8, "weight", 1.0f);
marko.addEdge("created", lop, Element.ID, 9, "weight", 0.4f);
josh.addEdge("created", ripple, Element.ID, 10, "weight", 1.0f);
josh.addEdge("created", lop, Element.ID, 11, "weight", 0.4f);
peter.addEdge("created", lop, Element.ID, 12, "weight", 0.2f);
TINKERPOP: EXAMPLE (CONT.)
gremlin> g.V().has('name','marko')
.out('knows')
.values('name') (3)
==>vadas
==>josh
SUMMARY
Graph databases are the solution for highly scalable
semi-structured connected data.
Apache Tinkerpop is a generic API for graph databases
to avoid DB vendor specific business logic code.
Titan DB is a scalable distributed graph database on
top of several other databases. It uses BerkeleyDB,
HBase or BerkeleyDB as an end storage. This helps the
database to be as linear or scalable you want it to be.
REFERENCES
http://www.slideshare.net/maxdemarzi/introduction-to-graph-
databases-12735789
http://www.slideshare.net/mikejf12/an-introduction-to-apache-
tinkerpop
http://www.tinkerpop.com
http://tinkerpop.incubator.apache.org
http://tinkerpop.incubator.apache.org/docs/3.0.0.M9-
incubating/#gremlin-console
http://www.titandb.io
MOHAMED TAHER
ALREFAIE
07/12/2015

More Related Content

What's hot

Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Databricks
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
Amund Tveit
 
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
jaxLondonConference
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
Symeon Papadopoulos
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
Jonathan Holloway
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Databricks
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Databricks
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
Databricks
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
P. Taylor Goetz
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit East
Databricks
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Spark Summit
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Cedric CARBONE
 
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Turi, Inc.
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
darach
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
Modern Data Stack France
 
Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014
Claudio Martella
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
Sri Ambati
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
Databricks
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
Databricks
 
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With Python
Sarah Guido
 

What's hot (20)

Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
 
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with Spark
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit East
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
 
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infr...
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
 
Analyzing Data With Python
Analyzing Data With PythonAnalyzing Data With Python
Analyzing Data With Python
 

Viewers also liked

Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinIntro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Caleb Jones
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandra
johnrjenson
 
Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3
Matthias Broecheler
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph Data
Marko Rodriguez
 
Titan: Big Graph Data with Cassandra
Titan: Big Graph Data with CassandraTitan: Big Graph Data with Cassandra
Titan: Big Graph Data with Cassandra
Matthias Broecheler
 
Adding Value through graph analysis using Titan and Faunus
Adding Value through graph analysis using Titan and FaunusAdding Value through graph analysis using Titan and Faunus
Adding Value through graph analysis using Titan and Faunus
Matthias Broecheler
 
Gremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageGremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming Language
Marko Rodriguez
 
Analysis on Cloud Computing Database
Analysis on Cloud Computing DatabaseAnalysis on Cloud Computing Database
Analysis on Cloud Computing Database
Mayuree Srikulwong
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Symeon Papadopoulos
 
The Path Forward
The Path ForwardThe Path Forward
The Path Forward
Marko Rodriguez
 
Apache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataApache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-Data
Guido Schmutz
 
Addressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandraAddressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandra
Nakul Jeirath
 
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax
 
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
DataWorks Summit/Hadoop Summit
 
Big Graph Data
Big Graph DataBig Graph Data
Big Graph Data
Matthias Broecheler
 
Graph db
Graph dbGraph db
Graph db
Gagan Agrawal
 
OrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesOrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databases
Curtis Mosters
 
Graph Databases for Master Data Management
Graph Databases for Master Data ManagementGraph Databases for Master Data Management
Graph Databases for Master Data Management
Neo4j
 
Using a Graph Database for Next-Gen MDM
Using a Graph Database for Next-Gen MDMUsing a Graph Database for Next-Gen MDM
Using a Graph Database for Next-Gen MDM
Neo4j
 
Solving Problems with Graphs
Solving Problems with GraphsSolving Problems with Graphs
Solving Problems with Graphs
Marko Rodriguez
 

Viewers also liked (20)

Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinIntro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandra
 
Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph Data
 
Titan: Big Graph Data with Cassandra
Titan: Big Graph Data with CassandraTitan: Big Graph Data with Cassandra
Titan: Big Graph Data with Cassandra
 
Adding Value through graph analysis using Titan and Faunus
Adding Value through graph analysis using Titan and FaunusAdding Value through graph analysis using Titan and Faunus
Adding Value through graph analysis using Titan and Faunus
 
Gremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageGremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming Language
 
Analysis on Cloud Computing Database
Analysis on Cloud Computing DatabaseAnalysis on Cloud Computing Database
Analysis on Cloud Computing Database
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
 
The Path Forward
The Path ForwardThe Path Forward
The Path Forward
 
Apache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataApache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-Data
 
Addressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandraAddressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandra
 
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
 
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
 
Big Graph Data
Big Graph DataBig Graph Data
Big Graph Data
 
Graph db
Graph dbGraph db
Graph db
 
OrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesOrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databases
 
Graph Databases for Master Data Management
Graph Databases for Master Data ManagementGraph Databases for Master Data Management
Graph Databases for Master Data Management
 
Using a Graph Database for Next-Gen MDM
Using a Graph Database for Next-Gen MDMUsing a Graph Database for Next-Gen MDM
Using a Graph Database for Next-Gen MDM
 
Solving Problems with Graphs
Solving Problems with GraphsSolving Problems with Graphs
Solving Problems with Graphs
 

Similar to Graph databases: Tinkerpop and Titan DB

Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
Walaa Hamdy Assy
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
Databricks
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
datastack
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Chris Baglieri
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on Hadoop
Paco Nathan
 
Gluecon 2012 - DynamoDB
Gluecon 2012 - DynamoDBGluecon 2012 - DynamoDB
Gluecon 2012 - DynamoDB
Jeff Douglas
 
DynamoDB Gluecon 2012
DynamoDB Gluecon 2012DynamoDB Gluecon 2012
DynamoDB Gluecon 2012
Appirio
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
Antonio Silveira
 
Graph database in sv meetup
Graph database in sv meetupGraph database in sv meetup
Graph database in sv meetup
Joshua Bae
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Databricks
 
Functional programming
 for optimization problems 
in Big Data
Functional programming
  for optimization problems 
in Big DataFunctional programming
  for optimization problems 
in Big Data
Functional programming
 for optimization problems 
in Big Data
Paco Nathan
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Update
vithakur
 
AWS Big Data Landscape
AWS Big Data LandscapeAWS Big Data Landscape
AWS Big Data Landscape
Crishantha Nanayakkara
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
Cloudera, Inc.
 
Google Cluster Innards
Google Cluster InnardsGoogle Cluster Innards
Google Cluster Innards
Martin Dvorak
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Andrey Vykhodtsev
 
GraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph DatabasesGraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph Databases
Linkurious
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
Databricks
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Rohit Kulkarni
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
DataStax Academy
 

Similar to Graph databases: Tinkerpop and Titan DB (20)

Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on Hadoop
 
Gluecon 2012 - DynamoDB
Gluecon 2012 - DynamoDBGluecon 2012 - DynamoDB
Gluecon 2012 - DynamoDB
 
DynamoDB Gluecon 2012
DynamoDB Gluecon 2012DynamoDB Gluecon 2012
DynamoDB Gluecon 2012
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Graph database in sv meetup
Graph database in sv meetupGraph database in sv meetup
Graph database in sv meetup
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
Functional programming
 for optimization problems 
in Big Data
Functional programming
  for optimization problems 
in Big DataFunctional programming
  for optimization problems 
in Big Data
Functional programming
 for optimization problems 
in Big Data
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Update
 
AWS Big Data Landscape
AWS Big Data LandscapeAWS Big Data Landscape
AWS Big Data Landscape
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Google Cluster Innards
Google Cluster InnardsGoogle Cluster Innards
Google Cluster Innards
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
 
GraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph DatabasesGraphTech Ecosystem - part 1: Graph Databases
GraphTech Ecosystem - part 1: Graph Databases
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
 

Recently uploaded

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
TaghreedAltamimi
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
ISH Technologies
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
Rakesh Kumar R
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
safelyiotech
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
YousufSait3
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
Alberto Brandolini
 
YAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring detailsYAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring details
NishanthaBulumulla1
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
GohKiangHock
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
Bert Jan Schrijver
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
ShulagnaSarkar2
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 

Recently uploaded (20)

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Preparing Non - Technical Founders for Engaging a Tech Agency
Preparing Non - Technical Founders for Engaging  a  Tech AgencyPreparing Non - Technical Founders for Engaging  a  Tech Agency
Preparing Non - Technical Founders for Engaging a Tech Agency
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
YAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring detailsYAML crash COURSE how to write yaml file for adding configuring details
YAML crash COURSE how to write yaml file for adding configuring details
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 

Graph databases: Tinkerpop and Titan DB

  • 1. GRAPH DATABASES: THE SOLUTION FOR STORING SEMI-STRUCTURED BIG DATA Mohamed Taher Alrefaie
  • 2. DATA IS GETTING BIGGER“Every two days, we create as much information as we did us to 2003”. Eric Schmidt, former Google CEO, 2010.
  • 3. DATA IS MORE CONNECTEDHaving a look at the following proves it: - Facebook Graph - LinkedIn Graph - Linked Data - Blogs/Tagging
  • 4. DATA IS LESS STRUCTURED Modelling FB Graph? Persons, friendships, photos, locations, apps, pages, ads, interests, age range, etc.
  • 5. NOSQL DATABASES Four types of databases that alleviate the performance issues of relational databases
  • 6. KEY VALUE STORES Data Model:  Global key-value mapping  Big scalable HashMap  Highly fault tolerant (typically) Examples:  Redis, Riak, Voldemort. Dynamo
  • 7. KEY VALUE STORES: PROS AND CONS Pros: Simple data model Scalable Cons Create your own “foreign keys” Poor for complex data
  • 8. COLUMN FAMILY Main idea is based on BigTable: Google’s distributed storage model for Structured Data Data Model: A big table, with column families Map Reduce for querying/processing Examples:  HBase, HyperTable, Cassandra
  • 9. COLUMN FAMILY: PROS AND CONS Pros: Supports Semi-Structured Data Naturally Indexed (columns) Scalable Cons Poor for interconnected data
  • 10. DOCUMENT DATABASES Data Model: A collection of documents A document is a key value collection Index-centric, uses map-reduce extensively Examples:  CouchDB, MongoDB
  • 11. DOCUMENT DATABASES: PROS AND CONS Pros: Simple, powerful data model Scalable Cons Poor for interconnected data Query model limited to keys and indexes Map reduce for larger queries
  • 12. GRAPH DATABASES Data Model: Nodes and Relationships Examples:  Titan, Neo4j, OrientDB, etc.
  • 13. GRAPH DATABASES: PROS AND CONS Pros: Powerful data model, as general as RDBMS Connected data locally indexed Easy to query Cons Sharding Requires different data modelling
  • 14. RDBMS LIVING IN A NOSQL WORLD Complexity BigTable Clones Size Key-Value Store Document Databases Graph Databases 90% of Use Cases Relational Databases 9,223,372,036,854,775,807
  • 15. WHAT IS A GRAPH? An abstract representation of a set of objects where some pairs are connected by links. Object (Vertex, Node) Link (Edge, Arc, Relationship)
  • 16. WHAT IS A GRAPH DATABASE? A database with an explicit graph structure Each node knows its adjacent nodes through edges As the number of nodes increases, the cost of a local step (or hop) remains the same plus an Index for lookups
  • 17. APACHE TINKERPOP: A UNIFIED API Dealing with such complex databases, requires a well- implemented API by the vendor. But using a vendor specific API, makes migrating to another database impossible. The solution is provided by Apache Tinkerpop.
  • 18. WHAT IS APACHE TINKERPOP? ● A Graph processing system ● Currently under Apache incubation ( 2015 ) ● Has Tinkerpop3 Structure API ● Graph, Element, Property ● Has Tinkerpop3 Process API ● TraversalSource, GraphComputer ● Gremlin query language ● A scripting language for graph traversal and mutation ● REST API
  • 19. WHY APACHE TINKERPOP? Tinkerpop is a generic API for graph databases Think ODBC, JDBC or Hibernate for relational databases Integrates with: Titan DB Neo4j Orient DB And many more. Uses Gremlin graph scripting language
  • 20. TITAN DATABASE Titan is a scalable graph database using Tinkerpop APIs optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Supports Apache Spark and Hadoop (implicitly) for map-reduce operations. Integrates with:  Elasticsearch, Solr, Lucene Uses as a backend storage:  Apache Cassandra  Apache Hbase
  • 21. PUTTING IT ALL TOGETHER Apache Tinkerpop API Gremlin server Graph traversal Gremlin client Monitoring Titan DB Storage specific (Cassandra, HBase, BerkeleyDB)
  • 22. TITAN: EXAMPLE Download titan server and console here  https://github.com/thinkaurelius/titan/wiki/Downloads $ cd titan-1.0.0-hadoop1 $ bin/gremlin.sh gremlin> graph=TitanFactory.open(“conf/titan-berkely- es.properties”) gremlin> g=GraphOfGodsFactory.load(graph).traversal()
  • 23. TINKERPOP: EXAMPLE Graph g = TinkerGraph.open(); (1) Vertex marko = g.addVertex(Element.ID, 1, "name", "marko", "age", 29); (2) Vertex vadas = g.addVertex(Element.ID, 2, "name", "vadas", "age", 27); Vertex lop = g.addVertex(Element.ID, 3, "name", "lop", "lang", "java"); Vertex josh = g.addVertex(Element.ID, 4, "name", "josh", "age", 32); Vertex ripple = g.addVertex(Element.ID, 5, "name", "ripple", "lang", "java"); Vertex peter = g.addVertex(Element.ID, 6, "name", "peter", "age", 35); marko.addEdge("knows", vadas, Element.ID, 7, "weight", 0.5f); (3) marko.addEdge("knows", josh, Element.ID, 8, "weight", 1.0f); marko.addEdge("created", lop, Element.ID, 9, "weight", 0.4f); josh.addEdge("created", ripple, Element.ID, 10, "weight", 1.0f); josh.addEdge("created", lop, Element.ID, 11, "weight", 0.4f); peter.addEdge("created", lop, Element.ID, 12, "weight", 0.2f);
  • 24. TINKERPOP: EXAMPLE (CONT.) gremlin> g.V().has('name','marko') .out('knows') .values('name') (3) ==>vadas ==>josh
  • 25. SUMMARY Graph databases are the solution for highly scalable semi-structured connected data. Apache Tinkerpop is a generic API for graph databases to avoid DB vendor specific business logic code. Titan DB is a scalable distributed graph database on top of several other databases. It uses BerkeleyDB, HBase or BerkeleyDB as an end storage. This helps the database to be as linear or scalable you want it to be.