Titan and Cassandra at WellAware

Ted Wilmes
tedwilmes@wellaware.us

Topics
● The property graph model
● The graph ecosystem
● Titan overview
● Titan at WellAware

Property graph model
● Vertices
○ label: truck, license: ABC123, year: 2013
○ label: person, firstName: Susan
○ label: company, name: Acme Trucks
● Edges
○ owns (bought: 2012): company → truck
○ employs (hired: 2012): company → person
○ drives: person → truck
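
To make the model concrete, here is a minimal plain-Python sketch of the same graph. It is not Titan's or Tinkerpop's API: the dictionary keys, the tuple layout, and the `out` helper are all hypothetical names chosen for illustration, and the edge directions are assumed from the edge labels.

```python
# Plain-Python sketch of a property graph; not Titan's or Tinkerpop's API.
# Vertex ids ("truck", "susan", "acme") are hypothetical.
vertices = {
    "truck": {"label": "truck", "license": "ABC123", "year": 2013},
    "susan": {"label": "person", "firstName": "Susan"},
    "acme": {"label": "company", "name": "Acme Trucks"},
}

# Each edge: (out vertex id, edge label, in vertex id, edge properties).
# Directions are an assumption based on the edge labels.
edges = [
    ("acme", "owns", "truck", {"bought": 2012}),
    ("acme", "employs", "susan", {"hired": 2012}),
    ("susan", "drives", "truck", {}),
]


def out(vertex_id, edge_label):
    """Return ids of vertices reached by outgoing edges with the given label."""
    return [dst for src, label, dst, _ in edges
            if src == vertex_id and label == edge_label]


# First names of everyone the company employs.
employees = [vertices[v]["firstName"] for v in out("acme", "employs")]
print(employees)
```

Both vertices and edges carry a label plus arbitrary key/value properties, which is the defining trait of the property graph model.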

Sampling of graph projects/vendors

Apache Tinkerpop - tying it all together!
● Gremlin Server
○ Remote access to Tinkerpop-compliant graph dbs for JVM & non-JVM clients
● Gremlin
○ Graph query and processing language
● Core API
○ add vertex
○ add edge
○ add/update properties
○ simple queries (adjacent edge/vertex retrieval)
http://tinkerpop.incubator.apache.org/

Gremlin in action
● Add vertices and edges
● Retrieving vertices
● Basic vertex filtering
● Querying adjacent edges and vertices
● Example graph
○ vehicle (license: ABC123, year: 2013)
○ person (firstName: Susan)
○ person (firstName: Tom)
○ company (name: Acme Trucks)
○ company employs Susan (hired: 2012)
○ company employs Tom (hired: 2014)
○ company owns vehicle (bought: 2012)
○ Tom drives vehicle

Building the graph
// Open the graph using the Cassandra-backed configuration
graph = TitanFactory.open('conf/titan-cassandra.properties')
// Add vertices
acmeTrucks = graph.addVertex(T.label, "company", "name", "Acme Trucks")
susan = graph.addVertex(T.label, "person", "firstName", "Susan")
tom = graph.addVertex(T.label, "person", "firstName", "Tom")
truck = graph.addVertex(T.label, "vehicle", "license", "ABC123", "year", 2012)
// Connect vertices with edges
edge = acmeTrucks.addEdge("owns", truck)
edge.property("bought", 2012)
acmeTrucks.addEdge("employs", susan).property("hired", 2012)
acmeTrucks.addEdge("employs", tom).property("hired", 2014)
tom.addEdge("drives", truck)

Retrieving vertices
// Get a traversal source so that we can run some queries
g = graph.traversal(standard())
gremlin> g.V()
==>v[0]
==>v[2]
==>v[4]
==>v[6]
// Get the properties for each vertex
gremlin> g.V().valueMap()
==>[name:[Acme Trucks]]
==>[firstName:[Susan]]
==>[firstName:[Tom]]
==>[license:[ABC123], year:[2012]]

Basic vertex filtering
// Retrieve all people with firstName Susan
gremlin> g.V().hasLabel("person").has("firstName", "Susan")
==>v[2]
// Retrieve all people with firstName Susan or Tom
gremlin> g.V().hasLabel("person").has("firstName", within("Susan", "Tom"))
==>v[2]
==>v[4]

Querying adjacent edges and vertices
// Count how many people Acme Trucks employs
gremlin> g.V().hasLabel("company").has("name", "Acme Trucks").out("employs").count()
==>2
// How many employees were hired in 2012?
gremlin> g.V().hasLabel("person").where(inE("employs").has("hired", 2012)).count()
==>1
// Which employees drive a truck?
gremlin> g.V().hasLabel("company").has("name", "Acme Trucks").out("employs").as("driver").out("drives").select("driver").values("firstName")
==>Tom
// Show me all of the drivers that were hired before 2015
gremlin> g.V().hasLabel("person").and(inE("employs").values("hired").is(lt(2015)), out("drives")).values("firstName")
==>Tom

Many more steps...
● AddEdge Step
● AddVertex Step
● AddProperty Step
● Aggregate Step
● And Step
● As Step
● By Step
● Cap Step
● Coalesce Step
● Count Step
● Choose Step
● Coin Step
● CyclicPath Step
● Dedup Step
● Drop Step
● Fold Step
● Group Step
● GroupCount Step
● Has Step
● Inject Step
● Is Step
● Limit Step
● Local Step
● Match Step
● ...

GraphComputer for global graph processing
● Use cases
○ full graph traversal
○ parallel processing
○ batch import/export
● Examples
○ PageRank
○ vertex count
○ mass schema update
● Gremlin OLAP implementations
○ Hadoop
○ Spark
○ Giraph
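
As an illustration of the kind of whole-graph computation listed above, here is a single-machine PageRank sketch in plain Python. It is not the GraphComputer API and says nothing about how Hadoop, Spark, or Giraph distribute the work; the `pagerank` function, its parameters, and the tiny example graph are assumptions for illustration only.

```python
def pagerank(out_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of vertex -> list of out-neighbors.

    Every neighbor must also appear as a key of out_edges.
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Every vertex starts each round with the "teleport" share.
        new_rank = {v: (1.0 - damping) / n for v in vertices}
        for v, neighbors in out_edges.items():
            if neighbors:
                # Split this vertex's rank evenly over its outgoing edges.
                share = damping * rank[v] / len(neighbors)
                for w in neighbors:
                    new_rank[w] += share
            else:
                # Dangling vertex: spread its rank evenly over all vertices.
                for w in vertices:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical graph shaped like the Acme Trucks example.
graph = {
    "acme": ["susan", "tom", "truck"],
    "susan": [],
    "tom": ["truck"],
    "truck": [],
}
ranks = pagerank(graph)
for vertex, score in sorted(ranks.items()):
    print(vertex, round(score, 3))
```

Each iteration touches every vertex and edge, which is why this style of job is run as a batch computation rather than as an OLTP traversal.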

Graph use cases
● Social network analysis
● Fraud detection
● Recommendation systems
● Route optimization
● IoT
● Master data management

TitanDB
● What is Titan?
● Data store options
● Deployment options
● Titan Cassandra data model
● Titan specific graph features

TitanDB
● Graph layer that can use a variety of data stores as backends depending on user requirements
○ HBase
○ Berkeley DB
○ Cassandra
○ Insert your favorite k/v, BigTable data store

Which data store is right for you?
● Things to think about
○ data volume
○ CAP
○ ACID
○ read/write requirements
○ ops implications
○ your current infrastructure
http://s3.thinkaurelius.com/docs/titan/0.5.4/benefits.html

A Titan cluster with access options
[Figure: five nodes, each pairing Titan with C*; access shown from a JVM over a socket and from a JVM with Titan embedded]
● Access options
○ Titan < 0.9
■ Rexster
■ dependency of your app
○ Titan 0.9+
■ Gremlin server
■ dependency of your app
○ Object to graph mapper
■ Python - Mogwai, Bulbs
■ JVM - Totorom, Frames
● Titan does not need to be on each node; all communication between Titan instances goes through C*

Titan installation
● Download and unzip latest milestone
● Cassandra footprint
○ Titan keyspace
○ Column families
■ edgestore
■ edgestore_lock_
■ graphindex
■ graphindex_lock_
■ titan_ids
■ ...
./bin/titan.sh start
Forking Cassandra...
Running `nodetool statusthrift`.. OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300). OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)...... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.

Vertex and edge storage format
[Figure labels: Cassandra Thrift; Titan storage format]

Edge and property serialization

Schema definition
● Properties
○ data type - string, float, char, geoshape, etc.
○ cardinality - single, list, set
○ uniqueness (through Titan’s indexing system)
● Edges
○ labels
○ define multiplicity - one-to-one, many-to-one, one-to-many
● Vertices
○ labels
● Advanced
○ edge, vertex, and property TTL
○ Meta-properties - properties on properties (audit info, for example)

Global indexing options
● Supports composite keys
● Titan indexing provider
○ fast!
○ exact matches only
● External providers
○ Not as fast
○ Many options beyond exact matching (wildcards, geosearch, etc.)
○ providers
■ Elasticsearch
■ Lucene
■ Solr
I want that one!
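
The exact-match limitation of a composite-key index can be sketched in plain Python. This is an analogy, not Titan's index implementation: the `composite_index` dict, the `lookup` and `prefix_scan` helpers, and the second vehicle are hypothetical.

```python
from collections import defaultdict

# Vehicle vertices keyed by vertex id. Vertex 8 is a hypothetical extra vehicle.
vehicles = {
    6: {"license": "ABC123", "year": 2012},
    8: {"license": "XYZ789", "year": 2012},
}

# Composite index over (license, year): the full key tuple maps to vertex ids.
composite_index = defaultdict(set)
for vertex_id, props in vehicles.items():
    composite_index[(props["license"], props["year"])].add(vertex_id)


def lookup(license_plate, year):
    """Exact-match lookup: every key of the composite must be supplied."""
    return composite_index.get((license_plate, year), set())


def prefix_scan(prefix):
    """A wildcard-style query the hash lookup cannot answer; scans every vertex."""
    return {vid for vid, props in vehicles.items()
            if props["license"].startswith(prefix)}


print(lookup("ABC123", 2012))   # single dictionary lookup
print(prefix_scan("ABC"))       # full scan over all vehicles
```

The dictionary lookup is fast but only answers equality on the whole key, which is the trade-off that sends wildcard and geo queries to an external provider.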

Vertex Centric Indices
● Adjacent edge counts can grow quite large in certain situations and form super nodes
● Supports composite keys and ordering of edges to speed up vertex-centric queries
○ translates into slice queries of the edges
○ efficiently retrieve ranges of edges or satisfy top-n queries
● Example: company (name: Acme Trucks) with employs edges
○ employs (hired: 2013)
○ employs (hired: 2014)
○ employs (hired: 2015)
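
The idea of ordering a vertex's edges so that range and top-n queries become slices can be sketched in plain Python with a sorted list and binary search. This is an analogy for the slice queries mentioned above, not Titan's storage code; the employee ids and helper names are hypothetical.

```python
import bisect

# "employs" edges of one company vertex, kept sorted by the "hired" property.
# Each entry is (hired, employee id); the employee ids are hypothetical.
employs_edges = sorted([(2014, "e2"), (2013, "e1"), (2015, "e3")])


def hired_between(start_year, end_year):
    """Return edges with start_year <= hired < end_year as one contiguous slice."""
    # A 1-tuple such as (2014,) sorts before every (2014, id) entry.
    lo = bisect.bisect_left(employs_edges, (start_year,))
    hi = bisect.bisect_left(employs_edges, (end_year,))
    return employs_edges[lo:hi]


def most_recent(n):
    """Top-n query: the n most recently hired, newest first."""
    return employs_edges[::-1][:n]


print(hired_between(2013, 2015))
print(most_recent(1))
```

Because the edges are already ordered by `hired`, neither query has to inspect every edge of the vertex, which is what matters once a vertex has millions of them.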

Graph partitioning with ByteOrderedPartitioner
?

Vertex cuts
[Figure: supernode with adjacent edges numbered 1 ... 2,000,000]
mgmt = graph.openManagement()
mgmt.makeVertexLabel('user').partition().make()
mgmt.commit()
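
The effect of a vertex cut, spreading one supernode's edges over several partitions instead of storing them all in one place, can be sketched in plain Python. The slide does not show how Titan assigns edges to partitions, so the hashing scheme, `NUM_PARTITIONS`, and `partition_for_edge` below are stand-in assumptions.

```python
from collections import Counter
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for_edge(neighbor_id):
    """Pick a partition for one of the supernode's edges from the neighbor's id."""
    return zlib.crc32(str(neighbor_id).encode("utf-8")) % NUM_PARTITIONS


# A supernode with 20,000 adjacent edges (scaled down from the 2,000,000 in the figure).
edge_counts = Counter(partition_for_edge(n) for n in range(1, 20001))

for partition, count in sorted(edge_counts.items()):
    print(partition, count)
```

No single partition ends up holding all of the supernode's edges, so reads and writes against that vertex are spread across the cluster.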

A bit more about WellAware
● Founded in 2012
● Full stack oil & gas monitoring solution
● iOS, Android, and web clients
● Connecting to field assets over RPMA, cellular, and satellite

Functionality and high level architecture
● Remote data collection
● Mobile data collection
● Asset control
● Derived measurements
● Alarming
● Reporting
[Figure labels: Poller, Django, Titan, WAN, ESB]

Moving to Titan
● 2013
○ Running Django against PostgreSQL and, for a while, TempoDB
● Beginning of 2014 - started using Titan 0.4.4 to capture relationships between assets and for derived measurements
● March 2014 - deployed a 3 node Cassandra cluster and moved the rest of the backend (minus auth) over to Titan 0.4.4
● Today - 3 node DC for OLTP & 2 node reporting DC
○ still on Titan 0.4.4, waiting for Titan 1.0 to be released and hardened
○ post Titan 1.0, we’re looking forward to trying out DSE Graph

A common well pad configuration
Well & pumpjack
Tanks

Sample of model
[Figure labels: O&G Co., TankSite, Top Gauge]

Zooming in on a well pad
[Figure labels: well, meter, separator, meter, tank, tank, compressor]

Lessons learned
● No native integration with 3rd party BI tools - reports, dashboards, ad hoc query
○ Apache Calcite based JDBC driver that translates SQL to graph queries
● Colocation of Titan, some of your application code, and Cassandra on the same nodes: what’s the right separation?
● Out of the box framework support is lacking (no native Spring or Dropwizard support)
● Performance tuning requires knowledge of Titan AND Cassandra
● Play to Cassandra and adjacency list storage format strengths
● You can’t hide from tombstones!!!

Graph and Titan resources
● Tinkerpop docs - http://www.tinkerpop.com/docs/3.0.0.M6/
● Titan docs - http://s3.thinkaurelius.com/docs/titan/0.9.0-M2/
● Titan Google group - https://groups.google.com/forum/#!forum/aureliusgraphs
● Gremlin Google group - https://groups.google.com/forum/#!forum/gremlin-users
● O’Reilly graph ebook (focuses on Neo4j but has generally applicable graph info) - http://graphdatabases.com/
● Java OGM - https://github.com/BrynCooke/totorom
● Python OGM - https://mogwai.readthedocs.org/en/latest/

Thanks and what questions do you have?
