HBaseCon 2017: Community-Driven Graphs with JanusGraph (updated)
1. Note: This is an updated version of the original presentation at HBaseCon West 2017.
Jing Chen He • jinghe@us.ibm.com • Apache HBase PMC • JanusGraph TSC
Jason Plurad • pluradj@us.ibm.com • Apache TinkerPop PMC • JanusGraph TSC
HBaseCon West 2017 • June 12, 2017
Community-Driven Graphs with JanusGraph
4. Graph Use Cases
Social network analysis
Recommendation engines
Knowledge graphs
Internet of Things
Fraud detection systems
Metadata and master data management
4 #HBaseCon
5. Graph Computing and Graph Database
Graph Computing and Processing framework
OLTP - Local traversal, real time.
OLAP - Entire data set accessed, long-running, parallel and batch processing.
Apache TinkerPop - OLTP and OLAP (TinkerPop SparkGraphComputer)
Apache Spark GraphX - OLAP
Graph Database
Graph Computing and Processing + Data Storage
Example: Neo4j - Integrated graph processing and storage in one.
Example: JanusGraph - Graph processing layer on top of external storage.
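To make the OLTP/OLAP distinction concrete, here is a minimal Python sketch over a toy in-memory adjacency map (the graph data is hypothetical). The OLTP-style function reads only the neighborhood of one start vertex; the OLAP-style function visits every vertex and edge, the access pattern a batch engine such as SparkGraphComputer is built for. It illustrates access patterns only, not how TinkerPop or GraphX execute internally.

```python
from collections import Counter

# Hypothetical toy graph: vertex -> list of out-neighbors ("knows" edges).
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}


def oltp_neighbors(graph, start):
    """OLTP-style: a local traversal that reads only the start vertex's adjacency."""
    return sorted(graph.get(start, []))


def olap_in_degrees(graph):
    """OLAP-style: scan the entire data set and aggregate over every edge."""
    degrees = Counter({v: 0 for v in graph})
    for targets in graph.values():
        for dst in targets:
            degrees[dst] += 1
    return dict(degrees)


if __name__ == "__main__":
    print(oltp_neighbors(GRAPH, "alice"))  # ['bob', 'carol']
    print(olap_in_degrees(GRAPH))
```

The cost of the first function grows with the size of one neighborhood; the cost of the second grows with the size of the whole graph, which is why the two workloads are served by different engines.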
6. Apache TinkerPop
Open source, vendor-agnostic graph computing framework
Gremlin graph traversal language
Apache TinkerPop™
Maintainer: Apache Software Foundation
License: Apache
Latest release: 3.2.4 (February 2017)
https://tinkerpop.apache.org
15. HBase – Perfect Storage Backend for JanusGraph
Big enough for your biggest graph!
The storage model
Read and write speed
Scalability and partitioning
Strong consistency
Tight integration with the Hadoop ecosystem
Great open community!
http://docs.janusgraph.org/latest/hbase.html
16. HBase – Perfect Storage Backend for JanusGraph
Simple configuration!
conf/janusgraph-hbase-solr.properties
storage.backend=hbase
storage.hostname=zookeeper-host1,zookeeper-host2,zookeeper-host3
storage.hbase.table=janusgraph
storage.hbase.ext.zookeeper.znode.parent=/hbase
storage.hbase.ext.hbase.zookeeper.property.clientPort=2181
Just open your graph!
graph=JanusGraphFactory.open('conf/janusgraph-hbase-solr.properties')
Optional: storage.hbase.table and the storage.hbase.ext.* settings; the values shown are the defaults.
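The storage.hbase.ext. prefix in this configuration passes settings straight through to the HBase client (storage.hbase.ext.<hbase-client-property>). A minimal Python sketch of that prefix-stripping idea follows; parse_properties and hbase_client_overrides are hypothetical helpers, not JanusGraph code, and the parser assumes a simple key=value format with # comments rather than the full Java .properties grammar (no escapes, continuations, or ':' separators).

```python
PASS_THROUGH_PREFIX = "storage.hbase.ext."


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments (hypothetical helper)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def hbase_client_overrides(props):
    """Collect storage.hbase.ext.* entries with the prefix removed, i.e. raw HBase client keys."""
    return {
        key[len(PASS_THROUGH_PREFIX):]: value
        for key, value in props.items()
        if key.startswith(PASS_THROUGH_PREFIX)
    }


CONFIG = """
storage.backend=hbase
storage.hostname=zookeeper-host1,zookeeper-host2,zookeeper-host3
storage.hbase.table=janusgraph
storage.hbase.ext.zookeeper.znode.parent=/hbase
storage.hbase.ext.hbase.zookeeper.property.clientPort=2181
"""

if __name__ == "__main__":
    print(hbase_client_overrides(parse_properties(CONFIG)))
    # {'zookeeper.znode.parent': '/hbase', 'hbase.zookeeper.property.clientPort': '2181'}
```

Keys without the prefix (storage.backend, storage.hostname, storage.hbase.table) are JanusGraph's own options and are left out of the HBase client overrides.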
17. HBase – Perfect Storage Backend for JanusGraph
Throw in an Index Backend for better performance
conf/janusgraph-hbase-solr.properties
index.search.backend=solr
index.search.solr.mode=cloud
index.search.solr.zookeeper-url=zookeeper-host1:2181/solr,zookeeper-host2:2181/solr,zookeeper-host3:2181/solr
index.search.solr.configset=janusgraph
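An external index backend such as Solr is what lets predicates like full-text search avoid visiting every vertex. The toy Python sketch below contrasts a full scan with an inverted-index lookup on tokenized property values; the vertex data and the whitespace tokenizer are hypothetical, and this shows only the data-structure idea, not Solr's implementation or JanusGraph's mixed-index plumbing.

```python
from collections import defaultdict

# Hypothetical vertices: vertex id -> text property value.
VERTICES = {
    1: "Graph databases on HBase",
    2: "Stream processing with Spark",
    3: "HBase tuning and pre-splitting",
}


def tokenize(text):
    """Naive lowercase whitespace tokenizer (a stand-in for a real text analyzer)."""
    return text.lower().split()


def full_scan(vertices, term):
    """No index: inspect every vertex's text."""
    needle = term.lower()
    return sorted(v for v, text in vertices.items() if needle in tokenize(text))


def build_inverted_index(vertices):
    """Map each token to the set of vertex ids whose text contains it."""
    index = defaultdict(set)
    for v, text in vertices.items():
        for token in tokenize(text):
            index[token].add(v)
    return index


def indexed_lookup(index, term):
    """With an index: one dictionary lookup instead of a scan."""
    return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    index = build_inverted_index(VERTICES)
    print(full_scan(VERTICES, "hbase"), indexed_lookup(index, "HBase"))  # [1, 3] [1, 3]
```

Both functions return the same answer; the difference is that the indexed lookup does work proportional to the matches rather than to the number of vertices, at the price of maintaining the index on every write.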
18. HBase – Perfect Storage Backend for JanusGraph
A closer look
Stores map to column families:
Edge store: e
Index store: g
ID store: i
System transaction log store: l
System management log store: m
System property store: s
…
Column family attributes, such as compression and TTL, can be set.
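JanusGraph normally creates its table itself, but if you pre-create it (for example, to set compression or TTL per column family), the store-to-column-family mapping above translates into an HBase shell create statement. A small Python sketch that renders one, assuming the table name janusgraph from the earlier configuration; the GZ codec and the TTL value are illustrative placeholders, and only the six column families listed on the slide are included (the slide's "…" means the real list may be longer).

```python
# Short column-family names as listed on the slide; the "…" there means this may be incomplete.
STORE_TO_CF = {
    "Edge store": "e",
    "Index store": "g",
    "ID store": "i",
    "System transaction log store": "l",
    "System management log store": "m",
    "System property store": "s",
}


def hbase_create_statement(table, cf_attributes):
    """Render an HBase shell `create` command; cf_attributes maps CF name -> {attribute: value}."""
    specs = []
    for cf in sorted(STORE_TO_CF.values()):
        parts = [f"NAME => '{cf}'"]
        for key, value in cf_attributes.get(cf, {}).items():
            # Integer attributes such as TTL (in seconds) are unquoted; strings are quoted.
            rendered = str(value) if isinstance(value, int) else f"'{value}'"
            parts.append(f"{key} => {rendered}")
        specs.append("{" + ", ".join(parts) + "}")
    return f"create '{table}', " + ", ".join(specs)


if __name__ == "__main__":
    # Illustrative choices: compress the edge store, expire transaction-log cells after 7 days.
    print(hbase_create_statement("janusgraph", {
        "e": {"COMPRESSION": "GZ"},
        "l": {"TTL": 604800},
    }))
```

Check any TTL against how long JanusGraph actually needs that data before applying it; a TTL on the wrong store silently deletes graph data.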
19. HBase – Perfect Storage Backend for JanusGraph
A closer look
g.V().has("name", "Alice").out("knows").out("knows").values("name")
[Diagram: the traversal passes through Gremlin TraversalStrategy optimization and JanusGraph optimization to produce an execution plan against the backend store and index: Edge Store, Index Store, and an index provider (ES or Solr).]
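Read step by step, the traversal finds the vertex named Alice, follows outgoing knows edges twice, and returns the names it reaches (friends of friends). A plain-Python sketch of the same pipeline over a hypothetical in-memory graph: each function mirrors one Gremlin step by mapping a list of traversers to the next list, so duplicates are kept, just as values("name") emits them without a dedup() step.

```python
# Hypothetical property graph: vertex id -> properties, plus labeled adjacency.
VERTEX_PROPS = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
    3: {"name": "Carol"},
    4: {"name": "Dave"},
    5: {"name": "Erin"},
}
OUT_EDGES = {  # (vertex id, edge label) -> target vertex ids
    (1, "knows"): [2, 3],
    (2, "knows"): [4],
    (3, "knows"): [4, 5],
}


def has(vertices, key, value):
    """Like has(key, value): keep vertices whose property matches."""
    return [v for v in vertices if VERTEX_PROPS[v].get(key) == value]


def out(vertices, label):
    """Like out(label): one traverser per outgoing edge, duplicates preserved."""
    return [t for v in vertices for t in OUT_EDGES.get((v, label), [])]


def values(vertices, key):
    """Like values(key): emit the property value carried by each traverser."""
    return [VERTEX_PROPS[v][key] for v in vertices if key in VERTEX_PROPS[v]]


def friends_of_friends(name):
    # g.V().has("name", name).out("knows").out("knows").values("name")
    start = list(VERTEX_PROPS)
    return values(out(out(has(start, "name", name), "knows"), "knows"), "name")


if __name__ == "__main__":
    print(friends_of_friends("Alice"))  # ['Dave', 'Dave', 'Erin']
```

This sketch starts from a full vertex list; in the slide's execution plan, the optimization layers decide how the has() step and each out() step map onto index and edge-store reads instead.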
20. HBase – Perfect Storage Backend for JanusGraph
A closer look
A store (column family) is always specified.
Get or multi-get
Batch mutations
Key range scans
ColumnRangeFilter
ColumnPaginationFilter
HBase tuning
[Diagram: Edge Store, Index Store]
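In the edge store, each vertex is a row and its edges and properties are sorted columns, so many reads become a column slice: a row key, a column range, and a limit. The Python sketch below emulates what a ColumnRangeFilter combined with a ColumnPaginationFilter does on a single row, using a sorted list and bisect. The column qualifiers are hypothetical, and the half-open [start, end) range is an assumption made for the illustration.

```python
from bisect import bisect_left


def column_slice(sorted_columns, start, end, limit=None, offset=0):
    """Return columns c with start <= c < end, skipping `offset` and keeping at most `limit`.

    sorted_columns must already be sorted, as HBase keeps qualifiers in byte order.
    """
    lo = bisect_left(sorted_columns, start)
    hi = bisect_left(sorted_columns, end)
    page = sorted_columns[lo:hi][offset:]
    return page if limit is None else page[:limit]


# Hypothetical row: qualifiers as bytes, sorted the way HBase sorts them.
ROW = sorted([b"edge:knows:2", b"edge:knows:3", b"edge:likes:7", b"prop:age", b"prop:name"])

if __name__ == "__main__":
    # b";" is the byte right after b":", so this range covers every b"edge:knows:..." column.
    print(column_slice(ROW, b"edge:knows:", b"edge:knows;", limit=10))
    # [b'edge:knows:2', b'edge:knows:3']
```

Because columns are sorted, the range bounds are found by binary search and only the matching columns are read, which is the property that makes this access pattern cheap in HBase.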
21. JanusGraph with Google Cloud Bigtable
Bigtable implements the HBase 1.0 client API.
Requires the latest version of the bigtable-hbase-1.x artifact.
storage.backend=hbase
storage.hbase.ext.hbase.client.connection.impl=com.google.cloud.bigtable.hbase1_x.BigtableConnection
storage.hbase.ext.google.bigtable.project.id=<Google Cloud Platform project id>
storage.hbase.ext.google.bigtable.instance.id=<Bigtable instance id>
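Because only the connection settings change, pointing a JanusGraph configuration at Bigtable amounts to the properties above. A small Python sketch that renders them from the two IDs; bigtable_properties is a hypothetical helper, and my-project and my-instance are placeholder values, not real resources.

```python
def bigtable_properties(project_id, instance_id):
    """Render JanusGraph properties that point the HBase backend at Cloud Bigtable."""
    if not project_id or not instance_id:
        raise ValueError("both a project id and an instance id are required")
    props = {
        "storage.backend": "hbase",
        # Pass-through keys: the storage.hbase.ext. prefix is stripped for the HBase client.
        "storage.hbase.ext.hbase.client.connection.impl":
            "com.google.cloud.bigtable.hbase1_x.BigtableConnection",
        "storage.hbase.ext.google.bigtable.project.id": project_id,
        "storage.hbase.ext.google.bigtable.instance.id": instance_id,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(bigtable_properties("my-project", "my-instance"))
```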
Abstract: Graphs are well suited to expressing and processing complex relationships among entities in many enterprise and social use cases. Fueled by growing interest in graphs, various graph databases and processing systems now dot the graph landscape. JanusGraph is a community-driven project that continues the legacy of Titan, a pioneer of open source graph databases. JanusGraph is a scalable graph database optimized for large-scale transactional and analytical graph processing. In this session, we will introduce JanusGraph, which features full integration with the Apache TinkerPop graph stack. We will discuss JanusGraph's optimized storage model, which relies on HBase for fast graph traversal and processing.
Brief history with TinkerPop.
Long history as an open source project.
Lots of interesting parts to graph system integration.
Lots of ways to extend and contribute.
Server-side tuning.
Pre-splitting.
Client-side pass-through properties: storage.hbase.ext.<hbase-client-property>
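Pre-splitting creates regions up front so that writes spread across region servers from the start instead of all landing in one region. A minimal Python sketch that computes evenly spaced split keys over a fixed-width unsigned key space, assuming row keys are roughly uniformly distributed 8-byte values; whether that assumption fits your data, and the region count used below, are things to verify. Keys are encoded big-endian so that byte order matches numeric order, which is how HBase compares row keys. The result could be supplied as the split keys when creating the table, for example through HBase's Admin.createTable overload that accepts split keys.

```python
def uniform_split_keys(num_regions, key_width=8):
    """Return num_regions - 1 split keys dividing [0, 2**(8*key_width)) into equal ranges."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    space = 1 << (8 * key_width)
    step = space // num_regions
    # Big-endian encoding keeps byte order consistent with numeric order.
    return [(i * step).to_bytes(key_width, "big") for i in range(1, num_regions)]


if __name__ == "__main__":
    for key in uniform_split_keys(4):
        print(key.hex())
    # 4000000000000000
    # 8000000000000000
    # c000000000000000
```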