1 What Is Gravty?
2 The Internals of Gravty
3 Fine-Tuning Gravty
4 Future Plans
A Graph Database Is
“A graph database is a database that uses
graph structures for semantic queries with nodes,
edges and properties to represent and store data.” (Wikipedia)
Stores objects (vertices)
and relationships (edges)
Provides graph search
capabilities
Vertices and Edges in a Graph Database
(Diagram: vertices connected by "Friends" and "Likes" edges)
Use Cases of a Graph Database
• Facebook: Social Graph (social networks)
• Google: PageRank (ranking websites)
• Walmart and eBay: product recommendation
Need for a Large Graph Database System
The Social Graph, served by Gravty, powers LINE Timeline, LINE Talk, Ranking, Recommendation, LINE Friends Shop, and LINE News.
• 7 billion vertices
• 100 billion edges
• 200 billion indexes
• 5 billion writes a day (create / update / delete)
Gravty Is
A scalable graph database that searches relational information efficiently across a large pool of data using graph search.
Requirements for Gravty
Easy to scale out
• To support ever-increasing data
Easy to develop
• Add, modify, and remove features as necessary
• Tailored to the LINE development environment
• Not dependent on LINE-specific components
Full control over everything!
Easy to use
• Graph query language
• REST API
2 The Internals of Gravty
Technology Stack and Architecture
Data Model
Technology Stack and Architecture
Application → TinkerPop3 Gremlin-Console → TinkerPop3 Graph API → Gravty (Graph Processing Layer over Storage Layer) → MySQL (config, meta), HBase, Kafka
The same stack with versions: Application → TinkerPop3 Gremlin-Console → TinkerPop 3.2.0 Graph API → Gravty: Graph Processing Layer (OLTP only) + Storage Layer → HBase 1.1.x, Kafka 0.10.0.0, Phoenix 4.8.0, Local Memory, MySQL (config, meta)
The Storage Layer is an abstract interface beneath the Graph Processing Layer, with two implementations: Phoenix Repository (default) and Memory Repository (standalone).
Data Model: Flat-Wide Table
• Row key: vertex-id
• Edges are stored in columns
• Disadvantages: column scan is slow; columns cannot be split

Row        | Columns
vertex-id1 | property, property, edge, edge, edge, edge, edge, edge
vertex-id2 | …
vertex-id3 | …
Data Model: Tall-Narrow Table (Gravty)
• Row key: edge-id (SrcVertexId-Label-TgtVertexId)
• Edges are stored in rows
• Advantages: more effective edge scan; parallel execution

Row                   | Columns
svtxid1-label-tvtxid2 | edge property, edge property
svtxid1-label-tvtxid3 | …
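The tall-narrow layout can be sketched in a few lines of Python (the string encoding below is purely illustrative, not Gravty's actual byte format): each edge becomes its own row keyed by SrcVertexId-Label-TgtVertexId, so all outgoing edges of a vertex sit contiguously in the sorted key space and are found with a single prefix scan.

```python
from bisect import bisect_left

def edge_row_key(src, label, tgt):
    # One row per edge: SrcVertexId-Label-TgtVertexId
    return f"{src}-{label}-{tgt}"

def prefix_scan(sorted_keys, prefix):
    # HBase-style scan: rows are kept sorted, so all edges of a vertex
    # (for a given label) are contiguous under one key prefix.
    start = bisect_left(sorted_keys, prefix)
    out = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        out.append(key)
    return out

table = sorted([
    edge_row_key("brown", "friends", "cony"),
    edge_row_key("brown", "friends", "moon"),
    edge_row_key("brown", "friends", "sally"),
    edge_row_key("brown", "likes", "chocolate"),
])

friends = prefix_scan(table, "brown-friends-")
# Each result is a full row, so the scan can be split across regions
print([k.rsplit("-", 1)[-1] for k in friends])  # ['cony', 'moon', 'sally']
```

Because each edge is its own row, HBase is free to split these rows across regions and scan them in parallel, which is exactly the advantage listed above.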
Flat-Wide vs Tall-Narrow
g.V("brown").out("friends").id().limit(3)
(Example graph: Brown is friends with Cony, Moon, and Sally)
→ [cony, moon, sally]
Flat-Wide Model
Brown | edge, edge, edge, edge, edge, edge ('likes' and 'friends' columns)
2 operations: (1) row scan, then (2) column scan
→ [cony, moon, sally]
Tall-Narrow Model (Gravty)
Rows: brown-friends-cony, brown-friends-moon, brown-friends-sally
1 operation: (1) row scan
→ [cony, moon, sally]
• Can split by rows (region)
• Can isolate hotspot rows
• Can scan in parallel
Flat-Wide vs Tall-Narrow
g.V("brown").out("friends").out("friends").id().limit(10)
4 searches in total:
• Flat-Wide = 8 operations
• Tall-Narrow (Gravty) = 4 operations
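The operation counts above follow directly from the two layouts: in the flat-wide model every search costs a row scan plus a column scan, while in the tall-narrow model a single row scan suffices. A trivial sketch of the arithmetic:

```python
def flat_wide_ops(searches):
    # Each search = (1) row scan + (2) column scan over the wide row
    return searches * 2

def tall_narrow_ops(searches):
    # Each search = a single contiguous row scan
    return searches

# Two-hop friends-of-friends query: 1 search for Brown's friends,
# then 1 search per friend (3 friends) = 4 searches in total
searches = 4
print(flat_wide_ops(searches), tall_narrow_ops(searches))  # 8 4
```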
3 Fine-Tuning Gravty
Faster, Compact Querying
Avoiding Hot-Spotting
Efficient Secondary Indexing
Faster, Compact Querying
g.V(brown).hasLabel("user").out("friends").order().by("name", Order.incr).limit(5)
Reducing graph traversal steps: TinkerPop's GraphStep, FilterStep, VertexStep, FilterStep, RangeStep chain collapses into Gravty's GGraphStep and GVertexStep.
Faster, Compact Querying
g.V(brown)
  .outE("friends").limit(5)
  .inV().order().by("name", Order.incr)
  .properties("name")
inV(): pipelined iterator from outE()
• TinkerPop: sequential consuming
• Gravty: parallel querying + pre-loading vertex properties
(Diagram: outE("friends") with limit 5 feeds inV(), whose "name" properties, e.g. Boss, Edward, Moon, James, Jessica, Cony, Sally, are fetched in parallel)
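The difference between sequential consuming and Gravty's parallel pre-loading can be sketched with concurrent.futures (illustrative only; vertex_property below is a hypothetical stand-in for a remote storage read, not Gravty's API):

```python
from concurrent.futures import ThreadPoolExecutor

NAMES = {"cony": "Cony", "moon": "Moon", "sally": "Sally",
         "james": "James", "jessica": "Jessica"}

def vertex_property(vertex_id, key):
    # Stand-in for a remote read of one vertex property
    return NAMES[vertex_id]

def in_v_sequential(edge_targets):
    # TinkerPop-style: consume the outE() iterator one vertex at a time
    return [vertex_property(v, "name") for v in edge_targets]

def in_v_parallel(edge_targets):
    # Gravty-style: issue all property reads at once and pre-load results
    with ThreadPoolExecutor(max_workers=5) as pool:
        return list(pool.map(lambda v: vertex_property(v, "name"), edge_targets))

targets = ["cony", "moon", "sally", "james", "jessica"]  # outE().limit(5)
assert in_v_sequential(targets) == in_v_parallel(targets)
print(sorted(in_v_parallel(targets)))
```

With real network reads, the parallel version's latency is roughly that of the slowest single read rather than the sum of all reads.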
Avoiding Hot-Spotting
Sequentially ordered row keys can make HBase RegionServers suffer:
• Heavy loads of writes or reads
• Inefficient region splitting

EDGE TABLE
SrcVertexId | Label | TgtVertexId
u000001 | 1 | u000002
u000001 | 1 | u000003
u000002 | 1 | u000001
u000003 | 1 | u000001
u000004 | 2 | u000009
Solutions to the hot-spotting problem:
• Pre-splitting regions
• Salting row keys with a hashed prefix (salted tables in Apache Phoenix)
But there is a scan performance issue with the LIMIT clause:
SELECT * FROM index … LIMIT 100;
Avoiding Hot-Spotting: Phoenix Salted Table
The Phoenix client scans up to 100 rows from each salt bucket and merge-sorts the results client-side: with 4 buckets, a maximum of 400 rows are scanned to answer LIMIT 100.
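The LIMIT cost on a salted table can be sketched as follows (bucket contents are made up for illustration): with N salt buckets the index rows live in N independent key ranges, so a LIMIT-k query must scan up to k rows per bucket and merge-sort them on the client.

```python
import heapq

def salted_scan_limit(buckets, k):
    # Phoenix-style salted table: each bucket is scanned for up to k rows,
    # then the client merge-sorts them and keeps the first k.
    partial_scans = [bucket[:k] for bucket in buckets]   # k rows per bucket
    scanned = sum(len(p) for p in partial_scans)         # up to len(buckets) * k
    merged = heapq.merge(*partial_scans)                 # client-side merge sort
    result = [row for _, row in zip(range(k), merged)]
    return result, scanned

# 4 salt buckets, each sorted internally (rows striped round-robin by hash)
buckets = [sorted(range(i, 400, 4)) for i in range(4)]
result, scanned = salted_scan_limit(buckets, 100)
print(scanned)                        # 400 rows scanned to answer LIMIT 100
print(result == list(range(100)))     # True: first 100 rows in global order
```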
Avoiding Hot-Spotting: Custom Salting + Pre-splitting
The row key prefix is hash(source-vertex-id), so all edges of a vertex share one prefix and the Phoenix client scans 100 rows sequentially.
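Gravty's fix can be sketched like this (the hash choice and prefix width are illustrative, not Gravty's actual scheme): because the salt is derived from the source vertex id rather than the whole row key, every edge of a vertex lands under the same prefix, and a LIMIT scan over that vertex's edges reads rows sequentially from a single pre-split region.

```python
import hashlib

NUM_BUCKETS = 4  # assume the table is pre-split into 4 regions, one per prefix

def salt(source_vertex_id):
    # Hash only the source vertex id, so all of a vertex's edges
    # share one row-key prefix (and therefore one region)
    digest = hashlib.md5(source_vertex_id.encode()).digest()
    return digest[0] % NUM_BUCKETS

def row_key(src, label, tgt):
    return f"{salt(src):02d}-{src}-{label}-{tgt}"

keys = [row_key("brown", "friends", t) for t in ("cony", "moon", "sally")]
prefixes = {k.split("-")[0] for k in keys}
print(len(prefixes))  # 1: a LIMIT scan over brown's edges stays in one region
```

Different vertices still spread across the buckets, so write load is distributed, while any one vertex's scan never needs a client-side merge.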
Efficient Secondary Indexing
• Indexed graph view for faster graph search
• Asynchronous index processing using Kafka
• Tools for failure recovery
Default Phoenix IndexCommitter
Synchronous processing of index update requests: the Indexer Coprocessor on each RegionServer sends index updates (Put / Delete) through the Phoenix Driver directly to the HRegions on the other RegionServers.
numConnections = regionServers * regionServers * needConnections
Too many connections on each RegionServer (the network is heavily congested).
Gravty IndexCommitter
Asynchronous processing using Kafka: the Indexer Coprocessor publishes mutations (Put / Delete) to Kafka; dedicated Indexer processes consume them and apply the index updates to the HRegions.
numConnections = indexers * regionServers * needConnections
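Plugging numbers into the two formulas shows the effect (the cluster sizes here are illustrative, not from the talk): with 16 RegionServers and 2 dedicated indexers, the default committer needs 16 * 16 connection groups while Gravty's needs only 2 * 16, an 8x reduction that matches the 1/8 figure reported below.

```python
def default_committer_connections(region_servers, need_connections):
    # Every RegionServer's coprocessor talks to every RegionServer
    return region_servers * region_servers * need_connections

def gravty_committer_connections(indexers, region_servers, need_connections):
    # Only the Kafka-fed indexer processes talk to the RegionServers
    return indexers * region_servers * need_connections

region_servers, indexers, need = 16, 2, 1  # illustrative cluster size
default = default_committer_connections(region_servers, need)
gravty = gravty_committer_connections(indexers, region_servers, need)
print(default, gravty, default // gravty)  # 256 32 8
```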
Default Phoenix IndexCommitter
1. Phoenix client UPSERT: PUT to the primary table's RegionServer (Phoenix Coprocessor)
2. The coprocessor requests HBase mutations (PUT / DELETE) for the INDEX 1 and INDEX 2 RegionServers in parallel
3. Phoenix client returns
Gravty IndexCommitter
1. PUT to the primary table's RegionServer (Phoenix Coprocessor)
2. HBase mutations for INDEX 1 and 2 are published to Kafka
3. RETURN to the client
4. The Index Consumer consumes the mutations from Kafka
5. PUT / DELETE to the INDEX 1 and INDEX 2 RegionServers
Secondary Indexing Metrics
• Server TPS: 3x
• Number of connections per RegionServer: 1/8
Best-Effort Failover: fail fast, fix later
• Reentrant event processing: every row is versioned in HBase (timestamp)
• Logging failures and replaying failed requests
• Time machine to resume at a certain runtime: resetting the offsets of Kafka consumers
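Reentrant processing leans on HBase's per-cell timestamps: replaying a logged mutation with its original timestamp converges to the same state, so failed requests can be replayed (or Kafka consumer offsets rewound) without double-applying anything. A toy sketch of the idea, with a dict standing in for an HBase table:

```python
def apply_mutation(store, row, column, value, timestamp):
    # HBase serves the highest-timestamp version of a cell, so applying
    # the same timestamped mutation twice leaves the store unchanged
    current = store.get((row, column))
    if current is None or timestamp >= current[1]:
        store[(row, column)] = (value, timestamp)

store = {}
log = [("u1", "name", "Brown", 100), ("u1", "name", "Brown the Bear", 200)]
for m in log:
    apply_mutation(store, *m)
snapshot = dict(store)

# Replay the whole log (e.g. after resetting the Kafka consumer offset)
for m in log:
    apply_mutation(store, *m)
assert store == snapshot  # replay is idempotent
print(store[("u1", "name")])  # ('Brown the Bear', 200)
```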
Monitoring Tools for Failure Recovery
Setting alerts and displaying metrics
• Prometheus
• Dropwizard metrics
• jvm_exporter
• Grafana
• Ambari
4 Future Plans
Multiple Graph Clusters
Before: Client → Graph API → Gravty → a single HBase cluster
After: Client → Graph API → Gravty → multiple HBase clusters
HBase Repository
Storage Layer (abstract interface): Memory Repository (standalone), Phoenix Repository (default), and a new HBase Repository, backed by HBase, a Phoenix Region Coprocessor, and Local Memory.
OLAP Functionality
A graph analytics system for graph computation via the TinkerPop Graph Computing API.
We will open source Gravty