S2Graph :
A large-scale graph database
with HBase
daumkakao
Doyoung Yoon x Taejin Chin
2
DaumKakao
A Mobile Lifestyle Platform
1. KakaoTalk
a. Mobile Messenger replacing SMS
b. 'KaTalkHe' is being used as a verb in Korea, like 'Googling'
c. 96% of Korean smartphone users are using KakaoTalk
d. 170M users worldwide
e. 3B messages / day
3
DaumKakao
A Mobile Lifestyle Platform
[Diagram: DaumKakao service map across Social, Contents, Commerce, Marketing, Local, and Personal platforms - KakaoTalk, KakaoStory, KakaoGroup, Daum Cafe, KakaoTopic, KakaoPage, KakaoGame, KakaoPick, KakaoMusic, Yellow ID, Media Daum, Daum tvPot, Daum Map, Daum Webtoon, Sol Calendar, KakaoPlace, Sol Mail, Zap, Plus Friend, Gift Shop, Digital Item Store, KakaoStyle, KakaoHome, Sol Group, Story Plus, Daum Cloud]
Biggest mobile SNS in Korea: 96% of Korean smartphone users are using the KakaoTalk messenger, 170 million users worldwide
4
Our Social Graph
[Diagram: a user-centered graph with "Friend" edges weighted by affinity (1-9) and activity edges such as Message (length: 9), Write (length: 3), Read (impact: 3), Coupon (price: 10), Present (price: 3), Group (size: 6), Emoticon (count: 7), Eat (rating: 4), View (count: 8), Play (level: 6), Pick (withFriend: 3), Advertise (ctr: 0.32), Search (keyword: "HBase"), Listen (count: 6), Like (count: 7), Comment (length: 15)]
5
Our Social Graph
[Diagram: the same graph with edges pointing at typed vertices such as Message ID: 201, Ad ID: 603, Music ID, Item ID: 13, Post ID: 97, Game ID: 1984]
6
Technical Challenges
1. Large social graph constantly changing
a. Scale: more than
social network: 10 billion edges, 200 million vertices, 50 million updates on existing edges
user activities: 400 million new edges per day
7
Technical Challenges (cont)
2. Low latency for breadth-first-search traversal on connected data
a. performance requirements
peak graph-traversing queries per second: 20,000
response time: 100 ms
8
Technical Challenges (cont)
3. Updates should be applied to the graph in real time for viral effects
[Diagram: Person A (Post) -> Person B (Comment) -> Person C (Sharing) -> Person D (Mention), each hop labeled "Fast"]
9
Technical Challenges (cont)
4. Support for Dynamic Ranking logic
a. push strategy: hard to change data ranking logic dynamically.
b. pull strategy: can try various data ranking logic
10
Before
Each app server should know each DB's sharding logic.
Highly inter-connected architecture
[Diagram: Messaging, SNS, and Blog app servers each wired directly to the Friend relationship, SNS feeds, Blog user activities, and Messaging stores]
11
After
[Diagram: stateless SNS, Blog, and Messaging app servers all talking to a single S2Graph DB]
12
S2Graph : Distributed Online GraphDB
1. Low-latency
2. Graph-traversable
3. Scalable
4. Eventually consistent
5. Asynchronous, non-blocking
13
Why We Choose HBase?
1. High Availability
2. Scalability
3. Low latency
4. High concurrency
5. Fault tolerant
6. Integration with HDFS
7. Distributed operation
14
The Data Model
1. Columns
2. Labels
3. Directions
4. Index Properties
5. Non-index Properties
[Diagram: a small property graph (vertices 1-5, edges labeled "know", "created", "comment") annotating edge 2's source vertex, target vertex, and label; vertex 1's out edges; vertex 2's in edges; vertex 4's id and properties (name = "josh", age = 32); and edge 5's properties (date = 20150507)]
15
How to store the data - Edge
Logical View
1. Snapshot edges: up-to-date state of each edge
row = Src Vertex ID, column = Tgt Vertex ID, cell = edge Properties
a. Fetching an edge between two specific vertices
b. Lookup table to reach indexed edges for update, increment, and delete operations
16
How to store the data - Edge
Logical View
2. Indexed edges: edges stored under an index
row = Src Vertex ID, column = (Index Values | Tgt Vertex ID), cell = Non-index Properties
a. Fetches edges originating from a given vertex in order of the index
17
How to store the data - Edge
Physical View - table schema
1. Snapshot Edge
a. Rowkey
Murmur Hash (16 bit) | Src Vertex ID (variable length) | Label ID (30 bit) | Direction (2 bit) | Index Sequence (7 bit) | Is Inverted (1 bit)
Vertex IDs can be encoded with an 8-bit header + byte array (long, integer, short, byte, string). A rough encoding sketch follows below.
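A minimal Java sketch of how such a rowkey could be packed is shown here. The field widths follow the slide (16-bit hash, variable-length source vertex id, 30-bit label id, 2-bit direction, 7-bit index sequence, 1-bit inverted flag), but the hash function, the byte packing, and the names SnapshotEdgeRowKeySketch, hash16, and encode are illustrative assumptions, not S2Graph's actual serialization.

import java.nio.ByteBuffer;

// Illustrative only: not S2Graph's real encoder.
public class SnapshotEdgeRowKeySketch {

    // Stand-in for the 16-bit Murmur hash of the source vertex id (hypothetical helper).
    static short hash16(byte[] srcVertexId) {
        int h = 1;
        for (byte b : srcVertexId) h = 31 * h + b;
        return (short) h;
    }

    static byte[] encode(byte[] srcVertexId, int labelId, int direction,
                         int indexSeq, boolean isInverted) {
        ByteBuffer buf = ByteBuffer.allocate(2 + srcVertexId.length + 4 + 1);
        buf.putShort(hash16(srcVertexId));                 // 16-bit hash prefix spreads rows across regions
        buf.put(srcVertexId);                              // variable-length source vertex id
        buf.putInt((labelId << 2) | (direction & 0x3));    // 30-bit label id packed with 2-bit direction
        buf.put((byte) ((indexSeq << 1) | (isInverted ? 1 : 0))); // 7-bit index sequence + 1-bit inverted flag
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] rowKey = encode(new byte[]{0x01, 0x02}, 7, 0, 0, true);
        System.out.println(rowKey.length + " bytes"); // 2 + 2 + 4 + 1 = 9 bytes
    }
}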
18
How to store the data - Edge
Physical View - table schema
1. Snapshot Edge
b. Qualifier: Target Vertex ID (variable length)
c. Value: all property key-value pairs (variable length)
19
How to store the data - Edge
Physical View - table schema
2. Indexed Edge
a. Rowkey
Murmur Hash (16 bit) | Src Vertex ID (variable length) | Label ID (30 bit) | Direction (2 bit) | Index Sequence (7 bit) | Is Inverted (1 bit)
Vertex IDs can be encoded with an 8-bit header + byte array (long, integer, short, byte, string)
20
How to store the data - Edge
Physical View - table schema
2. Indexed Edge
b. Qualifier: Index Property Values (variable length) | Tgt Vertex ID (variable length)
c. Value: non-index property key-value pairs (variable length)
(A sketch of building such a cell follows below.)
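The ordering comes for free from HBase: qualifiers sort lexicographically within a row, so putting the index property values first in the qualifier means a plain scan over the row returns edges already ordered by the index. Below is a minimal sketch of building one such cell with the HBase 1.x client; the column family name "e" and the pre-serialized byte arrays are assumptions for illustration, not S2Graph's actual encoders.

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative only: writes one indexed edge as a single cell.
public class IndexedEdgePutSketch {
    static final byte[] CF = Bytes.toBytes("e"); // assumed column family name

    static Put indexedEdgePut(byte[] rowKey, byte[] indexPropValues,
                              byte[] tgtVertexId, byte[] nonIndexProps) {
        // qualifier = index property values followed by the target vertex id,
        // so the edges within this row sort by their index values
        byte[] qualifier = Bytes.add(indexPropValues, tgtVertexId);
        Put put = new Put(rowKey);
        put.addColumn(CF, qualifier, nonIndexProps); // value = non-index property key-value pairs
        return put;
    }
}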
21
How to store the data - Vertex
Logical View
1. Vertex: up-to-date state of each vertex
row = Vertex ID, column = Property Key, cell = Property Value
e.g. Vertex ID1 -> {Property Key1: Value1, Property Key2: Value2}
22
How to store the data - Vertex
Physical View - table schema
1. Vertex: up-to-date state of each vertex
a. Rowkey
Murmur Hash (16 bit) | Column ID (integer, 32 bit) | Vertex ID (variable length)
b. Qualifier: Property Key (byte, 8 bit)
c. Value: Property Value (variable length)
(A read sketch follows below.)
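Reading a vertex back under this layout is a single HBase Get: each cell in the row is one property, the one-byte qualifier is the property key and the cell value is the property value. The sketch below is illustrative and assumes a column family named "v" and the HBase 1.x client API; it is not S2Graph's actual reader.

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative only: fetches one vertex row and maps 8-bit property keys to raw values.
public class VertexReadSketch {
    static final byte[] CF = Bytes.toBytes("v"); // assumed column family name

    static Map<Byte, byte[]> readVertex(Table table, byte[] vertexRowKey) throws Exception {
        Result result = table.get(new Get(vertexRowKey).addFamily(CF));
        Map<Byte, byte[]> props = new HashMap<>();
        if (result.isEmpty()) return props;
        for (Map.Entry<byte[], byte[]> cell : result.getFamilyMap(CF).entrySet()) {
            props.put(cell.getKey()[0], cell.getValue()); // 1-byte property key -> property value bytes
        }
        return props;
    }
}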
23
How to read the data - GetEdges
Using a custom query DSL on top of HTTP
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
"srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}],
"steps": [
[{"label": "friends", "direction": "out", "limit": 100}], // step
[{"label": "hear", "direction": "out", "limit": 10}]
]
}
'
Steps = a list of Steps
Step = the labels to traverse and how to rank them in the result
[Diagram: Step 1 follows "friends" edges from User 1 to friend 1 and friend 2; Step 2 follows "hear" edges (time: 20140502, 20140712, 20141116) to the songs "Don't let go", "let it be", "let it go"]
24
How to read the data - GetEdges Example
Friend list
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
"srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}],
"steps": [
[{"label": "friends", "direction": "out", "limit": 100}], // step
]
}
'
[Diagram: User 1 connected to friend 1 and friend 2 via "Friends" edges]
25
How to read the data - GetEdges Example
Songs my friends have listened to
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
"srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}],
"steps": [
[{"label": "friends", "direction": "out", "limit": 50, ā€œscoringā€: {ā€œscoreā€: 1.0}],
[{"label": "listen", "direction": "out", "limit": 10}]
]
}
'
[Diagram: User 1's friends (friend 1, friend 2) with "hear" edges (time: 20140502, 20140712, 20141116) to the songs "Don't let go", "let it be", "let it go"]
Reference : https://github.com/daumkakao/s2graph#1-definition
26
How to read the data - GetEdges Example
Similar songs to songs that I have listened to.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
"srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}],
"steps": [
[{"label": "listen", "direction": "out", "limit": 50}],
[{"label": "similar_song", "direction": "out", "limit": 10, ā€œscoringā€: {ā€œscoreā€: 1.0}]
]
}
'
[Diagram: User 1 hears "Don't let go", "let it be", "let it go"; similar_song edges with similarity 0.3, 0.4, 0.6 point to "let it bleed", "Hey jude", "Do you wanna build a snowman?"]
27
How to read the data - GetVertices
curl -XPOST localhost:9000/graphs/getVertices -H 'Content-Type: Application/json' -d '
[
{"serviceName": "s2graph", "columnName": "account_id", "ids": [1, 2, 3]},
{"serviceName": "kakaomusic", "columnName": "user_id", "ids": [1, 2, 3]}
]
'
User 1
{created_at:20070812,
updated_at:20150507}
User 2
{created_at:201206132,
updated_at:20140505}
28
How to write the data - Insert
curl -XPOST localhost:9000/graphs/edges/insert -H 'Content-Type: Application/json' -d '
[
{"from":1,"to":2,"label":"graph_test","props":{"time":-1, "weight":10},"timestamp":1417616431},
]
'
User 1 User 2
29
How to write the data - Delete
curl -XPOST localhost:9000/graphs/edges/delete -H 'Content-Type: Application/json' -d '
[
{"from":1,"to":2,"label":"graph_test","timestamp":1417616431},
{"from":1,"to":3,"label":"graph_test","timestamp":1417616431},
]
'
User 1 User 2
30
How to write the data - Update
curl -XPOST localhost:9000/graphs/edges/update -H 'Content-Type: Application/json' -d '
[
{"from":1,"to":2,"label":"graph_test","timestamp":1417616431, "props": {"is_hidden": true, ā€œstatusā€: 200},
{"from":1,"to":3,"label":"graph_test","timestamp":1417616431, "props": {"status": -500}
]
User 1 User 2
friend
{is_hidden:true,
status:200}
31
REST API Spec.
Read
1. getEdges
2. checkEdge
3. getEdgesCount
4. getVertices
Write
1. insert
2. delete
3. update
4. increment
Management
1. create service (vertex type)
2. create label (edge type)
3. add Index
32
HBase Table Configuration
1. setDurability(Durability.ASYNC_WAL)
2. setCompressionType(Compression.Algorithm.LZ4)
3. setBloomFilterType(BloomType.ROW)
4. setDataBlockEncoding(DataBlockEncoding.FAST_DIFF)
5. setBlockSize(32768)
6. setBlockCacheEnabled(true)
7. pre-split by (Integer.MaxValue / regionCount); regionCount = 120 when creating the table (on 20 region servers). See the creation sketch below.
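Expressed against the HBase 1.x Java client, the settings above look roughly like the following sketch. The table name, column family name, and the way split keys are derived from Integer.MAX_VALUE / regionCount are illustrative assumptions, not S2Graph's shipped code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative only: creates a pre-split table with the column-family settings listed above.
public class CreateS2GraphTableSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            HColumnDescriptor cf = new HColumnDescriptor("e"); // assumed column family name
            cf.setCompressionType(Compression.Algorithm.LZ4);
            cf.setBloomFilterType(BloomType.ROW);
            cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
            cf.setBlocksize(32768);
            cf.setBlockCacheEnabled(true);

            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("s2graph")); // assumed table name
            table.setDurability(Durability.ASYNC_WAL);
            table.addFamily(cf);

            // Pre-split: split points spread over the positive int range, following the
            // slide's Integer.MaxValue / regionCount recipe; the real prefix width depends
            // on how the rowkey hash is encoded.
            int regionCount = 120;
            byte[][] splits = new byte[regionCount - 1][];
            for (int i = 1; i < regionCount; i++) {
                splits[i - 1] = Bytes.toBytes(i * (Integer.MAX_VALUE / regionCount));
            }
            admin.createTable(table, splits);
        }
    }
}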
33
HBase Cluster Configuration
• each machine: 8 cores, 32 GB memory
• hfile.block.cache.size: 0.6
• hbase.hregion.memstore.flush.size: 128 MB
• otherwise, default values from CDH 5.3.1
34
Overall Architecture
[Diagram: S2Graph and its clients]
35
Compare to other Online GraphDBs
Titan (v0.4.2)
a. Pros
- Rich API and easy to set up
- Relatively large community
- Transaction handling
b. Cons
- Uses its own ID system; less efficient for graph traversal (details in the next slide)
- Index data stored on one region (hotspot) with the strong consistency option
- Not many references on Titan with HBase compared to other storage backends
36
Compare to Titan
Titan is less efficient for graph traversal
- For the following typical graph traversal query:
Vertex("userID:1").out("friends").limit(10).out("friends").limit(10)
[Diagram: User 1 expanding two hops of "friends" edges]
37
Compare to Titan (cont)
Vertex("userID:1").out("friend").limit(10).out("friend").limit(10)
# of read requests on HBase:
- Titan: 112 = 1 (vertex lookup: a) + 1 (1st-step edges: b) + 10 (2nd-step edges: c) + 100 (destination vertices: d)
- S2Graph: 11 = 1 (1st-step edges: e) + 10 (2nd-step edges: f)
[Diagram: the corresponding read paths, labeled a-d for Titan and e-f for S2Graph]
38
Performance
1. Test data
a. Total # of edges: 9,000,000,000
b. Average # of adjacent edges per vertex: 500
c. Seed vertices: vertices that have more than 100 adjacent edges
2. Test environment
a. Zookeeper servers: 3
b. HBase Master servers: 2
c. HBase Region servers: 20
d. App server: 8 cores, 16 GB RAM
39
Performance
2. Linear scalability
- Benchmark Query: src.out("friend").limit(50).out("friend").limit(10)
- Total concurrency: 20 * # of app servers
[Chart: QPS and latency vs. number of app servers - 1 server: 421 QPS / 47 ms; 2 servers: 803 QPS / 50 ms; 4 servers: 1,567 QPS / 51 ms; 8 servers: 3,097 QPS / 51 ms]
40
Performance
3. Varying width of traverse (tested with a single server)
- Benchmark Query: src.out("friend").limit(x).out("friend").limit(10)
- Total concurrency = 20 * 1 (# of app servers)
[Chart: QPS and latency vs. limit on the first step - limit 20: 943 QPS / 23 ms; 40: 457 QPS / 43 ms; 80: 266 QPS / 75 ms; 100: 203 QPS / 97 ms]
42
Performance
4. Different query path (different I/O pattern)
- All queries touch 1,000 edges.
- Each step's limit is shown on the x-axis.
- Performance can be estimated from a given query's search space.
[Chart: QPS and latency for five limit configurations (10 -> 100, 100 -> 10, 10 -> 10 -> 10, 2 -> 5 -> 10 -> 10, 2 -> 5 -> 2 -> 5 -> 10) - QPS ranges from about 272.5 to 352.2 and latency from 56 to 71 ms]
43
Performance
5. Write throughput per operation on a single app server
Insert operation
[Chart: latency (ms) vs. requests per second for insert operations]
44
Performance
6. Write throughput per operation on a single app server
Update (increment/update/delete) operation
[Chart: latency (ms) vs. requests per second for update operations]
45
Stats
1. HBase cluster per IDC (2 IDCs)
- 3 Zookeeper servers
- 2 HBase Masters
- 20 HBase slaves
2. App servers per IDC
- 10 servers for write only
- 20 servers for query only
3. Real traffic
- read: over 10K requests per second
- currently mostly 2-step queries with a limit of 100 on the first step
- write: over 5K requests per second
* Deep traversal queries are not counted since they are still in the test stage for production
46
47
Through HBase!
48
Now Available as Open Source
- https://github.com/daumkakao/s2graph
- Finding a mentor
Contact
- Taejin Chin : taejin.chin@gmail.com
- Doyoung Yoon : shom83@gmail.com
49
Appendix
- Benchmark Query: src.out("friend").limit(50).out("friend").limit(10)
- Test seed edges (those with more than 100 adjacent edges): 30 million
- Total concurrency: 20 * # of app servers
[Chart: Native Client QPS and latency vs. number of app servers - 1: 112 QPS / 178 ms; 2: 224 QPS / 177 ms; 3: 315 QPS / 189 ms; 4: 429 QPS / 186 ms; 5: 570 QPS / 174 ms]
[Chart: Asynchbase QPS and latency vs. number of app servers - 1: 421 QPS / 47 ms; 2: 803 QPS / 50 ms; 3: 1,192 QPS / 50 ms; 4: 1,567 QPS / 51 ms; 5: 1,895 QPS / 53 ms]
3.5x performance improvement using Asynchbase

More Related Content

What's hot

HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLCHBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
Cloudera, Inc.
Ā 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Suman Srinivasan
Ā 
Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa
HBaseCon
Ā 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Uwe Printz
Ā 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
Adam Muise
Ā 
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresHBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
Cloudera, Inc.
Ā 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
HBaseCon
Ā 
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Databricks
Ā 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)
Uwe Printz
Ā 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and Spark
HBaseCon
Ā 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Charles Givre
Ā 
SQOOP - RDBMS to Hadoop
SQOOP - RDBMS to HadoopSQOOP - RDBMS to Hadoop
SQOOP - RDBMS to Hadoop
Sofian Hadiwijaya
Ā 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
Vinoth Chandar
Ā 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Databricks
Ā 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Adam Kawa
Ā 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Continuent
Ā 
Hadoop and HBase @eBay
Hadoop and HBase @eBayHadoop and HBase @eBay
Hadoop and HBase @eBay
DataWorks Summit
Ā 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLHBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
Cloudera, Inc.
Ā 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
mcsrivas
Ā 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
DataWorks Summit
Ā 

What's hot (20)

HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLCHBaseCon 2012 | HBase for the Worlds Libraries - OCLC
HBaseCon 2012 | HBase for the Worlds Libraries - OCLC
Ā 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Ā 
Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa
Ā 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Ā 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
Ā 
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresHBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
Ā 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
Ā 
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Ā 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)
Ā 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and Spark
Ā 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Ā 
SQOOP - RDBMS to Hadoop
SQOOP - RDBMS to HadoopSQOOP - RDBMS to Hadoop
SQOOP - RDBMS to Hadoop
Ā 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
Ā 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Ā 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Ā 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Ā 
Hadoop and HBase @eBay
Hadoop and HBase @eBayHadoop and HBase @eBay
Hadoop and HBase @eBay
Ā 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLHBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
Ā 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
Ā 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Ā 

Similar to HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase

MongoDB Stich Overview
MongoDB Stich OverviewMongoDB Stich Overview
MongoDB Stich Overview
MongoDB
Ā 
[263] s2graph large-scale-graph-database-with-hbase-2
[263] s2graph large-scale-graph-database-with-hbase-2[263] s2graph large-scale-graph-database-with-hbase-2
[263] s2graph large-scale-graph-database-with-hbase-2
NAVER D2
Ā 
Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015
StampedeCon
Ā 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
Max De Marzi
Ā 
Evolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di PalmaEvolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di Palma
MongoDB
Ā 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
b0ris_1
Ā 
Composable Data Processing with Apache Spark
Composable Data Processing with Apache SparkComposable Data Processing with Apache Spark
Composable Data Processing with Apache Spark
Databricks
Ā 
Introducing MongoDB Stitch, Backend-as-a-Service from MongoDB
Introducing MongoDB Stitch, Backend-as-a-Service from MongoDBIntroducing MongoDB Stitch, Backend-as-a-Service from MongoDB
Introducing MongoDB Stitch, Backend-as-a-Service from MongoDB
MongoDB
Ā 
Java/Scala Lab: Š‘Š¾Ń€Šøс Š¢Ń€Š¾Ń„ŠøŠ¼Š¾Š² - ŠžŠ±Š¶ŠøŠ³Š°ŃŽŃ‰Š°Ń Big Data.
Java/Scala Lab: Š‘Š¾Ń€Šøс Š¢Ń€Š¾Ń„ŠøŠ¼Š¾Š² - ŠžŠ±Š¶ŠøŠ³Š°ŃŽŃ‰Š°Ń Big Data.Java/Scala Lab: Š‘Š¾Ń€Šøс Š¢Ń€Š¾Ń„ŠøŠ¼Š¾Š² - ŠžŠ±Š¶ŠøŠ³Š°ŃŽŃ‰Š°Ń Big Data.
Java/Scala Lab: Š‘Š¾Ń€Šøс Š¢Ń€Š¾Ń„ŠøŠ¼Š¾Š² - ŠžŠ±Š¶ŠøŠ³Š°ŃŽŃ‰Š°Ń Big Data.
GeeksLab Odessa
Ā 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revisedMongoDB
Ā 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational Awareness
MongoDB
Ā 
MongoDB Stitch Introduction
MongoDB Stitch IntroductionMongoDB Stitch Introduction
MongoDB Stitch Introduction
MongoDB
Ā 
RedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory OptimizationRedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory Optimization
Redis Labs
Ā 
Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2
zhang hua
Ā 
PerchĆØ potresti aver bisogno di un database NoSQL anche se non sei Google o F...
PerchĆØ potresti aver bisogno di un database NoSQL anche se non sei Google o F...PerchĆØ potresti aver bisogno di un database NoSQL anche se non sei Google o F...
PerchĆØ potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Codemotion
Ā 
Reactive Data Centric Architectures with DDS
Reactive Data Centric Architectures with DDSReactive Data Centric Architectures with DDS
Reactive Data Centric Architectures with DDS
Angelo Corsaro
Ā 
Socialite, the Open Source Status Feed
Socialite, the Open Source Status FeedSocialite, the Open Source Status Feed
Socialite, the Open Source Status Feed
MongoDB
Ā 
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
Ted Chien
Ā 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Facultad de InformƔtica UCM
Ā 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Databricks
Ā 

Similar to HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase (20)

MongoDB Stich Overview
MongoDB Stich OverviewMongoDB Stich Overview
MongoDB Stich Overview
Ā 
[263] s2graph large-scale-graph-database-with-hbase-2
[263] s2graph large-scale-graph-database-with-hbase-2[263] s2graph large-scale-graph-database-with-hbase-2
[263] s2graph large-scale-graph-database-with-hbase-2
Ā 
Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015
Ā 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
Ā 
Evolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di PalmaEvolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di Palma
Ā 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
Ā 
Composable Data Processing with Apache Spark
Composable Data Processing with Apache SparkComposable Data Processing with Apache Spark
Composable Data Processing with Apache Spark
Ā 
Introducing MongoDB Stitch, Backend-as-a-Service from MongoDB
Introducing MongoDB Stitch, Backend-as-a-Service from MongoDBIntroducing MongoDB Stitch, Backend-as-a-Service from MongoDB
Introducing MongoDB Stitch, Backend-as-a-Service from MongoDB
Ā 
Java/Scala Lab: Š‘Š¾Ń€Šøс Š¢Ń€Š¾Ń„ŠøŠ¼Š¾Š² - ŠžŠ±Š¶ŠøŠ³Š°ŃŽŃ‰Š°Ń Big Data.
Java/Scala Lab: Š‘Š¾Ń€Šøс Š¢Ń€Š¾Ń„ŠøŠ¼Š¾Š² - ŠžŠ±Š¶ŠøŠ³Š°ŃŽŃ‰Š°Ń Big Data.Java/Scala Lab: Š‘Š¾Ń€Šøс Š¢Ń€Š¾Ń„ŠøŠ¼Š¾Š² - ŠžŠ±Š¶ŠøŠ³Š°ŃŽŃ‰Š°Ń Big Data.
Java/Scala Lab: Š‘Š¾Ń€Šøс Š¢Ń€Š¾Ń„ŠøŠ¼Š¾Š² - ŠžŠ±Š¶ŠøŠ³Š°ŃŽŃ‰Š°Ń Big Data.
Ā 
Eagle6 mongo dc revised
Eagle6 mongo dc revisedEagle6 mongo dc revised
Eagle6 mongo dc revised
Ā 
Eagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational AwarenessEagle6 Enterprise Situational Awareness
Eagle6 Enterprise Situational Awareness
Ā 
MongoDB Stitch Introduction
MongoDB Stitch IntroductionMongoDB Stitch Introduction
MongoDB Stitch Introduction
Ā 
RedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory OptimizationRedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory Optimization
Ā 
Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2
Ā 
PerchĆØ potresti aver bisogno di un database NoSQL anche se non sei Google o F...
PerchĆØ potresti aver bisogno di un database NoSQL anche se non sei Google o F...PerchĆØ potresti aver bisogno di un database NoSQL anche se non sei Google o F...
PerchĆØ potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Ā 
Reactive Data Centric Architectures with DDS
Reactive Data Centric Architectures with DDSReactive Data Centric Architectures with DDS
Reactive Data Centric Architectures with DDS
Ā 
Socialite, the Open Source Status Feed
Socialite, the Open Source Status FeedSocialite, the Open Source Status Feed
Socialite, the Open Source Status Feed
Ā 
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
Ā 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Ā 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Ā 

More from HBaseCon

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
HBaseCon
Ā 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
HBaseCon
Ā 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
HBaseCon
Ā 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
Ā 
hbaseconasia2017: HareQLļ¼šåæ«é€ŸHBaseęŸ„č©¢å·„å…·ēš„ē™¼å±•éŽē؋
hbaseconasia2017: HareQLļ¼šåæ«é€ŸHBaseęŸ„č©¢å·„å…·ēš„ē™¼å±•éŽē؋hbaseconasia2017: HareQLļ¼šåæ«é€ŸHBaseęŸ„č©¢å·„å…·ēš„ē™¼å±•éŽē؋
hbaseconasia2017: HareQLļ¼šåæ«é€ŸHBaseęŸ„č©¢å·„å…·ēš„ē™¼å±•éŽē؋
HBaseCon
Ā 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
HBaseCon
Ā 
hbaseconasia2017: HBaseåœØHuluēš„ä½æē”Øå’Œå®žč·µ
hbaseconasia2017: HBaseåœØHuluēš„ä½æē”Øå’Œå®žč·µhbaseconasia2017: HBaseåœØHuluēš„ä½æē”Øå’Œå®žč·µ
hbaseconasia2017: HBaseåœØHuluēš„ä½æē”Øå’Œå®žč·µ
HBaseCon
Ā 
hbaseconasia2017: åŸŗäŗŽHBaseēš„企äøšēŗ§å¤§ę•°ę®å¹³å°
hbaseconasia2017: åŸŗäŗŽHBaseēš„企äøšēŗ§å¤§ę•°ę®å¹³å°hbaseconasia2017: åŸŗäŗŽHBaseēš„企äøšēŗ§å¤§ę•°ę®å¹³å°
hbaseconasia2017: åŸŗäŗŽHBaseēš„企äøšēŗ§å¤§ę•°ę®å¹³å°
HBaseCon
Ā 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
HBaseCon
Ā 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
HBaseCon
Ā 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
HBaseCon
Ā 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
HBaseCon
Ā 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
HBaseCon
Ā 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
HBaseCon
Ā 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
Ā 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon
Ā 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
HBaseCon
Ā 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
HBaseCon
Ā 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
Ā 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon
Ā 

More from HBaseCon (20)

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
Ā 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
Ā 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
Ā 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
Ā 
hbaseconasia2017: HareQLļ¼šåæ«é€ŸHBaseęŸ„č©¢å·„å…·ēš„ē™¼å±•éŽē؋
hbaseconasia2017: HareQLļ¼šåæ«é€ŸHBaseęŸ„č©¢å·„å…·ēš„ē™¼å±•éŽē؋hbaseconasia2017: HareQLļ¼šåæ«é€ŸHBaseęŸ„č©¢å·„å…·ēš„ē™¼å±•éŽē؋
hbaseconasia2017: HareQLļ¼šåæ«é€ŸHBaseęŸ„č©¢å·„å…·ēš„ē™¼å±•éŽē؋
Ā 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
Ā 
hbaseconasia2017: HBaseåœØHuluēš„ä½æē”Øå’Œå®žč·µ
hbaseconasia2017: HBaseåœØHuluēš„ä½æē”Øå’Œå®žč·µhbaseconasia2017: HBaseåœØHuluēš„ä½æē”Øå’Œå®žč·µ
hbaseconasia2017: HBaseåœØHuluēš„ä½æē”Øå’Œå®žč·µ
Ā 
hbaseconasia2017: åŸŗäŗŽHBaseēš„企äøšēŗ§å¤§ę•°ę®å¹³å°
hbaseconasia2017: åŸŗäŗŽHBaseēš„企äøšēŗ§å¤§ę•°ę®å¹³å°hbaseconasia2017: åŸŗäŗŽHBaseēš„企äøšēŗ§å¤§ę•°ę®å¹³å°
hbaseconasia2017: åŸŗäŗŽHBaseēš„企äøšēŗ§å¤§ę•°ę®å¹³å°
Ā 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
Ā 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
Ā 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
Ā 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
Ā 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
Ā 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
Ā 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
Ā 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
Ā 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
Ā 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
Ā 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
Ā 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
Ā 

Recently uploaded

Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
Ā 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
Ā 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
Ā 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
Ā 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
Ā 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
Ā 
Dominate Social Media with TubeTrivia AIā€™s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AIā€™s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AIā€™s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AIā€™s Addictive Quiz Videos.pdf
AMB-Review
Ā 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
Ā 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
Ā 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
Ā 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
Ā 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
Ā 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
Ā 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
Ā 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
Ā 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
Ā 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
Ā 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
Ā 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
Ā 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
Ā 

Recently uploaded (20)

Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Ā 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Ā 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ā 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Ā 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Ā 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Ā 
Dominate Social Media with TubeTrivia AIā€™s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AIā€™s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AIā€™s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AIā€™s Addictive Quiz Videos.pdf
Ā 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Ā 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
Ā 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Ā 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Ā 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Ā 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
Ā 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Ā 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Ā 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Ā 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Ā 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Ā 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
Ā 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Ā 

HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase

  • 1. S2Graph : A large-scale graph database with Hbase daumkakao Doyoung Yoon x Taejin Chin
  • 2. 2 DaumKakao A Mobile Lifestyle Platform 1. KakaoTalk a. Mobile Messenger replacing SMS b. ā€˜KaTalkHeā€™ is being used as a verb in Korea like ā€˜Googlingā€™ c. 96% of Korean smartphone users are using KakaoTalk d. 170M users worldwide e. 3B messages / day
  • 3. 3 KakaoTalk Social Platform DaumKakao A Mobile Lifestyle Platform KakaoStory KakaoGroup Daum Cafe Contents Platform KakaoTopic KakaoPage KakaoGame Commerce Platform KakaoPick KakaoMusic Marketing Platform Yellow IDMedia Daum Daum tvPot Local Platform Daum Map Daum Webtoon Personal Platform Sol calendarKakaoPlace Sol Mail Zap Plus FriendGift Shop Digital Item Store KakaoStyle KakaoHome Sol Group Story Plus Daum Cluod Biggest mobile SNS in Korea 96% of Korean smartphone users are using KakaoTalk messenger, 170 million users worldwide)
  • 4. 4 Our Social Graph Message length : 9 Write length : 3 Read impact :3 Coupon price : 10 Present price : 3 affinity 6affinity: 9 affinity 3 affinity 3 affinity 4 affinity 1 affinity 2 affinity 2 affinity 9 Friend Group size : 6 Emoticon count : 7 Eat rating : 4 View count : 8 Play level: 6 Pick withFriend : 3 Advertise ctr : 0.32 Search keyword : ā€œHBase" Listen count : 6 Like count : 7 Comment length : 15 affinity 3
  • 5. 5 Our Social Graph Message length : 9 Write length : 3 affinity 6affinity: 9 affinity 3 affinity 3 affinity 4 affinity 1 affinity 2 affinity 2 affinity 9 Friend Play level: 6 Pick withFriend : 3 Advertise ctr : 0.32 Search keyword : ā€œHBase" Listen count : 6 C l affinity 3 Message ID : 201 Ad ID : 603 Music ID Item ID : 13 Post ID : 97 Game ID : 1984
  • 6. 6 Technical Challenges 1. Large social graph constantly changing a. Scale more than, social network: 10 billion edges, 200 million vertices, 50 million update on existing edges. user activities: 400 million new edges per day
  • 7. 7 Technical Challenges (cont) 2. Low latency for breadth first search traversal on connected data. a. performance requirement peak graph-traversing query per second: 20000 response time: 100ms
  • 8. 8 Technical Challenges (cont) 3. Update should be applied to graph in real time for viral effect Person A Post Fast Person B Comment Person C Sharing Person D Mention Fast Fast
  • 9. 9 Technical Challenges (cont) 4. Support for Dynamic Ranking logic a. push strategy: hard to change data ranking logic dynamically. b. pull strategy: can try various data ranking logic
  • 10. 10 Before Each app server should know each DBā€™s sharding logic. Highly inter-connected architecture Friend relationship SNS feeds Blog user activities Messaging Messaging App SNS App Blog App
  • 12. 12 S2Graph : Distributed Online GraphDB 1. Low-latency 2.Graph-traversable 3.Scalable 4.Eventually consistent 5.Asynchronous, non-blocking
  • 13. 13 Why We Choose HBase? 1. High Availability 2.Scalability 3.Low latency 4.High concurrency 5.Fault tolerant 6.Integration with HDFS 7.Distributed operation
  • 14. 14 The Data Model 1. Columns 2. Labels 3. Directions 4. Index Properties 5. Non-index Properties 1 3comment 4 know created name = ā€œjoshā€ā€Ø age = 32 edge 2 source vertex vertex 1 out edges edge 2 target vertex 2 edge 2 label vertex 2 in edges vertex 4 id vertex 4 properties date = 20150507 edge 5 properties 5
  • 15. 15 How to store the data - Edge Logical View 1. Snapshot edges : Up-to-date status of edge column row Tgt Vertex ID1 Tgt Vertex ID2 Tgt Vertex ID3 Src Vertex ID1 Properties Properties Properties Src Vertex ID2 Properties Properties Properties a. Fetching an edge between two specific vertex b. Lookup Table to reach indexed edges for update, increment, delete operations
  • 16. 16 How to store the data - Edge Logical View 2. Indexed edges : Edges with index column row Index Values | Tgt Vertex ID1 Index Values | Tgt Vertex ID2 Src Vertex ID1 Non-index Properties Non-index Properties a. Fetches edges originating from a certain vertex in order of index
  • 17. 17 How to store the data - Edge Physical View - table schema 1. Snapshot Edge a. Rowkey Murmur Hash Src Vertex ID Label ID Direction Index Sequence Is Inverted 16 bit variable length 30 bit 2 bit 7bit 1 bit Vertex IDs can be encoded with 8 bit header + byte array (long, integer, short, byte, string)
  • 18. 18 How to store the data - Edge Physical View - table schema 1. Snapshot Edge b. Qualifier Target Vertex ID variable length c. Value All Property Key Value Pairs variable length
  • 19. 19 How to store the data - Edge Physical View - table schema 2. Indexed Edge a. Rowkey Murmur Hash Src Vertex ID Label ID Direction Index Sequence Is Inverted 16 bit variable length 30 bit 2 bit 7bit 1 bit Vertex IDs can be encoded with 8 bit header + byte array (long, integer, short, byte, string)
  • 20. 20 How to store the data - Edge Physical View - table schema 2. Indexed Edge b. Qualifier Index Property Values Tgt Vertex ID variable length variable length c. Value Non-index Property Key Value Pairs variable length
  • 21. 21 How to store the data - Vertex Logical View 1. Vertex : Up-to-date status of Vertex column row Property Key1 Property Key2 Src Vertex ID1 Value1 Value2 Vertex ID2 Value1 Value2
  • 22. 22 How to store the data - Vertex Physical View - table schema 1. Vertex : Up-to-date status of Vertex a. Rowkey Murmur Hash Column ID Vertex ID 16 bit integer(32bit) variable length b. Qualifier Property Key Byte(8 bit) c. Value Property Value variable length
  • 23. 23 How to read the data - GetEdges Using a custom query DSL on top of HTTP curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d ' { "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}], "steps": [ [{"label": "friends", "direction": "out", "limit": 100}], // step [{"label": "hear", "direction": "out", "limit": 10}] ] } ' Steps = a list of Step Step = contains the labels to traverse and how to rank them in the result Step 1 friend 1 hear time: 20140502 hear time: 20140712 hear time: 20141116 Friends Friends friend 2 User 1 Step 2 Donā€™t let go let it be let it go
  • 24. 24 How to read the data - GetEdges Example Friend list curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d ' { "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}], "steps": [ [{"label": "friends", "direction": "out", "limit": 100}], // step ] } ' friend 1 friend 2 User 1 Friends Friends
  • 25. 25 How to read the data - GetEdges Example Songs my friends have listened curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d ' { "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}], "steps": [ [{"label": "friends", "direction": "out", "limit": 50, ā€œscoringā€: {ā€œscoreā€: 1.0}], [{"label": "listen", "direction": "out", "limit": 10}] ] } ' friend 1 Friends Friends friend 2 Donā€™t let go let it be let it go hear time: 20140502 hear time: 20140712 hear time: 20141116 User 1 Reference : https://github.com/daumkakao/s2graph#1-definition
  • 26. 26 How to read the data - GetEdges Example Similar songs to songs that I have listened to curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d ' { "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id":1}], "steps": [ [{"label": "listen", "direction": "out", "limit": 50}], [{"label": "similar_song", "direction": "out", "limit": 10, "scoring": {"score": 1.0}}] ] } ' User 1 Don't let go let it be let it go hear time: 20140502 hear time: 20140712 hear time: 20141116 let it bleed Hey jude Do you wanna build a snowman? similar_song similarity: 0.3 similar_song similarity: 0.4 similar_song similarity: 0.6
  • 27. 27 How to read the data - GetVertices curl -XPOST localhost:9000/graphs/getVertices -H 'Content-Type: Application/json' -d ' [ {"serviceName": "s2graph", "columnName": "account_id", "ids": [1, 2, 3]}, {"serviceName": "kakaomusic", "columnName": "user_id", "ids": [1, 2, 3]} ] ' User 1 {created_at:20070812, updated_at:20150507} User 2 {created_at:201206132, updated_at:20140505}
  • 28. 28 How to write the data - Insert curl -XPOST localhost:9000/graphs/edges/insert -H 'Content-Type: Application/json' -d ' [ {"from":1,"to":2,"label":"graph_test","props":{"time":-1, "weight":10},"timestamp":1417616431} ] ' User 1 User 2
  • 29. 29 How to write the data - Delete curl -XPOST localhost:9000/graphs/edges/delete -H 'Content-Type: Application/json' -d ' [ {"from":1,"to":2,"label":"graph_test","timestamp":1417616431}, {"from":1,"to":3,"label":"graph_test","timestamp":1417616431} ] ' User 1 User 2
  • 30. 30 How to write the data - Update curl -XPOST localhost:9000/graphs/edges/update -H 'Content-Type: Application/json' -d ' [ {"from":1,"to":2,"label":"graph_test","timestamp":1417616431, "props": {"is_hidden": true, "status": 200}}, {"from":1,"to":3,"label":"graph_test","timestamp":1417616431, "props": {"status": -500}} ] ' User 1 User 2 friend {is_hidden:true, status:200}
  • 31. 31 REST API Spec. Read 1. getEdges 2. checkEdge 3. getEdgesCount 4. getVertices Write 1. insert 2. delete 3. update 4. increment Management 1. create service (vertex type) 2. create label (edge type) 3. add Index
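For the write operations without an example above, such as increment, the call shape presumably mirrors the insert/update examples. Below is a sketch using java.net.http (Java 11+); the /graphs/edges/increment path and the "play_count" property are assumptions for illustration, not confirmed API details:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of calling the write API from an app server.
public class IncrementExample {
    public static void main(String[] args) throws Exception {
        // Assumed payload: same edge key fields as insert/update, props to increment.
        String body = "[{\"from\":1,\"to\":2,\"label\":\"graph_test\","
                    + "\"timestamp\":1417616431,\"props\":{\"play_count\":1}}]";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9000/graphs/edges/increment"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```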
  • 32. 32 HBase Table Configuration 1. setDurability(Durability.ASYNC_WAL) 2. setCompressionType(Compression.Algorithm.LZ4) 3. setBloomFilterType(BloomType.ROW) 4. setDataBlockEncoding(DataBlockEncoding.FAST_DIFF) 5. setBlocksize(32768) 6. setBlockCacheEnabled(true) 7. pre-split by (Integer.MAX_VALUE / regionCount), with regionCount = 120 when creating the table (on 20 region servers); see the sketch below
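A sketch of applying these settings with the HBase admin API; the table name "s2graph", column-family name "e", and the integer encoding of the split points are placeholders rather than S2Graph's actual setup:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTableSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {

            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("s2graph"));
            table.setDurability(Durability.ASYNC_WAL);         // async WAL sync for write throughput

            HColumnDescriptor cf = new HColumnDescriptor("e"); // placeholder column-family name
            cf.setCompressionType(Compression.Algorithm.LZ4);
            cf.setBloomFilterType(BloomType.ROW);
            cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
            cf.setBlocksize(32768);
            cf.setBlockCacheEnabled(true);
            table.addFamily(cf);

            // Pre-split into 120 regions: split points at multiples of Integer.MAX_VALUE / 120.
            int regionCount = 120;
            int step = Integer.MAX_VALUE / regionCount;
            byte[][] splits = new byte[regionCount - 1][];
            for (int i = 1; i < regionCount; i++) {
                splits[i - 1] = Bytes.toBytes(i * step);
            }
            admin.createTable(table, splits);
        }
    }
}
```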
  • 33. 33 HBase Cluster Configuration • each machine: 8 cores, 32 GB memory • hfile.block.cache.size: 0.6 • hbase.hregion.memstore.flush.size: 128 MB • otherwise, default values from CDH 5.3.1
  • 35. 35 Compare to other Online GraphDBs Titan (v0.4.2) a. Pros - Rich API and easy to set up - Relatively large community - Transaction handling b. Cons - Uses its own ID system; less efficient for graph traversal (details in next slide) - Index data stored on one region (hotspot) with the strong consistency option - Not many references on Titan with HBase compared to other storage backends
  • 36. 36 Compare to Titan Titan is less efficient for graph traversal - For the following typical graph traversal query: Vertex("userID:1").out("friends").limit(10).out("friends").limit(10) User 1 friends friends
  • 37. 37 Compare to Titan (cont) Vertex("userID:1").out("friend").limit(10).out("friend").limit(10) # of read requests on HBase - Titan: 112 = 1 (vertex lookup) + 1 (1st step edges) + 10 (2nd step edges) + 100 (destination vertices) - S2Graph: 11 = 1 (1st step edges) + 10 (2nd step edges)
  • 38. 38 Performance 1. Test data a. Total # of edges: 9,000,000,000 b. Average # of adjacent edges per vertex: 500 c. Seed vertices: vertices that have more than 100 adjacent edges 2. Test environment a. Zookeeper servers: 3 b. HBase Master servers: 2 c. HBase Region servers: 20 d. App server: 8 cores, 16 GB RAM
  • 39. 39 Performance 2. Linear scalability - Benchmark query: src.out("friend").limit(50).out("friend").limit(10) - Total concurrency: 20 * # of app servers - [chart: QPS and latency vs. # of app servers: 1 server: 421 QPS / 47 ms, 2: 803 / 50 ms, 4: 1,567 / 51 ms, 8: 3,097 / 51 ms]
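As a rough sanity check (assuming each of the 20 client slots per app server keeps one request in flight), QPS ≈ concurrency / latency: with 8 app servers, concurrency = 20 × 8 = 160 and latency ≈ 0.051 s, so 160 / 0.051 ≈ 3,100 QPS, which lines up with the measured 3,097 and supports the linear-scalability claim.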
  • 40. 40 Performance 3. Varying width of traversal (tested with a single server) - Benchmark query: src.out("friend").limit(x).out("friend").limit(10) - Total concurrency = 20 * 1 (# of app servers) - [chart: QPS and latency vs. limit on first step: limit 20: 943 QPS / 23 ms, 40: 457 / 43 ms, 80: 266 / 75 ms, 100: 203 / 97 ms]
  • 42. 42 Performance 4. Different query paths (different I/O patterns) - All queries touch 1,000 edges; each step's limit is shown on the x axis - Lets us estimate performance for a given query's search space - [chart: QPS and latency for limit paths 10->100, 100->10, 10->10->10, 2->5->10->10, 2->5->2->5->10: QPS roughly 272 to 352, latency 56 to 71 ms]
  • 43. 43 Performance 5. Write throughput per operation on a single app server - Insert operation - [chart: latency (ms) vs. requests per second]
  • 44. 44 Performance 6. Write throughput per operation on a single app server - Update (increment/update/delete) operation - [chart: latency (ms) vs. requests per second]
  • 45. 45 Stats 1. HBase cluster per IDC (2 IDCs) - 3 Zookeeper servers - 2 HBase Masters - 20 HBase Slaves 2. App servers per IDC - 10 servers for write only - 20 servers for query only 3. Real traffic - read: over 10K requests per second - currently mostly 2-step queries with limit 100 on the first step - write: over 5K requests per second * Deep traversal queries are not counted since they are still in the test stage for production
  • 48. 48 Now available as open source - https://github.com/daumkakao/s2graph - Finding a mentor Contact - Taejin Chin : taejin.chin@gmail.com - Doyoung Yoon : shom83@gmail.com
  • 49. 49 Appendix: Asynchbase vs. native HBase client - Benchmark query: src.out("friend").limit(50).out("friend").limit(10) - Seed vertices with more than 100 adjacent edges: 30 million - Total concurrency: 20 * # of app servers - [chart, native client: 1 server: 112 QPS / 178 ms, 2: 224 / 177 ms, 3: 315 / 189 ms, 4: 429 / 186 ms, 5: 570 / 174 ms] - [chart, Asynchbase: 1 server: 421 QPS / 47 ms, 2: 803 / 50 ms, 3: 1,192 / 50 ms, 4: 1,567 / 51 ms, 5: 1,895 / 53 ms] - 3.5x performance improvement using Asynchbase